Sustaining University English as a Foreign Language Learners’ Writing Performance through Provision of Comprehensive Written Corrective Feedback

Abstract: Writing is regarded as a crucial skill in English language curricula at the secondary and tertiary levels in the Chinese education system. Currently, Chinese teachers of English as a foreign language (EFL) often adopt a product approach to teaching EFL writing, in which they emphasize the quality of their students’ written products and show little concern for the writing process. To help L2 learners achieve sustainable development of their writing proficiency, teachers commonly employ a comprehensive approach to correcting their students’ language errors. However, empirical studies regarding its efficacy on different dimensions of L2 writing are insufficient. This study aimed to fill this gap in a Chinese EFL context by investigating the effects of sustained comprehensive written corrective feedback (WCF) on accuracy, complexity, fluency, and content and organization quality of EFL students’ writing. Quasi-experimental in design, it involved a comparison group and a treatment group receiving four sessions of direct comprehensive WCF. Results show that such WCF contributed to writing accuracy and fluency over time. Our textual analysis further reveals that it particularly benefited students’ grammatical accuracy, reducing some rule-based grammatical error types. However, it showed limited effects on complexity, content, or organization of students’ writing. Interestingly, the comparison group did not improve in any dimension of their writing. Possible implications are also discussed.


Introduction
As an indispensable practice in writing instruction to sustain L2 learners' writing performance, feedback is widely utilized by L2 writing teachers to inform students of their writing problems and weaknesses so that students can improve their writing performance in both local (language) and global (content and organization) aspects [1][2][3]. Therefore, what invariably happens in L2 writing classrooms is that teachers spare no effort to provide students with feedback on a variety of errors, particularly errors in language use (i.e., written corrective feedback, or WCF for short) [4][5][6].
In the existing literature, there is a spirited discussion among researchers on the efficacy of WCF, which was triggered by Truscott [7]. After synthesizing some early studies, he vehemently repudiated the practice of WCF, claiming that it was not only ineffective but also harmful for L2 writing [8,9]. However, a number of researchers refuted his claim, documenting that WCF can contribute to L2 writing accuracy in both revised and new texts [10][11][12][13][14][15]. Such a controversy reflects the complexity of WCF in the sphere of L2 writing. More significantly, what adds complexity to this issue is the extent to which WCF should be provided in order to maximize its effectiveness. Some researchers have advocated a focused (selective) feedback approach, in which teachers focus on only one or a few pre-selected types of errors and leave others uncorrected [14,16,17,18]. Other scholars, in contrast, have cast doubt on the practicability of this approach in authentic L2 writing contexts [19][20][21][22][23]. From their perspective, unfocused (comprehensive) written feedback (i.e., the correction of a wide array of linguistic errors) deserves a place in writing instruction. The scope of written feedback that teachers should provide thus remains a controversial issue among WCF researchers.
Despite the debate over focused and comprehensive WCF, the latter has become a ubiquitous practice in L2 writing teaching [5,24,25,26]. While recent years have witnessed a proliferation of studies on focused WCF, studies that take comprehensive WCF as the research focus are limited [13,25,27,28]. Considering that studies on comprehensive WCF are relatively scarce and that it has great relevance to real L2 writing classrooms, we report a quasi-experimental study that tracked the effects of comprehensive WCF on Chinese EFL learners' writing performance. In addition, unlike most existing WCF studies, which have used a single measure or a limited set of measures to evaluate students' writing production, our study employed multiple measures (i.e., accuracy, linguistic complexity, fluency, content and organization quality) because of the possible trade-off among them (see details in Section 2.3), which could advance current knowledge in this area. Pedagogically, our study would enable WCF practitioners to reflect on and optimize their current WCF practices or to initiate pedagogical innovations that help students make progress in different dimensions of L2 writing, which is regarded as a sustainable goal in L2 writing pedagogy.
Since this study was conducted in the Chinese tertiary EFL context, we offer some important background information on EFL education in Chinese universities. At the tertiary level, EFL teaching in mainland China follows the national curriculum requirements [6,29], which are mandated by the Ministry of Education of China [3]. Upon admission to universities, all students are required to learn English as a foreign language for at least two academic years. In general, their teachers teach them English with a grammar translation approach, whereby linguistic accuracy is emphasized [30]. In addition, Chinese tertiary EFL learners also have to sit for some high-stakes examinations to demonstrate that they are qualified EFL learners. These examinations include the College English Test, Bands 4 and 6 (CET-4 and CET-6), and the Test for English Majors, Bands 4 and 8 (TEM-4 and TEM-8). Overall, English examinations are of great importance for tertiary EFL teaching and learning in China. In other words, EFL education in China is, to some degree, examination-oriented [31].

Is WCF Effective in L2 Writing?
As a leading voice calling the value of WCF into question, Truscott [7][8][9] published a series of articles arguing that WCF does not play a role in L2 writing. He claimed that WCF not only fails to improve writing accuracy but may also produce simplified and shorter writing, at the expense of linguistic complexity and fluency. Thus, in his view, teachers should abandon this pedagogical practice in writing instruction. However, his claim was rejected by a great many WCF researchers, who have offered a wealth of empirical evidence demonstrating that WCF is of great value in L2 writing. Specifically, not only can WCF contribute to students' text revision [13,32,33], but it also has positive impacts on new pieces of writing [34][35][36]. Favorable effects of WCF on new pieces of writing illustrate that its effects go beyond the revision stage. In the course of this back-and-forth debate over its effectiveness, WCF has become a significant arena in L2 writing for robust investigations [25].
Scholars are in favor of the view that WCF still needs to be emphasized and employed in the learning and teaching of L2 writing [37][38][39]. In order to maximize the efficacy of WCF, researchers have paid much attention to the strategies to deliver WCF (direct or indirect WCF) as well as the scope of WCF to be offered (focused or unfocused WCF). Since this study is concerned with the effects of comprehensive WCF, the studies relevant to focused and unfocused WCF are reviewed next.

Focused or Unfocused WCF: Which One Is More Effective?
In the extant WCF literature, a divergence exists between focused (selective) feedback and unfocused (comprehensive) feedback. The focused-unfocused dichotomy hinges on "the comprehensiveness of correction" (p. 11) [40]. Currently, a great many WCF scholars prefer focused WCF over comprehensive WCF, and they have presented theoretical as well as empirical justifications for their preference. Theoretically, targeting only a few linguistic structures is friendlier to L2 learners, especially underachieving ones, for it can avoid cognitive overload and leave them more attentional resources to process new input [27,36]. Empirically, researchers in this line have attested to the effectiveness of focused WCF on L2 writing performance [10,14,35,41,42]. These studies revealed that the recipients of focused WCF outperformed those who did not receive WCF in terms of writing accuracy in the target structure(s), verifying the value and usefulness of focused WCF.
Although WCF researchers provided extensive empirical evidence to shed light on the effectiveness of focused WCF, some scholars voiced their apprehension towards this practice due to its lack of ecological validity [13,37,43,44]. The existing literature on focused WCF tends to target one or two linguistic features. From a practical perspective, L2 learners tend to commit a variety of linguistic errors in writing, and the ultimate goal of teachers' providing WCF is to improve the overall writing accuracy rather than accuracy in specific linguistic categories [37,45]. Thus, targeting only one or a few linguistic errors is far from sufficient. To generate direct pedagogical implications, the research on WCF should better reflect what happens in authentic L2 writing classrooms [13,46]. In view of this, comprehensive WCF, which corresponds to the actual WCF practices in classroom settings, may deserve due attention in scholarship on writing [40,44,47].
Although several studies have touched upon comprehensive WCF, with a research design in which one treatment group receiving comprehensive WCF was compared against other treatment/comparison groups [11,18,48], they did not examine it systematically. The reason is that the research focus of these studies was not whether comprehensive WCF affected the general accuracy of L2 learners' writing, with which writing teachers are concerned. As noted above, the recent literature that exclusively focuses on the effects of comprehensive WCF is far more limited than that on focused WCF, and its findings are inconsistent [13,20,21,22,49]. For instance, Truscott and Hsu [49] inquired into the effects of comprehensive WCF with 47 upper-intermediate ESL learners as participants. They found that comprehensive WCF had favorable effects on general accuracy in revision, but these effects failed to transfer to new pieces of writing. Similarly, Karim and Nassaji [13] found that comprehensive WCF enabled L2 learners to improve their accuracy in revised writing, but it did not yield significantly beneficial effects on general accuracy in new texts. Such results indicate that comprehensive WCF is just an effective editing tool. By contrast, Van Beuningen, De Jong, and Kuiken [21,22] examined comprehensive WCF in Dutch as a foreign language (DFL) settings. They reported that such a practice not only contributed to students' accuracy in revised writing, but also enabled them to improve their accuracy in new pieces of writing. Their results suggest that comprehensive WCF is both a valuable editing and learning instrument. The mixed results regarding the effects of comprehensive WCF on new texts may be attributed to various factors such as the complexity of WCF, research settings, participants' linguistic proficiency, and the inconsistent error categories that WCF targeted [13,50].

Trade-off Hypothesis
As a well-known theory in the realm of research on task complexity, Skehan's [51,52] Trade-off Hypothesis, which aligns with the theories of working memory, informs the task design and implementation, as well as makes predictions for the effects of task manipulation on L2 production [53]. This hypothesis, operationalized in the Limited Attention Model, postulates that L2 learners have limited attentional resources in task execution, so they can only attend to one aspect of language (i.e., complexity, accuracy, and fluency, CAF for short) at a given time in performing a task [51,52]. In this sense, it is likely that they improve one area of language production (CAF) at the expense of others. That is, there may exist a competition among the three features of L2 production.
From the perspective of the Trade-off Hypothesis, it could be predicted that WCF makes L2 learners pay much attention to the accuracy of linguistic forms and they probably have few extra attentional resources available to process linguistic complexity and increase linguistic fluency, which may hinder their development in these two dimensions. In other words, the improvement in accuracy rendered by WCF may compromise the linguistic complexity and fluency [15]. Such a prediction is in alignment with Truscott's [8] claim that in order to enhance accuracy, WCF recipients would shorten and simplify their writing. To date, a few WCF studies have assessed the accuracy along with complexity and/or fluency. Of the limited studies, this prediction was supported by Kepner [54] and Semke [55]. However, it was contradicted by others [20,22,56,57]. For example, Rahimi [57] found that students who were under the treatment of error correction improved the syntactic complexity, as well. Regarding fluency, it was revealed by Chandler's [56] study that WCF did not affect the writing fluency negatively. Instead, such a practice benefited its development.
Aside from the trade-off among CAF, Skehan [51,52] also posits that there is a competition between language and high-order dimensions in relation to L2 production. Like the above prediction on CAF, it can be hypothesized that WCF directs students' attention to language and results in the corresponding enhancement, which may induce little attention to high-order dimensions of writing (e.g., content and organization), imposing a harmful impact on these aspects. At present, not much is known about the effects of WCF on the content and organization of the text that students generate [58]. Two studies, to our knowledge, have attempted to address this issue, and the researchers found that WCF did not show any effects on the rhetoric appropriateness in writing although it contributed to the accuracy significantly [20,46]. Interestingly, the findings are not supportive of the trade-off effect between language and high-order dimensions.
In general, the current body of literature paints an incomplete picture about WCF effects [20,57,58]. Specifically, the prior studies mainly concentrate on whether such a methodology impacts accuracy but pay little attention to its effects on other aspects of writing. According to the Trade-off Hypothesis, the gains in accuracy probably come at the cost of other dimensions in relation to writing production. Thus, in order to contextualize the writing accuracy, other important indexes related to writing performance such as complexity, fluency, content, and organization need to be considered in WCF research [15,46].
To sum up, our review suggests that there is a need to further investigate comprehensive WCF. Firstly, in comparison with focused WCF, recent studies focusing on comprehensive WCF are less common and have produced conflicting results concerning its effects on new pieces of writing [14,59]. Therefore, more studies are warranted to clarify the effects. Another concern is the length of WCF treatment. Most existing WCF studies employ a one-shot intervention to explore its efficacy. Although such a design is relatively easy to implement, it is rather questionable [13,43]. Specifically, students need exposure to extensive WCF to enhance their writing proficiency [44], and the development of writing skills is not a linear but a recursive process. In this sense, it would be unrealistic to expect L2 learners to improve their writing performance and maintain such improvements over a short period of time, let alone after a one-off WCF treatment [43]. Thus, multiple WCF treatment sessions are called for to enhance ecological validity and to better understand the effects of sustained WCF. Lastly, as noted earlier, the majority of prior studies adopt accuracy as the single measure to assess WCF effects [60]. This practice fails to tell the whole story, as the effects of WCF on other aspects of writing performance should also be examined due to the potential trade-off.
With the above considerations in mind, this study set out to address the following research questions with an ameliorated research design: (1) What are the effects of comprehensive WCF on Chinese EFL learners' writing complexity, accuracy, and fluency? (2) What are the effects of comprehensive WCF on Chinese EFL learners' content and organization quality?

Context and Participants
This study was conducted at a medium-ranked university in mainland China. Two parallel intact classes of English major sophomores (N = 72) were recruited on the basis of a convenience sampling strategy [61]. At this university, these students were required to enroll in an English Writing Course, which was offered once a week (two 45-minute class periods) in a 16-week semester. Such a course was designed to develop students' knowledge in English writing and foster their writing competence in some basic genres (i.e., narrative, argumentation, and exposition). The course for these two classes was taught by the same teacher, who earned her master's degree in applied linguistics and had eight years of EFL writing teaching experience. The two classes were randomly assigned to either a treatment group (N = 36) or a comparison group (N = 36). The two groups used the same university syllabus and attended the same writing activities both in and after class.
Before the intervention, all the participants completed a demographic questionnaire. In total, there were nine males (12.50%) and 63 females (87.50%) and their ages ranged from 19 to 21 with, on average, 10 years of English learning experience. In addition, all the participants learned English in mainland China and had no experience of studying abroad. In a word, the participants in the two groups were comparable in terms of age, educational background, and English learning experience. According to their examination scores in the last semester and the discussion with the course teacher, we came to a decision that these participants should be viewed as intermediate EFL learners.

Data Collection
This quasi-experimental study spanned nine weeks and consisted of three testing sessions and four rounds of treatment sessions. The detailed procedures of how the testing and treatment sessions were conducted are presented below (see Table 1).

Writing Tests
In our study, we chose argumentation as the genre for the writing tests. Argumentative writing is recognized as a reliable and popular instrument to evaluate L2 learners' writing proficiency in academic contexts [31]. As such, Chinese tertiary EFL learners tend to be required to complete argumentative writing in various well-established English proficiency examinations, including IELTS, TOEFL, College English Test Band 4/6 (CET-4/6), and Tests for English Majors Band 4/8 (TEM-4/8).
The participants in the two groups were invited to take a pretest, a posttest, and a delayed posttest, which were given prior to as well as immediately after the treatment and three weeks after the posttest. These tests were used to examine the effects of comprehensive WCF on the different aspects of Chinese EFL learners' writing performance. Three different topics for the three tests were selected from the past TEM-4 test battery (see Appendix A). Such a decision was made based on two rationales. Firstly, teachers' instructional practices are always examination-driven in Chinese tertiary EFL writing classrooms [3], so their students tend to write arguments based on the writing topics from the past TEM-4 papers. More importantly, since TEM-4 is a large-scale standardized test in mainland China, its writing topics are of high reliability and validity. To be specific, they are drawn from general education and students' everyday life experiences, so they are considered familiar and fair to each student [4], which ensures the difficulty of such topics to be largely consistent.
All the participants were required to complete each of the writing tasks within 40 min and the length of writing was expected to be no less than 200 words, as stipulated by TEM-4. During the testing sessions, they did not have access to any external resources such as dictionaries and textbooks.

Intervention
In the treatment group, participants received four rounds of WCF targeting all errors (e.g., articles, verbs, pronouns, singular/plural forms, part of speech, sentence structures, and word choices, among others). When conducting research on WCF, we should take the type(s) of WCF into consideration. In general, there are two types of WCF: direct WCF and indirect WCF. Specifically, the former refers to teachers' providing students with correct forms, while indirect WCF is realized by only indicating errors without offering students correct forms. When it comes to our study, we provided students with direct feedback (see following examples). This practice was rationalized by several reasons. First, in comparison with indirect WCF, direct WCF provides students with input, which enables them to avoid misunderstanding and confusion while coping with WCF, internalize the correct forms immediately, and have explicit information to test the hypotheses that they have made on target language [36,40,56]. Second, considering that the participants in our study were intermediate EFL learners instead of advanced ones, direct WCF was probably more appropriate for them. As suggested by scholars [62,63], indirect WCF is better suited to students with high language proficiency. Finally, whereas the relative effectiveness of direct and indirect WCF is inconclusive, Kang and Han's [50] meta-analysis showed that direct WCF is more effective than indirect WCF with regard to facilitating linguistic accuracy (g = 0.60 vs. 0.33). After each round of feedback provision, the students received the first draft of their essays with WCF. The students were given 10 min to study the provided WCF. After they had looked over their feedback, their writing samples were collected and they were allocated another 30 min to rewrite without having access to WCF. 
We followed many previous studies which involved direct WCF [13,47] to administer such a feedback procedure and its purpose was to avoid students' shallow processing of WCF (i.e., just copying corrections) because of direct WCF and make students engage with WCF more deeply [64]. In the process of rewriting, they were not provided with any external assistance.
The participants in the comparison group did not receive any feedback and they were just required to complete their writing based on the same given writing prompts. After finishing the writing tasks, they were encouraged to self-correct their written scripts. In order to avoid disadvantaging the participants in the comparison group, we provided them with the WCF after the intervention.

Data Analysis
In this intervention, the dependent variables included syntactic and lexical complexity, accuracy, fluency, and content and organization quality, each of which was assessed by different measures. To explore the effects of comprehensive WCF on syntactic complexity, two measures were adopted: mean length of T-units and ratio of clauses per T-unit, two widely used measures for overall syntactic complexity. For lexical complexity, we also used two measures. One was lexical density, i.e., the proportion of content words (e.g., verbs, nouns, adjectives and adverbs) in the total number of words. The other was mean segmental (50 words as a segment) type-token ratio. Syntactic and lexical complexity were processed by the L2 Syntactic Complexity Analyzer (L2SCA) and Lexical Complexity Analyzer (LCA), two web-based programs [65,66]. Fluency was measured by the total number of words students produced within 40 min.
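The lexical complexity and fluency measures described above can be illustrated with a minimal Python sketch. It is not the L2SCA or LCA implementation: the function names, the naive whitespace tokenization, the tag labels, and the choice to discard a trailing segment shorter than 50 words are all assumptions made for illustration.

```python
def mean_segmental_ttr(tokens, segment_size=50):
    """Average the type-token ratio over consecutive full segments."""
    # Assumption: a trailing segment shorter than `segment_size` is discarded.
    starts = range(0, len(tokens) - segment_size + 1, segment_size)
    segments = [tokens[i:i + segment_size] for i in starts]
    if not segments:
        raise ValueError("text is shorter than one full segment")
    ratios = [len(set(segment)) / segment_size for segment in segments]
    return sum(ratios) / len(ratios)


def lexical_density(tagged_tokens, content_tags=("NOUN", "VERB", "ADJ", "ADV")):
    """Share of content words among all (word, tag) pairs.

    The tag labels are hypothetical; any part-of-speech tagger could supply them.
    """
    if not tagged_tokens:
        raise ValueError("empty text")
    content = sum(1 for _, tag in tagged_tokens if tag in content_tags)
    return content / len(tagged_tokens)


def fluency(tokens):
    """Fluency as the total number of words produced in the timed task."""
    return len(tokens)


# Hypothetical usage with naive whitespace tokenization.
essay = "the students write essays and the teachers correct the essays"
tokens = essay.lower().split()
print(fluency(tokens))
```

Averaging over fixed-size segments makes the type-token ratio less sensitive to text length than a ratio computed over the whole essay, which is presumably why a segmental variant was chosen for essays of varying length.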
Accuracy was evaluated by the number of errors per 100 words. In this study, errors were defined as any grammatical, lexical, and morphological inaccuracies in students' writing [67]. However, errors in spelling and punctuation were not included, for two reasons. For one thing, this exclusion avoided possible over-estimation of errors due to illegible handwriting; for another, our browsing of participants' writing showed that they made few errors in spelling and punctuation. In this study, we adopted Geng's [68] guidelines to code and count the errors. To gain a nuanced understanding of the effects of comprehensive WCF on writing accuracy, we further analyzed which error types were amenable to comprehensive WCF, as recommended by prior studies [27,59,63]. Initially, we conducted a broad classification of the errors (i.e., grammatical vs. lexical errors) in the pretest, posttest, and delayed posttest according to Van Beuningen et al.'s [22] coding scheme. Next, we implemented a narrow coding to examine the specific error types (e.g., article, subject-verb agreement, tense, and plural/singular form, among others). The narrow coding was partly based on Chandler [56] and partly data-driven.
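As a small illustration of the accuracy measure, the sketch below normalizes error counts to a rate per 100 words, both overall and by error type. The function names, the error labels, and the figures in the usage lines are hypothetical; the actual coding followed the guidelines cited above.

```python
from collections import Counter


def errors_per_100_words(error_count, word_count):
    """Normalize a raw error count to a rate per 100 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return error_count * 100 / word_count


def rates_by_error_type(error_labels, word_count):
    """Per-100-word rate for each coded error type in one essay."""
    counts = Counter(error_labels)
    return {label: errors_per_100_words(n, word_count)
            for label, n in counts.items()}


# Hypothetical essay: 240 words with 12 coded errors.
labels = ["article"] * 6 + ["verb form"] * 3 + ["word choice"] * 3
print(errors_per_100_words(len(labels), 240))
print(rates_by_error_type(labels, 240))
```

Normalizing by text length keeps essays of different lengths comparable, which matters here because fluency (word count) was itself expected to change over the intervention.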
The content and organization quality were scored based on Jacobs et al.'s [69] ESL Writing Profile, a well-established and widely-used L2 writing rubric, by which we could determine whether WCF would impact the high-order dimensions of writing. In this writing rating scheme, the scores in the content and organization ranged from 13 to 30 points and from 7 to 20 points, respectively.
In order to maintain the reliability of coding linguistic errors, a co-coder who was a Chinese tertiary EFL teacher with a master's degree in applied linguistics was invited. After the first author completed the coding work, 32 students' texts (approximately 15% of all the writing samples) were selected randomly and re-coded by her. The inter-coder reliability was calculated by intraclass correlation coefficient (ICC). The result showed that the inter-coder reliability for coding was 0.83, which exceeded the acceptable level of 0.7 [70]. The same approach was applied to ascertain the inter-rater reliability for content and organization quality. The inter-rater reliability for these two aspects turned out to be 0.90 and 0.89, respectively.
Prior to inferential statistical analyses, the normality of each dependent variable was first examined. In this study, the data were assumed to be normally distributed when the z-scores of skewness and kurtosis were less than 1.96 [71]. Since the data met this requirement, ANOVAs and t-tests were employed.
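The normality screening described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it uses the bias-corrected sample skewness and excess kurtosis with their commonly used small-sample standard errors, and it applies the 1.96 cutoff to the absolute value of each z-score. The function names are hypothetical.

```python
import math


def skew_kurtosis_z(values):
    """Return (z_skewness, z_kurtosis) for a sample of at least 4 values."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    # Bias-corrected sample skewness and excess kurtosis.
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    # Small-sample standard errors of skewness and kurtosis.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


def is_approximately_normal(values, cutoff=1.96):
    """Assumption: both |z| values must fall below the cutoff."""
    z_skew, z_kurt = skew_kurtosis_z(values)
    return abs(z_skew) < cutoff and abs(z_kurt) < cutoff


# Hypothetical scores for one measure in one group.
print(is_approximately_normal([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```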
As for the calculation of the effect size, Cohen's d was used for t-tests and partial η² for ANOVAs. According to Cohen's [72] criteria, d values of 0.20, 0.50, and 0.80 and partial η² values of 0.01, 0.06, and 0.14 were considered to be small, medium, and large, respectively.
Table 2 lists the means and standard deviations for the different measures in the two groups across the three tests. To verify that the treatment group and the comparison group were comparable at the outset of the intervention, independent samples t-tests were run (see Table 3). The tests showed that there were no significant differences with respect to the various indexes at the time of the pretest: Mean length of T-units, p = 0.333; ratio of clauses per T-unit, p = 0.103; lexical density, p = 0.997; mean segmental type-token ratio, p = 0.088; errors per 100 words, p = 0.970; fluency, p = 0.475; content, p = 0.721; organization, p = 0.693.
Note: TG = treatment group; CG = comparison group; MLT = mean length of T-units; RCT = ratio of clauses per T-unit; LD = lexical density; MSTTR = mean segmental type-token ratio; EP100W = the number of errors per 100 words; SD = standard deviation.
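To make the effect-size conventions concrete, the following minimal sketch computes Cohen's d for two independent groups and labels it with the thresholds given in this section. The pooled-standard-deviation formula is an assumption, since the text does not state which variant of d was computed (paired comparisons in particular may use a different denominator), and the function names and sample scores are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(group_a), len(group_b)
    s1, s2 = stdev(group_a), stdev(group_b)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                          / (n1 + n2 - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def label_d(d):
    """Label |d| with the 0.20 / 0.50 / 0.80 conventions."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"


# Hypothetical scores for two small groups.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d, label_d(d))
```

Under these thresholds, a value such as d = 0.64 falls in the medium band and a value such as d = 1.86 in the large band.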

Effects on Mean Length of T-units (MLT)
As seen in Table 3, there are no significant differences between the two groups in terms of MLT in the pretest (t = 0.977, p = 0.333) and the posttest (t = 0.264, p = 0.793), but the difference is notable in the delayed posttest (p = 0.030) with a medium effect size (d = 0.64).
One-way repeated measures ANOVAs showed that the mean scores of MLT developed differently across time in the treatment group (F(2, 70) = 4.134, p = 0.022, partial η² = 0.147), but not in the comparison group (F(2, 70) = 0.267, p = 0.767). To further investigate the within-subjects differences in the treatment group, a series of paired samples t-tests was employed, with a Bonferroni correction for the three comparisons (adjusted α = 0.05/3 ≈ 0.017).
The paired samples t-tests showed that the treatment group displayed significantly better results in MLT from the posttest to the delayed posttest (p = 0.002, d = 0.72), but no significant improvement was observed from the pretest to the posttest, nor from the pretest to the delayed posttest.
Note: MLT = mean length of T-units; RCT = ratio of clauses per T-unit; LD = lexical density; MSTTR = mean segmental type-token ratio; EP100W = the number of errors per 100 words. * p < 0.05; ** p < 0.001.
Table 3 reveals that the treatment group and the comparison group had similar performance in terms of RCT in the pretest (t = 1.662, p = 0.103). In the same vein, no statistically significant differences were found in the posttest and the delayed posttest between the two groups (t = 0.216, p = 0.830; t = 0.703, p = 0.486, respectively). One-way repeated measures ANOVAs showed that the means of RCT did not vary significantly over time for the treatment group (F(2, 70) = 2.576, p = 0.087) or the comparison group (F(2, 70) = 1.596, p = 0.218).
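The Bonferroni step can be illustrated with a short sketch: the family-wise alpha of 0.05 is divided by the number of pairwise comparisons, and each p-value is judged against the adjusted threshold. The function names are hypothetical, and except for p = 0.002 (posttest vs. delayed posttest), the p-values in the usage lines are invented placeholders, not reported results.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def significant_after_bonferroni(p_values, alpha=0.05):
    """Map each comparison name to True if its p-value beats the threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Three pairwise comparisons across the three tests.
p_values = {
    "pretest vs. posttest": 0.40,          # placeholder value
    "pretest vs. delayed posttest": 0.03,  # placeholder value
    "posttest vs. delayed posttest": 0.002,
}
print(round(bonferroni_alpha(0.05, 3), 3))
print(significant_after_bonferroni(p_values))
```

Note that a p-value of 0.03 would count as significant at the unadjusted 0.05 level but not after correction, which is the point of the adjustment when several comparisons are made on the same data.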

Effects on Lexical Density (LD)
As shown in Table 3, the performance of the two groups regarding LD in the pretest and the posttest did not vary greatly (t = 0.003, p = 0.997; t = 0.633, p = 0.530, respectively). However, the treatment group outperformed the comparison group in the delayed posttest (p = 0.038) with a medium effect size (d = 0.61). In terms of within-subjects differences, one-way repeated measures ANOVAs demonstrated no significant differences in LD over time in the treatment group (F(2, 70) = 2.011, p = 0.145), whereas significant differences occurred in the comparison group (F(2, 70) = 6.184, p = 0.004, partial η² = 0.212). Specifically, the mean scores of LD in the comparison group decreased significantly from the posttest to the delayed posttest (p = 0.006, d = 0.623) and from the pretest to the delayed posttest (p = 0.006, d = 0.624).

Effects on Mean Segmental Type-Token Ratio (MSTTR)
The between-subjects comparisons revealed that the mean scores of MSTTR between the two groups were similar in the pretest and the posttest (t = 1.740, p = 0.088; t = 1.067, p = 0.291, respectively) (see Table 3). However, students in the treatment group outperformed their comparison group peers with regard to MSTTR in the delayed posttest (p = 0.027, d = 0.65).
Regarding the within-subjects differences, one-way repeated measures ANOVAs showed that neither the treatment group nor the comparison group changed the means of MSTTR significantly across tests (F(2, 70) = 1.069, p = 0.352; F(2, 70) = 2.887, p = 0.066, respectively).
Table 3 shows that the treatment and comparison groups had similar performance regarding EP100W in the pretest (t = −0.038, p = 0.970). Two-way repeated measures ANOVAs found that there were significant main effects for time and group (F(2, 140) = 5.787, p = 0.006, partial η² = 0.110; F(1, 70) = 34.292, p < 0.001, partial η² = 0.422, respectively). The time-group interaction effect was significant as well (F(2, 140) = 15.926, p < 0.001, partial η² = 0.253). This indicated that the treatment group and the comparison group differed greatly in their performance on this measure over time. To further explore the between-subjects differences, another two independent samples t-tests were conducted, as shown in Table 3. Results showed that the two groups varied greatly with regard to EP100W in the posttest (p < 0.001) and the delayed posttest (p < 0.001) with large effect sizes (d = 1.86 and d = 1.73, respectively). This suggested that the treatment group made far fewer errors than the comparison group in both the posttest and the delayed posttest. To establish whether the changes were significant in each group over time, one-way repeated measures ANOVAs were computed. Results indicated that there were significant changes across tests in the treatment group (F(2, 70) = 30.069, p < 0.001, partial η² = 0.556), but no significant changes were observed in the comparison group over time (F(2, 70) = 2.707, p = 0.097). The post-hoc analyses revealed that the intervention helped the students in the treatment group reduce the number of errors greatly from the pretest to the posttest (p < 0.001, d = 1.22) and from the pretest to the delayed posttest (p < 0.001, d = 1.37).

Effects on Number of Errors per 100 Words (EP100W)
Apart from the examination of the effects on general writing accuracy, we explored an important question in L2 writing research and instruction: Which error types benefit from comprehensive WCF? In terms of the broad categories (grammatical vs. lexical errors), one-way repeated measures ANOVAs showed that such WCF had little effect on Chinese EFL learners' lexical accuracy (F(2, 70) = 1.732, p = 0.188), but it facilitated their grammatical accuracy significantly across the three tests (337 vs. 239 vs. 223; F(2, 70) = 41.855, p < 0.001, partial η² = 0.636). Regarding the specific error types, as Table 4 illustrates, direct comprehensive WCF prompted students to reduce the errors in relation to verb forms, articles, singular/plural forms, and run-on sentences from the pretest to the immediate posttest. More encouragingly, they maintained the positive effects on these error types in the delayed posttest. Overall, these errors involved rule-based grammatical features.
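To make the accuracy index concrete, the following minimal sketch shows how an errors-per-100-words measure such as EP100W can be computed from a raw error count and a text length. The function name and the sample counts are hypothetical and serve only as illustration; they are not the study's data or scoring procedure.

```python
def errors_per_100_words(error_count: int, word_count: int) -> float:
    """Normalize a raw error count by text length (errors per 100 words)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    if error_count < 0:
        raise ValueError("error_count cannot be negative")
    return 100 * error_count / word_count


# Hypothetical essay: 12 marked errors in a 240-word text.
print(errors_per_100_words(12, 240))  # 5.0
```

Normalizing by length in this way allows texts of different lengths to be compared on the same scale.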

Effects on Fluency
As seen in Table 3, the treatment group and the comparison group wrote a similar number of words in the pretest (t = −0.720, p = 0.475) and the posttest (t = 1.679, p = 0.100). However, the difference between the group means was significant in the delayed posttest (p = 0.016, d = 0.72), suggesting that the treatment group outperformed the comparison group in fluency in the delayed posttest.
Results of one-way repeated measures ANOVAs indicated that the variations in fluency were significant in the treatment group across tests (F(2, 70) = 6.930, p = 0.002, partial η² = 0.224), but no significant variations were found in the comparison group (F(2, 70) = 0.037, p = 0.923). The paired samples t-tests revealed that the intervention enabled the treatment group to improve their fluency in language production over time (pretest vs. posttest, p = 0.008, d = 0.57; pretest vs. delayed posttest, p = 0.001, d = 0.79).
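The between-group effect sizes reported in this section are Cohen's d values. As a hedged illustration, the sketch below computes d for two independent groups using the pooled standard deviation, which is one common formulation; this section does not state which variant was used, and the word counts in the example are invented.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Invented word counts for three writers per group (illustration only).
treatment = [250, 270, 290]
comparison = [240, 260, 280]
print(cohens_d(treatment, comparison))  # 0.5
```

By the conventional benchmarks, values near 0.2, 0.5, and 0.8 are read as small, medium, and large effects, respectively.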

Effects on Content Quality
Our statistical analyses (see Table 3) led us to the findings that there were no significant between-subjects differences in the content in the pretest, the posttest, and the de-

Effects on Organization Quality
The between-subjects comparisons presented in Table 3 showed that the two groups produced writing of virtually identical organization quality in each test (t = 0.398, p = 0.693; t = −0.600, p = 0.552; t = 0.517, p = 0.608, respectively).
Regarding the within-subjects differences, one-way repeated measures ANOVAs did not show any significant mean differences in organization quality over time in the treatment group (F(2, 70) = 2.105, p = 0.133) or the comparison group (F(2, 70) = 1.841, p = 0.170).
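For readers less familiar with the within-subjects analyses reported throughout this section, the following minimal sketch shows how the F ratio and partial η² of a one-way repeated measures ANOVA are derived from sums of squares. It is an illustration under stated assumptions, not the software procedure used in the study: it applies no sphericity correction, it does not compute a p value (which requires the F distribution), and the scores are invented.

```python
from statistics import mean


def rm_anova(scores):
    """One-way repeated measures ANOVA.

    scores: list of rows, one row per participant, one score per test occasion.
    Returns the F ratio, its degrees of freedom, and partial eta squared.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 participants and 2 occasions, with equal row lengths")

    all_scores = [x for row in scores for x in row]
    grand = mean(all_scores)
    occasion_means = [mean([row[j] for row in scores]) for j in range(k)]
    subject_means = [mean(row) for row in scores]

    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_occasion = n * sum((m - grand) ** 2 for m in occasion_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    # Error is what remains after removing occasion and participant effects.
    ss_error = ss_total - ss_occasion - ss_subject

    df_occasion = k - 1
    df_error = (k - 1) * (n - 1)
    f_ratio = (ss_occasion / df_occasion) / (ss_error / df_error)
    partial_eta_sq = ss_occasion / (ss_occasion + ss_error)
    return {"F": f_ratio, "df": (df_occasion, df_error), "partial_eta_sq": partial_eta_sq}


# Invented ratings for four participants at pretest, posttest, and delayed posttest.
ratings = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [3.0, 4.0, 4.0],
    [2.0, 4.0, 6.0],
]
print(rm_anova(ratings))
```

Removing the between-participant variation from the error term is what distinguishes this design from a between-groups ANOVA run on the same scores.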

Effects on Accuracy
According to the results, offering comprehensive WCF not only prompted Chinese EFL learners to improve their general writing accuracy in the immediate posttest but also helped them retain the gains in the delayed posttest. This contradicts Truscott [7,8], who argues that error correction does not benefit L2 writing and that teachers should refrain from it. The beneficial effects of comprehensive WCF found here are consistent with previous studies [21,22,47]. Such favorable effects on accuracy are not surprising and may be attributable to several factors. Firstly, the comprehensive WCF in our study was delivered directly. As many researchers have claimed [11,35,40], direct WCF provides students with explicit information, which helps them avoid misunderstanding and confusion while attending to WCF and internalize the correct forms instantly. Owing to its explicitness and immediacy, direct WCF consumes fewer cognitive resources and is considered less cognitively demanding than indirect WCF. Moreover, such facilitative effects may be associated with the feedback procedure. As noted previously, the participants in our research studied direct comprehensive WCF, but they did not have access to it while revising their writing. This practice probably encouraged students to reflect on and process WCF more deeply than they would by simply copying teachers' direct corrections. With such a practice, what these students learned from comprehensive WCF could be strengthened. In addition, the multiple rounds of WCF treatment possibly account for the results. Specifically, our study provided the students with four rounds of comprehensive WCF treatment, which probably enabled them to notice the errors that they were prone to making. Accordingly, they could avoid these errors in the follow-up writing tasks.
However, our results run counter to Truscott and Hsu's [49] study, in which comprehensive WCF did not enhance students' overall linguistic accuracy in new pieces of writing. The mixed results may be ascribed to some variations between their study and our study in methodology such as the number of WCF treatments (one-off treatment vs. multiple rounds of treatments) and writing genres (narratives vs. argumentative essays).
Our investigation extends the previous studies on the usefulness of focused WCF [10,11,18,35,36,41]. Although studies in this line have confirmed the effectiveness of WCF focusing on one or a few structures (e.g., English article system or past tense), little is known concerning whether such a feedback practice could achieve any improvement in overall accuracy [27,62]. The positive effects of comprehensive WCF on accuracy contradict Sheen's [36] and Sheen, Wright, and Moldawa's [18] proposition that unfocused WCF has little learning potential, in that such WCF makes L2 learners so cognitively overwhelmed that they cannot process it effectively.
In addition to general writing accuracy, we examined the error types amenable to comprehensive WCF. Our study revealed that it contributed significantly to Chinese EFL learners' reduction in grammatical errors from the pretest to the delayed posttest, whereas lexical accuracy did not appear to benefit from such a practice. Our result contradicts Truscott's [8] argument that WCF is only effective for non-grammatical errors. Instead, it aligns with what Van Beuningen et al. [22] have found, and this result is quite understandable. Generally speaking, grammatical errors tend to be classified as treatable errors, which can be corrected by applying certain patterns or rules, while lexical errors tend to be classified as untreatable errors, which are idiosyncratic in nature and cannot be explained by rules [63]. As a result, it is difficult for students to reduce lexical errors, make progress in lexical accuracy, and maintain such progress [63]. More importantly, our study further revealed that such WCF helped Chinese EFL learners decrease their errors related to verb forms, articles, singular/plural forms, agreement, and run-on sentences over time. This is an important contribution made by our study, which advances the prior literature in this line [13,21,22]. These previous studies only focus on the effects of comprehensive WCF on broad error categories (i.e., grammar vs. nongrammar) and do not take the specific error types into consideration.
The participants in the comparison group, who just rewrote their compositions without being offered any WCF, were not able to improve writing accuracy in the immediate posttest or the delayed posttest. This result does not concur with Rahimi's [57] finding that non-WCF receivers could improve accuracy over time. Two plausible reasons may account for the disparity. First, her study lasted four months, which was much longer than this study. Second, in her study, the students in the comparison group still received general feedback comments on grammar, which may have drawn their attention to their grammatical problems, while this study did not provide the comparison group with any feedback. The comparison group's failure to improve accuracy supports the assertion that limited and short-term writing practice without feedback cannot contribute to EFL learners' accuracy in writing [73]. In other words, mere engagement in writing practice without extra scaffolding is futile, which parallels the recommendation that L2 learners' writing practice should be accompanied by external support such as teacher feedback or writing instruction [62].

Effects on Syntactic Complexity
As mentioned above, two measures were employed to gauge syntactic complexity. As for mean length of T-units, although the treatment group outperformed the comparison group on this index in the delayed posttest with a medium effect size, neither group's mean length of T-units changed significantly from the pretest to the posttest or from the pretest to the delayed posttest. The ratio of clauses per T-unit did not vary significantly across groups or tests. This suggests that even though comprehensive WCF generated beneficial effects on accuracy, it did not appear to impact syntactic complexity adversely, regardless of how it was measured. This rebuts Truscott's [7,8] proposal that focusing on accuracy would compromise syntactic complexity because learners may produce simplified writing to avoid errors.
From a theoretical perspective and in light of the research results regarding accuracy and syntactic complexity, this study does not agree with the Trade-off Hypothesis posited by Skehan [51,52], in which a competition between accuracy and complexity is expected. Under that hypothesis, gains in accuracy would be accompanied by simpler writing. However, we found that students did not produce simplified writing alongside the improvement in accuracy. This suggests that there is no trade-off between accuracy and syntactic complexity. Instead, the results regarding accuracy and syntactic complexity in our study seem to favor Robinson's Cognition Hypothesis [74,75], which assumes that complexity and accuracy are not in a trade-off relationship since these two dimensions, as two aspects of L2 learners' output, are considered to be closely connected [40,67,76].
Empirically, the finding that comprehensive WCF did not affect students' syntactic complexity negatively or positively coincides with prior studies [19,22,46]. All these studies found that with the comprehensive WCF treatment, participants' syntactic complexity remained unchanged. Two possible reasons might explain the unchanged syntactic complexity in our study. One is that the intervention was not long enough. This could be borne out by the evidence in Rahimi [57]. In her study, after an intervention over a long period of time, the participants receiving WCF treatment improved their syntactic complexity. The other reason may lie in the inadequacies of the two measures adopted to appraise students' syntactic complexity development in the present study. To date, no firm conclusion has been drawn that these two measures are good indicators of syntactic complexity development after a short period of time, notwithstanding their popularity in evaluating global syntactic complexity.
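The two syntactic complexity indices discussed above reduce to simple ratios once a text has been segmented into T-units and clauses, whether by hand or with a parser. The sketch below assumes that these counts are already available; the function name and the counts are hypothetical and do not come from the study's data.

```python
def syntactic_complexity(word_count: int, t_unit_count: int, clause_count: int) -> dict:
    """Return mean length of T-unit and clauses per T-unit from pre-computed counts."""
    if t_unit_count <= 0:
        raise ValueError("t_unit_count must be positive")
    return {
        "mean_length_of_t_unit": word_count / t_unit_count,
        "clauses_per_t_unit": clause_count / t_unit_count,
    }


# Hypothetical essay: 200 words segmented into 16 T-units containing 24 clauses.
print(syntactic_complexity(200, 16, 24))
# {'mean_length_of_t_unit': 12.5, 'clauses_per_t_unit': 1.5}
```

Because both indices are global averages over a whole text, small local changes in students' sentence structure may leave them largely unchanged, which is in keeping with the caution expressed above about their sensitivity over short periods.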

Effects on Lexical Complexity
Similar to the effects on syntactic complexity, no significant changes in lexical complexity were observed in the treatment group across tests even though it outperformed the comparison group in lexical density and mean segmental type-token ratio in the delayed posttest.
This finding corresponds to Hartshorn and Evans [46] and Van Beuningen, De Jong, and Kuiken [22], and two reasons may explain it. Firstly, our study explicitly required students to complete writing tasks of no fewer than 200 words within 40 min. Under such conditions, they might have considered lexical complexity comparatively peripheral and were likely to employ familiar words to complete the writing tasks, which resulted in the failure to improve lexical complexity. This indicates that they may have sacrificed lexical complexity to meet the minimum requirements imposed by this study. More importantly, comprehensive WCF failed to enrich students' limited lexical knowledge. It focuses on and contributes to writing accuracy, but it may not play a role in enriching students' vocabulary repertoire. Without a good command of lexical resources, it would be taxing for them to improve lexical complexity even if they feel there is a need to use more sophisticated or different types of words [77].
The ineffectiveness of WCF in improving lexical complexity can be viewed through an optimistic lens. It gives L2 writing teachers good reason to implement pedagogical interventions that include more sources of input alongside WCF provision. Doing so would afford students opportunities to expand their lexical resources and improve their writing accuracy simultaneously.
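As an illustration of the lexical diversity index referred to in this section, the following minimal sketch computes a mean segmental type-token ratio (MSTTR): the text is cut into consecutive segments of equal length, a type-token ratio is computed for each segment, and the ratios are averaged. The default segment length of 50 words, the simple tokenization, and the decision to discard a trailing partial segment are assumptions made for illustration; this section does not specify the settings used in the study.

```python
import re


def msttr(text: str, segment_size: int = 50) -> float:
    """Mean segmental type-token ratio over consecutive equal-length segments."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    # Simplistic tokenization: lowercase alphabetic strings, apostrophes allowed.
    tokens = re.findall(r"[a-z']+", text.lower())
    n_segments = len(tokens) // segment_size
    if n_segments == 0:
        raise ValueError("text is shorter than one segment")
    ratios = []
    for i in range(n_segments):
        segment = tokens[i * segment_size:(i + 1) * segment_size]
        ratios.append(len(set(segment)) / segment_size)
    # Tokens left over after the last full segment are ignored.
    return sum(ratios) / n_segments


# Toy example with 4-word segments: the first has 4 types, the second only 1.
print(msttr("a b c d a a a a", segment_size=4))  # 0.625
```

Averaging over fixed-length segments makes the index less dependent on overall text length than a plain type-token ratio, which matters when essays differ in length.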

Effects on Fluency
Under the treatment of direct comprehensive WCF, students were able to write lengthier texts in the immediate posttest and the delayed posttest. Taking its effects on accuracy and fluency together, a conclusion can be drawn that this WCF contributed to L2 writers' writing fluency and accuracy concurrently. That is, the improvement of accuracy was not at the expense of fluency, which dismisses Skehan's [51,52] argument that there is a potential competition between accuracy and fluency.
The result that such WCF impacted fluency positively fits with what was reported by Chandler [56]. However, it differs from some previous studies, which reported that comprehensive WCF had little effect on students' fluency [19,20,46]. The inconsistent results are probably due to two factors. To begin with, compared with the participants in Hartshorn and Evans [46], participants in our study had a higher level of English proficiency as they were English major sophomores. Higher English proficiency gave them greater potential to write longer texts. Moreover, this study required participants to complete writing tasks of no fewer than 200 words within 40 min. To satisfy the requirement, students might have tried their best to write as much as possible. This differs from the research design in some previous studies [20,46], in which there was no minimum number of words that students had to produce.

Effects on Content and Organization Quality
As higher-order dimensions of EFL writing production, neither content nor organization quality varied across tests in the treatment group. This suggests that comprehensive WCF did not show effects on them, which echoes the previous literature [20,46]. In this sense, it appears that such WCF exerts little effect on higher-order dimensions of writing even though it can contribute to some aspects of linguistic output (i.e., accuracy and fluency), which is consistent with the claim that improvement in linguistic areas does not pose a threat to the global level of L2 writing [53]. Such a finding does not lend support to the trade-off effects between language and higher-order dimensions of L2 production [51,52]. In addition, this study did not find significant mean differences concerning content and organization quality in the comparison group over time. That neither the treatment group nor the comparison group made progress in content contradicts Ashwell's [32] assumption that rewriting could help students produce better content quality.
The inefficacy of WCF on content and organization quality is not surprising. In general, after WCF provision, L2 learners occupy themselves with correcting linguistic errors because they prioritize such errors in revision [78,79]. With this flawed task revision schema, students are not very likely to identify their problems in global areas of writing. Moreover, this result may also be associated with the nature of global issues. Compared with linguistic errors, problems in content and organization are difficult for L2 learners to detect and solve independently, and they require more cognitive resources [67,80]. Without external scaffolding, it is demanding for learners to make progress in these aspects.

Conclusions
This quasi-experimental study examined the effects of comprehensive WCF on L2 writing performance. Adopting a quantitative approach, our research reveals its complex effects on Chinese EFL learners' writing production. Specifically, providing such WCF improved students' overall writing accuracy greatly in the immediate posttest and the favorable effects were sustained in the delayed posttest. Furthermore, while such WCF appeared to show no effects on lexical accuracy, it contributed to grammatical accuracy significantly. In terms of the specific error types, it benefited the rule-based grammatical errors such as errors related to verb forms, articles, and run-on sentences. Additionally, it had favorable effects in terms of increasing the number of words in writing in the short term and long term. However, this practice did not result in any significant changes in syntactic complexity, lexical density and diversity, or content and organization quality.
Unsurprisingly, this study was not free from limitations. Firstly, this study only focused on the effects of direct comprehensive WCF and did not take into account indirect comprehensive WCF, although such a decision was justified previously. However, given that the effects of WCF are mediated by the explicitness of WCF [40,50], we still have little knowledge about what effects indirect comprehensive WCF has on the different dimensions of L2 learners' writing. As such, further studies are warranted to examine this issue. Furthermore, the interval between the posttest and the delayed posttest was only three weeks. To better assess the delayed effects, future studies can extend the gap between the posttest and the delayed posttest, which could provide us with a deeper insight into comprehensive WCF effects over time. Finally, this study only documented the effects of comprehensive WCF on EFL learners' performance in argumentative writing. As Kang and Han [50] argued, WCF effects were influenced by task genres. Thus, future research needs to investigate the effects of comprehensive WCF on other genres.
Nonetheless, some contributions and implications can be derived from the present study. Although comprehensive WCF is often criticized for overwhelming L2 learners with multiple errors and imposing a heavy cognitive burden [5,10,18,36], our study reports that it benefits students' writing accuracy and fluency when offered directly. In this sense, direct feedback seems to reduce the cognitive load imposed by comprehensive WCF. This suggests that feedback scope is not the only variable relevant to the effectiveness of comprehensive WCF; the saliency of feedback (i.e., feedback explicitness) also plays an important role [47]. As claimed by many researchers [13,47,81], the explicitness of feedback matters when researchers examine WCF effectiveness. In addition, framed within the Trade-off Hypothesis, this study investigated the effects of comprehensive WCF on different dimensions of L2 writing. However, our results do not corroborate it. Specifically, we found that this practice improved L2 learners' accuracy and fluency simultaneously without negative effects on complexity or content and organization. This indicates that the improvement in accuracy was not at the cost of other aspects of writing. Such findings run counter to the predictions of the Trade-off Hypothesis (see Section 2.3) and, in a way, enrich our theoretical understanding.
This study has implications for L2 writing instruction, which may contribute to L2 learners' sustainable improvement of their writing performance. Firstly, it lends support to the value and usefulness of comprehensive WCF for L2 learners' writing performance in terms of writing accuracy as well as fluency. As such, it is reasonable for L2 teachers to continue this instructional practice in their normal writing pedagogy. Another implication concerns how to implement rewriting after the provision of direct WCF. As an important pedagogical practice, rewriting is worth carrying out after WCF provision because it gives students opportunities to practice and reinforce in revision the knowledge they have learnt from WCF. However, it would be of little use for students simply to copy their teachers' corrections in the rewriting process after direct WCF. In our study, by contrast, students were asked to study the corrections but did not have access to them in revision. While this practice is not common in real L2 writing classrooms, it enables learners to gain a better understanding of WCF and their errors than they would by copying corrections passively. Our study probably provides teachers with a useful alternative, in which they can deliver their WCF directly (a ubiquitous practice in natural classroom settings) and combine it with students' self-reflection while they revise their writing, which may maximize the efficacy of direct WCF and help foster self-sufficient revisers. Finally, as indicated by our results, WCF did not exert any effects on content or organization quality. Thus, to help students achieve a balanced development in L2 writing, teachers should not overemphasize linguistic errors at the expense of problems related to the content and organization of the written texts when providing feedback.
This pedagogical suggestion based on our research results is in line with the recommendation that a balanced coverage on different aspects of writing should be stressed in feedback provision [82,83]. Focusing on both local and global issues in response to L2 learners' writing is believed to bring about the sustainable growth of their writing skills.
Author Contributions: X.C. and L.J.Z. conceived and designed the study. X.C. collected the data and drafted the manuscript, and all the authors revised the manuscript. L.J.Z. finalized it for submission as the corresponding author. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: The study involving human participants was reviewed and approved by the Human Participants Ethics Committee of The University of Auckland (020361, November 2017).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to ethical considerations.
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
Pretest: Nowadays, our life is getting a lot simpler and more convenient because of various intelligent machines. However, some people think that our brains will get lazy in a world run by intelligent machines. Write a composition of at least 200 words on the following topic: With intelligent machines to do the thinking, will our brains get lazy?
Posttest: An undergraduate of English at a university, in a recent letter to the university's president, complained that the mandatory math classes he had to take. He said that because a language major has little use for math, he would forget all of his math lessons soon after taking the required exams. Write a composition of at least 200 words on the following topic: Should university students of English major learn math?
Delayed posttest: Recently, a survey reported that 67% of Chinese university students think that saving money is a good habit while the rest believed that spending tomorrow's money today is better. Write a composition of at least 200 words on the following topic: Should university students save money or spend tomorrow's money?