Influence of Writing Instruction on Spanish Heritage Learners in Heritage-Only and Mixed Courses: A Longitudinal Study

University Spanish courses designed specifically for heritage language learners (HLLs) are becoming more common, and researchers have indicated that empirical research is needed to evaluate their effectiveness. This longitudinal study investigates the writing development of 24 HLLs as a result of instruction over the course of the semester. Nine were enrolled in a heritage-only section of a Spanish composition course, and the remainder were from mixed HL/L2 sections of the same course. Both section types were taught online. The major assignments the students produced were two 500-word essays, and students also completed bi-weekly forum posts. We examined the development of lexical density, sophistication, and diversity as well as syntactic complexity and accuracy by comparing each student’s first and final essay and forum posts. Findings indicate that there were significant differences between the scores received on the forum posts in comparison to the essays. However, there were no significant developmental differences in terms of group. Implications, avenues for future research, and pedagogical suggestions are discussed.


Introduction
While the body of research on the effectiveness of instruction for heritage language learners (HLLs) is steadily growing, there are still many areas of instructed heritage language acquisition (IHLA) left to explore. Writing development is one of the realms of IHLA in which there is a particular dearth of research. This is troublesome, as surveys have found writing to be the language skill that HLLs most hope to improve when they take language classes (Carreira and Kagan 2011). At the same time that universities are working to create courses or even whole tracks specifically for HLLs due to their unique language skills and backgrounds, they must also respond to the growing demand for online courses, making research on online language learning for HLLs vitally important. To our knowledge, no study has evaluated the writing development of HLLs as a result of instruction and simultaneously compared the efficacy of courses designed specifically for HLLs and the more common mixed HL/L2 courses at promoting gains in the quality of writing of HLLs. In the present study, we do just this, by comparing the writing of HLLs from a section of an online writing course designed specifically for HLLs (asynchronous) with HLLs from mixed HL/L2 sections of the same course (synchronous). We measured the development of their writing over the course of one semester of instruction using lexical measures, syntactic measures, and measures of accuracy.
In this article, we begin with a review of previous research on HL writing and the pedagogical implications of said research, and then provide a more general overview of existing longitudinal studies of writing development as a result of instruction, some of which we draw upon for the methodology of this study. We also discuss the benefits that computer-based writing has been demonstrated to afford to learners and how they might impact our HL participants. In the methodology section, we describe our participants, their language backgrounds, the course they were recruited from, and the corpus of their writing.

Heritage Language Courses in Higher Education
As the population of bilinguals grows in the United States, some universities have started to offer HL courses in order to address the specific needs of these speakers (Beaudrie 2011, 2012). Spanish is the most commonly taught HL, due to its status as the second most spoken language in the country after English (Acosta and de la Cruz 2011). However, the majority of universities that offer Spanish HL courses do not offer more than one or two courses, with only a handful of universities in the southwest of the country offering up to four courses (Beaudrie 2012; Brown and Gregory 2018). This means that at most colleges and universities, HLLs end up in mixed HL/L2 courses.
Furthermore, many HL programs face several challenges. For example, when only one course is offered (which is the most common scenario), it is often hard to find a time that is convenient for all HLLs interested in the class, which sometimes leads to the course being cancelled due to low enrollment (Bowles and Montrul 2014; Brown and Gregory 2018). Another problem is that most of these HL courses target HLLs with an intermediate or advanced proficiency level (Beaudrie 2006). Thus, HLLs with lower proficiency levels have to enroll in courses that have been designed for L2 speakers instead, which might not always lead to equally positive outcomes (Fairclough 2011; Felix 2004; Potowski 2002). Another challenge is that there is a scarcity of research investigating the outcomes of HL instruction, so further studies are needed in order to understand which teaching practices lead to greater results in HL courses (Montrul and Bowles 2017).

Previous Studies on the Writing of HLLs
Previous studies have reported that writing is the weakest skill of HLLs and the one they most want to improve (Spicer-Escalante 2005). This is due to the fact that even though HLLs usually have strong oral and aural skills, they rarely write in their HL, which is almost exclusively used in the home (Carreira and Kagan 2011; Mrak 2020). For this reason, the writing of HLLs tends to be informal, inaccurate from a prescriptive perspective (Escobar and Potowski 2015), and reflective of orality (Callahan 2010; Colombi 1997). In fact, this is the case even for highly proficient HLLs, whose writing differs greatly from that of baseline speakers of the HL in question (Dengub 2012).
Drawing from these differences, some scholars have suggested that HLLs would benefit from explicit instruction in their HL in order to develop a more academic writing style (Achugar and Colombi 2008; Colombi 1995; Colombi and Harrington 2014). In this regard, Beaudrie et al. (2014) proposed a list of principles to be implemented during writing instruction so that HLLs could develop their writing proficiency. In a similar fashion, Chevalier (2004) proposed a pedagogical model for HL writing classes that considered the various stages of the writing process.
Languages 2021, 6, 109

Longitudinal Studies on the Impact of Instruction/Feedback on the Writing of HLLs
Nevertheless, despite these well-informed proposals, little research so far has examined the impact of instruction on the writing skills of HLLs (Bowles 2018; Montrul and Bowles 2017). The few available studies have examined HLLs' writing longitudinally and have focused either on instruction broadly or specifically on the role of feedback. For example, Jegerski and Ponti (2014) analyzed whether peer feedback was an effective method of writing instruction for HLLs at the university level. They tracked the writing development of 16 HLLs over a period of two weeks. The HLLs only seemed to improve in terms of number of words, but not in other quantitative measures such as lexical density or syntactic complexity. From a qualitative perspective, however, HLLs did report a self-perceived improvement as well as feeling more confident in their writing abilities thanks to the feedback received. In a study by Pérez-Núñez (2015), on the other hand, two types of feedback were compared in a group of 12 HLLs over a period of four weeks: written corrective feedback and content feedback. Whereas the first type of feedback is based on explicit grammatical corrections, the second one places a greater emphasis on the structure and organization of ideas. Pérez-Núñez investigated improvement in the areas of complexity, accuracy, and fluency (CAF) and showed that HLLs who received content feedback made only superficial corrections, whereas those who received written corrective feedback improved their grammar, specifically their use of definite articles. No other grammatical improvements were observed from the set of features he investigated. These two studies show that feedback does not have an outsized impact on the development of HLLs' writing skills, at least not quantitatively in the short term. It could also be the case that changes can only be observed after a longer period of time (Polio and Shea 2014).
A more recent study by Bowles and Bello-Uriarte (2019) examined the development of writing skills over the course of a semester (12 weeks) by comparing a group of instructed HLLs (n = 25) to an uninstructed group (n = 25). Writing development was measured in terms of lexical diversity, lexical density, lexical sophistication, syntactic complexity, accuracy, and fluency. Results showed that the instructed group slightly improved in lexical sophistication, syntactic complexity, and fluency. The uninstructed group, on the other hand, did not make any gains. A follow-up study (Bello-Uriarte in press) compared the use and development of three grammatical structures in the same instructed and uninstructed groups, now with 33 and 32 participants, respectively. Two of the structures were explicitly taught in the Spanish writing course the instructed group attended: the use of the preposition a in verbal periphrasis and the difference between the gerund and the infinitive. The third structure was not taught in class: gender assignment and agreement. Results revealed that none of the groups made any improvement in the case of gender assignment and agreement, probably due to not having received explicit instruction in this area. However, the instructed group did improve in their use of the preposition a with verbal periphrasis, while the uninstructed group did not. As for the use of gerunds and infinitives, there was not enough data to draw any firm conclusions.
Studies reviewed so far indicate that writing instruction or feedback does not always translate into an improvement in the writing skills of HLLs and, when it does, such improvement does not occur in all areas under study, as is common in writing development overall (Polio 2017). Therefore, it is crucial to continue investigating this issue so that we have a better understanding of the effects of current pedagogical practices as well as of how these can be further developed to maximize improvement.
In this regard, one interesting line of research, and the one we followed in this study, is examining whether writing courses exclusively designed for HLLs result in greater improvement than the more common mixed HL/L2 writing courses. To our knowledge, no study has explored this question to date, even though HL programs are becoming increasingly popular at the university level (Benmamoun et al. 2010) and specific textbooks and guidelines are being published to this aim (e.g., Francés and Benítez 2019; Potowski 2017).
It was also imperative that we incorporate a technological component or perspective into our analysis of HL writing courses, as computer-based writing is overwhelmingly the most common writing format in university courses (Henshaw 2016; Torres 2016). This makes computer-based writing courses, and the research that examines them, both ecologically valid and necessary (Godwin-Jones 2018; Elola and Oskoz 2017). Elola and Oskoz (2017), in their conceptual article on the importance of digital literacy, advocate for writing classes in which students produce different kinds of written work apart from just formal essays, as modern communication takes place in many different arenas such as social media, comment threads, and blogs.

Writing Instruction and Technology
A fair amount of recent research has examined the impact that composing on a computer and having access to the Internet have on students and on their written work and has found it to be overwhelmingly positive (e.g., Williams and Beam 2019; Chen 2016; Lee 2015). These studies have focused on the K-12 level and largely on first- and second-language learners (e.g., Elola 2014, 2016), but two recent studies (Henshaw 2016; Torres 2016) have examined the effect of technology on HLLs taking university-level Spanish writing courses. Henshaw (2016) compared the grades and course evaluations of HLLs in a fully online course to those of L2 learners in a face-to-face course. Both groups reported being satisfied with the course in their evaluations and their final grades did not differ significantly between groups. These similarities indicate that online writing courses work well for HLLs at this level. Torres (2016) used a flipped classroom approach in a university-level Spanish writing course in which students viewed videos that discussed aspects of writing such as thesis statements instead of having in-class lectures on these topics. When asked if they felt these videos were beneficial, the 15 HLLs participating in the study reported that they found the videos to be useful because they could re-watch them as needed and that the videos made them more aware of the conventions they needed to use when writing.
Apart from these benefits, recent studies have shown that writing using computers as opposed to paper and pencil improves the students' ability to do text-level editing and organization such as outlining as well as sentence-level tasks like selecting appropriate vocabulary (Williams and Beam 2019; Chapelle 2003). It also has been shown to lead students to produce lengthier, higher quality texts (Goldberg et al. 2003; Williams and Beam 2019), and improvement in the learners' ability to revise their own work successfully has also been observed. At the high school level, using computers to write has been shown to improve students' writing fluency, as it allows them to recursively write and edit at the same time and overall lowers their cognitive load (Turner and Katic 2009; Pennington 2004).
In addition to the benefits computers afford learners in terms of the quality of their work and the recursivity of their writing processes, teenage students reported that they felt more creative and that they enjoyed writing more on the computer than with pencil and paper (Williams and Beam 2019). The same learners also shared that they felt more motivated, engaged, and self-confident as a result of using technology to write. This may be a result of having access to a plethora of social tools such as blogs and vlogs as well as a wide variety of new digital genres such as YouTube videos and tweets, as Elola and Oskoz (2017) point out. Students can now access and incorporate the same types of media they consume for their own enjoyment into their classwork. As writing can be an area of insecurity for many language learners, specifically HLLs studying their mother tongue, the confidence-boosting opportunities that computer-based writing affords are extremely valuable. Furthermore, writing in less traditional digital genres such as forum posts or blogs allows students to approach writing in a way that is less academic, focusing less on grammar and planning, and more on getting their message across. These types of writing activities can allow learners to become more fluent writers in their non-dominant language by removing some of the expectations that traditional genres such as formal essays entail (Tabari 2016).
Another key attribute of writing with technology is that, in many cases, it allows for immediate feedback from peers and instructors and promotes collaboration. Goldberg et al. (2003) reported that the writing process in technology-based classrooms is more social and collaborative in nature, and other studies have found that learners may find peer feedback less threatening than teacher feedback (Williams and Beam 2019; Lee 2015). Therefore, peer feedback allows students to improve their work while avoiding additional stress. Chen (2016), in a meta-analysis of online peer feedback, found that students often preferred to provide feedback electronically as opposed to face-to-face as it was less personal, and they could take the time to think through what they wanted to say.
In sum, research indicates that access to online tools, the ability to write and edit recursively, and the type of immediate, low-stakes electronic peer feedback that is possible in computer-based writing classes all allow students to be more creative, engaged, and confident and help them to write longer, higher quality texts.
However, as previously stated, the majority of the existing research focuses on L1 learners at the K-12 level and to a lesser extent, on L2 learners. As HLLs differ from these groups in many ways due to their natural exposure to the language in the home, more research like the present study is needed to assess the impact of computer-based writing for this growing learner population.

The Present Study
This study aimed to assess the effectiveness of writing instruction for HLLs by measuring the development of lexical density, lexical diversity, and lexical sophistication as well as syntactic complexity and accuracy in the writing of HLLs of Spanish over the course of one semester (16 weeks). Importantly, it also compared the progress made in these domains based on course type by comparing HLLs in a newly designed HL-specific section of a Spanish writing course and those in mixed HL/L2 sections.
The research questions that guided this study were:
1. How does instruction impact lexical density, lexical diversity, lexical sophistication, syntactic complexity, and accuracy in the writing of Spanish HLLs over the course of one semester?
2. Are there differences in the effectiveness of instruction between sections designed specifically for HLLs of Spanish and mixed HL/L2 classes?
3. Does assignment type (forum post vs. essay) interact with the lexical and syntactic measures described above? If so, how?
Since the few studies that have examined HLLs' writing development in HL writing courses have not found conclusive evidence of the impact of instruction (e.g., Bello-Uriarte 2019; Pérez-Núñez 2015) and improvements have usually been mild, we expected neither large gains nor gains in all aspects of writing. Regarding differences in terms of groups of students, we started by proposing a null hypothesis, since this study is the first to compare writing development in heritage-only and mixed HL/L2 courses. Finally, our third question looked at differences in terms of types of assignments. Even though previous research has not found assignment type to be a significant variable, we decided to include it due to differences in the way students were expected to approach each of the assignments (see the following section for more information on this). In general, because forum posts were more informal and less planned than essays, we wanted to be able to control for these differences in our analysis.

The Course
Data were collected in a fifth-semester course designed for students who had taken intermediate-level grammar courses or who could demonstrate an equivalent ability in Spanish as a result of previous studies or natural exposure to the language. Students signed up for the heritage-only section based on self-identification as HLLs. During the semester the study was conducted, there was only one heritage-only section. However, in the past, more than one section has been offered if there was interest. Still, many HLLs instead signed up for the mixed HL/L2 sections, mostly due to personal preference, as the mixed sections were taught synchronously while the heritage-only section was asynchronous. The reason for this is practical in nature, as it was impossible to find one common time that would work for all interested HL students. This type of conflict is quite common in areas of the U.S. where the heritage population does not constitute a large proportion of the student body (see Bowles and Montrul 2014 for a discussion on this). Both section types were taught online. The only other major difference between the sections was the textbook, although these were also quite similar. In the heritage-only section of the course, Potowski's (2017) e-textbook Conversaciones escritas (Written Conversations) (1st edition), which is used in HL writing courses nationwide, was used as the basis for class activities. In the mixed HL/L2 sections, the e-textbook used was Henshaw's (2020) Comunicación escrita (Written Communication) (3rd edition), which is designed to meet the needs of both L2 learners and HLLs, though it is not specialized for the latter.
The curriculum for this course reflected a post-process approach to writing following a genre-based methodology and promoting interaction among students, while still allowing for revisions and the provision of feedback. Post-process approaches to writing are characterized by a social perspective on writing, as opposed to process approaches to writing, which are more individualistic or even asocial (Hyland 2011). In both section types, all writing and activities were done using a computer. The major assignments in the course were essays of different genres. For each essay, students would start by turning in a draft version of approximately 500 words in length that responded to a prompt provided by the instructor (see example below). Then, they would meet one-to-one with their instructors via Zoom to receive feedback on the draft, after which they would produce a longer final essay of about 1400 words in total. Feedback provided by instructors focused on content and structure rather than on form, making sure students followed genre-specific conventions. The Zoom feedback sessions were structured in the following way: students would sign up for an individual slot using SignUpGenius. During the Zoom meeting, students would share their screen with the instructor so that the instructor could read the student's draft. Then, the instructor would make comments to the student concerning the prompt (in case students had not followed all instructions), content, and overall structure. These comments were discussed orally by looking at the student's draft. Students could also ask questions if they wished. Grammar issues were only addressed if errors made it hard to understand what the student was trying to convey. Students in the mixed HL/L2 sections produced two essays, one argumentative and one analytical, while students in the heritage-only section produced three: one narrative essay that was largely biographical, one argumentative essay, and finally, one analytical essay.
Students in the heritage-only section wrote an additional essay because they did not have weekly synchronous hours and therefore had more time to write.
Para esta composición, vas a escribir un ensayo analítico. Debes analizar una o dos obras (pinturas, películas, poemas, cuentos, etc.). Los únicos requisitos son: (1) la obra debe ser de un artista hispano (puede ser de Estados Unidos, pero debe ser latino); (2) si es un texto, canción o película debe ser en español; (3) no debe ser una obra súper famosa que ya ha sido analizada muchas veces (por ejemplo, "Autorretrato" de Frida Kahlo). [For this composition, you are going to write an analytical essay. You must analyze one or two artworks (paintings, movies, poems, stories, etc.). The only requirements are: (1) the artwork must be by a Hispanic artist (they can be from the United States, but they must be Latino); (2) if it is a text, song, or movie, it must be in Spanish; (3) it can't be a very famous artwork that has already been analyzed many times (e.g., "Autorretrato" by Frida Kahlo).]
All assignments that students from both sections had to complete revolved around the essays. Usually, they would start by reading sample texts belonging to the genre that was the focus of the essay. Then, they would complete some reading comprehension activities that took place either during class time (small group discussions), in the case of the synchronous sections, or via forum posts in the case of the asynchronous section. Once students had discussed the content of the readings, they would move on to understanding the structure and conventions of the genre. This was done differently in each section type. In the synchronous sections, the instructor would give short lectures and show examples. In the asynchronous section, students were provided with readings and short presentations on the topic via the online platform.
These presentations were designed using the learning management system and, in some cases, there were some questions embedded to make sure students understood the information. Most of these questions were multiple-choice and were graded automatically by the learning management system. Students had two attempts for each question. After this, the production of each major essay began with a phase of brainstorming, which students from both sections completed using various digital tools such as Pinterest and scoop.it, among others. These two tools were collaborative content curation tools that allowed students to post content related to a specific topic as well as to receive comments (or feedback, in general) from their peers. The process of brainstorming required students to identify sources and analyze them from news pages, YouTube, online repositories of art, and any other online sources they considered relevant. Students had to comment on the sources their peers had found and discuss whether these were appropriate or not. These discussions happened online in both sections, mostly using forum posts, though sometimes alternative tools like Google Docs or Pinterest were used. Students then moved on to the development of a paragraph, and eventually to the organization and production of a full-length essay. Throughout the development of the essay, students received more feedback from their peers via the online forum posts. The learners produced an average of two forum posts per week, regardless of section type. Time spent on the forum posts was not long, often around 15 min each in the synchronous sections and although we were not able to measure the exact amount of time students in the asynchronous section spent on each post, we expected that it was similar as 15 min is sufficient for them to respond to each prompt. Occasionally, when appropriate for the topic, students commented on each other's posts. 
The goal of commenting was most often to give feedback to their peers on their ideas or the organization of their message. This was done two or three times during the semester.
Regarding the role of instructors, they did not provide feedback on any of the forum posts or online discussions, since these were simply graded as 'credit'/'no credit'. However, they did provide feedback on the different versions of the essays, as previously mentioned. Overall, the role of instructors was mostly limited to giving short lectures during class time and guiding students through the daily pair and individual writing activities (in the synchronous sections), providing feedback on the essays, and updating the grades for the forum posts ('credit'/'no credit'). Additionally, they also held weekly office hours that students could attend if they had any questions or issues with the course. Each section was taught by a different instructor, all of them experienced teaching assistants with several years of experience teaching Spanish courses at the college-level. These teaching assistants were PhD students and had received formal training in language teaching including HL teaching, with some focus on the differences between HL and L2 learners when it comes to writing in Spanish.
The first drafts of the essays were submitted in weeks 3, 7, and 13 of the semester in the heritage-only section and in weeks 4 and 13 in the mixed HL/L2 sections. Our corpus consisted of four written assignments from each student: the first drafts of two of the essays, which had approximately 500 words each, and two of the weekly forum posts (about 200 words each). In order to examine the students' longitudinal writing development over the course of the semester, we analyzed the first and last essay draft for each section as well as the first and last forum posts. One could argue that genre could have a considerable effect on the students' writing, and in fact, it has been demonstrated to do so in some previous studies (Bi 2020; Oskoz 2010, 2014; Yoon and Polio 2017). However, we did not consider genre as a variable in this study to preserve ecological validity because the genre of the first essay was different in each section (narrative in the heritage-only section and argumentative in the mixed HL/L2 sections).

Measures
Since the 1970s, researchers have endeavored to find measures that could quantify L2 speakers' linguistic development in several domains including writing (Larsen-Freeman 1978). The complexity, accuracy, and fluency triad, most commonly known as CAF, has been widely used to assess L2 writing for several decades now. These measures, which have also been employed in previous research to assess HLLs' writing (e.g., Bello-Uriarte 2019; Dengub 2012; Elola and Mikulski 2016; Pérez-Núñez 2015), have often been complemented with other lexical measures in order to get a better picture of language learners' writing performance (Skehan 2009). This means that L2 writing research (and HL writing research, by extension) employs various measures to gather data on multiple linguistic features.
For this paper, we decided to use most of the measures by Bowles and Bello-Uriarte (2019) given the similarities between the two studies. Thus, we examined syntactic complexity, accuracy, lexical density, lexical diversity, and lexical sophistication. In what follows, we define each measure and explain how it was calculated. Note that we did not examine fluency (a key component of the CAF triad), as it is calculated based on the time it takes the learner to write the whole text, a piece of information that we could not collect due to the nature of the course.
Syntactic complexity refers to the range of syntactic structures produced as well as the degree of sophistication of those structures (Ortega 2003). Several indices have been used to operationalize the construct of syntactic complexity, usually involving extensive manual coding and, hence, relatively small amounts of data (Bulté and Housen 2012; Norris and Ortega 2009; Wolfe-Quintero et al. 1998). For our study, we followed Bowles and Bello-Uriarte's operationalization and calculated syntactic complexity as the number of clauses per T-unit. The term 'T-unit' was introduced by Hunt (1965) and refers to an independent clause plus all the subordinate clauses attached to it. If a sentence consisted of two or more coordinated clauses, these were counted as individual T-units. Table 1 shows a sample calculation of syntactic complexity. Additionally, and in order to ensure inter-rater reliability, 20% of all texts were coded by two researchers and the percent of exact agreement for clauses per T-unit was calculated, reaching 92%. Disputed cases were resolved through discussion. The remaining texts were then distributed between the two researchers and coded individually.
Accuracy, as opposed to syntactic complexity, is a more straightforward construct (Pallotti 2009), and it is associated with correctness. In other words, accuracy describes the ability to write (and to use language, in general) in an error-free manner. Therefore, we calculated it by dividing the number of error-free T-units in a text by the total number of T-units. We followed Bowles and Bello-Uriarte's analysis in that we only took into account the following types of errors: gender agreement, number agreement, verb agreement, missing articles, verb tense, verb aspect, mood, ser/estar (i.e., Spanish copula), extra or missing words, gerund/infinitive use, preposition selection, word order, missing clause complements, and lexical problems.
The notion of what constitutes an 'error' has been widely discussed in the literature, but it is usually defined as a deviation from the native norm (Hammerly 1991; Wolfe-Quintero et al. 1998). Since our research deals with HLLs, whose language use and writing is often unique in comparison to that of monolingual Spanish speakers and L2 learners (Montrul 2016; Polinsky 2018), we did not count English borrowings or uniquely U.S. Spanish words as inaccurate and instead focused on 'acceptability' or 'appropriateness' as defined by Ellis (2008) and Polio (1997). For example, phrases such as atender a la escuela ("attend school") or ser elegible ("be eligible") were considered acceptable in place of their standard alternatives (asistir a la escuela; ser apto), since the former are quite common in U.S. Spanish. Percent of exact agreement was also calculated based on 20% of all the texts, which were double-coded. Agreement reached 91%, and disputed cases were also discussed. Each rater then coded half of the remaining texts individually. The raters were one native speaker and one near-native second language learner of Spanish. Both teach Spanish at the university level.
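The two hand-coded measures and the reliability check described above reduce to simple ratios. The following is a minimal sketch of that arithmetic, not the authors' actual procedure: the function names and the input format (one clause count and one error-free flag per T-unit, and one code per double-coded item for each rater) are assumptions made for illustration, and the numbers are invented.

```python
def clauses_per_t_unit(clause_counts):
    """Syntactic complexity: mean number of clauses per T-unit.

    clause_counts holds one integer per T-unit in a single text
    (hypothetical input format; the coding itself is manual).
    """
    return sum(clause_counts) / len(clause_counts)


def accuracy(error_free_flags):
    """Accuracy: error-free T-units divided by all T-units in a text."""
    return sum(error_free_flags) / len(error_free_flags)


def percent_exact_agreement(rater_a, rater_b):
    """Percentage of items on which two raters assigned the same code.

    Assumes both lists cover the same items in the same order.
    """
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Invented example: a text with three T-units containing 2, 1, and 3 clauses.
complexity = clauses_per_t_unit([2, 1, 3])
# Invented example: three of four T-units are error-free.
share_error_free = accuracy([True, False, True, True])
# Invented example: the raters disagree on one of four items.
agreement = percent_exact_agreement([2, 1, 3, 1], [2, 1, 2, 1])
```

In this sketch, accuracy is returned as a proportion between 0 and 1, while agreement is returned as a percentage, mirroring how the two figures are reported in the text.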
Lexical diversity, on the other hand, refers to "the range of different words used in a text" (McCarthy and Jarvis 2010, p. 381). Lexical diversity has been found to correlate with vocabulary knowledge and writing quality, among others. The most common way of measuring it is by calculating the type-token ratio of a text (TTR) (Templin 1957). TTR is calculated by dividing the number of 'types' in a text (i.e., unique words) by the total number of 'tokens' (i.e., words). Typically, the more types there are, the more diverse the vocabulary is in a text. That is, there is a positive correlation between types and tokens in a text. Nevertheless, TTR has been claimed to be very sensitive to text length, so most researchers use a modified version of this measure. In our case, we used the measure of textual lexical diversity (MTLD) following Bowles and Bello-Uriarte (2019). Given that the algorithm to calculate MTLD is quite complex, researchers usually rely on specific software for this calculation. We decided to use the open-source R package koRpus (Michalke 2021), which contains a function to calculate the MTLD for a text or group of texts.
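To make the contrast between TTR and MTLD concrete, the sketch below computes both for a list of tokens. It is a simplified illustration of the MTLD idea as described by McCarthy and Jarvis (2010), not the koRpus implementation used in the study: the 0.72 threshold is the conventional default from that paper, the function names are hypothetical, and details such as whether the factor boundary uses "below" or "at or below" the threshold may differ across software.

```python
def type_token_ratio(tokens):
    """TTR: number of unique words (types) divided by number of words (tokens)."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty text")
    return len(set(tokens)) / len(tokens)


def _mtld_one_direction(tokens, threshold):
    """Count 'factors': stretches of text whose running TTR stays above threshold."""
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        # Assumption: a factor closes when the running TTR reaches the threshold.
        if len(types) / count <= threshold:
            factors += 1
            types.clear()
            count = 0
    if count > 0:
        # Leftover tokens contribute a partial factor.
        partial_ttr = len(types) / count
        factors += (1 - partial_ttr) / (1 - threshold)
    if factors == 0:
        return float("inf")  # no word is ever repeated
    return len(tokens) / factors


def mtld(tokens, threshold=0.72):
    """MTLD: mean of the forward and backward passes over the text."""
    tokens = list(tokens)
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    return (forward + backward) / 2


# TTR falls as the same material is repeated, which illustrates its
# sensitivity to text length.
short_text = "el gato come".split()
long_text = short_text * 2
print(type_token_ratio(short_text), type_token_ratio(long_text))
print(mtld(["a", "a", "a", "a"]))
```

Because MTLD measures how many tokens it takes, on average, for the running TTR to fall to the threshold, it is less tied to overall text length than raw TTR, which is what matters when comparing short forum posts with longer essays.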
The last two measures, lexical density and lexical sophistication, are used to indicate how rich a text is from a lexical perspective. Lexical density is the percentage of lexical words or content words (i.e., nouns, verbs, adjectives, and adverbs) in relation to the total number of words in a text (Laufer and Nation 1995). Because content words are the ones that convey information, a text will be more informative ('denser') the greater the number of content words. Lexical sophistication, on the other hand, is "the percentage of 'advanced' words in a text" (Laufer and Nation 1995, p. 309). The term 'advanced' here is very subjective and can thus be interpreted in different ways. However, following Bowles and Bello-Uriarte (2019), we interpreted 'advanced' as 'infrequent'. Whether a word was infrequent or not was determined in reference to the Corpus del Español (Davies 2006), a popular Spanish corpus containing 20 million words from oral and written sources. We took the first 2000 most common lemmas in the corpus as being 'frequent' in such a way that only lemmas not included in that set could be considered 'infrequent' (and thus 'advanced'). We then divided the number of advanced lemmas by the total number of lemmas and multiplied the result by 100.
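Both percentages can be sketched in a few lines once a text has been lemmatized and tagged for part of speech. In the sketch below, the tag labels, the function names, and the tiny "frequent" lemma set are all hypothetical stand-ins: the study used the 2000 most common lemmas of the Corpus del Español, and the text does not say which tagger was used or whether repeated lemmas were counted once or every time (this sketch counts every occurrence).

```python
# Assumption: tags follow a Universal Dependencies-style tagset.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def lexical_density(tagged_tokens):
    """Percentage of content words among all words; input is (lemma, pos) pairs."""
    content = sum(1 for _, pos in tagged_tokens if pos in CONTENT_POS)
    return 100 * content / len(tagged_tokens)


def lexical_sophistication(lemmas, frequent_lemmas):
    """Percentage of lemmas that fall outside the 'frequent' reference list."""
    advanced = sum(1 for lemma in lemmas if lemma not in frequent_lemmas)
    return 100 * advanced / len(lemmas)


# Hypothetical lemmatized sentence: "El estudiante escribe un ensayo analítico".
tagged = [
    ("el", "DET"), ("estudiante", "NOUN"), ("escribir", "VERB"),
    ("un", "DET"), ("ensayo", "NOUN"), ("analítico", "ADJ"),
]
# Hypothetical stand-in for the 2000 most frequent lemmas in the corpus.
frequent = {"el", "un", "estudiante", "escribir"}

density = lexical_density(tagged)
sophistication = lexical_sophistication([lemma for lemma, _ in tagged], frequent)
print(round(density, 2), round(sophistication, 2))
```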

Participants
All 24 participants were second-generation HLLs attending an American university in the Midwest. They were born in the U.S. to Spanish-speaking parents, with whom they grew up hearing and speaking Spanish at home. Since they all learned English once they started school (around the age of five), they were sequential bilinguals.
Out of the 24 students who participated in the study, nine were enrolled in the heritage-only section, whereas the other 15 were recruited from the other sections of the course that enrolled both L2 and HL learners. Each student completed a portion of the written Spanish DELE test in order to assess their proficiency. This test has been widely used in previous studies with Spanish heritage speakers (e.g., Montrul and Slabakova 2003; Montrul 2004). Students in the heritage-only section obtained a mean score of 36.56 points out of 50 (SD: 5.88) on the test, whereas students in the mixed sections obtained a mean score of 35.80 points (SD: 5.71). Following the DELE's scoring protocols, this is interpreted as having an upper-intermediate level of Spanish. There were no significant differences between the two groups in terms of proficiency, as an independent samples t-test showed, t(22) = 0.31, p = 0.759.
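The reported t statistic can be recovered from the summary statistics alone. The sketch below assumes a pooled-variance (Student's) independent samples t-test, which is consistent with the reported 22 degrees of freedom; the function name is hypothetical, and the p-value is not recomputed because that requires the t distribution, which the Python standard library does not provide.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from group means, SDs, and sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Heritage-only section (n = 9) versus mixed sections (n = 15), DELE scores.
t_stat, df = pooled_t_from_summary(36.56, 5.88, 9, 35.80, 5.71, 15)
print(f"t({df}) = {t_stat:.2f}")  # t(22) = 0.31
```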
When asked about previous writing courses they had taken in college, 14 of them said they had taken one writing course in English (six students from the heritage-only section and eight students from the mixed section). The remaining 10 had not previously taken any writing courses. Given that all undergraduate students in the U.S. are required to take at least one composition course in English, this means that these 10 students had placed out of the composition requirement. None of the students reported having previously taken a writing course in Spanish.

Results
Before running any statistical analyses on the writing measures under consideration, we checked that our data were normally distributed via the Shapiro-Wilk test of normality. Likewise, we also checked for homogeneity of variances using Levene's test. The tests showed that all variables were normally distributed and presented equal variances, except for syntactic complexity, which was not normally distributed but rather positively skewed. Following Curran-Everett (2018), the data for syntactic complexity were log-transformed, resulting in a normal distribution. In Table 2, we present descriptive statistics for all writing measures by group, time, and assignment type. For each writing measure, we ran a three-factor ANOVA to determine whether there were differences between groups, time, and type of assignment. When a main effect or interaction was significant, we conducted a Tukey's post-hoc test to further investigate the source of the statistical significance.
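The effect of the log transformation on a positively skewed variable can be illustrated with a small sketch. The scores below are invented for illustration and are not the study's data, and sample skewness is used only as a standard-library stand-in for the Shapiro-Wilk test that the analysis actually relied on.

```python
import math


def skewness(values):
    """Sample skewness (third central moment over the 1.5 power of the variance)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical clauses-per-T-unit scores with a long right tail.
raw_scores = [1.2, 1.3, 1.3, 1.4, 1.5, 1.6, 1.8, 2.4, 3.1]
log_scores = [math.log(score) for score in raw_scores]

print(f"skewness before: {skewness(raw_scores):.2f}")
print(f"skewness after:  {skewness(log_scores):.2f}")
```

The transformation compresses the largest values more than the smallest ones, which pulls in the right tail; whether the result is close enough to normal still has to be checked with a formal test.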
Regarding syntactic complexity, the ANOVA showed that there were only significant differences for the variable group (F(1,88) = 14.08, p < 0.001, d = 0.78), with the students in the heritage-only section receiving lower scores overall in this measure. The same pattern was also observed for accuracy (F(1,88) = 8.41, p = 0.025, d = 0.48).
Turning now to lexical measures, the ANOVA for lexical density revealed that there were significant differences in terms of assignment type (F(1,88) = 14.01, p < 0.001, d = 0.76), with both groups receiving lower scores on the forum posts than on the essays, regardless of the section they were enrolled in. This same pattern was also observed in the case of lexical sophistication (F(1,88) = 5.84, p < 0.001, d = 2.13). Finally, two significant differences emerged from the ANOVA for lexical diversity. Thus, there were significant differences for the variable group (F(1,88) = 16.46, p < 0.001, d = −0.81) as well as for the interaction between time and type of assignment (F(1,88) = 13.41, p < 0.001). The heritage-only group seemed to have higher scores overall. Regarding the interaction effect, it seems that students in both groups received higher scores in the forum posts at the end of the semester (increase), whereas their scores for the essays decreased (Figure 1).

Discussion
This study examined the effect of writing instruction on a group of 24 HLLs, some of them enrolled in a heritage-only section (n = 9) and the remainder enrolled in mixed HL/L2 sections (n = 15). All sections were taught online and made extensive use of technology.
Our first research question asked whether one semester of writing instruction had any effect on the writing skills of HLLs in terms of syntactic complexity, accuracy, lexical density, lexical diversity, and lexical sophistication. Results showed significant improvement for HLLs only in the area of lexical diversity. While this may seem unusual and perhaps concerning, previous longitudinal studies have pointed out that writing instruction often does not lead to large improvements in the short term. For example, in Pérez-Núñez's (2015) study, HLLs who received explicit written feedback improved only in their use of one grammatical construction but did not make any gains in terms of complexity, fluency, or the accuracy of the remaining grammatical constructions under study. Similarly, Bowles and Bello-Uriarte (2019) observed a significant medium-size improvement in fluency, lexical sophistication, and syntactic complexity when controlling for the type of genre. However, there were no improvements in other areas.
Two important questions arise here: (1) Can any language improvements be observed in areas of HL writing not directly examined in the research done up to this point? and (2) How might we adapt HL writing courses to help HLLs improve on the lexical and syntactic measures analyzed in this and other studies? The first issue has already been mentioned in previous writing research (Bello-Uriarte 2019; Norris and Manchón), but it has not been addressed yet in the literature. Studies on writing instruction, be it L2 writing or HL writing, usually rely on quantitative measures, such as the ones reported in this study, and tend not to include other, more qualitative measures. It could be the case that HLLs did show gains in areas such as self-confidence or self-perceived improvement. Therefore, using mixed methods approaches to research writing outcomes could be beneficial to tap into other potential areas of improvement. As for the second question, it could be that improvements cannot be observed until later on. In other words, one-semester courses might not be long enough to have an observable impact on HLLs' writing skills using these kinds of measures. As longitudinal studies so far have not spanned longer than a few weeks, it is hard to say, but we believe that this is a fruitful area for research.
Our second question looked at whether there were differences between the two groups under study: a heritage-only section and a mixed HL/L2 section. The only differences observed were in terms of syntactic complexity, accuracy, and lexical diversity. It seems that HLLs enrolled in the heritage-only section received lower scores overall in syntactic complexity and accuracy than HLLs in the mixed section. This difference could simply be due to the fact that the first group of learners already showed lower rates of syntactic complexity and accuracy on the first assignments of the semester than the latter group. Thus, if none of the groups made any gains throughout the semester, it is expected that this difference would still persist in the last assignments. Given that both groups were comparable in terms of proficiency, we attribute this initial difference to a matter of chance. The opposite happened in the case of lexical diversity, as learners in the heritage-only section showed a higher rate of lexical diversity overall compared to learners in the mixed sections.
These results are telling, since, to our knowledge, this study is the first to compare HLLs enrolled in a heritage-only section to those in mixed HL/L2 sections. From a longitudinal perspective, the fact that there were no differences between the groups suggests that quantitatively, HLLs did equally well in both section types. However, we believe our results should be taken with caution, as our findings are based on a small sample of HLLs enrolled at one specific university in the U.S. The fact that the heritage-only section was taught asynchronously while the mixed HL/L2 sections were synchronous may also have played a role, and that was a factor we were not able to operationalize in this study, as the asynchronous section is the only heritage-only section of the course offered at the university where this study was conducted. Further research, especially longer-term studies, is needed before drawing any conclusions in this area. It would be especially interesting to compare the writing improvement as a result of instruction of HLLs in heritage-only and mixed HL/L2 courses at universities that offer multiple heritage-specific courses to see if more notable improvement emerges over time.
Another issue worth mentioning here is that we only examined the first draft of the essays produced by the learners. This means that we did not analyze whether they improved in subsequent versions after they met with their instructors for personalized feedback. It could be that the feedback received did lead them to improve their drafts differently in each group. This is a possibility that we are currently exploring.
Finally, our third question looked at whether there were any differences between the two assignment types (forum posts and essays). In this regard, we only found differences in the lexical measures examined, but not in accuracy or syntactic complexity. It seems that all HLLs, regardless of the section they were enrolled in, received lower scores on the forum posts than on the essays for lexical density and lexical sophistication; for lexical diversity, they showed the opposite pattern. We believe that lower scores in the forum posts for lexical density and lexical sophistication could be attributed to forum posts being a more informal type of assignment than essays. This informality might have made learners less inclined to stretch their lexical skills, so that they relied on frequent and less sophisticated vocabulary. As for lexical diversity, we think that exposure to Spanish throughout the semester played a key role in its development, since students in both sections read and analyzed many texts over the course of the semester. Interestingly, this improvement in lexical diversity did not extend to the essays. We believe that this might be due to the essays differing in terms of genre and to the amount of practice students had with each genre. While the forum post prompts remained quite consistent over the course of the semester (for example, asking students to paraphrase a source or describe the message of a piece of art), the two essays we analyzed differed in genre to a greater extent. The first essays in the heritage-specific section and the mixed HL/L2 sections were narrative and argumentative, respectively. In both section types, the second essay we analyzed was analytical. For many students, this was the first time they had done an in-depth analysis of a piece of art of any kind, which was the topic of the assignment.
We think that this essay might have been more demanding for students in terms of generating ideas and content, which may have led them to focus less on using a diverse vocabulary and more on simply getting their message across. Previous research has found that for L2 learners, narrative is the least complex genre, then argumentative, and then analytical (Jeong 2017; Yoon and Polio 2017). Even though previous L2 writing studies have found that genre is usually a significant variable (Bi 2020; Yoon and Polio 2017), we were not able to factor it in because there were differences between the groups, and we wanted to preserve the ecological validity of the study. We acknowledge that this is a significant limitation of our study, and future research should control for this variable to tease apart whether lexical diversity interacts with type of genre.
Since the intermediate-level HLLs in this study were already good at communicating in informal contexts and because forum posts were simply graded for completion, the forum posts might have enhanced that informality (Elola and Oskoz 2017). Thus, this type of writing assignment might not be the best to develop HLLs' writing skills at intermediate or advanced levels of proficiency and might instead benefit L2 learners to a greater extent or even HLLs at lower levels of proficiency, as they often have more difficulty writing fluently. Alternatively, informal forum post assignments could be graded for accuracy and grammaticality as well as completion to encourage students to pay more attention to these aspects of their writing. This, in turn, suggests that while using technology in the HL classroom can be very positive, online assignments should still be carefully designed to promote more formal writing that pushes students past simple informal communication.
While we did not find evidence of marked improvement using quantitative measures, the content of their final forum posts, in which students reflected on the class, revealed that learners in both section types found the course to be very valuable. In the final forum post of the semester, which was one of the two analyzed for this study, learners wrote about their experience in the course. Even though we did not conduct any qualitative analysis on these data, based on mere observation, it seems that many of those who participated in this study reported feeling that they were improving their writing skills markedly as a result of this course and that they felt more confident in their written work. This perceived improvement and confidence boost is not something to be overlooked. Attitude, self-confidence, and motivation all play key roles in language learning, and research that demonstrates how writing courses help HLLs grow in these areas would be very beneficial. As already pointed out in this discussion, future studies should also include qualitative data to examine these issues more in depth.

Conclusions
The findings of this study are interesting in that they seem to confirm what previous research in this same vein has found: instructors should not expect enormous gains or gains in all areas after just one semester of instruction. The HLLs examined in this course did not improve in measures of syntactic complexity, accuracy, lexical density, or lexical sophistication over the course of a semester of instruction, but they did in terms of lexical diversity regardless of section type (heritage-only or mixed HL/L2). On all other measures, they performed similarly at the beginning and end of the semester. There are many potential reasons for this, including the role of genre and the fact that we only analyzed the first draft of the students' essays before they received individual feedback from their instructors and worked on their second version. Although no qualitative measures were used in this study, many students did report that they found the course to be helpful and that they felt more confident in their writing as a result. Thus, it might be important to incorporate mixed methods into this type of research in the future. An additional finding that is important to consider is that informal discussion forums may not be ideal for HLLs at the intermediate or advanced level of proficiency, as they often are quite comfortable with informal communication and this assignment type may not push them to write more accurately or in a more complex fashion. Finally, our sample size for this study was quite small, and our findings should therefore be interpreted bearing that in mind. Future studies would do well to compare larger numbers of HLLs and to do so over a longer period of time. It would be especially interesting to see a similar study carried out at a university where more than one heritage-only course is offered to see how extended amounts of diverse HL-specific instruction might help students improve their writing skills.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of the University of Illinois at Urbana-Champaign (protocol #20791, approved on 1 September 2020).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to including FERPA-protected information.