The E ﬀ ect of Silence and Distance: Evidence from Recomplementation in US Heritage Spanish

: The present study investigates the architecture of heritage language grammars, as well as divergence from the baseline, by o ﬀ ering novel data. Recomplementation is deﬁned as a left-dislocated phrase sandwiched between a primary (C1) and an optional secondary (C2) complementizer, e.g., He said that 1 later in the afternoon that 2 he would clean his room . While formal syntactic-theoretical accounts align on the grammaticality of recomplementation, experimental ﬁndings suggest that the overt C2 option is associated with a decrement in acceptability. An aural acceptability judgment task and a forced-choice preference task were administered in Spanish to 15 advanced US heritage speakers of Spanish (HS) and 12 members of a Colombian Spanish baseline group. Results show that HSs do not rate the overt C2 construction with a decrement in acceptability when compared to the null one. This behavior, along with a higher proportion of overt C2 preference, diverges from the baseline. In line with the Model of Divergent Attainment, we argue that the complexity associated with silent elements and dependency distance combined with processing burden leads to a reanalysis of the linguistic phenomenon. We introduce a multiple representations proposal that accurately describes the data in question and is faithful to current syntactic-theoretical accounts.


Introduction
In a keynote article for Bilingualism: Language and Cognition (Cambridge), Polinsky and Scontras (2019) lay out a preliminary roadmap for modeling the divergent morphosyntactic properties of heritage languages. Importantly, this divergent attainment model is meant to serve as a framework for predicting and discussing the differences in language competence between heritage speakers and a relevant baseline group (i.e., ideally the source of the heritage speakers' input); it is not intended to facilitate arguments around persistent difficulty or incomplete acquisition. In the framework, the authors propose "problematic" areas of language that can be viewed as sources of divergence in heritage grammar: 1.
Ambiguity (i.e., one-to-many mappings between form and meaning) This is not an exhaustive list, as the authors acknowledge. However, these four sources offer a novel way to categorize the literature on divergent attainment in heritage language acquisition and also help to focus the impending research program. The authors submit that the roadmap can ". . . serve as a jumping-off point for further progress toward a model. In particular, it can lead to specific empirical Languages 2020, 5, 66 2 of 25 predictions about the ways in which heritage languages will (and will not) deviate from their respective baselines" (13).
With this framework in mind, the present study scrutinizes the architecture of heritage language grammars by investigating a still unexplored CP-related phenomenon: recomplementation. This is the linguistic phenomenon whereby a dislocated argument or circumstantial adjunct is sandwiched between two complementizers, as exemplified in (1): 1. Me dice que por suerte que va a tener suficiente tiempo CL.DAT says that for luck that go to have enough time 'S/he says that luckily s/he is going to have enough time.' While recomplementation is a topic of frequent investigation in formal circles (e.g., Brovetto 2002;Demonte and Fernández-Soriano 2009;Fontana 1993;González i Planas 2014;Gupton 2010;Martín-González 2002;Mascarenhas 2007;Paoli 2006;Radford 2013Radford , 2018Rizzi 2013;Villa-García 2012, 2015, very few studies have investigated the phenomenon via experimental methods (e.g., Casasanto and Sag 2008;Frank 2016;Frank and Toribio 2017). In addition, despite claims of widespread use among the dialects of present-day Spanish (e.g., Demonte and Fernández-Soriano 2009), no study to date has examined the acquisition of recomplementation constructions in heritage language populations. This is surprising given the growing body of literature that examines divergent attainment in advanced heritage speakers with respect to left-periphery related phenomena like verb-second, embedded clauses and wh-questions (e.g., Bruhn de Garavito 2002;Cuza 2013;Frank 2011, 2015;Frank 2013;Montrul 2010;Silva-Corvalán 1993;Zapata et al. 2005). The present study contributes to this line of research by offering the following: 1.
An initial investigation of the acquisition of recomplementation structures in a heritage language population 2.
Novel data in support of the growing literature arguing that the left periphery is a "vulnerable" domain 3.
Evidence in support of the Model of Divergent Attainment in heritage grammar.
In pursuing these three contributions, the work also sheds light on how the investigation of lesser studied populations with diverse profiles can and should inform existing syntactic-theoretical debates. The study begins with an overview of the linguistic phenomenon and a discussion of divergent attainment in Heritage Language Acquisition.
The TopicP account argues that the secondary complementizer (C2) marks a left-dislocation and merges in the head of TopicP. Overt C2 is contingent on the dislocated phrase (e.g., esa guitarra vieja "the old guitar"), which merges in the specifier position of the same phrase. The primary complementizer (C1), on the other hand, introduces the semantic function of the sentence and merges in the head of ForceP. The alternative doubled-ForceP analysis proposes that high and low complementizers merge in ForceP and doubled-ForceP, respectively. In other words, C2 is a reiteration of C1, reintroducing the semantic function of the sentence. The TopicP proposal is the primary account adopted in the formal literature because, as Villa-García (2015) puts it, "this analysis accounts for the facts without recourse to additional projections and without further stipulation" (72). However, the formal syntactic-theoretical literature relies heavily on data from monolingual speakers, particularly of peninsular Spanish. It is not yet known whether representations other than TopicP might better 'account for the facts' in other linguistic populations, such as US heritage speakers.
Findings from experimental and corpus-based research, though few in number, have afforded the field a deeper understanding of the why and when behind C2 lexicalization. Casasanto and Sag (2008) predicted that the low complementizer that is not licensed by the grammar but lowers processing costs in complex constructions. In order to investigate this prediction, they investigated two fixed factors-length of the left-dislocated material (one word versus seven words) and presence of the low that (null versus overt)-as represented in (6) and (7) below: 6. John reminded Mary that after he was finished with his meeting (that) his brother would be ready to leave.
7. John reminded Mary that soon (that) his brother would be ready to leave.
Results from an acceptability judgment task and a self-paced reading task that measured the reading time of the critical region brother supported the idea of a tradeoff between grammaticality and processing complexity. Specifically, multiple that constructions are less acceptable but easier to process than their equivalent single that version. Notably, these judgments are conditioned by the length of the left-dislocated (LD) material, where the overt low that is more acceptable in the 7-word condition (e.g., 6) than the 1-word condition (e.g., 7). Furthermore, in the longer LD phrase condition, participants process the critical region brother faster in the overt low that condition. The authors argue that these results support a memory-based account of resolving processing difficulty (e.g., Gibson 1998Gibson , 2000, where the low complementizer reiterates the information provided by the first and thus reduces Languages 2020, 5, 66 4 of 25 the strain on working memory when it is spelled out. The overt complementizer further indicates that the left-dislocated segment has come to an end, which might also assist in the processing of an ensuing complement. In the second study, Frank (2016) investigated the grammatical status of the low complementizer in Colombian Spanish. An aural acceptability judgment task, adopting the same scale as in the previous study, was designed to measure the acceptability of the overt low complementizer in question and statement contexts. Frank hypothesized that if Casasanto and Sag's (2008) findings apply to Spanish, then overt low complementizer constructions should be less acceptable than null ones. Furthermore, the author predicted that overt C2 question items would be rated higher than statement ones, given that C1 in indirect questions is a reportative/quotative marker, which not only permits non-ask/wonder verbs like decir 'to say/tell' to select for an indirect question but also is helpful in disambiguating a semantically ambiguous wh-complement (for a review see Frank 2011, 2015). For the reader's convenience, question and statement test items from Frank (2016) are reproduced in (8)  Results support the notion that the grammatical status of recomplementation is similar in English and Spanish. Specifically, the overt low complementizer in Spanish was indeed associated with a decrement in acceptability judgment, a result that was robust across statement and question constructions. Furthermore, no main effect was found for sentence type, which does not support the hypothesis that the type of dependency relationship between the complementizer and its complement influences acceptability ratings. Future experimental research is required to more precisely determine whether Spanish and English are equally predisposed to accept embedded complementizers. Variables of interest include but are not limited to the matrix verb, the category of the left-dislocate, as well as the length of the material sandwiched between C1 and C2. For a comprehensive formal syntactic-theoretical account, as well as an abundance of attested examples, see Villa-García's (2019) unified proposal.
Finally, Echeverría and Seoane (2019) created a corpus of 124 recomplementation instances in the 14th-century Spanish written text El conde Lucanor. Their analysis found that the length of the left-dislocated material was a significant predictor of C2 lexicalization. Specifically, of the instances with short intervener length (i.e., 1-3 words), only 6/61 (10%) display overt C2. On the other hand when intervener length was long (i.e., 4 or more words), 43/63 (68%) of the items display overt C2. See (10) and (11) for short and long intervener examples, respectively. This length effect has also been documented in present-day English. According to Radford (2018), the average number of words sandwiched between a high and low complementizer in his broadcast English corpus is 5.9. Importantly, more corpus-based research is needed to determine an equivalency between modern Spanish and English.
10. que algunos otros Ø non ayan envidia dellos 'that some others don't envy them' 11. entendiendo que pues todo fincava en su poder, que podría obrar en ello como quisiese 'understanding that because everything laid in his power, that he could act in it as he wished' Echeverría and López Seoane also found that mood, namely the subjunctive as opposed to the indicative, was a significant predictor of C2 lexicalization, adding credence to the hypothesis that the type of dependency relationship between the complementizer and its complement is a relevant factor in predicting C2 lexicalization (Frank 2016). The authors conclude that conventional patterns of C2 usage can be predicted along probabilistic constraints rather than categorical rules.
In summary, the experimental evidence on recomplementation offers a more complicated story of C2 expression than the optionality proposed by syntactic-theoretical accounts. Casasanto and Sag (2008) argue that C2 lexicalization is associated with a grammatical violation that is overridden by the benefit it brings to real-time sentence processing. Frank (2016) does not go as far as to claim that overt C2 is ungrammatical but does provide evidence that it is associated with a decrement in acceptability when compared to the null counterpart. Finally, Echeverría and Seoane (2019) adopt a usage-based account and ignore the question of grammaticality altogether. They note that C2 lexicalization occurs in written contexts and its occurrence can be predicted probabilistically along constraints like length of dislocated material and mood. These three studies provide the field with a deeper understanding of why C2 is overt in some contexts and not in others. It remains to be seen whether attested monolingual C2 expression patterns hold in heritage speaker populations or whether the extra burden of holding two languages in parallel while communicating in one's less dominant language triggers divergent outcomes. If outcomes diverge, then we propose a multiple representations account of recomplementation. This account is further discussed in Sections 1.2 and 4. The following section reviews some of the sources of divergence in heritage grammar and concludes by framing the acquisition of recomplementation within the model of divergent attainment.

Divergent Attainment in US Heritage Spanish
Heritage speakers of Spanish have been shown to diverge from the monolingual norm across several grammatical properties, including but not limited to subject-verb inversion (e.g., Cuza 2016), number and gender agreement (e.g., Cuza and Pérez-Tattam 2016;Scontras et al. 2018), pro-drop (e.g., Montrul 2002Montrul , 2004Montrul , 2008Montrul , 2016, mood selection (e.g., Giancaspro 2017; Perez-Cortes 2016), and clitic expression (e.g., Cuza et al. 2013;Montrul 2010), as well as higher structural projections in the C-domain more generally (e.g., Bruhn de Garavito 2002;Cuza 2013;Frank 2011, 2015;Montrul 2010;Silva-Corvalán 1993). C-domain or left periphery phenomena like recomplementation or verb-second, embedded clauses and wh-questions are arguably more prone to divergence due to the complexity associated with the interface between syntax and pragmatics (e.g., Sorace 2011). Only a few studies have investigated the acquisition of the complementizer que 'that' in US heritage Spanish specifically. For example, on the topic of argument clauses, Silva-Corvalán (1993) found several examples of null que in Los Angeles heritage speakers, as in (12): 12. Yo creo Ø inventaron el nombre.
'I think (that) they invented the name.' Because the null complementizer is perfectly acceptable in the English equivalent, cross-linguistic influence (CLI) effects in the direction of the minority language may be in play. Critically, no examples of que omission in relative clauses were found in the dataset, where omission is ungrammatical in Spanish but grammatical in English. Thus, the minority language appears to be susceptible to the effects of CLI when the surface structure of the two languages overlaps (e.g., Müller and Hulk 2001;Yip and Matthews 2009). This accounts for the observed que omission in Spanish argument clauses and lack thereof in relative clauses. Frank (2011, 2015) ask whether late second language learners and heritage speakers of Spanish of comparable high proficiency acquire the features that regulate the representation of simple indirect constructions, in which the overt complementizer is argued to be obligatory: Languages 2020, 5, 66 6 of 25 13. Me dijo (que)* cuándo iban a salir.
'He asked when they were going to leave.' Data collected from an elicited production task, an acceptability judgment task and a forced-choice preference task suggest that both bilingual groups produce and accept the null que condition in contexts that require a question interpretation. However, heritage speakers outperform second language learners, as demonstrated by greater overt complementizer production, higher acceptability rating of overt complementizer items, and a preference for the overt item. For example, when forced to choose between an overt and a null complementizer option, second language learners overwhelmingly prefer the null condition, while 10/17 heritage speakers prefer the overt complementizer condition. While there is a certain level of individual variation within the heritage speaker group, when compared to the baseline group, heritage speakers overall produce, accept and prefer null que constructions at a higher rate.
In summary, speakers of Spanish as a heritage language diverge from the monolingual norm across several grammatical properties. The few studies that have investigated the acquisition of the complementizer que 'that' in US heritage Spanish support the hypothesis that left periphery phenomena are a vulnerable domain. As discussed in the introduction, Polinsky and Scontras (2019) lay out a preliminary framework for modeling the divergent morphosyntactic properties of heritage languages. With the proposed model, their goal is to accurately predict divergent outcomes in heritage language competence. They organize existing literature on divergent attainment along four intersecting sources of divergence as well as two specific triggers.
The first source of divergence-the morphology problem-pertains to number and gender agreement and overmarking (e.g., past tense sorteded instead of sorted) or overregularization (e.g., past tense bringed instead of brought) (e.g., Polinsky 2018; Scontras et al. 2018). The distance problem speaks to the challenges associated with long distance dependencies (e.g., antecedent-gap, anaphor binding, agreement, left-dislocation) (e.g., Kim et al. 2010;Polinsky 2011). The general outcome is a preference for local over non-local dependency even when this results in non-target performance. The third type of divergence is manifested as the silent problem, which refers to the challenges associated with the interpretation of null elements. For example, while Spanish is a pro-drop language, heritage speakers have been shown to prefer and overuse overt pronouns when compared to a relevant baseline group (e.g., de Prada Pérez 2009;Montrul 2016;Silva-Corvalán 1994). Further, as discussed earlier, Frank (2011, 2015) found that heritage speakers produce and accept the nontarget-like null que condition in contexts that require a question interpretation (see example 12). This suggests that silent material can be the source of reanalysis or restructuring of interpretive possibilities. Finally, the fourth source references the complexities associated with one-to-many mapping between form and meaning. Take for example scope ambiguity. Scontras et al. (2017) found that English-dominant heritage speakers of Chinese only allow surface interpretations of doubly-quantified sentences like A shark attacked every pirate in Chinese (target performance) and English (nontarget performance). These four problems help to focus the future research program that will follow the model.
The two proposed triggers of divergent attainment are (i) quantity and quality of input and (ii) demands on processing and memory. Much has been said on the differing experience between heritage speakers and relevant baseline groups (see Unsworth 2016 for a review). A typical Spanish heritage speaker growing up in the United States may acquire Spanish as their first language in the home. But at school age, they enter an education system and society where English is the dominant language. Their quantity of input in Spanish is greatly reduced at this point. Over time, with an increased use of and exposure to English in school, social and work settings, the minority language becomes less dominant than the majority one. This experience stands in stark contrast to that of monolingual Spanish speakers. What is more, heritage speakers have fewer speaking partners in Spanish (e.g., immediate family, extended family, neighbors) as compared to their majority language partners. Importantly, this is not to say that the input from Spanish varieties that exhibit contact-induced changes or signs of attrition is less legitimate (e.g., Pascual y Cabo and Rothman 2012). Rather, as Polinsky and Scontras (2019) note, ". . . increased exposure to the heritage language will only get heritage speakers so far; they also need exposure from a variety of sources" (11). Thus, one trigger for divergent attainment is the interrelated dimension of quantity and quality of input.
Processing pressure presents a second trigger for divergent attainment. We know from psycholinguistic research on monolingual populations that our online processing resources are limited and some areas of language comprehension and production test these limitations more than others. Examples include but are not limited to dependencies at a distance, the recovery of missing information, and unexpected material (e.g., Bailey and Ferreira 2003;Gibson 1998Gibson , 2000Hale 2001;Levy 2008). These areas of high cognitive demand should be particularly difficult for the heritage speaker, who must maintain two grammars in parallel, while communicating in their less dominant language (e.g., Montrul 2016; Keating et al. 2016;Polinsky and Scontras 2019;Sánchez 2019). For example, Sánchez (2019) proposes a bilingual alignments hypothesis, where co-activation of stored information from different language components is particularly costly. These so-called 'permeable' alignments are possible across all levels of proficiency, though more likely at lower levels.
The phenomenon of recomplementation is a fine candidate for the development of the Model of Divergent Attainment. First, unlike core aspects such as agreement, the phenomenon is not reinforced in school. Second, potential divergent behavior in Spanish-English bilinguals cannot be fully accounted for by cross-linguistic influence effects. As mentioned earlier, previous research argues that both Spanish and English monolinguals find the null complementizer more acceptable (Casasanto and Sag 2008;Frank 2016) and a single syntactic-theoretical analysis can account for the phenomenon cross-linguistically (Villa-García 2019). Finally, it fits neatly into the model's constrained research program, particularly so if one adopts the doubled-ForceP account. Specifically, recomplementation exemplifies the intersection of the silence and distance problems. A null C2 entails that the semantic function of the complement must be interpreted and the information must be retrieved from the C1, which is separated by the dislocated material.
In summary, in Section 1.1, we showed that the TopicP proposal of recomplementation is the primary syntactic-theoretical account adopted in the formal literature. In the present section, we argued why the alternative DoubledForceP account better represents the architecture of heritage language grammar. This divergent representation is motivated by the linguistic complexity of recomplementation associated with silent elements and distance dependency. This, along with processing burden, leads to reinterpretation of the input and eventual divergent attainment. With this multiple representations proposal in mind, the following section introduces the specific research questions that drive the remainder of the study.

Research Questions and Predictions
The present study is an initial investigation of the acquisition of recomplementation structures in a heritage language population. It elaborates on the Model of Divergent Attainment (Polinsky and Scontras 2019) by asking the following research questions:

1.
Do advanced heritage speakers accept the null C2 construction at a higher rate than the overt C2 option? Does language use or proficiency predict this outcome? 2.
Do advanced heritage speakers prefer the null C2 construction at a higher rate than the overt C2 option? Does language use or proficiency predict this outcome? 3.
With respect to (1) and (2), do advanced heritage speakers diverge from the baseline group?
With respect to (1) and (2), we predict that advanced speakers of Spanish as a heritage language will accept and prefer the overt C2 construction at a higher rate when compared to the null variety. Recomplementation as the intersection of silence and distance problems combined with the extra burden of holding two languages in parallel while communicating in one's less dominant language will drive this effect (e.g., Sánchez 2019). Specifically, according to the DoubledForceP account, a lexicalized C2 reintroduces the force or the semantic function of the complement. When C2 is not spelled out, the relevant semantic information must be interpreted or retrieved, which increases the burden on processing resources. We further predict that language use and proficiency will be correlated with rate of acceptability and proportion of preference of the overt C2 option. Specifically, heritage speakers with higher rates of language use and higher levels of proficiency will have more available resources for storage and retrieval of information and thus favor the overt variety.
With regard to (3), we anticipate that the test group's performance across the acceptability judgment and preference tasks will not pattern the control group's behavior. Specifically, the null C2 variety will be associated with a decrement in acceptability judgment in the former group when compared to the latter. Further, the heritage speaker group will prefer the overt C2 option at a significantly higher rate when compared to the baseline group. Our first point of evidence comes from previous research which has shown that the overt C2 construction is associated with a decrement in acceptability judgment in a monolingual Spanish baseline group (Frank 2016). Secondly, while a lexicalized C2 does under certain conditions bring a benefit to the relevant baseline group in real-time sentence processing, the conditions are not met for C2 lexicalization (e.g., long intervener length). Further, the offline measures are meant to serve as a window into language competence, not sentence processing. The predicted differential outcomes for the two groups support earlier research that has attested to the vulnerability of CP-related phenomena in bilingual populations. The prediction also supports a multiple representations account of recomplementation, where the TopicP and the DoubledForceP accounts pertain to the monolingual and bilingual groups, respectively. That is to say, the silence and distance problems along with the extra burden on processing leads to a reanalysis of the linguistic phenomenon and eventual divergent attainment (i.e., different representations) in heritage grammar.

Participants
In order to test these predictions, a total of 27 participants took part in the present study. The participants were divided into two groups, a baseline or control group of native speakers of Colombian Spanish (n = 12) and a US heritage Spanish test group (n = 15). The former group was recruited through word of mouth with the support of local contacts in Bogotá, Colombia. They were all residents of Bogotá at the time of testing and had never lived in a country where a language other than Spanish was the primary language of society and education. Their ages ranged from 18 to 35 (M = 22, SD = 5.4) and they were all at minimum high school educated, with the majority having attended college (8/12). The majority of the control group were students, while other professional industries included engineering, logistics and music. The test group was recruited from a large public university in the southwestern United States. The participants' birthplace was the United States, with one exception (Mexico). They were all raised by native Spanish speaking parents who spoke to them in either Spanish (8/15) or Spanish and English (7/15) in childhood. Their primary language of instruction from primary school through college was English. Lastly, their ages ranged from 18 to 23 (M = 20, SD = 1.5).
Given that recomplementation is not thought to be a source of dialectal variation, we adopted a control group that does not directly match the input of the heritage speaker group. Beyond the dialect difference, the control group may also differ from the baseline variety the HSs would have been exposed to in terms of English language history. Whereas the baseline variety may have learned English as an adult upon immigration, 4/12 participants in the control group studied English to some degree in a high school and/or university setting. The incongruency between the control group and the traditional baseline variety, along with the possible inflation of type II error owing to a low number of participants, constitute two of the study's limitations. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Office of Research at the University of Texas at Austin (IRB protocol #2015-04-0048). The last portion of the language history questionnaire included a brief proficiency self-assessment in both Spanish and English, which also facilitated a language dominance calculation. Ratings along a four-point scale (i.e., 1 "basic", 2 "adequate", 3 "good", 4 "excellent") were elicited across four modes: reading, writing, speaking and comprehension. As reported in Table 1, after collapsing the four modes, the test group's mean for English is greater than Spanish, M = 3.83 and M = 2.97, respectively. Importantly, it is also true, with one exception, that each participant was English dominant. The one exceptional participant self-rated as balanced. Contrastingly, the baseline's mean for Spanish is greater than English (M = 3.79 and M = 1.79, respectively) and without exception, each participant was Spanish dominant. A standardized dominance test was not administered since this variable was only a secondary and exploratory one and the experimental session was already timed at one hour. Further, the calculation of a dominance coefficient by subtracting a proficiency assessment in language 1 from a proficiency assessment in language 2 is common practice (e.g., Cuza et al. 2019). For a summary of the profile of each group, see Table 1.  In addition to the language history questionnaire, all participants completed an adapted version of the DELE (Diploma of Spanish as a Foreign Language) proficiency test (e.g., Montrul and Slabakova 2003). As demonstrated in Table 1 above, the control group scores were in the range of 39-47 (M = 43, SD = 2.7) and the test group was in the range of 30-44 (M = 37, SD = 4.9) out of a possible 50 items. An independent-sample t-test showed a significant difference between the baseline and test group proficiency results, (t(24) = −1.882, p < 0.001). Specifically, the native speakers of Colombian Spanish outperformed the US heritage Spanish test group.
Lastly, for the test group, a proportion of current Spanish language use was calculated. Participants were asked how often they used Spanish in the following four contexts: school, home, work and social situations. Their responses were recorded along the scale of 0 "English only", 1 "mainly English", 2 "a little more English", 3 "both equally", 4 "a little more Spanish", 5 "mainly Spanish", 6 "Spanish only". Individual responses were then divided by 6 (Spanish only) to calculate a proportion of Spanish use, with a possible range of 0 "English only" to 1 "Spanish only". As demonstrated in Table 2, the overall proportion of Spanish language use is M = 0.24 (SD = 0.13) or one quarter of total language use.
Table 2 also shows that the imbalance between Spanish and English is particularly driven by school, work and social contexts. Interestingly, the proportion of Spanish used in the home is M = 0.47 (SD = 0.32), suggesting nearly equal usage of Spanish and English. This is perhaps not surprising given the students are college-aged and "home" still refers to their family home, where they were raised with either Spanish or Spanish and English.

Experimental Tasks
The study itself consisted of two experiments, an aural acceptability judgment task (AJT) and a written forced-choice preference task. Two offline judgment tasks were adopted in order to provide a more comprehensive window into the participant's underlying representations. Whereas an AJT assumes that a decrement in acceptability judgment is a devaluation of grammatical status, a preference task makes no such claim. As noted in Cuza and Frank (2015), on a preference task where " . . . the two structures are presented together, the internal representation of the speakers is more activated as they can compare the two structures and choose the one that is more preferable to them" (20). Further, supplementing reading/writing tests with aural/oral ones, or avoiding the written medium altogether, has been argued to be critical when eliciting data from heritage language communities. Written tasks underrepresent the overall performance abilities of heritage speakers (e.g., Bowles 2011; Cuza and Frank 2015;Cuza 2013;Potowski et al. 2009). For example, within the same study, heritage speakers have been outperformed by second language learners in written tasks and then outperformed their counterparts in verbal ones (e.g., Alarcón 2011; Montrul 2011; Montrul et al. 2008).
The aural AJT was designed on a professional package of the Weebly web-hosting service. All testing instructions and tokens were reviewed and read by a naïve native Spanish speaker and university-level Spanish instructor, whose voice was recorded and then edited on version 2.0.3 of Audacity ® software (Audacity Team 2019). The recorded instructions explained that each test item contained three sections: a preamble, a question, and a response to the question. After listening to all three parts, the listener's task was to determine whether the response was well formed, using the scale 1-totally acceptable to 7-totally unacceptable, adopting Casasanto and Sag's (2008)  As demonstrated in (14) and (15), the preamble represents a clitic left-dislocated (CLLD) statement and question, respectively, establishing the argument (e.g., el traje, el dibujo) as having been previously mentioned in the discourse. The left-dislocated segment is specifically composed of a demonstrative adjective, a noun, and an adjective that modifies the noun, and is followed by an informal future expression with the clitic attached to the infinitive verb. The preamble is followed by the question, which remains consistent across all test items. The third and final section pertains to the response, which is the recomplementation test item that participants are to judge on the aforementioned ordinal scale. Test items include embedded statements and questions with a variety of sandwiched arguments previously mentioned in the discourse. The material that intervenes between C1 and optional C2 is controlled at three words. Finally, for the question condition only, the locative or the temporal adjunct wh-words dónde 'where' and cuándo 'when', respectively, introduce the question complement.
The aural AJT was designed to measure the acceptability of the null versus the overt low complementizer in both question and statement contexts. It is composed of 6 statements with null secondary complementizer items, 6 statements with overt secondary complementizer items, 6 questions with null secondary complementizer items, and 6 questions with overt secondary complementizer items for a total of (n = 24) tokens (see Appendix A). These test tokens were scrambled with 48 distractor items (of both question and statement varieties investigating unrelated phenomena) so as to ensure that no two identical conditions appeared consecutively.
In the second experiment, the paper and pencil forced-choice preference task, participants were directed to read a short preamble and then select the preferred one of two available continuation statements, see (16) and (17) below: 16. Preamble: Ayer Miguel tuvo que recordarme de la chaqueta que habíamos visto.
'Miguel asked me again when I was going to buy that jacket.'.
17. Preamble: Ayer Leonardo tuvo que recordarme del folleto que creamos la semana pasada. 'Yesterday Leonardo reminded me of the flyer that we created last week. ________ Leonardo me dijo que ese folleto, iba a distribuirlo en el centro. ________ Leonardo me dijo que ese folleto, que iba a distribuirlo en el centro.
'Leonardo told me that he was going to distribute that flyer downtown.' As in the AJT, recomplementation test items include embedded question and statement sentence types for a total of 16 tokens (see Appendix A). Test items were composed of a variety of left-dislocated topics that are previously mentioned in the discourse and the material that intervenes between C1 and optional C2 is controlled at two words: a demonstrative adjective and a noun. In all test items, we include the comma punctuation after the dislocated phrase. The comma has the effect of representing the intonational break or prosodic boundary in written form and is in line with Villa-García (2019) who uses commas in his examples. Test items were scrambled with 24 distractors and the two choices within each token were then counterbalanced in order to avoid a bias for selecting the first available option. Given the forced-choice design of the experiment, the dependent measure was binary, where a preference for the null option was coded with a score of 0 and a preference for the overt option received a score of 1.

Acceptability Judgment Task
The baseline and test group results from the aural AJT are displayed in Figures 1 and 2, respectively. Each column represents the mean acceptability rating with standard error bar for each of the four test conditions: null C2 questions, null C2 statements, overt C2 questions and overt C2 statements. Recall that each condition was made up of six test items for a total of (n = 24) tokens. Further, the acceptability rating scale is a seven-point scale from 1-totally acceptability to 7-totally unacceptable. That is to say, the shorter the column, the more acceptable the condition. The scale has been truncated on the y-axis to better fit the data.
respectively. Each column represents the mean acceptability rating with standard error bar for each of the four test conditions: null C2 questions, null C2 statements, overt C2 questions and overt C2 statements. Recall that each condition was made up of six test items for a total of (n = 24) tokens. Further, the acceptability rating scale is a seven-point scale from 1-totally acceptability to 7-totally unacceptable. That is to say, the shorter the column, the more acceptable the condition. The scale has been truncated on the y-axis to better fit the data.  The baseline group results from Figure 1 suggest that the null C2 construction, independent of sentence type, is judged as more acceptable than the overt C2 construction. This behavior appears to contrast with the behavior of the test group. Specifically, Figure 2 shows similar behavior across all conditions for the test group. Importantly, given the 1-totally acceptable to 7-totally unacceptable rating scale, all the experimental items, even the ones associated with a decrement in acceptability judgment, fall within the acceptable range (less than 3.5) in both groups. Thus, we cannot claim that any of the sentence types are ungrammatical. Table 3 summarizes these results. of the four test conditions: null C2 questions, null C2 statements, overt C2 questions and overt C2 statements. Recall that each condition was made up of six test items for a total of (n = 24) tokens. Further, the acceptability rating scale is a seven-point scale from 1-totally acceptability to 7-totally unacceptable. That is to say, the shorter the column, the more acceptable the condition. The scale has been truncated on the y-axis to better fit the data.  The baseline group results from Figure 1 suggest that the null C2 construction, independent of sentence type, is judged as more acceptable than the overt C2 construction. This behavior appears to contrast with the behavior of the test group. Specifically, Figure 2 shows similar behavior across all conditions for the test group. Importantly, given the 1-totally acceptable to 7-totally unacceptable rating scale, all the experimental items, even the ones associated with a decrement in acceptability judgment, fall within the acceptable range (less than 3.5) in both groups. Thus, we cannot claim that any of the sentence types are ungrammatical. Table 3 summarizes these results. The baseline group results from Figure 1 suggest that the null C2 construction, independent of sentence type, is judged as more acceptable than the overt C2 construction. This behavior appears to contrast with the behavior of the test group. Specifically, Figure 2 shows similar behavior across all conditions for the test group. Importantly, given the 1-totally acceptable to 7-totally unacceptable rating scale, all the experimental items, even the ones associated with a decrement in acceptability judgment, fall within the acceptable range (less than 3.5) in both groups. Thus, we cannot claim that any of the sentence types are ungrammatical. Table 3 summarizes these results. In order to investigate these descriptive statistics and to shed light on (RQ1) and (RQ3), a logistic mixed effects model for ordinal data was run with the CLMM (cumulative link mixed model) function in the R ordinal library (Christensen 2015; R Core Team 2017). The model defined three fixed effects-group (Colombian or heritage speaker), type (question or statement) and C2 (null or overt)-four interactions (group*type, group*comp, type*comp, group*type*comp) and one random intercept for subject. Both group (β = 2.236, z = 3.331, p < 0.001) and C2 (β = 1.339, z = 4.045, p < 0.001) were significant, along with the interaction of group*C2 (β = −1.307, z = −3.044, p = 0.002). Type did not reach significance (β = −0.563, z = −1.626, p = 0.104). To further explore the interaction, a post hoc pairwise comparison with Bonferroni adjustment was run. The analysis showed that the effect of C2 was significant in the Colombian baseline group (β = −1.308, z = −5.939, p < 0.001) and not the heritage speaker test group (β = −0.093, z = −0.498, p = 0.619). In summary, the results demonstrate that heritage speakers do not accept the null C2 construction at a higher rate than the overt C2 option, partially confirming what we predicted in (RQ1). The confirmation is only partial because rather than accepting the overt C2 construction at a higher rate, the effect of C2 was not significant. The results also demonstrate that the test group's behavior diverges from the baseline group, who rates the overt C2 variety with a decrement in acceptability, confirming our prediction of divergent performance in (RQ3).
In an attempt to further analyze the test group's divergent behavior and individual variation, three correlations were run with data from the participant profile (Tables 1 and 2). We investigated whether Spanish proficiency (results from DELE exam), degree of English dominance or Spanish language use predicted the outcome of the null C2 acceptability ratings, where the divergence is most salient (see Figures 1 and 2). We hypothesized that Spanish proficiency, dominance and language use as proxies for experience might predict the test group's divergent behavior and be correlated with the acceptance of the null C2 variety. However, results returned weak and insignificant correlations between the acceptability ratings and proficiency (r = 0.105, p = 0.708), dominance (r = −0.022, p = 0.937) and language use (r = −0.031, p = 0.914). Importantly, the present study does not set as its primary goal to investigate heritage speakers of varying levels of Spanish experience and proficiency. On the contrary, heritage speakers on the higher end of the proficiency distribution are singled out to investigate whether divergent representation is eventually attained. As a result, we cannot entirely rule out the possibility of an experience or proficiency-related effect. In fact, in the following section which reviews the results of the preference task, we see that language use does predict divergent behavior. Specifically, HSs with greater exposure to Spanish prefer the null C2 construction. A follow-up study should include proficiency and language use in the primary experimental design. What is more, as noted by one reviewer, future research might also investigate the effect of family language planning, i.e., being raised in an exclusively Spanish-speaking family versus a Spanish-English speaking family.
A final grouping analysis was adopted to determine whether the descriptive and statistical analyses above were also observed at the individual level. In this analysis, participants were divided into five distinct groups, depending on (1) whether they rated the null or the overt variety as more acceptable and (2) the degree by which they did so. Specifically, those who favored the null over the overt variety by a difference of less than 1 on the 1-7 rating scale, were placed in the "null low" group. If the difference was greater than 1, then they were placed in the "null high" group. The same divisions were applied to those who favored the overt variety. Finally, if the difference was equal to zero, they were placed in the "equal" group. Sentence type (question or statement) was collapsed since the effect was not significant. The results of this analysis for Colombian (baseline) and US heritage (test) groups are represented in Table 4. As demonstrated above, the majority of the baseline Colombian participants rate the null variety as more acceptable, 8/12 (67%). Furthermore, the majority of the Colombian participants who find the null varieties more acceptable fall into the "high" category. Curiously, this grouping analysis shows us that the heritage speaker participants also rate the null variety as more acceptable by a small majority, 8/15 (53%). However, Table 4 suggests a much more complicated story for the test group.
At first glance, the data appears to be distributed bimodally, with 8/15 (53%) participants accepting the null variety at a higher rate and 6/15 (40%) accepting the overt variety at a higher rate. Upon further inspection, nearly all the test group participants are housed in the "low" category. This suggests that despite the fact that there is only one US heritage participant in the "equal" category, these participants do not find the overt or the null varieties much more acceptable than one another.

Preference Task
In order to supplement the AJT findings, participants also completed a forced-choice preference task. Recall that participants are instructed to select the preferred one of two available options (null and overt C2 varieties). Thus, the dependent measure was binary, where a preference for the null option was coded with a score of 0 and a preference for the overt option received a score of 1. Figure 3 below depicts the proportion of overt C2 options that were preferred, where any value below 0.50 entails null C2 preference.
As demonstrated above, the majority of the baseline Colombian participants rate the null variety as more acceptable, 8/12 (67%). Furthermore, the majority of the Colombian participants who find the null varieties more acceptable fall into the "high" category. Curiously, this grouping analysis shows us that the heritage speaker participants also rate the null variety as more acceptable by a small majority, 8/15 (53%). However, Table 4 suggests a much more complicated story for the test group. At first glance, the data appears to be distributed bimodally, with 8/15 (53%) participants accepting the null variety at a higher rate and 6/15 (40%) accepting the overt variety at a higher rate. Upon further inspection, nearly all the test group participants are housed in the "low" category. This suggests that despite the fact that there is only one US heritage participant in the "equal" category, these participants do not find the overt or the null varieties much more acceptable than one another.

Preference Task
In order to supplement the AJT findings, participants also completed a forced-choice preference task. Recall that participants are instructed to select the preferred one of two available options (null and overt C2 varieties). Thus, the dependent measure was binary, where a preference for the null option was coded with a score of 0 and a preference for the overt option received a score of 1. Figure  3 below depicts the proportion of overt C2 options that were preferred, where any value below 0.50 entails null C2 preference. In Figure 3, the baseline group demonstrates a strong preference for null option that is robust across question and statement conditions, 0.16 and 0.13, respectively. Contrastingly, the test group not only demonstrates a marginal preference for the overt C2 in question items, but this behavior appears to differ with respect to a moderate preference for the null C2 in statement items. Table 5 summarizes these results.

Question Statement
Baseline In Figure 3, the baseline group demonstrates a strong preference for null option that is robust across question and statement conditions, 0.16 and 0.13, respectively. Contrastingly, the test group not only demonstrates a marginal preference for the overt C2 in question items, but this behavior appears to differ with respect to a moderate preference for the null C2 in statement items. Table 5 summarizes these results. In order to elaborate on the preliminary findings in Table 5 and to shed light on (RQ2) and (RQ3), a generalized linear mixed effects model was run with the GLMER function in R (R Core Team 2017). The model defined two fixed effects, group (Colombian or heritage speaker) and type (question or statement), an interaction for group*type and a random intercept for subject. Group was found to be significant (β= 2.714, z = 2.833, p = 0.004), while type was not (β = −0.328, z = −0.699, p = 0.485). The interaction of the two was marginally significant (β = −1.156, z = −1.934, p = 0.053). In sum, the significant effect of group tells us that the heritage speaker group's proportion of overt C2 preference is significantly greater than the baseline group's, suggesting divergent behavior and confirming our expectations for (RQ3). Preliminary results do however point to the test group's marginal overall preference for the null variety when both sentence types are averaged together (0.43). Thus (RQ2), which asks whether heritage speakers prefer the null variety over the overt variety, deserves further scrutiny.
To explore the marginal interaction between group and type, we ran a post hoc pairwise comparison for all combinations of group and type with Bonferroni adjustment. This analysis shows that the effect of type is significant in the test group (β = 1.484, z = 4.001, p < 0.001) and not in the baseline group (β = 0.328, z = 0.699, p = 0.485). This curious effect can be explained if we consider the complementizer in embedded questions as being helpful in disambiguating a semantically ambiguous wh-complement (for a review see Frank 2011, 2015). The problem of distance introduced by left-dislocated material intervening between C1 and optional C2 can make the lexicalization of C2 all the more helpful.
With data from the language history questionnaire, a complementary correlational analysis was then conducted to investigate a potential effect of language experience. The analysis returned two moderate and statistically significant correlations with language use as a predictor, while the remaining correlations with the proficiency and dominance predictors were weak and insignificant. Specifically, the test group's proportion of Spanish language use predicted both the proportion of overt C2 preference in questions (r = 0.525, p = 0.045) and the overt C2 preference overall (r = 0.595, p = 0.019). This means that heritage speaker participants who use Spanish more often in the school, work, home and social contexts prefer the overt C2 items at a higher rate in question and combined question and statement conditions. Prima facie it is surprising that Spanish language use is positively correlated with behavior that is less baseline-like. However, this finding is entirely compatible with a divergent attainment and multiple representations account. Specifically, the silence and distance complexities along with the extra processing burden of holding two languages in parallel while communicating in the less dominant one leads to a reanalysis of the linguistic phenomenon. Eventually, a heritage speaker with sufficient Spanish exposure might develop a divergent representation of recomplementation-one that better accounts for the benefit of overt C2 (see DoubledForceP account in Sections 1.1 and 1.2, as well as Section 4 for continued discussion).
One last grouping analysis was designed to determine whether the descriptive and statistical analyses above were also observed at the individual level. Specifically, participants were grouped into three categories: those who preferred the null C2 variety (proportion of overt C2 preference between 0.0 and 0.49), those who preferred the overt C2 variety (proportion of overt C2 preference between 0.51 and 1.00) and those who displayed no preference (proportion of overt C2 preference equals 0.50). Type (question or statement) was not collapsed as it was for the AJT, given the effect proved significant in the preference task.
As demonstrated by Table 6, the baseline participants prefer the null C2 variety over the overt one. This result is robust across both question (10/12, 83%) and statement types (12/12, 100%). Contrastingly, individual variation is high in the test group. The heritage speaker group roughly patterns with the baseline group in the statement condition, with the majority of participants demonstrating null C2 preference (11/15, 73% vs. 4/15, 27%). Still, it is important to note that nearly 1 out of 3 participants do prefer the overt C2 in the statement condition. In the question condition, the test group participants vary considerably: 7/15 (47%) prefer the overt option; 5/15 (33%) prefer the null option; and 3/15 (20%) have no preference. High individual variation appears to be the most accurate conclusion. In sum, the heritage speaker group prefers overt C2 at a significantly higher rate than the baseline group. Further, their proportion of overt C2 preference is significantly greater in questions as compared to statements. Importantly, the individual grouping analysis in Table 6 complicates this narrative by showing high individual variation. Curiously, the effect of type (statement vs. question) found in the preference task was not replicated in the AJT. The preference task was seemingly more sensitive to this divergent outcome.

Discussion
The present study offered an initial exploration of recomplementation in advanced speakers of Spanish as a heritage language. We adopted an aural version of an acceptability judgment task so as not to underrepresent the overall performance abilities of this population. This task was supplemented by a written forced-choice preference task. Together these offline tasks were assumed to serve as a window into heritage speaker knowledge of the secondary complementizer in statement and question contexts. Data analyzed at the group and individual level shed light on whether heritage speakers (1) accept and (2) prefer the null C2 construction at a higher rate than the overt C2 one and (3) whether their behavior diverged from a relevant baseline group. We predicted that the test group would accept and prefer the overt C2 at a higher rate and that this would diverge from the control group. These predictions were primarily motivated by previous research on the vulnerability of CP-related phenomena in bilingual populations, sources of divergence that pertain to null elements and distance dependencies, and evidence of C2 expression constraints in the relevant baseline group.
Findings from the AJT show that heritage speakers do not accept the null C2 construction at a higher rate than the overt C2 option. Prima facie, this supports our expectations. However, heritage speakers did not significantly accept the overt C2 at a higher rate either, though we cannot discount the possibility of type II error. Rather, there was no significant effect for C2. This result is supported by the individual analysis, where what appears to be a bimodal distribution at first glance-8/15 rate the null variety higher vs. 6/15 the overt variety-turns out to be a bit misleading. That is to say, the amount that tips the scale for nearly all participants is "low". Thus, we can conclude that the heritage group does not find the overt or the null varieties much more acceptable than one another. Importantly, no significant effect of language use or proficiency was found. Still, in line with our expectations, this behavior does diverge from the baseline group, which accepts the null variety at a significantly higher rate (see Figures 1 and 2). Individual analysis supports this finding, where despite some variation, the majority of the baseline participants rate the null variety as more acceptable.
The picture is made still more complex when we consider the preference task results. A group analysis of the heritage speakers displays a marginal overall preference for the null variety, 0.43, where a score below 0.50 signals null preference. In addition, there is a significant effect for sentence type, where heritage speakers prefer the null variety significantly more in embedded statements than in embedded questions. In fact, we see a moderate preference for the null C2 construction in statement condition (0.32) and a marginal overt C2 preference in the question condition (0.53). Further, we find that Spanish language use is positively correlated with overt C2 preference. This finding suggests that heritage speakers who use Spanish more often diverge most from the baseline group. Finally, results from heritage speaker individual analysis demonstrate high individual variation, particularly in the question condition, where 7/15 (47%) prefer the overt option, 5/15 (33%) prefer the null option and 3/15 (20%) display no preference. As in the AJT, the test group results from the preference task do not perfectly align with categorical expectations. Also like the AJT, our expectations are confirmed with regard to divergent behavior when compared to the baseline group. The control group prefers the null variety at a significantly higher rate, an effect that is both robust across sentence type and confirmed by the individual analysis.
With respect to the overall marginal to no effect of C2 displayed by the heritage speaker group, we offer the following interpretation. The oversuppliance of overt forms (e.g., pronouns) among heritage speakers, and bilingual populations more generally, is well documented. Polinsky and Scontras (2019) devote a section to the silent problem with reference to pro-drop specifically. They note that the increase in the adoption of overt forms can be traced to earlier generations, even first-generation immigrants (e.g., Montrul 2016;Otheguy and Zentella 2012;Otheguy et al. 2007;Sorace 2004). What is more, the overuse of overt material has been observed as a result of contact itself, not fully explained by cross-linguistic influence effects (e.g., de Prada Pérez 2009). Importantly, the claim is not that null forms have not been acquired; rather, their rates of use are reduced proportionally. Importantly, there is no reason to assume that the probabilistic constraints for C2 expression are equivalent across heritage speaker and baseline populations. It is possible that the constraints or features regulating the secondary que result in optional selection between null and overt C2 as the correct setting for the test group in the specific contexts of the test items (i.e., short intervener phrases composed of dislocated objects). It remains to be seen whether optional selection between null and overt C2 is the correct setting when the length of the material between C1 and C2 is long, when the dislocated phrase is a subject or adverbial, or when C2 introduces a subjunctive clause (Echeverría and Seoane 2019).
While it has been argued that heritage speaker acquisition of C-domain phenomena (e.g., verb-second, embedded clauses and wh-questions) are more likely to diverge from the baseline group due to the complexity associated with the interface between syntax and pragmatics (e.g., Sorace 2011), we would like to consider a different framework, namely the Model of Divergent Attainment in Heritage Grammar (Polinsky and Scontras 2019). We argue that the silence and distance problems along with the extra burden on processing leads to a reanalysis of the linguistic phenomenon and eventual divergent attainment. According to the DoubledForceP account, a spelled out C2 reintroduces the semantic function of the embedded clause. When C2 is null, it must be interpreted and/or relevant information must be retrieved from the primary complementizer (C1), which is separated by a string of intervening material seemingly limitless in length. As we know from monolingual processing literature, interpretation of missing elements and distant information retrieval places an increased demand on working memory and processing resources (e.g., Casasanto and Sag 2008;Gibson 1998). The strain on available resources should be even greater in bilinguals and heritage speakers specifically, who are holding multiple languages in parallel while working in the less dominant one (e.g., Montrul 2016;Keating et al. 2016;Polinsky and Scontras 2019;Sánchez 2019). The differing processing/usage constraints between groups motivates us to propose a multiple representations account of recomplementation. We specifically hypothesize that monolinguals have the TopicP representation of recomplementation, whereas HSs have the doubled-ForceP representation, because they have reinterpreted the input and arrived at a divergent representation. This account is summarized in Table 7. In the baseline group, the overt C2 adds to sentence complexity. This is triggered by a principle of economy: when all things are equal, choose the less complex of the two options. In accordance with the syntactic-theoretical literature, and because the monolingual data does not support a benefit derived from doubling or duplicating the complementizer, we propose the baseline group acquires a TopicP representation. On the other hand, silent elements and distance dependencies are particularly problematic for the heritage speaker group. These sources of complexity trigger heuristics that pertain to the avoidance of silent material and the establishment of the shortest distance dependency. With reference to recomplementation specifically, this motivates a reinterpretation of the input, which leads to an eventual restructuring of the phenomenon and a divergent representation (i.e., Doubled-ForceP).
We believe the pronounced effect of sentence type in the preference task adds credence to the multiple representations proposal. Specifically, the complement in embedded wh-questions, such as (7) . . . dónde iba a colgarlo 'where I was going to hang it' is semantically ambiguous [+QU] or [−QU] (Suñer 1993). This is not true of the embedded statements, such as (8) . . . iba a pedirlo 's/he was going to order it', which are [−QU]. We might then consider the complementizer in embedded wh-questions as being helpful in disambiguating a semantically ambiguous wh-complement, where the interrogative force and/or the reportative nature of the secondary complementizer predicts overt C2. Echeverría and Seoane's (2019) finding that mood predicts C2 expression adds credence to the importance of the dependency type. This problem of ambiguity, or one-to-many mapping between form and meaning in the wh-complement, may even be exacerbated by the distance between C1 and C2. This is an empirical question that can be investigated in a future study. Future studies should also adopt a control group that matches the heritage speaker input, increase the number of participants as well as include a production task in order to highlight any potential mismatches between production and comprehension.
A broader contribution of the present study is a demonstration of how bilinguals and speakers with diverse profiles inform our understanding of the nature of grammatical representations. Preliminary evidence on monolingual populations has suggested overt C2 can be predicted probabilistically along defined constraints, such as length of dislocated material and mood (Casasanto and Sag 2008;Echeverría and Seoane 2019). In the present study, we see that even probabilistic predictions along defined constraints fail to fully capture the effect, which leads us to a multiple representations account of recomplementation. Importantly, different populations, as well as individuals within the same population, can display probabilities and constraints that differ from one another (e.g., Dąbrowska 2012). Given already documented domains of divergent attainment and relatively high within-group variation, heritage speaker populations afford researchers valuable data.
In closing, evidence in support of Polinsky and Scontras' (2019) model of divergent attainment is preliminary in nature and limitations of the present study should be mentioned. We primarily refer to two limitations: (1) statistical power and (2) the baseline group. First, we acknowledge that the experimental power to detect differences is low. While we adopt the standard significance level of 0.05 to set a conservative criterion and avoid type I error, test group and control group sample sizes are small, n = 15 and n = 12, respectively. A small sample size increases the chance of false negatives, or inconclusive nonsignificant effects (i.e., type II error). Secondly, an ideal heritage speaker comparison group would represent the input to which the test group is exposed (e.g., see relevant baseline, Polinsky and Scontras 2019). These speaking partners might include immediate and extended family, neighbors and other community members, monolinguals and bilinguals alike. While there is no widespread knowledge that the linguistic phenomenon is subject to dialectal variation, the limitation of adopting a monolingual Colombian Spanish baseline group must be noted. Future studies should account for these shortcomings.
Funding: This research received no external funding.

Conflicts of Interest:
The author declares no conflict of interest. Table A1. Stimuli for Acceptability Judgment Task.

Item
Sentence Type C2 Stimuli

statement que
Preamble: Esa casita antigua, voy a pintarla. "I will paint the old house." Response: Me dijo que esa casita antigua que iba a pintarla. "S/he told me s/he was going to paint the old house." 2 statement que Preamble: Esa chaqueta roja, ¿cuándo vas a comprarla? "When will you buy the red jacket?" Response: Me dijo que esa chaqueta roja cuándo iba a comprarla. "S/he asked me when I was going to buy the red jacket." 24 question no que Preamble: Ese museo privado, ¿cuándo vas a visitarlo? "When will you visit the private museum?" Response: Me dijo que ese museo privado cuándo iba a visitarlo. "S/he asked me when I was going to visit the private museum." Table A2. Stimuli for Preference Task.

Statement
Preamble: Ayer Leonardo tuvo que recordarme del folleto que creamos la semana pasada. "Yesterday Leo reminded me about the flyer that we created last week." Option 1: Leonardo me dijo que ese folleto, iba a distribuirlo en el centro. Option 2: Leonardo me dijo que ese folleto, que iba a distribuirlo en el centro. "Leo told me that he was going to distribute the flyer downtown."

Statement
Preamble: Ayer tuve que recordarle a Natalia de los conciertos que el músico iba a presentar esta semana. "Yesterday I reminded Natalie of the concerts that the musician was going to present this week." Option 1: Yo le dije que ese concierto, iba a asistirlo este viernes. Option 2: Yo le dije que ese concierto, que iba a asistirlo este viernes. "I told him that I was going to attend the concert this Friday." Table A2. Cont.

Statement
Preamble: Ayer Pablo tuvo que recordarme de la opción de alquilar la computadora de la biblioteca. "Yesterday Pablo reminded me of the option of renting a computer from the library." Option 1: Pablo me dijo que esa computadora, que iba a alquilarla toda la semana. Option 2: Pablo me dijo que esa computadora, iba a alquilarla toda la semana. "Pablo told me that he was going to rent the computer for the entire week."

Statement
Preamble: Ayer Raúl tuvo que recordarme de su sombrero que no había llevado por mucho tiempo. "Yesterday Raul reminded me about his hat which he hadn't worn for a while." Option 1: Raúl me dijo que ese sombrero, iba a llevarlo por la tarde. Option 2: Raúl me dijo que ese sombrero, que iba a llevarlo por la tarde. "Raul told me that he was going to wear the hat in the afternoon."

Statement
Preamble: Ayer, como la semana pasada, Miguel tuvo que recordarme del anillo que había visto en la joyería. "Yesterday, like last week, Miguel reminded me of the ring that he had seen in the jewelry store." Option 1: Miguel me repitió que ese anillo, que iba a comprarlo un día pronto. Option 2: Miguel me repitió que ese anillo, iba a comprarlo un día pronto. "Miguel told me again that he was going to buy the ring one day soon."

Statement
Preamble: Ayer, como la semana pasada, tuve que recordarle a Alfredo de lo que iba a hacer con la camisa fea. "Yesterday, like last week, I reminded Alfredo what I was going to do with the ugly shirt" Option 1: Yo le repetí que esa camisa, que iba a llevarla al cumpleaños. Option 2: Yo le repetí que esa camisa, iba a llevarla al cumpleaños. "I told him again that I was going to wear the shirt for the birthday party."

Statement
Preamble: Ayer, como la semana pasada, tuve que recordarle a Javier de la hora que iba a tomar la clase. "Yesterday, like last week, I had to remind Javier of the time that I was going to take the class." Option 1: Yo le repetí que esa clase, iba a tomarla por la tarde. Option 2: Yo le repetí que esa clase, que iba a tomarla por la tarde. "I told him again that I was going to take the class in the afternoon."

Statement
Preamble: Ayer, como la semana pasado, tuve que recordarle a Ramón de la cama. "Yesterday, like last week, I reminded Ramon about the bed." Option 1: Yo le repetí que esa cama, iba a comprarla pronto. Option 2: Yo le repetí que esa cama, que iba a comprarla pronto. "I told him again that I was going to buy the bed soon."

question
Preamble: Ayer, Felipe tuvo que recordarme de la renovación de la casa. "Yesterday, Philip reminded me about the home renovation." Option 1: Felipe me preguntó que esa casa, que cuándo iba a renovarla. Option 2: Felipe me preguntó que esa casa, cuándo iba a renovarla. "Philip asked me when I was going to renovate the home."

question
Preamble: Ayer, tuve que recordarle a Mario de la colección de joyería. "Yesterday I reminded Mario of the jewelry collection." Option 1: Yo le pregunté que esa joyería, adónde iba a llevarla. Option 2: Yo le pregunté que esa joyería, que adónde iba a llevarla. "I asked him where he was going to take the jewelry."

question
Preamble: Ayer, como la semana pasada, Rodrigo tuvo que recordarme del tamaño del árbol. "Yesterday, like last week, Rodrigo reminded me of the size of the tree." Option 1: Rodrigo me preguntó que ese árbol, dónde iba a sembrarlo. Option 2: Rodrigo me preguntó que ese árbol, que dónde iba a sembrarlo. "Rodrigo asked me where I was going to plant the tree."

question
Preamble: Ayer, Carlos tuvo que recordarme que no íbamos a dejar el postre en la mesa. "Yesterday, Carlos reminded me that we weren't going to leave the dessert on the table." Option 1: Carlos me dijo que ese postre, que adónde iba a guardarlo. Option 2: Carlos me dijo que ese postre, adónde iba a guardarlo. "Carlos asked me where I was going to leave the dessert."

question
Preamble: Ayer, como la semana pasada, tuve que recordarle a Emilia de la bicicleta en nuestro garaje. "Yesterday, like last week, I reminded Emilia about the bicycle in our garage." Option 1: Yo le dije que esa bicicleta, que cuándo iba a montarla. Option 2: Yo le dije que esa bicicleta, cuándo iba a montarla. "I asked her when she was going to ride the bicycle."

question
Preamble: Ayer, como la semana pasada, Miguel tuvo que recordarme de la chaqueta que habíamos visto. "Yesterday, like last week, Miguel reminded me about the jacket that we had seen." Option 1: Miguel me repitió que esa chaqueta, cuándo iba a comprarla. Option 2: Miguel me repitió que esa chaqueta, que cuándo iba a comprarla. "Miguel asked me again when I was going to buy the jacket."

question
Preamble: Ayer, como la semana pasada, tuve que recordarle a María del dibujo en el suelo. "Yesterday, like last week, I reminded Maria of the picture on the floor." Option 1: Yo le repetí que ese dibujo, dónde iba a colgarlo. Option 2: Yo le repetí que ese dibujo, que dónde iba a colgarlo. "I asked her again where she was going to hang the painting."