Paradigmatic Uniformity: Evidence from Heritage Speakers of Spanish

Abstract: Subject-verb agreement mismatches have been reported in the L2 and heritage literature, usually involving infinitives, analyzed as default morphological forms for fully specified T-heads. This article explores the mechanisms behind these mismatches, testing two hypotheses: the default form and the surface-similarity hypotheses. It compares non-finite and finite S-V mismatches with subjects with different persons, testing whether similarity with other paradigmatic forms makes them more acceptable, controlling for the role of verb frequency. Participants were asked to rate sentences on a Likert scale that included (a) infinitive forms with first, second and third person subjects, and (b) third person verbal forms with first, second and third person subjects. Two stem-stressed verbs (e.g., TRA.j-o 'brought.3P.PAST') and two affix-stressed verbs (e.g., me.ti-O 'introduced.3P.PAST'), varying in frequency, were tested. Inflectional affixes of stem-stressed verbs are similar to other forms of the paradigm both phonologically and in being unstressed (TRA.j-o 'brought.3P.PAST' vs. TRAI.g-o 'bring.1P.PRES'), whereas affixes of affix-stressed verbs have dissimilar stress patterns (me.ti-O 'introduced.3P.PAST' vs. ME.t-o 'introduce.1P.PRES'). Results show significantly higher acceptability for finite vs. non-finite non-matching, and for 1st vs. 2nd person subjects. Stem-stressed verbs showed higher acceptability ratings than affix-stressed ones, suggesting a role for surface-form correspondence, partially confirming previous findings.


Introduction
Spanish has a generalized agreement between the subject and the verb, realized as systematic variation in the inflectional morphology of the verb depending on the features of the subject. For example, in (1)a, -o encodes 1st person singular on the verb to match the person features of the subject, whereas in (1)b, the affix -e indicates 3rd person singular to match a 3rd person subject. The property of agreement is thus indirectly encoded in the inflectional morphology of the verb. In certain cases, verbal morphology is not completely transparent in the sense that the same morpheme can encode different person/tense combinations, as seen in (2), where the affix for a 1st person, present tense is the same as the affix for a 3rd person preterit.
       she eat-3SG
       'She eats.'
(2) a. Yo traig-o.
       I bring-1SG.PRES
       'I bring.'
    b. Ella traj-o.
       she brought-3SG.PAST

Heritage speakers, who can be characterized as speakers exposed to a minority language at home from birth in the context of a socially dominant majority language, show, in some cases, variability with respect to subject-verb agreement, which we will call agreement mismatches. On the one hand, as (3) illustrates, instances of infinitives in main clauses (so-called root infinitives) have been documented among low proficiency heritage speakers (Camacho n.d., to appear, among others; see below) and also late bilinguals (see Liceras et al. 1999 and Prévost and White 2000). The infinitive in (3) appears in a root clause and, since it is unmarked for person, it fails to agree with the subject. Adult root infinitives of this type are not possible in monolingual Spanish (for cases where root infinitives with different properties are possible, see Hernanz 1999).

On the other hand, some low proficiency heritage speakers produce examples in which what looks like a 3rd person singular verb agrees with a 1st person singular subject, as seen in (4)a; similar mismatches have been reported in heritage Hungarian and Egyptian Arabic, in addition to heritage Spanish (see Sánchez 1983; Bolonyai 2007; Albirini et al. 2011; Silva-Corvalán 2014; Rodríguez and Reglero 2015, among others). By contrast, the general consensus is that adult Spanish monolinguals and possibly advanced proficiency heritage speakers would typically produce traj-e, as in (4)b. The difference between root infinitives and person-mismatched finite verbs is that the former lack person features whereas the latter have them. Put another way, (3) involves lack of agreement, whereas in (4)a the 1st person features of the subject and the 3rd person features of the verb clash.
Neither root infinitives nor person-agreement mismatches have been studied in depth in the heritage language literature, although they are well-documented in the production of L2 speakers. Assuming that the mentions in the heritage speaker (HS) literature point to real phenomena, even if infrequent, they raise several interesting questions: is there a systematic relationship between root infinitives and person-agreement mismatches? If so, is this relationship related to linguistic representations or to proficiency (or to both)? Relatedly, what determines the morphological shape of the verbal form? Finally, does language contact play a role?
This paper tests the acceptability of root infinitives and person-agreement mismatches among heritage speakers with higher proficiency levels, and it advances a hypothesis about one potential factor that determines the morphological shape of the verbal form in Spanish. Specifically, I capitalize on the observation that certain forms in the Spanish verbal paradigm are similar and that this similarity improves acceptability (see below). Comparing infinitival verbal forms, which are not similar to other forms in the paradigm, to mismatched finite forms, which may be similar to others, allows us to assess whether surface similarity is the relevant notion. As we will see below, previous studies have argued for an alternative explanation for adult root infinitives, suggesting that they appear because they are analyzed as default forms (see Prévost and White 2000 and below). This study will compare the predictions of these two explanations.

The Representation of Non-Target Inflection in Bilinguals
In general, verb agreement morphology is an area where heritage and monolingual speakers diverge less, for example, in Hindi (Montrul et al. 2012), Russian and other languages, as noted by Benmamoun et al. (2010, 2013). However, several studies have shown examples like (3)-(4)a with an infinitive and a mismatching finite verb, respectively. Infinitives are arguably not specified for person, whereas finite forms have fully specified person features that clash with the features of the subject in (4)a (McCarthy 2006, p. 205). Prévost and White (2000) document instances of root infinitives in the spontaneous production of adult L2 speakers of French and German, which they analyze as cases in which a fully specified syntactic tense head is mapped to underspecified morphology, in what is known as the missing surface inflection hypothesis (see also Liceras et al. 1999 and Herschensohn 2001). These adult L2 infinitives are different from child root infinitives, which have been analyzed as not being fully inflected T heads (see Pierce 1989; Grinstead 1994; Guasti 1994; Rizzi 1994; Wexler 1994; Haegeman 1995; Phillips 1996; Lasser 1997; Hoekstra and Hyams 1998; Ezeizabarrena 2002, 2003; Hyams 2005; Liceras et al. 2006, a.o.).
Languages 2022, 7, 14
In Prévost and White's account, infinitives can appear in root contexts in adult language because the rule that guides morpheme insertion to the abstract syntactic nodes relaxes the conditions under which it applies. Normally, a syntactic tense node specified as [+FINITE] must be matched with a morpheme that is equally specified as [+FINITE], but for these speakers, the feature [±FINITE] of the morpheme becomes an underspecified [αFINITE] and can, therefore, match a syntactic feature specified as + or − [FINITE]. Prévost and White's analysis thus relies on two notions: feature underspecification (from a ± value to an underspecified α value) and the notion of a 'default' form, namely the idea that an underspecified morphological feature ([αFINITE] in this case) may match a more specified syntactic feature ([+FINITE], for example). In this sense, a default form is a less specified form.

McCarthy (2006) found that the majority of L2 agreement "errors" in the spontaneous production of intermediate L2 Spanish speakers constitute underspecification cases (92%), which she analyzed as cases where the underspecified 3rd person morpheme is inserted in the terminal node instead of the more specified 1st or 2nd person morpheme. VanPatten et al. (2012) challenged these results, noting that their study did not find an asymmetry in the combination of person or number features their participants are sensitive to, as one would expect if 3rd person were underspecified. Their study analyzed reaction times in agreement matching and non-matching S-V sentences using a moving window paradigm, comparing low proficiency L2 and native Spanish speakers.
Their study included two separate groups of stimuli: the first group combined 1st and 3rd person subjects with either matching verbs (yo tomo 'I.1SG drink.1SG', Pedro toma 'Pedro.3SG drinks.3SG') or mismatching verbs (*yo toma 'I.1SG drink.3SG' or *Pedro tomo 'Pedro.3SG drink.1SG'); the second group combined 2nd person singular and 3rd person plural subjects with either matching verbs (tú tocas 'you.2SG play.2SG', ellos tocan 'they.3PL play.3PL') or mismatching verbs (*tú tocan 'you.2SG play.3PL', *ellos tocas 'they.3PL play.2SG'). If 3rd person singular is the underspecified form, one would expect asymmetrical response times for items containing the 3rd person default form compared to non-default ones. However, their L2 speakers did not show sensitivity to any of the person/number combinations, that is, reaction times were not significantly different between yo tomo 'I.1SG drink.1SG' and *yo toma 'I.1SG drink.3SG'. It is important to note that McCarthy's and VanPatten et al.'s studies differ with respect to the proficiency level of participants (intermediate and low L2, respectively) and also with respect to analyzing production versus processing data. VanPatten et al. suggest that default effects may appear in more advanced L2 speakers, but not in lower proficiency ones.

Shibuya and Wakabayashi (2008) and Wakabayashi et al. (2021) observe that L2 English (Japanese and Taiwanese L1) speakers are more sensitive to mismatches when subject plurality is marked with a demonstrative plus quantifier (*these two students speaks English) or syntactically (*Sam and Tom speaks English) than when only the head of the DP marks plurality (*the students speaks English). Additionally, they argue that L2 speakers are more sensitive to mismatches with I/you than with they, suggesting that number is problematic for these speakers. However, as VanPatten et al. (2012) note, these results may be due to the limited inflectional paradigm of English, since their own study did not find asymmetries based on person or number.

Rodríguez and Reglero (2015) replicated VanPatten et al.'s study with advanced heritage speakers, intermediate and advanced L2 speakers and native speakers, concluding that advanced heritage speakers showed slightly different reaction patterns to grammatical vs. ungrammatical items compared to native speakers: whereas native speakers showed significantly delayed reaction times in the Verb + 2 (words) region, heritage speakers showed a delayed reaction in the Verb + 3 (words) region. Heritage speakers patterned similarly to advanced L2 speakers, and both groups were different from intermediate L2 speakers. Unfortunately for our purposes, the article does not break down the results by person.
In addition to Rodríguez and Reglero's (2015) study, which included heritage speakers in the experimental design, several studies mention S-V agreement mismatches in heritage languages, but few offer detailed accounts. For Spanish, Silva-Corvalán (2014) observed substitutions of 3rd person for 1st person forms (yo mató 'I killed-3S'), which were more frequent in the children in her study who had less exposure to Spanish. Bolonyai (2007) analyzed nominal and verbal inflection in Hungarian heritage speakers with English as the dominant L2, noting that nominal, possessive inflection was more consistently dropped than verbal inflection, which was substituted for with a different form. She suggested that because verbal inflection is abstractly present in the L2, this may prevent deletion of the relevant morphology, whereas possessive inflection does not have an abstract parallel in English. Note that this is the opposite explanation to what Prévost and White (2000) and McCarthy (2006) propose for Spanish, in the sense that underspecified morphology is possible precisely in the context of verbal inflection in their analysis.

Albirini et al. (2011) analyzed the oral production of heritage varieties of Egyptian and Palestinian Arabic, observing two possibly related properties. First, word order tended to be SVO, as opposed to the canonical VSO in monolingual varieties. Relatedly, instances of VSO almost uniformly involved 3rd person singular masculine verbs regardless of the features of the subject, suggesting that these heritage speakers had difficulties mastering the morphosyntax of VSO order. Second, Egyptian HS showed slightly higher levels of S-V agreement mismatches (6.42%) than Palestinian HS (2.57%). Many of these mismatches were with plural and feminine nouns. Albirini et al. also reported a tendency to use participial forms in place of fully inflected verbal forms, although participial forms inflect for gender and number.
To the extent that participial forms lack specified tense information, they could be seen as parallel to infinitival forms in Spanish.
Turning to heritage Spanish, Anderson's (2001) longitudinal study of two children who moved from Puerto Rico to the US included instances of mismatching S-V, and noted that most of those instances involved using 3rd person singular forms for other person/numbers (va a cocinar 'go.3S to cook' for voy a cocinar 'go.1S to cook', 'I am going to cook'). Goldin (2020) analyzed the development of inflection in several groups of Spanish-English bilingual children in a dual-language immersion program that included heritage speakers. She found that heritage speakers showed similar development to a Spanish-dominant comparison group.
With the exception of VanPatten et al. (2012) and Rodríguez and Reglero (2015), most of the studies on agreement mismatches are production-based; this is an important difference with the current study, which is based on acceptability ratings. The potential consequences of this difference are discussed in Section 4.
In sum, the L2 and heritage literature note instances of S-V agreement mismatches and instances of root infinitives, typically restricted to lower proficiency levels. Both are considered instances of default strategies, although some researchers do not find evidence for a preferred default strategy when reaction times are measured, a fact that may be attributed to the lower proficiency of the speakers. Reaction time studies have also found different patterns with respect to mismatches in S-V agreement in heritage speakers. Based on these findings, the question that arises is whether person-agreement mismatches and root infinitives are related. One natural explanation is that they both involve underspecification, but how is this relationship articulated in such a way that it covers both instances? Are these related representations mediated perhaps by proficiency?

The Source of Mismatches
As suggested in the preceding section, Prévost and White (2000) argue that root infinitives are default forms, that is, they are the result of a rule that inserts a form with a less-specified feature. A similar account has been proposed for agreement mismatches in inflected verbal forms, proposing that 3rd person is the unmarked form (McCarthy 2006). Whether these default accounts are correct or not, one can ask what drives insertion of a finite 3rd person default vs. an infinitival default. One factor we will explore in this paper is the role of a priming effect across members of the verbal paradigm. Bybee (1988, 1995) suggests that the morphological properties of words emerge from a network of connected surface forms based on the degree of similarity, so that more related forms are more connected. In this context, related means sharing the number and type of semantic features and the degree of phonological similarity. More strongly connected forms will result in more interaction between them, explaining, for example, certain instances of historical change. In this paper, we explore an extension of this notion of similarity that includes other phonological aspects, specifically stress patterns, and morphological similarity. In this sense, form similarity may explain some of the variability observed in mismatches: since the inflectional ending for teng-o 'I have' is sometimes the same as the inflectional ending for 3rd person tuv-o 's/he had', it is possible that the two identical inflectional endings -o will develop a stronger connection, despite their different person features, and therefore trigger mismatches, such as yo tuvo 'I had.3p'. 2 Additionally, Bybee proposes that more frequent forms create stronger connections than less frequent ones, so we would expect to see these types of mismatches more with frequent forms than with infrequent forms.
Bybee and Brewer (1980) note that 3p.sg forms are more frequent in their corpus analysis, closely followed by 1p.sg forms, particularly in the preterit. This suggests that we should expect to see more mismatches between 1p.sg and 3p.sg forms in the preterite since those are the two most frequent forms (but see below).
To illustrate the case for this surface-network model, Bybee reports results from Bybee and Slobin (1982) in which "experimentally induced errors involving vowel changes for past tense result in almost all cases not in the production of nonce forms, such as the past tense of heap as *hept, but rather in the replacement of one preexisting word for another, usually within the semantic domain" (Bybee 1988, p. 125). For example, rose was the form given for the past tense of raise, sat for the past tense of seat and sought for search. In other words, the substitutions made by these speakers build on existing connections based on phonological and semantic similarity with another existing word. Bybee also argues that the rules used in generative linguistics are really reinforced representational patterns, namely "abstractions from existing lexical forms which share one or more semantic properties" (p. 135). Importantly, connection strength depends on frequency, so that forms more frequently available in the input establish stronger connections than less frequent ones. The role of frequency in producing and/or recognizing morphemes has also been repeatedly documented in several studies, specifically for bilingual populations in Gal (1989); Giancaspro (2017, 2020); Hur (2020); Hur et al. (2020), among others.

Burzio (2004a, 2004b) formalizes a similar idea within an Optimality Theory framework, proposing the notion of connections as output-to-output faithfulness constraints (OO) that are stronger among forms that are similar ("closer" in Burzio's terminology). The notion of output-to-output faithfulness is a consequence of the Representational Entailments Hypothesis in (5), which suggests that two forms that shift together in context X will also do so in context Y. Stems and morphemes are also formalized as entailments that encode information specific for individual lexical items, for example, -ive ⇒ generat__.
At the same time, certain entailments result from the summation of individual entailments, yielding higher order or general selection properties, for example, -ive ⇒ V. These higher order selectional entailments perform the same functions as word-formation rules in other frameworks, but like Bybee's schemata, they emerge from the lexicon itself, so they are not independent rules in a separate grammar or morphology module, and they are violable. Frequency effects also result from a summation of entailments over the lexicon.
This notion of surface similarity suggests a possible factor in agreement mismatches. Specifically, does surface similarity interact with default forms when establishing surface-to-surface relationships? Consider the two mismatching forms in (6). The first one involves a 1st person pronoun with an infinitive verb, the second one a 1st person pronoun with a 3rd person verb. In Prévost and White's analysis, (6)a is a case of a fully inflected T head mapped to a default form that lacks specification for finiteness. All things being equal, we would expect it to be similarly possible for subjects in all persons. By the same logic, if the finite verb in (6)b is also generated as a default (as suggested by McCarthy 2006), we would also expect it to be equally possible with subjects in all persons. Alternatively, vino 'came.3s' in (6)b could be the result of a surface-to-surface correspondence with 1st person forms, such as met-o 'I introduce.1s', but no such possibility is available for (6)a, because no other forms are like the infinitive but with person features. In other words, a pure default analysis predicts that (6)a and b should be equally possible with all persons, but surface similarity predicts an effect in (6)b but not in (6)a.

In this paper, I will address these different predictions made by the default account vs. the OO correspondence (surface similarity) account. First, the OO correspondence account predicts that only forms that are surface-similar to the target form should appear in mismatches, as formulated in (7)a, whereas the default account predicts that all person subjects should be possible with a 3rd person verb, as in (8)a. Second, the OO correspondence account predicts a difference between mismatching finite and non-finite forms, since infinitives are not surface-similar to other forms in the paradigm, as formulated in (7)b, whereas the default account predicts no differences between finite and non-finite 3rd person forms, since they are both defaults, as formulated in (8)b.
(7) a. Surface similarity affects OO correspondence.
       i. If confirmed, S1-V3 should have higher acceptability ratings than S2-V3.
    b. Finiteness affects OO correspondence.
       i. If confirmed, mismatching finite and non-finite forms should be accepted at different rates.
       ii. If rejected, mismatching finite and non-finite forms should be accepted at comparable rates.
(8) a. 3rd person insertion is the default rule.
       i. All subjects should be equally acceptable with 3rd person verbs.
Regarding the question of whether surface similarity affects feature mismatches, compare the forms in (9), which vary depending on whether the stem or the affix is stressed in the preterit (the examples show syllable boundaries and stressed syllables). This difference is important because the 3rd person preterit inflection of stem-stressed verbs has the same stress pattern as the 1st person inflection of the present tense: TRA.j-o 's/he brought-3P' vs. TRAI.g-o 'I bring-1P', whereas affix-stressed verbs are different: me.TI-O 'inserted-3P' vs. ME.t-o 'insert-1P'. The affix of me.TI-O is stressed, but the affix of TRAI.g-o is unstressed. OO correspondence based on phonological (including prosodic) information should relate TRA.j-o 's/he brought-3P' to TRAI.g-o 'I bring-1P' more strongly than me.TI-O 'inserted-3P' to ME.t-o 'insert-1P', because of the stress difference in the latter pair. In other words, (9)a is compatible with OO correspondence (in the relevant aspects related to stress), but (9)b is not. For this reason, in order to test hypothesis (7)a, we included two stem-stressed and two affix-stressed verbs. 3

Only a few verbs in Spanish are stem-stressed, and in this sense they are irregular. Tener 'have' shows a further root alternation: ten- in most forms vs. tuv- in the preterit and some subjunctive forms. Traer 'bring', on the other hand, shows three different roots: tra- (trae 's/he brings') vs. traig- (traigo 'I bring') vs. traj- (traje 'I brought'). Corbett et al. (2001) point out that morphological irregularity has long been associated in the literature with high frequency, although they note that this relationship is not strictly linear. Bybee (2001, p. 12) hypothesizes that "morphological irregularity is always centered on the high-frequency items of a language", noting that low-frequency irregular verbs like English weep/wept regularize to weeped, whereas high-frequency irregular verbs like keep/kept do not.
In the context of HS, Perez-Cortes (2022) finds that heritage speakers produce target forms more accurately in irregular embedded verbs (which she assumes to be high-frequency) in the context of mood alternations.
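The stress-based comparison described above for the forms in (9) can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: the `VerbForm` class and the `surface_similar` function are hypothetical names introduced here, and the all-or-nothing criterion (same affix segments and same affix stress) is a deliberate simplification of OO correspondence, which is gradient in Burzio's formulation. The segmentations and stress values are the ones given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbForm:
    """A verb form segmented as in the text, e.g. TRA.j-o = stem + affix 'o'."""
    label: str            # orthographic form, used for display only
    affix: str            # inflectional affix, following the text's segmentation
    affix_stressed: bool  # True if primary stress falls on the affix


def surface_similar(a: VerbForm, b: VerbForm) -> bool:
    """Toy binary criterion (an assumption, not the paper's metric):
    two forms correspond if their affixes match segmentally AND prosodically."""
    return a.affix == b.affix and a.affix_stressed == b.affix_stressed


# Stem-stressed pair: both affixes are unstressed -o.
trajo = VerbForm("trajo", affix="o", affix_stressed=False)    # TRA.j-o
traigo = VerbForm("traigo", affix="o", affix_stressed=False)  # TRAI.g-o

# Affix-stressed pair: same segment, but stress differs.
metio = VerbForm("metió", affix="o", affix_stressed=True)     # me.ti-O
meto = VerbForm("meto", affix="o", affix_stressed=False)      # ME.t-o

for third_pret, first_pres in [(trajo, traigo), (metio, meto)]:
    print(third_pret.label, first_pres.label,
          surface_similar(third_pret, first_pres))
```

Under this simplification, only the stem-stressed pair counts as surface-similar, which is the contrast that hypothesis (7)a relies on.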
Following this line of research, the next question we raise is whether frequency plays a role in OO correspondence. Whether one follows Bybee's or Burzio's formulation, frequency should favor OO correspondence, because surface forms of more frequent verbs should have stronger links (or more entailments) than those of less frequent verbs. This leads to hypothesis 2 in (10). Notice that the default account makes no specific predictions with respect to frequency.
(10) OO correspondence is mediated by frequency effects.
       i. If confirmed, OO correspondence should vary depending on verb frequencies.

To test for frequency effects, we selected two high-frequency and two low-frequency verbs, based on type-frequency counts from corpora (see below for details).

Participants
Forty-six advanced college-age (M = 19.8, SD = 1.74) heritage Spanish speakers from the Chicago area completed the task. Additionally, a group of 37 participants (age M = 34.6, SD = 9.14) who grew up in a Spanish-speaking country until at least 15 years of age but currently live in the US also completed the task as a Spanish-dominant comparison group. On average, the self-reported age of acquisition for the heritage group was 2.35 (SD = 2.45) for Spanish and 3.82 (SD = 2.14) for English. These data stem from responses to the question "at what age did you start learning the following languages?", which may explain why the mean for Spanish was 2.35. They spent an average of 19.02 years (SD = 1.61) in a Spanish-speaking family and 16.53 (SD = 5.89) in an English-speaking family. These averages may reflect changes in the composition of their family, or perhaps the introduction of English as children begin schooling in English. Their education has been primarily in English (M = 15.29 years, SD = 3.12) and to a lesser degree in Spanish (M = 4.36 years, SD = 3.76). They have spent an average of 7.97 years (SD = 8.91) in a Spanish-speaking region, which may include visits, and 18.8 years (SD = 1.75) in an English-speaking region.
All the Spanish-dominant participants lived at least their first 15 years of life in a Spanish-speaking country; 94% of them spent more than 20 years there (M = 19.62, SD = 1.42), and they spent an average of 9.02 years (SD = 5.66) in an English-speaking country.
Participants self-reported their proficiency in different abilities in Spanish and English on a scale of 0-3, as seen in Table 1. Although the use of self-rating to establish proficiency is controversial, Marian et al. (2007) show that this measure accounts for the most variance in their factor analysis of the Language Experience and Proficiency Questionnaire (LEAP-Q). Tomoschuk et al. (2019) point out that self-rating raises issues regarding comparability across languages and differences in scales. Since this study does not establish group comparisons, the first criticism is less relevant. In sum, the heritage participants in this study had advanced oral skills and slightly lower reading skills and lower writing skills in Spanish. Their self-reported skills in English were close to the ceiling. The Spanish-dominant group, on the other hand, had close to ceiling self-reported abilities in Spanish and high proficiency in English.

Materials
The main task was an acceptability judgment task on a scale of 1-5, which contrasts several types of Subject-Verb (S-V) agreement patterns. In addition to S3-V3 matching sentences, it included finite-verb mismatches S1-V3 (see (11)a) and S2-V3 (see (11)b), and non-finite verb mismatches including S1-VINF (see (11)c), S2-VINF (see (11)d) and S3-VINF (see (11)e).

The default analysis predicts that all five mismatching items in (11) should be rated similarly, whereas the OO surface correspondence account predicts (11)a to be rated higher due to the similar stress pattern between TRA.j-e 'I brought-1p' and TRA.j-o 's/he brought-3p' (and possibly also with the 1st person affix in CAN.t-o 'I sing-1p'). Furthermore, OO correspondence predicts higher acceptability ratings for mismatched finite forms than for non-finite forms, on the assumption that -o may be surface-similar to other forms in the paradigm, but -er is not.
Four verbs were selected, two high frequency in monolingual corpora (ver 'see' and tener 'have') and two low frequency (traer 'bring', meter 'introduce'). Traer 'bring' and tener 'have' are stem-stressed verbs and the other two (ver and meter) are affix-stressed in the 3rd person, as seen in Table 2.

Frequency calculations yield different results depending on the corpora used. Larger corpora, such as CREA (n.d.) and the NOW and Web/Dialects corpora (Davies 2016), include close to 8 billion words taken together, so they represent a large sample of language data. Relative frequencies for these items, presented in Table 3, converge with Bybee and Brewer's (1980, p. 224) observation that 1st person preterit forms are much less frequent than 3rd person preterit forms. 1st person forms are presented for illustration purposes since the experiment did not include 1st person verb forms. 5 However, these corpora include oral and mostly written texts of different genres and from many regions, so it is not obvious whose language input they represent. Furthermore, these corpora may not reflect the mostly oral input that bilinguals hear in the US. For that reason, we also assessed the normalized frequencies in the Corpus del Español en el Sur de Arizona (Carvalho 2012), an oral corpus with close to 680,000 words from speakers born mostly in Arizona and Mexico. Although smaller, this corpus has the advantage of being based on oral sociolinguistic interviews from US-based speakers, and, in this sense, the frequencies in Table 4 may better reflect the input heritage speakers in the US receive.

Comparison between the two tables shows interesting patterns. First, ver and tener are much more frequent than traer and meter, with the notable exception of the 3rd person preterit trajo 'brought', which has a much lower frequency in Table 4 than in Table 3. The frequency of infinitives in the two tables follows similar trends: tener > ver > meter > traer.
Second, the frequency of infinitival forms is systematically higher than the frequency of 3rd (and 1st) person forms, so that difference by itself should favor the infinitive if frequency is an important factor. Fisher's exact test determined no significant association between stress type (stem vs. affix-stressed) and frequency (High vs. Low) for 3rd person (p = 0.21). For the CREA, NOW and WEB corpora, there was a significant association for 3rd person (p = 0.02). Since stress does not shift in the infinitive, testing the association between stress placement and frequency is less relevant. We used two separate corpora frequency calculations in the statistical computation.

The task, presented online using Qualtrics®, included a total of 24 experimental sentences and 30 fillers (see Appendix A). Half of the verbs in the experimental items appeared in the infinitive form and half were presented in inflected 3rd person, all of them with a 1st, 2nd, or 3rd person singular subject, as described in Table 5. Fillers included 15 grammatical items (seven with clitics, eight without) and 15 deviant sentences (seven fully ungrammatical with clitics, eight semantically anomalous ones, see (12)). In total, 27 items were ungrammatical, 19 were fully grammatical and eight were semantically anomalous but syntactically grammatical. Because the experimental items included matching S3-V3 sentences, we thought it important to include sentences with clitics to assess grammaticality independently of the experimental items (see Montrul 2010 on the acceptability of clitics by HS).

Participants first read an informed consent document, followed by a description of the task and instructions on how to react to each item. Experimental items were presented in randomized order, followed by a linguistic background questionnaire. 6
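The item counts reported above can be cross-checked with a short sketch. It assumes a fully crossed design (4 verbs × 3 subject persons × 2 verb forms), which is consistent with the 24 experimental sentences but is an assumption here, since Table 5 is not reproduced in this section; the condition labels and variable names are illustrative only.

```python
from itertools import product

VERBS = ["ver", "tener", "traer", "meter"]
SUBJECTS = ["S1", "S2", "S3"]   # 1st, 2nd, 3rd person singular subjects
VERB_FORMS = ["V3", "VINF"]     # inflected 3rd person vs. infinitive

# Assumed fully crossed design: every verb with every subject and verb form.
experimental = list(product(VERBS, SUBJECTS, VERB_FORMS))


def is_matching(subject: str, verb_form: str) -> bool:
    """Only a 3rd person subject with a 3rd person finite verb agrees."""
    return subject == "S3" and verb_form == "V3"


matching = [i for i in experimental if is_matching(i[1], i[2])]
mismatching = [i for i in experimental if not is_matching(i[1], i[2])]

# Filler counts as reported in the text.
fillers = {
    "grammatical": 15,            # seven with clitics, eight without
    "ungrammatical_clitic": 7,
    "semantically_anomalous": 8,
}

totals = {
    "ungrammatical": len(mismatching) + fillers["ungrammatical_clitic"],
    "grammatical": len(matching) + fillers["grammatical"],
    "semantically_anomalous": fillers["semantically_anomalous"],
}

print(len(experimental), totals)
```

Under the crossed-design assumption, the sketch yields 24 experimental items (4 matching, 20 mismatching), and adding the fillers reproduces the reported totals of 27 ungrammatical, 19 grammatical and 8 semantically anomalous items.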

Results
In order to establish a baseline for grammaticality, we first compared the ratings of fillers with grammatical and ungrammatical clitics. The average rating for grammatical clitic sentences was 4.41 for both groups (4.30 for the HS group) and 2.58 for ungrammatical sentences with clitics (2.55 for the HS group) on a scale of 1-5. An ordinal logistic regression in R (R Core Team 2021) using the ordinal package (Christensen 2019), with ratings as the output and item (grammatical or ungrammatical clitic sentence), group and the item by group interaction as predictors, yielded a significant main effect of item (Item(Grammatical) β0 = 3.43, SE = 0.51, p < 0.0001), but no significant effect of group (Group(HS) β0 = −0.33, SE = 0.33, p = 0.32) or of the interaction between item and group (Item(Ungrammatical) × Group(HS) β0 = 0.55, SE = 0.29, p = 0.05). Grammatical clitic items were 30 times more likely to be rated highly than ungrammatical clitic items, with more of the effect coming from the rating of grammatical items, as the interaction in Figure 1 shows.
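The "30 times more likely" figure follows from exponentiating the reported log-odds coefficient for item. The sketch below shows that conversion using the estimate and standard error given above; the Wald-type 95% interval is an added illustration under a normal approximation and is not a value reported in the paper. The function name is hypothetical.

```python
import math

def odds_ratio(beta, se, z=1.96):
    """Exponentiate a log-odds coefficient; return (OR, lower, upper).

    The interval is a Wald approximation (assumption of this sketch).
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )

# Item(Grammatical): estimate = 3.43, SE = 0.51, as reported in the text.
or_item, ci_low, ci_high = odds_ratio(3.43, 0.51)
print(f"OR = {or_item:.1f}, approx. 95% CI [{ci_low:.1f}, {ci_high:.1f}]")
```

In a proportional-odds model, this ratio applies to the odds of a rating falling above any given threshold on the 1-5 scale, not only to the highest rating.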

Looking at the experimental items, Figure 2 presents the mean acceptability ratings for different conditions by group. 7 As expected, agreement-matching sentences were rated highest, followed by S1-V3 mismatching sentences, S2-V3 mismatching sentences and infinitival sentences. HS speakers rated mismatching sentences higher than the comparison group, suggesting higher tolerance for mismatches.
Figure 3 shows the acceptability ratings by person and agreement matching. In this figure, we can see that the lowest ratings correspond to agreement-mismatching items (in blue), and the highest ratings to agreement-matching items (in red). Among mismatching items, participants gave more 4-5 ratings to 1st person subjects than to 2nd and 3rd person subjects.
In order to confirm that participants made a clear distinction between matching and non-matching items, we ran a mixed-effects ordinal logistic regression in R (R Core Team 2021) using the ordinal package (Christensen 2019), with rating (an ordered variable) as the output variable, Match (Matching, non-Matching) as the independent variable, and Participant and Verb as random effects. As expected, matching items were significantly more likely to be rated highly than mismatching items (Match(Matching) β0 = 4.65, SE = 0.32, p < 0.0001), with odds about 105 times higher.
Because of multicollinearity effects, Match could not be included in the model with the other variables, so we fitted a second model with rating (an ordered variable) as the output variable, Person (1, 2, 3), Form (Finite, non-finite), Person × Form and ProficiencyOral (average of self-rated speaking and listening proficiency in Spanish) as fixed effects, and random intercepts for participant and verb. 8 Results in Table 6 show that Form and Proficiency were significant predictors. Specifically, finite forms were 2.4 times more likely to receive a higher rating than non-finite forms, and higher proficiency lowered the odds of higher ratings. More importantly, Person significantly interacted with finite Form. Form-Person interactions are illustrated in Figure 4 (person refers to subjects): 3rd person subjects had a significantly higher effect on rating for finite forms than for non-finite forms, as expected since finite forms include matching items, but 1st person subjects also had a significantly higher effect on finite forms than on non-finite forms. Post-hoc pairwise comparisons for verb form and person found that only finite forms had significant contrasts between 1st and 2nd person subjects (Bonferroni-adjusted p = 0.0007), as well as between 1st and 3rd and between 2nd and 3rd (p < 0.0001), although the latter two are not relevant for our purpose, since 3rd person subjects include matching S3-V3 and non-matching S3-VINF items. Non-finite forms, on the other hand, had no significant contrasts between persons.
The contrast between 1st and 2nd person subjects confirms the first hypothesis in (7)a and disproves the alternative default account hypothesis in (8)a, since S1-V3 and S2-V3 had significantly different acceptability ratings, with 1st persons having a greater probability of higher ratings than 2nd persons.
In order to test hypothesis (7)b, which predicts differences between finite and non-finite mismatching conditions, and the default account hypothesis in (8)b, which predicts the opposite, we ran a model with the mismatching conditions only (S1-V3, S2-V3 and S1, S2, S3-VINF). This model included acceptability rating as an ordered outcome variable and verb Form (finite or non-finite), V-type (affix or stem-stressed) and oral proficiency as independent variables, with participant and verb as random factors. As the model parameters in Table 7 show, all three variables were significant. Finite verbs were 94% more likely to get a higher acceptability rating than non-finite verbs, and stem-stressed verbs were 66% more likely to be highly rated than affix-stressed verbs. Increases in oral proficiency reduced the likelihood of higher ratings by 96%. Considering that all the items in this model were non-matching, this effect of higher proficiency is expected: with higher proficiency, speakers will more confidently reject mismatching items and rate them as less acceptable. The fact that acceptability ratings for finite verb forms differed significantly from those for non-finite verbs refutes (8)b and confirms hypothesis (7)b, which predicted that finite non-matching forms should be rated higher than non-finite forms if surface similarity plays a role.
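The percentages above can be related to odds ratios and log-odds coefficients as sketched below. This assumes that each percentage expresses a percent change in the odds of a higher rating; the log-odds values printed are back-computed from the percentages in the text and are not taken from Table 7. Function and variable names are hypothetical.

```python
import math

def percent_to_odds_ratio(percent_change):
    """Convert a percent change in odds (e.g. 94 or -96) to an odds ratio."""
    return 1 + percent_change / 100

def odds_ratio_to_log_odds(ratio):
    """Convert an odds ratio to the corresponding log-odds coefficient."""
    return math.log(ratio)

# Percent changes as stated in the text (the negative sign marks a reduction).
effects = {
    "Form (finite)": 94,
    "V-type (stem-stressed)": 66,
    "Oral proficiency": -96,
}

for name, pct in effects.items():
    ratio = percent_to_odds_ratio(pct)
    log_odds = odds_ratio_to_log_odds(ratio)
    print(f"{name}: OR = {ratio:.2f}, log-odds = {log_odds:.2f}")
```

Under this reading, a 96% reduction corresponds to an odds ratio of 0.04, a much larger effect on the log-odds scale than the 94% increase for finite forms.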
The fact that frequency did not have an effect does not support the hypothesis in (10). 9 In sum, S1-V3 had significantly greater odds of acceptable ratings than S2-V3 and, among mismatching items, finite forms had greater odds of acceptable ratings than non-finite forms.

Discussion
The results presented in the previous section indicate that bilingual speakers (proficient HS speakers and Spanish-dominant comparison speakers) show a preference for the scale in (13), with a clear break between matching and mismatching, but also between each of the mismatching categories.
(13) Matching > S1-V3 > S2-V3 > non-fin mismatching
The participants in this study had a smaller range of self-reported proficiency than those in previous studies and proficiency was measured as a continuous variable. Nevertheless, these results are consistent with Liceras et al.'s (1999) observation that root infinitives diminish in production when proficiency increases.

Person Asymmetries and OO Correspondence (Surface-Similarity)
The first hypothesis in (7)a predicted that OO correspondence should be sensitive to person, assuming that the phonological and prosodic shape of the surface form affects OO correspondence. This hypothesis was confirmed since S1-V3 mismatches were rated systematically higher than S2-V3 mismatches. The OO correspondence hypothesis establishes a link (or an entailment in Burzio's terms) between the unstressed 3rd person affix of TRA.j-o 'brought.3P.PAST' and the unstressed 1st person affix of TRAI.g-o 'bring.1P.PRES', but not between the stressed 3rd person affix of me.TI-O 'introduced.3P.PAST' and the unstressed 1st person affix of ME.t-o 'introduce.1P.PRES', so we expected a higher rating for the former than for the latter. Furthermore, the fact that stem-stressed verbs showed higher ratings than affix-stressed ones confirms the proposed analysis, since the former would find more surface correspondences within the paradigm (TRA.j-o 'brought.3P.PAST', TRA.j-e 'brought.1P.PAST' and TRAI.g-o 'bring.1P.PRES') than the latter.
It is possible that nominals may have a different feature structure depending on the person, as Béjar (2003), Béjar and Rezac (2009), Ritter (2002a, 2002b) and Nevins (2007), among others, have suggested. In general, these analyses provide a mechanism to distinguish 3rd persons from all others, capturing the observation that 3rd persons are default. However, as we have seen, the results of these studies do not support the idea that 3rd persons are more favored than 1st or 2nd persons. Nevins (2007) proposes an account that maintains the role of 3rd person as default without treating it as unmarked; however, the specific combination of features and the relativized agreement he uses predict that if there is an asymmetry between 1st and 2nd person, S2-V3 should be preferred to S1-V3, contrary to what we observe. Specifically, in Nevins (2007), agreement is formalized as a feature-matching relationship between a probe, T in our case, and a goal (DP). Agreement can match full feature sets or feature subsets that have features with contrastive values or marked values. Given the way contrastivity and markedness are defined in his system, relativized agreement between a 3rd person probe and a goal would target either a 3rd person or a 2nd person, predicting possible S2-V3 and S3-V3 agreement mismatches. However, as we have seen, S1-V3 has better acceptability odds than S2-V3. 10

The Place of OO Correspondence
Returning to the role of surface similarity, it is clear that all speakers were aware that non-matching forms were deviant compared to matching S-V forms, even if they rated yo TRA.j-o 'I brought.3P.PAST' higher than yo me.TI-O 'I introduced.3P.PAST'. Presumably, when speakers read the mismatching forms, processing S-V agreement failed because of the feature clash between the subject and inflection, but the surface form TRA.j-o would trigger an OO correspondence with TRAI.g-o 'bring.1P.PRES' that would mitigate the failed agreement, whereas me.TI-O would not trigger such a correspondence with ME.t-o 'introduce.1P.PRES' because the stress patterns are different. This would account for the higher acceptability rating for S1-V3 in the stem-stressed forms. 11
From a different point of view, this paper raises an intriguing question: why should speakers show these gray areas between grammaticality and ungrammaticality? Under the canonical generative view of grammar, grammaticality, and its close correlate, acceptability, is viewed as categorical, perhaps mediated by processing effects, but not as a gradable concept, although in practice grammaticality judgments have always had degrees. However, the phenomena under analysis in this paper seem to exploit productive grammatical mechanisms that operate on "unacceptable" structures. On the view we are suggesting, speakers process mismatches as mismatches, but are able to formally repair them using surface similarity networks. The implications for bilingual grammars are important, since these gray-area mechanisms may be a path for divergence between bilingual and monolingual grammars.

Morphological Mechanisms. The Role of Frequency
The second hypothesis in (7)b, which predicted that finite forms should receive higher acceptability ratings than non-finite ones, given the potential surface similarity of the former, was also confirmed. Specifically, ratings for non-finite forms were systematically lower than for finite mismatched ones. This result is not predicted by the default account. In the current account, tener 'have.INF' differs enough from inflected forms that form similarity correspondences would be weak and would not alleviate a failed feature agreement. 12 It is important to note that the OO correspondence analysis does not preclude the existence of defaults. Indeed, several researchers have argued that 3rd person is less marked than 1st and 2nd persons (see Jakobson 1963; Benveniste 1966; Ritter 2002a, 2002b; Bianchi 2006; Nevins 2007; Piñeros 2017), so it is quite possible that surface similarity may interact with paradigm defaults. The design of this study was not intended to test for default effects, since all finite forms were 3rd person, but our results suggest that surface similarity ameliorates mismatches with default 3rd person verbal forms compared to dissimilar infinitival forms.
What is less clear from this study is the connection between morphological processes or representations and frequency. Traditional generative accounts do not generally postulate an explicit role for frequency in morphological processes. However, notions such as markedness or default implicitly capture how a form undergoes a certain process. Burzio's account explicitly incorporates these notions into his model, as does Bybee's. In this sense, the previous literature has observed that frequent irregular forms are more resilient (see Bybee 2001, for example), a finding indirectly replicated in the HS literature in Perez-Cortes (2022), who found that irregular embedded verbs were more target-like than regular ones in contexts where mood is alternated.
Our results only partially confirmed this view: on the one hand, irregular, stem-stressed verbs were more acceptable, particularly with non-matching items. If irregular verbs have stronger networks or entailments than regular ones, one would expect speakers to activate surface similarity connections more readily. On the other hand, this preference extended to both high and low-frequency stem-stressed verbs, suggesting that frequency did not modulate irregular morphology.
If confirmed, these results suggest the need to develop a more comprehensive notion of how lexical networks are interrelated and how frequency affects them. As noted in our discussion of the different corpora, different forms in the paradigm had very different frequencies, so a high frequency for a 1st person preterit may be comparatively low compared to the infinitive, and if the OO correspondence view is right, one would expect the frequency to affect not only word-to-word networks, but affix-to-stem, affix-to-affix, paradigmatic form-to-paradigmatic form, etc., and to affect them possibly differently.

On the Bilingualism Continuum
One of the research questions that motivated this study was whether these mismatching agreement effects were related to bilingualism. However, the absence of group effects suggests this is not the case. It is likely that this is due to the similar proficiency level between the HS group and the Spanish-dominant group, but it is also interesting to note that oral proficiency, imperfect as it may be as a self-reported continuous variable, did have a strong effect. This contrast between the absence of group effects and strong proficiency effects suggests that the categorical distinction between heritage and minority-dominant speakers is less relevant than the proficiency continuum. Although previous studies have found non-matching S-V agreement patterns among HS speakers, these are mostly among lower proficiency speakers and the studies do not include proficiency as a continuous variable.
From a different perspective, this study shows an area in which HS speakers have patterns similar to Spanish-dominant counterparts, even though sensitivity to OO correspondence could be argued to be a peripheral and subtle aspect of linguistic ability. In other words, most studies of HS focus on how those speakers are different from either minority-dominant speakers or monolinguals. This study shows an area in which they seem to be very similar. It is possible that the similarity in acceptability ratings among HS and Spanish-dominant speakers may be due to the absence of cross-linguistic effects from English, since English verbs have limited morphology, and arguably the 3rd person form is not the default morphological form (see Nevins 2007, p. 283).

Conclusions
This study opens several areas for future research. First, future work should test whether the effects we have found are also present at lower proficiencies in the continuum. Second, the current design compared stem-stressed vs. affix-stressed forms; an interesting extension would be to include the actual suspected target for the output correspondence, comparing, for example, yo me.TIÓ 'I introduced.3P.PAST' vs. ella ME-te 'she introduces.3P.PRES' and yo TRA.j-o 'I brought.3P.PAST' vs. ella TRAI.g-o 'she bring.1P.PRES'. These combinations would allow us to confirm whether actual similar forms are subject to output-to-output correspondence.
To conclude, the results presented in this study provide some evidence for the role of OO correspondence, or surface similarity, in morphological form acceptability in the context of agreement mismatches. The study also stresses the importance of treating proficiency as a continuous variable, perhaps as a more important variable than categorical group distinctions. Finally, it raises the need to find a more nuanced operationalization of frequency in studies where the role of lexical frequency may be suspected.
Funding: This research received no external funding.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of the University of Illinois, Chicago (protocol # 2020-0228 "Grammatical properties of Spanish", approved 24 March 2020).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Following the approved protocol, data cannot be made public.
Acknowledgments: I would like to thank three anonymous reviewers for the time devoted to commenting on this paper, and for the excellent suggestions they made. The paper has also greatly benefitted from discussions with David Giancaspro about agreement mismatches. He initially mentioned the use of 3rd person verbal forms with 1st person subjects among heritage speakers.

Conflicts of Interest:
The author declares no conflict of interest.
Lamentablemente yo no la pude ver
Mi papá salió de la casa.
Arriba ella sintió frío
11 I wish to thank an anonymous reviewer for suggesting the potential relevance of Nevins' framework as an alternative explanation for the current data.