Gender Agreement and Assignment in Spanish Heritage Speakers: Does Frequency Matter?

Abstract: Gender has been extensively studied in Spanish heritage speakers. However, lexical frequency effects have yet to be explored in depth. This study aimed to uncover the extent to which lexical frequency affects the acquisition of gender assignment and gender agreement and to account for possible factors behind heritage language variability. Thirty-nine English-dominant heritage speakers of Spanish completed a lexical knowledge screening task (Multilingual Naming Test (MiNT)) along with an elicited production task (EPT), a forced choice task (FCT), and a self-rating lexical frequency task (SRLFT). Heritage speakers performed more successfully with high-frequency lexical items in both the EPT and the FCT, which examined their acquisition of gender assignment and gender agreement, respectively. Noun canonicity also affected their performance in both tasks. However, heritage speakers presented differences between tasks: we found an overextension of the masculine as well as productive vocabulary knowledge effects in the EPT, whereas the FCT showed an overextension of the feminine and no productive vocabulary knowledge effects. We suggest that lexical frequency, determined by the SRLFT, and productive vocabulary knowledge, as measured by the MiNT, account for the variability in the acquisition of gender assignment but not of gender agreement, supporting previous claims that production is more challenging than comprehension for bilinguals.


Introduction
Heritage speakers (HSs) have been shown to exhibit an unstable knowledge of gender agreement. Their data also show more difficulty with feminine nouns than with masculine nouns and with non-canonical rather than canonical noun endings (Montrul et al. 2008, 2014). Instability has also been found in other areas of heritage grammars in close relationship to lexical frequency in the input (Giancaspro 2017; Hur 2020). This study aimed to uncover the extent to which lexical frequency affects the acquisition of gender assignment and gender agreement.
Previous research suggests that the instability and variability of heritage grammars are due to incomplete acquisition or attrition (Montrul 2004; Montrul and Bowles 2009; Polinsky 2006), given that HSs and monolinguals may receive different input (Montrul and Sánchez-Walker 2013; Rothman 2009; Kupisch et al. 2017). However, Putnam and Sánchez (2013) suggest that instead of focusing on quantity or quality of input, the focus should be on frequency of activation (processing for comprehension and production that results in intake of the heritage language), a crucial factor in heritage language acquisition and maintenance. Low activation of grammatical features assigned to lexical items is, in their view, responsible for variability effects found in heritage grammars. Furthermore, following Gollan et al. (2011), they propose a model that predicts more difficulties in the activation of lexical items for production than for comprehension purposes. Their proposal is based on the notion that access to lexical items from the heritage language for production purposes must overcome competition from the dominant language in a bottom-up process that starts at the conceptual level and involves the activation of semantic and syntactic constraints as well as phonological mapping in the heritage language. Access to lexical items in comprehension, on the other hand, is a top-down process in which speakers are given the phonological form in the heritage language, lowering the strength of the competition from the dominant language. In the present study, we tested the activation approach (Putnam and Sánchez 2013) by examining the effects of frequency of activation on gender assignment and agreement in a group of HSs of Spanish. We used perceived lexical frequency and proficiency as proxies for frequency of activation, with a production task to test gender assignment and a receptive task to test gender agreement.
Grammatical gender systems are found in more than 200 languages. Difficulty in first language (L1) acquisition of gender systems is rare (Carroll 1989; Corbett 1991; Pérez-Pereira 1991). At the same time, gender constitutes a significant challenge for second language (L2) learners as well as HSs. This is not the case among native speakers with stable representations, who make few or no gender mistakes in their native language and are known to have an "assignment system" that enables them to determine the gender of a noun (Corbett 1991).
Gender assignment is a lexical property of nouns that depends on two primary types of information: semantic meaning and grammatical form, which includes morphology and phonology. In languages such as Tamil, gender is strictly restricted to the semantics of nouns, while in other languages such as Russian, Swahili, and Bantu, gender in nouns is expressed by morphological and phonological means (Corbett 1991).
While gender assignment is a lexical property, gender agreement between the noun and other categories is manifested syntactically through agreement within the noun phrase and is generally common in adjectives that may show some formal indication of the number or gender of the noun they modify (Steele 1978).
In Spanish, gender assignment is found in certain nouns both as semantic information and as a formal feature. Harris (1991) presents a set of animate nouns referring to humans and animals that match the semantic notions of biological sex as seen below:

Masculine            Feminine
hombre "man"         mujer "woman"
caballo "horse"      yegua "mare"
carnero "ram"        oveja "ewe"

Additionally, gender assignment follows certain tendencies. Harris (1991) divides gender morphological markers into three classes. The first is the inner core, with -o masculine endings and -a feminine endings. These nouns are also known as nouns with canonical gender marking. The second is the outer core: nouns with -e and consonant endings. The third is the residue: nouns with endings that are not part of either of the first two classes.
Although the Spanish markings have some partial predictability, there is a lack of direct correspondence between form and meaning, and they are a less reliable source than, for example, inflectional morphemes for past tense (Corbett 1991; Frigo and McDonald 1998). In this study, we included canonicity in gender marking as one of the variables that will help us understand the sources of variability in gender assignment and agreement in heritage Spanish. Table 1 summarizes the tendencies found in Spanish gender morphological markers.
Table 1. Example nouns: padre "father", mujer "woman"; especialista "specialist", deportista "athlete"; cedro "cedar", sidra "cider"; mar "sea", liebre "hare"; problema "problem", tribu "tribe".

Gender Acquisition in Second and Heritage Language in Spanish
Gender marking and agreement errors are common even at advanced stages of proficiency for English-speaking L2 learners (Fernández-García 1999; Franceschina 2005; Hawkins and Franceschina 2004; McCarthy 2008) as well as among English-speaking child HSs (Goebel-Mahrle and Shin 2020) and English-speaking adult HSs (Alarcón 2011; Montrul et al. 2008, 2014). HSs of Spanish who speak other dominant languages may display a different pattern: Irizarri van Suchtelen (2016) found that Dutch-speaking HSs of Spanish produced target-like gender agreement 94% of the time, and Van Osch et al. (2013) found that Dutch-speaking Spanish HSs produced accurate gender agreement outside the DP 85% of the time. English-speaking L2 learners and HSs experience difficulties with gender assignment (1a), which is a lexical property, and with gender agreement (1b), which is a syntactic operation.
(1) a. *El fuente blanco
       "The-masc fountain-masc white-masc"
    b. *La fuente blanco
       "The-fem fountain white-masc"
In (1a), the determiner, the noun, and the adjective appear in masculine form, indicating that the noun fuente "fountain" has been assigned masculine gender, contrasting with the feminine gender it is assigned in most varieties of Spanish. In (1b), the determiner is feminine and in principle agrees with the noun, but the adjective does not. In this case, there is a lack of syntactic agreement rather than a different assignment of gender to a lexical item. Previous studies have found that gender assignment (1a) presents more difficulties for both Spanish L2 learners and HSs than gender agreement (1b), suggesting that difficulties primarily affect the lexical aspect of gender (Goebel-Mahrle and Shin 2020; Grüter et al. 2012; Montrul et al. 2008, 2014). L2 learners and HSs also tend to be less accurate in gender assignment and agreement with noun phrases headed by a feminine noun (Montrul et al. 2008, 2014; Alarcón 2011; White et al. 2004), suggesting that masculine may be the default gender. Arguably, the canonicity of the noun is also one of the causes of greatest difficulty for both L2 learners and HSs (Alarcón 2011; Montrul et al. 2008, 2014; White et al. 2004).
In Spanish, canonicity can be classified into two categories: canonical and non-canonical word endings. Canonical nouns are all nouns with masculine -o and feminine -a endings. Non-canonical nouns are those with -e and consonant endings, or with -o and -a endings that carry the opposite gender marking, as seen in Table 2.
Table 2.
Masculine: coche "car", puente "bridge" (-e); papel "paper", arroz "rice" (consonant); sistema "system", programa "program" (-a)
Feminine: cama "bed", carta "letter" (canonical -a); torre "tower", leche "milk" (-e); miel "honey", cal "quicklime" (consonant); mano "hand", foto "photograph" (-o)

Grüter et al. (2012) conducted a study with highly proficient L2 learners of Spanish and Spanish L1 speakers with the aim of investigating whether persistent difficulty with grammatical gender in production is due to a production-specific performance problem or to a difficulty with the retrieval of gender information. The L2 group performed at ceiling in the off-line comprehension task. In the elicited production task, on the other hand, the L1 and L2 groups differed. Specifically, among L2 learners, gender assignment errors were more frequent than gender agreement errors, which were rare. These results suggest that the difficulty with grammatical gender experienced by L2 learners primarily affects lexical rather than syntactic aspects of gender. Finally, the online processing task revealed that L2 learners did not process familiar determiner-noun pairs as efficiently as L1 participants. Nonetheless, they were able to use the determiner as a predictive cue, although only for novel noun conditions. Grüter et al. (2012) argue that co-occurrence relations between noun and gender-marked modifiers are a key mechanism in establishing membership in gender class in Spanish in early language learning. The authors note that L1 speakers rely on determiners to detect a noun's gender since phonological and semantic cues alone are insufficient to establish membership in the appropriate gender class in Spanish. Nonetheless, L2 learners can take advantage of several cues that are not available to infants, such as parallels between L1 and L2, metalinguistic information, and information specific to written language.
Grüter and colleagues argue that, due to the richness of these information sources, L2 learners are less likely to rely on the computation of co-occurring elements, such as determiners and nouns, to the same extent as infant L1 learners, and will instead rely on the canonicity of the noun. Nonetheless, during the processing of novel words, L2 learners seem to follow a similar computation of co-occurring elements as L1 speakers. Thus, the authors point out that the processing of familiar and novel nouns clearly differs among L2 learners.
Montrul et al. (2008) carried out a study of gender agreement among Spanish L2 learners and HSs. Both groups completed a written picture identification task, a written gender recognition task, and an oral picture description task. In their study, canonicity was divided into two classes of nouns: all nouns with a masculine -o ending and a feminine -a ending were classified as canonical, while non-canonical nouns were those with all other endings (-e, consonant, opposite vowel). They found that L2 learners had an advantage over HSs in both written tasks, while HSs had an advantage over L2 learners in the oral task. Nonetheless, accuracy was higher in masculine noun conditions than in feminine conditions in both groups. Additionally, both groups made more errors with non-canonical endings than with canonical endings.
Subsequent studies of gender agreement found similar results. Spanish L2 learners and HSs were less accurate in gender agreement with non-canonical ending nouns than with canonical ones (Montrul et al. 2014). This can be attributed to the lower frequency of non-canonical nouns in the input received by HSs as well as to reduced language use for production purposes. In other words, non-canonical ending nouns are less frequent than canonical ending nouns, which makes them a greater area of difficulty even among early bilinguals who were exposed to Spanish during their childhood. To explain the difficulty of gender agreement specifically with non-canonical nouns, Montrul et al. (2014) invoked Gollan et al.'s (2011) frequency lag hypothesis, according to which there is a bilingual disadvantage in language processing and lexical retrieval specifically with low-frequency words in speaking and reading. Nonetheless, the study was unable to tease apart the independent effects of frequency and canonicity.
In this study, we addressed variability in HSs' knowledge by examining the acquisition of gender agreement and assignment among HSs with different levels of proficiency. We focused on lexical frequency, canonicity, and vocabulary proficiency as possible factors involved in variability in the acquisition of gender by HSs (Putnam and Sánchez 2013). Lexical frequency effects have recently been found in HSs in areas other than gender, mostly in productive tasks (Giancaspro 2017; Hur 2020; López Otero 2020). This is consistent with Putnam and Sánchez's (2013) proposal, which considers productive tasks to be more difficult than receptive ones for HSs (see also Sánchez 2019). As previously indicated, canonicity has been shown to have effects on gender agreement and assignment among HSs (Montrul et al. 2014), but its effects have not been teased apart from frequency effects. We included both factors in this study. Finally, we measured proficiency using a picture-based productive vocabulary task, the Multilingual Naming Test (MiNT; Gollan et al. 2012). This choice is based on the appropriateness of the task for heritage populations (Montrul et al. 2008; López Otero 2020) and on the well-established correlation between vocabulary knowledge and overall proficiency (Bedore et al. 2012; Gollan et al. 2012; Sheng et al. 2014; Treffers-Daller and Korybski 2015). To our knowledge, this is the first study that uses vocabulary knowledge as a proficiency measure to determine variability in the acquisition of gender agreement in HSs of Spanish.

Research Questions and Hypotheses
Given the relevance of lexical frequency, canonicity, and productive vocabulary highlighted in previous research, we posited the following research questions and hypotheses:
RQ1: Are there lexical frequency effects that result in variability in the acquisition of gender assignment and gender agreement among heritage speakers?
Hypothesis 1. Variability in the acquisition of gender assignment and agreement is modulated by lexical frequency. Low-frequency lexical items are expected to show more variability in gender agreement and assignment than high-frequency items.
RQ2: Does the acquisition of assignment and agreement correlate with a productive vocabulary knowledge measure (MiNT)?
Hypothesis 2. Higher levels of productive vocabulary in heritage speakers correlate with accuracy in gender assignment and agreement. If this hypothesis is correct, we expect HSs with better MiNT results to exhibit higher levels of accuracy on our tasks.

RQ3: Do canonicity and gender specification have an effect on gender assignment and agreement?
Hypothesis 3. Canonicity and gender specification have an effect on gender assignment and agreement. If this hypothesis is correct, we expect to find more variability with non-canonical and feminine lexical items among HSs.

RQ4: Do HSs show differences between production and receptive tasks?
Hypothesis 4. Gender acquisition in heritage Spanish is modulated by task type (productive vs. receptive). If this hypothesis is correct, we expect within-subject results to be different in each task. Specifically, we expect heritage speakers to show a more target-like performance on the forced choice task (FCT), which measures their receptive knowledge, than on the elicited production task (EPT), which examines their productive knowledge.

Participants
A total of 39 HSs of Spanish (26 females, age range = 18-46; M = 22.13, SD = 4.94) participated in the study. Eighteen of them were simultaneous bilinguals who acquired both Spanish and English from birth, while 21 were sequential bilinguals. Of the sequential bilinguals, four acquired Spanish from birth and started acquiring English before the age of 3 (M = 2.5; SD = 0.58). The remaining 17 acquired Spanish from birth and English later in their childhood (range of onset of acquisition of English = 4-10; M = 6.28; SD = 2.70). All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by an Institutional Review Board (study ID: Pro2018001535).
Overall, the participants reported being exposed to different varieties of Spanish by their main caretakers: 17% Ecuadorian Spanish; 17% Colombian Spanish; 13% Mexican Spanish; 13% Peruvian Spanish; 10% Cuban Spanish; 6% Puerto Rican Spanish; 5% Dominican Spanish; and other varieties including Spanish from Chile, El Salvador, Guatemala, and Spain. Five participants (13%) reported having a caretaker whose first and dominant language was English.
In spite of some participants acquiring English at the age of 10, all of them reported having been formally educated in English and having taken college-level Spanish courses. All of the participants were dominant in English, as shown by their productive vocabulary size: their MiNT scores in English were higher than in Spanish (Spanish MiNT scores range = 24-62; M = 47.38/68; SD = 10.40 vs. English MiNT scores range = 50-67; M = 62.08; SD = 4.37). We treated proficiency as a continuous variable in this study and used the MiNT, a lexical access task, as a measure of proficiency. Following the suggestion of an anonymous reviewer, however, we provide here a comparison of the results of the MiNT and the DELE (Diploma de Español como Lengua Extranjera), which has traditionally been used in L2 acquisition and heritage studies as a proficiency task to divide participants into sub-groups (Cuza et al. 2013; Duffield and White 1999; Montrul and Slabakova 2003). Table 3 below shows the participants' MiNT scores in English and Spanish in comparison with their DELE proficiency test scores.

Group DELE (Spanish) Spanish MiNT English MiNT
The DELE scores of the participants in this study indicated that most of them were advanced (28 out of 39). As we can see, the participants' English MiNT scores were comparable across all DELE-based proficiency groups, indicating similar levels of lexical access in English for all participants. Their Spanish MiNT scores, on the other hand, showed an increase as a function of their DELE-based proficiency group. We included the participants' Spanish MiNT scores in the analyses for three reasons: the nature of our research questions; our treatment of lexical proficiency (productive vocabulary knowledge) in Spanish as a continuous variable, following previous studies that treat bilingualism and language proficiency as a continuum (e.g., Ortega 2020); and the advantage that the MiNT has over the DELE in that it can be administered in both languages.
A control group of monolingual or Spanish-dominant speakers was not necessary because the focus of this study is not to determine how similar or different HSs are from other native speakers. Our focus was on lexical frequency effects on HSs within a proficiency continuum. Adding a control group would only distract from the main goals of the paper and would introduce several comparison variables that are not part of the study, such as the different levels of lexical stability in monolingual acquisition and heritage bilingual acquisition, as well as differences in context of acquisition (Spanish as the socially dominant language vs. Spanish as a socially non-dominant language).
Languages 2020, 5, 48

Materials
The participants completed three screening tasks. The first was the Language Experience and Proficiency Questionnaire (LEAP-Q) (Marian et al. 2007), which collects information on the participants' language background, language learning experience, and their patterns of language use and exposure. The second was the Multilingual Naming Test (MiNT; Gollan et al. 2012) in Spanish and English, a lexical knowledge task in which participants are asked to name 68 items shown in pictures. The third was an adapted version of the DELE test (Cuza et al. 2013). Additionally, they completed three experimental tasks: an elicited production task (EPT), a forced choice task (FCT) (see Figure 1), and a self-rating lexical frequency task (SRLFT). The EPT and the FCT aimed to examine gender assignment and agreement, respectively. We took the EPT to be an assignment task because it requires that speakers access the noun as a lexical item and retrieve it with its gender assignment before retrieving the adjective. In the FCT, participants are provided with two agreement patterns and asked to choose one. Both tasks were adapted from Grüter et al. (2012) and included 32 test items (36 distractors), which were divided into four conditions (k = 8): canonical masculine, canonical feminine, non-canonical masculine, and non-canonical feminine. Appendix B shows the test items used in each condition. We tested these conditions to look at the participants' knowledge of gender without relying on morphology (e.g., canonical masculine queso "cheese" vs. non-canonical masculine arroz "rice").
The test items in both the EPT and the FCT presented a prompt in the form of a question that included the mass noun under examination (e.g., arroz "rice", luz "light", carne "meat") accompanied by two pictures below: one depicting the given mass noun on the left side followed by a second picture showing a color, as seen in Figure 1 below. Participants were shown four different colors across the tasks, all of which change morphologically depending on gender in Spanish: blanco "white", rojo "red", amarillo "yellow", morado "purple". Participants were encouraged to use these color adjectives and not others while completing the practice items. In the EPT, participants were asked to answer the question by looking at both pictures, while in the FCT they were asked to choose between a grammatical DP in which determiner, noun, and adjective agree in gender and an ungrammatical DP in which the determiner and the noun do not agree in gender with the post-nominal adjective. In both tasks, participants responded orally and were exposed to preambles and prompts both in written and oral formats simultaneously. Moreover, they had to complete a series of practice items before starting the test items in both tasks.
Finally, the participants completed an SRLFT in Spanish, which aimed to establish a lexical frequency count representative for HSs of Spanish in an area of the United States where several varieties of Spanish co-exist. This task measures the HSs' use of and exposure to the lexical items under examination by using a Likert scale asking participants how often they said and heard specific lexical items: 1 (never), 2 (hardly ever), 3 (a few times a year), 4 (once a month), 5 (a few times a month), 6 (once a week), 7 (several times a week), 8 (once a day), 9 (several times a day). Additionally, participants were asked for a translation into English or a synonym in Spanish in order to confirm their knowledge of such lexical items. Their responses for use of and exposure to a specific item were added together and then averaged within and across participants, resulting in SRLFT-based lexical frequency counts ranging from 2 to 18. Appendix A shows lexical frequency counts of the lexical items under examination in decreasing order.
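The scoring procedure just described can be illustrated with a minimal sketch. The ratings below are invented for illustration and are not data from the study; the function name `srlft_counts` and the tuple layout are likewise hypothetical. The sketch assumes that each participant's "said" and "heard" ratings for an item are summed (giving a value between 2 and 18) and that these sums are then averaged across participants to obtain one frequency count per item.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings (not study data): (participant, item, said, heard),
# where each rating is on the 1-9 scale described above.
ratings = [
    ("p1", "arroz", 9, 9),
    ("p2", "arroz", 7, 8),
    ("p1", "cal", 1, 2),
    ("p2", "cal", 2, 3),
]

def srlft_counts(rows):
    """Sum use + exposure per participant, then average per item."""
    per_item = defaultdict(list)
    for _participant, item, said, heard in rows:
        if not (1 <= said <= 9 and 1 <= heard <= 9):
            raise ValueError("ratings must be on the 1-9 scale")
        per_item[item].append(said + heard)  # each sum lies between 2 and 18
    return {item: mean(sums) for item, sums in per_item.items()}

counts = srlft_counts(ratings)
```

Because each individual rating is bounded by 1 and 9, every per-item count produced this way necessarily falls between 2 and 18, which matches the range reported above.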

Statistical Analysis
We analyzed the data from both the EPT and the FCT by using a generalized linear mixed model in which response (grammatical response = 1, ungrammatical response = 0) was the dependent variable while gender (masculine or feminine), SRLFT-based lexical frequency count, Spanish MiNT score, canonicity (canonical or non-canonical), and task (EPT or FCT) were the independent variables.¹ All the variables were categorical, except for SRLFT-based lexical frequency count and Spanish MiNT score. The model included random intercepts for each subject and for each lexical item. Additionally, the model tested for two-way interactions between task and canonicity, task and gender, task and MiNT scores, and task and lexical frequency count.
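As a compact restatement of the model described above, the fixed and random effects can be written out as follows. This is a notational sketch rather than the authors' exact specification: the logit link, the normally distributed random intercepts, and all symbols (the coefficients, u_i for subject i, w_j for lexical item j, t for task) are assumptions introduced here, since the text specifies only a generalized linear mixed model with a binary response.

```latex
\begin{aligned}
\operatorname{logit} P(y_{ijt} = 1) ={}& \beta_0
  + \beta_1\,\mathrm{Gender}_j
  + \beta_2\,\mathrm{Freq}_j
  + \beta_3\,\mathrm{MiNT}_i
  + \beta_4\,\mathrm{Canon}_j
  + \beta_5\,\mathrm{Task}_t \\
&+ \beta_6\,\mathrm{Task}_t \times \mathrm{Canon}_j
  + \beta_7\,\mathrm{Task}_t \times \mathrm{Gender}_j
  + \beta_8\,\mathrm{Task}_t \times \mathrm{MiNT}_i
  + \beta_9\,\mathrm{Task}_t \times \mathrm{Freq}_j \\
&+ u_i + w_j,
  \qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
  \quad w_j \sim \mathcal{N}(0, \sigma_w^2)
\end{aligned}
```

Under this reading, the four interaction terms are what allow the effects of canonicity, gender, MiNT score, and lexical frequency to differ between the EPT and the FCT.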
¹ We had to discard the item "agua" from the analysis due to the fact that it appears with a masculine determiner despite being feminine.

Results
Regarding our first research question, the model found lexical frequency effects in the HSs' performance. Lexical frequency as measured by the SRLFT facilitates gender assignment and agreement: HSs performed more successfully with high-frequency lexical items than with low-frequency lexical items. Figure 2 below shows lexical frequency effects across conditions in gender assignment, as measured by the EPT, and in gender agreement, as measured by the FCT.
Regarding our second research question, the model found that gender assignment and gender agreement, tapped into by the EPT and the FCT, respectively, are modulated by the HSs' productive vocabulary knowledge. Specifically, as seen in Figure 3 below, productive vocabulary knowledge, as measured by the MiNT, plays a facilitative role in gender assignment (EPT). The model also found that such an effect does not extend to gender agreement (FCT).
The model also determined that both canonicity and gender specification have an effect on gender assignment and agreement. Specifically, items testing non-canonical nouns led to more unexpected responses than those testing canonical nouns in both gender assignment (EPT) and gender agreement (FCT). Figure 4 below shows gender assignment and agreement across conditions.
The EPT examined gender assignment in four different conditions: canonical masculine, canonical feminine, non-canonical masculine, and non-canonical feminine. Overall, HSs of Spanish showed more target-like gender assignment in masculine items (M = 0.95, SD = 0.22 for canonical masculine nouns and M = 0.96, SD = 0.19 for non-canonical masculine nouns) than in feminine items (M = 0.77, SD = 0.42 for canonical feminine nouns and M = 0.56, SD = 0.50 for non-canonical feminine nouns). Table 4 shows target-like and non-target-like gender assignments produced by the HSs.

Table 4. Gender assignment production samples.

Target-Like Gender Assignment Non-Target-Like Gender Assignment
Canonical masculine Jugo amarillo "yellow juice" Jugo *amarilla
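The means and standard deviations reported above for feminine nouns (M = 0.77, SD = 0.42; M = 0.56, SD = 0.50) can be sanity-checked with a short calculation. The sketch below assumes that each response was coded 1 (target-like) or 0 (non-target-like); the text reports only the summary values, so this coding is an assumption, and the function name `binary_sd` is hypothetical. Under that assumption, the standard deviation of the scores is fully determined by the mean.

```python
import math


def binary_sd(mean_accuracy: float) -> float:
    """Population SD of a 0/1 variable whose mean is mean_accuracy.

    Assumption (not stated in the text): responses were coded
    1 = target-like, 0 = non-target-like.
    """
    return math.sqrt(mean_accuracy * (1.0 - mean_accuracy))


# (mean, SD) pairs as reported in the text for feminine nouns
reported = {
    "canonical feminine": (0.77, 0.42),
    "non-canonical feminine": (0.56, 0.50),
}

for condition, (mean, sd) in reported.items():
    implied = binary_sd(mean)
    print(f"{condition}: reported SD = {sd:.2f}, implied SD = {implied:.2f}")
```

Both implied values round to the reported ones, which is consistent with binary accuracy coding. It also shows that the larger SD for non-canonical feminine nouns follows directly from a mean closer to 0.5.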
The model also found that feminine nouns posed more difficulty overall. Most of the participants' variability occurred in feminine nouns, which is consistent with previous claims that masculine is the default gender in Spanish, and in non-canonical nouns, which prevent HSs from associating the -a and -o morphemes with feminine and masculine gender, respectively. However, the model also found that HSs performed more accurately with feminine nouns than with masculine nouns in the FCT. This indicates that gender specification effects are asymmetrical across the tasks: the masculine gender is overextended in the EPT, whereas the feminine gender is overextended in the FCT.

Along with the findings described above, the model determined that the HSs performed differently in production and receptive tasks, the question posed by our RQ4. When comparing the EPT, a productive task that measures gender assignment, with the FCT, a receptive task that measures gender agreement, the model found most of the HSs' variability in the EPT. Additionally, as mentioned above, gender effects differed across tasks: the EPT features an overextension of the masculine, whereas the FCT features an overextension of the feminine. Productive vocabulary knowledge effects, as measured by the MiNT, are also modulated by the nature of the task: positive MiNT score effects were found in the EPT, but not in the FCT.

Discussion
Our RQ1 aimed to investigate the role of lexical frequency in the acquisition of gender assignment and agreement in HSs, for which we hypothesized that frequency could account for variability in this phenomenon. Consistent with Gollan et al.'s (2011) frequency lag hypothesis, our results provide evidence that low-frequency words show more variability among HSs, particularly in the gender assignment task, which was a production task. This is also consistent with Putnam and Sánchez's (2013) activation hypothesis, which predicts that low activation of grammatical features assigned to lexical items will result in variability in HSs' production of those features. Our findings are consistent with findings for other areas of grammar such as mood (Giancaspro 2017), differential object marking (Hur 2020), and imperative force (López Otero 2020), and they further establish the need to consider lexical frequency as one of the most relevant factors in understanding variability in HSs' grammatical representations. They also provide us with an important tool for evaluating proposals of incomplete acquisition or attrition, as variability may not stem from a lack of specification of features or properties of the heritage grammars; rather, it may affect only the way in which features assigned to some lexical items are activated in the HS's mind (Sánchez 2019). Our first hypothesis is partially confirmed, as HSs provided more accurate responses when presented with frequent lexical items, with the exception of gender assignment in canonical feminine nouns.
Our RQ2 inquired about the predictive power of productive vocabulary knowledge in the acquisition of gender assignment and agreement in HSs. We hypothesized that productive vocabulary knowledge, as measured by the MiNT, would correlate with higher accuracy levels in gender assignment and agreement. Our results show that productive vocabulary knowledge had an effect on gender assignment, as tested by the EPT, but that this effect did not extend to gender agreement. This finding is consistent with the fact that the EPT is a production task and, as proposed by Gollan et al. (2011), lexical competition for production involves semantic competition, which could be cognitively more taxing for HSs than recognition in perceptual tasks. It also relates to the fact that gender assignment is a lexical property of the noun: the higher the level of productive vocabulary knowledge, the more likely it is that HSs have mastered more lexical properties of nouns. The fact that the effect does not apply to gender agreement suggests that caution is needed when establishing correlations between vocabulary knowledge and different areas of grammatical knowledge. It may well be the case that differences in productive vocabulary knowledge are relevant to the areas of grammar that require knowledge of lexical properties, such as gender assignment, and not to gender agreement, which is a morphosyntactic operation. Our second hypothesis is partially confirmed, as productive vocabulary knowledge has an effect on gender assignment but not on gender agreement.
Our RQ3 explored the effects of canonicity and gender specification on the acquisition of gender assignment and agreement in HSs. We hypothesized that non-canonical and feminine lexical items would lead to more variability in the HSs' responses. We found canonicity and gender specification effects on gender assignment and agreement in our study, supporting the findings by Montrul et al. (2008, 2014). This finding is of particular interest because, while feminine and non-canonical nouns exhibited more variability, the results of the EPT and the FCT differed: HSs performed more accurately with masculine nouns in the EPT but with feminine nouns in the FCT. This could be attributed to difficulties in accessing the gender features assigned to the noun in production, which led to a default assignment. In receptive tasks, on the other hand, agreement in feminine gender has been found to be more salient, that is, more readily recognizable, in both offline and online studies in comparison to masculine gender, which has unmarked default status (Alemán Bañón and Rothman 2016; Beatty-Martínez and Dussias 2019; Domínguez et al. 1999; Smith et al. 2003). Given that masculine is the default in Spanish (Harris 1991; Pérez-Tattam et al. 2019), we suggest that feminine is the marked option and therefore easier for HSs to recognize when presented with optionality during the FCT.² Another relevant finding is the variability in the EPT results with non-canonical feminine nouns: while responses were above chance level at the higher end of self-reported lexical frequency, the participants responded below chance at the lower end of self-reported lexical frequency. This contrast indicates that non-canonical feminine assignment is favored by higher levels of lexical frequency in a way that does not affect non-canonical masculine in production.³
In the receptive task, on the other hand, while acceptance of agreement with non-canonical masculine nouns showed some improvement at higher levels of frequency, the improvement was not as marked as in the case of non-canonical feminine nouns in the EPT. We take this to indicate that frequency may have a greater effect on non-canonical feminine nouns than on their masculine counterparts, which is reasonable if we assume that feminine nouns are the marked option that needs to be activated in production. Our third hypothesis was partially confirmed: while both canonical and masculine nouns led to more accurate responses, HSs provided more accurate gender agreement responses with feminine than with masculine nouns.
Finally, our RQ4 investigated differences between production and receptive tasks. We hypothesized that HSs would show more target-like responses in tasks measuring their receptive knowledge than in those examining their production. Our results showed that variability was higher in the production task than in the receptive task. This is consistent with Putnam and Sánchez's (2013) proposal, based on Gollan et al. (2011), according to which production is more challenging than comprehension because it involves semantic competition and is modulated by lexical frequency. Given the participants' high levels of accuracy on the English version of the MiNT, it is not surprising that lexical competition from the non-heritage language posed greater difficulties for lexical access in production among the HSs in this study. As we discussed above, two important features of the differences across tasks are that it is in production that we find both an overextension of the masculine default and productive vocabulary knowledge effects. Our last hypothesis was partially confirmed: overall results indicated that HSs showed more target-like responses on the FCT, which measured their receptive knowledge; however, with masculine nouns, HSs provided more accurate responses in the EPT, which tapped into their productive knowledge. As discussed above, we argue that, when facing both masculine and feminine options on the FCT, HSs recognize feminine lexical items more easily because feminine is the marked option.
We take our results to indicate that not all the factors we analyzed in this study had similar effects on gender assignment and gender agreement, as revealed by the HS data. Lexical frequency had greater effects on gender assignment, as shown on the oral production task, than on gender agreement, as shown on the receptive task, indicating that lexical frequency modulates variability in the acquisition of gender assignment but not gender agreement. Our first hypothesis was therefore only partially supported by the data. The same can be said about our second hypothesis, given that productive vocabulary knowledge effects were mostly found in the gender assignment task. Canonicity and gender specification also showed higher levels of variability in gender assignment, as evidenced by the EPT. Overall, most of the variability effects were found in the production task, which supports Putnam and Sánchez's (2013) view that production, which involves semantic competition, is an area of greater difficulty than acceptability in a receptive task.

Conclusions
The current study provides evidence that lexical frequency plays a facilitative role in the acquisition of gender assignment and agreement in English-dominant HSs of Spanish. To our knowledge, this is the first study that uses a self-reported measure of lexical frequency counts to explore frequency effects in HSs. Additionally, our study found that productive vocabulary knowledge, as measured by the MiNT, can predict accurate production of gender assignment in HSs. To our knowledge, this is also the first study that employs a productive vocabulary measure, as a continuous variable, to explore proficiency effects in the acquisition of gender among HSs. Our findings also indicate that noun canonicity and gender specification modulate the HSs' acquisition of gender. Specifically, while non-canonical nouns lead to more variable responses, gender specification shows a more complex effect: masculine nouns received more accurate responses in the production task, consistent with the argument that masculine is the default option, but when shown two options in the receptive task, HSs showed more target-like responses with feminine nouns, suggesting that feminine, the marked option, is more salient and easier to recognize. Finally, HSs showed more variability on production than on receptive tasks, providing support for previous proposals such as that of Putnam and Sánchez (2013).

Limitations of the Study
This study presented some methodological limitations that future research could address. First, the participants' proficiency levels were not evenly distributed: most of them were advanced HSs of Spanish. Future research could explore the phenomena covered in this study in participants at different proficiency levels, particularly low-proficiency HSs, who were scarce in the current study. Additionally, we were not able to determine self-reported lexical frequency counts before we conducted the study. Therefore, the lexical frequency counts of the nouns tested did not feature the same ranges across conditions: a range of 10.47 for masculine canonical nouns (from 4.13 for "acero" to 14.6 for "queso"), a range of 11.4 for masculine non-canonical nouns (from 3.47 for "marfil" to 14.87 for "papel"), a range of 6.2 for feminine canonical nouns (from 7.4 for "tinta" to 13.6 for "gasolina", given that "agua", which received a lexical frequency count of 16.4, was discarded), and a range of 11.33 for feminine non-canonical nouns (from 2.67 for "cal" to 14 for "gente"). These inconsistent ranges of lexical frequency values across the conditions constrained us to examine lexical frequency as a continuous variable instead of as a categorical variable. In future research, we will explore lexical frequency more fully by establishing lexical frequency values and categories before conducting the study.
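The ranges listed above can be reproduced from the reported minimum and maximum self-rated frequency counts. The following is a minimal sketch of that arithmetic; the variable and function names (`frequency_bounds`, `frequency_range`) are illustrative assumptions, and only the numeric values come from the text.

```python
# (lowest, highest) self-rated lexical frequency count per condition,
# as reported in the text; "agua" (16.4) is already excluded from the
# feminine canonical condition.
frequency_bounds = {
    "masculine canonical": (4.13, 14.6),       # "acero" to "queso"
    "masculine non-canonical": (3.47, 14.87),  # "marfil" to "papel"
    "feminine canonical": (7.4, 13.6),         # "tinta" to "gasolina"
    "feminine non-canonical": (2.67, 14.0),    # "cal" to "gente"
}


def frequency_range(bounds):
    """Return the spread (highest minus lowest), rounded to two decimals."""
    low, high = bounds
    return round(high - low, 2)


ranges = {cond: frequency_range(b) for cond, b in frequency_bounds.items()}

for cond, spread in ranges.items():
    print(f"{cond}: range = {spread}")

# Condition with the narrowest spread of frequency values
narrowest = min(ranges, key=ranges.get)
print(f"narrowest range: {narrowest}")
```

The feminine canonical condition has by far the narrowest spread, which illustrates why binning the items into matched high- and low-frequency categories after the fact would have been difficult.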