Gender in Unilingual and Mixed Speech of Spanish Heritage Speakers in The Netherlands

Abstract: This study examines heritage speakers of Spanish in the Netherlands regarding their production of gender in both their languages (Spanish and Dutch) as well as their gender assignment strategies in code-switched constructions. A director-matcher task was used to elicit unilingual and mixed speech from 21 participants (aged 8 to 52, mean = 17). The nominal domain consisting of a determiner, noun, and adjective was targeted in three modes: (i) unilingual Spanish mode, (ii) unilingual Dutch mode, and (iii) code-switched mode in both directions (Dutch to Spanish and Spanish to Dutch). The production of gender in both monolingual modes was deviant from the respective monolingual norms, especially in Dutch, the dominant language of the society. In the code-switching mode, evidence was found for the gender default strategy (common in Dutch, masculine in Spanish), the analogical gender strategy (i.e., the preference to assign the gender of the translation equivalent) as well as two thus far unattested strategies involving a combination of a default gender and the use of a non-prototypical word order. External factors such as age of onset of bilingualism, amount of exposure and use of both languages had an effect on both gender accuracy in the monolingual modes and assignment strategies in the code-switching modes.


Introduction
This study explores how bilingual speakers juggle languages with conflicting features, with a specific focus on gender in nominal constructions. We study heritage speakers of Spanish in The Netherlands. While much research has been carried out with heritage speakers of Spanish in the US, heritage Spanish in contact with other languages, including Dutch, has been relatively less explored in previous literature (exceptions being Irizarri van Suchtelen (2016) and van Osch (2019)). Unlike most previous studies, we examine not only the heritage language (HL), Spanish, but also the societal language, Dutch. Even though the implicit assumption in the HL literature is often that the dominant language of the society is acquired in a completely monolingual-like manner, this generally remains a mere assumption.
In addition to examining heritage speakers' use of gender in both their languages in unilingual mode, we also test them in bilingual (code-switching) mode. To our knowledge, there are no studies regarding gender assignment in code-switching for this particular language combination. Most of

Previous Literature
This section summarizes and discusses the relevant literature on bilinguals speaking either Spanish, Dutch, or both, and specifically, their linguistic behavior when it comes to gender.

Dutch and Spanish Gender in the Speech of Heritage Bilinguals
Several studies on Spanish heritage speakers concerning the acquisition of Spanish gender have been carried out, most of which show that heritage speakers behave differently from monolingual speakers when it comes to gender assignment and agreement, although there are differences between studies depending on the tasks used and the profile of the speakers examined.
Previous studies on gender in heritage speakers have focused on adult heritage speakers in the US. Of these, many have included a comparison to L2 learners, to examine the effect of the age of onset. Montrul et al. (2008), using an oral picture description task, a written comprehension task, and written recognition of gender agreement, demonstrated that both adult heritage speakers and L2 speakers of Spanish are less accurate than monolinguals when it comes to gender assignment and agreement. Montrul et al. (2014) found similar results using a word repetition task (WRT), a grammaticality judgment task (GJT), and a gender monitoring task (GMT), in which the participants were asked to determine the gender of the target noun. , in contrast, reports that adult heritage speakers, unlike L2 speakers, approached the levels of monolingual native speakers in an oral picture description task and a written gender recognition task.
Several studies involving child heritage speakers of Spanish in the US also report differences in the development of gender in heritage Spanish as compared to monolingual acquisition. Montrul and Potowski (2007) report that English/Spanish bilingual children (both heritage and L2) between the ages of 6 and 11 who were enrolled in a two-way immersion program made more gender errors in a picture retelling task than monolingual native children. Cuza and Pérez-Tattam (2016) compared younger and older US-born child heritage speakers of Spanish from age 5 to 11 in a picture description task and found deviance from the monolingual groups for all different age groups. Several studies suggest that once acquired knowledge of gender can be subsequently lost again. Anderson (1999) followed two Puerto Rican siblings, who had immigrated to the US at age 2 and 4, respectively. Two years after arrival, only the youngest sibling deviated from age-appropriate expectations. Another two years later, both siblings diverged from age-matched monolinguals, and their error rates had moreover increased relative to the first round of data collection. Sánchez-Sadek et al. (1975) also report a loss of accuracy with gender in a cross-sectional study comparing child heritage speakers in kindergarten and in third grade. In this study, the older children performed less accurately than the younger ones. Goebel-Mahrle and Shin (2020), who looked at child heritage speakers and monolinguals in two different age groups using a corpus study, found no differences between child heritage speakers and monolinguals aged 5-6, but attested lower accuracy in older heritage speakers (age 9 to 11) compared to the younger ones.
Given that English, the societal language for the Spanish heritage speakers in the above-mentioned studies, lacks gender in the nominal domain, these results may be attributed to cross-linguistic influence. Some studies have also looked at heritage Spanish in contact with a societal language that does instantiate gender, such as German or Dutch. Irizarri van Suchtelen (2016) looked at the performance of adult Spanish heritage speakers in The Netherlands on Spanish gender agreement in oral production (a combination of spontaneous speech and video and picture description tasks) and found small differences between monolinguals and heritage speakers, though only those who grew up learning both languages simultaneously, and especially for agreement outside the DP, for instance with anaphora and predicative adjectives (see also van Osch et al. 2014). One longitudinal study on three child heritage speakers of Spanish in Germany (Kuchenbrandt 2005), using audio- and video-recorded unstructured play sessions, found no differences from monolingual children whatsoever. These children were monitored at a very young age though (ca. 1;2 until 2;3,30), during the initial development of the gender system. This small sample of studies thus seems to suggest that gender in heritage Spanish may be less vulnerable when the contact language also has gender, though more studies are needed. Moreover, when comparing heritage Spanish in the US to heritage Spanish in Europe, we have to be careful in attributing differences in outcomes to the influence from the other language, considering the important differences between the two continents regarding the prestige of Spanish as a minority language and the type (size/density) of the Spanish-speaking communities (see also van Osch and Sleeman 2018; Kupisch 2013; Kupisch and Rothman 2018).
As for language-external effects on the acquisition of gender in heritage Spanish, age of onset was included as a variable of interest by Montrul and Potowski (2007), but they found no difference between children who were exposed to English from birth and those who started acquiring English after age four. Cuza and Pérez-Tattam (2016) demonstrate that in the bilingual children they tested, the amount of Spanish used by the children with the father was an important predictor for their accuracy scores with gender. Irizarri van Suchtelen (2016) and van Osch et al. (2014) mention that adult heritage speakers who grew up with two Spanish-speaking parents showed less divergence than those who grew up in mixed families, indicating that a higher amount of exposure in the home results in more monolingual-like outcomes. Similar input effects have also been reported on the acquisition of gender in other HLs, such as heritage Russian in Norway (Rodina and Westergaard 2017).
Although it is sometimes (often implicitly) assumed that heritage speakers acquire the dominant language of society in a monolingual-like manner, this is not necessarily the case. Hulk and Cornips (2006) compared child heritage speakers (n = 14) of different languages with gender (French, Moroccan Arabic/Berber) and without gender (Turkish, Akan, Ewe, Sranan) in The Netherlands to monolingual peers (n = 6) with respect to Dutch gender in an elicited production task. Although Dutch gender is acquired relatively late, monolingual children of the oldest age group (9;3-10;5) showed complete acquisition, whereas the heritage speakers of the same age performed around or below chance level, overgeneralizing the common gender. A similar pattern was attested by Blom et al. (2008), for Moroccan-Arabic/Berber child heritage speakers in The Netherlands, using a sentence completion task. While these results seem to imply incomplete acquisition of gender, Cornips (2008) mentions that overgeneralization of the common gender may also be an identity marker for certain ethnic minority groups in The Netherlands.
Some studies of heritage speakers in The Netherlands have also looked into the role of external variables, such as the age of onset and the quality and quantity of exposure to the successful acquisition of Dutch gender. Cornips and Hulk (2008), who compared different studies of child heritage speakers in The Netherlands, identified both an early age of onset and a lengthy and intensive input as two extralinguistic success factors in the acquisition of Dutch grammatical gender. Unsworth et al. (2014), on the other hand, who ran an elicited production task with 137 heritage speakers of English in The Netherlands, found that while age of onset did not play an important role, various factors related to both the quantity and the quality of the input as well as language use by the children were important predictors of the children's gender accuracy in Dutch.
Finally, language-specific properties of the HL may affect Dutch gender acquisition, as is argued by Egger et al. (2018). They studied 21 Greek child heritage speakers in The Netherlands (aged 4;4-13;3) in a Greek language school with two elicited production tasks and an acceptability judgement task, and found out that gender acquisition in Dutch is accelerated-at least in the initial stages-by cross-linguistic influence from Greek, which-like Spanish-has a gender system that is acquired early and that is mostly predictable due to morphophonological cues.

Dutch and Spanish Gender in Code-Switched Speech
In code-switched speech, bilinguals can embed nouns from one language into the other as in (1) (example from the Bangor Miami corpus). If the language in which the noun is embedded has grammatical gender, the embedded noun needs to be classified in one of the gender categories. Since the seminal work of Poplack et al. (1982) on the factors that influence gender assignment to English loanwords in Puerto Rican Spanish, several studies have focused on identifying the strategies that influence gender assignment in different Spanish/English bilingual populations (e.g., Otheguy and Lapidus 2003; Balam 2016; Liceras et al. 2008, 2016; Valdés Kroff 2016; Królikowska et al. 2019; Balam et al. 2021). Bilinguals using a default gender strategy assign most embedded nouns to a single gender, the default, which is masculine in Spanish (Roca 1989) as can be seen in (2a), where the masculine determiner is used, even though the translation equivalent of table in Spanish would be mesa, which is feminine. Bilinguals using an analogical gender strategy assign the gender of the translation equivalent to the embedded noun, as in (2b). Finally, bilinguals may use phonological cues, if gender in the matrix language (the language in which the noun is embedded) is assigned according to certain phonological indicators, as is the case in Spanish. This strategy is exemplified in (2c), where umbrella, which is masculine in Spanish (el paraguas), is assigned feminine gender based on the ending in -a.

2.
a. el

Most research on gender assignment strategies in code-switching has focused on the oral production of Spanish determiner mixed NPs by (mostly adult) bilinguals of Spanish/English, with inconsistent findings. Jake et al. (2002), looking at Spanish/English adult bilinguals from the US, found that the analogical gender strategy is mostly used, while the masculine default strategy is applied to a lesser extent. In contrast to these findings, other studies have reported a preference for the masculine default in Spanish/English bilingual communities, such as Otheguy and Lapidus (2003) for bilinguals from New York, Balam (2016) for Northern Belize, and Valdés Kroff (2016) for Miami (using spoken data from the Bangor Miami corpus, Deuchar et al. (2014)). Valdés Kroff (2016) and Balam (2016) add that masculine gender was even assigned to nouns referring to a female person. The feminine gender, on the other hand, was assigned only to nouns with a feminine translation equivalent. Moreover, feminine-marked DPs were always preceded by a repetition/hesitation/disfluency. Królikowska et al. (2019) looked at elicited data from adult Spanish/English bilinguals from four different communities (San Juan (Puerto Rico), Granada (Spain), El Paso (Texas), and State College (Pennsylvania)), and showed that the code-switching strategy differed between them. While the bilinguals from San Juan and State College preferred a masculine default strategy, even when the translation equivalent was feminine, bilinguals from El Paso and Granada assigned more feminine determiners to nouns with feminine translation equivalents, indicating the employment of an analogical gender strategy. Królikowska et al. (2019) relate this difference to the amount of code-switching in a particular community: they suggest that the masculine default gender strategy is preferred in bilingual communities where code-switching occurs more frequently.
Some research has investigated gender assignment strategies in code-switched speech of child bilinguals, again with different outcomes. Liceras et al. (2008) compared child simultaneous Spanish/English bilinguals to adult L1 English and French L2 speakers of Spanish, and to L1 Spanish speakers of L2 English. Using a grammaticality judgment task for the adult participants and spontaneous production data from the child simultaneous bilinguals, they found diverging outcomes. The L1 speakers of Spanish seemed to apply the analogical gender strategy in most cases, while the L2 speakers of Spanish preferred the masculine default gender strategy. For the child simultaneous bilinguals, it remained unclear whether they preferred either one of the strategies. Liceras et al. (2012) looked at simultaneous and sequential bilingual children, using an oral acceptability judgment task, and found that sequential bilinguals preferred the analogical gender strategy, while simultaneous bilinguals adhered less to this strategy. Liceras et al. (2016) repeat that sequential bilinguals with Spanish as L2 prefer the masculine default strategy to the analogical gender strategy, and add that both child and adult L1 Spanish/L2 English speakers prefer the analogical gender strategy. Balam et al. (2021) analyzed narrative data from simultaneous child bilinguals in Miami (aged 7 to 8 and 10 to 11) in the CHILDES database (MacWhinney (2000), collected by Pearson (2002) using the Frog story (Mayer 1969)). Half of the children attended an English immersion program, and the other half a two-way bilingual program. The authors looked at the production of mixed nominal constructions in Spanish/English. In mixed nominal constructions, the masculine default gender strategy was used to assign gender to most English nouns with feminine translation equivalents by all groups, while the feminine gender was assigned infrequently.
They also report that the type of schooling does not affect the assignment of the feminine or masculine gender in mixed nominal constructions, as all groups behaved similarly. They point out that comparative research on different child bilingual populations is necessary to gain insight into the gender assignment strategies they employ in code-switched nominal constructions, which is what we aim to do in the present study, with bilinguals of a different language pair: Dutch/Spanish.
Few studies have looked at Spanish in combination with a language other than English. One study by Bellamy et al. (2018) looked at early sequential Purepecha/Spanish bilinguals (aged 15 to 45, Purepecha L1). Purepecha, like English, has no gender. They used a director-matcher task (cf. Gullberg et al. 2009) to elicit code-switched constructions and an online alternative forced-choice judgment task. In production, there was a preference for the masculine default strategy. However, in the judgment task, participants were influenced by the word ending. The ending -a of Purepecha nouns, which matches feminine gender marking in Spanish, led them to prefer a Spanish feminine determiner, even when the translation equivalent was masculine, providing support for the use of phonological cues in gender assignment in code-switching. More support for this type of gender assignment strategy is offered by a study by Parafita Couto et al. (2015), which looked at gender assignment strategies in mixed nominal constructions in Spanish/Basque. Adult simultaneous and sequential bilinguals were examined using naturalistic data, a director-matcher task, and an auditory judgment task. The results indicated that the feminine gender is the most frequently attested gender in adult spontaneous production and is also the preferred gender in the judgment task. This is probably due to the frequent word ending -a (as in Purepecha), thus the feminine gender was assigned frequently due to this phonological cue. The preference for feminine was also attested in Iriondo Etxeberria's (2017) study on Basque-Spanish bilinguals, as reported in Ezeizabarrena and Munarriz-Ibarrola (2019). However, Badiola and Sande (2018) reported a preference for the masculine default in Basque-Spanish using an acceptability judgement task, even though they also observed a feminine preference with nouns with lexical -a.
Other studies on Basque-Spanish observed no gender preference (see Ezeizabarrena 2009 for child production and Ezeizabarrena and Munarriz-Ibarrola (2019) for adult judgments). Given that these studies on Spanish-Basque mixed DPs revealed conflicting results that may be accounted for by the linguistic profile of the participants or sociolinguistic factors, Munarriz-Ibarrola et al. (2019) designed a forced-switch elicitation task to elicit mixed DPs with a Spanish determiner and a Basque noun. They tested 30 Spanish/Basque bilinguals with different profiles and sociolinguistic backgrounds. Their analysis revealed participants' L1 as a strong factor in the variability attested: L1 Spanish speakers relied predominantly on the analogical criterion, whereas speakers with only Basque as L1 followed mainly the phonological criterion.
Since the studies on code-switching in Spanish mentioned above have focused on Spanish in combination with a language that lacks gender (i.e., English, Purepecha, and Basque), it is interesting to look at Spanish in contact with a language that does instantiate gender. A study by Eichler et al. (2012) studied the code-switched speech of 14 bilingual children of German and a Romance language (French, Italian, or Spanish) and two Italian-French bilingual children by analyzing videotaped spontaneous conversations. They found that the children preferred to assign a default gender to nouns as they code-switch, but also found evidence for an analogical strategy. There was no evidence for a difference in code-switching strategies between balanced and unbalanced bilinguals. However, French-German bilinguals in Radford et al. (2007), who analyzed longitudinal data from four bilingual children (between ages 1;5 and 5;1) in Germany, did rely on the analogical criterion. When children produced mixed nominal constructions with French articles and masculine or feminine German nouns, they tended to match the gender of the French article to the gender of the German noun. However, they also report the use of the masculine default gender with neuter German nouns.
Research on code-switching in Dutch in contact with other languages, albeit scarce, also shows different strategies being employed. In Clyne's (1977; see also Clyne and Pauwels 2013) corpus of elicited production data of 200 English/Dutch bilinguals (heritage speakers and L2 learners) in Australia, a tendency towards the analogical gender strategy based on translation equivalence is attested for English insertions in Dutch. However, a common gender default is the main strategy, which is explained by the phonetic similarity between the Dutch determiner [də]. Treffers-Daller (1993), who studied 34 Dutch/French bilinguals in Brussels by analyzing natural speech and elicited production data, found a strong preference for an analogical strategy based on the gender of the noun in the original language. The French gender system, which distinguishes masculine and feminine gender, partly overlaps with the Brussels Dutch system-unlike Standard Dutch-which distinguishes three genders, i.e., masculine, feminine, and neuter. The Dutch/French bilinguals assign the gender of French nouns to French noun insertions in Dutch. Nouns that are, for example, masculine in French receive Dutch masculine gender when inserted in Dutch (instead of the gender of the Dutch translation equivalent). This different analogical strategy occurred besides a neuter default strategy, which was used to a lesser extent. Finally, Boumans' (1998) analysis of naturalistic speech recordings of 15 Moroccan Arabic/Dutch bilinguals in The Netherlands (heritage speakers and L2 learners) found few Moroccan Arabic nouns embedded in Dutch, all of which related to culturally specific concepts or repetitions from the immediately preceding discourse. Common gender was assigned to all of these nouns where agreement features surfaced, but it must be noted that the culture-specific nouns do not have clear translation equivalents in Dutch. In conclusion, these bilinguals resorted to a common default strategy when inserting Moroccan Arabic nouns without a clear translation equivalent.

Research Questions and Hypotheses
The objectives of this paper are (i) to study the extent to which Spanish heritage speakers produce target-like gender in the two unilingual speech modes (Dutch and Spanish), (ii) to detect gender assignment strategies in code-switched speech, (iii) to compare heritage speakers' gender assignment in the different speech modes and code-switching directions, and (iv) to explore the effect of extralinguistic factors such as age, language exposure in and outside the home domain, exposure to Spanish media, age of arrival in The Netherlands, and length of residence in The Netherlands on heritage speakers' gender assignment, both in unilingual and in code-switching mode.
Our research questions are: RQ1. Are Spanish heritage speakers in The Netherlands target-like in both languages in their use of the two gender systems?
RQ2. How do Spanish heritage speakers resolve gender conflict sites in code-switched nominal constructions?
RQ3. Do any extralinguistic factors related to input and usage of Spanish and Dutch modulate the speakers' behavior?
Based on previous research (e.g., Montrul and Potowski 2007; Montrul et al. 2008; Cuza and Pérez-Tattam 2016; Goebel-Mahrle and Shin 2020), we hypothesize that the heritage speakers of Spanish in this study will deviate from what is typically reported for the acquisition of gender in monolingual Spanish, in particular when it comes to feminine and non-canonical nouns. However, the deviance may be less pronounced than what is generally reported for studies in the US (cf. van Osch et al. 2014; Irizarri van Suchtelen 2016).
Since Dutch can be assumed to be the dominant language for the participants in this study, we expect less deviance with Dutch gender than with Spanish gender. It may even be the case that the fact that their HL has gender as well has a positive effect on their acquisition of Dutch gender (cf. Egger et al. 2018). On the other hand, they may also overgeneralize common gender, which would be in line with other previous studies such as Hulk and Cornips (2006) and Unsworth et al. (2014).
We moreover expect to see gender accuracy in both languages modulated by language-external factors that have been previously reported in the literature, such as age of onset (Cornips and Hulk 2008), language use (Cuza and Pérez-Tattam 2016), and amount of exposure (Cornips and Hulk 2008;Irizarri van Suchtelen 2016;Unsworth et al. 2014).
As for the code-switching mode, we expect to find evidence for both a masculine default strategy (e.g., Bellamy et al. 2018) and the analogical gender strategy when inserting Dutch nouns into Spanish, potentially depending on the profile of the speaker. Liceras et al. (2008, 2016) found that speakers who had Spanish as an L1 used the analogical gender more than speakers who had Spanish as their L2. Even though all speakers in our study are L1 speakers of Spanish, we may expect to find a possible division between these two strategies based on differences in language dominance between individual speakers. For Spanish noun insertions into Dutch, we expect to see either a common default strategy (cf. Boumans 1998), a neuter default strategy (cf. Treffers-Daller 1993), or a combination of a default and analogical strategy (cf. Clyne 1977; Clyne and Pauwels 2013), depending on the speaker.

Materials
In order to elicit production data, we designed a director-matcher task (cf. Gullberg et al. 2009). In this task, two people (a director and a matcher) sit across from each other at a table with a divider between them, in the form of a cardboard box or a large book, so that neither can see the other's part of the table. Each participant has the same set of 30 cards with images of 15 different objects in four different colors. The director instructs the matcher to put the images in the same order as the randomly ordered set laid out before the director. This method is used to elicit nominal constructions consisting of a determiner, noun, and adjective (for example: "next to the black cross is a yellow candle").
In the present study, we tested the default gender strategy and the analogical gender strategy and did not consider the phonological cues strategy because Dutch has very few words ending in -a or -o that are not borrowed from or present in Spanish, and because Dutch does not assign gender based on phonological cues. To illustrate hypothetical strategies in Spanish/Dutch code-switching, Table 3 provides two examples of Dutch nouns embedded in Spanish. Bilinguals using a default strategy would assign all nouns to one gender category (for instance masculine gender in Spanish). If the analogical gender strategy is used, masculine gender would be assigned to hamer (cf. martillo (masc.) 'hammer'), while feminine gender would be assigned to huis (cf. casa (fem.) 'house').

Table 3. Examples of potential gender assignment strategies with Dutch nouns embedded in Spanish.

Strategy | 1. hamer 'hammer' | 2. huis 'house'
Default gender (masc.) | el hamer | el huis
Default gender (fem.) | la hamer | la huis
Analogical gender | el hamer | la huis
Spanish equivalent | el martillo (masc.) | la casa (fem.)

Table 4 provides two examples of Spanish nouns embedded in Dutch, illustrating all hypothetical strategies. As in Spanish, bilinguals using a default strategy would assign all nouns to one gender category, which is common or neuter in Dutch. If the analogical gender strategy is used, common gender would be assigned to martillo (cf. hamer (common) 'hammer'), and neuter gender would be assigned to casa (cf. huis (neuter) 'house').

The objects depicted in the images were counterbalanced for the gender of the noun in Dutch, the gender in Spanish, and the canonicity in Spanish. In other words, half of the objects were masculine in Spanish and the other half were feminine; half were common in Dutch and the other half were neuter; and half were canonical and the other half were non-canonical in Spanish. In order to test which code-switching strategies were used, each combination of these three variables was represented by two nouns (see Table 5), except for nouns that were neuter in Dutch and had a non-canonical feminine translation equivalent, since only one depictable object with this combination of variables was found. The gender (both Dutch and Spanish) and canonicity of synonyms were also taken into account during the design of the study, in order to keep the stimuli counterbalanced for all participants. Given the diverse background of the heritage bilinguals in our study, the objects were controlled as much as possible for lexical variation in the different Spanish dialects (cf. Balam et al. 2021). In most cases, when participants used different words in Spanish, they were both of the same gender (e.g., gorro (masc.) or sombrero (masc.) for 'hat'), in which case the variation did not affect the results.
The only object for which the gender and canonicity of the translation into Spanish differed between participants was the comb, which in the Spanish unilingual mode was mostly translated as peine (masc.), but 7 times as peinilla (fem.) and once as cepillo (masc.).

Table 5. Objects in the Director-Matcher task according to the different gender and shape variables.

Common Gender Neuter
Canonical masculine hamer/martillo 'hammer' hoed/sombrero (or gorro) 'hat'

The images appeared in four colors: three whose Spanish color adjectives inflect for gender (negro/-a 'black', blanco/-a 'white', and rojo/-a 'red'), and one whose adjective does not (verde 'green') 4. Each of the images occurs twice, each time in a different color. The images that were used as stimuli in the task can be found in Appendix A.
The participants performed the task four times. The first time, participants were instructed to only use Spanish, and the second time only Dutch, to elicit nominal constructions in the unilingual modes. The third time, they were instructed to only use Spanish, except for the object itself, which had to be said in Dutch. Finally, participants were asked to use Dutch, but had to say the object in Spanish. This way, code-switched nominal constructions were elicited in both directions, a method used before by Bellamy et al. (2018).

4 This study also included heritage speakers of Papiamento and Turkish, who did the same director-matcher task as the Spanish speakers in order to compare the results of speakers of different HLs. In an ideal situation, four color adjectives that inflect for gender in Spanish would have been chosen. However, because Papiamento borrowed many of its color adjectives from Dutch, several color adjectives that do inflect in Spanish could not be used, as this would complicate Dutch-Papiamento code-switching. Three color adjectives that do inflect for gender in Spanish (black, white, red) remained, and a fourth one (green) was picked because the Papiamento word is different from Dutch.
All participants completed a background questionnaire, mainly aimed at their language history, education, language use, and exposure to Spanish. They were also asked to judge their own proficiency in Spanish in speaking, reading, writing, and listening (or, in the case of child participants, the parents reported their child's proficiency) 5. For participants younger than 12 years, this questionnaire was filled in by (one of) the parents. The questionnaire for these younger participants also included questions on the parents' language history, education, and language use within the family.

Procedure
Before the experiment started, the participants were asked in which language they preferred the oral instructions, Dutch or Spanish. After that, they were informed that the task would be audio recorded, and before starting the task, they (or their parents in the case of underaged participants) were requested to sign a consent form. Other present participants were asked to leave the room if they had not completed the task yet. Next, the participants were instructed to play the first two rounds in the unilingual modes. The participants played the game with the instructor, a parent, or someone else, but not with another participant. When they finished the first two rounds, they were instructed to play the third and fourth round in the code-switching modes. After completing all four rounds of the task, the participants (and their parents) were asked to fill in the background questionnaire. Both the consent form and background questionnaire were filled out in the language of their preference. At the end of the procedure, the participants received compensation for their participation in the form of a toy for children, and a monetary compensation for teenagers and adults.

Participants
Twenty-one heritage speakers of Spanish (6 male, 15 female; for an overview, see Table 6), who were born in The Netherlands or arrived during their primary education, were recruited. The participants were either born in a Spanish-speaking country or had one or two Spanish-speaking parents. They lived across The Netherlands, mainly in the western part (Randstad area). Their ages ranged from 8 to 52 years (mean = 17).
The heritage countries of the participants and/or their families included a range of countries in Latin America (Mexico, Colombia, Peru, Argentina, Ecuador, and Paraguay, among others) and Spain.
Nine participants had lived in The Netherlands their entire lives. Two others were born in The Netherlands but spent a few years abroad (when aged 4-6 and 11-15). Four arrived before their primary education (which is obligatory from the age of 5 in The Netherlands), and the remaining six arrived between the ages of 7 and 12. As mentioned, age of onset in Dutch was not taken as a cut-off point to distinguish heritage speakers from child L2 learners in this study; it was rather treated as a variable of interest to be included in the analyses.
Ten participants were born in families with a Spanish-speaking mother and a Dutch-speaking father, and five in families with two Spanish-speaking parents. Two participants had one Spanish-speaking parent and a parent speaking another language (German and English). The other four participants reported having one or two parents with whom they spoke both Dutch and Spanish, three of whom indicated that (one of) their parents moved to The Netherlands as a child, meaning that the participants were third-generation heritage speakers.

Self-reported usage of Dutch within and outside the immediate family was included in the background questionnaire. For participants younger than 21, the immediate family consisted of parents and siblings, while for participants older than 21, immediate family members were their partner and children. Within the immediate family, usage of Dutch ranged from 0% to 100% (mean = 55%), as did usage of Spanish (mean = 41%). Usage of Dutch outside the immediate family was at least 4%, and at most 100% (mean = 68%), while usage of Spanish outside the immediate family ranged from 0% to 96% (mean = 29%).

5 To reduce the time of the experimental procedure, we did not include a separate measure of general proficiency, and used the participants' self-reports for our analyses. Previous research with heritage speakers of Spanish in The Netherlands (van Osch 2019) has shown self-reports to correlate significantly with other measures of proficiency such as the DELE (Diploma Español de Lengua Extranjera) and lexical decision tasks. Proficiency in Dutch was not measured, as we assumed the speakers to be monolingual-like in their dominant language.
Parents were asked to report on the amount of input they provided to their children in Spanish and Dutch, both in the past and at the time of testing. Children's input when aged 0-4 ranged from 0% to 53% for Dutch (mean = 33%), and from 48% to 100% for Spanish (mean = 60%). The amount of Dutch input provided by the parents at the time of testing was between 0% and 55% (mean = 38%), and the current Spanish input between 40% and 100% (mean = 55%).
Other input from Spanish-language media such as television, books, music, and social media was reported in hours per week. Self-reports ranged from 0 to 33 h of Spanish media input per week (mean = 12.1 h).
Participants were asked to report their language skills (reading, writing, listening, speaking) on a scale from 1 (basic) to 3 (advanced). The means of these skills ranged from 1 to 3 (mean = 2.4).
Five participants, four of whom were adults and one teenager, took or had taken classes in the HL, while the other participants did not.
Participants reported the frequency of their visits to their heritage country on a scale from 1 (never) to 4 (once or multiple times a year). One participant never visited the heritage country, while the other participants visited less than once every five years (2) or once or multiple times every five years (3) (mean = 2.4).
Finally, participants were asked to provide information on any cognitive/learning problems. Two participants reported having ADHD (1), one had reading and reading comprehension problems (2), and one had dyslexia (3).
The experiment followed the Ethics Code for linguistic research in the faculty of Humanities at Leiden University, which approved its implementation.

Coding
The director-matcher task, which consisted of four parts where different types of speech (unilingual Dutch, unilingual Spanish, Dutch with embedded Spanish noun, Spanish with embedded Dutch noun) were elicited, sometimes caused confusion, which led to participants using a non-target speech mode during the experiment. Unilingual nominal constructions uttered in one of the two code-switching parts (n = 40) were excluded from the analysis, since they do not represent unilingual speech but rather the embedding of full nominal constructions from one language in the other. Code-switched nominal constructions that were not in the right directionality for a given mode, however, were included.
Some participants referred multiple times to the same object, often in constructions like "next to the A is the B, and next to the B is the C". Both nominal constructions were included in the analysis. Nominal constructions not referring to the target objects, such as "the next card" or "the final row" were included as well, as they also provided relevant information about grammatical gender.
Whenever a participant corrected him/herself, the final utterance was coded and used for the analysis. We also coded that the phrase had been corrected and whether this correction resulted in a target-like gender assignment. Repetitions were marked as well. The coded data sheets are available as Supplementary Materials.

Results
In this section, we describe the results in unilingual Dutch mode, unilingual Spanish mode, and code-switching mode, in terms of both linguistic and extralinguistic effects. Whenever statistical analyses were possible, generalized linear mixed effects models were fitted using the lme4 package in R (R Development Core Team 2019). Nested models were compared by adding or removing potentially significant predictors one at a time (stepwise regression) and performing log-likelihood ratio tests. A significant difference indicates that the model with the additional predictor variable fits the data better. Random intercepts and slopes for "subject" and "item" (the object that had to be described) were also included if these significantly improved the model (following Baayen et al. 2008). Throughout the paper, p-values of all significant predictors in the optimal models are reported.
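The model comparison step described above can be sketched as follows. This is a minimal illustration in Python rather than the R/lme4 workflow used in the study, and it assumes that the two nested models differ by exactly one parameter. The function name and the log-likelihood values are hypothetical and serve only to show how the test statistic and p-value are obtained.

```python
import math


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float) -> tuple[float, float]:
    """Compare two nested models that differ by exactly one parameter.

    Returns the chi-square statistic and its p-value (1 degree of freedom).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the fuller model cannot have a lower log-likelihood")
    # With 1 degree of freedom, the chi-square survival function has the
    # closed form erfc(sqrt(x / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, invented purely for illustration.
stat, p = likelihood_ratio_test(loglik_reduced=-250.0, loglik_full=-245.0)

# The additional predictor is retained only if it significantly improves the fit.
keep_predictor = p < 0.05
```

With more than one added parameter, the closed form above no longer applies and a general chi-square distribution function would be needed.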
6.1. Unilingual Spanish

6.1.1. Linguistic Variables

The total of 913 DPs uttered in the Spanish unilingual mode included combinations of a determiner+noun (180 instances), noun+adjective (50 instances), or determiner+noun+adjective (683 instances). Of the DPs containing an adjective, a considerable number (192) contained a non-canonical adjective, such as verde ('green'), for which the form of the masculine and the feminine adjective do not differ. Of those, 13 cases, which contained only a non-canonical adjective and no determiner, were excluded from the analysis, as it was impossible to determine the gender for those constructions. For the remaining 900 produced DPs, the overall gender accuracy was 92.22% (830 of 900). Out of the 70 incorrect utterances, 29 were errors with just the adjective, 11 with just the determiner, and 30 with both the determiner and the adjective. Overall, accuracy was higher with masculine nouns (96.92%; 472 of 487) than with feminine nouns (86.68%; 358 of 413), and higher with canonical nouns (96.2%; 532 of 553) than with non-canonical nouns (85.88%; 298 of 347). A generalized linear mixed effects model was run to check whether the effects of target gender and canonicity were statistically significant. This was indeed the case for both factors (β = 2.19, SE = 0.54, z = 4.06, p < 0.001 for target gender and β = 2.12, SE = 0.55, z = 3.83, p < 0.001 for canonicity). These effects are illustrated in Figure 1.
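The accuracy figures above follow directly from the reported counts, as the short Python sketch below illustrates. The variable and function names are our own and not part of the original analysis. The Wald z statistic is computed here from the rounded estimate and standard error printed in the text, so it only approximates the reported value.

```python
# (correct, total) counts as reported in the text for unilingual Spanish.
counts = {
    "overall": (830, 900),
    "masculine": (472, 487),
    "feminine": (358, 413),
    "canonical": (532, 553),
    "non-canonical": (298, 347),
}


def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of target-like DPs, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


accuracy = {label: accuracy_pct(c, t) for label, (c, t) in counts.items()}

# Each two-way split should add up to the overall counts.
for first, second in (("masculine", "feminine"), ("canonical", "non-canonical")):
    correct = counts[first][0] + counts[second][0]
    total = counts[first][1] + counts[second][1]
    assert (correct, total) == counts["overall"]


def wald_z(estimate: float, std_error: float) -> float:
    """Wald z statistic: the estimate divided by its standard error."""
    return estimate / std_error


# Inputs are the rounded values printed in the text, so the quotient
# only approximates the reported z for target gender.
z_target_gender = wald_z(2.19, 0.54)
```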

Extralinguistic Variables
For the analysis of the extralinguistic factors determining gender accuracy, another set of generalized linear mixed effects models was run with gender accuracy as the dependent variable, in which the following fixed factors were introduced stepwise to the model: the age of arrival in The Netherlands, the number of years spent in The Netherlands and in the country of origin, the usage of both languages with immediate family and other contacts, the number of hours of 'other' input (that is, exposure through TV, social media, music, and books) in Spanish per week, the self-reported proficiency (or in the case of child participants, the proficiency as reported by the parents) in Spanish (averaged over reading, writing, listening, and speaking), and whether or not they had received some formal instruction in Spanish. The best-fitting model included random intercepts for item and subject, as well as two main effects, which were both statistically significant: use of the HL with immediate family (β = 4.26, SE = 1.65, z = 2.57, p = 0.01) and the amount of other input per week (β = 0.12, SE = 0.04, z = 3.19, p = 0.001) 6.
The first effect indicates that the more a heritage speaker uses the HL with his/her parents and siblings, or in the case of some adult speakers with their partner and/or children, the better their accuracy in gender in Spanish, as can be seen in Figure 2. This graph shows that the three speakers with the highest amount of use are among the highest-achieving heritage speakers. Moreover, the speakers who obtained accuracy scores below 80% all reported using their HL 50% or less with their immediate family.
The number of hours of other exposure (that is, exposure through books, TV, social media, or music) also shows a positive correlation with gender accuracy in Spanish. As illustrated in Figure 3, the more hours a heritage speaker is exposed to this type of input in Spanish, the higher their accuracy with gender in Spanish.

Another model search was performed with the data of the child participants only, as the questionnaire for this group contained some additional questions. The final model included random intercepts for both subject and item, as well as two significant main effects: a positive effect of the number of years they had spent in the heritage country (β = 0.99, SE = 0.28, z = 3.49, p < 0.001) and a negative effect of the amount of previous input (between age 0 and 4) in Dutch from their parents (β = −4.98, SE = 1.83, z = −2.716, p = 0.006) 7. These effects can be seen in Figures 4 and 5.

6 Even though the effect size of 'other input in Spanish' is considerably lower than that of 'use of the HL with immediate family', we consider the high z-value and low p-value for the former variable to indicate that it is indeed a meaningful result deserving of mention.

7 Even though the effect size of 'number of years spent in the heritage country' is considerably lower than that of 'previous parental input', we consider the high z-value and low p-value for the former variable to indicate that it is indeed a meaningful result that deserves mention.

Figure 5. Relation between previous input by the parents in Dutch (age 0-4) and gender accuracy in unilingual Spanish mode (children only).

Linguistic Variables
In total, 727 DPs were uttered in the Dutch unilingual mode. These included combinations of a determiner+noun (31 instances), noun+adjective (37 instances), or determiner+noun+adjective (659 instances). In five cases, the gender of the DP could not be unambiguously determined, because the indefinite determiner (een-'a(n)') was used without an adjective. The remaining 722 cases were included in the analysis.
Dutch gender was target-like in 83.38% of all cases (602 of 722). Of the 120 incorrect utterances, 51 were errors with just the determiner, 66 with just the adjective (all of which had an indefinite determiner), and 3 with both the determiner and the adjective. As can be seen in Figure 6, the error rate was particularly high in neuter nouns (only 65.24% accuracy), compared to common gender nouns (98.48%). Within neuter nouns, accuracy was 69.65% for indefinite nouns and 58.27% for definite nouns. After a model search comparing generalized mixed effects models, the best-fitting model included a significant main effect of target gender (β = 6.22, SE = 2.50, z = 2.49, p = 0.01) 8, indicating that the difference between neuter and common nouns was indeed significant. On the other hand, neither the effect of definiteness nor the interaction between these two variables turned out to be significant.

8 An anonymous reviewer drew our attention to the relatively large standard error for this effect. However, given the large estimate, we consider this effect to be indeed a meaningful result that deserves mention.

Extralinguistic Variables
A separate model search was done to check whether any extralinguistic variables were significant predictors for gender accuracy in Dutch. The same factors were taken into consideration as for Spanish: the age of arrival in The Netherlands, the number of years spent in The Netherlands and in the country of origin, the usage of Dutch and the HL, both with immediate family and with non-immediate family and other contacts, the number of hours of 'other' input in Spanish (reading, social media, TV, films, and music) per week, the general self-reported proficiency in Spanish, and whether they had received formal instruction in their HL. These variables were introduced to the model in a stepwise manner, and each new model was compared to the previous one. The best-fitting model included random intercepts for both subject and item, as well as the following statistically significant variables: age of arrival (β = −0.33, SE = 0.11, z = −2.96, p = 0.003), self-reported proficiency in Spanish (β = 3.84, SE = 0.96, z = 4.0, p < 0.001), and amount of other input per week (β = −0.14, SE = 0.05, z = −2.60, p = 0.009).
The first effect indicates that the earlier a heritage speaker arrived in The Netherlands, the better their accuracy in gender in Dutch, as can be seen in Figure 7.

The effect of self-reported proficiency in Spanish indicates, perhaps surprisingly, that those who report a higher proficiency in Spanish tend to obtain higher accuracy scores in Dutch gender, as illustrated in Figure 8.

Finally, the effect of other input indicated that the less heritage speakers were exposed to Spanish through reading, TV, music, and social media, the better their accuracy in gender in Dutch (Figure 9).

For the child participants (age 12 or younger), the questionnaire also included questions about previous (from age 0 to 4) and current input from their parents in Dutch and Spanish. A separate model search was carried out with the data from these child participants only (n = 8). The final model included random intercepts for subject and item as well as a significant effect of average reported skill in the HL (β = 4.55, SE = 0.99, z = 4.58, p < 0.001) (Figure 10), which was also present in the full model.

We suspected the effect of HL skill in the full model may have been driven by the effect in the children's group, so we performed a separate analysis on the data for the adults and teens only, and indeed, including the reported proficiency in Spanish did not improve the model. The final model for these data included, besides random intercepts for subject and item, a significant effect of age of arrival (β = −0.43, SE = 0.17, z = −2.42, p = 0.02), which was also present in the full model, and an effect of length of residence in The Netherlands (β = −0.25, SE = 0.12, z = −2.16, p = 0.03), which was not present in the full model. The effect of hours of other exposure was not included in the final model either, indicating it only applied to the participant group as a whole.
Finally, to check whether the participants' performances in gender in both languages were related, a Pearson correlation analysis was performed on the participants' mean accuracy scores in both unilingual modes. This correlation was not statistically significant (r(19) = −0.26, p = 0.19).

Spanish-Dutch Mode
In the Spanish-Dutch mode, participants had to perform the task in Spanish, but name just the object in Dutch. For the analysis, all constructions containing a Dutch noun for which either the determiner, the adjective, or both could unambiguously indicate the assigned gender in Spanish were taken into consideration. Ambiguous cases were those where the gender of the determiner and of the adjective did not match (e.g., el huis roja 'the.M house red.F'), or when a non-canonical adjective, such as verde 'green' was used without a determiner or in combination with a Dutch determiner.
In total, there were 57 of these ambiguous cases, leaving 649 cases that could be analyzed. In the majority of these (502 cases), a Spanish determiner and a Spanish adjective were used. In those cases where the determiner was absent (48 cases), or a Dutch determiner was used (47 cases), the form of the adjective was taken as an indicator of the gender of the DP. There were also 20 cases in which a Spanish determiner was used in combination with a Dutch adjective, in which case the gender of the determiner determined the gender of the construction.
Of all 649 unambiguous cases, the vast majority (538 cases, 82.90%) was assigned masculine gender, while feminine gender was used in only 111 cases (17.10%). In 448 cases (69%), the gender assigned to the DP matched that of the translation equivalent in Spanish, compared to 201 cases (31%) in which it did not.
The overall preference for masculine gender over feminine gender already suggests a default masculine strategy, but this becomes even clearer if we compare the use of masculine and feminine among the 'mismatch' cases only (i.e., cases where the assigned gender did not match the gender of the translation equivalent): of all 201 mismatch cases, 191 (95%) were assigned masculine gender, and only 10 (5%) were assigned feminine gender.
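The proportions reported above follow directly from the raw counts. The short sketch below recomputes them as a consistency check; the counts are taken from the text, while the helper name `pct` is our own.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)

total = 649                       # unambiguous DPs containing a Dutch noun
masculine, feminine = 538, 111    # gender assigned to the DP
match, mismatch = 448, 201        # agreement with the Spanish translation equivalent
mismatch_masc, mismatch_fem = 191, 10

# The subcategories should add up to their totals.
assert masculine + feminine == total
assert match + mismatch == total
assert mismatch_masc + mismatch_fem == mismatch

print(pct(masculine, total), pct(feminine, total))
print(pct(match, total), pct(mismatch, total))
print(pct(mismatch_masc, mismatch), pct(mismatch_fem, mismatch))
```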
Most cases (499) followed the prototypical Spanish word order, which is a postnominal adjective. There were also 118 cases with a prenominal adjective (which is also possible, though less common, in Spanish), 29 cases in which a DP construction with a relative clause was used (un boek que es blanco 'a book that is white'), which were all produced by the same participant, and 3 cases with no adjective (all masculine), which were also produced by the same person and will be left out of the analysis. We mention word order because it seems that participants behaved somewhat differently when they did not apply noun-adjective word order. First of all, the feminine was hardly used when the adjective was prenominal, and not at all in the DP + relative clause constructions. Moreover, the preference to use the gender that matches that of the translation equivalent seems to be lower, or even absent, in these cases, as can be seen in Figure 11.
To test which, if any, of these differences were significant, a statistical analysis was conducted on all unambiguous DPs containing a Dutch noun. Due to the low number of cases, the DP + relative clause constructions and the cases without an adjective were excluded. The dependent variable was the gender assigned to the DP, and the independent variables were the gender of the translation equivalent in Spanish and the word order, which were added stepwise to the model to see whether one of these significantly improved the model fit. The final model had a statistically significant intercept (β = 3.25, SE = 0.63, z = 5.15, p < 0.001), which indicated that the masculine determiner was used significantly more often across the board than the feminine one, suggesting a default strategy, given that masculine and feminine nouns were evenly distributed across the objects. Furthermore, it included a significant effect of the gender of the translation equivalent (β = 4.15, SE = 0.45, z = 9.12, p < 0.001), showing that the masculine determiner was significantly more likely to be produced in the case of a masculine translation equivalent, and a feminine determiner in the case of a feminine translation equivalent, in line with the analogical gender strategy. Adding the effect of word order did not improve the model significantly.
An additional analysis was performed on a subset of the data that contained only the mismatch cases, to check whether significantly more mismatch cases resulted in the assignment of masculine gender than in the assignment of feminine gender (cf. Balam et al. 2021). A model was run on these data, with the assigned gender as the dependent variable, and without fixed effects, only random effects. The intercept of this model was significant (β = 14.49, SE = 2.82, z = 5.15, p < 0.001), indicating that, within this subset, the masculine gender was assigned more often than the feminine gender, thus confirming the default masculine strategy.
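As a rough cross-check of the direction and strength of this preference, one can run a plain exact binomial test on the 191 masculine versus 10 feminine mismatch cases. This is not the model reported above: it ignores the random effects for subject and item and treats all cases as independent, so it is only a simplified sketch. The function name is our own.

```python
from math import comb

def binom_two_sided_p(successes, n):
    """Exact two-sided binomial p-value against a 50/50 null.

    Under p = 0.5 the distribution is symmetric, so the two-sided
    p-value is twice the larger-count tail (capped at 1).
    """
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1))
    return min(1.0, 2 * upper_tail / 2 ** n)

# 191 of the 201 mismatch cases were assigned masculine gender
p_value = binom_two_sided_p(191, 201)
print(f"{p_value:.3g}")
```

Even under this simplification, the imbalance is far too large to be compatible with an equal preference for the two genders.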
[Figure 11: number of cases with a translation equivalent match vs. a translation equivalent mismatch.]
An anonymous reviewer wondered about the effect of the Dutch gender on the gender assignment in Spanish. There was no difference: Both common and neuter nouns were assigned masculine gender in 83% of the cases and feminine gender in 17% of the cases. Including Dutch gender also failed to improve the statistical model significantly.

Spanish-Dutch Mode: Individual Differences between Participants
For each participant, we checked (1) in how many cases masculine and feminine gender were assigned; (2) in how many cases the gender matched that of the translation equivalent; and (3) in how many cases each construction (prenominal adjective, postnominal adjective, or DP containing a relative clause) was used. Based on these numbers, which are summarized in Table 7 below, one or more strategies could be determined for each participant. For instance, if a participant used (almost) exclusively masculine gender (regardless of whether this matched the gender of the translation equivalent), it was assumed that a default strategy was used. If a participant used both genders, but considerably more in the match condition than in the mismatch condition, this participant was considered to have applied the analogical gender strategy. If a participant almost exclusively used the construction containing a relative clause with a masculine adjective, then this was considered to be the main strategy of this participant. In several cases, a mix of two strategies was observed. For instance, if a participant used masculine gender considerably more than feminine, but also produced more cases in which the gender assigned to the DP matched that of the translation (for both genders), then this was considered as a mix between a default strategy and the analogical gender strategy. The strategies are indicated in the table with the following codification:
By far the most commonly used strategies were the masculine default (used by 11 participants) and the analogical gender strategy (used by 10 participants). Five of these participants used a mix of these two strategies. Then, there were four people who predominantly used the prenominal adjective with a masculine default, and one person who only used the construction including a relative clause, always with masculine gender. The use of a feminine default was not attested.
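The decision rules used to assign strategies to participants can be sketched as a small classifier. The text describes the criteria only qualitatively ("(almost) exclusively", "considerably more"), so the two numeric cut-offs below are hypothetical, as are the class and function names. The sketch returns descriptive labels rather than the numeric codes of Table 7.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the text gives no numeric thresholds.
ALMOST_ALL = 0.90
CONSIDERABLY_MORE = 0.65

@dataclass
class Counts:
    masculine: int
    feminine: int
    match: int            # assigned gender equals that of the translation equivalent
    mismatch: int
    postnominal: int
    prenominal: int
    relative_clause: int

def classify(c: Counts) -> list[str]:
    """Return the strategy label(s) suggested by one participant's counts."""
    total = c.masculine + c.feminine
    judged = c.match + c.mismatch
    constructions = c.postnominal + c.prenominal + c.relative_clause
    if total == 0 or judged == 0 or constructions == 0:
        return []
    masc_share = c.masculine / total
    match_share = c.match / judged
    # Word-order based strategies take priority when one construction dominates.
    if c.relative_clause / constructions >= ALMOST_ALL and masc_share >= ALMOST_ALL:
        return ["relative clause + masculine default"]
    if c.prenominal / constructions >= CONSIDERABLY_MORE and masc_share >= ALMOST_ALL:
        return ["prenominal adjective + masculine default"]
    strategies = []
    if masc_share >= CONSIDERABLY_MORE:
        strategies.append("masculine default")
    if c.feminine > 0 and match_share >= CONSIDERABLY_MORE:
        strategies.append("analogical gender")
    return strategies

# Hypothetical participant: mostly masculine, but gender usually matches
print(classify(Counts(23, 7, 22, 8, 30, 0, 0)))
```

A participant who meets both the masculine-share and the match-share criterion receives two labels, which corresponds to the mixed strategy described above.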
Relations with background variables were also explored (Table 8) by means of descriptive statistics, as the number of participants in each strategy category was too low to be able to perform a statistical analysis. Language proficiency seemed to have an effect on the type of strategy used, especially regarding strategies 4 (prenominal adjective with masculine default) and 5 (construction with a relative clause and a masculine default). Of the seven participants who estimated their own general proficiency in Spanish the lowest, three used strategy 4 and one used strategy 5. Furthermore, three out of the five lowest-scoring participants on gender in Spanish (unilingual mode) used strategy 4 or 5, and none of them used the analogical gender strategy. Those participants with a higher self-reported proficiency in Spanish (and with higher accuracy levels for gender in Spanish) tended to use either the masculine default (strategy 1) or the analogical gender strategy (strategy 3), or a mix of those two. However, there was one participant who used strategy 4 and scored 100% accuracy in the unilingual Spanish mode. There also seemed to be a reverse effect of gender accuracy in Dutch: out of the seven subjects who scored above 95% accuracy in Dutch gender (unilingual mode), three used strategy 4.
Moreover, there was some indication that language exposure had an effect on the strategy adopted. Strategies 4 and 5 were used exclusively by people who were predominantly exposed to Dutch (75%) outside of the immediate family, and less than 20% to Spanish. The pattern was even clearer for exposure to input through books, media, TV, and music: the three people who indicated receiving the fewest hours of such input per week (1.5 or less) used strategies 4 (two people) and 5 (one person). Those who were exposed to Spanish most through such media tended to use either the masculine default (strategy 1), the analogical gender strategy (strategy 3), or both.

Dutch-Spanish Mode
In the Dutch-Spanish mode, participants had to perform the task in Dutch, but name just the object in Spanish. For the analysis, all constructions containing a Spanish noun for which either the determiner, the adjective, or both could unambiguously indicate the assigned gender in Dutch were taken into consideration. Cases were considered ambiguous if, for instance, the determiner and the adjective did not correspond in gender (e.g., de.C zwart.N casa 'the black house'), or if a Spanish definite determiner was used in combination with an inflected adjective (el rode casa 'the red house'), in which case the form of the adjective does not distinguish between common and neuter gender. In total, there were 14 of these ambiguous cases, leaving 600 cases that could be analyzed.
In the majority of these (521 cases), a Dutch determiner and a Dutch adjective were used. In those cases where the determiner was absent (57 cases), or indefinite (311 cases), the form of the adjective was taken as an indicator of the gender of the DP, that is, an inflected adjective was considered to indicate common gender, and an uninflected adjective was taken to indicate neuter gender 9.
Of all 600 unambiguous cases, the majority (396 cases, 66%) was assigned common gender, while 204 (34%) were assigned neuter gender. When we look at the gender of the translation equivalent of the word in Dutch, we see that in 58.33% of all cases, the assigned gender matched the gender of the translation equivalent, while 41.67% of the time it did not.
Quite unexpectedly, there was also a considerable minority of cases (115) in which a postnominal adjective was used. These included three instances where a Spanish instead of a Dutch adjective was used (de peine negro 'the black comb'), where this word order may not be so surprising, but the vast majority (112) contained a Dutch adjective. Interestingly, in all but one of these cases (111), the adjective used was the uninflected one, indicating neuter gender. These were mostly DPs with an indefinite or absent determiner (een casa zwart/casa zwart '(a) black house'). This type of construction was produced both with nouns for which the gender of the translation equivalent was also neuter (49 cases) and those for which it was common, thus resulting in a mismatch (62 cases).
These data are illustrated in Figure 12 below.
To check which of these effects were statistically significant, a generalized mixed effects model was performed on all DPs for which the gender could be unambiguously determined based on the determiner, the adjective, or both. The dependent variable was thus the gender assigned to the DP, and the two independent variables were the gender of the translation equivalent and the word order of the construction. The intercept of the model was not significant, indicating that across the board, there was no significant preference for either the common or the neuter gender. However, the main effects of the gender of the translation equivalent and of word order were both significant (β = −1.56, SE = 0.75, z = −2.08, p = 0.04 for the gender of the translation equivalent and β = −8.18, SE = 1.31, z = −6.22, p < 0.001 for word order) 10. The first effect means that common gender was more often assigned to Spanish nouns whose translation equivalent in Dutch is common, and neuter gender was more often assigned to Spanish nouns whose translation equivalent is neuter. The effect of word order indicates that for prenominal adjective constructions, common gender was used relatively more often, while neuter gender was produced more with postnominal adjective constructions.
Although the interaction between word order and gender of the translation equivalent was not significant (β = −2.46, SE = 1.51, z = −1.63, p = 0.10), we decided to run post hoc analyses using the Tukey test, to check whether the effect of the gender of the translation equivalent was significant in both word orders. This is relevant, because we are not only interested in whether the participants used the analogical gender strategy more with AN word order than with NA word order, we also want to know whether they even apply it at all with NA word orders, which is a different question. If they do not apply it at all with NA word orders, we can conclude that they really adopt a different strategy depending on the word order. The post hoc test showed that the effect of the gender of the translation equivalent was only significant for the adjective-noun word order constructions (z = −5.93, p < 0.001), not for the noun-adjective word order constructions (z = −0.12, p = 0.98). This means that the analogical gender strategy was only applied in the former, not the latter, construction type.
9 Although it could be argued that an absent determiner, when combined with an inflected adjective, may also be an indication of an omitted definite determiner (and thus a definite DP), our data show that within the same participant, absent determiners sometimes combined with inflected and sometimes with uninflected adjectives, implying that it is likely that in the former case, the adjective is used to assign common gender and in the latter to assign neuter gender.
10 Even though the effect sizes of these two variables differ considerably, we consider the high z-values and low p-values to indicate that both variables indeed are important predictors of gender assignment in our data.
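The z- and p-values reported for the fixed effects are related to the coefficients and standard errors through the Wald statistic z = β/SE. The sketch below shows this relation for two of the reported effects, assuming the usual two-sided test against the standard normal distribution (the text does not state which test the software used). The function name is our own.

```python
from math import erfc, sqrt

def wald_test(beta, se):
    """Return (z, two-sided p) for a coefficient and its standard error."""
    z = beta / se
    p = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p

z_te, p_te = wald_test(-1.56, 0.75)    # gender of the translation equivalent
z_int, p_int = wald_test(-2.46, 1.51)  # word order x translation equivalent
print(f"z = {z_te:.2f}, p = {p_te:.2f}")
print(f"z = {z_int:.2f}, p = {p_int:.2f}")
```

Because the published coefficients and standard errors are rounded to two decimals, recomputed values can differ slightly from the reported ones for other effects.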

[Figure 12: number of cases with common vs. neuter gender assigned.]
To investigate the possible effect of a common default strategy, two additional models were run with the mismatch cases only, one for each word order. To check whether one gender was assigned significantly more often than the other, only the intercept was included in both models, together with random effects if these improved the model fit. For adjective-noun word order, the model included a significant random effect of item, and the intercept was significant, indicating a preference for the common gender (β = −11.81, SE = 2.61, z = −4.53, p < 0.001). For noun-adjective word order, the model also included a random effect of item, and rendered a significant intercept, which showed a preference for the neuter gender in mismatched cases (β = 15.33, SE = 6.17, z = 2.49, p = 0.01). This means that these speakers use a common default strategy for constructions with a prenominal adjective, and a neuter default strategy for constructions with a postnominal adjective.
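To make the size of these intercepts more tangible, log-odds can be converted to probabilities with the inverse logit. The sketch below assumes that the outcome was coded with neuter as the modeled category, which the signs of the two intercepts suggest but the text does not state. The resulting values hold for a typical item (random effect of zero) and are not raw proportions.

```python
from math import exp

def inv_logit(log_odds):
    """Convert log-odds to a probability, stable for large |log_odds|."""
    if log_odds >= 0:
        return 1 / (1 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1 + e)

# Assumed coding: probability that neuter (rather than common) is assigned
p_neuter_an = inv_logit(-11.81)  # adjective-noun order, mismatch cases
p_neuter_na = inv_logit(15.33)   # noun-adjective order, mismatch cases
print(f"{p_neuter_an:.2g} {p_neuter_na:.6f}")
```

Intercepts this far from zero, with correspondingly large standard errors, are best read as evidence of a near-categorical preference rather than as precise probability estimates.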
An anonymous reviewer wondered about the effect of the Spanish gender on the gender assignment in Dutch. There was a small difference: 63% of all feminine nouns vs. 70% of all masculine nouns were assigned common gender in Dutch. However, including Spanish gender did not improve the statistical model significantly.

Dutch-Spanish Mode: Individual Differences between Participants
Similar to the Spanish-Dutch mode, different strategies were identified based on each participant's number of common and neuter DPs, the number of times in which the gender matched that of the translation equivalent, and the type of construction (prenominal or postnominal adjective) they used.
[...] by participants who arrived in The Netherlands before age 3. There was also a relation with the amount of time spent in the country of origin: the analogical gender strategy was only used by those participants who had spent the least amount of time in the country of origin (three years or less), while the common default strategy was used considerably more by those participants who had spent more time in the country of origin (between 4 and 20 years). It was also the case that the common default strategy was used mostly by participants who were more exposed to Spanish through media, books, music, and movies (between 7 and 33 h per week), while the analogical gender strategy was almost exclusively used by people who reported 7 h or less of this type of "other" exposure.
The amount of use of Spanish and Dutch also affected the code-switching strategies. The analogical gender strategy was used only by those who used Dutch most with their immediate family (i.e., more than 50% of the time). The precise information for these factors can be found in Table 10.
Finally, we investigated whether participants' strategies in the two code-switching modes were somehow related, for example whether participants who tended to use analogical gender in one mode also did so in the other mode, but we could not detect any consistent pattern.

Discussion
We hypothesized gender assignment and agreement in Spanish, the HL of the participants, to be different from what is generally reported for Spanish monolinguals. However, overall gender accuracy was quite high, with 92.33% correct responses. Still, the error pattern revealed an overextension of the masculine gender, similar to previous studies on heritage Spanish in the US and The Netherlands (Montrul and Potowski 2007; Montrul et al. 2008; van Osch et al. 2014; Cuza and Pérez-Tattam 2016; Irizarri van Suchtelen 2016; Goebel-Mahrle and Shin 2020). In addition, more errors were made with non-canonical nouns than with canonical nouns (cf. van Osch et al. 2014; Montrul et al. 2008; Montrul et al. 2014; Goebel-Mahrle and Shin 2020) and with the adjective than with the determiner (cf. Cuza and Pérez-Tattam 2016). We had predicted individual differences regarding language usage and exposure to be reflected in the results. This turned out to be the case: the use of the HL with the immediate family and the exposure to input from social media, books, TV, and music had a significant effect on gender accuracy. Of these, the use of the HL with the immediate family had a particularly high effect size. For children in particular, the amount of previous parental input and the number of years they had spent in their heritage country were also shown to affect their accuracy with Spanish gender, of which the former had the highest effect size. Age of arrival, on the other hand, did not correlate significantly with gender accuracy, although a trend was visible in the data. Importantly, no clear division age could be detected after which accuracy scores increased considerably. As mentioned in the introduction, most studies adopt a specific cut-off age (often age 6, but in some cases much older) to distinguish heritage speakers from child L2 learners. However, if such a cut-off point were ecologically valid, we might expect to see a sharp division between people who arrived before a certain age and those who arrived after. Our data do not support such a division, suggesting that people who are often categorized into two different populations (heritage speakers on the one hand, and child L2 learners on the other hand) should perhaps just be considered one heterogeneous population of bilinguals with various ages of onset of one of their languages.
As for gender accuracy in Dutch, the societal language, we hypothesized participants to perform better than in Spanish, their HL. This was not confirmed by the data: Dutch gender was target-like in only 84.07% of all cases (586 of 697). Virtually all errors were overextensions of the common gender to neuter nouns. This result is in line with other studies on bilingual children acquiring Dutch (Hulk and Cornips 2006; Unsworth et al. 2014) and contradicts the findings reported in Egger et al. (2018) on Greek-Dutch bilinguals, which suggested that children acquiring two gender systems simultaneously may actually benefit from being bilingual when it comes to the acquisition of gender.
Our hypothesis that language-external factors related to language history, input, and use would have an effect was confirmed. A negative correlation between age of arrival and accuracy indicated that participants who were older when they arrived in The Netherlands made more errors in Dutch gender. This finding contradicts Unsworth et al. (2014), who found no effect of age of onset on the acquisition of Dutch gender in child Dutch/English bilinguals in The Netherlands. However, it is important to keep in mind that the variable 'age of onset' is directly related to the variable 'length of residence', which in turn depends also on the age at testing. In fact, both 'age of onset' and 'length of residence' were found to correlate significantly with 'age at testing' in our data (Pearson's R: 0.52, p = 0.02 for age of onset and Pearson's R: 0.88, p < 0.001 for length of residence). This means that the effect of age of onset of Dutch may (also) reflect an effect of age at testing: older participants performing better than younger ones. In line with Unsworth et al. (2014) as well as Cornips and Hulk (2008), the amount of current input and exposure also affected participants' gender accuracy in Dutch, in particular for 'other' input, that is, input from TV, music, social media, and books. The vast majority (11 out of 15) of people who were exposed to Spanish through this 'other' input for less than 15 h per week scored around 90% accuracy or higher. Of those people who reported 20 or more hours of exposure per week, only two out of six reached this level of accuracy in Dutch gender.
For the child participants in particular, there was a positive correlation between general (self-reported) proficiency in Spanish and gender accuracy in Dutch, indicating that children who had a better command of Spanish did better with Dutch gender. This may mean that there is indeed a beneficial effect from knowing Spanish, which has a transparent gender system, when it comes to acquiring a more opaque gender system like the Dutch one (cf. Egger et al. 2018). However, if that were the case, we would also expect to have found a correlation between the participants' performance with gender in the Spanish unilingual mode and the Dutch unilingual mode, which we did not. A more likely explanation for this finding may be that both proficiency in Spanish and gender accuracy in Dutch correlated with age in this case.
The low accuracy scores in Dutch attested in this study are interesting given that it is often assumed (though rarely actually tested), especially in studies with adult heritage speakers, that heritage speakers are completely monolingual-like in the societal language, or at least more proficient than in the HL. Our results indicate that this is not necessarily the case. Even among those speakers who arrived before age 6, which is often taken to be a cut-off point to distinguish between heritage speakers and child L2 learners, there were several speakers who obtained gender accuracy scores in Dutch below 80%. However, we have to keep in mind that, as suggested by Cornips (2008), the overextension of common gender attested in our study may well be a marker of identity rather than a result of either age of onset or reduced/different input. We do not see a way to differentiate between these possible accounts based on the information that we have gathered in this study.
As for gender assignment in code-switched constructions, for Dutch noun insertions into Spanish, we expected to find evidence for both the gender default strategy and the analogical gender strategy. Based on previous work (Liceras et al. 2008, 2016), we had expected to find a division between participants who used the masculine default and those who preferred the analogical gender strategy. We indeed attested both these strategies in the data, to a more or less equal degree, but a third strategy was revealed as well, which seems to be specific to the Spanish-Dutch language combination that we studied. Four participants displayed a construction containing a prenominal Spanish adjective, which was masculine in all cases. Instead of a division between a masculine gender strategy and the analogical gender strategy, our participants could be categorized based on the word order they preferred: on the one hand, there was a rather large group of participants who used postnominal adjectives with either the masculine default or the analogical gender strategy (or both), and on the other hand there was a small subset of speakers who used the prenominal masculine adjective. This latter strategy seemed to be related to lower levels of proficiency, and gender accuracy, in Spanish, and higher gender accuracy in Dutch, as well as to more exposure to Dutch than Spanish outside the immediate family and through other input, while the analogical gender strategy and the masculine default strategy were used by speakers who were more exposed to Spanish and more proficient in Spanish.
Regarding Spanish noun insertions into Dutch, by far the most adopted strategy was the common default. Another frequently applied strategy, which we had not anticipated, was the use of a construction containing a postnominal uninflected Dutch adjective. The analogical gender strategy was used as well, but to a lesser extent. Similar to the Spanish-Dutch mode, the gender assignment strategies correlated with certain extralinguistic variables. In this mode, there turned out to be a division between people who tended to use the common gender default strategy on the one hand and people who used the analogical gender strategy on the other hand, in line with Liceras et al. (2008, 2016). The latter group were generally speakers who were dominant in Dutch rather than Spanish (in terms of proficiency, age of onset, and patterns of exposure and use), while the group that could be defined as more Spanish-dominant tended to use the common gender default 11. Thus, in both directions, the analogical gender strategy correlated with being dominant in the matrix language. This is something that has been observed across bilingual communities and language pairs: the analogical criterion strategy does not seem to be present in speakers who are not Spanish L1 speakers (see Bellamy et al. 2018 for Purepecha-Spanish), whereas L1 Spanish speakers seem more likely to follow the analogical criterion (see Liceras et al. 2008 for Spanish/English or Munarriz-Ibarrola et al. 2019 for Basque-Spanish). Bilingual communities may also settle on specific code-switching patterns (cf. Królikowska et al. 2019, who compared four Spanish/English populations and observed that the more the bilinguals code-switched, the greater the tendency to assign the default masculine gender). All these differences in gender assignment strategies across communities, language pairs, and between individual speakers could be due to a combination of proficiency and environmental factors (cf. Valdés Kroff et al. 2019). These factors should be taken into account in future studies.

Conclusions
This paper has examined heritage speakers (of various ages) of Spanish in The Netherlands regarding their production of gender in both their languages (Spanish and Dutch) as well as their gender assignment strategies in code-switched constructions. One of the most interesting findings is that, although our speakers were either born in The Netherlands or arrived at an early age (before 12 years old), they showed non-target use of gender in both their HL, Spanish, and the societal language, Dutch. In fact, gender accuracy in Dutch was lower overall compared to that in Spanish. Instead of excluding participants with an age of onset of bilingualism above a certain age, we included participants with a wide range of age of arrival (from 0 to 12) to see whether a clear cut-off point could be identified that would separate heritage speakers from child L2 learners in terms of their gender accuracy levels in Spanish. However, age of arrival was not a significant predictor for Spanish, the HL, indicating that a later arrival to the host country does not necessarily imply better proficiency in the HL. In contrast, age of arrival was found to affect gender accuracy in Dutch. Moreover, several other factors related to the amount of input and exposure in Dutch and Spanish were found to affect gender accuracy in both languages.
As for code-switching strategies, apart from the use of a gender default (masculine for Spanish and common for Dutch) and the analogical gender strategy, our data revealed some thus far unreported strategies. When inserting Dutch nouns in Spanish, some participants used a prenominal masculine adjective across the board. When incorporating a Spanish noun into a Dutch DP, several participants used a construction comprising a postnominal adjective (which is ungrammatical in Dutch) in neuter (uninflected) form. In both code-switching modes, gender assignment strategies correlated with external factors, such as self-reported language proficiency, age of arrival, and amount of exposure to and usage in both languages, such that the analogical gender strategy was used more by speakers who were dominant in the matrix language. Overall, our findings reveal the heterogeneity of heritage speakers as a population and emphasize the importance of taking into account differences between communities, language pairs, as well as individual differences between speakers related to proficiency, exposure and use.