Feature Matching Does Not Equal Convergence: Acquisition of L2 French Accusative Pronouns by L1 Spanish Speakers

Abstract: Our study aims to determine whether formal similarity between two languages (operationalized via the Feature Reassembly Hypothesis) allows adult L2 learners of French (Spanish native speakers; NSs) to straightforwardly acquire third-person singular accusative clitics in their L2. Additionally, we examined the role of surface similarity, since French and Spanish overlap and diverge in several ways. In terms of formal similarity, third-person accusative clitic pronouns in Spanish are almost perfect analogues of their French counterparts. In terms of surface similarity, however, while the feminine accusative pronouns are identical ("la" [la]), the masculine ones differ in Spanish ("lo" [lo]) and French ("le" [lə]). Participants included French NSs (n = 26) and Spanish-speaking L2 French learners (n = 36). Results from an offline forced-choice picture selection task and an online self-paced reading task did not support the Feature Reassembly Hypothesis because learners showed considerable difficulty with the interpretation and processing of these pronouns, revealing that, unlike French NSs, their interpretations and processing are guided by the feature [±Human] and, to a lesser degree, by gender, which might be due to the surface-level similarity between feminine accusative clitic pronouns in both languages.


Introduction
It has long been discussed in the literature on bilingualism/multilingualism that first (L1) and second (L2) languages are interconnected in the mind of a speaker. To explain such relationships, several frameworks have made different proposals. From a usage-based perspective, abundant evidence shows that both languages in a bilingual's mind remain available and activated when processing both written and spoken language. This parallel activation of the lexicon in both languages has been shown to operate when surface-level similarity exists between isolated words (Dijkstra et al. 1998; Marian and Spivey 2003) and between sentence structures (Baten et al. 2011; Duyck et al. 2007).
In contrast to usage-based perspectives, generative approaches have traditionally focused on understanding speakers' underlying knowledge of syntactic and morphosyntactic properties. Numerous studies conducted within this framework investigate whether knowledge of L1 grammar affects how the L2 is represented and used (the question of 'L1 transfer') and whether bilinguals are able to derive the same meanings and arrive at the same linguistic judgments in their L2 as native speakers (NSs).
While usage-based research demonstrates the important role surface-level similarity plays in language acquisition and processing, generative research has provided crucial evidence of the importance of abstract-level phenomena in L2 acquisition and processing. Therefore, one important open question is the role that both abstract-level similarity (e.g., similarity in structural or syntactic function) and surface-level similarity (e.g., similarity in how two forms look or sound) play in the acquisition and use of a second or subsequent (Ln) language. In this study, we investigate how abstract- and surface-level similarity between a speaker's L1 and L2 affect L2 comprehension and incremental L2/Ln language processing. We do so by investigating L2 acquisition of French pronominal clitics by NSs of Spanish. To the best of our knowledge, no previous study has investigated situations wherein both formal- and surface-level similarities can be established between grammatical forms in two languages. One could thus argue that the situation we consider here is one where both usage-based and generative approaches can provide valuable insights into the minds of bilingual speakers.
The rest of the paper is organized as follows: the next section details theoretical background, describes the linguistic property under investigation, and briefly discusses some previous research relevant for the current study. The section ends with the presentation of the research questions. Sections three and four present the two experiments and their results followed by a discussion section and a brief conclusion.

Theoretical Background
The phenomenon of parallel activation has received ample empirical support in usage-based research. For example, in an experiment conducted by Schwartz and Kroll (2006), a group of Spanish-English bilinguals living in Spain read English sentences that contained words such as "pan," which are inter-lingual homographs: words that look the same but have different meanings in English and Spanish (Spanish pan = bread). The bilinguals, who used Spanish regularly, made more errors naming words like "pan" in English sentences than naming English words that do not exist in Spanish.
Within the generative SLA framework, a particularly useful conceptualization of L1 influence (especially as it concerns formal similarity) is offered by the Feature Reassembly Hypothesis (FRH; Lardiere 2009), which has operationalized L1 transfer as the process of mapping abstract feature bundles (realized in the L1 as words or morphemes) to forms in the L2. Lardiere adopts a Minimalist view of language architecture (Chomsky 1995) to propose that L2 learners transfer abstract units of meaning (linguistic features) expressed by L1 words and morphemes. Importantly, this view assumes that all lexical items (words and morphemes) are physical realizations of features, whether functional (grammatical) or semantic. These features (which include number and, crucially for our investigation, gender) are essential for mapping sound-meaning pairings. However, while the atomic units of meaning or linguistic features are assumed to be universal, the departing notion is that languages vary in how they organize features into words and morphemes, such that learners will have to reassemble features if the L1 and L2 differ in their feature encoding. Additionally, features not present in the L1 but present in the L2 will need to be added from the Universal Grammar feature inventory.
As we will see in more detail in the next section, Spanish and French accusative clitic pronouns should not require reassembly because they encode the same feature sets. The FRH does not predict difficulties for forms with different surface realizations if the corresponding abstract features can be mapped perfectly from the L1 to the L2 since "learners will look for morpholexical correspondences in the L2 to those in their L1, presumably on the basis of semantic meaning or grammatical function (the phonetic matrices will obviously differ)" (Lardiere 2009, p. 191). The situation regarding surface-level similarity is more complex because Spanish has a clitic form that looks exactly like the French accusative masculine clitic "le," although it is not its functional analogue. Spanish "le" [le] can refer to masculine and feminine antecedents and is used as a dative (not accusative) pronoun. Therefore, if the parallel activation phenomenon observed with lexical words applies to functional words, surface-level form similarity might cause Spanish-French bilinguals some trouble mapping "le" in their L2 French. We examine how Spanish-French bilinguals process French third-person singular accusative clitic pronouns. These forms, both in French and Spanish, have been of interest to linguists in different L1-L2 language combinations from a variety of theoretical perspectives. However, investigations focusing on Spanish speakers learning French remain relatively scarce, a situation we attempt to improve with the present study. Importantly, personal pronouns are a good testing ground for the Feature Reassembly Hypothesis because they constitute phonetic realizations of features such as person, number, case, and, in this particular language combination, gender.
Let us have a closer look at the French and Spanish paradigms for accusative and dative singular pronouns. On the one hand, with respect to similarity in terms of abstract features, the Spanish paradigm aligns perfectly with the French one. Both languages have pronominal clitics in their inventory (Roberge 1990; Zagona 2002). Compare this to languages like English, where all pronouns are strong. Additionally, in Spanish and French, clitics occupy different positions from a full Determiner Phrase (DP): While full DPs occur after the verb, clitics are positioned pre-verbally (with finite verbs). Examples (1) and (2), (3) and (4) illustrate this contrast.
Other similarities include that dative and accusative clitics in both languages cannot be separated from the verb (except by other clitics, in which case both languages have specific ordering criteria) (van Riemsdijk 1999). Additionally, clitics cannot stand alone and, therefore, cannot be the answer to a wh-question (Cardinaletti and Starke 1999). Similarly, neither Spanish nor French clitics can be stressed or conjoined (Cardinaletti and Starke 1999).
Similarities between French and Spanish extend to the encoding of gender and the [±Human] feature because both French and Spanish accusative clitics encode grammatical gender (masculine/feminine) but not the feature [±Human]. Unlike English, where the pronominal paradigm lexically encodes biological gender of human referents (she vs. he, her vs. him), both French and Spanish lexically encode grammatical gender, meaning that clitics match the gender of both human and, importantly, non-human antecedents. This difference between French/Spanish and English can be observed in examples (2) and (4), where the English pronoun him marks [+human] (while the pronoun it is reserved for [−human] referents). Thus, both French (2) and Spanish (4) could be translated into English as "I will see it [masculine]" if the referential expression associated with the pronoun is [−human] instead.
Regarding the syntactic analysis of clitics, we assume, following Cardinaletti and Starke (1999), that clitics in French and Spanish are deprived of the projection that realizes referential, human, and case features (unlike the case for strong pronouns). This analysis assumes that this syntactic deficiency explains their inability to stand alone and to receive stress. Additionally, we follow Jakubowicz et al. (1998, p. 120) in their assumption that these clitics are underspecified for the [±Human] feature, which allows them to be merged in the functional domain of the verb (unlike full DPs). Regarding gender features, French and Spanish nouns are posited to bear grammatical gender features that can be specified or unspecified. Masculine forms are unspecified because they are the default and are assumed to lack lexical marks (e.g., Harris 1991; Lumsden 1992). In contrast, feminine nouns are (lexically) specified as feminine since these are not the default forms.
Languages 2021, 6, 144
An interesting correlate of this premise is that there are asymmetries in terms of feature clashes: Masculine determiners or adjectives do not clash when they co-occur with feminine nouns because [masculine] bears default, unspecified agreement (see Liceras et al. 2008;White et al. 2004). On the contrary, feminine determiners and adjectives clash with masculine nouns because the feminine is specified for gender.
Finally, French and Spanish are also comparable in terms of the resolution of pronouns when these co-refer with referential expressions (R-expressions). This process requires that a dependency between the pronoun and the referent be established. In the case of these languages, this dependency relies on gender matching, such that the ϕ-features of the referent in the discourse and the features on the pronouns themselves must match. Once these features are co-indexed, relevant uninterpretable features can be checked and then deleted in the computation. In short, French and Spanish accusative clitic pronouns form a perfect analogue in terms of abstract similarity.
In terms of surface-level similarity, French and Spanish pronoun paradigms share commonalities, to different degrees, in writing and pronunciation. The feminine accusative clitic has the same spelling and pronunciation in both French and Spanish: "la" [la]. The masculine clitics differ, however: "lo" [lo] in Spanish and "le" [lə] in French. The parallels do not end there, either, because the French accusative masculine clitic has the same written form as the Spanish dative singular clitic ("le"), although it is pronounced differently in the two languages: [lə] in French and [le] in Spanish. Finally, the French dative clitic has a unique spelling and pronunciation that shares no similarity with Spanish forms: "lui" [lɥi]. Table 1 illustrates the extent to which the paradigms in French and Spanish parallel each other in abstract-level features and feature bundles expressed by individual forms. It also shows the extent to which these surface forms are similar or different in the two languages in writing and pronunciation.
As an interim summary: in terms of abstract functional features (the learning task), there is nothing to reassemble for L2 learners of French who are Spanish NSs when it comes to accusative clitics; the paradigms in both languages align perfectly. Thus, based on the predictions of the Feature Reassembly Hypothesis, we would expect that acquisition should proceed straightforwardly. In terms of surface-level similarity, however, the learning task differs: The fact that some forms align perfectly (e.g., both feminine accusative clitics are the same in writing and pronunciation), while others do not ("le" in French and Spanish, as shown in Table 1), might interfere with how these accusative clitics are judged or processed in L2 French.

Previous Studies: The Acquisition of Gender and French Accusative Clitics
Overall, there is consensus in the L2 literature that, relative to subject clitics, the acquisition of French object clitics lags behind not only in terms of overall clitic production (e.g., Schlyter 2003) but also in terms of accuracy regarding object clitics' distributional properties (e.g., placement) and gender agreement (Prévost 2009). Hawkins (2001) has proposed that the acquisition of the distributional properties of object clitics proceeds in stages, with learners first placing them in postverbal position, then omitting them, then placing them in an intermediate position, and finally reaching target-like placement. Placement seems to be unproblematic for Spanish-French bilinguals, however. Bruhn de Garavito and Montrul (1996) have shown that Spanish-speaking learners of French have no issues learning placement except in cases where Spanish (but not French) allows clitic climbing, i.e., Spanish-French bilinguals failed to consistently reject ungrammatical sentences like *Je le veux voir ("I want to see him").
While French and Spanish clitic systems are very similar, there is evidence that even in situations with two closely related languages wherein L2 learners can successfully transfer some L1 processing strategies to the L2, acquiring gender might not be effortless. For instance, Sabourin and Haverkort (2003) studied the acquisition of gender agreement with German NSs who were L2 learners of Dutch. The authors used a Grammaticality Judgment Task (GJT) and a second GJT paired with ERP versions of the experiments. They found that learners were successful with definite Noun Phrases (NPs), which are very similar in both languages, but performed poorly with indefinite NPs, which feature important differences between the two languages.
Overall, previous research, mainly from generative and variationist perspectives, presents a complex picture when it comes to the L2 acquisition of gender. While some studies show that L2 learners can be quite accurate in their judgments and use of gender-marked determiners and adjectives (e.g., Bruhn de Garavito and White 2002; Edmonds and Gudmestad 2018; Gabriele et al. 2013; Gudmestad et al. 2019; Montrul et al. 2008), other studies have documented extensive difficulties when realizing gender agreement (e.g., Franceschina 2005; Grüter et al. 2012). Additionally, it appears that speakers whose L1 has grammatical gender have an easier time than those whose L1 does not have grammatical gender (e.g., Foucart and Frenck-Mestre 2011; Sabourin and Stowe 2008). This would mean that, in theory, our learners should have an easier time with the acquisition of gender distinctions in L2 French because Spanish, like French, encodes grammatical gender.
An ancillary consideration is that a disconnect may exist between how grammatical gender is processed/computed and learners' lexical knowledge of gender. For instance, Hopp (2013) investigated the acquisition of gender with L2 learners of German and found that only those learners who had stable (lexical) knowledge of gender could use gender as a predictive cue. In our investigation, we tried to disentangle the lexical knowledge of gender from how it is processed or computed by always providing the correct gender marking in the determiner (e.g., sa "his/her"/la "the" or son "his/her"/le "the"). In other words, we were interested in studying how gender was processed and interpreted rather than whether learners knew the lexical gender of a given NP.
A final factor to consider is how the feature [±Human] may affect L2 performance. As with the acquisition of gender, however, the research on this semantic feature has produced divergent results. Some researchers have proposed that the processing of animate nouns is more "cognitively demanding" (Sagarra and Herschensohn 2010, p. 106). Other researchers have suggested that gender cues can facilitate the processing of both [+human] and [−animate] pronouns (Irmen and Knoll 1999). In the case of our learners, we would expect positive transfer of this semantic feature because it works in equivalent ways in both languages. The question of the acquisition of gender for Spanish-French bilinguals, however, is an experimental one. In the following section, we describe the two experiments that we conducted to answer the following research questions:
1. RQ1: Does the acquisition of the features associated with French accusative clitics proceed straightforwardly when there is nothing to reassemble (i.e., when the formal similarity coincides), as predicted by the Feature Reassembly Hypothesis?
2. RQ2: What role, if any, does surface-level similarity play?

Participants
We tested two groups of participants: NSs of French who provided a baseline of behavior and who resided and were tested in Paris (n = 26); and NSs of Spanish who were L2/L3 learners of French (n = 36), tested in Mexico (Puebla). Table 2 includes the demographic information. All participants reported learning languages other than their native ones. NSs of French all reported speaking English (n = 26) at various levels, while half reported having learned Spanish (n = 13). A few reported learning German (n = 5), Italian (n = 3), or other languages (Russian, Portuguese, or French Sign Language (n = 3)).
Within the L2 learner group (Spanish NSs), all learners reported learning French and English (n = 34). A few reported learning German (n = 9), while others reported learning Portuguese (n = 2) and Croatian (n = 1). The L2 group also completed a proficiency test that consisted of a C-test containing 50 blanks (50 points total) (Renaud 2010). The test contained two short authentic texts (74 and 97 words); the first clause of each was left intact. Starting at the second clause, the second half of every other word was deleted to target different categories (e.g., nouns, verbs, or prepositions). In terms of scoring, each correct answer received one point. Answers with minor spelling errors or unidiomatic phrasing received half a point. The instructions encouraged participants not to take more than five minutes per text to fill in the blanks, but they were not timed. Average scores are included in Table 2.
Our forced-choice picture selection task tested participants' interpretations of clitic pronouns that differed in terms of the encoding of the feature [±Human] and gender. The task was delivered via Qualtrics and completed in a quiet lab under supervision. This test, including the proficiency task and the background questionnaire, took the L2 learners an average of 41 min to complete.

Materials
The picture selection task followed a 2 × 2 design, with gender (masculine/feminine) and [±Human] (human/nonhuman) as factors. The task included a total of 46 items: 12 experimental and 34 fillers. Items were distributed into two lists so that each participant only saw one version of each item (masculine/feminine), judging a total of 46 items. Participants were introduced to a scenario: They were to help Nicolas, a schoolboy, understand who or what his friends are talking about because he always arrives late. For each item, participants read a short discourse context where three potential referents were introduced (either two masculine referents and one feminine or vice versa). The test sentence following the context contained a clitic pronoun and was followed by four choices/pictures, only one of which was felicitous. Option 1 correctly matched the gender of the referent, while Option 2 referred to the opposite gender. Option 3 allowed participants to choose both, while Option 4 represented a distractor that was thematically compatible with the context but had gone unmentioned in the discourse. These options were randomized per participant. Figure 1 shows a sample item. Contexts and test sentences were presented in French. Spanish translations were not provided.
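The counterbalancing and per-participant randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule that alternates gender versions across items, the option labels, and the function names are hypothetical and are not taken from the study materials.

```python
import random

# Hypothetical labels for the four answer options described in the text.
OPTIONS = ["match", "mismatch", "both", "distractor"]


def build_lists(n_items):
    """Build two complementary lists (assumed scheme, not the authors').

    Each item appears once per list; the gender version alternates across
    items, and the second list always receives the opposite version.
    """
    versions = ("masculine", "feminine")
    list_a = {item: versions[item % 2] for item in range(1, n_items + 1)}
    list_b = {item: versions[(item + 1) % 2] for item in range(1, n_items + 1)}
    return list_a, list_b


def shuffled_options(participant_seed):
    """Return the four options in a reproducible, participant-specific order."""
    rng = random.Random(participant_seed)
    order = list(OPTIONS)
    rng.shuffle(order)
    return order


list_a, list_b = build_lists(12)  # 12 experimental items, as in the text
print(list_a[1], list_b[1])       # the two lists differ on every item
print(shuffled_options(7))        # one participant's option order
```

Seeding the shuffle per participant is a design choice made here for reproducibility; the text does not say how randomization was implemented in Qualtrics.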

Results
Our task investigated the interpretations participants assigned to third-person accusative French clitic pronouns. We designed the task to assess the effects of the [±Human] feature and grammatical gender and thereby determine whether lack of reassembly would lead to straightforward acquisition. Participants were presented with four options:

• Correct: gender-matched referent;
• Incorrect: gender-mismatched referent;
• Incorrect: both genders; and
• Incorrect: distractor (gender-matched).

Table 3 presents the proportion of responses (counts in parentheses), arranged by answer type for the L1-French NSs. Overall, Table 3 shows that the NSs behaved as predicted in the literature. Note that the error type "Gender Mismatch" did not have a single incidence: gender-matching was particularly salient for NSs. In the [+Human] conditions, the only type of error was "distractor," which did not make sense in terms of the story but matched the gender of the referent. Below, Table 4 shows the proportion of responses for the L2 group, whose accuracy scores were much lower and whose error types were more varied. Recall that this task was not performed under time pressure: even if learners were unfamiliar with the gender of a nonhuman NP, they could look at the story and retrieve the gender marked in the determiner.
We analyzed our data through a series of pairwise comparisons because our dependent variable was categorical (selection responses). To set up the comparisons, we collapsed results over binary categories using logistic mixed-effects models via the glmer function of the lme4 package (Bates et al. 2015) in the R environment (R Core Team 2020). Dependencies were captured in maximal random effect structures (RES). We report on separate models per group (L1 vs. L2). This decision follows Dekydtspotter et al. (2006), who argue that a lack of isomorphy between groups should not be taken to mean that L1 and L2 acquisition are fundamentally different. In doing so, we also avoid the comparative fallacy (Bley-Vroman 2009). Thus, we focus on finding out whether learners can make significant distinctions within their own grammar, rather than whether they exactly mirror their L1-French counterparts. Models included the following RES: random intercepts for participants and items, as well as random slopes. When the model did not converge with the maximal RES, we ran it again with the next maximal RES until convergence.
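To make the collapsing step concrete, the following Python sketch recodes the four answer types into a binary outcome and tabulates accuracy per design cell. It does not reproduce the mixed-effects models (those were fitted with glmer in R); the response labels, the function names, and the toy trials are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical labels for the four answer options listed above.
CORRECT = "match"
ERROR_TYPES = {"mismatch", "both", "distractor"}


def collapse(response):
    """Map a four-way selection onto a binary outcome (1 = correct, 0 = any error)."""
    if response == CORRECT:
        return 1
    if response in ERROR_TYPES:
        return 0
    raise ValueError(f"unknown response label: {response!r}")


def accuracy_by_condition(trials):
    """Return the proportion of correct responses per (gender, human) cell.

    trials: iterable of (gender, human, response) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for gender, human, response in trials:
        cell = (gender, human)
        hits[cell] += collapse(response)
        totals[cell] += 1
    return {cell: hits[cell] / totals[cell] for cell in totals}


# Invented toy data; these are not the study's responses.
toy_trials = [
    ("feminine", "human", "match"),
    ("feminine", "human", "match"),
    ("feminine", "nonhuman", "both"),
    ("feminine", "nonhuman", "match"),
    ("masculine", "human", "mismatch"),
    ("masculine", "human", "match"),
]
print(accuracy_by_condition(toy_trials))
```

The binary column produced by `collapse` is the kind of dependent variable a logistic mixed-effects model would take as input.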
When we analyzed the natives' response behavior statistically, we collapsed data over Correct vs. Incorrect (all error types collapsed). Results from a logistic mixed-effects model revealed no main effects or interactions, suggesting that the natives' responses, which were overwhelmingly accurate, did not show a relationship with either the [±Human] feature or gender. The maximal converging RES (Table 5) included random effects by participant and by item.
For the model for L2 learners (Table 6), we also collapsed results over binary categories. Results from a logistic mixed-effects model revealed main effects for [±Human] and gender (p = 0.0103 and p = 0.001, respectively), as well as a gender*[±Human] interaction (p = 0.0304). The maximal RES that converged (Table 6) included random effects by participant and by item. Significant results are marked with an asterisk (*). We followed up with pairwise comparisons (paired-samples t-tests with Tukey HSD adjustment, emmeans in R), which revealed a significant contrast between human and nonhuman referents when the gender was feminine (t = −2.567, p = 0.0103), whereby the learners were more accurate with [+human] referents. Comparisons also unveiled a significant distinction between masculine and feminine (t = −3.393, p = 0.0007) when selecting among [+human] referents (learners were more accurate with feminine). No other contrasts were significant.
Finally, we examined the data by proficiency, treated as a continuous variable (Leal 2018). We ran a generalized linear mixed-effects regression with (centered) proficiency as a fixed-effect predictor and the maximal RES that would converge on the data set (by-subject and by-item slopes, plus their intercepts), collapsed over Correct vs. Incorrect responses. To better visualize the relationship, we plotted proficiency against accuracy in Figure 2. The model yielded a main effect of proficiency (β = −0.137, SE = 0.0228, z = −5.98, p < 0.001), suggesting that accuracy on the task increased with higher proficiency.
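As a reading aid for the reported coefficient, the short Python sketch below recomputes the Wald z statistic (estimate divided by standard error) from the rounded values given in the text and converts the log-odds estimate into an odds ratio with an approximate 95% interval. The variable names are ours, and the calculation assumes the usual Wald approximation; it is not a re-analysis of the data.

```python
import math

# Values as reported in the text (rounded).
beta = -0.137  # log-odds change per one-unit increase in centered proficiency
se = 0.0228    # standard error of the estimate

# Wald statistic: estimate divided by its standard error.
wald_z = beta / se

# Exponentiating a log-odds coefficient gives an odds ratio.
odds_ratio = math.exp(beta)
ci_low = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)

print(f"Wald z = {wald_z:.2f}")
print(f"odds ratio = {odds_ratio:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

The recomputed z differs slightly from the reported −5.98, which is what one would expect if β and SE were rounded before reporting. Note also that the direction of the effect in terms of accuracy depends on which response level the model treated as the reference category, a detail the text does not state.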
The results of the picture selection task revealed several important findings. First, as expected, the L1 group was extremely accurate in their selections (96.86% overall), which were not influenced by whether the referent was [±Human] or masculine/feminine. The L2 learners were much less accurate (64.57% overall), showing a two-way interaction (gender*[±Human]), whereby they were more accurate with [+human] referents when the referent was feminine. Thus, although the French and Spanish clitic systems have similar feature bundles, this formal similarity did not automatically translate into an advantage in the learners' L2 French. Finally, we found a positive effect of proficiency on overall accuracy.

Participants
The same participants completed both tasks. For the self-paced reading (SPR) task, we had two criteria for exclusion. We excluded one participant because their overall mean Reading Times (RTs) were more than 2.5 SDs from the mean. We excluded five additional participants whose accuracy on the comprehension questions was lower than 70%. All exclusions came from the L2 group.
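The two exclusion criteria can be expressed as a small filter. The Python sketch below is a minimal illustration under assumptions the text leaves open: that the reference mean and SD are computed over the participants' overall mean RTs, that the sample SD is used, and that the RT criterion is checked first. The participant IDs and values are invented.

```python
from statistics import mean, stdev


def flag_exclusions(mean_rts, accuracies, sd_cutoff=2.5, min_accuracy=0.70):
    """Return {participant_id: reason} for participants meeting either criterion.

    mean_rts:   {participant_id: overall mean RT in ms}
    accuracies: {participant_id: proportion of comprehension questions correct}
    """
    grand_mean = mean(mean_rts.values())
    grand_sd = stdev(mean_rts.values())  # sample SD (an assumption)
    flagged = {}
    for pid, rt in mean_rts.items():
        if abs(rt - grand_mean) > sd_cutoff * grand_sd:
            flagged[pid] = "rt_outlier"
        elif accuracies[pid] < min_accuracy:
            flagged[pid] = "low_accuracy"
    return flagged


# Invented toy data for twelve hypothetical participants.
toy_rts = {
    "p01": 390, "p02": 400, "p03": 410, "p04": 395, "p05": 405, "p06": 400,
    "p07": 415, "p08": 385, "p09": 400, "p10": 410, "p11": 390, "p12": 1500,
}
toy_acc = {pid: 0.90 for pid in toy_rts}
toy_acc["p03"] = 0.62  # below the 70% threshold

print(flag_exclusions(toy_rts, toy_acc))
```

With very small samples, a single extreme value inflates the SD enough that it can never exceed a 2.5 SD cutoff, which is why the toy data set includes twelve participants.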

Procedure
Participants read sentences in French in a non-cumulative moving-window display (Just et al. 1982) as presented by the software Linger (Rohde 2003). Each trial included a non-moving discourse context followed by a sentence with non-space characters replaced by dashes. To read sentences, participants were asked to press the space bar at whatever pace suited them. The software recorded the duration between space-bar presses in milliseconds. After participants read a moving segment, the word changed back to dashes so segments could not be reread. To verify comprehension (and to test attentiveness), each sentence was followed by a true/false question, for which half of the answers were true, half false. Additionally, half of the questions focused on the context and half on the moving-window sentences. After each sentence, participants received automatic feedback on their accuracy on the comprehension questions. Instructions were presented in French and Spanish (on different pages) for the learners, and in French for the NSs. To familiarize the participants with the task, we included five practice items (participants received feedback). Trials were randomized per participant and presented in a Latin square; a given participant saw only one version, out of four conditions, per lexicalization/item.
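For readers unfamiliar with the paradigm, the non-cumulative moving-window display can be sketched as follows. This Python snippet is not Linger's implementation: it assumes word-by-word segmentation (the actual regions may have spanned several words), and the example sentence is invented rather than taken from the materials.

```python
import re


def moving_window_frames(sentence):
    """Yield one display string per word.

    In each frame only the current word is visible; every other non-space
    character is replaced by a dash, so earlier words cannot be reread.
    """
    words = sentence.split(" ")
    masked = [re.sub(r"\S", "-", word) for word in words]
    for i in range(len(words)):
        yield " ".join(words[j] if j == i else masked[j] for j in range(len(words)))


# Invented example sentence, shown frame by frame.
for frame in moving_window_frames("Marie le regarde souvent"):
    print(frame)
```

In an actual experiment, each frame would stay on screen until the next space-bar press, and the time between presses would be logged as the reading time for that segment.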

Materials
Excluding practice items, participants read 58 sentence/context combinations, of which 24 were experimental sentences and 34 were fillers. The context included at the outset of each item was vital to foreground the information in the test sentence; that way, participants would have the potential antecedents for each clitic pronoun (e.g., whether the antecedent was feminine/masculine, human/nonhuman). The inclusion of the context, however, did make the experiment a long one: L2 participants completed this portion of the study, on average, in 40 min (NSs completed the task in less than 30 min, on average). Participants were encouraged to take optional breaks, which were offered every 20 sentences.
The SPR task had a 2 × 2 × 2 design; the 24 experimental items (six items per cell) manipulated gender (masculine/feminine), [±Human] (human/nonhuman), and match (match/mismatch) as independent variables. Figure 3 illustrates a sample item (English translations were not shown to participants). Note that the "match" options come from the match or mismatch of gender features between the referent in the discourse (le passeport "passport" [masculine]) and the features in the accusative clitic ("le" [masculine]). In this sense, the mismatches used in the experiment were soft (discourse) violations, rather than instances of ungrammaticality. Based on the psycholinguistic literature, we expected that mismatched conditions would be read more slowly, but we did not expect longer RTs based on the [±Human] feature, since French and Spanish clitics are perfect analogues with respect to this feature. However, because French and Spanish have a complete overlap of function and form for the feminine clitic [la], we expected RTs to be somewhat slower for masculine referents if surface similarity played a role.
The SPR was not designed to test the participants' lexical knowledge of gender, which was always unambiguously marked on the determiner (e.g., sa "his/her"/la "the" vs. son "his/her"/le "the") in the (non-moving) context. As in Figure 3, the context in experimental items introduced a main character and two other potential referents (human or non-human, depending on the condition) of different gender. To control for phonological weight, Regions 4 and 6 all had the same number of syllables. Verbs (Region 5) were one- (n = 5) and two-syllable (n = 7) core transitive verbs in the third-person singular present tense (regarde "looks", cherche "looks for", comprend "understands", choisit "chooses", dessine "draws", décrit "describes", déteste "hates", touche "touches", pousse "pushes", laisse "leaves", surveille "monitors", voit "sees").
Each verb was repeated twice.

Results
To account for differences in word length and individual reading speeds, we length-adjusted RTs using Fine et al.'s (2013) procedure. Then, because RTs were positively skewed (typical of RTs elicited with this type of task), we log-transformed the residuals. This procedure has been used successfully in previous psycholinguistic investigations (e.g., Kim 2018). We then analyzed these length-adjusted log-transformed RTs (henceforth logRTs) using mixed models for each group (L1 vs. L2), based on the reasoning offered earlier. For each model, we included three within-subjects fixed factors: whether the referent was [±Human] (human/non-human); whether the clitic constituted a match with the discourse antecedent (match/mismatch); and whether the gender of the clitic was masculine or feminine.
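As a rough illustration of what length adjustment does, the sketch below regresses one participant's log RTs on word length and keeps the residuals. The function name, the toy data, the per-participant ordinary least-squares fit, and the choice to log-transform before residualizing are all assumptions made for illustration only; the sequence of steps in the procedure cited above may differ in its details.

```python
import math


def length_adjusted_log_rts(word_lengths, raw_rts_ms):
    """Residuals of log(RT) regressed on word length (intercept + slope).

    Assumption: data come from a single participant, so the fit absorbs
    that participant's overall reading speed as well as the length effect.
    """
    log_rts = [math.log(rt) for rt in raw_rts_ms]
    n = len(word_lengths)
    mean_len = sum(word_lengths) / n
    mean_log = sum(log_rts) / n
    sxx = sum((x - mean_len) ** 2 for x in word_lengths)
    sxy = sum(
        (x - mean_len) * (y - mean_log) for x, y in zip(word_lengths, log_rts)
    )
    slope = sxy / sxx
    intercept = mean_log - slope * mean_len
    # Positive residual: read more slowly than predicted from length alone.
    return [y - (intercept + slope * x) for x, y in zip(word_lengths, log_rts)]


# Toy data for one hypothetical participant (not taken from the study).
lengths = [2, 5, 7, 3, 8, 4]
rts = [310.0, 420.0, 515.0, 350.0, 600.0, 380.0]
adjusted = length_adjusted_log_rts(lengths, rts)
print([round(r, 3) for r in adjusted])
```

Because the residuals are centered on zero for each participant, values can be compared across regions that contain words of different lengths.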
Because our experimental design was repeated-measures, we included random effects by participant and by item. Following Barr et al. (2013), we fit the maximal converging random-effects structure using a top-down stepwise method. As a departure point, we included all possible random by-subject and by-item slopes, plus their intercepts and interactions. When models did not converge, we removed the random slope that accounted for the least variation. This was an iterative process that continued until the model converged. Alpha was set at 0.05. Once again, we used the lmer function from the lme4 package (Bates et al. 2015).
Figures 4 and 5 show the logRTs for NSs (L1 French) and L2 learners of French (L1 Spanish), respectively, plotted by gender, [±Human], and match (error bars represent standard error). We expected the effects to show up at the critical region (Region 5), which corresponded to the verb. As is typical in self-paced reading investigations, however, we also expected to analyze subsequent regions for spill-over effects. This is especially common for L2 learners, who typically read more slowly than natives (Dekydtspotter et al. 2006).
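The top-down simplification of the random-effects structure described above follows a simple loop, sketched here in schematic form. The `fit` callable, the term names, and the toy variances are hypothetical stand-ins: in practice each step would refit the model with lmer and read off the estimated random-slope variances.

```python
def simplify_random_effects(slopes, fit):
    """Drop the weakest random slope until the model converges.

    `fit` is a hypothetical callable that takes the current list of
    random-slope terms and returns (converged, variances), where
    `variances` maps each term to its estimated variance.
    """
    terms = list(slopes)
    while True:
        converged, variances = fit(terms)
        if converged or not terms:
            return terms, converged
        # Remove the slope that accounts for the least variation.
        weakest = min(terms, key=lambda term: variances[term])
        terms.remove(weakest)


# Toy stand-in for a real model fit: it pretends that the model converges
# once at most two random slopes remain. The variances are invented.
TOY_VARIANCES = {"match": 0.020, "human": 0.004, "gender": 0.011, "match:human": 0.001}


def toy_fit(terms):
    return len(terms) <= 2, {term: TOY_VARIANCES[term] for term in terms}


kept, converged = simplify_random_effects(list(TOY_VARIANCES), toy_fit)
print(kept, converged)
```

Starting from the fullest structure and pruning only on non-convergence keeps the model as close to maximal as the data allow.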
At the critical region (Region 5, the verb), the model for the NSs revealed no main effects, but it did reveal a gender*[Human] interaction (Table 7). Follow-up pairwise comparisons using paired-samples t-tests in emmeans in R (Tukey HSD) revealed that [+human] referents were read faster than [−human] referents, but only in the masculine conditions (t = −2.150, p = 0.0409). No other contrasts were significant. The model for L2 learners did not have any main effects or interactions (Appendix A); L2 participants did not make significant distinctions between matched and mismatched referents, irrespective of whether the referent was human, nonhuman, masculine, or feminine.
At the post-verbal region (Region 6), the model for NSs (Appendix B) unveiled an overall marginal human*gender interaction (p = 0.0827), indicating that the differences in logRTs between human and nonhuman referents varied according to gender (masculine, feminine). Because the interaction was marginal, we did not conduct post-hoc pairwise comparisons, although inspecting Figure 4 (Region 6) reveals that masculine referents were read faster when they were human and slower when non-human, while this trend was reversed for feminine referents. The model for L2 learners revealed no main effects or interactions (Appendix C).
Finally, although it is not typical to analyze the last region in SPR investigations, Figures 4 and 5 show a pronounced delay in the onset of effects. Our sentences were relatively short, so a delayed effect would not be unusual. However, we realize that disentangling wrap-up effects, which are not completely understood, is impossible with our design. We will address the potential consequences of this design feature in the discussion of our limitations. The model for the L1 group is presented in Table 8, while the model for L2 learners is presented in Table 9. The model for NSs at Region 7 revealed a significant main effect of match (p = 0.0475); no other effects or interactions reached significance. Thus, native French speakers were able to distinguish between matched and mismatched referents, although that decision was not influenced by the status of the referent as [±Human] or by the clitic's gender. The model for the L2 learners revealed only a significant three-way interaction (match*[Human]*gender). Follow-up pairwise comparisons using paired-samples t-tests (emmeans in R; Tukey HSD) revealed a significant contrast between human and nonhuman referents when matched for gender (t(30.4) = −2.703, p = 0.0111) and a marginal effect when there was a gender mismatch (t(32.2) = −1.931, p = 0.0624).
Similarly, there was a significant contrast between human and nonhuman referents for masculine (t(31.9) = −2.046, p = 0.0490) and feminine (t(31.1) = −2.584, p = 0.0147) clitics. No other contrasts were significant.
Note: Significant results are marked with an asterisk (*).
Lastly, we examined the data by proficiency, treated as a continuous variable, to determine whether it had an impact on match. We ran a generalized linear mixed-effects regression with (centered) proficiency as a fixed-effect predictor and the maximal random-effects structure (RES) that would converge on the data set (by-subject and by-item slopes, plus their intercepts). This model did not yield any effects related to proficiency (Appendix D), suggesting that accuracy on the task did not increase with higher proficiency.
In summary, the SPR task yielded several important findings. First, as expected, effects were delayed, especially for the L2 group. Second, the NSs made a relevant distinction between matched and mismatched referents, although this result was not affected by the other factors (gender/[±Human]). The L2 learners, on the other hand, evinced a complex three-way interaction (gender*[Human]*match), whereby they read human and nonhuman referents at different speeds (to different degrees) when referents were matched for gender. They made this distinction within both masculine and feminine clitics. In short, while NSs made the expected distinctions between matched and mismatched referents, the L2 learners were guided by the [±Human] feature, even though their L1 and L2 operate in analogous ways.

General Discussion
As we mentioned at the outset, the goal of our study was two-fold. First, we aimed to determine whether formal similarity between two languages (operationalized via the Feature Reassembly Hypothesis, which is couched in the generative tradition) would allow adult L2 learners of French (Spanish NSs) to straightforwardly acquire third-person singular accusative clitics in their L2 (RQ1). We focused on Spanish-French bilinguals because, in terms of formal similarity, both languages encode the same features in analogous ways regarding accusative clitics. Additionally, following insights from usage-based research on bilinguals, we tried to determine the role played by surface similarity because French and Spanish overlap and diverge in interesting ways; we also aimed to determine whether the difference in the actual form of the masculine (but not feminine) clitics could affect results (RQ2).
To answer these questions, we used an offline picture selection task and an online self-paced reading task. The results of both tasks showed that, although French and Spanish encode features in similar ways, Spanish-French bilinguals evinced non-trivial difficulties in their interpretation and processing of French accusative clitics. With regard to our research questions, the data from both tasks showed that the answer to RQ1 was negative: Although there was nothing to reassemble, the acquisition of these clitics did not proceed straightforwardly, contrary to the predictions of the Feature Reassembly Hypothesis.
The picture selection task showed that the overall accuracy of the learners was relatively low (64.57% overall, compared to 96.86% for the L1 group), while also showing that learners were more accurate with feminine clitics, but only where these were [+human]. Put differently, learners were more accurate with [+human] referents, but only when these were feminine. This result lends partial support to the notion that surface-level similarity may play a role, since the feminine clitics share the same form in both languages ("la" [la]), while the masculine clitics differ both in written form and sound ("le" [lə] in French, "lo" [lo] in Spanish). Additionally, we did observe proficiency effects on overall accuracy, which suggests that the developmental trajectory is positive.
The results of the self-paced reading task also showed that learners did not process accusative clitics in the same way as natives, and this difference was not only quantitative. Although the learners evinced a delayed effect, compared to NSs, they were also influenced by the gender of the clitic and whether the referent was [±Human] such that they made a distinction between [+human] and [−human] referents within the matched conditions but also for both masculine and feminine referents. The native French speakers' processing of these clitics, however, was not guided by either gender or the feature [±Human] because the only relevant distinction was between (gender-) matched or mismatched referents. These results again challenge the Feature Reassembly Hypothesis, although they do not offer unequivocal support to the notion that surface-level similarity matters: This issue was present in both the feminine and masculine conditions. Together, the results from both experiments showed that Feature Reassembly is not the only factor in the acquisition of object clitics: Even when there is no need for reassembly, that does not entail convergence (RQ1). In terms of surface-level similarity (RQ2), we have partial support from our offline task that there might be a facilitatory effect when formal and surface-level similarity coincide. However, our investigation does leave several open questions, the most pressing of which is perhaps this: Given the overwhelming similarity between French and Spanish feminine accusative clitics, why do we not observe convergence?
One interesting feature of both participant groups is their degree of multilingualism. As shown earlier, all participants, irrespective of group, reported learning English. This is not surprising because in Mexico, exposure to English is, at least nominally, compulsory under a program known as El Programa Nacional de Inglés (National English Program) sponsored by the Mexican Ministry of Public Education (Secretaría de Educación Pública). In practice, however, Spanish-English bilingualism is far from universal. This is relevant because, as mentioned in the introduction, English differs from both Spanish and French in terms of encoding the [±Human] feature, since the English pronominal system does encode [±Human] (e.g., he/she vs. it). Previous research with Anglophone learners of French revealed differences in reading time patterns depending on the value of the [±Human] feature (Shimanskaya and Slabakova 2017). Thus, we cannot rule out that the L2 learner group could be influenced by the other languages in their repertoire. This would require, however, that they had actually acquired the English pronominal system, which is far from an easy task. Given the ubiquity of English as an L2 in Mexico, future research should control for learners' knowledge of the English pronominal system.
An additional open question from the picture selection task is the higher accuracy rates with feminine referents, especially when these are [+human]. We noted earlier the potential role of surface-level similarity, but there might be additional explanations. Recall that, in terms of gender, French NPs bear a gender feature that can be specified as [feminine]. [Masculine], on the other hand, does not have to be individuated because it is not assumed to be represented in the feature matrices. Thus, it is possible that the lower accuracy with masculine clitics is related to their underspecification for gender [Gender: ∅]. Given our particular language combination, this is a possibility we cannot discard.
We noted that researchers such as Hopp (2013) have proposed that lexical knowledge of gender might be distinguished from how gender is processed online. In theory, then, this could constitute an explanation for our results. In our materials, we did not match the gender of the referents across the two languages because the gender was unambiguously marked in the determiner in the context for both tasks. Moreover, after examining our materials, we note that most NPs were of the same gender in both languages. In the SPR, only six referents (out of the 48 introduced) did not match in gender: With one exception (grille-pain [masc. French], tostadora [fem. Spanish] "toaster"), these referents were all feminine in French but masculine in Spanish. In the picture selection task, only two of the referents (le pamplemousse "the grapefruit", le sac à dos "the backpack") were not matched, since they were masculine in French but feminine in Spanish (la toronja "the grapefruit", la mochila "the backpack"). Thus, even if learners had missed the gender in the context, this is an unlikely explanation for our results.
No study is without limitations, and ours is no exception. As mentioned earlier, our study was very long overall (almost two hours of testing), and we tried to shorten the duration by including shorter test sentences in the SPR. This meant the delayed effects occurred in the last region, which is not typically analyzed because wrap-up effects are not well understood (Just et al. 1982). We will note, however, that we are not the first to analyze this region (Tokowicz and Warren 2010); moreover, our data (Figures 4 and 5) show that the effects were quite delayed, which is not surprising because our regions were very small, often including a single word. Future research should take this into account by ensuring that sentences have a sufficient number of regions after the critical one.

Conclusions
Our study aimed to determine whether formal similarity between two languages (operationalized via the Feature Reassembly Hypothesis) would allow adult L2 learners of French (Spanish NSs) to straightforwardly acquire third-person singular accusative clitics in their L2. Additionally, as suggested by usage-based accounts, we examined the role of surface similarity, since French and Spanish overlap and diverge in several ways. Results from a forced-choice picture selection task and a self-paced reading task could not support the Feature Reassembly Hypothesis, as our learners showed considerable difficulty with the interpretation and processing of these pronouns, revealing that, unlike French NSs, their interpretations and processing are guided by the feature [±Human] and, to a lesser degree, by gender, which might be due to the surface-level similarity between feminine accusative clitic pronouns in both languages.
Author Contributions: All authors were involved in the conceptualization, methodology, analysis, and writing. All authors have read and agreed to the published version of the manuscript.

Informed Consent Statement: Informed consent was obtained from all subjects involved.
Data Availability Statement: Data will be available upon request without undue reservation.

Acknowledgments:
We are thankful to Roumyana Slabakova for her help with data collection. We are also indebted to the Alianza Francesa campus Puebla for their help.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A

Notes
1 In Spanish, clitics can be attached to an infinitive or a gerund after a reconstruction verb. However, this discussion is outside of the scope of our investigation.
2 DOM stands for "Differential Object Marker," which is used in Spanish to mark objects when these are [+specific] and [+human].
3 Although our study does not focus on dative pronouns, we will point out that the situation is analogous for dative clitics because both French and Spanish use forms that are not inflected for gender (lui in French, le in Spanish; see Table 1).
4 Our presentation leaves aside the phenomenon known as leísmo, the use of the dative clitic le instead of the (etymological) accusative pronouns (lo/la). This is because Mexican Spanish is not considered a leísta dialect and instances of leísmo are viewed as representing intensifying verbal affixes instead (Cacoullos and Hernández 1999). While our French NSs might have come in contact with some leísta varieties due to proximity with Spain, only a few were Spanish learners and, crucially, we test only their native French.
5 Following our IRB protocol, participants could skip questions. Two participants (L2 group) did not answer all demographic questions. Our background questionnaire did not allow us to determine whether learners learned English or French/Spanish first. The information from two participants is missing in some categories, which is why the totals do not reach 100%.