Language Learning in the Wild: The L2 Acquisition of English Restrictive Relative Clauses

Levey, Stephen; Rochon, Kathryn L.; Kastronic, Laura

doi:10.3390/languages10090232

by Stephen Levey ¹,*, Kathryn L. Rochon ¹ and Laura Kastronic ²

¹ Department of Linguistics, University of Ottawa, 70 Laurier Ave East, Ottawa, ON K1N 6N5, Canada
² Atlantic Canada Opportunities Agency, 644 Main St, Moncton, NB E1C 9J8, Canada
* Author to whom correspondence should be addressed.

Languages 2025, 10(9), 232; https://doi.org/10.3390/languages10090232

Submission received: 4 July 2025 / Revised: 2 September 2025 / Accepted: 5 September 2025 / Published: 10 September 2025

Abstract

We argue that quantitative analysis of community-based speech data furnishes an indispensable adjunct to theoretical and experimental studies targeting the acquisition of relativization. Drawing on a comparative sociolinguistic approach, we make use of three corpora of natural speech to investigate second-language (L2) speakers’ acquisition of restrictive relative clauses in English. These corpora comprise: (i) spontaneous L2 speech; (ii) a local baseline variety of the target language (TL); and (iii) L2 speakers’ first language (L1), French. These complementary datasets enable us to explore the extent to which L2 speakers reproduce the discursive frequency of relative markers, as well as their fine-grained linguistic conditioning, in the local TL baseline variety. Comparisons with French facilitate exploration of possible L1 transfer effects on L2 speakers’ production of English restrictive relative clauses. Results indicate that evidence of L1 transfer effects on L2 speakers’ restrictive relative clauses is tenuous. A pivotal finding is that L2 speakers, in the aggregate, closely approximate TL constraints on relative marker selection, although they use the subject relativizer who significantly less often than their TL counterparts. We implicate affiliation with, and integration into, the local TL community as key factors facilitating the propagation of TL vernacular norms to L2 speakers.

Keywords: L2 speech; restrictive relativization; comparative sociolinguistics

1. Introduction

Relative clause constructions have attracted sustained scholarly interest for over half a century (Roland et al., 2007), with studies addressing a broad range of questions in first language acquisition (e.g., Diessel & Tomasello, 2000, 2005); second and third language acquisition (e.g., Flynn et al., 2004); adult sentence processing (e.g., Gibson, 1998); language typology (e.g., Keenan & Comrie, 1977); as well as grammatical theory (e.g., Alexiadou et al., 2000).¹ Much research and theorizing addressing the acquisition of those constructions is articulated from a cognitive perspective (Doughty & Long, 2003), buttressed by an experimental infrastructure involving grammaticality judgements, sentence-combination tasks, and elicitation and act-out tasks, among a host of other techniques.

In contrast with experimental research on relativization, the bulk of which has targeted perceptual issues relating to processing and decoding strategies (Romaine, 1984; Macdonald, 2015), corpus-based investigations of relative clauses in natural speech production data are much less common (but see e.g., Diessel & Tomasello, 2000; Ghafar Samar, 2000; Yip & Matthews, 2007). The fact that relative clauses are rare in running discourse (Milroy & Gordon, 2003) may explain the predilection for data elicited and analyzed in experimental conditions, although the extent to which findings generated in tightly controlled laboratory settings accurately reflect what transpires in real-world contexts remains a moot point (Jaeger, 2010; Speed et al., 2018). These concerns are symptomatic of the very issues that inspired the current research, in keeping with our goal of contributing to socially sensitive and ecologically valid models of L2 acquisition. We argue that the sociolinguistic investigation of everyday speech is a necessary complement to experimental research on L2 acquisition, as it can refine and subtly enhance our understanding of how relative clauses are acquired, precisely because such an approach engages with, rather than abstracts away from, the inherent variability endemic to natural speech situated in its social context (Labov, 1972a). Because this variability provides critical insights into the nature, extent, and limits of the L2 acquisition process, its correct characterization is of paramount importance in constructing theories of L2 acquisition that are accountable to actual usage facts.

Major incentives for corpus-based studies of relativization have come from constructivist and usage-based investigations of language acquisition (see e.g., Diessel & Tomasello, 2000, 2005; Fox & Thompson, 2007; Wiechmann, 2015). These studies have catalyzed interest in arriving at a better understanding of the relative clause constructions that language learners encounter and acquire as a result of exposure to spontaneous speech. If, as usage-based theories posit, grammatical knowledge is predicated on speakers’ linguistic experience (Bybee, 2010), then it follows that the community-based speech varieties of the TL to which L2 speakers are exposed have the potential to afford insights into the structural biases in the TL input that intimately shape the acquisition process. Both naturalistic and experimental research on L1 acquisition has been instrumental in highlighting the effect of input frequencies on the development of relative clause constructions in child language (Lieven, 2010). As we show below, relative clause constructions in L2 acquisition are impacted by TL input frequency patterns too (see also Mellow, 2006).

Among the key motivations for applying corpus-based approaches to informal social settings is the need to extend the purview of L2 research beyond formal language-learning environments to less formal ones (Bayley & Tarone, 2012), including naturalistic contexts, where the amount, frequency and type of input which learners encounter are said to be far less restricted than in classroom settings (Montrul, 2020). Sankoff et al. (1997, p. 193) argue that if L2 acquisition is more than the product of successful classroom-based learning, then it should exhibit properties of the TL vernacular which are not ordinarily transmitted to learners in academic settings, but are internalized by L2 speakers who have a high degree of contact with the TL community. The quantitative approach employed in the present study is ideally suited to elucidating vernacular patterns in L2 speech and ascertaining whether they are the product of vernacular transmission from the TL, the result of processes that are unique to L2 speech (i.e., interlanguage grammar; see Meyerhoff & Schleef, 2012), or the outcome of transfer effects from L2 speakers’ native language.

Of the different kinds of relative clause that have attracted scholarly attention, the ones we privilege here are restrictive relative clauses. Previous treatments have sought to determine whether the L2 acquisition of these constructions is subject to cross-linguistic influence (e.g., Gass, 1979; Ghafar Samar, 2000; Rochon, 2023), and whether the L2 acquisition of English relative clauses is sensitive to the gradience of difficulty associated with the noun phrase accessibility hierarchy originally posited by Keenan and Comrie (1977). With the exception of Ghafar Samar (2000) and Rochon (2023), however, dedicated variationist investigations of the L2 acquisition of restrictive relativization, based on natural production data, are all but non-existent. One of the goals of the current study is to address this lacuna.

Following Huddleston and Pullum (2002, p. 1035), we construe restrictive relative clauses as ones which delimit the denotational reference of the head nominal they modify. These constructions, reproduced from the discourse of L2 speakers of Canadian English, are exemplified in (1)–(3) below:

(1): There- there’s one guy that literally laughed at me the first time he saw me (L2/002/428)
(2): It was literally the best decision Ø I’ve made (L2/004/33)
(3): And I’m a person who likes to sleep in the morning (L2/013/324)²

We target the variable strategies, alternating between that, zero and WH-forms (see (1)–(3) above), for marking restrictive relative clauses, drawing on vernacular speech. This type of speech is deemed to be the style “which is most regular in its structure” (Labov, 1972a, p. 112), offering “the most systematic data for linguistic analysis” (Labov, 1984, p. 29). As such, it is particularly valued for its potential to reveal community norms. We emphasize that a community focus is critical to the investigation of restrictive relativization because the marking system of restrictive relative clauses in English is considered to be “notoriously variable” (Britain, 2020, p. 95). Inter-community differences in marking preferences are believed to indicate the general absence of a vernacular ‘norm’ in the constitution of the restrictive relative marker paradigm, as well as in terms of the distribution of the markers themselves (see Ball, 1996, p. 243).

A major corollary that ensues from the heterogeneous marking system of restrictive relative clauses in English is that it is essential to establish exactly what L2 speakers are exposed to, rather than simply intuiting the nature of the input (see also Tomasello, 2003, p. 112). A cornerstone of the present investigation is the detailed comparative framework we bring to bear on the L2 acquisition of relative clauses, enabling us to examine them from multiple vantage points. Our research design incorporates three complementary datasets: one representing spontaneous L2 English recorded from Canadian francophones in the Canadian National Capital Region between 2018 and 2022; a second corpus of vernacular speech recorded from native Canadian anglophones in the same locality, representing a local baseline variety of the TL; and a third corpus of vernacular Canadian French obtained from a subset of the L2 speakers we recorded.

Our comparative framework enables us to address: (i) whether L2 speakers of English use the same relative markers (or relativizers) as TL speakers to introduce restrictive relative clauses; (ii) whether L2 speakers use individual relative markers at rates which match their discursive frequency in the local TL variety; and, crucially, (iii) whether L2 speakers reproduce in whole, or in part, the fine-grained linguistic conditioning governing variable relativizer selection in the corresponding TL baseline variety (see Rehner & Mougeon, 2022). We stress that our comparison of L2 and TL speaker cohorts is intended as a heuristic only (see White, 2003), and is not meant to imply that the L2 system is an “incomplete” or “lesser” version of its TL counterpart (Bley-Vroman, 1983), or, indeed, that native-like mastery of TL grammar is necessarily the desired target for every L2 speaker (see Nagy et al., 2003).

Another key component of our research design involves systematic comparisons of L2 speech and L1 French, enabling us to identify and characterize any evidence of cross-linguistic influence from speakers’ native French on their L2 restrictive relativization strategies. Relativization is considered to be “a vulnerable area” for contact effects (Muysken, 2012, p. 238), and this possibility is believed to be enhanced when contact varieties share multiple typological similarities (Thomason, 2001), as is the case in the present study. If transfer effects are operative in the L2 speech investigated here, our comparative variationist framework should enable us to detect them.

2. Theoretical Considerations

Our investigation is based on the premise that speech is inherently variable, yet rule-governed and structured (Weinreich et al., 1968). The major analytical construct at the heart of variationist sociolinguistics is the linguistic variable, defined as alternative ways of expressing the same referential meaning or a similar grammatical function (Labov, 1972a). A key methodological requirement in defining the variable context hosting competing variants involves the commitment to accountable reporting (Labov, 1972a). This commitment enjoins the analyst to take into consideration all the relevant forms within the same envelope of variation, including variants that are normatively sanctioned as well as those that are not.

Inspection of experimental research designs often reveals an implicit reliance on the (highly idealized) norms of the standard language as the primary source of information about the canonical format and contexts of use of relative clauses in the target variety under investigation (see Ghafar Samar, 2000; Levey, 2014; Romaine, 1984). An internalized normative predisposition has led experimentalists and theoreticians to claim, for example, that the zero relativizer is only possible in English when the embedded sentence is a non-subject relative clause (e.g., the woman Ø I saw) and that the same variant is ungrammatical in any syntactic environment in French (Hawkins, 1989, p. 162). Similarly, French oblique relative clauses (e.g., la maison dans laquelle j’habite ‘the house in which I live’) are said to categorically require pied-piping, comprising a preposition followed by an overtly-expressed relative pronoun (Duffeler, 2017, pp. 16, 60; Guasti & Shlonsky, 1995, p. 262). In the same vein, preposition stranding in French relative clauses, a shibboleth of North American varieties, is deemed not possible (Labelle, 1990, p. 101). Yet inspection of actual usage data, as shown below, is at odds with such claims. Failure to respect the principle of accountability (Labov, 1972a) restricts our understanding of the acquisition process because it places predetermined limits on the forms and constructions which learners are believed to be exposed to.

Perhaps the most important consequence of accountable reporting resides in the window it affords on the structure of variation. Variation is habitually constrained by multiple factors relating to both the linguistic and social contexts in which variable features compete. The structured nature of the variable system underlying this competition can be inferred from variant distributions and their associated conditioning (i.e., the configuration of social and linguistic factors governing their selection). The various structural factors that determine variant selection operate probabilistically and are therefore amenable to quantitative analysis. Using statistical modelling, the relative contribution of an individual factor to variant choice (e.g., the contribution of an inanimate antecedent NP to the selection of a particular relative marker) is expressed as a probability value. Within a particular factor group, or predictor, the ordering of probability values from largest to smallest constitutes the hierarchy of constraints. When interpreted in the aggregate, the hierarchy of constraints associated with the multiple factors conditioning variant choice functions as a snapshot of the ‘grammar’ or structure underlying variable surface realizations (Poplack, 2011, p. 215).

To assess the L2 acquisition of restrictive relative clauses in the TL baseline variety, we compare variable structure across L2 and TL speaker cohorts. Of particular importance in this comparative exercise are conflict sites, or areas of functional, structural and/or quantitative differences between comparison varieties (Poplack & Meechan, 1998, p. 132). Because conflict sites tend to be variety-specific, they can be used to assess the extent of quantitative and qualitative differences between L2 and TL speech and exploited as fine-grained diagnostics of L2 acquisition.

Acquisition of TL norms by L2 speakers necessarily entails reproduction, or, perhaps more realistically, approximation of TL variable patterns (Geeslin & Long, 2014). The magnitude of this learning task is such that L2 speakers are routinely claimed to fall short of replicating in fine detail the full suite of constraints operating on variable phenomena in the corresponding TL baseline variety. Evidence to that effect typically emerges from the use of non-target-like forms and constructions, and/or from the incomplete acquisition, or reconfiguration, of TL usage constraints (see Howard et al., 2013; Schleef et al., 2011; Schleef, 2017). Reconfiguration of TL usage constraints by L2 speakers, visible in the re-ordering of probability values in the hierarchy of constraints vis-à-vis the order found in the TL baseline, may be diagnostic of the variable learner system, or interlanguage (Selinker, 1972), which develops during the process of acquiring a second language. The variationist approach we utilize here, capable of detecting even minor adjustments in constraint hierarchies conditioning variant choice, enables different learning outcomes to be accurately discriminated.
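The comparison of constraint hierarchies across cohorts can be pictured with a short sketch. It orders the levels of a single factor group by probability value and checks whether two cohorts share the same ordering. The function names and every numeric value are invented for illustration; they are not results from the study, and the statistical model that would supply such values is not shown.

```python
def constraint_hierarchy(probabilities):
    """Return factor levels ordered from largest to smallest probability."""
    return sorted(probabilities, key=probabilities.get, reverse=True)


def compare_hierarchies(tl, l2):
    """Compare the ordering of factor levels across two speaker cohorts."""
    tl_order = constraint_hierarchy(tl)
    l2_order = constraint_hierarchy(l2)
    return {
        "tl_order": tl_order,
        "l2_order": l2_order,
        "same_order": tl_order == l2_order,
    }


# Invented probability values for one factor group (antecedent animacy).
# These are placeholders, NOT findings reported in the article.
tl_example = {"human": 0.30, "non-human animate": 0.55, "inanimate": 0.65}
l2_matching = {"human": 0.25, "non-human animate": 0.52, "inanimate": 0.70}
l2_reordered = {"human": 0.25, "non-human animate": 0.70, "inanimate": 0.52}

print(compare_hierarchies(tl_example, l2_matching)["same_order"])   # True
print(compare_hierarchies(tl_example, l2_reordered)["same_order"])  # False
```

On this reading, a matching order would count as approximation of the TL constraint, whereas a re-ordering would be a candidate case of reconfiguration; differences in the size of the values are deliberately ignored here.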

Among the key determinants of the outcomes of L2 acquisition, extra-linguistic factors figure prominently. These factors typically encompass individual L2 aptitude and proficiency, as well as an array of psychological, motivational and attitudinal considerations (Howard et al., 2013). Among the extra-linguistic parameters that we pay particular attention to here are L2 proficiency as well as L2 speakers’ contact with the local TL community. Since we are targeting here a longstanding bilingual community, characterized by protracted contact between French and English, our working hypothesis is that the context of L2 acquisition amply satisfies the social, cultural and linguistic preconditions believed to be optimal for establishing membership in the TL community. Sustained exposure to and engagement with the TL community are widely recognized to be conducive to high levels of L2 proficiency, enabling L2 learners to attain near-native-like levels of use of TL variables (see Blondeau et al., 2002; Howard et al., 2013; Sankoff et al., 1997).

Recognizing, however, that even relatively advanced L2 speaker groups may subsume a range of individual proficiency levels, we turn in the following section to sampling considerations and our methods for assessing L2 speaker abilities, enabling proficiency to be factored into our research design.

3. Data and Choice of Speakers

The community-based speech data we analyze here were collected in the Canadian National Capital Region including the cities of Ottawa (Ontario) and Gatineau (Quebec). As the site of prolonged contact between English and French, this metropolitan area is deemed a “natural laboratory for language contact” (Poplack, 1989, p. 413). On one side of the provincial border in Ontario, where English is the majority language, approximately 58% of the population of the city of Ottawa claim English as a mother tongue, contrasting with 12.5% French mother-tongue claimants (Statistics Canada, 2021). Conversely, on the other side of the provincial border in Quebec, where French is the designated majority (and official) language, the city of Gatineau comprises 71% French mother-tongue speakers, with some 12% of residents declaring English as their mother tongue (Statistics Canada, 2021).

A fundamental requirement underpinning our compilation of a corpus of spontaneous L2 English speech was that L2 speakers should be sampled from the local native francophone population in the Canadian National Capital Region. They were also expected to have acquired Canadian French as their primary language in childhood from native francophone parents/caregivers. A further requirement was that native francophones should have completed their mandatory schooling in French-speaking educational establishments. In line with our dedicated focus on L2 acquisition, speakers who had been raised bilingually (i.e., French and English) from birth were not eligible for inclusion in the study.

Between 2018 and 2022, we recorded a total of 29 speakers meeting our sampling requirements. Table 1 below shows the distribution of sample members by age and speaker sex.

A self-report language background questionnaire was administered to all L2 speakers in order to gather key information relating to their acquisition of English as a second language, and to assess their degree of contact with the local anglophone TL community. Information abstracted from this questionnaire was used to develop a comprehensive profile characterizing each speaker’s acquisitional history by examining their exposure to formal instruction in English; the language (French, English) used most often in daily life; frequency of English-language use at home, at work and in neighbourhood of residence; as well as the estimated proportion of anglophones in personal social networks.

Most speakers reported having begun formal instruction in English during grades three or four (i.e., between the ages of 8 and 10) of their mandatory schooling. At the time of the recordings, only three speakers reported that they used English more often than French in their daily lives, especially in work or study environments. English language use was least commonly reported in domestic settings, where French prevailed.

Only two L2 speakers, constituting just 7% of the sample, reported having no anglophones in their personal social networks. By contrast, 55% of L2 speakers (N = 16) reported that native anglophones comprised 50% or more of their individual social networks, with a further 28% (N = 8) estimating that 25–50% of their social networks were made up of anglophones.

Also of relevance to the contact dimension is the fact that 20 speakers (69% of the sample) resided in neighbourhoods in Ottawa, located on the Ontario side of the provincial border, where English is the majority language. This inevitably resulted in some degree of exposure to, and interaction with, anglophones, regulated by the varying ratios of francophones to anglophones in individual neighbourhoods of residence (see Poplack, 2018a, pp. 31–33).

One general proficiency requirement imposed at the outset on all L2 speakers was the ability (and willingness) to participate in a recorded sociolinguistic interview, the standard methodological tool used for eliciting lengthy extracts of casual speech (Labov, 1984). Recordings were conducted in English with a native English-speaking interviewer, who introduced topics of interest to L2 speakers as the basis for extended discussion. The interview protocol was expressly intended to encourage L2 speakers to take the lead in the interaction, with minimal intervention from the interviewer. No other data-gathering instruments were used, allowing L2 speakers the freedom to express themselves as they pleased, and to use vernacular structures as little or as much as desired. Interviews lasted an average of 55 min, testifying to relatively elevated levels of English-language fluency in the L2 speaker cohort. The recorded data were subsequently transcribed, culminating in the creation of a fully searchable corpus of natural L2 data comprising some 277,000 words.

Capitalizing on procedures innovated and elaborated in previous language contact research (Poplack, 2018a; Torres Cacoullos & Travis, 2018), we computed a Cumulative English Proficiency Index (CEPI) score for each L2 speaker. Individual CEPI scores, to be interpreted relative to each other rather than as absolute, global indices of proficiency, were calculated from: (i) speaker self-assessments of English-language proficiency targeting production and comprehension; (ii) scalar responses to questions concerning contextual and situational uses of English (e.g., at home, at work, in the local neighbourhood of residence, for the purposes of socializing, etc.); and (iii) content analysis of L2 speaker production data.

Scalar responses relating to English-language proficiency and contextual uses of English were calculated by assigning a score from zero to ten for each assessed category. Content analysis of speech production data focused on discrete-point measures of lexical usage, including word-searching difficulties (e.g., where L2 speakers overtly indicated that they could not find the ‘appropriate’ English word), in addition to targeting morpho-syntactic features such as the variable omission of the English plural affix −s and the possessive −s morpheme, as well as the variable inflexion of present-tense verbs in the third person. We construe the non-negligible use of such variable features (absent from the TL baseline variety) to reflect developmental characteristics of L2 speakers’ interlanguage. Mean scores based on content analysis of each individual’s transcribed recording were calculated for each speaker. A speaker who made non-negligible use of an array of interlanguage features, as described above, would score less highly in terms of content analysis than one who had a more limited repertoire and used those features less frequently in their discourse.

Cumulative proficiency scores were derived for each L2 speaker from the various (weighted) complementary measures described in (i) to (iii) above. The resultant scores range from a low of 0.450 to a high of 0.863. These scores enabled us to subdivide the L2 sample into four unevenly constituted proficiency bands, as shown in Table 2 below.
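The article does not report the weights or the exact scaling used to combine measures (i) to (iii), so what follows is only a sketch of how a composite index of this general kind could be computed. The function name `cumulative_index`, the equal default weights, the rescaling of 0–10 responses to a 0–1 range, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def cumulative_index(self_assessment, contextual_use, content_score,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted mean of three component scores, each on a 0-1 scale.

    self_assessment, contextual_use: lists of scalar responses (0-10).
    content_score: content-analysis score, assumed to lie between 0 and 1.
    weights: hypothetical weights; the article does not report the real ones.
    """
    components = (
        mean(self_assessment) / 10,  # rescale 0-10 responses to 0-1
        mean(contextual_use) / 10,
        content_score,
    )
    for value in components:
        if not 0 <= value <= 1:
            raise ValueError("component scores must fall between 0 and 1")
    weighted = sum(w * c for w, c in zip(weights, components))
    return weighted / sum(weights)


# Invented responses for one hypothetical speaker.
score = cumulative_index(
    self_assessment=[8, 9],        # production, comprehension
    contextual_use=[2, 7, 5, 6],   # home, work, neighbourhood, socializing
    content_score=0.9,
)
print(round(score, 3))  # 0.75
```

Because each component is bounded between 0 and 1, the composite is too, which is at least compatible with the reported range of 0.450 to 0.863. As in the article, such scores are best read relative to one another rather than as absolute measures.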

In keeping with our comparative focus, we make use of two additional datasets, both collected using a sociolinguistic interview protocol targeting conversational topics similar to those employed in the compilation of the L2 corpus.

The first dataset, the Ottawa English Corpus (OEC), was compiled between 2008 and 2010 and comprises natural speech data recorded from 37 native adult anglophones residing in the Canadian National Capital Region. Amounting to some 273,000 words, this fully transcribed dataset serves as the local TL baseline variety with which L2 speech is compared.

The second dataset is based on vernacular Canadian French recorded from a sub-sample (N = 20) of the total native francophone population (N = 29) who contributed to the L2 English corpus. The fully transcribed French corpus comprises some 228,000 words of running discourse.

We use the French language corpus to explore the potential impact of L1 influence on L2 speakers’ production of restrictive relative clauses. The restrictive relativization systems of French and English share a set of partially corresponding variant forms (e.g., the WH-markers qui/who, which; complementizer que/that) as well as similar, though not identical, contexts of variant use. These partial structural and functional correspondences are believed to be conducive to transfer effects. Furthermore, the sociolinguistic context we target here, exhibiting high levels of bilingualism as well as intense and prolonged language contact, meets all the commonly invoked criteria reported to promote cross-linguistic influence.³

4. Method

All relative clause constructions in the TL, L2 and L1 datasets were manually located in the corpora described above by reading through the transcribed data in their entirety and performing cross-checks with the original audio-files, where necessary. This procedure ensured that all overtly marked restrictive relative clauses, as well as those introduced by a zero or null relativizer, were correctly identified.

All eligible tokens were subsequently extracted and imported into Excel files where they were coded for a number of key predictors hypothesized to influence relative marker selection (see, e.g., Tagliamonte et al., 2005; Wiechmann, 2015). Relative clause constructions falling outside the envelope of variation, as we have defined it, were excluded from the analysis (e.g., non-restrictive relative clauses, adverbial relative clauses, etc.).

To test a number of hypotheses relating to potential constraints on relative marker choice, we incorporated a number of predictors into our study design. A major predictor of relative marker choice relates to the syntactic position or function of the relativizer in the relative clause (Ball, 1996; Romaine, 1982). We distinguished relative clauses in which the relativized element is the subject of the relative clause from those where the relativized element is in non-subject position (i.e., direct object, oblique or object of a preposition, and genitive or possessive). These distinctions allow us to examine whether the use of relative clauses correlates with the typological generalizations posited by Keenan and Comrie (1977). The essence of these generalizations is that there is a hierarchy of grammatical positions (subject > direct object > oblique > genitive) correlating with increasing difficulty (and diminishing frequency) of relativization, such that positions lower down the hierarchy (e.g., oblique) are more difficult to relativize (and correspondingly less frequent)—possibly as a result of working memory and linear processing constraints (e.g., Gibson & Wu, 2013)—than positions further up the hierarchy, which are reportedly easier to relativize (and correspondingly more frequent).
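The frequency prediction of the hierarchy can be stated as a simple check on token counts: each position should be relativized no more often than the position above it. The sketch below assumes that reading. The function name and the counts are hypothetical and are not taken from the study's tables.

```python
HIERARCHY = ("subject", "direct object", "oblique", "genitive")


def follows_hierarchy(counts):
    """True if counts never increase when moving down the hierarchy."""
    ordered = [counts.get(position, 0) for position in HIERARCHY]
    return all(upper >= lower for upper, lower in zip(ordered, ordered[1:]))


# Invented token counts for illustration only.
consistent = {"subject": 120, "direct object": 80, "oblique": 15, "genitive": 0}
inconsistent = {"subject": 40, "direct object": 80, "oblique": 15, "genitive": 2}

print(follows_hierarchy(consistent))    # True
print(follows_hierarchy(inconsistent))  # False
```

A position with no attested tokens is treated as a count of zero, so a dataset lacking genitive relatives altogether still satisfies the check.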

Another major predictor considered to affect relative marker choice concerns the animacy of the antecedent NP in which the relative clause is embedded (D’Arcy & Tagliamonte, 2010; Guy & Bayley, 1995; Tagliamonte et al., 2005). In contrast with its French analogue, qui, which exhibits no sensitivity to the animacy of the head nominal post-modified by a subject relative clause, the English relative marker who encodes the semantic feature [+ human]. Relativizer which, by contrast, is said to be restricted to non-human antecedents, as is relativizer that (Guy & Bayley, 1995), although the use of that with human/non-human antecedents appears to vary significantly across communities (e.g., Tagliamonte, 2002). To examine the effects of animacy on relative marker choice, we operationalized a three-way distinction between human, non-human animate and inanimate heads.

Yet another predictor influencing relative marker selection concerns the type of antecedent post-modified by a relative clause. To assess the potential impact of this predictor, we employed a tripartite categorization system, distinguishing definite and indefinite nominal antecedents, as well as pronominal ones.

Matrix clause construction type has also been invoked in connection with relativizer choice. Earlier studies noted that the English zero subject relativizer, now considered “marginally non-standard” (Biber et al., 1999, p. 619), is most likely encountered when the relative clause is embedded in a matrix clause containing an existential-there construction (e.g., there’s a woman Ø wants to see you), a stative-possessive construction (e.g., I have a brother Ø knows him) or a cleft construction (e.g., it’s the upper-class people Ø live in this area) (see Biber et al., 1999; Tagliamonte, 2002). Our coding protocol took the aforementioned main clause constructions into consideration, as well as accounting for relative clauses embedded in isolated head NPs (e.g., people Ø I know), also reported to display distinct relative marker preferences (see Fox & Thompson, 2007).

It has been repeatedly observed that when the grammatical subject of a non-subject relative clause is a pronoun rather than a lexical NP, an overt relativizer is less likely to be present (e.g., Guy & Bayley, 1995; Levey & Hill, 2013). To detect the potential operation of this effect in our data, we distinguished cases where the grammatical subject of a non-subject relative clause is a pronoun from those where it is a full lexical noun phrase.

The two remaining predictors that we test, adjacency and relative clause length, address online processing constraints associated with syntactically complex constructions (see e.g., Rohdenburg, 1996).

With regard to adjacency, there is evidence indicating that the presence of intervening material between a relative clause and its antecedent head promotes the use of an overt relativizer to mitigate parsing difficulties (e.g., Guy & Bayley, 1995; Tottie & Harvie, 2000). Accordingly, we distinguished cases where the relative clause immediately follows its antecedent head NP, categorized as adjacent, from non-adjacent contexts where intervening material (including filled pauses and speech disfluencies) separates the relative clause from its head nominal.

Previous research suggests that the longer the relative clause, the greater the likelihood that an overt relative marker will be selected, whereas shorter relative clauses tend to favour the zero relative marker (Fox & Thompson, 2007). To ascertain any effect of clause length on relativizer selection, we initially counted the number of words in each relative clause, discounting the (variable) presence of the relative marker. Based on average length scores, we subsequently operationalized a binary division between shorter and longer relative clauses. Measures of clause length differ in subject and non-subject relative clauses because subject relative clauses can consist of just one word (i.e., a verb), whereas non-subject relative clauses minimally comprise two words (i.e., a subject and a verb).
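The length coding just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the coding procedure actually used in the study: the whitespace tokenization, the `RELATIVIZERS` set, the function names, and the decision to treat clauses strictly above the mean as "longer" are all introduced here for illustration.

```python
from statistics import mean

# Assumed inventory of overt relative markers to discount when counting words.
RELATIVIZERS = {"that", "who", "which", "whose"}


def clause_length(clause: str) -> int:
    """Count the words in a relative clause, ignoring a clause-initial relativizer.

    Assumes simple whitespace tokenization; a clause-initial demonstrative
    'that' would be miscounted and would need manual checking.
    """
    tokens = clause.lower().split()
    if tokens and tokens[0] in RELATIVIZERS:
        tokens = tokens[1:]
    return len(tokens)


def code_length(clauses: list[str]) -> list[str]:
    """Label each clause 'shorter' or 'longer' relative to the mean length.

    Assumption: clauses strictly above the mean count as 'longer'.
    """
    lengths = [clause_length(c) for c in clauses]
    cutoff = mean(lengths)
    return ["longer" if n > cutoff else "shorter" for n in lengths]
```

Because subject and non-subject relative clauses have different minimum lengths, a coding scheme of this kind would plausibly be applied to each clause type separately, so that each type has its own cut-off.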

In accordance with the comparative axis of our research, we also applied a modified version of the coding protocol to relative clause constructions extracted from L2 speakers’ L1, Canadian French. We pay particular attention to oblique relative clause constructions in Canadian French as these are the locus of variable marking strategies that exhibit noticeable structural and quantitative differences from their English counterparts. In the results section below, we return to those differences and their relevance to elucidating potential L1 transfer effects.

5. Results

Table 3 shows the distribution of relative clauses in L2 and TL discourse according to the syntactic position of the relativized NP. The distributional findings are entirely consistent with the typological generalizations associated with the noun phrase accessibility hierarchy (Keenan & Comrie, 1977), with subject position being the most amenable to relativization and the genitive the least. In fact, there are no genitive relative clauses in the TL data, and only two instances marked by whose in the L2 dataset. Further inspection of the data reveals the use of periphrastic or analytic constructions encoding a possessive function, as illustrated in (4)–(5) below from the L2 data.

(4) no- not all of them, there’s a bunch of them that their parents were just forcing them to try and learn French (L2/008/424)
(5) I mean like people that like their first language is English and speak French like yeah there’s like a huge difference (L2/026/388)

Although such analytic constructions in L2 discourse might seem initially to qualify as developmental or interlanguage phenomena, they are in fact attested in native English vernaculars (Hermann, 2003) and are consistent with an observed (cross-linguistic) tendency to promote NPs to higher positions on the noun phrase accessibility hierarchy that are more amenable to relativization (Keenan & Comrie, 1977).

We next consider the individual strategies that are used to mark relative clauses. Table 4 shows the very uneven distribution of relativizers in the L2 and TL datasets, respectively. The relative marker that is the lead variant in both datasets, occurring somewhat more frequently in L2 speech when contrasted with the TL baseline. Rates of the zero relativizer are almost equivalent in L2 and TL discourse. The WH-relativizers which and whose occur at minuscule rates and play no central role in the relativizer system used by either cohort. The only WH-relativizer that occurs to any significant extent in both datasets is who, although this marker accounts for a larger proportion of the variable context in the TL baseline than in the L2 data.

Because restrictive relativization in contemporary spoken English is widely believed to constitute a “syntactically partitioned system” (Brook & Tagliamonte, 2023, p. 27), effectively comprising alternations between that and who in subject relative clauses, and zero and that in non-subject ones (see also Meyerhoff et al., 2020), a more insightful picture of rate differences can be obtained by considering subject relative clauses separately from non-subject ones. Table 5 presents the results of such an analysis.

In terms of distributional parallels, competition between zero and that in non-subject relative clauses occurs at commensurate rates in the L2 and TL datasets. In subject relative clauses, by contrast, the only quantitative resemblance between the comparison groups pertains to the relatively low rates of the zero subject relativizer in each. Closer inspection of that variant in the L2 data reveals that it surfaces in similar syntactic contexts in the TL baseline, such as existential-there and stative-possessive constructions. Although relegated to a decidedly peripheral role in the marking of subject relative clauses in both datasets, the colloquial status, low frequency and patterning of the zero subject relativizer in L2 speech would seem to indicate that it is the product of vernacular transmission from the local TL variety.

A more complex issue relates to the differential rates of subject who in the L2 and TL data. What can explain the proportional discrepancies in the use of this variant by the comparison groups? The observed differences in Table 5 are certainly not consistent with any direct influence from L2 speakers’ native French, where, as noted earlier (and see further below), the existence of qui (‘who/which’) is precisely the kind of interlingual parallel that would be expected to enhance, rather than impede, the L2 acquisition of relativizer who. We can also rule out the possibility that the overall rate of subject who in the TL baseline variety is exceptional. Comparison with other mainstream urban varieties of Canadian English reveals rates that are almost identical to the one reported for the TL variety in Table 5 (see e.g., Brook & Tagliamonte, 2023, p. 23 for Toronto English).

One possible explanation is that the aggregated L2 data in Table 5 may mask the impact of potential L2 proficiency differences on rates of relativizer who. Table 6, displaying variant inventories and distributions in the L2 data according to two broad proficiency levels, sheds light on this issue. Here we compare low/mid-low proficiency speakers (CEPI score range = 0.450–0.694) with their mid-high/high proficiency counterparts (CEPI score range = 0.700–0.863).
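The two-way proficiency split can be expressed as a simple threshold rule. The sketch below assumes a cut-off of 0.700, inferred from the reported score ranges; the source gives the observed ranges for each band rather than an explicit threshold, and the function name is hypothetical.

```python
# Assumed cut-off inferred from the reported ranges (0.450-0.694 vs. 0.700-0.863).
HIGH_BAND_MIN = 0.700


def proficiency_band(cepi_score: float) -> str:
    """Assign a speaker to one of the two broad proficiency bands."""
    if cepi_score >= HIGH_BAND_MIN:
        return "mid-high/high"
    return "low/mid-low"
```

Under this rule, a speaker scoring 0.694 falls in the low/mid-low band and a speaker scoring 0.700 in the mid-high/high band.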

Variant rates in both comparison groups are very similar in non-subject relative clauses and almost identical in subject ones. The overall rate of who in subject relative clauses in both proficiency cohorts is exactly the same, confounding any expectation that proficiency offers a straightforward explanation of the lower incidence of who in L2 speech vis-à-vis the TL baseline.

Recognizing that surface parallels should not be equated with the functional isomorphy of form-based correspondences (Poplack et al., 2012, p. 223), a more exacting measure of the L2 acquisition of who requires us to consider whether those L2 speakers who use this variant do so in appreciably the same way as their TL counterparts. Recall that this sub-sector of the grammar qualifies as a “conflict site” (Poplack & Meechan, 1998), where there are differences in the semantic properties encoded by English who contrasted with its French counterpart, qui. The selection of who is determined by humanness of the antecedent head nominal, whereas qui remains unaffected by the animacy properties of the antecedent head NP. To investigate whether the L2 acquisition of who may involve non-target-like uses, we examine the distribution of subject relativizers according to the animacy properties of the antecedent head NP, as shown in Table 7.⁴

Table 7 shows that relativizer who is categorically used by L2 speakers with human antecedents, just as in the TL baseline variety, albeit at very different rates. The major distributional difference that emerges is that whereas who is the dominant variant used with human antecedents by TL speakers, it is relativizer that which is preferentially selected in the same context in the L2 data. Summarizing, to the extent that L2 speakers make use of who to mark subject relative clauses, they do so while categorically respecting the animacy constraint that is operative in the TL.⁵

We next examine the extent to which other environmental constraints on relative marker selection exhibit congruent or divergent patterns across the L2 and TL speaker cohorts. To conduct this comparison, we draw on mixed-effects regression analysis to examine the contribution of independent linguistic predictors, and proficiency in the case of L2 speakers, to the selection of different relative markers.

For statistical purposes, we employ Rbrul, a tool specifically developed for sociolinguistic research with the capacity to generate mixed-effects models (Johnson, 2009). In ensuing tables of the results, each speaker is run as a random effect to control for any individual speaker variance (Johnson, 2009). The numerical formalisms associated with the Rbrul output are to be interpreted as follows. The input probability is a measure of the overall likelihood that the relativizer in question will occur in the dataset. The log-likelihood value indicates the goodness of fit of the regression model to the dataset under consideration and the R² value indicates the proportion of the variance explained by the model. Individual constraints on relativizer choice (i.e., that versus who in subject relative clauses; that versus zero in non-subject ones) are represented by the log-odds (LO) and the centred factor weights (FW). Log-odds with a positive value indicate that the factor shown on the left-hand side of the table has a favouring effect on relativizer choice, whereas those with a negative value exert a disfavouring effect. Centred factor weights have a similar interpretation: those above 0.5 favour relativizer selection, whereas those below 0.5 disfavour the relativizer in question.
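The relationship between the two coefficient scales can be made concrete with a short sketch. It assumes the usual convention in which a centred factor weight is the inverse-logit transform of the corresponding log-odds; the function names are illustrative and are not part of Rbrul.

```python
import math


def factor_weight(log_odds: float) -> float:
    """Convert a log-odds coefficient to a centred factor weight (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def to_log_odds(weight: float) -> float:
    """Convert a centred factor weight (strictly between 0 and 1) back to log-odds."""
    return math.log(weight / (1.0 - weight))


def direction(log_odds: float) -> str:
    """Describe the effect of a factor on the relativizer being modelled."""
    if log_odds > 0:
        return "favours"
    if log_odds < 0:
        return "disfavours"
    return "neutral"
```

On this convention, a log-odds of 0 corresponds to a factor weight of exactly 0.5, so the two readings of a coefficient (positive versus negative log-odds, above versus below 0.5) always agree in direction.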

The ordering of log-odds/centred factor weights (i.e., from most to least favouring) within an individual predictor, or factor group, constitutes the constraint hierarchy (or ranking). It is the constraint hierarchy, rather than the associated percentage values or total Ns, that remains key to interpreting any comparative analysis. Detailed examination of the constraint hierarchies conditioning variant choice yields the most penetrating characterization of variable structure (Poplack & Tagliamonte, 2001, p. 6) and can be used to gauge the extent to which L2 speakers approximate TL grammatical norms. Constraint hierarchies in L2 speech which resemble, or are broadly parallel to, their counterparts operating in TL speech furnish the most compelling evidence of the successful L2 acquisition of TL variable patterns, when interpreted in the aggregate.
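Comparing constraint hierarchies amounts to comparing the rank order of factors within a predictor, independently of the size of the coefficients. The sketch below illustrates that logic. The log-odds values are invented purely for illustration and are not taken from the tables of this study; the function names are likewise hypothetical.

```python
def constraint_hierarchy(log_odds_by_factor: dict[str, float]) -> list[str]:
    """Order the factors of one predictor from most to least favouring."""
    return sorted(log_odds_by_factor, key=log_odds_by_factor.get, reverse=True)


def same_hierarchy(a: dict[str, float], b: dict[str, float]) -> bool:
    """True when two datasets rank the factors of a predictor identically."""
    return constraint_hierarchy(a) == constraint_hierarchy(b)


# Invented coefficients for an overt relativizer (NOT results from this study).
tl_adjacency = {"non-adjacent": 0.62, "adjacent": -0.62}
l2_adjacency = {"non-adjacent": 0.35, "adjacent": -0.35}
l2_reordered = {"non-adjacent": -0.10, "adjacent": 0.10}

print(same_hierarchy(tl_adjacency, l2_adjacency))  # same ranking, weaker effect
print(same_hierarchy(tl_adjacency, l2_reordered))  # ranking reversed
```

The first comparison shows why effect magnitudes and raw percentages are secondary in this kind of analysis: two datasets can differ in the strength of an effect while sharing the same direction and ordering of constraints.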

Table 8 and Table 9 present the results of multifactorial regression analyses of predictors contributing to the selection of the relativizers that and who in subject relative clauses in the TL and L2 datasets, respectively. We exclude from our statistical analysis of subject relative clauses speakers who produced no instances of who.

No predictor is selected as significant in either table depicting the results for subject relative clauses. The absence of any significant effect associated with adjacency or relative clause length, two parameters intended to capture the operation of online processing constraints, strengthens our conviction that subject relative clauses are generally less problematic for L2 learners than non-subject ones (Gass, 1979).

Despite the absence of statistically significant effects, we can still compare variable structure across the datasets, as evidenced by the constraint hierarchies. Bolded probability co-efficients in the L2 data indicate permutations in the L2 constraint rankings vis-à-vis the corresponding hierarchy of effects in the TL baseline variety.

With regard to matrix clause construction type (where existential-there and cleft constructions have been collapsed for statistical analysis with other copula clauses, all containing the semantically light verb be), we observe some minor differences in the respective ranking of lone head NPs and other matrix clause constructions with the relativizers that and who. Lone head NPs, for example, are disfavoured with that in the L2 data but favoured with the same variant in the TL baseline. Likewise, when we inspect the probability values for adjacency, that is the choice variant in non-adjacent contexts in the L2 data, whereas who is the preferred marker in the same environment in the TL baseline.

Other differences concern the effects associated with the type of antecedent NP in the L2 data, where indefinite nouns are favoured with that whereas pronouns are disfavoured. These effects are reversed in the case of indefinite nouns and pronouns when relativizer who is selected. Yet again, opposing trends can be observed in the TL baseline in relation to the operation of the same contextual effects.

Finally, although L2 proficiency is not selected as significant in the L2 data, the probability co-efficients for who show that it is very weakly favoured by low- to mid-low proficiency speakers. Paucity of data from speakers with lower CEPI scores precludes any definitive interpretation of this finding. Notwithstanding this caveat, the results tentatively suggest that the use of who in the L2 subject relative clauses examined here does not increase as a function of higher levels of English proficiency. This aligns with our earlier observation that in these data, proficiency does not appear to be a major determinant of who usage.⁷

We next consider the results for non-subject relative clauses. Table 10 and Table 11 present the results of regression analyses of predictors contributing to the selection of the relativizers that and zero in non-subject relative clauses in the TL and L2 datasets. In the TL corpus, three predictors return significant effects: subject of the relative clause, matrix clause construction type, and adjacency, all highlighted with grey shading. The same predictors are also selected as significant in the L2 data, with a fourth predictor, type of antecedent NP, additionally returning a significant effect in the L2 data, but not in the TL baseline variety.

Subject of the relative clause and adjacency exhibit parallel effects in the two comparison varieties. When the grammatical subject is a noun, relativizer that is preferentially selected, whereas when the grammatical subject is a pronoun, the zero relativizer is the choice relativizer. In usage-based theories, the preference for using the zero relativizer to mark non-subject relative clauses containing a pronominal subject has been attributed to the degree to which the matrix and relative clause are structurally integrated with each other, with a higher degree of “mergedness” favouring relativizer omission in English (Fox & Thompson, 2007, p. 319).

As noted earlier, the constraint hierarchy for adjacency likely reflects universal processing considerations rather than variety-specific effects. For example, the zero variant is strongly disfavoured in both comparison varieties when the relative clause is separated from its antecedent head NP by intervening material. This result is consistent with Rohdenburg’s (1996, p. 151) Complexity Principle. According to this principle, less explicit grammatical options (i.e., the zero variant) are liable to be disfavoured in cognitively complex environments (i.e., non-adjacent contexts), where an overt relativizer is preferred instead. This appears to be especially the case in non-subject relative clauses, where greater processing burdens may be incurred by longer filler-gap dependencies (Gibson, 1998).

When compared with the corresponding direction of effects in the TL baseline variety, re-ordering of the constraint rankings in the L2 data for matrix clause construction type and type of antecedent NP points to subtle adjustments in the conditioning of variant choice. These adjustments are suggestive of L2 speakers’ reconfiguration of constraints operative in the TL grammar. We caution, however, that they must be weighed against the fact that most of the linguistic predictors incorporated into the analysis, including the non-significant effects of animacy and length of the relative clause, pattern in appreciably the same way in the L2 and TL datasets, as gauged from the hierarchy of constraints.

Furthermore, closer inspection of L2 departures from the TL baseline variety reveals that some of the observable disparities in the L2 data are aligned with patterns detected in other native varieties of English. For example, the strong correlation between lone head NPs and the zero relativizer in the L2 data, but not in the corresponding TL baseline variety examined here, has been documented in other urban varieties of Canadian English (see Levey & Hill, 2013).

We also stress that some of the discrepancies visible in the L2 constraint hierarchies reflect relatively trivial alterations to the corresponding hierarchy of effects in the TL baseline variety. For example, copula clauses in the TL variety are a major determinant of the zero relativizer, as indicated by the top-tier probability coefficients (see also Fox & Thompson, 2007; Levey & Hill, 2013), but the same favouring effect also operates in the L2 variety, albeit to a lesser extent. Similarly, although the respective contributions of definite and pronominal antecedents to variant choice are ranked somewhat differently in the comparison varieties, both types of antecedent disfavour relativizer that and favour zero in the L2 data and TL baseline.

Viewed in the aggregate, comparison of the variable structure of subject and non-subject relative clauses in the L2 and TL datasets indicates that the inter-varietal differences we have uncovered are essentially quantitative rather than qualitative in nature. We make no claim here that L2 speakers have fully reproduced the TL variable system in all its precise structural detail—an accomplishment typically associated with first rather than second language acquisition (Labov, 2007). But we would nonetheless emphasize that our findings converge in foregrounding the capacity of relatively advanced L2 speakers to approximate the variable marking of restrictive relative clauses characteristic of the local TL variety.

To what extent might L2 speakers’ native French influence their production of restrictive relative clauses in English? We first consider the inventory and distribution of restrictive relative markers in French, as shown in Table 12. Together accounting for 98% of the variable context, just two relative markers, qui and qu(e), virtually saturate the restrictive relative marker paradigm. This finding is in line with the propensity of vernacular Romance varieties to use relative particles at the beginning of relative clauses, even in syntactic positions (e.g., in oblique relative clauses) where normative grammars would typically require the selection of a relative pronoun that agrees with the semantic and/or morpho-syntactic features of the head NP post-modified by the relative clause (Fiorentino, 2007, pp. 266–267; Stark, 2016, p. 1036). Inspection of the data in Table 12 shows that the relative pronouns dont and lequel(s)/laquelle(s), used here to mark oblique relative clauses, are exceptionally rare, consonant with their reported infrequency in other varieties of colloquial French (see e.g., Schafroth, 1995).

The conspicuous reduction in the number of relativizers found in colloquial (Canadian) French vis-à-vis the much richer paradigm of markers generally encountered in the standard literary language serves as a reminder that structural patterns characteristic of the written variety cannot be uncritically equated with those found in spontaneous speech (Cheshire, 2005; Milroy, 2001; Poplack, 2018b).

The relative marker qui accounts for a disproportionately large swathe of the variable context in French, reflecting the preponderance of subject relative clauses in the data (subject relative clauses account for 66% of the data; object relative clauses 27%; and oblique relative clauses 7%). As already noted, there is little evidence to suggest that the predominance of qui in the French data has any direct bearing on the frequency of who in L2 speech, where variant rates are significantly lower than in the corresponding TL benchmark. Nor is the zero variant in colloquial French, also found in English, distributed in ways which parallel its use in either the L2 or the TL datasets. To the very limited extent that it occurs in the French data examined here, it marks a mere 4% (N = 10) of non-subject relative clauses. By contrast, the zero relativizer co-occurs with 49% of non-subject relative clauses in L2 speech, slightly above the 45% recorded in the corresponding TL baseline. Thus, despite the existence of structural variants common to both vernacular French and English, the rates of those shared options diverge markedly in the respective languages concerned, diminishing, rather than strengthening, the possibility of transfer effects.

What little variation there is in relative-marking strategies in the French data is almost exclusively confined to oblique relative clauses. Particularly remarkable in view of their scant recognition in the literature (Gadet, 1995) is the range of strategies used for marking oblique relative clauses in (Canadian) French. These strategies are partitioned across three unevenly distributed constructions, illustrated in (6)–(8) below, where [ ] indicates a null or absent preposition (also referred to in the literature as preposition chopping, Tarallo, 1983; preposition absorption, Poplack et al., 2012; or preposition ghosting, Radford, 2019):

Pied-piping
(6) la madame avec qui je vais vivre en Suisse, elle travaille avec Interpeace (FL1/017/713)
‘the lady with whom I’m going to live in Switzerland, she works with Interpeace’
Null-preposition
(7) ben ça c’est probablement une des personnes que j’ai commencé à parler [ ] en anglais (FL1/009/610)
‘well that’s probably one of the people that I began to speak [ ] in English’
Preposition stranding
(8) c’est pas vraiment quelque chose que je m’en fais avec (FL1/025/218)
‘it’s not really something that I’m worried about (lit. ‘with’)’

Contrary to claims in the theoretical and descriptive literature, Table 13 shows that pied-piping, the prescribed strategy for marking French oblique relative clauses, is neither categorical nor even the majority option in the natural speech data analyzed here (see also Poplack et al., 2012, p. 209 for similar results). This finding highlights the gulf between analysts’ preconceived ideas about how oblique relative clauses are marked in French (e.g., Duffeler, 2017; Guasti & Shlonsky, 1995) and what actually transpires in natural speech.

The null-preposition strategy (‘null-prep’), involving the non-use of a (normatively obligatory) preposition in an oblique relative clause (Klein, 1993), is the lead variant in the L1 data, accounting for over half the marking strategies. The incidence of null-prep in the data analyzed here is consistent with its reported prevalence in other vernacular Romance varieties (see e.g., Alba de la Fuente & Pato, 2019; Tarallo, 1983).⁸

Of particular interest from a contact perspective is that null-prep is reported to be grammatically inadmissible in English (White, 2003, p. 51). A restricted number of cases (N = 10) surface, however, in the L2 data, as exemplified in (9)–(10) below, raising the possibility that a vernacular strategy in L2 speakers’ native French has been transferred to their English.

(9) so then you can go to the questions that you’re evaluating me [ ] (L2/004/345)
(10) like that was like my city that I wanted to go [ ] and like I could see myself living there (L2/017/130)

Only eight L2 speakers avail themselves of null-prep, indicating that it is not a widely diffused option in the L2 speaker sample.

Militating against the interpretation of L1 transfer effects are a number of competing explanations that merit consideration. Firstly, despite its putative grammatical inadmissibility in English (White, 2003), null-prep in oblique relative clauses is sporadically encountered in the TL baseline variety, as illustrated in (11)–(12), where this phenomenon seems to be limited, as far as we can tell, to the relativization of locative PPs (but see Radford, 2019 for a wider range of syntactic environments in English). The fact that a similar, if sparsely instantiated, construction exists in the TL data casts doubt on the hypothesis that potential transfer effects from French uniquely explain this phenomenon in L2 speech.

(11) I am presented a lot of times with moral dilemmas with the direction that my local board is going [ ] (TL/026/10688)
(12) yeah but I sh– I’m supposed to pay my insurance in the same province that I live [ ] (TL/029/11900)

Equally damaging to the L1 transfer hypothesis are claims that null-prep in L2 discourse qualifies as a systematic developmental phenomenon that is independent of the syntactic properties of L2 speakers’ native language (Bardovi-Harlig, 1987; Perpiñán & Cardinaletti, 2024). Bardovi-Harlig’s (1987) seminal study of L2 English, drawing on a large participant pool representing different L1 backgrounds and varying proficiency levels, indicated that before mastering preposition stranding and pied-piping, learners passed through a developmental stage where they did not produce a preposition (i.e., null-prep) in oblique relative clauses. These findings led Bardovi-Harlig (1987) to enunciate the following developmental schema: (i) null-prep > (ii) preposition stranding > (iii) pied-piping. Null-prep was mainly produced by L2 learners in the earlier phases of the acquisition process in Bardovi-Harlig’s (1987) study, with Jourdain (1996) also corroborating a strong correlation with proficiency. To the limited extent that null-prep occurs in the current study, it shows no robust correlation with CEPI scores, as speakers in both lower- and higher-proficiency bands make sporadic use of it, possibly supporting the notion that it may be a vestigial developmental strategy.

We conclude that the very limited instances of null-prep in the L2 data inhibit detailed quantitative analysis of its conditioning as well as systematic comparisons with its counterpart phenomenon in spoken French. The paucity of evidence at our disposal does not allow us to categorically rule out L1 influence, but the competing explanations that we reviewed suggest that a conspiracy of internal and external factors (i.e., ‘multiple causation,’ Thomason, 2001, p. 91) may plausibly account for null-prep in the L2 data analyzed here.

As shown in Table 14, the unrivalled strategy used by L2 speakers to mark oblique relative clauses in English is preposition stranding, mirroring the corresponding choice mechanism in the TL. Such is the strength of preposition stranding in the community TL baseline that native anglophones do not produce a single instance of pied-piping, in spite of claims that this is a common preposition-placement strategy in written and spoken English (Hoffmann, 2005, p. 257). Although pied-piping occurs in L2 speakers’ native French, as shown in Table 13, it does not trigger a single instance of its structural equivalent in their spoken English, contrary to abundant claims of structural priming in contact scenarios (see e.g., Loebell & Bock, 2003). One reason why pied-piping is not found in either the TL or L2 datasets is that WH-forms, the only relativizers licensing pied-piping in English, occur rarely in non-subject relative clauses (cf. Table 5). Furthermore, analysis of everyday speech (e.g., Levey, 2024) suggests, contra Hoffmann (2005), that preposition-stranding is by far the most preponderant—and, by extension, salient—option in spontaneous spoken English. Indeed, McDaniel et al. (1998, p. 309) go so far as to claim that pied-piping is not a natural option in English, but a prescriptive artefact acquired during schooling.

Could preposition stranding, despite its avowedly minoritarian status in spoken (Canadian) French, have enhanced speakers’ use of that strategy in their English? Such superficial structural parallels, as we have observed repeatedly, are certainly believed to optimize transfer effects (see e.g., Backus, 2005). Indeed, the very ubiquity of preposition stranding in spoken English is assumed to have triggered, albeit indirectly, the rise in preposition stranding in (Canadian) French as a result of language contact (see e.g., Roberge & Rosen, 1999, p. 154).

Yet as Poplack et al. (2012) caution, superficial form-based correspondences in French and English may be conditioned by very different underlying linguistic processes. This crucial caveat applies to the results of the present study. Out of nine occurrences of preposition stranding in the L1 French data, 56% (5/9) comprise just a single preposition, avec ‘with,’ competing with just three other forms, dedans (used twice) ‘in,’ dessus ‘on’, and de ‘of’ (see also Poplack et al., 2012, p. 216).

By contrast, comparison of the rates of preposition stranding according to the lexical identity of the stranded preposition in the L2 and TL datasets, depicted in Table 15, reveals evidence that is at variance with L1 influence. Both comparison varieties in Table 15 contain a much larger inventory of lexical forms compared to the French data, likely reflecting the substantially greater prevalence of preposition stranding in English. Granted, the most frequently stranded preposition in the English data is with, as is the case with its French equivalent, avec, but the prepositions to and of also figure among the more commonly stranded forms in the L2 and TL English data, in contrast with the documented aversion of their French counterparts, à ‘to, at’, de ‘of,’ to stranding (Poplack et al., 2012, p. 210).⁹

In summary, we conclude that systematic quantitative analysis of the data at our disposal fails to provide unequivocal evidence indicating that L2 speakers’ native French exerts a discernible structural influence on the restrictive relative clauses they produce in English. We concede that the minor phenomenon of null-prep in L2 oblique relative clauses has a counterpart in vernacular French, but other possible sources of null-prep in L2 discourse, including developmental motivations, necessarily constrain our ability to attribute this phenomenon exclusively to cross-linguistic transfer. Similarly, triangulation of quantitative evidence across the datasets included in our study suggests that the most compelling source of preposition stranding in L2 oblique relative clauses resides in the choice mechanisms operating in the TL baseline variety, unaffected by superficial parallels in speakers’ native French.

6. Discussion and Conclusions

The primary motivation for the research reported here arose from our concern to document relative clause constructions in natural production data representing L2 speech, and to characterize the frequency and structural diversity of those constructions in everyday social interactions.

A major caveat to emerge from the present investigation is that the types of English restrictive relative clauses that L2 learners use, as well as their probability distributions, should not be assumed a priori. This information can only be reliably inferred from systematic examination of community-based speech data. Contextualization of syntactic variation in relation to the everyday speech norms of the TL to which L2 learners are exposed operates as a crucial check on previous findings, including corpus-based ones, which do not necessarily take into account community-based motivations shaping variable usage. Among the quantitative disparities that have emerged from corpus-based studies of relativization, for example, are affirmations that object relative clauses are proportionally more common than subject ones in the spoken language (Roland et al., 2007, p. 357). This claim is squarely at odds with our own results. In each of the three natural speech corpora at our disposal, we found that subject relative clauses are the quantitatively preponderant type. Moreover, the gradient difficulties associated with the relativization of other syntactic positions (i.e., object > oblique > genitive) accord with the typological generalizations associated with the noun phrase accessibility hierarchy (Keenan & Comrie, 1977).

A hallmark of the current study lies in the importance it attaches to the structured heterogeneity that restrictive relative clauses manifest in everyday speech, and to the application of a detailed comparative approach to the analysis of that variation. This approach enabled us to leverage new insights into the sources of orderly heterogeneity in L2 discourse. We observed that the empirical characterization of that heterogeneity, and its implications for achieving a clearer understanding of the L2 acquisition process, have typically received limited attention in many previous experimental and theoretical treatments.

It is worth briefly reviewing possible motivations for that neglect. One reason has to do with inter-disciplinary differences in preferred frameworks of analysis, with experimental researchers typically eschewing the unbalanced datasets often generated by corpus-based studies (see Jaeger, 2010). A more insidious reason, as we have seen, is traceable to a theoretical inclination to rely on highly idealized normative accounts of language as surrogates for the facts of actual usage (Poplack, 2018b). This approach can lead to the erroneous imposition of categoricity on grammatical phenomena (e.g., oblique relative clauses in French) that are in fact inherently variable. It can also inflate the importance of certain constructions, such as pied-piping in English oblique relative clauses (see Levey, 2024), which are vanishingly rare in everyday spontaneous speech.

Because linguists’ intuitions about the existence, frequency and range of occurrence of constructions in speech may not dovetail with the actual patterns that characterize authentic interactions (Bybee, 2008, p. 226; Milroy, 2001, pp. 544–545), access to natural language corpora is indispensable in helping “to bridge the gap between the analysts’ conception of the data and the data themselves” (Ernestus & Baayen, 2011, p. 374). Indeed, the importance of examining “frequency distributions in native discourse to look for possible sources for acquisition patterns” (Shirai & Ozeki, 2007, p. 160) appears to be gaining traction in experimental paradigms.

Not only does a quantitative approach enable the relevant facts of variable usage to be laid bare, its greatest asset arguably lies in the level of analytical granularity it affords for comparing variable structure across different speaker groups and for capitalizing on structural comparisons as metrics of L2 acquisition.

What have those structural comparisons revealed in the present study? A first major finding is that L2 speakers use the same variants to mark restrictive relative clauses that are employed by TL speakers. Discursive frequencies of two restrictive relative markers, that and zero, are largely commensurate with the usage rates of TL speakers, especially in non-subject relative clauses. Only in the case of the subject relative marker who did we find reduced rates of use in the speech of the L2 cohort vis-à-vis the TL community baseline. None of the possible explanations we reviewed satisfactorily accounts for this discrepancy. Inspection of CEPI scores revealed no correlation between the use of who and English-language proficiency levels, or, indeed, any other extralinguistic measure (e.g., level of educational attainment). There is, however, no indication in the data, as demonstrated by our quantitative comparisons of L2 and TL speaker cohorts, that those L2 speakers who make use of who do so in different ways from TL speakers. This bolsters our conviction that we are dealing here with a quantitative rather than a qualitative distinction between L2 and TL speakers, although the root cause of that distinction remains elusive.

Appeals to possible L1 influence on L2 speakers’ reduced use of subject who are inconsistent with what we find in speakers’ native French, where qui, partially equivalent to English who, is the quantitatively dominant marker in the French restrictive relativizer paradigm, potentially rendering it available for cross-linguistic priming. Granted, certain scholars maintain that the typological rarity of WH-relativizers cross-linguistically renders them less susceptible to transfer effects (Gisborne, 2024), despite claims in the literature ascribing a key role to language contact in the areal diffusion of interrogative pronouns as relative clause markers (see, e.g., Comrie, 1998; Auderset, 2020). Nor can any convincing explanation be extrapolated from previous research targeting the L2 acquisition of English relativizers. Inspection of the literature reveals that the L2 acquisition of who does not show any conspicuous quantitative anomalies in learner groups that parallel our own findings. In fact, Ghafar Samar’s (2000, pp. 117–118) study of L1 Persian speakers’ acquisition of English in the Canadian National Capital Region showed that L2 speakers actually used who at rates which surpassed those in the local TL community, despite the absence of corresponding WH-forms in their native Persian.

The expectation that cross-linguistic parallels between L1 and TL forms should have a facilitatory effect on L2 production preferences is not borne out with regard to the subject who. This may be because English who is not wholly congruent with French qui on account of differences in the semantic properties encoded by the respective forms. Recent work on cross-linguistic structural priming effects (Van Lieburg et al., 2023) suggests that if the structures concerned are not wholly similar across languages, this may have an inhibitory effect on production preferences, although the precise mechanisms underpinning inhibition processes remain to be determined.¹⁰

In spite of the putative susceptibility of relativization strategies to contact effects (e.g., Muysken, 2012), our comparative approach failed to turn up compelling evidence in favour of such a scenario. The only potential candidate for cross-linguistic transfer that we detected, involving the L2 use of null-prepositions (null-prep) in oblique relative clauses, remains unsubstantiated. There is no proprietary relationship between this construction and speakers’ L1, French, as there is copious evidence indicating that null-prep is a robust interlanguage phenomenon.

A central finding of our research is that L2 speakers with varying proficiency levels are capable, in the aggregate, of approximating probabilistic constraints on relativizer selection that are operative in the local TL baseline variety. To be sure, the precise configuration of those constraints in the L2 data is not a wholesale facsimile of what is found in the corresponding TL baseline. But the adjustments that are discernible in the L2 variable grammar conditioning relative marker selection are, as far as we can determine, relatively minor. These adjustments are not of the magnitude that would warrant the inference that L2 speakers’ variable system constitutes a profoundly divergent interlanguage grammar in comparison with its TL counterpart.
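The logic of such a cross-cohort comparison can be sketched in a few lines of Python. The sketch is illustrative only. The factor-level labels and weights below are invented placeholders, not values from Table 8, Table 9, Table 10 or Table 11, and the helper names (constraint_hierarchy, effect_range, same_direction) are hypothetical rather than part of the analysis reported here. It assumes that each cohort's regression yields one probability-like weight per factor level for a given variant, and compares cohorts on the ordering of those weights and on the spread between them.

```python
from typing import Dict, List


def constraint_hierarchy(weights: Dict[str, float]) -> List[str]:
    """Rank factor levels from most to least favouring of the variant."""
    return sorted(weights, key=lambda level: weights[level], reverse=True)


def effect_range(weights: Dict[str, float]) -> int:
    """Spread between the highest and lowest weight, expressed in points (x100)."""
    return round((max(weights.values()) - min(weights.values())) * 100)


def same_direction(baseline: Dict[str, float], learners: Dict[str, float]) -> bool:
    """True when both cohorts order the factor levels identically."""
    return constraint_hierarchy(baseline) == constraint_hierarchy(learners)


# INVENTED placeholder weights for one hypothetical factor group.
# They do not reproduce any result from the study.
tl_weights = {"level_A": 0.70, "level_B": 0.50, "level_C": 0.30}
l2_weights = {"level_A": 0.62, "level_B": 0.51, "level_C": 0.37}

if __name__ == "__main__":
    print("TL hierarchy:", constraint_hierarchy(tl_weights))
    print("L2 hierarchy:", constraint_hierarchy(l2_weights))
    print("Same direction of effect:", same_direction(tl_weights, l2_weights))
    print("Ranges (TL, L2):", effect_range(tl_weights), effect_range(l2_weights))
```

On this toy input the two cohorts share a hierarchy while the L2 range is narrower, which is the kind of configuration described above as a minor adjustment rather than a divergent grammar. Statistical significance, the third criterion usually considered alongside direction and magnitude, is deliberately left out of the sketch.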

Our results contribute to the growing body of evidence indicating that L2 speakers are sensitive to, and capable of reproducing, statistical regularities in the TL input to which they are exposed. Of particular importance is the fact that many of the fine-grained patterns depicted in Table 8, Table 9, Table 10 and Table 11 lie so far below the level of conscious awareness that they are highly unlikely to have been explicitly transmitted to learners in formal language-learning contexts. The tendency to omit a relativizer in non-subject relative clauses, visible in Table 10 and Table 11, when the main clause is semantically or propositionally ‘light’ (i.e., a matrix copular clause) is but one example of a non-trivial pattern that has eluded traditional accounts of relativization (see Fox & Thompson, 2007). This pattern is nonetheless firmly entrenched in the TL community grammar and reproduced by L2 speakers examined here.

We submit that the principal means by which such implicit patterns are conveyed to L2 learners is via a process of vernacular transmission mediated by contact with the local TL community (Sankoff et al., 1997). Of central importance in understanding the propagation of TL vernacular norms to L2 speakers are the social characteristics of the acquisition context. Situated in a stable, long-standing bilingual community where considerable value is attached to knowledge of both official languages, the social and attitudinal circumstances of L2 acquisition, abetted by extensive exposure to TL speakers, are highly conducive to advanced L2 attainment. Recall that the personal social networks of many of the francophones we targeted are made up of substantial proportions of anglophones, reaching majority levels for more than half of the L2 speaker sample. The fact that many francophones have close affiliations with the local TL community suggests that L2 speakers have integrative motivations for using English in their anglophone friendship groups. As Sankoff et al. (1997, p. 193) observe, a greater degree of social integration into the TL community leads to greater linguistic integration as well.

We conclude by pointing to some of the limitations of our own study and the need for additional research targeting issues that we have insufficiently addressed. Foremost among those issues is the role of individual differences in the L2 acquisition of restrictive relative clauses. Although our statistical methods were configured to take into account inter-individual patterns of variation in the marking of relative clauses, the nature and extent of intra-individual patterns of variation remain to be determined, as does their longitudinal development.

We stress that the absence of any detailed assessment of individual differences in the current study is not a defect of our methodological approach, but derives instead from the nature of the spontaneous speech data we privileged. As observed earlier, relative clauses are infrequent in running discourse, which severely restricts our ability to mine copious and balanced amounts of data for each speaker. Further sub-categorization of those data into different types of relative clause (i.e., subject, object, oblique), essential for analytical purposes, inevitably results in additional imbalances and skewed token distributions. Witness the fact, for example, that genitive relative clauses are almost absent from the corpora we examined. In the same vein, the analysis of oblique relative clauses resulted in only modest quantities of data for each corpus, precluding any meaningful quantitative assessment of individual patterns of variation.

Although distributional asymmetries and sparse data cells are unavoidable when working with natural language data, experimental paradigms have developed tried-and-tested protocols for eliciting and analyzing syntactic variables that occur at sub-optimal rates in everyday speech. In keeping with the utility of approaching “a single problem with different methods” (Labov, 1972b, pp. 118–119), “triangulating corpus and experimental methodologies complementarily” (Deshors & Gries, 2022, p. 171) would seem to offer fertile avenues for mitigating the impact of the limitations we have identified. Whatever transpires in future investigations into the L2 acquisition of relativization, this line of inquiry can only be enriched by the study of actual interactions situated in their community-based context, as we hope to have shown.

Author Contributions

Conceptualization, S.L.; Methodology, S.L. and K.L.R.; Formal analysis, S.L. and K.L.R.; Data curation, K.L.R. and L.K.; Writing—original draft, S.L.; Project administration, S.L. and L.K.; Funding acquisition, S.L. and L.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a Social Sciences and Humanities Research Insight Grant, grant number 435-2018-0999.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Social Sciences and Humanities Research Ethics Board (REB), University of Ottawa (file no. S-10-18-1140, approved on 19 November 2018) for studies involving humans.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data analyzed in this study may be made available on request from the corresponding author. The data are not publicly available in accordance with the informed consent guidelines provided to the participants.

Conflicts of Interest

The authors declare no conflict of interest.

Notes

1	The term ‘relative clause construction’ is used throughout this article to refer to constructions comprising two clauses: a matrix clause containing an NP which is post-modified by an embedded relative clause. The term ‘relative clause’ is used to refer to the post-modifying clause itself.
2	Examples are reproduced verbatim from spoken corpora. Codes in parentheses refer to the corpus from which the example is drawn (L2 = Second Language Corpus of English; TL = Target Language Corpus of English; FL1 = French First Language Corpus); the unique speaker identifier; and the line number of the utterance in the respective corpus. We use the Ø symbol to refer to the null or zero relative marker in spoken English and French.
3	Many theoretical linguists argue that qui is not a WH-word but an allomorph of complementizer que (see Mackenzie, 2018), although Koopman and Sportiche (2014) claim that qui behaves as an uncontroversial WH-pronoun in certain contexts.
4	Excluded from the results are limited instances of non-human animate NPs. We display the results for subject relative clauses only because the relativizer who is almost entirely restricted to that syntactic environment in our datasets (see also D’Arcy & Tagliamonte, 2010, p. 391).
5	Following Brook and Tagliamonte (2023, pp. 31, 33), we also examined the influence of education on individual rates of who, but found no systematic effect in the L2 data.
6	Regression analyses were run separately for each relative marker. As we are dealing in Table 8 (and subsequent tables based on regression analysis) with what is essentially a binary variable, probability values for competing variants within the same speaker cohort are mirror images of each other.
7	Recall that Table 9 excludes L2 speakers who make no use of who. Thus, the percentage values for who for the two proficiency groups in Table 9 differ from those in Table 6, based on the entire L2 speaker cohort.
8	The ‘other’ category in Table 13 includes one instance of dont as an oblique relativizer and one instance where the relative clause is doubly marked by pied-piping and preposition stranding.
9	Semantically ‘weak’ prepositions (e.g., à, de), whose interpretation is much more context dependent than that of their ‘strong’ equivalents (e.g., avec), are more likely to be ‘absorbed’ in Poplack et al.’s (2012) terminology (i.e., to result in null-prep).
10	We are grateful to a reviewer for bringing Van Lieburg et al. (2023) to our attention.
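
The mirror-image relationship mentioned in note 6 can be spelled out as a short derivation. This is a sketch under stated assumptions, not a description of the software settings used in the study: it assumes a binary logistic model, and the symbols (x_j for predictors, \beta_j for coefficients on the log-odds scale, w_j for the corresponding probability-scale weight) are introduced here for illustration only.

```latex
\begin{align*}
P(\text{variant}_2 \mid x) &= 1 - P(\text{variant}_1 \mid x),\\[4pt]
\operatorname{logit} P(\text{variant}_1 \mid x) = \beta_0 + \sum_j \beta_j x_j
  &\;\Longrightarrow\;
\operatorname{logit} P(\text{variant}_2 \mid x) = -\Bigl(\beta_0 + \sum_j \beta_j x_j\Bigr),\\[4pt]
w_j = \frac{1}{1 + e^{-\beta_j}}
  &\;\Longrightarrow\;
1 - w_j = \frac{1}{1 + e^{\beta_j}}.
\end{align*}
```

Under these assumptions, a weight of w_j for one variant corresponds to 1 - w_j for its competitor, so a level that favours one variant disfavours the other to the same degree.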

References

Alba de la Fuente, A., & Pato, E. (2019). Cortadora relative clauses: A comparative analysis between Spanish, Portuguese, and French. Isogloss: Open Journal of Romance Linguistics, 5, 1–19. [Google Scholar] [CrossRef]
Alexiadou, A., Law, P., Meinunger, A., & Wilder, C. (2000). The syntax of relative clauses. John Benjamins. [Google Scholar] [CrossRef]
Auderset, S. (2020). Interrogatives as relativization markers in Indo-European. Diachronica, 37(4), 474–513. [Google Scholar] [CrossRef]
Backus, A. (2005). Code-switching and language change: One thing leads to another? International Journal of Bilingualism, 9(3/4), 307–340. [Google Scholar] [CrossRef]
Ball, C. (1996). A diachronic study of relative markers in spoken and written English. Language Variation and Change, 8(2), 227–258. [Google Scholar] [CrossRef]
Bardovi-Harlig, K. (1987). Markedness and salience in second-language acquisition. Language Learning, 37(3), 385–407. [Google Scholar] [CrossRef]
Bayley, R., & Tarone, E. (2012). Variationist perspectives. In S. M. Gass, & A. Mackey (Eds.), The Routledge handbook of second language acquisition (pp. 41–56). Routledge. [Google Scholar] [CrossRef]
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman grammar of spoken and written English. Pearson Education Limited. [Google Scholar]
Bley-Vroman, R. (1983). The comparative fallacy in interlanguage studies: The case of systematicity. Language Learning, 33, 1–17. [Google Scholar] [CrossRef]
Blondeau, H., Nagy, N., Sankoff, G., & Thibault, P. (2002). La couleur locale du français L2 des anglo-montréalais. Acquisition et interaction en langue étrangère, 17, 73–100. [Google Scholar] [CrossRef]
Britain, D. (2020). What happened to those relatives from East Anglia? A multilocality analysis of dialect levelling in the relative marker system. In K. V. Beaman, I. Buchstaller, S. Fox, & J. A. Walker (Eds.), Advancing socio-grammatical variation and change: In honour of Jenny Cheshire (pp. 93–114). Routledge. [Google Scholar]
Brook, M., & Tagliamonte, S. (2023). Subject relative who in Ontario, Canada: Change from above in a transplanted ecology. Journal of Linguistic Geography, 11, 25–37. [Google Scholar] [CrossRef]
Bybee, J. (2008). Usage-based grammar and second-language acquisition. In P. Robinson, & N. C. Ellis (Eds.), Handbook of cognitive linguistics and second language acquisition (pp. 216–236). Routledge. [Google Scholar]
Bybee, J. (2010). Language, usage and cognition. Cambridge University Press. [Google Scholar] [CrossRef]
Cheshire, J. (2005). Syntactic variation and spoken language. In L. Cornips, & K. Corrigan (Eds.), Syntax and variation: Reconciling the biological and the social (pp. 81–106). John Benjamins. [Google Scholar] [CrossRef]
Comrie, B. (1998). Rethinking the typology of relative clauses. Language Design, 1, 59–86. [Google Scholar]
D’Arcy, A., & Tagliamonte, S. (2010). Prestige, accommodation, and the legacy of relative who. Language in Society, 39(3), 383–410. [Google Scholar] [CrossRef]
Deshors, S. C., & Gries, S. (2022). Using corpora in research on second language psycholinguistics. In A. Godfroid, & H. Hopp (Eds.), The Routledge handbook of second language acquisition and psycholinguistics (pp. 164–177). Routledge. [Google Scholar] [CrossRef]
Diessel, H., & Tomasello, M. (2000). The development of relative clauses in spontaneous child speech. Cognitive Linguistics, 11(1/2), 131–151. [Google Scholar] [CrossRef]
Diessel, H., & Tomasello, M. (2005). A new look at the acquisition of relative clauses. Language, 81(4), 882–906. [Google Scholar] [CrossRef]
Doughty, C., & Long, M. (2003). The scope of inquiry and goals of SLA. In C. Doughty, & M. Long (Eds.), The handbook of second language acquisition (pp. 3–16). Wiley-Blackwell. [Google Scholar] [CrossRef]
Duffeler, M. A. C. M. (2017). The comprehension of relative clauses by Romance learners of English: Syntactic and semantic influences [Unpublished doctoral dissertation, Vrije Universiteit]. Available online: https://www.lotpublications.nl/Documents/479_fulltext.pdf (accessed on 2 January 2025).
Ernestus, M., & Baayen, R. H. (2011). Corpora and exemplars in phonology. In J. Goldsmith, J. Riggle, & A. C. L. Yu (Eds.), The handbook of phonological theory (2nd ed., pp. 374–400). Wiley-Blackwell. [Google Scholar] [CrossRef]
Fiorentino, G. (2007). European relative clauses and the uniqueness of the relative pronoun type. Italian Journal of Linguistics, 19(2), 263–291. [Google Scholar]
Flynn, S., Foley, C., & Vinnitskaya, I. (2004). The cumulative-enhancement model for language acquisition: Comparing adults’ and children’s patterns of development in first, second and third language acquisition of relative clauses. International Journal of Multilingualism, 1(1), 3–16. [Google Scholar] [CrossRef]
Fox, B., & Thompson, S. A. (2007). Relative clauses in English conversation. Studies in Language, 31(2), 293–326. [Google Scholar] [CrossRef]
Gadet, F. (1995). Les relatives non standard en français parlé, le système et l’usage. Études Romanes, 34, 141–162. [Google Scholar]
Gass, S. (1979). Language transfer and universal grammatical relations. Language Learning, 29(2), 327–344. [Google Scholar] [CrossRef]
Geeslin, K., & Long, A. Y. (2014). Sociolinguistics and second language acquisition: Learning to use language in context. Routledge. [Google Scholar] [CrossRef]
Ghafar Samar, R. (2000). Aspects of second language speech: A variationist perspective on second language acquisition [Unpublished doctoral dissertation, University of Ottawa]. [Google Scholar]
Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1), 1–76. [Google Scholar] [CrossRef] [PubMed]
Gibson, E., & Wu, H.-H. I. (2013). Processing Chinese relative clauses in context. Language and Cognitive Processes, 28(1–2), 125–155. [Google Scholar] [CrossRef]
Gisborne, N. (2024, December). Contact as an explanation of the spread of wh-relatives. In Fourth AMC Symposium: Contact and language change. University of Edinburgh. [Google Scholar]
Guasti, M. T., & Shlonsky, U. (1995). The acquisition of French relative clauses reconsidered. Language Acquisition, 4(4), 257–276. Available online: http://www.jstor.org/stable/20011426 (accessed on 2 January 2025). [CrossRef]
Guy, G. R., & Bayley, R. (1995). On the choice of relative pronouns in English. American Speech, 70(2), 148–162. [Google Scholar] [CrossRef]
Hawkins, R. (1989). Do second language learners acquire restrictive relative clauses on the basis of relational or configurational information? The acquisition of French subject, direct object and genitive restrictive relative clauses by second language learners. Second Language Research, 5(2), 156–188. [Google Scholar] [CrossRef]
Hermann, T. (2003). Relative clauses in dialects of English: A typological approach [Unpublished doctoral dissertation, University of Freiburg]. [Google Scholar]
Hoffmann, T. (2005). Variable vs. categorical effects: Preposition pied-piping and stranding in British English relative clauses. Journal of English Linguistics, 33(3), 257–297. [Google Scholar] [CrossRef]
Howard, M., Mougeon, R., & Dewaele, J. M. (2013). Sociolinguistics and second language acquisition. In R. Bayley, R. Cameron, & C. Lucas (Eds.), The Oxford handbook of sociolinguistics (pp. 340–359). Oxford University Press. [Google Scholar] [CrossRef]
Huddleston, R., & Pullum, G. K. (2002). The Cambridge grammar of the English language. Cambridge University Press. [Google Scholar] [CrossRef]
Jaeger, F. T. (2010). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1), 23–62. [Google Scholar] [CrossRef] [PubMed]
Johnson, D. E. (2009). Getting off the Goldvarb standard: Introducing Rbrul for mixed-effects variable rule analysis. Language and Linguistics Compass, 3, 359–383. [Google Scholar] [CrossRef]
Jourdain, S. (1996). The case of null-prep in the interlanguage of adult learners of French [Unpublished doctoral dissertation, Indiana University]. [Google Scholar]
Keenan, E., & Comrie, B. (1977). Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8, 63–99. Available online: https://www.jstor.org/stable/4177973 (accessed on 2 January 2025).
Klein, E. (1993). Toward second language acquisition: A study of null-prep. Kluwer. [Google Scholar]
Koopman, H., & Sportiche, D. (2014). The que/qui alternation: New analytical directions. In P. Svenonius (Ed.), Functional structure from top to toe: The cartography of syntactic structures (Vol. 9, pp. 46–96). Oxford University Press. [Google Scholar] [CrossRef]
Labelle, M. (1990). WH-movement, and the development of relative clauses. Language Acquisition, 1(1), 95–119. Available online: http://www.jstor.org/stable/20011343 (accessed on 2 January 2025). [CrossRef]
Labov, W. (1972a). Language in the inner city. University of Pennsylvania Press. [Google Scholar]
Labov, W. (1972b). Some principles of linguistic methodology. Language in Society, 1(1), 97–120. [Google Scholar] [CrossRef]
Labov, W. (1984). Field methods of the project on linguistic change and variation. In J. Baugh, & J. Sherzer (Eds.), Language in use: Readings in sociolinguistics (pp. 28–54). Prentice-Hall. [Google Scholar]
Labov, W. (2007). Transmission and diffusion. Language, 83(2), 344–387. [Google Scholar] [CrossRef]
Levey, S. (2014). A comparative variationist perspective on relative clauses in child and adult speech. In R. Torres Cacoullos, N. Dion, & A. Lapierre (Eds.), Linguistic variation: Confronting fact and theory (pp. 22–37). Routledge. [Google Scholar] [CrossRef]
Levey, S. (2024). Standard and non-standard English. In S. Fox (Ed.), Language in Britain and Ireland (pp. 48–69). Cambridge University Press. [Google Scholar] [CrossRef]
Levey, S., & Hill, C. (2013). Social and linguistic constraints on relativizer omission in Canadian English. American Speech, 88(1), 32–62. [Google Scholar] [CrossRef]
Lieven, E. (2010). Input and first language acquisition: Evaluating the role of frequency. Lingua, 120, 2546–2556. [Google Scholar] [CrossRef]
Loebell, H., & Bock, K. (2003). Structural priming across languages. Linguistics, 41(5), 791–824. [Google Scholar] [CrossRef]
Macdonald, M. C. (2015). The emergence of language comprehension. In B. MacWhinney, & W. O’Grady (Eds.), The handbook of language emergence (pp. 81–99). Wiley-Blackwell. [Google Scholar] [CrossRef]
Mackenzie, I. (2018). The case of special qui. Journal of French Language Studies, 28(1), 21–41. [Google Scholar] [CrossRef]
McDaniel, D., McKee, C., & Bernstein, J. (1998). How children’s relatives solve a problem for minimalism. Language, 74(2), 308–334. [Google Scholar] [CrossRef]
Mellow, J. D. (2006). The emergence of second language syntax: A case study of the acquisition of relative clauses. Applied Linguistics, 27(4), 645–670. [Google Scholar] [CrossRef]
Meyerhoff, M., Birchfield, A., Ballard, E., Watson, C., & Charters, H. (2020). Restrictions on relative clauses in Auckland, New Zealand. In K. V. Beaman, I. Buchstaller, S. Fox, & J. A. Walker (Eds.), Advancing socio-grammatical variation and change: In honour of Jenny Cheshire (pp. 115–133). Routledge. [Google Scholar]
Meyerhoff, M., & Schleef, E. (2012). Variation, contact and social indexicality in the acquisition of (ing) by teenage migrants. Journal of Sociolinguistics, 16(3), 398–416. [Google Scholar] [CrossRef]
Milroy, J. (2001). Language ideologies and the consequences of standardization. Journal of Sociolinguistics, 5(4), 530–555. [Google Scholar] [CrossRef]
Milroy, L., & Gordon, M. (2003). Sociolinguistics: Models and methods. Wiley-Blackwell. [Google Scholar] [CrossRef]
Montrul, S. (2020). How learning context shapes heritage and second language acquisition. In M. Dressman, & R. W. Sadler (Eds.), The handbook of informal language learning (pp. 57–74). Wiley. [Google Scholar] [CrossRef]
Muysken, P. (2012). Another icon of language contact shattered. Bilingualism: Language and Cognition, 15(2), 237–239. [Google Scholar] [CrossRef]
Nagy, N., Blondeau, H., & Auger, J. (2003). Second language acquisition and ‘real’ French: An investigation of subject doubling in the French of Montreal Anglophones. Language Variation and Change, 15(1), 73–103. [Google Scholar] [CrossRef]
Perpiñán, S., & Cardinaletti, A. (2024). Null-prep as a systematic interlanguage phenomenon: Evidence from relative clauses, interrogatives, and sluicing constructions. Second Language Research, 40(1), 139–169. [Google Scholar] [CrossRef]
Poplack, S. (1989). The care and handling of a mega-corpus. In R. W. Fasold, & D. Schiffrin (Eds.), Language change and variation (pp. 411–451). John Benjamins. [Google Scholar]
Poplack, S. (2011). Grammaticalization and linguistic variation. In B. Heine, & H. Narrog (Eds.), The handbook of grammaticalization (pp. 209–224). Oxford University Press. [Google Scholar]
Poplack, S. (2018a). Borrowing: Loanwords in the speech community and in the grammar. Oxford University Press. [Google Scholar] [CrossRef]
Poplack, S. (2018b). Categories of grammar and categories of speech: When the quest for symmetry meets inherent variability. In N. Shin, & D. Erker (Eds.), Questioning theoretical primitives in linguistic inquiry: Papers in honor of Ricardo Otheguy (pp. 7–34). John Benjamins Publishing Company. [Google Scholar] [CrossRef]
Poplack, S., & Meechan, M. (1998). How languages fit together in code-mixing. International Journal of Bilingualism, 2(2), 127–138. [Google Scholar] [CrossRef]
Poplack, S., & Tagliamonte, S. (2001). African American English in the diaspora. Blackwell. [Google Scholar]
Poplack, S., Zentz, L., & Dion, N. (2012). Phrase-final prepositions in Quebec French: An empirical study of contact, code-switching and resistance to convergence. Bilingualism: Language and Cognition, 15(2), 203–225. [Google Scholar] [CrossRef]
Radford, A. (2019). Relative clauses: Structure and variation in everyday English. Cambridge University Press. [Google Scholar] [CrossRef]
Rehner, K., & Mougeon, R. (2022). Variationist methods of analysis. In K. Geeslin (Ed.), Handbook of second language acquisition and sociolinguistics (pp. 200–211). Routledge. [Google Scholar] [CrossRef]
Roberge, Y., & Rosen, N. (1999). Preposition stranding and que-deletion in varieties of North American French. Linguistica Atlantica, 21, 153–168. Available online: https://journals.lib.unb.ca/index.php/la/article/view/22461 (accessed on 2 January 2025).
Rochon, K. L. (2023). Grammar THAT varies for speakers WHO are proficient: Relative clauses in second-language speech [Unpublished master’s dissertation, University of Ottawa]. [Google Scholar]
Rohdenburg, G. (1996). Cognitive complexity and increased grammatical explicitness in English. Cognitive Linguistics, 7(2), 149–182. [Google Scholar] [CrossRef]
Roland, D., Dick, F., & Elman, J. L. (2007). Frequency of basic English grammatical structures: A corpus analysis. Journal of Memory and Language, 57, 348–379. [Google Scholar] [CrossRef] [PubMed]
Romaine, S. (1982). Socio-historical linguistics: Its status and methodology. Cambridge University Press. [Google Scholar] [CrossRef]
Romaine, S. (1984). The language of children and adolescents: The acquisition of communicative competence. Blackwell. [Google Scholar]
Sankoff, G., Thibault, P., Nagy, N., Blondeau, H., Fonollosa, M.-O., & Gagnon, L. (1997). Variation in the use of discourse markers in a language contact situation. Language Variation and Change, 9(2), 191–217. [Google Scholar] [CrossRef]
Schafroth, E. (1995). À propos d’une typologie panromane des relatifs ‘non normatifs’. In C. Bougy, P. Boissel, & B. Garnier (Eds.), Mélanges René Lepelley, recueil d’études en hommage au Professeur René Lepelley (pp. 363–374). Musée de Normandie. [Google Scholar]
Schleef, E. (2017). Developmental sociolinguistics and the acquisition of T-glottalling by immigrant teenagers in London. In G. de Vogelaer, & M. Katerbow (Eds.), Acquiring sociolinguistic variation (pp. 311–347). John Benjamins. [Google Scholar] [CrossRef]
Schleef, E., Meyerhoff, M., & Clark, L. (2011). Teenagers’ acquisition of variation: A comparison of locally-born and migrant teens’ realisation of English (ing) in Edinburgh and London. English World-Wide, 32(2), 206–236. [Google Scholar] [CrossRef]
Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics, 10, 209–231. [Google Scholar] [CrossRef]
Shirai, Y., & Ozeki, H. (2007). Introduction. Studies in Second Language Acquisition, 29(2), 155–167. [Google Scholar] [CrossRef]
Speed, L., Wnuk, E., & Majid, A. (2018). Studying psycholinguistics out of the lab. In A. M. B. de Groot, & P. Hagoort (Eds.), Research methods in psycholinguistics and the neurobiology of language: A practical guide (pp. 190–207). John Wiley. [Google Scholar] [CrossRef]
Stark, E. (2016). Relative clauses. In A. Ledgeway, & M. Maiden (Eds.), The Oxford guide to the Romance languages (pp. 1029–1040). Oxford University Press. [Google Scholar] [CrossRef]
Statistics Canada. (2021). Census of population: Ottawa-Gatineau census metropolitan area. Available online: https://www12.statcan.gc.ca/census-recensement/2021/as-sa/fogs-spg/alternative.cfm?topic=6&lang=E&dguid=2021S0503505&objectId=5 (accessed on 2 January 2025).
Tagliamonte, S. (2002). Variation and change in the British relative marker system. In P. Poussa (Ed.), Relativisation on the North Sea littoral (pp. 147–165). Lincom Europa. [Google Scholar]
Tagliamonte, S., Smith, J., & Lawrence, H. (2005). No taming the vernacular! Insights from the relatives in northern Britain. Language Variation and Change, 17(1), 75–112. [Google Scholar] [CrossRef]
Tarallo, F. (1983). Relativization strategies in Brazilian Portuguese [Unpublished doctoral dissertation, University of Pennsylvania]. [Google Scholar]
Thomason, S. G. (2001). Language contact: An introduction. Edinburgh University Press. [Google Scholar]
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Harvard University Press. [Google Scholar] [CrossRef]
Torres Cacoullos, R., & Travis, C. (2018). Bilingualism in the community: Code-switching and grammars in contact. Cambridge University Press. [Google Scholar] [CrossRef]
Tottie, G., & Harvie, D. (2000). It’s all relative: Relativization strategies in early African American English. In S. Poplack (Ed.), The English history of African American English (pp. 198–230). Wiley-Blackwell. [Google Scholar]
Van Lieburg, R., Hartsuiker, R., & Bernolet, S. (2023). The production preferences and priming effects of Dutch passives in Arabic/Berber-Dutch and Turkish-Dutch heritage speakers. Bilingualism: Language and Cognition, 26, 695–708. [Google Scholar] [CrossRef]
Weinreich, U., Labov, W., & Herzog, M. (1968). Empirical foundations for a theory of language change. In W. P. Lehmann, & Y. Malkiel (Eds.), Directions for historical linguistics (pp. 97–195). University of Texas. [Google Scholar]
White, L. (2003). Second language acquisition and universal grammar. Cambridge University Press. [Google Scholar] [CrossRef]
Wiechmann, D. (2015). Understanding relative clauses: A usage-based view on the processing of complex constructions. De Gruyter Mouton. [Google Scholar] [CrossRef]
Yip, V., & Matthews, S. (2007). Relative clauses in Cantonese-English bilingual children: Typological challenges and processing motivations. Studies in Second Language Acquisition, 29(2), 277–300. [Google Scholar] [CrossRef]

Table 1. Distribution of L2 speakers by age and sex.

	19–33 Years	50–77 Years	Total N
Females	10	7	17
Males	12	0	12
Total	22	7	29

Table 2. Cumulative English Proficiency Index (score ranges shown in parentheses).

	No. of Speakers	% of Sample
Low (0.450–0.494)	3	10
Mid-low (0.575–0.694)	9	31
Mid-high (0.700–0.788)	11	38
High (0.800–0.863)	6	21
Total	29	100

Table 3. Distribution of relative clauses according to the syntactic role of the relativized NP.

	L2 Speakers		TL Speakers
Syntactic Position of Relativized NP	N	%	N	%
Subject	448	55%	444	53%
Object	274	34%	284	34%
Oblique	85	10.5%	114	13.5%
Genitive	2	0.2%	0	0%
Total	809		842

Table 4. Overall distribution of relative clause marking strategies in the L2 and TL corpora.

	L2 Speakers		TL Speakers
Variant	N	%	N	%
That	552	68%	498	59%
Zero	191	24%	185	22%
Who	63	8%	157	19%
Which	1	0.1%	2	0.2%
Whose	2	0.2%	0	0%
Total	809		842

Table 5. Distribution of marking strategies in subject and non-subject relative clauses in the L2 and TL corpora.

	L2 Speakers				TL Speakers
	Subject Relatives		Non-Subject Relatives		Subject Relatives		Non-Subject Relatives
Variant	N	%	N	%	N	%	N	%
That	369	82%	183	51%	285	64%	213	53.5%
Zero	15	3%	176	49%	5	1%	180	45%
Who	63	14%	0	0%	153	34.5%	4	1%
Whose	0	0%	2	0.6%	0	0%	0	0%
Which	1	0.2%	0	0%	1	0.2%	1	0.3%
Total	448		361		444		398
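
As a rough check on the who contrast in subject relatives (Table 5), the sketch below runs a Pearson chi-square test on who versus all other markers in the L2 and TL corpora. The choice of test is an assumption made for illustration and is not necessarily the procedure the authors used; the function names `chi_square_2x2` and `p_value_df1` are hypothetical. Only the counts come from Table 5.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    # For df = 1, P(X > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Counts from Table 5, subject relatives only: who vs. every other marker.
l2_who, l2_total = 63, 448
tl_who, tl_total = 153, 444

chi2 = chi_square_2x2(l2_who, l2_total - l2_who, tl_who, tl_total - tl_who)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
```

A test on pooled token counts treats every relative clause as independent and ignores speaker-level clustering, so it should be read only as a quick sanity check on the raw distributions.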

Table 6. Distribution of marking strategies in subject and non-subject relative clauses in the L2 corpus according to L2 proficiency.

	Low/Mid-Low Proficiency (CEPI Range = 0.450–0.694)				Mid-High to High Proficiency (CEPI Range = 0.700–0.863)
	Subject Relatives		Non-Subject Relatives		Subject Relatives		Non-Subject Relatives
Variant	N	%	N	%	N	%	N	%
That	144	83%	72	49%	225	82%	111	52%
Zero	5	3%	74	51%	10	4%	102	47%
Who	24	14%	0	0%	39	14%	0	0%
Whose	0	0%	0	0%	0	0%	2	1%
Which	1	1%	0	0%	0	0%	0	0%
Total	174		146		274		215

Table 7. Distribution of relative markers according to the animacy of the antecedent NP in the L2 and TL datasets.

	L2 Speakers				TL Speakers
	Human Antecedent		Inanimate Antecedent		Human Antecedent		Inanimate Antecedent
Variant	N	%	N	%	N	%	N	%
That	220	74%	139	99%	123	44%	159	99%
Zero	14	5%	1	1%	5	2%	0	0%
Who	63	21%	0	0%	152	54%	0	0%
Which	0	0%	1	1%	0	0%	1	1%
Total	297		141		280		160

Table 8. Rbrul analysis of the contribution of independent predictors to the selection of that and who in subject relative clauses in the TL Corpus of English (Notes: N/A = not applicable; values of 0 or 100 in the % columns indicate invariant contexts)⁶.

	THAT				WHO
Input probability	0.606				0.394
Log likelihood	−257.744				−257.744
R2 total	0.196				0.196
N	267/419				152/419
	LO	FW	%	N	LO	FW	%	N
Matrix construction
copula	0.325	0.581	68	162	−0.325	0.419	32	162
lone head NP	0.178	0.544	67	57	−0.178	0.456	33	57
other	0.093	0.523	63	151	−0.093	0.477	37	151
stative possessive	−0.597	0.355	49	49	0.597	0.645	51	49
Adjacency
adjacent	0.214	0.553	65	352	−0.214	0.447	35	352
non-adjacent	−0.214	0.447	58	67	0.214	0.553	42	67
Type of antecedent NP
pronoun	0.175	0.544	69	62	−0.175	0.456	31	62
definite NP	0.054	0.514	71	117	−0.054	0.486	29	117
indefinite NP	−0.229	0.443	59	240	0.229	0.557	41	240
Animacy of antecedent
human	N/A	N/A	43	265	N/A	N/A	57	265
inanimate	N/A	N/A	100	154	N/A	N/A	0	154
Length of relative clause
1–3 words	−0.032	0.492	64	135	0.032	0.508	36	135
4+ words	0.032	0.508	63	284	−0.032	0.492	37	284
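
The LO (log-odds) and FW (factor weight) columns in Table 8 appear to be linked by the inverse-logit transform, FW = 1 / (1 + e^(−LO)). The sketch below checks that assumed relationship against the matrix construction rows for that; the function name `factor_weight` is hypothetical, and the values are copied from the table.

```python
import math


def factor_weight(log_odds):
    """Map a log-odds coefficient onto the 0-1 factor-weight scale (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# (factor level, log-odds, reported factor weight) for THAT, matrix construction, Table 8.
rows = [
    ("copula", 0.325, 0.581),
    ("lone head NP", 0.178, 0.544),
    ("other", 0.093, 0.523),
    ("stative possessive", -0.597, 0.355),
]

for label, log_odds, reported in rows:
    computed = factor_weight(log_odds)
    print(f"{label:20s} LO = {log_odds:+.3f}  FW computed = {computed:.3f}  FW reported = {reported:.3f}")
```

Under this reading, a factor weight above 0.5 corresponds to a positive log-odds value (the level favours the variant) and a weight below 0.5 to a negative one, which is why the that and who halves of each row mirror one another.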

Table 9. Rbrul analysis of the contribution of independent predictors to the selection of that and who in subject relative clauses in the L2 Corpus of English (Notes: (i) N/A = not applicable; (ii) values of 0 or 100 in the % columns indicate invariant contexts; (iii) bolded numbers indicate re-ordering of individual constraints vis-à-vis those in the TL baseline).

	THAT				WHO
Input probability	0.785				0.215
Log likelihood	−142.499				−142.499
R2 total	0.238				0.238
N	242/305				63/305
	LO	FW	%	N	LO	FW	%	N
Matrix construction
copula	0.356	0.588	83	126	−0.356	0.412	17	126
other	0.273	0.568	80	105	−0.273	0.432	20	105
lone head NP	−0.031	0.492	76	29	0.031	0.508	24	29
stative possessive	−0.598	0.355	69	45	0.598	0.645	31	45
Adjacency
adjacent	−0.243	0.440	78	246	0.243	0.560	22	246
non-adjacent	0.243	0.560	86	59	−0.243	0.440	14	59
Type of antecedent NP
indefinite NP	0.411	0.601	82	173	−0.411	0.399	18	173
definite NP	0.192	0.548	81	78	−0.192	0.452	19	78
pronoun	−0.602	0.354	69	54	0.603	0.646	32	54
Animacy of antecedent
human	N/A	N/A	70	210	N/A	N/A	30	210
inanimate	N/A	N/A	100	95	N/A	N/A	0	95
Length of relative clause
1–3 words	0.094	0.524	81	101	−0.094	0.476	19	101
4+ words	−0.094	0.476	78	204	0.094	0.524	22	204
Proficiency
mid-high to high	0.132	0.533	81	208	−0.131	0.467	19	208
low to mid-low	−0.132	0.467	75	97	0.131	0.533	25	97

Table 10. Rbrul analysis of the contribution of independent predictors to the selection of that and zero in non-subject relative clauses in the TL Corpus of English (Note: grey shading indicates predictors selected as statistically significant).

	THAT				ZERO
Input probability	0.919				0.081
Log likelihood	−237.114				−237.114
R2 total	0.223				0.223
N	207/375				168/375
	LO	FW	%	N	LO	FW	%	N
Subject of relative clause (p < 0.0022)
noun	1.247	0.777	93	15	−1.247	0.223	7	15
pronoun	−1.247	0.223	54	360	1.247	0.777	46	360
Matrix construction (p < 0.0494)
stative possessive	0.552	0.635	71	21	−0.552	0.365	29	21
lone head NP	0.082	0.520	64	56	−0.082	0.480	36	56
other	−0.082	0.480	59	123	0.082	0.520	42	123
copula	−0.552	0.365	48	175	0.552	0.635	52	175
Adjacency (p < 0.0011)
adjacent	−0.849	0.300	53	349	0.849	0.700	47	349
non-adjacent	0.849	0.700	85	26	−0.849	0.300	15	26
Type of antecedent NP
indefinite NP	0.317	0.579	64	129	−0.317	0.421	36	129
pronoun	−0.122	0.470	51	57	0.122	0.530	49	57
definite NP	−0.195	0.451	50	189	0.195	0.549	50	189
Animacy of antecedent
human	0.223	0.556	62	82	−0.223	0.444	38	82
inanimate	−0.223	0.444	53	293	0.223	0.556	47	293
Length of relative clause
2–4 words	−0.098	0.475	52	233	0.098	0.524	49	233
5+ words	0.098	0.525	61	142	−0.098	0.476	39	142

Table 11. Rbrul analysis of the contribution of independent predictors to the selection of that and zero in non-subject relative clauses in the L2 Corpus of English (Notes: (i) grey shading indicates predictors selected as statistically significant; (ii) bolded numbers indicate re-ordering of individual constraints vis-à-vis those operative in the TL baseline).

	THAT				ZERO
Input probability	0.841				0.159
Log likelihood	−204.591				−204.591
R2 total	0.352				0.352
N	178/349				171/349
	LO	FW	%	N	LO	FW	%	N
Subject of relative clause (p < 0.00457)
noun	0.972	0.726	79	14	−0.973	0.274	21	14
pronoun	−0.972	0.274	50	335	0.973	0.726	50	335
Matrix construction (p < 0.0307)
stative possessive	0.784	0.687	72	29	−0.785	0.313	28	29
other	0.193	0.548	54	124	−0.193	0.452	46	124
copula	−0.198	0.451	49	150	0.198	0.549	51	150
lone head NP	−0.780	0.314	37	46	0.780	0.686	63	46
Adjacency (p < 4.18 × 10⁻⁵)
adjacent	−1.083	0.253	47	317	1.083	0.747	53	317
non-adjacent	1.083	0.747	88	32	−1.083	0.253	13	32
Type of antecedent NP (p < 3 × 10⁻⁴)
indefinite NP	0.777	0.685	65	137	−0.777	0.315	35	137
definite NP	−0.068	0.483	42	166	0.068	0.517	58	166
pronoun	−0.709	0.330	44	46	0.709	0.670	57	46
Animacy of antecedent
human	0.150	0.537	54	67	−0.150	0.463	46	67
inanimate	−0.150	0.463	50	282	0.150	0.537	50	282
Length of relative clause
2–4 words	−0.198	0.451	45	206	0.198	0.549	55	206
5+ words	0.198	0.549	60	143	−0.198	0.451	40	143
Proficiency
mid-high to high	0.032	0.508	52	205	−0.032	0.492	48	205
low to mid-low	−0.032	0.492	50	144	0.032	0.508	50	144

Table 12. Distribution of restrictive relative markers in the L1 French corpus.

Variants	N	%
Qui	482	66%
Qu(e)	234	32%
Zero	10	1.4%
Lequel(s)/laquelle(s)	7	1%
Dont	1	0.1%
Total	734

Table 13. Distribution of preposition placement strategies in oblique relative clauses in the L1 Corpus.

Variants	N	%
Null-preposition	30	57%
Pied-piping	12	23%
Stranding	9	17%
Other	2	4%
Total	53

Table 14. Distribution of preposition placement strategies in oblique relative clauses in the L2 and TL datasets.

	L2 Speakers		TL Speakers
Strategy	N	%	N	%
Preposition stranding	75	88%	111	98%
Null-preposition	10	12%	2	2%
Total	85		113

Table 15. Distribution of stranded prepositions according to lexical identity in L2 and TL speech (Note: grey shading indicates shared lexical forms that account for more than 5% of the data in each variety).

	L2 Speakers		TL Speakers
Preposition	N	%	N	%
with	23	31%	43	39%
in	12	16%	16	14%
on	9	12%	9	8%
to	9	12%	14	13%
of	7	9%	7	6%
at	5	7%	4	4%
about	3	4%	6	5%
for	3	4%	5	5%
into	2	3%	3	3%
from	1	1%	0	0%
off of	1	1%	0	0%
behind	0	0%	1	1%
by	0	0%	1	1%
past	0	0%	1	1%
through	0	0%	1	1%
Total	75		111

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
