Article

Language Learning in the Wild: The L2 Acquisition of English Restrictive Relative Clauses

1 Department of Linguistics, University of Ottawa, 70 Laurier Ave East, Ottawa, ON K1N 6N5, Canada
2 Atlantic Canada Opportunities Agency, 644 Main St, Moncton, NB E1C 9J8, Canada
* Author to whom correspondence should be addressed.
Languages 2025, 10(9), 232; https://doi.org/10.3390/languages10090232
Submission received: 4 July 2025 / Revised: 2 September 2025 / Accepted: 5 September 2025 / Published: 10 September 2025

Abstract

We argue that quantitative analysis of community-based speech data furnishes an indispensable adjunct to theoretical and experimental studies targeting the acquisition of relativization. Drawing on a comparative sociolinguistic approach, we make use of three corpora of natural speech to investigate second-language (L2) speakers’ acquisition of restrictive relative clauses in English. These corpora comprise: (i) spontaneous L2 speech; (ii) a local baseline variety of the target language (TL); and (iii) L2 speakers’ first language (L1), French. These complementary datasets enable us to explore the extent to which L2 speakers reproduce the discursive frequency of relative markers, as well as their fine-grained linguistic conditioning, in the local TL baseline variety. Comparisons with French facilitate exploration of possible L1 transfer effects on L2 speakers’ production of English restrictive relative clauses. Results indicate that evidence of L1 transfer effects on L2 speakers’ restrictive relative clauses is tenuous. A pivotal finding is that L2 speakers, in the aggregate, closely approximate TL constraints on relative marker selection, although they use the subject relativizer who significantly less often than their TL counterparts. We implicate affiliation with, and integration into, the local TL community as key factors facilitating the propagation of TL vernacular norms to L2 speakers.

1. Introduction

Relative clause constructions have attracted sustained scholarly interest for over half a century (Roland et al., 2007), with research addressing a broad range of questions in first language acquisition (e.g., Diessel & Tomasello, 2000, 2005); second and third language acquisition (e.g., Flynn et al., 2004); adult sentence processing (e.g., Gibson, 1998); language typology (e.g., Keenan & Comrie, 1977); as well as grammatical theory (e.g., Alexiadou et al., 2000).1 Much research and theorizing addressing the acquisition of those constructions is articulated from a cognitive perspective (Doughty & Long, 2003), buttressed by an experimental infrastructure involving grammaticality judgements, sentence combination tasks, elicitation and act-out tasks, among a host of other techniques.
In contrast with experimental research on relativization, the bulk of which has targeted perceptual issues relating to processing and decoding strategies (Romaine, 1984; Macdonald, 2015), corpus-based investigations of relative clauses in natural speech production data are much less common (but see e.g., Diessel & Tomasello, 2000; Ghafar Samar, 2000; Yip & Matthews, 2007). The fact that relative clauses are rare in running discourse (Milroy & Gordon, 2003) may explain the predilection for data elicited and analyzed in experimental conditions, although the extent to which findings generated in tightly controlled laboratory settings accurately reflect what transpires in real-world contexts remains a moot point (Jaeger, 2010; Speed et al., 2018). These concerns are symptomatic of the very issues that inspired the current research, in keeping with our goal of contributing to socially sensitive and ecologically valid models of L2 acquisition. We argue that the sociolinguistic investigation of everyday speech is a necessary complement to experimental research on L2 acquisition, as it can refine and subtly enhance our understanding of how relative clauses are acquired, precisely because such an approach engages with, rather than abstracts away from, the inherent variability endemic to natural speech situated in its social context (Labov, 1972a). Because this variability provides critical insights into the nature, extent, and limits of the L2 acquisition process, its correct characterization is of paramount importance in constructing theories of L2 acquisition that are accountable to actual usage facts.
Major incentives for corpus-based studies of relativization have come from constructivist and usage-based investigations of language acquisition (see e.g., Diessel & Tomasello, 2000, 2005; Fox & Thompson, 2007; Wiechmann, 2015). These studies have catalyzed interest in arriving at a better understanding of the relative clause constructions that language learners encounter and acquire as a result of exposure to spontaneous speech. If, as usage-based theories posit, grammatical knowledge is predicated on speakers’ linguistic experience (Bybee, 2010), then it follows that the community-based speech varieties of the TL to which L2 speakers are exposed have the potential to afford insights into the structural biases in the TL input that intimately shape the acquisition process. Both naturalistic and experimental research on L1 acquisition has been instrumental in highlighting the effect of input frequencies on the development of relative clause constructions in child language (Lieven, 2010). As we show below, relative clause constructions in L2 acquisition are impacted by TL input frequency patterns too (see also Mellow, 2006).
Among the key motivations for applying corpus-based approaches to informal social settings is the need to extend the purview of L2 research beyond formal language-learning environments to less formal ones (Bayley & Tarone, 2012), including naturalistic contexts, where the amount, frequency and type of input which learners encounter is said to be far less restricted than in classroom settings (Montrul, 2020). Sankoff et al. (1997, p. 193) argue that if L2 acquisition is more than the product of successful classroom-based learning, then it should exhibit properties of the TL vernacular which are not ordinarily transmitted to learners in academic settings, but are internalized by L2 speakers who have a high degree of contact with the TL community. The quantitative approach employed in the present study is ideally suited to elucidating vernacular patterns in L2 speech and ascertaining whether they are the product of vernacular transmission from the TL, the result of processes that are unique to L2 speech (i.e., interlanguage grammar; see Meyerhoff & Schleef, 2012), or the outcome of transfer effects from L2 speakers’ native language.
Of the different kinds of relative clause that have attracted scholarly attention, the ones we privilege here are restrictive relative clauses. Previous treatments have sought to determine whether the L2 acquisition of these constructions is subject to cross-linguistic influence (e.g., Gass, 1979; Ghafar Samar, 2000; Rochon, 2023), and whether the L2 acquisition of English relative clauses is sensitive to the gradience of difficulty associated with the noun phrase accessibility hierarchy originally posited by Keenan and Comrie (1977). With the exception of Ghafar Samar (2000) and Rochon (2023), however, dedicated variationist investigations of the L2 acquisition of restrictive relativization, based on natural production data, are all but non-existent. One of the goals of the current study is to address this lacuna.
Following Huddleston and Pullum (2002, p. 1035), we construe restrictive relative clauses as ones which delimit the denotational reference of the head nominal they modify. These constructions, reproduced from the discourse of L2 speakers of Canadian English, are exemplified in (1)–(3) below:
(1) There- there’s one guy that literally laughed at me the first time he saw me (L2/002/428)
(2) It was literally the best decision Ø I’ve made (L2/004/33)
(3) And I’m a person who likes to sleep in the morning (L2/013/324)2
We target the variable strategies, alternating between that, zero and WH-forms (see (1)–(3) above), for marking restrictive relative clauses, drawing on vernacular speech. This type of speech is deemed to be the style “which is most regular in its structure” (Labov, 1972a, p. 112), offering “the most systematic data for linguistic analysis” (Labov, 1984, p. 29). As such, it is particularly valued for its potential to reveal community norms. We emphasize that a community focus is critical to the investigation of restrictive relativization because the marking system of restrictive relative clauses in English is considered to be “notoriously variable” (Britain, 2020, p. 95). Inter-community differences in marking preferences are believed to indicate the general absence of a vernacular ‘norm’ in the constitution of the restrictive relative marker paradigm, as well as in terms of the distribution of the markers themselves (see Ball, 1996, p. 243).
A major corollary that ensues from the heterogeneous marking system of restrictive relative clauses in English is that it is essential to establish exactly what L2 speakers are exposed to, rather than simply intuiting the nature of the input (see also Tomasello, 2003, p. 112). A cornerstone of the present investigation is the detailed comparative framework we bring to bear on the L2 acquisition of relative clauses, enabling us to examine them from multiple vantage points. Our research design incorporates three complementary datasets: one representing spontaneous L2 English recorded from Canadian francophones in the Canadian National Capital Region between 2018 and 2022; a second corpus of vernacular speech recorded from native Canadian anglophones in the same locality, representing a local baseline variety of the TL; and a third corpus of vernacular Canadian French obtained from a subset of the L2 speakers we recorded.
Our comparative framework enables us to address: (i) whether L2 speakers of English use the same relative markers (or relativizers) as TL speakers to introduce restrictive relative clauses; (ii) whether L2 speakers use individual relative markers at rates which match their discursive frequency in the local TL variety; and, crucially, (iii) whether L2 speakers reproduce in whole, or in part, the fine-grained linguistic conditioning governing variable relativizer selection in the corresponding TL baseline variety (see Rehner & Mougeon, 2022). We stress that our comparison of L2 and TL speaker cohorts is intended as a heuristic only (see White, 2003), and is not meant to imply that the L2 system is an “incomplete” or “lesser” version of its TL counterpart (Bley-Vroman, 1983), or, indeed, that native-like mastery of TL grammar is necessarily the desired target for every L2 speaker (see Nagy et al., 2003).
Another key component of our research design involves systematic comparisons of L2 speech and L1 French, enabling us to identify and characterize any evidence of cross-linguistic influence from speakers’ native French on their L2 restrictive relativization strategies. Relativization is considered to be “a vulnerable area” for contact effects (Muysken, 2012, p. 238), and this possibility is believed to be enhanced when contact varieties share multiple typological similarities (Thomason, 2001), as is the case in the present study. If transfer effects are operative in the L2 speech investigated here, our comparative variationist framework should enable us to detect them.

2. Theoretical Considerations

Our investigation is based on the premise that speech is inherently variable, yet rule-governed and structured (Weinreich et al., 1968). The major analytical construct at the heart of variationist sociolinguistics is the linguistic variable, defined as alternative ways of expressing the same referential meaning or a similar grammatical function (Labov, 1972a). A key methodological requirement in defining the variable context hosting competing variants involves the commitment to accountable reporting (Labov, 1972a). This commitment enjoins the analyst to take into consideration all the relevant forms within the same envelope of variation, including variants that are normatively sanctioned as well as those that are not.
Inspection of experimental research designs often reveals an implicit reliance on the (highly idealized) norms of the standard language as the primary source of information about the canonical format and contexts of use of relative clauses in the target variety under investigation (see Ghafar Samar, 2000; Levey, 2014; Romaine, 1984). An internalized normative predisposition has led experimentalists and theoreticians to claim, for example, that the zero relativizer is only possible in English when the embedded sentence is a non-subject relative clause (e.g., the woman Ø I saw) and that the same variant is ungrammatical in any syntactic environment in French (Hawkins, 1989, p. 162). Similarly, French oblique relative clauses (e.g., la maison dans laquelle j’habite ‘the house in which I live’) are said to categorically require pied-piping, comprising a preposition followed by an overtly-expressed relative pronoun (Duffeler, 2017, pp. 16, 60; Guasti & Shlonsky, 1995, p. 262). In the same vein, preposition stranding in French relative clauses, a shibboleth of North American varieties, is deemed not possible (Labelle, 1990, p. 101). Yet inspection of actual usage data, as shown below, is at odds with such claims. Failure to respect the principle of accountability (Labov, 1972a) restricts our understanding of the acquisition process because it places predetermined limits on the forms and constructions which learners are believed to be exposed to.
Perhaps the most important consequence of accountable reporting resides in the window it affords on the structure of variation. Variation is habitually constrained by multiple factors relating to both the linguistic and social contexts in which variable features compete. The structured nature of the variable system underlying this competition can be inferred from variant distributions and their associated conditioning (i.e., the configuration of social and linguistic factors governing their selection). The various structural factors that determine variant selection operate probabilistically and are therefore amenable to quantitative analysis. Using statistical modelling, the relative contribution of an individual factor to variant choice (e.g., the contribution of an inanimate antecedent NP to the selection of a particular relative marker) is expressed as a probability value. Within a particular factor group, or predictor, the ordering of probability values from largest to smallest constitutes the hierarchy of constraints. When interpreted in the aggregate, the hierarchy of constraints associated with the multiple factors conditioning variant choice functions as a snapshot of the ‘grammar’ or structure underlying variable surface realizations (Poplack, 2011, p. 215).
To assess the L2 acquisition of restrictive relative clauses in the TL baseline variety, we compare variable structure across L2 and TL speaker cohorts. Of particular importance in this comparative exercise are conflict sites, or areas of functional, structural and/or quantitative differences between comparison varieties (Poplack & Meechan, 1998, p. 132). Because conflict sites tend to be variety-specific, they can be used to assess the extent of quantitative and qualitative differences between L2 and TL speech and exploited as fine-grained diagnostics of L2 acquisition.
Acquisition of TL norms by L2 speakers necessarily entails reproduction, or, perhaps more realistically, approximation of TL variable patterns (Geeslin & Long, 2014). The magnitude of this learning task is such that L2 speakers are routinely claimed to fall short of replicating in fine detail the full suite of constraints operating on variable phenomena in the corresponding TL baseline variety. Evidence to that effect typically emerges from the use of non-target-like forms and constructions, and/or from the incomplete acquisition, or reconfiguration, of TL usage constraints (see Howard et al., 2013; Schleef et al., 2011; Schleef, 2017). Reconfiguration of TL usage constraints by L2 speakers, visible in the re-ordering of probability values in the hierarchy of constraints vis-à-vis the order found in the TL baseline, may be diagnostic of the variable learner system, or interlanguage (Selinker, 1972), which develops during the process of acquiring a second language. The variationist approach we utilize here, capable of detecting even minor adjustments in constraint hierarchies conditioning variant choice, enables different learning outcomes to be accurately discriminated.
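The comparison of constraint hierarchies described above can be illustrated with a minimal sketch. It assumes that the probability values for one factor group are already available for each cohort; the function names and all numerical values are hypothetical and are not results from the present study.

```python
def constraint_hierarchy(factor_probs):
    """Return factor labels ordered from highest to lowest probability."""
    return sorted(factor_probs, key=factor_probs.get, reverse=True)


def same_ordering(tl_probs, l2_probs):
    """True when both cohorts rank the factors in the same order."""
    return constraint_hierarchy(tl_probs) == constraint_hierarchy(l2_probs)


# Hypothetical probability values, for illustration only (not study results).
tl_animacy = {"human": 0.68, "non-human animate": 0.45, "inanimate": 0.31}
l2_matching = {"human": 0.61, "non-human animate": 0.50, "inanimate": 0.38}
l2_reordered = {"human": 0.44, "non-human animate": 0.59, "inanimate": 0.40}

print(constraint_hierarchy(tl_animacy))
print(same_ordering(tl_animacy, l2_matching))   # identical ranking
print(same_ordering(tl_animacy, l2_reordered))  # ranking differs
```

In this sketch, a matching ordering corresponds to approximation of the TL hierarchy, while a differing ordering corresponds to the kind of reconfiguration that may be diagnostic of interlanguage. It compares rankings only and says nothing about statistical significance.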
Among the key determinants of the outcomes of L2 acquisition, extra-linguistic factors figure prominently. These factors typically encompass individual L2 aptitude and proficiency, as well as an array of psychological, motivational and attitudinal considerations (Howard et al., 2013). Among the extra-linguistic parameters that we pay particular attention to here are L2 proficiency as well as L2 speakers’ contact with the local TL community. Since we are targeting here a longstanding bilingual community, characterized by protracted contact between French and English, our working hypothesis is that the context of L2 acquisition amply satisfies the social, cultural and linguistic preconditions believed to be optimal for establishing membership in the TL community. Sustained exposure to and engagement with the TL community are widely recognized to be conducive to high levels of L2 proficiency, enabling L2 learners to attain near-native-like levels of use of TL variables (see Blondeau et al., 2002; Howard et al., 2013; Sankoff et al., 1997).
Recognizing, however, that even relatively advanced L2 speaker groups may subsume a range of individual proficiency levels, we turn in the following section to sampling considerations and our methods for assessing L2 speaker abilities, enabling proficiency to be factored into our research design.

3. Data and Choice of Speakers

The community-based speech data we analyze here were collected in the Canadian National Capital Region including the cities of Ottawa (Ontario) and Gatineau (Quebec). As the site of prolonged contact between English and French, this metropolitan area is deemed a “natural laboratory for language contact” (Poplack, 1989, p. 413). On one side of the provincial border in Ontario, where English is the majority language, approximately 58% of the population of the city of Ottawa claim English as a mother tongue, contrasting with 12.5% French mother-tongue claimants (Statistics Canada, 2021). Conversely, on the other side of the provincial border in Quebec, where French is the designated majority (and official) language, the city of Gatineau comprises 71% French mother-tongue speakers, with some 12% of residents declaring English as their mother tongue (Statistics Canada, 2021).
A fundamental requirement underpinning our compilation of a corpus of spontaneous L2 English speech was that L2 speakers should be sampled from the local native francophone population in the Canadian National Capital Region. They were also expected to have acquired Canadian French as their primary language in childhood from native francophone parents/caregivers. A further requirement was that native francophones should have completed their mandatory schooling in French-speaking educational establishments. In line with our dedicated focus on L2 acquisition, speakers who had been raised bilingually (i.e., French and English) from birth were not eligible for inclusion in the study.
Between 2018 and 2022, we recorded a total of 29 speakers meeting our sampling requirements. Table 1 below shows the distribution of sample members by age and speaker sex.
A self-report language background questionnaire was administered to all L2 speakers in order to gather key information relating to their acquisition of English as a second language, and to assess their degree of contact with the local anglophone TL community. Information abstracted from this questionnaire was used to develop a comprehensive profile characterizing each speaker’s acquisitional history by examining their exposure to formal instruction in English; the language (French, English) used most often in daily life; frequency of English-language use at home, at work and in neighbourhood of residence; as well as the estimated proportion of anglophones in personal social networks.
Most speakers reported having begun formal instruction in English during grades three or four (i.e., between the ages of 8 and 10) of their mandatory schooling. At the time of the recordings, only three speakers reported that they used English more often than French in their daily lives, especially in work or study environments. English language use was least commonly reported in domestic settings, where French prevailed.
Only two L2 speakers, constituting just 7% of the sample, reported having no anglophones in their personal social networks. By contrast, 55% of L2 speakers (N = 16) reported that native anglophones comprised 50% or more of their individual social networks, with a further 28% (N = 8) estimating that 25–50% of their social networks were made up of anglophones.
Also of relevance to the contact dimension is the fact that 20 speakers (69% of the sample) resided in neighbourhoods in Ottawa, located on the Ontario side of the provincial border, where English is the majority language. This inevitably resulted in some degree of exposure to, and interaction with, anglophones, regulated by the varying ratios of francophones to anglophones in individual neighbourhoods of residence (see Poplack, 2018a, pp. 31–33).
One general proficiency requirement imposed at the outset on all L2 speakers was the ability (and willingness) to participate in a recorded sociolinguistic interview, the standard methodological tool used for eliciting lengthy extracts of casual speech (Labov, 1984). Recordings were conducted in English with a native English-speaking interviewer, who introduced topics of interest to L2 speakers as the basis for extended discussion. The interview protocol was expressly intended to encourage L2 speakers to take the lead in the interaction, with minimal intervention from the interviewer. No other data-gathering instruments were used, allowing L2 speakers the freedom to express themselves as they pleased, and to use vernacular structures as little or as much as desired. Interviews lasted an average of 55 min, testifying to relatively elevated levels of English-language fluency in the L2 speaker cohort. The recorded data were subsequently transcribed, culminating in the creation of a fully searchable corpus of natural L2 data comprising some 277,000 words.
Capitalizing on procedures innovated and elaborated in previous language contact research (Poplack, 2018a; Torres Cacoullos & Travis, 2018), we computed a Cumulative English Proficiency Index (CEPI) score for each L2 speaker. Individual CEPI scores, to be interpreted relative to each other rather than as absolute, global indices of proficiency, were calculated from: (i) speaker self-assessments of English-language proficiency targeting production and comprehension; (ii) scalar responses to questions concerning contextual and situational uses of English (e.g., at home, at work, in the local neighbourhood of residence, for the purposes of socializing, etc.); and (iii) content analysis of L2 speaker production data.
Scalar responses relating to English-language proficiency and contextual uses of English were calculated by assigning a score from zero to ten for each assessed category. Content analysis of speech production data focused on discrete-point measures of lexical usage, including word-searching difficulties (e.g., where L2 speakers overtly indicated that they could not find the ‘appropriate’ English word), in addition to targeting morpho-syntactic features such as the variable omission of the English plural affix −s and the possessive −s morpheme, as well as the variable inflexion of present-tense verbs in the third person. We construe the non-negligible use of such variable features (absent from the TL baseline variety) to reflect developmental characteristics of L2 speakers’ interlanguage. Mean scores based on content analysis of each individual’s transcribed recording were calculated for each speaker. A speaker who made non-negligible use of an array of interlanguage features, as described above, would score less highly in terms of content analysis than one who had a more limited repertoire and used those features less frequently in their discourse.
Cumulative proficiency scores were derived for each L2 speaker from the various (weighted) complementary measures described in (i) to (iii) above. The resultant scores range from a low of 0.450 to a high of 0.863. These scores enabled us to subdivide the L2 sample into four unevenly constituted proficiency bands, as shown in Table 2 below.
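One way to picture how a cumulative score of this kind might be assembled is sketched below. The text does not report the weights or the exact scaling, so the equal weights, the rescaling of zero-to-ten responses to a 0–1 range, and the function names are all assumptions made for illustration rather than the procedure actually used.

```python
def scale_0_10(scores):
    """Rescale a list of zero-to-ten scalar responses to a single 0-1 value."""
    return sum(scores) / (10 * len(scores))


def cepi(self_assessment, contextual_use, content_analysis,
         weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mean of three components, each already on a 0-1 scale.

    The default equal weights are hypothetical; the study's weights
    are not given in the text.
    """
    components = (self_assessment, contextual_use, content_analysis)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("each component must lie between 0 and 1")
    return sum(w * c for w, c in zip(weights, components)) / sum(weights)


# Hypothetical speaker, for illustration only.
contextual = scale_0_10([8, 6, 10])   # home, work, neighbourhood responses
score = cepi(0.8, contextual, 0.5)
print(round(score, 3))
```

Because the scores are meant to be interpreted relative to each other, the choice of weights matters mainly for how speakers are ranked, not for the absolute values obtained.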
In keeping with our comparative focus, we make use of two additional datasets, both collected using a sociolinguistic interview protocol targeting conversational topics similar to those employed in the compilation of the L2 corpus.
The first dataset, the Ottawa English Corpus (OEC), was compiled between 2008 and 2010 and comprises natural speech data recorded from 37 native adult anglophones residing in the Canadian National Capital Region. Amounting to some 273,000 words, this fully transcribed dataset serves as the local TL baseline variety with which L2 speech is compared.
The second dataset is based on vernacular Canadian French recorded from a sub-sample (N = 20) of the total native francophone population (N = 29) who contributed to the L2 English corpus. The fully transcribed French corpus comprises some 228,000 words of running discourse.
We use the French language corpus to explore the potential impact of L1 influence on L2 speakers’ production of restrictive relative clauses. The restrictive relativization systems of French and English share a set of partially corresponding variant forms (e.g., the WH-markers qui/who, which; complementizer que/that) as well as similar—though not identical—contexts of variant use. These partial structural and functional correspondences are believed to be conducive to transfer effects. Furthermore, the sociolinguistic context we target here, exhibiting high levels of bilingualism as well as intense and prolonged language contact, meets all the commonly invoked criteria reported to promote cross-linguistic influence.3

4. Method

All relative clause constructions in the TL, L2 and L1 datasets were manually located in the corpora described above by reading through the transcribed data in their entirety and performing cross-checks with the original audio-files, where necessary. This procedure ensured that all overtly marked restrictive relative clauses, as well as those introduced by a zero or null relativizer, were correctly identified.
All eligible tokens were subsequently extracted and imported into Excel files where they were coded for a number of key predictors hypothesized to influence relative marker selection (see, e.g., Tagliamonte et al., 2005; Wiechmann, 2015). Relative clause constructions falling outside the envelope of variation, as we have defined it, were excluded from the analysis (e.g., non-restrictive relative clauses, adverbial relative clauses, etc.).
To test a number of hypotheses relating to potential constraints on relative marker choice, we incorporated a number of predictors into our study design. A major predictor of relative marker choice relates to the syntactic position or function of the relativizer in the relative clause (Ball, 1996; Romaine, 1982). We distinguished relative clauses in which the relativized element is the subject of the relative clause from those where the relativized element is in non-subject position (i.e., direct object, oblique or object of a preposition, and genitive or possessive). These distinctions allow us to examine whether the use of relative clauses correlates with the typological generalizations posited by Keenan and Comrie (1977). The essence of these generalizations is that there is a hierarchy of grammatical positions (subject > direct object > oblique > genitive) correlating with increasing difficulty (and diminishing frequency) of relativization, such that positions lower down the hierarchy (e.g., oblique) are more difficult to relativize (and correspondingly less frequent)—possibly as a result of working memory and linear processing constraints (e.g., Gibson & Wu, 2013)—than positions further up the hierarchy, which are reportedly easier to relativize (and correspondingly more frequent).
Another major predictor considered to affect relative marker choice concerns the animacy of the antecedent NP in which the relative clause is embedded (D’Arcy & Tagliamonte, 2010; Guy & Bayley, 1995; Tagliamonte et al., 2005). In contrast with its French analogue, qui, which exhibits no sensitivity to the animacy of the head nominal post-modified by a subject relative clause, the English relative marker who encodes the semantic feature [+ human]. Relativizer which, by contrast, is said to be restricted to non-human antecedents, as is relativizer that (Guy & Bayley, 1995), although the use of that with human/non-human antecedents appears to vary significantly across communities (e.g., Tagliamonte, 2002). To examine the effects of animacy on relative marker choice, we operationalized a three-way distinction between human, non-human animate and inanimate heads.
Yet another predictor influencing relative marker selection concerns the type of antecedent post-modified by a relative clause. To assess the potential impact of this predictor, we employed a tripartite categorization system, distinguishing definite and indefinite nominal antecedents, as well as pronominal ones.
Matrix clause construction type has also been invoked in connection with relativizer choice. Earlier studies noted that the English zero subject relativizer, now considered “marginally non-standard” (Biber et al., 1999, p. 619), is most likely encountered when the relative clause is embedded in a matrix clause containing an existential-there construction (e.g., there’s a woman Ø wants to see you), a stative-possessive construction (e.g., I have a brother Ø knows him) or a cleft construction (e.g., it’s the upper-class people Ø live in this area) (see Biber et al., 1999; Tagliamonte, 2002). Our coding protocol took the aforementioned main clause constructions into consideration, as well as accounting for relative clauses embedded in isolated head NPs (e.g., people Ø I know), also reported to display distinct relative marker preferences (see Fox & Thompson, 2007).
It has been repeatedly observed that when the grammatical subject of a non-subject relative clause is a pronoun rather than a lexical NP, an overt relativizer is less likely to be present (e.g., Guy & Bayley, 1995; Levey & Hill, 2013). To detect the potential operation of this effect in our data, we distinguished cases where the grammatical subject of a non-subject relative clause is a pronoun from those where it is a full lexical noun phrase.
The two remaining predictors that we test, adjacency and relative clause length, address online processing constraints associated with syntactically complex constructions (see e.g., Rohdenburg, 1996).
With regard to adjacency, there is evidence indicating that the presence of intervening material between a relative clause and its antecedent head promotes the use of an overt relativizer to mitigate parsing difficulties (e.g., Guy & Bayley, 1995; Tottie & Harvie, 2000). Accordingly, we distinguished cases where the relative clause immediately follows its antecedent head NP, categorized as adjacent, from non-adjacent contexts where intervening material (including filled pauses and speech disfluencies) separates the relative clause from its head nominal.
Previous research suggests that the longer the relative clause, the greater the likelihood that an overt relative marker will be selected, whereas shorter relative clauses tend to favour the zero relative marker (Fox & Thompson, 2007). To ascertain any effect of clause length on relativizer selection, we initially counted the number of words in each relative clause, discounting the (variable) presence of the relative marker. Based on average length scores, we subsequently operationalized a binary division between shorter and longer relative clauses. Measures of clause length differ in subject and non-subject relative clauses because subject relative clauses can consist of just one word (i.e., a verb), whereas non-subject relative clauses minimally comprise two words (i.e., a subject and a verb).
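The length coding just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the coding procedure actually used in the study: the function names, the whitespace tokenization, the short list of overt relativizers, and the use of the mean as the cut-off are all hypothetical choices (the text says only that the binary division was based on average length scores).

```python
from statistics import mean

# Hypothetical inventory of overt relative markers to discount when counting.
RELATIVIZERS = {"that", "who", "which"}

def clause_length(clause: str) -> int:
    """Count the words in a relative clause, ignoring a clause-initial relativizer."""
    words = clause.lower().split()
    if words and words[0] in RELATIVIZERS:
        words = words[1:]
    return len(words)

def code_length(clauses: list[str]) -> list[str]:
    """Label each clause 'shorter' or 'longer' relative to the mean length (assumed cut-off)."""
    lengths = [clause_length(c) for c in clauses]
    cutoff = mean(lengths)
    return ["longer" if n > cutoff else "shorter" for n in lengths]
```

Because the minimum possible length differs by clause type, a function like `code_length` would presumably be applied to subject and non-subject relative clauses separately, so that each type receives its own cut-off.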
In accordance with the comparative axis of our research, we also applied a modified version of the coding protocol to relative clause constructions extracted from L2 speakers’ L1, Canadian French. We pay particular attention to oblique relative clause constructions in Canadian French as these are the locus of variable marking strategies that exhibit noticeable structural and quantitative differences from their English counterparts. In the results section below, we return to those differences and their relevance to elucidating potential L1 transfer effects.

5. Results

Table 3 shows the distribution of relative clauses in L2 and TL discourse according to the syntactic position of the relativized NP. The distributional findings are entirely consistent with the typological generalizations associated with the noun phrase accessibility hierarchy (Keenan & Comrie, 1977), with subject position being the most amenable to relativization and the genitive the least. In fact, there are no genitive relative clauses in the TL data, and only two instances marked by whose in the L2 dataset. Further inspection of the data reveals the use of periphrastic or analytic constructions encoding a possessive function, as illustrated in (4)–(5) below from the L2 data.
(4) no- not all of them, there’s a bunch of them that their parents were just forcing them to try and learn French (L2/008/424)
(5) I mean like people that like their first language is English and speak French like yeah there’s like a huge difference (L2/026/388)
Although such analytic constructions in L2 discourse might seem initially to qualify as developmental or interlanguage phenomena, they are in fact attested in native English vernaculars (Hermann, 2003) and are consistent with an observed (cross-linguistic) tendency to promote NPs to higher positions on the noun phrase accessibility hierarchy that are more amenable to relativization (Keenan & Comrie, 1977).
We next consider the individual strategies that are used to mark relative clauses. Table 4 shows the very uneven distribution of relativizers in the L2 and TL datasets, respectively. The relative marker that is the lead variant in both datasets, occurring somewhat more frequently in L2 speech when contrasted with the TL baseline. Rates of the zero relativizer are almost equivalent in L2 and TL discourse. The WH-relativizers which and whose occur at minuscule rates and play no central role in the relativizer system used by either cohort. The only WH-relativizer that occurs to any significant extent in both datasets is who, although this marker accounts for a larger proportion of the variable context in the TL baseline than in the L2 data.
Because restrictive relativization in contemporary spoken English is widely believed to constitute a “syntactically partitioned system” (Brook & Tagliamonte, 2023, p. 27), effectively comprising alternations between that and who in subject relative clauses, and zero and that in non-subject ones (see also Meyerhoff et al., 2020), a more insightful picture of rate differences can be obtained by considering subject relative clauses separately from non-subject ones. Table 5 presents the results of such an analysis.
In terms of distributional parallels, competition between zero and that in non-subject relative clauses occurs at commensurate rates in the L2 and TL datasets. In subject relative clauses, by contrast, the only quantitative resemblance between the comparison groups pertains to the relatively low rates of the zero subject relativizer in each. Closer inspection of the zero variant in the L2 data reveals that it surfaces in the same kinds of syntactic contexts as in the TL baseline, such as existential-there and stative-possessive constructions. Although the zero subject relativizer is relegated to a decidedly peripheral role in the marking of subject relative clauses in both datasets, its colloquial status, low frequency and patterning in L2 speech would seem to indicate that it is the product of vernacular transmission from the local TL variety.
A more complex issue relates to the differential rates of subject who in the L2 and TL data, respectively. What can explain the proportional discrepancies in the use of this variant by the comparison groups? The observed differences in Table 5 are certainly not consistent with any direct influence from L2 speakers’ native French, where as noted earlier (and see further below), the existence of qui (‘who/which’) is precisely the kind of interlingual parallel that would be expected to enhance, rather than impede, the L2 acquisition of relativizer who. We can also rule out the possibility that the overall rate of subject who in the TL baseline variety is exceptional. Comparison with other mainstream urban varieties of Canadian English reveals rates that are almost identical to the one reported for the TL variety in Table 5 (see e.g., Brook & Tagliamonte, 2023, p. 23 for Toronto English).
One possible explanation is that the aggregated L2 data in Table 5 may mask the impact of potential L2 proficiency differences on rates of relativizer who. Table 6, displaying variant inventories and distributions in the L2 data according to two broad proficiency levels, sheds light on this issue. Here we compare low/mid-low proficiency speakers (CEPI score range = 0.450–0.694) with their mid-high/high proficiency counterparts (CEPI score range = 0.700–0.863).
Variant rates in both comparison groups are very similar in non-subject relative clauses and almost identical in subject ones. The overall rate of who in subject relative clauses in both proficiency cohorts is exactly the same, confounding any expectation that proficiency offers a straightforward explanation of the lower incidence of who in L2 speech vis-à-vis the TL baseline.
Recognizing that surface parallels should not be equated with the functional isomorphy of form-based correspondences (Poplack et al., 2012, p. 223), a more exacting measure of the L2 acquisition of who requires us to consider whether those L2 speakers who use this variant do so in appreciably the same way as their TL counterparts. Recall that this sub-sector of the grammar qualifies as a “conflict site” (Poplack & Meechan, 1998), where there are differences in the semantic properties encoded by English who contrasted with its French counterpart, qui. The selection of who is determined by humanness of the antecedent head nominal, whereas qui remains unaffected by the animacy properties of the antecedent head NP. To investigate whether the L2 acquisition of who may involve non-target-like uses, we examine the distribution of subject relativizers according to the animacy properties of the antecedent head NP, as shown in Table 7.⁴
Table 7 shows that relativizer who is categorically used by L2 speakers with human antecedents, just as in the TL baseline variety, albeit at very different rates. The major distributional difference that emerges is that whereas who is the dominant variant used with human antecedents by TL speakers, it is relativizer that which is preferentially selected in the same context in the L2 data. Summarizing, to the extent that L2 speakers make use of who to mark subject relative clauses, they do so while categorically respecting the animacy constraint that is operative in the TL.⁵
We next examine the extent to which other environmental constraints on relative marker selection exhibit congruent or divergent patterns across the L2 and TL speaker cohorts. To conduct this comparison, we draw on mixed-effects regression analysis to examine the contribution of independent linguistic predictors, and proficiency in the case of L2 speakers, to the selection of different relative markers.
For statistical purposes, we employ Rbrul, a tool specifically developed for sociolinguistic research with the capacity to generate mixed-effects models (Johnson, 2009). In ensuing tables of the results, each speaker is run as a random effect to control for any individual speaker variance (Johnson, 2009). The numerical formalisms associated with the Rbrul output are to be interpreted as follows. The input probability is a measure of the overall likelihood that the relativizer in question will occur in the dataset. The log-likelihood value indicates the goodness of fit of the regression model to the dataset under consideration and the R² value indicates the proportion of the variance explained by the model. Individual constraints on relativizer choice (i.e., that versus who in subject relative clauses; that versus zero in non-subject ones) are represented by the log-odds (LO) and the centred factor weights (FW). Log-odds with a positive value indicate that the factor shown on the left-hand side of the table has a favouring effect on relativizer choice, whereas those with a negative value exert a disfavouring effect. Centred factor weights have a similar interpretation: those above 0.5 favour relativizer selection, whereas those below 0.5 disfavour the relativizer in question.
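The parallel interpretation of log-odds and centred factor weights follows from the way the two scales are related: a factor weight is, in effect, the log-odds coefficient passed through the inverse logit function, so that a log-odds of 0 corresponds to a weight of 0.5. The short Python sketch below illustrates that relationship. It is not Rbrul code, and the function names are hypothetical; it assumes only the standard logistic transformation.

```python
import math

def factor_weight(log_odds: float) -> float:
    """Inverse logit: map a log-odds coefficient onto the 0-1 factor-weight scale."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def direction(log_odds: float) -> str:
    """Describe the effect of a factor level on the application value."""
    if log_odds > 0:
        return "favours"
    if log_odds < 0:
        return "disfavours"
    return "neutral"
```

On this scale, log-odds of equal magnitude and opposite sign yield weights that are mirror images around 0.5, which is why the two measures always agree on whether a factor favours or disfavours a given relativizer.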
The ordering of log-odds/centred factor weights (i.e., from most to least favouring) within an individual predictor, or factor group, constitutes the constraint hierarchy (or ranking). It is the constraint hierarchy, rather than the associated percentage values or total Ns, that remains key to interpreting any comparative analysis. Detailed examination of the constraint hierarchies conditioning variant choice yields the most penetrating characterization of variable structure (Poplack & Tagliamonte, 2001, p. 6) and can be used to gauge the extent to which L2 speakers approximate TL grammatical norms. Constraint hierarchies in L2 speech which resemble, or are broadly parallel to, their counterparts operating in TL speech furnish the most compelling evidence of the successful L2 acquisition of TL variable patterns, when interpreted in the aggregate.
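As a rough illustration of how constraint hierarchies can be compared across datasets, the Python sketch below ranks the levels of a single predictor from most to least favouring and checks whether two datasets share the same ranking. The function names are hypothetical, and the coefficients in the usage note are invented purely for demonstration; they are not taken from the tables reported in this article.

```python
def constraint_hierarchy(log_odds: dict[str, float]) -> list[str]:
    """Order the levels of one predictor from most to least favouring."""
    return sorted(log_odds, key=log_odds.get, reverse=True)

def same_ranking(baseline: dict[str, float], learner: dict[str, float]) -> bool:
    """True when both datasets rank the factor levels in the same order."""
    return constraint_hierarchy(baseline) == constraint_hierarchy(learner)
```

For example, with made-up values such as `{"noun": 0.9, "pronoun": -0.9}` for one dataset and `{"noun": 0.6, "pronoun": -0.6}` for another, `same_ranking` returns `True`: the magnitudes differ, but the direction and ordering of effects are parallel, which is the property that matters for the comparison.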
Table 8 and Table 9 present the results of multifactorial regression analyses of predictors contributing to the selection of the relativizers that and who in subject relative clauses in the TL and L2 datasets, respectively. We exclude from our statistical analysis of subject relative clauses speakers who produced no instances of who.
No predictor is selected as significant in either table depicting the results for subject relative clauses. The absence of any significant effect associated with adjacency or relative clause length, two parameters intended to capture the operation of online processing constraints, strengthens our conviction that subject relative clauses are generally less problematic for L2 learners than non-subject ones (Gass, 1979).
Despite the absence of statistically significant effects, we can still compare variable structure across the datasets, as evidenced by the constraint hierarchies. Bolded probability coefficients in the L2 data indicate permutations in the L2 constraint rankings vis-à-vis the corresponding hierarchy of effects in the TL baseline variety.
With regard to matrix clause construction type (where existential-there and cleft constructions have been collapsed for statistical analysis with other copula clauses, all containing the semantically light verb be), we observe some minor differences in the respective ranking of lone head NPs and other matrix clause constructions with the relativizers that and who. Lone head NPs, for example, are disfavoured with that in the L2 data but favoured with the same variant in the TL baseline. Likewise, when we inspect the probability values for adjacency, that is the choice variant in non-adjacent contexts in the L2 data, whereas who is the preferred marker in the same environment in the TL baseline.
Other differences concern the effects associated with the type of antecedent NP in the L2 data, where indefinite nouns are favoured with that whereas pronouns are disfavoured. These effects are reversed in the case of indefinite nouns and pronouns when relativizer who is selected. Yet again, opposing trends can be observed in the TL baseline in relation to the operation of the same contextual effects.
Finally, although L2 proficiency is not selected as significant in the L2 data, the probability coefficients for who show that it is very weakly favoured by low- to mid-low proficiency speakers. Paucity of data from speakers with lower CEPI scores precludes any definitive interpretation of this finding. Notwithstanding this caveat, the results tentatively suggest that the use of who in the L2 subject relative clauses examined here does not increase as a function of higher levels of English proficiency. This aligns with our earlier observation that in these data, proficiency does not appear to be a major determinant of who usage.⁷
We next consider the results for non-subject relative clauses. Table 10 and Table 11 present the results of regression analyses of predictors contributing to the selection of the relativizers that and zero in non-subject relative clauses in the TL and L2 datasets. In the TL corpus, three predictors return significant effects: subject of the relative clause, matrix clause construction type, and adjacency, all highlighted with grey shading. The same predictors are also selected as significant in the L2 data, with a fourth predictor, type of antecedent NP, additionally returning a significant effect in the L2 data, but not in the TL baseline variety.
Subject of the relative clause and adjacency exhibit parallel effects in the two comparison varieties. When the grammatical subject is a noun, relativizer that is preferentially selected, whereas when the grammatical subject is a pronoun, the zero relativizer is the choice relativizer. In usage-based theories, the preference for using the zero relativizer to mark non-subject relative clauses containing a pronominal subject has been attributed to the degree to which the matrix and relative clause are structurally integrated with each other, with a higher degree of “mergedness” favouring relativizer omission in English (Fox & Thompson, 2007, p. 319).
As noted earlier, the constraint hierarchy for adjacency likely reflects universal processing considerations rather than variety-specific effects. For example, the zero variant is strongly disfavoured in both comparison varieties when the relative clause is separated from its antecedent head NP by intervening material. This result is consistent with Rohdenburg’s (1996, p. 151) Complexity Principle. According to this principle, less explicit grammatical options (i.e., the zero variant) are liable to be disfavoured in cognitively complex environments (i.e., non-adjacent contexts), where an overt relativizer is preferred instead. This appears to be especially the case in non-subject relative clauses, where greater processing burdens may be incurred by longer filler-gap dependencies (Gibson, 1998).
When compared with the corresponding direction of effects in the TL baseline variety, re-ordering of the constraint rankings in the L2 data for matrix clause construction type and type of antecedent NP points to subtle adjustments in the conditioning of variant choice. These adjustments are suggestive of L2 speakers’ reconfiguration of constraints operative in the TL grammar. We caution, however, that they must be weighed against the fact that most of the linguistic predictors incorporated into the analysis, including the non-significant effects of animacy and length of the relative clause, pattern in appreciably the same way in the L2 and TL datasets, as gauged from the hierarchy of constraints.
Furthermore, closer inspection of L2 departures from the TL baseline variety reveals that some of the observable disparities in the L2 data are aligned with patterns detected in other native varieties of English. For example, the strong correlation between lone head NPs and the zero relativizer in the L2 data, but not in the corresponding TL baseline variety examined here, has been documented in other urban varieties of Canadian English (see Levey & Hill, 2013).
We also stress that some of the discrepancies visible in the L2 constraint hierarchies reflect relatively trivial alterations to the corresponding hierarchy of effects in the TL baseline variety. For example, copula clauses in the TL variety are a major determinant of the zero relativizer, as indicated by the top-tier probability coefficients (see also Fox & Thompson, 2007; Levey & Hill, 2013), but the same favouring effect also operates in the L2 variety, albeit to a lesser extent. Similarly, although the respective contributions of definite and pronominal antecedents to variant choice are ranked somewhat differently in the comparison varieties, both types of antecedent disfavour relativizer that and favour zero in the L2 data and TL baseline.
Viewed in the aggregate, comparison of the variable structure of subject and non-subject relative clauses in the L2 and TL datasets indicates that the inter-varietal differences we have uncovered are essentially quantitative rather than qualitative in nature. We make no claim here that L2 speakers have fully reproduced the TL variable system in all its precise structural detail—an accomplishment typically associated with first rather than second language acquisition (Labov, 2007). But we would nonetheless emphasize that our findings converge in foregrounding the capacity of relatively advanced L2 speakers to approximate the variable marking of restrictive relative clauses characteristic of the local TL variety.
To what extent might L2 speakers’ native French influence their production of restrictive relative clauses in English? We first consider the inventory and distribution of restrictive relative markers in French, as shown in Table 12. Together accounting for 98% of the variable context, just two relative markers, qui and qu(e), virtually saturate the restrictive relative marker paradigm. This finding is in line with the propensity of vernacular Romance varieties to use relative particles at the beginning of relative clauses, even in syntactic positions (e.g., in oblique relative clauses) where normative grammars would typically require the selection of a relative pronoun that agrees with the semantic and/or morpho-syntactic features of the head NP post-modified by the relative clause (Fiorentino, 2007, pp. 266–267; Stark, 2016, p. 1036). Inspection of the data in Table 12 shows that the relative pronouns dont and lequel(s)/laquelle(s), used here to mark oblique relative clauses, are exceptionally rare, consonant with their reported infrequency in other varieties of colloquial French (see e.g., Schafroth, 1995).
The conspicuous reduction in the number of relativizers found in colloquial (Canadian) French vis-à-vis the much richer paradigm of markers generally encountered in the standard literary language serves as a reminder that structural patterns characteristic of the written variety cannot be uncritically equated with those found in spontaneous speech (Cheshire, 2005; Milroy, 2001; Poplack, 2018b).
The relative marker qui accounts for a disproportionately large swathe of the variable context in French, reflecting the preponderance of subject relative clauses in the data (subject relative clauses account for 66% of the data; object relative clauses 27%; and oblique relative clauses 7%). As already noted, there is little evidence to suggest that the predominance of qui in the French data has any direct bearing on the frequency of who in L2 speech, where variant rates are significantly lower than in the corresponding TL benchmark. Nor is the zero variant in colloquial French, also found in English, distributed in ways which parallel its use in either the L2 or the TL datasets. To the very limited extent that it occurs in the French data examined here, it marks a mere 4% (N = 10) of non-subject relative clauses. By contrast, the zero relativizer co-occurs with 49% of non-subject relative clauses in L2 speech, slightly above the 45% rate in the corresponding TL baseline. Thus, despite the existence of structural variants common to both vernacular French and English, the rates of those shared options diverge markedly in the respective languages concerned, diminishing, rather than strengthening, the possibility of transfer effects.
What little variation there is in relative-marking strategies in the French data is almost exclusively confined to oblique relative clauses. Particularly remarkable in view of their scant recognition in the literature (Gadet, 1995) is the range of strategies used for marking oblique relative clauses in (Canadian) French. These strategies are partitioned across three unevenly distributed constructions, illustrated in (6)–(8) below, where [ ] indicates a null or absent preposition (also referred to in the literature as preposition chopping, Tarallo, 1983; preposition absorption, Poplack et al., 2012; or preposition ghosting, Radford, 2019):
Pied-piping
(6) la madame avec qui je vais vivre en Suisse, elle travaille avec Interpeace (FL1/017/713)
‘the lady with whom I’m going to live in Switzerland, she works with Interpeace’
Null-preposition
(7) ben ça c’est probablement une des personnes que j’ai commencé à parler [ ] en anglais (FL1/009/610)
‘well that’s probably one of the people that I began to speak [ ] in English’
Preposition stranding
(8) c’est pas vraiment quelque chose que je m’en fais avec (FL1/025/218)
‘it’s not really something that I’m worried about (lit. ‘with’)’
Contrary to claims in the theoretical and descriptive literature, Table 13 shows that pied-piping, the prescribed strategy for marking French oblique relative clauses, is neither categorical nor even the majority option in the natural speech data analyzed here (see also Poplack et al., 2012, p. 209 for similar results). This finding highlights the gulf between analysts’ preconceived ideas about how oblique relative clauses are marked in French (e.g., Duffeler, 2017; Guasti & Shlonsky, 1995) and what actually transpires in natural speech.
The null-preposition strategy (‘null-prep’), involving the non-use of a (normatively obligatory) preposition in an oblique relative clause (Klein, 1993), is the lead variant in the L1 data, accounting for over half the marking strategies. The incidence of null-prep in the data analyzed here is consistent with its reported prevalence in other vernacular Romance varieties (see e.g., Alba de la Fuente & Pato, 2019; Tarallo, 1983).⁸
Of particular interest from a contact perspective is that null-prep is reported to be grammatically inadmissible in English (White, 2003, p. 51). A restricted number of cases (N = 10) surface, however, in the L2 data, as exemplified in (9)–(10) below, raising the possibility that a vernacular strategy in L2 speakers’ native French has been transferred to their English.
(9) so then you can go to the questions that you’re evaluating me [ ] (L2/004/345)
(10) like that was like my city that I wanted to go [ ] and like I could see myself living there (L2/017/130)
Only eight L2 speakers avail themselves of null-prep, indicating that it is not a widely diffused option in the L2 speaker sample.
Militating against the interpretation of L1 transfer effects are a number of competing explanations that merit consideration. Firstly, despite its putative grammatical inadmissibility in English (White, 2003), null-prep in oblique relative clauses is sporadically encountered in the TL baseline variety, as illustrated in (11)–(12), where this phenomenon seems to be limited, as far as we can tell, to the relativization of locative PPs (but see Radford, 2019 for a wider range of syntactic environments in English). The fact that a similar, if sparsely instantiated, construction exists in the TL data casts doubt on the hypothesis that potential transfer effects from French uniquely explain this phenomenon in L2 speech.
(11) I am presented a lot of times with moral dilemmas with the direction that my local board is going [ ] (TL/026/10688)
(12) yeah but I sh– I’m supposed to pay my insurance in the same province that I live [ ] (TL/029/11900)
Equally damaging to the L1 transfer hypothesis are claims that null-prep in L2 discourse qualifies as a systematic developmental phenomenon that is independent of the syntactic properties of L2 speakers’ native language (Bardovi-Harlig, 1987; Perpiñán & Cardinaletti, 2024). Bardovi-Harlig’s (1987) seminal study of L2 English, drawing on a large participant pool representing different L1 backgrounds and varying proficiency levels, indicated that before mastering preposition stranding and pied-piping, learners passed through a developmental stage where they did not produce a preposition (i.e., null-prep) in oblique relative clauses. These findings led Bardovi-Harlig (1987) to enunciate the following developmental schema: (i) null-prep > (ii) preposition stranding > (iii) pied-piping. Null-prep was mainly produced by L2 learners in the earlier phases of the acquisition process in Bardovi-Harlig’s (1987) study, with Jourdain (1996) also corroborating a strong correlation with proficiency. To the limited extent that null-prep occurs in the current study, it shows no robust correlation with CEPI scores, as speakers in both lower- and higher-proficiency bands make sporadic use of it, possibly supporting the notion that it may be a vestigial developmental strategy.
We conclude that the very limited instances of null-prep in the L2 data inhibit detailed quantitative analysis of its conditioning as well as systematic comparisons with its counterpart phenomenon in spoken French. The paucity of evidence at our disposal does not allow us to categorically rule out L1 influence, but the competing explanations that we reviewed suggest that a conspiracy of internal and external factors (i.e., ‘multiple causation,’ Thomason, 2001, p. 91) may plausibly account for null-prep in the L2 data analyzed here.
As shown in Table 14, the unrivalled strategy used by L2 speakers to mark oblique relative clauses in English is preposition stranding, mirroring the corresponding choice mechanism in the TL. Such is the strength of preposition stranding in the community TL baseline that native anglophones do not produce a single instance of pied-piping, in spite of claims that this is a common preposition-placement strategy in written and spoken English (Hoffmann, 2005, p. 257). Although pied-piping occurs in L2 speakers’ native French, as shown in Table 13, it does not trigger a single instance of its structural equivalent in their spoken English, contrary to abundant claims of structural priming in contact scenarios (see e.g., Loebell & Bock, 2003). One reason why pied-piping is not found in either the TL or L2 datasets is that WH-forms, the only relativizers licensing pied-piping in English, occur rarely in non-subject relative clauses (cf. Table 5). Furthermore, analysis of everyday speech (e.g., Levey, 2024) suggests, contra Hoffmann (2005), that preposition stranding is by far the preponderant (and, by extension, the most salient) option in spontaneous spoken English. Indeed, McDaniel et al. (1998, p. 309) go so far as to claim that pied-piping is not a natural option in English, but a prescriptive artefact acquired during schooling.
Could preposition stranding, despite its avowedly minoritarian status in spoken (Canadian) French, have enhanced speakers’ use of that strategy in their English? Such superficial structural parallels, as we have observed repeatedly, are certainly believed to optimize transfer effects (see e.g., Backus, 2005). Indeed, the very ubiquity of preposition stranding in spoken English is assumed to have triggered, albeit indirectly, the rise in preposition stranding in (Canadian) French as a result of language contact (see e.g., Roberge & Rosen, 1999, p. 154).
Yet as Poplack et al. (2012) caution, superficial form-based correspondences in French and English may be conditioned by very different underlying linguistic processes. This crucial caveat applies to the results of the present study. Out of nine occurrences of preposition stranding in the L1 French data, 56% (5/9) involve just a single preposition, avec ‘with’, competing with only three other forms: dedans ‘in’ (used twice), dessus ‘on’, and de ‘of’ (see also Poplack et al., 2012, p. 216).
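The proportions behind this observation can be checked with a short Python sketch. The counts are those reported in the text for the nine stranded prepositions in the L1 French data; the variable and function names are hypothetical and serve only to make the arithmetic explicit.

```python
from collections import Counter

# Counts as reported in the text: nine stranded prepositions in the L1 French data.
stranded = Counter({"avec": 5, "dedans": 2, "dessus": 1, "de": 1})
total = sum(stranded.values())

def share(prep: str) -> int:
    """Percentage share of one preposition, rounded to the nearest integer."""
    return round(100 * stranded[prep] / total)
```

Here `share("avec")` gives 56, matching the figure cited above, while the three remaining forms together account for the other four tokens.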
By contrast, comparison of the rates of preposition stranding according to the lexical identity of the stranded preposition in the L2 and TL datasets, depicted in Table 15, reveals evidence that is at variance with L1 influence. Both comparison varieties in Table 15 contain a much larger inventory of lexical forms compared to the French data, likely reflecting the substantially greater prevalence of preposition stranding in English. Granted, the most frequently stranded preposition in the English data is with, as is the case with its French equivalent, avec, but the prepositions to and of also figure among the more commonly stranded forms in the L2 and TL English data, in contrast with the documented aversion of their French counterparts, à ‘to, at’ and de ‘of’, to stranding (Poplack et al., 2012, p. 210).⁹
In summary, we conclude that systematic quantitative analysis of the data at our disposal fails to provide unequivocal evidence indicating that L2 speakers’ native French exerts a discernible structural influence on the restrictive relative clauses they produce in English. We concede that the minor phenomenon of null-prep in L2 oblique relative clauses has a counterpart in vernacular French, but other possible sources of null-prep in L2 discourse, including developmental motivations, necessarily constrain our ability to attribute this phenomenon exclusively to cross-linguistic transfer. Similarly, triangulation of quantitative evidence across the datasets included in our study suggests that the most compelling source of preposition stranding in L2 oblique relative clauses resides in the choice mechanisms operating in the TL baseline variety, unaffected by superficial parallels in speakers’ native French.

6. Discussion and Conclusions

The primary motivation for the research reported here arose from our concern to document relative clause constructions in natural production data representing L2 speech, and to characterize the frequency and structural diversity of those constructions in everyday social interactions.
A major caveat to emerge from the present investigation is that the types of English restrictive relative clauses that L2 learners use, as well as their probability distributions, should not be assumed a priori. This information can only be reliably inferred from systematic examination of community-based speech data. Contextualization of syntactic variation in relation to the everyday speech norms of the TL to which L2 learners are exposed operates as a crucial check on previous findings, including corpus-based ones, which do not necessarily take into account community-based motivations shaping variable usage. Among the quantitative disparities that have emerged from corpus-based studies of relativization, for example, are affirmations that object relative clauses are proportionally more common than subject ones in the spoken language (Roland et al., 2007, p. 357). This claim is squarely at odds with our own results. In each of the three natural speech corpora at our disposal, we found that subject relative clauses are the quantitatively preponderant type. Moreover, the gradient difficulties associated with the relativization of other syntactic positions (i.e., object > oblique > genitive) accord with the typological generalizations associated with the noun phrase accessibility hierarchy (Keenan & Comrie, 1977).
A hallmark of the current study lies in the importance it attaches to the structured heterogeneity that restrictive relative clauses manifest in everyday speech, and to the application of a detailed comparative approach to the analysis of that variation. This approach enabled us to leverage new insights into the sources of orderly heterogeneity in L2 discourse. We observed that the empirical characterization of that heterogeneity, and its implications for achieving a clearer understanding of the L2 acquisition process, have typically received limited attention in many previous experimental and theoretical treatments.
It is worth briefly reviewing possible motivations for that neglect. One reason has to do with inter-disciplinary differences in preferred frameworks of analysis, with experimental researchers typically eschewing the unbalanced datasets often generated by corpus-based studies (see Jaeger, 2010). A more insidious reason, as we have seen, is traceable to a theoretical inclination to rely on highly idealized normative accounts of language as surrogates for the facts of actual usage (Poplack, 2018b). This approach can lead to the erroneous imposition of categoricity on grammatical phenomena (e.g., oblique relative clauses in French) that are in fact inherently variable. It can also inflate the importance of certain constructions, such as pied-piping in English oblique relative clauses (see Levey, 2024), which are vanishingly rare in everyday spontaneous speech.
Because linguists’ intuitions about the existence, frequency and range of occurrence of constructions in speech may not dovetail with the actual patterns that characterize authentic interactions (Bybee, 2008, p. 226; Milroy, 2001, pp. 544–545), access to natural language corpora is indispensable in helping “to bridge the gap between the analysts’ conception of the data and the data themselves” (Ernestus & Baayen, 2011, p. 374). Indeed, the importance of examining “frequency distributions in native discourse to look for possible sources for acquisition patterns” (Shirai & Ozeki, 2007, p. 160) appears to be gaining traction in experimental paradigms.
A quantitative approach not only lays bare the relevant facts of variable usage; its greatest asset arguably lies in the level of analytical granularity it affords for comparing variable structure across different speaker groups and for capitalizing on structural comparisons as metrics of L2 acquisition.
What have those structural comparisons revealed in the present study? A first major finding is that L2 speakers use the same variants to mark restrictive relative clauses that are employed by TL speakers. Discursive frequencies of two restrictive relative markers, that and zero, are largely commensurate with the usage rates of TL speakers, especially in non-subject relative clauses. Only in the case of the subject relative marker who did we find reduced occurrences of this variant in the speech of the L2 cohort vis-à-vis the TL community baseline. None of the possible explanations we reviewed satisfactorily accounts for this discrepancy. Inspection of CEPI scores revealed no correlation between the use of who and English-language proficiency levels or, indeed, any other extralinguistic measure (e.g., level of educational attainment). There is, however, no indication in the data, as demonstrated by our quantitative comparisons of L2 and TL speaker cohorts, that those L2 speakers who make use of who do so in different ways from TL speakers. This bolsters our conviction that we are dealing here with a quantitative rather than a qualitative distinction between L2 and TL speakers, although the root cause of that distinction remains elusive.
Appeals to possible L1 influence on L2 speakers’ reduced use of subject who are inconsistent with what we find in speakers’ native French, where qui, partially equivalent to English who, is the quantitatively dominant marker in the French restrictive relativizer paradigm, potentially rendering it available for cross-linguistic priming. Granted, certain scholars maintain that the typological rarity of WH-relativizers cross-linguistically renders them less susceptible to transfer effects (Gisborne, 2024), despite claims in the literature ascribing a key role to language contact in the areal diffusion of interrogative pronouns as relative clause markers (see, e.g., Comrie, 1998; Auderset, 2020). Nor can any convincing explanation be extrapolated from previous research targeting the L2 acquisition of English relativizers. Inspection of the literature reveals that the L2 acquisition of who does not show any conspicuous quantitative anomalies in learner groups that parallel our own findings. In fact, Ghafar Samar’s (2000, pp. 117–118) study of L1 Persian speakers’ acquisition of English in the Canadian National Capital Region showed that L2 speakers actually used who at rates which surpassed those in the local TL community, despite the absence of corresponding WH-forms in their native Persian.
The expectation that cross-linguistic parallels between L1 and TL forms should have a facilitatory effect on L2 production preferences is not borne out with regard to the subject who. This may be because English who is not wholly congruent with French qui on account of differences in the semantic properties encoded by the respective forms. Recent work on cross-linguistic structural priming effects (Van Lieburg et al., 2023) suggests that if the structures concerned are not wholly similar across languages, this may have an inhibitory effect on production preferences, although the precise mechanisms underpinning inhibition processes remain to be determined.10
In spite of the putative susceptibility of relativization strategies to contact effects (e.g., Muysken, 2012), our comparative approach failed to turn up compelling evidence in favour of such a scenario. The only potential candidate for cross-linguistic transfer that we detected, involving the L2 use of null-prepositions (null-prep) in oblique relative clauses, remains unsubstantiated. There is no proprietary relationship between this construction and speakers’ L1, French, as there is copious evidence indicating that null-prep is a robust interlanguage phenomenon.
A central finding of our research is that L2 speakers with varying proficiency levels are capable, in the aggregate, of approximating probabilistic constraints on relativizer selection that are operative in the local TL baseline variety. To be sure, the precise configuration of those constraints in the L2 data is not a wholesale facsimile of what is found in the corresponding TL baseline. But the adjustments that are discernible in the L2 variable grammar conditioning relative marker selection are, as far as we can determine, relatively minor. These adjustments are not of the magnitude that would warrant the inference that L2 speakers’ variable system constitutes a profoundly divergent interlanguage grammar in comparison with its TL counterpart.
Our results contribute to the growing body of evidence indicating that L2 speakers are sensitive to, and capable of reproducing, statistical regularities in the TL input to which they are exposed. Of particular importance is the fact that many of the fine-grained patterns depicted in Tables 8–11 lie so far below the level of conscious awareness that they are highly unlikely to have been explicitly transmitted to learners in formal language-learning contexts. The tendency to omit a relativizer in non-subject relative clauses, visible in Tables 10 and 11, when the main clause is semantically or propositionally ‘light’ (i.e., a matrix copular clause) is but one example of a non-trivial pattern that has eluded traditional accounts of relativization (see Fox & Thompson, 2007). This pattern is nonetheless firmly entrenched in the TL community grammar and reproduced by the L2 speakers examined here.
We submit that the principal means by which such implicit patterns are conveyed to L2 learners is via a process of vernacular transmission mediated by contact with the local TL community (Sankoff et al., 1997). Of central importance in understanding the propagation of TL vernacular norms to L2 speakers are the social characteristics of the acquisition context. Because that context is a stable, long-standing bilingual community where considerable value is attached to knowledge of both official languages, the social and attitudinal circumstances of L2 acquisition, abetted by extensive exposure to TL speakers, are highly conducive to advanced L2 attainment. Recall that the personal social networks of many of the francophones we targeted are made up of substantial proportions of anglophones, reaching majority levels for more than half of the L2 speaker sample. The fact that many francophones have close affiliations with the local TL community suggests that L2 speakers have integrative motivations for using English in their anglophone friendship groups. As Sankoff et al. (1997, p. 193) observe, a greater degree of social integration into the TL community leads to greater linguistic integration as well.
We conclude by pointing to some of the limitations of our own study and the need for additional research targeting issues that we have insufficiently addressed. Foremost among those issues is the role of individual differences in the L2 acquisition of restrictive relative clauses. Although our statistical methods were configured to take into account inter-individual patterns of variation in the marking of relative clauses, the nature and extent of intra-individual patterns of variation remain to be determined, as does their longitudinal development.
We stress that the absence of any detailed assessment of individual differences in the current study is not a defect of our methodological approach, but derives instead from the nature of the spontaneous speech data we privileged. As observed earlier, relative clauses are infrequent in running discourse, which severely restricts our ability to mine copious and balanced amounts of data for each speaker. Further sub-categorization of those data into different types of relative clause (i.e., subject, object, oblique), essential for analytical purposes, inevitably results in additional imbalances and skewed token distributions. Witness the fact, for example, that genitive relative clauses are almost absent from the corpora we examined. In the same vein, the analysis of oblique relative clauses resulted in only modest quantities of data for each corpus, precluding any meaningful quantitative assessment of individual patterns of variation.
Although distributional asymmetries and sparse data cells are unavoidable when working with natural language data, experimental paradigms have developed tried-and-tested protocols for eliciting and analyzing syntactic variables that occur at sub-optimal rates in everyday speech. In keeping with the utility of approaching “a single problem with different methods” (Labov, 1972b, pp. 118–119), “triangulating corpus and experimental methodologies complementarily” (Deshors & Gries, 2022, p. 171) would seem to offer fertile avenues for mitigating the impact of the limitations we have identified. Whatever transpires in future investigations into the L2 acquisition of relativization, this line of inquiry can only be enriched by the study of actual interactions situated in their community-based context, as we hope to have shown.

Author Contributions

Conceptualization, S.L.; Methodology, S.L. and K.L.R.; Formal analysis, S.L. and K.L.R.; Data curation, K.L.R. and L.K.; Writing—original draft, S.L.; Project administration, S.L. and L.K.; Funding acquisition, S.L. and L.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a Social Sciences and Humanities Research Insight Grant, grant number 435-2018-0999.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Social Sciences and Humanities Research Ethics Board (REB), University of Ottawa (file no. S-10-18-1140, approved on 19 November 2018) for studies involving humans.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data analyzed in this study may be made available on request from the corresponding author. The data are not publicly available in accordance with the informed consent guidelines provided to the participants.

Conflicts of Interest

The authors declare no conflict of interest.

Notes

1
The term ‘relative clause construction’ is used throughout this article to refer to constructions comprising two clauses: a matrix clause containing an NP which is post-modified by an embedded relative clause. The term ‘relative clause’ is used to refer to the post-modifying clause itself.
2
Examples are reproduced verbatim from spoken corpora. Codes in parentheses refer to the corpus from which the example is drawn (L2 = Second Language Corpus of English; TL = Target Language Corpus of English; FL1 = French First Language Corpus); the unique speaker identifier; and the line number of the utterance in the respective corpus. We use the Ø symbol to refer to the null or zero relative marker in spoken English and French.
3
Many theoretical linguists argue that qui is not a WH-word but an allomorph of complementizer que (see Mackenzie, 2018), although Koopman and Sportiche (2014) claim that qui behaves as an uncontroversial WH-pronoun in certain contexts.
4
Excluded from the results are limited instances of non-human animate NPs. We display the results for subject relative clauses only because the relativizer who is almost entirely restricted to that syntactic environment in our datasets (see also D’Arcy & Tagliamonte, 2010, p. 391).
5
Following Brook and Tagliamonte (2023, pp. 31, 33), we also examined the influence of education on individual rates of who, but found no systematic effect in the L2 data.
6
Regression analyses were run separately for each relative marker. As we are dealing in Table 8 (and subsequent tables based on regression analysis) with what is essentially a binary variable, probability values for competing variants within the same speaker cohort are mirror images of each other.
7
Recall that Table 9 excludes L2 speakers who make no use of who. Thus, the percentage values for who for the two proficiency groups in Table 9 differ from those in Table 6, based on the entire L2 speaker cohort.
8
The ‘other’ category in Table 13 includes one instance of dont as an oblique relativizer and one instance where the relative clause is doubly marked by pied-piping and preposition stranding.
9
Semantically ‘weak’ prepositions (e.g., à, de), whose interpretation is much more context dependent than that of their ‘strong’ equivalents (e.g., avec), are more likely to be ‘absorbed’ in Poplack et al.’s (2012) terminology (i.e., to result in null-prep).
10
We are grateful to a reviewer for bringing Van Lieburg et al. (2023) to our attention.
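
The mirror-image relationship described in Note 6 can be spelled out with a short worked derivation. This is a sketch in generic notation: the variant labels A and B, the coefficients β and the weights w are illustrative symbols assumed here, not values or notation drawn from the regression analyses reported in the tables.

```latex
% Illustrative notation only (assumed, not from the reported models):
% A, B      = the two competing variants of a binary variable
% x_j       = predictor (factor) j;  \beta_0, \beta_j = logistic coefficients
% w_j^{(.)} = coefficient j re-expressed on the probability scale
\begin{align*}
\operatorname{logit} P(A \mid x) &= \beta_0 + \sum_j \beta_j x_j, \\
P(B \mid x) &= 1 - P(A \mid x), \\
\operatorname{logit} P(B \mid x) &= -\operatorname{logit} P(A \mid x)
  = -\beta_0 - \sum_j \beta_j x_j, \\
w_j^{(A)} &= \frac{e^{\beta_j}}{1 + e^{\beta_j}}, \qquad
w_j^{(B)} = \frac{e^{-\beta_j}}{1 + e^{-\beta_j}} = 1 - w_j^{(A)}.
\end{align*}
```

Under these assumptions (a strictly binary choice, with effects expressed on the probability scale), the results for one variant determine those for its competitor by subtraction from 1.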

References

  1. Alba de la Fuente, A., & Pato, E. (2019). Cortadora relative clauses: A comparative analysis between Spanish, Portuguese, and French. Isogloss: Open Journal of Romance Linguistics, 5, 1–19.
  2. Alexiadou, A., Law, P., Meinunger, A., & Wilder, C. (2000). The syntax of relative clauses. John Benjamins.
  3. Auderset, S. (2020). Interrogatives as relativization markers in Indo-European. Diachronica, 37(4), 474–513.
  4. Backus, A. (2005). Code-switching and language change: One thing leads to another? International Journal of Bilingualism, 9(3/4), 307–340.
  5. Ball, C. (1996). A diachronic study of relative markers in spoken and written English. Language Variation and Change, 8(2), 227–258.
  6. Bardovi-Harlig, K. (1987). Markedness and salience in second-language acquisition. Language Learning, 37(3), 385–407.
  7. Bayley, R., & Tarone, E. (2012). Variationist perspectives. In S. M. Gass, & A. Mackey (Eds.), The Routledge handbook of second language acquisition (pp. 41–56). Routledge.
  8. Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman grammar of spoken and written English. Pearson Education Limited.
  9. Bley-Vroman, R. (1983). The comparative fallacy in interlanguage studies: The case of systematicity. Language Learning, 33, 1–17.
  10. Blondeau, H., Nagy, N., Sankoff, G., & Thibault, P. (2002). La couleur locale du français L2 des anglo-montréalais. Acquisition et interaction en langue étrangère, 17, 73–100.
  11. Britain, D. (2020). What happened to those relatives from East Anglia? A multilocality analysis of dialect levelling in the relative marker system. In K. V. Beaman, I. Buchstaller, S. Fox, & J. A. Walker (Eds.), Advancing socio-grammatical variation and change (pp. 93–114). Routledge.
  12. Brook, M., & Tagliamonte, S. (2023). Subject relative who in Ontario, Canada: Change from above in a transplanted ecology. Journal of Linguistic Geography, 11, 25–37.
  13. Bybee, J. (2008). Usage-based grammar and second-language acquisition. In P. Robinson, & N. C. Ellis (Eds.), Handbook of cognitive linguistics and second language acquisition (pp. 216–236). Routledge.
  14. Bybee, J. (2010). Language, usage and cognition. Cambridge University Press.
  15. Cheshire, J. (2005). Syntactic variation and spoken language. In L. Cornips, & K. Corrigan (Eds.), Syntax and variation: Reconciling the biological and the social (pp. 81–106). John Benjamins.
  16. Comrie, B. (1998). Rethinking the typology of relative clauses. Language Design, 1, 59–86.
  17. D’Arcy, A., & Tagliamonte, S. (2010). Prestige, accommodation, and the legacy of relative who. Language in Society, 39(3), 383–410.
  18. Deshors, S. C., & Gries, S. (2022). Using corpora in research on second language psycholinguistics. In A. Godfroid, & H. Hopp (Eds.), The Routledge handbook of second language acquisition and psycholinguistics (pp. 164–177). Routledge.
  19. Diessel, H., & Tomasello, M. (2000). The development of relative clauses in spontaneous child speech. Cognitive Linguistics, 11(1/2), 131–151.
  20. Diessel, H., & Tomasello, M. (2005). A new look at the acquisition of relative clauses. Language, 81(4), 882–906.
  21. Doughty, C., & Long, M. (2003). The scope of inquiry and goals of SLA. In C. Doughty, & M. Long (Eds.), The handbook of second language acquisition (pp. 3–16). Wiley-Blackwell.
  22. Duffeler, M. A. C. M. (2017). The comprehension of relative clauses by Romance learners of English: Syntactic and semantic influences [Unpublished doctoral dissertation, Vrije Universiteit]. Available online: https://www.lotpublications.nl/Documents/479_fulltext.pdf (accessed on 2 January 2025).
  23. Ernestus, M., & Baayen, R. H. (2011). Corpora and exemplars in phonology. In J. Goldsmith, J. Riggle, & A. C. L. Yu (Eds.), The handbook of phonological theory (2nd ed., pp. 374–400). Wiley-Blackwell.
  24. Fiorentino, G. (2007). European relative clauses and the uniqueness of the relative pronoun type. Italian Journal of Linguistics, 19(2), 263–291.
  25. Flynn, S., Foley, C., & Vinnitskaya, I. (2004). The cumulative-enhancement model for language acquisition: Comparing adults’ and children’s patterns of development in first, second and third language acquisition of relative clauses. International Journal of Multilingualism, 1(1), 3–16.
  26. Fox, B., & Thompson, S. A. (2007). Relative clauses in English conversation. Studies in Language, 31(2), 293–326.
  27. Gadet, F. (1995). Les relatives non standard en français parlé, le système et l’usage. Études Romanes, 34, 141–162.
  28. Gass, S. (1979). Language transfer and universal grammatical relations. Language Learning, 29(2), 327–344.
  29. Geeslin, K., & Long, A. Y. (2014). Sociolinguistics and second language acquisition: Learning to use language in context. Routledge.
  30. Ghafar Samar, R. (2000). Aspects of second language speech: A variationist perspective on second language acquisition [Unpublished doctoral dissertation, University of Ottawa].
  31. Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1), 1–76.
  32. Gibson, E., & Wu, H.-H. I. (2013). Processing Chinese relative clauses in context. Language and Cognitive Processes, 28(1–2), 125–155.
  33. Gisborne, N. (2024, December). Contact as an explanation of the spread of wh-relatives. In Fourth AMC Symposium: Contact and language change. University of Edinburgh.
  34. Guasti, M. T., & Shlonsky, U. (1995). The acquisition of French relative clauses reconsidered. Language Acquisition, 4(4), 257–276. Available online: http://www.jstor.org/stable/20011426 (accessed on 2 January 2025).
  35. Guy, G. R., & Bayley, R. (1995). On the choice of relative pronouns in English. American Speech, 70(2), 148–162.
  36. Hawkins, R. (1989). Do second language learners acquire restrictive relative clauses on the basis of relational or configurational information? The acquisition of French subject, direct object and genitive restrictive relative clauses by second language learners. Second Language Research, 5(2), 156–188.
  37. Hermann, T. (2003). Relative clauses in dialects of English: A typological approach [Unpublished doctoral dissertation, University of Freiburg].
  38. Hoffmann, T. (2005). Variable vs. categorical effects: Preposition pied-piping and stranding in British English relative clauses. Journal of English Linguistics, 33(3), 257–297.
  39. Howard, M., Mougeon, R., & Dewaele, J. M. (2013). Sociolinguistics and second language acquisition. In R. Bayley, R. Cameron, & C. Lucas (Eds.), The Oxford handbook of sociolinguistics (pp. 340–359). Oxford University Press.
  40. Huddleston, R., & Pullum, G. K. (2002). The Cambridge grammar of the English language. Cambridge University Press.
  41. Jaeger, F. T. (2010). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1), 23–62.
  42. Johnson, D. E. (2009). Getting off the Goldvarb standard: Introducing Rbrul for mixed-effects variable rule analysis. Language and Linguistics Compass, 3, 359–383.
  43. Jourdain, S. (1996). The case of null-prep in the interlanguage of adult learners of French [Unpublished doctoral dissertation, Indiana University].
  44. Keenan, E., & Comrie, B. (1977). Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8, 63–99. Available online: https://www.jstor.org/stable/4177973 (accessed on 2 January 2025).
  45. Klein, E. (1993). Toward second language acquisition: A study of null-prep. Kluwer.
  46. Koopman, H., & Sportiche, D. (2014). The que/qui alternation: New analytical directions. In P. Svenonius (Ed.), Functional structure from top to toe: The cartography of syntactic structures (Vol. 9, pp. 46–96). Oxford University Press.
  47. Labelle, M. (1990). WH-movement, and the development of relative clauses. Language Acquisition, 1(1), 95–119. Available online: http://www.jstor.org/stable/20011343 (accessed on 2 January 2025).
  48. Labov, W. (1972a). Language in the inner city. University of Pennsylvania Press.
  49. Labov, W. (1972b). Some principles of linguistic methodology. Language in Society, 1(1), 97–120.
  50. Labov, W. (1984). Field methods of the project on linguistic change and variation. In J. Baugh, & J. Sherzer (Eds.), Language in use: Readings in sociolinguistics (pp. 28–54). Prentice-Hall.
  51. Labov, W. (2007). Transmission and diffusion. Language, 83(2), 344–387.
  52. Levey, S. (2014). A comparative variationist perspective on relative clauses in child and adult speech. In R. Torres Cacoullos, N. Dion, & A. Lapierre (Eds.), Linguistic variation: Confronting fact and theory (pp. 22–37). Routledge.
  53. Levey, S. (2024). Standard and non-standard English. In S. Fox (Ed.), Language in Britain and Ireland (pp. 48–69). Cambridge University Press.
  54. Levey, S., & Hill, C. (2013). Social and linguistic constraints on relativizer omission in Canadian English. American Speech, 88(1), 32–62.
  55. Lieven, E. (2010). Input and first language acquisition: Evaluating the role of frequency. Lingua, 120, 2546–2556.
  56. Loebell, H., & Bock, K. (2003). Structural priming across languages. Linguistics, 41(5), 791–824.
  57. Macdonald, M. C. (2015). The emergence of language comprehension. In B. MacWhinney, & W. O’Grady (Eds.), The handbook of language emergence (pp. 81–99). Wiley-Blackwell.
  58. Mackenzie, I. (2018). The case of special qui. Journal of French Language Studies, 28(1), 21–41.
  59. McDaniel, D., McKee, C., & Bernstein, J. (1998). How children’s relatives solve a problem for minimalism. Language, 74(2), 308–334.
  60. Mellow, J. D. (2006). The emergence of second language syntax: A case study of the acquisition of relative clauses. Applied Linguistics, 27(4), 645–670.
  61. Meyerhoff, M., Birchfield, A., Ballard, E., Watson, C., & Charters, H. (2020). Restrictions on relative clauses in Auckland, New Zealand. In K. V. Beaman, I. Buchstaller, S. Fox, & J. A. Walker (Eds.), Advancing socio-grammatical variation and change: In honour of Jenny Cheshire (pp. 115–133). Routledge.
  62. Meyerhoff, M., & Schleef, E. (2012). Variation, contact and social indexicality in the acquisition of (ing) by teenage migrants. Journal of Sociolinguistics, 16(3), 398–416.
  63. Milroy, J. (2001). Language ideologies and the consequences of standardization. Journal of Sociolinguistics, 5(4), 530–555.
  64. Milroy, L., & Gordon, M. (2003). Sociolinguistics: Models and methods. Wiley-Blackwell.
  65. Montrul, S. (2020). How learning context shapes heritage and second language acquisition. In M. Dressman, & R. W. Sadler (Eds.), The handbook of informal language learning (pp. 57–74). Wiley.
  66. Muysken, P. (2012). Another icon of language contact shattered. Bilingualism: Language and Cognition, 15(2), 237–239.
  67. Nagy, N., Blondeau, H., & Auger, J. (2003). Second language acquisition and ‘real’ French: An investigation of subject doubling in the French of Montreal Anglophones. Language Variation and Change, 15(1), 73–103.
  68. Perpiñán, S., & Cardinaletti, A. (2024). Null-prep as a systematic interlanguage phenomenon: Evidence from relative clauses, interrogatives, and sluicing constructions. Second Language Research, 40(1), 139–169.
  69. Poplack, S. (1989). The care and handling of a mega-corpus. In R. W. Fasold, & D. Schiffrin (Eds.), Language change and variation (pp. 411–451). John Benjamins.
  70. Poplack, S. (2011). Grammaticalization and linguistic variation. In B. Heine, & H. Narrog (Eds.), The handbook of grammaticalization (pp. 209–224). Oxford University Press.
  71. Poplack, S. (2018a). Borrowing: Loanwords in the speech community and in the grammar. Oxford University Press.
  72. Poplack, S. (2018b). Categories of grammar and categories of speech: When the quest for symmetry meets inherent variability. In N. Shin, & D. Erker (Eds.), Questioning theoretical primitives in linguistic inquiry: Papers in honor of Ricardo Otheguy (pp. 7–34). John Benjamins Publishing Company.
  73. Poplack, S., & Meechan, M. (1998). How languages fit together in code-mixing. International Journal of Bilingualism, 2(2), 127–138.
  74. Poplack, S., & Tagliamonte, S. (2001). African American English in the diaspora. Blackwell.
  75. Poplack, S., Zentz, L., & Dion, N. (2012). Phrase-final prepositions in Quebec French: An empirical study of contact, code-switching and resistance to convergence. Bilingualism: Language and Cognition, 15(2), 203–225.
  76. Radford, A. (2019). Relative clauses: Structure and variation in everyday English. Cambridge University Press.
  77. Rehner, K., & Mougeon, R. (2022). Variationist methods of analysis. In K. Geeslin (Ed.), Handbook of second language acquisition and sociolinguistics (pp. 200–211). Routledge.
  78. Roberge, Y., & Rosen, N. (1999). Preposition stranding and que-deletion in varieties of North American French. Linguistica Atlantica, 21, 153–168. Available online: https://journals.lib.unb.ca/index.php/la/article/view/22461 (accessed on 2 January 2025).
  79. Rochon, K. L. (2023). Grammar THAT varies for speakers WHO are proficient: Relative clauses in second-language speech [Unpublished master’s dissertation, University of Ottawa].
  80. Rohdenburg, G. (1996). Cognitive complexity and increased grammatical explicitness in English. Cognitive Linguistics, 7(2), 149–182.
  81. Roland, D., Dick, F., & Elman, J. L. (2007). Frequency of basic English grammatical structures: A corpus analysis. Journal of Memory and Language, 57, 348–379.
  82. Romaine, S. (1982). Socio-historical linguistics: Its status and methodology. Cambridge University Press.
  83. Romaine, S. (1984). The language of children and adolescents: The acquisition of communicative competence. Blackwell.
  84. Sankoff, G., Thibault, P., Nagy, N., Blondeau, H., Fonollosa, M.-O., & Gagnon, L. (1997). Variation in the use of discourse markers in a language contact situation. Language Variation and Change, 9(2), 191–217.
  85. Schafroth, E. (1995). À propos d’une typologie panromane des relatifs ‘non normatifs’. In C. Bougy, P. Boissel, & B. Garnier (Eds.), Mélanges René Lepelley, recueil d’études en hommage au Professeur René Lepelley (pp. 363–374). Musée de Normandie.
  86. Schleef, E. (2017). Developmental sociolinguistics and the acquisition of T-glottalling by immigrant teenagers in London. In G. de Vogelaer, & M. Katerbow (Eds.), Acquiring sociolinguistic variation (pp. 311–347). John Benjamins.
  87. Schleef, E., Meyerhoff, M., & Clark, L. (2011). Teenagers’ acquisition of variation: A comparison of locally-born and migrant teens’ realisation of English (ing) in Edinburgh and London. English World-Wide, 32(2), 206–236.
  88. Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics, 10, 209–231.
  89. Shirai, Y., & Ozeki, H. (2007). Introduction. Studies in Second Language Acquisition, 29(2), 155–167.
  90. Speed, L., Wnuk, E., & Majid, A. (2018). Studying psycholinguistics out of the lab. In A. M. B. de Groot, & P. Hagoort (Eds.), Research methods in psycholinguistics and the neurobiology of language: A practical guide (pp. 90–207). John Wiley.
  91. Stark, E. (2016). Relative clauses. In A. Ledgeway, & M. Maiden (Eds.), The Oxford guide to the Romance languages (pp. 1029–1040). Oxford University Press.
  92. Statistics Canada. (2021). Census of population: Ottawa-Gatineau census metropolitan area. Available online: https://www12.statcan.gc.ca/census-recensement/2021/as-sa/fogs-spg/alternative.cfm?topic=6&lang=E&dguid=2021S0503505&objectId=5 (accessed on 2 January 2025).
  93. Tagliamonte, S. (2002). Variation and change in the British relative marker system. In P. Poussa (Ed.), Relativisation on the North Sea littoral (pp. 147–165). Lincom Europa.
  94. Tagliamonte, S., Smith, J., & Lawrence, H. (2005). No taming the vernacular! Insights from the relatives in northern Britain. Language Variation and Change, 17(1), 75–112.
  95. Tarallo, F. (1983). Relativization strategies in Brazilian Portuguese [Unpublished doctoral dissertation, University of Pennsylvania].
  96. Thomason, S. G. (2001). Language contact: An introduction. Edinburgh University Press.
  97. Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Harvard University Press.
  98. Torres Cacoullos, R., & Travis, C. (2018). Bilingualism in the community: Code-switching and grammars in contact. Cambridge University Press.
  99. Tottie, G., & Harvie, D. (2000). It’s all relative: Relativization strategies in early African American English. In S. Poplack (Ed.), The English history of African American English (pp. 198–230). Wiley-Blackwell.
  100. Van Lieburg, R., Hartsuiker, R., & Bernolet, S. (2023). The production preferences and priming effects of Dutch passives in Arabic/Berber-Dutch and Turkish-Dutch heritage speakers. Bilingualism: Language and Cognition, 26, 695–708. [Google Scholar] [CrossRef]
  101. Weinreich, U., Labov, W., & Herzog, M. (1968). Empirical foundations for a theory of language change. In W. P. Lehmann, & Y. Malkiel (Eds.), Directions for historical linguistics (pp. 97–195). University of Texas. [Google Scholar]
  102. White, L. (2003). Second language acquisition and universal grammar. Cambridge University Press. [Google Scholar] [CrossRef]
  103. Wiechmann, D. (2015). Understanding relative clauses: A usage-based view on the processing of complex constructions. De Gruyter Mouton. [Google Scholar] [CrossRef]
  104. Yip, V., & Matthews, S. (2007). Relative clauses in Cantonese-English bilingual children: Typological challenges and processing motivations. Studies in Second Language Acquisition, 29(2), 277–300. [Google Scholar] [CrossRef]
Table 1. Distribution of L2 speakers by age and sex.

| | 19–33 Years | 50–77 Years | Total N |
|---|---|---|---|
| Females | 10 | 7 | 17 |
| Males | 12 | 0 | 12 |
| Total | 22 | 7 | 29 |

Table 2. Cumulative English Proficiency Index (score ranges shown in parentheses).

| Proficiency level (CEPI score range) | No. of Speakers | % of Sample |
|---|---|---|
| Low (0.450–0.494) | 3 | 10 |
| Mid-low (0.575–0.694) | 9 | 31 |
| Mid-high (0.700–0.788) | 11 | 38 |
| High (0.800–0.863) | 6 | 21 |
| Total | 29 | |

Table 3. Distribution of relative clauses according to the syntactic role of the relativized NP.

| Syntactic Position of Relativized NP | L2 Speakers N | L2 Speakers % | TL Speakers N | TL Speakers % |
|---|---|---|---|---|
| Subject | 448 | 55% | 444 | 53% |
| Object | 274 | 34% | 284 | 34% |
| Oblique | 85 | 10.5% | 114 | 13.5% |
| Genitive | 2 | 0.2% | 0 | 0% |
| Total | 809 | | 842 | |

Table 4. Overall distribution of relative clause marking strategies in the L2 and TL corpora.

| Variant | L2 Speakers N | L2 Speakers % | TL Speakers N | TL Speakers % |
|---|---|---|---|---|
| That | 552 | 68% | 498 | 59% |
| Zero | 191 | 24% | 185 | 22% |
| Who | 63 | 8% | 157 | 19% |
| Which | 1 | 0.1% | 2 | 0.2% |
| Whose | 2 | 0.2% | 0 | 0% |
| Total | 809 | | 842 | |

Table 5. Distribution of marking strategies in subject and non-subject relative clauses in the L2 and TL corpora.
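
| Variant | L2 Subject Relatives N | L2 Subject Relatives % | L2 Non-Subject Relatives N | L2 Non-Subject Relatives % | TL Subject Relatives N | TL Subject Relatives % | TL Non-Subject Relatives N | TL Non-Subject Relatives % |
|---|---|---|---|---|---|---|---|---|
| That | 369 | 82% | 183 | 51% | 285 | 64% | 213 | 53.5% |
| Zero | 15 | 3% | 176 | 49% | 5 | 1% | 180 | 45% |
| Who | 63 | 14% | 0 | 0% | 153 | 34.5% | 4 | 1% |
| Whose | 0 | 0% | 2 | 0.6% | 0 | 0% | 0 | 0% |
| Which | 1 | 0.2% | 0 | 0% | 1 | 0.2% | 1 | 0.3% |
| Total | 448 | | 361 | | 444 | | 398 | |
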
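
The gap in subject-relative who between the two groups in Table 5 (63 of 448 L2 tokens versus 153 of 444 TL tokens) can be checked with a simple test of independence. The sketch below is illustrative only: it applies a Pearson chi-square test to the published counts, which is not necessarily the test the authors used, and it treats tokens as independent even though each speaker contributes several. The function name `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts taken from Table 5, subject relatives only: who vs. all other markers.
l2_who, l2_total = 63, 448
tl_who, tl_total = 153, 444

chi2 = pearson_chi2_2x2(l2_who, l2_total - l2_who, tl_who, tl_total - tl_who)

# Upper-tail probability of a chi-square variable with 1 degree of freedom:
# P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"who rate, L2: {l2_who / l2_total:.1%}; TL: {tl_who / tl_total:.1%}")
print(f"chi-square (1 df) = {chi2:.2f}, p = {p_value:.2e}")
```

Because the tokens are clustered by speaker, a result from this sketch should be read as a rough plausibility check on the reported difference rather than as a replacement for the regression analyses in Tables 8 and 9.
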
Table 6. Distribution of marking strategies in subject and non-subject relative clauses in the L2 corpus according to L2 proficiency.
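
Column groups: Low/Mid-Low Proficiency (CEPI range = 0.450–0.694) and Mid-High to High Proficiency (CEPI range = 0.700–0.863).

| Variant | Low/Mid-Low: Subject Relatives N | Low/Mid-Low: Subject Relatives % | Low/Mid-Low: Non-Subject Relatives N | Low/Mid-Low: Non-Subject Relatives % | Mid-High to High: Subject Relatives N | Mid-High to High: Subject Relatives % | Mid-High to High: Non-Subject Relatives N | Mid-High to High: Non-Subject Relatives % |
|---|---|---|---|---|---|---|---|---|
| That | 144 | 83% | 72 | 49% | 225 | 82% | 111 | 52% |
| Zero | 5 | 3% | 74 | 51% | 10 | 4% | 102 | 47% |
| Who | 24 | 14% | 0 | 0% | 39 | 14% | 0 | 0% |
| Whose | 0 | 0% | 0 | 0% | 0 | 0% | 2 | 1% |
| Which | 1 | 1% | 0 | 0% | 0 | 0% | 0 | 0% |
| Total | 174 | | 146 | | 274 | | 215 | |
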
Table 7. Distribution of relative markers according to the animacy of the antecedent NP in the L2 and TL datasets.
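
| Variant | L2 Human Antecedent N | L2 Human Antecedent % | L2 Inanimate Antecedent N | L2 Inanimate Antecedent % | TL Human Antecedent N | TL Human Antecedent % | TL Inanimate Antecedent N | TL Inanimate Antecedent % |
|---|---|---|---|---|---|---|---|---|
| That | 220 | 74% | 139 | 99% | 123 | 44% | 159 | 99% |
| Zero | 14 | 5% | 1 | 1% | 5 | 2% | 0 | 0% |
| Who | 63 | 21% | 0 | 0% | 152 | 54% | 0 | 0% |
| Which | 0 | 0% | 1 | 1% | 0 | 0% | 1 | 1% |
| Total | 297 | | 141 | | 280 | | 160 | |
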
Table 8. Rbrul of the contribution of independent predictors to the selection of that and who in subject relative clauses in the TL Corpus of English (Notes: N/A = not applicable; values of 0 or 100 in the % columns indicate invariant contexts)6.

| Model summary | THAT | WHO |
|---|---|---|
| Input probability | 0.606 | 0.394 |
| Log likelihood | −257.744 | −257.744 |
| R² total | 0.196 | 0.196 |
| N | 267/419 | 152/419 |

| Predictor | THAT LO | THAT FW | THAT % | THAT N | WHO LO | WHO FW | WHO % | WHO N |
|---|---|---|---|---|---|---|---|---|
| Matrix construction | | | | | | | | |
| copula | 0.325 | 0.581 | 68 | 162 | −0.325 | 0.419 | 32 | 162 |
| lone head NP | 0.178 | 0.544 | 67 | 57 | −0.178 | 0.456 | 33 | 57 |
| other | 0.093 | 0.523 | 63 | 151 | −0.093 | 0.477 | 37 | 151 |
| stative possessive | −0.597 | 0.355 | 49 | 49 | 0.597 | 0.645 | 51 | 49 |
| Adjacency | | | | | | | | |
| adjacent | 0.214 | 0.553 | 65 | 352 | −0.214 | 0.447 | 35 | 352 |
| non-adjacent | −0.214 | 0.447 | 58 | 67 | 0.214 | 0.553 | 42 | 67 |
| Type of antecedent NP | | | | | | | | |
| pronoun | 0.175 | 0.544 | 69 | 62 | −0.175 | 0.456 | 31 | 62 |
| definite NP | 0.054 | 0.514 | 71 | 117 | −0.054 | 0.486 | 29 | 117 |
| indefinite NP | −0.299 | 0.443 | 59 | 240 | 0.229 | 0.557 | 41 | 240 |
| Animacy of antecedent | | | | | | | | |
| human | N/A | N/A | 43 | 265 | N/A | N/A | 57 | 265 |
| inanimate | N/A | N/A | 100 | 154 | N/A | N/A | 0 | 154 |
| Length of relative clause | | | | | | | | |
| 1–3 words | −0.032 | 0.492 | 64 | 135 | 0.032 | 0.508 | 36 | 135 |
| 4+ words | 0.032 | 0.508 | 63 | 284 | −0.032 | 0.492 | 37 | 284 |

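
The LO (log odds) and FW (factor weight) columns in Table 8 carry the same information on two scales. The tabulated pairs are consistent with the factor weight being the inverse logit of the log odds, which is how Rbrul reports centred factor weights. The sketch below reproduces the FW column for the matrix-construction predictor from its LO values; the function name `factor_weight` and the dictionary are illustrative, not part of the original analysis.

```python
import math


def factor_weight(log_odds):
    """Inverse logit: map a log-odds coefficient onto the 0-1 factor-weight scale."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Log odds for THAT by matrix construction, copied from Table 8 (TL corpus).
table8_that_lo = {
    "copula": 0.325,
    "lone head NP": 0.178,
    "other": 0.093,
    "stative possessive": -0.597,
}

weights = {level: factor_weight(lo) for level, lo in table8_that_lo.items()}

for level, fw in weights.items():
    print(f"{level:20s} LO = {table8_that_lo[level]:+.3f}  FW = {fw:.3f}")

# The range (largest minus smallest factor weight) is a common rough gauge
# of how strongly a predictor separates its levels.
fw_range = max(weights.values()) - min(weights.values())
print(f"range = {fw_range:.3f}")
```

A factor weight above 0.5 favours the variant in question and one below 0.5 disfavours it, so the WHO columns are simply the mirror image of the THAT columns (log odds with the sign reversed).
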
Table 9. Rbrul of the contribution of independent predictors to the selection of that and who in subject relative clauses in the L2 Corpus of English (Notes: (i) N/A = not applicable; (ii) values of 0 or 100 in the % columns indicate invariant contexts; (iii) bolded numbers indicate re-ordering of individual constraints vis-à-vis those in the TL baseline).

| Model summary | THAT | WHO |
|---|---|---|
| Input probability | 0.785 | 0.215 |
| Log likelihood | −142.499 | −142.499 |
| R² total | 0.238 | 0.238 |
| N | 242/305 | 63/305 |

| Predictor | THAT LO | THAT FW | THAT % | THAT N | WHO LO | WHO FW | WHO % | WHO N |
|---|---|---|---|---|---|---|---|---|
| Matrix construction | | | | | | | | |
| copula | 0.356 | 0.588 | 83 | 126 | −0.356 | 0.412 | 17 | 126 |
| other | 0.273 | 0.568 | 80 | 105 | −0.273 | 0.432 | 20 | 105 |
| lone head NP | −0.031 | 0.492 | 76 | 29 | 0.031 | 0.508 | 24 | 29 |
| stative possessive | −0.598 | 0.355 | 69 | 45 | 0.598 | 0.645 | 31 | 45 |
| Adjacency | | | | | | | | |
| adjacent | −0.243 | 0.440 | 78 | 246 | 0.243 | 0.560 | 22 | 246 |
| non-adjacent | 0.243 | 0.560 | 86 | 59 | −0.243 | 0.440 | 14 | 59 |
| Type of antecedent NP | | | | | | | | |
| indefinite noun | 0.411 | 0.601 | 82 | 173 | −0.411 | 0.399 | 18 | 173 |
| definite noun | 0.192 | 0.548 | 81 | 78 | −0.192 | 0.452 | 19 | 78 |
| pronoun | −0.602 | 0.354 | 69 | 54 | 0.603 | 0.646 | 32 | 54 |
| Animacy of antecedent | | | | | | | | |
| human | N/A | N/A | 70 | 210 | N/A | N/A | 30 | 210 |
| inanimate | N/A | N/A | 100 | 95 | N/A | N/A | 0 | 95 |
| Length of relative clause | | | | | | | | |
| 1–3 words | 0.094 | 0.524 | 81 | 101 | −0.094 | 0.476 | 19 | 101 |
| 4+ words | −0.094 | 0.476 | 78 | 204 | 0.094 | 0.524 | 22 | 204 |
| Proficiency | | | | | | | | |
| mid-high to high | 0.132 | 0.533 | 81 | 208 | −0.131 | 0.467 | 19 | 208 |
| low to mid-low | −0.132 | 0.467 | 75 | 97 | 0.131 | 0.533 | 25 | 97 |

Table 10. Rbrul of the contribution of independent predictors to the selection of that and zero in non-subject relative clauses in the TL Corpus of English (Note: grey shading indicates predictors selected as statistically significant).

| Model summary | THAT | ZERO |
|---|---|---|
| Input probability | 0.919 | 0.081 |
| Log likelihood | −237.114 | −237.114 |
| R² total | 0.223 | 0.223 |
| N | 207/375 | 168/375 |

| Predictor | THAT LO | THAT FW | THAT % | THAT N | ZERO LO | ZERO FW | ZERO % | ZERO N |
|---|---|---|---|---|---|---|---|---|
| Subject of rel. clause (p < 0.0022) | | | | | | | | |
| noun | 1.247 | 0.777 | 93 | 15 | −1.247 | 0.223 | 7 | 15 |
| pronoun | −1.247 | 0.223 | 54 | 360 | 1.247 | 0.777 | 46 | 360 |
| Matrix construction (p < 0.0494) | | | | | | | | |
| stative possessive | 0.552 | 0.635 | 71 | 21 | −0.552 | 0.365 | 29 | 21 |
| lone head NP | 0.082 | 0.520 | 64 | 56 | −0.082 | 0.480 | 36 | 56 |
| other | −0.082 | 0.480 | 59 | 123 | 0.082 | 0.520 | 42 | 103 |
| copula | −0.552 | 0.365 | 48 | 175 | 0.552 | 0.635 | 52 | 175 |
| Adjacency (p < 0.0011) | | | | | | | | |
| adjacent | −0.849 | 0.300 | 53 | 349 | 0.849 | 0.700 | 47 | 349 |
| non-adjacent | 0.849 | 0.700 | 85 | 26 | −0.849 | 0.300 | 15 | 26 |
| Type of antecedent NP | | | | | | | | |
| indefinite NP | 0.317 | 0.579 | 64 | 129 | −0.317 | 0.421 | 36 | 129 |
| pronoun | −0.122 | 0.470 | 51 | 57 | 0.122 | 0.530 | 49 | 57 |
| definite NP | −0.195 | 0.451 | 50 | 189 | 0.195 | 0.549 | 50 | 189 |
| Animacy of antecedent | | | | | | | | |
| human | 0.223 | 0.556 | 62 | 82 | −0.223 | 0.444 | 38 | 82 |
| inanimate | −0.233 | 0.444 | 53 | 293 | 0.233 | 0.556 | 47 | 293 |
| Length of relative clause | | | | | | | | |
| 2–4 words | −0.098 | 0.475 | 52 | 233 | 0.098 | 0.524 | 49 | 233 |
| 5+ words | 0.098 | 0.525 | 61 | 142 | −0.098 | 0.476 | 39 | 142 |

Table 11. Rbrul of the contribution of independent predictors to the selection of that and zero in non-subject relative clauses in the L2 Corpus of English (Notes: (i) grey shading indicates predictors selected as statistically significant; (ii) bolded numbers indicate re-ordering of individual constraints vis-à-vis those operative in the TL baseline).

| Model summary | THAT | ZERO |
|---|---|---|
| Input probability | 0.841 | 0.159 |
| Log likelihood | −204.591 | −204.591 |
| R² total | 0.352 | 0.352 |
| N | 178/349 | 171/349 |

| Predictor | THAT LO | THAT FW | THAT % | THAT N | ZERO LO | ZERO FW | ZERO % | ZERO N |
|---|---|---|---|---|---|---|---|---|
| Subject of relative clause (p < 0.00457) | | | | | | | | |
| noun | 0.972 | 0.726 | 79 | 14 | −0.973 | 0.274 | 21 | 14 |
| pronoun | −0.972 | 0.274 | 50 | 335 | 0.973 | 0.726 | 50 | 335 |
| Matrix construction (p < 0.0307) | | | | | | | | |
| stative possessive | 0.784 | 0.687 | 72 | 29 | −0.785 | 0.313 | 28 | 29 |
| other | 0.193 | 0.548 | 54 | 124 | −0.193 | 0.452 | 46 | 124 |
| copula | −0.198 | 0.451 | 49 | 150 | 0.198 | 0.549 | 51 | 150 |
| lone head NP | −0.780 | 0.314 | 37 | 46 | 0.780 | 0.686 | 63 | 46 |
| Adjacency (p < 4.18 × 10⁻⁵) | | | | | | | | |
| adjacent | −1.083 | 0.253 | 47 | 317 | 1.083 | 0.747 | 53 | 317 |
| non-adjacent | 1.083 | 0.747 | 88 | 32 | −1.083 | 0.253 | 13 | 32 |
| Type of antecedent NP (p < 3 × 10⁻⁴) | | | | | | | | |
| indefinite NP | 0.777 | 0.685 | 65 | 137 | −0.777 | 0.315 | 35 | 137 |
| definite NP | −0.068 | 0.483 | 42 | 166 | 0.068 | 0.517 | 58 | 166 |
| pronoun | −0.709 | 0.330 | 44 | 46 | 0.709 | 0.670 | 57 | 46 |
| Animacy of antecedent | | | | | | | | |
| human | 0.150 | 0.537 | 54 | 67 | −0.150 | 0.463 | 46 | 67 |
| inanimate | −0.150 | 0.463 | 50 | 282 | 0.150 | 0.537 | 50 | 282 |
| Length of relative clause | | | | | | | | |
| 2–4 words | −0.198 | 0.451 | 45 | 206 | 0.198 | 0.549 | 55 | 206 |
| 5+ words | 0.198 | 0.549 | 60 | 143 | −0.198 | 0.451 | 40 | 143 |
| Proficiency | | | | | | | | |
| mid-high to high | 0.032 | 0.508 | 52 | 205 | −0.032 | 0.492 | 48 | 205 |
| low to mid-low | −0.032 | 0.492 | 50 | 144 | 0.032 | 0.508 | 50 | 144 |

Table 12. Distribution of restrictive relative markers in the L1 French corpus.

| Variants | N | % |
|---|---|---|
| Qui | 482 | 66% |
| Qu(e) | 234 | 32% |
| Zero | 10 | 1.4% |
| Lequel(s)/laquelle(s) | 7 | 1% |
| Dont | 1 | 0.1% |
| Total | 734 | |

Table 13. Distribution of preposition placement strategies in oblique relative clauses in the L1 Corpus.

| Variants | N | % |
|---|---|---|
| Null-preposition | 30 | 57% |
| Pied-piping | 12 | 23% |
| Stranding | 9 | 17% |
| Other | 2 | 4% |
| Total | 53 | |

Table 14. Distribution of preposition placement strategies in oblique relative clauses in the L2 and TL datasets.

| Strategy | L2 Speakers N | L2 Speakers % | TL Speakers N | TL Speakers % |
|---|---|---|---|---|
| Preposition stranding | 75 | 88% | 111 | 98% |
| Null-preposition | 10 | 12% | 2 | 2% |
| Total | 85 | | 113 | |

Table 15. Distribution of stranded prepositions according to lexical identity in L2 and TL speech (Note: grey shading indicates shared lexical forms that account for more than 5% of the data in each variety).

| Preposition | L2 Speakers N | L2 Speakers % | TL Speakers N | TL Speakers % |
|---|---|---|---|---|
| with | 23 | 31% | 43 | 39% |
| in | 12 | 16% | 16 | 14% |
| on | 9 | 12% | 9 | 8% |
| to | 9 | 12% | 14 | 13% |
| of | 7 | 9% | 7 | 6% |
| at | 5 | 7% | 4 | 4% |
| about | 3 | 4% | 6 | 5% |
| for | 3 | 4% | 5 | 5% |
| into | 2 | 3% | 3 | 3% |
| from | 1 | 1% | 0 | 0% |
| off of | 1 | 1% | 0 | 0% |
| behind | 0 | 0% | 1 | 1% |
| by | 0 | 0% | 1 | 1% |
| past | 0 | 0% | 1 | 1% |
| through | 0 | 0% | 1 | 1% |
| Total | 75 | | 111 | |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Levey, S.; Rochon, K.L.; Kastronic, L. Language Learning in the Wild: The L2 Acquisition of English Restrictive Relative Clauses. Languages 2025, 10, 232. https://doi.org/10.3390/languages10090232
