1. Introduction
Relative clause constructions have attracted sustained scholarly interest for over half a century (Roland et al., 2007), addressing a broad range of questions in first language acquisition (e.g., Diessel & Tomasello, 2000, 2005); second and third language acquisition (e.g., Flynn et al., 2004); adult sentence processing (e.g., Gibson, 1998); language typology (e.g., Keenan & Comrie, 1977); as well as grammatical theory (e.g., Alexiadou et al., 2000).¹ Much research and theorizing on the acquisition of these constructions is articulated from a cognitive perspective (Doughty & Long, 2003), buttressed by an experimental infrastructure involving grammaticality judgements, sentence-combination tasks, and elicitation and act-out tasks, among a host of other techniques.
In contrast with experimental research on relativization, the bulk of which has targeted perceptual issues relating to processing and decoding strategies (Romaine, 1984; Macdonald, 2015), corpus-based investigations of relative clauses in natural speech production data are much less common (but see, e.g., Diessel & Tomasello, 2000; Ghafar Samar, 2000; Yip & Matthews, 2007). The fact that relative clauses are rare in running discourse (Milroy & Gordon, 2003) may explain the predilection for data elicited and analyzed under experimental conditions, although the extent to which findings generated in tightly controlled laboratory settings accurately reflect what transpires in real-world contexts remains a moot point (Jaeger, 2010; Speed et al., 2018). These concerns are symptomatic of the very issues that inspired the current research, in keeping with our goal of contributing to socially sensitive and ecologically valid models of L2 acquisition. We argue that the sociolinguistic investigation of everyday speech is a necessary complement to experimental research on L2 acquisition, as it can refine and subtly enhance our understanding of how relative clauses are acquired, precisely because such an approach engages with, rather than abstracts away from, the inherent variability of natural speech situated in its social context (Labov, 1972a). Because this variability provides critical insights into the nature, extent, and limits of the L2 acquisition process, characterizing it correctly is of paramount importance in constructing theories of L2 acquisition that are accountable to actual usage facts.
Major incentives for corpus-based studies of relativization have come from constructivist and usage-based investigations of language acquisition (see, e.g., Diessel & Tomasello, 2000, 2005; Fox & Thompson, 2007; Wiechmann, 2015). These studies have catalyzed interest in arriving at a better understanding of the relative clause constructions that language learners encounter and acquire through exposure to spontaneous speech. If, as usage-based theories posit, grammatical knowledge is predicated on speakers’ linguistic experience (Bybee, 2010), then the community-based varieties of the target language (TL) to which L2 speakers are exposed can afford insights into the structural biases in the TL input that intimately shape the acquisition process. Both naturalistic and experimental research on L1 acquisition has been instrumental in highlighting the effect of input frequencies on the development of relative clause constructions in child language (Lieven, 2010). As we show below, relative clause constructions in L2 acquisition are impacted by TL input frequency patterns too (see also Mellow, 2006).
Among the key motivations for applying corpus-based approaches to informal social settings is the need to extend the purview of L2 research beyond formal language-learning environments to less formal ones (Bayley & Tarone, 2012), including naturalistic contexts, where the amount, frequency, and type of input that learners encounter is said to be far less restricted than in classroom settings (Montrul, 2020).
Sankoff et al. (1997, p. 193) argue that if L2 acquisition is more than the product of successful classroom-based learning, then it should exhibit properties of the TL vernacular which are not ordinarily transmitted to learners in academic settings but are internalized by L2 speakers who have a high degree of contact with the TL community. The quantitative approach employed in the present study is ideally suited to elucidating vernacular patterns in L2 speech and to ascertaining whether they are the product of vernacular transmission from the TL, the result of processes unique to L2 speech (i.e., interlanguage grammar; see Meyerhoff & Schleef, 2012), or the outcome of transfer effects from L2 speakers’ native language.
Of the different kinds of relative clause that have attracted scholarly attention, the ones we privilege here are restrictive relative clauses. Previous treatments have sought to determine whether the L2 acquisition of these constructions is subject to cross-linguistic influence (e.g., Gass, 1979; Ghafar Samar, 2000; Rochon, 2023), and whether the L2 acquisition of English relative clauses is sensitive to the gradience of difficulty associated with the noun phrase accessibility hierarchy originally posited by Keenan and Comrie (1977). With the exception of Ghafar Samar (2000) and Rochon (2023), however, dedicated variationist investigations of the L2 acquisition of restrictive relativization based on natural production data are all but non-existent. One of the goals of the current study is to address this lacuna.
Following Huddleston and Pullum (2002, p. 1035), we construe restrictive relative clauses as those which delimit the denotational reference of the head nominal they modify. These constructions are exemplified in (1)–(3) below, taken from the discourse of L2 speakers of Canadian English:
- (1) There- there’s one guy that literally laughed at me the first time he saw me (L2/002/428)
- (2) It was literally the best decision Ø I’ve made (L2/004/33)
- (3) And I’m a person who likes to sleep in the morning (L2/013/324)²
We target the variable strategies for marking restrictive relative clauses, which alternate between that, zero, and WH-forms (see (1)–(3) above), drawing on vernacular speech. This type of speech is deemed to be the style “which is most regular in its structure” (Labov, 1972a, p. 112), offering “the most systematic data for linguistic analysis” (Labov, 1984, p. 29). As such, it is particularly valued for its potential to reveal community norms. We emphasize that a community focus is critical to the investigation of restrictive relativization because the marking system of restrictive relative clauses in English is considered to be “notoriously variable” (Britain, 2020, p. 95). Inter-community differences in marking preferences are believed to indicate the general absence of a vernacular ‘norm’, both in the composition of the restrictive relative marker paradigm and in the distribution of the markers themselves (see Ball, 1996, p. 243).
A major corollary of the heterogeneous marking system of restrictive relative clauses in English is that it is essential to establish exactly what L2 speakers are exposed to, rather than simply intuiting the nature of the input (see also Tomasello, 2003, p. 112). A cornerstone of the present investigation is the detailed comparative framework we bring to bear on the L2 acquisition of relative clauses, enabling us to examine them from multiple vantage points. Our research design incorporates three complementary datasets: one representing spontaneous L2 English recorded from Canadian francophones in the Canadian National Capital Region between 2018 and 2022; a second corpus of vernacular speech recorded from native Canadian anglophones in the same locality, representing a local baseline variety of the TL; and a third corpus of vernacular Canadian French obtained from a subset of the L2 speakers we recorded.
Our comparative framework enables us to address: (i) whether L2 speakers of English use the same relative markers (or relativizers) as TL speakers to introduce restrictive relative clauses; (ii) whether L2 speakers use individual relative markers at rates which match their discursive frequency in the local TL variety; and, crucially, (iii) whether L2 speakers reproduce in whole, or in part, the fine-grained linguistic conditioning governing variable relativizer selection in the corresponding TL baseline variety (see Rehner & Mougeon, 2022). We stress that our comparison of L2 and TL speaker cohorts is intended as a heuristic only (see White, 2003), and is not meant to imply that the L2 system is an “incomplete” or “lesser” version of its TL counterpart (Bley-Vroman, 1983), or, indeed, that native-like mastery of TL grammar is necessarily the desired target for every L2 speaker (see Nagy et al., 2003).
Another key component of our research design involves systematic comparisons of L2 speech and L1 French, enabling us to identify and characterize any evidence of cross-linguistic influence from speakers’ native French on their L2 restrictive relativization strategies. Relativization is considered to be “a vulnerable area” for contact effects (Muysken, 2012, p. 238), and this vulnerability is believed to be enhanced when contact varieties share multiple typological similarities (Thomason, 2001), as is the case in the present study. If transfer effects are operative in the L2 speech investigated here, our comparative variationist framework should enable us to detect them.
3. Data and Choice of Speakers
The community-based speech data we analyze here were collected in the Canadian National Capital Region, which includes the cities of Ottawa (Ontario) and Gatineau (Quebec). As the site of prolonged contact between English and French, this metropolitan area is deemed a “natural laboratory for language contact” (Poplack, 1989, p. 413). On the Ontario side of the provincial border, where English is the majority language, approximately 58% of the population of the city of Ottawa claim English as a mother tongue, compared with 12.5% who claim French (Statistics Canada, 2021). Conversely, on the Quebec side of the border, where French is the designated majority (and official) language, the city of Gatineau comprises 71% French mother-tongue speakers, with some 12% of residents declaring English as their mother tongue (Statistics Canada, 2021).
A fundamental requirement underpinning our compilation of a corpus of spontaneous L2 English speech was that L2 speakers should be sampled from the local native francophone population in the Canadian National Capital Region. They were also expected to have acquired Canadian French as their primary language in childhood from native francophone parents/caregivers. A further requirement was that native francophones should have completed their mandatory schooling in French-speaking educational establishments. In line with our dedicated focus on L2 acquisition, speakers who had been raised bilingually (i.e., French and English) from birth were not eligible for inclusion in the study.
Between 2018 and 2022, we recorded a total of 29 speakers meeting our sampling requirements. Table 1 below shows the distribution of sample members by age and speaker sex.
A self-report language background questionnaire was administered to all L2 speakers in order to gather key information relating to their acquisition of English as a second language, and to assess their degree of contact with the local anglophone TL community. Information abstracted from this questionnaire was used to develop a comprehensive profile of each speaker’s acquisitional history, covering their exposure to formal instruction in English; the language (French, English) used most often in daily life; frequency of English-language use at home, at work and in their neighbourhood of residence; and the estimated proportion of anglophones in their personal social networks.
Most speakers reported having begun formal instruction in English during grades three or four (i.e., between the ages of 8 and 10) of their mandatory schooling. At the time of the recordings, only three speakers reported that they used English more often than French in their daily lives, especially in work or study environments. English language use was least commonly reported in domestic settings, where French prevailed.
Only two L2 speakers, constituting just 7% of the sample, reported having no anglophones in their personal social networks. By contrast, 55% of L2 speakers (N = 16) reported that native anglophones comprised 50% or more of their individual social networks, with a further 28% (N = 8) estimating that 25–50% of their social networks were made up of anglophones.
Also of relevance to the contact dimension is the fact that 20 speakers (69% of the sample) resided in neighbourhoods of Ottawa, on the Ontario side of the provincial border, where English is the majority language. This inevitably resulted in some degree of exposure to, and interaction with, anglophones, regulated by the varying ratios of francophones to anglophones in individual neighbourhoods of residence (see Poplack, 2018a, pp. 31–33).
One general proficiency requirement imposed at the outset on all L2 speakers was the ability (and willingness) to participate in a recorded sociolinguistic interview, the standard methodological tool for eliciting lengthy extracts of casual speech (Labov, 1984). Recordings were conducted in English with a native English-speaking interviewer, who introduced topics of interest to L2 speakers as the basis for extended discussion. The interview protocol was expressly intended to encourage L2 speakers to take the lead in the interaction, with minimal intervention from the interviewer. No other data-gathering instruments were used, allowing L2 speakers the freedom to express themselves as they pleased, and to use vernacular structures as little or as much as desired. Interviews lasted an average of 55 min, testifying to relatively high levels of English-language fluency in the L2 speaker cohort. The recorded data were subsequently transcribed, yielding a fully searchable corpus of natural L2 data comprising some 277,000 words.
Building on procedures developed and refined in previous language contact research (Poplack, 2018a; Torres Cacoullos & Travis, 2018), we computed a Cumulative English Proficiency Index (CEPI) score for each L2 speaker. Individual CEPI scores, to be interpreted relative to each other rather than as absolute, global indices of proficiency, were calculated from: (i) speaker self-assessments of English-language proficiency targeting production and comprehension; (ii) scalar responses to questions concerning contextual and situational uses of English (e.g., at home, at work, in the local neighbourhood of residence, for the purposes of socializing, etc.); and (iii) content analysis of L2 speaker production data.
Scalar responses relating to English-language proficiency and contextual uses of English were scored from zero to ten for each assessed category. Content analysis of speech production data focused on discrete-point measures of lexical usage, including word-searching difficulties (e.g., where L2 speakers overtly indicated that they could not find the ‘appropriate’ English word), in addition to targeting morpho-syntactic features such as the variable omission of the English plural affix -s and the possessive -s morpheme, as well as the variable inflexion of present-tense verbs in the third person singular. We take the non-negligible use of such variable features (absent from the TL baseline variety) to reflect developmental characteristics of L2 speakers’ interlanguage. A mean content-analysis score was calculated for each speaker from their transcribed recording. A speaker who made non-negligible use of an array of interlanguage features, as described above, would score lower on content analysis than one who had a more limited repertoire of such features and used them less frequently in their discourse.
Cumulative proficiency scores were derived for each L2 speaker from the various (weighted) complementary measures described in (i) to (iii) above. The resulting scores range from a low of 0.450 to a high of 0.863. These scores enabled us to subdivide the L2 sample into four unevenly constituted proficiency bands, as shown in Table 2 below.
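The text does not report the weights or the normalization used to combine measures (i) to (iii). Purely to illustrate how a relative composite of this kind can be built, the sketch below assumes that each component has already been expressed on a 0 to 10 scale, rescales it to the unit interval, and takes a weighted mean. The function name `cepi_score`, the weights and the speaker values are hypothetical, not the study's.

```python
from typing import Sequence


def cepi_score(components: Sequence[float],
               weights: Sequence[float],
               scale_max: float = 10.0) -> float:
    """Hypothetical weighted composite of proficiency components.

    Each component is assumed to lie on a 0..scale_max scale; it is rescaled
    to 0..1 and the results are combined as a weighted mean, so the composite
    also lies between 0 and 1 and is only meaningful relative to other speakers.
    """
    if len(components) != len(weights):
        raise ValueError("one weight is needed per component")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    rescaled = [c / scale_max for c in components]
    weighted_sum = sum(w * r for w, r in zip(weights, rescaled))
    return weighted_sum / sum(weights)


# Hypothetical speaker: self-assessment 8, contextual use 5, content analysis 7,
# with the self-assessment given double weight (an illustrative choice only).
score = cepi_score([8, 5, 7], weights=[2, 1, 1])
print(round(score, 3))  # → 0.7
```

Because the output is a weighted mean of rescaled components, it behaves like the CEPI as described: a bounded score that ranks speakers relative to one another rather than an absolute measure of proficiency.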
In keeping with our comparative focus, we make use of two additional datasets, both collected using a sociolinguistic interview protocol that targeted conversational topics similar to those employed in compiling the L2 corpus.
The first dataset, the Ottawa English Corpus (OEC), was compiled between 2008 and 2010 and comprises natural speech data recorded from 37 native adult anglophones residing in the Canadian National Capital Region. Amounting to some 273,000 words, this fully transcribed dataset serves as the local TL baseline variety with which L2 speech is compared.
The second dataset is based on vernacular Canadian French recorded from a sub-sample (N = 20) of the 29 native francophones who contributed to the L2 English corpus. The fully transcribed French corpus comprises some 228,000 words of running discourse.
We use the French language corpus to explore the potential impact of L1 influence on L2 speakers’ production of restrictive relative clauses. The restrictive relativization systems of French and English share a set of partially corresponding variant forms (e.g., the WH-markers qui/who, which; complementizer que/that) as well as similar, though not identical, contexts of variant use. These partial structural and functional correspondences are believed to be conducive to transfer effects. Furthermore, the sociolinguistic context we target here, exhibiting high levels of bilingualism as well as intense and prolonged language contact, meets all the commonly invoked criteria reported to promote cross-linguistic influence.³
4. Method
All relative clause constructions in the TL, L2 and L1 datasets were manually located in the corpora described above by reading through the transcribed data in their entirety and performing cross-checks with the original audio-files, where necessary. This procedure ensured that all overtly marked restrictive relative clauses, as well as those introduced by a zero or null relativizer, were correctly identified.
All eligible tokens were subsequently extracted and imported into Excel files, where they were coded for a number of key predictors hypothesized to influence relative marker selection (see, e.g., Tagliamonte et al., 2005; Wiechmann, 2015). Relative clause constructions falling outside the envelope of variation, as we have defined it, were excluded from the analysis (e.g., non-restrictive relative clauses, adverbial relative clauses, etc.).
To test hypotheses about potential constraints on relative marker choice, we incorporated several predictors into our study design. A major predictor of relative marker choice is the syntactic position or function of the relativizer in the relative clause (Ball, 1996; Romaine, 1982). We distinguished relative clauses in which the relativized element is the subject of the relative clause from those where it is in non-subject position (i.e., direct object, oblique or object of a preposition, and genitive or possessive). These distinctions allow us to examine whether the use of relative clauses correlates with the typological generalizations posited by Keenan and Comrie (1977). The essence of these generalizations is that there is a hierarchy of grammatical positions (subject > direct object > oblique > genitive) correlating with increasing difficulty (and diminishing frequency) of relativization. Positions lower down the hierarchy (e.g., oblique) are more difficult to relativize, and correspondingly less frequent, than positions further up, which are reportedly easier to relativize and correspondingly more frequent, possibly as a result of working memory and linear processing constraints (e.g., Gibson & Wu, 2013).
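A minimal sketch of how the position coding just described might be operationalized, assuming each token has already been coded for one of the four positions on the hierarchy. The helper names and the toy tokens are hypothetical; `follows_hierarchy` only checks whether raw token counts are non-increasing down the hierarchy, which is a descriptive frequency check, not a test of processing difficulty.

```python
from collections import Counter

# Keenan & Comrie's (1977) hierarchy, ordered from most to least accessible.
HIERARCHY = ["subject", "direct object", "oblique", "genitive"]


def binary_position(position: str) -> str:
    """Collapse a fine-grained position into the subject/non-subject split."""
    if position not in HIERARCHY:
        raise ValueError(f"unknown position: {position!r}")
    return "subject" if position == "subject" else "non-subject"


def follows_hierarchy(positions: list[str]) -> bool:
    """True if token counts never increase as we move down the hierarchy."""
    counts = Counter(positions)
    freqs = [counts[p] for p in HIERARCHY]  # unseen positions count as 0
    return all(higher >= lower for higher, lower in zip(freqs, freqs[1:]))


# Hypothetical toy sample of coded tokens (not the study's data).
toy = ["subject"] * 6 + ["direct object"] * 3 + ["oblique"] * 1
print([binary_position(p) for p in ["subject", "oblique"]])  # ['subject', 'non-subject']
print(follows_hierarchy(toy))  # True
```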
Another major predictor considered to affect relative marker choice is the animacy of the antecedent NP in which the relative clause is embedded (D’Arcy & Tagliamonte, 2010; Guy & Bayley, 1995; Tagliamonte et al., 2005). In contrast with its French analogue qui, which exhibits no sensitivity to the animacy of the head nominal post-modified by a subject relative clause, the English relative marker who encodes the semantic feature [+ human]. Relativizer which, by contrast, is said to be restricted to non-human antecedents, as is relativizer that (Guy & Bayley, 1995), although the use of that with human and non-human antecedents appears to vary significantly across communities (e.g., Tagliamonte, 2002). To examine the effects of animacy on relative marker choice, we operationalized a three-way distinction between human, non-human animate, and inanimate heads.
Yet another predictor influencing relative marker selection concerns the type of antecedent post-modified by a relative clause. To assess the potential impact of this predictor, we employed a tripartite categorization system, distinguishing definite and indefinite nominal antecedents, as well as pronominal ones.
Matrix clause construction type has also been invoked in connection with relativizer choice. Earlier studies noted that the English zero subject relativizer, now considered “marginally non-standard” (Biber et al., 1999, p. 619), is most likely to be encountered when the relative clause is embedded in a matrix clause containing an existential-there construction (e.g., there’s a woman Ø wants to see you), a stative-possessive construction (e.g., I have a brother Ø knows him), or a cleft construction (e.g., it’s the upper-class people Ø live in this area) (see Biber et al., 1999; Tagliamonte, 2002). Our coding protocol took these main clause constructions into consideration, and also accounted for relative clauses embedded in isolated head NPs (e.g., people Ø I know), which are also reported to display distinct relative marker preferences (see Fox & Thompson, 2007).
It has been repeatedly observed that when the grammatical subject of a non-subject relative clause is a pronoun rather than a lexical NP, an overt relativizer is less likely to be present (e.g., Guy & Bayley, 1995; Levey & Hill, 2013). To detect the potential operation of this effect in our data, we distinguished cases where the grammatical subject of a non-subject relative clause is a pronoun from those where it is a full lexical noun phrase.
The two remaining predictors that we test, adjacency and relative clause length, address online processing constraints associated with syntactically complex constructions (see, e.g., Rohdenburg, 1996).
With regard to adjacency, there is evidence that the presence of intervening material between a relative clause and its antecedent head promotes the use of an overt relativizer to mitigate parsing difficulties (e.g., Guy & Bayley, 1995; Tottie & Harvie, 2000). Accordingly, we distinguished cases where the relative clause immediately follows its antecedent head NP, categorized as adjacent, from non-adjacent contexts where intervening material (including filled pauses and speech disfluencies) separates the relative clause from its head nominal.
Previous research suggests that the longer the relative clause, the greater the likelihood that an overt relative marker will be selected, whereas shorter relative clauses tend to favour the zero relative marker (Fox & Thompson, 2007). To ascertain any effect of clause length on relativizer selection, we first counted the number of words in each relative clause, discounting the (variable) presence of the relative marker. Based on average length, we then operationalized a binary division between shorter and longer relative clauses. Measures of clause length differ in subject and non-subject relative clauses because subject relative clauses can consist of just one word (i.e., a verb), whereas non-subject relative clauses minimally comprise two words (i.e., a subject and a verb).
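The length measure can be sketched as follows, under several simplifying assumptions of our own: whitespace tokenization, an overt relativizer taken to be the first word of the transcribed clause, and "longer" meaning strictly above the mean. The text specifies only that the split was based on average length, and the function names are hypothetical. In line with the remark on minimal lengths above, `split_by_mean` is meant to be applied to subject and non-subject clauses separately.

```python
from statistics import mean

# Overt relativizers to discount when counting words (an illustrative set).
RELATIVIZERS = {"that", "who", "which", "whose"}


def clause_length(clause: str) -> int:
    """Count words in a relative clause, ignoring an initial overt relativizer."""
    words = clause.split()
    if words and words[0].lower() in RELATIVIZERS:
        words = words[1:]
    return len(words)


def split_by_mean(clauses: list[str]) -> list[str]:
    """Label each clause 'longer' or 'shorter' relative to the mean length.

    Apply separately to subject and non-subject clauses, since their
    minimal lengths differ (one word vs two words).
    """
    if not clauses:
        return []
    lengths = [clause_length(c) for c in clauses]
    cutoff = mean(lengths)
    return ["longer" if n > cutoff else "shorter" for n in lengths]


# Hypothetical non-subject clauses (zero- and that-marked).
non_subject = ["I've made", "that I know", "that my brother bought last year"]
print(split_by_mean(non_subject))  # ['shorter', 'shorter', 'longer']
```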
In accordance with the comparative axis of our research, we also applied a modified version of the coding protocol to relative clause constructions extracted from L2 speakers’ L1, Canadian French. We pay particular attention to oblique relative clause constructions in Canadian French as these are the locus of variable marking strategies that exhibit noticeable structural and quantitative differences from their English counterparts. In the results section below, we return to those differences and their relevance to elucidating potential L1 transfer effects.
5. Results
Table 3 shows the distribution of relative clauses in L2 and TL discourse according to the syntactic position of the relativized NP. The distributional findings are entirely consistent with the typological generalizations associated with the noun phrase accessibility hierarchy (Keenan & Comrie, 1977), with subject position being the most amenable to relativization and the genitive the least. In fact, there are no genitive relative clauses in the TL data, and only two instances marked by whose in the L2 dataset. Further inspection of the data reveals the use of periphrastic or analytic constructions encoding a possessive function, as illustrated in (4)–(5) below from the L2 data.
- (4) no- not all of them, there’s a bunch of them that their parents were just forcing them to try and learn French (L2/008/424)
- (5) I mean like people that like their first language is English and speak French like yeah there’s like a huge difference (L2/026/388)
Although such analytic constructions in L2 discourse might initially seem to qualify as developmental or interlanguage phenomena, they are in fact attested in native English vernaculars (Hermann, 2003) and are consistent with an observed (cross-linguistic) tendency to promote NPs to higher positions on the noun phrase accessibility hierarchy that are more amenable to relativization (Keenan & Comrie, 1977).
We next consider the individual strategies used to mark relative clauses. Table 4 shows the very uneven distribution of relativizers in the L2 and TL datasets, respectively. The relative marker that is the lead variant in both datasets, occurring somewhat more frequently in L2 speech than in the TL baseline. Rates of the zero relativizer are almost equivalent in L2 and TL discourse. The WH-relativizers which and whose occur at minuscule rates and play no central role in the relativizer system used by either cohort. The only WH-relativizer that occurs to any appreciable extent in both datasets is who, although this marker accounts for a larger proportion of the variable context in the TL baseline than in the L2 data.
Because restrictive relativization in contemporary spoken English is widely believed to constitute a “syntactically partitioned system” (Brook & Tagliamonte, 2023, p. 27), effectively comprising alternations between that and who in subject relative clauses, and between zero and that in non-subject ones (see also Meyerhoff et al., 2020), a more insightful picture of rate differences can be obtained by considering subject relative clauses separately from non-subject ones. Table 5 presents the results of such an analysis.
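Partitioned rates of this kind are row percentages computed within each syntactic position. A minimal sketch, assuming each token is coded as a (position, relativizer) pair; the function name and the toy tokens are hypothetical and do not reproduce the study's counts.

```python
from collections import Counter, defaultdict


def rates_by_position(tokens: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Percentage of each relativizer within each syntactic position.

    `tokens` is a list of (position, relativizer) pairs such as
    ("subject", "that"); percentages sum to 100 within each position.
    """
    by_pos: dict[str, Counter] = defaultdict(Counter)
    for position, marker in tokens:
        by_pos[position][marker] += 1
    return {
        pos: {m: 100 * n / sum(counts.values()) for m, n in counts.items()}
        for pos, counts in by_pos.items()
    }


# Hypothetical toy tokens (illustrative only).
toy = ([("subject", "that")] * 6 + [("subject", "who")] * 3
       + [("subject", "zero")] * 1
       + [("non-subject", "zero")] * 5 + [("non-subject", "that")] * 5)
print(rates_by_position(toy))
```

Computing rates within each partition, rather than over all relative clauses as in an aggregate table, keeps subject who from being diluted by the many non-subject contexts where it does not compete.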
In terms of distributional parallels, competition between zero and that in non-subject relative clauses occurs at commensurate rates in the L2 and TL datasets. In subject relative clauses, by contrast, the only quantitative resemblance between the comparison groups pertains to the relatively low rates of the zero subject relativizer in each. Closer inspection of this variant in the L2 data reveals that it surfaces in the same syntactic contexts as in the TL baseline, such as existential-there and stative-possessive constructions. Although relegated to a decidedly peripheral role in the marking of subject relative clauses in both datasets, the colloquial status, low frequency and patterning of the zero subject relativizer in L2 speech would seem to indicate that it is the product of vernacular transmission from the local TL variety.
A more complex issue relates to the differential rates of subject who in the L2 and TL data. What can explain the proportional discrepancies in the use of this variant by the comparison groups? The observed differences in Table 5 are certainly not consistent with any direct influence from L2 speakers’ native French, where, as noted earlier (and see further below), the existence of qui (‘who/which’) is precisely the kind of interlingual parallel that would be expected to enhance, rather than impede, the L2 acquisition of relativizer who. We can also rule out the possibility that the overall rate of subject who in the TL baseline variety is exceptional: comparison with other mainstream urban varieties of Canadian English reveals rates that are almost identical to the one reported for the TL variety in Table 5 (see, e.g., Brook & Tagliamonte, 2023, p. 23 for Toronto English).
One possible explanation is that the aggregated L2 data in Table 5 may mask the impact of potential L2 proficiency differences on rates of relativizer who. Table 6, displaying variant inventories and distributions in the L2 data according to two broad proficiency levels, sheds light on this issue. Here we compare low/mid-low proficiency speakers (CEPI score range = 0.450–0.694) with their mid-high/high proficiency counterparts (CEPI score range = 0.700–0.863).
Variant rates in the two proficiency groups are very similar in non-subject relative clauses and almost identical in subject ones. The overall rate of who in subject relative clauses is exactly the same in both proficiency cohorts, confounding any expectation that proficiency offers a straightforward explanation of the lower incidence of who in L2 speech vis-à-vis the TL baseline.
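The two-way proficiency comparison can be sketched as below. The 0.70 cut-off is our assumption, chosen only because it reproduces the CEPI ranges given above (0.450–0.694 vs 0.700–0.863), and rates are pooled over tokens rather than averaged over speakers, which the text does not specify. The function names, speaker IDs, scores and tokens are hypothetical.

```python
from collections import Counter


def cohort(cepi: float, threshold: float = 0.70) -> str:
    """Assign a speaker to one of two broad proficiency cohorts.

    The 0.70 cut-off is an assumption that reproduces the score ranges
    reported in the text (0.450-0.694 vs 0.700-0.863).
    """
    return "mid-high/high" if cepi >= threshold else "low/mid-low"


def who_rate_by_cohort(cepi_by_speaker: dict[str, float],
                       subject_tokens: list[tuple[str, str]]) -> dict[str, float]:
    """Pooled percentage of 'who' among subject relativizers in each cohort."""
    totals: Counter = Counter()
    whos: Counter = Counter()
    for speaker, marker in subject_tokens:
        group = cohort(cepi_by_speaker[speaker])
        totals[group] += 1
        if marker == "who":
            whos[group] += 1
    return {g: 100 * whos[g] / totals[g] for g in totals}


# Hypothetical speakers and subject-relative tokens (illustrative only).
cepi = {"s1": 0.52, "s2": 0.81}
tokens = [("s1", "that"), ("s1", "who"), ("s2", "that"), ("s2", "who")]
print(who_rate_by_cohort(cepi, tokens))  # {'low/mid-low': 50.0, 'mid-high/high': 50.0}
```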
Recognizing that surface parallels should not be equated with the functional isomorphy of form-based correspondences (Poplack et al., 2012, p. 223), a more exacting measure of the L2 acquisition of who requires us to consider whether those L2 speakers who use this variant do so in appreciably the same way as their TL counterparts. Recall that this sub-sector of the grammar qualifies as a “conflict site” (Poplack & Meechan, 1998), where the semantic properties encoded by English who differ from those of its French counterpart, qui: the selection of who is determined by the humanness of the antecedent head nominal, whereas qui is unaffected by the animacy properties of the antecedent head NP. To investigate whether the L2 acquisition of who may involve non-target-like uses, we examine the distribution of subject relativizers according to the animacy properties of the antecedent head NP, as shown in Table 7.⁴
Table 7 shows that L2 speakers, like speakers of the TL baseline variety, use relativizer who exclusively with human antecedents, albeit at very different rates. The major distributional difference that emerges is that whereas who is the dominant variant used with human antecedents by TL speakers, it is relativizer that which is preferentially selected in the same context in the L2 data. In sum, to the extent that L2 speakers make use of who to mark subject relative clauses, they do so while categorically respecting the animacy constraint that is operative in the TL.⁵
We next examine the extent to which other environmental constraints on relative marker selection exhibit congruent or divergent patterns across the L2 and TL speaker cohorts. To conduct this comparison, we draw on mixed-effects regression analysis to examine the contribution of independent linguistic predictors, and of proficiency in the case of L2 speakers, to the selection of different relative markers.
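For orientation, the general form of a mixed-effects logistic model with a by-speaker random intercept can be written as below. The notation is ours and generic, not a formula reported for this study: y_ij = 1 if token i from speaker j carries the application value (e.g., that rather than who in subject relative clauses), the x_ijk are the (sum-coded) predictor values, β_0 is the intercept, and u_j is speaker j's random intercept.

```latex
\ln\frac{P(y_{ij} = 1)}{1 - P(y_{ij} = 1)}
  = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{ijk} + u_j,
\qquad u_j \sim \mathcal{N}\!\left(0, \sigma_u^{2}\right)
```

Roughly speaking, under sum coding the log-odds reported for each factor level correspond to the level effects β_k, and the input probability corresponds to the inverse logit of β_0; the random intercept u_j absorbs differences in individual speakers' baseline rates.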
For statistical purposes, we employ Rbrul, a tool specifically developed for sociolinguistic research that is capable of fitting mixed-effects models (Johnson, 2009). In the tables of results that follow, speaker is included as a random effect to control for individual speaker variance (Johnson, 2009). The numerical values in the Rbrul output are to be interpreted as follows. The input probability is a measure of the overall likelihood that the relativizer in question will occur in the dataset. The log-likelihood value indicates the goodness of fit of the regression model to the dataset under consideration, and the R² value indicates the proportion of the variance explained by the model. Individual constraints on relativizer choice (i.e., that versus who in subject relative clauses; that versus zero in non-subject ones) are represented by the log-odds (LO) and the centred factor weights (FW). Log-odds with a positive value indicate that the factor shown on the left-hand side of the table favours selection of the relativizer in question, whereas those with a negative value disfavour it. Centred factor weights have a similar interpretation: those above 0.5 favour selection of the relativizer in question, whereas those below 0.5 disfavour it.
The ordering of log-odds/centred factor weights (i.e., from most to least favouring) within an individual predictor, or factor group, constitutes the constraint hierarchy (or ranking). It is the constraint hierarchy, rather than the associated percentage values or total Ns, that is key to interpreting any comparative analysis. Detailed examination of the constraint hierarchies conditioning variant choice yields the most penetrating characterization of variable structure (Poplack & Tagliamonte, 2001, p. 6) and can be used to gauge the extent to which L2 speakers approximate TL grammatical norms. When interpreted in the aggregate, constraint hierarchies in L2 speech that resemble, or are broadly parallel to, their counterparts in TL speech furnish the most compelling evidence that L2 speakers have successfully acquired TL variable patterns.
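Comparing constraint hierarchies across speaker groups amounts to ordering the factors within each factor group by their log-odds and checking whether each factor occupies the same rank in both groups. The following is a minimal sketch under that assumption. The function names and the coefficient values are hypothetical, chosen only to show how a permutation is detected.

```python
def constraint_hierarchy(log_odds: dict[str, float]) -> list[str]:
    """Rank the factors of one factor group from most to least favouring."""
    return sorted(log_odds, key=log_odds.get, reverse=True)


def permuted_factors(baseline: dict[str, float], comparison: dict[str, float]) -> list[str]:
    """Return the factors whose rank differs between two hierarchies.

    Both dictionaries must contain the same factors; the result follows the
    order of the comparison hierarchy.
    """
    if set(baseline) != set(comparison):
        raise ValueError("factor groups must contain the same factors")
    base_rank = {f: i for i, f in enumerate(constraint_hierarchy(baseline))}
    comp_order = constraint_hierarchy(comparison)
    return [f for i, f in enumerate(comp_order) if base_rank[f] != i]


# Hypothetical log-odds for 'type of antecedent NP' (invented values, not
# those reported in our tables); each set sums to zero.
tl_antecedent = {"definite": 0.10, "indefinite": -0.30, "pronoun": 0.20}
l2_antecedent = {"definite": -0.05, "indefinite": 0.35, "pronoun": -0.30}

print("TL:", " > ".join(constraint_hierarchy(tl_antecedent)))
print("L2:", " > ".join(constraint_hierarchy(l2_antecedent)))
print("Permuted in L2:", permuted_factors(tl_antecedent, l2_antecedent))
```

A bare rank comparison of this kind treats a swap between two near-zero coefficients in the same way as a full reversal of effects. Judging whether a given permutation is substantive therefore still requires inspecting the magnitudes involved.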
Table 8 and Table 9 present the results of multifactorial regression analyses of predictors contributing to the selection of the relativizers that and who in subject relative clauses in the TL and L2 datasets, respectively. Speakers who produced no instances of who are excluded from the statistical analysis of subject relative clauses.
No predictor is selected as significant in either table of results for subject relative clauses. The absence of any significant effect of adjacency or relative clause length, two parameters intended to capture the operation of online processing constraints, strengthens our conviction that subject relative clauses are generally less problematic for L2 learners than non-subject ones (Gass, 1979).
Despite the absence of statistically significant effects, we can still compare variable structure across the datasets by means of the constraint hierarchies. Bolded probability coefficients in the L2 data indicate permutations in the L2 constraint rankings vis-à-vis the corresponding hierarchy of effects in the TL baseline variety.
With regard to matrix clause construction type (where existential-there and cleft constructions have been collapsed for statistical analysis with other copula clauses, since all contain the semantically light verb be), we observe some minor differences in the respective rankings of lone head NPs and other matrix clause constructions with the relativizers that and who. Lone head NPs, for example, disfavour that in the L2 data but favour the same variant in the TL baseline. Likewise, the probability values for adjacency show that the relativizer that is preferred in non-adjacent contexts in the L2 data, whereas who is the preferred marker in the same environment in the TL baseline.
Other differences concern the effects of the type of antecedent NP in the L2 data, where indefinite nouns favour that whereas pronouns disfavour it. These effects are correspondingly reversed for indefinite nouns and pronouns when relativizer who is selected. Once again, the TL baseline shows the opposite trends for the same contextual effects.
Finally, although L2 proficiency is not selected as significant in the L2 data, the probability coefficients for who show that it is very weakly favoured by low- to mid-low-proficiency speakers. The paucity of data from speakers with lower CEPI scores precludes any definitive interpretation of this finding. Notwithstanding this caveat, the results tentatively suggest that the use of who in the L2 subject relative clauses examined here does not increase with English proficiency. This aligns with our earlier observation that, in these data, proficiency does not appear to be a major determinant of who usage.
7 We next consider the results for non-subject relative clauses.
Table 10 and Table 11 present the results of regression analyses of predictors contributing to the selection of the relativizers that and zero in non-subject relative clauses in the TL and L2 datasets, respectively. In the TL corpus, three predictors return significant effects, all highlighted with grey shading: subject of the relative clause, matrix clause construction type, and adjacency. The same three predictors are also selected as significant in the L2 data, where a fourth predictor, type of antecedent NP, additionally returns a significant effect that is absent from the TL baseline variety.
Subject of the relative clause and adjacency exhibit parallel effects in the two comparison varieties. When the subject of the relative clause is a noun, relativizer that is preferentially selected, whereas when it is a pronoun, zero is the preferred relativizer. In usage-based theories, the preference for the zero relativizer in non-subject relative clauses containing a pronominal subject has been attributed to the degree to which the matrix and relative clauses are structurally integrated with each other, with a higher degree of “mergedness” favouring relativizer omission in English (Fox & Thompson, 2007, p. 319).
As noted earlier, the constraint hierarchy for adjacency likely reflects universal processing considerations rather than variety-specific effects. For example, the zero variant is strongly disfavoured in both comparison varieties when intervening material separates the relative clause from its antecedent head NP. This result is consistent with Rohdenburg’s (1996, p. 151) Complexity Principle, according to which less explicit grammatical options (i.e., the zero variant) are liable to be disfavoured in cognitively complex environments (i.e., non-adjacent contexts), where an overt relativizer is preferred instead. This appears to be especially the case in non-subject relative clauses, where longer filler-gap dependencies may impose greater processing burdens (Gibson, 1998).
When compared with the corresponding direction of effects in the TL baseline variety, re-ordering of the constraint rankings in the L2 data for matrix clause construction type and type of antecedent NP points to subtle adjustments in the conditioning of variant choice. These adjustments are suggestive of L2 speakers’ reconfiguration of constraints operative in the TL grammar. We caution, however, that they must be weighed against the fact that most of the linguistic predictors incorporated into the analysis, including the non-significant effects of animacy and length of the relative clause, pattern in appreciably the same way in the L2 and TL datasets, as gauged from the hierarchy of constraints.
Furthermore, closer inspection of L2 departures from the TL baseline variety reveals that some of the observable disparities in the L2 data align with patterns detected in other native varieties of English. For example, the strong correlation between lone head NPs and the zero relativizer, found in the L2 data but not in the TL baseline variety examined here, has been documented in other urban varieties of Canadian English (see Levey & Hill, 2013).
We also stress that some of the discrepancies visible in the L2 constraint hierarchies reflect relatively trivial alterations to the corresponding hierarchy of effects in the TL baseline variety. For example, copula clauses in the TL variety are a major determinant of the zero relativizer, as indicated by the top-tier probability coefficients (see also Fox & Thompson, 2007; Levey & Hill, 2013), but the same favouring effect also operates in the L2 variety, albeit to a lesser extent. Similarly, although the respective contributions of definite and pronominal antecedents to variant choice are ranked somewhat differently in the comparison varieties, both types of antecedent disfavour relativizer that and favour zero in both the L2 data and the TL baseline.
Viewed in the aggregate, comparison of the variable structure of subject and non-subject relative clauses in the L2 and TL datasets indicates that the inter-varietal differences we have uncovered are essentially quantitative rather than qualitative in nature. We make no claim here that L2 speakers have fully reproduced the TL variable system in all its precise structural detail, an accomplishment typically associated with first rather than second language acquisition (Labov, 2007). But we would nonetheless emphasize that our findings converge in foregrounding the capacity of relatively advanced L2 speakers to approximate the variable marking of restrictive relative clauses characteristic of the local TL variety.
To what extent might L2 speakers’ native French influence their production of restrictive relative clauses in English? We first consider the inventory and distribution of restrictive relative markers in French, as shown in Table 12. Together accounting for 98% of the variable context, just two relative markers, qui and qu(e), virtually saturate the restrictive relative marker paradigm. This finding is in line with the propensity of vernacular Romance varieties to use relative particles at the beginning of relative clauses, even in syntactic positions (e.g., in oblique relative clauses) where normative grammars would typically require a relative pronoun that agrees with the semantic and/or morpho-syntactic features of the head NP post-modified by the relative clause (Fiorentino, 2007, pp. 266–267; Stark, 2016, p. 1036). Inspection of the data in Table 12 shows that the relative pronouns dont and lequel(s)/laquelle(s), used here to mark oblique relative clauses, are exceptionally rare, consonant with their reported infrequency in other varieties of colloquial French (see e.g., Schafroth, 1995).
The conspicuous reduction in the number of relativizers found in colloquial (Canadian) French vis-à-vis the much richer paradigm of markers generally encountered in the standard literary language serves as a reminder that structural patterns characteristic of the written variety cannot be uncritically equated with those found in spontaneous speech (Cheshire, 2005; Milroy, 2001; Poplack, 2018b).
The relative marker qui accounts for a disproportionately large swathe of the variable context in French, reflecting the preponderance of subject relative clauses in the data (subject relative clauses account for 66% of the data, object relative clauses for 27% and oblique relative clauses for 7%). As already noted, there is little evidence to suggest that the predominance of qui in the French data has any direct bearing on the frequency of who in L2 speech, where rates of this variant are significantly lower than in the corresponding TL benchmark. Nor is the zero variant of colloquial French, which is also found in English, distributed in ways that parallel its use in either the L2 or the TL datasets. To the very limited extent that it occurs in the French data examined here, it marks a mere 4% (N = 10) of non-subject relative clauses. By contrast, the zero relativizer occurs in 49% of non-subject relative clauses in L2 speech, slightly above the 45% recorded in the corresponding TL baseline. Thus, despite the existence of structural variants common to both vernacular French and English, the rates of those shared options diverge markedly across the two languages, diminishing, rather than strengthening, the possibility of transfer effects.
What little variation there is in relative-marking strategies in the French data is almost exclusively confined to oblique relative clauses. Particularly remarkable in view of their scant recognition in the literature (Gadet, 1995) is the range of strategies used to mark oblique relative clauses in (Canadian) French. These strategies are partitioned across three unevenly distributed constructions, illustrated in (6)–(8) below, where [ ] indicates a null or absent preposition (also referred to in the literature as preposition chopping, Tarallo, 1983; preposition absorption, Poplack et al., 2012; or preposition ghosting, Radford, 2019):
Pied-piping
(6) la madame avec qui je vais vivre en Suisse, elle travaille avec Interpeace (FL1/017/713)
‘the lady with whom I’m going to live in Switzerland, she works with Interpeace’
Null-preposition
(7) ben ça c’est probablement une des personnes que j’ai commencé à parler [ ] en anglais (FL1/009/610)
‘well that’s probably one of the people that I began to speak [ ] in English’
Preposition stranding
(8) c’est pas vraiment quelque chose que je m’en fais avec (FL1/025/218)
‘it’s not really something that I’m worried about (lit. ‘with’)’
Contrary to claims in the theoretical and descriptive literature, Table 13 shows that pied-piping, the prescribed strategy for marking French oblique relative clauses, is neither categorical nor even the majority option in the natural speech data analyzed here (see also Poplack et al., 2012, p. 209 for similar results). This finding highlights the gulf between analysts’ preconceived ideas about how oblique relative clauses are marked in French (e.g., Duffeler, 2017; Guasti & Shlonsky, 1995) and what actually transpires in natural speech.
The null-preposition strategy (‘null-prep’), involving the non-use of a (normatively obligatory) preposition in an oblique relative clause (Klein, 1993), is the leading variant in the L1 data, accounting for over half of all tokens. The incidence of null-prep in the data analyzed here is consistent with its reported prevalence in other vernacular Romance varieties (see e.g., Alba de la Fuente & Pato, 2019; Tarallo, 1983).
8 Of particular interest from a contact perspective is that null-prep is reported to be grammatically inadmissible in English (White, 2003, p. 51). A small number of cases (N = 10) nevertheless surface in the L2 data, as exemplified in (9)–(10) below, raising the possibility that a vernacular strategy in L2 speakers’ native French has been transferred to their English.
(9) so then you can go to the questions that you’re evaluating me [ ] (L2/004/345)
(10) like that was like my city that I wanted to go [ ] and like I could see myself living there (L2/017/130)
Only eight L2 speakers avail themselves of null-prep, indicating that it is not a widely diffused option in the L2 speaker sample.
Militating against the interpretation of L1 transfer effects are a number of competing explanations that merit consideration. Firstly, despite its putative grammatical inadmissibility in English (White, 2003), null-prep in oblique relative clauses is sporadically encountered in the TL baseline variety, as illustrated in (11)–(12), where, as far as we can tell, the phenomenon is limited to the relativization of locative PPs (but see Radford, 2019 for a wider range of syntactic environments in English). The existence of a similar, if sparsely instantiated, construction in the TL data casts doubt on the hypothesis that potential transfer effects from French alone explain this phenomenon in L2 speech.
(11) I am presented a lot of times with moral dilemmas with the direction that my local board is going [ ] (TL/026/10688)
(12) yeah but I sh– I’m supposed to pay my insurance in the same province that I live [ ] (TL/029/11900)
Equally damaging to the L1 transfer hypothesis are claims that null-prep in L2 discourse qualifies as a systematic developmental phenomenon that is independent of the syntactic properties of L2 speakers’ native language (Bardovi-Harlig, 1987; Perpiñán & Cardinaletti, 2024).
Bardovi-Harlig’s (1987) seminal study of L2 English, drawing on a large participant pool representing different L1 backgrounds and varying proficiency levels, indicated that before mastering preposition stranding and pied-piping, learners passed through a developmental stage in which they did not produce a preposition (i.e., null-prep) in oblique relative clauses. These findings led Bardovi-Harlig (1987) to propose the following developmental schema: (i) null-prep > (ii) preposition stranding > (iii) pied-piping. In Bardovi-Harlig’s (1987) study, null-prep was mainly produced by L2 learners in the earlier phases of the acquisition process, and Jourdain (1996) likewise reports a strong correlation with proficiency. To the limited extent that null-prep occurs in the current study, it shows no robust correlation with CEPI scores, as speakers in both lower- and higher-proficiency bands make sporadic use of it, which is consistent with the possibility that it is a vestigial developmental strategy.
We conclude that the very small number of null-prep tokens in the L2 data precludes detailed quantitative analysis of its conditioning as well as systematic comparison with its counterpart phenomenon in spoken French. The paucity of evidence at our disposal does not allow us to rule out L1 influence categorically, but the competing explanations reviewed above suggest that a conspiracy of internal and external factors (i.e., ‘multiple causation,’ Thomason, 2001, p. 91) may plausibly account for null-prep in the L2 data analyzed here.
As shown in Table 14, the unrivalled strategy used by L2 speakers to mark oblique relative clauses in English is preposition stranding, mirroring the corresponding choice mechanism in the TL. Such is the strength of preposition stranding in the community TL baseline that native anglophones do not produce a single instance of pied-piping, in spite of claims that this is a common preposition-placement strategy in written and spoken English (Hoffmann, 2005, p. 257). Although pied-piping occurs in L2 speakers’ native French, as shown in Table 13, it does not trigger a single instance of its structural equivalent in their spoken English, contrary to what abundant claims of structural priming in contact scenarios would predict (see e.g., Loebell & Bock, 2003). One reason why pied-piping is not found in either the TL or the L2 dataset is that WH-forms, the only relativizers that license pied-piping in English, occur rarely in non-subject relative clauses (cf. Table 5). Furthermore, analysis of everyday speech (e.g., Levey, 2024) suggests, contra Hoffmann (2005), that preposition stranding is by far the predominant, and by extension the most salient, option in spontaneous spoken English. Indeed, McDaniel et al. (1998, p. 309) go so far as to claim that pied-piping is not a natural option in English, but a prescriptive artefact acquired during schooling.
Could preposition stranding, despite its avowedly minoritarian status in spoken (Canadian) French, have enhanced speakers’ use of that strategy in their English? Such superficial structural parallels, as we have observed repeatedly, are generally believed to promote transfer effects (see e.g., Backus, 2005). Indeed, the very ubiquity of preposition stranding in spoken English is assumed to have triggered, albeit indirectly, the rise of preposition stranding in (Canadian) French as a result of language contact (see e.g., Roberge & Rosen, 1999, p. 154).
Yet as Poplack et al. (2012) caution, superficial form-based correspondences in French and English may be conditioned by very different underlying linguistic processes. This crucial caveat applies to the results of the present study. Of the nine occurrences of preposition stranding in the L1 French data, 56% (5/9) involve a single preposition, avec ‘with’, which competes with just three other forms: dedans ‘in’ (used twice), dessus ‘on’ and de ‘of’ (see also Poplack et al., 2012, p. 216).
By contrast, comparison of the rates of preposition stranding according to the lexical identity of the stranded preposition in the L2 and TL datasets, depicted in Table 15, reveals evidence that is at variance with L1 influence. Both comparison varieties in Table 15 contain a much larger inventory of lexical forms than the French data, likely reflecting the substantially greater prevalence of preposition stranding in English. Granted, the most frequently stranded preposition in the English data is with, as is the case with its French equivalent, avec, but the prepositions to and of also figure among the more commonly stranded forms in the L2 and TL English data, in contrast with the documented aversion of their French counterparts, à ‘to, at’ and de ‘of’, to stranding (Poplack et al., 2012, p. 210).
9 In summary, we conclude that systematic quantitative analysis of the data at our disposal fails to provide unequivocal evidence that L2 speakers’ native French exerts a discernible structural influence on the restrictive relative clauses they produce in English. We concede that the minor phenomenon of null-prep in L2 oblique relative clauses has a counterpart in vernacular French, but other possible sources of null-prep in L2 discourse, including developmental motivations, necessarily constrain our ability to attribute this phenomenon exclusively to cross-linguistic transfer. Similarly, triangulation of quantitative evidence across the datasets included in our study suggests that the most compelling source of preposition stranding in L2 oblique relative clauses lies in the choice mechanisms operating in the TL baseline variety, unaffected by superficial parallels in speakers’ native French.
6. Discussion and Conclusions
The primary motivation for the research reported here arose from our concern to document relative clause constructions in natural production data representing L2 speech, and to characterize the frequency and structural diversity of those constructions in everyday social interactions.
A major caveat to emerge from the present investigation is that the types of English restrictive relative clauses that L2 learners use, as well as their probability distributions, should not be assumed a priori. This information can only be reliably inferred from systematic examination of community-based speech data. Contextualizing syntactic variation in relation to the everyday speech norms of the TL to which L2 learners are exposed provides a crucial check on previous findings, including corpus-based ones, which do not necessarily take into account community-based motivations shaping variable usage. Among the quantitative disparities that have emerged from corpus-based studies of relativization, for example, are claims that object relative clauses are proportionally more common than subject ones in the spoken language (Roland et al., 2007, p. 357). This claim is squarely at odds with our own results: in each of the three natural speech corpora at our disposal, subject relative clauses are the quantitatively preponderant type. Moreover, the gradient difficulties associated with the relativization of other syntactic positions (i.e., object > oblique > genitive) accord with the typological generalizations associated with the noun phrase accessibility hierarchy (Keenan & Comrie, 1977).
A hallmark of the current study lies in the importance it attaches to the structured heterogeneity that restrictive relative clauses manifest in everyday speech, and to the application of a detailed comparative approach to the analysis of that variation. This approach enabled us to gain new insights into the sources of orderly heterogeneity in L2 discourse. As we have observed, the empirical characterization of that heterogeneity, and its implications for a clearer understanding of the L2 acquisition process, have typically received limited attention in many previous experimental and theoretical treatments.
It is worth briefly reviewing possible motivations for that neglect. One reason has to do with inter-disciplinary differences in preferred frameworks of analysis, with experimental researchers typically eschewing the unbalanced datasets often generated by corpus-based studies (see Jaeger, 2010). A more insidious reason, as we have seen, is traceable to a theoretical inclination to rely on highly idealized normative accounts of language as surrogates for the facts of actual usage (Poplack, 2018b). This approach can lead to the erroneous imposition of categoricity on grammatical phenomena (e.g., oblique relative clauses in French) that are in fact inherently variable. It can also inflate the importance of certain constructions, such as pied-piping in English oblique relative clauses (see Levey, 2024), which are vanishingly rare in everyday spontaneous speech.
Because linguists’ intuitions about the existence, frequency and range of occurrence of constructions in speech may not dovetail with the actual patterns that characterize authentic interactions (Bybee, 2008, p. 226; Milroy, 2001, pp. 544–545), access to natural language corpora is indispensable in helping “to bridge the gap between the analysts’ conception of the data and the data themselves” (Ernestus & Baayen, 2011, p. 374). Indeed, the importance of examining “frequency distributions in native discourse to look for possible sources for acquisition patterns” (Shirai & Ozeki, 2007, p. 160) appears to be gaining traction in experimental paradigms.
A quantitative approach not only enables the relevant facts of variable usage to be laid bare; its greatest asset arguably lies in the level of analytical granularity it affords for comparing variable structure across different speaker groups and for capitalizing on structural comparisons as metrics of L2 acquisition.
What have those structural comparisons revealed in the present study? A first major finding is that L2 speakers use the same variants as TL speakers to mark restrictive relative clauses. The discursive frequencies of two restrictive relative markers, that and zero, are largely commensurate with the usage rates of TL speakers, especially in non-subject relative clauses. Only in the case of the subject relative marker who did we find reduced occurrences of this variant in the speech of the L2 cohort vis-à-vis the TL community baseline. None of the possible explanations we reviewed satisfactorily accounts for this discrepancy. We found no correlation between the use of who and English-language proficiency as measured by CEPI scores, or, indeed, any other extralinguistic measure (e.g., level of educational attainment). There is, however, no indication in the data, as demonstrated by our quantitative comparisons of L2 and TL speaker cohorts, that those L2 speakers who make use of who do so in different ways from TL speakers. This bolsters our conviction that we are dealing here with a quantitative rather than a qualitative distinction between L2 and TL speakers, although the root cause of that distinction remains elusive.
Appeals to possible L1 influence on L2 speakers’ reduced use of subject who are inconsistent with what we find in speakers’ native French, where qui, partially equivalent to English who, is the quantitatively dominant marker in the French restrictive relativizer paradigm, potentially rendering it available for cross-linguistic priming. Granted, certain scholars maintain that the typological rarity of WH-relativizers renders them less susceptible to transfer effects (Gisborne, 2024), despite claims in the literature ascribing a key role to language contact in the areal diffusion of interrogative pronouns as relative clause markers (see, e.g., Comrie, 1998; Auderset, 2020). Nor can any convincing explanation be extrapolated from previous research on the L2 acquisition of English relativizers. Inspection of the literature reveals no conspicuous quantitative anomalies in the L2 acquisition of who in other learner groups that parallel our own findings. In fact, Ghafar Samar’s (2000, pp. 117–118) study of L1 Persian speakers’ acquisition of English in the Canadian National Capital Region showed that L2 speakers actually used who at rates exceeding those in the local TL community, despite the absence of corresponding WH-forms in their native Persian.
The expectation that cross-linguistic parallels between L1 and TL forms should have a facilitatory effect on L2 production preferences is not borne out with regard to the subject relativizer who. This may be because English who is not wholly congruent with French qui, owing to differences in the semantic properties encoded by the respective forms. Recent work on cross-linguistic structural priming (Van Lieburg et al., 2023) suggests that if the structures concerned are not wholly similar across languages, this may have an inhibitory effect on production preferences, although the precise mechanisms underpinning inhibition remain to be determined.
10 In spite of the putative susceptibility of relativization strategies to contact effects (e.g., Muysken, 2012), our comparative approach failed to turn up compelling evidence in favour of such a scenario. The only potential candidate for cross-linguistic transfer that we detected, the L2 use of null prepositions (null-prep) in oblique relative clauses, remains unsubstantiated. This construction cannot be tied exclusively to speakers’ L1, French, as there is copious evidence indicating that null-prep is a robust interlanguage phenomenon.
A central finding of our research is that L2 speakers with varying proficiency levels are capable, in the aggregate, of approximating probabilistic constraints on relativizer selection that are operative in the local TL baseline variety. To be sure, the precise configuration of those constraints in the L2 data is not a wholesale facsimile of what is found in the corresponding TL baseline. But the adjustments that are discernible in the L2 variable grammar conditioning relative marker selection are, as far as we can determine, relatively minor. These adjustments are not of the magnitude that would warrant the inference that L2 speakers’ variable system constitutes a profoundly divergent interlanguage grammar in comparison with its TL counterpart.
Our results contribute to the growing body of evidence indicating that L2 speakers are sensitive to, and capable of reproducing, statistical regularities in the TL input to which they are exposed. Of particular importance is the fact that many of the fine-grained patterns depicted in Table 8, Table 9, Table 10 and Table 11 lie so far below the level of conscious awareness that they are highly unlikely to have been explicitly transmitted to learners in formal language-learning contexts. The tendency, visible in Table 10 and Table 11, to omit a relativizer in non-subject relative clauses when the main clause is semantically or propositionally ‘light’ (i.e., a matrix copular clause) is but one example of a non-trivial pattern that has eluded traditional accounts of relativization (see Fox & Thompson, 2007). This pattern is nonetheless firmly entrenched in the TL community grammar and is reproduced by the L2 speakers examined here.
We submit that the principal means by which such implicit patterns are conveyed to L2 learners is vernacular transmission mediated by contact with the local TL community (Sankoff et al., 1997). Of central importance in understanding the propagation of TL vernacular norms to L2 speakers are the social characteristics of the acquisition context. Because L2 acquisition here is situated in a stable, long-standing bilingual community where considerable value is attached to knowledge of both official languages, its social and attitudinal circumstances, abetted by extensive exposure to TL speakers, are highly conducive to advanced L2 attainment. Recall that the personal social networks of many of the francophones we targeted include substantial proportions of anglophones, reaching majority levels for more than half of the L2 speaker sample. The fact that many francophones have close affiliations with the local TL community suggests that L2 speakers have integrative motivations for using English in their anglophone friendship groups. As Sankoff et al. (1997, p. 193) observe, a greater degree of social integration into the TL community leads to greater linguistic integration as well.
We conclude by pointing to some of the limitations of our own study and the need for additional research targeting issues that we have insufficiently addressed. Foremost among those issues is the role of individual differences in the L2 acquisition of restrictive relative clauses. Although our statistical methods were configured to take into account inter-individual patterns of variation in the marking of relative clauses, the nature and extent of intra-individual patterns of variation remain to be determined, as does their longitudinal development.
We stress that the absence of any detailed assessment of individual differences in the current study is not a defect of our methodological approach, but derives instead from the nature of the spontaneous speech data we privileged. As observed earlier, relative clauses are infrequent in running discourse, which severely restricts our ability to mine copious and balanced amounts of data for each speaker. Further sub-categorization of those data into different types of relative clause (i.e., subject, object, oblique), essential for analytical purposes, inevitably results in additional imbalances and skewed token distributions. Witness the fact, for example, that genitive relative clauses are almost absent from the corpora we examined. In the same vein, the analysis of oblique relative clauses resulted in only modest quantities of data for each corpus, precluding any meaningful quantitative assessment of individual patterns of variation.
Although distributional asymmetries and sparse data cells are unavoidable when working with natural language data, experimental paradigms have developed tried-and-tested protocols for eliciting and analyzing syntactic variables that occur at sub-optimal rates in everyday speech. In keeping with the utility of approaching “a single problem with different methods” (Labov, 1972b, pp. 118–119), “triangulating corpus and experimental methodologies complementarily” (Deshors & Gries, 2022, p. 171) would seem to offer fertile avenues for mitigating the impact of the limitations we have identified. Whatever transpires in future investigations into the L2 acquisition of relativization, this line of inquiry can only be enriched by the study of actual interactions situated in their community-based context, as we hope to have shown.