1. Introduction
The acquisition of the Spanish tap–trill contrast (the tap /ɾ/ and the trill /r/) has received considerable attention in recent years (e.g.,
Waltmunson 2005;
Face 2006;
Colantoni and Steele 2008;
Johnson 2008;
Rose 2010;
Olsen 2012;
Kopečková 2014;
Amengual 2016;
Morales Reyes et al. 2017). Findings from studies on the L2 production of the Spanish rhotics by L1 English speakers have revealed clear developmental paths. Specifically, such learners initially produce the English [ɹ] in place of both the tap and the trill (
Waltmunson 2005;
Olsen 2012). The tap is the first rhotic mastered, and is used as a substitute for the trill until the latter is acquired (
Face 2006;
Johnson 2008). This is interesting because the two rhotics are contrastive in intervocalic position, yet English speakers fail to reliably produce this contrast until the trill is acquired. The same tendency to substitute the tap for the trill was also observed in L1 German speakers learning Spanish (
Kopečková 2014). Given that research to date has focused principally on either L1 English or L1 German learners of Spanish, it is unclear to what extent the patterns observed are general. Indeed, is this specific to native speakers of a Germanic language or is it a universal developmental pattern when acquiring the tap–trill contrast? One objective of the present study is thus to examine whether speakers of a typologically distinct L1, namely, Mandarin, demonstrate similar production patterns when acquiring the Spanish tap and trill. Previous work has demonstrated that L1 Mandarin speakers experience difficulty with both the perception (
Ortí Mateu 1990;
Chih 2013) and production (
Ortí Mateu 1990) of the /ɾ–l/ contrast in L2 Spanish, resulting in non-target lateral substitutions in place of the tap. L1 Mandarin speakers also have difficulty producing the trill (
Ortí Mateu 1990). Therefore, while some lateral substitutions are expected in the least proficient speakers when producing /ɾ/ and /r/, it is unknown what developmental patterns these speakers might exhibit as they become more proficient.
While the participants of the present study were native speakers of Mandarin, they also spoke English as an L2. The second objective of the present study is to contribute to our understanding of L1 and L2 cross-linguistic influence (CLI) in L3 acquisition. One of the limitations in L3 phonetics and phonology (L3PP) is the small number of language combinations that have been researched, and also the limited variety of structures that have been analyzed (
Cabrelli Amaro and Wrembel 2016). Consequently, to fully understand the role of the L1 and L2 in L3 production requires additional data from speakers of different language combinations and structures. The present experiment provides data from a unique language triad involving three typologically distinct languages (L1 Sino-Tibetan, L2 Germanic, L3 Romance), on segments that have received little attention (rhotics) in L3PP. Of particular interest here is the extent to which the L2 English plays a role in the acquisition of the L3 Spanish tap. Given the similarity of the English flap and the Spanish tap, and the fact that the English flap has been shown to facilitate the acquisition of the Spanish tap in L1 English–L2 Spanish speakers (
Colantoni and Steele 2008;
Olsen 2012), we might expect the same to be true for L1 Mandarin–L2 English–L3 Spanish speakers. This hypothesis was investigated by examining whether speakers who have acquired the L2 English flap produce the L3 Spanish tap with greater accuracy. The fact that L1 English–L2 Spanish speakers substitute the English [ɹ] for the two Spanish rhotics suggests that the [ɹ] may also be a likely substitute in the Spanish of L1 Mandarin–L2 English–L3 Spanish speakers, if they have indeed acquired the English [ɹ], and if the L2 is a more likely source of CLI, as two models of L3 acquisition (L2 Status Factor model;
Bardel and Falk (
2007,
2012); and the Typological Proximity Model,
Rothman (
2011,
2015)) would predict for the present study’s learners. Therefore, in addition to investigating the potential positive transfer of the L2 English flap, the extent to which transfer of the L2 English [ɹ] is present will also be examined.
The goals of the present study were investigated by analyzing the intervocalic production of the Spanish tap–trill contrast by beginner to intermediate L1 Mandarin–L2 English–L3 Spanish speakers. The participants’ production of the L2 English [ɹ] and [ɾ] was also examined, in order to establish whether the speakers had the ability to produce and therefore transfer L2 sounds into their L3 Spanish.
The current work is organized as follows. In the remainder of the introduction, we will first discuss the relevant theoretical background on phonological development, in order to highlight the role that perception and perceptual categorization can play in shaping learners’ production. A brief overview of L3 acquisition theory, as well as a summary of studies on the L3 acquisition of phonetics and phonology, is also presented. The phonetic and phonological characteristics of the Spanish, Mandarin, and English rhotics are then discussed in detail, in addition to other segments that may be relevant to the acquisition of the Spanish tap and trill. These include the English flap, due to its similarity to the Spanish tap, as well as /l/, given that Mandarin learners of Spanish experience difficulty with the /l–ɾ/ contrast (
Ortí Mateu 1990;
Chih 2013). A summary of previous findings on the acquisition of the Spanish rhotics is subsequently presented, followed by specific predictions regarding the expected production patterns that L1 Mandarin–L2 English–L3 Spanish speakers will display when acquiring the Spanish rhotics. Following the introduction, the current study’s methodology is summarized. The results are then presented, and the manuscript concludes with a discussion of the results’ implications for L3 acquisition and development.
1.1. Theoretical Background on L2 and L3 Phonological Development
Although the present study focuses on production, perceptual categorization is expected to play an important role and can help explain and predict the production patterns of non-native speakers. The objective of this section is to briefly present
Best and Tyler’s (
2007) Perceptual Assimilation Model (PAM-L2), which will be used to (1) motivate which L1 Mandarin and L2 English segments, including which characteristics of these segments, may play a role in the acquisition of the Spanish tap–trill contrast (
Section 1.2); (2) partially explain the similarities and differences observed in the production patterns of L1 Mandarin and L1 English learners of Spanish (
Section 1.3); and (3) make predictions concerning L1 Mandarin–L2 English–L3 Spanish speakers’ production of the Spanish rhotics (
Section 1.4). The PAM-L2 is expected to be the most appropriate model for the present study, given that it posits a role for phonetic, phonological, and orthographic similarity during the categorization of sounds. While the model was designed specifically for perception, the predictions it makes for the perceptual assimilation of sounds are expected to partially explain the production patterns observed in non-native speakers. Note that while the PAM-L2 was developed for L2 speech perception, one can potentially apply it to any non-native speech learning context, including L3 acquisition. Therefore, an additional goal of this section is to discuss how the PAM-L2 might be applied to L3 acquisition. While PAM-L2 has not been previously applied to an L3 learning context
1, the Speech Learning Model (
Flege 1995), which is similar to the PAM-L2, has been both discussed and applied to L3 acquisition (
Sypiańska 2016a;
Lipińska 2017). However, one of the limitations of the SLM is that it cannot account for orthographic or phonological similarity, which have been shown to play a crucial role in the acquisition of rhotics (e.g.,
Face 2006;
Johnson 2008). Given that the PAM-L2 considers not only phonetic but also phonological and orthographic similarity, it is expected to be a more suitable model for the acquisition of the Spanish tap and trill.
According to the PAM-L2 (
Best and Tyler 2007), if two contrasting non-native sounds are very similar to an L1 sound, then acquisition of that contrast will be difficult, as the two non-native sounds may be assimilated to a single native sound. This has consequences for production: if learners perceptually assimilate a non-native sound to a native sound, then the non-native sound will be produced incorrectly using L1 gestural patterns. The PAM-L2 acknowledges that perceptual assimilation can happen at either the phonetic or phonological level. At the phonetic level, perceptual assimilation occurs if two sounds are similar acoustically, arising from similar articulatory gestures (e.g., manner of articulation, place of articulation, voicing). At the phonological level, perceptual assimilation can occur if sounds have similar phonotactics or even if they are represented by the same grapheme. The example given in
Best and Tyler (
2007) exemplifying phonological similarity is of French /ʁ/ and English /ɹ/, which tend to be perceptually assimilated, despite the fact that they are phonetically very different. However, both are represented by the same grapheme <r>, and behave similarly in terms of their syllable structure and phonotactics (e.g., both can occur syllable initially, finally, and in consonant clusters). These similarities (phonological and orthographical) cause English-speaking learners to perceptually categorize the two rhotics as a single phonological category, and substitute /ʁ/ with [ɹ] in French productions; similar substitutions would be expected for French-speaking learners of English.
To summarize, perceptual assimilation can be due to phonetic, phonological and/or orthographic similarity, all of which may lead to L1 influence in production. The example of the French and English rhotic is particularly relevant to this paper. As we will see in
Section 1.2, while the two Spanish rhotics are phonetically distinct, they share some phonological as well as orthographic similarities with the English rhotic and, to a lesser extent, the Mandarin rhotic. Moreover, while the contrast of interest in this paper is the acquisition of the L3 Spanish tap and trill, the Spanish /l–ɾ/ contrast is also relevant for L1 Mandarin learners of Spanish, because they tend to perceptually assimilate the Spanish /l/ and /ɾ/ to Mandarin /l/ (
Ortí Mateu 1990;
Chih 2013).
While the PAM-L2 was developed to account for L2 acquisition, the model can be applied to L3 acquisition. The crucial difference is that both L1 and L2 sounds are potential sources for perceptual assimilation. Therefore, when considering the PAM-L2 in an L3 context, two possibilities arise: (1) The most similar sound from either the L1 or L2 will be the most likely source of perceptual assimilation for an L3 sound; and (2) sounds from one of the two previously learned languages will be more likely sources of perceptual assimilation. In order to determine which of the two proposed scenarios ((1) or (2)) is the most likely, we will consider three models of L3 acquisition, as well as findings from studies investigating the acquisition of L3PP.
Three models of L3/Ln acquisition that have been tested extensively are the L2 Status Factor model (L2SF) (
Bardel and Falk 2007,
2012), the Typological Proximity Model (TPM) (
Rothman 2011,
2015), and the Cumulative Enhancement Model (CEM) (
Flynn et al. 2004). The L2SF predicts initial L2 transfer
2, regardless of typological similarity. This is due to the similar process by which post-puberty languages are acquired (i.e., using declarative memory), as opposed to the process by which languages are acquired before puberty (i.e., using procedural memory). In contrast, the TPM predicts that learners will initially transfer the (psycho)typologically
3 most similar language, which is established by an L3 learner’s linguistic parser. The linguistic parser (subconsciously) compares the L1 and L2 to the L3 during the initial stages of acquisition. Up to four linguistic domains are analyzed hierarchically: lexicon -> phonology -> functional morphology -> syntax. The analysis continues until sufficient similarity is encountered between the L3 and either the L1 or the L2 at one of the domains (for further details, see
Rothman 2015). Therefore, if the lexicon of either the L1 or L2 were clearly more similar to the L3, the parser would stop the comparison at this point. However, if no clear similarities were found, the parser would continue the analysis and begin comparing the L1 and L2 phonologies to the L3 phonology, followed by the functional morphology and then the syntax, if necessary. Once the parser determines which language is the most similar, that language is transferred to the L3. The more similar language therefore functions as the initial state of the L3, analogous to the Full Transfer Full Access proposal of
Schwartz and Sprouse (
1996) in L2 acquisition. According to the TPM’s predictions, in the present study’s language combination, the linguistic parser would only need to compare the lexicons of the L1 Mandarin, L2 English, and L3 Spanish, due to the clear similarities of the L2 English and L3 Spanish lexicons. Such similarity is not present between the L1 and L3 lexicons, and thus we should assume that the parser would conclude that the L2 English is the more similar language. As a result, the L2 English phonological system would be the initial state of the learner’s L3 Spanish. The third L3 model, the CEM (
Flynn et al. 2004), does not assume that transfer originates from only one of the previously learned languages. The model proposes that learning a language is a cumulative process, and that any previously learned language could influence the acquisition of a subsequent language. Crucially, however, the influence of previously learned languages is expected to be either facilitative or neutral, and therefore negative transfer is not expected. In the context of the present study, the CEM would assume that either the L1 Mandarin or L2 English could be a potential source of transfer, if facilitative in nature. However, as we will see in
Section 1.1,
Section 1.2 and
Section 1.3, the only segment that is expected to result in positive transfer is the L2 English flap
4. Therefore, while the CEM can potentially account for both L1 and L2 transfer, the only transfer it would predict for the present study’s learners is L2 transfer of the English flap.
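The TPM's hierarchical comparison described above can be illustrated with a minimal sketch. This is not an implementation proposed by Rothman (2015); the numeric similarity scores, the `threshold` parameter, and the function name `select_transfer_source` are hypothetical devices used only to make the stopping logic explicit (the parser checks one domain at a time and stops at the first domain where one language is clearly more similar to the L3).

```python
# Domains in the order the TPM's parser is described as checking them.
DOMAINS = ["lexicon", "phonology", "functional morphology", "syntax"]


def select_transfer_source(similarity, threshold=0.2):
    """Return (language, domain) for the first domain with a clear winner.

    `similarity` maps a domain name to a pair of hypothetical scores
    (L1-to-L3 similarity, L2-to-L3 similarity), each between 0 and 1.
    `threshold` is an assumed minimum difference for "sufficient" similarity;
    the TPM itself does not specify a numeric criterion.
    """
    for domain in DOMAINS:
        if domain not in similarity:
            continue  # no information for this domain; move to the next one
        l1_score, l2_score = similarity[domain]
        if abs(l1_score - l2_score) >= threshold:
            winner = "L1" if l1_score > l2_score else "L2"
            return winner, domain  # the parser stops at this domain
    return None, None  # no domain yielded a clear winner


# Illustrative (invented) scores for L1 Mandarin, L2 English, L3 Spanish:
# the L2 English lexicon is assumed to be clearly closer to Spanish.
scores = {"lexicon": (0.1, 0.6)}
source, domain = select_transfer_source(scores)
print(source, domain)
```

Under these assumed scores the comparison stops at the lexicon and selects the L2, which mirrors the prediction stated above for the present study's language combination.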
Note that the three L3 models were designed to determine the expected source of syntactic/morphosyntactic transfer and, in the case of the TPM, are specific to initial state learners. Consequently, the models may not apply to (1) phonetic and phonological transfer; or (2) experienced L3 speakers, and therefore are not necessarily applicable in the present study. Nevertheless, all three models would predict L2 transfer; thus, the L2 should be considered a likely source of CLI. Moreover, the predictions of the TPM demonstrate that despite English and Spanish being from typologically distinct language families, an L1 Mandarin–L2 English–L3 Spanish learner would likely consider their L2 English to be more similar to Spanish than their L1 Mandarin.
Given that the L3 models were developed to explain morphosyntactic transfer, the question arises of whether they apply to L3PP. Research investigating whether the L1 or L2 is a more likely source of CLI in L3PP has revealed inconsistent results. In contrast to the predictions of the L2SF and the TPM, some recent findings suggest that the L1 (dominant) language may be a stronger source of CLI in learners. For example,
Kopečková (
2014) investigated the acquisition of the Spanish rhotics by L1 German–L2 English–L3 Spanish speakers (11- and 12-year-old children). With respect to the L3 trill production, the author found primarily either L1 transfer (German uvular fricative or approximant productions) or what the author refers to as interlanguage productions, which consisted of either L3 tap substitutions or uvular trills (analyzed as a combination of an L1 and L3 segment). Likewise, regarding the L3 tap, the L1 proved to be the strongest source of transfer, with 33.3% of productions displaying clear L1 transfer (uvular fricative or approximant productions). However, the learners also produced target L3 taps approximately 50% of the time. It is possible that the target L3 tap productions were the result of positive transfer of the L2 English flap, but the study did not examine the L2 production of the flap. Consequently, it is not possible to know whether this was the case. Nevertheless, the results reported by Kopečková (
2014) reveal that for her learners, the L1 was a stronger source of CLI. Similar findings were reported in
Pyun (
2005),
Llama and Cardoso (
2018), and
Llama and López-Morelos (
2016).
Pyun (
2005) investigated the acquisition of phonological processes (e.g., unreleased obstruents, consonant cluster simplification) in L1 Korean–L2 English–L3 Swedish speakers. Even though English is more similar to Swedish, CLI from the L1 Korean was more prevalent (although L2 English influence was also observed to a lesser extent).
Llama and Cardoso (
2018) examined VOT production in the L3 Spanish of L1 English–L2 French and L1 French–L2 English adults. For both groups, prevailing influence from the L1 was observed. However, for the L1 English–L2 French group, the L1 English influenced the L2 French productions, and it was not clear to what extent the L1, or both the L1 and L2, influenced the L3 Spanish VOT values. In a similar study,
Llama and López-Morelos (
2016) examined the VOT realization of voiceless stops by English-dominant Spanish heritage speakers acquiring L3 French. The L3 French VOT values patterned according to the speakers’ English VOT values, and not the speakers’ more similar Spanish VOT values. While the studies summarized here suggest that the dominant language (generally the L1) may have a privileged role, evidence has also been found for a combination of CLI from both the L1 and the L2 (
Blank and Zimmer 2009;
Wrembel 2014;
Sypiańska 2016b), and from primarily the L2 (e.g.,
Hammarberg and Hammarberg 2005;
Wrembel 2010;
Llama et al. 2010;
Chang 2015). Some of the studies observing primarily L2 CLI involved L3s that were more similar to the L2 than the L1, and therefore support both the L2SF and the TPM. For example,
Wrembel (
2010) examined CLI in the L3 English of L1 Polish–L2 German speakers, and found that the L2 was initially a stronger source of CLI. Interestingly, the author also observed that as the learner became more proficient in the L3, the L2 influence diminished, whereas the L1 influence became more prevalent. This same trend was observed in
Hammarberg and Hammarberg (
2005), who investigated CLI in one L1 English–L2 German–L3 Swedish speaker. These two studies suggest that the source of influence may vary as learners become more proficient in the L3. While some previous research that observed primarily L2 transfer consisted of language groupings in which the L2 was more similar to the L3 than the L1, other research has observed stronger transfer from the L2, even when the L2 was clearly less similar to the L3 than the L1 (e.g., L1 French–L2 English–L3 Spanish;
Llama et al. 2010), or when all three languages were from different typological families (L1 English–L2 French–L3 Japanese;
Tremblay 2007).
Given the variable findings in research on L3PP, it remains unclear in what contexts L3 models might apply to the production of L3 segments, and whether the L1 or the L2 might be a stronger source of CLI. However, results from studies in L3PP have generally found that either the L1 or the L2 is a more likely source of CLI, as opposed to both being equally likely sources. This suggests that perceptual assimilation is not determined on a segment-by-segment basis according to the most similar previously learned L1 or L2 sound, thus potentially ruling out scenario (1) discussed previously. Scenario (2) may be more likely, with perceptual assimilation being most probable from the segments of one of the two previously acquired languages, either the L1 or the L2. Specific predictions are laid out in
Section 1.4.
In the following section, the phonetic, phonological, and orthographic characteristics of the two Spanish rhotics will be summarized. Other L1 Mandarin and L2 English segments that could play a role in L1 Mandarin–L2 English-speaking learners’ perceptual categorization and, thus, production, will also be discussed.
1.2. Relevant Phonetic and Phonological Characteristics of Spanish, Mandarin, and English
In this section, similarities and differences between relevant English, Mandarin, and Spanish consonants are discussed. The goal is to provide a detailed summary of the knowledge that an L1 Mandarin–L2 English speaker has, and thus demonstrate what needs to be acquired to master the Spanish rhotics, and also to highlight what may influence (positively or negatively) acquisition. The characteristics of the Spanish rhotics will be discussed first, followed by the relevant English then Mandarin segments.
1.2.1. Spanish
Spanish has two rhotic consonants, the tap /ɾ/ and the trill /r/. The tap is a voiced segment produced with a rapid, brief contact between the tongue tip and alveolar ridge (
Martínez Celdrán and Fernández Planas 2007), averaging only approximately 23 ms, and very rarely exceeding 30 ms (
Blecua 2001). The articulation of the trill similarly involves rapid contact of the tongue against the alveolar ridge. The primary difference is that the trill is produced with two or more rapid closures. The duration of the entire segment in intervocalic position is on average 82–88 ms (
Quilis 1993). Like the tap, the trill is voiced (
Navarro Tomás 1957), although voiceless realizations are also observed. For example,
Lewis (
2004) observed voiceless word initial trills with frequency of occurrence rates ranging from 5% to 40%, depending on the final segment of the preceding word.
Waltmunson (
2005) observed that word medial trills were sometimes partially devoiced (1.6/2, where 2 = fully voiced, 1 = partially voiced, 0 = voiceless).
Phonologically, both Spanish rhotics occur in intervocalic position where they are contrastive. There are approximately 30 minimal pairs in Spanish (e.g., pero /'peɾo/ ‘but’ versus perro /'pero/ ‘dog’; caro /'kaɾo/ ‘expensive’ versus carro /'karo/ ‘car’) (
Willis and Bradley 2008). However, the two members of such pairs are generally from different lexical categories (e.g., conjunction versus noun and adjective versus noun in the examples above). Thus, a tap being produced in place of a trill, or vice versa, is unlikely to result in miscommunication. While the focus of the present paper is on intervocalic rhotics, it is important to note that the rhotics also appear in other positions in the word where they are not contrastive. The tap appears in word- and syllable-final positions (e.g., por /poɾ/ ‘for’; parte /'paɾ.te/ ‘part’) as well as in syllable onset clusters (e.g., problema /pɾo.'ble.ma/ ‘problem’), whereas the trill is found word initially (e.g., rata /'ra.ta/ ‘rat’) and syllable initially (e.g., honra /'on.ra/ ‘honour’) after the consonants /l/ and /n/ (
Hualde 2014). In intervocalic position the tap is represented orthographically as <r> whereas the trill is represented as <rr>. However, both rhotics are represented by a single <r> in non-contrastive word positions (e.g., parte /'paɾ.te/ ‘part’ versus rata /'ra.ta/ ‘rat’) and thus share the same grapheme with English.
While the canonical realization of the /ɾ/ and /r/ phonemes is a tap and an alveolar trill, respectively, a significant amount of variation is present. In different regions of Latin America, rhotics may be produced with assibilation, as approximants, with a dorsal place of articulation, or as laterals (
Hualde 2014). In certain regions of Spain, coda /ɾ/ is realized as a lateral, fricative, or is elided (
Samper Padilla 2011), whereas the trill is produced as a fricative, approximant, or with r-coloring (
Henriksen and Willis 2010). A significant amount of intra-speaker variation is also observed, due to stylistic differences and, in the case of the trill, due to articulatory complexity (
Hualde 2014). For example, both rhotics are sometimes produced without a complete closure in the vocal tract, resulting in fricative productions (
Blecua 2001). Other variables, such as social factors, can also play a role, with males producing fewer occlusions when realizing trills compared to females (
Henriksen 2014). Given that there is variability in rhotic production, we should assume that the learners of the present study have been exposed to multiple allophones of the rhotic phonemes. Nevertheless, the canonical tap–trill realizations are the most common variants, and are also the variants that are taught in educational settings.
In sum, while the two Spanish rhotics are phonetically quite different, they share some similarities in that both are produced with an alveolar place of articulation, appear intervocalically, and are represented with the same graphemes. These similarities may cause learners to perceptually categorize the tap and trill as a single segment, which is what occurs in L1 English–L2 Spanish speakers (at least in production; see
Section 1.3 for details). In the following sections, we will see that while the English and Mandarin rhotics share some similarities themselves, they are both phonetically different segments compared to the Spanish rhotics. However, the English and Mandarin rhotics also share some characteristics with the tap and trill, such as their graphemic representations and, especially with respect to the English /ɹ/, their phonotactics. These shared characteristics can have important implications in terms of the substitutions that might surface in production.
1.2.2. Mandarin
Mandarin Chinese has one rhotic, a voiced apical post-alveolar approximant /ɹ̺/ (
Lee 1999), with an average duration of 95 ms (
Smith 2010). There is some debate as to the exact nature of the Mandarin rhotic.
Duanmu (
2000) states that some consider it to be a voiced retroflex fricative /ʐ/, but he presents two reasons why it should be considered an approximant. First, the consonant has very little frication. Second, Mandarin has no other voiced obstruents, so it would be ‘phonologically odd’ to have a single voiced obstruent.
Lee (
1999), on the other hand, by means of an articulatory and acoustic study, determined that the rhotic has a post-alveolar articulation (not retroflex). He also found that it is produced with no frication, thus it is clearly an approximant. Note, however, that Lee’s study was based on just four speakers, and only involved word initial /ɹ̺/ produced in limited contexts (the number of different stimuli used was not stated). Consequently, the study may not capture all of the variability in production of the Mandarin rhotic. Finally,
Cerini (
2013) also argues that the rhotic is an approximant, because it is the most frequent realization, but admits that a fricative variant exists (in addition to a hybrid of both). It will therefore be assumed that speakers may produce either variant.
Phonologically, Mandarin /ɹ̺/ occurs in syllable onset position, and, similar to the Spanish rhotics, can occur in intervocalic position (e.g., lǎorén /lauɹ̺ən/ ‘old people’). It can also occur syllable finally in affixed forms as a syllabic consonant (e.g., gao-gaor-de [kau-kaɚ-də] ‘rather tall’;
Duanmu 2007). The Mandarin rhotic does not occur in consonant clusters, which are lacking in general from the language. Mandarin /ɹ̺/ is transcribed orthographically as <r> in Pinyin (
Duanmu 2007), which is the phonetically transparent, Romanized writing system of Chinese. Mandarin speakers learn the Pinyin system when they begin elementary school. Children become very competent at reading Pinyin, and every time they learn a new character in school, it is accompanied by the corresponding Pinyin representation so that they know how to pronounce the new character (
Hanley 2005). Pinyin is also frequently used by adults to type on computers and smartphones. While Chinese characters are the dominant script, and the writing system to which Chinese speakers will have the greatest amount of exposure, the fact that they also use a Romanized system to some extent indicates that orthographic transfer in the production of Spanish should be considered a possibility in L1 Mandarin speakers. However, we might also expect that orthographic transfer is less likely in L1 Mandarin compared to L1 English speakers, given that they have had less exposure to a Roman writing system.
The characteristics of the Mandarin rhotic reveal that it does not share many characteristics with the Spanish rhotics. Phonetically, they are very different; phonologically, they are only somewhat similar; specifically, the Mandarin rhotic also occurs in initial and medial positions, but does not occur in clusters, or codas. Finally, while the Mandarin and Spanish rhotics are both represented orthographically with <r>, the extent to which orthography will play a role in perceptual categorization is expected to be less in Mandarin compared to English learners of Spanish. In view of these differences, the likelihood of any perceptual categorization between Mandarin /ɹ̺/ and Spanish /ɾ/ or /r/ is moderate, and it is conceivable that L1 Mandarin beginner learners of Spanish will substitute some other segment for either or both of the rhotics. The most likely of these with respect to the tap is the Mandarin dento-alveolar lateral. This hypothesis is supported by
Ortí Mateu (
1990), who observed that L1 Mandarin–L2 Spanish speakers tend to produce the lateral in place of /ɾ/. The lateral /l/ has a place of articulation (dental) similar to that of the tap (alveolar), and is often confused perceptually with /ɾ/ by Mandarin speakers (
Ortí Mateu 1990;
Chih 2013). The voiced stop [d], which is an allophone of /t/ that can surface in unstressed syllables (
Duanmu 2007), is also a possible candidate for perceptual assimilation, as it shares some similarities with the Spanish tap (place of articulation and voicing).
Given that the speakers of this study spoke English as an L2, English sounds also present possible sources of influence. The most likely candidates for substitutions will now be discussed.
1.2.3. English
The speakers of the present study were living in Eastern Canada, which was the dialect to which they had the greatest amount of exposure. This section will therefore focus on characteristics specific to the Canadian dialect.
English has one rhotic, a voiced retroflex or bunched-tongue approximant /ɹ/ (
Ladefoged and Maddieson 1996) with an average duration of 95 ms (
Smith 2010). It occurs in word initial (e.g., run /ɹʌn/), medial (e.g., merry /mɛ.ɹi/), and final positions (e.g., poor /pɔɹ/) as well as in stop-rhotic clusters (e.g., tree /tɹi/). The English /ɹ/ is also represented orthographically by <r> or <rr> (e.g., <correct>). Thus, while English /ɹ/ is phonetically quite different from the Spanish rhotics, both native and non-native speakers of English learning Spanish may recognize that /ɹ/ appears in similar positions, and is represented by the same grapheme. Consequently, perceptual categorization of the Spanish rhotics with the English rhotic is a possibility.
In addition to /ɹ/, English has a flap allophone [ɾ], which surfaces in place of intervocalic /t/ or /d/ after a stressed vowel (
Ladefoged and Maddieson 1996), such as in water /wɑtəɹ/ ['wɑ.ɾɚ]. A flap is considered to be nearly identical acoustically to a tap, but differs slightly in articulation. The English flap is a very brief (10–40 ms), voiced segment (
Zue and Laferriere 1979). While there are slight articulatory differences between a tap and a flap
5, four variants in English have been reported for the flap allophone, one of which is a tap (
Derrick and Gick 2011). This variation in realizations is only partially dependent on context, as speakers tend to use more than one variant even when producing the same word. These findings suggest that English speakers may already have experience producing taps
6. Given the acoustic similarity of the tap and the flap, the possibility exists that they will be perceptually categorized as the same sound; moreover, given the articulatory similarity of the tap and flap, if speakers can produce a flap, then this should facilitate the production of the tap, because learners will already have acquired the principal articulatory gestures required for the latter’s production.
As with the Spanish rhotics, the flap and the English /ɹ/ are produced with some variation across dialects. For example, while intervocalic, post-tonic /t/ and /d/ are flapped in most Canadian and American varieties, they are produced as [t] and [d] in Standard Southern British English (
Yavas 2016), or as [ʔ] (
Wells 1982), whereas /ɹ/ is produced with an alveolar place of articulation, or can be elided in certain positions (
Yavas 2016). In other UK dialects, /ɹ/ may be realized as [ʁ] or [ɾ]; the [ɾ] realization is also common in South African English (
McMahon 2002). While the goal here is not to describe all of the variation present in English, it is important to point out that the learners in the present study have almost certainly experienced some variability in English production, and have potentially been taught different ways of pronouncing the English /ɹ/ and post-tonic intervocalic /t/ and /d/.
A summary of the Spanish, Mandarin, and English rhotics is displayed in
Table 1. Three main characteristics should be highlighted. First, while English /ɹ/ does not resemble the Spanish rhotics from an acoustic or articulatory perspective, phonotactically the two languages’ rhotics are very similar. This, along with the fact that the English and Spanish rhotics are represented with the graphemes <r> and <rr>, means that perceptual categorization of the tap and trill with /ɹ/ by L1 Mandarin–L2 English–L3 Spanish speakers is a possibility, especially in the more proficient English speakers. The Spanish rhotics could also be perceptually categorized with Mandarin /ɹ̺/, but this segment’s phonotactics are less similar, thus perceptual categorization may be less likely to occur with Mandarin /ɹ̺/ compared to the English /ɹ/. Second, the English flap and Spanish tap are acoustically and articulatorily nearly identical. As a result, perceptual categorization between these two segments is likely if speakers have enough experience with English. Moreover, if speakers have the ability to produce a flap, they should also be able to produce a tap, given the articulatory similarities. Third, speakers may also perceive the tap to be more similar to /d/ or, for less proficient speakers of English, /l/. As a result, these two segments are additional candidates for perceptual assimilation, and thus may surface in production. Indeed, the presence of /l/ productions in L1 Mandarin–L2 Spanish speakers has been observed, which will now be discussed in detail. This will then be followed by a summary of the literature on L1 English–L2 Spanish speakers.
1.3. Previous Studies on the Acquisition of the Rhotics
1.3.1. The Acquisition of Rhotics by Mandarin Speakers
Various recent papers focused on teaching methods and evaluation of L2 Spanish learners’ pronunciation have identified that L1 Mandarin speakers experience difficulty with the Spanish /l–ɾ/ contrast, both in perception and production, as well as with the production of /r/ (
Bertola de Urgorri 2009;
Cortés Moreno 2014;
Igarreta Fernández 2015). However, few experimental perception or production studies have examined these patterns in detail.
Ortí Mateu (
1990) investigated the perception and production of several Spanish minimal pairs by 120 early L1 Mandarin–L2 Spanish learners, who were studying Spanish at a university in China. To determine the learners’ ability to perceptually discriminate the /l–ɾ/, /l–r/, and /ɾ–r/ contrasts, participants performed a forced choice word identification task. When presented with an /l–ɾ/ minimal pair (e.g., /pala/ ‘stick’–/paɾa/ ‘for’), they only identified the correct item 59% of the time, essentially at chance. Note that it was not specified if the errors occurred when presented with the /ɾ/ token, the /l/ token, or both. Learners had no difficulty with the /r–l/ and /r–ɾ/ minimal pairs, as they selected the correct word more than 90% of the time.
Ortí Mateu (
1990) also tested the participants’ production ability via a repetition/word reading task. Participants heard a series of minimal pairs in addition to being presented with the written forms of each pair. They were then required to produce each word of the minimal pair (e.g., /pelo/ ‘hair’–/peɾo/ ‘but’; /koɾo/ ‘choir’–/koro/ ‘I run’). In the production task, the /ɾ–l/ contrast was also difficult, with non-target production occurring 33% of the time. An error-type analysis was not presented; it is therefore not clear whether participants experienced difficulty with the production of /l/, /ɾ/, or both. However, Ortí Mateu does state in a summary that [l] was generally produced in place of /ɾ/, so presumably the errors surfaced when attempting to produce the words with /ɾ/. Moreover, /l/ is a phoneme in Mandarin, so it is unlikely that it resulted in difficulty in production. The author also found that the /r–ɾ/ contrast was difficult in production (51% error rate). Again, it is not clear which of the two sounds in this contrast was (more) difficult, but the author also stated in the production summary that participants were often unable to produce the trill. We can thus infer that production of the trill was difficult for these speakers.
Another experimental study that analyzed the perception of rhotic and lateral contrasts by L1 Mandarin learners of Spanish is
Chih (
2013), who investigated the perception of eight minimal pairs in Spanish including /l–ɾ/ and /r–ɾ/ by third- and fourth-year L2 Spanish students in Taiwan using a forced-choice written-word identification task. Participants were given a sheet of paper with the written forms of 10 minimal pairs per contrast. They then heard one word from each pair and had to circle the word they heard. Some difficulty was observed with /l–ɾ/ (26% error rate); the /r–ɾ/ contrast proved to be easier (12% error rate). The results are comparable to those reported in
Ortí Mateu (
1990), and taken together, these studies corroborate the observations discussed in the pedagogical literature (that Mandarin speakers have difficulty with the /l–ɾ/ contrast in Spanish). Overall, while it is evident that Mandarin speakers of Spanish have difficulty with the perception and production of the /ɾ–l/ contrast, it is unclear at what point during acquisition this contrast is acquired; moreover, our understanding of trill production by Mandarin speakers is negligible. Previous work has revealed that the trill is a difficult segment for Mandarin speakers; however, it is unknown what learners may produce when attempting a trill, and whether they make a contrast between the tap and trill in production. Consequently, a more controlled experimental study is crucial for determining difficulty and developmental patterns of these sounds.
While very little previous work has investigated the acquisition of the Spanish tap–trill contrast by native Mandarin speakers,
Falahati (
2015) examined the production of the Persian rhotic by L1 Mandarin speakers. When summarizing the previous literature, the author notes that native Persian speakers have been shown to produce several different variants for the rhotic, including taps, approximants, fricatives, and trills. It is therefore difficult to know what the target is for learners of the Persian rhotic. Nevertheless, the author found that all seven of the L1 Mandarin speakers produced some trills, although they primarily produced either taps or approximants. Two of the learners also produced some laterals. The learners in this study were L3 Persian speakers, and spoke English as an L2. However, the role of the L2 was not examined.
In the next section, previous research on L1 English–L2 Spanish speakers will be presented, which will reveal recurrent developmental patterns reported for the L2 acquisition of the Spanish rhotic contrast, as well as demonstrate the facilitative role of the English flap when acquiring the tap. These findings help to shed light on what developmental patterns L1 Mandarin–L2 English–L3 Spanish speakers may follow, because they illustrate what role English could play in the acquisition of Spanish. Previous work investigating the acquisition of the Spanish rhotics by speakers of L1s other than English and Mandarin will also be summarized.
1.3.2. The Acquisition of Spanish Rhotics by Speakers of Other Native Languages
A significant amount of previous research reveals clear, consistent acquisition patterns in the L2 acquisition of the Spanish rhotics by native English speakers. Regarding the tap,
Waltmunson (
2005) and
Face (
2006) report very similar patterns of development. L2 learners first substitute the English [ɹ]. With greater proficiency, they tend to produce a mix of the English [ɹ] and the tap. Finally, at more advanced levels, learners master the tap. Similar patterns have been observed with the acquisition of the trill. The English [ɹ] is by far the most commonly produced segment by early learners. As the speakers become more advanced, those who have acquired the tap tend to produce it in place of the trill, before eventually acquiring the trill, which is particularly difficult (
Waltmunson 2005;
Face 2006;
Johnson 2008). For example,
Johnson (
2008) found that Spanish students who had taken four semesters of Spanish were no more accurate than first-semester students, and it was not until the third or fourth year of learning that speakers began producing trills consistently. The most advanced speakers in the study (Spanish majors and graduate students) trilled at frequencies similar to those of native speakers, but even with L2 speakers of this level of proficiency, a small proportion of the speakers could not produce a trill (~5%; estimated from a histogram, exact numbers were not provided). Similarly, in
Waltmunson (
2005), one of 11 Spanish instructors with over 10 years of Spanish speaking experience was still unable to produce the trill. Thus, while some similarities are observed in the acquisition of the two Spanish rhotics, the trill is acquired much later than the tap; until the trill is acquired, it is usually substituted with the tap (or [ɹ] in less proficient learners). The fact that L1 English–L2 Spanish speakers produce [ɹ] in place of both rhotics is likely due to some form of perceptual categorization, as a result of similar orthographies and phonotactics.
A variable that has received relatively little attention but that merits consideration is whether the acquisition of the Spanish tap is conditioned by stress (i.e., whether the target rhotic is in a stressed or unstressed syllable). As was mentioned in
Section 1.2, the English flap is an allophone of /t/ or /d/ that surfaces after a stressed syllable. The relative success that L1 English speakers have with the Spanish tap has been argued to be partially due to the flap allophone (
Colantoni and Steele 2008;
Olsen 2012). If this is the case, tap production in post-tonic syllables should be easier (and thus more accurate) for English speakers due to L1 influence; this is what was found in
Olsen (
2012). The L1 English–L2 Spanish speakers in his study were more accurate producing the tap in post-tonic (62% accuracy) compared to tonic position (45%). We might expect the same asymmetry in accuracy to be present in L1 Mandarin–L2 English–L3 Spanish speakers, if mastery of the L2 English flap facilitates acquisition of the L3 Spanish tap and such learners have acquired the former segment.
While the acquisition of the Spanish tap–trill contrast by L1 English speakers has been well-researched, less work has investigated the acquisition of the tap–trill contrast by speakers of other L1s. As was mentioned in
Section 1.1, Kopečková (2014) investigated the production of the L3 Spanish tap and trill by L1 German–L2 English children, aged 11–12. She found that the tap was used as a substitute for the trill. Moreover, she found that the L1 was a stronger source of transfer than the L2. In another study with a language pairing similar to the present study,
Morales Reyes et al. (
2017) investigated the acquisition of intervocalic taps and trills by six L1 Korean–L2 English–L3 Spanish, 19 L1 English–L2 Spanish, and nine L1 Spanish control children (ages = 4–8). With respect to the tap, the L3 Spanish learners produced primarily taps or tap-like realizations (80.0% of the time), as well as a small proportion of other segments, such as alveolar approximants (8.5%) and trill-like segments consisting of a tap followed by frication (5.7%). No [ɹ] substitutes were produced by either the L2 or L3 Spanish learners. With respect to the trill, the L3 Spanish speakers tended to produce either taps followed by a fricative portion (similar to a trill, with frication instead of a clear closure; 40.0% occurrence) or fricatives (36.6%). Only one trill was produced. Interestingly, both groups of learners produced almost no tap substitutions, even though such substitutions would be expected given the pattern observed in the older children in
Kopečková’s (
2014) study, and the results from studies on L1 English–L2 Spanish adults (e.g.,
Face 2006;
Johnson 2008). Overall, the
Morales Reyes et al. (
2017) study revealed very little transfer from the L1 or L2 in the L3 speakers. These results contrast with those from other previous studies on the acquisition of rhotics. However, the differences observed should be expected to some extent, given the very different learner profiles (i.e., young children compared to older children and adults). Moreover, the lack of transfer observed in the L2 and L3 Spanish learners in the
Morales Reyes et al. (
2017) study may have been partly due to the fact that the young children would have had relatively little exposure to orthography. As will be discussed in the next section, orthography plays a key role in the acquisition of non-native segments.
1.3.3. Orthographic Transfer in L2 Acquisition
Previous work on the L2 acquisition of the Spanish rhotics has revealed that orthographic transfer is a likely contributor to the English [ɹ] substitutes observed in production. This is supported by the literature on orthographic transfer. Numerous studies have found that orthography can have either a facilitative effect (e.g.,
Escudero et al. 2008;
Showalter and Hayes-Harb 2013) or a non-facilitative effect (e.g.,
Bassetti and Atkinson 2015;
Rafat 2015,
2016;
Hayes-Harb and Cheng 2016) on the acquisition of non-native segments. For example,
Rafat (
2016) investigated the production of Spanish segments by naïve English speakers of Spanish. The naïve Spanish speakers were presented with pictures accompanied by a corresponding audio form of five grapheme-phoneme pairs that are different in English and Spanish, such as <v>–/b/ and <d>–/ð/. Some speakers were also presented with the corresponding orthographic form either when learning the sounds, when performing the production task, or both, whereas one group of speakers was never presented with the orthographic form. The latter group (audio-only condition) demonstrated a transfer rate of only 8% overall, with just two of the five consonant pairs revealing any transfer. In contrast, the three groups that were presented with the orthographic forms demonstrated overall transfer rates that ranged from 43% to 54%, depending on the condition, and transfer was observed for all five of the consonant pairs. These results clearly demonstrate that orthographic transfer must be considered as a variable that can influence non-native production. Nevertheless, one of the questions that arises is whether speakers of a language who are less dependent on a Roman writing system (i.e., Mandarin speakers) will still be influenced by orthographic transfer when learning a language that uses a Romanized writing system (e.g., English, Spanish). Very few studies have examined this, but the answer appears to be affirmative.
Detey (
2009) investigated the role of orthographic influence in 120 L1 Japanese–L2 French speakers who had been studying French for 1–2 years in university. Japanese speakers primarily use a non-Romanized writing system (a morphophonic system called kanji, and two moraic systems called katakana and hiragana;
Detey and Nespoulous 2008). However, like Mandarin speakers, they learn and use to a lesser extent a Romanized script called romaji. In romaji, the Japanese /ɾ/ is transcribed as <r>. Participants performed a forced-choice perception task. They listened to the auditory form of a French non-word (e.g., ladeko), and had to select what they heard from two options presented to them in orthographic form (<radeko> versus <ladeko>). Confusion rates were similar (23.7% for /l/, 23.5% for /ʁ/), but the author argues that they should have been much higher for /l/, given its greater similarity to the Japanese /ɾ/ (which also has an /l/ allophone) than the French /ʁ/, which is not acoustically similar to /ɾ/. The author concludes that orthography was an important factor. In terms of orthography, Japanese is similar to Mandarin. Therefore, the fact that Japanese speakers are influenced by orthography when learning French suggests that Mandarin speakers may be susceptible to orthographic transfer when learning Spanish.
To recapitulate, previous research has revealed that L1 Mandarin speakers have difficulty with the /l–ɾ/ contrast in Spanish, which is likely at least partly due to difficulty perceiving the contrast. They also have difficulty producing the trill, as do L1 English speakers, which can be attributed to the precise aerodynamic conditions required for its production. Moreover, both L1 groups tend to initially substitute the same segment for the tap and trill. While Mandarin speakers may be less susceptible to orthographic transfer than native English speakers, it must be considered a possibility in Mandarin speakers as well, given their use of the Pinyin system and the fact that rhotics in that system are represented orthographically with <r>.
1.4. The Present Study: Research Questions and Predictions
The present paper has two goals. First, given that previous research on the acquisition of the Spanish rhotics has focused primarily on L1 English–L2 Spanish speakers, the first goal is to investigate the production of the Spanish rhotics by speakers of a typologically distinct L1. Second, our current knowledge of L1 and L2 CLI in L3 acquisition is limited, due to the complexity of L3 acquisition. More research is needed from different language combinations and/or different structures, in order to obtain a comprehensive understanding of L1 and L2 CLI in L3 speech. Therefore, the second goal is to examine the extent to which the L1 and the L2 play a role in the production of the L3 Spanish rhotics. Of particular interest is whether the L2 flap allophone facilitates the acquisition of the L3 Spanish tap, and whether L1 or L2 segments are more likely substitutes for the L3 Spanish rhotics. To achieve these goals, the study was designed to answer the following questions, which are presented below with their respective hypotheses.
RQ1. What developmental patterns do L1 Mandarin–L2 English–L3 Spanish speakers exhibit when acquiring the Spanish tap?
RQ2. What developmental patterns do L1 Mandarin–L2 English–L3 Spanish speakers exhibit when acquiring the Spanish trill?
RQ3. Does the production accuracy of the Spanish tap and trill increase with higher L3 Spanish oral proficiency?
RQ4. Do L1 Mandarin–L2 English–L3 Spanish speakers transfer L1 or L2 segments?
RQ5. Does the ability to articulate an L2 English flap facilitate acquisition of the L3 Spanish tap?
Hypotheses
Hypothesis 1 (H1). Based on patterns observed in L2 Spanish tap acquisition (e.g., Ortí Mateu 1990; Waltmunson 2005; Face 2006), learners are expected to initially produce a single non-native substitution (see Hypothesis 4 for expected substitutions). As learners begin to acquire the tap, the taps are expected to gradually replace the substitutions, until primarily taps are produced.
Hypothesis 2 (H2). As observed in L1 English–L2 Spanish trill production (Waltmunson 2005; Face 2006; Johnson 2008), speakers are expected to initially produce the same substitute for the trill as they do for the tap, until the tap is acquired; tap substitutions are then expected, until the trill is acquired.
Hypothesis 3 (H3). Based on the finding that L2 Spanish rhotic production accuracy is generally higher in more proficient speakers (Face 2006), the frequency with which learners produce target taps and trills is expected to increase with higher L3 Spanish oral proficiency.
Hypothesis 4a (H4a). If the TPM and L2SF are applicable to L3 phonology, we should expect primarily L2 English [ɹ] substitutions.
Hypothesis 4b (H4b). If the L1 or dominant language has a privileged status, we should expect primarily [l] substitutions.
Hypothesis 5 (H5). As predicted by the CEM, L1 Mandarin speakers who can produce an L2 English flap are expected to produce the L3 Spanish tap with higher levels of accuracy, especially when producing the tap in post-tonic position (i.e., where positive transfer from their L2 English is most direct).
2. Materials and Methods
2.1. Participants
2.1.1. Language Profile
Twenty L1 Mandarin–L2 English–L3 Spanish speakers and 10 L1 Spanish–L2 English controls participated in the study. Participants were recruited via posters and advertisements posted on Spanish course websites at a major Canadian university and were paid CDN$10 for their participation. The inclusion criteria for participation were as follows: (1) Participants had to be native speakers of Mandarin, having grown up in a city where Mandarin was the official language, and they had to have spoken Mandarin at home with their parents; (2) Participants had to have learned English as a second language in school, in China (speakers who had moved to an English-speaking country as children, and who could therefore be considered heritage speakers, were not accepted); (3) Participants had to be enrolled in a university-level Spanish course, or had to have completed (recently) a minimum level of Spanish equivalent to first-year-university Spanish.
Participants completed a language background questionnaire which ensured that they met the inclusion criteria. The majority of participants were completing either first- or second-year Spanish, although six upper-year students who had completed at least six semesters of Spanish courses also participated. Ideally, more third- and fourth-year students would have been included in order to obtain a clearer picture of the full developmental path of acquisition, but participants who matched the other requirements for participation were sparse in upper-year Spanish courses. The participants were on average 20.0 years old (SD = 1.32).
Regarding L2 English experience, most speakers had begun learning English around the age of seven, but were not immersed in English until much later, around the age of 15 when they moved to Canada. Thus, while the L2 English oral proficiency varied to some extent between participants (see
Section 2.1.2 for further details), their type of education and experience with English was similar. Note also that all 20 participants continued to use their L1 Mandarin on a daily basis, and indicated that they were more comfortable communicating in Mandarin than English. The speakers are therefore considered to be L1 Mandarin-dominant. A summary of the participant characteristics is displayed in
Table 2, whereas a detailed description of individual L1 Mandarin–L2 English–L3 Spanish participant profiles is provided in
Appendix A.
Regarding the 10 L1 Spanish–L2 English controls that participated, all were born and had spent at least the first 12 years of their lives in a Spanish-speaking country (Spain, Cuba, Argentina, and Mexico). When growing up in their home country, they spoke only Spanish at home, and were educated in Spanish. They were living in Canada at the time of testing, but stated that they spoke Spanish on a daily basis with friends and family. All speakers spoke dialects that produced the alveolar tap and trill. They were on average 32.1 years old (SD = 6.11).
2.1.2. Oral Proficiency
Spanish and English oral proficiency were determined via accent ratings. Accentedness ratings specifically target oral proficiency (as opposed to proficiency in other domains, such as the lexicon or morphosyntax), thus are one of the more relevant measures for determining oral proficiency in a non-native language (
Colantoni et al. 2015, p. 89). For this reason, they have often been used to establish oral proficiency in previous work (e.g.,
Colantoni and Steele 2007,
2008;
Kopečková 2016;
Lloyd-Smith et al. 2017). All L1 Mandarin–L2 English–L3 Spanish participants were required to read ‘The North Wind and the Sun’ in English as well as its Spanish equivalent. The Spanish recordings were presented in random order to native (N = 10) Spanish speakers, who listened to each recording and rated on a scale of 1–5 how strong they felt the speaker’s accent was
7. The scores from all judges were averaged, and the resulting values were used as measures of each speaker’s overall oral proficiency. The English recordings were also rated by 10 native English speakers using the same procedure. Cronbach’s alpha was calculated for both sets of ratings in order to determine interrater reliability. Results were α = 0.892 and α = 0.938 for the Spanish and English accent ratings, respectively, both of which indicate a high degree of interrater agreement. A dependent-samples two-tailed t-test comparing the mean L3 Spanish accent rating (M = 1.75, SE = 0.15) to the L2 English one (M = 2.89, SE = 0.25) revealed that the participants’ L2 English proficiency was higher overall than their L3 Spanish proficiency (t = −8.603, df = 19, p < 0.001). For individual oral proficiency ratings, see
Appendix A.
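The two statistics reported above can be illustrated with a minimal sketch. It assumes the standard formulas for Cronbach's alpha (judges treated as items) and for the dependent-samples t statistic; the function names and the toy numbers are invented for illustration and are not the study's ratings or analysis scripts.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(ratings):
    """ratings: one list per judge, holding that judge's score for every speaker."""
    k = len(ratings)
    # Sample variance of each judge's scores across speakers.
    judge_variances = [variance(judge) for judge in ratings]
    # Total score per speaker (summed over judges), then the variance of those totals.
    totals = [sum(scores) for scores in zip(*ratings)]
    return (k / (k - 1)) * (1 - sum(judge_variances) / variance(totals))


def paired_t(x, y):
    """Return (t, df) for a dependent-samples t-test on paired lists x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / sqrt(variance(diffs) / n)
    return t, n - 1


# Invented toy data: 3 judges rating 4 speakers (hypothetical values).
toy_ratings = [[1, 2, 3, 4], [2, 2, 4, 5], [1, 3, 3, 5]]
alpha = cronbach_alpha(toy_ratings)

# Invented toy per-speaker mean ratings in the two languages (hypothetical values).
spanish = [1.5, 2.0, 1.0, 2.5]
english = [2.5, 3.5, 2.0, 3.0]
t_value, df = paired_t(spanish, english)
print(round(alpha, 3), round(t_value, 3), df)
```

With the Spanish ratings entered first, a negative t indicates that the Spanish mean is lower than the English mean, which matches the sign of the statistic reported above.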
2.2. Experimental Tasks
The L3 Spanish participants performed tasks in three languages (Spanish, English, and Mandarin), whereas the L1 Spanish controls performed only the Spanish task. These tasks will be discussed in the order in which they were presented to the participants.
A word-reading task was used to elicit the Spanish rhotics. Participants were presented with the orthographic form of each target word (e.g., perro), one after another, on a computer screen. Each slide was advanced at a steady (untimed) pace by the experimenter. The participants were required to produce the word when they saw it appear on the screen. One of the motivations for presenting the speakers with the orthographic form was that it ensured the speakers would attempt to produce the segment of interest (e.g., tap or trill), given that they are represented by <r> and <rr>, respectively. While a more complex task would have been preferred, the least experienced speakers had very low Spanish oral proficiency and little experience speaking Spanish. Pilot testing revealed that reading the passage ‘The North Wind and the Sun’ in addition to producing a series of sentences was tiring for some speakers. Pilot testing was also used to assess the feasibility of a picture naming task; however, many speakers had difficulty identifying the pictures they were presented with. Therefore, in order to avoid articulatory fatigue and to be able to include even the least proficient speakers in the study, a word reading task was considered to be the most viable option.
The motivation for analyzing the L2 English production of the participants was to examine whether they had acquired, and could therefore transfer, the English flap and the English /ɹ/, both of which were expected to be sources of L2 transfer. The primary task used to elicit the two English segments was also a word-reading task. However, the orthographic stimuli were accompanied by their corresponding audio form, for two reasons. First, the flap is written with <t> or <d> (e.g., <water> /wɑtəɹ/ ['wɑ.ɾɚ]; <ladder> /'lædəɹ/ ['læ.ɾɚ]), which could result in orthographic influence and therefore the production of [t] or [d] in place of [ɾ]. Second, participants may have been exposed to English varieties that do not flap (e.g., British English). It is thus possible that the participants have acquired the flap, but do not always use it when speaking English. The audio form ensured that the participants attempted to produce a flap, and not some other segment. Note, however, that one of the limitations of a repetition task is that it may not indicate the extent to which a speaker uses the target segment in context. Consequently, successful production of [ɾ] or [ɹ] in the word-reading task may not reveal conclusively whether the speaker had actually acquired the two L2 English segments. Therefore, the production of the English passage ‘The North Wind and the Sun’ that participants read before the word-reading task (used to determine L2 English oral proficiency) was also examined. The passage elicited the production of four flaps (disputing [dɪs.'pju.ɾɪŋ], immediately [ə.'mi:.ɾjət.li], succeeded [sək.'siː.ɾəd], considered [kən.'sɪ.ɾɚd]) and numerous /ɹ/ tokens. The production of the flaps and four of the English /ɹ/ (in various positions: agreed [ə.'ɡɹiːd], wrapped [ɹæpt], around [ə.'ɹawnd], more [mɔɹ]) were analyzed to determine whether speakers who successfully realized [ɾ] and [ɹ] in the word-reading task also produced these segments in the reading passage. 
Speakers who consistently produced the targets in the word-reading task and in the reading passage were assumed to have acquired the segment. In contrast, the speakers who produced the targets accurately in the word-reading task, but not in the reading passage, were excluded from the analyses that specifically examined the extent to which L2 transfer occurred. For these speakers, it would not be possible to establish without doubt whether or not they had acquired the segments of interest.
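The decision rule described above can be sketched as a small function. This is an illustration only: the text does not define "consistently" numerically, so the 0.8 threshold, the function name, and the label for speakers who were inaccurate in the word-reading task are all assumptions.

```python
def classify_acquisition(word_task_rate, passage_rate, threshold=0.8):
    """Classify one speaker for one L2 English segment ([ɾ] or [ɹ]).

    Rates are proportions of target-like productions (0.0 to 1.0).
    The 0.8 threshold is a hypothetical stand-in for "consistently".
    """
    if word_task_rate >= threshold and passage_rate >= threshold:
        return "acquired"
    if word_task_rate >= threshold:
        # Accurate in the word-reading task but not in the passage:
        # acquisition cannot be established, so the speaker is set aside.
        return "excluded_from_transfer_analysis"
    # Assumption: the text does not explicitly name this remaining case.
    return "not_acquired"


print(classify_acquisition(0.9, 0.5))
```

Keeping the threshold as a parameter makes it easy to check how sensitive the grouping is to the chosen cut-off.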
The third task that participants performed was a word reading task in Mandarin, which was designed to elicit Mandarin intervocalic /ɹ̺/. The primary reason for eliciting the Mandarin rhotic was to be aware of the characteristics present in each individual’s L1 rhotic productions, in the case that the same characteristics were observed in the learners’ L3 Spanish rhotic productions. While transfer of the Mandarin rhotic was expected to be minimal, it had to be considered as a possible candidate, given that it is a rhotic, that it is represented by <r> in Pinyin, and that the dominant language should be considered a potential source of transfer.
2.3. Stimuli
The Spanish word-reading task involved 24 target stimuli in total: six stimuli per stress condition × two stress conditions (tonic versus post-tonic) × two rhotics (e.g., corrí /ko.'ri/ ‘I ran’, carro /'ka.ro/ ‘car’, veré /be.'ɾe/ ‘I will see’, pero /'pe.ɾo/ ‘but’). Care was taken to select frequent words that are learned in first-year Spanish, based on the first-year Spanish textbook that was being used at the university where the students were studying and a Spanish frequency dictionary (
Davies 2006). Thirty-eight distractor stimuli were also included. The word list was read twice by each speaker, resulting in 48 rhotic and 76 distractor tokens per speaker.
A total of 60 stimuli were included in the English word-reading task: 10 stimuli for the flap in post-tonic position (e.g., water /wɑtəɹ/ ['wɑ.ɾɚ]), 10 stimuli each for the English /ɹ/ in post-tonic (berry /'bɛ.ɹi/) and tonic (direct /də.'ɹɛkt/) positions, and 30 distractors. The English stimuli were accompanied by their audio form. To create the English stimuli, a 24-year-old female native speaker of Canadian English was recorded reading the stimuli in isolation. She was instructed to read the list of stimuli at a normal rate, as clearly as possible. The list was read three times, and the best realization of each stimulus was chosen.
Twenty stimuli were included in the Mandarin task, which consisted of 10 target stimuli with the Mandarin rhotic /ɹ̺/ in intervocalic position (牛肉 niú.ròu ‘beef’), as well as 10 distractors. The entire stimuli list for all three languages can be found in
Appendix B.
2.4. Testing Protocol
Each participant was recorded individually in a quiet lab. Participants were first asked several questions about their language background, in order to collect relevant data (e.g., age of acquisition, use) on previously learned languages. They were subsequently recorded reading ‘The North Wind and the Sun’ passage in Spanish. Participants were given time to read the passage once on their own before being recorded, in order to become familiar with the content. After the reading passage, participants performed the Spanish word-reading task. They were asked to produce each word that they saw on the computer screen. Participants were recorded in Spanish first because it was their weakest language, and the most likely to be affected by articulatory fatigue. Following the word-reading task in Spanish, participants had a short break, and were then recorded reading ‘The North Wind and the Sun’ passage in English. As with the Spanish passage, they had the opportunity to read and become familiar with the English passage before being recorded. Participants subsequently performed the English word-reading task. They were given the same instructions as for the Spanish word-reading task, but were also told that they would hear a recording of the word they were supposed to produce. They were asked to listen to the recorded word first, before reading the word on the screen. The English word-reading task was followed by a short break. The session concluded with participants reading the target stimuli in Mandarin Chinese, which followed the same procedure as the Spanish word-reading task. All productions were recorded using a Marantz PMD561 recorder and a unidirectional condenser microphone.
2.5. Data Preparation and Analysis
All words containing the target segments were extracted and examined acoustically in Praat (
Boersma and Weenink 2017). Segment boundaries were marked based on a decrease in intensity (for all segments) in the waveform and the spectrogram (
Figure 1), and, for the English and Mandarin rhotics, a decrease in F3. Some Mandarin /ɹ̺/ tokens were produced as fricatives. For these segments, boundaries were marked according to the changes in periodicity. After marking segment boundaries, the type of segment produced (e.g., [ɾ], [r], [l]) was identified through a visual analysis of acoustic cues present in the waveform and spectrograms (e.g., changes in intensity and formants, the presence/absence of noise), combined with an auditory analysis. A total of 720 L3 tap (30 speakers × 24 tokens), 720 L3 trill (30 speakers × 24 tokens), 280 L2 flap (20 speakers × 14 tokens), 480 L2 /ɹ/ (24 per speaker), and 200 L1 /ɹ̺/ tokens (10 per speaker) were analyzed. Segments were considered taps/flaps if there was a brief closure and trills if two or more rapid closures were observed. Regarding the English and Mandarin rhotics, segments were coded as /ɹ/ or /ɹ̺/ (respectively) if a noticeable drop in F3 was visible. The Mandarin rhotics produced as fricatives did not have a decrease in F3, and were identified through the presence of noise. These productions were coded as [ʐ]. Examples are displayed in
Figure 1. All coding was performed by the author, a native English, near-native Spanish speaker trained in acoustic analysis.
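The coding criteria above can be summarized as a simple decision rule. The sketch below is only an illustration of that logic: the function name, its inputs, the coarse labels, and the order in which the cues are checked are all assumptions, since the actual coding was done by visual and auditory inspection in Praat rather than automatically.

```python
def code_segment(n_closures: int, f3_drop: bool, frication: bool) -> str:
    """Hypothetical helper mapping observed acoustic cues to a coarse label.

    n_closures: number of rapid closures visible in the waveform/spectrogram
    f3_drop:    whether a noticeable drop in F3 is visible
    frication:  whether aperiodic noise is present
    The precedence of the checks is an assumption made for illustration.
    """
    if n_closures >= 2:
        return "trill"                # two or more rapid closures
    if n_closures == 1:
        return "tap/flap"             # a brief closure
    if f3_drop:
        return "approximant rhotic"   # English /ɹ/ or Mandarin /ɹ̺/
    if frication:
        return "fricative"            # e.g., Mandarin [ʐ], no F3 decrease
    return "other"


print(code_segment(1, False, False))
print(code_segment(3, False, False))
print(code_segment(0, False, True))
```

Segments such as laterals or stops fall under "other" here; in the study they were identified from additional cues (intensity, formants) and by ear.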
After segmenting and coding the data, the identified segmental realizations were extracted and used to calculate the frequency of production of each target and non-target segment. To determine the effect of predictors on the outcome of target versus non-target tap/trill production, a series of mixed effects binomial logistic regression models was run. Each model included a random intercept for ‘participant’. Categorical predictors were coded using treatment coding. All statistics were run in SPSS v. 23 (IBM, Armonk, NY, USA), with a significance level of α = 0.05.
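For readers less familiar with this type of model, a single-predictor version can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the SPSS output: $y_{ij}$ is 1 if token $i$ from participant $j$ is target-like and 0 otherwise, $x_{ij}$ is the predictor of interest (e.g., group or oral proficiency), and $u_j$ is the participant-level random intercept.

```latex
\operatorname{logit} P(y_{ij} = 1)
  = \log \frac{P(y_{ij} = 1)}{1 - P(y_{ij} = 1)}
  = \beta_0 + \beta_1 x_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Under treatment coding, $\beta_1$ for a categorical predictor is the difference in log-odds of a target production between a given level and the reference level.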
3. Results
The results of the experiment are presented in this section, starting with the production results for the L3 tap. This is followed by a comparison of the accuracy rates of the L2 English flap and the L3 Spanish tap, in order to determine whether the L2 flap facilitates the acquisition of the L3 tap. The section concludes with the presentation of the results for the trill.
Regarding the production of the tap, recall that primarily either [l] or [ɹ] was expected initially. Results revealed that, at the group level, a [ɾ] was produced 42.9% of the time by the L3 Spanish speakers. Tap targets were also frequently produced as laterals [l], making up 27.6% of all productions. The remaining 29.5% consisted of English [ɹ] approximant productions (11.8%), stop-liquid clusters [dɾ]⁸ (6.2%), stops [d] (6.2%), and sporadic productions of a variety of other non-target segments (5.3%) (approximants [ð̞], fricatives [ð], trills [r], and deletions). No productions of L1 Mandarin [ɹ̺] were observed. In contrast to the L3 Spanish learners, native Spanish speakers produced taps 98.7% of the time, in addition to stops in very few instances (1.3%). A mixed effects binomial logistic regression comparing the target tap production of the learners versus the controls revealed that the learners were significantly less accurate (
β = −5.262; SE = 1.132;
t = 4.648;
p < 0.001).
The segments produced by the learners, arranged in increasing order of L3 tap accuracy, are displayed in
Figure 2. Each bar represents the percentage realization of taps and other non-target segments, for each participant, among each individual’s productions. The final bar represents the percentage realization by the native Spanish speakers (averaged together).
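The per-participant percentages plotted in such a figure can be derived from the coded labels with a simple tally. The following is a minimal sketch under the assumption that each speaker's productions are stored as a list of segment labels; the function name and the example data are invented for illustration and are not the study's data.

```python
from collections import Counter


def realization_percentages(labels):
    """Return the percentage of each coded realization for one speaker."""
    if not labels:
        raise ValueError("no tokens to tally")
    counts = Counter(labels)
    total = len(labels)
    return {segment: 100 * n / total for segment, n in counts.items()}


# Made-up labels for one hypothetical speaker (24 tap targets).
example = ["ɾ"] * 12 + ["l"] * 6 + ["ɹ"] * 6
print(realization_percentages(example))
```

Sorting speakers by the percentage of the target segment would then give the ordering used along the horizontal axis.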
While the native speakers produced almost exclusively taps, and there was an increase in the proportion of taps with oral proficiency among the L3 learners, a significant amount of variation was observed in the non-native productions. Four of the learners were unable to produce a single tap (M13, M12, M16, M08), while an additional three (M10, M06, M14) produced taps less than 15% of the time. These learners also had seven of the nine lowest L3 oral proficiency ratings. Five speakers (M20, M05, M04, M01, M11) were able to produce the L3 tap with very high accuracy rates (80% or more). The rest of the speakers produced the L3 tap with accuracy rates ranging from 29.1% to 75.0%. The most frequently produced substitute was [l], making up 48.3% of non-native realizations, followed by English [ɹ] (23.0% of non-native realizations), [d] (12.1%), and [dɾ] (12.1%). A mixed effects binomial logistic regression was run on the learner data to analyze whether L3 oral proficiency was a predictor of accurate tap production. The results revealed that it was, with target tap production increasing as L3 oral proficiency increased (β = 3.378; SE = 0.734; t = 4.000; p < 0.001).
The production results indicate what appear to be three developmental stages. At the first stage, speakers almost categorically substituted a single segment for the L3 tap—usually [l] but in some cases English [ɹ] or the sequence [dɾ]. At the second stage, speakers began to acquire the tap but also produced a variety of other non-target segments (e.g., stops, approximants). Finally, at the third stage (M05, M01, M04, M11, M03), speakers produced primarily taps (at least 90% of the time).
The low occurrence of L2 English [ɹ] productions raises the question of whether speakers did not substitute [ɹ] with more frequency because they had not yet acquired the segment, or because most learners did not consider it to be a valid substitute. An analysis of the L2 English productions revealed no difficulty with /ɹ/. All speakers produced a target [ɹ] at least 80% of the time in the word-reading task. In the passage-reading task, all speakers produced a target [ɹ] 100% of the time, except one speaker (M13) who produced [ɹ] 75% of the time. These results indicate that all speakers had acquired /ɹ/, and used it consistently when speaking English. Therefore, while early L1 English–L2 Spanish speakers tend to transfer the English [ɹ] (e.g.,
Waltmunson 2005), this was generally not the case for the L1 Mandarin–L2 English–L3 Spanish speakers (other than M13 and M16). Interestingly, one of the two speakers who consistently produced [ɹ] was the least proficient L2 English speaker (1.1 oral proficiency), whereas some of the more proficient L2 English speakers with the lowest levels of Spanish proficiency (e.g., M07, M08) substituted primarily [l] and not [ɹ]. This finding suggests that the use of [ɹ] did not correlate with L2 English proficiency. None of the other non-native segments used at each stage appear to depend on L2 English proficiency, except for perhaps [d]. The [d] substitute occurred with the greatest frequency in the productions of four of the five most proficient L2 English speakers (M02, M07, M09, M15), although it was also produced to some extent by one speaker with a somewhat lower proficiency (M18). To determine whether any correlations were present between the type of non-native substitutions and L2 English oral proficiency, a series of Spearman’s rho correlations were run (
Table 3), comparing each speaker’s L2 English oral proficiency score with the proportion of non-native substitutions produced. A significant positive correlation between [d] productions and L2 English proficiency was observed (ρ = 0.53;
p = 0.016), indicating that [d] is more likely to be used by more advanced L2 English speakers. No other significant correlations were found.
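Spearman’s rho is the Pearson correlation computed on ranks, with tied values sharing their mean rank. The sketch below shows that calculation in plain Python; the function names are illustrative, the example scores are invented, and it is not the procedure used in the study, which relied on SPSS.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y (undefined if either is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented values: proficiency scores and proportion of [d] substitutions.
proficiency = [1.1, 2.0, 2.5, 3.0, 3.4, 4.0]
d_proportion = [0.00, 0.05, 0.00, 0.10, 0.20, 0.15]
print(round(spearman_rho(proficiency, d_proportion), 3))
```

Because the statistic depends only on rank order, it suits ordinal proficiency ratings and skewed substitution proportions better than a Pearson correlation on the raw values.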
In order to determine whether the ability to produce the L2 flap might play a role in the acquisition of the tap, the percent accuracy of L2 flap production from the word-reading task and the passage-reading task was examined. The proportion of target-like L2 flaps realized by speaker and task is displayed in
Table 4. In general, the most successful flap producers in the word-reading task also produced the flap in the passage-reading task, which indicates that these speakers were not only capable of producing L2 flaps, but that they produced them regularly when speaking English. Three speakers (M03, M06, M10) produced some flaps in the word-reading task, but none in the passage-reading task. These speakers were therefore not included in the analysis of whether the ability to produce an L2 flap might facilitate the acquisition of the L3 tap, given that it is not conclusive whether these speakers produce flaps in English or not.
Figure 3 displays the percent accuracy of L2 flap production from the word-reading task (horizontal axis) compared with the percent accuracy of the L3 tap production (vertical axis). If the production of the L2 flap facilitates acquisition of the L3 tap, we would expect L2 flap accuracy rates to be a significant predictor of L3 tap accuracy.
The scatterplot indicates that the five speakers with L2 flap accuracy rates lower than 50% (i.e., those who had not acquired the flap) also had low accuracy rates producing the L3 tap. Of the twelve speakers with an L2 flap accuracy rate of at least 50%, eight produced the L3 tap with a 50% or higher accuracy rate (and six of them with a much higher accuracy rate). The results suggest that L2 flap accuracy does indeed correlate with L3 tap accuracy to some extent, an observation that was confirmed with a mixed-effects binomial logistic regression. L2 flap accuracy was found to be a significant predictor of target L3 tap production (
β = −4.954; SE = 2.069;
t = −2.395;
p = 0.017)
⁹. Note that the predictor ‘stress’ (tonic versus post-tonic) was not significant (
β = −1.257; SE = 0.760;
t = −1.654;
p = 0.099), nor was there any interaction between ‘flap accuracy’ and ‘stress’ (
β = 1.726; SE = 1.064;
t = 1.623;
p = 0.105). These results reveal that the more successful L2 flap producers were more likely to produce an accurate L3 tap overall, and that this effect did not differ between tonic and post-tonic position. These data support the hypothesis that the ability to produce the L2 flap facilitates the production of the L3 tap. However, not all speakers who had acquired the L2 flap were able to produce the L3 tap: four speakers had high accuracy rates producing the L2 flap, but low accuracy rates producing the L3 tap.
We will now turn to the results for trill production. Recall that the trill production was expected to be particularly difficult, and that learners were expected to substitute the same segments they substitute for the tap, until the trill is acquired. Overall, the trill was, as expected, much more difficult than the tap for the L3 speakers. While the control group produced trills 89.2% of the time, only 14.1% of productions by the learners were trills. A mixed effects binomial logistic regression comparing target productions by each group revealed that this difference was highly significant (β = 6.975; SE = 1.245; t = 5.602; p < 0.001).
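As a reading aid, the reported test statistic appears to be the ratio of the coefficient to its standard error, which is an assumption about how the output was summarized rather than something stated in the text. For the group comparison above:

```latex
t = \frac{\beta}{SE} = \frac{6.975}{1.245} \approx 5.602
```

The coefficient is on the log-odds scale, so its magnitude reflects the difference in log-odds of a target trill between the two groups under the treatment coding used.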
Trill targets were produced most frequently as laterals [l] (22.9%) and taps [ɾ] (22.4%), followed by English [ɹ] (13.7%), [dɾ] clusters (13.2%), approximants [ð̞] (4.5%), fricatives [ð]/[ř] (4.3%), and stops [d] (3.6%). The remaining 1.3% consisted of infrequent productions of [w], [dl], and [ðɾ]. The approximants were generally similar to the approximants substituted for the tap (very open dental approximants [ð̞]), although in some cases they were produced with a particularly long duration (>100 ms), which was not the case with the tap target. The fricatives that were produced varied, ranging from short dental or alveolar fricatives, as observed in the tap substitutions, to long, failed alveolar trills. They did not resemble the L1 Mandarin fricatives [ʐ].
The individual results are presented in
Figure 4, ordered by the percent accuracy of target-like trills, from the least to the most accurate. Only five of the twenty participants were able to produce trills, and of those five, only two (M05, M01) produced trills with native-like accuracy levels. The least proficient learners tended to substitute a single segment, either [l], [ɹ], or [dɾ]. As speakers became more proficient, a variety of non-target segments were produced in place of the trill, including taps. Native speakers produced primarily trills (89.2%), as well as some fricatives (10.8%). In contrast to the tap results, a mixed effects binomial logistic regression revealed that the L3 Spanish oral proficiency of the learners was not a predictor of trill accuracy (
β = 1.781; SE = 1.625;
t = 1.096;
p = 0.274).
The production patterns of the trill were similar to those of the tap, but there appear to be four stages of acquisition as opposed to three. At the first stage (M13 to M15), speakers almost categorically produced a substitution; [l] was produced most frequently, but [ɹ] and [dɾ] were also observed. This was followed by a second stage in which speakers produced a variety of non-target segments (M03 to M18), including taps, but also a number of other segments (e.g., [ð], [d], [ɹ]). The third stage (M20, M11, M19) involved some trill productions, but with low frequency. The final stage involved trill production with frequencies similar to those of native speakers. This stage was only achieved by two speakers (M05, M01).
When discussing the previous literature in
Section 1, it was noted that L1 English–L2 Spanish speakers often do not make a contrast between the tap and the trill (i.e., speakers produce the same segment for the tap and the trill, beginning first with the English /ɹ/, followed by the tap once it is acquired). A question of interest in the present paper is whether this may be a universal strategy or whether it is a strategy specific to L1 English–L2 Spanish speakers. In order to determine whether the L1 Mandarin speakers tended to produce the same segment for the tap and trill, the productions for each speaker were analyzed and compared.
Table 5 displays the most frequent productions by speaker, ordered by L3 Spanish oral proficiency, for each target (tap and trill). The shaded sections indicate which speakers produced the same segment for both targets (i.e., did not make a contrast), whereas the unshaded sections indicate the speakers that produced different segments (i.e., made a contrast). Note that speakers who could produce the trill (M01, M05, M11, M19, M20) were not included, given that they had already acquired the ability to produce a contrast. The results revealed that the six least proficient speakers produced the same segment for both rhotic targets, whereas five of the six most proficient speakers produced different segments. This pattern differs from the pattern observed in L1 English–L2 Spanish speakers. Specifically, in contrast to L1 English–L2 Spanish speakers, L1 Mandarin–L2 English–L3 Spanish speakers begin producing a contrast as they become more proficient, and do not tend to use the tap in place of the trill. A possible explanation for the different patterns is discussed in detail in
Section 4.2.
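The comparison behind this analysis can be sketched as follows: find each speaker’s most frequent realization for the tap target and for the trill target, and check whether they differ. This is a minimal illustration only; the function names and the example labels are invented, and ties are resolved by whichever label occurs first, which is an arbitrary choice.

```python
from collections import Counter


def most_frequent(labels):
    """Return the most common label (first-encountered label wins ties)."""
    return Counter(labels).most_common(1)[0][0]


def makes_contrast(tap_labels, trill_labels):
    """True if the dominant realization differs between the two targets."""
    return most_frequent(tap_labels) != most_frequent(trill_labels)


# Invented data for two hypothetical speakers (24 tokens per target).
low_tap, low_trill = ["l"] * 20 + ["ɾ"] * 4, ["l"] * 22 + ["ɹ"] * 2
high_tap, high_trill = ["ɾ"] * 18 + ["d"] * 6, ["dɾ"] * 15 + ["ɾ"] * 9

print(makes_contrast(low_tap, low_trill))    # same dominant segment for both
print(makes_contrast(high_tap, high_trill))  # different dominant segments
```

A speaker for whom the function returns False would correspond to a shaded row in the table, and True to an unshaded one.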