The Contributions of Crosslinguistic Influence and Individual Differences to Nonnative Speech Perception

: Perception of a nonnative language (L2) is known to be affected by crosslinguistic transfer from a listener’s native language (L1), but the relative importance of L1 transfer vis-a-vis individual learner differences remains unclear. This study explored the hypothesis that the nature of L1 transfer changes as learners gain experience with the L2, such that individual differences are more influential at earlier stages of learning and L1 transfer is more influential at later stages of learning. To test this hypothesis, novice L2 learners of Korean from diverse L1 backgrounds were examined in a pretest-posttest design with respect to their perceptual acquisition of novel L2 consonant contrasts (the three-way Korean laryngeal contrast among lenis, fortis, and aspirated plosives) and vowel contrasts (/o/-/ ʌ /, /u/-/ ɨ /). Whereas pretest performance showed little evidence of L1 effects, posttest performance showed significant L1 transfer. Furthermore, pretest performance did not predict posttest performance. These findings support the view that L1 knowledge influences L2 perception dynamically, according to the amount of L2 knowledge available to learners at that time. That is, both individual differences and L1 knowledge play a role in L2 perception, but to different degrees over the course of L2 development.


Introduction
Whereas infants are described as "citizens of the world" (Werker and Tees 1984) when it comes to perceiving the sounds of a nonnative language (L2), adults are known to be comparatively poor at L2 speech perception. This disparity follows from a process of perceptual specialization for the native language (L1), which begins well before the end of the first year of life and results in an apparent decline in perception of L2 sound contrasts (Kuhl et al. 1992;Polka and Werker 1994;Werker and Pegg 1992). Although the characterization of this process varies throughout the literature, in recent years researchers have converged upon the view that specialization for the L1 does not involve loss of perceptual ability per se, but rather shifts in attention, which may be attributed to lexical and/or phonemic development (Werker 1994(Werker , 1995Werker and Curtin 2005) or to the impetus for perceptual routines to be automatic and robust to adverse conditions (Strange 2011).
Under the view that mature L1 speakers maintain access to the perceptual and cognitive abilities undergirding L1 phonological development (e.g., Flege 1995), it becomes relevant to ask how, in L2 learning, these underlying abilities interact with the consequences of L1 specialization, especially in light of individual variation in these abilities. That is, how are the processes and outcomes of L2 acquisition affected by language-specific properties of a learner's L1 vs. the personally-specific abilities of the learner? The effect of the L1 has been discussed extensively in terms of TRANSFER or crosslinguistic influence (CLI) of aspects of L1 knowledge and/or experience (Major 2008;Odlin 1989) and plays a central role in theories of L2 speech learning (see Section 1.1). However, the effect of the learner (i.e., INDIVIDUAL DIFFERENCES among speakers who may share the same L1; Dörnyei 2006), long a focus of research in the interdisciplinary field of second language acquisition (SLA), has only recently begun to be considered systematically in the study of L2 perceptual learning (see Section 1.2).
The study reported in this article is an attempt to explore the interaction of L1 transfer and individual differences in L2 perceptual learning of Korean phonological contrasts. In what follows, we review the literature on transfer and individual differences in L2 learning, describe the target Korean contrasts (the three-way laryngeal contrast among lenis, fortis, and aspirated voiceless stops and two vowel contrasts between rounded and unrounded back vowels), and motivate a TRANSFER RAMP-UP HYPOTHESIS predicting a delayed onset of L1 transfer effects in L2 perceptual development.

Crosslinguistic Influence in Speech Learning
That L2 speech learning may be influenced by previously acquired linguistic knowledge is wellestablished in the field of L2 speech and in SLA more broadly. Major theories of nonnative speech perception, such as the Perceptual Assimilation Model (PAM; Best 1994Best , 1995 and its extension to L2 learners, PAM-L2 (Best and Tyler 2007), the Native Language Magnet theory (NLM; Kuhl and Iverson 1995;Kuhl et al. 1992Kuhl et al. , 2008, the Speech Learning Model (SLM; Flege 1995), and the Automatic Selective Perception framework (ASP; Strange 2011), all incorporate a role for a listener's L1, although they vary in terms of how L1 influence is conceptualized. For example, NLM emphasizes the role of developing L1 category prototypes, whereas PAM(-L2) focuses on the ways in which L2 contrasts may be perceptually assimilated to L1 contrasts, which have consequences for both discrimination and identification of L2 sounds (e.g., Tyler et al. 2014).
Despite their differences, these theories of nonnative speech perception are similar in providing an account for ostensibly negative effects of L1 development in perception of an L2. Although L1 experience is not always a handicap in L2 perception (Bohn and Best 2012;Chang 2016Chang , 2018, it often appears that listeners with an already-developed L1 are at a disadvantage compared to listeners who are less far along in L1 development. For example, English-speaking adults were found to discriminate contrasts of Thompson Salish poorly compared to English-learning infants (Werker and Tees 1984). Such disparities have been interpreted in terms of a decline in sensitivity to nonnative contrasts during L1 development (e.g., Kuhl et al. 1992;Werker and Pegg 1992). It remains unclear, however, whether the disadvantage for adult listeners in L2 perception has to do with reduced phonetic sensitivity or "negative transfer" of L1 structures such as phonological categories (or both).
Indeed, a fundamental component of many theories of L2 speech is the misleading influence of L1 categories. For example, the SLM posits that the most difficult L2 sounds to acquire accurately in the long term are not "new" sounds, but rather "similar" sounds, which resemble the sounds of the L1. This is because a "similar" sound, unlike a "new" sound, gets perceptually linked with an L1 sound, which inhibits the formation of a distinct representation for the L2 sound. According to the SLM, L1 influence (in the form of crosslinguistic perceptual linkage) takes place at the level of the position-specific allophone, but there is reason to believe that the phoneme, rather than the allophone, may be the structure most liable to exert L1 influence. In particular, some findings suggest that access to L1 allophones, which are typically below the level of speaker consciousness, may depend on factors such as the level of acquisition and the type of orthography (Bergier 2014;Eckman et al. 2001;Vokic 2010).
Besides formalizing the influence of L1 categories, theories such as the SLM also posit that "mechanisms and processes used in learning the L1 sound system… remain intact over the life span, and can be applied to L2 learning" (Flege 1995, p. 239), suggesting that the basic perceptual abilities supporting L1 acquisition are maintained in adulthood. This view is consistent with findings showing little effect of L1 specialization in certain perceptual tasks with nonnative speech. For instance, although [s] and [ʃ] are phonemic in English and allophonic in Dutch, L1 Dutch adults were just as good as L1 English adults at discriminating the two sounds in English nonce items; however, L1 Dutch adults also rated the sounds as more similar than did L1 English adults (Johnson and Babel 2010). This type of result is at odds with the view that adult listeners necessarily have reduced sensitivity to L2 contrasts, and further suggests that any effect of the L1 sound system is closely related to task type (in particular, the levels of processing tapped by the task; cf. Díaz et al. 2012).
If adults are capable of perceiving nonnative contrasts, this still leaves the question of why they tend to perform worse than infants in L2 perception. Strange's (2011) ASP framework addresses this question with the construct of selective perception routines (SPRs). According to ASP, L1 acquisition involves developing SPRs specialized for the target L1. These SPRs are selective in the sense that they target only those acoustic cues that are relevant for distinguishing L1 phonemes, which allows L1 perception to become automatic and robust in adverse conditions. Crucially, a language-universal mode of perception never disappears, but high task demands may block access to this mode, encouraging instead a default to L1 SPRs. In ASP, then, what distinguishes adult and infant listeners is not reduced sensitivity, but shifted attention. As in L1 acquisition, ASP posits that L2 learning involves developing SPRs that weight and direct attention to cues in a manner specialized for the target L2. This implies that L2 learning generally leads to a (positive) change in L2 perception, which has indeed been found (e.g., Kilman et al. 2014;Levy and Strange 2008;cf. Holliday 2016).
Insofar as better L2 perception reflects less L1 transfer, the finding of higher L2 perceptual accuracy in more advanced learners supports the view that L1 transfer decreases over the course of L2 learning. This view is at the heart of a different theory of L2 phonological development, the Ontogeny Phylogeny Model (OPM; Major 2001). OPM differs from other theories of L2 speech in addressing the time course of all three factors that play a role in general SLA theories: L1, L2, and universal components (cf. Universal Grammar; White 2003). The central claim is that, within a learner's developing L2 system, L1 components decrease over time, while L2 components increase; meanwhile, universal components show an inverse U-shaped pattern, first increasing and then decreasing. This theory also formalizes aspects of intra-speaker variation by describing the influence of style/register, crosslinguistic similarity, and markedness. However, inter-speaker variation is not addressed in this model (or in the others discussed above), and is the subject of a different literature.

Individual Differences in Speech Learning
Interest in individual differences (IDs) has existed for decades, going back to the 1970s when SLA researchers started examining the predictive power of various personally-specific properties of the learner (for reviews, see Dörnyei 2006;Dörnyei and Skehan 2003). These include (foreign) language (learning) aptitude, general intelligence or cognitive ability (for a recent review, see Bowles et al. 2016), personality (e.g., Dewaele and Furnham 2000;Verhoeven and Vermeer 2002), and musicality (e.g., Milovanov et al. 2008Milovanov et al. , 2010, all of which are understood to be complex constructs consisting of several subconstructs (e.g., Robinson 2001). For example, language aptitude may consist of variables such as phonemic coding ability, grammatical sensitivity, inductive learning ability, and rote learning ability. Based on Carroll (1981), Skehan (1989Skehan ( , 1991Skehan ( , 1998Skehan ( , 2002 attempted to explain three abilities (namely, auditory, linguistic, and memory-based), while others (e.g., Jilka et al. 2008) have conceptualized of IDs in terms of a "talent" dimension.
In regard to L2 perception specifically, IDs may reflect a construct that has been called PHONETIC SENSITIVITY in the L2 speech literature (e.g., Kwon 2013;Munro 2008). Note that it is phonetic, as opposed to psychoacoustic, sensitivity that is of interest here due to the neurolinguistic evidence of a "speech-specific origin of individual variability in L2 phonetic mastery" (Díaz et al. 2008, p. 16083). According to Piske (2008), phonetic sensitivity can be distinguished from phonetic or phonological awareness in not necessarily referring to a conscious level of speech processing, and we use this term in a similar sense, to refer generally to sensitivity to properties of the speech signal that may serve to cue a linguistic contrast (regardless of whether or not the contrast exists in the listener's L1). Under the view that the ability to perceive nonnative contrasts does not go away during L1 development, but may be masked by attentional shifts associated with L1 specialization, it becomes relevant to ask whether IDs in L2 perceptual learning can be traced to variation in learners' (remaining) phonetic sensitivity to nonnative contrasts. That is, might the degree of (non)commitment to perceptual cues relevant for the L1 constitute an "aptitude" for L2 perceptual learning? If not, then what does?
These questions have begun to be explored in a number of studies examining IDs in L2 perception and/or production. For example, research on L1 English speakers trained on Mandarin tonal contrasts found that IDs in learning outcomes were related to learners' pre-training inclination to attend to the most informative cue for Mandarin tones (i.e., pitch direction) as well as to the interaction between their basic pitch perception abilities and the type of training they underwent (Chandrasekaran et al. 2010;Perrachione et al. 2011). Effects of preexisting variation in cue weighting, which can be arbitrary (i.e., not determined by cue informativeness; Idemaru et al. 2012), were also observed in L1 Spanish learners of Dutch and L1 Korean learners of English (Wanrooij et al. 2013;Schertz et al. 2016); however, IDs in L2 perception were not predicted by learners' cue use in L2 production (Schertz et al. 2015). Other studies have linked IDs to variation in compactness and location of L1 categories Frauenfelder 2013, 2014), self-perception of one's own L2 speech (Baker and Trofimovich 2006), formation and structure of L2 representations (Golestani and Zatorre 2009;Hattori and Iverson 2009), L2 mispronunciation detection (Hanulíková et al. 2012), L2 working memory (Darcy et al. 2015), and different neural correlates (Raizada et al. 2010;Sebastián-Gallés et al. 2012; for a recent review, see Myers 2014).
Some of the work on IDs has focused on the specific target language in the present study-Korean. As reported for learners of other L2s, learners of Korean were found to evince wide variation in L2 perception, both in discrimination of L2 contrasts prior to extensive L2 exposure as well as in identification of L2 categories following weeks of L2 learning; however, pre-learning performance (i.e., phonetic sensitivity to the target L2 contrasts) showed no, or only a weak, correlation with postlearning performance (Jung and Kwon 2010;Kwon 2013). Notably, these results were found with learners representing two or more L1 backgrounds, including typologically diverse and genetically unrelated language families (e.g., Austronesian, Finno-Ugric, Indo-European, Sino-Tibetan). The majority of the data on Korean L2 speech, however, comes from studies focusing on L1 speakers of English and/or Mandarin (e.g., Chang 2010Chang , 2012Holliday 2014Holliday , 2015Schmidt 2007). These studies show, on the one hand, many dominant patterns of response in perceptual tasks such as crosslinguistic classification (e.g., L1 Mandarin learners tended to classify Korean lenis and aspirated stops as the same Mandarin category) yet, on the other hand, substantial IDs, especially in production (e.g., L1 English learners produced Korean stop contrasts in eight different ways). In the present study, we investigate the interaction of IDs with L1 transfer, focusing on Korean as the target L2. Next, we consider the specific target contrasts, each of which may pose difficulty for L2 learners.

Stop Laryngeal Contrasts in Korean
Korean is known for a typologically rare laryngeal contrast among three series of voiceless stops: LENIS, FORTIS, and ASPIRATED. This three-way laryngeal contrast occurs at four places of articulation in all (bilabial, denti-alveolar, and velar stops, as well as alveolo-palatal affricates) and has been described in terms of several different acoustic dimensions, including voice onset time (VOT), closure duration, and properties of the following vowel such as onset fundamental frequency (f0), voice quality, vowel duration, intensity buildup, and formant trajectories (Cho et al. 2002;Kagaya 1974;Park 2002aPark , 2002b; for a recent review, see Chang 2013).
The most widely-studied acoustic cues to the stop laryngeal contrasts in utterance-initial position are VOT and f0. Traditionally, fortis, lenis, and aspirated stops are described as showing, respectively, very short, medium-lag, and very long VOTs, as well as high, low, and very high onset f0 values. However, studies have noted a recent diachronic shift in the phonetic implementation of the three laryngeal categories, which is especially evident in Seoul Korean and in female speakers (Kang 2014;Oh 2011;Silva 2006aSilva , 2006b. In particular, the VOT of lenis stops has lengthened while that of aspirated stops has shortened, leading to highly overlapping VOT distributions; these developments have resulted in an increased role of f0 in realizing the lenis-aspirated contrast. Nevertheless, it is clear from other varieties of Korean as well as data from relatively careful speech that no two of the laryngeal categories have fully merged in VOT, such that there is still something resembling a tripartite VOT contrast in Korean. For example, VOTs of the female model talker in the Korean textbook used by participants in the current study (see Section 2.2) were distinct among the three stop series, as shown in Table 1 (see also the data on native Korean teachers in Chang 2010Chang , 2012. Table 1. VOT (in ms) of Korean stops as produced by the female model talker in participants' Korean textbook (Seoul National University Language Education Institute 2010). All measures represent VOT of stops in utterance-initial position before the low vowel /a/. Thus, regardless of the degree to which lenis and aspirated stops overlap in VOT for individual speakers, it is reasonable to posit that the role of VOT in distinguishing three laryngeal categories in Korean may place a higher burden on this cue in Korean than in languages with a two-way laryngeal contrast (e.g., voiced-voiceless), by far the most common type of spoken language (Maddieson 1984). L2 learners of Korean, likely to have been exposed only to a two-way VOT opposition in their L1, may therefore experience difficulty in acquiring Korean stop laryngeal contrasts, making this set of contrasts useful to examine in a study of L2 perceptual learning.

Vowel Contrasts in Korean
The vowel inventory of modern Korean as spoken in South Korea contains, for most speakers alive today, the monophthongs /i ɛ a u ɨ o ʌ/. A former length contrast has largely disappeared, leaving only short vowels in contemporary varieties (Kim-Renaud 2012). Older descriptions of Korean (e.g., Lee 1993;Sohn 1999;Yang 1992) also include the vowels /e y ø/; however, these vowels are no longer present in the inventory as distinct monophthongs for most speakers. The front rounded vowels have developed into diphthongs (i.e., /ɥi/, /wɛ/; see, e.g., Kim-Renaud 2012), while the former contrast between /e/ and /ɛ/ has merged to one mid front vowel phoneme (Ko 2009;Jang 2015, 2018). Thus, apart from several diphthongs (/ɥi wɛ wa wʌ jɛ ja ju jo jʌ ɨi/), modern Korean has a vowel space consisting of seven basic vowel qualities (Chang 2012;Yoon and Kang 2014).
Of interest in the current study are two vowel contrasts that depart from the typologically most common five-vowel inventory of /i e a o u/ (Maddieson 1984)-that between the high back (i.e., nonfront) vowels /u/ and /ɨ/ and that between the mid back vowels /o/ and /ʌ/. These contrasts can be described in terms of a difference in phonological labiality or roundedness, as only the former vowel in each pair alternates with labial stops in certain "p-irregular" verbal paradigms and is prohibited from occurring with the rounded on-glide /w/ in diphthongs (Kim-Renaud 1974Sohn 1999). Phonetically, however, the two contrasts are realized differently. Whereas /u/ and /ɨ/ differ primarily in terms of the second formant (F2) and less so in terms of the first (F1) or third formant (F3), /o/ and /ʌ/ show substantial differences in both F1 and F2 (Table 2;   In short, the two Korean contrasts /u/-/ɨ/ and /o/-/ʌ/, as relatively marked vowel contrasts, may pose a challenge for L2 learners, who may or may not have been exposed to similar contrasts in their L1. Both are therefore examined as target contrasts in the current study.

The Present Study
Given the scarcity of L2 speech research examining the roles of the L1 and IDs in tandem, we conducted a study of L2 perceptual learning of Korean with two goals: (1) examining how L1 transfer and IDs in phonetic sensitivity interact during L2 development, and (2) contributing empirical data on L2 acquisition of Korean, still an underinvestigated L2 despite its increasing popularity as a foreign language worldwide (Byon 2008;Byon and Pyun 2012;Gordon 2015).
With regard to the interaction of transfer with IDs, we hypothesized that L1 transfer in L2 perception would, over time, show an inverse U-shaped pattern instead of a more simple decline. Because the crucial difference between this hypothesis and previous models of L2 learning is in its description of the initial portion of L2 development as showing an increase (as opposed to decrease) in L1 transfer, we call this the TRANSFER RAMP-UP HYPOTHESIS. Note that this hypothesis does not differ from previous models (e.g., OPM; see Section 1.1) in predicting a decline in L1 transfer. Rather, the claim is that the timing of this decline-and, by implication, of L1 transfer effects to begin withincorporates a delay. That is, the influence of the L1 takes time to "ramp up", peaking at an intermediate, if still early, point in L2 development (see Figure 1). 1 Weak L1 influence at L2 onset thus allows certain non-L1 factors, such as IDs in phonetic sensitivity, to play a greater role earlier (i.e., in the PRE-RAMP-UP stage) than later in L2 development. Since the Transfer Ramp-up Hypothesis contradicts several models of L2 learning in positing that L1 transfer does not peak at L2 onset, it is worth explaining the motivation for this hypothesis in more detail. In short, this view follows from two principles alluded to above. The first is that language users maintain access to the perceptual resources that supported acquisition of their L1; the second is that the nature of L2 perception changes with L2 experience. Together, these principles set the stage 1 To be more specific, the timescale for the ramp-up is hypothesized to correspond to how long it takes to develop phonological knowledge of the target L2, which is likely to differ depending on the L2 as well as the L1 background of the learner (i.e., the particular pattern of phonological alignment between the L1 and L2). However, in general, we believe that the timescale of the ramp-up to high L1 transfer will be short-less than the five-week interval of L2 learning we observed (see Section 2.2)-because, especially in an instructed L2 context, the phonological system of the L2 (at least, the basic phonological inventory) is one of the first aspects of the language to be acquired, supported by learning the orthographic system in the case of alphabetic orthographies. This is why in Figure 1 the ramp-up is represented as a steep, as opposed to a shallow, incline.

L1 Transfer
for L2 perception ab initio, at which point nothing is yet known about the L2, to involve postponing L1 transfer in favor of a language-universal mode of perception. However, the development of a phonological framework for the L2 during L2 learning eventually provides a linguistic basis for mapping L2 speech to the L1, thus encouraging transfer at later stages of learning.
Following from the Transfer Ramp-up Hypothesis, two predictions were tested in the present study. In regard to ab initio perception of the L2, it was predicted that there would be no significant effect of learners' L1 background in a perceptual task not requiring knowledge of L2 phonological categories (P1). That is, performance in such a task was expected to reflect primarily IDs in phonetic sensitivity; therefore, taking ID dimensions as intrinsic properties of the learner, we expected to find a wide range in performance for learners of all L1 backgrounds, and little to no predictive value of linguistically relevant dimensions of a learner's L1. On the other hand, in regard to L2 perception after L2 learning, it was predicted that there would be a significant effect of L1 background, at least in a perceptual task drawing upon knowledge of L2 categories (P2).
Although these predictions were general, and thus applied both to stop and to vowel perception, the linguistically relevant dimensions of a learner's L1 were different for stops and vowels. In the case of stops, the centrality of VOT in distinguishing Korean stop laryngeal categories led us to focus on the number and type of VOT oppositions as the linguistically relevant dimension of the L1. In particular, given that Korean laryngeal categories are all characterized by positive VOTs, we predicted that L1s with VOT oppositions on the positive side of the VOT space (e.g., short-vs. longlag) would lead to the most beneficial transfer. In the case of vowels, we focused on the occurrence of the target L2 contrasts as the linguistically relevant dimension of the L1 (as a proxy for helpful acoustic oppositions in the back portion of the vowel space), predicting that L1s with one or both of the target contrasts would result in more beneficial transfer than L1s containing neither contrast.
Because this is one of the first studies to examine L2 perceptual learning of Korean by learners from multiple L1 backgrounds, we endeavored to broaden the empirical contribution, as well as the generalizability of the results, by adopting an inclusive approach to the research. As such, we admitted into the study all learners who were eligible rather than targeting a few specific L1 backgrounds. The resulting diversity of L1 backgrounds, some of which are represented by only one participant, naturally presents a challenge for examining the effects of L1 background; we address this by using a group-based analysis, as described in the next section.

Methods
The study received ethical approval from the Institutional Review Board (Pukyong National University) under approval code 1041386-202008-HR-47-02. To determine the number of participants to recruit for the study, we carried out a power analysis anticipating multiple regression models with up to eight coefficients apart from the intercept (accounting for group and category predictors consisting of up to three levels, along with interaction coefficients) and assuming 80% power, an alpha level of 0.05, and a model r 2 of 0.24 (based on exploratory modeling of pilot data). Using pwr.f2.test() in the pwr package (Champely 2018) in R (R Development Core Team 2020), we determined the target number of participants to be approximately 56, so we recruited participants until we reached a final sample of at least 56 participants.

Participants and Groups
A total of 59 adult L2 learners of Korean participated in the study, with two excluded from analysis due to failure to complete the study or outlier performance (i.e., lower than 10% accuracy) in the posttest portion (see Section 2.4.2). The 57 participants in the final analysis (43 female; Mage = 21.5 yr, SD 4.5) came from 10 L1 backgrounds: North American and British English (n = 22), Mandarin Chinese (n = 14), Finnish (n = 5), Swedish (n = 5), Slovenian (n = 3), Castilian Spanish (n = 2), European French (n = 2), European Portuguese (n = 2), Malay (n = 1), and Turkish (n = 1). They were sorted into groups according to crucial features of their L1 hypothesized to influence perception of the target L2 contrasts-namely, type of VOT contrast and type of vowel inventory.
For the stop study, learners were assigned to one of three groups based on the VOT opposition in their L1 stops: LEAD-SHORT (e.g., The lead-short group included the L1 speakers of Finnish, French, Malay, Portuguese, Slovenian, and Spanish; the short-long group, English and Mandarin; and the lead-long group, Swedish and Turkish. Using the VOT ranges in Keating (1984), the group classifications were made on the basis of published descriptions of the respective L1s, which suggest that the L1s in the lead-short, short-long, and leadlong groups contrast, respectively, lead (i.e., negative) vs. short-lag VOT (Cruz-Ferreira 1995; Fougeron and Smith 1993;Herrity 2000;Martínez-Celdrán et al. 2003;Shahidi and Aman 2011;Suomi et al. 2008), short-vs. long-lag VOT (Duanmu 2007;Labov et al. 2006;Ladefoged 1999;Roach 2004), and lead vs. long-lag VOT (Helgason and Ringen 2008;Öğüt et al. 2006). Background data on the groups are summarized in Table 3. The groups did not differ significantly in age [|t|s < 1.7, ps > 0.05]. For the vowel study, learners were assigned to one of two groups based on whether their L1 contained none or some of the target vowel contrasts-namely, /u/-/ɨ/ and /o/-/ʌ/. The NO-CONTRAST group included the L1 speakers of Finnish, Mandarin, Spanish, and Swedish; the SOME-CONTRAST group, English, French, Malay, Portuguese, Slovenian, and Turkish. As in the stop study, these group classifications were made on the basis of published descriptions of the respective L1s, which suggest that the L1s in the no-contrast group contain neither contrast (Bradlow 1995;Eklund and Traunmüller 1997;Iivonen and Harnud 2005;Lee and Zee 2003) whereas the L1s in the some-contrast group contain oppositions resembling one or both of the contrasts (Clynes and Deterding 2011;Escudero et al. 2009;Hillenbrand et al. 1995;Jurgec 2005;Kiliç et al. 2004;Strange et al. 2007). Because we relied on published descriptions and there is variation in the transcription conventions used for different languages, we took a somewhat liberal approach to looking for a given contrast; for example, for /u/-/ɨ/, we looked not only for /ɨ/ but also for /ɯ/. Table 4 summarizes the background data on the two groups in the vowel study, which did not differ significantly in age [t(55) = 1.239, p > 0.05]. Participants in all L1 groups tended to have considerable experience with additional languages, such that the majority had been exposed to a type of VOT contrast and/or vowel inventory different from that in their L1. All participants who were not L1 English speakers were also proficient in English (to a level sufficient for college coursework), and most (82%) reported knowledge of one or more other languages (e.g., Cantonese, German, Italian, Japanese, Polish, Russian, Serbo-Croatian, Tagalog, Thai). Given this self-reported multilingualism, we examined whether there were disparities in "diversifying" types of additional language (Ln) exposure across groups-which could potentially result in one group having an inherent advantage in either study-by coding participants in terms of the number of their Lns (including English) that would have provided exposure to a type of VOT contrast (or vowel inventory) different from that in their L1. Between-group comparisons on this dimension revealed no significant difference between the no-contrast and some-contrast groups in the vowel study [t(55) = 1.210, p > 0.05]; however, in the stop study, the short-long group reported significantly fewer "diversifying" Lns than the other two groups [|t|s > 3.1, ps < 0.01], while the lead-short and lead-long groups did not differ from each other in this respect [t(19) = 0.156, p > 0.05]. The disadvantage of the short-long group in terms of Ln exposure-largely due to the fact that, unlike the lead-short and lead-long groups, participants in this group did not have English to count as a "diversifying" Ln-was not, in fact, reflected in systematically lower L2 Korean performance (cf. Section 4); therefore, we assume that Ln exposure, at least operationalized in terms of number of Lns, did not have a detectable impact on the relative patterning of groups on L2 Korean.

L2 Learning Context
Participants were recruited in 2013-2014 from an international summer program at a university in Seoul, where they were enrolled in an elementary Korean course as well as other courses. The Korean course lasted five weeks and consisted of three hours of instruction per day for 3-4 days a week (18 total class meetings), amounting to a total of 54 contact hours. Consistent (≥90%) attendance was required to pass the course, and every participant received at least a passing grade, meaning that all received a similarly large amount of classroom instruction in the L2. Participants' non-language courses were conducted in English. Furthermore, students in the program resided in a campus dormitory and, besides contact with Korean students in extracurricular activities, communicated with other students primarily in English. Therefore, in spite of their residence in an L2 environment, participants were not generally immersed in the L2 outside of the classroom.
Although the participants came from different Korean classes split among a team of five instructors (average class size < 15), each class used the same syllabus and teaching materials and followed the same instructional format with English as the main language of instruction. Additionally, the allotment of classroom time to content was uniform across classes, with the first 10 hours allotted to orthographic and phonemic familiarization (i.e., reading and writing of the Korean alphabet). Thus, by the end of the first week of classes, students had mostly finished the portion of the course focused on spelling and pronunciation. Nevertheless, the posttest experiment was not conducted until five weeks later (i.e., at the end of the course) so that participants would be maximally comfortable with using Korean graphemes to provide their responses in this experiment.

Materials
Stimuli in both the pretest and posttest experiments were designed to test perception of the same Korean phonological contrasts tested in Jung and Kwon (2010): the three-way stop laryngeal contrast and two vowel contrasts, /u/-/ɨ/ and /o/-/ʌ/. Thus, the stimuli included all nine stops (lenis /p t k/, fortis /p* t* k*/, aspirated /p h t h k h /), which were put before /a/, and four lone vowels (/u ɨ o ʌ/).
The stimuli were prepared as follows. First, they were produced by a female native Korean speaker in a standard formal register within the carrier sentence /__-ka is*ɨpnita/ "There is (a) __". For example, for /k/, the target syllable was /ka/, and this was uttered in the sentence /ka-ka is*ɨpnita/. The set of 13 sentences (9 target stops + 4 target vowels) was arranged in a random order and recorded three times. The recordings were made at 22.1 kHz and 16 bps, in a soundproof booth using a CSL 4500 recording device and a Shure SM48 dynamic mic.
After the sentences were recorded, they were edited in Praat (Boersma and Weenink 2013) to isolate the target syllables. Each syllable was cut out from its carrier sentence and saved in a separate file. All tokens were then inspected auditorily, and one token of each syllable was chosen for the perception stimuli (usually the second token, unless this token sounded unnatural). The resulting 13 stimuli were then randomized and submitted for perceptual evaluation to five native Koreanspeaking judges to confirm their quality. The judges were asked to listen to and identify the stimuli (by transcribing them in Korean orthography), as well as rate their confidence in each identification judgment (on a 1-5 scale, 1 being "least confident" and 5 being "very confident"). The rate of correct identification was 100% and the mean confidence rating was 4.9, suggesting that the stimuli were highly intelligible, as well as comprehensible, and thus suitable for use in this study.

Procedure
Both the pretest and posttest experiments were carried out in a quiet classroom equipped similarly to the rooms for participants' Korean classes. Participants were tested in a group, using the classroom's audio speakers (which were mounted all around the room) and paper answer sheets. Test materials for the pretest and posttest are publicly accessible via the Open Science Framework (OSF) at https://osf.io/rkxdh/.

Pretest: Oddball Discrimination
The pretest experiment was conducted before the beginning of participants' Korean course and was meant to investigate the degree to which these ab initio learners could already perceive the target sounds. Because at this point in time participants had no knowledge of Korean, an identification task was eschewed in favor of a relatively difficult discrimination task (namely, oddball discrimination) with reduced stimulus variability (i.e., only one token of each target syllable) to prevent the task from being overly difficult. Prior to beginning the experiment, participants were told that the sounds they were going to hear were Korean speech sounds (so as to preclude a nonlinguistic mode of auditory perception), and they then completed a short practice session consisting of three trials with stimuli different from the test stimuli to familiarize them with the task.
Each trial of the oddball discrimination task presented a three-item sequence of auditory stimuli to the participant, who had to indicate which (if any) of the three items was different from the other two (i.e., the oddball). For example, one trial presented the sequence /ka/-/k*a/-/k*a/, and the correct answer on this trial was /ka/ (= item #1). The items in a test sequence were separated by an inter-item interval of 700 ms, and each sequence was played twice before participants had to respond. Participants responded by circling the number on their answer sheet corresponding to the serial position of the oddball or, alternatively, one of two other options: "all same" (if no item was different from the other two) and "all different" (if every item was unique). The inter-trial interval was 5 s. Trials were ordered randomly but blocked by contrast type, such that all trials testing laryngeal contrasts were presented before trials testing vowel contrasts. For each of the nine laryngeal contrasts and two vowel contrasts, two test sequences were included (e.g., /p*a/-/p h a/-/p*a/ and /p h a/-/p h a/-/p*a/ for the /p*/-/p h / contrast), with the position of the oddball distributed across the three possibilities in a ratio of 1:2:2.5. Each test sequence was iterated twice, for a total of 44 trials (11 target contrasts × 2 test sequences × 2 repetitions).
To check that the pretest was in fact reliable, we calculated split-half reliability by randomly dividing the dataset from the pretest into two subsets according to contrast type (i.e., each subset containing half of the items, randomly selected, for each of the five contrast types: lenis vs. fortis, lenis vs. aspirated, fortis vs. aspirated, /o/ vs. /ʌ/, /u/ vs. /ɨ/). This calculation suggested that the pretest had good reliability (Cronbach's α = 0.82).

Posttest: Forced-Choice Identification
The posttest experiment was conducted at the end of the Korean course and was meant to examine participants' (relative) degree of success in perceptually acquiring the target Korean sounds. By this point, participants had spent a considerable amount of time learning Korean, including the alphabet, so it was assumed they were familiar enough with its phonological categories and alphabet to perform an orthographic labeling task. Therefore, in contrast to the pretest, the posttest experiment used the identification paradigm to provide a measure more closely reflecting the task of real-world speech perception. Consequently, it should be noted that absolute accuracy levels in the pretest and posttest cannot be directly compared to each other; however, such a comparison is not needed to address our research questions, which concern relative levels of performance across L1 backgrounds rather than absolute accuracy. Prior to beginning this experiment, participants were once again told that they were going to hear Korean speech sounds, and then they completed a short practice session consisting of three trials using stimuli different from the test stimuli to familiarize them with the task.
The identification task was a ten-alternative forced-choice (10AFC) task for trials testing stops and a six-alternative forced-choice (6AFC) task for trials testing vowels. The response options for stop trials were /pa p*a p h a ta t*a t h a ka k*a k h a/ and the distractor option "other". The response options for vowel trials were /u ɨ o ʌ/ and the distractor options /a e/. On each trial, one auditory stimulus consisting of a target syllable in isolation was played twice before participants had to respond. Participants responded by circling the option on their answer sheet (written in Korean orthography) that matched what they had just heard. The inter-trial interval was 5 s. As in the pretest, trials were ordered randomly but blocked by contrast type, such that all trials testing stops were presented before the trials testing vowels. For each of the 13 target syllables, there were three repetitions (of the same token), for a total of 39 trials.
To check that the posttest was reliable, we again calculated split-half reliability by randomly dividing the dataset from the posttest into two subsets according to category type (i.e., each subset containing half of the items, randomly selected, for each of five category types: lenis stops, fortis stops, aspirated stops, mid vowels, high vowels). This calculation suggested that, like the pretest, the posttest also had good reliability (Cronbach's α = 0.82).

Statistical Analysis
In both the stop study and the vowel study, the likelihood of an accurate response in the pretest and posttest experiments was analyzed using mixed-effects logistic regression. To maximize the stability and generalizability of the final models, a two-stage modeling process was used, consisting of initial exploration of potential predictors (in models with lone fixed effects) followed by incremental model building, with model comparisons conducted via likelihood-ratio tests. This process resulted in four final models, one for the pretest and posttest in each of the two studies. All data from the pretest and posttest are publicly accessible via the OSF at https://osf.io/wgcen/.
In the first stage of modeling, we examined a series of models built with single fixed-effect predictors to get a sense of the informativeness of each fixed effect on its own. The random-effects structure in all models comprised intercepts by Participant and Stimulus, except in case there was only one stimulus per L2 category (i.e., in the model of posttest vowel identification). 2 The fixed effects explored in pretest models were the participant's L1 group (Group; treatment-coded with reference level "lead-short" in the stop study and "no contrast" in the vowel study) and the contrast tested on that trial (Contrast; treatment-coded with reference level "lenis-fortis" in the stop study and "/u/-/ɨ/" in the vowel study). The fixed effects explored in posttest models were Group (coded as above), the category tested on that trial (Category; treatment-coded with reference level "lenis" in the stop study and "/u/" in the vowel study), and the participant's overall accuracy in the pretest (PretestAcc; centered and standardized).
In the second stage of modeling, we built a full model on each dataset (e.g., pretest trials in the stop study) with a view toward providing a strong test of the presence of a Group effect. Starting from a base model containing the random-effects structure from above, all potential fixed predictors besides Group were examined first (in decreasing order of informativeness, as indicated by AIC values in the single-predictor models from the first stage of modeling), including all possible interaction terms. Once these predictors were tested and all those which failed to significantly improve the model were removed, the Group term was added to see if it significantly improved the model. Thus, for each dataset, the model improvement (or lack thereof) resulting from the addition of the Group term is interpreted as evidence of the presence/absence of an effect of L1 background.

Individual Differences at Pretest
To check whether IDs were confounded with L1 background (leading to unbalanced levels of IDs across the various L1-based groups), we first examined the range and distribution of variation in pretest performance (i.e., global discrimination accuracy) within each L1 group. Recall that there were 2 A more complex random-effects structure including random slopes was not used in either stage of modelling because models with such a random-effects structure failed to converge or, alternatively, showed signs of overparameterization and/or less stable fit. three groups in the stop study and two groups in the vowel study; therefore, there were three between-group comparisons to be made in the stop study and one in the vowel study.
As shown in Figure 2, there was some variation across groups in the shape of the distribution in pretest accuracies (e.g., in the stop study, a longer lower tail in the lead-short group vs. a shorter lower tail in the short-long group); however, paired comparisons via Welch-corrected two-sample ttests provided little evidence of a systematic disparity in the degree of IDs between any two groups. For the groups in the stop study, the range in pretest accuracy was 22-97%, 50-100%, and 47-89%, respectively, for the lead-short, short-long, and lead-long groups. None of the between-group differences were significant [|t|s < 2.026, ps > 0.05]. As for the groups in the vowel study, the range in pretest accuracy was 47-97% and 22-100%, respectively, for the no-contrast and some-contrast groups. Here, too, the between-group difference was not significant [t(54.7) = −0.502, p > 0.05]. Given these results, we conclude that IDs in phonetic sensitivity did not simply reflect participants' diverse L1 backgrounds. In fact, there were substantial, and not significantly different, levels of individual variation in all of the L1 groups discussed below.

Study 1: Stop Perception
Analysis of pretest performance in the stop study revealed considerable variation in discrimination accuracy across the different L2 laryngeal contrasts, but no systematic effect of the learner's L1 group. As shown in Figure 3, accuracy was well above chance level (=20%) in all cases, but tended to be lower on the lenis-fortis contrast (Macc = 58%) than on the other two contrasts (fortisaspirated: Macc = 85%; lenis-aspirated: Macc = 81%). This was reflected in the modeling results, which revealed a significant effect of Contrast [χ 2 (2) = 13.769, p < 0.01] but no effect of Group [χ 2 (2) = 5.604, p > 0.05]. The final model (shown in Table 5) confirmed that, compared to the lenis-fortis contrast (represented in the intercept), the fortis-aspirated and lenis-aspirated contrasts were both significantly more likely to be discriminated accurately [βs > 1.476, ps < 0.001].  Analysis of posttest performance in the stop study revealed both variation in identification accuracy across the different L2 laryngeal categories as well as an advantage of the short-long group-and, to a lesser extent, the lead-short group-over the lead-long group. As shown in Figure  4, with the exception of the lead-long group on fortis stops, accuracy was well above chance level (i.e., 10%); nevertheless, accuracy tended to be lower on fortis stops (Macc = 51%) than on the lenis (Macc = 68%) or aspirated stops (Macc = 62%), especially for the lead-short and lead-long groups. These patterns were reflected in the modeling results, which showed no effect of PretestAcc [χ 2 (1) = 0.834, p > 0.05] but a significant effect of Category [χ 2 (2) = 13.770, p < 0.01], of Group [χ 2 (2) = 8.564, p < 0.05], and of the Category × Group interaction [χ 2 (4) = 36.291, p < 0.001].  The nature of the interaction between Category and Group can be seen in the final model of posttest accuracy (Table 6). Relative to accuracy on lenis stops, the lead-short group was significantly less likely to be accurate on fortis stops [β = −1.394, p < 0.001] but not aspirated stops [β = 0.163, p > 0.05], and the lead-long group showed a similar pattern, reflected in non-significant interaction coefficients. The short-long group, however, showed a different pattern, with a much smaller decrease in accuracy on fortis stops vis-a-vis lenis stops and a slight decrease in accuracy on aspirated stops (contrasting with the slight increase observed in the other two groups); these differences were reflected in a significant positive interaction coefficient for fortis stops [β = 0.860, p < 0.05] and a significant negative interaction coefficient for aspirated stops [β = −0.810, p < 0.05].  In addition to analyzing accuracy, we also inspected the pattern of errors in the posttest to check whether they involved laryngeal category, place (of articulation), or both. As shown in Figure 5 (see also Figures A1-A3 in Appendix A), errors for every stimulus mostly involved laryngeal category only. The two exceptions were /p h a/ and /ta/, which each elicited a high proportion of errors involving place. Crucially, however, errors involving laryngeal category only were by far the most common error type, both overall (83%) and across groups (77-88%). . Each cell shows the percentage of all errors on the given stimulus represented by the given response (rows may not add to 100% due to rounding); the most common error type for each stimulus is bolded. The total number of errors for each stimulus (across all groups) is shown in parentheses.
Although our modeling results suggested that pretest performance was not a predictor of posttest stop perception, we further examined the relationship between the two through Pearson's correlations to provide additional evidence of the (non)predictiveness of pretest performance. These analyses revealed no significant correlations between pretest and posttest accuracy on stops, overall [r(55) = 0.13, p > 0.05] or within any individual group: lead-short [r(13) = 0.10, p > 0.05], short-long [r(34) = −0.18, p > 0.05], or lead-long [r(4) = 0.64, p > 0.05] (see Figure 6). In short, results of the stop study produced some evidence of L1 transfer, but crucially only following extensive L2 exposure; moreover, there was no evidence of a link between preexisting phonetic sensitivity (as measured in the pretest) and L2 stop perception following L2 learning (as measured in the posttest).
(a) (b) Figure 6. Individual posttest accuracy in the stop study by pretest performance, with regression lines (a) over all L1 groups together, and (b) over each L1 group separately. Points are jittered; the shaded areas indicate the 95% confidence interval around the regression line.

Study 2: Vowel Perception
Analysis of pretest performance in the vowel study revealed ceiling-level discrimination accuracy for both L2 vowel contrasts and across both L1 groups. As shown in Figure 7, accuracy was well above chance in all cases and did not differ between the /u/-/ɨ/ and /o/-/ʌ/ contrasts (Macc = 93% for both). Modeling results showed no effect of Contrast [χ 2 (1) = 0, p > 0.05] or of Group [χ 2 (1) = 0.040, p > 0.05], so the final model [N = 456, log-likelihood = −83.5] contained no fixed predictors. Consistent with the high accuracies in Figure 6, the intercept in this model indicated that L2 vowel contrasts were, overall, discriminated with significantly better than 50-50 odds [β = 6.983, z = 4.456, p < 0.001]. Compared to pretest performance, there was more variation in posttest performance, but accuracies were generally high, especially relative to stop identification rates. There were some differences among vowels, with /ɨ/ showing the highest accuracies and /ʌ/ the lowest. Crucially, there was also a group difference: the some-contrast group outperformed the no-contrast group, both overall (no contrast: Macc = 74%; some-contrast: Macc = 88%) and on every vowel category, as shown in Figure 8. Thus, modeling results showed a significant effect of Category [χ 2 (3) = 52.729, p < 0.001] and of Group [χ 2 (1) = 4.233, p < 0.05] though no effect of the Category × Group interaction [χ 2 (3) = 0.955, p > 0.05]. Additionally, as in the stop study, there was no effect of PretestAcc [χ 2 (1) = 0.395, p > 0.05]. The final model (Table 7) indicated that, compared to /u/, /ɨ/ was significantly more likely to be identified accurately [β = 2.090, p < 0.001] while /ʌ/ was significantly less likely to be identified accurately [β = −0.768, p < 0.05]; furthermore, the some-contrast group was significantly more likely to be accurate than the no-contrast group [β = 1.210, p < 0.05].  As in the study of stop perception, here too we inspected the pattern of errors in the posttest to see which vowels were most confusable with each other. As shown in Figure 9 (see also Figures A4 and A5 in Appendix A), /u/ was most often confused with /o/, /ɨ/ with /u/ (although there were very few errors on /ɨ/ overall), /o/ with /ʌ/, and /ʌ/ with /a/. Relatively few errors involved confusion with the vowel /ɨ/ or /e/. In short, vowel identification errors tended to involve confusion with a vowel sharing a specification for roundedness/labiality and/or height, which was unsurprising; however, predominant vowel confusions were not symmetrical (e.g., /o/ was most often misidentified as /ʌ/ but /ʌ/ was not most often misidentified as /o/).  (vertical = stimuli, horizontal = responses). Each cell shows the percent of all errors on the given stimulus represented by the given response (rows may not add to 100% due to rounding); the most common error type for each stimulus is bolded. The total number of errors for each stimulus (across all groups) is shown in parentheses.
As in the stop study, we further examined the relationship between pretest performance and posttest vowel perception via Pearson's correlations. In line with the modeling results as well as the lack of correlations seen in the stop study, these analyses revealed no significant correlations between pretest performance and posttest accuracy on vowels, overall [r(55) = 0.09, p > 0.05] or within either individual group: no-contrast [r(24) = 0.16, p > 0.05] or some-contrast [r(29) = −0.002, p > 0.05] (see Figure 10). In short, results of the vowel study were consistent with those of the stop study: again, we found evidence of L1 transfer only in the posttest and no evidence of a link between preexisting phonetic sensitivity and L2 vowel perception after L2 learning.

Discussion
In summary, evidence of L1 transfer was not found in L2 perception of Korean ab initio, but was found after a prolonged period of L2 learning. Results of the pretest showed, as expected, substantial individual differences (IDs) in learners' preexisting phonetic sensitivity to the target stop and vowel contrasts, but no systematic effect of the crucial L1 features hypothesized to affect L2 perception of these contrasts. In contrast, results of the posttest showed a significant effect of these L1 features, both for stops and for vowels, but no effect of phonetic sensitivity as reflected in pretest performance. These findings thus provide support for the first part of the Transfer Ramp-up Hypothesis: rather than peaking at L2 onset, L1 influence takes time to set in during L2 perceptual development (at which point the relative role of IDs in phonetic sensitivity may become smaller).
Although these findings converge with the results of previous studies on L2 Korean (Jung and Kwon 2010;Kwon 2013), they also diverge from the literature supporting the Perceptual Assimilation Model (PAM; Best 1994Best , 1995, much of which also examined naive listeners of an L2 yet found patterns of ostensible L1 influence in discrimination of L2 contrasts. In our view, the disparity between the current results and PAM-framed results is likely due to different formulations of the dependent measure. That is, whereas studies testing PAM have generally examined L2/nonnative perception by individual contrast (because PAM makes different predictions for different types of L2 contrasts), the present study, which was not focused on testing PAM, evaluated L2 perception mainly in terms of an overall likelihood of accuracy. In fact, when the (pretest) discrimination results are separated out by contrast, there is some indication of a possible L1 effect for certain contrasts (e.g., the short-long group showing the greatest advantage over the lead-short group on the Korean fortis vs. aspirated contrast; see Figure 3), even if not for others (e.g., the no-contrast group showing similar, ceiling-level performance as the some-contrast group on both Korean vowel contrasts; see Figure 7).
Thus, while we interpret the current findings as support for the Transfer Ramp-up Hypothesis, we are also careful to point out that our observation of a low degree of L1 transfer at L2 onset (vis-avis later points in L2 development) does not mean that there is no L1 influence at L2 onset. It is not possible to draw such a conclusion on the basis of the evidence in this study, and the Transfer Rampup Hypothesis, moreover, does not make this extreme claim. The core claim is rather that there is an increase in L1 transfer during the early part of L2 development. Because this does not entail the total absence of L1 effects at L2 onset, it stands to reason that the L1 may indeed have a detectable effect on the ab initio perception of specific L2 contrasts (e.g., "Single Category" contrasts that assimilate to the same L1 sound; Best 1994), which may amount to an L1 effect that is weak or undetectable overall when multiple L2 contrasts are considered together. The locus of the difference between the present study and previous theoretical frameworks, therefore, is in the observed trajectory of L1 transfer. In particular, our findings contradict the view-made explicit in the Ontogeny Phylogeny Model (OPM) and left implicit in other frameworks such as PAM(-L2)-that L1 influence is at its height at the start of L2 development and only decreases from that point.
Apart from the implications for theories of L1 transfer, the current findings also have implications for views of IDs in L2 acquisition. In particular, they make two contributions to our understanding of IDs in L2 perceptual development. First, given that no effect of L1 background was found in the pretest despite ample variation among learners, the pretest results provide evidence that the magnitude of IDs in phonetic sensitivity to L2 contrasts-at least, to the target L2 Korean contrasts examined in this study-can be relatively large compared to the magnitude of L1 effects. These results thus argue in favor of making the analysis of IDs central to studies of L2 perception (as is becoming increasingly common in the L2 speech literature; see Section 2.2) because IDs may actually prove to be a more powerful predictor of L2 perceptual behavior (at certain points in L2 development) than the more extensively examined factor of L1 background. Second, together with the pretest results, the posttest results provide evidence that the effect of IDs changes over the course of L2 development. Crucially, IDs in phonetic sensitivity were found not to predict L2 perceptual accuracy after L2 learning, which supports a view of L2 perception, at any given point during L2 development, as the outcome of a dynamic interaction between L1 transfer and IDs.
In light of these theoretical implications, it would be remiss of us not to mention the limitations of the present study, which motivate a number of different directions for future research. First, as alluded to in Section 2.4, the composition of the participant sample, influenced in large part by worldwide trends in who elects to study Korean as an L2, was unevenly distributed in terms of L1 backgrounds, necessitating an approach that grouped L1s together rather than analyzing them separately. Although adequate for addressing research questions related to broadly formulated phonological features of an L1, this approach does not lend itself to a nuanced examination of features that may be more language-specific (e.g., palatalization in Russian; pharyngealization in Arabic). It would, therefore, provide additional insight to replicate this type of study with a larger sample in which all L1 backgrounds are robustly represented, allowing for analyses that focus on specific L1s.
Second, as mentioned in Section 2.4.2, the study was designed to use different task paradigms (discrimination and identification) for the pretest and posttest, considering both ecological validity and appropriateness for different stages of L2 learning. Although both paradigms are widely used to measure perceptual ability, and the measures from these paradigms have been shown to be highly correlated with each other at the same point in time (e.g., Pearson's r > 0.6 for L2 discrimination and identification of Mandarin tones; Bowles et al. 2016), the fact remains that discrimination performance and identification performance cannot be directly compared to each other, which prevents us from being able to draw conclusions about participants' L2 perceptual development that are truly longitudinal. Thus, it would be useful in future research to collect longitudinal data from the exact same task, which, with some design modifications, could be made appropriate for learners at different stages of L2 development (e.g., use of iconic images for response options in ab initio L2 identification; see Bowles et al. 2016).
Third, this study included observations of two time points in L2 development, whereas at least three are needed to fully demonstrate the inverse U-shaped pattern of L1 transfer postulated in the Transfer Ramp-up Hypothesis (Figure 1). That is, we have only assumed the part of the pattern in which L1 transfer declines at later points in L2 development on the basis of prior findings in the literature (see Section 1.1), since it was not possible to observe a third, later time point in the case of the current participants (who did not necessarily continue learning Korean after the end of their Korean language course). In future work, it would therefore be helpful to track learners further into their trajectory of L2 learning (e.g., in a year-long course of L2 instruction) so as to provide direct evidence of the hypothesized post-ramp-up decline in L1 transfer.

Conclusions
In closing, we would like to highlight one of the chief challenges of designing developmental perceptual research such as the present study, and outline a possible approach to addressing this challenge in future research. In our view, truly longitudinal perceptual data (i.e., data from the same individual completing the same perceptual task, including the same auditory stimuli, at different points in time) may not be the ideal data for investigating perceptual change over time, because it is not only development, but also extraneous factors such as familiarity with (or memory of) the test stimuli, that may lead to listeners performing differently in the same task across two points in time. The challenge for future research, therefore, is to identify and control for such extraneous factors appropriately. In the case of test stimuli, for example, one way of addressing the issue of familiarity/memory would be to use similarly constructed, but non-identical, stimulus sets at different time points, normed in advance to be equivalently difficult. Studies designed "pseudolongitudinally" in this manner would be better able to show change that could be confidently interpreted as reflecting development.
Despite the challenges of incorporating a longitudinal design in developmental perceptual research, however, the need for longitudinal studies in working toward a theory of L2 perceptual learning that incorporates a role for both L1 transfer and individual differences cannot be overemphasized. Given that cross-sectional studies, by their very nature, are ill-equipped to examine the temporal dynamicity of individual difference effects, longitudinal studies, including both the laboratory training approach as well as the classroom learning approach taken in the present study, are uniquely positioned to shed new light on the roles, and interaction, of language-specific and personally-specific variables in L2 perceptual development. Figure A1. Confusion matrix of errors by the lead-short group in the posttest for stop items (vertical = stimuli, horizontal = responses). Each cell shows the percent of all errors on the given stimulus represented by the given response (rows may not add to 100% due to rounding); the most common error type for each stimulus is bolded. The total number of errors for each stimulus (across all participants in the group) is shown in parentheses. Figure A2. Confusion matrix of errors by the short-long group in the posttest for stop items (vertical = stimuli, horizontal = responses). Each cell shows the percent of all errors on the given stimulus represented by the given response (rows may not add to 100% due to rounding); the most common error type for each stimulus is bolded. The total number of errors for each stimulus (across all participants in the group) is shown in parentheses. Figure A3. Confusion matrix of errors by the lead-long group in the posttest for stop items (vertical = stimuli, horizontal = responses). Each cell shows the percent of all errors on the given stimulus represented by the given response (rows may not add to 100% due to rounding); the most common error type for each stimulus is bolded. The total number of errors for each stimulus (across all participants in the group) is shown in parentheses. Figure A4. Confusion matrix of errors by the no-contrast group in the posttest for vowel items (vertical = stimuli, horizontal = responses). Each cell shows the percent of all errors on the given stimulus represented by the given response (rows may not add to 100% due to rounding); the most common error type for each stimulus is bolded. The total number of errors for each stimulus (across all participants in the group) is shown in parentheses. Figure A5. Confusion matrix of errors by the some-contrast group in the posttest for vowel items (vertical = stimuli, horizontal = responses). Each cell shows the percent of all errors on the given stimulus represented by the given response (rows may not add to 100% due to rounding); the most common error type for each stimulus is bolded. The total number of errors for each stimulus (across all participants in the group) is shown in parentheses.