1. Introduction
Leading models of second language (L2) phonological acquisition generally agree that L2 speech learning is significantly influenced by perceptual biases stemming from the phonetic system of a learner’s first language (L1) (SLM/SLM-r; Flege, 1995; Flege & Bohn, 2021; PAM/PAM-L2; Best, 1995; Best & Tyler, 2007). For instance, the SLM/SLM-r posits that the learnability of L2 sounds is determined by the perceived phonetic distance between the L2 sound and the closest L1 phonetic category. Sounds that are more phonetically distinct from L1 categories are predicted to be easier to learn, because new categories are more likely to be formed for them. Conversely, L2 sounds that are similar to L1 sounds are predicted to be more difficult to acquire. L2 learners may struggle to detect subtle phonetic differences between such L2 and L1 sounds, so they map these L2 sounds onto existing L1 categories and consequently perceive and produce them inaccurately. The L2 speech models further agree that L2 learners’ perception of phonetic differences undergoes gradual, continuous refinement with increasing L2 exposure. This process is thought to affect both L1 and L2 categories, which co-exist in a shared phonological space (termed a ‘common phonetic space’ in Flege & Bohn, 2021) and inevitably influence each other.
Although L2 speech models were not initially developed with third language (L3) learners in mind, i.e., learners who use three or more languages in everyday life, their core predictions offer a useful heuristic for investigating the more complex scenario of L3 speech learning. Existing research in L3 speech perception supports this extension. L3 learners can accurately discriminate novel sounds that do not exist in their previous languages (Wrembel et al., 2019; Balas et al., 2019; Liu & Lin, 2021), but they struggle to perceive L3 sounds that are phonetically similar to either L1 or L2 counterparts (Liu & Lin, 2021; Mun, 2022). Furthermore, perception of challenging sound contrasts has been found to improve noticeably within just a few months of learning the new language, particularly among adult L3 learners (Cal & Wrembel, 2023; Nelson, 2020). Evidence also suggests that L3 learners map L3 sounds onto both L1 and L2 categories, reflecting the notion of a common phonetic space (Wrembel et al., 2019, 2020; Mun, 2022; Parrish, 2022; Ren, 2022). For example, Wrembel et al. (2019) found that beginner L3 learners assimilated Polish vowel sounds not only onto L1 German vowel categories but also, and primarily, onto L2 English ones. Similarly, Parrish (2022) observed that Spanish–English and English–Spanish bilingual adults categorized L3 German and French vowels, which they had not yet learned, using both L1 and L2 categories, with no clear bias toward either. Additionally, there is some evidence that newly forming or formed L3 phonetic categories can influence previously established L2 categories (Nelson, 2020), as well as L1 categories (Sypiańska, 2016).
However, emerging evidence from L3 perception studies also challenges the models of L2 speech learning regarding the non-linearity of perceptual development (Nelson, 2020; Balas et al., 2019; Wrembel et al., 2020). The L2 speech learning models generally assume that L2-to-L1 mapping patterns stabilize as learners receive relevant L2 phonetic input, which implies a linear refinement of target phonetic categories over time. According to the SLM-r (Flege & Bohn, 2021), L2 learners progressively overcome L1 perceptual biases by reaching specific ‘landmarks’ in L2 speech learning, such as adopting target-like cue weighting in L2 sound perception. This ultimately leads to the formation of accurate L2 phonetic categories. The model envisions no changes in L2-to-L1 mapping for learners who are no longer exposed to new phonetic input. In contrast, recent longitudinal studies on L3 speech perception development reveal that this process can be non-linear, dynamic, and self-organizing. It thus aligns more closely with the principles of Complex Dynamic Systems Theory (CDST; Larsen-Freeman, 2017; de Bot, 2012; Gut et al., 2023). These studies, discussed in greater detail in Section 2 below, suggest that L3 speech perception development does not necessarily follow a linear trajectory. Instead, it may involve fluctuations and shifts as multilinguals navigate the complexities of speech learning in multiple languages over time.
CDST thus seems to offer a particularly apt framework for describing and understanding the process of multilingual speech development (de Bot, 2012; Huang et al., 2020). It emphasizes the interconnectedness of various subsystems, which in this context include the learner’s L1, L2, and L3 sound systems. These subsystems do not function in isolation. Rather, they dynamically interact and influence each other, resulting in a non-linear and emergent development of phonetic categories in all of the learner’s languages. In this connection, CDST stresses the crucial effect of initial conditions. In the context of learning a new L3 sound system, these conditions comprise the learner’s perceptual representations of both the L1 and the L2 at the onset of L3 learning. Attractors, such as the perceptual targets of the L1 and/or the L2, are posited to anchor the learner’s perception of target L3 sounds and sound contrasts. However, these attractors are not fixed. They can be reshaped by the learner’s experience with, and exposure (or lack thereof) to, their multiple languages. Changes in individual learner characteristics, such as L2 and/or L3 motivation, can also significantly affect the trajectory of a multilingual’s speech perception development. For instance, a highly motivated multilingual might be more likely to destabilize existing attractors, allowing new phonetic categories to emerge that better align with the target L2 and/or L3 sounds. Thus, CDST provides a comprehensive lens for describing and understanding the complexity of multilingual speech development. It captures the dynamic interplay of cognitive, social, and psychological factors that shape the formation of phonetic categories across a multilingual’s languages, and thus their speech perception performance over time.
The theoretical assumptions underlying CDST mean that studies carried out within this framework require a distinct set of research methods. Rather than focusing on individual variables, CDST researchers explore patterns of co-variability, analyzing as many interacting subsystems as possible over time (Verspoor et al., 2011; Lowie, 2017; Gut et al., 2023). Studies in the CDST framework therefore typically employ a process-oriented method. The research design is longitudinal, involving learner groups or individual learners from whom data are collected at numerous, densely spaced data points. The analyses focus on capturing dynamic changes in language development, and intra-learner variability is recognized as both a driving force for and a key indicator of ongoing developmental processes. One method frequently used in such studies, and also applied in the present study, is moving min–max analysis. This technique visualizes a learner’s developmental trajectory by showing the range of variability in performance within successive time windows. Similarly, moving correlation analyses can be used to show the synchronicity between developments in a learner’s different subsystems. They provide insights into whether the relationships between the subsystems are competitive or supportive. To date, CDST-based research on phonological development beyond the target language remains scarce, and no studies have specifically addressed multilingual perceptual development.
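The moving correlation idea can be illustrated with a minimal Python sketch. It computes a Pearson correlation within each successive window of two aligned series, for example hypothetical L2 and L3 accuracy scores at shared testing points. Several choices are assumptions made for this sketch only: the function names, the leading-window alignment, and returning `None` for windows in which one series does not vary. The sketch does not describe the tools used in the cited studies. Following the interpretation in the text, positive coefficients would point to a supportive relationship between subsystems and negative ones to a competitive relationship.

```python
import math
from statistics import fmean
from typing import Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson r for two equal-length samples; None if either has zero variance."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # r is undefined when one series is flat within the window
    return sxy / math.sqrt(sxx * syy)


def moving_correlation(
    x: Sequence[float], y: Sequence[float], window: int
) -> list[Optional[float]]:
    """Correlate x and y within each run of `window` consecutive data points."""
    if len(x) != len(y):
        raise ValueError("series must be aligned on the same data points")
    if not 2 <= window <= len(x):
        raise ValueError("window must be between 2 and the series length")
    return [
        pearson(x[i : i + window], y[i : i + window])
        for i in range(len(x) - window + 1)
    ]


# Toy, made-up scores (not data from this study).
l2_scores = [0.70, 0.65, 0.60, 0.62, 0.55]
l3_scores = [0.58, 0.63, 0.70, 0.66, 0.72]
for start, r in enumerate(moving_correlation(l2_scores, l3_scores, window=3), start=1):
    if r is None:
        print(f"points {start}-{start + 2}: r undefined")
    else:
        kind = "supportive" if r > 0 else "competitive" if r < 0 else "unrelated"
        print(f"points {start}-{start + 2}: r = {r:+.2f} ({kind})")
```

With very small windows, each coefficient rests on only a few points and can swing sharply. The resulting series is therefore better read as a visual indicator of shifting synchronicity than as a set of stable estimates.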
2. Previous Studies on L3 Speech Perception Development
Research into L3 perception development has begun to uncover how multilinguals process and distinguish sounds in their multiple languages. For instance, Balas et al. (2019) examined two groups of L3 learners’ perception of rhotic sounds, which are realized differently in each of their three languages. All learners had English as their L2, with German and Polish either as L1 or L3, which allowed the authors to examine L1 effects on L2 and L3 rhotic perception. A forced-choice goodness task was administered after 5 and 10 months of instructed L3 learning. In this task, both groups performed at ceiling in the perception of L2 English rhotics, which was attributed to the learners’ relatively stable L2 rhotic category after around six years of instructed learning. However, L1 German adolescents consistently perceived L3 Polish rhotic sounds more accurately, and responded faster in their Polish rhotic naturalness judgements, than L1 Polish adolescents did for German rhotic sounds. Additionally, the Polish adolescents’ perception accuracy dropped significantly between the two testing times. The authors attributed the learner group differences and perceptual (in)stability to the markedness of the German uvular fricative compared to the Polish alveolar trill. In a follow-up study, Wrembel et al. (2020) further examined the L2 and L3 developmental trajectories of the L1 Polish group, this time focusing on the perception of rhotics as well as final obstruent (de)voicing. Overall, the study reconfirmed higher perception accuracy and shorter processing times in L2 English than in L3 German. The results thus suggested that cross-linguistic influence (CLI) was low in L2 English perception, especially for rhotics. The study also showed that perception accuracy was generally higher for rhotics than for final obstruents in both the L2 and the L3. In fact, the perception of final obstruent devoicing in the learners’ foreign languages was at chance level at both testing times, with no improvement for any of the learners. Rhotic perception, by contrast, was significantly higher in both languages, with individual speakers reaching ceiling performance. This was interpreted as evidence of a high level of L1-based CLI for final obstruents, even in L2 English, despite years of lessons in that language. The results were explained by the lower perceptual salience of final obstruent (de)voicing compared to the distinct articulations of rhotics in the three languages under investigation. Regarding L3 German perception development, the adolescents perceived the German rhotic sound more accurately after 5 months of learning than after 10 months. This suggests ongoing restructuring of perceptual categories within the first ten months of exposure to a new language and echoes the findings of Balas et al. (2019). However, this restructuring was shown to be feature-dependent, as changes were found only in the perception of rhotics, not in final obstruent (de)voicing. This is largely in line with a more recent study by Balas et al. (2023), who examined adult Polish multilinguals over nine months of L3 learning. They reported little overall change in how these learners assimilated a range of L2 English and L3 Norwegian vowels to L1 Polish categories. However, the latter study also showcased individual differences in L2 and especially L3 perception development for certain vowels. While some L3 learners exhibited stable assimilation patterns, others showed great variability in cross-linguistic vowel perception over time.
Nelson (2020) examined adolescent and adult L2 and L3 learners’ perception of the v–w contrast, which is present in their L2 but not in their L1, after one, three, five, and ten months of L3 learning. In an ABX task, adults generally performed better than adolescents in both languages and showed a slight upward trend in both their L2 and L3 perception. For adolescents, perception development was more non-linear. Surprisingly, at the first testing time, i.e., after only a few hours of L3 input, the young multilinguals discriminated the contrast more accurately and faster in the L3 than in the L2. The author hypothesized a positive ‘novelty effect’ for the L3 learners. According to this hypothesis, learners at the very beginning of L3 learning may not automatically assimilate novel sounds to their pre-existing categories (whether L1 or L2). They may instead draw on available acoustic cues, and possibly on different processing and phonological skills, at that stage of L3 speech learning. Regarding L2 perception development, the young learners showed a drop in accuracy after around 10 weeks of L3 learning. This was interpreted as a reverse cross-linguistic effect in the form of temporary ‘perceptual confusion’. Such confusion may manifest as competition between L2 and L3 phonetic representations, whereby acquiring the L3 affects the stability of L2 categories. However, after ten months of L3 learning, both the novelty effect and the negative cross-linguistic effect had disappeared for the young L3 learners, who perceived the contrast in their L2 and L3 with similar accuracy. The present study extends the investigation of adult L3 learners’ performance, focusing specifically on their perception of L2 and L3 final obstruents. Wrembel et al. (2020) reported low accuracy and minimal perception development for this phonological feature in adolescent L3 learners. By contrast, Nelson (2020) demonstrated that adult L3 learners can leverage their greater cognitive maturity and linguistic experience in multilingual speech development. Because most previous longitudinal studies have focused on young L3 learners, it is particularly compelling to examine adult learners in this context.
Thus, the few existing studies of L3 perception development suggest that multilinguals’ phonological space is reshaped relatively early in the course of learning a new L3. Category boundaries can be expanded to accommodate L1, L2, and L3 categories of similar phonetic types, while new L3 categories may be formed for novel phonetic types. Initial sensitivity to phonetic contrasts may deteriorate over time due to language interactions, and it may be modulated by the learnability of the sounds concerned, including their markedness. In other words, speech perception in L3 learners follows a non-linear developmental path. Nevertheless, L2 perceptual categories tend to be more stable than newly forming L3 categories in learning contexts where multilinguals receive continuous and equal amounts of phonetic input in both of their foreign languages.
Despite the valuable insights gained from these studies, many have been limited by sparse data collection points. Such designs may miss important fluctuations and developmental stages in the acquisition of L3 phonology, as shown by Nelson (2020; cf. Rothman et al., 2019). The present study addresses this limitation by employing a longitudinal design with densely spaced data collection points, which allows for a more nuanced portrayal of the temporal dynamics of L3 phonological development in multilingual learners. In addition, the dense speech perception data were contextualized with information on individual learners’ language use and input, as well as their affective engagement with learning their non-native languages. This approach made it possible to characterize the process of an individual multilingual’s speech perception development over time, and also to better understand some of the underlying reasons for these changes.
4. Methods and Materials
4.1. Participants
Seven adult multilinguals (aged 21–39; 5 females, 2 males) participated in this study. All participants were L1 German speakers with self-assessed upper-intermediate/advanced proficiency in L2 English. They had daily informal contact with L2 English (age of onset of learning, AOL = 9 years) and attended a 90-min beginner course in L3 Polish twice a week, taught by a native speaker. Polish was a new language for all participants, and heritage speakers and early bilinguals were excluded from the final dataset. In addition, five of the seven adults had some knowledge of French and/or Spanish, which they had acquired in secondary school. These learners reported minimal or no use of, or exposure to, these additional languages in their daily lives throughout the study (with the exception of SYLÜ08, see below). At T4, all but one participant reported little or no improvement in their English or, where applicable, in their French and/or Spanish (again, see SYLÜ08 below). By contrast, most reported that their Polish had improved very much over the past year. Most participants were university students, three of whom majored in English studies.
Table 2 provides a detailed overview of the participants’ language learning profiles.
Three of the seven adults participated in dense data collection (DDC; see Section 4.2 for details). The two females, REBA03 (aged 28) and ROGI18 (aged 32), took Polish classes to be able to communicate with their Polish partners’ families (both spoke German with their partners). REBA03 was a university librarian, and ROGI18 was in the second year of training to become a secondary school teacher of English and German. The male participant, SYLÜ08 (aged 22), was a politics student taking Polish classes in preparation for a four-month Erasmus stay in Poland. He remained in Germany for the first two group testing sessions and the first two DDC points. The subsequent two DDC points and the third group testing session took place during his stay in Poland, and the final main testing session was conducted shortly after his return to Germany. This participant was thus expected to show more prominent developmental changes in his L2 and/or L3 perception due to the altered language learning environment. While immersed in the Polish-speaking environment, SYLÜ08 also reported increased exposure to and use of English and French. His use of English rose because of the international nature of his exchange program. His use of French rose because it happened to be the most ‘convenient’ language he and his mentor at the host university had in common. When asked at T4 how much his language skills had improved over the past year, he responded “a little” for English, “much” for French, “not at all” for Spanish, and “very much” for Polish (on a scale from 0 = not at all, 1 = a little, 2 = much, 3 = very much).
The background information was collected in an extensive language background interview conducted in German at the first (T1) and last (T4) main testing times. This structured interview covered many aspects considered potentially relevant to the participants’ language development. These included their language learning history, self-assessed proficiency in all of their non-native languages, frequency and context of language use, perceived (phonological) similarity between all of their languages, and language learning attitudes. The participants received small financial compensation for taking part in the study.
4.2. Perception Task and Procedure
Participants completed a timed forced-choice (FC) goodness task in both their L2 English and their L3 Polish, assessing their perception of final obstruent (de)voicing. Response accuracy and reaction times were recorded at four main testing times for English (T1: 4 weeks, T2: 10 weeks, T3: 20 weeks, T4: 40 weeks of L3 learning). For Polish, they were recorded at three main testing times, starting at T2, because the beginner learners had had only limited exposure to Polish at T1. Additionally, the three participants introduced above were tested monthly between the second and fourth main testing times. This resulted in eight testing points for L2 English and seven testing points for L3 Polish.
In the FC goodness task, the participants heard two versions of the same carrier phrase, differing only in the final stimulus item. By pressing one of two buttons (marked 1 and 2), they indicated which version sounded more natural to them (i.e., more target-like). One version was a target realization, while the other was an accented realization. For example, in English, the target phrase “You will hear the word have” /hæv/ was contrasted with a manipulated version ending in /hæf/. Similarly, in Polish, the target phrase “Usłyszysz słowo lew” (‘You will hear the word lew’), ending in /lef/ with a target-like voiceless final obstruent, was contrasted with an accented version in which the final obstruent was voiced. The order of presentation was counterbalanced. The task included 28 English and 30 Polish pair items, with three training pairs preceding the test blocks. Some of these pair items tested other contrastive features that are not reported in this study.
Of these, the final obstruent (de)voicing stimuli, which were presented once, were as follows:
English (n = 13): days, grab, leg, could, stab, big, skies, give, love, food, judge, have, rob (manipulated as voiceless, interpretable as L1 German- or L3 Polish-accented);
Polish (n = 16): ząb, chleb, obiad, miód, pociąg, śnieg, marchew, lew, obraz, teraz, twarz, masaż, odpowiedź, idź, koledż, brydż (manipulated as voiced, interpretable as L2 English-accented).
The stimuli were randomized and presented using E-Prime 2.0, with a 500 ms inter-stimulus interval and a 3000 ms response window. Performance was analyzed in terms of accuracy and reaction time (RT), with RT serving as a proxy for the processing difficulty of the tested stimuli. RT was measured from the end of the target words. The main testing sessions were conducted in the Phonetics Lab at a German university, whereas the DDC task was administered remotely in participants’ homes via a SoSci Survey link. Because participants’ response speed could not be controlled in the home environment, RT data for the DDC points are not reported in this study.
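The task logic described above can be sketched in Python. Trial order is shuffled, the position of the target version (first or second) is balanced across pairs, and each button press is scored against the target position. Everything beyond the figures given in the text is a hypothetical assumption for illustration. This includes the helper names, the seeded shuffle, and balancing the target position within a single trial list, since counterbalancing could also have been done across participants. It also includes treating a missing or late press as inaccurate, and assuming that press times count from target-word offset. The sketch is not the E-Prime 2.0 or SoSci Survey implementation used in the study.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence

RESPONSE_WINDOW_MS = 3000  # response window stated in the text


@dataclass
class Trial:
    pair_id: str
    target_position: int  # 1 = target version heard first, 2 = heard second


@dataclass
class ScoredTrial:
    pair_id: str
    correct: bool
    rt_ms: Optional[float]  # None when there was no valid response


def build_trials(pair_ids: Sequence[str], seed: int) -> list[Trial]:
    """Shuffle the pairs and balance how often the target comes first vs. second."""
    rng = random.Random(seed)
    ids = list(pair_ids)
    rng.shuffle(ids)
    # Half of the pairs (rounded up) get the target first, the rest second.
    positions = [1 if i < (len(ids) + 1) // 2 else 2 for i in range(len(ids))]
    rng.shuffle(positions)
    return [Trial(pid, pos) for pid, pos in zip(ids, positions)]


def score(trial: Trial, button: Optional[int], press_ms: Optional[float]) -> ScoredTrial:
    """Score one FC goodness response (hypothetical: press_ms counts from word offset)."""
    if button not in (1, 2) or press_ms is None or not 0 <= press_ms <= RESPONSE_WINDOW_MS:
        return ScoredTrial(trial.pair_id, correct=False, rt_ms=None)
    return ScoredTrial(trial.pair_id, correct=button == trial.target_position, rt_ms=press_ms)


trials = build_trials(["have", "rob", "leg"], seed=1)
print(score(trials[0], button=trials[0].target_position, press_ms=850.0))
```

Keeping the RT even for inaccurate responses leaves the later decision of which RTs to analyze to the aggregation step.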
The stimuli were recorded by two phonetically trained female native speakers, one of English and one of Polish, who were capable of producing the target phonetic distinctions. The English speaker was fluent in German, while the Polish speaker was fluent in both English and German. Multiple recordings of the stimuli were made to ensure naturalness. The most acceptable manipulations were then selected based on perceptual assessments by native speakers of the respective languages. The stimuli were produced naturally rather than created by artificial concatenation.
4.3. Data Analyses
The perception of final voiced and devoiced obstruents (n = 351 for L2 English and n = 336 for L3 Polish across the main testing times) was assessed by calculating each learner’s mean accuracy and mean RT at each testing time. Following standard practice, only RTs from accurate responses were included in the analysis. Given the small sample size, non-parametric tests were used for sample-level analyses. The Mann–Whitney U-test was used for between-subjects comparisons, and the Friedman test of differences among repeated measures was used for within-subject comparisons.
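The following is a minimal Python sketch of the per-learner summary described above. It computes mean accuracy over all scored trials and mean RT over accurate responses only, and then a hand-rolled Friedman statistic across repeated testing times. The trial record layout and function names are hypothetical. The Friedman function omits the tie correction and returns no p-value. Library routines such as `scipy.stats.friedmanchisquare` apply that correction and report p-values; `scipy.stats.mannwhitneyu` covers the between-subjects comparisons.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean
from typing import Optional, Sequence


@dataclass
class TrialRecord:  # hypothetical layout of one scored response
    learner: str
    time: str  # e.g. "T1" to "T4"
    correct: bool
    rt_ms: Optional[float]


def summarize(trials: Sequence[TrialRecord]) -> dict:
    """Return {(learner, time): (mean accuracy, mean RT of accurate responses)}."""
    groups: dict = defaultdict(list)
    for t in trials:
        groups[(t.learner, t.time)].append(t)
    summary = {}
    for key, group in groups.items():
        accuracy = sum(t.correct for t in group) / len(group)
        # Only RTs from accurate responses enter the RT mean.
        rts = [t.rt_ms for t in group if t.correct and t.rt_ms is not None]
        summary[key] = (accuracy, fmean(rts) if rts else None)
    return summary


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square without tie correction; rows = learners, columns = times."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
```

In this layout, each row passed to `friedman_statistic` would hold one learner’s mean accuracy (or mean RT) at each testing time, so the ranks capture within-learner change across time.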
For the individual DDC analyses, moving min–max graphs and moving correlations were applied, consistent with methods used in studies conducted within the CDST framework (Lowie, 2017; Gut et al., 2023). These tools visualize general patterns of variability using a rolling window of a selected number of data points. In this study, a window of three data points was chosen, given that only seven (L3 Polish) and eight (L2 English) data points were collected from each of the three participants. To further contextualize their performance, participants responded to an open-ended question at the end of each DDC session: “How is it going with your learning?” These brief statements were transcribed to capture any changes in the multilinguals’ language learning input and engagement, and thus to help frame their perception performance.
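The moving min–max procedure with a window of three data points can be sketched in Python as below. Several elements are assumptions for illustration: the trailing-window alignment (each window ends at the current data point), the function name, and the toy values. CDST studies also use centered windows, and the published graphs were not necessarily produced this way. With seven L3 data points and a window of three, the sketch yields five min–max pairs, and with the eight L2 data points it yields six. The distance between the two values in each pair indexes local variability.

```python
from typing import Sequence


def moving_min_max(values: Sequence[float], window: int = 3) -> list[tuple[float, float]]:
    """Return (min, max) for each run of `window` consecutive data points."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the number of data points")
    return [
        (min(values[end - window : end]), max(values[end - window : end]))
        for end in range(window, len(values) + 1)
    ]


# Made-up accuracy proportions at seven data points (not study data).
toy_l3 = [0.55, 0.70, 0.62, 0.80, 0.58, 0.66, 0.74]
for start, (lo, hi) in enumerate(moving_min_max(toy_l3), start=1):
    print(f"points {start}-{start + 2}: min {lo:.2f}, max {hi:.2f}, range {hi - lo:.2f}")
```

Plotting the minima and maxima as two lines around the raw scores gives the familiar min–max band. A widening band signals a period of greater intra-learner variability.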
6. Discussion
The aim of this small-scale study was to examine the development of coda obstruent perception in seven adult multilinguals in both their L2 and L3. This involves a marked phonological process in their L2 and an unmarked phonological process shared by their L1 and L3. In line with CDST-based research, the study also traced individual patterns of perception development through dense data collection with three multilinguals over the first year of their L3 learning. The goal was to offer a more differentiated picture of the nature of multilingual speech perception development.
The first research question addressed the overall development of adult multilinguals’ perception of L2 coda obstruents. It was hypothesized that, with increasing L3 experience, learners would become less accurate and slower in perceiving voiced L2 English coda obstruents. This decline was expected for two reasons. First, the learners had not yet fully internalized the relevant L2 phonological process (Wrembel et al., 2020). Second, the L3 introduces new, competing perceptual targets that align with the learners’ L1 categories (SLM/SLM-r; Flege, 1995; Flege & Bohn, 2021). The findings partially confirmed this hypothesis. Initially, the adult L3 learners exhibited relatively high L2 perception accuracy, but this accuracy significantly decreased over time. Concurrently, the learners became faster rather than slower over time in categorizing devoiced final obstruents in English as target-like. This pattern seems to suggest perceptual restructuring and increased automatization in mapping English final obstruents onto the multilinguals’ L1 and/or newly forming L3 categories. These changes in L2 perception were evident after 10 months of instructed L3 learning. This is notable considering that these adult multilinguals had been learning and using English since their early school years and, with the exception of SYLÜ08 (see below), reported no changes in their use of or exposure to English during the study period. Accurate perception of voicing in coda position in English requires attending to a complex interaction of cues beyond glottal pulsing, such as the longer duration of the preceding vowel (Krause, 1982). It is therefore likely that the multilingual participants in this study found the L2 targets challenging to perceive, as also reflected in the large variability in RT at T1. They likely did not employ native-like weighting of all these cues (cf. Flege & Hillenbrand, 1986; Broersma, 2005). Instead, they appeared to rely on more familiar and less marked cues from their L1, which they also applied increasingly in L3 perception. These results may be interpreted as supporting the theoretical underpinnings of leading L2 speech models, such as the SLM/SLM-r (Flege, 1995; Flege & Bohn, 2021). These models posit that L1 and L2/L3 categories interact within a shared phonetic space, influencing one another bidirectionally.
The relatively rapid influence of L3 on L2 perception also seems to align with the Phonological Permeability Hypothesis (PPH), which suggests that “L1 and L2 systems are fundamentally different, and that this difference is maturationally conditioned” (Cabrelli, 2016, p. 2). According to the PPH, L2 phonetic categories are less stable and more permeable than L1 categories, which makes them more susceptible to change. Cabrelli (2016) provided some evidence for the PPH in a study on the acquisition of L3 Brazilian Portuguese by two types of sequential bilinguals (L1 English/L2 Spanish speakers and L1 Spanish/L2 English speakers), although this evidence was limited to production. The present study thus appears to be among the first (see also Nelson, 2020) to suggest that such L3 effects on L2 speech perception and processing are also possible. This holds at least in a scenario where L1 and L3 categories overlap and may exert a double or combined influence. To test the PPH fully, future research will also need to collect perception data in multilinguals’ L1.
Contrary to the prediction that L2 English obstruent perception would remain relatively stable, the adult L3 learners’ L2 perception changed over time. These findings thus differ from those reported by Wrembel et al. (2020), who examined the same phonological process in adolescent L3 learners. In Wrembel et al.’s study, the young multilinguals performed at chance level in perceiving L2 English final obstruents at both 5 and 10 months into L3 learning, suggesting that the adolescents had acquired little of the target feature. The present study, however, captures even earlier stages of L3 learning, namely after 4 and 10 weeks. In fact, statistical analyses revealed that significant differences emerged between these early testing times and the final testing time. It is thus possible that Wrembel et al. (2020) missed earlier changes in their adolescents’ perception performance. Nelson (2020) did find such changes in adolescent L3 learners and attributed them to temporary perceptual confusion. Together, these findings highlight the importance of denser longitudinal data for a more comprehensive understanding of multilingual perceivers’ developmental trajectories. Such data should ideally begin with a T0 measurement (prior to L3 learning) to establish a baseline. Incorporating such baseline data would also align more closely with the CDST framework and its emphasis on initial conditions (e.g., de Bot, 2012).
Comparing L2 and L3 perception development, the results of this study showed comparable perception accuracy in L2 English (ranging from 59% to 73%) and L3 Polish (ranging from 58% to 76%). A significant difference was observed only at the second main testing time. At this point, the adult multilinguals perceived voiced English final obstruents more accurately than voiceless Polish final obstruents (73% vs. 58%, respectively). The comparable performance might be explained by different factors relevant to the multilinguals’ two foreign languages. L2 English was a long-standing language for the adults, learned and used in diverse contexts for many years. This may have allowed for partly accurate, though not fully target-like, cue weighting in their L2 perception. In turn, L3 Polish was a relatively new language for them, whose phonological system was only beginning to become familiar. The availability of both L1 and L2 categories for the perception of L3 targets may have caused some initial perceptual competition, as suggested by the relatively low L3 accuracy at that time. However, the corresponding fast RT data seem to suggest that the participants relied mainly on L1 phonetic categories. This reliance apparently raised their L3 perception accuracy above mere guessing at such an early stage of learning the new language. An additional reason might be that these adults can be considered highly experienced language learners, some of whom knew additional foreign languages and/or were studying for a language degree. Their good performance in perceiving final obstruents in their new L3 could thus also be attributed to their degree of multilingualism and language learning experience. Previous research has found that such experience confers advantages on L3 learners compared to L2 learners (Enomoto, 1994; Kopečková, 2015; Tremblay & Sabourin, 2012; Wrembel et al., 2019). According to this research, L3 learners possess considerable perceptual acuity due to their prior language learning experience, which allows them to adapt to new speech learning more quickly and accurately.
The second research question addressed the overall development of adult multilinguals’ perception of L3 coda obstruents in the first year of L3 learning. It was hypothesized that, as they gained more experience with L3 Polish, they would become more accurate and faster in perceiving the Polish sounds by mapping the devoiced obstruents onto their existing L1 categories (SLM/SLM-r; Flege, 1995; Flege & Bohn, 2021). This hypothesis was also only partially confirmed. Participants’ perception accuracy improved significantly from T2 to T3, but their reaction times increased at the same time. Thus, while they became more accurate in categorizing L3 coda obstruents, this improvement came at the cost of longer processing time. This pattern might be interpreted as suggesting the formation of a new L3 category rather than mere reliance on existing L1 categories. If the multilinguals had simply mapped the L3 obstruents onto their L1 categories, we would expect not only improved accuracy but also shorter reaction times, reflecting reduced cognitive effort. Instead, the increase in reaction times may imply that learners were engaging in more deliberate processing, possibly pointing to the early stages of new L3 category formation.
The hypothesis that L3 perception development would be less stable than L2 perception development was confirmed. After a significant improvement in the perception of L3 final obstruents between T2 and T3, some decline in both accuracy and reaction time was observed. This fluctuating pattern seems to indicate considerable reorganization in the perception of a new phonological system as L3 learners process the new phonetic input, learn to discern which phonetic cues are relevant for a feature, and reconcile multiple candidate phonetic categories.
As one reviewer noted, the observed relationship between perception accuracy and reaction time in the study might alternatively reflect the participants’ level of attentiveness during the task. Generally, higher accuracy tended to coincide with longer RTs, whereas lower accuracy was associated with shorter RTs. It is possible that participants became accustomed to the task over time, potentially leading to less careful performance, which could also explain the observed simultaneous decrease in both accuracy and RT. While this is a plausible explanation that aligns with task habituation effects in repeated-measures studies, it is noteworthy that the relationship between perception accuracy and RT was not uniform across participants, nor within participants across the L2 and L3 tasks, suggesting individual differences in response strategies or developmental trajectories rather than a universal decline in attentiveness. Moreover, it is unlikely that external factors influenced the participants’ eagerness to complete the tasks quickly during specific data collection sessions. The testing procedure was consistent across sessions and followed a strict protocol, with the same research assistant administering the task, and appointments were scheduled by mutual agreement to ensure participants’ availability. Nonetheless, this point is acknowledged as a potential limitation, and further studies with additional controls for task engagement over time are recommended.
The third research question examined the extent to which individual learner trajectories in L2 and L3 perception of final obstruents mirrored the overall trends. Using moving min–max analyses across eight and seven testing points, the study revealed considerable inter- and intra-individual variability in the perception of coda obstruent (de)voicing in both L2 English and L3 Polish. For example, ROGI18 initially displayed instability in both L2 and L3 perception; her L2 perception eventually stabilized at a level slightly above chance, while her L3 perception remained variable throughout. In contrast, SYLÜ08 showed increasing variability in L2 perception as his L3 perception stabilized on non-targets. Meanwhile, REBA03 maintained stable and accurate L2 perception but exhibited increasing variability in L3 perception, which, despite the fluctuations, remained more accurate than her L2 perception. These patterns underscore the highly individualized and non-linear nature of multilingual perception development, as posited by Complex Dynamic Systems Theory (CDST) (de Bot, 2012; Gut et al., 2023). The constantly evolving interactions between individual learner characteristics and available resources, such as engagement and input availability, appear to have driven these learners’ unique speech perception trajectories.
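To make the moving min–max technique more concrete, the following is a minimal Python sketch, not the study’s actual analysis script. For each run of consecutive testing points it records the lowest and highest score, and the width of that band serves as a rough index of variability: a widening band signals increasing instability, a narrowing one stabilization. The trailing three-point window, the function names `moving_min_max` and `bandwidths`, and the accuracy values are all hypothetical illustrations; the window size used in the study is not stated here.

```python
def moving_min_max(series, window=3):
    """Return the (min, max) of each run of `window` consecutive values.

    The default window of 3 is an illustrative assumption, not the study's setting.
    """
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    bands = []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        bands.append((min(chunk), max(chunk)))
    return bands


def bandwidths(bands):
    """Width of each min-max band; wider bands indicate more variability."""
    return [round(hi - lo, 10) for lo, hi in bands]


# Hypothetical proportion-correct scores over seven testing points (not study data).
l3_accuracy = [0.58, 0.66, 0.74, 0.61, 0.70, 0.79, 0.63]
bands = moving_min_max(l3_accuracy, window=3)
for (lo, hi), width in zip(bands, bandwidths(bands)):
    print(f"min={lo:.2f} max={hi:.2f} width={width:.2f}")
```

A trailing window is only one option. Centered windows are also common in dynamic-systems work, and with as few as seven or eight testing points the choice of window size strongly affects how many bands can be computed.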
Consider ROGI18, a student of English at a German university, whose frequent exposure to English, including both target-like and accented phonetic input, seems to have contributed to her L2 perceptual instability. At the same time, in her monthly reflections on her language learning progress, she reported relatively little engagement with L3 Polish due to other commitments. This constellation of factors may partially explain the unstable development observed in her L2 and L3 perception.
SYLÜ08, on the other hand, exhibited increasingly variable L2 perception as his L3 perception stabilized on non-targets. As an exchange student in Poland, he had intensive exposure to (likely accented) English and to naturalistic Polish, including the Polish phonological rule of regressive voicing assimilation in connected speech, which may have reshaped his initial categorization of English as a voicing language and Polish as a devoicing language. This direction of development may not have been fully predictable given, for example, the assumption in SLM-r (Flege & Bohn, 2021) of a linear refinement of target phonetic categories with increased (relevant) phonetic input. In an L3 vowel production study, Kartushina and Martin (2019) showed that, as early as two weeks into a study abroad program, multilinguals’ phonological space can undergo significant reorganization to accommodate all target sounds. This finding is particularly relevant to SYLÜ08, whose initial psychotypological perceptions, as revealed in the background interview, indicated the greatest perceived distance between English and Polish among his languages. This perceived distance appears to manifest here as perceptual dissociation, leading to increasingly inaccurate perception of final obstruents in both his L2 and L3. At the same time, SYLÜ08 was the only participant to report increased use of French during the study, specifically during his stay abroad. He perceived French as psychotypologically close to Polish, especially in the phonological domain. Given that French permits voiced obstruents in word-final position (Pustka, 2011), and that auditory analyses of his productions revealed consistent voicing of final obstruents in his French, it is possible that his representation of French as a voicing language contributed to the observed shift towards perceiving L3 Polish final obstruents as voiced as well. Unfortunately, no perception data were collected in this study for participants’ other languages, which limits a fuller understanding of these interactions. This case thus highlights the importance of systematically collecting perception data across all of a multilingual’s languages to fully appreciate the complexity of their perceptual development and cross-linguistic interactions.
REBA03, an ambitious L3 learner, consistently expressed concerns throughout the DDC testing period about whether she was learning enough to communicate effectively in Polish. She also criticized her classroom experience, particularly cancelled classes and insufficient teacher input. Midway through the testing period, she reported a change in her Polish course, which she welcomed in the hope of “more intensive engagement, finally”. The sudden increase in variability in her otherwise accurate Polish performance around this time could thus be interpreted as a new developmental stage in her L3 learning (van Dijk & van Geert, 2007), a stage that also included some instances of 100% perception accuracy.
The relationship between L2 and L3 perception development in these learners was further explored using moving correlation analyses, which revealed an alternation between supportive and competitive relations (Yu & Lowie, 2019). Interestingly, this alternation was most evident between the third and fourth testing points, roughly halfway through the multilinguals’ first year of L3 learning. While the exact trigger for this shift is difficult to pinpoint, it is noteworthy that this period coincided with end-of-term exams in Polish (and other subjects) for the two undergraduate students, who reported more focused and intensive engagement with their L3 during this time. For the librarian (REBA03), this period aligned with the change of her Polish teacher, accompanied by hopes for more tangible learning outcomes.
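The idea behind a moving correlation can be illustrated with a minimal Python sketch, offered as an illustration rather than the study’s procedure. A correlation between two learner time series (here, L2 and L3 accuracy) is computed within a window that slides along the testing points. In CDST-oriented work, a positive coefficient is commonly read as a supportive relation and a negative one as a competitive relation. The use of Pearson’s r, the three-point window, the function names (`pearson_r`, `moving_correlation`, `relation`) and the sample values are all hypothetical assumptions, not the settings or data of this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # correlation is undefined within a flat window
    return sxy / math.sqrt(sxx * syy)


def moving_correlation(x, y, window=3):
    """Correlate x and y within each run of `window` consecutive testing points."""
    if len(x) != len(y):
        raise ValueError("series must have the same number of testing points")
    if not 2 <= window <= len(x):
        raise ValueError("window must be between 2 and the series length")
    return [
        pearson_r(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]


def relation(r):
    """Label a coefficient: positive = supportive, negative = competitive."""
    if math.isnan(r):
        return "undefined"
    if r > 0:
        return "supportive"
    if r < 0:
        return "competitive"
    return "neutral"


# Hypothetical accuracy series for one learner (not study data).
window = 3
l2 = [0.70, 0.68, 0.72, 0.65, 0.69, 0.66, 0.71]
l3 = [0.55, 0.60, 0.58, 0.70, 0.64, 0.72, 0.62]
for start, r in enumerate(moving_correlation(l2, l3, window), start=1):
    print(f"points {start}-{start + window - 1}: r = {r:+.2f} ({relation(r)})")
```

With very short windows, each coefficient rests on only a few data points, so a change in sign is best treated as a descriptive signal of shifting relations, not as a statistically tested effect.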
A comparison of the relationship between L2 and L3 perception development at the individual level with the sample-level patterns showed that only one learner’s trajectory was in line with the overall trend, with L2 perception accuracy declining as L3 perception improved. In contrast, another learner experienced a simultaneous decrease in accuracy for both L2 and L3 perception. The third learner’s L3 perception improved to ceiling level, while her L2 perception remained stable, though slightly less accurate than her L3 perception. Each multilingual’s perception performance was shaped by a unique blend of interacting factors, leading to a developmental pattern that could not be fully predicted. These insights thus illustrate that individual and group analyses of speech perception development offer complementary perspectives that together yield a more nuanced picture of multilingual speech perception development.