Crosslinguistic Influence in the Discrimination of Korean Stop Contrast by Heritage Speakers and Second Language Learners

Abstract: The present study examines the extent of crosslinguistic influence from English as a dominant language in the perception of the Korean lenis–aspirated contrast among Korean heritage speakers in the United States (N = 20) and English-speaking learners of Korean as a second language (N = 20), as compared to native speakers of Korean immersed in the first language environment (N = 20), by using an AX discrimination task. In addition, we sought to determine whether significant dependencies could be observed between participants’ linguistic background and experiences and their perceptual accuracy in the discrimination task. Results of a mixed-effects logistic regression model demonstrated that heritage speakers outperformed second language learners with 85% vs. 63% accurate discrimination, while no significant difference was detected between heritage speakers and first language-immersed native speakers (85% vs. 88% correct). Furthermore, higher verbal fluency was significantly predictive of greater perceptual accuracy for the heritage speakers. The results are compatible with the interpretation that the influence of English on the discrimination of the Korean laryngeal contrast was stronger for second language learners of Korean than for heritage speakers, while heritage speakers were not apparently affected by dominance in English in their discrimination of Korean lenis and aspirated stops.


Introduction
Heritage speakers (HS) are defined as individuals raised in homes where a minority language is spoken, while a different language is dominantly spoken in their country of residence (Valdés 2000, 2005). According to the U.S. Census Bureau (2018), there are 1.1 million Korean HSs in the US who are, by definition, bilingual speakers (Korean and English) to some extent. Although they represent a distinct, and in some ways unique, category of bilinguals, the existing models of additional language acquisition can be extended to account for heritage language development. Specifically, the Speech Learning Model (SLM/SLM-r; Flege 1995, 2003; Flege and Bohn 2020), developed primarily with adult second language learners in mind, postulates that coexisting sound systems share a joint phonetic space, where first (L1) and second language (L2) sound categories jostle for position. To our knowledge, SLM is the only model that explicitly predicts that acquisition of additional languages can influence learners' L1. A number of mechanisms of SLM predict various types of crosslinguistic influence at different stages of L2 acquisition, including mergers or assimilation, as well as dissimilation, but the critical prediction of the theory is that crosslinguistic influence is expected between the sound categories of L1 and L2. The model assumes that crosslinguistic influence is primarily imposed on the sound categories which are phonetically and phonologically similar but not acoustically identical across the two languages (Flege 1987). A crosslinguistic link is established among such categories, which results in mutual attraction (or, at times, repulsion). For the purposes of this study, we define crosslinguistic influence as phonetic changes in a sound category of one language which make it more similar to a related sound category of another language.
The goal of this study was to determine to what extent such crosslinguistic influence from English on Korean laryngeal categories in stop consonants can affect the discrimination of Korean lenis and aspirated stops by HSs and L2 learners of Korean.
Crosslinguistic influence in bilingual L1 speech production is well established (Baker and Trofimovich 2005; Bergmann et al. 2016; Caramazza et al. 1973; Chang 2012; Dmitrieva et al. 2010; Flege 1987; Flege and Eefting 1987a, 1987b; Fowler et al. 2008; Guion 2003; Harada 2003; Lang and Davidson 2019; Major 1992; Peng 1993; Sancier and Fowler 1997; Stoehr et al. 2017). For example, Dmitrieva et al. (2010) showed that Russian L1 immigrants residing in the US pronounced Russian word-final obstruents in a more English-like fashion than monolingual Russian speakers. Similarly, Chang (2012) showed that English-speaking learners of Korean drifted towards Korean norms in their pronunciation of English vowels and word-initial stops during an intensive language course in South Korea.
In perception, evidence is less abundant. Nevertheless, recent work demonstrates that experience with additional languages can affect the way native speech is perceived. For example, in Dmitrieva (2019), Russian (L1)-English (L2) bilinguals applied English-like perceptual strategies in identifying Russian (L1) sounds (see also Antoniou et al. 2012; Garcia-Sierra et al. 2009; Hazan and Boulakia 1993). This evidence suggests that crosslinguistic influence between sound categories that are comparable across languages can result in the restructuring of L1 production targets and L1 perceptual representations.
The present work examines the extent of crosslinguistic influence from English on the perception of Korean stop consonants in Korean HSs and English-speaking L2 learners. Korean exemplifies a three-way laryngeal distinction in stop consonants: fortis (also known as tense), lenis (also known as lax or plain), and aspirated, e.g., [t* − t − tʰ]. The Korean stop contrasts are typologically unusual and perceptually challenging, but at the same time sufficiently similar to the English laryngeal categories to trigger crosslinguistic interactions (Ahn et al. 2017). In fact, previous research contains evidence suggestive of such crosslinguistic influence between English and Korean laryngeal categories in HSs and other types of bilinguals, both in production and perception (Chang and Mandock 2019; Cheon and Lee 2013; Cheng 2017). In the current study, we focused on the perception of the word-initial lenis-aspirated distinction in Korean stops, a contrast that has already received plentiful attention in the literature but remains an attractive topic due to its uncommon phonetic implementation. To our knowledge, this is the first study specifically examining the perception of the lenis-aspirated contrast by HSs using the methodological approach outlined in the later portion of this section.
In the remainder of this introduction, we review phonetic properties of Korean laryngeal stop contrasts as well as developmental patterns in their acquisition and provide a comparison to the phonetics of English voicing distinctions in stops. We also review crosslinguistic assimilation patterns between Korean and English series of stops and zoom in on the Korean laryngeal categories that are the most challenging for both L1 and L2 learners. We then formulate the predicted directions of crosslinguistic interactions between Korean and English laryngeal categories and review existing findings supporting these predictions. Finally, we propose our hypotheses and provide more details about the design of the study.
Two phonetic parameters are primarily involved in distinguishing the three-way Korean stop contrast in production and cueing its identification in perception: voice onset time or VOT (time elapsed between the release of a stop and the onset of vocal fold vibration) and onset F0 (fundamental frequency at the onset of the vowel following the stop consonant). Fortis stops have the shortest VOT (about 20 ms, according to Kang and Guion 2008), followed by lenis and then aspirated stops, both of which have relatively long VOTs (about 70 ms for both, from Kang and Guion 2008). Lenis stops are distinguished from fortis and aspirated ones via the lowest onset F0 (Cho et al. 2002; Han and Weitzman 1970; Kagaya 1974; Kim et al. 2002; Kong et al. 2011; Lee et al. 2013, 2020; Lee and Jongman 2019). Moreover, the word-initial lenis-aspirated stop contrast is believed to be undergoing tonogenesis, that is, the emergence of a tonal distinction on the basis of existing consonantal laryngeal contrasts and the ultimate replacement of the laryngeal categories by tone (Kang 2014). Specifically, an ongoing sound change in the Seoul-Gyeonggi dialect of Korean is merging the VOTs of lenis and aspirated stops, especially among younger speakers (Bang et al. 2018; Chang and Mandock 2019; Kang 2014; Kang and Guion 2008; Kong and Yoon 2013; Silva 2006), with onset F0 becoming the dominant cue to this distinction (Kang et al. 2010; Kim et al. 2002; Kong et al. 2011; Lee et al. 2013, 2020; Lee and Jongman 2019).
English, in contrast, has a two-category contrast between voiced and voiceless stops. In word-initial position, voiced stops are characterized by shorter VOTs while voiceless stops have longer VOTs (Abramson and Lisker 1985). Onset F0 also correlates with phonological voicing in English (Dmitrieva et al. 2015; Ohde 1984; House and Fairbanks 1953; Lehiste and Peterson 1961) but plays a decidedly secondary role in perception (Idemaru et al. 2012; Llanos et al. 2013; Whalen et al. 1993).
Developmental evidence indicates that it takes longer for Korean-speaking children to master native laryngeal categories (not before the age of 4) than for English-speaking children to acquire English voicing categories (fully developed by the age of 2 or 3, depending on the source) (Bernthal et al. 2013; Jun 2007; Kim and Stoel-Gammon 2009; Lowenstein and Nittrouer 2008). Furthermore, fortis consonants are the earliest to be acquired in Korean (by 17 months of age), while lenis ones, and their contrast with aspirated stops, in particular, are the last to be mastered (Choi et al. 2019; Jun 2007; Kim and Stoel-Gammon 2009). Kong et al. (2011) suggested that early acquisition of fortis stops is due to the fact that only VOT needs to be mastered in order to distinguish them from the other categories. In contrast, the necessity to use F0 in order to identify lenis stops makes them acquisitionally challenging, both for L1 and L2 learners (Chang and Mandock 2019; Cheon and Lee 2013; Ko 2018; Oh et al. 2010). With respect to our study, the relatively late acquisition of Korean laryngeal categories, especially the lenis-aspirated contrast, and its perceptual complexity may make this contrast more susceptible to crosslinguistic influence from the dominant language in HSs and L2 learners.
In terms of the perceived similarities between English and Korean stop categories, both English-immersed non-heritage and HSs of Korean readily assimilate Korean aspirated stops to English voiceless stops (Cheon and Lee 2013; Schmidt 1996). Crosslinguistic assimilation patterns are less clear-cut for the rest of the Korean categories. Nevertheless, Cheon and Lee (2013) showed that non-heritage and HSs of Korean perceived Korean lenis stops as better exemplars of the English voiced than voiceless category. These findings suggest that the Korean lenis-aspirated pair can be assimilated to the English voiced and voiceless stops, respectively, by Korean-English bilinguals. Given their acoustic and perceptual similarity, SLM predicts that Korean and English laryngeal categories can influence each other in the speech of bilinguals.
Previous research has indeed demonstrated patterns compatible with crosslinguistic influence between Korean and English laryngeal categories in bilingual speakers. For example, Kang and Guion (2006) showed that Korean-English bilinguals produced a greater distinction between English voiced and voiceless stops in terms of onset F0 than English monolinguals (see also Kim 2012; Kong and Yoon 2013). Reliance on onset F0 in the perception of voicing in English was also shown to be greater for Korean speakers than for English speakers (Kim 1994; Kong and Yoon 2013). Given that Korean laryngeal contrasts are cued heavily via onset F0 while the English voiced-voiceless contrast relies almost exclusively on VOT, these findings strongly suggest an effect of Korean on bilinguals' English.
Conversely, several studies have reported effects of English on Korean: productions of the Korean lenis-aspirated contrast by bilinguals (HSs in particular) exhibited a greater reliance on VOT and a lesser reliance on F0 than those of monolingual speakers of Korean (Cheng 2017; Kang and Nagy 2016; Oh and Daland 2011; see also Lee and Iverson 2012; Oh 2019; Yoon 2015 for evidence from bilingual children). Unfortunately, to our knowledge, no previous work has examined cue-weighting in the perception of Korean stops among Korean-English bilinguals in order to determine whether their perceptual reliance on VOT was also greater than in Korean monolinguals. Kong (2012) reported that while F0 dominated the perception of lenis stops by native Korean speakers living in the USA, VOT also contributed to the categorization of Korean lenis stops. This result could be interpreted as evidence of English affecting the perception of Korean, but in the absence of a monolingual control group, this interpretation remains tentative.
Thus, previous research provides evidence of mutual crosslinguistic influence between Korean and English laryngeal categories in bilingual speech production and perception. Building on this evidence, we hypothesize that Korean HSs in the US will perceive the Korean lenis-aspirated contrast differently from L1-immersed (L1-i) speakers, due to the influence of English. We assume that the difference will be due to the non-native weighting of perceptual cues to the contrast (a lesser reliance on F0 and a greater reliance on VOT, compared to L1-i speakers). Research in L2 acquisition demonstrates that listeners who do not sufficiently utilize in perception an acoustic dimension that is highly involved in implementing the contrast in production may underperform in the discrimination or identification of the relevant categories (Cebrian 2006; Tohkura 1990, 1992).
The present study represents the first step in the investigation of the hypothesis outlined above. Our main research question concerns the extent to which HSs' discrimination of the Korean lenis-aspirated stop contrast differs from that of L2 learners and L1-i speakers of Korean. We make the additional comparison between HSs and L2 learners of Korean (English L1) in order to gain a better understanding of the roles of the age of onset of acquisition and of experiential factors, such as language use and exposure. Since L2 learners typically begin acquiring their L2 significantly later than HSs, and often do not have equivalent opportunities for language exposure and practice, we hypothesize that HSs will outperform L2 learners. As part of our hypothesis, we also expect both groups to behave differently from the L1-i speakers due to contact with English. Finally, we investigate whether aspects of participants' linguistic background and experiences, such as age of acquisition, proficiency, or frequency of language use and exposure, are predictive of HSs' and L2 learners' performance on the discrimination task.

Participants
Three groups of participants took part in the study: 20 HSs of Korean, 20 English-speaking L2 learners of Korean, and 20 Korean L1-i speakers. All participants in the experimental groups were recruited online either via Prolific, an online participant recruitment platform for social science research, or using a snowball technique. Before performing the task, all participants completed a consent form and filled in a language background questionnaire. All participants were compensated for their participation and completed a modified version of the Language Experience and Proficiency Questionnaire (LEAP-Q, Marian et al. 2007). The following section reports details about the language backgrounds of the participants. No participant reported any disability or difficulty in speaking, hearing, or vision that could hinder participation in the AX discrimination task, which involves responding to both visual and auditory stimuli.
The HS group consisted of 20 speakers of Korean who were born and raised in the United States and whose parents were first-generation immigrants and native speakers of Korean. The group included 11 females and 9 males, ranging between 19 and 42 years old (M = 25.5). At the time of the experiment, they reported being exposed to Korean, as opposed to English, about 39% of the time, on average (although this number varied from 10% to 100%). In terms of choosing to speak Korean, when both Korean and English were available options, participants reported that they opted for Korean only 30% of the time, on average (again, with a wide range from 0% to 100%). HSs attributed their knowledge of Korean primarily to interactions with family members, watching TV, and reading in Korean. The majority of the participants had never resided in Korea, although five of them reported stays of 1 to 3 years in duration, and one had lived in Korea for 10 years. The reported age of onset of English acquisition in this group was 2 years, on average (ranging from 1 to 5). Their average self-reported English speaking and comprehension proficiency was 6.8 and 6.7, respectively (on a 7-point scale), with the vast majority indicating excellent proficiency. Their average self-reported Korean speaking and comprehension proficiency was somewhat lower, at 4.7 and 5.1 (on a 7-point scale). In addition to self-reported proficiency estimates, we employed a more objective measure of Korean proficiency by evaluating participants' verbal fluency in a narrative task via calculating the articulation rate. The average articulation rate for HSs was 4.2 syllables per second.
The L2 group consisted of 20 L2 learners (14 females and 6 males) with an age range of 18 to 42 years old (M = 26.3). All of them were native speakers of American English, learning Korean as an L2. All but three of these participants had never spent any time in Korea; the duration of stay for these three was from 1 to 2 years. Their age of onset of acquisition was around 21 years of age, qualifying them as adult L2 learners. The participants reported considerably more exposure to English (83%) than to Korean (17%). Most of the exposure to Korean came from watching TV and reading in Korean. Their average self-reported English speaking and comprehension proficiency was a perfect 7 on both counts, while their average Korean speaking and comprehension proficiency was 2.7 and 3, respectively (on a 7-point scale). Their average articulation rate in the narrative task was 3.4 syllables per second.
The L1-i group consisted of 20 native speakers residing in Korea (14 females and 6 males; age range 20 to 34 years, M = 24.2). All of them were born and raised in South Korea, and their residences at the time of the experiment were in the so-called Seoul-Gyeonggi area, in which a standard variety of Korean is spoken. These participants were either students or alumni of a university located in Seoul. As English is ubiquitous in the South Korean educational system, all L1-i participants had some exposure to English. These participants' self-reported age of onset of English acquisition was later than that of HSs, both on average (M = 7.5) and on an individual basis (for the most part, the earliest reported AOA of 4-5 years of age in this group was the latest AOA for the heritage group). Their average self-reported English speaking and comprehension abilities were only 3 and 4, respectively (on a 7-point scale), while their self-reported Korean proficiency was at 7 on both counts. Their average articulation rate in the narrative task was 4.2 syllables per second. Table 1 reports the summary of the language backgrounds of the participants in all groups.

Stimuli for the AX Discrimination Task
Seventy-two monosyllabic CV words, adapted from Schmidt (2007), were used as stimuli. Four female native speakers, three of whom were originally from Seoul and one from the Chung Cheong area, recorded the stimuli. These speakers had been in the US for no more than three years at the time of the recording and reported being exposed to English about 30% of the time or less, on average (see Schmidt 2007 for more detail). From the original set of stimuli, which consisted of 684 CV items, 72 were selected (18 unique CV combinations recorded by four different speakers). These words all began with a lenis or aspirated stop consonant of three different places of articulation (/p/, /t/, /k/) and contained three different vowels (/a/, /i/, /ɯ/), resulting in a total of 18 unique CV words.
To ensure the quality of the audio files, a Korean investigator in the current study evaluated and selected the stimuli. In addition, we performed acoustic analysis along with a visual inspection of the spectrograms of all stimuli, using Praat 6.1.10 (Boersma and Weenink 2021), to ensure that lenis stops and aspirated stops were differentiated by F0 and had long-lag VOT values. The results showed that the contrast between the two types of stops was implemented via both F0 (t = 10.65, p < 0.001) and VOT (t = 5.62, p < 0.001). However, although the VOT values were significantly different between lenis and aspirated stops, there was considerable overlap in the VOT values of the two kinds of stops, which is especially conspicuous when compared to the F0 values, as shown in Figure 1. This overlap in VOT values may contribute to the perceptual difficulty of categorizing the contrast. This analysis strongly suggests that the use of onset F0 is important for the discrimination of these stimuli because VOT is likely to be a somewhat unreliable cue on its own.

Verbal Narrative for Proficiency Estimation
A picture book of Little Red Riding Hood consisting of a sequence of nine wordless pictures was used for a picture description narrative task, which provided verbal narratives for estimating participants' speaking proficiency in Korean. Narrative elicitation is a standard measure in the literature for examining grammatical growth (Cuza 2010; Montrul 2002; Rojas and Iglesias 2013; Sebastián and Slobin 1994). Participants took about five minutes on average to narrate the story.

AX Discrimination Task
All groups performed an AX discrimination task, a simple but widely used task for comparing groups in terms of perceptual performance (Lee-Ellis 2012), using the Gorilla interface, an online research platform for behavioral scientists (Anwyl-Irvine et al. 2020). Using the headphone check function in Gorilla, we ensured that all participants were wearing headphones or earbuds at the start of the experiment. During the AX task, participants listened to pairs of Korean monosyllabic words that potentially differed in the word-initial stop (lenis/aspirated) only and judged whether the two word-initial consonants were the same or different. On each trial, words A and X were played with an interval of 200 ms between the two. Right after the playback, two color-coded 'buttons' were displayed on the screen, a red button corresponding to the 'different' response and a blue button corresponding to the 'same' response. Participants registered their decision by clicking on one of the two buttons. Trials were separated by a blank screen lasting 500 ms. Stimuli were randomized for each presentation. The same stimuli were used for all groups. An equal number of different and same pairs was used. The members of each pair were also presented an equal number of times in each order (AX and XA), and each unique pair was presented twice to each participant.
The total number of trials amounted to 288: 3 places of articulation (/p/, /t/, /k/) × 3 vowels (/a/, /i/, /ɯ/) × 2 orders (AX and XA) × 2 repetitions × 4 speakers = 144 'different' trials, plus an equal number of 'same' trials.
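The trial count above can be checked by enumerating the design. The following is a minimal sketch, not the authors' experiment script: the variable names are hypothetical, and the composition of the 'same' trials (each of the 18 words, per speaker, presented twice) is an assumption, since the text states only that their number equals that of the 'different' trials.

```python
from itertools import product

# Design factors as described in the text.
PLACES = ["p", "t", "k"]
VOWELS = ["a", "i", "ɯ"]
SPEAKERS = [1, 2, 3, 4]
ORDERS = ["AX", "XA"]
REPETITIONS = [1, 2]
# Assumption: 'same' trials are built per laryngeal category (hypothetical layout).
LARYNGEAL = ["lenis", "aspirated"]


def build_trials():
    """Enumerate 'different' and 'same' trials for one participant."""
    different = [
        {"type": "different", "place": p, "vowel": v,
         "speaker": s, "order": o, "rep": r}
        for p, v, s, o, r in product(PLACES, VOWELS, SPEAKERS, ORDERS, REPETITIONS)
    ]
    same = [
        {"type": "same", "place": p, "vowel": v,
         "speaker": s, "laryngeal": c, "rep": r}
        for p, v, s, c, r in product(PLACES, VOWELS, SPEAKERS, LARYNGEAL, REPETITIONS)
    ]
    return different + same


trials = build_trials()
n_different = sum(t["type"] == "different" for t in trials)
n_same = sum(t["type"] == "same" for t in trials)
print(len(trials), n_different, n_same)  # 288 144 144
```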
Importantly, in the 'same' pairs the two words were not acoustically identical, but they did begin with the same type of consonant (in terms of place of articulation and laryngeal specification). In other words, for each 'same' pair, two different tokens of the same word recorded by the same speaker were used. This decision was made to render this admittedly very simple task somewhat more complex. As a result, participants had to determine that the two initial stops were phonologically the same, rather than simply acoustically identical, to make a 'same' decision. Additionally, phonetic information from the same-pair words could not help in making this decision, since the two words in a 'same' pair were actually two different recordings of the same consonant-vowel combination.
Before the experimental task began, written instructions (in English for L2 learners and HSs and in Korean for L1-i speakers) were displayed on the screen and a short practice block of five trials was provided in order to familiarize participants with the task. This task took approximately 12 min for each participant to complete.

Picture Description Narrative Task
The purpose of the picture description task was to obtain verbal narratives which could be used to estimate participants' verbal proficiency via a measure of their speech fluency. All participants who had completed the AX discrimination task also performed a picture description narrative task on the same platform, Gorilla. Participants had an opportunity to take an optional five-minute break after completing the AX task and before moving on to the picture description task. After the break, written instructions were displayed on the screen, which requested that the recording be performed in a quiet place to minimize background noise in the acoustic signal. In this task, participants were asked to narrate the story of Little Red Riding Hood in Korean by describing a sequence of wordless pictures. Their production was recorded using the recording function in Gorilla, and participants could use any recording device of their preference for this task. The productions elicited from the narrative task were analyzed acoustically to measure the articulation rate of each participant. The articulation rate was calculated by dividing the number of syllables in the narrative (estimated as the number of vocalic nuclei) by the duration of the narrative (without pauses) to obtain a rate in syllables per second (see Baker-Smemoe et al. 2014; De Jong 2018; Ginther et al. 2010; Kormos and Dénes 2004; Nagy and Brook 2020 for more discussion). Each recording was analyzed using a script by De Jong and Wempe (2009) in Praat 6.1.10.
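The articulation rate defined above is a simple ratio, illustrated in the sketch below. This is not the De Jong and Wempe (2009) Praat script: the function name and the input values are hypothetical, and the syllable count and pause durations are assumed to have been obtained beforehand.

```python
def articulation_rate(n_syllables, total_duration_s, pause_durations_s):
    """Syllables per second of phonation time (pauses excluded).

    n_syllables: estimated number of vocalic nuclei in the narrative.
    total_duration_s: full duration of the recording, in seconds.
    pause_durations_s: durations of the detected silent pauses, in seconds.
    """
    phonation_time = total_duration_s - sum(pause_durations_s)
    if phonation_time <= 0:
        raise ValueError("phonation time must be positive")
    return n_syllables / phonation_time


# Hypothetical narrative: 630 syllables in 180 s, of which 30 s are pauses.
rate = articulation_rate(630, 180.0, [10.0, 20.0])
print(round(rate, 1))  # 4.2
```

Excluding pauses is what separates articulation rate from speech rate, which would divide by the full 180 s in this example.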

Analyses
Participants' responses in the AX discrimination task were categorized as correct or incorrect and submitted to statistical analyses performed in RStudio 1.4.1103 (RStudio Team 2020) using the 'lme4' package (Bates et al. 2015).
Three mixed-effects logistic regression models were implemented. In all three, perceptual accuracy, coded as '1' (correct answer) or '0' (incorrect answer), was used as the binary dependent variable. The first model was used to analyze the effect of participant group on discrimination performance. It included group (HS as the reference level, L2, L1-i), trial type (same or different, with different as the reference level), their interaction, and speaker (the four female speakers who recorded the stimuli, with Speaker 1 as the reference level) as fixed effects. It also included random intercepts for item and subject. Independent variables were treatment coded.
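For reference, the structure of the first model can be written out as an equation. This is a sketch reconstructed from the verbal description above, not a formula given by the authors; the symbols ($y_{si}$ for the response of subject $s$ to item $i$, the dummy variables, and the variance terms) are notational assumptions.

```latex
\begin{aligned}
\operatorname{logit} P(y_{si} = 1) ={}& \beta_0
  + \beta_1\,\mathrm{L2}_s + \beta_2\,\mathrm{L1i}_s
  + \beta_3\,\mathrm{Same}_i \\
 &+ \beta_4\,(\mathrm{L2}_s \times \mathrm{Same}_i)
  + \beta_5\,(\mathrm{L1i}_s \times \mathrm{Same}_i) \\
 &+ \beta_6\,\mathrm{Spk2}_i + \beta_7\,\mathrm{Spk3}_i + \beta_8\,\mathrm{Spk4}_i
  + u_s + w_i, \\
u_s \sim{}& \mathcal{N}(0, \sigma_u^2), \qquad w_i \sim \mathcal{N}(0, \sigma_w^2)
\end{aligned}
```

Under treatment coding, $\beta_0$ is the log-odds of a correct response for the reference cell (HS group, 'different' trial, Speaker 1), and every other coefficient is a difference in log-odds from that cell.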
The second and the third models were fitted to the data from the HS group and the L2 group, respectively, and were implemented in order to determine whether participants' background characteristics were predictive of their perceptual accuracy in the discrimination task. These models included Korean usage (participants' self-reported percentage of choosing to speak Korean, from 0% for English only to 100% for Korean only), Korean exposure (participants' self-reported percentage of current exposure to Korean, from 0% for English only to 100% for Korean only), age of L2 acquisition or AOA (of English for the HS group and of Korean for the L2 group), and individual articulation rate as fixed factors; subject and item were entered as random effects. Due to the low quality of recorded audio files and other technical issues, six participants were excluded from this analysis, resulting in 18 HSs in the second model and 16 L2 learners in the third model.

Effects of Group on Perceptual Accuracy
The results of the mixed-effects logistic regression model showed that the HS group was not significantly different from the L1-i group (p = 0.23) while being significantly different from the L2 group (β = −2.01, SE = 0.16, p < 0.001) in terms of the accuracy of their discrimination responses. A pairwise post hoc analysis using multiplicity adjustment, averaged over type and speaker, confirmed the result while also indicating that the estimated mean of the L1-i group and that of the L2 group were significantly different (β = −1.78, SE = 0.16, p < 0.001). As shown in Figure 2, the HS group was similar to the L1-i group in terms of the accuracy in the AX task, while both were considerably more accurate than the L2 group.

Effects of Speaker on Perceptual Accuracy
The results showed that Speakers 2 and 4, but not Speaker 3, were significantly different from Speaker 1 in their effect on the perceptual judgment; listening to these speakers increased the odds of accurate perceptual response by a factor of 2.66 and 2.50, respectively, when compared to Speaker 1 (Speaker 2: β = 0.98, SE = 0.17, p < 0.001, Speaker 3: p = 0.06, Speaker 4: β = 0.91, SE = 0.16, p < 0.001), adjusted for group and trial type. A pairwise post hoc comparison analysis was conducted using multiplicity adjustment (Tukey) to compare other pairs of speakers. The results revealed that Speaker 3 decreased perceptual accuracy compared to Speaker 2 (β = −0.68, SE = 0.17, p < 0.001) and compared to Speaker 4 (β = −0.61, SE = 0.17, p = 0.001). This indicates that when the lenis-aspirated stop contrast was represented with audio files recorded by Speakers 1 and 3, participants found it more difficult to discriminate the contrast compared to the recordings by other speakers. Figure 3 shows the results of the pairwise analysis of the effects of speaker on perceptual judgment. The estimated mean difference indicates the difference in the coefficients of each speaker pair in the mixed-effects logistic regression model. For example, Speaker 1 has negative values in the estimated mean difference in the comparison to all the other speakers (Speakers 2, 3, and 4), indicating that participants performed worse in discriminating the lenis-aspirated stop contrast when evaluating the stimuli recorded by Speaker 1 compared to all other speakers.
To investigate the source of these differences, we examined the VOT and F0 in all stimuli as correlates of lenis and aspirated stops for each speaker separately. A visual inspection of these values, plotted in Figure 4, reveals that Speaker 1 had the greatest amount of variability in VOT values of the lenis stops, while Speaker 3 had the least amount of difference in F0 values between the two kinds of stops. This suggests that the increased variability in VOT and the decreased distinction in F0 in the realization of these parameters as correlates of the lenis-aspirated contrast may have caused the perceptual difficulties experienced by the participants.
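The odds factors reported above for Speakers 2 and 4 follow from exponentiating the log-odds coefficients of the logistic model. The short sketch below illustrates that conversion using the coefficients and standard errors given in the text. The helper name `odds_ratio` is hypothetical, and the 95% Wald interval is an illustrative addition that is not reported in the study. Because the published coefficients are rounded to two decimals, recomputed factors may differ slightly from the published ones.

```python
import math


def odds_ratio(beta, se, z=1.96):
    """Convert a log-odds coefficient into an odds ratio.

    Returns (odds ratio, lower bound, upper bound), where the bounds form an
    approximate Wald interval (an assumption for illustration only).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Coefficients as reported in the text (reference level: Speaker 1).
reported = {"Speaker 2": (0.98, 0.17), "Speaker 4": (0.91, 0.16)}

for name, (beta, se) in reported.items():
    ratio, lower, upper = odds_ratio(beta, se)
    print(f"{name}: OR = {ratio:.2f}, approx. 95% interval [{lower:.2f}, {upper:.2f}]")
```

The same conversion applies to negative coefficients, where an odds ratio below 1 indicates reduced odds of an accurate response.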

Effects of Trial Type on Perceptual Accuracy
The effect of trial type (same trials/different trials) did not significantly affect perceptual judgment in the AX task (p = 0.84) when adjusted for group and speaker. However, the fixed effect of the interaction between group and type was found to significantly affect perceptual accuracy in the same trials for the L2 group compared to the HS group (β = 1.10, SE = 0.10, p < 0.001). On the other hand, the HS group's response did not differ by the trial type when compared with that of the L1-i group (p = 0.05). In order to examine the main effect of the interaction factor between group and type, a pairwise post-hoc analysis using Tukey adjustment was conducted. Figure 5 shows the linear prediction of the interaction effect between Group and Type. In the different trial type, the HS group was not significantly different from the L1-i group (p = 0.47). In contrast, both the HS group and the L1-i group performed significantly better than the L2 group in discriminating the stop contrasts in different trials. The odds of accurate discrimination increased by a factor of 7.46 and 9.08 for these groups compared to the L2 group, respectively (HS-L2: β = 2.01, SE = 0.16, p < 0.001, L1-i-L2: β = 2.21, SE = 0.16, p < 0.001).
In the same trial type, the HS group was significantly different from the L1-i group. The odds of accurate judgment decreased by 35% for the HS compared to the L1-i group (β = −0.43, SE = 0.17, p = 0.03). Both the HS and L1-i groups significantly outperformed the L2 group on same trials. The odds of accurate judgment increased by a factor of 2.49 and 3.83 for these two groups, respectively (HS-L2: β = 0.91, SE = 0.16, p < 0.001, L1-i-L2: β = 1.34, SE = 0.17, p < 0.001).
Figure 5. Linear prediction of the interaction effect between group and type. Y-axis represents the estimated means (coefficients) for each group in the pairwise post-hoc analysis of the mixed-effects logistic regression model.
These results indicate that while the HS group and the L1-i group performed comparably in the different trials, the L1-i group performed slightly more accurately in the same trials. On the other hand, the L2 group showed significantly lower accuracy in both same- and different-word trials than the HS and L1-i groups. This indicates that detecting a lenis-aspirated distinction was especially challenging for the L2 participants. Figure 6 shows the average accuracy scores of each group by trial. In addition to the statistical analysis, there are three descriptive aspects of the result that are worth mentioning. First, the average perceptual accuracy was only slightly higher for the same trials than in the different trials in the L1-i and HS groups, while the difference was quite pronounced in the L2 group. Second, the L2 group showed the greatest difference between the same and different trials. Third, even L1-i speakers did not reach 100% accuracy in the AX task. This confirms that the lenis-aspirated stop contrast is perceptually difficult not only for L2 learners and HSs but for L1-i speakers as well. Table 2 summarizes the results of the fixed effects in the logistic mixed-effects model.

Effects of Articulation Rate on Perceptual Accuracy
The second mixed-effects model in the study analyzed the effects of background factors on the perceptual judgment made by HSs. The results showed that articulation rate significantly affected the perceptual accuracy of HSs by increasing the odds of accurate perceptual response by a factor of 1.83 as the articulation rate increased by 1 syllable/sec (β = 0.60, SE = 0.21, p < 0.01), adjusted for other factors. The result suggests that HSs who were more fluent in Korean (and likely, more proficient overall) were also more accurate in discriminating the lenis-aspirated stop contrast than those who were less fluent. In contrast, all other factors included in the model (AOA of English, percent Korean use, and percent Korean exposure) showed no significant relation to the perceptual accuracy in the current dataset.
The third model analyzed the L2 group and predicted the effects of the background factors on the perceptual judgment in the AX task. None of the factors in the model were significantly related to the perceptual accuracy of L2 speakers.
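To give a sense of scale for the articulation-rate coefficient reported for the HS group, the sketch below shows how a multiplicative change in odds translates into a change in the probability of an accurate response. The baseline accuracy of 85% is an assumption chosen for illustration (it matches the overall HS accuracy, not a model intercept), and the function name `shift_probability` is hypothetical. The result is therefore a rough illustration and not a prediction from the fitted model.

```python
import math


def shift_probability(p_baseline, beta, delta=1.0):
    """Apply a log-odds change of beta * delta to a baseline probability."""
    odds = p_baseline / (1.0 - p_baseline)      # probability -> odds
    new_odds = odds * math.exp(beta * delta)    # scale odds by the odds ratio
    return new_odds / (1.0 + new_odds)          # odds -> probability


p0 = 0.85    # assumed baseline accuracy (illustrative, not a model estimate)
beta = 0.60  # articulation-rate coefficient reported for the HS model

# Probability after a 1 syllable/sec increase in articulation rate.
p1 = shift_probability(p0, beta, delta=1.0)
print(f"baseline = {p0:.3f}, after +1 syllable/sec = {p1:.3f}")
```

Because the odds scale is multiplicative, the same coefficient produces a smaller change in probability when the baseline accuracy is already close to ceiling.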

Discussion
The present study aimed to determine whether HSs of Korean in the United States find it relatively more challenging to distinguish Korean lenis stops from Korean aspirated ones in an AX discrimination task compared to the L1-i speakers in Korea. We also collected data from L2 learners of Korean with English as their native language in order to determine whether a similar disadvantage would be observed for this group but to a greater degree than for HSs. These hypotheses were motivated by several assumptions, including the fact that the lenis-aspirated contrast is proven to be a perceptually challenging one, possibly due to its extensive (and largely exclusive) reliance on onset F0 as a perceptual cue. Another underlying assumption is that English and Korean laryngeal categories can influence each other, both in production and perception of bilingual speakers. One anticipated outcome of this influence is less than optimal reliance on onset F0 in distinguishing lenis and aspirated stops by Korean HSs, due to the effect of English in which VOT serves as the dominant feature distinguishing laryngeal categories. As a result, lenis-aspirated discrimination accuracy was predicted to suffer in HSs, as well as L2 learners (for the same reason).
The results only partially supported our predictions. L2 learners of Korean did in fact demonstrate a significantly lower accuracy of lenis-aspirated discrimination compared to HSs. However, there was no difference between the groups of L1-i speakers and HSs, apart from a statistically significant but minor advantage of the L1-i group in same trials only. This result gave us no strong evidence to conclude that HSs' discrimination abilities were not comparable to those of L1-i speakers.
This result is not particularly surprising when considered in the broader context of literature on the perceptual abilities of HSs. Although some studies indicate that HSs may underperform on some aspects of native speech perception, compared to L1-i speakers (Ahn et al. 2017; Cheon and Lee 2013; Lee-Ellis 2012), there is also evidence that equivalent performance can be observed (Chang 2016; Lukyanchenko and Gor 2011; Oh et al. 2003; Tees and Werker 1984; Werker 1989). For example, Oh et al. (2003) showed that HSs of Korean (childhood speakers in their terminology) were as accurate as L1-i speakers at recognizing fortis, lenis, and aspirated Korean stops in a three-choice identification task.
Furthermore, the difficulty of the task appears to play a role in determining the outcome of such comparisons, with simpler tasks, such as AX discrimination, often eliciting more comparable performance across groups (Lee-Ellis 2012). Moreover, some scholars suggest that among all types of linguistic competence, perceptual abilities benefit the most from the early and authentic exposure that characterizes heritage language acquisition, resulting in the most native-like performance in this modality, even when compared to speech production, and especially in contrast with morphosyntactic abilities (Chang 2021; Oh et al. 2003). In addition, developmental investigations report that the lenis-aspirated distinction in Korean stops is acquired before school age, when the dominant language starts playing an important role in the language development of HSs (Choi et al. 2019; Jun 2007; Kim and Stoel-Gammon 2009). Therefore, perceptual abilities with respect to this aspect of Korean phonology are likely to be well-established before the dominant language begins exerting its influence on the heritage language. Thus, one possible interpretation of the lack of significant differences between heritage and L1-i speakers observed in the present study is that the two groups are truly equivalent in the way they perceive and process the lenis-aspirated contrast in Korean, especially under the conditions of this relatively simple task (AX discrimination).
There are, however, alternative possibilities. In particular, it is possible that HSs did rely on onset F0 less and on VOT more than L1-i speakers in their perception of the lenis-aspirated contrast as we predicted, but that it did not, contrary to prediction, affect their discrimination performance adversely. In previous work on L2 acquisition, parallels were often observed between incorrect cue weighting in perception of non-native contrasts and sub-optimal performance in the discrimination of the contrasts Tohkura 1990, 1992). However, there are also findings indicating that non-native cue-weighting does not inevitably lead to differences in contrast perception. For example, Escudero (2000, 2001) showed that Spanish L2 speakers of English performed in a nativelike fashion in discriminating the English lax-tense vowel contrast, even though their perceptual cue-weighting was different from that of English native speakers (they relied more on the temporal than spectral dimension). Therefore, our heritage participants could perform on par with L1-i speakers on the discrimination task, in spite of their reliance on a more English-like cue (VOT). This explanation is especially viable in light of the fact that stimuli used in the experiment did contain both VOT and onset F0 as cues to the lenis-aspirated distinction.
The acoustic analysis of the stimuli used in the AX discrimination task indicates that the contrast between lenis and aspirated stops in this dataset was implemented via both dimensions, VOT and onset F0. That is, the expected VOT merger between lenis and aspirated stops was not as pronounced in these stimuli as expected, potentially giving our heritage listeners (with their purported reliance on VOT) a leg up. It is possible that speakers who recorded the stimuli introduced some degree of hyperarticulation along the VOT dimension in their productions in order to increase the intelligibility of their speech, which is an effect sometimes observed for speech recorded in highly artificial laboratory conditions (see, e.g., Chang and Mandock 2019, for similar reasoning). In fact, there is evidence that Korean stops produced using 'clear speech' are differentiated via both VOT and onset F0 even for speakers who are expected to exhibit a VOT merger (Kang and Guion 2008). There is also evidence that VOT differences between lenis and aspirated stops are enhanced in child-directed speech, which is often characterized by modifications similar to those of clear speech (Ko 2018).
In addition, these speakers could have produced stronger VOT contrast between lenis and aspirated stops because they were all proficient in English and were, in fact, recorded in the United States, where they lived at the time of the recording (which is not mutually exclusive with the 'clear speech' hypothesis). As previous research shows, this fact alone could lead to a greater reliance on VOT in implementing Korean laryngeal contrasts by these speakers (Cheng 2017; Kang and Nagy 2016).
Finally, VOT merger in Korean lenis and aspirated stops is due to an ongoing sound change leading to tonogenesis. Some amount of variability in adopting the sound change is expected among contemporary speakers. Therefore, some of the speakers who recorded the stimuli for the experiment may not be as advanced in adopting this sound change as others (e.g., Speaker 1), relying on VOT more than they would otherwise.
Thus, the acoustic characteristics of the stimuli made alternative perceptual strategies available to the participants, specifically the use of VOT, instead of or in addition to onset F0, in determining the difference between lenis and aspirated stops, which could put the group relying mostly on F0 and the one relying mostly on VOT on equal footing.
The data we have at hand do not allow us to confirm or reject either of these interpretations at the moment. However, the two interpretations make different, clear, and testable predictions, which can be addressed in future work. For example, in a study with a comparable design which uses lenis and aspirated stimuli that differ only in F0, not in VOT (i.e., fully VOT-'merged' versions of lenis and aspirated stops), we would expect to find a significant difference in discrimination accuracy between HSs and L1-i speakers, if the second interpretation is correct.
Although we did not see a significant difference in the discrimination of the contrast between heritage and L1-i speakers, the heritage participants' fluency of spoken Korean, measured as the articulation rate, was significantly predictive of their perceptual accuracy. Insofar as fluency measures can be used as indicative of the overall language proficiency (see Nagy and Brook 2020; Polinsky 2008, 2011; Polinsky and Kagan 2007), these results suggest that greater proficiency in Korean led to greater perceptual accuracy. Indirectly, this finding supports our assumption that crosslinguistic influence from English is the source of perceptual difficulties. Research indicates that bilinguals with more balanced proficiency and dominance in the two languages, in particular early simultaneous bilinguals, often demonstrate a greater ability to maintain a separation between the two sound systems, minimizing the interference effects between the two (e.g., Barlow et al. 2013; Guion 2003; MacLeod et al. 2009; Sundara and Baum 2006). Therefore, those of our HSs who exhibited greater proficiency in their non-dominant language could exercise better control over the interference from English, thus performing better on the perceptual task. Interestingly, for L2 learners none of the linguistic background and experience factors were found to be significantly related to perceptual accuracy.
Among other notable aspects of these results, we observed a less than perfect performance on a relatively simple AX discrimination task even among the L1-i listeners, who performed with an accuracy of 88%, on average. This outcome can partly be attributed to the overall perceptual difficulty of the lenis-aspirated distinction in the aftermath of tonogenesis, if onset F0 is a less salient perceptual cue than VOT (Kong and Lee 2018; Son 2020). It nevertheless stands in contrast to the results of similar tasks, e.g., a three-choice phonemic identification of the lenis-fortis-aspirated stops in Oh et al. (2003), where native monolingual speakers of Korean performed with an accuracy of 98.6%. We believe that administering the study online vs. in the laboratory may have resulted in this discrepancy. Since we did not have control over the conditions in which our participants performed the tasks, beyond ensuring that they were using headphones, it is possible that some participated while in a noisy or distracting environment, thus failing to perform at the same level at which they would have performed in the optimal conditions of a phonetics laboratory. While this complicates comparisons with laboratory studies, these conditions are closer to those under which natural speech perception takes place, thus providing a somewhat more realistic estimate of the relevant perceptual behaviors.
It is also noteworthy that L2 learners in the present study achieved a fairly low degree of accuracy on 'different' trials: only 53% correct. Their moderate proficiency in Korean is undoubtedly partly responsible for this outcome (around 3 on a 7-point scale, by self-report). Nevertheless, one may question why these learners did not take advantage of the VOT as a correlate of the lenis-aspirated distinction the way HSs presumably did. This may be especially puzzling given that positive VOT is a primary cue to the voicing distinction in word-initial stops in English (Abramson and Lisker 1985). It should be considered, however, that both lenis and aspirated stops in Korean have long lag VOT and therefore fall squarely within the range of voiceless English stops. As a result, English-speaking learners are required to make a within-category discrimination decision when attempting to use VOT as a cue, in effect trying to distinguish between aspirated and slightly more aspirated stops. It should come as no surprise that this proves a difficult task for learners whose neural pathways have been trained to discriminate categorically between aspirated and unaspirated stops. In fact, Cheon and Lee (2013) and Schmidt (2007) showed that native speakers of American English, including those learning Korean as a second language, strongly associated both lenis and aspirated Korean stops with English voiceless stops. This difficulty has probably been compounded by the fact that VOT was not a very consistent correlate of this distinction in Korean. Figure 4 demonstrates a fair amount of variability as well as partial overlap or close proximity between lenis and aspirated VOT ranges.
To conclude, in this study, we hypothesized, based on the assumptions of SLM, that a crosslinguistic interaction takes place between the laryngeal categories of English and Korean for those who speak both languages, specifically HSs of Korean in the United States and American learners of Korean as an L2. More precisely, we expected that, as a result of English affecting the perception of Korean, for both groups of listeners, the accuracy of their discrimination of the Korean lenis-aspirated contrast would suffer, in comparison to the group of L1-i speakers. We also expected a difference between the two groups, such that L2 learners, for whom Korean is a later acquired and non-dominant language, would discriminate Korean categories more poorly than HSs, for whom Korean is also non-dominant but an early acquired language (in addition to other differences between the groups, e.g., in terms of current exposure and use of Korean). Our prediction was partially supported, demonstrating that L2 learners of Korean were outperformed by HSs of Korean, but the latter group was not significantly different from the L1-i speakers. While on the face of it this result adds to the body of literature arguing for equal perceptual abilities of HSs and L1-i/monolingual speakers, an alternative explanation is possible. We conjectured that the lack of differences in the performance of heritage and L1-i speakers could be explicable by the acoustic properties of the stimuli used in the present study, which could allow both groups to achieve high discrimination accuracy despite relying on distinct perceptual strategies. This is an intriguing possibility that must await future research for its definitive confirmation.

Informed Consent Statement:
Informed consent was obtained online from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to confidentiality.