Perceptual Discrimination of Phonemic Contrasts in Quebec French: Exposure to Quebec French Does Not Improve Perception in Hexagonal French Native Speakers Living in Quebec

In Quebec French, /a ~ ɑ/ and /ε ~ aε/ are phonemic, whereas in Hexagonal French, these vowels are merged to /a/ and /ε/, respectively. We tested the effects of extended exposure to Quebec French (QF) as a second dialect (D2) on Hexagonal French (HF) speakers’ abilities to perceive these contrasts. Three groups of listeners were recruited: (1) non-mobile HF speakers born and living in France (HF group); (2) non-mobile QF speakers born and living in Quebec (QF group); and (3) mobile HF speakers having moved from France to Quebec (HF>QF group). To determine any fine-grained effects of D2 exposure on the perception of vowel contrasts, participants completed a same–different discrimination task in which they listened to stimuli paired at different levels of acoustic similarity. As expected, QF listeners showed a significant advantage over the HF group in discriminating between /a ~ ɑ/ and /ε ~ aε/ pairs, suggesting an own-dialect advantage in perceptual discrimination. Interestingly, this own-dialect advantage appeared to be greater for the /ε ~ aε/ contrast. The QF listeners also showed an advantage over the HF>QF group, and, surprisingly, this advantage was greater than their advantage over the HF group. In other words, the results suggested that the acquisition of a second dialect did not enhance the abilities of listeners to perceive differences between phonemic contrasts in that D2. If anything, the acquisition of the D2 disadvantaged the perceptual abilities of the HF>QF group. This might be because these phonemes have, over time, become less acoustically marked for the HF>QF participants and have, potentially, become integrated into their D1 phonemic categories.


Introduction
Work focusing on speech production has shown that speakers can acquire non-native vowel contrasts in their speech as a consequence of extended exposure to a second dialect (D2) of their native language (Chambers 1992) and that this can occur even after moving to a new D2 region in adulthood (Johnson 2007; Nycz 2013; Walker 2019). In comparison, relatively few studies have explored whether speakers' perception of non-native vowel contrasts is also capable of changing as a result of D2 exposure. There is some evidence that long-term exposure to a D2 can result in an improved ability to perceive phonemic vowel categories in the D2 that are not native in a listener's first dialect (D1) (Bowie 2000), although other studies have failed to find similar evidence of perceptual change in adulthood among other D2 populations (Ziliak 2012). Therefore, the extent to which D1 perceptual categories are susceptible to change after long-term D2 exposure in adulthood is not yet well understood.
The current study seeks to shed light on D1 perceptual malleability by examining how extended exposure to a D2 of French can influence the perception of non-native vowel contrasts, specifically the QF contrasts /a ~ ɑ/ and /ε ~ aε/. While these vowels are lexically contrastive in QF, as shown by the minimal pairs in (1) and (2), these contrasts have largely been neutralized in Hexagonal French (HF, i.e., the Standard French of France), with only [a] and [ε] being maintained in the majority of HF speakers' phonemic inventories (Hansen and Juillard 2011). Given this situation of cross-dialectal variation, exploring how HF speakers who have moved to Quebec in adulthood perceive these contrasts, and whether they perceive the contrasts at all, can contribute to a better understanding of the extent to which native speech perception is malleable in post-adolescence.

Cross-Dialectal Perception of Mergers
Research on the cross-dialectal perception of mergers has typically shown listeners who do not have a phonemic contrast in their D1 to be worse at perceiving this contrast in speech, as compared to speakers who do have this contrast natively (e.g., Conrey et al. 2005; Janson and Schulman 1983; Labov et al. 1991). This has often been shown through lexical discrimination tasks, in which participants hear two words and are asked to report whether the words they hear are the same or different from one another. Practically speaking, however, this method is known to be difficult to calibrate. For instance, several researchers have run into the issue of how to interpret the results of participants who perform well above chance in accurately discriminating between words containing non-native regional contrasts (e.g., Austen 2020; Bowie 2000; Labov et al. 1991; Thomas and Hay 2005). This has on occasion led to listeners who do not show 100% accuracy in distinguishing minimal pairs being treated, for analytic purposes, as perceptually merged. However, as Wade (2017) points out, this high cut-off point does not preclude the possibility of listeners still picking up on relevant perceptual cues between such word pairs.
One method that has been used to avoid such ceiling effects involves having listeners discriminate words that are acoustically resynthesized along a gradient continuum between two phonemic categories. For instance, Fridland and Kendall (2012) used a seven-step resynthesized continuum from [e] to [ε] in US English to test at what point American English speakers change their perception from one vowel to the other when played ambiguous tokens of the word pairs bait [beɪt]-bet [bεt] and date [deɪt]-debt [dεt]. They found that participants from the Southern US, a variety characterized by the merging of these vowels to [e], perceived more of the vowel steps along this continuum as [e] compared to non-Southern, non-merged participants. This was evidenced by a later crossover point of the gradient stimuli (i.e., when perception of [e] or [ε] is at 50%) for the Southern group versus the non-Southern group. This and other studies examining the perception of gradient phoneme contrasts (e.g., Bukmaier et al. 2014; Miller et al. 2011) have shown that listeners attend to fine-grained and dialect-specific information in an utterance when mapping an incoming speech signal onto existing vowel categories.
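The notion of a crossover point can be made concrete with a small sketch. The following Python function estimates, by linear interpolation, the continuum step at which the proportion of responses for the second category reaches 50%. The response proportions are invented for illustration, and the interpolation method is an assumption of this sketch; it is not a description of how the cited studies estimated their crossover points.

```python
def crossover_step(proportions):
    """Estimate the continuum step at which responses cross 50%.

    `proportions[i]` is the proportion of second-category responses
    (e.g., [ε] rather than [e]) at step i + 1. Returns a fractional
    step obtained by linear interpolation, or None if the responses
    never cross 0.5.
    """
    for i in range(len(proportions) - 1):
        p0, p1 = proportions[i], proportions[i + 1]
        # A crossing lies between two adjacent steps on opposite sides of 0.5.
        if (p0 - 0.5) * (p1 - 0.5) <= 0 and p0 != p1:
            return (i + 1) + (0.5 - p0) / (p1 - p0)
    return None


# Hypothetical proportions of [ε] responses over a seven-step continuum.
non_merged = [0.02, 0.05, 0.20, 0.60, 0.90, 0.97, 0.99]
merged = [0.01, 0.03, 0.08, 0.25, 0.55, 0.90, 0.98]

print(crossover_step(non_merged))
print(crossover_step(merged))
```

With these invented values, the merged-like listener crosses over later (at about step 4.8, versus step 3.75), which corresponds to hearing more of the continuum as the first category.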

Perceptual Adaptation in the Acquisition of a New Language Variety
Research on second dialect acquisition (SDA) has typically focused on changes in production that come with long-term contact with a D2. Past SDA studies that have explored perceptual change have often found individuals with more exposure to a D2 to hold advantages in perceiving D2 speech, as compared to speakers of the same D1 with less exposure to this D2 (e.g., Scott and Cutler 1984; Bowie 2000; Iverson 2004, 2007; Walker 2018; Voeten 2021). For instance, Walker (2018) showed that mobile English speakers who moved from the US to the UK, and vice versa, were unsurprisingly better at transcribing speech in noise spoken in their non-native dialect (American or British English) as compared to non-mobile counterpart speakers of these dialects.
Languages 2023, 8, 193
Moreover, other studies have shown listeners with D2 experience to be better at perceiving specific phonemic distinctions present in a D2 that are absent in their D1. For instance, Scott and Cutler (1984) showed that, compared to British speakers who had never lived outside of Britain, British speakers with extensive experience living in the US were faster at resolving lexical ambiguities between words involving [t] and [ɾ], a distinction present in American but not British English. More recently, Bowie (2000) found evidence that exposure to a dialect containing a vowel contrast which is absent in another dialect of the same language can result in an improved ability to discriminate between these contrastive, non-native vowels in perception. This study showed that listeners who had lived outside Waldorf were generally better than listeners who had never lived outside Waldorf at discriminating between pre-lateral /u/, /ʊ/, and /o/, vowel categories that are merged in the Waldorf dialect (i.e., the listeners' D1).
However, strong evidence of perceptual adaptation has not always been found among listeners with long-term exposure to a D2. For example, Ziliak (2012) tested the perceptual categorization of vowels among non-mobile listeners from Southern Indiana and Chicago and mobile speakers who had moved from Indiana to Chicago. Participants heard words resynthesized along continua designed to test the perception of three features of the Northern Cities Shift (/a/-fronting, /æ/-raising, and /ε/-backing). In a two-alternative forced choice design, participants selected one of two words to indicate the word they heard. For all three continua tested, no statistically significant differences in categorical perception were found between listeners who had moved from Southern Indiana to Chicago and non-mobile residents of Southern Indiana. Although some intra-group variability was found among the mobile participants in terms of how closely they patterned in their perception of the features as compared to life-long Chicago residents, this was also the case for the non-mobile Indiana participants.
Studies of perceptual attrition in contexts of second language acquisition (SLA) also shed light on the extent to which perceptual adaptation can occur in post-adolescence. Like exposure to a D2, linguistic competition from a second language (L2) can result in perceptual changes to the native speech perception system. For example, Celata and Cancila (2010) tested the perception of singleton and geminate consonant distinctions (e.g., casa [kasa] 'house' vs. cassa [kasːa] 'box') in the Italian Lucchese dialect among three groups of participants: (i) first-generation speakers born in Lucca who had moved to San Francisco in adulthood, (ii) second-generation Lucchese speakers living in the US and born to immigrant parents from Lucca, and (iii) non-mobile speakers born and living in Lucca. It was found that the first-generation Lucchese speakers who had moved to the US in adulthood were better at discriminating between words with the singleton-geminate contrast than the second-generation speakers who were born in the US. In addition, they reported that both the first- and second-generation immigrant groups in the US showed a disadvantage in accurate identification of words containing singleton consonants versus geminates compared to the non-mobile Lucchese speakers. The higher error rate of the first-generation participants compared to the non-mobile participants suggests that L2 acquisition in adulthood can result in changes to the native L1 perceptual system and indeed hinder L1 perception.
Considered together, several observations can be drawn from this past research concerning the effects of late D2 or L2 acquisition on perceptual malleability in the native linguistic system. The results of Scott and Cutler (1984), Bowie (2000), and Walker (2018) indicate that individuals' speech perception abilities can change in adulthood after extended exposure to a D2. Moreover, these studies show that perceptual adaptation can take place in a number of domains, including the fine-grained perception of specific phonemes (e.g., Bowie 2000) and the comprehension of speech more generally (e.g., Walker 2018). Celata and Cancila (2010) and other recent studies of perceptual restructuring in late bilingualism (e.g., Cabrelli et al. 2019; de Leeuw et al. 2021; Tobin et al. 2017) lend additional support to the view that perceptual adaptation is possible in post-adolescence.
However, as shown by the results of Ziliak (2012), perceptual change (like changes in production, see Nycz 2015) is not guaranteed to occur as a result of speaker mobility. Therefore, more research needs to be conducted to understand the mechanics of post-adolescent perceptual malleability among mobile speakers. The present study focuses on perceptual adaptation as relating to vowel discrimination, a dimension of perceptual malleability that remains underexplored. This is undertaken by exploring whether mobile listeners who do not have a native vowel contrast in their D1 perceptually discriminate tokens containing this contrast differently from listeners who do not have the same amount of exposure to this D2. Furthermore, while it has been shown that listeners' ability to perceptually discriminate between two words that differ as a function of a vowel contrast depends in part on the vocalic inventory of their native language (Amengual and Chamorro 2015) or dialect (Riverin-Coutlée and Arnaud 2015), it remains unclear whether the ability to discriminate between non-native vowel contrasts in perception can be acquired after extended exposure to a D2. The present study seeks to shed light on these questions by examining the perception of two vocalic contrasts in QF (/a ~ ɑ/ and /ε ~ aε/). Would the HF>QF speakers outperform the HF group, or would they underperform it?

Variables of Interest
Although having roots in the history of HF, /a/ and /ɑ/ (e.g., tache [taʃ] 'stain' vs. tâche [tɑʃ] 'task, chore') have been undergoing a merger in France whereby the back vowel /ɑ/ is realized close to or identical to the front vowel /a/ (Berns 2019). This merger has resulted in the phonemic neutralization of possible minimal pairs no longer commonly contrastive in HF, such as patte 'paw' ~ pâte 'dough' and tache 'stain' ~ tâche 'task'. Despite the fact that most speakers in France show a merger in the pronunciation of /a/ and /ɑ/ (ibid.), there is still some amount of inter-speaker variation found in the pronunciation of /a/ and /ɑ/ (Hansen and Juillard 2011), particularly in Eastern France along the Belgian and Swiss borders (Avanzi 2017). On the other hand, the /a ~ ɑ/ vowel contrast is actively present in QF and is indeed considered one of the most characteristic markers of this dialect (Brasseur 2009). Because of the ubiquity of this feature, as well as the fact that it is not subject to negative social evaluation, this low contrast is often noted to constitute part of the norm of pronunciation for QF (e.g., Chalier 2021). This is supported by the fact that it is used widely even in formal speech contexts such as Radio-Canada news broadcaster speech (Bigot and Papen 2013). Thus, the marked difference in the presence of /a ~ ɑ/ in QF vs. HF makes it an ideal variable for studying the perception of cross-dialectal phonetic variation.
The second contrast considered in this study is the mid-vowel contrast /ε ~ aε/ (e.g., mettre [mεtʁ] 'to put' vs. maître [maεtʁ] 'owner, teacher'). One important difference between the /a ~ ɑ/ contrast and the /ε ~ aε/ contrast lies in their respective historical origins. While /a ~ ɑ/ has deep roots in the history of HF (Berns 2015), diphthongized long vowels in French, as in /ε ~ aε/, are 'phonetic creations' of QF (Reinke and Ostiguy 2016, p. 55), having originated after French colonial contact with North America and therefore possessing no historical counterpart in HF.¹ Like the /a ~ ɑ/ contrast, the contrast in /ε ~ aε/ is ubiquitous in contemporary QF (e.g., Côté and Lancien 2019; Riverin-Coutlée and Roy 2020). In QF, the long vowel [εː] shows a strong tendency to diphthongize to [aε], with recent acoustic studies showing this phoneme to be very often or even categorically realized with a complex nucleus (e.g., Leblanc 2012; Riverin-Coutlée and Roy 2020). Moreover, this contrast, unlike the /a ~ ɑ/ contrast, is subject to social stigmatization and is not typically considered part of the QF norm of pronunciation (Chalier 2021, p. 313).
In summary, these two contrasts were considered of interest in this study due to their cross-dialectal differences in HF and QF: (1) /a ~ ɑ/ represents a contrast which is being merged, i.e., 'lost' in HF while maintained in QF; (2) /ε ~ aε/ represents a contrast which is not present in HF and is therefore a phonetic creation in QF. The difference in social evaluation of these contrasts in QF also made them interesting variables to compare. We were curious to explore whether these differences would be noticeable in our results as, essentially, the mobile participants would have potentially encountered /ɑ/ in France, but /aε/ would be an entirely new phoneme to them in their D2.

Research Question and Hypotheses
As discussed in more detail below, three groups of listeners, i.e., (1) non-mobile HF speakers born and living in France (HF group), (2) non-mobile QF speakers born and living in Quebec (QF group), and (3) mobile HF speakers having moved from France to Quebec (HF>QF group), were recruited to investigate the following research question:
• What effect does extended exposure to QF as a D2 have on HF>QF listeners' ability to discriminate between these contrasts?
We predicted an effect of mobility on HF>QF listeners' ability to discriminate between /a ~ ɑ/ and /ε ~ aε/, such that the HF>QF participants would prove better at accurately discriminating between acoustically similar items containing these contrasts as compared to the HF participants, given the HF>QF group's increased exposure to these contrasts in QF as an ambient D2. If supported, this finding would suggest that perceptual adaptation had taken place in the mobile speakers' native linguistic systems. It was further predicted that the HF>QF listeners would perform at an intermediary level between the HF and QF listeners, with the QF listeners holding an advantage and thus showing more accurate discrimination of these contrasts overall. The methods for testing this hypothesis are discussed in more detail in the following section.

Stimuli Design
Stimuli were recorded by a native female speaker of QF, who was 22 years old and had spent her entire life in Quebec (in the Bois-Francs region) at the time of recording. Recordings were made in a quiet room at Université de Sherbrooke using an Olympus LS-7 Linear PCM Recorder, in mono, with 32-bit resolution and a sampling rate of 44.1 kHz. The speaker was asked to read two word lists two times each in her most regular and natural accent. The lists comprised words with the target vowel contrasts (e.g., tache-tâche, mettre-maître) as well as words with other phonemic contrasts across HF and QF (/u/ vs. /y/, e.g., bout-but; /ɛ̃/ vs. /œ̃/, e.g., brin-brun).
Eight minimal pairs were selected for this task. All sixteen words show medium to relatively low lexical frequency; specifically, they appeared fewer than 300 times but more than 5 times per million words in the Lexique 2 corpus of written French (New et al. 2004). Note, however, that two lower-frequency words were also used as fillers (see Appendix A). Where possible, attempts were made to match the grammatical category of the words in each pair; unfortunately, this was not always possible due to the small number of possible QF minimal pairs containing the contrasts examined (e.g., laide 'ugly' ~ l'aide 'the help', mettre 'to put' ~ maître 'master').
As they are most relevant to the research question at hand, only the four target word pairs (i.e., those containing /a ~ ɑ/ and /ε ~ aε/; Table 1) will be discussed in the remainder of this paper.
In the present study, resynthesized vowel continua were used to test whether individuals who had moved from France to Quebec, where they had received extended exposure to QF as a D2, perceptually differentiate and categorize the QF contrasts /a ~ ɑ/ and /ε ~ aε/. In Praat (Boersma and Weenink 2021), gradient vowel continua for each of the eight minimal pairs in Table 1 were created using a script written by Lawrence (2018) and previously used in similar work (e.g., Alderton 2020; Barnard 2021). This script uses linear predictive coding (LPC) to estimate the spectral envelope of a speech sound. An iterative inverse-filtering technique (based on Alku et al. 1999) is then used to render a voicing source representation, allowing users to create an acoustic gradient continuum between two specified, natural speech tokens. Once the resynthesized segments are modified and their pitch and amplitude contours are matched to those of the vowels in the natural tokens, they are embedded back into the original consonantal context, retaining the original flanking environment of one of the natural tokens. This approach was chosen because it (1) embeds high-end components of the speech signal back into the modified segment to produce less synthetic-sounding stimuli and (2) allows users to manipulate both spectral quality and segment duration in the resynthesis process. The latter was deemed particularly important for this study given the role that duration plays in the production of these two vowels in QF (Riverin-Coutlée and Roy 2020).
Vowel trajectory and duration were not manipulated independently from one another in the resynthesis process for two reasons. First, vowel duration has been shown to be closely intertwined with vowel diphthongization in QF, such that vowel lengthening is a necessary condition for diphthongization to occur in QF (Riverin-Coutlée and Roy 2020). It was therefore reasoned that separating these two cues would not reflect the phonological reality of this variable in spoken QF. Second, as previously stated, recent acoustic studies have shown the monophthongal realization of words such as maître 'master' as [mεːtʁ] to be largely marginal (Leblanc 2012) or even absent (Riverin-Coutlée and Roy 2020) in modern-day QF.
This process yielded eight unique nine-step vowel continua. Four of these continua contained the target contrasts /a ~ ɑ/ and /ε ~ aε/ and the other four contained other contrasts used as distractors. Midpoint F1/F2 formant values for the four resynthesized target continua are plotted in the right panel of Figure 1 (for more detailed formant measurements for each of these steps, see Appendix B). To ensure that these resynthesized tokens could be treated as representative of QF, their spectral properties were compared to those reported by Riverin-Coutlée and Roy (2020), who conducted an acoustic analysis of 37 female QF speakers of similar age to the speaker in this study (ranging from 18 to 23 years). As shown in Figure 1, the resynthesized tokens fall within a realistic spectral range when compared to the naturalistic vowel spaces of native, female QF speakers. The endpoints of each continuum (i.e., steps 1 and 9) approximately correspond in acoustic space to the formant values of the natural (i.e., unsynthesized) stimuli recorded by the speaker, while still being synthesized along with the other continua steps. Comparing the two target contrasts, one finds that the two minimal pairs containing the /ε ~ aε/ contrast overlap to a great degree in their respective vowel spaces, while there is no spectral overlap of the continuum steps for the pairs involving the /a ~ ɑ/ contrast (with the patte-pâte continuum being consistently higher in the vowel space than the tache-tâche continuum). Figures 2 and 3 show the formant trajectories of every other step in the tache-tâche and mettre-maître continua, respectively (more detailed measures for the resynthesized steps of each continuum are included in Appendix B).
Figure 2. Waveform, spectrogram, and F1-F2 trajectories for steps 1, 3, 5, 7, and 9 of the resynthesized continuum between /a/ and /ɑ/ in the QF minimal pair tache [taʃ] 'stain' ~ tâche [tɑːʃ] 'task'. The lowering of F2 across the continuum corresponds to an increasingly backed realization of /a/ over the nine steps.
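The idea of a nine-step continuum between two endpoint vowels can be illustrated with a deliberately simplified sketch. The stimuli in this study were created through LPC-based resynthesis in Praat, which the code below does not attempt to reproduce; it only shows how intermediate (F1, F2) targets could be spaced between two endpoints. The equal spacing in Hz and the endpoint formant values are assumptions made for illustration, not measurements from the stimuli.

```python
def formant_continuum(start, end, n_steps=9):
    """Return n_steps (F1, F2) targets equally spaced from start to end.

    Equal spacing in Hz is an assumption of this sketch, not a
    property reported for the resynthesis script.
    """
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    steps = []
    for k in range(n_steps):
        t = k / (n_steps - 1)  # 0.0 at step 1, 1.0 at the final step
        steps.append(tuple(round(s + t * (e - s), 1) for s, e in zip(start, end)))
    return steps


# Hypothetical midpoint formants in Hz, given as (F1, F2).
front_a = (850.0, 1700.0)  # /a/-like endpoint
back_a = (750.0, 1200.0)   # /ɑ/-like endpoint

for number, (f1, f2) in enumerate(formant_continuum(front_a, back_a), start=1):
    print(f"step {number}: F1 = {f1} Hz, F2 = {f2} Hz")
```

In this toy version, F2 falls steadily from the first step to the last, mirroring the increasingly backed realization described for the tache-tâche continuum.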

Naturalness Rating Experiment
An online naturalness rating experiment was conducted to test the perceived naturalness of each of the nine steps of each of the resynthesized continua. This was conducted to ensure that, as far as possible, the stimuli were assessed on the basis of the extent of the phonemic contrast rather than on the basis of sounding 'natural' or 'unnatural'. Ten native French-speaking participants from France, otherwise unrelated to this study, were recruited from the online participant recruitment platform Prolific (www.prolific.co, accessed on 20 October 2022). Participants rated 72 tokens (i.e., all tokens) on a scale from 1 to 10, with 1 indicating that the token sounded tout à fait naturel 'completely natural' and 10 indicating that the token sounded clairement synthétisé 'clearly synthesized'. Each stimulus was presented once in a randomized order and participants had no time limit to give their response. Before the main experiment, participants were played a recording of the word joue ('play') that would merit a rating of 1 (i.e., a natural, unmodified recording of a word) and stimuli that would merit a rating of 10 (i.e., a token manipulated beyond clear lexical recognition). On average, it took 4 min and 19 s (min = 3:21, max = 7:22) to complete the experiment. Average ratings are shown in Table 2; as can be seen, the target continua received numerically similar naturalness ratings, with the lowest rating (indicating a higher level of perceived naturalness) being 4.96 for tache-tâche and the highest rating (indicating a lower level of perceived naturalness) being 6.03 for laide-l'aide.

Table 2 also shows that resynthesized tokens towards the end of each continuum were perceived as less natural on average than the earlier steps in the continuum; this is particularly true for the /ε ~ aε/ continua (e.g., laide-l'aide and mettre-maître) and is most likely a reflection of the relative complexity of the spectral properties of these tokens (i.e., the fact that there is more diphthongization over the course of the vowel) compared to the /a ~ ɑ/ continua.
In sum, while the resynthesis process used to create the stimuli in this study did result in tokens that were perceived overall as more synthetic- than natural-sounding, this was, for the purposes of this study, deemed a necessary trade-off for being able to closely control the spectral progression of the stimuli tested. We return to this trade-off in the discussion section.
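The structure of the rating task (eight continua of nine steps each, giving 72 tokens presented once in randomized order, then averaged per continuum) can be sketched as follows. This is an illustrative reconstruction rather than the code used to run the experiment: the distractor labels are placeholders, the function names are hypothetical, and the ratings in the usage example are invented.

```python
import random
from collections import defaultdict
from statistics import mean

TARGETS = ["tache-tâche", "patte-pâte", "mettre-maître", "laide-l'aide"]
DISTRACTORS = [f"distractor_{i}" for i in range(1, 5)]  # placeholder labels
N_STEPS = 9


def build_trials(seed=None):
    """Return every (continuum, step) token once, in shuffled order."""
    trials = [(c, s) for c in TARGETS + DISTRACTORS for s in range(1, N_STEPS + 1)]
    random.Random(seed).shuffle(trials)
    return trials


def mean_rating_by_continuum(responses):
    """Average 1-10 ratings per continuum.

    `responses` is an iterable of (continuum, step, rating) tuples;
    lower means correspond to higher perceived naturalness.
    """
    ratings = defaultdict(list)
    for continuum, _step, rating in responses:
        if not 1 <= rating <= 10:
            raise ValueError(f"rating out of range: {rating}")
        ratings[continuum].append(rating)
    return {continuum: mean(values) for continuum, values in ratings.items()}


# Invented ratings, for illustration only.
example = [("tache-tâche", 1, 4), ("tache-tâche", 2, 6), ("laide-l'aide", 1, 7)]
print(len(build_trials(seed=1)))
print(mean_rating_by_continuum(example))
```

Seeding the shuffle is a convenience for reproducing a given presentation order; whether the experiment platform exposed such a seed is not stated in the text.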

Participants
Thirty-five individuals participated in this study. Participants were recruited into one of three groups depending on their nationality and mobility status:
1. HF group: Non-mobile HF speakers born and living in France (n = 13).
2. HF>QF group: Mobile HF speakers born in France but living in Quebec after moving there in adulthood (n = 12).
3. QF group: Non-mobile QF speakers born, raised, and living in Quebec (n = 10).
More information about participants is given in Table 3.

Table 3. Group data for participant groups.
Mean Length of Residence in Quebec
HF group (n = 13) 5:8 42.8 (8) n/a n/a
QF group (n = 10) 4:6 34.4 (10) n/a n/a
HF>QF group (n = 12) 9:3 45.7 (7.5) 22 7

Participants in the HF and QF groups were recruited through Prolific (www.prolific.co, accessed 28 February 2022). Participants in the HF group were between 30 and 58 years of age (mean = 42.8 years; SD = 8 years), whereas participants in the QF group were between 23 and 54 years of age (mean = 34.4 years; SD = 10 years). Participants in the HF and QF groups reported that they spoke French as a native language, had not lived outside their native region (France or Quebec, respectively) for more than one year, and lived in their native region at the time of participation.
The HF>QF participants were recruited through Facebook groups for French expatriates living in Quebec (e.g., Les Français de la ville de Québec). They were between 29 and 67 years of age (mean = 45.7 years, SD = 7.5 years) when they moved to Quebec and had lived there between 3 and 42 years (mean = 13.6 years, SD = 9.9 years) by the time of testing. All HF>QF participants reported being born in France, having lived there until they were eighteen, and speaking French as a native language. Table 4 reports detailed information about each of the HF>QF participants.

Procedure
The study was designed and administered online through Gorilla Experiment Builder (www.gorilla.sc, accessed 23 February 2022; Anwyl-Irvine et al. 2020). Participants first verified that they fulfilled the specific participation criteria for their group (as detailed in Section 2.2). They were then asked to read a study description and sign a consent form before progressing to the main portion of the study. Although both production and perception data were collected from participants, only one of the perceptual tasks (a same-different discrimination task) is discussed in this paper. This task served to examine participants' ability to differentiate between the target vowel contrasts in perception. The whole study took around 30 min to complete, with the discrimination task taking 4-7 min. Participants were asked to wear headphones and complete the study in a quiet space. In all cases, the production data were elicited after participants completed the perception tasks to avoid perceptual priming effects (cf. de Leeuw et al. 2021). At the end of the experimental portion of the study, participants filled in a questionnaire about their language background, language attitudes towards QF, and, in the case of the HF>QF participants, their perceived dialect change since moving to Quebec. Participants were paid either CAD 15 or EUR 10.
For the same-different discrimination task, participants heard two successive words (e.g., patte-pâte) and were asked to report whether the pronunciations of the two words were the same (même) or different (différent) (Figure 4). To investigate the extent to which participants could discriminate varying degrees of acoustic difference between two words, different steps on each continuum were paired in eleven configurations (see, e.g., Amengual and Chamorro 2015; Pallier et al. 1997). For the target trials, that is, those comprising 'different' stimuli, the continua steps were paired in four different configurations with progressively increasing levels of perceptual similarity:

• Steps 1 and 9
• Steps 2 and 8
• Steps 3 and 7
• Steps 4 and 6
Investigating the extent to which participants can perceive small spectral differences in stimuli involving Quebec-specific vowel contrasts allowed us to test the hypothesis that the HF>QF listeners would perform at an intermediary level between the HF and QF groups (suggesting that restructuring has occurred in their native linguistic systems).
Participants were instructed to press the F key on their keyboard with their left finger if they thought that the pronunciations of the two words were identical and to press the J key with their right finger if they thought that the pronunciations of the two words were different. The task was designed so that participants could only respond after hearing the two words in full. Participants were asked to respond as quickly and as accurately as possible after the end of the second word in the pair. To encourage spontaneous responses, a timeout of 2000 ms from the end of the second stimulus was set before the task advanced automatically to the next trial (cf. de Leeuw et al. 2021).
An inter-stimulus interval of 500 ms of silence separated the first word from the second in each trial. Fifteen additional 'same' trials consisting of tokens from the non-target contrast continua (i.e., /u~y/, /o~ɔ/, /e~ε/) were also included to even out the numbers of 'same' vs. 'different' trials. In total, participants listened to 60 trials, with 32 'different' pairings and 28 'same' pairings. For each participant, stimuli were presented in a different pseudo-randomized order, such that trials containing tokens along the same lexical continuum (e.g., pairings 1-9 and 3-7 of the patte-pâte continuum) did not occur in succession. Participants were given the option to take two breaks at evenly spaced intervals across the task. As it has been shown that listeners' success in discriminating vowel sounds can depend partly on the order in which the stimuli are presented (Francis and Ciocca 2003), each 'different' pair in this task was played twice with the presentation order of the stimuli switched each time.
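The 'different' portion of this design can be sketched in a few lines of Python. This is only an illustration, not the Gorilla implementation used in the study: the function names, the tuple layout, and the pick-and-restart ordering strategy are assumptions made here, while the continuum labels follow the target word pairs named in this paper.

```python
import random

# Target continua and 'different' step pairings as described in the text.
CONTINUA = ["patte-pâte", "tache-tâche", "laide-l'aide", "mettre-maître"]
CONFIGS = [(1, 9), (2, 8), (3, 7), (4, 6)]


def build_different_trials():
    """Return every 'different' trial as (continuum, first_step, second_step).

    Each pairing appears twice, once in each presentation order.
    """
    trials = []
    for continuum in CONTINUA:
        for low, high in CONFIGS:
            trials.append((continuum, low, high))
            trials.append((continuum, high, low))
    return trials


def pseudo_randomize(trials, rng, max_restarts=1000):
    """Order trials so that no two neighbours share a continuum.

    Hypothetical strategy: pick the next trial at random among those whose
    continuum differs from the previous one, and restart on a dead end.
    """
    for _ in range(max_restarts):
        pool = list(trials)
        order = []
        while pool:
            allowed = [i for i, t in enumerate(pool)
                       if not order or t[0] != order[-1][0]]
            if not allowed:
                break  # dead end: only same-continuum trials remain
            order.append(pool.pop(rng.choice(allowed)))
        if not pool:
            return order
    raise RuntimeError("no valid order found")


different_trials = build_different_trials()
ordered = pseudo_randomize(different_trials, random.Random(0))
print(len(ordered))  # 4 continua x 4 configurations x 2 orders = 32
```

The 28 'same' trials are left out of the sketch because their exact composition, beyond the fifteen filler trials, is not specified here.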
To ensure participants understood the instructions, they were asked to complete four initial practice trials, after which they were given feedback on the accuracy of their responses. Practice trials comprised two 'same' (1-1 pairings) and two 'different' trials (1-9 pairings) of the bouche-bûche continuum, which was not used elsewhere in this task.

Statistical Analysis
A total of 2100 responses were collected from this task. Fifteen trials were lost because participants timed out in their response, and responses from one QF participant (n = 60) were excluded from the analyses as they performed below chance (d' score = −2.16). The full dataset (Dataset S1: datasetLanguages.csv) can be accessed at https://drive.google.com/drive/folders/1fLlyHjgYJXcMdq8t2HcwaJM-PMPyu6kP?usp=sharing (accessed 5 July 2023). Table 5 reports mean accuracy for 'same' and 'different' trials and relative d' scores for each group of participants. In general, participants showed similar overall sensitivity, as confirmed by the numerically similar d' scores across groups. With regard to the 'same' trials, participants performed similarly across groups, whereas the same cannot be said for the 'different' trials. As a consequence, and in line with previous similar work (e.g., Amengual and Chamorro 2015), the analysis reported below was only on the 'different' trials (n = 1077).
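For readers unfamiliar with d', the following Python sketch shows one common way to compute it. It assumes the standard yes/no formulation, d' = z(hit rate) − z(false-alarm rate), with a log-linear correction for extreme rates; the formula actually used in this study is not reproduced here and may differ, since same-different designs are sometimes analysed with other models. The function name and the example counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index under an assumed yes/no model.

    A 'hit' is a 'different' response to a 'different' trial; a 'false
    alarm' is a 'different' response to a 'same' trial. The log-linear
    correction (add 0.5 to each count and 1 to each total) avoids
    infinite z-scores when a rate would otherwise be 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical listener: 24 of 32 'different' trials detected,
# 4 of 28 'same' trials wrongly judged 'different'.
print(round(d_prime(24, 8, 4, 24), 2))

# A listener who answers 'different' more often on 'same' trials than on
# 'different' trials receives a negative score, i.e. below chance.
print(d_prime(8, 24, 20, 8) < 0)  # True
```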
The statistical analysis was carried out in R (R Core Team 2023; version 4.3.0) using the packages lme4 (Bates et al. 2015) and lmerTest (Kuznetsova et al. 2017). All plots were created using ggplot2 (Wickham et al. 2016). We fitted a series of generalized linear mixed effects models (GLMERs) for correct responses, one for each configuration separately (i.e., steps 1-9, 2-8, 3-7, and 4-6), to ensure that the potential emergence of significant findings was not an artefact of mere acoustic differences between the two paired stimuli, which could have been, in principle, detected by any French-speaking listener. Any potential effect of the different configurations was then extrapolated from the analysis by assessing whether factors behaved (dis)similarly across steps (see the comparative sociolinguistics methods, Tagliamonte (2013), for a similar approach). In all models, the dependent variable was the proportion of correctly identified 'different' trials. Group (HF, QF, HF>QF), Contrast (/ε~aε/ vs. /a~A/), and their interaction were entered as fixed effects, with Participant included as a random intercept (model syntax = DV~group * contrast + (1|participant)). Models including random intercepts per word as well as by-participant and by-word random slopes were initially tested but failed to converge. The significance of each model's coefficients was estimated using the Satterthwaite method from lmerTest (Kuznetsova et al. 2017) and the significance level was set at p < 0.05. Following previous work (e.g., Winter and Grawunder 2012; Passoni et al. 2022), if results revealed significant interactions between predictors, any potential main effect was not expanded upon, as main effects are uninterpretable in the presence of significant interactions. Residual plots were visually inspected to detect any obvious deviation from normality and homoscedasticity.
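Written out, the model syntax above corresponds to a logistic mixed model of roughly the following form. This is a sketch under assumptions not stated explicitly in the text: a binomial family with a logit link (consistent with the z-values and odds ratios reported in the results) and dummy coding of the predictors. The symbols are introduced here for illustration only.

```latex
\operatorname{logit} P(y_{ij} = 1)
  = \beta_0
  + \boldsymbol{\beta}_{G}^{\top}\,\mathbf{g}_{j}
  + \beta_{C}\, c_{i}
  + \boldsymbol{\beta}_{GC}^{\top}\,(\mathbf{g}_{j}\, c_{i})
  + u_{j},
\qquad u_{j} \sim \mathcal{N}(0, \sigma_{u}^{2})
```

Here $y_{ij} = 1$ if participant $j$ responded 'different' on target trial $i$, $\mathbf{g}_{j}$ is a vector of two indicator variables coding the participant's group relative to a baseline group, $c_{i}$ is an indicator for the contrast tested on that trial, and $u_{j}$ is the by-participant random intercept.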
Post hoc analyses were run using the emmeans package (Lenth 2023) with levels of significance Bonferroni-adjusted for pairwise comparisons.
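As a reminder of what a Bonferroni adjustment does, here is a minimal Python sketch. It is not the emmeans implementation; the function name and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Three hypothetical pairwise comparisons among three groups.
raw = [0.004, 0.020, 0.400]
adjusted = bonferroni(raw)
print([p < 0.05 for p in adjusted])  # [True, False, False]
```

Equivalently, the p-values can be left untouched and compared against 0.05 / m, which gives a threshold of 0.025 when two tests are run.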

Results
Figures 5 and 6 below report percentages of correct answers (i.e., when the participant chose 'different') divided by contrasts and configuration (see Appendix C for percentages of correct answers divided by contrasts and configuration for each of the HF>QF participants).
To answer our research question, we ran a series of generalized linear mixed effects models (GLMERs) for correct responses, one for each configuration separately (see Section 3.4).

Figure 5. Bar plot showing percentage of correctly perceived 'different' discriminations of /a~A/ step configurations. The labels on each of the bars show the percentage of correct discrimination for each participant group for each 'different' step configuration (i.e., 1-9, 2-8, 3-7, 4-6).

Figure 6. Bar plot showing percentage of correctly perceived 'different' discriminations of /ε~aε/ step configurations (n = 538). The labels on each of the bars show the percentage of correct identification for each participant group for each 'different' step configuration (i.e., 1-9, 2-8, 3-7, 4-6).

Configuration 1-9
Table 6 below displays the model summary for the configuration 1-9, i.e., the configuration which contained the stimuli which were most different from one another. The main effect of contrast (mid, /ε~aε/ vs. low, /a~A/) suggested that, overall, participants were more likely to respond correctly (i.e., choose 'different') when listening to the low vowel contrast than the mid vowel contrast (p = 0.034). However, as evident in Figures 5 and 6, there were no significant group differences as, unsurprisingly, all groups performed well on this configuration and approached ceiling for /a~A/ (low contrast). For the mid contrast /ε~aε/, the QF group approached ceiling with HF>QF performing least accurately and HF performing between QF and HF>QF; although, again, these group differences were not significant.

Configuration 2-8
Due to multicollinearity issues, the original model was split by contrast and the significance threshold was therefore Bonferroni-adjusted to 0.025. In addition, because the two new models both revealed singularity issues, for this configuration, we ran two glm models with Group as a fixed effect (syntax of final model 2-8 = glm(DV~group)). As can be noted in Tables 7 and 8, neither model detected an effect of group on the dependent variable. Furthermore, as can be observed in Figures 5 and 6, and similarly to configuration 1-9, all groups performed closer to ceiling on the low contrast /a~A/, whereas, for the mid contrast /ε~aε/, it was the QF group that performed most accurately; although, again, these descriptive differences were not significant.

Configuration 3-7
As displayed in Table 9 for configuration 3-7, the main effect of group (HF vs. HF>QF vs. QF) revealed that, for both contrasts, QF participants were significantly more likely to respond correctly (i.e., choose 'different') than HF participants (β = 1.83, z-value = 2.17, p = 0.03) when hearing these paired trials, indicating, very generally, that as the stimuli resembled each other more, the QF participants displayed an own-dialect advantage over the other groups. As observed in Figures 5 and 6, HF>QF participants performed least accurately on this configuration: on the mid contrast /ε~aε/, they performed below chance, and on the low contrast /a~A/, they performed just slightly above chance. To further explore the significant effect of group on the responses for this configuration, we reran the model with QF as baseline. As can be noted in Table 10, the model output revealed that QF participants were also significantly more accurate (i.e., chose 'different') than HF>QF participants (β = 2.75, z-value = 2.99, p = 0.002). Notably, as observed in Figures 5 and 6, the HF>QF participants performed less accurately than the HF participants, although this difference was not significant.
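The coefficients reported above are on the log-odds scale. Assuming a logit link (an assumption here, as the link function is not stated explicitly), they can be converted to odds ratios by exponentiation. The helper name below is hypothetical.

```python
import math


def odds_ratio(beta):
    """Convert a log-odds coefficient to an odds ratio."""
    return math.exp(beta)


# Coefficients reported for configuration 3-7.
for label, beta in [("QF vs. HF", 1.83), ("QF vs. HF>QF", 2.75)]:
    print(f"{label}: OR = {odds_ratio(beta):.1f}")
# QF vs. HF: OR = 6.2
# QF vs. HF>QF: OR = 15.6
```

On this reading, the odds of a correct 'different' response at this step configuration were roughly six times higher for QF listeners than for HF listeners.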

Configuration 4-6
Table 11 below shows the model summary for the 4-6 configuration, which contained the stimuli that were most similar to one another. The significant interaction between group and contrast was further explored with post hoc analyses (Table 12). As evident in Figure 5, there were no significant group differences for /a~A/ (low contrast), as all groups performed below chance on this configuration. For /ε~aε/ (mid contrast), by contrast, QF participants were significantly more likely to respond correctly (i.e., choose 'different') than both the HF>QF participants and the HF participants (OR = 5.10, z-value = 2.91, p = 0.0416 and OR = 9.603, z-value = 5.7, p = 0.001, respectively) (Figure 6). In addition, as can be noted in Figure 6, HF>QF participants performed least accurately on this configuration. Finally, the model results also showed that QF participants performed better on the /ε~aε/ than on the /a~A/ contrast (OR = 8.73, z-value = 3.8, p = 0.001).

Summary of Results
With regard to our research question, i.e., what effect does extended exposure to QF as a D2 have on HF>QF listeners' ability to discriminate between the /a~A/ and /ε~aε/ contrasts in QF, inferential analyses revealed no statistically significant differences in performance between the HF>QF participants and the HF participants for either contrast in any of the configurations tested. If anything, what we found was that the HF>QF participants performed the least accurately of all three groups, i.e., that D2 acquisition of Quebec French seemed to impair their perception on this task rather than improve it. With regard to configurations 1-9 and 2-8, HF and HF>QF participants performed similarly to QF participants; however, not surprisingly, QF participants performed significantly better than both HF and HF>QF participants for configuration 3-7. Interestingly, with regard to configuration 4-6, QF participants performed better than both HF and HF>QF participants on the /ε~aε/ mid contrast but not on the /a~A/ low contrast.

Discussion
The main aim of this study was to investigate the effects of extended exposure to QF as a D2 on the perceptual discrimination of the vowel contrasts /a~A/ and /ε~aε/ in a group of native HF speakers who had moved to Quebec. A same-different discrimination task was conducted to determine whether increased exposure to non-native phonemic contrasts in a D2 can improve listeners' ability to perceive these as separate phonemes, as previously shown by, e.g., Bowie (2000). We hypothesized that the HF>QF group, having received extended exposure to QF, would be better at discriminating between subtly different pairs of words distinguished by the two target contrasts, as compared to the HF group, who had little first-hand experience with QF. If supported, this result would indicate that perceptual change had taken place among the mobile listeners after relocating to Quebec.
However, the results did not provide support for this hypothesis, as there was no evidence of the listeners in the HF>QF group showing a perceptual advantage over the HF group. Accordingly, it appeared that acquiring a new dialect did not enhance the capacity to perceive phonemic contrasts in that new dialect. Indeed, listeners in the HF>QF group proved overall to be, if anything, less accurate in their discrimination of differently paired /a~A/ and /ε~aε/ items as compared to HF listeners. This pattern was revealed through descriptive differences between these two groups, which were particularly evident for the 3-7 step configurations. Considered at face value, these results would seem to suggest that more exposure to a D2 can lead to less perceptual sensitivity to phonemic contrasts in the D2 and, therefore, that less exposure to D2 contrasts might in fact yield more perceptual sensitivity in discriminating these non-native phonemic contrasts.
It is interesting to speculate as to why a descriptively observable difference was found between the HF and HF>QF groups (with the latter performing less accurately), as this finding would appear to go against past studies showing that more exposure to D2 speech leads to improved abilities to perceive speech in this dialect (e.g., Scott and Cutler 1984; Bowie 2000; Walker 2018). One explanation for our surprising result could be that the QF variants /A/ and /aε/ (i.e., phonemes largely absent from HF) have over time become less acoustically marked for the HF>QF participants and have potentially been integrated into the D1 phonemic categories /a/ and /ε/ as their exposure to D2 input has increased. This could also be related to the amount of attention that listeners in the HF>QF and HF groups pay to these contrasts. As Trudgill (1986, p. 11) argues, '[s]peakers are. . .more aware of variables whose variants are phonetically radically different' from corresponding variants in their native dialect. Thus, one could imagine that the HF>QF participants, who, compared to the HF participants, have had more acoustic input from QF as a D2, might be less aware of the variant phonemes /A/ and /aε/ because of this D2 input. Therefore, the HF>QF listeners appear to be less able to discriminate between these phoneme categories and their native /ε/ and /a/ categories when the differences between contrasts /a~A/ and /ε~aε/ are acoustically subtle (see also Auer et al. 1998).
However, given the fact that the inferential statistics failed to show an effect of mobility on group performance, i.e., no statistically significant difference was found between the HF and HF>QF groups in terms of their discrimination of any step configuration, at the very least, what we can say is that extended exposure to Quebec French did not appear to enhance the perceptual capacity to perceive phonemic contrasts in Quebec French. A lack of significance between HF and HF>QF listeners may well have been due to individual variation, particularly in the HF>QF group, as the amount of QF exposure among listeners in this group differed dramatically (ranging from 3 to 42 years). Due to space constraints, in the present study, we did not investigate intragroup variation among HF>QF participants (although, see Appendix C for information about individual participant performance). Future research should consider, e.g., correlational analyses employing nuanced measures of mobile participants' exposure to QF (e.g., personal social networks and media exposure in D2 country, see Voeten 2021; Ziliak 2012).
A second finding from this study was that participants in the QF group showed some evidence of an own-dialect advantage over the other two participant groups in their perception of the target contrasts, supporting results of past cross-dialectal perceptual studies (e.g., Adank et al. 2009; Clopper and Bradlow 2008; Dufour et al. 2007; Floccia et al. 2006; Impe et al. 2008). Namely, compared to the other two groups, QF listeners showed higher accuracy in discriminating the word pairs distinguished by the smallest acoustic differences (i.e., configurations 3-7 and 4-6). This advantage was particularly evident for QF listeners' discrimination of the /ε~aε/ contrast, as participants in this group showed high accuracy in discriminating between word pairs testing this contrast even at the most acoustically similar configurations of 3-7 and 4-6. One reason why QF listeners might have shown higher overall accuracy in discriminating between /ε~aε/ vs. /a~A/ may be that the mid contrast shows more dynamic spectral movement across the length of the vowel (see Figure 3) as compared to the low contrast. Furthermore, the fact that the QF participants proved more similar to the other two groups in discriminating /a~A/ vs. /ε~aε/ may be attributable to the fact that the former contrast is a 'phonetic creation' of QF (Reinke and Ostiguy 2016, p. 55), whereas the latter contrast has historically been present in HF. The different perceptual patterning found for each contrast could also be due to the different social evaluation of /a~A/ vs. /ε~aε/ (with the low contrast being more socially neutral than the mid contrast in QF). This finding may also have to do with the fact that /a~A/ is still undergoing merging in HF (Berns 2019), and thus is likely more familiar than the /ε~aε/ contrast to HF speakers. For instance, Chalier (2021) showed that the opposition in /a~A/ is still perceived among HF speakers, even if they do not make this contrast in production.
Future research on this topic could address some of the limitations of the current study. For instance, given that the conclusions of this study are based on relatively small sample sizes, replication studies with more participants could help determine whether the patterns identified in this study generalize to larger populations. Moreover, the potential effect of the relative unnaturalness of the experimental stimuli on the results should be explored in future work. In particular, given that the mid-vowel stimuli (i.e., laide-l'aide, mettre-maître) were rated as less natural-sounding than the low-vowel stimuli (i.e., patte-pâte, tache-tâche), future experiments would do well to normalize this aspect of the stimuli, to be able to mitigate any potential effects of perceived naturalness on perceptual discrimination. Notwithstanding these limitations, this study corroborates past research showing dialectal background to influence perceptual discrimination of vowel categories in French (e.g., Riverin-Coutlée and Arnaud 2015), thus advancing an understanding of the effects of native language background on speech perception more generally.

Conclusions
This study contributes to a growing body of research exploring the role that perceptual adaptation plays in SDA (see, e.g., Bowie 2000; Iverson 2004, 2007; Voeten 2021; Walker 2018; Ziliak 2012). Indirectly, it also contributes to studies in L1 perceptual attrition as it explores the malleability of the native language system upon new system acquisition (either the D2 or the L2). Surprisingly (see, e.g., de Leeuw et al. 2023, who show a perceptual advantage for bilingual returnees), we found some numerical evidence that HF participants with extended exposure to QF as a D2 showed less accuracy in discriminating between phonemic QF contrasts /a~A/ and /ε~aε/ as compared to non-mobile HF speakers with little first-hand exposure to QF. Our finding that the HF>QF group was at least descriptively less accurate in their perception of non-native QF contrasts compared to the non-mobile HF group motivates further examination of both bilingual and bidialectal populations. Future studies on perceptual adaptation in SDA and SLA could help contribute to a unified theory of which perceptual capacities and variables are most likely to be influenced following D2/L2 contact.
Author Contributions: The research idea was conceptualized by S.K. with feedback from E.d.L. The experiment was designed and programmed by S.K. with feedback from E.d.L. Data were collected online by S.K. The statistics in R were conducted by E.P. with feedback from S.K. and E.d.L. The manuscript was mainly written by S.K. with E.P. writing the statistics/results sections and E.d.L. and E.P. giving feedback on all sections. The research was funded by a PhD scholarship to S.K. with E.d.L. as the primary supervisor. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the LISS DTP Research Training and Support Grant, made possible by an ESRC-funded PhD scholarship to S.K. with E.d.L. and Professor Leigh Oakes as supervisors.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (or Ethics Committee) of Queen Mary University of London (QMERC20.559, 20th January 2022).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.