1. Introduction
Pitch is a feature employed at different levels of a phonetic token to fulfill various grammatical functions. In tonal languages, pitch indicates tone, thereby distinguishing lexical meanings. At the word level, pitch can also signal stress, as in many stress-timed languages, and at the phrasal and sentence levels it creates intonation patterns. It is therefore unsurprising that Mandarin Chinese, as a tonal language, is often reported as challenging for speakers of non-tonal languages to perceive (cf. [1], for French; [2,3], for English) and to produce (cf. [4,5], for English; [6], for Japanese, English, and Korean; [7,8], for Polish).
Several cross-linguistic studies have documented tone production errors among speakers of non-tonal languages. Some studies suggest that T1 and T4 are easier to acquire (cf. [9,10,11]), while others indicate that T2 and T3 are easier [5]. Most research focuses on monosyllabic words, showing that tone errors can be analyzed in terms of tonal register or tonal contour. When examining interlanguage tonal distribution, stress-timed languages appear to have a limited impact on learners’ ability to discriminate and produce tones [12,13,14]. However, studies focusing on disyllabic words, particularly among Polish learners of Mandarin, are rare.
Regarding the difficulties encountered by non-native Mandarin speakers in learning the tones of disyllabic Mandarin words, the most extensive research has been conducted on Japanese learners. For instance, ref. [15] found that Japanese students learning Mandarin frequently made errors when the first syllable was T2 or T3. Most tonal errors occurred in disyllabic words with the tone combination T2T1, which was often incorrectly pronounced as T2T3. The researchers attributed this to transfer from Japanese, where disyllabic compound words typically allow only one peak, meaning that the first syllable must contrast in stress with the second syllable. Similarly, ref. [16] examined 25 Japanese learners of Mandarin who were asked to pronounce 80 disyllabic Mandarin words with all possible tone combinations. Most errors involved T2 and T3, with the majority occurring in the first syllable when it was T2 or T3, especially when followed by high-onset tones such as T1 and T4. Errors in the second syllable predominantly involved T3. A similar error pattern was found in [17], which involved 25 Korean learners of Mandarin. The difficulty ranking for the first syllable was T2 = T3 > T1 = T4, with more errors when T2 or T3 was followed by the high-onset tones T1 or T4.
Ref. [18] investigated the difficulties of beginner Japanese students in perceiving and producing disyllabic words. They found that initial consonants were more challenging than vowels in perception, but there was no similar result in production. Ref. [19] examined nine German participants in Mandarin Chinese perception and production tasks, finding that they performed worst in perceiving rising tones. For the first syllable, T2 was the most difficult to produce, while T4 was the most challenging in the second syllable. Overall, production performance on disyllabic words did not mirror that on monosyllabic words, with participants appearing to use their preferred tone combinations to handle disyllabic words. Taken together, the production of Mandarin disyllabic words among L2 learners of Mandarin Chinese who speak Japanese, Korean, or German is gaining increasing attention. This trend likely results from the prevalence of disyllabic communication patterns in modern Mandarin [20]. Therefore, any description of how L1-dominant Polish speakers learning Mandarin (L2) produce tones must include their performance on disyllabic words.
Previous research on high-variability training has yielded mixed results. Some studies have demonstrated significant advantages [21,22,23,24], while others have found no significant benefits [25,26,27,28]. For instance, ref. [27] examined high-variability training for Dutch learners of Mandarin using Mandarin disyllabic pseudo-words. Their findings indicated that such training was effective in post-test perceptual recognition, but this was contingent on the learners’ individual abilities. Specifically, increased variability hindered perceptual learning for low-ability perceivers while enhancing it for high-ability perceivers. Thus, the process of acquiring tones by L2 learners warrants careful consideration.
Polish, a Slavic language, is a stress-timed language. It typically features multisyllabic words and a relatively rich consonant system, including retroflex and palatal fricatives and affricates, while its vowel system is relatively simple, comprising six oral and two nasalized vowels: /i, ɛ, ɨ, a, u, ɔ, ɛ͂, ɔ͂/. In Polish, stress predominantly falls on the penultimate syllable [29]. In contrast, Mandarin Chinese is characterized as a monosyllabic language, where the meaning of a word is determined by its lexical tone. Mandarin tones are defined by relative pitch rather than absolute pitch. Recently, ref. [20] discussed the transition of Mandarin Chinese from a predominantly monosyllabic to a disyllabic language. The primary reason for this shift is the high number of homophones among monosyllabic words, which leads to communication difficulties. According to the five-point scale proposed by [30], Mandarin tones are described as follows: high-level T1 is 55, rising T2 is 35, low or falling-rising T3 is 214, and falling T4 is 51.
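The five-point values above can be read off mechanically, which makes the later references to high-onset and high-ending tones easy to check. The following is a minimal sketch; the dictionary and function names are illustrative and not part of the study.

```python
# Five-point scale values for the four Mandarin tones, as cited in the text.
FIVE_POINT = {"T1": "55", "T2": "35", "T3": "214", "T4": "51"}


def describe(tone: str) -> dict:
    """Return onset, offset, and span of a tone on the 1-5 scale."""
    levels = [int(digit) for digit in FIVE_POINT[tone]]
    return {
        "onset": levels[0],
        "offset": levels[-1],
        "span": max(levels) - min(levels),
    }


# Tones that start at the top of the scale (the "high-onset" tones).
high_onset = [t for t in FIVE_POINT if describe(t)["onset"] == 5]
# Tones that finish at the top of the scale (the "high-ending" tones).
high_ending = [t for t in FIVE_POINT if describe(t)["offset"] == 5]

print(high_onset, high_ending)
```

On this scale, T1 and T4 share a high onset, while T1 and T2 share a high ending point.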
When the pitch of Mandarin Chinese phonation is discussed, the fundamental frequency (F0) requirements of the tones must be taken into consideration. In Mandarin Chinese, F0 ranges between 162 Hz and 352 Hz (for a female speaker) and between 68 Hz and 223 Hz (for a male speaker) when pronouncing the syllable ma in the four Mandarin Chinese tones [8]. The mean F0 values from another study, carried out by Xu 1997 [31] with eight male speakers, run from about 90 Hz (for T3) to slightly above 140 Hz (for T4). Similar results were found by Tseng 1981 ([32], for one female) and Howie ([33], for one male), with tonal ranges from 45 Hz to 250 Hz (female) and from 40 Hz to 157 Hz (male). In addition, ref. [34] reports that the frequency for male speakers ranges from roughly 78 to 185 Hz, and for female speakers from around 104 to 262 Hz. The range of spoken Mandarin Chinese (a tone language) is greater than that of English (a stress language) [35], and the range of females is wider than that of males. Furthermore, the tonal range of a single speaker may differ depending on which language they are speaking. Ref. [4] pointed out that the average pitch range of native Mandarin Chinese speakers is 1.5 times wider than that of native English speakers when speaking English, but the range of English speakers may increase when they switch from English to Mandarin Chinese. Ref. [36] (pp. 104–105) reports that most Polish words operate within the 1300–1600 Hz range for speech, with a significant range of 100–2300 Hz. Table 1 shows the details of the tonal ranges in Mandarin Chinese and Polish.
On the other hand, Polish speakers show a significantly higher F0 register than English and German speakers (pp. 1308–1310 of [37]; p. 654 of [38]), to which the sibilants evidently contribute. Ref. [37] examined the pitch range and variation characteristics of English, German, Bulgarian, and Polish. For both male and female speakers, the average pitch and pitch range (s.t.), as well as the maximum F0, used by speakers of the Slavic languages (Bulgarian and Polish) were significantly higher than those of the Germanic languages (English and German), and also exhibited greater standard deviation. Similarly, ref. [38] found differences in acoustic parameters such as F0 among speakers of German, Italian, and Polish. Polish speakers had the highest mean F0, significantly higher than the lowest mean F0, found among German speakers. Their research points to stereotypes about speaker voice quality in different linguistic communities: Italian speakers are often perceived as having “rougher” voices, while Polish speakers are considered to have “clearer” voices.
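Pitch ranges reported in Hz and pitch ranges reported in semitones (s.t.) are related by the standard conversion of 12 semitones per octave. The sketch below applies that conversion to the Hz ranges quoted earlier from [8] and [34]; the function name and the dictionary labels are illustrative assumptions, and no claim is made that any cited study computed its ranges this way.

```python
import math


def range_in_semitones(f_min_hz: float, f_max_hz: float) -> float:
    """Width of an F0 range in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f_max_hz / f_min_hz)


# (minimum Hz, maximum Hz) pairs as quoted in the text.
REPORTED_RANGES_HZ = {
    "female, syllable ma [8]": (162.0, 352.0),
    "male, syllable ma [8]": (68.0, 223.0),
    "female [34]": (104.0, 262.0),
    "male [34]": (78.0, 185.0),
}

for label, (low, high) in REPORTED_RANGES_HZ.items():
    print(f"{label}: {range_in_semitones(low, high):.1f} s.t.")
```

Because the semitone scale is logarithmic, it expresses each range relative to the speaker's own lowest F0, which is useful when comparing speakers with different baseline pitch.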
More specifically, concerning perception, the loss of the speech spectrum below 350 Hz in Polish results in only a 2% reduction in speech clarity. Thus, low-frequency sounds in speech appear to have minimal significance. The frequent occurrence of sibilants and consonant clusters in Polish, which utilize the higher frequency range, suggests that comprehensible speech can be estimated at 85% clarity within the 1 kHz–5 kHz range. Consequently, extensive exposure to speech frequencies within the 400 Hz–4000 Hz spectrum, necessary for understanding Polish, does not provide the listener with adequate sensitivity to bass and treble sounds ([36]: pp. 104–105). High-pass and low-pass filtering applied to Polish reveals the importance of a specific frequency range, shown in Table 1. We can assume that pitch accent in Polish over a syllable, a word (lexical stress), or a phrase (intonational stress), represented by the fundamental frequency in a certain register, is still distant from (i.e., higher than) the F0 exploited in tonal phonemes (also known as tonemes) in modern Mandarin Chinese.
Regarding the study of tonal production by Polish learners of Mandarin, ref. [7] asked a phonetician and 15 native speakers of Taiwanese Mandarin to evaluate recordings made by 10 Polish students (4 M and 6 F). The results indicated that the learners’ production of T3 had a longer rising part, which evaluators often perceived as T2. Conversely, T2, which starts with a slight dip followed by a rise, was often mistaken for T3. Ref. [39] used 108 phrases (17 monosyllabic words and 91 disyllabic words) and compared the production of 26 Polish learners of Mandarin (10 M and 16 F) with that of 10 Taiwanese Mandarin speakers. Two native Mandarin speakers with professional phonetic training evaluated the production using a five-point scale. Additionally, z-score analysis was conducted on 11 evenly spaced points, with the latter six points used as the basis for the study. The study found that Polish learners who received scores of 4–5 for their tonal performance had overall mean values and tonal slopes closer to those of Taiwanese Mandarin speakers. Although there was a significant difference in the mean values for disyllabic words starting with T3, this difference was smaller than that for Polish participants who received scores of 1–2. Furthermore, Polish participants who received scores of 1–3 had mean values and tonal slopes significantly different from those of Taiwanese Mandarin speakers, particularly for disyllabic words starting with T1 (too low) and T3 (too high). Errors in tonal contour primarily occurred in disyllabic words starting with T2 (similar to T1) and T4 (similar to T2). Overall, the tonal range (tonal register) of Polish learners was narrower than that of Taiwanese speakers. The main difficulty for the learners was establishing the concept of pitch variation at the syllabic level.
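The sampling procedure described for [39] (11 evenly spaced points, z-scored, with the latter six points analyzed) can be illustrated with a short sketch. This is not the procedure of [39] itself: linear interpolation, the choice of mean and standard deviation, the function names, and the sample contour are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def resample(contour, n_points=11):
    """Linearly interpolate an F0 contour at n evenly spaced positions."""
    if len(contour) < 2:
        raise ValueError("need at least two F0 samples")
    last = len(contour) - 1
    points = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)   # fractional index into the contour
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        points.append(contour[left] * (1 - frac) + contour[right] * frac)
    return points


def z_scores(values, reference_mean, reference_sd):
    """Express values as z-scores relative to a reference mean and SD."""
    return [(v - reference_mean) / reference_sd for v in values]


# Hypothetical F0 track in Hz for one syllable (invented values).
contour_hz = [210.0, 205.0, 198.0, 196.0, 204.0, 221.0, 240.0]
points = resample(contour_hz)
# For brevity the reference mean and SD come from this single contour;
# in practice they would presumably be computed over all of a speaker's tokens.
normalized = z_scores(points, mean(points), pstdev(points))
latter_six = normalized[5:]               # points 6 to 11
```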
Ref. [40] found that for Polish learners of Mandarin at the A1 level, improvement in monosyllabic words could be achieved by training T4 to start at the same high pitch as T1, resulting in a production similar to that of native speakers. In contrast, ref. [41] found T2 and T3 particularly problematic for six Polish adult learners (three M and three F) in their mid-twenties with B2/C1 proficiency levels.
In terms of perception, ref. [42] examined how Polish learners perceive Mandarin monosyllabic tones and found that T3 had the lowest correct perception rate among Polish learners. Further studies by [43,44] continued to investigate the tonal performance of Polish learners in continuous speech. Evaluations by three native Mandarin speakers using statistical analysis also found that T3 was the most challenging tone for Polish learners. These studies revealed that in continuous speech, the ending point of each tone has a greater impact on the overall understanding of the tone than the starting point.
The findings can be categorized into two areas of investigation: differences in overall mean values and differences in overall slope. Firstly, there is a significant difference in the overall mean values between Polish learners and Taiwanese Mandarin speakers, especially for those advanced Polish learners, who scored 4–5 in tonal performance evaluations. Secondly, the overall slopes of all tonal combinations for Polish learners differed significantly from those of Taiwanese speakers across various tonal combinations, indicating inconsistency in pitch variation patterns. In summary, these studies highlight the specific difficulties Polish learners face with certain tones in Mandarin and suggest that focusing on the starting and ending points of tones, as well as overall tonal patterns, can aid in improving their production and perception.
This study aims to provide a detailed analysis of tonal acquisition in Mandarin Chinese disyllabic words by Polish learners of Mandarin using several statistical methods, including a mixture random effects model and pairwise comparisons, to offer a realistic observation. Additionally, the study will present pedagogical suggestions to enhance future Chinese tone instruction.
3. Results
Table 2 summarizes the model estimates for tonal production, revealing a significant positive coefficient of 0.294 for Polish learners. This suggests that, compared to native Taiwanese Mandarin speakers, Polish learners tend to have a higher pitch in their tone production. The reference tone used in this analysis is Tone 4, both in the first and second syllables, against which the other tones (T1, T2, T3) are compared. Significant differences were observed between Tone 4 and the other tones. Specifically, in the first syllable, T1 showed a significant positive effect on the dependent variable with a coefficient of 0.231. Conversely, T2 and T3 demonstrated significant negative effects with coefficients of −0.295 and −0.293, respectively. These results indicate a clear deviation in tonal production between T1 and the lower pitch tones T2 and T3 when compared to T4. In the second syllable, a similar pattern emerged. T1 had a positive impact with an estimated coefficient of 0.279, suggesting a higher pitch effect similar to that in the first syllable. However, T2 and T3 exhibited negative impacts with coefficients of −0.137 and −0.263, respectively, indicating a lower pitch effect compared to T4. These findings underscore that the overall tonal register of L1-dominant Polish speakers learning Mandarin (L2) is relatively elevated compared to that of native Taiwanese Mandarin speakers. The data from Table 2 provide a statistical validation of these tonal differences, highlighting the distinct tonal acquisition patterns observed in Polish learners of Mandarin.
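To make the reference-level coding concrete, the coefficients reported above can be combined into a predicted offset from the baseline cell (a TM native producing T4T4). The sketch below is only an illustration: it assumes a purely additive fixed-effects structure with no interaction terms, it ignores the random effects, and, because the intercept is not reported in this section, it yields offsets rather than absolute values. The names are hypothetical.

```python
# Fixed-effect estimates quoted in the text; reference levels are set to 0.0.
GROUP_EFFECT = {"TM": 0.0, "PL": 0.294}
SYLLABLE1_EFFECT = {"T1": 0.231, "T2": -0.295, "T3": -0.293, "T4": 0.0}
SYLLABLE2_EFFECT = {"T1": 0.279, "T2": -0.137, "T3": -0.263, "T4": 0.0}


def offset_from_baseline(group: str, tone1: str, tone2: str) -> float:
    """Predicted difference from the baseline cell (TM native, T4T4).

    Assumes additive effects only; any interactions are not modeled here.
    """
    return (GROUP_EFFECT[group]
            + SYLLABLE1_EFFECT[tone1]
            + SYLLABLE2_EFFECT[tone2])


for combo in (("PL", "T1", "T1"), ("TM", "T1", "T1"), ("PL", "T3", "T3")):
    print(combo, round(offset_from_baseline(*combo), 3))
```

Under these assumptions, the group coefficient shifts every tonal combination by the same amount, which is one way to read the reported elevation of the learners' overall register.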
Given the significant differences between the two groups, we further analyzed the T-values of the tones in different positions using pairwise comparisons, after adjusting for other variables. Consider the tones in the first syllable: our data indicate significant differences between T1 and each of T2, T3, and T4, as shown in the SIG column. The only exception is the pair T2 and T3, which were not produced distinctly from each other (SIG = 0.859).
Combining the data shown in Table 3 and Table 4, we found that T2 and T3 produced in the first syllable of a disyllabic word are not effectively distinguished from one another. This demonstrates that T2 and T3 in the first syllable can easily be confused.
The tonal trajectories of all tonal combinations are shown in Figure 1. Generally, L1-dominant Polish speakers learning Mandarin (L2) exhibited a higher tonal register than TM natives. As detailed in Figure 1, the production of the first syllable (points 1 to 11) differs significantly from that of TM natives for the tonal combinations T1T1, T1T2, T1T4, and T2T1. Similarly, the production of the second syllable (points 12 to 22) differs significantly for the tonal combinations T1T1, T1T2, T3T2, and T4T2. Together, these patterns indicate that T1 in the first syllable and T2 in the second syllable present more difficulties for L1-dominant Polish speakers learning Mandarin (L2). Notably, our results show that T3 and T4 seldom pose significant challenges for L1-dominant Polish speakers learning Mandarin (L2).
4. Discussion
Overall, TM natives have a relatively narrow tonal range compared to L1-dominant Polish speakers learning Mandarin (L2), likely as a result of habitual, everyday use of the language. Conversely, L1-dominant Polish speakers learning Mandarin (L2) face difficulty with Mandarin Chinese tones involving high pitch values when producing disyllabic words, indicating that T1 and T2 are potential challenges. Additionally, a common pattern among all tones is their relatively higher register compared to TM natives.
As mentioned before, most words in Polish operate in the 1300–1600 Hz range for speech. This suggests a significant habitual pitch range difference between Mandarin Chinese and Polish. Furthermore, losing the speech spectrum below 350 Hz in Polish results in only a 2% reduction in speech clarity, indicating a higher habitual pitch range in Polish as well [36]. This also suggests that if T3 appears in either position within a disyllabic word, significantly different tone production is rare, with the exception of T3T2. The distinct behaviors of T1 and T2 will be discussed in the following sections.
4.1. High Trajectories of T1 and T2
The overall tonal register of L1-dominant Polish speakers learning Mandarin (L2) is significantly higher than that of TM natives because both T1 and T2 end at a high pitch point, 5. Taking T1 as an example, we observed a relatively higher tonal register of T1 produced by L1-dominant Polish speakers learning Mandarin (L2), whether in the first or second syllable, except for the tonal combinations of T1T3, T2T1, T3T1, and T4T1. This suggests that when a low-tone point of the tones precedes T1, it can normalize the high pitch of T1. The habitual higher pitch ranges of Polish may lead to a significantly high register of T1, even after being transformed into T-values individually.
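The T-value transformation referred to above is not spelled out in this section. A minimal sketch follows, assuming the widely used logarithmic normalization T = 5 × (lg x − lg min) / (lg max − lg min), where x is a measured F0 value and min and max are the same speaker's lowest and highest F0. The function name and the sample speakers are hypothetical, and the study's own computation may differ in detail.

```python
import math


def t_value(f0_hz: float, f0_min_hz: float, f0_max_hz: float) -> float:
    """Map an F0 value onto a 0-5 scale relative to one speaker's F0 range.

    Assumed formula: T = 5 * (lg x - lg min) / (lg max - lg min).
    """
    log_min = math.log10(f0_min_hz)
    log_max = math.log10(f0_max_hz)
    return 5.0 * (math.log10(f0_hz) - log_min) / (log_max - log_min)


# Two invented speakers with different habitual ranges in Hz.
low_voice = (80.0, 180.0)
high_voice = (160.0, 360.0)

# The geometric midpoint of each speaker's own range lands on the same T-value,
# which shows that the normalization is relative to the individual speaker.
for f_min, f_max in (low_voice, high_voice):
    midpoint = math.sqrt(f_min * f_max)
    print(round(t_value(midpoint, f_min, f_max), 2))
```

Because each speaker is scaled to their own minimum and maximum, differences in absolute Hz between speakers are factored out; a register that remains elevated after this transformation therefore reflects where tones sit within the speaker's own range.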
The issue with T2 lies in its higher register rather than its contour. We observed a deviation towards a high register for T2 in all tonal combinations, except for T2T2 and T2T3. Once again, T3 is suspected to contribute to this. It is worth noting that if T3 cannot be lowered, for example, in T3T2, T2 cannot be normalized if T3 itself is not produced well enough. Thus, the high trajectories of T1 and T2 can be mitigated when they co-occur with T3, a low tone.
Ref. [49] conducted an experiment demonstrating that language transfer effects impact the F0 performance of learners acquiring Mandarin Chinese as a target language. For instance, in English, which is also a stress-timed language, F0 elevation typically occurs towards the end of interrogative sentences, whereas in Mandarin Chinese, F0 elevation spans the entire sentence. These differences may lead English-speaking learners of Mandarin Chinese to transfer their English speaking habits to Mandarin Chinese, resulting in errors in intonational F0. Specifically, the contour tones T2 and T4 did not present learning difficulties in this study; instead, the challenge lay in tonal register. This is attributed to the rising contour being LH and the falling contour being HL in intonation. Despite extensive practice by L1-dominant Polish speakers learning Mandarin (L2) in sentence-level exercises, the mastery of tonal register remained elusive. Therefore, for L1-dominant Polish speakers learning Mandarin (L2), the use of F0 in Polish sentence intonation assists in acquiring tonal contour; however, due to the habitual use of a higher F0 range in Polish, errors in pitch height persist whenever high-tone contours are involved in the target language.
Furthermore, our research indicates that L1-dominant Polish speakers learning Mandarin (L2) exhibit a high register in the production of T1 and T2 in disyllabic words. This partially aligns with previous studies on Japanese, Korean, and German learners, which identified T2 as particularly challenging ([15,16], for Japanese learners; [17], for Korean learners; [19], for German learners). Most challenges in these studies, as well as in ours, occur in the first syllable of Mandarin disyllabic words. More specifically, we found that T2 is often confused with T3 in the first syllable. More research is needed, however, as Zajdler et al. (2013) [42], along with Zajdler et al. (2019) and Chu et al. (2024) [43,44], found T3 to be the most difficult tone in both perception and continuous speech contexts.
Interestingly, ref. [39] reported that L1-dominant Polish speakers learning Mandarin (L2) face difficulties with all four tones in the first syllable of disyllabic words. Our findings differ, possibly due to differences in the proficiency levels of learners. Ref. [39] studied L1-dominant Polish speakers learning Mandarin (L2) who had attended a seven-week Mandarin study program in Taiwan with over 60 h of instruction, whereas our participants had been studying Mandarin in a Polish sinology department for more than six months.
In this study, the tonal ranges of disyllabic words among native Mandarin speakers (2–3 T-values) appear smaller than those of monosyllabic words (4.25–2.25 T-values) reported by Chu et al. [40], with a difference of approximately 1 T-value for disyllabic words and 2 T-values for monosyllabic words. L1-dominant Polish speakers learning Mandarin (L2), on the other hand, maintain a tonal range of 4.25–2 T-values for monosyllabic words and 4–2 T-values for disyllabic words, showing a consistent 2 T-value difference in both. This suggests that native Mandarin speakers tend to realize tones in a relatively relaxed manner in disyllabic words, possibly because disyllabic words already carry meaning and native speakers do not necessarily need to articulate tones precisely to achieve communicative goals. Hence, tonal ranges are smaller in disyllabic words for native speakers. In contrast, ref. [49] reported that English native speakers at the HSK5 level exhibited smaller tonal ranges than the control group of native Mandarin speakers across three target syllables, indicating that L1-dominant Polish speakers learning Mandarin (L2) have adjusted their tonal ranges to approximate those of native speakers.
For Japanese and Korean L2 learners, errors in T2 or T3 are particularly frequent when followed by T1 or T4, both of which are high-onset tones [16,17]. These findings suggest that disyllabic words should be viewed as connected units rather than as combinations of two independent monosyllables, inspiring our recommendation to practice tones in conjunction with T3. On the other hand, Polish speakers exhibit a significantly higher F0 range compared to German speakers [37,38]. Additionally, in terms of perception, the loss of the speech spectrum below 350 Hz in Polish results in only a 2% reduction in speech clarity. This suggests that low-frequency sounds in speech are not particularly meaningful for Polish speakers.
Therefore, we observe that due to the relatively smaller tonal range of native Mandarin speakers in disyllabic words, L1-dominant Polish speakers learning Mandarin (L2) tend to exhibit higher starting points in comparison. This results in subsequent higher tonal registers. To minimize the perceptual differences in tones between Polish learners and native Mandarin speakers, or to make Polish learners’ pronunciation more similar to that of native speakers, we recommend that L1-dominant Polish speakers learning Mandarin (L2) practice reducing the pitch of each syllable in disyllabic words when in a high tonal register. This approach will make their pronunciation sound more natural and less like a sequence of individual monosyllabic words concatenated together into disyllabic words. Furthermore, we would explain to Polish learners that in disyllabic words, native speakers already perceive a meaningful combination of syllables, and precise tone articulation in every syllable, especially in high tonal registers, may not be necessary. Moreover, due to the differences in tonal range between Polish learners and native Mandarin speakers in monosyllabic and disyllabic words, tones such as T1 and T2 associated with high tonal registers are affected. This is often caused by starting pitches in disyllabic words being too high, leading to overall higher tonal contours, whether flat, rising, or falling, compared to native Mandarin speakers. The habit of using higher registers originates from the language transfer effects from Polish itself.
To sum up, we suggest that L1-dominant Polish speakers learning Mandarin (L2) relax their vocal cords when pronouncing the first syllable of disyllabic words, avoiding excessively high pitch as the starting point. With practice, they can gradually approach the pronunciation of disyllabic words more closely to that of native Mandarin speakers. The problem is particularly relevant from the perspective of language learners who aim to achieve native-like production of non-native phonemes. This intention is closely associated with a large number of Mandarin learners whose native languages lack lexical tones. Learning Mandarin tones through the assumption of universally existing intonation patterns in language, especially for contour tones like T2 and T4, can potentially facilitate more effective learning experiences for Mandarin learners from non-tonal language backgrounds.
4.2. Pedagogical Implications
Our results indicate that tonal contour is not a problem for L1-dominant Polish speakers learning Mandarin (L2), but tonal register is. These results contrast with previous findings, such as those by [39], which indicated that L1-dominant Polish speakers learning Mandarin (L2) typically have a smaller tonal range compared to Taiwanese Mandarin speakers. This suggests that the group of L1-dominant Polish speakers learning Mandarin (L2) in our study has already overcome the issue of tonal range. On the other hand, ref. [49] proposes methods for Cantonese and English learners acquiring Mandarin Chinese, emphasizing the need for learners to recognize potential errors in intonation, prosody, and stress patterns in Mandarin Chinese due to the phonetic features of their respective native languages, known as language transfer effects. Based on these findings, targeted instructional strategies are recommended to provide Mandarin Chinese models and practice opportunities. Thus, the minimal difference between L1-dominant Polish speakers learning Mandarin (L2) and TM natives in T3 and T4 within disyllabic words implies that acquiring a low-ending tone is easier or occurs earlier for learners compared to the high-ending tones T1 and T2. Given the different habitual pitch ranges of Mandarin Chinese and Polish, and the potential for low register tones to stabilize disyllabic word production, the pedagogical implications are as follows:
(a) L1-dominant Polish speakers learning Mandarin (L2) have appropriately expanded their tonal range. Therefore, more attention should be devoted to practicing the ending high-register tones. It is important to note that the difficulty in producing high-ending tones arises in the random sequence of tones in speech, compared to the predictable stress patterns of the Polish language. Therefore, more practice with the 16 tonal combination exercises can help L1-dominant Polish speakers learning Mandarin (L2) reduce the occurrence of unnatural production of all tones.
(b) One approach to inhibit hyper-articulated T1 and T2 is to continue practicing accompanied by Tone 3. Evidence suggests that when T3 precedes T1, T1 is produced more native-like (e.g., T3T1 or T1T3). However, the inhibitory effect of T3 seems less powerful when neighboring T2 (e.g., T3T2 or T2T3). This could be because producing T2 or T3 involves not only controlling pitch within the time duration and up to the end but also reaching a proper reference point, posing a greater challenge for L1-dominant Polish speakers learning Mandarin (L2). It appears that T4 within various tonal combinations achieves the greatest success. Therefore, encouraging L1-dominant Polish speakers learning Mandarin (L2) to practice the low ending point, starting with T4, is advisable. Both T3 and T4 are suitable candidates for tonal combination practice, as T3 remains in the low register and T4 benefits from its similarity to stress in Polish. Individuals can choose the tone appropriate for their tonal development.
(c) Specifically, since L1-dominant Polish speakers learning Mandarin (L2) perform better with disyllabic words that include low tone T3 or falling tone T4, and given that there is no significant difference between disyllabic words composed of T1 and T2 with T3 and T4, it is recommended that L1-dominant Polish speakers learning Mandarin (L2) start practicing with the disyllabic word combinations that they find easier. This approach is likely to help them stabilize the tonal register of T1 and T2 within disyllabic words.
(d) Among the difficulties, for initial practice, despite the fact that the T1T1 disyllabic word combination shows the greatest difference from native Mandarin speakers, it is still suggested to begin with T1T1 training. This is because, once the starting point of T1T1 is correct, maintaining a consistent rate of tonal vibration will make the overall disyllabic word tone more similar to that of native Mandarin speakers. Finally, practice should progress to other disyllabic word combinations involving T2. For T2, learners need to pay attention not only to the starting point but also to ensure that adjustments at the starting point do not lead to deviations in the tonal contour.
(e) Different language training paradigms have also shown partial effectiveness ([27], for Dutch Mandarin learners). Therefore, this study suggests that language training paradigms should enable educators to better understand their students’ abilities and provide appropriate training accordingly. In other words, educators should flexibly adjust their teaching strategies based on the perceptual abilities of learners to better support their language learning. This approach emphasizes that it is not necessary to rigidly adhere to either high- or low-variability training.
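The 16 tonal combinations mentioned in (a) can be enumerated and grouped by whether they contain the low-ending tones T3 and T4, which recommendations (b) and (c) single out as stabilizing partners. The sketch below is only one possible way to organize a practice inventory; the grouping labels and function name are illustrative and not part of the study.

```python
from itertools import product

TONES = ("T1", "T2", "T3", "T4")
HIGH_ENDING = {"T1", "T2"}   # tones ending high, per the discussion above
LOW_ENDING = {"T3", "T4"}    # tones ending low

# All 16 disyllabic tonal combinations, e.g. "T1T1", "T1T2", ..., "T4T4".
combinations = [first + second for first, second in product(TONES, repeat=2)]


def practice_group(combination: str) -> str:
    """Label a combination by the kinds of tones it contains."""
    tones = {combination[:2], combination[2:]}
    if tones <= LOW_ENDING:
        return "low-ending only"
    if tones <= HIGH_ENDING:
        return "high-ending only"
    return "mixed"


groups = {}
for combination in combinations:
    groups.setdefault(practice_group(combination), []).append(combination)

for label, members in groups.items():
    print(label, members)
```

The "mixed" group contains the pairings of T1 or T2 with T3 or T4 that the recommendations above suggest for stabilizing the register of the high-ending tones.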
5. Conclusions and Further Research
In this study, we employed a mixture random effects model along with pairwise comparisons to analyze the tonal challenges encountered by L1-dominant Polish speakers learning Mandarin (L2), particularly in the context of T1 and T2, relative to TM natives. Additionally, we utilized Python to visually represent these tonal variations through line plots, facilitating a clearer understanding of the differences in tonal domains between the two groups. Furthermore, practical pedagogical recommendations were provided based on our findings; for example, increased practice with 16 tonal combinations, particularly focusing on high-ending tones. Pairing with T3 and T4 to form disyllabic words can be beneficial, because T3 remains in the low-tone range, while T4 benefits from its similarity to the stress patterns in Polish. This approach may help stabilize T1 and T2 tones in disyllabic words. Despite the T1T1 combination showing the greatest deviation from native Mandarin Chinese, it is still recommended to start with T1T1 training as it may be more effective than starting with T2. Additionally, given the individual differences in learners’ perceptual abilities, educators should adapt their teaching strategies flexibly to suit different learners.
To sum up, this research serves to address previous gaps in the literature concerning the performance of L1-dominant Polish speakers learning Mandarin (L2) in disyllabic words, specifically focusing on the challenges associated with T1 and T2. Moreover, it offers tailored instructional strategies, suggesting the incorporation of low-ending T3 and T4 tones to complement the training regimen, thus enhancing the overall proficiency of L1-dominant Polish speakers learning Mandarin (L2).
This study reveals several findings that align with or diverge from previous research. Most prior studies on Japanese, Korean, and German speakers have identified difficulties with disyllabic words, particularly with T2 or T3, and noted position sensitivity. In contrast, our results indicate significant differences in the high tonal register of T1 and T2 compared to TM speakers, with no position sensitivity. Specifically, T2 and T3 are often confused in the production of the first syllable, confirming that T2 is a common difficulty for L1-dominant Polish speakers learning Mandarin (L2). Given the increasing importance of disyllabic word pronunciation, future research should explore whether there is an interaction or a stronger connection between tone and syllable position.
Although this paper primarily focuses on the tonal performance of L1-dominant Polish speakers learning Mandarin (L2) in disyllabic words, it is essential to consider that previous studies have emphasized monosyllabic tones due to the teaching progression from simple (monosyllabic) to complex (disyllabic) structures. This step-by-step approach in teaching, starting with monosyllables before progressing to disyllables, is logical. However, prior research on L1-dominant Polish speakers learning Mandarin (L2) shows that T2 and T3 are challenging in monosyllabic contexts [7,41], and T4 presents difficulties as well [40]. This raises the question of whether difficulties with monosyllabic tones are resolved or exacerbated when learners move on to disyllabic tones. A future research goal could be to trace the learning trajectory from monosyllabic to disyllabic tone pronunciation and compare the tonal performance in both contexts.