As a groundbreaking neural prosthesis in bioengineering, the cochlear implant (CI) provides access to sound for individuals with profound hearing impairment [1]. The remarkable success of the CI is largely attributable to signal transduction technology that harnesses the central auditory plasticity of implantees [2]. While CI users can benefit substantially from the neuroplasticity driven by CI-empowered learning experience in developing their auditory, linguistic, and cognitive skills [3], pitch perception poses a unique challenge for these individuals. A large body of literature has demonstrated that CI recipients show deficits in pitch-related perceptual tasks, including voice emotion perception [10], speech prosody recognition [13], music appreciation [16], and lexical tone perception [19]. One known factor lies in the limitations of contemporary multichannel CI technology, which encodes a degraded spectral-temporal representation of the auditory input. Such degradation can impair and even preclude the reception of pitch cues. The current study of Mandarin Chinese speakers aimed to identify applicable interventions to alleviate deficits in the representation and reception of pitch information in pediatric CI users.
CIs are increasingly prescribed not only for individuals with bilateral deafness but also for those with profound unilateral hearing loss. Clinical practice offers compelling evidence that speech perception in unilaterally implanted candidates with some residual acoustic hearing can be improved by bimodal hearing. Bimodal hearing combines two different solutions for hearing loss: a CI fitted in one ear and a hearing aid (HA) worn in the opposite ear. Behaviorally, successful bimodal listeners show improved speech understanding with CI + HA stimulation (i.e., a unilateral CI together with a contralateral HA) relative to a unilateral CI alone [22]. This improved perceptual outcome of bimodal stimulation over the CI alone condition is referred to as bimodal benefit, and it has been demonstrated in a number of studies of non-tonal languages. Several research groups have provided corroborating evidence for bimodal benefits with task-specific tests in the perception of segmental linguistic features that contain low-frequency components, such as voicing, semivowels, and nasals [25], and of supra-segmental linguistic features, such as intonation, emphasis, and stress [27]. These behavioral improvements might be underpinned by plasticity in auditory-related cortical areas as a consequence of the bimodal hearing experience. Nevertheless, the great heterogeneity of the CI population entails remarkable variability in the degree of benefit from bimodal stimulation: some subjects benefit significantly from the additional HA in the opposite ear, whereas others benefit less, or not at all.
The great variability of the bimodal benefit has motivated efforts to pinpoint its contributing factors at the individual level. One important factor could be the degree of impairment in basic auditory function, which determines the amount of residual hearing in the non-implanted ear [29]. For example, Most et al. [28] revealed significant negative correlations between the perception of supra-segmental features under bimodal stimulation and the unaided pure-tone average (PTA) of the non-implanted ear. It should be noted, however, that the authors did not directly evaluate the relationship between the magnitude of the perceptual benefit and the unaided PTA of the non-implanted ear, which limits the strength of this evidence for bimodal benefit in individual participants. In another study, Zhang et al. [30] demonstrated that the amount of bimodal benefit in speech understanding was significantly correlated with the audiometric thresholds of the non-implanted ear (r = −0.814). Nevertheless, some studies failed to find a reliable relationship between acoustic hearing thresholds and benefit magnitudes [31], pointing to other possible contributing factors for bimodal benefits.
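The unaided PTA discussed above is conventionally the mean audiometric threshold over a small set of test frequencies. A minimal sketch of this arithmetic (the audiogram values are illustrative, not from any study cited here):

```python
def pure_tone_average(audiogram, freqs=(500, 1000, 2000)):
    """Mean pure-tone threshold (dB HL) over the given frequencies.

    The conventional PTA uses 500, 1000, and 2000 Hz; a low-frequency
    PTA (e.g., 250, 500, 1000 Hz) is often reported for bimodal
    candidates, since residual acoustic hearing is typically confined
    to the low frequencies.
    """
    return sum(audiogram[f] for f in freqs) / len(freqs)

# Illustrative unaided audiogram of a non-implanted ear (dB HL).
audiogram = {250: 45, 500: 55, 1000: 70, 2000: 85, 4000: 95}
conventional_pta = pure_tone_average(audiogram)                # 70.0
low_freq_pta = pure_tone_average(audiogram, (250, 500, 1000))  # ~56.7
```

A higher PTA indicates poorer residual hearing, which is why negative correlations with bimodal performance are the expected direction.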
In recent years, there has been a surge of interest in the bimodal benefit for lexical tone perception in speakers of tonal languages [31]. Lexical tones are a distinctive feature of tonal languages, which use pitch variations to contrast word meanings. For instance, Mandarin Chinese has four tones that can be combined with the same syllable to form different words (e.g., /ma/ can mean ‘mother’, ‘hemp’, ‘horse’, or ‘scold’ depending on whether the tone is flat, rising, dipping, or falling). Findings from tonal language speakers have been inconsistent: some studies revealed no significant bimodal benefit for lexical tone recognition either in quiet or in noise [36]; some reported a significant bimodal benefit for lexical tone recognition in noise but not in quiet [32]; and others showed a significant benefit in quiet for tonal perception tasks that depend more heavily on pitch cues [33]. While most studies focused on verifying bimodal benefits under various listening conditions, very few investigated the candidate factors that could account for the individual variability in bimodal benefit. One exception is Chang et al. [31], who reported no significant correlation between the acoustic hearing threshold of the non-implanted ear and bimodal performance in Mandarin tone recognition. The lack of solid evidence on the factors underlying bimodal benefit variability leaves the topic open for further inspection. In particular, there has been little work on young bimodal children who are still within a sensitive period of auditory cortical organization for tonal acquisition.
Several initiatives have been launched to investigate lexical tone perception in Mandarin-speaking children with CIs. Based on differences in pitch height and pitch contour, Mandarin tones are traditionally classified into four categories: Tone 1 (T1) with a high-flat pitch, Tone 2 (T2) with a mid-rising pitch, Tone 3 (T3) with a falling-rising pitch, and Tone 4 (T4) with a high-falling pitch. Prior studies have demonstrated that T1 and T4 are significantly easier for native CI children to perceive than T2 and T3, with frequent confusion between T2 and T3 [21]. However, it remains an open question whether additional acoustic hearing from an HA in the non-implanted ear can enhance the representation of different tone patterns and alleviate the confusion between T2 and T3. Notably, although large individual variability was found across the studies, age at implantation was negatively correlated with tonal perceptual performance, with earlier-implanted children obtaining higher perceptual scores. Age at implantation and duration of CI use have been identified as two major demographic factors contributing to the variability in lexical tone perception among pediatric implantees [38], and both are also recognized as significant predictors of sentence-level intelligibility [39]. From the perspective of neural plasticity, there is a critical period for cochlear implantation: auditory plasticity decreases with age, while the expected cortical organization accumulates with implant experience [1]. Thus, demographic factors need to be considered to explain pediatric participants’ bimodal outcomes (including bimodal benefit and benefit magnitude) in both benign and adverse listening conditions.
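The four tone categories can be made concrete on the five-level Chao scale commonly used to transcribe Mandarin tones (T1 = 55, T2 = 35, T3 = 214, T4 = 51, where 1 is low pitch and 5 is high). The sketch below renders these as schematic, hypothetical F0 trajectories; it is illustrative only, not a model of the stimuli used in any study cited here:

```python
import numpy as np

# Chao tone numerals for the four Mandarin tones (1 = low pitch, 5 = high).
CHAO = {"T1": [5, 5], "T2": [3, 5], "T3": [2, 1, 4], "T4": [5, 1]}

def schematic_contour(tone, n=20):
    """Piecewise-linear schematic F0 trajectory on the 1-5 Chao scale."""
    pts = CHAO[tone]
    x = np.linspace(0.0, len(pts) - 1, n)
    return np.interp(x, np.arange(len(pts)), pts)

# Note that both T2 and T3 end with rising F0, which is consistent
# with the frequently reported T2/T3 confusion in CI listeners.
t2, t3 = schematic_contour("T2"), schematic_contour("T3")
```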
The primary objective of this study was to investigate the nature of the bimodal benefit for Mandarin tone recognition at both the group and individual levels in native kindergarten-aged children who use a CI in one ear and an HA in the other. Three specific hypotheses were tested: (1) the child participants would show confusion between T2 and T3, which could be alleviated by bimodal hearing; (2) the noisy listening condition would exhibit a more robust bimodal benefit than the quiet condition; and (3) the audiometric thresholds and demographic factors of the participants would account for the bimodal benefit in these preschool bimodal users. The findings should contribute to the understanding and development of age-appropriate intervention programs for aural rehabilitation and speech intervention in the pediatric CI population of tonal language speakers.
3.1. Lexical Tone Recognition
Group mean accuracy data for the four lexical tones are shown in Figure 3, contrasting performance for the two device conditions in quiet and in noise. The average RAU scores of tonal identification for the CI alone and CI + HA device conditions were, respectively, 93.95 (SD = 11.09) and 95.04 (SD = 11.64) in quiet, and 61.76 (SD = 17.91) and 69.17 (SD = 16.62) in noise. In quiet, both T1 and T4 obtained nearly perfect scores, whereas T2 was mainly misidentified as T3 (accounting for 88.54% of all errors) and T3 was mainly misidentified as T2 (accounting for 87.28% of all errors). In noise, T4 was identified with the fewest errors, which were relatively evenly spread across the other three tones; however, T1 was mainly misidentified as T2 (49.76% of errors), T2 as T3 (48.21% of errors), and T3 as T2 (56.17% of errors). These results demonstrate confusion between T2 and T3 both in quiet and in noise. See Supplementary Tables S3 and S4 for the full confusion matrices of the four tones in quiet and in noise.
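The RAU scores reported above presumably follow Studebaker's rationalized arcsine transform, the standard way to linearize percent-correct scores and stabilize variance near ceiling and floor. A sketch, assuming each score is a count of correct responses out of N trials:

```python
import math

def rau(correct, n_trials):
    """Rationalized arcsine unit (Studebaker, 1985) for a score of
    `correct` out of `n_trials`. RAU tracks percent correct in the
    mid-range but can fall below 0 or exceed 100 at the extremes,
    which decompresses differences near ceiling and floor.
    """
    theta = (math.asin(math.sqrt(correct / (n_trials + 1)))
             + math.asin(math.sqrt((correct + 1) / (n_trials + 1))))
    return (146.0 / math.pi) * theta - 23.0
```

The near-ceiling quiet scores here (roughly 94-95 RAU) are exactly the regime where this transform matters, since raw percent correct compresses differences near 100%.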
Statistical analysis of the accuracy data revealed a significant main effect of device condition (F = 5.96, p = 0.02): compared with CI alone, lexical tones were overall recognized significantly better with bimodal stimulation (t = 2.41, p = 0.03). There was also a significant main effect of tone type (F = 12.99, p < 0.001), with T4 identified significantly better than the other three tones (all ts > 3.19, ps < 0.05). Moreover, there was a significant interaction of device condition by listening condition (F = 4.14, p = 0.04). Post-hoc comparisons in quiet revealed no significant difference in lexical tone recognition between the two device conditions, with only T3 showing marginally better performance with CI + HA than with CI alone (t = 1.85, p = 0.06). By contrast, in the noisy listening condition, perceptual performance was significantly better with bimodal hearing than with CI alone (t = 3.13, p = 0.003).
Additional LME models for error analysis found a significant three-way interaction of tone type by listening condition by device condition (F = 4.64, p = 0.03). Post-hoc pairwise comparisons with Bonferroni adjustment showed that T3 was misidentified as T2 less often in the CI + HA condition than in the CI alone condition in quiet (t = −2.06, p = 0.04), but not in noise (p = 0.97).
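The Bonferroni adjustment used for these post-hoc comparisons is a simple multiplicative correction; a minimal sketch:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each raw p is multiplied by the
    number of comparisons m and capped at 1.0, which controls the
    family-wise error rate across the set of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# E.g., with six pairwise tone comparisons, a raw p of 0.008 survives
# correction (0.048 < 0.05) while a raw p of 0.02 does not (0.12).
adjusted = bonferroni([0.008, 0.02, 0.3, 0.5, 0.7, 0.9])
```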
The individual listeners’ results are displayed in Figure 4. Paired-samples t tests were conducted to compare each individual’s perceptual performance between the CI alone and CI + HA device conditions. In quiet, no subject showed a significant difference between the two device conditions (all ps > 0.08). However, in noise, a significant bimodal benefit was revealed for S13 (t = 3.69, p = 0.005) and S14 (t = 4.1, p
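Each per-subject comparison amounts to a paired-samples t test over repeated score pairs; the sketch below uses made-up scores, not S13's or S14's actual data:

```python
from scipy import stats

# Hypothetical per-list RAU scores for one child, tested in noise with
# CI alone and with CI + HA on matched word lists (illustrative only).
ci_alone = [55.0, 60.0, 58.0, 62.0, 57.0, 61.0]
bimodal = [64.0, 66.0, 69.0, 70.0, 63.0, 72.0]

# Paired test: each list is scored under both device conditions,
# so the test operates on the within-list score differences.
result = stats.ttest_rel(bimodal, ci_alone)
```

A significant positive t here would indicate a bimodal benefit for that child, mirroring the pattern reported above.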
3.2. Normalized Bimodal Benefit
To account for individual differences in performance, the normalized benefit score compares the two device conditions and expresses the relative gain of the bimodal condition over the CI alone condition for each individual. On average, the normalized benefit score for lexical tone recognition was +16.64 (range: −6.35 to +69.23) in quiet and +17.64 (range: −1.89 to +34.55) in noise. Five children had negative scores in the quiet condition, suggesting a lack of bimodal benefit (p = 0.42, exact binomial test). In comparison, 12 children showed positive scores in the noisy condition, indicating tangible bimodal benefits (p < 0.01, exact binomial test). The mean normalized benefit score was calculated for each participant as the average of the two listening conditions; on the group level it was +17.14, ranging from 0 to +50.69. The normalized benefit scores for each child participant in quiet and in noise are shown in Figure 5.
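These two computations can be sketched as follows. The exact normalization formula is not restated in this section, so the code assumes one common choice (gain scaled by the headroom left below ceiling); the cohort size of 14 in the binomial test is likewise an illustrative assumption:

```python
from scipy.stats import binomtest

def normalized_benefit(bimodal, ci_alone, ceiling=100.0):
    """Relative gain of CI + HA over CI alone, scaled by the room left
    for improvement. This headroom normalization is one common choice;
    the study's exact formula may differ."""
    return 100.0 * (bimodal - ci_alone) / (ceiling - ci_alone)

# Exact binomial (sign) test on the direction of the benefit:
# e.g., 12 of 14 children with positive scores, against chance (p = 0.5).
noise_test = binomtest(12, n=14, p=0.5)
```

The sign test asks only whether positive benefits outnumber negative ones more often than chance would predict, independent of their magnitudes.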
3.3. Regression Analysis Results
For the CI alone condition, the linear regression models revealed that CI duration was marginally associated with lexical tone recognition across listening conditions (F = 3.65, p = 0.08). Specifically, CI duration was a significant predictor for lexical tone recognition in noise (F = 4.89, p = 0.047), with increased CI use contributing to a higher accuracy score of tonal recognition in noise, but not in the quiet condition (p = 0.45). Similar results were obtained for the CI + HA condition. CI duration was significantly correlated with lexical tone recognition in noise (F = 5.59, p = 0.036), but not in quiet (p = 0.4).
The duration of bimodal use was a significant predictor for the bimodal benefit in lexical tone recognition across the two listening conditions (F = 5.73, p = 0.034). Overall, the normalized benefit scores improved as a function of increase in the combined use of CI and HA. Moreover, the PTA at three low frequencies in the non-implanted ear was significantly correlated with bimodal benefit in quiet (F = 8.09, p = 0.015), with a lower PTA resulting in a higher normalized benefit score. A visual inspection of residual plots revealed that the residuals were normally distributed without any obvious deviations from homoscedasticity for each model.
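The duration effects reported above amount to simple linear regressions of performance (or benefit) on usage time. A sketch on illustrative, made-up data (not the study's actual values):

```python
import numpy as np

# Illustrative data only: duration of bimodal use (months) and each
# child's mean normalized benefit score.
duration = np.array([6.0, 10.0, 14.0, 18.0, 22.0, 26.0, 30.0, 34.0])
benefit = np.array([2.0, 8.0, 11.0, 16.0, 19.0, 27.0, 30.0, 38.0])

# Ordinary least squares fit of benefit ~ b0 + b1 * duration.
X = np.column_stack([np.ones_like(duration), duration])
coef, *_ = np.linalg.lstsq(X, benefit, rcond=None)
b0, b1 = coef

# A positive slope b1 corresponds to the reported pattern: benefit
# scores improve as combined CI + HA experience accumulates.
r = np.corrcoef(duration, benefit)[0, 1]  # Pearson correlation
```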