Spatial Release from Masking with Simulated Electric–Acoustic and Cochlear Implant Speech

Srinivasan, Nirmal; Borkowski, Bailey; Barkhouse, Morgan; Patro, Chhayakanta

doi:10.3390/ohbm7010015


by Nirmal Srinivasan *, Bailey Borkowski, Morgan Barkhouse and Chhayakanta Patro

Department of Speech-Language Pathology and Audiology, Towson University, Towson, MD 21252, USA

* Author to whom correspondence should be addressed.

Portions of the data were presented at the 186th Meeting of the Acoustical Society of America.

J. Otorhinolaryngol. Hear. Balance Med. 2026, 7(1), 15; https://doi.org/10.3390/ohbm7010015

Submission received: 6 March 2026 / Revised: 2 April 2026 / Accepted: 3 April 2026 / Published: 16 April 2026

(This article belongs to the Section Otology and Neurotology)


Abstract

Background/Objectives: Spatial release from masking (SRM) refers to the improvement in speech understanding that occurs when a target talker is spatially separated from competing speech. Although normal-hearing (NH) listeners benefit substantially from spatially separating the maskers from the target, cochlear implant (CI) users obtain markedly smaller advantages because spectral and binaural cues are poorly transmitted. Electric–acoustic stimulation (EAS), which combines preserved low-frequency acoustic hearing with electric stimulation, may partially restore these cues, but its benefits at small, conversationally relevant spatial separations remain poorly understood. Methods: NH listeners were tested with natural, simulated EAS, and simulated CI speech across five spatial configurations (0°, ±5°, ±10°, ±15°, ±30°). Speech identification thresholds were measured using a one-up/one-down adaptive procedure with Coordinate Response Measure (CRM) sentences. CI simulation used an eight-channel noise-band vocoder, whereas EAS simulation replaced the two lowest-frequency vocoder channels with low-pass-filtered speech (≤500 Hz). All stimuli were spatialized using head-related impulse responses generated from a validated virtual-acoustics model. Results: Thresholds improved with increasing spatial separation for all stimulus types; however, the magnitude of SRM varied systematically. Natural speech produced the lowest thresholds and largest SRM, EAS speech yielded intermediate benefits, and simulated CI speech produced the smallest improvements. Notably, EAS and CI simulations were comparable at small separations, but EAS provided significantly greater SRM at ±15° and ±30°.
Conclusions: These findings indicate that even partial preservation of low-frequency acoustic information enhances SRM at moderate spatial separations, highlighting the potential value of EAS configurations for improving spatial hearing in the multi-talker environments encountered by CI users.

Keywords:

spatial release from masking; electric–acoustic simulation; simulated cochlear implant speech; small spatial separations

1. Introduction

Understanding speech in the presence of competing talkers, often referred to as the cocktail party problem [1], relies heavily on spatial hearing mechanisms. Listeners with normal hearing (NH) routinely benefit from spatial release from masking (SRM), defined as improved speech recognition when a target and masker differ in spatial location. Classical psychoacoustic models attribute SRM to two primary components: better-ear listening, arising from acoustic head-shadow effects that improve the signal-to-noise ratio (SNR) at one ear, and binaural unmasking, which leverages interaural time and level differences (ITDs and ILDs) to enhance segregation of competing sources [2,3]. Experimental work consistently shows that NH listeners can obtain substantial SRM, often 6–10 dB or more, depending on masker type, acoustic environment, and spatial configuration [4,5,6,7]. SRM also interacts with contextual factors such as reverberation, which disproportionately reduces energetic SRM while leaving informational masking benefits partially intact [5,8].

For cochlear implant (CI) users, speech-in-noise perception remains one of the most persistent functional limitations. CI processors transmit temporal envelope cues relatively well but degrade the temporal fine structure and low-frequency timing cues essential for accurate ITD-based binaural processing [9]. As a result, SRM for CI listeners is often significantly smaller than that observed for NH individuals, particularly when maskers are symmetrically arranged or at small spatial separations that require precise binaural analysis [3,9,10]. In contrast, electric–acoustic stimulation (EAS, also called a hybrid cochlear implant (HCI)), which combines a CI for high frequencies with preserved acoustic hearing for low frequencies, has been shown to restore access to low-frequency temporal fine structure, ITDs, and other cues that strongly support spatial hearing and speech-in-noise performance. Large-scale clinical studies report better speech perception, better subjective spatial hearing, and higher sound quality in EAS users than in CI listeners, provided that low-frequency hearing is adequately preserved [10,11].

To better understand the mechanisms underlying these effects, many studies employ vocoder simulations, which enable systematic manipulation of spectral resolution, channel interactions, spatial geometry, and masker type while approximating key characteristics of CI or EAS processing [12,13]. Critically, binaural advantages can persist even when spectral resolution is reduced by vocoding, provided that interaural cues are preserved. Listeners have been shown to obtain nearly normal binaural benefits with spectrally degraded speech when target and interferers were spatially separated, reinforcing the central role of binaural cues in SRM under CI-like processing [14]. Recent simulation work has further shown that spectral smearing, a correlate of channel interaction, elevates speech reception thresholds and modulates the magnitude of SRM, highlighting the dependence of spatial advantages on spectrotemporal fidelity [15]. EAS simulations, which combine low-pass acoustic speech with vocoded high-frequency bands, consistently demonstrate better masking release and spatial cue access than electric-only simulations, reflecting the critical role of residual low-frequency information [16].

A key knowledge gap concerns small spatial separations. Historically, studies of SRM in CI and EAS listeners have used large separations (e.g., ±90°) that maximize head-shadow benefits but do not reflect everyday communication, where talkers are often only a few degrees apart. Recent work using simulated CI speech with small spatial separations (e.g., ±2° to ±30°) showed that NH listeners exhibit significantly reduced SRM for vocoded speech relative to natural speech, reflecting limitations imposed by poor spectral resolution and degraded interaural cues [7]. Studies comparing children with bilateral CIs and their NH peers have emphasized trade-offs between head-shadow and interaural-difference cues and provide a functionally relevant metric that could be extended to adult simulations [17]. These findings underscore that when angular separation is small, head-shadow advantages alone are insufficient; instead, listeners depend heavily on fine-grained binaural cues, precisely the cues that EAS seeks to restore [2,16].

Environmental factors further constrain SRM. Reverberation reduces interaural coherence and smears amplitude envelopes, producing stronger decrements in spatial benefits for CI and EAS users than for NH listeners [8,18]. Likewise, device-level factors such as behind-the-ear microphone placement can distort spatial cues before they ever reach binaural pathways [9]. Another critical layer involves electric–acoustic interactions. Psychoacoustic studies show that electric stimulation can elevate thresholds for acoustic probe tones within overlapping frequency regions, reducing the benefit of low-frequency hearing unless fittings are carefully optimized [19]. Computational modeling work reveals that electric–acoustic interactions influence neural firing synchrony, phase locking, and dynamic range, providing mechanistic explanations for why EAS sometimes yields robust benefits and other times does not [20,21].

Although the present study employs NH listeners, the primary goal is not to directly generalize outcomes to CI or EAS users. Rather, NH listeners are used as a mechanistic model to isolate the effects of spectral degradation and partial restoration of low-frequency acoustic cues on SRM under tightly controlled conditions. This approach minimizes confounding influences common in clinical populations, such as auditory deprivation, neural plasticity, etiology, electrode placement variability, and device-specific fitting strategies. Vocoder-based simulations in NH listeners have therefore been widely used to estimate process-level constraints and theoretical upper bounds on spatial hearing performance with CI and EAS processing, while acknowledging that real-world clinical outcomes may differ.

The present study builds on this foundation by quantifying SRM at small, conversationally relevant separations using natural speech, simulated CI speech, and simulated EAS speech in NH listeners. By holding the acoustic scene constant while manipulating listening mode, the study investigated the incremental contribution of low-frequency acoustic cues to SRM at small target-to-masker separations, where head-shadow benefits alone are minimal. It is hypothesized that simulated EAS will yield lower speech identification thresholds and greater SRM than simulated CI speech but will not fully match natural speech performance. It is also expected that differences among listening modes will reflect not only energetic SNR advantages but also sensitivity to perceptual segregation cues.

2. Methods

2.1. Listeners

Twenty-two young adults with normal hearing (mean age = 21.3 years; age range: 19–23 years) participated in the study. Air-conduction audiometric thresholds were obtained for all participants, and normal hearing was confirmed as thresholds ≤15 dB HL at octave frequencies from 250 to 8000 Hz. None of the participants demonstrated audiometric asymmetry, defined as interaural threshold differences greater than 10 dB. All study procedures were reviewed and approved by Towson University’s Institutional Review Board, and participants received financial compensation for their time.

2.2. Stimuli

Three male talkers from the Coordinate Response Measure (CRM, [22]) corpus were used in the experiment. CRM sentences follow the fixed carrier phrase: “Ready [CALL SIGN] go to [COLOR] [NUMBER] now.” The corpus includes eight possible call signs (Arrow, Baron, Charlie, Eagle, Hopper, Laker, Ringo, and Tiger), four colors (Blue, Red, White, and Green), and eight numbers (1–8), resulting in 256 unique combinations. On each trial, listeners heard one target sentence, identified by the call sign “Charlie”, presented concurrently with two masker sentences that used different call signs. Each talker produced a distinct color–number combination, and listeners responded by selecting the color and number combination for the call sign “Charlie”.
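The trial structure described above can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code; the function names (`sample_trial`, `is_correct`) are hypothetical, and the sketch assumes that the three sentences on a trial differ in call sign, color, and number, which is one reading of "distinct color–number combination". It also checks the combinatorics stated in the text: 8 × 4 × 8 = 256 sentences, and a chance level of 1/32 for a four-color by eight-number response.

```python
import itertools
import random

# Response alternatives of the CRM corpus, as listed in the text.
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["Blue", "Red", "White", "Green"]
NUMBERS = list(range(1, 9))
TARGET_CALL_SIGN = "Charlie"


def sample_trial(rng):
    """Draw one target sentence and two masker sentences (hypothetical helper).

    Assumption: all three sentences differ in call sign, color, and number.
    """
    masker_signs = rng.sample([c for c in CALL_SIGNS if c != TARGET_CALL_SIGN], 2)
    signs = [TARGET_CALL_SIGN] + masker_signs
    colors = rng.sample(COLORS, 3)
    numbers = rng.sample(NUMBERS, 3)
    return [
        {"call_sign": s, "color": c, "number": n, "role": "target" if i == 0 else "masker"}
        for i, (s, c, n) in enumerate(zip(signs, colors, numbers))
    ]


def is_correct(trial, color, number):
    """Score a response: both the color and the number must match the target sentence."""
    target = trial[0]
    return target["color"] == color and target["number"] == number


# Size of the corpus and chance level of the color-number response.
n_sentences = len(list(itertools.product(CALL_SIGNS, COLORS, NUMBERS)))  # 8 * 4 * 8
chance = 1 / (len(COLORS) * len(NUMBERS))  # 1/32
```

Under this scoring, reporting a masker's color together with the target's number counts as incorrect, which is what keeps chance at 1/32 rather than higher.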

The CRM corpus was selected because it provides tightly controlled speech materials that minimize semantic predictability while producing robust informational masking. Although the response set is constrained, chance performance is low (1/32, i.e., one of four colors × eight numbers), and the adaptive threshold-tracking procedure targets performance well above chance, making systematic guessing unlikely to influence threshold estimation. The CRM corpus has been extensively validated in multi-talker and spatial-hearing research, including studies of spatial release from masking in normal-hearing listeners and cochlear implant users [6,7,8,9,23].

For the natural, simulated CI, and simulated EAS speech conditions, the target and masker signals were first convolved with location-specific head-related impulse responses (HRIRs) to generate stimuli containing appropriate binaural spatial cues. In the natural-speech conditions, the resulting direction-dependent signals were summed and presented binaurally over headphones, with the left- and right-ear signals delivered to the corresponding earphones. In the simulated CI and simulated EAS conditions, the direction-dependent signals were summed and subsequently processed through the corresponding CI or EAS simulator before being delivered over headphones. Across all listening conditions, the target level remained fixed while the masker level was adaptively varied on each trial.
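A minimal sketch of the spatialization step, assuming each source position is represented by a left/right HRIR pair sampled at the stimulus rate. The HRIR arrays are placeholders (the study used image-model HRIRs [33]), the helper names are hypothetical, and defining TMR on the RMS level computed across both ears is an assumption; the text specifies only that the target level was fixed while the masker level varied. For the simulated CI and EAS conditions, each ear's channel of the returned mixture would presumably be passed through the simulator separately.

```python
import numpy as np
from scipy.signal import fftconvolve


def render(source, h_left, h_right):
    """Convolve a mono source with a left/right HRIR pair; returns an (n, 2) array."""
    left = fftconvolve(source, h_left)
    right = fftconvolve(source, h_right)
    out = np.zeros((max(left.size, right.size), 2))
    out[: left.size, 0] = left
    out[: right.size, 1] = right
    return out


def add_padded(a, b):
    """Sum two (n, 2) arrays of different lengths by zero-padding the shorter one."""
    out = np.zeros((max(len(a), len(b)), 2))
    out[: len(a)] += a
    out[: len(b)] += b
    return out


def scale_to_tmr(target_bin, masker_bin, tmr_db):
    """Rescale the masker so that 20*log10(rms_target / rms_masker) equals tmr_db.

    Assumption: RMS is taken over both ears and the whole signal.
    """
    rms_t = np.sqrt(np.mean(target_bin ** 2))
    rms_m = np.sqrt(np.mean(masker_bin ** 2))
    return masker_bin * rms_t / (rms_m * 10 ** (tmr_db / 20))


def binaural_mixture(target, maskers, hrir_target, hrir_maskers, tmr_db):
    """Target at a fixed level plus spatialized maskers scaled to the requested TMR."""
    target_bin = render(target, *hrir_target)
    masker_bin = np.zeros((0, 2))
    for masker, hrir in zip(maskers, hrir_maskers):
        masker_bin = add_padded(masker_bin, render(masker, *hrir))
    return add_padded(target_bin, scale_to_tmr(target_bin, masker_bin, tmr_db))
```

Keeping the target fixed and scaling only the summed masker mirrors the description above; the per-ear summation happens before any vocoder processing.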

2.3. Cochlear Implant Simulation

Spectral degradation was introduced using a noise-band vocoder, a widely used method for simulating cochlear implant (CI) signal processing [24]. An eight-channel vocoder configuration was selected because it yields speech-recognition outcomes comparable to those achieved by high-performing CI users [25,26]. The input bandwidth was restricted to 150–8000 Hz, after which the stimuli were divided into eight frequency bands using fourth-order Butterworth filters (24 dB/octave). Band cutoff frequencies were assigned according to the Greenwood frequency-position mapping [27], ensuring a distribution that reflects cochlear tonotopy. Within each band, the temporal envelope was extracted through half-wave rectification followed by low-pass filtering at 160 Hz (24 dB/octave). These envelopes were then used to modulate corresponding band-limited noise carriers. Finally, all eight modulated signals were summed to produce the noise-vocoded stimuli, which were presented bilaterally to simulate CI listening conditions.
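The CI simulation can be sketched as below, using Python with NumPy/SciPy in place of the MATLAB implementation. Band edges are spaced equally in cochlear place using Greenwood's human map, F = 165.4(10^(2.1x) − k), with k = 0.88 assumed (some implementations use k = 1). Reading "fourth-order Butterworth (24 dB/octave)" as a single pass of `butter(4, ...)`, and matching the output RMS to the input, are further assumptions not stated in the text.

```python
import numpy as np
from scipy.signal import butter, sosfilt

# Greenwood human cochlear map: F = A * (10**(ALPHA * x) - K), x = relative place (0 = apex).
# K = 0.88 is an assumed constant; some implementations use K = 1.
A, ALPHA, K = 165.4, 2.1, 0.88


def greenwood_edges(f_lo, f_hi, n_channels):
    """Return n_channels + 1 band edges (Hz) equally spaced in cochlear place."""
    x_lo = np.log10(f_lo / A + K) / ALPHA
    x_hi = np.log10(f_hi / A + K) / ALPHA
    x = np.linspace(x_lo, x_hi, n_channels + 1)
    return A * (10 ** (ALPHA * x) - K)


def noise_vocode(signal, fs, n_channels=8, f_lo=150.0, f_hi=8000.0,
                 env_cutoff=160.0, seed=0):
    """Minimal noise-band vocoder sketch for one ear; fs must exceed 2 * f_hi."""
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    edges = greenwood_edges(f_lo, f_hi, n_channels)
    # Envelope smoothing filter: 4th-order Butterworth low-pass (24 dB/octave).
    env_sos = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Assumption: "fourth-order" read as butter(4, ...), i.e. 24 dB/octave skirts.
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(band_sos, signal)
        # Half-wave rectification followed by low-pass filtering gives the envelope.
        env = sosfilt(env_sos, np.maximum(band, 0.0))
        env = np.maximum(env, 0.0)  # clip small negative filter ripple
        # Band-limited noise carrier for this channel, modulated by the envelope.
        carrier = sosfilt(band_sos, rng.standard_normal(signal.size))
        out += env * carrier
    # Assumption: scale the output to the input RMS (level handling is not specified).
    out *= np.sqrt(np.mean(signal ** 2)) / (np.sqrt(np.mean(out ** 2)) + 1e-12)
    return out
```

Because the 8000-Hz upper edge must lie below the Nyquist frequency, a sampling rate such as 44.1 kHz is assumed when calling `noise_vocode`.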

2.4. Electric–Acoustic Simulation

To simulate EAS, the input signal was first bandwidth-limited to 150–7000 Hz and divided into eight analysis bands using fourth-order Butterworth band-pass filters. The two lowest-frequency vocoder channels were then replaced with a low-pass-filtered version of the original speech (cutoff = 500 Hz; fourth-order Butterworth), approximating the range of residual acoustic hearing typically preserved in EAS users. This approach follows established methods in the literature [28,29,30] and reflects patterns of low-frequency hearing retention commonly reported in cochlear implant recipients [31,32]. The remaining six channels underwent standard noise-band vocoder processing, in which the temporal envelope of each band was extracted and used to modulate a band-limited noise carrier. Finally, the low-frequency acoustic component and the six vocoded high-frequency channels were summed to generate the simulated EAS stimuli.
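A corresponding sketch of the EAS simulation, under the same assumptions as a Python noise vocoder (Greenwood constants with k = 0.88, single-pass `butter(4, ...)` filters). With eight bands between 150 and 7000 Hz under this mapping, the upper edge of the second band falls at about 510 Hz, close to the 500-Hz low-pass cutoff, so replacing the two lowest channels with low-pass speech leaves little spectral gap or overlap. The acoustic and electric components are summed without any relative level adjustment, which the text does not specify; the function name `eas_simulate` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfilt

A, ALPHA, K = 165.4, 2.1, 0.88  # Greenwood human map constants (K = 0.88 assumed)


def greenwood_edges(f_lo, f_hi, n_channels):
    """Band edges (Hz) equally spaced in cochlear place."""
    x = np.linspace(np.log10(f_lo / A + K) / ALPHA,
                    np.log10(f_hi / A + K) / ALPHA, n_channels + 1)
    return A * (10 ** (ALPHA * x) - K)


def eas_simulate(signal, fs, n_channels=8, n_acoustic=2, f_lo=150.0, f_hi=7000.0,
                 lp_cutoff=500.0, env_cutoff=160.0, seed=0):
    """Minimal EAS sketch for one ear: low-pass speech plus vocoded upper channels."""
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    # Acoustic component: 4th-order Butterworth low-pass of the original speech.
    lp_sos = butter(4, lp_cutoff, btype="lowpass", fs=fs, output="sos")
    acoustic = sosfilt(lp_sos, signal)
    # Electric component: vocode only channels n_acoustic .. n_channels - 1.
    # The band-pass filters confine this part to f_lo..f_hi.
    edges = greenwood_edges(f_lo, f_hi, n_channels)
    env_sos = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
    electric = np.zeros_like(signal)
    for lo, hi in zip(edges[n_acoustic:-1], edges[n_acoustic + 1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        # Half-wave rectify the band, then low-pass to obtain its envelope.
        env = sosfilt(env_sos, np.maximum(sosfilt(band_sos, signal), 0.0))
        carrier = sosfilt(band_sos, rng.standard_normal(signal.size))
        electric += np.maximum(env, 0.0) * carrier
    # Assumption: components are summed with no relative level adjustment.
    return acoustic + electric
```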

2.5. Conditions

A virtual auditory spatial array was used to present all speech stimuli. Head-related impulse responses (HRIRs) were generated following the procedures described in [33], which use an image-model-based simulation to compute the directions, delays, and attenuations of early reflections [34]. These reflections, together with the direct path, were then spatially rendered using non-individualized head-related transfer functions (HRTFs). This simulation method has been shown to yield HRIRs that closely approximate the physical and perceptual characteristics of those measured in real acoustic environments. Five spatial configurations were tested: a colocated condition in which the target and both maskers were presented from 0° azimuth, and four spatially separated conditions with the target fixed at 0° and the maskers symmetrically positioned at ±5°, ±10°, ±15°, or ±30°. The HRIR set used in this study was the same as that employed in previous work examining the effects of small spatial separations between the target and the maskers on SRM using simulated CI speech [7].
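To give a sense of how small the interaural cues are at the tested masker azimuths, the sketch below uses Woodworth's spherical-head approximation, ITD ≈ (r/c)(θ + sin θ). This is an idealized far-field model, not the image-model HRIRs used in the study, and the head radius (8.75 cm) and speed of sound (343 m/s) are assumed values.

```python
import math

HEAD_RADIUS_M = 0.0875  # typical adult head radius (assumption)
SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature (assumption)


def woodworth_itd_us(azimuth_deg):
    """Spherical-head (Woodworth) ITD magnitude in microseconds for a far-field source."""
    theta = math.radians(abs(azimuth_deg))
    return 1e6 * HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


for az in (0, 5, 10, 15, 30):
    print(f"masker at +/-{az} deg: ITD ~ {woodworth_itd_us(az):.1f} us")
```

Under these assumptions, maskers at ±5°, ±10°, ±15°, and ±30° carry ITDs of roughly 45, 89, 133, and 261 µs, with opposite signs on the two sides of the head.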

2.6. Procedure

All participants were seated in a double-walled, sound-treated audiology booth in the Spatial Hearing and Auditory PErception (SHAPE) laboratory at Towson University. Auditory stimuli were delivered through circumaural headphones (Sennheiser HD 650; Sennheiser, Hanover, Germany). Stimulus generation was performed in MATLAB (MathWorks Inc., Natick, MA, USA), and signals were presented via a Lynx Hilo audio interface (Lynx Studio Technology, Costa Mesa, CA, USA).

A one-up/one-down adaptive procedure [35], based on the accuracy of reporting the color–number combination of the target sentence, was used to estimate the target-to-masker ratio (TMR) required to correctly identify the target 50% of the time. In all trial blocks, the target speech was presented in the presence of competing masking speech. The target speech was presented at 20 dB above the pure-tone average (PTA) at 0.5, 1, 2, and 4 kHz, and the masker level was adjusted to achieve the required TMR. After each correct response, the masker level was increased by 5 dB; after each incorrect response, it was decreased by 5 dB. After the first three reversals, the step size was reduced to 1 dB. Each adaptive track included nine reversals, and the TMR threshold was calculated as the mean of the final six reversals. Stimulus type varied randomly across trial blocks, and threshold estimates were averaged across three adaptive tracks for each spatial separation tested. Participants responded using a computer monitor positioned directly in front of them, and feedback (“Correct” or “Incorrect”) was provided after each trial. Testing was self-paced, and listeners were encouraged to take breaks as needed to minimize fatigue and attentional effects. All testing was completed within a single experimental session lasting approximately two hours; no testing was spread across multiple days. These procedures were implemented to promote stable performance.
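The adaptive rule can be sketched as a simple staircase. This Python version is illustrative: the starting TMR (+10 dB), the trial cap, and the simulated listener (`logistic_listener`, hypothetical) are assumptions not given in the text, while the step sizes (5 dB, then 1 dB after three reversals), the nine-reversal stopping rule, and averaging of the last six reversals follow the procedure described above. A correct response raises the masker by one step, which lowers the TMR, so the track converges on the TMR yielding about 50% correct.

```python
import math
import random
from statistics import mean
from typing import Callable, List, Optional, Tuple


def run_track(p_correct: Callable[[float], float], start_tmr: float = 10.0,
              big_step: float = 5.0, small_step: float = 1.0, n_switch: int = 3,
              n_reversals: int = 9, n_average: int = 6,
              rng: Optional[random.Random] = None,
              max_trials: int = 500) -> Tuple[float, List[float]]:
    """One-up/one-down track on TMR (dB); returns (threshold, reversal TMRs)."""
    if rng is None:
        rng = random.Random()
    tmr = start_tmr
    reversals: List[float] = []
    last_dir = 0  # +1: TMR was raised, -1: TMR was lowered
    for _ in range(max_trials):
        correct = rng.random() < p_correct(tmr)
        # Correct -> masker up one step (TMR down); incorrect -> masker down (TMR up).
        direction = -1 if correct else 1
        if last_dir != 0 and direction != last_dir:
            reversals.append(tmr)
            if len(reversals) == n_reversals:
                break
        last_dir = direction
        # 5-dB steps until the third reversal, 1-dB steps afterwards.
        step = big_step if len(reversals) < n_switch else small_step
        tmr += direction * step
    if len(reversals) < n_reversals:
        raise RuntimeError("track ended before reaching the required reversals")
    return mean(reversals[-n_average:]), reversals


def logistic_listener(midpoint_db: float, slope_per_db: float = 0.5) -> Callable[[float], float]:
    """Hypothetical listener: P(correct) is a logistic function of TMR (guessing ignored)."""
    return lambda tmr: 1.0 / (1.0 + math.exp(-slope_per_db * (tmr - midpoint_db)))


# Usage: one condition threshold = mean of three independent tracks.
rng = random.Random(1)
listener = logistic_listener(-8.0)
condition_threshold = mean(run_track(listener, rng=rng)[0] for _ in range(3))
```

Because chance in the CRM task is 1/32 rather than zero, a more faithful simulated listener would include a guessing floor; it is omitted here for simplicity.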

2.7. Data Analysis

All statistical analyses were conducted using SPSS version 28.0 (IBM Corp., Armonk, NY, USA). Repeated-measures analyses of variance (RM-ANOVAs) were employed to examine differences in speech identification thresholds across spatial-separation conditions for natural, simulated CI, and simulated EAS speech. In addition, Pearson correlation analyses were performed to assess the relationships among identification thresholds for the three stimulus types at each spatial separation tested.

3. Results

Figure 1 shows the mean target-to-masker ratios (±1 standard error of the mean) required to identify the target signal 50% of the time at the five spatial separations tested in this experiment. Within each panel, the darker line indicates the mean thresholds, while the lighter lines indicate individual listeners’ thresholds. An RM-ANOVA was conducted with stimulus type (natural, EAS, and CI simulated speech) and spatial separation (0°, ±5°, ±10°, ±15°, and ±30°) as within-subject factors and TMR threshold as the dependent variable. Mauchly’s test indicated that the assumption of sphericity had been violated for spatial separation; therefore, degrees of freedom were corrected using Greenhouse–Geisser estimates of sphericity (χ²(9) = 24.18, p = 0.004, ε = 0.57). Results indicated a significant main effect of stimulus type (F(2, 40) = 462.13, p < 0.001, partial η² = 0.96, a very large effect) and a significant main effect of spatial separation (F(2.29, 45.70) = 178.90, p < 0.001, partial η² = 0.90, a very large effect) on TMR thresholds. There was also a significant interaction between stimulus type and spatial separation (F(8, 160) = 21.02, p < 0.001, partial η² = 0.51, a large effect).
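The reported effect sizes can be cross-checked from the F statistics, since partial η² = F·df_effect / (F·df_effect + df_error). Because the Greenhouse–Geisser correction scales both degrees of freedom by the same ε, it cancels out of this expression. A minimal Python check using only the values reported in this section:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# F statistics as reported for the omnibus RM-ANOVA on TMR thresholds.
reported = {
    "stimulus type": (462.13, 2, 40),
    "spatial separation (Greenhouse-Geisser df)": (178.90, 2.29, 45.70),
    "stimulus type x separation": (21.02, 8, 160),
}
for effect, (f, df1, df2) in reported.items():
    print(f"{effect}: partial eta^2 = {partial_eta_squared(f, df1, df2):.2f}")
```

Rounded to two decimals, this reproduces the reported 0.96, 0.90, and 0.51; the same formula applies to the follow-up RM-ANOVAs.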

To better understand the significant interaction, separate RM-ANOVAs were conducted for each stimulus type. There was a significant effect of spatial separation on TMR thresholds for all three types of speech stimuli (natural speech: F(4, 80) = 167.20, p < 0.001, partial η² = 0.89; EAS speech: F(4, 80) = 44.17, p < 0.001, partial η² = 0.69; CI simulated speech: F(4, 80) = 47.34, p < 0.001, partial η² = 0.70; all large effects). Post hoc paired-samples t-tests with Bonferroni correction showed that TMR thresholds at every spatially separated condition were significantly better (lower) than in the colocated condition (all p < 0.05) for natural, EAS, and simulated CI speech.

Spatial release from masking (SRM) was calculated as the difference between the TMR threshold in the colocated condition and that in each spatially separated condition, so that positive values indicate a benefit of separation. Figure 2 shows the mean SRM (±1 standard error of the mean) at the different spatial separations for the three types of speech stimuli. To investigate the effect of speech stimulus type on SRM, an RM-ANOVA was conducted with stimulus type (natural, EAS, and CI simulated speech) and spatial separation (±5°, ±10°, ±15°, and ±30°) as within-subject factors. There was a significant interaction between stimulus type and spatial separation (F(6, 120) = 14.27, p < 0.001, partial η² = 0.42, a large effect). Simple-effects analyses indicated that natural speech resulted in significantly larger SRM than EAS and simulated CI speech at all spatial separations. SRM did not differ significantly between EAS and simulated CI speech at ±5° and ±10° of target–masker separation. However, at larger separations (±15° and ±30°), SRM was significantly greater for EAS speech than for simulated CI speech (all p < 0.001). Moreover, this difference in SRM grew larger as the spatial separation between the target and maskers increased.
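SRM and the summary statistics plotted in Figure 2 can be computed per listener as sketched below. The threshold values are hypothetical and serve only to show the arithmetic; they are not data from this study. The sign convention assumed here is colocated minus separated, so that positive values indicate a benefit of separation, and the function names are illustrative.

```python
import numpy as np

SEPARATIONS_DEG = (0, 5, 10, 15, 30)  # masker azimuths (+/-); 0 = colocated


def srm_db(thresholds):
    """SRM per listener: colocated TMR threshold minus each separated threshold.

    thresholds: array-like of shape (n_listeners, 5), columns ordered as SEPARATIONS_DEG.
    Returns shape (n_listeners, 4); positive values = benefit of separation.
    """
    t = np.asarray(thresholds, dtype=float)
    return t[:, [0]] - t[:, 1:]


def mean_and_sem(values):
    """Column means and standard errors of the mean (sample SD with ddof=1)."""
    v = np.asarray(values, dtype=float)
    return v.mean(axis=0), v.std(axis=0, ddof=1) / np.sqrt(v.shape[0])


# Hypothetical thresholds (dB TMR) for two listeners; NOT data from this study.
example = [[2.0, 0.0, -3.0, -6.0, -10.0],
           [1.0, 0.5, -2.0, -4.0, -8.0]]
srm = srm_db(example)  # [[2, 5, 8, 12], [0.5, 3, 5, 9]]
srm_mean, srm_sem = mean_and_sem(srm)
```

The colocated column is kept as a (n, 1) slice so that NumPy broadcasting subtracts it from every separated column in one step.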

Correlational analyses were performed to investigate the relationships among individual TMR thresholds for the three kinds of speech stimuli at each spatial separation. Scatterplots of these relationships at the different spatial separations tested are shown in Figure 3. All correlations were positive and statistically significant at every spatial separation, with Pearson’s r (df = 20 for all conditions) ranging from 0.39 to 0.65. These results indicate that listeners with better thresholds in the natural speech condition tended to have better thresholds in the EAS and simulated CI speech conditions as well.

4. Discussion

The present study examined speech identification thresholds and spatial release from masking (SRM) for natural, EAS, and CI simulated speech across a range of small-to-moderate spatial separations. Consistent with the “cocktail party” literature, spatial separation produced robust reductions in target-to-masker ratio (TMR) thresholds overall, but the size of the benefit depended strongly on stimulus type, revealing a large interaction between stimulus fidelity and spatial configuration. These findings reinforce and refine the established accounts of spatial hearing that integrate energetic unmasking via head shadow with binaural unmasking and object-based attentional selection.

4.1. Summary of Principal Findings

Across natural, EAS, and simulated CI speech, spatial separation between a target and competing talkers yielded robust spatial release from masking (SRM), with natural > EAS > simulated CI speech performance and a stimulus type × separation interaction. Post hoc contrasts confirmed significantly better TMR thresholds at all spatially separated conditions for each stimulus class, and SRM grew with increased spatial separation. Correlations of individual thresholds across stimulus types indicate shared listener factors that generalize from natural to degraded speech. These patterns dovetail with the classic SRM literature showing that separating speech sources in azimuth improves intelligibility via better-ear SNR, binaural unmasking (ITD/ILD), and facilitated auditory object selection, particularly when maskers are other talkers [4,5,6,7,8,23].

4.2. Main Effects of Stimulus Type and Spatial Separation

The strong main effect of stimulus type, with the best thresholds for natural speech, intermediate thresholds for EAS, and the poorest thresholds for simulated CI speech, aligns with evidence that spectral fidelity and access to pitch/fundamental-frequency cues facilitate segregation of competing talkers and reduce informational masking. Degrading speech via vocoding reduces spectral resolution and temporal fine-structure (TFS) cues that support stream formation and pitch-based grouping, elevating speech-on-speech thresholds [12,36]. The finding of the smallest SRM for simulated CI speech echoes reports that limited encoding of ITDs and coarse envelope cues constrain binaural advantages under vocoding or electric hearing. The pronounced main effect of spatial separation on TMR thresholds is a hallmark of SRM: separating target and maskers yields benefits from head shadow (better-ear SNR), interaural time differences (ITDs), and interaural level differences (ILDs) that enhance both energetic and informational unmasking. Classic and contemporary work shows that SRM can be substantial for speech maskers and that binaural cues contribute beyond better-ear SNR, especially when interferers are other talkers [4,5,6,7,8].

4.3. Interaction of Stimulus Type with Spatial Separation

The significant stimulus type × separation interaction indicates that spatial benefits scale with acoustic fidelity and with the reliance of SRM on binaural cues [37]. Specifically, the magnitude of SRM varied substantially by stimulus type. With natural speech, SRM increased steadily with separation, consistent with listeners leveraging high-resolution spectral and TFS cues together with binaural differences to segregate competing talkers. The literature similarly reports larger binaural advantages and SRM when interferers are speech, with effects growing as spatial separations widen (within limits) and as scene complexity increases [6,23,38]. Prior studies also show that when spectral cues are intact, binaural unmasking and spatial attention work synergistically to improve speech perception [39,40]. In contrast, CI-simulated speech produced the smallest SRM across separations, a pattern consistent with studies showing that vocoding and CI processors transmit envelope ITDs and ILDs only coarsely, often limiting binaural fusion and unmasking [41,42]. Notably, faithful ITD cues (in the fine structure or in the envelope) are critical; when ITDs are scrambled or poorly encoded, SRM collapses even if ILDs are available [42,43]. Because CI simulations attenuate or eliminate the temporal fine structure cues critical for localization and segregation, listeners could not benefit from spatial separation to the same degree as with natural speech. The EAS (hybrid) condition yielded intermediate SRM overall, overlapping with the CI simulation at small separations (±5°, ±10°) but diverging at larger separations (±15°, ±30°), where it outperformed the CI simulation and the gap grew with angle. This pattern is consistent with models and data suggesting that SRM increases with angular separation but at a diminishing rate, and that access to residual low-frequency acoustic information preferentially boosts the use of ITD cues at wider angles. The hybrid results therefore imply partial preservation of cue sets (e.g., low-frequency timing) that become increasingly useful as spatial separation grows [38,42].

4.4. Correlational Structure Across Stimulus Types

The positive correlations across stimulus types suggest stable individual listener traits that affect speech-in-noise ability regardless of the degree of spectral degradation. Listeners who performed well with natural speech generally performed well with EAS and simulated CI speech, consistent with individual differences in factors such as cognitive processing efficiency, working memory, or attentional control [44,45]. The moderate correlation magnitudes (r = 0.39–0.65) further imply that, although there is shared variance, each stimulus type also engages partially distinct perceptual or cognitive processes. When the bottom-up match between a degraded input and stored representations is poor (e.g., under vocoding), listeners are thought to rely more on explicit, working-memory-based processing, which may preserve the rank ordering of listeners across conditions [45,46].

4.5. Relevance to CI Hearing and EAS Strategies

The reduced SRM for CI-simulated speech echoes extensive evidence that bilateral CI users often show modest SRM and smaller binaural benefits than normal-hearing peers, largely due to limited access to precise ITDs and to interaural place mismatches. Hybrid approaches that preserve low-frequency acoustic hearing can restore some ITD sensitivity and improve spatial perception relative to purely electric hearing, which may help explain the hybrid advantage at larger separations in our data [42,47]. Moreover, clinical and review papers emphasize that preserved low-frequency hearing in the implanted ear (EAS) can enhance speech-in-noise and localization outcomes beyond bimodal fittings (a CI with a hearing aid in the contralateral ear), particularly in multi-talker scenarios, although benefits vary with how much low-frequency hearing is preserved and how devices are programmed (e.g., spectral overlap) [48,49].

Overall, these results sharpen the picture of SRM in speech-on-speech masking by demonstrating that the magnitude and growth of SRM with spatial separation depend on the integrity of spectral and temporal cues. With natural speech, listeners exploit complementary mechanisms (better-ear SNR, ITDs/ILDs, pitch and timbre cues, and object-based attention), yielding the largest SRM across separations. With CI-simulated speech, the selective loss of fine spectral detail and TFS reduces auditory object formation and limits access to precise interaural timing, compressing SRM. The hybrid condition falls in between, indicating that even partial acoustic preservation restores some critical cues, particularly at larger separations where timing-based information exerts greater leverage. This pattern is consistent with SRM models that partition contributions of angular separation vs. asymmetry, and with object-based accounts of attention in complex scenes [38,50].

4.6. Limitations and Future Directions

Two limitations merit emphasis. First, the study tested relatively small separations; larger angles, more talkers, and reverberant environments often reveal different balances of better-ear, binaural, and attentional contributions. Extending to ±60–±90°, adding room acoustics, or manipulating talker sex/F0 and head-orientation would provide a richer stress-test of mechanisms and may enlarge stimulus-type differences. Second, while CI simulations are invaluable, clinical bilateral CI cohorts are essential to validate predicted constraints (e.g., ITD/ILD sensitivity, spatial attention) and to probe how device synchronization and mapping shape SRM. Future work combining computational SRM models with listener-specific device profiles could specify realistic upper bounds on SRM for various CI strategies.

4.7. Implications

These findings underscore the importance of spectral fidelity for maximizing SRM. For CI users, the substantially reduced SRM observed in the simulated CI condition suggests that real-world communication in multi-talker environments remains challenging, even with spatial separation. Improving access to fine-structure cues or enhancing binaural processing in CI systems may therefore yield significant functional benefits. Meanwhile, the hybrid condition’s intermediate performance suggests potential advantages of EAS strategies that preserve low-frequency acoustic hearing, which has been shown to support improved spatial perception and speech segregation [48].

4.8. Scope and Generalizability

The present findings should be interpreted within the constraints of a simulation-based paradigm using normal-hearing listeners. Performance obtained with simulated CI and EAS speech does not imply that congenitally or postlingually deafened CI users would achieve comparable SRM magnitudes in real-world listening conditions. Instead, the results provide insight into how specific signal-processing constraints, such as reduced spectral resolution and partial restoration of low-frequency timing cues, shape spatial unmasking when other sources of biological and device-related variability are held constant. In this sense, the NH simulation framework offers a controlled means of examining the relative contributions of acoustic and binaural cues to SRM and identifying conditions under which EAS processing can offer advantages over electric-only stimulation. Clinical studies are necessary to determine how these mechanistic effects interact with long-term auditory experience, device use, and neural adaptation in CI and EAS users.

5. Conclusions

In sum, spatial separation consistently improved speech identification, but the size of this benefit depended strongly on stimulus type. Natural speech yielded the greatest SRM, EAS speech yielded intermediate SRM, and simulated CI speech yielded the least. EAS and CI simulations produced comparable benefit at small separations, whereas EAS provided significantly greater SRM at ±15° and ±30°. These results highlight both the power of spatial hearing under typical listening conditions and the challenges posed by reduced spectral resolution, with implications for auditory modeling, hearing prosthesis design, and clinical audiology. Together, they help explain why EAS configurations often outperform purely electric hearing in multi-talker environments and underscore the clinical value of preserving, and appropriately fitting, low-frequency acoustic hearing to maximize real-world spatial benefit.

Author Contributions

Conceptualization: N.S. and C.P.; Methodology: N.S., B.B., and M.B.; Data Collection: B.B. and M.B.; Data Analysis: N.S., B.B., M.B., and C.P.; Writing – Review and Editing: N.S. and C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partly supported by Towson University’s College of Health Professions’ Summer Undergraduate Research Internship, awarded to Bailey Borkowski, and by Towson University’s Seed Funding Grant, awarded to Nirmal Srinivasan.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the Towson University Institutional Review Board (Approval #1703016568, approval date 9 April 2020).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author; they are not publicly available due to privacy restrictions.

Acknowledgments

The authors would like to thank all individuals who participated in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Cherry, E.C. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 1953, 25, 975–979.
Culling, J.F.; Lavandier, M. Binaural unmasking and spatial release from masking. In Binaural Hearing; Litovsky, R.Y., Goupell, M.J., Fay, R.R., Popper, A.N., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 209–241.
Litovsky, R.Y. Spatial release from masking. Acoust. Today 2012, 8, 18–25.
Arbogast, T.L.; Mason, C.R.; Kidd, G., Jr. The effect of spatial separation on informational masking of speech in normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am. 2005, 117, 2169–2180.
Marrone, N.; Mason, C.R.; Kidd, G., Jr. The effects of hearing loss and age on the benefit of spatial separation between multiple talkers in reverberant rooms. J. Acoust. Soc. Am. 2008, 124, 3064–3075.
Srinivasan, N.K.; Jakien, K.M.; Gallun, F.J. Release from masking for small spatial separations: Effects of age and hearing loss. J. Acoust. Soc. Am. 2016, 140, EL73–EL78.
Srinivasan, N.; McCannon, S.; Patro, C. Spatial Release from Masking for Small Spatial Separations Using Simulated Cochlear Implant Speech. J. Otorhinolaryngol. Hear. Balance Med. 2024, 5, 18.
Srinivasan, N.; Stansell, M.; Gallun, F.J. The role of early and late reflections on spatial release from masking: Effects of age and hearing loss. J. Acoust. Soc. Am. 2017, 141, EL185–EL191.
D’Onofrio, K.; Richards, V.; Gifford, R. Spatial Release from Informational and Energetic Masking in Bimodal and Bilateral Cochlear Implant Users. J. Speech Lang. Hear. Res. 2020, 63, 3816–3833.
Li, C.; Kuhlmey, M.; Kim, A.H. Electroacoustic Stimulation. Otolaryngol. Clin. North. Am. 2019, 52, 311–322.
Reinhart, P.; Parkinson, A.; Gifford, R.H. Hybrid Cochlear Implant Outcomes and Improving Outcomes with Electric-Acoustic Stimulation. Otol. Neurotol. 2024, 45, e749–e755.
Loizou, P.C. Speech Processing in Vocoder-Centric Cochlear Implants. Adv. Oto-Rhino-Laryngol. 2006, 64, 109–143.
Chen, F.; Loizou, P.C. Predicting the Intelligibility of Vocoded Speech. Ear Hear. 2011, 32, 331–338.
Garadat, S.N.; Litovsky, R.Y.; Yu, G.; Zeng, F.G. Role of binaural hearing in speech intelligibility and spatial release from masking using vocoded speech. J. Acoust. Soc. Am. 2009, 126, 2522–2535.
Cychosz, M.; Xu, K.; Fu, Q.-J. Effects of spectral smearing on speech understanding and masking release in simulated bilateral cochlear implants. PLoS ONE 2023, 18, e0287728.
Williges, B.; Dietz, M.; Hohmann, V.; Jürgens, T. Spatial Release from Masking in Simulated Cochlear Implant Users with and Without Access to Low-Frequency Acoustic Hearing. Trends Hear. 2015, 19, 233121651561694.
Peng, Z.E.; Litovsky, R.Y. Novel approaches to measure spatial release from masking in children with bilateral cochlear implants. Ear Hear. 2022, 43, 101–114.
König, C.; Baumann, U.; Stöver, T.; Weissgerber, T. Impact of Reverberation on Speech Perception in Noise in Bimodal/Bilateral Cochlear Implant Users with and without Residual Hearing. J. Clin. Med. 2024, 13, 5269.
Imsiecke, M.; Krüger, B.; Büchner, A.; Lenarz, T.; Nogueira, W. Interaction Between Electric and Acoustic Stimulation Influences Speech Perception in Ipsilateral EAS Users. Ear Hear. 2019, 41, 868–882.
Kipping, D.; Nogueira, W. A Computational Model of a Single Auditory Nerve Fiber for Electric-Acoustic Stimulation. J. Assoc. Res. Otolaryngol. 2022, 23, 835–858.
Kipping, D.; Zhang, Y.; Nogueira, W. A Computational Model of the Electrically or Acoustically Evoked Compound Action Potential in Cochlear Implant Users with Residual Hearing. IEEE Trans. Biomed. Eng. 2024, 71, 3192–3203.
Bolia, R.S.; Nelson, W.T.; Ericson, M.A.; Simpson, B.D. A speech corpus for multitalker communications research. J. Acoust. Soc. Am. 2000, 107, 1065–1066.
Hawley, M.L.; Litovsky, R.Y.; Culling, J.F. The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer. J. Acoust. Soc. Am. 2004, 115, 833–843.
Shannon, R.V.; Zeng, F.; Kamath, V.; Wygonski, J.; Ekelid, M. Speech Recognition with Primarily Temporal Cues. Science 1995, 270, 303–304.
Friesen, L.M.; Shannon, R.V.; Baskent, D.; Wang, X. Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants. J. Acoust. Soc. Am. 2001, 110, 1150–1163.
Fu, Q.; Nogaki, G. Noise Susceptibility of Cochlear Implant Users: The role of Spectral Resolution and Smearing. J. Assoc. Res. Otolaryngol. 2005, 6, 19–27.
Greenwood, D.D. A cochlear frequency-position function for several species—29 years later. J. Acoust. Soc. Am. 1990, 87, 2592–2605.
Başkent, D. Effect of speech degradation on top-down repair: Phonemic restoration with simulations of cochlear implants and combined electric–acoustic stimulation. J. Assoc. Res. Otolaryngol. 2012, 13, 683–692.
Başkent, D.; Chatterjee, M. Recognition of temporally interrupted and spectrally degraded sentences with additional unprocessed low-frequency speech. Hear. Res. 2010, 270, 127–133.
Qin, M.K.; Oxenham, A.J. Effects of introducing unprocessed low-frequency information on the reception of envelope-vocoder processed speech. J. Acoust. Soc. Am. 2006, 119, 2417–2426.
Büchner, A.; Schüssler, M.; Battmer, R.D.; Stöver, T.; Lesinski-Schiedat, A.; Lenarz, T. Impact of low-frequency hearing. Audiol. Neurotol. 2009, 14, 8–13.
Reiss, L.A.; Turner, C.W.; Erenberg, S.R.; Gantz, B.J. Changes in pitch with a cochlear implant over time. J. Assoc. Res. Otolaryngol. 2007, 8, 241–257.
Zahorik, P. Perceptually relevant parameters for virtual listening simulation of small room acoustics. J. Acoust. Soc. Am. 2009, 126, 776–791.
Allen, J.B.; Berkley, D.A. Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 1979, 65, 943–950.
Levitt, H. Transformed Up-Down Methods in Psychoacoustics. J. Acoust. Soc. Am. 1971, 49, 467–477.
Oxenham, A.J.; Simonson, A.M. Masking release for low- and high-pass-filtered speech in the presence of noise and single-talker interference. J. Acoust. Soc. Am. 2009, 125, 457–468.
Yost, W.A. Spatial release from masking based on binaural processing for up to six talkers. J. Acoust. Soc. Am. 2017, 141, 2093–2106.
Jones, G.L.; Litovsky, R.Y. A cocktail party model of spatial release from masking by both noise and speech interferers. J. Acoust. Soc. Am. 2011, 130, 1463–1474.
Best, V.; Ozmeral, E.; Gallun, F.J.; Sen, K.; Shinn-Cunningham, B.G. Spatial unmasking of birdsong in human listeners: Energetic and informational factors. J. Acoust. Soc. Am. 2005, 118, 3766–3773.
Kidd, G.; Mason, C.R.; Best, V.; Marrone, N. Stimulus factors influencing spatial release from speech-on-speech masking. J. Acoust. Soc. Am. 2010, 128, 1965–1978.
Fitzgerald, M.B.; Kan, A.; Goupell, M.J. Bilateral Loudness Balancing and Distorted Spatial Perception in Recipients of Bilateral Cochlear Implants. Ear Hear. 2015, 36, e225–e236.
Kan, A.; Litovsky, R.Y. Binaural hearing with electrical stimulation. Hear. Res. 2015, 322, 127–137.
Ihlefeld, A.; Litovsky, R.Y. Interaural level differences do not suffice for restoring spatial release from masking in simulated cochlear implant listening. PLoS ONE 2012, 7, e45296.
Akeroyd, M.A. Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. Int. J. Audiol. 2008, 47, S53–S71.
Rönnberg, J.; Lunner, T.; Zekveld, A.A.; Sörqvist, P.; Danielsson, H.; Lyxell, B.; Dahlström, Ö.; Signoret, C.; Stenfelt, S.; Pichora-Fuller, M.K.; et al. The Ease of Language Understanding (ELU) model: Theoretical, empirical, and clinical advances. Front. Syst. Neurosci. 2013, 7, 31.
Dryden, A.; Allen, H.A.; Henshaw, H.; Heinrich, A. The Association Between Cognitive Performance and Speech-in-Noise Perception for Adult Listeners: A Systematic Literature Review and Meta-Analysis. Trends Hear. 2017, 21, 233121651774467.
Loiselle, L.H.; Dorman, M.F.; Yost, W.A.; Gifford, R.H. Sound source localization by hearing preservation patients with and without symmetrical low frequency acoustic hearing. Audiol. Neurotol. 2015, 20, 166–171.
Gifford, R.H.; Davis, T.J.; Sunderhaus, L.W.; Menapace, C.; Buck, B.; Crosson, J.; O’Neill, L.; Beiter, A.; Segel, P. Combined electric and acoustic stimulation with hearing preservation: Effect of cochlear implant low frequency cutoff on speech understanding and perceived listening difficulty. Ear Hear. 2017, 38, 539–553.
Incerti, P.V.; Ching, T.Y.C.; Cowan, R. A systematic review of electric–acoustic stimulation: Device fitting ranges, outcomes, and clinical fitting practices. Trends Amplif. 2013, 17, 3–26.
Shinn-Cunningham, B.G. Object-based auditory and visual attention. Trends Cogn. Sci. 2008, 12, 182–186.

Figure 1. Speech identification thresholds, defined as the target-to-masker ratio (TMR) needed to correctly identify 50% of the target items, for natural speech (left panel), EAS speech (middle panel), and simulated CI speech (right panel) across all tested spatial separations. Within each panel, the black line represents the group mean threshold, and the light blue lines show thresholds for individual listeners, each marked by a unique symbol. Error bars indicate ±1 standard error of the mean (SEM). Blue stars denote spatial separations at which thresholds differed significantly from the colocated (0°) condition.
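Because a one-up/one-down rule converges on the level that yields 50% correct (Levitt, 1971), the threshold defined in Figure 1 can be illustrated with the staircase below. This is a sketch under stated assumptions only: the function name `one_up_one_down`, the 10 dB starting TMR, the 2 dB step, the eight reversals, the discard-two averaging rule, and the simulated logistic listener are all hypothetical and are not the parameters or software used in this study.

```python
import math
import random
from typing import Callable, List


def one_up_one_down(
    respond: Callable[[float], bool],
    start_tmr_db: float = 10.0,
    step_db: float = 2.0,
    n_reversals: int = 8,
    n_discard: int = 2,
    max_trials: int = 200,
) -> float:
    """Estimate the TMR (dB) giving ~50% correct with a one-up/one-down rule.

    A correct response lowers the TMR by step_db (harder); an incorrect
    response raises it by step_db (easier). The estimate is the mean TMR
    at the reversals left after discarding the first n_discard of them.
    All default values are illustrative assumptions.
    """
    tmr = start_tmr_db
    last_direction = 0  # -1 = moving down, +1 = moving up, 0 = first trial
    reversals: List[float] = []
    for _ in range(max_trials):
        direction = -1 if respond(tmr) else +1
        # A reversal is a change in the direction of the track.
        if last_direction != 0 and direction != last_direction:
            reversals.append(tmr)
            if len(reversals) == n_reversals:
                break
        last_direction = direction
        tmr += direction * step_db
    kept = reversals[n_discard:]
    if not kept:
        raise RuntimeError("too few reversals to estimate a threshold")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    rng = random.Random(1)

    def simulated_listener(tmr_db: float) -> bool:
        # Hypothetical logistic psychometric function, 50% correct at -4 dB TMR.
        p_correct = 1.0 / (1.0 + math.exp(-(tmr_db + 4.0) / 1.5))
        return rng.random() < p_correct

    print(f"estimated threshold: {one_up_one_down(simulated_listener):.1f} dB TMR")
```

With a deterministic responder that is correct whenever the TMR exceeds some value, the track oscillates between the two steps that straddle that value, and the reversal mean lands on it, which is a quick sanity check of the logic.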

Figure 2. Spatial release from masking (SRM), defined as the colocated speech identification threshold minus the separated threshold, for natural speech (darker bars), EAS speech (medium gray bars), and simulated CI speech (lighter bars) at each spatial separation tested in the study.
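The SRM definition in Figure 2 reduces to one subtraction per separation. The minimal sketch below assumes thresholds are stored in a dictionary keyed by separation magnitude in degrees (0 for colocated); the function name `spatial_release_from_masking` and the threshold values are hypothetical and are not data from this study.

```python
from typing import Dict


def spatial_release_from_masking(thresholds_db: Dict[int, float]) -> Dict[int, float]:
    """Return SRM in dB for each separated configuration.

    thresholds_db maps separation magnitude in degrees (0 = colocated,
    5 = the +/-5 degree configuration, and so on) to a speech
    identification threshold in dB TMR. SRM is the colocated threshold
    minus the separated threshold, so positive values mean separation
    lowered (improved) the threshold.
    """
    if 0 not in thresholds_db:
        raise KeyError("a colocated (0 degree) threshold is required")
    colocated = thresholds_db[0]
    return {
        sep: colocated - thr
        for sep, thr in sorted(thresholds_db.items())
        if sep != 0
    }


# Hypothetical thresholds (dB TMR) for illustration; NOT data from this study.
example = {0: 2.0, 5: 0.5, 10: -1.0, 15: -3.0, 30: -6.0}
print(spatial_release_from_masking(example))
# {5: 1.5, 10: 3.0, 15: 5.0, 30: 8.0}
```

Because SRM is a difference score, computing it per listener and then averaging gives the same group mean as subtracting group-mean thresholds, provided no listener is missing a condition.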

Figure 3. Scatterplots showing the relationships among natural, EAS, and simulated CI speech identification thresholds at all spatial separations. Each point represents an individual participant, and Pearson correlation coefficients are also shown. * indicates correlations significant at the p < 0.05 level, and ** indicates correlations significant at the p < 0.01 level.
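The coefficients in Figure 3 are Pearson product-moment correlations. As a sketch of the standard computation (the authors' statistical software is not stated), the code below computes r for paired thresholds and the t statistic used to test r against zero with n − 2 degrees of freedom; in practice, `scipy.stats.pearsonr` returns r together with a two-sided p-value. The function names `pearson_r` and `t_statistic` and the example thresholds are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, referred to Student's t with n - 2 df."""
    if n < 3:
        raise ValueError("at least three pairs are needed")
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired thresholds (dB TMR) for four listeners; NOT study data.
natural = [-8.0, -6.0, -4.0, -2.0]
eas = [-4.0, -2.0, -3.0, -1.0]
r = pearson_r(natural, eas)
print(f"r = {r:.2f}, t({len(natural) - 2}) = {t_statistic(r, len(natural)):.2f}")
# r = 0.80, t(2) = 1.89
```

A two-sided p-value for this t statistic falling below 0.05 or 0.01 corresponds to the * and ** markers in the caption.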

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Srinivasan, N.; Borkowski, B.; Barkhouse, M.; Patro, C. Spatial Release from Masking with Simulated Electric–Acoustic and Cochlear Implant Speech. J. Otorhinolaryngol. Hear. Balance Med. 2026, 7, 15. https://doi.org/10.3390/ohbm7010015
