1. Introduction
Cochlear implants (CIs) bypass the outer and middle ear of a person with sensorineural hearing loss to deliver electric stimulation directly to the auditory nerve in the cochlea, restoring functional hearing to the user. The CI is considered one of the most successful and prolific sensory prostheses, with more than one million implantations performed by 2022 [
1]. CI users generally have good speech understanding in quiet environments and are usually capable of having phone conversations, but the perception of more complex sounds, such as speech in noise and music, remains problematic for many of them [
2].
In many ways, the CI aims to mimic biological hearing processes. Pitch is conveyed using the natural tonotopic organization of the cochlea, with high frequencies at the base and low frequencies at the apex of the cochlea [
3]. Loudness is encoded in the cochlea by the number of activated fibers and the firing rate of these fibers. CIs increase perceived loudness by increasing charge delivered to the auditory nerve, which in turn leads to a broader excitation pattern and a higher firing rate of the auditory nerve cells. This mimicry works rather well, but this electrical equivalent of neural activation leads to new challenges, such as current spread that reduces the pitch specificity and a reduced number of loudness steps that can be perceived [
2]. As a result, for CI users, the difference between a just-hearable sound, also referred to as a threshold-level sound, and a comfortable sound is much smaller than for those with normal hearing. This smaller dynamic range can negatively impact speech intelligibility, which partially relies on intensity variations for the identification of vowels and consonants [
4] and music perception [
2]. Another issue is the interaction between pitch and loudness. In normal hearing, sounds presented at the same sound pressure level but at different frequencies are not perceived as equally loud; middle-frequency sounds are perceived as louder than high-frequency sounds. Typically, CIs use a pre-emphasis filter to emulate this frequency sensitivity, which attenuates low-frequency sound and amplifies high-frequency sound. In practice, this internal amplification means that a high-frequency sound will reach the defined stimulation threshold sooner than a low-frequency sound of the same sound level.
Over the years, CI manufacturers have developed different implants and speech coding strategies (SCSs) to encode sound. Similarly, they have implemented different ways to deliver charge to the auditory nerve, such as increasing the pulse rate per channel (pps/channel), changing the pulse shape (e.g., varying interphase or interpulse gaps), changing the number of active electrode contacts, and more (see McKay [
5] for a comprehensive overview). These variations are partially driven by the parameters of the hardware platform, and changing these parameters alters the amount of neural activation and, therefore, the perceived loudness. However, it is difficult to compare strategies and understand these differences, as implant performance is investigated with many different, often platform-specific, types of testing. Additionally, limited numbers of participants make it difficult to obtain statistically significant results. Another downside is the inability to compare speech coding strategies within a patient, as speech coding strategies are manufacturer-specific. This means that patient-specific differences in anatomy and physiology, as well as the patient's interpretation, always play a role in any finding.
Interestingly, patient testing with different speech coding strategies of different brands sometimes shows opposing results for the same parameter. Balkany et al. [
6] found that low stimulation rates between 500–900 pps/channel were preferred by Nucleus Contour device users (from Cochlear Inc., Sydney, Australia) and did not hinder speech understanding. However, experiments with implants from MedEl demonstrated a preference for high stimulation rates of 1200–1600 pps [
7]. For Advanced Bionics (AB, Valencia, CA, USA) implants, increasing stimulation rate did not influence speech understanding, but it did lower T-level and thus increased the dynamic range [
8]. Another difficulty in comparing strategies lies in the definitions of the fitting settings; for instance, both strategies define the T-level as the first noticeable sound, but the Advanced Combination Encoder (ACE), developed by Cochlear, works with a comfortable level (C-level), whereas HiRes Fidelity 120 (F120), developed by Advanced Bionics, maps to a most comfortable level (M-level).
These differences between strategies have been described from multiple angles in multiple ways. Wolfe [
9] published a book for audiologists that describes different SCSs and their fittings. Vaerenberg et al. [
10] developed a graphical representation to compare loudness encoding in different implants, from input to the output of the implant in current. McKay [
5] modeled the effect of different pulse shape parameters on auditory nerve activation. However, as far as we are aware, no study has yet considered every step from sound to processor to auditory nerve and investigated how each affects the other. Using a computational model instead of patient experiments enables an objective comparison of different speech coding strategies in the same virtual cochlea. In this way, physiological differences are bypassed, and no patients are needed, so only the effect of the speech encoding on the auditory nerve is investigated.
In this paper, we assess how choices in signal encoding of two different SCSs (ACE and F120) impact the activation of the auditory nerve. Research versions of ACE and F120 strategies were kindly made available to us by the respective manufacturers (described in detail by Nogueira et al. [
11,
12]), which allowed us to simulate each SCS as closely as possible to its clinical version. Unfortunately, research versions from other manufacturers were not available to us.
The ACE strategy of Cochlear drives an array with 22 electrode contacts, and has a default stimulation rate of 900 pps/channel. This strategy uses pre-emphasis, and it does not activate a channel before an internal base loudness level (assigned to T-level) is reached. Within the electric dynamic range, loudness is logarithmically encoded. Once the internal base level is reached, the electrode contact starts stimulating at T-level [
11]. Additionally, a stimulus above the internal saturation level (assigned to C-level) does not increase the electrode contact’s output. Another notable feature of this strategy is the use of the N-of-M paradigm: per stimulation cycle, the N channels with the most energy in that cycle are activated out of the M available channels. For the ACE strategy, N is most often fixed at 8, and M is the total number of electrode contacts (22).
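In pseudocode terms, the N-of-M maxima selection amounts to keeping, per stimulation cycle, only the channels with the largest envelope energy. The following Python sketch illustrates the principle only; the function name and array layout are our own assumptions, not Cochlear's implementation:

```python
import numpy as np

def n_of_m_select(envelopes: np.ndarray, n: int = 8) -> np.ndarray:
    """Illustrative N-of-M maxima selection: per stimulation cycle,
    keep only the n channels with the most energy out of the m
    available channels; all other channels are zeroed (i.e., not
    stimulated in that cycle).

    envelopes: array of shape (m, n_cycles) with channel envelope energies.
    """
    selected = np.zeros_like(envelopes)
    for t in range(envelopes.shape[1]):
        idx = np.argsort(envelopes[:, t])[-n:]  # n largest-energy channels
        selected[idx, t] = envelopes[idx, t]
    return selected
```

With a pure tone input, fewer than n analysis bands may carry appreciable energy, in which case low-energy bands are still among the "maxima"; ACE additionally requires the internal base level to be exceeded before a channel stimulates.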
The F120 strategy from Advanced Bionics is designed for devices with 16 electrode contacts and employs dual-electrode stimulation, more commonly called current steering, rather than monopolar stimulation. With current steering, two electrode contacts are stimulated simultaneously, which creates an intermediate pitch percept between those elicited by the individual contacts [
13]. F120 uses a slightly different pre-emphasis, but unlike in the ACE strategy, individual channels are also stimulated below their T-levels. In each stimulation cycle, all 15 electrode contact pairs will be activated, but when the sound level is too low, this stimulation is subthreshold. As a result, a constant stimulation rate of 1850 pps/channel is achieved. In this strategy, the loudness is linearly encoded up to the most comfortable level (M-level), after which the current increase is compressed with a 1:12 ratio. Typically, no inter-pulse gaps are used on the AB platforms.
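As a minimal sketch of the loudness mapping described above, current can grow linearly with the input level up to the M-level and with a twelve-fold shallower slope above it. The function name and the treatment of levels below T-level are illustrative assumptions based on the description in the text, not the clinical mapping:

```python
def f120_like_current(level_db: float, t_level_db: float, m_level_db: float,
                      i_t_ua: float, i_m_ua: float) -> float:
    """Illustrative F120-style loudness mapping: linear current growth
    between the T- and M-levels (and extrapolated below T-level, since
    F120 also delivers subthreshold pulses), then compressed with a
    1:12 ratio above the M-level. i_t_ua and i_m_ua are the currents
    (in uA) mapped to the T- and M-levels."""
    slope = (i_m_ua - i_t_ua) / (m_level_db - t_level_db)  # uA per dB
    if level_db <= m_level_db:
        return i_t_ua + slope * (level_db - t_level_db)
    # above M-level: growth compressed to 1/12 of the linear slope
    return i_m_ua + (slope / 12.0) * (level_db - m_level_db)
```

For example, with a T-level of 25 dB mapped to 100 uA and an M-level of 65 dB mapped to 300 uA, an input 12 dB above M-level adds only as much current as a single dB would add below it.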
In this study, we aim to fit both strategies in the same virtual patient by finding analogous fitting definitions. However, due to the difference in pre-emphasis, the T-levels of both strategies are reached at different sound levels. Additionally, the compared processors have some significant differences in the definition of the upper level (C-level/M-level). Therefore, it would be incorrect to assume that these values are the same at a processing or fitting level. Vaerenberg et al. [
10] worked around this issue by only comparing responses to inputs of one frequency. We opted for a different approach and created a way to equalize stimulus loudness at T-level. This is explained in greater detail in the Methods section. Lastly, in normal clinical practice, patient fittings are done with pulse trains. However, in this paper, fittings are done using acoustic pure tone waves, just as Vaerenberg et al. [
10] did in their loudness comparison, building on the assumption that a loud sound would excite a comparable number of auditory nerve fibers, irrespective of the type of implant.
This paper is based on the implanted cochlea model developed at the Leiden University Medical Center, now extended to run both F120 and ACE. In this computational model study, we aim to provide insight into loudness encoding in cochlear implants of two different manufacturers. We provide results from both the neural and the electrical outputs obtained for ACE and F120 for all electrode contacts at a wide range of loudness levels.
2. Materials and Methods
Computational model. To model the neural response to different strategies, the latest version of the implanted cochlea model was used [
14,
15,
16]. This model presents a pipeline that consists of multiple segments: (1) a volume conduction model, which calculates simulated electrical potentials along the auditory neurons in a realistic three-dimensional geometry of the implanted human cochlea, (2) an active nerve fiber (ANF) model with human kinetics [
17], which calculates deterministic neural responses to the electrical potentials from the volume conduction model, and (3) a stochastic model PHAST+, which extends the deterministic single fiber thresholds with stochasticity, adaptation, and accommodation [
14,
18,
19]. The cochlear geometry used in this study is based on µCT imaging data from a human temporal bone. Inside the model geometry, realistic neural trajectories are defined according to histological data and simulated electrical potentials are calculated at their nodes of Ranvier and internodal segments [
14,
18].
Figure 1 shows a sliced-open version of this cochlear model geometry with one of the modeled electrode arrays (
Figure 1A), as well as a top view to portray the neural trajectories modeled with two different implant arrays (
Figure 1B,C). These modeled electrode arrays were chosen to match the patient population of our clinic and have been the most validated in previous research. Similarly, we have chosen to stick to the validated versions of the available toolboxes of the speech coding strategies. Our aim is to portray loudness percepts of two different clinical strategies and not include research versions of these strategies that employ a range of parameters not often used in the clinic. However, to exclude the effects of position and emphasize the differences purely in strategy, an initial calibration was done with hypothetical lateral wall electrode arrays modeled in the same position (see
Figure 2). These arrays have electrode contacts of the same shape and dimensions; only the number of contacts and the spacing between them vary per strategy.
The combined implanted cochlea model takes an electrodogram, created by a speech coding strategy, as input and generates spiking output for 3200 fibers, resulting in a neurogram. Initially, the implanted cochlea model only worked with the research version of HiRes Fidelity 120 (F120) developed by Advanced Bionics. For this paper, the model was extended to also run ACE. For the ACE simulations, a realistic representation of the Nucleus Slim Straight electrode array was used (
Figure 1A,B), and for the F120 simulations, a model of the HiFocus Mid-Scala was made, based on geometric data provided by Advanced Bionics (
Figure 1C). The insertion angle of the most basal electrode contact of HiFocus Mid-Scala was 27 degrees from the round window, and for the most apical electrode contact, 420 degrees from the round window. For Nucleus Slim Straight, these values were 38 and 455 degrees, respectively. All cochlear angles are determined according to the consensus coordinate system described by Verbist et al. [
20].
Speech coding strategies. The code of both SCSs was provided by their respective companies as research versions of the clinical strategies used in the implants. Both research versions allowed T- and C-/M-level adjustment and produced an electrodogram in response to sound. We used the T- and C-/M-levels found with the ANF model; however, a known shortcoming of that model is that its fiber thresholds are about a factor of 3 higher than clinical thresholds. To correct for this known systematic error, the T-, C-, and M-levels were scaled down by a factor of 3, thereby avoiding the SCS’s internal current output limit.
The speech processor for F120 was taken from the open-source Advanced Bionics Generic-Python-Toolbox. This is code for the spectral resolution strategy (SpecRes), a research version of the commercial HiRes with Fidelity 120 strategy [
12]. To match both clinical results and loudness curves described by Vaerenberg et al. [
10], the scaling of the audiomixer function was set to relative, using a full scale of 109.6 dB. The acoustic dynamic range was set to 40 dB; this value should not be confused with the IDR setting in the fitting software. Noise reduction was disabled; automatic gain control (AGC) and pre-emphasis were enabled.
For ACE, the Nucleus MATLAB toolbox (NMT, run in MATLAB 2023a) by Cochlear was used to simulate the Cochlear Nucleus CI. As described by Nogueira et al. [
11], the default values of the ACE toolbox set the base-level or T-level at 33.86 dB and the saturation level or C-level at 65.35 dB. To make the default values more reflective of the clinic and Vaerenberg et al. [
10], the C-level was maintained at 65 dB, but the T-level was changed to 25 dB, resulting in a dynamic range of 40 dB. Specifically for this study, a dB scaling was added at the start of the processing chain to scale audio inputs to these known T- and C-level outputs. Similar to F120, AGC and pre-emphasis were enabled.
Other relevant settings can be found in
Table 1. An overview of both strategies is shown in
Figure 3. The electrodograms of
Figure 3 show the biphasic cathodic-first rectangular pulses that both strategies use, as well as the sequential stimulation order for ACE versus simultaneous stimulation order for F120.
Loudness scaling. For comparison of the output of different electrode contacts, a type of frequency-specific loudness scaling was needed, as pre-emphasis causes low-frequency sounds to reach the T-level of the implant at higher sound pressure levels (SPLs) than high-frequency sounds. A similar principle is used to create audiograms in pure-tone audiometry, using ISO equal-loudness contours, which scale input in dB SPL to dB HL. Similarly, we developed a conversion table that relates a stimulus in dB SPL to a converted scale: ‘dB T’, which equalizes stimuli at all frequencies to their implant-specific threshold level. A stimulus at 25 dB T for each electrode contact leads to T-level electric stimulation. This table can be found in
Appendix A.
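In practice, the conversion reduces to a per-frequency offset lookup. The table values in the sketch below are invented placeholders for illustration only; the actual values are those listed in Appendix A:

```python
# Invented placeholder values: the dB SPL at which a pure tone of the
# given frequency just reaches T-level stimulation (real values are
# listed in Appendix A of the paper).
DB_SPL_AT_T_LEVEL = {250: 41.0, 1000: 29.0, 4000: 22.0}

def spl_to_db_t(level_db_spl: float, freq_hz: int) -> float:
    """Convert a level in dB SPL to the 'dB T' scale, on which a
    25 dB T stimulus at any frequency just reaches T-level electric
    stimulation."""
    offset = DB_SPL_AT_T_LEVEL[freq_hz] - 25.0
    return level_db_spl - offset
```

This mirrors the dB SPL to dB HL conversion used in pure-tone audiometry, with the implant-specific threshold levels taking the place of the ISO equal-loudness contours.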
Stimuli. Pure tones of 1 s were generated for each electrode contact’s center electrode frequency (CEF), the auditory frequencies assigned to specific contacts by ACE and F120. For ACE, the electrode contacts are enumerated from base to apex, meaning E1 is assigned the highest frequency of 7437 Hz (CEF1), whereas for F120, the lowest frequency (CEF1) is assigned to E1 in the apex. All stimuli were scaled using the conversion table from
Appendix A and the implants’ dB scaling. Cosine-squared ramps were applied to the start of the stimulus (0.05 s).
Metric for neural response. In previous studies conducted with the implanted cochlea model, T- and M-level stimulation was defined as a number of activated fibers (expressed in width of the excitation area) [
21,
22]. In this paper, a next step was taken to quantify loudness in this model by summing the spikes in the stable part of the signal, expressing loudness as a total spike rate (spikes/s). The neural activation stabilized after 0.2 s of stimulation (a combined effect of the ramp and the AGC). To reduce the influence of spontaneous activity and possible cross-turn activation, only those neural fibers whose activation would be caused by the specific electrode contact were included per contact. For all stimuli, 640 out of 3200 fibers were selected, based on the maximum width of activation observed, so as not to exclude relevant data. During calibration, all electrode contact amplitudes except for the center electrode contact (CE) of the CEF and its neighboring electrode contacts (CE − 1 and CE + 1) were set to 0. This was done to reduce the effect of subthreshold pulses. For CEF2, this means that the CE (E2), as well as E1 and E3, were included in the calibration.
Neural baseline determination. Spike rates were used to indicate when a sound was at threshold or at a comfortable level. These rates were determined in an experiment where modified, hypothetical electrode array geometries for the ACE and F120 simulations were placed such that the electrode contacts allocated to 1250 Hz were at the same position along the lateral wall of the cochlea, to mitigate any effects of electrode contact positioning. For ACE, this was E14; for F120, this was E8. This frequency was chosen because both SCSs have very similar CEFs around this value, which are only marginally affected by the pre-emphasis of each strategy. Stimuli presented at 25 and 65 dB T resulted in minimal spike rate differences between the strategies. In the end, the spike rates found at this electrode contact of F120 were chosen, as this SCS has previously been investigated with this model.
Implant fitting. After scaling the stimulus loudness and determining the neural baseline, the implants were fitted at all CEFs. We hypothesized that a just-noticeable sound and a sound of 65 dB should have the same spike rate for both SCSs on all contacts to be perceived as equally loud by a subject, similar to other computational models that assume a close relationship between total neural excitation and loudness perception [
23,
24].
For each electrode contact, stimuli with the corresponding CEF were presented at 25 and 65 dB T, and spike rates were calculated. The T- and C- or M-levels of the SCSs were adjusted, thereby changing the pulse amplitudes, until the total spike rate closely matched the previously defined total spike rates of the threshold and comfortable level, respectively.
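The paper does not specify the search procedure used to adjust the levels; one plausible sketch, assuming the total spike rate increases monotonically with the fitted level, is a simple bisection:

```python
from typing import Callable

def fit_level(spike_rate_for: Callable[[float], float], target_rate: float,
              lo: float, hi: float, tol: float = 1.0) -> float:
    """Bisection sketch of the fitting step: adjust a fitting level
    (T-, C-, or M-level) until the simulated total spike rate closely
    matches the target rate. spike_rate_for stands in for a full model
    run and is assumed to be monotonically increasing in the level."""
    while hi - lo > 1e-3:
        mid = 0.5 * (lo + hi)
        rate = spike_rate_for(mid)
        if abs(rate - target_rate) <= tol:
            return mid          # close enough to the target spike rate
        if rate < target_rate:
            lo = mid            # level too low: search upper half
        else:
            hi = mid            # level too high: search lower half
    return 0.5 * (lo + hi)
```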
Loudness growth curves. After calibrating all electrode contacts individually, stimuli from 0 to 110 dB T with 5 dB T step sizes were created for each electrode contact’s CEF to generate input-output curves. For F120, subthreshold activation was now enabled for all electrode contacts, similar to clinical implementation. Spike rates were calculated in the same way as explained previously.
Electrodograms were analyzed by calculating the charge delivered and evaluating the maximum amplitude of each electrode contact. Similar to the neural analysis, only the stable part of the electrodogram was included. As F120 always activates all electrode contacts with subthreshold pulses, substantial charge is delivered across the entire cochlea that does not directly impact the neural loudness percept. In the calculations, only the CE and its neighbors (CE − 1 and CE + 1) were included, as these electrode contacts significantly impact the spike rate and are used in the loudness coding.
4. Discussion
This paper aimed to increase our understanding of loudness encoding in two cochlear implant devices from different manufacturers by examining the whole system from sound to auditory nerve response. The implanted cochlea model was extended to run both F120 and ACE, and pure sine waves were used to calibrate loudness so as to include every aspect of both SCSs. This work has highlighted notable differences between the two strategies, for instance, the different patterns in the area of excitation. By utilizing pure tones as input to the SCSs, we could also show the difference in the recruitment of electrodes over the loudness growth curves. Using sound waves reflects real-world loudness encoding more closely than simple pulse trains do. It is important to note that the differences described here between the SCSs say nothing about their clinical performance; they merely explain how loudness is conveyed by both strategies.
The neurograms shown in
Figure 6 show a clear difference at a neural level in the way fibers are activated per SCS. When loudness increases, ACE induces an active increase in area of excitation by recruiting neighboring electrode contacts (see
Figure 9), whereas the increase in the area of activation for F120 was mainly the effect of current spread rather than the active use of more electrode contacts; F120 first increases the individual spike rate locally. This may be a fundamental difference in the way the SCSs encode loudness; however, the fact that ACE is an N-of-M speech coding strategy may also play a role. N-of-M strategies stimulate the (in this case) eight most dominant analysis bands. With a pure tone stimulus, an increase in the energy of the signal may cause energy to spread to neighboring channels because of partially overlapping analysis filters. As no other frequency is present, this energy spread is automatically perceived as dominant, is processed as such by the ACE SCS, and leads to the recruitment of neighboring electrode contacts. However, with these pure tone stimuli, the maximum number of dominant bands was always lower than eight.
The loudness growth curves of
Figure 7 clearly illustrate the response of the auditory nerve to different sound levels. It can be seen that F120 stimulates below 25 dB T, while ACE turns on abruptly, matching literature descriptions of the strategies [
11,
12]. For ACE, the electrode contacts reach C-level stimulation at 65 dB T and plateau after the defined C-level, while F120 continues to increase above M-level, albeit with the defined 1:12 compression defined in the AGC [
12].
Comparing
Figure 7 and
Figure 8 demonstrates that ACE (with the Slim Straight array) can reach equal loudness with less total charge than F120 (with the Mid-Scala array). This has several reasons. Firstly, ACE recruits more auditory nerve fibers than F120 by broadening their signal along the electrode array, i.e., recruitment of adjacent electrodes. Secondly, it may also be the effect of the difference in pulse shape between the strategies. ACE uses an interphase gap (
Table 1, also visible in the electrodogram of
Figure 3), which increases the responsiveness of the auditory nerve fiber to the first phase of the pulse [
5], as we could also observe in our model. Thirdly, the generally higher charge delivered by F120 may be due to the higher stimulation rate of this strategy. It would be interesting to match the stimulation rate and pulse shape of ACE to those of F120 to test this hypothesis. Based on our findings and the literature, we expect that the charge needed for ACE to reach a certain spike level would then be more similar to that of F120 than it is now. Fourthly, another difference between the two implants is the position of the electrode arrays. The Slim Straight array is situated close to the lateral wall, whereas the Mid-Scala array lies in the center of the scala tympani. A greater distance from the modiolus results in more current spread and thus a wider area of activation (see [
25] or other comparable models).
Similarly, this effect of greater spread is visible in
Figure 6. E14 of ACE is situated farther from the modiolus than E8 of F120 (see red electrode contacts in
Figure 1B,C) and has a wider area of activation even when only one electrode is active at T-level (
Figure 6A,B). This spread was not present in the hypothetical lateral wall arrays, which were positioned at the same distance from the modiolus (
Figure 4). Although the arrays are placed in identical cochlear geometries, the array of ACE is modeled to be implanted slightly deeper into the cochlea, based on clinical CT-scans (
Figure 1), which is reflected in the position of excitation in
Figure 6. Interestingly, although the electrode array position has an effect on the width of activation, this is not reflected in large differences in necessary total charge (
Figure 5B and
Figure 8).
The difference in charge build-up, as shown in
Figure 8, has the same pattern as the spike rates of
Figure 7. The sudden strong activation of neighboring electrode contacts in ACE leads to jumps in spike rate, whereas the gradual charge increase on each electrode contact of F120 is reflected in a smoother increase of the spike rate. It would be interesting to investigate whether the smooth versus jumping behavior of these neural loudness growth functions is reflected in patient perception. Tak and Yathiraj [
26] researched intensity discrimination in implanted and normal hearing children (13 out of 15 with ACE) and found that, especially in high frequencies, implanted children had poorer intensity discrimination than their normal hearing counterparts. The highest stimulus used was 4000 Hz, comparable to CEF5 and CEF6 in
Figure 7 and
Figure 8; these CEFs show the shallowest buildup in charge and neural spikes, which is in line with a higher just-noticeable difference for loudness.
Figure 7 also shows that, for many CEFs, F120 reaches the defined M-level in spike rate below 65 dB T and that AGC compression already occurs below M-level. In the literature, different default values for the AGC knee-point are reported: Vaerenberg et al. [
10] show compression from 60 dB; Wolfe [
9] states that the default value for the AGC is around 63 dB; and the research code provided by AB defines it as 53.6 dB below the full-scale maximum, which corresponds to 56 dB in this study. In all cases, this means that the fitting of the M-level at 65 dB always happens in the compression zone of the AGC. In this study, the main difference between the fitting and the final stimulation was the enabling of subthreshold pulses for F120 on all electrode contacts for the final run, whereas they were only enabled on three electrode contacts during fitting. Apparently, this activation of subthreshold pulses over the entire cochlea changes the responsiveness of the auditory nerve fibers in such a way that M-level stimulation is reached at lower sound levels.
In this study, pure tone stimuli were used in the calibration of the two SCSs. This differs from clinical practice, where the fitting of T- and C- or M-level is done with pulse trains. These pulse trains do not take the effect of the pre-emphasis into account, and are not affected by the AGC or subthreshold pulses used in real stimulation. However, this study demonstrates that these factors have quite a large effect on the way loudness is encoded. Computational model studies that exclude these front-end processes, and psychophysical studies that directly work with electric pulse trains, might therefore oversimplify loudness encoding.
In previous work, the T- and M-level stimulation of this model was expressed in excitation width along the basilar membrane [
22], which was partly informed by model simulations of psychophysical experiments [
21]. We have built upon this work to set our baseline spike rates, although several characteristics differed, such as the array type and the stimuli included. While the spike rate values chosen for our definitions of T- and C- or M-levels are therefore to some extent arbitrary, using different values would influence all electrode contacts equally, so the comparisons between the strategies made in this paper would remain unaffected.
There are also important limitations to this study. Loudness is generally hypothesized to depend on the total integrated activity of the cochlear nerve, but the true neural correlates have yet to be elucidated [
27]. We have based our method on this hypothesis and may have oversimplified the true workings of loudness coding, as we do not take higher-level processing or neural plasticity into account, which may also affect loudness perception. Another limitation is the validity of the research versions of the SCSs. Our settings used in the strategies have been verified as much as possible by literature and through personal correspondence with the respective companies, but there may still be differences between our version and the clinically implemented SCSs. Additionally, specific pre- or post-processing created by both companies to increase speech-understanding in noisy environments can affect loudness as they repress the signal of noisy channels and increase lower energy channels, improving speech-in-noise perception [
28]. However, the pure-tone stimuli used in this paper would be perceived as “noise” by these algorithms, which would suppress the very signal we were trying to measure.
In future studies, this implanted cochlea model can be used to objectively investigate many more differences between SCSs. It is a versatile tool that can be extended to run other SCSs, including those from other manufacturers, such as Med-El, whose SCSs were, unfortunately, not available to us for this study. A major area of interest is the encoding of frequency information. Many studies have investigated pitch encoding using electrodograms, patient outcomes, neural activation, or a combination of these (e.g., see [
29] for a comparison between three implants based on pure tones, [
30] for ACE with harmonic complexes, or see [
31] for place and temporal pitch encoding in three different implants). This subject is beyond the scope of the current paper, but future work will focus on including this topic in our modeling pipeline. Other research topics could include looking at the stimuli used in psychophysical experiments, the interaction between perceived pitch and loudness, or temporal encoding. The focus could also be on CI-specific characteristics such as changing pulse rate, pulse shape, or dynamic range of the implants. It is also possible to examine different electrode array designs, cochlear anatomies, and pathologies, and investigate what happens to a neural signal if auditory nerve fiber health deteriorates or the electrode array is placed in a different position in the cochlea.