Article

Auditory Brainstem–Cortical Anatomy Relates to the Magnitude of Frequency-Following Responses (FFRs) and Event-Related Potentials (ERPs) Coding Speech-in-Noise

1 Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN 47408, USA
2 Program in Neuroscience, Indiana University, Bloomington, IN 47405, USA
3 Cognitive Science Program, Indiana University, Bloomington, IN 47405, USA
* Author to whom correspondence should be addressed.
Neuroimaging 2026, 1(1), 6; https://doi.org/10.3390/neuroimaging1010006
Submission received: 2 January 2026 / Revised: 27 February 2026 / Accepted: 11 March 2026 / Published: 23 March 2026

Abstract

Background/Objectives: Speech-evoked brain potentials provide a window into the neural encoding of speech, experience-dependent plasticity, and deficits in central auditory processing from communication disorders. Stronger and faster frequency-following responses (FFRs) and cortical event-related potentials (ERPs) have been interpreted as reflecting more robust and efficient auditory–sensory processing across brainstem and cortical levels. Importantly, these neural signatures relate to real-world listening skills like speech-in-noise (SIN) perception. However, how functional FFRs/ERPs relate to the underlying anatomical structures that generate these responses in brainstem and cortex is unknown. Methods: Using a multimodal imaging approach, we recorded FFRs and ERPs to clean and noise-degraded speech sounds to assess the strength of listeners’ neural encoding of speech at brainstem (FFR) and cortical (ERP) levels. MRI volumetrics of midbrain and transverse temporal gyrus (Heschl’s gyrus) quantified morphological variation in the subcortical and cortical anatomy that underlies these EEG potentials. We used the QuickSIN to assess behavioral SIN abilities. Results: We found that a larger and thicker right (but not left) Heschl’s gyrus was related to listeners’ SIN perception as well as the size of their cortical ERPs. Structural and functional measures interacted at a subcortical level. For listeners with smaller midbrain volumes, larger speech FFRs were associated with better QuickSIN scores, whereas in individuals with larger midbrain volumes, larger FFRs were related to poorer QuickSIN scores. Conclusions: Our findings reveal that common functional signatures of speech sound processing (FFRs, ERPs) are related to the anatomy of their underlying generator sources and suggest that both auditory brain structure and function can account for perceptual SIN capacity.
Key Contribution: Using EEG and MRI, we show that the amplitudes of electrophysiological signatures of speech processing (FFRs, ERPs) are associated with the size of their underlying anatomical generators in the midbrain and primary auditory cortex, and with speech-in-noise listening abilities. Our findings expose new structure–function–behavior relations in speech coding that extend across the auditory system hierarchy.

1. Introduction

Everyday listening situations contain noise that hinders speech perception [1]. Issues with speech-in-noise (SIN) understanding are familiar complaints among the elderly and listeners with hearing loss [2], but they are also common in individuals with normal hearing status [3,4,5,6,7]. Perceptual figure-ground deficits are also observed in many language and auditory-based disorders despite normal audiograms [8,9,10,11,12,13]. As such, it is now well recognized that SIN listening skills are determined by more than audibility or peripheral hearing status [1,3,4,5,14,15]. SIN tests are now among the most informative assessments of real-world listening capacity [16,17]. This has led several investigators to examine whether electrophysiological assays might provide a window into how the central auditory nervous system encodes speech sounds, and in turn, whether certain EEG biomarkers might predict SIN skills at the individual level [3,5,15,18].
The brain’s neuroelectric response to speech reflects an aggregate of EEG activity generated from both brainstem and cortical structures [19]. The cortical event-related potentials (ERPs) are composed of several waves (e.g., P1-N1-P2), reflecting activation of auditory thalamus, cortex, and associative areas [20]. The brainstem component, termed the frequency-following response (FFR), is a sustained potential that mirrors acoustic stimuli with high fidelity [21,22,23]. The neural generators (anatomical sources) of the FFR have been described in previous M/EEG studies [4,23,24,25,26,27]. While they reflect a mixture of phase-locked activity from structures throughout the auditory pathways [23,24,25,28,29], FFRs are predominantly of brainstem (midbrain inferior colliculus) origin when recorded to sounds with frequencies >150 Hz [4,23,24]. Links between noise-related changes in both brainstem (FFR) and cortical (ERP) speech-evoked potentials and SIN perception have been widely reported over the past decade, e.g., [3,4,5,6,15,18,30,31,32,33,34]. In general, faster and more robust FFRs/ERPs have been associated with better neural representation of speech and, consequently, better speech perception in noise-degraded listening tasks [1,3,5,31]. Collectively, a growing body of studies has suggested that speech-FFRs/ERPs can provide a valuable neural index to predict individual differences in SIN processing as well as neuroplastic improvements following auditory training regimens [35,36].
Despite their potential as an objective “barometer” of complex auditory processing [37], there is a surprising dearth of studies examining links between these functional EEG markers and the underlying brain anatomy from which they originate. Presumably, the morphology of auditory structures should constrain their corresponding evoked potentials. Given the basic principles of electrophysiology and volume conduction [38,39], larger size (volume, thickness, area) of the brainstem–cortical auditory pathways should support increased neural synchrony and packing of the neural elements generating electrical brain potentials like FFRs and ERPs.
In this vein, a handful of multimodal imaging studies combining M/EEG and MRI have shown connections between the morphology and neurophysiology of Heschl’s gyrus (HG) and the surrounding auditory cortex. For example, a reduced volume of HG has been observed in patients with tinnitus [40], while larger HG volumes were observed in musicians [41] and listeners with absolute pitch [42]. In particular, Schneider et al. [41] reported evoked MEG responses from the primary auditory cortex were 102% larger and gray matter volume 130% larger in professional musicians. In older adults, subtle declines in gray matter volume of the primary auditory cortex accounted for reduced hearing sensitivity [43]. These studies have established structure–function relationships at a cortical level of the auditory system. However, only a handful of studies have assessed, albeit indirectly, the auditory structure–function relations at the brainstem level.
One study showed that hemodynamic activation in right auditory cortex, as measured via fMRI BOLD signal, was related to individual differences in the strength of cortically based FFRs [44]. Microstructural integrity of the white matter surrounding left primary auditory areas also related to FFR latency, suggesting a link between the fine timing of neural phase-locked responses and left lateralized auditory brain networks [44]. Relatedly, fMRI activation in the midbrain (i.e., inferior colliculus) predicts FFR changes in pitch tracking accuracy of voice fundamental frequency (F0) after auditory learning [45]. While these studies have revealed correspondence between different neuroimaging assays, we are unaware of any study that has assessed whether the volumetric properties of auditory midbrain and cortical anatomy map to functional properties of speech-evoked FFRs and ERPs, respectively.
To this end, we used a multimodal imaging approach, recording FFRs and ERPs to clean and noise-degraded speech sounds via EEG to evaluate the strength of listeners’ neural encoding of SIN. MRI volumetrics of the midbrain and HG derived from anatomical MRIs provided insight into morphological variation in the subcortical and cortical auditory brain anatomy that underlies speech-FFR/ERP responses. We hypothesized that physical differences in auditory brainstem and cortex would have measurable associations with FFR and ERP magnitudes. Our findings reveal that the strength of listeners’ electrophysiological responses to speech is directly related to the structural morphology of their underlying auditory brain anatomy.

2. Materials and Methods

2.1. Participants

We collected EEG and MRI data from N = 30 adults (6 male, 24 female) aged 18–41 years (μ ± σ = 23.3 ± 4.7 years). Sample size was determined a priori to detect a noise-related decrement in speech-FFR magnitude at 80% power based on the effect sizes observed in our prior work (Cohen’s d = 0.46) [18]. This sample was also at least 2× larger than previous studies that have shown links between speech FFR/ERPs and SIN processing [3,5,30] and was thus well positioned to examine potential structural correlates of these prior EEG-behavior effects. Participants were recruited from the Indiana University student body and Greater Bloomington area. All exhibited normal hearing sensitivity confirmed via audiometric screening (i.e., <25 dB HL, octave frequencies 250–8000 Hz). Most were right-handed (mean laterality index = 74.3 ± 40.7) [46]. All had a collegiate level of education (16.6 ± 3.0 years formal education) and were native speakers of American English. On average, the sample had obtained 8.6 ± 7.9 years (range = 0–35 years) of formal self-reported music training. All were paid for their time and gave informed consent in compliance with a protocol approved by the Institutional Review Board at Indiana University.

2.2. EEG Recording and Analysis

We recorded brainstem (FFRs) and cortical (ERPs) evoked potentials simultaneously as participants monitored vowel sounds (/a/, /i/) presented continuously for ~15 min, totaling ~2000 trials of each token (for full EEG task details, see Ref. [18]). Occasional deviants (/u/ vowels; 70 trials) were pseudo-randomly intermixed within the vowel stream. Participants were asked to detect the oddball sounds via a key press. Each vowel was 100 ms with a common voice F0 (=150 Hz). Though the FFR can reflect an aggregate of phase-locked responses from multiple subcortical and cortical sources [23,24,25,47,48,49,50], this F0 was above the phase-locking limit of cortical neurons [51] and helped ensure our FFRs would be mainly of brainstem origin with little to no cortical contribution [4,23,24,52]. Auditory stimuli were delivered at 75 dBA SPL through shielded [18] ER-2 insert headphones (Etymotic Research, Elk Grove Village, IL, USA). This presentation level was below the SPL that elicits post auricular muscle (PAM) artifact in the FFR [53]. (PAM artifact is indicated in FFRs when response amplitude exceeds 0.2–0.3 μV; see Fig. 2 of Ref. [53]. Our FFRs were 3–4× smaller than this artifactual threshold, and the absence of PAM in our EEG data is attributable to the lower stimulus presentation level of 75 dBA.) In addition to the clean run, the task was performed in noise where vowel stimuli were mixed with 8-talker babble at a signal-to-noise ratio (SNR) of +5 dB (speech at 75 dBA SPL; noise at 70 dBA SPL). In the present study, clean responses served as a control to assess noise-related changes in the FFR/ERPs relative to a baseline. However, our analysis focused on the noise FFR/ERP responses as we were interested in assessing how speech encoding in noise related to speech-in-noise perception (i.e., QuickSIN scores) rather than clean speech processing, per se.
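Mixing a target signal with masking babble at a fixed SNR amounts to scaling the masker so the speech-to-masker power ratio hits the target. A minimal Python sketch with synthetic white-noise stand-ins for the speech and babble (the function name and signals are ours, not the study's stimulus-generation code):

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, babble: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `babble` so the speech-to-babble power ratio equals `snr_db`, then mix."""
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    # Gain that brings the babble power to the target SNR relative to the speech
    gain = np.sqrt(p_speech / (p_babble * 10 ** (snr_db / 10)))
    return speech + gain * babble

# Verify the realized SNR on synthetic signals
rng = np.random.default_rng(0)
speech = rng.standard_normal(48000)
babble = rng.standard_normal(48000)
scaled = mix_at_snr(speech, babble, 5.0) - speech   # recover the scaled babble
snr = 10 * np.log10(np.mean(speech ** 2) / np.mean(scaled ** 2))
print(round(snr, 1))  # 5.0
```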
EEG was recorded differentially between Ag/AgCl disk electrodes placed at the scalp vertex (Cz; non-inverting electrode) referenced to linked mastoids (M1/M2; inverting electrodes), with the mid-forehead serving as common ground. This single-channel montage was ideal for simultaneously recording high-SNR brainstem and cortical auditory responses, which are distributed maximally over frontocentral scalp locations [23,33,54], and is among the most common montages for recording FFRs [22,53,55]. The Cz electrode was also selected because both brainstem FFR and cortical N1-P2 amplitudes show robust correlations with SIN at this scalp location [3,33]. Interelectrode impedances were ≤5 kΩ. EEGs were digitized at 5 kHz using SynAmps RT amplifiers (Compumedics Neuroscan, Charlotte, NC, USA). EEGs were epoched (−10–200 ms), pre-stimulus baselined, and ensemble averaged across trials to obtain compound speech-evoked potentials [19]. We bandpass filtered full-band responses from 100 to 1500 Hz and 1 to 30 Hz to isolate the FFRs and ERPs, respectively [18,19,54].
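The epoching, baselining, averaging, and band-splitting steps can be sketched as follows (an illustrative Python/NumPy/SciPy example on synthetic data; the actual acquisition and processing used SynAmps RT hardware and its associated software, so function names and parameters here are our assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 5000  # EEG sampling rate (Hz), as in the recordings

def bandpass(x, lo, hi, fs=FS, order=2):
    """Zero-phase Butterworth bandpass (second-order sections for stability)."""
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def average_epochs(eeg, onsets, fs=FS, pre=0.010, post=0.200):
    """Cut -10 to 200 ms epochs, subtract the pre-stimulus baseline, average."""
    n_pre, n_post = int(pre * fs), int(post * fs)
    epochs = np.stack([eeg[t - n_pre : t + n_post] for t in onsets])
    epochs -= epochs[:, :n_pre].mean(axis=1, keepdims=True)  # baseline correction
    return epochs.mean(axis=0)

# Toy continuous recording with regularly spaced stimulus onsets
rng = np.random.default_rng(1)
eeg = rng.standard_normal(FS * 10)
onsets = np.arange(100, len(eeg) - 1100, 1100)
avg = average_epochs(eeg, onsets)
ffr = bandpass(avg, 100, 1500)  # brainstem FFR band
erp = bandpass(avg, 1, 30)      # cortical ERP band
```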
From FFRs, we measured the RMS amplitude from the steady-state portion (10–100 ms) of the response waveform to quantify the overall strength of the brainstem response to speech. From the ERPs, we measured the peak-to-peak amplitude of the N1-P2 complex. N1 was identified as the greatest negative deflection between 90 and 145 ms, and P2 as the maximum positive deflection within 145–175 ms [18]. Together, FFRrms and N1-P2 magnitude described the overall strength of the FFR and ERP. While other features of the ERP/FFRs are possible to measure (e.g., latency), these amplitude assays were relatively isomorphic as they allowed us to directly compare the overall magnitude of speech encoding at brainstem vs. cortical levels and both were highly sensitive to noise-related changes [18,33].
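The two amplitude metrics reduce to simple window operations on the averaged waveforms. A minimal sketch (Python/NumPy; the helper names and the toy epoch are ours, assuming a 5 kHz sampling rate and a −10 ms epoch onset as described above):

```python
import numpy as np

FS = 5000    # sampling rate (Hz)
PRE = 0.010  # epoch begins 10 ms before stimulus onset

def idx(t_ms):
    """Map post-stimulus time (ms) to a sample index within the epoch."""
    return int(round((t_ms / 1000 + PRE) * FS))

def ffr_rms(ffr):
    """RMS amplitude over the 10-100 ms steady-state portion of the FFR."""
    return np.sqrt(np.mean(ffr[idx(10):idx(100)] ** 2))

def n1_p2_amplitude(erp):
    """Peak-to-peak N1-P2: largest positivity (145-175 ms) minus
    largest negativity (90-145 ms)."""
    n1 = erp[idx(90):idx(145)].min()
    p2 = erp[idx(145):idx(175)].max()
    return p2 - n1

# Sanity check on a toy epoch with known deflections
erp = np.zeros(1050)
erp[idx(100)] = -2.0   # N1-like trough at 100 ms
erp[idx(160)] = 3.0    # P2-like peak at 160 ms
print(n1_p2_amplitude(erp))  # 5.0
```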

2.3. Structural MRI and Morphometric Analysis

We acquired a 3D whole-brain T1-weighted anatomical image for each participant with the MPRAGE sequence (TR/TE = 2400/2.68 ms; field of view = 256 × 240 mm²; 208 sagittal slices; voxel size = 0.8 mm isotropic; flip angle = 8°) on the Siemens 3T Magnetom Prisma Fit scanner at the IU Imaging Research Facility. MRI data were converted to Brain Imaging Data Structure (BIDS) format using ezBIDS (https://brainlife.io/ezbids/ [accessed on 1 December 2025]) for further processing [56].
Cortical reconstruction and volumetric segmentation were performed on the T1-weighted structural MRI scans using the FreeSurfer (v7.3.1; http://surfer.nmr.mgh.harvard.edu/ [accessed on 1 December 2025]) comprehensive recon-all pipeline [57]. This automated pipeline included motion correction [58], removal of non-brain tissue and skull stripping via hybrid watershed/surface deformation [59], automated Talairach transformation, segmentation of white and gray matter volumetric structures [59,60], and cortical surface reconstruction [61]. Parcellation of brainstem structures (medulla oblongata, pons, midbrain and superior cerebellar peduncle) was performed using FreeSurfer’s Bayesian brainstem segmentation pipeline [62]. MRI processing was conducted on the Indiana University high-throughput Quartz supercomputing cluster (92 compute nodes, each equipped with two 64-core AMD EPYC 7742 2.25 GHz CPUs and 512 GB of RAM).
From the full-brain FreeSurfer output (i.e., aparc + aseg stats table from recon-all), we measured cortical thickness (mm), surface area (mm²), and gray matter volume (mm³) [59,61,63,64] from each region defined in the Desikan–Killiany atlas parcellation [60]. FreeSurfer morphometrics have good test–retest reliability across scanners and various field strengths [65,66]. Statistical results were projected onto the cortical surface using FreeSurfer’s fsaverage standard brain [67]. (Because some brain volumetrics can scale with general head size, we analyzed all volume measures with correction for estimated total intracranial volume (i.e., eTIV) [68]. However, we note that eTIV was unrelated to QuickSIN scores [F(1,26) = 0.25, p = 0.62] and our results were identical with or without treating eTIV as a covariate in our statistical models. This confirms that differences in overall brain size alone do not explain our results.)
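Treating eTIV as a covariate amounts to adding it as a predictor alongside the volume of interest. A minimal Python/NumPy sketch (all data below are synthetic and the variable names illustrative; the study's own models were fit in R):

```python
import numpy as np

def fit_ols(y, X):
    """Ordinary least squares with an intercept; returns coefficients."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

# Hypothetical data: does HG volume relate to QuickSIN once eTIV is accounted for?
rng = np.random.default_rng(2)
etiv = rng.normal(1500, 150, 30)              # estimated total intracranial volume
hg_vol = 0.6 * etiv + rng.normal(0, 50, 30)   # HG volume partly scales with eTIV
quicksin = rng.normal(0.5, 1.5, 30)           # SNR-loss scores

beta_raw = fit_ols(quicksin, hg_vol)                            # uncorrected model
beta_adj = fit_ols(quicksin, np.column_stack([hg_vol, etiv]))   # eTIV as covariate
```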

2.4. QuickSIN Speech-in-Noise Perception Task

We measured listeners’ SIN perception using the QuickSIN (Etymotic Research, Elk Grove Village, IL, USA) [69]. Participants heard six sentences embedded in four-talker noise babble, each containing five keywords. Sentences were presented at 70 dB HL. The SNR decreased parametrically in 5 dB steps from 25 dB SNR to 0 dB SNR. At each SNR, participants were instructed to repeat the sentence, and the correctly recalled keywords were logged. We computed their SNR loss by subtracting the number of recalled target words from 25.5 (i.e., SNR loss = 25.5 − Total Correct). The QuickSIN was presented binaurally via Sennheiser HD 280 circumaural headphones (Sennheiser Electronic Corp., Old Lyme, CT, USA). Participants completed two lists; the second was used in subsequent analyses to avoid familiarization effects.
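The scoring formula above is simple enough to state in code (a trivial Python sketch of SNR loss; the function name is ours):

```python
def snr_loss(total_correct: float) -> float:
    """QuickSIN SNR loss: 25.5 minus total keywords correctly recalled
    (5 keywords x 6 sentences = 30 possible)."""
    return 25.5 - total_correct

# A listener recalling 27 of 30 keywords:
print(snr_loss(27))  # -1.5 (dB SNR loss; lower scores indicate better SIN perception)
```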

2.5. Statistical Analysis

Unless otherwise noted, we analyzed the dependent variables using linear models in R (v4.2.2) [70]. Subjects served as a random effect for repeated measures models. QuickSIN scores served as the main dependent variable. Subject demographics (e.g., sex, age, music training) that have sometimes been associated with FFR amplitudes [71,72] were unrelated to QuickSIN scores in this study (all ps > 0.44), so these were not included in further models. Initial diagnostics indicated mild heteroscedasticity in model residuals for structure–function–behavior linear models (see Section 3.4). Thus, in these cases, we used a weighted least squares (WLS) approach [73], with weights calculated as the inverse of the variance at each observation. Variance estimates were obtained by regressing the residuals of QuickSIN measurements against the fitted predictor variables. Tukey-adjusted contrasts were used for multiple comparisons in all multi-level models. Effect sizes are reported as partial eta squared (ηp²) for all significant omnibus effects. Visualization of morphometric statistics was performed in the fsbrain package [74] and Freeview [57].

3. Results

3.1. Behavioral Results

Figure 1a shows QuickSIN scores across the sample plotted against listeners’ years of formal music training and age. Neither age, music training, sex, nor pure-tone average (PTA) hearing levels were associated with QuickSIN scores (all ps > 0.07). However, despite normal hearing acuity in our sample (Figure 1b), QuickSIN scores showed substantial variability across listeners (range = −2.5 to 3.5 dB SNR loss), confirming large variation in SIN performance even in young normal-hearing adults, e.g., [3,4,5,18].

3.2. Speech-Evoked EEG Results

Figure 2 shows cortical ERPs and brainstem FFRs as a function of noise. Cortical ERPs showed a series of biphasic deflections (“P1-N1-P2” waves) over the first ~200 ms after sound onset, consistent with typical morphology of the canonical auditory evoked potentials (Figure 2a). The N1-P2 complex was maximal near the scalp vertex and inverted at the mastoids, consistent with generators in the supratemporal plane (e.g., auditory cortex) [20,75]. FFRs appeared as a phase-locked neurophonic potential that mirrored spectrotemporal properties of the eliciting speech stimulus (Figure 2b). FFR spectra revealed that subcortical activity captured the voice fundamental frequency (F0) and its integer-related partials up to about the eighth harmonic (~1200 Hz), consistent with the upper limit of phase locking in the midbrain inferior colliculus [76]—the major generator of the human FFR [23].
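The harmonic structure of such a spectrum can be illustrated with a synthetic FFR (Python/NumPy; the 1/h decay of harmonic amplitudes is our illustrative assumption, not a measured value):

```python
import numpy as np

FS = 5000                 # sampling rate (Hz)
F0 = 150.0                # voice fundamental of the vowel stimuli
t = np.arange(500) / FS   # 100 ms analysis window

# Synthetic FFR: F0 plus its integer harmonics up to the 8th (~1200 Hz),
# with amplitudes falling off as 1/h (illustrative assumption)
ffr = sum((1.0 / h) * np.sin(2 * np.pi * h * F0 * t) for h in range(1, 9))

spec = np.abs(np.fft.rfft(ffr)) / len(t)
freqs = np.fft.rfftfreq(len(t), 1 / FS)
peak_hz = freqs[np.argmax(spec)]  # strongest spectral component falls at F0
```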
Noise both weakened and prolonged the FFRs and ERPs, suggesting poorer neural representation of speech at both the subcortical and cortical levels of auditory processing. Comparison between SNRs revealed noise-related declines in both brainstem FFR [clean vs. noise: t(29) = 2.14, p = 0.041] and cortical ERP [clean vs. noise: t(29) = 4.32, p = 0.00017] amplitudes. A linear model revealed behavioral QuickSIN scores were dependent on both FFR and ERP strength [FFRRMS × N1-P2 interaction: F(1,26) = 4.50, p = 0.043; ηp² = 0.15]. As these EEG effects replicated many prior studies examining noise-related changes in the speech-FFRs/ERPs and their relation to SIN perception in both young and older adults [5,6,18,33,77,78], we did not further analyze the clean (control condition) responses. Instead, we focused subsequent analyses on the noise FFR/ERPs as we were interested in examining how structural properties of the brainstem–cortical auditory anatomy link to electrophysiological function and behavioral assays of noise-degraded speech processing.

3.3. Structural MRI Results

Structural morphology of HG (primary auditory cortex) was related to SIN perception but depended on hemisphere (Figure 3).

3.3.1. Auditory Cortical Thickness

Cortical thickness of transverse HG did not differ between right and left hemispheres [t(29) = −1.22, p = 0.23]. However, a multivariate linear model including both left and right HG thickness revealed that the right [F(1,27) = 9.38, p = 0.0049; ηp² = 0.15] but not the left [F(1,27) = 0.019, p = 0.89] auditory cortex thickness was associated with QuickSIN scores (Figure 3, middle panels). A thicker right HG was associated with lower QuickSIN scores, indicative of better SIN performance.

3.3.2. Auditory Cortical Volume

In contrast to thickness, cortical volume was larger in the left vs. right HG [t(29) = 8.25, p < 0.0001], consistent with well-known leftward asymmetry and larger density of the left auditory cortex [79,80,81]. Paralleling cortical thickness, a multivariate linear model including both left and right HG volumes showed that the right [F(1,27) = 5.18, p = 0.031; ηp² = 0.16] but not the left [F(1,27) = 0.20, p = 0.65] auditory cortex volume was also associated with QuickSIN scores (Figure 3, bottom panels). A more voluminous right HG was associated with better SIN performance.

3.3.3. Control Analysis

We also examined whether QuickSIN scores varied with the morphology of adjacent primary motor regions [43], given the putative role of motor system in SIN processing [82]. These control analyses showed that the morphologies of neither the left nor right precentral gyrus were related to QuickSIN scores (all ps > 0.19). This confirmed that SIN perception was driven by the morphology specific to auditory cortex rather than gross differences in brain anatomy.

3.3.4. Midbrain Volumetrics

Figure 4 shows the volumetric data for the brainstem midbrain; the left and right auditory cortical volumes are shown for comparison. (Note that thickness measures are not relevant to the FreeSurfer brainstem segmentation pipeline [62]; thus, only volume could be directly compared across anatomical levels.) A linear mixed model conducted on the volumes [i.e., vol ~ roi + (1|sub)] revealed a main effect of region [F(2,87) = 2233.1, p < 0.0001; ηp² = 0.98]. Tukey contrasts revealed that midbrain volumes were (expectedly) larger than either auditory cortical volume (both ps < 0.0001). The left HG (1287 mm³) was 1.3× larger than the right HG (971 mm³) (p = 0.0021).
Figure 4. Volumetrics of the auditory brainstem–cortical pathway. BS, midbrain volume according to FreeSurfer brainstem parcellation [62]. HG, Heschl’s gyrus (primary auditory cortex) [60]. Auditory midbrain is larger than bilateral auditory cortex. HG size is larger in the left vs. right hemispheres. Error bars = ±1 SEM, *** p < 0.0001.

3.4. Structure–Function–Behavior Relations Underlying SIN Processing

Figure 5 shows associations between auditory brain structure (MRI volumetrics; see Figure 4), function (FFR/ERPs; see Figure 2), and behavioral SIN perception (QuickSIN; Figure 1). At a cortical level, a WLS multivariate linear model revealed that right HG volume [F(1,26) = 7.11, p = 0.013; ηp² = 0.21] and N1-P2 amplitude of the speech ERPs [F(1,26) = 8.93, p = 0.006; ηp² = 0.26] independently predicted QuickSIN scores with no interaction [F(1,26) = 0.02, p = 0.89] (Figure 5a). (We only considered the right auditory cortex in this analysis since the left hemisphere was not related to QuickSIN (see Figure 3). Similarly, only FFRs/ERPs to noise-degraded speech were considered as we were interested in assessing brain-behavior relations underlying speech-in-noise processing rather than clean speech.)
In contrast to the cortex, midbrain data revealed an interaction between midbrain anatomical size, FFR magnitude, and behavior [F(1,26) = 6.99, p = 0.014; ηp² = 0.21]. To visualize this interaction of the three continuous variables, we plotted a median split of the sample, dividing listeners based on whether they had smaller (lower 50th percentile) or larger (upper 50th percentile) midbrain volumes (Figure 5). The direction of brain–behavior correspondence differed within each of these anatomical sub-groups. For listeners with smaller brainstem anatomy, larger FFRRMS amplitudes to noise-degraded speech were associated with better (i.e., lower) QuickSIN scores (βstd = −0.12; Figure 5b). In contrast, in listeners with larger anatomy, larger FFRRMS amplitudes were associated with poorer (i.e., higher) QuickSIN scores (βstd = 0.49). Thus, behavioral SIN processing depended on a complex interaction between the anatomical and functional capacity of auditory midbrain structures.
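The median-split visualization reduces to grouping listeners by midbrain volume and fitting a standardized slope within each subgroup. A Python/NumPy sketch on synthetic data constructed to show the same crossover pattern (none of these values are the study's data):

```python
import numpy as np

def split_slopes(midbrain_vol, ffr_rms, quicksin):
    """Median-split listeners on midbrain volume and return the standardized
    FFR-QuickSIN slope (beta_std) within each anatomical subgroup."""
    z = lambda v: (v - v.mean()) / v.std()
    smaller = midbrain_vol <= np.median(midbrain_vol)
    slopes = {}
    for name, mask in [("smaller", smaller), ("larger", ~smaller)]:
        slopes[name] = np.polyfit(z(ffr_rms[mask]), z(quicksin[mask]), 1)[0]
    return slopes

# Synthetic sample with a built-in crossover interaction
rng = np.random.default_rng(4)
vol = rng.normal(7000, 500, 30)    # midbrain volume (mm^3), illustrative scale
ffr = rng.normal(0.08, 0.02, 30)   # FFR RMS amplitude (uV), illustrative scale
sign = np.where(vol <= np.median(vol), -1.0, 1.0)
quicksin = sign * (ffr - ffr.mean()) * 20 + rng.normal(0, 0.1, 30)

slopes = split_slopes(vol, ffr, quicksin)  # smaller: beta < 0; larger: beta > 0
```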

4. Discussion

In this multimodal imaging study, we examined structure–function–behavior links between brainstem (FFR) and cortical (ERP) responses to noise-degraded speech and anatomical properties of their corresponding auditory midbrain–cortical structures. Our data revealed three key findings: (1) a larger/thicker right (but not left) Heschl’s gyrus (HG) predicted better SIN listening performance; (2) HG morphology predicted the magnitude of listeners’ cortical responses to speech (ERPs); and (3) a structure–function interaction emerged at the subcortical level, whereby listeners with smaller brainstem anatomy but larger speech FFRs showed improved SIN perception.
Our cortical MRI volumetric data are consistent with previous reports showing larger density and leftward asymmetry of the auditory cortex [79,80,81]. Schneider et al. [41] reported HG volume ranging between ~900–2000 mm3 depending on listeners’ musical training. While professional musicians have reliably larger (right) HG volumes than non-musicians, amateur musicians (akin to the mean musicianship levels in our sample) did not differ from non-musicians in aggregated HG volumetrics. Peelle et al. [43] showed that, in older adults, gray matter volume in the right primary auditory cortex correlated with hearing acuity as measured by pure-tone audiometric thresholds; subtle hearing loss was associated with smaller gray matter volume. Importantly, control analyses confirmed hearing acuity was not associated with atrophy in motor regions, suggesting structure–function relationships were specific to auditory processing. We observed similar results for suprathreshold SIN perception.
Anatomical variability in auditory cortex has been related to several hearing behaviors including learning foreign speech sounds [83], tracking linguistic pitch patterns [84], melody discrimination [85], and speech [86] and musical sound [87] identification. In older adults, a reduced temporal lobe volume correlated with poorer SIN performance as measured by QuickSIN [88]. Our results broadly converged with these previous findings and demonstrated new links between auditory cortical morphology and complex hearing skills. We found that right transverse temporal gyrus morphology was related to improved (i.e., lower) QuickSIN scores (structure-behavior relation). In turn, larger ERP N1-P2 amplitudes to noise-degraded speech were associated with better QuickSIN (functional–behavior relation). As these brain–behavior correlations were right lateralized, this implies that more voluminous auditory cortical anatomy in the “non-linguistic” hemisphere supports better figure-ground speech perception. The critical role of right auditory–linguistic brain areas in speech and SIN perception has been appreciated in recent functional EEG studies [15,89,90] and may reflect compensatory responses of the non-dominant hemisphere to help decipher degraded/ambiguous speech signals.
Correlations between speech FFRs and midbrain volumetrics revealed a critical interaction between the strength of electrophysiological responses and anatomical size. We found that in listeners with smaller midbrain volumes (lower 50th percentile), stronger FFRs were associated with better QuickSIN scores. On the contrary, in listeners with larger midbrains (upper 50th percentile), stronger FFRs were related to worse QuickSIN scores. These findings suggest that when the auditory brainstem is of lesser size, functional encoding of degraded speech is bolstered to provide additional compensatory processing. In contrast, when midbrain anatomy is already large, further functional resources may not provide additional support to SIN perception, and may even reflect an inefficiency or redundancy in speech coding. Differences in neural organization, myelination, or synaptic density could imply that larger morphology within the auditory pathways does not necessarily yield better computational efficiency, as is often assumed, i.e., “bigger is not better” [91]. Alternatively, the FFR-MRI interactions observed here could highlight different listening strategies for SIN processing that rely on stronger anatomical support vs. functional encoding depending on a given listener’s profile, e.g., [87,92]. The data may also reflect quasi-ceiling/floor effects if listeners with larger anatomy are already near ceiling in neural encoding and thus enjoy minimal added benefit from additional neural resources.
Our EEG-MRI findings have ramifications for qualifying neuroplasticity observed in the auditory evoked potentials including the FFR. The degree to which brainstem responses capture voice pitch and harmonic timbre cues of complex signals relates to listeners’ perception of speech material [1,3,93]. FFRs also have high test–retest reliability and are among the most stable class of the auditory evoked potentials [94]. Beyond such normal variability, FFR amplitude can be further enhanced by various experiential factors including native language experience [95,96], music abilities [72,97], and perceptual learning [45,98], though not in all studies. FFR waveform enhancements are often interpreted to reflect plasticity in central auditory processing [30,32,72,96,97,99,100,101,102,103]. For example, several studies reported that musicians have superior SIN processing and correspondingly more robust FFRs [30,104,105]. Yet, other studies have failed to find musician SIN advantages either at the behavioral [97,106,107,108,109,110] or FFR level [111]. Our data show that the size of a listener’s midbrain anatomy interacts with the size of their functional FFR and its correspondence with perceptual SIN abilities. Thus, failures to replicate FFR-SIN effects may be related, at least in part, to unmeasured individual differences in midbrain anatomy, cf. [53]. This notion is supported by several MRI-M/EEG cortical findings showing that larger HG morphology can give rise to more robust auditory ERPs to speech and musical sounds [41,87]. We extended this prior work by demonstrating similar structure–function relations in the subcortex between auditory midbrain anatomy and FFR amplitude.
We note that the structure–function relations examined here are limited to amplitude measures, often interpreted as reflecting the “strength” of neural encoding. Timing characteristics of the FFR/ERPs (i.e., latency, coherence) [36,97,112,113] may be unrelated to structural volumetrics and are likely less biased by non-neural factors known to affect FFR/ERP amplitude (e.g., impedance, skull thickness, myogenic noise) [53]. Still, the fact that any portion of variance in ERP/FFR amplitude could be explained by structural brain morphology urges caution when interpreting amplitude measures of these responses and their relation to auditory perception [3,5,30,53].
Physical characteristics of a listener are known to modulate auditory EEG responses. Head size (e.g., circumference) has a well-known influence on auditory brainstem response amplitude and latency [114,115,116]. Stronger FFRs to linguistic pitch stimuli have been observed in speakers of tonal languages relative to their English-speaking peers [95,102]. However, individuals of East Asian descent have different head sizes and shapes [117] than Caucasian listeners cf. [118]. Insofar as head size relates to the brain volumetrics assessed here, it is conceivable that differences in the physical dimensions of the head could partly explain subtle group differences in FFRs reported previously. This account seems untenable by itself, however, given that experience-dependent enhancements in FFRs are still observable among listeners of similar cultural background [72,119]. FFR neuroplasticity is also maintained even when controlling for confounding listener demographics and recording variables that can artificially inflate FFR amplitudes [53,97]. Explanations of our data based on simple head-size differences are also unlikely given the interaction between midbrain volumetrics, FFR magnitudes, and QuickSIN (Figure 5b). Accounts based purely on EEG volume conduction principles [38,39] would, on the contrary, predict larger responses from larger anatomy across the board, which is not what we observe in our EEG-MRI data. Our data also showed no dependence on overall intracranial volume (eTIV).
Instead, our brainstem findings are most parsimoniously explained as functional processing (indexed via the FFR) layered atop fixed, structural predispositions in midbrain anatomy, which it can override depending on the perceptual demands of speech processing. Notably, our FFRs/ERPs were recorded during active SIN perception tasks. It is possible that the observed structure–function–behavior interactions of the auditory brainstem emerge only under active (attentional) rather than passive listening conditions e.g., [18]. Future studies are needed to test this possibility, along with whether anatomical variation in the brainstem moderates or mediates experience-dependent [72,95,97,102] and learning-related plasticity in FFR amplitude [45,98,120].

5. Conclusions

EEG-based FFR and ERP responses offer important neural indices of auditory neurophysiological function and of how brainstem and cortical levels of the hearing pathway enable robust speech perception in noise [3,4,6,15]. Our MRI data show that these two classes of speech-evoked potentials are related to anatomical volumetrics of the auditory midbrain and transverse temporal cortex (Heschl’s gyrus). A larger and thicker right (but not left) HG was related to listeners’ SIN abilities as well as the size of the cortical ERPs to speech. At the subcortical level, listeners with smaller midbrain volumes but larger speech FFRs showed better behavioral SIN performance. Collectively, our findings reveal that the magnitude of auditory brainstem and cortical electrophysiological responses, and their relation to SIN perception, are tied to the underlying anatomy of their neural generators.

Author Contributions

Conceptualization, G.M.B.; data collection, R.R., J.R.S. and J.A.M.; formal analysis, G.M.B.; validation, H.C.; writing, all authors; funding acquisition, G.M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This project was supported by the Indiana Clinical and Translational Sciences Institute (CTSI), funded in part by grant #UL1TR002529 from the National Institutes of Health, National Center for Advancing Translational Sciences, Clinical and Translational Sciences Award. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and protocols approved by the Indiana University Institutional Review Board (MRI: #15650, approved 15 July 2022; EEG: #14860, approved 7 April 2022).

Informed Consent Statement

Written informed consent was obtained from all participants involved in the study.

Data Availability Statement

The data presented in this study are not publicly available due to privacy or ethical restrictions. Participants did not consent to the public sharing of their data. Data are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bidelman, G.M. Communicating in challenging environments: Noise and reverberation. In Springer Handbook of Auditory Research: The Frequency-Following Response: A Window into Human Communication; Kraus, N., Anderson, S., White-Schwoch, T., Fay, R.R., Popper, A.N., Eds.; Springer Nature: New York, NY, USA, 2017. [Google Scholar]
  2. Anderson, S.; Parbery-Clark, A.; Yi, H.G.; Kraus, N. A neural basis of speech-in-noise perception in older adults. Ear Hear. 2011, 32, 750–757. [Google Scholar] [CrossRef]
  3. Song, J.H.; Skoe, E.; Banai, K.; Kraus, N. Perception of speech in noise: Neural correlates. J. Cogn. Neurosci. 2011, 23, 2268–2279. [Google Scholar] [CrossRef]
  4. Bidelman, G.M.; Momtaz, S. Subcortical rather than cortical sources of the frequency-following response (FFR) relate to speech-in-noise perception in normal-hearing listeners. Neurosci. Lett. 2021, 746, 135664. [Google Scholar] [CrossRef]
  5. Bidelman, G.M.; Davis, M.K.; Pridgen, M.H. Brainstem-cortical functional connectivity for speech is differentially challenged by noise and reverberation. Hear. Res. 2018, 367, 149–160. [Google Scholar] [CrossRef] [PubMed]
  6. Billings, C.J.; McMillan, G.P.; Penman, T.M.; Gille, S.M. Predicting perception in noise using cortical auditory evoked potentials. J. Assoc. Res. Otolaryngol. 2013, 14, 891–903. [Google Scholar] [CrossRef] [PubMed]
  7. Garrett, M.F.; Vasilkov, V.; Mauermann, M.; Devolder, P.; Wilson, J.L.; Gonzales, L.; Henry, K.S.; Verhulst, S. Deciphering compromised speech-in-noise intelligibility in older listeners: The role of cochlear synaptopathy. eNeuro 2025, 12, 1–20. [Google Scholar] [CrossRef] [PubMed]
  8. Lagacé, J.; Jutras, B.; Gagné, J.P. Auditory processing disorder and speech perception problems in noise: Finding the underlying origin. Am. J. Audiol. 2010, 19, 17–25. [Google Scholar] [CrossRef]
  9. Cunningham, J.; Nicol, T.; Zecker, S.G.; Bradlow, A.; Kraus, N. Neurobiologic responses to speech in noise in children with learning problems: Deficits and strategies for improvement. Clin. Neurophysiol. 2001, 112, 758–767. [Google Scholar] [CrossRef]
  10. Warrier, C.M.; Johnson, K.L.; Hayes, E.A.; Nicol, T.; Kraus, N. Learning impaired children exhibit timing deficits and training-related improvements in auditory cortical responses to speech in noise. Exp. Brain Res. 2004, 157, 431–441. [Google Scholar] [CrossRef]
  11. Putter-Katz, H.; Adi-Bensaid, L.; Feldman, I.; Hildesheimer, M. Effects of speech in noise and dichotic listening intervention programs on central auditory processing disorders. J. Basic Clin. Physiol. Pharmacol. 2008, 19, 301–316. [Google Scholar] [CrossRef]
  12. Dole, M.; Meunier, F.; Hoen, M. Functional correlates of the speech-in-noise perception impairment in dyslexia: An MRI study. Neuropsychologia 2014, 60, 103–114. [Google Scholar] [CrossRef]
  13. Dole, M.; Hoen, M.; Meunier, F. Speech-in-noise perception deficit in adults with dyslexia: Effects of background type and listening configuration. Neuropsychologia 2012, 50, 1543–1552. [Google Scholar] [CrossRef] [PubMed]
  14. Middelweerd, M.J.; Festen, J.M.; Plomp, R. Difficulties with speech intelligibility in noise in spite of a normal pure-tone audiogram. Audiology 1990, 29, 1–7. [Google Scholar] [CrossRef]
  15. Bidelman, G.M.; Howell, M. Functional changes in inter- and intra-hemispheric cortical processing underlying degraded speech perception. NeuroImage 2016, 124, 581–590. [Google Scholar] [CrossRef]
  16. Fitzgerald, M.B.; Ward, K.M.; Gianakas, S.P.; Smith, M.L.; Blevins, N.H.; Swanson, A.P. Speech-in-Noise Assessment in the Routine Audiologic Test Battery: Relationship to Perceived Auditory Disability. Ear Hear. 2024, 45, 816–826. [Google Scholar] [CrossRef]
  17. Fitzgerald, M.B.; Gianakas, P.; Qian, Z.J.; Losorelli, S.; Swanson, A.C. Preliminary guidelines for replacing word-recognition in quiet with speech in noise assessment in the routine audiologic test battery. Ear Hear. 2023, 44, 1548–1561. [Google Scholar] [CrossRef] [PubMed]
  18. Price, C.N.; Bidelman, G.M. Attention reinforces human corticofugal system to aid speech perception in noise. NeuroImage 2021, 235, 118014. [Google Scholar] [CrossRef]
  19. Bidelman, G.M.; Moreno, S.; Alain, C. Tracing the emergence of categorical speech perception in the human auditory system. NeuroImage 2013, 79, 201–212. [Google Scholar] [CrossRef] [PubMed]
  20. Picton, T.W.; Alain, C.; Woods, D.L.; John, M.S.; Scherg, M.; Valdes-Sosa, P.; Bosch-Bayard, J.; Trujillo, N.J. Intracerebral sources of human auditory-evoked potentials. Audiol. Neurootol. 1999, 4, 64–79. [Google Scholar] [CrossRef]
  21. Krishnan, A. Human frequency following response. In Auditory Evoked Potentials: Basic Principles and Clinical Application; Burkard, R.F., Don, M., Eggermont, J.J., Eds.; Lippincott Williams & Wilkins: Baltimore, MD, USA, 2007; pp. 313–335. [Google Scholar]
  22. Skoe, E.; Kraus, N. Auditory brain stem response to complex sounds: A tutorial. Ear Hear. 2010, 31, 302–324. [Google Scholar] [CrossRef]
  23. Bidelman, G.M. Subcortical sources dominate the neuroelectric auditory frequency-following response to speech. NeuroImage 2018, 175, 56–69. [Google Scholar] [CrossRef]
  24. Gorina-Careta, N.; Kurkela, J.L.O.; Hämäläinen, J.; Astikainen, P.; Escera, C. Neural generators of the frequency-following response elicited to stimuli of low and high frequency: A magnetoencephalographic (MEG) study. NeuroImage 2021, 231, 117866. [Google Scholar] [CrossRef]
  25. Coffey, E.B.J.; Nicol, T.; White-Schwoch, T.; Chandrasekaran, B.; Krizman, J.; Skoe, E.; Zatorre, R.J.; Kraus, N. Evolving perspectives on the sources of the frequency-following response. Nat. Commun. 2019, 10, 5036. [Google Scholar] [CrossRef]
  26. Bidelman, G.M. Multichannel recordings of the human brainstem frequency-following response: Scalp topography, source generators, and distinctions from the transient ABR. Hear. Res. 2015, 323, 68–80. [Google Scholar] [CrossRef]
  27. Ross, B.; Tremblay, K.L.; Alain, C. Simultaneous EEG and MEG recordings reveal vocal pitch elicited cortical gamma oscillations in young and older adults. NeuroImage 2020, 204, 116253. [Google Scholar] [CrossRef] [PubMed]
  28. López-Caballero, F.; Martin-Trias, P.; Ribas-Prats, T.; Gorina-Careta, N.; Bartrés-Faz, D.; Escera, C. Effects of cTBS on the frequency-following response and other auditory evoked potentials. Front. Hum. Neurosci. 2020, 14, 250. [Google Scholar] [CrossRef]
  29. Lucchetti, F.; Nonclercq, A.; Avan, P.; Giraudet, F.; Fan, X.; Deltenre, P. Subcortical neural generators of the envelope-following response in sleeping children: A transfer function analysis. Hear. Res. 2021, 401, 108157. [Google Scholar] [CrossRef]
  30. Parbery-Clark, A.; Skoe, E.; Kraus, N. Musical experience limits the degradative effects of background noise on the neural processing of sound. J. Neurosci. 2009, 29, 14100–14107. [Google Scholar] [CrossRef] [PubMed]
  31. Anderson, S.; Skoe, E.; Chandrasekaran, B.; Kraus, N. Neural timing is linked to speech perception in noise. J. Neurosci. 2010, 30, 4922–4926. [Google Scholar] [CrossRef] [PubMed]
  32. Coffey, E.B.J.; Chepesiuk, A.M.P.; Herholz, S.C.; Baillet, S.; Zatorre, R.J. Neural correlates of early sound encoding and their relationship to speech-in-noise perception. Front. Neurosci. 2017, 11, 479. [Google Scholar] [CrossRef]
  33. Parbery-Clark, A.; Marmel, F.; Bair, J.; Kraus, N. What subcortical-cortical relationships tell us about processing speech in noise. Eur. J. Neurosci. 2011, 33, 549–557. [Google Scholar] [CrossRef] [PubMed]
  34. Saiz-Alía, M.; Forte, A.E.; Reichenbach, T. Individual differences in the attentional modulation of the human auditory brainstem response to speech inform on speech-in-noise deficits. Sci. Rep. 2019, 9, 14131. [Google Scholar] [CrossRef]
  35. Skoe, E.; Kraus, N. Neural delays in processing speech in background noise minimized after short-term auditory training. Biology 2024, 13, 509. [Google Scholar] [CrossRef] [PubMed]
  36. Anderson, S.; White-Schwoch, T.; Parbery-Clark, A.; Kraus, N. Reversal of age-related neural timing delays with training. Proc. Natl. Acad. Sci. USA 2013, 110, 4357–4362. [Google Scholar] [CrossRef]
  37. Skoe, E.; Krizman, J.; Spitzer, E.; Kraus, N. The auditory brainstem is a barometer of rapid auditory learning. Neuroscience 2013, 243, 104–114. [Google Scholar] [CrossRef]
  38. Scherg, M. Fundamentals of dipole source potential analysis. In Auditory Evoked Magnetic Fields and Electric Potentials. Advances in Audiology; Grandori, F., Hoke, M., Romani, G.L., Eds.; Karger: Basel, Switzerland, 1990; pp. 40–69. [Google Scholar]
  39. Scherg, M.; Berg, P.; Nakasato, N.; Beniczky, S. Taking the EEG back into the brain: The power of multiple discrete sources. Front. Neurol. 2019, 10, 855. [Google Scholar] [CrossRef] [PubMed]
  40. Schneider, P.; Andermann, M.; Wengenroth, M.; Goebel, R.; Flor, H.; Rupp, A.; Diesch, E. Reduced volume of Heschl’s gyrus in tinnitus. NeuroImage 2009, 45, 927–939. [Google Scholar] [CrossRef]
  41. Schneider, P.; Scherg, M.; Dosch, H.G.; Specht, H.J.; Gutschalk, A.; Rupp, A. Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians. Nat. Neurosci. 2002, 5, 688–694. [Google Scholar] [CrossRef]
  42. Wengenroth, M.; Blatow, M.; Heinecke, A.; Reinhardt, J.; Stippich, C.; Hofmann, E.; Schneider, P. Increased volume and function of right auditory cortex as a marker for absolute pitch. Cereb. Cortex 2013, 24, 1127–1137. [Google Scholar] [CrossRef]
  43. Peelle, J.E.; Troiani, V.; Grossman, M.; Wingfield, A. Hearing loss in older adults affects neural systems supporting speech comprehension. J. Neurosci. 2011, 31, 12638–12643. [Google Scholar] [CrossRef]
  44. Coffey, E.B.J.; Musacchia, G.; Zatorre, R.J. Cortical correlates of the auditory frequency-following and onset responses: EEG and fMRI evidence. J. Neurosci. 2017, 37, 830–838. [Google Scholar] [CrossRef]
  45. Chandrasekaran, B.; Kraus, N.; Wong, P.C. Human inferior colliculus activity relates to individual differences in spoken language learning. J. Neurophysiol. 2012, 107, 1325–1336. [Google Scholar] [CrossRef]
  46. Oldfield, R.C. The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia 1971, 9, 97–113. [Google Scholar] [CrossRef]
  47. Tichko, P.; Skoe, E. Frequency-dependent fine structure in the frequency-following response: The byproduct of multiple generators. Hear. Res. 2017, 348, 1–15. [Google Scholar] [CrossRef]
  48. Gnanateja, G.N.; Rupp, K.; Llanos, F.; Remick, M.; Pernia, M.; Sadagopan, S.; Teichert, T.; Abel, T.J.; Chandrasekaran, B. Frequency-Following Responses to Speech Sounds Are Highly Conserved across Species and Contain Cortical Contributions. eNeuro 2021, 8, 1–20. [Google Scholar] [CrossRef] [PubMed]
  49. Riegel, J.; Schüller, A.; Wißmann, A.; Zeiler, S.; Kolossa, D.; Reichenbach, T. The cortical contribution to the speech-FFR is not modulated by visual information. bioRxiv 2026. [Google Scholar] [CrossRef]
  50. Schüller, A.; Schilling, A.; Krauss, P.; Rampp, S.; Reichenbach, T. Attentional modulation of the cortical contribution to the frequency-following response evoked by continuous speech. J. Neurosci. 2023, 43, 7429–7440. [Google Scholar] [CrossRef]
  51. Brugge, J.F.; Nourski, K.V.; Oya, H.; Reale, R.A.; Kawasaki, H.; Steinschneider, M.; Howard, M.A., 3rd. Coding of repetitive transients by auditory cortex on Heschl’s gyrus. J. Neurophysiol. 2009, 102, 2358–2374. [Google Scholar] [CrossRef] [PubMed]
  52. Kuwada, S.; Batra, R.; Maher, V.L. Scalp potentials of normal and hearing-impaired subjects in response to sinusoidally amplitude-modulated tones. Hear. Res. 1986, 21, 179–192. [Google Scholar] [CrossRef]
  53. Bidelman, G.M.; Sisson, A.; Rizzi, R.; MacLean, J.; Baer, K. Myogenic artifacts masquerade as neuroplasticity in the auditory frequency-following response. Front. Neurosci. 2024, 18, 1422903. [Google Scholar] [CrossRef]
  54. Musacchia, G.; Strait, D.; Kraus, N. Relationships between behavior, brainstem and cortical encoding of seen and heard speech in musicians and non-musicians. Hear. Res. 2008, 241, 34–42. [Google Scholar] [CrossRef] [PubMed]
  55. Chandrasekaran, B.; Kraus, N. The scalp-recorded brainstem response to speech: Neural origins and plasticity. Psychophysiology 2010, 47, 236–246. [Google Scholar] [CrossRef] [PubMed]
  56. Levitas, D.; Hayashi, S.; Vinci-Booher, S.; Heinsfeld, A.; Bhatia, D.; Lee, N.; Galassi, A.; Niso, G.; Pestilli, F. ezBIDS: Guided standardization of neuroimaging data interoperable with major data archives and platforms. Sci. Data 2024, 11, 179. [Google Scholar] [CrossRef]
  57. Fischl, B. FreeSurfer. NeuroImage 2012, 62, 774–781. [Google Scholar] [CrossRef]
  58. Reuter, M.; Rosas, H.D.; Fischl, B. Highly accurate inverse consistent registration: A robust approach. NeuroImage 2010, 53, 1181–1196. [Google Scholar] [CrossRef]
  59. Fischl, B.; van der Kouwe, A.; Destrieux, C.; Halgren, E.; Segonne, F.; Salat, D.H.; Busa, E.; Seidman, L.J.; Goldstein, J.; Kennedy, D.; et al. Automatically parcellating the human cerebral cortex. Cereb. Cortex 2004, 14, 11–22. [Google Scholar] [CrossRef] [PubMed]
  60. Desikan, R.S.; Segonne, F.; Fischl, B.; Quinn, B.T.; Dickerson, B.C.; Blacker, D.; Buckner, R.L.; Dale, A.M.; Maguire, R.P.; Hyman, B.T.; et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage 2006, 31, 968–980. [Google Scholar] [CrossRef] [PubMed]
  61. Dale, A.M.; Fischl, B.; Sereno, M.I. Cortical surface-based analysis. I. Segmentation and surface reconstruction. NeuroImage 1999, 9, 179–194. [Google Scholar] [CrossRef]
  62. Iglesias, J.E.; Van Leemput, K.; Bhatt, P.; Casillas, C.; Dutt, S.; Schuff, N.; Truran-Sacrey, D.; Boxer, A.; Fischl, B. Bayesian segmentation of brainstem structures in MRI. NeuroImage 2015, 113, 184–195. [Google Scholar] [CrossRef]
  63. Fischl, B.; Dale, A.M. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proc. Natl. Acad. Sci. USA 2000, 97, 11050–11055. [Google Scholar] [CrossRef]
  64. Fischl, B.; Sereno, M.I.; Dale, A.M. Cortical surface-based analysis. II: Inflation, flattening, and a surface-based coordinate system. NeuroImage 1999, 9, 195–207. [Google Scholar] [CrossRef] [PubMed]
  65. Reuter, M.; Schmansky, N.J.; Rosas, H.D.; Fischl, B. Within-subject template estimation for unbiased longitudinal image analysis. NeuroImage 2012, 61, 1402–1418. [Google Scholar] [CrossRef]
  66. Han, X.; Jovicich, J.; Salat, D.; van der Kouwe, A.; Quinn, B.; Czanner, S.; Busa, E.; Pacheco, J.; Albert, M.; Killiany, R.; et al. Reliability of MRI-derived measurements of human cerebral cortical thickness: The effects of field strength, scanner upgrade and manufacturer. NeuroImage 2006, 32, 180–194. [Google Scholar] [CrossRef] [PubMed]
  67. Fischl, B.; Sereno, M.I.; Tootell, R.B.H.; Dale, A.M. High-resolution intersubject averaging and a coordinate system for the cortical surface. Hum. Brain Mapp. 1999, 8, 272–284. [Google Scholar] [CrossRef]
  68. Buckner, R.L.; Head, D.; Parker, J.; Fotenos, A.F.; Marcus, D.; Morris, J.C.; Snyder, A.Z. A unified approach for morphometric and functional data analysis in young, old, and demented adults using automated atlas-based head size normalization: Reliability and validation against manual measurement of total intracranial volume. NeuroImage 2004, 23, 724–738. [Google Scholar] [CrossRef]
  69. Killion, M.C.; Niquette, P.A.; Gudmundsen, G.I.; Revit, L.J.; Banerjee, S. Development of a quick speech-in-noise test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am. 2004, 116, 2395–2405. [Google Scholar] [CrossRef] [PubMed]
  70. R-Core-Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020; Available online: https://www.R-project.org/ (accessed on 1 December 2025).
  71. Başoğlu, İ.; Belgin, E.; Ölçek, G.; Başoğlu, Y. Age and sex differences in speech FFR suggest early central auditory aging. Sci. Rep. 2025, 15, 40319. [Google Scholar] [CrossRef]
  72. Wong, P.C.; Skoe, E.; Russo, N.M.; Dees, T.; Kraus, N. Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat. Neurosci. 2007, 10, 420–422. [Google Scholar] [CrossRef] [PubMed]
  73. Draper, N.R.; Smith, H. Applied Regression Analysis, 3rd ed.; Wiley: New York, NY, USA, 1998. [Google Scholar]
  74. Schaefer, T.; Ecker, C. fsbrain: An R package for the visualization of structural neuroimaging data. bioRxiv 2020. [Google Scholar] [CrossRef]
  75. Alain, C.; Snyder, J.S.; He, Y.; Reinke, K.S. Changes in auditory cortex parallel rapid perceptual learning. Cereb. Cortex 2007, 17, 1074–1084. [Google Scholar] [CrossRef]
  76. Liu, L.F.; Palmer, A.R.; Wallace, M.N. Phase-locked responses to pure tones in the inferior colliculus. J. Neurophysiol. 2006, 95, 1926–1935. [Google Scholar] [CrossRef]
  77. Presacco, A.; Simon, J.Z.; Anderson, S. Evidence of degraded representation of speech in noise, in the aging midbrain and cortex. J. Neurophysiol. 2016, 116, 2346–2355. [Google Scholar] [CrossRef]
  78. Billings, C.J.; Bennett, K.O.; Molis, M.R.; Leek, M.R. Cortical encoding of signals in noise: Effects of stimulus type and recording paradigm. Ear Hear. 2010, 32, 53–60. [Google Scholar] [CrossRef] [PubMed]
  79. Schneider, P.; Sluming, V.; Roberts, N.; Scherg, M.; Goebel, R.; Specht, H.J.; Dosch, H.G.; Bleeck, S.; Stippich, C.; Rupp, A. Structural and functional asymmetry of lateral Heschl’s gyrus reflects pitch perception preference. Nat. Neurosci. 2005, 8, 1241–1247. [Google Scholar] [CrossRef]
  80. Jancke, L.; Steinmetz, H. Anatomical brain asymmetries and their relevance for functional asymmetries. In The Asymmetrical Brain; Davidson, R.J., Hugdahl, K., Eds.; MIT Press: Boston, MA, USA, 2003; pp. 187–229. [Google Scholar]
  81. Hutsler, J.J. The specialized structure of human language cortex: Pyramidal cell size asymmetries within auditory and language-associated regions of the temporal lobes. Brain Lang. 2003, 86, 226–242. [Google Scholar] [CrossRef]
  82. Du, Y.; Buchsbaum, B.R.; Grady, C.L.; Alain, C. Noise differentially impacts phoneme representations in the auditory and speech motor systems. Proc. Natl. Acad. Sci. USA 2014, 111, 7126–7131. [Google Scholar] [CrossRef]
  83. Golestani, N.; Molko, N.; Dehaene, S.; LeBihan, D.; Pallier, C. Brain structure predicts the learning of foreign speech sounds. Cereb. Cortex 2007, 17, 575–582. [Google Scholar] [CrossRef]
  84. Wong, P.C.; Warrier, C.M.; Penhune, V.B.; Roy, A.K.; Sadehh, A.; Parrish, T.B.; Zatorre, R.J. Volume of left Heschl’s Gyrus and linguistic pitch learning. Cereb. Cortex 2008, 18, 828–836. [Google Scholar] [CrossRef] [PubMed]
  85. Foster, N.E.; Zatorre, R.J. Cortical structure predicts success in performing musical transformation judgments. NeuroImage 2010, 53, 26–36. [Google Scholar] [CrossRef] [PubMed]
  86. Fuhrmeister, P.; Myers, E.B. Structural neural correlates of individual differences in categorical perception. Brain Lang. 2021, 215, 104919. [Google Scholar] [CrossRef]
  87. Mankel, K.; Shrestha, U.; Tipirneni-Sajja, A.; Bidelman, G.M. Functional plasticity coupled with structural predispositions in auditory cortex shape successful music category learning. Front. Neurosci. 2022, 16, 897239. [Google Scholar] [CrossRef]
  88. Jiang, K.; Albert, M.S.; Coresh, J.; Couper, D.J.; Gottesman, R.F.; Hayden, K.M.; Jack, C.R., Jr.; Knopman, D.S.; Mosley, T.H.; Pankow, J.S.; et al. Cross-Sectional Associations of Peripheral Hearing, Brain Imaging, and Cognitive Performance With Speech-in-Noise Performance: The Aging and Cognitive Health Evaluation in Elders Brain Magnetic Resonance Imaging Ancillary Study. Am. J. Audiol. 2024, 33, 683–694. [Google Scholar] [CrossRef]
  89. Price, C.N.; Alain, C.; Bidelman, G.M. Auditory-frontal channeling in α and β bands is altered by age-related hearing loss and relates to speech perception in noise. Neuroscience 2019, 423, 18–28. [Google Scholar] [CrossRef]
  90. Luthra, S.; Guediche, S.; Blumstein, S.E.; Myers, E.B. Neural substrates of subphonemic variation and lexical competition in spoken word recognition. Lang. Cogn. Neurosci. 2019, 34, 151–169. [Google Scholar] [CrossRef] [PubMed]
  91. Kanai, R.; Rees, G. The structural basis of inter-individual differences in human behaviour and cognition. Nat. Rev. Neurosci. 2011, 12, 231–242. [Google Scholar] [CrossRef]
  92. Rizzi, R.; Bidelman, G.M. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. Brain Res. 2024, 1844, 149166. [Google Scholar] [CrossRef] [PubMed]
  93. Berg, K.A.; Smith, S.B.; Gifford, E.H. Comparing the effects of vocoded speech on behavioural vowel recognition and the frequency following response. Int. J. Audiol. 2025, 64, 1164–1172. [Google Scholar] [CrossRef]
  94. Bidelman, G.M.; Pousson, M.; Dugas, C.; Fehrenbach, A. Test-retest reliability of dual-recorded brainstem vs. cortical auditory evoked potentials to speech. J. Am. Acad. Audiol. 2018, 29, 164–174. [Google Scholar] [CrossRef]
  95. Krishnan, A.; Gandour, J.T.; Bidelman, G.M. The effects of tone language experience on pitch processing in the brainstem. J. Neurolinguistics 2010, 23, 81–95. [Google Scholar] [CrossRef] [PubMed]
  96. Zhao, T.C.; Kuhl, P.K. Linguistic effect on speech perception observed at the brainstem. Proc. Natl. Acad. Sci. USA 2018, 115, 8716–8721. [Google Scholar] [CrossRef]
  97. Mankel, K.; Bidelman, G.M. Inherent auditory skills rather than formal music training shape the neural encoding of speech. Proc. Natl. Acad. Sci. USA 2018, 115, 13129–13134. [Google Scholar] [CrossRef]
  98. Reetzke, R.; Xie, Z.; Llanos, F.; Chandrasekaran, B. Tracing the trajectory of sensory plasticity across different stages of speech learning in adulthood. Curr. Biol. 2018, 28, 1419–1427.e4. [Google Scholar] [CrossRef] [PubMed]
  99. Kraus, N.; Skoe, E.; Parbery-Clark, A.; Ashley, R. Experience-induced malleability in neural encoding of pitch, timbre, and timing. Ann. N. Y. Acad. Sci. 2009, 1169, 543–557. [Google Scholar] [CrossRef]
  100. Musacchia, G.; Sams, M.; Skoe, E.; Kraus, N. Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc. Natl. Acad. Sci. USA 2007, 104, 15894–15898. [Google Scholar] [CrossRef]
  101. Krizman, J.; Marian, V.; Shook, A.; Skoe, E.; Kraus, N. Subcortical encoding of sound is enhanced in bilinguals and relates to executive function advantages. Proc. Natl. Acad. Sci. USA 2012, 109, 7877–7881. [Google Scholar] [CrossRef] [PubMed]
  102. Krishnan, A.; Xu, Y.; Gandour, J.T.; Cariani, P. Encoding of pitch in the human brainstem is sensitive to language experience. Brain Res. Cogn. Brain Res. 2005, 25, 161–168. [Google Scholar] [CrossRef] [PubMed]
  103. Kraus, N.; Slater, J.; Thompson, E.C.; Hornickel, J.; Strait, D.L.; Nicol, T.; White-Schwoch, T. Music enrichment programs improve the neural encoding of speech in at-risk children. J. Neurosci. 2014, 34, 11913–11918. [Google Scholar] [CrossRef]
  104. Hennessy, S.; Mack, W.J.; Habibi, A. Speech-in-noise perception in musicians and non-musicians: A multi-level meta-analysis. Hear. Res. 2022, 416, 108442. [Google Scholar] [CrossRef] [PubMed]
  105. Coffey, E.B.J.; Mogilever, N.B.; Zatorre, R.J. Speech-in-noise perception in musicians: A review. Hear. Res. 2017, 352, 49–69. [Google Scholar] [CrossRef]
  106. Boebinger, D.; Evans, S.; Rosen, S.; Lima, C.F.; Manly, T.; Scott, S.K. Musicians and non-musicians are equally adept at perceiving masked speech. J. Acoust. Soc. Am. 2015, 137, 378–387. [Google Scholar] [CrossRef]
  107. Madsen, S.M.K.; Whiteford, K.L.; Oxenham, A.J. Musicians do not benefit from differences in fundamental frequency when listening to speech in competing speech backgrounds. Sci. Rep. 2017, 7, 12624. [Google Scholar] [CrossRef]
  108. Ruggles, D.R.; Freyman, R.L.; Oxenham, A.J. Influence of musical training on understanding voiced and whispered speech in noise. PLoS ONE 2014, 9, e86980. [Google Scholar] [CrossRef]
  109. Yeend, I.; Beach, E.F.; Sharma, M.; Dillon, H. The effects of noise exposure and musical training on suprathreshold auditory processing and speech perception in noise. Hear. Res. 2017, 353, 224–236. [Google Scholar] [CrossRef]
  110. Escobar, J.; Mussoi, B.S.; Silberer, A.B. The effect of musical training and working memory in adverse listening situations. Ear Hear. 2020, 41, 278–288. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Behavioral speech-in-noise performance. (a) QuickSIN scores reflect signal-to-noise ratio thresholds for sentence recognition in noise. QuickSIN scores are plotted against listeners’ age and music training (which were not related to SIN performance). Dotted lines = variable means. (b) Plots of hearing acuity (PTA) and QuickSIN scores as a function of sex. Males and females had similar SIN performance and PTA. Clinically “normal” hearing is ≤25 dB HL. Dotted lines = n.s. regression. Shading = 95% CI.
Figure 2. Grand average ERPs and FFRs confirm noise-related changes in the neural encoding of speech. (a) Cortical ERPs. (b) Brainstem FFR time waveforms (left) and response spectra (right). Waveforms reflect activity at the Cz electrode (only vowel /a/ responses shown for clarity). The stimulus time waveform is shown in gray. Noise decreases amplitude and prolongs latency of neural responses including the N1-P2 of the ERPs and RMS amplitude of the FFR (inset bar charts). F0 = fundamental frequency; H1–H8 = harmonics. Error bars = ±1 SEM, * p < 0.05, *** p < 0.0001.
Figure 3. Structural asymmetries in auditory cortex relate to SIN perception. (top) Auditory HG (oval) and control (light colors; primary motor cortex) brain regions to assess correlations between structural morphology and QuickSIN scores. ROIs follow the DK atlas [60]. (middle) Cortical thickness in right (but not left) HG predicted QuickSIN performance. (bottom) HG volume was smaller in right vs. left hemisphere (inset). As with thickness, more voluminous right HG was associated with better QuickSIN scores. Structural properties of motor cortex did not correlate with SIN scores. Dotted lines = n.s. correlations; solid lines = significant correlations. Shading = 95% CI. * p < 0.05.
Figure 5. Structure–function–behavior relations underlying SIN processing. (a) Cortical data show relations between behavioral QuickSIN performance and (i) HG volume and (ii) ERP amplitudes. Larger cortical anatomy and electrophysiological responses were related to better SIN processing. (b) Brainstem data showing an FFR-MRI interaction. The interaction is dichotomized for visualization purposes only. For listeners with smaller midbrain volumes (lower 50th percentile; N = 15), stronger FFRs were associated with better QuickSIN scores. In contrast, in listeners with larger midbrains (upper 50th percentile; N = 15), stronger FFRs were associated with poorer QuickSIN. Shading = 95% CI.

Share and Cite

MDPI and ACS Style

Bidelman, G.M.; Stirn, J.R.; Rizzi, R.; MacLean, J.A.; Cheng, H. Auditory Brainstem–Cortical Anatomy Relates to the Magnitude of Frequency-Following Responses (FFRs) and Event-Related Potentials (ERPs) Coding Speech-in-Noise. Neuroimaging 2026, 1, 6. https://doi.org/10.3390/neuroimaging1010006
