Visual Deprivation Alters Functional Connectivity of Neural Networks for Voice Recognition: A Resting-State fMRI Study

Humans recognize one another by identifying their voices and faces. For sighted people, the integration of voice and face signals in corresponding brain networks plays an important role in facilitating the process. However, individuals with vision loss primarily resort to voice cues to recognize a person’s identity. It remains unclear how the neural systems for voice recognition reorganize in the blind. In the present study, we collected behavioral and resting-state fMRI data from 20 early blind (5 females; mean age = 22.6 years) and 22 sighted control (7 females; mean age = 23.7 years) individuals. We aimed to investigate the alterations in the resting-state functional connectivity (FC) among the voice- and face-sensitive areas in blind subjects in comparison with controls. We found that the intranetwork connections among voice-sensitive areas, including amygdala-posterior “temporal voice areas” (TVAp), amygdala-anterior “temporal voice areas” (TVAa), and amygdala-inferior frontal gyrus (IFG) were enhanced in the early blind. The blind group also showed increased FCs of “fusiform face area” (FFA)-IFG and “occipital face area” (OFA)-IFG but decreased FCs between the face-sensitive areas (i.e., FFA and OFA) and TVAa. Moreover, the voice-recognition accuracy was positively related to the strength of TVAp-FFA in the sighted, and the strength of amygdala-FFA in the blind. These findings indicate that visual deprivation shapes functional connectivity by increasing the intranetwork connections among voice-sensitive areas while decreasing the internetwork connections between the voice- and face-sensitive areas. Moreover, the face-sensitive areas are still involved in the voice-recognition process in blind individuals through pathways such as the subcortical-occipital or occipitofrontal connections, which may benefit the visually impaired greatly during voice processing.


Introduction
Humans recognize a person's identity primarily by their face and voice. Functional magnetic resonance imaging (fMRI) studies in human and nonhuman primates have revealed a set of cortical areas specialized for face processing [1][2][3][4][5]. Face-sensitive areas of humans are most selectively and reliably located in two regions, namely, the fusiform learn faster in voice-recognition training, recognize learned speakers with higher accuracy, and respond faster than their sighted counterparts [35,42]. Their superior performance in voice recognition persists even two weeks after training [43]. However, it remains unclear whether the heightened voice-recognition ability in blind individuals is due to altered functional connectivity (FC) in the neural substrates for voice processing.
To investigate the plastic changes in the FC patterns of the voice perception network in blind compared to sighted participants, the present study quantitatively evaluated the internal FCs of different subareas involved in voice processing, the FCs between voice-sensitive and face-sensitive areas, and whether the FC changes among these areas could predict the superior voice-recognition ability in early blind individuals. In light of our recent study revealing a strong language familiarity effect in voice recognition in blind individuals [44], we included both Chinese and Japanese materials to verify the significant effect of recognizing voices spoken in a nonnative language. Findings from this study will provide insights into how visual deprivation affects the voice and face neural systems that process identity information in different sensory modalities and how the affected systems cooperate functionally during speaker recognition.

Participants
Twenty early blind adults (EB; 5 females; mean age = 22.6 years; age range: 18-35 years) were recruited from the Special Education College at Beijing Union University and local communities in Beijing. All the blind participants had complete vision loss or no more than rudimentary sensitivity for brightness differences with no pattern vision. Four had become completely blind no later than the age of four and the others were congenitally blind (see Supplementary Table S1 for a full description). We also recruited 22 sighted control participants (SC; 7 females; mean age = 23.7 years; age range: 20-38 years) who were matched to the blind participants for age, educational level, and musical experience. All participants were right-handed according to the Edinburgh Handedness Inventory [45]. All the blind and sighted participants reported normal hearing and no history of neurological or psychiatric disorders. All participants were native Chinese speakers with no prior experience of Japanese. The study was approved by the Institutional Review Board of Peking University (approval code: #2015-12-06). Each participant provided written informed consent to their participation in the experiment.

Stimuli and Procedure of the Behavioral Experiment
Stimulus material for the voice-recognition task consisted of 15 Mandarin Chinese sentences and 15 Japanese sentences, which were selected from a corpus used in our previous study [46]. The number of syllables across sentences in both languages was kept at 15 on average. The average duration of sentences was 2695 ms (SD = 55 ms) for Chinese (CN) and 2676 ms (SD = 55 ms) for Japanese (JP). Sentences were read naturally by five female native speakers of each language, resulting in a total of 150 stimuli. All the sentences were perceived as having no discernible idiosyncratic talker characteristics (e.g., unusual phonetic or prosodic properties such as creaky voice). The 16-bit digital audio recordings were sampled at 44.1 kHz. The stimulus materials were volume balanced using Praat software (http://www.fon.hum.uva.nl/praat/, accessed on 2 February 2017) and were presented over headphones.
We adopted a well-established paradigm [42,43,47,48] for the speaker recognition experiment to assess the voice-recognition ability of our participants. Each of the blind and sighted participants performed the speaker recognition experiment in both language conditions (CN and JP), and the order of language was counterbalanced across participants. In each language condition, the experiment consisted of four sessions: familiarization phase, practice phase, generalization phase (GP), and delayed memory phase (DP) (Figure 1). The tests in the first three phases were conducted in order on the same day and the delayed memory phase was set after two weeks. Each participant was tested individually in a quiet room.
Brain Sci. 2023, 13, 636

Figure 1. Illustration of behavioral experimental design. (a) Familiarization phase: participants heard a number designating the speaker followed by a training sentence read by that speaker. Then participants pressed the key to begin the next trial. (b) Practice phase: after hearing a sentence, participants were asked to type in the number of the speaker. Correct responses were followed by a cue tone ("Ding"). Incorrect responses were followed by a cue tone ("DiDi"), then the correct number of the speaker was presented. (c) Generalization phase (GP): after hearing a sentence, participants were asked to enter the number of the speaker on the keyboard without feedback. (d) Delayed memory phase (DP): the stimuli and procedure were the same as in the generalization phase.

• Familiarization phase: The familiarization phase was introduced first to help participants associate the speakers with the corresponding voices. In each trial, each participant heard a number designating the speaker (i.e., No. 1-5) followed by 1 of the 5 training sentences read by that speaker. Trials were blocked by sentences. Each sentence was read by all five speakers with two repetitions. Thus, each participant heard 5 sentences × 5 speakers × 2 times = 50 trials in total.

• Practice phase: After familiarization, participants were trained to identify the voice of each speaker. The sentence stimuli were the same as those presented in the familiarization phase, but after hearing a sentence, participants were asked to enter the number of the speaker on the keyboard. Correct responses were followed by a cue tone ("Ding"). If the answer was incorrect, the correct number of the speaker was announced to remind the participant. Each participant heard 5 sentences × 5 speakers × 5 times = 125 trials in total.

• Generalization phase (GP): After practicing, each participant was asked to recognize the voices of 10 novel sentences read by the same 5 speakers as in the practice phase without feedback. Each participant heard 10 sentences × 5 speakers × 1 time = 50 trials in total. Their accuracy in this phase was computed to measure their voice-recognition ability.

• Delayed memory phase (DP): Two weeks later, the participants returned to the lab and performed the same task as in the generalization phase. This retention test allowed us to examine the possible difference in voice memory ability between the blind and the sighted groups.
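The trial counts given for the four phases follow from the sentences × speakers × repetitions design. The sketch below restates that arithmetic and adds a simple accuracy scorer. The names `PHASES`, `trial_count`, and `accuracy` are hypothetical and not taken from the study's materials, and the 20% chance level is only the consequence of choosing among five speakers.

```python
# Illustrative sketch: the counts come from the phase descriptions above;
# the data structure and function names are hypothetical.
PHASES = {
    # phase: (sentences, speakers, repetitions)
    "familiarization": (5, 5, 2),
    "practice": (5, 5, 5),
    "generalization": (10, 5, 1),
    "delayed_memory": (10, 5, 1),  # same task as the generalization phase
}

N_SPEAKERS = 5
CHANCE_LEVEL = 1 / N_SPEAKERS  # expected accuracy when guessing among five speakers


def trial_count(phase):
    """Number of trials in a phase: sentences x speakers x repetitions."""
    sentences, speakers, repetitions = PHASES[phase]
    return sentences * speakers * repetitions


def accuracy(responses, targets):
    """Proportion of trials on which the entered speaker number was correct."""
    if len(responses) != len(targets) or not targets:
        raise ValueError("responses and targets must be non-empty and equally long")
    correct = sum(r == t for r, t in zip(responses, targets))
    return correct / len(targets)


for phase in PHASES:
    print(phase, trial_count(phase))
```

Accuracy in the generalization and delayed memory phases would then be compared against `CHANCE_LEVEL` and between groups.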
Functional volumes were preprocessed and analyzed using SPM12 (Wellcome Department of Imaging Neuroscience, London, UK) and the Data Processing Assistant for Resting-State fMRI pipeline analysis (DPARSF) [49] implemented in MATLAB (MathWorks). The initial 10 functional volumes were discarded to allow for signal stabilization and the subject's adaptation to the environment. The preprocessing of the remaining 230 volumes included: (1) slice timing correction for acquisition timing differences, (2) realignment of the functional images to correct for head motion and coregistration of functional and anatomical data, (3) regression of nuisance covariates including the Friston 24 head-motion parameters [50], white matter signal, cerebrospinal fluid signal, and linear trends, (4) spatial normalization of the realigned images into Montreal Neurological Institute (MNI) space using the parameters from the DARTEL algorithm for anatomical image processing [51], with resampling to 2 × 2 × 2 mm³, (5) spatial smoothing using a 4 mm FWHM Gaussian kernel, and (6) band-pass filtering (0.01-0.10 Hz) to reduce the effect of low-frequency drift and high-frequency noise.
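As an illustration of step (3), the Friston 24-parameter model expands the 6 realignment parameters of each volume into 24 regressors: the 6 parameters, the same 6 parameters at the previous volume, and the squares of those 12 values. The sketch below is not the DPARSF implementation. The function name `friston24`, the column order, and the zero padding used for the volume preceding the first one are assumptions made for the example.

```python
def friston24(motion):
    """Expand T rows of 6 realignment parameters into T rows of 24 regressors.

    Column order (an assumption of this sketch): current parameters,
    previous-volume parameters, then the squares of those 12 values.
    """
    if any(len(row) != 6 for row in motion):
        raise ValueError("each row must hold 6 motion parameters")
    expanded = []
    previous = [0.0] * 6  # assumed padding: no volume precedes the first one
    for row in motion:
        current = [float(v) for v in row]
        base = current + previous
        expanded.append(base + [v * v for v in base])
        previous = current
    return expanded


# Toy input: two volumes with 3 translations and 3 rotations each.
regressors = friston24([[1, 2, 3, 4, 5, 6], [2, 0, 0, 0, 0, -1]])
print(len(regressors), len(regressors[0]))
```

These 24 columns would be entered in the nuisance regression alongside the white matter signal, cerebrospinal fluid signal, and linear trend.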

Seed-Based FC Analysis
To explore the reorganization of the specific FCs between voice- and face-sensitive areas, we performed seed-based FC analyses and compared them across the blind and the sighted groups. As the selectivity of voice recognition is particularly pronounced in the right hemisphere [52,53], our analyses focused on the voice- and face-sensitive regions in the right hemisphere.
Drawing on earlier research that identified the voice-sensitive regions in the human auditory cortex, known as the "temporal voice areas" [17], we defined two "voice patches" along the right STS/STG as regions-of-interest (ROIs): the right posterior "temporal voice areas" (TVAp, MNI coordinate: x = 42, y = −35, z = 3) and the right anterior "temporal voice areas" (TVAa, MNI coordinate: x = 55, y = −2, z = −7). Considering that both temporal and extra-temporal regions play important roles in performing a voice-recognition task, we also selected the right amygdala (MNI coordinate: x = 20, y = −8, z = −12) [17] and the right inferior frontal gyrus (IFG, to be exact, the posterior triangularis; MNI coordinate: x = 53, y = 26, z = 26) [9], both of which show reliable voice sensitivity. The face-sensitive areas were identified based on a quantitative meta-analysis of fMRI studies on sighted participants [9], including the right FFA (MNI coordinate: x = 41, y = −53, z = −19) and the right OFA (MNI coordinate: x = 40, y = −81, z = −5) as ROIs. We also ran an exploratory analysis of the voice/face-sensitive areas in the left hemisphere (see Supplementary Table S2 for detailed descriptions of the ROIs).
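For orientation, the six right-hemisphere ROI centers listed above can be turned into the 6 mm spherical ROIs used in the ROI-to-ROI analysis roughly as follows. This is a sketch rather than the authors' code. It assumes voxel centers lie on multiples of 2 mm along each MNI axis (matching the 2 × 2 × 2 mm resampling), and the names `ROI_CENTERS` and `sphere_voxels` are hypothetical.

```python
import math
from itertools import product

# Right-hemisphere ROI centers in MNI space (mm), as listed in the text.
ROI_CENTERS = {
    "TVAp": (42, -35, 3),
    "TVAa": (55, -2, -7),
    "amygdala": (20, -8, -12),
    "IFG": (53, 26, 26),
    "FFA": (41, -53, -19),
    "OFA": (40, -81, -5),
}

VOXEL_MM = 2   # resampled voxel size
RADIUS_MM = 6  # sphere radius of each ROI


def sphere_voxels(center, radius=RADIUS_MM, step=VOXEL_MM):
    """Return grid points (mm) whose distance to `center` is at most `radius`.

    Assumption: voxel centers sit on multiples of `step` along each axis.
    """
    ranges = [
        range(math.ceil((c - radius) / step), math.floor((c + radius) / step) + 1)
        for c in center
    ]
    voxels = []
    for i, j, k in product(*ranges):
        point = (i * step, j * step, k * step)
        # Compare squared distances to stay in exact integer arithmetic.
        if sum((p - c) ** 2 for p, c in zip(point, center)) <= radius ** 2:
            voxels.append(point)
    return voxels


roi_voxels = {name: sphere_voxels(center) for name, center in ROI_CENTERS.items()}
for name, voxels in roi_voxels.items():
    print(name, len(voxels))
```

The number of voxels per sphere depends on how each center falls relative to the assumed grid, so the counts can differ slightly between ROIs.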
In the ROI-to-ROI FC analyses, 6 mm radius spheres were created centered on the coordinates of the 6 ROIs, and the time course for each seed was extracted by averaging the time courses of all voxels in the ROI for each participant. Then, the synchrony of the time series between the 6 ROIs was assessed by Pearson's correlation coefficients, which were transformed into Fisher z-scores. Next, we performed two-sample t-tests to examine the differences between the transformed correlation coefficients of the two groups. To describe the relationship between the reorganization of FCs and voice-recognition ability, brain-behavior correlations were computed, with the p-values corrected for multiple comparisons by the False Discovery Rate (FDR) method. The correlation coefficients were compared between the two groups according to the method proposed by Diedenhofen & Musch (2015) [54].
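The chain of steps in this paragraph (Pearson correlation, Fisher z-transform, two-sample t-test, FDR correction) can be sketched with the standard library alone. The text does not state which t-test variant or FDR procedure was used, so the equal-variance Student t statistic and the Benjamini-Hochberg procedure below are assumptions, and all function names and toy values are hypothetical. The t-test returns only the statistic and its degrees of freedom; the p-value lookup is left to a statistics package.

```python
import math
from itertools import combinations

ROIS = ["TVAp", "TVAa", "amygdala", "IFG", "FFA", "OFA"]
ROI_PAIRS = list(combinations(ROIS, 2))  # 6 ROIs give 15 unique connections


def pearson_r(x, y):
    """Pearson correlation between two equally long time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher r-to-z transform, z = atanh(r)."""
    return math.atanh(r)


def two_sample_t(a, b):
    """Student's t statistic (equal variances assumed) and degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    pooled = (ssa + ssb) / (na + nb - 2)
    t = (ma - mb) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def fdr_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: one connection, z-scored FC values for two small groups.
z_blind = [fisher_z(r) for r in (0.42, 0.55, 0.61, 0.48)]
z_sighted = [fisher_z(r) for r in (0.20, 0.31, 0.28, 0.35)]
print(two_sample_t(z_blind, z_sighted))
```

In a full analysis, `pearson_r` would be applied to the mean time courses of each pair in `ROI_PAIRS` for every participant before the group comparison.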

Behavioral Results
The behavioral accuracy data in the generalization phase and delayed memory phase indicated the two participant groups' voice-recognition ability across the different conditions (Figure 2). The group difference was analyzed using a three-way repeated-measures ANOVA with Time (GP, DP) and Language (CN, JP) as within-subject factors, and Group (SC, EB) as a between-subject factor. The ANOVA results revealed a significant main effect of Group (F(1, 40) = 5.439, p = 0.025, ηp² = 0.120), indicating that overall, the blind participants performed better than their sighted counterparts. There was also a significant main effect of Language (F(1, 40) = 42.472, p < 0.001, η

Changes in Functional Connectivity among the Voice- and Face-Sensitive Areas in the Early Blind
To delineate the alterations in the functional connectivity among voice- and face-sensitive areas in the blind subjects, we compared the average FCs of the six seeds between the sighted and early blind groups using two-sample t-tests (Figure 3a,b, Supplementary Table S3).
Within the voice-recognition network, the early blind group exhibited significantly higher FCs than the sighted group between the amygdala and following regions: TVAp (t    Supplementary Table S4 for a full description).

Correlations between Voice-Recognition Ability and the Strengths of FC
We conducted Pearson's correlations between voice-recognition ability and the strength of functional connectivity for each group. The results showed that a stronger connectivity between TVAp and FFA was associated with better voice recognition only in the sighted participants (CN-GP: r-SC = 0.640, p-FDR = 0.015; r-EB = −0.014, p-FDR > 0.05; r-SC > r-EB, p = 0.021). On the other hand, in the blind participants, a stronger connectivity between the amygdala and FFA was associated with better voice-recognition performance (CN-GP: r-EB = 0.607, p-FDR = 0.075; r-SC = −0.315, p-FDR > 0.05; r-EB > r-SC, p = 0.002; JP-DP: r-EB = 0.765, p-FDR < 0.001; r-SC = 0.395, p-FDR > 0.05; r-EB > r-SC, p = 0.077). Refer to Figure 4 and Tables S5-S7 for more details.
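The between-group comparisons of correlations reported above can be approximated with Fisher's z-test for two independent correlations. The text does not say which test from [54] was applied, so this choice is an assumption, as are the use of the full group sizes (22 sighted, 20 blind) for every correlation and the function name `compare_independent_r`. Under those assumptions, the two-tailed p-values come out close to the reported ones.

```python
import math
from statistics import NormalDist


def compare_independent_r(r1, n1, r2, n2):
    """Fisher z-test for correlations from two independent groups.

    Returns the z statistic and the two-tailed p-value.
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


N_SC, N_EB = 22, 20  # group sizes from the Participants section (assumed complete)

# (label, r in first group, n, r in second group, n), values taken from the text
comparisons = [
    ("TVAp-FFA, CN-GP (SC vs EB)", 0.640, N_SC, -0.014, N_EB),
    ("amygdala-FFA, CN-GP (EB vs SC)", 0.607, N_EB, -0.315, N_SC),
    ("amygdala-FFA, JP-DP (EB vs SC)", 0.765, N_EB, 0.395, N_SC),
]

for label, r1, n1, r2, n2 in comparisons:
    z, p = compare_independent_r(r1, n1, r2, n2)
    print(f"{label}: z = {z:.3f}, p = {p:.3f}")
```

The test is symmetric in the two groups, so swapping their order changes only the sign of z.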

Although the correlations did not remain significant after correction for multiple comparisons (p uncorrected < 0.05, p-FDR > 0.05), there was a tendency for stronger FCs of TVAp/TVAa-amygdala and TVAa-FFA/OFA to be associated with better voice-recognition performance only in the sighted group, and for stronger FCs of amygdala-OFA and FFA-OFA to be associated with better voice-recognition performance only in the early blind group (Figure 4a, Supplementary Tables S5-S7).

Discussion
In the present study, we investigated how vision loss shaped the neural substrates for voice recognition by resting-state fMRI in early blind individuals and sighted controls. Behavioral results replicated previous findings on the superiority of voice retention memory [43] and the significant effect of language familiarity on voice recognition in the blind group [44]. ROI-wise functional connectivity analyses evidenced a significant enhancement in the functional coupling between the amygdala and TVAp/TVAa/IFG in the early blind. We also found stronger FCs between FFA/OFA and IFG but weaker FCs between FFA/OFA and TVAa in blind than in sighted participants. Furthermore, we analyzed the correlations between FCs and voice-recognition accuracy in each group. Our results showed that better behavioral performance was associated with stronger FC between TVAp and FFA only in sighted individuals but stronger FC between the amygdala and FFA only in early blind individuals.

Enhanced Internal Connections of Voice Perception Network in the Early Blind
Recognizing a person by voice involves multiple processes. Correspondingly, a large network of distributed brain areas is involved in the processing of voice identity, including not only temporal voice areas as the core parts but also subcortical (such as the amygdala) and prefrontal cortices as the extended regions [15,17,18]. One of the key findings in the present study is that the intranetwork connections among the voice-sensitive areas were enhanced in the blind group, indicating reorganization within the intact voice-recognition system associated with visual impairment. Moreover, our results showed that the alterations in the voice perception network were not confined to TVAs but also included the extended parts of the network, especially the amygdala. The amygdala is involved in the processing of emotional voices in the blind [37]. Some evidence has suggested that the amygdala is also associated with the processing of voice and face traits regardless of the affective characteristics [17,55]. A recent fMRI study in patients with primary visual cortex impairment has confirmed that the amygdala is involved in the processing of socially salient but emotionally neutral facial expressions [56]. In the current study, emotionally neutral stimuli were used, and more accurate (GP) and delayed (DP) performances were associated with a stronger connection between the amygdala and TVA, thus providing further support for the role of the amygdala in speaker identity recognition irrespective of emotional valence.
We also observed that the FC between the amygdala and IFG was enhanced in the blind group. The inferior frontal regions are involved in recognizing learned-familiar persons [9,57], and extensive evidence has indicated that the basolateral complex of the amygdala projects to many regions (e.g., the prefrontal cortex and hippocampus) associated with learning and memory [58][59][60]. The enhanced pathway between the amygdala and IFG observed in this study, therefore, might be a neural basis for the enhanced ability to establish and consolidate the link between voice trait and identity in blind people. This result is corroborated by previous evidence that blind individuals learn faster in voice-recognition training [35,42,61] and are more accurate in delayed voice-identity recognition compared with sighted counterparts [43]. Taken together, the strengthened intranetwork functional connectivity between the distributed voice-sensitive areas might play a critical role in the voice recognition of the early blind. More specifically, the amygdala appeared to be a key component in the voice perception network.

Reorganization of the Internetwork Connections between the Voice- and Face-Sensitive Areas in the Early Blind
Neuropsychological and neuroimaging studies provide mounting evidence for the multimodal integration of facial and vocal information during identity processing [26,28]. Voice- and face-sensitive areas are functionally and anatomically connected for transferring the identity information during voice recognition [19,22-24]. The exchange of information between the two systems facilitates identity processing in sighted people [27,62]. The findings of the current study in the sighted group are consistent with previous work by showing a positive association between the FC of TVAp-FFA and voice-recognition performance.
More importantly, we found that the strengths of the FC between FFA/OFA and TVAa were reduced in the blind group, indicating the absence of crossmodal integration of facial and vocal information due to visual deprivation. Similarly, it was reported that auditory deprivation would introduce a significant reduction of fractional anisotropy and increment of radial diffusivity in the V2/V3- and FFA-TVA connections [25]. We speculate that vision loss in blind individuals disrupts the visual input to FFA, leading to the absence of crossmodal sensory integration in the FFA-TVA pathways and the consequent reduced connectivity in the FFA-TVA pathways. Our speculation is further supported by consistent findings across previous studies that the TVA (particularly the anterior part) as an association area receives identity information (such as gender or age) conveyed both by facial and vocal stimuli [24,26,63,64].
Taken together, our result that the functional connectivity between the voice- and face-sensitive areas promoted voice identity processing is consistent with the well-documented "integrative model" of personal recognition in sighted people [27,62,65], but the absence of crossmodal sensory integration induced by visual deprivation leads to reduced coupling between the voice- and face-sensitive areas in early blind individuals.

Neuroplastic Changes of the Face-Sensitive Areas in the Early Blind
The blind group outperformed the sighted group in the delayed memory phase (but not in the generalization phase) during the voice-recognition task. It is possible that early blindness promoted the long-term memory consolidation of speaker identity. Indeed, we found that blind participants' performance during the delayed memory phase was positively correlated with the FC between the amygdala and FFA. Previous studies have provided strong evidence for the direct white matter pathway [66] and high functional coupling between the amygdala and FFA [67,68]. A meta-analysis study revealed that the superficial subregion nucleus of the amygdala and FFA were primarily involved in cognitive memory [69]. Our data suggest that the efficiency of functional connectivity between the amygdala and FFA may modulate the long-term memory storage for voice through the retained pathways in the early blind.
Moreover, the FC between FFA and OFA was associated with voice-recognition accuracy only in the blind group. Given that the network of face perception was composed of distributed patches such as FFA and OFA [10] and that the FC between FFA and OFA plays a critical role in face perception among sighted people [70], our result indicates that face-sensitive areas can retain their functional selectivity in blind people [30,32,71]. This is consistent with previous findings that the FFA could be activated by auditory-only voice recognition without corresponding face training in sighted people [19,23]. In addition, a recent fMRI study using multivoxel pattern analysis and functional cortical mapping techniques demonstrated that blind individuals could develop category selectivity (face, body, etc.) in the ventral-temporal cortex which was strikingly similar to the sighted controls [72]. However, given that visual impairment disrupts cortical processing of facial properties, it remains inconclusive whether the disrupted face system was dedicated to early or late stages of voice processing or both.
Meanwhile, we observed that FCs between FFA/OFA and IFG were enhanced in the blind group. The inferior frontal areas are considered as extended parts of both the voice perception network [17,18,73] and the face perception network [10]. The frontal regions associated with voice recognition are directly adjacent to the regions involved in face recognition [9]. Further investigations are needed to clarify the precise role of IFG and its subregions in voice perception.

Conclusions
The clear group differences in the current behavioral and resting-state fMRI data reveal plastic changes in the neural substrates for voice recognition associated with early visual deprivation. Specifically, the internal links of the intact voice system were enhanced, while the connections between the core part of the voice system and the disrupted face system were decreased in the early blind. Despite visual deprivation in blind individuals, intrinsic brain activities independent of experimental tasks showed that the face system was not excluded from the processing of personal identity; instead, it was found to be actively involved in voice recognition via the connections between the core face-sensitive areas (e.g., FFA and OFA) and the amygdala/IFG. These findings are in line with the "metamodal" theory that the two systems conduct similar computational operations in face and voice processing during functional reorganization [28], which may facilitate blind individuals' talker identity recognition and their adaptation to the social environment in daily life.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/brainsci13040636/s1, Table S1: Characteristics of the early blind participants; Table S2: Region coordinates for defined regions of interest in the left hemisphere; Table S3: The results of group comparisons between the sighted and blind participants in the strength of functional connectivity; Table S4: The results of group comparisons between the sighted control and early blind subjects in the strength of functional connectivity in the left hemisphere; Table S5: Correlation of voice-recognition accuracy and the strength of functional connectivity in the sighted; Table S6: Correlation of voice-recognition accuracy and the strength of functional connectivity in the blind; Table S7: The comparisons of correlations between the sighted control and early blind groups.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Peking University (approval code: #2015-12-06).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study and written informed consent has been obtained from the subjects to publish this paper.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author, W.P., upon reasonable request.