Humans can listen to and detect a variety of sounds under different conditions. Specifically, the normal human behavioral frequency difference limen (FDL), the just-noticeable difference in frequency, ranges from 1.22% to 4.02% at 140 Hz, and from 0.25% to 2.50% across 80–400 Hz [1
]. However, the FDL measured by the electrical frequency-following response emanating directly from the brainstem neurons is even smaller (by about 75%) than the behavioral FDL [1
]. Accordingly, FDL might vary from person to person, and the brainstem can detect a smaller FDL than the behavioral perception. It would be interesting to investigate the factors for these discrepancies, and to find methods to improve the behavioral perception. To calculate FDL, Nelson, Stanton, and Freyman proposed a square-root function between log (FDL) and frequency [3
], whereas Micheyl, Xiao, and Oxenham modeled FDL as a joint function of stimulus parameters [4].
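The square-root relation described above can be written in the following general form (an illustrative sketch based only on the description in the text; the fitted coefficients $a$ and $b$ should be taken from the original sources):

$$\log(\mathrm{FDL}) = a\sqrt{f} + b$$

where $f$ is the stimulus frequency. The model of Micheyl, Xiao, and Oxenham [4] extends this dependence on frequency with additional terms for the stimulus duration $d$ and sensation level $s$.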
Regarding this, the stimulus presentation frequency (f
), duration (d
), and level (s
) would affect the just-noticeable difference in pure-tones [4
]. In particular, a greater decrease of FDL was found with a shorter tonal duration (d; 5 ms) and a higher sensation level (s; 80 dB SL); the effect remained unchanged at a tonal duration of 200 ms and was more pronounced at a low frequency (f; 200 Hz) [5
]. The gliding frequencies were also found to have a higher detection rate of frequency change, as compared with the discrete frequencies [7
]. These factors all concern how frequencies are presented and can decrease FDL. Yet, even when these controllable factors were managed, a discrepancy remained between the behavioral FDL and the FDL recorded from the frequency-following responses of brainstem neurons [1
]. Therefore, this study attempted to examine other factors, such as the visual–auditory interaction, synesthetic experience, and musical training, so as to minimize the FDL of the gliding frequencies, which is related to interactions with the visual system and attention allocation. Many researchers have conducted studies on FDL [8
], but most of them examined discrete frequency discrimination, which required participants to discriminate two separate pure-tones. There have been very few studies about the FDL of gliding frequencies [12
] and the features of the neural systems that are sensitive to detecting gliding frequencies (frequency shift detectors—FSDs) [13
The present study fills this literature gap and sheds light on the perception of gliding frequencies, as well as on approaches to overcoming the perceptual limit in auditory perception. For example, it would benefit the training of the ability to recognize a pitch without an external reference—namely, absolute pitch [15].
Studies have found that visual–auditory interactions are beneficial for increasing the accuracy of perception [16
]. Hirata and Kelly found a bigger improvement in phoneme learning when English speakers were allowed to look at the lip movement during Japanese audio training [17
]. In contrast, the improvement was minimal when the English speakers were shown hand gestures or listened to the audio alone. As this example demonstrates, combining visual and auditory information during perception can yield more accurate percepts. Among the attention models that explain the visual–auditory interaction, Wickens proposed a multiple resource model, in which different modalities are processed in different channels that facilitate each other, while activities within the same modality compete with each other and therefore hinder performance [18
]. Based on the rule of cross-modal similarities, Marks studied four inter-sensory relations, which were pitch-lightness, pitch-brightness, loudness-brightness, and pitch-object form [19
]. He demonstrated that the congruent presentation of auditory and visual stimuli led to a higher accuracy and faster response than the incongruent presentation. For example, he showed that during a discrete frequency discrimination task, pairing a higher frequency with a brighter visual stimulus resulted in a shorter reaction time than pairing with a dim visual stimulus. This phenomenon illustrated that when there is a similarity in the visual and auditory information, a faster and more accurate response can be obtained [19
]. Consistent with this early study, there is event-related potential (ERP) and functional magnetic resonance imaging (fMRI) evidence that auditory cortical responses could be enhanced when the tones are paired up to an attended visual stimulus [21
]. As demonstrated by the studies above, the congruent presentation of visual stimuli and auditory stimuli would result in a faster reaction time and higher response accuracy through the cross-modal interaction. The studies above revealed that different sensory modalities are interlinked and can facilitate responses. Therefore, visual cues can be a potential factor to facilitate auditory perception, in which a congruent visual cue might facilitate a better auditory perception. Therefore, this study applied the cross-modal interaction to test the minimization of FDL, through the interaction with visual modality. The cross-modal interaction is similar to the Stroop task (i.e., naming a color word that is printed in another color), in which the task performance is hindered by a mismatched condition between the meaning of the word and its color, while it is facilitated by a matched condition between the meaning of the word and its color [22
]. The current study targets the improvement of auditory perception in the presence of visual cues. As a congruent visual cue predicts a more accurate response, it was expected to help reduce the behavioral FDL.
Despite the benefits of cross-modal interaction, the visual–auditory interaction can, on certain occasions, distort perception. Take the sound-induced flash illusion as an example: when a single flash of light is paired with two auditory tones, it is perceived as two flashes [23
]. Similarly, the ventriloquist effect is the perception of a voice as coming from a direction other than its true source [24
]. Moreover, the McGurk effect is an example of merged visual–auditory information: hearing “ba” while watching lip movements for “ga” produces the intermediate percept “da” [25
]. In other words, this intermediate perception does not improve either the visual or auditory perception, but generates other percepts. As demonstrated by the experiments above, vision could sometimes distort the auditory perception. In contrast to the multiple resource theory that has been mentioned in the previous paragraph [18
], Kahneman proposed a single pool of attention resources (i.e., common resource theory [26
]), and early research indicated the shared capacity in processing visual and auditory discrimination by demonstrating the difficulties of discriminating pitch and light intensity at the same time [27
]. Recent research also found that visual tasks that demanded low attentional resources could improve auditory thresholds [28
]. Moreover, the evidence that visual and auditory processing occurred at the central level rather than the two peripheral mechanisms seemed to support the common central resources model [29
]. A recent study also suggested that visual and auditory attentional resources are shared, in that visual–spatial and auditory–spatial information did not facilitate performance [30
]. It revealed that resources are limited, and that visual and auditory processing are two interdependent mechanisms competing for central resources; consequently, they influence each other. For example, visual processing was impaired by concurrent spoken messages [31
]; and the perceived duration of effort and exertion during physical exercise was longer in people who used the visual and auditory senses together [32
]. Furthermore, the cross-modal Stroop task demonstrated that the hearing of auditory color words could distract and slow down people’s performance in a color naming task [33
]. In view of these studies, attention was shown to be limited and shared by different processing. Thus, to improve auditory processing, the common resource model would suggest allocating more attentional resources to auditory perception than visual perception, through visual deprivation. As resources are limited, it might be that humans would allocate more resources to auditory processing when vision is deprived. Studies have revealed that blindness can bring a range of improvements in auditory perception [34
]. Similarly, a study also showed that being blindfolded for ninety minutes enhanced performance in harmonicity perception. This could be explained by the metamodal model, in which the deprivation of visual inputs rapidly releases nonvisual inputs (i.e., auditory and tactile) from suppression, because the dominance of visual input over the striate cortex is halted [36
]. In an animal study, Petrus et al. found an improvement in the frequency selectivity and discrimination of the primary auditory cortex (A1) neurons after visual deprivation for 6–8 days [37
]. Another animal study also revealed that visual deprivation would refine the intra- and inter-laminar connections in the auditory cortex (A1) [38
]. In connection with this, Williams demonstrated that vision accounts for two-thirds of the brain’s electrical activity when the eyes are open [39
]. In view of this high consumption of resources in visual perception, blindfolding would suppress visual processing and allocate more attentional resources to auditory processing. From the review above, the common central resource model suggests that blindfolding would enhance auditory perception by channeling more resources to auditory processing. Therefore, blindfolding is another potential factor that may minimize FDL.
Synesthesia is an involuntary sensory experience where the stimulation of one modality evokes the sensation of another modality [40
]. In general, around 5% of the population have experienced one type of synesthesia [42
], such as auditory–tactile synesthesia, chromesthesia (sound-to-color synesthesia), grapheme-color synesthesia, and auditory–visual synesthesia. There are two hypotheses regarding the mechanisms of synesthesia. The cross-activation hypothesis suggests that the synesthetic experience is due to the excessive neural connections between the adjacent cortical areas [43
]. In contrast, the disinhibited-feedback hypothesis proposes that synesthesia is the consequence of the inhibition failure between the brain areas [44
]. No matter which mechanism is correct, the two sensory modalities are inter-linked, and a cross-modal interaction plays a part in synesthesia. This points to the possibility of improving one’s perception through the synesthetic experience from another modality. Indeed, there is evidence that the synesthetes have a better visual ability. By measuring the vividness of the mental image through the Vividness of Mental Imagery Questionnaire (VVIQ), the synesthetes shared a major characteristic of having a more vivid mental image than the non-synesthetes [45
]. Despite the fact that most of the subjects in the study were linguistic-color synesthetes, a few were colored music and visual-taste synesthetes. Furthermore, a study demonstrated that the participants with a high VVIQ score could be trained to acquire the grapheme-color synesthesia through associative learning, which involved extensive memory and reading exercises [46
]. Although auditory perception was not involved in this study, it revealed that a synesthetic experience could be induced in non-synesthetes. It also suggested that non-synesthetes with a high VVIQ score could be trained to become “synesthetes” and acquire the advantages of a cross-modal interaction in auditory perception. Synesthetic experiences involving one modality might favor a better performance in another modality through a stronger association between the modalities. A neurological study found increased activation in the left inferior parietal cortex (IPC) of auditory–visual synesthetes when compared to non-synesthetes [47
]. As IPC is responsible for multimodal integration and feature binding, the researchers believed that the auditory–visual synesthetes had a more enhanced sensory integration ability than the non-synesthetes. Accordingly, a synesthetic experience might facilitate a better visual–auditory interaction and thus improve auditory perception. In addition, in the case of a visual flash causing auditory synesthetic experiences, the synesthetes demonstrated an excellent ability in a difficult visual task involving rhythmic temporal patterns [48
]. In this case, the advantage of performing the visual task was not only owing to the visual system, but also because of the “hearing” of the rhythmic temporal patterns in the auditory perception. Therefore, when two senses interacted and intertwined together, they benefited each other. It is worth examining whether a visual synesthetic experience would facilitate or impair the frequency discrimination. With reference to the cross-modal interaction, it could enhance the auditory discrimination ability and result in a smaller FDL. The study of visual synesthetic experience would provide pragmatic information about the inter-sensory processing and clarify the argument between cross-modal perception and unimodal perception in synesthesia.
Musical training has been found to be a crucial factor for improving auditory perception. Compared with non-musicians, musicians have an FDL about half as large, detect pitch changes earlier, and discriminate frequencies better [49
]. A previous experiment demonstrated that the pitch-discrimination threshold for musician participants was roughly one-sixth of that for non-musician participants [51
]. It further indicated that the non-musician participants needed at least 14 h of training to attain a pitch-discrimination threshold similar to that of the musician participants [51
]. Therefore, ordinary people can acquire an enhanced pitch-discrimination ability if they receive musical training. As indicated by a larger amplitude of N2b and P3 responses during attentive listening, professional musicians also showed a faster and more accurate pitch detection than non-musicians [52
]. Furthermore, musicians seemed to have a different neuroanatomy from non-musicians, such as an increased amount of grey matter, and therefore their neural encoding of sound is superior [53
]. Therefore, musical training might somehow “train” the brain to acquire better auditory abilities. Besides the visual–auditory interaction, musical training can be another important factor for improving the hearing experience. The present study examined whether musical training would expand the limit of auditory perception—specifically, the effect of musical training on the FDL of gliding frequencies. Although Gottfried and Riester already demonstrated that music students perform better in pitch-glide identification tests, their results were based on accuracy, not FDL [14
]. Thus, the present study could provide information of the FDL of gliding frequencies.
The present study aimed to investigate the effects of visual cues, blindfolding, synesthetic experience, and musical training on behavioral FDLs. Both multiple resource and common resource models imply an advantage of these four factors for frequency discrimination. Therefore, it was hypothesized that either providing visual cues or minimizing visual inputs could reduce FDL. Moreover, given the stronger communication between the visual and auditory modalities in synesthesia, a smaller FDL was expected to be found in participants with a synesthetic experience. Finally, it was hypothesized that the participants with musical training would have a smaller FDL.
2. Materials and Methods
Ninety university students (37 males and 53 females), aged 17 to 25 years (M = 20.32 years, standard deviation (SD) = 1.30), were recruited for this study. All of the participants provided informed consent and completed a self-report background questionnaire before the experiment, in order to confirm normal hearing and visual ability.
Nine participants’ data were screened out because of program errors (i.e., the frequency range was not reduced after a correct answer or was reduced after an incorrect answer).
The experimental setup was adapted from the study of Demany, Carlyon, and Semal (2009) [54
]. The experiment was conducted in a quiet environment. All of the tested frequencies were sinusoidal waveform pure-tones that were generated through MATLAB and delivered through Earpods. An Asus UX430U laptop with the Realtek High Definition Audio and WDM audio device was used in this experiment. The reference frequencies were 110, 440, and 1760 Hz, with a sound pressure level (SPL) of 65 dB. These three reference frequencies were chosen because they represent the second octave, fourth octave (middle octave), and the sixth octave in the piano [55
], which constitute the common frequency range in a music piece. Specifically, 440 Hz was chosen because it is the standard orchestral tuning reference [56].
All of the initial target frequencies were set a semitone higher or lower than the reference frequency, as a semitone is the basic detectable frequency difference in music. In this study, FDL was defined as the difference between the reference frequency and the initial target frequency. Table 1
shows all of the reference frequencies and the initial target frequencies in this study.
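Under equal temperament, a semitone corresponds to a frequency ratio of 2^(1/12); the initial targets implied by the three reference frequencies can therefore be sketched as follows (values rounded to two decimals for illustration; the exact targets used in the experiment are those listed in Table 1):

```python
SEMITONE = 2 ** (1 / 12)  # equal-tempered semitone ratio, ~1.0595

references_hz = [110.0, 440.0, 1760.0]
# (one semitone below, one semitone above) for each reference frequency
initial_targets = {
    f: (round(f / SEMITONE, 2), round(f * SEMITONE, 2)) for f in references_hz
}
```

For example, the 440 Hz reference yields initial targets of about 415.30 Hz (down) and 466.16 Hz (up).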
In each condition, as gliding frequencies have a higher detection rate of frequency change than discrete frequencies [12
], the two pure-tone frequencies in each trial were arranged to glide smoothly upwards or downwards (in random order) over 750 ms, instead of being separated by a silent interval. Within the 750 ms, the first 250 ms presented the reference frequency, the middle 250 ms the glide, and the last 250 ms the target frequency. An unlimited inter-stimulus interval was allowed in each trial for responding. A schematic representation of the stimuli is shown in Figure 1.
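The paper generated its stimuli in MATLAB; as an illustrative Python stand-in (assuming a 44.1 kHz sampling rate and a linear glide trajectory, neither of which is specified in the text), a phase-continuous version of the 250 ms reference / 250 ms glide / 250 ms target structure could look like this:

```python
import numpy as np

def glide_stimulus(f_ref, f_target, fs=44100, seg_dur=0.25):
    """750 ms stimulus: 250 ms reference tone, 250 ms glide, 250 ms target
    tone. Phase is accumulated across segments so the frequency changes
    smoothly, without clicks at the segment boundaries."""
    n = int(fs * seg_dur)
    inst_freq = np.concatenate([
        np.full(n, f_ref),                # steady reference frequency
        np.linspace(f_ref, f_target, n),  # linear glide (assumed shape)
        np.full(n, f_target),             # steady target frequency
    ])
    phase = 2.0 * np.pi * np.cumsum(inst_freq) / fs  # integrate frequency
    return np.sin(phase)

# 440 Hz reference gliding up one semitone to ~466.16 Hz
tone = glide_stimulus(440.0, 466.16)
```

Integrating the instantaneous frequency (the `cumsum` line) rather than concatenating three separately generated sinusoids avoids phase discontinuities that listeners would hear as clicks.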
Interventions and questionnaires were applied in order to investigate the effects of the four factors on FDL. In particular, the effects of visual cues and blindfolding were examined by interventions, while the effects of synesthesia and music experience were examined through questionnaires.
The design of visual cues was adapted from the study by Ben-Artzi and Marks (1995), which revealed a positive relationship between the pitch change and the position of the dot [20
]. To minimize the effort of visually tracking discrete dots, this study adopted a continuous straight line as the visual cue. This line showed the gliding direction while a participant listened to the sound track and identified the change in frequency. The schematic representation of the visual cues is shown in Figure 2
. To prevent participants from noticing the answers directly from the cues, both congruent and incongruent cues were presented to them. For the congruent cue, the line went up or down in the middle of the screen, according to the gliding direction in the sound track; for the incongruent cue, the line went in the opposite direction to the gliding direction in the sound track. To prevent habituation and to indicate the start of the next trial, a fixation cross (+) was flashed at the beginning of each trial.
Participants in the blindfolded group were instructed to complete the frequency discrimination test with an eye mask.
To assess the synesthetic experiences, this study adopted the VVIQ and the Projector–Associator Test (PAT) from the Synesthesia Battery on https://www.synesthete.org/
. This battery is a standardized test for investigating and studying synesthesia [40].
The VVIQ scale consists of 32 five-point Likert-scale questions, which evaluate the vividness of the mental imagery. The Cronbach’s alpha for the VVIQ scale was 0.92 in this study.
The PAT consists of 12 five-point Likert-like items, which measure the types of synesthetic experiences. The Cronbach’s alpha for the PAT was 0.88. Those of the projector scale and the associator scale were 0.78 and 0.83, respectively.
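Cronbach’s alpha values like those reported above are computed from the item variances and the variance of the total scores; a minimal sketch (Python, assuming a respondents × items matrix of Likert scores) is:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (n_respondents, k_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_variances / total_variance)
```

With perfectly consistent items the formula returns 1.0; values around 0.8–0.9, as reported for the VVIQ and PAT here, indicate high internal consistency.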
In addition to VVIQ and PAT, the background information of the participants, the presence of a perfect/absolute pitch, the years of musical training, and the presence of visual–auditory synesthetic experience were collected.
Before the experiment started, a frequency discrimination test (Lutman, 2004) was applied to ensure the ability to discriminate frequencies [57
]. This test contained 14 trials and required participants to choose the higher tone between the reference frequency (500 Hz) and the target frequency (seven trials with a 5% change and seven trials with a 2% change). Participants needed at least seven accurate responses to pass the test. In this study, all of the participants passed this screening test (M = 91.14% accuracy, SD = 1.45).
After that, five practice trials were given to demonstrate the procedure and to ensure that all of the participants were confident with the experimental operations. All of the participants were required to complete both an experimental session and a control session, and were randomly assigned to one of the sessions first. For the experimental session, the participants were randomly divided into two conditions—visual cues and blindfolding. In the visual-cue experimental session, a total of 180 trials were presented to each participant, comprising fifteen trials gliding upwards and fifteen gliding downwards at each of three frequency levels (low/middle/high) and two types of visual cues (congruent/incongruent). All of the trials were randomly presented.
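As a quick arithmetic check, the 180 trials in the visual-cue session follow directly from the factorial design described above:

```python
directions = 2        # gliding upwards / downwards
trials_per_cell = 15  # trials per direction within each cell
freq_levels = 3       # low / middle / high reference frequency
cue_types = 2         # congruent / incongruent visual cue
total_trials = directions * trials_per_cell * freq_levels * cue_types
# 2 * 15 * 3 * 2 = 180
```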
In the blindfolded experimental session, a total of 90 trials were presented to each participant, comprising fifteen trials gliding upwards and fifteen gliding downwards at each of three frequency levels (low/middle/high). All of the trials were randomly presented, and the participants wore an eye mask throughout the session.
After finishing the experimental session, the VVIQ and PAT questionnaires were distributed.
Next, the control session was given with a similar procedure to the blindfolded condition, except that the participants were instructed to focus on the fixation cross (+) on the screen. To eliminate the habituation effect, the fixation cross was flashed once before every trial. Ninety trials were randomly presented, and were different from those of the blindfolded experimental session.
In each trial, participants were asked to discriminate whether there was an upward change, downward change, or no change in the tone’s frequency. To respond, they pressed the up arrow (↑), down arrow (↓), or right arrow (→) on the keyboard when they judged the sound track to be increasing in frequency, decreasing, or unchanged, respectively. For each correct answer, the frequency change (the difference between the reference frequency and the target frequency) was halved; each incorrect response increased the change by one-fourth of the original change.
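The adaptive rule above can be sketched as follows (one reading, treated here as an assumption: “the original change” refers to the initial semitone difference set at the start of the staircase):

```python
def update_difference(current_diff, initial_diff, correct):
    """Adaptive staircase step for the frequency-change magnitude (Hz).

    Correct response: halve the current reference-target difference.
    Incorrect response: add back one-fourth of the initial difference
    (interpretation of "the original change"; an assumption).
    """
    if correct:
        return current_diff / 2.0
    return current_diff + initial_diff / 4.0

# e.g., starting a semitone above 440 Hz (initial difference ~26.16 Hz)
d = 26.16
d = update_difference(d, 26.16, correct=True)  # difference halved after a hit
```

This asymmetric up/down rule converges on a difference near the participant’s threshold, which is how the behavioral FDL is estimated.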