Sound Quality Factors Inducing the Autonomous Sensory Meridian Response

The acoustical characteristics of auditory triggers often recommended on Internet platforms to generate the autonomous sensory meridian response (ASMR) were investigated by parameterizing their sound qualities following Zwicker's procedure and by calculating autocorrelation function (ACF) and interaural cross-correlation function (IACF) parameters. For 20 triggers (10 human-generated and 10 nature-generated sounds), scores (on a five-point Likert scale) of the ASMR, perceived loudness, perceived pitch, comfort, and perceived closeness to the sound image were obtained from 26 participants by questionnaire. The results show that the human-generated sounds were more likely to trigger a stronger ASMR than the nature-generated sounds, and the primary psychological aspect relating to the ASMR was the perceived closeness, with triggers perceived as closer to the listener having higher ASMR scores. The perceived closeness was explained by the loudness and roughness (among Zwicker's parameters) for the nature-generated sounds and by the interaural cross-correlation coefficient (IACC) (among the ACF/IACF parameters) for the human-generated sounds. Nature-generated sounds with higher loudness and roughness and human-generated sounds with a lower IACC were likely to evoke the ASMR sensation.


Introduction
The autonomous sensory meridian response (ASMR) is an atypical sensory phenomenon in which individuals experience a tingling, static sensation across the scalp and back of the neck in response to specific triggering audio and visual stimuli or to light touch [1]. This sensation is widely reported to promote relaxation, wellbeing, and sleep, and there are many ASMR-related channels on YouTube. Some researchers have examined the relationship between the ASMR and misophonia [2][3][4]. Misophonia is an auditory disorder of decreased tolerance to specific sounds or their associated stimuli, such as oral sounds (e.g., loud breathing, chewing, swallowing), clicking sounds (e.g., keyboard tapping, finger tapping, windshield wipers), and sounds associated with movement (e.g., fidgeting) [5][6][7][8]. ASMR triggers produce positive emotions associated with an increase in wellbeing, while misophonia triggers produce negative emotions associated with fight-or-flight responses. Although the displayed emotions are opposite, both are commonly caused by hypersensitivity to sound triggers, and it is possible that the acoustical characteristics of ASMR triggers may explain the occurrence mechanism of misophonia. Indeed, a previous study reported that people who experience the ASMR are more likely to be at risk of misophonia [2].
Several common audio and visual stimuli (triggers) that induce the ASMR are known, and an online ASMR experience questionnaire completed by 475 individuals identified the most common trigger types as whispering (75%), personal attention (69%), crisp sounds (64%), and slow movements (53%), where the percentages are the proportions of participants reporting the ASMR experience for each trigger [1]. Following this questionnaire, many studies on the ASMR have empirically selected such likely triggers [9][10][11][12][13]. However, it is not clear which physical characteristics of these triggers induce the ASMR.
In the case of audio signals, numerical models have been proposed to define the sound quality. The perceptual characteristics of sound are loudness, pitch, and timbre, and the sound quality is generally expressed by numerical algorithms based on the varying sound pressure. As an example, Zwicker's parameters (loudness, sharpness, roughness, and fluctuation strength) have been used to evaluate the sound quality of environmental noise [14]. The loudness is the psychological sound intensity, and it is calculated by transforming the frequency onto the Bark scale, considering the effects of frequency and temporal masking, and integrating the area of the loudness pattern [15]. The loudness of a pure tone with a frequency of 1 kHz and a sound pressure level of 40 dB is defined as 1 sone. The sharpness is a measure of the sound acuity and high-frequency content, and is obtained by applying a weighting function to the specific loudness [16]. The sharpness of a noise at 60 dB in a critical band at 1 kHz is defined as 1 acum. The roughness is a fundamental hearing sensation caused by sound with rapid amplitude modulation (15-300 Hz) and is quantified on the basis of the modulation frequency and depth of the time-varying loudness [16]. The roughness of a 1 kHz tone at 60 dB with 100% amplitude modulation (modulation depth of 1) at 70 Hz is defined as 1 asper. The fluctuation strength is similar in principle to roughness except that it quantifies the subjective perception of slower (up to 20 Hz) amplitude modulation of a sound, and it is calculated from the modulation frequency and depth of the time-varying loudness [16]. The fluctuation strength produced by a 1 kHz tone at 60 dB with 100% amplitude modulation at 4 Hz is defined as 1 vacil.
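The sone scale mentioned above follows a simple doubling rule: each 10 phon increase above 40 phon doubles the loudness in sone. A minimal illustration of this standard conversion (not the full Zwicker loudness model, which requires the Bark-scale specific-loudness pattern):

```python
def phon_to_sone(phon):
    """Convert loudness level (phon) to loudness (sone).

    Uses the standard relation N = 2**((L_N - 40) / 10), valid above
    roughly 40 phon; a 1 kHz pure tone at 40 dB SPL is 40 phon = 1 sone.
    """
    return 2.0 ** ((phon - 40.0) / 10.0)

print(phon_to_sone(40))  # 1.0 sone (reference tone)
print(phon_to_sone(50))  # 2.0 sone: +10 phon doubles the loudness
```

This relation only maps a loudness level to the sone scale; computing the loudness level itself from a waveform requires the full model of [15].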
The other procedure for evaluating sound quality uses the autocorrelation and interaural cross-correlation functions (ACF and IACF), frequently applied to music and concert hall acoustics [17]. Our auditory perceptions are deeply related to the timing of nerve firings caused by binaurally detected sounds, and the ACF and IACF are modeled as processors of the auditory nerve [18,19]. Three parameters can be calculated from ACF analyses of monaurally recorded sound: (1) the delay time of the maximum peak (τ1), (2) the amplitude of the first maximum peak (φ1), and (3) the width of the peak at the origin of the delay [WΦ(0)] (see Section 2.2 for details). τ1 and φ1 correspond, respectively, to the fundamental frequency (1/τ1 Hz) and the pitch strength of the sound. WΦ(0) reflects the spectral centroid of the original signal, with longer and shorter values corresponding, respectively, to lower and higher centroids of the spectral energy. These ACF parameters explain not only the musical motif suitable for a specific concert hall [17] but also the annoyance induced by noise [20,21] and speech intelligibility [22,23]. From IACF analyses of binaurally recorded sound, the interaural cross-correlation coefficient (IACC) can be calculated (see Section 2.1 for details). The IACC is the maximum peak amplitude of the IACF within a delay time of ±1 ms. The IACC is related to the subjective sound diffuseness: a higher IACC corresponds to the listener perceiving a well-defined direction of the incoming sound, whereas a lower IACC corresponds to a well-diffused sound. Such ACF and IACF parameters have also been used for the evaluation of several types of noise [24][25][26][27].
The present study identified physical factors that induce the auditory-based ASMR sensation using the four Zwicker parameters and four ACF/IACF parameters. We prepared a total of 20 sound motifs likely to induce the ASMR and calculated the eight sound quality parameters. To confirm the occurrence of the ASMR, previous studies have adopted physiological (e.g., functional magnetic resonance imaging or heart rate) [11,28,29] and psychological (e.g., questionnaires) [1,9,10,12,13] procedures. The present study adopted the psychological approach, with participants quantifying the degree of the perceived ASMR on a five-point Likert scale. In addition to the ASMR, the participants scored four subjective sensations (subjective loudness, pitch, comfort, and closeness) at the same time. We examined the correlation of the ASMR scores with the four subjective sensations and eight sound quality parameters.

ASMR Triggers and Sound Quality Parameters
The 10 auditory ASMR triggers (human-generated sounds) used in the study are listed in Table 1, together with 10 healing sounds (nature-generated sounds), recorded binaurally, that were added for comparison. The human- and nature-generated sounds were obtained from several websites and music distribution sites, respectively. The human-generated sounds were recorded with a dummy-head microphone or a wearable binaural microphone. Although no information on the recording devices was available for the nature-generated sounds, the participants of this study could perceive sound images close to them with binaural hearing. For convenience, both types of sound are referred to as triggers. The human- and nature-generated sounds respectively represent sounds generated by human behavior (e.g., the cutting of vegetables and typing at a keyboard) and natural phenomena (e.g., waves and rain). The time length of each trigger was 50 s, and the sound energy was set at the same equivalent continuous A-weighted sound pressure level (LAeq) of 45 dBA.

Table 1 lists the sound quality parameters. The Zwicker parameters were calculated using Matlab commands embedded in the Auditory Toolbox [30]. The calculation algorithms were based on the literature [14][15][16]. The calculations of roughness and fluctuation strength had running steps of 0.5 ms and 2 ms, respectively, along the time length of 50 s, and Table 1 lists the average values of the time-varying parameters. The ACF parameters were calculated from the normalized ACF

φ_ll(τ) = Φ_ll(τ) / Φ_ll(0),

where

Φ_ll(τ) = (1/2T) ∫_s^{s+2T} p_l(t) p_l(t + τ) dt.

Here, τ is the delay time [s], s is the running step [s], 2T is the integration interval [s], and p_l(t) is the sound in the left channel at time t after passing through an A-weighting network. The ACF parameters were (1) the delay time of the maximum peak (τ1), (2) the amplitude of the first maximum peak (φ1), and (3) the width of the peak at τ = 0 [WΦ(0)], calculated by doubling the delay time at which the normalized ACF becomes 0.5 times its value at the origin of the delay (Figure 1a).
Additionally, τ1 and φ1 are related to the pitch (high or low) and pitch strength (clear or ambiguous) perceived in the periodic part of the sound. WΦ(0) is equivalent to the spectral centroid, and a sound with a greater WΦ(0) is thus perceived as having a lower pitch in the noisy part.
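The extraction of τ1, φ1, and WΦ(0) from one analysis frame can be sketched in numpy as follows. This is a minimal single-frame illustration, not the paper's Matlab implementation: it omits the A-weighting and the running 1 s windows, and the peak-picking heuristic (first maximum after the ACF first goes negative) is an assumption for clean periodic signals.

```python
import numpy as np

def acf_parameters(p, fs):
    """Extract tau_1 [s], phi_1, and W_phi(0) [s] from one frame.

    p  : 1-D array of sound pressure samples (A-weighting omitted here)
    fs : sampling rate in Hz
    Assumes the ACF crosses 0.5 and goes negative within the frame.
    """
    n = len(p)
    # normalized autocorrelation: phi(tau) = Phi(tau) / Phi(0)
    acf = np.correlate(p, p, mode="full")[n - 1:]
    acf = acf / acf[0]
    # W_phi(0): twice the delay at which the ACF first falls below 0.5
    w_phi0 = 2.0 * np.flatnonzero(acf < 0.5)[0] / fs
    # tau_1, phi_1: first maximum peak after the ACF first goes negative
    start = np.flatnonzero(acf < 0.0)[0]
    k = start + np.argmax(acf[start:])
    return k / fs, acf[k], w_phi0

fs = 16000
t = np.arange(fs) / fs
tau1, phi1, w = acf_parameters(np.sin(2 * np.pi * 440 * t), fs)
# tau1 is close to 1/440 s (fundamental period); phi1 is close to 1,
# indicating the strong pitch of a pure tone
```

For a pure tone the recovered fundamental frequency is 1/τ1 ≈ 440 Hz, matching the interpretation of τ1 given above.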

The IACC was calculated from the normalized IACF

φ_lr(τ) = Φ_lr(τ) / √(Φ_ll(0) Φ_rr(0)),

where

Φ_lr(τ) = (1/2T) ∫_s^{s+2T} p_l(t) p_r(t + τ) dt.

Here, Φ_rr is the ACF for the right channel and p_r(t) is the A-weighted sound in the right channel. The IACC is the maximum peak amplitude of the IACF within a delay time of ±1 ms (Figure 1b). The IACC is related to the subjective sound diffuseness mentioned in the Introduction. The integration interval (2T) and running step (s) were 1 s and 0.5 s, respectively, for both the ACF and IACF calculations, and Table 1 lists the average values of the time-varying parameters.
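A direct numpy sketch of the IACC definition, searching the normalized interaural cross-correlation over the ±1 ms delay range (single frame, A-weighting omitted; an illustration of the definition rather than the study's implementation):

```python
import numpy as np

def iacc(pl, pr, fs):
    """Maximum amplitude of the normalized interaural cross-correlation
    for delays within +/-1 ms, per the IACC definition."""
    max_lag = int(1e-3 * fs)
    norm = np.sqrt(np.dot(pl, pl) * np.dot(pr, pr))
    vals = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(pl[:len(pl) - lag], pr[lag:])
        else:
            c = np.dot(pl[-lag:], pr[:len(pr) + lag])
        vals.append(abs(c) / norm)
    return max(vals)

fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(fs)
# identical left/right channels give IACC of about 1 (well-defined image);
# independent noise in each ear gives IACC near 0 (diffuse sound)
```

Identical channels peak at zero delay with IACC of about 1, while statistically independent left and right signals yield a value near zero, matching the diffuseness interpretation above.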

Participants
We recruited 26 participants (20 men and 6 women; age: 21.7 ± 0.4 years) who had normal hearing. All participants self-reported that they knew of the ASMR through watching Japanese YouTube channels. The institutional ethics committee approved the experimental protocol (approval code: R3-19).

Tasks and Procedures
After listening to each ASMR trigger (50 s) binaurally through headphones (HD598, Sennheiser, Wedemark, Germany), the participants were instructed to provide scores on a five-point Likert scale in the subsequent 10 s. The LAeq at the ear positions was adjusted to 45 dBA: after mounting the headphones on a head and torso simulator (type 4128; Brüel & Kjaer, Naerum, Denmark), the output level was adjusted so that the average of the left and right channels was 45 dBA. The participants were asked to give scores (−2, −1, 0, 1, or 2) for the degree of perceived loudness (from −2: not so loud to 2: very loud), perceived pitch (from −2: very low to 2: very high), comfort (from −2: not so comfortable to 2: very comfortable), perceived closeness to the sound image (from −2: very far to 2: very close), and ASMR (from −2: not feeling an ASMR to 2: feeling a strong ASMR) on a question sheet. The order of presentation of the ASMR triggers was randomized. The experiment was conducted in an anechoic chamber (LAeq of the background noise below 30 dB) at Osaka University, Japan. Matlab was used to calculate the statistical values in the following section.

Results

Figure 2 shows the average scores of the subjective loudness, pitch, comfort, closeness, and ASMR for the human- (black symbols) and nature-generated (gray symbols) sounds. The subjective loudness, closeness, and ASMR scores tended to be higher for the human-generated sounds than for the nature-generated sounds. According to a t-test of the total scores of the human-generated (260 = 10 ASMR triggers × 26 participants) and nature-generated (260) sounds, there were significant differences in the subjective loudness (t338 = 3.65, p < 0.01), closeness (t338 = 8.69, p < 0.01), and ASMR (t338 = 7.84, p < 0.01). In contrast, the comfort was higher for the nature-generated sounds (t338 = 6.28, p < 0.01), and there was no significant difference in the perceived pitch between the nature- and human-generated sounds (t338 = 0.28, p = 0.78).
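The group comparison above can be reproduced in outline with a two-sample t-test. The score arrays below are synthetic stand-ins (the actual ratings are not published here); only the test procedure and the 260-per-group design mirror the paper:

```python
import numpy as np
from scipy import stats

# Synthetic Likert-style ASMR scores for two trigger groups,
# 260 ratings each (10 triggers x 26 participants), as in the design.
# The value ranges are invented so the groups differ on average.
rng = np.random.default_rng(0)
human = rng.integers(-1, 3, size=260)    # synthetic scores in {-1..2}
nature = rng.integers(-2, 2, size=260)   # synthetic scores in {-2..1}

# independent two-sample t-test comparing the group means
t, p = stats.ttest_ind(human, nature)
print(f"t = {t:.2f}, p = {p:.4f}")
```

With real data, a significant positive t statistic corresponds to the reported finding that human-generated sounds scored higher.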
The three sounds with the highest ASMR values were Earpick, Shampoo, and Book for the human-generated sounds and Volcano, Lava, and Bubble for the nature-generated sounds, and they were commonly perceived to be close. The three sounds with the lowest ASMR values were Cutting, Heels, and Brush for the human-generated sounds and Cicada, Bamboo, and Rain for the nature-generated sounds, and they were commonly perceived to be far. In Figure 2, black and gray symbols are the results for the human- and nature-generated sounds, respectively; the bar on each symbol shows the standard deviation, and the black and gray horizontal dotted lines are the total averaged scores for the human- and nature-generated sounds, respectively.

Table 2 shows the Pearson correlation coefficients of the ASMR scores with the sound quality parameters that had normal distributions. The ASMR scores of the nature-generated sounds were strongly correlated with the loudness and roughness among the Zwicker parameters, whereas the ASMR scores of the human-generated sounds were strongly correlated with the IACC among the ACF/IACF parameters. Figure 3 shows the ASMR scores as functions of the loudness, roughness, and IACC, which had high Pearson correlation coefficients. A strong negative relationship was observed for the IACC for the human-generated sounds, while positive relationships were observed for the loudness and roughness for the nature-generated sounds. Table 2 also lists the correlation coefficients of the ASMR scores with the scores of the other psychological judgements. The subjective loudness had a high correlation with the ASMR generated by the nature-generated sounds. Additionally, the closeness had a high correlation with the ASMR generated by both the human- and nature-generated sounds.
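The Pearson correlations in Table 2 relate per-trigger mean ASMR scores to per-trigger parameter values. A sketch of that computation with invented numbers (the IACC values and ASMR means below are hypothetical, chosen only to show a negative trend like the one reported for the human-generated sounds):

```python
import numpy as np
from scipy import stats

# Hypothetical per-trigger values for 10 human-generated sounds:
# IACC from the IACF analysis and the mean ASMR score across listeners.
iacc_vals = np.array([0.15, 0.22, 0.30, 0.35, 0.42, 0.50, 0.58, 0.65, 0.72, 0.80])
asmr_mean = np.array([1.4, 1.2, 1.1, 0.8, 0.6, 0.4, 0.1, -0.2, -0.4, -0.7])

# Pearson correlation coefficient and its p-value
r, p = stats.pearsonr(iacc_vals, asmr_mean)
print(f"r = {r:.2f}, p = {p:.4f}")   # strongly negative for these data
```

A strongly negative r with p below the significance threshold corresponds to the reported IACC result; the actual coefficients are those in Table 2.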

Discussion
The primary reason why the ASMR scores of the human-generated sounds were significantly higher than those of the nature-generated sounds may be the distance from the sound source to the receiver. In fact, the perceived closeness was strongly related to the ASMR sensation (Table 2). The human-generated sounds were recorded at a position close to the binaural devices, whereas the nature-generated sounds were recorded at a certain distance from the sound source. Additionally, the ASMR triggers used in previous studies (e.g., whispering, personal attention, and crisp sounds) were recorded close to the binaural microphone [1,[9][10][11][12][13]. Among these triggers, personal attention refers to role-play videos that concentrate on the viewer, so that it is not just an ASMR trigger but a scenario/context in which the triggers occur. To examine the acoustical aspects of the triggers, sounds conveying such scenario/context (e.g., speech) were removed from the triggers used in this study. However, the Earpick, Shampoo, and Hair sounds that had high ASMR scores led the participants to imagine being acted upon themselves. It seems undeniable that such unintended personal attention might have helped the ASMR sensations for these triggers, and triggers perceived as very close to the participants are likely to induce this pseudo-personal attention.
For the nature-generated sounds, sound qualities relating to higher loudness and roughness induced the ASMR experience (Figure 3). These parameters also had high correlations with the closeness scores (loudness: r = 0.73, p < 0.05; roughness: r = 0.77, p < 0.01). Nearby sounds produce the ASMR, whereas some listeners are annoyed by sounds close to their ears; accordingly, the comfort scores were significantly lower for the human-generated sounds (Figure 2c). Although it is well known that people who experience the ASMR report feeling relaxed and sleepy after watching and listening to ASMR content, some people feel annoyance from the triggers [4]. Hypersensitivity of auditory perception is a common origin of the ASMR and misophonia; however, higher-order cognitive processing may divide the expressed emotions into preference for the ASMR or annoyance for misophonia [3]. A sound perceived as very close makes listeners imagine either positive personal attention or a negative invasion of territory, and separation at the level of cognitive processing may be related to these different interpretations of closeness. If this study had included speech signals addressing the participants, the comfort scores for the human-generated sounds might have been higher.
Although a previous ASMR study reported that sounds with a lower pitch were more likely to produce an intense ASMR sensation [9], neither the pitch scores nor the ACF parameters relating to pitch (i.e., τ1, φ1, and WΦ(0)) affected the ASMR score (Figure 2b and Table 2). The bass or low-frequency response is boosted when a sound source is close to a directional or cardioid microphone (in what is known as the acoustical proximity effect) [31]. In this study, the acoustical proximity effect might have occurred to the same degree for any human-generated sound that was sufficiently close to the binaural microphones.
The human-generated sounds with a lower IACC produced a stronger ASMR sensation (Figure 3). The IACC is related to the spatial characteristics of a sound field, and it can thus control the location of a sound image. In concert halls (having a diffuse sound field), the IACC is lower when the distance between the sound source and receiver is greater [32], because the direct sound, which tends to increase the IACC, is weakened relative to reflections and reverberation. In contrast, in laboratory experiments, the IACC can be controlled by changing the interchannel phase difference of stereo loudspeakers in front of the listener, and a sound with a lower IACC can generate a sound image closer to the listener (in what is referred to as auditory distance rendering) [33][34][35][36][37]. This phenomenon observed in auditory distance rendering agrees with the results of the present study. However, the binaural phase of the ASMR triggers used in this study was not manipulated digitally; therefore, there may be another explanation in this case. The IACC indicates the similarity of the time-varying sounds entering the left and right ears. It is thus expected that a sound near one ear (e.g., the sound heard when using an earpick) has low similarity (low IACC) between the ears, and we thus have to treat the relationship between the IACC and the distance of the sound image separately for the near and far fields centered on the listener's head.

Finally, we discuss possible applications of these findings in clinical treatments for misophonia. The most successful treatment in clinical settings is cognitive behavioral therapy (CBT) [38][39][40][41][42]. The CBT protocol comprises four techniques: task concentration exercises, counterconditioning, stimulus manipulation, and relaxation exercises. Following treatment, 48% of patients showed a significant reduction of misophonia symptoms [43].
In a stimulus manipulation session, patients are instructed to change the pitch and time interval of sound triggers using audio-editing software, and this manipulation initiates a sense of control over their personal misophonic trigger sounds. In this study, the IACC was the most effective factor in controlling the ASMR sensation, so changing the IACC (e.g., by convolution with binaural impulse responses) may be effective in letting patients feel that their misophonic trigger sounds are under their control.
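One simple way to see how the interchannel correlation of a stereo trigger can be manipulated is to mix one channel with independent noise; the zero-lag correlation of the resulting pair then tracks the mixing coefficient. This is only a crude stand-in for the binaural-impulse-response convolution suggested above, intended to illustrate that the IACC is a controllable quantity:

```python
import numpy as np

def decorrelate(mono, alpha, rng):
    """Build a stereo pair whose interaural correlation is roughly alpha
    by mixing the right channel with independent Gaussian noise
    (a crude stand-in for true binaural decorrelation)."""
    noise = rng.standard_normal(len(mono))
    left = mono
    right = alpha * mono + np.sqrt(1.0 - alpha**2) * noise
    return left, right

def corr0(l, r):
    # zero-lag normalized cross-correlation (IACC search collapsed to lag 0)
    return np.dot(l, r) / np.sqrt(np.dot(l, l) * np.dot(r, r))

rng = np.random.default_rng(0)
mono = rng.standard_normal(48000)  # 1 s of noise as a stand-in signal
left, right = decorrelate(mono, 0.6, rng)
# corr0(left, right) is close to 0.6, i.e., the target correlation
```

Sweeping alpha from 1 toward 0 moves the pair from a fully correlated (high-IACC) to a diffuse (low-IACC) presentation, which is the kind of control a stimulus manipulation session would need.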

Conclusions
The following conclusions are drawn from the results of the study.
(1) Human-generated sounds are more likely to trigger stronger ASMRs than nature-generated sounds.
(2) Among possible ASMR auditory triggers, sounds perceived to be close to the listener are more likely to evoke the ASMR sensation.
(3) In the case of nature-generated sounds, ASMR triggers with higher loudness and roughness among the Zwicker parameters are more likely to evoke the ASMR sensation.
(4) In the case of human-generated sounds, ASMR triggers with a lower IACC among the ACF/IACF parameters are more likely to evoke the ASMR sensation.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patients to publish this paper.
Data Availability Statement: Not applicable.