Introduction
“The eyes are the windows to the soul.” By looking into a person’s eyes, we may understand how she or he thinks and feels. Scientists have supported this proverb by showing that the pupil reflects various cognitive functions, such as cognitive processing load (Beatty, 1982; Hyönä, Tommola, & Alaja, 1995; Kahneman, 1973), emotion (Partala & Surakka, 2003), attentional modulation (Eldar, Cohen, & Niv, 2013; Gabay, Pertzov, & Henik, 2011), memory (Goldinger & Papesh, 2012; Naber, Frassle, Rutishauser, & Einhäuser, 2013), decision making (Einhäuser, Koch, & Carter, 2010; Einhäuser, Stout, Koch, & Carter, 2008), high-level processing of visual content (Naber & Nakayama, 2013), and mental imagery (Laeng & Sulutvedt, 2014). The underlying mechanism is thought to involve the locus coeruleus (LC)–norepinephrine (NE) system, which modulates adaptive gain and optimizes performance (Aston-Jones & Cohen, 2005). Because changes in pupil size are tightly coupled with the activity of LC neurons, pupillary responses allow us to infer LC-NE function.
The auditory system is sensitive to stimulus regularity and rapidly detects changes, which optimizes environmental monitoring. Pupillary responses have been shown to reflect salient and surprising auditory events (e.g., Bala & Takahashi, 2000; Huang & Elhilali, 2017; Liao, Kidani, et al., 2016; Liao, Yoneya, et al., 2016; Wang, Boehnke, Itti, & Munoz, 2014; Wang & Munoz, 2014; Wetzel, Buttelmann, Schieler, & Widmann, 2016). For example, Liao, Yoneya, et al. (2016) showed that when participants listened to an auditory sequence of repetitive tones with an occasional deviant noise oddball, pupil size increased when the oddball appeared. This pupillary dilation response (PDR) was observed regardless of whether participants attended to the auditory sequence, suggesting that the PDR is an automatic physiological response that signals auditory surprise.
The PDR reflects surprising moments not only when surprise is defined objectively, as a deviant oddball event against a background, but also when it is defined by participants’ subjective evaluations. Liao, Kidani, et al. (2016) presented ten discrete environmental sounds to participants while recording their pupillary responses. Each sound was presented for 500 ms with a 10-s inter-stimulus interval. After the recording, participants rated several psychoacoustic aspects of the sounds, including salience, loudness, preference, beauty, hardness, vigorousness, and annoyance. The pupil dilated when the sounds were presented. Most importantly, the magnitude of the PDR was positively correlated with the subjective salience of the sound, as well as with its loudness, but not with the other psychoacoustic judgments.
The correspondence between auditory surprise and the PDR in our previous studies was found for briefly presented salient auditory events, e.g., 50 ms for the noise oddball (Liao, Yoneya, et al., 2016) and 500 ms for the environmental sounds (Liao, Kidani, et al., 2016). In real-world situations, however, a salient auditory event may be long-lasting and continuous. It is therefore important to examine whether the PDR reflects auditory salience in complex auditory scenes. In the current study, we examined whether the PDR reflects subjective auditory surprise in music and how loudness may contribute to this effect. Music is a long-lasting, continuous, and complex, yet structured, auditory stimulus. A composition usually consists of repetitions of a structure and variations on it. These characteristics allow us to trace subjective surprise evaluations as an excerpt unfolds. We examined whether the pupil dilates when an excerpt is evaluated as surprising.
Methods
Participants listened to 90-s music excerpts and concurrently rated how surprising each was, i.e., rich in variation versus monotonous, by continuously moving a rating slider. Meanwhile, they fixated a central point on the monitor so that their pupillary responses could be recorded. Each participant listened to 15 excerpts of classical, jazz, and rock music. After this concurrent surprise-rating session, participants listened to the same excerpts again, without performing any task, while their pupillary responses were recorded.
Participants
Twenty-two adults (aged 22-43; median of 35) participated in the study. All had normal or corrected-to-normal vision and reported normal hearing. All participants were naïve about the purpose of the study and received payment for their participation. All procedures were approved by the NTT Communication Science Laboratories Ethical Committee, and all participants gave informed written consent before the experiment.
Materials
Stimuli were generated and controlled by a personal computer (Dell OptiPlex 980) and presented through headphones (Sennheiser HD 595) and on an 18.1-inch monitor (EIZO FlexScanL685Ex). The auditory stimuli were 15 excerpts consisting of the first 90 s of selected pieces (Table 1). The pieces were selected because their structure contained both repetitions and variations. Sound pressure levels were fixed across participants at a comfortable listening level. The visual stimulus was a dark gray fixation point (0.25° × 0.25°, 0.33 cd/m²) presented against a light gray background (27.0 cd/m²).
Behavioral responses were collected with a transducer (TSD115) connected to a Biopac MP system (HLT100C module, BIOPAC Systems, Inc.). The transducer panel had a slider that allowed participants to report subjective assessments continuously from 0 to 10; the transducer was sampled at 1000 Hz. Pupillary responses were recorded binocularly by an infrared eye-tracker camera (Eyelink 1000 Desktop Mount, SR Research Ltd.) at a sampling rate of 1000 Hz.
Design
The 15 musical excerpts were presented twice in different sessions: first in a surprise-rating session and then again in a passive-listening session. The order of the excerpts in each session for each participant was randomly assigned. The inter-stimulus interval (ISI) was 5 s. The total duration of each session was around 25 min.
Procedure
All participants were given written and oral explanations about the nature of the experiment and the pupillary response recording. Participants sat in front of the monitor at a viewing distance of 51 cm in a dimly lit soundproof chamber, with their chin on a chinrest. Before each session, a five-point calibration procedure was performed, after which the participants were instructed to fixate the central point throughout the experiment.
In the surprise-rating session, participants were asked to rate continuously how much the music was changing (in any sense) relative to the portion of the excerpt they had heard so far. For example, if they felt that any aspect of the music, including melody, tempo, harmony, or texture (e.g., more instruments playing), became richer in variation, they moved the slider to the right to register higher scores; if they felt that the music became more monotonous, they moved it to the left to register lower scores. The slider was reset to the middle (i.e., a score of 5) at the beginning of each excerpt.
In the passive-listening session, participants listened to the same musical excerpts again without any task. The break between the two sessions was longer than 30 min. The order of the two sessions was fixed so that expectations built up through repeated listening could not influence the surprise ratings.
After the two sessions, participants answered a questionnaire to rate from 1 (never heard the piece) to 7 (often heard the piece) how familiar they were with each excerpt and to write down the name of the piece and/or the artist/composer if they knew it. They were allowed to replay the excerpts at their own pace when answering the questionnaire.
Results
Familiarity with the excerpts (questionnaire)
The mean familiarity scores for the classical, jazz, and rock music were 4.1, 2.4, and 2.6, respectively (scores for individual excerpts are listed in Table 2). The mean scores for each participant were subjected to a repeated-measures ANOVA with music genre (classical, jazz, rock) as a within-subject factor. Participants were more familiar with the classical music we selected than with the other genres [F(2,42) = 19.07, p < .001, η² = .48]. The answers to the open questions are shown in Table 2 (second and third columns). Participants tended to give more answers, and more correct ones, for the classical music than for the other genres, which is consistent with the subjective familiarity ratings.
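The η² values reported in the Results match partial eta squared, which can be recovered from an F statistic and its degrees of freedom as η² = F·df₁ / (F·df₁ + df₂). A minimal Python sketch of that check (the function name is ours, not part of the original MATLAB analysis):

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom.

    eta_p^2 = SS_effect / (SS_effect + SS_error)
            = F * df_effect / (F * df_effect + df_error)
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Genre effect on familiarity reported above: F(2, 42) = 19.07
print(round(partial_eta_squared(19.07, 2, 42), 2))  # 0.48
```

The same relation reproduces the other values in this section, e.g., F(2,42) = 9.66 gives .32.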
On-line subjective surprise rating
Figure 1A shows examples of the surprise rating over time. The average (fourth column) and variation (fifth column) of the surprise rating over time are listed in Table 2.
To examine whether the surprise rating varied among music genres or excerpts (e.g., with familiarity), we conducted two analyses. First, the mean and the standard deviation of the surprise rating over time were each subjected to a repeated-measures ANOVA with music genre (classical, jazz, rock) as a within-subject factor. Neither the average surprise rating [F(2,42) = 2.80, p > .07, η² = .12] nor the variation in the surprise rating over time [F(2,42) = 1.13, p > .3, η² = .05] differed among music genres. Second, we calculated the correlation between the familiarity rating and the average surprise rating, and between the familiarity rating and the variation in the surprise rating over time. Familiarity was positively correlated with the average surprise rating (r = .24, p < .001) but not with the variation in the surprise rating over time (r = −.07, p > .2).
To examine the consensus among participants on the surprise rating, we calculated Kendall’s coefficient of concordance (W), with the rating data resampled at 10 Hz. The results are shown in Table 2 (sixth column). Agreement among participants was moderate but significant, and it varied across excerpts (median W = 0.56, min = 0.30, max = 0.73; all ps < .001).
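Kendall’s W treats the 10-Hz time points of an excerpt as the ranked objects and the participants as raters: each rater’s trace is ranked over time, and W measures how similarly the raters order the time points. A minimal Python sketch, assuming ties receive average ranks and omitting the tie correction in the denominator (the text does not say whether one was applied); function names and the simulated data are hypothetical. For significance, m(n − 1)W is approximately χ² distributed with n − 1 degrees of freedom.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for an (m raters x n time points) array, no tie correction."""
    ratings = np.asarray(ratings, dtype=float)
    m, n = ratings.shape
    ranks = np.vstack([average_ranks(row) for row in ratings])
    rank_sums = ranks.sum(axis=0)                     # R_i for each time point
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)   # spread of the rank sums
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Simulated example: 22 raters sharing a trend, 90 s at 1000 Hz, resampled to 10 Hz
rng = np.random.default_rng(0)
shared = np.linspace(0.0, 10.0, 90_000)
raw = shared + rng.normal(0.0, 2.0, (22, 90_000))
w = kendalls_w(raw[:, ::100])  # keep every 100th sample
```

W is 1 when all raters order the time points identically and 0 when their orderings cancel out.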
Pupillary response analysis
Figure 1B shows the pupil size change over time. Only data recorded from the right eye were analyzed because the pupillary responses of the two eyes were consensual. Data during blinks were treated as missing and discarded (30.1% of the data). The average blink rate was in about the same range as in our previous studies (Liao, Kidani, et al., 2016; Liao, Yoneya, et al., 2016), in which the task was auditory and normal blinking was allowed.
Pupil size measured with a video-based eye tracker, such as the one used in the current study, covaries with gaze position (Gagl, Hawelka, & Hutzler, 2011). To avoid measurement errors due to off-center gaze, pupil size data were excluded whenever gaze deviated by more than 1.5° from the central fixation point; 23.1% of the data were excluded on this criterion.
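The two exclusion steps above (blinks and gaze deviations beyond 1.5°) amount to building a validity mask per sample. A minimal Python sketch, assuming blink flags and gaze offsets from fixation are already available per sample and that the 1.5° criterion is applied to the radial (Euclidean) distance; the pixel-pitch parameter is hypothetical because the text does not give the monitor resolution, and only the 51-cm viewing distance comes from the Methods.

```python
import numpy as np


def pixels_to_degrees(offset_px, pixel_size_cm, distance_cm=51.0):
    """Convert an on-screen offset from fixation (pixels) to visual angle (deg).

    pixel_size_cm is hypothetical (monitor pixel pitch is not reported);
    distance_cm defaults to the 51-cm viewing distance.
    """
    offset_cm = np.asarray(offset_px, dtype=float) * pixel_size_cm
    return np.degrees(np.arctan(offset_cm / distance_cm))


def valid_pupil_mask(pupil, blink, gaze_x_deg, gaze_y_deg, max_dev_deg=1.5):
    """True for samples kept: no blink, finite pupil value, and gaze within
    max_dev_deg of fixation (radial distance, an assumption)."""
    pupil = np.asarray(pupil, dtype=float)
    deviation = np.hypot(gaze_x_deg, gaze_y_deg)  # NaN gaze -> excluded
    return (~np.asarray(blink, dtype=bool)) & np.isfinite(pupil) & (deviation <= max_dev_deg)


def screen_pupil(pupil, mask):
    """Replace excluded samples with NaN so later averaging can skip them."""
    screened = np.asarray(pupil, dtype=float).copy()
    screened[~mask] = np.nan
    return screened
```

Keeping excluded samples as NaN, rather than deleting them, preserves the time alignment with the rating traces.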
The Eyelink system outputs pupil size in arbitrary units [au] that are not calibrated across participants or conditions. To compare results across conditions, we z-scored the pupil data within each 90-s excerpt. To reduce high-frequency noise arising from a sampling rate (1000 Hz) much finer than pupillary measurements require, we resampled the data at 10 Hz. Specifically, only every 100th sample was kept, and the samples between the resampling points were discarded without any interpolation or filtering. We used the EDF converter provided by SR Research to convert the Eyelink EDF files to ASC format, and we used MATLAB for all data analyses; the resampling described above used the MATLAB function “downsample.” We followed the same protocol as in our previous study (Liao, Yoneya, et al., 2016).
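The normalization and resampling steps can be sketched in Python as follows. This is an illustration, not the authors’ MATLAB code: we assume excluded samples are NaN and ignored when computing the mean and SD, that the sample SD (ddof=1, as in MATLAB’s zscore) is used, and that z-scoring precedes decimation, the order in which the text lists the steps. The slice `z[::100]` reproduces the behavior of MATLAB’s downsample(x, 100), which keeps samples 1, 101, 201, and so on.

```python
import numpy as np


def preprocess_excerpt(pupil_1khz, factor=100):
    """z-score one excerpt's pupil trace, then keep every `factor`-th sample.

    Excluded samples are NaN and are ignored by the mean and SD. The sample SD
    (ddof=1) and the z-score-then-decimate order are assumptions.
    """
    x = np.asarray(pupil_1khz, dtype=float)
    z = (x - np.nanmean(x)) / np.nanstd(x, ddof=1)
    # Plain decimation: 0-based samples 0, 100, 200, ... are kept;
    # nothing is interpolated or low-pass filtered beforehand.
    return z[::factor]
```

For a 90-s excerpt recorded at 1000 Hz (90,000 samples), the output has 900 samples at 10 Hz. Because no anti-aliasing filter is applied, this step thins the data rather than smoothing it.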
Surprise-related PDR
The pupil data recorded in the two sessions (surprise-rating and passive-listening) were time-aligned to the rating data obtained in the surprise-rating session. Surprising moments were defined arbitrarily as periods with a surprise rating above 7 (red lines in Figure 1), unsurprising moments as periods with a rating below 4 (blue lines in Figure 1), and neutral moments as periods with a rating between 4 and 7 (black lines in Figure 1). These thresholds were chosen so that surprising and unsurprising moments made up similar proportions of the valid data: 24.7% and 21.7% of the total duration, respectively.
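Given time-aligned 10-Hz pupil and rating traces, the classification above reduces to thresholding the rating and averaging the pupil data within each category. A minimal Python sketch; the function names are hypothetical, and treating ratings of exactly 4 or 7 as neutral is our assumption, since the text does not say how the boundaries were handled.

```python
import numpy as np

LABELS = ("surprising", "neutral", "unsurprising")


def label_moments(rating, high=7.0, low=4.0):
    """Label each 10-Hz sample: > high surprising, < low unsurprising, else neutral.

    Ratings exactly at 4 or 7 are treated as neutral (an assumption).
    """
    rating = np.asarray(rating, dtype=float)
    labels = np.full(rating.shape, "neutral", dtype="<U12")
    labels[rating > high] = "surprising"
    labels[rating < low] = "unsurprising"
    return labels


def mean_pupil_by_label(pupil_z, labels):
    """Mean z-scored pupil size per category, skipping NaN (excluded) samples."""
    pupil_z = np.asarray(pupil_z, dtype=float)
    means = {}
    for lab in LABELS:
        values = pupil_z[(labels == lab) & np.isfinite(pupil_z)]
        means[lab] = float(values.mean()) if values.size else float("nan")
    return means


# Toy example with four samples
means = mean_pupil_by_label([1.0, 0.0, -1.0, 0.5], label_moments([8.0, 5.0, 2.0, 9.0]))
surprise_pdr = means["surprising"] - means["unsurprising"]  # 0.75 - (-1.0) = 1.75
```

The surprise-related PDR used later is this surprising-minus-unsurprising difference.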
Results are shown in Figure 2. Mean pupil diameter was subjected to a three-way repeated-measures ANOVA with task (surprise-rating, passive-listening), music genre (classical, jazz, rock), and surprise (surprising, neutral, unsurprising) as within-subject factors. There were main effects of surprise [F(2,42) = 9.66, p < .001, η² = .32] and music genre [F(2,42) = 3.31, p < .05, η² = .14], but no other main effect or interaction (ps > .1). When we applied a different criterion, defining surprising moments as those in which the rating deviated from its mean by more than 1.5 standard deviations, the effect of surprise remained [F(2,42) = 10.82, p < .001, η² = .34]. These results suggest that the pupil dilated more during surprising moments than during unsurprising ones, regardless of music genre and of whether participants were performing the on-line surprise rating.
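The alternative criterion can be sketched the same way, with each rating trace standardized against its own mean and SD. This is a sketch under assumptions: the text defines only the surprising side, so the symmetric lower cut-off for unsurprising moments, computing the mean and SD per participant-by-excerpt trace, and the function name are ours.

```python
import numpy as np


def label_moments_sd(rating, k=1.5):
    """Label samples by deviation from the trace's own mean, in SD units.

    Surprising: more than k SDs above the mean. Unsurprising: more than k SDs
    below it (symmetric lower cut-off is an assumption). A flat trace
    (SD = 0, e.g., the slider never moved) is labelled neutral throughout.
    """
    rating = np.asarray(rating, dtype=float)
    labels = np.full(rating.shape, "neutral", dtype="<U12")
    sd = np.nanstd(rating)
    if not sd > 0:
        return labels
    deviation = (rating - np.nanmean(rating)) / sd
    labels[deviation > k] = "surprising"
    labels[deviation < -k] = "unsurprising"
    return labels
```

Unlike the fixed 7/4 thresholds, this criterion adapts to how widely each participant used the slider within an excerpt.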
To investigate whether there was a systematic bias induced by a particular musical excerpt or participant, we plotted the surprise-related PDR for individual excerpts and participants as scatter plots. Results are shown in Figure 3. The data clustered below the diagonal line (confirming a larger PDR during surprising moments than during unsurprising ones), and points from different genres and participants were spread evenly, indicating a consistent surprise-related PDR across genres and participants. There was no significant correlation between the surprise-related PDR (i.e., the difference in average pupil size between surprising and unsurprising moments) and the familiarity rating (r = .02, p > .8 for the surprise-rating condition and r = .03, p > .7 for the passive-listening condition; data not shown). Overall, the surprise-related PDR did not depend on genre, participant, or familiarity with the excerpt.
We conducted a further analysis to verify the surprise-related PDR and to examine whether it could be explained by stimulus-driven factors coupled with the musical excerpts or by response biases/tendencies associated with the participants. Specifically, we estimated the PDR-surprise association using bootstrapping procedures. In the completely random procedure (the baseline), the pupil data were aligned with rating data randomly selected from different participants and excerpts. The difference in mean pupil diameter between surprising and unsurprising moments, defined by the ratings of a different participant and excerpt, was calculated 1,000 times (each time randomly re-pairing pupil and rating data) to form a distribution in which the PDR was expected to be unrelated to surprise. The results are shown in Figure 4. In both the surprise-rating and passive-listening conditions, the baseline distributions (black) lay far from the observed surprise-related PDR (vertical dashed lines), indicating a reliable surprise-related PDR: when the pupil data were matched with the rating data of the same participant and musical excerpt, pupil size was larger during surprising moments.
We further estimated the PDR-surprise association when the pupil data were paired with rating data for the same excerpt but from a randomly selected different participant (shuffled-participant condition), and when the pupil data were paired with rating data from the same participant but for a randomly selected different excerpt (shuffled-excerpt condition). If the observed surprise-related PDR were mainly explained by stimulus-driven factors, the distribution from the shuffled-participant condition should be close to the observed surprise-related PDR: as long as the pupil data were aligned with rating data for the same excerpt, the PDR-surprise association would be high regardless of who gave the ratings. In contrast, if the surprise-related PDR were mainly explained by participant-related factors, such as response biases or systematic rating tendencies, the observed value should be close to the distribution from the shuffled-excerpt procedure: the surprise-related PDR would then arise from coordination between a particular participant’s pupillary responses and ratings, regardless of the excerpt being rated.
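The three pairing schemes described above (baseline, shuffled-participant, shuffled-excerpt) differ only in which rating traces a given pupil trace may be paired with. The Python sketch below illustrates that logic under stated assumptions and is not the authors’ MATLAB code: traces are stored in dicts keyed by (participant, excerpt) and share the same 10-Hz length; each iteration pairs every pupil trace with one randomly drawn donor rating trace and averages the resulting surprising-minus-unsurprising differences over all pairs; the 7/4 thresholds are carried over from the main analysis. All function names are hypothetical.

```python
import numpy as np


def pdr_difference(pupil, rating, high=7.0, low=4.0):
    """Mean pupil size during 'surprising' minus 'unsurprising' samples of `rating`."""
    pupil = np.asarray(pupil, dtype=float)
    rating = np.asarray(rating, dtype=float)
    surprising = pupil[(rating > high) & np.isfinite(pupil)]
    unsurprising = pupil[(rating < low) & np.isfinite(pupil)]
    if surprising.size == 0 or unsurprising.size == 0:
        return float("nan")
    return float(surprising.mean() - unsurprising.mean())


def observed_pdr(pupil, rating):
    """Surprise-related PDR with each pupil trace paired with its own rating trace."""
    return float(np.nanmean([pdr_difference(pupil[key], rating[key]) for key in pupil]))


def donor_pools(keys, scheme):
    """For each (participant, excerpt) key, the rating traces it may be paired with."""
    rules = {
        "baseline": lambda p, e, q, f: q != p and f != e,
        "shuffled-participant": lambda p, e, q, f: q != p and f == e,
        "shuffled-excerpt": lambda p, e, q, f: q == p and f != e,
    }
    rule = rules[scheme]
    return {(p, e): [(q, f) for q, f in keys if rule(p, e, q, f)] for p, e in keys}


def null_distribution(pupil, rating, scheme, n_iter=1000, seed=0):
    """Distribution of the PDR estimate under random re-pairing of traces."""
    rng = np.random.default_rng(seed)
    keys = list(pupil)
    pools = donor_pools(keys, scheme)  # computed once, reused every iteration
    estimates = np.empty(n_iter)
    for i in range(n_iter):
        diffs = []
        for key in keys:
            pool = pools[key]
            donor = pool[rng.integers(len(pool))]
            diffs.append(pdr_difference(pupil[key], rating[donor]))
        estimates[i] = np.nanmean(diffs)
    return estimates
```

Comparing `observed_pdr(pupil, rating)` with each of the three distributions (e.g., the share of null estimates at or above the observed value) corresponds to reading the dashed lines against the distributions in Figure 4.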
As shown in Figure 4, in the surprise-rating session, the observed surprise-related PDR was close to the distribution from the shuffled-participant procedure but not to that from the shuffled-excerpt procedure, indicating that stimulus characteristics might contribute to the surprise-related PDR during surprise rating. No such pattern was found in the passive-listening session: the distributions from both the shuffled-participant and shuffled-excerpt conditions overlapped the baseline distribution and were far from the observed surprise-related PDR. These results suggest that the surprise-related PDR observed during passive listening cannot be explained by stimulus characteristics or by response biases/tendencies associated with the participants.
Discussion
We examined whether the PDR reflects surprising moments in music. Participants evaluated how surprisingly a musical excerpt changed over time while listening to it. Pupil size was larger during moments that participants rated as surprising, indicating a surprise-related PDR in music. The same pattern was observed when they listened to the music passively without performing any evaluation. Note that in the current study, the surprise-related PDR did not take the form of the typical phasic (or biphasic) response seen when the surprising event is clearly defined and presented discretely (e.g., Liao, Kidani, et al., 2016; Liao, Yoneya, et al., 2016). Here, by contrast, ‘surprise’ was defined by a continuously updated evaluation over time and was therefore not necessarily a discrete transient event; it was indexed by the surprise rating rising beyond a given level, which might make the stereotypical phasic response less noticeable. In any case, when pupil size is averaged across the periods of ‘surprise’ events, the average pupil size has consistently been found to be larger around surprising events than for background neutral sounds (Liao, Yoneya, et al., 2016) or less surprising/salient sounds (Liao, Kidani, et al., 2016).
Further bootstrapping analysis showed that stimulus characteristics might contribute to the surprise-related PDR during the surprise-rating task but not during passive listening. Moreover, decision-making-related PDRs were observed only when participants performed the rating task, not when they listened to the music passively, indicating the absence of spontaneous evaluation in the latter case. Overall, the results indicate that the PDR reflects surprising moments in music regardless of whether an evaluation of surprise per se is required. This suggests that the surprise-related PDR could be a stimulus-driven response to acoustic features embedded in the music or could reflect automatic monitoring of surprise in the auditory environment.
The surprise-related PDR was observed for all the music genres we tested, regardless of familiarity with the excerpt. In the behavioral ratings, participants were more familiar with one genre, classical music, than with the others, and they tended to give higher average surprise ratings to excerpts they were familiar with. One reason for this tendency could be that familiarity with an excerpt makes it easier to form expectations and thus to anticipate the ‘surprise’ or be predisposed to it. Familiarity with an excerpt has been shown to increase chills and emotional responses to it (e.g., Mori & Iwanaga, 2017; Panksepp, 1995; Pereira et al., 2011). Chills are also reflected in the pupillary dilation response (Laeng, Eidet, Sulutvedt, & Panksepp, 2016) and often occur when music is rich in variation. Because we did not measure chills or collect emotional evaluations of the excerpts, it remains unclear whether the surprise rating was related to chills. However, the surprise-related PDR did not correlate with familiarity and was observed robustly and consistently regardless of music genre. This suggests that the surprise-related PDR can hardly be explained by familiarity or chills and is consistent with the idea that the PDR to auditory surprise is an automatic physiological response. This conclusion is also supported by evidence that the PDR to a deviant auditory oddball (Liao, Yoneya, et al., 2016) is independent of task demand, i.e., it occurs even when the participant does not attend to the oddball per se.
It remains unclear whether and how subjective surprise evaluation in music derives from stimulus-driven factors. Agreement on the surprise rating among participants was generally moderate and varied across musical excerpts, indicating that the evaluation was based on an interaction between top-down expectations (e.g., knowledge of and familiarity with the excerpts) and stimulus-driven factors (e.g., acoustic features). This conclusion is also supported by the bootstrapping analysis of the estimated PDR-surprise associations, in which the surprise-related PDR could be explained, but only partly, by stimulus-driven effects associated with the musical excerpts.
Huang and Elhilali (2017) investigated auditory salience using natural soundscapes. They asked participants to rate the relative salience of two auditory streams and took a data-driven approach to uncover the critical parameters for auditory salience. They found that auditory salience is distributed across multidimensional features that combine nonlinearly and in a context-dependent manner. Estimating auditory surprise in music likely requires, in addition to the features contributing to auditory salience, parameters related to the time sequence and to interactions among acoustic features, in order to capture the surprise arising from the unfolding sequence. A related study has shown how surprise in popular music contributes to preference (Miles, Rosen, & Grzywacz, 2017). Considering that pupil size also reflects emotional arousal (Bradley, Miccoli, Escrig, & Lang, 2008), which might be related to preference, further study is required to investigate how the pupil reflects surprise, preference, and their interaction.
The surprise-related PDR in music cannot be explained by an explicit or spontaneous surprise evaluation of the music or by cognitive processing load (in terms of task demand). Einhäuser and colleagues (2010) showed that the pupil dilates at the moment a decision is made, regardless of the decision content or whether a motor response is required. This is consistent with our observation of a decision-making-related PDR only in the surprise-rating condition, for both surprising and unsurprising ratings, and not in the passive-listening condition. In contrast, surprise-related PDRs were consistently observed in both conditions and thus cannot be explained by the decision-making process. With regard to cognitive processing load, the pupil dilates when the load increases (Beatty, 1982; Hyönä et al., 1995; Kahneman, 1973). In the current study, the task was continuously required during the surprise-rating session and not required at all during the passive-listening session, yet the surprise-related PDR was observed regardless of whether cognitive effort was involved.
The surprise-related PDR might be partially explained by loudness changes in the music, depending on the music genre. Subjective salience evaluation has been shown to be highly correlated with loudness, both for briefly presented sounds (Liao, Kidani, et al., 2016) and for long-lasting music, as in the current study. However, loudness changes could not explain the pupillary response to all music genres. In the loudness-related PDR analysis, a larger pupil size during loud moments than during quiet ones was observed only for classical music, and the effect was more pronounced in the surprise-rating condition than in the passive-listening condition. This pattern differs from that of the surprise-related PDR, which was observed for all music genres. Furthermore, because agreement on the surprise rating among participants was only moderate, the pupillary responses of individual participants may not simply have reflected loudness changes but may instead have been modulated by each participant’s own judgment of surprise. Overall, the results suggest that loudness may partially explain the surprise-related PDR.
Pupillometry has recently been widely used to study various aspects of music processing, such as arousal and preference (Gingras, Marin, Puig-Waldmüller, & Fitch, 2015), chills (Laeng et al., 2016), and familiarity (Weiss, Trehub, Schellenberg, & Habashi, 2016). The current study contributes to our understanding of the cognitive processing reflected in pupillary responses by demonstrating that not only emotional arousal induced by music but also the orienting response to surprise can be revealed by the pupillary response. By presenting relatively long musical excerpts, we were able to apply various analyses to investigate the dynamics of pupillary responses and related cognitive processing. We conclude that the pupil dilates automatically during surprising moments in music.