Quantifying Auditory Presence Using Electroencephalography

Presence is used to assess the subjective experience of being in one place when physically situated in another. Research on presence has recently gained increasing attention owing to the widespread use of immersive audio technologies. Currently, the most widely used measurement of presence is based on post-experiment self-report questionnaires. This approach is reliable but imperfect, because the act of answering a questionnaire while immersed in a virtual environment can itself alter the participant's psychological state. Therefore, the present work aims to find an objective way to measure presence, and electroencephalography (EEG) was investigated as a possible tool for this purpose. Two listening tests were conducted in which eight loudspeakers reproduced urban soundscapes to stimulate auditory presence. Presence was measured both by questionnaires and by EEG. The results showed a significant correlation between T/B (the theta/beta power ratio) extracted from the EEG and the subjective presence levels assessed by questionnaires, suggesting that EEG could be used to measure presence objectively. This study may offer insight for research on presence and related technologies, such as VR, video games and immersive audio production.


Presence
Presence is used to assess the participants' sense of "being there" in the virtual environment [1]. It can be defined explicitly as the subjective experience of being in one place or environment, even when physically situated in another [2]. In other words, it reflects the extent to which the participants' cognitive and perceptual systems are tricked into believing they are somewhere other than their physical location [3]. In most applications, the term presence refers to physical presence, although presence can also be classified as social presence or co-presence [4,5].
Presence has been assessed in many earlier studies, but most of them focused on the visual aspects of the stimuli, for example, the size and resolution of the screen [6] or the visual fidelity of the animation [7]. Less attention was paid to the auditory aspects of the stimuli. Even among the studies focusing on the auditory cues of presence [8–13], most experiments still used a visual display to accompany the auditory stimuli. In addition, the audio stimuli used in these studies were mostly stereophonic or binaural recordings, whereas a loudspeaker array is used in the present work.

Subjective Measurement
Typically, presence is measured via questionnaires. Many questionnaires are available for this purpose, and Schwind et al. summarized 15 that had been used in earlier studies [14]. Among them, the most widely used were the Presence Questionnaire (PQ) by Witmer and Singer [2,15] and the SUS Questionnaire by Slater, Usoh and Steed [1,14,16–18].
Subjective measurement based on questionnaires is still the most widely used and reliable way to evaluate presence, because it captures the participants' direct responses.
However, it is sometimes imperfect because of its inherent characteristics. For example, answering a questionnaire can cause participants to leave the virtual environment, resulting in a break in presence (BIP) [18–20]. This is not limited to questionnaires completed with pen and paper: Schwind et al. reported that even answering questionnaires within VR still affected the consistency of the data [14]. In addition, when answering a questionnaire, participants have to average their responses over the entire exposure, which means the measurement is not real-time. Slater noted that researchers should rely less on questionnaires in studies of presence [19]. For this reason, we sought to explore objective measurement of presence as a possible improvement.

Objective Measurement
Physiological signals generated by the body can be collected directly to measure presence (e.g., peripheral physiological signals such as heart rate and skin conductance [21,22], and neurological signals [23]) without the influence of the participants' subjective opinions. This kind of objective measurement is straightforward and can be real-time and reliable if conducted with strictly controlled procedures or algorithms. Electroencephalography (EEG) has an advantage in the freedom of movement it gives participants wearing EEG headsets, especially compared with other devices such as fMRI [6]. EEG is the electrical activity of the brain recorded via electrodes placed on the scalp. It is the superposition of many simpler electrical signals generated by millions of neurons, and thus reflects brain activity. EEG has already been used in many studies of presence [6,7,22,24–27] and in other fields [28–30]. This paper explores the possibility of measuring presence using EEG data.
The power ratio between two specified frequency bands is also commonly used in clinical and cognitive neuroscience [32–36]. Using the three frequency bands of interest in this study (theta, alpha and beta) and computing each ratio as "slow wave/fast wave", the power ratios available in the present study are T/A (theta/alpha), T/AB (theta/(alpha + beta)), T/B (theta/beta) and A/B (alpha/beta).
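As a minimal illustration of the four ratios defined above, the following Python sketch computes them from three absolute band powers. The function name is hypothetical and the band-power values are arbitrary placeholders, not data from this study.

```python
def power_ratios(theta, alpha, beta):
    """Slow-wave/fast-wave ratios from absolute band powers (all in the same units)."""
    if min(theta, alpha, beta) <= 0:
        raise ValueError("band powers must be positive")
    return {
        "T/A": theta / alpha,
        "T/AB": theta / (alpha + beta),
        "T/B": theta / beta,
        "A/B": alpha / beta,
    }


# Arbitrary illustrative band powers (not measured values).
ratios = power_ratios(theta=12.0, alpha=8.0, beta=4.0)
# ratios == {"T/A": 1.5, "T/AB": 1.0, "T/B": 3.0, "A/B": 2.0}
```

Because every ratio divides a slower band by a faster one, a relative increase in theta power raises T/A, T/AB and T/B together, while A/B is unaffected by theta.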

Summary
The present study seeks to answer the following questions. Besides subjective measurement by questionnaires, is it possible to measure presence objectively with EEG? If so, which EEG index, absolute band power or a specific power ratio, best reflects the level of presence? Is there any correlation between these two measurement methods?
To answer these questions, two experiments were conducted in which eight loudspeakers reproduced urban soundscapes to stimulate auditory presence. During the experiments, presence was measured both subjectively (with questionnaires) and objectively (with EEG). We then analyzed the subjective presence levels and the EEG indices with different statistical tests and further evaluated the correlation between the two measurement methods.
The rest of the paper is organized as follows: Section 2 explains the methodology that we used in the study. In Section 3, we present the results followed by a discussion in Section 4. Then, in Section 5, we draw the conclusion and give a few suggestions for future work.

Program Selection and Listening Environment
In this study, urban soundscapes were used to stimulate presence. An urban soundscape combines the natural acoustic environment (e.g., the sound of trees and water) with environmental sounds created by humans. It can be considered the sonic landscape of an urban scene and can be used in research on urban design, noise control, etc. Using urban soundscapes in this study has three advantages. First, it eliminates the influence of familiarity on the participants' performance in the listening test, because almost everyone has experience of being in urban scenes (e.g., on a street or in a railway station). Second, it widens the pool of eligible participants: participants were only required to assess the feeling of being in a virtual space, which made the task easier and lowered the threshold for selecting participants. Third, compared with music programs, the participants' ratings would be less affected by preference or other personal factors.
This study used the ESMA-3D Immersive Soundscape Recordings by Dr. Hyunkook Lee's team [37] as the urban soundscapes. The ESMA-3D recording technique uses eight microphones for 360° audio capture: four microphones in the main layer capture sound sources, and the other four in the upper layer capture ambience and elevated sources. Accordingly, ESMA-3D recordings can be reproduced directly over an 8-channel double-layer quad loudspeaker array (i.e., a cube array) [38]. To shorten the experiment, six of the 13 soundscapes in the original sound file were chosen as programs: four outdoor scenes (Union Square, Central Park, W 34th St Penn Station, and Adam St under the Manhattan Bridge) and two indoor scenes (Grand Central Terminal, and factory at W 34th & 10th Ave). In the authors' opinion, these soundscapes have the greatest potential to stimulate presence because they contain moving objects (e.g., a bus arriving at and leaving the bus station) or distinctive acoustic characteristics of enclosed spaces (e.g., announcements in the railway station).
The experiments were carried out in an air-conditioned room free of other external noise, as shown in Figure 1. The room was acoustically treated with carpet on the floor and curtains hung 15 cm in front of each wall. Of the 32 loudspeakers (GENELEC 8020D) in the room, only eight were selected to form a cube array for sound reproduction, with four loudspeakers in the main layer and the other four in the upper layer. The distance from each loudspeaker to the listening spot was carefully adjusted to 3 m. The acoustic centers of the four main-layer loudspeakers were set 1.2 m above the floor, approximately ear height for a seated listener. The acoustic centers of the four upper-layer loudspeakers were set at a height of 2 m, elevated by about 45° from the listening point. Thus, the 8-channel signal could be routed to the eight corresponding loudspeakers without further processing.

Questionnaires
The questions related to auditory stimuli in the Presence Questionnaire and the SUS Questionnaire were selected and combined into a new questionnaire. The four selected questions were as follows:
1. Please rate your sense of being in the virtual environment on a scale of 0 to 10, where 10 represents your normal experience of being in a place.
2. During your experience, did you often think to yourself that you were actually in the virtual environment? Please rate it on a scale of 0 to 10, where 10 represents that you almost felt you were actually in the virtual environment.
3. How well could you identify sounds? Please rate it on a scale of 0 to 10, where 10 represents that you could clearly identify different kinds of sounds.
4. How well could you localize sounds? Please rate it on a scale of 0 to 10, where 10 represents that you could easily detect the location of each sound.
The first two questions were chosen from the SUS Questionnaire. The first question focuses on rating the experience in the virtual environment relative to the real world, and the second focuses on the subjective feeling of being placed in the virtual environment. The last two questions were chosen from the Presence Questionnaire to evaluate the realism of the sound field of the virtual environment. This design is similar to that of Hendrix and Barfield [9], who assessed auditory presence with five questions: three evaluating presence and two evaluating the spatialized sound. In this study, detailed explanations of the four questions were given to the participants during the experiment.
The final presence score was the sum of the answers to these four questions, so the higher the score, the stronger the sense of presence. In this way, the subjective assessment of presence was obtained.
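Because the score is a plain sum of four items rated 0 to 10, it ranges from 0 to 40. A small Python sketch of the scoring rule; the function name and the example ratings are hypothetical.

```python
def presence_score(ratings):
    """Sum of the four questionnaire ratings, each on a 0-10 scale (total 0-40)."""
    if len(ratings) != 4:
        raise ValueError("expected four ratings (Q1-Q4)")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("each rating must be between 0 and 10")
    return sum(ratings)


score = presence_score([8, 7, 9, 6])  # hypothetical responses to Q1-Q4 -> 30
```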

EEG Apparatus
Conventional EEG measurement is time-consuming and constrains the movement of participants. Therefore, a multi-channel wireless portable EEG device (Emotiv Epoc X [39]) was used; its ability to collect reliable EEG data has been validated [40,41]. This allowed participants to immerse themselves in the experiment with little awareness of the EEG equipment.
The electrode placement of the EEG device follows the international 10-20 system, although with a limited number of electrodes. Data were collected through 14 channels, using 14 active electrodes (placed at AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8 and AF4) and four reference electrodes (placed at TP9, P3, P4 and TP10). The layout of the electrodes on the scalp is shown in Figure 2. These 14 electrodes cover all major regions of the scalp and provide sufficient data for processing [42]. The EEG sampling rate was 128 Hz, and the EmotivPro software was used to acquire the raw EEG data from the headset.

Experiment 1
In Experiment 1, two types of programs were used: (1) the original version (Program A), comprising six scenes of 30 s each, for a total duration of three minutes; and (2) the mono version (Program B), with the same content as the original version, but with the signals of the eight channels summed and the sum copied to every channel. That is, every channel carried the same signal, and participants heard the same sound from all eight loudspeakers. Because the eight loudspeakers were equidistant from the listening spot, we expected listening to Program B to resemble listening to mono sound and to be less immersive than listening to Program A. Experiment 1 was conducted with 18 experienced listeners (16 males and 2 females, aged 23-27 years) recruited from the lab, none of whom reported hearing impairment. These 18 participants were randomly divided into two groups of nine. Participants in Group I were presented with Program A followed by Program B, with 15 s of silence in between. Participants in Group II were presented with Program B before Program A, again with 15 s of silence in between. This counterbalanced design was intended to rule out the influence of presentation order.
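Program B's mono fold-down can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' processing: the (samples, channels) array layout, the 48 kHz placeholder length and the function name are assumptions, and the random noise merely stands in for the ESMA-3D recordings. Because the text does not state whether any gain adjustment followed the summation, none is applied here; in practice, summing eight channels may require attenuation to avoid clipping.

```python
import numpy as np


def mono_fold(program):
    """Sum all channels and copy the sum to every channel.

    program: array of shape (n_samples, n_channels).
    Returns an array of the same shape whose channels are all identical.
    No gain compensation is applied (the text does not specify any).
    """
    mono = program.sum(axis=1, keepdims=True)          # (n_samples, 1)
    return np.repeat(mono, program.shape[1], axis=1)   # (n_samples, n_channels)


# Placeholder: 1 s of 8-channel noise at an assumed 48 kHz, not the actual recordings.
rng = np.random.default_rng(0)
program_a = 0.05 * rng.standard_normal((48000, 8))
program_b = mono_fold(program_a)
```

Since all eight loudspeakers are equidistant from the listener, feeding them identical signals removes the inter-channel differences that carry the spatial information of Program A.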
We divided participants into two groups rather than having the same participant take part in the experiment twice, because this study differs from traditional listening tests in which participants assess spectral irregularities or distortion. Those tasks require specific listening skills that can be applied repeatedly, whereas assessing presence depends more on the first-time experience. Being too familiar with the programs could cause participants to lose concentration on the task and give unreliable results. In similar studies [6,21,27,30], participants were also presented with two or three conditions for comparison within a single test.
The procedure was as follows.
(1) Participants were first welcomed and given an introduction to the experiment.
(2) Program A was played to familiarize the participants with the program and the immersive sound.
(3) Participants were fitted with the EEG headset, and the conductivity of each electrode and the EEG quality were checked.
(4) Participants were presented with the programs while their EEG was recorded. To help them feel immersed while listening, participants were advised to close their eyes; even with their eyes open, there were no visual cues indicating which loudspeaker was playing. While listening, the participants' only task was to assess presence based on their experience, without going into a detailed evaluation of spaciousness attributes such as ASW (Apparent Source Width) or LEV (Listener Envelopment) [?].
(5) After the measurement, participants filled in the questionnaire.
The whole experiment lasted approximately 20 min.

Experiment 2
In Experiment 2, two types of programs were used: (1) the original version (Program A); and (2) the 2D version (Program C), in which the four height channels (Left Front Height, Right Front Height, Left Rear Height and Right Rear Height) of the original version were muted. This reduced ESMA-3D to ESMA without height channels, hence the name 2D version. Program C was intended to differ only subtly from Program A, with a smaller difference than that between Programs A and B. This allows the consistency between subjective and objective measurements to be tested under a more stringent condition than in Experiment 1.
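Program C can likewise be sketched as zeroing the height channels of an 8-channel program. The channel order below (main layer in columns 0-3, height layer in columns 4-7) and the function name are hypothetical assumptions; the actual channel order of the ESMA-3D files would need to be checked before use.

```python
import numpy as np

# Hypothetical layout: columns 0-3 = main layer, columns 4-7 = height layer
# (LFH, RFH, LRH, RRH). Verify against the actual ESMA-3D files before use.
HEIGHT_CHANNELS = [4, 5, 6, 7]


def mute_height(program, height_channels=HEIGHT_CHANNELS):
    """Return a copy of an (n_samples, 8) program with the height channels set to zero."""
    out = program.copy()
    out[:, height_channels] = 0.0
    return out


rng = np.random.default_rng(1)
program_a = 0.05 * rng.standard_normal((48000, 8))  # placeholder noise, not the recordings
program_c = mute_height(program_a)
```

Unlike the mono fold-down, this leaves the four main-layer signals untouched, which is why the A/C contrast is expected to be subtler than the A/B contrast.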
Experiment 2 was conducted with 25 naïve listeners (8 males and 17 females, aged 21-28 years). They were volunteers recruited from different majors across the campus, and none of them reported hearing impairment. As in Experiment 1, the participants were divided into two groups: 12 in Group I and 13 in Group II. Participants in Group I were presented with Program A followed by Program C, with 15 s of silence in between. Participants in Group II were presented with Program C before Program A, again with 15 s of silence in between. The procedure of Experiment 2 was the same as that of Experiment 1.

Signal Processing
Raw EEG data were processed using EEGLAB [43], a MATLAB toolbox for brain signal processing. A 1-50 Hz bandpass filter was first applied to the EEG data to suppress unwanted noise and retain the frequency range of interest. The filtered data were then cleaned with Clean Rawdata, a built-in EEGLAB plugin, which removed bad portions of data caused by non-physiological artifacts (e.g., insufficient contact between the electrodes and the scalp, or sudden large movements) based on their standard deviation. To further remove physiological artifacts, such as cardiac activity (ECG), eye movements (EOG) and muscle activity (EMG), Independent Component Analysis (ICA) was run. Next, another EEGLAB plugin, MARA (Multiple Artifact Rejection Algorithm), was used to classify artifactual ICA components automatically [44,45], and the components rejected by MARA were removed from the original data.
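The idea of rejecting segments by their standard deviation can be illustrated with a deliberately simplified sketch. It is not the Clean Rawdata algorithm, which is considerably more elaborate (it includes, for example, artifact subspace reconstruction); the 1-s window, the threshold factor k = 3 and the function name are hypothetical choices for illustration only.

```python
import numpy as np


def reject_high_sd_windows(eeg, fs=128, win_s=1.0, k=3.0):
    """Drop windows in which any channel's SD exceeds k times that channel's median window SD.

    eeg: array of shape (n_channels, n_samples). Trailing samples that do not
    fill a whole window are discarded. Returns (cleaned, keep_mask).
    """
    win = int(win_s * fs)
    n_ch = eeg.shape[0]
    n_win = eeg.shape[1] // win
    windows = eeg[:, : n_win * win].reshape(n_ch, n_win, win)  # (channels, windows, samples)
    sd = windows.std(axis=2)                                    # per-channel, per-window SD
    threshold = k * np.median(sd, axis=1, keepdims=True)        # per-channel threshold
    keep = np.all(sd <= threshold, axis=0)                      # reject if any channel is too noisy
    cleaned = windows[:, keep, :].reshape(n_ch, -1)             # concatenate the kept windows
    return cleaned, keep


# Synthetic 60-s, 14-channel recording with one burst artifact in channel 3.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((14, 60 * 128))
eeg[3, 10 * 128 : 11 * 128] *= 20  # window 10 becomes about 20x noisier
cleaned, keep = reject_high_sd_windows(eeg)
```

Using the per-channel median as the reference makes the threshold robust to the very outliers it is meant to catch.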
A MATLAB script was then run on the pre-processed data to calculate the power in each frequency band and the power ratios. First, the power spectral density (PSD) was estimated using Welch's method with a 1-s Hanning window. Then, the frequency bands were defined as described in Section 1.3, and the PSD was summed within each band to obtain its power. Finally, the power ratios of interest were calculated.
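The band-power computation can be sketched in NumPy as follows. The sampling rate (128 Hz) and the 1-s Hann window come from the text; the 50% segment overlap, the per-segment mean removal and the band edges (theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz) are assumptions, since the study's own band definitions (Section 1.3) are not reproduced here. In practice, MATLAB's `pwelch` or SciPy's `scipy.signal.welch` would normally be used rather than this hand-written estimator.

```python
import numpy as np

FS = 128  # Hz, EEG sampling rate stated in the text
# Assumed band edges in Hz (lower inclusive, upper exclusive); adjust to Section 1.3.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def welch_psd(x, fs=FS, seg_len=FS, overlap=0.5):
    """One-sided Welch PSD estimate using a Hann window of seg_len samples."""
    step = int(seg_len * (1 - overlap))
    window = np.hanning(seg_len)
    scale = fs * np.sum(window ** 2)  # density scaling (units^2/Hz)
    segments = np.array([
        (x[s:s + seg_len] - x[s:s + seg_len].mean()) * window  # remove mean, then taper
        for s in range(0, len(x) - seg_len + 1, step)
    ])
    spectra = np.abs(np.fft.rfft(segments, axis=1)) ** 2 / scale
    spectra[:, 1:-1] *= 2  # fold in negative frequencies (DC and Nyquist not doubled)
    freqs = np.fft.rfftfreq(seg_len, d=1 / fs)
    return freqs, spectra.mean(axis=0)  # average the per-segment periodograms


def band_powers(x, fs=FS, bands=BANDS):
    """Sum the PSD over each band; with a 1-s window the bin spacing is 1 Hz."""
    freqs, psd = welch_psd(x, fs)
    df = freqs[1] - freqs[0]
    return {name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
            for name, (lo, hi) in bands.items()}


# Synthetic 60-s test signal: strong 6 Hz (theta) + weak 20 Hz (beta) + noise.
t = np.arange(60 * FS) / FS
rng = np.random.default_rng(0)
x = 2.0 * np.sin(2 * np.pi * 6 * t) + 0.5 * np.sin(2 * np.pi * 20 * t) + 0.1 * rng.standard_normal(t.size)
powers = band_powers(x)
tb = powers["theta"] / powers["beta"]  # roughly 16 = (2**2 / 2) / (0.5**2 / 2)
```

The expected T/B follows from the sine powers (amplitude squared over two), which the density-scaled PSD preserves when summed over frequency.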

Experiment 1
In Experiment 1, 16 sets of valid data were collected. Data from two participants were excluded because of their poor condition during the experiment: one participant in Group I reported being sleepy and unable to concentrate, and one participant in Group II reported being distracted by his phone.
The presence scores calculated from the questionnaires are displayed in Figure 3, in which Participants 1-8 belonged to Group I and Participants 9-16 to Group II. There is a marked difference in presence score between Programs A and B: the presence score for Program A was higher than that for B for all participants except Participant 10. The presence scores were normally distributed according to the Kolmogorov-Smirnov test (Program A, p = 0.104; Program B, p = 0.466). Because each participant rated the presence of Programs A and B successively, the scores for the two programs were treated as pairs when evaluating the difference between them. A paired t-test showed that the difference in presence score was significant at the 0.05 level (p = 1.28 × 10⁻⁴). Paired t-tests were also carried out on the ratings of each individual question, and the difference between the two programs was significant for all four questions.
The EEG indices are listed in Table 1, which presents the absolute powers of theta, alpha and beta and the power ratios T/A (theta/alpha), T/AB (theta/(alpha+beta)), T/B (theta/beta) and A/B (alpha/beta). These EEG indices fluctuated considerably between participants, and even for the same participant the indices varied greatly at different stages. According to the Kolmogorov-Smirnov test, none of the EEG indices for either program (each column of Table 1) was normally distributed (p < 0.05), so the paired t-test was not suitable. The Wilcoxon signed-rank test [46], which does not require normally distributed data, was used instead. Its results for each EEG index pair (for example, theta power for Programs A and B) are shown in Table 2.
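The text does not state which implementation of the Wilcoxon signed-rank test was used; library routines such as MATLAB's `signrank` or SciPy's `scipy.stats.wilcoxon` are the usual choice. To show what the test computes, here is a dependency-free Python sketch of the exact two-sided version. It assumes no zero differences and no tied absolute differences (the general case needs tie and zero handling) and, under that restriction, builds the null distribution of the positive rank sum W+ by dynamic programming. The function name is hypothetical.

```python
def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumes no zero differences and no tied |differences| (ranks are 1..n).
    Returns (w_plus, p_value).
    """
    d = [x - y for x, y in zip(a, b)]
    if any(v == 0 for v in d):
        raise ValueError("zero differences are not handled in this sketch")
    abs_sorted = sorted(abs(v) for v in d)
    if len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("tied |differences| are not handled in this sketch")
    n = len(d)
    rank = {v: i + 1 for i, v in enumerate(abs_sorted)}
    w_plus = sum(rank[abs(v)] for v in d if v > 0)

    # counts[s] = number of the 2**n sign patterns whose positive ranks sum to s
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    total = 2 ** n
    lower = sum(counts[: w_plus + 1]) / total   # P(W+ <= observed)
    upper = sum(counts[w_plus:]) / total        # P(W+ >= observed)
    return w_plus, min(1.0, 2 * min(lower, upper))


w_plus, p = wilcoxon_exact([5, 6, 7, 8, 9], [0, 0, 0, 0, 0])  # w_plus == 15, p == 0.0625
```

With only five pairs, even a unanimous direction of change cannot reach p < 0.05 in a two-sided test, which illustrates why sample size matters for this test.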
As Table 2 shows, only the difference in beta power between Programs A and B was significant at the 0.05 level; the differences in alpha power, T/AB and T/B were close to significance. Because both the presence score and specific EEG indices differed significantly between Programs A and B, it is worth analyzing the relationship between them. To this end, we used the chi-squared test, a non-parametric test. The data were first transformed into dichotomous form, as shown in Table 3: if the presence score for Program A was higher than that for B, it was coded 1, and 0 otherwise. Likewise, if an EEG index (for example, T/B) for Program A was higher than that for B, it was coded 1, and 0 otherwise. In each row of Table 3, the dichotomous presence score and EEG indices come from the same participant, so they must be treated as pairs in the analysis. For this reason, we used the McNemar test, the chi-squared test for paired data [47,48]. Table 4 presents the results of the McNemar test. There was a significant difference (p < 0.05) between the presence score and each absolute power (theta, alpha and beta), indicating that these powers did not agree with the presence score in identifying which program produced more presence. In contrast, there was no significant difference (p > 0.05) between the presence score and any of the four power ratios, indicating no significant disagreement between them. In addition, the Pearson correlation coefficient was computed on the same dichotomous data. Among all seven indices, only T/B showed a significant correlation with the presence score (p < 0.05), with a correlation coefficient of 0.683. This suggests that if the presence score for Program A was higher than that for B, T/B for Program A was also likely to be higher than that for B. For the other six indices, the correlation was not significant (p > 0.05).
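The dichotomization and the paired analysis described above can be sketched in pure Python. The text does not state whether the exact (binomial) or the asymptotic chi-squared form of the McNemar test was used; the exact form is shown here. The Pearson correlation of two 0/1 vectors equals the phi coefficient of their 2 × 2 table; its significance test is not shown. Function names and example vectors are hypothetical.

```python
from math import comb, sqrt


def dichotomize(values_a, values_other):
    """1 if the value for Program A exceeds that for the comparison program, else 0."""
    return [1 if a > o else 0 for a, o in zip(values_a, values_other)]


def mcnemar_exact(x, y):
    """Exact two-sided McNemar test on paired binary outcomes (binomial on discordant pairs)."""
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of disagreement
    k = min(b, c)
    return min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)


def phi_coefficient(x, y):
    """Pearson correlation of two binary vectors, computed from the 2 x 2 table."""
    n11 = sum(1 for u, v in zip(x, y) if u and v)
    n10 = sum(1 for u, v in zip(x, y) if u and not v)
    n01 = sum(1 for u, v in zip(x, y) if not u and v)
    n00 = sum(1 for u, v in zip(x, y) if not u and not v)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else float("nan")


x = [1, 0, 0, 0, 1, 1]  # hypothetical dichotomized presence scores
y = [0, 1, 1, 1, 1, 1]  # hypothetical dichotomized T/B values
p = mcnemar_exact(x, y)  # b = 1, c = 3, so p = 2 * (1 + 4) / 16 = 0.625
r = phi_coefficient(x, y)
```

The two statistics answer different questions: McNemar only looks at the discordant pairs and asks whether one measure says "A higher" more often than the other, while phi measures how consistently the two codings move together across participants.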
To summarize, we first found significant differences in the presence score and beta power between the two programs, indicating that participants were able to detect the difference between them. We then investigated the correlation between the presence score from the subjective questionnaires and the EEG indices from the objective measurements. Considering both the McNemar test results and the Pearson correlation coefficients, T/B was the only EEG index in this study that showed a significant correlation with the presence score.

Experiment 2
In Experiment 2, data from one participant in Group II were excluded because she could not tell the difference between Programs A and C. Thus, 24 sets of valid data were collected, as shown in Figure 4, in which Participants 1-12 belonged to Group I and Participants 13-24 to Group II.
As shown in Figure 4, the presence score for Program A tended to be higher than that for Program C. Although less pronounced than in Experiment 1, this trend held for most participants (19 of 24). A paired t-test gave p = 0.02, indicating a significant difference in presence score at the 0.05 level. Because T/B was the only index significantly correlated with the presence score in Experiment 1, Experiment 2 focused on checking the reliability of this correlation. As shown in Table 5, the T/B values for the two programs were not normally distributed according to the Kolmogorov-Smirnov test (Program A, p = 2.6 × 10⁻⁵; Program C, p = 1.58 × 10⁻⁴). The Wilcoxon signed-rank test was therefore applied, and its result (p = 0.04) showed that the difference in T/B between the two programs was significant. The McNemar test was then performed on the presence score and T/B; the result (p = 0.625) showed no significant difference between them, indicating no significant disagreement between the two measures. In addition, the Pearson correlation coefficient was 0.574 (p = 0.003), confirming a statistically significant correlation.
To summarize, we first found significant differences in both presence score and T/B when presenting different programs. Then, the correlation between the measure of presence score and the measure of T/B was evaluated, showing no significant difference between these two measurement methods. The correlation coefficient was significantly different from 0, as well.

Discussion
We can now answer the questions posed in Section 1.4. In this study, we first found that the presence score differed significantly between programs. In addition, T/B (theta/beta ratio) extracted from the EEG signal differed significantly between programs in Experiment 2 (and approached significance in Experiment 1), and it was significantly correlated with the presence score after the data were dichotomized. Thus, both the EEG measurement and the subjective measurement showed significant differences between loudspeaker setups, and the two were correlated to a considerable extent. This suggests that EEG could serve as a tool for the objective measurement of presence.

Post-Test Interview
Besides the questionnaires, we also interviewed the participants about their experience in the virtual environment. Participants who rated Program A higher gave similar answers: they reported that the sound came from all directions and that they felt enveloped by it. In contrast, the answers from participants who gave higher presence scores to the counterpart of Program A (Program B or C) were more revealing. In Experiment 1, the participant who rated Program B higher reported that he could concentrate more on one or two items in the scenes during Program B. In Experiment 2, the five participants who rated Program C higher reported that the ambience from the four height channels in Program A masked nearby sounds (such as a conversation on the left or the bus leaving in front) and degraded presence. These interviews indicate that not all participants prefer the 8-channel setup and suggest that "the more loudspeakers, the better the experience" does not always hold.

T/B in Neuroscience
In Section 3, we found a significant correlation between presence and T/B, but what this represents from a neuroscience perspective remains unclear. Witmer and Singer [2] stated that presence in a virtual environment depends on shifting one's attention from the physical environment to the virtual environment. Meanwhile, T/B (theta/beta ratio) has been found to be related to attentional control [49,50]. van Son et al. [51] found that frontal T/B was significantly higher during mind-wandering episodes than during on-task periods, linking T/B to attentional control over thoughts. In addition, T/B has been used as a diagnostic biomarker of attention deficit hyperactivity disorder (ADHD) in clinical diagnosis [52]. These findings hint at a possible connection between presence and T/B, but more detailed research is needed in future work.

Correlation between Questionnaires and EEG Indices
In Section 3, the McNemar test was used to compare the presence score with the EEG indices, and it revealed no significant difference between the two metrics in identifying which program stimulated a higher presence level. Cohen's kappa is another statistic that can be used for this purpose. It is commonly used to measure the agreement between different observers making the same judgement. Kappa is standardized to a −1 to 1 scale: a kappa of 1 indicates perfect agreement, a kappa of 0 indicates agreement equivalent to chance, and negative values indicate agreement worse than chance [53]. Here, kappa serves as a tool to evaluate the agreement between the questionnaire and EEG measurements. Kappa values between the presence score and each EEG index (all indices in Experiment 1 and T/B in Experiment 2) are shown in Table 6. Only the kappa values between the presence score and T/B were significantly different from 0, indicating significant agreement between them. Kappa was 0.636 in Experiment 1 and 0.56 in Experiment 2, indicating substantial agreement between the presence score and T/B. Kappa in Experiment 2 was slightly lower than in Experiment 1, which is consistent with the difference between Programs A and C being smaller than that between Programs A and B.
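For two binary codings such as those in Table 3, Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from the marginal proportions, κ = (p_o − p_e)/(1 − p_e). A minimal Python sketch follows; the function name and example vectors are hypothetical, and the significance test of kappa reported in Table 6 is not reproduced.

```python
def cohens_kappa(x, y):
    """Cohen's kappa for two paired binary (0/1) codings of the same cases."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("need two non-empty sequences of equal length")
    p_observed = sum(1 for u, v in zip(x, y) if u == v) / n
    px1 = sum(x) / n  # proportion coded 1 by the first measure
    py1 = sum(y) / n  # proportion coded 1 by the second measure
    p_expected = px1 * py1 + (1 - px1) * (1 - py1)
    if p_expected == 1:
        return float("nan")  # both codings constant and identical: kappa undefined
    return (p_observed - p_expected) / (1 - p_expected)


kappa = cohens_kappa([1, 1, 1, 0], [1, 1, 0, 0])  # p_o = 0.75, p_e = 0.5, so kappa = 0.5
```

Unlike raw percentage agreement, kappa discounts the agreement that would arise if, say, both measures simply favored Program A for most participants.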

Conclusions
This paper presented an analysis of auditory presence using eight loudspeakers to reproduce urban soundscapes as programs. Two experiments were conducted in which presence was measured both subjectively with questionnaires and objectively with EEG. In the subjective evaluation, the presence score differed significantly between loudspeaker setups. In the objective measurement, beta power and T/B extracted from the EEG signals also showed significant differences. The correlation between the presence score and the EEG indices was analyzed, and T/B was significantly correlated with the presence score.
The results of the present work show that both the EEG measurement and the subjective measurement revealed significant differences between loudspeaker setups, and that the two were correlated to a considerable extent. This indicates that EEG could serve as a tool for the objective measurement of presence. Among all indices extracted from the EEG signals, T/B has the greatest potential to serve as an objective marker of presence. Future work could focus on building a model that predicts presence from a single T/B measurement, which could make presence easier to measure.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: Not applicable.
Acknowledgments: The authors would like to thank Ziyun Liu for the suggestions for this study, and all listeners who participated in the listening test.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
EEG: electroencephalography
T/A: the power ratio of theta/alpha
T/B: the power ratio of theta/beta
T/AB: the power ratio of theta/(alpha+beta)
A/B: the power ratio of alpha/beta