Introduction
Presence is defined as a perceptual illusion of nonmediation that is triggered by technical interfaces (Lombard & Ditton, 1997). Thus, virtual environments or media contents are perceived as “real”. In addition, one’s self-awareness is immersed in this other world (Draper, Kaber & Usher, 1998). According to Sadowski and Stanney (2002), presence is “a sense of belief that one has left the real world and is now ‘present’ in the virtual environment” (p. 791). Presence in virtual environments requires departing mentally from the physical environment and arriving in a mediated environment (Kim & Biocca, 1997; Sadowski & Stanney, 2002; Steuer, 1992). There are various subconcepts of presence (e.g. social presence, self-presence, or environmental presence). In this study, however, we focus on spatial presence, which can be considered the core form of presence. Since the role of attentional processes has been emphasized in theory (Biocca, 1997; Stark, 1995; Steuer, 1992), we aim to empirically investigate patterns of visual attention allocation in the context of spatial presence.
Draper, Kaber, and Usher (1998) introduced an attentional resource model of telepresence in the context of tele-operation. The model distinguishes task-relevant and distracting information in both immediate and mediated environments. Mediated environments are displayed through media interfaces whereas the immediate environment comprises everything that is not mediated. They assume that the probability of experiencing telepresence is increased if more attentional resources are allocated to the mediated environment rather than to the immediate environment.
Recently, Wirth et al. (2007) introduced a comprehensive spatial presence model. According to the model, the two core dimensions of presence are self-location (the sensation of being physically situated within the mediated environment) and possible actions (the perceived number of possibilities to act within the mediated environment).
The model of Wirth et al. (2007) further distinguishes two critical steps. In the first step, the focus of attention must be allocated to the mediated environment and the user has to establish a mental representation of this environment. In the second step, media users no longer locate themselves in the immediate environment, but feel present in the mediated environment. Thus, the model suggests that attentional processes are required to experience presence. Attention allocation towards the mediated environment may be media-induced (involuntary) or user-directed (controlled). The former results from media characteristics such as high pictorial realism, whereas the latter is associated with user characteristics such as interests or motivation.
Wirth et al. (2007) further state that in interactive and/or immersive media, continuous sensory input captures and maintains involuntary attention, whereas in non-interactive media such as books, controlled attention processes are more central. This model is in agreement with Stark (1995), who points out that the characteristics of the human visual information system in general, and eye movements in particular, could form the basis for telepresence. Although the relevance of attentional processes for spatial presence seems evident, empirical research focusing on these processes is still scarce.
Similar to presence, attention is a complex concept consisting of various sub-dimensions. William James (1890) suggested two categories: passive vs. active attention. These two categories have persisted, although the modern terms are bottom-up and top-down (the spatial presence model introduced above includes this dimension). In addition, several forms of attention have been proposed: attentional orientation (directing attention to a particular stimulus), selective attention (focusing on one particular stimulus instead of another), divided attention (distributing attentional resources over two or more different stimuli), and sustained attention (attending to a stimulus over a period of time) (cf. Henderson, 2003). In addition, one can distinguish overt attention (attending to a stimulus with the sense organs) and covert attention (mentally focussing on certain stimuli or aspects of a stimulus) (cf. Wright & Ward, 2008).
Attentional processes have been investigated in different modalities. However, most research has been done in the visual domain, and in scene perception in particular. Because only a small region of the retina (i.e. the fovea) provides high-quality visual information, we move our eyes about three times each second. These rapid eye movements are termed saccades, whereas the periods of relative gaze stability are termed fixations. Fixations can be seen as deictic pointers to entities in the environment. Eye movements may act as a primary origin for the coordinate systems of vision, motor control, and cognition.
Much research in the field of scene perception has focused on the question of whether eye movements are controlled bottom-up (i.e. based on stimulus characteristics such as contrast) or top-down (i.e. based on memory or cognitive processes) (Henderson, 2003). Because bottom-up control depends on the stimulus, variations in eye movement patterns between viewers of the same standardized visual stimulus can be attributed to top-down processes.
A central tool in visual attention research is eye tracking, since eye movements are a behavioral manifestation of attention allocation in a particular scene. Henderson (2003) states that eye movements serve as a window into the operation of the attentional system. Moreover, he concludes that eye movements provide an unobtrusive and sensitive index of visual and cognitive processing. Eye tracking is therefore best suited to investigating overt attention. Bailenson and Yee (2005) introduced head movements as a proxy for gaze in order to assess attention allocation. So far, however, the potential of eye movement tracking has not been realized in presence research. We think that tracking eye movements enables us to address several unsettled issues. It is assumed that attention allocation is a prerequisite for spatial presence. However, paying close (visual) attention to a mediated environment only increases the probability that presence emerges; it does not guarantee it. For example, an airport screener using an x-ray device to check suitcases usually attends to the screen very closely. However, it is unlikely that he or she feels located inside the suitcases. This raises the question of the extent to which presence is influenced by visual scene perception.
There have been some attempts to use eye movement analyses to understand immersion in mediated environments (cf. Cox, Cairns, Berthouze, & Jennett, 2006; Haffegee, & Barrow, 2009). These studies did not investigate spatial presence directly, but immersion. Immersion is a broader concept that includes emotional processes. Tijs (2006) aimed to explore the role of eye movements in the context of immersion in games. In his experiment, he used two different games, one that he assumed would evoke strong immersion and another that he expected to evoke little or no immersion. The results show that more immersion is associated with longer average fixation durations. During the game, there were also significant variations in pupil dilatation. These findings suggest that eye movements could be an important indicator in the context of immersion and presence. Yet these findings need to be replicated, because the variability of pupil dilatation may result from variations in screen brightness: computer games like the ones used in the experiment usually include different sceneries with varying lighting.
We aim to investigate whether strong sensations of presence are associated with specific patterns of eye movements. Such patterns could be of major interest from several perspectives. First, the role of visual attention in spatial presence could be clarified. Second, identifying the eye movement patterns triggering presence could have not only theoretical, but also practical implications (e.g. for VR designers). Third, since eye tracking is unobtrusive and highly reliable, the identification of presence-specific eye movement parameters could form the basis for a new indicator of spatial presence. Fourth, scene perception in mediated environments could help to better understand scene perception in natural environments. Fifth, the eye movement patterns could, in the long run, shed light on the cognitive processes during presence experiences.
We outlined above that immersion and fixation duration were found to be associated (Cox, Cairns, Berthouze, & Jennett, 2006; Tijs, 2006). Therefore, we predict the following:
H1: The average fixation duration is positively related to the subjective sensation of presence.
H2: The number of fixations is negatively related to the subjective sensation of presence.
Presence depends not only on media characteristics and attention allocation towards the medium, but also on the user’s motivation and ability to become immersed in a virtual environment (Wirth et al., 2007). This suggests the following hypothesis:
H3: The immersive tendency and the actual sensation of presence are positively related.
So far, it remains unclear whether presence and pupil dilatation are associated. Therefore we aim to answer the following research question:
RQ1: Are the subjective sensations of presence and pupil dilatation associated?
Furthermore, there are various eye movement parameters that have not yet been related to presence in a well-controlled study. This suggests the following research question:
RQ2: Is there an association between the subjective sensation of presence and the number of out-of-bounds (fixations outside the display), and saccade amplitude?
Method
Design
We used a virtual roller coaster simulation as a stimulus. To increase the likelihood of having different levels of presence, we chose to manipulate the auditory media content, since haptic or visual manipulations would have directly influenced the eye movements. Note that the two versions of the stimulus were visually identical; the only difference between the high and low presence versions was sound (present vs. absent).
Our original study included a second trial that we chose not to include here since the repeated exposure created strong expectations and corresponding confounds (i.e. carry-over effects).
Material
We used a commercially available roller coaster simulation (nolimitscoaster). We switched off speed displays and chose good weather conditions (i.e. sunny day). We presented the ride on a 46’’ LCD television. To rule out any differences besides the sound between the two versions (e.g. due to different real-time image rendering or viewing angles), we generated a high-resolution video clip displaying a ride on the track “Plutonium” (cf. Figure 1). The duration of the ride was 127 seconds.
Participants
Forty-four undergraduate students enrolled in Psychology volunteered to participate in this investigation. Mean age was 22.14 years (SD = 4.06). Among these participants, 40 were female. They received extra credit for their participation and could end the experiment at any time.
Measurement
To track the eye movements, we used the EyeLink II device (SR Research). This head-mounted video-based eye tracker uses infrared light to monitor the pupil-corneal reflection. The average accuracy is high (usually < 0.5°). The device allows for wearing glasses and enables head movements up to 30°. We tracked each participant’s dominant eye with a sampling rate of 500 Hz. We captured fixation duration, saccadic amplitude, saccade velocity, out-of-bounds, and pupil dilatation. Eye movements smaller than one degree of visual angle between two measures were integrated as one fixation, whereas all eye movements greater than one degree between two measures were counted as saccades.
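The one-degree criterion described above can be illustrated with a minimal sketch. It assumes that gaze samples are already expressed in degrees of visual angle and that "two measures" means two consecutive samples; both are assumptions, as are the function names. The tracker's own event parser may apply different criteria, so this is not a reconstruction of the software used in the study.

```python
import math

SAMPLE_RATE_HZ = 500   # sampling rate reported in the text
THRESHOLD_DEG = 1.0    # displacement criterion reported in the text


def classify_transitions(gaze_deg, threshold=THRESHOLD_DEG):
    """Label each step between consecutive samples as 'fixation' or 'saccade'.

    gaze_deg: list of (x, y) gaze positions in degrees of visual angle.
    A step of exactly `threshold` is treated as fixation (assumption; the
    text does not specify the boundary case).
    """
    labels = []
    for (x0, y0), (x1, y1) in zip(gaze_deg, gaze_deg[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        labels.append("saccade" if step > threshold else "fixation")
    return labels


def summarize_fixations(labels, sample_rate=SAMPLE_RATE_HZ):
    """Merge runs of fixation steps; return (number of fixations, mean duration in ms)."""
    durations = []
    run = 0
    for label in labels:
        if label == "fixation":
            run += 1
        elif run:
            durations.append(run * 1000.0 / sample_rate)
            run = 0
    if run:  # close a fixation that lasts until the end of the recording
        durations.append(run * 1000.0 / sample_rate)
    count = len(durations)
    mean_ms = sum(durations) / count if count else 0.0
    return count, mean_ms
```

Number of fixations and average fixation duration, the two parameters used in H1 and H2, both fall out of the same pass over the labelled samples.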
To assess the subjective sensations of spatial presence, we used the MEC Spatial Presence Questionnaire (MEC-SPQ; Vorderer et al., 2004). This instrument assesses six sub-dimensions of spatial presence states:
- self-location (“It was as though my true location had shifted into the environment in the presentation”)
- possible actions (“The objects in the presentation gave me the feeling that I could do things with them”)
- spatial situation model (“I was able to imagine the arrangement of the spaces presented in the medium very well”)
- attention allocation (“I concentrated on the medium”)
- higher cognitive involvement (“I thought most about things having to do with the medium”)
- suspension of disbelief (“I concentrated whether there were any inconsistencies in the medium”)
Each dimension is captured by four items, resulting in 24 items in total. According to Vorderer et al. (2004), the mean of all 24 items yields the total spatial presence score. According to the authors, the scales’ reliability is high (Cronbach’s alpha = .93). Sacau, Laarni and Hartmann (2008) added further empirical evidence for the validity and reliability of the measure. As suggested, we used 5-point Likert scales ranging from 1 (‘I do not agree at all’) to 5 (‘I fully agree’). In this study, we calculated the reliabilities of the MEC-SPQ scales. These were highly reliable (all Cronbach’s alpha > .90).
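As an illustration of the scoring described above, the following sketch computes a total score as the mean of a participant's item ratings and Cronbach's alpha from a participants-by-items matrix using the standard formula. The function names and data layout are assumptions for illustration; the statistics package actually used in the study is not reported.

```python
import statistics


def total_presence_score(item_ratings):
    """Total score for one participant: the mean of all item ratings."""
    return statistics.mean(item_ratings)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per participant,
    one column per item), based on sample variances."""
    k = len(responses[0])                       # number of items
    item_columns = list(zip(*responses))        # transpose to item-wise columns
    item_variances = [statistics.variance(col) for col in item_columns]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

For the questionnaire described here, each row would hold 24 ratings for the total scale, or four ratings when alpha is computed for a single sub-dimension.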
Immersive tendency was captured by the measure of Witmer and Singer (1998). It assesses the disposition to become immersed or involved in mediated environments. The 21-item scale measures how easily someone experiences immersion in the world displayed by media. According to the authors, the scales’ reliability is good (Cronbach’s alpha = .81). Example item: “Do you ever become so involved in a video game that it is as if you are inside the game rather than moving a joystick and watching the screen?” We used 5-point Likert scales ranging from 1 (‘I do not agree at all’) to 5 (‘I fully agree’). All these questionnaire data and the demographics were collected on a computer.
Procedure
After informed consent was obtained, the participants were seated in front of the LCD screen (distance to the screen was 50 cm; no head fixation). Then, we put the head mounted eye tracker on the participant’s head and calibrated the eye tracking system. Before the roller coaster ride started, participants were told that they could enjoy the following presentation without any task. When the ride had ended, we removed the eye tracker and participants answered the questionnaires. After a second trial that we do not include in the analysis here, the participants were debriefed and thanked. The whole experiment lasted approximately 20 minutes.
Results
In a first step, we excluded outliers (> 2 SD) and applied natural-log (LN) transformations to the variables for which this was appropriate. Table 1 displays the descriptives.
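The preprocessing step can be sketched as follows. The sketch assumes that the 2 SD criterion refers to the distance from the sample mean, that the sample standard deviation is used, and that the log transformation is applied to strictly positive values; none of these details is stated in the text, and the function names are hypothetical.

```python
import math
import statistics


def exclude_outliers(values, n_sd=2.0):
    """Keep only values within n_sd sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


def ln_transform(values):
    """Natural-log transformation; all values must be strictly positive."""
    return [math.log(v) for v in values]
```

Applying the exclusion before the transformation mirrors the order given in the text.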
A manipulation check revealed that manipulating the audio content did not influence the subjective experience of presence, t(40) = .13, p = .90. Yet, there were substantial inter-individual differences in the levels of presence. Thus, the following analyses focus on the individual differences in the experience of presence and relate these to the eye movement parameters.
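A manipulation check of this kind can be sketched as an independent-samples t-test. The sketch assumes the classical pooled-variance (Student) version, which is consistent with the reported 40 degrees of freedom for two groups but is not confirmed by the text; the function name is hypothetical.

```python
import math
import statistics


def pooled_t_test(group_a, group_b):
    """Student's t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df
```

Here `group_a` and `group_b` would hold the presence scores of the participants in the sound and no-sound conditions.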
To test our hypotheses and research questions, we calculated a stepwise multiple regression analysis. We included eye-movement parameters and the immersive tendency as predictors and the presence score as criterion. The final model includes only one predictor: number of fixations. This analysis reveals that number of fixations is negatively related to the subjective sensation of presence, whereas the relationship between fixation duration and presence fails to reach significance (p = .15).
Thus, we have to reject H1, which predicts a positive relation between fixation duration and subjective sensations of presence. In contrast, we can accept H2, as there is a negative relation between the number of fixations and the subjective sensation of presence. To put these findings into perspective, we would like to point out that the number of fixations and fixation duration are strongly negatively related, r(42) = -.902, r² = .814, p < .001. Yet the bivariate correlation between presence and fixation duration narrowly misses significance, r(42) = .245, p = .059.
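The link between the reported correlation and the shared variance can be made explicit with a short sketch: the squared Pearson correlation gives the proportion of variance that two variables share. The function names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shared_variance(r):
    """Proportion of variance shared by two variables (coefficient of determination)."""
    return r ** 2
```

With the reported r = -.902, `shared_variance` returns about .814, i.e. roughly 81 % of the variance is shared between number of fixations and fixation duration.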
The regression analysis further reveals, against the prediction in H3, that the immersive tendency does not significantly influence the sensation of presence.
The stepwise regression analysis further provides answers to the research questions. Concerning RQ1, we found no relation between the pupil size measures and the presence score. RQ2 asked whether there is an association between the subjective sensation of presence and number of out-of-bounds, and saccade amplitude. Again there were no evident associations.
To further validate these findings, we calculated exploratory discriminant analyses. We performed a median split on the overall MEC spatial presence score and tried to classify the participants scoring high vs. low on presence on the basis of their eye movements. We included all predictors showing a significant relation with the criterion or a corresponding tendency. In agreement with Bortz (2005), we consider an alpha level below .2 a tendency since rejecting H1 requires a beta error below 5%.
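The median split used to form the two groups can be sketched as follows. How scores exactly equal to the median were assigned is not reported; the sketch places them in the low group, which is an assumption, as is the function name.

```python
import statistics


def median_split(scores):
    """Label each score 'high' if it lies above the sample median, else 'low'."""
    cut = statistics.median(scores)
    return ["high" if score > cut else "low" for score in scores]
```

The resulting labels serve as the grouping variable that the discriminant analysis tries to recover from the eye movement parameters.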
Thus, our model including the parameters
- number of fixations (λ = .912; F(1, 40) = 3.871; p = .056) and
- saccade amplitude (λ = .917; F(1, 40) = 3.630; p = .064)
is significant (λ = .846; χ² = 6.53; df = 2; p = .038). This model correctly classifies 66.7 % of the cases.
To gain further insights into the underlying mechanisms, we calculated the corresponding discriminant analyses for the MEC sub-dimensions. Eye movement parameters explain the most variance in the sub-dimension possible actions. The parameters
- number of fixations (λ = .871; F(1, 40) = 5.917; p = .020),
- number of out-of-bounds (λ = .953; F(1, 40) = 1.982; p = .167), and
- fixation duration (λ = .922; F(1, 40) = 3.401; p = .073)
classify the spatial presence sub-dimension possible actions best (λ = .760; χ² = 10.56; df = 3; p = .014). This model correctly classifies 73.8 % of the cases.
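The relation between Wilks' λ and the reported test statistics in the two discriminant models can be checked with a small sketch. It uses Bartlett's chi-square approximation for the overall λ and the exact F for a single predictor in a two-group design. The sample size of 42 is an assumption inferred from the F(1, 40) degrees of freedom, and the text does not state which approximation the software used.

```python
import math


def bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups=2):
    """Bartlett's chi-square approximation for Wilks' lambda.

    Returns (chi_square, degrees_of_freedom).
    """
    factor = n_cases - 1 - (n_predictors + n_groups) / 2
    chi_square = -factor * math.log(wilks_lambda)
    df = n_predictors * (n_groups - 1)
    return chi_square, df


def f_from_univariate_lambda(wilks_lambda, df_error):
    """F statistic (1, df_error) for one predictor and two groups."""
    return (1 - wilks_lambda) / wilks_lambda * df_error


# Assumed N = 42 (inferred, not reported); lambdas taken from the text.
overall_model = bartlett_chi_square(0.846, n_cases=42, n_predictors=2)
possible_actions_model = bartlett_chi_square(0.760, n_cases=42, n_predictors=3)
```

Under these assumptions, both chi-square values agree with the reported 6.53 and 10.56 up to rounding of λ, and the univariate F values are likewise close to those in the text.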
These analyses confirm that the number of fixations is a relevant predictor of sensations of presence and that some of the other parameters could prove significant in larger samples or in different contexts.
Discussion
Our study shows that previous findings on eye movements and immersion seem to be, to some extent, also valid for eye movements and presence. Previous research found immersion to be associated with longer fixation duration (Cox et al., 2006; Tijs, 2006). Although our study finds only a tendency for the correlation between presence and fixation duration, there is a clear negative relationship between the number of fixations and presence. This seems plausible given that the number of fixations and fixation duration are strongly correlated. We would like to point out that in our study, the number of fixations is a more relevant predictor of presence than fixation duration. Yet, in a larger sample, fixation duration could be relevant for predicting presence in its own right: even though fixation duration and number of fixations share 81 % of the variance, fixation duration has the highest beta value. This may indicate that the non-shared variance of number of fixations and fixation duration predicts presence.
Our findings further show no relationship between pupil dilatation and the sensation of presence. We think that the relations between pupil size and immersion reported in previous research (Tijs, 2006) may reflect a confound, since participants in that research were looking at different stimuli in the high vs. low presence conditions. In our study, the stimuli used were constant in terms of brightness and content.
Wirth et al. (2007) assumed presence to be a booster for any media effect. This includes emotional effects. However, presence is not tied to a particular emotion. Feeling present in a scary environment may result in fear, whereas feeling present in a joyful environment may result in joy. We think that many factors, such as brightness, emotional state, and arousal, influence pupil dilatation, making it unlikely that pupil dilatation could serve as a reliable and valid indicator of presence.
The number of out-of-bounds was not a significant predictor of presence. Presence theory suggests that presence requires attention allocation towards the medium (Wirth et al., 2007). Therefore, one could argue that high presence should be associated with few or no out-of-bounds. In our study, only a few participants looked outside the display. This floor effect means that the parameter cannot explain much of the variance in the criterion. However, one spontaneous comment revealed that high sensations of presence may be associated with actively looking away from the screen. After the experiment, one participant told the experimenter that she found the presentation so intense that she had to fixate the frame of the display from time to time so as not to be drawn into the presentation too much. She further said that she had been using this strategy since she was a child whenever she was exposed to “intense” presentations.
To our surprise, the personality trait immersive tendency did not determine the actual sensation of presence (Wirth et al., 2007; Witmer & Singer, 1998). Here, the actual viewing behavior seems to be more important than this trait. One could argue that some components of the immersive tendency, such as imagery skills, are not required in this particular environment, which can be considered a sensory “rich” presentation. In addition, our presentation did not include any narration, so participants did not need to be motivated to follow or appreciate a narrative. As outlined above, bottom-up as well as top-down processes are relevant for presence (Stark, 1995; Wirth et al., 2007). Given that an immersive environment was used, one could argue that bottom-up processes triggered and maintained sensations of presence (Wirth et al., 2007). Yet the fact that the visual input was kept constant could imply that the differences in viewing behavior as well as in the sensations of presence must have a top-down component, even though the presentation was sensory rich and dynamic. A possible example of such a top-down effect could be the thought: “What a poor and outdated simulation. The graphics of the games I usually play are so much better.” Yet a participant having this thought could still feel present on the ride through the optical flow. Therefore, we think that both bottom-up and top-down processes account for the findings in this study.
In dynamic environments such as the virtual roller coaster, spatial presence is associated with increased activity in the parietal lobe regions, which primarily mediate spatial localization (Lee & Kim, 2008). In less dynamic environments, spatial cues such as shadowing or object motion were found to be more important than object cues such as textures or geometric detail (Lee & Kim, 2008). Accordingly, an EEG study found that in the context of a virtual roller coaster ride, high spatial presence activated the parietal lobe regions of the brain that mainly mediate spatial localization (Baumgartner, Valko, Esslen & Jäncke, 2006). Therefore, the virtual roller coaster seems to elicit spatial presence primarily through spatial cues such as optical flow. Presence is a complex construct depending on multiple factors. Among them are the medium (e.g. display, controls), the content (e.g. fictional vs. real), the user (e.g. motivation, previous experiences, vision) and the situation (e.g. noisy vs. quiet environment) (cf. Sacau et al., 2008). Most noteworthy, the eye movements could predict the amount of subjective presence to a relevant extent.
Much more research is required to fully understand the interplay of these factors. This research has some limitations. First, we used only one type of virtual environment. Replications with other kinds of environments are needed before our findings can be generalized. In particular, we think that further research is needed for non-dynamic environments. Second, in the first trial, the audio manipulation did not influence the subjective sensations of presence at all. One could argue that optical flow is the most important feature for evoking presence, whereas audio seems to be irrelevant. Third, we used ex post measures (i.e., measured after exposure). Although Wissmath, Weibel and Mast (2010) found ex post ratings to be highly valid and reliable indicators of presence, self-location is clearly a highly dynamic process. Therefore, combining continuous measures of presence and eye movement data could be the decisive step to further disentangle the interplay between eye movements and the sensation of presence. For this purpose, an effective and yet unobtrusive manipulation of presence would be most desirable.