A Subjective and Behavioral Assessment of Affordances in Virtual Architectural Walkthroughs

Immersive technologies, such as VR, offer first-person experiences using depth perception and spatial awareness that convey a sense of space impossible with traditional visualization techniques. This paper looks beyond the visual aspects and towards understanding the experiential aspects of two popular uses of VR in 3D architectural visualization: a “passive walkthrough” and an “interactive walkthrough”. We designed a within-subject experiment to measure the user-perceived quality for both experiences. All participants (N = 34) were exposed to both scenarios and afterwards responded to a post-experience questionnaire; meanwhile, their physical activity and simple active behaviors were also recorded. Results indicate that while the fully immersive-interactive experience rendered a heightened sense of presence in users, overt behaviors (movement and gesture) did not change for users. We discuss the potential use of subjective assessments and user behavior analysis to understand user-perceived experiential quality inside virtual environments, which should be useful in building taxonomies and designing affordances that best fit these environments.


Introduction
Architectural visualizations use media (images, diagrams and, more recently, 3D modeling techniques) to express, and externally reflect upon, design visions. Advances in computer-generated imagery have increased our appetite for life-like photorealistic visualizations of would-be environments and speculations on possible built futures. To this end, the emergence of immersive technologies, especially virtual reality (VR) applications, has presented a powerful first-person communication medium allowing users to step into, freely move about in and explore the environment. Instead of imagining a design, one can have a naturalistic experience akin to the real-world experience of a built environment. VR employs dimensions of immersion, interactivity and presence within computer-generated models to produce an explorable place illusion [1,2]. This feature makes it easier for users to understand spatial relationships, scale and depth. VR-driven architectural visualizations allow projects to be showcased in real-time, enabling immediate and critical feedback. With VR, ideas can be "considered, revised, developed, rejected and returned to" [3]. The synthetic environments of virtual architectural worlds invoke the sense of being inside them, that is, a sense of presence [4]. This subjective feeling [5,6] is pivotal for virtual experiences and emerges out of the interplay of immersion and interaction [7,8,9]. Research indicates that a place illusion or spatial presence [8,10,11,12] alone is not sufficient to sustain prolonged interest in virtual environments (VE). In fact, users also require motivation through involvement and engagement within these worlds for a heightened sense of self-presence [13,14]. This could be a consequence of focusing one's energy and attention on the stimuli available in the virtual world, e.g., interaction possibilities, with which an involved user experiences more presence [7,15].
Given this premise, we investigated two popular uses of VR in architectural visualization for their effects on users: a "passive walkthrough" and an "interactive walkthrough". We tested subjects within a virtual architectural interior (see Figure 1) with the objective of studying the effects of interactivity on the overall formation of a sense of presence, engagement, perceived naturalness and negative effects. This paper describes the experiment and discusses the results.

Plausibility in Virtual Environments
Immersive virtual environments (IVE) are 360-degree spatial experiences that are either superimposed on real space or occlude it altogether. In addition, the ubiquity of real-time rendering has made it possible to experience virtual architectural environments with correct scale and depth precision. IVEs now offer visualization solutions for the design industry, environment models for immersive games, training environments for virtual learning [16], visualization solutions for collaborative design [17] and methods for the gamification of building information modeling (BIM) to test various physical dynamics and performances [18]. All current VR applications facilitate 360° viewing. Some are passive experiences along predefined paths or points with little exploration and interaction. Others allow freedom of movement (exploration) but no interaction, whereas in their most interactive form, they allow for both exploration and interactions with virtual objects.
IVEs are effective spaces because of their similarity to our real-world navigation, mapping and manipulation techniques. As humans we respond naturally and effortlessly to perceived actions. We take this behavior with us into virtual spaces when dealing with the affordances they offer [19]. Like presence, the phenomenon of plausibility illusion (Psi) [20] is also important for research within virtual reality applications. Psi refers to the illusion that a virtual scenario experienced is actually occurring [21]. This refers to the coherence and consistency of behaviors and events that transpire within the context of a given virtual scenario [22]. Psi fits well with the conceptualization of quality as a cognitive judgment. This was investigated by Skarbez et al. [23] in an empirical study where participants transitioned from lower-coherence to higher-coherence scenarios. Participants could change the characteristics and behaviors of their virtual avatars with the goal of matching them to themselves in the real world. The level of plausibility was higher in the highest-coherence scenario; i.e., users connected with the most well-behaved avatar.
For the user, quality is a judgment that distinguishes between perceived quality and expected quality. Refs. [21,24] previously examined the effects of plausibility mismatches on the formation of an overall sense of presence. Both studies underlined the need for developing protocols to assess coherence factors and their consistencies. This study observed selected affordances within an immersive virtual environment (IVE) and examined their effects in terms of user behavior and perceived experiential quality.

Affordances and Perceived Quality in Virtual Environments
We do not objectively perceive environmental properties or objects; rather, we perceive what we can do [25]. The perception-in-action [26] process is facilitated by the opportunities presented to an organism by its environment, or the situated affordances of the environment. The concept has been around for a long time, and a detailed explication is not within the mandate of this paper. However, it is important to emphasize that affordances are neither objective nor subjective; instead, they "cut across the dichotomy of subjective-objective... Both physical and psychical, yet neither" [27]. Affordances must therefore be understood in two ways: (a) affordances are properties of the environment; or (b) affordances are relations between an organism and its environment. Building on Hassenzahl's hedonic and pragmatic model [28,29], we define four distinct affordance types that expand from more immediate operational goals to deeper biological or psychological needs. These are:
1. Manipulation Affordances: The directly perceived affordances that speak to the physical/sensorial compatibility between the user and the object.
2. Effect Affordances: They describe the functioning of the object due to manipulation. They are also directly perceived, based on the cause-and-effect knowledge of the user.
3. Use Affordances: They relate to the physical and mental skills of the user utilizing the right cognitive or usage plans.
4. Experience Affordances: They are related to the psychological and biological needs of the user and are perceived only with correct knowledge and usage modes.
Manipulation affordances are at the lowest level and are signified by motor goals performed in order to accomplish do-goals, i.e., the effect and use affordances. At the highest level are be-goals (or experience affordances) that motivate actions towards purposes [29]. Between them, they highlight the how, what and why of interaction possibilities. For example, a VE can afford manipulation to a user in the form of pressing a button (motor-goal). The effect affordance of the environment can associate the pressing of a button with the activation of an illumination object, e.g., a light on a wall (cause and effect). This combination and sequence of actions could be intended towards a use, such as illuminating the scene; the effect and use affordances serve do-goals. The failure or fulfillment of a do-goal results in emotional consequences, such as satisfaction or annoyance. Returning to Hassenzahl [30], the emotional (hedonic) aspects make up the experience affordance for a user, enabling the achievement of be-goals [31]. The pragmatic aspects of a user's experience come from the compatibility of the user's skills with the capabilities of manipulation, that is, the effect and use affordances of the environment [32].
A light turning on when a button is pressed meets a user's expectation of how real-life action sequences work. A light illuminating a scene realizes the use for that action. When these actions are matched to the abilities of the user, their successful performance achieves be-goals inside VEs that can cause pleasure. Conversely, failing to perform them can cause annoyance. The extent of the interactions available inside a VE and their resultant consequences afford experiences to users. Steffen et al. [33] examined how the availability of affordances in VR applications gives them an edge over physical reality in certain use-cases, e.g., simulation-based training. On the other hand, works [34] evaluating the perceptions of real-world affordances (such as texture, gradient, handle size and hand size) have found that they affect users' choices and emotional states within VEs. This paper builds upon the aforementioned works by focusing on the psychological aspects of affordances inside VEs. In particular, we focus on their influences on user-perceived quality and the sense of presence and plausibility. It is part of ongoing research into factors influencing user experience and performance inside IVEs [35].

Measuring User Behavior and Experience
User-perceived quality is the emotional response, involvement and degree of interest a user shows. Inside IVEs, a foremost user experience is that of a "sense of presence," characterized as the "human experience" of the environment [20,36]. Presence is classified into three categories [4,10]: spatial presence, self-presence and co-presence. Of these, spatial presence is a subjective feeling of "being there" inside a mediated space. For a user, this is characterized by a temporary loss of attention to the physical environment, and a behavioral response to the mediated environment. A user is said to be in a state of immersion when he responds to the physical and symbolic affordances (action possibilities) of the mediated environment [37]; in how he interacts with its "continuous stream of stimuli" [7], appropriates the tools/interface at hand and moderates his actions. Understanding these points helps optimize the overall user-perceived quality of immersive applications. Kahneman [38] proposed two systems of thought: System 1 (fast, instinctive, emotional); and System 2 (slow, deliberate, logical). Over the years, numerous surveys and questionnaires [7,39,40,41] have been used as subjective assessment measures of VEs to capture self-reported System 2 reflective processes: things that do not usually come naturally and require some sort of conscious mental exertion on the part of the user, such as skills and mental or emotional states. Reflexive System 1 skills are more intuitive and automatic, such as the innate abilities to perceive the world and recognize objects. They are better captured using physiological measures to assess covert and fast behaviors. Considering that most IVEs make use of (or imitate) real-world navigation and manipulation techniques, we have employed observation methodologies in this work. We believe behavioral observation can be useful for assessing data (overt-motor responses and movement patterns) of subjects collected while they explore IVEs.

Materials and Methods
We conducted a repeated measures user study in two visually identical virtual models, manipulating only the affordances of the environment. We had two independent scenarios:
• Passive-walkthrough (PW), an immersive environment with navigation affordances but no interactive features;
• Interactive-walkthrough (IW), an immersive-interactive environment with navigation affordances and a few manipulation and effect affordances.
The aim was to observe and investigate whether the addition of affordances (and interactivity features) within IVEs affected the overt behavior of users; and further, whether their behavioral performances correlated with their subjective evaluations of the IVEs. A within-subject method and visually identical environments were used to reduce errors associated with individual differences. For subjective assessments, the study collected profile surveys, and participants completed a presence questionnaire for post-experience self-reporting. The behavioral assessment was based on an active-time diary and ethograms (inventories of behaviors or actions) generated from the video data for each participant in the study. The behavioral patterns were analyzed against the subjective experiential scores for each participant to verify whether manipulation and effect affordances in the immersive-interactive (IW) scenario would result in higher perceived experiential quality and higher behavioral activity compared to the non-interactive scenario (PW).

Environment
The virtual environment was designed in-house and modeled in SketchUp Pro. Texturing, lighting and interactivity elements were applied in Unreal Engine 4 (UE4). The VE in both scenarios represents an architectural interior of a one-bedroom (32 feet on either side) apartment. The open-plan layout has the kitchen extending into the living room and a balcony. The same balcony can be accessed from the bedroom. There is a separate bathroom and storage space to explore as well. Both scenarios, PW and IW, support natural/free walking. They also support navigation affordances using the point-and-teleport technique for movement and navigation in the IVE. In addition, IW also used manipulation affordances and effect affordances. Figure 2 shows an image of the environment.

1. Passive Walkthrough (PW): This model uses high-poly assets from the UE Marketplace, and high-resolution images from an online repository. The model was prepared using Datasmith in Unreal Engine 4. In order to simulate real-world materials, we used PBR-texturing (physically based rendering), realistic lighting and a spatial soundscape to enhance the immersive experience. The environment was optimized for use with the HTC Vive Pro. Both handheld controllers can be used to exploit the navigation affordance of point-and-teleport. Additionally, hidden affordances in the form of collider components were also applied to surfaces in the model. They were activated to discourage teleportation or natural/free walking through surfaces (such as walls) to avoid unrealistic perforation effects of virtual surfaces. All doors in this environment were open by default to allow users free movement through the interior space.

2. Interactive Walkthrough (IW): This model has all the features of the PW. In addition, the IW scenario uses manipulation and effect affordances. While one handheld controller was used for point-and-teleport, the second was used for the interactivity features, which include:
• Two light toggles around the average eye-level: A familiar design feature, a button (with a light-bulb icon) provided the required cognitive affordance and the opportunity for manipulation using the handheld controller. A laser pointer (similar to real-life pointers) could be directed at the button to toggle the light on and off. The explicit manipulation affordance was immediately satisfied with the effect affordance, as the user should notice their actions resulting in additional scene lighting.
• Six operable doors around the average waist height: Interaction with doors was communicated via metaphorical affordance, i.e., the imitation of real-life door handles. The familiar and explicit affordance of hold-and-twist was, however, not present; instead, their manipulation was possible through a hidden affordance activated when a user clicked the handheld controller close to the handle. The effect and pattern affordances were revealed to the user through successive movements resulting in learning how to open a virtual door.
• Six cabinets and drawers at various heights: Different height levels were used to assess the naturalness of the user's behavioral response. The cabinets used the same manipulation and effect affordances as the doors.
All doors in this scenario were closed by default so that users had to open them using the handheld controllers in order to access different spaces.

Participants
The study recruited 34 participants (18 male, 16 female, µ = 26.7 ± 6.7) over a period of two weeks via mailing lists, flyers and online forms. Participants tried both scenarios in a randomized order. Before this, none of the participants had tested VR in a laboratory scenario. Ten participants reported no competence in VR, whereas 15 participants had basic competence and 9 reported intermediate competence in using VR applications. A total of 68 experiences (N = 34 × 2 scenarios/subject) were recorded. Of the 34 participants, the entire sessions of two (subjects S4 and S10) were excluded on account of incomplete video data. Participants each signed a written consent form and were duly compensated for their participation. All participants were active users of multimedia technologies, and most had prior experience with head-mounted displays. The experiment was pre-approved, and data collection was in line with ethical principles for medical research involving human subjects. Figure 3 shows a participant engaged with the environment.

Laboratory and Equipment
The experiment was conducted in our VR laboratory, which is approximately 16 × 19 feet in size. The laboratory is equipped for subjective and physiological assessments. The VR simulation was run on a desktop PC running 64-bit Windows 10 Pro with an Intel Core i7 7700 3.6 GHz processor, 32 GB DDR4 SDRAM (2800 MHz) and a single 3 GB NVIDIA GeForce GTX 1060 graphics card. The participants explored the virtual environment using the HTC Vive Pro HMD supporting 6DOF and motion tracking. It has a resolution of 1440 × 1600 pixels per eye at a 90 Hz refresh rate. The headset features a 110-degree FoV and supports 3D spatial audio. The play area was fixed at 10 × 14 feet inside the laboratory. The experience was externally displayed on a 65-inch Samsung Full-HD TV to examine the activity of the participants and look out for unwanted artifacts and/or graphic or interactivity malfunctions.

Procedure
Experimental sessions were pre-scheduled using Google Forms. They were limited to a single participant at a time. Each slot was allocated 60 min that included testing both scenarios and filling out the respective questionnaires. Participants were received by the moderator. They then filled out a 10-item background information survey. Next, participants tried on the headset (HMD) to familiarize themselves with the HTC Vive Pro controllers following a quick tutorial inside the SteamVR Home space. Participants were then provided a set of instructions explaining the experimental procedure. All participants confirmed their willingness by signing a consent form. The experiment was divided into two parts, i.e., PW and IW. The task order was deliberately randomized for each participant to prevent carryover effects. Subjects spent time in each scenario per their liking. Each experience was followed by the ITC-SOPI questionnaire for experiential evaluation. After testing both scenarios, participants were thanked and compensated for their time.
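The per-participant randomization of task order described above can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions: the text does not say how the randomization was implemented, and the function name `assign_orders`, the participant labels and the seed are all hypothetical.

```python
import random

SCENARIOS = ("PW", "IW")  # the two scenarios named in the text


def assign_orders(participant_ids, seed=None):
    """Return {participant_id: (first_scenario, second_scenario)}.

    Each participant independently receives a random order of the two
    scenarios. Assumption: simple random shuffling; a counterbalanced
    design would instead force an equal number of each order.
    """
    rng = random.Random(seed)
    orders = {}
    for pid in participant_ids:
        order = list(SCENARIOS)
        rng.shuffle(order)
        orders[pid] = tuple(order)
    return orders


# Hypothetical labels for 34 participants.
participants = [f"S{i:02d}" for i in range(1, 35)]
orders = assign_orders(participants, seed=0)
```

With simple shuffling the two orders are not guaranteed to be equally frequent, which is why a fixed seed or an explicit counterbalancing scheme may be preferable in practice.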

Subjective Measure
The experiment used the Independent Television Commission Sense of Presence Inventory (ITC-SOPI) as the prime instrument. It is a validated cross-media questionnaire for users to report their experiences of a "displayed environment" [40]. The protocol collected background information, such as demographics, digital proficiency and VR competency, at the beginning. Afterwards, participants filled out a post-experience questionnaire following each scenario. The responses were recorded on a 1-5 Likert scale for the four aspects of the ITC-SOPI. Participants had the additional option to put down their comments at the end of all ratings. The ITC-SOPI included:
• Spatial presence (SP): a sense of being there and/or encapsulated by a space.
• Engagement (EN): feeling psychologically involved in, feeling moved by and/or enjoying the content.
• Ecological validity, or naturalness (NV): perceiving the mediated environment as lifelike and/or natural.
• Negative effects (NE): an adverse psychological reaction towards the mediated environment.

Behavioral Observations
Active run-time logs were created by the application for each use. Click activities were also logged within the application. Additionally, over 10 h of video data of participant activity was recorded. Video-based observation is unobtrusive, which let subjects express themselves, feel at ease and behave more naturally. It also enables frame-accurate annotation of behavior. The data were post-processed for analysis and observation coding inside the open-source event logging software BORIS (Behavioral Observation Research Interactive Software) [42]. All behaviors were coded based on manual video analysis by a single person to ensure consistency. Codings were done in an ethogram (details follow in the next subsection). Observations were coded for each subject in each scenario. Two main types were determined:

1. State Events: durational events that have beginnings and ends.
Since active run-times for participants varied considerably, a uniform 3-min observation time was used. A 3-min interval/slice was randomly selected from the active run-time sequence of each user.
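The random selection of a uniform 3-min slice can be sketched as follows. This is an illustration only: the sampling distribution and the handling of sessions shorter than 3 min are assumptions, and `pick_slice` is a hypothetical helper, not part of BORIS or of the study's tooling.

```python
import random

SLICE_SECONDS = 180  # the uniform 3-min observation window


def pick_slice(run_time_s, slice_s=SLICE_SECONDS, rng=None):
    """Return (start, end) in seconds of a randomly placed observation slice.

    Assumption: the start is drawn uniformly so that the slice fits
    entirely inside the active run-time. Sessions shorter than the
    slice are returned whole (also an assumption).
    """
    rng = rng or random.Random()
    if run_time_s <= slice_s:
        return (0.0, float(run_time_s))
    start = rng.uniform(0, run_time_s - slice_s)
    return (start, start + slice_s)


# Example: a hypothetical 10-minute session.
start, end = pick_slice(600, rng=random.Random(1))
```

Passing a seeded `random.Random` instance makes the chosen slice reproducible, which helps when the same video has to be re-coded.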

Results
The experiment was designed with a single categorical factor at two levels: PW and IW. Four dimensions from the ITC-SOPI, namely SP, EN, NV and NE, make up the measured quantitative variables for our study. The manually coded participant behavior types from video analysis were also variables. Thirty-four participants evaluated the two virtual scenarios; the sessions of two participants were excluded due to incomplete data. Sixty-four data entries (2 per subject × 32 subjects) were received and analyzed for the four dimensions of the ITC-SOPI and overt user behavior.

From ITC-SOPI
The results of the questionnaire data were compiled into a mean opinion score (MOS), the average judgment for one scenario over all subjects. A multivariate analysis of covariance (MANCOVA) was run after controlling for the covariate of active run-time. This was done considering the possible effects of active run-time (duration of time spent within the VE) on the four dimensions of the ITC-SOPI. Statistical significance was assumed at p < 0.05. There was a statistically significant difference between the two categorical scenarios (PW and IW) on the combined dependent variables of spatial presence (SP), engagement (EN), naturalness (NV) and negative effects (NE) after controlling for active run-time: F(4, 59) = 4.662, p = 0.02, Wilks' λ = 0.76, partial η² = 0.24. We ran separate ANOVAs for each dependent variable: SP, EN, NV and NE. We found significant differences between PW and IW for SP and EN, but no statistically significant difference was found for NV or NE. Table 1 shows the comparison of the means of the three items under both scenarios. The mean levels in the table indicate higher values for the IW scenario in at least two categories, SP and EN. These differences can be visualized in the min-max plots available for the four variables in Figure 4.
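The MOS defined above is simply the per-scenario mean of each ITC-SOPI dimension over all subjects. The Python sketch below shows the computation on made-up ratings; the data layout, the function name `mean_opinion_scores` and the sample values are hypothetical and are not data from this study.

```python
from statistics import mean

DIMENSIONS = ("SP", "EN", "NV", "NE")  # the four ITC-SOPI dimensions


def mean_opinion_scores(responses):
    """Average each dimension over all subjects, separately per scenario.

    `responses` is a list of dicts with a 'scenario' key and one
    numeric score per dimension (hypothetical layout).
    """
    grouped = {}
    for row in responses:
        grouped.setdefault(row["scenario"], []).append(row)
    return {
        scenario: {dim: mean(r[dim] for r in rows) for dim in DIMENSIONS}
        for scenario, rows in grouped.items()
    }


# Made-up ratings for two hypothetical participants (not study data).
sample = [
    {"subject": "A", "scenario": "PW", "SP": 3.0, "EN": 3.5, "NV": 4.0, "NE": 1.5},
    {"subject": "B", "scenario": "PW", "SP": 4.0, "EN": 2.5, "NV": 3.0, "NE": 2.5},
    {"subject": "A", "scenario": "IW", "SP": 4.0, "EN": 4.5, "NV": 4.0, "NE": 1.0},
    {"subject": "B", "scenario": "IW", "SP": 4.5, "EN": 3.5, "NV": 3.0, "NE": 2.0},
]
mos = mean_opinion_scores(sample)
```

The MOS only summarizes the ratings; the significance tests reported above (MANCOVA and follow-up ANOVAs) require a statistics package and are not reproduced here.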

From the Time-Log and Observations
Our understanding of behavior begins with the collective means for run-time and activities of all participants (shown in Table 2). The collective run-time for PW was 284 min 25 s (17,065 s). It was 337 min 38 s (20,258 s) for IW. Differences were observed in the run-time, durational and non-durational activities between the two scenarios, PW and IW.
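As a quick arithmetic check of the collective run-times quoted above, the minute-and-second totals can be converted to seconds and compared. The helper name `to_seconds` is hypothetical; the input figures are those given in the text.

```python
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) duration to total seconds."""
    return minutes * 60 + seconds


pw_total = to_seconds(284, 25)  # collective PW run-time
iw_total = to_seconds(337, 38)  # collective IW run-time

# Absolute and relative difference between the two scenarios.
difference_s = iw_total - pw_total
ratio = iw_total / pw_total
```

Both conversions reproduce the second counts given in the text (17,065 s and 20,258 s), and the ratio shows that the collective IW run-time was roughly 19% longer than that of PW.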
The ANOVA results indicate no statistically significant differences for any behavior except click events, which differed notably between scenarios. This was expected, as subjects had more manipulation opportunities in IW compared to PW. Figure 5 illustrates the min-max plots for participant behavior.

Discussion
The above results indicate that while user-perceived experiential quality improved from PW to IW, user behavior showed no significant difference across the sample. Click (or point) events were an exception, showing an increase in IW owing to more action possibilities.
We found that the addition of even a few manipulation and effect affordances markedly increased the place illusion inside the IVE. User-perceived spatial presence increased from PW to IW (p-value = 0.001). Although PW provided multi-directional viewing, it remained passive, whereas the interactions in IW made the environment seem more active. Subjects did not feel surrounded by a lifeless world, but by one which responded to their actions. This was also expressed in writing by some subjects. The same possibility for action positively affected the level of engagement or involvement (p-value = 0.017), which directly correlates with the significance a user attaches to the stimuli or activity of the virtual environment [13,18]. This further alludes to the importance of affordances in creating opportunities for action inside IVEs [22]. Both scenarios had quite high mean scores. However, manipulation and effect affordances did not convince the subjects more of the naturalness, life-likeness or persuasiveness of the virtual environment. Subjects found no difference (p-value = 0.279). The same was true for negative effects, as the presence of an adverse psychological reaction did not vary between scenarios (p-value = 0.383).
In this section, we expand the collective results with a by-subject comparison for further understanding. We conducted observational assessments of three individual subjects and used event plots for their activity. The three subjects were selected based on the similarity of their logged run-times in both scenarios (in Table 3). The run-time variance between scenarios for other subjects was far greater.

Subject S15
The subject logged the lowest run-time in both scenarios. This indicates that the change in scenario did not affect the time-use tendency of the user. Despite this, the subject reported a higher SP score in IW compared to PW (µ = 3.5 > µ = 2.72). The same is also true for a higher EN score in IW (µ = 3.31 > µ = 2.62), and a marginally higher score for NV in IW too (µ = 4.2 > µ = 3.8). Alongside the increases in SP, EN and NV, there is a visible difference in the still-to-stride ratio from PW (52:48) to IW (76:22). In IW, the subject remained stationary for longer to interact with objects. This is visible from the click events, which roughly doubled from 20 to 43. Figure 6 compares the event plots.
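A still-to-stride ratio of this kind can be derived from coded state events by summing the durations of each state. The sketch below is illustrative only: the event tuple layout, the behavior labels "still" and "stride" and the normalization over the coded time of these two states are assumptions, not the coding scheme used in the study.

```python
def still_to_stride(events):
    """Return (still_pct, stride_pct) as whole-number percentages.

    `events` is a list of (behavior, start_s, stop_s) tuples, a
    hypothetical export of durational state events. Percentages are
    normalized over the time coded as either 'still' or 'stride', so
    in this sketch they always sum to 100.
    """
    totals = {"still": 0.0, "stride": 0.0}
    for behavior, start, stop in events:
        if behavior in totals:
            totals[behavior] += stop - start
    coded = totals["still"] + totals["stride"]
    if coded == 0:
        raise ValueError("no still or stride events were coded")
    still_pct = round(100 * totals["still"] / coded)
    return still_pct, 100 - still_pct


# Made-up 3-min slice: 135 s standing still, 45 s walking.
example = [("still", 0, 90), ("stride", 90, 135), ("still", 135, 180)]
ratio = still_to_stride(example)
```

Events with other labels (for example click events) are ignored here, since point events have no duration.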

Subject S27
The subject produced a similar median run-time log for both scenarios. Once again, the change in scenario did not affect the time-use tendency of the user. Compared to PW, the subject reported a minimal score increase for all three dimensions in IW: SP (µ = 3.7 > µ = 3.1), EN (µ = 4.15 > µ = 3.8) and NV (µ = 4.4 > µ = 4.0). The overt behavior of the subject also barely shifted from one scenario to the other. The plot in Figure 7 shows the similarity of the events, and Table 3 confirms that there were next to no behavioral changes by this subject.

Subject S05
Relative to the whole sample, the subject recorded long run-times in both scenarios. As evident from the event plot in Figure 8, the subject barely moved from one position in both scenarios. However, the subject doubled their click events from 25 to 51 in the IW scenario. This barely affected the SP score, which shows only a hairline increase in IW (µ = 3.5 > µ = 3.4). Interestingly enough, the score for EN was higher in scenario PW for this subject (µ = 4.03 > µ = 3.85). The same was true for NV, with a higher score in PW (µ = 3.6 > µ = 3.4). The overt behavior remained similar in both scenarios for this subject as well.

Conclusions
The aim of this study was to conduct a comparative assessment of two VR experiences and cross-examine their user-perceived experiential quality against how users behaved in them. We analyzed the effects of manipulation and effect affordances inside a virtual architectural interior on the overall sense of presence in users and how they modified their behavior with respect to these affordances.
On the one hand, our study confirms that IVEs are more than just passive geometries and that users feel cognitively and emotionally more involved in virtual environments with action possibilities. The results validated that affordances do positively affect presence and user-perceived quality. However, results from the observation analysis did not support our hypothesis that subjects in the IW scenario would demonstrate higher overt-motor responses to manipulation and effect affordances. Subjects' overt behavior remained predominantly unchanged between the scenarios. The duration of durational events and the frequency of momentary events did not show any significant changes. It is perhaps this lack of overt activity that caused users not to notice any difference in negative effects (arising from exaggerated head-movements, etc.) between the scenarios either. We could observe that:
1. The representationalism (metaphorical affordance) of virtual environments in its imitation of real-life objects creates expectations that cannot be physically met, e.g., the door handle.
2. Affordance mismatches resulted in the users appropriating the ready-at-hand tool (i.e., the handheld controller) in a manner most familiar to them.
3. While VR creates an illusion of real-life behavior with objects, users did not use spatial literacy; instead, they felt more comfortable relying on familiar digital literacies (like the pointing and clicking of a mouse).
4. Metaphorical affordances can be useful when the emphasis is on physical exploration, and one-to-one imitation of a function may not be preferred, e.g., when designing immersive-interactive architectural or exhibition tours.
5. Explicit affordances will help when a realistic one-to-one imitation of a function is required in VEs, e.g., for design prototyping support or test fixture solutions.
It was also observed that most subjects preferred the point-and-teleport technique to natural walking. They avoided extending out in space, even when the situation within the IVE required it. This motivates investigating whether users' background knowledge of multimedia technologies influences their locomotion preferences. Future studies could include users with lower technological proficiencies to test this. There is definitely a need to further understand the taxonomy of affordances with respect to virtual environments. If most VR experiences are to remain similar to real life, then the designs of objects and their affordances have to be adjusted to the "human experience" of immersive media. Our future work will focus more on affordance mismatches and their effects on coherence and overall plausibility within VEs. We will further work on subjective and computer-based observation methods to understand mental, behavioral and emotional affordances and their effects on experiential quality inside IVEs.