This study investigated how players interact with shared remote locations in a remote AR hide-and-seek game compared to a co-located RW game. We designed the remote AR hide-and-seek game discussed above to answer our RQ: “How does the interaction between players and shared remote locations in a remote AR hide-and-seek game compare to interactions in a co-located RW hide-and-seek game in terms of players’ spatial decision-making, game engagement and player experience?”. To test H1, the study compared how players made spatial decisions; to test H2, it evaluated game engagement and overall experience across the two conditions.
6.1. Player Strategies and Spatial Decision-Making
Following Montello’s view of spatial decision-making as a process of perceiving, interpreting, and acting on spatial information [18], our findings support H1, demonstrating a clear shift in how players engaged with space in the AR condition. Longer hiding times (Section 5), increased upward gaze fixations (Figure 11), and participant reflections (e.g., “It took a while to get used to the AR environment but once did, it was fun to find creative hiding spots.”) show that players adopted distinct hiding strategies. These strategies were enabled by the AR game mechanics introduced in this game design. For example, players could position virtual objects without physical constraints like gravity or reachability, which allowed placements in “mid-air” or on “ceilings”.
These novel spatial possibilities introduced greater complexity because they expanded the potential hiding spots beyond traditional physical boundaries. In the RW condition, hiding spots were limited to locations governed by familiar rules of physics, such as gravity or the availability of flat surfaces. The AR condition, however, enabled placements in unconventional areas, such as mid-air or overhead spots. Players in AR therefore had to use different spatial reasoning: they needed to mentally visualize and assess unfamiliar hiding spots, unlike in the RW condition, where spots were predictable and physically constrained. Players demonstrated this shift through the longer hiding times and upward gaze patterns reported in the results, suggesting that they actively explored parts of the space that would usually be ignored in RW gameplay. Participants also described choosing placements that felt “creative” or “playful”, such as hiding objects on ceilings or floating in the air. These behaviors indicate that the spatial reasoning involved in AR was not only different but often more exploratory, driven by the novelty of the AR game design. However, these creative opportunities simultaneously introduced usability challenges, including difficulty in precisely controlling object placement through the virtual thumbsticks and maintaining spatial orientation. This could discourage sustained player engagement, and further studies are needed to investigate this aspect.
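To make the mechanic concrete, the following is a minimal sketch in Swift with RealityKit of how a virtual thumbstick can drive unconstrained, gravity-free placement; the function name, control mapping, and speed constant are illustrative assumptions, not our prototype’s actual implementation.

```swift
import RealityKit

// Minimal sketch: a per-frame update in which a virtual thumbstick moves a
// virtual object with no physics, surface snapping, or reach limits, which
// is what makes "mid-air" and "ceiling" placements possible.
// `stick` is the 2D thumbstick input; `lift` is a separate vertical control.
// All names and the speed constant are illustrative assumptions.
func updatePlacement(of entity: Entity, stick: SIMD2<Float>, lift: Float, dt: Float) {
    let speed: Float = 0.5 // metres per second; assumed tuning value
    // Translate on the horizontal plane from the stick and vertically from
    // the lift control; no constraint checks are applied.
    entity.position += SIMD3<Float>(stick.x, lift, stick.y) * speed * dt
}
```

Because nothing in this update clamps the object to surfaces or to reachable space, precise placement depends entirely on fine-grained stick input, which is consistent with the control difficulties participants reported.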
6.1.1. Role-Based Analysis of the AR Hide-and-Seek Gameplay
The role-based analysis provides further insights into how Hiders and Seekers adapted their strategies and interactions across AR and RW conditions, focusing on hiding strategies, spatial exploration, and time-based metrics.
In the AR condition, Hiders were able to hide objects in places that would not be possible in the real world, such as mid-air or on ceilings, using AR game mechanics such as object placement and thumbstick-based manipulation of virtual objects. This expanded the range of hiding spots and fostered innovative spatial decision-making, as discussed above. Compared to RW, where physical constraints limited strategies, AR enabled Hiders to exploit vertical spaces and unconventional placements. Quantitatively, the median hiding time in AR was significantly longer than in RW. This difference may reflect the additional time Hiders needed to explore and make decisions within the more flexible and unfamiliar spatial environment provided by AR, and it may further suggest that participants spent more time planning placements due to the increased flexibility of the AR environment.
It might also reflect the learning curve participants described in the interviews, where eight participants noted that “it took a while to get used to” the unconventional hiding spots made possible by AR. While these findings suggest a potentially more complex spatial decision-making process, further research is needed to clarify how much of the additional hiding time stems from genuine spatial complexity versus simple adaptation to unfamiliar AR game mechanics.
Eye-tracking data support the notion that AR gameplay changes conventional search behaviors. In RW, scan paths were dispersed primarily across ground-level and horizontal surfaces, suggesting Seekers instinctively searched areas consistent with traditional hiding spots. The AR condition, however, yielded denser fixations in higher or mid-air regions (Figure 11), highlighting how participants actively explored vertical dimensions and structural elements, such as the tops of door frames or horizontal concrete beams, rarely noted in RW play. This vertical shift may indicate that AR encouraged Seekers to devote more visual attention to less conventional areas, such as ceilings or mid-air, in response to how Hiders used the expanded spatial possibilities. Interestingly, although the quantitative measure of “time to find” remained comparable across conditions, the AR-based search showed a higher vertical spread in gaze patterns and a more directed focus on potential mid-air targets. These findings highlight that even though players spent a similar amount of time searching in both conditions, AR required Seekers to scan different height ranges and explore less conventional hiding spots. This shift in behavior shows how AR changes the way players make spatial decisions during gameplay.
6.1.2. Object Usage and Visual Saliency
An analysis of which objects players preferred to hide revealed clear differences between the AR and RW conditions. As shown in Table 2, the Minion, Batman, and Fries were much more popular in the AR condition, selected 15, 10, and 10 times, respectively. Although other objects were also visually distinctive (see Figure 1), participants described the Minion as easier to see and control when placing it in AR. In the RW condition, the Koala, a toy with less color and visual contrast, was used the most, while the Minion was the least selected object, used only once. These results suggest that color, brightness, and clear edges strongly influenced object choice in AR.
This behavior may seem counterintuitive for a game like hide-and-seek, which requires Hiders to focus on concealing the object. However, according to participant feedback, Hiders often prioritized ease of placement and visual feedback over concealment effectiveness.
“The yellow Minion stood out more in AR, so it was easier to place where I wanted.”—P35
The preference for visually distinctive objects like the Minion in our AR hide-and-seek game aligns with findings in AR geovisualization research that visual saliency supports user attention and spatial interaction. To better understand how different visual properties affect user guidance, Zhang et al. [45] evaluated seven visual variables, five static and two dynamic, in an outdoor AR geovisualization task performed with Microsoft HoloLens 2 devices (Microsoft Way, Redmond, WA, USA) equipped with eye-tracking capabilities. The static variables were natural material color, illuminating material color, shape, angular size, and linear size, while the dynamic variables were vibration (Note: “vibration” here refers to a visually animated oscillation, not a haptic cue [45].) and flicker. Participants were asked to identify target map symbols overlaid in the AR environment as quickly and accurately as possible, allowing the researchers to assess which visual properties most effectively guided attention and supported spatial decision-making. Results indicated that the dynamic variables (vibration and flicker) provided the strongest visual guidance, while among the static variables, shape and illuminating material color were the most effective in capturing user attention. The color saturation and contour clarity of the Minion object likely triggered a similar visual response; these features may have offered perceptual clarity in a cluttered, LiDAR scan-based AR space.
While Zhang et al. [45] focused on passive observation of map symbols, our study shows that saliency also supports active object placement. Hiders not only noticed salient objects but also had to rotate, scale, and align them appropriately in the remote 3D environment. Visual saliency likely reduced this effort by improving object visibility and spatial feedback. Our quantitative results show that Hiders took significantly longer to place objects in AR than in RW (M = 72.1 s vs. 52.4 s). This increase reflects the added complexity of aligning objects in a 3D AR space without tactile feedback; however, selecting visually salient objects likely helped Hiders manage this complexity.
Sutton et al. [46] also support this finding. They investigated subtle saliency modulation (Note: the authors define saliency modulation as making a target object or region stand out more naturally, for example by boosting its contrast or sharpness, so that people notice it faster) in optical see-through AR using a Microsoft HoloLens 1. Their within-subject experiment compared unmodulated images, saliency-modulated images, and images marked with explicit overlays (e.g., circles). Results showed that saliency modulation significantly increased the number of participants fixating on target areas and reduced time to first fixation compared to unmodulated scenes. Although their study involved passive viewing of static images, our interactive gameplay study reveals similar effects in a dynamic, task-driven context. Together, these findings reinforce the value of perceptually grounded, non-intrusive AR cues in supporting spatial decision-making and engagement.
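For illustration, the following Swift sketch with Core Image shows one plausible form of region-based contrast boosting in the spirit of Sutton et al. [46]; the function name, region, and gain value are assumptions for this example, not their implementation.

```swift
import CoreImage

// Minimal sketch of saliency modulation via local contrast boosting:
// the target region is re-rendered with higher contrast and composited
// back over the scene, so it stands out without an explicit overlay.
// `region` and `gain` are assumed, illustrative parameters.
func boostContrast(of image: CIImage, in region: CGRect, gain: Float = 1.3) -> CIImage {
    let boosted = image
        .cropped(to: region)
        .applyingFilter("CIColorControls", parameters: [kCIInputContrastKey: gain])
    // Composite the modulated patch over the unmodified scene image.
    return boosted.composited(over: image)
}
```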
In contrast, the RW condition favored what could be considered more practical hiding strategies: placing objects in spots that were physically reachable and easier to conceal within the real environment, for example behind furniture, under tables, or within shelves (see Table 3). Objects like the Koala and Rocket, which were smaller and less visually prominent, were selected more frequently in RW (16 and 11 times, respectively). No participant mentioned visual feedback as a constraint in RW. Notably, 15 participants in the RW condition employed basic camouflage strategies by aligning the object’s color with the surrounding environment, even when the object remained visible. Figure 13 shows an example of this behavior, where a Hider placed the red Fries object beside a similarly colored toy, attempting to blend it in through color matching rather than concealment through occlusion.
While the majority of AR participants (40 participants) favored visually salient objects for ease of placement and feedback (see Figure 16), six participants experimented with placing virtual objects behind physical ones to block them from the Seeker’s view. This behavior is illustrated in Figure 14 and Figure 15, where the virtual Fries object was positioned behind a larger physical toy. This demonstrates that while saliency guided most object choices, some participants incorporated spatial depth and alignment to introduce concealment tactics more commonly seen in RW gameplay.
Together, these findings show that visual saliency plays a supportive role in AR gameplay: it reduces the perceptual effort of spatial tasks, especially in interfaces like handheld AR. Even in a game that rewards hiding, Hiders still favored visibility and control. Future LBARGs can build on this by designing virtual assets with strong visual anchors to support player actions in AR spaces.
6.1.3. Gaze Behavior and Search Patterns
Players demonstrated scattered fixations and upward scan paths, particularly due to spatial possibilities such as mid-air and ceiling placements in AR. A study comparing visual search across AR, Augmented Virtuality (AV), and VR [47] found that AR imposes a higher cognitive load than either AV or VR, reflected in more dispersed eye-movement patterns. In their study, AR users swept their gaze across many distinct regions rather than homing in on a few likely spots.
In the AR condition of our study, players held the iPad throughout gameplay. This may have forced them to split their attention between moving in the real world and interacting with the AR interface on the screen; such split attention and added visual complexity can raise cognitive load [47]. Furthermore, this dual-task situation might have constrained their natural spatial exploration, leading to more deliberate and limited movement patterns compared to the RW condition, where participants could move freely without having to manage a handheld device. While handheld AR reflects a common and accessible form of AR interaction, it represents only one modality of LBARG interfaces. Other platforms, such as HMDs (e.g., Microsoft HoloLens), offer hands-free interaction, but these systems introduce their own challenges, such as a narrow FOV, eye strain, or fatigue from prolonged use. Future work could compare how different devices affect gameplay strategies and spatial presence in remote multiplayer AR experiences.
In a traditional hide-and-seek scenario, objects cannot be suspended in mid-air; they must be placed on surfaces or within arm’s reach. Search strategies therefore rely on common real-world assumptions about where objects are usually placed, for example on the floor, under tables, or on other horizontal surfaces within reach. Players often rely on these assumptions when searching for hidden objects in familiar environments. In comparison, AR’s capacity to enable mid-air placements triggered a distinct and more vertically oriented search behavior, underscoring how the possibilities in AR challenge and expand traditional spatial decision-making compared to the RW.
This shift in search patterns is further supported by our analysis of spatial gaze entropy reported in Section 5.5. By calculating the distribution of fixation durations across defined AOIs, Seekers in the AR condition were found to exhibit higher gaze entropy than those in the RW condition. This measure reflects a more distributed allocation of visual attention across the scene, confirming that AR gameplay encouraged broader and more exploratory scanning behavior. In contrast, RW Seekers demonstrated lower entropy values, indicating a more focused and structured gaze strategy, consistent with traditional search strategies guided by physical constraints and prior experience.
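For reference, gaze entropy of this kind is typically computed as the Shannon entropy of the dwell-time distribution over AOIs; the following is the standard formulation, and the exact normalization used in Section 5.5 may differ:

H = -\sum_{i=1}^{n} p_i \log_2 p_i, \qquad p_i = \frac{t_i}{\sum_{j=1}^{n} t_j},

where t_i is the total fixation duration in AOI i and n is the number of AOIs. H is maximal when attention is spread evenly across all AOIs (broad, exploratory scanning) and minimal when fixations concentrate on a single region (focused search).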
Seeliger et al. [48] analyzed eye-gaze behavior in an “AR picking-and-assembly task” delivered through an HMD, where visual cues directed users to pick bins both inside and outside their FOV. Although their scenario is industrial component picking, their empirical findings align with ours: the AR guidance led users to “sweep their gaze across many distinct regions rather than fixating on a few likely spots”. Such widened and dispersed fixations align with the visual patterns observed in our AR condition. These findings indicate that AR interactions promote more exploratory and distributed visual attention compared to RW gameplay. However, future designs must carefully balance such exploratory game mechanics with simple user interfaces and interactions to mitigate user frustration and cognitive load.
6.2. Game Engagement and Player Experience
According to the reported GEQ results (see Section 5.2), realism was rated higher in the RW condition, whereas three absorption-level items, “I feel different”, “I lose track of where I am”, and “I feel spaced out”, scored significantly higher in the AR condition. While at first glance these results could be interpreted as disorientation in the AR gameplay experience, this response can also signal a higher level of cognitive immersion.
Brockmyer et al. [33] describe engagement as a gradual process that moves from immersion to presence, flow, and ultimately absorption, the deepest level. Items such as “I feel different”, “I lose track of where I am”, and “I feel spaced out” reflect this absorption stage and are associated with more intense, altered cognitive states during gameplay. Losing track of place, therefore, signals more than simple confusion. Two additional findings from our study support the absorption reading. First, reward scores on the UES-SF were significantly higher in the AR condition than in the RW condition, with an effect size of r = 0.46. Similarly, the hedonic quality from the UEQ-S was markedly higher in AR compared to RW. These increases occurred despite a lower perceived usability score in the AR condition versus RW, suggesting that players found the AR gameplay stimulating and rewarding rather than overwhelming. Second, Seekers’ performance did not deteriorate in AR gameplay, as the median time-to-find was comparable across both modes. Furthermore, only 8 of 60 players reported disorientation in the post-play interview. These results suggest that players remained highly engaged in AR without experiencing a loss of control or confusion.
Gaze data add further context. In AR, Seekers’ scan paths spread upward to mid-air and ceiling regions, echoing the finding by Chiossi et al. [47] that AR search induces broader visual sweeps under higher cognitive load. This visual behavior aligns with the drop in verbal cue use by stranger pairs (see Table 7): attention shifted toward remote spatial cues rather than constant dialog.
In the RW condition, strangers relied more heavily on verbal hot–cold cues than known pairs, suggesting a stronger dependence on verbal feedback when familiarity was low. However, both known and stranger pairs showed higher verbal cue usage than in the AR condition. This aligns with previous research showing that face-to-face interaction fosters spontaneous communication and stronger social engagement [10,22]. In the AR condition, cue use among strangers dropped markedly, while known pairs showed only a slight decrease. This pattern indicates that AR reduced strangers’ reliance on verbal coordination, likely due to increased spatial and visual demands, while familiar pairs adapted seamlessly. The challenge of balancing verbal communication with AR mechanics therefore appears specific to stranger pairs navigating unfamiliar remote spaces. Furthermore, players in the AR condition had to interact through a screen while navigating a remote space, which may have increased the mental demand required to coordinate. For pairs who were strangers, this shift likely reduced the ease of verbal interaction, redirecting their attention toward individual exploration and visual game elements. This reflects earlier findings by Duenser et al. [22] and Billinghurst et al. [23], who noted that distributed AR systems often require greater cognitive effort, which can lead to less spontaneous verbal communication.
Overall engagement scores did not differ between stranger and known pairs, indicating that familiarity did not significantly affect the primary engagement constructs measured in this study. It did, however, moderate communication style. Stranger pairs showed the largest drop in cues when moving to AR, while friends remained stable. Thus, familiarity shaped how players coordinated rather than how much they enjoyed the game. Future studies should pair GEQ data with conversational analysis to unpack these subtle social behaviors.
Moving virtual objects in the remote space using the virtual thumbstick impacted both usability and engagement. This is reflected in the lower perceived usability score in the AR condition compared to the RW condition in the UEQ-S results, suggesting that the interface introduced friction in the interaction flow. Several participants also noted these difficulties in the post-play interviews, describing the need for constant physical repositioning to accurately place objects in the remote scene. For example, one participant mentioned, “You had to keep moving around to get the right angle. It was a bit tiring”. Future research should investigate AR mechanics that simplify object placement and reduce the need for continuous physical adjustment, such as auto-alignment features and tap-to-place mechanics, to maintain high engagement while improving usability. Previous studies have also shown that AR systems often introduce cognitive and physical challenges that can affect usability [22]. Designing AR mechanics that simplify input and reduce cognitive effort may improve the overall user experience while maintaining the engagement aspects of the gameplay.
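To illustrate the kind of mechanic we have in mind, the following is a minimal tap-to-place sketch using ARKit raycasting with RealityKit in Swift; the handler name and the “Minion” asset are assumptions for this example, and this is a design sketch rather than the mechanic our prototype implemented.

```swift
import UIKit
import ARKit
import RealityKit

// Minimal sketch: a single tap raycasts into the scene and anchors the
// virtual object at the first surface hit, removing the need for
// thumbstick-driven repositioning. Names and the asset are assumptions.
func handleTap(_ sender: UITapGestureRecognizer, in arView: ARView) {
    let point = sender.location(in: arView)
    // Raycast from the tapped screen point onto estimated real-world planes.
    guard let hit = arView.raycast(from: point,
                                   allowing: .estimatedPlane,
                                   alignment: .any).first else { return }
    // Anchor the object where the ray met the surface (auto-alignment).
    let anchor = AnchorEntity(world: hit.worldTransform)
    if let object = try? Entity.load(named: "Minion") { // hypothetical asset
        anchor.addChild(object)
        arView.scene.addAnchor(anchor)
    }
}
```

A trade-off worth noting is that surface-snapping placement like this would remove the mid-air and ceiling placements players valued, so such mechanics may be best offered as an optional assist rather than a replacement.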
Aggregate GEQ and UES-SF scores indicate that overall engagement did not differ significantly between AR and RW. Specifically, median construct-level scores for Absorption, Flow, Presence, and Immersion (GEQ), as well as Focused Attention, Perceived Usability, and Aesthetic Appeal (UES-SF), showed only marginal or non-significant differences. Nevertheless, two engagement facets did diverge, indicating that participants found AR gameplay more fulfilling and satisfying:
Reward (UES-SF) was higher in AR than in RW (r = 0.46).
Hedonic quality (UEQ-S) was likewise greater in AR than in RW.
Further supporting this, the UEQ-S results demonstrated a significant advantage for AR in terms of Hedonic Quality, reflecting higher levels of excitement and novelty. Conversely, RW gameplay maintained a higher Pragmatic Quality, indicating participants found RW gameplay clearer and easier to use. Thus, while overall engagement was comparable across conditions, AR specifically enhanced aspects related to enjoyment and satisfaction, albeit at a slight cost in usability and interaction difficulty.
Previous studies have pointed out that unfamiliar or overly complex interaction methods can reduce usability in AR systems [49]. In particular, when virtual navigation demands constant adjustment using on-screen controls, it can disrupt the natural flow of gameplay and increase user frustration. Laato et al. [10] also found that players often avoid camera-based AR features if they slow down progress or increase task difficulty: only 7% of surveyed players regularly used AR features, with the most common reason being that they slowed progression.
Interview comments reflect these quantitative findings. Thirty-eight participants described AR as “more engaging” or “more rewarding”, often citing the novelty of manipulating remote objects and seeing the remote player’s live position and orientation. At the same time, many noted interface friction, consistent with the non-significant drop in Perceived Usability. Taken together, these results suggest that AR heightened specific rewarding and hedonic aspects of play without significantly lifting overall engagement scores. We therefore consider H2 partially supported: the AR condition fostered certain engagement qualities but not overall engagement as a single construct.
6.3. Enhanced Social Interaction in AR
Reflexive thematic analysis revealed that 38 of 60 participants felt “more connected” or “really playing together” in the AR mode, even though they were playing remotely. Quantitatively, this is further reflected in the hot–cold cue data (see Table 7): stranger pairs reduced their verbal cues from the RW to the AR condition while maintaining an identical median time-to-find. In other words, AR let unfamiliar players collaborate effectively with less speech, implying that alternative non-verbal channels were doing the communicative work. Prior work on remote AR collaboration shows similar effects; for example, one study found that visual embodiment cues can substitute for spoken directives and raise perceived co-presence [23].
In our game, we did not include gesture-based input. However, our prototype displayed a simple “frustum avatar” that mirrored the remote Seeker’s position and orientation. This helped players understand what their partner was doing and supported coordination between remotely connected spaces without much need for speech.
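For illustration, the data needed to drive such a frustum avatar can be as small as a per-frame pose message. The following Swift sketch shows one plausible shape for it, with assumed names such as SeekerPose and makePose; it is not our prototype’s actual network code.

```swift
import Foundation
import ARKit
import simd

// Minimal sketch: the pose message one might broadcast each frame so the
// remote client can render a "frustum avatar" at the Seeker's position
// and orientation. All names are illustrative assumptions.
struct SeekerPose: Codable {
    var position: SIMD3<Float>   // camera position in the shared world frame
    var rotation: SIMD4<Float>   // orientation as quaternion components (x, y, z, w)
    var timestamp: TimeInterval
}

// Extract a pose from the current ARKit camera transform.
func makePose(from camera: ARCamera) -> SeekerPose {
    let t = camera.transform // simd_float4x4 in world space
    let q = simd_quatf(t)    // rotation part of the transform
    return SeekerPose(
        position: SIMD3(t.columns.3.x, t.columns.3.y, t.columns.3.z),
        rotation: SIMD4(q.vector.x, q.vector.y, q.vector.z, q.vector.w),
        timestamp: Date().timeIntervalSince1970
    )
}
```

Position and orientation alone suffice to draw the view frustum on the remote side, which keeps the bandwidth and implementation cost of this cue low.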
Prior studies suggest such directional visual cues support joint attention and task grounding [23]. This may explain why stranger pairs needed fewer verbal cues but still performed just as well. Interview statements such as
“I liked that you could see where the other person was looking; it put context to our chat.”—P41
indicate that these two cues (position and orientation) already improved the social interaction. This finding aligns with prior collaborative AR studies that highlight the importance of gaze direction and viewpoint for effective spatial coordination. For example, Gauglitz et al. [50] explored remote collaboration in which the helper could pan and zoom a stabilized view of the local user’s scene, allowing them to direct attention without disrupting the user’s first-person perspective. Similarly, Kim et al. [51] tested combinations of visual cues in HMD-based AR tasks and found that pointer cues representing gaze significantly increased co-presence. While these implementations differ from our game setting, since they used HMDs or explicit pointer tools, the findings support the idea that orientation and position are minimal but effective cues for coordinating shared tasks. In our AR hide-and-seek, the 3D frustum provided this information implicitly, allowing players to predict actions and reduce confusion, especially when other bodily cues like footsteps or shadows were absent.
Yet some aspects of in-person play were still missing: twelve participants said they “missed the physical interaction” of the real-world game. Piumsomboon et al. [52] demonstrated that sharing facial expressions and hand gestures in AR can enhance emotional understanding and improve collaborative awareness. These cues help fill the gap left by missing physical presence, raising social connection by conveying both intent and emotion. Future LBARGs could test each such cue layer separately to reveal how much extra social information is worth the technical effort.