Spatial Augmented Reality Storytelling in Arts and Culture: A Critical Review from an Interaction Design Perspective

Printezis, Petros; Koutsabasis, Panayiotis

doi:10.3390/heritage9010020

Open Access Review


by Petros Printezis^* and Panayiotis Koutsabasis

Department of Product and System Design Engineering, University of the Aegean, 84100 Mytilene, Greece

^* Author to whom correspondence should be addressed.

Heritage 2026, 9(1), 20; https://doi.org/10.3390/heritage9010020

Submission received: 28 November 2025 / Revised: 31 December 2025 / Accepted: 3 January 2026 / Published: 9 January 2026

(This article belongs to the Special Issue Digital Museology and Emerging Technologies in Cultural Heritage)


Abstract

Spatial Augmented Reality (SAR) has evolved in the past fifteen years from a whimsical, projection-based approach to a socially nuanced medium of interpretative scholarship for culture, education, and storytelling. This paper presents a critical literature review on SAR systems and cases in arts and culture, based on 52 papers selected over the last decade. The perspective of the review is that of interaction design, which is concerned in general with the practice of designing interactive digital products, environments, systems, and services, and in particular with how the specific characteristics of a physical space, the interaction modality, and the narrative impact the design and efficacy of SAR in art and heritage contexts. This paper reports on the technology landscape, the physical contexts and scales of deployment, interaction modalities, audiences, and evaluation methods of SAR in arts and culture. Then, we present our reflections on the current state-of-the-art in terms of sketching out a historic trajectory of the field, SAR-oriented narrative design patterns, issues of inclusion and accessibility, and several design tensions, constraints, and recommendations for interaction design. Finally, we discuss potential further work in several dimensions of designing SAR for arts and culture, and we present a research agenda.

Keywords:

spatial augmented reality (SAR); cultural heritage interpretation; projection mapping; narrative patterns; interaction modalities; multi-user engagement; embodied interpretation; evaluation methods; participatory and co-creation frameworks; scenographic storytelling; Human–Computer Interaction (HCI); user experience (UX)

1. Introduction

Spatial Augmented Reality (SAR) goes by a variety of names (projection mapping, video mapping, or media architecture), but ultimately, it is projector-based augmentation spatially rendered to real surfaces/objects in the real world [1]. Unlike head-mounted or handheld AR, SAR is inherently co-present and social; many people see the same augmentation in situ without headgear, while the real-world space is the stage for the narrative that is being constructed. Therefore, SAR is particularly advantageous in the arts and culture since places, artifacts, and audiences are often co-present anyway; performance halls and galleries, archaeological sites and museums, town halls and parks become their own narratives.

SAR has shifted over the past ten years from large-scale works to embodied, pedagogical, and participatory experiences. From building-scale works that provide depth of culture and aesthetic value to museum-scale systems that rely on tangibility, gesture, and walk-through interaction, there is a desire to make complex narratives based on tangible artifacts and intangible practices increasingly accessible to a wider public. Simultaneously, research has begun to evolve from one-time events to comparative findings and standardized HCI (Human–Computer Interaction) and UX (user experience) measures that not only examine visual overlays but also multisensory social installations, participatory interactions, cultural learning, and sensitization goals.

Despite the proliferation of SAR in the arts, culture, and education, related research remains scattered across journals and conferences, mainly in HCI, heritage studies, design, media arts, and tourism. Practitioners across these fields nevertheless share a set of common questions, including the following:

Where is SAR employed (which types of institutions or places) and for which purpose?
What are the main types of spatial configurations of SAR?
What are the current options for sensing solutions?
What are the repeated narrative structures, and what are their most effective scales?
What evidence exists for measures of impact (engagement, learning/recall, presence, cognitive load), and how do cultural and art institutions weigh spectacle versus scholarly interpretation?
What are the research gaps so far that may be addressed in further research and development?
What are the recurring trends and design tensions that emerge, and what design recommendations and solutions may address these on a consistent basis?

Given the diversity of cultural SAR deployments, the present review aims to consolidate recurring narratives, technological arrangements, and interaction approaches across the past decade. To structure this inquiry, three guiding research questions are adopted:

Where and how is SAR applied within arts, culture, heritage, and educational settings, and what technological and spatial configurations are most employed?
What forms of interaction, narrative structuring, and audience engagement are reported in the literature, and how do these relate to the physical scales and purposes of SAR installations?
What evidence exists regarding user experience, presence, learning, and cognitive load, and what design tensions or gaps emerge that may inform future research and practice?

These questions provide a consistent frame for synthesizing heterogeneous findings and for articulating both current trajectories and unmet needs in cultural SAR research.

This paper reviews a corpus of 52 papers on SAR in arts and culture, published mainly between 2015 and 2025, with a few earlier exceptions. We focus on arts and culture (specifically: entertainment, education, museums, and heritage).

1.1. Scope and Working Definitions

In this review, Spatial Augmented Reality (SAR) is defined as projector-registered augmentation spatially aligned with real-world surfaces/objects, such that the digital image exists in space. Since it is a device-free experience for participants, SAR is shared, co-located, and inherently social; multiple participants can surround a scale model, wander a gallery, or stand in an outdoor plaza and experience the same thing. We include works otherwise known as projection mapping, video mapping, media architecture, tabletop/object-scale SAR, and room/gallery-scale SAR, as long as the image overlay is spatially aligned (in the given architecture). We also include blended/phygital^1 spaces where the room interfaces and projection are the primary output.

SAR is different from mobile AR^2 (handheld overlays) and head-worn AR/MR^3 (see-through or video-see-through HMDs^4), which are personalized media experiences. While we occasionally touch upon findings from mobile/HMD research, mostly where design/evaluation findings are directly transferable (i.e., patterns of narration, presence, and cognitive load), our analysis focuses on SAR as a shared projection-based medium, free of personal devices or accessories.

Our thematic focus is storytelling and interpretation. Projects are included as long as they tell a story, explicate something, or argue something in relation to the arts/cultural/heritage/educational fields, from historical reconstructions and iconographical studies to community memory efforts and intangible craft-based practices or curated artistic stories. We include linear narrative walks/tours, chaptered/branching exhibitions, quest/role-playing formats, dialogic/co-created narratives, and time-based “palimpsests”^5. We exclude purely abstract displays and promotional presentations without a narrative purpose, unless they are shown to have interpretive or cultural value.

The primary context of application is within the arts and culture realm: museums/galleries, cultural heritage (indoor/outdoor), education/edutainment^6, performing-arts scenography, public cultural interventions, and civic/community memory efforts. Throughout our reporting and reflection, we describe issues of the SAR scale (concepts and patterns), user interactions with SAR (modalities), respective technologies for projection and sensing, as well as respective design options, affordances, and constraints.

The timeframe of this review is approximately the last decade (≈2015–2025) to match the corpus we have assembled. We are including peer-reviewed journal articles, conference proceeding full papers, book chapters, and mature design case studies, which represent/set up a narrative purpose and ideally some level of evaluation.

Our approach to reporting and discussing the state of the art is a qualitative narrative synthesis rather than a quantitative meta-analysis; because SAR work is heterogeneous in setting, method, and outcome, it is not amenable to effect-size aggregation. Nor do we review projector technology extensively, beyond what is needed to understand how equipment choices affect narrative and interactivity.

1.2. Methodology of the Review

Search strategy. We first identified the publisher platforms through which most cultural SAR literature is published: the digital libraries of ACM, IEEE, Taylor and Francis, Springer, and Elsevier, as well as Google Scholar. Searches combined medium, narrative, and interaction terms: (“spatial augmented reality” OR “projection mapping” OR “video mapping”) AND (“storytelling” OR “narrative” OR “interpretation”) AND (“interaction” OR “interactive”), filtered by the application terms “museum,” “heritage,” and “education.” We screened the first 200 results returned by each service, based on title and abstract. We also identified papers by following the “cited by” links of some of the papers located. We focused on the past decade (≈2015–2025) to capture contemporary efforts, but did not hesitate to include a handful of prior or background works when directly related to current cultural SAR efforts. We also located a few review papers with somewhat relevant scope, which we briefly discuss below.
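The Boolean structure of such a search can be sketched as follows. The term lists are taken from the text; the helper names `or_group` and `build_query` are hypothetical, and the quoting and parentheses are illustrative only, since each digital library has its own query syntax.

```python
# Term groups copied from the search description. Grouping them into three
# OR-blocks joined by AND is the reading assumed in this sketch.
MEDIUM = ["spatial augmented reality", "projection mapping", "video mapping"]
NARRATIVE = ["storytelling", "narrative", "interpretation"]
INTERACTION = ["interaction", "interactive"]


def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups):
    """Join several OR-groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(MEDIUM, NARRATIVE, INTERACTION)
print(query)
```

The application filters (“museum,” “heritage,” “education”) would be applied on top of this string through each platform's own filtering options.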

Screening and eligibility criteria. Titles/abstracts were screened against four criteria: (i) the visual overlay is projector-based and spatially registered to the real world (or the work is based on mobile/HMD AR but offers design/evaluation insights transferable to SAR), (ii) the work attempts interpretive narrative or educational storytelling (and not purely abstract spectacle), (iii) it is situated in arts/culture/education/heritage/museums (adjacent realms are noted only as methodological designs for purposes of comparison), and (iv) it is a peer-reviewed, full-length publication (i.e., not a short paper). Duplicates found across sources were merged in the thematic synthesis, where applicable. For example, if an extended journal version adds significantly to a shorter version, both citations are kept in the compiled set but are counted once in the descriptive statistics.

Coding scheme. All included papers were entered on a coding sheet. In addition to citation metadata (APA reference, DOI/URL, venue, year, authors), we coded for the following:

Used technology (e.g., display, sensing, etc.);
Physical context/scale (e.g., tabletop/object, room/gallery, building/outdoor setups);
System setup requirements (e.g., multi-projector setups, depth cameras, tangible tokens);
Modality of interaction (e.g., passive guided gaze, tangible, gesture/body, walk-through, co-creation);
Narrative focus/cultural theme and content (storytelling);
Audience/user group;
Type of study (lab, in-the-wild, expert review, concept contribution);
Evaluation measures (e.g., SUS^7, UEQ^8, NASA-TLX^9, presence studies, retrospective think-aloud, interviews) and outcomes;
Design strategies (e.g., participatory design, research-through-design, user-centered iterative or agile development);
Notes on research gaps, trends, and various comments.
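One way to picture a row of such a coding sheet is as a small record type. The sketch below is illustrative only: the class name `CodedPaper`, its field names, and the placeholder entries are assumptions made for this example and do not reproduce the authors' actual sheet or data.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CodedPaper:
    """One row of a hypothetical coding sheet (field names are illustrative)."""
    citation: str
    year: int
    technology: list[str] = field(default_factory=list)
    scale: str = ""  # "tabletop/object", "room/gallery" or "building/outdoor"
    setup: list[str] = field(default_factory=list)
    interaction: list[str] = field(default_factory=list)
    narrative_focus: str = ""
    audience: str = ""
    study_type: str = ""
    evaluation: list[str] = field(default_factory=list)
    design_strategy: list[str] = field(default_factory=list)
    notes: str = ""


def count_by_scale(papers):
    """Descriptive statistic: number of coded papers per physical scale."""
    return Counter(p.scale for p in papers)


# Placeholder rows, not real corpus entries.
sample = [
    CodedPaper("placeholder A", 2021, scale="room/gallery", evaluation=["SUS"]),
    CodedPaper("placeholder B", 2023, scale="room/gallery"),
    CodedPaper("placeholder C", 2019, scale="tabletop/object"),
]
print(count_by_scale(sample))
```

Keeping the coded dimensions as explicit fields makes the later consensus pass (normalizing terms such as scale or interaction labels) a matter of checking a closed set of values.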

This coding was performed in two passes: extraction from full text/PDFs, followed by a consensus round to unify terms (e.g., “video mapping” was categorized under SAR, and types of interaction were normalized). Because the SAR literature is interdisciplinary in methods, locations, and findings, we take a thematic approach to integration rather than a meta-analysis of effect sizes.

Characteristics of paper corpus. The total number of papers selected is 52. Table 1 presents the listing of the paper corpus with respect to the year of publication. The large majority of selected papers were published over the last four years. Furthermore, Table 2 lists the venues of publication. Most papers were published in scientific journals, which is an indication of more completed works, and there are also several papers presented in high-quality conferences.

2. Reporting

2.1. Technology Landscape

Spatial Augmented Reality can be understood as a coherent system in which multiple technological and design elements are inseparably linked and must be conceived in relation to one another. For a digital narrative to appear convincingly embedded on physical surfaces, these elements cannot function in isolation; instead, they form an interdependent ecosystem that aligns technical capabilities with spatial conditions and narrative intent. As illustrated in Figure 1, this ecosystem is organized into five interconnected clusters that collectively support the realization of SAR experiences. The projection cluster addresses the physical act of image emission, encompassing optical characteristics, brightness levels, and photometric behavior required for visual legibility on real-world materials. Sensing and Tracking focus on how the system perceives the environment and users, integrating depth sensing, computer vision, and localization technologies to maintain spatial awareness. Calibration and Registration ensure that digital content is geometrically and photometrically aligned with the physical world, allowing virtual elements to remain stable and contextually grounded. Authoring and Playback concern the tools and infrastructures used to design, orchestrate, and deliver content in real time, including software engines and media servers. Finally, Experience and Constraints capture the perceptual and environmental conditions under which SAR operates, incorporating multisensory design considerations, occlusion management, and ambient light limitations. A spatial augmented reality system emerges only when these clusters operate in concert, guided by their technical dependencies, spatial relationships, and narrative requirements. It is this coordinated interaction that enables digital content to be perceived not as an overlay, but as an integrated extension of the physical environment.

The purpose of the present section is to provide a concise and high-level overview of the technological landscape associated with the implementation of spatial augmented reality systems. Rather than offering an exhaustive or deeply technical analysis, the section introduces, in a deliberately surface-level manner, the range of technologies that one may encounter when designing and deploying SAR installations. Given the inherently multidisciplinary and technically dense nature of these systems, several components and concepts described herein may appear complex or overly detailed to non-specialist readers. For this reason, the section may be safely skipped without loss of continuity, serving primarily as a reference framework for readers seeking a clearer understanding of the technological scope underpinning SAR.

2.1.1. Projection

At the most fundamental level, spatial augmented reality is grounded in the display mechanism, namely projection, which establishes the visual conditions under which digital content becomes perceptible on physical surfaces. In tabletop configurations and small gallery stations, single-projector setups are commonly employed, often using ultra-short-throw lenses to minimize shadows and prevent viewers’ bodies from interrupting the projection beam in confined spaces [29]. As installations scale up to room-sized environments or building façades, multi-projector arrangements become necessary in order to achieve a seamless, continuous image across larger surfaces. In such cases, techniques such as edge blending^10 and color matching^11 are essential to ensure visual continuity and consistency between overlapping projection fields [8].

Contemporary SAR installations increasingly rely on laser or laser-phosphor projectors, as these systems offer reduced maintenance requirements and more stable brightness levels over prolonged periods of operation, making them suitable for long-term exhibitions and public deployments [25]. However, brightness requirements cannot be defined universally and are instead highly dependent on spatial context. Object-based and tabletop installations typically operate within the lower thousands of lumens^12. Gallery rooms without significant ambient light intrusion require mid-range brightness levels (the mid-teens of thousands of lumens), while large-scale façade projections often demand tens of thousands of lumens per channel to maintain legibility at architectural scale.
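Why required brightness grows so quickly with scale follows from simple arithmetic: average illuminance on the surface is luminous flux divided by image area. The sketch below assumes that all projector output lands evenly on the image and ignores lens losses and surface reflectance; the function names and the example figures are illustrative and are not taken from the reviewed papers.

```python
def illuminance_lux(lumens, width_m, height_m):
    """Average illuminance (lux) if all output lands evenly on the image area."""
    return lumens / (width_m * height_m)


def contrast_ratio(projected_lux, ambient_lux):
    """Full-white projected pixel versus an unlit pixel under ambient light."""
    return (projected_lux + ambient_lux) / ambient_lux


# Illustrative figures only.
tabletop = illuminance_lux(3_000, 1.2, 0.75)   # small relief map, 0.9 m^2
facade = illuminance_lux(20_000, 20.0, 12.0)   # one channel on a 240 m^2 wall

print(round(tabletop), round(facade))
print(round(contrast_ratio(facade, 10.0), 1))  # with 10 lux of street light
```

Under these assumptions a 3000-lumen projector delivers roughly 3300 lux to a tabletop image, while a 20,000-lumen channel delivers only about 83 lux to a façade section, which is why ambient light quickly erodes contrast outdoors.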

Importantly, perceived image quality is influenced not only by luminous output (brightness) but also by the material properties of the projection surface. Surface reflectance and texture often play a more decisive role than raw lumen values. Matte and light-colored finishes tend to produce high visual clarity, whereas materials such as dark stone, glossy varnish, gold leaf, or deeply textured and glare-prone surfaces introduce significant challenges. These conditions necessitate photometric compensation^13, careful control of projection angles, and content design strategies that reinforce figure–ground separation to preserve readability [22]. Finally, conservation constraints impose upper limits on permissible light exposure for sensitive surfaces. As a result, narrative pacing and visitor dwell time must be designed in accordance with safe lux thresholds, acknowledging that physical materials can only tolerate limited illumination over time without risk of degradation.
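The link between lux thresholds and narrative pacing can be made concrete by treating cumulative exposure as illuminance multiplied by time (lux-hours). The following is a minimal sketch under that assumption; the budget, the on-surface illuminance, and the number of opening days are hypothetical placeholders that a conservator would need to set for a specific object.

```python
def annual_exposure_lux_hours(lux, hours_per_day, days_per_year):
    """Cumulative exposure over a year, in lux-hours."""
    return lux * hours_per_day * days_per_year


def max_daily_hours(budget_lux_hours, lux, days_per_year):
    """Daily projection time that keeps the yearly total within the budget."""
    return budget_lux_hours / (lux * days_per_year)


# Hypothetical values, for illustration only.
BUDGET_LUX_HOURS = 150_000   # assumed annual exposure budget
ON_SURFACE_LUX = 200         # assumed illuminance while content is playing
OPEN_DAYS = 300              # assumed opening days per year

hours = max_daily_hours(BUDGET_LUX_HOURS, ON_SURFACE_LUX, OPEN_DAYS)
print(hours)
print(annual_exposure_lux_hours(ON_SURFACE_LUX, hours, OPEN_DAYS))
```

With these placeholder numbers the station could project for 2.5 hours per day, which is the kind of figure that then constrains loop length, idle states, and visitor-triggered playback.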

2.1.2. Sensing and Tracking

Sensing and Tracking technologies provide the spatial awareness that distinguishes spatial augmented reality from simple image projection, enabling digital content to respond meaningfully to physical form, user presence, and action. RGB-D^14 cameras, which capture both color information and depth data, are widely used to generate real-time surface models of the environment. These models support automatic registration, pixel-accurate alignment, and interaction with non-planar geometries such as reliefs, sandboxes, and physical scale models, as demonstrated in [7] and in embodied interactive installations such as [6]. By continuously updating the spatial relationship between projected imagery and physical surfaces, RGB-D sensing allows virtual elements to remain perceptually anchored to the material world.

In contexts where depth sensing is not available or required, standard RGB cameras can be employed for vision-based tracking. Using fiducial^15 marker systems such as AprilTags^16 or ArUco^17 tags, these cameras enable reliable planar and three-dimensional pose estimation of artifacts, handheld objects, or segmented interaction units. Such marker-based approaches are particularly effective for tangible interaction scenarios and are consistent with interaction patterns surveyed in [32]. Complementing these methods, dedicated depth-sensing technologies, including Time-of-Flight (ToF)^18 and structured-light sensors, enhance robustness in room-scale installations by supporting occlusion handling and pixel-precise image warping. These capabilities are critical for maintaining visual stability in complex spatial environments and are reflected in systematic analyses such as [34].

For embodied and device-free interaction, skeleton tracking and hand-pose estimation techniques allow users to control and influence SAR experiences through body movement and gestures, eliminating the need for wearable equipment. This approach supports more inclusive and accessible interaction paradigms, as discussed in [30]. Verification of tangible actions further extends interaction possibilities through the use of RFID^19, NFC^20, small magnets, or fiducial-tagged physical objects. These technologies enable systems to reliably detect object manipulation and spatial relationships, forming the basis for object-driven narratives and participatory interaction models, in line with tangible and participatory principles reviewed in [53].

At larger spatial scales, sensing infrastructures may be distributed across the environment. Handheld props can integrate inertial measurement units (IMUs)^21 to track orientation and motion, while entire venues may rely on Ultra-Wideband (UWB)^22 beacon systems or LiDAR^23-based scans to support coarse localization, environmental reconstruction, or pre-show calibration. Such infrastructural sensing approaches are foundational to large-scale media architecture and spatial storytelling installations, including those described in [26].

2.1.3. Calibration and Registration

Calibration and Registration constitute the mechanisms through which all digital content is anchored to the physical world, ensuring that projected imagery aligns meaningfully with real surfaces rather than merely appearing on them. In simple cases involving flat displays, planar alignment can be achieved using homographies^24. However, the majority of cultural and architectural surfaces encountered in spatial augmented reality are non-planar, irregular, or materially complex. In such contexts, projector–camera calibration is required in order to estimate both intrinsic and extrinsic parameters of the projection system, allowing the projector’s pixel grid^25 to conform accurately to complex geometries. This approach is central to cultural heritage projection workflows such as [8], where architectural surfaces demanded precise, in situ geometric registration to preserve spatial and historical coherence.
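For the flat-surface case, a homography can be estimated from four point correspondences. The sketch below is a dependency-free illustration of the standard direct linear formulation with the last matrix entry fixed to 1; it is not the method of any cited system, and the function names and the corner coordinates are hypothetical. Production pipelines would normally use a computer vision library and more than four points.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_points(src, dst):
    """3x3 homography mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map one 2D point through the homography (with perspective divide)."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical data: normalised projector corners and where they were
# measured to land on a flat panel (metres).
projector = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
surface = [(0.02, 0.01), (1.18, 0.03), (1.16, 0.70), (0.00, 0.72)]

H = homography_from_points(projector, surface)
print(apply_homography(H, (0.5, 0.5)))
```

Warping content with the inverse of this mapping is what makes a flat image appear undistorted on the panel; non-planar surfaces need the fuller projector–camera calibration described in the text.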

Once calibration has been established, a series of post-processing steps, including geometric warping^26, edge-blending masks, and color matching, are applied to eliminate visible seams between overlapping projection areas. These techniques enable multiple projectors to function as a unified visual system, producing a continuous image across irregular or fragmented surfaces. Blending-driven media architecture configurations of this kind are discussed in [26], where multiple projection channels were harmonized across a structurally complex façade to achieve perceptual continuity at the architectural scale.
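An edge-blending mask can be thought of as a pair of complementary ramps across the overlap zone, so that the light contributed by two projectors sums to a constant. The sketch below illustrates that idea; the smoothstep ramp, the function names, and the display gamma of 2.2 are assumptions chosen for the example and are not taken from the cited works.

```python
def smoothstep(t):
    """Smooth 0-to-1 ramp with zero slope at both ends; t is clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def blend_weights(t):
    """Linear-light weights (left, right) at position t across the overlap.

    t = 0 is the edge nearest the left projector's exclusive area,
    t = 1 the edge nearest the right projector's exclusive area.
    """
    right = smoothstep(t)
    return 1.0 - right, right


def to_pixel_value(weight, gamma=2.2):
    """Pre-correct a linear-light weight for a display with the given gamma."""
    return weight ** (1.0 / gamma)


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    left, right = blend_weights(t)
    print(t, round(to_pixel_value(left), 3), round(to_pixel_value(right), 3))
```

The weights sum to one in linear light; the gamma pre-correction matters because sending the raw weights as pixel values would leave a visibly dark band in the middle of the overlap.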

Alongside geometric alignment, photometric compensation plays a critical role in maintaining visual uniformity. By adjusting brightness values on a per-pixel basis, the system can counteract variations caused by surface albedo, projector vignetting, material texture, or intensity differences in overlapping projection regions. This process is especially important when projecting onto challenging cultural materials, and its influence on audience perception is demonstrated in heritage illumination contexts such as [22], which documents how material reflectance directly affects perceptual clarity and legibility.
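Per-pixel compensation can be illustrated with a deliberately simple model in which the light leaving a surface point is the projected value scaled by the local albedo, plus ambient light. This is a sketch under that assumption only: real systems also account for projector response, color channels, and inter-reflection, and the function names and values below are hypothetical.

```python
def compensate(target, albedo, ambient=0.0):
    """Projector value (0..1, linear light) needed to reach `target`.

    Assumed model: observed = albedo * projected + ambient.
    Values are clipped to the projector's range, so dark surfaces
    cannot always reach bright targets.
    """
    if albedo <= 0.0:
        return 1.0  # nothing is reflected; drive the pixel at maximum
    value = (target - ambient) / albedo
    return min(max(value, 0.0), 1.0)


def compensate_image(target, albedo, ambient=0.0):
    """Apply the per-pixel rule to two equally sized 2D lists."""
    return [[compensate(t, a, ambient) for t, a in zip(t_row, a_row)]
            for t_row, a_row in zip(target, albedo)]


# Hypothetical one-row image: a light patch (albedo 0.8) next to a dark one (0.2).
result = compensate_image([[0.4, 0.4]], [[0.8, 0.2]])
print(result)
```

In this example the light patch needs half of the projector's output to reach the target, while the dark patch saturates, which is one reason content for dark stone or varnished surfaces is usually authored with lower target brightness and stronger contrast.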

More recent calibration pipelines increasingly incorporate vision-based autocorrection mechanisms. These systems use fiducial markers or depth-based spatial snapshots to re-establish alignment following power cycles, physical shifts in equipment, or accidental user interference. Marker-based re-registration and drift-correction strategies are commonly reported in AR–SAR hybrid systems surveyed in [34], highlighting the widespread reliance on marker or depth-assisted realignment to ensure long-term stability in deployed installations.

At the scale of architectural façades, calibration workflows almost invariably begin with pre-computed geometric representations of the environment. Point clouds or mesh models derived from photogrammetry^27 or laser scanning provide a stable spatial reference against which projectors are aligned. This digital survey approach forms the backbone of large-scale façade mapping systems, as exemplified in [51], where high-resolution geometric models establish the registration framework for coordinating multiple projection channels across extensive architectural surfaces.

2.1.4. Authoring and Playback

Authoring and playback represent two complementary phases of the spatial augmented reality production workflow, addressing the creation of content and the controlled delivery of that content in situ. On the authoring side, real-time engines such as Unity^28 and Unreal Engine^29 are widely employed to support interaction design, logic scripting, and three-dimensional scene reconstruction. These platforms enable the integration of narrative logic with spatial data and user input, and they are commonly used within cultural-heritage XR pipelines, as exemplified in [33].

Alongside game engines, generative visual environments, including TouchDesigner^30, Notch^31, vvvv^32, openFrameworks^33, and Processing^34, play a significant role in SAR development. These tools facilitate rapid prototyping, real-time audiovisual synthesis, and sensor-fusion workflows, allowing designers to experiment iteratively with data-driven visuals and responsive behaviors. Such practices align closely with creative computational approaches described in [41], where algorithmic processes and live inputs are central to experiential design.

On the playback side of the workflow, media servers such as Disguise^35, Watchout^36, and Resolume^37 are used to orchestrate complex, multi-channel projection systems. These platforms provide timeline-based control, geometric warping, edge blending, and show-control capabilities, making them particularly suited for large rooms and façade-scale installations. Their role within media-architecture production pipelines is well-documented in works such as [26], where coordinated playback across multiple projection channels is essential to achieving spatial and perceptual coherence.

For smaller or mid-scale installations, particularly in museum contexts, manual projection-mapping tools such as MadMapper^38 remain a standard solution. These tools offer flexibility for on-site alignment and rapid adjustments during installation, supporting workflows where spatial conditions or curatorial requirements may change late in the production process, as also observed in [14]. Complementing these systems, a growing trend within museums is the integration of lightweight content management system (CMS) layers. Such systems allow curators and non-technical staff to update captions, rotate narrative segments, or localize content without modifying the underlying engineering infrastructure, a practice consistent with heritage exhibition systems described in [9].

As SAR projects increasingly involve multidisciplinary teams and multiple contributors, structured asset pipelines and version control mechanisms become critical. Coordinating video, audio, textual, and three-dimensional assets across iterative production cycles ensures consistency and traceability throughout the workflow. This collaborative approach reflects XR authoring methodologies discussed in [48], where shared repositories and controlled versioning support scalable and sustainable content production.

2.1.5. Experience and Constraints

The experiential layer of spatial augmented reality is shaped not only by visual projection but also by sound, light, and other environmental factors that collectively sustain the illusion of spatial coherence. Spatialized audio systems, including distributed speaker arrays and low-frequency subwoofers, are frequently used to anchor narrative voices, soundscapes, and diegetic^39 sounds to specific locations in space. By aligning auditory cues with visual events, these systems enhance presence and attentional focus, as demonstrated in multisensory cultural-heritage XR projects such as [42] and in audio-driven immersive design approaches discussed in [18]. In parallel, architectural lighting can be integrated into the narrative framework through DMX^40 (Digital Multiplex) or sACN^41 (Streaming Architecture for Control Networks) control systems, allowing illumination states to synchronize precisely with projected content. Such lighting–projection coordination is a defining feature of media-architecture storytelling workflows, including those described in [26], where changes in lighting reinforced temporal transitions and historical atmospheres.

Beyond audio and light, additional sensory elements such as subtle haze, airflow from fans, scent cues, or low-intensity vibration may be introduced to deepen immersion. However, these elements must be applied conservatively, particularly in hybrid theatre–museum environments, and only when conservation, accessibility, and visitor comfort requirements permit. These constraints reflect broader guidelines on multisensory interpretation and visitor impact outlined in [22], which emphasize perceptual clarity, physical comfort, and material tolerance as key parameters shaping the overall sensory envelope of an installation.

Across SAR deployments, two persistent constraints strongly influence technological choices and, by extension, narrative form. The first is occlusion. Front-projection systems offer high brightness and close coupling between image and surface, but they are inherently vulnerable to shadows cast by moving participants or staff. Mitigation strategies include the use of ultra-short-throw (UST) projector optics and angled ceiling mounts that minimize beam obstruction by keeping bodies out of the projection path. Such spatial and geometric considerations are consistent with practical deployment strategies described in [7], where physical positioning played a decisive role in interaction quality. In addition, designers often anticipate occlusion by authoring content within visually safe zones that avoid typical visitor trajectories, an approach aligned with collaborative spatial-use patterns discussed in [6].

The second constraint is ambient light. While indoor gallery spaces can often be darkened or carefully tuned, outdoor heritage façades must contend with street lighting, signage, and environmental glare. As a result, large-scale exterior projections tend to favor high-contrast color palettes and pronounced motion cues over fine visual detail. This design strategy is supported by visitor-perception studies such as [22], which highlight the limits of legibility under competing light conditions. Façade-scale media-architecture projects further adapt to these constraints by tailoring content coverage and perspective to anticipated viewing positions, optimizing readability despite ambient light interference. Such adaptations are evident in [26], where nighttime public viewing conditions directly shaped both visual composition and narrative pacing.

Finally, long-term maintainability plays a critical role in determining which technologies are appropriate for responsible deployment. Automated calibration mechanisms reduce daily setup effort and compensate for gradual system drift over extended exhibition runs, as seen in in situ projection workflows requiring repeated realignment, such as [8]. Content pipelines that expose captions, language variants, and audio-description tracks support accessibility goals and reflect inclusive design principles discussed in [30]. At the same time, adherence to photometric thresholds protects vulnerable materials, following illumination and visitor-perception constraints documented in [22]. Curatorial oversight and provenance checks further ensure that narrative dramatization does not result in misrepresentation, echoing concerns about interpretive integrity raised in [15]. Taken together, these considerations suggest that the most effective spatial augmented reality experiences are not those of greatest technical complexity, but those in which projection, sensing, and authoring are deployed at an appropriate scale, allowing technology to recede into the background while narrative clarity, credibility, and shared experience remain foregrounded, an ethos widely shared among media-architecture practitioners, as reflected in [26].

2.2. Physical Contexts and Scales of Deployment

SAR takes advantage of real surfaces, so the physical environment is not the backdrop but rather the medium. Three scales of cultural intervention emerge, as seen in Figure 2. Object or tabletop stations bring artifacts, maps, and models to life. For example, the work of [7] explores interactive holographic projection and tangible spatial interaction at small scales. Room or gallery installations create responsive scenography within interiors. An example is [15], which demonstrates multi-surface projection mapping and ambient storytelling in museum rooms. Building or outdoor interventions use façades and heritage landscapes as civic stages. An example is [26], which employs large-scale façade projection for public audiences. Each scale suits certain stories, imposes certain limitations, and pushes the technology toward different configurations.

At the object or tabletop scale, projection magnifies key features that are difficult to see in the object as it is. Relief maps, i.e., three-dimensional representations, usually of terrain, materialized as physical artifacts, can denote historic layers, trading routes, and seasonal practices [8]. Scale models of homesteads or industrial buildings can show time-lapse circulation, labor, and transformations [22]. The proximity between the viewer and the projected material is relatively intimate, sometimes bringing groups around one shared tangible surface [6]. As a result, legibility relies upon localized contrast, accurate registration, and very short throw distances that keep human bodies from entering the projection. Interactivity is often tangible and low-friction: tokens, dials, and touch-like interfaces invite families and school groups to co-explore without trained assistance, consistent with participatory museum interaction methods seen in [53]. The intimacy of this scale works best for narratives that require time and spatial nuance, like iconographical studies, craft processes, or the growth of a site over years. Conservation concerns frequently come into play at these stations: light exposure must be kept to an appropriate lux level, and surface temperature must be watched as conditions change between seasons. As a result, narrative pacing, idle states, and sensor timeouts must be part of the design rather than an operational afterthought, which aligns with conservation-aware exhibition design practices in [22].

Room and gallery scale shifts the focus from object legibility to social orchestration. Multiple projectors, spatialized audio, and controlled lighting can turn a black box or heritage hall into a cohesive storytelling environment [15]. Here, the space becomes the interface, meaning that circulation, lines of sight, and acoustic zones must be composed with as much care as the visuals, as seen in media-architecture concepts in [26]. This can result in a passive, guided gaze; however, more and more cultural projects hybridize embodied gestures, walkable timelines, and tangible tools that become shared instruments [6]. This scale is best for chaptered narratives that require staging and rhythm. It supports collaborative questioning and dialogue in ways that head-mounted displays do not, because visitors are aware of one another’s presence [33]. Technical decisions prioritize robustness and maintainability over time: auto-calibration, edge blending, photometric compensation, and show control across projection and lighting are what keep a gallery installation viable weeks or months after opening night [8]. Accessibility design becomes more essential at this scale; clear seating or leaning edges, audio description options, caption layers, and safe luminance transitions are critical to preserve comfort for diverse audiences while maintaining narrative clarity [30].

Building and outdoor scale brings SAR into the civic realm. Façades and archaeological sites host nightly shows for large, diverse audiences, many of whom did not come to town planning to visit a museum [51]. Architecture thus informs the narrative trajectory: portals, domes, spires, rose windows, or industrial silhouettes suggest certain narrative beats while ruling out others, as seen in heritage-surface storytelling frameworks in [43]. Weather, ambient street lighting, and very distant lines of sight define the basic technical setting [25]. Multi-projector arrays, high output per channel, respect for viewing distances, and sound levels that take neighborhood inhabitants into account are common requirements, consistent with public-realm media architecture deployments such as [26]. Audiences are often standing and transient, which calls for clear arcs, strong figure-ground contrast, and cultural framing that can be comprehended without prior knowledge [36]. Here, interactivity is largely communal and atmospheric; when it does occur, it is almost always through large gestures, live performances, or synchronized communal engagement rather than personalized agency, as reflected in hybrid XR-performance systems such as [4].

Cultural authenticity is important for curators and is often publicly scrutinized. SAR operates as an extension of the monument itself, so the projection must balance spectacle against rigorous interpretation and fidelity to the site. Across all scales, the best design fit arises when the type of story matches the spatial affordance, as for example in the work of [38]. Precise examination works best on models and maps, where closeness affords precise registration and annotation, as in the work of [12]. Dialogic, collaborative exploration works best in rooms where audiences can talk, gesture, and move together without devices, as for example in [15]. Public or recreational myth and contested memory or identity work best on façades and outdoor sites where place meets community, as in [26].

Which narratives are best related to the spatial scales of object-, room-, and building-scale SAR installations? As seen in the matrix of Figure 3, object-scale SAR relates best to accurate investigation, because proximity, visibility, registration, and conservation detail support close discussion of maps, models, and iconography. Room- and gallery-scale SAR relates best to dialogic collaboration, because co-presence, participatory circulation, embodied gestures, and collective inquiry facilitate group consideration and walkable narratives. Building- and outdoor-scale SAR relates to civic myth and contested identity, because façades, large-scale narratives, collective memory, and community contestation of meaning belong to public space and the architectural outline. Furthermore, the empty or sparsely populated cells of the matrix imply weaker connections, suggesting that not every narrative form is equally supported by every spatial scale. Thus, the diagram supports the argument that the narrative clarity and cultural integrity of SAR come from matching the projected space to the intentions of the narrative.

Figure 4 illustrates the physical contexts and scales of systems referenced in the collated papers. Of the corpus of 52 papers, 25 are explicitly contextualized with a known physical deployment context or scale. Object/Tabletop scale (relief maps, sandboxes, scale models, small objects) is reported by 12 papers [3,5,7,8,9,11,12,13,16,29,33,50]. Room/Gallery scale (immersive rooms, black-box galleries, indoor installations) is referenced by 8 papers [2,4,6,14,18,30,50,52]. Building/Outdoor scale (building façades, archaeological sites, media architecture, and outdoor projection mapping) is discussed in 5 papers [8,13,22,43,51]. The papers [13,24] both describe the same installation, so they are counted as one.

2.3. Interaction Modalities

Ultimately, the success or failure of SAR depends on what people can do to engage with it in the moment [47]. The same stack of projectors might feel like a lecture, a game, a performance, or a workshop, depending on how visitors are invited to respond. Therefore, in cultural settings, the most successful interactions are those that can be grasped at a glance, are resilient to crowd dynamics, and are in natural harmony with the meaning of the place or object [27]. Figure 5 shows the six types of interaction found in the paper corpus and in the authors’ experience.

Passive guided gaze is the baseline expectation. Directed emphasis, in the form of a sequence of images on architectural details, sections of frescos, or zones of a scale model, is aligned with narration or spatial audio that explains what matters [39]. This modality scales to larger crowds and is relatively easy to deploy in heritage contexts where touching is not allowed. The risk, however, is overload, where visual momentum works against complex symbols or dense wording. Designers respond by crafting brief chapters with obvious figure/ground contrast, using pacing and stillness to allow meaning to breathe. When historical caution or sacred sentiment must be maintained, guided gaze is often the most respectful option.

Tangible and token-based control is where space becomes a tabletop theater. Visitors place or turn a small object, rotate a dial, or present a tagged prop to reveal further layers on a model or artifact. The physical affordance makes cause and effect clear without prior instruction, which supports family and school groups. It also maintains a shared surface of interest, so that attention is not divided between a device and the world. The critical design considerations are to ensure that mappings remain stable, there are no hidden states, and idle states are obvious enough for late arrivals to know how to start. Tokens can act as narrative theme objects themselves (the tool of an agricultural craft, for example, or a miniature version of an absent building), which encourages recall [7].

Gesture and whole-body input treats movement as part of the storytelling rather than as a mouse replacement. Hand positioning, reach, sweep, and posture can echo historical movements or ceremonial forms if the mapping is sensitive and expressive enough. Vision-based tracking lets people approach and be recognized of their own accord, but comes with familiar downsides: false positives, fatigue, and the “Midas touch”, where everything responds to everything. Successful designs rely upon small vocabularies with semantic meaning and staging that does not allow bodies to block critical projections. Audio guidance and light feedback help reinforce recognition without breaking immersion [41].

Walk-through and locomotion is where space becomes the timeline. As visitors travel along a designated, and often natural, path, episodes unfold according to their trajectory, corresponding to moments in time or stages of processes and eras. This suits gallery spaces and is often the only viable option in outdoor settings where there is no heritage context. The narrative must support non-linear exploration, which means nodes must function independently while retaining their value as part of a larger narrative throughline. Wayfinding becomes part of interaction design: markings on the ground, light gradients, and sound beacons create a flow that avoids cluttered signage. Safety and conservation determine how close people can get to artifacts or stonework while a line of sight is maintained [40].

Co-creation and participatory modes are where visiting communities contribute additions, selections, or reinterpretations of original content [53]. This can be as simple as voting on which layer is projected next on a model in a museum, or as complex as documenting contributed material and projecting it in context at relevant times. In public spaces, this might require that contributed assets be curated into scheduled events. Participation shifts the tone from authoritative to dialogic and gives access to marginalized narratives, but it requires facilitation and control to track provenance and to frame content properly, so that viewers can distinguish between fact-driven history, memories, and speculation. If remote contributors join an in situ projection via telepresence, legibility and rhythm must also be maintained for those onsite [28,53].

Voice- and speech-based engagement is nearly nonexistent in contemporary SAR applications for cultural heritage, largely because it is at odds with the situational, sociocultural, and acoustic nature of the spaces. Museums and galleries are reverberant, busy, multiparty environments where speech detection with reliable attribution of “who is talking to the system” is inherently unpredictable. At the same time, these spaces welcome multilingual publics, which makes it unlikely that a single implementation can accommodate varying languages, accents, and speech patterns. In certain sacred and memorial contexts, requiring visitors to vocalize in order to access a system is inappropriate relative to norms of awareness and quiet contemplation. Finally, institutions favor low-friction, low-maintenance solutions like physical markers, paths to walk, and easily stabilized gestures for long-term installations, which are more manageable for day-to-day exhibitions and easier for people who might not be comfortable using, or able to use, their voice as a form of input [7,9,12,15,16].

Figure 6 presents a matrix that maps the most common interaction types used in cultural SAR onto two dimensions: physical/spatial engagement (from minimal movement to full exploration of the space) and participation (from passive, responsive engagement to highly participatory involvement). At the low end of both dimensions is passive guided gaze, wherein the visitor is expected to absorb directed emphasis and pacing from projection and sound; this serves best with larger crowds or in situations where sensitive or reverent reception is necessary. From there, tangible and token-based control facilitates low-friction, tactile engagement via tokens, dials, and tagged elements, so that cause and effect is legible at first glance while interaction stays on a communal plane of interest. Gesture and whole-body engagement uses posture, reach, and deliberate motion as a form of expressive control; it depends on limited vocabularies and audio/light feedback, and must account for fatigue, misinterpretation, and the “Midas touch” (where everything within reach is inadvertently activated). At the high end of physical engagement, walk-through and locomotion treats space itself as a chronologically driven narrative, with content revealed episodically based on the visitor’s direction and distance. This method relies on appropriate wayfinding, safety measures, and sightline maintenance to support non-linear movement. At the high end of participation, co-creation and participatory modes allow visitors and communities to create content, select layers, or reinterpret elements of heritage, thus shifting the tone from authoritative imposition to dialogic encouragement. This method demands more sophisticated curation and framing to ensure authenticity of authorship and interpretive clarity. Finally, voice- and speech-centric engagement sits on a less common periphery, as no studies in the corpus report its use within cultural SAR, owing to acoustic complexity, multilingual audiences, the quiet contemplation typically associated with institutional visits, and institutional preference for low-friction, low-effort engagement.

Of the 52 studies included in this qualitative narrative synthesis, and as shown in Figure 7, 27 report at least one of six identified interaction forms. Passive Guidance/Guided Gaze is observed in 5 studies (see studies [2,3,6,32,39]). Gesture/Whole Body Input appears in 4 studies (see studies [3,4,7,13]). Tangible/Token-Based Control is recorded in 9 studies (see studies [5,8,12,14,15,22,33,42,50]), making it the most frequently reported interaction modality within the sample. Co-Creation/Participatory Modes are documented in 5 studies (see studies [12,18,29,31,33]). Walkthrough/Locomotion interaction is identified in 4 studies (see studies [4,18,23,40]). In contrast, Voice/Speech Input is the least observed interaction type, with no studies in the present synthesis reporting its use. This absence suggests that voice-based interaction is not commonly adopted in cultural SAR systems, potentially due to social norms, acoustic conditions, or expectations of visitor behavior in heritage environments, where vocal interaction may be perceived as intrusive or inappropriate.
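The per-modality counts can be re-tallied directly from the reference lists given in the paragraph above. The following Python sketch is only an illustrative check: the dictionary keys and variable names are assumptions made for this example, while the reference numbers are copied from the text. Because some reference numbers appear under more than one modality, the sketch reports both the sum of the per-modality counts and the number of distinct reference numbers.

```python
# Reference numbers per interaction modality, copied from the paragraph above.
# The dictionary keys and variable names are illustrative assumptions.
modalities = {
    "passive guided gaze": [2, 3, 6, 32, 39],
    "gesture/whole-body input": [3, 4, 7, 13],
    "tangible/token-based control": [5, 8, 12, 14, 15, 22, 33, 42, 50],
    "co-creation/participatory": [12, 18, 29, 31, 33],
    "walkthrough/locomotion": [4, 18, 23, 40],
    "voice/speech input": [],
}

# Number of studies listed under each modality.
counts = {name: len(refs) for name, refs in modalities.items()}

# Sum of the per-modality counts (a study listed twice is counted twice).
total_mentions = sum(counts.values())

# Distinct reference numbers across all modalities.
distinct_studies = set().union(*modalities.values())

for name, n in counts.items():
    print(f"{name}: {n}")
print("sum of per-modality counts:", total_mentions)          # 27
print("distinct reference numbers:", len(distinct_studies))   # 22
```

Read this way, the figure of 27 corresponds to the sum of the per-modality counts, with references such as [3], [4], [12], [18], and [33] each listed under two modalities.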

2.4. Audiences

Spatial augmented reality (SAR) in cultural heritage contexts fundamentally differs from individualized extended reality systems in that it is experienced collectively, within shared physical space and time. The absence of personal devices situates SAR as an inherently social medium, where diverse visitor groups—varying in age, ability, motivation, cultural background, and familiarity with the site—coexist and negotiate meaning simultaneously. This section examines how such heterogeneous and overlapping audiences shape the design requirements of cultural SAR, foregrounding social interaction, accessibility, and inclusion as core design parameters rather than secondary concerns. Drawing on examples across spatial scales and visitor profiles, the discussion articulates how narrative structure, pacing, multimodal communication, and spatial framing must respond to shared attention economies and socially mediated interpretation, positioning audience diversity as a defining constraint and opportunity for SAR-based cultural experiences.

Spatial augmented reality is inherently shared. Users do not wear devices; they stand together and experience the same augmentation in the same spatial existence [27,47]. Therefore, this social affordance mediates the audience to a greater extent than personalized AR. Families, school trips, average museum-goers, rushed tourists, local inhabitants with lived memories, seniors, and disabled people all enter the same room or square [17,30]. Thus, designing for this overlap makes social connection, inclusion, and accessibility primary considerations, as opposed to afterthoughts [28,53].

Visitors to cultural sites vary widely in knowledge and motivation. A façade projection, for example, might draw thousands of people who stand there for ten minutes, while a tabletop station might hold ten people who focus on it for thirty minutes. The story and pace must respect these attention budgets [22,35]. Chaptered structures allow late arrivals and early leavers to keep the rhythm of the narrative [39,52]. Progressive disclosure allows a child to trigger a simple reveal, while a lingering expert can wait for a richer one [36,40]. Clear wayfinding and sightlines reduce the cognitive overhead of “where should I look”, so that attention goes to meaning-making instead of orientation [13,21].

Cultural heritage spatial augmented reality almost always demands a “user” that is not singular but exists in the space of family clusters, school groups, tourists, locals, and differently abled participants. For example, research on museum service models suggests exhibits should accommodate first-time tourists and repeat locals with varying expectations and prior familiarity with the heritage [13,17]. More commonly, school groups and families are accommodated through playful and exploratory approaches with cause-and-effect responses, as made clear through exploration games and AR created for younger audiences [36,40].

At the same time, many projects acknowledge SAR and XR explicitly as means for inclusion and participation, reaching seniors, differently abled visitors, and communities that cannot have their voices actively represented in more traditional ways. Social inclusion projects emphasize a shared space, multimodal feedback, and low-friction interaction so that, no matter motor or sensory capabilities, everyone can access the same engagement as others through SAR [30,35]. Literature reviews on participatory AR and co-design indicate that, for a potentially overlapping audience, social connection, discourse, and accessibility considerations should be made a priority from the outset instead of as an afterthought [28,53].

Across this diversity of visitor profiles, three audience-related needs emerge as particularly significant. The first concerns narrative layering, understood as the provision of interpretive moments with varying durations and depths of engagement, allowing visitors to enter, sustain, or disengage from the narrative according to their available time and attentional capacity [22,39,52]. The second relates to multimodal redundancy across visual, auditory, and spatial channels, ensuring that meaning remains accessible regardless of differences in sensory abilities, cognitive styles, or prior knowledge [13,30,35]. The third involves the design of socially supported meaning-making, in which co-presence, conversation, and shared interpretation are intentionally incorporated into the experience rather than occurring incidentally as an unintended byproduct [27,28,53].

These needs influence the choice of interaction modalities as well as how spaces support the framing of content. Tabletop SAR is appropriate for families and groups who occupy the same space and negotiate meaning together [7,12,13], while room-scale SAR encourages dialogue within group formations and collaborative inquiry [9,15,33]. Façade-scale SAR requires appropriate legibility, figure-ground discrimination, and cultural framing to best accommodate a large, transient audience with minimal exposure [16,43,51].

In addition, cultural SAR intersects with access requirements for captioning, audio description, pacing options, and safe patterns of movement, because heterogeneous visitation is the rule rather than the exception [30,35,40]. Where SAR is purposefully designed, co-present groups can better focus, retain information more effectively through social discourse, and share time collaboratively in the same space across varied ages and visitor profiles [13,36,52].

Thus, audience needs in SAR extend beyond simple demographic categories to encompass natural rhythms of attention, diverse sensory access, social engagement, and spatially responsive patterns. These needs shape narrative development, interaction selection, and spatial configuration, so that cultural SAR remains legible, inclusive, and relevant for as many visitors as possible in one space [27,28,53].

2.5. Evaluation Methods and Evidence

The evaluation of user experience (UX) in spatial augmented reality (SAR) installations within cultural heritage contexts constitutes a methodologically complex field, shaped by the situated nature of space, narrative intent, and heterogeneous visitor audiences. Unlike screen-based or personal augmented reality systems, SAR unfolds across physical environments that actively condition perception, movement, and social interaction, rendering uniform evaluation strategies insufficient. This section synthesizes prevailing assessment approaches employed in SAR research, outlining how usability, workload, presence, learning, and social engagement are operationalized across different spatial scales, from tabletop installations to architectural and outdoor interventions. By reviewing standardized instruments, mixed-method evaluation practices, and qualitative interpretive strategies, the section situates current UX assessment methodologies within the broader challenges of ecological validity, narrative coherence, and cultural authenticity, while also identifying persistent methodological gaps and limitations in the existing literature.

UX (user experience) assessment of SAR in cultural sites calls for site-, audience-, and narrative-specific approaches. Thus, assessments typically combine standardized instruments that allow comparison across projects with observation, intercept interviews, and minimal behavioral metrics such as dwell time or re-entry [23,35,47]. The aim is to determine not only whether the interface works but also whether the cultural intent is achieved. In practice, this becomes a triangulation of usability and workload with engagement, presence, learning, and quality of social interaction, as in [24].

Three types of assessments emerge from the discourse. Tabletop and object-scale stations are suited to shorter, task-oriented assessments, think-aloud observations, and quick pre- to post-testing of comprehension (as in [7,12]). Here, the researcher can control for point of view (POV) and lighting and administer a standardized instrument immediately after use. Room and gallery installations (e.g., [1,9]) need to be assessed in the wild with mixed audiences, where standardized questionnaires can be combined with intercepts and structured observations of interpersonal dynamics (turn-taking, pointing, discussion). Architectural or outdoor installations are the most difficult to assess because of their transient audiences and loud settings. Here, the common approach is brief intercept interviews, one or two focused recall questions, and follow-up survey feedback by email, sometimes supplemented by an expert assessment from exhibition or heritage professionals, as for example in [8].

Three standardized evaluation instruments emerge as particularly prevalent due to their brevity, robustness, and suitability for both inter- and intra-study comparison. The System Usability Scale (SUS) provides a perceived usability score ranging from 0 to 100 and is commonly used to assess the clarity and accessibility of an interface across development iterations or between comparable systems. This makes SUS especially valuable for evaluating whether interaction mechanisms are understandable and operable within a given design cycle.
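To illustrate how the 0 to 100 score is obtained, the following minimal Python sketch applies the standard SUS scoring rule: ten items are rated from 1 to 5, odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5. The function name `sus_score` and the list-based input format are illustrative assumptions and are not taken from the reviewed studies.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one respondent.

    `responses` is a sequence of ten ratings on a 1-5 scale, in
    questionnaire order (item 1 first). The name and input format
    are assumptions made for this sketch.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item ratings")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each rating must lie between 1 and 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered (positively worded) items: rating minus 1.
            total += rating - 1
        else:
            # Even-numbered (negatively worded) items: 5 minus rating.
            total += 5 - rating

    # Scale the 0-40 sum to the 0-100 range.
    return total * 2.5


# Example: a respondent who answers "3" (neutral) to every item.
print(sus_score([3] * 10))  # 50.0
```

Scores from several visitors would typically be averaged per installation or per design iteration before any comparison is made.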

The User Experience Questionnaire (UEQ) extends evaluation beyond usability by capturing broader experiential qualities, including attractiveness, clarity (perspicuity), efficiency, control (dependability), stimulation, and novelty. These dimensions are measured using bipolar scales and are often supported by benchmark datasets, enabling practitioners and researchers to contextualize results and interpret experiential quality even without deep methodological expertise. NASA TLX complements these instruments by assessing perceived workload across multiple dimensions, including mental demand, physical demand, temporal demand, perceived performance, effort, and frustration.
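The text above does not specify which NASA TLX scoring variant is intended. As one hedged illustration, the Python sketch below computes the unweighted ("raw") variant, i.e., the mean of the six subscale ratings on a 0 to 100 scale; the weighted variant additionally derives subscale weights from pairwise comparisons and is not shown. The function name `raw_tlx` and the dictionary keys are assumptions made for this example.

```python
# The six workload dimensions named in the text above.
# Key spellings are illustrative assumptions.
TLX_DIMENSIONS = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Return the unweighted (raw) workload score for one respondent.

    `ratings` maps each dimension name to a value between 0 and 100,
    where higher values indicate higher perceived workload. On the
    performance subscale, higher values are assumed to indicate worse
    perceived performance, so that all six point the same way.
    """
    missing = [d for d in TLX_DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")

    values = [ratings[d] for d in TLX_DIMENSIONS]
    if any(not 0 <= v <= 100 for v in values):
        raise ValueError("each rating must lie between 0 and 100")

    # Raw TLX is the arithmetic mean of the six subscale ratings.
    return sum(values) / len(values)


# Example with hypothetical ratings from a single visitor.
example = {
    "mental_demand": 10,
    "physical_demand": 20,
    "temporal_demand": 30,
    "performance": 40,
    "effort": 50,
    "frustration": 60,
}
print(raw_tlx(example))  # 35.0
```

The example ratings are invented for demonstration and do not come from any reviewed study.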

Within cultural environments where spatial augmented reality systems are deployed, these three instruments are often used in combination to provide a holistic assessment of visitor experience. SUS indicates whether visitors are able to access and operate the system effectively, UEQ reveals whether the experience is perceived as engaging and intelligible, and NASA TLX captures whether narrative comprehension and spatial navigation impose excessive cognitive or attentional load [23,24].

Presence and immersion are considered central qualities of spatial augmented reality experiences, even in the absence of head-mounted displays. Visitors frequently report a sense of being enveloped by the projected environment or of having stepped into an additional spatial layer applied to a physical model, surface, or architectural feature. To assess these experiential qualities, a limited number of studies employ established presence-related scales or develop custom measurement batteries that probe feelings of absorption, spatial situatedness, and co-presence with others sharing the space [27,39]. Learning outcomes and knowledge retention are evaluated using two primary categories of instruments. The first includes structured assessment formats, such as multiple-choice or short-answer questions, which target factual recall or relational understanding aligned with the intended narrative content. The second involves free-recall measures, where participants are asked to describe what they remember, with responses subsequently scored based on the identification of core concepts or key ideas. Less frequently, concept-mapping and sorting activities are used to assess learning, yet these approaches offer particular value in exhibitions focused on processes, systems, or complex relationships, as they can reveal how visitors organize and integrate information [40,52].

Qualitative data is critical as SAR is social and interpretive. Observational field notes document where people look, who leads the interaction, and when confusion arises. Brief intercept interviews record quotes assessing whether the narrative made sense, what historical facts could be separated from curatorial commentary, and whether it felt like an authentic experience in sacred or sensitive spaces. Expert assessments from curators, conservators, and educators often remark on authenticity, tone, and conservation-friendly argument. These do not replace visitor studies but carry more weight when an assessment needs to be made for visual spectacle versus academic nuance.

Across the reviewed literature, several recurring themes can be identified. Studies consistently report that the experiential qualities afforded by spatial augmented reality lead to higher levels of visitor engagement and perceived presence compared to textual panels or static representational media. SAR is also shown to support improved recall, particularly in cases involving spatially complex information, such as architectural staging, construction processes, or iconographic relationships embedded within physical space. At the same time, multiple works note that cognitive load can become excessive when linear narrative structures are imposed onto exploratory or wandering visitor behavior, or when digital overlays compete visually with the surrounding environment. To mitigate this effect, effective SAR designs tend to structure information into smaller, digestible narrative units, slow down visual transitions at critical interpretive moments, and position labels or annotations in close proximity to the physical features they reference. These strategies are associated with reduced workload demands and improved comprehension. Beyond individual cognition, SAR installations also demonstrate clear social benefits. Engaging SAR experiences are associated with increased conversational exchange, more frequent gestural pointing, and stronger shared recall among visitors, who are more likely to recount the experience to one another during and after the visit. Such socially mediated meaning-making aligns with museum practices that increasingly prioritize collaborative and group-based learning over strictly individual or family-centered modes of interpretation [13,28,36].

However, significant gaps remain. Longitudinal retention is rarely studied, and baselines are often weak: direct comparisons with non-projection alternatives presenting the same content are missing, sample sizes are occasionally too small to support statistically significant claims, and accessibility and inclusivity are reported unevenly. Few studies recruit participants across age, disability, or language subgroups, and fewer still test alternatives to above-elbow gesturing or consistently provide captions and audio descriptions. Outdoor installations at civic scale remain poorly assessed due to permitting constraints, time pressure, and noise [34,47,49].

3. Reflection

3.1. Historic Context and Trajectory (2010–2025)

The timeline presented in Figure 8 summarizes the evolution of projection-based augmentation, from early civic façade installations in the 2010s to the more multimodal, participatory, and evidence-oriented forms of spatial augmented reality (SAR) observed in the mid-2020s. Developments shown above the timeline highlight the main technological and design shifts of each period, including the use of high-lumen projections for public outreach, the integration of RGB-D sensing and ultra-short-throw projectors, the emergence of interactive workstations and spatial scenography, the turn toward phygital experiences during the COVID-19 pandemic, and the growing emphasis on evaluation- and design-driven frameworks from 2022 onward.

Below the timeline, broader changes in cultural practice and interpretation are shown as parallel developments. These shifts reflect a move away from short-lived visual spectacle toward more context-sensitive interpretation, from linear storytelling toward dialogic and responsive mediation, and from individual encounters toward shared and socially distributed meaning-making. Key projects are placed as reference points along the timeline to indicate their relative position within these developments. Early examples include civic and reconstruction-oriented works such as Perspective Lyrique (2010) and Lighting the Sails (2010–2012). Later phases include room-scale interactive installations, such as Lozano-Hemmer’s Zoom Pavilion (2015), museum-based scenographic and maquette projections (e.g., the Mastic Museum), and large-scale façade installations developed by 2025, including Casa Batlló—Arborescent. More recent entries point toward emerging telepresence-based heritage applications that are still in early stages of exploration.

Projection-based augmentation emerged in the 2010s as a striking public medium. Façade video mapping became increasingly popular at city festivals and heritage celebrations, and high-lumen projectors brought buildings to life using cinematic footage. During this embryonic stage, novelty, visual artistry, musical harmony, and the ephemeral nature of performance took precedence. Cultural content was present (founding myths and civic anniversaries), but much of it was a one-sided presentation with limited interpretive potential and no interaction beyond gazing with others. Technologically, this era was marked by cheaper laser projectors, powerful media servers, and geometric warping software that made large-surface mapping accessible to cultural institutions and municipalities alike. Notable examples of that era include the work of [5], which presented an early façade-based heritage interpretation, and the work of [16], which demonstrated narrative projection for civic audiences, even if interpretive depth was still limited. It was at this time that the boundaries of cultural SAR were set not by contact or teaching but by mere visibility, technological feasibility, and a nascent array of audience expectations for performance and societal storytelling.

Two developments marked the field's shift in 2012–2016. The first was RGB-D sensing, enabled by Kinect-type cameras. Both tabletop and object-scale SAR emerged in galleries and makerspaces. Designers could pinpoint where to project onto physical models, maps, and artifacts, register the projection with depth information, calibrate, and estimate where a hand or token was located on a 2D plane. Museum “stations” became a staple, in which a scale model or relief map serves as a narrative interface that reveals layers of history, rituals, or infrastructure through projected overlays and simple interaction. Notable examples of works with these features include [7], which used depth sensing and gesture with holographic projection, and [12], which combined depth capture, tangible manipulation, and projected overlays. For the first time, SAR moved from spectacle to spatial pedagogy, enabling interpretive and process nuance and collective comprehension of a common spatial reference.

The second was ultra-short-throw (UST) optics, which brought SAR to ordinary rooms without the need for long throw distances or heavy rigging, hence facilitating pop-up implementations and more permanent gallery scenography. Notable examples include [9], which demonstrated compact multi-projector museum scenography. UST optics reduced installation barriers, allowing museums to integrate SAR into regular exhibition cycles and to favor repeatable scenography rather than single-night events.

From 2016 to 2019, cultural institutions began to pair spectacle with research. Architectural mappings of churches and archaeological sites began to carry interpretive narratives, for example, polychrome revivals, iconographic interpretations, and the history of restorations. In museums, interactive SAR stations and projection-mapped models made complex spatial heritage tangible and embodied. Simultaneously, mobile AR for heritage matured and design vocabularies emerged (virtual guides, time-portals, treasure-hunts). Although these were delivered via phones, the patterns fed back into narrative development and assessment processes for SAR. Works with these characteristics include [8], which projected chronological layers onto archaeological remains, and [13], which demonstrated spatial narrative layering and interpretive structuring. During this phase, SAR became increasingly hybridized with XR discourses, broadening its methodological vocabulary and creating expectations for evaluation, presence, and interpretive rigor.

The pandemic in 2020–2021 marked a conceptual turn. With head-mounted devices impractical to share in a gallery setting and audiences increasingly online, the conversation shifted to phygital “on-life” environments. Projection, spatial audio, and light in galleries and theaters created scenarios where the room became the interface, and stories could unfold socially without wearables. Inclusion and accessibility became part of the conversation at this point. Projects sought to merge SAR scenography with participatory making efforts, while educators used sandboxes and tangible SAR to support collaborative learning among younger audiences. The objective was not just to present culture but to help the public rehearse and co-create it in space. Notable examples of this trend include the works of [30], who show how projection-based XR supported inclusion and collective exploration, and [15], which demonstrated collaborative, place-grounded storytelling and tangible interaction during a period when device-free experiences were essential. This period also accelerated a shift toward low-friction, hygienic, device-free interfaces, reinforcing SAR’s comparative advantage over HMD-based systems in multi-user cultural environments.

Since 2022, research on SAR for arts and culture has shown increased methodological rigor. Cultural projects in the wild increasingly report findings back to the field and use standardized HCI and UX evaluation instruments [54], triangulated with interviews and observations. Findings converge on the observation that SAR increases engagement and presence, and often supports recall of spatially complex material. Yet, cognitive load increases when linear narratives clash with nonlinear visitor exploration or when densely symbolic projections fall on moving bodies. Designers respond with staged disclosure of projected content, clearer wayfinding cues, and tangible manipulatives to ground attention. For example, in the work of [22], visitors’ recall and cognitive processing were directly correlated with narrative pacing and calibrated projection. This stage marks a transition from demonstration-oriented deployments to a research-based design process involving SAR usability and user-centered metrics, cognitive load assessments, and visitor trajectory data.

In the mid-2020s, advances emerge along three lines. First, multi-projector auto-calibration and reliable edge blending stabilize for large rooms and façades, so semi-permanent or portable cultural applications become viable, such as [42], whose multisensory XR pipeline relies on robust projection calibration. Second, vision- and machine-learning-based hand tracking, similar to MediaPipe-class methods, enables gesture as meaning rather than mere cursor navigation, linking physical movement to narrative semantics, as seen, for example, in [7], which prototyped meaning-driven full-body interaction. Third, authoring pipelines become more collaborative, as curators, educators, and community members can add their own media and annotations, versioned and scheduled like exhibitions. Examples include the works of [28], which outline participatory pipelines, and [53], which formalize co-design frameworks for cultural experiences. Collectively, these developments position SAR as a mature modality capable of supporting participatory, distributed, and co-located cultural learning rather than merely augmenting surfaces.

Over the last couple of years, we have been witnessing trends in two dimensions.

The first is co-creation and participation, which are becoming a well-established approach. Community workshops, as reported for instance in [31], empower visitors to shape narrative sequences, and art-therapy or social-healing initiatives position projection not just as an arbiter of official narratives but as a champion for marginalized voices, as demonstrated in [45], which deals explicitly with community memory and cultural sustainability. This trend signals a shift toward polyvocal and community-grounded modes of interpretation, aligning SAR with participatory museology and inclusive heritage practices.

The second is telepresence (or tele-absence) overlaid on in-situ narratives, which brings distributed publics together to annotate, assess, and re-evaluate urban heritage collaboratively, as exemplified in [44], where remote participants co-engage with site-based narrative layers. Such developments broaden SAR beyond local audiences, repositioning it as a medium for cross-location cultural discourse and collaborative sense-making.

Across this timeline, a clear trajectory emerges. The field moves from ephemeral spectacle to situated pedagogy, from individual content delivery to socially constructed meaning-making, and from ad hoc creation to frameworks and metrics that help cultural institutions pursue intentional, evidence-guided design. The resulting design dimensions include object/room/façade scales, passive/guided/co-creation modalities, and narrative-dominant/site-dominant staging logics.

Overall, the evolution of SAR in arts and culture traces a developmental history toward an increasingly stabilized relationship among technological application, narrative fidelity, audience accessibility, and empirical evaluation, which consolidates the practice as a more mature design and research field.

3.2. SAR Narrative Design Patterns

Patterns of narration indicate both where a story is staged across the real fabric of space and how visitors are invited (or not) to follow along (see, for instance, [37,38]). In SAR, these patterns are shaped by the affordances of place. A façade, a gallery room, and a tabletop all suggest different trajectories, pacing, and modes of engagement. The following patterns do not constitute templates to be followed strictly. They are recurring solutions that can be orchestrated, adjusted, and scaled as long as the integrity of the relationship between narrative and site is sustained. They also function as a shared vocabulary across the literature, allowing cultural designers to link physical scale, interaction modality, and interpretive intent into coherent storytelling strategies.

Historized Outdoor Spaces. This pattern uses monuments, façades, and archaeological remains as staging devices for public storytelling. Typical intents include revealing a site’s origin, emphasizing formative moments, and remediating group relationships with communal memories. Alignment requirements include a multi-projector array, strong figure-to-ground contrast, and legibility within active viewing areas. Since these audiences are transient, standing, and mostly unfamiliar with the details of the place, stories are more effective when they are structured like the chapters of a book, with patterned repetition and metaphorical images that can be understood in one sitting [5,16,43]. Performances in this pattern therefore tend to be more horizontal, take time to develop, and remain visually comprehensible for large groups whose members stay for varying lengths of time. Recent media-architecture projects extend this pattern by embedding long-term, civic-scale narratives into building façades, such as the Longue durée installation, which stages heritage interpretation through large-scale, temporally layered projection on an urban structure, as seen in Figure 9 [26].

Transplanted Tabletop Storyworlds. In this pattern, the narrative unfolds on a physical map, scale model, or grouping of artifacts. The intended outcome is to clarify complicated spatial and process relationships at arm’s length. Projection reveals flows, stratifications, and cause-and-effect relationships rarely conveyed by text panels alone. Tangible tokens or easy-to-use selectors allow visitors to choose themes. Strengths include precision and focused energy around a small surface, which can promote conversation and pointing. Risks include hidden states, cluttered overlaps, and shadows cast by hands. Best practices include consistent token-to-layer mappings, non-verbal cues that signal how to begin, strong figure-to-ground contrast, and progressive reveals that let groups stop after a light tour or drill down if time permits (this pattern is evident in tabletop and sandbox-style systems such as [7,11,12]). An example of this pattern in a museum context is the Mastic Villages installation at the Mastic Museum of Chios, where a projection-mapped maquette is used to convey intangible cultural heritage through layered narratives, as seen in Figure 10 [15].

Enhanced Indoor Rooms. Galleries and black box spaces become dynamic scenography. Multi-projector blends, spatialized audio, synchronous lighting, and sometimes scent/wind create a space that performs the narrative. The room itself becomes the interface, so circulation and sightlines matter as much as pixels. Typical stories are chaptered and rhythmically paced, with breathing room for reflection or discussion opportunities. Interactivity ranges from guided gazes to walkable timelines to collectively tangible tools. Strengths include social presence and sustained engagement without devices in hand. Risks include calibration lag, cognitive loading if too many projections compete for attention, and accessibility concerns. Stable auto-calibration, staged pacing, and clear wayfinding render the experience legible for repeated showings and for diverse audiences (see, for instance, [1,9,14,42]). Gallery-scale SAR scenography with synchronized projection, light, and sound, as in the Kirini installation seen in Figure 11 [14], where the entire room becomes a narrative interface, exemplifies this pattern by turning circulation and sightlines into core design parameters.

Virtual Guides and Embodied Agents. A guide character appears as a projected form or audio presence that cues attention to specific features in the space. In galleries, the guide can be mapped to a wall or scrim; it may “possess” a niche or statue to speak in the first person. The level of stylization matters for both attention and comfort. More cartoonish or stylized agents are easier to follow and accept than near-photo-realistic guides; spatialized audio supports attention to the speaker. Strengths include clarity and companionship. Risks include uncanny valley reactions and over-emphasizing the guide at the expense of the place. Keep the guide contextually embedded, avoid expository monologues, and include moments where the place takes over (examples of such guiding and embodied agents can be found in [3,39,44,48]).

Time-Layered Reveals and Palimpsests. Many cultural narratives concern change over time. This pattern overlays past, present, and ideally future states on the same surface, for example the polychrome revival of statues, the restoration of lost frescoes, or industrial byproducts presented at various operational stages. The method alternates additive and reductive reveals so viewers can compare states without losing perspective. Strengths include cognitive clarity about sequence and causality; risks include temporal disjunctions and visual fatigue if transitions are too rapid. A clear temporal legend, anchors that persist across states, and reversible toggles help visitors build a mental map rather than experience the sequence as sleight of hand [8,43,51].

Gamified Exploration and Role Play. Visitors work through tasks, clues, and roles that unlock narrative pieces. Indoors, the experience is organized around stations; outdoors, around suitable points of interest and safe pathways. Strengths include motivation and the memorability of active engagement. Risks include collection mechanics that replace meaning, frustration due to misalignment or unclear rules, and recurring tracking errors when too many moving parts undermine clarity, unless the engagement is kept short. Balance is found when tasks are kept brief, when mechanics relate to cultural intention, and when reflective prompts or end-of-journey recaps reframe the experience as narrative instead of mere scoring (see, for example, [29,36,40]).

Tangible Rituals and Tool Reenactments. Things that belong to a craft or ritual become input devices. A visitor turns a miniature wine press to expose the wine-making process, places a stylus where a scribe would go to write, and positions a hive tool to expose a chapter on beekeeping. Strengths include embodied comprehension with immediate learnability; risks include wear-and-tear, hygiene issues, and the temptation to favor cutesy props over accurate realities. Durable materials, cleanable surfaces, and mappings that follow real processes keep the pattern authentic. This is particularly strong for intangible heritage, where know-how matters as much as objects. Examples of such craft- and ritual-centered projections include [15,45].

Participatory and Co-Created Narratives. Community or visitor contributions emerge as projected voices, alternative pathways, or scheduled sequences. In museums, this may take the form of curated co-authorship with educators or nearby groups; in public spaces, gathered contributions may be voiced back at appropriate times; remote contributions via telepresence may be layered in as well, so that distant publics appear and lend their voices. Strengths include plurality and inclusion; risks include the moderation burden and contributions that obscure provenance or blur fact, memory, and opinion. Clear framing, visible citations of source material, and gentle editorial layering allow dialogic representation to enhance heritage interpretation instead of muddying it. Representative examples include [28,31,48,53].

Composing Patterns. The most successful implementations combine two or three patterns. A façade performance introduces the space through outdoor, historicized storytelling, the gallery then deepens understanding with time-layered reveals, and a tabletop supports tangible exploration of specific themes. The rule of composition is simple: keep an additional pattern only if it decreases cognitive load or increases meaning; if it does neither, remove it. Maintain one narrative through-line that visitors can summarize in one sentence upon exit; that sentence is the test of whether the narrative survives the technology. This multi-layered composition is exemplified in projects and surveys such as [10,26,33].

Selecting a Pattern. Select the lightest pattern appropriate to the site and cultural intention: outdoor historicization for civic myth and identity; tabletop storyworlds for spatial relationships and processes; indoor room scenography for shared inquiry and chaptered interpretation. Add agents when guidance or companionship is important, not by default. Prefer time-layered reveals when change over time is the central message, and bring game loops into play when motivation is key (but not when meaning is already strong). Reserve participatory elements for contexts that can support moderation and where multiple voices are critical to the story. When pattern, place, and purpose align, SAR feels like the site telling its own story, with technology as a subtle stage manager instead of center stage. These selection logics resonate with broader reviews of AR/XR in cultural heritage, such as [27,34,46,47,49].

Figure 12 illustrates eight narrative patterns that result from exploring how stories emerge from and over physical space in SAR. Historized Outdoor Spaces use monuments and façades as public, commemorative storytelling surfaces where memory and place are inseparable. Transplanted Tabletop Storyworlds perform stories on maps, models, objects, and other situated elements that clarify spatial or process relationships. Enhanced Indoor Rooms use scenography in exhibition spaces to position the room as the interface. Virtual Guides and Embodied Agents bring characters or pre-recorded voices into the space to draw attention to an area and retrospectively frame understanding. Time-Layered Reveals and Palimpsests allow visitors to experience change over time by moving gradually between states. Gamified Exploration and Role Play let stories unlock across spaces and through objectives, inviting the visitor to piece together the information through clues or tasks. Tangible Rituals and Tool Reenactments use physical craft objects or tools as input devices that unlock embodied knowledge. Finally, Participatory and Co-Created Narratives seek visitor or community input, allowing for plural and dialogic modes of interpretation. Together, these patterns relate narrative motivation, spatial opportunities, and audience engagement to define the narrative potential of SAR.

3.3. Inclusion and Accessibility

Accessibility in spatial augmented reality must address visual, auditory, motor, cognitive, linguistic, and social differences at both the content and interface levels [22,35]. From a visual standpoint, projected information should remain legible under real-world conditions through large font sizes, adequate line spacing, strong contrast, and short textual units. Long sentences and dense text should be avoided—particularly on uneven or textured surfaces—in favor of concise captions, iconography, and pictograms that can be parsed from a distance. When narration is present, layered captions and optional audio-description tracks should be available through low-friction access mechanisms such as QR codes or handheld receivers. Light transitions should remain gradual, conservation limits for lux and flicker must be respected, and any strobing effects should be clearly signposted with accessible exit routes [22,35].
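The "strong contrast" requirement can be given a rough numeric check. The following is a minimal sketch that applies the WCAG 2.x contrast-ratio formula to a caption colour and a sampled surface colour. It rests on several assumptions: WCAG thresholds were defined for screens rather than projected light, the surface colour is a hypothetical sample (for example, taken from a camera frame), and ambient light and surface texture are not modelled. The 4.5:1 default is the WCAG AA minimum for normal-size text and is used here only as a starting point.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB colour, between 0.0 (black) and 1.0 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio; the order of the two colours does not matter."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_minimum(fg, bg, minimum: float = 4.5) -> bool:
    """True if the caption colour reaches the chosen minimum ratio.

    The default of 4.5 is the WCAG AA value for normal text on screens;
    whether it is sufficient for a projected caption is an assumption.
    """
    return contrast_ratio(fg, bg) >= minimum
```

A check of this kind could run during content review for each caption and target surface, with on-site legibility tests remaining the final arbiter.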

Auditory and voice-based modalities are central to inclusive SAR, both for accessibility and for the representation of multiple perspectives. Captions should consistently accompany spoken content, and meaning should never rely solely on audio delivery [27,30]. Voice can also support plural and contested narratives through curated quotations, attributed annotations, and scheduled alternative interpretations that distinguish between documented history, collective memory, and interpretation. Where appropriate, co-creation processes can yield short narrative contributions that are reviewed and integrated by curatorial teams, allowing disagreement and plurality to be acknowledged rather than flattened [28,45,48,53].

Motor accessibility requires interaction techniques that do not depend on fine motor control, sustained arm elevation, or precise positioning. Reachable interaction zones, alternatives to mid-air gestures, and stable, graspable tangible elements should be provided. Seated perspectives must be fully supported, and spatial layouts should ensure step-free access, sufficient circulation space, and clear sightlines so wheelchair users and mobility-impaired visitors can engage without obstruction or pressure from surrounding crowds [27,30].

Cognitive accessibility benefits from clear narrative structure, limited simultaneous layers, and strong spatial anchoring between content and the physical features it references. Overly dense symbolism or parallel timelines can increase cognitive load and disorientation. Information should therefore be presented in manageable units, with clearly signposted choices and simple ways to return to a main narrative thread. Short recaps, summary panels, or educator modes with pause points can support comprehension, reflection, and memory without penalizing visitors who disengage or rejoin at different moments [36,40,52].

Because SAR is inherently shared, interaction design should support turn-taking, conversation, and collective interpretation. Interaction should not privilege a single user while others wait, and larger installations should provide multiple points of engagement so small groups can participate simultaneously. Viewing and listening areas must remain accessible for both seated and standing visitors, avoiding visual occlusion and acoustic shadows [6,15,17].

Finally, accessibility extends to operational practice. Staff procedures, calibration routines, assistance protocols, and crowd-management strategies must support inclusive use without foregrounding technical mediation. When accessibility is treated as part of the narrative and spatial design contract, SAR installations become more legible, welcoming, and effective, not measured solely by novelty or spectacle, but by who can participate, understand, and share the experience without barrier [8,23,27,35].

3.4. Design Tensions, Practical Constraints, and Recommendations

The design and deployment of spatial augmented reality in cultural contexts are shaped by a series of recurring tensions that arise from the coexistence of technological possibility, curatorial responsibility, and the operational realities of public space. Projection-based SAR systems must simultaneously engage broad audiences and uphold scholarly rigor, accommodate non-linear patterns of visitation while conveying coherent narratives, and balance sensory impact with legibility, conservation, and accessibility. This section articulates a set of practical yet theoretically grounded trade-offs that repeatedly emerge in cultural SAR practice, ranging from spectacle versus scholarship and shared experience versus personalization, to robustness, authorship, and long-term sustainability. Rather than presenting prescriptive solutions, the discussion frames these tensions as design constraints that require early recognition and continuous negotiation, drawing on documented examples to illustrate how narrative integrity, inclusivity, operational resilience, and ethical responsibility can be maintained without undermining experiential quality.

Spectacle vs. scholarship. Projection is a source of awe and can garner public engagement, but cultural spaces are responsible for factual integrity and compassion. If a piece is sequenced for visual appeal rather than for the evidence, participants are entertained but misled. The remedy is narrative integrity: draft the curatorial through-line early on, delineate sources and argument, use visuals in a supporting role, balance applause-seeking moments with quieter ones, cite sources in captions, use the curator’s voice when appropriate, and, when time is short, cut effects rather than meaning. Examples of this trade-off appear in façade-based works such as [5], where high-impact spectacle required careful historical framing, and in archaeological mappings such as [8], which balance visual revelation with scholarly accuracy.

Linear story vs. nonlinear space. Audiences (users or visitors) chat and arrive mid-story; a linear narrative rarely plays through to completion for everyone. Therefore, write in chapters that loop. Each chapter should have a clear entry and exit point, anchored by 2–3 highlighted landmarks to help latecomers get up to speed. Combining progressive disclosure for those on the move with summative moments at the end helps those who exit early leave with a workable overall understanding. This approach is exemplified in gallery installations such as [9], where modular chapters supported circulating visitors, and in participatory stations like [19], where visitors could join and leave without losing narrative meaning.
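The looping-chapter structure can be sketched as a small scheduling helper. This is an illustrative sketch only: the `Chapter` fields, the chapter titles, durations, and landmark names are all hypothetical and are not taken from any reviewed system. Given the time elapsed since the show started, the helper reports which chapter a latecomer walks into and how long until the next clean entry point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chapter:
    title: str
    duration_s: int               # assumed to be a positive number of seconds
    landmarks: tuple[str, ...]    # 2-3 anchors highlighted for latecomers


def locate(chapters, elapsed_s):
    """Return (current chapter, seconds until the next chapter starts)."""
    if not chapters:
        raise ValueError("a loop needs at least one chapter")
    loop_length = sum(c.duration_s for c in chapters)
    position = elapsed_s % loop_length      # the show repeats indefinitely
    for chapter in chapters:
        if position < chapter.duration_s:
            return chapter, chapter.duration_s - position
        position -= chapter.duration_s
    raise AssertionError("unreachable when all durations are positive")


# Hypothetical three-chapter loop (270 s in total).
LOOP = (
    Chapter("Origins", 90, ("portal", "bell tower")),
    Chapter("Restoration", 120, ("fresco", "north wall", "altar")),
    Chapter("Today", 60, ("courtyard", "inscription")),
)
```

A show controller could use the returned wait time to decide whether to display a short "next chapter begins soon" cue or to highlight the current chapter's landmarks immediately.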

Attention vs. overload. SAR illuminates pre-existing environments. However, large swaths of text projected onto non-planar surfaces, fast cuts, and simultaneous overlays overwhelm the visual field and increase cognitive load. Project short text, favor pictograms and callouts placed close to the items of interest, use slower pacing during crucial moments, and use no more than one moving layer for emphasis within a visitor’s field of vision. If there is interactivity, make the gesture or physical mapping semantically transparent, so that effort goes into learning rather than into deciphering the technology. UX-load issues have been documented clearly in [22], where pacing and label placement influenced comprehension, and in mobile AR studies like [24], where excessive overlays increased workload.

Brightness/legibility vs. conservation/sacred etiquette. Cultural surfaces have specific lux limits, and sacred settings have their own etiquette. Create a lighting budget per scene, including idle states; use high-contrast palettes and local highlights rather than global washes; and avoid flicker, strobing, and overpowering color cycling in sacred spaces. When in doubt, reduce dwell time to allow for a still screen to read. These constraints appear in façade and heritage projects such as [20], where excessive brightness had to be balanced with architectural sanctity, and in museum contexts in [35], where lighting variation shaped comfort.
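A per-scene lighting budget can be approximated as cumulative exposure in lux-hours (illuminance multiplied by time), summed over scenes and projected over a year of operation. The sketch below assumes that illuminance on the surface has been measured per scene; the scene names, lux values, loop counts, and the annual limit are hypothetical placeholders, and any real limit must come from the responsible conservator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    name: str
    lux_on_surface: float     # measured illuminance on the sensitive surface
    seconds_per_loop: float   # how long the scene is lit in one show loop


def lux_hours_per_loop(scenes) -> float:
    """Cumulative exposure of one full loop, idle states included."""
    return sum(s.lux_on_surface * s.seconds_per_loop for s in scenes) / 3600.0


def annual_exposure(scenes, loops_per_day: int, open_days_per_year: int) -> float:
    """Projected yearly exposure in lux-hours."""
    return lux_hours_per_loop(scenes) * loops_per_day * open_days_per_year


def within_budget(scenes, loops_per_day, open_days_per_year,
                  annual_limit_lux_hours: float) -> bool:
    """Compare the projection against a limit supplied by conservation staff."""
    exposure = annual_exposure(scenes, loops_per_day, open_days_per_year)
    return exposure <= annual_limit_lux_hours


# Hypothetical show: a dim idle state, a bright reveal, and a calmer recap.
SCENES = (
    Scene("idle", 20.0, 600.0),
    Scene("reveal", 150.0, 180.0),
    Scene("recap", 80.0, 120.0),
)
```

With these placeholder values, one loop contributes 13.5 lux-hours, which makes it easy to see how dimming the idle state or shortening the reveal changes the yearly total.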

Robustness vs. complexity. Multi-projector setups, depth sensors, and custom props increase failure points. Cultural operations need reliability that lasts beyond opening night: use stable optics and mounts, auto-calibration routines, and content that tolerates small drift. Daily power-up and shutdown should be covered by a staff checklist. Keep spare parts for critical sensors and props; it is better to anticipate failure when one cannot afford the time lost to more than one failure point. The simplest solution that tells the story well is the best one. This challenge is highlighted in multisensory XR setups like [42], which emphasize reliability across components, and in hybrid SAR–VR systems such as [38], which require stable cross-modal alignment.

Co-presence vs. personalization. SAR works best when everyone is looking and talking simultaneously; however, many participants (users or visitors) request personal control or language changes. Offer low-tech personalization that does not fragment the joint scene, e.g., a QR code that toggles specific captions or audio description on a personal device, or multilingual handouts that follow the same chaptering structure as the projection. Make personal devices optional. This tension is evident in [17], which highlights overlapping needs of different visitor profiles, and in participatory systems such as [36], where co-creation must remain shared rather than fully individualized.

Authentication/provenance/contested histories. Spaces lend authority to what is presented within them; if the space is traumatic, absent, or controversial, however, an appearance of narrative objectivity can itself become a misrepresentation. Cite sources, distinguish reconstruction from provenance, credit community members’ voices, schedule multiple perspectives (where applicable) and identify them as perspectives instead of facts, and keep an editorial log so changes over time are evident. This concern is central in works such as [45], which deal explicitly with cultural memory and authenticity, and in telepresence reinterpretations such as [44], where multiple publics co-author meaning.

Accessibility vs. production realities. Tight production schedules encourage teams to skip captioning, audio description, reach testing, and alternatives to overhead gestures. Bake accessibility into the definition of done: set minimums for font size and contrast; account for seated positions so that they have no blind spots; provide tangible controls that help visitors with limited mobility; and, if gestures are included, offer an alternative tangible or floor-based trigger. Accessibility-centered SAR can be seen in [30], which integrates mobility and cognitive accessibility considerations, and in [27], which emphasizes multisensory and inclusive design frameworks.

Operational realities of public space. Outdoor projections have weather, light pollution, and traffic safety to consider. Secure appropriate permits well in advance; create viewing areas that do not obstruct roadways or pathways; keep sound levels from disturbing residents; provide a degraded mode for wind or a partially broken projector (e.g., a reduced list of chapters that still makes sense); and plan for cleanup, security, and an ending that is not so abrupt that it leaves the monument darkened or glossed over. These challenges parallel those addressed in civic-scale media architecture, such as [26], where environmental constraints shape operational decisions.
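The degraded mode mentioned above can be sketched as a simple filter over the show's chapters. This is a minimal illustration in Python, not a method taken from the reviewed systems: the chapter names, the projector identifiers, and the `playable_chapters` helper are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical show definition: each chapter lists the projectors it needs.
# Chapter names and projector IDs are illustrative assumptions.
CHAPTERS: Dict[str, Set[str]] = {
    "origins": {"P1"},
    "construction": {"P1", "P2"},
    "today": {"P2"},
}


def playable_chapters(available: Set[str]) -> List[str]:
    """Keep, in show order, only the chapters whose projectors are all available."""
    return [name for name, needed in CHAPTERS.items() if needed <= available]


# With projector P2 down, only chapters that rely on P1 alone remain.
reduced_show = playable_chapters({"P1"})
```

A filter of this kind only guarantees that the remaining chapters can be displayed; whether the reduced sequence still makes narrative sense remains a curatorial decision that should be checked in advance.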

Rights/data/privacy. Archival images, community videos, and interaction metrics must respect legal and ethical constraints: rights must be cleared before projected items are repurposed; consent for recorded contributions must be visibly documented; data retention limits must be observed; and, if QR codes are used to obtain feedback, it must be made clear what is kept and for how long. Participatory projects such as [31] highlight consent and provenance management when integrating community-generated content.

Authoring at scale. Museums and municipalities must be able to update content without reopening engineering work. Decouple content from mapping logic (wherever possible) so that a lightweight CMS for captions, timelines, and media playlists can feed the mapped elements; keep versioned and localized text outside the mapping project so curators can iterate; and document the pipeline so future teams can understand the project over time. Workflows of this nature are described in [53], where co-design pipelines formalize editable, curator-friendly structures.
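One way to picture this separation is a small, curator-editable content table keyed by the identifiers of mapped surfaces, which the mapping project only reads. The Python sketch below is illustrative: the surface identifiers, the language codes, the fallback rule, and the `resolve_caption` helper are assumptions and do not describe any of the reviewed systems.

```python
from typing import Dict

# Hypothetical curator-editable content, kept apart from mapping/warping logic.
# Keys are IDs of mapped surfaces; values hold captions per language code.
CONTENT: Dict[str, Dict[str, str]] = {
    "facade_left": {"en": "The workshop in 1900", "fr": "L'atelier en 1900"},
    "table_map": {"en": "Trade routes"},
}

DEFAULT_LANG = "en"  # assumed fallback language


def resolve_caption(surface_id: str, lang: str) -> str:
    """Return the caption for a mapped surface, falling back to the default language."""
    captions = CONTENT.get(surface_id, {})
    return captions.get(lang, captions.get(DEFAULT_LANG, ""))
```

Because the mapping side only calls `resolve_caption`, curators could revise or translate the table without touching calibration or warping settings.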

Budget/energy/sustainability. High-impact projection consumes power and lamp or laser hours: estimate the energy used per hour of deployment and plan accordingly; favor fewer, more powerful visuals over constant underwhelming movement; reuse rigs and meshes from one program to the next; and create modular props that can be reskinned for the subsequent exhibition. Sustainability considerations echo arguments made in [20], which encourages mindful energy use in technologically intensive cultural exhibitions.
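The energy estimate suggested above is simple arithmetic: power in watts multiplied by hours of operation, divided by 1000, gives kilowatt-hours. A minimal Python sketch follows; the function names and the example figures (two 1200 W projectors, three hours per night, thirty nights) are hypothetical and chosen only to show the calculation.

```python
from typing import Iterable


def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kilowatt-hours for a device drawing power_watts for the given hours."""
    return power_watts * hours / 1000.0


def season_energy_kwh(projector_watts: Iterable[float],
                      hours_per_night: float,
                      nights: int) -> float:
    """Total energy for a rig: per-night energy of all projectors times the number of nights."""
    per_night = sum(energy_kwh(w, hours_per_night) for w in projector_watts)
    return per_night * nights


# Hypothetical rig: two 1200 W projectors, 3 hours per night, 30 nights.
estimate = season_energy_kwh([1200.0, 1200.0], 3.0, 30)
```

For these assumed figures the estimate comes to roughly 216 kWh for the season; media servers, audio, and lighting would need to be added for a complete budget.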

Interaction in SAR. One best practice sums up the rest: use the least invasive form of interaction that effectively tells the story for the site in question. If a chaptered guided gaze is enough, do not add gesture; if a single tangible is enough to convey the conceptual theme, do not add a panel; if bodies can engage without using their hands, let the space guide them. The more SAR works for the site and story instead of competing with them, the more natural and memorable a social experience it will be. Examples include the single-token, low-friction designs in [15] and the gesture-based but semantically simple interactions in [7].

More general practical advice is to use the smallest technological footprint necessary to fulfill the cultural purpose, then let bodies, light, and sound do the heavy lifting. When tensions are understood from the very beginning and managed throughout, technology fades into the background, operations stay composed, and the space has something to say.

4. Discussion

This review critically synthesizes the current literature on cultural SAR and offers a preliminary assessment of how scale, interaction, and narrative are interconnected and how well each is substantiated with evidence of audience impact. Across the studies selected, a number of overarching findings and tensions emerge beyond case-by-case results. This section synthesizes our findings on how physical scale connects with narrative structure, which means of interaction are most popular (and why), how SAR is currently assessed in situ, and where discrepancies or gaps in the literature exist.

4.1. Physical Scale, Narrative Form, and Spatial Affordances

The first major finding relates to the interconnectedness of physical scale and narrative form. Object/table SAR tends to favor narratives of close investigation and process explanation; audiences explore maps, models, or artifacts in close proximity, which supports complex spatial or procedural mappings [7,12,13]. At this level, projection conveys progression or comparison more distinctly than a static label or screen. Moreover, tangible/token-based interaction makes cause-and-effect mappings clear for family and school-aged groups.

Room- and gallery-scale SAR, however, seems to more often favor dialogic collaboration and chaptered interpretation [9,15,33]. The interface itself (the space) operates as a means for joint inquiry as visitors move, point, or even gesture together within the immersive audio-visual space. Multi-projector set-ups, spatialized audio, and lighting control enable scenographic framing that can promote a more structured, yet fluid, narrative pace. The room scale is best suited for narratives that favor multi-perspectival or staged reflection, since audiences can see both the content and each other.

Finally, building/outdoor-scale narratives are more often aligned with civic myth and contested identity, especially when projected on façades, archaeological contexts, and landscapes [5,16,43]. Such projections engage large, diverse, often transient populations with mixed levels of prior knowledge of, and relationship to, the history in question; thus, the narrative must favor bold figure-ground differentiation, episodic arcs that can be joined mid-course, and symbolic points of reference legible from afar. While this scale can be emotionally compelling and a civically dominant gesture, it simultaneously risks oversimplifying complex or contentious histories if not properly framed.

Taken together, these findings suggest an overarching guiding principle: scale is not a neutral container but rather a narrative decision. Object/table SAR works well for detailed investigatory narratives; room-scale SAR works well for dialogic collaborative understanding; façade/outdoor narratives work well for public myth and civic memory. To attempt otherwise (for example, staging a nuanced debate in a transient façade show) risks cognitive overload or interpretive oversimplification.

4.2. Interaction Modalities and Preferred Engagements

The second major finding relates to interaction modalities. Across the corpus, passive guided gaze and tangible/token-based interaction are by far the most cited, followed by gesture and whole-body input, walk-through/locomotion, and finally co-creation/participatory modes. Voice/speech-based interaction is almost entirely absent from cultural SAR applications, where interaction unfolds through the physical, spatial, and visual domains.

Passive guided gaze is the baseline, especially in façade-scale and room-scale applications, where conservation concerns, crowd control, and sacred spaces prevent direct interaction [5,15,16]. Designers use temporal pacing and compositional emphasis to direct attention to selected elements without requiring explicit input, which keeps friction low regardless of audience density or prior familiarity. This mode is most often preferred where a respectful distance should be maintained or where the intention is more contemplative than active problem-solving.

Conversely, tangible/token-based interaction dominates table contexts, as props, dials, and tagged objects sit in close proximity to their referenced projections on a common table or projected surface [7,12,13]. These are preferred because they are easy to find (no app has to be located), easy to negotiate collaboratively around a shared surface, and do not engender the “heads-down” disconnect typically associated with personal devices. They also allow for content mappings that remain stable and legible across repeat visits with varied communities.

Gesture/whole-body input is used sparingly and often experimentally. While active interaction presents exciting opportunities for engagement, it brings tracking ambiguity, fatigue over time, and “Midas touch” problems when the gesture vocabulary is too extensive to be recognized reliably as input [7,41]. Thus, gesture-based solutions for SAR work best with a limited number of semantically relevant actions, reinforced with audio-visual affordances.

Similarly, locomotion/walk-through input is found mostly at room/gallery scale, where spatial progress serves as a narrative mode [9,40]. These designs capitalize on audience movement as a quasi-narrative device; however, they must allow for non-linear exploration and outcomes without compromising the arc. Finally, co-creation/participatory engagement modes are rarer but offer an exciting opportunity in which community voices and alternative readings become literally projected as part of the narrative [28,31,53].

Overall, the tendency toward passive and tangible modalities suggests little investment in high-friction or complex agency in SAR. The field favors robust, socially legible (but less dynamic) engagement that benefits institutions with conservation concerns and accessibility needs. Higher-complexity modalities (gesture, voice, and multi-device orchestration) are approached with caution or remain under-explored.

4.3. Evaluation Practices, Evidence Gathered, and Existing Gaps

The third theme relates to how cultural SAR is evaluated. An emerging body of literature suggests a triangulated assessment of user experience that combines standardized measures (SUS, UEQ, NASA-TLX) [22,23,24] with interviews and field observation. Thus far, these studies consistently find that SAR yields higher levels of engagement, presence, and recall of spatially complex content than traditional media interventions [7,13,22]. At the same time, they report possible cognitive overload when high-density outputs meet non-linear visitor trajectories or when too much visual information competes for space within one viewport [22,39,52].
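As a worked illustration of one of these standardized measures, the conventional SUS scoring procedure converts ten responses on a 1 to 5 scale into a single 0 to 100 score: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. The Python sketch below follows that convention; the function name and the example responses are illustrative and not drawn from the reviewed studies.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one SUS questionnaire (ten items, each rated 1-5) on a 0-100 scale."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for i, r in enumerate(responses):
        # Items 1, 3, 5, 7, 9 (indices 0, 2, ...) are positively worded: r - 1.
        # Items 2, 4, 6, 8, 10 are negatively worded: 5 - r.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5


# A fully neutral respondent (all 3s) lands at the midpoint of the scale.
neutral = sus_score([3] * 10)
```

Scores are computed per respondent and then averaged across the sample, which is why the small samples noted in the literature make comparisons between installations fragile.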

However, despite these findings, gaps exist. Longitudinal retention and transfer-of-learning studies are few, and most assessments occur immediately or shortly after the visit. Baseline comparisons are not always made, so SAR is rarely tested against high-quality exhibits without projection, and it remains unclear when low-tech approaches or static content could be equally effective. Sample sizes are often small and biased toward interested visitors, and accessibility factors related to age, disability, language, and socio-economic status are inconsistently reported [34,47,49]. Additionally, outdoor/façade SAR interventions are often poorly assessed because of logistical challenges and transient audiences, so reports rely on anecdotal commentary or expert opinion rather than visitor studies.

Thus, the evidence base is strongest for short-term engagement and perceived presence in controlled or semi-controlled indoor settings, and weakest for long-term learning, inclusivity, and community impact. Bridging the gap requires closer collaboration among HCI researchers, curators, and evaluation specialists, as well as consistent reporting of methods and measures.

4.4. Contradictions and Open Tensions

Finally, certain tensions emerge across the literature that remain unresolved or occasionally yield contradictory findings. First, there is a tension between spectacle and scholarship: while spectacle can bring together diverse publics and give the space, and any subsequent interaction, a strong initial appeal (thus potentially enhancing engagement), it can overshadow nuance or flatten contested narratives worthy of critique [5,8]. Some argue that spectacular light shows dilute interpretative rigor; others warn that they might trivialize sensitive places.

Second, there is a tension between linear narratives and non-linear spatial exploration. While designers can readily sequence a narrative as a projected arc with a planned approach, path, and departure, visitors arrive late, leave early, and wander all the time [9,39]. Some studies note confusion or fragmented interpretations when modular structures cannot guarantee access to every part, while others value progressive disclosure as a requisite for effective design. The literature has not yet established stable design patterns that reconcile these competing demands affordably.

Third, there is a tension between co-presence and personal preferences. While device-free, co-present engagement demonstrates how powerful SAR can be, individual visitors increasingly expect personal control (translation, portability, and access via personal devices). Some projects explore personal devices as an optional means of accessing captions or audio descriptions [30,35], but little empirical evidence yet exists on how such options affect social interaction and shared attention.

Finally, there are tensions surrounding participation and authorship. While co-created narratives and community-generated elements strengthen representation through inclusion, they can complicate provenance and editorial integrity by blurring the line between established historical data, memory, and educated speculation [28,45,53]. Few articles offer stable frameworks for navigating these tensions in long-term implementations.

4.5. Toward a Design and Research Agenda for Cultural SAR

A preliminary agenda for future research emerges across these themes. Further research is needed to better gauge: (i) how narrative patterns could be made scale-appropriate and interaction-aware, with minimal cognitive effort and maximal meaningful engagement; (ii) how evaluation could be extended beyond immediate participation to longitudinal learning, equity and access, and community relevance; and (iii) how clearer parameters could be established for co-authorship, contested narratives, and the ethics of speculative narration in projection-based storytelling. Addressing such questions will render SAR not only a cutting-edge medium but also an advanced, applied, situated discipline of practice for cultural attribution and community interpretation.

5. Conclusions

This literature review represents the culmination of ten years of study into cultural SAR. The field has evolved from technological mediation for spectacular perception to an evidence-based field for public-facing interpretation, education, and cultural meaning-making. The 52 studies reveal themes across the scale and means of interaction involved, as well as the narrative intent of successful installations, and together they make a strong argument for treating scale as an authoring decision rather than a purely logistical boundary. Object and tabletop SAR best supports investigative narratives, for projection is intrinsically linked to a material referent. Room- and gallery-scale SAR is best for dialogue-based meaning-making, for co-presence and spatial motion help direct articulation. Façade and outdoor SAR is best for civic mythology, for episodic pacing and large referent symbols support meaning-making in a more casual, expansive, transient situation. These connections suggest that scale is a narrative decision and not a logistical constraint, for cognitive overload and interpretive flattening occur all too commonly when scale and narrative intent fail to align.

Similarly, the means of interaction align well with cultural sensibilities and contextual nuances. Guided gaze (passive) and tangible/token means of interaction dominate due to their robustness, low friction, and social legibility. Gesture-based interaction is in use but limited in frequency, as designers constrain it to reduce fatigue, frustration, ambiguity, and sonic or sensory interference. Participatory and co-creation modes of interaction are the least frequent but represent an important direction toward pluralistic authorship and community-based storytelling. Furthermore, it is notable that voice interaction is almost entirely absent from the reviewed literature; acoustic conditions and cultural sensitivity toward tone matter in heritage spaces, where spatial and visual interaction predominates.

Assessment practices have also matured substantially over the years, as qualitative and quantitative measures triangulate standardized usability (SUS, UEQ), workload (NASA-TLX), and presence (PIR, see Note 51) assessments with learning measures (recall tests), observational measures (field studies), and expert review. Findings suggest that SAR fosters dynamic engagement, spatial cognition, and social interaction; at the same time, they consistently report cognitive overload when symbolic layering is too dense within non-linear patterns. But significant gaps exist: longitudinal learning assessments are rarely made, evidence on accessibility is anecdotal at best, and assessments of outdoor and civic-scale installations are minimal and would benefit from better methodology. This suggests the need for more systematic, comparable assessment in which inclusivity is explicitly addressed.

Ultimately, despite these findings from the existing literature, a number of design tensions remain unresolved (spectacle vs. scholarly focus, linearity vs. non-linearity of narrative exploration, co-presence vs. individualization, and participatory vs. curatorially dictated authorship). They will need to be mediated by an authoring process that ties narrative construction to a cohesive framework based upon spatial affordances, a lexicon of interaction means, and audience diversity.

Thus, this literature review supports cultural SAR as a progressive, socially aware, and methodically expanding medium that situates itself, and those who create it, at the intersection of place, body, and collective meaning-making. Future developments should address the patterns and gaps found here through scalable authoring pipelines, more in-depth learning assessments, and participatory protocols that ethically bring community voices to bear. As cultural institutions increasingly look toward SAR for interpretation, education, and civic discourse, the foundation now exists to move beyond pure prototyping into practical sustainability, with ethical, narrative-centric approaches for public-facing work.

Author Contributions

Conceptualization, P.P. and P.K.; methodology, P.P. and P.K.; validation, P.P. and P.K.; investigation, P.P.; resources, P.P.; data curation, P.P. and P.K.; writing, original draft preparation, P.P.; writing, review and editing, P.P. and P.K.; visualization, P.P.; supervision, P.K.; project administration, P.K. and P.P.; funding acquisition, P.P. and P.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by Industrial PhDs, Greece 2.0, and the National Recovery and Resilience Plan, grant number 0555660.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

Notes

1	Phygital refers to the integrated experience that seamlessly blends the physical and digital worlds. It describes products, services, environments, or interactions where tangible, real-world elements are enhanced, augmented, or extended through digital technologies, such as sensors, interactive media, AI, AR/VR/SAR, or connected platforms, creating a unified and continuous user experience across both realms.
2	AR—Augmented Reality.
3	MR—Mixed Reality.
4	HMDs—Head Mounted Displays.
5	Palimpsests are layered surfaces, texts, or spaces in which earlier forms, meanings, or inscriptions have been partially erased, overwritten, or transformed, yet remain perceptible. The term is used metaphorically to describe systems (cultural, spatial, historical, or digital), where multiple temporal layers coexist, allowing past and present narratives to overlap and inform one another.
6	Edutainment refers to the integration of education and entertainment in experiences, media, or applications designed to engage users emotionally and cognitively while facilitating learning. It combines instructional content with playful, narrative, or interactive elements such as games, storytelling, audiovisual media, and simulations to enhance motivation, comprehension, and knowledge retention.
7	SUS (System Usability Scale) is a standardized questionnaire used to evaluate the perceived usability of a system. It consists of ten items rated on a Likert scale and provides a single score reflecting overall ease of use, learnability, and user confidence. SUS is technology-agnostic and widely used due to its reliability, speed, and comparability across systems.
8	UEQ (User Experience Questionnaire) assesses user experience beyond usability, capturing both pragmatic qualities (such as efficiency and clarity) and hedonic qualities (such as stimulation and novelty). It uses semantic differential scales and is particularly suited for evaluating interactive systems where emotional and experiential factors are critical.
9	NASA-TLX (Task Load Index) measures perceived workload during task performance. It evaluates mental, physical, and temporal demand, as well as effort, performance, and frustration. NASA-TLX is commonly used to understand cognitive load and user strain, especially in complex, immersive, or safety-critical systems.
10	Edge blending: A technique in projection mapping that smoothly overlaps images from multiple projectors to eliminate visible seams and create a single continuous image.
11	Color matching: The calibration process that ensures consistent color, brightness, and white balance across multiple projectors in a projection mapping setup.
12	Lumens: A unit of measurement indicating the perceived brightness of a light source or projector output.
13	Photometric compensation: A technique in projection mapping that corrects brightness and color distortions caused by surface geometry, texture, and reflectance, ensuring the projected image appears visually consistent and accurate across non-uniform surfaces.
14	RGB-D cameras: Cameras that capture standard color images (RGB) together with depth information (D), enabling the measurement of distance, geometry, and spatial structure of a scene.
15	Fiducial marker systems: Computer vision systems that use predefined visual markers to enable reliable detection, tracking, and spatial alignment of digital content within a physical environment.
16	AprilTags: A family of robust fiducial markers designed for precise and reliable visual detection, commonly used for camera pose estimation, tracking, and spatial alignment in computer vision and mixed reality systems.
17	ArUco: An open-source fiducial marker system used for camera calibration, pose estimation, and object tracking, widely applied in augmented reality and computer vision applications.
18	Time-of-Flight (ToF): A depth-sensing technique that measures the distance between a sensor and objects by calculating the time it takes for emitted light to travel to a surface and return.
19	RFID (Radio-Frequency Identification): A technology that uses radio waves to wirelessly identify and track tagged objects without direct line of sight.
20	NFC (Near Field Communication): A short-range wireless communication technology that enables data exchange between devices or tags when they are brought very close together.
21	Inertial Measurement Units (IMUs): Sensors that combine accelerometers, gyroscopes, and sometimes magnetometers to measure motion, orientation, and acceleration in three-dimensional space.
22	Ultra-Wideband (UWB): A short-range wireless technology that enables highly precise distance and position tracking by transmitting signals across a very wide frequency spectrum.
23	LiDAR (Light Detection and Ranging): A sensing technology that measures distances by emitting laser pulses and analyzing their reflections to generate precise three-dimensional representations of the environment.
24	Homographies: Mathematical transformations that map points between two planes, commonly used in computer vision to align images and relate a projected image to a physical surface.
25	Projector’s pixel grid: The fixed two-dimensional array of discrete pixels that defines how a projector outputs and spatially samples an image onto a surface.
26	Geometric warping: The process of digitally distorting an image so it aligns correctly with the shape, orientation, and geometry of a physical projection surface.
27	Photogrammetry: A technique that reconstructs three-dimensional geometry from multiple overlapping photographs by analyzing visual features and their spatial relationships.
28	Unity: A real-time 3D development engine widely used to create interactive applications, simulations, and immersive experiences across multiple platforms.
29	Unreal Engine: A high-performance real-time 3D engine used for creating visually advanced interactive applications, simulations, and immersive experiences.
30	TouchDesigner: A node-based visual programming environment for real-time interactive multimedia, widely used in live visuals and immersive installations.
31	Notch: A real-time graphics tool focused on high-end visual effects and interactive motion graphics, often used in live events and projection mapping.
32	vvvv: A visual programming framework for real-time graphics, interaction, and physical computing, commonly used in media art and installations.
33	openFrameworks: An open-source C++ creative coding toolkit for building custom interactive and audiovisual applications.
34	Processing: A creative coding platform designed to make visual and interactive programming accessible, especially for artists and designers.
35	Disguise: A real-time content playback and show control platform for large-scale projection mapping, immersive environments, and live events.
36	Watchout: A multi-display media server system used for synchronized playback across multiple projectors and screens.
37	Resolume: A real-time VJ and media playback software widely used for live visuals, projection mapping, and audiovisual performances.
38	MadMapper: A projection mapping and media control software used to map, warp, and blend visual content onto complex physical surfaces in real time.
39	Diegetic sounds: Sounds that originate within the narrative world of an experience and are perceivable by its characters, reinforcing realism and spatial coherence.
40	DMX (Digital Multiplex): A standardized digital communication protocol used to control lighting, effects, and stage equipment in real time.
41	sACN (Streaming Architecture for Control Networks): A network-based lighting control protocol that transmits DMX data over IP networks for scalable and distributed control systems.
42	System Usability Scale (SUS): A standardized ten-item questionnaire used to assess the perceived usability of a system or interface.
43	User Experience Questionnaire (UEQ): A standardized questionnaire used to measure user experience across pragmatic and hedonic quality dimensions of an interactive system.
44	Bipolar scales: Measurement scales that assess responses between two opposing attributes (e.g., easy–difficult), commonly used in usability and user experience evaluation.
45	Benchmark datasets: Standardized datasets used to evaluate, compare, and validate the performance of algorithms, systems, or models under consistent conditions.
46	Long-throw projection: A projection setup in which the projector is placed at a relatively large distance from the surface, suitable for covering large areas or façades.
47	Rigging: The process of securely mounting and supporting equipment—such as lights, projectors, or speakers—using structural systems in performance and installation environments.
48	Sandboxes: Isolated testing environments used to safely experiment with systems, software, or content without affecting live or production setups.
49	XR (Extended Reality): An umbrella term encompassing virtual reality (VR), augmented reality (AR), and mixed reality (MR), describing immersive technologies that blend digital and physical worlds.
50	MediaPipe-class methods: Real-time computer vision and machine learning pipelines for tasks such as body, hand, and face tracking, optimized for low-latency interactive applications.
51	PIR (Presence, Involvement, and Realism): A framework used to assess how strongly users feel present, engaged, and perceptually convinced within an immersive or mediated experience.

References

Bimber, O.; Raskar, R. Spatial Augmented Reality: Merging Real and Virtual Worlds; CRC Press: Boca Raton, FL, USA, 2005. [Google Scholar]
Kenderdine, S.; Chan, L.K.Y.; Shaw, J. Pure Land: Futures for embodied museography. ACM J. Comput. Cult. Herit. 2014, 7, 8. [Google Scholar] [CrossRef]
Kim, M.; Lee, J.; Stuerzlinger, W.; Wohn, K. HoloStation: Augmented visualization and presentation. In Proceedings of the SA ’16 Symposium on Visualization, Macau, China, 5–8 December 2016. [Google Scholar] [CrossRef]
Zhang, Y.; Ma, P.; Zhu, Z. Integrating performer into a real-time augmented reality performance spatially by using a multi-sensory prop. In Proceedings of the 23rd ACM Symposium on Virtual Reality Software and Technology (VRST ’17), Gothenburg, Sweden, 8–10 November 2017. [Google Scholar] [CrossRef]
Barber, G.A.; Lafluf, M.; Amen, F.G.; Accuosto, P. Interactive projection mapping in heritage: The Anglo case. In Proceedings of the CAAD Futures 2017 Proceedings, Istanbul, Turkey, 12–14 July 2017. [Google Scholar]
Schmidt, S.; Steinicke, F.; Irlitti, A.; Thomas, B.H. Floor-projected guidance cues for collaborative exploration of spatial augmented reality setups. In Proceedings of the ACM ISS 2018 (Interactive Surfaces and Spaces), Tokyo, Japan, 25–28 November 2018.
Caggianese, G.; Gallo, L.; Neroni, P. Evaluation of spatial interaction techniques for virtual heritage applications: A case study of an interactive holographic projection. Future Gener. Comput. Syst. 2018, 81, 516–527.
Nofal, E.; Stevens, R.; Coomans, T.; Vande Moere, A. Communicating the spatiotemporal transformation of architectural heritage via in-situ projection mapping. Digit. Appl. Archaeol. Cult. Herit. 2018, 10, e00083.
Lee, Y.Y.; Lee, J.H.; Ahmed, B.; Son, M.G.; Lee, K.H. A new projection-based exhibition system for a museum. ACM J. Comput. Cult. Herit. 2019, 12, 10.
Duguleană, M.; Carrozzino, M.; Gams, M.; Tanea, I. (Eds.) VR Technologies in Cultural Heritage; Springer: Cham, Switzerland, 2019.
Slavin, N. Map-Based Storytelling in Spatial Augmented Reality: Projection of Interactive Layers. Master’s thesis, Technical University of Munich, München, Germany, 2020.
Leinonen, T.; Brinck, J.; Vartiainen, H.; Sawhney, N. Augmented reality sandboxes. Digit. Creat. 2021, 32, 38–55.
Cisternino, D.; Corchia, L.; De Luca, V.; Gatto, C.; Liaci, S.; Scrivano, L.; Trono, A.; De Paolis, L.T. Augmented reality applications to support the promotion of cultural heritage. J. Comput. Cult. Herit. 2021, 14, 47.
Ioakeim, N.; Printezis, P.; Skarimpas, C.; Koutsabasis, P.; Vosinakis, S. Kirini: An interactive projection-mapping installation. In Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021.
Nikolakopoulou, V.; Printezis, P.; Maniatis, V.; Kontizas, D.; Vosinakis, S.; Chatzigrigoriou, P.; Koutsabasis, P. Conveying intangible cultural heritage in museums with interactive storytelling and projection mapping: The case of the Mastic Villages. Heritage 2022, 5, 1024–1049.
De Paolis, L.T.; Liaci, S.; Sumerano, G.; De Luca, V. A video mapping performance as an innovative tool. Information 2022, 13, 122.
Trunfio, M.; Lucia, M.D.; Campana, S.; Magnelli, A. Innovating the cultural heritage museum service model. J. Herit. Tour. 2022, 17, 1–19.
Larrieux, E.; Speziali, S. Augmented objects as portals into virtual worlds: Using audio to create immersive experiences in extended realities. In Proceedings of the AudioMostly 2022 (AM ’22), St. Pölten, Austria, 6–9 September 2022.
Suzuki, R.; Karim, A.; Xia, T.; Hedayati, H.; Marquardt, N. Augmented reality and robotics: A survey and taxonomy for AR-enhanced human–robot interaction and robotic interfaces. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’22), New Orleans, LA, USA, 29 April–5 May 2022.
Allal-Chérif, O. Intelligent cathedrals: Using augmented reality, virtual reality, and artificial intelligence. Technol. Forecast. Soc. Chang. 2022, 178, 121604.
Xu, N.; Liang, J.; Shuai, K.; Li, Y.; Yan, J. HeritageSite AR: An exploration game. In Proceedings of the CHI EA ’23, Hamburg, Germany, 23–28 April 2023.
Li, H.; Ito, H. Visitor’s experience evaluation of projection mapping at cultural sites. Herit. Sci. 2023, 11, 52.
De Luca, V.; Barba, M.C.; D’Errico, G.; Nuzzo, B.L.; De Paolis, L.T. A user experience analysis for a mobile MR application. Virtual Real. 2023, 27, 2821–2837.
De Paolis, L.T.; Gatto, C.; Corchia, L.; De Luca, V. Usability, user experience and mental workload in mobile AR for storytelling. Virtual Real. 2023, 27, 1117–1143.
Sari, I.P.; Juhana, A.; Nurhidayatulloh, N. Global trends in projection mapping technology. J. Print Media Technol. Res. 2023, 12, 219–229.
Tzortzi, K.; Fatah gen Schieck, A.; Printezis, P.; Kontogeorgopoulou, E.-M.; Efthymiou, E.; Vourloumi, M.; Maniatis, V. Longue durée: Perceiving heritage through media architecture. In Proceedings of the 6th Media Architecture Biennale Conference, Toronto, ON, Canada, 14–23 June 2023; pp. 119–132.
Innocente, C.; Ulrich, L.; Moos, S.; Vezzetti, E. A framework study on the use of immersive XR technologies in cultural heritage. J. Cult. Herit. 2023, 62, 268–284.
Silva, C.; Zagalo, N.; Vairinhos, M. Towards participatory activities with augmented reality for cultural heritage: A literature review. Comput. Educ. X Real. 2023, 3, 100044.
Reaver, K. Augmented reality as a participation tool for youth in urban planning. Front. Virtual Real. 2023, 4, 1055930.
De Luca, V.; Gatto, C.; Liaci, S.; Corchia, L.; Chiarello, S.; Faggiano, F.; Sumerano, G.; De Paolis, L.T. VR and SAR for social inclusion: The “Includiamoci” project. Information 2023, 14, 38.
Ahmed Maqbool, S.; Maxwell, D. Story Seeds: Creating interactive narratives. In Proceedings of the DRS2024, Boston, MA, USA, 23–28 June 2024.
Nikolarakis, A.; Koutsabasis, P. Mobile AR interaction design patterns. Multimodal Technol. Interact. 2024, 8, 52.
Chen, K.; Yang, Q.; Yuan, Q.; Pan, Z. Exploring the impact of bidirectional interactions between VR and SAR on cultural exhibition. In Proceedings of the VRCAI ’24, Nanjing, China, 1–2 December 2024.
Ramtohul, A.; Khedo, K.K. Augmented reality systems in the cultural heritage domains: A qualitative narrative synthesis. Digit. Appl. Archaeol. Cult. Herit. 2024, 32, e00317.
Ceccarelli, S.; Cesta, A.; Cortellessa, G.; De Benedictis, R.; Fracasso, F.; Leopardi, L.; Ligios, L.; Lombardi, E.; Malatesta, S.G.; Oddi, A.; et al. Evaluating visitors’ experience in museum: Comparing artificial intelligence and multi-partitioned analysis. Digit. Appl. Archaeol. Cult. Herit. 2024, 33, e00340.
Wang, H.; Gao, Z.; Zhang, X.; Du, J.; Xu, Y.; Wang, Z. Gamifying cultural heritage: Exploring the potential of immersive virtual exhibitions. Telemat. Inform. Rep. 2024, 15, 100150.
Bollini, L. Space as a narrative interface: Phygital interactive storytelling. In Multidisciplinary Aspects of Design; Springer: Cham, Switzerland, 2024; pp. 613–622.
Shin, J.-E.; Kim, H.; Park, H.; Woo, W. Investigating the design of augmented narrative spaces through virtual–real connections: A systematic literature review. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24), Honolulu, HI, USA, 11–16 May 2024.
Weidner, F.; Hartbrich, J.; Arboleda, S.A.; Kunert, C.; Schneiderwind, C.; Diao, C.; Gerhardt, C.; Surdu, T.; Broll, W.; Werner, S.; et al. Eyes on the narrative: Visual realism and audio in AR storytelling. In Proceedings of the ETRA ’24, Glasgow, UK, 4–7 June 2024.
Xu, N.; Li, Y.; Liang, J.; Shuai, K.; Li, Y.; Yan, J.; Zhang, C.; Dong, Y. HeritageSite AR: A mobile augmented reality exploration game. ACM J. Comput. Cult. Herit. 2024, 17, 67.
Baron, A.M.; Gul, L.F. Gesture-driven storytelling in cultural heritage interpretation. In Proceedings of the eCAADe 43: Confluence, Ankara, Türkiye, 1–5 September 2025; Volume 2, pp. 543–550.
Muñoz, A.; Climent-Ferrer, J.J.; Martí-Testón, A.; Solanes, J.E.; Gracia, L. Enhancing cultural heritage engagement with multisensory XR. Electronics 2025, 14, 2039.
Li, H.; Li, B.; Ito, H.; Zhang, T. Architectural influence on narrative content in projection mapping. npj Herit. Sci. 2025, 13, 509.
Li, Y.; Wang, S.; Sun, X.; Yang, L.; Zhu, T.; Chen, Y.; Zhao, K.; Zhao, Y.; Li, M.; Lc, R. Reality as imagined: TeleAbsence-driven extended reality experience. Int. J. Hum.-Comput. Interact. 2025, in press.
Ruan, C.; Qiu, S.; Yao, H. Enhancing cultural sustainability in ethnographic museums. Sustainability 2025, 17, 6915.
Tang, X.; Situ, J.; Cui, A.Y.; Wu, M.; Huang, Y. LLM integration in extended reality: A comprehensive review of current trends, challenges, and future perspectives. In Proceedings of the CHI ’25, Yokohama, Japan, 26 April–1 May 2025.
Chatsiopoulou, A.; Michailidis, P.D. Augmented reality in cultural heritage: A narrative review of design, development and evaluation approaches. Heritage 2025, 8, 421.
Shawash, J.; Thibault, M.; Hamari, J. Who killed Helene Pumpulivaara?: AI-assisted content creation and XR implementation for interactive built heritage storytelling. In Proceedings of the IMX ’25, Niterói, Brazil, 3–6 June 2025.
Pavavimol, T.; Ometov, A.; Valkama, M.; Thibault, M. Transitions between realities: A qualitative narrative synthesis on the usage of XR systems for bridging reality and virtuality. In Proceedings of the IMX ’25, Niterói, Brazil, 3–6 June 2025.
Nordin, S.A.; Din, S.C. A study on components of interactive projection mapping. Idealogy J. 2025, 10, 493–504.
Shi, T.; Chen, Y.; Wang, Z. Designing 3D mapping projections. Cogent Arts Humanit. 2025, 12, 2492426.
Tan, Y.Y.; Wang, Y.; Samah, A.S.A.; Jibin, S. Digital storytelling and cultural learning through projection mapping. Int. J. Creat. Multimed. 2025, 6, 49–64.
Nikolakopoulou, V.; Koutsabasis, P. The ‘Making’ of Participatory and Co-Design for Digital Experiences in Cultural Heritage: A Review. CoDesign 2025, 21, 876–902.
Albert, B.; Tullis, T. Measuring the User Experience: Collecting, Analyzing, and Presenting UX Metrics; Morgan Kaufmann: Burlington, MA, USA, 2022.

Figure 1. Comprehensive ecosystem map of a Spatial Augmented Reality (SAR) system.

Figure 2. Three primary spatial scales of Spatial Augmented Reality (SAR) in cultural contexts.

Figure 3. Mapping narrative types to spatial scales in Spatial Augmented Reality (SAR).

Figure 4. Distribution of physical contexts and scales of deployment across the corpus.

Figure 5. Sketch-based illustration of interaction modalities in Spatial Augmented Reality (SAR).

Figure 6. Interaction modalities in Spatial Augmented Reality (SAR) for cultural settings.

Figure 7. Distribution of interaction modalities across the SAR literature corpus.

Figure 8. Timeline of key developments and indicative installations in Spatial Augmented Reality (SAR) for cultural heritage.

Figure 9. The Longue durée media-architecture installation is an example of the “Historized Outdoor Spaces” pattern in civic-scale spatial augmented reality [26]. (a) Part of the projected narration from Longue durée; (b) Longue durée with visitors.

Figure 10. Projection-mapped tabletop maquette from the Mastic Villages installation at the Mastic Museum of Chios, illustrating the “Transplanted Tabletop Storyworlds” pattern [15]. (a) The projected 3D model of the Mastic Villages with visitors; (b) The projected 3D model of the Mastic Villages.

Figure 11. Room-scale SAR scenography from the Kirini installation, illustrating the “Enhanced Indoor Rooms” narrative pattern [14]. (a) A sketch preview of the Kirini installation; (b) One of the interactions during the Kirini prototype.

Figure 12. Patterns of narration in spatial augmented reality.

Table 1. Listing of paper corpus per year of publication.

Year of Publication	#	%	Citations
2014	1	1.9%	Kenderdine et al., 2014 [2]
2016	1	1.9%	Kim et al., 2016 [3]
2017	2	3.8%	Zhang et al., 2017 [4], Barber et al., 2017 [5]
2018	3	5.8%	Schmidt et al., 2018 [6], Caggianese et al., 2018 [7], Nofal et al., 2018 [8]
2019	2	3.8%	Lee et al., 2019 [9], Duguleană et al., 2019 [10]
2020	1	1.9%	Slavin 2020 [11]
2021	3	5.8%	Leinonen et al., 2021 [12], Cisternino et al., 2021 [13], Ioakeim et al., 2021 [14]
2022	6	11.5%	Nikolakopoulou et al., 2022 [15], De Paolis et al., 2022 [16], Trunfio et al., 2022 [17], Larrieux and Speziali 2022 [18], Suzuki et al., 2022 [19], Allal-Chérif 2022 [20]
2023	10	19.2%	Xu et al., 2023 [21], Li and Ito 2023 [22], De Luca et al., 2023 [23], De Paolis et al., 2023 [24], Sari et al., 2023 [25], Tzortzi et al., 2023 [26], Innocente et al., 2023 [27], Silva et al., 2023 [28], Reaver 2023 [29], De Luca et al., 2023 [30]
2024	11	21.2%	Ahmed Maqbool and Maxwell 2024 [31], Nikolarakis and Koutsabasis 2024 [32], Chen et al., 2024 [33], Ramtohul and Khedo 2024 [34], Ceccarelli et al., 2024 [35], Wang et al., 2024 [36], Bollini 2024 [37], Shin et al., 2024 [38], Weidner et al., 2024 [39], Xu et al., 2024 [40], Baron and Gul 2024 [41]
2025	12	23.1%	Muñoz et al., 2025 [42], Li et al., 2025 [43], Li et al., 2025 [44], Ruan et al., 2025 [45], Tang et al., 2025 [46], Chatsiopoulou and Michailidis 2025 [47], Shawash et al., 2025 [48], Pavavimol et al., 2025 [49], Nordin and Din 2025 [50], Shi et al., 2025 [51], Tan et al., 2025 [52], Nikolakopoulou and Koutsabasis 2025 [53]
Total	52	100.0%

Table 2. Paper corpus per venue of publication.

Venue of Publication	#	%
Journals	33	63.5%
Journal on Computing and Cultural Heritage (ACM)	4	7.7%
Virtual Reality (Springer)	3	5.8%
Digital Applications in Archaeology and Cultural Heritage (Elsevier)	3	5.8%
Heritage (MDPI)	2	3.8%
Information (MDPI)	2	3.8%
Other journals (1 occurrence)	16	30.8%
Conferences	19	36.5%
CHI Conference on Human Factors in Computing Systems (ACM)	4	7.7%
IMX ’25 (ACM International Conference on Interactive Media Experiences)	2	3.8%
Other conferences (1 occurrence)	13	25.0%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Printezis, P.; Koutsabasis, P. Spatial Augmented Reality Storytelling in Arts and Culture: A Critical Review from an Interaction Design Perspective. Heritage 2026, 9, 20. https://doi.org/10.3390/heritage9010020

