Systematic Review

From the Reality–Virtuality Continuum to the XR Ecosystem: A Systematic Literature Review of Definitions and Conceptual Models

Department of Computing, University of Turku, 20014 Turku, Finland
*
Author to whom correspondence should be addressed.
Multimodal Technol. Interact. 2026, 10(3), 24; https://doi.org/10.3390/mti10030024
Submission received: 15 December 2025 / Revised: 16 January 2026 / Accepted: 28 February 2026 / Published: 2 March 2026

Abstract

Extended Reality (XR) technologies are rapidly reshaping human–computer interaction; however, persistent ambiguity in the use of core terms (VR, AR, MR) hampers cumulative knowledge building, cross-study comparability, and technical standardisation. This review evaluates the XR conceptual landscape across four primary dimensions: the historical evolution of core definitions, the synthesis of contemporary theoretical frameworks, the critical extensions of the Reality-Virtuality (RV) Continuum, and the alignment between academic taxonomies and industry practices. To address this ambiguity, we conducted a PRISMA-guided systematic literature review across four major databases (IEEE Xplore, ACM Digital Library, Scopus, and Web of Science), complemented by seminal and industry sources. Of the 173,677 retrieved records, 59 studies were included in the synthesis. Using thematic synthesis, we mapped the historical evolution of definitions and conceptual models and identified recurring analytical dimensions. The results indicate a clear paradigm shift from Milgram’s one-dimensional Reality–Virtuality continuum, originally grounded in visual display technology, towards a multidimensional conceptual space that integrates subjective user-experience constructs (e.g., coherence and plausibility) with objective system characteristics. The included studies cover 1968–2025, with marked acceleration in the 2020s: 2022 alone accounts for the highest annual count (9 studies), and nearly half of the corpus (47.5%) was published in 2021–2025.
We further show that industry actors pragmatically re-bound these academic concepts for product and market positioning, leading to systematic divergences between academic and industrial definitions. By distilling key turning points and synthesising core analytical dimensions into a structured lens, this review provides a historically grounded, actionable understanding of the XR conceptual landscape to support terminological alignment across research and practice.

1. Introduction

With the rise of the Metaverse and the rapid development of immersive technologies, Extended Reality (XR)—a collective taxonomic framework that encompasses Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR)—is rapidly penetrating everything from industrial design and medical training to everyday social interaction and entertainment [1]. The rapid expansion of XR research and products has amplified the need for consistent terminology and conceptual clarity across studies and application domains. However, the field’s boom has been accompanied by a fundamental, long-standing challenge: ambiguity and confusion over the use of core terms. What exactly is “mixed reality”? How is it fundamentally different from the increasingly powerful “augmented reality”? What is the gap between the theoretical frameworks of academia and the market-driven definitions of industry (e.g., Meta [2], Varjo [3])? This terminological fog is a serious obstacle to effective academic exchange, technical standardisation and interdisciplinary collaboration. A comprehensive review by Dargan et al. synthesises AR’s evolution from VR [4], detailing its types, hardware/software requirements, benefits, limitations, and applications across fields such as education and healthcare, with a year-by-year tabular overview of research [5]. This confusion is not confined to experts; empirical research reveals that the general public’s understanding of what constitutes “Augmented Reality” varies significantly and often diverges from technical definitions. For instance, non-expert users often describe AR as the mere coexistence of real and virtual components, whereas expert definitions emphasise the seamless synthesis and contextual integration of virtual elements into the physical world [6]. This misalignment between expert terminology and public mental models further complicates the adoption and effective communication of XR technologies.
In response to the persistent terminological fog, this paper provides a comprehensive historical and conceptual review of XR-related definitions and taxonomies (e.g., [7]), conducted through a systematic examination of key literature. Despite the growing body of XR survey and review literature, prior work often emphasises technology trends, hardware/software requirements, or domain applications, while offering limited integrative synthesis of how core terms (VR/AR/MR) and their taxonomies evolved over time, how the foundational RV Continuum has been critiqued and extended, and how academic frameworks align with (or diverge from) market-driven industry definitions. This leaves researchers and practitioners without a historically grounded and analytically consistent lens for comparing conceptual models and resolving terminological ambiguity. The core argument of this paper is that XR terminology and classification have shifted from a technology-centric, one-dimensional “Reality–Virtuality (RV) Continuum” [8], to a multidimensional analytical space centred on user experience and system attributes. Concurrently, the industry has reshaped these concepts for product and market strategy purposes. To bridge the identified gap between technology-centric models and user-centric experiences, this study is framed around the following four research questions:
  • RQ1: How have the definitions of Mixed Reality (MR), Augmented Reality (AR), and Virtual Reality (VR) evolved since their inception?
  • RQ2: What are the key theoretical frameworks and taxonomies that have been proposed to classify these realities?
  • RQ3: What are the primary critiques and extensions of the foundational Reality-Virtuality (RV) Continuum model?
  • RQ4: What are the key differences and similarities between the taxonomies used in academic research and the definitions employed by industry leaders?
To address these research questions, we conducted a PRISMA-guided systematic literature review across four major academic databases (IEEE Xplore, ACM Digital Library, Scopus, and Web of Science), complemented by seminal works and representative industry materials. We then apply thematic synthesis to extract definitions and conceptual models, code recurring analytical dimensions, and conduct a structured cross-study comparison. Specifically, RQ1 and RQ2 are answered via chronological mapping and comparative coding of definitions, taxonomies, and frameworks; RQ3 is addressed by synthesising major critiques and extensions of the RV Continuum; and RQ4 is addressed through a targeted comparison between academic taxonomies and industry definitions. The main contribution of this paper is a systematic synthesis of these analytical dimensions, providing researchers and practitioners in the field with a clearer, more historically comprehensive understanding of the XR landscape and helping to reduce the prevailing terminological ambiguity.
To systematically unpack these dimensions, the subsequent sections move from methodological foundations to theoretical and practical syntheses. Section 2 (Materials and Methods) details the methodology employed, including the systematic literature review process and thematic synthesis approach. Section 3 introduces the foundational framework of MR, specifically Milgram’s Reality–Virtuality (RV) Continuum. Section 4 reviews the theoretical evolution, detailing major extensions, criticisms, and alternative taxonomies that have contributed to the conceptual development of MR. Section 5 explores contemporary academic perspectives, focusing on multidimensional frameworks and user-centric concepts. Section 6 contrasts academic interpretations with industry definitions by examining device capabilities, product strategies, and ecosystem boundaries. Section 7 discusses the paradigm shift. Finally, Section 8 concludes the paper and outlines implications for future research.

2. Materials and Methods

This paper employs a rigorous systematic literature review (SLR) methodology to trace and synthesise the evolution of definitions and taxonomies within the field of eXtended Reality (XR) [9,10]. The review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. An SLR approach was selected to ensure the comprehensiveness, transparency, and replicability of the literature collection process. The protocol is registered with the Open Science Framework at https://osf.io/37py2 (accessed on 14 December 2025). Following data extraction, we synthesised definitions and identified new conceptual dimensions using Thematic Synthesis [11], a method that provides a robust approach for integrating findings across diverse qualitative literature.
To address the core research questions (RQs) formulated in the Introduction, we derived a set of analytical dimensions from the selected literature. These dimensions represent recurring conceptual patterns across definitions, taxonomies, and theoretical frameworks related to MR. Rather than functioning as independent contributions, the dimensions provide a structured lens through which each research question can be systematically answered.
In other words, the dimensions organise the findings, while the research questions guide their interpretation in the Discussion and Conclusion. The review process was conducted in four main phases: (1) Defining Research Questions; (2) Literature Search and Selection; (3) Thematic Synthesis; (4) Final Inclusion and Reporting.

Literature Search and Selection

The initial literature search covered four databases, yielding 173,677 records (see Table 1). The selection of search terms was rigorously guided by the four core research dimensions defined in the methodology, ensuring a comprehensive, structured, systematic literature review. These keywords were essential for tracing the conceptual evolution of XR terminology and ensuring coverage of foundational, theoretical, user-centric, and critical perspectives.
See Table 1 for search terms and database counts.
For Foundational Taxonomies and Core Definitions: We utilised terms such as “Framework”, “Taxonomy”, “Model”, “Definition”, “Survey”, and “Review”, combined with “Mixed Reality”, “Augmented Reality”, or “Virtual Reality”. This combination was crucial for identifying the seminal and contemporary literature focused on classifying and structuring the field, and on addressing the criteria of Dimensions I and II (RQ1, 2).
For the User-Centric Paradigm Shift, the following terms were selected: “Plausibility”, “Presence”, “Coherence”, and “Fidelity”. These terms explicitly capture the shift in classification thinking from purely technical attributes to the subjective user experience (Dimension III, RQ3). This process enabled the identification of literature ranging from the earliest definitions of AR, MR, and VR to the most recent conceptual frameworks.
For Historical Critique and Expansion: We specifically included “Reality-Virtuality Continuum” and “Mediated Reality.” These were necessary to anchor the review in the original Milgram framework and to capture the major theoretical critiques and expansions, such as Mann’s work (Dimension IV, RQ4) [12].
To optimise the dataset and focus on core conceptual studies, a multi-stage screening process was implemented. The first stage removed duplicate records and non-relevant literature, reducing the total to 89,773 records (as shown in Figure 1). This initial “removal of unrelated fields” screening excluded records that were clearly outside the scope of eXtended Reality (XR) terminology, definition, or taxonomy. This included literature from distant application domains, non-academic document types, or works focused purely on tangential technical implementations without addressing the core conceptual frameworks of VR, AR, or MR. Given the still substantial volume, a secondary refined search was conducted: records were required to include the aforementioned keywords in their titles or abstracts. This stringent screening reduced the pool of full-text abstracts to 1216 records. Subsequently, eligibility screening was conducted based on explicit inclusion and exclusion criteria, and final inclusion strictly adhered to the four core criteria detailed in Table 2. Figure 1 illustrates the complete workflow of literature identification, screening, and eligibility assessment compliant with systematic review methodology.
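The screening funnel described above can be sanity-checked with a few lines of arithmetic. The sketch below is only illustrative: the stage labels are paraphrased, the `funnel` helper is a hypothetical name, and the final count of 59 may include records added through snowballing and seminal or industry sources, so the last ratio is indicative rather than exact.

```python
# Record counts as reported in the text for each screening stage.
# Stage labels are paraphrased for illustration.
stages = [
    ("Records retrieved from four databases", 173_677),
    ("After removing duplicates and unrelated fields", 89_773),
    ("After title/abstract keyword screening", 1_216),
    ("Included in the synthesis", 59),
]


def funnel(stages):
    """Return (label, count, share_of_previous, share_of_initial) per stage."""
    initial = stages[0][1]
    previous = initial
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / initial))
        previous = count
    return rows


rows = funnel(stages)
for label, count, step_share, total_share in rows:
    print(f"{label:<48} {count:>8,}  {step_share:7.2%} of previous  "
          f"{total_share:8.4%} of initial")
```

Reading the rows top to bottom shows where most of the reduction happens, which is the keyword-based title and abstract screening stage.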
To ensure reproducibility, we operationalised exclusion criteria at different screening stages. For transparency, Table 3 summarises these exclusion criteria.
In addition to database searches, backwards and forward snowballing were conducted by reviewing the reference lists and citation networks of the included studies to identify additional relevant literature.
See Table 4 for the full list of included studies with p1, p2, and more.

3. Descriptive Overview of the Included Corpus

Descriptive statistics (e.g., publication-year distribution) were computed from the metadata of the included studies, and listed in Table 4.
Figure 2 summarises the publication-year distribution of the included corpus (n = 59), spanning 1968–2025. The literature is strongly concentrated in recent years: 34 of 59 studies (57.6%) were published between 2020 and 2025, and 42 (71.2%) appeared in the last decade (2016–2025). The annual count peaks in 2022 (n = 9). This concentration indicates that definitional and conceptual work around XR terminology has accelerated markedly in the 2020s, motivating the need for structured synthesis and cross-framework comparison.
To characterise the included corpus beyond publication years, we coded each study into a single primary study type based on its dominant contribution (review/survey, theory/taxonomy, empirical/experimental, or system/prototype). The per-study coding is reported in Table 4 and summarised in Figure 3.
As shown in Figure 3, the included corpus is dominated by secondary syntheses and conceptual contributions (Reviews/Surveys: 22/59, 37.3%; Theory/Taxonomy: 21/59, 35.6%), with fewer empirical studies (10/59, 16.9%) and system/prototype papers (6/59, 10.2%). This distribution is consistent with the review’s focus on definitions and conceptual models, which are most frequently articulated through conceptual/taxonomic work and secondary syntheses. Importantly, the smaller share of empirical studies does not imply a lack of XR empirical research broadly; rather, it indicates that empirical work often operationalises terms without explicitly formalising definitional boundary criteria, which motivates the dimension-based synthesis developed in the following sections.
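The reported shares can be reproduced directly from the counts given in this section. The following is a minimal sketch, assuming one-decimal rounding; the dictionary keys and the `share` helper are illustrative names rather than part of the review's tooling.

```python
TOTAL = 59  # number of included studies

# Counts by primary study type, as reported for Figure 3.
study_types = {
    "Review/Survey": 22,
    "Theory/Taxonomy": 21,
    "Empirical/Experimental": 10,
    "System/Prototype": 6,
}

# Counts by publication period, as reported for Figure 2.
period_counts = {
    "2020-2025": 34,
    "2016-2025": 42,
}


def share(count, total=TOTAL):
    """Percentage of the corpus, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Each study has exactly one primary type, so the counts must add up.
assert sum(study_types.values()) == TOTAL

type_shares = {name: share(n) for name, n in study_types.items()}
period_shares = {name: share(n) for name, n in period_counts.items()}

for name, pct in {**type_shares, **period_shares}.items():
    print(f"{name:<24} {pct:5.1f}%")
```

The check on the sum is useful because single-label coding implies the four categories partition the corpus.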

4. Foundational Framework

This foundational chapter establishes the starting point for the conceptual evolution of XR, primarily addressing RQ1 (the evolution of definitions) by tracing the emergence and evolution of the definitions of MR/AR/VR. It also contributes to RQ2 (key theoretical frameworks) by outlining the earliest classification framework, Milgram’s Reality-Virtuality Continuum. The original classification in this section is defined by the Foundational Dimension of the review (Dimension I in Table 2).

4.1. Historical Precursors and the Proposal of the Continuum

Milgram and Kishino published the seminal work “A Taxonomy of Mixed Reality Visual Displays” in 1994, which introduced both the concept of the Reality-Virtuality (RV) Continuum and the term Mixed Reality (MR) [8]. This work has been cited thousands of times (as noted by later analyses/reviews such as [33,41]). In 1995, a related paper by Milgram et al. entitled “Augmented reality: a class of displays on the reality-virtuality continuum” also received thousands of citations [16]. Of these, “A Taxonomy of Mixed Reality Visual Displays” focuses more on building an overall categorisation framework, while “Augmented reality: a class of displays…” delves more deeply into the specific positioning and realisation of AR within this framework. It is worth noting that the initial version of this continuum was explicitly focused on visual displays, meaning it was primarily concerned with the ability of display technologies to fuse real and virtual content. The reality-virtuality continuum proposed by Milgram et al. in 1994 did not emerge from nowhere but builds on decades of exploration of human-computer interaction and immersive experiences [8]. This can be traced back to computer graphics pioneer Ivan Sutherland’s conception of the “Ultimate Display” in 1965 [65] and his development of the first Head-Mounted Display (HMD) in 1968 [13], which fused synthetic graphics with real-world views. These early works laid the technical and ideological foundations for all subsequent discussions of “mixed reality”.
The ends of the Reality-Virtuality (RV) Continuum proposed by Milgram and Kishino are the Real Environment (RE), which “consists only of real objects”, and the Virtual Environment (VE), which “consists only of virtual objects”. Any environment in the middle of this continuum that contains a mix of real and virtual objects is considered Mixed Reality (MR) (as shown in Figure 4). Within this continuum, there are two key subcategories:
  • Augmented Reality (AR): An MR environment is said to be Augmented Reality (AR) if it augments the real world with virtual content (computer graphics). This means the environment is primarily physical, with the addition of virtual elements.
  • Augmented Virtuality (AV): If the majority of the content is virtual but the environment incorporates some real-world objects or perception of them, it is called Augmented Virtuality.
The RV Continuum has been widely adopted for classifying and designing specific XR applications. For instance, in the domain of education, Panchenko et al. (2020) proposed a facet classification for augmented reality books, in which the first facet is the reality-virtuality continuum, categorising books as Virtual Book, Mixed Book, Augmented Book, and Reality Book [39]. This demonstrates how the continuum provides a foundational framework for structuring and understanding the design space of XR applications, bridging high-level theory with practical implementation.
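To make the structure of the continuum concrete, the sketch below places an environment on a one-dimensional scale and assigns it to one of the four regions named above. This is only an illustration under stated assumptions: Milgram and Kishino do not define a numeric scale, so the `virtual_fraction` parameter and the 0.5 boundary between AR and AV are hypothetical choices that stand in for "primarily real" versus "primarily virtual".

```python
from enum import Enum


class Region(Enum):
    REAL_ENVIRONMENT = "Real Environment (RE)"
    AUGMENTED_REALITY = "Augmented Reality (AR)"
    AUGMENTED_VIRTUALITY = "Augmented Virtuality (AV)"
    VIRTUAL_ENVIRONMENT = "Virtual Environment (VE)"


def classify(virtual_fraction: float) -> Region:
    """Map a share of virtual content in [0, 1] to a continuum region.

    Assumption: 0.0 means only real objects, 1.0 means only virtual
    objects, and 0.5 is an arbitrary illustrative AR/AV boundary.
    """
    if not 0.0 <= virtual_fraction <= 1.0:
        raise ValueError("virtual_fraction must lie in [0, 1]")
    if virtual_fraction == 0.0:
        return Region.REAL_ENVIRONMENT
    if virtual_fraction == 1.0:
        return Region.VIRTUAL_ENVIRONMENT
    if virtual_fraction < 0.5:
        return Region.AUGMENTED_REALITY
    return Region.AUGMENTED_VIRTUALITY


def is_mixed_reality(region: Region) -> bool:
    """MR covers everything strictly between the two end points."""
    return region in (Region.AUGMENTED_REALITY, Region.AUGMENTED_VIRTUALITY)


for value in (0.0, 0.2, 0.8, 1.0):
    region = classify(value)
    print(value, region.value, "MR" if is_mixed_reality(region) else "not MR")
```

The single scalar input also makes the model's main limitation visible: anything not expressible as a position on one axis falls outside the classification.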

4.2. Original Definition of Mixed Reality

Milgram and his colleagues originally defined a Mixed Reality (MR) environment as one in which real-world and virtual-world objects are presented together within a single display [8]. This means that MR encompasses both AR and AV and spans the spectrum between purely real and purely virtual environments [8,12,33,40]. This integration can be achieved in several ways:
  • Virtual object superimposition: Virtual objects are superimposed onto the real world through optical-see-through or video passthrough display technology [12,41].
  • Real-world integration: Real-world content can also be integrated into the virtual world by embedding live video streams or integrating tracked haptic objects into the virtual experience [41]. This concept of mixing real and virtual worlds is widely used in industries such as broadcasting, entertainment, audio/video and computer graphics [12,41].

4.3. Historic Significance

The Reality-Virtuality Continuum has been used as a seminal framework for VR and AR research and development since its inception. It is also known as one of the landmark works in the field due to its far-reaching impact. It provides researchers with a common language and taxonomy to distinguish and discuss different types of immersive experiences [33,41]. However, despite the widespread acceptance and application of the model, researchers have largely ignored its accompanying taxonomy over the following 25 years as AR and VR technologies have improved in quality, cost, and popularity (for example, Pokémon GO on mobile phones or head-mounted displays such as the Meta Oculus, HTC Vive, etc.) [41]. The model also suffers from several limitations:
  • Monosensory: The initial version focused solely on visual displays, failing to adequately account for the user’s other senses.
  • Lack of user experience: The influence of the observer, including prior life experience, on the resulting experience is not explicitly considered.
  • Insufficient consideration of consistency: The model characterises content only in terms of realism and does not take into account the consistency of the overall experience.
It is because of these rapid advances in technology and the limitations of the original models that Skarbez, Smith and Whitton point out that it is time to revisit these core concepts [41]. They propose that the RV Continuum is actually discontinuous (arguing that perfect VR is unattainable and that MR is broader than previously thought, even including traditional VR experiences). Furthermore, Speicher’s survey [33] also revealed that although the RV Continuum is the most popular source of MR concepts, it is far from being a universal definition, reflecting the fragmentation of the MR field. While acknowledging Milgram’s seminal contribution, these follow-up studies have sought to update and extend his framework to accommodate evolving technological and user-experience needs.
The limitations of the original RV Continuum are not merely theoretical concerns but have manifested as significant empirical challenges within the broader XR research ecosystem. A large-scale tertiary review by Becker and Freitas, which analysed 81 systematic reviews across diverse application domains, provides compelling evidence for this fragmentation [52]. Their study found a striking lack of consistency in how XR technologies are defined and classified in the literature. A significant number of reviews omitted explicit definitions altogether, and among those that provided them, considerable variation was observed. While foundational works like [8,17] were the most frequently cited, their frameworks were often not systematically applied to categorise or describe the primary studies under review [52]. This empirical evidence underscores the real-world consequences of the model’s limitations: the absence of a clear, universally applied taxonomy severely impedes the comparison, synthesis, and cumulative progress of research across different fields. This widespread terminological ambiguity, empirically documented at the macro level, powerfully validates the necessity of subsequent theoretical revisions and multidimensional extensions to the foundational continuum that the field has pursued.

5. Theoretical Evolution

As mentioned earlier, although Milgram’s RV Continuum has served as a cornerstone for the field’s development, it inevitably carries historical limitations. This chapter systematically reviews the first major wave of academic revisions that challenged these limitations, thereby directly addressing RQ1 (the evolution of definitions), RQ2 (key theoretical frameworks), and primarily RQ3 (critiques and extensions of the RV Continuum). These developments have led scholars to revisit, critique, and extend this classic model from different perspectives, aiming to construct a theoretical framework that better reflects contemporary technological realities, user experience, and multisensory integration. Notably, this evolution includes Skarbez et al.’s arguments for discontinuity and the introduction of Coherence [41], as well as Mann’s broader Mediated Reality framework [12]. The 2022 xReality framework by Rauschnabel et al. classifies Augmented Reality (AR) and Virtual Reality (VR) using distinct criteria based on expert opinion [7]. AR is classified using a continuum focused on the user’s experience within their local environment (local presence). In contrast, VR is classified using a continuum focused on the degree of immersion and the sense of being elsewhere (telepresence).

5.1. Beyond a Single Visual Dimension

While Milgram and Kishino’s original RV Continuum provided the field’s foundational topography, contemporary research challenges its core assumption of a linear, continuous progression driven by visual display technology. The original model implies that as Reproduction Fidelity (RF) increases [22], experience smoothly transitions from AR to a perfect Virtual Reality. However, Skarbez, Smith, and Whitton (2021) offer a critical correction to this view, arguing that the continuum is fundamentally discontinuous [29]. They contend that “perfect VR” is theoretically unattainable with current paradigms because displays can only manipulate exteroceptive senses (vision, audio) but cannot control interoceptive senses (vestibular, proprioception). This sensory mismatch creates a “discontinuity” at the extreme right of the spectrum—a gap between technologically mediated reality and the “Matrix-like” neural stimulation required for total immersion [41,54]. This theoretical rupture necessitates a shift in analytical dimensions from hardware capabilities to perceptual logic. As Skarbez et al. note, traditional metrics such as the Extent of World Knowledge (EWK) and RF focus on the display’s technical capacity [29]. In contrast, the modern framework introduces Coherence—defined as the extent to which the scenario adheres to internal physical or contextual validity—as a distinct dimension from visual realism. This distinction is empirically supported by Brübach et al. [47], who demonstrated that breaks in plausibility (e.g., gravity-defying objects) function independently of breaks in presence, confirming the multi-layer nature of XR perception.
Beyond purely visual parameters, the “visual-centric” limitation of Milgram’s model has been addressed by integrating sensorimotor dimensions. Vatavu (2022) expanded the theoretical scope by proposing “Sensorimotor Realities (SRs),” a framework that moves beyond visual superimposition to include the mediation of human motor abilities and sensory inputs [5]. This synthesis of Skarbez’s coherence-driven discontinuity and Vatavu’s sensorimotor mediation indicates a consensus in modern scholarship: the definition of XR has evolved from a linear spectrum of display technologies to a multi-dimensional space of perceptual validity.

5.2. A Broader Framework—Mediated Reality

While Milgram’s continuum focuses on the additive mixing of virtual objects into real scenes (or vice versa), Steve Mann’s theory of Mediated Reality offers a competing ontological framework based on modulation. Mann argues that the RV Continuum is essentially a subset of a larger “Reality, Virtuality, Mediality” space (Figure 5 and Figure 6) [12].
The critical distinction lies in the concept of Diminished Reality (DR). Milgram’s model serves well to describe Augmented Reality (AR)—superimposing information—but struggles to account for the subtractive modification of perception. Mann’s framework explicitly addresses this by treating XR as a filter that can not only add signal but also block or “diminish” reality (e.g., the HOLZER system for obliterating visual noise like billboards) [66]. This posits that the core function of XR is not merely mixing worlds but computing perception.
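The contrast between additive mixing and subtractive mediation can be illustrated with a toy model in which a scene is simply a set of object labels. This is a conceptual sketch only: the function names `augment`, `diminish`, and `mediate` and the set-based scene representation are assumptions made for illustration, not Mann's formalism or any system's API.

```python
from __future__ import annotations


def augment(scene: frozenset[str], virtual_objects: set[str]) -> frozenset[str]:
    """Additive mixing: superimpose virtual objects on the scene."""
    return scene | frozenset(virtual_objects)


def diminish(scene: frozenset[str], blocked: set[str]) -> frozenset[str]:
    """Subtractive mediation: filter selected objects out of perception."""
    return scene - frozenset(blocked)


def mediate(
    scene: frozenset[str],
    add: set[str] = frozenset(),
    remove: set[str] = frozenset(),
) -> frozenset[str]:
    """Apply both operations; pure AR is the special case remove == {}."""
    return augment(diminish(scene, remove), add)


street = frozenset({"road", "pedestrian", "billboard"})
perceived = mediate(street, add={"navigation arrow"}, remove={"billboard"})
print(sorted(perceived))
```

In this toy model, an additive-only framework corresponds to calling `mediate` with an empty `remove` set, which is why it cannot express the diminished case.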
This framework has recently been extended to incorporate social and ontological dimensions. Building on Mann’s tripartite axis of “Physical, Virtual, and Social” (as shown in Figure 7), Gittens proposed the Dyadic-XV Taxonomy [58]. Unlike previous models that classify technologies, this taxonomy classifies beings (e.g., Digital Beings, NPCs, Avatars) and their social agency within the mediated space. This evolution from Mann’s perceptual modulation to Gittens’ social ontology highlights a significant expansion of the field: XR is no longer viewed solely as a “viewing” technology (as defined by Milgram) but as a “living” environment defined by sociality and interactivity. Thus, Mediated Reality provides a more robust theoretical container for emerging Metaverse concepts than the traditional linear continuum.

5.3. Solidifying a Key Point on the Continuum

While the theoretical framework of MR is evolving, so is the precise definition of key nodes on the continuum, especially the important concept of AR. Although Milgram and Kishino [8] introduced the concept of AR in their continuum and defined it as a situation in which the real world is augmented with virtual (computer-generated) objects, subsequent research has further refined it. Steven Feiner’s KARMA (Knowledge-based Augmented Reality for Maintenance Assistance) system, developed in 1993 [15], provided an early and critical example of Augmented Reality in practice. The KARMA system uses a see-through head-mounted display to superimpose computer-generated graphical information onto the user’s real-world field of view, aiding maintenance and repair tasks. The system requires that graphics be accurately superimposed on nearby real objects, directly illustrating AR’s need for accurate 3D registration between the virtual and real worlds. By superimposing additional information on real objects, the KARMA system demonstrates how AR can augment the real world and hints at the possibility of real-time interaction between the user and this augmented information. Building on such systems, Azuma [17] provided a widely accepted and more precise theoretical definition of AR in his landmark 1997 review, “A Survey of Augmented Reality”. He views AR as a variant of virtual environments (VE), but unlike fully immersive VE, AR allows users to see the real world while virtual objects are superimposed on it. Azuma’s definition implicitly encompasses three classic principles of AR, and these guidelines have laid the foundation for subsequent AR research:
1. Combination of Real and Virtual Worlds: AR can blend virtual objects (often computer-generated) with real environments.
2. Real-time Interaction: Users are able to interact with superimposed virtual information in real time. Although not directly listed in Azuma’s review, he discusses the construction of AR systems in his article, including the biggest challenges facing real-time systems, such as “registration and sensing.” Earlier studies he cites (e.g., [15]) also emphasise the system’s real-time and interactive nature.
3. 3D Registration: Virtual objects are precisely aligned with the real environment in three dimensions, so that they appear to coexist in the same space. Azuma specifies that, ideally, virtual and real objects would appear in the same space (for example, a virtual light fixture covering a real table, and a real table covering part of two virtual chairs) and lists “registration” as one of the biggest problems in building an AR system.
This body of work gave Augmented Reality a clear academic identity as a solid node on the RV continuum that could be measured and realised, providing a solid foundation for all subsequent AR research and applications and continuing to play a guiding role in the development of the technology [12,17,33,54,56].
Meanwhile, concrete AR application cases have begun to emerge, validating these theoretical principles and putting them into practice. For instance, the miBook system developed by Dias et al. serves as a prime example of AR applied to enhance learning [25]. By overlaying 3D models onto physical books, this system perfectly embodies Azuma’s concepts of blending virtual and real elements with 3D registration. Its successful implementation as an educational tool not only solidifies AR’s pivotal position within the reality-virtuality continuum but also foreshadows the immense potential for XR technology to deeply penetrate vertical industries.

5.4. Mathematical Formalisation of Multiple Reality-Virtuality Continua

The theoretical expansion of the RV Continuum has recently culminated in systematic attempts to mathematically formalise the combination of multiple realities. Pamparau conducted a systematic review of 50 theoretical contributions related to Milgram’s RVC, categorising them into three types: extensions (e.g., Skarbez et al.’s discontinuity argument), integrations (e.g., Mann’s Mediated Reality), and analogies [51]. Building on this synthesis, the author proposed a mathematical formalisation for combining multiple RV Continua, expressed as:
$$F_i : RV_1 \circ RV_2 \circ \cdots \circ RV_i \rightarrow RV$$
where the operation ‘∘’ represents the combination of different reality experiences [51]. This framework enables the conceptualisation of complex XR systems that integrate multiple reality dimensions simultaneously. Furthermore, Pamparau introduced the concept of an “XR transition protocol,” defined as a set of engineering details employed by XR transitional interfaces, which provides the technical foundation for seamless transitions between different reality states [51]. This mathematical formalisation represents a significant step toward operationalising the theoretical constructs of the RV Continuum, bridging the gap between conceptual frameworks and practical implementation.
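As a purely illustrative sketch of how such a combination could be represented in software, the Python fragment below folds a binary combination operator over a sequence of RV experiences. The review does not specify the semantics of ‘∘’, so everything here is an assumption introduced for illustration rather than Pamparau’s definition: the hypothetical `RVState` type, the scalar `virtuality` in [0, 1], the `weight` field, and the weighted-mean rule in `combine`.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class RVState:
    """Hypothetical point on one RV continuum (0.0 = fully real, 1.0 = fully virtual)."""
    virtuality: float
    weight: float = 1.0  # assumed relative share of the overall experience


def combine(a: RVState, b: RVState) -> RVState:
    """Assumed stand-in for the combination operator: a weighted mean of two states."""
    total = a.weight + b.weight
    mixed = (a.virtuality * a.weight + b.virtuality * b.weight) / total
    return RVState(virtuality=mixed, weight=total)


def combine_all(states: list[RVState]) -> RVState:
    """Fold the binary operator over RV_1 ... RV_i to obtain a single RV state."""
    if not states:
        raise ValueError("at least one RV state is required")
    return reduce(combine, states)


# Three hypothetical experiences: fully real, mixed, fully virtual.
result = combine_all([RVState(0.0), RVState(0.5), RVState(1.0)])
```

The weighted mean is chosen only because it is associative, so the result does not depend on how the fold is grouped. Any other associative rule could be substituted without changing the overall structure.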

5.5. Cross-Reality as an Application Framework for the RV Continuum

The theoretical expansions of the RV Continuum are not merely academic exercises; they provide a foundational framework for solving concrete, practical challenges in data analysis. A compelling example is found in the work of Gall et al., who address the problem of “visualisation mirages”, cognitive biases and misinterpretations arising from uncertainties introduced by visualisation techniques themselves [54]. Their proposed cross-reality system operationalises the RV Continuum by integrating a 2D desktop environment for precise overview tasks with an AR head-mounted display for immersive, 3D exploration of uncertainty. In this setup, users can “pull” a 2D distribution chart from the screen into the AR space, where it expands into a 3D curve boxplot that visually encodes the range of possible representations under different parameters [54]. This approach directly leverages the distinct strengths of different points on the continuum: the precision and familiarity of a 2D display, combined with the spatial, natural interaction capabilities of AR, to reveal hidden dimensions of the data. This case exemplifies how the multidimensional understanding of XR, moving beyond Milgram’s original visual-display focus, enables the creation of complementary interfaces that enhance analytical rigour without disrupting the user’s workflow.

6. Contemporary Academic Perspectives

This chapter provides a systematic answer to RQ2 (key theoretical frameworks) and RQ3 (critiques and extensions of the RV Continuum). As the 21st century enters its second decade, contemporary scholars have begun to propose more comprehensive and refined categorisation frameworks as XR technology continues to mature. These new frameworks address the limitations of the original RV Continuum by incorporating complex factors such as user subjective experience (Dimension III in Table 2) and objective system characteristics (Dimension II in Table 2). We review this transition by examining key multidimensional frameworks and user-centric theories.

6.1. Speicher’s Multi-Dimensional Descriptive Framework

Speicher et al. were motivated by a central problem: even within the community of experts in the AR/VR field, there is significant fragmentation and inconsistency in the definition of “mixed reality” [33]. Through expert interviews and a literature review, they found that although Milgram’s RV Continuum is widely cited, it focuses primarily on visual displays and does not fully encompass emerging multi-user or multi-environment MR experiences. This definitional ambiguity has made discussions in the MR field increasingly difficult. To address this issue, they propose a seven-dimensional conceptual framework that aims to describe and classify MR applications more accurately, thereby reducing confusion and providing a more complete picture of the MR space. The seven core dimensions are:
1.
Number of Environments: Refers to the number of physical and virtual environments required for a particular MR type. For example, if the AR user and the VR user are in the same room, the VR experience would be considered a separate environment.
2.
Number of Users: Refers to the number of users required for a particular MR type. While more than one user is not strictly required in all cases, it is necessary for certain types of collaborative MRs.
3.
Level of Immersion: Degree to which the system is immersive.
4.
Level of Virtuality: Proportion or nature of virtual content in the environment.
5.
Degree of Interaction: Degree of user interaction with virtual or real objects in the environment, ranging from completely passive (e.g., viewing 360-degree photos) to active manipulation (e.g., gesture control).
6.
Input: The way the system receives user or environmental data.
7.
Output: The way the system presents information to the user.
Speicher et al. argued that these seven dimensions allow for a more comprehensive capture and characterisation of the objective aspects of the MR experience, which marks a shift from “qualitative localisation” to “quantitative localisation” in the study of MR taxonomies [33]. The utility of such a multidimensional descriptive framework is vividly illustrated by the CoLT system (Tahmid et al., 2023), a collaborative literature review platform that enables users to interact with a shared virtual workspace from points across the RV Continuum, including PCs, AR, and VR [55]. CoLT directly embodies several of Speicher’s dimensions, such as the number of users (supporting colocated and remote collaborators), level of immersion, and degree of interaction, while also introducing the critical temporal dimension of synchronous and asynchronous collaboration [55]. This system demonstrates how contemporary frameworks provide the necessary vocabulary and structure for designing and analysing complex, real-world XR applications that support diverse user needs and workflows. Speicher’s framework makes it clear that it is not intended to replace the Milgram RV continuum, but rather to provide a richer “descriptor set” [33]. It allows for more detailed classification and localisation of any point on the Milgram continuum. The study found that although Milgram’s continuum remained the most popular source of MR definitions, it was cited by only a little over a third of the literature reviewed, highlighting the definitional fragmentation of the MR field.
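To make the idea of a “descriptor set” concrete, the seven dimensions can be sketched as a simple record type. The Python fragment below is a minimal illustration, not part of Speicher et al.’s framework: the `MRDescriptor` and `Level` names, the three-step ordinal scale, and both example applications are hypothetical and chosen only to show how two MR applications could be compared dimension by dimension.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Level(Enum):
    """Assumed ordinal scale; the values are illustrative, not prescribed by the framework."""
    NONE = 0
    PARTIAL = 1
    FULL = 2


@dataclass(frozen=True)
class MRDescriptor:
    """One MR application described along the seven dimensions."""
    number_of_environments: int
    number_of_users: int
    level_of_immersion: Level
    level_of_virtuality: Level
    degree_of_interaction: Level
    inputs: frozenset
    outputs: frozenset


def differing_dimensions(a: MRDescriptor, b: MRDescriptor) -> list:
    """Return the names of the dimensions on which two descriptors differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical applications, not taken from any reviewed study.
handheld_ar = MRDescriptor(1, 1, Level.PARTIAL, Level.PARTIAL, Level.PARTIAL,
                           frozenset({"touch", "camera"}), frozenset({"visual"}))
shared_vr = MRDescriptor(1, 2, Level.FULL, Level.FULL, Level.FULL,
                         frozenset({"controllers"}), frozenset({"visual", "audio"}))

print(differing_dimensions(handheld_ar, shared_vr))
```

Describing applications this way leaves their position on Milgram’s continuum untouched and simply attaches additional, comparable attributes to it, which is the role the framework claims for itself.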

6.2. Gittens’ Ontological Perspective: A New Dimension of “Being”

In the evolution of research on XR taxonomies, Gittens proposed the Dyadic-XV Taxonomy, an innovation that steps outside the traditional framing of display technology or user experience [58]. Gittens frames his work as a categorisation of “Beings” in the XR world, a radical departure from previous taxonomies, which have focused on human perception or system capabilities.
Gittens’ central concept is that his Dyadic-XV taxonomy is constructed on the basis of “Digital Beings” and their relationship to the dimension of “Sociality.” The taxonomy aims to describe and classify the two-way relationship that exists between humans and digital beings, and to identify existing and potential new types of digital beings. These digital beings include virtual humans, non-player characters (NPCs), robots, and avatars that provide rich, immersive, and engaging experiences in virtual worlds [58].
Gittens’ theory is heavily inspired by Steve Mann’s eXtended meta-uni-omni-Verse (XV) taxonomy. Mann’s XV taxonomy situates VR, AR, and the Metaverse within a three-dimensional space comprising Physical Reality, Virtuality, and Sociality [58,61,64]. Gittens adopts this three-dimensional space as the theoretical basis for his Dyadic-XV taxonomy, with particular emphasis on the “native lifeforms” of the Virtuality dimension, which traditional research has largely overlooked. He notes that Skarbez et al. [41], in their revision of Milgram and Kishino’s continuum, also constructed a new categorical space that incorporates familiar concepts such as “presence” and “immersion” and introduces new ones.
Gittens’ contribution is to provide a unique ontological perspective on XR taxonomy. It forces us to think about “what exists in XR” rather than just “what we see” or “how we interact”. This perspective is expected to provide valuable resources and reference points for designers of robots, VR worlds, games and social simulations to develop richer, more immersive systems [61].

6.3. The User-Centric Dimension: Presence, Plausibility, and Congruence

Whether it is Speicher’s multidimensional framework or Gittens’ social taxonomy, there is a common trend: XR taxonomies are increasingly focusing on users’ internal feelings and experiences. This focus on subjective experience requires a nuanced understanding of core concepts such as “Presence.” Moving beyond the traditional goal of maximising presence (fully immersing the user in the VE), recent research emphasises achieving optimal presence, a balanced state where users can maintain awareness of and interact with the real world as needed for safety and social context [37]. This paradigm shift is crucial for designing XR systems for dynamic, co-located settings, and it reframes the design goal from creating isolated experiences to facilitating fluid movement across the reality-virtuality continuum.
A review by Schuemie et al. identifies “Presence” as the classic core concept for assessing VR experiences [20]. They state that Presence (as a shorthand for “telepresence”) is a mental state or subjective perception in which an individual’s perception fails to accurately identify the role of technology in an experience, despite the fact that some or all of the current experience is generated and/or filtered by artificial technology. The most common definition is the feeling of “being there”. Schuemie et al.’s study also highlights that factors affecting the sense of presence include both system characteristics (e.g., high quality, high resolution information, consistency between displays, interactivity with the environment, virtual body performance) and user characteristics (e.g., individual differences, predisposition to immersive experiences) [20].
Building on this, Latoschik and Wienrich [45] further developed the concept of presence and proposed models of “Plausibility” and “Congruence” that are more applicable to MR [22,23]. Their research suggests that the traditional concept of Place Illusion (PI) in VR, i.e., the feeling of being “in a place” [29,53], becomes ambiguous in MR, as users are always more or less aware of their presence in the MR experience [54].
  • Plausibility Illusion (Psi): Latoschik and Wienrich [45] emphasise that Plausibility Illusion (Psi) becomes particularly important in MR experiences. Slater [23] defines Psi as “the illusion that what is apparently happening is really happening (even though you know for sure that it is not)”. Slater [23] further explains that plausibility refers to “overall credibility of the scenario being depicted in comparison with expectations” and is assessed on a cognitive level. Latoschik and Wienrich [45] point out that inconsistencies in different information processing layers (e.g., sensory/perceptual and cognitive) can lead to a breakdown of plausibility. For example, virtual objects that exhibit irrational behaviour in a real environment (e.g., floating rather than landing on a surface) can undermine external rationality [41].
  • Coherence: This notion of “coherence” used in the context of user perception echoes the system-capability dimension of “Coherence” (CO) proposed by Skarbez et al. [41]. Latoschik and Wienrich emphasise that perceived incoherence significantly reduces the perceived plausibility of an object and of the situation [54]. It can be argued that “coherence”, as explored by Latoschik and Wienrich at the level of the user’s psychological perception, and “coherence”, as proposed by Skarbez et al. at the level of the system’s capabilities, constitute two sides of the same core idea: the integrated virtual and real elements must follow a plausible set of internal logics [54].
The contemporary XR taxonomy’s emphasis on “presence” can be traced back to Marvin Minsky’s 1980 exploration of “telepresence” [67]. Subsequently, Jonathan Steuer formally introduced the concept of “presence” to computer-mediated environments in 1992 [14]. Systematic research in this area matured with the establishment of the International Society for Presence Research (ISPR) in 2000 [20] and gave rise to key metrics, such as the ITC-SOPI questionnaire [19]. Slater further deconstructed it into “Place Illusion (PI)” and “Plausibility Illusion (Psi)” [23], providing a direct theoretical springboard for our understanding of “Plausibility” in MR today [41].
While traditional taxonomies focus on describing the macro-level positioning and user perception of XR experiences, emerging research is beginning to explore system design patterns that transcend reality boundaries, thereby addressing complexity in practice. Cools et al. proposed 11 cross-reality (CR) system design patterns through a systematic synthesis of 60 literature sources [63]. CR systems are defined as those operating between distinct points along the reality-virtuality continuum. The study treats MR as an overarching classification system encompassing all points across the entire RV continuum and establishes a framework describing how these systems connect [63]. These patterns are categorized into four core types: (1) Foundational Patterns, focusing on connection mechanisms between realities, including defining the scope of data transfer boundaries (e.g., volume, surface, person, or object) and describing the directionality of information flow (e.g., unidirectional, bidirectional, or composite); (2) Source Patterns; (3) Display Patterns; and (4) Interaction Patterns. This focus on design patterns extends XR theory beyond static technological classifications to a systematic description of dynamic connections between realities, providing practical tools and design guidelines for constructing complex multi-user or multi-environment systems.
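The two Foundational Patterns named above (transfer boundary and directionality of information flow) lend themselves to a small data model. The Python sketch below is a hypothetical illustration, not Cools et al.’s notation: the `CrossRealityLink` type, its `allows` method, the endpoint labels, and the rule that bidirectional and composite links both permit flow in either direction are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Boundary(Enum):
    """Scope of the data transfer boundary, using the four examples named in the text."""
    VOLUME = "volume"
    SURFACE = "surface"
    PERSON = "person"
    OBJECT = "object"


class Direction(Enum):
    """Directionality of information flow between two realities."""
    UNIDIRECTIONAL = "unidirectional"
    BIDIRECTIONAL = "bidirectional"
    COMPOSITE = "composite"


@dataclass(frozen=True)
class CrossRealityLink:
    """Hypothetical description of one connection between two points on the RV continuum."""
    source: str
    target: str
    boundary: Boundary
    direction: Direction

    def allows(self, origin: str, destination: str) -> bool:
        """Check whether information may flow from origin to destination over this link."""
        forward = (origin, destination) == (self.source, self.target)
        backward = (origin, destination) == (self.target, self.source)
        if self.direction is Direction.UNIDIRECTIONAL:
            return forward
        # Assumption: bidirectional and composite links both permit either direction.
        return forward or backward


# Illustrative link loosely modelled on pulling a chart from a desktop screen into AR;
# the choice of a SURFACE boundary and a one-way flow is an assumption.
link = CrossRealityLink("desktop", "ar_hmd", Boundary.SURFACE, Direction.UNIDIRECTIONAL)
print(link.allows("desktop", "ar_hmd"), link.allows("ar_hmd", "desktop"))
```

Making the boundary and direction explicit fields, rather than implicit properties of a device, mirrors the shift the patterns describe, from classifying static technologies to describing the connections between realities.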
The value of the user experience dimension has been repeatedly validated in specific XR application research. For example, in evaluating the educational AR tool miBook, researchers used factor analysis to identify three core user experience benefits: creativity, realism, and accessibility [25]. Among these, realism directly corresponds to the academic concept of the illusion of reality, defined as the degree to which virtual content aligns with real-world rules and user expectations. Accessibility, meanwhile, pertains to the ease of user interaction within blended environments, influencing the establishment of presence [25]. Furthermore, the dimension of cognitive load provides a crucial lens for evaluating XR experiences. A systematic review by Buchner et al. synthesises findings from 58 studies, revealing that the impact of AR on cognitive load is not monolithic but is significantly mediated by the type of AR and the instructional design [46]. For instance, spatial AR consistently demonstrates advantages in reducing cognitive load and enhancing performance in procedural tasks compared to see-through AR, which can introduce extraneous load due to the device itself. The review also highlights a critical shift in research paradigms: while media-comparison studies (e.g., AR vs. traditional textbooks) dominate the literature, value-added studies, which compare different AR implementations, offer more actionable insights. These studies confirm that integrating design principles from cognitive load theory, such as visual cues and generative learning strategies (e.g., note-taking), can effectively manage cognitive load and optimise learning outcomes [46]. This underscores that the efficacy of an XR system is co-determined by its technological capabilities and its pedagogical design.
This research demonstrates that even in early-stage, technologically simpler AR applications, user-centred design and evaluation dimensions have become critical benchmarks for measuring success.
Achieving the high degree of perceptual coherence and plausibility demanded by contemporary XR systems is fundamentally dependent on underlying AI technologies. Cortes et al. conducted a systematic review of the application of Convolutional Neural Networks (CNNs) in XR (VR, AR, MR, and XR treated as generic descriptors) [59]. They proposed a functional classification spanning three core areas, demonstrating how CNNs enable the fidelity required for robust user experience [59]:
1.
Execution (43%): Focuses on application implementation and performance optimization [59]. This includes 360-degree content processing and analysis, object detection and tracking, and advanced scene recognition, all critical for maintaining low latency and system efficiency.
2.
Interaction (33%): Enables natural and fluid user input. Applications range from Human-Computer Interaction (HCI) and gesture recognition to Foveated and Ocular Visualisation (e.g., eye tracking for rendering optimisation). These functions directly support the real-time engagement and sense of Congruence in XR.
3.
Creation (24%): Involves the generation of content and visualisation elements. Key subcategories are 3D reconstruction and modelling, and point cloud processing. These technologies provide the necessary high-fidelity graphical assets to support external Rationality and Plausibility Illusion.
The value of these works is that they provide XR taxonomies with user-centred dimensions that can be measured by psychology and cognitive science. The merit of an XR experience depends not only on its technical parameters but also on its ability to create a believable, consistent, and immersive perception in the user’s mind [38,54]. For example, by using devices such as the Varjo XR-3, scholars were able to compare the perceptual differences between VR and video see-through AR conditions and discover their impact on spatial presence and plausibility ratings. This further demonstrates the need to assess the XR experience from the user’s perspective.

6.4. Merino et al.’s Systematic Framework for MR/AR Evaluation Methods [35]

Reflecting on the evaluation practices within the MR/AR field itself is another sign of its maturation. Merino et al. (2020) provide the most comprehensive framework to date through their systematic review [35]. Their work transcends singular conceptual definitions to systematically map research types, evaluation scenarios (ranging from algorithmic performance to team collaboration), and cognitive dimensions (such as presence, plausibility, and cognitive load). The 43 data collection methods they identified provide a detailed methodological toolkit for both quantitative and qualitative evaluation of XR experiences [35]. This research not only confirms the multidimensional trend proposed by Speicher et al. but further concretises it at the level of evaluation practice, emphasising that rigorous, multidimensional assessment methods are crucial for understanding and advancing XR technology.

6.5. A Trend Toward Unified Classification Standards

A systematic review of existing taxonomies and concepts in this chapter demonstrates that the classification paradigm in the XR field has clearly transitioned from a single-dimensional (technology-driven) approach to a multidimensional theoretical space (user experience- and system-attribute-driven). These dimensions, including subjective perceptual factors (such as coherence and plausibility), objective system characteristics (such as latency and field of view), and macro-contextual elements (such as sociality and collaboration), collectively form the current academic “analytical toolkit” for precisely describing and analysing any XR experience. This dimensional framework is not intended to establish a new singular classification model, but rather to provide researchers with a comprehensive set of variables to address the high complexity and contextual dependency of XR systems. This forms the basis for our subsequent discussion of the divergence between academic and industrial approaches.
While XR taxonomies are evolving towards greater multidimensionality, scholars have also begun to conduct systematic analyses of the review literature itself. Nikolaidis (2022) performed an “umbrella review” of 47 AR survey articles, revealing systematic biases in the field’s research coverage [49]. The study found that healthcare (41%) and education (36%) dominate the AR review landscape, while areas such as industry, robotics, and tourism remain underrepresented in systematic reviews. More significantly, the author proposed a unified taxonomy comprising ten criteria: hardware, field of interest, method, aim, main outcomes, sample, software, tracking, limitations, and modalities, providing a standardised framework for future XR research [49]. This work not only reinforces the value of multidimensional frameworks like Speicher’s but also operationalises them into actionable classification criteria, reflecting the XR field’s maturation from conceptual debates toward standardised, structured description.

7. Beyond Academia: Industry and Public Perceptions

This section provides a selective, non-exhaustive view of industry terminology. We focus on Meta and Varjo as two well-documented and influential cases with explicit public-facing definitions. However, industry usage is substantially more heterogeneous than these two examples alone might suggest: different platform ecosystems and product strategies often drive distinct naming choices and definitional boundaries that can diverge from academic taxonomies.
To answer RQ4, this chapter compares academic classifications with the terminologies and product strategies used by leading XR companies. The previous sections synthesised academic theoretical frameworks, while this section reveals how industry pragmatically reshapes these academic concepts according to its product and market strategies, which leads to significant definitional differences between theory and practice.
This section is structured around two representative industry leaders: Meta and Varjo. These two companies were specifically chosen because they exemplify the two most significant, divergent strategic directions in the contemporary XR landscape:
  • Meta represents the social ecosystem/mass market, whose definition and technology are primarily driven by the vision of an accessible, social Metaverse [44].
  • Varjo represents the professional high-fidelity segment, whose definitions are tied to achieving extreme precision and realism for high-end industrial and professional applications [43,54].
The comparison between Meta’s social-oriented approach and Varjo’s technology/fidelity-oriented approach highlights the fundamental tension between the broad philosophical nature of academic theory and the necessary precision of commercial product strategy, which is essential for understanding the definitional ambiguities persisting in the field.

7.1. Meta (Industry Framing of MR)

Meta’s understanding and application of XR is deeply rooted in its vision of a grand “metaverse” [44]. The core of Meta’s metaverse lies in social connectivity, with the goal of constructing a three-dimensional virtual world in which user avatars can engage in political, economic, social, and cultural activities, enabling the coexistence of the virtual and the real and the creation of value. Meta’s vision of a social metaverse in which everyone can participate can in a sense be seen as a modern, commercialised version of Ivan Sutherland’s “Ultimate Display” [65], conceived more than half a century ago, and its quest for multi-sensory experiences echoes Morton Heilig’s 1962 “Sensorama” idea of integrating senses such as smell [68]. With this vision in mind, Meta’s definition of MR is notably pragmatic, closely tied to its Quest series hardware and video see-through technology [69]. Meta is committed to seamlessly integrating digital content into users’ real-world spaces, for example, playing a virtual board game on a table in one’s living room. Through deep learning-driven high-precision recognition and naturally occurring models, Meta seeks to create a more immersive environment and natural interaction experience. In addition, the social significance of its metaverse is further strengthened by the use of mobile devices for anytime, anywhere access and virtual currency as an economic bridge between the metaverse and the real world. As for VR, Meta sees it as a fully immersive gateway to its social metaverse (e.g., Horizon Worlds and Infinite Office) [44]. In Meta’s context, VR technology aims to completely isolate users from the perception of the external real environment, thereby enabling full attention and immersion in the virtual world [40,44]. These definitions and technological paths clearly serve Meta’s platform and ecosystem strategy: the company wants to build a next-generation Internet that is accessible to all, driven by ever-intensifying social activity and content creation.

7.2. Varjo (Industry Framing of MR)

In contrast to Meta’s vision of a pervasive social metaverse, Varjo is positioning its XR technology in the market for high-end, professional, and industrial-grade applications [3,69,70]. Varjo’s products are designed to meet the needs of key players in industries such as government and defence, aerospace, education, healthcare and automotive. In its public-facing materials, Varjo frames mixed reality (MR) as an experience enabled by the highest-fidelity video see-through technology. Devices such as the Varjo XR-3 achieve this through a see-through mode enabled by dual 12-megapixel front-facing cameras and a unique dual-display architecture: 70 pixels per degree in the focal area and 30 pixels per degree in the peripheral area. Together, these features allow virtual content to blend seamlessly with the real world, appearing “true to life” [48]. Varjo emphasises the user’s ability to clearly see the difference between knitted and printed fabrics or to compare the effect of gloss on surface appearance, which demonstrates its dedication to reproducing the finest detail. In addition, the Varjo XR-4 series introduces the “world’s first gaze-driven XR autofocus camera system,” further enhancing the sense of reality [70]. In Varjo’s understanding, MR aims to create “environments where real and virtual components are seamlessly integrated and interact in a natural way” and achieve a high level of local presence [64]. This means that in Varjo’s virtual environment, users can see minute details. Varjo’s motivation is to deliver superior business value and return on investment (ROI) to its customers. For example, maximising the ROI of a VR crane simulation for the steel industry through its VR technology, or achieving Federal Aviation Administration (FAA) and European Union Aviation Safety Agency (EASA) VR/XR flight training certification, demonstrates that its customers need the ultimate in accuracy and realism to do high-value work.
In Virtual Reality (VR), Varjo’s core technology is defined by its signature “human-eye resolution” [69].

7.3. Broader Industry Terminology Beyond the Two Case Examples

Although Section 7.1 and Section 7.2 focus on Meta and Varjo as two well-documented cases, industry terminology is substantially more heterogeneous than any two examples can represent. In practice, labels such as “MR” or adjacent framings operate as strategic boundary objects, shaped by ecosystem positioning, interaction modality, and target use cases rather than by a shared academic taxonomy. For instance, Microsoft’s Windows Mixed Reality documentation explicitly defines mixed reality as “a blend of physical and digital worlds” and presents it as a platform umbrella grounded in vision, display, and input advances [71]. Apple largely avoids AR/VR/MR labels in public positioning and instead frames Vision Pro as a “spatial computer,” foregrounding “spatial computing” as the organising concept [72]. Magic Leap, by contrast, emphasises “see-through AR” and optics-centric capabilities, anchoring its framing more narrowly to optical see-through enterprise AR [73].

7.4. Public Perceptions of XR Terminology

Beyond the academic-industry divide, a third layer of definitional complexity emerges from the perspective of the general public. Survey data suggest that public understandings of AR tend to foreground accessibility and familiar metaphors (e.g., smartphone-based experiences such as Pokémon Go) over technical precision [6]. Whereas industrial framings are largely shaped by product strategy and platform positioning, public definitions are more directly shaped by exposure and lived experience, resulting in vernacular, example-driven mental models. This additional layer of interpretation contributes to a tripartite misalignment among (i) broad, theory-oriented academic definitions, (ii) product- and ecosystem-oriented industry framings, and (iii) experience-driven public conceptions—a misalignment that complicates cumulative knowledge building and may slow the field’s maturation and mass adoption.

7.5. Summary and Implications

Convergence. Despite differences in terminology, industrial practice is not completely divorced from academic theory. Many aspects of Meta and Varjo’s implementation of MR—including real-scene understanding, virtual–real interaction, and occlusion handling (e.g., a virtual chair correctly occluded by a physical table)—can be interpreted as practical instantiations of influential academic principles and frameworks [17,27,45]. More broadly, industry’s pursuit of immersion and realism echoes academic work on presence and plausibility: research on breaks in presence and on the consistency of virtual events with user expectations has provided conceptual guidance for improving experiential quality [27].
Divergence. However, significant terminological discrepancies persist. This divergence is not merely semantic; it is systematically shaped by differences in product strategy, target markets, and technical implementation choices (e.g., display architectures and system latency). When combined with the experience-driven mental models observed in public perceptions, these forces produce the tripartite misalignment discussed above. The preceding subsections (Section 7.1, Section 7.2 and Section 7.3) detail how such technological and market drivers reshape conceptual boundaries in practice, thereby clarifying why terminological alignment remains challenging.
To synthesise the above evidence, Table 5 summarises key points of convergence and divergence between academic taxonomies, industry framings, and public perceptions.

8. Discussion

To address the fragmentation of terminology and the theoretical shifts noted throughout this review, we present a concept map in Figure 8. The diagram visualises the field as three hierarchical layers. At the bottom, foundational theories define XR through objective System/Mediation Properties (e.g., Fidelity, Latency). In the middle, contemporary research introduces Perceptual-Cognitive Mechanisms (e.g., Coherence, Plausibility) as the logical filter between hardware and user experience. At the top, modern frameworks expand into Multidimensional Taxonomies that incorporate Sociality and Context. The map aligns key theorists (e.g., Milgram [8], Skarbez [27], Mann [12]) with their respective analytical layers to clarify their relationships and to illustrate the field’s evolutionary trajectory. As the map shows, the focus of XR research has shifted from objective System Properties (such as the Fidelity and Tracking metrics proposed by Milgram and Speicher) towards Perceptual-Cognitive Mechanisms (specifically Coherence and Plausibility, as emphasised by Skarbez and Latoschik [27,45]). This shift suggests that high-fidelity hardware alone is insufficient to define XR; rather, it serves as the foundation for Experience Constructs (such as Presence and Embodiment). Consequently, contemporary definitions have evolved into Multidimensional Taxonomies that incorporate broader dimensions such as Mediality, Sociality, and Context (as seen in the work of Mann and Gittens [12,58]). The concept map thus provides a structured overview of how these distinct theoretical contributions fit together within both academic validation and industry framing.
The systematic review reveals a profound paradigm shift in the conceptualisation of eXtended Reality. This observation aligns with Merino et al.’s analysis [35], which documented a distinct move from technology-centred evaluation to human-centred evaluation since 2016. The MR/AR field has evolved from the singular technological dimension proposed by Milgram into a complex, multidimensional research landscape that centres on the user’s subjective experience.
Figure 2 and Figure 3 provide corpus-level context for interpreting the persistence of terminological fragmentation. Together, they indicate that definitional work has accelerated in recent years while remaining largely negotiated through conceptual and synthesis-oriented contributions. This structure amplifies competition between boundary criteria (technology-mediated virtuality, experiential consistency, and ecosystem framing), helping explain why pressures towards terminological consolidation can coexist with persistent definitional divergence.

8.1. Answering RQ1 and RQ3: The Shift from Monosensory to Multi-Dimensional Extension

The primary driver of this evolution (RQ1) has been the shift from merely describing the technical ability to mix real and virtual content to incorporating subjective user perceptions. Milgram’s initial model was limited by its monosensory, vision-centred scope. Seminal revisions, such as those by Skarbez et al. [27], introduced the crucial concepts of Coherence and Plausibility, arguing that a system’s quality depends not only on realism but also on the internal logic and consistency of the experience. This user-centric dimension (III. User Perception Theory in Table 2) provided a necessary critique and extension to the original continuum (RQ3). Furthermore, Mann’s concept of Mediated Reality and the subsequent eXtended meta-uni-omni-Verse (XV) Framework challenged the RV continuum by introducing Diminished Reality and the dimension of Sociality, marking the progression to macro-contextual and ontological factors (IV. Macro Background and Critique in Table 2).

8.2. Answering RQ2: Systematisation Through Contemporary Multi-Dimensional Frameworks

Beyond these critiques, the theoretical space has expanded past the simple “mixing” of reality and virtuality. Contemporary models, such as Speicher’s seven dimensions, consolidate this expansion by offering granular descriptors based on objective system characteristics (e.g., number of users/environments, degree of interaction), enabling a shift from qualitative to quantitative localisation. In parallel with these general frameworks, application-specific research has produced its own taxonomies. In the domain of educational technology, Panchenko et al. proposed a faceted classification scheme for augmented reality books comprising eight facets, including the reality-virtuality continuum, type of augmented material, devices, interaction type, spatial space of the book, and book category [39]. This faceted approach allows for a detailed, structured description of AR books, enabling educators and developers to systematically analyse and design these resources. It exemplifies how multi-dimensional taxonomies can be applied to specific XR applications, further enriching the conceptual tools available to researchers and practitioners.
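To illustrate how a faceted scheme of this kind supports systematic description and retrieval, the following minimal Python sketch encodes five of the facets named above as fields of a record and filters a small catalogue by facet values. It is an illustration only, not the scheme published in [39]: the class name `ARBookRecord`, the function `filter_by_facets`, the field names, and all catalogue entries and facet values are hypothetical, and the remaining facets are omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ARBookRecord:
    """One AR book described by a subset of the facets named in the text.

    Field names and value vocabularies are assumptions made for illustration.
    """
    title: str
    rv_position: str            # position on the reality-virtuality continuum
    augmented_material: str     # type of augmented material
    devices: FrozenSet[str]     # devices on which the book can be used
    interaction_type: str
    book_category: str


def filter_by_facets(records: Iterable[ARBookRecord], **facets: str) -> List[ARBookRecord]:
    """Return the records that match every facet=value pair supplied."""
    matches = []
    for record in records:
        for name, wanted in facets.items():
            value = getattr(record, name)
            # Multi-valued facets match on membership, single-valued ones on equality.
            hit = wanted in value if isinstance(value, frozenset) else value == wanted
            if not hit:
                break
        else:
            matches.append(record)
    return matches


# Hypothetical catalogue entries, invented purely for this sketch.
CATALOGUE = [
    ARBookRecord("Book A", "augmented reality", "3D model",
                 frozenset({"smartphone", "tablet"}), "marker-based", "textbook"),
    ARBookRecord("Book B", "augmented reality", "video",
                 frozenset({"smartphone"}), "marker-based", "children's book"),
    ARBookRecord("Book C", "augmented virtuality", "3D model",
                 frozenset({"head-mounted display"}), "gesture", "textbook"),
]

if __name__ == "__main__":
    for book in filter_by_facets(CATALOGUE, devices="smartphone", book_category="textbook"):
        print(book.title)
```

Because each facet is an independent field, adding a further facet only requires a new field, and queries can combine any subset of facets. That independence is the practical property that separates a faceted classification from a single hierarchical one.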

8.3. Answering RQ4: The Technological Roots of Academic-Industry Divergence

The analysis of industry leaders reveals a fundamental tension: a divergence between the breadth of academic theory and the precision of product strategy. Academic definitions are often philosophical and broad (e.g., Mann’s Mediated Reality encompassing all forms of modification [12]), while industry players like Meta and Varjo adopt definitions that are highly specific and tied to their hardware capabilities and market goals (e.g., Meta’s focus on social metaverse and video passthrough; Varjo’s focus on “human-eye resolution” and high-fidelity professional use cases). However, industry practice is not divorced from theory; the pursuit of accurate 3D registration, occlusion handling, and high fidelity is a practical embodiment of the foundational work by Azuma and the user-centric theories of Skarbez and Latoschik [17,27,45]. The industry’s pragmatic approach validates the need for multi-dimensional analysis, even as it simplifies the terminology for market purposes.
The divergent paths of industry leaders like Meta and Varjo can be fundamentally understood through the lens of display technology architectures and their inherent trade-offs, as systematically categorised by Yin et al. [42]. Their analysis provides a technical framework for why the industry has fragmented into distinct product philosophies. Varjo’s pursuit of professional-grade “human-eye resolution” and high-fidelity video passthrough aligns with the “free-space coupler” architecture [42]. This approach, which uses cameras to capture the real world and displays to overlay virtual content, prioritises image quality and realism but inherently introduces challenges like system latency and increased computational load. Conversely, Meta’s vision for an accessible, social metaverse built around its Quest hardware ecosystem favours a “lightguide-based” architecture (e.g., waveguides) for its eventual AR/MR applications [42]. This path prioritises a compact form factor, lower power consumption, and see-through capability, which are crucial for widespread consumer adoption and prolonged social use, albeit often at the cost of a narrower field of view and lower peak brightness compared to free-space systems. Thus, the academic taxonomy of display architectures directly explains the industry’s strategic segmentation: Varjo’s high-fidelity, professional niche versus Meta’s accessible, social-metaverse mass market.

8.4. Future Challenges and HCI Design Imperatives

While this review focuses on definitions and conceptual models of XR, the growing use of AI—particularly AI-generated content and AI-assisted perception and interaction—may further intensify pressures on existing taxonomies by enabling more dynamic, context-adaptive, and personalised blends of “real” and “virtual”. We therefore treat AI-enabled XR as a forward-looking driver of terminological evolution rather than a core focus of the present review. Furthermore, as the conceptual understanding of XR matures, the focus of research must increasingly shift towards the design of seamless interactions within these multi-dimensional taxonomies. At the hardware level, the Vergence-Accommodation Conflict (VAC) solutions identified by Yin et al., such as multifocal displays, varifocal lenses, and light-field displays, will provide the technical foundation for constructing XR experiences that better align with human visual physiology [42]. Future work will need to explore how users can intuitively navigate and transition between the complex states defined by frameworks like Speicher’s or the revised RV Continuum. Studies that develop and evaluate interaction metaphors, such as the “Virtual Phone” for bi-directional micro-interactions [37], will be essential for translating theoretical taxonomies into practical, user-friendly XR systems. The challenge lies not only in defining the spaces but also in designing the doors between them. Future research must focus on how these new technologies impact users’ perceptions of Plausibility and Coherence across the now fractured RV Continuum, particularly as the boundaries between “real” and “virtual” become more dynamically personalised.

8.5. Summary of Added Value Relative to Existing Surveys

Compared with existing XR surveys and reviews [17,22,42,43,49,52], which often organise the literature by technologies, application domains, or isolated experience constructs, this review offers a contribution that is explicitly definitional and integrative. By combining a PRISMA-guided selection with a historical tracing of conceptual lineages, we provide a structured account of how boundary criteria for VR/AR/MR/XR have shifted over time. The proposed dimension-based lens and visual synthesis enable direct comparison across heterogeneous frameworks and make implicit definitional assumptions explicit. Finally, by including industry and public framings as a third axis of analysis, the review explains persistent misalignment not as mere semantic noise but as a systematic outcome of differing objectives across communities.

9. Conclusions and Future Outlook

This paper traces a clear evolutionary path through a systematic review of definitions and taxonomies of eXtended Reality (XR). The path begins with the one-dimensional “RV continuum” proposed by Milgram and Kishino in 1994, which centred on visual display technology, and leads to a multi-dimensional theoretical space.
By synthesising this evolutionary trajectory, we answered the core research questions. For RQ1, RQ2, and RQ3, we found that the XR taxonomy has shifted from a single-dimensional technical classification to a multidimensional analytical framework. This framework spans foundational positioning (the RV Continuum), user perception (Presence, Plausibility, Coherence), system characteristics (Speicher’s metrics), and macro-contextual factors (Sociality, Economy). For RQ4, we found a fundamental divergence between academic theories, which provide broad, philosophical frameworks, and industrial definitions, which are highly specific to hardware and market strategies. Nevertheless, the industry’s pursuit of high fidelity is a practical embodiment of user-centric theories.
However, this review is limited by the scope of its keyword-based search strategy and the inherent fragmentation of XR terminology, which may have led to the omission of relevant studies that were inconsistently labelled across disciplines. In addition, integrating conceptual frameworks originating from diverse epistemological backgrounds (e.g., technical, psychological, and industry-driven models) necessarily involves interpretive synthesis. Finally, as XR technologies evolve rapidly, the contrast between academic definitions and industry practices presented in this review may shift over time, requiring ongoing reassessment.
Emerging enabling technologies and novel sensory interfaces will likely continue to broaden XR capabilities and, in turn, increase pressure on existing terminologies and taxonomies. However, the future scaling of XR systems is intrinsically dependent on overcoming fundamental constraints related to communication infrastructure. Current research focuses on leveraging multi-access edge computing (MEC), 5G, and emerging 6G networks to enhance the Quality of Experience (QoE) by minimising real-time latency and maximising data reliability. Resolving the terminological ambiguities detailed in this review is a necessary precursor to effective standardisation and solution deployment in these critical infrastructure domains.

Author Contributions

Conceptualization, X.H.; methodology, X.H.; software, X.H.; validation, X.H.; formal analysis, X.H.; investigation, X.H.; resources, X.H.; data curation, X.H.; writing—original draft preparation, X.H.; writing—review and editing, T.M. and T.L.; visualisation, X.H.; supervision, T.M. and T.L.; project administration, T.M. and T.L.; funding acquisition, T.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that they have no competing interests.

References

  1. Billinghurst, M.; Kato, H.; Poupyrev, I. The MagicBook: A transitional AR interface. Comput. Graph. 2001, 25, 745–753.
  2. Meta. The Future of Mixed Reality: A Platform Ecosystem Vision. Available online: https://developers.meta.com/horizon/documentation/unity/mr-experience-and-use-cases/ (accessed on 20 January 2025).
  3. Varjo Technologies Oy. Mixed Reality. 2025. Available online: https://support.varjo.com/hc/en-us/mixed-reality (accessed on 23 June 2025).
  4. Dargan, S.; Bansal, S.; Kumar, M.; Sharma, M.; Garg, S.; Sehgal, H. Augmented Reality: A Comprehensive Review. Arch. Comput. Methods Eng. 2023, 30, 1057–1080.
  5. Vatavu, R.-D. Sensorimotor Realities: Formalizing Ability-Mediating Design for Computer-Mediated Reality Environments. In 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR); IEEE: New York, NY, USA, 2022; pp. 685–694.
  6. Thompson, A.; Potter, L.E. Defining AR: Public Perceptions of an Evolving Landscape. In CHI’20 Extended Abstracts, Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems Extended Abstracts, Honolulu, HI, USA, 25–30 April 2020; ACM: New York, NY, USA, 2020; pp. 1–7.
  7. Rauschnabel, P.A.; Felix, R.; Hinsch, C.; Shahab, H.; Alt, F. What is XR? Towards a Framework for Augmented and Virtual Reality. Comput. Human Behav. 2022, 133, 107289.
  8. Milgram, P.; Kishino, F. A taxonomy of mixed reality visual displays. IEICE Trans. Inf. Syst. 1994, E77-D, 1321–1329.
  9. Kitchenham, B.A.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; Keele University: Keele, UK, 2007.
  10. Koivisto, J.; Hamari, J. Gamification of physical activity: A systematic literature review of comparison studies. In International GamiFIN Conference; CEUR-WS: Aachen, Germany, 2019; pp. 106–117.
  11. Thomas, J.; Harden, A. Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Med. Res. Methodol. 2008, 8, 45.
  12. Mann, S. Mediated reality with implementations for everyday life. Presence Connect 2002, 1, 2002.
  13. Sutherland, I.E. A head-mounted three dimensional display. In Proceedings of the December 9–11, 1968, Fall Joint Computer Conference, Part I; Association for Computing Machinery: New York, NY, USA, 1968; pp. 757–764.
  14. Steuer, J. Defining virtual reality: Dimensions determining telepresence. J. Commun. 1992, 42, 73–93.
  15. Feiner, S.; MacIntyre, B.; Seligmann, D. Knowledge-based augmented reality. Commun. ACM 1993, 36, 53–62.
  16. Milgram, P.; Takemura, H.; Utsumi, A.; Kishino, F. Augmented reality: A class of displays on the reality-virtuality continuum. In Telemanipulator and Telepresence Technologies; SPIE: Bellingham, WA, USA, 1995; Volume 2351, pp. 282–292.
  17. Azuma, R.T. A survey of augmented reality. Presence Teleoperators Virtual Environ. 1997, 6, 355–385.
  18. Mann, S. Wearable computing: A first step toward personal imaging. Computer 1997, 30, 25–32.
  19. Lessiter, J.; Freeman, J.; Keogh, E.; Davidoff, J. A cross-media presence questionnaire: The ITC-Sense of Presence Inventory. Presence Teleoperators Virtual Environ. 2001, 10, 282–297.
  20. Schuemie, M.J.; Van Der Straaten, P.; Krijn, M.; Van Der Mast, C.A.P.G. Research on presence in virtual reality: A survey. Cyberpsych. Behav. 2001, 4, 183–201.
  21. Feiner, S.K. Augmented reality: A new way of seeing. Sci. Am. 2002, 286, 48–55.
  22. Alexander, A.L.; Brunyé, T.; Sidman, J.; Weil, S.A. From gaming to training: A review of studies on fidelity, immersion, presence, and buy-in and their effects on transfer in PC-based simulations and games. DARWARS Train. Impact Group 2005, 5, 3.
  23. Slater, M. Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments. Philos. Trans. R. Soc. B Biol. Sci. 2009, 364, 3549–3557.
  24. Rodello, I.A.; Sanches, S.R.R.; Sementille, A.C.; Brega, J.R.F. Mixed Reality: Concepts, Tools and Applications. Rev. Bras. Comput. Apl. 2010, 2, 2–16.
  25. Dias, A.d.C.; García Ordaz, M.; López, J.F.M. From 2D to augmented reality. In Proceedings of the 6th Iberian Conference on Information Systems and Technologies (CISTI 2011), Chaves, Portugal, 15–18 June 2011; pp. 1–7.
  26. Zheng, R.; Zhang, D.; Yang, G. Seam the Real with the Virtual: A Review of Augmented Reality. In Proceedings of the 2015 Information Technology and Mechatronics Engineering Conference, Chongqing, China, 28–29 March 2015; pp. 77–80.
  27. Skarbez, R.T. Plausibility Illusion in Virtual Environments. Ph.D. Thesis, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, 2016.
  28. Gilbert, S. Perceived realism of virtual environments depends on authenticity. Presence 2016, 25, 322–324.
  29. Skarbez, R.; Brooks, F.P., Jr.; Whitton, M.C. A Survey of Presence and Related Concepts. ACM Comput. Surv. 2017, 50, 96.
  30. Kim, S.; Kang, S.; Choi, Y.-J.; Choi, M.-H.; Hong, M. Augmented-Reality Survey: From Concept to Application. KSII Trans. Internet Inf. Syst. 2017, 11, 982–1004.
  31. Nevelsteen, K.J.L. Virtual world, defined from a technological perspective and applied to video games, mixed reality, and the Metaverse. Comput. Animat. Virtual Worlds 2018, 29, e1752.
  32. Bekele, M.K.; Champion, E.M. Redefining Mixed Reality: User-Reality-Virtuality and Virtual Heritage Perspectives. In Proceedings of the 24th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA) 2019; CAADRIA: Hong Kong, China, 2019; Volume 2, pp. 675–684.
  33. Speicher, M.; Hall, B.D.; Nebeling, M. What is mixed reality? In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems; ACM: New York, NY, USA, 2019; pp. 1–15.
  34. Krzyzaniak, M.; Frohlich, D.; Jackson, P.J.B. Six types of audio that DEFY reality!: A taxonomy of audio augmented reality with examples. In Audio Mostly (AM’19); ACM: New York, NY, USA, 2019; pp. 1–8.
  35. Merino, L.; Schwarzl, M.; Kraus, M.; Sedlmair, M.; Schmalstieg, D.; Weiskopf, D. Evaluating Mixed and Augmented Reality: A Systematic Literature Review (2009–2019). In 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR); IEEE: New York, NY, USA, 2020; pp. 438–451.
  36. Hofer, M.; Hartmann, T.; Eden, A.; Ratan, R.; Hahn, L. The role of plausibility in the experience of spatial presence in virtual environments. Front. Virtual Real. 2020, 1, 2.
  37. George, C.; Tien, A.N.; Hussmann, H. Seamless, bi-directional transitions along the reality-virtuality continuum: A conceptualization and prototype exploration. In 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR); IEEE: New York, NY, USA, 2020; pp. 412–424.
  38. Zhang, C. The why, what, and how of immersive experience. IEEE Access 2020, 8, 90878–90888.
  39. Panchenko, L.F.; Vakaliuk, T.A.; Vlasenko, K.V. Augmented reality books: Concepts, typology, tools. In Proceedings of the CEUR Workshop Proceedings, Vienna, Austria, 21–22 February 2020.
  40. Turchet, L.; Hamilton, R.; Çamci, A. Music in extended realities. IEEE Access 2021, 9, 15810–15832.
  41. Skarbez, R.; Smith, M.C.; Whitton, M.C. Revisiting the reality-virtuality continuum: A new taxonomy for virtual and augmented reality. Presence Teleoperators Virtual Environ. 2021, 29, 247–274.
  42. Yin, K.; He, Z.; Xiong, J.; Zou, J.; Li, K.; Wu, S.-T. Virtual reality and augmented reality displays: Advances and future perspectives. J. Phys. Photonics 2021, 3, 022010.
  43. Samini, A.; Palmerius, K.L.; Ljung, P. A Review of Current, Complete Augmented Reality Solutions. In 2021 International Conference on Cyberworlds (CW); IEEE: New York, NY, USA, 2021; pp. 49–56.
  44. Park, S.-M.; Kim, Y.-G. A metaverse: Taxonomy, components, applications, and open challenges. IEEE Access 2022, 10, 4209–4251.
  45. Latoschik, M.E.; Wienrich, C. Congruence and Plausibility, Not Presence: Pivotal Conditions for XR Experiences and Effects, a Novel Approach. Front. Virtual Real. 2022, 3, 883651.
  46. Buchner, J.; Buntins, K.; Kerres, M. The impact of augmented reality on cognitive load and performance: A systematic review. J. Comput. Assist. Learn. 2022, 38, 285–303.
  47. Brübach, L.; Westermeier, F.; Wienrich, C.; Latoschik, M.E. Breaking Plausibility Without Breaking Presence—Evidence for the Multi-Layer Nature Of Plausibility. IEEE Trans. Vis. Comput. Graph. 2022, 28, 2267–2276.
  48. Pointecker, F.; Friedl, J.; Schwajda, D.; Jetter, H.-C.; Anthes, C. Bridging the gap across realities: Visual transitions between virtual and augmented reality. In 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR); IEEE: New York, NY, USA, 2022; pp. 827–836.
  49. Nikolaidis, A. What Is Significant in Modern Augmented Reality: A Systematic Analysis of Existing Reviews. J. Imaging 2022, 8, 145.
  50. Auda, J.; Gruenefeld, U.; Faltaous, S.; Mayer, S.; Schneegass, S. A Scoping Survey on Cross-reality Systems. ACM Comput. Surv. 2023, 56, 83.
  51. Pamparau, C. A Review of Milgram and Kishino’s Reality-Virtuality Continuum and a Mathematical Formalization for Combining Multiple Reality-Virtuality Continua. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 88.
  52. Becker, A.; Dal Sasso Freitas, C.M. Evaluation of XR Applications: A Tertiary Review. ACM Comput. Surv. 2023, 56, 110.
  53. Westermeier, F.; Brübach, L.; Latoschik, M.E.; Wienrich, C. Exploring plausibility and presence in mixed reality experiences. IEEE Trans. Vis. Comput. Graph. 2023, 29, 2680–2689.
  54. Gall, A.; Heim, A.; Fröhler, B.; Heinzl, C. Uncertainty unveiled: Revealing the uncertainty of distribution visualization through cross reality. In 2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct); IEEE: New York, NY, USA, 2023; pp. 20–24.
  55. Tahmid, I.A.; Rodrigues, F.; Giovannelli, A.; Lisle, L.; Thomas, J.; Bowman, D.A. CoLT: Enhancing collaborative literature review tasks with synchronous and asynchronous awareness across the reality-virtuality continuum. In 2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct); IEEE: New York, NY, USA, 2023; pp. 831–836.
  56. Li, R.Z.; Cherival, T.; Cassandro, R.; Hou, Y. Eight questions on the evolution and impact of extended reality technologies in engineering systems performance evaluation. In 2024 IEEE International Conference on Prognostics and Health Management (ICPHM); IEEE: New York, NY, USA, 2024; pp. 217–222.
  57. Westermeier, F.; Brübach, L.; Wienrich, C.; Latoschik, M.E. Assessing depth perception in VR and video see-through AR: A comparison on distance judgment, performance, and preference. IEEE Trans. Vis. Comput. Graph. 2024, 30, 2140–2150.
  58. Gittens, C.L. Dyadic-XV: A taxonomy of digital beings. IEEE Consum. Electron. Mag. 2024, 13, 36–45.
  59. Cortes, D.; Bermejo, B.; Juiz, C. The use of CNNs in VR/AR/MR/XR: A systematic literature review. Virtual Real. 2024, 28, 154.
  60. Ghourab, E.M.; Azab, M.; Gračanin, D.; Alhussein, O.; Al-Qutayri, M.; Muhaidat, S. Extended reality-aware wireless communication networks: A systematic literature review. IEEE Open J. Commun. Soc. 2025, 6, 7567–7588.
  61. Mann, S.; Yuan, Y.; Lamberti, F.; Saddik, A.E.; Thawonmas, R.; Pratticò, F.G. eXtended meta-uni-omni-Verse (XV): Introduction, Taxonomy, and State-of-the-Art. IEEE Consum. Electron. Mag. 2024, 13, 27–35.
  62. Lampropoulos, G. Intelligent Virtual Reality and Augmented Reality Technologies: An Overview. Future Internet 2025, 17, 58.
  63. Cools, R.; Han, J.; Esteves, A.; Simeone, A.L. From display to interaction: Design patterns for cross-reality systems. IEEE Trans. Vis. Comput. Graph. 2025, 31, 3129–3139.
  64. Moslavac, M.; Vlahović, S.; Skorin-Kapov, L. Towards a unified definition of social XR. In 2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW); IEEE: New York, NY, USA, 2025; pp. 9–573.
  65. Sutherland, I.E. The ultimate display. Proc. IFIP Congr. 1965, 2, 506–508.
  66. Mann, S.; Fung, J. EyeTap devices for augmented, deliberately diminished, or otherwise altered visual perception of rigid planar patches of real-world scenes. Presence 2002, 11, 158–175.
  67. Minsky, M. Telepresence. OMNI Magazine, June 1980; pp. 44–52.
  68. Heilig, M.L. Sensorama Simulator. US Patent 3,050,870, 28 August 1962.
  69. Varjo Technologies Oy. How Varjo Delivers Human-Eye Resolution in Virtual Reality. 2020. Available online: https://varjo.com/blog/introducing-bionic-display-how-varjo-delivers-human-eye-resolution/ (accessed on 23 June 2025).
  70. Varjo Technologies Oy. Varjo Insider Blog. 2025. Available online: https://varjo.com/blog/ (accessed on 23 June 2025).
  71. Microsoft. What Is Mixed Reality? Windows Mixed Reality Documentation. 2026. Available online: https://learn.microsoft.com/en-us/windows/mixed-reality/discover/mixed-reality (accessed on 3 January 2026).
  72. Apple. Introducing Apple Vision Pro: Apple’s First Spatial Computer. Apple Newsroom. 2023. Available online: https://www.apple.com/au/newsroom/2023/06/introducing-apple-vision-pro/ (accessed on 3 January 2026).
  73. Magic Leap. Magic Leap 2. 2026. Available online: https://www.magicleap.com/magic-leap-2 (accessed on 3 January 2026).
Figure 1. A flowchart describing the literature’s selection procedure.
Figure 2. Publication-year distribution of the included studies (n = 59).
Figure 3. Study type distribution.
Figure 4. Reality-Virtuality (RV) Continuum. Source: [8].
Figure 5. Taxonomy of Reality, Virtuality, Mediality.
Figure 6. Mediated reality Frame.
Figure 7. Mann’s eXtended meta-uni-omni-Verse (XV) Framework [58].
Figure 8. A concept map illustrating the evolution of XR theoretical dimensions.
Table 1. Search terms used and the number of records received from databases.
Search terms:
(Framework OR Taxonomy OR Model OR Definition OR Survey OR Review) AND (“Mixed Reality” OR “Augmented Reality” OR “eXtended Reality” OR “Virtual Reality”)
(Plausibility OR Presence OR Coherence OR Fidelity) AND (“Mixed Reality” OR “Augmented Reality”)
“Reality-Virtuality Continuum” OR “Mediated Reality”
Database | Number of records retrieved
IEEE Xplore | 13,630
ACM Digital Library | 55,706
Scopus | 37,215
Web of Science | 67,126
TOTAL | 173,677
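As a quick arithmetic check on Table 1, the short Python sketch below sums the per-database counts and compares the result with the reported total. It also computes a rough inclusion rate using the 59 included studies reported in Figure 2. The variable names are illustrative, and the rate is only indicative: the review also drew on seminal and industry sources, so not every included study necessarily came from these database records.

```python
# Per-database record counts as reported in Table 1.
RECORDS = {
    "IEEE Xplore": 13_630,
    "ACM Digital Library": 55_706,
    "Scopus": 37_215,
    "Web of Science": 67_126,
}
REPORTED_TOTAL = 173_677   # "TOTAL" row of Table 1
INCLUDED_STUDIES = 59      # n reported for the included studies (Figure 2)

# Recompute the total and compare it with the reported figure.
total = sum(RECORDS.values())
totals_match = total == REPORTED_TOTAL

# Rough share of retrieved records that reached the final synthesis.
inclusion_rate = INCLUDED_STUDIES / total

if __name__ == "__main__":
    print(f"Recomputed total: {total:,} (matches reported total: {totals_match})")
    print(f"Approximate inclusion rate: {inclusion_rate:.4%}")
```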
Table 2. Core standards and key elements of XR research dimensions.
Dimensions | Core Standard | Key Elements (Must Include)
I. Concept Core and Description (RQ1) | Direct exploration of the definition, evolution, and classification of XR terminology. | Definition, Taxonomy, Framework, Survey, Review.
II. Theoretical Models and Frameworks (RQ2) | Propose cross-scenario, reusable XR conceptual models or system architectures. | Clear Model, Architecture, or Conceptual Framework.
III. User Perception Theory (RQ3) | Discuss how users’ subjective experiences influence the perception of reality. | Plausibility, Presence, Coherence, Fidelity.
IV. Macro Background and Critique (RQ4) | Provide critiques of Milgram’s Continuum, extensions, or discussions of XR from a macro perspective. | Direct critiques or extensions of the Reality-Virtuality Continuum. Discussion of Metaverse, IoT, or Social/Ethical Implications.
Table 3. Exclusion criteria for records.
Criterion | Description
E1 | Did not contain an explicit definition, taxonomy, conceptual model, or framework related to XR/VR/AR/MR terminology (e.g., purely application reports without conceptual discussion).
E2 | Focused on technical implementation details (e.g., algorithms, rendering pipelines, networking, hardware performance) without proposing or interrogating a definitional or conceptual classification.
E3 | Addressed adjacent concepts (e.g., games, 3D graphics, teleoperation) and term misuse (e.g., MRI in the medical field).
E4 | Not a peer-reviewed scholarly source (e.g., patents, marketing materials), unless it was an industry definition explicitly analysed in Section 6.
E5 | Not accessible in full text or did not provide sufficient information to extract definitional/model content.
E6 | Not written in English.
Table 4. List of Included Literature (R = Reviews/Surveys, T = Theory/Taxonomy, E = Empirical/Experimental, S = System/Prototypes).
| No. | Title | Year | Type |
| --- | --- | --- | --- |
| p1 | A head-mounted three dimensional display [13]. | 1968 | S |
| p2 | Defining virtual reality: dimensions determining telepresence [14]. | 1992 | T |
| p3 | Knowledge-based augmented reality [15]. | 1993 | S |
| p4 | Taxonomy of mixed reality visual displays [8]. | 1994 | T |
| p5 | Augmented reality: a class of displays on the reality-virtuality continuum [16]. | 1995 | T |
| p6 | A survey of augmented reality [17]. | 1997 | R |
| p7 | Wearable computing: a first step toward personal imaging [18]. | 1997 | S |
| p8 | A cross-media presence questionnaire: the ITC-Sense of presence inventory [19]. | 2001 | R |
| p9 | The MagicBook: a transitional AR interface [1]. | 2001 | S |
| p10 | Research on presence in virtual reality: a survey [20]. | 2001 | R |
| p11 | Mediated reality with implementations for everyday life [12]. | 2002 | T |
| p12 | Augmented reality: a new way of seeing [21]. | 2002 | R |
| p13 | From gaming to training: a review of studies on fidelity, immersion, presence, and buy-in and their effects on transfer in PC-based simulations and games [22]. | 2005 | R |
| p14 | Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments [23]. | 2009 | T |
| p15 | Mixed reality: concepts, tools and applications [24]. | 2010 | R |
| p16 | From 2D to augmented reality [25]. | 2011 | E |
| p17 | Seam the real with the virtual: a review of augmented reality [26]. | 2015 | R |
| p18 | Plausibility illusion in virtual environments [27]. | 2016 | E |
| p19 | Perceived realism of virtual environments depends on authenticity [28]. | 2017 | T |
| p20 | A survey of presence and related concepts [29]. | 2017 | R |
| p21 | Augmented-reality survey: from concept to application [30]. | 2017 | R |
| p22 | Virtual world, defined from a technological perspective and applied to video games, mixed reality, and the Metaverse [31]. | 2018 | T |
| p23 | Redefining mixed reality: user-reality-virtuality and virtual heritage perspectives [32]. | 2019 | T |
| p24 | What is mixed reality? [33] | 2019 | T |
| p25 | Six types of audio that DEFY reality!: A taxonomy of audio augmented reality with examples [34]. | 2019 | T |
| p26 | Evaluating mixed and augmented reality: a systematic literature review (2009-2019) [35]. | 2020 | R |
| p27 | The role of plausibility in the experience of spatial presence in virtual environments [36]. | 2020 | E |
| p28 | Seamless, bi-directional transitions along the reality-virtuality continuum: A conceptualization and prototype exploration [37]. | 2020 | E |
| p29 | The why, what, and how of immersive experience [38]. | 2020 | R |
| p30 | Augmented reality books: concepts, typology, tools [39]. | 2020 | T |
| p31 | Defining AR: Public perceptions of an evolving landscape [6]. | 2020 | E |
| p32 | Music in extended realities [40]. | 2021 | R |
| p33 | Revisiting Milgram and Kishino’s reality-virtuality continuum [41]. | 2021 | T |
| p34 | Virtual reality and augmented reality displays: Advances and future perspectives [42]. | 2021 | R |
| p35 | A Review of current, complete augmented reality solutions [43]. | 2021 | R |
| p36 | A metaverse: Taxonomy, components, applications, and open challenges [44]. | 2022 | R |
| p37 | Congruence and plausibility, not presence: pivotal conditions for XR experiences and effects, a novel approach [45]. | 2022 | T |
| p38 | The impact of augmented reality on cognitive load and performance: A systematic review [46]. | 2022 | R |
| p39 | Sensorimotor realities: Formalizing ability-mediating design for computer-mediated reality environments [5]. | 2022 | T |
| p40 | Augmented Reality: A comprehensive review [4]. | 2022 | R |
| p41 | What is XR? Towards a framework for augmented and virtual reality [7]. | 2022 | T |
| p42 | Breaking plausibility without breaking presence—evidence for the multi-layer nature of plausibility [47]. | 2022 | E |
| p43 | Bridging the gap across realities: visual transitions between virtual and augmented reality [48]. | 2022 | E |
| p44 | What is significant in modern augmented reality: A systematic analysis of existing reviews [49]. | 2022 | R |
| p45 | A scoping survey on cross-reality systems [50]. | 2023 | R |
| p46 | A Review of Milgram and Kishino’s reality-virtuality continuum and a mathematical formalization for combining multiple reality-virtuality [51]. | 2023 | T |
| p47 | Evaluation of XR applications: A tertiary review [52]. | 2023 | R |
| p48 | Exploring plausibility and presence in mixed reality experiences [53]. | 2023 | E |
| p49 | Uncertainty unveiled: Revealing the uncertainty of distribution visualization through cross reality [54]. | 2023 | S |
| p50 | CoLT: Enhancing collaborative literature review tasks with synchronous and asynchronous awareness across the reality-virtuality continuum [55]. | 2023 | S |
| p51 | Eight questions on the evolution and impact of extended reality technologies in engineering systems performance evaluation [56]. | 2024 | T |
| p52 | Assessing depth perception in VR and video see-through AR: A comparison on distance judgment, performance, and preference [57]. | 2024 | E |
| p53 | Dyadic-XV: A taxonomy of digital beings [58]. | 2024 | T |
| p54 | The use of CNNs in VR/AR/MR/XR: a systematic literature review [59]. | 2024 | R |
| p55 | Extended Reality-aware wireless communication networks: A systematic literature review [60]. | 2024 | R |
| p56 | eXtended meta-uni-omni-Verse (XV): Introduction, taxonomy, and state-of-the-art [61]. | 2024 | T |
| p57 | Intelligent virtual reality and augmented reality technologies: An overview [62]. | 2025 | R |
| p58 | From display to interaction: Design patterns for cross-reality systems [63]. | 2025 | T |
| p59 | Towards a unified definition of social XR [64]. | 2025 | T |
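The publication-year distribution reported for the corpus (a peak of 9 studies in 2022, and 47.5% of studies published in 2021–2025) can be checked directly against Table 4. The following is a minimal sketch of that arithmetic; the per-year counts are transcribed by hand from the Year column of Table 4, and the function name `share_in_range` is a hypothetical helper introduced only for illustration.

```python
# Year -> number of included studies, transcribed by hand from Table 4
# (an assumption of this sketch: the tally below matches the table rows).
YEAR_COUNTS = {
    1968: 1, 1992: 1, 1993: 1, 1994: 1, 1995: 1, 1997: 2,
    2001: 3, 2002: 2, 2005: 1, 2009: 1, 2010: 1, 2011: 1,
    2015: 1, 2016: 1, 2017: 3, 2018: 1, 2019: 3, 2020: 6,
    2021: 4, 2022: 9, 2023: 6, 2024: 6, 2025: 3,
}


def share_in_range(counts, start, end):
    """Return (studies in [start, end], total studies, percentage share)."""
    total = sum(counts.values())
    in_range = sum(n for year, n in counts.items() if start <= year <= end)
    return in_range, total, 100 * in_range / total


recent, total, pct = share_in_range(YEAR_COUNTS, 2021, 2025)
peak_year = max(YEAR_COUNTS, key=YEAR_COUNTS.get)  # year with the most studies

print(f"{recent} of {total} studies ({pct:.1f}%) were published in 2021-2025")
print(f"Peak year: {peak_year} with {YEAR_COUNTS[peak_year]} studies")
```

Under these transcribed counts, 28 of the 59 included studies fall in 2021–2025, which rounds to the reported 47.5%.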
Table 5. The Divergence of Core Terminology Across Academic Theory, Social Ecosystem (Meta), and Professional Fidelity (Varjo).
| Characteristics | Academic Theory (General) | Meta | Varjo |
| --- | --- | --- | --- |
| Definition of MR | An integration of real and virtual worlds along the RV continuum [8], defined as a “single perception” of real and virtual stimuli [41], with multiple working definitions in literature [33]. | A practical implementation that blends virtual content with the user’s physical environment via video passthrough. The focus is on social interaction and creating a platform ecosystem for apps and experiences within a real-world context. | The ultimate fusion of real and virtual, where virtual content is photorealistic and indistinguishable from reality. It is achieved via high-fidelity video passthrough, enabling professional use cases that demand extreme precision and realism. |
| Key Technology | Initially defined by display capabilities (EWK, RF, EPM) [8], later expanded to include user-centric dimensions like immersion and coherence [41]. | Driven by deep learning for content generation, mobile access, multimodal interaction, and a synergistic hardware/software ecosystem [44]. | “Human-eye resolution” display; low-latency video passthrough; gaze-driven XR autofocus system. |
| Primary Goal | To create a theoretical framework for classifying and understanding a wide range of experiences [8,41]. | To build a social, accessible, and commercially viable Metaverse ecosystem for the general consumer. | To provide a high-ROI tool for critical professional applications like training, simulation, and design. |
| Relationship to Milgram’s Continuum | Milgram and Kishino [8] originally proposed the Reality–Virtuality (RV) continuum. Skarbez et al. [41] argued that the continuum is fundamentally discontinuous at the “perfect VR” extreme and that all technologically mediated experiences fall within the broader scope of MR. | The Metaverse concept transcends the continuum by focusing more on social meaning and persistent content, accessible via both immersive (VR/AR) and non-immersive (2D screens) platforms. | Enables seamless, high-fidelity transitions between AR and VR, effectively operationalising the RV continuum [48,54]. |
| Typical Application | Tele-robotics, virtual environments, stereoscopic video/graphics. Early AR systems were used to enhance video scenes and for maintenance assistance [8]. | Social events, fashion, games, education, collaborative office work, psychotherapy, and marketing simulations [44]. | Training and Simulation (Flight, Crane), Design and Visualisation, Research, Sales and Marketing, Government and Defence, Healthcare, Automotive. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Han, X.; Lehtonen, T.; Mäkilä, T. From the Reality–Virtuality Continuum to the XR Ecosystem: A Systematic Literature Review of Definitions and Conceptual Models. Multimodal Technol. Interact. 2026, 10, 24. https://doi.org/10.3390/mti10030024