X-Reality Museums: Unifying the Virtual and Real World Towards Realistic Virtual Museums

Abstract: Culture is a field that is currently entering a revolutionary phase, no longer being a privilege for the few, but expanding to new audiences who are urged not only to passively consume cultural heritage content, but also to actively participate in it and assimilate it on their own. In this context, museums have already embraced new technologies as part of their exhibitions, many of them featuring augmented or virtual reality artifacts. The presented work proposes the synthesis of augmented, virtual and mixed reality technologies to provide unified X-Reality experiences in realistic virtual museums, engaging visitors in an interactive and seamless fusion of physical and virtual worlds that will feature virtual agents exhibiting naturalistic behavior. Visitors will be able to interact with the virtual agents, as they would with real world counterparts. The envisioned approach is expected to not only provide refined experiences for museum visitors, but also achieve high quality entertainment combined with more effective knowledge acquisition.


Introduction
Culture is a major connecting tissue of a society, facilitating its links with the past, strengthening its members' bonds and elevating their quality of life, but also allowing them to envision and plan their future. Culture has gone through important changes, expanding its potential audience, and is currently in a revolutionary phase, named Culture 3.0, in which individuals are expected to assimilate and manipulate in their own way the cultural contents they are being exposed to [1]. Contemporary museums are challenged not only to adapt to the current status quo and follow existing trends, but also to shape future cultural experiences.
Evolution, though, does not happen in isolation; it is easier when pursued through multidisciplinary approaches. In this case, a discipline providing a helping hand is information and communication technologies (ICTs). The abundance of mature technologies in the fields of computer graphics, human-computer interaction, computer vision, etc., has dissolved the reluctance, even of the greatest sceptics, to harness digital tools for Cultural Heritage (CH) towards better understanding our history and civilization [2], as well as actively participating in and disseminating our cultural heritage. Furthermore, in accordance with Culture 3.0, state-of-the-art technologies are expected to play an important role in enhancing museums' on-site experiences and hence transforming them into high-technology CH spaces [3], able to provide high quality interactive experiences to their audience.
Museums take advantage of ICTs to exhibit digital content alongside their physically exhibited artefacts and often employ augmented or virtual reality (collectively referred to as "mixed reality") technologies to immerse visitors in novel experiences, following playful approaches [4]. These immersive technologies, however, have the potential to truly innovate museum experiences, beyond merely engaging visitors in artificial worlds. When used in combination, and by employing state-of-the-art computer graphics, multisensory interaction, and artificial intelligence (AI) [5], they have the potential to orchestrate experiences of both physical [6] and virtual [7] worlds in an outstanding blend. Museum visitors can be immersed in virtual worlds unravelling in front of them and seamlessly substituting the museum's physical environment. Virtual agents in this new world will behave and move naturally, thus acquiring the potential to directly interact with visitors, who will be transformed from passive viewers to keen participants.
The presented work elaborates on the aforementioned vision and discusses background information and required technological advancements to attain it. In particular the contributions of this work are: (i) a conceptual model that can be used as a reference for implementing XR museum experiences which transcend the boundaries between real and virtual interaction in the museum space, (ii) the elaboration of a novel approach, that of "true mediated reality", which refers to the collective technologies in support of realistic, high-quality human-to-virtual character interactions in cultural heritage contexts, (iii) a tangible case study, illustrating how the proposed conceptual model is materialized.
The rest of this article is structured as follows. Section 2 presents a review of the literature, introducing readers to the extended reality (XR) concept, as well as the current state regarding the technological components involved in the proposed conceptual architecture (diminished reality, true mediated reality, and natural multimodal interaction), concluding with a highlight on future research directions. Section 3 elaborates on the proposed conceptual model for building unified XR experiences toward realistic virtual museums, providing a segmentation of the components into a layered design so as to facilitate the development of novel solutions. Section 4 presents a case study, aiming to instantiate the discussed topics and use of the conceptual architecture in a concrete, real-world example. Finally, Section 5 discusses main points presented throughout the article, drawing conclusions on future research directions.

Related Work
Several technological achievements are needed in order to deliver unique, unified XR experiences in realistic virtual museums. In this section, we provide a review of the scientific literature, including a look back at and overview of X-Reality in general, as well as the key technologies that constitute the X-Reality vision. Sections 2.2-2.4 summarize the current state of each of the main architectural components of the conceptual model elaborated in Section 3, while Section 2.5 highlights challenges that should be overcome.

Extended Reality (XR)
According to the EU, "technologies such as augmented (AR) and virtual reality (VR) are going to transform the ways in which people interact and share information on the internet and beyond" [8]. Indeed, the recent proliferation of affordable VR/AR headset gear and smart devices has enabled both technologies to surpass the Gartner hype cycle [9] peak of inflated expectations (or "hype"), and has allowed the global VR/AR market to grow at an exponential rate, backed by a rapid increase in demand and production [10].
VR technology, as an immersive medium fully occupying (mainly, but not exclusively) the human auditory and visual sensory channels, is not new, its roots debatably dating back before the 1960s [11]. Affordable, consumer-grade VR headsets have been around since as early as the mid-90s, with the release of the Forte VFX1 (retailing at $695) in 1995 [12]. Today, over two decades later, the VR industry can be considered relatively mature, yet in its early stages of adoption [13]. The market, however, still has not fully responded since VR's second renaissance, arguably signaled by the acquisition of pioneering technology company Oculus by the industrial giant Facebook in 2014 [14]. Unlike smartphones, which exhibited fast adoption, VR technology mostly depends on enterprise adoption, thus exhibiting slower rates [15]. A 2017 PricewaterhouseCoopers study with 2216 industry respondents indicated that 7% were currently making substantial investments in VR technology, a percentage that roughly doubled (15%) for investments planned over the following three years [16]. Although the percentages are rather low compared to investments in other technologies (e.g., Internet of Things current investments were 73%, Artificial Intelligence 54%, robotics 15%), they nevertheless confirm that VR is one of the eight technologies that enterprises are interested in, and also that it exhibits an increasing adoption tendency.
AR, in its own respect is an emerging technology, focused on integrating virtual (e.g., computer generated) objects into a real-world experience, aligning both real and virtual objects with each other in a complementary manner and creating the illusion that virtual images are seamlessly blended with the real world [17]. Unlike VR, where users are totally immersed in a virtual world, AR aims to combine real and virtual objects, in a way that is beneficial towards accomplishing specific tasks or missions [18]. Popular application domains include education, architecture, industry, health, marketing, and games [17,19]. Indicative of the popularity of this technology to end-users and of its potential to disrupt the gaming technology industry is the recent 1 billion download [20] milestone reached by the immensely popular Pokémon GO AR game, which was released in the summer of 2016.
Apart from the well-established domains of each respective medium, the notion of MR has been defined in the seminal work of Milgram and Kishino [21] as the merging of real and virtual worlds, in an effort to describe environments that do not necessarily exhibit total immersion and complete synthesis, but generally fall somewhere along a virtuality continuum. More recently, MR has been described as a "broader" virtual reality of an interdisciplinary nature [22], and as a superset containing AR/VR. Hence, on the one hand, MR refers to blurring the boundaries of what is real and what is virtual, enhancing the user's perception of the real world with digital information. In this respect, various MR applications have been proposed in the scientific literature that are closer to the AR end of the virtuality continuum, including among others interactive paper matter [23,24], augmented restaurant tables [25], sites' reconstruction [26], educational applications [27] and cultural heritage applications [28]. On the other hand, other MR approaches that lie closer to the VR end of the virtuality continuum refer to the blending of real world physical objects or people into synthetic 3D worlds (augmented virtuality, AV [29]). Early examples of AV explored the now common practice of texturing 3D surfaces with real-life video and photographs [30]. More recent applications based on efficient, low-cost platforms include 3D videoconferencing [31], telepresence [32] and infotainment [33], gaming [34,35], education [36], scene inspection [37,38], and even surgical operation [39].
In this relatively well-established AR/MR/VR landscape, extended reality (XR), sometimes also referred to as cross-reality [40], has been given at least three definitions, as a superset, extrapolation or even subset of MR [41]. In this work, XR refers to the collective range of taxonomies of the reality-virtuality continuum, including AR, MR and VR applications [42]. In this respect, XR aims to fuse layered objects in the real world with an immersive digital world, allowing users to partake in activities that are not possible in a strictly MR digital environment, or even in the real physical world. Combined with various technological advancements in human-computer interaction that act as key enablers, users' experience with AR, MR, and VR applications is enhanced [11], enabling XR to reach beyond "reality" (eXtended). This type of intersection between the real and the virtual generates ample opportunities for XR applications to facilitate an entirely new "reality" space to interact with and innovate inside of. It represents the next step towards immersive interaction through parallel enabling technologies, to create an advanced experience with real-time rendering requirements. XR focuses on weaving together each enabler's heterogeneous interaction layers so as to deliver a unique experience, one that integrates users in both a physical and a virtual representation of the same geographical space at the same time. Incorporating elements of diminished and mediated reality (e.g., replacing physical objects with virtual ones, as well as realistic character blending, animation techniques and rendering features) into an XR environment will allow it to be used as a building block for producing various novel augmented, mixed and virtual reality applications. XR encompasses a wide spectrum of hardware and software, including sensory interfaces, applications, and infrastructures, that enable content creation for its subsets, AR, MR, and VR [43]. XR thus represents the convergence of AR, MR, and VR, in which the best elements of each aspect are utilized and optimized for a given use case scenario and application [44].
Appl. Sci. 2021, 11, 338
Representing a crossroads where multidisciplinary research conducted in the fields of computer vision (e.g., body tracking, hands articulation, face detection), artificial intelligence (e.g., natural language processing and generation, conversational systems), graphics (e.g., realistic 3D modelling, rendering, and animation) and natural interaction meet, XR tools have the unique potential to revolutionize cultural heritage experiences in terms of interpretation, visualization and interaction. In this respect, XR aims to replace the real world with an artificially enhanced environment orchestrated by software, and to simulate in such environments a user's physical presence by duplicating, to the extent possible, the five basic senses: sight, touch and hearing, but also smell and taste, the latter two currently addressed only at an experimental level [45].
XR technologies proposed as a vehicle for promoting enhanced cultural experiences will allow people to virtually travel to other areas, and experience local history and lore in a fascinating way. Cultural organisations, as well as agencies engaged in and/or furthering humanitarian efforts, are tapping into the unique potential of AR/MR/VR technologies, in particular the empathy-inducing capabilities of VR [46]. These can effectively educate and raise awareness about certain issues, and even elicit response and action among viewers, as similar effects have been observed in VR-like museum experiences [47]. Building upon the reality-virtuality continuum [21], Figure 1 illustrates the xreality-virtuality continuum featuring examples of cultural heritage technological applications. In particular, the real environment can be mediated through mobile, holographic (tethered/untethered) AR, mobile VR, or desktop VR technologies to immerse users in a fusion of both real and virtual worlds. For example, an existing physical artifact can be augmented with additional digital information displayed on a mobile device, it can be essentially altered and even substituted in the user's perception with a virtual counterpart, or it can even become completely virtual, accessible only through VR.

Diminished Reality
From an etymological perspective, "diminish" means to become gradually less. In the context of XR technologies, the term diminished reality is used to denote the fading of real parts of the environment and their substitution with a virtual counterpart. The notion of diminished reality was coined by the pioneer of wearable computing, Steve Mann, and describes a reality that can remove, at will, certain undesired aspects of regular reality [48]. Diminished reality has been considered a subdomain of AR and MR, and has become an active field of research due to the proliferation of these technologies, despite their inherent difference in the augmentation approach, as AR and MR principally superimpose virtual objects on the real world and do not substitute them [49].
During the last decades, several diminished reality approaches have emerged [49] that can be clustered in two main categories: those that require prepared structure information and registered photos, and those that achieve real-time processing and work without pre-processing the target scene. However, current approaches suffer from shortcomings regarding the realistic, real-time elimination of obstacles and their substitution by a plausible background. In light of AI techniques able to address difficult computer vision challenges efficiently and quickly, the concept of diminished reality can be tackled by methods based on 3D geometry [50] and deep learning [50,51] to remove objects from the user's point of view in real time. These approaches should avoid visible artefacts, by developing techniques to match the rendering of the background with the viewing conditions. Current efforts have achieved important results for image inpainting, employing deep convolutional networks [52], as well as other machine learning approaches such as generative adversarial networks [53]. At the same time, future diminished reality approaches should also consider other aspects of the real environment, such as sound or human motion [49].
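To make the "remove and substitute by a plausible background" step more concrete, the following minimal sketch fills a masked region of a tiny grayscale image by propagating averages of already-known neighbouring pixels inward. This simple diffusion-style fill is a deliberately simplified stand-in and is not one of the 3D-geometry or deep-learning methods cited above; the function name `diffusion_inpaint` and the toy scene are hypothetical and serve only as illustration.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]  # True marks pixels belonging to the object to remove


def diffusion_inpaint(image: Image, mask: Mask, max_iters: int = 100) -> Image:
    """Fill masked pixels layer by layer with the mean of known 4-neighbours."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]   # work on a copy, keep the input intact
    unknown = [row[:] for row in mask]

    for _ in range(max_iters):
        if not any(any(row) for row in unknown):
            break  # every masked pixel has been filled
        updates = []
        for y in range(height):
            for x in range(width):
                if not unknown[y][x]:
                    continue
                neighbours = [
                    result[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width and not unknown[ny][nx]
                ]
                if neighbours:
                    updates.append((y, x, sum(neighbours) / len(neighbours)))
        if not updates:
            break  # the mask covers the whole image; nothing to propagate
        # Apply updates after the scan so one pass fills exactly one layer.
        for y, x, value in updates:
            result[y][x] = value
            unknown[y][x] = False
    return result


# Toy scene: uniform background (0.5) with a bright "exhibit" (1.0) in the centre.
scene = [[0.5] * 5 for _ in range(5)]
object_mask = [[False] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        scene[y][x] = 1.0
        object_mask[y][x] = True

diminished = diffusion_inpaint(scene, object_mask)
```

On this toy scene the exhibit disappears into the uniform background. Real scenes contain texture, lighting changes and parallax, which is where the learned inpainting approaches discussed above become necessary.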

True Mediated Reality
Virtual characters play a fundamental role for attaining a high level of believability in XR environments [54]. In this respect, we introduce the term "true mediated reality" to define the need for delivering realistic virtual characters, by means of (i) photo-realistic reconstruction of 3D models from the real world, (ii) real-time mixed-reality virtual character rendering and animation, and (iii) realistic interactive virtual characters. In the context of virtual museums, true mediated reality is a key element for achieving high quality user experiences, but also for transferring knowledge to museum visitors.
With regard to realistic reconstruction of 3D models from the real world, significant technical progress has been achieved in capturing, creating, archiving, preserving, and visualizing 3D digital items [55]. However, for the realization of the proposed unified XR experiences in realistic virtual museums, state-of-the-art 3D reconstruction techniques should be further enhanced, in order to produce realistic 3D reconstructions that will be appropriate for deployment in real-time XR environments. In addition, in the context of modelling cultural heritage artefacts, a challenge that needs to be overcome is that antiquities, and especially statues, are often incomplete (e.g., broken, or partly recovered), thus rendering typical 3D reconstruction techniques ineffective. Restoring the missing parts would be highly specialized and resource demanding, requiring the involvement of domain experts (e.g., archaeologists and curators) with sophisticated 3D modelling skills to manually extend a partially reconstructed model. To resolve this, future research should explore semi-automated approaches, through interdisciplinary joint efforts involving the fields of archaeology, museology and curation, as well as computer science.
Rendering deformable objects and characters in XR with a high level of believability and realism requires consistent real-time registration between real scenes and virtual augmentations in two main aspects: geometry and illumination [55]. Superimposing virtual objects and dynamic, deformable virtual scenes onto an image of a real scene in real time is still an open research field [56,57]. In addition, a demanding prerequisite for realistic representation is the real-time animation of deformable virtual agents through real-time model skinning. Given the proliferation of devices with low CPU power (e.g., untethered headsets), a prominent technological challenge is the realistic, high-fidelity and high-refresh-rate rendering of deformable 3D characters on such devices.
No matter how efficient model deformation algorithms for the rendering of humanoid 3D models are, several aspects that imitate human behavior should also be considered for creating convincing virtual agents. Realistic interactive virtual characters should feature reliable and consistent motion and dialog behavior, but also nonverbal communication and affective components. Modelling the "mind" and creating intelligent communication behavior on the encoding side is an active field of research in AI [58]. On the other hand, the visual representation of a character including its perceivable behavior, from a decoding perspective, such as facial expressions and gestures, belongs to the domain of computer graphics and involves many open issues concerning natural communication [59]. Other non-verbal communication attributes that should be addressed by embodied agents in order to further enhance their naturalness include body posture, which provides emotional cues, as well as gaze direction, which is important not only for conveying emotion but also for semantic cues during a conversation. In this respect, addressing the challenge of realistic interactive virtual characters requires new models, taking into account all the aforementioned non-verbal communication cues, while following a perception-attention-action process for virtual characters, in order to improve the naturalness of their behavior. Such models should include: (i) perception capabilities that will allow virtual agents to access knowledge of the states of users and other virtual agents (for example position, gesture and emotion) and information about both real and virtual environments; (ii) attention capabilities that will model the cognitive process by which humans focus on selected information of importance or interest; and (iii) decision-making and motion synthesis for virtual agents.
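The three capabilities listed above can be pictured as one control step of a virtual agent. The sketch below is only an illustrative skeleton under simplifying assumptions: the `Percept` structure, the salience score, the 3 m attention radius and the behaviour labels are all hypothetical, and actual motion synthesis is reduced to returning a label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Percept:
    """One observation about a user or another agent (hypothetical structure)."""
    source: str        # who produced the stimulus, e.g. "visitor_1"
    kind: str          # e.g. "gesture", "speech", "position"
    value: str         # payload, e.g. "wave" or a recognised utterance
    salience: float    # 0..1, how strongly the stimulus stands out
    distance_m: float  # distance from the agent in metres


def attend(percepts: List[Percept], max_distance_m: float = 3.0) -> Optional[Percept]:
    """Attention: discard distant stimuli and focus on the most salient one."""
    nearby = [p for p in percepts if p.distance_m <= max_distance_m]
    return max(nearby, key=lambda p: p.salience, default=None)


def decide(focus: Optional[Percept]) -> str:
    """Decision-making: map the attended stimulus to a behaviour label."""
    if focus is None:
        return "idle"
    if focus.kind == "speech":
        return f"reply_to:{focus.source}"
    if focus.kind == "gesture" and focus.value == "wave":
        return f"wave_back:{focus.source}"
    return f"gaze_at:{focus.source}"


def agent_step(percepts: List[Percept]) -> str:
    """One perception-attention-action cycle; perception is given as input."""
    return decide(attend(percepts))


observed = [
    Percept("visitor_1", "gesture", "wave", salience=0.6, distance_m=2.0),
    Percept("visitor_2", "speech", "hello", salience=0.9, distance_m=5.0),
    Percept("visitor_3", "position", "near_statue", salience=0.2, distance_m=1.0),
]
chosen_behaviour = agent_step(observed)
```

Here the louder but distant visitor is filtered out by the attention radius, so the agent responds to the nearby wave. A fuller model would also feed emotion estimates and environment state into the attention stage.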
Another direction, in which virtual characters for XR applications could turn to, is integrated, low-cost systems for photo-realistic 3D human (self-) representation. Significant growth has emerged in this field since the advent of commodity cameras and depth sensors. Approaches developed over the years are able to extract lifelike 3D human "avatars" from unconstrained sources, utilizing deformable 3D mesh geometry and corresponding image data (which varies over time) to obtain the human's shape, and textured appearance for every frame of animation, even in challenging situations [60]. Of particular interest in this direction will be the preservation of users' facial appearance and expressions given the significant occlusion of the users' heads by head-mounted display gear [61].
In conclusion, true mediated reality aims to improve the consistency of the simulated world with actual reality, by positioning 3D True-AR models [62] in the real world so convincingly that people cannot tell that the model they are looking at is actually a 3D augmented model, thus achieving "suspension of disbelief". In the museum context, true mediated reality will achieve the integration of 3D statues' models in the real world environment, supporting nonverbal and verbal communication, affective components, and behavioral aspects, such as gaze and facial expressions, lip movements, body postures and gestures, creating realistic embodied virtual agents, able to share knowledge with visitors through storytelling.

Natural Multimodal Interaction
Natural interaction with technology is a much-acclaimed feature that has the potential to ensure optimized user experience, as people can communicate with technology and explore it like they would with any real world artefact: through gestures, expressions, movements, and by looking around and manipulating physical stuff [63]. With regard to virtual environments, natural interaction has the potential to increase user immersion, thus enhancing user performance and enjoyment [64].
From an interaction perspective, techniques that are considered natural include gestures, touch, head and body movements, but also speech, as well as gaze input. Although all these modalities are feasible in XR environments and there are considerable efforts and achievements exploring how each one of them is applied, a major concern with regard to natural interaction with virtual agents is the fusion of multiple modalities into such a complex system. Future efforts in the field should adopt the notion of adaptive multimodality, offering users the most appropriate and effective input forms for the current interaction context [65]. Multimodal interactions in crowded real-life settings impose additional concerns, as interaction does not take place in isolation in the virtual environment only, but concurrently with real world interactions with other museum visitors or friends, a challenge which is further escalated when adopting recent trends towards inter-connected VR [66].
Natural speech-based interaction is a fundamental constituent of natural interaction with virtual agents, addressed by the field of embodied conversational agents that typically combine facial expression, body posture, hand gestures, and speech to provide a more human-like interaction [67]. Although embodied conversational agents currently remain rather rare, the popularity of dialogue-based systems is indicated by the fact that more and more applications enrich their classical graphical user interfaces (GUI) with personal chat services.
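One way to picture adaptive multimodality is as a ranking of input modalities that is recomputed whenever the interaction context changes. The sketch below is a rule-based illustration only: the context fields, base scores, thresholds and penalties are assumptions invented for this example, not values taken from the cited literature.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InteractionContext:
    """Hypothetical snapshot of the conditions around one visitor."""
    noise_db: float       # ambient noise level in decibels
    crowd_density: float  # visitors per square metre near the user
    hands_free: bool      # whether the user's hands are available


def rank_modalities(ctx: InteractionContext) -> List[str]:
    """Return input modalities ordered from most to least suitable."""
    # Base preferences (assumed): speech and gesture feel most natural.
    scores = {"speech": 1.0, "gesture": 0.9, "gaze": 0.8, "touch": 0.6}
    if ctx.noise_db > 70:          # loud hall: speech recognition degrades
        scores["speech"] -= 0.7
    if ctx.crowd_density > 1.5:    # crowded: wide gestures are impractical
        scores["gesture"] -= 0.5
    if not ctx.hands_free:         # carrying items: avoid hand-based input
        scores["gesture"] -= 0.4
        scores["touch"] -= 0.5
    return sorted(scores, key=lambda name: scores[name], reverse=True)


quiet_gallery = InteractionContext(noise_db=45, crowd_density=0.3, hands_free=True)
busy_hall = InteractionContext(noise_db=80, crowd_density=2.0, hands_free=True)

quiet_ranking = rank_modalities(quiet_gallery)
busy_ranking = rank_modalities(busy_hall)
```

In the quiet gallery speech ranks first, while in the busy hall gaze takes over and speech drops to last. A deployed system would more plausibly learn such preferences from usage data than hard-code them.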
Several of the conversations generated by conversational agents are driven by artificial intelligence, while others have people supporting the conversation. Lately, AI has been refuelled with the emergence of deep learning and neural networks, which boosted the research results in natural language processing (NLP) as well [68]. Nevertheless, in the design of conversational interfaces, NLP remains the biggest bottleneck. Furthermore, despite the considerable number of conversational agent engines that have already been developed, there are very limited attempts to address diverse groups of museum visitors, e.g., children, the elderly, technologically naïve visitors, etc. In this respect, a completely new mind-set must be adopted in designing and developing conversational interfaces, even when a chat might seem so simple.

Summary of Future Challenges
As XR technologies gradually become more and more immersive and realistic, they are empowered to transform the way that information is delivered and consumed. Put simply, in order to successfully combine the real and virtual world and deliver a unified experience that cannot be derived exclusively either from the real or the virtual content, three technological challenges need to be resolved: (i) hide some parts of the real world (diminished reality), (ii) superimpose realistic virtual scenes and characters (true mediated reality), and (iii) interact with the virtual agents in a natural manner (natural multimodal interaction). The relevant current technological challenges and future directions, as they have been discussed in this section, are summarized in Table 1.
• Develop new models to improve the naturalness of virtual agents' behaviour, by following the perception-attention-action approach.
• Deploy convincing, real-time head-mounted display removal for photo-realistic digitized 3D characters.
• Employ adaptive multimodality to support natural input in a dynamically changing context of use.
• Advance embodied conversational agents, by adopting a completely new mind-set in designing and developing conversational interfaces that will also take into account the diversity of potential users.

A Conceptual Architecture for Unifying XR Experiences for Realistic Virtual Museums
Museums have existed since the third century BC, and until today they have undergone several changes to cope with the sociological, cultural and economic shifts through humankind's history [69]. Nevertheless, one fundamental attribute that has remained intact is their orientation towards education: museums principally aim to share knowledge with their audiences. Contemporary museums have embraced technology and incorporated technological artifacts related to their collections, as a means of creating delightful experiences, increasing their wow-factor, entertaining visitors, but also for providing access to collections that are not physically exhibited in the museum and for giving additional information about the exhibited artifacts.
Among the wide variety of technological exhibits, AR and VR solutions are increasingly becoming popular, due to the high immersion and presence they offer [70,71], taking also into account that the hardware involved is now affordable and has achieved important progress in the delivered user experience quality. This is an important accomplishment and constitutes a milestone for further evolving XR experiences in the museum towards becoming unique and unified.
Unified XR experiences in realistic virtual museums focus on engaging museum visitors in an interactive and immersive blend of physical and virtual, as if it was a single unified "world" [72]. Following this concept, interaction with the XR environment and its agents will be achieved naturally, as when interacting with real world artifacts and counterparts. Embodied virtual agents will interact with museum visitors in order to provide instructions and transfer knowledge in a more direct manner (e.g., historical personalities sharing their stories, infamous artwork becoming "alive"). This realistic interplay will introduce passive museum visitors to active partakers, thus endorsing their feeling of presence in the XR environment and achieving better transfer of knowledge and higher enjoyment.
From a technical perspective, a unified XR experience in realistic virtual museums requires a distributed service-oriented architecture (SOA) that interweaves the different technologies in a flexible and scalable manner and promotes reusability, interoperability, and loose coupling among its components. Figure 2 illustrates a conceptual model incorporating the fundamental components that such approaches should comprise.
Overall, the conceptual model involves two main component categories: (i) elements that directly affect user interaction and are responsible for delivering the XR experience (green area in Figure 2), and (ii) components pertaining to processes that are unseen by the end user and are responsible for interpreting user interactions (yellow area in Figure 2, marked as "true mediated reality").
In particular, the diminished reality component undertakes the task of removing, in real time, physical elements (e.g., a museum exhibit) that will be replaced with their virtual counterparts in the user's view. This requires several processes. Scene registration and localization processes identify the user's location in the physical environment and the objects in their field of view. This procedure runs continuously during the interaction, as the user's location may change at any time. Then, physical objects are perceptually removed by substituting them with the appropriate background.
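As an illustration only, the per-frame flow described above (localize the user, determine which registered exhibits are in view, substitute them with background) can be sketched as follows. All names here (`Box`, `diminish`, `process_frame`, `locate_user`, `visible_exhibits`) are hypothetical and not part of the proposed model. The sketch also assumes that a pre-captured background image aligned with the current view is available, which is a strong simplification; a real system would need to estimate or inpaint the background.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned frame region occupied by a physical exhibit (hypothetical)."""
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive


def diminish(frame, background, boxes):
    """Return a copy of `frame` with every box filled from `background`.

    Assumption: `background` is an image of the empty scene that is
    already aligned with `frame` (same size, same viewpoint).
    """
    out = [row[:] for row in frame]
    for b in boxes:
        for y in range(b.top, b.bottom):
            out[y][b.left:b.right] = background[y][b.left:b.right]
    return out


def process_frame(frame, background, exhibits, locate_user, visible_exhibits):
    """One iteration of the loop; it is repeated for every incoming frame.

    `locate_user` and `visible_exhibits` stand in for the scene
    registration and localization processes and are supplied by the caller.
    """
    pose = locate_user(frame)                  # where is the user now?
    boxes = visible_exhibits(pose, exhibits)   # which exhibits are in view?
    return diminish(frame, background, boxes)  # perceptually remove them
```

Because `process_frame` re-runs localization on every frame, the removed region follows the user as they move, which mirrors the continuous nature of the procedure.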
Next, virtual agents have to be placed in the virtual environment in order to substitute the physical exhibits. This is handled by the true mediated reality components. A prerequisite for delivering high-quality experiences is the realistic reconstruction of 3D models that match, and in certain cases extend, their physical counterparts. Ideally, 3D models should not only represent the museum exhibits realistically, but also attempt to convey the original form of damaged artifacts (e.g., statues with missing parts). Then, the virtual representations of physical artifacts are placed in the virtual environment in a believable manner, a task that requires realistic rendering and animation.
Last, a unified experience requires context-sensitive natural interaction with multiple users, which involves processes responsible for perceiving and interpreting users' natural input, namely gestures and natural language. In this respect, a natural language processing knowledge base needs to be embedded in the system, while a corresponding process identifies the received user commands. At the same time, a variety of input gestures should be supported, in accordance with the state of the art in the field, thus allowing users to build upon their experience with other gestural interfaces and interact with the virtual environment easily and effectively. In parallel, an emotion detection process monitors and detects user emotions, so that the system can be further adapted to the user. All identified gestures, speech, and emotions should be taken into account by a context-sensitive interaction decision-making process, responsible for determining how the virtual agent (e.g., a virtual statue) will respond, considering also other parameters, such as the number of users who actively interact with the virtual exhibit and of those who passively attend the ongoing interaction. The decisions made will affect the virtual agent's behavior in terms of posture, gestures, and exhibited emotions, as well as the information delivered through multiple possible formats, including spoken dialogue output.
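To make the role of the decision-making process more concrete, the following minimal sketch shows one possible shape for its inputs and outputs. The type names (`Context`, `AgentBehavior`), the function `decide`, the label strings, and the rules themselves are placeholder assumptions chosen for illustration; the conceptual model does not prescribe any specific rule set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    """Inputs gathered by the perception processes (hypothetical fields)."""
    gesture: Optional[str]    # e.g., "wave", "point", or None
    utterance: Optional[str]  # recognized speech, or None
    emotion: str              # e.g., "neutral", "confused", "bored"
    active_users: int         # users interacting with the exhibit
    passive_users: int        # users watching the interaction


@dataclass(frozen=True)
class AgentBehavior:
    """Outputs that drive the virtual agent (hypothetical fields)."""
    posture: str
    gesture: str
    emotion: str
    output_format: str


def decide(ctx: Context) -> AgentBehavior:
    """Map the interaction context to agent behavior with placeholder rules."""
    audience = ctx.active_users + ctx.passive_users
    # Turn toward the whole group when more than one person is present.
    posture = "facing_group" if audience > 1 else "facing_user"
    # Respond to the detected gesture; fall back to an idle animation.
    gesture = {"wave": "wave_back", "point": "look_at_target"}.get(ctx.gesture, "idle")
    # Adapt the exhibited emotion to the detected user emotion.
    emotion = {"confused": "reassuring", "bored": "enthusiastic"}.get(ctx.emotion, "neutral")
    # Speak when spoken to or when addressing a group; otherwise show text.
    spoken = ctx.utterance is not None or audience > 1
    return AgentBehavior(posture, gesture, emotion,
                         "spoken_dialogue" if spoken else "text_panel")
```

Keeping the decision step a pure function of the observed context means it can be tested without any sensing or rendering component attached, which fits the loose coupling the architecture aims for.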
The next section (Section 4) illustrates the aforementioned conceptual architecture through a case study.

Case Study: XR Natural History Museum
The orchestration of the conceptual architecture detailed above toward delivering an immersive XR user experience, which places users in both a physical and a virtual representation of the same geographical space at the same time, is explained through the example of an interactive XR exhibition installation dedicated to the presentation of Pleistocene Cretan fauna [73]. This demonstration intends to showcase living, animated, life-sized reconstructions of the animals that roamed Crete approximately 800,000 years ago. The purpose of this case study is to create a unified XR experience that intertwines "realities" (augmented, virtual, and plain reality in this case) to deliver an experience that transcends the capacities of each medium individually. It enables users to interact with other museum visitors in the same room while immersed, and to simultaneously enjoy the benefits offered by both the physical and the synthetic museum space.

XR Systems and Applications
As illustrated in Figure 3, there are two types of interactive systems that can coexist in the same physical space, lending full support to the Pleistocene Crete exhibition. First, the virtual experience, which constitutes a fully immersive, small museum room, where fossils and reconstructed skeletons of the creatures (based on hypothesized remains not yet found, or making up for the fact that animal remains may have been moved to museums abroad) are either mounted, or shown as dig sites. The user can navigate the room and view an abundance of information shown in textual, image, audible or video form, borrowing real elements from the museum's audiovisual material (Figure 4a). Users can also opt to "travel" back in time and view each individual animal in a fully animated, lifelike reconstruction, roaming a virtual recreation of the animal's habitat that completely transforms the environment around the user (Figure 4b). As such, users can observe the different ways these animals moved, and thus gain far greater insight into the morphological conditions of the environment that allowed each animal to thrive in Crete at various time periods during the Pleistocene era. A "timeline" interactive user interface feature further enables users to "travel" to the time periods in which each animal is estimated to have lived, and visualize the changes via various effects (e.g., a fast-forward montage of evolution). Narrative elements can further be infused into the virtual exploration, allowing the user to view interactions between animals of the same time period, as if partaking in a nature documentary television series [74] (e.g., witnessing the growth of a specific Mammuthus creticus [75], or Athene cretensis [76] preying on a herd of Candiacervus [77]).
Second, the substitutional reality experience serves as an augmented approximation of the aforementioned virtual case. In this experience, the real, physical area of the museum fossil room is set up to accommodate the XR experience, allowing the real (or plaster-built) animal remains to be subjected to a diminished reality effect. As the real-world animal skeleton cast disappears, a lifelike, moving reconstruction of the animal takes its place, and is allowed to roam the physical space around the user. Elements of natural interaction (involving gestures, movement, body postures and object manipulation) can be embedded in the experience, allowing the animal to react realistically to users' attempts to touch it, while also utilizing spatial mapping and surface understanding to allow the holographic creature to avoid collisions with bystanders, furniture, and the overall museum infrastructure.
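The collision avoidance mentioned above can be approximated in many ways. As a hedged illustration, the sketch below uses a simple repulsion-based steering rule in 2D, which is a technique chosen here for brevity and not one specified by the case study. The function `steer`, its parameters, and the representation of obstacles as circles `(x, y, radius)` obtained from spatial mapping are all assumptions.

```python
import math


def steer(position, goal, obstacles, speed=1.0, clearance=1.0):
    """Return a velocity of magnitude `speed` toward `goal`, pushed away from
    any obstacle whose surface is closer than `clearance`.

    `obstacles` is a list of (x, y, radius) circles standing in for
    bystanders and furniture reported by spatial mapping (assumed input).
    """
    px, py = position
    # Unit vector pointing at the goal.
    vx, vy = goal[0] - px, goal[1] - py
    norm = math.hypot(vx, vy)
    if norm > 0:
        vx, vy = vx / norm, vy / norm
    # Add a repulsive term for each nearby obstacle.
    for ox, oy, radius in obstacles:
        dx, dy = px - ox, py - oy
        dist = math.hypot(dx, dy)
        gap = dist - radius  # distance to the obstacle surface
        if dist > 0 and gap < clearance:
            weight = (clearance - gap) / clearance  # stronger when closer
            vx += weight * dx / dist
            vy += weight * dy / dist
    norm = math.hypot(vx, vy)
    if norm == 0:
        return (0.0, 0.0)  # attraction and repulsion cancel out
    return (speed * vx / norm, speed * vy / norm)
```

For example, `steer((0.0, 0.0), (10.0, 0.0), [(1.0, -0.5, 0.5)])` yields a velocity with a positive second component, so the creature veers away from the obstacle placed slightly below its straight path to the goal. A purely local rule like this can get stuck between obstacles, so a deployed system would likely combine it with path planning.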

Mapping to the Proposed Conceptual Model
The aforementioned exhibition systems and applications build on the abstract framework of concepts and relationships presented in Section 3, which serves to guide the development of the system, to interpret the interactions between entities in the application environment through its defined relationships, and to derive a specific and concrete architecture describing the structure of this particular use case. The mapping in Figure 5 illustrates how the elements of the case study are identified as either interaction-oriented components or true mediated reality components.
As can be seen in this mapping, the reference conceptual model is viewed as an outline of principles guiding the design of the Pleistocene Crete XR ecosystem. The alignment of the exhibition system architecture to our conceptual model allows the architecture to combine all the necessary elements and IT components in a unified X-Reality interactive environment, encapsulating the system functionalities into a number of parallel-running services. This allows us to break down otherwise complex processes into easy-to-grasp, standalone components, and hence greatly simplifies aspects of system development and integration.
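One common way to keep such services standalone is to let them communicate only through named topics. The sketch below is a minimal, in-process illustration of that idea and not the architecture of the actual installation: `MessageBus`, `build_demo`, the topic names, and the three toy services are hypothetical, and calls are synchronous here, whereas the described services would run in parallel (e.g., as separate processes connected by a message broker).

```python
class MessageBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        """Register `handler` to be called with every payload sent to `topic`."""
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        """Deliver `payload` to all handlers currently subscribed to `topic`."""
        for handler in list(self._subscribers.get(topic, [])):
            handler(payload)


def build_demo(bus, rendered):
    """Wire three toy services together; each one only knows topic names."""

    def gesture_recognition(frame):
        # A real service would analyze sensor data; here the gesture is given.
        bus.publish("gesture", {"user": frame["user"], "gesture": frame["gesture"]})

    def interaction_decision(event):
        action = "wave_back" if event["gesture"] == "wave" else "idle"
        bus.publish("agent_behavior", {"target": event["user"], "action": action})

    def rendering(command):
        # Stand-in for animating the virtual animal.
        rendered.append(command)

    bus.subscribe("sensor_frame", gesture_recognition)
    bus.subscribe("gesture", interaction_decision)
    bus.subscribe("agent_behavior", rendering)


bus = MessageBus()
rendered = []
build_demo(bus, rendered)
bus.publish("sensor_frame", {"user": "visitor-1", "gesture": "wave"})
```

Since no service calls another directly, any one of them can be replaced (for instance, swapping the gesture recognizer) without touching the rest, which is the practical benefit of the loose coupling discussed above.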

Operational Setup and Evaluation Framework
Similar experiences to the one described can be developed to serve a variety of cultural institutions' needs. For the production of the XR content, a collaboration among museum curators and technology experts is warranted, in order to reproduce reconstructed digital museum artefacts with a high degree of fidelity.
Both experiences are intended as in-venue offerings within an actual museum, meaning that a dedicated XR space will be required for their deployment. Similar mixed reality interaction demonstrators suggest that an adequate "play area" be made available for each demonstrator, so as to accommodate unimpeded user movement without risk of injury, but also to ensure that the user's capacity to navigate each XR space is not severely limited. Ideally, museums should be encouraged to have one expert present at each experience space for troubleshooting, as well as for providing one-on-one tutoring before users immerse themselves in each experience. Each experience should have a limited duration, both to ensure that enough time can be allocated for crowds of museum visitors to try out each application, and to mitigate the motion sickness that some users (especially first-timers) may experience while trying out these novel experiences.
To assess the museum visitor experience and gather feedback for improving the design and functionality of the applications, a reference evaluation framework can be used, based on common criteria such as the systems' usability and engagement. A recent evaluation methodology which applies to this particular case is proposed in [78]. Furthermore, additional aspects of each experience should be taken into account to assess the systems' potential as a tool for museum education. Such ventures constitute an interesting direction for future work.

Conclusions
Museums have been characterized as "places where time is transformed into space" (the quote is attributed to Orhan Pamuk, a Nobel Laureate novelist). Contemporary museums have gone through various shifts, expanding their thematic scope and hosting not only historical or art exhibits, but a wide breadth of tangible and intangible cultural heritage artefacts, as well as scientific and technology artefacts. At the same time, museum visitors themselves have changed, becoming more tech-savvy and often desiring the incorporation of technological artefacts in non-technological museums. Yet, technology should not be used in the museum context as an end in itself; instead, it should constitute the medium for elevating museum visitors' experience, while also enhancing their understanding and knowledge acquisition.
Along this direction, and taking advantage of the potential of XR technology, this work has proposed a reference technological model for implementing unified XR experiences in realistic virtual museums. The model supports the seamless blending of physical and virtual worlds toward innovative museum experiences. Furthermore, we introduced the "true mediated reality" concept to refer to the collective technological components required for visitors to be able to interact with embodied virtual agents that will substitute museum artefacts with believable, interactive characters. The conceptual model aims to allow museum visitors to concurrently interact with their physical environment and other museum visitors while immersed in an XR application. This makes such experiences well suited to the museum environment, where experiences should not be provided strictly in isolation but as social activities as well.
In this respect, we have presented the state of the art and highlighted the need to further advance research in three major technological pillars, namely diminished reality, true mediated reality, and natural multimodal interaction. Future research endeavors should work toward fading the real environment, as it is perceived by all human senses, and substituting it with realistic objects and characters. Virtual characters in the XR environment should not only be realistic, but also exhibit naturalness in their movement, speech, and overall behavior. In addition, users' representation in the virtual environment should be agnostic of the devices used, embedding in the XR world a user avatar without head-mounted displays or input controllers. Finally, user interaction is expected to be multimodal, as in the real world, and natural, featuring speech, gestures, and even emotions. Once these technological advancements have been achieved, the experience delivered in museums and cultural heritage sites will be revolutionized, with the aim of not only entertaining visitors, but also allowing them to better understand the exhibits, increase their empathy with challenging concepts and topics, and eventually enhance their knowledge.
Author Contributions: All authors have contributed equally to this work. All authors have read and agreed to the published version of the manuscript.