Systematic Review

A Systematic Review on the Combination of VR, IoT and AI Technologies, and Their Integration in Applications

by Dimitris Kostadimas, Vlasios Kasapakis and Konstantinos Kotis *
Department of Cultural Technology and Communication, University of the Aegean, 81100 Mytilene, Greece
* Author to whom correspondence should be addressed.
Future Internet 2025, 17(4), 163; https://doi.org/10.3390/fi17040163
Submission received: 22 February 2025 / Revised: 31 March 2025 / Accepted: 4 April 2025 / Published: 7 April 2025
(This article belongs to the Special Issue Advances in Extended Reality for Smart Cities)

Abstract
The convergence of Virtual Reality (VR), Artificial Intelligence (AI), and the Internet of Things (IoT) offers transformative potential across numerous sectors. However, existing studies often examine these technologies independently or in limited pairings, overlooking the synergistic possibilities of their combined usage. This systematic review adheres to the PRISMA guidelines to critically analyze peer-reviewed literature from highly recognized academic databases related to the intersection of VR, AI, and IoT, and to identify application domains, methodologies, tools, and key challenges. By focusing on real-life implementations and working prototypes, this review highlights state-of-the-art advancements and uncovers gaps that hinder practical adoption, such as data collection issues, interoperability barriers, and user experience challenges. The findings reveal that digital twins (DTs), AIoT systems, and immersive XR environments are promising emerging technologies (ET) but require further development to achieve scalability and real-world impact, while in certain fields only a limited amount of research has been conducted to date. Challenges such as data interoperability, user experience limitations, and scalability barriers continue to hinder widespread adoption. By bridging theory and practice, this review provides a targeted foundation for future interdisciplinary research aimed at advancing practical, scalable solutions across domains such as healthcare, smart cities, industry, education, cultural heritage, and beyond.

1. Introduction

The rapid advancement of digital technologies has fundamentally transformed how we interact with information, make decisions, and perceive our environment. The integration of emerging and disruptive technologies (EDT), such as VR, AI, and the IoT, continuously delivers new technological advancements, revolutionizing industry and seamlessly connecting the physical and digital worlds. These three EDTs, individually impactful, hold immense synergistic potential when combined. VR creates immersive environments that enhance user engagement and understanding; AI drives intelligent decision-making through advanced data processing and predictive analytics, supports intelligent interactions between humans and machines, and generates content via dialogues in a fully automated manner; and IoT enables the real-time collection and communication of data by creating vast networks of physical devices. Together, these technologies unlock opportunities for innovation across multiple application domains.
In recent years we have witnessed significant progress in each of these technologies independently. VR, and eXtended Reality (XR) in general, has evolved from simple stereoscopic displays and basic digital augmentations to sophisticated systems capable of providing highly immersive experiences [1,2], with applications ranging from education [3,4] and entertainment [5] to professional training [6,7] and therapeutic interventions [8]. In the past five years, AI (genAI, conversational AI, computational AI, symbolic AI, neurosymbolic AI) has seen unprecedented growth [9], driven by significant academic and industry contributions. The development of conversational AI and large language models (LLMs) has transformed everyday human-computer interactions [10,11], enabling more natural and intuitive communication while also assisting people by simplifying tasks in a variety of fields [12]. Additionally, computational AI has demonstrated remarkable capabilities in pattern recognition, decision-making, and autonomous control, particularly through advances in machine learning and neural networks. Finally, IoT has expanded exponentially, with billions of connected devices generating vast amounts of data and enabling real-time monitoring and control of physical systems [13]. The current technological landscape also reveals several emerging trends that highlight the timeliness of this review. The rise of DTs, which combine IoT sensors with AI-driven simulations and VR interfaces, demonstrates the practical value of integrating these technologies. Similarly, the growing interest in metaverse applications and immersive collaborative environments underscores the need for a better understanding of how these technologies can be combined and integrated effectively.
Despite the undoubted progress and importance of the individual EDTs, the integration of these three technologies remains relatively unexplored. A significant portion of the research focuses on VR, AI, or IoT in isolation or in limited pairings [14,15,16], neglecting the synergistic possibilities of their combined usage. Additionally, many studies emphasize theoretical frameworks rather than real-world implementations, leaving critical questions about feasibility and practical challenges unanswered.
This paper presents a systematic review that seeks to address these gaps by analyzing current research on the integration of VR, AI, and IoT technologies. Adhering to PRISMA guidelines, it focuses on studies with real-world implementations and working prototypes to provide actionable insights for researchers and practitioners. Specifically, this review aims to:
  • Identify the primary application domains of VR-AI-IoT integration.
  • Identify and analyze methodologies, tools, and frameworks used for combining these technologies.
  • Highlight unique advantages of their synergy.
  • Discuss key limitations and challenges that hinder practical adoption.
  • Offer directions for future research to advance interdisciplinary innovation.
The main contribution of this work is a comprehensive analysis of the intersection of VR, AI, and IoT, conducted as a systematic review adhering to the PRISMA methodology. Unlike previous studies that focus on the application of these technologies in isolation or in limited pairings [14,15,16,17] and/or in certain domains [18,19], without solely focusing on feasible implementations, this review adopts an integrated and interdisciplinary approach to highlight the underexplored potential of their synergy in various application fields. Additionally, this review focuses on works with real-life implementations and actual working prototypes that provide actionable insights beyond high-level frameworks, bridging the gap between theory and practice. By pinpointing current applications, mapping their primary application fields, summarizing the state of the art, examining tools and methodologies, and identifying challenges and limitations, this review provides a targeted foundation for future research aimed at harnessing the full synergy of VR, AI, and IoT, addressing practical challenges, and advancing innovation across diverse fields.
The remainder of this paper is structured as follows. Section 2 provides essential background knowledge, introducing the reader to each of the core technologies studied in this review. Section 3 outlines the research methodology, which adheres to PRISMA guidelines, ensuring transparency and reproducibility of the review process; the research questions are also set out in the methodology section. In Section 4, a comprehensive summary of all eligible studies is presented, accompanied by a detailed table that offers key insights from each work. Section 5 addresses the research questions through a critical discussion of the findings. Finally, Section 6 and Section 7 highlight future research directions and present the concluding remarks.

2. Background

2.1. Virtual Reality

VR and, more broadly, its extension XR are immersive technologies that allow users to interact with digital environments in real time. Devices such as head-mounted displays (HMDs), motion controllers, tracking systems, sensors (e.g., gyroscopes, accelerometers, and magnetometers), and sensory feedback systems are used to create simulations that replicate real-world or fantastical scenarios [20,21]. This technology has found applications in diverse fields, including healthcare, gaming, education, industry, business, and many more [22,23]. For example, VR is extensively used for surgical simulations, enabling medical professionals to perform complex procedures in a risk-free environment [24,25]. Similarly, in education, VR provides interactive learning experiences, fostering deeper engagement and comprehension among students [4,26]. VR has also garnered increased interest from the general public due to advancements in VR gaming and the availability of commercially accessible devices such as the Meta Quest [27,28]. Despite its growing adoption, challenges persist, particularly regarding motion sickness, latency, and the high cost of hardware, which limit accessibility and scalability [29,30,31]. Even though standalone VR headsets such as the Meta Quest have come a long way, insufficient computational power, resolution limitations, restricted fields of view, and tracking errors negatively affect immersion and user experience [32]. The overall graphical fidelity, physical simulations, and user interactions within VR environments often lack realism, breaking the sense of immersion [33]. Additionally, there is a lack of standardization across platforms, and the development of immersive VR environments faces multiple barriers, including the difficulty of forecasting and simulating motion as well as of testing and evaluating these systems with real users [34]. The current state of VR is characterized by rapid advancements in hardware and software, driven by innovations in fields such as computer graphics and sensor technologies. The development of standalone VR devices, such as the Meta Quest series, has made VR more accessible by eliminating the need for tethered connections to powerful computers. The integration of advanced computer graphics, sensor technologies, and innovative software continues to drive VR forward and to address known issues, making it a transformative tool in many domains, as it enables the creation of highly immersive experiences.

2.2. Internet of Things

IoT is a network of interconnected physical devices referred to as “things” that communicate and exchange data over the internet [35,36]. IoT enables everyday objects to become “smart” interconnected digital entities with the help of sensors, actuators, and other appliances that collect, transmit, and act on data, facilitating automation and improving decision-making processes [37,38]. The rapid expansion of IoT is fueled by advances in wireless communication technologies, such as 5G and LPWAN, which support the connectivity of billions of devices worldwide. Data collected by IoT sensors are transferred through networks (wireless or wired) and then analyzed, either in a central cloud environment or directly on local devices (edge computing). This technology is also used to effectively bridge physical and digital environments, as real-world variables captured by physical sensors can be passed to a DT [39]. For instance, in healthcare, IoT devices monitor patient vitals and provide real-time data to medical professionals, improving patient outcomes [40,41]. Similarly, in smart cities, IoT applications optimize energy usage, manage traffic flow, and enhance public safety by integrating real-time data from various sources [42]. In general, IoT is a crucial enabler for smart environments and data-driven decision-making processes (especially when combined with the latest advancements in the field of AI) and is widely researched and used in fields such as smart cities, intelligent transportation, logistics and supply chain, as well as healthcare and emergency response [43]. However, this technology faces significant challenges, including data security, interoperability, availability, and reliability [44,45]. Due to varying communication protocols and architectures, and the lack of standardization in the field, integrating heterogeneous devices continues to be a significant challenge. Advanced encryption methods are necessary to ensure data confidentiality and mitigate the serious risks posed by security flaws such as illegal access and data breaches, which are increasingly common. Furthermore, issues related to mobility, the power consumption and computational capacity of such devices, and network reliability reduce the effectiveness of IoT systems, particularly in dynamic environments [45]. As IoT systems become more complex, ensuring their scalability and reliability requires addressing these issues through robust frameworks and regulatory measures [43,46].
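To make the typical sensor-to-consumer data path concrete, the following minimal Python sketch uses the paho-mqtt client; the broker address, topic name, and payload fields are illustrative assumptions, not drawn from any of the reviewed studies.

```python
# Minimal IoT publish/subscribe sketch over MQTT (paho-mqtt 1.x style
# constructor; paho-mqtt 2.x additionally requires a CallbackAPIVersion
# argument). Broker host, topic, and payload schema are hypothetical.
import json
import time

import paho.mqtt.client as mqtt

BROKER = "broker.example.org"            # hypothetical broker address
TOPIC = "building/room1/temperature"     # hypothetical topic

def publish_readings():
    """Sensor side: publish one temperature reading per second."""
    client = mqtt.Client()
    client.connect(BROKER, 1883)
    for _ in range(10):
        payload = json.dumps({"celsius": 21.5, "ts": time.time()})
        client.publish(TOPIC, payload)
        time.sleep(1)
    client.disconnect()

def on_message(client, userdata, msg):
    """Consumer side: analyze each reading at the edge or in the cloud."""
    reading = json.loads(msg.payload)
    print(f"{msg.topic}: {reading['celsius']} C")

def subscribe_and_process():
    client = mqtt.Client()
    client.on_message = on_message
    client.connect(BROKER, 1883)
    client.subscribe(TOPIC)
    client.loop_forever()
```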

2.3. Artificial Intelligence

AI encompasses a broad range of computational techniques and approaches designed to enable machines to mimic human cognitive functions such as learning, reasoning, and problem-solving. It has experienced significant growth over the past five years, with advancements in technology and widespread adoption [9]. AI is considered an umbrella term for a broad range of technologies that have the ability to (a) represent a problem, and (b) solve it. Its applications are diverse and span numerous fields, including self-driving cars and mobility solutions, automated robots for inventory management in warehouses, smart home systems and more [47].
Modern AI systems primarily rely on machine learning techniques, particularly deep learning (DL), which allows systems to learn patterns and make decisions from large datasets without explicit programming. Machine learning is a subset of AI that focuses on developing algorithms that enable computers to learn from data. DL, a specialized subset of machine learning, uses neural networks with multiple layers to process large amounts of data and identify intricate patterns [48]. Neural networks, inspired by the human brain, consist of interconnected nodes that process information. These distinctions highlight the varying complexity and scope of AI technologies. Regarding the application of DL, the most common architectures are CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks). CNNs, inspired by the visual cortex of the human brain [49], excel at image-related tasks such as recognition and object detection due to their ability to detect patterns in grid-like data [50]. RNNs are designed to handle sequential data by maintaining a memory of previous inputs through recurrent connections, making them ideal for time-series analysis and natural language processing. Their evolution has significantly advanced fields such as computer vision and speech recognition [51,52].
Additionally, LLMs have revolutionized the field by enabling systems to understand and generate human-like text/images/video/audio, based on extensive datasets. These models, such as GPT-3 [53] and BERT [54], excel in a variety of language tasks including translation, summarization, and question answering, offering highly accurate and contextually relevant outputs. Building on the advancements of LLMs, Conversational AI involves designing intelligent systems that can engage in natural, free-form dialogues with humans. Utilizing technologies such as natural language processing (NLP), machine learning (ML), and voice recognition, these systems interpret user intent, manage context, and provide personalized responses [55]. This enables engaging, human-like interactions that are rich in flexibility, informality, and personalization. Conversational agents learn from previous interactions to continuously improve their dialogue capabilities, making them valuable tools in various applications, including personal assistance and customer support.
The success of contemporary AI applications is largely attributed to advances in neural network architectures, the availability of massive datasets, and increases in computational power. As the volume of data continues to grow, machine learning algorithms are making increasingly precise and accurate predictions in real time [56]. On the other hand, challenges such as algorithmic bias [57], lack of transparency, and ethical concerns surrounding data privacy [58] remain critical obstacles to widespread adoption. Addressing these issues will require advances in explainable AI (XAI) and the development of ethical frameworks for AI governance [59]. A substantial portion of the limitations of currently trending AI approaches stems from the datasets used for training. As mentioned, DL models require vast amounts of labeled data and computational resources to train effectively, a barrier that cannot be overlooked when building such systems, since access to such datasets is far from easy for organizations, and especially for individuals or small teams. Moreover, some AI systems struggle to understand contextually complex scenarios and perform poorly in them, limiting their reliability in critical or high-stakes situations such as applications in healthcare [60].

2.4. Combined Technologies

The integration of VR, AI, and IoT is driving a new wave of technological innovation, enabling novel applications across diverse sectors beyond the capabilities of each individual technology. This article further delves into this exact combination. One aspect of this synergy is AIoT (Artificial Intelligence of Things), where AI processes data collected by IoT devices to generate actionable insights [61,62]. Another aspect is the integration of VR with IoT sensors, where real-time data from IoT devices enhance the realism and interactivity of virtual environments by feeding real-world data into the virtual counterpart [63]. The combination of VR with AI further expands the potential for intelligent, adaptive virtual environments: AI-driven VR systems can personalize user experiences by analyzing behavior and preferences, with applications ranging from gaming to healthcare [64]. Finally, these three technologies can converge to create intelligent systems that surpass previous iterations incorporating only a subset of them, providing new and more robust capabilities. For example, IoT devices gather data about a user’s actions or environmental conditions, AI analyzes these data to identify patterns or trends, and VR visualizes the outcomes in an intuitive, immersive environment. This synergy is particularly relevant for applications such as virtual training environments, smart city planning, and personalized healthcare solutions. Unlike previous research that often focuses on these technologies in isolation or in limited pairings, this review adopts a holistic approach to explore their combined applications in real-life implementations without being limited to a certain application domain.
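As a schematic illustration of this IoT-to-AI-to-VR loop, the sketch below shows the division of labor between the three layers; every function, device name, and threshold in it is a hypothetical placeholder, not taken from any reviewed system.

```python
# Schematic IoT -> AI -> VR loop; all names here are hypothetical
# placeholders illustrating the division of labor, not a real API.
from dataclasses import dataclass

@dataclass
class SensorReading:
    sensor_id: str
    value: float

def collect_iot_readings() -> list[SensorReading]:
    """IoT layer: gather data about the user or the environment."""
    return [SensorReading("temp-01", 27.3), SensorReading("motion-02", 0.8)]

def analyze(readings: list[SensorReading]) -> dict:
    """AI layer: identify patterns or trends (a trivial rule stands in
    for a trained model here)."""
    return {r.sensor_id: ("anomaly" if r.value > 25 else "normal")
            for r in readings}

def update_virtual_scene(insights: dict) -> None:
    """VR layer: visualize outcomes in the immersive environment, e.g.,
    by recoloring the digital twin of each sensed object."""
    for sensor_id, state in insights.items():
        print(f"scene object '{sensor_id}' -> highlight as {state}")

if __name__ == "__main__":
    update_virtual_scene(analyze(collect_iot_readings()))
```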

3. Systematic Review Methodology

This systematic review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to ensure a transparent and replicable approach. A comprehensive search of the Scopus and Clarivate Web of Science databases was conducted to retrieve articles that help answer a number of specific research questions, outlined below and discussed in detail in Section 5.

3.1. Research Questions

  • RQ1: What are the primary fields of application where VR, AI, and IoT are used in combination?
  • RQ2: How are the three core technologies studied (VR-AI-IoT) or their subsets currently being integrated in different application domains, and what unique advantages does their combined usage offer compared to isolated or paired applications?
  • RQ3: What methodologies, tools, and architectures are commonly employed in studies combining VR, AI, and IoT?
  • RQ4: What are the current limitations and challenges in combining VR, AI, and IoT, particularly regarding data collection, interoperability, and user experience?
  • RQ5: Are there any emerging frameworks or models of the seamless combination of the three technologies, for specific application domains?

3.2. Eligibility Criteria

To refine the selection of relevant studies from the results returned by the register and database searches, the following inclusion and exclusion criteria were established to ensure the relevance and quality of the papers included in this review.
  • Inclusion Criteria
    IC1: Studies published between 2020 and 2024
    IC2: Peer-reviewed articles
    IC3: Referring to at least two combined technologies
    IC4: Full text available in English
  • Exclusion Criteria
    EC1: Review papers, editorials, or opinion pieces
    EC2: Grey literature (technical reports, white papers, blogs)
    EC3: Retracted publications
    EC4: Publications without DOI
    EC5: Duplicate publications of the same study
    EC6: Papers presenting single-technology implementations
    EC7: Papers presenting system architecture proposals without any prototype implementations
    EC8: Content with missing or unclear methodology.
    EC9: Papers without adequate proof of a system’s existence, or with inadequate testing of the proof-of-concept implementation. Simulations with fabricated data are not considered adequate, nor is testing focused predominantly on a single part of the system (e.g., AI model improvements within IoT, data transfer latency, etc.). Papers that neither show images nor mention a testbed or experiment are also excluded. Only implementations or experiments that use the proposed system in its entirety are considered, regardless of scale, as long as the system could be applied to real-life scenarios, even after further improvements.
To ensure the high quality of the articles included in our systematic review, additional criteria beyond those mentioned above were introduced. As the aim was to cover a broad range of application fields where the convergence of AI, IoT, and VR takes place, in cases where there was an abundance of findings in a particular application area, only the top three articles based on their research impact were selected. This impact was assessed using the following criteria:
  • Number of citations to the paper.
  • Impact factor of the journal published.
  • Clarity of the methodology followed (e.g., tables provided, graphs, step-by-step details, etc.)
This decision was made after a comprehensive review of the eligible literature, which revealed that a disproportionate number of papers were connected to specific domains such as healthcare and Industry 4.0. To address this potential bias and enhance the generalizability of our systematic review, certain papers were strategically excluded. This approach was designed to ensure a more balanced and comprehensive exploration of the synergistic potential between AI, IoT, and VR, thus improving the overall precision and objectivity of this research.
To address the potential limitations of relying solely on Scopus and Web of Science sources, we conducted a supplementary non-systematic search in the databases of the Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery (ACM). From these sources, a total of 509 papers were retrieved from the IEEE Xplore database and 1113 papers from the ACM Digital Library. From those, six articles (three from ACM and three from IEEE) were selected based on their relevance to the previously mentioned inclusion criteria, while also taking into account their recency, impact, and alignment with the review’s objectives. This approach ensures a more inclusive and balanced representation of the literature while maintaining a manageable scope.
As the newly included works were not identified through the systematic review process (but through a supplementary, non-systematic search), they have not been incorporated in the following PRISMA flowchart, to avoid inaccurately representing the originally employed systematic methodology. To maintain transparency, reproducibility, and adherence to PRISMA guidelines, the supplementary works have been treated as a complementary addition and are discussed separately.

3.3. Search Queries & Databases

The study involved searching the popular and highly indexed Scopus and Web of Science (WOS) databases. To retrieve articles related to the research questions, the following single-line Boolean queries were performed in each database; the key terms stem from the research questions set.
For Scopus:
(TITLE-ABS-KEY ((“virtual reality” OR “VR”) AND (“internet of things” OR “IoT”) AND (“artificial intelligence” OR “AI”)) AND NOT (TITLE-ABS-KEY (“Review” OR “Survey”))) AND (PUBYEAR > 2019 AND PUBYEAR < 2025)
For Web of Science:
(AB = “virtual reality” OR AB = “VR”) AND (AB = “internet of things” OR AB = “IoT”) AND (AB = “artificial intelligence” OR AB = “AI”) NOT (TI = “Review” OR TI = “Survey” OR AB = “Review” OR AB = “Survey” OR AK = “Review” OR AK = “Survey”)|With Publication Date Filter set from 1 January 2020–30 November 2024
These database searches were performed on 30 November 2024. The initial search results comprised 659 and 106 articles, respectively. Thus, a total of 765 articles from both databases were retrieved. After the duplicates were removed, which were found to be 97, the final number of articles for screening was 668. Automated screening tools were not used.
The additional measures set regarding the quality of the papers chosen further justify our choice of Scopus and Web of Science as the primary databases for retrieving articles, given their advanced filtering capabilities and transparent citation metrics. Both databases provide comprehensive bibliometric information, including immediate access to citation counts, which directly supports a rigorous article selection process.
Regarding our supplementary research, the search queries conducted on the IEEE Xplore and ACM Digital Library databases were the following:
For ACM:
[[All: “virtual reality”] OR [All: “vr”]] AND [[All: “internet of things”] OR [All: “iot”]] AND [[All: “artificial intelligence”] OR [All: “ai”]] AND NOT [[All: “review”] OR [All: “survey”]] AND [E-Publication Date: Past 5 years]
For IEEE:
(((“virtual reality” OR “VR”) AND (“internet of things” OR “IoT”) AND (“artificial intelligence” OR “AI”)) NOT (“Review” OR “Survey”))|With Publication Date Filter set from 2020 to 2025

3.4. PRISMA

To conclude the systematic search and screening of relevant studies, the PRISMA flowchart is presented in Figure 1, outlining the systematic process followed. This flowchart tracks the journey from the initial identification of relevant papers through the various screening stages, ultimately leading to the inclusion of 14 final studies that were rigorously analyzed.
Data extraction was performed using the export function of these databases, through which spreadsheets were created with crucial data such as title, abstract, authors, keywords, citation counts, year, sources, article types, links, and more. The titles of the articles were systematically screened against the inclusion criteria as part of the initial selection process. In the second screening phase, the articles were screened based on keywords, abstract, conclusion section, as well as overall document structure and section titles. The third and final screening stage involved a detailed, in-depth review of the complete manuscripts passing the previous screenings, which were assessed for eligibility based on the inclusion and exclusion criteria and, where applicable, filtered according to the aforementioned additional criteria.
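As an illustration of this bookkeeping step (file names and column labels below are hypothetical; the actual spreadsheets came from the databases’ export functions), merging the two exports and removing duplicates could be scripted as follows:

```python
# Hypothetical sketch of merging database exports and deduplicating them.
# File names and column labels are illustrative assumptions.
import pandas as pd

scopus = pd.read_csv("scopus_export.csv")   # 659 records in this review
wos = pd.read_csv("wos_export.csv")         # 106 records in this review

merged = pd.concat([scopus, wos], ignore_index=True)

# Deduplicate on DOI where present, falling back to a normalized title
# for records that lack a DOI.
merged["key"] = merged["DOI"].fillna(merged["Title"].str.lower().str.strip())
deduped = merged.drop_duplicates(subset="key")

print(len(merged) - len(deduped), "duplicates removed")       # 97 here
print(len(deduped), "articles proceed to title screening")    # 668 here
```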

4. Overview of Included Studies

This section offers a comprehensive overview of the studies selected for the systematic review. Table 1 details the key characteristics of the included articles, offering a snapshot of the works examined in this review. By organizing data across various dimensions such as application fields, technological combinations, implementation types, and methodological approaches, the table aims to provide structured insight into the current landscape of interdisciplinary research. Additionally, it offers a comparative perspective to address the research questions and identify trends in combining VR, AI, and IoT across various fields. The columns highlight not only the technological innovations but also the practical aspects of each study, including tools and methodologies used, technological advantages, inherent challenges, and significant research results.
Zixuan Zhang et al. in their 2020 research [65] explore the integration of AI, VR, and IoT in wearable electronics, focusing on gait analysis for healthcare applications. They fabricated a pair of socks able to harvest energy through tactile triboelectric nanogenerators (TENGs) and collect data in a non-invasive way. The smart socks were able to collect features and feed a one-dimensional convolutional neural network (1D CNN) trained on data from five participants. This AIoT pairing is particularly useful for analyzing walking patterns in individuals with conditions such as Parkinson’s disease, diabetes, and musculoskeletal abnormalities. The system also incorporated a micro-controller unit (MCU) to transmit data to computers, enabling emergency alerts for irregular gait signals, such as falls, and informing caregivers. A challenge the authors encountered during the creation of the smart socks was that accuracy dropped considerably as the number of participants increased, while the sensors’ accuracy depended on weather and environmental conditions. However, the authors mention that with sufficient training time and an optimal sliding window size, accuracy can be improved. The final system included a triple-sensor sock, a signal pre-processing circuit, an MCU with wireless transmission, a personal computer, and a VR environment created with the Unity engine. The article also showcased applications in VR fitness games, where users can control characters in 3D worlds, and in smart home environments, where the system distinguished family members from strangers solely based on walking patterns. This article shows how much can be done with simple low-cost tools thanks to the combination of IoT, AI, and VR.
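For readers unfamiliar with the model class, a minimal 1D CNN classifier of the kind used for such windowed gait signals might be defined as follows (the study used Python with Keras and TensorFlow, per Table 1; the window length, channel count, layer sizes, and class count below are illustrative assumptions, not the authors’ exact architecture):

```python
# Minimal 1D CNN sketch for windowed gait signals (Keras/TensorFlow).
# Window size, channels, and class count are hypothetical choices.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 13   # e.g., one class per participant for gait identification
WINDOW = 128       # samples per sliding window
CHANNELS = 3       # one channel per sock sensor (triple-sensor sock)

model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, CHANNELS)),
    layers.Conv1D(32, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.2, epochs=30)
```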
Zhongda Sun et al. in [66] present a DT-based virtual shopping system utilizing an AIoT-enabled soft robotic manipulator, which integrates L-TENGs as robotic fingers, along with T-TENGs and polyvinylidene fluoride (PVDF) sensors for enhanced object recognition and user interactions. This system was designed for use in retail and industrial settings, where through IoT, AI, and VR it can assist virtual shopping through item identification and handling. It recognizes objects through the sensors, which extract features later processed by AI algorithms such as an SVM and a three-layer 1D-CNN. To evaluate the system, a dataset of objects was created and testing with real simulations was conducted, demonstrating an impressive object recognition accuracy of 97.14%, with the main limitation being the recognition of similarly shaped objects. Later, with the use of the PVDF sensor, a temperature feature could also be extracted to aid the classification process after the size and shape characteristics had been obtained. To demonstrate the system’s capabilities, a DT virtual shop was proposed where users could shop in the virtual world while a robotic manipulator on a moving cart mimicked their actions in the real shop, recognizing and collecting items. The DT updates the item’s size, shape, and temperature, enabling users to confirm the item’s accuracy and condition.
Another related work is [67], where the authors proposed a novel framework in the healthcare application field, named PP-SPA, for privacy-preserved Human Activity Recognition (HAR) and real-time support for cognitively impaired individuals using smartphone-based virtual personal assistants. Through a custom smartphone application developed for Android, which continuously collects data from the smartphone’s sensors, they were able to track the user’s daily life activities and assist them if needed. The app also processes the collected data remotely using web services. A digital diary feature is integrated into the app, which contains a dictionary of activities. When the app detects a user’s location and activity, it provides auditory prompts to guide the user in executing tasks. Various machine learning algorithms were employed, and an accuracy of 90% in activity recognition was achieved. Specifically, the Hoeffding Tree algorithm and Logistic Regression outperformed Random Forest and Naive Bayes by significant margins in all performance metrics used. The PP-SPA framework effectively improves the routine life of cognitively impaired individuals by providing personalized health assessments and recommendations. The challenges identified in this work include the need to ensure user privacy and the need for accurate real-time activity recognition in a home environment. The study could benefit from a larger and more diverse dataset, as well as from testing in a real-world multi-resident home.
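The paper does not disclose its implementation details; purely to illustrate the streaming-classification setup that a Hoeffding tree enables, the following sketch uses the river library’s incremental implementation on hypothetical accelerometer features:

```python
# Streaming activity recognition sketch with an incremental Hoeffding tree
# (river library). Feature names and the data stream are hypothetical.
from river import metrics, tree

model = tree.HoeffdingTreeClassifier()
accuracy = metrics.Accuracy()

# Stand-in for a live smartphone sensor stream: (features, activity) pairs.
stream = [
    ({"acc_x": 0.1, "acc_y": 9.8, "acc_z": 0.2, "gyro": 0.01}, "sitting"),
    ({"acc_x": 2.4, "acc_y": 8.1, "acc_z": 1.7, "gyro": 0.90}, "walking"),
    ({"acc_x": 0.2, "acc_y": 9.7, "acc_z": 0.1, "gyro": 0.02}, "sitting"),
]

for features, activity in stream:
    prediction = model.predict_one(features)   # test-then-train evaluation
    if prediction is not None:
        accuracy.update(activity, prediction)
    model.learn_one(features, activity)        # incremental update

print(f"prequential accuracy: {accuracy.get():.2f}")
```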
Continuing in the realm of healthcare, Jun Zhang and Yonghang Tai in their 2022 research [68] developed a smart human-centric medical digital twin (MDT) focusing on lung biopsy that effectively combines AI, VR, and IoT. The findings underscore the potential of these technologies to revolutionize clinical processes and enhance medical training. The MDT functions as a virtual patient replica, supported by a dataset created from 3 real hospitals and a customized CNN for anomaly detection and treatment guidance, though the study does not detail its accuracy or compare results with human diagnoses. The study also involved the creation of a VR-based surgery simulator, VatsSim-XR, which integrates VR, AR (augmented reality), and MR (mixed reality) modes for clinical training in lung biopsy procedures. The simulator comprises a laptop, Oculus VR headsets, two positioners, and two force feedback devices to add haptic feedback and improve virtual operation realism. The experiments carried out demonstrated that the MDT significantly improved novice doctors’ surgical skills toward expert levels. Although fully virtual training offered a wider field of vision, expert doctors did not perform significantly better than novice ones in some cases. This suggests that while VR and AR have their advantages, they may not always yield better outcomes compared to MR. To address the risks of sharing private medical data via the MDT, the authors also proposed a CodeBERT-based NN to detect and mitigate cybersecurity vulnerabilities. Their approach, which employs unique code processing and knowledge representation methods, showed a precision of 70%; however, the current accuracy requires refinement to meet the standards needed for practical deployment. Other key limitations include the study’s reliance on data from only three hospitals in specific Chinese regions, limiting generalizability, and its narrow focus on lung biopsy procedures, restricting broader applications. The simulation environment lacks fidelity and could be enhanced. The study also omits evaluations of healthcare professionals’ user experience, overlooking crucial metrics such as user satisfaction, learning curves, and long-term skill retention. Additionally, practical aspects such as system scalability, integration with existing hospital infrastructure, and cost implications remain unaddressed. Including these factors would offer a more comprehensive evaluation and support real-world implementations.
Advancing Industry 4.0, the authors of [69] proposed a Human Collaborative Intelligence empowered Digital Twin framework (HCLINT-DT), integrating XR, AI, and IoT to advance collaborative intelligence in various fields. The framework uses natural language processing (NLP) technologies, such as speech-to-text and named entity recognition, to enable users to create and share annotations (textual, visual, vocal) that link real-world objects annotated via AR with their virtual counterparts within DTs. This facilitates knowledge preservation and collaborative learning. Real-world testing was conducted through a use case involving family photo albums, where participants interacted with both AR and VR interfaces to annotate and explore memories. DL models such as YOLOv5 were used for object detection, identifying photos within the user’s view and retrieving associated annotations. An online survey showed satisfaction with automatic photo identification but mixed preferences for AR experiences over traditional albums, which suggests areas for improvement. The framework’s use in industrial settings is exemplified through testing in an electrical systems production facility, where workers assembling switchboards used AR for real-time component recognition and annotation. This eliminated the need to move between physical switchboards and computers, improving efficiency and collaboration by displaying relevant information directly in the user’s view. Among the limitations that could be pointed out is the reliance on user engagement for annotations, where AI could supplement knowledge in well-covered areas. Implementing advanced technologies such as XR and crowd intelligence may introduce technical challenges, while the scalability of this model remains in question, as an ever-growing number of annotations and data points might hinder real-time performance.
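The object detection component relies on YOLOv5; a minimal detection loop of that kind can be sketched with the publicly available ultralytics/yolov5 hub model (the model variant, confidence threshold, and image file below are illustrative assumptions, and the annotation lookup is only hinted at in a comment):

```python
# Minimal YOLOv5 detection sketch via torch.hub (ultralytics/yolov5).
# The image path, model size, and threshold are illustrative choices.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4  # confidence threshold

results = model("family_album_page.jpg")   # hypothetical input frame
detections = results.pandas().xyxy[0]      # one row per detected object

for _, det in detections.iterrows():
    # A real system would match each detected region against stored
    # annotations and surface them in the AR view.
    print(det["name"], round(det["confidence"], 2),
          (det["xmin"], det["ymin"], det["xmax"], det["ymax"]))
```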
Recent research by Fernandes et al. [70] extends previous work on DTs in Industry 4.0 by implementing and refining existing technology to create a DT system specifically for electric utilities, integrating AI, IoT, and VR into a four-layer model. The foundation layer comprises a global 3D modeling repository, legacy system integration, data universalization, and cybersecurity. The IoT-based Dynamic Devices and Sensors Layer enables real-time asset monitoring, while the AI layer transitions from corrective to predictive maintenance by automating tasks such as anomaly detection. The Human Interface layer uses AR and VR for remote assistance and training, supported by a centralized “Universal Virtual Data Lake” (UVDL) for data storage and analysis. Preservation of legacy systems is an advantage of this work as it is something that is usually overlooked. A real-world application was showcased using a Network Digital Twin (NDT) to create a 3D replica of a city’s electrical grid, improving asset management and service quality. The system combines image processing, IoT-based real-time data collection, and point cloud technology for precise spatial data capture, allowing users to visualize and interact with electrical infrastructures. AI is used for predictive maintenance and grid management, while AR and VR improve training and field operations through the UVDL. A cost-benefit analysis revealed a 90% reduction in inspection hours, a 65% decrease in on-field time. Challenges faced included the integration of various technologies and the need for a cultural shift within the workforce to adapt to new digital tools. The authors skillfully managed to combine existing knowledge to prove the real-world applicability of such technologies, moving beyond the conceptual stage and into practical, real-life implementation—a stage at which many research works often remain.
The use of VR to enrich the educational experience is not new. The synergistic capabilities of AI, IoT, and VR combined, on the other hand, have not yet been fully explored with regard to enhancing the learning experience.
The authors of [71] created a system employing AI and XR which aims to address unethical behaviors in traditional education in Bangladesh and enhance self-learning. The system integrates AR and VR capabilities through a mobile app and web portal, with AR enabling interaction with 3D models from textbooks and VR offering immersive learning, while AI personalizes study guidance based on quiz performance. The concept is built upon simple tools, such as Google Cardboard and smartphones, and simple AI algorithms, such as decision trees, and the app claims compatibility with low-spec devices. The system was tested by comparing the performance of one group of students using traditional resources with another using the proposed system, and it was shown that students with access to the system performed significantly better. Students reported finding the AR, VR, and AI study guide features beneficial. Even though the outcome sounds promising, the study has several critical limitations. First of all, it attempts to eliminate the teachers’ role and does not consider ways to complement traditional educational methods or accommodate diverse learning styles. The AI capabilities are underwhelming, with very limited features; using DL and NNs would enhance predictive accuracy and adaptability and would enable features such as interactive learning through conversation, retrieving recent information on subjects, specific searches, tailored quizzes, etc. Additionally, no cost-effectiveness analysis is provided, and ensuring access to affordable technology for all students is not thoroughly explored, as creating an app for low-spec devices alone is not deemed sufficient. The testing user base is also extremely limited, at only 20 students, while no feedback mechanism was implemented in the app. Finally, content creation appears to be limited to the developers and not expandable by users; implementing AI or a collaborative framework that allows contributions from educators could enhance this aspect. The solution does not seem easily scalable, and the authors do not discuss this topic further.
The 2023 research by Jian Wang et al. addressed several of the aforementioned limitations while combining AI, IoT, and VR in a single framework to modernize physical education (PE) in higher education [72]. The system immerses students in a VR environment for simulated physical activities, with motion data captured via Kinect devices and analyzed using Support Vector Machines (SVM) optimized by Particle Swarm Optimization (PSO) for accurate, real-time feedback. An action information module provides detailed motion data and animations, enabling students to compare their performance with template actions. The system features a five-layer architecture, including hardware (Kinect sensors, VR-ready computers), data storage, application services, business logic (SVM and PSO algorithms), and a presentation layer offering action evaluation, learning libraries, and intuitive interfaces. Experimental results highlight the system’s effectiveness in enhancing PE instruction. Simulations conducted by the authors show significant improvements in performance metrics post-optimization. The system also handled increasing amounts of data effectively, making it better suited for real-world use than traditional methods. The results confirmed that combining VR, SVM, and PSO enhanced PE evaluation by providing individualized feedback while maintaining computational efficiency. The authors also provide a detailed description of the tools, methods, setup, and findings, with all the data available in the paper. This novel method of teaching not only increases student engagement and performance in PE, but also makes learning more pleasant while taking into account traditional techniques and already existing frameworks. It serves as an interesting example of how AI, VR, and IoT cooperate harmoniously.
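To illustrate what PSO-optimized SVM hyperparameter tuning of this general kind looks like (the dataset, swarm size, and PSO constants below are hypothetical, not the authors’ values), a compact sketch over C and gamma could read:

```python
# Illustrative PSO search over SVM hyperparameters (C, gamma), in the
# spirit of the SVM+PSO pipeline described above; all constants are
# hypothetical and the dataset is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def fitness(position):
    """Cross-validated accuracy of an SVM with log-scaled C and gamma."""
    C, gamma = 10.0 ** position
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

# Particles search log10(C) in [-2, 3] and log10(gamma) in [-4, 1].
n_particles, n_iters = 10, 15
pos = rng.uniform([-2, -4], [3, 1], size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()]

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    # Standard velocity update: inertia + cognitive + social terms.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, [-2, -4], [3, 1])
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()]

C, gamma = 10.0 ** gbest
print(f"best C={C:.3g}, gamma={gamma:.3g}, cv accuracy={pbest_fit.max():.3f}")
```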
The synergistic combination of VR, AI, and IoT extends to the arts and cultural domain, as demonstrated by Sernani et al.’s Vocal Museum system, which enhances museum visits through personalized voice-guided experiences [73]. The system integrates IoT and AI to provide personalized, context-aware experiences without overwhelming visitors with other digital distractions. It consists of three main components: a localization system based on Ultra-WideBand (UWB) antennas for position tracking, a mobile app for user interaction, and a server managing content delivery. Visitors pair their given UWB tags with the app and interact via voice or text chat, receiving context-aware information about nearby artworks. When a visitor enters an artwork zone, the system proactively engages them with welcome messages and encourages interaction through conversational AI. Inside the museum, multiple beacons track the UWB tags’ locations and trigger certain interactions based on them. The system was tested in a controlled lab environment with three famous artworks, demonstrating technical feasibility, but several limitations and open challenges were acknowledged. The system could be improved with multilingual support and camera-based artwork recognition. Additionally, it lacks evidence regarding its scalability, and there is an absence of testing of the voice recognition in noisy museum environments or with non-technical users. The system’s usability also requires validation through established metrics such as the System Usability Scale (SUS). A potential advancement could be the integration of AR or even VR for features like visualizations, gesture-based interactions, remote visits, and virtual artwork restoration. On the other hand, the authors stated that even though XR has many advantages, it can take the visitors’ focus off the real artworks, which did not align with their intentions. Despite these limitations, the Vocal Museum system represents a promising approach to enhancing museum experiences through voice interaction while maintaining focus on the physical artworks.
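The zone-triggering logic at the heart of such a system can be sketched in a few lines; the zone coordinates, trigger radius, and artwork names below are hypothetical, and a production system would hand the trigger off to the conversational AI rather than print a message:

```python
# Sketch of UWB-based proximity triggering of artwork content; the zone
# coordinates, radius, and artwork names are hypothetical.
import math

ARTWORK_ZONES = {
    "Artwork A": (2.0, 3.5),
    "Artwork B": (8.0, 1.0),
    "Artwork C": (5.5, 7.2),
}
TRIGGER_RADIUS = 1.5  # meters

def zone_for(tag_position):
    """Return the artwork whose zone contains the visitor's UWB tag, if any."""
    x, y = tag_position
    for artwork, (zx, zy) in ARTWORK_ZONES.items():
        if math.hypot(x - zx, y - zy) <= TRIGGER_RADIUS:
            return artwork
    return None

current = None
for tag_position in [(0.0, 0.0), (2.3, 3.1), (2.4, 3.0), (8.2, 1.3)]:
    artwork = zone_for(tag_position)
    if artwork and artwork != current:
        # Proactive engagement on zone entry (hand-off to conversational AI).
        print(f"Welcome! You are near '{artwork}'. Ask me anything about it.")
    current = artwork
```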
The Sensor2Scene framework presented in [74] highlights the potential of LLMs to interpret IoT sensor data and integrate them into AR environments for immersive, context-aware visualizations, with suggested applications in creative practices. The AI agent comprises Sensor Data Interpreter and Scene Producer modules, transforming IoT sensor readings and spatial information into structured metadata, which is used to generate 3D visual elements via advanced text-to-3D models that align with real-world settings. Features like a user feedback loop and environmental scanning enhance adaptability by refining scene descriptions based on real-time sensor updates. The authors benchmarked Sensor2Scene using two public datasets and one self-collected dataset, assessing the system’s performance in scene description and 3D model generation quality. Evaluation revealed strengths in scene fidelity and utilization, but challenges in integration and coherence. A user feedback mechanism was introduced to improve these aspects iteratively. In terms of 3D model generation, the system demonstrated the ability to produce contextually relevant AR scenes, though issues such as less detailed textures and occasional object collisions were reported. Additionally, heavy reliance on pre-defined prompts for the LLMs may limit the system’s flexibility. User feedback praised the dynamic adaptation of virtual elements to real-time sensor updates but emphasized the need for more refined interactions and texture quality. Proposed future research directions include the use of text-to-video technologies and 3D Gaussian splatting.
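In the spirit of that interpreter/producer split, a toy sketch of turning raw readings into a text-to-3D prompt might look as follows; the field names, thresholds, and prompt wording are entirely invented for illustration and do not reproduce Sensor2Scene’s actual prompts:

```python
# Hypothetical sketch of turning IoT readings into a scene-description
# prompt for a text-to-3D model; all field names and wording are invented.
readings = {"co2_ppm": 1150, "temperature_c": 29.4, "location": "desk corner"}

def interpret(readings: dict) -> dict:
    """Sensor Data Interpreter: raw values -> structured scene metadata."""
    return {
        "anchor": readings["location"],
        "air_quality": "stuffy" if readings["co2_ppm"] > 1000 else "fresh",
        "warmth": "warm" if readings["temperature_c"] > 26 else "cool",
    }

def produce_prompt(meta: dict) -> str:
    """Scene Producer: metadata -> text prompt for a text-to-3D generator."""
    return (f"A small translucent haze cloud hovering over the {meta['anchor']}, "
            f"tinted grey to suggest {meta['air_quality']} air in a "
            f"{meta['warmth']} room, low-poly, AR-ready")

print(produce_prompt(interpret(readings)))
```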
There are also several smart city systems that enhance quality of life in urban environments by taking advantage of the convergence of IoT, AI, and VR. Kuan-Ting Lai et al. in their 2023 research [75] developed an open-source platform to enhance drone fleet management. They distinguish AI drones, capable of running real-time DL algorithms, from AIoT drones, which extend this functionality by connecting to the Internet, enabling object detection and autonomous navigation. The authors emphasize the need for robust Internet connectivity, identifying 4G/5G as optimal for achieving sufficient transmission distance and bandwidth. However, existing commercial solutions often rely on proprietary protocols, limiting customizability. This gap underscores the necessity for an open-source drone cloud to support advanced algorithmic research and development. To address these challenges, “AI Wings” was developed, providing a versatile, cost-effective system that transforms DIY unmanned aerial vehicles (UAVs) into AIoT drones. It is an open-source platform [76] leveraging Android-based devices as embedded modules to avoid the high costs and complexity associated with alternatives such as the NVIDIA Jetson. The system incorporates an ArduPilot-compatible MCU, a drone cloud server for secure management, and elliptic-curve cryptography for enhanced security. VR is integrated through Microsoft AirSim to test and train AI models in simulations before real-world deployment. Experiments were conducted under different conditions, with the drones operating at low altitudes, and the quantized MobileNet SSD model was tested using TensorFlow Lite on various Android platforms to detect humans. The model achieved real-time performance, with results such as 33 FPS on a Snapdragon 855 and 16 FPS on a Snapdragon 665. Three different types of AIoT drones were constructed and tested to further validate the platform. The drones autonomously navigate, dodge obstacles, and perform tasks directed by the cloud server. Their real-world practicality was demonstrated through an emergency delivery service in which, using AI Wings, automated external defibrillators (AEDs) were quickly delivered to people in emergency situations. This innovative system represents a significant step forward in making AIoT-enabled drones both practical and scalable in smart city environments while also embracing an open approach.
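A minimal TensorFlow Lite inference loop for a quantized SSD detector of the kind the experiments describe could look like the sketch below; the model file name, input size, and output tensor ordering are assumptions (output ordering varies between exported models):

```python
# Minimal TensorFlow Lite inference sketch for a quantized SSD detector;
# model file, input size, and output ordering are assumptions.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenet_ssd_quant.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# One 300x300 RGB frame, uint8 for a quantized model (camera stand-in).
frame = np.zeros((1, 300, 300, 3), dtype=np.uint8)
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()

# Typical SSD postprocessed outputs: boxes, classes, scores, count.
boxes = interpreter.get_tensor(output_details[0]["index"])
scores = interpreter.get_tensor(output_details[2]["index"])
for box, score in zip(boxes[0], scores[0]):
    if score > 0.5:
        print("detection at", box, "score", float(score))
```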
Remaining in the realm of smart city applications and drones, the authors of [77] specifically focus on a smart public safety surveillance (SPSS) framework using UAVs and employing AI, VR, and IoT. The study introduces the concept of a “microverse”, which serves as a task-oriented, edge-scale DT for localized, real-time decision-making and as a practical alternative to fully immersive metaverses. The microverse aims to integrate technologies such as DTs, blockchain, and lightweight network slicing to overcome challenges such as interoperability and real-time data handling in complex IoT environments. The hierarchical architecture of the microverse consists of four interconnected layers: the Physical Layer (IoT devices and sensors), the Slicing Layer (managing communication and Quality of Service (QoS)), the Microverse Layer (creating virtual realms with semantic models and intelligent services), and the Application Layer (user-facing VR and AR experiences). The SPSS microverse prototype was tested through real small-scale tests and simulations. It was developed with Unreal Engine 5 (UE5) and utilized UAVs and edge computing for real-time surveillance, with YOLOv8 used for object detection and tracking. Feedback data from the drones, controlled via a custom Android app, are processed through the microverse engine. The virtual environment allowed for immersive visualization and interaction, while monitoring through a VR headset was also possible. The system effectively mirrored real-world conditions in a digital space by synchronizing physical surveillance drones with virtual environments, enabling tasks such as object detection and trespassing alerts in real time. The low latency and high throughput recorded during the tests show feasibility for a smart city application, but there are still enhancements to be made, and the framework has not yet been implemented in real testing as a whole, given that the slicing layer and certain security measures were not integrated during testing. Additionally, ethical considerations regarding the surveillance drones were not discussed thoroughly, and the accuracy of their detections was not disclosed either.
The authors of [78] introduce AIoTtalk, a SIP-based platform designed to support a variety of heterogeneous AIoT applications. Unlike existing IoT platforms that rely on lightweight protocols such as MQTT, AIoTtalk makes use of the Session Initiation Protocol (SIP) for its robust capabilities in managing real-time multimedia and messaging sessions. This platform addresses critical requirements of modern AIoT applications, such as high-quality data streaming and low latency, and its modular design simplifies IoT application development. Sensors collect data, which are processed and later sent to actuators, while SIP proxies ensure smooth communication. A hybrid blockchain architecture secures data transmission, and AI models, such as a long short-term memory (LSTM) model and a CNN model, are integrated as IoTtalk devices to perform tasks such as prediction and recognition. The platform was implemented in two real-world AIoT applications and tested in both cloud and edge computing environments. The first case concerned a real-time road traffic prediction application, where roadside units (RSUs) collected vehicle speed data and transmitted it to the AIoTtalk platform via SIP messaging. The LSTM model processed the data to predict traffic speeds over the next five minutes and achieved an average error of just 4%. The second case was a neighborhood violence detection application, where smart home systems streamed audio data via SIP sessions to a CNN model to identify suspicious sounds such as gunshots or screams and sent notifications to emergency services when needed. It effectively identified urban sounds with high accuracy but had difficulty with certain sounds, such as children playing. The experiments demonstrated AIoTtalk’s efficiency under various scenarios. In the cloud setup, the AIoTtalk server and SIP proxy were located remotely from the user agents, simulating a geographically distributed deployment; in the edge setup, all components were co-located. Results showed that edge computing reduced latency by approximately 10 ms compared to the cloud setup. However, it should be mentioned that the study lacked details on user experience and did not specify the locations or scale of the experiments.
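A short-horizon LSTM speed predictor in the spirit of the traffic use case can be sketched in Keras as follows; the window length, layer size, and synthetic speed series are illustrative assumptions, not the authors’ configuration:

```python
# Sketch of an LSTM traffic speed predictor (Keras); window length,
# units, horizon, and the toy speed series are hypothetical.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

WINDOW = 12   # e.g., twelve 5-minute average-speed readings (one hour)
HORIZON = 1   # predict the next 5-minute average speed

model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, 1)),
    layers.LSTM(32),
    layers.Dense(HORIZON),
])
model.compile(optimizer="adam", loss="mae")  # error metric akin to the ~4% result

# Toy stand-in for RSU speed data: sliding windows over a speed series.
speeds = 60 + 10 * np.sin(np.linspace(0, 20, 500))
X = np.stack([speeds[i:i + WINDOW] for i in range(len(speeds) - WINDOW)])[..., None]
y = speeds[WINDOW:]
model.fit(X, y, epochs=5, verbose=0)

print("next-interval speed estimate:",
      float(model.predict(X[-1:], verbose=0)[0, 0]))
```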
In [79], a metaverse avatar teleport system is presented, with its applications primarily focusing on entertainment sectors such as video games. The system employs an AIoT pose estimation device to control avatar movements, aiming to address traditional VR interface limitations by implementing a oneM2M-compliant, network-based IoT solution. Its architecture consists of an AIoT device (a Raspberry Pi 4 with a Google Coral USB Accelerator), a webcam for image capture, PoseNet for pose estimation, a Teleport Information Extractor, middleware, and a metaverse interworking module. The system resizes and analyzes webcam input to estimate human pose skeletons, triggering avatar teleportation via the HTTP and MQTT protocols. Testing in the Minecraft video game environment demonstrated the system’s ability to recognize three distinct poses, differentiated by the relative positions of wrists and shoulders, and to successfully adjust coordinates and direction vectors while displaying guidance messages to users. However, the system appears to suffer from significant limitations that hinder its practical applicability, while its functionality is also limited. Firstly, it is constrained by specific lighting conditions, while metrics regarding latency and recognition accuracy are not reported. The implementation is particularly narrow in scope, showcasing only basic teleportation between two points within a Minecraft environment, which does not represent the variety of metaverse applications. Furthermore, while the system adheres to the oneM2M standard, it overlooks other important industry standards that would be crucial for real-world deployment. These limitations, combined with the system’s lack of scalability and highly specific use case, suggest that further development is needed before this approach could be considered for practical metaverse applications.
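Since the poses are distinguished by the relative positions of wrists and shoulders, a rule-based classifier of that style can be sketched directly on keypoints; the exact rules, keypoint format, and pose-to-action mapping below are assumptions (PoseNet-style image coordinates, where y grows downward):

```python
# Hypothetical rule-based pose classifier in the style described above:
# poses distinguished by wrist positions relative to shoulders. Rules,
# keypoint format, and action mapping are assumptions.
def classify_pose(keypoints: dict) -> str:
    lw, rw = keypoints["left_wrist"][1], keypoints["right_wrist"][1]
    ls, rs = keypoints["left_shoulder"][1], keypoints["right_shoulder"][1]
    if lw < ls and rw < rs:
        return "both_hands_up"      # e.g., teleport to point A
    if lw < ls or rw < rs:
        return "one_hand_up"        # e.g., teleport to point B
    return "hands_down"             # no action

example = {
    "left_wrist": (120, 80), "right_wrist": (220, 90),
    "left_shoulder": (130, 150), "right_shoulder": (210, 150),
}
print(classify_pose(example))  # -> both_hands_up
```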
Having provided a thorough review of the related papers, Table 1 summarizes each reviewed study, highlighting its contributions and limitations and listing technical details such as the tools and methodologies used.
As mentioned in Section 3, to increase the inclusivity, objectivity, and comprehensiveness of this systematic review, we have additionally included an overview of related articles selected from the ACM and IEEE databases. These articles are within the scope of our paper and adhere to the inclusion criteria. A representative set of three articles from each database was selected based on their impact and recency.
Starting with the articles selected from the ACM database, the authors of [80] created a safety control system integrating DT technology with IoT and AI to enhance human-robot collaboration (HRC) and safety in industrial settings by calculating the appropriate distance between humans and robots. In the design phase, VR is employed to create test DTs and identify potential safety hazards in HRC scenarios based on feedback from real-world IoT sensors in the physical system, enabling two-way iterative optimization of the physical and virtual systems. In the production phase, CNNs are used to monitor and calculate the safety distance between humans and robots. The layered design includes applications for feedback, robot planning, and real-time DT data exchange. Binocular cameras were used for real-time monitoring, and a Cascade Pyramid Network (comprising two sub-networks, GlobalNet and RefineNet) was used for human key-point detection. The authors also developed a proprietary algorithm to calculate the minimum safety distance. The system achieved 97.25% accuracy in label recognition when tested with a robotic arm in an automobile assembly station, with Unity3D used for the VR simulation. A programmable logic controller (PLC) controlled the robotic arm, and communication between the VR system and the PLC was established through the OPC Unified Architecture (OPC UA) protocol; the UDP protocol was used for real-time data streaming and Web services. Among the system’s limitations are its reliance on pre-defined scenarios and the lack of consideration for detecting environmental obstacles other than humans. Additionally, the dataset of 1000 images used to train the model and the 80–20 split between the training and testing sets limit the model’s generalizability. Further testing with a wider audience and more devices to expand the dataset would help improve the system’s performance in real-life settings. To reduce response time, the authors emphasize the need to integrate advanced technologies such as 5G and cloud computing.
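The authors’ distance algorithm is proprietary, but the core safety check can be sketched generically: given detected human key points and known robot joint positions in a shared coordinate frame, compute the minimum pairwise distance and stop the robot when it falls below a threshold. The coordinates and threshold below are assumptions.

```python
# Generic minimum human-robot distance check (not the paper's algorithm).
import numpy as np

SAFETY_THRESHOLD_M = 0.5  # assumed minimum allowed separation in metres

human_keypoints = np.array([[1.2, 0.4, 1.6], [1.1, 0.2, 1.1]])  # x, y, z
robot_joints = np.array([[0.9, 0.3, 1.4], [0.7, 0.1, 1.0]])

# Pairwise Euclidean distances between every key point and robot joint.
dists = np.linalg.norm(
    human_keypoints[:, None, :] - robot_joints[None, :, :], axis=-1)

if dists.min() < SAFETY_THRESHOLD_M:
    print(f"STOP robot: human at {dists.min():.2f} m")  # e.g., signal the PLC
```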
Table 1. Table of Eligible Studies.
Work | Application Field | Technology Combination | Type of Implementation | Methodology, Models & Tools Used | Open Technology | Advantages | Limitations and Challenges | Key Results
[65] Zixuan Zhang et al. (2020) | Healthcare, Smart home, Entertainment | AI + IoT + VR | Real scenario in gait monitoring as well as other applications | T-TENGs, 1D CNN, Python with Keras and TensorFlow, Unity for VR | Not specified | Low-cost, Self-powered, High accuracy in gait identification (93.54%), Potential for real-time monitoring without privacy concerns, Additional applications in VR games and smart home systems | Limited accuracy with an increasing number of participants, Environmental sensitivity affecting sensor outputs, Worse single sensor results, Need for larger datasets for improved model performance | Introduced a fully complete non-privacy invasive wearable electronic system of smart socks. Achieved 93.54% accuracy in gait identification among 13 participants and 96.67% accuracy in detecting five different human activities useful for healthcare. Demonstrated potential applications in VR fitness games and smart home systems
[66] Zhongda Sun et al. (2021) | Industry, Retail | AI + IoT + VR | Simulation with real items, proposed real life scenario | T-TENG, L-TENG, PVDF, Python with Keras, 1D-CNN, SVM, DBN, Arduino Mega 2560, Oscilloscope | Not specified | Low-cost, High accuracy (97.14%), Works in low light conditions which is important for industrial applications, Immersive VR experience through DTs, Integration of temperature sensing gives new capabilities, Allows for comprehensive understanding of the products handled in the VR world | Need for effective sensor fusion, Difficulty recognizing similarly shaped objects, Recognition accuracy influenced by grasping angles and PVDF sensor’s by contact pressure, Limited dataset potentially affecting generalizability, Possible noticeable time lag between virtual and real world, Future efforts could enhance robustness in diverse settings and applications | Successfully developed a soft robotic manipulator that integrates multiple sensors for enhanced object recognition and user interaction in a virtual shopping environment. The system achieved a high recognition accuracy with potential for real-time applications in unmanned working spaces. It also proposed a shop digital twin and a two-way communication concept
[67] Abdul Rehman Javed et al. (2023) | Healthcare | AI + IoT | Real small scale testing | Smartphone sensors (Accelerometer, Gyroscope, Magnetometer, and GPS), Python for AI, Hoeffding Tree, Logistic regression, Naive Bayes, k-Means | Not specified | Privacy-preserved solution, Real-time human activity recognition and assistance, Acceptable accuracy (90%), Easy access and relatively low cost considering the use of smartphone sensors | No multi-resident home implementation, Need for optimal location clustering to improve accuracy of activity recognition, suggesting existing methods may not adequately differentiate between closely located activities, Collecting a more diverse dataset with varied activities and participants, along with real-world testing, could enhance its performance | Successfully integrated AI and IoT to provide real-time support for the cognitively impaired, Potential for improved health assessments and personalized care plans. The findings underscore the importance of privacy preservation in handling sensitive user data. Good accuracy of over 90% in recognizing daily life activities from sensor data
[68] Jun Zhang and Yonghang Tai (2022) | Healthcare | AI + IoT + VR | Real testing scenario | Unity for the DT, CNN for treatment prediction; For vulnerability detection: CodeBERT, Word2vec, GloVe, FastText; For the training simulator: Laptop, Undisclosed force feedback devices, Oculus VR headsets, Positioners, Cameras | Not specified | Immersive surgical training through real-time feedback and interaction, Proven to improve novice doctors, Implementation of a new vulnerability tolerance scheme that enhances cyber resilience | Limited scope of clinical data (lung biopsy only), System may not fully replicate real-world conditions, Relatively low accuracy of the proposed cybersecurity measure, Insights from healthcare professionals could be improved with better evaluation metrics, Scalability and cost implications not addressed | The proposed medical digital twin significantly improved novice doctors’ surgical skills to near-expert levels, using a custom CNN for treatment suggestions. Mixed Reality (MR) training was highly effective, and CodeBERT enhanced the cyber security of the MDT by improving software vulnerability identification accuracy
[69] Lorenzo Stacchio et al. (2022) | Industry, Healthcare, Smart home and more | AI + IoT + VR | Real testing scenario; observational study regarding the manufacturing setting | Microsoft HoloLens 2 for AR, HTC VIVE for VR, Unity for VR, YOLO V5 for object detection, SIFT (Scale-Invariant Feature Transform) for image matching | Not specified | Real-time collaboration and knowledge sharing through human annotations, Enhancing the learning experience and facilitating better communication in a variety of fields, Interconnects the physical with the virtual world through parallel DTs thus keeping the advantages of both | Need for improved annotation retrieval methods, Reliance on user engagement for annotations, Implementing advanced technologies such as XR and crowd intelligence may introduce technical challenges, Scalability might be an issue due to potential information overload, Testing with a wider audience could be beneficial | A versatile DT annotation system, the HCLINT-DT framework supports knowledge preservation and sharing through annotations. It connects the real world with a virtual digital twin via AR and VR, fostering collaborative learning. An online survey of 30 participants found the AR interface easy to use, though some preferred traditional photo albums over augmented ones
[70] Sabryna V. Fernandes et al. (2022) | Industry | AI + IoT + VR | Real live laboratory scenario | Mobile Mapping Systems (MMS), Drones for aerial imaging, Thermal cameras for asset inspection, Ground Penetrating Radar (GPR) for underground mapping, LiDAR to identify object characteristics and 3D image representation, SQL, Undisclosed AI algorithms/models | Not specified | Attempt to include legacy devices, Significant reduction in working hours for inspections, Decrease in on-field time for various activities, Dynamic and real-time monitoring, Predictive maintenance, VR training without the associated risks of real life tasks, Cost-effectiveness | Costly high-quality equipment, Complexity of integrating various data sources, Expanding data integration even for legacy devices, Could incorporate more sophisticated data analytics tools and elaborate more on the additional info accessible to the users through the DT, as well as develop a constant user feedback mechanism, Address information overload | The digital twin for the electrical distribution system improved operational efficiencies, using high-end tools, IoT sensors, and knowledge from previous studies. It reduced inspection hours by 90% and on-field time by 65%, DTs enhanced asset management and industrial collaboration through VR and AR, and service quality in smart cities
[71] Haymontee Khan et al. (2021) | Education | AI + VR | Real testing scenario | Smartphones, Google Cardboard, Decision Tree algorithm for learning assistance | Not specified | Teaching app designed to work on devices with low specifications, incorporating AI, AR, and VR through low-cost tools | Ensuring accessibility to affordable technology for all students, No cost-effectiveness analysis provided, Does not address long-term effectiveness, scalability, and potential technological barriers faced by students, Extremely limited testing user base (20 students), AI capabilities equally limited without advanced methods such as deep learning and neural networks, No feedback mechanism, Content is very limited and not expandable by users, UI could be improved, Does not consider ways to complement traditional educational methods and support diverse learning styles | Proposed an AR, VR, and AI-enhanced system for education allowing for interactive material and assistance in quizzes. Testing showed that students who used it performed better than those who did not
[72] Jian Wang et al. | Education | AI + IoT + VR | Simulation experiments | Kinect, Undisclosed VR headsets, VR-ready computers, SVM and PSO, MySQL, Microsoft SQL Server 2018 and ASP.NET for web applications, MATLAB 2019 for simulations | Does not provide specific code or files, but the paper is very detailed and provides data openly, making it easier to replicate | Exceptional AI capabilities with cutting-edge algorithms and optimization, Feedback mechanism, Expandable content, Takes into account conventional teaching methods, Provides a whole framework from activity evaluation to curriculum and instructional material building | Yet to be implemented in a real-life higher education classroom, Need for more robust data management, Could benefit from user experience feedback as well as testing of long-term effectiveness, A cost-effectiveness analysis would also be useful | An immersive VR-based information management system was created to enhance college PE by integrating AI algorithms such as SVM and PSO, and providing real-time feedback using Kinect. The optimized model significantly improved performance metrics, showcasing the transformative potential of this technology for physical education
[75] Kuan-Ting Lai et al. | Smart Cities | AI + IoT + VR | Real world testing and simulations | ArduPilot microcontroller, MAVLink protocol, Android device with Snapdragon CPU, Microsoft AirSim, TensorFlow Lite, MobileNet SSD for object detection, Elliptic-curve cryptography, TCP and WebRTC protocols, Drones: Bebop 2, F450 and X800 | Open source on GitHub | Low-cost conversion of standard drones into AIoT drones, Ability to run real-time AI models for tasks such as object detection, Secure cloud server for managing drone fleets, VR simulations for training and testing, Open-source approach | Improve drones’ autonomy and weight, Could add more VR environments and AI models, Could also showcase optimal route planning scenarios | The AI Wings system effectively commands and controls multiple AIoT drones, achieving real-time object detection at over 30 FPS using MobileNet SSD on Snapdragon 855. The experimental medical drone service showcased the system’s capability to deliver AEDs quickly, showing practical applications in emergency scenarios
[77] Qian Qu et al. (2024) | Smart Cities | AI + IoT + VR | Real testing scenario with a proof-of-concept prototype | Unreal Engine 5, YOLOv8 for object detection, Android device, Drones, Meta software, Meta Quest 3, RTSP protocol | Not specified | Real-time data processing and monitoring, Integration of physical and digital environments | No detection accuracy reported, Lack of a comprehensive security analysis of the implemented protocols, No user experience evaluation, Network slicing layer and microchained security networks yet to be implemented in real testing, Ethical considerations could also be discussed | Integration of AI, IoT, and VR in UAVs enabled real-time object detection and situational awareness with low latency. The microverse concept, validated by these findings, is feasible and enhances public safety and urban governance by providing dynamic digital representations of real-world scenarios
[78] Shun-Ren Yang (2023) | Smart Cities | AI + IoT | Real testbed scenarios | SIP, LSTM, CNN, Hybrid blockchain architecture for secure messaging | According to the article, the project was open and could be downloaded through GitHub, but it is no longer available (date checked: 1 January 2025) | Low latency and high quality of experience for messaging and streaming-based AIoT applications through the more advanced SIP protocol, Addresses the limitations of current IoT service platforms which struggle to support diverse AIoT applications requiring multimedia capabilities | More details about the experiments conducted are needed, such as where and in how many places the system was actually implemented, Need for enhancement of the audio recognition accuracy, Does not address user experience | Introduced the low-latency AIoT platform AIoTtalk and showcased road traffic prediction as well as neighborhood violence detection applications
[73] Paolo Sernani (2020) | Art & Culture | AI + IoT | Laboratory simulations | Sewio UWB tags and antennas, Apache Cordova for app development, Google’s speech-to-text services for voice recognition, Redis for in-memory database management | Not specified | Enhances museum visitor engagement by providing personalized, voice-based interactions with artworks, Attempts to avoid distractions from digital displays, Allows for the use of natural language, Understands user’s proximity to exhibitions through tracking tags, Offers an experience that would not be possible with a traditional audio or human guide | Lack of multilingual support, Not tested in noisy environments, No formal evaluation methodology, No user study or quantitative data on system effectiveness and learning engagement, No discussion of scalability and concurrent users, No XR capabilities, No behavior pattern analysis | Managed to provide personalized museum visits through voice and text interactions, demonstrating the potential to enhance user engagement without overwhelming visitors with information. The system can effectively localize users and provide relevant information
[74] Yunqi Guo (2024) | Art & Culture | AI + IoT + VR | Real testing with 2 public datasets and 1 made by the authors | LangChain framework for AI agent construction, WebXR with three.js for AR rendering, GPT-3.5 and GPT-4 LLMs for scene description generation, DreamGaussian, MVDream and Genie for text-to-3D conversion, Blender for 3D rendering, AR goggles with attached camera, Meta Quest 3 | Not specified | Real-time sensor data visualization in AR with low latency, Dynamic scene adaptation, High fidelity in IoT data interpretation, Cross-platform compatibility, Enhanced user interaction while taking into account user feedback, Good cooperation of many tools and technologies | LLMs struggled to map sensor data accurately to real-world entities, Some generative models produced less detailed textures and object collisions, No real-world deployment, Limited evaluation of user feedback, Potential computational overhead for real-time scene generation on lower-end devices, Heavy reliance on pre-defined prompts for LLMs limiting flexibility in unexpected scenarios | Proposed a framework for using IoT sensor and spatial data to generate immersive, contextually relevant AR scenes, achieving high scores in fidelity and utilization. The system is built to transform intangible environmental data into contextual visual representations with low latency and high accuracy that enhance users’ situational awareness and interactions, depicting the environment’s status in an artistic way
[79] Jae-won Lee et al. (2023) | Entertainment, Metaverse | AI + IoT + VR | Limited lab testing but with working proof | Webcam for image capture, PoseNet for pose estimation, Teleport Information Extractor for posture classification, Raspberry Pi 4, Google Coral USB Accelerator, MCPI (Minecraft Pi Edition API), Mobius as the IoT server, MobileNet V1 as the backbone network, MQTT and HTTPS for data transfer | Not specified | Allows for unrestricted content consumption without spatial constraints, Compatible with devices adhering to the oneM2M standard, Network-based connections instead of wired ones, enhancing user mobility and interaction in the metaverse | Very limited functionality and use case, Requires specific lighting conditions, No discussion of latency and recognition accuracy, No comparison to existing solutions, Not scalable, Seems impractical for actual applications, No consideration of industry standards beyond oneM2M, Limited testing, Only showcased in the Minecraft video game environment | A system that estimates user postures using an AIoT device and teleports user avatars in a metaverse environment based on these postures. The experiments demonstrated effective communication between the AIoT device and the metaverse server, allowing real-time repositioning of avatars based on user movements
In [81] a telexistence UAV system was developed for 3D urban reconstruction, improving post-processing time and reducing 3D scan uncertainties. The system combines a UAV equipped with a stereo camera and an NVIDIA Jetson TX2 computing unit for image capture and data processing with an HMD for real-time visualization. The system employed visual-inertial odometry (VIO) for robust camera tracking and combined it with a depth-fusion framework (a technique that merges depth data from multiple frames into a consistent 3D model) to reduce pose drift and improve model accuracy; the depth-fusion framework was further enhanced by incorporating inertial measurement unit (IMU) data. The UAV captures IMU and image data, processes them on the compute unit, and transmits them via Wi-Fi to the ground station, where the geometry is reconstructed and motion is mapped based on the HMD user’s head pose. The data are rendered on the HMD as real-time visual feedback, allowing users to interactively guide the drone to fill in missing areas. Simulation experiments took place in a 3D outdoor environment where a virtual drone performed 3D reconstruction tasks; participants were asked to reconstruct virtual objects with and without visual feedback. Results showed that the VR-aided approach significantly improved model completeness and accuracy, while demonstrating that the VIO-based tracking outperformed traditional methods. To further showcase the feasibility of the system, the authors created a real-world UAV prototype forming a unified system with the HMD used in the simulation phase; in this system, all three technologies contribute to significantly improving 3D reconstruction. Challenges identified in the study included the need for precise UAV control to ensure effective scanning, as well as latency issues in data transmission (approximately 500 ms) stemming primarily from the communication between the ground station and the UAV and the response time of the mechanical system. The authors highlighted that while their system achieved real-time scanning, the computational load on the ground station limited the UAV’s autonomy, suggesting potential areas for future improvement and a transition to AIoT, with the drone carrying out the entirety of the computational tasks.
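The depth-fusion idea can be illustrated with a toy, TSDF-style update in which each voxel keeps a running weighted average of the signed-distance observations from successive frames, suppressing per-frame noise; this is a generic sketch of the technique, not the paper’s exact framework.

```python
# Toy TSDF-style fusion: a running weighted average per voxel.
import numpy as np

tsdf = np.zeros((64, 64, 64))       # fused signed distances
weights = np.zeros_like(tsdf)       # accumulated observation weights

def integrate(frame_sdf, frame_weight=1.0):
    """Fuse one frame's signed-distance observations into the grid."""
    new_w = weights + frame_weight
    tsdf[:] = (tsdf * weights + frame_sdf * frame_weight) / new_w
    weights[:] = new_w

# Simulate fusing ten noisy depth frames of the same scene.
for _ in range(10):
    integrate(np.random.normal(loc=0.2, scale=0.05, size=tsdf.shape))
```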
Wu et al. in [82] presented an Internet-of-Vehicles (IoV) traffic condition system developed using AIoT to improve road safety and traffic management through real-time multimedia data transmission. Their system utilized Faster Region-based Convolutional Neural Networks (Faster R-CNN) to detect objects such as road obstacles or construction works by analyzing video feeds from vehicles’ dashboard cameras. In addition, the model analyzes GPS information from the cameras to determine vehicle speed and the level of traffic congestion. The system operates over a 6G network, which facilitates high-speed data transmission and QoS, while communication can take place between vehicles’ cameras as well as RSUs. When new information about road conditions is detected, for example a crash, the system uses push notifications to instantly notify social groups and inform the proper authorities. A proof-of-concept prototype using hardware such as a Raspberry Pi, GPS modules, and cameras was showcased and tested for the detection and notification process. One of the most important aspects of this work is the incorporation of federated learning, which allows dashboard cameras to update their recognition models locally without sharing raw data, thus ensuring better privacy and security. In addition, the system uses bilinear pairing techniques to establish secure communication: each vehicle’s 6G SIM is encrypted with a public key from the base station, and a session key is generated for data transmission, so only authorized entities can decrypt the messages, making communications more secure against attacks. The system also uses anonymous IDs that change with each message, so attackers cannot track or forge identities. The study shows that AIoT can be adopted in IoV applications, especially in smart cities, without the need for extensive infrastructure changes. However, the system relies heavily on the widespread adoption of 6G networking, which is still in the development stage and not globally available. Interoperability issues and standardization concerns also arise. The accuracy of the model in large-scale implementations could vary depending on several factors and real-life road conditions, which may not have been represented adequately in the tested dataset. Moreover, the cost of integrating such a system in a real-world scenario is not discussed, and the legal and ethical considerations of using video feeds for model training also need further discussion.
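The federated learning step can be sketched with the standard federated-averaging (FedAvg) rule: each camera trains locally, and only model weights, weighted by local dataset size, are averaged centrally. This generic sketch illustrates the technique and is not the authors’ code.

```python
# Minimal FedAvg sketch: average per-client weights by dataset size.
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of per-client model weight lists (FedAvg)."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]

# Three vehicles return locally trained weights for a toy two-layer model.
clients = [[np.random.randn(4, 2), np.random.randn(2)] for _ in range(3)]
sizes = [120, 80, 200]                               # local dataset sizes
global_weights = federated_average(clients, sizes)   # new shared model
```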
Continuing with the set of articles retrieved from IEEE Xplore, the authors of [83] focused on the application of AI, VR, and IoT to combat gender-based violence. The authors proposed Bindi, an end-to-end autonomous multimodal system designed to automatically detect violent situations by identifying fear-related emotions. The system follows a layered architecture consisting of edge, fog, and cloud layers. The edge layer includes wearable devices (specifically a bracelet and a pendant) that collect physiological and auditory data, with the bracelet running a lightweight KNN algorithm for real-time fear detection. The fog layer is implemented on a smartphone and performs the auditory analysis using an NN, as well as multimodal data fusion to combine the physiological and auditory inputs; if a risky situation is detected, an alarm is triggered. The cloud layer, built with MongoDB and NodeJS, stores encrypted data for long-term monitoring and evidence collection. This layered architecture ensures real-time processing, scalability, and secure handling of sensitive data. The system was tested using the open WEMAC dataset introduced by the authors, which includes physiological and auditory data from 47 women exposed to fear-inducing VR stimuli. The study evaluated three multimodal data fusion strategies, achieving an average fear recognition accuracy of 63.61%. The main limitations of this work concern the small dataset population and the fact that the physiological and speech data were not collected simultaneously, potentially reducing the accuracy of the multimodal fusion. The accuracy of the model is low, and balancing true and false positives remains a challenge. Computational constraints on the edge devices are another limitation, restricting the use of more advanced algorithms such as NNs and necessitating the adoption of lightweight alternatives like KNN.
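The edge-layer idea, a classifier light enough to run on a bracelet, can be sketched with a tiny KNN over a few physiological features; the features, values, and labels below are illustrative assumptions, not the WEMAC data.

```python
# Illustrative lightweight fear/no-fear KNN for an edge wearable.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy features: [heart rate (bpm), skin conductance (uS), skin temp (C)]
X_train = np.array([[72, 2.1, 33.5], [75, 2.3, 33.2],    # calm samples
                    [110, 6.8, 31.9], [118, 7.4, 31.5]])  # fear samples
y_train = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

if knn.predict([[115, 7.0, 31.8]])[0] == 1:
    print("Fear detected: escalate to fog layer for multimodal fusion")
```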
The authors in [84] also proposed a novel intelligent wearable system for motion and emotion recognition that addresses limitations of similar systems, such as single-function wearables and the lack of 3D DTs. Their system effectively combines VR, AI, and IoT for health monitoring and is composed of the hardware and software systems along with a 3D display platform. A helmet was designed that integrates a ten-axis accelerometer, a gyroscope, a camera, a UWB chip for position tracking, a speaker with a recording function, and a TGAM module for electroencephalogram (EEG) signal acquisition; a Raspberry Pi serves as the processing terminal. The software system utilizes a DL network developed by the authors, the Three-Branch Spatial-Temporal Feature Extraction Network (TB-SFENet), for motion and emotion recognition. The TB-SFENet combines a bidirectional LSTM branch for temporal feature extraction, a transformer branch for global feature extraction, and a CNN branch for spatial feature extraction. The YOLOX algorithm was used for object recognition from the camera’s video feed. For emotion recognition, the authors created the TGAM Electroencephalogram Emotion Classification (TEEC) dataset and additionally used two well-known datasets for motion recognition, with the system achieving 82.50%, 97.04%, and 92.68% accuracy on each dataset, respectively. The authors ran other models, such as MobileNet, on the same datasets and demonstrated that TB-SFENet generally outperformed them. The aforementioned 3D platform was built using UE5, and its purpose is to visualize the user’s actions, emotions, and location in real time, enabling interaction between the physical user and their DT. Feedback, such as voice alerts for dangerous situations (e.g., falls or entering hazardous areas), is provided to the user after detection by the AI models. The system was further tested in real scenarios, with motion recognition accurately synchronizing actions like standing, walking, and falling in the 3D DT platform and triggering voice alerts. Emotion recognition classified positive, negative, and calm states based on video-induced emotions, displaying them in the virtual environment. The system also demonstrated multi-person tracking capabilities, allowing simultaneous monitoring of multiple users. This work showcases a holistic approach to integrating VR, AI, and IoT in health monitoring scenarios, and the authors demonstrated the maturity of the system through thorough testing. However, some limitations and challenges remain, such as the bulkiness of the helmet and sensors, as well as synchronization issues and interference affecting the UWB chips.
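The three-branch design can be approximated in Keras as follows; this is a structural sketch in the spirit of TB-SFENet, with all layer sizes, input shapes, and class counts being our own assumptions.

```python
# Structural sketch of a three-branch temporal/global/spatial network.
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(100, 10))   # assumed: 100 time steps, 10 channels

# Temporal branch: bidirectional LSTM.
lstm_branch = layers.Bidirectional(layers.LSTM(32))(inp)

# Global branch: self-attention (transformer-style).
attn = layers.MultiHeadAttention(num_heads=2, key_dim=16)(inp, inp)
trans_branch = layers.GlobalAveragePooling1D()(attn)

# Spatial branch: 1D convolution.
cnn = layers.Conv1D(32, kernel_size=5, activation="relu")(inp)
cnn_branch = layers.GlobalMaxPooling1D()(cnn)

merged = layers.Concatenate()([lstm_branch, trans_branch, cnn_branch])
out = layers.Dense(6, activation="softmax")(merged)  # e.g., 6 motion classes

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```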
Finally, in [85] Joseph et al. introduce a multifunctional mobile application integrating AR, AI, and IoT to enhance home automation and educational experiences. The app uses YOLOv8 for object detection, NLP, and OCR to identify household appliances through the smartphone’s camera feed. AR functionality, built with Unity and the Vuforia Engine, overlays controls on appliances (such as power buttons and temperature adjustments). The user can interact with these buttons, and the app then communicates with the integrated ESP32 microcontroller, either via Wi-Fi or Bluetooth, which acts as an IoT gateway. Sensors installed on home appliances retrieve relevant data, and the app analyzes these data over time to provide insights into electricity usage, helping users track bills and identify ways to save energy and reduce costs. Additionally, 3D generative models, 360-degree videos, and 2D images can be retrieved, enabling interactive and immersive learning experiences; the user can interact with the AR objects through hand gestures captured by the smartphone’s front-facing camera. The authors also mention that the app can serve as a medical reference tool for practitioners, allowing them to overlay 3D models of organs for educational purposes. A working mechanism is detailed, and concept flow and IoT architecture diagrams illustrate the system’s functionality. However, the authors do not provide extensive details about the model’s accuracy, nor do they include information about experiments or testing. While the paper outlines the app’s potential benefits in various fields, further clarification on the system’s performance, such as reporting detailed model accuracy metrics, conducting user satisfaction research, and testing on a larger scale, would increase its impact. The broad scope of applications it attempts to cover, combined with the lack of detail about the breadth and depth of the available content and the absence of an open approach, raises questions about the app’s true potential. The authors also acknowledge the limited educational content and suggest collaboration with educational institutions; this gap could also be addressed to some extent with generative and conversational AI, which was not employed in the system. The paper does not cover interoperability issues with the wide variety of IoT devices and protocols, limiting the app’s applicability in varied setups. Moreover, its reliance on smartphones, while cost-effective, impacts performance on lower-end devices. The lack of incorporated security measures also leaves vulnerabilities, such as unauthorized IoT access, unaddressed.
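The app-to-gateway step described above can be illustrated with a short sketch in which a tapped AR control is translated into a command for the ESP32 over Wi-Fi; the endpoint, address, and payload schema are hypothetical, as the paper does not specify them.

```python
# Hypothetical command from the mobile app to an ESP32 IoT gateway.
import json
import urllib.request

command = {"appliance": "air_conditioner",
           "action": "set_temperature", "value": 23}
req = urllib.request.Request(
    "http://192.168.1.50/control",            # assumed ESP32 HTTP endpoint
    data=json.dumps(command).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=2) as resp:
    print(resp.status)                        # 200 if the gateway accepted it
```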
Finally, a concise summary of the papers retrieved from the ACM and IEEE Xplore databases is presented in Table 2.

5. Addressing the Research Questions

This section provides a detailed discussion of the research questions introduced in Section 3, addressing each one in alignment with the specific objectives of this systematic review. By systematically exploring these questions, the aim is to present a comprehensive and clear review that directly answers the research questions set and supports our goals for further research.

5.1. RQ1: What Are the Primary Fields of Application Where VR, AI, and IoT Are Used in Combination?

To answer the first research question regarding the primary fields of application where VR, AI, and IoT intersect, a quantitative analysis of the papers that underwent systematic full-read screening was conducted. The comprehensive examination of the identified scholarly works provides insights into the disciplinary landscape where these technologies converge. The following table presents the identified fields of application as well as their frequency and prevalence in the literature.
The quantitative results, presented in Table 3 as well as Figure 2, reveal that healthcare, industry, and smart cities are the most prevalent fields in the literature, with each accounting for a significant number of articles. Specifically, 14 articles met the inclusion criteria after applying our additional criteria (imposed to ensure quality and balance the number of papers per application field), while a broader set of 34 papers represents those that would have been included without the additional criteria. This broader set demonstrates even greater prevalence across the identified fields and helps characterize the current state of the literature.
Not surprisingly, healthcare is the main focus of applied research on the topic. The convergence of these three technologies benefits the field in ways that were previously impossible, from continuous patient monitoring to immersive surgical training, with the potential to benefit countless lives. In terms of the VR-AI-IoT combination, healthcare is the most mature field and will likely remain so, given its impact and transformative possibilities.
Industry and smart cities also receive considerable attention, with a substantial number of applications ready for real-world use. As the era of Industry 4.0 approaches, an increasing number of systems are emerging that enhance productivity, reduce losses, and improve worker safety. The high stakes of the field, coupled with the vast economic benefits of integrating such systems into real working environments, underscore their importance. Several systems were showcased that address ongoing challenges and will benefit from future improvements. The creation of efficient and sustainable urban environments is just as important, and research is increasing on everything from smart appliances, smart homes, and smart grids up to entire smart cities. Urban planning, infrastructure management, transportation, public safety, and other emergency services are constantly evolving, and the combined use of VR, AI, and IoT creates new capabilities and features for such applications.
On the other hand, apart from these three major domains, full system implementations in other fields were rare and disproportionately few compared to the above, which motivated the additional criteria limiting the number of works from the prevailing fields, as our research focuses on the bigger picture of where the convergence occurs. Through the rigorous systematic review process, only one article was found that created a system to enhance visits to cultural heritage spaces such as museums, while another was more loosely connected to the art application field, leading to the Art & Culture application field, which included only two articles. Only two further articles were found to be eligible in the education sector, which was surprising, as this field was expected to have a significant number of real-world systems given its global importance and the rise of digital learning, AI, and online education platforms. Finally, only a single article was found to be eligible in the metaverse category, and it was more related to the entertainment sector, as it was tested in a video game environment. This field could be considered an umbrella field, since the word metaverse is often used loosely to refer to any interactive virtual world. It was used here because the article specifically mentioned it, and no separate field for entertainment or video games was created, as the article suggested its approach could apply to metaverses generally.
The quantitative analysis highlights several key trends in the application of VR, AI, and IoT across different domains. Healthcare, industry, and smart cities are the subjects of the majority of articles, reflecting the maturity of these sectors in adopting new technologies. By contrast, the remaining identified fields have significantly fewer contributions in terms of complete, tested, and implemented systems using all three technologies, which calls for further research, as there is ample room for improvement and innovation.
A final conclusion that can be drawn from the quantitative research and a look at Table 3 is that, in general, only a total of 34 articles out of 676 retrieved at the start of the screening process were found to be relevant. This indicates the considerable potential for future work in the field of VR, AI and IoT convergence regardless of the application domain. Many areas within this interdisciplinary field remain unexplored or under-researched. As these technologies continue to evolve and integrate, there will be a growing need for innovative solutions and rigorous academic inquiry to address emerging challenges and opportunities.

5.2. RQ2: How Are the Three Core Technologies Studied (VR-AI-IoT) or Their Subsets Currently Being Integrated in Different Application Domains, and What Unique Advantages Does Their Combined Usage Offer Compared to Isolated or Paired Applications?

To answer this research question more clearly, this section is organized into individual subsections based on the specific application fields introduced by the final selection of papers. These subsections highlight key areas where the integration of these technologies is driving advancement and innovation, as derived from the set of eligible papers.
The integration of VR, AI, and the IoT is not merely an amalgamation of distinct technologies but a synergistic approach that attempts to revolutionize various application domains. Across healthcare, industry, education, smart cities, retail, and cultural heritage, this triad unlocks unprecedented possibilities, leveraging the individual strengths of each technology to deliver holistic solutions that surpass the capabilities of isolated implementations.

5.2.1. Healthcare

In healthcare, the combined use of VR, AI, and IoT addresses critical challenges in patient care, diagnostics, and especially the training of medical professionals. IoT devices enable real-time monitoring of patient vitals (e.g., [65]) and environmental conditions, AI algorithms provide predictive analytics for early diagnosis and treatment planning as well as procedure-related information, and VR creates immersive simulations for medical training and rehabilitation. DT frameworks are widely adopted, integrating IoT sensors that continuously gather patient data, AI models that predict health anomalies, and VR environments that simulate surgical procedures, thereby enhancing both preventive care and medical education, as seen in [68]. This collaboration offers realism, reduces errors, improves efficiency, and facilitates personalized healthcare in ways that would not be possible if any of these key technologies were missing. In addition to DTs, many assistive devices, such as in [66], are being created that utilize sensors, AI, and often VR or AR to improve the daily lives of people with health problems such as cognitive impairment.

5.2.2. Industry 4.0

The industrial sector similarly benefits from the convergence, particularly through the concept of DTs, which enable real-time monitoring of operations, prediction of equipment failures, and remote execution of dangerous tasks. IoT devices collect data from machinery (where inclusion of legacy devices and tools is also possible [70]), AI analyzes these data to proactively predict maintenance needs, and XR (especially VR and AR) provides operators with immersive interfaces for troubleshooting and remote assistance. For example, in manufacturing settings, the integration of XR devices such as HoloLens with IoT sensors and AI algorithms allows workers to visualize machine components, identify potential faults, and receive real-time guidance. This reduces downtime, enhances productivity, and minimizes human error, offering a competitive advantage in Industry 4.0.

5.2.3. Smart Cities

Smart city initiatives significantly enhance urban management, public safety, and citizen engagement, tackling serious problems through the VR-AI-IoT combination. In such environments, IoT networks collect vast amounts of real-time data from urban infrastructure easily and accurately, while AI algorithms analyze them to provide insights and predictions. Fed with these data and predictions, VR can give city planners immersive simulations and assist them in decision-making and crisis management [78]. UAV fleets will play a major role, especially when equipped with IoT sensors and AI algorithms that enable them to monitor traffic, detect emergencies, and provide situational awareness to authorities, while delivering first aid instantly and avoiding terrestrial traffic [75]. VR simulations further prepare first responders for various scenarios, improving response times and minimizing risks.

5.2.4. Education

The education sector has already seen significant improvements with the introduction of digital learning and LLMs, but further gains arise from the synergistic combination of the technologies. Until now, AI has been able to create personalized, tailored learning content, but it lacks the interactive and engaging elements that VR or AR provide, or the realism that IoT sensors can feed into such environments. In this domain, IoT devices capture student interactions, AI personalizes learning experiences by adapting content to individual needs while also assessing performance, and VR offers immersive environments that enhance engagement and retention. Such systems propose a more holistic approach to digital learning and can provide students with a deeper understanding of complex concepts, encourage self-paced learning, and bridge the gap between theoretical knowledge and practical application, without necessarily rendering the role of the human teacher obsolete just yet. Although several applications exist, few of them made use of all the technologies effectively, so major improvements are expected in the future.

5.2.5. Art & Culture

Fewer, but still notable, were the applications in the art & culture field, where implementations were limited to enhancing the interactive experience of museum tours through IoT location tags to track visitor position and AI to personalize content delivery through voice interactions, thereby limiting screen time during visits [73]. Additionally, [74] showcased how, through IoT sensors and spatial data, AI was able to create immersive, contextually relevant AR scenes. Despite these attempts, the field remains under-explored and has yet to develop significant systems integrating the triad of AI-VR-IoT.

5.3. RQ3: What Methodologies, Tools and Architectures Are Commonly Employed in Studies Combining VR, AI, and IoT?

Through this systematic review, several core methodologies, tools, and architectures that facilitate the convergence of VR, AI, and IoT were identified. This section discusses the most common ones across the set of eligible papers and is broken down into individual subsections.

5.3.1. Digital Twins

The first and most common approach across the articles incorporating all the aforementioned technologies is the digital twin. DTs are a prevalent methodology providing realism, since they serve as virtual replicas of physical systems, allowing real-time monitoring and control through IoT devices, predictive analysis through AI models, and immersive visualization through VR interfaces [86]. This approach is commonly employed in domains such as healthcare, manufacturing, and smart cities.

5.3.2. Layered Architectures

A layered system architecture ensures modularity, scalability, and efficient data processing. Typical layers include a hardware layer with IoT devices, a data layer for storage and analysis, a business logic layer with AI models, and a presentation layer using XR interfaces. This structure allows for seamless integration of different technologies and ensures efficient system performance.
A good example was [77], which proposed a hierarchical four-layered system comprising a physical layer with IoT networks and devices, a slicing layer that regulates communications, a microverse layer for DTs, and an application layer for user interaction where VR and AR can be implemented. Ref. [72] describes a system with a presentation layer for user interaction, a business layer for data processing and assessment, and an application server for system logic. Ref. [78] uses a layered approach with a blockchain handler for security and a SIP message handler for processing data. The Sensor2Scene framework in [74], while not explicitly layered, follows a structure that can be seen as such: sensor data are processed and then used to create AR experiences, with LLMs and generative models as intermediaries.
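The pattern shared by these systems can be summarized in a schematic sketch; the layer names follow the generic hardware/data/logic/presentation split described above and are not tied to any single reviewed system.

```python
# Schematic data flow through a generic four-layer AIoT/VR architecture.
def hardware_layer():                  # IoT devices produce raw readings
    return {"sensor_id": "temp-01", "value": 23.7}

def data_layer(reading, store):        # storage and basic validation
    store.append(reading)
    return reading

def business_logic_layer(reading):     # AI models turn data into insight
    return {"alert": reading["value"] > 30.0, **reading}

def presentation_layer(insight):       # the XR interface renders the result
    print(f"[VR overlay] {insight['sensor_id']}: {insight['value']} C "
          f"(alert={insight['alert']})")

store = []
presentation_layer(business_logic_layer(data_layer(hardware_layer(), store)))
```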

5.3.3. Machine Learning Models

Various machine learning models are employed to process data collected from IoT devices and enhance decision-making. Common models include convolutional neural networks (as seen in [65,66,68,78]), especially for image recognition. Furthermore, LLMs such as GPT-4 are also employed to interpret sensor data and generate scene descriptions for AR experiences, as seen in [74]. Support vector machines were also used for classification tasks (e.g., [66,72]). Other conventional ML algorithms appeared as well, such as decision trees for predictive analytics in [67], and logistic regression, Naive Bayes, and k-means in [71], but they were significantly less prominent in the bibliography compared to DL algorithms and neural networks, which also tend to achieve much higher accuracies than conventional algorithms, especially in more advanced tasks such as object recognition. These models are crucial in applications such as healthcare diagnostics, industrial maintenance, and educational assessments. Last but not least, another prominent algorithm was YOLO (You Only Look Once), which was used in several papers for real-time object detection and appeared to perform commendably well in various applications.
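As an indication of how accessible such detectors have become, the following hedged example shows a frame-by-frame detection loop with the ultralytics YOLOv8 package; the model file and video source are placeholders rather than artifacts from any reviewed study.

```python
# Example YOLOv8 detection loop (ultralytics package); paths are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                    # small pretrained model
for result in model("camera_feed.mp4", stream=True):  # frame by frame
    for box in result.boxes:
        cls_name = model.names[int(box.cls)]
        print(cls_name, float(box.conf))      # feed detections to AIoT logic
```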

5.3.4. Communication Protocols

In the majority of the source material studied, real-time data exchange between IoT devices, AI systems, and VR interfaces is achieved through communication protocols such as MQTT (Message Queuing Telemetry Transport), WebRTC (Web Real-Time Communication), and SIP (Session Initiation Protocol) alongside HTTPS (Hypertext Transfer Protocol Secure). MQTT is favored for its lightweight messaging capabilities and low bandwidth requirements, making it ideal for IoT applications. WebRTC enables peer-to-peer communication, ensuring low latency and high-quality audio and video streams essential for immersive VR experiences. SIP supports the establishment, modification, and termination of communication sessions, providing a robust framework for integrating multiple systems. These protocols ensure low latency and reliable data transmission, which is critical for time-sensitive applications.
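As a concrete example of the publish/subscribe pattern most of these systems rely on, the following minimal sketch uses the paho-mqtt client (1.x-style API); the broker address and topic are assumptions.

```python
# Minimal MQTT publish/subscribe sketch (paho-mqtt 1.x-style API;
# version 2.x additionally takes a callback-API-version argument).
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # e.g., push the new reading into a VR scene or an AI pipeline.
    print(f"{msg.topic}: {msg.payload.decode()}")

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.local", 1883)              # assumed broker address
client.subscribe("sensors/room1/temperature")     # assumed topic
client.publish("sensors/room1/temperature", "23.7")
client.loop_forever()                             # blocks, dispatching messages
```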

5.3.5. Tools and Platforms

The convergence of VR, AI, and IoT in various domains is supported by a suite of tools and platforms. Unity [87] and Unreal Engine [88] are prominent for VR development, offering robust environments for creating high-fidelity, interactive virtual experiences. These platforms provide extensive libraries and support for various VR hardware, streamlining the development process. Regarding VR headsets, the Quest family appears to be the most common, while for AR most systems used HoloLens glasses.
Python, paired with frameworks such as TensorFlow [89] and Keras [90], is widely used for implementing AI models due to its simplicity and the extensive range of machine learning libraries available. These tools facilitate the development of sophisticated AI algorithms that can analyze and adapt to data in real time.
For IoT prototyping, Arduino and Raspberry Pi are popular choices, providing versatile and cost-effective platforms for building and testing IoT devices. These platforms support a range of sensors and actuators, enabling the integration of physical devices with digital systems. Smartphones are also used as sensor powerhouses, offering sufficient computational power in certain scenarios.

5.4. RQ4: What Are the Current Limitations and Challenges in Combining VR, AI, and IoT, Particularly Regarding Data Collection, Interoperability, and User Experience?

Despite significant advancements, integrating VR, AI, and IoT presents numerous challenges and limitations, which are revealed through the literature. These limitations hinder the seamless operation of integrated systems and require continuous research to address them effectively, providing directions for future work.
Data collection in AIoT systems heavily relies on the accuracy and reliability of IoT sensors. However, environmental factors, such as lighting conditions, temperature, and physical obstructions, can compromise sensor performance, leading to inconsistent or incomplete data. The high cost of some sensors also hinders scalability, as prohibitive costs make them unattractive for large-scale, real-life deployments.
Privacy and security of collected data is a critical issue. AIoT solutions must balance the need for accurate data collection with users’ privacy, which often necessitates secure data handling and compliance with data protection regulations. Privacy concerns are prevalent in domains such as healthcare and smart cities, where sensitive user data are collected in real time. It is of great importance to address these concerns by implementing non-invasive, privacy-preserving IoT solutions. As mentioned above, once data are collected, ML models require these large datasets to perform sufficiently; however, this is tied to an increasing risk of privacy violations. Specifically in metaverse environments and XR systems, privacy threats arise from single-point-of-failure architectures and the extensive use of biometric sensors, eye-tracking data, and behavioral analytics, which can expose personally identifiable information. To mitigate these risks, integrating privacy-enhancing technologies (PETs) such as homomorphic encryption (HE), blockchain technology, differential privacy (DP), and federated learning (FL) is essential and is a recommended way to combat these issues [91,92]. For example, HE allows systems to act on encrypted data without exposing raw information, thus ensuring that sensitive data remain safe. DP introduces noise into the system to improve data anonymization. Finally, hybrid techniques that combine the above with FL can create a safer and more robust decentralized architecture. FL enables edge devices to locally train models using their own datasets, transmitting only aggregated model updates (e.g., weights and biases) to central servers instead of raw data, thus making systems less susceptible to poisoning attacks and unauthorized data access [93]. Secure communication is key in AIoT systems, especially in XR environments where data are continuously exchanged between devices and personally identifiable information is constantly extracted from devices such as HMDs.
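To illustrate one of these PETs, the sketch below applies the classic differential-privacy recipe: Laplace noise scaled by sensitivity/epsilon is added to an aggregate before release. The epsilon, value range, and data are illustrative.

```python
# Toy differentially private aggregate over bounded sensor readings.
import numpy as np

def dp_mean(values, epsilon=1.0, value_range=(50, 120)):
    """Release a mean with Laplace noise calibrated to one record's influence."""
    lo, hi = value_range
    clipped = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(values)   # max change one record can cause
    noise = np.random.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

heart_rates = np.array([72, 88, 95, 110, 67])
print(dp_mean(heart_rates))                 # aggregate that is safer to share
```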
At the same time, many ethical considerations arise, especially in monitoring scenarios such as smart cities. Ensuring data privacy and security is essential to gain user trust and facilitate the widespread adoption of these technologies. Ongoing advancements should take this seriously into account and propose modern cybersecurity measures while maintaining transparency and ensuring that data are collected with the users’ consent. Regarding the training of ML models, the issue of the ethical use of AI also arises, particularly with respect to bias in training datasets. This bias often stems from the data collection process, where incomplete or uneven data for certain groups can lead to their under-representation or misrepresentation. The issue has become more prominent with the increasing use of GenAI for creating synthetic data [94]. To avoid this, it is important to ensure that AI models are trained on diverse and representative (bias-free) datasets that accurately reflect reality. This can be achieved with techniques such as oversampling of under-represented classes, undersampling of over-represented ones, the use of synthetic data, and pre-processing, although this can be time-consuming [94,95,96]. Apart from representation and generative bias, algorithmic bias is one of the most common issues, occurring when AI algorithms are designed or implemented with emphasis on specific features during training, resulting in unrealistic or unfair conclusions. The move toward explainable AI (XAI) will assist in identifying bias sources with techniques such as feature importance analysis, and will promote fairness through transparent decision-making processes. Post-processing of AI-generated content and adversarial training are among the methods that can identify and mitigate bias in fair AI systems [94].
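A minimal sketch of the oversampling remedy mentioned above: minority-class examples are resampled with replacement until both classes contribute equally to training. The data and imbalance ratio are illustrative.

```python
# Toy random oversampling of a 9:1 imbalanced binary dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 90 + [1] * 10)            # 90 majority vs 10 minority

minority = np.where(y == 1)[0]
extra = rng.choice(minority, size=80, replace=True)  # resample minority rows
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
print(np.bincount(y_bal))                    # now 90 vs 90
```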
Along with the process of data collection, there is a significant need for data availability. Setting up real-life experiments and collecting real data is challenging, and large-scale implementations are scarce. Limited datasets are available for training in very specific applications, and privacy concerns, as well as users’ limited trust in and knowledge of new technologies, further hinder the creation of such datasets by smaller research teams. As a result, it is hard to train highly accurate AI models, since they strongly depend on data, and the quality and variation of the collected data also play a major role in accuracy. It is important to highlight that data collection and limited real-life testing are among the most significant limitations. Low-accuracy readings and poor model generalizability were common challenges in some experiments, and they may partly relate to the aforementioned factors. As indicated by the quantitative research and the overview of eligible studies, there is a notable lack of real-life testing for many systems, which should be addressed more thoroughly in future work.
Interoperability is another significant challenge due to the lack of standardized communication protocols and data formats. Integrating new technologies with existing legacy systems is often complex and time-consuming. Developing universal standards and adopting open-source solutions can help address these interoperability issues. Semantics, which were not usually employed in the systems reviewed, can assist in this matter and overcome problems like the heterogeneity of sensor data.
In addition to all of the above, integrating VR, AI, and IoT systems often requires substantial computational resources. The continuous growth of stored data, along with the data considered by LLMs, can lead to unmanageable computational overhead, particularly on lower-end devices. Real-time data processing is even more demanding, especially with multiple sensors, and requires substantial resources. In the IoT world, data are not simply stored but constantly transferred between things, so network traffic is another crucial factor. The choice between edge and cloud computing affects computational overhead: while edge computing can reduce latency by processing data closer to the source, it may strain the resources of edge devices; cloud computing, on the other hand, can handle larger processing loads but may introduce latency due to network delays. These limitations impact the scalability of such systems, particularly in resource-constrained environments that rely on crowd intelligence and user-generated content.
Finally, user experience challenges include balancing immersion with usability in VR environments and addressing latency issues in real-time data processing. Immersive VR experiences can overwhelm users, reducing overall usability, while latency disrupts real-time interactions. Ref. [66] stated that latency issues in virtual shopping systems could negatively impact user engagement, underscoring the need for optimized network infrastructure and data processing algorithms. Moreover, user interfaces are often overlooked, yet they play a crucial role in the user experience; poorly designed interfaces can hinder interaction and navigation within VR environments and push users back toward their real-world counterparts. Additionally, the textures and shading of VR worlds are frequently cited as immersion breakers, detracting from the overall experience. Enhancing these graphical elements is essential to maintaining a seamless and engaging virtual experience. Addressing these user experience challenges requires a holistic approach, involving the optimization of network infrastructure, improvements in UI design, and integration of the advanced graphical rendering techniques available even in commonly used tools such as Unity and Unreal Engine. These efforts are vital in creating VR environments that are not only immersive but also user-friendly and responsive.
The challenges and limitations in integrating VR, AI, and IoT are complex and differ depending on the application domain. Data collection issues, interoperability barriers, user experience concerns, computational demands, ethical considerations, and sector-specific challenges must be addressed to achieve seamless integration. Overcoming these obstacles requires a collaborative effort from researchers, developers, and policymakers to establish standardized protocols, improve sensor technology, enhance user interfaces, and ensure data privacy and security. Future research should focus on developing scalable, user-centric solutions that address these challenges while maximizing the potential benefits of VR, AI, and IoT integration.

5.5. RQ5: Are There Any Emerging Frameworks or Models of the Seamless Combination of the Three Technologies, for Specific Application Domains?

The previous sections have laid the groundwork to address the final research question. As seen in Table 1 and the literature review, there are examples of emerging frameworks in multiple fields which are ready to be implemented in real-life cases or have already undergone such testing. The main application fields are healthcare, accessibility & assistive technologies, industry, retail, education, art & culture, entertainment, the metaverse, mobility & smart transportation, safety & emergency services, sensing & data collection, and smart cities & infrastructure.
However, while there are promising examples, as highlighted by the quantitative research conducted, the number of existing frameworks and applications that effectively use each technology of the synergistic combination is rather limited. This, coupled with the discussed challenges and limitations, emphasizes the need for further research and development to create revolutionary systems that fully harness the potential of these converging technologies.

6. Future Research Directions

Our systematic review reveals significant opportunities for future research in the integration of VR, AI, and IoT technologies. While current implementations demonstrate considerable promise, there are still notable gaps in both the range of applications explored and the maturity of developed systems. Future research efforts should address these gaps while expanding into unexplored domains and tackling the challenges identified in this review.
Research in AIoT, DTs, and XR technologies presents exciting possibilities for transforming various sectors. A key area for future investigation is the development of more advanced AI models capable of analyzing real-time data within AIoT systems, particularly focusing on their ability to learn from larger and continuously expanding datasets. Future systems could also benefit from incorporating crowd intelligence to enhance both data collection and model training. Generative and agentic AI (LLMs) and conversational AI (chatbots) will significantly improve human-computer interaction within DTs and XR environments by enabling fast and easy generation of simulations, more intuitive communication, and adaptive interaction. They will generate role-playing agents for simulated environments and produce natural responses employing various forms of feedback, such as verbal, visual, and haptic cues. Generative AI will assist in XR by creating dynamic, context-aware simulation/virtual environments and objects that adapt to user behavior and real-world conditions captured through IoT sensors. Agentic AI will generate multi-agent systems for role-playing simulation; such agents will autonomously and collaboratively perform complex tasks and alter environments (virtual or real-world ones) without being bound to user input. Instead, these systems will be continuously enhanced based on events and goal-driven decision-making, demonstrating proactive behavior by analyzing sensor data. Furthermore, the introduction of FL into such systems, as well as PETs and decentralized architectures without a single point of failure, has the potential to significantly enhance the security of future systems, bringing them closer to an acceptable threshold; a minimal FL aggregation sketch follows.
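As a concrete illustration of the FL direction, the sketch below implements the core FedAvg aggregation step, in which IoT nodes share only model parameters, weighted by local dataset size, so raw sensor data never leaves the device. The client weights and dataset sizes are invented toy values for demonstration only.

```python
import numpy as np

def fed_avg(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """FedAvg: aggregate client model parameters, weighted by local
    dataset size; only parameters (not data) are communicated."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three IoT nodes train locally and share only their parameters.
clients = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.3, 0.9])]
sizes = [100, 300, 600]
global_model = fed_avg(clients, sizes)
print(global_model)  # weighted average of the local models
```

In a full FL round, the server would broadcast this aggregate back to the clients for the next round of local training; PETs such as secure aggregation or differential privacy can then be layered on top.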
Another crucial research direction lies in improving human-digital twin interaction within XR environments. This includes creating more natural interfaces for DT manipulation, implementing interaction methods such as speech and gesture control, and developing feedback systems that engage multiple human senses while enhancing the realism of virtual environments. The development of dynamic, ever-evolving DTs that automatically update and adapt to real-time changes in their physical counterparts is another interesting research direction (a minimal sketch is given below). Incorporating semantics for data homogenization and standardization will also be an important addition to future systems, making them more interoperable.
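A minimal sketch of such a self-updating DT is given below, assuming a hypothetical publish-subscribe wiring between IoT readings and an XR scene; the asset and field names (e.g., `spindle_temp_c`) are illustrative, not drawn from any reviewed system.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalTwin:
    """A minimal twin that mirrors its physical counterpart and notifies
    subscribers (e.g., an XR scene) whenever its state changes."""
    asset_id: str
    state: dict = field(default_factory=dict)
    subscribers: list = field(default_factory=list)

    def on_sensor_update(self, reading: dict) -> None:
        self.state.update(reading)             # mirror the physical asset
        for notify in self.subscribers:
            notify(self.asset_id, self.state)  # push the change to XR clients

twin = DigitalTwin("press_01")
twin.subscribers.append(lambda aid, s: print(f"XR scene refresh for {aid}: {s}"))
twin.on_sensor_update({"spindle_temp_c": 71.4, "vibration_rms": 0.8})
```

The key design point is the push-based update path: the XR view never polls the physical asset, so the twin stays synchronized at the rate the sensors report.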
Additionally, future work should aim to bridge the gap between theoretical advancements and real-world applications. Many proposed systems and theoretical frameworks currently face challenges in practical implementation. To address this, researchers should conduct long-term studies evaluating the impact of VR-AI-IoT systems on real-world users and communities. Collaboration with industry partners and public institutions will be essential for testing and implementing integrated solutions in real-world settings.
Regarding specific application domains, a promising direction for future research on this synergistic approach is smart transportation and mobility. The potential for VR to simulate traffic patterns and enable teleoperation, AI to optimize route planning and facilitate MaaS (Mobility as a Service), and IoT to provide real-time vehicle data could enhance urban mobility systems and lead to standardized mobility platforms. VR can provide immersive training for the development and testing of autonomous vehicle algorithms, allowing engineers to simulate various traffic scenarios in a controlled, risk-free environment. It can also aid in the design and planning of smart city infrastructure by visualizing traffic flows and optimizing urban mobility systems. AI can dynamically adjust traffic signals and manage traffic flow based on real-time data, reducing congestion and improving fuel efficiency, among other benefits (a simple signal-timing sketch is given below). IoT sensors embedded in vehicles and infrastructure can monitor and share real-time information about road conditions, vehicle performance, and environmental factors, enabling predictive maintenance and, more importantly, helping to prevent road accidents through a connected network of vehicles.
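As one illustration of AI-driven signal control, the sketch below allocates green time proportionally to queue lengths estimated from roadside IoT sensors. The bounds and green-time budget are illustrative assumptions, not values from the reviewed literature.

```python
def green_time(queue_lengths: dict[str, int],
               min_green: float = 10.0, max_green: float = 60.0) -> dict[str, float]:
    """Allocate each approach's green time proportionally to its sensed
    queue, clamped to safe minimum/maximum bounds."""
    total = sum(queue_lengths.values()) or 1      # avoid division by zero
    cycle = max_green * len(queue_lengths) / 2    # assumed total green budget (s)
    return {
        approach: min(max_green, max(min_green, cycle * q / total))
        for approach, q in queue_lengths.items()
    }

# Queues estimated from roadside IoT sensors (vehicles per approach).
print(green_time({"north": 12, "south": 4, "east": 20, "west": 8}))
```

Production systems would replace this proportional rule with predictive or reinforcement-learning controllers, but the same sensing-to-actuation loop applies.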
Similarly, fields such as agriculture present fertile ground for innovation, where IoT sensors can monitor soil conditions, AI can predict crop yields, and VR can offer immersive training for farmers (a toy yield-prediction sketch is given below). This integrated approach can lead to more efficient and sustainable farming practices, improving productivity and reducing environmental impact. More systems are anticipated to emerge in fields such as mental health, gaming and e-sports, as well as art, culture, and tourism, where very limited applications were identified during our research. In mental health, for example, integrating VR, AI, and IoT can offer innovative therapies, real-time monitoring, and immersive simulations that improve patient outcomes. In gaming and e-sports, these technologies can enhance immersive experiences with environments that change based on sensor data, personalize gameplay, enable novel forms of human-computer interaction, and provide advanced analytics for performance improvement.
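By way of illustration, a yield predictor over IoT soil readings can be as simple as a least-squares regression. The feature values and yields in the sketch below are invented purely for demonstration.

```python
import numpy as np

# Toy soil features per plot: [moisture %, nitrogen ppm, mean temp C].
X = np.array([[28, 40, 21], [33, 55, 22], [25, 35, 20], [38, 60, 23]])
y = np.array([3.1, 4.0, 2.8, 4.4])  # yield in tonnes/ha (illustrative values)

# Ordinary least squares with a bias column, solved via numpy's lstsq.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)

new_plot = np.array([1, 30, 50, 21])  # bias term + current sensor readings
print(f"Predicted yield: {new_plot @ theta:.2f} t/ha")
```

In practice such a model would be trained on seasons of sensor data and paired with VR dashboards for farmer training, but the pipeline, sensors to features to prediction, is the same.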
As part of our own future research, we aim to explore the potential of XR and AI in optimizing experiences in cultural settings with the assistance of IoT sensors, contributing to the broader body of research that explores this synergistic combination. We plan to create a DT counterpart of a real museum. By collecting and analyzing visitors’ interaction data from both the real and the digital world, we aim to develop dynamic systems that improve the spatial layout of exhibitions and enhance engagement with less popular exhibits. This project will assess the impact of AI-driven route optimization on visitor satisfaction and engagement, contributing valuable insights into the practical implementation of immersive technologies in real-world cultural environments. The outcome will not only improve museum/exhibition layouts but also serve as a scalable framework for similar applications in retail and educational environments, further demonstrating the practical value of VR-AI-IoT integration across domains.

7. Conclusions

This systematic review explored the integration of VR, AI, and IoT technologies, investigating their application in real-world domains and proof-of-concept implementations that employ these technologies synergistically across interdisciplinary fields. The primary application domains identified were healthcare, Industry 4.0, and smart cities, which demonstrate significant advancements, with DTs playing an important role in enabling real-time monitoring, predictive analytics, and immersive simulations. Several frameworks, tools, and methodologies were showcased. IoT typically provides real-world data through sensors and connects the virtual to the real world through various devices; AI processes and analyzes data to generate world-specific content, communicate with users, and automate decision-making; and VR offers immersive environments.
Numerous challenges were highlighted, including data reliability issues, user experience limitations, high implementation costs, and interoperability barriers. Additionally, the study revealed a considerable gap in the literature regarding the development of systems that integrate all three technologies in real-world settings. While some sectors have made progress, others, such as education and cultural heritage, remain underexplored. The findings suggest a need for standardized frameworks, improved sensor technology, and optimized system architectures to enhance scalability and adoption. Additional attention must be paid to EDTs that use AI as fuel (e.g., iVR, the metaverse) to boost their performance and acceptance in tackling global threats and megatrends such as health, climate change, urbanization, and smart cities.
It is emphasized that achieving seamless integration of EDTs requires overcoming certain technological (and other) barriers. By bridging the gap between isolated technological advancements, this review provides a foundation for future research and development aimed at creating intelligent, interoperable, adaptable, and resilient systems, capable of addressing complex challenges in a rapidly evolving technological landscape.

Author Contributions

Conceptualization, D.K. and K.K.; methodology, D.K. and K.K.; validation, V.K. and K.K.; formal analysis, D.K. and K.K.; investigation, D.K.; writing—original draft preparation, D.K.; writing—review and editing, V.K. and K.K.; visualization, D.K.; supervision, V.K. and K.K.; project administration, V.K. and K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AIoT: Artificial Intelligence of Things
CNN: Convolutional Neural Network
DBN: Deep Belief Network
DL: Deep Learning
DP: Differential Privacy
DT: Digital Twin
EDT: Emerging and Disruptive Technologies
ET: Emerging Technologies
FL: Federated Learning
HCI: Human-Computer Interaction
HE: Homomorphic Encryption
HRC: Human-Robot Collaboration
IoT: Internet of Things
IoV: Internet of Vehicles
LiDAR: Light Detection And Ranging
LLM: Large Language Model
LSTM: Long Short-Term Memory
L-TENG: Length TriboElectric NanoGenerators
MaaS: Mobility as a Service
MCU: Microcontroller Unit
ML: Machine Learning
NDT: Network Digital Twin
NLP: Natural Language Processing
PET: Privacy-Enhancing Technologies
PLC: Programmable Logic Controller
PoC: Proof of Concept
PSO: Particle Swarm Optimization
PVDF: PolyVinyliDene Fluoride
QoS: Quality of Service
R-CNN: Region-based Convolutional Neural Networks
RNN: Recurrent Neural Network
RSU: Roadside Unit
SIP: Session Initiation Protocol
SUS: System Usability Scale
SVM: Support Vector Machine
TENG: TriboElectric NanoGenerators
T-TENG: Tactile TriboElectric NanoGenerators
UAV: Unmanned Aerial Vehicle
UWB: Ultra-WideBand
VR: Virtual Reality
WOS: Web of Science
XAI: eXplainable AI
XR: eXtended Reality

References

  1. Yang, U.; Son, H.; Han, K. Developing a Realistic VR Interface to Recreate a Full-body Immersive Fire Scene Experience. In Proceedings of the SIGGRAPH Asia 2023 Posters Conference, Sydney, NSW, Australia, 12–15 December 2023; Association for Computing Machinery: New York, NY, USA, 2023. [Google Scholar] [CrossRef]
  2. Innocente, C.; Ulrich, L.; Moos, S.; Vezzetti, E. A framework study on the use of immersive XR technologies in the cultural heritage domain. J. Cult. Herit. 2023, 62, 268–283. [Google Scholar] [CrossRef]
  3. Saeed, M.; Khan, A.; Khan, M.; Saad, M.; El Saddik, A.; Gueaieb, W. Gaming-Based Education System for Children on Road Safety in Metaverse Towards Smart Cities. In Proceedings of the 2023 IEEE International Smart Cities Conference (ISC2), Bucharest, Romania, 24–27 September 2023; pp. 1–5. [Google Scholar] [CrossRef]
  4. Lai, Y.H.; Chen, S.Y.; Lai, C.F.; Chang, Y.C.; Su, Y.S. Study on enhancing AIoT computational thinking skills by plot image-based VR. Interact. Learn. Environ. 2021, 29, 482–495. [Google Scholar] [CrossRef]
  5. Epp, R.; Lin, D.; Bezemer, C.P. An Empirical Study of Trends of Popular Virtual Reality Games and Their Complaints. IEEE Trans. Games 2021, 13, 275–286. [Google Scholar] [CrossRef]
  6. Naranjo, J.E.; Sanchez, D.G.; Robalino-Lopez, A.; Robalino-Lopez, P.; Alarcon-Ortiz, A.; Garcia, M.V. A Scoping Review on Virtual Reality-Based Industrial Training. Appl. Sci. 2020, 10, 8224. [Google Scholar] [CrossRef]
  7. Lopez, M.A.; Terrón, S.; Lombardo, J.M.; González-Crespo, R. Towards a solution to create, test and publish mixed reality experiences for occupational safety and health learning: Training-MR. Int. J. Interact. Multimed. Artif. Intell. 2021, 7, 212. [Google Scholar]
  8. Rizzo, A.; Goodwin, G.; De Vito, A.; Bell, J. Recent Advances in Virtual Reality and Psychology: Introduction to the Special Issue. Transl. Issues Psychol. Sci. 2021, 7, 213–217. [Google Scholar] [CrossRef]
  9. Maslej, N.; Fattorini, L.; Perrault, R.; Parli, V.; Reuel, A.; Brynjolfsson, E.; Etchemendy, J.; Ligett, K.; Lyons, T.; Manyika, J.; et al. Artificial Intelligence Index Report 2024 Stanford Institute for Human-Centered Artificial Intelligence. Available online: https://hai-production.s3.amazonaws.com/files/hai_ai-index-report-2024-smaller2.pdf (accessed on 25 January 2025).
  10. Muralikrishna, V.; Vijayalakshmi, M. Autonomous Human Computer Interaction System in Windows Environment Using YOLO and LLM. In Proceedings of the 4th International Conference on Artificial Intelligence and Smart Energy; Manoharan, S., Tugui, A., Baig, Z., Eds.; Springer Nature: Cham, Switzerland, 2024; pp. 157–169. [Google Scholar]
  11. Akpan, I.J.; Kobara, Y.M.; Owolabi, J.; Akpan, A.A.; Offodile, O.F. Conversational and generative artificial intelligence and human–chatbot interaction in education and research. Int. Trans. Oper. Res. 2024, 32, 1251–1281. [Google Scholar] [CrossRef]
  12. Kalla, D.; Smith, N.; Samaah, F.; Kuraku, S. Study and analysis of chat GPT and its impact on different fields of study. Int. J. Innov. Sci. Res. Technol. 2023, 8, 827–833. [Google Scholar]
  13. Al-Fuqaha, A.; Guizani, M.; Mohammadi, M.; Aledhari, M.; Ayyash, M. Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications. Commun. Surv. Tutor. 2015, 17, 2347–2376. [Google Scholar] [CrossRef]
  14. Reiners, D.; Davahli, M.R.; Karwowski, W.; Cruz-Neira, C. The Combination of Artificial Intelligence and Extended Reality: A Systematic Review. Front. Virtual Real. 2021, 2, 721933. [Google Scholar] [CrossRef]
  15. Muhammed, D.; Ahvar, E.; Ahvar, S.; Trocan, M.; Montpetit, M.J.; Ehsani, R. Artificial Intelligence of Things (AIoT) for smart agriculture: A review of architectures, technologies and solutions. J. Netw. Comput. Appl. 2024, 228, 103905. [Google Scholar] [CrossRef]
  16. Hu, M.; Luo, X.; Chen, J.; Lee, Y.C.; Zhou, Y.; Wu, D. Virtual reality: A survey of enabling technologies and its applications in IoT. J. Netw. Comput. Appl. 2021, 178, 102970. [Google Scholar] [CrossRef]
  17. Ribeiro de Oliveira, T.; Biancardi Rodrigues, B.; Moura da Silva, M.; Spinassé, R.A.N.; Giesen Ludke, G.; Ruy Soares Gaudio, M.; Iglesias Rocha Gomes, G.; Guio Cotini, L.; da Silva Vargens, D.; Queiroz Schimidt, M.; et al. Virtual Reality Solutions Employing Artificial Intelligence Methods: A Systematic Literature Review. ACM Comput. Surv. 2023, 55, 1–29. [Google Scholar] [CrossRef]
  18. Adli, H.K.; Remli, M.A.; Wan Salihin Wong, K.N.S.; Ismail, N.A.; González-Briones, A.; Corchado, J.M.; Mohamad, M.S. Recent Advancements and Challenges of AIoT Application in Smart Agriculture: A Review. Sensors 2023, 23, 3752. [Google Scholar] [CrossRef]
  19. Chaturvedi, R.; Verma, S.; Ali, F.; Kumar, S. Reshaping Tourist Experience with AI-Enabled Technologies: A Comprehensive Review and Future Research Agenda. Int. J. Hum.–Comput. Interact. 2023, 40, 5517–5533. [Google Scholar] [CrossRef]
  20. Pyun, K.R.; Rogers, J.A.; Ko, S.H. Materials and devices for immersive virtual reality. Nat. Rev. Mater. 2022, 7, 841–843. [Google Scholar] [CrossRef]
  21. Baashar, Y.; Alkawsi, G.; Wan Ahmad, W.N.; Alomari, M.A.; Alhussian, H.; Tiong, S.K. Towards Wearable Augmented Reality in Healthcare: A Comparative Survey and Analysis of Head-Mounted Displays. Int. J. Environ. Res. Public Health 2023, 20, 3940. [Google Scholar] [CrossRef]
  22. Liberatore, M.J.; Wagner, W.P. Virtual, mixed, and augmented reality: A systematic review for immersive systems research. Virtual Real. 2021, 25, 773–799. [Google Scholar] [CrossRef]
  23. Al-Ansi, A.M.; Jaboob, M.; Garad, A.; Al-Ansi, A. Analyzing augmented reality (AR) and virtual reality (VR) recent development in education. Soc. Sci. Humanit. Open 2023, 8, 100532. [Google Scholar] [CrossRef]
  24. Velazquez-Pimentel, D.; Hurkxkens, T.; Nehme, J. A Virtual Reality for the Digital Surgeon. In Digital Surgery; Springer International Publishing: Cham, Switzerland, 2020; pp. 183–201. [Google Scholar] [CrossRef]
  25. Chheang, V.; Schott, D.; Saalfeld, P.; Vradelis, L.; Huber, T.; Huettl, F.; Lang, H.; Preim, B.; Hansen, C. Advanced liver surgery training in collaborative VR environments. Comput. Graph. 2024, 119, 103879. [Google Scholar] [CrossRef]
  26. Marougkas, A.; Troussas, C.; Krouska, A.; Sgouropoulou, C. How personalized and effective is immersive virtual reality in education? A systematic literature review for the last decade. Multimed. Tools Appl. 2023, 83, 18185–18233. [Google Scholar] [CrossRef]
  27. Thomas, A. Virtual Reality (VR)—Statistics & Facts. 2025. Available online: https://www.statista.com/topics/2532/virtual-reality-vr/ (accessed on 25 January 2025).
  28. Kari, T.; Kosa, M. Acceptance and use of virtual reality games: An extension of HMSAM. Virtual Real. 2023, 27, 1585–1605. [Google Scholar] [CrossRef] [PubMed]
  29. Weinstein, L.; Chardonnet, J.R.; Merienne, F. Cybersickness and the perception of latency in immersive virtual reality. Front. Virtual Real. 2020, 1, 582204. [Google Scholar] [CrossRef]
  30. Hatami, M.; Qu, Q.; Chen, Y.; Kholidy, H.; Blasch, E.; Ardiles-Cruz, E. A Survey of the Real-Time Metaverse: Challenges and Opportunities. Future Internet 2024, 16, 379. [Google Scholar] [CrossRef]
  31. Hamad, A.; Jia, B. How Virtual Reality Technology Has Changed Our Lives: An Overview of the Current and Potential Applications and Limitations. Int. J. Environ. Res. Public Health 2022, 19, 11278. [Google Scholar] [CrossRef]
  32. Meske, C.; Hermanns, T.; Jelonek, M.; Doganguen, A. Enabling Human Interaction in Virtual Reality: An Explorative Overview of Opportunities and Limitations of Current VR Technology. In HCI International 2022 – Late Breaking Papers: Interacting with eXtended Reality and Artificial Intelligence; Chen, J.Y.S., Fragomeni, G., Degen, H., Ntoa, S., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 114–131. [Google Scholar] [CrossRef]
  33. Abbasi, M.; Váz, P.; Silva, J.; Martins, P. Enhancing Visual Perception in Immersive VR and AR Environments: AI-Driven Color and Clarity Adjustments Under Dynamic Lighting Conditions. Technologies 2024, 12, 216. [Google Scholar] [CrossRef]
  34. Ashtari, N.; Bunt, A.; McGrenere, J.; Nebeling, M.; Chilana, P.K. Creating Augmented and Virtual Reality Applications: Current Practices, Challenges, and Opportunities. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20); Association for Computing Machinery: New York, NY, USA, 2020; pp. 1–13. [Google Scholar] [CrossRef]
  35. Minerva, R.; Biru, A.; Rotondi, D. Towards a Definition of the Internet of Things (IoT); IEEE: 2015. Available online: https://iot.ieee.org/images/files/pdf/IEEE_IoT_Towards_Definition_Internet_of_Things_Revision1_27MAY15.pdf (accessed on 25 January 2025).
  36. Shovic, J.C. Raspberry Pi IoT Projects: Prototyping Experiments for Makers; Apress: Berkley, CA, USA, 2021. [Google Scholar] [CrossRef]
  37. Pliatsios, A.; Goumopoulos, C.; Kotis, K. Interoperability in IoT: A Vital Key Factor to Create the “Social Network” of Things. In Proceedings of the 13th International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM 2019), Porto, Portugal, 22–26 September 2019. [Google Scholar]
  38. Gokhale, P.; Bhat, O.; Bhat, S. Introduction to IOT. Int. Adv. Res. J. Sci. Eng. Technol. 2018, 5, 41–44. [Google Scholar]
  39. Li, K.; Cui, Y.; Li, W.; Lv, T.; Yuan, X.; Li, S.; Ni, W.; Simsek, M.; Dressler, F. When Internet of Things meets Metaverse: Convergence of Physical and Cyber Worlds. arXiv 2022, arXiv:2208.13501. [Google Scholar] [CrossRef]
  40. Chen, J.; Shi, Y.; Yi, C.; Du, H.; Kang, J.; Niyato, D. Generative AI-Driven Human Digital Twin in IoT-Healthcare: A Comprehensive Survey. arXiv 2024, arXiv:2401.13699. [Google Scholar] [CrossRef]
  41. Al-Nbhany, W.A.N.A.; Zahary, A.T.; Al-Shargabi, A.A. Blockchain-IoT Healthcare Applications and Trends: A Review. IEEE Access 2024, 12, 4178–4212. [Google Scholar] [CrossRef]
  42. Omrany, H.; Al-Obaidi, K.M.; Hossain, M.; Alduais, N.A.M.; Al-Duais, H.S.; Ghaffarianhoseini, A. IoT-enabled smart cities: A hybrid systematic analysis of key research areas, challenges, and recommendations for future direction. Discov. Cities 2024, 1, 2. [Google Scholar] [CrossRef]
  43. Elgazzar, K.; Khalil, H.; Alghamdi, T.; Badr, A.; Abdelkader, G.; Elewah, A.; Buyya, R. Revisiting the internet of things: New trends, opportunities and grand challenges. Front. Internet Things 2022, 1, 1073780. [Google Scholar] [CrossRef]
  44. Kumar, S.; Tiwari, P.; Zymbler, M. Internet of Things is a revolutionary approach for future technology enhancement: A review. J. Big Data 2019, 6, 1–21. [Google Scholar] [CrossRef]
  45. Khanna, A.; Kaur, S. Internet of Things (IoT), Applications and Challenges: A Comprehensive Review. Wirel. Pers. Commun. 2020, 114, 1687–1762. [Google Scholar] [CrossRef]
  46. Malhotra, P.; Singh, Y.; Anand, P.; Bangotra, D.K.; Singh, P.K.; Hong, W.C. Internet of Things: Evolution, Concerns and Security Challenges. Sensors 2021, 21, 1809. [Google Scholar] [CrossRef] [PubMed]
  47. European Parliament. What Is Artificial Intelligence and How Is It Used? Available online: https://www.europarl.europa.eu/topics/en/article/20200827STO85804/what-is-artificial-intelligence-and-how-is-it-used (accessed on 25 January 2025).
  48. IBM. AI vs. Machine Learning vs. Deep Learning vs. Neural Networks. Available online: https://www.ibm.com/think/topics/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks (accessed on 25 January 2025).
  49. Bi, S.; Wang, C.; Zhang, J.; Huang, W.; Wu, B.; Gong, Y.; Ni, W. A Survey on Artificial Intelligence Aided Internet-of-Things Technologies in Emerging Smart Libraries. Sensors 2022, 22, 2991. [Google Scholar] [CrossRef]
  50. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8. [Google Scholar] [CrossRef]
  51. Das, S.; Tariq, A.; Santos, T.; Kantareddy, S.S.; Banerjee, I. Recurrent Neural Networks (RNNs): Architectures, Training Tricks, and Introduction to Influential Research. In Machine Learning for Brain Disorders; Springer: New York, NY, USA, 2023; pp. 117–138. [Google Scholar] [CrossRef]
  52. Mienye, I.D.; Swart, T.G. A Comprehensive Review of Deep Learning: Architectures, Recent Advances, and Applications. Information 2024, 15, 755. [Google Scholar] [CrossRef]
  53. OpenAI. GPT-3 Applications. Available online: https://openai.com/index/gpt-3-apps/ (accessed on 25 January 2025).
  54. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805. [Google Scholar]
  55. Bansal, G.; Chamola, V.; Hussain, A.; Guizani, M.; Niyato, D. Transforming Conversations with AI—A Comprehensive Study of ChatGPT. Cogn. Comput. 2024, 16, 2487–2510. [Google Scholar] [CrossRef]
  56. Varitimiadis, S.; Kotis, K.; Pittou, D.; Konstantakis, G. Graph-Based Conversational AI: Towards a Distributed and Collaborative Multi-Chatbot Approach for Museums. Appl. Sci. 2021, 11, 9160. [Google Scholar] [CrossRef]
  57. IBM. What Is AI Bias? Available online: https://www.ibm.com/think/topics/ai-bias (accessed on 25 January 2025).
  58. King, J.; Meinhardt, C. Privacy in an AI Era: How Do We Protect Our Personal Information? Available online: https://hai.stanford.edu/news/privacy-ai-era-how-do-we-protect-our-personal-information (accessed on 25 January 2025).
  59. Cheong, B.C. Transparency and accountability in AI systems: Safeguarding wellbeing in the age of algorithmic decision-making. Front. Hum. Dyn. 2024, 6, 1421273. [Google Scholar] [CrossRef]
  60. McCoy, L.G.; Ci Ng, F.Y.; Sauer, C.M.; Yap Legaspi, K.E.; Jain, B.; Gallifant, J.; McClurkin, M.; Hammond, A.; Goode, D.; Gichoya, J.; et al. Understanding and training for the impact of large language models and artificial intelligence in healthcare practice: A narrative review. BMC Med. Educ. 2024, 24, 1096. [Google Scholar] [CrossRef]
  61. Thakur, R.; Panse, P.; Bhanarkar, P.; Borkar, P. AIoT: Role of AI in IoT, Applications and Future Trends. Res. Trends Artif. Intell. Internet Things 2023, 42–53. [Google Scholar]
  62. DusunIoT. AIoT—Artificial Intelligence of Things. Available online: https://www.dusuniot.com/blog/aiot-artificial-intelligence-of-things/ (accessed on 25 January 2025).
  63. Zhang, Z.; Wen, F.; Sun, Z.; Guo, X.; He, T.; Lee, C. Artificial Intelligence-Enabled Sensing Technologies in the 5G/Internet of Things Era: From Virtual Reality/Augmented Reality to the Digital Twin. Adv. Intell. Syst. 2022, 4, 2100228. [Google Scholar] [CrossRef]
  64. Devagiri, J.S.; Paheding, S.; Niyaz, Q.; Yang, X.; Smith, S. Augmented Reality and Artificial Intelligence in industry: Trends, tools, and future challenges. Expert Syst. Appl. 2022, 207, 118002. [Google Scholar] [CrossRef]
  65. Zhang, Z.; He, T.; Zhu, M.; Sun, Z.; Shi, Q.; Zhu, J.; Dong, B.; Yuce, M.R.; Lee, C. Deep learning-enabled triboelectric smart socks for IoT-based gait analysis and VR applications. Npj Flex. Electron. 2020, 4, 29. [Google Scholar] [CrossRef]
  66. Sun, Z.; Zhu, M.; Zhang, Z.; Chen, Z.; Shi, Q.; Shan, X.; Yeow, R.C.H.; Lee, C. Artificial intelligence of things (AIoT) enabled virtual shop applications using self-powered sensor enhanced soft robotic manipulator. Adv. Sci. 2021, 8, e2100230. [Google Scholar]
  67. Javed, A.R.; Sarwar, M.U.; ur Rehman, S.; Khan, H.U.; Al-Otaibi, Y.D.; Alnumay, W.S. PP-SPA: Privacy preserved smartphone-based personal assistant to improve routine life functioning of cognitive impaired individuals. Neural Process. Lett. 2023, 55, 35–52. [Google Scholar] [CrossRef]
  68. Zhang, J.; Tai, Y. Secure medical digital twin via human-centric interaction and cyber vulnerability resilience. Conn. Sci. 2022, 34, 895–910. [Google Scholar] [CrossRef]
  69. Stacchio, L.; Angeli, A.; Marfia, G. Empowering digital twins with eXtended reality collaborations. Virtual Real. Intell. Hardw. 2022, 4, 487–505. [Google Scholar] [CrossRef]
  70. Fernandes, S.V.; João, D.V.; Cardoso, B.B.; Martins, M.A.I.; Carvalho, E.G. Digital Twin Concept Developing on an Electrical Distribution System—An Application Case. Energies 2022, 15, 2836. [Google Scholar] [CrossRef]
  71. Khan, H.; Soroni, F.; Sadek Mahmood, S.J.; Mannan, N.; Khan, M.M. Education System for Bangladesh Using Augmented Reality, Virtual Reality and Artificial Intelligence. In Proceedings of the 2021 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 10–13 May 2021; pp. 137–142. [Google Scholar] [CrossRef]
  72. Wang, J.; Yang, Y.; Liu, H.; Jiang, L. Enhancing the college and university physical education teaching and learning experience using virtual reality and particle swarm optimization. Soft Comput. 2024, 28, 1277–1294. [Google Scholar] [CrossRef]
  73. Sernani, P.; Vagni, S.; Falcionelli, N.; Mekuria, D.N.; Tomassini, S.; Dragoni, A.F. Voice Interaction with Artworks via Indoor Localization: A Vocal Museum. In Proceedings of the 7th International Conference on Augmented Reality, Virtual Reality, and Computer Graphics (AVR 2020), Lecce, Italy, 7–10 September 2020; De Paolis, L.T., Bourdot, P., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 66–78. [Google Scholar] [CrossRef]
  74. Guo, Y.; Hou, K.; Yan, Z.; Chen, H.; Xing, G.; Jiang, X. Sensor2Scene: Foundation Model-Driven Interactive Realities. In Proceedings of the 2024 IEEE International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys 2024), Hong Kong, China, 13–15 May 2024; pp. 13–19. [Google Scholar] [CrossRef]
  75. Lai, K.T.; Chung, Y.T.; Su, J.J.; Lai, C.H.; Huang, Y.H. AI Wings: An AIoT Drone System for Commanding ArduPilot UAVs. IEEE Syst. J. 2023, 17, 2213–2224. [Google Scholar] [CrossRef]
  76. Kuanting/aiwings. Available online: https://github.com/kuanting/aiwings (accessed on 25 January 2025).
  77. Qu, Q.; Hatami, M.; Xu, R.; Nagothu, D.; Chen, Y.; Li, X.; Blasch, E.; Ardiles-Cruz, E.; Chen, G. The microverse: A task-oriented edge-scale metaverse. Future Internet 2024, 16, 60. [Google Scholar] [CrossRef]
  78. Yang, S.R.; Lin, Y.C.; Lin, P.; Fang, Y. AIoTtalk: A SIP-Based Service Platform for Heterogeneous Artificial Intelligence of Things Applications. IEEE Internet Things J. 2023, 10, 14167–14181. [Google Scholar] [CrossRef]
  79. Lee, J.W.; Lee, Y.; Choi, H.B.; Son, S.W.; Leem, E.; Seo, J. A metaverse Avatar Teleport System Using an AIoT Pose Estimation Device. In Proceedings of the 2023 IEEE International Conference on Metaverse Computing, Networking and Applications (MetaCom 2023), Kyoto, Japan, 26–28 June 2023; pp. 698–703. [Google Scholar] [CrossRef]
  80. Li, H.; Ma, W.; Wang, H.; Liu, G.; Wen, X.; Zhang, Y.; Yang, M.; Luo, G.; Xie, G.; Sun, C. A framework and method for Human-Robot cooperative safe control based on digital twin. Adv. Eng. Inform. 2022, 53, 101701. [Google Scholar] [CrossRef]
  81. Zhang, D.; Xu, F.; Pun, C.M.; Yang, Y.; Lan, R.; Wang, L.; Li, Y.; Gao, H. Virtual Reality Aided High-Quality 3D Reconstruction by Remote Drones. ACM Trans. Internet Technol. 2021, 22, 1–20. [Google Scholar] [CrossRef]
  82. Wu, H.T. The internet-of-vehicle traffic condition system developed by artificial intelligence of things. J. Supercomput. 2021, 78, 2665–2680. [Google Scholar] [CrossRef]
  83. Miranda Calero, J.A.; Rituerto-Gonzalez, E.; Luis-Mingueza, C.; Canabal, M.F.; Barcenas, A.R.; Lanza-Gutierrez, J.M.; Pelaez-Moreno, C.; Lopez-Ongil, C. Bindi: Affective Internet of Things to Combat Gender-Based Violence. IEEE Internet Things J. 2022, 9, 21174–21193. [Google Scholar] [CrossRef]
  84. Yu, F.; Yu, C.; Tian, Z.; Liu, X.; Cao, J.; Liu, L.; Du, C.; Jiang, M. Intelligent Wearable System With Motion and Emotion Recognition Based on Digital Twin Technology. IEEE Internet Things J. 2024, 11, 26314–26328. [Google Scholar] [CrossRef]
  85. Joseph, S.; Priya S, B.; R, P.; M, S.K.; S, S.; V, J.; R, S.P. IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI. In Proceedings of the 2023 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI), Gwalior, India, 10–11 December 2023; pp. 1–6. [Google Scholar] [CrossRef]
  86. Mihai, S.; Yaqoob, M.; Hung, D.V.; Davis, W.; Towakel, P.; Raza, M.; Karamanoglu, M.; Barn, B.; Shetve, D.; Prasad, R.V.; et al. Digital Twins: A Survey on Enabling Technologies, Challenges, Trends and Future Prospects. IEEE Commun. Surv. Tutor. 2022, 24, 2255–2291. [Google Scholar] [CrossRef]
  87. Unity Technologies. Unity Real-Time Development Platform: 3D, 2D, VR & AR Engine. Available online: https://unity.com/ (accessed on 25 January 2025).
  88. Epic Games, Inc. The Most Powerful Real-Time 3D Creation Tool—Unreal Engine. Available online: unrealengine.com (accessed on 25 January 2025).
  89. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: tensorflow.org (accessed on 25 January 2025).
  90. Keras—Deep Learning for Humans. Available online: https://keras.io/ (accessed on 25 January 2025).
  91. Alkaeed, M.; Qayyum, A.; Qadir, J. Privacy preservation in Artificial Intelligence and Extended Reality (AI-XR) metaverses: A survey. J. Netw. Comput. Appl. 2024, 231, 103989. [Google Scholar] [CrossRef]
  92. Khalid, N.; Qayyum, A.; Bilal, M.; Al-Fuqaha, A.; Qadir, J. Privacy-preserving artificial intelligence in healthcare: Techniques and applications. Comput. Biol. Med. 2023, 158, 106848. [Google Scholar] [CrossRef]
  93. Lazaros, K.; Koumadorakis, D.E.; Vrahatis, A.G.; Kotsiantis, S. Federated Learning: Navigating the Landscape of Collaborative Intelligence. Electronics 2024, 13, 4744. [Google Scholar] [CrossRef]
  94. Ferrara, E. Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies. Sci 2023, 6, 3. [Google Scholar] [CrossRef]
  95. Min, A. Artificial Intelligence and Bias: Challenges, Implications, and Remedies. J. Soc. Res. 2023, 2, 3808–3817. [Google Scholar] [CrossRef]
  96. Hanna, M.G.; Pantanowitz, L.; Jackson, B.; Palmer, O.; Visweswaran, S.; Pantanowitz, J.; Deebajah, M.; Rashidi, H.H. Ethical and Bias Considerations in Artificial Intelligence/Machine Learning. Mod. Pathol. 2025, 38, 100686. [Google Scholar] [CrossRef]
Figure 1. PRISMA flowchart illustrating the procedure undergone by the papers included in this systematic review.
Figure 2. Distribution of Articles Across Application Domains With and Without Additional Criteria Limits, Based on Quantitative Analysis.
Table 2. Table of Reviewed Studies from IEEE Xplore and ACM Databases.

[80] Li et al. (2022)
Application field: Industry
Technology combination: AI + VR + IoT
Type of implementation: Simulations and real testing scenarios
Methodology, models & tools used: HMDs, binocular cameras, CNNs for motion detection, Unity Engine, algorithm for distance calculation, OPC Unified Architecture, UDP and TCP protocols for communications, Siemens S7-1200 PLC, ABB IRB 1600 industrial robotic arm
Open technology: Not specified
Advantages: Significantly improved the response time for collision avoidance in HRC; achieved an average response time of 147.06 ms, offering real-time monitoring.
Limitations and challenges: Lack of consideration for other moving obstacles in the environment that may cause collisions; limited dataset; scalability of the system was not discussed.
Key results: Developed a safety control framework for HRC based on DT technology regarding the retention of safe distance. The system achieved a high detection accuracy rate of 97.25% for label recognition and 97.00% for human key point detection, while effectively managing the safety distance between humans and robots during operation.

[81] Zhang et al. (2021)
Application field: Smart Cities
Technology combination: AI + VR + IoT
Type of implementation: Simulations and real testing scenarios
Methodology, models & tools used: Telexistence drone, NVIDIA Jetson TX2 for processing, a MYNTAI D1000-50 stereo camera for image capture, VINS-Mono framework for visual-inertial odometry, inertial measurement unit, depth sensors, HMD (Oculus Rift), WiFi, MAVLink protocol for UAV communication
Open technology: Not specified
Advantages: Provided better accuracy and completeness of 3D models compared to traditional methods; the system allowed real-time feedback during the scanning of outdoor urban environments, enhancing user interaction and navigation guidance.
Limitations and challenges: Latency in data transmission; computational overheads; UAV autonomy.
Key results: Improved 3D reconstruction quality through the developed VR interface, which achieved higher accuracy and completeness in reconstructions compared to traditional joystick controls. The system’s real-time feedback mechanism allowed users to actively fill in gaps during the scanning process, resulting in more complete models.

[82] Wu et al. (2021)
Application field: Mobility
Technology combination: AI + IoT
Type of implementation: Real testing scenario
Methodology, models & tools used: Faster Region-based Convolutional Neural Networks (R-CNN) for object recognition, federated learning, 6G networks, Raspberry Pi, GPS, cameras
Open technology: Not specified
Advantages: Automatically identifies road conditions and shares relevant multimedia information with nearby vehicles, reducing the need for extensive fixed infrastructure and improving data privacy through federated learning, while using cutting-edge technologies like 6G alongside simple tools like a common dashboard camera.
Limitations and challenges: Dependence on 6G networks; interoperability and standardization concerns; legal and ethical concerns; scalability and implementation costs; lack of specific accuracy metrics and need for a larger, wider dataset representing various cases and conditions.
Key results: Successfully demonstrated the ability to recognize road conditions and share information in real time, achieving a significant recognition rate for road obstacles and traffic conditions. The integration of federated learning improved the accuracy of the object recognition model, contributing to enhanced driving safety and efficient traffic management while also maintaining data security.

[83] Miranda Calero et al. (2022)
Application field: Health (specifically addressing gender-based violence)
Technology combination: AI + IoT + VR
Type of implementation: Testing on real data
Methodology, models & tools used: Physiological sensors, KNN classifier, NN for speech data processing, commercial off-the-shelf smart sensors and a smartphone application, ARM Cortex-M4, Python, MongoDB
Open technology: No open technology is explicitly mentioned, except the freely available WEMAC dataset
Advantages: Provides an autonomous, inconspicuous solution for detecting gender-based violence situations by automatically identifying fear-related emotions, eliminating the need for manual operation by victims during critical moments.
Limitations and challenges: Low classification accuracy (63.61%); small sample size of 47 participants; computational constraints on the edge devices.
Key results: Introduced the WEMAC dataset; integrated physiological and auditory data to detect potential gender-based violence situations; achieved an overall fear classification accuracy of 63.61% using a subject-independent approach.

[84] Yu et al. (2024)
Application field: Health
Technology combination: AI + IoT + VR
Type of implementation: Real testing scenario
Methodology, models & tools used: TGAM module for electroencephalogram signal acquisition, UWB positioning chip, ten-axis accelerometer, Raspberry Pi, CNN, bidirectional LSTM, TB-SFENet developed for motion and emotion recognition, Unreal Engine 5
Open technology: The authors expressed willingness to share their emotion classification dataset (TEEC) with institutions, but access would require participant consent due to privacy concerns; there is no indication of open code or tools being provided.
Advantages: High accuracy in recognizing user states while providing real-time interaction and visualization, enhancing health monitoring while efficiently combining AI, IoT, and VR.
Limitations and challenges: User comfort; potential interference issues with UWB positioning accuracy; scalability.
Key results: Proposed an intelligent wearable system with high-precision motion and emotion recognition through the developed TB-SFENet model. The DT platform provided real-time interaction and visualization, and showcased significant potential for applications in intelligent healthcare and safety monitoring.

[85] Joseph et al. (2023)
Application field: Education, Home Automation, Healthcare
Technology combination: AI + IoT + AR
Type of implementation: Proof-of-concept working mechanism
Methodology, models & tools used: Unity 3D, Vuforia Engine, YOLOv8, OpenCV, APIs for content retrieval, ESP32 microcontroller, Wi-Fi, Bluetooth
Open technology: Not specified
Advantages: Cost-effective and accessible, requiring only a smartphone camera, making AR and IoT technologies accessible to a wide audience; enhances user convenience through intuitive hand gestures, promotes energy efficiency by tracking power consumption, and serves as an educational tool for interactive learning.
Limitations and challenges: Need for increased object recognition accuracy; lack of easily expandable educational content; need for social features; scalability concerns; heavy reliance on smartphone resources; interoperability with a variety of IoT devices; lack of testing and user experience research; lack of security measures.
Key results: Created a multi-functional application that provides users with real-time control over household appliances and educational content. It allows immersive learning experiences and efficient energy management, demonstrating significant potential for enhancing home automation and education.
Table 3. Results of the quantitative data-analysis based on the number of papers included for each identified application field.

Application Domain | # of Articles | # of Articles Without the Additional Criteria Limit
Health | 3 | 13
Industry | 3 | 8
Smart Cities | 3 | 8
Art & Culture | 2 | 2
Education | 2 | 2
Metaverse | 1 | 1
Total | 14 | 34
