1. Introduction
Cardiac arrest, the sudden cessation of cardiac mechanical activity, is a life-threatening emergency resulting in the immediate loss of blood circulation and subsequent death without rapid intervention. Out-of-hospital cardiac arrest (OHCA) is defined as a cardiac arrest occurring outside of medical facilities, where the frequent inaccessibility of rapid medical intervention severely threatens patient survival [1]. The incidence of OHCA has been rising globally, driven by demographic shifts such as population aging and lifestyle factors including physical inactivity and high-calorie diets [2]. In the United States alone, OHCA affects more than 356,000 individuals annually [3]. The survival of cardiac arrest patients is highly time-dependent, with the probability of successful resuscitation decreasing significantly for every minute treatment is delayed [4]. Cardiopulmonary resuscitation (CPR) serves as the fundamental technique for restoring blood circulation and improving the chances of survival for those in cardiac arrest [5]. Since more than 70% of OHCA cases occur in private residences [6], educating the public in basic life support (BLS) is essential to foster bystander-initiated CPR, which is key to enhancing survival before emergency services arrive.
Reflecting this global need, the 2025 guidelines from the International Liaison Committee on Resuscitation (ILCOR) and American Heart Association (AHA) have recently highlighted the potential of digital transformation in medical education. Specifically, the guidelines suggest that in situ simulation and augmented reality (AR) might support CPR training by integrating practice into realistic environments [7,8,9]. To ensure that AR functions as a reliable training modality rather than a mere visual aid, it is imperative to develop systems capable of accurately measuring and correcting physical maneuvers, thereby providing the technical precision and real-time feedback essential for practical skill acquisition.
To address these requirements, this paper proposes an AI-integrated extended reality (XR)-based CPR training system. The system is designed to shift the traditional educational workflow into a more decentralized and efficient process. By integrating vision-based pose estimation and multi-modal sensor data analysis, the platform enables trainees to engage in autonomous hands-on practice guided by AI coaching. This approach allows human instructors to focus on final certification, thereby streamlining the overall BLS training procedure.
To validate the technical feasibility and performance of the proposed system, this paper conducts a series of experiments focused on its functional components. It should be noted that this research primarily serves as a Proof-of-Concept (PoC), emphasizing the architectural design and technical integration of the AI-driven evaluation system. The experimental phase was conducted through internal functional testing designed to verify system reliability, rather than medical clinical trials or human participant recruitment. Although the 2025 guidelines have since been released, these internal evaluations were conducted based on the 2020 Korean Association of Cardiopulmonary Resuscitation (KACPR) Guidelines [10], which represented the most recent standards available at the beginning of this study. Specifically, the experiments evaluate key technical metrics, including the accuracy of sensors embedded in the manikin kit for chest compression and ventilation, the latency of speech-to-text (STT) processing, and the performance of the generative AI evaluator in analyzing verbal protocols. By focusing on these fundamental technical requirements, the testing demonstrates the system’s robustness and its potential to function as a complementary CPR training system.
The main contributions of this paper are summarized as follows:
Proposal of an auxiliary CPR training paradigm: We propose an AI-integrated XR-based training system designed to supplement conventional instructor-led training by transitioning portions of the practical workflow to an online, self-paced model. This approach alleviates the pedagogical burden on instructors by allowing them to focus on final certification and evaluation, while simultaneously fostering a self-paced environment for trainees to enhance skill proficiency.
Automated evaluation based on the latest regional guidelines: We introduce an AI-integrated evaluation framework that aligns with the KACPR Guidelines. By combining vision-based pose estimation with multimodal sensor data (pressure, flow, and gyro sensors), the system provides a precise, automated assessment of physical maneuvers such as compression depth, posture, and airway management without the need for human supervision.
Personalized real-time feedback system: We optimized the verbal assessment pipeline using a concurrent architecture integrating Faster-Whisper and GPT-4o. This technical enhancement achieves a recognition accuracy of approximately 95% and a low latency of under 0.5 s, ensuring that trainees receive immediate, context-aware feedback on critical communicative protocols.
2. Related Works
To realize the potential of XR-supported training highlighted in recent guidelines, it is essential to examine the technological components that ensure high-precision skill acquisition. This section explores previous research in sensor-based feedback, XR-based medical training, and AI-driven performance evaluation, focusing on how their integration can address the current evidence gap in digital CPR training. By analyzing the strengths and limitations of these existing approaches, we define the technical requirements for the proposed XR-based CPR training system with AI-driven evaluation of trainee performance.
2.1. Conventional CPR Training and Limitations of Sensor-Based Feedback
Currently, CPR training has evolved from purely instructor-led sessions to technology-assisted learning utilizing smart manikins or wearable devices. These feedback devices have become the general training model for objective assessment [7], providing immediate quantitative feedback on performance metrics such as compression depth (5–6 cm) and rate (100–120 cpm). Empirical studies demonstrate that such feedback significantly improves adherence to mechanical guidelines compared to practice without devices, aiding trainees in self-correcting their compression rhythm and intensity [11,12].
However, these sensor-based approaches primarily focus on quantitative metrics such as compression depth, often without assessing the trainee’s biomechanical posture. Research indicates that achieving target compression metrics does not necessarily imply proper body mechanics. Conventional sensors, measuring acceleration or pressure, cannot detect postural deviations such as bent elbows or improper utilization of upper body weight [13,14]. This limitation is critical because correct compression depth achieved with poor posture can accelerate rescuer muscle fatigue, leading to a rapid decay in CPR quality [15]. Moreover, excessive reliance on device-specific feedback can hinder skill transferability, as trainees may struggle to internalize the correct motor patterns required for real-world scenarios [11].
Because sensors cannot assess biomechanics, human instructors remain indispensable for monitoring trainee form. This continued dependency creates a significant pedagogical burden, as instructors must bridge the gap between the device’s numerical feedback and the trainee’s actual physical posture. Consequently, the reliance on professional supervision reintroduces the inherent limitations of traditional training formats. In a typical one-to-many setting, instructors cannot simultaneously monitor and correct the fine motor skills of multiple trainees, often resulting in insufficient guidance on individual posture corrections [16,17].
2.2. Advancements and Technical Challenges in XR-Based CPR Training
To alleviate the pedagogical burden on instructors and overcome the spatiotemporal constraints identified in traditional methods, XR technologies, including virtual reality (VR) and AR, are being actively adopted in medical training. Recent systematic reviews demonstrate that XR serves as an effective alternative platform, enabling self-directed learning that allows trainees to engage in repetitive scenario training without immediate instructor supervision [18,19]. This shift towards autonomous training is crucial for overcoming the limitations of one-to-many instruction and facilitating flexible, non-face-to-face learning environments.
VR-based CPR simulations contribute to this paradigm by replicating realistic emergency environments, which creates the necessary psychological tension and motivates trainees through immersive role-playing [20]. Concurrently, AR technology enhances intuitive learning by overlaying virtual guidelines, such as compression points and hand placement indicators, directly onto physical manikins or real-world settings [21]. These technologies have proven their potential to provide high immersion and presence, which are essential for maintaining trainee engagement during self-paced practice.
However, despite these educational benefits, significant technical challenges hinder the practical application of XR as a precise skill evaluation tool. A primary limitation lies in the spatial registration of virtual objects. Existing AR systems often require cumbersome manual calibration to align virtual guides with physical manikins. Research indicates that this manual setup disrupts the training flow and often leads to registration errors during dynamic user movements, thereby degrading the overall user experience [22]. Furthermore, processing high-fidelity 3D visuals alongside sensor data often introduces system latency. Even minor delays in visual feedback can disrupt the rhythmic motor skills required for CPR, hindering real-time error correction [23].
Moreover, many existing XR CPR systems prioritize visual immersion over precise performance evaluation. While VR provides high visual engagement, studies argue that it often lacks integration with intelligent biomechanical assessment, relying instead on simple rule-based feedback or disconnected physical sensor devices [20,21]. Consequently, while XR facilitates accessibility and immersion, current systems often struggle to provide immediate, corrective feedback on posture and protocol execution comparable to that available from a human instructor.
2.3. AI-Driven Performance Evaluation and Real-Time Feedback
With the advancement of deep learning-based computer vision technology, research has rapidly shifted towards markerless human motion recognition, enabling analysis using only cameras without the need for separate wearable sensors [24]. In particular, recent attempts utilize pose estimation models, such as OpenPose and MediaPipe, to analyze the positions and angles of the trainee’s shoulders, elbows, and wrists during CPR [25]. These AI-based approaches allow for contactless analysis of the trainee’s full-body posture, effectively addressing the “blind spots” of traditional sensors by determining not only compression depth but also whether correct postural alignment is maintained. Furthermore, studies have proposed systems that utilize machine learning algorithms to analyze trainee performance data, providing adaptive feedback on specific weaknesses or adjusting difficulty levels to suit individual characteristics [26].
However, despite these advancements in kinematic analysis, a critical gap remains in the evaluation of verbal protocols and the integration of these technologies into immersive environments. Effective BLS requires not only physical maneuvers but also clear communicative actions, such as designating a specific bystander to call emergency services or requesting an automated external defibrillator (AED). Current AI-driven CPR systems predominantly focus on physical metrics, often neglecting these verbal components. While some voice-assisted systems exist, they typically rely on simple keyword spotting algorithms that lack the natural language understanding capabilities required to assess the context, timing, or intent of a trainee’s utterance [27].
2.4. Summary of Research Gaps and Motivation
In summary, while individual advancements in sensor technology, XR, and AI have addressed specific aspects of CPR training, a holistic solution remains absent. Current sensor-based methods lack postural oversight, XR systems struggle with technical usability and precise feedback, and AI approaches often neglect the communicative dimensions of emergency response. Consequently, there is a critical need for an integrated platform that synergizes the immersive accessibility of XR with the multimodal analytical precision of AI.
Therefore, this study proposes a novel CPR training system that overcomes physical constraints through a marker-based XR environment and applies vision-based AI models to precisely analyze and correct trainee motions in real time without separate wearable sensors. Furthermore, by integrating Large Language Models (LLMs) for verbal protocol evaluation, this approach distinguishes itself from existing studies, providing an environment where anyone can receive continuous, comprehensive retraining that covers both physical skills and communicative procedures, effectively bridging the gap between autonomous self-study and instructor-led training.
3. The Proposed XR-Based CPR Training System Design
This section presents the comprehensive design and implementation of the proposed AI-driven XR-based CPR training system. The core design objective is to establish a self-paced, immersive learning environment by integrating XR and AI technologies. Key components include sensor data analysis, automated trainee performance evaluation, and sLLM-based feedback generation for real-time corrective guidance. By synthesizing these diverse technological components, the system enables a holistic assessment of trainee performance without the need for human instructors.
3.1. System Architecture
As illustrated in Figure 1, the proposed system architecture is organized into three sequential phases designed to process trainee inputs and deliver corrective guidance: CPR Training Data Collection, AI-based Evaluation, and Personalized Feedback.
The first phase involves data acquisition from both a camera and sensors embedded within the CPR manikin kit. A smartphone camera captures real-time video of the trainee to monitor body posture and hand positioning. Simultaneously, sensors embedded within the manikin collect physical performance metrics, including pressure sensors for chest compression depth, and detectors for airway management and artificial respiration volume.
The second phase focuses on AI-based evaluation, where the collected multi-modal data is processed to assess trainee proficiency. The system analyzes sensor data to verify if compressions and ventilations meet clinical guidelines. Concurrently, vision algorithms analyze the camera feed to evaluate the trainee’s biomechanics. This comprehensive data analysis enables the system to objectively evaluate CPR performance across multiple dimensions.
The final phase closes the learning loop through personalized feedback generation. Based on the evaluation results, an sLLM generates context-specific feedback tailored to the trainee’s errors. This feedback is then delivered through the XR interface, providing real-time visual and auditory cues that guide the trainee in immediately correcting their actions.
3.2. Hardware and Software Configuration
The proposed system integrates specific hardware components to ensure accurate data acquisition and immersive interaction. As illustrated in Figure 2, the CPR manikin is embedded with three primary sensors: a Velostat (Adafruit, New York City, NY, USA) pressure sensor to measure chest compression depth, a YF-S401 flow sensor to monitor artificial respiration volume, and an MPU-6050 gyro sensor (TDK InvenSense, San Jose, CA, USA) to detect neck tilt angle for airway management. These sensors are connected to an ESP8266 Wi-Fi module, which transmits collected data to the central server in real time. For visual feedback and immersive guidance, the system utilizes the Microsoft HoloLens 2 (Microsoft, Redmond, WA, USA), while a separate smartphone camera is employed for external posture capture.
Regarding the software configuration, the system’s core logic and AI processing are implemented on a central server. The XR environment is developed using Unity, enabling the rendering of interactive 3D guides overlaid on the real world. The communication between the sensors, the AI server, and the HoloLens 2 is synchronized to ensure minimal latency in feedback delivery.
3.3. Data Analysis
To ensure comprehensive skill assessment, the proposed system evaluates trainee performance by analyzing data collected from integrated multi-modal hardware components. For sensor-based skill assessment, real-time data from embedded sensors within the manikin is compared against pre-defined clinical thresholds summarized in Table 1 to verify technical accuracy. Specifically, the Velostat pressure sensor validates whether compressions achieve the required depth of 5–6 cm, while the YF-S401 flow sensor verifies if the air volume delivered falls within the target range of 500–600 mL. Additionally, the MPU-6050 gyro sensor monitors if the neck is tilted to approximately 30 degrees to ensure an open airway.
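The threshold comparison described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: it assumes that raw sensor readings have already been converted to physical units, and the `SensorSample` structure, the function names, and the ±5 degree tilt tolerance are hypothetical choices introduced here for illustration. Only the target ranges (5–6 cm, 500–600 mL, approximately 30 degrees) come from the text.

```python
from dataclasses import dataclass

# Target ranges quoted in the text.
DEPTH_RANGE_CM = (5.0, 6.0)
VOLUME_RANGE_ML = (500.0, 600.0)
TARGET_NECK_TILT_DEG = 30.0
# Assumption: the text only says "approximately 30 degrees";
# this tolerance is a hypothetical value for the sketch.
TILT_TOLERANCE_DEG = 5.0


@dataclass
class SensorSample:
    """One reading per sensor, already converted to physical units."""
    depth_cm: float
    volume_ml: float
    neck_tilt_deg: float


def in_range(value, bounds):
    """Return True when value lies inside the inclusive bounds."""
    low, high = bounds
    return low <= value <= high


def evaluate_sample(sample):
    """Compare one sample against each threshold and report pass/fail."""
    tilt_error = abs(sample.neck_tilt_deg - TARGET_NECK_TILT_DEG)
    return {
        "compression_depth": in_range(sample.depth_cm, DEPTH_RANGE_CM),
        "ventilation_volume": in_range(sample.volume_ml, VOLUME_RANGE_ML),
        "airway_tilt": tilt_error <= TILT_TOLERANCE_DEG,
    }
```

A per-criterion result of this kind, rather than a single pass/fail flag, would let the feedback stage name the specific error to the trainee.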
To assess biomechanics through vision-based posture estimation, the system utilizes MoveNet Lightning V4, a lightweight pose estimation model optimized for mobile and edge devices. As illustrated in Figure 3, the workflow initiates by receiving video data from an external smartphone camera, from which the model extracts 17 key body landmarks. Based on this data, the algorithm calculates geometric angles using vector analysis, such as the perpendicularity of the arms relative to the chest and the distance between wrists. This analysis determines if the trainee is maintaining the correct posture, including straight elbows and vertical compression, and triggers real-time visual feedback accordingly.
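As an illustration of the vector analysis described above, the following Python sketch computes an elbow angle, the tilt of the arm from the vertical image axis, and the wrist distance from a list of 17 keypoints. It is a sketch under stated assumptions rather than the system's algorithm: keypoints are assumed to arrive as `(y, x, score)` tuples in the COCO ordering used by MoveNet and in aspect-corrected image coordinates, the camera is assumed to be roughly level, and the two thresholds (160 and 15 degrees) are hypothetical values not taken from the paper.

```python
import math

# Keypoint indices in the 17-point COCO ordering used by MoveNet.
L_SHOULDER, R_SHOULDER = 5, 6
L_ELBOW, R_ELBOW = 7, 8
L_WRIST, R_WRIST = 9, 10

# Hypothetical thresholds chosen for illustration only.
MIN_ELBOW_ANGLE_DEG = 160.0  # elbow counts as "straight" above this
MAX_ARM_TILT_DEG = 15.0      # arm counts as "vertical" below this


def _xy(keypoints, index):
    """Convert a (y, x, score) keypoint to an (x, y) point."""
    y, x, _score = keypoints[index]
    return (x, y)


def _angle_deg(u, v):
    """Angle between two 2D vectors in degrees."""
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        return 0.0
    cosine = (u[0] * v[0] + u[1] * v[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def elbow_angle(shoulder, elbow, wrist):
    """Interior angle at the elbow; 180 degrees means a straight arm."""
    upper = (shoulder[0] - elbow[0], shoulder[1] - elbow[1])
    lower = (wrist[0] - elbow[0], wrist[1] - elbow[1])
    return _angle_deg(upper, lower)


def arm_tilt(shoulder, wrist):
    """Angle between the shoulder-to-wrist line and the downward image axis."""
    arm = (wrist[0] - shoulder[0], wrist[1] - shoulder[1])
    return _angle_deg(arm, (0.0, 1.0))


def assess_posture(keypoints):
    """Return per-arm posture flags and the distance between wrists."""
    sides = {
        "left": (L_SHOULDER, L_ELBOW, L_WRIST),
        "right": (R_SHOULDER, R_ELBOW, R_WRIST),
    }
    result = {}
    for side, indices in sides.items():
        shoulder, elbow, wrist = (_xy(keypoints, i) for i in indices)
        angle = elbow_angle(shoulder, elbow, wrist)
        result[f"{side}_elbow_straight"] = angle >= MIN_ELBOW_ANGLE_DEG
        result[f"{side}_arm_vertical"] = arm_tilt(shoulder, wrist) <= MAX_ARM_TILT_DEG
    result["wrist_distance"] = math.dist(_xy(keypoints, L_WRIST), _xy(keypoints, R_WRIST))
    return result
```

In practice, keypoints with a low confidence score would likely need to be discarded before computing angles; that step is omitted here for brevity.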
Finally, as illustrated in Figure 4, verbal communication, such as calling for help or requesting an AED, is evaluated via an LLM-driven protocol utilizing a pipeline of voice activity detection (VAD), speech-to-text, and natural language processing (NLP). For speech recognition, the system employs Faster-Whisper accelerated by CTranslate2. The transcribed text is then analyzed by GPT-4o to evaluate the trainee’s speech against KACPR guidelines. This process assesses keyword matching and contextual appropriateness to generate both a quantitative score and qualitative feedback.
3.4. XR Tracking and Feedback Interface
To provide a stable and immersive training experience without manual intervention, the system employs marker-based tracking using ArUco markers and OpenCV on the HoloLens 2. By attaching ArUco markers to the manikin and AED kit, the system facilitates real-time recognition and tracking of the 3D coordinates of these training objects. This spatial registration enables the seamless overlay of digital guides, including compression points and hand placement indicators, directly onto the physical equipment.
To mitigate technical challenges such as jitter and sensor noise inherent in AR tracking, the system incorporates specific signal processing techniques. Median and Moving Average filters are applied to raw coordinate data to remove outliers and stabilize the movement of virtual objects. Concurrently, rotations are represented as quaternions and interpolated with Spherical Linear Interpolation (Slerp), which avoids the gimbal lock associated with Euler angles and ensures fluid visual updates. Through this configuration, the system achieves real-time, Six Degrees of Freedom (6-DoF) tracking, enabling trainees to receive precise visual feedback on actions like hand positioning and compression rhythm directly within their field of view.
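The smoothing steps above can be sketched in Python as follows. This is an illustrative sketch, not the system's code (which runs in Unity on the HoloLens 2): the class and function names, the window sizes, the `(w, x, y, z)` quaternion layout, and the choice to chain the median filter before the moving average are all assumptions made here.

```python
import math
from collections import deque
from statistics import median


class PositionSmoother:
    """Median filter followed by a moving average, applied per axis."""

    def __init__(self, median_window=5, average_window=5):
        # Window sizes are hypothetical; the paper does not state them.
        self._raw = deque(maxlen=median_window)
        self._medians = deque(maxlen=average_window)

    def update(self, position):
        """Add one raw (x, y, z) sample and return the smoothed position."""
        self._raw.append(position)
        # The median over the recent window rejects isolated outliers.
        med = tuple(median(axis) for axis in zip(*self._raw))
        self._medians.append(med)
        # The moving average of the medians removes residual jitter.
        count = len(self._medians)
        return tuple(sum(axis) / count for axis in zip(*self._medians))


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to take the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: fall back to linear interpolation.
        result = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        result = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in result))
    return tuple(c / norm for c in result)
```

Larger windows give steadier overlays at the cost of added lag, so the window sizes would need to be tuned against the latency budget of the feedback loop.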
4. Experiments and Results
This section presents the experimental results and performance evaluations of the core components in the proposed XR-based CPR training system. The evaluations are based on system verification tests conducted in alignment with the CPR certification protocols established by KACPR, focusing on sensor accuracy, LLM-based verbal assessment, marker-based object tracking, and system integration. These tests were designed to validate the system’s precision, low-latency responsiveness, and overall efficiency in delivering personalized feedback. The results demonstrate the system’s robustness in facilitating effective training, substantiated by quantitative metrics including accuracy rates, response times, and error reduction percentages.
4.1. Speech-to-Text (STT) Performance Optimization
Verbal communication (e.g., calling for help, designating a specific person) is a critical component of the KACPR CPR certification protocols. We evaluated the system’s STT module to ensure it meets real-time requirements for accurate protocol verification. To identify the optimal balance between recognition accuracy and processing speed, we first benchmarked five variants of the Whisper model. Key performance metrics included word error rate (WER), where lower is better, and real-time factor (RTF), defined as the ratio of processing time to audio duration.
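For reference, the two metrics can be computed as in the following Python sketch. WER is taken here in its standard form, the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words; the function names and the whitespace tokenization are assumptions of this sketch, and the paper does not state how its benchmark tokenized Korean speech.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming Levenshtein distance, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Ratio of processing time to audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

Under this definition, an RTF of 0.50 corresponds to, for example, 5 s of processing for 10 s of audio.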
As presented in Table 2, experimental benchmarking indicated that the ‘Medium’ model achieved the lowest WER (2.9%), ensuring high accuracy for capturing critical medical terminology, while maintaining an RTF of 0.50. This signifies that the model can process audio twice as fast as real-time playback. Based on these results, we selected the Medium model as the core engine for our system.
To further enhance real-time performance, we deployed the selected Medium model using Faster-Whisper, utilizing CTranslate2 for acceleration. This optimization leverages 8-bit quantization and efficient inference to minimize memory usage and latency. Additionally, a VAD filter was integrated to process only valid speech segments, filtering out background noise.
Finally, we compared the overall system latency and accuracy between the initial sequential processing approach and the proposed concurrent architecture using the optimized Medium model. The results in Table 3 demonstrate that the proposed architecture reduced the total processing time from 5 s to under 0.5 s and improved recognition accuracy from 50% to 95%. This indicates that the Medium model, combined with Faster-Whisper optimization, provides the necessary speed and reliability for real-time CPR training feedback.
4.2. LLM-Based Verbal Evaluation
To automate the human instructor’s role in evaluating verbal protocols, we conducted internal functional testing based on the 2020 KACPR Guidelines, which represented the most recent standards at the time of this study. The LLMs were selected to compare two dominant paradigms: the open-source LLaMA series, to explore the feasibility of privacy-focused local or edge deployment, and the commercial GPT series, to establish state-of-the-art performance benchmarks. The primary objective was to identify an engine capable of balancing rapid response speed with high evaluation accuracy within a Korean-speaking training environment.
As summarized in Table 4, internal test results revealed varying performance characteristics among the evaluated models. The fine-tuned LLaMA models exhibited an average latency of approximately 3 s and accuracy rates between 63% and 73%. While these models offer advantages for local deployment, their current performance presented challenges in meeting the rigorous demands for real-time pedagogical feedback. Conversely, GPT-4o demonstrated superior performance by achieving the highest evaluation accuracy of 88% with a rapid response time of 0.3 to 1.5 s. Consequently, GPT-4o was selected as the optimal model to provide immediate and context-aware feedback to trainees.
Leveraging the selected GPT-4o model, we implemented an evaluation pipeline consisting of three stages: text correction, content evaluation, and score calculation. In the first stage, the AI mitigates potential transcription errors from the STT process to ensure data integrity. To maintain assessment reliability, a specific System Prompt defines the AI’s role as an Emergency Response Evaluator while incorporating the exact procedural steps and correct answer guidelines from the certification protocols. The AI evaluates trainee utterances on two primary metrics. The first, Keyword Matching, verifies the presence of essential terms such as “CPR”, “AED”, or “119”, the emergency number in South Korea.
The second, Contextual Appropriateness, confirms whether the intent and timing of the command align with the specific 2020 guideline scenario. Based on these criteria, the system calculates a quantitative score and generates qualitative feedback to guide the trainee. This structured approach ensures that the automated feedback is not only rapid but also pedagogically valid and aligned with KACPR CPR certification standards.
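To make the score calculation stage concrete, the following Python sketch combines a keyword check with a contextual judgment into a single score. It is a hypothetical illustration only: in the described system the contextual judgment is produced by GPT-4o, whereas here it is passed in as a boolean, and the step names, keyword lists, and the 60/40 point split are assumptions not taken from the paper.

```python
# Hypothetical protocol steps and the keywords expected in each.
REQUIRED_KEYWORDS = {
    "call_for_help": ["119"],
    "request_aed": ["AED"],
}

# Hypothetical point split between the two metrics.
KEYWORD_POINTS = 60
CONTEXT_POINTS = 40


def match_keywords(utterance, keywords):
    """Split the expected keywords into those found and those missing."""
    text = utterance.lower()
    matched = [k for k in keywords if k.lower() in text]
    missing = [k for k in keywords if k.lower() not in text]
    return matched, missing


def score_utterance(step, utterance, context_ok):
    """Combine keyword matching with an externally supplied context judgment.

    context_ok stands in for the LLM's verdict on whether the intent and
    timing of the utterance fit the scenario.
    """
    keywords = REQUIRED_KEYWORDS[step]
    matched, missing = match_keywords(utterance, keywords)
    keyword_part = round(KEYWORD_POINTS * len(matched) / len(keywords))
    context_part = CONTEXT_POINTS if context_ok else 0
    return {
        "score": keyword_part + context_part,
        "matched": matched,
        "missing": missing,
    }
```

For example, `score_utterance("call_for_help", "You in the red shirt, please call 119", True)` yields a score of 100 under these assumed weights, while the same utterance judged out of context yields 60.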
4.3. XR Tracking Stability and Precision
Precise and stable tracking of training equipment is fundamental to the XR training experience. Conventional AR systems often require manual calibration, which can disrupt training flow [28]. To evaluate the system’s tracking robustness, we verified the performance of the automated marker-based tracking architecture utilizing ArUco markers and OpenCV. This framework was integrated with the HoloLens 2’s native spatial awareness to verify its ability to seamlessly synchronize virtual guides with physical equipment.
The system enables comprehensive performance monitoring by fusing diverse tracking modalities. Specifically, the ArUco framework derives the 3D coordinates of the manikin and AED pads for object alignment, while the HoloLens 2’s built-in sensors simultaneously handle eye tracking for visual attention analysis and hand tracking for procedure verification. This automated, multi-modal recognition ensures that digital guides are overlaid accurately onto physical objects the moment they enter the user’s field of view, obviating the need for manual setup.
To ensure visual stability during dynamic movements, the system’s multi-layered optimization pipeline was rigorously tested. Raw tracking data, which typically suffers from sensor noise and jitter, was stabilized by applying Median and Moving Average Filters. These filters effectively removed sudden outliers, ensuring smooth movement over temporal windows. Furthermore, Quaternion interpolation was utilized for rotational data, which successfully prevented gimbal lock artifacts and ensured fluid transitions. Through these optimizations, the system achieved high-precision 6-DoF tracking, where virtual feedback remained spatially locked to the physical world with minimal drift, validating the system’s suitability for immersive medical training.
4.4. Summary of Experiments
The comprehensive experimental results empirically validate the technical feasibility and system reliability of the proposed AI-driven XR-based CPR training system. Through rigorous benchmarking of key components against CPR certification protocols, we demonstrated significant optimizations, specifically in STT latency, LLM evaluation accuracy, and XR tracking stability, that directly address the limitations of conventional training methods.
Collectively, these results confirm that the proposed system operates with the real-time responsiveness, precision, and robustness required for effective self-paced CPR training. By automating the evaluation process and providing instant, personalized guidance, the system presents a scalable solution for broadly disseminating high-quality life-saving skills to the general public.
5. Discussions
This study aimed to develop an AI-integrated XR training system capable of functioning as an autonomous instructor. However, it is important to note that the proposed system was experimentally validated using a manikin in a controlled environment. Consequently, the results may differ when applied to the complex and unpredictable nature of actual clinical settings. Additionally, for the system to be practically utilized in actual CPR training curriculums, validation by skilled professional CPR instructors is essential. Therefore, further research is required to verify its educational effectiveness and reliability in real-world scenarios.
In this section, we discuss how the proposed system addresses the critical limitations of conventional training methods identified in Section 2, specifically focusing on biomechanical precision, XR usability, and verbal protocol evaluation.
5.1. Enhancing Biomechanical Precision via Vision AI
As discussed in Section 2.1, conventional sensor-based systems are limited to measuring quantitative outcomes like compression depth, creating a “blind spot” regarding the trainee’s biomechanical posture [13,14]. This reliance on incomplete data necessitates human supervision to prevent the formation of improper motor habits that lead to rescuer fatigue. In contrast, the proposed system overcomes this limitation by integrating MoveNet-based pose estimation with standard sensor data. By analyzing the verticality of the arms and the distance between wrists in real time, the system provides corrective feedback on the ‘input’ mechanics (posture) alongside the ‘output’ metrics (depth). This multi-modal approach ensures that trainees acquire not only the correct compression force but also the ergonomically efficient posture required for sustained CPR, effectively reducing the pedagogical burden on human instructors.
5.2. Robust Tracking and Usability in XR Environments
Section 2.2 highlighted that while XR offers immersion, technical hurdles such as cumbersome manual calibration and registration errors hinder its practical application [22]. Furthermore, system latency often disrupts the rhythmic feedback loop essential for motor skill acquisition [23]. Our system addresses these usability challenges through an automated marker-based tracking architecture. By utilizing ArUco markers combined with Median and Moving Average filters, the system achieves precise 6-DoF tracking without manual setup, ensuring that virtual guides are seamlessly overlaid onto physical manikins. Moreover, the experimental results (Table 3) demonstrate that our optimized concurrent architecture reduces total processing latency to under 0.5 s. This “real-time” responsiveness minimizes the delay between the user’s action and the system’s feedback, supporting the system’s use as a precise skill evaluation tool rather than a mere visual simulator.
5.3. Real-Time Verbal Assessment with LLM Integration
While previous AI-driven studies have advanced kinematic analysis, Section 2.3 identified a critical gap in the evaluation of communicative protocols, with existing systems lacking the capability to understand the context of trainee utterances [27]. To bridge this gap, this study introduces a Generative AI evaluator utilizing GPT-4o. Unlike simple keyword-spotting algorithms, our system analyzes the semantic context of the trainee’s commands (e.g., requesting an AED) with an evaluation accuracy of 88% (Table 4). This allows for the evaluation of not just what was said, but whether it was appropriate for the specific emergency scenario. By successfully integrating this verbal assessment into the immersive XR loop, the system provides a holistic training experience that covers the full spectrum of BLS requirements: physical, procedural, and communicative.
5.4. Comparative Summary
Table 5 summarizes the comparison between existing training approaches and the proposed system. While sensor-based kits focus on quantitative metrics and conventional XR systems prioritize visual immersion, the proposed AI-XR system synergizes these elements. It offers a comprehensive solution that ensures biomechanical accuracy, tracking stability, and verbal proficiency, thereby effectively functioning as an autonomous training platform.
6. Conclusions
This study proposed and implemented an AI-integrated XR training system designed to address the spatiotemporal constraints and pedagogical limitations of traditional instructor-led CPR training. Unlike previous studies that relied on partial automation, this research focused on validating the technical feasibility of a fully autonomous training loop by integrating vision-based pose estimation, multi-modal sensor analysis, and LLM-driven verbal evaluation.
The experimental results support the system’s capability to function as a reliable training aid. First, regarding real-time responsiveness, the proposed concurrent architecture utilizing the optimized Faster-Whisper model was shown experimentally to reduce STT processing latency to under 0.5 s while maintaining a recognition accuracy of 95%. This indicates that the system meets the low-latency requirements essential for providing immediate feedback during dynamic CPR sessions. Second, in terms of protocol evaluation, the generative AI evaluator powered by GPT-4o achieved an accuracy of 88% in assessing verbal commands against KACPR guidelines, demonstrating its potential to take over the evaluation of communicative proficiency from human instructors.
Furthermore, the usability issues associated with XR technology were addressed through the implementation of an automated marker-based tracking system. By applying Median and Moving Average filters, the system achieved stable 6-DoF tracking without the need for manual calibration, ensuring that virtual guides were accurately registered to the physical manikin. This addresses the limitation of registration errors identified in prior AR-based training research. Additionally, the integration of MoveNet-based pose estimation with sensor data closed the “blind spot” of conventional feedback devices, enabling the system to correct the trainee’s biomechanical posture in real time.
However, this study has limitations, as the validation was conducted using a manikin in a controlled environment. While the technical components demonstrated high reliability, the system’s feasibility in complex, real-world clinical scenarios remains to be verified. In conclusion, this research established a PoC for an AI- and XR-integrated CPR training system, providing a technical foundation for autonomous, self-paced basic life-saving education.
Author Contributions
Conceptualization, J.K. and W.-T.K.; methodology, J.K. and W.-T.K.; software, J.K.; validation, J.K. and W.-T.K.; formal analysis, J.K. and W.-T.K.; investigation, J.K. and W.-T.K.; resources, J.K. and W.-T.K.; data curation, J.K. and W.-T.K.; writing—original draft preparation, J.K. and W.-T.K.; writing—review and editing, J.K. and W.-T.K.; visualization, J.K.; supervision, W.-T.K.; project administration, W.-T.K.; funding acquisition, W.-T.K. All authors have read and agreed to the published version of the manuscript.
Funding
This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2022-II220454, Technology development of smart edge device SW development platform) and the BK-21 four program through National Research Foundation of Korea (NRF) under Ministry of Education.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Correction Statement
This article has been republished with a minor correction to the Funding statement. This change does not affect the scientific content of the article.
References
- Kosugi, S.; Shinouchi, K.; Ueda, Y.; Abe, H.; Sogabe, T.; Ishida, K.; Mishima, T.; Ozaki, T.; Takayasu, K.; Iida, Y.; et al. Clinical and Angiographic Features of Patients with Out-of-Hospital Cardiac Arrest and Acute Myocardial Infarction. JACC 2020, 76, 1934–1943.
- Jung, E.; Park, J.H.; Ro, Y.S.; Ryu, H.H.; Cha, K.-C.; Shin, S.D.; Hwang, S.O.; Lee, M.J.; Park, J.-H.; Kim, S.J.; et al. Cardiac Arrest Pursuit Trial with Unique Registration, Epidemiologic Surveillance (CAPTURES) project investigators. Family history, socioeconomic factors, comorbidities, health behaviors, and the risk of sudden cardiac arrest. Sci. Rep. 2023, 13, 21341.
- Martin, S.S.; Aday, A.W.; Allen, N.B.; Almarzooq, Z.I.; Anderson, C.A.; Arora, P.; Avery, C.L.; Baker-Smith, C.M.; Bansal, N.; Beaton, A.Z.; et al. 2025 Heart disease and stroke statistics: A report of US and global data from the American Heart Association. Circulation 2025, 151, 41–660.
- Gupta, K.; Nguyen, D.D.; Kennedy, K.F.; Chan, P.S. Time to bystander cardiopulmonary resuscitation by patient sex for out-of-hospital cardiac arrest. Resuscitation 2024, 196, 110126.
- Patil, K.D.; Halperin, H.R.; Becker, L.B. Cardiac arrest: Resuscitation and reperfusion. Circ. Res. 2015, 116, 2041–2049.
- Hansen, S.M.; Hansen, C.M.; Folke, F.; Rajan, S.; Kragholm, K.; Ejlskov, L.; Gislason, G.; Køber, L.; Gerds, T.A.; Hjortshøj, S.; et al. Bystander Defibrillation for Out-of-Hospital Cardiac Arrest in Public vs Residential Locations. JAMA Cardiol. 2017, 2, 507–514.
- International Liaison Committee on Resuscitation (ILCOR). 2025 International Consensus on Cardiopulmonary Resuscitation and Emergency Cardiovascular Care Science with Treatment Recommendations (CoSTR). Available online: https://ilcor.org/uploads/EIT-2025-COSTR-Full-Chapter.pdf (accessed on 9 January 2026).
- Del Rios, M.; Bartos, J.A.; Panchal, A.R.; Atkins, D.L.; Cabañas, J.G.; Cao, D.; Dainty, K.N.; Dezfulian, C.; Donoghue, A.J.; Drennan, I.R.; et al. Part 1: Executive summary: 2025 American Heart Association guidelines for cardiopulmonary resuscitation and emergency cardiovascular care. Circulation 2025, 152, 284–312.
- Donoghue, A.J.; Auerbach, M.; Banerjee, A.; Blewer, A.L.; Cheng, A.; Kadlec, K.D.; Lin, Y.; Diederich, E.; Sawyer, T.; Stallings, D.T.; et al. Part 12: Resuscitation education science: 2025 American Heart Association guidelines for cardiopulmonary resuscitation and emergency cardiovascular care. Circulation 2025, 152, 719–750.
- Hwang, S.O.; Cha, K.-c.; Chung, S.P.; Kim, Y.-M.; Park, J.D.; Kim, H.-S.; Lee, M.J.; Na, S.-H.; Cho, G.C.; Kim, A.-R.E.; et al. 2020 Korean Cardiopulmonary Resuscitation Guidelines. Public Health Wkly. Rep. 2021, 14, 358–369.
- Augusto, J.B.; Santos, M.B.; Faria, D.; Alves, P.; Roque, D.; Morais, J.; Gil, V.; Morais, C. Real-Time Visual Feedback Device Improves Quality of Chest Compressions: A Manikin Study. Bull. Emerg. Trauma 2020, 8, 135–141.
- Lin, Y.; Lockey, A.; Donoghue, A.; Greif, R.; Cortegiani, A.; Farquharson, B.; Siddiqui, F.J.; Banerjee, A.; Matsuyama, T.; Cheng, A. Use of CPR Feedback Devices in Resuscitation Training: A Systematic Review and meta-analysis of randomized controlled trials. Resusc. Plus 2025, 23, 100939.
- Huang, L.-W.; Chan, Y.-W.; Tsan, Y.-T.; Zhang, Q.-X.; Chan, W.-C.; Yang, H.-H. Implementation of a Smart Teaching and Assessment System for High-Quality Cardiopulmonary Resuscitation. Diagnostics 2024, 14, 995.
- Di Mitri, D.; Schneider, J.; Specht, M.; Drachsler, H. Detecting Mistakes in CPR Training with Multimodal Data and Neural Networks. Sensors 2019, 19, 3099.
- Abelairas-Gómez, C.; Rey, E.; González-Salvado, V.; Mecías-Calvo, M.; Rodríguez-Ruiz, E.; Rodríguez-Núñez, A. Acute muscle fatigue and CPR quality assisted by visual feedback devices: A randomized-crossover simulation trial. PLoS ONE 2018, 13, 0203576.
- Anderson, T.M.; Secrest, K.; Krein, S.L.; Schildhouse, R.; Guetterman, T.C.; Harrod, M.; Trumpower, B.; Kronick, S.L.; Pribble, J.; Chan, P.S.; et al. Best Practices for Education and Training of Resuscitation Teams for In-Hospital Cardiac Arrest. Circ. Cardiovasc. Qual. Outcomes 2021, 14, e008587.
- Walton, R.; Riha, J.; Swor, T.; Kopper, J.; Yuan, L.; Mochel, J.; Hoen, M.T.; Blong, A. Comparison of Traditional Didactic Versus Additional Hands-On Simulation Training in the Performance of Basic Life Support in Veterinary Students-A Prospective, Blinded, Randomized Study. J. Veter. Med. Educ. 2024, 51, 38–43.
- Fugate, J.M.B.; Tonsager, M.J.; Macrine, S.L. Immersive Extended Reality (I-XR) in Medical and Nursing for Skill Competency and Knowledge Acquisition: A Systematic Review and Implications for Pedagogical Practices. Behav. Sci. 2025, 15, 468.
- Sun, R.; Wang, Y.; Wu, Q.; Wang, S.; Liu, X.; Wang, P.; He, Y.; Zheng, H. Effectiveness of virtual and augmented reality for cardiopulmonary resuscitation training: A systematic review and meta-analysis. BMC Med. Educ. 2024, 24, 730.
- Semeraro, F.; Ristagno, G.; Giulini, G.; Gnudi, T.; Kayal, S.; Monesi, A.; Tucci, R.; Scapigliati, A. Virtual reality cardiopulmonary resuscitation (CPR): Comparison with a standard CPR training mannequin. Resuscitation 2019, 135, 308–312.
- Cheng, A.; Fijacko, N.; Lockey, A.; Greif, R.; Abelairas-Gómez, C.; Gosak, L.; Lin, Y. Use of augmented and virtual reality in resuscitation training: A systematic review. Resusc. Plus 2024, 18, 100643.
- Lee, Y.; Kim, S.K.; Choi, J.; Park, G.W.; Go, Y. Usability of CPR Training System based on Extended Reality. J. Internet Things Converg. 2022, 8, 115–122.
- Trinh, G.; McAdams, R.M. A pilot study of a virtual reality-based simulation platform for Neonatal Resuscitation Program training. J. Perinatol. 2025, 45, 521–526.
- Watanabe, R.; Islam, J.; Zhu, X.; Kaneko, E.; Iseki, K.; Jing, L. Chest Compression Skill Evaluation System Using Pose Estimation and Web-Based Application. Appl. Sci. 2025, 15, 8252.
- Iijima, Y.; Zhu, X.; Jing, L.; Pei, Y.; Kaneko, Y.; Iseki, K. Chest Compression Evaluation based on Pose Estimation. WSEAS Trans. Biol. Biomed. 2024, 21, 323–330.
- Parizad, R.; Hatwal, J.; Javanshir, E.; Batta, A.; Mohan, B. Artificial Intelligence in Cardiopulmonary Resuscitation: Revolutionizing Resuscitation Through Precision and Prediction—A Narrative Review. Vasc. Health Risk Manag. 2025, 21, 847–857.
- Roveta, A.; Castello, L.M.; Massarino, C.; Francese, A.; Ugo, F.; Maconi, A. Artificial Intelligence in Medical Education: A Narrative Review on Implementation, Evaluation, and Methodological Challenges. AI 2025, 6, 227.
- Weerasinghe, K.; Ge, X.; Heick, T.; Wijayasingha, L.; Cortez, A.; Satpathy, A.; Stankovic, J.; Alemzadeh, H. EgoEMS: A High-Fidelity Multimodal Egocentric Dataset for Cognitive Assistance in Emergency Medical Services. arXiv 2025, arXiv:2511.09894.