Article

Automating the Evaluation of Artificial Respiration: A Computer Vision Approach

1 School of Disaster and Emergency Medicine, Tianjin University, Tianjin 300072, China
2 Wenzhou Safety (Emergency) Institute of Tianjin University, Tianjin University, Wenzhou 325000, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2026, 16(1), 555; https://doi.org/10.3390/app16010555
Submission received: 12 November 2025 / Revised: 28 December 2025 / Accepted: 30 December 2025 / Published: 5 January 2026

Abstract

Traditional cardiopulmonary resuscitation (CPR) training faces limitations such as instructor dependency, low efficiency, and subjective assessment. To address these issues, this study proposes a novel computer vision-based method for the automated and objective evaluation of artificial respiration, shifting focus to the long-overlooked ventilation component. We developed an evaluation framework integrating human pose estimation and a spatio-temporal graph convolutional network (ST-GCN): OpenPose is first used to extract skeletal keypoints of the rescuer, and an ST-GCN then classifies and recognizes the actions, including chest compressions, airway opening, and artificial breathing. Based on the American Heart Association (AHA) guidelines, this research defines and implements five quantitative metrics for ventilation quality: CPR operation procedure, chin-frontal angle, interruption time, ventilation time, and ventilation frequency. An automated scoring model was established accordingly. Validated on a self-constructed dataset containing multi-source videos, the model achieved an accuracy of 87.64% in recognizing artificial respiration actions and 84.47% in evaluating action standardization. Experimental results demonstrate that the system can effectively and objectively evaluate the quality of artificial respiration. Compared with traditional instructor-dependent approaches, this study provides a low-cost, scalable technical solution, offering a new pathway for promoting high-quality CPR training.

1. Introduction

Cardiac arrest (CA) is a leading cause of mortality in developed nations and represents a significant global public health concern. High-quality cardiopulmonary resuscitation (CPR) is critical for successful resuscitation outcomes [1]. The low survival rate from out-of-hospital cardiac arrest (OHCA) in China is often contextualized by low bystander intervention rates. While direct nationwide comparators for China are scarce, data from Europe indicate an average bystander CPR rate of 58%, underscoring a potential area for systemic improvement [2,3,4]. The low rate of mouth-to-mouth resuscitation is mainly due to low willingness to help and insufficient training in artificial respiration [5]. Therefore, popularizing scientific first aid knowledge and artificial respiration skills is crucial for improving the success rate of CPR.
Conventional cardiopulmonary resuscitation (CPR) training primarily relies on instructor-led sessions using manikins. This approach is constrained by instructor availability, subjective assessment, and a lack of scalable objective feedback [6]. The 2020 American Heart Association (AHA) Guidelines for Cardiopulmonary Resuscitation and Emergency Cardiovascular Care recommend the integration of CPR feedback devices into training, and numerous comparative studies have demonstrated that such devices significantly improve training outcomes over conventional methods. Recent research has explored the use of human pose estimation (HPE) for automated feedback in CPR training. Song et al. [7] employed AlphaPose with dual ZED cameras to develop a compression posture detection model capable of accurately identifying incorrect compression postures. Weiss et al. [8] applied pose estimation techniques to extract two-dimensional skeletal keypoints, calculate arm angles and chest-to-chest distance metrics, and compare them with expert evaluations, showing that pose-based metrics provide a more detailed and objective assessment of both individual and team CPR performance. Huang et al. [9] introduced SmartCPR, an intelligent training evaluation system that enables users to automatically recognize and assess whether their chest compressions meet the standards of high-quality CPR using only a smartphone. To date, however, work applying machine vision [10] and deep learning techniques [11,12,13] has focused primarily on the evaluation of chest compressions; the automated and quantitative assessment of artificial respiration remains a significantly underexplored research gap.
This study proposes a novel vision-based framework for the automated assessment of artificial respiration quality. First, an integrated visual framework dedicated to artificial respiration evaluation is introduced, which combines 2D HPE [14,15] with a Spatial-Temporal Graph Convolutional Network (ST-GCN) [16] to identify different CPR phases and subsequently analyze key performance indicators during the ventilation phase. Second, based on AHA guidelines [17,18] and empirical biomechanical analysis, five quantitative metrics for artificial respiration are defined and implemented: CPR operation procedure, chin–frontal angle, interruption time, ventilation time, and ventilation frequency. Third, a dedicated CPR action dataset comprising multi-source videos is constructed and publicly released, and system performance is validated. Experimental results demonstrate that the proposed framework performs effectively in both action recognition and standardized assessment of ventilation quality.
This paper first outlines the system architecture and the OpenPose model. Subsequently, it explores the criteria for optimal airway opening and details the algorithm’s implementation on the basis of these criteria. Finally, an experimental study is conducted to evaluate the effectiveness of the proposed approach.

2. Materials and Methods

2.1. System Architecture

In this study, we independently developed and established a CPR action database, implemented the deep learning-based OpenPose (version 1.7.0) algorithm to accurately estimate the two-dimensional (2D) coordinates of key human body points, and then constructed an ST-GCN framework for action feature extraction and pattern recognition. This technical approach enables a systematic evaluation of the temporal logic accuracy of CPR procedures and facilitates the quantitative analysis of key parameters, such as the degree of airway opening and ventilation duration during artificial respiration. Ultimately, this work aims to develop a CPR quality assessment system based on multimodal data fusion. As shown in Figure 1, the system architecture comprises four key stages: video capture and preprocessing, human pose estimation, human posture recognition, and motion evaluation.

2.2. Human Pose Estimation

A camera was positioned to capture a frontal view of the trainee performing the maneuver, with both the operator and the upper body of the CPR simulator within the field of view. A sample CPR video is provided in the Supplementary Material (Video S1). FFmpeg is an audio-video processing tool that performs operations such as decoding, encoding, transcoding, and adjusting the resolution and frames per second (FPS). In this study, we utilized the FFmpeg package in Python (version 3.7) to standardize the video format to MP4, set the resolution to 320 × 240, and fix the FPS at 30. Pose estimation was conducted following format conversion. For human pose estimation, we employed OpenPose, a deep learning-based 2D human body estimation algorithm capable of real-time pose detection for single or multiple individuals. OpenPose operates via a bottom-up approach, initially identifying all key points within an image before assigning and connecting them. Unlike top-down approaches, its computational complexity is independent of the number of individuals present, resulting in higher efficiency and robustness.
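As an illustration, this standardization step can be scripted by invoking the FFmpeg command-line tool from Python. The following is a minimal sketch only: the helper name and folder layout are hypothetical, and the exact FFmpeg binding used in this study may differ.

```python
import subprocess
from pathlib import Path

def standardize_video(src: str, dst: str, width: int = 320,
                      height: int = 240, fps: int = 30) -> None:
    """Re-encode a CPR video to MP4 at a fixed resolution and frame rate."""
    cmd = [
        "ffmpeg", "-y",                      # overwrite output without prompting
        "-i", src,                           # input video in any container/codec
        "-vf", f"scale={width}:{height}",    # resize to 320 x 240
        "-r", str(fps),                      # resample to 30 frames per second
        "-an",                               # drop audio (not needed for pose estimation)
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    Path("processed").mkdir(exist_ok=True)
    for video in Path("raw_videos").glob("*.*"):
        standardize_video(str(video), f"processed/{video.stem}.mp4")
```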
The BODY 25 model of OpenPose was selected to detect and extract the coordinate data and confidence scores of 25 key points from the operator in the video [19] (Figure 2). Since artificial respiration involves minimal lower-body movement, only relevant key points were analyzed for assessing airway opening, ventilation duration, interruption time, and ventilation frequency. These key points included the neck, mid-hip, bilateral arms, and index fingers of both hands. Additionally, a filtering process was applied to eliminate local extreme point artifacts, enhancing data reliability while preserving physiological realism.
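The snippet below sketches how the per-frame BODY_25 keypoints written by OpenPose's JSON output can be stacked into trajectory arrays and smoothed. The median filter is our assumed choice for suppressing local extreme point artifacts, and the directory name and helper functions are illustrative only.

```python
import json
from pathlib import Path

import numpy as np
from scipy.signal import medfilt

# BODY_25 indices for keypoints referenced in this work.
NOSE, NECK, R_WRIST, L_WRIST, MID_HIP = 0, 1, 4, 7, 8

def load_keypoint_series(json_dir: str) -> np.ndarray:
    """Stack per-frame OpenPose BODY_25 keypoints into a (frames, 25, 3) array.

    OpenPose stores each detected person as a flat 'pose_keypoints_2d' list of
    25 * (x, y, confidence) values in one JSON file per frame.
    """
    frames = []
    for f in sorted(Path(json_dir).glob("*_keypoints.json")):
        data = json.loads(f.read_text())
        if data["people"]:
            kp = np.asarray(data["people"][0]["pose_keypoints_2d"]).reshape(25, 3)
        else:
            kp = np.zeros((25, 3))  # no detection in this frame
        frames.append(kp)
    return np.stack(frames)

def smooth_trajectory(series: np.ndarray, kernel: int = 5) -> np.ndarray:
    """Median-filter a 1-D coordinate trajectory to remove spurious extreme points."""
    return medfilt(series, kernel_size=kernel)

keypoints = load_keypoint_series("openpose_output")
nose_y = smooth_trajectory(keypoints[:, NOSE, 1])   # vertical nose trajectory
```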

2.3. Operation Process Judgment

CPR must adhere to the standardized compression–airway–breathing sequence [17]; therefore, action recognition technology is utilized to analyze and assess the CPR process. As shown in Figure 3, the judgment algorithm for the CPR process consists of three key stages: dataset construction, action classification model training, and procedural evaluation.
The CPR action dataset used for training the action recognition model was derived from two primary sources: demonstrations by professional CPR instructors and publicly available CPR videos collected from the internet. To enhance the model’s generalization ability, the dataset includes individuals with varying heights, weights, and body proportions. The dataset comprises a total of 723 action segments, categorized into chest compressions, airway opening, artificial respiration, and other actions. An ST-GCN [20] is then applied to the keypoint coordinates extracted from all video frames, transforming static skeletal frames into temporal skeletal representations for action recognition. Table 1 shows the action categories and corresponding sample counts in the self-constructed dataset.
The architecture of the ST-GCN is shown in Figure 4. Initially, the input skeletal point behavior tensor is normalized. The spatiotemporal features are subsequently extracted via a stack of nine spatiotemporal convolutional layers, each comprising one graph convolutional network (GCN) unit and three temporal convolutional network (TCN) units. The model is optimized via stochastic gradient descent (SGD) [21], followed by global average pooling, which standardizes the dimensions of the output feature maps. Finally, a fully connected Softmax layer classifies the input tensor’s action category, and the output feature information is backpropagated through the network to facilitate learning in an end-to-end manner.
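For orientation, the following is a simplified PyTorch sketch of a single spatio-temporal block (one spatial graph convolution followed by one temporal convolution). It omits the spatial partitioning strategy, residual connections, and the three stacked TCN units per layer described above, so it should be read as an illustration of the data flow rather than the exact architecture used in this study.

```python
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """One spatio-temporal block: spatial graph convolution, then temporal convolution.

    A is the (V, V) skeleton adjacency matrix (with self-loops), assumed to be
    normalized beforehand.
    """

    def __init__(self, in_channels: int, out_channels: int, A: torch.Tensor,
                 temporal_kernel: int = 9, stride: int = 1):
        super().__init__()
        self.register_buffer("A", A)
        # Spatial GCN: 1x1 convolution over channels, then neighbourhood aggregation via A.
        self.gcn = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal convolution over the T axis only.
        pad = (temporal_kernel - 1) // 2
        self.tcn = nn.Sequential(
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels,
                      kernel_size=(temporal_kernel, 1),
                      stride=(stride, 1), padding=(pad, 0)),
            nn.BatchNorm2d(out_channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V)
        x = self.gcn(x)
        x = torch.einsum("nctv,vw->nctw", x, self.A)  # aggregate over the skeleton graph
        return self.relu(self.tcn(x))

# Usage with placeholder adjacency (the real graph links physically connected joints).
A = torch.eye(18)
block = STGCNBlock(3, 64, A)
out = block(torch.randn(16, 3, 150, 18))  # -> (16, 64, 150, 18)
```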
To enhance computational efficiency and focus on large-scale motion recognition driven by key joints of the head, torso, and major limbs, 18 keypoints from the BODY 25 model were selected. Detailed foot keypoints (such as Toe and Heel) as well as the MidHip point were excluded. The input skeletal point behavior time series tensor is formatted as [B, C, T, V, M]. B denotes the training batch size; C denotes the joint feature dimension; T denotes the number of keyframes; V denotes the number of joints; and M denotes the number of people included in the keyframes. Table 2 summarizes the parameter settings for the model constructed in this study.
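To make the [B, C, T, V, M] layout concrete, the sketch below packs one keypoint clip into that format with the dimensions from Table 2 (C = 3 channels for x, y, and confidence; T = 150 keyframes; V = 18 joints; M = 1 person). The padding/truncation to a fixed T is our assumption.

```python
import numpy as np
import torch

def to_stgcn_tensor(keypoints: np.ndarray, t_frames: int = 150) -> torch.Tensor:
    """Pack one clip of (frames, 18, 3) keypoints into the ST-GCN layout
    [C, T, V, M] = [3 channels (x, y, confidence), T keyframes, 18 joints, 1 person].

    Clips longer than t_frames are truncated; shorter clips are zero-padded.
    """
    clip = np.zeros((t_frames, 18, 3), dtype=np.float32)
    n = min(len(keypoints), t_frames)
    clip[:n] = keypoints[:n]
    # (T, V, C) -> (C, T, V), then add the person dimension M = 1.
    return torch.from_numpy(clip).permute(2, 0, 1).unsqueeze(-1)  # (3, 150, 18, 1)

# A training batch is stacked along a leading dimension B.
batch = torch.stack([to_stgcn_tensor(np.random.rand(120, 18, 3).astype(np.float32))
                     for _ in range(16)])   # shape: (16, 3, 150, 18, 1)
```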
The model was trained on a self-constructed dataset of CPR maneuvers, with the data split into training and test sets in an 8:2 ratio. Training was conducted over 80 epochs, with peak Top-1 accuracy of 88.89% achieved at epoch 50. As model performance stabilized thereafter, training was halted, and the parameters obtained at this stage were retained as the final model parameters.
A long CPR video containing multiple actions is segmented into frame sequences of 50 frames each. The trained classifier is then employed to recognize and categorize each frame sequence, reconstructing the sequence of actions throughout the video. Finally, the system performs an assessment to distinguish between correct and incorrect CPR actions.
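A minimal sketch of this segmentation-and-reconstruction step is given below. Here classify_window stands in for the trained ST-GCN classifier, the label names follow Table 1, and the procedural check is a simplified illustration of verifying the compression–airway–breathing order.

```python
from typing import Callable, List, Tuple

import numpy as np

def segment_and_classify(keypoints: np.ndarray,
                         classify_window: Callable[[np.ndarray], str],
                         window: int = 50) -> List[Tuple[int, int, str]]:
    """Split a long keypoint sequence into consecutive 50-frame windows and
    label each window, reconstructing the sequence of CPR actions."""
    labels = []
    for start in range(0, len(keypoints) - window + 1, window):
        segment = keypoints[start:start + window]
        labels.append((start, start + window, classify_window(segment)))
    return labels

def follows_cab_sequence(labels: List[str]) -> bool:
    """Check that recognised actions follow the compression-airway-breathing order,
    after collapsing consecutive duplicates and ignoring 'Other' segments."""
    collapsed = [l for i, l in enumerate(labels) if i == 0 or l != labels[i - 1]]
    expected = ["chestcompression", "Airway", "Breath"]
    core = [l for l in collapsed if l in expected]
    return all(core[i:i + 3] == expected for i in range(0, len(core) - 2, 3))
```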

2.4. Motion Judgment Criteria

To assess whether the airway is successfully opened, this study introduces the concept of the chin-frontal angle, as shown in Figure 5. This angle is defined as the angle between the line connecting the right and left index fingers and the horizontal plane. A series of experiments were conducted to investigate the relationship between the chin-frontal angle and ventilation effectiveness. A total of 20 volunteers were recruited to perform mouth-to-mouth artificial respiration on CPR simulators under randomized conditions, categorized into four chin-frontal angle ranges: 10–19°, 20–29°, 30–39°, and 40–49°. The results presented in Table 3 demonstrate a positive correlation between the chin-frontal angle and ventilation efficiency. The highest quality of artificial respiration was observed when the chin-frontal angle ranged from 40° to 49°, with statistically significant differences compared with the other three groups (p < 0.001). The partial η² for mean ventilation volume was 0.997, and for ventilation success rate it was 0.987, indicating that the chin-frontal angle range exerts a substantial influence on ventilation quality.
On the basis of the AHA guidelines and our experimental findings, we developed the artificial respiration standards outlined in Figure 1, considering practical constraints such as video image resolution and algorithmic modeling errors. An artificial respiration maneuver was considered correct only if all four metrics achieved a 100% accuracy rate; otherwise, it was defined as an error.

2.5. Judgment of Artificial Respiration

2.5.1. Open Airway Judgment

Artificial respiration is performed after the operator successfully completes the airway opening maneuver. As the rescuer leans forward to deliver a breath, the nose key point moves downward, with its final position located below the mid-hip key point. To analyze this movement, the time series curves of the vertical coordinates of the nose and mid-hip key points were plotted over time, as shown in Figure 6a. The horizontal coordinates of the two intersection points between these curves represent the start (T1) and end (T2) times of the mouth-to-mouth ventilation cycle. The first intersection point (T1) serves as a checkpoint for verifying whether the airway is adequately opened at that frame.
To compute the chin-frontal angle, the coordinates of the left and right index fingers, $(x_L, y_L)$ and $(x_R, y_R)$, are obtained. The angle is then derived via the weak perspective camera modeling principle [22] and the inverse sine function, as expressed in Equation (1).
$\alpha = \arcsin\dfrac{y_R - y_L}{\sqrt{(x_R - x_L)^2 + (y_R - y_L)^2}}$ (1)
If the airway remains insufficiently opened at frame T1, ventilation is considered unsuccessful. The theoretical threshold for the chin-frontal angle is set at 45°, with an allowable deviation of ±5° to account for video resolution errors.
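Equation (1) and the threshold check translate directly into code. The sketch below uses hypothetical helper names; note that because the image y-axis points downward, the sign of the computed angle may need to be flipped depending on the keypoint export convention.

```python
import math

def chin_frontal_angle(left_index: tuple, right_index: tuple) -> float:
    """Angle (degrees) between the line joining the left and right index-finger
    keypoints and the horizontal image axis, per Equation (1)."""
    (x_l, y_l), (x_r, y_r) = left_index, right_index
    dx, dy = x_r - x_l, y_r - y_l
    return math.degrees(math.asin(dy / math.hypot(dx, dy)))

def airway_opened(angle_deg: float, target: float = 45.0, tol: float = 5.0) -> bool:
    """The airway is considered adequately opened if the angle lies within 45° ± 5°."""
    return abs(angle_deg - target) <= tol

# Example with hypothetical fingertip coordinates in pixels: angle ≈ 44°, so True.
print(airway_opened(chin_frontal_angle((120.0, 150.0), (180.0, 208.0))))
```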

2.5.2. Interrupt Time Judgment

The interruption time during CPR was determined by analyzing the bilateral wrist key point coordinates. Upon completing a chest compression cycle, the operator’s hands transition from a crossed and closed position to a separated state, resulting in an increased distance between the wrists. To quantify this movement, the Euclidean distance (wrist_distance) between the left and right wrist key points is calculated via the two-point distance formula within the weak perspective camera model, as expressed in Equation (2):
$\mathrm{wrist\_distance} = \sqrt{(x_R - x_L)^2 + (y_R - y_L)^2}$ (2)
where $(x_L, y_L)$ and $(x_R, y_R)$ represent the coordinates of the left and right wrist joints, respectively. The time series curve of wrist_distance is plotted in Figure 6b, alongside the vertical coordinate curve of the nose key point in Figure 6c. When the wrist distance exceeds 30 pixels, chest compressions are considered complete and the interruption phase begins. The interruption time is calculated from the start (T1) and end (T2) frame indices and the video frame rate (f), as defined in Equation (3). An interruption time of less than 10 s is regarded as qualified.
$\mathrm{time} = \dfrac{60 \times (T_2 - T_1)}{f}$ (3)
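The wrist-distance criterion of Equation (2) and the conversion from frame indices to an interruption duration can be sketched as follows. The BODY_25 wrist indices and the simple thresholding are assumptions, and for clarity the duration is reported here directly in seconds as (T2 − T1)/f for comparison against the 10 s limit.

```python
import numpy as np

L_WRIST, R_WRIST = 7, 4  # BODY_25 indices for the left and right wrists

def wrist_distance(keypoints: np.ndarray) -> np.ndarray:
    """Per-frame Euclidean distance between the wrist keypoints (Equation (2)).

    keypoints: array of shape (frames, 25, 3) holding (x, y, confidence).
    """
    left = keypoints[:, L_WRIST, :2]
    right = keypoints[:, R_WRIST, :2]
    return np.linalg.norm(right - left, axis=1)

def interruption_seconds(dist: np.ndarray, fps: float = 30.0,
                         threshold: float = 30.0) -> float:
    """Span of the compression interruption: frames in which the wrist distance
    exceeds the 30-pixel threshold, converted to seconds via the frame rate.
    A value below 10 s is regarded as qualified."""
    above = np.flatnonzero(dist > threshold)
    if above.size == 0:
        return 0.0
    t1, t2 = above[0], above[-1]
    return (t2 - t1) / fps
```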

2.5.3. Ventilation Time Judgment

To identify the start and end points of each ventilation cycle, we analyzed the vertical coordinates of the nose and mid-hip key points. The first frame in which the nose coordinate fell below the mid-hip coordinate was designated the start of ventilation, whereas the first frame in which the nose coordinate rose above the mid-hip coordinate marked the end of ventilation. During the ventilation phase, the local minima of the nose coordinate curve were identified, and the horizontal distance between two consecutive minima represented the duration of a single ventilation. A ventilation duration exceeding 1 s was considered adequate.

2.5.4. Ventilation Frequency Judgment

To analyze the chest compressions, we plotted the vertical coordinates of the nose key point as a function of time, as shown in Figure 6c. The local extrema of the curve were used to determine the end points of each compression cycle, ensuring that the horizontal distance between consecutive extrema was ≥5 units. The number of extrema within the interval was recorded as the compression frequency for that cycle. Similarly, to assess the ventilation frequency, we identified the local minima within the ventilation interval, which represented the lowest head position during ventilation. The number of local minima within the ventilation phase was recorded as the ventilation frequency, ensuring that each ventilation endpoint was separated by at least 50 horizontal units. Finally, the compression-to-ventilation ratio was evaluated to determine whether it conformed to the standard 30:2 guideline.
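A sketch of the extrema-based counting is given below. Here find_peaks is our assumed peak detector, the minimum-separation values follow the text (at least 5 frames between compression extrema and 50 frames between ventilation minima), and whether minima or maxima correspond to the lowest head position depends on the image coordinate convention.

```python
import numpy as np
from scipy.signal import find_peaks

def compression_count(nose_y: np.ndarray, min_separation: int = 5) -> int:
    """Number of compression cycles, counted as local extrema of the nose
    vertical coordinate with at least 5 frames between consecutive extrema."""
    peaks, _ = find_peaks(nose_y, distance=min_separation)
    return len(peaks)

def ventilation_count(nose_y: np.ndarray, min_separation: int = 50) -> int:
    """Number of ventilations, counted as local minima of the nose vertical
    coordinate (lowest head position), at least 50 frames apart."""
    minima, _ = find_peaks(-nose_y, distance=min_separation)
    return len(minima)

def ratio_is_standard(compressions: int, ventilations: int) -> bool:
    """Check the compression-to-ventilation ratio against the 30:2 guideline."""
    return compressions == 30 and ventilations == 2
```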

3. Results

3.1. Experimental Environment

The experimental setup comprised a monocular camera (model: DS-E12, Hikvision, Hangzhou, China), a CPR mannequin (manufacturer: Tianjin Tianyan Medical Equipment & Technology Co., Ltd., Tianjin, China), and a GTX 1080 Ti GPU (NVIDIA Corporation, Santa Clara, CA, USA) with an Intel i7 processor (Intel Corporation, Santa Clara, CA, USA) and 16 GB of RAM, which supported OpenPose and other computational processes.
This study utilized a fixed monocular camera to capture video data, with the camera’s position, angle, and distance from the subject held constant. A standardized resuscitation manikin was employed throughout. To evaluate the accuracy of the action recognition model, 40 volunteers were recruited to perform CPR while being recorded. Each volunteer carried out the 30:2 compression-to-ventilation cycle, resulting in a total of 550 video clips. Consistent with the self-constructed CPR action dataset described earlier, these clips comprised four types of operations: 200 chest compressions, 100 airway openings, 200 rescue breaths, and 50 non-CPR actions. Each operation was independently labeled as correct or incorrect by two certified first-aid instructors based on feedback from the manikin’s sensors. The inter-rater agreement, measured by Cohen’s kappa coefficient, was 0.85. The algorithm’s classification results were then compared with the manual annotations to assess the accuracy of the proposed method. The recognition results for the long CPR video sequences are shown in Figure 7.

3.2. Motion Recognition Accuracy

Utilizing the pre-trained parameters of the action recognition model, we evaluated its classification performance on a custom-built dataset. The trained model successfully identified all three types of CPR actions, achieving an average recognition accuracy of 87.64%, as detailed in Table 4. These results indicate that the model meets the accuracy requirements for CPR procedure recognition.

3.3. Testing of Algorithms for Judging Artificial Respiratory Movements

To evaluate the effectiveness of the proposed algorithm, its performance was assessed via four key metrics: accuracy, precision, recall, and F-measure. The mathematical formulations for these metrics are provided in Equations (4)–(7).
TP: the artificial respiration maneuver was correct and was predicted as correct.
FP: the maneuver was incorrect but was predicted as correct.
FN: the maneuver was correct but was predicted as incorrect.
TN: the maneuver was incorrect and was predicted as incorrect.
$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + FP + FN + TN}$ (4)
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$ (5)
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$ (6)
$F\text{-}\mathrm{measure} = \dfrac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$ (7)
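For completeness, Equations (4)–(7) can be computed directly from the reported rates; small differences from the published percentages reflect rounding of the rates themselves.

```python
def evaluation_metrics(tp: float, fp: float, fn: float, tn: float) -> dict:
    """Accuracy, precision, recall and F-measure per Equations (4)-(7)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * recall * precision / (recall + precision)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Rates reported in Section 3.3 (TP, FP, FN, TN).
print(evaluation_metrics(tp=0.482, fp=0.068, fn=0.086, tn=0.363))
```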
The test results indicate that the TP, TN, FP, and FN rates were 0.482, 0.363, 0.068, and 0.086, respectively. The overall accuracy of the artificial respiration action detection algorithm reached 84.47%, whereas the precision, recall, and F-measure were calculated as 87.56%, 84.72%, and 86.12%, respectively. These results demonstrate the algorithm’s effectiveness in assessing the standardization of artificial respiration maneuvers in CPR videos.

4. Discussion

In this study, we incorporated an ST-GCN into the evaluation algorithm to extract spatiotemporal features of artificial ventilation from a self-constructed CPR motion dataset. The model achieved accurate classification and recognition of three types of CPR actions. Compared with conventional training methods that rely on manual evaluation, the proposed model provides objective and real-time assessment of trainees’ performance, thereby greatly improving evaluation efficiency and accuracy. Experimental results demonstrated an action recognition accuracy of 87.64%, indicating high reliability of the system in CPR performance assessment.
Furthermore, we investigated the influence of airway opening angle on ventilation quality during artificial respiration and proposed the concept of the “chin–frontal angle.” Using a controlled variable approach, we analyzed the effect of different rotation angles on tidal volume and found that the range of 40–49° yielded optimal ventilation outcomes. This finding translates clinical airway mechanics into quantifiable computer vision criteria, providing theoretical support for developing airway-opening detection algorithms and enhancing the applicability of the system in artificial ventilation training. By comparing indicators such as comprehensiveness, adaptability, cost, performance, and flexibility, our approach demonstrates distinct advantages and can serve as a standardized evaluation tool for artificial respiration videos.
Previous studies have explored the application of HPE in CPR feedback [7,8,9]. While these studies similarly show that intelligent evaluation enhances training efficiency, most prior research has focused primarily on the chest compression phase. In contrast, our work is the first to apply computer vision techniques to the recognition and assessment of artificial ventilation, introducing objectivity into this long-overlooked component of CPR training and promoting the comprehensive, intelligent, and precise development of CPR education.
This study has several limitations that indicate directions for future research. First, its performance may degrade when there are significant changes in video perspective (e.g., side or overhead views) or when clothing variations and occlusions occur. Integrating multi-view, multi-camera data or incorporating additional sensors such as depth cameras may address these issues and enhance the robustness of pose estimation. Second, the rule-based evaluation metrics currently rely on a fixed pixel-distance threshold, which assumes a constant camera distance and subject body size. Future work should adopt scale-invariant metrics, such as distances normalized by intrinsic body dimensions (e.g., shoulder width), to enhance the model’s generalizability across different recording setups and anthropometric conditions. Third, the proposed nasal keypoint trajectory serves as a surrogate measure for chest compression frequency. While applicable in training scenarios where head motion is observable—typical for novice rescuers—it may be less reliable for professionals who maintain head stability. Subsequent studies should validate this approach against reference signals from manikin sensors (e.g., using correlation or error metrics) and explore multi-keypoint fusion for more robust frequency detection. Finally, transitioning from offline video analysis to real-time feedback represents a critical step toward deployable training tools. Optimizing inference speed and developing an efficient model will be essential for practical implementation.

5. Conclusions

Compared to existing approaches, the innovation of this study lies in its systematic computer vision-based assessment focusing on artificial respiration—a critical yet long-overlooked component of CPR. Based on the operational standards for artificial respiration, this research develops an assessment algorithm by integrating the OpenPose pose estimation algorithm with an ST-GCN. The algorithm extracts skeletal movement features from CPR training videos to train an action classification model, ultimately enabling the evaluation and feedback of artificial respiration performance. Although the current model performs well in controlled experimental settings, there remains room for improvement in its processing efficiency for real-time video streams, adaptability to complex environments (e.g., varied clothing and lighting conditions), and the completeness of the assessment workflow. Overall, the proposed model offers a promising auxiliary tool for modern CPR training. It can effectively reduce reliance on experienced instructors, provide trainees with immediate and objective performance feedback through automated analysis, and help establish unified and standardized skill assessment criteria. This study provides a new technological direction for advancing the popularization, standardization, and intelligent development of CPR skill training.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app16010555/s1, Video S1: A test video of CPR.

Author Contributions

Conceptualization, Y.T., C.W. and B.F.; Data curation, Y.T., C.W. and S.M.; Formal analysis, Y.T., C.W. and W.D.; Funding acquisition, W.D. and B.F.; Investigation, Y.T., C.W. and W.D.; Methodology, Y.T., C.W. and S.M.; Project administration, S.M.; Resources, W.D. and B.F.; Software, C.W. and S.M.; Supervision, W.D. and B.F.; Validation, Y.T., C.W. and S.M.; Visualization, Y.T. and S.M.; Writing—original draft, Y.T., W.D. and B.F.; Writing—review and editing, Y.T., C.W., W.D. and B.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number: 2023YFC3011802, grant name: Research on Key Technologies and Prototype Development of Fully Closed Loop Digital Intelligence Integrated Cardiopulmonary Resuscitation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author(s).

Acknowledgments

We extend our sincere gratitude to the Basic Life Support Research Group for their invaluable assistance. Special thanks are also due to the two CPR instructors and all volunteers for their essential participation and contributions to the completion of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CPR	Cardiopulmonary resuscitation
ST-GCN	Spatio-temporal graph convolutional network
AHA	American Heart Association
CA	Cardiac arrest
HPE	Human pose estimation
CNN	Convolutional neural network
ERC	European Resuscitation Council
FPS	Frames per second
GCN	Graph convolutional network
TCN	Temporal convolutional network
SGD	Stochastic gradient descent

References

  1. Kaplow, R.; Cosper, P.; Snider, R.; Boudreau, M.; Kim, J.D.; Riescher, E.; Higgins, M. Impact of CPR Quality and Adherence to Advanced Cardiac Life Support Guidelines on Patient Outcomes in In-Hospital Cardiac Arrest. AACN Adv. Crit. Care 2020, 31, 401–409. [Google Scholar] [CrossRef]
  2. Birkun, A.; Gautam, A.; Trunkwala, F. Global prevalence of cardiopulmonary resuscitation training among the general public: A scoping review. Clin. Exp. Emerg. Med. 2021, 8, 255–267. [Google Scholar] [CrossRef] [PubMed]
  3. Gräsner, J.T.; Wnent, J.; Herlitz, J.; Perkins, G.D.; Lefering, R.; Tjelmeland, I.; Koster, R.W.; Masterson, S.; Rossell-Ortiz, F.; Maurer, H.; et al. Survival after out-of-hospital cardiac arrest in Europe—Results of the EuReCa TWO study. Resuscitation 2020, 148, 218–226. [Google Scholar] [CrossRef] [PubMed]
  4. Yan, S.; Gan, Y.; Wang, R.; Song, X.; Zhou, N.; Lv, C. Willingness to attend cardiopulmonary resuscitation training and the associated factors among adults in China. Crit. Care 2020, 24, 457. [Google Scholar] [CrossRef] [PubMed]
  5. Ayobami Adeyem, E. Mouth-to-mouth Ventilation in Cardiopulmonary Resuscitation, Is It a Necessity? Era J. Med. Res. 2023, 10, 74–76. [Google Scholar] [CrossRef]
  6. Ali, D.M.; Hisam, B.; Shaukat, N.; Baig, N.; Ong, M.E.H.; Epstein, J.L.; Goralnick, E.; Kivela, P.D.; McNally, B.; Razzak, J. Cardiopulmonary resuscitation (CPR) training strategies in the times of COVID-19: A systematic literature review comparing different training methodologies. Scand. J. Trauma Resusc. Emerg. Med. 2021, 29, 53. [Google Scholar] [CrossRef]
  7. Fei, S.; Zexing, N.; Chao, C.; Chunxiu, W.; Yajun, W.; Zhenzhen, F.; Ying, H.; Ruirui, L.; Chunlin, Y. A new chest compression posture detection model based on dual ZED camera. Chin. J. Emerg. Med. 2023, 32, 1189–1194. [Google Scholar] [CrossRef]
  8. Weiss, K.E.; Kolbe, M.; Nef, A.; Grande, B.; Kalirajan, B.; Meboldt, M.; Lohmeyer, Q. Data-driven resuscitation training using pose estimation. Adv. Simul. 2023, 8, 12. [Google Scholar] [CrossRef]
  9. Huang, L.W.; Chan, Y.W.; Tsan, Y.T.; Zhang, Q.X.; Chan, W.C.; Yang, H.H. Implementation of a Smart Teaching and Assessment System for High-Quality Cardiopulmonary Resuscitation. Diagnostics 2024, 14, 995. [Google Scholar] [CrossRef]
  10. Zhu, L.; Spachos, P.; Pensini, E.; Plataniotis, K.N. Deep learning and machine vision for food processing: A survey. Curr. Res. Food Sci. 2021, 4, 233–249. [Google Scholar] [CrossRef]
  11. Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2021, 40, 100379. [Google Scholar] [CrossRef]
  12. Kufel, J.; Bargieł-Łączek, K.; Kocot, S.; Koźlik, M.; Bartnikowska, W.; Janik, M.; Czogalik, Ł.; Dudek, P.; Magiera, M.; Lis, A.; et al. What Is Machine Learning, Artificial Neural Networks and Deep Learning?-Examples of Practical Applications in Medicine. Diagnostics 2023, 13, 2582. [Google Scholar] [CrossRef] [PubMed]
  13. Chen, X.; Wang, X.; Zhang, K.; Fung, K.M.; Thai, T.C.; Moore, K.; Mannel, R.S.; Liu, H.; Zheng, B.; Qiu, Y. Recent advances and clinical applications of deep learning in medical image analysis. Med. Image Anal. 2022, 79, 102444. [Google Scholar] [CrossRef] [PubMed]
  14. Samkari, E.; Arif, M.; Alghamdi, M.; Al Ghamdi, M.A. Human Pose Estimation Using Deep Learning: A Systematic Literature Review. Mach. Learn. Knowl. Extr. 2023, 5, 1612–1659. [Google Scholar] [CrossRef]
  15. Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.E.; Sheikh, Y. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 172–186. [Google Scholar] [CrossRef]
  16. Hedegaard, L.; Heidari, N.; Iosifidis, A. Continual spatio-temporal graph convolutional networks. Pattern Recognit. 2023, 140, 109528. [Google Scholar] [CrossRef]
  17. Joglar, J.A.; Chung, M.K.; Armbruster, A.L.; Benjamin, E.J.; Chyou, J.Y.; Cronin, E.M.; Deswal, A.; Eckhardt, L.L.; Goldberger, Z.D.; Gopinathannair, R.; et al. 2023 ACC/AHA/ACCP/HRS Guideline for the Diagnosis and Management of Atrial Fibrillation: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation 2024, 149, e1–e156. [Google Scholar] [CrossRef]
  18. Olasveengen, T.M.; Semeraro, F.; Ristagno, G.; Castren, M.; Handley, A.; Kuzovlev, A.; Monsieurs, K.G.; Raffay, V.; Smyth, M.; Soar, J.; et al. European Resuscitation Council Guidelines 2021: Basic Life Support. Resuscitation 2021, 161, 98–114. [Google Scholar] [CrossRef]
  19. Cicirelli, G.; D’Orazio, T. A Low-Cost Video-Based System for Neurodegenerative Disease Detection by Mobility Test Analysis. Appl. Sci. 2022, 13, 278. [Google Scholar] [CrossRef]
  20. Plizzari, C.; Cannici, M.; Matteucci, M. Skeleton-based action recognition via spatial and temporal transformer networks. Comput. Vis. Image Underst. 2021, 208–209, 103219. [Google Scholar] [CrossRef]
  21. Sclocchi, A.; Wyart, M. On the different regimes of stochastic gradient descent. Proc. Natl. Acad. Sci. USA 2024, 121, e2316301121. [Google Scholar] [CrossRef]
  22. Mo, L.; Qi, N.; Zhao, Z. Spacecraft Pose Estimation Based on Different Camera Models. Chin. J. Mech. Eng. 2023, 36, 63. [Google Scholar] [CrossRef]
Figure 1. System architecture.
Figure 2. OpenPose extraction skeleton effect: (A) chest compressions; (B) opening the airway; (C) artificial respiration.
Figure 3. Flowchart of the CPR operation flow judgment algorithm.
Figure 4. Schematic diagram of the model structure.
Figure 5. Angular changes during chin-frontal rotation: (a) 10–19°; (b) 20–29°; (c) 30–39°; (d) 40–49°.
Figure 6. Time-series curves of key point vertical coordinates: (a) Nose and mid-hip key points; (b) Both wrist key points; (c) Nose key point.
Figure 7. Example of long video recognition results.
Table 1. Self-constructed dataset-related information sheet.

CPR Actions               Sample Size    Label               Label Number
chest compressions        220            chestcompression    2
open airway               185            Airway              0
artificial respiration    210            Breath              1
other actions             108            Other               3
Table 2. ST-GCN parameter configuration.

Model Level               Type of Level             Number of Key Points    Input               Output
Batch Normalization       Normalization Layer       18                      (16,3,150,18,1)     (32,3,150,18)
ST-GCN1                   GCN→TCN×3 *               18                      (32,3,150,18)       (32,64,150,18)
ST-GCN2                   GCN→TCN×3                 18                      (32,64,150,18)      (32,64,150,18)
ST-GCN3                   GCN→TCN×3                 18                      (32,64,150,18)      (32,128,150,18)
ST-GCN4                   GCN→TCN×3                 18                      (32,128,150,18)     (32,128,75,18)
ST-GCN5                   GCN→TCN×3                 18                      (32,128,75,18)      (32,128,75,18)
ST-GCN6                   GCN→TCN×3                 18                      (32,128,75,18)      (32,128,75,18)
ST-GCN7                   GCN→TCN×3                 18                      (32,128,75,18)      (32,128,38,18)
ST-GCN8                   GCN→TCN×3                 18                      (32,256,75,18)      (32,128,38,18)
ST-GCN9                   GCN→TCN×3                 18                      (32,256,75,18)      (32,128,38,18)
Global Average Pooling    Global Average Pooling    18                      (32,256,75,18)      (32,256,1,1)
Fully Connected           Fully Connected Layer     18                      (32,256,1,1)        (32,9,1,1)

* GCN→TCN×3 indicates the execution of one spatial graph convolution (GCN) followed by a temporal convolution module consisting of three stacked temporal convolutional layers (TCN).
Table 3. Quality of artificial respiration in the range of different chin-frontal angles.

Chin-Frontal Angle    Average Ventilation (mL)    Average Ventilation 95% CI    Qualified Rate of Ventilation (%)    Qualified Rate 95% CI
10–19°                282.18 ± 13.01              [271.23, 293.13]              4.50 ± 3.12                          [1.63, 7.37]
20–29°                389.02 ± 11.57              [378.07, 399.97]              22.75 ± 3.34                         [19.88, 25.62]
30–39°                486.76 ± 12.09              [475.81, 497.71]              41.00 ± 6.04                         [35.63, 46.37]
40–49°                539.44 ± 7.02               [528.49, 550.39]              59.00 ± 7.00                         [53.63, 64.37]
F                     1868.615                                                  393.117
p                     <0.001                                                    <0.001
η²p                   0.997                                                     0.987
Table 4. Motion recognition accuracy.

Movement Category         Number of Videos    Positive    Negative    Accuracy (%)    Average Accuracy (%)
chest compressions        200                 182         8           91.00           87.64
open airway               100                 87          13          87.00
artificial respiration    200                 171         9           85.50
others                    50                  42          8           84.00
