Towards Objective Emotional Monitoring in Children with Cerebral Palsy: A Review of rPPG and Multimodal Approaches
Abstract
1. Introduction
2. Materials and Methods
2.1. Databases and Sources
2.2. Search Strategy
2.3. Inclusion and Exclusion Criteria
- Studies published in peer-reviewed journals and full conference papers.
- Articles in English (and [Spanish, if applicable]).
- Relevant articles reporting experiments, analyses, and/or evaluations of the proposed approach.
- Studies involving pediatric populations, children with disabilities, or scenarios simulating high-motion interference (relevant for CP).
- Editorials, conference abstracts without full papers and opinion papers.
- Articles that do not report experiments, analyses, and/or evaluations of the proposed approach.
- Articles that lack empirical support through evaluation or analysis.
2.4. Screening and Selection
- Editorials, opinion papers, and conference abstracts without a corresponding full paper (n = 19).
- Articles that did not report experiments, technical analyses, or evaluations of the proposed approach (n = 29).
- Articles lacking empirical support through formal evaluation or rigorous analysis (n = 19).
2.5. Data Extraction and Synthesis
- QA1: Does it include experiments derived from the proposal?
- QA2: Does it include relevant or sufficient articles on the subject?
- QA3: Does it include an analysis and/or evaluation of the proposal, or does it provide quantitative results or an exhaustive analysis?
- QA4: Does it utilize experimental protocols?
- QA5: Does it incorporate elements for the protection of sensitive data?
- Study reference (title, authors and keywords).
- Methodological data: algorithm used (e.g., CHROM, POS, ICA, CNN, Masked Autoencoder).
- Practical scope, specifying applications and key concepts.
- Empirical evidence, highlighting the key results.
- Population and context: sample size, age group (pediatric vs. adult), and lighting/motion conditions.
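The quality assessment based on QA1 to QA5 can be tallied per study as a simple sum. The sketch below is a minimal illustration, assuming each criterion is scored 0 (not met), 0.5 (partially met), or 1 (met); that scale is inferred from the scores reported for the reviewed studies, and the function name and example study keys are hypothetical.

```python
from typing import Dict, Sequence

# Assumed scoring scale: 0 = not met, 0.5 = partially met, 1 = met.
ALLOWED_SCORES = (0.0, 0.5, 1.0)


def total_quality_score(scores: Sequence[float]) -> float:
    """Sum the five quality-assessment scores (QA1..QA5) of one study."""
    if len(scores) != 5:
        raise ValueError("expected exactly five scores, one per QA criterion")
    for score in scores:
        if score not in ALLOWED_SCORES:
            raise ValueError(f"score {score} is not one of {ALLOWED_SCORES}")
    return float(sum(scores))


# Hypothetical studies, scored in the order QA1, QA2, QA3, QA4, QA5.
example_scores: Dict[str, Sequence[float]] = {
    "study_a": [1, 1, 1, 1, 0],
    "study_b": [1, 0.5, 1, 1, 0],
}
totals = {name: total_quality_score(s) for name, s in example_scores.items()}
```

Under this scheme the maximum total is 5, and a study that fully meets four criteria but reports no data-protection measures (QA5) scores 4.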
3. Fundamentals of Remote Photoplethysmography (rPPG)
3.1. Physiological and Optical Principles
3.2. Evolution of Algorithms: CHROM, POS, ICA
- ICA: This method utilizes blind source separation (BSS) to decompose temporal RGB color mixtures into independent non-Gaussian components. Although revolutionary for isolating the cardiac pulse from non-periodic noise without prior training, its accuracy degrades significantly under rapid motion. Studies show that ICA requires more processing time due to its iterative nature and is prone to “component switching,” where the pulse signal moves between different independent channels [28,29]. This limitation is critical in children with CP: sudden spastic movements or dystonic posturing can cause continuous component switching, making the tracking of stable HRV features for stress or pain classification highly unreliable.
- CHROM: By assuming a standardized skin color to define a projection direction, this algorithm eliminates specular reflection and mitigates ambient light variations. Technically, it offers a superior balance between complexity and accuracy; it is computationally lighter than ICA and shows lower MAE in scenarios with stable illumination. However, its reliance on fixed skin-tone ratios limits its performance across diverse phenotypes [20,28]. For pediatric neurorehabilitation or home-care settings, ambient light often fluctuates as the child moves or changes posture, meaning that while CHROM reduces minor motion artifacts, it may fail to isolate the micro-vascular pulse during intense affective distress episodes where the child exhibits heightened motor activity.
- PBV: This model leverages the specific “signature” or vector of blood volume changes across the RGB spectrum. By using a pre-defined PBV, it achieves higher robustness against motion artifacts than CHROM. In comparative tests, PBV maintains a stable signal-to-noise ratio (SNR) even when the subject is talking, although its precision depends heavily on the correct calibration of the camera’s spectral sensitivity [12,28]. This mathematical resilience represents an advantage for children with CP who present uncoordinated facial expressions or vocalizations; however, the reliance on precise spectral calibration limits its clinical feasibility across the different standard webcams or video settings used in ecological or school environments.
- POS: Representing the current state of the art among mathematical methods, POS defines a projection plane orthogonal to the skin tone in a temporally normalized RGB space. According to recent benchmarks, POS consistently achieves the lowest error on datasets with large head rotations and exercise (e.g., MAE ≈ 2–5 bpm under moderate motion). It outperforms CHROM and ICA by effectively isolating the pulse from intensity variations caused by changes in distance to the sensor [2,28]. Consequently, POS emerges as the most clinically feasible candidate among traditional methods for children with CP, as it can handle the frequent distance and orientation changes relative to the camera caused by trunk instability or wheelchair adjustments while maintaining the HR quality required for objective emotion monitoring.
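The POS projection described above can be illustrated with a short NumPy sketch. It is a simplified reading of the method under stated assumptions, not a reference implementation: the 1.6 s sliding window, the 0.7–4 Hz search band, and the synthetic traces (a 1.2 Hz pulse with an assumed per-channel pulse signature plus a common 0.3 Hz intensity fluctuation that mimics distance changes) are all illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def pos_pulse(rgb: np.ndarray, fs: float, window_s: float = 1.6) -> np.ndarray:
    """Estimate a pulse signal from spatially averaged skin traces.

    rgb has shape (3, N): one row each for the R, G and B channel means.
    """
    n = rgb.shape[1]
    win = int(round(window_s * fs))
    # Two projection axes that span the plane orthogonal to the skin tone.
    proj = np.array([[0.0, 1.0, -1.0], [-2.0, 1.0, 1.0]])
    pulse = np.zeros(n)
    for start in range(0, n - win + 1):
        seg = rgb[:, start:start + win]
        # Temporal normalization: divide each channel by its window mean.
        seg_n = seg / (seg.mean(axis=1, keepdims=True) + 1e-12)
        s = proj @ seg_n
        # Alpha tuning: combine both projections into one pulse estimate.
        alpha = s[0].std() / (s[1].std() + 1e-12)
        h = s[0] + alpha * s[1]
        pulse[start:start + win] += h - h.mean()  # overlap-add
    return pulse


def dominant_bpm(signal: np.ndarray, fs: float, lo: float = 0.7, hi: float = 4.0) -> float:
    """Return the strongest spectral peak inside [lo, hi] Hz, in beats per minute."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return float(60.0 * freqs[band][np.argmax(power[band])])


# Synthetic test traces (all values below are assumptions for illustration).
fs = 30.0
t = np.arange(int(20 * fs)) / fs
pulse_true = np.sin(2 * np.pi * 1.2 * t)          # 1.2 Hz = 72 bpm
motion = 0.05 * np.sin(2 * np.pi * 0.3 * t)       # common intensity change
base = np.array([0.6, 0.5, 0.4])                  # mean skin color
signature = np.array([0.33, 0.77, 0.53])          # assumed pulse signature
rgb = base[:, None] * (1 + motion)[None, :] * (1 + 0.01 * signature[:, None] * pulse_true[None, :])

bpm_estimate = dominant_bpm(pos_pulse(rgb, fs), fs)
```

Because the simulated intensity fluctuation scales all three channels equally, both projection axes cancel it, and the estimate is expected to fall near the simulated 72 bpm. Real recordings with spastic or dystonic motion will violate this idealized assumption to varying degrees.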
3.3. Recent Advances with Deep Learning
- CNN: Primarily utilized for extracting spatial–temporal features directly from video frames. Architectures like X-iPPGNet leverage depthwise separable convolutions to reduce the number of parameters, achieving high precision with an RMSE of 6.26 bpm and an MAE of 4.99 bpm on the UBFC-rPPG dataset [26]. This balance between accuracy and efficiency enables precise ROI segmentation even in less-constrained scenarios [6,26]. For children with CP, the lightweight and robust nature of these spatial–temporal CNNs is crucial, as they can maintain ROI tracking even when involuntary facial movements or hypertonia alter standard facial landmarks, ensuring consistent pixel extraction for pulse analysis.
- LSTM: These recurrent networks are essential for modeling the sequential nature of cardiac signals, assisting in the prediction of HR over time. For instance, recent architectures utilize an LSTM model in a multi-region framework to effectively map optimized remote photoplethysmography signals from multiple facial ROIs into precise HR estimations. This approach significantly mitigates instabilities caused by motion artifacts and illumination variations by capturing long-term temporal dependencies in the pulse signal, balancing accuracy against the inherent computational latency of recurrent structures [7]. In affective monitoring for pediatric CP, LSTM networks provide a valuable framework to differentiate between sudden physiological spikes caused by a physical spasm and the prolonged autonomic changes associated with persistent pain or emotional distress.
- Transformers: Their global attention mechanism allows for capturing long-term dependencies within the video signal. While they require larger datasets for training, they can outperform CNNs in pulse wave reconstruction by focusing on relevant skin patches, thus improving the SNR during significant head rotations [12,27]. This self-attention capability is highly advantageous for children with CP, who frequently experience sudden head drops, lateral rotations, or partial facial occlusions due to wheelchair restraints or involuntary posturing; the model can dynamically shift its attention to unoccluded skin regions to prevent signal dropout.
- Self-supervised learning: Frameworks like rPPG-MAE address the scarcity of labeled clinical data by using pre-training on unlabeled videos. Adopting this approach enhances system generalization and narrows the “domain gap,” enabling models with robust performance across different lighting conditions and camera sensors without requiring synchronized ECG ground-truth during initial training [12,21,27]. This learning paradigm directly addresses a major bottleneck in pediatric clinical research, where collecting synchronized gold-standard physiological data (e.g., ECG or contact sensors) from highly agitated or uncooperative children with severe CP is often clinically and ethically unfeasible.
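The parameter savings attributed to depthwise separable convolutions can be checked with simple arithmetic. The sketch below counts weights only (biases ignored) for a standard convolution and for its depthwise plus pointwise factorization. The layer size used (3×3×3 kernel, 64 input channels, 128 output channels) is an assumed example and is not taken from X-iPPGNet or any other reviewed architecture.

```python
def conv_params(kernel: int, c_in: int, c_out: int, dims: int = 2) -> int:
    """Weights in a standard convolution with a cubic/square kernel."""
    return (kernel ** dims) * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int, dims: int = 2) -> int:
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = (kernel ** dims) * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out              # 1x1 mixing across channels
    return depthwise + pointwise


# Assumed 3D layer: 3x3x3 kernel, 64 -> 128 channels.
standard_3d = conv_params(3, 64, 128, dims=3)
separable_3d = depthwise_separable_params(3, 64, 128, dims=3)
reduction = standard_3d / separable_3d
```

For this assumed layer the standard convolution needs 221,184 weights against 9,920 for the separable version, roughly a 22-fold reduction, which is the kind of saving that makes such networks candidates for lightweight deployment.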
3.4. Technical Challenges
- Skin tone variability: Melanin levels affect light absorption, which can introduce biases in the accuracy of traditional algorithms [12].
- Limited datasets: There is a notable lack of extensive, public databases that include diverse populations, specifically children with CP, to validate the full reliability of these systems in real-world environments and ensure their integration into proactive monitoring systems [14].
4. Applications of rPPG in Physiological Monitoring
4.1. Heart Rate (HR) and Heart Rate Variability (HRV)
4.2. Respiratory Rate and Blood Oxygenation
4.3. Multimodal Estimation (Blood Pressure, SpO2, Multiple Vital Signs)
4.4. Validation Against Reference Methods (ECG, Pulse Oximetry)
- Reference software: The use of specialized tools such as ixTrend allows for the comparison of pulse rates extracted from video with actual heart rates recorded simultaneously [4].
- Medical devices: Results are typically validated against commercial oximeters (such as the Omron HEM-6111) for the accuracy of the obtained bpm values [1].
- ECG signals: ECG remains the primary gold standard for validating the temporal precision of rPPG. Its role is critical not only for basic HR estimation but for calculating HRV. In children with CP, ECG provides the necessary timing accuracy to guarantee that rPPG reliably captures physiological shifts associated with emotional dysregulation or pain [2].
- Statistical metrics: Validation is supplemented by MAE analysis and Mean Absolute Percentage Error (MAPE), frequently visualized through Bland–Altman plots [32].
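The statistical metrics listed above can be computed in a few lines. The following sketch assumes paired heart rate values in bpm from a reference device and from rPPG, and reports MAE, RMSE, MAPE, and the Bland–Altman bias with 95% limits of agreement (bias ± 1.96 SD of the differences). The function name and the four sample values are hypothetical.

```python
import numpy as np


def agreement_metrics(reference_bpm, estimated_bpm) -> dict:
    """Error and agreement statistics between reference and estimated HR."""
    ref = np.asarray(reference_bpm, dtype=float)
    est = np.asarray(estimated_bpm, dtype=float)
    if ref.shape != est.shape or ref.size < 2:
        raise ValueError("need two equally sized series with at least 2 samples")
    diff = est - ref
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return {
        "mae": float(np.mean(np.abs(diff))),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "mape": float(100.0 * np.mean(np.abs(diff) / ref)),
        "bias": float(bias),
        "loa_lower": float(bias - 1.96 * sd),
        "loa_upper": float(bias + 1.96 * sd),
    }


# Hypothetical paired measurements (bpm): reference device vs. rPPG.
metrics = agreement_metrics([60, 70, 80, 90], [62, 69, 83, 88])
```

MAE and RMSE summarize the error magnitude, whereas the Bland–Altman bias and limits of agreement show whether rPPG systematically over- or underestimates the reference and how widely individual differences scatter.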
5. rPPG and Emotion Monitoring. Particularization to Children with Cerebral Palsy
5.1. Specific Challenges: Spasticity, Limited Communication, and Motor Variability
5.2. Emotion and Stress Recognition from HR/HRV
5.3. Integration with Complementary Biomarkers and Limitations
6. Discussion: Critical Synthesis, Identified Gap and Future Perspectives
6.1. Maturity of rPPG vs. Immaturity for Emotion Detection
6.2. Potential of Multimodal Integration
6.3. Technical Robustness: Algorithms Resistant to Motion
6.4. Methodological and Ethical Limitations
6.5. Expected Impact and Future Perspectives
6.6. Translational Roadmap for rPPG Technology in Children with Cerebral Palsy
- Phase T1: Laboratory Validation and Algorithmic Robustness. The foundation of the roadmap focuses on mitigating the severe motion artifacts induced by spasticity and the involuntary movements characteristic of pediatric CP. While classical mathematical frameworks like POS provide a baseline for handling head rotations under controlled constraints [2,28], deep learning architectures offer the necessary flexibility for clinical translation. Lightweight spatiotemporal convolutional neural networks (CNNs) represent a crucial step forward, allowing continuous tracking of regions of interest (ROIs) even when severe hypertonia distorts standard facial geometry [6,26]. To finalize this laboratory phase, future developments must prioritize recurrent networks capable of modeling long-term temporal dependencies to stabilize the pulse signal against sudden shifts [7], alongside the optimization of alternative chromatic channels that have already demonstrated high accuracy in unconstrained preliminary tests [4].
- Phase T2: Clinical Testing and Addressing the Data Gap. The second phase marks the transition to controlled clinical evaluations, a step currently hindered by a critical methodological void: the absolute scarcity of public, labeled rPPG datasets involving children with neurodevelopmental disorders [14]. Forcing algorithms trained on neurotypical adults to interpret atypical infant or pediatric motor patterns causes severe performance degradation due to the extreme geometric variability of spontaneous motion [35]. To break this bottleneck without introducing highly intrusive contact-based reference sensors (like ECG) that stress agitated children, T2 translation must leverage self-supervised learning (SSL). Implementing frameworks such as masked autoencoders allows models to pre-train on massive unlabeled video data, effectively bridging the domain gap and enhancing generalization in actual clinical environments [12,21,27].
- Phase T3: Multimodal Integration and Clinical Utility. Phase T3 shifts the focus toward multi-sensor systems capable of providing comprehensive clinical utility. Objective emotional and physiological monitoring in children with severe communication impairments cannot rely on a single optical modality. During sudden spasms or posture adjustments in which the child’s face is completely occluded, the rPPG signal must be dynamically supported by mechanical variables. Fusing rPPG with ballistocardiography (BCG) via sensors integrated directly into the wheelchair’s seating or support structure ensures continuous, uninterrupted monitoring of vital signs like heart rate and indirect blood pressure [2]. This physiological telemetry must be cross-validated with robust local morphological descriptors that automatically identify gross motor expressions of pain or distress [17].
- Phase T4: Real-World Community and Home Implementation. The ultimate goal of this translational roadmap is the permanent deployment of non-invasive monitoring in the daily lives of patients (homes and schools). To achieve true ecological validity, the autonomic markers (HR/HRV) extracted remotely via rPPG toolkits [15] must be translated into real-time, interpretable alerts for caregivers and families. This final phase will provide a reliable, objective “voice” to children with cerebral palsy, transforming a laboratory optical technique into an accessible assistive technology that directly improves their quality of life.
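The fallback behavior envisaged in Phase T3, where a seat-integrated BCG sensor supports the rPPG estimate when the face is occluded, can be sketched as a quality-weighted average. This is a deliberately simple illustration and not the Kalman-filter fusion reported in [2]: the function name, the quality scores in [0, 1], and the `min_quality` threshold are all hypothetical assumptions.

```python
from typing import Optional


def fuse_heart_rate(
    rppg_bpm: Optional[float],
    rppg_quality: float,
    bcg_bpm: Optional[float],
    bcg_quality: float,
    min_quality: float = 0.2,  # assumed cut-off below which a source is ignored
) -> Optional[float]:
    """Quality-weighted fusion of two HR estimates with graceful fallback.

    A source is skipped when it has no estimate (e.g., face occluded)
    or when its quality score falls below min_quality.
    """
    candidates = []
    if rppg_bpm is not None and rppg_quality >= min_quality:
        candidates.append((rppg_bpm, rppg_quality))
    if bcg_bpm is not None and bcg_quality >= min_quality:
        candidates.append((bcg_bpm, bcg_quality))
    if not candidates:
        return None  # no trustworthy source in this time window
    total_quality = sum(quality for _, quality in candidates)
    return sum(bpm * quality for bpm, quality in candidates) / total_quality


both_sources = fuse_heart_rate(80.0, 0.75, 84.0, 0.25)
face_occluded = fuse_heart_rate(None, 0.0, 84.0, 0.5)
no_signal = fuse_heart_rate(None, 0.0, None, 0.0)
```

When both sources are usable, the result leans toward the more reliable one. When the face is occluded, the BCG estimate is returned unchanged. When neither source is trustworthy, the function returns `None` so that a downstream caregiver alert is not raised on unreliable data.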
7. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| BCG | Ballistocardiography |
| CHROM | Chrominance-based method |
| CNN | Convolutional Neural Network |
| CP | Cerebral Palsy |
| ECG | Electrocardiogram |
| FPGA | Field Programmable Gate Array |
| HR | Heart Rate |
| HRV | Heart Rate Variability |
| ICA | Independent Component Analysis |
| LSTM | Long Short-Term Memory |
| MAE | Mean Absolute Error |
| PBV | Blood Volume Pulse Vector |
| POS | Plane-Orthogonal-to-Skin |
| RF | Respiration Frequency |
| RMSE | Root Mean Square Error |
| ROI | Region of Interest |
| rPPG | Remote Photoplethysmography |
| SNR | Signal-to-Noise Ratio |
References
- Hsu, J.Y.; Jiang, T.Y.; Chao, P.C.P. A Fast FPGA Hardware Accelerator for Remote Heart Rate Detection Based on RGB Vision. IEEE Trans. Biomed. Circuits Syst. 2024, 18, 592–607.
- Liu, Y.; Qin, B.; Li, R.; Li, X.; Huang, A.; Liu, H.; Lv, Y.; Liu, M. Motion-Robust Multimodal Heart Rate Estimation Using BCG Fused Remote-PPG With Deep Facial ROI Tracker and Pose Constrained Kalman Filter. IEEE Trans. Instrum. Meas. 2021, 70, 5007215.
- Mehta, A.D.; Sharma, H. CPulse: Heart Rate Estimation From RGB Videos Under Realistic Conditions. IEEE Trans. Instrum. Meas. 2023, 72, 5023312.
- Sinhal, R.; Singh, K.; Raghuwanshi, M. Heart rate measurement based on color signal extraction. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 1990–1993.
- Qi, L.; Yu, H.; Xu, L.; Mpanda, R.; Greenwald, S. Robust heart-rate estimation from facial videos using Project ICA. Physiol. Meas. 2019, 40, 085007.
- Wu, Y.C.; Lin, C.H.; Chiu, L.W.; Wu, B.F.; Chung, M.L.; Tang, S.C.; Sun, Y. Contact-Free Atrial Fibrillation Screening with Attention Network. IEEE J. Biomed. Health Inform. 2024, 28, 5124–5135.
- Liu, H.; Ding, Y.; Zhou, M.; Li, Q. Adaptive-Weight Network for Imaging Photoplethysmography Signal Extraction and Heart Rate Estimation. IEEE Trans. Instrum. Meas. 2022, 71, 5023909.
- Gao, H.; Zhang, C.; Pei, S.; Wu, X. Region of Interest Analysis Using Delaunay Triangulation for Facial Video-Based Heart Rate Estimation. IEEE Trans. Instrum. Meas. 2024, 73, 5009712.
- Liu, S.Q.; Yuen, P.C. Robust Remote Photoplethysmography Estimation With Environmental Noise Disentanglement. IEEE Trans. Image Process. 2024, 33, 27–41.
- Collins, M.L.; Davies, T.C. Emotion differentiation through features of eye-tracking and pupil diameter for monitoring well-being. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 24–27 July 2023; pp. 1–4.
- Qi, Y.; Chee, Y.J.; Miao, C.; Zheng, S.; Jie Choo, T.W.; Zhang, R.; Wang, Q.; Qi Zhou, M.Y.; Olivo, M.; Dalan, R.; et al. An automated optical flow-mediated dilation method for fast screening of endothelial function. Biomed. Signal Process. Control 2026, 118, 109785.
- Liu, X.; Zhang, Y.; Yu, Z.; Lu, H.; Yue, H.; Yang, J. rPPG-MAE: Self-Supervised Pretraining With Masked Autoencoders for Remote Physiological Measurements. IEEE Trans. Multimed. 2024, 26, 7278–7293.
- Liu, L.; Xia, Z.; Zhang, X.; Feng, X.; Zhao, G. Illumination Variation-Resistant Network for Heart Rate Measurement by Exploring RGB and MSR Spaces. IEEE Trans. Instrum. Meas. 2024, 73, 5026613.
- Freitas, A.; Almeida, R.; Gonçalves, H.; Conceição, G.; Freitas, A. Monitoring fatigue and drowsiness in motor vehicle occupants using electrocardiogram and heart rate - A systematic review. Transp. Res. Part F Traffic Psychol. Behav. 2024, 103, 586–607.
- Boccignone, G.; Conte, D.; Cuculo, V.; D’Amelio, A.; Grossi, G.; Lanzarotti, R.; Mortara, E. pyVHR: A Python framework for remote photoplethysmography. PeerJ Comput. Sci. 2022, 8, e929.
- Gupta, K.; Sinhal, R.; Badhiye, S.S. Remote photoplethysmography-based human vital sign prediction using cyclical algorithm. J. Biophotonics 2024, 17, e202300286.
- Rosales, C.; Jácome, L.; Carrión, J.; Jaramillo, C.; Palma, M. Computer vision for detection of body expressions of children with cerebral palsy. In Proceedings of the 2017 IEEE Second Ecuador Technical Chapters Meeting (ETCM), Salinas, Ecuador, 16–20 October 2017; pp. 1–6.
- Redd, C.B.; Silvera-Tawil, D.; Hopp, D.; Zandberg, D.; Martiniuk, A.; Dietrich, C.; Karunanithi, M.K. Physiological Signal Monitoring for Identification of Emotional Dysregulation in Children. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 4273–4277.
- Qiao, D.; Ayesha, A.H.; Zulkernine, F.; Jaffar, N.; Masroor, R. ReViSe: Remote Vital Signs Measurement Using Smartphone Camera. IEEE Access 2022, 10, 131656–131670.
- Liu, Y.; Xu, C.; Qi, L.; Li, Y. A robust non-contact heart rate estimation from facial video based on a non-parametric signal extraction model. Biomed. Signal Process. Control 2024, 93, 106186.
- Jo, J.; Yoon, Y.C. Remote Heart Rate Estimation Using Attention-targeted Self-Supervised Learning Methods. Int. J. Adv. Sci. Eng. Inf. Technol. 2023, 13, 870.
- Benezeth, Y.; Krishnamoorthy, D.; Botina Monsalve, D.J.; Nakamura, K.; Gomez, R.; Mitéran, J. Video-based heart rate estimation from challenging scenarios using synthetic video generation. Biomed. Signal Process. Control 2024, 96, 106598.
- Wang, J.; Lu, H.; Wang, A.; Chen, Y.; He, D. Hierarchical Style-Aware Domain Generalization for Remote Physiological Measurement. IEEE J. Biomed. Health Inform. 2024, 28, 1635–1643.
- Liu, X.; Yang, X.; Li, X. HRUNet: Assessing Uncertainty in Heart Rates Measured From Facial Videos. IEEE J. Biomed. Health Inform. 2024, 28, 2955–2966.
- Das, M.; Choudhary, T.; Bhuyan, M.K.; Sharma, L.N. Non-Contact Heart Rate Measurement From Facial Video Data Using a 2D-VMD Scheme. IEEE Sens. J. 2022, 22, 11153–11161.
- Ouzar, Y.; Djeldjli, D.; Bousefsaf, F.; Maaoui, C. X-iPPGNet: A novel one stage deep learning architecture based on depthwise separable convolutions for video-based pulse rate estimation. Comput. Biol. Med. 2023, 154, 106592.
- Park, S.; Kim, B.K.; Dong, S.Y. Self-supervised RGB-NIR Fusion Video Vision Transformer Framework for rPPG Estimation. IEEE Trans. Instrum. Meas. 2022, 71, 5024910.
- Boccignone, G.; Conte, D.; Cuculo, V.; D’Amelio, A.; Grossi, G.; Lanzarotti, R. An Open Framework for Remote-PPG Methods and Their Assessment. IEEE Access 2020, 8, 216083–216103.
- Tangjui, N.; Taeprasartsit, P. Robust Method for Non-Contact Vital Sign Measurement in Videos Acquired in Real-World Light Settings From Skin Less Affected by Blood Perfusion. IEEE Access 2024, 12, 28582–28597.
- Shao, H.; Luo, L.; Qian, J.; Chen, S.; Hu, C.; Yang, J. TranPulse: Remote Photoplethysmography Estimation with Time-Varying Supervision to Disentangle Multiphysiologically Interference. IEEE Trans. Instrum. Meas. 2024, 73, 5029911.
- Kurihara, K.; Sugimura, D.; Hamamoto, T. Non-Contact Heart Rate Estimation via Adaptive RGB/NIR Signal Fusion. IEEE Trans. Image Process. 2021, 30, 6528–6543.
- Molinaro, N.; Zangarelli, F.; Schena, E.; Silvestri, S.; Massaroni, C. Cardiorespiratory Parameters Monitoring Through a Single Digital Camera in Real Scenarios: ROI Tracking and Motion Influence. IEEE Sens. J. 2023, 23, 20097–20106.
- Mirabet-Herranz, N.; Galdi, C.; Dugelay, J.L. Facial Biometrics in the Social Media Era: An In-Depth Analysis of the Challenge Posed by Beautification Filters. IEEE Trans. Biom. Behav. Identity Sci. 2024, 7, 108–117.
- Othman, W.; Kashevnik, A.; Ali, A.; Shilov, N.; Ryumin, D. Remote Heart Rate Estimation Based on Transformer with Multi-Skip Connection Decoder: Method and Evaluation in the Wild. Sensors 2024, 24, 775.
- Lysenko, S.; Seethapathi, N.; Prosser, L.; Kording, K.; Johnson, M.J. Towards Automated Emotion Classification of Atypically and Typically Developing Infants. In Proceedings of the 2020 8th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob), New York, NY, USA, 29 November–1 December 2020; pp. 503–508.
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.


| Reference | QA1 | QA2 | QA3 | QA4 | QA5 | Total Value |
|---|---|---|---|---|---|---|
| [1] | 1 | 1 | 1 | 1 | 1 | 5 |
| [2] | 1 | 1 | 1 | 1 | 0 | 4 |
| [4] | 1 | 0.5 | 1 | 1 | 0 | 3.5 |
| [7] | 1 | 1 | 1 | 1 | 0 | 4 |
| [9] | 1 | 1 | 1 | 1 | 0 | 4 |
| [10] | 1 | 1 | 1 | 1 | 1 | 5 |
| [12] | 1 | 1 | 1 | 1 | 0.5 | 4.5 |
| [13] | 1 | 1 | 1 | 1 | 0 | 4 |
| [14] | 0 | 1 | 1 | 1 | 0 | 3 |
| [15] | 1 | 1 | 1 | 1 | 0 | 4 |
| [16] | 1 | 1 | 1 | 1 | 1 | 5 |
| [17] | 1 | 1 | 1 | 1 | 0.5 | 4.5 |
| [18] | 1 | 1 | 1 | 1 | 1 | 5 |
| [19] | 1 | 1 | 1 | 1 | 0.5 | 4.5 |
| [20] | 1 | 1 | 1 | 1 | 0 | 4 |
| [21] | 1 | 1 | 1 | 1 | 0 | 4 |
| [5] | 1 | 1 | 1 | 1 | 0 | 4 |
| [22] | 1 | 1 | 1 | 1 | 0 | 4 |
| [6] | 1 | 1 | 1 | 1 | 1 | 5 |
| [3] | 1 | 1 | 1 | 1 | 1 | 5 |
| [23] | 1 | 1 | 1 | 1 | 0 | 4 |
| [24] | 1 | 1 | 1 | 1 | 1 | 5 |
| [25] | 1 | 1 | 1 | 1 | 0.5 | 4.5 |
| [8] | 1 | 1 | 1 | 1 | 0 | 4 |
| [26] | 1 | 1 | 1 | 1 | 0 | 4 |
| [27] | 1 | 1 | 1 | 1 | 0 | 4 |
| [28] | 1 | 1 | 1 | 1 | 0 | 4 |
| [29] | 1 | 1 | 1 | 1 | 0.5 | 4.5 |

| Ref | Techniques/Algorithms | Applications |
|---|---|---|
| [12] | rPPG-MAE: Masked Autoencoder, Vision Transformer (ViT), PC-STMap augmentation | Multimodal estimation: HR, HRV, and RF |
| [30] | TranPulse; Video Swin Transformer (3D); Asymmetric Encoder-Decoder | HR estimation; Multi-physiological signal disentanglement |
| [28] | pyVHR framework: POS, CHROM, PCA, SSR, ICA, GREEN, LGI, PBV, MTTS-CAN | HR estimation, HRV, blood volumetric pulse (BVP) estimation. |
| [16] | Cascade residual CNN-FPNR, ICNet, PCA, FFT, SVM classification, GEVD | Multimodal Estimation: HR, Respiratory Rate (RR), SpO2, and Temperature. |
| [19] | BlazeFace, MediaPipe, ICA, Detrending Filter, ResNet blocks | Multimodal Estimation: HR, HRV, SpO2, and Blood Pressure (BP) |
| [2] | DFT-KF-POS: ECO tracker, PFLD landmark detector, Pose Constrained Kalman Filter, POS + BCG fusion via FIR notch filters | Multimodal HR Estimation (BCG + rPPG) |
| [3] | CPulse: Homomorphic filtering, Empirical Wavelet Transform (EWT), and Principal Component Analysis (PCA) | HR measurement. |
| [26] | X-iPPGNet; 3D Depthwise Separable Convolutions; Xception-inspired architecture | Pulse Rate (Heart Rate) estimation. |
| [24] | HRUNet; Bayesian Neural Network (BNN); STFT; Variational inference; CNN | HR measurement; Arrhythmia screening (Atrial Fibrillation) |
| [13] | TST-SFA (Time–Space Transformer); Multiscale Retinex (MSR); ResNet-101 | HR measurement |
| [14] | Systematic review: CNN, LSTM, SVM, KNN, Random Forest, BPNN | HR, HRV, Respiratory Rate, Drowsiness, and Fatigue monitoring |
| [23] | HSRD (Hierarchical Style-aware Representation Disentangling), ResNet-18, AdaIN | HR Estimation, HRV |
| [6] | CNN with Attention Mechanism (HRV-B, HRV-M, Dual-M), POS-based rPPG, OpenFace | Atrial Fibrillation (AF) Screening, HRV indices |
| [20] | CiSSA (Circulant Singular Spectrum Analysis); JBSS; IVA-G; CHROM | HR estimation |
| [8] | DD-ROI (Data-Driven ROI); Delaunay Triangulation; MediaPipe; CHROM; POS; PBV. | HR estimation |
| [27] | Fusion Video Vision Transformer (Fusion ViViT), Contrastive Learning (SSL) | Remote HR Estimation (Near-instant ∼6 s) |
| [29] | ICA with Savitzky–Golay filter, Butterworth filter, nACF, and SAM | HR measurement from non-facial regions (forearm) |
| [22] | PhysNet; RTPPG (3D-CNN); KDE (Kernel Density Estimation); Image animation | HR estimation in NIR and Fitness |
| [7] | Adaptive-weight network (1D CNN), LSTM (3-layer), dlib landmark detector | HR Estimation, HRV |
| [21] | PhysFormer, Temporal difference transformer | HR Estimation, rPPG prediction |
| [25] | 2D Variational Mode Decomposition (2D-VMD), AAPSD, Multimode Kurtosis, ICA | HR Estimation |
| [5] | Project_ICA (Signal projection+ICA), KLT tracker, Viola-Jones detector | HR Estimation |
| [1] | FastICA (Independent Component Analysis); FFT; Jacobi SVD; Fixed-point arithmetic | HR detection |
| [4] | rPPG, Butterworth band pass filter, Peak counting | HR (Pulse Rate) Estimation |
| [18] | Logistic Regression (LR), Support Vector Machine (SVM), Decision Trees (DT) | Emotional State Classification (Meltdown, Frustrated, Happy, etc.) |
| [17] | Viola-Jones, AdaBoost, Haar features, Cascade of classifiers | Body Pattern Detection (Headache, Happiness, Hunger, Fear, Recreation) |

| Ref | Key Concepts | Key Results |
|---|---|---|
| [12] | Self-supervised Learning, Vision Transformer (ViT), PC-STMap (POS/CHROM with STMap) | MAE: 4.52 bpm (VIPL-HR), 0.40 bpm (PURE), 0.29 bpm (UBFC-rPPG). |
| [30] | Deep Learning; Vision Transformer (ViT); Spatio-temporal modeling; High-dimensional reconstruction | MAE: 0.41 bpm (UBFC-rPPG), 0.54 bpm (COHFACE), 4.12 bpm (VIPL-HR). Pearson . |
| [28] | Deep Learning (MTTS-CAN), Python 3.9 framework, face tracking (Kalman filter, MTCNN), GPU acceleration | MAE < 1 bpm for top methods; POS and CHROM outperformed baseline. |
| [16] | Deep Learning (CNN), FCN-8S segmentation, Multimodal imagery framework, Noise reduction (SCNAU) | HR accuracy: 99.83%; Precision: 90.37%; HR RMSE: 5.26 bpm, PCC: 0.923. |
| [19] | Deep Learning (ResNet for BP, BlazeFace for detection), intensity-based rPPG, cloud-based processing | HR MAE: 1.73 bpm (TokyoTech), 3.95 bpm (PURE). BP MAE: 6.7 mmHg (SBP)/9.6 mmHg (DBP). |
| [2] | Deep Learning (PFLD), Multimodal Integration (video rPPG + cushion BCG), Pose Constrained Kalman Filter | MAE 3.12 bpm lower than POS alone; Pearson correlation |
| [3] | Signal processing (EWT and PCA) to isolate blood volumetric pulse (BVP) | MAE: 0.98 bpm (PURE), 1.06 bpm (UBFC), 1.99 bpm (COHFACE), 1.81 bpm (ASIPL). |
| [26] | Deep Learning; CNN(3D-CNN); Color channel decoupling; Xception network | MMSE-HR: MAE = 4.10, RMSE = 5.32; UBFC-rPPG: MAE = 4.99; MAHNOB-HCI: MAE = 3.17. |
| [24] | Deep Learning; Bayesian posterior estimation; Uncertainty assessment | Health-HR-NSR: MAE = 1.758; Health-HR-AF: MAE = 5.412; Outperforms SOTA when uncertainty > 0.4 is excluded. |
| [13] | Deep Learning; Space-shared/specific features; Transformer; Affinity variation loss | VIPL-HR: MAE = 4.39; COHFACE: MAE = 1.31; BH-rPPG: MAE = 2.73. |
| [14] | Deep Learning, CNN, LSTM, Multimodal Integration (physiological + behavioral data) | Accuracy: Fatigue alerts 93.4% (mCNN); Drowsiness classification 91% (CNN-LSTM); HRV-based BPNN 88%. |
| [23] | Deep Learning, CNN(ResNet), Adversarial Learning, Domain Generalization, Representation Disentangling | MAE: 1.05 (PURE), 8.00 (VIPL-HR), 0.54 (UBFC), 8.66 (V4V) |
| [6] | Deep Learning, CNN, Attention Mechanism, Multi-task Learning, Motion Analysis | Sensitivity: 96.62%, Specificity: 90.61%, AUC: 0.96. |
| [8] | Supervised ROI selection; Facial mesh modeling; Dynamic skin segmentation | MAE: 0.55 bpm (PURE), 2.24 bpm (IIP-W), 6.19 bpm (IIP-F). Significant reduction in MAE vs. standard ROI |
| [27] | Deep Learning, Video Vision Transformer, Self-Supervised Learning, RGB-NIR Fusion, Multimodal Integration | RMSE: 14.86 bpm (VIPL-HR SSL); 16.94 bpm (MR-NIR-Car Transfer learning) |
| [29] | Non-facial iPPG, Signal processing (SG filter, SAM), robust to AC fluorescent light interference | Reduced errors by 83% for fluorescent lighting. MAE: 3.32 bpm under fluorescent tubes. |
| [22] | Deep Learning; Data Augmentation; Synthetic video generation; Near-infrared (NIR); Transfer Learning | IMVIA-NIR: MAE = 3.25 bpm; ECG-Fitness: MAE = 9.32 bpm; SNR improvement: −10.9 to 4.7 dB |
| [7] | Deep Learning, 1D CNN, LSTM, Multiple ROI Integration | MAHNOB-HCI: RMSE = 7.65 bpm, Standard Deviation (STD) = 7.55, Pearson Correlation (R) = 0.82. |
| [21] | Self-supervised Learning (SSL), Transfer Learning, Attention mechanism, Multichannel input | MAE: 4.97 (COHFACE), 1.71 (UBFC1), 1.62 (UBFC2). SSL improved performance on COHFACE |
| [25] | Spatial–Temporal Filtering, Variational Mode Decomposition, Statistical Mode Selection | COHFACE (Natural light): RMSE = 2.51 bpm, . Private dataset: RMSE = 0.80 bpm, |
| [5] | Blind Source Separation (ICA), Skin Reflection Model, Face Tracking, Color Normalization | Stationary MAD: 3.30 bpm; Computer Interaction RMSE: 7.10 bpm. Pale skin (r ) vs. Dark skin (). |
| [1] | FPGA Hardware Acceleration; Verilog HDL; ROI search; Real-time monitoring | ME: −0.76 ± 5.09 bpm (16 s window); Hardware computation time: 0.034 ms to 0.710 ms. |
| [4] | Remote PPG, RGB Channel Comparison, ROI (Forehead) extraction | Accuracy: Blue channel 89.09%, Red 79.22%, Green 76.82%. |
| [18] | Wearable Sensor (Empatica E4), Multimodal Integration (HR, HRV, EDA, TEMP, ACC) | Global accuracy 68%; Person-dependent accuracy up to 85%. |
| [17] | Machine Learning, Intelligent Agent, Mobile Application Integration, Boosting technique | Accuracy: Headache 77%, Happiness 75%, Hunger 82%, Fear 88%, Recreation 77%. |
| Ref | Technical Setup (camera type, wavelength, frame rate in fps) | Signal Processing (ROI strategy, algorithm) | Dataset and Population (dataset, sample size n, age group, skin tone diversity) | Experimental Conditions (motion protocol, illumination) | Validation and Metrics (reference device, e.g., ECG or PPG; outcome metrics, e.g., MAE, RMSE) | Significance and Limitations (key limitation, clinical relevance) |
|---|---|---|---|---|---|---|
| [6] | Logitech C920 webcam/CCD camera, RGB, 30/84 fps | IBI consistency, HRV indices; CNN-based Attention Network | 657 participants (Largest AF database); Mean age ≈ 71.67 years; Taiwanese clinical sites | Talking, facial expressions, head movements; Ambient fluorescent lighting (200–400 lx) | Single-lead Sigknow EZYPRO ECG patch; Sensitivity 96.62%, Specificity 90.61% | Contact-free Atrial Fibrillation screening; Clinical relevance for telemedicine; Limitation: privacy and dark ambient lighting. |
| [20] | Logitech C930, C310, RGB, 30 fps | Cheek ROI (standardized rectangular); HRUNet (Bayesian Neural Network) with integrated Fourier transform | Health-HR, UBFC, OBF, PURE, MMSE-HR, COHFACE, VIPL-HR; Chinese ethnicity; diverse skin colors in MMSE | Undisturbed, Motion-disturbed (nodding, speaking, moving), Light-disturbed (dark/dim) | Medical-grade oximeter; MAE: 1.73 (undisturbed), 7.96 (motion), 3.25 (light) bpm | Quantifies measurement uncertainty; Limitation: Uncertainty may fail to flag periodic noise (e.g., running frequency). |
| [28] | RGB-video camera, Visible light, Variable (25, 30, 61 fps) | Face detection (MTCNN, Kalman), ROI (Patch/Skin thresholding); POS, CHROM, PCA, SSR, ICA, GREEN, LGI, PBV | PURE, LGI, UBFC, MAHNOB, COHFACE; n up to 164; Age 19–40 (MAHNOB); Skin tone not in source | Resting, talking, head rotation, translation; Controlled and natural light | ECG, BVP; MAE, RMSE, PCC | Proposes open Python framework (pyVHR); Limitation: lack of standardized pre/post-processing and reproducible evaluation. |
| [30] | Webcam or phone camera, RGB, 25 fps | Facial frame differences; TranPulse (Two-stage Video Swin Transformer) | UBFC-rPPG, COHFACE, VIPL-HR, PURE; Imbalanced population distributions | Head rotation, talking, expressions; Studio, bright, and dim lighting | Realistic heart waveforms (PPG/ECG); MAE: 4.69 (VIPL-HR), RMSE: 7.53 (VIPL-HR) | Disentangles multiphysiological interference (e.g., respiration); Limitation: slightly worse performance in dim light. |
| [26] | RGB cameras, 25–61 fps | Face segmentation and cropping; X-iPPGNet: 3D Depthwise Separable Convolutions | BP4D+, MMSE-HR, MAHNOB-HCI; Highly diverse (Black, White, Asian, Hispanic) | Spontaneous facial expressions, significant head motions, occlusions | Contact sensors (BVP/ECG); MAE: 4.10 (MMSE), 4.99 (UBFC), 3.17 (MAHNOB) bpm | End-to-end PR estimation from 2 s windows; Limitation: No BVP signal recovery prevents pulse wave feature analysis. |
| [29] | Eyepiece industrial microscope, RGB, 25 fps | Non-facial skin (forearm); Savitzky–Golay filter, ICA, FFT spectrum accumulation mechanism (SAM) | SF-VS dataset; Age 18–74; Fitzpatrick types II to V | Minimal movement; Ring LED, LED downlight, and ceiling fluorescent tubes | Arduino-based pulse sensor; MAE: 0.73 to 3.44 bpm | Addressed AC light interference in non-facial regions; Limitations: lacks Fitzpatrick Type-I; misses some clinical requirements. |
| [8] | Microsoft Lifecam Studio/Logitech C920, RGB, 20–61 fps | Delaunay Triangulation (898 triangular ROIs); Data-driven ROI (DD-ROI) selection | PURE, IIP-F, IIP-W, COHFACE, UBFC, MAHNOB-HCI; Asians and Caucasians | Resting, talking, facial rotation; Varying light intensity | Contact PPG/ECG; MAE: 1.70 (PURE), 2.24 (IIP-W) | Systematic analysis of facial regions; Recommendation of forehead/cheek ROIs; Limitation: Occlusion (hats/hair). |
| [23] | Commodity cameras, RGB, 30 fps | STMap, Hierarchical Style-aware Representation Disentangling (HSRD) using ResNet-18 | VIPL-HR, V4V, PURE, BUAA, UBFC-rPPG; Diverse skin color and gender | Varying illumination, different movement levels, complex backgrounds | ECG/PPG (CMS50E); MAE: 4.31 (UBFC), RMSE: 6.30 (UBFC) | Addresses domain shift and instance-specific variation; Limitation: imprecise domain categorization in implicit DG |
| [3] | Webcam (Logitech B525), Smartphone (Nord2), RGB, 20–30 fps | Sub-ROIs (25 × 25); Homomorphic filtering, EWT (Empirical Wavelet Transform), and PCA | PURE, UBFC, COHFACE, ASIPL; High diversity including darker skin tones (India) | Natural and artificial light; rigid/nonrigid head motion and facial expressions | Finger pulse oximeter (CMS-60C); MAE: 0.98 (PURE), 1.06 (UBFC), 1.99 (COHFACE), 1.81 (ASIPL) bpm | Substantial tolerance to varying skin tones and motion; Limitation: rapid facial expression changes. |
| [16] | Logitech C920 HD Pro, RGB, 30 fps | ROI (face, nose); Cyclical algorithm (PCA + FFT), Cascade residual CNN-FPNR, ICNet | UBFC-rPPG; Age 18–35; Caucasian, Asian, Hispanic (von Luschan chart) | Indoor sitting, spontaneous movements; Sunlight and fluorescent ceiling lamps | CMS50E pulse oximeter (PPG); MAE, RMSE, MSE, PCC | Predicts multiple vitals (HR, RR, SpO2); Accuracy increases when adding trends to models. |
| [19] | Smartphone camera, Visible (RGB), 30 fps | MediaPipe Face Mesh (478 landmarks), ROIs (Forehead, cheek, nose); ICA, ResNet | TokyoTech, PURE, Video-HR, Video-BP; Age 10–80 years | Steady, talking, rotation, daily living environment; Ambient lights | Finger PPG, Pulse oximeter, Andesfit BP Monitor; MAE | Validated in daily living environments; Commercialized as ‘Veyetals’ mobile app (initial release version, 2022). |
| [25] | Webcam (QHMPL), Green channel, 6–20 fps | ROI (Forehead, cheeks); 2D-VMD (Variational Mode Decomposition) with AAPSD | Private dataset and COHFACE; Variable skin tones | Sitting, no large movements; Studio (Halogen) and natural light | Contact PPG (Nonin SenSmart); ME, RMSE, SD, Correlation (r) | 2D-VMD reduces error significantly compared to ICA; Limitation: Dlib face detection fails during head rotations. |
| [22] | RGB and Near-infrared (NIR) cameras; 30 fps | Face tracking (BlazeFace) and skin detection; PhysNet and RTrPPG (3D-CNN) | MERL-RICE, TokyoTech-NIR, ECG-Fitness, IMVIA-NIR (varying ethnicities) | Fitness activities (rowing, running, biking), low-light/nighttime (NIR) | Pulse oximeter, finger PPG sensor; MAE (lowered to 1.63 bpm with DA on MERL) | Synthetic video generation for data augmentation; Addresses scarcity of NIR and movement-heavy datasets. |
| [5] | Logitech C270i, RGB, 30 fps | Face tracking/Skin detection; Project_ICA (Simplified skin reflection model) | 28 participants (18 pale, 10 dark skin); 112 videos | Stationary, computer interaction, swinging heads, exercise recovery; External daylight | Finger pulse oximeter (YUWELL YX303); MAD (3.30 stationary), RMSE (7.21 stationary) | Robust against head movement; Key limitation: Obtaining reliable measurements from dark-skinned subjects. |
| [21] | High-resolution camera, RGB, 30 fps | Facial regions (cheek, forehead, jaw); PhysFormer (Video Transformer) with SSL | COHFACE, UBFC1, UBFC2, PURE; Various skin tones | Talking, reading, exercising; Different lighting and camera angles | Contactless imaging PPG; MAE: 4.97 (COHFACE), RMSE: 5.71 (COHFACE) | Reduces reliance on labeled data through SSL; Limitation: Pearson correlation was not improved. |
| [1] | TRDB-D5M camera, RGB, 16 fps | Skin detection rules; ICA implemented on FPGA | 12 subjects, 142 data points, Age 21–28, Male/Female | Resting, mitigated motion artifacts; Ambient lighting | Omron HEM-6111; Accuracy (ME ± 1.96SD) of −0.76 ± 5.09 bpm (16 s) | Fast FPGA hardware accelerator for real-time edge processing; Limitation: reduced accuracy from fixed-point precision. |
| [7] | Consumer-level camera, RGB (Green channel), 30 fps | Multiple cheek ROIs (annulus/circular); Adaptive-Weight Network with LSTM | MAHNOB-HCI, 27 subjects, Age 19–40; Slight facial expressions | Emotional stimulation scenarios; Slight facial expressions, small body movements | ECG; RMSE 7.65 bpm, Pearson’s R 0.65 | Dynamically selected ROI weights; Limitation: ROI loss during head turning. |
| [4] | Mobile phone camera, RGB, 30.35 fps | Manual static forehead ROI; Color Intensity Pulse Count (Butterworth band-pass filter) | DMIMS database; Women aged 20–35; Skin tone diversity not specified | No motion protocol (subjects held hands up), Controlled environment | MP20 monitor with ixTrend software (Version not available); Accuracy (Blue channel: 89.09%, Red: 79.22%, Green: 76.82%) | Low accuracy due to lack of SNR improvement; Demonstrates feasibility via mobile phone without specialized hardware. |
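
Several of the signal-processing pipelines summarized above (for example those listing FFT or band-pass filtering) end by locating the dominant frequency of the recovered pulse trace within a plausible heart-rate band. The sketch below illustrates only that final step on a synthetic trace, in plain Python. It is a generic illustration, not the method of any reviewed study; the function name, the 42–240 bpm search band, the Hann window, and the synthetic signal parameters are all assumptions chosen for the example.

```python
import math


def estimate_hr_bpm(signal, fps, lo_bpm=42, hi_bpm=240):
    """Return the heart rate (bpm) with the highest spectral power.

    The signal is mean-centered, Hann-windowed, and evaluated with a
    direct DFT at 1-bpm steps inside [lo_bpm, hi_bpm]. The band limits
    are illustrative assumptions, not values from a specific study.
    """
    n = len(signal)
    mean = sum(signal) / n
    # Remove the DC level and taper the ends to limit spectral leakage.
    tapered = [
        (s - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(signal)
    ]
    best_bpm, best_power = None, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        omega = 2 * math.pi * (bpm / 60.0) / fps  # radians per frame
        re = sum(x * math.cos(omega * i) for i, x in enumerate(tapered))
        im = sum(x * math.sin(omega * i) for i, x in enumerate(tapered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Synthetic stand-in for a mean green-channel ROI trace (hypothetical values).
fps = 30          # frame rate in frames per second
duration_s = 10   # analysis window length in seconds
true_bpm = 75     # pulse rate embedded in the synthetic trace
trace = [
    100.0                                                        # baseline intensity
    + 0.5 * math.sin(2 * math.pi * (true_bpm / 60.0) * i / fps)  # pulse component
    + 2.0 * math.sin(2 * math.pi * 0.25 * i / fps)               # slow 0.25 Hz drift
    for i in range(fps * duration_s)
]

hr = estimate_hr_bpm(trace, fps)
```

Restricting the search to a heart-rate band is what lets the estimator ignore the much larger slow drift in this example. Real recordings with head motion or flickering light can place interference inside the band, which is the situation the motion- and illumination-robust methods in the table are designed to handle.
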
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Nava-Bautista, M.X.; Castillo-Topete, V.H.; Molina-Cantero, A.J.; Gómez-González, I.M. Towards Objective Emotional Monitoring in Children with Cerebral Palsy: A Review of rPPG and Multimodal Approaches. Appl. Sci. 2026, 16, 5502. https://doi.org/10.3390/app16115502

