Search Results (29)

Search Parameters:
Keywords = RGB face videos

42 pages, 3140 KiB  
Review
Face Anti-Spoofing Based on Deep Learning: A Comprehensive Survey
by Huifen Xing, Siok Yee Tan, Faizan Qamar and Yuqing Jiao
Appl. Sci. 2025, 15(12), 6891; https://doi.org/10.3390/app15126891 - 18 Jun 2025
Viewed by 2122
Abstract
Face recognition has achieved tremendous success in both its theory and technology. However, with increasingly realistic attacks, such as print photos, replay videos, and 3D masks, as well as new attack methods like AI-generated faces or videos, face recognition systems are confronted with significant challenges and risks. Distinguishing between real and fake faces, i.e., face anti-spoofing (FAS), is crucial to the security of face recognition systems. With the advent of large-scale academic datasets in recent years, FAS based on deep learning has achieved a remarkable level of performance and now dominates the field. This paper systematically reviews the latest advancements in FAS based on deep learning. First, it provides an overview of the background, basic concepts, and types of FAS attacks. Then, it categorizes existing FAS methods from the perspectives of RGB (red, green and blue) modality and other modalities, discussing the main concepts, the types of attacks that can be detected, their advantages and disadvantages, and so on. Next, it introduces popular datasets used in FAS research and highlights their characteristics. Finally, it summarizes the current research challenges and future directions for FAS, such as its limited generalization for unknown attacks, the insufficient multi-modal research, the spatiotemporal efficiency of algorithms, and unified detection for presentation attacks and deepfakes. We aim to provide a comprehensive reference in this field and to inspire progress within the FAS community, guiding researchers toward promising directions for future work. Full article
(This article belongs to the Special Issue Deep Learning in Object Detection)

20 pages, 2511 KiB  
Article
MT-CMVAD: A Multi-Modal Transformer Framework for Cross-Modal Video Anomaly Detection
by Hantao Ding, Shengfeng Lou, Hairong Ye and Yanbing Chen
Appl. Sci. 2025, 15(12), 6773; https://doi.org/10.3390/app15126773 - 16 Jun 2025
Viewed by 855
Abstract
Video anomaly detection (VAD) faces significant challenges in multimodal semantic alignment and long-term temporal modeling within open surveillance scenarios. Existing methods are often plagued by modality discrepancies and fragmented temporal reasoning. To address these issues, we introduce MT-CMVAD, a hierarchically structured Transformer architecture that makes two key technical contributions: (1) A Context-Aware Dynamic Fusion Module that leverages cross-modal attention with learnable gating coefficients to effectively bridge the gap between RGB and optical flow modalities through adaptive feature recalibration, significantly enhancing fusion performance; (2) A Multi-Scale Spatiotemporal Transformer that establishes global-temporal dependencies via dilated attention mechanisms while preserving local spatial semantics through pyramidal feature aggregation. To address the sparse anomaly supervision dilemma, we propose a hybrid learning objective that integrates dual-stream reconstruction loss with prototype-based contrastive discrimination, enabling the joint optimization of pattern restoration and discriminative representation learning. Our extensive experiments on the UCF-Crime, UBI-Fights, and UBnormal datasets demonstrate state-of-the-art performance, achieving AUC scores of 98.9%, 94.7%, and 82.9%, respectively. The explicit spatiotemporal encoding scheme further improves temporal alignment accuracy by 2.4%, contributing to enhanced anomaly localization and overall detection accuracy. Additionally, the proposed framework achieves a 14.3% reduction in FLOPs and demonstrates 18.7% faster convergence during training, highlighting its practical value for real-world deployment. Our optimized window-shift attention mechanism also reduces computational complexity, making MT-CMVAD a robust and efficient solution for safety-critical video understanding tasks. Full article
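As a rough illustration of the gated cross-modal fusion idea summarised in this abstract, the sketch below mixes RGB and optical-flow token sequences with cross-attention and learnable sigmoid gates; the module name, dimensions, and structure are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): gated cross-modal attention
# fusing RGB and optical-flow token sequences, assuming both streams are
# already embedded to the same dimension d_model.
import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        # Each modality attends to the other.
        self.rgb_to_flow = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.flow_to_rgb = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Learnable gating coefficients decide how much cross-modal
        # context is mixed back into each stream (adaptive recalibration).
        self.gate_rgb = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.gate_flow = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, rgb, flow):                       # (B, T, d_model) each
        rgb_ctx, _ = self.rgb_to_flow(rgb, flow, flow)  # RGB queries, flow keys/values
        flow_ctx, _ = self.flow_to_rgb(flow, rgb, rgb)
        g_r = self.gate_rgb(torch.cat([rgb, rgb_ctx], dim=-1))
        g_f = self.gate_flow(torch.cat([flow, flow_ctx], dim=-1))
        return rgb + g_r * rgb_ctx, flow + g_f * flow_ctx

# Example: fuse 32-frame clips from both modalities.
rgb = torch.randn(2, 32, 256)
flow = torch.randn(2, 32, 256)
fused_rgb, fused_flow = GatedCrossModalFusion()(rgb, flow)
```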

15 pages, 6040 KiB  
Article
Estimation of Respiratory Signals from Remote Photoplethysmography of RGB Facial Videos
by Hyunsoo Seo, Seunghyun Kim and Eui Chul Lee
Electronics 2025, 14(11), 2152; https://doi.org/10.3390/electronics14112152 - 26 May 2025
Viewed by 572
Abstract
Recently, technologies monitoring users’ physiological signals in consumer electronics such as smartphones or kiosks with cameras and displays are gaining attention for their potential role in diverse services. While many of these technologies focus on photoplethysmography for the measurement of blood flow changes, respiratory measurement is also essential for assessing an individual’s health status. Previous studies have proposed thermal camera-based and body movement-based respiratory measurement methods. In this paper, we adopt an approach to extract respiratory signals from RGB face videos using photoplethysmography. Prior research shows that photoplethysmography can measure respiratory signals, due to its correlation with cardiac activity, by setting arterial vessel regions as areas of interest for respiratory measurement. However, this correlation does not directly reflect real-time respiratory components in photoplethysmography. Our new approach measures the respiratory rate by capturing changes in skin brightness from motion artifacts. We utilize these brightness factors, including facial movement, for respiratory signal measurement. We applied the wavelet transform and smoothing filters to remove other unrelated motion artifacts. In order to validate our method, we built a dataset of respiratory rate measurements from 20 individuals using an RGB camera in a facial movement-aware environment. Our approach demonstrated a similar performance level to the reference signal obtained with a contact-based respiratory belt, with a correlation above 0.9 and an MAE within 1 bpm. Moreover, our approach offers advantages for real-time measurements, excluding complex computational processes for measuring optical flow caused by the movement of the chest due to respiration. Full article
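To make the brightness-based idea concrete, here is a minimal sketch (not the paper's wavelet-based pipeline) that estimates breaths per minute from a per-frame mean skin-brightness trace using a band-pass filter and an FFT peak; the band limits and filter order are illustrative assumptions.

```python
# Minimal sketch of the idea: track mean facial-skin brightness per frame,
# isolate the respiratory band (~0.1-0.5 Hz), and read the breathing rate
# from the dominant spectral peak.
import numpy as np
from scipy.signal import butter, filtfilt

def respiratory_rate_bpm(brightness, fps=30.0):
    """brightness: 1-D array of per-frame mean skin intensity."""
    x = brightness - np.mean(brightness)
    # Band-pass to typical respiration frequencies (6-30 breaths/min).
    b, a = butter(3, [0.1, 0.5], btype="band", fs=fps)
    x = filtfilt(b, a, x)
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= 0.1) & (freqs <= 0.5)
    return 60.0 * freqs[band][np.argmax(spec[band])]

# Example with a synthetic 0.25 Hz (15 bpm) signal plus noise.
t = np.arange(0, 60, 1 / 30)
signal = np.sin(2 * np.pi * 0.25 * t) + 0.3 * np.random.randn(len(t))
print(round(respiratory_rate_bpm(signal)))   # ~15
```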

12 pages, 2710 KiB  
Article
Smartphone Video Imaging Combined with Machine Learning: A Cost-Effective Method for Authenticating Whey Protein Supplements
by Xuan Tang, Wenjiao Du, Weiran Song, Weilun Gu and Xiangzeng Kong
Foods 2025, 14(7), 1277; https://doi.org/10.3390/foods14071277 - 5 Apr 2025
Viewed by 704
Abstract
With the growing interest in health and fitness, whey protein supplements are becoming increasingly popular among fitness enthusiasts and athletes. The surge in demand for whey protein supplements highlights the need for cost-effective methods to characterise product quality throughout the food supply chain. This study presents a rapid and low-cost method for authenticating sports whey protein supplements using smartphone video imaging (SVI) combined with machine learning. A gradient of colours ranging from purple to red is displayed on the front screen of a smartphone to illuminate the sample. The colour change on the sample surface is captured in a short video by the front-facing camera. Then, the video is split into frames, decomposed into RGB colour channels, and converted into spectral data. The relationship between video data and sample labels is established using machine learning models. The proposed method is tested on five tasks, including identifying 15 brands of whey protein concentrate (WPC), quantifying fat content and energy levels, detecting three types of adulterants, and quantifying adulterant levels. Moreover, the performance of SVI was compared to that of hyperspectral imaging (HSI), which has an equipment cost of around 80 times that of SVI. The proposed method achieves accuracies of 0.933 and 0.96 in WPC brand identification and adulterant detection, respectively, which are only around 0.05 lower than those of HSI. It obtains coefficients of determination of 0.897, 0.906 and 0.963 for the quantification of fat content, energy levels and milk powder adulteration, respectively. Such results demonstrate that the combination of smartphones and machine learning offers a low-cost and viable preliminary screening tool for verifying the authenticity of whey protein supplements. Full article
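The video-to-spectrum step described here can be pictured with the following hedged sketch: each frame, captured under a different screen colour, is reduced to mean colour-channel values, and the concatenated vector is handed to a generic classifier. The file names, frame count, and classifier choice are hypothetical, not the study's settings.

```python
# Illustrative sketch of the screen-illumination idea: every frame lit by a
# different display colour contributes its mean B, G, R response, and the
# concatenation over frames acts as a spectrum-like feature vector.
import cv2
import numpy as np
from sklearn.svm import SVC

def video_to_feature(path, n_frames=60):
    cap = cv2.VideoCapture(path)
    feats = []
    while len(feats) < n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        feats.append(frame.reshape(-1, 3).mean(axis=0))  # mean B, G, R per frame
    cap.release()
    return np.concatenate(feats)                         # shape: (n_frames * 3,)

# Hypothetical training data: videos of authentic vs. adulterated samples.
X = np.array([video_to_feature(p) for p in ["sample_ok.mp4", "sample_adulterated.mp4"]])
y = np.array([0, 1])
clf = SVC(kernel="rbf").fit(X, y)
```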

57 pages, 8107 KiB  
Review
Machine Learning for Human Activity Recognition: State-of-the-Art Techniques and Emerging Trends
by Md Amran Hossen and Pg Emeroylariffion Abas
J. Imaging 2025, 11(3), 91; https://doi.org/10.3390/jimaging11030091 - 20 Mar 2025
Cited by 2 | Viewed by 4240
Abstract
Human activity recognition (HAR) has emerged as a transformative field with widespread applications, leveraging diverse sensor modalities to accurately identify and classify human activities. This paper provides a comprehensive review of HAR techniques, focusing on the integration of sensor-based, vision-based, and hybrid methodologies. It explores the strengths and limitations of commonly used modalities, such as RGB images/videos, depth sensors, motion capture systems, wearable devices, and emerging technologies like radar and Wi-Fi channel state information. The review also discusses traditional machine learning approaches, including supervised and unsupervised learning, alongside cutting-edge advancements in deep learning, such as convolutional and recurrent neural networks, attention mechanisms, and reinforcement learning frameworks. Despite significant progress, HAR still faces critical challenges, including handling environmental variability, ensuring model interpretability, and achieving high recognition accuracy in complex, real-world scenarios. Future research directions emphasise the need for improved multimodal sensor fusion, adaptive and personalised models, and the integration of edge computing for real-time analysis. Additionally, addressing ethical considerations, such as privacy and algorithmic fairness, remains a priority as HAR systems become more pervasive. This study highlights the evolving landscape of HAR and outlines strategies for future advancements that can enhance the reliability and applicability of HAR technologies in diverse domains. Full article

20 pages, 25584 KiB  
Article
LIDeepDet: Deepfake Detection via Image Decomposition and Advanced Lighting Information Analysis
by Zhimao Lai, Jicheng Li, Chuntao Wang, Jianhua Wu and Donghua Jiang
Electronics 2024, 13(22), 4466; https://doi.org/10.3390/electronics13224466 - 14 Nov 2024
Cited by 2 | Viewed by 2420
Abstract
The proliferation of AI-generated content (AIGC) has empowered non-experts to create highly realistic Deepfake images and videos using user-friendly software, posing significant challenges to the legal system, particularly in criminal investigations, court proceedings, and accident analyses. The absence of reliable Deepfake verification methods threatens the integrity of legal processes. In response, researchers have explored deep forgery detection, proposing various forensic techniques. However, the swift evolution of deep forgery creation and the limited generalizability of current detection methods impede practical application. We introduce a new deep forgery detection method that utilizes image decomposition and lighting inconsistency. By exploiting inherent discrepancies in imaging environments between genuine and fabricated images, this method extracts robust lighting cues and mitigates disturbances from environmental factors, revealing deeper-level alterations. A crucial element is the lighting information feature extractor, designed according to color constancy principles, to identify inconsistencies in lighting conditions. To address lighting variations, we employ a face material feature extractor using Pattern of Local Gravitational Force (PLGF), which selectively processes image patterns with defined convolutional masks to isolate and focus on reflectance coefficients, rich in textural details essential for forgery detection. Utilizing the Lambertian lighting model, we generate lighting direction vectors across frames to provide temporal context for detection. This framework processes RGB images, face reflectance maps, lighting features, and lighting direction vectors as multi-channel inputs, applying a cross-attention mechanism at the feature level to enhance detection accuracy and adaptability. Experimental results show that our proposed method performs exceptionally well and is widely applicable across multiple datasets, underscoring its importance in advancing deep forgery detection. Full article
(This article belongs to the Special Issue Deep Learning Approach for Secure and Trustworthy Biometric System)
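One ingredient mentioned in this abstract, lighting-direction estimation under the Lambertian model, can be sketched with a standard least-squares fit. This is a generic textbook formulation rather than the authors' extractor, and the face normals are assumed to come from elsewhere (for example, a fitted 3D face model).

```python
# Hedged sketch: estimate a per-frame lighting direction under the
# Lambertian model I = albedo * (n . l), given sampled pixel intensities
# and the corresponding surface normals.
import numpy as np

def estimate_light_direction(intensities, normals):
    """
    intensities: (N,) grayscale values at sampled face pixels
    normals:     (N, 3) unit surface normals at the same pixels
    Returns a unit 3-vector approximating the dominant light direction
    (the albedo scale is absorbed before normalisation).
    """
    l, *_ = np.linalg.lstsq(normals, intensities, rcond=None)
    return l / (np.linalg.norm(l) + 1e-8)

# Toy example: near fronto-parallel normals recover a known light direction.
normals = np.tile([0.0, 0.0, 1.0], (100, 1)) + 0.01 * np.random.randn(100, 3)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
true_l = np.array([0.2, 0.1, 0.97])
print(estimate_light_direction(normals @ true_l, normals))
```

Inconsistent directions across facial regions or over consecutive frames are the kind of lighting cue such a detector can exploit.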

20 pages, 1946 KiB  
Article
Two-Stream Modality-Based Deep Learning Approach for Enhanced Two-Person Human Interaction Recognition in Videos
by Hemel Sharker Akash, Md Abdur Rahim, Abu Saleh Musa Miah, Hyoun-Sup Lee, Si-Woong Jang and Jungpil Shin
Sensors 2024, 24(21), 7077; https://doi.org/10.3390/s24217077 - 3 Nov 2024
Cited by 2 | Viewed by 1930
Abstract
Human interaction recognition (HIR) between two people in videos is a critical field in computer vision and pattern recognition, aimed at identifying and understanding human interactions and actions for applications such as healthcare, surveillance, and human–computer interaction. Despite its significance, video-based HIR struggles to achieve satisfactory performance owing to the complexity of human actions, variations in motion, differing viewpoints, and environmental factors. In this study, we proposed a two-stream deep learning-based HIR system to address these challenges and improve the accuracy and reliability of HIR systems. The two streams extract hierarchical features from skeleton and RGB information, respectively. In the first stream, we used YOLOv8-Pose for human pose extraction, extracted features with three stacked LSM modules, and refined them with a dense layer whose output serves as the first stream's final feature. In the second stream, we applied the Segment Anything Model (SAM) to the input videos and, after filtering the SAM features, employed integrated LSTM and GRU layers to capture long-range dependencies, again refined by a dense layer whose output serves as the second stream's final feature. Here, SAM was used for segmented mesh generation, and an ImageNet-pretrained network was used for feature extraction from the images and meshes, focusing on relevant features in the sequential image data. We also created a custom filter function that improves computational efficiency by eliminating irrelevant keypoints and mesh components from the dataset. The two stream features were concatenated into a final feature vector fed into the classification module. In extensive experiments on two benchmark datasets, the proposed model achieved 96.56% and 96.16% accuracy, respectively, demonstrating its effectiveness. Full article
(This article belongs to the Special Issue Computer Vision and Sensors-Based Application for Intelligent Systems)
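The late-fusion step described above, concatenating the two stream features before classification, can be sketched as follows; all dimensions and layer sizes are placeholders rather than the paper's configuration.

```python
# Schematic sketch of the fusion head: per-stream features are projected,
# concatenated, and passed to a dense classification layer.
import torch
import torch.nn as nn

class TwoStreamFusionHead(nn.Module):
    def __init__(self, skel_dim=256, rgb_dim=256, n_classes=11):
        super().__init__()
        self.skel_dense = nn.Sequential(nn.Linear(skel_dim, 128), nn.ReLU())
        self.rgb_dense = nn.Sequential(nn.Linear(rgb_dim, 128), nn.ReLU())
        self.classifier = nn.Linear(128 + 128, n_classes)

    def forward(self, skel_feat, rgb_feat):
        fused = torch.cat([self.skel_dense(skel_feat),
                           self.rgb_dense(rgb_feat)], dim=-1)
        return self.classifier(fused)

# skel_feat would come from the pose/LSTM stream, rgb_feat from the SAM-based stream.
logits = TwoStreamFusionHead()(torch.randn(4, 256), torch.randn(4, 256))
```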

16 pages, 8982 KiB  
Article
A Two-Stream Method for Human Action Recognition Using Facial Action Cues
by Zhimao Lai, Yan Zhang and Xiubo Liang
Sensors 2024, 24(21), 6817; https://doi.org/10.3390/s24216817 - 23 Oct 2024
Cited by 1 | Viewed by 1444
Abstract
Human action recognition (HAR) is a critical area in computer vision with wide-ranging applications, including video surveillance, healthcare monitoring, and abnormal behavior detection. Current HAR methods predominantly rely on full-body data, which can limit their effectiveness in real-world scenarios where occlusion is common. In such situations, the face often remains visible, providing valuable cues for action recognition. This paper introduces Face in Action (FIA), a novel two-stream method that leverages facial action cues for robust action recognition under conditions of significant occlusion. FIA consists of an RGB stream and a landmark stream. The RGB stream processes facial image sequences using a fine-spatio-multitemporal (FSM) 3D convolution module, which employs smaller spatial receptive fields to capture detailed local facial movements and larger temporal receptive fields to model broader temporal dynamics. The landmark stream processes facial landmark sequences using a normalized temporal attention (NTA) module within an NTA-GCN block, enhancing the detection of key facial frames and improving overall recognition accuracy. We validate the effectiveness of FIA using the NTU RGB+D and NTU RGB+D 120 datasets, focusing on action categories related to medical conditions. Our experiments demonstrate that FIA significantly outperforms existing methods in scenarios with extensive occlusion, highlighting its potential for practical applications in surveillance and healthcare settings. Full article
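The receptive-field design described for the FSM module (small spatial extent, larger temporal extent) can be illustrated with a single 3D convolution; this is only a schematic of the idea, not the authors' module.

```python
# Minimal illustration: a 3D convolution with a 3x3 spatial kernel but a
# longer temporal kernel, so subtle local facial motion is aggregated
# over more frames than pixels.
import torch
import torch.nn as nn

fine_spatio_temporal = nn.Conv3d(
    in_channels=3, out_channels=32,
    kernel_size=(7, 3, 3),                 # (time, height, width)
    padding=(3, 1, 1))

clip = torch.randn(1, 3, 32, 112, 112)     # batch, channels, frames, H, W
features = fine_spatio_temporal(clip)      # -> (1, 32, 32, 112, 112)
```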

17 pages, 7063 KiB  
Article
Online Scene Semantic Understanding Based on Sparsely Correlated Network for AR
by Qianqian Wang, Junhao Song, Chenxi Du and Chen Wang
Sensors 2024, 24(14), 4756; https://doi.org/10.3390/s24144756 - 22 Jul 2024
Cited by 3 | Viewed by 1391
Abstract
Real-world understanding serves as a medium that bridges the information world and the physical world, enabling the realization of virtual–real mapping and interaction. However, scene understanding based solely on 2D images faces problems such as a lack of geometric information and limited robustness against occlusion. The depth sensor brings new opportunities, but there are still challenges in fusing depth with geometric and semantic priors. To address these concerns, our method considers the repeatability of video stream data and the sparsity of newly generated data. We introduce a sparsely correlated network architecture (SCN) designed explicitly for online RGBD instance segmentation. Additionally, we leverage the power of object-level RGB-D SLAM systems, thereby transcending the limitations of conventional approaches that solely emphasize geometry or semantics. We establish correlation over time and leverage this correlation to develop rules and generate sparse data. We thoroughly evaluate the system’s performance on the NYU Depth V2 and ScanNet V2 datasets, demonstrating that incorporating frame-to-frame correlation leads to significantly improved accuracy and consistency in instance segmentation compared to existing state-of-the-art alternatives. Moreover, using sparse data reduces data complexity while ensuring the real-time requirement of 18 fps. Furthermore, by utilizing prior knowledge of object layout understanding, we showcase a promising application of augmented reality, showcasing its potential and practicality. Full article
(This article belongs to the Special Issue 3D Reconstruction with RGB-D Cameras and Multi-sensors)

27 pages, 5718 KiB  
Article
An Interpretable Modular Deep Learning Framework for Video-Based Fall Detection
by Micheal Dutt, Aditya Gupta, Morten Goodwin and Christian W. Omlin
Appl. Sci. 2024, 14(11), 4722; https://doi.org/10.3390/app14114722 - 30 May 2024
Cited by 5 | Viewed by 3383
Abstract
Falls are a major risk factor for older adults, increasing morbidity and healthcare costs. Video-based fall-detection systems offer crucial real-time monitoring and assistance. Yet, their deployment faces challenges such as maintaining privacy, reducing false alarms, and providing understandable outputs for healthcare providers. This paper introduces an innovative automated fall-detection framework that includes a Gaussian blur module for privacy preservation, an OpenPose module for precise pose estimation, a short-time Fourier transform (STFT) module to capture frames with significant motion selectively, and a computationally efficient one-dimensional convolutional neural network (1D-CNN) classification module designed to classify these frames. Additionally, integrating a gradient-weighted class activation mapping (GradCAM) module enhances the system’s explainability by visually highlighting the movement of the key points, resulting in classification decisions. Modular flexibility in our system allows customization to meet specific privacy and monitoring needs, enabling the activation or deactivation of modules according to the operational requirements of different healthcare settings. This combination of STFT and 1D-CNN ensures fast and efficient processing, which is essential in healthcare environments where real-time response and accuracy are vital. We validated our approach across multiple datasets, including the Multiple Cameras Fall Dataset (MCFD), the UR fall dataset, and the NTU RGB+D Dataset, which demonstrates high accuracy in detecting falls and provides the interpretability of results. Full article
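Two stages of this pipeline, STFT-based selection of high-motion windows and 1D-CNN classification of the kept keypoint windows, are sketched below under assumed thresholds, joint counts, and window sizes; it is an illustration of the approach, not the published implementation.

```python
# Hedged sketch: (1) an STFT of the keypoint motion-energy signal keeps only
# windows with significant movement; (2) a small 1D-CNN classifies the kept
# keypoint windows as fall / no fall.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft

def select_active_windows(keypoints, fps=30, energy_thresh=1.0):
    """keypoints: (T, J, 2) pose coordinates per frame."""
    motion = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1).sum(axis=-1)
    f, t, Z = stft(motion, fs=fps, nperseg=fps)      # one-second windows
    energy = np.abs(Z).sum(axis=0)                   # spectral energy per window
    return t[energy > energy_thresh]                 # centre times of active windows

class FallClassifier1D(nn.Module):
    def __init__(self, n_joints=25, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_joints * 2, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, n_classes))

    def forward(self, x):                            # x: (B, joints*2, frames)
        return self.net(x)
```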

19 pages, 4243 KiB  
Article
Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain
by Yuan Gao, Yu Zhang, Ping Zeng and Yingjie Ma
Electronics 2024, 13(9), 1749; https://doi.org/10.3390/electronics13091749 - 1 May 2024
Cited by 4 | Viewed by 2346
Abstract
The rapid advancement of deep learning and large-scale AI models has simplified the creation and manipulation of deepfake technologies, which generate, edit, and replace faces in images and videos. This gradual ease of use has turned the malicious application of forged faces into a significant threat, complicating the task of deepfake detection. Despite the notable success of current deepfake detection methods, which predominantly employ data-driven CNN classification models, these methods exhibit limited generalization capabilities and insufficient robustness against novel data unseen during training. To tackle these challenges, this paper introduces a novel detection framework, ReLAF-Net. This framework employs a restricted self-attention mechanism that applies self-attention to deep CNN features flexibly, facilitating the learning of local relationships and inter-regional dependencies at both fine-grained and global levels. This attention mechanism has a modular design that can be seamlessly integrated into CNN networks to improve overall detection performance. Additionally, we propose an adaptive local frequency feature extraction algorithm that decomposes RGB images into fine-grained frequency domains in a data-driven manner, effectively isolating fake indicators in the frequency space. Moreover, an attention-based channel fusion strategy is developed to amalgamate RGB and frequency information, achieving a comprehensive facial representation. Tested on the high-quality version of the FaceForensics++ dataset, our method attained a detection accuracy of 97.92%, outperforming other approaches. Cross-dataset validation on Celeb-DF, DFDC, and DFD confirms the robust generalizability, offering a new solution for detecting high-quality deepfake videos. Full article
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)
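The frequency-decomposition idea can be pictured with a fixed 8x8 block DCT split into low, mid, and high bands, as sketched below; the paper's extractor is adaptive and data-driven, so this is only a simplified stand-in for the general technique.

```python
# Illustrative block-DCT frequency split: each 8x8 block is transformed,
# masked into low/mid/high-frequency coefficients, and inverse-transformed
# back into per-band images that downstream modules could consume.
import numpy as np
from scipy.fft import dctn, idctn

def frequency_bands(gray, block=8):
    """gray: (H, W) image with H and W divisible by `block`."""
    h, w = gray.shape
    bands = {name: np.zeros_like(gray, dtype=float) for name in ("low", "mid", "high")}
    u, v = np.meshgrid(range(block), range(block), indexing="ij")
    radius = u + v                                    # crude frequency index
    masks = {"low": radius < 4,
             "mid": (radius >= 4) & (radius < 10),
             "high": radius >= 10}
    for y in range(0, h, block):
        for x in range(0, w, block):
            coeff = dctn(gray[y:y + block, x:x + block], norm="ortho")
            for name, m in masks.items():
                bands[name][y:y + block, x:x + block] = idctn(coeff * m, norm="ortho")
    return bands

bands = frequency_bands(np.random.rand(64, 64))
```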

17 pages, 3609 KiB  
Article
Real-Time Dynamic Intelligent Image Recognition and Tracking System for Rockfall Disasters
by Yu-Wei Lin, Chu-Fu Chiu, Li-Hsien Chen and Chao-Ching Ho
J. Imaging 2024, 10(4), 78; https://doi.org/10.3390/jimaging10040078 - 26 Mar 2024
Cited by 3 | Viewed by 2536
Abstract
Taiwan, frequently affected by extreme weather causing phenomena such as earthquakes and typhoons, faces a high incidence of rockfall disasters due to its largely mountainous terrain. These disasters have led to numerous casualties, government compensation cases, and significant transportation safety impacts. According to the National Science and Technology Center for Disaster Reduction records from 2010 to 2022, 421 out of 866 soil and rock disasters occurred in eastern Taiwan, causing traffic disruptions due to rockfalls. Since traditional sensors of disaster detectors only record changes after a rockfall, there is no system in place to detect rockfalls as they occur. To combat this, a rockfall detection and tracking system using deep learning and image processing technology was developed. This system includes a real-time image tracking and recognition system that integrates YOLO and image processing technology. It was trained on a self-collected dataset of 2490 high-resolution RGB images. The system’s performance was evaluated on 30 videos featuring various rockfall scenarios. It achieved a mean Average Precision (mAP50) of 0.845 and mAP50-95 of 0.41, with a processing time of 125 ms. Tested on advanced hardware, the system proves effective in quickly tracking and identifying hazardous rockfalls, offering a significant advancement in disaster management and prevention. Full article

16 pages, 11754 KiB  
Article
Assessment System for Child Head Injury from Falls Based on Neural Network Learning
by Ziqian Yang, Baiyu Tsui and Zhihui Wu
Sensors 2023, 23(18), 7896; https://doi.org/10.3390/s23187896 - 15 Sep 2023
Cited by 4 | Viewed by 1835
Abstract
Toddlers face serious health hazards if they fall from relatively high places at home during everyday activities and are not swiftly rescued. Still, few effective, precise, and comprehensive solutions exist for this task. This research aims to create a real-time assessment system for head injury from falls. The framework operates in two phases: in phase I, joint data are obtained by processing surveillance video with OpenPose, and a long short-term memory (LSTM) network and a 3D transform model are then used to integrate the spatial and temporal information of the key points across frames. In phase II, the head acceleration is derived and used in the HIC value calculation, and a classification model is developed to assess the injury. We collected 200 RGB videos of 13- to 30-month-old toddlers in everyday situations, including play near furniture edges and guardrails as well as upside-down falls. Five hundred video clips extracted from these were divided in an 8:2 ratio into training and validation sets. We prepared an additional collection of 300 video clips (test set), provided by parents, of toddlers falling at home to evaluate the framework's performance. The experimental findings revealed a classification accuracy of 96.67%, demonstrating the feasibility of a real-time AI technique for assessing head injuries from falls through video monitoring. Full article
(This article belongs to the Special Issue AI-based Sensing for Health Monitoring and Medical Diagnosis)
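The HIC value mentioned in phase II follows the standard Head Injury Criterion; a small sketch of that computation from a resultant head-acceleration trace is given below (the window cap and units are the conventional choices, not necessarily the paper's).

```python
# Standard Head Injury Criterion:
#   HIC = max over [t1, t2] of (t2 - t1) * ( (1/(t2 - t1)) * integral a(t) dt )^2.5,
# with a(t) in g and the window usually capped at 15 or 36 ms.
import numpy as np

def hic(accel_g, dt, max_window_s=0.015):
    """accel_g: resultant head-acceleration samples (g); dt: sample spacing (s)."""
    n = len(accel_g)
    max_len = int(max_window_s / dt)
    cum = np.concatenate([[0.0], np.cumsum(accel_g) * dt])   # running integral of a(t)
    best = 0.0
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            duration = (j - i) * dt
            avg_accel = (cum[j] - cum[i]) / duration
            best = max(best, duration * max(avg_accel, 0.0) ** 2.5)
    return best

# Example: a 10 ms, 80 g half-sine pulse sampled at 1 kHz.
t = np.arange(0, 0.01, 0.001)
pulse = 80 * np.sin(np.pi * t / 0.01)
print(round(hic(pulse, dt=0.001), 1))
```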

16 pages, 5389 KiB  
Article
Fake Biometric Detection Based on Photoplethysmography Extracted from Short Hand Videos
by Byeongseon An, Hyeji Lim and Eui Chul Lee
Electronics 2023, 12(17), 3605; https://doi.org/10.3390/electronics12173605 - 26 Aug 2023
Cited by 4 | Viewed by 2160
Abstract
An array of authentication methods has emerged, underscoring the importance of addressing spoofing challenges arising from forgery and alteration. Previous studies utilizing palm biometrics have attempted to circumvent spoofing through geometric methods or the analysis of vein images. However, these approaches are inadequate when faced with hand-printed photographs or in the absence of near-infrared sensors. In this study, we propose using remote photoplethysmography (rPPG) signals to tackle spoofing concerns in palm images captured in RGB environments. rPPG signals were extracted using video durations of 3, 5, and 7 s, and 30 features within the heart rate band were identified through frequency conversion. A support vector machine (SVM) model was trained with the processed features, yielding accuracies of 97.16%, 98.4%, and 97.28% for video durations of 3, 5, and 7 s, respectively. These features underwent dimensionality reduction through a principal component analysis (PCA), and the results were compared with the initial 30 features. Additionally, we evaluated the confusion matrix with zero false-positives for each video duration, finding that the overall accuracy experienced a decline of 1 to 3%. The 5 s video retained the highest accuracy with the smallest decrement, registering a value of 97.2%. Full article
(This article belongs to the Special Issue Theories and Technologies of Network, Data and Information Security)
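The frequency-domain feature step can be sketched as follows: the rPPG trace is transformed with an FFT, and the in-band spectral amplitudes are resampled to a fixed-length vector for an SVM. The band limits, bin count, and synthetic data are assumptions, not the study's protocol.

```python
# Hedged sketch: heart-rate-band spectral features from an rPPG trace,
# fed to an SVM that separates live hands from printed/replayed ones.
import numpy as np
from sklearn.svm import SVC

def heart_band_features(rppg, fps=30.0, band=(0.7, 3.0), n_features=30):
    spec = np.abs(np.fft.rfft(rppg - np.mean(rppg)))
    freqs = np.fft.rfftfreq(len(rppg), d=1.0 / fps)
    idx = np.where((freqs >= band[0]) & (freqs <= band[1]))[0]
    picks = np.linspace(idx[0], idx[-1], n_features).astype(int)
    return spec[picks]                     # fixed-length in-band amplitudes

# Hypothetical data: 5 s traces at 30 fps, live (pulsatile) vs. spoof (noise).
rng = np.random.default_rng(0)
live = np.sin(2 * np.pi * 1.2 * np.arange(150) / 30) + 0.2 * rng.standard_normal((20, 150))
fake = 0.2 * rng.standard_normal((20, 150))
X = np.array([heart_band_features(s) for s in np.vstack([live, fake])])
y = np.array([1] * 20 + [0] * 20)
clf = SVC(kernel="rbf").fit(X, y)
```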

21 pages, 5075 KiB  
Article
Signer-Independent Arabic Sign Language Recognition System Using Deep Learning Model
by Kanchon Kanti Podder, Maymouna Ezeddin, Muhammad E. H. Chowdhury, Md. Shaheenur Islam Sumon, Anas M. Tahir, Mohamed Arselene Ayari, Proma Dutta, Amith Khandakar, Zaid Bin Mahbub and Muhammad Abdul Kadir
Sensors 2023, 23(16), 7156; https://doi.org/10.3390/s23167156 - 14 Aug 2023
Cited by 27 | Viewed by 6070
Abstract
Every one of us has a unique manner of communicating to explore the world, and such communication helps to interpret life. Sign language is the primary language of communication for hearing- and speech-disabled people. When a sign language user interacts with a non-sign language user, it becomes difficult for the signer to express themselves to the other person. A sign language recognition system can help a signer communicate with a non-sign language user by interpreting their signs. This study presents a sign language recognition system that is capable of recognizing Arabic Sign Language from recorded RGB videos. To achieve this, two datasets were considered: (1) a raw dataset and (2) a face–hand region-based segmented dataset produced from the raw dataset. Moreover, an operational layer-based multi-layer perceptron, "SelfMLP", is proposed in this study to build CNN-LSTM-SelfMLP models for Arabic Sign Language recognition. MobileNetV2- and ResNet18-based CNN backbones and three SelfMLPs were used to construct six different models of the CNN-LSTM-SelfMLP architecture for a performance comparison on Arabic Sign Language recognition. The study examined the signer-independent mode to reflect real-time application circumstances. As a result, MobileNetV2-LSTM-SelfMLP on the segmented dataset achieved the best accuracy of 87.69%, with 88.57% precision, 87.69% recall, an 87.72% F1 score, and 99.75% specificity. Overall, face–hand region-based segmentation and the SelfMLP-infused MobileNetV2-LSTM-SelfMLP surpassed previous findings on Arabic Sign Language recognition by 10.970% in accuracy. Full article
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)
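A structural sketch of a CNN-LSTM video classifier in the spirit of the models described is given below; the authors' operational SelfMLP head is replaced by a plain MLP, so this is an assumption-laden illustration rather than the proposed architecture.

```python
# Schematic CNN-LSTM sign classifier: a MobileNetV2 backbone encodes each
# frame, an LSTM aggregates the frame features over time, and a simple MLP
# head (standing in for the authors' SelfMLP) produces sign logits.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class CnnLstmSignClassifier(nn.Module):
    def __init__(self, n_classes=50, hidden=256):
        super().__init__()
        backbone = mobilenet_v2(weights=None)
        self.encoder = backbone.features             # per-frame feature maps
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(1280, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(),
                                  nn.Linear(128, n_classes))

    def forward(self, clips):                         # (B, T, 3, H, W)
        b, t = clips.shape[:2]
        x = self.encoder(clips.flatten(0, 1))         # (B*T, 1280, h, w)
        x = self.pool(x).flatten(1).view(b, t, -1)    # (B, T, 1280)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])                     # logits per sign class

logits = CnnLstmSignClassifier()(torch.randn(2, 16, 3, 224, 224))
```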
