Search Results (12)

Search Parameters:
Keywords = facial keypoint detection

27 pages, 3539 KB  
Article
MSBN-SPose: A Multi-Scale Bayesian Neuro-Symbolic Approach for Sitting Posture Recognition
by Shu Wang, Adriano Tavares, Carlos Lima, Tiago Gomes, Yicong Zhang and Yanchun Liang
Electronics 2025, 14(19), 3889; https://doi.org/10.3390/electronics14193889 - 30 Sep 2025
Viewed by 222
Abstract
Posture recognition is critical in modern educational and office environments for preventing musculoskeletal disorders and maintaining cognitive performance. Existing methods based on human keypoint detection typically rely on convolutional neural networks (CNNs) and single-scale features, which limit representation capacity and suffer from overfitting under small-sample conditions. To address these issues, we propose MSBN-SPose, a Multi-Scale Bayesian Neuro-Symbolic Posture Recognition framework that integrates geometric features at multiple levels—including global body structure, local regions, facial landmarks, distances, and angles—extracted from OpenPose keypoints. These features are processed by a multi-branch Bayesian neural architecture that models epistemic uncertainty, enabling improved generalization and robustness. Furthermore, a lightweight neuro-symbolic reasoning module incorporates human-understandable rules into the inference process, enhancing transparency and interpretability. To support real-world evaluation, we construct the USSP dataset, a diverse, classroom-representative collection of student postures under varying conditions. Experimental results show that MSBN-SPose achieves 96.01% accuracy on USSP, outperforming baseline and traditional methods under data-limited scenarios. Full article
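A note on the geometric features mentioned above: the abstract describes distance and angle descriptors derived from OpenPose keypoints. The sketch below is not the authors' code; it shows one plausible way such descriptors could be computed, and the keypoint names and coordinates are illustrative assumptions.

```python
import numpy as np

def pairwise_distance(p, q):
    """Euclidean distance between two (x, y) keypoints."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))

def joint_angle(a, b, c):
    """Angle (degrees) at keypoint b formed by the segments b->a and b->c."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Hypothetical OpenPose-style keypoints (x, y) for a seated subject.
keypoints = {
    "nose": (320, 180), "neck": (320, 260),
    "mid_hip": (318, 420), "right_shoulder": (280, 265),
}

features = [
    pairwise_distance(keypoints["nose"], keypoints["neck"]),                     # head-to-neck distance
    joint_angle(keypoints["nose"], keypoints["neck"], keypoints["mid_hip"]),     # trunk flexion angle
]
print(features)
```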

23 pages, 28830 KB  
Article
Micro-Expression-Based Facial Analysis for Automated Pain Recognition in Dairy Cattle: An Early-Stage Evaluation
by Shuqiang Zhang, Kashfia Sailunaz and Suresh Neethirajan
AI 2025, 6(9), 199; https://doi.org/10.3390/ai6090199 - 22 Aug 2025
Viewed by 1035
Abstract
Timely, objective pain recognition in dairy cattle is essential for welfare assurance, productivity, and ethical husbandry yet remains elusive because evolutionary pressure renders bovine distress signals brief and inconspicuous. Without verbal self-reporting, cows suppress overt cues, so automated vision is indispensable for on-farm triage. Although earlier systems tracked whole-body posture or static grimace scales, frame-level detection of facial micro-expressions has not been explored fully in livestock. We translate micro-expression analytics from automotive driver monitoring to the barn, linking modern computer vision with veterinary ethology. Our two-stage pipeline first detects faces and 30 landmarks using a custom You Only Look Once (YOLO) version 8-Pose network, achieving a 96.9% mean average precision (mAP) at an Intersection over the Union (IoU) threshold of 0.50 for detection and 83.8% Object Keypoint Similarity (OKS) for keypoint placement. Cropped eye, ear, and muzzle patches are encoded using a pretrained MobileNetV2, generating 3840-dimensional descriptors that capture millisecond muscle twitches. Sequences of five consecutive frames are fed into a 128-unit Long Short-Term Memory (LSTM) classifier that outputs pain probabilities. On a held-out validation set of 1700 frames, the system records 99.65% accuracy and an F1-score of 0.997, with only three false positives and three false negatives. Tested on 14 unseen barn videos, it attains 64.3% clip-level accuracy (i.e., overall accuracy for the whole video clip) and 83% precision for the pain class, using a hybrid aggregation rule that combines a 30% mean probability threshold with micro-burst counting to temper false alarms. As an early exploration from our proof-of-concept study on a subset of our custom dairy farm datasets, these results show that micro-expression mining can deliver scalable, non-invasive pain surveillance across variations in illumination, camera angle, background, and individual morphology. Future work will explore attention-based temporal pooling, curriculum learning for variable window lengths, domain-adaptive fine-tuning, and multimodal fusion with accelerometry on the complete datasets to elevate the performance toward clinical deployment. Full article
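The clip-level decision described above combines a 30% mean-probability threshold with micro-burst counting. The sketch below illustrates one possible implementation of such a hybrid aggregation rule; the burst length and per-frame probability cutoff are assumptions, since the abstract does not specify them.

```python
import numpy as np

def clip_level_pain_decision(frame_probs, mean_threshold=0.30,
                             burst_prob=0.8, burst_len=3, min_bursts=1):
    """Aggregate per-frame pain probabilities into a clip-level label.

    A clip is flagged as 'pain' only if the mean probability exceeds
    `mean_threshold` AND at least `min_bursts` micro-bursts occur, where a
    micro-burst is `burst_len` consecutive frames above `burst_prob`.
    """
    probs = np.asarray(frame_probs, dtype=float)
    mean_ok = probs.mean() >= mean_threshold

    bursts, run = 0, 0
    for p in probs:
        run = run + 1 if p >= burst_prob else 0
        if run == burst_len:
            bursts += 1
            run = 0
    return bool(mean_ok and bursts >= min_bursts)

# Example: hypothetical per-frame LSTM outputs for a short clip.
print(clip_level_pain_decision([0.1, 0.2, 0.9, 0.95, 0.85, 0.4, 0.3]))
```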

21 pages, 3420 KB  
Article
Keypoints-Based Multi-Cue Feature Fusion Network (MF-Net) for Action Recognition of ADHD Children in TOVA Assessment
by Wanyu Tang, Chao Shi, Yuanyuan Li, Zhonglan Tang, Gang Yang, Jing Zhang and Ling He
Bioengineering 2024, 11(12), 1210; https://doi.org/10.3390/bioengineering11121210 - 29 Nov 2024
Viewed by 1445
Abstract
Attention deficit hyperactivity disorder (ADHD) is a prevalent neurodevelopmental disorder among children and adolescents. Behavioral detection and analysis play a crucial role in ADHD diagnosis and assessment by objectively quantifying hyperactivity and impulsivity symptoms. Existing video-based action recognition algorithms focus on object or interpersonal interactions and may therefore overlook ADHD-specific behaviors. Current keypoints-based algorithms, although effective in attenuating environmental interference, struggle to accurately model the sudden and irregular movements characteristic of ADHD children. This work proposes a novel keypoints-based system, the Multi-cue Feature Fusion Network (MF-Net), for recognizing actions and behaviors of children with ADHD during the Test of Variables of Attention (TOVA). The system aims to assess ADHD symptoms as described in the DSM-V by extracting features from human body and facial keypoints. For human body keypoints, we introduce the Multi-scale Features and Frame-Attention Adaptive Graph Convolutional Network (MSF-AGCN) to extract irregular and impulsive motion features. For facial keypoints, we transform the data into images and employ MobileViTv2 for transfer learning to capture facial and head movement features. Finally, a feature fusion module fuses the features from both branches, yielding the final action category prediction. The system, evaluated on 3801 video samples of ADHD children, achieves 90.6% top-1 accuracy and 97.6% top-2 accuracy across six action categories. Additional validation experiments on the public datasets NW-UCLA, NTU-2D, and AFEW-VA verify the network’s performance. Full article
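The fusion step described above merges body-keypoint and facial-keypoint branch features before classification. Below is a minimal, hypothetical sketch of a two-branch fusion head; the feature dimensions and the concatenation-plus-MLP design are assumptions rather than MF-Net's actual module.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Sketch of fusing body-keypoint and facial-keypoint features into a
    single action prediction over six categories (as in the abstract)."""

    def __init__(self, body_dim=256, face_dim=256, num_classes=6):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(body_dim + face_dim, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, body_feat, face_feat):
        fused = torch.cat([body_feat, face_feat], dim=-1)  # simple concatenation fusion
        return self.classifier(fused)                      # logits over action categories
```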

16 pages, 6330 KB  
Article
A Two-Stage Facial Kinematic Control Strategy for Humanoid Robots Based on Keyframe Detection and Keypoint Cubic Spline Interpolation
by Ye Yuan, Jiahao Li, Qi Yu, Jian Liu, Zongdao Li, Qingdu Li and Na Liu
Mathematics 2024, 12(20), 3278; https://doi.org/10.3390/math12203278 - 18 Oct 2024
Cited by 3 | Viewed by 1854
Abstract
A rich repertoire of facial expressions is the basis of natural human–robot interaction for high-fidelity humanoid robots. The facial expression imitation of humanoid robots involves the transmission of human facial expression data to servos situated within the robot’s head. These data drive the servos to manipulate the skin, thereby enabling the robot to exhibit various facial expressions. However, since the mechanical transmission rate cannot keep up with the data processing rate, humanoid robots often suffer from jitters in the imitation. We conducted a thorough analysis of the transmitted facial expression sequence data and discovered that they are extremely redundant. Therefore, we designed a two-stage strategy for humanoid robots based on facial keyframe detection and facial keypoint detection to achieve more natural and smooth expression imitation. We first built a facial keyframe detection model based on ResNet-50, combined with optical flow estimation, which can identify key expression frames in the sequence. Then, a facial keypoint detection model is used on the keyframes to obtain the facial keypoint coordinates. Based on the coordinates, the cubic spline interpolation method is used to obtain the motion trajectory parameters of the servos, thus realizing robust control of the humanoid robot’s facial expression. Experiments show that, whereas the robot’s imitation previously stuttered at frame rates above 25 fps, our strategy allows the robot to maintain good facial expression imitation similarity (cosine similarity of 0.7226) even at higher frame rates. Full article
(This article belongs to the Section E2: Control Theory and Mechanics)
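The cubic spline interpolation step in the abstract above converts keypoint coordinates at detected keyframes into smooth servo trajectories. A minimal SciPy sketch follows; the mapping from keypoints to servo angles and the 50 Hz update rate are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical keyframe data: timestamps (s) of detected key expression frames
# and a servo position (degrees) for one facial actuator, mapped from keypoints.
key_times = np.array([0.0, 0.4, 0.9, 1.5])
key_servo_pos = np.array([30.0, 55.0, 42.0, 60.0])

spline = CubicSpline(key_times, key_servo_pos)

# Resample a smooth trajectory at the servo update rate (e.g., 50 Hz), so the
# mechanism is not driven by every raw video frame.
t = np.arange(key_times[0], key_times[-1], 1.0 / 50.0)
trajectory = spline(t)
velocity = spline(t, 1)   # first derivative, useful for velocity limiting
print(trajectory[:5], velocity[:5])
```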

19 pages, 1219 KB  
Article
A Robust and Efficient Method for Effective Facial Keypoint Detection
by Yonghui Huang, Yu Chen, Junhao Wang, Pengcheng Zhou, Jiaming Lai and Quanhai Wang
Appl. Sci. 2024, 14(16), 7153; https://doi.org/10.3390/app14167153 - 15 Aug 2024
Cited by 2 | Viewed by 2980
Abstract
Facial keypoint detection technology faces significant challenges under conditions such as occlusion, extreme angles, and other demanding environments. Previous research has largely relied on deep learning regression methods using the face’s overall global template. However, these methods lack robustness in difficult conditions, leading to instability in detecting facial keypoints. To address this challenge, we propose a joint optimization approach that combines regression with heatmaps, emphasizing the importance of local apparent features. Furthermore, to mitigate the reduced learning capacity resulting from model pruning, we integrate external supervision signals through knowledge distillation into our method. This strategy fosters the development of efficient, effective, and lightweight facial keypoint detection technology. Experimental results on the CelebA, 300W, and AFLW datasets demonstrate that our proposed method significantly improves the robustness of facial keypoint detection. Full article
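The joint optimization described above combines coordinate regression with heatmap supervision, plus knowledge distillation to offset the capacity lost to pruning. The sketch below shows a generic way such a combined objective might look; the loss weights, temperature, and exact formulation are assumptions, not the paper's loss.

```python
import torch.nn.functional as F

def joint_keypoint_loss(pred_coords, gt_coords,
                        pred_heatmaps, gt_heatmaps,
                        student_logits=None, teacher_logits=None,
                        w_reg=1.0, w_heat=1.0, w_kd=0.5, T=4.0):
    """Combine coordinate regression, heatmap supervision, and an optional
    knowledge-distillation term (a generic formulation for illustration)."""
    loss = w_reg * F.smooth_l1_loss(pred_coords, gt_coords)          # global regression branch
    loss = loss + w_heat * F.mse_loss(pred_heatmaps, gt_heatmaps)    # local heatmap branch
    if student_logits is not None and teacher_logits is not None:
        # Soft-target distillation from a larger teacher network.
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                      F.softmax(teacher_logits / T, dim=-1),
                      reduction="batchmean") * (T * T)
        loss = loss + w_kd * kd
    return loss
```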

15 pages, 6566 KB  
Article
Two-Stage Method for Clothing Feature Detection
by Xinwei Lyu, Xinjia Li, Yuexin Zhang and Wenlian Lu
Big Data Cogn. Comput. 2024, 8(4), 35; https://doi.org/10.3390/bdcc8040035 - 26 Mar 2024
Viewed by 3699
Abstract
The rapid expansion of e-commerce, particularly in the clothing sector, has created significant demand for effective automated clothing-feature recognition. This study presents a novel two-stage image recognition method. Our approach distinctively combines human keypoint detection, object detection, and classification methods into a two-stage structure. Initially, we utilize open-source libraries, namely OpenPose and Dlib, for accurate human keypoint detection, followed by a custom cropping logic for extracting body part boxes. In the second stage, we employ a blend of Harris corner, Canny edge, and skin-pixel detection integrated with VGG16 and support vector machine (SVM) models. This configuration allows the bounding boxes to identify ten unique attributes, encompassing facial features and detailed aspects of clothing. The experiments yielded an overall recognition accuracy of 81.4% for tops and 85.72% for bottoms, highlighting the efficacy of the applied methodologies in garment categorization. Full article
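The second stage above blends Harris corner, Canny edge, and skin-pixel cues. A small OpenCV sketch of such handcrafted cues for a cropped body-part patch is shown below; the thresholds and the particular feature summary are assumptions rather than the paper's tuned pipeline.

```python
import cv2
import numpy as np

def handcrafted_region_features(bgr_patch):
    """Illustrative low-level cues for a cropped body-part patch: Harris
    corner density, Canny edge density, and an HSV skin-pixel ratio."""
    gray = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2GRAY)

    corners = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    corner_ratio = float((corners > 0.01 * corners.max()).mean())

    edges = cv2.Canny(gray, 100, 200)
    edge_ratio = float((edges > 0).mean())

    hsv = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv,
                       np.array([0, 40, 60], np.uint8),      # assumed lower HSV skin bound
                       np.array([25, 180, 255], np.uint8))   # assumed upper HSV skin bound
    skin_ratio = float((skin > 0).mean())

    return np.array([corner_ratio, edge_ratio, skin_ratio])
```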

22 pages, 5540 KB  
Article
Research on Fatigued-Driving Detection Method by Integrating Lightweight YOLOv5s and Facial 3D Keypoints
by Xiansheng Ran, Shuai He and Rui Li
Sensors 2023, 23(19), 8267; https://doi.org/10.3390/s23198267 - 6 Oct 2023
Cited by 7 | Viewed by 2751
Abstract
In response to the problem of high computational and parameter requirements of fatigued-driving detection models, as well as weak facial-feature keypoint extraction capability, this paper proposes a lightweight and real-time fatigued-driving detection model based on an improved YOLOv5s and the Attention Mesh 3D keypoint extraction method. The main strategies are as follows: (1) Using Shufflenetv2_BD to reconstruct the Backbone network to reduce parameter complexity and computational load. (2) Introducing and improving the fusion method of the Cross-scale Aggregation Module (CAM) between the Backbone and Neck networks to reduce information loss in shallow features of the closed-eyes and closed-mouth categories. (3) Building a lightweight Context Information Fusion Module by combining the Efficient Multi-Scale Module (EAM) and Depthwise Over-Parameterized Convolution (DoConv) to enhance the Neck network’s ability to extract facial features. (4) Redefining the loss function using Wise-IoU (WIoU) to accelerate model convergence. Finally, the fatigued-driving detection model is constructed by combining the classification results with thresholds on consecutive closed-eye frames, consecutive yawning frames, and PERCLOS (Percentage of Eyelid Closure over the Pupil over Time) for the eyes and mouth. With the baseline model’s parameter count and size reduced by 58% and 56.3%, respectively, and floating-point computation of only 5.9 GFLOPs, average accuracy increases by 1% and the fatigue-recognition rate reaches 96.3%, showing that the proposed algorithm achieves accurate and stable real-time detection while remaining lightweight. This provides strong support for lightweight deployment on vehicle terminals. Full article
(This article belongs to the Special Issue Deep Learning Based Face Recognition and Feature Extraction)
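The final decision rule above thresholds consecutive closed-eye frames, consecutive yawning frames, and PERCLOS. A minimal sketch of such a rule is given below; the window size and threshold values are illustrative assumptions, not the paper's calibrated settings.

```python
from collections import deque

class PerclosMonitor:
    """Rolling PERCLOS over a fixed frame window, plus consecutive
    closed-eye and yawn counters (all thresholds are illustrative)."""

    def __init__(self, window=900, perclos_thresh=0.25,
                 closed_run_thresh=45, yawn_run_thresh=60):
        self.eye_states = deque(maxlen=window)  # 1 = eyes closed, 0 = open
        self.perclos_thresh = perclos_thresh
        self.closed_run_thresh = closed_run_thresh
        self.yawn_run_thresh = yawn_run_thresh
        self.closed_run = 0
        self.yawn_run = 0

    def update(self, eye_closed: bool, mouth_yawning: bool) -> bool:
        """Feed one frame's classification result; return True if fatigued."""
        self.eye_states.append(1 if eye_closed else 0)
        self.closed_run = self.closed_run + 1 if eye_closed else 0
        self.yawn_run = self.yawn_run + 1 if mouth_yawning else 0

        perclos = sum(self.eye_states) / len(self.eye_states)
        return (perclos >= self.perclos_thresh
                or self.closed_run >= self.closed_run_thresh
                or self.yawn_run >= self.yawn_run_thresh)
```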

12 pages, 1861 KB  
Article
Face Keypoint Detection Method Based on Blaze_ghost Network
by Ning Yu, Yongping Tian, Xiaochuan Zhang and Xiaofeng Yin
Appl. Sci. 2023, 13(18), 10385; https://doi.org/10.3390/app131810385 - 17 Sep 2023
Cited by 4 | Viewed by 2840
Abstract
The accuracy and speed of facial keypoint detection are crucial factors for effectively extracting fatigue features, such as eye blinking and yawning. This paper focuses on the improvement and optimization of facial keypoint detection algorithms, presenting a facial keypoint detection method based on the Blaze_ghost network and providing more reliable support for facial fatigue analysis. Firstly, the Blaze_ghost network is designed as the backbone network with a deeper structure and more parameters to better capture facial detail features, improving the accuracy of keypoint localization. Secondly, HuberWingloss is designed as the loss function to further reduce the training difficulty of the model and enhance its generalization ability. Compared to traditional loss functions, HuberWingloss can reduce the interference of outliers (such as noise and occlusion) in model training, improve the model’s robustness to complex situations, and further enhance the accuracy of keypoint detection. Experimental results show that the proposed method achieves significant improvements in both the NME (Normalized Mean Error) and FR (Failure Rate) evaluation metrics. Compared to traditional methods, the proposed model demonstrates a considerable improvement in keypoint localization accuracy while still maintaining high detection efficiency. Full article
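For context on HuberWingloss: the paper's loss is a custom variant, but it builds on the well-known Wing loss for keypoint regression. The sketch below implements only the standard Wing loss formulation as a reference point; it is not the paper's HuberWingloss.

```python
import math
import torch

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Standard Wing loss (Feng et al., 2018) for facial keypoint regression.
    Behaves logarithmically for small errors and linearly for large ones."""
    x = torch.abs(pred - target)
    c = w - w * math.log(1.0 + w / eps)   # offset keeping the two pieces continuous at |x| = w
    loss = torch.where(x < w, w * torch.log(1.0 + x / eps), x - c)
    return loss.mean()
```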

13 pages, 10079 KB  
Article
Full-BAPose: Bottom Up Framework for Full Body Pose Estimation
by Bruno Artacho and Andreas Savakis
Sensors 2023, 23(7), 3725; https://doi.org/10.3390/s23073725 - 4 Apr 2023
Cited by 6 | Viewed by 3774
Abstract
We present Full-BAPose, a novel bottom-up approach for full body pose estimation that achieves state-of-the-art results without relying on external people detectors. The Full-BAPose method addresses the broader task of full body pose estimation including hands, feet, and facial landmarks. Our deep learning architecture is end-to-end trainable, based on an encoder-decoder configuration with an HRNet backbone and multi-scale representations using a disentangled waterfall atrous spatial pooling module. The disentangled waterfall module leverages the efficiency of progressive filtering while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, it combines multi-scale features obtained from the waterfall flow with the person-detection capability of the disentangled adaptive regression and incorporates adaptive convolutions to infer keypoints more precisely in crowded scenes. Full-BAPose achieves state-of-the-art performance on the challenging CrowdPose and COCO-WholeBody datasets, with AP of 72.2% and 68.4%, respectively, based on 133 keypoints. Our results demonstrate that Full-BAPose is efficient and robust when operating under a variety of conditions, including multiple people, changes in scale, and occlusions. Full article
(This article belongs to the Special Issue Feature Papers in Physical Sensors 2022)
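The waterfall atrous spatial pooling idea above applies dilated convolutions in cascade rather than in parallel. The following is a simplified, generic sketch of a waterfall-style module; channel counts, dilation rates, and the fusion layer are assumptions and do not reproduce the disentangled Full-BAPose design.

```python
import torch
import torch.nn as nn

class WaterfallASPP(nn.Module):
    """Simplified 'waterfall' atrous pooling: dilated convolutions applied in
    cascade (each branch feeds the next) rather than in parallel, with all
    branch outputs concatenated and fused at the end."""

    def __init__(self, channels, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        outs, cur = [], x
        for branch in self.branches:
            cur = branch(cur)      # waterfall flow: output feeds the next branch
            outs.append(cur)
        return self.fuse(torch.cat(outs, dim=1))
```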

18 pages, 1944 KB  
Review
Towards Machine Recognition of Facial Expressions of Pain in Horses
by Pia Haubro Andersen, Sofia Broomé, Maheen Rashid, Johan Lundblad, Katrina Ask, Zhenghong Li, Elin Hernlund, Marie Rhodin and Hedvig Kjellström
Animals 2021, 11(6), 1643; https://doi.org/10.3390/ani11061643 - 1 Jun 2021
Cited by 48 | Viewed by 11208
Abstract
Automated recognition of human facial expressions of pain and emotions is to a certain degree a solved problem, using approaches based on computer vision and machine learning. However, the application of such methods to horses has proven difficult. Major barriers are the lack of sufficiently large, annotated databases for horses and difficulties in obtaining correct classifications of pain because horses are non-verbal. This review describes our work to overcome these barriers, using two different approaches. One involves the use of a manual, but relatively objective, classification system for facial activity (Facial Action Coding System), where data are analyzed for pain expressions after coding using machine learning principles. We have devised tools that can aid manual labeling by identifying the faces and facial keypoints of horses. This approach provides promising results in the automated recognition of facial action units from images. The second approach, recurrent neural network end-to-end learning, requires less extraction of features and representations from the video but instead depends on large volumes of video data with ground truth. Our preliminary results suggest clearly that dynamics are important for pain recognition and show that combinations of recurrent neural networks can classify experimental pain in a small number of horses better than human raters. Full article
(This article belongs to the Special Issue Perception and Expression of Facial Expressions in Animals)

16 pages, 6419 KB  
Article
Content-Aware Eye Tracking for Autostereoscopic 3D Display
by Dongwoo Kang and Jingu Heo
Sensors 2020, 20(17), 4787; https://doi.org/10.3390/s20174787 - 25 Aug 2020
Cited by 14 | Viewed by 6162
Abstract
This study develops an eye tracking method for autostereoscopic three-dimensional (3D) display systems for use in various environments. The eye tracking-based autostereoscopic 3D display provides low crosstalk and high-resolution 3D image experience seamlessly without 3D eyeglasses by overcoming the viewing position restriction. However, accurate and fast eye position detection and tracking are still challenging, owing to the various light conditions, camera control, thick eyeglasses, eyeglass sunlight reflection, and limited system resources. This study presents a robust, automated algorithm and relevant systems for accurate and fast detection and tracking of eye pupil centers in 3D with a single visual camera and near-infrared (NIR) light emitting diodes (LEDs). Our proposed eye tracker consists of eye–nose detection, eye–nose shape keypoint alignment, a tracker checker, and tracking with NIR LED on/off control. Eye–nose detection generates facial subregion boxes, including the eyes and nose, which utilize an Error-Based Learning (EBL) method for the selection of the best learnt database (DB). After detection, the eye–nose shape alignment is processed by the Supervised Descent Method (SDM) with Scale-invariant Feature Transform (SIFT). The aligner is content-aware in the sense that corresponding designated aligners are applied based on image content classification, such as the various light conditions and wearing eyeglasses. The conducted experiments on real image DBs yield promising eye detection and tracking outcomes, even in the presence of challenging conditions. Full article
(This article belongs to the Section Intelligent Sensors)

19 pages, 3545 KB  
Article
CogBeacon: A Multi-Modal Dataset and Data-Collection Platform for Modeling Cognitive Fatigue
by Michalis Papakostas, Akilesh Rajavenkatanarayanan and Fillia Makedon
Technologies 2019, 7(2), 46; https://doi.org/10.3390/technologies7020046 - 13 Jun 2019
Cited by 21 | Viewed by 9998
Abstract
In this work, we present CogBeacon, a multi-modal dataset designed to target the effects of cognitive fatigue in human performance. The dataset consists of 76 sessions collected from 19 male and female users performing different versions of a cognitive task inspired by the principles of the Wisconsin Card Sorting Test (WCST), a popular cognitive test in experimental and clinical psychology designed to assess cognitive flexibility, reasoning, and specific aspects of cognitive functioning. During each session, we record and fully annotate users’ EEG activity, facial keypoints, and real-time self-reports on cognitive fatigue, as well as detailed performance metrics for the cognitive task (success rate, response time, number of errors, etc.). Along with the dataset, we provide free access to the CogBeacon data-collection software to give the community a standardized mechanism for collecting and annotating physiological and behavioral data for cognitive fatigue analysis. Our goal is to provide other researchers with the tools to expand or modify the functionalities of the CogBeacon data-collection framework in a hardware-independent way. As a proof of concept, we show some preliminary machine learning-based experiments on cognitive fatigue detection using the EEG information and the subjective user reports as ground truth. Our experiments highlight the meaningfulness of the current dataset and encourage our efforts towards expanding the CogBeacon platform. To our knowledge, this is the first multi-modal dataset specifically designed to assess cognitive fatigue and the only free software available to allow experiment reproducibility for multi-modal cognitive fatigue analysis. Full article
(This article belongs to the Special Issue Multimedia and Cross-modal Retrieval)