Search Results (3)

Search Parameters:
Keywords = yoga pose classification

20 pages, 8021 KB  
Article
CNN 1D: A Robust Model for Human Pose Estimation
by Mercedes Hernández de la Cruz, Uriel Solache, Antonio Luna-Álvarez, Sergio Ricardo Zagal-Barrera, Daniela Aurora Morales López and Dante Mujica-Vargas
Information 2025, 16(2), 129; https://doi.org/10.3390/info16020129 - 10 Feb 2025
Cited by 2 | Viewed by 2034
Abstract
The purpose of this research is to develop an efficient model for human pose estimation (HPE). The main limitations of the study are the small size of the dataset and confusion in the classification of certain poses, suggesting the need for more data to improve the robustness of the model in uncontrolled environments. The methodology combines MediaPipe for the detection of keypoints in images with a CNN1D model that processes preprocessed feature sequences. The Yoga Poses dataset was used for training and validation of the model, and resampling techniques such as bootstrapping were applied to improve accuracy and avoid overfitting during training. The results show that the proposed model achieves 96% overall accuracy in the classification of five yoga poses, with per-class accuracy above 90%. Using a CNN1D model instead of a traditional 2D or 3D architecture keeps the computational cost low and the image preprocessing efficient, allowing its use on mobile devices and in real-time environments.
(This article belongs to the Section Artificial Intelligence)
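The pipeline the abstract describes (MediaPipe landmark extraction feeding a 1D convolutional model) can be sketched in numpy. This is an illustrative stand-in, not the authors' implementation: the hand-rolled `conv1d`, the kernel size, and the channel counts are all assumptions; only the 33-landmark, 3-coordinate output shape comes from MediaPipe Pose.

```python
import numpy as np

def conv1d(x, kernels, stride=1):
    """Valid 1D convolution: x has shape (L, C_in), kernels (K, C_in, C_out)."""
    K, C_in, C_out = kernels.shape
    out_len = (x.shape[0] - K) // stride + 1
    out = np.zeros((out_len, C_out))
    for i in range(out_len):
        window = x[i * stride : i * stride + K]          # (K, C_in)
        # Sum over the kernel window and input channels.
        out[i] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    return out

# MediaPipe Pose detects 33 body landmarks, each with (x, y, z) coordinates.
rng = np.random.default_rng(0)
keypoints = rng.random((33, 3))                          # stand-in for one detected pose
features = conv1d(keypoints, rng.standard_normal((5, 3, 16)))
print(features.shape)                                    # (29, 16)
```

In a full classifier, such feature maps would be pooled and passed to a softmax over the five pose classes.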

28 pages, 5769 KB  
Article
Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis
by Andrzej D. Dobrzycki, Ana M. Bernardos, Luca Bergesio, Andrzej Pomirski and Daniel Sáez-Trigueros
Mathematics 2024, 12(1), 76; https://doi.org/10.3390/math12010076 - 25 Dec 2023
Cited by 5 | Viewed by 3303
Abstract
Accurate human posture classification in images and videos is crucial for automated applications across various fields, including work safety, physical rehabilitation, sports training, and daily assisted living. Recently, multimodal learning methods such as Contrastive Language-Image Pretraining (CLIP) have advanced significantly in jointly understanding images and text. This study assesses the effectiveness of CLIP in classifying human postures, focusing on its application to yoga. Despite the initial limitations of the zero-shot approach, applying transfer learning on 15,301 images (real and synthetic) with 82 classes has shown promising results. The article describes the full fine-tuning procedure, including the choice of image description syntax, model selection, and hyperparameter adjustment. The fine-tuned CLIP model, tested on 3826 images, achieves an accuracy of over 85%, surpassing the previous state of the art on the same dataset by approximately 6%, while its training time is 3.5 times lower than that needed to fine-tune a YOLOv8-based model. For more application-oriented scenarios, with smaller datasets of six postures each, containing 1301 and 401 training images, the fine-tuned models attain accuracies of 98.8% and 99.1%, respectively. Furthermore, our experiments indicate that training with as few as 20 images per pose can yield around 90% accuracy on a six-class dataset. This study demonstrates that this multimodal technique can be effectively used for yoga pose classification, and possibly for human posture classification in general. Additionally, the CLIP inference time (around 7 ms) indicates that the model can be integrated into automated systems for posture evaluation, e.g., a real-time personal yoga assistant for performance assessment.
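At inference time, CLIP-style classification reduces to cosine similarity between one image embedding and one text embedding per class prompt, followed by a softmax. A minimal numpy sketch of that scoring step, with random vectors standing in for real CLIP encoder outputs and hypothetical pose names (the embedding width and temperature are illustrative assumptions):

```python
import numpy as np

def classify(image_emb, text_embs, temperature=0.07):
    """CLIP-style scoring: cosine similarity of one image embedding
    against one text embedding per class, softmax over classes."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    probs = np.exp(logits - logits.max())                # stable softmax
    return probs / probs.sum()

rng = np.random.default_rng(1)
classes = ["downward dog", "tree pose", "warrior II"]    # hypothetical prompts
text_embs = rng.standard_normal((3, 512))                # stand-in text encoder output
image_emb = text_embs[1] + 0.1 * rng.standard_normal(512)  # image near "tree pose"
probs = classify(image_emb, text_embs)
print(classes[int(np.argmax(probs))])
```

Fine-tuning, as in the paper, would update the encoders that produce these embeddings; the scoring step itself stays the same.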

12 pages, 751 KB  
Article
A Computer Vision-Based Yoga Pose Grading Approach Using Contrastive Skeleton Feature Representations
by Yubin Wu, Qianqian Lin, Mingrun Yang, Jing Liu, Jing Tian, Dev Kapil and Laura Vanderbloemen
Healthcare 2022, 10(1), 36; https://doi.org/10.3390/healthcare10010036 - 25 Dec 2021
Cited by 33 | Viewed by 6538
Abstract
The main objective of yoga pose grading is to assess an input yoga pose and compare it to a standard pose in order to provide a quantitative evaluation as a grade. In this paper, a computer vision-based yoga pose grading approach is proposed using contrastive skeleton feature representations. First, the proposed approach extracts human body skeleton keypoints from the input yoga pose image; it then feeds their coordinates into a pose feature encoder, which is trained using contrastive triplet examples; finally, it compares the encoded pose features for similarity. Furthermore, to tackle the inherent challenge of composing contrastive examples for pose feature encoding, this paper proposes a new strategy that uses both a coarse triplet example (an anchor, a positive example from the same category, and a negative example from a different category) and a fine triplet example (an anchor, a positive example, and a negative example from the same category but with different pose quality). Extensive experiments are conducted on two benchmark datasets to demonstrate the superior performance of the proposed approach.
(This article belongs to the Section Artificial Intelligence in Medicine)
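The coarse/fine triplet strategy rests on the standard triplet margin loss: pull the positive closer to the anchor than the negative by at least a margin. A minimal numpy sketch, with random vectors standing in for encoded skeleton features; the feature width and margin are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss: positive below zero only when the positive is
    at least `margin` closer to the anchor than the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(2)
anchor = rng.random(32)                                  # stand-in encoded pose
# Coarse triplet: the negative comes from a different pose category.
coarse = triplet_loss(anchor, anchor + 0.01, rng.random(32))
# Fine triplet: the negative is the same pose performed with lower quality,
# so it sits closer to the anchor and makes a harder example.
fine = triplet_loss(anchor, anchor + 0.01, anchor + 0.3)
print(coarse, fine)
```

Training alternates over both triplet kinds so the encoder separates pose categories and, within a category, pose qualities.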
