
Computer Vision-Based Human Activity Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 20 July 2025 | Viewed by 5943

Special Issue Editor


Dr. Stefan Poslad
Guest Editor

Special Issue Information

Dear Colleagues,

Humans perform a common set of (physical) activities of daily living (ADLs) necessary for self-care and independent living, involving anything from the movement of single body parts to the whole body. Humans also perform a richer variety of ADLs in entertainment, health, sports, surveillance, transport, leisure, and work, involving individuals and groups that scale up to large crowds. Increasing (and autonomous) automation, changes to the physical environment, and an ageing yet growing population will all affect our ADLs. Recognising, modelling, and analysing ADLs is therefore essential and has many benefits and applications.

Whilst a range of sensors can be used for human activity recognition (HAR), the focus here is on computer vision (CV) for HAR, which covers a range of cameras (micro to macro, short-range to remote, stationary versus mobile, and visible versus non-visible light) and can involve hybrid (non-visual plus visual) sensor fusion. This is driven by advances in micro-sensors, cheaper high-resolution cameras, the increasing embedding of cameras into the environment, and improvements in computer vision object recognition and artificial intelligence. Note that most HAR goes beyond recognising the human alone: recognising the relevant physical objects greatly aids the task.

This Special Issue targets innovations that support the narrative given above. It also welcomes new methods and designs for systems based on the Internet of Things; cyber-physical and embedded systems; sensor data acquisition; sensor data fusion; data analytics involving probabilistic and digital twin models to classify, predict, and simulate HAR; data science and AI; and data visualisation and decision support for HAR. Note that accepted papers need to include a viable computer vision-sensing element for HAR.

Dr. Stefan Poslad
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • human activity recognition (HAR)
  • activities of daily living (ADLs)
  • computer vision (CV)
  • sensor data fusion for CV
  • AI and data science for CV

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

14 pages, 772 KiB  
Article
Leveraging Artificial Occluded Samples for Data Augmentation in Human Activity Recognition
by Eirini Mathe, Ioannis Vernikos, Evaggelos Spyrou and Phivos Mylonas
Sensors 2025, 25(4), 1163; https://doi.org/10.3390/s25041163 - 14 Feb 2025
Viewed by 543
Abstract
A significant challenge in human activity recognition lies in the limited size and diversity of training datasets, which can lead to overfitting and the poor generalization of deep learning models. Common solutions include data augmentation and transfer learning. This paper introduces a novel data augmentation method that simulates occlusion by artificially removing body parts from skeleton representations in training datasets. This contrasts with previous approaches that focused on augmenting data with rotated skeletons. The proposed method increases dataset size and diversity, enabling models to handle a broader range of scenarios. Occlusion, a common challenge in real-world HAR, occurs when body parts or external objects block visibility, disrupting activity recognition. By leveraging artificially occluded samples, the proposed methodology enhances model robustness, leading to improved recognition performance, even on non-occluded activities. Full article
(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)
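As a rough illustration of the augmentation idea described in the abstract, the sketch below zeroes out the joints of one body part across a skeleton sequence; the joint-index grouping and the zero-masking convention are assumptions chosen for illustration, not the authors' exact procedure.

```python
import numpy as np

# Hypothetical grouping of joint indices into body parts; the actual indices
# depend on the skeleton layout of the dataset (e.g., NTU RGB+D's 25 joints).
BODY_PARTS = {
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def occlude_body_part(skeleton, part=None, rng=None):
    """Return a copy of a skeleton sequence with one body part artificially occluded.

    skeleton: array of shape (frames, joints, 3) holding 3D joint coordinates.
    part: key of BODY_PARTS, or None to pick a part at random.
    """
    rng = rng or np.random.default_rng()
    if part is None:
        part = rng.choice(list(BODY_PARTS))
    occluded = skeleton.copy()
    # Zero out the selected joints in every frame, mimicking joints lost to occlusion.
    occluded[:, BODY_PARTS[part], :] = 0.0
    return occluded

# Usage: pair each training sample with occluded variants to enlarge the dataset.
# sample = np.random.rand(64, 25, 3)
# augmented = [occlude_body_part(sample, p) for p in BODY_PARTS]
```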

19 pages, 3693 KiB  
Article
Real-Time On-Device Continual Learning Based on a Combined Nearest Class Mean and Replay Method for Smartphone Gesture Recognition
by Heon-Sung Park, Min-Kyung Sung, Dae-Won Kim and Jaesung Lee
Sensors 2025, 25(2), 427; https://doi.org/10.3390/s25020427 - 13 Jan 2025
Viewed by 899
Abstract
Sensor-based gesture recognition on mobile devices is critical to human–computer interaction, enabling intuitive user input for various applications. However, current approaches often rely on server-based retraining whenever new gestures are introduced, incurring substantial energy consumption and latency due to frequent data transmission. To address these limitations, we present the first on-device continual learning framework for gesture recognition. Leveraging the Nearest Class Mean (NCM) classifier coupled with a replay-based update strategy, our method enables continuous adaptation to new gestures under limited computing and memory resources. By employing replay buffer management, we efficiently store and revisit previously learned instances, mitigating catastrophic forgetting and ensuring stable performance as new gestures are added. Experimental results on a Samsung Galaxy S10 device demonstrate that our method achieves over 99% accuracy while operating entirely on-device, offering a compelling synergy between computational efficiency, robust continual learning, and high recognition accuracy. This work demonstrates the potential of on-device continual learning frameworks that integrate NCM classifiers with replay-based techniques, thereby advancing the field of resource-constrained, adaptive gesture recognition. Full article
(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)
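A minimal sketch of the two components named in the abstract, a Nearest Class Mean classifier combined with a bounded replay buffer, is given below; the per-class buffer size and the exemplar-replacement policy are illustrative assumptions rather than the authors' implementation.

```python
import random
import numpy as np

class NCMWithReplay:
    """Nearest Class Mean classifier backed by a small per-class replay buffer."""

    def __init__(self, buffer_per_class=50):
        self.buffer = {}                     # class label -> stored feature vectors
        self.means = {}                      # class label -> class mean vector
        self.buffer_per_class = buffer_per_class

    def learn(self, features, labels):
        # Store a bounded number of exemplars per class (the replay buffer)...
        for x, y in zip(features, labels):
            exemplars = self.buffer.setdefault(y, [])
            if len(exemplars) < self.buffer_per_class:
                exemplars.append(x)
            else:
                # Replace a random stored exemplar (a simple bounded-buffer policy,
                # assumed here for illustration).
                exemplars[random.randrange(self.buffer_per_class)] = x
        # ...and refresh every class mean from the buffer, so earlier gestures are
        # revisited and catastrophic forgetting is mitigated.
        for y, exemplars in self.buffer.items():
            self.means[y] = np.mean(exemplars, axis=0)

    def predict(self, x):
        # Nearest Class Mean: assign the label whose mean is closest to x.
        return min(self.means, key=lambda y: np.linalg.norm(x - self.means[y]))

# Usage: features are fixed-length vectors extracted from sensor windows.
# clf = NCMWithReplay()
# clf.learn([np.random.rand(64) for _ in range(10)], ["swipe"] * 10)
# label = clf.predict(np.random.rand(64))
```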

15 pages, 2250 KiB  
Article
Video-Based Plastic Bag Grabbing Action Recognition: A New Video Dataset and a Comparative Study of Baseline Models
by Pei Jing Low, Bo Yan Ng, Nur Insyirah Mahzan, Jing Tian and Cheung-Chi Leung
Sensors 2025, 25(1), 255; https://doi.org/10.3390/s25010255 - 4 Jan 2025
Viewed by 790
Abstract
Recognizing the action of taking a plastic bag from CCTV video footage represents a highly specialized and niche challenge within the broader domain of action video classification. To address this challenge, our paper introduces a novel benchmark video dataset specifically curated for the task of identifying the action of grabbing a plastic bag. Additionally, we propose and evaluate three distinct baseline approaches. The first approach employs a combination of handcrafted feature extraction techniques and a sequential classification model to analyze motion and object-related features. The second approach leverages a multiple-frame convolutional neural network (CNN) to exploit temporal and spatial patterns in the video data. The third approach explores a 3D CNN-based deep learning model, which is capable of processing video data as volumetric inputs. To assess the performance of these methods, we conduct a comprehensive comparative study, demonstrating the strengths and limitations of each approach within this specialized domain. Full article
(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)
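As an illustration of the third baseline, which treats a clip as a volumetric input, a minimal 3D CNN is sketched below in PyTorch; the layer sizes, input resolution, and two-class output are assumptions and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Minimal 3D CNN that classifies a video clip as 'grabbing a plastic bag' or not."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),   # RGB clip -> 16 channels
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),                               # halve time, height, width
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                       # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):
        # clip: (batch, channels=3, frames, height, width)
        x = self.features(clip).flatten(1)
        return self.classifier(x)

# Usage with a dummy 16-frame RGB clip at 112x112 resolution:
# logits = Tiny3DCNN()(torch.randn(1, 3, 16, 112, 112))
```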

19 pages, 2352 KiB  
Article
Empowering Efficient Spatio-Temporal Learning with a 3D CNN for Pose-Based Action Recognition
by Ziliang Ren, Xiongjiang Xiao and Huabei Nie
Sensors 2024, 24(23), 7682; https://doi.org/10.3390/s24237682 - 30 Nov 2024
Viewed by 1099
Abstract
Action recognition based on 3D heatmap volumes has received increasing attention recently because it is suitable for application to 3D CNNs to improve the recognition performance of deep networks. However, it is difficult for models to capture global dependencies due to their restricted receptive field. To effectively capture long-range dependencies and balance computations, a novel model, PoseTransformer3D with Global Cross Blocks (GCBs), is proposed for pose-based action recognition. The proposed model extracts spatio-temporal features from processed 3D heatmap volumes. Moreover, we design a further recognition framework, RGB-PoseTransformer3D with Global Cross Complementary Blocks (GCCBs), for multimodality feature learning from both pose and RGB data. To verify the effectiveness of this model, we conducted extensive experiments on four popular video datasets, namely FineGYM, HMDB51, NTU RGB+D 60, and NTU RGB+D 120. Experimental results show that the proposed recognition framework always achieves state-of-the-art recognition performance, substantially improving multimodality learning through action recognition. Full article
(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)
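Pose-based 3D heatmap volumes of this kind are commonly built by stacking per-frame Gaussian heatmaps of the detected joints; the sketch below illustrates that conversion, with the spatial resolution and Gaussian width chosen purely for illustration.

```python
import numpy as np

def keypoints_to_heatmap_volume(keypoints, height=56, width=56, sigma=2.0):
    """Convert 2D keypoints into a 3D heatmap volume of shape (joints, frames, H, W).

    keypoints: array of shape (frames, joints, 2) with (x, y) coordinates already
    scaled to the target resolution; each joint becomes a Gaussian blob per frame.
    """
    frames, joints, _ = keypoints.shape
    ys, xs = np.mgrid[0:height, 0:width]
    volume = np.zeros((joints, frames, height, width), dtype=np.float32)
    for t in range(frames):
        for j in range(joints):
            x, y = keypoints[t, j]
            volume[j, t] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return volume

# Usage: a 32-frame clip with 17 joints yields a (17, 32, 56, 56) volume that can be
# fed to a 3D CNN or transformer backbone alongside the RGB stream.
# vol = keypoints_to_heatmap_volume(np.random.rand(32, 17, 2) * 56)
```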

21 pages, 10905 KiB  
Article
Low-Cost Non-Wearable Fall Detection System Implemented on a Single Board Computer for People in Need of Care
by Vanessa Vargas, Pablo Ramos, Edwin A. Orbe, Mireya Zapata and Kevin Valencia-Aragón
Sensors 2024, 24(17), 5592; https://doi.org/10.3390/s24175592 - 29 Aug 2024
Cited by 1 | Viewed by 2087
Abstract
This work aims to propose an affordable, non-wearable system to detect falls of people in need of care. The proposal uses artificial vision based on deep learning techniques implemented on a Raspberry Pi 4 (4 GB RAM) with a High-Definition IR-CUT camera. The CNN architecture classifies detected people into five classes: fallen, crouching, sitting, standing, and lying down. When a fall is detected, the system sends an alert notification to mobile devices through the Telegram instant messaging platform. The system was evaluated on real daily indoor activities under different conditions: outfit, lighting, and distance from the camera. Results show a good trade-off between performance and cost of the system. The obtained performance metrics are: precision of 96.4%, specificity of 96.6%, accuracy of 94.8%, and sensitivity of 93.1%. Regarding privacy concerns, even though this system uses a camera, the video is not recorded or monitored by anyone, and pictures are only sent in case of fall detection. This work can contribute to reducing the fatal consequences of falls in people in need of care by providing them with prompt attention. Such a low-cost solution would be desirable, particularly in developing countries with limited or no medical alert systems and few resources. Full article
(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)
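The alert step described in the abstract maps naturally onto the Telegram Bot API; the sketch below shows one way to send a notification and the triggering frame, with the bot token, chat ID, and trigger condition as placeholders rather than details taken from the paper.

```python
import requests

TELEGRAM_TOKEN = "<bot-token>"   # placeholder: issued by Telegram's BotFather
CHAT_ID = "<chat-id>"            # placeholder: the caregiver's chat

def notify_fall(frame_jpeg: bytes):
    """Send a fall alert (and the triggering frame) via the Telegram Bot API."""
    base = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}"
    requests.post(
        f"{base}/sendMessage",
        data={"chat_id": CHAT_ID, "text": "Fall detected - please check in."},
        timeout=10,
    )
    # Pictures are only transmitted when a fall is detected, in line with the
    # privacy behaviour described in the abstract.
    requests.post(
        f"{base}/sendPhoto",
        data={"chat_id": CHAT_ID},
        files={"photo": ("fall.jpg", frame_jpeg, "image/jpeg")},
        timeout=10,
    )

# Usage (hypothetical): call notify_fall(...) when the classifier outputs "fallen".
# if predicted_class == "fallen":
#     notify_fall(encoded_frame)
```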
