Special Issue "Human Activity Recognition Based on Image Sensors and Deep Learning"

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 21 May 2021.

Special Issue Editors

Prof. Dr. Fakhreddine Ababsa
Guest Editor
LISPEN EA 7515, Arts et Métiers, Institut Image, Chalon-sur-Saône, Burgundy, France
Interests: virtual and augmented reality; computer vision; image processing
Dr. Cyrille Migniot
Guest Editor
ImViA EA 7535, Dijon, Burgundy, France
Interests: computer vision; motion analysis; human monitoring

Special Issue Information

Dear Colleagues,

Video-based human activity recognition (HAR) has made considerable progress in recent years due to its applications in various fields, such as surveillance, entertainment, smart homes, sport analysis, human–computer interaction, virtual reality, enhanced manufacturing, and healthcare systems. Its purpose is to automatically detect, track, and describe human activities in a sequence of image frames.

Deep learning (DL) techniques have become popular for video-based HAR, thanks in particular to their accuracy and their ability to handle large, well-annotated video databases. Nonetheless, their application to this field is still relatively new, so exploring the use of DL in video-based HAR leaves room for significant contributions. For example, common DL approaches automatically extract hierarchical features from static images and do not take motion into account, even though motion is a key feature for describing human activity. Techniques such as long short-term memory (LSTM) networks, which have proven their power in motion modeling, could provide more efficient solutions, and multistream networks would allow modeling the temporal dependencies between motion and appearance. To deal with the challenging conditions introduced by motion in images (such as background clutter and illumination changes) and to improve classification performance, advanced DL architectures and strategies such as transfer learning, generative adversarial networks (GANs), and multitask learning could be considered.
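
To make the multistream idea above concrete, the following is a minimal sketch, assuming PyTorch, of a model that pairs a per-frame appearance CNN with an LSTM motion stream. All class names, layer sizes, and the fusion choice are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch of a two-stream video HAR model combining a per-frame
# appearance CNN with an LSTM over the frame features for motion modeling.
# Assumes PyTorch; all names and sizes are illustrative, not from any paper.
import torch
import torch.nn as nn

class TwoStreamHAR(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 256):
        super().__init__()
        # Appearance stream: a small CNN applied to each frame independently.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Motion stream: an LSTM models temporal dependencies across frames.
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        frames = self.cnn(clip.flatten(0, 1)).view(b, t, -1)  # (b, t, feat)
        temporal, _ = self.lstm(frames)                       # (b, t, feat)
        # Fuse averaged appearance features with the last LSTM state.
        fused = torch.cat([frames.mean(dim=1), temporal[:, -1]], dim=-1)
        return self.head(fused)

# A 16-frame clip at 112x112 resolution, batch of 2, 10 activity classes.
logits = TwoStreamHAR(num_classes=10)(torch.randn(2, 16, 3, 112, 112))
```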

The aim of this Special Issue is to report on recent research works on video-based human activity recognition using advanced deep learning techniques. We encourage submissions of conceptual, empirical, and literature review papers focusing on this field, regardless of the application area.

Prof. Dr. Fakhreddine Ababsa
Dr. Cyrille Migniot
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2200 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (5 papers)

Research

Open Access Article
Multi-Channel Generative Framework and Supervised Learning for Anomaly Detection in Surveillance Videos
Sensors 2021, 21(9), 3179; https://doi.org/10.3390/s21093179 - 03 May 2021
Abstract
Recently, most state-of-the-art anomaly detection methods have been based on apparent-motion and appearance reconstruction networks, using the error between generated and real information as a detection feature. These approaches achieve promising results while using only normal samples for training. In this paper, our contributions are two-fold. On the one hand, we propose a flexible multi-channel framework to generate multiple types of frame-level features. On the other hand, we study how the detection performance can be improved by supervised learning. The multi-channel framework is based on four Conditional GANs (CGANs) that take various types of appearance and motion information as input and produce prediction information as output. These CGANs provide a better feature space for distinguishing normal from abnormal events. The difference between the generated and ground-truth information is then encoded by the Peak Signal-to-Noise Ratio (PSNR). We propose to classify these features in a classical supervised scenario by building a small training set from some abnormal samples of the original test set of each dataset. A binary Support Vector Machine (SVM) is applied for frame-level anomaly detection. Finally, we use Mask R-CNN as a detector to perform object-centric anomaly localization. Our solution is extensively evaluated on the Avenue, Ped1, Ped2, and ShanghaiTech datasets. Our experimental results demonstrate that PSNR features combined with a supervised SVM are better than the error maps computed by previous methods. We achieve state-of-the-art frame-level AUC performance on Ped1 and ShanghaiTech. In particular, on the most challenging ShanghaiTech dataset, the supervised training model outperforms the state-of-the-art unsupervised strategy by up to 9%.
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
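
The scoring step described in this abstract (encoding the gap between generated and real frames as PSNR, then classifying with a binary SVM) can be sketched as follows. This assumes NumPy and scikit-learn, omits the four CGAN generators entirely, and uses synthetic features in place of real frames.

```python
# Hedged sketch of the frame-level scoring step: encode the gap between a
# generated prediction and the real frame as PSNR, then feed those scores
# to a binary SVM. Assumes NumPy and scikit-learn; the data is synthetic.
import numpy as np
from sklearn.svm import SVC

def psnr(real: np.ndarray, generated: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between a real and a generated frame."""
    mse = np.mean((real.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# One PSNR score per generation channel gives a small feature vector per
# frame (the paper uses four CGAN channels); random data stands in here.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))      # 4 PSNR-style features per frame
labels = rng.integers(0, 2, size=200)     # 1 = abnormal, 0 = normal
clf = SVC(kernel="rbf").fit(features, labels)
print(clf.predict(features[:5]))
```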

Open Access Article
Using a Deep Learning Method and Data from Two-Dimensional (2D) Marker-Less Video-Based Images for Walking Speed Classification
Sensors 2021, 21(8), 2836; https://doi.org/10.3390/s21082836 - 17 Apr 2021
Abstract
Human body measurement data related to walking can characterize functional movement and thereby become an important tool for health assessment. Single-camera-captured two-dimensional (2D) image sequences of marker-less walking individuals might be a simple approach for estimating human body measurement data, which could be used in walking-speed-related health assessment. Conventional body measurement data from 2D images depend on body-worn garments (used as segmental markers) and are susceptible to changes in the distance between the participant and the camera in indoor and outdoor settings. In this study, we propose five ratio-based body measurements that can be extracted from 2D images and used to classify three walking speeds (i.e., slow, normal, and fast) using a deep-learning-based bidirectional long short-term memory classification model. The results showed that average classification accuracies of 88.08% and 79.18% could be achieved in indoor and outdoor environments, respectively. Additionally, the proposed ratio-based body measurements are independent of body-worn garments and not susceptible to changes in the distance between the walking individual and the camera. As a simple but efficient technique, the proposed walking speed classification has great potential to be employed in clinics and aged care homes.
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
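
A hedged sketch of the classification stage described above: a bidirectional LSTM over per-frame sequences of the five ratio-based measurements, predicting one of three speed classes. It assumes PyTorch, and the layer sizes are illustrative rather than the paper's.

```python
# Minimal sketch of a bidirectional LSTM classifier over sequences of the
# five ratio-based body measurements, predicting one of three walking
# speeds. Assumes PyTorch; layer sizes are illustrative, not the paper's.
import torch
import torch.nn as nn

class SpeedClassifier(nn.Module):
    def __init__(self, n_features: int = 5, hidden: int = 64, n_classes: int = 3):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)  # 2x for both directions

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, frames, n_features) of per-frame body ratios
        out, _ = self.bilstm(seq)
        return self.head(out[:, -1])  # classify from the final time step

# One 60-frame gait sequence per sample; labels: 0=slow, 1=normal, 2=fast.
logits = SpeedClassifier()(torch.randn(8, 60, 5))
```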

Open Access Article
3D Human Pose Estimation with a Catadioptric Sensor in Unconstrained Environments Using an Annealed Particle Filter
Sensors 2020, 20(23), 6985; https://doi.org/10.3390/s20236985 - 07 Dec 2020
Abstract
The purpose of this paper is to investigate the problem of 3D human tracking in complex environments, using a particle filter with images captured by a catadioptric vision system. This issue has been widely studied in the literature for RGB images acquired from conventional perspective cameras, while omnidirectional images have seldom been used and published research in this field remains limited. In this study, Riemannian manifolds were considered in order to compute the gradient on spherical images and generate a robust descriptor, used along with an SVM classifier for human detection. Original likelihood functions associated with the particle filter are proposed, using both geodesic distances and overlapping regions between the silhouette detected in the images and the projected 3D human model. Our approach was experimentally evaluated on real data and showed favorable results compared to machine-learning-based techniques in terms of 3D pose accuracy. The Root Mean Square Error (RMSE) was measured by comparing estimated 3D poses with ground-truth data, resulting in a mean error of 0.065 m for a walking action.
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
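
One annealing layer of such a particle filter can be sketched as follows, assuming NumPy; the Gaussian likelihood here is a stand-in for the geodesic-distance and silhouette-overlap functions the paper proposes, and all sizes are illustrative.

```python
# Hedged sketch of one annealing layer of a particle filter for pose
# tracking: perturb pose particles, weight them with a likelihood sharpened
# by an annealing exponent, and resample. NumPy only; the likelihood is a
# placeholder for the geodesic/silhouette-overlap functions described above.
import numpy as np

rng = np.random.default_rng(1)

def likelihood(poses: np.ndarray, observation: np.ndarray) -> np.ndarray:
    # Placeholder: Gaussian score on the pose-observation distance.
    d = np.linalg.norm(poses - observation, axis=1)
    return np.exp(-0.5 * d ** 2)

def anneal_layer(particles, observation, beta, noise=0.05):
    particles = particles + rng.normal(0.0, noise, particles.shape)
    w = likelihood(particles, observation) ** beta   # sharpen with beta
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# 200 particles over a toy 10-DoF pose, annealed over increasing betas.
particles = rng.normal(size=(200, 10))
truth = np.zeros(10)
for beta in (0.25, 0.5, 1.0, 2.0):
    particles = anneal_layer(particles, truth, beta)
rmse = np.sqrt(np.mean((particles.mean(axis=0) - truth) ** 2))
print(f"RMSE of the mean pose estimate: {rmse:.4f}")
```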

Open Access Article
Recognition of Non-Manual Content in Continuous Japanese Sign Language
Sensors 2020, 20(19), 5621; https://doi.org/10.3390/s20195621 - 01 Oct 2020
Abstract
The quality of recognition systems for continuous utterances in signed languages has advanced considerably in recent years. However, research efforts often do not address specific linguistic features of signed languages, such as non-manual expressions. In this work, we evaluate the potential of a single-video-camera-based recognition system with respect to the latter. For this, we introduce a two-stage pipeline based on two-dimensional body joint positions extracted from RGB camera data. The system first separates the data flow of a signed expression into meaningful word segments using a frame-wise binary Random Forest. Next, every segment is transformed into an image-like representation and classified with a Convolutional Neural Network. The proposed system is then evaluated on a dataset of continuous sentence expressions in Japanese Sign Language with a variation of non-manual expressions. Exploring multiple variations of data representations and network parameters, we are able to distinguish word segments of specific non-manual intonations with 86% accuracy from the underlying body joint movement data. Full sentence predictions achieve a total Word Error Rate of 15.75%. This marks an improvement of 13.22% compared to predictions obtained from labeling that is insensitive to non-manual content. Consequently, our analysis constitutes an important contribution to a better understanding of mixed manual and non-manual content in signed communication.
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
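
The Word Error Rate reported above is the standard Levenshtein-alignment metric; a small self-contained sketch in plain Python follows (the example sentences are invented):

```python
# Hedged sketch of the Word Error Rate metric: a standard Levenshtein
# alignment between reference and hypothesis word sequences.
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[n][m] / max(n, 1)

# Invented example: distance 2 over 4 reference words gives WER = 0.5.
print(word_error_rate("I AM HAPPY TODAY".split(), "I AM VERY HAPPY".split()))
```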

Open Access Article
Focus on the Visible Regions: Semantic-Guided Alignment Model for Occluded Person Re-Identification
Sensors 2020, 20(16), 4431; https://doi.org/10.3390/s20164431 - 08 Aug 2020
Abstract
The occlusion problem is very common in pedestrian retrieval scenarios. When persons are occluded by various obstacles, the noise caused by the occluded area greatly affects the retrieval results, yet many previous pedestrian re-identification (Re-ID) methods ignore this problem. To solve it, we propose a semantic-guided alignment model that uses image semantic information to separate useful information from occlusion noise. In the image preprocessing phase, we use a human semantic parsing network to generate probability maps. These maps show which regions of the images are occluded, and the model automatically crops the images to preserve the visible parts. In the feature construction phase, we fuse the probability maps with the global features of the image, and the semantic information guides the model to focus on visible human regions and extract local features. During the matching process, we propose a measurement strategy that calculates the distance only over public areas (human regions visible in both images), thereby suppressing the spatial misalignment caused by non-public areas. Experimental results on a series of public datasets confirm that our method outperforms previous occluded Re-ID methods and achieves top performance on the holistic Re-ID problem.
(This article belongs to the Special Issue Human Activity Recognition Based on Image Sensors and Deep Learning)
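
The visible-region matching strategy described above can be sketched as a masked distance that compares part features only where both images mark the part visible; this NumPy sketch uses illustrative shapes and random features, not the paper's model.

```python
# Hedged sketch of visible-region matching for occluded Re-ID: compare
# local part features only where both images mark the part visible, so
# occluded parts cannot corrupt the distance. Shapes are illustrative.
import numpy as np

def visible_distance(feats_a, feats_b, vis_a, vis_b) -> float:
    """Mean L2 distance over parts visible in BOTH images.

    feats_*: (n_parts, dim) part features; vis_*: (n_parts,) boolean masks.
    """
    shared = vis_a & vis_b
    if not shared.any():
        return np.inf  # no common visible region to compare
    diffs = np.linalg.norm(feats_a[shared] - feats_b[shared], axis=1)
    return float(diffs.mean())

rng = np.random.default_rng(2)
fa, fb = rng.normal(size=(6, 128)), rng.normal(size=(6, 128))
va = np.array([1, 1, 1, 0, 0, 1], bool)   # lower body occluded in image A
vb = np.array([1, 1, 0, 1, 1, 1], bool)   # torso occluded in image B
print(visible_distance(fa, fb, va, vb))
```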