Computer Vision Techniques Applied to Human Behaviour Analysis in the Real-World

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (31 December 2022) | Viewed by 20558

Special Issue Editors


Dr. Oya Celiktutan
Guest Editor
Centre for Robotics Research, Department of Engineering, King’s College London, London WC2R 2LS, UK
Interests: computer vision; machine learning; human behaviour analysis and synthesis; social signal processing; human–robot interaction

Prof. Albert Ali Salah
Guest Editor
1. Department of Information and Computing Sciences, Utrecht University, 3584CC Utrecht, The Netherlands
2. Department of Computer Engineering, Bogazici University, 34342 Bebek, Istanbul, Turkey
Interests: machine learning; pattern recognition; computer vision; multimedia methods; behaviour analysis

Prof. Dongmei Jiang
Guest Editor
School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
Interests: multimodal affective computing; human-computer interaction; mental health assessment and management

Special Issue Information

Dear Colleagues,

We are happy to invite you to submit a paper for the special issue “Computer Vision Techniques Applied to Human Behaviour Analysis in the Real-World”. The details can be found below.

Intelligent devices, such as smart wearables, intelligent vehicles, virtual assistants and robots, are progressively becoming widespread in many aspects of our daily lives, where effective interaction is increasingly desirable. In such applications, the more information exchanged between the user and the system through multiple modalities, the more versatile, efficient and natural the interaction becomes. However, current intelligent devices do not take the user's state sufficiently into account and therefore suffer from a lack of personalization and low engagement. In particular, interaction logs and verbal data alone are not adequate for genuinely interpreting human behaviours, and there has therefore been a significant effort to analyse human behaviours from video data. Although significant progress has been made so far, there is still much room for improvement in moving from controlled and acted settings to real-world settings. The key aim of this Special Issue is to bring together cutting-edge research and innovative computer vision techniques applied to human behaviour analysis, from the recognition of gestures and activities to the interpretation of these cues at a higher level for predicting cognitive, social and emotional states.

Special issue topics include, but are not limited to:

  • Unsupervised, semi-supervised and supervised learning-based approaches to human behaviour analysis
  • Face, gesture and body analysis
  • Activity recognition and anticipation
  • Affect and emotion recognition
  • Interactive behaviour analysis, including multiparty interaction, human-computer interaction, human-robot interaction
  • Combining vision with other modalities (e.g., audio, biosignals) for human behaviour analysis
  • Societal and ethical considerations of human behaviour analysis, including explainability, bias, fairness, privacy
  • Real-time systems for human behaviour analysis on devices with limited on-board computational power
  • Databases and open source tools for human behaviour analysis
  • Applications in education, healthcare, smart environments, or any related field

Dr. Oya Celiktutan

Prof. Albert Ali Salah

Prof. Dongmei Jiang

Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • human behaviour analysis
  • computer vision
  • machine learning

Published Papers (6 papers)


Research

13 pages, 9991 KiB  
Article
Understanding How CNNs Recognize Facial Expressions: A Case Study with LIME and CEM
by Guillermo del Castillo Torres, Maria Francesca Roig-Maimó, Miquel Mascaró-Oliver, Esperança Amengual-Alcover and Ramon Mas-Sansó
Sensors 2023, 23(1), 131; https://doi.org/10.3390/s23010131 - 23 Dec 2022
Cited by 6 | Viewed by 2024
Abstract
Recognizing facial expressions has been a persistent goal in the scientific community. Since the rise of artificial intelligence, convolutional neural networks (CNNs) have become popular for recognizing facial expressions, as images can be used directly as input. Current CNN models can achieve high recognition rates, but they give no clue about their reasoning process. Explainable artificial intelligence (XAI) has been developed as a means to help interpret the results obtained by machine learning models. When dealing with images, one of the most widely used XAI techniques is LIME, which highlights the areas of the image that contribute to a classification. The CEM method appeared as an alternative to LIME, providing explanations in a way that is natural for human classification: besides highlighting what is sufficient to justify a classification, it also identifies what should be absent to maintain it and to distinguish it from another classification. This study presents the results of comparing LIME and CEM applied to complex images such as facial expression images. While CEM could be used to explain the results on images described with a reduced number of features, LIME would be the method of choice when dealing with images described with a huge number of features.
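As a rough illustration of how LIME-style explanations are obtained for an image classifier, the sketch below runs LIME on a placeholder facial-expression model; the dummy classifier and random face crop are stand-ins for illustration, not the authors' trained CNN or data.

```python
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

# Stand-in for a trained expression classifier: LIME expects a function mapping a
# batch of images to class probabilities (here, 7 pseudo-random expression classes).
def classifier_fn(images):
    rng = np.random.default_rng(0)
    logits = rng.random((len(images), 7))
    return logits / logits.sum(axis=1, keepdims=True)

face_image = np.random.rand(96, 96, 3)  # placeholder face crop

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    face_image, classifier_fn,
    top_labels=1,           # explain only the top predicted expression
    hide_color=0,
    num_samples=200)        # number of perturbed samples around the input
img, mask = explanation.get_image_and_mask(
    explanation.top_labels[0],
    positive_only=True,     # keep only superpixels supporting the prediction
    num_features=5,
    hide_rest=False)
overlay = mark_boundaries(img, mask)  # image with contributing regions outlined
```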

16 pages, 33181 KiB  
Article
CamNuvem: A Robbery Dataset for Video Anomaly Detection
by Davi D. de Paula, Denis H. P. Salvadeo and Darlan M. N. de Araujo
Sensors 2022, 22(24), 10016; https://doi.org/10.3390/s222410016 - 19 Dec 2022
Cited by 3 | Viewed by 4458
Abstract
(1) Background: The research area of video surveillance anomaly detection aims to automatically detect the moment when a video surveillance camera captures something that does not fit the normal pattern. This is a difficult task, but automating it is important to improve and lower the cost of detecting crimes and other accidents. The UCF-Crime dataset is currently the most realistic crime dataset, and it contains hundreds of videos distributed in several categories; it includes a robbery category, which contains videos of people stealing material goods using violence, but this category only includes a few videos. (2) Methods: This work focuses only on the robbery category, presenting a new weakly labelled dataset that contains 486 new real-world robbery surveillance videos acquired from public sources. (3) Results: We modified and applied three state-of-the-art video surveillance anomaly detection methods to create a benchmark for future studies. We showed that in the best scenario, taking into account only the anomaly videos in our dataset, the best method achieved an AUC of 66.35%. When all anomaly and normal videos were taken into account, the best method achieved an AUC of 88.75%. (4) Conclusion: This result shows that there is a huge research opportunity to create new methods and approaches that can improve robbery detection in video surveillance.
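The AUC of the ROC curve over per-frame (or per-segment) anomaly scores is the standard metric in this area. The snippet below shows the computation on synthetic placeholder labels and scores, not CamNuvem results.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic per-frame ground truth (1 = anomalous frame) and model anomaly scores.
frame_labels   = np.array([0, 0, 0, 1, 1, 1, 0, 0])
anomaly_scores = np.array([0.10, 0.20, 0.15, 0.80, 0.70, 0.90, 0.30, 0.20])

auc = roc_auc_score(frame_labels, anomaly_scores)
print(f"frame-level AUC: {auc:.2%}")   # 100.00% for this toy example
```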

15 pages, 718 KiB  
Article
ArbGaze: Gaze Estimation from Arbitrary-Sized Low-Resolution Images
by Hee Gyoon Kim and Ju Yong Chang
Sensors 2022, 22(19), 7427; https://doi.org/10.3390/s22197427 - 30 Sep 2022
Cited by 1 | Viewed by 1893
Abstract
The goal of gaze estimation is to estimate a gaze vector from an image containing a face or eye(s). Most existing studies use pre-defined fixed-resolution images to estimate the gaze vector. However, images captured in in-the-wild environments may have various resolutions, and variation in resolution can degrade gaze estimation performance. To address this problem, a gaze estimation method for arbitrary-sized low-resolution images is proposed. The basic idea of the proposed method is to combine knowledge distillation and feature adaptation. Knowledge distillation helps the gaze estimator for arbitrary-sized images generate a feature map similar to that obtained from a high-resolution image. Feature adaptation makes it possible to create a feature map that adapts to the resolution of the input image by using the low-resolution image together with its scale information. The ablation study shows that combining these two ideas substantially improves gaze estimation performance. Experiments with various backbones also demonstrate that the proposed method generalizes to other popular gaze estimation models.
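To make the distillation-plus-adaptation idea concrete, here is an illustrative PyTorch sketch in the spirit of the description above: a teacher encoder sees the high-resolution image, while a student, conditioned on the scale of its low-resolution input, is trained to match the teacher's feature map and to predict gaze. The module sizes, scale conditioning, and loss weighting are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small convolutional encoder producing a spatial feature map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class Student(nn.Module):
    """Student that upsamples an arbitrary-sized LR image and adapts features to its scale."""
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.scale_fc = nn.Linear(1, 32)   # feature adaptation from scale information
        self.head = nn.Linear(32, 2)       # gaze (yaw, pitch)
    def forward(self, lr_img, scale):
        x = F.interpolate(lr_img, size=(64, 64), mode='bilinear', align_corners=False)
        feat = self.encoder(x)
        gate = torch.sigmoid(self.scale_fc(scale))[:, :, None, None]
        feat = feat * gate                 # modulate channels by input scale
        gaze = self.head(feat.mean(dim=(2, 3)))
        return feat, gaze

teacher, student = Encoder(), Student()
hr = torch.rand(4, 3, 64, 64)                       # high-resolution eye images
lr = F.interpolate(hr, size=(24, 24))               # simulated low-resolution inputs
scale = torch.full((4, 1), 24.0 / 64.0)             # relative scale information
gaze_gt = torch.rand(4, 2)

with torch.no_grad():
    teacher_feat = teacher(hr)                       # distillation target
student_feat, gaze_pred = student(lr, scale)
loss = F.mse_loss(student_feat, teacher_feat) + F.l1_loss(gaze_pred, gaze_gt)
loss.backward()
```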

14 pages, 7650 KiB  
Article
Gaze Estimation Approach Using Deep Differential Residual Network
by Longzhao Huang, Yujie Li, Xu Wang, Haoyu Wang, Ahmed Bouridane and Ahmad Chaddad
Sensors 2022, 22(14), 5462; https://doi.org/10.3390/s22145462 - 21 Jul 2022
Cited by 6 | Viewed by 3194
Abstract
Gaze estimation, which determines where a person is looking given an image of the person's full face, is a valuable clue for understanding human intention. As in other domains of computer vision, deep learning (DL) methods have gained recognition in the gaze estimation domain. However, gaze calibration problems remain, preventing existing methods from improving performance further. An effective solution is to directly predict the difference information of two human eyes, as in the differential network (Diff-Nn). However, this solution results in a loss of accuracy when using only one inference image. We propose a differential residual model (DRNet) combined with a new loss function to make use of the difference information of two eye images, treating the difference information as auxiliary information. We assess the proposed model (DRNet) mainly using two public datasets: (1) MpiiGaze and (2) Eyediap. Considering only the eye features, DRNet outperforms the state-of-the-art gaze estimation methods, with angular errors of 4.57 and 6.14 on the MpiiGaze and Eyediap datasets, respectively. Furthermore, the experimental results also demonstrate that DRNet is extremely robust to noisy images.
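As a loose illustration of using the gaze difference between two eye images as an auxiliary signal, the PyTorch sketch below adds a difference head alongside a per-eye gaze head; the architecture, loss form, and weighting are placeholders and do not reproduce the published DRNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EyeEncoder(nn.Module):
    """Shared encoder mapping a grey-scale eye crop to a feature vector."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
    def forward(self, x):
        return self.conv(x).flatten(1)    # (B, 32) eye feature

encoder = EyeEncoder()
gaze_head = nn.Linear(32, 2)              # per-eye gaze (yaw, pitch)
diff_head = nn.Linear(32, 2)              # auxiliary head for the gaze difference

left  = torch.rand(8, 1, 36, 60)          # left-eye crops
right = torch.rand(8, 1, 36, 60)          # right-eye crops
gt_left, gt_right = torch.rand(8, 2), torch.rand(8, 2)

f_l, f_r = encoder(left), encoder(right)
pred_l, pred_r = gaze_head(f_l), gaze_head(f_r)
pred_diff = diff_head(f_l - f_r)          # difference feature drives the auxiliary output

# Main per-eye losses plus the auxiliary difference loss (weight 0.5 is arbitrary).
loss = (F.l1_loss(pred_l, gt_left) + F.l1_loss(pred_r, gt_right)
        + 0.5 * F.l1_loss(pred_diff, gt_left - gt_right))
loss.backward()
```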

13 pages, 1599 KiB  
Article
Micro-Expression Recognition Based on Optical Flow and PCANet+
by Shiqi Wang, Suen Guan, Hui Lin, Jianming Huang, Fei Long and Junfeng Yao
Sensors 2022, 22(11), 4296; https://doi.org/10.3390/s22114296 - 5 Jun 2022
Cited by 6 | Viewed by 2691
Abstract
Micro-expressions are rapid and subtle facial movements. Unlike the ordinary facial expressions of daily life, micro-expressions are very difficult to detect and recognize. In recent years, owing to a wide range of potential applications in many domains, micro-expression recognition has attracted extensive attention from the computer vision community. Because available micro-expression datasets are very small, deep neural network models with a huge number of parameters are prone to over-fitting. In this article, we propose an OF-PCANet+ method for micro-expression recognition, in which we design a spatiotemporal feature learning strategy based on the shallow PCANet+ model and incorporate optical flow sequence stacking with the PCANet+ network to learn discriminative spatiotemporal features. We conduct comprehensive experiments on the publicly available SMIC and CASME2 datasets. The results show that our lightweight model clearly outperforms popular hand-crafted methods and also achieves performance comparable to deep learning-based methods, such as 3D-FCNN and ELRCN.
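For a sense of what the optical-flow stacking step looks like before any PCANet+ filtering, here is a minimal OpenCV sketch on synthetic frames; it only computes and stacks dense Farneback flow and does not implement the PCANet+ stage or the authors' preprocessing.

```python
import cv2
import numpy as np

# Synthetic grey-scale frame sequence standing in for an aligned micro-expression clip.
frames = [np.random.randint(0, 256, (128, 128), dtype=np.uint8) for _ in range(5)]

flows = []
for prev, nxt in zip(frames[:-1], frames[1:]):
    # Dense Farneback optical flow between consecutive frames, shape (H, W, 2).
    flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    flows.append(flow)

# Stack the horizontal/vertical flow components along the channel axis so the whole
# clip becomes a single multi-channel input for a shallow feature-learning network.
of_stack = np.concatenate(flows, axis=2)   # (H, W, 2 * (T - 1))
print(of_stack.shape)                      # (128, 128, 8)
```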

13 pages, 645 KiB  
Article
A Spatiotemporal Deep Learning Approach for Automatic Pathological Gait Classification
by Pedro Albuquerque, Tanmay Tulsidas Verlekar, Paulo Lobato Correia and Luís Ducla Soares
Sensors 2021, 21(18), 6202; https://doi.org/10.3390/s21186202 - 16 Sep 2021
Cited by 10 | Viewed by 3058
Abstract
Human motion analysis provides useful information for the diagnosis and recovery assessment of people suffering from pathologies, such as those affecting the way of walking, i.e., gait. With recent developments in deep learning, state-of-the-art performance can now be achieved using a single 2D-RGB-camera-based gait analysis system, offering an objective assessment of gait-related pathologies. Such systems provide a valuable complement/alternative to the current standard practice of subjective assessment. Most 2D-RGB-camera-based gait analysis approaches rely on compact gait representations, such as the gait energy image, which summarize the characteristics of a walking sequence into a single image. However, such compact representations do not fully capture the temporal information and dependencies between successive gait movements. This limitation is addressed by proposing a spatiotemporal deep learning approach that uses a selection of key frames to represent a gait cycle. Convolutional and recurrent deep neural networks were combined, processing each gait cycle as a collection of silhouette key frames and allowing the system to learn temporal patterns among the spatial features extracted at individual time instants. Trained with gait sequences from the GAIT-IT dataset, the proposed system improves gait pathology classification accuracy, outperforming state-of-the-art solutions and achieving improved generalization on cross-dataset tests.
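The combination of a per-frame convolutional encoder with a recurrent model over silhouette key frames can be sketched as follows in PyTorch; the layer sizes, number of key frames, and class count are illustrative assumptions rather than the published architecture.

```python
import torch
import torch.nn as nn

class GaitClassifier(nn.Module):
    """CNN encoder per silhouette key frame followed by an LSTM over the gait cycle."""
    def __init__(self, num_classes=5, feat_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(                      # per-frame silhouette encoder
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.rnn = nn.LSTM(feat_dim, 64, batch_first=True)  # temporal model over key frames
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):                              # x: (B, T, 1, H, W) key frames
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.rnn(feats)
        return self.fc(h[-1])                          # gait-pathology logits

model = GaitClassifier()
clip = torch.rand(2, 8, 1, 64, 64)                     # 2 cycles x 8 silhouette key frames
print(model(clip).shape)                               # torch.Size([2, 5])
```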
