
Vision and Sensor-Based Sensing in Human Action Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (28 February 2024) | Viewed by 44089

Special Issue Editor


Guest Editor
Pattern Processing Lab, School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan
Interests: pattern recognition; character recognition; image processing; computer vision; human–computer interaction; neurological disease analysis; machine learning

Special Issue Information

The goal of this Special Issue is to help bridge the gap between human action recognition research and the many important applications it enables, such as human–computer interaction (HCI), virtual reality, security, the Internet of Things (IoT), and healthcare.

Over the past few decades, video-based and sensor-based sensing for human action recognition has received tremendous attention from the research community, owing to its wide range of applications and recent advances in computational performance, camera and sensor technology, and machine learning and deep learning algorithms.

In this Special Issue on vision- and sensor-based sensing in human action recognition, we aim to publish novel and technically sound research articles that make theoretical and practical contributions to computer vision, machine learning, AI, sensing, and medical and social applications.

Topics of interest include, but are not limited to:

  • Human action recognition from camera, video, and other relevant sensor data
  • Nontouch and touch interfaces using human action
  • Deep learning approaches for human action recognition
  • Handwriting action analysis and recognition
  • Medical diagnosis and recognition using human action
  • Biosignal processing for human action recognition
  • Healthcare applications using human action
  • Virtual reality, augmented reality, and other applications using human action
  • Human action analysis and recognition for social issues
  • Large datasets on human action recognition
  • Current state-of-the-art and future trends of human action recognition

Prof. Dr. Jungpil Shin
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Human–computer interaction
  • Hand gesture
  • Nontouch and touch interfaces
  • Handwriting action
  • Wearable sensors, nonwearable sensors
  • Video-based sensors
  • Medical diagnosis and recognition
  • Biosignal processing
  • Virtual reality, augmented reality
  • Machine learning
  • Deep learning

Published Papers (15 papers)


Research

13 pages, 5629 KiB  
Article
Early Eye Disengagement Is Regulated by Task Complexity and Task Repetition in Visual Tracking Task
by Yun Wu, Zhongshi Zhang, Farzad Aghazadeh and Bin Zheng
Sensors 2024, 24(10), 2984; https://doi.org/10.3390/s24102984 - 8 May 2024
Viewed by 598
Abstract
Understanding human actions often requires in-depth detection and interpretation of bio-signals. Early eye disengagement from the target (EEDT) represents a significant eye behavior that involves the proactive disengagement of the gaze from the target to gather information on the anticipated pathway, thereby enabling rapid reactions to the environment. It remains unknown how task difficulty and task repetition affect EEDT. We aim to provide direct evidence of how these factors influence EEDT. We developed a visual tracking task in which participants viewed arrow movement videos while their eye movements were tracked. Task complexity was increased by increasing the number of movement steps. Every movement pattern was performed twice to assess the effect of repetition on eye movement. Participants were required to recall the movement patterns for recall accuracy evaluation and to complete a cognitive load assessment. EEDT was quantified by the fixation duration and frequency within the areas preceding the arrow. When task difficulty increased, the recall accuracy score decreased, cognitive load increased, and EEDT decreased significantly. EEDT was higher in the second trial, but the difference was significant only in tasks with lower complexity. EEDT was positively correlated with recall accuracy and negatively correlated with cognitive load. EEDT was reduced by task complexity and increased by task repetition. EEDT may be a promising sensory measure for assessing task performance and cognitive load and can be used for the future development of eye-tracking-based sensors.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

11 pages, 2226 KiB  
Article
Pupil Response in Visual Tracking Tasks: The Impacts of Task Load, Familiarity, and Gaze Position
by Yun Wu, Zhongshi Zhang, Yao Zhang, Bin Zheng and Farzad Aghazadeh
Sensors 2024, 24(8), 2545; https://doi.org/10.3390/s24082545 - 16 Apr 2024
Cited by 1 | Viewed by 744
Abstract
Pupil size is a significant biosignal for human behavior monitoring and can reveal much underlying information. This study explored the effects of task load, task familiarity, and gaze position on pupil response while learning a visual tracking task. We hypothesized that pupil size would increase with task load, up to a certain level before decreasing, decrease with task familiarity, and increase more when focusing on areas preceding the target than other areas. Fifteen participants were recruited for an arrow tracking learning task with incremental task load. Pupil size data were collected using a Tobii Pro Nano eye tracker. A 2 × 3 × 5 three-way factorial repeated measures ANOVA was conducted using R (version 4.2.1) to evaluate the main and interactive effects of key variables on adjusted pupil size. The association between individuals’ cognitive load, assessed by NASA-TLX, and pupil size was further analyzed using a linear mixed-effect model. We found that task repetition resulted in a reduction in pupil size; however, this effect was found to diminish as the task load increased. The main effect of task load approached statistical significance, but different trends were observed in trial 1 and trial 2. No significant difference in pupil size was detected among the three gaze positions. The relationship between pupil size and cognitive load overall followed an inverted U curve. Our study showed how pupil size changes as a function of task load, task familiarity, and gaze scanning. This finding provides sensory evidence that could improve educational outcomes.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

21 pages, 5075 KiB  
Article
Signer-Independent Arabic Sign Language Recognition System Using Deep Learning Model
by Kanchon Kanti Podder, Maymouna Ezeddin, Muhammad E. H. Chowdhury, Md. Shaheenur Islam Sumon, Anas M. Tahir, Mohamed Arselene Ayari, Proma Dutta, Amith Khandakar, Zaid Bin Mahbub and Muhammad Abdul Kadir
Sensors 2023, 23(16), 7156; https://doi.org/10.3390/s23167156 - 14 Aug 2023
Cited by 8 | Viewed by 2794
Abstract
Every one of us has a unique manner of communicating to explore the world, and such communication helps to interpret life. Sign language is the popular language of communication for hearing and speech-disabled people. When a sign language user interacts with a non-sign language user, it becomes difficult for the signer to express themselves. A sign language recognition system can help a non-sign language user to interpret the signs of a signer. This study presents a sign language recognition system that is capable of recognizing Arabic Sign Language from recorded RGB videos. To achieve this, two datasets were considered: (1) the raw dataset and (2) the face–hand region-based segmented dataset produced from the raw dataset. Moreover, an operational layer-based multi-layer perceptron, “SelfMLP”, is proposed in this study to build CNN-LSTM-SelfMLP models for Arabic Sign Language recognition. MobileNetV2 and ResNet18-based CNN backbones and three SelfMLPs were used to construct six different models of the CNN-LSTM-SelfMLP architecture for performance comparison of Arabic Sign Language recognition. This study examined the signer-independent mode to deal with real-time application circumstances. As a result, MobileNetV2-LSTM-SelfMLP on the segmented dataset achieved the best accuracy of 87.69% with 88.57% precision, 87.69% recall, 87.72% F1 score, and 99.75% specificity. Overall, face–hand region-based segmentation and the SelfMLP-infused MobileNetV2-LSTM-SelfMLP model surpassed previous findings on Arabic Sign Language recognition by 10.970% in accuracy.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

24 pages, 3959 KiB  
Article
Optically Non-Contact Cross-Country Skiing Action Recognition Based on Key-Point Collaborative Estimation and Motion Feature Extraction
by Jiashuo Qi, Dongguang Li, Jian He and Yu Wang
Sensors 2023, 23(7), 3639; https://doi.org/10.3390/s23073639 - 31 Mar 2023
Cited by 1 | Viewed by 2235
Abstract
Technical motion recognition in cross-country skiing can effectively help athletes to improve their skiing movements and optimize their skiing strategies. The non-contact acquisition method of the visual sensor has a bright future in ski training. The changing posture of the athletes, the environment of the ski resort, and the limited field of view have posed great challenges for motion recognition. To improve the applicability of monocular optical sensor-based motion recognition in skiing, we propose a monocular posture detection method based on cooperative detection and feature extraction. Our method uses four feature layers of different sizes to simultaneously detect human posture and key points and takes the position deviation loss and rotation compensation loss of key points as the loss function to implement the three-dimensional estimation of key points. Then, according to the typical characteristics of cross-country skiing movement stages and major sub-movements, the key points are divided and the features are extracted to implement the ski movement recognition. The experimental results show that our method is 90% accurate for cross-country skiing movements, which is equivalent to the recognition method based on wearable sensors. Therefore, our algorithm has application value in the scientific training of cross-country skiing.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

24 pages, 1957 KiB  
Article
Domain Adaptation Methods for Lab-to-Field Human Context Recognition
by Abdulaziz Alajaji, Walter Gerych, Luke Buquicchio, Kavin Chandrasekaran, Hamid Mansoor, Emmanuel Agu and Elke Rundensteiner
Sensors 2023, 23(6), 3081; https://doi.org/10.3390/s23063081 - 13 Mar 2023
Cited by 2 | Viewed by 1511
Abstract
Human context recognition (HCR) using sensor data is a crucial task in Context-Aware (CA) applications in domains such as healthcare and security. Supervised machine learning HCR models are trained using smartphone HCR datasets that are scripted or gathered in-the-wild. Scripted datasets are most accurate because of their consistent visit patterns. Supervised machine learning HCR models perform well on scripted datasets but poorly on realistic data. In-the-wild datasets are more realistic, but cause HCR models to perform worse due to data imbalance, missing or incorrect labels, and a wide variety of phone placements and device types. Lab-to-field approaches learn a robust data representation from a scripted, high-fidelity dataset, which is then used for enhancing performance on a noisy, in-the-wild dataset with similar labels. This research introduces Triplet-based Domain Adaptation for Context REcognition (Triple-DARE), a lab-to-field neural network method that combines three unique loss functions to enhance intra-class compactness and inter-class separation within the embedding space of multi-labeled datasets: (1) domain alignment loss in order to learn domain-invariant embeddings; (2) classification loss to preserve task-discriminative features; and (3) joint fusion triplet loss. Rigorous evaluations showed that Triple-DARE achieved 6.3% and 4.5% higher F1-score and classification accuracy, respectively, than state-of-the-art HCR baselines and outperformed non-adaptive HCR models by 44.6% and 10.7%, respectively.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

12 pages, 1737 KiB  
Article
Automatic Gender and Age Classification from Offline Handwriting with Bilinear ResNet
by Irina Rabaev, Izadeen Alkoran, Odai Wattad and Marina Litvak
Sensors 2022, 22(24), 9650; https://doi.org/10.3390/s22249650 - 9 Dec 2022
Cited by 7 | Viewed by 2130
Abstract
This work focuses on automatic gender and age prediction tasks from handwritten documents. This problem is of interest in a variety of fields, such as historical document analysis and forensic investigations. The challenge of automatic gender and age classification is demonstrated by the relatively low performance of existing methods. In addition, despite the success of CNNs for gender classification, deep neural networks had never been applied to age classification. The published works in this area mostly concentrate on English and Arabic. In addition to Arabic and English, this work also considers Hebrew, which has been much less studied. Following the success of the bilinear Convolutional Neural Network (B-CNN) for fine-grained classification, we propose a novel implementation of a B-CNN with ResNet blocks. To our knowledge, this is the first time a bilinear CNN has been applied to writer demographics classification, and in particular the first attempt to apply a deep neural network to age classification. We perform experiments on documents from three benchmark datasets written in three different languages and provide a thorough comparison with the results reported in the literature. B-ResNet was top-ranked in all tasks. In particular, B-ResNet outperformed other models on the KHATT and QUWI datasets for gender classification.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

15 pages, 4970 KiB  
Article
Recognition of Uni-Stroke Characters with Hand Movements in 3D Space Using Convolutional Neural Networks
by Won-Du Chang, Akitaka Matsuoka, Kyeong-Taek Kim and Jungpil Shin
Sensors 2022, 22(16), 6113; https://doi.org/10.3390/s22166113 - 16 Aug 2022
Cited by 2 | Viewed by 1750
Abstract
Hand gestures are a common means of communication in daily life, and many attempts have been made to recognize them automatically. Developing systems and algorithms to recognize hand gestures is expected to enhance the experience of human–computer interfaces, especially when there are difficulties in communicating vocally. A popular system for recognizing hand gestures is the air-writing method, where people write letters in the air by hand. The arm movements are tracked with a smartwatch/band with embedded acceleration and gyro sensors; a computer system then recognizes the written letters. One of the greatest difficulties in developing algorithms for air writing is the diversity of human hand/arm movements, which makes it difficult to build signal templates for air-written characters or network models. This paper proposes a method for recognizing air-written characters using an artificial neural network. We utilized uni-stroke-designed characters and presented a network model with inception modules and an ensemble structure. The proposed method was successfully evaluated using the data of air-written characters (Arabic numbers and English alphabets) from 18 people with 91.06% accuracy, which reduced the error rate of recent studies by approximately half.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

21 pages, 25805 KiB  
Article
Fitness Movement Types and Completeness Detection Using a Transfer-Learning-Based Deep Neural Network
by Kuan-Yu Chen, Jungpil Shin, Md. Al Mehedi Hasan, Jiun-Jian Liaw, Okuyama Yuichi and Yoichi Tomioka
Sensors 2022, 22(15), 5700; https://doi.org/10.3390/s22155700 - 29 Jul 2022
Cited by 13 | Viewed by 3693
Abstract
Fitness is important in people’s lives. Good fitness habits can improve cardiopulmonary capacity, increase concentration, prevent obesity, and effectively reduce the risk of death. Home fitness does not require large equipment but uses dumbbells, yoga mats, and horizontal bars to complete fitness exercises, and it can effectively avoid contact with other people, so it is very popular. People who work out at home use social media to obtain fitness knowledge, but their ability to learn correct form is limited. Incomplete fitness movements are likely to lead to injury, and a cheap, timely, and accurate fitness detection system can reduce the risk of fitness injuries and effectively improve people’s fitness awareness. In the past, many studies have engaged in the detection of fitness movements, among which detection based on wearable devices, body nodes, and image deep learning has achieved better performance. However, a wearable device cannot detect a variety of fitness movements, may hinder the exercise of the fitness user, and has a high cost. Both body-node-based and image-deep-learning-based methods have lower costs, but each has some drawbacks. Therefore, this paper used a method based on deep transfer learning to establish a fitness database. After that, a deep neural network was trained to detect the type and completeness of fitness movements. We used YOLOv4 and MediaPipe to instantly detect fitness movements and stored the 1D fitness signal of movement to build a database. Finally, an MLP was used to classify the 1D signal waveform of fitness movements. In the classification of fitness movement types, the mAP was 99.71%, accuracy was 98.56%, precision was 97.9%, recall was 98.56%, and the F1-score was 98.23%, which is quite a high performance. In the classification of fitness movement completeness, accuracy was 92.84%, precision was 92.85%, recall was 92.84%, and the F1-score was 92.83%. The average FPS in detection was 17.5. Experimental results show that our method achieves higher accuracy compared to other methods.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

14 pages, 1562 KiB  
Article
A Lightweight Subgraph-Based Deep Learning Approach for Fall Recognition
by Zhenxiao Zhao, Lei Zhang and Huiliang Shang
Sensors 2022, 22(15), 5482; https://doi.org/10.3390/s22155482 - 22 Jul 2022
Cited by 6 | Viewed by 1661
Abstract
Falls pose a great danger to society, especially to the elderly population. When a fall occurs, the body’s center of gravity moves from a high position to a low position, and the magnitude of change varies among body parts. Most existing fall recognition methods based on deep learning have not yet considered the differences between the movement and the change in amplitude of each body part. In addition, existing methods suffer from problems such as complicated design, slow detection speed, and a lack of timeliness. To alleviate these problems, a lightweight subgraph-based deep learning method utilizing skeleton information for fall recognition is proposed in this paper. The skeleton information of the human body is extracted by OpenPose, and an end-to-end lightweight subgraph-based network is designed. Subgraph division and subgraph attention modules are introduced to add a larger perceptual field while maintaining the network’s lightweight characteristics. A multi-scale temporal convolution module is also designed to extract and fuse multi-scale temporal features, which enriches the feature representation. The proposed method is evaluated on a partial fall dataset collected at NTU and on two public datasets, and it outperforms existing methods. This indicates that the proposed method is accurate and lightweight, making it suitable for real-time detection and rapid response to falls.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

17 pages, 1137 KiB  
Article
Toward COVID-19 Contact Tracing through Wi-Fi Probes
by Xu Yang, Chenqi Shi, Peihao Li, Yuqing Yin and Qiang Niu
Sensors 2022, 22(6), 2255; https://doi.org/10.3390/s22062255 - 14 Mar 2022
Cited by 1 | Viewed by 2232
Abstract
COVID-19 is currently the biggest threat that challenges all of humankind’s health and property. One promising and effective way to control the rapid spreading of this infection is searching for primary close contacts of the confirmed cases. In response, we propose COVID-19 Tracer, a low-cost passive searching system to find COVID-19 patients’ close contacts. The main idea is utilizing ubiquitous WiFi probe requests to describe the location similarity, which is then achieved by two designed range-free judgment indicators: location similarity coefficient and close contact distance. We have carried out extensive experiments in a school office building, and the experimental results show an average accuracy of more than 98%, demonstrating our system’s effectiveness in judging close contacts. Last but not least, we have developed a prototype system for a school building to find potential close contacts.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

15 pages, 425 KiB  
Article
Deep Learning Based Air-Writing Recognition with the Choice of Proper Interpolation Technique
by Fuad Al Abir, Md. Al Siam, Abu Sayeed, Md. Al Mehedi Hasan and Jungpil Shin
Sensors 2021, 21(24), 8407; https://doi.org/10.3390/s21248407 - 16 Dec 2021
Cited by 10 | Viewed by 4613
Abstract
The act of writing letters or words in free space with body movements is known as air-writing. Air-writing recognition is a special case of gesture recognition in which gestures correspond to characters and digits written in the air. Air-writing, unlike general gestures, does not require the memorization of predefined special gesture patterns. Rather, it is sensitive to the subject and language of interest. Traditional air-writing requires an extra device containing sensor(s), while the wide adoption of smart-bands eliminates the requirement of the extra device. Therefore, air-writing recognition systems are becoming more flexible day by day. However, the variability of signal duration is a key problem in developing an air-writing recognition model. Inconsistent signal duration is obvious due to the nature of the writing and data-recording process. To make the signals consistent in length, researchers attempted various strategies including padding and truncating, but these procedures result in significant data loss. Interpolation is a statistical technique that can be employed for time-series signals to ensure minimum data loss. In this paper, we extensively investigated different interpolation techniques on seven publicly available air-writing datasets and developed a method to recognize air-written characters using a 2D-CNN model. In both user-dependent and user-independent principles, our method outperformed all the state-of-the-art methods by a clear margin for all datasets.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

19 pages, 1961 KiB  
Article
American Sign Language Alphabet Recognition by Extracting Feature from Hand Pose Estimation
by Jungpil Shin, Akitaka Matsuoka, Md. Al Mehedi Hasan and Azmain Yakin Srizon
Sensors 2021, 21(17), 5856; https://doi.org/10.3390/s21175856 - 31 Aug 2021
Cited by 61 | Viewed by 7658
Abstract
Sign language is designed to assist the deaf and hard-of-hearing community to convey messages and connect with society. Sign language recognition has been an important domain of research for a long time. Previously, sensor-based approaches have obtained higher accuracy than vision-based approaches; however, due to the cost-effectiveness of vision-based approaches, research has also been conducted in this direction despite the accuracy drop. The purpose of this research is to recognize American Sign Language characters using hand images obtained from a web camera. In this work, the MediaPipe Hands algorithm was used for estimating hand joints from RGB images of hands obtained from a web camera, and two types of features were generated from the estimated coordinates of the joints for classification: the distances between the joint points, and the angles between vectors and the 3D axes. The classifiers used to classify the characters were a support vector machine (SVM) and a light gradient boosting machine (GBM). Three character datasets were used for recognition: the ASL Alphabet dataset, the Massey dataset, and the Finger Spelling A dataset. The accuracies obtained were 99.39% for the Massey dataset, 87.60% for the ASL Alphabet dataset, and 98.45% for the Finger Spelling A dataset. The proposed design for automatic American Sign Language recognition is cost-effective, computationally inexpensive, does not require any special sensors or devices, and has outperformed previous studies.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)

17 pages, 6903 KiB  
Article
A Bayesian Dynamical Approach for Human Action Recognition
by Amirreza Farnoosh, Zhouping Wang, Shaotong Zhu and Sarah Ostadabbas
Sensors 2021, 21(16), 5613; https://doi.org/10.3390/s21165613 - 20 Aug 2021
Cited by 7 | Viewed by 2521
Abstract
We introduce a generative Bayesian switching dynamical model for action recognition in 3D skeletal data. Our model encodes highly correlated skeletal data into a few sets of low-dimensional switching temporal processes and from there decodes to the motion data and their associated action labels. We parameterize these temporal processes with a switching deep autoregressive prior to accommodate both multimodal and higher-order nonlinear inter-dependencies. This results in a dynamical deep generative latent model that parses meaningful intrinsic states in skeletal dynamics and enables action recognition. These sequences of states provide visual and quantitative interpretations of the motion primitives that gave rise to each action class, which have not been explored previously. In contrast to previous works, which often overlook temporal dynamics, our method explicitly models temporal transitions and is generative. Our experiments on two large-scale 3D skeletal datasets substantiate the superior performance of our model in comparison with state-of-the-art methods. Specifically, our method achieved 6.3% higher action classification accuracy (by incorporating a dynamical generative framework) and 3.5% better predictive error (by employing a nonlinear second-order dynamical transition model) when compared with the best-performing competitors.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)
16 pages, 4012 KiB  
Article
INIM: Inertial Images Construction with Applications to Activity Recognition
by Nati Daniel and Itzik Klein
Sensors 2021, 21(14), 4787; https://doi.org/10.3390/s21144787 - 13 Jul 2021
Cited by 4 | Viewed by 3124
Abstract
Human activity recognition aims to classify user activity in applications such as healthcare, gesture recognition, and indoor navigation. In the latter, smartphone location recognition is gaining attention because it enhances indoor positioning accuracy. Commonly, the smartphone's inertial sensor readings are used as input to a machine learning algorithm that performs the classification. There are several approaches to this task: feature-based approaches, one-dimensional deep learning algorithms, and two-dimensional deep learning architectures. With deep learning approaches, feature engineering is redundant; moreover, two-dimensional deep learning approaches make it possible to apply methods from the well-established computer vision domain. In this paper, a framework for smartphone location and human activity recognition, based on the smartphone's inertial sensors, is proposed. The contributions of this work are a novel time series encoding approach, from inertial signals to inertial images, and transfer learning from the computer vision domain to the inertial sensor classification problem. Four different datasets are employed to show the benefits of the proposed approach. In addition, as the proposed framework performs classification on inertial sensor readings, it can be applied to other classification tasks using inertial data, and it can be adapted to handle other types of sensory data collected for a classification task. Full article
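The core idea, encoding a 1-D inertial signal as a 2-D "image" so that pretrained vision models can be reused, can be sketched in a few lines. The paper's specific INIM encoding is not reproduced here; this is only a generic illustration that folds consecutive signal windows into image rows, with the window height and scaling chosen arbitrarily:

```python
import numpy as np

def inertial_image(signal, height=64):
    """Fold a 1-D inertial reading stream into a 2-D grayscale image:
    consecutive windows become rows, then values are min-max scaled to 0..255."""
    width = len(signal) // height
    img = np.asarray(signal[: height * width], dtype=float).reshape(height, width)
    lo, hi = img.min(), img.max()
    img = (img - lo) / (hi - lo + 1e-12)       # normalize to [0, 1]
    return (255 * img).astype(np.uint8)         # quantize like an 8-bit image

# Example: a synthetic accelerometer-like sinusoid becomes a 64x64 image.
t = np.linspace(0, 20 * np.pi, 64 * 64)
img = inertial_image(np.sin(t))
```

An image produced this way can be fed to a standard CNN pretrained on natural images, which is the transfer-learning step the abstract refers to.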
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)
21 pages, 2465 KiB  
Article
Gaze and Event Tracking for Evaluation of Recommendation-Driven Purchase
by Piotr Sulikowski, Tomasz Zdziebko, Kristof Coussement, Krzysztof Dyczkowski, Krzysztof Kluza and Karina Sachpazidu-Wójcicka
Sensors 2021, 21(4), 1381; https://doi.org/10.3390/s21041381 - 16 Feb 2021
Cited by 22 | Viewed by 3227
Abstract
Recommendation systems play an important role in e-commerce turnover by presenting personalized recommendations. Due to the vast amount of marketing content online, users are less susceptible to these suggestions. In addition to the accuracy of a recommendation, its presentation, layout, and other visual aspects can improve its effectiveness. This study evaluates the visual aspects of recommender interfaces. Vertical and horizontal recommendation layouts are tested, along with different visual intensity levels of item presentation, and conclusions obtained with a number of popular machine learning methods are discussed. Results from an implicit feedback study of the effectiveness of recommending interfaces for four major e-commerce websites are presented. Two different methods of observing user behavior were used, i.e., eye tracking and document object model (DOM) implicit event tracking in the browser, which allowed a large amount of data on user activity and the physical parameters of recommending interfaces to be collected. The results were analyzed to compare the reliability and applicability of both methods. Observations made with eye tracking and event tracking led to similar conclusions regarding recommendation interface evaluation. In general, vertical interfaces showed higher effectiveness than horizontal ones, with the first and second positions working best; the poorer performance of horizontal interfaces is probably connected with banner blindness. Neural networks provided the best models of the recommendation-driven purchase (RDP) phenomenon. Full article
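The finding that eye tracking and DOM event tracking "led to similar conclusions" amounts to checking the agreement between two per-position attention measures. As a rough sketch (the counts below are invented for illustration and are not the paper's data), one could compare per-slot fixation counts against per-slot DOM event counts with a Pearson correlation:

```python
import numpy as np

# Hypothetical per-slot measurements for a 5-item vertical recommendation list:
fixations = np.array([420, 310, 180, 120, 90])   # eye-tracker fixation counts
dom_events = np.array([95, 70, 38, 30, 22])      # hover/click counts from DOM tracking

def agreement(a, b):
    """Pearson correlation between two attention measures over the same slots."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

r = agreement(fixations, dom_events)
```

A correlation near 1 would indicate that the cheap, scalable DOM instrumentation ranks the interface positions the same way the eye tracker does, which is the practical point of validating the two methods against each other.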
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)