Search Results (7)

Search Parameters:
Keywords = Inflated 3D ConvNet (I3D)

21 pages, 1071 KB  
Article
YOLO-I3D: Optimizing Inflated 3D Models for Real-Time Human Activity Recognition
by Ruikang Luo, Aman Anand, Farhana Zulkernine and Francois Rivest
J. Imaging 2024, 10(11), 269; https://doi.org/10.3390/jimaging10110269 - 24 Oct 2024
Cited by 6 | Viewed by 4170
Abstract
Human Activity Recognition (HAR) plays a critical role in applications such as security surveillance and healthcare. However, existing methods, particularly two-stream models like Inflated 3D (I3D), face significant challenges in real-time applications due to their high computational demand, especially from the optical flow branch. In this work, we address these limitations by proposing two major improvements. First, we introduce a lightweight motion information branch that replaces the computationally expensive optical flow component with a lower-resolution RGB input, significantly reducing computation time. Second, we incorporate YOLOv5, an efficient object detector, to further optimize the RGB branch for faster real-time performance. Experimental results on the Kinetics-400 dataset demonstrate that our proposed two-stream I3D Light model improves the original I3D model’s accuracy by 4.13% while reducing computational cost. Additionally, the integration of YOLOv5 into the I3D model enhances accuracy by 1.42%, providing a more efficient solution for real-time HAR tasks.
(This article belongs to the Section Computer Vision and Pattern Recognition)
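A minimal PyTorch sketch of the two-stream idea described in this abstract, assuming a toy 3D-CNN backbone in place of the real I3D and omitting the YOLOv5 detector: the optical-flow branch is replaced by a second RGB branch fed lower-resolution frames, and the two streams are fused by averaging class logits. The names TinyBackbone3D and TwoStreamI3DLight are illustrative, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBackbone3D(nn.Module):
    """Small 3D-CNN stand-in for an inflated (I3D-style) backbone."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, stride=(2, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),          # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip).flatten(1)
        return self.classifier(x)


class TwoStreamI3DLight(nn.Module):
    """Full-resolution RGB stream plus low-resolution RGB 'motion' stream (assumed design)."""

    def __init__(self, num_classes: int = 400, low_res: int = 112):
        super().__init__()
        self.rgb_stream = TinyBackbone3D(num_classes)
        self.motion_stream = TinyBackbone3D(num_classes)
        self.low_res = low_res

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, T, H, W); the motion stream sees a downsampled copy
        low = F.interpolate(clip, size=(clip.shape[2], self.low_res, self.low_res),
                            mode="trilinear", align_corners=False)
        logits = self.rgb_stream(clip) + self.motion_stream(low)
        return logits / 2.0                   # late fusion by averaging


if __name__ == "__main__":
    model = TwoStreamI3DLight(num_classes=400)
    dummy = torch.randn(2, 3, 16, 224, 224)   # two 16-frame RGB clips
    print(model(dummy).shape)                 # -> torch.Size([2, 400])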

14 pages, 617 KB  
Article
Automatic Evaluation Method for Functional Movement Screening Based on Multi-Scale Lightweight 3D Convolution and an Encoder–Decoder
by Xiuchun Lin, Yichao Liu, Chen Feng, Zhide Chen, Xu Yang and Hui Cui
Electronics 2024, 13(10), 1813; https://doi.org/10.3390/electronics13101813 - 7 May 2024
Viewed by 1815
Abstract
Functional Movement Screening (FMS) is a test used to evaluate fundamental movement patterns in the human body and identify functional limitations. However, the challenge of carrying out an automated assessment of FMS is that complex human movements are difficult to model accurately and efficiently. To address this challenge, this paper proposes an automatic evaluation method for FMS based on a multi-scale lightweight 3D convolution encoder–decoder (ML3D-ED) architecture. This method adopts a self-built multi-scale lightweight 3D convolution architecture to extract features from videos. The extracted features are then processed using an encoder–decoder architecture and a probabilistic integration technique to effectively predict the final score distribution. Compared with the traditional Two-Stream Inflated 3D ConvNet (I3D) network, this architecture offers better performance and accuracy in capturing advanced human movement features in the temporal and spatial dimensions. Specifically, the ML3D-ED backbone network reduces the number of parameters by 59.5% and the computational cost by 77.7% compared to I3D. Experiments show that ML3D-ED achieves an accuracy of 93.33% on public datasets, an improvement of approximately 9% over the best existing method. These results demonstrate the effectiveness of the ML3D-ED architecture and probabilistic integration technique in extracting advanced human movement features and evaluating functional movements.
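A hedged PyTorch sketch of the two ingredients named in this abstract, a multi-scale lightweight 3D convolution block and an encoder–decoder head that predicts a score distribution; the layer sizes, the four-level score range, and the class names are assumptions for illustration, not the paper's ML3D-ED.

import torch
import torch.nn as nn


class MultiScaleConv3D(nn.Module):
    """Parallel 3D convolutions with different kernel sizes, concatenated channel-wise."""

    def __init__(self, in_ch: int, out_ch_per_branch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, out_ch_per_branch, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([b(x) for b in self.branches], dim=1)


class ScoreDistributionHead(nn.Module):
    """Small encoder-decoder MLP that predicts a distribution over discrete score classes."""

    def __init__(self, in_dim: int, hidden: int = 64, num_scores: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, num_scores)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.decoder(self.encoder(feat)), dim=-1)


if __name__ == "__main__":
    # Assumed pipeline: multi-scale 3D features -> global pooling -> score distribution.
    backbone = MultiScaleConv3D(in_ch=3, out_ch_per_branch=8)   # -> 24 channels
    pool = nn.AdaptiveAvgPool3d(1)
    head = ScoreDistributionHead(in_dim=24)

    clip = torch.randn(1, 3, 16, 112, 112)                      # one RGB clip
    feat = pool(backbone(clip)).flatten(1)                      # (1, 24)
    score_dist = head(feat)                                     # (1, 4), sums to 1
    print(score_dist)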

16 pages, 2929 KB  
Article
Automatic Evaluation of Functional Movement Screening Based on Attention Mechanism and Score Distribution Prediction
by Xiuchun Lin, Tao Huang, Zhiqiang Ruan, Xuechao Yang, Zhide Chen, Guolong Zheng and Chen Feng
Mathematics 2023, 11(24), 4936; https://doi.org/10.3390/math11244936 - 12 Dec 2023
Cited by 5 | Viewed by 2276
Abstract
Functional movement screening (FMS) is a crucial testing method that evaluates fundamental movement patterns in the human body and identifies functional limitations. However, due to the inherent complexity of human movements, the automated assessment of FMS poses significant challenges. Prior methodologies have struggled to effectively capture and model critical human features in video data. To address this challenge, this paper introduces an automatic assessment approach for FMS by leveraging deep learning techniques. The proposed method harnesses an I3D network to extract spatiotemporal video features across various scales and levels. Additionally, an attention mechanism (AM) module is incorporated to enable the network to focus more on human movement characteristics, enhancing its sensitivity to diverse location features. Furthermore, the multilayer perceptron (MLP) module is employed to effectively discern intricate patterns and features within the input data, facilitating its classification into multiple categories. Experimental evaluations conducted on publicly available datasets demonstrate that the proposed approach achieves state-of-the-art performance levels. Notably, in comparison to existing state-of-the-art (SOTA) methods, this approach exhibits a marked improvement in accuracy. These results corroborate the efficacy of the I3D-AM-MLP framework, indicating its significance in extracting advanced human movement feature expressions and automating the assessment of functional movement screening.
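A minimal sketch, assuming precomputed per-segment backbone features in place of a pretrained I3D, of the attention-then-MLP pipeline this abstract describes: learned temporal attention weights pool segment features before an MLP classifier. The feature dimension, class count, and simple linear attention scoring are illustrative assumptions.

import torch
import torch.nn as nn


class TemporalAttentionMLP(nn.Module):
    def __init__(self, feat_dim: int = 1024, hidden: int = 256, num_classes: int = 3):
        super().__init__()
        self.attn_score = nn.Linear(feat_dim, 1)       # one attention score per segment
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_segments, feat_dim), e.g. per-clip features from a 3D backbone
        weights = torch.softmax(self.attn_score(feats), dim=1)   # (B, S, 1)
        pooled = (weights * feats).sum(dim=1)                    # attention-weighted pooling
        return self.mlp(pooled)                                  # class logits


if __name__ == "__main__":
    # Assumed setup: 2 videos, 8 segments each, 1024-d features per segment.
    model = TemporalAttentionMLP(feat_dim=1024, num_classes=3)
    segment_feats = torch.randn(2, 8, 1024)
    print(model(segment_feats).shape)         # -> torch.Size([2, 3])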

19 pages, 6553 KB  
Article
Suspicious Behavior Detection with Temporal Feature Extraction and Time-Series Classification for Shoplifting Crime Prevention
by Amril Nazir, Rohan Mitra, Hana Sulieman and Firuz Kamalov
Sensors 2023, 23(13), 5811; https://doi.org/10.3390/s23135811 - 22 Jun 2023
Cited by 23 | Viewed by 7242
Abstract
The rise in crime rates in many parts of the world, coupled with advancements in computer vision, has increased the need for automated crime detection services. To address this issue, we propose a new approach for detecting suspicious behavior as a means of preventing shoplifting. Existing methods are based on the use of convolutional neural networks that rely on extracting spatial features from pixel values. In contrast, our proposed method employs object detection based on YOLOv5 with Deep Sort to track people through a video, using the resulting bounding box coordinates as temporal features. The extracted temporal features are then modeled as a time-series classification problem. The proposed method was tested on the popular UCF Crime dataset, and benchmarked against the current state-of-the-art robust temporal feature magnitude (RTFM) method, which relies on the Inflated 3D ConvNet (I3D) preprocessing method. Our results demonstrate an impressive 8.45-fold increase in detection inference speed compared to the state-of-the-art RTFM, along with an F1 score of 92%, outperforming RTFM by 3%. Furthermore, our method achieved these results without requiring expensive data augmentation or image feature extraction.
(This article belongs to the Section Intelligent Sensors)
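A hedged sketch of the core idea above: once a detector and tracker (YOLOv5 with Deep Sort, not reproduced here) yield one bounding box per frame for a tracked person, the box coordinates form a four-dimensional time series that a small sequence classifier can label as suspicious or normal. The GRU architecture, the (x, y, w, h) feature layout, and the two-class output are illustrative assumptions, not the authors' exact time-series model.

import torch
import torch.nn as nn


class BoxTrajectoryClassifier(nn.Module):
    """Classifies a sequence of bounding-box coordinates with a GRU (assumed design)."""

    def __init__(self, in_dim: int = 4, hidden: int = 64, num_classes: int = 2):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, boxes: torch.Tensor) -> torch.Tensor:
        # boxes: (batch, num_frames, 4) with normalized (x, y, w, h) per frame
        _, last_hidden = self.gru(boxes)          # last_hidden: (1, batch, hidden)
        return self.head(last_hidden.squeeze(0))  # logits: suspicious vs. normal


if __name__ == "__main__":
    clf = BoxTrajectoryClassifier()
    trajectories = torch.rand(8, 120, 4)   # 8 tracks, 120 frames, (x, y, w, h)
    print(clf(trajectories).shape)         # -> torch.Size([8, 2])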

17 pages, 6436 KB  
Letter
Deep Learning-Based Real-Time Multiple-Person Action Recognition System
by Jen-Kai Tsai, Chen-Chien Hsu, Wei-Yen Wang and Shao-Kang Huang
Sensors 2020, 20(17), 4758; https://doi.org/10.3390/s20174758 - 23 Aug 2020
Cited by 42 | Viewed by 9331
Abstract
Action recognition has gained great attention in automatic video analysis, greatly reducing the cost of human resources for smart surveillance. Most methods, however, focus on the detection of only one action event for a single person in a well-segmented video, rather than the recognition of multiple actions performed by more than one person at the same time in an untrimmed video. In this paper, we propose a deep learning-based multiple-person action recognition system for use in various real-time smart surveillance applications. By capturing a video stream of the scene, the proposed system can detect and track multiple people appearing in the scene and subsequently recognize their actions. Thanks to the high resolution of the video frames, we establish a zoom-in function to obtain more satisfactory action recognition results when people in the scene are too far from the camera. To further improve the accuracy, recognition results from the Inflated 3D ConvNet (I3D) with multiple sliding windows are processed by a non-maximum suppression (NMS) approach to obtain a more robust decision. Experimental results show that the proposed method can perform multiple-person action recognition in real time, making it suitable for applications such as long-term care environments.
(This article belongs to the Section Sensing and Imaging)
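A small sketch of the sliding-window scoring and non-maximum suppression step described above: each window of an untrimmed video receives a class score (random numbers stand in for the I3D outputs here), and overlapping windows of the same class are merged by keeping only the highest-scoring one. The IoU threshold and window length are illustrative assumptions.

import random


def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) intervals in frames."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(detections, iou_thresh=0.5):
    """detections: list of (start, end, score, label); keep only local score maxima."""
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        overlaps = any(
            d[3] == det[3] and temporal_iou(d[:2], det[:2]) > iou_thresh
            for d in kept
        )
        if not overlaps:
            kept.append(det)
    return kept


if __name__ == "__main__":
    # Slide a 64-frame window with a 16-frame stride over a 320-frame video and
    # pretend each window was scored by the action-recognition backbone.
    windows = []
    for start in range(0, 320 - 64 + 1, 16):
        score = random.random()                       # stand-in for I3D confidence
        label = random.choice(["walking", "falling"])
        windows.append((start, start + 64, score, label))
    for start, end, score, label in temporal_nms(windows):
        print(f"{label}: frames {start}-{end}, confidence {score:.2f}")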

17 pages, 7023 KB  
Article
Automated Video Behavior Recognition of Pigs Using Two-Stream Convolutional Networks
by Kaifeng Zhang, Dan Li, Jiayun Huang and Yifei Chen
Sensors 2020, 20(4), 1085; https://doi.org/10.3390/s20041085 - 17 Feb 2020
Cited by 60 | Viewed by 6839
Abstract
The detection of pig behavior helps detect abnormal conditions such as diseases and dangerous movements in a timely and effective manner, which plays an important role in ensuring the health and well-being of pigs. Monitoring pig behavior by staff is time consuming, subjective, and impractical. Therefore, there is an urgent need for methods that identify pig behavior automatically. In recent years, deep learning has been gradually applied to the study of pig behavior recognition. Existing studies judge pig behavior based only on the posture of the pig in a still image frame, without considering the motion information of the behavior. However, optical flow reflects motion information well. Thus, this study took image frames and optical flow from videos as two-stream inputs to fully extract the temporal and spatial behavioral characteristics. Two-stream convolutional network models based on deep learning were proposed, including Inflated 3D ConvNet (I3D) and Temporal Segment Networks (TSN) whose feature extraction network is a Residual Network (ResNet) or an Inception architecture (e.g., Inception with Batch Normalization (BN-Inception), InceptionV3, InceptionV4, or InceptionResNetV2), to achieve pig behavior recognition. A standard pig video behavior dataset was created, containing 1000 videos of five behavioral actions of pigs under natural conditions: feeding, lying, walking, scratching, and mounting. The dataset was used to train and test the proposed models, and a series of comparative experiments were conducted. The experimental results showed that the TSN model whose feature extraction network was ResNet101 was able to recognize pig feeding, lying, walking, scratching, and mounting behaviors with a higher average accuracy of 98.99%, and the average recognition time per video was 0.3163 s. The TSN model (ResNet101) is superior to the other models in solving the task of pig behavior recognition.
(This article belongs to the Section Intelligent Sensors)
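An illustrative TSN-style two-stream sketch matching this description, with a tiny 2D CNN standing in for the ResNet/Inception backbones named in the abstract: the spatial stream sees sampled RGB frames, the temporal stream sees stacked optical-flow fields, segment scores are averaged as the consensus, and the two streams are fused by averaging. All sizes and names are assumptions.

import torch
import torch.nn as nn


class TinyBackbone2D(nn.Module):
    """Small 2D CNN stand-in for a ResNet/Inception feature extractor."""

    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.net(x)


class TwoStreamTSN(nn.Module):
    def __init__(self, num_classes: int = 5, flow_stack: int = 10):
        super().__init__()
        self.spatial = TinyBackbone2D(3, num_classes)                 # RGB frames
        self.temporal = TinyBackbone2D(2 * flow_stack, num_classes)   # stacked flow (x/y)

    def forward(self, rgb_segments, flow_segments):
        # rgb_segments:  (batch, segments, 3, H, W)
        # flow_segments: (batch, segments, 2*flow_stack, H, W)
        b, s = rgb_segments.shape[:2]
        rgb_logits = self.spatial(rgb_segments.flatten(0, 1)).view(b, s, -1).mean(1)
        flow_logits = self.temporal(flow_segments.flatten(0, 1)).view(b, s, -1).mean(1)
        return (rgb_logits + flow_logits) / 2.0   # average fusion of the two streams


if __name__ == "__main__":
    model = TwoStreamTSN(num_classes=5)   # feeding, lying, walking, scratching, mounting
    rgb = torch.randn(2, 3, 3, 224, 224)        # 2 videos, 3 segments each
    flow = torch.randn(2, 3, 20, 224, 224)      # 10 stacked flow fields (x and y)
    print(model(rgb, flow).shape)               # -> torch.Size([2, 5])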

13 pages, 2108 KB  
Article
Contextual Action Cues from Camera Sensor for Multi-Stream Action Recognition
by Jongkwang Hong, Bora Cho, Yong Won Hong and Hyeran Byun
Sensors 2019, 19(6), 1382; https://doi.org/10.3390/s19061382 - 20 Mar 2019
Cited by 24 | Viewed by 4434
Abstract
In action recognition research, the two primary types of information are appearance and motion, which are learned from RGB images captured by visual sensors. However, depending on the action characteristics, contextual information, such as the existence of specific objects or globally shared information in the image, becomes vital to defining the action. For example, the existence of the ball is vital information distinguishing “kicking” from “running”. Furthermore, some actions share typical global abstract poses, which can be used as a key to classify actions. Based on these observations, we propose a multi-stream network model that incorporates spatial, temporal, and contextual cues in the image for action recognition. We evaluated the proposed method using C3D or Inflated 3D ConvNet (I3D) as the backbone network on two different action recognition datasets. As a result, we observed an overall improvement in accuracy, demonstrating the effectiveness of our proposed method.
(This article belongs to the Special Issue Intelligent Sensor Signal in Machine Learning)
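A minimal sketch, under assumed feature dimensions and a learnable weighted-sum fusion, of the multi-stream idea described above: separate spatial, temporal, and contextual streams each produce class logits from their own precomputed features (the contextual feature could be an object-presence vector from a detector), and the logits are combined per class.

import torch
import torch.nn as nn


class MultiStreamFusion(nn.Module):
    def __init__(self, spatial_dim=1024, temporal_dim=1024, context_dim=80,
                 num_classes=101):
        super().__init__()
        self.spatial_head = nn.Linear(spatial_dim, num_classes)
        self.temporal_head = nn.Linear(temporal_dim, num_classes)
        self.context_head = nn.Linear(context_dim, num_classes)    # e.g. object-presence vector
        self.fusion_weights = nn.Parameter(torch.ones(3))          # learnable per-stream weights

    def forward(self, spatial_feat, temporal_feat, context_feat):
        w = torch.softmax(self.fusion_weights, dim=0)
        return (w[0] * self.spatial_head(spatial_feat)
                + w[1] * self.temporal_head(temporal_feat)
                + w[2] * self.context_head(context_feat))


if __name__ == "__main__":
    # Assumed inputs: precomputed per-video features from each stream.
    model = MultiStreamFusion()
    spatial = torch.randn(4, 1024)    # e.g. appearance features from a 3D backbone
    temporal = torch.randn(4, 1024)   # e.g. motion-stream features
    context = torch.rand(4, 80)       # e.g. per-class object presence scores
    print(model(spatial, temporal, context).shape)   # -> torch.Size([4, 101])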
