Search Results (2)

Search Parameters:
Keywords = public domain dataset for multi-modal action recognition

18 pages, 7778 KiB  
Article
Convolutional Block Attention Module–Multimodal Feature-Fusion Action Recognition: Enabling Miner Unsafe Action Recognition
by Yu Wang, Xiaoqing Chen, Jiaoqun Li and Zengxiang Lu
Sensors 2024, 24(14), 4557; https://doi.org/10.3390/s24144557 - 14 Jul 2024
Cited by 5 | Viewed by 3304
Abstract
The unsafe action of miners is one of the main causes of mine accidents. Research on underground miner unsafe action recognition based on computer vision enables relatively accurate real-time recognition of unsafe actions among underground miners. A dataset called unsafe actions of underground miners (UAUM) was constructed, comprising ten categories of such actions. Underground images were enhanced using spatial- and frequency-domain enhancement algorithms. A combination of the YOLOX object detection algorithm and the Lite-HRNet human key-point detection algorithm was utilized to obtain skeleton modal data. The CBAM-PoseC3D model, a skeleton modal action-recognition model incorporating the CBAM attention module, was proposed and combined with the RGB modal feature-extraction model CBAM-SlowOnly. Ultimately, this formed the Convolutional Block Attention Module–Multimodal Feature-Fusion Action Recognition (CBAM-MFFAR) model for recognizing unsafe actions of underground miners. The improved CBAM-MFFAR model achieved a recognition accuracy of 95.8% on the NTU60 RGB+D public dataset under the X-Sub benchmark. Compared to the CBAM-PoseC3D, PoseC3D, 2S-AGCN, and ST-GCN models, the recognition accuracy was improved by 2%, 2.7%, 7.3%, and 14.3%, respectively. On the UAUM dataset, the CBAM-MFFAR model achieved a recognition accuracy of 94.6%, with improvements of 2.6%, 4%, 12%, and 17.3% compared to the CBAM-PoseC3D, PoseC3D, 2S-AGCN, and ST-GCN models, respectively. In field validation at mining sites, the CBAM-MFFAR model accurately recognized similar and multiple unsafe actions among underground miners.
(This article belongs to the Section Intelligent Sensors)
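The abstract above centers on the CBAM attention module plugged into skeleton (PoseC3D) and RGB (SlowOnly) backbones. As a rough illustration of what such a module does, below is a minimal sketch of a standard CBAM block in PyTorch (channel attention followed by spatial attention); the reduction ratio and kernel size are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal CBAM sketch: channel attention then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP over global avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: conv over stacked channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        n, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(n, c, 1, 1)        # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))               # spatial attention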

12 pages, 6673 KiB  
Article
C-MHAD: Continuous Multimodal Human Action Dataset of Simultaneous Video and Inertial Sensing
by Haoran Wei, Pranav Chopada and Nasser Kehtarnavaz
Sensors 2020, 20(10), 2905; https://doi.org/10.3390/s20102905 - 20 May 2020
Cited by 34 | Viewed by 5893
Abstract
Existing public domain multi-modal datasets for human action recognition only include actions of interest that have already been segmented from action streams. These datasets cannot be used to study a more realistic action recognition scenario where actions of interest occur randomly and continuously among actions of non-interest or no actions. It is more challenging to recognize actions of interest in continuous action streams since the starts and ends of these actions are not known and need to be determined in an on-the-fly manner. Furthermore, there exists no public domain multi-modal dataset in which video and inertial data are captured simultaneously for continuous action streams. The main objective of this paper is to describe a dataset that is collected and made publicly available, named Continuous Multimodal Human Action Dataset (C-MHAD), in which video and inertial data streams are captured simultaneously in a continuous way. This dataset is then used in an example recognition technique, and the results obtained indicate that the fusion of these two sensing modalities increases the F1 scores compared to using each sensing modality individually.
(This article belongs to the Special Issue Sensors Fusion for Human-Centric 3D Capturing)
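C-MHAD's stated result is that fusing the video and inertial modalities raises F1 scores over either modality alone. The abstract does not spell out the recognition technique, so the following is only a hedged sketch of generic score-level fusion evaluated with a macro F1 score; fuse_scores, the equal weighting, and the synthetic probabilities are all illustrative assumptions.

import numpy as np
from sklearn.metrics import f1_score

def fuse_scores(video_probs: np.ndarray,
                inertial_probs: np.ndarray,
                w_video: float = 0.5) -> np.ndarray:
    """Weighted average of per-class probabilities from the two modalities."""
    return w_video * video_probs + (1.0 - w_video) * inertial_probs

# Dummy per-window class probabilities (5 classes, 100 windows) for illustration.
rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=100)
video_probs = rng.dirichlet(np.ones(5), size=100)
inertial_probs = rng.dirichlet(np.ones(5), size=100)

fused = fuse_scores(video_probs, inertial_probs)
print("fused F1 (macro):",
      f1_score(labels, fused.argmax(axis=1), average="macro"))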
