Sign in to use this feature.

Years

Between: -

Subjects

remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline

Journals

Article Types

Countries / Regions

Search Results (4)

Search Parameters:
Keywords = video-based pedestrian-attribute recognition

Order results
Result details
Results per page
Select all
Export citation of selected articles as:
23 pages, 2148 KB  
Article
MRG-T: Mask-Relation-Guided Transformer for Remote Vision-Based Pedestrian Attribute Recognition in Aerial Imagery
by Shun Zhang, Yupeng Li, Xiao Wu, Zunheng Chu and Lingfei Li
Remote Sens. 2024, 16(7), 1216; https://doi.org/10.3390/rs16071216 - 29 Mar 2024
Cited by 2 | Viewed by 2255
Abstract
Nowadays, with the rapid development of consumer Unmanned Aerial Vehicles (UAVs), utilizing UAV platforms for visual surveillance has become very attractive, and a key part of this is remote vision-based pedestrian attribute recognition. Pedestrian Attribute Recognition (PAR) is dedicated to predicting multiple attribute [...] Read more.
Nowadays, with the rapid development of consumer Unmanned Aerial Vehicles (UAVs), utilizing UAV platforms for visual surveillance has become very attractive, and a key part of this is remote vision-based pedestrian attribute recognition. Pedestrian Attribute Recognition (PAR) is dedicated to predicting multiple attribute labels of a single pedestrian image extracted from surveillance videos and aerial imagery, which presents significant challenges in the computer vision community due to factors such as poor imaging quality and substantial pose variations. Despite recent studies demonstrating impressive advancements in utilizing complicated architectures and exploring relations, most of them may fail to fully and systematically consider the inter-region, inter-attribute, and region-attribute mapping relations simultaneously and be stuck in the dilemma of information redundancy, leading to the degradation of recognition accuracy. To address the issues, we construct a novel Mask-Relation-Guided Transformer (MRG-T) framework that consists of three relation modeling modules to fully exploit spatial and semantic relations in the model learning process. Specifically, we first propose a Masked Region Relation Module (MRRM) to focus on precise spatial attention regions to extract more robust features with masked random patch training. To explore the semantic association of attributes, we further present a Masked Attribute Relation Module (MARM) to extract intrinsic and semantic inter-attribute relations with an attribute label masking strategy. Based on the cross-attention mechanism, we finally design a Region and Attribute Mapping Module (RAMM) to learn the cross-modal alignment between spatial regions and semantic attributes. We conduct comprehensive experiments on three public benchmarks such as PETA, PA-100K, and RAPv1, and conduct inference on a large-scale airborne person dataset named PRAI-1581. The extensive experimental results demonstrate the superior performance of our method compared to state-of-the-art approaches and validate the effectiveness of mask-relation-guided modeling in the remote vision-based PAR task. Full article
(This article belongs to the Special Issue Signal Processing Theory and Methods in Remote Sensing)
Show Figures

Figure 1

16 pages, 1007 KB  
Article
Occluded Pedestrian-Attribute Recognition for Video Sensors Using Group Sparsity
by Geonu Lee, Kimin Yun and Jungchan Cho
Sensors 2022, 22(17), 6626; https://doi.org/10.3390/s22176626 - 1 Sep 2022
Cited by 1 | Viewed by 3794
Abstract
Pedestrians are often obstructed by other objects or people in real-world vision sensors. These obstacles make pedestrian-attribute recognition (PAR) difficult; hence, occlusion processing for visual sensing is a key issue in PAR. To address this problem, we first formulate the identification of non-occluded [...] Read more.
Pedestrians are often obstructed by other objects or people in real-world vision sensors. These obstacles make pedestrian-attribute recognition (PAR) difficult; hence, occlusion processing for visual sensing is a key issue in PAR. To address this problem, we first formulate the identification of non-occluded frames as temporal attention based on the sparsity of a crowded video. In other words, a model for PAR is guided to prevent paying attention to the occluded frame. However, we deduced that this approach cannot include a correlation between attributes when occlusion occurs. For example, “boots” and “shoe color” cannot be recognized simultaneously when the foot is invisible. To address the uncorrelated attention issue, we propose a novel temporal-attention module based on group sparsity. Group sparsity is applied across attention weights in correlated attributes. Accordingly, physically-adjacent pedestrian attributes are grouped, and the attention weights of a group are forced to focus on the same frames. Experimental results indicate that the proposed method achieved 1.18% and 6.21% higher F1-scores than the advanced baseline method on the occlusion samples in DukeMTMC-VideoReID and MARS video-based PAR datasets, respectively. Full article
Show Figures

Figure 1

14 pages, 912 KB  
Article
Attention Based CNN-ConvLSTM for Pedestrian Attribute Recognition
by Yang Li, Huahu Xu, Minjie Bian and Junsheng Xiao
Sensors 2020, 20(3), 811; https://doi.org/10.3390/s20030811 - 3 Feb 2020
Cited by 30 | Viewed by 9283
Abstract
As a result of its important role in video surveillance, pedestrian attribute recognition has become an attractive facet of computer vision research. Because of the changes in viewpoints, illumination, resolution and occlusion, the task is very challenging. In order to resolve the issue [...] Read more.
As a result of its important role in video surveillance, pedestrian attribute recognition has become an attractive facet of computer vision research. Because of the changes in viewpoints, illumination, resolution and occlusion, the task is very challenging. In order to resolve the issue of unsatisfactory performance of existing pedestrian attribute recognition methods resulting from ignoring the correlation between pedestrian attributes and spatial information, in this paper, the task is regarded as a spatiotemporal, sequential, multi-label image classification problem. An attention-based neural network consisting of convolutional neural networks (CNN), channel attention (CAtt) and convolutional long short-term memory (ConvLSTM) is proposed (CNN-CAtt-ConvLSTM). Firstly, the salient and correlated visual features of pedestrian attributes are extracted by pre-trained CNN and CAtt. Then, ConvLSTM is used to further extract spatial information and correlations from pedestrian attributes. Finally, pedestrian attributes are predicted with optimized sequences based on attribute image area size and importance. Extensive experiments are carried out on two common pedestrian attribute datasets, PEdesTrian Attribute (PETA) dataset and Richly Annotated Pedestrian (RAP) dataset, and higher performance than other state-of-the-art (SOTA) methods is achieved, which proves the superiority and validity of our method. Full article
(This article belongs to the Section Intelligent Sensors)
Show Figures

Figure 1

12 pages, 1462 KB  
Article
Pedestrian Attributes Recognition in Surveillance Scenarios Using Multi-Task Lightweight Convolutional Neural Network
by Pu Yan, Li Zhuo, Jiafeng Li, Hui Zhang and Jing Zhang
Appl. Sci. 2019, 9(19), 4182; https://doi.org/10.3390/app9194182 - 7 Oct 2019
Cited by 7 | Viewed by 3559
Abstract
Pedestrian attributes (such as gender, age, hairstyle, and clothing) can effectively represent the appearance of pedestrians. These are high-level semantic features that are robust to illumination, deformation, etc. Therefore, they can be widely used in person re-identification, video structuring analysis and other applications. [...] Read more.
Pedestrian attributes (such as gender, age, hairstyle, and clothing) can effectively represent the appearance of pedestrians. These are high-level semantic features that are robust to illumination, deformation, etc. Therefore, they can be widely used in person re-identification, video structuring analysis and other applications. In this paper, a pedestrian attributes recognition method for surveillance scenarios using a multi-task lightweight convolutional neural network is proposed. Firstly, the labels of the attributes for each pedestrian image are integrated into a label vector. Then, a multi-task lightweight Convolutional Neural Network (CNN) is designed, which consists of five convolutional layers, three pooling layers and two fully connected layers to extract the deep features of pedestrian images. Considering that the data distribution of the datasets is unbalanced, the loss function is improved based on the sigmoid cross-entropy, and the scale factor is added to balance the amount of various attributes data. Through training the network, the mapping relationship model between the deep features of pedestrian images and the integration label vector of their attributes is established, which can be used to predict each attribute of the pedestrian. The experiments were conducted on two public pedestrian attributes datasets in surveillance scenarios, namely PETA and RAP. The results show that, compared with the state-of-the-art pedestrian attributes recognition methods, the proposed method can achieve a superior accuracy by 91.88% on PETA and 87.44% on RAP respectively. Full article
Show Figures

Figure 1

Back to TopTop