Article

Video-Based Person Re-Identification by an End-To-End Learning Architecture with Hybrid Deep Appearance-Temporal Feature

Rui Sun, Qiheng Huang, Miaomiao Xia and Jun Zhang *
School of Computer Science and Information Engineering, Hefei University of Technology, Feicui Road 420, Hefei 230000, China
* Author to whom correspondence should be addressed.
Sensors 2018, 18(11), 3669; https://doi.org/10.3390/s18113669
Received: 31 August 2018 / Revised: 26 October 2018 / Accepted: 26 October 2018 / Published: 29 October 2018
(This article belongs to the Special Issue Visual Sensors)
Video-based person re-identification is an important task in multi-camera visual sensor networks, with challenges including lighting variation, low-resolution images, background clutter, occlusion, and similarity of human appearance. In this paper, we propose a video-based person re-identification method: an end-to-end learning architecture with a hybrid deep appearance-temporal feature. It learns the appearance features of pivotal frames, the temporal features, and an independent distance metric for each type of feature. The architecture consists of a two-stream deep feature structure and two Siamese networks. In the first stream, we propose the Two-branch Appearance Feature (TAF) sub-structure to capture a person's appearance information, and use one of the two Siamese networks to learn the similarity of the appearance features of a person pair. To exploit temporal information, we design the second stream, consisting of the Optical flow Temporal Feature (OTF) sub-structure and the other Siamese network, to learn a person's temporal features and the distances between pairwise features. In addition, we select pivotal frames of the video as inputs to the Inception-V3 network in the Two-branch Appearance Feature sub-structure, and employ a salience-learning fusion layer to fuse the learned global and local appearance features. Extensive experiments on the PRID2011, iLIDS-VID, and Motion Analysis and Re-identification Set (MARS) datasets show that the proposed architecture reaches Rank-1 accuracies of 79%, 59%, and 72%, respectively, outperforming state-of-the-art algorithms, and improves the feature representation ability for persons.
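The abstract describes two streams (appearance and optical-flow temporal), each paired with its own Siamese network that learns an independent distance metric, with the two stream distances then combined. A minimal sketch of that matching step is below; the metric matrices `M_app` and `M_tmp` and the scalar weight `alpha` are placeholders of my own, since in the paper the metrics and fusion are learned end-to-end by the Siamese networks rather than fixed:

```python
import numpy as np

def stream_distance(f_a, f_b, M):
    """Mahalanobis-style distance between two feature vectors under a
    per-stream metric M (placeholder for a learned Siamese metric)."""
    d = f_a - f_b
    return float(d @ M @ d)

def hybrid_distance(app_a, app_b, tmp_a, tmp_b, M_app, M_tmp, alpha=0.5):
    """Combine the appearance-stream and temporal-stream distances.

    alpha is a hypothetical fixed mixing weight; the paper learns the
    fusion end-to-end instead of using a fixed scalar."""
    return alpha * stream_distance(app_a, app_b, M_app) \
        + (1.0 - alpha) * stream_distance(tmp_a, tmp_b, M_tmp)

# Toy example: 4-D appearance features, 2-D optical-flow temporal features.
rng = np.random.default_rng(0)
app_a, app_b = rng.normal(size=4), rng.normal(size=4)
tmp_a, tmp_b = rng.normal(size=2), rng.normal(size=2)
d = hybrid_distance(app_a, app_b, tmp_a, tmp_b, np.eye(4), np.eye(2))
print(d)  # non-negative: with identity metrics each stream is a squared Euclidean distance
```

A smaller hybrid distance indicates a more likely identity match between the two video tracklets; ranking gallery tracklets by this distance yields the Rank-1 scores reported above.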
Keywords: person re-identification; end-to-end architecture; appearance-temporal features; Siamese network; pivotal frames

Figure 1

MDPI and ACS Style

Sun, R.; Huang, Q.; Xia, M.; Zhang, J. Video-Based Person Re-Identification by an End-To-End Learning Architecture with Hybrid Deep Appearance-Temporal Feature. Sensors 2018, 18, 3669. https://doi.org/10.3390/s18113669

