
Open Access Article

A CNN-SIFT Hybrid Pedestrian Navigation Method Based on First-Person Vision

School of Electronic and Information Engineering, Beihang University, Xueyuan Road, Beijing 100191, China
Image Processing Center, Beihang University, Xueyuan Road, Beijing 100191, China
Data61, CSIRO, Canberra, ACT 2601, Australia
School of Software, Shanghai Jiao Tong University, Shanghai 200240, China
Authors to whom correspondence should be addressed.
Remote Sens. 2018, 10(8), 1229
Received: 11 July 2018 / Revised: 27 July 2018 / Accepted: 1 August 2018 / Published: 5 August 2018


The emergence of new wearable technologies, such as action cameras and smart glasses, has driven the use of the first-person perspective in computer applications, and the field now attracts researchers developing methods to process first-person vision (FPV) video. Current approaches combine different image features with quantitative methods to accomplish specific objectives, such as object detection, activity recognition, and user–machine interaction. FPV-based navigation is needed in areas where the Global Positioning System (GPS) and other radio-signal-based methods are unavailable, and it is especially helpful for visually impaired people. In this paper, we propose a hybrid structure combining a convolutional neural network (CNN) with local image features to achieve FPV pedestrian navigation. A novel end-to-end trainable global pooling operator, called AlphaMEX, is designed to improve the scene-classification accuracy of the CNN. A scale-invariant feature transform (SIFT)-based tracking algorithm estimates the person's movement and trajectory from consecutive frames of FPV images. Experimental results demonstrate the effectiveness of the proposed method. The top-1 error rate of the proposed AlphaMEX-ResNet outperforms the original ResNet (k = 12) by 1.7% on the ImageNet dataset. The CNN-SIFT hybrid pedestrian navigation system reaches a 0.57 m average absolute error, which is adequate accuracy for pedestrian navigation. Both positions and movements can be well estimated by the proposed algorithm with a single wearable camera.
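The exact definition of the AlphaMEX operator is given in the full text; as a hedged illustration of the general idea behind trainable global pooling, the sketch below implements a log-mean-exp pooling that smoothly interpolates between global average pooling (small alpha) and global max pooling (large alpha). The function name, tensor shapes, and the fixed alpha value are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def smooth_global_pool(feature_map, alpha=2.0):
    """Log-mean-exp global pooling over the spatial dimensions.

    For alpha -> 0 this approaches global average pooling; for
    alpha -> +inf it approaches global max pooling. In a trainable
    operator, alpha would be a learned parameter.

    feature_map: array of shape (channels, height, width).
    Returns: array of shape (channels,).
    """
    flat = feature_map.reshape(feature_map.shape[0], -1)  # (C, H*W)
    # Subtract the per-channel max for numerical stability.
    m = flat.max(axis=1, keepdims=True)
    pooled = m.squeeze(1) + (1.0 / alpha) * np.log(
        np.exp(alpha * (flat - m)).mean(axis=1)
    )
    return pooled
```

Because log-mean-exp is a smooth upper bound on the mean and a lower bound on the max, the pooled value always lies between the two, which is what lets a learned alpha trade off between them.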
Keywords: navigation; first-person vision; CNN; SIFT; movement estimation
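The SIFT-based tracker described in the abstract estimates movement from keypoint matches between consecutive FPV frames. As a minimal sketch of that idea (not the authors' implementation), and assuming matched keypoint coordinates have already been obtained from a SIFT detector and matcher, the frame-to-frame translation can be taken as the robust median displacement of the matches and integrated into a trajectory:

```python
import numpy as np

def estimate_translation(pts_prev, pts_curr):
    """Robust frame-to-frame translation from matched keypoints.

    pts_prev, pts_curr: (N, 2) arrays of matched (x, y) coordinates
    in the previous and current frame. Using the median displacement
    suppresses outlier matches.
    """
    return np.median(pts_curr - pts_prev, axis=0)

def accumulate_trajectory(displacements, start=(0.0, 0.0)):
    """Integrate per-frame displacements into a position track."""
    trajectory = [np.asarray(start, dtype=float)]
    for d in displacements:
        trajectory.append(trajectory[-1] + d)
    return np.stack(trajectory)
```

In practice the matches would come from a SIFT pipeline (e.g., OpenCV's detector and a ratio-test matcher), and the image-plane displacements would be mapped to ground-plane motion using the camera geometry; both steps are omitted here for brevity.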

Graphical abstract

This is an open access article distributed under the Creative Commons Attribution (CC BY 4.0) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Share & Cite This Article

MDPI and ACS Style

Zhao, Q.; Zhang, B.; Lyu, S.; Zhang, H.; Sun, D.; Li, G.; Feng, W. A CNN-SIFT Hybrid Pedestrian Navigation Method Based on First-Person Vision. Remote Sens. 2018, 10, 1229.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Remote Sens. EISSN 2072-4292, Published by MDPI AG, Basel, Switzerland