Open Access Article

WS-AM: Weakly Supervised Attention Map for Scene Recognition

by Shifeng Xia 1, Jiexian Zeng 1,2, Lu Leng 1,* and Xiang Fu 1
1 School of Software, Nanchang Hangkong University, Nanchang 330063, China
2 Science and Technology College, Nanchang Hangkong University, Gongqingcheng 332020, China
* Author to whom correspondence should be addressed.
Electronics 2019, 8(10), 1072;
Received: 24 August 2019 / Revised: 16 September 2019 / Accepted: 19 September 2019 / Published: 21 September 2019
Recently, convolutional neural networks (CNNs) have achieved great success in scene recognition. Compared with traditional hand-crafted features, CNNs can extract more robust and generalized features for scene recognition. However, existing CNN-based scene recognition methods do not sufficiently take into account the relationship between image regions and categories when choosing local regions, which results in many redundant local regions and degrades recognition accuracy. In this paper, we propose an effective method for exploring discriminative regions of the scene image. Our method utilizes the gradient-weighted class activation mapping (Grad-CAM) technique and weakly supervised information to generate the attention map (AM) of scene images, dubbed WS-AM (weakly supervised attention map). The regions where the local mean and the local center value are both large in the AM correspond to the discriminative regions helpful for scene recognition. We sampled discriminative regions on multiple scales and extracted the features of large-scale and small-scale regions with two different pre-trained CNNs, respectively. The features from the two scales were aggregated by the improved vector of locally aggregated descriptors (VLAD) coding and max pooling, respectively. Finally, the pre-trained CNN was used to extract the global feature of the image in the fully-connected (fc) layer, and the local features were combined with the global feature to obtain the image representation. We validated the effectiveness of our method on three benchmark datasets: MIT Indoor 67, Scene 15, and UIUC Sports, and obtained 85.67%, 94.80%, and 95.12% accuracy, respectively. Compared with some state-of-the-art methods, the WS-AM method requires fewer local regions, so it has better real-time performance.
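The region-selection criterion described above, keeping locations where both the local mean and the local center value of the attention map are large, can be sketched as follows. This is a minimal illustration assuming a NumPy attention map; the window size, the product scoring rule, and the `top_k` count are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter  # assumes SciPy is available


def select_discriminative_regions(attention_map, window=7, top_k=10):
    """Return the top-k (row, col) centers of candidate discriminative regions.

    A location scores highly only when both its local neighborhood mean
    and its own (center) attention value are large; multiplying the two
    is one simple way to combine the criteria (an assumption here).
    """
    # Local mean of the attention map over a window x window neighborhood.
    local_mean = uniform_filter(attention_map, size=window, mode="nearest")
    # Combine local mean and center value so both must be large.
    score = local_mean * attention_map
    # Indices of the top-k scores, highest first.
    flat = np.argsort(score, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(flat, attention_map.shape)
    return list(zip(rows.tolist(), cols.tolist()))
```

In the paper's pipeline, patches would then be cropped around such centers at multiple scales before CNN feature extraction and VLAD/max-pooling aggregation.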
Keywords: convolutional neural network; scene recognition; vector of locally aggregated descriptors; weakly supervised attention map
MDPI and ACS Style

Xia, S.; Zeng, J.; Leng, L.; Fu, X. WS-AM: Weakly Supervised Attention Map for Scene Recognition. Electronics 2019, 8, 1072.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
