In recent years, deep neural network (DNN) based scene classification methods have achieved promising performance. However, the data-driven training strategy requires a large number of labeled samples, making the DNN-based methods unable to solve the scene classification problem in the case of a small number of labeled images. As the number and variety of scene images continue to grow, the cost and difficulty of manual annotation also increase. Therefore, it is significant to deal with the scene classification problem with only a few labeled samples. In this paper, we propose an attention metric network (AMN) in the framework of the few-shot learning (FSL) to improve the performance of one-shot scene classification. AMN is composed of a self-attention embedding network (SAEN) and a cross-attention metric network (CAMN). In SAEN, we adopt the spatial attention and the channel attention of feature maps to obtain abundant features of scene images. In CAMN, we propose a novel cross-attention mechanism which can highlight the features that are more concerned about different categories, and improve the similarity measurement performance. A loss function combining mean square error (MSE) loss with multi-class N-pair loss is developed, which helps to promote the intra-class similarity and inter-class variance of embedding features, and also improve the similarity measurement results. Experiments on the NWPU-RESISC45 dataset and the RSD-WHU46 dataset demonstrate that our method achieves the state-of-the-art results on one-shot remote sensing image scene classification tasks.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited