Deep convolutional neural networks (DCNNs) with alternating convolutional, pooling and decimation layers are widely used in computer vision, yet current works tend to focus on deeper networks with many layers and neurons, resulting in a high computational complexity. However, the recognition task is still challenging for insufficient and uncomprehensive object appearance and training sample types such as infrared insulators. In view of this, more attention is focused on the application of a pretrained network for image feature representation, but the rules on how to select the feature representation layer are scarce. In this paper, we proposed a new concept, the layer entropy and relative layer entropy, which can be referred to as an image representation method based on relative layer entropy (IRM_RLE). It was designed to excavate the most suitable convolution layer for image recognition. First, the image was fed into an ImageNet pretrained DCNN model, and deep convolutional activations were extracted. Then, the appropriate feature layer was selected by calculating the layer entropy and relative layer entropy of each convolution layer. Finally, the number of the feature map was selected according to the importance degree and the feature maps of the convolution layer, which were vectorized and pooled by VLAD (vector of locally aggregated descriptors) coding and quantifying for final image representation. The experimental results show that the proposed approach performs competitively against previous methods across all datasets. Furthermore, for the indoor scenes and actions datasets, the proposed approach outperforms the state-of-the-art methods.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited