# Cohesion Intensive Deep Hashing for Remote Sensing Image Retrieval

## Abstract

## 1. Introduction

#### 1.1. Background

#### 1.2. Motivation

#### 1.3. Contribution

## 2. Cohesion Intensive Deep Hashing

#### 2.1. Residual Hash Net

#### 2.2. Cohesion Intensive Loss Function

## 3. Experimental Results

#### 3.1. Experimental Setup

**caffe** framework (http://caffe.berkeleyvision.org/). We employ the residual net architecture [21], fine-tune a pretrained residual net model, and train a new layer Fc1 to produce hash codes. Retrieval accuracy is evaluated with the **mean average precision** (MAP) and the precision-recall (P-R) curve, and the learned hash codes are visualized with t-distributed Stochastic Neighbor Embedding (t-SNE) [25].
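
As a rough illustration of the MAP metric, the sketch below ranks database items by Hamming distance to each query code and averages the per-query average precision. The function names, the toy 4-bit codes, the class labels, and the rule that two images are relevant when their labels match are all assumptions made for illustration; this is not the evaluation script used in the experiments.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(relevant):
    """AP of a ranked list; relevant[i] is True if the i-th result is a hit."""
    hits, precisions = 0, []
    for rank, is_hit in enumerate(relevant, start=1):
        if is_hit:
            hits += 1
            precisions.append(hits / rank)  # precision at this hit's rank
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(queries, database):
    """queries and database are lists of (code, label) pairs."""
    aps = []
    for q_code, q_label in queries:
        # sorted() is stable, so equal distances keep database order.
        ranked = sorted(database, key=lambda item: hamming(q_code, item[0]))
        aps.append(average_precision([label == q_label for _, label in ranked]))
    return sum(aps) / len(aps)


# Toy 4-bit codes with hypothetical class labels.
database = [((0, 0, 0, 0), "farm"), ((0, 0, 0, 1), "farm"),
            ((1, 1, 1, 1), "port"), ((1, 1, 1, 0), "port")]
queries = [((0, 0, 1, 0), "farm"), ((1, 1, 0, 1), "port")]
map_score = mean_average_precision(queries, database)
```

Ties in Hamming distance are common with short codes, so the tie-breaking rule (here, database order) can shift the reported MAP slightly; an actual evaluation should state which rule it uses.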

#### 3.2. Results and Analysis

## 4. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Wang, Q.; Gao, J.; Yuan, Y. Embedding Structured Contour and Location Prior in Siamesed Fully Convolutional Networks for Road Detection. IEEE Trans. Intell. Transp. Syst. **2018**, 19, 230–241.
2. Ma, J.; Zhou, H.; Zhao, J.; Gao, Y.; Jiang, J.; Tian, J. Robust feature matching for remote sensing image registration via locally linear transforming. IEEE Trans. Geosci. Remote Sens. **2015**, 53, 6469–6481.
3. Ma, J.; Zhao, J.; Jiang, J.; Zhou, H.; Guo, X. Locality preserving matching. Int. J. Comput. Vis. **2019**, 127, 512–531.
4. Li, Y.; Tao, C.; Tan, Y.; Shang, K.; Tian, J. Unsupervised Multilayer Feature Learning for Satellite Image Scene Classification. IEEE Geosci. Remote Sens. Lett. **2016**, 13, 157–161.
5. Li, Y.; Zhang, Y.; Huang, X.; Yuille, A.L. Deep networks under scene-level supervision for multi-class geospatial object detection from remote sensing images. ISPRS J. Photogramm. Remote Sens. **2018**, 146, 182–196.
6. Patil, R.; Sharma, S.K.; Tignath, S. Remote Sensing and GIS based soil erosion assessment from an agricultural watershed. Arab. J. Geosci. **2015**, 8, 6967–6984.
7. Wang, J.; Kumar, S.; Chang, S.F. Semi-supervised hashing for scalable image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010.
8. Datar, M.; Immorlica, N.; Indyk, P.; Mirrokni, V.S. Locality-sensitive hashing scheme based on p-stable distributions. Symp. Comput. Geom. **2004**, 8–11, 253–262.
9. Weiss, Y.; Torralba, A.; Fergus, R. Spectral Hashing. Neural Inf. Process. Syst. **2008**, 8–11, 1753–1760.
10. Gong, Y.; Lazebnik, S. Iterative quantization: A procrustean approach to learning binary codes. IEEE Trans. Pattern Anal. Mach. Intell. **2012**, 35, 2916–2929.
11. Lukac, N.; Zalik, B.; Cui, S.; Datcu, M. GPU-based kernelized locality-sensitive hashing for satellite image retrieval. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 1468–1471.
12. Demir, B.; Bruzzone, L. Hashing-Based Scalable Remote Sensing Image Search and Retrieval in Large Archives. IEEE Trans. Geosci. Remote Sens. **2016**, 54, 892–904.
13. Li, P.; Ren, P. Partial Randomness Hashing for Large-Scale Remote Sensing Image Retrieval. IEEE Geosci. Remote Sens. Lett. **2017**, 14, 464–468.
14. Li, P.; Zhang, X.; Zhu, X.; Ren, P. Online Hashing for Scalable Remote Sensing Image Retrieval. Remote Sens. **2018**, 10, 709.
15. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. **2004**, 60, 91–110.
16. Oliva, A.; Torralba, A. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. Int. J. Comput. Vis. **2001**, 42, 145–175.
17. Yu, X.; Wu, X.; Luo, C.; Peng, R. Deep learning in remote sensing scene classification: A data augmentation enhanced convolutional neural network framework. Gisci. Remote Sens. **2017**, 54, 741–758.
18. Xie, M.; Jean, N.; Burke, M.; Lobell, D.; Ermon, S. Transfer Learning from Deep Features for Remote Sensing and Poverty Mapping. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
19. Li, Y.; Zhang, Y.; Huang, X.; Zhu, H.; Ma, J. Large-Scale Remote Sensing Image Retrieval by Deep Hashing Neural Networks. IEEE Trans. Geosci. Remote Sens. **2018**, 56, 950–965.
20. Li, Y.; Zhang, Y.; Huang, X.; Ma, J. Learning Source-Invariant Deep Hashing Convolutional Neural Networks for Cross-Source Remote Sensing Image Retrieval. IEEE Trans. Geosci. Remote Sens. **2018**, 56, 6521–6536.
21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2015.
22. Cao, Z.; Long, M.; Wang, J.; Yu, P.S. HashNet: Deep Learning to Hash by Continuation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5609–5618.
23. Yang, Y.; Newsam, S.D. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279.
24. Xia, G.S.; Hu, J.; Fan, H.; Shi, B.; Xiang, B.; Zhong, Y.; Zhang, L. AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification. IEEE Trans. Geosci. Remote Sens. **2017**, 55, 3965–3981.
25. Donahue, J.; Jia, Y.; Vinyals, O.; Hoffman, J.; Zhang, N.; Tzeng, E.; Darrell, T. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In Proceedings of the International Conference on Machine Learning 2014, Beijing, China, 21–26 June 2014.
26. Shen, F.; Shen, C.; Liu, W.; Shen, H.T. Supervised Discrete Hashing. In Proceedings of the Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 37–45.
27. Kang, W.; Li, W.; Zhou, Z. Column sampling based discrete supervised hashing. In Proceedings of the AAAI Conference on Artificial Intelligence 2016, Phoenix, AZ, USA, 12–17 February 2016; pp. 1230–1236.
28. Zhu, H.; Long, M.; Wang, J.; Cao, Y. Deep Hashing Network for efficient similarity retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence 2016, Phoenix, AZ, USA, 12–17 February 2016; pp. 2415–2421.
29. Liu, H.; Wang, R.; Shan, S.; Chen, X. Deep Supervised Hashing for Fast Image Retrieval. In Proceedings of the Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2064–2072.

**Figure 1.** The residual hash net consists of a convolutional layer (Conv1), four residual blocks (Res-Block1, Res-Block2, Res-Block3, Res-Block4), and a fully connected layer (Fc1). A Heaviside-like function serves as the activation function that binarizes the outputs of Fc1. The input of the network is a remote sensing image, and the corresponding output is a K-bit binary code for that image.
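
The binarization step described in the caption of Figure 1 can be sketched as follows. This is a minimal illustration that assumes a step function thresholded at zero and 0/1 bit values; the caption only states that the function is "Heaviside-like", so the exact threshold and output alphabet are assumptions, as are the helper names `binarize`, `pack_bits`, and `hamming_packed` and the sample Fc1 outputs.

```python
def binarize(fc1_outputs):
    """Map real-valued Fc1 outputs to a K-bit code with a step at zero (assumed)."""
    return [1 if y > 0 else 0 for y in fc1_outputs]


def pack_bits(bits):
    """Pack a list of 0/1 bits into one integer, most significant bit first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def hamming_packed(a, b):
    """Hamming distance between two packed codes: popcount of their XOR."""
    return bin(a ^ b).count("1")


# Hypothetical Fc1 outputs for two images with K = 4.
code_a = pack_bits(binarize([0.7, -1.2, 0.1, -0.3]))  # bits 1010
code_b = pack_bits(binarize([0.5, 0.4, -0.2, -0.9]))  # bits 1100
distance = hamming_packed(code_a, code_b)
```

Packing codes into integers is one common way to make Hamming-distance comparison cheap at retrieval time, since it reduces to an XOR followed by a bit count.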

**Figure 2.** A residual block: Res-BlockQ (Q = 1, 2, 3, 4) denotes a residual block consisting of p residual units. For Res-Block1, Res-Block2, Res-Block3, and Res-Block4, p is 3, 4, 6, and 3, respectively. A residual unit contains three convolution layers and takes both the input and the output of the previous residual unit as its inputs.

**Figure 3.** The process of $\tanh(\tau y)$ gradually approaching $\operatorname{sgn}(y)$ as $\tau$ increases, where $0<\tau_{1}<\tau_{T}<\infty$.
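
The limiting behaviour shown in Figure 3 can be checked numerically. The sketch below is an illustration only: the schedule of $\tau$ values and the sample points are arbitrary assumptions (the caption specifies only $0<\tau_{1}<\tau_{T}<\infty$), and the names `relaxed_sign` and `sgn` are hypothetical helpers.

```python
import math


def relaxed_sign(y, tau):
    """Smooth surrogate tanh(tau * y) for the sign function."""
    return math.tanh(tau * y)


def sgn(y):
    """Sign of y: -1, 0, or +1."""
    return (y > 0) - (y < 0)


samples = [-2.0, -0.5, -0.1, 0.1, 0.5, 2.0]  # nonzero test points (assumed)
taus = [1.0, 5.0, 25.0, 125.0]               # hypothetical increasing schedule

# Largest deviation from sgn(y) over the samples, one value per tau.
max_gaps = [max(abs(relaxed_sign(y, tau) - sgn(y)) for y in samples)
            for tau in taus]
```

For any fixed nonzero $y$ the gap $|\tanh(\tau y)-\operatorname{sgn}(y)|$ shrinks as $\tau$ grows, and it shrinks most slowly for points nearest zero, which is why the largest gap here is set by $y=\pm 0.1$.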

**Figure 4.** Illustration of the UCMerced and AID data sets. UCMerced contains 21 geographic classes with 100 images per class. AID contains 30 geographic classes with 200 to 400 images per class. Example images from several classes of both data sets are shown.

**Figure 6.** Visual image retrieval results of different deep methods with 32-bit codes. The 1st, 5th, 10th, 20th, 30th, 40th, and 50th retrieval results are shown. False retrieval results are marked with red rectangles.

**Figure 8.** The t-distributed Stochastic Neighbor Embedding (t-SNE) of hash codes of different lengths generated by the comparison methods on the AID data set.

Layer | Configuration |
---|---|
Conv1 | $7\times 7$, 64, stride 2 |
Res-Block1 | $\left[\begin{array}{c}1\times 1, 64\\ 3\times 3, 64\\ 1\times 1, 256\end{array}\right]\times 3$, $3\times 3$ max pooling |
Res-Block2 | $\left[\begin{array}{c}1\times 1, 128\\ 3\times 3, 128\\ 1\times 1, 512\end{array}\right]\times 4$, $3\times 3$ max pooling |
Res-Block3 | $\left[\begin{array}{c}1\times 1, 256\\ 3\times 3, 256\\ 1\times 1, 1024\end{array}\right]\times 6$, $3\times 3$ max pooling |
Res-Block4 | $\left[\begin{array}{c}1\times 1, 512\\ 3\times 3, 512\\ 1\times 1, 2048\end{array}\right]\times 3$, $3\times 3$ max pooling |
Fc1 | K dimensions, average pooling |
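
The depth implied by the layer configuration above can be worked out with a few lines of arithmetic. The sketch below counts only convolutional and fully connected layers; the Fc1 weight count additionally assumes that average pooling reduces the Res-Block4 output to a 2048-dimensional vector and ignores bias terms, neither of which is stated explicitly in the table.

```python
# Residual units per block, as listed in the configuration table.
units_per_block = {"Res-Block1": 3, "Res-Block2": 4,
                   "Res-Block3": 6, "Res-Block4": 3}
convs_per_unit = 3  # 1x1, 3x3, 1x1 convolutions in each residual unit

residual_convs = convs_per_unit * sum(units_per_block.values())

# Conv1 + all residual convolutions + Fc1.
weighted_layers = 1 + residual_convs + 1

# Fc1 size for a K-bit code, assuming a 2048-d pooled input (assumption).
K = 32
fc1_weights = 2048 * K
```

Under these assumptions only Fc1 depends on the code length K, so changing the number of bits changes a single layer while the pretrained convolutional layers keep the same shapes.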

**Table 2.** Mean average precision (MAP) and average retrieval time (s) of comparison methods on UCMerced data set.

Bits | GIST mAP | GIST Time (s) | PRH mAP | PRH Time (s) | KSH mAP | COSDISH mAP | SDH mAP | DSH mAP | DHN mAP | DHNNs mAP | CIDH mAP |
---|---|---|---|---|---|---|---|---|---|---|---|
K = 32 | 0.4672 | 0.022907 | 0.3361 | 0.000846 | 0.4609 | 0.3235 | 0.5943 | 0.6327 | 0.6768 | 0.9396 | 0.9846 |
K = 64 | 0.4672 | 0.022907 | 0.3667 | 0.000927 | 0.5049 | 0.3631 | 0.6551 | 0.6831 | 0.7423 | 0.9718 | 0.9853 |
K = 96 | 0.4672 | 0.022907 | 0.4015 | 0.000971 | 0.5114 | 0.3840 | 0.6809 | 0.7342 | 0.7867 | 0.9762 | 0.9858 |

Bits | GIST mAP | GIST Time (s) | PRH mAP | PRH Time (s) | KSH mAP | COSDISH mAP | SDH mAP | DSH mAP | DHN mAP | DHNNs mAP | CIDH mAP |
---|---|---|---|---|---|---|---|---|---|---|---|
K = 32 | 0.2439 | 0.040966 | 0.1816 | 0.000801 | 0.2164 | 0.1988 | 0.2444 | 0.4191 | 0.6953 | - | 0.8780 |
K = 64 | 0.2439 | 0.040966 | 0.2051 | 0.000872 | 0.2492 | 0.2245 | 0.3285 | 0.4585 | 0.7464 | - | 0.9043 |
K = 96 | 0.2439 | 0.040966 | 0.2199 | 0.000946 | 0.2599 | 0.2158 | 0.2599 | 0.4636 | 0.7682 | - | 0.9245 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Han, L.; Li, P.; Bai, X.; Grecos, C.; Zhang, X.; Ren, P. Cohesion Intensive Deep Hashing for Remote Sensing Image Retrieval. *Remote Sens.* **2020**, *12*, 101.
https://doi.org/10.3390/rs12010101
