Scene Complexity: A New Perspective on Understanding the Scene Semantics of Remote Sensing and Designing Image-Adaptive Convolutional Neural Networks
Abstract
1. Introduction
- how to measure the inherent properties of images.
- how to analyze the relationship between image properties and the features learned by different network structures.
- how to make the features learned within the network correspond to the semantic concepts of images for straightforward interpretation.
- We introduce scene complexity and analyze the relationship between remote sensing scenes of different complexity and the scale and hierarchy of feature learning in CNNs.
- We propose a scene complexity measure that integrates scene search difficulty and scene memorability. In addition, we construct the first scene complexity dataset in remote sensing.
- We design a scene complexity prediction framework that matches network depth and scale to data of different complexity, which effectively improves the downstream model’s recognition accuracy while reducing the number of parameters.
- We visualize and analyze the relationship between semantic concept representation and model feature learning for scenes of different complexity, showing that complex scenes rely on jointly learned multi-object features to support their semantic representation.
2. Materials and Methods
2.1. The Scene Complexity Dataset Construction
- Given an image, volunteers were required to answer “yes” or “no” to a question about whether a particular object class was present, or to point to the location of a randomly selected object in the image, for example, “Is there an airplane?” or “Where is the house?”;
- The response time to correctly answer the questions about the image was recorded;
- For each image, the average response time for the two types of question, across all the volunteers, was calculated;
- The sum of the search difficulty score and the memorability score is used as the scene complexity score of an image (Figure 3); a minimal sketch of this scoring is given after this list.
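The sketch below combines the two components per image, assuming the recorded response times are averaged as described above and that both components are min-max normalized before they are summed; the normalization choice is an assumption, not taken from the paper.

```python
import numpy as np

def min_max_normalize(values):
    """Scale an array of raw scores into [0, 1]."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    return (values - vmin) / (vmax - vmin + 1e-12)

def scene_complexity_scores(response_times, memorability):
    """Combine search difficulty (average response time per image) and
    memorability into one complexity score per image.

    response_times: shape (n_images,), average time (s) to correctly answer
                    the "is there / where is" questions about each image.
    memorability:   shape (n_images,), per-image memorability score.
    """
    search_difficulty = min_max_normalize(response_times)
    memorability = min_max_normalize(memorability)
    # Section 2.1: the complexity score is the sum of the search difficulty
    # score and the memorability score.
    return search_difficulty + memorability

# Example: three images with averaged response times and memorability scores.
print(scene_complexity_scores([2.1, 5.4, 3.3], [0.42, 0.77, 0.58]))
```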
2.2. Methods
2.2.1. How to Control the Scale of Feature Learning
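The GoogLeNet/Inception comparison reported in the Results isolates single branches with 1 × 1, 3 × 3, and 5 × 5 kernels, which suggests that feature scale is controlled through the convolution kernel size (receptive field). The probe below is a hedged illustration of that idea, not the authors' exact architecture; the channel width and the 22-class head are assumptions.

```python
import torch
import torch.nn as nn

class SingleScaleBranch(nn.Module):
    """One convolution branch with a fixed kernel size, used to probe how the
    spatial scale (receptive field) of learned features affects recognition.
    Channel width and class count are illustrative."""

    def __init__(self, in_channels=3, width=256, kernel_size=3, num_classes=22):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size, padding=kernel_size // 2),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling
            nn.Flatten(),
            nn.Linear(width, num_classes),
        )

    def forward(self, x):
        return self.head(self.branch(x))

# One probe per kernel size, mirroring the Inception 1x1 / 3x3 / 5x5 comparison.
probes = {k: SingleScaleBranch(kernel_size=k) for k in (1, 3, 5)}
x = torch.randn(2, 3, 224, 224)
for k, model in probes.items():
    print(k, model(x).shape)  # each prints torch.Size([2, 22])
```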
2.2.2. How to Control the Hierarchy of Feature Learning
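The VGG-16* variants in the ConvNet Configuration table attach GAP + FC-1024 branch classifiers at different depths of the backbone, i.e., the hierarchy of the features being read out is controlled by where the classification head is attached. A hedged sketch of such a depth probe follows; using torchvision's VGG-16 (recent torchvision API) as a stand-in backbone and the chosen cut indices are assumptions, while the FC-1024 head and the 6/9-class branch heads follow the table.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class HierarchyProbe(nn.Module):
    """Read out features from a chosen depth of a VGG-16 backbone with a
    GAP + FC-1024 head, so lower- and higher-level feature hierarchies can
    be evaluated separately."""

    def __init__(self, cut_index, num_classes, hidden=1024):
        super().__init__()
        backbone = vgg16(weights=None).features
        self.trunk = nn.Sequential(*list(backbone.children())[:cut_index])
        with torch.no_grad():  # probe the channel count at this depth
            channels = self.trunk(torch.zeros(1, 3, 224, 224)).shape[1]
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.head(self.trunk(x))

# Lower hierarchy: through the conv3 block (FC-6 branch in the table);
# higher hierarchy: through the conv4 block (FC-9 branch in the table).
low = HierarchyProbe(cut_index=17, num_classes=6)
high = HierarchyProbe(cut_index=24, num_classes=9)
x = torch.randn(1, 3, 224, 224)
print(low(x).shape, high(x).shape)  # torch.Size([1, 6]) torch.Size([1, 9])
```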
2.2.3. Designing Image-Adaptive Networks with Scene Complexity
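As a rough illustration of the image-adaptive idea, images predicted to be simple can be routed to a shallower, smaller sub-network and complex images to a deeper one. The complexity regressor, the threshold, and the toy sub-networks below are all assumptions for the sketch, not the authors' exact framework.

```python
import torch
import torch.nn as nn

class ComplexityAdaptiveClassifier(nn.Module):
    """Dispatch each image to a shallow or a deep sub-network according to a
    predicted scene-complexity score."""

    def __init__(self, complexity_regressor, shallow_net, deep_net, threshold=0.5):
        super().__init__()
        self.regressor = complexity_regressor  # small CNN -> one score per image
        self.shallow = shallow_net             # fewer layers for simple scenes
        self.deep = deep_net                   # deeper hierarchy for complex scenes
        self.threshold = threshold

    def forward(self, x):
        with torch.no_grad():
            complexity = self.regressor(x).squeeze(1)  # shape (B,)
        simple = complexity < self.threshold
        idx = torch.arange(x.size(0), device=x.device)
        parts, order = [], []
        if simple.any():
            parts.append(self.shallow(x[simple]))
            order.append(idx[simple])
        if (~simple).any():
            parts.append(self.deep(x[~simple]))
            order.append(idx[~simple])
        out, order = torch.cat(parts), torch.cat(order)
        return out[torch.argsort(order)]  # restore the original batch order

def tiny_cnn(depth, num_classes):
    """Illustrative stand-in; the paper's sub-networks would be VGG-16* variants."""
    layers, channels = [], 3
    for _ in range(depth):
        layers += [nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2)]
        channels = 32
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)]
    return nn.Sequential(*layers)

regressor = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 1), nn.Sigmoid())
model = ComplexityAdaptiveClassifier(regressor, tiny_cnn(2, 7), tiny_cnn(5, 7))
print(model(torch.randn(4, 3, 224, 224)).shape)  # torch.Size([4, 7])
```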
2.2.4. Class Activation Mapping and Semantic Representation
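Class activation mapping projects the weights of the classifier that follows global average pooling back onto the last convolutional feature maps, highlighting the image regions that support a class decision. A minimal sketch for one image follows; the tensor shapes in the usage example are illustrative.

```python
import torch
import torch.nn.functional as F

def class_activation_map(feature_maps, fc_weights, class_idx, image_size):
    """Class activation map for a single image.

    feature_maps: (C, H, W) output of the last conv layer, before GAP.
    fc_weights:   (num_classes, C) weights of the linear classifier after GAP.
    class_idx:    class whose activation map is visualized.
    image_size:   (height, width) the map is upsampled to.
    """
    weights = fc_weights[class_idx]                         # (C,)
    cam = torch.einsum('c,chw->hw', weights, feature_maps)  # weighted sum over channels
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-12)  # normalize to [0, 1]
    cam = F.interpolate(cam[None, None], size=image_size,
                        mode='bilinear', align_corners=False)[0, 0]
    return cam

# Shape-only usage: a 512-channel 14x14 feature map and a 7-class classifier.
fmap, w = torch.randn(512, 14, 14), torch.randn(7, 512)
print(class_activation_map(fmap, w, class_idx=3, image_size=(224, 224)).shape)
```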
2.3. Training Details
3. Results
3.1. How the Scale of Feature Learning Influences the Recognition of Scenes with Different Complexity
3.2. How the Hierarchy of Feature Learning Influences the Recognition of Scenes with Different Complexity
3.3. How Adaptive Networks Based on Scene Complexity Improve the Model’s Performance
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
**ConvNet Configuration** (convolutional rows apply to all five networks; “\” marks the absence of a layer at that position; “Maxpool / GAP” preserves the original “Maxpool|GAP” notation)

| VGG-16 | VGG-16*-A | VGG-16*-B | VGG-16*-C | VGG-16*-D |
|---|---|---|---|---|
| Conv3-64, Conv3-64 | Conv3-64, Conv3-64 | Conv3-64, Conv3-64 | Conv3-64, Conv3-64 | Conv3-64, Conv3-64 |
| Maxpool | Maxpool | Maxpool | Maxpool | Maxpool |
| Conv3-128, Conv3-128 | Conv3-128, Conv3-128 | Conv3-128, Conv3-128 | Conv3-128, Conv3-128 | Conv3-128, Conv3-128 |
| Maxpool | Maxpool | Maxpool | Maxpool | Maxpool |
| Conv3-256, Conv3-256, Conv3-256 | Conv3-256, Conv3-256, Conv3-256 | Conv3-256, Conv3-256, Conv3-256 | Conv3-256, Conv3-256, Conv3-256 | Conv3-256, Conv3-256, Conv3-256 |
| Maxpool | Maxpool | Maxpool / GAP | Maxpool / GAP | Maxpool / GAP |
| \ | FC-1024, FC-6 | FC-1024, FC-6 | FC-1024, FC-6 | FC-1024, FC-6 |
| Conv3-512, Conv3-512, Conv3-512 | Conv3-512, Conv3-512, Conv3-512 | Conv3-512, Conv3-512, Conv3-512 | Conv3-512, Conv3-512, Conv3-512 | Conv3-512, Conv3-512, Conv3-512 |
| Maxpool | Maxpool | Maxpool | Maxpool / GAP | Maxpool / GAP |
| \ | FC-1024, FC-9 | FC-1024, FC-9 | FC-1024, FC-9 | FC-1024, FC-9 |
| Conv3-512, Conv3-512, Conv3-512 | Conv3-512, Conv3-512, Conv3-512 | Conv3-512, Conv3-512, Conv3-512 | Conv3-512, Conv3-512, Conv3-512 | Conv3-512, Conv3-512, Conv3-512 |
| Maxpool | Maxpool | Maxpool | Maxpool | Maxpool / GAP |
| FC-4096, FC-4096, FC-22, soft-max | FC-1024, FC-7, soft-max | FC-1024, FC-7, soft-max | FC-1024, FC-7, soft-max | FC-1024, FC-7, soft-max |
| Model | GoogLeNet | Inception 1 × 1 | Inception 3 × 3 | Inception 5 × 5 |
|---|---|---|---|---|
| OA | 0.8329 | 0.6863 | 0.7761 | 0.7815 |
| Kappa | 0.8269 | 0.6708 | 0.7651 | 0.7706 |
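For reference, the two metrics reported above, overall accuracy (OA) and the kappa coefficient, can be computed from a confusion matrix as in the following sketch; the labels in the usage line are illustrative.

```python
import numpy as np

def overall_accuracy_and_kappa(y_true, y_pred, num_classes):
    """Compute overall accuracy (OA) and Cohen's kappa from predictions."""
    cm = np.zeros((num_classes, num_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n                                  # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, kappa

print(overall_accuracy_and_kappa([0, 1, 2, 2], [0, 1, 1, 2], num_classes=3))
```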
| Model | Train Accuracy (%) | Test Accuracy (%) |
|---|---|---|
| VGG-16 | 96.15 | 94.10 |
| VGG-19 | 96.89 | 94.91 |
| Model | Train Accuracy (%) | Test Accuracy (%) | Iterations | Parameters |
|---|---|---|---|---|
| VGG-16 | 96.15 | 94.10 | 200,000 | 134M |
| VGG-16*-A | 97.56 | 94.51 | 240,000 | 343M |
| VGG-16*-B | 97.89 | 95.42 | 150,000 | 138M |
| VGG-16*-C | 98.11 | 95.80 | 100,000 | 37M |
| VGG-16*-D | 97.20 | 94.86 | 80,000 | 12M |
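The parameter counts in the last column can be reproduced for any instantiated configuration with a short helper such as the one below, shown here on an illustrative stand-in model rather than the paper's exact networks.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters, as reported in the table above."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Illustrative stand-in; a VGG-16* variant would be passed here instead.
toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Flatten(),
                    nn.Linear(64 * 224 * 224, 7))
print(f"{count_parameters(toy) / 1e6:.1f}M parameters")
```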