Vehicle Detection in Multisource Remote Sensing Images Based on Edge-Preserving Super-Resolution Reconstruction
Abstract
1. Introduction
- (1) An SR reconstruction module is built on an improved Local Implicit Image Function (LIIF) that uses partial convolution-based padding to reconstruct high-resolution remote sensing images. Experimental results show that the module preserves clear edge structures and yields better detection of small vehicle objects in remote sensing images (a minimal sketch follows this list).
- (2) An integrated framework, VDNET-RSI, is constructed. In addition to SR reconstruction, attention mechanisms and detection heads are added to expand the receptive field for vehicles, further improve the robustness of vehicle detection, and alleviate the imbalance between semantic and spatial information.
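As a rough illustration of contribution (1), the sketch below pairs a LIIF-style decoder with partial convolution-based padding. The layer sizes, class names, and the simplified nearest-latent query are assumptions for illustration only; the full model also applies the feature unfolding and local ensemble steps listed in the architecture table in Section 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConvPad2d(nn.Conv2d):
    """Conv2d with partial-convolution-based padding: outputs near the
    border are rescaled by (window size / number of in-image inputs),
    so zero padding does not attenuate edge responses."""

    def forward(self, x):
        out = super().forward(x)
        with torch.no_grad():
            ones = torch.ones(1, 1, x.shape[2], x.shape[3], device=x.device)
            k = torch.ones(1, 1, *self.kernel_size, device=x.device)
            valid = F.conv2d(ones, k, stride=self.stride,
                             padding=self.padding, dilation=self.dilation)
            ratio = (self.kernel_size[0] * self.kernel_size[1]) / valid.clamp(min=1.0)
        if self.bias is not None:
            b = self.bias.view(1, -1, 1, 1)
            return (out - b) * ratio + b
        return out * ratio


class LIIFDecoder(nn.Module):
    """Predict RGB at continuous query coordinates from a feature map."""

    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))

    def forward(self, feat, coord):
        # feat: (B, C, h, w); coord: (B, Q, 2) as (row, col) in [-1, 1].
        # Nearest latent code per query; the full LIIF additionally uses
        # feature unfolding and a local ensemble over the 4 nearest codes.
        grid = coord.flip(-1).unsqueeze(1)             # (B, 1, Q, 2) as (x, y)
        q = F.grid_sample(feat, grid, mode='nearest', align_corners=False)
        q = q.squeeze(2).permute(0, 2, 1)              # (B, Q, C)
        return self.mlp(torch.cat([q, coord], dim=-1))  # (B, Q, 3) RGB


# Usage: an encoder built from PartialConvPad2d layers feeds the decoder;
# querying a denser coordinate grid yields SR at any scale (x2, x3, x4).
encoder = nn.Sequential(PartialConvPad2d(3, 64, 3, padding=1), nn.ReLU(),
                        PartialConvPad2d(64, 64, 3, padding=1))
lr = torch.rand(1, 3, 32, 32)
coords = torch.rand(1, 64 * 64, 2) * 2 - 1             # 64x64 query points
sr_rgb = LIIFDecoder()(encoder(lr), coords)            # (1, 4096, 3)
```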
2. Methods
2.1. Multi-Scale SR Reconstruction Module Considering Edge Optimization
2.2. Vehicle Detection Network Considering SR Reconstruction
3. Experiments
3.1. Experiment Data
3.2. Data Preprocessing
3.3. Evaluation Index
3.4. Result Analysis
- (1) Comparative Experimental Analysis of Vehicle Detection Using SR Reconstruction at Different Scales
- (2) Comparative Experiment and Analysis of Vehicle Detection Effects with Different Object Detection Methods
- (3) Ablation Experiment
- (4) Comparative Experiment and Analysis of Vehicle Detection Effects with Different Small-Object Detection Methods
4. Discussion
4.1. Advantages and Disadvantages of Different Object Detection Methods
4.2. Application Scenarios
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Van Etten, A. You only look twice: Rapid multi-scale object detection in satellite imagery. arXiv 2018, arXiv:1805.09512.
- Ding, J.; Xue, N.; Xia, G.S.; Bai, X.; Yang, W.; Yang, M.Y.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; et al. Object detection in aerial images: A large-scale benchmark and challenges. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7778–7796.
- Hou, Q.; Wang, Z.; Tan, F.; Zhao, Y.; Zheng, H.; Zhang, W. RISTDnet: Robust Infrared Small Target Detection Network. IEEE Geosci. Remote Sens. Lett. 2022, 19, 7000805.
- Noh, J.; Bae, W.; Lee, W.; Seo, J.; Kim, G. Better to follow, follow to be better: Towards precise supervision of feature super-resolution for small object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9725–9734.
- Rabbi, J.; Ray, N.; Schubert, M.; Chowdhury, S.; Chao, D. Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network. Remote Sens. 2020, 12, 1432.
- Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Plaza, A.; Chanussot, J. Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5966–5978.
- Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Du, Q.; Zhang, B. More diverse means better: Multimodal deep learning meets remote-sensing imagery classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4340–4354.
- Wu, X.; Hong, D.; Chanussot, J. UIU-Net: U-Net in U-Net for infrared small object detection. IEEE Trans. Image Process. 2022, 32, 364–376.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 1137–1149.
- Sun, Y.; Cao, B.; Zhu, P.; Hu, Q. Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6700–6713.
- Zhou, L.; Zheng, C.; Yan, H.; Zuo, X.; Liu, Y.; Qiao, B.; Yang, Y. RepDarkNet: A Multi-Branched Detector for Small-Target Detection in Remote Sensing Images. ISPRS Int. J. Geo-Inf. 2022, 11, 158.
- Huang, J.; Rathod, V.; Sun, C.; Zhu, M.; Korattikara, A.; Fathi, A.; Murphy, K. Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7310–7311.
- Chen, C.; Zhang, Y.; Lv, Q.; Wei, S.; Wang, X.; Sun, X.; Dong, J. RRNet: A hybrid detector for object detection in drone-captured images. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Long Beach, CA, USA, 16–17 June 2019; pp. 100–108.
- Liu, L.; Ouyang, W.L.; Wang, X.G.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep learning for generic object detection: A survey. Int. J. Comput. Vis. 2020, 128, 261–318.
- Zhang, L.; Dong, R.; Yuan, S.; Li, W.; Zheng, J.; Fu, H. Making Low-Resolution Satellite Images Reborn: A Deep Learning Approach for Super-Resolution Building Extraction. Remote Sens. 2021, 13, 2872.
- Guo, Z.; Wu, G.; Song, X.; Yuan, W.; Chen, Q.; Zhang, H.; Shi, X.; Xu, M.; Xu, Y.; Shibasaki, R.; et al. Super-Resolution Integrated Building Semantic Segmentation for Multi-Source Remote Sensing Imagery. IEEE Access 2019, 7, 99381–99397.
- Schuegraf, P.; Bittner, K. Automatic Building Footprint Extraction from Multi-Resolution Remote Sensing Images Using a Hybrid FCN. ISPRS Int. J. Geo-Inf. 2019, 8, 191.
- Shen, H.; Peng, L.; Yue, L.; Yuan, Q.; Zhang, L. Adaptive norm selection for regularized image restoration and super-resolution. IEEE Trans. Cybern. 2017, 46, 1388–1399.
- Yang, W.; Feng, J.; Yang, J.; Zhao, F.; Liu, J.; Guo, Z.; Yan, S. Deep edge guided recurrent residual learning for image super-resolution. IEEE Trans. Image Process. 2017, 26, 5895–5907.
- Dong, C.; Loy, C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 391–407.
- Romano, Y.; Isidoro, J.; Milanfar, P. RAISR: Rapid and accurate image super resolution. IEEE Trans. Comput. Imaging 2017, 3, 110–125.
- Zhang, L.; Wu, X. An edge-guided image interpolation algorithm via directional filtering and data fusion. IEEE Trans. Image Process. 2006, 15, 2226–2238.
- Ishii, M.; Takahashi, K.; Naemura, T. View interpolation based on super resolution reconstruction. IEICE Trans. Inf. Syst. 2010, 93, 1682–1684.
- Hsieh, C.C.; Huang, Y.P.; Chen, Y.Y.; Fuh, C.S. Video super-resolution by motion compensated iterative back-projection approach. J. Inf. Sci. Eng. 2011, 27, 1107–1122.
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1874–1883.
- Zhang, H.; Wang, P.; Zhang, C.; Jiang, Z. A Comparable Study of CNN-Based Single Image Super-Resolution for Space-Based Imaging Sensors. Sensors 2019, 19, 3234.
- Gao, M.; Han, X.H.; Li, J.; Ji, H.; Zhang, H.; Sun, J. Image super-resolution based on two-level residual learning CNN. Multimed. Tools Appl. 2020, 79, 4831–4846.
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 184–199.
- Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
- Agaian, S.; Panetta, K.; Grigoryan, A. Transform-based image enhancement algorithms with performance measure. IEEE Trans. Image Process. 2001, 10, 367–382.
- Zhu, H.; Gao, X.; Tang, X.; Xie, J.; Song, W.; Mo, F.; Jia, D. Super-resolution reconstruction and its application based on multilevel main structure and detail boosting. Remote Sens. 2018, 10, 2065.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
| Layer | Input/Output Channels | Layer | Input/Output Channels | Layer | Input/Output Channels |
|---|---|---|---|---|---|
| Mean shift | [3, 3] | Conv_4 | [256, 512] | Upsampling_3 | [256, 256] |
| Conv_1 | [3, 64] | C3_3 | [512, 512] | Concat_3 | [256, 328] |
| ResBlock | [3, 64] | Conv_5 | [512, 1024] | C3_CBAM_2 | [328, 256] |
| Feature unfolding | [64, 576] | SPP | [1024, 1024] | Conv_9 | [256, 256] |
| Local ensemble | [576, 580] | Conv_6 | [1024, 512] | Concat_4 | [256, 512] |
| Linear_1 | [580, 256] | Upsampling_1 | [512, 512] | C3_CBAM_3 | [512, 256] |
| Linear_2 | [256, 256] | Concat_1 | [512, 1024] | Conv_10 | [256, 512] |
| Linear_3 | [256, 3] | C3_4 | [1024, 512] | Concat_5 | [512, 1024] |
| Focus | [3, 64] | Conv_7 | [512, 256] | C3_CBAM_4 | [1024, 512] |
| Conv_2 | [64, 128] | Upsampling_2 | [256, 256] | Conv_11 | [512, 512] |
| C3_1 | [128, 128] | Concat_2 | [256, 512] | Concat_6 | [512, 1024] |
| Conv_3 | [128, 256] | C3_CBAM_1 | [256, 256] | C3_5 | [1024, 1024] |
| C3_2 | [256, 256] | Conv_8 | [256, 256] | | |
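The C3_CBAM entries above are C3 blocks followed by a Convolutional Block Attention Module, used in the Neck to suppress irrelevant feature responses. Below is a minimal sketch of a CBAM block, assuming the common defaults (reduction ratio 16, 7×7 spatial kernel) rather than the paper's exact settings:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention (Woo et al.'s CBAM)."""

    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over global avg- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        # Spatial attention: conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca                                   # reweight channels
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                                # reweight locations
```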
| Image | Index | ×2 | ×3 | ×4 |
|---|---|---|---|---|
| Image 1 | EME | 17.606 | 14.766 | 11.317 |
| Image 1 | Avegrad | 0.011 | 0.005 | 0.003 |
| Image 2 | EME | 18.708 | 15.462 | 15.446 |
| Image 2 | Avegrad | 0.011 | 0.005 | 0.005 |
| Image 3 | EME | 20.895 | 16.169 | 13.089 |
| Image 3 | Avegrad | 0.010 | 0.005 | 0.003 |
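For reference, the two sharpness indices in the table can be computed as below: EME follows Agaian et al.'s measure of enhancement (cited above), and Avegrad is the mean local gradient magnitude. The block grid and normalization here are assumptions, so absolute values may differ from the paper's:

```python
import numpy as np

def eme(gray, blocks=(8, 8), eps=1e-6):
    """EME: mean over blocks of 20*log10(Imax/Imin); gray in [0, 1]."""
    h, w = gray.shape
    bh, bw = h // blocks[0], w // blocks[1]
    total = 0.0
    for i in range(blocks[0]):
        for j in range(blocks[1]):
            patch = gray[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            total += 20 * np.log10((patch.max() + eps) / (patch.min() + eps))
    return total / (blocks[0] * blocks[1])

def avegrad(gray):
    """Average gradient: mean RMS of horizontal/vertical differences."""
    gx = np.diff(gray, axis=1)[:-1, :]   # crop both maps to a common shape
    gy = np.diff(gray, axis=0)[:, :-1]
    return np.sqrt((gx**2 + gy**2) / 2).mean()

# Usage: higher EME/Avegrad on the x2 reconstructions is read as sharper
# edges, e.g. eme(img.astype(float) / 255.0), avegrad(img / 255.0).
```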
| Method | AP | Training Time/h | Inference Time/s | Parameters/M | GFLOPs |
|---|---|---|---|---|---|
| FCOS | 0.231 | 63.367 | 0.673 | 32.0 | 190.0 |
| Faster R-CNN | 0.243 | 129.617 | 1.523 | 39.8 | 172.3 |
| YOLOv5 | 0.566 | 38.915 | 0.150 | 47.0 | 115.4 |
| VDNET-RSI | 0.629 | 155.067 | 0.278 | 50.4 | 168.4 |
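The AP column follows the standard precision-recall definition (cf. the Pascal VOC protocol cited in the references). A minimal sketch, assuming all-point interpolation and detections already matched to ground truth at a fixed IoU threshold; the paper's exact threshold and interpolation scheme are not stated here:

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """scores: confidence per detection; is_tp: 1 if the detection matched
    an unused ground-truth box at IoU >= threshold; num_gt: total GT vehicles."""
    order = np.argsort(-np.asarray(scores))          # rank by confidence
    tp = np.asarray(is_tp, dtype=float)[order]
    recall = np.cumsum(tp) / max(num_gt, 1)
    precision = np.cumsum(tp) / np.arange(1, len(tp) + 1)
    # Make precision monotonically non-increasing, then integrate the PR curve.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```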
| Model | Input Image Size/Pixels | AP | Training Time/h | Inference Time/s |
|---|---|---|---|---|
| YOLOv5 | 800 × 800 | 0.566 | 38.915 | 0.150 |
| YOLOv5_CBAM | 800 × 800 | 0.580 | 45.279 | 0.133 |
| YOLOv5_D | 800 × 800 | 0.594 | 62.939 | 0.207 |
| YOLOv5_D_CBAM | 800 × 800 | 0.605 | 68.732 | 0.167 |
| VDNET-RSI | 800 × 800 | 0.629 | 155.067 | 0.278 |
| Model | Advantages | Disadvantages |
|---|---|---|
| FCOS | No anchor-box-related operations are needed, which greatly reduces computational complexity and memory consumption during training. | Insufficient shallow feature extraction leads to many missed vehicles and low precision. |
| Faster R-CNN | A multi-task loss function unifies object classification and candidate-box regression, optimizes the number of candidate boxes, and improves detection speed. | Fixed anchor boxes are ill-suited to small objects; weak small-object feature extraction yields unsatisfactory vehicle detection in remote sensing images. |
| YOLOv5 | Adaptive anchor-box calculation and a feature fusion module give relatively stronger feature extraction and better detection performance. | Many vehicles smaller than 8 × 8 pixels go undetected. |
| UIU-Net | The "U-Net in U-Net" framework realizes multi-level, multi-scale feature learning and performs well for small-object detection in infrared images. | On optical images it misses small objects, falsely detects similar ground objects across classes, and delineates small-object edges poorly. |
| EESRGAN | A small-object detection architecture combining ESRGAN with an edge-enhancement network (EEN); detects oil storage tanks and vehicles at GSDs of 30 cm and 1.2 m. | Directly transferring the detector to other datasets produces erroneous detection boxes, causing missed and false detections for small, densely packed vehicles. |
| VDNET-RSI | Improves the spatial resolution of small objects, adds detection heads in the detection layer for small-object prediction, and inserts CBAM modules in the Neck to suppress irrelevant interfering features, improving vehicle detection precision. | A small number of tiny and small vehicles are still missed or falsely detected. |