MDCT: Multi-Kernel Dilated Convolution and Transformer for One-Stage Object Detection of Remote Sensing Images
Abstract
1. Introduction
- RSIs frequently contain many small objects that occupy only a few pixels. Because a convolutional neural network extracts features layer by layer, small and medium-sized objects are hard to preserve in the highest-level feature map. The model therefore cannot collect enough object-feature information and tends to miss small objects during detection.
- Many RSI scenes contain densely packed objects, so candidate boxes overlap during detection and interfere with one another. Post-processing then often eliminates, by mistake, a candidate box that actually contains an object.
- RSIs are captured from different shooting angles, and object sizes vary widely; a single image often contains objects at scales quite different from those in natural images. Both the shooting angle and the object scale can therefore degrade remote sensing object detection (RSOD) performance.
- RSIs have complex backgrounds because they are usually captured over a wide viewing angle and contain regions of many different colors. Objects are easily confounded with the background during detection, which makes object detection difficult.
- We propose a feature enhancement module, the MDC block, which combines multiple convolution kernel sizes with dilated convolutions to improve small-object feature extraction and enlarge the receptive field. Integrated into a one-stage object detection model, the block considers both a small object itself and its adjacent spatial features, recovering high-resolution detail from low-resolution feature maps and significantly improving RSI detection accuracy.
- We add a transformer block to the neck of the one-stage detector to address dense scenes and complex backgrounds in RSIs. The model can then exploit information from shallow to deep layers, and the multi-head attention mechanism emphasizes the relevant relationships among image pixels. The block therefore prevents the loss of object information in complex backgrounds and dense scenes.
- To reduce the computational cost, we replace conventional convolution with depthwise separable convolution inside the MDC block, maintaining a balance between accuracy and speed.
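As background for the last point, the MobileNets analysis shows that replacing a standard k × k convolution (M input channels, N output channels, F × F output map) with a depthwise plus pointwise pair cuts the multiply-accumulate cost by a factor of roughly 1/N + 1/k². A small illustrative sketch (variable names are ours, not from the paper):

```python
def conv_macs(k, m, n, f):
    """Multiply-accumulate counts for one convolutional layer.

    k: kernel size, m: input channels, n: output channels,
    f: side length of the (square) output feature map.
    """
    standard = k * k * m * n * f * f              # ordinary k x k convolution
    separable = k * k * m * f * f + m * n * f * f  # depthwise + 1x1 pointwise
    return standard, separable

# Example: a 3x3 layer mapping 64 -> 128 channels on a 56x56 feature map.
std, sep = conv_macs(k=3, m=64, n=128, f=56)
print(f"cost ratio: {sep / std:.3f}")  # close to 1/128 + 1/9, i.e. about 0.119
```

The ratio is independent of the spatial size, which is why the saving carries over to feature maps of any resolution.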
2. Related Work
2.1. Traditional Remote Sensing Object Detection Methods
2.2. Remote Sensing Object Detection Based on DL
2.3. Remote Sensing Object Detection Based on Transformer
2.4. Remote Sensing Object Detection Based on Context Information
3. Methodology
3.1. Proposed Model Overview
3.2. Feature Enhancement Module
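The Introduction describes the MDC block as combining multiple kernel sizes with dilated convolutions to enlarge the receptive field. As a hedged illustration of the underlying operation (not the authors' implementation), the sketch below applies a single-channel 2-D convolution whose kernel taps are spread apart by a dilation factor, so a k × k kernel covers an effective window of k + (k - 1)(d - 1) pixels without adding weights:

```python
import numpy as np

def dilated_conv2d(x, w, dilation=1):
    """Single-channel 2-D convolution with a dilated kernel (valid padding).

    Dilation inserts (dilation - 1) gaps between kernel taps, enlarging the
    effective kernel from k to k + (k - 1) * (dilation - 1).
    """
    k = w.shape[0]
    k_eff = k + (k - 1) * (dilation - 1)
    h, wd = x.shape
    out = np.zeros((h - k_eff + 1, wd - k_eff + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the input at dilated (strided) positions inside the window
            patch = x[i:i + k_eff:dilation, j:j + k_eff:dilation]
            out[i, j] = np.sum(patch * w)
    return out
```

With a 3 × 3 kernel, dilations 1, 2, and 3 give effective windows of 3, 5, and 7 pixels; fusing the outputs of such parallel branches is the multi-kernel idea the block's name refers to.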
3.3. Transformer Block
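The transformer block rests on multi-head self-attention, which re-weights each position's features by its similarity to every other position. Below is a minimal NumPy sketch of the standard scaled dot-product formulation (random matrices stand in for learned projections; this is not the paper's exact block):

```python
import numpy as np

def multi_head_attention(x, num_heads, rng):
    """Scaled dot-product multi-head self-attention over n tokens of width d."""
    n, d = x.shape
    dh = d // num_heads  # per-head dimension
    # Random projections stand in for learned weight matrices.
    wq, wk, wv, wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    q, k, v = x @ wq, x @ wk, x @ wv
    heads = []
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = q[:, s] @ k[:, s].T / np.sqrt(dh)    # similarity of every pair
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)      # softmax over key positions
        heads.append(attn @ v[:, s])                  # weighted sum of values
    return np.concatenate(heads, axis=-1) @ wo        # merge heads

rng = np.random.default_rng(0)
tokens = rng.standard_normal((49, 32))  # e.g. a flattened 7x7 feature map
out = multi_head_attention(tokens, num_heads=4, rng=rng)
```

Because every token attends to every other token, features of an object in a cluttered or dense scene can draw on context from the whole feature map, which is the property the neck exploits.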
3.4. One-Stage Multiscale Detection Network
3.4.1. Backbone Network
3.4.2. Neck Network
3.4.3. Detection Head
3.5. Loss Function
Algorithm 1: One-stage RSOD algorithm.
4. Experiments
4.1. Dataset Introduction
- DIOR Dataset. The DIOR dataset is a large-scale object detection benchmark for RSIs. It includes 23,463 optical images and 192,472 instances across 20 object classes: airplane, airport, baseball field (BF), basketball court (BC), bridge, chimney, dam, expressway service area (ESA), expressway toll station (ETS), golf course (GC), ground track field (GF), harbor, overpass, ship, stadium, storage tank (ST), tennis court (TC), train station (TS), vehicle, and windmill.
- DOTA Dataset. The DOTA dataset is one of the largest datasets for object detection in RSIs. It contains 2806 RSIs collected from a range of sensors and platforms. Each image contains objects of various sizes, orientations, and shapes, with resolutions ranging from 800 × 800 to 4000 × 4000 pixels. Experts annotated the images with 15 common object categories: plane, baseball diamond (BD), bridge, ground track field (GF), small vehicle (SV), large vehicle (LV), ship, tennis court (TC), basketball court (BC), storage tank (ST), soccer ball field (SF), roundabout (RA), harbor, swimming pool (SP), and helicopter.
- NWPU VHR-10 Dataset. The NWPU VHR-10 dataset consists of 800 high-resolution images selected from the Google Earth and Vaihingen databases, manually annotated by experts with 10 common object categories: airplane, ship, storage tank (ST), baseball diamond (BD), tennis court (TC), basketball court (BC), ground track field (GF), harbor, bridge, and vehicle.
4.2. Evaluation Metrics
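The per-class scores and mAP values reported below are computed at an intersection-over-union (IoU) threshold: under mAP@0.5, a detection counts as correct when its box overlaps a ground-truth box of the same class with IoU of at least 0.5. A minimal IoU helper for axis-aligned boxes (our illustration, using (x1, y1, x2, y2) corners):

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # intersection bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1 over union area 7
```

mAP@0.5:0.95 averages the AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05, so it rewards tighter localization than the single-threshold score.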
4.2.1. Experiments on the DIOR Dataset
4.2.2. Experiments on the DOTA Dataset
4.2.3. Experiments on the NWPU VHR-10 Dataset
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, Q.; Cong, R.; Li, C.; Cheng, M.M.; Fang, Y.; Cao, X.; Zhao, Y.; Kwong, S. Dense Attention Fluid Network for Salient Object Detection in Optical Remote Sensing Images. IEEE Trans. Image Process. 2020, 30, 1305–1317.
- Zhong, P.; Wang, R. A multiple conditional random fields ensemble model for urban area detection in remote sensing optical images. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3978–3988.
- He, Z. Deep Learning in Image Classification: A Survey Report. In Proceedings of the 2020 2nd International Conference on Information Technology and Computer Application (ITCA), Guangzhou, China, 18–20 December 2020; pp. 174–177.
- Lim, J.-S.; Astrid, M.; Yoon, H.-J.; Lee, S.-I. Small object detection using context and attention. In Proceedings of the 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, 13–16 April 2021; pp. 181–186.
- Zhang, H.; Luo, G.; Li, J.; Wang, F.-Y. C2FDA: Coarse-to-Fine Domain Adaptation for Traffic Object Detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 12633–12647.
- Li, Z.; Wang, Y.; Zhang, N.; Zhang, Y.; Zhao, Z.; Xu, D.; Ben, G.; Gao, Y. Deep Learning-Based Object Detection Techniques for Remote Sensing Images: A Survey. Remote Sens. 2022, 14, 2385.
- Lin, A.; Sun, X.; Wu, H.; Luo, W.; Wang, D.; Zhong, D.; Wang, Z.; Zhao, L.; Zhu, J. Identifying Urban Building Function by Integrating Remote Sensing Imagery and POI Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8864–8875.
- Chen, Z.; Zhou, Q.; Liu, J.; Wang, L.; Ren, J.; Huang, Q.; Deng, H.; Zhang, L.; Li, D. Charms—China agricultural remote sensing monitoring system. In Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Vancouver, BC, Canada, 24–29 July 2011; pp. 3530–3533.
- Shimoni, M.; Haelterman, R.; Perneel, C. Hyperspectral imaging for military and security applications: Combining myriad processing and sensing techniques. IEEE Geosci. Remote Sens. Mag. 2019, 7, 101–107.
- Dong, R.; Xu, D.; Zhao, J.; Jiao, L.; An, J. Sig-NMS-Based Faster R-CNN Combining Transfer Learning for Small Target Detection in VHR Optical Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8534–8545.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
- Liao, L.; Du, L.; Guo, Y. Semi-Supervised SAR Target Detection Based on an Improved Faster R-CNN. Remote Sens. 2021, 14, 143.
- Liu, X.; Yang, Z.; Hou, J.; Huang, W. Dynamic Scene’s Laser Localization by NeuroIV-Based Moving Objects Detection and LiDAR Points Evaluation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5230414.
- Zhang, Y.; Zhang, Y.; Qi, J.; Bin, K.; Wen, H.; Tong, X.; Zhong, P. Adversarial Patch Attack on Multi-Scale Object Detection for UAV Remote Sensing Images. Remote Sens. 2022, 14, 5298.
- Qu, J.; Su, C.; Zhang, Z.; Razi, A. Dilated convolution and feature fusion SSD network for small object detection in remote sensing images. IEEE Access 2020, 8, 82832–82843.
- Zhang, J.; Zhao, H.; Li, J. TRS: Transformers for Remote Sensing Scene Classification. Remote Sens. 2021, 13, 4143.
- Zheng, K.; Dong, Y.; Xu, W.; Su, Y.; Huang, P. A Method of Fusing Probability-Form Knowledge into Object Detection in Remote Sensing Images. Remote Sens. 2022, 14, 6103.
- Kim, T.; Park, S.R.; Kim, M.G.; Jeong, S.; Kim, K.O. Tracking Road Centerlines from High Resolution Remote Sensing Images by Least Squares Correlation Matching. Photogramm. Eng. Remote Sens. 2004, 70, 1417–1422.
- An, R.; Gong, P.; Wang, H.; Feng, X.; Xiao, P.; Chen, Q.; Yan, P. A modified PSO algorithm for remote sensing image template matching. Photogramm. Eng. Remote Sens. 2010, 76, 379–389.
- Rizvi, I.A.; Mohan, K.B. Object-Based Image Analysis of High-Resolution Satellite Images Using Modified Cloud Basis Function Neural Network and Probabilistic Relaxation Labeling Process. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4815–4820.
- Lucchese, L.; Mitra, S.K. Color image segmentation: A state-of-the-art survey. Proc. Indian Natl. Sci. Acad. 2001, 67, 207–221.
- Huang, Y.; Wu, Z.; Wang, L.; Tan, T. Feature Coding in Image Classification: A Comprehensive Study. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 493–506.
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005.
- Fei-Fei, L.; Perona, P. A Bayesian hierarchical model for learning natural scene categories. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005.
- Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790.
- Cheng, G.; Han, J.; Guo, L.; Qian, X.; Zhou, P.; Yao, X.; Hu, X. Object detection in remote sensing imagery using a discriminatively trained mixture model. ISPRS J. Photogramm. Remote Sens. 2013, 85, 32–43.
- Shi, Z.W.; Yu, X.R.; Jiang, Z.G.; Li, B. Ship Detection in High-Resolution Optical Imagery Based on Anomaly Detector and Local Shape Feature. IEEE Trans. Geosci. Remote Sens. 2014, 52, 4511–4523.
- Cao, Y.; Niu, X.; Dou, Y. Region-based convolutional neural networks for object detection in very high resolution remote sensing images. In Proceedings of the 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), Changsha, China, 13–15 August 2016.
- Li, K.; Cheng, G.; Bu, S.; You, X. Rotation-Insensitive and Context-Augmented Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2337–2348.
- Zhang, Y.H.; Fu, K.; Sun, H.; Sun, X.; Zheng, X.W.; Wang, H.Q. A multi-model ensemble method based on convolutional neural networks for aircraft detection in large remote sensing images. Remote Sens. Lett. 2018, 9, 11–20.
- Liu, W.; Ma, L.; Chen, H. Arbitrary-Oriented Ship Detection Framework in Optical Remote-Sensing Images. IEEE Geosci. Remote Sens. Lett. 2018, 15, 937–941.
- Li, Y.; Mao, H.; Liu, R.; Pei, X.; Jiao, L.; Shang, R. A Lightweight Keypoint-Based Oriented Object Detection of Remote Sensing Images. Remote Sens. 2021, 13, 2459.
- Lu, X.; Ji, J.; Xing, Z.; Miao, Q. Attention and feature fusion SSD for remote sensing object detection. IEEE Trans. Instrum. Meas. 2021, 70, 5501309.
- Xu, D.; Wu, Y. Improved YOLO-V3 with DenseNet for multi-scale remote sensing target detection. Sensors 2020, 20, 4276.
- Yang, X.; Liu, Q.; Yan, J.; Li, A. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. arXiv 2019, arXiv:1908.05612.
- Qian, W.; Yang, X.; Peng, S.; Guo, Y.; Yan, J. Learning Modulated Loss for Rotated Object Detection. arXiv 2019, arXiv:1911.08299.
- Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459.
- Xiao, Z.; Qian, L.; Shao, W.; Tan, X.; Wang, K. Axis learning for orientated objects detection in aerial images. Remote Sens. 2020, 12, 908.
- Wei, H.; Zhang, Y.; Chang, Z.; Li, H.; Wang, H.; Sun, X. Oriented objects as pairs of middle lines. ISPRS J. Photogramm. Remote Sens. 2020, 169, 268–279.
- Qing, Y.; Liu, W.; Feng, L.; Gao, W. Improved YOLO network for free-angle remote sensing target detection. Remote Sens. 2021, 13, 2171.
- Lang, K.; Yang, M.; Wang, H.; Wang, H.; Wang, Z.; Zhang, J.; Shen, H. Improved One-Stage Detectors with Neck Attention Block for Object Detection in Remote Sensing. Remote Sens. 2022, 14, 5805.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. CvT: Introducing Convolutions to Vision Transformers. arXiv 2021, arXiv:2103.15808.
- Zheng, Y.; Sun, P.; Zhou, Z.; Xu, W.; Ren, Q. ADT-Det: Adaptive Dynamic Refined Single-Stage Transformer Detector for Arbitrary-Oriented Object Detection in Satellite Optical Imagery. Remote Sens. 2021, 13, 2623.
- Gong, H.; Mu, T.; Li, Q.; Dai, H.; Li, C.; He, Z.; Wang, B. Swin-Transformer-Enabled YOLOv5 with Attention Mechanism for Small Object Detection on Satellite Images. Remote Sens. 2022, 14, 2861.
- Zhou, Y.; Chen, S.; Zhao, J.; Yao, R.; Xue, Y.; El Saddik, A. CLT-Det: Correlation Learning Based on Transformer for Detecting Dense Objects in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4708915.
- Li, Q.; Chen, Y.; Zeng, Y. Transformer with Transfer CNN for Remote-Sensing-Image Object Detection. Remote Sens. 2022, 14, 984.
- Zhou, T.; Li, L.; Bredell, G.; Li, J.; Konukoglu, E. Volumetric memory network for interactive medical image segmentation. Med. Image Anal. 2022, 83, 102599.
- Zhou, T.; Li, J.; Wang, S.; Tao, R.; Shen, J. MATNet: Motion-attentive transition network for zero-shot video object segmentation. IEEE Trans. Image Process. 2020, 29, 8326–8338.
- Li, J.; Wei, Y.; Liang, X.; Dong, J.; Xu, T.; Feng, J.; Yan, S. Attentive Contexts for Object Detection. IEEE Trans. Multimed. 2017, 19, 944–954.
- Bell, S.; Zitnick, C.L.; Bala, K.; Girshick, R. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2874–2883.
- Wang, S.; Zhou, T.; Lu, Y.; Di, H. Contextual Transformation Network for Lightweight Remote-Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5615313.
- Wang, Y.; Xu, C.; Liu, C.; Li, Z. Context Information Refinement for Few-Shot Object Detection in Remote Sensing Images. Remote Sens. 2022, 14, 3255.
- Ma, W.; Guo, Q.; Wu, Y.; Zhao, W.; Zhang, X.; Jiao, L. A Novel Multi-Model Decision Fusion Network for Object Detection in Remote Sensing Images. Remote Sens. 2019, 11, 737.
- Wei, Y.; Xiao, H.; Shi, H.; Jie, Z.; Feng, J.; Huang, T.S. Revisiting dilated convolution: A simple approach for weakly- and semi-supervised semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7268–7277.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Ultralytics YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 January 2021).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
- Xia, G.-S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983.
- Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132.
- Cheng, G.; Si, Y.; Hong, H.; Yao, X.; Guo, L. Cross-Scale Feature Fusion for Object Detection in Optical Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2021, 18, 431–435.
- Yuan, Z.; Liu, Z.; Zhu, C.; Qi, J.; Zhao, D. Object Detection in Remote Sensing Images via Multi-Feature Pyramid Network with Receptive Field Block. Remote Sens. 2021, 13, 862.
- Wang, J.; Wang, Y.; Wu, Y.; Zhang, K.; Wang, Q. FRPNet: A Feature-Reflowing Pyramid Network for Object Detection of Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8004405.
- Wang, G.; Zhuang, Y.; Chen, H.; Liu, X.; Zhang, T.; Li, L.; Dong, S.; Sang, Q. FSoD-Net: Full-Scale Object Detection From Optical Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5602918.
- Tian, Z.; Zhan, R.; Hu, J.; Wang, W.; He, Z.; Zhuang, Z. Generating Anchor Boxes Based on Attention Mechanism for Object Detection in Remote Sensing Images. Remote Sens. 2020, 12, 2416.
- Shi, L.; Kuang, L.; Xu, X.; Pan, B.; Shi, Z. CANet: Centerness-Aware Network for Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5603613.
- Wang, G.; Zhuang, Y.; Wang, Z.; Chen, H.; Shi, H.; Chen, L. Spatial Enhanced-SSD for Multiclass Object Detection in Remote Sensing Images. In Proceedings of the IGARSS 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 318–321.
- Chen, J.; Wan, L.; Zhu, J.; Xu, G.; Deng, M. Multi-Scale Spatial and Channel-wise Attention for Improving Object Detection in Remote Sensing Imagery. IEEE Geosci. Remote Sens. Lett. 2020, 17, 681–685.
- Zheng, Z.; Zhong, Y.; Ma, A.; Han, X.; Zhao, J.; Liu, Y.; Zhang, L. HyNet: Hyper-scale object detection network framework for multiple spatial resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2020, 166, 1–14.
- Zhang, G.; Lu, S.; Zhang, W. CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 10015–10024.
- Zhang, W.; Jiao, L.; Li, Y.; Huang, Z.; Wang, H. Laplacian Feature Pyramid Network for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5604114.
Method | TRD [48] | Ref. [64] | MFPNet [65] | FRPNet [66] | FSoD-Net [67] | Ref. [68] | CANet [69] | MDCT (ours) |
---|---|---|---|---|---|---|---|---|
Proposals | HBB | HBB | HBB | HBB | HBB | HBB | HBB | HBB |
Train/Test Split | 50%, 50% | 50%, 50% | 50%, 50% | 50%, 50% | 50%, 50% | 50%, 50% | 50%, 50% | 50%, 50% |
Airplane | 77.9 | 57.2 | 76.6 | 64.5 | 88.9 | 70.5 | 70.3 | 92.5 | |
Airport | 80.5 | 79.6 | 83.4 | 82.6 | 66.9 | 81.9 | 82.4 | 85.0 | |
BF | 70.1 | 70.1 | 80.6 | 77.7 | 86.8 | 76.5 | 72.0 | 93.5 | |
BC | 86.3 | 87.4 | 85.1 | 81.7 | 90.2 | 89.3 | 87.8 | 84.7 | |
Bridge | 39.7 | 46.1 | 44.3 | 47.1 | 45.5 | 49.0 | 55.7 | 53.7 | |
Chimney | 77.9 | 76.6 | 75.6 | 69.6 | 79.6 | 79.5 | 79.9 | 90.2 | |
Dam | 59.3 | 62.7 | 68.5 | 50.6 | 48.2 | 66.0 | 67.7 | 74.3 | |
ESA | 59.0 | 82.6 | 85.9 | 80.0 | 86.9 | 85.2 | 83.5 | 79.9 | |
ETS | 54.4 | 73.2 | 63.9 | 71.7 | 75.5 | 71.9 | 77.2 | 68.2 | |
GC | 74.6 | 78.2 | 77.3 | 81.3 | 67.0 | 81.2 | 77.3 | 68.6 | |
GF | 73.9 | 81.6 | 77.2 | 77.4 | 77.3 | 83.3 | 83.6 | 92.9 | |
Harbor | 49.2 | 50.7 | 62.1 | 78.7 | 53.6 | 52.8 | 56.0 | 68.4 | |
Overpass | 57.8 | 59.5 | 58.8 | 82.4 | 59.7 | 62.2 | 63.6 | 83.8 | |
Ship | 74.2 | 73.3 | 77.2 | 62.9 | 78.3 | 77.1 | 81.0 | 92.9 | |
Stadium | 61.1 | 63.4 | 76.8 | 72.6 | 69.9 | 76.0 | 79.8 | 77.4 | |
ST | 69.8 | 58.5 | 60.3 | 67.6 | 75.0 | 72.4 | 70.8 | 83.0 | |
TC | 84.0 | 85.9 | 86.4 | 81.2 | 91.4 | 87.7 | 88.2 | 92.8 | |
TS | 58.8 | 61.9 | 64.5 | 65.2 | 52.3 | 64.1 | 67.6 | 64.7 | |
Vehicle | 50.5 | 42.9 | 41.5 | 52.7 | 52.0 | 55.0 | 51.2 | 77.4 | |
WindMill | 77.2 | 86.9 | 80.2 | 89.1 | 90.6 | 90.3 | 89.6 | 83.0 | |
mAP | 66.8 | 68.0 | 71.2 | 71.8 | 71.8 | 73.6 | 74.3 | 80.5 |
Method | SE-SSD [70] | Ref. [71] | FSoD-Net [67] | HyNet [72] | YOLOv5 | MDCT (ours) |
---|---|---|---|---|---|---|
Proposals | HBB | HBB | HBB | HBB | HBB | HBB |
Train/Val/Test Split | 50%, 16%, 34% | 50%, 16%, 34% | 50%, 16%, 34% | 50%, 16%, 34% | 50%, 16%, 34% | 50%, 16%, 34% |
Plane | 88.6 | 78.1 | 93.1 | 86.7 | 92.9 | 93.8 | |
BD | 74.6 | 67.7 | 71.1 | 58.7 | 85.3 | 88.6 | |
Bridge | 46.5 | 28.3 | 51.3 | 43.7 | 56.8 | 59.4 | |
GF | 61.9 | 42.1 | 57.5 | 54.0 | 71.9 | 74.6 | |
SV | 74.7 | 24.0 | 78.2 | 64.4 | 71.6 | 76.6 | |
LV | 73.9 | 62.2 | 82.2 | 80.5 | 86.2 | 88.2 |
Ship | 86.3 | 48.3 | 92.4 | 87.8 | 93.5 | 95.0 | |
TC | 90.4 | 82.6 | 92.9 | 85.1 | 95.0 | 96.2 | |
BC | 81.3 | 45.0 | 78.9 | 53.9 | 74.3 | 77.4 | |
ST | 79.9 | 38.4 | 93.5 | 60.3 | 77.2 | 76.2 | |
SF | 41.7 | 40.5 | 53.3 | 41.8 | 47.3 | 49.1 | |
RA | 56.9 | 36.4 | 70.6 | 47.6 | 73.0 | 73.6 | |
Harbor | 71.7 | 69.2 | 64.9 | 76.6 | 83.1 | 84.9 | |
SP | 73.4 | 38.5 | 78.7 | 48.7 | 61.5 | 60.6 | |
Helicopter | 59.6 | 36.1 | 64.9 | 40.3 | 53.5 | 43.7 | |
mAP | 70.8 | 49.2 | 75.33 | 62.0 | 74.8 | 75.7 |
Method | Proposals | Train/Test Split | Airplane | Ship | ST | BD | TC | BC | GF | Harbor | Bridge | Vehicle | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Ref. [29] | HBB | 75%,25% | 99.7 | 90.8 | 90.6 | 92.9 | 90.3 | 80.1 | 90.8 | 80.3 | 68.5 | 87.1 | 87.1 | |
CAD-Net [73] | HBB | 75%,25% | 97.0 | 77.9 | 95.6 | 93.6 | 87.6 | 87.1 | 99.6 | 100 | 86.2 | 89.9 | 91.5 | |
FPNet [74] | HBB | 75%,25% | 100 | 89.5 | 90.9 | 96.8 | 96.6 | 99.2 | 100 | 90.1 | 79.0 | 90.2 | 93.2 | |
CANet [69] | HBB | 75%,25% | 99.9 | 86.0 | 99.3 | 97.3 | 97.8 | 84.8 | 98.4 | 90.4 | 89.2 | 90.3 | 93.3 | |
YOLOv5 | HBB | 75%,25% | 99.6 | 84.7 | 95.9 | 97.4 | 98.9 | 88.8 | 99.5 | 95.2 | 78.5 | 88.0 | 92.8 | |
MDCT(ours) | HBB | 75%,25% | 99.6 | 90.2 | 97.6 | 98.1 | 99.0 | 92.9 | 99.4 | 97.4 | 87.7 | 94.9 | 95.7 |
Method | YOLOv5 | YOLOv5+MDC-K | YOLOv5+MDC-D | YOLOv5+MDC | MDCT (ours) |
---|---|---|---|---|---|
Airplane | 93.7 | 93.8 | 94.3 | 94.3 | 92.5 | |
Airport | 80.8 | 81.2 | 83.2 | 82.8 | 85.0 | |
BF | 93.3 | 93.7 | 93.2 | 93.4 | 93.5 | |
BC | 83.6 | 86.6 | 86.5 | 86.1 | 84.7 | |
Bridge | 55.1 | 54.8 | 53.7 | 53.5 | 53.7 | |
Chimney | 90.2 | 90.0 | 91.1 | 91.4 | 90.2 | |
Dam | 67.0 | 67.5 | 66.7 | 67.0 | 74.3 | |
ESA | 72.9 | 75.6 | 78.4 | 81.0 | 79.9 | |
ETS | 66.5 | 67.2 | 68.9 | 68.3 | 68.2 | |
GC | 65.6 | 66.5 | 66.6 | 68.4 | 68.6 | |
GF | 93.9 | 93.8 | 94.2 | 94.2 | 92.9 | |
Harbor | 66.3 | 67.2 | 65.5 | 67.7 | 68.4 | |
Overpass | 83.3 | 84.7 | 84.3 | 84.2 | 93.8 | |
Ship | 91.5 | 91.9 | 92.3 | 92.6 | 92.9 | |
Stadium | 77.9 | 79.6 | 80.3 | 80.0 | 77.4 | |
ST | 80.6 | 83.4 | 81.2 | 82.8 | 83.0 |
TC | 92.1 | 93.3 | 93.2 | 93.6 | 92.8 | |
TS | 59.1 | 59.9 | 62.3 | 61.4 | 64.7 | |
Vehicle | 77.9 | 79.6 | 80.3 | 80.0 | 77.4 | |
WindMill | 79.6 | 83.4 | 81.9 | 82.8 | 83.0 | |
mAP@0.5 | 78.2 | 79.3 | 79.8 | 80.2 | 80.5 |
mAP@0.5:0.95 | 53.2 | 53.6 | 54.7 | 54.9 | 55.4 |
Precision | 84.4 | 84.6 | 85.2 | 88.2 | 87.8 | |
Recall | 75.5 | 75.3 | 75.6 | 75.0 | 76.2 | |
Parameters (M) | 7.1 | 10.3 | 9.65 | 8.0 | 9.6 |
GFLOPs | 16.5 | 32.9 | 29.6 | 21.4 | 23.6 |
Inference Time (ms) | 8.2 | 19.0 | 9.0 | 10.4 | 12.8 |
Method | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | Parameters (M) | GFLOPs | Inference Time (ms) |
---|---|---|---|---|---|---|---|
YOLOv5+MDC (no DWS) | 80.2 | 55.6 | 87.7 | 75.4 | 15.8 | 54.7 | 10.1 |
YOLOv5+MDC | 80.2 | 54.9 (−0.7) | 88.2 (+0.5) | 75.0 (−0.4) | 8.0 (−7.8) | 21.4 (−33.3) | 10.4 (+0.3) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, J.; Hong, H.; Song, B.; Guo, J.; Chen, C.; Xu, J. MDCT: Multi-Kernel Dilated Convolution and Transformer for One-Stage Object Detection of Remote Sensing Images. Remote Sens. 2023, 15, 371. https://doi.org/10.3390/rs15020371