Improved YOLO-V3 with DenseNet for Multi-Scale Remote Sensing Target Detection
Abstract
1. Introduction
2. The Theory of YOLO
2.1. The Principle of YOLO
2.2. The Network of YOLO-V3
3. Related Work
3.1. Improved Densely Connected Network
3.2. The Proposed Algorithm with Multi-Scale Detection
3.3. K-Means for Anchor Boxes
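| Algorithm 1: The pseudocode of K-means |
|---|
| 1: Given K cluster center points $(W_i, H_i)$, $i \in \{1, 2, \ldots, K\}$, where $W_i$ and $H_i$ refer to the width and height of each anchor box. |
| 2: Calculate the distance between each ground truth and each cluster center: $d = 1 - \mathrm{IOU}[(x_j, y_j, w_j, h_j), (x_j, y_j, W_i, H_i)]$, $j \in \{1, 2, \ldots, N\}$. Since the position of the anchor box is not fixed, the center point of each ground truth is made coincident with the cluster center. |
| 3: Recalculate the cluster center for each cluster: $W_i' = \frac{1}{N_i} \sum w$, $H_i' = \frac{1}{N_i} \sum h$, where the sums run over the $N_i$ ground truths assigned to cluster $i$. |
| 4: Repeat step 2 and step 3 until the clusters converge. |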
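
The four steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes the distance $d = 1 - \mathrm{IOU}$ that is commonly used for YOLO anchor clustering, computes the IOU from widths and heights only (which is what coincident centers amount to), and uses hypothetical names (`iou_wh`, `kmeans_anchors`) and a random initialization.

```python
import random


def iou_wh(box, centroid):
    """IOU of two (width, height) boxes whose centers coincide."""
    w, h = box
    cw, ch = centroid
    inter = min(w, cw) * min(h, ch)
    return inter / (w * h + cw * ch - inter)


def kmeans_anchors(boxes, k, max_iter=100, seed=0):
    """Cluster (width, height) pairs into k anchors using d = 1 - IOU (assumed)."""
    rng = random.Random(seed)
    # Step 1: pick K initial cluster centers from the ground-truth boxes.
    centroids = rng.sample(boxes, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each ground truth to the nearest center.
        new_assignment = [
            min(range(k), key=lambda i: 1.0 - iou_wh(box, centroids[i]))
            for box in boxes
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each center as the mean width and height.
        for i in range(k):
            members = [b for b, a in zip(boxes, assignment) if a == i]
            if members:  # keep the old center if a cluster becomes empty
                centroids[i] = (
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                )
    # Return anchors ordered from smallest to largest area.
    return sorted(centroids, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    demo_boxes = [(10, 10), (12, 11), (11, 12), (100, 90), (95, 105), (110, 100)]
    print(kmeans_anchors(demo_boxes, k=2))
```

Using the IOU rather than the Euclidean distance keeps large boxes from dominating the clustering error, which is the usual motivation for this choice.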
3.4. Relative to the Grid Cell
3.5. The NMS Algorithm for Merging Bounding Boxes
- Step 1: Take the bounding box with the highest confidence as the reference box, and compute the IOU between it and each of the remaining boxes.
- Step 2: If the IOU is larger than the preset threshold, remove that box from the remaining bounding boxes.
- Step 3: Take the box with the next-highest confidence among the remaining boxes as the new reference, and repeat Step 1 and Step 2 until no unprocessed bounding boxes are left.
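| Algorithm 2: The pseudocode of non-maximum suppression (NMS) for our approach |
|---|
| Original bounding boxes: $B = \{b_1, \ldots, b_N\}$, $S = \{s_1, \ldots, s_N\}$; $B$ refers to the set of original bounding boxes, $S$ refers to the set of confidences of $B$, and $N_t$ is the IOU threshold |
| Detection result: $D$ refers to the set of the final bounding boxes |
| 1: $D \leftarrow \emptyset$ |
| 2: while $B \neq \emptyset$ do: |
| 3: $m \leftarrow \arg\max S$ |
| 4: $D \leftarrow D \cup \{b_m\}$; $B \leftarrow B - \{b_m\}$; $S \leftarrow S - \{s_m\}$ |
| 5: for $b_i \in B$ do: |
| 6: if $\mathrm{IOU}(b_m, b_i) > N_t$ |
| 7: $B \leftarrow B - \{b_i\}$; $S \leftarrow S - \{s_i\}$ |
| 8: end |
| 9: end |
| 10: end |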
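
The three NMS steps listed above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: boxes are assumed to be `(x1, y1, x2, y2)` corner coordinates, and the function names and the default threshold of 0.5 are hypothetical.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS."""
    # Process candidates in order of decreasing confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 1: the most confident remaining box becomes the reference.
        best = order.pop(0)
        keep.append(best)
        # Step 2: drop every remaining box whose IOU with it exceeds the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
        # Step 3: the loop repeats with the next most confident survivor.
    return keep


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(nms(demo_boxes, demo_scores))
```

In the demo, the second box overlaps the first with an IOU of about 0.68 and is suppressed, while the third box does not overlap and is kept.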
4. Experiment and Results
4.1. Loss Function
4.2. The Evaluation Indicators
4.3. Experiment on Remote Sensing Target Detection
- Scale diversity. Remote sensing images can be taken from heights of a few hundred meters to nearly 10,000 m, and ground targets may differ in size even within the same class. For example, ships in ports may range from only tens of meters to more than 300 m in length.
- Perspective particularity. Remote sensing images are taken from an essentially overhead view, whereas most conventional datasets are captured at ground level, so the appearance of the same target usually differs. A detector trained well on conventional datasets may therefore perform poorly on remote sensing images.
- Problem of small targets. Most remote sensing targets are small, so the available target information is limited, and much of it is lost in the down-sampling layers of a Convolutional Neural Network (CNN). After four rounds of 2× down-sampling (a factor of 16), a 24 × 24 pixel target may occupy only about 1 pixel of the feature map.
- Problem of multi-directions. Remote sensing images are usually taken from overhead, so target orientations are arbitrary, whereas in conventional datasets the orientations are largely predictable.
- The high complexity of the background. The field of view of a remote sensing image is relatively large (usually covering several square kilometers) and may contain various backgrounds, which strongly interfere with target detection.
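
The small-target arithmetic in the list above can be checked with a short sketch. It assumes that every down-sampling layer halves the resolution (stride 2, as in the 3 × 3/2 convolutions of the network tables); the function names are hypothetical.

```python
def cells_spanned(target_px, num_downsamples, stride=2):
    """Side length, in feature-map cells, of a target after repeated down-sampling."""
    return target_px / stride ** num_downsamples


def grid_size(input_px, num_downsamples, stride=2):
    """Side length of the feature map produced from a square input image."""
    return input_px // stride ** num_downsamples


if __name__ == "__main__":
    # A 24 x 24 pixel target after one to five stride-2 down-sampling layers.
    for n in range(1, 6):
        print(n, cells_spanned(24, n))
    # Feature-map sizes for a 416 x 416 input after 2 to 5 down-sampling layers.
    print([grid_size(416, n) for n in range(2, 6)])
```

After four down-sampling layers the 24-pixel target spans 1.5 cells per side, which is consistent with the "about 1 pixel" figure quoted above, and after five layers it spans less than one cell.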
4.3.1. Dataset Analysis
4.3.2. Experimental Results and Analysis on the RSOD and UCAS-AOD Datasets
4.3.3. Ablation Experiments
4.3.4. Expansion Experiment
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations:
| Abbreviation | Full Name |
|---|---|
| YOLO | You Only Look Once |
| CV | Computer Vision |
| SVM | Support Vector Machine |
| HOG | Histograms of Oriented Gradients |
| DPM | Deformable Parts Model |
| IOU | Intersection over Union |
| FC | Fully Connected Layer |
| FCN | Fully Convolutional Network |
| CNN | Convolutional Neural Network |
| GT | Ground Truth |
| RPN | Region Proposal Network |
| FPN | Feature Pyramid Network |
| ResNet | Residual Network |
| DenseNet | Densely Connected Network |
| NMS | Non-Maximum Suppression |
| TP | True Positive |
| FP | False Positive |
| FN | False Negative |
| AP | Average Precision |
| mAP | Mean Average Precision |
| FPS | Frames Per Second |
References
- Shi, W.; Jiang, J.; Bao, S.; Tan, D. CISPNet: Automatic Detection of Remote Sensing Images from Google Earth in Complex Scenes Based on Context Information Scene Perception. Appl. Sci. 2019, 9, 4836.
- Zhong, Y.; Weng, W.; Li, J.; Zhu, S. Collaborative Cross-Domain kNN Search for Remote Sensing Image Processing. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1801–1805.
- Zhu, H.; Zhang, P.; Wang, L.; Zhang, X.; Jiao, L. A multiscale object detection approach for remote sensing images based on MSE-DenseNet and the dynamic anchor assignment. Remote Sens. Lett. 2019, 10, 959–967.
- Zhang, Z.; Chen, J.; Liu, Z. SLIC segmentation method for full-polarised remote-sensing image. J. Eng. 2019, 2019, 6404–6407.
- Shi, Y.; Wang, W.; Gong, Q.; Li, D. Superpixel segmentation and machine learning classification algorithm for cloud detection in remote-sensing images. J. Eng. 2019, 2019, 6675–6679.
- Li, Y.; Xu, J.; Xia, R.; Wang, X.; Xie, W. A two-stage framework of target detection in high-resolution hyperspectral images. Signal Image Video Process. 2019, 13, 1339–1346.
- Li, S.; Xu, Y.; Zhu, M.; Ma, S.; Tang, H. Remote Sensing Airport Detection Based on End-to-End Deep Transferable Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1640–1644.
- Kujawa, S.; Mazurkiewicz, J.; Czekala, W. Using convolutional neural networks to classify the maturity of compost based on sewage sludge and rapeseed straw. J. Clean. Prod. 2020, 258, 120814.
- Xiao, B.; Xu, Y.; Bi, X.; Zhang, J.; Ma, X. Heart sounds classification using a novel 1-D convolutional neural network with extremely low parameter consumption. Neurocomputing 2020, 392, 153–159.
- Hashimoto, R.; Requa, J.; Dao, T.; Ninh, A.; Tran, E.; Mai, D.; Lugo, M.; El-Hage Chehade, N.; Chang, K.J.; Karnes, W.E.; et al. Artificial intelligence using convolutional neural networks for real-time detection of early esophageal neoplasia in Barrett’s esophagus (with video). Gastrointest. Endosc. 2020, 91, 1264–1271.
- Chen, R.-C. Automatic License Plate Recognition via sliding-window darknet-YOLO deep learning. Image Vis. Comput. 2019, 87, 47–56.
- Bilal, M.; Hanif, M.S. Benchmark Revision for HOG-SVM Pedestrian Detector Through Reinvigorated Training and Evaluation Methodologies. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1277–1287.
- Wang, L.; Wu, J.; Wu, D. Research on vehicle parts defect detection based on deep learning. J. Phys. Conf. Ser. 2020, 1437, 012004.
- Zhang, D. Vehicle target detection methods based on color fusion deformable part model. EURASIP J. Wirel. Commun. Netw. 2018, 2018, 1–6.
- Shen, J.; Pan, L.; Hu, X. Building Detection from High Resolution Remote Sensing Imagery Based on a Deformable Part Model. Geomat. Inf. Sci. Wuhan Univ. 2017, 42, 1285–1291. (In Chinese)
- Chen, J.; Takiguchi, T.; Ariki, Y. Rotation-reversal invariant HOG cascade for facial expression recognition. Signal Image Video Process. 2017, 11, 1485–1492.
- Jin, M.; Jeong, K.; Yoon, S.; Park, D.S. Real-time Pedestrian Detection based on GMM and HOG Cascade. In Sixth International Conference on Machine Vision; Verikas, A., Vuksanovic, B., Zhou, J., Eds.; SPIE: Bellingham, WA, USA, 2013; Volume 9067.
- Xu, Z.; Huo, Y.; Liu, K.; Liu, S. Detection of ship targets in photoelectric images based on an improved recurrent attention convolutional neural network. Int. J. Distrib. Sens. Netw. 2020, 16.
- Liu, Z.; Zhang, G.; Zhao, J.; Yu, L.; Sheng, J.; Zhang, N.; Yuan, H. Second-Generation Sequencing with Deep Reinforcement Learning for Lung Infection Detection. J. Healthc. Eng. 2020, 2020.
- Xue, D.; Sun, J.; Hu, Y.; Zheng, Y.; Zhu, Y.; Zhang, Y. Dim small target detection based on convolutinal neural network in star image. Multimed. Tools Appl. 2020, 79, 4681–4698.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 580–587.
- Li, X.; Shang, M.; Qin, H.; Chen, L. Fast Accurate Fish Detection and Recognition of Underwater Images with Fast R-CNN; IEEE: Piscataway, NJ, USA, 2015; pp. 921–925.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems 28; Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R., Eds.; IEEE Computer Society: Los Alamitos, CA, USA, 2015; Volume 28.
- Sun, N.; Zhu, Y.; Hu, X. Faster R-CNN Based Table Detection Combining Corner Locating; IEEE Computer Society: Los Alamitos, CA, USA, 2019; pp. 1314–1319.
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Huang, Z.; Zhong, Z.; Sun, L.; Huo, Q. Mask R-CNN with Pyramid Attention Network for Scene Text Detection. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1550–5790.
- Shih, K.-H.; Chiu, C.-T.; Pu, Y.-Y. Real-Time Object Detection via Pruning and a Concatenated Multi-Feature Assisted Region Proposal Network. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, 12–17 May 2019; pp. 1398–1402.
- Shree, C.; Kaur, R.; Upadhyay, S.; Joshi, J. Multi-Feature Based Automated Flower Harvesting Techniques in Deep Convolutional Neural Networking. In Proceedings of the 2019 4th International Conference on Internet of Things: Smart Innovation and Usages (IoT-SIU), Ghaziabad, India, 18–19 April 2019; p. 6.
- Yuan, J.; Xue, B.; Zhang, W.; Xu, L.; Sun, H.; Zhou, J. RPN-FCN Based Rust Detection on Power Equipment. In 2018 International Conference on Identification, Information and Knowledge in the Internet of Things; Bie, R., Sun, Y., Yu, J., Eds.; Elsevier Science Bv: Amsterdam, The Netherlands, 2019; Volume 147, pp. 349–353.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016, Pt I; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing Ag: Cham, Switzerland, 2016; Volume 9905, pp. 21–37.
- Lin, M.; Bing, L.; Zhiyu, Z.; Aravinda, C.V.; Kamitoku, N.; Yamazaki, K. Oracle Bone Inscription Detector Based on SSD; Springer International Publishing: Cham, Switzerland, 2019; pp. 126–136.
- Tang, J.; Yao, X.; Kang, X.; Shun, N.; Ren, F. Position-Free Hand Gesture Recognition Using Single Shot Multibox Detector Based Neural Network. In Proceedings of the 2019 IEEE International Conference on Mechatronics and Automation (ICMA), Tianjin, China, 4–7 August 2019; pp. 2251–2256.
- Cui, L.; Ma, R.; Lv, P.; Jiang, X.; Gao, Z.; Zhou, B.; Xu, M. MDSSD: Multi-scale deconvolutional single shot detector for small objects. Sci. China Inf. Sci. 2020, 63, 120113.
- Haque, M.F.; Dae-Seong, K. Multi Scale Object Detection Based on Single Shot Multibox Detector with Feature Fusion and Inception Network. J. Korean Inst. Inf. Technol. 2018, 16, 93–100.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Zhang, X.; Qiu, Z.; Huang, P.; Hu, J.; Luo, J. Application Research of YOLO v2 Combined with Color Identification. In Proceedings of the 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, Zhengzhou, China, 18–20 October 2018; pp. 138–141.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. 2018. Available online: https://pjreddie.com/media/files/papers/YOLOv3.pdf (accessed on 30 July 2020).
- Adarsh, P.; Rathi, P.; Kumar, M. YOLO v3-Tiny: Object Detection and Recognition Using one Stage Improved Model. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 687–694.
- He, W.; Huang, Z.; Wei, Z.; Li, C.; Guo, B. TF-YOLO: An Improved Incremental Network for Real-Time Object Detection. Appl. Sci. 2019, 9, 3225.
- Weber, J.; Lefevre, S. A multivariate Hit-or-Miss Transform for Conjoint Spatial and Spectral Template Matching. In Image and Signal Processing; Elmoataz, A., Lezoray, O., Nouboud, F., Mammass, D., Eds.; Springer-Verlag Berlin: Berlin, Germany, 2008; Volume 5099, pp. 226–235.
- Feng, T.; Ma, H.; Cheng, X.; Zhang, H. Calculation of the optimal segmentation scale in object-based multiresolution segmentation based on the scene complexity of high-resolution remote sensing images. J. Appl. Remote Sens. 2018, 12, 025006.
- Sun, H.; Sun, X.; Wang, H.; Li, Y.; Li, X. Automatic Target Detection in High-Resolution Remote Sensing Images Using Spatial Sparse Coding Bag-of-Words Model. IEEE Geosci. Remote Sens. Lett. 2012, 9, 109–113.
- Zhang, P.; Niu, X.; Dou, Y.; Xia, F. Airport Detection on Optical Satellite Images Using Deep Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1183–1187.
- Yu, Y.; Yang, X.; Xiao, S.; Lin, J. Automated Ship Detection from Optical Remote Sensing Images. In Advanced Materials in Microwaves and Optics; Wang, D., Ed.; Trans Tech Publications Ltd.: Zurich, Switzerland, 2012; Volume 500, pp. 785–791.
- Guo, C.; Fan, B.; Zhang, Q.; Xiang, S.; Pan, C. AugFPN: Improving Multi-scale Feature Learning for Object Detection. arXiv 2019, arXiv:1912.05384.
- Wong, F.; Hu, H. Adaptive learning feature pyramid for object detection. IET Comput. Vis. 2019, 13, 742–748.
- Zeng, Y.; Ritz, C.; Zhao, J.; Lan, J. Attention-Based Residual Network with Scattering Transform Features for Hyperspectral Unmixing with Limited Training Samples. Remote Sens. 2020, 12, 400.
- Li, J.; Gu, J.; Huang, Z.; Wen, J. Application Research of Improved YOLO V3 Algorithm in PCB Electronic Component Detection. Appl. Sci. 2019, 9, 3750.
- Ju, M.; Luo, H.; Wang, Z.; Hui, B.; Chang, Z. The Application of Improved YOLO V3 in Multi-Scale Target Detection. Appl. Sci. 2019, 9, 3775.
- Liu, G.; Nouaze, J.C.; Mbouembe, P.L.T.; Kim, J.H. YOLO-Tomato: A Robust Algorithm for Tomato Detection Based on YOLOv3. Sensors 2020, 20, 2145.
- Liu, M.; Wang, X.; Zhou, A.; Fu, X.; Ma, Y.; Piao, C. UAV-YOLO: Small Object Detection on Unmanned Aerial Vehicle Perspective. Sensors 2020, 20, 2238.
- Zhu, Y.; Newsam, S. Densenet for Dense Flow. In Proceedings of the 2017 24th IEEE International Conference on Image Processing, Beijing, China, 17–20 September 2017; pp. 790–794.
- Huang, Z.; Wang, J. DC-SPP-YOLO: Dense connection and spatial pyramid pooling based YOLO for object detection. Inf. Sci. 2020, 522, 241–258.
- Bochkovskiy, A.; Chien-Yao, W.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.

| Repeat | Layer | Filter | Size | Output |
|---|---|---|---|---|
| | Convolutional | 32 | 3 × 3 | 416 × 416 × 32 |
| | Convolutional | 64 | 3 × 3/2 | 208 × 208 × 64 |
| 1× | Convolutional | 32 | 1 × 1 | |
| | Convolutional | 64 | 3 × 3 | |
| | Residual | | | 208 × 208 × 64 |
| | Convolutional | 128 | 3 × 3/2 | 104 × 104 × 128 |
| 2× | Convolutional | 64 | 1 × 1 | |
| | Convolutional | 128 | 3 × 3 | |
| | Residual | | | 104 × 104 × 128 |
| | Convolutional | 256 | 3 × 3/2 | 52 × 52 × 256 |
| 8× | Convolutional | 128 | 1 × 1 | |
| | Convolutional | 256 | 3 × 3 | |
| | Residual | | | 52 × 52 × 256 |
| | Convolutional | 512 | 3 × 3/2 | 26 × 26 × 512 |
| 8× | Convolutional | 256 | 1 × 1 | |
| | Convolutional | 512 | 3 × 3 | |
| | Residual | | | 26 × 26 × 512 |
| | Convolutional | 1024 | 3 × 3/2 | 13 × 13 × 1024 |
| 4× | Convolutional | 512 | 1 × 1 | |
| | Convolutional | 1024 | 3 × 3 | |
| | Residual | | | 13 × 13 × 1024 |

| Repeat | Layer | Filter | Size | Output |
|---|---|---|---|---|
| | Convolutional | 32 | 3 × 3 | 416 × 416 × 32 |
| | Convolutional | 64 | 3 × 3/2 | 208 × 208 × 64 |
| 1× | Convolutional | 32 | 1 × 1 | |
| | Convolutional | 64 | 3 × 3 | |
| | Residual | | | 208 × 208 × 64 |
| | Convolutional | 128 | 3 × 3/2 | 104 × 104 × 128 |
| 2× | Convolutional | 64 | 1 × 1 | |
| | Convolutional | 128 | 3 × 3 | |
| | Residual | | | 104 × 104 × 128 |
| | Convolutional | 256 | 3 × 3/2 | 52 × 52 × 256 |
| 4× | Convolutional | 128 | 1 × 1 | |
| | Convolutional | 256 | 3 × 3 | |
| | Residual | | | 52 × 52 × 256 |
| 4× | Convolutional | 32 | 1 × 1 | |
| | Convolutional | 64 | 3 × 3 | |
| | DenseNet | | | 52 × 52 × 512 |
| | Convolutional | 512 | 3 × 3/2 | 26 × 26 × 512 |
| 4× | Convolutional | 256 | 1 × 1 | |
| | Convolutional | 512 | 3 × 3 | |
| | Residual | | | 26 × 26 × 512 |
| 4× | Convolutional | 64 | 1 × 1 | |
| | Convolutional | 128 | 3 × 3 | |
| | DenseNet | | | 26 × 26 × 1024 |
| | Convolutional | 1024 | 3 × 3/2 | 13 × 13 × 1024 |
| 4× | Convolutional | 512 | 1 × 1 | |
| | Convolutional | 1024 | 3 × 3 | |
| | Residual | | | 13 × 13 × 1024 |

| Repeat | Layer | Filter | Size | Output |
|---|---|---|---|---|
| 3× | Convolutional | 512 | 1 × 1 | |
| | Convolutional | 1024 | 3 × 3 | |
| | Residual (RES 1st) | | | 13 × 13 × 1024 |
| The Structure of RES 1st | | | | |
| 3× | Convolutional | 256 | 1 × 1 | |
| | Convolutional | 512 | 3 × 3 | |
| | Residual (RES 2nd) | | | 26 × 26 × 512 |
| The Structure of RES 2nd | | | | |
| 3× | Convolutional | 128 | 1 × 1 | |
| | Convolutional | 256 | 3 × 3 | |
| | Residual (RES 3rd) | | | 52 × 52 × 256 |
| The Structure of RES 3rd | | | | |
| 3× | Convolutional | 64 | 1 × 1 | |
| | Convolutional | 128 | 3 × 3 | |
| | Residual (RES 4th) | | | 104 × 104 × 128 |
| The Structure of RES 4th | | | | |

| Input Size | Batch Size | Momentum | Learning Rate | Training Steps |
|---|---|---|---|---|
| 416 × 416 | 8 | 0.9 | 0.001–0.00001 | 50,000 |

| Dataset | Class | Images | Instances | Small Targets | Medium Targets | Large Targets |
|---|---|---|---|---|---|---|
| Training Set | Aircraft | 446 | 4993 | 3714 | 833 | 446 |
| Training Set | Oil tank | 165 | 1586 | 724 | 713 | 149 |
| Training Set | Overpass | 176 | 180 | 0 | 0 | 180 |
| Training Set | Playground | 189 | 191 | 0 | 12 | 179 |
| Test Set | Aircraft | 176 | 1257 | 741 | 359 | 157 |
| Test Set | Oil tank | 63 | 567 | 257 | 213 | 97 |
| Test Set | Overpass | 36 | 41 | 0 | 0 | 41 |
| Test Set | Playground | 49 | 52 | 0 | 0 | 52 |

| Dataset | Class | Images | Instances |
|---|---|---|---|
| Training Set | Aircraft | 600 | 3591 |
| Training Set | Car | 310 | 4475 |
| Test Set | Aircraft | 400 | 3891 |
| Test Set | Car | 200 | 2639 |

| Method | Backbone | Aircraft (%) | Oil Tank (%) | Overpass (%) | Playground (%) | mAP (%) (IOU = 0.5) | FPS |
|---|---|---|---|---|---|---|---|
| Faster RCNN | VGG-16 | 85.85 | 86.67 | 88.15 | 90.35 | 87.76 | 6.7 |
| SSD | VGG-16 | 69.17 | 71.20 | 70.23 | 81.26 | 72.97 | 62.2 |
| DSSD | ResNet-101 | 72.12 | 72.49 | 72.10 | 83.56 | 75.07 | 6.1 |
| ESSD | VGG-16 | 73.08 | 72.94 | 73.61 | 84.27 | 75.98 | 37.3 |
| YOLO-V2 | DarkNet19 | 62.35 | 67.74 | 68.38 | 78.51 | 69.25 | 35.6 |
| YOLO-V3 | DarkNet53 | 74.30 | 73.85 | 75.08 | 85.16 | 77.10 | 29.7 |
| YOLO-V3 tiny | DarkNet19 | 54.14 | 56.21 | 59.28 | 64.20 | 58.46 | 69.8 |
| UAV-YOLO [52] | Figure 1 in [52] | 74.68 | 74.20 | 76.32 | 85.96 | 77.79 | 30.1 |
| DC-SPP-YOLO [54] | Figure 5 in [54] | 73.16 | 73.52 | 74.82 | 84.82 | 76.58 | 33.5 |
| Ours | (Figure 3) | 86.42 | 87.57 | 89.37 | 91.56 | 88.73 | 25.8 |
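
The mAP column of the table above is consistent with the unweighted mean of the four per-class values. The short sketch below checks this for the last row; it assumes that the per-class columns are average precisions (AP), and the function name is hypothetical.

```python
def mean_average_precision(ap_by_class):
    """Unweighted mean of per-class AP values (in percent)."""
    return sum(ap_by_class.values()) / len(ap_by_class)


if __name__ == "__main__":
    # Per-class values copied from the last row of the table above.
    ours = {
        "Aircraft": 86.42,
        "Oil Tank": 87.57,
        "Overpass": 89.37,
        "Playground": 91.56,
    }
    print(round(mean_average_precision(ours), 2))
```

For this row the mean is 354.92 / 4 = 88.73, which matches the reported mAP.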

| Method | Backbone | Small (%) | Medium (%) | Large (%) | Leak Detection Rate (%) |
|---|---|---|---|---|---|
| Faster RCNN | VGG-16 | 84.73 | 87.87 | 89.18 | 11.8 |
| SSD | VGG-16 | 70.38 | 73.41 | 77.51 | 21.1 |
| DSSD | ResNet-101 | 74.42 | 75.18 | 77.70 | 15.2 |
| ESSD | VGG-16 | 75.12 | 75.84 | 78.12 | 16.5 |
| YOLO-V2 | DarkNet19 | 63.20 | 68.53 | 69.28 | 24.3 |
| YOLO-V3 | DarkNet53 | 74.52 | 75.63 | 76.14 | 19.5 |
| YOLO-V3 tiny | DarkNet19 | 55.26 | 56.47 | 60.17 | 31.4 |
| UAV-YOLO [52] | Figure 1 in Reference [52] | 75.45 | 75.15 | 76.85 | 17.1 |
| DC-SPP-YOLO [54] | Figure 5 in Reference [54] | 75.41 | 74.67 | 76.41 | 15.9 |
| Ours | (Figure 3) | 87.51 | 87.93 | 90.23 | 10.2 |

| Method | Backbone | Aircraft (%) | Car (%) | Leak Detection Rate (%) | mAP (%) (IOU = 0.5) | FPS |
|---|---|---|---|---|---|---|
| Faster RCNN | VGG-16 | 87.31 | 86.48 | 13.8 | 86.90 | 6.1 |
| SSD | VGG-16 | 70.24 | 72.61 | 23.7 | 71.43 | 61.5 |
| DSSD | ResNet-101 | 73.17 | 74.19 | 16.1 | 73.68 | 5.2 |
| ESSD | VGG-16 | 73.62 | 75.06 | 15.9 | 74.34 | 33.2 |
| YOLO-V2 | DarkNet19 | 63.17 | 68.42 | 23.0 | 65.80 | 34.3 |
| YOLO-V3 | DarkNet53 | 75.71 | 75.62 | 18.5 | 75.67 | 27.6 |
| YOLO-V3 tiny | DarkNet19 | 57.58 | 56.35 | 35.2 | 56.97 | 65.3 |
| UAV-YOLO [52] | Figure 1 in Reference [52] | 75.12 | 75.60 | 16.5 | 75.36 | 28.4 |
| DC-SPP-YOLO [54] | Figure 5 in Reference [54] | 76.52 | 74.61 | 17.4 | 75.57 | 30.4 |
| Ours | (Figure 3) | 89.31 | 88.24 | 9.3 | 88.78 | 24.9 |

| No. | DENSE 1st | DENSE 2nd | Aircraft (%) | Oil Tank (%) | Overpass (%) | Playground (%) | mAP (%) (IOU = 0.5) | FPS |
|---|---|---|---|---|---|---|---|---|
| 1 | | | 74.30 | 73.85 | 75.08 | 85.16 | 77.10 | 29.7 |
| 2 | ✓ | | 76.81 | 75.38 | 77.21 | 85.37 | 78.69 | 30.9 |
| 3 | | ✓ | 77.28 | 76.39 | 79.65 | 85.92 | 79.81 | 31.4 |
| 4 | ✓ | ✓ | 82.16 | 83.52 | 85.12 | 86.73 | 84.38 | 32.3 |

| No. | 4th Scale | Res 3 | Aircraft (%) | Oil Tank (%) | Overpass (%) | Playground (%) | mAP (%) (IOU = 0.5) | FPS |
|---|---|---|---|---|---|---|---|---|
| 1 | | | 77.25 | 76.38 | 84.36 | 86.12 | 81.03 | 29.7 |
| 2 | ✓ | | 85.97 | 85.18 | 87.15 | 89.61 | 86.98 | 24.8 |
| 3 | | ✓ | 79.38 | 78.85 | 85.29 | 88.28 | 82.95 | 30.1 |
| 4 | ✓ | ✓ | 86.42 | 87.57 | 89.37 | 91.56 | 88.73 | 25.8 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Xu, D.; Wu, Y. Improved YOLO-V3 with DenseNet for Multi-Scale Remote Sensing Target Detection. Sensors 2020, 20, 4276. https://doi.org/10.3390/s20154276