YOLOv3-Based Matching Approach for Roof Region Detection from Drone Images
Abstract
1. Introduction
2. Traditional Image Stitching Methods and Deep Learning
2.1. Traditional Image Matching
2.1.1. Scale-Invariant Feature Transform (SIFT)
2.1.2. Speeded Up Robust Features (SURF)
2.1.3. Oriented FAST and Rotated BRIEF (ORB)
2.1.4. RANdom SAmple Consensus (RANSAC)
2.2. Deep Learning Algorithms
Object Detection Network
3. Proposed Method
3.1. Dataset and Training Process
3.1.1. Experiment Environment
3.1.2. The Datasets
3.2. Evaluation Methods
3.3. Evaluation and Testing Process
4. Experimental Results
4.1. Xizhi District, New Taipei City CASE 1
4.2. Xizhi District, New Taipei City CASE 2
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Component | Specification |
---|---|
Operating system | Ubuntu 16.04 LTS |
Central processing unit (CPU) | Intel i7-8700 3.2 GHz |
Random-access memory (RAM) | DDR4 2400 24 GB |
Graphics card | TITAN Xp (Pascal) |
Software | Darknet, CUDA 9.0 |
Characteristic Name | Description |
---|---|
Platform | ALIAS |
Flight altitude above ground level (AGL) | 200 m |
Sensor | SONY a7R |
Resolution | 7360 × 4912 |
Output data format | JPEG (Exif 2.3) / RAW (Sony ARW 2.3) |
Spatial resolution | 25 cm (GSD) |
Weather | Overcast |
Figure | SSIM | SSIM Execution Time (ms) |
---|---|---|
Figure 8c and Figure 9c | 0.7743 | 3.27 |
Figure 8c and Figure 9d | 0.6528 | 3.12 |
Figure 8c and Figure 9e | 0.2644 | 2.83 |
Figure 8d and Figure 9c | 0.6236 | 3.03 |
Figure 8d and Figure 9d | 0.8026 | 3.18 |
Figure 8d and Figure 9e | 0.2836 | 2.91 |
Figure 8e and Figure 9c | 0.2731 | 3.08 |
Figure 8e and Figure 9d | 0.2836 | 3.15 |
Figure 8e and Figure 9e | 0.7263 | 3.22 |
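
The SSIM scores in this table can be read as a pairing problem: each roof region from Figure 8 is compared with every roof region from Figure 9, and the highest score in each row identifies the most similar candidate. The following is a minimal sketch of that reading, assuming a simple "highest SSIM wins" rule; the function name `pair_regions` and the rule itself are illustrative assumptions, not the authors' stated procedure. The scores are copied from the table.

```python
# SSIM scores copied from the table above.
# Keys are (region in Figure 8, region in Figure 9).
ssim_scores = {
    ("8c", "9c"): 0.7743, ("8c", "9d"): 0.6528, ("8c", "9e"): 0.2644,
    ("8d", "9c"): 0.6236, ("8d", "9d"): 0.8026, ("8d", "9e"): 0.2836,
    ("8e", "9c"): 0.2731, ("8e", "9d"): 0.2836, ("8e", "9e"): 0.7263,
}


def pair_regions(scores):
    """Assumed rule: pair each left region with its highest-SSIM right region."""
    best = {}
    for (left, right), score in scores.items():
        # Keep the candidate with the largest SSIM seen so far for this region.
        if left not in best or score > best[left][1]:
            best[left] = (right, score)
    return best


pairs = pair_regions(ssim_scores)
for left, (right, score) in sorted(pairs.items()):
    print(f"Figure {left} -> Figure {right} (SSIM {score:.4f})")
```

Under this assumed rule, each region in Figure 8 pairs with the same-lettered region in Figure 9, which is the diagonal of the table.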
Method | Keypoint1 | Keypoint2 | Matches | Match Rate (%) | YOLOv3 Time (ms) | Matching Time (ms) | Match Performance (%) | RMSE |
---|---|---|---|---|---|---|---|---|
SIFT | 2000 | 2001 | 1126 | 56.29 | 0 | 1183.68 | 0.05 | 0.9647 |
YOLOv3+SIFT | 490 | 490 | 477 | 97.35 | 28.98 | 29.79 | 1.66 | 0.8578 |
SURF | 1582 | 1507 | 1024 | 66.30 | 0 | 1064.84 | 0.06 | 0.9285 |
YOLOv3+SURF | 80 | 80 | 78 | 97.50 | 28.98 | 22.94 | 1.88 | 0.8864 |
ORB | 1500 | 1486 | 586 | 39.25 | 0 | 506.63 | 0.08 | 0.9751 |
YOLOv3+ORB | 619 | 619 | 603 | 97.42 | 28.98 | 10.99 | 2.43 | 0.8962 |
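
The derived columns in this table appear to follow from the raw counts and timings. The sketch below assumes that match rate is the number of matches divided by the mean of the two keypoint counts, and that match performance is match rate divided by total execution time (YOLOv3 time plus matching time). Both definitions are inferred from the tabulated values rather than quoted from the evaluation section, and the function names are hypothetical. For the six rows here, the assumed formulas land within 0.01 of the reported Match Rate and Match Performance values.

```python
def match_rate(keypoints1, keypoints2, matches):
    """Assumed definition: matches as a percentage of the mean keypoint count."""
    return 100.0 * matches / ((keypoints1 + keypoints2) / 2.0)


def match_performance(rate, yolo_ms, matching_ms):
    """Assumed definition: match rate per millisecond of total execution time."""
    return rate / (yolo_ms + matching_ms)


# (method, keypoint1, keypoint2, matches, YOLOv3 time ms, matching time ms),
# copied from the table above.
rows = [
    ("SIFT", 2000, 2001, 1126, 0.0, 1183.68),
    ("YOLOv3+SIFT", 490, 490, 477, 28.98, 29.79),
    ("SURF", 1582, 1507, 1024, 0.0, 1064.84),
    ("YOLOv3+SURF", 80, 80, 78, 28.98, 22.94),
    ("ORB", 1500, 1486, 586, 0.0, 506.63),
    ("YOLOv3+ORB", 619, 619, 603, 28.98, 10.99),
]

for method, kp1, kp2, matches, yolo_ms, matching_ms in rows:
    rate = match_rate(kp1, kp2, matches)
    perf = match_performance(rate, yolo_ms, matching_ms)
    print(f"{method:12s} match rate {rate:6.2f} %  performance {perf:5.2f}")
```

Dividing by total time rather than matching time alone charges the YOLOv3 variants for their detection step, so the comparison with the plain matchers stays like for like.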
Figure | SSIM | SSIM Execution Time (ms) |
---|---|---|
Figure 19c and Figure 20c | 0.7143 | 3.53 |
Figure 19c and Figure 20d | 0.3597 | 3.48 |
Figure 19d and Figure 20c | 0.4001 | 3.46 |
Figure 19d and Figure 20d | 0.7688 | 3.58 |
Method | Keypoint1 | Keypoint2 | Matches | Match Rate (%) | YOLOv3 Time (ms) | Matching Time (ms) | Match Performance (%) | RMSE |
---|---|---|---|---|---|---|---|---|
SIFT | 2000 | 2000 | 614 | 30.70 | 0 | 1094.49 | 0.03 | 0.9184 |
YOLOv3+SIFT | 130 | 186 | 124 | 78.48 | 16.72 | 25.91 | 1.84 | 0.9069 |
SURF | 309 | 520 | 239 | 57.66 | 0 | 936.49 | 0.06 | 0.9713 |
YOLOv3+SURF | 37 | 32 | 27 | 78.26 | 16.72 | 21.40 | 2.05 | 0.8715 |
ORB | 1500 | 1496 | 381 | 25.43 | 0 | 295.57 | 0.06 | 0.9742 |
YOLOv3+ORB | 129 | 105 | 95 | 81.20 | 16.72 | 9.94 | 3.05 | 0.9242 |
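
One way to read the timing columns in this table is as an end-to-end speed-up: the matching time of each plain method divided by the combined YOLOv3 and matching time of its YOLOv3-assisted counterpart. The sketch below computes that ratio from the tabulated times. The ratio is a derived illustration, not a figure reported by the authors, and the names `case2_times_ms` and `speedup` are hypothetical.

```python
# (baseline matching time, YOLOv3 time, matching time after YOLOv3), all in ms,
# copied from the table above.
case2_times_ms = {
    "SIFT": (1094.49, 16.72, 25.91),
    "SURF": (936.49, 16.72, 21.40),
    "ORB": (295.57, 16.72, 9.94),
}


def speedup(baseline_ms, yolo_ms, matching_ms):
    """Ratio of baseline time to total YOLOv3-assisted time (detection + matching)."""
    return baseline_ms / (yolo_ms + matching_ms)


speedups = {
    method: speedup(*times) for method, times in case2_times_ms.items()
}
for method, factor in speedups.items():
    print(f"{method}: baseline takes {factor:.1f}x the YOLOv3-assisted time")
```

The detection time is included in the denominator so that the ratio reflects the full cost of the YOLOv3-assisted pipeline.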
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yeh, C.-C.; Chang, Y.-L.; Alkhaleefah, M.; Hsu, P.-H.; Eng, W.; Koo, V.-C.; Huang, B.; Chang, L. YOLOv3-Based Matching Approach for Roof Region Detection from Drone Images. Remote Sens. 2021, 13, 127. https://doi.org/10.3390/rs13010127