Deep Learning-Based Semantic Segmentation of Urban Areas Using Heterogeneous Unmanned Aerial Vehicle Datasets
Abstract
1. Introduction
- The combined segmentation network (CSN) was trained on different UAV datasets without harmonizing their spatial resolutions or class definitions. The segmentation accuracy obtained by training on a single dataset was compared with that obtained by training on several datasets simultaneously, to understand how the type of data affects the accuracy of CSN;
- Building on the finding that CSN can improve the segmentation accuracy of specific classes [3], a method was proposed that improves the segmentation accuracy on the UAV datasets by modifying the structure of the shared encoding layers;
- To determine whether remote sensing (RS) images acquired from different platforms can improve the segmentation accuracy of UAV images, heterogeneous UAV and airborne datasets were used for training. Based on the results, the types of dataset that can be used with CSN for UAV image training were identified.
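
The idea of training on datasets with different class sets through a shared encoder can be illustrated with a toy sketch. This is not the CSN implementation from the article: it assumes one shared encoder followed by one classification head per dataset, uses plain per-pixel linear layers instead of convolutions, and every name and layer size (`SharedEncoderModel`, `hidden=16`, the random weights) is hypothetical. Only the class counts are taken from the dataset table in this article.

```python
import random

# Class counts as stated in the dataset table of this article.
NUM_CLASSES = {"SDD": 23, "UAVid": 8, "ISPRS": 6}


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (toy initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


class SharedEncoderModel:
    """Toy per-pixel classifier: one shared encoder, one head per dataset."""

    def __init__(self, in_features, hidden, num_classes, seed=0):
        rng = random.Random(seed)
        # Shared by all datasets (hypothetical single linear layer).
        self.encoder = make_matrix(hidden, in_features, rng)
        # One output layer per dataset, so each dataset keeps its own label set.
        self.heads = {
            name: make_matrix(n, hidden, rng) for name, n in num_classes.items()
        }

    def forward(self, pixel, dataset):
        """Encode a pixel with the shared weights, then score it with the
        head that belongs to the dataset the pixel came from."""
        features = relu(matvec(self.encoder, pixel))
        return matvec(self.heads[dataset], features)


model = SharedEncoderModel(in_features=3, hidden=16, num_classes=NUM_CLASSES)
rgb = [0.2, 0.5, 0.1]  # one hypothetical RGB pixel
for name in NUM_CLASSES:
    scores = model.forward(rgb, name)
    print(name, len(scores))
```

Because each dataset is routed to its own head, no label remapping between datasets is needed, while the encoder weights receive updates from every dataset during training. A practical version would replace the linear layers with convolutional encoder and decoder blocks in a deep learning framework.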
2. Methods
2.1. Combined Segmentation Network
2.2. Evaluation Metrics
3. Materials
3.1. Datasets
3.1.1. Semantic Drone Dataset
3.1.2. UAVid Dataset
3.1.3. ISPRS Potsdam Semantic Labeling Dataset
3.2. Test Images
4. Experiments and Results
4.1. Training Settings
4.2. Experimental Results
5. Discussion
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Audebert, N.; Le Saux, B.; Lefèvre, S. Segment-before-detect: Vehicle detection and classification through semantic segmentation of aerial images. Remote Sens. 2017, 9, 368.
- Xu, P.; Tang, H.; Ge, J.; Feng, L. ESPC_NASUnet: An end-to-end super-resolution semantic segmentation network for mapping buildings from remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5421–5435.
- Song, A.; Kim, Y. Semantic Segmentation of Remote-Sensing Imagery Using Heterogeneous Big Data: International Society for Photogrammetry and Remote Sensing Potsdam and Cityscape Datasets. ISPRS Int. J. Geo-Inf. 2020, 9, 601.
- Neupane, B.; Horanont, T.; Aryal, J. Deep Learning-Based Semantic Segmentation of Urban Features in Satellite Images: A Review and Meta-Analysis. Remote Sens. 2021, 13, 808.
- Lateef, F.; Ruichek, Y. Survey on semantic segmentation using deep learning techniques. Neurocomputing 2019, 338, 321–348.
- Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Azad, R.; Aghdam, E.K.; Rauland, A.; Jia, Y.; Avval, A.H.; Bozorgpour, A.; Karimijafarbigloo, S.; Cohen, J.P.; Adeli, E.; Merhof, D. Medical image segmentation review: The success of u-net. arXiv 2022, arXiv:2211.14830.
- Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A DL framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114.
- Li, H.; Qiu, K.; Chen, L.; Mei, X.; Hong, L.; Tao, C. SCAttNet: Semantic Segmentation Network with Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 905–909.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Panboonyuen, T.; Jitkajornwanich, K.; Lawawirojwong, S.; Srestasathiern, P.; Vateekul, P. Semantic Segmentation on Remotely Sensed Images Using an Enhanced Global Convolutional Network with Channel Attention and Domain Specific Transfer Learning. Remote Sens. 2019, 11, 83.
- Cui, B.; Chen, X.; Lu, Y. Semantic Segmentation of Remote Sensing Images Using Transfer Learning and Deep Convolutional Neural Network With Dense Connection. IEEE Access 2020, 8, 116744–116755.
- Gerke, M. Use of the Stair Vision Library within the ISPRS 2D Semantic Labeling Benchmark (Vaihingen). 2014. Available online: https://www.researchgate.net/publication/270104226_Use_of_the_Stair_Vision_Library_within_the_ISPRS_2D_Semantic_Labeling_Benchmark_Vaihingen (accessed on 12 September 2023).
- Meletis, P.; Dubbelman, G. Training of convolutional networks on multiple heterogeneous datasets for street scene semantic segmentation. In Proceedings of the IEEE Intelligent Vehicles Symposium, Changshu, Suzhou, China, 26–30 June 2018; pp. 1045–1050.
- Ghassemi, S.; Fiandrotti, A.; Francini, G.; Magli, E. Learning and Adapting Robust Features for Satellite Image Segmentation on Heterogeneous Data Sets. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6517–6529.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 3213–3223.
- Lee, H.; Eum, S.; Kwon, H. Cross-Domain CNN for Hyperspectral Image Classification. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 3627–3630.
- Lyu, Y.; Vosselman, G.; Xia, G.-S.; Yilmaz, A.; Yang, M.Y. UAVid: A semantic segmentation dataset for UAV imagery. ISPRS J. Photogramm. Remote Sens. 2020, 165, 108–119.
- Semantic Drone Dataset. Available online: http://dronedataset.icg.tugraz.ati (accessed on 12 September 2023).
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
Dataset | Platform | Type of View | Flight Altitude | Number of Images (Number of Classes) | Categories
---|---|---|---|---|---
UAVid | UAV | side-view | 0–50 m | Train: 3300; Val: 1500 (8 classes) | Building, Road, Static car, Tree, Low vegetation, Human, Moving car, Background clutter
Semantic Drone Dataset (SDD) | UAV | bird-view | 5–30 m | Train: 3300; Val: 1500 (23 classes) | Background, Paved area, Dirt, Grass, Gravel, Water, Rocks, Pool, Vegetation, Roof, Wall, Window, Door, Fence, Fence pole, Person, Dog, Car, Bicycle, Tree, Bald tree, AR marker, Obstacle, Conflicting
ISPRS Potsdam | Aircraft | bird-view | 800 m | Train: 3300; Val: 1500 (6 classes) | Impervious surfaces, Building, Low vegetation, Tree, Car, Clutter/background
Model (Training Dataset) | Kappa | F1: Paved Area | F1: Dirt | F1: Grass | F1: Gravel | F1: Water | F1: Rocks | F1: Vegetation | F1: Roof | F1: Person | F1: Car | F1: Bicycle | F1: Tree | F1: Bald Tree
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
U-Net (SDD) | 0.63 | 0.89 | 0.43 | 0.84 | 0.66 | 0.56 | 0.32 | 0.70 | 0.78 | 0.52 | 0.34 | 0.00 | 0.34 | 0.21
PSPNet (SDD) | 0.83 | 0.95 | 0.70 | 0.93 | 0.85 | 0.91 | 0.80 | 0.78 | 0.93 | 0.53 | 0.78 | 0.25 | 0.85 | 0.83
CSN Case 1 (SDD–ISPRS) | 0.35 | 0.72 | 0.27 | 0.66 | 0.17 | 0.00 | 0.00 | 0.35 | 0.18 | 0.04 | 0.00 | 0.00 | 0.08 | 0.00
CSN Case 1 (SDD–UAVid) | 0.38 | 0.76 | 0.21 | 0.59 | 0.44 | 0.16 | 0.04 | 0.22 | 0.52 | 0.11 | 0.01 | 0.00 | 0.09 | 0.03
CSN Case 2 (SDD–ISPRS) | 0.45 | 0.80 | 0.23 | 0.63 | 0.40 | 0.07 | 0.03 | 0.26 | 0.59 | 0.21 | 0.02 | 0.00 | 0.51 | 0.10
CSN Case 2 (SDD–UAVid) | 0.72 | 0.90 | 0.60 | 0.85 | 0.76 | 0.70 | 0.31 | 0.69 | 0.85 | 0.62 | 0.51 | 0.03 | 0.58 | 0.64
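
The Kappa and per-class F1 values reported in the table above are typically derived from a pixel-level confusion matrix. The following is a minimal sketch under the assumption that the standard definitions of Cohen's kappa and the F1 score are used; the article's exact evaluation code is not shown here, and the function name `kappa_and_f1` and the two-class example matrix are hypothetical.

```python
def kappa_and_f1(confusion):
    """Compute Cohen's kappa and per-class F1 from a square confusion matrix.

    confusion[i][j] is the number of pixels whose reference class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]  # reference pixels per class
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    # Observed agreement and agreement expected by chance.
    observed = sum(confusion[k][k] for k in range(n)) / total
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (observed - expected) / (1.0 - expected) if expected < 1.0 else 0.0

    # F1 = 2TP / (2TP + FP + FN); row sum + column sum equals 2TP + FN + FP.
    f1_scores = []
    for k in range(n):
        denominator = row_sums[k] + col_sums[k]
        f1_scores.append(2.0 * confusion[k][k] / denominator if denominator else 0.0)
    return kappa, f1_scores


# Hypothetical two-class example (not data from the article).
example = [
    [50, 10],
    [5, 35],
]
kappa, f1_scores = kappa_and_f1(example)
print(round(kappa, 3), [round(f, 3) for f in f1_scores])
```

A class that never appears in either the reference or the prediction gets an F1 of 0.00 in this sketch, which is one possible convention for empty classes.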
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Song, A. Deep Learning-Based Semantic Segmentation of Urban Areas Using Heterogeneous Unmanned Aerial Vehicle Datasets. Aerospace 2023, 10, 880. https://doi.org/10.3390/aerospace10100880