Multi-Task cGAN for Simultaneous Spaceborne DSM Refinement and Roof-Type Classification
Abstract
1. Introduction
2. Related Work
2.1. Pixel-Wise Image Classification
2.2. Depth Image Regression
2.3. Multi-Task Learning
- We efficiently adapt the cGAN architecture developed by Isola et al. [18] for multi-task learning.
- The proposed framework simultaneously generates two images: one with continuous values, representing an elevation model with enhanced building geometries, and one with discrete values, assigning every pixel to one of three classes (flat roof, non-flat roof, or background).
- We investigate the potential of different network architectures for each task and select the combination of models that provides the best results for both pixel-wise classification and depth map generation. We show that joint training of multiple tasks within the end-to-end framework is beneficial. Moreover, the obtained roof classification information can be used later in a post-processing step for the final building modeling task.
- We investigate the potential of a normal vector loss, added as an extra term to the least-squares objective function, to obtain more accurate and more planar roof structures.
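The idea behind a normal vector loss can be illustrated with a minimal sketch. It assumes a common formulation (one minus the cosine similarity between surface normals derived from the predicted and the reference height images), central differences with unit pixel spacing, and hypothetical function names; it is not claimed to be the exact term or weighting used in this work.

```python
import math


def surface_normals(height):
    """Normals (-dz/dx, -dz/dy, 1) for the interior pixels of a 2D height grid.

    Assumption: central differences with unit pixel spacing.
    """
    rows, cols = len(height), len(height[0])
    normals = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (height[r][c + 1] - height[r][c - 1]) / 2.0
            dz_dy = (height[r + 1][c] - height[r - 1][c]) / 2.0
            normals.append((-dz_dx, -dz_dy, 1.0))
    return normals


def normal_vector_loss(pred, target):
    """Mean of (1 - cosine similarity) between predicted and reference normals."""
    pairs = list(zip(surface_normals(pred), surface_normals(target)))
    total = 0.0
    for n_p, n_t in pairs:
        dot = sum(a * b for a, b in zip(n_p, n_t))
        norm_p = math.sqrt(sum(a * a for a in n_p))
        norm_t = math.sqrt(sum(b * b for b in n_t))
        total += 1.0 - dot / (norm_p * norm_t)
    return total / len(pairs)


# Toy 4x4 height grids (hypothetical values, in metres).
flat = [[5.0] * 4 for _ in range(4)]                      # horizontal roof
raised = [[15.0] * 4 for _ in range(4)]                   # same roof, 10 m higher
ramp = [[float(c) for c in range(4)] for _ in range(4)]   # 45 degree slope along x

loss_same = normal_vector_loss(flat, flat)
loss_offset = normal_vector_loss(flat, raised)
loss_slope = normal_vector_loss(flat, ramp)
```

In this sketch the loss is zero for identical surfaces and also for a pure height offset, because only the surface orientation enters the term. This is why such a term complements, rather than replaces, a pixel-wise height loss: it penalizes roofs that are tilted or bumpy relative to the reference even when their mean height is close.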
3. Methodology
3.1. Building Shape Improvements and Roof Type Understanding Model
3.1.1. One Generator, Two Outputs
3.1.2. Two Generators, Two Outputs
3.2. Loss Function
4. Study Area and Model Settings
4.1. Dataset
4.2. Implementation Details and Training
4.3. Inference Phase
5. Results and Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Hoja, D.; Reinartz, P.; Lehner, M. DSM generation from high resolution satellite imagery using additional information contained in existing DSM. In Proceedings of the High-Resolution Earth Imaging for Geospatial Information, Hannover, Germany, 17–20 May 2005; pp. 1–6.
- Eckert, S.; Hollands, T. Comparison of automatic DSM generation modules by processing IKONOS stereo data of an urban area. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2010, 3, 162–167.
- Wohlfeil, J.; Hirschmüller, H.; Piltz, B.; Börner, A.; Suppa, M. Fully automated generation of accurate digital surface models with sub-meter resolution from satellite imagery. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, XXXIX-B3, 75–80.
- Xu, F.; Woodhouse, N.; Xu, Z.; Marr, D.; Yang, X.; Wang, Y. Blunder elimination techniques in adaptive automatic terrain extraction. ISPRS J. 2008, 29, 21.
- Sirmacek, B.; d’Angelo, P.; Krauss, T.; Reinartz, P. Enhancing urban digital elevation models using automated computer vision techniques. In Proceedings of the ISPRS Commission VII Symposium, Vienna, Austria, 5–7 July 2010.
- Brédif, M.; Tournaire, O.; Vallet, B.; Champion, N. Extracting polygonal building footprints from digital surface models: A fully-automatic global optimization framework. ISPRS J. Photogramm. Remote Sens. 2013, 77, 57–65.
- Davydova, K.; Cui, S.; Reinartz, P. Building footprint extraction from digital surface models using neural networks. In Image and Signal Processing for Remote Sensing XXII; International Society for Optics and Photonics: Edinburgh, UK, 2016; Volume 10004, p. 100040J.
- Arefi, H.; Alizadeh Naeini, A.; Ghafouri, A. Building Extraction Using Surface Model Classification. In Proceedings of the GIS Ostrava 2013—Geoinformatics for City Transformation, Ostrava, Czech Republic, 21–23 January 2013.
- Bittner, K.; Cui, S.; Reinartz, P. Building Extraction from Remote Sensing Data using Fully Convolutional Networks. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 481–486.
- Liao, Y.; Kodagoda, S.; Wang, Y.; Shi, L.; Liu, Y. Understand scene categories by objects: A semantic regularized scene classifier using convolutional neural networks. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 2318–2325.
- Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv 2013, arXiv:1312.6229.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 580–587.
- Girshick, R. Fast r-cnn. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Gregor, K.; Danihelka, I.; Mnih, A.; Blundell, C.; Wierstra, D. Deep autoregressive networks. arXiv 2013, arXiv:1310.8499.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems; Terry Sejnowski: San Diego, CA, USA, 2014; pp. 2672–2680.
- Gregor, K.; Danihelka, I.; Graves, A.; Rezende, D.J.; Wierstra, D. Draw: A recurrent neural network for image generation. arXiv 2015, arXiv:1502.04623.
- Oord, A.V.D.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel recurrent neural networks. arXiv 2016, arXiv:1601.06759.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. arXiv 2016, arXiv:1611.07004.
- Mohajeri, N.; Assouline, D.; Guiboud, B.; Bill, A.; Gudmundsson, A.; Scartezzini, J.L. A city-scale roof shape classification using machine learning for solar energy applications. Renew. Energy 2018, 121, 81–93.
- Assouline, D.; Mohajeri, N.; Scartezzini, J.L. Building rooftop classification using random forests for large-scale PV deployment. In Proceedings of the Earth Resources and Environmental Remote Sensing/GIS Applications VIII, Warsaw, Poland, 5 October 2017; Volume 10428, p. 1042806.
- Castagno, J.D.; Atkins, E.M. Automatic Classification of Roof Shapes for Multicopter Emergency Landing Site Selection. arXiv 2018, arXiv:1802.06274.
- Alidoost, F.; Arefi, H. Knowledge based 3D building model recognition using convolutional neural networks from lidar and aerial imageries. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 833–840.
- Partovi, T.; Fraundorfer, F.; Azimi, S.; Marmanis, D.; Reinartz, P. Roof Type Selection based on patch-based classification using deep learning for high Resolution Satellite Imagery. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 653–657.
- Axelsson, M.; Soderman, U.; Berg, A.; Lithen, T. Roof Type Classification Using Deep Convolutional Neural Networks on Low Resolution Photogrammetric Point Clouds From Aerial Imagery. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 1293–1297.
- Felicísimo, A.M. Parametric statistical method for error detection in digital elevation models. ISPRS J. Photogramm. Remote Sens. 1994, 49, 29–33.
- Wang, P. Applying two dimensional Kalman filtering for digital terrain modelling. Proc. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 1998, 32, 649–656.
- Walker, J.P.; Willgoose, G.R. A comparative study of Australian cartometric and photogrammetric digital elevation model accuracy. Photogramm. Eng. Remote Sens. 2006, 72, 771–779.
- Anderson, E.; Thompson, J.; Austin, R. LIDAR density and linear interpolator effects on elevation estimates. Int. J. Remote Sens. 2005, 26, 3889–3900.
- Smith, S.; Holland, D.; Longley, P. Quantifying interpolation errors in urban airborne laser scanning models. Geograph. Anal. 2005, 37, 200–224.
- Shi, W.; Tian, Y. A hybrid interpolation method for the refinement of a regular grid digital elevation model. Int. J. Geogr. Inf. Sci. 2006, 20, 53–67.
- Sirmacek, B.; d’Angelo, P.; Reinartz, P. Detecting complex building shapes in panchromatic satellite images for digital elevation model enhancement. In Proceedings of the ISPRS Workshop on Modeling of Optical Airborne and Space Borne Sensors, Istanbul, Turkey, 11–13 October 2010.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; Terry Sejnowski: San Diego, CA, USA, 2012; pp. 1097–1105.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1520–1528.
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307.
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
- Eigen, D.; Puhrsch, C.; Fergus, R. Depth map prediction from a single image using a multi-scale deep network. In Advances in Neural Information Processing Systems; Terry Sejnowski: San Diego, CA, USA, 2014; pp. 2366–2374.
- Eigen, D.; Fergus, R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2650–2658.
- Liu, F.; Shen, C.; Lin, G. Deep convolutional neural fields for depth estimation from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5162–5170.
- Mou, L.; Zhu, X.X. IM2HEIGHT: Height estimation from single monocular imagery via fully residual convolutional-deconvolutional network. arXiv 2018, arXiv:1802.10249.
- Bittner, K.; Körner, M. Automatic large-scale 3d building shape refinement using conditional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1887–1889.
- Bittner, K.; d’Angelo, P.; Körner, M.; Reinartz, P. DSM-to-LoD2: Spaceborne Stereo Digital Surface Model Refinement. Remote Sens. 2018, 10, 1926.
- Collobert, R.; Weston, J. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 160–167.
- Deng, L.; Hinton, G.; Kingsbury, B. New types of deep neural network learning for speech recognition and related applications: An overview. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 8599–8603.
- Kokkinos, I. Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6129–6138.
- Liebel, L.; Körner, M. Auxiliary tasks in multi-task learning. arXiv 2018, arXiv:1805.06334.
- Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172.
- Vakalopoulou, M.; Platias, C.; Papadomanolaki, M.; Paragios, N.; Karantzalos, K. Simultaneous registration, segmentation and change detection from multisensor, multitemporal satellite image pairs. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1827–1830.
- Srivastava, S.; Volpi, M.; Tuia, D. Joint height estimation and semantic labeling of monocular aerial images with CNNs. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 5173–5176.
- Sener, O.; Koltun, V. Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems; Terry Sejnowski: San Diego, CA, USA, 2018; pp. 527–538.
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Switzerland, 2015; pp. 234–241.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv 2018, arXiv:1802.02611.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Kendall, A.; Gal, Y.; Cipolla, R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7482–7491.
- Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802.
- Hu, J.; Ozay, M.; Zhang, Y.; Okatani, T. Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries. arXiv 2018, arXiv:1803.08673.
- d’Angelo, P.; Reinartz, P. Semiglobal matching results on the ISPRS stereo matching benchmark. In Proceedings of the High-Resolution Earth Imaging for Geospatial Information, Hannover, Germany, 14–17 June 2011; pp. 79–84.
- Shewchuk, J.R. Triangle: Engineering a 2D quality mesh generator and Delaunay triangulator. In Workshop on Applied Computational Geometry; Springer: Berlin/Heidelberg, Germany, 1996; pp. 203–222.
- Delaunay, B. Sur la sphere vide. Otdelenie Matematicheskii i Estestvennyka Nauk 1934, 7, 1–2.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Höhle, J.; Höhle, M. Accuracy assessment of digital elevation models by means of robust statistical methods. ISPRS J. Photogramm. Remote Sens. 2009, 64, 398–406.
- Zhang, J.; Zhu, T.; Tang, Y.; Zhang, W. Geostatistical approaches to refinement of digital elevation data. Geo-Spat. Inf. Sci. 2014, 17, 181–189.
- Elaksher, A.F.; Bethel, J. Refinement of digital elevation models in urban areas using breaklines via a multi-photo least squares matching algorithm. J. Terr. Obs. 2010, 2, 7.
- Hobi, M.L.; Ginzler, C. Accuracy assessment of digital surface models based on WorldView-2 and ADS80 stereo remote sensing data. Sensors 2012, 12, 6347–6368.
Method | RMSE (m) | NMAD (m) | MAE (m) |
---|---|---|---|
cGAN [43] | 3.29 | 0.88 | 1.78 |
only UNet based | 3.20 | 0.91 | 1.71 |
only ResNet34 based | 3.23 | 0.96 | 1.71 |
only DeepLabv3+ based | 2.51 | 1.07 | 1.51 |
joint UNet and ResNet34 | 3.21 | 0.89 | 1.72 |
joint UNet and DeepLabv3+ | 3.12 | 0.90 | 1.69 |

Method | IoU (%) | F1-Score (%) | Precision (%) | Recall (%) |
---|---|---|---|---|
only UNet based | 59.78 | 72.07 | 77.05 | 48.43 |
only ResNet34 based | 61.05 | 73.28 | 79.55 | 51.64 |
only DeepLabv3+ based | 62.73 | 74.83 | 78.59 | 52.18 |
joint UNet and ResNet | 61.54 | 73.73 | 79.28 | 51.80 |
joint UNet and DeepLabv3+ | 64.44 | 76.34 | 80.03 | 55.2 |
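For readers unfamiliar with the measures reported in the two tables, the following sketch shows their standard definitions on toy data. The function names, the sample values, and the label encoding (0 = background, 1 = flat roof, 2 = non-flat roof) are assumptions made for illustration; how the scores were aggregated over classes and test tiles in the evaluation is not specified here.

```python
import math
import statistics


def height_errors(pred, ref):
    """Return (RMSE, NMAD, MAE) for paired height samples in metres."""
    diffs = [p - r for p, r in zip(pred, ref)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    # NMAD: normalized median absolute deviation, a robust spread estimate.
    med = statistics.median(diffs)
    nmad = 1.4826 * statistics.median([abs(d - med) for d in diffs])
    return rmse, nmad, mae


def class_scores(pred, ref, positive):
    """Return (IoU, F1, precision, recall) in percent for one class label.

    Assumes the class occurs in both prediction and reference, so that no
    denominator is zero.
    """
    tp = sum(1 for p, r in zip(pred, ref) if p == positive and r == positive)
    fp = sum(1 for p, r in zip(pred, ref) if p == positive and r != positive)
    fn = sum(1 for p, r in zip(pred, ref) if p != positive and r == positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(100.0 * v for v in (iou, f1, precision, recall))


# Toy height samples: one gross outlier (20 m against a 10 m reference).
rmse, nmad, mae = height_errors([10.0, 12.0, 9.0, 20.0], [10.0, 10.0, 10.0, 10.0])

# Toy label samples, scored for the assumed "flat roof" label 1.
iou, f1, precision, recall = class_scores([1, 1, 0, 2, 1, 0], [1, 0, 0, 2, 1, 1], positive=1)
```

On the toy heights, the single outlier dominates the RMSE while the NMAD stays small, which illustrates why a robust measure is reported alongside RMSE and MAE and why the rankings in the first table can differ between columns.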
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bittner, K.; Körner, M.; Fraundorfer, F.; Reinartz, P. Multi-Task cGAN for Simultaneous Spaceborne DSM Refinement and Roof-Type Classification. Remote Sens. 2019, 11, 1262. https://doi.org/10.3390/rs11111262