End-to-End Framework for the Automatic Matching of Omnidirectional Street Images and Building Data and the Creation of 3D Building Models
Abstract
1. Introduction
- The main contribution of this study was the automatic linkage of several large raw datasets. The three main inputs in this research were omnidirectional images, building footprints, and aerial photographs. These inputs were automatically linked and pre-processed end to end.
- HRNetV2 was used to extract the wall surface areas, and a detection transformer (DETR) was used to extract the locations of opening elements, such as windows and doors. We propose a method to estimate roof types from building images using a CNN and to further identify roof colors from aerial photographs. This approach allows the creation of a structured 3D model with accurate semantic information about the building elements.
- With our method, a 3D model with detailed information about window locations, door locations, texture images, roof types, and roof colors can be created automatically and comprehensively. This will contribute to the realization of automatic LOD3 model generation in the future.
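The matching step (Section 4.1) pairs building bounding boxes detected in an omnidirectional image with building footprints from the GIS data. One common way to frame this pairing, sketched below under our own assumptions (not the paper's exact formulation), is to compare azimuth intervals: the horizontal extent of a detection in an equirectangular panorama maps linearly to a viewing-angle range, which can be scored against the angular range each footprint subtends from the camera position. All function names here are illustrative.

```python
import math

def bearing(cam_lon, cam_lat, lon, lat):
    """Approximate bearing (degrees, 0 = north, clockwise) from the
    camera to a point, using a local equirectangular approximation."""
    dx = (lon - cam_lon) * math.cos(math.radians(cam_lat))
    dy = lat - cam_lat
    return math.degrees(math.atan2(dx, dy)) % 360.0

def footprint_azimuth_range(cam, footprint):
    """Azimuth interval (min, max) spanned by a footprint's vertices,
    as seen from the camera position (wrap-around ignored)."""
    angles = [bearing(cam[0], cam[1], x, y) for x, y in footprint]
    return min(angles), max(angles)

def box_azimuth_range(x_min, x_max, image_width, heading_deg):
    """Map the horizontal extent of a bounding box in an equirectangular
    panorama (image_width px spans 360 deg, image center = heading)
    to absolute azimuths."""
    to_az = lambda px: (heading_deg + px / image_width * 360.0 - 180.0) % 360.0
    return to_az(x_min), to_az(x_max)

def interval_iou(a, b):
    """1-D IoU of two azimuth intervals (0/360 wrap-around ignored)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

Each detection would then be assigned to the footprint with the highest interval IoU; a production version would additionally handle the 0/360 wrap-around and occlusion between footprints.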
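Roof color determination (Section 4.4.2) identifies a representative color from the aerial-photograph pixels inside a building footprint. A plausible minimal approach, sketched here with a plain Lloyd's-algorithm k-means rather than whatever clustering the authors actually used, is to cluster the RGB values and return the centroid of the largest cluster, which suppresses shadows and rooftop clutter better than a simple mean.

```python
import numpy as np

def dominant_roof_color(pixels, k=3, iters=20, seed=0):
    """Estimate a representative roof color by k-means clustering of
    RGB pixels sampled inside a building footprint.

    pixels: (N, 3) float array of RGB values from the aerial photo.
    Returns the centroid of the largest cluster as the roof color.
    (Illustrative sketch; a library such as scikit-learn's KMeans
    would normally be used instead.)
    """
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest centroid.
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids of non-empty clusters.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    return centers[counts.argmax()]
```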
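Finally, the model generation step (Section 4.5) combines the footprint, estimated height, and roof attributes into a structured 3D building. The sketch below shows only the simplest case, extruding a footprint into a prism with a flat roof polygon, as a stand-in for the paper's actual CityGML-style generation; gable and hip roof meshing and texture assignment are omitted, and the fallback-to-flat behavior is our assumption.

```python
def extrude_building(footprint, height, roof_type="flat"):
    """Build a minimal block model from a 2-D footprint.

    footprint: list of (x, y) vertices, counter-clockwise.
    Returns (vertices, faces), where each face is a tuple of indices
    into vertices. Only 'flat' roofs are meshed here; other roof
    types fall back to a flat roof in this sketch.
    """
    n = len(footprint)
    base = [(x, y, 0.0) for x, y in footprint]
    top = [(x, y, float(height)) for x, y in footprint]
    vertices = base + top
    faces = []
    for i in range(n):          # one quad per wall
        j = (i + 1) % n
        faces.append((i, j, n + j, n + i))
    faces.append(tuple(range(n, 2 * n)))  # roof polygon
    return vertices, faces
```

For a rectangular footprint this yields eight vertices and five faces (four walls plus the roof); the ground polygon could be appended the same way if a closed solid is required.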
2. Related Works
3. Data
3.1. Data and Focus Area
3.1.1. Omnidirectional Images
3.1.2. Building Footprint GIS Data
3.1.3. Aerial Photographs
3.1.4. Texture Images
3.2. Creation of the Dataset
3.2.1. Data of the Target Area for the Creation of 3D Building Models
3.2.2. Data for Training and Validation of the Framework
4. Method
4.1. Matching Omnidirectional Images and Buildings
4.1.1. Matching Building Bounding Boxes and Building Footprints
4.1.2. Matching Building Bounding Boxes and Building Edges
4.1.3. Rectification
4.2. Building Element Extraction
4.2.1. Wall Crop Creation
4.2.2. Window and Door Location Detection
4.3. Texture Image Generation
4.4. Determination of Roof Type and Color
4.4.1. Roof Type Classification
4.4.2. Roof Color Determination
4.5. Three-Dimensional Building Model Generation
5. Results and Discussion
5.1. Matching Building Images and Building Edges
5.2. Building Element Extraction
5.2.1. Wall Crop Creation
5.2.2. Window and Door Location Detection
5.3. Texture Image Generation
5.4. Determination of the Roof Type and Color
5.4.1. Roof Type Classification
5.4.2. Roof Color Determination
5.5. Three-Dimensional Building Model Generation
5.6. Confidence Score for 3D Building Models
5.7. Limitation
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Roof type classification accuracy (Section 5.4.1):

| Roof Type | Accuracy |
|---|---|
| Flat | 95.6% |
| Gable | 88.5% |
| Hip | 87.4% |
| Overall | 89.7% |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ogawa, Y.; Nakamura, R.; Sato, G.; Maeda, H.; Sekimoto, Y. End-to-End Framework for the Automatic Matching of Omnidirectional Street Images and Building Data and the Creation of 3D Building Models. Remote Sens. 2024, 16, 1858. https://doi.org/10.3390/rs16111858