# 2D Image-To-3D Model: Knowledge-Based 3D Building Reconstruction (3DBR) Using Single Aerial Images and Convolutional Neural Networks (CNNs)


## Abstract


## 1. Introduction

- It proposes a novel procedure to extract the latent, inherent information in a single 2D image, contributing to the understanding and interpretation of 3D scenes;
- An optimized multi-scale convolutional–deconvolutional network (MSCDN) is designed for height prediction as well as extraction of the linear elements of roofs from single aerial images;
- The building detection, building boundary extraction, and segmentation of roofs can be performed simultaneously using one network, trained by a manually generated training dataset;
- In the proposed framework, the prismatic and parametric models of the individual buildings are reconstructed from a single aerial image without the need for any additional data;
- A training dataset (https://github.com/loosgagnet/Roofline-Extraction) including linear elements of different roofs is created manually, which can be used for different applications such as 3D city modeling and CAD models.

## 2. Materials and Methods

#### 2.1. Data Preparation

- Scale: all image tiles are randomly scaled by s ∈ [1, 1.5];
- Rotation: all image tiles are randomly rotated by r ∈ [−5, 5] degrees;
- Color: ortho-photo tiles are multiplied globally by a random RGB value c ∈ [0.8, 1.2];
- Flips: all image tiles are horizontally and vertically flipped with a probability of 0.5.
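The augmentation policy above can be sketched as a single function that applies identical geometric transforms to an image tile and its nDSM label; `augment` and its interface are illustrative, not the paper's actual pipeline code.

```python
import random

import numpy as np
from scipy.ndimage import rotate, zoom


def augment(image, ndsm, rng=None):
    """Randomly scale, rotate, color-jitter, and flip one training tile."""
    rng = rng or random.Random()
    s = rng.uniform(1.0, 1.5)                       # scale s in [1, 1.5]
    r = rng.uniform(-5.0, 5.0)                      # rotation r in [-5, 5] degrees
    c = np.array([rng.uniform(0.8, 1.2) for _ in range(3)])  # global RGB multiplier

    # Geometric transforms must be identical for the image and its label.
    image = zoom(image, (s, s, 1), order=1)
    ndsm = zoom(ndsm, (s, s), order=1)
    image = rotate(image, r, reshape=False, order=1)
    ndsm = rotate(ndsm, r, reshape=False, order=1)

    # Color jitter applies to the ortho-photo only, not to the nDSM.
    image = np.clip(image * c, 0.0, 255.0)

    if rng.random() < 0.5:                          # horizontal flip, p = 0.5
        image, ndsm = image[:, ::-1], ndsm[:, ::-1]
    if rng.random() < 0.5:                          # vertical flip, p = 0.5
        image, ndsm = image[::-1], ndsm[::-1]
    return image, ndsm
```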

#### 2.2. CNNs Training

#### 2.3. 3D Reconstruction

Based on the pixel size of the images (e.g., 5 cm), all binary segments with areas smaller than 50 m² can be removed. Next, a closing morphological operator with a disk-shaped structuring element of radius 50 cm (e.g., 10 pixels) is applied to the binary image to close all linear segments that belong to the same linear object. Then, all polygons are filled to create the filled binary segments. Next, an opening morphological operator with a disk-shaped structuring element of radius 100 cm (e.g., 20 pixels) is applied to the binary image to separate linear segments that do not belong to a linear object. Finally, the remaining segments with areas below the area threshold (in m²) are removed.
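The segment-cleaning steps above can be sketched with SciPy's binary morphology. `clean_roof_segments` and the `disk` helper are illustrative names, not the paper's code; the thresholds follow the values quoted in the text (50 m² at a 5 cm pixel size ≈ 20,000 pixels; disk radii of 10 and 20 pixels).

```python
import numpy as np
from scipy import ndimage


def disk(radius):
    """Disk-shaped structuring element with the given pixel radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (x * x + y * y) <= radius * radius


def clean_roof_segments(binary, gsd=0.05, min_area_m2=50.0,
                        close_radius_px=10, open_radius_px=20):
    """Clean a binary roof-segment mask (sketch of the steps above)."""
    # 1) Remove connected components smaller than the area threshold
    #    (50 m^2 at a 5 cm ground sampling distance = 20,000 pixels).
    min_px = int(min_area_m2 / gsd ** 2)
    labels, n = ndimage.label(binary)
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    keep = np.concatenate(([False], np.asarray(sizes) >= min_px))
    out = keep[labels]
    # 2) Closing (disk radius ~50 cm, i.e. 10 px) joins broken pieces
    #    that belong to the same linear object.
    out = ndimage.binary_closing(out, structure=disk(close_radius_px))
    # 3) Fill the enclosed polygons.
    out = ndimage.binary_fill_holes(out)
    # 4) Opening (disk radius ~100 cm, i.e. 20 px) separates segments
    #    that do not belong to a linear object.
    out = ndimage.binary_opening(out, structure=disk(open_radius_px))
    return out
```

The small radii and thresholds are parameters, so the same sketch applies to images with a different pixel size by adjusting `gsd`.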

## 3. Results

An epsilon value of 10⁻⁸ is used for the Adam optimizer.
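For reference, a single Adam update for one scalar weight, with ε = 10⁻⁸ as quoted above; the remaining hyperparameters (learning rate, β₁, β₂) are Adam's common defaults and not necessarily those used in the paper.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight w with gradient g at step t >= 1."""
    m = beta1 * m + (1 - beta1) * g        # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g    # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment
    # eps prevents division by zero when v_hat is tiny.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```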

## 4. Discussion

#### 4.1. Quality Assessment of Depth Prediction

#### 4.2. Quality Assessment of Roof Line Segmentation

#### 4.3. Quality Assessment of 3D Reconstruction

The planimetric accuracy (RMS_xy) is computed from the 2D Euclidean distances (d) between corresponding vertices in the reference and generated parametric models. In addition, the height accuracy (RMS_z) is calculated from the Z coordinates of the vertices [45].

The RMS_xy values are about 1.2 m and 3.9 m, and the RMS_z values are about 0.8 m and 2.4 m, for the Potsdam and Zeebrugge test data, respectively. These are promising results for 3D reconstruction without accurate, or even complete, 3D data, compared to the state-of-the-art methods, which employ accurate height data [46]. In addition, the intersection over union (IoU) metric is calculated to quantify the overlap between the binary polygons generated from the reference linear elements (Figure 20a) and the approximated polygons generated from the predicted linear elements (Figure 20b); it is about 82%. The per-pixel difference map between the two sets of binary polygons is shown in Figure 20c: the green segments are true-positive pixels, the red segments are false-negative pixels, and the blue segments are false-positive pixels. Consequently, the similarity ratio between the two sets of binary polygons is about 95.8% for the Potsdam building areas, while this value is about 88.4% for the Zeebrugge data. As shown in Figure 20, the largest differences correspond to small or flat buildings, where the discrimination capability of the trained networks decreases for flat roofs and roads. The final results of 3D building reconstruction superimposed on the predicted nDSM are shown in Figure 21; they present high visual quality, and the spatial patterns of the predicted nDSM resemble those of the 3D models.
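The three metrics used above can be sketched as follows; `rms_xy`, `rms_z`, and `iou` are hypothetical helper names, and the vertex arrays are assumed to be already matched between the reference and generated models.

```python
import numpy as np


def rms_xy(ref, gen):
    """RMS of 2D Euclidean distances d between corresponding vertices.

    ref, gen: N x 3 arrays of matched (X, Y, Z) vertex coordinates.
    """
    d = np.linalg.norm(ref[:, :2] - gen[:, :2], axis=1)
    return float(np.sqrt(np.mean(d ** 2)))


def rms_z(ref, gen):
    """RMS of the height differences between corresponding vertices."""
    return float(np.sqrt(np.mean((ref[:, 2] - gen[:, 2]) ** 2)))


def iou(ref_mask, pred_mask):
    """Intersection over union of two rasterized binary polygon masks."""
    inter = np.logical_and(ref_mask, pred_mask).sum()
    union = np.logical_or(ref_mask, pred_mask).sum()
    return float(inter / union) if union else 0.0
```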

- Trees are a common error source that significantly decreases the accuracy of the predicted eave lines (Figure 22a);
- If the ridge lines are not extracted, tilted roofs may be modeled as flat roofs (Figure 22b);
- There are some classification errors between the eave and ridge lines (Figure 22c);
- If there are errors in the predicted nDSM, the median values of the eave lines are not measured accurately (Figure 22d).

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Kolbe, T.H.; Gröger, G.; Plümer, L. CityGML—Interoperable Access to 3D City Models. In Proceedings of the Int. Symposium on Geo-information for Disaster Management, Delft, The Netherlands, 21–23 March 2005. [Google Scholar]
- Tarsha-Kurdi, F.; Landes, T.; Grussenmeyer, P.; Koehl, M. Model-Driven and Data-Driven Approaches Using Lidar Data: Analysis and Comparison. In Proceedings of the International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Munich, Germany, 19–21 September 2006; Volume 36, pp. 87–92. [Google Scholar]
- Wang, R.; Peethambaran, J.; Chen, D. LiDAR Point Clouds to 3D Urban Models: A Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. **2018**, 11, 606–627. [Google Scholar] [CrossRef]
- Cheng, L.; Gong, J.; Li, M.; Liu, Y. 3D Building Model Reconstruction from Multi-view Aerial Imagery and Lidar Data. Photogramm. Eng. Remote Sens. **2011**, 77, 125–139. [Google Scholar] [CrossRef]
- Kim, K.; Shan, J. Building roof modeling from airborne laser scanning data based on level set approach. ISPRS J. Photogramm. Remote Sens. **2011**, 66, 484–497. [Google Scholar] [CrossRef]
- Wang, Y.; Xu, H.; Cheng, L.; Li, M.; Wang, Y. Three-Dimensional Reconstruction of Building Roofs from Airborne LiDAR Data Based on a Layer Connection and Smoothness Strategy. Remote Sens. **2016**, 8, 415. [Google Scholar] [CrossRef]
- Yan, Y.; Gao, F.; Deng, S.; Su, N. A Hierarchical Building Segmentation in Digital Surface Models for 3D Reconstruction. Sensors **2017**, 17, 222. [Google Scholar] [CrossRef] [PubMed]
- Mccann, M.T.; Mixon, D.G.; Fickus, M.C.; Castro, C.A.; Ozolek, J.A.; Kovacevic, J. Images as Occlusions of Textures: A Framework for Segmentation. IEEE Trans. Image Process. **2014**, 23, 2033–2046. [Google Scholar] [CrossRef] [PubMed]
- Awrangjeb, M.; Ali, S.; Gilani, N. An Effective Data-Driven Method for 3-D Building Roof Reconstruction and Robust Change Detection. Remote Sens. **2018**, 10, 1512. [Google Scholar] [CrossRef]
- Lafarge, F.; Descombes, X.; Zerubia, J.; Pierrot-Deseilligny, M. Structural Approach for Building Reconstruction from a Single DSM. J. Latex Cl. Files **2007**, 6, 1–14. [Google Scholar] [CrossRef] [PubMed]
- Huang, H.; Brenner, C.; Sester, M. 3D Building Roof Reconstruction from Point Clouds via Generative Models. In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Chicago, IL, USA, 1–4 November 2011. [Google Scholar]
- Zhang, W.; Wang, H.; Chen, Y.; Yan, K.; Chen, M. 3D Building Roof Modeling by Optimizing Primitive’s Parameters Using Constraints from LiDAR Data and Aerial Imagery. Remote Sens. **2014**, 6, 8107–8133. [Google Scholar] [CrossRef]
- Zheng, Y.; Weng, Q. Model-driven Reconstruction of 3D Buildings Using LiDAR Data. IEEE Geosci. Remote Sens. Lett. **2015**, 12, 1541–1545. [Google Scholar] [CrossRef]
- Zheng, Y.; Weng, Q.; Zheng, Y. A Hybrid Approach for Three-Dimensional Building Reconstruction in Indianapolis from LiDAR Data. Remote Sens. **2017**, 9, 310. [Google Scholar] [CrossRef]
- Kaiser, P.; Wegner, J.D.; Lucchi, A.; Jaggi, M.; Hofmann, T.; Schindler, K. Learning Aerial Image Segmentation from Online Maps. IEEE Trans. Geosci. Remote Sens. **2017**, 55, 6054–6068. [Google Scholar] [CrossRef]
- Persello, C.; Stein, A. Deep Fully Convolutional Networks for the Detection of Informal Settlements in VHR Images. IEEE Geosci. Remote Sens. Lett. **2017**, 14, 2325–2329. [Google Scholar] [CrossRef] [Green Version]
- Wen, Q.; Jiang, K.; Wang, W.; Liu, Q.; Guo, Q.; Li, L.; Wang, P. Automatic building extraction from Google Earth images under complex backgrounds based on deep instance segmentation network. Sensors **2019**, 19, 333. [Google Scholar] [CrossRef] [PubMed]
- Srivastava, S.; Volpi, M.; Tuia, D. Joint Height Estimation and Semantic Labeling of Monocular Aerial Images with CNNs. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017. [Google Scholar]
- Ghamisi, P.; Yokoya, N. IMG2DSM: Height Simulation from Single Imagery Using Conditional Generative Adversarial Net. IEEE Geosci. Remote Sens. Lett. **2018**, 15, 794–798. [Google Scholar] [CrossRef]
- Mou, L.; Zhu, X.X. IM2HEIGHT: Height Estimation from Single Monocular Imagery via Fully Residual Convolutional-Deconvolutional Network. arXiv **2018**, arXiv:1802.10249, 1–13. [Google Scholar]
- Amirkolaee, H.A.; Arefi, H. Height estimation from single aerial images using a deep convolutional encoder-decoder network. ISPRS J. Photogramm. Remote Sens. **2019**, 149, 50–66. [Google Scholar] [CrossRef]
- Bittner, K.; D’Angelo, P.; Körner, M.; Reinartz, P. DSM-to-LoD2: Spaceborne Stereo Digital Surface Model Refinement. Remote Sens. **2018**, 10, 1926. [Google Scholar] [CrossRef]
- Axelsson, P.E. DEM generation from laser scanner data using adaptive TIN models. In Proceedings of the International Archives of the Photogrammetry and Remote Sensing, Amsterdam, The Netherlands, 16–22 July 2000; pp. 110–117. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. **2017**, 39, 2481–2495. [Google Scholar] [CrossRef]
- Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper Depth Prediction with Fully Convolutional Residual Networks. In Proceedings of the IEEE International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016. [Google Scholar]
- Eigen, D.; Fergus, R. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. In Proceedings of the International Conference on Computer Vision (ICCV), Washington, DC, USA, 7–13 December 2015; pp. 2650–2658. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1–9. [Google Scholar]
- Robbins, H.; Monro, S. A Stochastic Approximation Method. Ann. Math. Stat. **1951**, 22, 400–407. [Google Scholar] [CrossRef]
- Kiefer, J.; Wolfowitz, J. Stochastic Estimation of the Maximum of a Regression Function. Ann. Math. Stat. **1952**, 23, 462–466. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–15. [Google Scholar]
- Arefi, H.; Reinartz, P. Building Reconstruction Using DSM and Orthorectified Images. Remote Sens. **2013**, 5, 1681–1703. [Google Scholar] [CrossRef] [Green Version]
- ISPRS 2D Semantic Labeling Contest-Potsdam. Available online: http://www2.isprs.org/commissions/comm3/wg4/2d-sem-label-potsdam.html (accessed on 15 September 2019).
- 2015 IEEE GRSS Data Fusion Contest. Available online: http://www.grss-ieee.org/community/technical-committees/data-fusion (accessed on 15 September 2019).
- Shepard, D. A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 ACM National Conference, New York, NY, USA, 27–29 August 1968; pp. 517–524. [Google Scholar]
- Saxena, A.; Sun, M.; Ng, A.Y. Make3D: Learning 3D Scene Structure from a Single Still Image. IEEE Trans. Pattern Anal. Mach. Intell. **2009**, 31, 824–840. [Google Scholar] [CrossRef]
- Eigen, D.; Puhrsch, C.; Fergus, R. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. Int. Conf. Neural Inf. Process. Syst. **2014**, 2, 2366–2374. [Google Scholar]
- Zhao, S.; Fu, H.; Gong, M.; Tao, D. Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9788–9798. [Google Scholar]
- Liu, F.; Shen, C.; Lin, G. Deep Convolutional Neural Fields for Depth Estimation from a Single Image. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5162–5170. [Google Scholar]
- Goldman, M.; Hassner, T.; Avidan, S. Learn Stereo, Infer Mono: Siamese Networks for Self-Supervised, Monocular, Depth Estimation. In Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Li, B.; Shen, C.; Dai, Y.; Van Den Hengel, A.; He, M. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1119–1127. [Google Scholar]
- He, L.; Wang, G.; Hu, Z. Learning depth from single images with deep neural network embedding focal length. IEEE Trans. Image Process. **2018**, 27, 4676–4689. [Google Scholar] [CrossRef]
- Li, Z.; Snavely, N. MegaDepth: Learning Single-View Depth Prediction from Internet Photos. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2041–2050. [Google Scholar]
- McGlone, J.C.; Shufelt, J.A. Projective and object space geometry for monocular building extraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition CVPR-94, Seattle, WA, USA, 21–23 June 1994; IEEE Computer Society Press: Washington, DC, USA; pp. 54–61. [Google Scholar]
- McKeown, D.M.; Bulwinkle, T.; Cochran, S.; Harvey, W.; McGlone, C.; Shufelt, J.A. Performance evaluation for automatic feature extraction. In Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Amsterdam, The Netherlands, 16–22 July 2000; pp. 379–394. [Google Scholar]
- Rottensteiner, F.; Trinder, J.; Clode, S.; Kubik, K. Using the Dempster-Shafer method for the fusion of LIDAR data and multi-spectral images for building detection. Inf. Fusion **2005**, 6, 283–300. [Google Scholar] [CrossRef]
- ISPRS. ISPRS Test Project on Urban Classification and 3D Building Reconstruction. Available online: http://www2.isprs.org/commissions/comm3/wg4/results/a3_recon.html (accessed on 15 September 2019).

**Figure 2.** Two samples of training data: (**a**) RGB images; (**b**) normalized digital surface models (nDSMs); and (**c**) linear elements of roofs.

**Figure 6.** The difference between the standard and proposed residual blocks: (**a**) the standard residual block [27], and (**b**) the proposed residual block.

**Figure 7.** The 3D reconstruction workflow: (**a**) the input RGB image; (**b**) the predicted nDSM; (**c**) the predicted linear elements of roofs; (**d**) the initial binary segment; (**e**) the extracted eave lines including the Hough line and the main orientation of the building block; (**f**) the rotated binary segment; (**g**) the first approximation of the building block; (**h**) the final approximation of the building block; (**i**) the overlay of the final approximation on the RGB image; (**j**) eave and separator lines of roofs; (**k**) individual roofs; (**l**) prismatic reconstruction; (**m**) predicted ridge lines; (**n**) regularized eave lines; (**o**) calculated hip lines of roofs; and (**p**) parametric reconstruction.

**Figure 9.** The rule to define the separator lines for different building blocks: (**a**) the approximated polygon; (**b**) the corresponding eave lines; (**c**) a rule-based search strategy to find separator lines; (**d**) final separator lines; and (**e**) individual building parts.

**Figure 10.** The proposed splitting and merging algorithm: (**a**) the approximated polygon; (**b**) the corresponding boundaries; (**c**) the supporting lines; (**d**) the merging rule; and (**e**) the individual polygons.

**Figure 11.** A rule-based search strategy to find an appropriate ridge line: (**a**) original ridge lines; (**b**) the ridge line parallel to the main orientation of the polygon; and (**c**) the fitted ridge line.

**Figure 12.** Overview of training and testing areas: (**a**) the training area from Potsdam, Germany; (**b**) the testing area from Potsdam, Germany; and (**c**) the testing area from Zeebrugge, Belgium.

**Figure 14.** The results of the MSCDNs compared to the ground truths for the Potsdam test dataset: (**a**) the input RGB image; (**b**) the reference nDSM; (**c**) the predicted nDSM; (**d**) the reference linear elements of roofs; and (**e**) the predicted linear elements.

**Figure 15.** The results of the MSCDNs compared to the ground truths for the Zeebrugge test dataset: (**a**) the input RGB image; (**b**) the reference nDSM; (**c**) the predicted nDSM; (**d**) the reference linear elements of roofs; and (**e**) the predicted linear elements.

**Figure 16.** The results of prismatic reconstruction: (**a**) the initial binary polygons derived from the predicted linear elements; (**b**) the regularized polygons of building blocks; (**c**) the overlay of the regularized polygons on the RGB image (in **a**–**c**, the first row is Potsdam and the second row is Zeebrugge); and (**d**) the 3D reconstruction results in level of detail 1 (LoD1) (left: Potsdam; right: Zeebrugge).

**Figure 17.** The results of parametric reconstruction for Potsdam (the left image) and Zeebrugge (the right image): (**a**) the individual regularized linear elements superimposed on the approximated polygons; (**b**) the individual regularized linear elements superimposed on the RGB image; and (**c**) the 3D reconstruction results in LoD2.

**Figure 18.** The height difference map between the predicted nDSM and the ground truth: (**a**) the Potsdam test data; and (**b**) the Zeebrugge test data.

**Figure 19.** The geometrical accuracy of the proposed MSCDN: (**a**) the input image; (**b**) the ground truth; (**c**) the predicted depth using the MSCDN; and (**d**) the predicted depth using the fully convolutional residual network (FCRN) [25].

**Figure 20.** The difference map (**c**) between the reference polygons (**a**) and approximated polygons (**b**) for the Potsdam (the first row) and Zeebrugge (the second row) datasets.

**Figure 21.**The final result of 3D building reconstruction (3DBR) superimposed on the predicted nDSM for the Potsdam (the first row) and Zeebrugge (the second row) datasets.

**Figure 22.** The error sources in 3D reconstruction: (**a**) errors of predicted eave lines; (**b**) errors of predicted ridge lines; (**c**) errors of line classification; and (**d**) errors of the predicted nDSM.

**Table 1.** The proposed architecture of the multi-scale convolutional–deconvolutional network (MSCDN).

| Encoder | Act. | Output size | Param. | Decoder | Act. | Output size | Param. | Fine-scale | Act. | Output size | Param. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Input image | - | 224 × 224 × 3 | - | Conv 3 × 3, s1 + BN | ReLU | 28 × 28 × 512 | 2.3 M | Input image | - | 224 × 224 × 3 | - |
| 2x {Conv 3 × 3, s1 + BN} | ReLU | 224 × 224 × 64 | 3.6 k | Conv 3 × 3, s1 + BN | - | 28 × 28 × 512 | 2.3 M | Conv 3 × 3, s1 + BN | - | 224 × 224 × 64 | 1.8 k |
| Pooling 2 × 2, s2 | - | 112 × 112 × 64 | - | Summation | - | 28 × 28 × 512 | - | Summation | - | 224 × 224 × 64 | - |
| 2x {Conv 3 × 3, s1 + BN} | ReLU | 112 × 112 × 64 | 73.8 k | Conv 3 × 3, s1 + BN | ReLU | 28 × 28 × 512 | 2.3 M | Conv 3 × 3, s1 + BN | ReLU | 224 × 224 × 64 | 36.9 k |
| Pooling 2 × 2, s2 | - | 56 × 56 × 64 | - | Conv 3 × 3, s1 + BN | - | 28 × 28 × 512 | 2.3 M | Conv 3 × 3, s1 | - | 224 × 224 × 1 | 577 |
| 3x {Conv 3 × 3, s1 + BN} | ReLU | 56 × 56 × 256 | 443.1 k | Summation | - | 28 × 28 × 512 | - | | | | |
| Pooling 2 × 2, s2 | - | 28 × 28 × 256 | - | Conv 3 × 3, s1 + BN | ReLU | 28 × 28 × 512 | 2.3 M | | | | |
| Conv 3 × 3, s1 + BN | ReLU | 28 × 28 × 256 | 590.1 k | Conv 3 × 3, s1 + BN | - | 28 × 28 × 512 | 2.3 M | | | | |
| Conv 3 × 3, s1 + BN | - | 28 × 28 × 256 | 590.1 k | Summation | - | 28 × 28 × 512 | - | | | | |
| Concatenation | - | 28 × 28 × 512 | - | Un-pooling 2 × 2 | - | 56 × 56 × 512 | - | | | | |
| Conv 3 × 3, s1 + BN | ReLU | 28 × 28 × 256 | 1.2 M | 3x {Conv 3 × 3, s1 + BN} | ReLU | 56 × 56 × 256 | 3.5 M | | | | |
| Conv 3 × 3, s1 + BN | - | 28 × 28 × 256 | 590.1 k | Un-pooling 2 × 2 | - | 112 × 112 × 256 | - | | | | |
| Concatenation | - | 28 × 28 × 512 | - | 2x {Conv 3 × 3, s1 + BN} | ReLU | 112 × 112 × 256 | 1.2 M | | | | |
| Conv 3 × 3, s1 + BN | ReLU | 28 × 28 × 256 | 1.2 M | Un-pooling 2 × 2 | - | 224 × 224 × 256 | - | | | | |
| Conv 3 × 3, s1 + BN | - | 28 × 28 × 256 | 590.1 k | Conv 3 × 3, s1 + BN | ReLU | 224 × 224 × 64 | 147.5 k | | | | |
| Concatenation | - | 28 × 28 × 512 | - | | | | | | | | |
| Total Parameters: | 5.2 M | | | Total Parameters: | 19.0 M | | | Total Parameters: | 39.3 k | | |

| Metrics | Descriptors | The Potsdam Test Data | The Zeebrugge Test Data |
|---|---|---|---|
| Standard | Mean Error | 1.69 m | 2.03 m |
| | Standard Deviation (SD) | 1.31 m | 1.26 m |
| | Root Mean Square Error (RMSE) | 3.57 m | 3.30 m |
| | Relative Error (REL) | 0.4% | 0.9% |
| | Root Mean Squared Logarithmic Error (RMSLE) | 0.23 m | 1.19 m |
| Robust | Median Error | 1.35 m | 1.68 m |
| | Normalized Median Absolute Deviation (NMAD) | 1.32 m | 0.98 m |
| | Quantile 68.3% | 2.16 m | 2.27 m |
| | Quantile 95% | 4.31 m | 4.75 m |

**Table 3.** Comparison between the proposed MSCDN and the state-of-the-art methods for depth prediction.

| Dataset | Method | RMSE [m] | REL [%] | RMSLE [m] |
|---|---|---|---|---|
| Make3D | Zhao et al. [37] | 10.424 | 0.403 | - |
| | Liu et al. [38] | 9.49 | 0.355 | 0.137 |
| | Goldman et al. [39] | 8.789 | 0.406 | 0.183 |
| | Li et al. [40] | 7.19 | 0.278 | 0.092 |
| | Eigen et al. [36] | 7.16 | 0.190 | 0.270 |
| | He et al. [41] | 6.801 | 0.208 | 0.087 |
| | Laina et al. [25] | 4.46 | 0.176 | 0.072 |
| | Li et al. [42] | 4.25 | 0.178 | 0.064 |
| | Proposed MSCDN | 4.31 | 0.184 | 0.074 |
| Potsdam | Ghamisi et al. [19] | 3.89 | - | - |
| | Amini et al. [21] | 3.468 | 0.571 | 0.259 |
| | Proposed MSCDN | 3.57 | 0.4 | 0.23 |

**Table 4.** The confusion matrix and the quality metrics for the predicted linear elements of roofs in Potsdam test data.

| Classes of Linear Elements | Pre._eave [1, 0, 0] | Pre._ridge [0, 1, 0] | Pre._hip [0, 0, 1] | Pre._black [0, 0, 0] | Quality Measures | [%] |
|---|---|---|---|---|---|---|
| Ref._eave [1, 0, 0] | 49,181 | 2873 | 3215 | 5653 | Comp. | 95.46 |
| Ref._ridge [0, 1, 0] | 5217 | 9332 | 5984 | 968 | Corr. | 95.46 |
| Ref._hip [0, 0, 1] | 956 | 1301 | 3258 | 803 | Qual. | 91.31 |
| Ref._black [0, 0, 0] | 1,300,905 | 210,002 | 187,722 | 36,224,481 | F1 score | 95.46 |

**Table 5.** The confusion matrix and the quality metrics for the predicted linear elements of roofs in Zeebrugge test data.

| Classes of Linear Elements | Pre._eave [1, 0, 0] | Pre._ridge [0, 1, 0] | Pre._hip [0, 0, 1] | Pre._black [0, 0, 0] | Quality Measures | [%] |
|---|---|---|---|---|---|---|
| Ref._eave [1, 0, 0] | 237,914 | 23,994 | 7161 | 292,100 | Comp. | 91.12 |
| Ref._ridge [0, 1, 0] | 11,203 | 32,590 | 13,540 | 11,093 | Corr. | 91.12 |
| Ref._hip [0, 0, 1] | 6804 | 1996 | 11,248 | 5627 | Qual. | 83.69 |
| Ref._black [0, 0, 0] | 968,694 | 121,216 | 75,591 | 15,514,881 | F1 score | 91.12 |
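Assuming the standard definitions of completeness (recall), correctness (precision), quality, and F1 score computed from pixel counts, the quality measures reported in Tables 4 and 5 can be sketched as follows; `quality_metrics` is an illustrative helper, and the paper's exact per-class counting may differ.

```python
def quality_metrics(tp, fp, fn):
    """Completeness, correctness, quality, and F1 from pixel counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    comp = tp / (tp + fn)                  # completeness (recall)
    corr = tp / (tp + fp)                  # correctness (precision)
    qual = tp / (tp + fn + fp)             # quality
    f1 = 2 * comp * corr / (comp + corr)   # F1 score (harmonic mean)
    return comp, corr, qual, f1
```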

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Alidoost, F.; Arefi, H.; Tombari, F.
2D Image-To-3D Model: Knowledge-Based 3D Building Reconstruction (3DBR) Using Single Aerial Images and Convolutional Neural Networks (CNNs). *Remote Sens.* **2019**, *11*, 2219.
https://doi.org/10.3390/rs11192219