This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

This paper presents a model-based approach for reconstructing 3D polyhedral building models from aerial images. The proposed approach exploits geometric and photometric properties resulting from the perspective projection of planar structures. The input data are calibrated aerial images. The novelty of the approach lies in its featurelessness and in its use of direct optimization based on raw image brightness. The proposed framework avoids feature extraction and matching. The 3D polyhedral model is directly estimated by optimizing an objective function that combines an image-based dissimilarity measure and a gradient score over several aerial images. The optimization is carried out by the Differential Evolution algorithm. The proposed approach is intended to provide more accurate 3D reconstruction than feature-based approaches. The proposed method can also be used for fast rectification and updating of 3D models. Several results and performance evaluations on real and synthetic images show the feasibility and robustness of the proposed approach.

Over the past two decades, the cartographic field has evolved significantly, mainly in order to provide a digital 3D geometric description of urban environments in addition to conventional 2D paper maps. More precisely, active work in the photogrammetric, remote sensing and computer vision communities focuses on 3D building modeling, since buildings are urban objects of great interest for 3D city modeling. 3D building modeling approaches are increasingly developed owing to the growing needs of institutional and industrial applications in civil and military contexts. The visualization of urban environments (e.g., virtual tourism), urban planning, site recognition (military applications) and the conservation of architectural works (cultural heritage) are some of the many applications requiring 3D building modeling. For these reasons, several approaches have been proposed in the literature; they provide more or less accurate and detailed 3D building models, adapted to the targeted applications. Overall, these approaches tend to produce 3D building models whose quality is ever closer to physical reality. Prior knowledge of the urban areas under study (e.g., city topology, environment density, shape complexity, existing surveys, urban GIS databases) and the collected remotely sensed raw data are very rich sources of information that can be used to develop sophisticated building modeling approaches. 3D building reconstruction is a complex task due to the diversity of building shapes (e.g., architectural and contemporary buildings). Building facades usually have microstructures (e.g., windows, doors), and building roofs present superstructures (e.g., chimneys, attic windows). The representations of 3D building models can thus be divided into three main categories (see

The complexity of 3D building models can be planimetric (complex polygonal ground footprint) as well as altimetric (e.g., height variations). Aerial data are very useful for covering large areas such as cities. In the literature, several approaches based on aerial or satellite data have been proposed to extract 3D prismatic and polyhedral building models. The input data of these approaches are usually optical aerial or satellite images, aerial or satellite Digital Surface Models (DSMs), or aerial 3D point clouds such as aerial LIDAR (Light Detection And Ranging) data. Some commonly used data samples are shown in

The flowchart of the first two strategies (image-based building modeling) is illustrated in

The 3D building reconstruction of a full urban environment requires automatic or semi-automatic methods. Massive reconstruction approaches usually employ a feature extraction stage. However, this stage is very sensitive, since it can induce missed detections, false alarms, under-detections or over-detections. To control these effects, 3D building modeling approaches employ computer vision strategies, which fall into two paradigms. The first paradigm, called bottom-up, assembles geometric features without prior knowledge of the sought model. The second paradigm, called top-down, exploits a library of models and searches for the model that best fits the input data (images, DSMs).

As previously mentioned, several approaches for 3D reconstruction of polyhedral building models currently employ as input Digital Surface Models (see

In this paper, we propose a direct and featureless approach for the extraction of 3D simple polyhedral building models from aerial images (

We are interested in modeling residential buildings that have simple polyhedral shapes and whose ground footprints are quadrilaterals. In most cases, these quadrilaterals are rectangles. However, this requirement does not limit our approach, since any complex shape can be considered as a union of simple models with rectangular footprints.

The input data are calibrated aerial images. Hence, our research deals with the intermediate degree of generic modeling as described in

In this study, we essentially focus on approaches producing polyhedral building models (as shown in

The rest of the paper is organized as follows. Section 2 describes various existing image-based approaches for 3D polyhedral building modeling. Section 3 presents the global strategy of the proposed approach. Section 4 describes the optimization process of the approach. Section 5 presents several intermediate results and evaluations of the major steps.

Many interesting building modeling approaches have been addressed in the literature for the reconstruction of 3D polyhedral building models (e.g., [

In [

Taillandier

In [

Fischer

In [

Zebedin

In [

In this section, we present our formulation of the problem and the adopted parametrization. In the previous section, we described several approaches that have been addressed in the literature. Here we state the characteristics of our approach.

Since aerial images are employed, the proposed approach only deals with roof models, owing to the viewing angle. Indeed, since a building generally has a rectangular footprint, an aerial image shows at most two of its facades. Nevertheless, the building facades can be determined using prior knowledge of the ground height of the area under study (from an urban database) and by assuming that the dominant facade planes are vertical. In this paper, we restrict our study to simple polyhedral models (several roof varieties). Some are illustrated in

The adopted multi-facet roof model comprises six vertices

Moreover, since the images are calibrated, the 3D coordinates of the inner vertices $M(X_M, Y_M, Z_M)$ and $N(X_N, Y_N, Z_N)$

Furthermore, it is easy to show that our polyhedral model can be fully described by the 3D coordinates of the inner vertices and of two diagonally opposite outer vertices (coplanarity constraint). Indeed, the building can be parameterized by eight parameters: four parameters for the image locations of the inner vertices, $(x_M, y_M)$ and $(x_N, y_N)$

In other words, our method has the obvious advantage that the coplanarity constraints are implicitly enforced by the model parametrization. By contrast, feature-based approaches require fitting planes to DSMs or 3D points.

Recall that the 3_{M}_{N}

Thus, finding the model boils down to finding this vector

In this study, we present a novel modeling approach that is direct and image-based. The challenge consists in reconstructing 3D polyhedral building shapes directly from the photometric information of aerial images. In computer vision, direct approaches have mainly been proposed for image registration, for instance to generate image mosaics. Featureless image registration techniques strive to compute the global motion of the brightness pattern (e.g., affine or homographic transforms) without using matched features (e.g., [

As previously mentioned, our approach employs calibrated aerial images. The building under study is observed by _{i}_{i}_{i}_{i}_{i}

In computer vision, the homography is employed in image registration, camera self-calibration, motion estimation, stereoscopy and 3D scene reconstruction. Mathematically, a homography is a projective collineation describing an image-to-image transformation; it is valid either for a pure 3D camera rotation or for a planar scene (see

The homography matrix $H$ transfers the point $p_1(x_1, y_1)$ of the reference image (image 1) to its homologous point $p_2(x_2, y_2)$. In homogeneous coordinates, the equation linking each pair of homologous points can be written as:

$$\tilde{p}_2 \simeq H\,\tilde{p}_1$$

where $\simeq$ denotes equality up to a nonzero scale factor. The matrices $K_1$ and $K_2$ are, respectively, the intrinsic matrices of the two cameras.

In our case, the intrinsic and extrinsic parameters of the cameras are known (calibrated cameras). The plane parameters need to be determined for each facet composing the model. Once the plane parameters are known, the associated homography directly transfers, facet by facet, the sets of master pixels to their homologous pixels.
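The facet-by-facet transfer can be illustrated with the classical plane-induced homography. The following is a minimal sketch, assuming (these conventions are ours, not necessarily the paper's) that camera-2 coordinates are $X_2 = R X_1 + t$ and that the facet plane is $n^\top X_1 = d$ in the camera-1 frame, which gives $H = K_2 (R + t\,n^\top / d) K_1^{-1}$. Other sign conventions (e.g., $n^\top X + d = 0$) flip the sign of the translation term.

```python
# Sketch: homography induced by a 3D plane between two calibrated views.
# Assumed conventions: X2 = R X1 + t, plane n . X1 = d in the camera-1 frame.
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inv3(M: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular matrix")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[adj[r][s] / det for s in range(3)] for r in range(3)]


def plane_homography(K1: Matrix, K2: Matrix, R: Matrix, t: Sequence[float],
                     n: Sequence[float], d: float) -> Matrix:
    """H = K2 (R + t n^T / d) K1^{-1} for the plane n . X1 = d."""
    A = [[R[r][s] + t[r] * n[s] / d for s in range(3)] for r in range(3)]
    return matmul(matmul(K2, A), inv3(K1))


def transfer(H: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map pixel (x, y) of image 1 to image 2 (homogeneous division)."""
    u, v, w = (H[r][0] * x + H[r][1] * y + H[r][2] for r in range(3))
    return u / w, v / w


if __name__ == "__main__":
    K = [[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]]
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Camera 2 is shifted by one unit along x; the facet plane is Z = 10.
    H = plane_homography(K, K, I3, [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0], 10.0)
    print(transfer(H, 700.0, 800.0))  # approximately (600.0, 800.0)
```

In the example, the 3D point (2, 3, 10) projects to (700, 800) in image 1 and to (600, 800) in image 2, and the homography reproduces this transfer without reconstructing the point explicitly.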

In this subsection, we define a measure that evaluates the accuracy of hypothesized facets with respect to the data. Recall that, in the multi-facet case, the facets are rigidly joined as shown in

More precisely, our basic idea relies on the following fact: if the shape and the geometric parameters of the building (encoded by the vector _{m}

Recall that _{m}_{j}_{m}_{m}

The choice of the error function

We seek the polyhedral model
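A dissimilarity of this kind can be sketched as follows: each master footprint pixel is transferred into every other image in which the building is visible, and the gray-level differences are accumulated (SAD) or squared and accumulated (SSD). This is a minimal sketch under explicit assumptions: images are 2D lists indexed `img[row][col]`, each secondary image comes with a transfer function (for example a facet homography), samples are taken at the nearest pixel, and the score is normalized by the number of valid pixel pairs. The interpolation and normalization used in the paper are not specified here.

```python
# Sketch of a footprint dissimilarity score (SAD or SSD) over several images.
# Assumptions: nearest-neighbour sampling, normalization by valid pairs.
import math
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[float]]  # img[row][col], gray levels
Transfer = Callable[[float, float], Tuple[float, float]]


def sample_nearest(img: Image, x: float, y: float) -> Optional[float]:
    """Gray level at the pixel nearest to (x, y), or None if outside."""
    col, row = math.floor(x + 0.5), math.floor(y + 0.5)
    if 0 <= row < len(img) and 0 <= col < len(img[0]):
        return img[row][col]
    return None


def dissimilarity(master: Image, footprint: Sequence[Tuple[int, int]],
                  others: Sequence[Image], transfers: Sequence[Transfer],
                  kind: str = "sad") -> float:
    """Mean absolute (SAD) or squared (SSD) difference over all images."""
    total, count = 0.0, 0
    for img, T in zip(others, transfers):
        for x, y in footprint:  # (col, row) master pixels
            g = sample_nearest(img, *T(x, y))
            if g is None:  # transferred pixel falls outside this image
                continue
            diff = master[y][x] - g
            total += abs(diff) if kind == "sad" else diff * diff
            count += 1
    return total / count if count else math.inf
```

With a correct transfer (here, a one-column shift), the score vanishes; with a wrong one, it grows, which is what the optimizer exploits.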

We can also measure the fitness of the 3D model by measuring the gradient norms along the projected 3D segments of the generated 3D models. In general, at facet discontinuities the image gradient is high. Thus, for a good fit, the projection of the 3D segments will coincide with pixels having a high gradient norm in all images. Therefore, we want to maximize the sum of gradient norms along these segments over all images. Recall that we have at most nine segments for our simple 3D polyhedral model. Thus, the gradient score is given by:
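$$G = \sum_{j} \sum_{k} \sum_{p \in s_k^{j}} \left\| \nabla I_j(p) \right\|$$

where $s_k^{j}$ denotes the projection of the $k$-th 3D segment into image $I_j$.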
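The gradient score amounts to sampling each projected segment and summing the image gradient norms at the samples. The sketch below assumes central-difference gradients, nearest-pixel sampling and a unit sampling step; these choices, and the function names, are illustrative rather than taken from the paper.

```python
# Sketch of a gradient score along projected 3D segments.
# Assumptions: central differences, nearest-pixel sampling, unit step.
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]
Point = Tuple[float, float]


def gradient_norm(img: Image, x: float, y: float) -> float:
    """Central-difference gradient norm at the pixel nearest to (x, y)."""
    r, c = math.floor(y + 0.5), math.floor(x + 0.5)
    if not (1 <= r < len(img) - 1 and 1 <= c < len(img[0]) - 1):
        return 0.0  # border or outside: no contribution
    gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
    gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
    return math.hypot(gx, gy)


def segment_gradient(img: Image, p: Point, q: Point, step: float = 1.0) -> float:
    """Sum of gradient norms at regularly spaced samples on [p, q]."""
    n = max(int(math.dist(p, q) / step), 1)
    return sum(gradient_norm(img, p[0] + (q[0] - p[0]) * k / n,
                             p[1] + (q[1] - p[1]) * k / n)
               for k in range(n + 1))


def gradient_score(images: Sequence[Image],
                   projected_segments: Sequence[Sequence[Tuple[Point, Point]]]) -> float:
    """projected_segments[j] lists the 2D segments projected into image j."""
    return sum(segment_gradient(img, p, q)
               for img, segs in zip(images, projected_segments)
               for p, q in segs)
```

A segment lying on an intensity edge collects a high score, whereas a segment inside a homogeneous region collects none, which is why this score rewards models whose edges coincide with facet discontinuities.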

Since we want the dissimilarity measure

It is worth noting that during the optimization of

The image-to-image transfer can be carried out pixel-to-pixel by combining 3D point construction (line of sight intersected with plane) and 3D-to-2D projection; or more directly facet-to-facet by using homographic transfer (
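The pixel-to-pixel route can be sketched as: back-project the master pixel along its line of sight, intersect the ray with the facet plane, and project the 3D point into the other camera. The sketch below assumes the pinhole model $\tilde{x} \simeq K(RX + t)$ in a world frame, with the facet plane written $n \cdot X = d$ in that frame; the function names are hypothetical.

```python
# Sketch: pixel transfer via line-of-sight / plane intersection.
# Assumed camera model: x ~ K (R X + t), plane n . X = d in world frame.
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mat_vec(M: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(M: Matrix) -> Matrix:
    return [[M[j][i] for j in range(3)] for i in range(3)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def backproject_to_plane(K: Matrix, R: Matrix, t: Sequence[float],
                         uv: Tuple[float, float], n: Sequence[float],
                         d: float) -> List[float]:
    """3D point where the line of sight of pixel uv meets the plane."""
    fx, s, cx = K[0]
    fy, cy = K[1][1], K[1][2]
    yn = (uv[1] - cy) / fy            # normalized image coordinates
    xn = (uv[0] - cx - s * yn) / fx   # (inverse of an upper-triangular K)
    Rt = transpose(R)
    direction = mat_vec(Rt, (xn, yn, 1.0))   # ray direction, world frame
    center = [-c for c in mat_vec(Rt, t)]    # projection centre C = -R^T t
    denom = dot(n, direction)
    if abs(denom) < 1e-12:
        raise ValueError("line of sight parallel to the facet plane")
    lam = (d - dot(n, center)) / denom
    return [center[i] + lam * direction[i] for i in range(3)]


def project(K: Matrix, R: Matrix, t: Sequence[float],
            X: Sequence[float]) -> Tuple[float, float]:
    """Pinhole projection of a world point."""
    Xc = [a + b for a, b in zip(mat_vec(R, X), t)]
    u, v, w = mat_vec(K, Xc)
    return u / w, v / w


def transfer_pixel(K1, R1, t1, K2, R2, t2, uv, n, d):
    """Master pixel -> plane point -> pixel in the second image."""
    return project(K2, R2, t2, backproject_to_plane(K1, R1, t1, uv, n, d))
```

For a planar facet, this route and the homographic transfer give the same result; the homography simply folds the three steps into a single 3×3 matrix per facet.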

In order to minimize

In this subsection, we briefly describe the mechanisms and the goals of optimization processes. Moreover, we select an optimizer adapted to the considered modeling problem.

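In our case, the approach begins by approximating any building model by one horizontal facet, whose altitude is searched in $[Z_{\min}, Z_{\max}]$ with $Z_{\min} = Z_{ground\_min} + H_{\min}$ and $Z_{\max} = Z_{ground\_max} + H_{\max}$, where $Z_{ground\_min}$ and $Z_{ground\_max}$ are the minimum and maximum ground altitudes and $H_{\min}$ and $H_{\max}$ are the minimum and maximum building heights.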
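The zero-order (prismatic) approximation reduces the problem to a single unknown, the roof altitude, bounded by prior ranges of ground altitude and building height. The sketch below performs a plain grid search over that interval; the search strategy, the step size and the parameter names (`z_ground_min`, `h_min`, and so on) are our assumptions, and `cost` stands for any dissimilarity evaluated on the horizontal facet at altitude `z`.

```python
# Sketch: 1D search of the prismatic (horizontal facet) altitude.
# Assumptions: exhaustive grid search with a fixed step; `cost` is a
# user-supplied dissimilarity for the horizontal facet at altitude z.
from typing import Callable, Tuple


def prismatic_altitude(cost: Callable[[float], float],
                       z_ground_min: float, z_ground_max: float,
                       h_min: float, h_max: float,
                       step: float = 0.25) -> Tuple[float, float]:
    """Return (best altitude, best cost) over [z_min, z_max]."""
    z_min, z_max = z_ground_min + h_min, z_ground_max + h_max
    n = int((z_max - z_min) / step)
    best_z, best_c = z_min, cost(z_min)
    for k in range(1, n + 1):
        z = z_min + k * step
        c = cost(z)
        if c < best_c:
            best_z, best_c = z, c
    return best_z, best_c
```

Because only one parameter is involved, an exhaustive search is cheap; the result then serves as the rough solution around which the full model is optimized.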

The Differential Evolution (DE) algorithm belongs to the family of evolutionary algorithms and borrows ideas from both Genetic Algorithms and Evolution Strategies. Genetic algorithms modify the structure of individuals using mutation and crossover, whereas evolution strategies achieve self-adaptation through the geometric manipulation of individuals. These ideas were formulated as a simple and powerful vector mutation operation proposed in 1995 by Price and Storn ([

The DE algorithm is employed to compute the 3D model; it carries out the minimization guided by the dissimilarity measure defined previously. The algorithm evolves successive generations of candidate solutions (populations). The population of the first generation is randomly chosen around a rough solution, which thus defines a given distribution for the model parameters. The rough solution is simply given by a zero-order approximation model (the prismatic model), which is itself obtained by minimizing the dissimilarity score over a single unknown (the average height of the roof).
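The generation scheme described above can be sketched with the common DE/rand/1/bin variant: a first population drawn around the rough solution, difference-vector mutation, binomial crossover and greedy selection. The variant, the control parameters `F` and `CR`, and the uniform initial spread are our assumptions; the defaults of 30 individuals and 30 generations mirror the settings reported in the experiments.

```python
# Sketch of Differential Evolution (DE/rand/1/bin) started around a rough
# solution x0. Variant and F, CR values are assumptions, not the paper's.
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(cost: Callable[[List[float]], float],
                           x0: Sequence[float], spread: Sequence[float],
                           bounds: Sequence[Tuple[float, float]],
                           pop_size: int = 30, generations: int = 30,
                           F: float = 0.7, CR: float = 0.9,
                           seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `cost`; requires pop_size >= 4."""
    rng = random.Random(seed)
    dim = len(x0)

    def clip(v: List[float]) -> List[float]:
        return [min(max(v[i], bounds[i][0]), bounds[i][1]) for i in range(dim)]

    # First generation: random individuals around the rough solution x0.
    pop = [clip([x0[i] + rng.uniform(-spread[i], spread[i]) for i in range(dim)])
           for _ in range(pop_size)]
    scores = [cost(p) for p in pop]
    for _ in range(generations):
        new_pop, new_scores = list(pop), list(scores)
        for i in range(pop_size):
            # Mutation: base vector plus a scaled difference of two others.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees one mutant component.
            j_rand = rng.randrange(dim)
            trial = clip([mutant[d] if (d == j_rand or rng.random() < CR)
                          else pop[i][d] for d in range(dim)])
            # Greedy selection between parent and trial.
            s = cost(trial)
            if s <= scores[i]:
                new_pop[i], new_scores[i] = trial, s
        pop, scores = new_pop, new_scores
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

In the building context, `cost` would be the combined objective evaluated on the eight model parameters, and `x0` the parameters of the prismatic model; no derivative of the cost is ever needed.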

In our case, the use of the DE algorithm is described in

We use the Differential Evolution optimizer because it has four interesting properties: (i) it does not need an accurate initialization; (ii) it can integrate geometric constraints depending on the context (adaptability); for example, constraints can be imposed to ensure that the polyhedral roof model is an assembly of facets with slopes below 60° (standard information from urban databases concerning the area under study); (iii) it does not need the partial derivatives of the cost function; and (iv) in principle, it can reach the global optimum. Moreover, this algorithm is easy to implement and to integrate into applications. In our case, the experiments show that only a few iterations lead to convincing results.
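A slope constraint such as the 60° limit can be integrated into a derivative-free optimizer by penalizing infeasible candidates. This is a minimal sketch, assuming each facet is given by three 3D vertices with Z vertical; `facets_of` is a hypothetical function mapping a parameter vector to the model facets, and the fixed penalty value is an arbitrary choice.

```python
# Sketch: facet slope from three vertices, and a penalized cost enforcing
# a maximum slope. `facets_of` and the penalty value are hypothetical.
import math
from typing import Callable, List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Facet = Tuple[Point3, Point3, Point3]


def facet_slope_deg(A: Point3, B: Point3, C: Point3) -> float:
    """Angle (degrees) between the facet normal and the vertical axis."""
    u = [B[i] - A[i] for i in range(3)]
    v = [C[i] - A[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        raise ValueError("degenerate facet")
    # abs(): the result does not depend on the normal's orientation.
    return math.degrees(math.acos(min(1.0, abs(n[2]) / norm)))


def constrained_cost(cost: Callable[[Sequence[float]], float],
                     facets_of: Callable[[Sequence[float]], List[Facet]],
                     max_slope_deg: float = 60.0,
                     penalty: float = 1e9) -> Callable[[Sequence[float]], float]:
    """Wrap `cost` so that models with too steep a facet are rejected."""
    def wrapped(params: Sequence[float]) -> float:
        if any(facet_slope_deg(*f) >= max_slope_deg for f in facets_of(params)):
            return penalty
        return cost(params)
    return wrapped
```

Since DE only compares cost values, such a penalty steers the population away from infeasible roofs without any change to the optimizer itself.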

This subsection deals with the one-facet or multi-facet selection. Several automatic strategies can be employed:

The first strategy consists in reconstructing the two models independently (one-facet model (sloped roof) and multi-facet model). Each calculated model provides a SAD score. The 3D model finally obtained will be the solution providing the minimum score among the models shown in

The second strategy (adopted) exploits the putative estimation of four facet normals. The rectangular building footprint is divided into four triangular facets as in a pyramidal model, and a first angular score is set to the highest deviation between the four facet normals and the vertical direction. If this score is less than a predefined tolerance threshold for the verticality of the normals, denoted 𝒯_{⊥} (a low angular deviation fixed empirically), then the retained model is the prismatic model initially computed. Otherwise, a second angular score is set to the highest deviation between the facet normals. If this score is less than a predefined tolerance threshold for the parallelism of the normals, denoted 𝒯_{‖}, then the one-facet estimation (e.g.,
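The selection logic can be sketched from the four putative normals: first test their verticality against 𝒯_{⊥}, then their mutual parallelism against 𝒯_{‖}. In this sketch the thresholds are caller-supplied (their empirical values are not given here), and we assume the remaining case falls back to the multi-facet model.

```python
# Sketch of roof-model selection from putative facet normals.
# Thresholds are caller-supplied; the multi-facet fallback is assumed.
import math
from itertools import combinations
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def angle_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Unsigned angle between two directions (normal orientation ignored)."""
    d = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, abs(d) / (na * nb)))))


def select_roof_model(normals: Sequence[Vec3], t_vertical_deg: float,
                      t_parallel_deg: float) -> str:
    """Return 'prismatic', 'one-facet' or 'multi-facet'."""
    s_vert = max(angle_deg(n, (0.0, 0.0, 1.0)) for n in normals)
    if s_vert < t_vertical_deg:       # all normals nearly vertical
        return "prismatic"
    s_par = max(angle_deg(a, b) for a, b in combinations(normals, 2))
    if s_par < t_parallel_deg:        # all normals nearly parallel
        return "one-facet"
    return "multi-facet"
```

For example, four normals tilted by the same amount in the same direction indicate a single sloped roof, while normals tilted towards the four sides of the footprint indicate a hip roof.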

It is worth noting that erroneous feature-based solutions as illustrated in

In this section, we present the dataset employed as input of the proposed approach as well as the evaluations and the results obtained by our reconstruction method. We carry out several evaluations in order to analyze the convergence, the robustness and the accuracy of our image-based approach. These evaluations demonstrate the high potential of our modeling approach.

The considered input dataset contains multiscopic gray-scale aerial images (see sample

In this subsection, we measure the quality of the reconstruction obtained using the DE algorithm. We reconstruct one generic 3D facet by running the DE algorithm for 30 iterations. The considered facet contains 5,424 pixels. The population size is fixed to 30 individuals.

Additional evaluations and results of reconstructed 3D building models and convergence are illustrated in

The direct, featureless image-based approach provides satisfactory 3D models, since the results are very close to the ground truth data. As can be seen in

Accuracy is also studied in the following subsections, which evaluate the robustness of the proposed approach in complex cases.

In order to get a quantitative evaluation of the 3D accuracy of the proposed approach, we adopted a simple and cheap scheme. For the sake of simplicity, we limited the study to a triangular facet that is viewed in two aerial images. In this scheme, we employ semi-synthetic aerial images (see

We used two kinds of image noise: uniform and Gaussian. The first three tests correspond to uniform noise, and the next three to Gaussian noise (see

The progressively increasing noise simulates building images of different quality levels, which are then used for 3D reconstruction. The first noise level simulates slight acquisition defects. In this way, we can test the robustness of our reconstruction method with respect to the quality of the acquired images.
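The noise protocol can be sketched as follows for 8-bit gray-level images: integer noise drawn uniformly from $[-a, a]$, or zero-mean Gaussian noise of standard deviation $\sigma$, with results clipped to [0, 255]. The clipping and rounding are our assumptions; the uniform magnitudes 4, 8, 16 and 32 correspond to the intervals given in the noise figure, while the Gaussian standard deviations are not stated here and are left as a parameter.

```python
# Sketch: adding uniform or Gaussian noise to an 8-bit gray-level image.
# Clipping to [0, 255] and rounding are assumptions of this sketch.
import random
from typing import List

Image = List[List[int]]


def _clip(v: int) -> int:
    return min(255, max(0, v))


def add_uniform_noise(img: Image, a: int, rng: random.Random) -> Image:
    """Add integer noise drawn uniformly from [-a, a]."""
    return [[_clip(v + rng.randint(-a, a)) for v in row] for row in img]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[_clip(round(v + rng.gauss(0.0, sigma))) for v in row] for row in img]


if __name__ == "__main__":
    rng = random.Random(0)
    facet = [[128] * 8 for _ in range(8)]
    noisy_levels = [add_uniform_noise(facet, a, rng) for a in (4, 8, 16, 32)]
```

Using a seeded generator makes each noisy test image reproducible, so that reconstruction errors can be compared across noise levels.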

We observe that the noise added to the footprint images does not severely affect the accuracy of the 3D reconstruction. Depending on the type of noise, the average vertex location error can reach 33 cm, the average slope-angle error 3.5° and the average vertex altitude error 31 cm. The maximum errors are roughly twice as large, with a maximum slope-angle error of 7.2°, a maximum location deviation of 62 cm and a maximum altitude deviation of 53 cm. The location and slope errors seem to oscillate, whereas the altitude error tends to increase with the noise. Nevertheless, even with a high noise magnitude, the location deviations remain below one meter and the slope-angle deviation below 10°. These values indicate that the quality and resolution of the acquired images are sufficient to allow accurate 3D building reconstruction with the proposed method.
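The error statistics above can be computed as in the following sketch. We assume here that the location error is the planimetric (XY) distance between corresponding vertices, that the altitude error is the absolute Z difference, and that the slope error compares the slopes of corresponding facets; the paper's exact definitions may differ. The usage example applies these functions to the DEM-based and featureless vertex coordinates listed in the comparison table, purely to illustrate the computation (the DEM is not ground truth).

```python
# Sketch of vertex and slope error statistics between two 3D models.
# Assumptions: location error = XY distance, altitude error = |dZ|.
import math
from statistics import mean
from typing import Dict, Sequence, Tuple

Point3 = Tuple[float, float, float]


def vertex_errors(estimated: Sequence[Point3],
                  reference: Sequence[Point3]) -> Dict[str, float]:
    xy = [math.hypot(e[0] - r[0], e[1] - r[1]) for e, r in zip(estimated, reference)]
    z = [abs(e[2] - r[2]) for e, r in zip(estimated, reference)]
    return {"mean_xy": mean(xy), "max_xy": max(xy),
            "mean_z": mean(z), "max_z": max(z)}


def slope_deg(A: Point3, B: Point3, C: Point3) -> float:
    """Facet slope: angle between its normal and the vertical (degrees)."""
    u = [B[i] - A[i] for i in range(3)]
    v = [C[i] - A[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    return math.degrees(math.acos(min(1.0, abs(n[2]) / math.sqrt(sum(c * c for c in n)))))


def slope_error_deg(est: Sequence[Point3], ref: Sequence[Point3]) -> float:
    return abs(slope_deg(*est) - slope_deg(*ref))


if __name__ == "__main__":
    dem = [(117.59, 396.80, 26.95), (123.96, 387.79, 23.70),
           (108.36, 390.32, 24.51), (116.70, 391.62, 24.97)]
    ours = [(117.66, 396.75, 27.21), (124.05, 387.74, 23.80),
            (108.33, 390.26, 23.85), (116.66, 391.67, 25.06)]
    print(vertex_errors(ours, dem))
```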

In this subsection, we study the performance of the approach when the image resolution is reduced. We conducted a simple experiment on a triangular facet containing 5,432 pixels. We generated a sub-sampled facet by dropping every other column of the original image, thereby simulating a facet image with reduced resolution.
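Dropping every other column can be sketched as below. If columns 0, 2, 4, … are kept, an original column $u$ becomes column $u/2$, so a consistent camera model is obtained by halving the first row of the intrinsic matrix ($f_x$, skew and $c_x$). This intrinsic adjustment is our assumption of how the calibration would be kept consistent; the section does not state how the camera model was adapted.

```python
# Sketch: column sub-sampling and the matching intrinsic matrix.
# Assumption: columns 0, 2, 4, ... are kept, so u_new = u_old / 2.
from typing import List

Image = List[List[float]]
Matrix = List[List[float]]


def drop_every_other_column(img: Image) -> Image:
    """Keep columns 0, 2, 4, ... of every row."""
    return [row[::2] for row in img]


def subsampled_intrinsics(K: Matrix) -> Matrix:
    """Halve fx, skew and cx; the row (v) parameters are unchanged."""
    return [[v / 2.0 for v in K[0]], list(K[1]), list(K[2])]


if __name__ == "__main__":
    img = [[c + 10 * r for c in range(6)] for r in range(4)]
    print(drop_every_other_column(img))  # [[0, 2, 4], [10, 12, 14], ...]
```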

As previously mentioned, we aim to reconstruct planar roofs from aerial images. An important question then arises: what is the effect of superstructures on our reconstruction method? Indeed, a large majority of buildings incorporate superstructures, which may generate unwanted noise since their 3D structures do not lie in the dominant plane associated with the facet.

In this section, we present a method that increases the robustness of the reconstruction of facets containing superstructures. The aim is to prevent the superstructures from distorting the estimation of the planar roofs. The idea consists in (i) detecting the superstructure pixels and (ii) using the footprint from which these pixels have been removed. Assuming that the 3D plane computed by the Differential Evolution algorithm is relatively accurate, we can classify the associated pixels into two categories: pixels belonging to the dominant plane and outlier pixels (pixels that do not belong to the plane). The proposed method proceeds in two passes:

In the first pass, the DE algorithm is used with the totality of the reference footprint.

In the second pass, the DE algorithm is used with only those pixels considered as belonging to the dominant plane.

Once the plane has been estimated in the first pass, several techniques can be used to carry out a coarse classification of the pixels. Pixels that do not belong to the theoretical plane of the facet have a significant residual (absolute difference between the gray levels in the different images), since their pixel-to-pixel transfer is incorrect. The idea is then to detect the pixels with significant residuals. We present two techniques based on thresholding the individual residuals:

The first technique selects the outlier pixels by determining a threshold for the residuals. The empirical threshold 𝒯_{emp} is defined as follows:

For

The second technique uses another formula for the threshold. This threshold noted 𝒯_{gen} is defined by:

We observe in _{gen} are removed. The adopted threshold 𝒯_{gen} provides satisfying results for massive and generic filtering of the facet superstructures.
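The two-pass scheme can be sketched generically: fit on the whole footprint, compute per-pixel residuals, discard pixels whose residual exceeds a threshold, and fit again on the remaining pixels. In this sketch, `fit` and `residuals` are hypothetical callables (in practice, the DE estimation and the gray-level residuals), and the threshold is supplied by the caller; the formulas of 𝒯_{emp} and 𝒯_{gen} are not reproduced here.

```python
# Sketch of the two-pass outlier filtering. `fit` and `residuals` are
# hypothetical callables; the threshold formula is left to the caller.
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")   # pixel type
M = TypeVar("M")   # model (plane) type


def two_pass_estimate(fit: Callable[[Sequence[P]], M],
                      residuals: Callable[[M, Sequence[P]], List[float]],
                      footprint: Sequence[P],
                      threshold: float) -> Tuple[M, List[P]]:
    """Return the second-pass model and the pixels kept as inliers."""
    first = fit(footprint)                  # pass 1: whole footprint
    res = residuals(first, footprint)
    inliers = [p for p, r in zip(footprint, res) if r <= threshold]
    if not inliers:                         # nothing kept: keep pass 1
        return first, list(footprint)
    return fit(inliers), inliers            # pass 2: dominant-plane pixels
```

In the toy usage below, a "plane" is reduced to a mean height so that the effect is easy to follow: a single superstructure value biases the first pass, and the second pass recovers the dominant level.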

We compared the modeling methods using the solution provided by the DEM as a reference. We applied the reconstruction methods to one facet including superstructures, namely several chimneys (see

We focus on the altitude of the vertices, since it is the parameter that varies the most (see

The most accurate method appears to be the one based on the SAD measure. Moreover, the presence of superstructures can affect the reconstructed 3D model, so a filtering stage is necessary to increase the accuracy of the solution. At this stage, we plan to test several ways of integrating superstructure filtering in order to improve the model. We stress that the DEM-based reference solution does not correspond to the ground truth.
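The "Average deviation" row of the SAD/SSD comparison table is consistent with the mean absolute altitude difference between each method and the DEM reference over the three vertices. The short check below recomputes it from the tabulated altitudes; reading the row this way is our interpretation, which the values confirm.

```python
# Check: average absolute altitude deviation from the DEM reference,
# using the altitudes (in meters) of the SAD/SSD comparison table.
from statistics import mean

DEM = [41.96, 41.36, 39.78]
RESULTS = {
    "SAD, including superstructures": [42.92, 41.10, 39.62],
    "SAD, with filtering": [42.22, 40.98, 40.22],
    "SSD, including superstructures": [43.61, 40.84, 38.88],
    "SSD, with filtering": [42.75, 40.87, 40.10],
}


def average_deviation(reference, estimated):
    """Mean absolute difference between corresponding altitudes."""
    return mean(abs(r - e) for r, e in zip(reference, estimated))


if __name__ == "__main__":
    for name, alts in RESULTS.items():
        print(f"{name}: {average_deviation(DEM, alts):.2f} m")
    # prints 0.46 m, 0.36 m, 1.02 m and 0.53 m, as in the table
```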

As previously mentioned, the correct image registration of the building footprint leads to a correct 3D building model.

In this work, we provided an overview of some problems and solutions related to 3D building modeling. We proposed a new methodology for 3D building reconstruction based on a featureless process. To the best of our knowledge, this method has never been exploited for the 3D building modeling problem. Unlike existing methods, it avoids explicit pixel-to-pixel matching. Such matching is nevertheless a by-product of the proposed method: once the 3D shape of the building is known, the image-to-image transfer follows from the associated homographies. The optimization associated with the proposed method has been carried out using the Differential Evolution algorithm. The method has been validated using real and simulated images, and compared with DEM-based modeling approaches. A comparison with all existing feature-based approaches is beyond the scope of the current work; indeed, it is well known that featureless approaches outperform feature-based approaches in terms of the accuracy of the estimated geometric transforms used for image registration. The proposed method provides satisfactory polyhedral building reconstruction from calibrated gray-scale aerial images. The proposed top-down approach can also rectify erroneously reconstructed polyhedral building models in existing 3D city models whenever the corresponding aerial images are available. Furthermore, we argue that the proposed modeling method provides a novel tool that can be used in existing large-scale urban modeling pipelines, either as a main or as a complementary tool.

Future work will be concentrated on the following directions:

Testing other dissimilarity measures. New dissimilarity measures could be used and evaluated.

Improving the reconstruction of roofs with superstructures. One possible solution is to integrate the outlier pixel filtering into the Differential Evolution algorithm: each individual would then provide a normalized SAD score that considers only the pixels belonging to the roof plane, so that the pixels belonging to superstructures would not contribute to the score. This superstructure filtering would generate a different distribution in the progeny and should provide a more accurate solution. A second scenario consists in iterating the proposed two-pass scheme several times.

As previously mentioned, the registration process is carried out between a reference footprint (with fixed boundaries), selected by an operator in the master image, and the raw brightness of the aerial images of the multiscopic dataset in which the building is visible. Consequently, the initial boundary line segments of the reference footprint could be slightly shifted. In future work, we intend to use a method able to deform a rough 2D footprint into a precise 2D footprint. This rectification process should precede the image-based 3D reconstruction. We stress that the use of cadastral maps could relax the requirement of having an accurate 2D footprint.

In our work, we assume that the building roof has conventional and simple shapes. Research could be done in the future in order to extend the direct approach to the case of generic buildings and roofs with atypical shapes.

Examples of generic model representations: three illustrations of the same building with different levels of detail (from low to high).

The upper part of this figure illustrates an example of 3D building modeling process using a DSM. The middle part of this figure shows image-based feature extraction and assembly. The lower part shows our proposed direct and featureless image-based approach.

Flowchart diagram currently adopted by some image-based building modeling approaches. The diagram presents two paths leading to 3D polyhedral building models. These two paths are illustrated by the first two rows of

Some erroneous buildings reconstructed by a known feature-based framework for massive building reconstruction (the BATI-3D® prototype software, a large-scale building modeling pipeline developed at the French National Geographical Agency). The estimated 3D models are projected onto the image or DSM.

Samples of parametric building models _{g}_{c}

The adopted generic 3D polyhedral model. The multi-facet model (_{A}_{B}_{C}_{D}_{I}_{M}_{N}_{H}_{1} corresponds to the center of projection of camera 1. Blue and green lines are outer and inner lines of sight (perspective lines), respectively. Π_{G}

Flowchart diagram of the proposed approach (top) and illustrations of the main steps (bottom).

Homography induced by a plane.

Illustration of one iteration of the DE algorithm.

(a) Building footprint initially selected. No prior knowledge of the model shape is known. (b) Estimated prismatic model (algorithm initialization). (c) Estimated one-facet model (sloped roof). (d) Estimated multi-facet model (hip roof).

A sample pair of aerial images extracted from the multiscopic dataset. Each image covers a common area of the city of Marseille acquired from different points of view (partial overlap). The size of the images is $\mathcal{N}_c \times \mathcal{N}_r$ pixels (columns × rows).

(a) illustrates the facet in the master image. (b) illustrates the successively estimated 3D facets during the evolution of DE algorithm. (c) and (d) illustrate the final estimated 3D model.

Estimated 3D polyhedral building model and related convergence.

Estimated 3D polyhedral building model and related convergence.

The best solution at several iterations of the Differential Evolution algorithm. The evolution of the 3D model and the footprint in the associated image is shown. The proposed algorithm converges to an optimal final solution in a few iterations.

Adding noise to a facet for robustness evaluation. The gray-level intensities of the pixels belong to the interval [0, 255]. The magnitude of the uniform noise progressively increases over the intervals [−4, 4], [−8, 8], [−16, 16] and [−32, 32]. These four noise levels are shown in (b) (the random noise is applied to the bottom facet).

Error on the facet height.

Error on the sloping angle.

Error on the vertex 3D positions.

Automatic detection and filtering of the superstructures. The threshold 𝒯_{gen}

Filtering out the superstructures. (b) Tuning the threshold 𝒯_{emp}

Correct building modeling in the presence of significant shadows. The master images are not shown.

Some feature-based approaches developed for 3D polyhedral building modeling from aerial images.

| Paper | Process | Input data | Strategy |
|---|---|---|---|
| Jibrini | Automatic | Urban map/Aerial Images | Bottom-up |
| Taillandier | Automatic | Aerial Images | Bottom-up |
| Fischer | Automatic | Aerial Images | Hybrid |
| Lafarge | Automatic | Aerial Images | Top-down |
| Jaynes | Automatic | DEM/Aerial Images | Hybrid |
| Zebedin | Automatic | Aerial Images | Bottom-up |
| Tseng | Interactive | Aerial Images | Top-down |

Comparison of 3D modeling results obtained in the first case from a DEM-based approach and in the second case from our direct image-based approach.

| DEM-based approach | Featureless proposed approach |
|---|---|
| (117.59, 396.80, 26.95) | (117.66, 396.75, 27.21) |
| (123.96, 387.79, 23.70) | (124.05, 387.74, 23.80) |
| (108.36, 390.32, 24.51) | (108.33, 390.26, 23.85) |
| (116.70, 391.62, 24.97) | (116.66, 391.67, 25.06) |

Comparison of 3D modeling results obtained with the original-resolution and sub-sampled images.

| Original resolution images | Sub-sampled images |
|---|---|
| (117.65, 396.75, 27.14) | (117.65, 396.77, 26.87) |
| (124.05, 387.75, 23.70) | (124.05, 387.75, 23.76) |
| (108.35, 390.24, 24.44) | (108.34, 390.25, 24.27) |

Comparison of the modeling results obtained with the SAD and SSD scores on a facet including superstructures, with and without the filtering process.

| | DEM | SAD (including superstructures) | SAD (with filtering) | SSD (including superstructures) | SSD (with filtering) |
|---|---|---|---|---|---|
| Vertex altitude | 41.96 m | 42.92 m | 42.22 m | 43.61 m | 42.75 m |
| Vertex altitude | 41.36 m | 41.10 m | 40.98 m | 40.84 m | 40.87 m |
| Vertex altitude | 39.78 m | 39.62 m | 40.22 m | 38.88 m | 40.10 m |
| Average deviation | 0.0 m | 0.46 m | 0.36 m | 1.02 m | 0.53 m |