Improving Details of Building Façades in Open LiDAR Data Using Ground Images

Recent open data initiatives allow free access to a vast amount of light detection and ranging (LiDAR) data in many cities. However, most open LiDAR data of cities are acquired by airborne scanning, where points on building façades are sparse or even completely missing due to occlusions in the urban environment, leading to the absence of façade details. This paper presents an approach for improving the LiDAR data coverage on building façades by using point cloud generated from ground images. A coarse-to-fine strategy is proposed to fuse these two-point clouds of different sources with very limited overlaps. First, the façade point cloud generated from ground images is leveled by adjusting the facade normal to perpendicular to the upright direction. Then leveling façade point cloud is geolocated by alignment between images GPS data and their structure from motion (SfM) coordinates. Next, a modified coherent point drift algorithm with (surface) normal consistency is proposed to accurately align the façade point cloud to the LiDAR data. The significance of this work resides in the use of 2D overlapping points on the building outlines instead of the limited 3D overlap between the two-point clouds. This way we can still achieve reliable and precise registration under incomplete coverage and ambiguous correspondence. Experiments show that the proposed approach can significantly improve the façade details in open LiDAR data, and achieve 2 to 10 times higher registration accuracy, when compared to classic registration methods.


Introduction
In recent years, there have been a lot of significant, global open data initiatives.They include vast amounts of open datasets in many North American cities [1][2][3] and large projects, such as the Infrastructure for Spatial Information in the European Community (INSPIRE) [4,5].As an important part among these open data, LiDAR data [6,7] are widely used for deriving three-dimensional (3D) spatial information over large areas.Due to free access of open LiDAR data, new avenues of research for students, researchers, and other LiDAR data user communities have been opened [8][9][10].However, these open LiDAR data are often sparse, incomplete, or even entirely void on building façades due to occlusions in the urban environment.This problem makes it difficult or impossible to achieve fine, complete building reconstruction at high levels of detail [11].
Recently, ground image capture devices, such as off-the-shelf digital cameras, smartphones with global positioning system (GPS) readings, and digital compasses, have become prevalent.They allow researchers to acquire a number of high-resolution images of building façades via crowdsourcing at a low cost.Considering that ground images are complementary to open LiDAR data, the former contains rich façade details and the later provides accurate roof information, so fusing façade point clouds generated from ground images into open LiDAR data is a promising way to improve the details of façades in the LiDAR data.Generating point cloud from multiple images is an essential task in the field of both photogrammetry and computer vision.To reconstruct 3D information from images, interior orientation parameters (focal length, principal point position, and camera distortion parameters) and exterior orientation parameters (locations and orientations) of cameras need to be estimated at first.This process was first introduced and solved as an analogue procedure using electrical circuits [12].After decades of development, the automation level and accuracy has been greatly improved [13][14][15].The state-of-the-art triangulation technology, or called structure from motion (SfM) in computer vision, has been able to precisely orient unordered image sets [16][17][18].Many dense matching methods have been proposed to generate highly detailed and dense point cloud of objects using calculated camera orientation parameters.Zhang et al. [19] proposed a dense matching approach for automatic DSM generation from high-resolution satellite images by using a coarse-to-fine hierarchical solution with an effective combination of several image matching algorithms and automatic quality control.Hirschmüller [20] introduced the semi-global matching (SGM) method that uses dynamic programming to achieve a pixel-wise matching result.Furukawa et al. [21] proposed a patch-based matching method that outputs a quasi-dense set of patches covering the surface visible in the images.Vu et al. [22] start directly from a rough mesh and further improve it according to a variational refinement of photo consistency energy.Based on the works of previous researchers, many open source programs [21,[23][24][25] have emerged, such as COLMAP [23], a general-purpose SfM and dense point cloud generation pipeline with high reliability under a variety of conditions.We can use these open source programs to process ground images to recover façade information precisely and in fine detail.
Various studies have focused on the fusion of multi-source data to reconstruct buildings.According to the types of fused data, these studies can be divided into the following situations: (1) Various sources of laser scanning data.Böhm [26] proposed an method for fusing airborne LiDAR scanning (ALS) and terrestrial LiDAR scanning (TLS) with overlaps using the iterative closest point (ICP) [27] algorithm.Boulaassal et al. [28] combined ALS, TLS, and vehicle LiDAR scanning (VLS) data to produce reliable 3D building models.However, the high cost of using several kinds of laser scanners limits the applications of this technique.Despite the recent emergence of many low-cost LiDAR systems [29][30][31], the inadequate density and quality of point clouds obtained from them introduces new difficulties in building reconstruction.(2) Aerial and ground images.Shan et al. [32] handled this situation using a viewpoint-dependent matching method so that the aerial and the ground images could be accurately matched to generate high-quality multi-view stereo models.However, the overlaps between ground images and the aerial images are required.(3) LiDAR data and images.Rönnholm et al. [33] present an overview of various levels of integration between laser scanning and photogrammetric images.Various methods to establish correspondences between the two different datasets, such as tie points [34,35], structural features [36][37][38], orthophoto (lasermap) [39,40] and other methods [41,42] have been studied.In a word, all the above works are based on the precondition of a certain degree of overlaps to establish correspondences among datasets for registration.However, there are limited overlaps between open LiDAR data and façade point cloud generated from ground images.The accurate fusion of the two sources of point clouds has not yet been adequately studied.
Essentially, the fusion of the façade point cloud and open LiDAR data is a process of point set registration that maps one-point set to the other according to their correspondences.Point set registration is a crucial step in many photogrammetry and computer vision tasks, including medical imaging [43], heritage reconstruction [44], and industrial applications [45].The iterative closest point (ICP) algorithm [27] is the most widely used and classic point set registration algorithm due to its simplicity and low computational complexity [46,47] compared with algorithms using local feature extraction [48], deterministic annealing [49], or probabilistic method [46,50].It iteratively assigns correspondence based on a closest distance criterion and finds the rigid transformation using a least squares approach between the pair of point sets until a local minimum is reached.A major drawback of the standard ICP algorithm is that it demands an accurate initial guess of the correspondence between two-point sets, otherwise, it may fall into a local minimum or even be non-convergent.Another drawback of the standard ICP algorithm is that it has a linear convergence behavior that requires dozens of iterations.Many ICP-based variants have been proposed to address these weaknesses [51][52][53][54][55][56].Myronenko et al. proposed a probabilistic-based point set registration algorithm [46] which is called coherent point drift (CPD).CPD considers the alignment of a pair of point sets as a probability density estimation problem where one-point set represents the Gaussian mixture model (GMM) centroids and the other represents the data points.A similarity transformation that aligns GMM centroids to data points is obtained by maximizing the GMM posterior probability for data points, determining an optimum value.The CPD algorithm, which exhibits a linear computational complexity, outperforms most state-of-the-art algorithms and achieves promising results with respect to conditions of noise, outliers, and missing points.However, most of these registration methods, including ICP and CPD, failed to register the façade point cloud and open LiDAR data because of very limited overlaps between the two sources of point clouds.
We proposed a coarse-to-fine approach to fuse the open LiDAR data and the façade point clouds generated from ground images, to improve the details of the building façades in the LiDAR data.First, the façade point cloud generated from ground images is leveled by adjusting the facade normal to perpendicular to the upright direction.Then, an initial geolocalization of the leveling façade point cloud is performed respectively in horizontal and vertical direction by aligning the SfM camera positions to their GPS imaging meta-data, so as to reduce the large differences in rotation, scale, and translation between the two kinds of point clouds.Second, accurate registration of two 3D point clouds is converted to a 2D outline information registration solved by our modified CPD algorithm with normal consistency (NC-CPD) and a vertical translation.The significance of the work resides in the best use of the most likely overlap between the two-point clouds and the achievement of reliable and precise registration under possibly incomplete coverage and ambiguous correspondence.
The overview of the proposed method is illustrated in Figure 1.The remainder of this paper is structured as follows.In Section 2, we describe our approach for aligning the façade point cloud generated from ground images to open LiDAR data.Section 3 presents experiment results and discusses the performance of the proposed approach.Finally, we conclude the paper in Section 4.

Remote Sens. 2018, 10, x FOR PEER REVIEW 3 of 19
Another drawback of the standard ICP algorithm is that it has a linear convergence behavior that requires dozens of iterations.Many ICP-based variants have been proposed to address these weaknesses [51][52][53][54][55][56].Myronenko et al. proposed a probabilistic-based point set registration algorithm [46] which is called coherent point drift (CPD).CPD considers the alignment of a pair of point sets as a probability density estimation problem where one-point set represents the Gaussian mixture model (GMM) centroids and the other represents the data points.A similarity transformation that aligns GMM centroids to data points is obtained by maximizing the GMM posterior probability for data points, determining an optimum value.The CPD algorithm, which exhibits a linear computational complexity, outperforms most state-of-the-art algorithms and achieves promising results with respect to conditions of noise, outliers, and missing points.However, most of these registration methods, including ICP and CPD, failed to register the façade point cloud and open LiDAR data because of very limited overlaps between the two sources of point clouds.
We proposed a coarse-to-fine approach to fuse the open LiDAR data and the façade point clouds generated from ground images, to improve the details of the building façades in the LiDAR data.First, the façade point cloud generated from ground images is leveled by adjusting the facade normal to perpendicular to the upright direction.Then, an initial geolocalization of the leveling façade point cloud is performed respectively in horizontal and vertical direction by aligning the SfM camera positions to their GPS imaging meta-data, so as to reduce the large differences in rotation, scale, and translation between the two kinds of point clouds.Second, accurate registration of two 3D point clouds is converted to a 2D outline information registration solved by our modified CPD algorithm with normal consistency (NC-CPD) and a vertical translation.The significance of the work resides in the best use of the most likely overlap between the two-point clouds and the achievement of reliable and precise registration under possibly incomplete coverage and ambiguous correspondence.
The overview of the proposed method is illustrated in Figure 1.The remainder of this paper is structured as follows.In Section 2, we describe our approach for aligning the façade point cloud generated from ground images to open LiDAR data.Section 3 presents experiment results and discusses the performance of the proposed approach.Finally, we conclude the paper in Section 4.

Methodology
Given a ground image set {I i |i = 1, 2, . . .G}, COLMAP [23], a general-purpose SfM and MVS pipeline, is used to generate the façade point cloud M loc and the camera positions C loc i i = 1, 2, . . .G in the SfM local coordinate system.Additionally, the GPS meta-information

Initial Geolocalization
Since the alignment between the façade point cloud M loc in the local coordinate system and open LiDAR data P geo in the georeferenced coordinate system features large translation, rotation and scale differences, geolocalization is performed to approximately transform M loc into the georeferenced coordinates in order to reduce these initial differences.

Leveling the Façade Point Cloud
As a first step in the initial geolocalization, we leveled the façade point cloud M loc to the upright direction (the opposite of the gravity vector) by estimating the upright vector D up .This is done on the assumption that D up should be perpendicular to the normal vectors of all façade points in M loc .
An initial upright vector, D up , is calculated by fitting a plane to the camera positions C loc i obtained in the SfM process, which assumes that images are captured approximately in one plane.Then, candidate façade points {p i } are identified, whose normal vectors N p i are approximately perpendicular to D up .In other words, the points such as N p i T D up < 0.3 are extracted.After that, a random sample consensus (RANSAC)-based [57] approach is applied to refine the upright vector D up by iteratively selecting two points from candidate façade points and estimating the cross products of their normal vectors.Finally, the leveling façade point cloud M loc is acquired by rotating M loc to make the z-axis in its coordinate system parallel to the upright vector D up .

Geolocalization of the Leveling Façade Point Cloud Using GPS Meta-Data
Since façade point cloud and SfM camera positions are obtained in the same local coordinate system, the problem of geolocating the façade point cloud can be converted into the problem of locating the SfM camera positions in the georeferenced coordinate system, as shown in Figure 2.However, due to the unbalanced precision between the horizontal and altitude directions in GPS positioning [58], it is difficult to directly use the latitude, longitude and altitude for high-accuracy three-dimensional registration while ensuring the façade point cloud leveling.We divided the registration into a planar transformation and a vertical translation separately.
In the horizontal direction, parameters of a RANSAC-like 2D similarity transformation are estimated between the camera positions (x and y coordinates) in the local SfM coordinate frame and their corresponding longitudes and latitudes in the GPS frame.Given the local coordinates C loc−2D i and the geo-referenced coordinates C GPS−2D i of the ground cameras, a minimal subset (3 points) of the ground cameras for point set registration is selected from C loc−2D i and C GPS−2D i at random.Then, the 2D-2D similarity transformation is estimated using the least-square method.The inlier set of the estimated transformation is obtained with the distance threshold of 10 m.This process is repeated to obtain the maximal consensus set, which has a maximum number of inliers.Finally, the 2D similarity transformation parameters R 2D ca , s 2D ca , T 2D ca for geolocating the cameras (images) and the façade point cloud into the georeferenced coordinate system, is estimated with this maximal consensus set using the least-square method again.This procedure is formulated in Equation ( 1): where s, R, T represent scale, rotation, and translation parameters, respectively.
Remote Sens. 2018, 10, x FOR PEER REVIEW 5 of 19 In the horizontal direction, parameters of a RANSAC-like 2D similarity transformation are estimated between the camera positions (x and y coordinates) in the local SfM coordinate frame and their corresponding longitudes and latitudes in the GPS frame.Given the local coordinates {  } and the geo-referenced coordinates {  } of the ground cameras, a minimal subset (3 points) of the ground cameras for point set registration is selected from {  } and {  } at random.
Then, the 2D-2D similarity transformation is estimated using the least-square method.The inlier set of the estimated transformation is obtained with the distance threshold of 10 m.This process is repeated to obtain the maximal consensus set, which has a maximum number of inliers.Finally, the 2D similarity transformation parameters {  ,   ,   } for geolocating the cameras (images) and the façade point cloud into the georeferenced coordinate system, is estimated with this maximal consensus set using the least-square method again.This procedure is formulated in Equation ( 1): where , ,  represent scale, rotation, and translation parameters, respectively.After the 2D alignment, a vertical translation   is calculated by matching the mean value of the z coordinate in {  } to the mean value of the altitude in {  }.Finally, by applying the {  ,   ,   } to the  and  coordinates of  and {  ,   } to the z coordinate of  , the initial geolocated façade point cloud  can be obtained.Scale, translation, and rotation differences are greatly relieved after initial geolocalization as described above, however, there are still certain differences between the initial geolocated façade point cloud  and the open LiDAR point cloud  due to inadequate positioning accuracy of embedded GPS, especially in urban environments [59].

Modified Coherent Point Drift with Normal Consistency (NC-CPD)
The previous step provides sufficient initial correspondences between the two-point clouds for their further accurate alignment.Due to inevitable noise points in the façade point cloud, including those generated in the MVS procedure and those from other ground objects such as trees, lamp-posts, Scale, translation, and rotation differences are greatly relieved after initial geolocalization as described above, however, there are still certain differences between the initial geolocated façade point cloud M geo and the open LiDAR point cloud P geo due to inadequate positioning accuracy of embedded GPS, especially in urban environments [59].

Modified Coherent Point Drift with Normal Consistency (NC-CPD)
The previous step provides sufficient initial correspondences between the two-point clouds for their further accurate alignment.Due to inevitable noise points in the façade point cloud, including those generated in the MVS procedure and those from other ground objects such as trees, lamp-posts, and passers-by, NC-CPD algorithm is proposed to register the two-point clouds with noise and structural ambiguities.

Coherent Drift Algorithm
The CPD algorithm was first introduced in [31] and considered the alignment of two-point sets as a probability density estimation.Given two D-dimensional point sets, X N×D = {x 1 , . . . ,x N } and Y M×D = {y 1 , . . . ,y M }, the CPD method considers the alignment of the two point sets as a probability density estimation problem where one point set represents the GMM centroids (Y M×D ) and the other one represents the data points (X N×D ).The similarity transformation Remote Sens. 2018, 10, x FOR PEER REVIEW 6 of 19

Coherent Drift Algorithm
The CPD algorithm was first introduced in [31] and considered the alignment of two-point sets as a probability density estimation.Given two D-dimensional point sets,  × = { , … ,  } and  × = { , … ,  }, the CPD method considers the alignment of the two point sets as a probability density estimation problem where one point set represents the GMM centroids ( × ) and the other one represents the data points ( × ).The similarity transformation ( , , ) that aligns the GMM centroids  × to the data points  × is obtained by maximizing the GMM posterior probability for the data points  × in order to find an optimum value.The GMM probability density function used in CPD can be written as: (R, s, T) that aligns the GMM centroids Y M×D to the data points X N×D is obtained by maximizing the GMM posterior probability for the data points X N×D in order to find an optimum value.The GMM probability density function used in CPD can be written as: where p(x|m ) = 2 for m = M + 1 and the uniform distribution p(x|M + 1 ) = 1/N is used to account for outliers.Denoting the weight as ω (0 ≤ ω ≤ 1), and taking P(m) = 1/M for all GMM components, then, the mixture model takes the form: GMM centroids locations are re-parametrized by similarity transformation parameters {R, s, t}.We can estimate them by maximizing the negative likelihood function: The correspondence probability is defined between two points y m and x n as the posterior probability of the GMM centroids given the data points P(m|x n ) = P(m)p(x n |m)/p(x n ).
To estimate the parameters R, s, T, σ 2 , one can use the expectation maximization (EM) algorithm.The first step (E step) is to guess the value of the parameters based on previous values R, s, T, σ 2 old , and Bayes' theory is used to calculate a posteriori probability distribution through the following equation: where p( The second step (M step) is to obtain new parameters by minimizing the negative logarithm likelihood function of Equation ( 4).The EM algorithm proceeds by alternating between E and M steps until convergence.After ignoring constants that are independent of R, s, t, σ 2 , the likelihood function can be written as: where For the detailed solution process, please refer to [31].

Coherent Point Drift with Normal Consistency
Though the original CPD algorithm achieves promising registration results when there is some noise and missing points, it may fail to handle ambiguities induced by repetitive and symmetric scene elements of buildings, as shown in Figure 3C.To resolve this problem (i.e., avoid the façade point cloud from registering the ambiguous part), we introduced normal consistency into the original CPD algorithm to suppress the alignment of the ambiguous part by considering the normal direction of the corresponding points.
corresponding points are sufficiently close, as shown in Figure 3D.
In the original CPD algorithm, a Gaussian distribution is used to model the likelihood of each centroid (|).To avoid aligning façade point clouds to ambiguous parts of open LiDAR data, a corresponding priority based on normal consistency is introduced to decrease the likelihood when points are aligned to ambiguities.To tolerate errors in estimating   and   , we assigned the dot product of   and   to 1 if   •   0.7 is satisfied, as shown in Equation ( 7).
where  is the standard deviation of all   •   − 1 .Then, the likelihood of centroids is modified as follows: When  = 1, the corresponding priority of each centroid is the same, and NC-CPD is degenerate with the original CPD algorithm.The normal 2D boundary points (see Section 2.3.2) extracted from open LiDAR data can be estimated according to their neighboring points (normal direction is toward the exterior of buildings), as shown in Figure 3A.Since the normal façade point cloud has been calculated in the MVS process by using COLMAP [23], the normal of 2D façade points (see Section 2.3.1)can be obtained by projecting onto the horizontal plane, as shown in Figure 3B.We assume that the façade point cloud is correctly aligned to the actual part of the open LiDAR data only if the normal directions of corresponding points are sufficiently close, as shown in Figure 3D.
In the original CPD algorithm, a Gaussian distribution is used to model the likelihood of each centroid p(x|m ).To avoid aligning façade point clouds to ambiguous parts of open LiDAR data, a corresponding priority based on normal consistency is introduced to decrease the likelihood when points are aligned to ambiguities.To tolerate errors in estimating N M i and N P i , we assigned the dot product of N M i and N P i to 1 if N M i •N P i ≥ 0.7 is satisfied, as shown in Equation (7).
where ϕ is the standard deviation of all N M i •N P i − 1 .Then, the likelihood of centroids is modified as follows: When S = 1, the corresponding priority of each centroid is the same, and NC-CPD is degenerate with the original CPD algorithm.

Accurate Alignment Using NC-CPD
Although the overlaps between M geo and P geo in 3D space are hard to find, 2D façade point overlaps can be accurately extracted.We decomposed the accurate alignment into a horizontal transformation and a vertical transformation, as shown in Figure 4.

Accurate Alignment Using NC-CPD
Although the overlaps between  and  in 3D space are hard to find, 2D façade point overlaps can be accurately extracted.We decomposed the accurate alignment into a horizontal transformation and a vertical transformation, as shown in Figure 4.
The above equation means that real façade points should contain enough neighborhood points while these neighborhood points' height should be distributed within a certain range in the vertical direction.After the two steps, façade points  are extracted from the façade point cloud  , and most noise points are removed.Then, we projected all points of  onto the horizontal plane to obtain 2D façade points  , as shown in Figure 5C,D.

2D Boundary Point Extraction of Open LiDAR Data
The alpha shape algorithm [60] is used to find the boundary points from the 2D LiDAR point cloud  , which is obtained by projecting LiDAR data into the horizontal plane.First, alpha shapes

2D Façade Point Extraction from the Façade Point Cloud
Although most of the points in the façade point cloud generated from ground images are part of the façade, there are inevitably many noise points (such as trees, lamp-posts, and passers-by), which will adversely affect the alignment.Thus, it is essential to extract the real façade points from the façade point cloud to reduce the adverse effect of the noise points.First, we extracted candidate façade points from the façade point cloud M geo by using the normal vector information.Since the façade point cloud M geo has been aligned in the upright direction, as described in Section 2.1.1,the dot product of normal N p i of façade point p i and the upright vector (z-axis) should be zero in an ideal case.Considering the errors during the step of setting the façade point cloud upright, we modified the condition to N p i T Z axis < 0.01.Then, we refined the candidate façade points by using their neighbor information.
For each point p i (x i , y i , z i ) in the candidate façade points, it is considered a real façade point only if its neighboring points {n i (x ni , y ni , z ni )} within 0.1 m radius satisfy the following conditions: The above equation means that real façade points should contain enough neighborhood points while these neighborhood points' height should be distributed within a certain range in the vertical direction.After the two steps, façade points M with all possible alpha radii { | ∈ (1, )} for  are calculated.Then, we found the critical alpha radius ( ) that creates a single region for the alpha shape.All alpha values above  can be extracted as candidate alpha values { }.Second, we used a threshold to select one alpha value,  , from { }.Finally, the holes are filled after creating the final alpha shape with alpha value  .The points in the final alpha shape are considered as the boundary point set  , as shown in Figure 5A,B.7) is also calculated with   and the initial  ℳ .Then,  (| ) in Equation ( 5) is calculated by updating (|) in Equation ( 8) using , , ,  , and  .By substituting  (| ) into Equation ( 6), the parameters , , , and  are updated by minimizing Q in Equation (6).The new  is also updated by using the new   =    .These steps are repeated until Q does not change too much or a certain number of iterations is reached.After applying the final transformation parameters {, , } to the x and y coordinates of  , an accurately aligned façade point cloud  for the x and y directions is obtained.The registration process of NC-CPD is described in detail in Algorithm 1.

2D Boundary Point Extraction of Open LiDAR Data
The alpha shape algorithm [60] is used to find the boundary points from the 2D LiDAR point cloud P geo 2D , which is obtained by projecting LiDAR data into the horizontal plane.First, alpha shapes with all possible alpha radii {R i |i ∈ (1, N)} for P geo 2D are calculated.Then, we found the critical alpha radius (R c ) that creates a single region for the alpha shape.All alpha values above R c can be extracted as candidate alpha values {R k }.Second, we used a threshold to select one alpha value, R f , from {R k }.Finally, the holes are filled after creating the final alpha shape with alpha value R f .The points in the final alpha shape are considered as the boundary point set P .First, we calculated the initial σ 2 with R = I, s = 1, and t = 0.The initial S in Equation ( 7) is also calculated with N i and the initial N i .Then, P old (m|x n ) in Equation ( 5) is calculated by updating p(x|m ) in Equation ( 8) using R, s, t, σ 2 , and S. By substituting P old (m|x n ) into Equation ( 6), the parameters R, s, T, and σ 2 are updated by minimizing Q in Equation (6).The new S is also updated by using the new N new i = N i R T .These steps are repeated until Q does not change too much or a certain number of iterations is reached.After applying the final transformation parameters {R, s, t} to the x and y coordinates of M geo f 2D , an accurately aligned façade point cloud M geo f 2D for the x and y directions is obtained.The registration process of NC-CPD is described in detail in Algorithm 1.
Construct initial normal consistency S in Equation ( 7

M
geo is obtained.

Vertical Alignment
The façade point cloud is accurately aligned to open LiDAR data in the x and y axis direction by the horizontal alignment described in the previous section.A translation, T z , on the vertical direction between ..

M geo
and P geo remains to be calculated.We calculated the optimal T z by matching corresponding boundary points respectively from P geo and ..

M
geo on the z axis, following these steps: (1) For a point p i (x i , y i ) of the 2D boundary points P geo b2D , find its 2D neighbor point set {p 1 , . . . ,p i } and q 1 , . . ., q j (within a radius of 0.1 m), from P geo and .. M geo .(2).Find the points p m and q n with a maximum value on the z-axis from {p 1 , . . . ,p i } and q 1 , . . . ,q j , respectively, then calculate the height difference using the equation T i = z p m − z q n .(3).For the other points in P geo b2D , repeat steps 1 and 2 to obtain the height difference set {T 1 , . . . ,T i }.Then, calculate the optimal T z by fitting height difference set {T 1 , . . . ,T i } to a line using RANSAC.Finally, apply the translation T z to the z coordinate of .. M geo , so that an accurately aligned façade point cloud M geo is obtained in the end.

Dataset Description
So far, there are currently no available benchmark datasets for fusing airborne LiDAR data and façade point clouds generated from images.The proposed method is evaluated on a combined dataset.
1.The open LiDAR data of Dortmund in Germany, which contains three experimental buildings (Rathaus, Lohnhalle, and Verwaltung), are downloaded from a German open data download portal [7].These open LiDAR data have been geolocated in the ETRS89 reference system using a universal transverse Mercator (UTM) projection with a point density of 25 points/m 2 .
2. Ground images of the three buildings come from a benchmark dataset named "ISPRS benchmark on multi-platform photogrammetry" [61], which can be downloaded from the official website of the ISPRS.These images are captured around buildings using high-resolution digital cameras on the ground.Due to the use of GPS-locating accessories, image shooting positions are recorded in these JPEG formatted images as GPS meta-data.Global coordinates of target centers distributed on the façade of the three buildings are provided for accuracy estimation.The details of these image collections are listed in Table 1.

Qualitative Analysis
As shown in Figure 6, façade point clouds of Rathaus, Lohnhalle, and Verwaltung (Figure 6B1-B3, respectively) are generated from ground images (Figure 6A1-A3, image sample) using SfM and MVS algorithms in COLMAP [32].The open LiDAR data of the three buildings is visualized using the height rendering map shown in Figure 6C1-C3.It is evident that there are no overlaps between the open LiDAR data and façade point clouds on the façades, except for Verwaltung, which has a small number of points on the façades.However, from another perspective, open LiDAR data and façade point clouds are complementary.The former lacks structural details on the façades, while the latter lacks roof information.
façade point clouds are complementary.The former lacks structural details on the façades, while the latter lacks roof information.We also tested the matching of our datasets using the ICP [27] and Normal Distributions Transform (NDT) [62] algorithms, two classical algorithms of point set registration.The visualizing results are shown in Figure 7. Due to a relatively good density of points on the façades, we can see that ICP and NDT achieve a relatively good result in Verwaltung compared with Rathaus and Lohnhalle.
Surface reconstruction (Figure 8) is performed using the method described in [63] to demonstrate the effectiveness of our alignment algorithm.Both completeness and structural details are achieved in the surface reconstruction after accurately aligning the façade point cloud to open LiDAR data.Surface reconstruction (Figure 8) is performed using the method described in [63] to demonstrate the effectiveness of our alignment algorithm.Both completeness and structural details are achieved in the surface reconstruction after accurately aligning the façade point cloud to open LiDAR data.

Quantitative Analysis
As described in Section 2.2, an iteration process is performed in the EM process to find the optimal alignment result.After less than 30 iterations, the ratio of  to the initial  quickly declined to 1%, as shown in Figure 9A.Accurate geographical coordinates { } of target centers distributed on the façade, as provided in the dataset of "ISPRS benchmark on multi-platform photogrammetry", are used for quantitative evaluations of the alignment results, as shown in Figure 9B.

Quantitative Analysis
As described in Section 2.2, an iteration process is performed in the EM process to find the optimal alignment result.After less than 30 iterations, the ratio of Q to the initial Q 0 quickly declined to 1%, as shown in Figure 9A.Accurate geographical coordinates {G i } of target centers distributed on the façade, as provided in the dataset of "ISPRS benchmark on multi-platform photogrammetry", are used for quantitative evaluations of the alignment results, as shown in Figure 9B.The difference in the final aligned façade point cloud includes deviations from both the façade point cloud generating process and the registration process.It is difficult to estimate the accuracy of the registration process alone.In order to find the optimal geolocated results of the façade point cloud, which include almost no transformation difference, target center registration (TCR) is performed by estimating the similarity transformation (, , ) between the local coordinates of the manually selected target centers and their provided global coordinates ({ }) using the leastsquare method.Since the transformation difference is greatly reduced by directly use of highprecision GCPs, transformation difference in the alignment of the façade point cloud using the TCR method comes mainly from the façade point cloud generating process.Thus, the results of the TCR methods can be used as reference values for other registration methods.We applied the proposed method, TCR, ICP, and NDT methods to register initial geolocated façade point cloud (result in Section 2.1) to the airborne LiDAR data.Then, the root mean square error (RMSE), mean error (ME), and standard deviation (SD) of the proposed method are calculated using the following equations: The CPD algorithm was first introduced in [31] and considered the alignment of two-point sets as a probability density estimation.Given two D-dimensional point sets,  × = { , … ,  } and  × = { , … ,  }, the CPD method considers the alignment of the two point sets as a probability density estimation problem where one point set represents the GMM centroids ( × ) and the other one represents the data points ( × ).The similarity transformation ( , , ) that aligns the GMM centroids  × the data points  × is obtained by maximizing the GMM posterior probability for the data points  × in order to find an optimum value.The GMM probability density function used in CPD can be written as: where (|) = ⁄ for all GMM components, then, the mixture model takes the form: GMM centroids locations are re-parametrized by similarity transformation parameters {, , }.
(R, s, T) between the local coordinates of the manually selected target centers and their provided global coordinates ({G i }) using the least-square method.Since the transformation difference is greatly reduced by directly use of high-precision GCPs, transformation difference in the alignment of the façade point cloud using the TCR method comes mainly from the façade point cloud generating process.Thus, the results of the TCR methods can be used as reference values for other registration methods.We applied the proposed method, TCR, ICP, and NDT methods to register initial geolocated façade point cloud (result in Section 2.1) to the airborne LiDAR data.Then, the root mean square error (RMSE), mean error (ME), and standard deviation (SD) of the proposed method are calculated using the following equations: where D i is the distance between the target center coordinates obtained from the aligned results and their provided global coordinates.The middle 90% of {D i } are used for calculating the RMSE, mean errors, and standard deviation, as shown in Table 2.By analyzing the RMSE value of the different methods in Table 2, we can see that the proposed method significantly improves the accuracy of test datasets compared to the ICP and NDT methods.It is well known that ICP and NDT are effective methods of point set registration with large overlaps.Experiments have shown that ICP and NDT methods cannot handle our datasets, in which almost no overlaps are found.Thus, RMSE value up to 10 m are obtained, except for the Verwaltung dataset, which had a small number of points on the façade parts.While the results are not as good as the result of the TCR method, the proposed method achieves the best accuracy compared to ICP and NDT methods due to the use of similarity on 2D outlines of buildings.
For Rathaus, the far mean capture distance leads to the apparent worst quality of the façade point cloud from captured images (i.e., the most significant RMSE value in the TCR method).Consequently, a relatively large RMSE value appeared in the Rathaus dataset using the proposed method.For the Lohnhalle dataset, we believe the disclosure of images captured around the target building caused relatively apparent deviation in the SfM process, even though the mean capture distance is much closer than that of the Rathaus dataset.

Robustness Analysis
It is known that different point densities and degrees of noise in the point clouds have a significant impact on the performance of registration.We perform several experiments on point clouds mixed with different degrees of noise and point densities to test the robustness of the proposed method.

Expandability for Crowdsourcing Images
To test the expandability for crowdsourcing images, we have tested the proposed method on crowdsourcing images of several buildings in our campus acquired by smartphones and digital cameras.Experiments show that if most of the façade point cloud can be successfully restored from crowdsourcing images, the proposed fusion approach is effective, as shown in Figure 11.Therefore, successfully reconstructing the facade point cloud from these images through, e.g., COLMAP is a prerequisite for our approach.
But sometimes COLMAP fails in the SfM procedure because the following problems are common to crowdsourcing images: (1) missing photos in some locations; (2) occlusion of trees; (3) repetitive structures on the façade (e.g., similar windows); and (4) failure to match images with large differences (e.g., capture angle and illumination).The above cases will result in failure to generate the façade point cloud, which will make the proposed method ineffective.Consequently, ensuring complete image coverage and good image quality is beneficial to extend the proposed method to crowdsourcing images.

Conclusions
This paper presents an accurate and efficient approach for improving building façade details of open LiDAR data using ground images.The essence of the proposed approach is the fusion of the façade point clouds generated from ground images and open LiDAR data, between which there are very limited overlaps and it is different from the common situation with adequate overlap.By using To evaluate the robustness of the proposed method to point density, we randomly down-sampled the façade point clouds of the Rathaus, Verwaltung, and Lohnhalle data from their original point densities to various reduced densities.The RMSE values evaluated with respect to the target center coordinates at different point densities are given in Figure 10A.It is evident that the proposed method performed well, even at 1% of the original point density, indicating the robustness of the proposed method to different point densities.We attribute the robustness of our approach to different point densities to the use of the 2D similarity of building outlines in the registration between two sources of point clouds.
To evaluate the robustness of the proposed method to noise, Gaussian noise with different standard deviations (1, 2, 3, 4, and 5 cm) is added to the point cloud data.The RMSE values evaluated with respect to the target center coordinates under different levels of noise are shown in Figure 10B.Even when Gaussian noise with a standard deviation of 5 cm is added to the point cloud, the proposed method achieved fine and stable accuracy.This indicates that the proposed method is very robust to different levels of noise.We attribute the robustness of our approach to different degrees of noise to the use of the probabilistic method in the accurate alignment method.

Expandability for Crowdsourcing Images
To test the expandability for crowdsourcing images, we have tested the proposed method on crowdsourcing images of several buildings in our campus acquired by smartphones and digital cameras.Experiments show that if most of the façade point cloud can be successfully restored from crowdsourcing images, the proposed fusion approach is effective, as shown in Figure 11.
Therefore, successfully reconstructing the facade point cloud from these images through, e.g., COLMAP is a prerequisite for our approach.
But sometimes COLMAP fails in the SfM procedure because the following problems are common to crowdsourcing images: (1) missing photos in some locations; (2) occlusion of trees; (3) repetitive structures on the façade (e.g., similar windows); and (4) failure to match images with large differences (e.g., capture angle and illumination).The above cases will result in failure to generate the façade point cloud, which will make the proposed method ineffective.Consequently, ensuring complete image coverage and good image quality is beneficial to extend the proposed method to crowdsourcing images.

Expandability for Crowdsourcing Images
To test the expandability for crowdsourcing images, we have tested the proposed method on crowdsourcing images of several buildings in our campus acquired by smartphones and digital cameras.Experiments show that if most of the façade point cloud can be successfully restored from crowdsourcing images, the proposed fusion approach is effective, as shown in Figure 11.Therefore, successfully reconstructing the facade point cloud from these images through, e.g., COLMAP is a prerequisite for our approach.
But sometimes COLMAP fails in the SfM procedure because the following problems are common to crowdsourcing images: (1) missing photos in some locations; (2) occlusion of trees; (3) repetitive structures on the façade (e.g., similar windows); and (4) failure to match images with large differences (e.g., capture angle and illumination).The above cases will result in failure to generate the façade point cloud, which will make the proposed method ineffective.Consequently, ensuring complete image coverage and good image quality is beneficial to extend the proposed method to crowdsourcing images.

Conclusions
This paper presents an accurate and efficient approach for improving building façade details of open LiDAR data using ground images.The essence of the proposed approach is the fusion of the façade point clouds generated from ground images and open LiDAR data, between which there are very limited overlaps and it is different from the common situation with adequate overlap.By using

Conclusions
This paper presents an accurate and efficient approach for improving building façade details of open LiDAR data using ground images.The essence of the proposed approach is the fusion of the façade point clouds generated from ground images and open LiDAR data, between which there are very limited overlaps and it is different from the common situation with adequate overlap.By using a two-step strategy, the scale, translation, and rotation differences are greatly relieved after initial geolocalization of the façade point cloud using GPS meta-data.2D overlapping points on the outline of buildings are effective for the registration of the façade point cloud and airborne point cloud than 3D overlapping points, which can hardly be found between the two different sources of point clouds.We decompose the registration of the two-point cloud into a horizontal and vertical transformations instead of 3D registration directly.The proposed NC-CPD inherits the noise robustness property of the original CPD algorithm, which is a probabilistic-based point set registration algorithm.At the same time, it can handle the registration with structural ambiguities of buildings by introducing normal consistency into the original CPD algorithm.
Both completeness and structural details of buildings in open LiDAR data are significantly improved after accurate alignment so that a complete and full-resolution city building model and other applications can be achieved.Experiments have shown that classic registration methods, such as ICP and NDT, cannot handle this situation.Compared with ICP and NDT, the proposed method achieves 2 to 10 times higher registration accuracy.

Figure 1 .
Figure 1.The overview of the proposed method.

Figure 2 .
Figure 2.An overview of the initial geolocalization process.(A) Camera positions calculated using SfM (red) (B) and camera GPS meta-information (green points) aligned using a random sample consensus (RANSAC)-based similarity transformation.Simultaneously, the façade point cloud (textured points) is aligned to the point cloud of the building from the open LiDAR data (blue points) by applying the calculated 2D similarity transformation parameters in the horizontal direction and a vertical translation.(C) The alignment results.

Figure 2 .
Figure 2.An overview of the initial geolocalization process.(A) Camera positions calculated using SfM (red) (B) and camera GPS meta-information (green points) aligned using a random sample consensus (RANSAC)-based similarity transformation.Simultaneously, the façade point cloud (textured points) is aligned to the point cloud of the building from the open LiDAR data (blue points) by applying the calculated 2D similarity transformation parameters in the horizontal direction and a vertical translation.(C) The alignment results.After the 2D alignment, a vertical translation T v ca is calculated by matching the mean value of the z coordinate in C loc i

Figure 3 .
Figure 3. Illustration of the alignment procedure.(A) The 2D boundary points from LiDAR (red points) and their normal   (green lines with arrows).(B) The 2D façade points (blue points) and their normal   (blue lines with arrow).(C) An incorrect alignment result due to ambiguities, as seen by the large difference between the normal directions in the overlapping part of the two point clouds.(D) A correct alignment result as seen by the high similarity between the normal directions of points in the overlapping part.

Figure 3 .
Figure 3. Illustration of the alignment procedure.(A) The 2D boundary points from LiDAR (red points) and their normal N P i (green lines with arrows).(B) The 2D façade points (blue points) and their normal N M i (blue lines with arrow).(C) An incorrect alignment result due to ambiguities, as seen by the large difference between the normal directions in the overlapping part of the two point clouds.(D) A correct alignment result as seen by the high similarity between the normal directions of points in the overlapping part.

Figure 4 .
Figure 4. Overview of the proposed accurate alignment process.Red points are building point cloud from open LiDAR data.Textured points are façade point cloud generated from ground images.

Figure 4 .
Figure 4. Overview of the proposed accurate alignment process.Red points are building point cloud from open LiDAR data.Textured points are façade point cloud generated from ground images.

geof
are extracted from the façade point cloud M geo , and most noise points are removed.Then, we projected all points of M geo f onto the horizontal plane to obtain 2D façade points M geo f 2D , as shown in Figure 5C,D.Remote Sens. 2019, 11, 420 9 of 18

Figure 5 .
Figure 5. Results of 2D façade point extraction and 2D boundary point extraction.(A) The open LiDAR data.(B) The boundary points (top view) extracted from open LiDAR data.(C) The façade point cloud.(D) The façade boundary points (top view) extracted from the façade point cloud.2.3.3.Horizontal Alignment Using NC-CPD From the previous steps, the 2D boundary points  and 2D façade points  are extracted from the open LiDAR data and the façade point cloud, respectively.The NC-CPD algorithm described in detail in Section 2.2.2 is used to match  to  .First, we calculated the initial  with  = ,  = 1, and  = .The initial  in Equation (7) is also calculated with   and the initial  ℳ .Then,  (| ) in Equation (5) is calculated by updating (|) in Equation (8) using , , ,  , and  .By substituting  (| ) into Equation (6), the parameters , , , and  are

Figure 5 .
Figure 5. Results of 2D façade point extraction and 2D boundary point extraction.(A) The open LiDAR data.(B) The boundary points (top view) extracted from open LiDAR data.(C) The façade point cloud.(D) The façade boundary points (top view) extracted from the façade point cloud.
geo b2D , as shown in Figure 5A,B.2.3.3.Horizontal Alignment Using NC-CPD From the previous steps, the 2D boundary points P geo b2D and 2D façade points M geo f 2D are extracted from the open LiDAR data and the façade point cloud, respectively.The NC-CPD algorithm described in detail in Section 2.2.2 is used to match M geo f 2D to P geo b2D

) 4
EM optimization.Repeat 5-7 until convergence to obtain the final R f , s f , t f , and σ 2 5 E-step: Update p(m|x n ) with R, s, T, σ 2 , S 6 M-step: Solve for the new R, s, T, σ 2 by minimizing Equation (6), 7Update S by usingN new i = N i R T 8The accurate aligned 2D façade points are given byM geo f 2D = s f M geo f 2D R T f + t T fFinally, we update the z coordinates of façade points by applying s and t, and the 3D façade point cloud ..

Figure 6 .
Figure 6.Datasets for evaluating the proposed method.From top to bottom, the different rows show the following results for Rathaus, Verwaltung, and Lohnhalle: (A) ground images, (B) façade point clouds, (C) open LiDAR data (height rendering), (D) coarse alignment results, and (E) accurate alignment results.The initial geolocalization results, which are not entirely accurate due to the low accuracy of GPS, are shown in Figure 6D1-D3 (the red color is assigned to open LiDAR data for better recognition).After performing the accurate alignment step, the façade point clouds and open LiDAR data are aligned well, as shown in Figure 6E1-E3.We also tested the matching of our datasets using the ICP[27] and Normal Distributions Transform (NDT)[62] algorithms, two classical algorithms of point set registration.The visualizing results are shown in Figure7.Due to a relatively good density of points on the façades, we can see that ICP and NDT achieve a relatively good result in Verwaltung compared with Rathaus and Lohnhalle.Surface reconstruction (Figure8) is performed using the method described in[63] to demonstrate the effectiveness of our alignment algorithm.Both completeness and structural details are achieved in the surface reconstruction after accurately aligning the façade point cloud to open LiDAR data.

Figure 6 .Figure 7 .
Figure 6.Datasets for evaluating the proposed method.From top to bottom, the different rows show the following results for Rathaus, Verwaltung, and Lohnhalle: (A) ground images, (B) façade point clouds, (C) open LiDAR data (height rendering), (D) coarse alignment results, and (E) accurate alignment results.The initial geolocalization results, which are not entirely accurate due to the low accuracy of GPS, are shown in Figure 6D1-D3 (the red color is assigned to open LiDAR data for better recognition).After performing the accurate alignment step, the façade point clouds and open LiDAR data are aligned well, as shown in Figure 6E1-E3.We also tested the matching of our datasets using the ICP [27] and Normal Distributions Transform (NDT) [62] algorithms, two classical algorithms of point set registration.The visualizing results are shown in Figure 7. Due to a relatively good density of points on the façades, we can see that ICP and NDT achieve a relatively good result in Verwaltung compared with Rathaus and Lohnhalle.Remote Sens. 2018, 10, x FOR PEER REVIEW 12 of 19

Figure 7 .
Figure 7. Fusion results of the proposed method compared with the ICP and NDT methods.Red points are building point cloud from open LiDAR data.Textured points are façade point cloud generated from ground images.

Figure 7 .
Figure 7. Fusion results of the proposed method compared with the ICP and NDT methods.Red points are building point cloud from open LiDAR data.Textured points are façade point cloud generated from ground images.

Figure 8 .
Figure 8. Poisson surface reconstruction results from: (A) open LiDAR data.(B) a façade point cloud generated from ground images.(C) the fusion point cloud of open LiDAR data and the façade point cloud.

Figure 8 .
Figure 8. Poisson surface reconstruction results from: (A) open LiDAR data.(B) a façade point cloud generated from ground images.(C) the fusion point cloud of open LiDAR data and the façade point cloud.

19 Figure 9 .
Figure 9. (A) Likelihood function  relative to the initial value  as a function of the number of iterations.(B) Illustration of ground targets.

Figure 9 . 19 2. 2 . 1 .
Figure 9. (A) Likelihood function Q relative to the initial value Q 0 as a function of the number of iterations.(B) Illustration of ground targets.The difference in the final aligned façade point cloud includes deviations from both the façade point cloud generating process and the registration process.It is difficult to estimate the accuracy of the registration process alone.In order to find the optimal geolocated results of the façade point cloud, which include almost no transformation difference, target center registration (TCR) is performed by estimating the similarity transformation

FigureFigure 10 .
Figure 10A,B illustrate the registration results achieved by the proposed method under different degrees of noise and point densities.Remote Sens. 2018, 10, x FOR PEER REVIEW 15 of 19

Figure 11 .
Figure 11.Fusion results of the proposed method tested on crowdsourcing images.Red points are building point cloud from airborne LiDAR data.Textured points are façade point cloud generated from ground images of several buildings on our campus acquired by smartphones and digital cameras.

Figure 10 .
Figure 10.Robustness analysis of the proposed method.In figure (A), we randomly down-sampled the façade point clouds of the Rathaus, Verwaltung, and Lohnhalle data from their original point densities to various reduced densities in order to estimate the robustness to point density.In figure (B), we added Gaussian noise with different standard deviations (1, 2, 3, 4, and 5 cm) to the point cloud data to estimate the robustness to noise.

Figure 11 .
Figure 11.Fusion results of the proposed method tested on crowdsourcing images.Red points are building point cloud from airborne LiDAR data.Textured points are façade point cloud generated from ground images of several buildings on our campus acquired by smartphones and digital cameras.

Figure 11 .
Figure 11.Fusion results of the proposed method tested on crowdsourcing images.Red points are building point cloud from airborne LiDAR data.Textured points are façade point cloud generated from ground images of several buildings on our campus acquired by smartphones and digital cameras.
2, . . .G of the images are extracted from the exchangeable image file format (EXIF) information of {I i }.The open LiDAR data P geo , with precise geographic coordinates corresponding to the capture area of {I i } are also given.
Finally, we update the z coordinates of façade points by applying  and , and the 3D façade point cloud   is obtained.

Table 1 .
Details of ground image datasets.

Table 2 .
The root mean square error (RMSE), the mean error (ME), and the standard deviation (SD) of the proposed method compared with target center registration (TCR), ICP and NDT.