Transformation Model with Constraints for High-Accuracy of 2 D-3 D Building Registration in Aerial Imagery

Guoqing Zhou 1,2,*, Qingli Luo 2, Wenhan Xie 3, Tao Yue 1, Jingjin Huang 2,4 and Yuzhong Shen 5 1 Guangxi Key Laboratory for Geospatial Informatics, Guilin University of Technology, Guilin 541004, China; yuetao@glut.edu.cn 2 The Center for Remote Sensing, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China; luoqingli2003@163.com (Q.L.); jingjin_huang@163.com (J.H.) 3 Chinese Academy of Surveying and Mapping, 28 Lianhuachi West Road, Beijing 100830, China; xiewh@casm.ac.cn 4 School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China 5 Department of Modeling, Simulation, and Visualization Engineering, Old Dominion University, Norfolk, VA 23529, USA; YShen@odu.edu * Correspondence: gzhou@glut.edu.cn; Tel.: +86-773-589-6073


Introduction
Texturing three-dimensional (3D) buildings at a large scale for the generation of a photorealistic urban environment has been an important research topic in the computer vision [1][2][3][4], computer graphics [5] and remote sensing communities [6].The prerequisite for texturing 3D buildings is high accuracy in the geometric alignment of 2D texture images, which are captured by different types of devices, such as airborne and spaceborne platforms, with their corresponding 3D models without textures, which are represented by various geometric data structures.This process is called 2D-3D registration.Researchers encountered various unique real-world problems and have developed a variety of approaches for 2D-3D registration.
Remote Sens. 2016, 8, 507 2 of 17 A commonly-adopted transformation model for a given ground point G can be expressed by: ‹ ‚`Rpω, φ, κq ¨xg where (X G , Y G , Z G ) T are the coordinates of the point G in 3D space with respect to a geographic coordinate system, (X S , Y S , Z S ) T are the center of the perspective projection, which is the translation vector between the origins of the 2D image coordinate system and the geographic coordinate system, (x g , y g , z g ) T are the image coordinates of the given ground point G with respect to the image coordinate system, ϕ, ω and k are three rotation angles and R(ϕ, ω, k) is a 3D orthogonal rotation matrix as follows, R " Traditional 2D-3D registration methods solve for the camera's position (X S , Y S , Z S ) T and its orientation matrix R in Equations ( 1) and (2).However, none of the distortion models are perfect.Any distortion model cannot completely describe the distortion over the entire image plane, i.e., the residual errors (incomplete correction) still remain no matter what kind of distortion model is adopted.As a result, the accuracy of image coordinates in the entire image plane is asymmetric (see Figure 1).This phenomenon has been demonstrated in [7] and it becomes obvious for a big dimension of aerial images, especially in which a much higher and bigger building is imaged on a large area in the image plane [8].Two constraint conditions are added into the traditional transformation model (Equation ( 1)) to increase the accuracy, reliability and robustness of the image distortion correction.For the example shown in Figure 1, the image distortion in Region 1 is different from that of Regions 2 and 3.If a conventional transformation model (e.g., photogrammetry backward intersection (PBI)) is applied to register such a 2D image that contains various and dissymmetric distortions with a 3D object represented by a rigid model, errors in the 2D-3D registration are undoubtedly unavoidable.This problem is especially profound for high-rise buildings because the bottom and top of a high-rise building are affected by different distortions in the same image.To avoid this problem, this paper presents a novel algorithm using two types of geometric constraints, i.e., co-planar and perpendicular conditions, to improve the accuracy of the 2D-3D registration.The two conditions added in the proposed transformation model, in relation to the original rigorous transformation model, can be summarized in the following.
Remote Sens. 2016, 8, 507 2 registration.Researchers encountered various unique real-world problems and have developed a variety of approaches for 2D-3D registration.
A commonly-adopted transformation model for a given ground point G can be expressed by: where (XG, YG, ZG) T are the coordinates of the point G in 3D space with respect to a geographic coordinate system, (XS, YS, ZS) T are the center of the perspective projection, which is the translation vector between the origins of the 2D image coordinate system and the geographic coordinate system, (xg, yg, zg) T are the image coordinates of the given ground point G with respect to the image coordinate system, φ, ω and k are three rotation angles and R(φ, ω, k) is a 3D orthogonal rotation matrix as follows, Traditional 2D-3D registration methods solve for the camera's position (XS, YS, ZS) T and its orientation matrix R in Equations ( 1) and ( 2).However, none of the distortion models are perfect.Any distortion model cannot completely describe the distortion over the entire image plane, i.e., the residual errors (incomplete correction) still remain no matter what kind of distortion model is adopted.As a result, the accuracy of image coordinates in the entire image plane is asymmetric (see Figure 1).This phenomenon has been demonstrated in [7] and it becomes obvious for a big dimension of aerial images, especially in which a much higher and bigger building is imaged on a large area in the image plane [8].Two constraint conditions are added into the traditional transformation model (Equation ( 1)) to increase the accuracy, reliability and robustness of the image distortion correction.For the example shown in Figure 1, the image distortion in Region 1 is different from that of Regions 2 and 3.If a conventional transformation model (e.g., photogrammetry backward intersection (PBI)) is applied to register such a 2D image that contains various and dissymmetric distortions with a 3D object represented by a rigid model, errors in the 2D-3D registration are undoubtedly unavoidable.This problem is especially profound for high-rise buildings because the bottom and top of a high-rise building are affected by different distortions in the same image.To avoid this problem, this paper presents a novel algorithm using two types of geometric constraints, i.e., co-planar and perpendicular conditions, to improve the accuracy of the 2D-3D registration.The two conditions added in the proposed transformation model, in relation to the original rigorous transformation model, can be summarized in the following.(1) Co-planar condition: The photogrammetric center, a line in the object coordinate system and the corresponding line in the image coordinate system are in the same plane.(2) Perpendicular condition: The constraint requires that if two lines are perpendicular in the object coordinate system, their projections are perpendicular in the image coordinate system.
The remainder of this paper is organized as follows: Section 2 briefly overviews related work; Section 3 focuses on the development of a rigorous transformation model; Sections 4 and 5 evaluate the correctness and accuracy of the proposed method and compare it to other existing methods; Section 6 summarizes the major contributions and advantages of the proposed method.

Related Work
2D-3D registration for different applications has been a very active research area in the last few decades.Researchers have developed a wide range of 2D-3D registration methods; papers in the existing literature provide an overall survey and study of various registration methods, such as [9][10][11][12][13][14][15][16][17].Most of the registration methods consist of four steps as follows:

‚
Feature detection: Existing methods can be classified into feature-based or intensity-based methods.Feature-based methods use point features, linear features and areal features extracted from both the 3D urban surface/building model and 2D imagery as the common feature pairs and then align the corresponding feature pairs via a transform model.This method has been widely applied in computer vision [18][19][20][21][22][23] and medical image processing [24,25].Intensity-based methods utilize the intensity-driven information, such as texture, reflectance, brightness, and shadow, to achieve high-accuracy in 2D-3D registration [11,13,26].These features, which are called ground control points (GCPs) in remote sensing, can be detected manually or automatically.

‚
Feature matching: The relationship between the features detected from the two images is established.Different feature similarity measures are used for the estimation.

‚
Transform model estimation: The transform type and its parameters between two images are estimated by certain mapping functions based on the previously-established feature correspondence.

‚
Image resampling and transformation: The slave image is resampled and transformed into the frame of the reference image, according to the transformation model and interpolation technique.
Among the previous efforts, imaging geometry has been utilized as a rigorous transformation model by some researchers, such as [26][27][28][29][30][31][32] suggested integrating single straight lines into the orientation of images for building reconstruction and/or building recognition.Habib et al. [33] proposed that straight lines be utilized as constraints in photogrammetric bundle adjustment to solve the interior orientation parameters (IOPs) of the camera and the exterior orientation parameters (EOPs) of aerial images.Two major advantages lie in that: (1) straight lines are employed to enhance the reliability of bundle adjustment; and (2) the IOPs are considered as unknown parameters to be solved in the model.The disadvantages of this method lie in that: (1) the straight line constraint is not as strong as that of co-planarity, because this type of constraint does not combine the image space, object space and perspective center; (2) the straight line constraint enhances only the accuracy along a straight line in theory, whereas the accuracy along other directions, such as that perpendicular to the straight line, becomes relatively weak.Zhang et al. [34] developed a methodology to co-register the laser scanning data and aerial images over urban areas.The method considers a linear feature as observed primitives and uses coplanar condition as error equations to solve for the EOP of the digital image.One advantage of this method is that it extends the straight line constraint into the coplanar constraint, which in fact combines features in object space, image space and the perspective center.One disadvantage is that it considers digital cameras as a rigid body for which the IOPs are known and are computed independently.
Considering the advantages and disadvantages of the two foregoing methods (i.e., [33,34]), this paper presents two types of constraint conditions, i.e., coplanar and perpendicular conditions.This idea comes from the fact that Zhou et al. [8] demonstrated that two aerial images over an urban area will have different accuracy of registration even with the same digital building model (DBM) and the same algorithm.This implies that highly accurate 2D-3D registration will not be achievable if constraint conditions are not applied.

Rigorous Transformation Model
We first briefly describe the fundamental concepts of the conventional rigorous transformation model, which is based on imaging geometry.For more detailed information, please refer to existing literature [27][28][29].This paper only considers aerial photography.The conventional rigorous transformation model is represented by: x g ´x0 " ´f r 11 pX G ´Xsq`r 12 pY G ´Ysq`r 13 pZ G ´Zsq r 31 pX G ´Xsq`r 32 pY G ´Ysq`r 33 pZ G ´Zsq " ´fx y g ´y0 " ´f r 21 pX G ´Xsq`r 22 pY G ´Ysq`r 23 pZ G ´Zsq r 31 pX G ´Xsq`r 32 pY G ´Ysq`r 33 pZ G ´Zsq " ´fy where f, x 0 , y 0 are the camera's IOP and r i,j (i = 1, 2, 3; j = 1, 2, 3) are components of the rotation matrix R(ω, φ, k), which was described in Equation (2).Considering two types of imagery distortions, radial and decentering, the image coordinates in Equation ( 3) can be modified by: where (x 1 g, y 1 g) are the distortion-free image coordinates after correction and δx, δy are corrections through the imagery distortion models below: where k 1 , ρ 1 and ρ 2 are the coefficients of radial distortion and decentering, respectively, and x g , y g are the coordinates of a target point in a defined image coordinate system.Because Equation (3) is not linear, it must be linearized with respect to both IOPs and EOPs.The vector form of the registration model can be expressed by: where X 1 represents EOPs; X 2 denotes IOPs; V is the residual error matrix; and A 1 and A 2 are their coefficients, whose partial derivatives with respect to the registration parameters can be found in [35].The vectors in Equation ( 6) are The registration parameters in Equation ( 6) are conventionally solved by least adjustment estimation using a number of well-distributed GCPs.
As mentioned previously, the conventional registration model is not robust for high-rise buildings in urban areas.Therefore, this paper proposes two types of conditions, co-planarity and perpendicularity, as constraints to the conventional model Equation (6).Mathematically, the two types of conditions are described as follows.

Coplanar Condition
The straight line condition does not combine the features in image space, object space and perspective center; thus, this paper proposes a coplanar constraint condition, which is similar to (Zhang et al., 2005).As shown in Figure 2, given two points P and Q on the object surface, their corresponding image points are p and q.Five points, O (camera exposure center), P, Q, p and q, should theoretically be coplanar.The mathematical model that describes the coplanar condition using three where F P is the residual error matrix; (X P , Y P , Z P ) and (X Q , Y Q , Z Q ) are coordinates of the ground points P, Q with respect to the ground coordinate system, respectively; (u P , v P , w P ) are coordinates of the image point p with respect to the camera coordinate system.
As mentioned previously, the conventional registration model is not robust for high-rise buildings in urban areas.Therefore, this paper proposes two types of conditions, co-planarity and perpendicularity, as constraints to the conventional model Equation (6).Mathematically, the two types of conditions are described as follows.

Coplanar Condition
The straight line condition does not combine the features in image space, object space and perspective center; thus, this paper proposes a coplanar constraint condition, which is similar to (Zhang et al., 2005).As shown in Figure 2, given two points P and Q on the object surface, their corresponding image points are p and q.Five points, O (camera exposure center), P, Q, p and q, should theoretically be coplanar.The mathematical model that describes the coplanar condition using three vectors OQ and

→
Op is: where FP is the residual error matrix; (XP, YP, ZP) and (XQ, YQ, ZQ) are coordinates of the ground points P, Q with respect to the ground coordinate system, respectively; (uP, vP, wP) are coordinates of the image point p with respect to the camera coordinate system.Combining Equations ( 1) and (3), Equation ( 7) yields where: Let: Equation ( 7) can be rewritten as: Combining Equations ( 1) and (3), Equation ( 7) yields where: 7) can be rewritten as: With the established coplanar constraint Equation ( 9), considering differential terms with respect to the unknown parameters when linearizing using Taylor's series, Equation ( 9) can also be rewritten using a differential form as, ˇˇˇˇˇˇup `du P v p `dv P w p `dw P X where dp.q " BF p Bp.q , and Equation ( 10) is rewritten as: where: p q are image coordinates of point p on the image plane after all corrections in Section 3.1; other symbols are the same as above.
Equation ( 11) is rewritten in vector form as below: where:

3
A P

4
A P

5
A P 6 s, X 1 " r dω dϕ dk dX S dY S dZ s s, W 1 " F P .

Perpendicular Condition
The perpendicularity constraint in 3D space is added to ensure preservation of the registered 2D and 3D pairs after applying registration transformation.As shown in Figure 3, suppose that AB and BC are two edges of a flat house roof, and their corresponding edges in the 2D image plane are ab and bc.Suppose that the coordinates of A, B and C are ( respectively, in the object coordinate system.For a flat-roof, cube house, the heights (i.e., Z coordinates) of A, B and C are the same, and the line segments AB and BC are perpendicular to each other.If AB is not perpendicular to BC and the angle between them is θ, i.e., B deviates from its correct position at B 1 , a line from C to O is drawn, and CO is perpendicular to AB 1 .l is the distance between B and O and can be expressed by [8]: (13) where S AB is the length of the segment AB. Theoretically, the distance l should be zero.Thus, the differential form of Equation ( 13) is: Remote Sens. 2016, 8, 507 7 of 17 Equation ( 14) can be rewritten in matrix form as below: Equation ( 15) can be then rewritten in vector form as: where: Remote Sens. 2016, 8, 507 Equation ( 14) can be rewritten in matrix form as below: Equation ( 15) can be then rewritten in vector form as: where: [ ]

Solution of Registration Parameters
By combining Equations ( 6), ( 12) and ( 16), we have: Remote Sens. 2016, 8, 507 8 of 17 Furthermore, Equation ( 17) can be rewritten as: where: Equation ( 18) is the model derived in this paper for 2D-3D registration.Distinct from the existent conventional model used for 2D-3D registration in Equation (7), this model is called the rigorous model in this paper.As can be seen in Equation (18), this registration model combines the linear edges and conventional point features of a building.This model should therefore achieve higher accuracy and reliability than the conventional model.Equation ( 18) is usually solved using least-squares estimation, which is expressed as [36]: With least-squares estimation, the solution normal equation matrix can be written as: where K s is an introduced unknown matrix (for its details, please refer to [37]).If the number of total observation equations is M, the dimension of K s is M ˆ1, i.e., K S " `KS 1 , K S 2 , ¨¨¨, K S M ˘T.
Let N XX " D T x D x ; Equation ( 20) can be rewritten as: The solution of unknown parameters would be: where Q ij (i, j = 1, 2) is the components of the covariance matrix, which is an inverse of the normal matrix, i.e.,

Accuracy Evaluation
The standard deviations of the unit weight and unknown parameters are typically used to evaluate the quality of adjustment and the accuracy of adjusted unknown parameters.The standard deviation of the unit weight δ 0 is computed by: where V is the residual vector and r is the redundancy of the observation equation.

Coplanar and Collinear Constraint
Lines have been used as registration primitives in previous work.As shown in Figure 4, if any three points B 1 , E 1 , C 1 in object space are collinear, the collinear constraint condition can be established according to the method proposed by [8,33].We can analyze the constraint condition of lines.

‚
Under the collinear constraint, three points B 1 , E 1 , C 1 are considered to lie on a line in object space.However, due to the incomplete camera calibration in practice, their projections b 1 , e 1 , c 1 cannot be guaranteed to lie on a line in image space.This implies that the collinear constraint condition ensures their collinearity only in object space, but it does not ensure their collinearity in image space.

‚
If the image is completely free of distortion after processing (this is definitely impossible.), their projections b 1 , e 1 , c 1 in image space should lie on a line; i.e., their projections also meet the collinear constraint condition in image space.Under this ideal condition, the collinear constraint is consistent with the coplanar constraint.In other words, the collinearity and coplanar constraint condition is highly correlated if they are simultaneously considered in a formula.

‚
Because the camera lens cannot be completely calibrated, the residue of image distortion still remains even though a very high accuracy of camera calibration is achieved.This fact results in a line in object space being distortion free, but its corresponding image still contains distortion in image space.That is, the coplanar constraint combines a line in object space with its corresponding one in image space, whereas the collinear constraint considers only a line in object space.Thus, the coplanar constraint condition is much stricter than the collinear constraint condition.
Given the foregoing analysis, this paper uses the coplanar condition instead of the collinear condition.

Coplanar and Collinear Constraint
Lines have been used as registration primitives in previous work.As shown in Figure 4, if any three points B′, E′, C′ in object space are collinear, the collinear constraint condition can be established according to the method proposed by [8,33].We can analyze the constraint condition of lines.
 Under the collinear constraint, three points B′, E′, C′ are considered to lie on a line in object space.However, due to the incomplete camera calibration in practice, their projections b′, e′, c′ cannot be guaranteed to lie on a line in image space.This implies that the collinear constraint condition ensures their collinearity only in object space, but it does not ensure their collinearity in image space. If the image is completely free of distortion after processing (this is definitely impossible.), their projections b′, e′, c′ in image space should lie on a line; i.e., their projections also meet the collinear constraint condition in image space.Under this ideal condition, the collinear constraint is consistent with the coplanar constraint.In other words, the collinearity and coplanar constraint condition is highly correlated if they are simultaneously considered in a formula. Because the camera lens cannot be completely calibrated, the residue of image distortion still remains even though a very high accuracy of camera calibration is achieved.This fact results in a line in object space being distortion free, but its corresponding image still contains distortion in image space.That is, the coplanar constraint combines a line in object space with its corresponding one in image space, whereas the collinear constraint considers only a line in object space.Thus, the coplanar constraint condition is much stricter than the collinear constraint condition.
Given the foregoing analysis, this paper uses the coplanar condition instead of the collinear condition.

Perpendicular Constraint
Perspective projections do not preserve angles.For example, as shown in Figure 4, if θ = 90 in object space, β is not 90 ˝after perspective projection.Furthermore, when θ = θ 1 = 90 ˝, the two angles' projections are both equal to β.This means that: (1) the perpendicular constraint is capable of enhancing the registration accuracy; (2) it may result in multiple solutions of IOPs and EOPs if we consider only the perpendicular constraint condition without adding the coplanar constraint and/or GCPs.For this reason, this paper simultaneously considers the perpendicular constraint, coplanar constraint and a few GCPs.Therefore, the perpendicular constraint proposed in this paper not only enhances the registration accuracy, but will also not reduce the risk of multiple solutions of IOPs and EOPs.
In a word, the coplanarity constraint condition in Figure 2 improves the accuracy of co-registration along a line (i.e., building edge), and the perpendicular constraint condition in Figure 3 improves the accuracy of co-registration along perpendicular lines (i.e., two perpendicular building edge).The two types of constraint conditions can maximize the accuracy symmetry without the need of adding additional GCPs.

Experimental Data
The experimental field is located in downtown Denver, Colorado, where the highest buildings reach 125 m, and many buildings are approximately 100 m.The City and County of Denver, Colorado, provided us with the aerial images and digital surface model (DSM).

‚
Aerial images: The six aerial images, with two flight strips, at the end lap of approximately 65% and the side lap of 30%, were acquired using a Leica RC30 aerial camera at a nominal focal length of 153.022 mm on 17 April 2000 (see Figure 5).The flying height was 1650 m above the mean ground elevation of the imaged area.The aerial photos were originally recorded on film and later scanned into digital format at a pixel resolution of 25 µm.

‚ DSM dataset:
The DSM in the central part of downtown Denver was originally acquired by LiDAR and then edited into grid format at a spacing of 0.6 m (Figure 6a).The DSM accuracy in planimetry and height was approximately 0.1 m and 0.2 m, respectively.The horizontal datum is Geodetic Reference System (GRS) 1980, and the vertical datum is North American Datum of 1983 (NAD83).
With the datasets provided, we conducted data pre-processing and 3D coordinate measurement.
‚ CSG representation: With the given DSM (see Figure 6a), a 3D CSG (constructive solid geometry) model was utilized to represent the building individually.The CSG is a very mature method that uses a few basic primitives to create a complex surface or object by Boolean operators [38].
The details of the CSG method can be found in [6,38].This paper proposes a three-level data structure to represent a building based on the given DSM data.The first level is 2D primitive representation, such as rectangle, circle, triangle or polygon, accompanied with their own parameters, including length, width, radius and the position of the point.The second level is 3D primitive representation, which is created by adding height information onto the 2D primitives.For example, by adding height information onto a rectangle (2D primitive), it becomes a cube (3D primitive); by adding height information onto a circle (2D primitive), it becomes a cylinder (3D primitive).In other words, cubes, cylinders, cones, pyramids, etc., are single 3D primitives.The third level is to create a 3D CSG representation of a building by combining multiple 3D primitives.During this process, a topological relationship is created, as well, so that building data can be easily retrieved and the geometric constraints proposed in this paper can be easily constructed.Meanwhile, the attributes (e.g., length, position, direction, grey) are assigned to each building (see Figure 6b).

‚
Coordinate measurement of "GCPs" and checkpoints: In areas with numerous high buildings, it is impossible to find conventional photogrammetric targeted points as ground control points (GCPs).In this paper, the corner points of building roofs or bottoms are taken as "GCPs".The 3D (XYZ) coordinates of "GCPs" are acquired from the DBM.The corresponding 2D image coordinates are automatically measured with Erdas/Imagine software.The first three "GCPs" are selected manually.First, we select their locations in DBM, and their 3D coordinates can be read directly.
We then find the corresponding locations in the 2D images manually with Erdas/Imagine software.After that, all other "GCPs" can be automatically measured with Erdas/Imagine software.The 2D image coordinates of "GCP" selection and measurements are based on back-projection.The EOPs of each image are solved by the first three measured "GCPs", and the 3D coordinates of the other corners of buildings are then back-projected onto the image plane, with which the 2D image coordinates are automatically measured (Figure 7a).The measurement accuracy of the 2D image coordinates is at the sub-pixel level.With the operation above, 321 points are measured, of which 232 points are taken as "GCPs" and 89 are used as checkpoints, which will be used to evaluate the final accuracy of 2D-3D registration (Figures 6a and 7b).
Remote Sens. 2016, 8, 507 11  Coordinate measurement of "GCPs" and checkpoints: In areas with numerous high buildings, it is impossible to find conventional photogrammetric targeted points as ground control points (GCPs).In this paper, the corner points of building roofs or bottoms are taken as "GCPs".The 3D (XYZ) coordinates of "GCPs" are acquired from the DBM.The corresponding 2D image coordinates are automatically measured with Erdas/Imagine software.The first three "GCPs" are selected manually.First, we select their locations in DBM, and their 3D coordinates can be read directly.We then find the corresponding locations in the 2D images manually with Erdas/Imagine software.After that, all other "GCPs" can be automatically measured with Erdas/Imagine software.The 2D image coordinates of "GCP" selection and measurements are based on back-projection.The EOPs of each image are solved by the first three measured "GCPs", and the 3D coordinates of the other corners of buildings are then back-projected onto the image plane, with which the 2D image coordinates are automatically measured (Figure 7a).The measurement accuracy of the 2D image coordinates is at the sub-pixel level.With the operation above, 321 points are measured, of which 232 points are taken as "GCPs" and 89 are used as checkpoints, which will be used to evaluate the final accuracy of 2D-3D registration (Figures 6a and 7b). Coordinate measurement of "GCPs" and checkpoints: In areas with numerous high buildings, it is impossible to find conventional photogrammetric targeted points as ground control points (GCPs).In this paper, the corner points of building roofs or bottoms are taken as "GCPs".The 3D (XYZ) coordinates of "GCPs" are acquired from the DBM.The corresponding 2D image coordinates are automatically measured with Erdas/Imagine software.The first three "GCPs" are selected manually.First, we select their locations in DBM, and their 3D coordinates can be read directly.We then find the corresponding locations in the 2D images manually with Erdas/Imagine software.After that, all other "GCPs" can be automatically measured with Erdas/Imagine software.The 2D image coordinates of "GCP" selection and measurements are based on back-projection.The EOPs of each image are solved by the first three measured "GCPs", and the 3D coordinates of the other corners of buildings are then back-projected onto the image plane, with which the 2D image coordinates are automatically measured (Figure 7a).The measurement accuracy of the 2D image coordinates is at the sub-pixel level.With the operation above, 321 points are measured, of which 232 points are taken as "GCPs" and 89 are used as checkpoints, which will be used to evaluate the final accuracy of 2D-3D registration (Figures 6a and 7b).

Establishment of the Registration Model
With the measured "GCPs" in Section 4.1, the 2D-3D registration model in Equation ( 18) can be established.Meanwhile, the two types of geometric constraint conditions, co-planarity (Equation ( 12)) and perpendicularity (Equation ( 16)), can simultaneously be established using the measured "GCPs".For example, the faces, Fa, Fb and Fc, the edges, L1, L2 and L3, and their attributes are described in the CSG model (see Figure 8a,b).From the attributes of the face data structure, L1 and L3 are automatically recognized as the edges of roofs, and L2 is seen as the edge of a wall.However, the control points P1, P3 and P5 are extracted, and they can be automatically constructed lines, l1, l2 and l3.When L1, L2 and L3 are back-projected onto the original aerial image, L1, L2 and L3 will be matched with lines l1, l2 and l3, respectively (Figure 8b).Thus, the attributes (e.g., edges of roofs and edges of walls) and topographic relationships (i.e., perpendicularity) of l1, l2 and l3 can be inherited from L1, L2 and L3.Thus, l1 and l3 are edges in the roof, and l2 is a vertical line in the wall.Additionally, l2 is perpendicular to l1 and l3; l1 is also perpendicular to l3.Therefore, when l1 ⊥ l3, Equation ( 16) is applied to construct a perpendicular constraint condition by replacing A, B and C by P5, P3 and P1 (see Figure 2).Similarly, the coplanar constraint condition (Equation ( 12)) can be established, as well.By combining all established equations above, the registration equations for Equation (18) are established.With the registration model established above, the solution can be found using the least-squares estimation.Owing to the non-linearization of the registration model (Equation ( 18)), the iterative

Establishment of the Registration Model
With the measured "GCPs" in Section 4.1, the 2D-3D registration model in Equation ( 18) can be established.Meanwhile, the two types of geometric constraint conditions, co-planarity (Equation ( 12)) and perpendicularity (Equation ( 16)), can simultaneously be established using the measured "GCPs".For example, the faces, F a , F b and F c , the edges, L 1 , L 2 and L 3 , and their attributes are described in the CSG model (see Figure 8a,b).From the attributes of the face data structure, L 1 and L 3 are automatically recognized as the edges of roofs, and L 2 is seen as the edge of a wall.However, the control points P 1 , P 3 and P 5 are extracted, and they can be automatically constructed lines, l 1 , l 2 and l 3 .When L 1 , L 2 and L 3 are back-projected onto the original aerial image, L 1 , L 2 and L 3 will be matched with lines l 1 , l 2 and l 3 , respectively (Figure 8b).Thus, the attributes (e.g., edges of roofs and edges of walls) and topographic relationships (i.e., perpendicularity) of l 1 , l 2 and l 3 can be inherited from L 1 , L 2 and L 3 .Thus, l 1 and l 3 are edges in the roof, and l 2 is a vertical line in the wall.Additionally, l 2 is perpendicular to l 1 and l 3 ; l 1 is also perpendicular to l 3 .Therefore, when l 1 K l 3 , Equation ( 16) is applied to construct a perpendicular constraint condition by replacing A, B and C by P 5 , P 3 and P 1 (see Figure 2).Similarly, the coplanar constraint condition (Equation ( 12)) can be established, as well.By combining all established equations above, the registration equations for Equation (18) are established.

Establishment of the Registration Model
With the measured "GCPs" in Section 4.1, the 2D-3D registration model in Equation ( 18) can be established.Meanwhile, the two types of geometric constraint conditions, co-planarity (Equation ( 12)) and perpendicularity (Equation ( 16)), can simultaneously be established using the measured "GCPs".For example, the faces, Fa, Fb and Fc, the edges, L1, L2 and L3, and their attributes are described in the CSG model (see Figure 8a,b).From the attributes of the face data structure, L1 and L3 are automatically recognized as the edges of roofs, and L2 is seen as the edge of a wall.However, the control points P1, P3 and P5 are extracted, and they can be automatically constructed lines, l1, l2 and l3.When L1, L2 and L3 are back-projected onto the original aerial image, L1, L2 and L3 will be matched with lines l1, l2 and l3, respectively (Figure 8b).Thus, the attributes (e.g., edges of roofs and edges of walls) and topographic relationships (i.e., perpendicularity) of l1, l2 and l3 can be inherited from L1, L2 and L3.Thus, l1 and l3 are edges in the roof, and l2 is a vertical line in the wall.Additionally, l2 is perpendicular to l1 and l3; l1 is also perpendicular to l3.Therefore, when l1 ⊥ l3, Equation ( 16) is applied to construct a perpendicular constraint condition by replacing A, B and C by P5, P3 and P1 (see Figure 2).Similarly, the coplanar constraint condition (Equation ( 12)) can be established, as well.By combining all established equations above, the registration equations for Equation (18) are established.With the registration model established above, the solution can be found using the least-squares estimation.Owing to the non-linearization of the registration model (Equation ( 18)), the iterative With the registration model established above, the solution can be found using the least-squares estimation.Owing to the non-linearization of the registration model (Equation ( 18)), the iterative solution is needed until the convergence of the solved registration parameters is met under a given threshold.The solved results are listed in Section 5.

Experimental Results
With the solved registration parameters, we project the 3D urban model back onto the aerial image.The 106 3D CSG building models are registered onto the original 2D aerial image, as shown in Figure 9.It is noted that the 3D CSG model is in the form of transparent wireframes to avoid the impact of the occluded wireframes of buildings when verifying the accuracy of 2D-3D registration.
Remote Sens. 2016, 8, 507 13 solution is needed until the convergence of the solved registration parameters is met under a given threshold.The solved results are listed in Section 5.

Experimental Results
With the solved registration parameters, we project the 3D urban model back onto the aerial image.The 106 3D CSG building models are registered onto the original 2D aerial image, as shown in Figure 9.It is noted that the 3D CSG model is in the form of transparent wireframes to avoid the impact of the occluded wireframes of buildings when verifying the accuracy of 2D-3D registration.

Accuracy Comparison and Analysis
The accuracy comparison analysis of the 2D-3D registration is designed through conventional registration models (i.e., Equation (1)) and three types of ground control primitives.The evaluation indices include: (1) theoretical accuracy of the solved registration parameters; (2) relative accuracy in 2D space and 3D space; and (3) visual check.

Comparison of Theoretical Accuracy
The IOPs, including focal length and principal point coordinates, were precisely calibrated and provided by the data vendor.The registration parameters (also EOPs, XS, YS, ZS, ϕ, ω and k) and image distortion parameters (k1, ρ1 and ρ2) are solved using Equation (22).The results and standard deviation are listed in Tables 1 and 2. As can be seen in Tables 1 and 2, the proposed model has the highest accuracy at a standard deviation of 0.47 pixels for solving IOPs and 0.37 pixels for solving EOPs.

Accuracy Comparison and Analysis
The accuracy comparison analysis of the 2D-3D registration is designed through conventional registration models (i.e., Equation (1)) and three types of ground control primitives.The evaluation indices include: (1) theoretical accuracy of the solved registration parameters; (2) relative accuracy in 2D space and 3D space; and (3) visual check.

Comparison of Theoretical Accuracy
The IOPs, including focal length and principal point coordinates, were precisely calibrated and provided by the data vendor.The registration parameters (also EOPs, X S , Y S , Z S , φ, ω and k) and image distortion parameters (k 1 , ρ 1 and ρ 2 ) are solved using Equation (22).The results and standard deviation are listed in Tables 1 and 2. As can be seen in Tables 1 and 2, the proposed model has the highest accuracy at a standard deviation of 0.47 pixels for solving IOPs and 0.37 pixels for solving EOPs.

Accuracy Comparison in 2D Space
The accuracy comparison in the 2D image plane is conducted to evaluate the correctness and robustness of the proposed method.The steps of the comparison analysis in 2D space are as follows: (1) With the registration parameters solved above, 3D buildings are back-projected onto the aerial image plane; (2) XY image coordinates of the back-projected and original 59 checkpoints are measured; (3) the root mean square (RMS) error of 59 checkpoints is computed by RMS X " a p∆ T ∆q{N, where ∆ is the difference in the x coordinates between the back-projected and the original 59 checkpoints and N is the number of checkpoints used; the operation for the y coordinates is similar.
As illustrated in Table 1, the accuracy estimation for the lens distortion parameters (δ IOP 0 ) is 1.23 pixels and 1.09 pixels when using 12 GCPs and 211 GCPs with conventional model.Compared with those results, δ IOP 0 is 0.47 pixel when using 32 points with the proposed model, which is an improvement of 61.8% over the conventional model when using 12 GCPs and 56.9% when using 211 GCPs.
On the similarity, from Table 2, the accuracy estimation for the exterior orientation parameters (δ IOP 0 ) is 1.26 and 1.02 when using 12 "GCPs" and 211 "GCPs" with the conventional model.Compared to those results, δ IOP 0 is 0.37 pixels when using 32 points with the proposed model, an improvement of 70.6% over the conventional model when using 12 GCPs and 63.7% when using 211 GCPs.
From Table 3, there are significant offsets between the building wireframes and the building edges for the conventional model, whose RMS values are approximately 10 pixels along the x and y directions; whereas the RMS of the proposed method is only about two pixels in both the x and y directions, an improvement of 78.0% over the conventional model when using 12 GCPs and 67.2% when using 211 GCPs.The results demonstrate that the accuracy of 2D-3D registration achieved by the proposed method has been greatly increased.Figure 10 depicts the 2D-3D registration results for the visual check using the proposed and conventional model.Five typical buildings with different heights are represented, and they are located at different locations of the image plane.a 1 through e 1 are the 2D-3D registration results created by the proposed model, in which 32 points plus the two types of constraints are employed; a 2 through e 2 and a 3 through e 3 are the 2D-3D registration results created by conventional models, in which 211 "GCPs" and 12 "GCPs" are employed, respectively.

Conclusions
The major contribution of this paper is the development of a rigorous transformation model for 2D-3D registration to increase its accuracy and reliability.This model simultaneously considers two types of geometric constraints (perpendicularity and co-planarity) and utilizes both point features and linear features as registration primitives.The details of the proposed 2D-3D registration model were presented.
A test field located in downtown Denver, Colorado, has been used to evaluate the proposed methods.The comparison analysis of the accuracy achieved by the proposed method and the conventional method are conducted.The experimental results demonstrated that: (1) the theoretical accuracy of the solved registration parameters can reach 0.47 pixels, whereas other methods reach only 1.23 and 1.09 pixels; (2) the RMS of 2D-3D registration created by the conventional model is approximately 10 pixels along both x and y directions, whereas the RMS from the proposed method is only two pixels in both the x and y directions.In addition, five buildings with different heights located at different locations of the aerial image plane are visually evaluated.It is demonstrated that the method proposed in this paper achieved higher accuracy than conventional methods, and 3D buildings presented by wireframes can be exactly registered in 2D aerial imagery.

Conclusions
The major contribution of this paper is the development of a rigorous transformation model for 2D-3D registration to increase its accuracy and reliability.This model simultaneously considers two types of geometric constraints (perpendicularity and co-planarity) and utilizes both point features and linear features as registration primitives.The details of the proposed 2D-3D registration model were presented.
A test field located in downtown Denver, Colorado, has been used to evaluate the proposed methods.The comparison analysis of the accuracy achieved by the proposed method and the conventional method are conducted.The experimental results demonstrated that: (1) the theoretical accuracy of the solved registration parameters can reach 0.47 pixels, whereas other methods reach only 1.23 and 1.09 pixels; (2) the RMS of 2D-3D registration created by the conventional model is approximately 10 pixels along both x and y directions, whereas the RMS from the proposed method is only two pixels in both the x and y directions.In addition, five buildings with different heights located at different locations of the aerial image plane are visually evaluated.It is demonstrated that the method proposed in this paper achieved higher accuracy than conventional methods, and 3D buildings presented by wireframes can be exactly registered in 2D aerial imagery.

Figure 1 .
Figure 1.Error of the 2D-3D registration caused by both various and dissymmetric image distortions, displacement and a rigid 3D object model.Figure 1. Error of the 2D-3D registration caused by both various and dissymmetric image distortions, displacement and a rigid 3D object model.

Figure 1 .
Figure 1.Error of the 2D-3D registration caused by both various and dissymmetric image distortions, displacement and a rigid 3D object model.Figure 1. Error of the 2D-3D registration caused by both various and dissymmetric image distortions, displacement and a rigid 3D object model.

Figure 2 .
Figure 2. The geometry for the coplanar condition.

Figure 2 .
Figure 2. The geometry for the coplanar condition.

Figure 3 .
Figure 3.The geometry of the perpendicular condition.

Figure 3 .
Figure 3.The geometry of the perpendicular condition.

Figure 4 .
Figure 4. Discussion of the relationship between the straight line and the coplanar constraint and the relationship between the perpendicular constraint and the angle-unpreserved perspective projection.

Figure 4 .
Figure 4. Discussion of the relationship between the straight line and the coplanar constraint and the relationship between the perpendicular constraint and the angle-unpreserved perspective projection.

Figure 5 .
Figure 5. Six aerial images from two strips in the study area, City of Denver, Colorado [6].

Figure 6 .
Figure 6.(a) 2D brightness is applied to represent the digital building model (DBM); and (b) constructive solid geometry (CSG) is used to represent the DBM [6].

Figure 5 .
Figure 5. Six aerial images from two strips in the study area, City of Denver, Colorado [6].

Figure 5 .
Figure 5. Six aerial images from two strips in the study area, City of Denver, Colorado [6].

Figure 6 .
Figure 6.(a) 2D brightness is applied to represent the digital building model (DBM); and (b) constructive solid geometry (CSG) is used to represent the DBM [6].Figure 6.(a) 2D brightness is applied to represent the digital building model (DBM); and (b) constructive solid geometry (CSG) is used to represent the DBM [6].

Figure 6 .
Figure 6.(a) 2D brightness is applied to represent the digital building model (DBM); and (b) constructive solid geometry (CSG) is used to represent the DBM [6].Figure 6.(a) 2D brightness is applied to represent the digital building model (DBM); and (b) constructive solid geometry (CSG) is used to represent the DBM [6].

Figure 7 .
Figure 7. (a) The corners of buildings in the 3D model, which are taken as "GCPs" and (b) the corresponding 2D image coordinates in the 2D image plane.

Figure 8 .
Figure 8.The two types of geometric constraints: (a) the linear features in the building model; and (b) the corresponding linear features in the aerial image.

Figure 7 .
Figure 7. (a) The corners of buildings in the 3D model, which are taken as "GCPs" and (b) the corresponding 2D image coordinates in the 2D image plane.

Figure 7 .
Figure 7. (a) The corners of buildings in the 3D model, which are taken as "GCPs" and (b) the corresponding 2D image coordinates in the 2D image plane.

Figure 8 .
Figure 8.The two types of geometric constraints: (a) the linear features in the building model; and (b) the corresponding linear features in the aerial image.

Figure 8 .
Figure 8.The two types of geometric constraints: (a) the linear features in the building model; and (b) the corresponding linear features in the aerial image.

Figure 9 .
Figure 9.The experimental results of 2D-3D registration using the proposed method.(a) The 2D-3D registration results with the proposed method; (b) the sub-area of (a) framed by the black rectangle; (c) the sub-area of (b), which illustrates the detail 2D-3D registration results of a single high-rise building.

Figure 9 .
Figure 9.The experimental results of 2D-3D registration using the proposed method.(a) The 2D-3D registration results with the proposed method; (b) the sub-area of (a) framed by the black rectangle; (c) the sub-area of (b), which illustrates the detail 2D-3D registration results of a single high-rise building.

Figure 10 .
Figure 10.Visual comparison of 2D-3D registration using the two models: (a1-e1) are the 2D-3D registration results created by the proposed model, in which 32 points plus the two types of constraints are employed; (a2-e2) and (a3-e3) are the 2D-3D registration results created by conventional models, in which 211 "GCPs" and 12 "GCPs" are employed, respectively.

Figure 10 .
Figure 10.Visual comparison of 2D-3D registration using the two models: (a 1 -e 1 ) are the 2D-3D registration results created by the proposed model, in which 32 points plus the two types of constraints are employed; (a 2 -e 2 ) and (a 3 -e 3 ) are the 2D-3D registration results created by conventional models, in which 211 "GCPs" and 12 "GCPs" are employed, respectively.

g
´f r 11 pX G ´Xsq`r 12 pY G ´Ysq`r 13 pZ G ´Zsq r 31 pX G ´Xsq`r 32 pY G ´Ysq`r 33 pZ G ´Zsq y 1 g ´f r 21 pX G ´Xsq`r 22 pY G ´Ysq`r 23 pZ G ´Zsq r 31 pX G ´Xsq`r 32 pY G ´Ysq`r 33 pZ G ´Zsq