Efficient Coarse Registration of Pairwise TLS Point Clouds Using Ortho Projected Feature Images

Abstract: The degree of automation and efficiency are among the most important factors that influence the availability of Terrestrial light detection and ranging (LiDAR) Scanning (TLS) registration algorithms. This paper proposes an Ortho Projected Feature Images (OPFI) based 4 Degrees of Freedom (DOF) coarse registration method, which is fully automated and highly efficient, for TLS point clouds acquired using leveled or inclination compensated LiDAR scanners. The proposed 4DOF registration algorithm decomposes the parameter estimation into two parts: (1) the estimation of the horizontal translation vector and the azimuth angle; and (2) the estimation of the vertical translation. The horizontal translation vector and the azimuth angle are estimated by ortho projecting the TLS point clouds into feature images and registering the ortho projected feature images by Scale Invariant Feature Transform (SIFT) keypoints and descriptors. The vertical translation is estimated using the height differences of source points and target points in the overlapping regions after horizontal alignment. Three real TLS datasets captured by the Riegl VZ-400 and the Trimble SX10 and one simulated dataset were used to validate the proposed method. The proposed method was compared with four state-of-the-art 4DOF registration methods. The experimental results showed that: (1) the accuracy of the proposed coarse registration method ranges from 0.02 m to 0.07 m in the horizontal direction and from 0.01 m to 0.02 m in elevation, which is at centimeter level and sufficient for fine registration; and (2) as many as 120 million points can be registered in less than 50 s, which is much faster than the compared methods.


Introduction
Light detection and ranging (LiDAR) scanning, including Terrestrial LiDAR Scanning (TLS) [1], Personal LiDAR Scanning (PLS) [2], Mobile LiDAR Scanning (MLS) [3], and Airborne LiDAR Scanning (ALS) [4], can acquire dense point clouds of target scenes directly and efficiently. LiDAR scanning is among the most advanced three-dimensional (3D) spatial data acquisition technologies. Point clouds acquired by LiDAR scanning are used in numerous applications, including 3D modeling of target objects in reverse engineering [5], 3D building reconstruction in city modeling [6], slope and super-elevation estimation in road mapping [7], individual tree mapping in forestry [8], and large-scale Digital Elevation Model (DEM) generation in steep mountainous areas [9].
Registration is a fundamental and frequently encountered problem in point cloud processing [10,11]. Terrestrial LiDAR scanning scans the whole target scene station by station, and registration is essential to align point clouds obtained from different stations to a unified frame [12][13][14][15][16][17][18]. In addition, the point clouds obtained by different LiDAR scanning platforms, such as TLS and MLS, are often merged. In coarse registration methods based on 3D local shape descriptors, registration parameters are estimated utilizing the matched corresponding feature pairs. Rusu et al. [31] proposed a robust 16-dimensional feature called Persistent Feature Histograms (PFH) to achieve the registration of multi-view point clouds. To further improve efficiency, the mathematical expressions of PFH were modified and optimized and a new type of local feature called Fast Point Feature Histograms (FPFH) was proposed [32]. Zai et al. [15] proposed an adaptive covariance descriptor to achieve coarse alignment of TLS point clouds. The descriptor was believed to be invariant to rigid transformation and robust to noise and varying resolutions. Gao et al. [33] proposed to utilize the Extended Gaussian Image (EGI) to find point correspondences and thus achieve initial registration of point clouds. Yu and Ju [13] used an intrinsic shape signature keypoint detector and the Signature of Histograms of OrienTations (SHOT) [34] descriptor to find correspondences from point clouds. A branch-and-bound algorithm was then proposed to robustly estimate registration parameters from point correspondences containing incorrect matches. Since extracting keypoints from large-scale point clouds is time-consuming and 3D local shape descriptors are usually high dimensional, coarse registration methods based on 3D local shape descriptors are usually inefficient.
Besides 3D local shape descriptors, high-level features, such as line features, plane features, and semantic features, are also employed for LiDAR point clouds registration. Yang and Zang [35] proposed to extract crest lines from point clouds to achieve coarse registration. Dold and Brenner [36] extracted 3D planar patches from laser scanning data and registered overlapping point clouds by finding corresponding patches. Yang et al. [17] extracted semantic points from point clouds first and then matched the semantic points using geometric constraints. Finally, registration parameters were estimated using the matched semantic points. Kelbe et al. [37] managed to register forest laser scanning point clouds using tree stems extracted from TLS point clouds.
Besides directly using 3D point clouds, coarse registration can also be achieved by 2D image assisted methods. Yang et al. [38] proposed to achieve wide baseline 3D scan alignment by using 2D image features. To make the 2D image features more robust, dominant planar structures were extracted from point clouds and used to normalize the camera viewpoint changes. Barnea and Filin [39] first converted the 3D terrestrial laser point clouds into panoramic range images. Then, key features were extracted and matched by a combinatorial approach. Finally, registration parameters were estimated from the matched key features. Weinmann et al. [40] proposed to extract Scale Invariant Feature Transform (SIFT) features from reflectance images and then back project them to 3D utilizing range images to obtain 3D conjugates. Coarse registration parameters were estimated using the 3D conjugates.
Coarse registration is a key step that greatly affects efficiency in point cloud registration. Although numerous methods have been proposed to solve this problem, it remains challenging. In most cases of LiDAR scanning, the laser scanner is coarsely leveled, and in many cases it is even compensated by built-in inclination sensors. In such cases, the 3DOF rotations between two scans are constrained to the azimuth only, and the degrees of freedom in registration are reduced from 6 to 4. Cai et al. [12] used this constraint and proposed to use matched keypoints and a branch-and-bound algorithm to achieve efficient 4DOF registration. Similar to Cai et al. [12], this paper takes full advantage of this constraint and proposes another efficient and automatic method to deal with the 4DOF coarse registration problem. The proposed method consists of two steps: (1) horizontal translation vector and azimuth angle estimation by matching generated ortho projected feature images; and (2) vertical translation estimation, achieved by estimating the height difference in the overlapping regions after horizontal alignment. Compared with existing coarse registration methods, the main contributions of this paper include: (1) A two-step framework for 4DOF coarse registration. The 4DOF registration is decomposed into two steps: in the first step, the horizontal translation vector and azimuth angle are estimated; in the second step, the vertical translation is estimated.
(2) Horizontal translation vector and azimuth angle estimation based on ortho projected feature images. Instead of using panoramic feature images to achieve registration of TLS point clouds, this paper proposes to use ortho projected feature images to estimate the horizontal translation vector and azimuth angle. The remainder of this paper is organized as follows. The experimental point cloud datasets and the 4DOF coarse registration method are described in Section 2. Experimental results are presented in Section 3. Discussions about the proposed method and the experimental results are presented in Section 4. Finally, Section 5 concludes this paper.

Experimental Point Clouds Datasets
The proposed method uses intensity information to generate ortho projected feature images. LiDAR scanners provided by different vendors have their own intensity characteristics. To test the adaptability of the proposed algorithm to point clouds scanned by different types of LiDAR scanners, three TLS point cloud datasets scanned by two different LiDAR scanners were adopted as experimental datasets. The first one, named "DRIEGL", includes two stations scanned by a RIEGL VZ-400 TLS scanner in a square in Wuhan University. The RIEGL VZ-400 scanner was coarsely leveled during data acquisition. The horizontal and vertical scanning step-widths were both set to 0.015 degrees. The point clouds used were exported by RiSCAN Pro [41]. Before exportation, the point clouds were automatically leveled by RiSCAN Pro using the inclination sensor data. Station 1 of DRIEGL was used for the target points and station 2 was used for the source points. There are about 88.90 million points in station 1 and 120.03 million points in station 2. Five spherical balls were installed in the scene and scanned by both stations. The point clouds of station 1, station 2, and all points before registration are illustrated in Figure 1a-c, respectively. Figure 1a,b are top views of the station 1 and station 2 points. The red triangles in Figure 1a,b indicate the places where the scanners were placed during data acquisition. The distance between the two scan stations is about 20 m. It can be seen that there is a rotation between these two station point clouds. In Figure 1c, the station 1 points are colored in red and the station 2 points are colored in blue. It is obvious that these two station point clouds are not in a unified frame and need to be registered.
The second dataset was named "DTRIMBLE-1". The two stations of DTRIMBLE-1 were scanned by a Trimble SX10 TLS laser scanner (scanning total station). Similar to DRIEGL, the laser scanner was leveled during data acquisition. The scanning step-width of the Trimble SX10 is adjustable; it was set to a value larger than that of the RIEGL VZ-400 laser scanner, so the point density of DTRIMBLE-1 was lower than that of DRIEGL. There were about 5.63 million points in station 1 and 5.56 million points in station 2. Station 1 was used for the target points and station 2 was used for the source points that need to be registered. The source points, target points, and all points before the registration of DTRIMBLE-1 are illustrated in Figure 1d-f, respectively. Similar to Figure 1a,b, the positions where the scanners were located are labeled by red triangles. It can be seen that the two stations were scanned at almost the same place, but a large rotation angle exists between them. As in DRIEGL, the station 1 points are colored in red and the station 2 points are colored in blue in Figure 1f. Figure 1f indicates that these two station point clouds are not in a unified coordinate frame and need to be registered.
The third dataset was named "DTRIMBLE-2". The two stations of DTRIMBLE-2 were scanned using the same scanner and scanning parameters as DTRIMBLE-1. In contrast to DTRIMBLE-1, where the two stations were scanned at almost the same place, the two stations of DTRIMBLE-2 were scanned about 16 m apart, as indicated by the red triangles in Figure 1g,h. There are about 5.29 million points in station 1 and 5.61 million points in station 2. Figure 1i illustrates that the two station point clouds of DTRIMBLE-2 are not in a unified coordinate system. Besides DRIEGL, DTRIMBLE-1, and DTRIMBLE-2, to evaluate the influence of viewpoint changes and occlusion of TLS point clouds on the ortho projected feature image registration, station 1 of DRIEGL was used to simulate a pair of TLS point clouds, as illustrated in Figure 2a.


Ortho Projected Feature Image-Based Coarse Registration Method
The registration of point clouds aims to find the optimal transformation parameters between point clouds scanned from different scan stations. Given two point sets, a source P and a target P', the transformation between them can be described as

    (x', y', z')^T = R(α, β, γ) · (x, y, z)^T + (t_x, t_y, t_z)^T    (1)

where (x, y, z)^T denotes the coordinates of a point in the source point cloud P, and (x', y', z')^T denotes the coordinates of the corresponding point in the target point cloud P'. (t_x, t_y, t_z)^T and (α, β, γ)^T are the registration parameters, the translation vector and the rotation angle vector respectively, that need to be estimated. R(α, β, γ) is the rotation matrix computed from the rotation angle vector.

If the scanner is leveled during data acquisition, the rotation between the source points and target points is reduced to a rotation around the Z axis only. Then the transformation between the source points P and target points P' can be simplified from Equation (1) to Equation (2):

    (x', y', z')^T = R(γ) · (x, y, z)^T + (t_x, t_y, t_z)^T,  R(γ) = [[cos γ, −sin γ, 0], [sin γ, cos γ, 0], [0, 0, 1]]    (2)

In such a situation, the goal of registration is to estimate the translation vector (t_x, t_y, t_z)^T and the azimuth angle γ.
This paper takes full advantage of the fact that in many cases the TLS laser scanner is leveled and/or compensated by inclination sensors in terrestrial LiDAR scanning. The translation vector (t_x, t_y, t_z)^T and azimuth angle γ alone are enough for coarse registration of such point clouds, and thus the degrees of freedom are reduced from 6 to 4. Figure 3 shows the workflow of the proposed 4DOF coarse registration method. To improve coarse registration efficiency, the 4DOF coarse registration is decomposed into two steps: (1) the estimation of the horizontal translation vector (t_x, t_y) and azimuth angle γ; and (2) the estimation of the vertical translation t_z. The estimation of the horizontal translation vector and the azimuth angle is achieved by the registration of generated ortho projected feature images, which is described in detail in Sections 2.2.1 and 2.2.2. The vertical translation is estimated from the height differences of the source points and the target points in overlapping regions after horizontal alignment, which is described in detail in Section 2.2.3.
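The 4DOF transform of Equation (2) can be sketched in a few lines. This is a minimal NumPy sketch, not the paper's implementation; the function name and the N×3 point array layout are assumptions:

```python
import numpy as np

def transform_4dof(points, tx, ty, tz, gamma):
    """Apply the 4DOF transform of Equation (2): a rotation by the azimuth
    angle gamma around the Z axis, followed by a translation (tx, ty, tz).

    points: (N, 3) array of x, y, z coordinates (assumed layout)."""
    c, s = np.cos(gamma), np.sin(gamma)
    Rz = np.array([[c, -s, 0.0],
                   [s,  c, 0.0],
                   [0.0, 0.0, 1.0]])
    # row-vector convention: p' = p @ Rz^T + t
    return points @ Rz.T + np.array([tx, ty, tz])
```

Because only one rotation angle remains, applying or inverting this transform is considerably cheaper than the full 6DOF case.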


Ortho Projected Feature Image Generation
To generate ortho projected feature images of the source points and target points, the feature image grid resolution S is first determined. Then, the minimum values of the x, y, and z coordinates are found, denoted as x_min, y_min, and z_min, respectively. A point p(x, y, z) in the point cloud is assigned to the image pixel with row number and column number computed by Equations (3) and (4), where INT(·) denotes the floor operation:

    row = INT((y − y_min) / S)    (3)
    col = INT((x − x_min) / S)    (4)
More than one point may fall in the same image pixel of the feature image. The pixel value is computed from the average intensity and average height of all points that fall in the same pixel. Firstly, the average intensity and height of the points in the pixel are computed. Then, the average intensity and average height are scaled to [0, 255] using the maximum and minimum intensity and height values. The intensity information is obtained by Equation (5):

    I = 255 · ((1/n) Σ_{i=1}^{n} r_i − r_min) / (r_max − r_min)    (5)

where n is the number of points in the same image pixel, r_i is the intensity value of the ith point, and r_min and r_max are the minimum and maximum intensity values of the point cloud. The height information is obtained by Equation (6):

    H = 255 · ((1/n) Σ_{i=1}^{n} z_i − z_min) / (z_max − z_min)    (6)

where z_i is the height/elevation of the ith point, and z_min and z_max are the minimum and maximum elevations of the point cloud. The final pixel value is computed by combining the intensity information and height information according to Equation (7):

    V = α · I + (1 − α) · H    (7)

where α is a parameter that determines the relative proportion of intensity information and height information in the final pixel value.
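The gridding and blending of Equations (3)-(7) can be sketched as follows. This is a minimal NumPy sketch rather than the paper's implementation; the function name, the array layouts, and the row-from-y / column-from-x convention are assumptions:

```python
import numpy as np

def ortho_feature_image(points, intensities, S=0.1, alpha=0.5):
    """Ortho projected feature image sketch (Equations (3)-(7)).

    points: (N, 3) array of x, y, z; intensities: (N,) raw intensities.
    Each pixel blends scaled mean intensity and scaled mean height."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    col = np.floor((x - x.min()) / S).astype(int)   # Equation (4)
    row = np.floor((y - y.min()) / S).astype(int)   # Equation (3)
    n_rows, n_cols = row.max() + 1, col.max() + 1

    # accumulate per-pixel sums and counts with bincount
    idx = row * n_cols + col
    count = np.bincount(idx, minlength=n_rows * n_cols).astype(float)
    sum_i = np.bincount(idx, weights=intensities, minlength=n_rows * n_cols)
    sum_z = np.bincount(idx, weights=z, minlength=n_rows * n_cols)
    occupied = count > 0
    mean_i = np.where(occupied, sum_i / np.maximum(count, 1.0), 0.0)
    mean_z = np.where(occupied, sum_z / np.maximum(count, 1.0), 0.0)

    # scale mean intensity and mean height to [0, 255] (Equations (5), (6))
    I = 255.0 * (mean_i - intensities.min()) / (intensities.max() - intensities.min())
    H = 255.0 * (mean_z - z.min()) / (z.max() - z.min())
    pixel = np.where(occupied, alpha * I + (1.0 - alpha) * H, 0.0)  # Equation (7)
    return pixel.reshape(n_rows, n_cols)
```

Empty cells are set to zero here; the paper does not specify how empty pixels are filled, so that choice is an assumption.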

Horizontal Translation Vector and Azimuth Angle Estimation
The feature images generated by the method in Section 2.2.1 are ortho projected feature images. The translation vector and azimuth angle that align the ortho projected feature images are the same as the horizontal translation vector and azimuth angle between the source points and target points. Thus, the estimation of the horizontal translation vector and azimuth angle between source points and target points can be achieved by image registration of the ortho projected feature images. In this paper, the SIFT feature proposed by Lowe [42,43] is used to detect distinctive keypoints and extract local feature descriptors from the ortho projected feature images generated from the source and target points. The SIFT descriptor is invariant to image scale and rotation, and is believed to be robust to affine distortion, 3D viewpoint changes, and illumination changes. Since the differences between the ortho projected feature images of the source and target points mainly include image rotation and 3D viewpoint changes, SIFT is suitable for keypoint detection and local feature descriptor extraction from the ortho projected feature images. The details of the SIFT feature can be found in Lowe [42,43].
After the SIFT descriptors are extracted from the feature images, keypoints from the source feature image are matched with the k nearest keypoints extracted from the target feature image by Euclidean distance, using an approximate nearest neighbor algorithm [44]. The matched k nearest keypoints are ordered by Euclidean distance. Among them, the distance ratio between the nearest distance and the second nearest distance is a good indicator of match quality: the smaller the distance ratio, the better the match. To retain only reliable correspondences, matches with distance ratios larger than 0.6 are eliminated.
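The distance-ratio filtering can be sketched as below. This NumPy-only sketch uses brute-force Euclidean matching in place of the approximate nearest neighbor search of [44], and assumes descriptors have already been extracted (in practice SIFT descriptors would come from an image library such as OpenCV); the function and variable names are hypothetical:

```python
import numpy as np

def ratio_test_match(des_src, des_tgt, ratio=0.6):
    """Match descriptors by Euclidean distance and keep a correspondence
    only when nearest / second-nearest distance < ratio (0.6 in the paper).

    des_src: (Ns, D) descriptors from the source feature image,
    des_tgt: (Nt, D) descriptors from the target feature image.
    Returns matched index pairs (i_src, i_tgt)."""
    # pairwise Euclidean distances (brute force stands in for the
    # approximate nearest neighbour search)
    d = np.linalg.norm(des_src[:, None, :] - des_tgt[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    nearest, second = order[:, 0], order[:, 1]
    rows = np.arange(len(des_src))
    keep = d[rows, nearest] < ratio * d[rows, second]
    return np.flatnonzero(keep), nearest[keep]
```

A lower ratio threshold keeps fewer but more reliable correspondences, which matters here because a handful of wrong matches can bias the subsequent parameter estimation.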
The image match results are in the image pixel coordinate system. To estimate the horizontal translation vector and azimuth angle between source points and target points, the coordinates of the matched keypoints in the scanner coordinate system are reconstructed by Equations (8) and (9):

    x = u · S + x_min    (8)
    y = v · S + y_min    (9)

where u and v are the keypoint coordinates in the image pixel coordinate system, x_min and y_min are the minimum x and y coordinates, and S is the image grid resolution.
The matched keypoints in the source points and target points are denoted as P_S and P_T. The horizontal translation vector and azimuth angle are estimated from P_S and P_T by the Singular Value Decomposition (SVD) approach [45]. First, the centroids of P_S and P_T are found through Equations (10) and (11), where N is the number of keypoints:

    c_S = (1/N) Σ_{i=1}^{N} p_i^S    (10)
    c_T = (1/N) Σ_{i=1}^{N} p_i^T    (11)

Then a matrix H, similar to a covariance matrix, is accumulated by Equation (12) and decomposed into U, S, V by the SVD algorithm, as described in Equation (13):

    H = Σ_{i=1}^{N} (p_i^S − c_S)(p_i^T − c_T)^T    (12)
    H = U S V^T    (13)

The rotation matrix from the source points to the target points is calculated from V and U by Equation (14):

    R = V U^T    (14)

The azimuth angle can easily be computed from the rotation matrix. Finally, the horizontal translation vector is estimated by Equation (15):

    t = c_T − R c_S    (15)
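The SVD-based estimation above can be sketched as follows, following the standard form of Equations (10)-(15). This is a minimal NumPy sketch, not the paper's code; the names are hypothetical, and the determinant check (a common guard against a reflection solution) is an addition not stated in the paper:

```python
import numpy as np

def estimate_horizontal_alignment(P_S, P_T):
    """Estimate the rotation and translation aligning matched keypoints
    (SVD approach, Equations (10)-(15)). P_S, P_T: (N, 2) arrays."""
    c_S = P_S.mean(axis=0)                 # Equation (10)
    c_T = P_T.mean(axis=0)                 # Equation (11)
    H = (P_S - c_S).T @ (P_T - c_T)        # Equation (12)
    U, _, Vt = np.linalg.svd(H)            # Equation (13): H = U S V^T
    R = Vt.T @ U.T                         # Equation (14): R = V U^T
    if np.linalg.det(R) < 0:               # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_T - R @ c_S                      # Equation (15)
    gamma = np.arctan2(R[1, 0], R[0, 0])   # azimuth from the rotation matrix
    return R, t, gamma
```

With noise-free correspondences this recovers the azimuth and horizontal translation exactly; with the real ratio-test matches it gives a least-squares estimate.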

Vertical Translation Estimation
The source points are first transformed using the horizontal translation vector and azimuth angle estimated by the method described in Section 2.2.2, so that the source points are horizontally aligned to the target points. Since the source and target points have been horizontally aligned, the vertical translation can be estimated from the height differences of the source points and the target points in the overlapping regions. To estimate the height difference in the overlapping regions, a grid is created with a cell size equal to the feature image grid resolution S. All the source and target points are assigned to the grid. For a grid cell C in the overlapping region, the average z values of the source points and target points located in cell C are computed separately and denoted as mz_C^S and mz_C^T. The height difference between mz_C^S and mz_C^T, denoted as dz_C, is then computed. The vertical translation between the source points and target points is estimated by averaging all dz_C in the overlapping regions. The source points are then vertically aligned to the target points using the estimated vertical translation.
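The vertical translation step above can be sketched as follows; a minimal NumPy sketch with a dictionary-based grid, assuming (N, 3) arrays of horizontally aligned points. The function name and data layout are assumptions:

```python
import numpy as np

def estimate_vertical_translation(src, tgt, S=0.1):
    """Estimate t_z as the mean per-cell height difference between
    horizontally aligned source and target points.

    src, tgt: (N, 3) arrays; S is the grid cell size (as in Section 2.2.1)."""
    x0 = min(src[:, 0].min(), tgt[:, 0].min())
    y0 = min(src[:, 1].min(), tgt[:, 1].min())

    def cell_means(pts):
        # mean z per occupied grid cell, keyed by (row, col)
        col = np.floor((pts[:, 0] - x0) / S).astype(int)
        row = np.floor((pts[:, 1] - y0) / S).astype(int)
        cells = {}
        for r, c, z in zip(row, col, pts[:, 2]):
            cells.setdefault((r, c), []).append(z)
        return {k: float(np.mean(v)) for k, v in cells.items()}

    mz_src, mz_tgt = cell_means(src), cell_means(tgt)
    overlap = mz_src.keys() & mz_tgt.keys()        # cells seen by both clouds
    dz = [mz_tgt[c] - mz_src[c] for c in overlap]  # dz_C per overlapping cell
    return float(np.mean(dz))                      # t_z estimate
```

Restricting the average to cells occupied by both clouds keeps non-overlapping regions (e.g. occluded areas) from biasing the estimate.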

Accuracy Evaluation Criteria
Since the proposed Ortho Projected Feature Images (OPFI) based method registers the source point cloud to the target point cloud in the horizontal and vertical directions separately, this paper evaluates the accuracy of the registered source points in the horizontal and vertical directions separately, using ground truth points. The horizontal distance and vertical distance between the ith registered point and its corresponding ground-truth point are calculated and denoted as d_h^i and d_v^i. The Horizontal Root Mean Square Error (HRMSE) and Vertical Root Mean Square Error (VRMSE) are calculated by Equations (16) and (17), where N is the number of registered points:

    HRMSE = sqrt((1/N) Σ_{i=1}^{N} (d_h^i)^2)    (16)
    VRMSE = sqrt((1/N) Σ_{i=1}^{N} (d_v^i)^2)    (17)

Small HRMSE and VRMSE values indicate good registration accuracy.
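Equations (16) and (17) can be computed directly; a minimal NumPy sketch, with the function name and the (N, 3) array layout being assumptions:

```python
import numpy as np

def registration_rmse(registered, truth):
    """HRMSE and VRMSE (Equations (16) and (17)): root mean square of the
    horizontal and vertical distances to corresponding ground-truth points.

    registered, truth: (N, 3) arrays of corresponding x, y, z coordinates."""
    d = registered - truth
    d_h = np.hypot(d[:, 0], d[:, 1])     # horizontal distance d_h^i
    d_v = np.abs(d[:, 2])                # vertical distance d_v^i
    hrmse = np.sqrt(np.mean(d_h ** 2))
    vrmse = np.sqrt(np.mean(d_v ** 2))
    return hrmse, vrmse
```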

Results
The point cloud datasets described in Section 2.1 are used to conduct experiments to verify the proposed method. DSimRIEGL, DRIEGL, DTRIMBLE-1, and DTRIMBLE-2 are all registered by the proposed ortho projected feature image-based method. First, the ortho projected feature images of the source points and target points are generated with an image grid resolution of 0.1 m. In feature image generation, the α value is set to 0.5 to take advantage of both intensity and height information. In addition, to avoid large-scale empty pixels in the feature images, points with a scanning distance larger than 100 m are eliminated for DSimRIEGL and DRIEGL. The point clouds obtained by the Trimble SX10 scanner are much sparser than those obtained by the RIEGL VZ-400, so points with a scanning distance larger than 70 m are eliminated for DTRIMBLE-1 and DTRIMBLE-2.
SIFT features are then extracted from the generated ortho projected feature images and matched by the k nearest neighbor algorithm. Figure 4 shows the generated ortho feature images and the SIFT matching results. In Figure 4a-d, the left parts are the ortho projected feature images generated from the target points, and the right parts are those generated from the source points. Different objects, such as squares, buildings, and trees, can be clearly distinguished in the feature images. The matched keypoint correspondences are connected by colored lines. A total of 1484 correspondences were matched for DSimRIEGL. To illustrate the SIFT matching results more clearly, only the matches with a distance ratio smaller than 0.1 (16 pairs) are drawn in Figure 4a. A total of 65, 251, and 23 correspondences were matched for DRIEGL, DTRIMBLE-1, and DTRIMBLE-2, respectively. To show the correspondences more clearly, only five representative correspondences were chosen and drawn in Figure 4b-d. Far fewer correspondences were matched in DRIEGL, DTRIMBLE-1, and DTRIMBLE-2 than in DSimRIEGL, which indicates that viewpoint changes and occlusion in TLS scanning affect the SIFT matching results. Nevertheless, the results indicate that enough correct correspondences can be found from ortho projected feature images generated from both simulated and real TLS point clouds to accomplish 4DOF registration. After enough correspondences were found from the ortho projected feature images, the horizontal translation vector and the azimuth angle were estimated using the matched correspondences. Then, the source points were horizontally aligned to the target points using the estimated horizontal translation vector and azimuth angle. The vertical translation was then estimated from the height difference of the source points and target points in the overlapping regions after horizontal alignment.
Figure 5 shows the registration results of DSimRIEGL, DRIEGL, DTRIMBLE-1, and DTRIMBLE-2. Figure 5a1,b1,c1,d1 shows the overviews of the registration result of DSimRIEGL, DRIEGL, DTRIMBLE-1, and DTRIMBLE-2, respectively. The target points are colored in red and the source points after registration are colored in green. It can be seen from Figure 5a1,b1,c1,d1 that the registered source points match the target points well, which indicates that the source points are visually well registered to the target points. To identify the registration results more precisely, a small part of the registration results of each dataset (labeled in the black box in Figure 5a1,b1,c1,d1) are zoomed in and shown in Figure 5a2,b2,c2,d2. It can be seen from Figure 5a2,b2,c2,d2 that the walls scanned from target points and source points match each other very well, which indicates that the source points are precisely registered to target points. After enough correspondences were found from the ortho projected feature images, the horizontal translation vector and the azimuth angle were estimated using the matched correspondences. Then, the source points were horizontally aligned to the target points using the estimated horizontal translation vector and azimuth angle. The vertical translation was then estimated by the height difference of the source points and target points in overlapping regions after being horizontally aligned. Figure 5 shows the registration results of DSimRIEGL, DRIEGL, DTRIMBLE-1, and DTRIMBLE-2. Figure 5a1,b1,c1,d1 shows the overviews of the registration result of DSimRIEGL, DRIEGL, DTRIMBLE-1, and DTRIMBLE-2, respectively. The target points are colored in red and the source points after registration are colored in green. It can be seen from Figure  5a1,b1,c1,d1 that the registered source points match the target points well, which indicates that the source points are visually well registered to the target points. 
To identify the registration results more precisely, a small part of the registration results of each dataset (labeled in the black box in Figure  5a1,b1,c1,d1) are zoomed in and shown in Figure 5a2,b2,c2,d2. It can be seen from Figure 5a2,b2,c2,d2 that the walls scanned from target points and source points match each other very well, which indicates that the source points are precisely registered to target points. To compare the registration result of OPFI with other state-of-the-art methods, the four datasets were also registered by the FMP-BnB method [12], the BnB method [12], the 4DOF version of Lifting Method (LM) [46] and the 4DOF RANSAC method [27]. The C++ codes of these algorithms implemented by Cai et al. [47] were used. The C++ codes can be accessed through the GitHub repository (https://github.com/ZhipengCai/Demo---Practical-optimal-registration-of-terrestrial-LiDAR-scan-pairs). The default parameter values used in the C++ codes were adopted. To evaluate the registration results, the ground truth of registration parameters and registration results are first obtained. For the DSimRIEGL dataset, part 2 points without transformation are used as ground truth. The translation vector is -1.0 m in , , and . The azimuth angle is -45 degrees. The ground truth of DRIEGL is obtained by artificial marker-based coarse registration and the ICP fine registration algorithm. The coordinates of five spherical balls in the scene were obtained by ball fitting in commercial software RISCAN Pro [41] and used to coarsely register station 2 points to station 1 points. The ICP algorithm was then applied to the coarse registration result. The registration parameters were finally computed by the point pairs before and after registration. The ground truth of DTRIMBLE-1 and DTRIMBLE-2 were obtained by manually picked point correspondences-based coarse registration and the ICP fine registration algorithm. 
Five point correspondences were manually picked from station 2 and station 1 using the open source software CloudCompare [48]. Coarse registration was achieved using the five picked point correspondences. The ICP algorithm To compare the registration result of OPFI with other state-of-the-art methods, the four datasets were also registered by the FMP-BnB method [12], the BnB method [12], the 4DOF version of Lifting Method (LM) [46] and the 4DOF RANSAC method [27]. The C++ codes of these algorithms implemented by Cai et al. [47] were used. The C++ codes can be accessed through the GitHub repository (https://github. com/ZhipengCai/Demo---Practical-optimal-registration-of-terrestrial-LiDAR-scan-pairs). The default parameter values used in the C++ codes were adopted. To evaluate the registration results, the ground truth of registration parameters and registration results are first obtained. For the DSimRIEGL dataset, part 2 points without transformation are used as ground truth. The translation vector is −1.0 m in x, y, and z. The azimuth angle is −45 degrees. The ground truth of DRIEGL is obtained by artificial marker-based coarse registration and the ICP fine registration algorithm. The coordinates of five spherical balls in the scene were obtained by ball fitting in commercial software RISCAN Pro [41] and used to coarsely register station 2 points to station 1 points. The ICP algorithm was then applied to the coarse registration result. The registration parameters were finally computed by the point pairs before and after registration. The ground truth of DTRIMBLE-1 and DTRIMBLE-2 were obtained by manually picked point correspondences-based coarse registration and the ICP fine registration algorithm. Five point correspondences were manually picked from station 2 and station 1 using the open source software CloudCompare [48]. Coarse registration was achieved using the five picked point correspondences. 
The ICP algorithm was then applied to the coarse registration result. Similar to the DRIEGL dataset, the registration parameters were computed using the point pairs before and after registration.
Figure 6 shows the accuracy of the estimated registration parameters. For the DSimRIEGL dataset, the translation vector accuracies of all algorithms are better than 0.05 m and the azimuth angle accuracies are better than 0.025 degrees. The RANSAC method achieves the best accuracy in translation vector estimation, and the OPFI method achieves the best accuracy in azimuth angle estimation. For the DRIEGL dataset, the OPFI method achieves the best accuracy in horizontal translation vector estimation; its vertical translation accuracy is better than that of the FMP-BnB and BnB methods, and its azimuth angle accuracy is better than that of the RANSAC method. For the DTRIMBLE-1 dataset, OPFI achieves the best accuracy in vertical translation and in the x-direction of horizontal translation, and the azimuth angle accuracies of all algorithms are better than 0.16 degrees. For the DTRIMBLE-2 dataset, OPFI achieves much better vertical translation accuracy than the other four algorithms, better accuracy than FMP-BnB and BnB in the x- and y-directions, and better azimuth angle accuracy than all algorithms except RANSAC.
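For reference, the 4DOF transform whose parameters are evaluated here (a rotation about the vertical axis plus a 3D translation) can be applied to a point cloud as in the sketch below. This is an illustrative implementation under our own naming, not the authors' code, and the sample point is hypothetical; the parameter values are the DSimRIEGL ground truth quoted above.

```python
import numpy as np

def apply_4dof(points, azimuth_deg, t):
    """Apply a 4DOF transform (rotation about the vertical z-axis by the
    azimuth angle, followed by a 3D translation) to an N x 3 point array."""
    a = np.radians(azimuth_deg)
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
    return points @ R.T + np.asarray(t, dtype=float)

# DSimRIEGL ground-truth parameters: azimuth -45 degrees, translation -1.0 m
pts = np.array([[10.0, 0.0, 2.0]])           # hypothetical sample point
moved = apply_4dof(pts, -45.0, [-1.0, -1.0, -1.0])
```

Undoing the translation and rotating back by +45 degrees recovers the original point, which is a convenient sanity check for any estimated parameter set.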
Table 1 and Table 2 show the HRMSE and VRMSE of the results registered by OPFI and the comparison methods. For the DSimRIEGL dataset, the HRMSE of OPFI is 0.06 m, slightly worse than that of the other four algorithms; the VRMSE of OPFI is the same as that of the LM and RANSAC methods and better than that of the FMP-BnB and BnB methods. For the DRIEGL dataset, the HRMSE of OPFI is better than that of the FMP-BnB, BnB, and LM methods and slightly worse than that of the RANSAC method; the VRMSE of OPFI is better than that of the FMP-BnB and BnB methods and slightly worse than that of the LM and RANSAC methods. For the DTRIMBLE-1 dataset, even though the HRMSE of OPFI is larger than that of the other four algorithms, it is still better than 0.1 m; the VRMSE of OPFI is the same as that of LM and better than that of FMP-BnB, BnB, and RANSAC. For the DTRIMBLE-2 dataset, OPFI achieves an HRMSE comparable to that of the RANSAC algorithm and better than that of the other three algorithms, and the best VRMSE among the five algorithms. The results in Table 1 and Table 2 indicate that the proposed OPFI method achieves centimeter-level registration accuracy, comparable to other state-of-the-art methods.
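Reading HRMSE and VRMSE as the root mean square of the horizontal (x, y) and vertical (z) components of the point-to-point error over corresponding pairs (a plausible interpretation; the paper's exact definitions may differ), they can be computed as follows:

```python
import numpy as np

def hrmse_vrmse(registered, reference):
    """Split RMSE of corresponding point pairs (N x 3 arrays) into a
    horizontal (x, y) component and a vertical (z) component."""
    d = registered - reference
    hrmse = np.sqrt(np.mean(np.sum(d[:, :2] ** 2, axis=1)))
    vrmse = np.sqrt(np.mean(d[:, 2] ** 2))
    return hrmse, vrmse
```

With this reading, a uniform residual of (3, 4, 1) m over all pairs would give HRMSE = 5 m and VRMSE = 1 m.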
Figure 7 shows the registration error of each point in each dataset as a heat map. The 3D Euclidean distance between a point after registration and its corresponding point in the reference data was used as the registration error. The color of each point, ranging from blue to red, reflects the magnitude of the registration error.
It can be seen from Figure 7a-d that most points were registered with good accuracy (blue and green points). Some points (red points in Figure 7a-d) with long scanning distances have relatively lower registration accuracy than those with shorter scanning distances.
The proposed OPFI method and the comparison methods are all implemented in C++. All the experiments were conducted on a laptop with an Intel Core i9-8950HK CPU and 32 GB of memory. The run times of the proposed OPFI method and the comparison methods were recorded and are shown in Figure 8. For OPFI, the reported time includes ortho projected feature image generation, feature image matching, horizontal translation vector and azimuth angle estimation, vertical translation vector estimation, and transformation of the source points. For all four datasets, the proposed OPFI method takes the least time, less than half that of the other four algorithms, which indicates that the proposed method is more efficient than the compared algorithms.

Parameter Setting
The image grid resolution S of the feature image may affect the estimation accuracy of the horizontal translation vector and azimuth angle. Theoretically, the smaller the image grid resolution, the higher the accuracy. In practice, however, the image grid resolution should be set according to the density of the point cloud: an image grid resolution smaller than the point spacing leads to too many empty pixels in the feature image, which degrades the registration of feature images. We used the DSimRIEGL dataset and varied the image grid resolution from 0.05 m to 0.5 m to explore its influence on coarse registration accuracy. In the experiment, the α value was set to 0.5 in feature image generation. Figure 9 illustrates the coarse registration accuracy with different image grid resolutions. The horizontal accuracy is sensitive to the image grid resolution, whereas the vertical accuracy is almost unaffected by it. The proposed coarse registration method achieves a horizontal registration accuracy of about half the image grid resolution.

The value of α in feature image generation determines the contribution of the intensity and height information to the final feature image, and thus affects the registration of feature images.
Figure 10 illustrates the number of correctly matched SIFT correspondences with different α values. It shows that intensity contributes substantially to SIFT feature image registration: the larger the α value, the more SIFT correspondences can be found. However, when α is larger than 0.6, the number of matched SIFT correspondences increases only slowly. To take full advantage of both the intensity and height information, the α value is recommended to be set between 0.5 and 0.7.
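As an illustration of the role of α, the sketch below rasterizes a cloud into an ortho projected feature image whose pixels blend normalized intensity and normalized height. The linear blending rule and per-pixel averaging are assumptions made here for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def feature_image(points, intensity, cell=0.1, alpha=0.6):
    """Rasterize points (N x 3) onto a horizontal grid of size `cell` (m).
    Pixel value = alpha * normalized intensity + (1 - alpha) * normalized
    height, averaged over the points falling in each cell."""
    xy, z = points[:, :2], points[:, 2]
    ij = np.floor((xy - xy.min(axis=0)) / cell).astype(int)
    h, w = ij[:, 1].max() + 1, ij[:, 0].max() + 1
    # normalize height and intensity to [0, 1]
    zn = (z - z.min()) / max(z.max() - z.min(), 1e-9)
    inorm = (intensity - intensity.min()) / max(intensity.max() - intensity.min(), 1e-9)
    val = alpha * inorm + (1.0 - alpha) * zn
    img = np.zeros((h, w))
    cnt = np.zeros((h, w))
    np.add.at(img, (ij[:, 1], ij[:, 0]), val)   # accumulate blended values per cell
    np.add.at(cnt, (ij[:, 1], ij[:, 0]), 1.0)   # count points per cell
    return np.divide(img, cnt, out=np.zeros_like(img), where=cnt > 0)
```

With α = 1 the image is driven purely by intensity and with α = 0 purely by height, which is the trade-off Figure 10 explores.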


Registration of Point Clouds with Different Point Density
The density of point clouds varies across applications, so suitability for point clouds of different densities is important for a coarse registration method. Figure 11 illustrates the coarse registration results for point clouds with different point densities. The point clouds were downsampled to resolutions from 0.05 m to 0.5 m using a voxel grid sampling algorithm, and in feature image generation the image grid resolution was set equal to the point resolution. Figure 11 shows that the vertical accuracy is almost unaffected by the point resolution, whereas the coarser the point resolution, the worse the horizontal accuracy, which is about half the point resolution.
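A minimal voxel grid downsampler of the kind used in this experiment can be sketched as follows; replacing the points in each voxel by their centroid is a common convention, assumed here since the paper does not specify the representative point.

```python
import numpy as np

def voxel_grid_sample(points, voxel=0.1):
    """Downsample an N x 3 point cloud: keep one centroid per occupied voxel."""
    keys = np.floor(points / voxel).astype(int)          # voxel index of each point
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    n = inv.max() + 1
    sums = np.zeros((n, 3))
    np.add.at(sums, inv, points)                         # sum coordinates per voxel
    counts = np.bincount(inv, minlength=n).reshape(-1, 1)
    return sums / counts                                 # centroid per voxel
```

The `voxel` parameter corresponds to the 0.05 m to 0.5 m resolutions swept in Figure 11.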


Accuracy and Efficiency
In the experiments, the proposed coarse registration method accomplished coarse registration of DSimRIEGL, DRIEGL, DTRIMBLE-1, and DTRIMBLE-2. The horizontal accuracy of the coarse registration is 0.06, 0.02, 0.07, and 0.07 m for DSimRIEGL, DRIEGL, DTRIMBLE-1, and DTRIMBLE-2, respectively, and the vertical accuracy is 0.01, 0.02, 0.01, and 0.01 m, respectively. The registration accuracy is comparable to that of the FMP-BnB, BnB, LM, and RANSAC algorithms. The experimental results indicate that centimeter-level registration accuracy can be achieved by the proposed OPFI method, which is sufficient for coarse registration.
In the experiments, the proposed OPFI method takes only dozens of seconds to coarsely register millions of TLS points. Compared with the FMP-BnB, BnB, LM, and RANSAC algorithms, the proposed OPFI method takes less than half of the time to register the same datasets, making it far more efficient. The proposed OPFI method thus provides an efficient and automatic alternative for coarse registration of large-scale dense TLS point clouds.

Robustness of Feature Image Matching
Accurate and robust feature image matching is the key to the proposed OPFI 4DOF coarse registration method. Because of occlusion, different viewpoints, and intensity variation [49] of the TLS point clouds, the matching of feature images is difficult and challenging. The SIFT matching correspondences of DRIEGL, DTRIMBLE-1, and DTRIMBLE-2 are far fewer than those of DSimRIEGL, which indicates that SIFT feature matching of ortho projected feature images is affected by occlusion, different viewpoints, and intensity variation of the TLS point clouds. SIFT feature matching may fail to find enough correspondences in some cases, especially for point clouds acquired by different laser scanners and different platforms. Lines and edges are more invariant to occlusion, viewpoint changes, and intensity variation, so line-based [50,51] or edge-based [52] image matching methods could be applied to improve the robustness of feature image matching.
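The ratio test conventionally used to filter SIFT correspondences (and one reason correspondence counts drop on the harder datasets) can be sketched directly on descriptor arrays as below; the 0.8 threshold is Lowe's conventional choice, not necessarily the value used in the paper.

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.8):
    """Match SIFT-style descriptors with Lowe's ratio test: accept a match
    only if the nearest neighbor is clearly closer than the second nearest."""
    # pairwise Euclidean distances between all descriptor pairs
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc1))
    keep = d[rows, best] < ratio * d[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]
```

Occlusion and intensity variation perturb the descriptors, so fewer pairs survive the ratio test, matching the behavior observed on DRIEGL and the DTRIMBLE datasets.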


Conclusions
To improve the coarse registration efficiency of large-scale dense TLS point clouds, this paper proposed reducing the 6DOF registration problem to 4DOF by exploiting the fact that the laser scanner is usually leveled or compensated by built-in inclination sensors during data acquisition.
An efficient and automatic 4DOF coarse registration framework is proposed. In the proposed framework, the horizontal translation vector together with the azimuth angle, and the vertical translation vector, are estimated separately. The horizontal translation vector and azimuth angle are estimated by registering ortho projected feature images generated from the source and target points. The vertical translation is estimated from the height difference of the source points and target points in the overlapping regions after horizontal alignment. Simulated pairwise TLS point clouds and point clouds scanned by the Riegl VZ-400 and the Trimble SX10 were used to validate the proposed method. The experimental results demonstrate that the coarse registration accuracy of the proposed method is at centimeter level, and only dozens of seconds are needed to coarsely register millions of points.
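The vertical translation step summarized above (the height difference over the horizontal overlap) can be sketched as follows; the grid-based pairing of overlapping regions and the median estimator are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def vertical_shift(source, target, cell=0.5):
    """Estimate the vertical translation between two horizontally aligned
    N x 3 clouds as the median z-difference over grid cells occupied by both."""
    def min_z_grid(pts):
        # lowest z value per horizontal grid cell
        grid = {}
        for (x, y, z) in pts:
            key = (int(np.floor(x / cell)), int(np.floor(y / cell)))
            if key not in grid or z < grid[key]:
                grid[key] = z
        return grid
    gs, gt = min_z_grid(source), min_z_grid(target)
    common = set(gs) & set(gt)          # horizontally overlapping cells
    return float(np.median([gt[k] - gs[k] for k in common]))
```

The median makes the estimate robust to cells where the two scans see different surfaces, for example due to occlusion.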
Because of occlusion, different viewpoints, and the fact that the intensity of laser scanning is affected by many factors, the registration of feature images is challenging. In some cases, SIFT feature matching may fail to find enough correspondences between the feature images of the source and target point clouds. Large-scale edges and lines can be extracted from the feature images, and they are less affected by occlusion, different viewpoints, and intensity variation. Line-based or edge-based matching methods could therefore be applied to the matching of feature images in the future to improve robustness.
Author Contributions: Conceptualization, Hua Liu; methodology, Hua Liu, and Xiaoming Zhang; software Yuancheng Xu; validation, Xiaoming Zhang and Yuancheng Xu; formal analysis, Hua Liu; data curation, Hua Liu; writing-original draft preparation, Hua Liu; writing-review and editing, Xiaoming Zhang, Yuancheng Xu and Xiaoyong Chen; visualization, Yuancheng Xu; supervision, Xiaoyong Chen. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.