Pairwise Coarse Registration of Indoor Point Clouds Using 2D Line Features

Abstract
Registration is essential for terrestrial LiDAR (light detection and ranging) scanning point clouds. The registration of indoor point clouds is especially challenging due to the occlusion and self-similarity of indoor structures. This paper proposes a 4 degrees of freedom (4DOF) coarse registration method that fully exploits the knowledge that the equipment is levelled, or that its inclination is compensated by a tilt sensor, during data acquisition. The method decomposes the 4DOF registration problem into two parts: (1) horizontal alignment using ortho-projected images and (2) vertical alignment. The ortho-projected images are generated using the points between the floor and ceiling, and horizontal alignment is achieved by matching the source and target ortho-projected images using 2D line features detected from them. Vertical alignment is achieved by making the heights of the floor and ceiling in the source and target points equivalent. Two datasets, one with five stations and the other with 20 stations, were used to evaluate the performance of the proposed method. The experimental results showed that the proposed method achieved 80% and 63% successful registration rates (SRRs) in a simple scene and a challenging scene, respectively. The SRR in the simple scene is lower only than that of the keypoint-based four-point congruent set (K4PCS) method, and the SRR in the challenging scene is better than those of all five comparison methods. Even though the proposed method still has some limitations, it provides an alternative solution to the indoor point cloud registration problem.


Introduction
Easy access to high-quality dense point clouds has made LiDAR (light detection and ranging) scanning widely used in areas such as surveying and mapping [1], forestry inventory [2], and hazard monitoring [3] over the last two decades. In recent years, indoor point clouds have become increasingly desirable for applications in interactive visualization [4], as-built construction [5,6], indoor navigation [7], and building model reconstruction [8,9]. Cost-effective, convenient, and efficient methods for producing high-quality indoor point clouds are therefore in demand.
Various methods and equipment can be employed to acquire indoor point clouds. Traditional terrestrial laser scanning (TLS) is probably the most basic method used to obtain indoor point clouds. TLS can acquire high-accuracy point clouds, but it needs to scan station by station to cover the full view of an indoor scene. Mobile indoor mapping systems, such as NavVis [10], can acquire dense indoor point clouds in a moving mode, which is convenient and efficient, but the accuracy of the produced point clouds is lower than that of TLS.

Besides hand-crafted features and features learnt by deep learning, the four-point congruent set (4PCS) is another special feature proposed to register point clouds. 4PCS-based registration methods first extract coplanar four-point wide bases from the source and target points. Affine-invariant ratios defined on the base points are used to find candidate correspondences between wide bases in the source and target points. Candidate transformations are estimated from the candidate 4PCS correspondences, and the one that transforms the source points so that the most points lie close to the target points is selected as the optimal and final transformation. The 4PCS algorithm works well for datasets with small overlaps and is resilient to noise and outliers, but it is time-consuming for large-scale datasets. To improve its time efficiency, several variants of the original 4PCS algorithm have been developed. Mellado et al. [33] improved the original 4PCS algorithm by introducing a smart indexing data organization that reduces the quadratic time complexity to linear; the improved algorithm is called SUPER 4PCS. Keypoint-based 4PCS (K4PCS), proposed by Theiler et al. [34], improves the time efficiency by first extracting keypoints from the point clouds and then feeding the keypoints to the 4PCS algorithm, instead of using all points in the source and target points.
Similar to K4PCS, Ge [35] proposed to extract semantic keypoints first and then feed the extracted semantic keypoints to the 4PCS algorithm.
Feature-based registration methods accomplish point cloud registration by matching feature correspondences in the source and target point clouds. Probabilistic registration methods, in contrast, model the distribution of a point cloud as a density function and perform registration either through a correlation-based approach or within an expectation-maximization-based optimization framework [36]. Tsin and Kanade [37] proposed using a kernel correlation function to measure the affinity between point clouds and registering them by finding the maximum kernel correlation with an M-estimator. Jian and Vemuri [38] proposed a unified framework for rigid and nonrigid point cloud registration that represents the point clouds with Gaussian mixture models and registers them by minimizing the statistical discrepancy between the two mixtures. Myronenko and Song [39] proposed the coherent point drift (CPD) algorithm, which fits the Gaussian mixture model (GMM) centroids of the source point cloud to the target point cloud by maximizing the likelihood in a way that forces the GMM centroids to move coherently as a group, preserving the topological structure of the point sets.
Even though coarse point cloud registration has been studied for a long time, it remains challenging due to scene complexity, sparse distribution, occlusion, and noise, especially for indoor point clouds. Compared with outdoor scenes, indoor scenes have their own characteristics: (1) many self-similar structures exist and (2) they contain far more structured geometric features. The registration of indoor point clouds can achieve better performance if these characteristics are fully exploited. Bueno et al. [6] proposed extracting keypoints from indoor point clouds and feeding them to a 4PCS algorithm to register indoor point clouds. This method is similar to the K4PCS algorithm [34] for outdoor scenes, and the characteristics of indoor point clouds are not considered. Tsai and Huang [40] designed an indoor 3D reconstruction system using a pan-tilt unit and an RGB-D sensor. An algorithm to automatically register the data acquired at a fixed station was developed, but the global registration of different stations was not presented. Mahmood et al. [5] extracted lines from a horizontal cross-section and registered the as-built model to an as-planned model using line feature correspondences. This method is suitable for registering a point cloud of part of a building (such as a room) to the full building information model (BIM), but not for station-by-station point cloud registration. Sanchez et al. [41] proposed the structured scene feature-based registration (SSFR) algorithm, which uses Gaussian images to find corresponding planes for the registration of indoor point clouds containing planes. The SSFR algorithm is not suitable for complex indoor scenes since the number of planes is set as a constant. Pavan et al. [42] proposed a plane-to-plane correspondence-based method for indoor and outdoor building point cloud registration.
Planes are segmented first using the random sample consensus (RANSAC) algorithm and then the planes are matched using complex numbers. Finally, the transformation is estimated using the plane correspondences. This method is time-consuming due to the RANSAC segmentation step.
The rigid transformation between point clouds has six degrees of freedom (DOF) if it is unconstrained and can be expressed by three translations and three rotation angles. For state-of-the-art scanners, the inclination of the equipment during data acquisition can be compensated by the built-in tilt sensor, and thus the rotation between point clouds scanned from different stations is constrained to the azimuth only. Based on this assumption, Cai et al. [43] proposed a fast branch-and-bound (BnB) algorithm based on 3D keypoint correspondences for a 1D rotation search, achieving a computationally efficient 4 degrees of freedom (4DOF) registration. In our previous work [44], a two-step 4DOF registration algorithm for outdoor scenes was proposed. In the first step, the horizontal translation and azimuth angle are estimated by the keypoint-based registration of ortho-projected feature images. In the second step, the vertical translation is estimated from the height difference of the overlapping areas after they are horizontally aligned. Ge and Hu [45] proposed a three-step 4DOF registration algorithm for urban scene point clouds. In the first step, the 2D transformation is estimated using matched line primitives. In the second step, the vertical offset is compensated by least squares optimization. Finally, the full transformation is refined by a least squares algorithm using uniformly sampled patches.
This paper proposes a 4DOF coarse registration algorithm for indoor point clouds. Similar to our previous work [44], the 4DOF registration is divided into two steps. The first step achieves 2D registration in the horizontal plane. The second step aligns the point clouds in the vertical plane after they are horizontally aligned. Differing from our previous work, this paper proposes a new method to generate ortho-projected images for indoor point clouds and achieves horizontal alignment by matching the 2D line features in the ortho-projected images rather than using the keypoint-based registration method. Besides this, this paper achieves vertical alignment by making the heights of the floor and ceiling in the source point cloud and the target point cloud equivalent rather than making the heights of the overlapping regions equivalent.
The remainder of this paper is structured as follows. In Section 2, the used indoor point cloud datasets are introduced, and the proposed registration method is described in detail. In Section 3, the results are presented. In Section 4, we briefly discuss the proposed method and the results. Finally, the conclusions are given in Section 5.

Indoor Point Cloud Datasets
Two datasets were used to validate the proposed method. The first one, provided by ETH Zurich (https://ethz.ch/content/dam/ethz/special-interest/baug/igp/photogrammetryremote-sensing-dam/documents/sourcecode-and-datasets/PascalTheiler/office.zip) and abbreviated as SR, contains five scans of an indoor office (Figure 1a). All five scans overlap each other, and thus a total of 10 pairs of matching can be conducted (Table 1). The ground truth transformations between pairs are provided by ETH Zurich. The second one, provided by Wuhan RGSpace Technology Co. LTD (http://rgspace.com/) and abbreviated as CB, contains 20 scans of the interior of a building (Figure 1b). The point clouds were captured by a self-integrated scanner using three Azure Kinect DKs (https://azure.microsoft.com/en-us/services/kinect-dk/). The Azure Kinect DKs were operated in a WFOV (wide field-of-view) 2 × 2 binned mode during data acquisition, and thus the scanning range was less than 2.9 m. The scanner was levelled using tilt sensors with a 0.1 degree precision and rotated around the z axis with a pre-calibrated fixed angle interval (30 degrees) during data acquisition. The relative relationships between the Azure Kinect DK sensors were also carefully calibrated, and thus all the data scanned at a station can be transformed to a unified coordinate system. A total of 19 pairs of matching was conducted to ensure that all 20 scans could be registered to the points of the first scan station. The ground truth transformations between pairs were provided by Wuhan RGSpace Technology Co. LTD. The pairs used to conduct experiments and other details can be found in Table 1.

Registration Method
The proposed indoor point cloud registration method assumes that the equipment is leveled or that the inclination is compensated by tilt sensors in data acquisition. The whole process of registration is achieved by three steps: (1) ortho-projected image generation, (2) ortho-projected image registration using 2D line features, and (3) point cloud registration. The overview of the proposed registration method is illustrated in Figure 2.


Registration Framework
The rigid transformation between source point clouds and target point clouds without any constraints can be written as:

$$
\begin{bmatrix} x_t \\ y_t \\ z_t \end{bmatrix} =
R(\alpha, \beta, \theta)
\begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}
\quad (1)
$$

where $[x_s\ y_s\ z_s]^T$ is a source point, $[x_t\ y_t\ z_t]^T$ is the corresponding target point, $R(\alpha, \beta, \theta)$ is the rotation matrix composed of the rotation angles $\alpha$, $\beta$, and $\theta$ around the $x$, $y$, and $z$ axes, and $[t_x\ t_y\ t_z]^T$ is the translation vector. If the equipment is levelled or the inclination is compensated by a tilt sensor in data acquisition, then $\alpha$ and $\beta$ are equal to zero. In such a situation, Equation (1) can be written as:

$$
\begin{bmatrix} x_t \\ y_t \\ z_t \end{bmatrix} =
\begin{bmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}
\quad (3)
$$

Equation (3) can be reformatted as:

$$
\begin{bmatrix} x_t \\ y_t \end{bmatrix} =
R \begin{bmatrix} x_s \\ y_s \end{bmatrix} + t_{xy},
\qquad
z_t = z_s + t_z,
\qquad
R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},
\quad
t_{xy} = \begin{bmatrix} t_x \\ t_y \end{bmatrix}
\quad (5)
$$

From Equation (5), we can conclude that the 4DOF registration can be decomposed into two steps: (1) the first step estimates the horizontal translation vector and the rotation angle around the z axis (named the azimuth angle in the following), which corresponds to $R, t_{xy}$; (2) the second step estimates the vertical translation $t_z$.
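The two-step decomposition can be checked with a short numerical sketch (illustrative Python, independent of the paper's implementation): applying the full 4DOF transform in one go and applying the horizontal step followed by the vertical step give the same result for any point.

```python
import math

def transform_4dof(p, theta, tx, ty, tz):
    """Full 4DOF transform: rotation about the z axis, then translation."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

def transform_two_step(p, theta, tx, ty, tz):
    """Step 1: horizontal alignment (R, t_xy); step 2: vertical shift t_z."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    x2, y2 = c * x - s * y + tx, s * x + c * y + ty   # 2D registration
    return (x2, y2, z + tz)                           # vertical alignment

# The two formulations agree for any point and any 4DOF parameters:
p = (1.0, 2.0, 0.5)
assert transform_4dof(p, math.radians(30), 0.4, -0.1, 1.2) == \
       transform_two_step(p, math.radians(30), 0.4, -0.1, 1.2)
```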
The estimation of the horizontal translation vector and the azimuth angle is equivalent to the registration of the ortho-projected images of the source and target points. This paper uses vertical plane points, such as walls and doors, to generate the ortho-projected images from which the horizontal translation vector and azimuth angle are estimated. After the point clouds are horizontally aligned, the vertical translation is estimated using the height difference of the floor and ceiling in the source and target points.

Ortho-Projected Image Generation
Vertical plane points such as walls and doors are used to generate ortho-projected images. To obtain the vertical plane points, the ground points and ceiling points are first eliminated from the raw indoor point clouds. The ground and ceiling contribute the most points in the height histogram, as illustrated in Figure 3, so the heights of the ground and ceiling can be found from the height histogram, and the points with z values smaller than the ground height or larger than the ceiling height can be eliminated. Since the equipment is mounted above the floor and the coordinate origin of the sensor is on the equipment, the z values of the floor points are certainly smaller than zero and the z values of the ceiling points are certainly larger than zero. The height of the floor points can therefore be found by fitting the part of the height histogram below zero with an extreme value distribution; accordingly, the height of the ceiling points can be found by fitting the part of the height histogram above zero with an extreme value distribution. The probability density function used in this paper is defined as Equation (6):

$$
f(z; \mu, s) = \frac{1}{s}\exp\!\left(-\frac{z-\mu}{s}\right)\exp\!\left(-\exp\!\left(-\frac{z-\mu}{s}\right)\right)
\quad (6)
$$

In Equation (6), $\mu$ is the location of the maximum value and $s$ is a scale parameter.
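The floor/ceiling detection step can be sketched as follows. This simplified illustration picks the strongest height-histogram bins below and above z = 0 instead of fitting the extreme value distribution used in the paper; the bin size and function name are illustrative assumptions.

```python
from collections import Counter

def floor_ceiling_heights(z_values, bin_size=0.05):
    """Estimate floor/ceiling heights as the strongest height-histogram
    bins below and above z = 0 (the scanner origin lies between them)."""
    below = Counter(round(z / bin_size) for z in z_values if z < 0)
    above = Counter(round(z / bin_size) for z in z_values if z > 0)
    return (below.most_common(1)[0][0] * bin_size,
            above.most_common(1)[0][0] * bin_size)

# Synthetic station: dense floor at -1.4 m, ceiling at 1.3 m, sparse walls.
z = [-1.4] * 500 + [1.3] * 500 + [i * 0.01 for i in range(-130, 125)]
h_f, h_c = floor_ceiling_heights(z)
assert abs(h_f + 1.4) < 0.05 and abs(h_c - 1.3) < 0.05
```

A distribution fit, as in the paper, would be more robust to clutter near the floor and ceiling than this simple argmax over bins.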
After the heights of the floor (denoted as $h_f$) and ceiling (denoted as $h_c$) are found, only points with z values between $h_f + d_{h_f}$ and $h_c - d_{h_c}$ are used to generate the ortho-projected images, in order to eliminate moving objects. In this paper, the buffers $d_{h_f}$ and $d_{h_c}$ are used to avoid moving objects on the floor and hanging objects on the ceiling.
Before registration, the source and target points are all in the scanner-owned coordinate system. To generate ortho-projected images of a station point cloud (source points or target points), the minimum x and y values (denoted as x min and y min ) of that station point cloud are found. For a point (x, y, z), its row number and column number in the ortho-projected image are obtained by Equations (7) and (8). L is the grid resolution of the ortho-projected images. Theoretically, the smaller the L, the better the matching accuracy but the higher the computing complexity. To compromise between accuracy and computing complexity, L is empirically set to 0.01 m in this paper. If a pixel in an ortho-projected image contains at least one point, its pixel value is set to 255. Otherwise, the pixel value is set to 0. In such a way, we can obtain a binary ortho-projected image for the source points and target points.
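The rasterization step can be sketched as below. The row/column formulas marked in the comments are an assumed form of Equations (7) and (8), which are not reproduced in this excerpt; the demo uses L = 1.0 for readability rather than the paper's 0.01 m.

```python
def ortho_image(points, L=0.01):
    """Rasterize the retained (x, y) coordinates into a binary occupancy
    grid: 255 where a cell contains at least one point, 0 elsewhere."""
    x_min = min(p[0] for p in points)
    y_min = min(p[1] for p in points)
    rows = int((max(p[1] for p in points) - y_min) / L) + 1
    cols = int((max(p[0] for p in points) - x_min) / L) + 1
    img = [[0] * cols for _ in range(rows)]
    for p in points:
        r = int((p[1] - y_min) / L)   # assumed form of Eq. (7): rows from y
        c = int((p[0] - x_min) / L)   # assumed form of Eq. (8): columns from x
        img[r][c] = 255
    return img

img = ortho_image([(0.0, 0.0), (5.0, 2.0), (3.0, 1.0)], L=1.0)
assert len(img) == 3 and len(img[0]) == 6
assert img[0][0] == 255 and img[2][5] == 255 and img[1][3] == 255
```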

Ortho-Projected Images Registration
The ortho-projected images are binary images, and line features are the most frequently occurring features in them. This paper uses a line-based algorithm to register the ortho-projected images. The lines in the ortho-projected images are not single-pixel lines; to better detect lines, the ortho-projected images are first thinned by the Zhang-Suen thinning algorithm [46]. Then, the progressive probabilistic Hough transform algorithm [47] is used to detect lines. Finally, the ortho-projected images are registered using the detected 2D lines. The rest of this section details how the source and target ortho-projected images are registered using the detected lines.
The points $(x, y)$ and $(X, Y)$ in the source and target ortho-projected images in Cartesian coordinates can be written in homogeneous coordinates as $p = [x\ y\ 1]^T$ and $P = [X\ Y\ 1]^T$, respectively, where $T$ denotes transpose. The relation between the source ortho-projected image point $p$ and the target ortho-projected image point $P$ can be written as:

$$
P = H p
\quad (9)
$$

or

$$
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} =
\begin{bmatrix}
\cos\theta & -\sin\theta & t_x \\
\sin\theta & \cos\theta & t_y \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\quad (10)
$$

If $ax + by + 1 = 0$ is a line in the source ortho-projected image in Cartesian coordinates, the same line in homogeneous coordinates can be written as:

$$
[a\ b\ 1]\,[x\ y\ 1]^T = 0
\quad (11)
$$

or

$$
l^T p = 0
\quad (12)
$$

where $l^T = [a\ b\ 1]$ is the parameter of a line in a source ortho-projected image. Similarly, line $AX + BY + 1 = 0$ in a target ortho-projected image in Cartesian coordinates can be written as:

$$
[A\ B\ 1]\,[X\ Y\ 1]^T = 0
\quad (13)
$$

or

$$
L^T P = 0
\quad (14)
$$

where $L^T = [A\ B\ 1]$ is the parameter of a line in a target ortho-projected image.

Left multiplying both sides of Equation (10) by $L^T$, we obtain:

$$
L^T P = L^T H p
\quad (15)
$$

From Equation (14), we know that the left side of Equation (15) is 0. Thus, the right side of Equation (15) is 0, which means:

$$
L^T H p = 0
\quad (16)
$$

Combining Equations (12) and (16), it can be concluded that:

$$
l = \lambda H^T L
\quad (17)
$$

where $\lambda$ is a scale parameter. Equation (17) establishes the relation between the line parameters in the source and target ortho-projected images and the transformation between these two ortho-projected images. Equation (17) can be written in the detailed form as:

$$
a = \lambda (A \cos\theta + B \sin\theta)
\quad (18)
$$
$$
b = \lambda (B \cos\theta - A \sin\theta)
\quad (19)
$$
$$
1 = \lambda (A t_x + B t_y + 1)
\quad (20)
$$

Substituting Equation (20) into Equations (18) and (19), we obtain two equations (Equations (21) and (22)) relating the unknown registration parameters $\theta$, $t_x$, $t_y$ to the parameters of corresponding lines in the source and target ortho-projected images:

$$
a (A t_x + B t_y + 1) = A \cos\theta + B \sin\theta
\quad (21)
$$
$$
b (A t_x + B t_y + 1) = B \cos\theta - A \sin\theta
\quad (22)
$$
In Equations (21) and (22), a and b are the parameters of a line in a source ortho-projected image, while A and B are the parameters of the corresponding line in a target ortho-projected image. a, b, A, and B are obtained from the result of the progressive probabilistic Hough transform algorithm. Thus, there are only three unknown parameters, $\theta$, $t_x$, and $t_y$, to solve in Equations (21) and (22). If we obtain two non-parallel corresponding lines in the source and target ortho-projected images, we have four equations for the three unknown registration parameters, and thus the unknown registration parameters can be estimated by a least squares algorithm after linearization, or by a non-linear optimization algorithm such as the Levenberg-Marquardt (LM) algorithm. One thing to note is that there are two solutions to Equations (21) and (22) if only one pair of lines is used. As illustrated in Figure 4, a pair of lines in a source ortho-projected image ($l_1^s$ and $l_2^s$ in Figure 4a) can be matched to its corresponding lines in the target ortho-projected image ($l_1^t$ and $l_2^t$ in Figure 4b) through solution 1 (Figure 4c) and solution 2 (Figure 4d). The translations $t_x$ and $t_y$ of the two solutions are the same. The rotation angles of solution 1 (denoted as $\theta_1$) and solution 2 (denoted as $\theta_2$) are related by Equation (23):

$$
\theta_2 = \theta_1 + 180°
\quad (23)
$$

If we obtain one solution, we can obtain the other using Equation (23).
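For a single pair of non-parallel line correspondences, the constraints of Equations (21) and (22) can in fact be solved in closed form: the rotation follows from how the source line normals (a, b) turn into the target normals (A, B), and the translation from a 2 × 2 linear solve. The sketch below is an illustrative alternative to the least squares/LM estimation described above (assuming the constraint form as reconstructed here, with `transform_line` a hypothetical helper for the demo); when the homogeneous scale is negative, the complementary solution θ + 180° must also be evaluated, mirroring the two-solution check of Equation (23).

```python
import math

def estimate_pose(src_lines, tgt_lines):
    """(theta, tx, ty) from two non-parallel line correspondences.
    Each line (a, b) represents a*x + b*y + 1 = 0.  The constraints
      a*(A*tx + B*ty + 1) = A*cos(t) + B*sin(t)
      b*(A*tx + B*ty + 1) = B*cos(t) - A*sin(t)
    are a reconstruction of Eqs. (21)-(22)."""
    (a1, b1), (a2, b2) = src_lines
    (A1, B1), (A2, B2) = tgt_lines
    # The rotation turns the source normal (a, b) into the target normal
    # (A, B); if the homogeneous scale flipped sign, theta + pi is the
    # correct candidate and should be tested as well.
    theta = math.atan2(a1 * B1 - b1 * A1, a1 * A1 + b1 * B1)
    c, s = math.cos(theta), math.sin(theta)
    # w_i = A_i*tx + B_i*ty + 1, recovered from both constraints of pair i:
    w1 = (a1 * (A1 * c + B1 * s) + b1 * (B1 * c - A1 * s)) / (a1 * a1 + b1 * b1)
    w2 = (a2 * (A2 * c + B2 * s) + b2 * (B2 * c - A2 * s)) / (a2 * a2 + b2 * b2)
    det = A1 * B2 - B1 * A2          # non-zero for non-parallel lines
    tx = ((w1 - 1.0) * B2 - B1 * (w2 - 1.0)) / det
    ty = (A1 * (w2 - 1.0) - (w1 - 1.0) * A2) / det
    return theta, tx, ty

def transform_line(line, theta, tx, ty):
    """Map a source line to the target image (L proportional to H^{-T} l)."""
    a, b = line
    c, s = math.cos(theta), math.sin(theta)
    w = -(c * tx + s * ty) * a + (s * tx - c * ty) * b + 1.0
    return ((c * a - s * b) / w, (s * a + c * b) / w)

# Round trip with a known pose and two perpendicular source lines:
theta, tx, ty = math.radians(30), 0.5, -0.2
src = [(1.0, 0.0), (0.0, 1.0)]
tgt = [transform_line(l, theta, tx, ty) for l in src]
th, ex, ey = estimate_pose(src, tgt)
assert abs(th - theta) < 1e-9 and abs(ex - tx) < 1e-9 and abs(ey - ty) < 1e-9
```

Note that naively stacking the four equations as a linear system in (cosθ, sinθ, t_x, t_y) is rank-deficient, which is why the rotation and translation are recovered in two stages here.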
The two solutions are then further evaluated using a matching quality function. Like the RANSAC algorithm, this paper iteratively selects a pair of lines from the source and target ortho-projected images, and the unknown registration parameters are estimated using a least squares algorithm according to Equations (21) and (22). Finally, the estimated registration parameters are evaluated by a matching quality function $Q(S, T)$, where $S$ denotes the source ortho-projected image and $T$ denotes the target ortho-projected image. $Q(S, T)$ is based on the distance between the transformed pixels of the source ortho-projected image and their nearest neighbors in the target ortho-projected image. In the evaluation step, the pixels in the source ortho-projected image are transformed to the target ortho-projected image using the estimated registration parameters. For each pixel after transformation, its nearest pixel in the target ortho-projected image is found and its matching quality is defined as:

$$
q(s_i) =
\begin{cases}
1, & D(s_i, t_j) < d \\
0, & \text{otherwise}
\end{cases}
\quad (24)
$$
$$
Q(S, T) = \sum_i q(s_i)
\quad (25)
$$

In Equations (24) and (25), $s_i$ is the transformed source pixel and $t_j$ is its nearest neighbor in the target ortho-projected image. $D(s_i, t_j)$ is the Euclidean distance between $s_i$ and $t_j$. In this paper, the threshold $d$ is set to 5 pixels.
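The evaluation of a candidate transformation can be sketched as a brute-force version of Q(S, T). The exact functional form of Equations (24) and (25) is not reproduced in this excerpt, so a thresholded count of matched pixels is assumed here:

```python
import math

def matching_quality(src_pixels, tgt_pixels, theta, tx, ty, d=5.0):
    """Count transformed source pixels whose nearest target pixel lies
    within d pixels (brute-force nearest neighbour, for clarity only)."""
    c, s = math.cos(theta), math.sin(theta)
    q = 0
    for x, y in src_pixels:
        u, v = c * x - s * y + tx, s * x + c * y + ty
        if min(math.hypot(u - X, v - Y) for X, Y in tgt_pixels) < d:
            q += 1
    return q

src = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
tgt = [(5.0, 5.0), (15.0, 5.0), (5.0, 15.0)]
assert matching_quality(src, tgt, 0.0, 5.0, 5.0) == 3   # correct shift
assert matching_quality(src, tgt, 0.0, 0.0, 0.0) == 0   # no alignment
```

A real implementation would replace the O(n²) nearest-neighbour search with a k-d tree or distance transform, in line with the efficiency discussion later in the paper.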
If m lines are detected in the source ortho-projected image and n lines are detected in the target ortho-projected image, there are a total of $A_m^2$ pairs of lines in the source ortho-projected image and a total of $A_n^2$ pairs of lines in the target ortho-projected image. Each pair of lines in the source ortho-projected image should be tested against each pair of lines in the target ortho-projected image. If m and n are large, the estimation process is time-consuming. Fortunately, the values of m and n are usually between 4 and 10 in our situation. To reduce the search space, the angle between the two lines of a pair is used to constrain the search. First, if the two lines are nearly parallel, the pair is abandoned. Then, if the angle between the two lines from the source ortho-projected image is obviously different from that of the target ortho-projected image, the estimation and evaluation are skipped.
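The angle-based pruning can be sketched as follows; the 10-degree parallelism threshold and 5-degree compatibility tolerance are illustrative assumptions, not values from the paper.

```python
import math

def pair_angle(l1, l2):
    """Undirected angle between two lines given by their (a, b) normals."""
    d = abs(math.atan2(l1[1], l1[0]) - math.atan2(l2[1], l2[0])) % math.pi
    return min(d, math.pi - d)

def worth_testing(src_pair, tgt_pair,
                  min_angle=math.radians(10), tol=math.radians(5)):
    """Prune: skip near-parallel pairs and pairs whose inter-line angles
    clearly disagree between the source and target images."""
    a_s, a_t = pair_angle(*src_pair), pair_angle(*tgt_pair)
    if a_s < min_angle or a_t < min_angle:
        return False
    return abs(a_s - a_t) <= tol

assert worth_testing(((1.0, 0.0), (0.0, 1.0)), ((1.0, 1.0), (-1.0, 1.0)))
assert not worth_testing(((1.0, 0.0), (1.0, 0.05)), ((1.0, 1.0), (-1.0, 1.0)))
```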

Point Cloud Registration
After obtaining the horizontal alignment parameters $t_x$, $t_y$, and $\theta$, a source point $[x_s\ y_s\ z_s]^T$ is first transformed by:

$$
\begin{bmatrix} x'_s \\ y'_s \\ z'_s \end{bmatrix} =
\begin{bmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \\ 0 \end{bmatrix}
$$

Then, the vertical offset $t_z$ between the horizontally aligned source points and the target points is obtained from the source points' floor height ($h_f^s$), the source points' ceiling height ($h_c^s$), the target points' floor height ($h_f^t$), and the target points' ceiling height ($h_c^t$) by:

$$
t_z = \frac{(h_f^t - h_f^s) + (h_c^t - h_c^s)}{2}
$$

Finally, the registered point $[x_t\ y_t\ z_t]^T$ is obtained by:

$$
\begin{bmatrix} x_t \\ y_t \\ z_t \end{bmatrix} =
\begin{bmatrix} x'_s \\ y'_s \\ z'_s + t_z \end{bmatrix}
$$
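The complete per-point registration step can be sketched as follows; averaging the floor and ceiling height differences for t_z is an assumed combination consistent with the text (the original formula is not legible in this excerpt).

```python
import math

def register_point(p, theta, tx, ty, h_sf, h_sc, h_tf, h_tc):
    """Full 4DOF registration of one source point, given the horizontal
    alignment (theta, tx, ty) and the floor/ceiling heights of the
    source (h_sf, h_sc) and target (h_tf, h_tc) stations."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    # Assumed t_z: average of the floor and ceiling height differences.
    tz = 0.5 * ((h_tf - h_sf) + (h_tc - h_sc))
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

# Source station floor/ceiling at -1.5/1.2 m, target at -1.0/1.7 m:
# both differences are +0.5 m, so every point is lifted by 0.5 m.
pt = register_point((1.0, 0.0, 0.0), 0.0, 0.0, 0.0, -1.5, 1.2, -1.0, 1.7)
assert pt == (1.0, 0.0, 0.5)
```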

Evaluation Metric Descriptions
Like many previous works [43,45,48], the rotation error $e_R$, the translation error $e_T$, the successful registration rate (SRR), and the runtime are used to evaluate the proposed algorithm. The rotation error $e_R$ and the translation error $e_T$ are defined as:

$$
e_R = \arccos\!\left(\frac{\mathrm{tr}(\hat{R}^T R) - 1}{2}\right),
\qquad
e_T = \|\hat{T} - T\|
$$

where $R$ and $T$ represent the ground truth rotation and translation, $\hat{R}$ and $\hat{T}$ represent the rotation and translation estimated by the registration algorithm, $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, and $e_R$ is the angle of rotation in the axis-angle representation. Given the rotation error $e_R$ and the translation error $e_T$, if $e_R$ is smaller than $\sigma_R$ and $e_T$ is smaller than $\sigma_T$, the registration of the pair of point clouds is considered successful. In this paper, $\sigma_R$ is set to 3 degrees and $\sigma_T$ is set to 0.3 m. The ratio between the number of successfully registered pairs and the total number of pairs is defined as the successful registration rate.
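The evaluation metrics can be implemented directly from their definitions (a sketch; matrices are plain nested lists, and the helper names are illustrative):

```python
import math

def rotation_error(R_gt, R_est):
    """e_R = arccos((tr(R_est^T R_gt) - 1) / 2), in radians.
    tr(A^T B) equals the elementwise product sum of A and B."""
    tr = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    return math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0)))

def translation_error(t_gt, t_est):
    """Euclidean norm of the translation difference."""
    return math.dist(t_gt, t_est)

def success(e_r, e_t, sigma_r=math.radians(3.0), sigma_t=0.3):
    """A pair counts as successfully registered if both errors are small."""
    return e_r < sigma_r and e_t < sigma_t

# A 10-degree rotation about z against the identity ground truth:
th = math.radians(10.0)
R_est = [[math.cos(th), -math.sin(th), 0.0],
         [math.sin(th),  math.cos(th), 0.0],
         [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
assert abs(rotation_error(I3, R_est) - th) < 1e-9
assert not success(rotation_error(I3, R_est), 0.1)
```

The clamp inside `acos` guards against floating-point trace values slightly outside [-1, 3].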

Quantitative Evaluations
The registration results of the 10 pairs of point clouds from dataset SR and the 19 pairs of point clouds from dataset CB were quantitatively evaluated using the metrics described in Section 3.1. In order to compare the proposed method with other state-of-the-art methods, the two datasets were also registered by the fast match pruning branch-and-bound (FMP-BnB) method [43], the 4DOF RANSAC method [49], the 4DOF version of the
Figure 6. The registration results of 19 pairs of point clouds in dataset CB. The first column is the point clouds before registration, the second column is the point clouds after registration using the proposed method, and the third column is the horizontal section view of the point clouds after registration. The brown points are target points and the blue points are source points.
All the algorithms were implemented in C++ and run on a desktop with an Intel® Core™ i7-9700K CPU and 64 gigabytes of memory. The time to register each pair of point clouds was recorded. Figures 11 and 12 show the time needed to register the pairs of point clouds in datasets SR and CB using the proposed 2DLF method and the comparison methods. It can be seen from Figure 11 that the proposed 2DLF method was the most time-efficient method for four pairs of point clouds, while the FMP-BnB method, the RANSAC method, the LM method, the GTA method, and the K4PCS method were the most efficient for 3, 0, 6, 3, and 0 pairs of point clouds in dataset SR, respectively. It can be seen from Figure 12 that the FMP-BnB method, the RANSAC method, the LM method, the GTA method, the K4PCS method, and the 2DLF method were the most time-efficient for 7, 0, 11, 11, 5, and 11 pairs of point clouds in dataset CB, respectively. Due to the lower point number per station, registering the pairs of point clouds in CB took less time than in SR. It can be concluded that the proposed 2DLF method took slightly more time than the FMP-BnB method, the LM method, and the GTA method, but was obviously more time-efficient than the RANSAC method and the K4PCS method.



Accuracy and Time Efficiency
The proposed method achieved accuracy comparable to that of the FMP-BnB method and the K4PCS method, and better accuracy than the RANSAC method, the LM method, and the GTA method in dataset SR. The successful registration rate in dataset SR is lower only than that of the K4PCS method and better than those of the RANSAC method and LM method. In dataset CB, the proposed method successfully registered the maximum number of pairs of point clouds, and its accuracy is better than that of the comparison methods.

In both dataset SR and dataset CB, the time efficiency of the proposed method is comparable with that of the FMP-BnB method, the LM method, and the GTA method, and it is more time-efficient than the RANSAC method and the K4PCS method. The time cost of the proposed method is mainly determined by two factors: (1) the number of line pairs to match and (2) the time efficiency of the evaluation of line match results. To reduce the number of line pairs, the angle between the two lines is checked before matching. A line pair is directly abandoned if the angle between its two lines is smaller than the threshold. A match is skipped if the angles between the two lines in the source ortho-projected image and the target ortho-projected image are significantly different. Finding the nearest point of the transformed source ortho-projected image in the target ortho-projected image is time-consuming. To make the line match evaluation algorithm more time-efficient, grid filtering can be applied to the source and target ortho-projected images to reduce the point number.

Limitations
When the point clouds to be registered are scanned in a closed room, the generated ortho-projected image may be a square or a rectangle. In such a situation, the matching between the source and target ortho-projected images may produce a wrong result. As illustrated in Figure 13a,b, the proposed method cannot distinguish these two situations and may therefore produce a wrong match. Fortunately, indoor scenes often contain doors and static objects (as illustrated in Figure 13c), and these can help to avoid such ambiguity. The successful registration of dataset SR proved that the proposed method can deal with such situations.

For point pairs in which one station is scanned inside a room and the other outside it (for example, S11 and S12 in dataset CB), the registration may fail because it is hard to find enough corresponding lines in the source and target ortho-projected images. In addition, some pairs of point clouds scanned in large rooms may have only one corresponding line due to the limited scanning distance of the equipment, as illustrated in Figure 14a,b. The failed pairs S4-S5, S8-S9, S10-S11, S11-S14, S14-S15, and S17-S18 belong to this case.

Figure 14. Case where the source and target ortho-projected images have only one corresponding line due to the limited scanning distance. (a) Target ortho-projected image generated from S17 in CB; (b) source ortho-projected image generated from S18 in CB. The lines inside the red circle are corresponding walls.

Conclusions
This paper proposes a 4DOF registration method to coarsely register pairwise indoor point clouds that takes full advantage of the knowledge that the equipment is levelled or the tilt is compensated by a built-in inclination sensor. The 4DOF registration is decomposed into horizontal alignment and vertical alignment. The horizontal alignment is achieved by matching ortho-projected images generated from the point clouds. This paper proposed a new ortho-projected image generation method using points between the floor and ceiling, and a new ortho image matching method that uses 2D line features detected from the source and target ortho-projected images. In addition, this paper proposed to achieve vertical alignment using the height differences of the floor and ceiling between the source points and target points. Two datasets, one with five stations and the other with 20 stations, were used to evaluate the performance of the proposed method. The experimental results show that the proposed method achieved an 80% successful registration rate in a simple scene, which is better than the FMP-BnB method (70%), the RANSAC method (70%), the LM method (50%), and the GTA method (70%), and lower only than the K4PCS method. In a challenging scene, the proposed method achieved a better successful registration rate (63%) than the FMP-BnB method (10%), the RANSAC method (21%), the LM method (5%), the GTA method (0%), and the K4PCS method (16%). The proposed method is also more time-efficient than the RANSAC method and the K4PCS method. Even though it still has some limitations, the proposed method provides an alternative for solving the indoor point cloud registration problem.
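The vertical alignment step summarized above can be sketched as follows. This is a minimal illustration under our own assumptions (the paper does not give implementation details): floor and ceiling heights are taken as the two dominant peaks of the z-histogram of a levelled scan, and the names, bin size, and peak-picking strategy are hypothetical.

```python
import numpy as np

def estimate_floor_ceiling(points, bin_size=0.05):
    """Estimate floor and ceiling heights of a levelled scan as the
    strongest z-histogram peaks in the lower and upper halves."""
    z = points[:, 2]
    edges = np.arange(z.min(), z.max() + bin_size, bin_size)
    hist, edges = np.histogram(z, bins=edges)
    mid = len(hist) // 2
    floor = edges[np.argmax(hist[:mid])] + bin_size / 2
    ceiling = edges[mid + np.argmax(hist[mid:])] + bin_size / 2
    return floor, ceiling

def vertical_offset(src_points, tgt_points):
    """Translation along z that aligns the source floor/ceiling with the
    target's; averaging the two differences hedges against noise."""
    sf, sc = estimate_floor_ceiling(src_points)
    tf, tc = estimate_floor_ceiling(tgt_points)
    return ((tf - sf) + (tc - sc)) / 2.0
```

Because the scans are levelled, only this single z-translation (together with the 2D transform from the ortho-projected image matching) is needed to complete the 4DOF registration.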
In this paper, ortho-projected images were generated using only vertical plane points. Due to occlusion and the limited scanning range, the method may fail to find enough lines in the source and target ortho-projected images and thus fail to register the point clouds. In the future, all source and target points may be exploited to generate the ortho-projected images to improve the robustness of the method.

Data Availability Statement:
The dataset SR used in this paper is a publicly available dataset and can be found here: https://ethz.ch/content/dam/ethz/special-interest/baug/igp/photogrammetryremote-sensing-dam/documents/sourcecode-and-datasets/PascalTheiler/office.zip. The dataset CB used in this paper is available on request from the corresponding author; it is not publicly available because it was collected by a commercial company.