A Ridgeline-Based Terrain Co-registration for Satellite LiDAR Point Clouds in Rough Areas

: It is still a completely new and challenging task to register extensive, enormous and sparse satellite light detection and ranging (LiDAR) point clouds. Aimed at this problem, this study provides a ridgeline-based terrain co-registration method in preparation for satellite LiDAR point clouds in rough areas. This method has several merits: (1) only ridgelines are extracted as neighbor information for feature description and their intersections are extracted as keypoints, which can greatly reduce the number of points for subsequent processing, and extracted keypoints is of high repeatability and distinctiveness; (2) a new local-reference frame (LRF) construction method is designed by combining both three dimensional (3D) coordinate and normal vector covariance matrices, which e ﬀ ectively improves its direction consistency; (3) a minimum cost–maximum ﬂow (MCMF) graph-matching strategy is adopted to maximize similarity sum in a sparse-matching graph. It can avoid the problem of “many-to-many” and “one to many” caused by traditional matching strategies; (4) a transformation matrix-based clustering is adopted with a least square (LS)-based registration, where mismatches are eliminated and correct pairs are fully participated in optimal parameters evaluation to improve its stability. Experiments on simulated satellite LiDAR point clouds show that this method can e ﬀ ectively remove mismatches and estimate optimal parameters with high accuracy, especially in rough areas. ridgeline-based keypoint extraction, stable LRF construction, generation, MCMF-based feature-matching, transformation matrix-based clustering and Rodriguez-LS-based registration, which are respectively introduced below.


Introduction
Different from conventional two dimensional (2D) remote sensed imagery, light detection and ranging (LiDAR) can obtain dense point clouds with three-dimensional information of high accuracy. Over the past decades, with the hardware development and the progress of data processing algorithms, LiDAR has been carried on various platforms (airborne, unmanned aerial vehicles, vehicle, handheld) with wide applications, for example, electric power design [1], heritage modeling [2], water surveys [3], environment survey [4] and urban and land planning [5].
However, up to now, LiDAR has rarely been carried on satellites. There are two main obstacles: (1) the altitude of satellites is too high, and the distance between laser is certain, resulting in its low density; (2) the energy of laser emitter is limited, and a large percentage of energy is lost after long-distance transmission, resulting in its weak signal. In recent years, with the lightweight of laser sensors and the intensive density of point clouds, increasing attention has been poured into the satellite LiDAR data [6]. To improve the vertical accuracy of stereo satellite images without ground control, the ZY(Zi-Yuan)-302 satellite, launched on 30 May 2016, carried with it the first Chinese earth observation satellite laser altimeter, which can improve the vertical accuracy from 11 m to 1.9 m through active measurement [7]. ICESat (ice, cloud, and land elevation satellite)-2-launched on 15 September 2018, loaded a multibeam, non-scan, single-photon laser altimeter ATLAS (advanced topographic laser altimeter system) for sea ice change detection, 3D surface information survey and global biomass estimation [4]. Global ecosystem dynamics investigation (GEDI), launched on 5 December 2018 with a full-waveform LiDAR, was designed to provide the first global, high-resolution observations of forest vertical structure [8]. With the successive launch of satellites equipped with lasers, the processing of satellite LiDAR data exerts a growing interest among researchers [9][10][11]. Registration of satellite LiDAR point clouds with sparse density and large footprint is a big challenge, and to date, there is no relevant research. With the registration algorithms of satellite images are increasingly mature in multi-spectral [12], multi-temporal [13] and multisensor [14], it's provide a good example for satellite LiDAR point cloud registration. Thus, to address its challenge, this study proposes a ridgeline-based terrain co-registration of satellite LiDAR point clouds.

Related Work
As one of the crucial steps in 3D point cloud processing, registration has two main phases: the coarse alignment and the fine alignment [15]. For decades, except ICP (iterative closest point)-based fine registration, a multitude of coarse registration algorithms have emerged to transform the 'source' point clouds to the 'target' point clouds. These methods can be categorized into three types: RANSAC (random sample consensus)-based methods, 4PCS (4 points congruent sets)-based methods and LS (least square)-based methods.

RANSAC and Its Variations
RANSAC [16], as one of widely used strategies to cope with noise in point cloud registration [17], is a random, data-driven process with two iterative steps: generating a hypothesis by random samples, and verifying hypotheses by the remaining data [18]. However, it is of low efficiency and accuracy when there is a host of outliers.
To address these challenges, many variations have been developed. Considering that the distance between inliers is closer than that between outliers, adjacent points sample consensus (NAPSAC) [19] algorithm was proposed to handle datasets with high dimension and low proportion of inliers. To improve the registration efficiency, the progressive sample consensus (PROSAC) [20] took initial matching results as a basis of sorting, and sampled the results from high to low. In ARRSAC (adaptive real-time random sample consensus) [21], more than one model was generated, and the best model was selected by iteratively sorting and choosing according to scores of an objective function. GroupSAC [22] combined both NAPSAC and PROSAC, where points were grouped according to similarities and then sampled from the largest clustering according to the PROSAC. However, a prior knowledge was needed for classification. Persad et al. [15] proposed a threshold-free modified-RANSAC to eliminate false keypoint matches, where a spatial nearest neighbor consistency check was applied to ensure that true correspondences were stored as inliers. Quan et al. [23] introduced an efficient and accurate GC1SAC (global-constrained one-point-based sample consensus) algorithm for robust transformation estimation, where only one correspondence and its LRF were utilized for iterative computation.

PCS and Its Variations
The 4PCS [24] algorithm was developed under the RANSAC framework. Through constructing and finding similar 4-point congruent sets in the target and source point clouds, it could achieve satisfactory registration results even with small overlap and noise. The affine invariance constraint and the LCP (large common point set) strategy were used to find 4-point congruent sets with maximum overlap. However, it was of huge search space, high complexity and low efficiency.
To speed up, a series of improvements based on 4PCS came into being. In K4PCS (keypoint 4PCS, [25]) algorithm, a keypoint detection was added instead of random sampling to achieve high registration accuracy. Super 4PCS [26] algorithm adopted an intelligent index strategy to reduce the computational complexity from O(n2) to O(n). The algorithm could be used for point cloud registration of large areas, but it had a bottleneck while searching coplanar points in all available source points. The generalized 4PCS [27] algorithm improved the construction process by generalizing the 4-point base, which was no longer strictly limited in a plane. It greatly improved the matching efficiency. Super generalized 4PCS [28] algorithm was a combination of super 4PCS algorithm and generalized 4PCS algorithm, using an intelligent index strategy and non-coplanar optimization. Semantic keypoint 4PCS [29] was proposed aiming at urban area registration, where semantic keypoints of building objects were extracted layer by layer, instead of original random sampling points for point cloud registration. The 2PNS (2 point + normal sets) [30] algorithm used normal vectors between two points to build matching pairs. It greatly speeded up and can cope with the scene registration with a small overlap. To reduce the number of 4-point sets, Xu et al. [31] proposed a multiscale sparse features 4PCS (MSSF4PCS) algorithm, adding a normal vector constraint for point-matching. At the same time, it optimized matching results and improves registration accuracy by calculating point characteristics in the neighborhood of 4-point pairs. In V4PCS (volumetric 4PCS) [32,33] algorithm, volume consistency was added to construct the coplanar four-point base, extending coplanar four-point-matching to non-coplanar four-point-matching. It could reduce computational complexity and improves operational efficiency.

LS and Its Variations
Least square (LS) is a mathematical optimization strategy, finding the best function of data by minimizing the square sum of errors. However, this method is sensitive to noise and when the fitting function is nonlinear, iteration is required.
Originally, the LS algorithm was widely used for 2D image alignment. Due to its high flexibility and powerful mathematical model [34], a large amount of LS-based methods was put forward to enhance its robustness. Gruen and Akca [34] used the LS-based method for surface-matching based on a generalized Gauss-Markov (GM) model. However, only random errors of target point clouds were modeled in its observation equations. A constrained total least squares (CTLS) was introduced by Chen et al. [35] to solve large rotation registration with an errors-in-variables (EIV) model. Compared with traditional constrained least squares (CLS) methods, it could well solve the impact of random errors. Aydar et al. [36] considered stochastic properties of both observations and parameters and proposed a total least square (TLS) registration of 3D surfaces, where an EIV model was utilized and its parameters were estimated by a TLS method. Ge and Wunderlich [37] proposed a surface-based-matching of 3D point clouds with a nonlinear Gauss-Helmert model to solve the weighted total least-squares problem. The Gauss-Helmert least-squares 3D (GH-LS3D) method considered random errors of both target and source. Ahmed et al. [38] proposed a least-squares registration of point sets over SE(d) (subgroup of Euclidean group E(d)) using closed-form projections, where the original least-squares problem was derived from the SDP (semidefinite program) solution via a linear map, solved by a variable splitting and ADMM (alternating directions method of multipliers). To avoid the influence of gross errors and extend its application, Yu et al. [39] proposed an advanced outlier detected total least-squares (ODTLS) method. However, the ODTLS was a dual iterative solution, induced to huge computation.

Contribution
As one of the important topographic feature lines, ridgelines play a vital role in terrain skeleton structure representation, forming topography boundaries together with valley lines [40]. They have two special advantages: stability, and distinctive when point clouds are sparse. Thus, this paper proposes a ridgeline-based terrain co-registration in preparation for satellite LiDAR point clouds in rough areas. The main contribution of the paper manifests in four aspects: (1) for extensive and enormous raw data, only point clouds of ridgelines are extracted as neighbor information for feature description, and then their intersections are regarded as keypoints. This step can efficiently reduce the number of points for subsequent processing and extracted keypoints is of high repeatability and distinctiveness.
(2) for the direction instability of the LRF's X and Y axes decomposed by traditional coordinate covariance, this study designs a new LRF construction method by combining both 3D coordinate covariance and normal vector covariance, which can effectively improve its direction consistency.
(3) the descriptor-matching is converted into a graph-matching, where a minimum cost-maximum flow method is adopted. It can efficiently achieve a global optimum in a sparse-matching graph and avoid the problem of "many-to-many" and "one to many" caused by traditional nearest neighbor-based or ratio-based matching strategies.
(4) different from traditional RANSAC or 4PCS-based methods, the registration makes full use of correct correspondences, where a transformation matrix-based clustering method is applied to reject gross error and select the maximum similar sample set and then optimal parameters are solved by the Rodriguez-LS-based algorithm. This step can effectively eliminate mismatches and obtain optimal parameters with high accuracy.

Overview
The paper is organized as follows. Section 2 introduces three kinds of DEM to analog satellite LiDAR point clouds in experiments. Section 3 presents the details of the ridgeline-based terrain co-registration method for satellite LiDAR point clouds. In Section 4, criteria for accuracy analysis are introduced. Experimental results and evaluations of the proposed method are shown in Section 5. Section 6 discusses the effect of different overlap, topography and noise on the proposed co-registration algorithm. Finally, a conclusion and some future researches related to this study are presented in Section 7.

Datasets
It is noteworthy that since there is no satellite LiDAR point cloud up to now, we use existing worldwide DEM (digital elevation model) data and airborne LiDAR point clouds to simulate satellite LiDAR point clouds. In the following experiments, three kinds of worldwide DEM data are utilized: ALOS DEM, ASTER GDEM and SRTM DEM, where the point distance of generated point clouds is equal to the resolution the original DEM.
ALOS DEM: it is an acquisition of the advanced land observing satellite (ALOS), launched in 2006, with a phased array L-band synthetic aperture radar (PALSAR). It has three observation modes: high resolution, scanning synthetic aperture radar and polarization. The horizontal and vertical accuracy of the DEM can reach 12 meters. The data were downloaded from the website: http://www.eorc.jaxa.jp/ALOS/en/aw3d30/data/index.htm.
ASTER GDEM: it is a global digital elevation data jointly released by the American National Aeronautics and Space Administration (NASA) and the Japanese Ministry of Economy, Trade and Industry (METI). The DEM is based on the observation results of Terra, a new generation of earth observation satellite of NASA. It is made of 1.3 million stereo image pairs collected by ASTER (advanced spaceborne thermal emission and reflection radiometer) sensors, covering all land areas between 83 • N and 83 • S. Its horizontal accuracy is 30 m with 95% confidence, and the elevation accuracy is 20 m with 95% confidence. The data were downloaded from the website: http://reverb.echo.nasa.gov/reverb/. SRTM DEM: it is provided by NASA's shuttle radar topographic mission (SRTM), covering areas between 60 • N and 60 • S. According to its accuracy, SRTM DEM can be divided into SRTM1 and SRTM3, with the corresponding resolution of 30 m and 90 m, respectively. The SRTM3 DEM is available for over 80% of the globe. SRTM1 DEM was also produced but is not available for all countries. The resolution of the DEM data used in the following experiments is 90 m. The vertical accuracy of the DEM is reported to be less than 16 m. The data were downloaded from the website: http://srtm.csi.cgiar.org/download. and ancillary data products produced through this project are public domain data. The data were downloaded from the website: https://portal.opentopography.org.
As shown in Figure 1, in the experiments, we mainly used the following six datasets, where Figure 1a-c is with different resolutions in the same region and Figure 1d-f are of small overlap with the same resolution. For the first three data (Figure 1a-c)-although their resolutions and accuracy are different-trends of their topography are basically coincident from a macro perspective. From the microscopic view (Figure 1f,g), with the decreasing resolution, their details become poor and blurred, which only reflects the overall distribution and many small cliffs were ignored. Figure 1d,e is simulated LiDAR point clouds in flat and rough terrain, where Line 1 and Line 2 (Figure 1d), Line 3 and Line 4 ( Figure 1e) are only partial overlapped. Their detailed information are reported in Table 1. ancillary data products produced through this project are public domain data. The data were downloaded from the website: https://portal.opentopography.org. As shown in Figure 1, in the experiments, we mainly used the following six datasets, where Figure 1a-c is with different resolutions in the same region and Figure 1d-f are of small overlap with the same resolution. For the first three data (Figure 1a-c)-although their resolutions and accuracy are different-trends of their topography are basically coincident from a macro perspective. From the microscopic view (Figure 1f,g), with the decreasing resolution, their details become poor and blurred, which only reflects the overall distribution and many small cliffs were ignored. Figure 1d,e is simulated LiDAR point clouds in flat and rough terrain, where Line 1 and Line2 (Figure 1d), Line 3 and Line 4 ( Figure 1e) are only partial overlapped. Their detailed information are reported in Table  1.

Methodology
As shown in Figure 2, similar to a majority of descriptor-based registration methods, the proposed ridgeline-based terrain co-registration for satellite LiDAR point clouds is composed of several steps: ridgeline-based keypoint extraction, stable LRF construction, voxel descriptor generation, MCMF-based feature-matching, transformation matrix-based clustering and Rodriguez-LS-based registration, which are respectively introduced below. *It is worth noting that although the resolutions of the above data are different, they were first downsampled to 120 m in the experiment to make them sparser.

Methodology
As shown in Figure 2, similar to a majority of descriptor-based registration methods, the proposed ridgeline-based terrain co-registration for satellite LiDAR point clouds is composed of several steps: ridgeline-based keypoint extraction, stable LRF construction, voxel descriptor generation, MCMF-based feature-matching, transformation matrix-based clustering and Rodriguez-LS-based registration, which are respectively introduced below.

Ridgeline-Based Keypoint Extraction
Satellite LiDAR point clouds are extensive, enormous and sparse and its feature varies slightly on a local surface. Ridgelines, as one of the important macroscopic topographic feature lines [41], are stable and distinctive when point clouds are sparse. Thus, this study designs a ridgeline-based keypoint extraction method.
As shown in Figure 3, to reduce the number of points for subsequent processing, the original point clouds are downsampled and divided into grids first (the sampling distance and grid size are both set to 120 m in this study). Second, a mean filter with a window (size is set to 5*5) is conducted on the sampled data. By subtracting the averaged data from the sampled data, the outlined data are obtained. Then, to get ridgelines, outline point clouds are refined by an improved thinning method. By summarizing previous classical thinning algorithm [42], thinning rules for each iteration are defined as follows: (1) the deleted point must be the foreground point; (2) the deleted point must be the edge point; (3) the endpoint cannot be deleted; (4) the isolated point cannot be deleted; (5) the connectivity should be unchanged; (6) the edge should be deleted first. After refinement, a spatial clustering [43] is utilized to remove trivial points. Finally, intersections with ridgelines in at least three directions are extracted as keypoints.

Ridgeline-Based Keypoint Extraction
Satellite LiDAR point clouds are extensive, enormous and sparse and its feature varies slightly on a local surface. Ridgelines, as one of the important macroscopic topographic feature lines [41], are stable and distinctive when point clouds are sparse. Thus, this study designs a ridgeline-based keypoint extraction method.
As shown in Figure 3, to reduce the number of points for subsequent processing, the original point clouds are downsampled and divided into grids first (the sampling distance and grid size are both set to 120 m in this study). Second, a mean filter with a window (size is set to 5 * 5) is conducted on the sampled data. By subtracting the averaged data from the sampled data, the outlined data are obtained. Then, to get ridgelines, outline point clouds are refined by an improved thinning method. By summarizing previous classical thinning algorithm [42], thinning rules for each iteration are defined as follows: (1) the deleted point must be the foreground point; (2) the deleted point must be the edge point; (3) the endpoint cannot be deleted; (4) the isolated point cannot be deleted; (5) the connectivity should be unchanged; (6) the edge should be deleted first. After refinement, a spatial clustering [43] is utilized to remove trivial points. Finally, intersections with ridgelines in at least three directions are extracted as keypoints.

Stable LRF construction
A unique and stable LRF plays a significant role in both a descriptor's robustness and descriptiveness [44]. A descriptor's invariance to rigid transformation can be achieved by its LRF. The core of LRF construction is to determine two stable axes, which is traditionally determined by a coordinate or normal vector covariance matrix. Usually, the maximum eigenvector is chosen as the X-axis, while the minimum is the Z-axis [23,44,45]. However, the direction of these LRF is ambiguous and through experiments (presented in Section 5.2), it is found that the minimum eigenvector decomposed by 3D coordinate covariance matrix is relatively more stable and unique than other two eigenvectors (Figure 4a), while the maximum eigenvector decomposed by normal vector covariance matrix is the same (Figure 4b). Therefore, inspired by [46], the 3D coordinate covariance and normal vector covariance are combined to construct a stable LRF. Once per keypoint pi is detected, neighbor points qj in the local surface Si = {qj, || qj -pi || < R} are obtained by a radius R, where ||qj -pi|| is the distance between keypoint pi and neighbor points qj. Moreover, their 3D coordinate cj (x, y, z) and normal vector nj (a, b, c) are, respectively used to construct the 3D coordinate covariance (Equation (1)) and the normal vector covariance (Equation (2)). Then, a PCA (Principal Component Analysis) decomposition is, respectively used for two covariances to acquire their eigenvectors. Let the minimum eigenvector of the 3D coordinate covariance be Z3d and the maximum eigenvector of the normal vector covariance be Xnormal. The LRF is defined as Equations (3)-(5), where ci, ni are the coordinate and normal vector of keypoint pi.To remove the ambiguity of LRF, the Xnormal and Z3d axes are oriented towards the majority direction of vectors [47].

Stable LRF Construction
A unique and stable LRF plays a significant role in both a descriptor's robustness and descriptiveness [44]. A descriptor's invariance to rigid transformation can be achieved by its LRF. The core of LRF construction is to determine two stable axes, which is traditionally determined by a coordinate or normal vector covariance matrix. Usually, the maximum eigenvector is chosen as the X-axis, while the minimum is the Z-axis [23,44,45]. However, the direction of these LRF is ambiguous and through experiments (presented in Section 5.2), it is found that the minimum eigenvector decomposed by 3D coordinate covariance matrix is relatively more stable and unique than other two eigenvectors (Figure 4a), while the maximum eigenvector decomposed by normal vector covariance matrix is the same (Figure 4b). Therefore, inspired by [46], the 3D coordinate covariance and normal vector covariance are combined to construct a stable LRF.

Stable LRF construction
A unique and stable LRF plays a significant role in both a descriptor's robustness and descriptiveness [44]. A descriptor's invariance to rigid transformation can be achieved by its LRF. The core of LRF construction is to determine two stable axes, which is traditionally determined by a coordinate or normal vector covariance matrix. Usually, the maximum eigenvector is chosen as the X-axis, while the minimum is the Z-axis [23,44,45]. However, the direction of these LRF is ambiguous and through experiments (presented in Section 5.2), it is found that the minimum eigenvector decomposed by 3D coordinate covariance matrix is relatively more stable and unique than other two eigenvectors (Figure 4a), while the maximum eigenvector decomposed by normal vector covariance matrix is the same (Figure 4b). Therefore, inspired by [46], the 3D coordinate covariance and normal vector covariance are combined to construct a stable LRF. Once per keypoint pi is detected, neighbor points qj in the local surface Si = {qj, || qj -pi || < R} are obtained by a radius R, where ||qj -pi|| is the distance between keypoint pi and neighbor points qj. Moreover, their 3D coordinate cj (x, y, z) and normal vector nj (a, b, c) are, respectively used to construct the 3D coordinate covariance (Equation (1)) and the normal vector covariance (Equation (2)). Then, a PCA (Principal Component Analysis) decomposition is, respectively used for two covariances to acquire their eigenvectors. Let the minimum eigenvector of the 3D coordinate covariance be Z3d and the maximum eigenvector of the normal vector covariance be Xnormal. The LRF is defined as Equations (3)-(5), where ci, ni are the coordinate and normal vector of keypoint pi.To remove the ambiguity of LRF, the Xnormal and Z3d axes are oriented towards the majority direction of vectors [47]. Once per keypoint p i is detected, neighbor points q j in the local surface S i = {q j , || q j − p i || < R} are obtained by a radius R, where ||q j − p i || is the distance between keypoint p i and neighbor points q j . Moreover, their 3D coordinate c j (x, y, z) and normal vector n j (a, b, c) are, respectively used to construct the 3D coordinate covariance (Equation (1)) and the normal vector covariance (Equation (2)). Then, a PCA (Principal Component Analysis) decomposition is, respectively used for two covariances to acquire their eigenvectors. Let the minimum eigenvector of the 3D coordinate covariance be Z 3d and the maximum eigenvector of the normal vector covariance be X normal . The LRF is defined as Remote Sens. 2020, 12, 2163 8 of 24 Equations (3)-(5), where c i , n i are the coordinate and normal vector of keypoint p i . To remove the ambiguity of LRF, the X normal and Z 3d axes are oriented towards the majority direction of vectors [47].

Descriptor Generation
Voxelization is an outstanding surface representation strategy with high efficiency in both computation and storage [48]. It has been extensively utilized in robotics and computer vision field [23]. Considering point clouds of ridgelines with wide footprints and sparse density, 3D voxelization is of high redundancy. Thus, a 2D rasterization is applied for feature description.
Once the LRF of each keypoint p i is established, the local surface S i is first transformed into a new coordinate system according to its LRF to keep rotation and translation invariance. Considering that the Z-axis of LRFs is more stable compared with its X and Y axes, the transformed local surface S i ' is projected onto the XY plane of the LRF (Figure 5a). Then, the projected plane S p is divided into 2D rasters with the number of g * g (Figure 5b). Further, a raster is coded according to whether it contains points or not. As shown in Figure 5c, if the raster contains points, it is coded as '1', otherwise, it is '0'. Thus, far, after all rasters in the neighborhood are coded, the final descriptor of the keypoint p i is generated by integrating all labels into a bit string with a length of g * g (for example, {000000010000......0000000100000}).

Descriptor generation
Voxelization is an outstanding surface representation strategy with high efficiency in both computation and storage [48]. It has been extensively utilized in robotics and computer vision field [23]. Considering point clouds of ridgelines with wide footprints and sparse density, 3D voxelization is of high redundancy. Thus, a 2D rasterization is applied for feature description.
Once the LRF of each keypoint pi is established, the local surface Si is first transformed into a new coordinate system according to its LRF to keep rotation and translation invariance. Considering that the Z-axis of LRFs is more stable compared with its X and Y axes, the transformed local surface Si' is projected onto the XY plane of the LRF (Figure 5a). Then, the projected plane Sp is divided into 2D rasters with the number of g * g (Figure 5b). Further, a raster is coded according to whether it contains points or not. As shown in Figure 5c, if the raster contains points, it is coded as '1', otherwise, it is '0'. Thus, far, after all rasters in the neighborhood are coded, the final descriptor of the keypoint pi is generated by integrating all labels into a bit string with a length of g * g (for example, {000000010000......0000000100000}).

Max Flow Min Cost-Based Graph-Matching
Similar to our previous work [45], after the descriptor generation, we define the length of the descriptor minus Hamming distance as the similarity score, where the Hamming distance is the number of '1' in x XOR y (x and y are binary descriptors). Following, a feature-matching strategy is required. Traditional matching strategies, such as nearest neighbor (NN) or nearest neighbor distance ratio (NNDR)-based strategies, tend to fall into a local optimum [49]. The global strategy, KM (Kuhn_Munkres) algorithm adopted in our previous work [45], has a shortcoming that its matching matrix must be square. However, factually, the amount of keypoints in the target and source is not equal, and matches with small similarities usually do not need to be considered. Thus, an efficient matching strategy called the minimum cost-maximum flow (MCMF) [50] is introduced for the nonsquare sparse matrix.
Graph construction: as shown in Figure 6, the capacity cost graph G=(V, E, c, w) in MCMF algorithm is constructed as follows: the graph G has N nodes which include one virtual source node

Max Flow Min Cost-Based Graph-Matching
Similar to our previous work [45], after the descriptor generation, we define the length of the descriptor minus Hamming distance as the similarity score, where the Hamming distance is the number of '1' in x XOR y (x and y are binary descriptors). Following, a feature-matching strategy is required. Traditional matching strategies, such as nearest neighbor (NN) or nearest neighbor distance ratio (NNDR)-based strategies, tend to fall into a local optimum [49]. The global strategy, KM (Kuhn_Munkres) algorithm adopted in our previous work [45], has a shortcoming that its matching matrix must be square. However, factually, the amount of keypoints in the target and source is not equal, and matches with small similarities usually do not need to be considered. Thus, an efficient matching strategy called the minimum cost-maximum flow (MCMF) [50] is introduced for the non-square sparse matrix.
Graph construction: as shown in Figure 6, the capacity cost graph G = (V, E, c, w) in MCMF algorithm is constructed as follows: the graph G has N nodes which include one virtual source node vs, one virtual sink node v t , one layer v i (1 ≤ i < m) corresponding to keypoints in the source and one layer v j (m ≤ j < N) corresponding to keypoints in the target. For arcs e = (vs, v i ) or e = (v j , v t ) connected to the source node vs or sink node v t , their capacity c si or c jt are set as 1 and their cost w si and w jt of unit flow is 0. For other arcs e = (v i , v j ), their capacity c ij is set as 1, while their cost of unit flow w ij is set as 1-s ij (s ij is the similarity between the keypoint v i and keypoint v j ). Let f be a feasible flow on capacity cost network G, the cost of the flow f is defined as Graph-matching: after the capacity cost graph G is constructed, a minimum cost-maximum flow (MCMF) strategy is adopted to find a feasible maximum flow f = { f1, f2, f3,..., fk} with k matching pairs (k is a given non-negative value) and total minimum cost w(f). In each iteration of the augmented path searching, a weighted residual digraph (Figure 6a) is constructed according to the current cost flow graph G. Each arc e = (vi, vj) in the capacity cost graph is decomposed into two mutually inverse arcs vij (colored in black in Figure 6a) and vji (colored in orange in Figure 6a). The capacity and cost of inverse arcs are defined as follows, where its capacity cji = 1 -cij and its cost wji = -wij. Through each flow fij, the correspondence (vi, vj) can be obtained. The detailed steps are shown in the Appendix, Table.A1.

Transformation Matrices-Based Clustering
After graph-matching, there are still a large number of mismatches (shown in Figure 7a). However, in previous literature [15,23,24,26], RANSAC and 4PCS-based algorithms were frequently used to deal with noise, which do not make full use of all correct matching pairs. Thus, considering that the correct matches' transformation matrix is of high consistency, while the wrong ones are diverse, a transformation matrix-based clustering is applied to eliminate mismatches.

Transformation Matrix Computation
Theoretically, if only 3D coordinates of correspondences are used, at least three correspondences are required to solve a transformation matrix between two 3D models. Raposo and Barreto [30] calculated the transformation matrix by two correspondences and their normal vectors. It is noteworthy that, if the LRF of keypoints has been established, only one correspondence is adequate [23]. Therefore, we use one correspondence and its LRF to calculate its transformation matrix. The equation is shown as follows: Graph-matching: after the capacity cost graph G is constructed, a minimum cost-maximum flow (MCMF) strategy is adopted to find a feasible maximum flow f = { f 1 , f 2 , f 3 , . . . , f k } with k matching pairs (k is a given non-negative value) and total minimum cost w(f). In each iteration of the augmented path searching, a weighted residual digraph (Figure 6a) is constructed according to the current cost flow graph G. Each arc e = (v i , v j ) in the capacity cost graph is decomposed into two mutually inverse arcs v ij (colored in black in Figure 6a) and v ji (colored in orange in Figure 6a). The capacity and cost of inverse arcs are defined as follows, where its capacity c ji = 1 − c ij and its cost w ji = −w ij . Through each flow f ij , the correspondence (v i , v j ) can be obtained. The detailed steps are shown in the Appendix A, Table A1.

Transformation Matrices-Based Clustering
After graph-matching, there are still a large number of mismatches (shown in Figure 7a). However, in previous literature [15,23,24,26], RANSAC and 4PCS-based algorithms were frequently used to deal with noise, which do not make full use of all correct matching pairs. Thus, considering that the correct matches' transformation matrix is of high consistency, while the wrong ones are diverse, a transformation matrix-based clustering is applied to eliminate mismatches.

Transformation Matrix Computation
Theoretically, if only 3D coordinates of correspondences are used, at least three correspondences are required to solve a transformation matrix between two 3D models. Raposo and Barreto [30] calculated the transformation matrix by two correspondences and their normal vectors. It is noteworthy that, if the LRF of keypoints has been established, only one correspondence is adequate [23]. Therefore, Remote Sens. 2020, 12, 2163 10 of 24 we use one correspondence and its LRF to calculate its transformation matrix. The equation is shown as follows: where R i is the rotation matrix, T i is the translation matrix, p S i and q T i is a correspondence in the model and scene. L S i and L T i are LRFs, respectively located at keypoints p S i and q T i .

Maximum Sample Set Chosen by Clustering
Traditional RANSAC-based methods are commonly applied to remove mismatches, which are time-consuming and sensitive to high outlier ratios. Considering that correct matches are of high consistency, while wrong ones are diverse, Yang et al. [51] proposed a consistency voting (CV) method, however, an initial voting set should be predefined. Inspired by that, a transformation matrix-based clustering is adopted to eliminate mismatches without a predefined initial voting set.
It is clearly shown in Figure 7a that transformation matrices of correct correspondences are always consistent with each other (especially translation matrix), whereas incorrect correspondences are rarely compatible with either correct or incorrect ones. Thus, two consistent requirements, translation and rotation, are considered to judge whether two correspondences belong to the same cluster or not. As reported in Equations (8) and (9), for two correspondences with rotation and translation matrices (R i , T i ), (R j , T j ), if the deviation of rotation and translation matrix A r , A T are, respectively less than thresholds, they are considered as one cluster; otherwise, they belong to two different clusters. In this study, the rotation threshold is set as 5 • , and the translation threshold is 5 mr (mesh resolution, the average distance between points).
Details of the transformation matrix-based clustering algorithm are shown in the Appendix A, Table A2. Once all pairs are labeled, pairs with the same label belong to one cluster. Finally, the cluster with the maximum size is chosen as the maximum similar set, and other clusters are eliminated (as shown in Figure 7b).
Remote Sens. 2018, 10, x FOR PEER REVIEW 10 of 23 method, however, an initial voting set should be predefined. Inspired by that, a transformation matrix-based clustering is adopted to eliminate mismatches without a predefined initial voting set. It is clearly shown in Figure 7a that transformation matrices of correct correspondences are always consistent with each other (especially translation matrix), whereas incorrect correspondences are rarely compatible with either correct or incorrect ones. Thus, two consistent requirements, translation and rotation, are considered to judge whether two correspondences belong to the same cluster or not. As reported in Equations (8) and (9), for two correspondences with rotation and translation matrices (Ri, Ti), (Rj, Tj), if the deviation of rotation and translation matrix Ar, AT are, respectively less than thresholds, they are considered as one cluster; otherwise, they belong to two different clusters. In this study, the rotation threshold is set as 5°, and the translation threshold is 5 mr (mesh resolution, the average distance between points).
Details of the transformation matrix-based clustering algorithm are shown in the Appendix, Table.A2. Once all pairs are labeled, pairs with the same label belong to one cluster. Finally, the cluster with the maximum size is chosen as the maximum similar set, and other clusters are eliminated (as shown in Figure 7b).

Rodriguez and LS-Based Registration
Once the maximum similar set is obtained, the relationship between source and target can be expressed as Equation (10) [52][53][54], where (XTi, YTi, ZTi ) T and (XSi, YSi, ZSi ) T are, respectively 3D coordinates of correspondences in the target and source, R is a rotation matrix and (T1, T2, T3) T is a translation matrix. Because the number of observations is greater than the number of necessary observations, a least square (LS) algorithm is widely applied to solve parameters. However, because the rotation matrix R is nonlinear in Equation (10), a Gauss-Markov (GM) model is usually established for iterative calculation [34], which is time-consuming. For simplification, a Rodriguez

Rodriguez and LS-Based Registration
Once the maximum similar set is obtained, the relationship between source and target can be expressed as Equation (10) [52][53][54], where (X T i , Y T i , Z T i ) T and (X S i , Y S i , Z S i ) T are, respectively 3D coordinates of correspondences in the target and source, R is a rotation matrix and (T 1 , T 2 , T 3 ) T is a translation matrix. Because the number of observations is greater than the number of necessary observations, a least square (LS) algorithm is widely applied to solve parameters. However, because the rotation matrix R is nonlinear in Equation (10), a Gauss-Markov (GM) model is usually established for iterative calculation [34], which is time-consuming. For simplification, a Rodriguez matrix [53,54] is applied instead of the transformation matrix, where the iterative calculation can be effectively avoided.
As reported in Equation (11), a Rodriguez matrix S is a skew-symmetric matrix, which is directly composed of three independent parameters (a, b, c). The Rodriguez matrix has an important character Equation (12), which is greatly helpful when solving parameters. Relationships between the rotation matrix R and the Rodriguez matrix S are reported in Equations (13) and (14). By transforming the rotation matrix R into a Rodriguez matrix S, it can effectively transform the observation equation from nonlinear into linear (Equation (15)). The process of the Rodriguez-LS-based method is shown in the Appendix A, Table A3. This method does not involve a solution of trigonometric and anti-trigonometric function in the calculation process, and the calculation accuracy is slightly higher than the traditional seven-parameter estimation methods.

Criteria
To quantitatively evaluate the results of the proposed co-registration method, several criteria are applied. The criteria for keypoint extraction, LRF construction, feature-matching and registration are, respectively introduced from Section 4.1 to Section 4.4.

Criteria for Keypoints Extraction
Traditionally, there are two main traits to quantitatively evaluate extracted keypoints: repeatability and distinctiveness. Repeatability is an ability to detect the same keypoints accurately under various nuisances, such as noise corruption, density variation and partial occlusion, while distinctiveness is an ability to be effectively described and matched under wrong correspondences, which is always measured in combination with a specific descriptor [55]. Thus, in this section, only the repeatability is adopted for keypoint evaluation.
The repeatability includes two aspects: absolute repeatability r abs and relative repeatability r rel . As reported in Equation (16), a keypoint p i in the model is said to be repeatable if the distance from its nearest neighbor q j in the scene is less than a threshold ε after true transformation (R t , T t ). In this study, this threshold ε is set to 2 mr. If there are K keypoints extracted in the model, and meanwhile k keypoints are repeatable, then the absolute and relative repeatability can be computed as Equation (17).

Criteria for LRF Construction
A stable LRF is of great significance to the distinctiveness and robustness of LRF-dependent descriptors, as its invariance to rigid transformations is achieved by LRF. The direction consistency of LRFs have a greater impact on the description encoding, as the contrary direction of LRFs will result in a completely different coding sequence [44]. Thus, a criterion, direction consistency is defined to judge the error of each coordinate axis.
Let L p (L p x , L p y , L p z ) and L q (L q x , L q y , L q z ) be two LRFs, respectively located on two keypoints p i and q i on the source and target. If the distance between two keypoints is less than a threshold η (in this study, it is set as 1 mr) after true transformation, then, the dot product of each axis is computed as Equation (18). The X-axis is regarded as consistent if the dot product between two axes L p x and L q x is larger than '0', which is the same for Y-axis and Z-axis.

Criteria for Feature-Matching
To quantitatively evaluate the performance of the proposed feature-matching method, the number of correct matches C p is adopted. When the distance ||q j − Tp i || between the keypoint p i in the source and the keypoint q j in the target is smaller than a threshold s (7.5 mr in this study), p i and q j are considered as a corresponding keypoint pair (T is the true transformation matrix). A match is considered as correct when the corresponding descriptors' keypoints are corresponding keypoint pairs [44].
As one of the widely used measurements in the literature [23,45], the recall and precision are calculated as follows: the precision P r is calculated as the number of correct descriptor matches with respect to the total number of descriptor matches (Equation (21)), while the recall R e is the number of correct descriptor matches with respect to the number of corresponding keypoint pairs (Equation (22)). "#" is the total number of samples from a given category.

Criteria for Registration
To quantitatively evaluate the performance of the proposed registration method, several criteria are adopted in the experiments: accuracy of rotation matrix A r (Equation (23)) and ranslation matrix A T (Equation 24)). Additionally, to assess the performance of gross error elimination, the total positioning error D sum (Equation (25)) and error in X, Y and Z axes D x , D y , D z of remained matches (Equation (26)) are calculated.
In above equations, R 2 and T 2 be the calculated rotation and translation matrix, while R 1 and T 1 are the true rotation and translation matrix; n is the amount of remained correspondences; ||p i − q i || is the distance between the correspondence p i in the source and q i in the target after transformation (R 2 , T 2 ); (p x , p y , p z ) and (q x , q y , q z ) are, respectively the x, y and z coordinates of p i and q i .

Results
To test the feasibility of the proposed co-registration method, a set of experiments were conducted on three kinds of mimetic satellite LiDAR point clouds and experimental results of the above five steps are, respectively analyzed in follows with the above criteria. Since the performance of the voxel descriptor has been discussed in detail in previous literature [23,32,56], it is not discussed in this study. It is worth noting that when adding noise in following experiments, we add Gaussian noise with a standard deviation of 0.1 mr, 0.2 mr, 0.3 mr and 0.5 mr (mr denotes mesh resolution) to them.

Results of Keypoint Extraction
For extensive, enormous and sparse satellite LiDAR point clouds, a ridgeline-based keypoint extraction is applied, where keypoints are defined as intersections of ridgelines. To reduce the amount of point clouds for subsequent processing, the original point clouds is first, downsampled. Figure 8 shows the results of ridgeline and keypoint extraction. It can be seen that extracted ridgelines can well reflect the topographic characteristics and extracted keypoints, which are colored in blue, are concentrated in the rough area and sparse in the flat area.
Remote Sens. 2018, 10, x FOR PEER REVIEW 13 of 23 above five steps are, respectively analyzed in follows with the above criteria. Since the performance of the voxel descriptor has been discussed in detail in previous literature [23,32,56], it is not discussed in this study. It is worth noting that when adding noise in following experiments, we add Gaussian noise with a standard deviation of 0.1 mr, 0.2 mr, 0.3 mr and 0.5 mr (mr denotes mesh resolution) to them.

Results of Keypoint Extraction
For extensive, enormous and sparse satellite LiDAR point clouds, a ridgeline-based keypoint extraction is applied, where keypoints are defined as intersections of ridgelines. To reduce the amount of point clouds for subsequent processing, the original point clouds is first, downsampled. Figure 8 shows the results of ridgeline and keypoint extraction. It can be seen that extracted ridgelines can well reflect the topographic characteristics and extracted keypoints, which are colored in blue, are concentrated in the rough area and sparse in the flat area.  To get the change in the number of points during this step, the number and efficiency of three point clouds with different resolutions in the same area are recorded as Table 2, where the sampling distance is set as 120 m. It can be seen from table that the number of points of extracted ridgelines is greatly reduced compared with the original point clouds or sampled point clouds. Thus, taking ridgelines as neighborhood information for feature description instead of all point clouds can effectively accelerate the subsequent processing speed. The computational time of ALOS 12 is much higher than others. This is because that a large amount of time is taken to sample. Except for sampling, the time of keypoint extraction of the three data are about 0.3 s. In addition, the absolute and relative repeatability rabs, rrel of keypoints with different Gaussian To get the change in the number of points during this step, the number and efficiency of three point clouds with different resolutions in the same area are recorded as Table 2, where the sampling distance is set as 120 m. It can be seen from table that the number of points of extracted ridgelines is greatly reduced compared with the original point clouds or sampled point clouds. Thus, taking ridgelines as neighborhood information for feature description instead of all point clouds can effectively accelerate the subsequent processing speed. The computational time of ALOS 12 is much higher than others. This is because that a large amount of time is taken to sample. Except for sampling, the time of keypoint extraction of the three data are about 0.3 s. In addition, the absolute and relative repeatability r abs , r rel of keypoints with different Gaussian noise affected are presented as Table 3. It can be seen that, with noise increasing, the relative repeatability of extracted keypoints declines on all three datasets. However, the absolute repeatability of the SRTM 90 decreases first and then increases, while that of ALOS 12 and ASTER 30 decreases all the time. This is because that the point distance of SRTM 90 is much larger than ALOS 12 and ASTER 30. Therefore, the added noise is relatively larger and has a great influence on the local surface, resulting in more points extracted as keypoints.

Results of LRF Construction
The direction of LRF has a more important impact on feature coding than angle deviation. To evaluate the direction consistency of the constructed LRF, experiments are conducted on the above three datasets with varying noise, where the sources have noise interference, and the targets have no noise interference. For a fair comparison, the direction consistency of LRF decomposed by coordinate covariance and normal vector covariance are both calculated with the same parameters. Table 4 lists the results of the above three LRF construction methods on three datasets with different levels of noise. It is clear that, from ALOS 0.1 mr to ASTER 0.3 mr, the LRF constructed by coordinate outperforms the LRF constructed by the normal vector. However, with noise continuously increasing, results are on the contrary from SRTM 0.1 mr to SRTM 0.3 mr. It shows that when the noise is large, the LRF constructed by the normal vector is more robust. Additionally, it is clear that the direction of the Z-axis decomposed by coordinate is relatively more stable than the other two axes (Y-axis is determined by X-axis and Z-axis), while the X-axis decomposed by the normal vector is relatively steadier than the other two axes. Thus, by combining two matrices, the proposed method makes full use of the above two methods and the X and Z axes constructed by the proposed method are much more directionally consistent than either.

Results of Feature-Matching
To evaluate the proposed MCMF-matching method, a set of experiments are conducted on three simulated satellite LiDAR point clouds with different resolutions in the same area. The results are reported in Table 4. For comparison, experiments are also conducted on the previous KM-matching algorithm, which has shown that it performs better than NN and NNDR-based strategies in previous studies [45]. In Table 5, K s is the number of keypoints on the source and K t is the amount of keypoints on the target, while M p is the amount of matching pairs, C p is the amount of correct matches, P r is the precision and R e is the recall. It can be seen that the efficiency of the MCMF algorithm is much higher than that of the KM algorithm on above three datasets. This is because the KM algorithm is suitable for square matrices, while MCMF can handle sparse non-square matrices. On one hand, in KM algorithm, when the number of keypoints on the source and target is not equal, keypoints with less amount is expanded by adding some vertices or edges of 0 to make the matching matrix be square, adding much redundant data (for example, (2430-2291) * 2430 in the first data); on the other hand, correspondences of similarity less than a threshold (0.4 in the study), usually do not need to be considered, as it would rarely be a correct match. In this case, a large proportion of matches with small similarity are skipped in MCMF algotirhm, which greatly improve the efficiency. However, some correct correspondences with their similarities less than the threshold are also ignored, resulting in the number of correct matches in the MCMF algorithm is a bit lower than the KM algorithm.

Results of Registration
Random sample-based registration methods (such as RANSAC or 4PCS) are usually used to deal with mismatches. However, these methods are of high randomness, as only three or four correspondences are utilized to determine the final transformation matrix, resulting in many correct matches not participating in it. Therefore, this paper proposes a clustering and Rodriguez-LS-based registration method. To test its applicability, two sets of experiments are carried out: (1) registration with different resolutions in the same area; (2) registration with the same resolution of small overlap. Figure 9 shows  Table 6 reports the evaluations of above registration results of the proposed method compared with the BSC [44]+KM+RANSAC algorithm, where BSC is a feature descriptor, KM is a featurematching strategy, and RANSAC is used for registration. It is found that, the rotation and translation deviations of the proposed method are much smaller than the BSC [44]+KM+RANSAC method. There are two reasons: (1) the keypoints we adopted are the intersection of the ridgelines, which is of high uniqueness; the keypoints adopted in the BSC are not specially designed for terrain point clouds and are not distinctive; (2) for the proposed method, after removing mismatches, the remained correspondences of above three datasets are widely distributed in the overlapping area and made full use of for LS-based registration and then the transformation parameters are solved by minimizing the square sum of correspondences' positioning errors; however, the RANSAC algorithm estimates the transformation matrix by randomly and iteratively selecting three correspondences, which are sparsely distributed in the overlapping area. Positioning errors of correspondences, especially the correspondences of the last iteration, will directly affect the registration accuracy.   Table 6 reports the evaluations of above registration results of the proposed method compared with the BSC [44]+KM+RANSAC algorithm, where BSC is a feature descriptor, KM is a feature-matching strategy, and RANSAC is used for registration. It is found that, the rotation and translation deviations of the proposed method are much smaller than the BSC [44]+KM+RANSAC method. There are two reasons: (1) the keypoints we adopted are the intersection of the ridgelines, which is of high uniqueness; the keypoints adopted in the BSC are not specially designed for terrain point clouds and are not distinctive; (2) for the proposed method, after removing mismatches, the remained correspondences of above three datasets are widely distributed in the overlapping area and made full use of for LS-based registration and then the transformation parameters are solved by minimizing the square sum of correspondences' positioning errors; however, the RANSAC algorithm estimates the transformation matrix by randomly and iteratively selecting three correspondences, which are sparsely distributed in the overlapping area. Positioning errors of correspondences, especially the correspondences of the last iteration, will directly affect the registration accuracy. In terms of efficiency, the proposed method of the above six datasets are much faster than the BSC [44]+KM+RANSAC method. This is because this proposed method only extracts ridgelines as neighborhood feature information, which greatly reduces the number of points processed.

Discussion
From the above experimental results, it can be seen that registration results are affected by many factors, for example, resolution, noise and topography. The effect of resolution has been shown above through registration results of different resolutions in the same area. In this section, attention will be focused on the influence of different overlap, topography and noise on registration results. As shown in Figure 10

Different Overlap
To discuss the influence of different overlap rates on registration results, a set of experiments were conducted. For quantitative assessment, criteria for feature-matching and registration are utilized. Their registration errors are reported in Table 7  From Table 7, we can see that, for the proposed method, since the above four point clouds could achieve satisfactory registration results through this method, results of larger overlap outperform the smaller ones, especially the Line7&Line8. This is because when the overlap rate is small in Line5&Line6, the proportion of keypoints in the overlapping area to the whole extracted keypoints becomes smaller. Therefore, when feature-matching, keypoints in the overlapping area have more choices and are of higher probability to select mismatches, especially when the features in the overlapping area are not obvious (such as Line5&Line6 and Line7&Line8). When the overlapping rate is large (Line7&Line8), keypoints in the overlapping areas account for more, and their choices are less, resulting in less probability to select mismatches. Thus, the point clouds of large overlap outperforms the small ones. It is worth noting that, if there are more correct pairs in overlap areas participated in the calculation of the rotation matrix, the registration accuracy is higher, such as Line7&Line8. Due to the almost identical correct matches in Line9&Line10 and Line11 &Line12, the registration errors of both two point clouds are actually quite similar. However, for the BSC [44]+KM+RANSAC method, for flat areas (Line5&Line6, Line7&Line8), registration failed regardless

Different Overlap
To discuss the influence of different overlap rates on registration results, a set of experiments were conducted. For quantitative assessment, criteria for feature-matching and registration are utilized. Their registration errors are reported in Table 7 compared with the BSC [44]+KM+RANSAC method.
From Table 7, we can see that, for the proposed method, since the above four point clouds could achieve satisfactory registration results through this method, results of larger overlap outperform the smaller ones, especially the Line 7 & Line 8. This is because when the overlap rate is small in Line 5 & Line 6, the proportion of keypoints in the overlapping area to the whole extracted keypoints becomes smaller. Therefore, when feature-matching, keypoints in the overlapping area have more choices and are of higher probability to select mismatches, especially when the features in the overlapping area are not obvious (such as Line 5 & Line 6 and Line 7 & Line 8). When the overlapping rate is large (Line 7 & Line 8), keypoints in the overlapping areas account for more, and their choices are less, resulting in less probability to select mismatches. Thus, the point clouds of large overlap outperforms the small ones. It is worth noting that, if there are more correct pairs in overlap areas participated in the calculation of the rotation matrix, the registration accuracy is higher, such as Line 7 & Line 8. Due to the almost identical correct matches in Line 9 & Line 10 and Line 11 & Line 12, the registration errors of both two point clouds are actually quite similar. However, for the BSC [44]+KM+RANSAC method, for flat areas (Line 5 & Line 6, Line 7 & Line 8), registration failed regardless of the overlap rates; for rough area and, registrations with high overlap rates (Line 11 & Line 12) are more likely to success compared to samll overlap rates (Line 9 & Line 10).

Different Topography
The effect of topography on registration results can be mainly divided into two aspects, one is the keypoint extraction, the other is feature-matching, which are both finally reflected in the number of correct correspondences.
As shown in Table 7, for the proposed method, when the overlapping changes, the number of correct matches in Line 5 & Line 6 and Line 7 & Line 8 varies greatly, while that of Line 9 & Line 10 and Line 11 & Line 12 stay stable. This is because of the different topography. The overlapping area of Line 5 & Line 6 and Line 7 & Line 8 is flat (Figure 10a,b), while Line 9 & Line 10 and Line 11 & Line 12 are rough (Figure 10c,d). When the overlap rate is small (Line 5 & Line 6), there are many keypoints extracted. However, becuase their local surface is of low distinctiveness and there are more choices to match, most of keypoints are mismatched (159 correct matches in 2377 matches). When the overlap rate is large (Line 7 & Line 8), the amount of extracted keypoints reduces sharply (from 2377 to 1247). Therefore, keypoints in the overlap area have fewer choices for matching, resulting in more correct matches (from 159 to 464). For Line 9 & Line 10 and Line 11 & Line 12, the change of correct matches is not obvious with the overlap rate varying (from 195 to 204). This is because when the overlap rate varies, although the amount of extracted keypoints changes, the number of matches with similarity greater than the threshold just slightly changes (from 546 to 403) because of their distinct features. However, for the BSC [44]+KM+RANSAC method, registration is often failed when the overlapping area is flat (as Line 5 & Line 6, Line 7 & Line 8), but registration was successful when the terrain is rough.
In short, for the proposed method, when the topography is rough in the overlapping area, the variation of the correct matches is small with the overlap rate changing; when the topography is flat, the number of correct matches fluctuates greatly.

Different Noise
To discuss the impact of noise on registration accuracy, experiments were conducted on Line 11 & Line 12 compared with the BSC [44]+KM+RANSAC method, where Gaussian noise with a standard deviation of 0.1 mr, 0.2 mr and 0.3 mr (mr denotes mesh resolution) is added to the Line 12. To quantitatively assess registration results with varying Gaussian noise, criteria for feature-matching and registration mentioned above are calculated and reported in Table 8. It can be seen that with noise increasing, for the proposed method, the number of matching pairs increases from 403 to 1763, while that of BSC [44]+KM+RANSAC method is almost no change at all. This is because noise makes ridgelines trivial and many points without apparent features are extracted as keypoints in the proposed method, while the BSC [44]+KM+RANSAC method used a ISS (intrinsic shape signature) keypoint extraction, which is not greatly affected by noise. Additionally, for the proposed method, the amount of correct matches gradually reduces from 204 to 112, which is because increasing noise destroyed the original characteristics of ridgelines. Meanwhile, due to the noise disturbed, the position error of correct pairs increases, resulting in the deviations (D sum , D x , D y , D z ) of correct correspondences raise after registration. However, the error of the transformation matrix, especially the rotation matrix, is essentially consistent, which mainly owes to (1) almost all wrong matches are eliminated by the transformation matrix-based clustering; (2) the optimal transformation matrix is calculated by the Rodriguez-LS algorithm making use of all correct matches. However, for the BSC [44]+KM+RANSAC method, as the noise increases, the registration error becomes larger, and when the noise reached to 0.3 mr, the registration fails. It is worth noting that, due to the large footprint of satellite LiDAR point clouds, the small error of the rotation matrix still has a great impact on the matching deviation, especially on the edge.

Conclusions
A ridgeline-based terrain co-registration method is proposed to handle extensive, enormous and sparse satellite LiDAR point clouds in rough areas. The presented method showed several merits in experiments: (1) the number of points greatly reduced and keypoints was of high repeatability and distinctiveness; (2) the proposed LRF is of high direction consistency; (3) the MCMF algorithm could efficiently achieve a maximum similarity sum in a sparse-matching graph; (4) the transformtion matrices based clustering makes full use of correct correspondences and can effectively eliminate mismatches. However, there are many aspects that require improvement in future work in order to overcome the limitations of ridgeline-based co-registration. In our cases, the experimental data are simulated from worldwide DEMs and airborne LiDAR point clouds instead of the real satellite LiDAR point clouds, which ignore the impact of platform jitter and atmospheric transmission. Additionally, this method is proposed for rough areas, however, for the flat areas where has no or not enough ridgelines, it will not perform well. Due to the limited research that is reported for registration of satellite LiDAR point clouds, our work may inspire more researchers to work in this field and to achieve better results.

Conflicts of Interest:
The authors declare no conflicts of interest.
Appendix A Table A1. Algorithm 1 minimum cost-maximum flow-Based Graph-Matching.

Input: Similarity Matrix Output: Maximum Flow With Minimum Cost
(1) initialize the capacity cost graph G = (V, E, c, w); initialize the maximum flow as 0; (2) construct weighted residual digraph; (3) search an augmented path p from vs to v t in the residual digraph. If it is found, return to step 4; otherwise, return to step 6.
(4) search for a path with the minimum cost and maximum flow f ij in augmented paths; (5) update the cost and capacity of related arcs in current residual digraph and return to step 3; (6) get a feasible maximum flow f = { f 1 , f 2 , f 3 , . . . , f k } with minimum cost, exit. (1) if the current stack T is empty, select any unmarked matching pair as a seed and put the seed in the stack with a label i; (2) pop up the top pair of the stack and look for all unmarked pairs. If the differences between both translation and rotation are less than thresholds, put it into the stack with a label i; (3) repeat step (2) until the stack is empty; (1) i++; repeat steps (1)-(3) until all pairs are marked. Table A3. Algorithm 3 Rodriguez-LS-based Registration. [ T x T y T z ]