Registration Algorithm Based on Line-Intersection-Line for Satellite Remote Sensing Images of Urban Areas

Image registration is an important step in remote sensing image processing, especially for images of urban areas, which are often used for urban planning, environmental assessment, and change detection. Urban areas have many artificial objects whose contours and edges provide abundant line features. However, the locations of line endpoints are greatly affected by large background variations. Considering that line intersections remain relatively stable and have high positioning accuracy even with large background variations, this paper proposes a high-accuracy remote sensing image registration algorithm that is based on the line-intersection-line (LIL) structure, with two line segments and their intersection. A double-rectangular local descriptor and a spatial relationship-based outlier removal strategy are designed on the basis of the LIL structure. First, the LILs are extracted based on multi-scale line segments. Second, LIL local descriptors are built with pixel gradients in the LIL neighborhood to realize initial matching. Third, the spatial relations between initial matches are described with the LIL structure and simple affine properties. Finally, the graph-based LIL outlier removal strategy is conducted and incorrect matches are eliminated step by step. The proposed algorithm is tested on simulated and real images and compared with state-of-the-art methods. The experiments prove that the proposed algorithm can achieve sub-pixel registration accuracy, high precision, and robust performance even with significant background variations.


Introduction
Image registration aims to align two or more images that have overlapping scenes and are captured by the same or different sensors at different times or from different viewpoints, which is a basic and essential step of remote sensing image processing. The accuracy of registration has a considerable influence on subsequent processing, such as image fusion, image retrieval, object recognition, and change detection. However, high-accuracy remote sensing image registration still faces many difficulties and challenges, especially for urban areas.
Urban scenes, which are often used for urban planning, environmental assessment, and change detection, are widely studied in the field of remote sensing. High-accuracy registration is required to achieve good processing results. Urban scenes contain many man-made objects, such as roads, buildings, and airports. Salient features can be extracted easily, but the localization of features is substantially affected by image variations. Natural disasters, such as earthquakes and floods, may greatly damage the contours of objects or even global geometric structures. Compared with point features, line features are less sensitive to background variations. Sui et al. [26] extracted line segments with the LSD detector and utilized line segment intersections for Voronoi integrated spectral point matching. Li et al. [27] built the line-junction-line (LJL) structure with two line segments and their intersecting junction to construct descriptors and design matching strategies. The registration with line segments and their intersections (RLI) algorithm, which was proposed by Lyu et al. [28], selects triplets of intersections of matching lines to estimate the affine transformation iteratively; this algorithm has good robustness.
With the development of deep learning, deep learning-based methods have emerged in recent years [29][30][31]. MatchNet, proposed by Han et al. [29], is a typical architecture consisting of a deep convolutional network that extracts features from patches and a network of three fully connected layers that computes the similarity between the extracted features. In the field of remote sensing, He et al. [32] proposed a Siamese convolutional neural network for multiscale patch comparison, which is combined with the S-Harris corner detector to improve the matching performance for remote sensing images with complex background variations. Yang et al. [33] generated robust multi-scale feature descriptors utilizing high-level convolutional information while preserving some localization capability, and designed a gradually increasing selection of inliers for registration. Deep learning-based descriptors can be robust to large background variations, but they have relatively low localization accuracy and need traditional strategies to improve registration performance.
Considering the rich line features in satellite remote sensing images of urban areas and the more stable location of intersections compared with endpoints, this paper proposes a high-accuracy remote sensing image registration algorithm that is based on the line-intersection-line (LIL) structure with two lines and their intersection. First, multi-scale line segments are detected in a Gaussian pyramid, and constraints are set to filter line segments and compute intersections, extracting scale-invariant and accurately located LIL structures. Second, a new LIL local descriptor is constructed by using pixel gradients in two line support regions to realize initial matching. Then, a graph-based LIL outlier removal method is conducted using the LIL structures and the changes in geometric relations between matches. A variation matrix of relative position is built with a spatial relation descriptor based on affine properties and graph theory. Outliers are eliminated successively until the matrix becomes the zero matrix. Finally, a high-accuracy affine transformation is estimated with the inliers.
The main contribution of this study centers on the design of a double-rectangular local descriptor and a spatial relationship-based outlier removal strategy on the basis of the LIL structure. Compared with similar methods, the LIL descriptor is finer and thus better resists large background variations, and the outlier removal strategy is simpler and more effective, making full use of the features' structure and adjacency relations through simple affine properties.
In our experiments, the proposed algorithm achieves sub-pixel registration accuracy and high precision, and it is robust to scale, rotation, illumination, occlusion, and even the large background variations before and after disasters.

Methodology
The LIL algorithm can be divided into several main steps: LIL feature extraction, LIL feature description, LIL feature matching, and transformation model estimation. The flowchart is shown in Figure 1.

Multi-scale LIL Feature Extraction
In obtaining intersections, line segments should first be detected. EDLines [15] is used to detect line segments; it fits the contours of objects well and maintains edge completeness. A Gaussian pyramid is constructed to realize multi-scale feature detection, as shown in Figure 2. A series of layers named octaves is obtained from the original image through Gaussian blur and downsampling; these octaves form an image pyramid with different scales. Given the discrepancies between line features and point features, the sampling parameters and Gaussian coefficients must be designed. First, the octave number O should be set adaptively according to the image size. For large-scale images, only a few lines will be detected, and their locations will not be highly accurate. As intersections are computed on the basis of line segments, the number and positioning accuracy of intersections will also be affected. To obtain sufficient and accurate line segments on each octave, with the assumption that the image size is X × Y, the experiments show that O should be set as in Equation (1). In SIFT, O = log 2 min(X, Y) − 2; in that case, only dozens of line segments would be extracted on the octaves with large scales, so fewer or even no accurate intersections could be detected.
For the same consideration, the Gaussian image is down-sampled by a factor of √2 instead of 2 [1]. For an image of size 2000 × 2000, the octave number is 5, and the size of the fifth octave is 356 × 356. In this manner, the small number of octaves will not reduce scale invariance, and the large image size ensures that sufficient line segments are detected on large-scale octaves. The Gaussian coefficients should not be very large; otherwise, the positioning accuracy of line segments will be affected. A large-scale image is generated from a small-scale image by Gaussian convolution, where L(x, y, σ o ) is octave o, σ o is its scale, and ⊗ is the convolution operation in x and y. The scale varies with the same down-sampling factor, i.e., σ o = √2 σ o−1 . The initial scale σ 0 is set to 0.25.
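The pyramid construction above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the octave count is taken as given (the paper sets it adaptively via its Equation (1), which is not reproduced in this excerpt), and SciPy's `gaussian_filter` and `zoom` stand in for the exact convolution and resampling used by the authors.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_pyramid(image, num_octaves, sigma0=0.25):
    """Build a Gaussian pyramid downsampled by sqrt(2) per octave,
    with the scale growing by sqrt(2) between octaves."""
    octaves, sigma = [image.astype(float)], sigma0
    for _ in range(1, num_octaves):
        sigma *= np.sqrt(2.0)                         # sigma_o = sqrt(2) * sigma_{o-1}
        blurred = gaussian_filter(octaves[-1], sigma)  # Gaussian blur
        octaves.append(zoom(blurred, 1 / np.sqrt(2.0), order=1))
    return octaves

pyr = build_pyramid(np.zeros((2000, 2000)), num_octaves=5)
print([o.shape for o in pyr])
```

With the gentler √2 factor, the smallest octave of a 2000 × 2000 image stays several hundred pixels wide, which is what keeps enough detectable line segments at coarse scales.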
After the line segments for each octave are detected, the intersections of line segments are calculated to extract LIL. Only line segments on the same octave can form a LIL.
Given the large number of fragmented line segments, line segments and their intersections will be filtered within the following constraints to reduce computation cost and retain accurate features.

1. Nearest-neighbor rectangular region search. Search for other line segments in a rectangular region centered on line segment s i to compute intersections, as shown in Figure 3. Define the length of s i as S i ; then, set b = b coef · S i . The width and length of the rectangular region are 2b and S i + 2b, respectively. The intersection is reserved when the starting or ending point of s j is within the rectangular region.
Distance constraint. The intersection may lie on the extension of a segment. If it is very far from the line segment, the positioning accuracy will be reduced and the intersection may not be meaningful; thus, a distance constraint is needed. The distance between s i and s j is defined as the distance d ij between the midpoint of the shorter line segment s min = arg min{S i , S j } and their intersection, which must satisfy d ij ≤ d coef · S min .
Obviously, the larger the search range, the more feature points will be detected. However, a larger range leads to increasing computational cost and decreasing matching precision. The values of b coef , θ i , and d coef are set empirically; more details are discussed in Section 3.2. Figure 3 shows a rectangular region set with s 1 as the center. The endpoint of s 2 is within the rectangular region, and the intersection angle θ and distance d 12 satisfy the angle and distance constraints; thus, the intersection O is a valid intersection. However, s 3 is not in the rectangular region, s 4 does not satisfy the intersection angle constraint, and for s 5 , d 15 is too long to meet the distance constraint; none of these line segments can form a valid intersection with s 1 .
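The intersection computation with the angle and distance constraints can be sketched as follows. This is an illustrative sketch: the threshold values (`d_coef`, `theta_min_deg`) are placeholders rather than the paper's tuned settings, and the rectangular region search is omitted for brevity.

```python
import numpy as np

def lil_candidate(s1, s2, d_coef=1.0, theta_min_deg=20.0):
    """Compute the intersection of segments s1, s2 (each a pair of
    endpoints) and apply the angle and distance constraints; returns
    the intersection point or None if the pair is rejected."""
    (p1, p2), (p3, p4) = np.asarray(s1, float), np.asarray(s2, float)
    d1, d2 = p2 - p1, p4 - p3
    denom = d1[0] * d2[1] - d1[1] * d2[0]          # 2D cross product
    if abs(denom) < 1e-12:                         # parallel: no intersection
        return None
    t = ((p3 - p1)[0] * d2[1] - (p3 - p1)[1] * d2[0]) / denom
    inter = p1 + t * d1                            # intersection point
    # angle constraint: discard near-parallel pairs
    cosang = abs(d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    if np.degrees(np.arccos(np.clip(cosang, 0.0, 1.0))) < theta_min_deg:
        return None
    # distance constraint: d_ij <= d_coef * S_min, measured from the
    # midpoint of the shorter segment to the intersection
    S1, S2 = np.linalg.norm(d1), np.linalg.norm(d2)
    mid_short = (p1 + p2) / 2 if S1 <= S2 else (p3 + p4) / 2
    if np.linalg.norm(inter - mid_short) > d_coef * min(S1, S2):
        return None
    return inter

print(lil_candidate([(0, 0), (10, 0)], [(5, -3), (5, 3)]))  # [5. 0.]
```

A perpendicular pair intersecting near the shorter segment's midpoint is accepted, while parallel or near-parallel pairs return None.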
The intersections that satisfy the above constraints are selected to build LIL structures. Such structures also need additional information to facilitate the construction of complete LIL features. Mapping all line segments and their intersections detected in each octave to the original image, a LIL feature is LIL = {p, θ, l 1 , l 2 , LILDes}, where p is the intersection coordinate (x, y) and θ is the intersection angle between the two lines. Considering that the intersection of detected EDLines s 1 and s 2 may lie on the segments or on their extensions, l 1 and l 2 denote the lines constructing the LIL structure, each extending from the intersection to one endpoint of s 1 or s 2 , with lengths L 1 and L 2 , respectively. LILDes is the LIL local feature descriptor, which is described in detail in the next section.

LIL Local Feature Description
The LIL local descriptor is constructed with LIL structures by using the gradients in the LIL neighborhood.
The framework of the LIL descriptor is shown in Figure 4; it has a double-rectangular shape. The descriptor framework consists of two line support regions (LSRs), which are centered at lines l 1 and l 2 . The width of each LSR is W B , and the length is equal to that of the center line, i.e., L B = L 1 or L 2 . Each LSR is divided into M segments along the width direction and N segments along the length direction, i.e., M × N blocks. The whole LIL contains 2 × M × N blocks in total for the two LSRs. Figure 4 shows an example where M = 5 and N = 4, for a total of 40 blocks. For one LSR, the block in the ith row and jth column is denoted as B i,j , with a width of w B i and a length of l B j , where i = 1, 2, · · · , M and j = 1, 2, · · · , N. The width w B i becomes smaller as B i,j approaches the center line. Consequently, the discriminability of the descriptor increases, and features become increasingly fine near the center line. The example in Figure 4 shows that the corresponding widths of B i, * are {5, 4, 3, 4, 5} pixels, which are determined by empirical values in practice. Similarly, the descriptor is centered on the intersection, so blocks near the intersection should be finer; therefore, the closer B i,j is to the intersection, the smaller the length l B j , which is determined by Formula (3), where l 0 is the minimum length and N 0 is the separation column between the two coordinate schemes. When j > N 0 , l B j gradually widens, and a logarithmic coordinate is adopted. When j ≤ N 0 , an equal-interval coordinate is adopted to prevent the length of B i,1 from being so short that it is affected by noise. l 0 is determined by N and N 0 : once N and N 0 are fixed, l 0 can be calculated from Equation (3). When N = 4 and N 0 = 2, the corresponding lengths of B * ,j are {L/8, L/8, L/4, L/2}, as shown in Figure 4. The gradient g of each pixel is first calculated in the LIL line support region.
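The column-length scheme can be sketched as follows. Since Formula (3) is not reproduced in this excerpt, the doubling rule below is inferred from the stated N = 4, N 0 = 2 example, where the lengths are {L/8, L/8, L/4, L/2}; it should be read as an interpretation, not the paper's exact formula.

```python
def block_lengths(L, N=4, N0=2):
    """Column lengths l_Bj: equal-interval for j <= N0, doubling
    (logarithmic coordinate) for j > N0, chosen to sum to L."""
    # minimum length l0 such that N0 * l0 + l0 * (2 + 4 + ...) == L
    l0 = L / (N0 + 2 ** (N - N0 + 1) - 2)
    return [l0 if j <= N0 else l0 * 2 ** (j - N0) for j in range(1, N + 1)]

print(block_lengths(8.0))  # [1.0, 1.0, 2.0, 4.0], i.e. {L/8, L/8, L/4, L/2}
```

For L = 8 this reproduces the Figure 4 example; the two columns nearest the intersection share the minimum length l 0 , and each farther column doubles.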
To achieve rotation invariance, local coordinate systems for l 1 and l 2 are built separately, and the pixel gradients are rotated into the local coordinate system. Taking l 1 as an example, the direction along the line segment is denoted as d l , and the direction perpendicular to the line and pointing to the inner side of the LIL is denoted as d ⊥ . The origin is the vertex of the LSR that lies on the outer side of the LIL near the intersection, as shown in Figure 4. g is projected onto this coordinate system as g d = (g d l , g d ⊥ ) T . To reduce the effect of noise far from the LIL on descriptor invariance, gradients are weighted by Gaussian functions to emphasize the gradients close to the center of the descriptor. Pixels closer to the line segment should contribute more to the descriptor, so the first Gaussian weighting function f g is applied along direction d ⊥ and centered on the line segment, with σ g equal to one half of the width of the LSR, i.e., f g = (1/ √ 2π σ g )e −d 2 /2σ 2 g , where d is the distance between the pixel and the center line. Pixels closer to the intersection should also contribute more; similarly, along direction d l , a Gaussian weighting function f l is set with the length of the LSR as σ l and the intersection as the center.
In addition, a local Gaussian weighting function f b i is set over adjacent blocks B i,j , B i−1,j , and B i+1,j to reduce boundary effects between blocks along direction d ⊥ , where d k is the distance between the kth row of pixels and the center line of B i,j .
Then, the mean and standard deviation of the accumulated weighted gradients for each row in each block are used to calculate the LIL descriptor. First, the gradients of each row in B i,j and its adjacent blocks are accumulated; as Figure 4 shows, the descriptor of B i,j is computed not only from B i,j itself but also from B i−1,j and B i+1,j . Classifying the gradients into four directions, the pth row accumulated gradients are computed with the weight λ = f g (p) f b i (p), where q is the column of the pixel in the block.
The LIL description matrix of B i,j , LILDM ij ∈ R 4×n , can then be constructed, where n is the number of rows in B i,j and its adjacent blocks. The mean and standard deviation of each row in LILDM ij are computed to form the mean vector M ij ∈ R 4 and the standard deviation vector S ij ∈ R 4 . The LIL descriptor is composed of all block descriptors; two LSRs contain 2MN blocks, so LILDes ∈ R 16MN . Finally, the mean vector M and the standard deviation vector S are normalized separately because of their different magnitudes, and each element in the vector is normalized to [0, 1]. In addition, no element of the descriptor should exceed the light threshold, which resists nonlinear illumination. The threshold is generally set to 0.4 [17]. However, because we divide the LSR along direction d l and the values computed in different blocks are proportional to the block lengths, the light threshold is set to 0.4 l B j /L.
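The final normalization step can be sketched as the familiar SIFT-style normalize-clip-renormalize scheme with a per-element threshold vector. The exact ordering of clipping and renormalization is an assumption here; the paper only states that elements are normalized to [0, 1] and capped at 0.4 l B j /L.

```python
import numpy as np

def clip_normalize(desc, thresholds):
    """L2-normalize a descriptor vector, clip each element at its
    light threshold (0.4 * l_Bj / L for its block column), then
    renormalize - a sketch of the illumination-robust normalization."""
    desc = np.asarray(desc, float)
    desc = desc / max(np.linalg.norm(desc), 1e-12)          # normalize
    desc = np.minimum(desc, np.asarray(thresholds, float))  # clip at threshold
    return desc / max(np.linalg.norm(desc), 1e-12)          # renormalize

d = clip_normalize([3.0, 4.0], [0.4, 0.4])
print(d)
```

Clipping bounds the influence of any single gradient bin, which is what makes the descriptor resist nonlinear illumination changes.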

LIL Matching
The preliminary LIL matching is achieved by comparing the Euclidean distances between LIL local descriptors, but brute-force matching is time-consuming and less robust. Therefore, given the LIL structure, the matching pairs are preliminarily filtered on the basis of the intersection angle and length ratio of the LIL. On the one hand, for satellite remote sensing images, the difference between corresponding intersection angles in the two images should not exceed 30° [27], i.e., |θ i − θ i ′| ≤ 30°. On the other hand, for the same LIL, the change in the ratio of lengths between the two line segments will not be large. Setting L ratio = L 1 /(L 1 + L 2 ), the corresponding LILs should satisfy |L ratio i − L ratio i ′| ≤ 0.2. Then, after brute-force matching and a symmetry test, the initial matching set IMP can be obtained. However, many false matches exist in the initial matching set. We design an outlier removal strategy based on the LIL structure and relative spatial relations: a spatial relation descriptor and a variation matrix of relative position are constructed on the basis of affine properties and graph matching to eliminate outliers step by step.
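The prefilter plus mutual-nearest-neighbor matching can be sketched as follows. This is a sketch, not the paper's code: LILs are represented as small dicts with hypothetical field names ('theta' in degrees, 'ratio', 'des'), and the thresholds are the ones quoted in the text.

```python
import numpy as np

def initial_matches(ref, tgt, max_dtheta=30.0, max_dratio=0.2):
    """Angle and length-ratio prefilter, then keep mutual nearest
    neighbors of the descriptors (the symmetry test)."""
    def ok(a, b):
        return (abs(a["theta"] - b["theta"]) <= max_dtheta
                and abs(a["ratio"] - b["ratio"]) <= max_dratio)

    def nearest(src, dst):
        out = {}
        for i, a in enumerate(src):
            cands = [(np.linalg.norm(a["des"] - b["des"]), j)
                     for j, b in enumerate(dst) if ok(a, b)]
            if cands:
                out[i] = min(cands)[1]   # nearest admissible neighbor
        return out

    fwd, bwd = nearest(ref, tgt), nearest(tgt, ref)
    # symmetry test: keep only mutual nearest neighbors
    return [(i, j) for i, j in fwd.items() if bwd.get(j) == i]

ref = [{"theta": 45.0, "ratio": 0.5, "des": np.array([0.0, 1.0])},
       {"theta": 90.0, "ratio": 0.3, "des": np.array([1.0, 0.0])}]
tgt = [{"theta": 50.0, "ratio": 0.55, "des": np.array([0.1, 0.9])},
       {"theta": 85.0, "ratio": 0.35, "des": np.array([0.9, 0.1])}]
print(initial_matches(ref, tgt))  # [(0, 0), (1, 1)]
```

The prefilter removes most impossible pairs before any descriptor distance is computed, which is where the speed-up over pure brute force comes from.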
It is worth noting that the LIL matching method is restricted to satellite remote sensing images. Images acquired by orbital systems satisfy the plane hypothesis: the satellite orbit has a high altitude, so the elevation error of the ground can be ignored. On this basis, the registration model can be considered an affine transformation, and the spatial relation descriptor is likewise designed based on affine properties.

LIL Spatial Relation Descriptor
The spatial relations between outliers and inliers are always wrong. According to the LIL structure and geometric relations, we construct a spatial relation descriptor, which records the variations in relative position between any two matches and is used to construct the variation matrix of relative position.
The relationship between two satellite remote sensing images can be regarded as affine transformation. Affine transformation has the following properties. (1) Points on a line still lie on the same line after affine transformation; (2) Points on one side of a line are still on the same side of the same line after affine transformation. According to the two properties and the LIL structure, the LIL spatial relation descriptor can be constructed.
LIL divides the image into four parts, and the LIL local coordinate system is defined with the intersection O as the origin and l 1 and l 2 as the coordinate axes, as shown in Figure 5. The left side and the positive half of a coordinate axis are defined as positive, whereas the right side and the negative half are negative, constituting four quadrants {I(++), II(+−), III(−−), IV(−+)}. For any two LIL matches in IMP, a = (LIL i , LIL i ′) and b = (LIL j , LIL j ′), the quadrant of LIL j in the LIL i coordinate system on the reference image is calculated, i.e., Q(a, b) = {q|q = 1, 2, 3, 4, a ≠ b; q = 0, a = b}. Similarly, the quadrant of LIL j ′ in the LIL i ′ coordinate system, Q ′(a, b), can be obtained on the target image. As shown in Figure 5, for match a 3 under the local coordinate system of match a 1 , Q(a 1 , a 3 ) = 1 and Q ′(a 1 , a 3 ) = 3; conversely, for a 1 under the a 3 coordinate system, Q(a 3 , a 1 ) = 3 and Q ′(a 3 , a 1 ) = 1.
The spatial relation descriptor ψ(a, b) is computed from Q(a, b) and Q ′(a, b). According to the definition, the spatial relation descriptor of one match is the number of quadrant changes from the reference image to the target image in the coordinate system of another match. If the quadrant remains unchanged, then the descriptor value is 0; otherwise, the value is 1 or 2, depending on how the quadrant changes.
The spatial relation descriptor reflects the relative variation in spatial relation between LIL matches. For one LIL match a, the more matches change relative to its position, the more non-zero descriptors ψ(a, * ) ≠ 0 there are. Conversely, the more non-zero descriptors ψ( * , a) ≠ 0 there are, the more the position of a changes relative to the others; therefore, a is likely to be an outlier in terms of geometric relations. A matching set with no outlier in spatial relation should satisfy ψ(a, b) = 0 for all matches a and b; that is, the relative positions between any two LIL matches remain unchanged. Figure 5 shows four LIL matches, where a 1 , a 2 , and a 4 are inliers and a 3 is an outlier. Obviously, ψ(a, b) = 0 for a, b ∈ {a 1 , a 2 , a 4 }. For matches a 1 and a 3 , the spatial relation descriptor is ψ(a 1 , a 3 ) = |Q(a 1 , a 3 ) − Q ′(a 1 , a 3 )| = |1 − 3| = 2; likewise, ψ(a 3 , a 1 ) = 2 in the coordinate system of a 3 . Similarly, the relations between a 2 , a 4 , and a 3 can be judged to calculate their descriptors.
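The quadrant test and ψ can be sketched as follows. Since the defining equation of ψ is not reproduced in this excerpt, the sign-counting form below is an interpretation: it counts how many of the two quadrant-defining signs flip, which agrees with the stated values 0, 1, 2 and with the |1 − 3| = 2 example (quadrant I to III flips both signs).

```python
import numpy as np

def quadrant_signs(lil, point):
    """Signs of `point` in the (possibly non-orthogonal) local frame
    of `lil`, given as (origin, dir1, dir2): the intersection and the
    two line directions."""
    origin, d1, d2 = lil
    coords = np.linalg.solve(np.column_stack([d1, d2]),
                             np.asarray(point, float) - np.asarray(origin, float))
    return tuple(np.sign(coords))

def psi(lil_i, lil_i2, p_j, p_j2):
    """Spatial relation descriptor: number of quadrant signs of match j
    that change in match i's frame between the two images (0, 1 or 2)."""
    s, s2 = quadrant_signs(lil_i, p_j), quadrant_signs(lil_i2, p_j2)
    return int(s[0] != s2[0]) + int(s[1] != s2[1])

frame = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))        # axis-aligned LIL frame
print(psi(frame, frame, (2.0, 3.0), (-2.0, -3.0)))  # quadrant I -> III: 2
print(psi(frame, frame, (2.0, 3.0), (2.5, 3.5)))    # quadrant unchanged: 0
```

Solving the 2 × 2 system expresses the other match's intersection in the oblique l 1 /l 2 basis, so the test works even when the two lines are not perpendicular.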

LIL Outlier Removal Based on LIL Spatial Relation Descriptor and Graph Theory
LIL spatial relation descriptors record the variations in relative position between two matches. According to the theory of graph matching [34], a variation matrix of relative position M ∈ R N I ×N I can be constructed, with its elements computed from the spatial relation descriptor. With Figure 5 as an example, after the spatial relation descriptors are calculated, the elements of the matrix can be computed, such as M(a 1 , a 3 ) = M(a 3 , a 1 ) = ψ(a 1 , a 3 ) + ψ(a 3 , a 1 ) = 2 + 2 = 4, which yields the variation matrix of relative position. Mark the matches in IMP in an indicator vector x of N I dimensions, such that x(a) = 1 if a is an inlier and x(a) = 0 otherwise. The score of the total variation of spatial positions in the matching set is S = x T Mx. The problem of eliminating outliers is then finding the optimal solution x * that minimizes the score S, i.e., x * = arg min(x T Mx). Because the variation matrix of relative position M is positive semi-definite and the relative positions between the inliers retained in the final solution should not change, the minimum score is zero. Divide IMP into two sets: the clean matching set CMP and the removed matching set RMP. The score can then be decomposed into the parts associated with M C and M R , the matrices corresponding to the inliers and outliers, respectively. Because x C (a) = 1 for every match in CMP, the minimum is reached only when M C = O. Therefore, a strategy should be designed to remove outliers from IMP such that M C = O. Finally, we obtain CMP with N C matches and RMP with N R matches, where N C + N R = N I .
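The matrix construction and the score S = x T Mx can be sketched as follows, with a toy ψ table mirroring Figure 5 (match 2 plays the role of the outlier a 3 ; the ψ values are illustrative, not computed from real geometry).

```python
import numpy as np

def variation_matrix(psi_fn, n):
    """Symmetric variation matrix of relative position with elements
    M(a, b) = psi(a, b) + psi(b, a) and a zero diagonal."""
    M = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            M[a, b] = M[b, a] = psi_fn(a, b) + psi_fn(b, a)
    return M

# toy psi: match 2 has descriptor value 2 against every other match
psi_fn = lambda a, b: 2 if 2 in (a, b) else 0
M = variation_matrix(psi_fn, 4)

x_all = np.ones(4)                       # keep everything
x_clean = np.array([1.0, 1.0, 0.0, 1.0])  # drop the outlier
print(x_all @ M @ x_all, x_clean @ M @ x_clean)  # 24.0 0.0
```

Dropping the single inconsistent match sends the score to zero, which is exactly the stopping condition M C = O of the removal strategy.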
A greedy algorithm is adopted to remove outliers successively until M = O. To reduce computational complexity, the row and column in M corresponding to the removed match are deleted, and the matrix dimension can gradually be reduced [34]. The algorithm steps are as follows.

1. Initialize x with the N I × 1 vector of ones, and set RMP = ∅.

2. Compute the accumulated error and find the match with the maximum error. The accumulated error of each match is obtained by summing the corresponding row of the variation matrix of relative position, and the matches attaining the maximum value form the set OMP of candidate removed matches. If OMP = ∅, then return x * = x; otherwise, set N o = Card(OMP). More than one match may correspond to the maximum accumulated error. If N o = 1, then the outlier is a ∈ OMP; otherwise, the candidates are compared by the number of matches whose positions change relative to them, counted with the xor operation ⊕ over the non-zero elements in the row corresponding to each candidate LIL match.

3. Remove the match c out and add it to RMP. Delete the row and column in M corresponding to c out , and set x(c out ) = 0.
4. Repeat Steps (2) and (3) until M = O ∈ R N C ×N C . Return x * = x, and add the matches with x * (a) = 1 to CMP.
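The greedy loop can be sketched as follows. One simplification is hedged explicitly: the paper's tie-breaking rule for several candidates with equal maximum error is reduced here to taking the first candidate.

```python
import numpy as np

def greedy_outlier_removal(M):
    """Repeatedly drop the match with the largest accumulated error
    (row sum of M), deleting its row and column, until M is the zero
    matrix; returns the indices of the kept matches."""
    M = np.array(M, float)
    keep = list(range(len(M)))
    while M.size and M.sum() > 0:
        worst = int(np.argmax(M.sum(axis=1)))   # maximum accumulated error
        M = np.delete(np.delete(M, worst, axis=0), worst, axis=1)
        keep.pop(worst)
    return keep

# toy matrix: match 2 disagrees (value 4) with every other match
M = np.zeros((4, 4))
for k in (0, 1, 3):
    M[2, k] = M[k, 2] = 4.0
print(greedy_outlier_removal(M))  # [0, 1, 3]
```

Deleting the removed row and column shrinks M at every step, which is the dimensionality-reduction trick noted in the text [34].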
All matches with large variations in spatial relation are eventually eliminated. Using all matches in CMP, we estimate the affine transformation matrix T with the least squares method. To obtain a high-accuracy transformation model, we further remove matches with large registration errors and recalculate the final transformation model T.
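The least-squares affine estimation over the clean matches can be sketched as follows; this is a standard homogeneous-coordinate least-squares fit, not the authors' exact implementation.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src points to dst points;
    returns the 2x3 matrix [A | t]."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    X = np.hstack([src, np.ones((len(src), 1))])   # homogeneous [x, y, 1]
    T, *_ = np.linalg.lstsq(X, dst, rcond=None)    # solves X @ T ~= dst
    return T.T

# points related by scale 2 and translation (1, -1)
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(1, -1), (3, -1), (1, 1), (3, 1)]
T = fit_affine(src, dst)
print(np.round(T, 6))  # [[2, 0, 1], [0, 2, -1]]
```

In practice the fit is run once on CMP, the worst-residual matches are dropped, and the fit is repeated to obtain the final model.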

Experiment and Results
The proposed algorithm is tested on images with simulated transformations and real multi-temporal remote sensing images of urban scenes. The results are compared with those of similar and state-of-the-art algorithms under reasonable accuracy evaluation. To evaluate the LIL local descriptor and the LIL outlier removal method, we compare them with the SIFT descriptor and RANSAC, respectively.

Accuracy Evaluation
For general feature-based remote sensing image registration, the accuracy indexes usually include the root mean squared error (RMSE) of the projected distance and Precision, where T 0 is the ground-truth affine model, p i is the coordinate of a sampled point, and CN and FN denote the numbers of correct and false matches, respectively, in the final matching set of size TN. Matches with a projected distance of less than 3 pixels [28] are regarded as correct matches in this study.
In addition, to evaluate the LIL outlier removal method, we use the indicators Accuracy, Recall, and Specificity, where DF and DC denote the numbers of deleted false and deleted correct matches, respectively. Accuracy measures the closeness of the identified matches to the actual matches. Recall represents the ratio of correct matches that are identified in accordance with the actual situation. Specificity represents the ratio of false matches that are identified correctly.
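The evaluation metrics can be sketched as follows. Because the paper's equations are not reproduced in this excerpt, the formulas below are standard instantiations of the verbal definitions above and should be treated as an interpretation.

```python
import numpy as np

def rmse(T, T0, points):
    """RMSE of the projected distance between the estimated affine
    model T and the ground truth T0 (both 2x3) over sampled points."""
    P = np.hstack([np.asarray(points, float), np.ones((len(points), 1))])
    diff = P @ np.asarray(T, float).T - P @ np.asarray(T0, float).T
    return float(np.sqrt(np.sum(diff ** 2, axis=1).mean()))

def removal_stats(CN, FN, DC, DF):
    """Precision over the final set plus the outlier-removal indicators
    (CN/FN: kept correct/false; DC/DF: deleted correct/false)."""
    TN = CN + FN
    precision = CN / TN
    accuracy = (CN + DF) / (TN + DC + DF)   # correct decisions / all decisions
    recall = CN / (CN + DC)                 # kept correct / all correct
    specificity = DF / (DF + FN)            # removed false / all false
    return precision, accuracy, recall, specificity

print(removal_stats(CN=8, FN=2, DC=2, DF=8))  # (0.8, 0.8, 0.8, 0.8)
```

Treating outlier removal as a binary decision over the initial matches makes Recall and Specificity the true-positive and true-negative rates, respectively.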

Parameter Setting
Some parameters in the LIL feature extraction and description process are determined by experiments, as shown in Table 1. For the LIL local description, the widths of the blocks w B = {w B 1 , w B 2 , · · · , w B 9 } and the number of rows M are determined experimentally. If the blocks are very wide or numerous, then the descriptor is not sufficiently fine to achieve high accuracy, and pixels very far from the LIL contribute little to the description. On the contrary, if the blocks are very narrow or few, the robustness of LIL descriptors under large local distortions will be reduced. M = 9 is a proper value according to [17,28]. Several combinations are tested on twenty-four pairs of real remote sensing images, and the average results are reported in Table 2. To balance accuracy and dimension, w B = {11, 9, 7, 6, 5, 6, 7, 9, 11} is set. The selection of the number of columns N is based on the same consideration. Table 3 shows the average results for several combinations of N and N 0 . In this study, N is set to 4 to balance the dimension and discriminability of the descriptor, and experiments show that N 0 = 2 maximizes the number of correct matches when N = 4. The dimension of the LIL local descriptor is 576, which is eight times that of LBD and RLI because of the two LSRs and the division along the direction of the line segment. Although the dimension is relatively high, the descriptor is much finer and more robust without excessively increasing the computational load of matching.

Experimental Results on Images with Simulated Transformations
Experiments are conducted on images with simulated transformations, and the proposed algorithm is compared with several similar state-of-the-art algorithms, including SIFT [1], LP [24], MSLD [16], RLI [28], and LJL [27]. SIFT is the most widely used algorithm for image matching. Figure 7 shows the Precision and RMSE of the different algorithms tested on simulated images. The scale ratio ranges from 1 to 0.5, and the rotation angle is in the range of [0°, 180°]. Illumination variations are simulated by increasing or decreasing the image intensity; in Figure 7c, negative coordinates indicate dark illumination, and positive coordinates indicate bright illumination. Regarding occlusion, cloud patches (with gray values of 230-255) are extracted from a real remote sensing image with cloud cover and added randomly to the test image, with the number of patches varying from 8 to 20.
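The scale, rotation, and illumination simulations above can be reproduced with a minimal sketch like the following (hypothetical helper names; cloud-patch extraction is omitted):

```python
import numpy as np

def scale_rotation_matrix(scale, angle_deg):
    """2x2 linear part of the simulated scale/rotation transformation."""
    a = np.deg2rad(angle_deg)
    return scale * np.array([[np.cos(a), -np.sin(a)],
                             [np.sin(a),  np.cos(a)]])

def shift_illumination(img, delta):
    """Simulate bright (delta > 0) or dark (delta < 0) illumination by
    shifting the intensity and clipping to the valid 8-bit range."""
    return np.clip(img.astype(np.int32) + delta, 0, 255).astype(np.uint8)
```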

Comparison of Matching Results with Different Methods
The data in Figure 7 show that the LIL algorithm can reach nearly 100% Precision and sub-pixel accuracy under scale, rotation, and illumination variations as well as cloud cover. MSLD performs the worst under scale and rotation transformations: it cannot achieve registration given considerable scale changes, and it has almost no rotation invariance. LP retains rotation invariance only within 0°-90°. The RMSE of RLI can reach the order of 10^{-13} given sufficient matches, as shown in Figure 7d. However, RLI loses robustness under rotation variations or dark lighting conditions, and its overall accuracy is lower than that of the other methods. SIFT, LJL, and the proposed algorithm remain robust in these experiments. Except for a relatively high RMSE in the case of large rotation angles, SIFT can achieve higher registration accuracy than LIL because of its numerous detected points. The RMSE of LJL under extreme conditions is sometimes slightly higher than that of the proposed algorithm, while its Precision is lower than ours. All of these algorithms have good robustness against occlusion, but the registration accuracies of the LP and MSLD algorithms are relatively low.

Comparison of SIFT and LIL Descriptor
To validate the superiority of the LIL local descriptor, this study compares it with the SIFT descriptor on simulated images with large background variations, namely different combinations of scale, rotation, illumination, and occlusion variations. In all experiments, the feature points are intersections extracted by the LIL detection and are described by the SIFT and LIL descriptors, respectively. Figure 8 shows the Precision after brute-force matching. Overall, the Precision of the matching results using the LIL descriptor is 3-18% higher than that using the SIFT descriptor, which proves that the LIL descriptor has higher discriminability for the same feature points. The advantage of the LIL descriptor is especially obvious for the low-precision test results.
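A brute-force nearest-neighbour matcher of the kind used here can be sketched as follows (the distance-ratio test is an illustrative choice and not necessarily the acceptance rule used in the paper):

```python
import numpy as np

def brute_force_match(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a (m x d) to its nearest neighbour in
    desc_b (n x d), keeping it only if it passes the distance-ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)   # Euclidean distances
        nearest, second = np.argsort(dists)[:2]      # two closest candidates
        if dists[nearest] < ratio * dists[second]:   # unambiguous best match
            matches.append((i, int(nearest)))
    return matches
```

The same routine can be run on SIFT and LIL descriptors of identical intersection points, so any difference in Precision is attributable to the descriptor alone.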


Experimental Results on Real Multi-Temporal Remote Sensing Images
Multi-temporal remote sensing images of urban areas with different types of large background variations are selected for testing to validate the robustness of the proposed algorithm in real situations. The datasets, shown in Table 4, include images captured in different years or seasons and high-resolution remote sensing images taken before and after natural disasters, which introduce great challenges for robust and high-accuracy registration. Table 5 shows the registration results of the different algorithms on the six image pairs. The table shows that SIFT, LP, MSLD, and RLI have relatively poor performance. Although SIFT achieves high accuracy on simulated images, it performs poorly on real images with large variations: many SIFT points are detected but fail to match correctly because the descriptors cannot resist large background variations. LP and MSLD are not satisfactory in terms of matching numbers and Precision. For image pairs 4-6, the above methods are completely unable to achieve registration, and RLI diverges in the iterative process. LJL detects the most features and has good robustness, but its Precision is not high, and its numerous false matches result in a relatively large RMSE. The proposed algorithm is the most robust and accurate among the compared methods. Although the number of detected matches is not large, the Precision reaches almost 100% on these image pairs, and the registration accuracy reaches the sub-pixel level. Figures 9-14 illustrate the matching results of these algorithms, where yellow and red lines indicate correct and false matches, respectively. Figure 9 shows a comparison of the matching results for image pair 1, in which bridges and buildings undergo great changes over time. Figure 9a shows that SIFT detects many false matches that do not conform to the spatial relations.
As shown in Figure 9b-d, the matches detected by LP, MSLD, and RLI are few in number, distributed unevenly, and contain many outliers, so these methods cannot realize correct registration. Although LJL detects numerous matches, as shown in Figure 9e, its final matches are clustered on the right side of the image because the bridge on the left side changes greatly and has few features. Consequently, the registration accuracy of LJL is relatively low. The matching intersections detected by the proposed algorithm are uniformly distributed, so higher accuracy can be obtained. Figure 10 depicts a comparison of the matching results for image pair 2. This image pair has many line features, and most buildings remain intact before and after the hurricane. However, the viewpoint changes slightly between the two images, so the tilt and shadow of buildings considerably affect line feature-based algorithms. Because the tops of buildings are not in the same plane as the ground, their line segments and intersections appear at different positions in the reference and target images, and matching errors easily occur. The false matches of LJL in Figure 10e are mostly at the tops of buildings. The proposed algorithm eliminates many false matches at the tops of buildings, and the retained matches are mostly on the ground, thus ensuring registration accuracy.
Figure 11 compares the matching results for image pair 3, in which a flood seriously damages the farmland and towns located in the lower left of the image. The correct matches of SIFT, LP, and MSLD are clustered in the airport area, and this uneven distribution decreases the registration accuracy. LJL yields a uniform matching distribution and can achieve sub-pixel registration accuracy, but it has many false matches. RLI and the proposed algorithm achieve 100% Precision on this image pair, indicating that RLI performs better on images with salient features and small variations.
The image pair in Figure 12 covers a farmland area that has rich line features. The spectral information varies greatly between the images, and SIFT, LP, MSLD, and RLI fail to register due to the changes in geomorphological features caused by seasonal variation. LJL also produces poor matching results and fails to register these images, whose line features are not distinct and dense, whereas the proposed algorithm remains robust.
Figures 13 and 14 show that the proposed method performs robustly on image pairs with scale variation and partial overlap, respectively, whereas LJL has low Precision despite its numerous correct matches. However, the proposed method performs relatively worse on images with large scale variation: fewer correct matches result in lower registration accuracy than in the other experiments. Besides, the intersections in Figure 14 are unevenly distributed due to the scarcity of artificial objects.

Comparison and Analysis of Outlier Filtering
To further evaluate the performance of the LIL outlier removal algorithm, we compare it with RANSAC on real remote sensing images. IN is the number of initial matches, IC is the number of correct matches in the initial set, and IF is the number of initial false matches. The experimental results of LIL outlier removal and RANSAC are listed in Table 6. The affine models estimated by RANSAC are inaccurate, and the registration errors are large. Compared with LIL outlier removal, RANSAC detects a similar number of correct matches but retains many false matches. By contrast, LIL can remove nearly all false matches while retaining as many correct matches as possible. From Table 6, Accuracy, Precision, Recall, and Specificity can be calculated, as shown in Figure 15. Among the four indicators, the Accuracy, Precision, and Specificity of LIL are all higher than those of RANSAC, and the detected matches conform to the actual situation more accurately, with a lower false alarm rate. Recall is relatively low, i.e., LIL is weak in identifying correct matches, because some correct matches with large registration errors are eliminated to ensure high registration accuracy. Given that sufficient correct matches are selected, a transformation model with high accuracy can still be estimated.
For image pair 2, which has a low Recall of LIL outlier removal, we analyze the RMSE of the correct matches, as shown in Table 7. LIL removes numerous matches with relatively high errors, so the estimated transformation model is more accurate than that of RANSAC, and its registration accuracy is higher. The time consumptions of RANSAC and LIL outlier removal are also listed in Table 6. RANSAC takes less time than the proposed method, although for image pairs 1, 4, and 6, the computational cost of LIL is close to that of RANSAC. Because the dimension of the variation matrix of relative positions grows with the number of initial matches, the time complexity is O(n^2/2); therefore, the computational time for image pairs 2 and 3 is much higher than that for the others.
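Once the outliers are removed, estimating the affine model from the retained matches reduces to a plain least-squares fit; a minimal sketch (the 2x3 matrix layout is an assumption for illustration):

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine model T such that dst is approximately
    [src | 1] @ T.T, fitted from the retained point matches."""
    A = np.hstack([src, np.ones((len(src), 1))])  # homogeneous source points
    T, *_ = np.linalg.lstsq(A, dst, rcond=None)   # solve A @ T ~ dst
    return T.T
```

Because the fit uses every retained match, removing the few matches with large residuals (as LIL does) directly tightens the estimated model, which is why a lower Recall can still yield a lower RMSE.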

Discussion
The proposed LIL-based registration algorithm for satellite remote sensing images performs well on simulated and real images with urban scenes.
On images with simulated transformations, the LIL structure shows strong scale and rotation invariance. Compared with the line matching-based algorithms LP and MSLD, the intersections of the proposed algorithm are more robust to edge fragmentation and occlusion, and their positioning accuracy is higher. The proposed algorithm can resist large scale, rotation, and illumination variations as well as occlusion, achieving registration with sub-pixel accuracy and high Precision on general images. In addition, the matching results show that the LIL local descriptor has higher discriminability than the SIFT descriptor.
For real images with large background variations, the average RMSE and Precision of the proposed algorithm are 0.78 and 99.1%, respectively. By contrast, SIFT, LP, and MSLD, which use only local gradient information and local structural constraints, have relatively low registration accuracy or are even unable to register images with large global geographic structure variations. The robustness of RLI is poor: it iteratively selects triplets of intersections of matching lines for registration, so the performance of intersection matching depends on line matching. For images with small variations, high-accuracy registration can be achieved; however, for images with large background variations, if the matches used for the initial model estimation have large registration errors, the iterative refinement of the transformation model easily diverges. LJL performs second best among all methods, verifying the stability of the LJL structure, but the LJL descriptor is weak in resisting large radiation variations, as in image pair 4. In addition, although LJL can detect numerous correct matches, it has high computational complexity and is time consuming in description and matching: each LJL constructed in the original images is adjusted to all images in the pyramids and described there, whereas the proposed algorithm describes each LIL only once. The LJL match propagation also needs more calculation steps, including local homography estimation and individual line segment matching, whereas the LIL outlier removal uses only simple affine properties and propagates once. For remote sensing images with complex textures, for example, measured on a 2.8 GHz Intel(R) Core(TM) processor with 16 GB of RAM, LJL consumes approximately 2 h to register an image pair with about 5000 and 4000 LJLs, whereas LIL takes around 1 min 10 s, a great reduction in computational cost.
The reasons for the excellent performance of the proposed algorithm are as follows: (1) The LIL structure describes the contours of objects in the image well, and intersections reduce the impact of broken line segments and have more accurate positioning under large background variations. (2) The LIL local feature description utilizes the neighborhood information of the two lines and their intersection, and the division of the descriptor region is more detailed than that of LBD and RLI. Besides, differing from traditional circular descriptors such as those of SIFT and LJL, the double-rectangular descriptor contains more structural information. (3) The LIL outlier removal strategy, which uses the LIL structure and the relative position changes between any pair of LILs, can eliminate nearly all false matches that do not conform to the spatial relations, even given many stubborn outliers in the matching set. The proposed algorithm performs well on most remote sensing images of urban areas.
The proposed algorithm still has some limitations. (1) Intersections tend to be unevenly distributed due to the influence of the line segment distribution; when clustered in some areas, many similar points at close positions complicate the matching process. (2) The division and dimension of the descriptor are not optimal: the width and length of the blocks are not adaptive but depend on empirical values. Moreover, the division of the line support region along the direction of the line segment leads to an excessive descriptor dimension and increases the computational complexity. (3) The proposed algorithm has many false matches in the initial matching set and relies heavily on the LIL outlier removal algorithm. Future work includes optimizing the structure and parameters of the descriptor to retain more significant features.

Conclusions
Given that many contours and edges exist in urban areas and the locations of intersections are less affected by large background variations than those of line endpoints, this paper proposes a registration algorithm based on the line-intersection-line structure for satellite remote sensing images of urban scenes. The local information and spatial relations of the LIL structure are utilized to design the description and matching strategy. First, multi-scale line segments are detected, and several constraints are applied to extract LIL features. Next, LIL local descriptors are constructed from the pixel gradients of the LIL neighborhood to realize preliminary matching. Finally, a graph-based LIL outlier removal method is implemented using the LIL structure and the variations in relative position between matches in the reference and target images; outliers are eliminated successively to estimate the affine transformation model.
The proposed algorithm performs well on simulated and real images. It resists large scale, rotation, illumination, and occlusion variations and has good robustness against large background variations, achieving sub-pixel accuracy and high-Precision registration. The LIL local descriptor has high discriminability and invariance. In addition, the LIL outlier removal strategy can identify as many false matches as possible while retaining sufficient correct matches to estimate a high-accuracy transformation model with high robustness.
In summary, the proposed algorithm is more accurate and robust against background variations than the compared state-of-the-art methods. The parameters and structure of the LIL will be further optimized in future work.