Automatic Registration Method for Optical Remote Sensing Images with Large Background Variations Using Line Segments

Image registration is an essential step in the process of image fusion, environment surveillance and change detection. Finding correct feature matches during the registration process proves to be difficult, especially for remote sensing images with large background variations (e.g., images taken pre and post an earthquake or flood). Traditional registration methods based on local intensity probably cannot maintain steady performances, as differences are significant in the same area of the corresponding images, and ground control points are not always available in many disaster images. In this paper, an automatic image registration method based on the line segments on the main shape contours (e.g., coastal lines, long roads and mountain ridges) is proposed for remote sensing images with large background variations because the main shape contours can hold relatively more invariant information. First, a line segment detector called EDLines (Edge Drawing Lines), which was proposed by Akinlar et al. in 2011, is used to extract line segments from two corresponding images, and a line validation step is performed to remove meaningless and fragmented line segments. Then, a novel line segment descriptor with a new histogram binning strategy, which is robust to global geometrical distortions, is generated for each line segment based on the geometrical relationships,including both the locations and orientations of theremaining line segments relative to it. As a result of the invariance of the main shape contours, correct line segment matches will have similar descriptors and can be obtained by cross-matching among the descriptors. Finally, a spatial consistency measure is used to remove incorrect matches, and transformation parameters between the reference and sensed images can be figured out. Experiments with images from different types of satellite datasets, such as Landsat7, QuickBird, WorldView, and so on, demonstrate that the proposed algorithm is automatic, fast (4 ms faster than the second fastest method, i.e., the rotationand scale-invariant shape context) and can achieve a recall of 79.7%, a precision of 89.1% and a root mean square error (RMSE) of 1.0 pixels on average for remote sensing images with large background variations.


Introduction
Image registration is the process of geometrically-aligning two or more images of the same scene taken from different viewpoints, at different times or by different sensors, and the main objective of it is to find the most appropriate transformation parameters between the images [1][2][3][4].It has been widely used in the field of remote sensing, including image fusion, change detection [5][6][7], disaster analysis [8], semantic labelling of images [9] and mapping sciences [10][11][12].Classical methods generally fall into two categories: area-based methods and feature-based methods [1].The majority of registration algorithms consist of four steps: feature detection, feature matching, transformation model estimation and image resampling [13][14][15].Many algorithms thathad been proposed in order to register remote sensing images with geometric and illumination differences mainly focused on the following parts: feature detection, feature description and feature matching [16][17][18][19].
When a disaster, such as a flood or earthquake, takes place, images taken pre and post the disaster should be registered rapidly in order to do the change detection [20].Registration of this type of remote sensing image pairs turns out to be difficult because significant intensity differences always exist in the same area of the corresponding images.The main characteristics of such image pairs are as follows: local intensity values change significantly; local objects, such as blocks or buildings, are ruined; changes in scale and viewpoint exist in images; the main shape contours, such as mountain ridges, long roads and coastal lines, remain relatively stable.Points, lines and regions are the three most common features used in the feature-based registration method [21][22][23], and the information of intensity and shape structure isusually used to describe the feature [23][24][25].The most popular local intensity-based feature detector and descriptor is the scale-invariant feature transform (SIFT) [26], which uses the local maximum of the scale space as the point feature and describes it depending on the gradient location and orientation around the point.The SIFT descriptor is highly discriminative, and many other descriptors, such as the speeded-up robust feature (SURF) [27], the gradient location and orientation histogram (GLOH) [28] and the affine-SIFT (ASIFT) [29], have been developed based on it.Bradley [30] presented a complex scale-invariant feature transform (CSIFT) algorithm, which is suitable for complex-valued images.Sedaghat et al. [31] proposed a uniform robust scale-invariant feature matching (UR-SIFT) approach for optical remote sensing images, which had solved the problem of the uneven distribution of the SIFT point features.However, SIFT-based features are hard to detect and match in remote sensing images with large background variations because of significant intensity changes in their neighborhoods.Ghosh et al. [32] introduced a method containing both registration and segmentation based on the advances in sparsity learning.An efficient and accurate set-based registration algorithm for aerial images was proposed by Arandjelović [33].This algorithm formulated the joint registration problem of a set of image sequences as an optimization scheme, and both robustness and high accuracy could be obtained.A few georeferencing-based methods had also been proposed in recent years [34,35], but ground control points (GCP) are not always available in remote districts and mountainous areas.Generally, the main shape contours, such as coastal lines, long roads and mountain ridges, hold relatively stable information, although global geometrical distortions between two corresponding images are more complex than local ones.The shape context (SC) proposed by Belongie et al. [36] is an effective and rich shape-based descriptor, which is used to describe each point of a shape silhouette based on the geometrical relationships of the remaining points relative to it, and it has been widely used in shape retrieval and trademark recognition.Huang [37] firstly introduced the shape context into remote sensing image registration.The Harris corner detector and the shape context were used as the feature detector and descriptor, respectively, and preferable results can be achieved in images with small background variations.However, its histogram binning strategy is not appropriate for global geometrical distortions, which may produce many outliers.Jiang [38] proposed a rotation and scale invariant shape context (RSISC) using a multi-scale morphological edge detector and an improved shape context descriptor, which was specially designed for global geometrical distortions.The magnitude of Fourier transform in one dimension was applied to achieve rotation invariance, which once was a big obstacle for the widespread use of the shape context.Nonetheless, RSISC cannot cope with high-resolution satellite images, because the gradient orientation of the edge point is too unreliable to be used.Arandjelović [39] proposed an object matching method using boundary descriptors for smooth and untextured images.Boundary curves were segmented from images, and equidistant key points could be obtained by sampling the curves.Then, the boundary normals of a local profile around one key point were utilized to describe this point.Although this algorithm isvery successful when handling the matching problem in many classical object image sets, there still exist two limitations when dealing with images taken pre and post disasters: since the key points are drawn independently from the two non-closed contours, there is inevitable jitter noise due to the differences of the starting points; the local descriptor is more sensitive to local noise and incorrect shape components, which commonly exist in remote sensing images.
Several line segment matching algorithms designed for industrial vision applications have also been reported in recent years.Arandjelović [40] proposed a gradient edge mapping algorithm for frontal face recognition, but the vertical symmetry of faces was utilized in the edge extraction process, which might not be effective in other scenarios.Wang et al. [41] defined a robust descriptor called MSLD (mean-standard deviation line descriptor) for line matching.The neighbor region of each line segment is divided into non-overlapped sub-regions, and the descriptor is generated using a SIFT-like strategy.Zhang et al. [42] presented a line matching method utilizing both the local appearance of line segments and their geometrical attributes.López [43] proposed a novel approach for line detection and matching.A phase-based edge detector and a fusion stage are performed for line detection, and an iterative process utilizing both the structural information and the appearance of each line is performed for line matching.Although the line matching algorithms mentioned above can achieve preferable results in certain situations, they may be non-effective for images taken pre and post disasters as a result of the use of local intensity information.
In order to register remote sensing images with large background variations, in which most local intensity information is unreliable, we proposed an automatic image registration method based on the line segments, which is described using the information of the main shape contour (named RMLSM for short).First, EDLines (Edge Drawing Lines) [44] is used to extract line segments from the reference and sensed images, and a line validation step is performed to remove meaningless and fragmented ones thatdo not belong to the main shape contour.Then, a novel line segment descriptor is generated for each line segment based on the geometrical relationships of the remaining line segments relative to it.As the main shape contours are relatively stable, correct line matches could be obtained by cross-matching among their descriptors.Finally, a spatial consistency measure is utilized to increase matching accuracy, and transformation parameters between the reference and sensed images can be figured out.Experiments on different types of satellite datasets comprising both rough terrains (e.g., mountainous areas or urban regions)and flat areas demonstrate that the proposed algorithm is automatic, fast and can achieve high accuracy (evaluated by recall, precision, correct matches (CM) and root mean square error (RMSE)) for remote sensing images with large background variations.
The rest of this paper is organized as follows.In Section 2, the proposed method is described in detail.Experiments and results are presented in Section 3. The discussion and conclusions parts are given in Sections 4 and 5 respectively.

Methodology
The entire process of the RMLSM algorithm is shown in Figure 1, and it can be divided into the following steps: feature detection, feature matching, transformation model estimation and accuracy evaluation.An image pair taken before and after a tsunami in Sendai city, Japan, is shown in Figure 2a, and there are two types of line segments with red and yellow colors in Figure 2b.We can see that although the backgrounds change significantly between two images, the main shape contours, which are depicted in Figure 2b using red lines, remain relatively invariant.Line segments depicted in yellow are meaningless and fragmented ones that should be discarded during the line segment detection process.

Feature Detection
As shown in Figure 1, this section can be divided into two parts: line segments' detection and line segments' validation.From Figure 2b, we can see that there exist two types of line segments.Red ones mainly consist of so-called meaningful line segments, which constitute the main invariant information (i.e., the main shape contour), while yellow ones mainly consist of meaningless and fragmented line segments, which may perturb the feature descriptor generation.The main characteristics of these meaningful line segments are as follows: large gradient magnitude; long distance; small slope differences between adjacent line segments.EDLines and a line validation step

Feature Detection
As shown in Figure 1, this section can be divided into two parts: line segments' detection and line segments' validation.From Figure 2b, we can see that there exist two types of line segments.Red ones mainly consist of so-called meaningful line segments, which constitute the main invariant information (i.e., the main shape contour), while yellow ones mainly consist of meaningless and fragmented line segments, which may perturb the feature descriptor generation.The main characteristics of these meaningful line segments are as follows: large gradient magnitude; long distance; small slope differences between adjacent line segments.EDLines and a line validation step

Feature Detection
As shown in Figure 1, this section can be divided into two parts: line segments' detection and line segments' validation.From Figure 2b, we can see that there exist two types of line segments.Red ones mainly consist of so-called meaningful line segments, which constitute the main invariant information (i.e., the main shape contour), while yellow ones mainly consist of meaningless and fragmented line segments, which may perturb the feature descriptor generation.The main characteristics of these meaningful line segments are as follows: large gradient magnitude; long distance; small slope differences between adjacent line segments.EDLines and a line validation step are used in this section in order to obtain as many meaningful line segments as possible from the reference and sensed images.

Line Segments' Detection Using EDLines
EDLines is a real-time line segment detector with high accuracy and blazing speed, and it is very suitable for extracting the line segment that constitutes the main shape contour.This algorithm is composed of three parts: (1) Given a grayscale image, a fast and novel edge detector called edge drawing (ED) [45], which can produce a set of contiguous, clean chains of pixels, is run.These pixels intuitively correspond to object boundaries.(2) Next, line segments are extracted from the generated pixel chains using the least squares line fitting method.(3) Finally, a line selection step based on the Helmholtz principle is used to eliminate incorrect line segment detections.
Compared toother famous edge detectors, such as Canny, etc., ED produces a set of contiguous, clean and connected chains of edge pixels rather than disjoint, independent and discontinuous entities.The flow chart of ED is shown in Figure 3.The comparative edge detection results of ED, Canny and Prewitt implemented on the left image of Figure 2a are shown in Figure 4. From Figure 4, it can be observed that a set of clean, contiguous, well-localized, one-pixel-wide edge segments isextracted by ED instead of only an isolated binary edge map (Prewitt) or an edge map containing many disjoint, unattended noisy edges (Canny).
Remote Sens. 2016, 8, 426 5 of 21 are used in this section in order to obtain as many meaningful line segments as possible from the reference and sensed images.

Line Segments' Detection Using EDLines
EDLines is a real-time line segment detector with high accuracy and blazing speed, and it is very suitable for extracting the line segment that constitutes the main shape contour.This algorithm is composed of three parts: (1) Given a grayscale image, a fast and novel edge detector called edge drawing (ED) [45], which can produce a set of contiguous, clean chains of pixels, is run.These pixels intuitively correspond to object boundaries.(2) Next, line segments are extracted from the generated pixel chains using the least squares line fitting method.(3) Finally, a line selection step based on the Helmholtz principle is used to eliminate incorrect line segment detections.
Compared toother famous edge detectors, such as Canny, etc., ED produces a set of contiguous, clean and connected chains of edge pixels rather than disjoint, independent and discontinuous entities.The flow chart of ED is shown in Figure 3.The comparative edge detection results of ED, Canny and Prewitt implemented on the left image of Figure 2a are shown in Figure 4. From Figure 4, it can be observed that a set of clean, contiguous, well-localized, one-pixel-wide edge segments isextracted by ED instead of only an isolated binary edge map (Prewitt) or an edge map containing many disjoint, unattended noisy edges (Canny).Next, the line segment extraction step is performed by splitting an edge segment consisting of a contiguous chain of edge pixels into one or more straight line segments.Walking over the pixels in sequence, lines are fit to pixels by means of the least squares line fitting method until the error exceeds a certain threshold (e.g., 1 pixel error).A new line segment is generated when the error exceeds this threshold.The remaining pixels of the chain are processed recursively until the last pixel of the edge segments and a set of line segments with endpoints expressed by sub-pixel coordinates are obtained.Table 1 shows the algorithm thatis used to extract line segments from a set of edge segments.
Finally, a line selection step is implemented based on the Helmholtz principle, which basically states that the expectation of one structure by chance must be very low for this structure to be perceptually meaningful [46].This step can help to eliminate a part of the meaningless and fragmented lines.To make this selection process by the Helmholtz principle concrete, "Number of False Alarms(NFA)" of a line segment is defined as follows: Let X be a line segment of length "ld" with at least "z" points having their directions (thedirection of a unit vector, which is orthogonal to the unit gradient vector of the reference point) aligned with the direction of X in an image of size N N × pixels.The NFA of X is defined as: are used in this section in order to obtain as many meaningful line segments as possible from the reference and sensed images.

Line Segments' Detection Using EDLines
EDLines is a real-time line segment detector with high accuracy and blazing speed, and it is very suitable for extracting the line segment that constitutes the main shape contour.This algorithm is composed of three parts: (1) Given a grayscale image, a fast and novel edge detector called edge drawing (ED) [45], which can produce a set of contiguous, clean chains of pixels, is run.These pixels intuitively correspond to object boundaries.(2) Next, line segments are extracted from the generated pixel chains using the least squares line fitting method.(3) Finally, a line selection step based on the Helmholtz principle is used to eliminate incorrect line segment detections.
Compared toother famous edge detectors, such as Canny, etc., ED produces a set of contiguous, clean and connected chains of edge pixels rather than disjoint, independent and discontinuous entities.The flow chart of ED is shown in Figure 3.The comparative edge detection results of ED, Canny and Prewitt implemented on the left image of Figure 2a are shown in Figure 4. From Figure 4, it can be observed that a set of clean, contiguous, well-localized, one-pixel-wide edge segments isextracted by ED instead of only an isolated binary edge map (Prewitt) or an edge map containing many disjoint, unattended noisy edges (Canny).Next, the line segment extraction step is performed by splitting an edge segment consisting of a contiguous chain of edge pixels into one or more straight line segments.Walking over the pixels in sequence, lines are fit to pixels by means of the least squares line fitting method until the error exceeds a certain threshold (e.g., 1 pixel error).A new line segment is generated when the error exceeds this threshold.The remaining pixels of the chain are processed recursively until the last pixel of the edge segments and a set of line segments with endpoints expressed by sub-pixel coordinates are obtained.Table 1 shows the algorithm thatis used to extract line segments from a set of edge segments.

Final edge map
Finally, a line selection step is implemented based on the Helmholtz principle, which basically states that the expectation of one structure by chance must be very low for this structure to be perceptually meaningful [46].This step can help to eliminate a part of the meaningless and fragmented lines.To make this selection process by the Helmholtz principle concrete, "Number of False Alarms(NFA)" of a line segment is defined as follows: Let X be a line segment of length "ld" with at least "z" points having their directions (thedirection of a unit vector, which is orthogonal to the unit gradient vector of the reference point) aligned with the direction of X in an image of size N N × pixels.The NFA of X is defined as: Next, the line segment extraction step is performed by splitting an edge segment consisting of a contiguous chain of edge pixels into one or more straight line segments.Walking over the pixels in sequence, lines are fit to pixels by means of the least squares line fitting method until the error exceeds a certain threshold (e.g., 1 pixel error).A new line segment is generated when the error exceeds this threshold.The remaining pixels of the chain are processed recursively until the last pixel of the edge segments and a set of line segments with endpoints expressed by sub-pixel coordinates are obtained.Table 1 shows the algorithm thatis used to extract line segments from a set of edge segments.
Finally, a line selection step is implemented based on the Helmholtz principle, which basically states that the expectation of one structure by chance must be very low for this structure to be perceptually meaningful [46].This step can help to eliminate a part of the meaningless and fragmented lines.To make this selection process by the Helmholtz principle concrete, "Number of False Alarms(NFA)" of a line segment is defined as follows: Let X be a line segment of length "ld" with at least "z" points having their directions (thedirection of a unit vector, which is orthogonal to the unit Remote Sens. 2016, 8, 426 6 of 21 gradient vector of the reference point) aligned with the direction of X in an image of size N ˆN pixels.The NFA of X is defined as: where "q" represents the accuracy of the line direction (q is set to be 0.125, referring to [44]).The line selection step can be realized as follows: we compute the NFA of a line segmentand preserve it if NFApld, zq ď ε; otherwise, this line segment will be rejected.Desolneux et al. [47] suggest setting ε to 1, which indicates that at most, one false detection is allowed per image.Figure 5a shows EDLines in action on an image pair taken pre and post a tsunami.Output "line_equation" FitLines(pixelChain+lineLen, pixel_num-lineLen); // Extract line segments from the remain pixels } //end-FitLines

Line Segments' Validation
A preliminary set of line segments is obtained after the implementation of EDLines.We see from Figure 5a that although the main shape contours have been extracted, there still exist many fragmented line segments, which are likely to perturb the feature matching process.As these fragmented line segments are usually discontinuous and shorter than the ones on the main shape contour, we use a length threshold len t to remove these line segments.In order to distinguish meaningful line segments from fragmented ones, this threshold should be determined appropriately.For the evaluation of this purely empirical threshold, we use lm to represent the median length of all line segments, and a sequence of candidate thresholds from 0.2 ˆlm to 1.8 ˆlm with an interval of 0.2 ˆlm isevaluated on ten real image pairs.Two evaluation criteria, i.e., recall and precision, which are defined as the ratio of the number of correct matches obtained after validation to the number of all original correct matches and the ratio of the number of correct matches obtained after validation to the number of all matches after validation, respectively, are utilized in order to test the effectiveness of each candidate threshold.The recall and precision values against the candidate threshold are shown in Figure 6, and each point in the graph represents the average of ten image pairs.According to [21], in order to obtain a better result in the shape-based feature matching algorithm, both the original recall and precision values would be best to be more than 0.7.We see from Figure 6 that in order to get a relatively better recall-precision tradeoff, the most appropriate value of len t is in r0.8 ˆlm, 1.2 ˆlms, and len t " 1.0 ˆlm is chosen in this method for all datasets.Figure 5b shows line segments' validation in action on Figure 5a.
which are defined as the ratio of the number of correct matches obtained after validation to the number of all original correct matches and the ratio of the number of correct matches obtained after validation to the number of all matches after validation, respectively, are utilized in order to test the effectiveness of each candidate threshold.The recall and precision values against the candidate threshold are shown in Figure 6, and each point in the graph represents the average of ten image pairs.According to [21], in order to obtain a better result in the shape-based feature matching algorithm, both the original recall and precision values would be best to be more than 0.7.We see from Figure 6 that in order to get a relatively better recall-precision tradeoff, the most appropriate value of    all line segments, and a sequence of candidate thresholds from × 0.2 lm to × 1.8 lm with an interval of × 0.2 lm isevaluated on ten real image pairs.Two evaluation criteria, i.e., recall and precision , which are defined as the ratio of the number of correct matches obtained after validation to the number of all original correct matches and the ratio of the number of correct matches obtained after validation to the number of all matches after validation, respectively, are utilized in order to test the effectiveness of each candidate threshold.The recall and precision values against the candidate threshold are shown in Figure 6, and each point in the graph represents the average of ten image pairs.According to [21], in order to obtain a better result in the shape-based feature matching algorithm, both the original recall and precision values would be best to be more than 0.7.We see from Figure 6 that in order to get a relatively better recall-precision tradeoff, the most appropriate value of

Feature Matching
In this section, we assume that the main shape contour of an image is essentially captured by a subset of line segments originated from the edge segments.More concretely, the main shape contour can be represented by a discrete set of line segments produced by Section 2.1.The final line segment matching set can be obtained as follows: (1) a descriptor is proposed for each line segment to find its best match in the other image; (2) a spatial consistency measure is used to remove incorrect matches obtained in Step (1).

Feature Descriptor
Two line segment sets obtained in Section 2.1 can be denoted as L " tl i u pi " 1, 2, 3...mq and S " s j ( pj " 1, 2, 3...nq; m and n are the numbers of line segments in L and S, respectively.For each line segment l i in the first image, we want to find its "best" matching line segment s j in the second image.Experience suggests that the richer the information one descriptor has, the easier the matching process will be.As intensity information around the line segment is unreliable, we consider the set of vectors originating from the position of a line segment to that of all other line segments on the main shape contour.These vectors represent the relationship of the entire shape contour relative to the reference line segment.Obviously, a descriptor constructed by this set of vectors is rich enough.In our approach, the geometric attributes of the line segments are represented by their midpoints, which can be denoted as P " tp i u " tx i , y i u pi " 1, 2, 3...mq (p i corresponds to l i ) in the first image and the orientations of their normal vectors.For a line segment l i in the first image, we compute a histogram h i ptq expressing the relative coordinates of the remaining m ´1 midpoints to its midpoint p i , h i ptq " # pp j ´pi q P binptq, pj " 1, 2, 3...m, j ‰ iq ( We use a new histogram binning strategy as shown in Figure 7b, and Figure 7a shows the log-polar space used in the shape context(SC), which is more sensitive to the location of nearby points than to those farther away.However, when log-polar space is applied, bins near the center become too small, which may lead to the sensitivity of the descriptor to the pixel location errors, and bins far from the center become too big to describe the shape silhouette exactly.Consider that the distribution of line segments is not as dense as that of points;the proposed histogram binning strategy with relatively uniform bins seems more suitable.We use identical radial quantization lengths and larger angular quantization numbers for outer circles compared toinner circles.The radial quantization number nr and the angular quantization number tM 1 , M 2 , ..., M nr u (M 1 corresponds to the smallest circle) will be discussed in Section 3. The advantages of the proposed histogram binning strategy are as follows: (1) robustness to pixel location errors; (2) accurate description of the shape contour far from the reference line segment; (3)

Feature Matching
In this section, we assume that the main shape contour of an image is essentially captured by a subset of line segments originated from the edge segments.More concretely, the main shape contour can be represented by a discrete set of line segments produced by Section 2.1.The final line segment matching set can be obtained as follows: (1) a descriptor is proposed for each line segment to find its best match in the other image; (2) a spatial consistency measure is used to remove incorrect matches obtained in Step (1).; m and n are the numbers of line segments in L and S , respectively.For each line segment i l in the first image, we want to find its "best" matching line segment j s in the second image.Experience suggests that the richer the information one descriptor has, the easier the matching process will be.As intensity information around the line segment is unreliable, we consider the set of vectors originating from the position of a line segment to that of all other line segments on the main shape contour.These vectors represent the relationship of the entire shape contour relative to the reference line segment.Obviously, a descriptor constructed by this set of vectors is rich enough.In our approach, the geometric attributes of the line segments are represented by their midpoints, which can be denoted as { } { , }( 1,2,3... ) in the first image and the orientations of their normal vectors.For a line segment i l in the first image, we compute a histogram ( ) We use a new histogram binning strategy as shown in Figure 7b, and Figure 7a shows the log-polar space used in the shape context(SC), which is more sensitive to the location of nearby points than to those farther away.However, when log-polar space is applied, bins near the center become too small, which may lead to the sensitivity of the descriptor to the pixel location errors, and bins far from the center become too big to describe the shape silhouette exactly.Consider that the distribution of line segments is not as dense as that of points;the proposed histogram binning strategy with relatively uniform bins seems more suitable.We use identical radial quantization lengths and larger angular quantization numbers for outer circles compared toinner circles.The radial quantization number nr and the angular quantization number  As shown in Figure 7c, the same distribution of points may produce different shape contours.Therefore,in each bin, we not only count the number of line segments falling in it, but also consider the orientation of each line segment (the orientation of the normal vector of each line segment) in this bin.A unit vector is attached to each line segment with its direction equal to the orientation of this line segment l i , and it is decomposed into two vectors along the x-axis and the y-axis, respectively.The d-bin histogram h i ptq for the line segment l i is then converted to a 2d-bin histogram as follows: It is easy to see that this 2d-bin histogram can provide a more accurate description for each line segment.We normalize radial distances by a factor α, which is the mean distance among the m 2 midpoint pairs in the image to achieve scale invariance, and orientations relative to that of the reference line segment are used to achieve rotation invariance.Finally, a Gaussian weighting function (see Figure 8b) is used to increase the robustness against the geometric distortions.The complete process of the proposed descriptor generation is shown in Figure 8.Compared to the descriptor in [39] (named OMBD (Object Matching Using Boundary Descriptors) for short), which is presented for smooth and untextured object matching, our proposed descriptor seems more suitable for remote sensing images with large background variations mainly due to the following aspects.
descriptor in [39] (named OMBD (Object Matching Using Boundary Descriptors) for short), which is presented for smooth and untextured object matching, our proposed descriptor seems more suitable for remote sensing images with large background variations mainly due to the following aspects.
We use a global descriptor instead of a local one in the OMBD, which is more sensitive to local noise and incorrect shape components, which commonly exist in remote sensing images; the OMBD descriptor only considers the local profile of the boundary normal's direction around a reference key point rather than the geometric relations of them, which may result in an unstable descriptor in case disordered segments exist.
Consider a line segment i l in the first image and a line segment j s in the second image.The cost of matching these two line segments isdefined as: where '( )  We use a global descriptor instead of a local one in the OMBD, which is more sensitive to local noise and incorrect shape components, which commonly exist in remote sensing images; the OMBD descriptor only considers the local profile of the boundary normal's direction around a reference key point rather than the geometric relations of them, which may result in an unstable descriptor in case disordered segments exist.
Consider a line segment l i in the first image and a line segment s j in the second image.The cost of matching these two line segments isdefined as: where h 1 i poq and h 1 j poq denote the 2d-bin normalized histogram at l i and s j , respectively.l a and s b are deemed as a correct line segment match if the following cross-matching expressions are satisfied: That is, not only s b is the best match of l a in the second image, but l a is the best match of s b in the first image, as well.This is a cross-matching process thataims at obtaining accurate results in case fragmented line segments or occlusions exist.

A Spatial Consistency Measure
Two preliminary line segment matching sets of size Q have been obtained after the steps above, which can be denoted as Lp " tl i u " tpk i , b i qu and Sp " ts i u " pk 1  i , b 1 i q ( pi " 1, 2, 3, ..., Q, l i matches s i , parameters k and b of the line equation y " kx `b are used to represent a line segment).Although the proposed descriptor is robust and discriminative, there still exist a few outliers in the preliminary matching sets mainly due to the small differences of two main shape contours.In this part, a spatial consistency measure [48] is utilized to remove the outliers.The spatial consistency is a measurement of the closeness of the reference features to the sensed features under the present transformation, and it is usually used for point matching.We roughly use the global affine transformation, of which the parameters are denoted as T g pθq, to approximate the present transformation and to evaluate the closeness of the matches one by one.The transformation is formulated as: where px 1 , y 1 q and px, yq represent the coordinates of two corresponding points.In order to calculate T g pθq (i.e., a 1 -a 6 ) from tpk i , b i qu and pk 1 i , b 1 i q ( , px 1 , y 1 q and px, yq should be replaced by pk 1 , b 1 q and pk, bq using the equation y " kx `b. In each iteration, first, a group of affine transformation parameters T g pθq between the two corresponding images can be calculated by means of the least square method using tpk i , b i qu and pk 1 i , b 1 i q ( pi " 1, 2, 3, ..., Qq; then, an error threshold E t " tk e , b e u is set for the slopeand intercept independently,and the root mean square error (RMSE) is obtained as follows: d_b i 2 s{Qu, dpiq " pT g pθqpl i q ´si q " td_k i , d_b i u, i " 1, 2, 3, ...Q.
Both dpiq and ErT g pθqs are composed of two components, which relate to the slope and intercept, respectively.Finally, while ErT g pθqs ą E t , the match with the maximum of dpiq is removed.The line segment matching sets are renewed, and the transformation parameters are recalculated.The iteration process will not stop until ErT g pθqs ď E t , and two final line segment matching sets can be obtained.

Transformation Model Estimation
After two matching sets of line segments have been obtained, the exact transformation parameters between the reference and sensed images will be calculated.According to [21], the relationship between two remote sensing images taken from the same area can be approximately modeled by affine transformations.In this section, a global affine transformation model is applied.In general, the affine transformation consists of a translation and linear transformations (rotation, shear or scaling), and it preserves collinearity and the ratio of distance.In the ideal situation, affine transformation parameters can be figured out from three nonparallel corresponding line segment matches.However, in order to increase the accuracy and robustness, all of the line segment matches obtained from Section 2.2 are used to estimate the coefficients by means of the least square method.

Accuracy Evaluation
The accuracy evaluation of the registration is performed using recall, precision, CM and RMSE, which can be defined as follows: recall " true positives total positives (7) precision " true positives true positives `f alse positives (8) correct matches " true positives (9) where the true positives represents the number of correct line segment matches in the final matching sets, the total positives represents the number of correct line segment matches existing in two images after feature detection and the f alse positives represents the number of incorrect line segment matches in the final matching sets.Two completely correct and well-distributed point matching sets of size np, which are denoted as tpe i u and pe 1 i ( pi " 1, 2, 3..., np, pe i matches pe 1 i q, are selected manually fromthe two corresponding images by an experienced expert to calculate the RMSE, and Tpθq represents the transformation parameters obtained in Section 2.3.

Experiments and Results
In this part, the performance of the RMLSM is evaluated using several types of satellite images.Experimental results compared tothe UR-SIFT, the RSISC and the MSLD are presented to verify the effectiveness of the proposed method.Image datasets, parameter setting and experimental results are introduced in the following.

Image Datasets
Two groups of remote sensing images are used to evaluate the performance of the RMLSM, including simulated images and multisensor/multitemporal real images.Because the highlight of the RMLSM is its robustness against large background variations, the image pairs of the first group are simulated by utilizing several patches of cloud in various shapes, of which centers are distributed randomly across the image, instead of the original pixels, and a certain amount of affine transformations are also added to the images.Figure 9 shows several examples of the simulated images.The second group consists of different types of multisensor and multitemporal images, most of which are taken pre and post disasters.The complete information of all image pairs is shown in Table 2.This dataset covers a variety of high-, medium-and low-resolution images with a ground sample distance (GSD) from 0.61 to 231.65 m, and both rough terrains and flat regions are included in this dataset.All real images have significant background variations and a certain amount of scale and viewpoint differences.

Parameter Setting
In our experiments, the UR-SIFT, the RSISC and the MSLD are implemented based on the codes offered by the authors of the three algorithms.
During the feature detection, the EDLines algorithm is implemented based on the publicly-available codes on their website with the default parameters presented explicitly in [44].The length threshold len t in the line segments validation step is set to be 1.0 ˆlm, which has already been described in detail in Section 2.1.2.
In the feature matching process, there are three parameters, i.e., radial quantization number nr, angular quantization number tM 1 , M 2 , ..., M nr u and an error threshold E t .The choices of these three parameters are purely empirical, and two aspects should be considered in the selection process: higher accuracy and lower dimension of the descriptor.Several combinations are tested on ten pairs of real remote sensing images, and the average results are depicted in Table 3.The best accuracy-dimension tradeoff is obtained when nr " 6; tM 1 , M 2 , ..., M 6 u " t4, 6, 8, 10, 12, 14u, which results in a 108-dimensional descriptor.In the section of the spatial consistency measure, E t is used to remove false matches.As the slope and intercept of a line segment are two different variables, E t " tk e , b e u " t0.1, 5u is able to distinguish an outlier from an inlier based on experience.

Experimental Results
EDLines and a line segment validation step are implemented to extract line segments from all image pairs.Then, the proposed descriptor is used for line segment description and matching.Finally, accuracy comparisons with the UR-SIFT, the RSISC and the MSLD are carried out to evaluate the proposed registration method.Table 4 shows the comparative matching accuracy of the proposed descriptor and the traditional shape context using all real remote sensing images in the image datasets.The process of feature detection is identical before the comparison of the two descriptors.The matching results show that the proposed descriptor is more effective than the traditional shape context in terms of both recall and precision.9, all of which contain considerable background variations and a certain amount of affine transformations.The results presented in Figure 10 show that the proposed method outperforms the other three techniques in recall, precision and RMSE.In some cases, the CM of the RMLSM is less than that of the RSISC.However, this will have little effect on the computational accuracy of the final transformation model, because a line segment contains more information than a point.As the UR-SIFT purely depends on the intensity information in the feature matching process, it performs worst among the four algorithms for all evaluation criteria.The matching results of the RMLSM for three sample simulated image pairs are presented in Figure 11.To make the graphs easier to read, the corresponding line segments have the same number and color.Registrations for such images prove to be difficult because the intensity information is unreliable.This figure shows that although there exist significant background variations and affine transformations in the image pairs, almost all of the line segments on the main shape contour have been extracted and correctly matched.The matching results of the RMLSM for three sample simulated image pairs are presented in Figure 11.To make the graphs easier to read, the corresponding line segments have the same number and color.Registrations for such images prove to be difficult because the intensity information is unreliable.This figure shows that although there exist significant background variations and affine transformations in the image pairs, almost all of the line segments on the main shape contour have been extracted and correctly matched.Figure 12 shows the performances of the four algorithms implemented on seven real image pairs in terms of recall, precision, CM and RMSE.All real image pairs are taken pre and post disasters, and rough terrains, such as mountainous regions and building areas, are included.Results demonstrate the effectiveness and robustness of the proposed method.The UR-SIFT achieves the worst result because it purely relies on the intensity information, which is unreliable in these real disaster image pairs.The RSISC can obtain better results than the UR-SIFT mainly due to the use of shape information, but the neglect of the point orientation results in more outliers than the RMLSM.The MSLD is a line matching algorithm, but it also relies too much on the appearance and the local structure of the neighborhood of each line segment, which makes it not perform as wellas the RMLSM.However, it can be observed that the recall and precision are somewhat low for No.4 in the second group as a result of relatively complicated distortions between these two images.2, while Figure 13 covers the high-resolution satellite images, and Figure 14 covers the medium-and low-resolution ones.To make the graphs easier to understand, the corresponding line segments are presented using the same number and color.Accurate and well-distributed line segment matches can be obtained as shown in these two figures.Results demonstrate the robustness of the proposed method against large background variations and a certain amount of geometric distortions.
Figure 15 shows the registration results of the UR-SIFT (Figure 15a), the RSISC (Figure 15b)and the MSLD (Figure 15c) on three sample real image pairs respectively and the results of the RMLSM (Figure 15d-f) on all these three image pairs.Six sub-images are enlarged for visual assessment.The sub-images cover a variety of scenes, such as roads, coastal lines and mountain ridges.From the results of the RMLSM, we can see that the sensed image fits well with the reference image in spite of the significant background variations.However, it can be observed that slight mismatches around the mountain ridges (Figure 15f) still exist as a result of the relief displacement of the mountains.This is a challenge for the registration of high-resolution images, and it cannot be settled perfectly through an image-to-image registration.Figure 15 shows the registration results of the UR-SIFT (Figure 15a), the RSISC (Figure 15b)and the MSLD (Figure 15c) on three sample real image pairs respectively and the results of the RMLSM (Figure 15d-f)on all these three image pairs.Six sub-images are enlarged for visual assessment.The sub-images cover a variety of scenes, such as roads, coastal lines and mountain ridges.From the results of the RMLSM, we can see that the sensed image fits well with the reference image in spite of the significant background variations.However, it can be observed that slight mismatches around the mountain ridges (Figure 15f) still exist as a result of the relief displacement of the mountains.This is a challenge for the registration of high-resolution images, and it cannot be settled perfectly through an image-to-image registration.Because of the blazing speed of EDLines and the low dimension of the proposed descriptor, the computational complexity of the proposed registration method is the lowest among the four algorithms;the average time of the registration process is 65 ms for the UR-SITF, 47 ms for the RSISC, 96 ms for the MSLD and 43 ms for the RMLSM (using an Intel Core (TM) 2 2.93-GHz central processing unit and a memory of 2 GB).Because of the blazing speed of EDLines and the low dimension of the proposed descriptor, the computational complexity of the proposed registration method is the lowest among the four algorithms;the average time of the registration process is 65 ms for the UR-SITF, 47 ms for the RSISC, 96 ms for the MSLD and 43 ms for the RMLSM (using an Intel Core (TM) 2 2.93-GHz central processing unit and a memory of 2 GB).

Discussion
An automatic image registration method for remote sensing images with large background variations is proposed in this paper, and preferable results are obtained on both simulated and real image pairs.
The average results of the proposed method for both simulated and real image pairs are 79.7%, 89.1% and 1.0 pixels in recall, precision and RMSE, respectively.The RMLSM outperforms the other methods for all three evaluation criteria.The UR-SIFT achieves a recall of 63.1%, a precision of 70.7% and an RMSE of 1.4 pixels on average, which is the worst result among the four algorithms, because it purely depends on the intensity information during the feature matching process.The MSLD can obtain a recall of 73.9%, a precision of 79.4% and an RMSE of 1.2 pixels on average.The accuracy of the MSLD is lower than that of the RMLSM because local structure-based methods are usually more sensitive to local noise and incorrect structure components.The recall of 78.8%, the precision of 85.0%, and the RMSE of 1.1 pixels are achieved on average during the implementation process of the RSISC, of which accuracy is lower than that of the RMLSM mainly due to the unreliability of the gradient orientation of the edge point.In several image pairs, such as No.2 and No.6 in the second group, the CM of the RMLSM is lower than that of the RSISC.However, this will have little effect on the computational accuracy of the final transformation model because a line segment contains more information than a point.
The results from this study show that the shape can be utilized to constitute a feature descriptor independently when the intensity around this feature is unreliable.Most of the previous studies concentrated on local intensity-based registration methods to achieve robustness to geometric and illumination variations between the corresponding images.However, finding the invariant

Discussion
An automatic image registration method for remote sensing images with large background variations is proposed in this paper, and preferable results are obtained on both simulated and real image pairs.
The average results of the proposed method for both simulated and real image pairs are 79.7%, 89.1% and 1.0 pixels in recall, precision and RMSE, respectively.The RMLSM outperforms the other methods for all three evaluation criteria.The UR-SIFT achieves a recall of 63.1%, a precision of 70.7% and an RMSE of 1.4 pixels on average, which is the worst result among the four algorithms, because it purely depends on the intensity information during the feature matching process.The MSLD can obtain a recall of 73.9%, a precision of 79.4% and an RMSE of 1.2 pixels on average.The accuracy of the MSLD is lower than that of the RMLSM because local structure-based methods are usually more sensitive to local noise and incorrect structure components.The recall of 78.8%, the precision of 85.0%, and the RMSE of 1.1 pixels are achieved on average during the implementation process of the RSISC, of which accuracy is lower than that of the RMLSM mainly due to the unreliability of the gradient orientation of the edge point.In several image pairs, such as No.2 and No.6 in the second group, the CM of the RMLSM is lower than that of the RSISC.However, this will have little effect on the computational accuracy of the final transformation model because a line segment contains more information than a point.
The results from this study show that the shape can be utilized to constitute a feature descriptor independently when the intensity around this feature is unreliable.Most of the previous studies concentrated on local intensity-based registration methods to achieve robustness to geometric and illumination variations between the corresponding images.However, finding the invariant information is the common goal of all methods.The experimental results show that local intensity-based algorithms may fail when dealing with image pairs taken pre and post disasters, and in that case, the main shape structure becomes the most stable information.
Although EDLines and a line segment validation step can effectively extract the main shape contour consisting of line segments on it and the proposed line segment descriptor is invariant to scale, rotation and a certain amount of other affine transformations, there still exist several limitations in our registration algorithm.Global geometric distortions are more complex than the local ones, and non-rigid transformations may exist between the satellite image pairs.However, we are uncertain whether the proposed algorithm can cope with the satellite image pairs with high order non-rigid transformations.

Conclusions
In this paper, an automatic image registration method has been proposed for remote sensing images with large background variations (e.g., images taken pre and post an earthquake or flood).First, EDLines is used for line segment extraction, and a validation step is developed to reject the meaningless and fragmented features.Then, a shape-based line segment descriptor is introduced for each line segment.Finally, a spatial consistency measure is used to remove incorrect matches, and transformation parameters can be figured out.The experiment results on both simulated and real satellite image pairs demonstrate that high-quality and well-distributed line segments can be extracted and correctly matched, and a recall of 79.7%, a precision of 89.1% and an RMSE of 1.0 pixels have been achieved on average for all image pairs.All of the comparative algorithms are implemented on the same platform, and the running time of the proposed method is 4ms faster than that of the second fastest method.From this study, the global shape structure information is proven to be reliable and effective in the process of remote sensing image registration, especially when significant background variations exist in the corresponding images.
As future work, a registration method for remote sensing images with both large background variations and complex global distortions, such as non-rigid transformations and thin-plate-spline transformation, will be investigated.Some compound features, like the combination of line segments and points, could also be considered in order to increase matching accuracy.

Figure 1 .Figure 2 .
Figure 1.Main steps of the proposed algorithm.

Figure 2 .
Figure 2.An image pair taken pre and post a tsunami.(a) The original images; (b) images with different types of line segments depicted in them.

Figure 3 .
Figure 3. Flowchart of the edge drawing (ED) process.

Figure 3 .
Figure 3. Flowchart of the edge drawing (ED) process.

Figure 3 .
Figure 3. Flowchart of the edge drawing (ED) process.
chosen in this method for all datasets.

FigureFigure 5 .
Figure 5b shows line segments' validation in action on Figure 5a.

Figure 6 .
Figure 6.Two evaluation criteria with different length thresholds.

Figure 5 .
Figure 5. Results of line segment extraction.(a) Results of the EDLines algorithm on two corresponding images; (b) results of the line segment validation step implemented in (a).
chosen in this method for all datasets.

FigureFigure 5 .
Figure 5b shows line segments' validation in action on Figure 5a.

Figure 6 .
Figure 6.Two evaluation criteria with different length thresholds.Figure 6.Two evaluation criteria with different length thresholds.

Figure 6 .
Figure 6.Two evaluation criteria with different length thresholds.Figure 6.Two evaluation criteria with different length thresholds.

Figure 7 .
Figure 7. (a) The log-polar histogram bins; (b) the proposed histogram binning strategy; (c) several different shapes constituted by the same points.

Figure 7 .
Figure 7. (a) The log-polar histogram bins; (b) the proposed histogram binning strategy; (c) several different shapes constituted by the same points.
the 2d-bin normalized histogram at i l and j s , respectively.

Figure 8 .Figure 8 .
Figure 8. Generation process of the proposed descriptor.(a) An image taken after a tsunami with the extracted line segments; (b) the proposed histogram binning strategy used on a line segment and a number of blue arrows representing the orientation of the other line segments; (c) descriptor computation; (d) Gaussian weighting function; (e) descriptor vector generation.

Figure 9 .
Figure 9. Examples of the simulated images.(a) A satellite image from Landsat5, Band1; (b) the corresponding image of (a) from Landsat5, Band2; (c) asimulated image by adding sixty randomly-distributed patches of cloud to (b); (d) asimulated image by adding eighty randomly-distributed patches of cloud to (b); (e) asimulated image by adding a certain amount of affine transformations to (c); (f) asimulated image by adding a certain amount of affine transformations to (d).

Figure 9 .
Figure 9. Examples of the simulated images.(a) A satellite image from Landsat5, Band1; (b) the corresponding image of (a) from Landsat5, Band2; (c) asimulated image by adding sixty randomly-distributed patches of cloud to (b); (d) asimulated image by adding eighty randomly-distributed patches of cloud to (b); (e) asimulated image by adding a certain amount of affine transformations to (c); (f) asimulated image by adding a certain amount of affine transformations to (d).

Figure 10
Figure 10 shows the comparative results of these four algorithms implemented on the simulated image pairs in terms of recall, precision, CM and RMSE.The No.1-No.5 image pairs refer to the (a)(b)-(a)(f) in Figure9, all of which contain considerable background variations and a certain amount of affine transformations.The results presented in Figure10show that the proposed method outperforms the other three techniques in recall, precision and RMSE.In some cases, the CM of the RMLSM is less than that of the RSISC.However, this will have little effect on the computational accuracy of the final transformation model, because a line segment contains more information than a point.As the UR-SIFT purely depends on the intensity information in the feature matching process, it performs worst among the four algorithms for all evaluation criteria.

Figure 11 .Figure 11 .
Figure 11.Matching results of the RMLSM on the simulated image pairs.(a) The first image pair in the first group; (b) the fourth image pair in the first group; (c) the fifth image pair in the first group.

Figure 11 .Figure 12 .
Figure 11.Matching results of the RMLSM on the simulated image pairs.(a) The first image pair in the first group; (b) the fourth image pair in the first group; (c) the fifth image pair in the first group.

Figure 13 .
Figure 13.Matching results of the RMLSL on the high-resolution real image pairs.(a) The first image pair in the second group: QuickBird-WorldView; (b) the second image pair in the second group: QuickBird-QuickBird; (c) the third image pair in the second group: QuickBird-QuickBird; (d) the fourth image pair in the second group: QuickBird-QuickBird.

Figure 13 .Figure 14 .
Figure 13.Matching results of the RMLSL on the high-resolution real image pairs.(a) The first image pair in the second group: QuickBird-WorldView; (b) the second image pair in the second group: QuickBird-QuickBird; (c) the third image pair in the second group: QuickBird-QuickBird; (d) the fourth image pair in the second group: QuickBird-QuickBird.

Figure 14 .
Figure 14.Matching results of the RMLSM on the medium-and low-resolution real image pairs.(a) The fifth image pair in the second group: Aqua-Aqua; (b) the sixth image pair in the second group: Landsat7-Landsat7; (c) the seventh image pair in the second group: SPOT 4-SPOT 4.

Figure 15 .
Figure 15.Registration results of three sample real image pairs.(a)Registration results of the UR-SIFT on the first image pair in the second group; (b) registration results of the RSISC on the fifth image pair in the second group; (c) registration results of the MSLD on the sixth image pair in the second group; (d) registration results of the RMLSM on the first image pair in the second group; (e) registration results of the RMLSM on the fifth image pair in the second group; (f) registration results of the RMLSM on the sixth image pair in the second group.

Figure 15 .
Figure 15.Registration results of three sample real image pairs.(a) Registration results of the UR-SIFT on the first image pair in the second group; (b) registration results of the RSISC on the fifth image pair in the second group; (c) registration results of the MSLD on the sixth image pair in the second group; (d) registration results of the RMLSM on the first image pair in the second group; (e) registration results of the RMLSM on the fifth image pair in the second group; (f) registration results of the RMLSM on the sixth image pair in the second group.

Table 1 .
Line segment extraction algorithm.

Table 2 .
The complete information of all experimental satellite image pairs.
Location Image Size Bits GSD(m) DateSensor Events

Table 2 .
The complete information of all experimental satellite image pairs.

Table 3 .
Average results of the accuracy with different parameters.

Table 4 .
Comparative matching accuracy between the RMLSM and the shape context (SC).