Optical Remote Sensing Image Registration Using Spatial-Consistency and Average Regional Information Divergence Minimization via Quantum-Behaved Particle Swarm Optimization

: Due to invariance to signiﬁcant intensity di ﬀ erences, similarity metrics have been widely used as criteria for an area-based method for registering optical remote sensing image. However, for images with large scale and rotation di ﬀ erence, the robustness of similarity metrics can greatly determine the registration accuracy. In addition, area-based methods usually require appropriately selected initial values for registration parameters. This paper presents a registration approach using spatial consistency (SC) and average regional information divergence (ARID), called spatial-consistency and average regional information divergence minimization via quantum-behaved particle swarm optimization (SC-ARID-QPSO) for optical remote sensing images registration. Its key idea minimizes ARID with SC to select an ARID-minimized spatial consistent feature point set. Then, the selected consistent feature set is tuned randomly to generate a set of M registration parameters, which provide initial particle warms to implement QPSO to obtain ﬁnal optimal registration parameters. The proposed ARID is used as a criterion for the selection of consistent feature set, the generation of initial parameter sets, and ﬁtness functions used by QPSO. The iterative process of QPSO is terminated based on a custom-designed automatic stopping rule. To evaluate the performance of SC-ARID-QPSO, both simulated and real images are used for experiments for validation. In addition, two data sets are particularly designed to conduct a comparative study and analysis with existing state-of-the-art methods. The experimental results demonstrate that SC-ARID-QPSO produces better registration accuracy and robustness than compared methods.


Introduction
Image registration is a fundamental process of aligning two or more images (referred to as the reference image and the sensed image respectively) of the same scene taken at different times, from different viewpoints, and/or by different sensors and transforms them into unified coordinate system [1].It is a crucial pre-process in many optical remote sensing image applications in which the required information must be obtained by integrating various data sources.Therefore, an accurate, effective, and robust image registration is a key success for follow-up data processing.
Generally speaking, ABMs usually obtain higher registration accuracy than FBMs.On the contrary, FBMs are more suitable for registering image pairs with scale difference.ABMs require prior knowledge of initial parameters for multiresolution image registration.By taking advantage of the robustness of features to scale variants and the high precision of similarity metrics, AFBMs integrates information obtained from local features and similarity metrics to effectively register images with large geometric and grayscale differences.Specifically, there are three strategies which integrate ABMS and FBMS into a joint AFBM.As for the first strategy, FBMs are first used to calculate initial parameters for the follow-up ABMs to refine alignment accuracy.For example, [36] implemented a scale-invariant feature transform (SIFT)-based method to calculate pre-registration parameters, which were refined based on maximum mutual information (MI) found by a post-registration process by maximizing MI.In [37], MI was replaced by regional mutual information (RMI).As for the second strategy FBMs were also implemented to performed pre-registration as the first one step with the difference that local features were tuned by similarity metrics (SMs) in the post-registration.There are several works are reported along this line [38].Proposed a pre-registration method by the scale restricted SIFT [23] and then followed by tuning Harris' corner in conjunction with a local self-similarity descriptor by normalized correlation coefficient to yield more accurate positions of feature pairs.As for the third strategy, selecting a consistent feature set based on spatial constraint and intensity information is implemented simultaneously.One interesting approach is the iterative SIFT (ISIFT) proposed in [41], which developed an iterative image registration approach that can be used to select consistent feature point sets obtained by maximizing similarity with spatial constraint to update registration parameters in an iterative manner.
Different from ISIFT in [41], this paper follows an idea of using intensity similarity/discrepancy metrics (SM/DM) to fine tune registration parameters by using an optimization process of registration parameters, which maximize/minimize the objective function derived from SM/DM using the SM/DM and SC-based initialized parameters.To do so, it requires an optimization procedure to find the best transformation parameters.Basically, such parameter optimization methods can be divided into two categories, gradient-based optimization methods (GBMs) [29][30][31][32]36] and evolutionary methods (EVMs) [24,37,38,[42][43][44][45][46].For GBMs, [31] proposed a stochastic gradient, called a simultaneous perturbation stochastic approximation search strategy, to accelerate the process of maximizing MI, which is a multiresolution framework based on wavelet-like pyramid decomposition.Another method [32] is developed a stochastic gradient approach to solve the parameters on a proposed frequency domain surface of MI.As a modified method, [36] proposed a Marquardt-Levenberg search strategy using multiresolution to maximize MI in the post-registration.These methods basically use the traditional MI to optimize registration parameters.For many modified versions of MI, EVMs are developed to avoid calculating the gradient similarity between registration parameters.For example, simulated annealing [44], ant colony optimization [24,45], particle swarm optimization (PSO) [42][43][44]46,47] and quantum-behaved PSO (QPSO) algorithm [38,47,48], and chaotic quantum particle swarm optimization implemented in [37], have used modified-MI for finding optimal registration parameters.Among them, QPSO is of particular interest by introducing quantum theory into PSO, which theoretically guaranteed that the optimal solution can be found in search space [47,48].By taking advantage of the robustness of features to scale variance and the high precision of similarity metrics to intensity difference, a novel AFBM is developed in this paper.Inspired by the concept of spectral information divergence (SID) [49], which measures the discrepancy of probabilistic behaviors between the spectral signatures, this paper proposes an average regional information divergence (ARID) as DM.The developed ARID is quite different from the above-mentioned MI-based methods or modified versions in the sense that it not only makes use of statistics derived from intensity correlation Remote Sens. 2020, 12, 3066 3 of 23 and spatial consistency information but also avoids calculating the joint probability distribution.Additionally, unlike the above-mentioned AFBMs, this paper further develops a two-component cascaded registration method, to be called spatial-consistency and average regional information divergence minimization via QPSO (SC-ARID-QPSO) as shown in Figure 1 for optical remote sensing images registration.It combines SC and intensity discrepancy in different manners.Specifically, it can be implemented by two components in cascade with a SM/DM-based consistent feature set selection implemented in the first component, Component I, followed by the second component, Component II, which iteratively optimizes registration parameters by maximizing/minimizing SM/DM with the spatial constraint via optimization algorithm.The key idea of SC-ARID-QPSO is to jointly use ARID with spatial consistency throughout these two components.
Remote Sens. 2020, x , x FOR PEER REVIEW 3 of 23 joint probability distribution.Additionally, unlike the above-mentioned AFBMs, this paper further develops a two-component cascaded registration method, to be called spatial-consistency and average regional information divergence minimization via QPSO (SC-ARID-QPSO) as shown in Figure 1 for optical remote sensing images registration.It combines SC and intensity discrepancy in different manners.Specifically, it can be implemented by two components in cascade with a SM/DMbased consistent feature set selection implemented in the first component, Component I, followed by the second component, Component II, which iteratively optimizes registration parameters by maximizing/minimizing SM/DM with the spatial constraint via optimization algorithm.The key idea of SC-ARID-QPSO is to jointly use ARID with spatial consistency throughout these two components.The remainder of this paper is organized as follows.The proposed regional information divergence (RID) is described in Section 2. SC-RID-QPSO is described in Section 3. Novelties of SC-ARID-QPSO is given in Section 4. The experimental setting is detailed in Section 5 followed by a comparative study of extensive experiments along with their results in Section 6.The discussion of experimental results, phenomenon analysis, and future work are presented in Section 7. Finally, a brief conclusion and contributions are summarized in Section 8.

Average Regional Information Divergence
Inspired from the concept of spectral information divergence (SID) [49], which measures the discrepancy of probabilistic behaviors between the spectral signatures, we drive a discrepancy metric based on region information divergence to capture the intensity correlation between overlapping region of reference image R and rectified sensed image Srec.For each pixel According to (1) an overlapped region (OR) in the reference image and rectified images can be further defined by taking the average of  The remainder of this paper is organized as follows.The proposed regional information divergence (RID) is described in Section 2. SC-RID-QPSO is described in Section 3. Novelties of SC-ARID-QPSO is given in Section 4. The experimental setting is detailed in Section 5 followed by a comparative study of extensive experiments along with their results in Section 6.The discussion of experimental results, phenomenon analysis, and future work are presented in Section 7. Finally, a brief conclusion and contributions are summarized in Section 8.

Average Regional Information Divergence
Inspired from the concept of spectral information divergence (SID) [39], which measures the discrepancy of probabilistic behaviors between the spectral signatures, we drive a discrepancy metric based on region information divergence to capture the intensity correlation between overlapping region of reference image R and rectified sensed image S rec .For each pixel P R i and P S rec i in overlapping regions, the intensity vectors of local neighborhood region with radius r are v R i = (v i 1 , . . ., v i L ) T and w S rec i = (w i 1 , . . ., w i L ) T , L = (2r + 1) 2 .Assume that p R i = (p i 1 , . . ., p i L ) T and q S rec i = (q i 1 , . . ., q i L ) T are probability vector corresponding v R i = (v i 1 , . . ., v i L ) T and w S rec i = (w i 1 , . . ., w i L ) T where p i l = v i l / L l=1 v i l and q i l = w i l / L l=1 w i l .From information theory and spectral information divergence (SID) [39], we can define the region information divergence (RID) between v R i and w S rec i similar to SID by According to (1) an overlapped region (OR) in the reference image and rectified images can be further defined by taking the average of RID(v R i , w S rec i ) over (v R i , w S rec i ) ∈ OR as average region information divergence (ARID) where |OR| is the total number of pixels in OR.Using (2) ARID can be used to measure the region discrepancy metric between reference image and rectified sensed image.As a result, the smaller is the value of ARID, the higher is the accuracy of the registration parameters.

SC-ARID-QPSO
The idea of SC-ARID-QPSO was briefly introduced in Section 1.The entire process of SC-ARID-QPSO is depicted in Figure 2 which implements two components in a cascade manner where the first component, Component I, is to produce a spatial consistent feature set to be used for generating initial particle swarms for QPSO; and the second component, Component II, is to implement QPSO for minimizing ARID using ARID-constrained initialized parameter set.In Component I, local features are extracted from the reference image and sensed image by SIFT, respectively.It is further followed by a spectral angular distance (SAD)-based feature matching to obtain initial matching feature set.Then, the consistent feature sets are obtained by minimizing ARID with spatial constraint.In Component II, the M initial parameter sets are calculated by fine-tuned consistent feature sets, which are generated by randomly tuning the selected consistent feature sets through a number of times constrained by ARID values where M is empirically determined.The obtained initial parameter sets are further used to produce initial particle swarms for QPSO, which minimizes ARID to find final optimal registration parameters.Figure 2 provides the step-by-step details of implementing each of the above-mentioned two-component cascaded registration method, which is clearly different from ISIFT in [41].
Remote Sens. 2020, x , x FOR PEER REVIEW 4 of 23 where OR is the total number of pixels in OR.Using (2) ARID can be used to measure the region discrepancy metric between reference image and rectified sensed image.As a result, the smaller is the value of ARID, the higher is the accuracy of the registration parameters.

SC-ARID-QPSO
The idea of SC-ARID-QPSO was briefly introduced in Section 1.The entire process of SC-ARID-QPSO is depicted in Figure 2 which implements two components in a cascade manner where the first component, Component I, is to produce a spatial consistent feature set to be used for generating initial particle swarms for QPSO; and the second component, Component II, is to implement QPSO for minimizing ARID using ARID-constrained initialized parameter set.In Component I, local features are extracted from the reference image and sensed image by SIFT, respectively.It is further followed by a spectral angular distance (SAD)-based feature matching to obtain initial matching feature set.Then, the consistent feature sets are obtained by minimizing ARID with spatial constraint.In Component II, the M initial parameter sets are calculated by fine-tuned consistent feature sets, which are generated by randomly tuning the selected consistent feature sets through a number of times constrained by ARID values where M is empirically determined.The obtained initial parameter sets are further used to produce initial particle swarms for QPSO, which minimizes ARID to find final optimal registration parameters.Figure 2 provides the step-by-step details of implementing each of the above-mentioned two-component cascaded registration method, which is clearly different from ISIFT in [41].

Feature Extraction and Matching
The first part of the Component I is to find initial matching feature points by SIFT [2], which consists of feature extraction and description.Then, the ratio of the nearest neighbor to the secondnearest neighbor [26] based on the SAD [49] is calculated to be used for the selection of the initial matching feature set.
SIFT selects the local maxima and minima of D(x) in scale space as candidate feature points.In order to obtain more accurate position, detailed fit to the nearby data samples is performed in scalesspatial spaces.Unstable feature points are rejected based on contrast values and principal curvature ratios of features.Then, a dominant orientation is assigned to each feature point based on gradient orientations of features within a local region to achieve rotation invariance.Finally, a 128-D descriptor based gradient histogram is constructed in a local region of each feature point.The initial feature matching is performed by using SAD [50] to measure the discrepancy between two descriptors defined by

Feature Extraction and Matching
The first part of the Component I is to find initial matching feature points by SIFT [2], which consists of feature extraction and description.Then, the ratio of the nearest neighbor to the second-nearest neighbor [26] based on the SAD [49] is calculated to be used for the selection of the initial matching feature set.SIFT selects the local maxima and minima of D(x) in scale space as candidate feature points.In order to obtain more accurate position, detailed fit to the nearby data samples is performed in scales-spatial spaces.Unstable feature points are rejected based on contrast values and principal curvature ratios of features.Then, a dominant orientation is assigned to each feature point based on gradient orientations of features within a local region to achieve rotation invariance.Finally, a 128-D descriptor based gradient histogram is constructed in a local region of each feature point.The initial feature matching is performed by using SAD [50] to measure the discrepancy between two descriptors defined by where a i is the ith feature descriptor vector in the sensed image, b i is the ith feature descriptor vector in the reference image, and θ ratio is the angle between the two vectors.The ratio of the first minimum angle to the second-minimum angle [31] denoted by θ ratio is used.If the initial match features with angle distance ratios are smaller than a threshold θ ratio , they are remained.

Consistent Feature Set Selection Based on ARID
On the basis of the initial feature matching, an effective outlier elimination can be further used to improve the precision of reserved feature pairs.A similar idea to using the ARID-based method to select a consistent feature set was also proposed in [41].The details of ARID-based SC feature set selection can be described by the following two steps.

1.
Implement SIFT and random sample consensus on reference image R and sensed image S to obtain initial consistent feature point sets where FS (0) j is the jth initial consistent feature point set and M 0 is the number of initial consistent feature point sets.

2.
Adopt S where Pa min , where j * = arg min FS j ∈S feature ARID FS j .

Component II: Fine Registration by Minimizing Average Regional Information Divergence (ARID) via QPSO
After the consistent feature set being selected, the registration parameters are optimized by minimizing ARID via QPSO.In order to implement QPSO, initial particle swarms must be specified first.In QPSO, each particle only has its position information mD ) in a D-dimensional space and represents a solution of the problem, specified by a set of registration parameters, where k is the current number of iteration and m is the mth particle in M particle swarms.
In the very beginning, the local optimal position P (0) l M l=1 and P (0) g are provided by initialization.The particle positions are updated by the evolution equation, which is given by where u md and φ are random numbers distributed uniformly in interval (0,1).β is an expansion-contraction factor, which is defined as follows: where N max is the maximum iteration number and N current is the current iteration number.α 0 and α 1 are initial and final values of the control parameter respectively and α 0 > α 1 .An automatical stopping rule is determined by two ASID values corresponding global particle warm P .Let n be the number of times that ARID min ≤ τ is satisfied.In this case, n = n + 1, else n = 0.If the number of iterations reaches N max or n reaches n τ , the iteration is terminated.The detailed description of SC-ARID-QPSO is provided in the following.

Novelties of SC-ARID-QPSO
There are several novelties derived from SC-ARID-QPSO which can be summarized as follows.

1.
The first novelty is introduction of ARID, which considers a neighboring region as a random variable.Such ARID is quite different from RMI and RIRMI in the sense that the latter was calculated under the assumption that high-dimensional distribution is approximately normally distributed, while the former defines the desired probability distribution by normalizing its neighborhood region histogram to unity.The discrepancy between the overlapped regions of reference image and rectified sensed image is calculated by averaging the values of RID of all pixels in the overlapped region without a need of calculating their joint probability distribution.

2.
The Second novelty is to develop a two-component cascaded registration method which optimizes spatial-consistency and ARID registration parameters.In contrast to most AFBMs [36,37] which only uses the pre-registration as initial parameters for the fine registration, SC-ARID-QPSO uses pre-registration based on SC and ARID to initialize the parameters for fine registration.The process of ARID-constrained parameter initialization guarantees the quality of initialized parameter set.In particular, it is quite different from an AFBM developed in [41], called iterative SIFT (ISIFT) which replaces the sensed image via a feedback loop to re-implement registration process.More specifically, ISIFT iteratively feeds a rectified image to replace the current sensed images, while SC-ARID-QPSO iteratively optimizes the registration parameters using an ARID-based objective function with the ARID-constrained initialized parameter set from the same set of the reference and sensed images.

3.
The last and most important novelty is fundamental different concepts used to derive ISIFT and SC-ARID-QPSO.Despite the fact that both ISIFT and SC-ARID-QPSO use an iterative process, the ISIFT feeds back rectified images to update the input sensed images as opposed to SC-ARID-QPSO, which uses QPSO iteratively to optimize parameter sets.So, ISIFT [41] and SC-ARID-QPSO are designed based on different uses of iterative processes since they both use spatial-consistency and intensity-similarity to perform different tasks.However, it is worth noting that there are several key differences between ISIFT and SC-ARID-QPSO.First of all, SC-ARID-QPSO takes advantage of QPSO to optimize registration parameter by minimizing ARID via optimization algorithm, while ISIFT does not.Second of all, SC-ARID-QPSO develops ARID as a novel discrepancy metric compared to ISIFT which uses traditional similarity metrics.Third of all, although the experimental datasets used in this paper are the same as the datasets in reference [41], their purposes are different.In this paper, the experimental results are used to demonstrate the effectiveness of ARID and the performance of SC-ARID-QPSO as opposed to ISIFT which used the data sets to show effectiveness of consistent feature selection strategy and iterative performance of ISIFT.Specifically, in Component I, the designed experiments are used for validating the accuracy and effectiveness of ARID.In Component II, the designed experiments are used to demonstrate the iterative process of parameter optimization by minimizing ARID via QPSO as well as the effective selection of initialized parameter set.In addition, the comparative experiments further show the superior performance of SC-ARID-QPSO to compared methods.

Experimental Setting
In order to validate the performance of SC-ARID-QPSO, two groups of data sets are used for experiments and are acquired from different bands, different resolution, the same/different sensors, and the same/different time.Experimental datasets are diverse, including aerial and satellite images, and has affine or projection distortion, involving scale differences, rotation differences, translation differences, and perspective differences.In addition, because the data contains different sensors at different times, it also has grayscale differences.
The proposed SC-ARID-QPSO was implemented in MATLAB.In order to compare the performance of ARID with other common similarity metrics, such as MI and RMI, we implemented the SC-ARID-QPSO with different metrics to replace ARID.Thus, it yielded two different versions, called spatial consistent (SC) with MI maximization via QPSO (SC-MI-QPSO), and SC with RMI maximization via QPSO (SC-RMI-QPSO).For comparing performance of Component I by SC with SM or RANSAC, different versions, called SC with MI (SC-MI), SC with RMI (SC-RMI), and SC with ARID (SC-ARID) were used.For comparing the results obtained in Component II, different initial consistent feature sets generated in Component I were also derived, which were RANSAC using MI maximization via QPSO, RANSAC with RMI maximization via QPSO, and RANSAC with ARID minimization via QPSO referred to as RS-MI-QPSO, RS-RMI-QPSO and RS-ARID-QPSO, respectively.More specifically, RANSAC and SC with SM, respectively, selected consistent feature point sets to generate initial particle swarm to optimize parameters by maximizing/minimizing SM/DM via QPSO.However, there is a key difference between the two initial particle swarms generated by RANSAC and SC with SM, as described in the introduction.SC with SM/DM selected a consistent feature set with maximum/minimum similarity value (SV)/DV, which has more accurate registration results than RANSAC.Moreover, the SV/DV of each particle in the initial particle swarm can be constrained by SM/DM to make the initial particle swarm better than the consistent feature set.
As demonstrated above, RS-MI-QPSO, RS-RMI-QPSO, RS-ARID-QPSO, SC-MI-QPSO, SC-RMI-QPSO, and SC-ARID-QPSO were used to compare their optimized results with different initial particle swarms and metrics.So, a total of 6 methods were tested for comparison.Specifically, SC-MI, SC-RMI, SC-ARID, and RANSAC were implemented to compare ARID with other different metrics in the Component I.In addition, to further conduct quantitative analysis, the final root mean square errors (RMSEs) of 6 methods with 2 components were calculated, and the visual displays of final fused registered images are also plotted for comparison.

Data Sets
The first group contains three simulated datasets and three real datasets, both of which can be used to validate the superior performance of spatial consistent feature point set based on RID (SC-ARID) and SC-ARID with parameters optimization via QPSO (SC-ARID-QPSO).The three simulated image pairs consist of two images selected from different bands, one band as the reference image and another band of corresponding multispectral images with a simulated transformation (2.5-times scale and 20 • rotational changes by ENVI [51]) as the sensed image.To verify the performance under real scenes, the three real image sets were obtained from different sensors and taken during the same or different years.These image pairs also cover a variety of spatial resolutions from 2 m to 30 m.
The second group contains 80 images, which were randomly selected from two public available data sets respectively [6,52,53].The first public available dataset contains 107 multispectral and multitemporal remote sensing image pairs with 512 × 512 [6,52], obtained from the United States Geological Survey (USGS) website [54].The pixel resolution of these images is 1 m.The second public available dataset [54,55] contains 12 classes scene and a total of 1200 images with 256 × 256.The pixel resolution of this public images is 0.3 m.In order to verify the statistical evaluation performance under affine transformation and projective transformation, two data sets were simulated for experiments as follows.Data set A was generated from 40 image sets, which were randomly selected from the first public available data set.For each data set, one band was selected as reference image and another band was applied a known affine transformation (scaling, rotation, and translation) to generate the sensed image.Data set B was generated from 40 image sets, which were randomly selected from the second public available data set.For each data set, one band was selected as reference image and Remote Sens. 2020, 12, 3066 9 of 23 another band was applied to a known projective transformation (scaling, rotation, translation, and perspective) to create the corresponding sensed image.All these data sets with detailed descriptions are made available at http://wiki.umbc.edu/display/rssipl/10.+Download.

Parameter Setting
In the component I, the ratio parameter θ ratio used for SAD feature matching.As a tradeoff between the number of correct matches and the rate of correct matches, θ ratio was set to 0.7.The maximum number of iterations for RANSAC was set to 100 based on the calculation equation [15] and experience empirically.The local 8-nieghbor pixels are used for calculating RID.In addition, the threshold of the model referred in [2] was set to 0.5.In the component II, the particle population of particle swarm was set to 20.For the stopping rule, the threshold of difference between similarity values th = 10 −4 .The number of durations n τ = 15 that the different between two consecutive discrepancy value is less than a certain threshold value.

Evaluation Criteria
Root mean square error (RMSE) was used to quantitatively evaluate the registration position accuracy.For an affine transformation and projective transformation, it is defined respectively by where a 1 , a 2 , b 1 , b 2 , c 1 , c 2 and a 1 , a 2 , a 3 , b 1 , b 2 , b 3 , c 1 , c 2 are the real values of the corresponding model parameter, â1 , â2 , b1 , b2 , ĉ1 , ĉ2 and â1 , â2 , â3 , b1 , b2 , b3 , ĉ1 , ĉ2 are the estimated values of the corresponding model parameters.As for the ground truth (simulated images) or the reliable reference geometric transformation parameters, they were calculated by manual registration using ENVI.
In addition, the checkboard mosaic images [40] for group 1 are provided for performance evaluation and visual inspection of registration results respectively.

Experimental Results
Based on the above demonstration of experimental design, the experimental results of the two groups of data sets are presented in this section, which contain simulated datasets experiments, real datasets experiments, and comparative analysis on registration performance with other methods.

Simulated Datasets Experiments
Figure 3 shows the SV/DV calculated by MI, RMI, ARID, and inner number of feature sets, respectively, along with the index of feature sets, and RMSEs in each feature set for simulated dataset 1 in Component I.In all figures of Figure 3, the horizontal axes are the index of feature sets, and the vertical axes in Figure 3a-c are SVs/DVs produced by MI, RMI, and ARID, while the vertical axis in Figure 3d is the feature number of each candidate feature set. Figure 3e shows the RMSE versus the index of feature sets where RMSE was calculated by each feature set.The number of feature pairs in each feature set larger than three for affine transformation was retained while the total number of feature sets was set to 100.Thus, the resulting number of each candidate set varies with different images as demonstrated in the experiments conducted in Section 6.
In Figure 3, the selected consistent feature sets by SC-MI, SC-RMI, SC-ARID, and RANSAC and their corresponding RMSE are highlighted by large solid circles.The RMSE calculated by consistent feature set selected by ARID were relatively smaller than or equal to the RMSE calculated by RANSAC.In order to demonstrate the effectiveness of ARID, Figure 3 shows the intuitively selection process in Component I for simulated image 1, and the corresponding RMSE is given in the Component I of Table 1.The feature set index corresponding to the maximum SV is abbreviated as j * MI , j * RANSAC and the minimum SV is abbreviated as j * ARID .Figure 3a,c show that j * MI = j * ARID = 24 with their calculated RMSE corresponding to 1.712 pixel as shown in Figure 3e.By contrast, Figure 3b shows that j * RMI = 10 with its RMSE corresponding to 1.746 pixel as shown in Figure 3e.On the other hand, Figure 3d presents that j * RANSAC = 2 with its RMSE corresponding to 1.976 pixel as shown in Figure 3e.For all three simulated images, the RMSE of Component I are given in Table 1.The RMSE based on SC with ARID can obtain the same as or relatively better results than other similarity metrics.The selected feature set directly affect the registration accuracy, which will also be conductive to the construction of subsequent initial particle swarms for QPSO.Since the SM/DM usually has many local extremes as the objective function, obtaining better initial parameters can help the algorithm tend to the optimal result in the iterative process.Simulated datasets 2 and 3 are also used to do the same experiments, and the same conclusions as simulated image 1 could be also drawn.To avoid duplication, only the results of Component I for simulated image 1 are presented here.
Figure 4 shows the iterative profiles of changes in SV/DV and RMSE obtained by the six methods mentioned above (RS-MI-QPSO, RS-RMI-QPSO, RS-ARID-QPSO, SC-MI-QPSO, SC-RMI-QPSO, and SC-ARID-QPSO). Figure 5 provides visual inspection of checkboard mosaicked images for the final registration results.Table 1 also tabulates the RMSE results of final results (RS-MI-QPSO, RS-RMI-QPSO, RS-ARID-QPSO, SC-MI-QPSO, SC-RMI-QPSO, and SC-ARID-QPSO).As shown in Figure 5, the checkboard mosaicked images are included to display the registration results of the reference image with the rectified sensed image, which show that the proposed SC-MI-QPSO, SC-RMI-QPSO, and SC-ARID-QPSO could rectify well.This confirms the results in Figure 4 and the Component II of Table 1.Based on the results shown in Figure 4 and the results of the Component II in Table 1, the progressive curves generated by the iteration process demonstrate the performance of the six methods and the final RMSE-based position accuracy.(g) (h) (i)

Real Datasets Experiments
Real image experiments were conducted for the above mentioned six methods.Like Figure 3, for real dataset 1, Figure 6a-c shows the SV/DV calculated for each feature set by MI, RMI and ARID respectively.Figure 6d shows the total number of features in each candidate feature set, while Figure 6e plots RMSEs calculated by each feature set. Figure 6a shows that * MI 14 j  with its calculated RMSE corresponding to 1.355 pixels.Figure 6b-c  with its RMSE corresponding to 1.592 pixels shown in Figure 6d.Compared to other SMs, using minimal ARID can obtained a better feature set corresponding to a lower RMSE.The same as simulated experiment, the results of real image 2 and image 3 are not included here to avoid duplication.Figure 7 shows the iterative profiles of changes in SV/DV and RMSE obtained by the six methods for real datasets 1-3.The Table 2 also tabulates the RMSE results of Component I and Component II. Figure 8 provides visual inspection of checkboard mosaicked images for the final registration results.As shown in Figure 8, the checkboard mosaicked images are included to display the registration results of the reference image with the rectified sensed image, which show that the proposed SC-ARID-QPSO, SC-RMI-QPSO, and SC-MI-QPSO could rectify well, which confirms the results in Figure 7 and Component II of Table 2.

Real Datasets Experiments
Real image experiments were conducted for the above mentioned six methods.Like Figure 3, for real dataset 1, Figure 6a-c shows the SV/DV calculated for each feature set by MI, RMI and ARID respectively.Figure 6d shows the total number of features in each candidate feature set, while Figure 6e plots RMSEs calculated by each feature set. Figure 6a shows that j * MI = 14 with its calculated RMSE corresponding to 1.355 pixels.Figure 6b-c shows that j * RMI = j * ARID = 56 with its calculated RMSE corresponding to 1.401 pixels.On the other hand, j * RANSAC = 32 with its RMSE corresponding to 1.592 pixels shown in Figure 6d.Compared to other SMs, using minimal ARID can obtained a better feature set corresponding to a lower RMSE.The same as simulated experiment, the results of real image 2 and image 3 are not included here to avoid duplication.
Figure 7 shows the iterative profiles of changes in SV/DV and RMSE obtained by the six methods for real datasets 1-3.The Table 2 also tabulates the RMSE results of Component I and Component II. Figure 8 provides visual inspection of checkboard mosaicked images for the final registration results.As shown in Figure 8, the checkboard mosaicked images are included to display the registration results of the reference image with the rectified sensed image, which show that the proposed SC-ARID-QPSO, SC-RMI-QPSO, and SC-MI-QPSO could rectify well, which confirms the results in Figure 7 and Component II of Table 2.

Real Datasets Experiments
Real image experiments were conducted for the above mentioned six methods.Like Figure 3, for real dataset 1, Figure 6a-c shows the SV/DV calculated for each feature set by MI, RMI and ARID respectively.Figure 6d shows the total number of features in each candidate feature set, while Figure 6e plots RMSEs calculated by each feature set. Figure 6a shows that * MI 14 j  with its calculated RMSE corresponding to 1.355 pixels.Figure 6b-c  with its RMSE corresponding to 1.592 pixels shown in Figure 6d.Compared to other SMs, using minimal ARID can obtained a better feature set corresponding to a lower RMSE.The same as simulated experiment, the results of real image 2 and image 3 are not included here to avoid duplication.Figure 7 shows the iterative profiles of changes in SV/DV and RMSE obtained by the six methods for real datasets 1-3.The Table 2 also tabulates the RMSE results of Component I and Component II. Figure 8 provides visual inspection of checkboard mosaicked images for the final registration results.As shown in Figure 8, the checkboard mosaicked images are included to display the registration results of the reference image with the rectified sensed image, which show that the proposed SC-ARID-QPSO, SC-RMI-QPSO, and SC-MI-QPSO could rectify well, which confirms the results in Figure 7 and Component II of Table 2.

Comparative Analysis on Registration Performance
The results in Sections 5.1 and 5.2 validated the effectiveness of SC-ARID-QPSO.In this section, we performed experiments to evaluate the comparative analysis of SC-ARID-QPSO to SC-ARID and other four compared methods, Random sample consensus (RANSAC) [15], locally linear transforming (LLT) [18,56], Locality preserving matching (LPM) [19,20,57], and via progressive sparse spatial consensus (PSSC) [21,58].For fairness of comparison, SIFT feature extraction and SAD

Comparative Analysis on Registration Performance
The results in Sections 5.1 and 5.2 validated the effectiveness of SC-ARID-QPSO.In this section, we performed experiments to evaluate the comparative analysis of SC-ARID-QPSO to SC-ARID and other four compared methods, Random sample consensus (RANSAC) [15], locally linear transforming (LLT) [18,56], Locality preserving matching (LPM) [19,20,57], and via progressive sparse spatial consensus (PSSC) [21,58].For fairness of comparison, SIFT feature extraction and SAD feature matching method were used.As shown in Figure 9a, the RMSEs of SC-ARID-QPSO were almost smaller than 1.5 pixels.Compared with SC-ARID the final registration accuracy obtained by the parameter optimization in Component II can be improved and resulted in smaller error.In addition, compared with the RMSE curves plotted in Figure 9a, the RMSEs of all methods for data with perspective deformation in dataset B are relatively larger as shown in Figure 9b.However, in comparison with SC-ARID and other four algorithms, SC-ARID-QPSO obtained significantly better results than the five compared algorithms.The RMSEs of most images produced by SC-ARID-QPSO were less than two pixels, compared to SC-ARID and the other four compared algorithms, which had RSMEs mostly around three pixels in Figure 9b.
almost smaller than 1.5 pixels.Compared with SC-ARID the final registration accuracy obtained by the parameter optimization in Component II can be improved and resulted in smaller error.In addition, compared with the RMSE curves plotted in Figure 9a, the RMSEs of all methods for data with perspective deformation in dataset B are relatively larger as shown in Figure 9b.However, in comparison with SC-ARID and other four algorithms, SC-ARID-QPSO obtained significantly better results than the five compared algorithms.The RMSEs of most images produced by SC-ARID-QPSO were less than two pixels, compared to SC-ARID and the other four compared algorithms, which had RSMEs mostly around three pixels in Figure 9b.

Discussion
According to Figures 3 and 6 and Component I in Tables 1 and 2, the consistent feature set selection results from different criteria with spatial consistency show that the selection strategy using ARID as a metric can also evaluate the accuracy of the registration parameters accurately and robustly.As shown by the three simulated datasets and three real datasets, the consistent feature sets in Component I selected by SC-ARID were relatively better that by RANSAC and better or similar to that selected by SC-MI and SC-RMI.The selected feature set directly affects the registration accuracy, which will also generate a better set of initial particle swarms for the subsequent QPSO.
Combining the iterative curves in Figures 4 and 7 and Component II in Tables 1 and 2, the registration parameters obtained by SC-ARID-QPSO, SC-RMI-QPSO, and SC-MI-QPSO are relatively better or similar to the parameters by RS-ARID-QPSO, RS-RMI-QPSO, and RS-MI-QPSO.For the ARID-based method, the obtained parameter initialization could yield relatively better final results.In addition, it can be seen from the iterative curves in Figures 4 and 7 that there was a decreasing tendency in registration error during the process of minimizing ARID.The iterative performance of the six methods and the final RMSE-based position accuracy demonstrated that the initialization of the registration parameters affected the registration accuracy to some extent.In addition, optimization algorithm and SV/DVs are another two key factors that had significant impact on the final registration parameters.Based on the above experimental results and analysis, the proposed with SV/DV via QPSO was shown to be effective and obtain good performance.The results validated the effectiveness and accuracy of the proposed cascaded registration system as well as the proposed ARID criterion.
By virtue of the above validation experiments, SC-ARID-QPSO performed better than SC-ARID and RANSAC to a certain degree.Because the performance of a feature-based method largely

Discussion
According to Figures 3 and 6 and Component I in Tables 1 and 2, the consistent feature set selection results from different criteria with spatial consistency show that the selection strategy using ARID as a metric can also evaluate the accuracy of the registration parameters accurately and robustly.As shown by the three simulated datasets and three real datasets, the consistent feature sets in Component I selected by SC-ARID were relatively better that by RANSAC and better or similar to that selected by SC-MI and SC-RMI.The selected feature set directly affects the registration accuracy, which will also generate a better set of initial particle swarms for the subsequent QPSO.
Combining the iterative curves in Figures 4 and 7 and Component II in Tables 1 and 2, the registration parameters obtained by SC-ARID-QPSO, SC-RMI-QPSO, and SC-MI-QPSO are relatively better or similar to the parameters by RS-ARID-QPSO, RS-RMI-QPSO, and RS-MI-QPSO.For the ARID-based method, the obtained parameter initialization could yield relatively better final results.In addition, it can be seen from the iterative curves in Figures 4 and 7 that there was a decreasing tendency in registration error during the process of minimizing ARID.The iterative performance of the six methods and the final RMSE-based position accuracy demonstrated that the initialization of the registration parameters affected the registration accuracy to some extent.In addition, optimization algorithm and SV/DVs are another two key factors that had significant impact on the final registration parameters.Based on the above experimental results and analysis, the proposed SC with SV/DV via QPSO was shown to be effective and obtain good performance.The results validated the effectiveness and accuracy of the proposed cascaded registration system as well as the proposed ARID criterion.
By virtue of the above validation experiments, SC-ARID-QPSO performed better than SC-ARID and RANSAC to a certain degree.Because the performance of a feature-based method largely depends on the quality of extracted features, SC-ARID-QPSO can have relatively good performance.This is because the registration accuracy is not much affected by the original extracted feature set even though the ARID-based parameter optimization via QPSO uses the initialized parameters obtained in Component I.
Figure 9 shows these comparative experimental results to demonstrate the effectiveness and superiority of SC-ARID-QPSO to RANSAC, LLT, LPM, and PSSC for registering both satellite and aerial optical remote sensing images with affine or perspective transformations.In Figure 9a, the SC-ARID method obtained relatively better registration results compared to other algorithms which validated the performance of SC-ARID, and ARID-minimized parameter optimization could further reduce the final registration error.This curve also showed that FBMs could obtain good results for image registration of affine deformation with the RMSE values lower than 1.5 pixels.Nevertheless, ARID-minimized parameter optimization could produce results with higher accuracy although it may be somewhat affected by the initialization parameters.In Figure 9b, the registration errors of experiments on dataset B were larger than those on dataset A. This was because dataset B is generated by projection deformation and the image resolution is relatively higher than the images in dataset A. Compared to other four methods, SC-ARID-QPSO could still obtain better results.In practical applications, it is acceptable to obtain a registration error of less than 1.5 pixels for an image with a resolution of 0.3 m.
As for computational complexity, SC-ARID-QPSO can be discussed in two aspects, the consistent feature set selection in Component I and the objective function minimization via QPSO for parameter optimization in Component II.In the Component I, the factors that affect the running time of SC-ARID are (a) the number of consistent feature sets, N SC ; and (b) the computing time of metrics, denoted by t ARID .In Component II, the factors contributed to the running time of SC-ARID-QPSO are (a) the number of iterations, N iter ; (b) the dimensionality of a particle, D defined by the type of the used transformation matrix; and (c) t ARID .Although the values of N iter and N SC are unknown, the maximum values of these variables in the conducted experiments were set to 100.Among them, t ARID is the primary factor to be affected by the image size.In order to further demonstrate the computational complexity, the time of calculating DV for each consistent feature set and the time of running a single iteration are used for illustration.For data set A, an image pair with a size of 512 × 512 registered by SC-ARID-QPSO required 0.85 s in calculating DV for each consistent feature set in Component I, and 11.47 s in a single-run iteration in Component II.For other compared methods, RANSAC, LLT, LPM, and PSSC took 13.26 s, 6.5 s, 4.68 s, and 3.48 s, respectively.The code used to conduct comparisons is publicly available in [56][57][58].Generally speaking, ABMs run longer time than FBMs because ABMs need to obtain the registration parameters through searching by template window sliding or parameter optimization methods.Nonetheless, as mentioned above, ABMs can achieve higher registration accuracy than FBMs.Since the proposed ARID was developed to calculate the information divergency of regional intensity information, its future work can look into its design rationale to develop more robust and accuracy metrics, and regional intensity information can be replaced by regional descriptor.

Conclusions
This paper develops a two-component cascaded registration method based on spatial consistency and average regional information divergency, to be called spatial-consistency (SC) and average regional information divergence (ARID) minimization via quantum-behaved particle swarm optimization (SC-ARID-QPSO).As expected, SC-ARID-QPSO produces better positional accuracy than other existing registration algorithms, for example, RANSAC [2], LLT [18], LPM [19,20], and PSSC [21].Several contributions which are not found in [41] are summarized as follows.

1.
A new measure, called average regional information divergency (ARID), is specifically designed to calculate the discrepancy between the registration parameters.2.
SC-ARID-QPSO, using ARID to select a consistent feature set in Component I as well as to optimize registration parameters in Component II, provides a novel approach as a new AFBM.

3.
The feature extraction method and parameter optimization algorithm used in SC-ARID-QPSO can be replaced by better feature extraction methods and optimization algorithms.Under certain extreme situations, where there are no more than three or more pairs of correctly matched feature pairs among initial SIFT feature pairs, other modified better feature extraction methods can be used to replace SIFT.In this case, the framework of SC-ARID-QPSO is still applicable.

Figure 1 .
Figure 1.A graphic diagram of implementing spatial-consistency and average regional information divergence minimization via quantum-behaved particle swarm optimization (SC-ARID-QPSO).

1 (.
overlapping regions, the intensity vectors of local neighborhood region with radius r are From information theory and spectral information divergence (SID)[49], we can define the region information divergence (RID) between v

Figure 1 .
Figure 1.A graphic diagram of implementing spatial-consistency and average regional information divergence minimization via quantum-behaved particle swarm optimization (SC-ARID-QPSO).

j
is the jth initial registration parameter and discrepancy values DVs set V initial consistent feature point set FS j * is selected based on minimum ARID, denoted by ARID

3. 2 . 1 ...,
Initial Parameters of QPSO Generated by the Consistent Feature Set Obtained in Component I 1. Find the coordinate of feature points in FS j * by randomly tuning parameter b to generate tuned feature point sets S tuned feature = FS j * The first M particles, which have smaller DVs than ARID (0) min , are used as initial particle swarms P If there are not M particle satisfying the DVs are less than ARID (0) min , the number of tuned feature set M should be increased.The initial particle swarms are also used as initial local optimum particles P denoted by ARID min , as the global optimum particle P (0) g .Remote Sens. 2020, 12, 3066 6 of 23 3.2.2.Parameters Optimized by Minimizing ARID via QPSO QPSO to optimize registration parameters: While k < N max do; Calculate β using Equation (7) and mbest (k−1) using Equation (4).For m = 1:M do For d = 1:D do Calculate p

FigureFigure 4 .
Figure Iterative plots for simulated images 1-3.(a) SVs and RMSE plots for simulated image 1.(b) SVs and RMSE plots for simulated image 2. (c) SVs and RMSE plots for simulated image 3.
RMSE corresponding to 1.401 pixels.On the other hand, * RANSAC 32 j 

Figure 7 .
Figure 7. Iterative plots for real images 1-3.(a) SVs and RMSE plots for real image 1.(b) SVs and RMSE plots for real image 2. (c) SVs and RMSE plots for real image 3.

Figure 9 .
Figure 9.Comparison of final RMSEs using SC-ARID-QPSO, SC-ARID, RANSAC, LLT, LPM and PSSC with different transformation.(a) Set-A data sets with affine deformation; (b) Set-B data sets with perspective deformation

Figure 9 .
Figure 9.Comparison of final RMSEs using SC-ARID-QPSO, SC-ARID, RANSAC, LLT, LPM and PSSC with different transformation.(a) Set-A data sets with affine deformation; (b) Set-B data sets with perspective deformation.

Table 1 .
Comparison of the final RMSEs for simulated image pairs.

Table 2 .
Comparison of the final RMSEs for real image pairs.

Table 1 .
Comparison of the final RMSEs for simulated image pairs.

Table 2 .
Comparison of the final RMSEs for real image pairs.