Improving Co-registration for Sentinel-1 SAR and Sentinel-2 Optical images

Co-registering the Sentinel-1 SAR and Sentinel-2 optical data of European Space Agency (ESA) is of great importance for many remote sensing applications. However, we find that there are evident misregistration shifts between the Sentinel-1 SAR and Sentinel-2 optical images that are directly downloaded from the official website. To address that, this paper presents a fast and effective registration method for the two types of images. In the proposed method, a block-based scheme is first designed to extract evenly distributed interest points. Then the correspondences are detected by using the similarity of structural features between the SAR and optical images, where the three dimension (3D) phase correlation (PC) is used as the similarity measure for accelerating image matching. Finally, the obtained correspondences are employed to measure the misregistration shifts between the images. Moreover, to eliminate the misregistration, we use some representative geometric transformation models such as polynomial models, projective models, and rational function models for the co-registration of the two types of images, and compare and analyze their registration accuracy under different numbers of control points and different terrains. Six pairs of the Sentinel-1 SAR L1 and Sentinel-2 optical L1C images covering three different terrains are tested in our experiments. Experimental results show that the proposed method can achieve precise correspondences between the images, and the 3rd. Order polynomial achieves the most satisfactory registration results. Its registration accuracy of the flat areas is less than 1.0 10m pixels, and that of the hilly areas is about 1.5 10m pixels, and that of the mountainous areas is between 1.7 and 2.3 10m pixels, which significantly improves the co-registration accuracy of the Sentinel-1 SAR and Sentinel-2 optical images.


Introduction
Sentinel-1 and 2 are special satellite series of European Copernicus program created by the European Space Agency (ESA).Equipped with new space sensors, Sentinel serial satellites carry out global observation with high revisit rate, and use special wave bands to monitor the Earth surface.At present, Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 Multiple Spectral Image (MSI) sensors have steadily provided products to the public.The public can download their original data (Level-0) and preprocessed data products (Level-1) free of charge.Although the Sentinel-1 SAR Level-1 (L1) and Sentinel-2 optical Level-1C (L1C) image productions have undergone geometric calibration and rectification, there are still evident misregistration shifts between the images due to their large differences in imaging mechanism.It is necessary to perform further co-registration for the two types of images.This is also a prerequisite step to integrate their image information for the subsequent image processing and analysis tasks such as image fusion [1][2][3][4], change detection [5], biological information estimation [6], land cover monitoring [7], classification application [8], etc.Therefore, this research has practical significance in promoting the comprehensive application of the Sentinel SAR and optical products.
Image registration aims to align two or more images acquired by different sensors or on different dates, which mainly include two steps [9]: image matching and geometric correction.Image matching is to detect correspondences or control points (CPs) between images, while geometric correction is to determine a geometric transform using CPs and perform image rectification.
The correspondence detection between the Sentienl-1 SAR and Sentinel-2 optical images belongs to the category of multimodal image matching because they acquire images in different spectral ranges, which results in significant nonlinear radiation or intensity differences between them.In general, multimodal image matching techniques can be divided into feature-based approaches and area-based approaches [10].In feature-based approaches, salient features such as corners, line intersections, edges, or boundaries of objects are first detected, and then are matched by means of their similarity to find correspondences between images.Currently, Feature-based approaches mainly utilize some invariant feature detectors and descriptors such as Harris [11], SIFT (Scale Invariant Feature Transform) [12], SURF (Speeded Up Robust Features) [13], and their improved versions for image matching [14,15].However, these features are sensitive to complex nonlinear radiometric changes [16], which makes their matching performance degraded.Area-based methods usually employ a template matching manner to detect correspondences using some similarity measures such as the SSD (Sum of Squared Differences), the NCC (Normalized Cross Correlation), MI (Mutual Information), etc. SSD and NCC are vulnerable to multimodal image matching because they cannot effectively handle nonlinear radiometric differences [10,17].MI can tolerate radiometric differences to a certain extent [18], but it is difficult to meet the requirement of real time processing due to its large computational cost [17].Recently, structure and shape features, such as HOG (Histogram of Orientated Gradient) [19], LSS (Local Self-Similarity) [20], HOPC (Histogram of Orientated Phase Congruency) [21] and their improved versions [16,22,23], are integrated as similarity measures for multimodal matching.Since they can capture common properties between multimodal images, they are robust to nonlinear radiometric changes [24].Accordingly, the area-based methods based on structure and shape features are suitable for the matching of the Sentinel-1 SAR and Sentinel-2 optical images.The main problem of these methods is how to raise their computational efficiency and make the achieved correspondences uniformly distributed over images, which will be addressed in this document.
For remote sensing image rectification, it is a procedure to use rigorous and non-rigorous mathematical models to align the sense images with the reference images.Image rectification aims to find the optimal mathematical transformation model that can fit geometric distortions between images.In general, the precise geometric correction of remote sensing images is performed by using rigorous mathematical models and orbit ephemeris parameters of satellites.However, the orbit ephemeris parameters are not available for ordinary users.Accordingly, non-rigorous or empirical mathematical models are often used for remote sensing image registration.These models mainly include Direct Linear Transformation (DLT), affine transformation models, polynomial models, Rational Function Models (RFMs), projective transformation models, etc. El-Manadili and Novak suggested the use of DLT for geometric correction of SPOT images [25].Okamoto et al., discussed affine models for SPOT level 1 and 2 stereo scenes and achieves good results [26].Subsequently, geometric correction based on polynomial and projective models is performed on high resolution images such as SPOT and IKONOS images, and the correction accuracy is evaluated under different number of CPs and different terrains [27][28][29].Also, RFMs are widely applied for geometric correction and 3D reconstruction of high resolution images [30][31][32][33][34][35].These research illustrates that the nonrigorous or empirical geometric models can be used for precise correction of remote sensing images.Accordingly, these empirical models will be explored for the co-registration of the Sentinel-1 SAR and Sentinel-2 optical images in this document.
Sentinel serial satellites have provided a large number of time serial products since they are put into operation.Many researchers have carried out relevant registration research and geolocation accuracy analysis on the published product data.For example, Schubert et al. and Languille et al. presented first results of geolocation assessment for Sentinel-1A SAR images and Sentinel-2A optical images, respectively [36,37].Their results show that both the SAR L1 productions and the optical L1C productions meet their geolocation accuracy specifications defined by ESA.Then, the geolocation accuracy of Sentinel-1A and 1B SAR L1 productions are further validated and improved using high precise corner reflectors [38,39] Meanwhile, Yan et al. characterized the misregistration errors and proposed a sub-pixel registration method for Sentinel-2A multi-temporal images [40].In addition, some researchers also conducted the registration tests between Sentinel-2 optical images and other satellite images such as Landsat-8 images.Barazzetti et al. analyzed the co-registration accuracy of the Sentinel-2 and Landsat-8 optical images [41].Then, Yan et al. proposed an automatic sub-pixel registration method for the two types of images [42].Subsequently, Skakun et al. explored a phase correlation method for CP detection, and evaluated a variety of geometric transformation models for the co-registration of Sentinel-2 and Landsat-8 optical images [43].Recently, Stumpf et al. improved the registration accuracy of Sentinel-2 and Landsat-8 optical images for Earth surface motion measurements by a dense matching method [44].
When we use the Sentinel-1 SAR L1 images and Sentinel-2 optical L1C images which are directly downloaded from the official website1 , it can be observed that the two types of images have significant misregistration shifts.This is because the Sentinel-1 SAR images have not undergone the terrain correction, and the side-looking geometry of the SAR images results in significant geometric distortions with respect to Sentinel-2 the optical images.Current methods main perform the coregistration of the Sentinel-1 SAR and Sentinel-2 optical images by using the Sentinel Application Platform (SNAP) toolbox2 to carry out the terrain correction process for Sentinel-1 SAR images [1][2][3], and assume that such process can align the two types of images well [1].However, the registration accuracy is quite satisfactory because the data instructions given by ESA [45,46] illustrate that the Sentienl-1 SAR L1 productions and the Sentinel-2 optical L1C productions have a geolocation accuracy of 7m (3 ) and 12m (3 ), respectively.In other words, the two types of images may have a registration error of about 2 10m pixels.Moreover, the terrain correction process is quite time consuming which cannot meet real-time applications.To address that, this paper presents a fast and robust matching method for the directly downloaded Sentinel-1 and Sentinel-2 images , which is used to measure their misregistration shifts and determine an optimum geometric transformation model for their co-registration.Firstly, we propose a block-based image matching scheme based on structural and shape features.In this procedure, a blocked extraction strategy with the Features from Accelerated Segment Test (FAST) operator is designed to detect evenly distributed interest points from images.Then structure features are extracted by using gradient information of images, and the three dimension (3D) phase correlation (PC) is used as the similarity measure for detecting correspondences by a template matching manner.The proposed matching scheme could insure the uniform distribution of correspondences by the block extraction strategy, handle nonlinear radiometric differences using the similarity of structure features, and accelerate image matching because of the use of 3D PC.Subsequently, these correspondences are used to measure and analyze the misregistration shifts between the Sentinel SAR and optical images.Meantime, we also compare the registration accuracy of the current representative geometric transformation models (such as polynomials, projective transformations and RFMs), and analyze various acquisition factors that influence their performance (e.g.terrain, number of CPs, etc.).Finally, we determine an optimal geometric transformation model for the co-registration of the Sentinel SAR and optical images, and improve the registration accuracy of such images.The main contributions of this paper are as follows.
(1) We design a block-based matching scheme based on structural features to detect evenly distributed correspondences between the Sentinel-1 SAR L1 images and the Sentinel-2 optical L1C images, which is computationally effective and is robust to nonlinear radiometric differences.(2) We precisely measure the misregistration shifts between the Sentinel SAR and optical images.
(3) We compare and analyze the registration accuracy of various geometric transformation model, and determine an optimal model for the co-registration of the Sentinel SAR and optical images.This paper is structured as follows.Section 2 first gives the introduction for the Sentinel-1 SAR and Sentinel-2 optical image productions, and then Section 3 describes the proposed co-registration method for the two types of images.Subsequently, quantitative experimental results are presented in Section 4. Finally, we conclude with a discussion of the results and a recommendation for future work.

Sentinel-1 SAR and Sentinel-2 optical data introduction
The Sentinel-1 SAR instrument provides a long-term Earth observation with high reliability and improved revisit time at C-band, which captures data in four operation modes [47]: Stripmap (SM), Interferometric Wide swath (IW), Extra Wide swath (EW), and Wave (WV).The IW mode is the primary acquisition mode over land and satisfies the majority of service requirements.It provides a large swath of 250km at 5  20m spatial resolution (single look).The IW L1 productions include the Single Look Complex (SLC) data and the Ground Range Detected (GRD) data.The IW L1 GRD data are the multi-looked intensity or gray images, which have the similar visual features with optical images and are available in two resolutions (20  22m and 88  87m).The data in the two resolution are respectively resampled at the pixel spacing of 10  10m and 40  40m, and then they are distributed to users.In this paper, the 10m data are used as it provides more spatial detail than the 40m data.The Sentinel-1 IW L1 GRD 10m productions are provided in geolocated tiles of about 25000  16000 pixels in the geographic longitude/latitude coordinate in the World Geodetic System 84 (WGS84) datum.The downloaded Sentinel-1 L1 GRD images have not undergone the terrain correction, which results in they have significant geometric distortions related to their side-looking geometry.
The Sentinel-2 carries the MSI that acquires data at 13 spectral bands: 4 visible and near-infrared bands with 10m, 6 near-infrared, red edge and short wave infrared bands with 20m, and 3 bands with 60m resolution.Similar to Sentinel-1, the Sentinel-2 10m bands are used in this paper because of its higher resolution compared with the other bands.The Sentinel-2 MSI covers a field of view providing a 290km swath and performs a systematic global observation with high revisit frequency.The processed L1C 10m productions are available in geolocated tiles of 10980  10980 pixels with the UTM projection in the WGS84 datum.
According to the above description, the Sentinel-1 IW L1 GRD 10m productions and the Sentinel-2 L1C optical 10m productions present fine image details and have the same resolution, which makes the two types of images easily intergraded for the following remote sensing image application.Therefore, this paper carries out the co-registration between the Sentinel-1 SAR L1 IW GRD 10 images and the Sentinel-2 optical L1C 10m images.

Detection of correspondences or CPs
The first step of registering the Sentinel SAR and optical images is to achieve reliable and precise correspondences or CPs between such images.Due to different imaging modes, The Sentinel SAR and optical images have significant nonlinear radiometric differences, that is, the images present quite different intensity and texture information.To address that, we employ structure features for correspondence detection because structure properties can be preserved between SAR and optical images in spite of significant radiometric differences [16,21,48].Moreover, to rapidly extract evenly distributed correspondences, this paper designs a block-based matching strategy and perform image matching in frequency domain for acceleration.Meanwhile, this matching scheme also fully considers the characteristics of Sentinel images such as large data volume (more than 10000*10000 pixel) and initial georeference information.Generally speaking, the proposed matching method consists of four steps: interest point detection, structure feature extraction, interest point matching, and mismatch elimination, where the Sentinel-2 optical image is used as the reference image and the Sentinel-1 SAR image is used as the sensed image.

Interest point detection
Currently, there are many classical operators to detect interest points such as Moravec [49], Harris [11], Difference of Gaussians (DoG) [12], the Features from Accelerated Segment Test (FAST) [50] and so on.Given the decisive advantage that is its faster high computation speed than others, the FAST operator, therefore, is employed to detect interest points in this paper.
In order to detect evenly distributed interest points in the reference image, we design a block detection strategy on based the FAST operator, which named the block-based FAST operator.Specifically, the image is first divided into N N  non-overlapping blocks, and the FAST value is computed for each pixel of every block.Then K pixel points with the largest FAST values are selected as the interest points in a block.As a result, we can obtain

Structure feature extraction
Once a number of interest points are detected in the reference image, we determine a template window centered on an interest point, and predict its corresponding search window in the sensed image by the initial georeference information.Then we extract structure features for both the template and search windows, and use them for correspondence detection by a template matching manner.The selection of structural feature descriptor is crucial for image matching, and it needs to consider both the robustness and computational efficiency.Accordingly, a recently published structural feature descriptor [24] named CFOG (Channel features of Orientated Gradients), is employed for correspondence detection because it presents faster and more robust matching performance than other state of the art structural feature descriptors such as HOG, LSS and HOPC.CFOG is built based on dense orientated gradient features of images, and its construction process (see Figure 2) mainly including two part: orientated gradient channels and 3D Gaussian convolution.Given an image, the m orientated gradient channels of the image first are calculated.The i g is used to represent each orientated gradient channel 1 im  .In the actual calculation, the gradient amplitudes in the horizontal and vertical directions are used to calculate the value of each orientated gradient channel, which are denoted as x g and y g , respectively, which can be expressed by the following equation: Where,  represents the orientation of the gradient, abs represents the absolute value, and its purpose is to limit the gradient direction within the range of 0° to 180°.After that, the feature channel of convolution is realized by using a 3D-like Gaussian kernel.
Where,  denotes the convolution operation, G denotes the Gaussian function, and  is its standard deviation.The aim of such operation is to integrate orientated gradient information of neighborhood for feature description, which can improve the robustness to noise and nonlinear radiometric differences.

Interest point matching
CFOG yields the 3D pixel-wise descriptors with large data volume for both the template and search windows.It is usually time consuming if using these descriptors to perform the template matching by the traditional similarity measures (e.g., SSD and NCC) in spatial domain.Accordingly, we carry out the similarity evaluation in frequency domain, where the 3D PC is used as the similarity metric.The 3D PC can effectively speed up the image matching by using the Fast Fourier Transform (FFT) because the dot product operation in frequency domain is equal to the convolution operation in spatial domain.The details of the matching technique are as follows.
Given the CFOG descriptors of the template and search windows, their relationship is expressed as follows: Where ( , , ) Their normalized cross-power spectrum can be expressed as: In the above equation, * denotes the complex conjugate.According to the translation theory, a Dirac delta function

Mismatch elimination
Due to some occlusions such as clouds and shadow, it is inevitable to cause some correspondences with large errors during the matching process.Therefore, it is necessary to eliminate these mismatches to improve the matching quality.In this paper, the Random Sample Consensus (RANSAC) algorithm is applied to remove these outliers [51], achieving the final correspondences between the Sentinel SAR and optical images.After that, these correct correspondences are used for the subsequent analysis of geometric transformation models.

Mathematical models of geometric transformation
After sufficient CPs are obtained between the Sentinel SAR and optical images, it is crucial to determine an optimum geometric transformation model for the co-registration of such images.Accordingly, we investigate several representative transformation models, and analyze their registration accuracy.These models include polynomial models, projective models, and RFMs, which are described in details as follows.

Polynomial models
Image correction based on polynomials does not take into account the geometric process of spatial imaging, and deals with image deformation by means of a mathematical simulation.Geometric distortions of remote sensing images are often quite complicated because they are caused by a variety of factors such as imaging models, Earth curvature, atmospheric refraction, etc.In general, these distortions can be considered as the combined effect of translation, rotation, scaling, deflection, bending, affine and higher-order based deformation.It is difficult to build a rigorous mathematical model to correct these distortions, whereas an appropriate polynomial can be used to fit the geometric relationship between images.Due to their simplicity and availability, polynomials have been widely applied for image correction by many commercial software packages.Common polynomials include the 1st.Order polynomial, the 2nd.Order polynomials, the 3rd.Order polynomials and higher-order polynomials.The general two-dimensional polynomial model is expressed as follows: Where ( , ) xy are the coordinates of the sensed image, ( , ) XY are the coordinates of the reference image, and ( , ) ab are the polynomial parameters.The minimum number N of correspondences for solving the parameters is related to the order n of polynomials, which is calculated by

Projective models
The projective transformation means that if a straight line in one image is mapped to another image, it is still a straight line, but the parallel relation between lines cannot be maintained.Twodimensional plane projective model is a linear transformation of homogeneous three-dimensional vectors.Under homogeneous coordinate system, the projective transformation can be described in the following nonsingular 3x3 matrix form. 1 By setting 8) is rewritten as With the development of geometric transformation models, the projective transformation has been extended by changing the properties of same denominator and increasing the number of parameters, thus forming some flexible projective models for remote sensing image correction.These projective models are as follows.
The projective model of 10 Parameters is The projective model of 22 Parameters is The projective model of 38 Parameters is

RFM
Nowadays, the RFM is one of the commonly used geometric transformation models for remote sensing image correction.Lots of research has shown that the RFM can be used as a generalized sensor model, which provides an exact approximation of rigorous sensor models for many satellite images, especially for high-resolution satellite images.This means that the RFM can replace rigorous senor models for precise geometric correction of high-resolution images.Due to its generalization, the RFM has been used as a standard data transfer format for high-resolution images.The RFM relates the object space ( , , ) X Y Z coordinates to image space ( , ) rc coordinates in the form of rational functions that are ratios of two polynomials.The generic form of the RFM is given as: ( , , , ) P P P P denote the polynomials of ( , , ) n n n X Y Z , and their highest order is often limited to 3. In such a case, the polynomial has the following form.
( ) Where The RFM has two types with same denominator (i.e., P2 = P4) and different denominator (i.e., P2 ≠ P4).The minimum number of CPs is determined by the order of this model, which is given in Table 1.It is worth noting that the RFM is a 3 three-dimensional polynomial model when P2 = P4 = 1.

Image co-registration
Based on the above matching approach and geometric transformation models, this paper proposes a technical solution to measure the misregistration shifts and improve the co-registration accuracy between the Sentinel SAR and optical images.Figure 3 shows the technique solution, which includes the following steps.

Data preprocessing
According to the data description in Section 2, the Sentinel-1 SAR L1 images and Sentinel-2 optical L1C images have different projection coordinate systems and different sizes.Accordingly, a reprojection and a crop operation are carried out for the two types of images before co-registration.In this study, the SAR images are reprojected into the coordinate system (i.e., UTM and WGS84) of the optical images.Considering that the SAR images have some geolocation errors, and their size are larger than the optical images, we crop the SAR images by using the geographical range of the optical images with adding some margin (e.g. 100 pixels) in each direction.Such process ensure that the two types of images can cover the same geographical range.

Image matching
We detect sufficient and evenly distributed correspondences between images by the technique described in Section 3.1.Some of these correspondences are selected as CPs which are used to calculate the parameters of geometric transformation models, while others are used as checkpoints to evaluate the registration accuracy of these models.

Misregistration measurement
The correspondences between the Sentinel SAR and optical images should have the same geographical coordinates if the two types of images are co-registered exactly.Based on this precondition, the differences of geographical coordinates of the correspondences correspond to the misregistration shifts between the Sentinel SAR and optical images.Accordingly, the misregistration shift is quantified as: is the mean misregistration shift between the SAR and optical images, and there are a total of m correspondences.

Parameter calculation of geometric models
Based on the obtained CPs, we calculate the parameters of the employed geometric models including polynomial models, projective models, and RFMs using the least square method.Meanwhile, the Digital Elevation Model (DEM) is introduced for the parameter calculation of RFMs.

Accuracy analysis and evolution
We adjust the number of CPs to perform the registration experiment, and then compute the root mean square errors (RMSEs, Equation 16) of checkpoints for accuracy evaluation of each geometric model.Finally, the optimal geometric model is determined for the co-registration of the SAR and optical images.

(
) ( ) Where T denotes the geometric transformation model, and N is the number of checkpoints.

Experimental data
We select six pairs of Sentinel-1 SAR L1 images and Sentinel-2 optical L1C images for the experiment.These images locate in China and Europe (see Figure 4), and cover three types of different terrains such as flat areas, hilly areas, and mountainous areas.Each type includes two pairs of images.These images are almost cloud-free, which makes the applied matching technique reliable to detect correspondences between such images.Table 2 gives the detail description of experimental data.In addition, 30m DEM data (from ASTER GDEM) are used as the elevation reference data for image correction.All these experimental data are shown in Figure 5.  Image locates in southeastern of Poland, with Kielce city as the center.From the DEM map in Figure 5(f), the mountain range crosses the image area diagonally.
Although there are no obvious peaks, but the elevation distribution is very complex, floating around 190-500 meters, with a maximum elevation difference of 300 meters, belonging to the mountain terrain.

Image matching
In the image matching, the optical image is first divided into 20  20 girds and one FAST interest point is detected in each grid, reaching a total of 400 interest points.Then their match points in the SAR image is achieved by using CFOG and 3D PC with a template matching manner, where the template window is set to 100  100 pixels, and the search window is set to 200  200 pixels.Finally, the RANSAC algorithm is employed to remove the outliers to obtain the correspondences.The matching process is performed on a personal computer (PC) with Intel Core i7-10700KF CPU 3.80 GHz.Table 3 shows that the number of the interest points, the correspondences and the run time.It can be found that these image pairs achieve different number of correspondences because of their different image characteristics.Specifically, the images covering flat areas obtain more correspondences than the images covering mountain areas.This may be attributed to that the images of mountain areas have more significant local geometric distortions than these of flat areas.These geometric distortions results in the extracted structural features between the images cannot correspond well, which degrades the matching performance to some degree.In addition, the run time is about 6.6 seconds for all the image pairs, which demonstrates the high computational efficiency of the proposed match method.Overall, our match approach is fast and reliable for the matching of the Sentinel SAR and optical images, and the obtained correspondences for all the tested image pairs are sufficient for their co-registration.To make a fair and reasonable test, it is necessary to use the same number of correspondences to perform the following misregistration measurement and compare the registration accuracy of different transformation models.Accordingly, we select 143 correspondences with least residuals for each image pair in the following tests.

Misregistration measurement
According to the obtained 143 correspondences in the previous subsection, we use their mean offsets of match locations to compute the misregistration shifts x  , y  and s  (see Equation 15).
Figure 6 shows the misregistration shifts of the six pairs of images covering three different terrains.
In general, the misregistration shifts presents a large difference for different terrains.The misregistration shifts are kept at about 20-30 pixels in the flat areas and about 20-40 pixels in hilly areas, respectively.While in the mountainous areas with large topographic relief, the misregistration shifts increases sharply to about 50-60 pixels.This is because that Sentinel SAR and optical images have different imaging geometry models, where the SAR sensor acquires data by a side-look imaging way while the optical sensor captures data by a push-broom imaging way.This makes geometric distortions between them more significant with the increase of terrain fluctuation.

Figure 6．Statistics of mean misregistration shifts (units pixels) of correspondences for different terrains
Table 4 gives the statistics of the misregistration shifts for each image pair.We can see that all these images pairs have the large standard deviations (STDs), which means that the misregistration shifts are unevenly distributed across the images, that is, the misregistration shifts are quite different for the correspondences in different regions of each image pair (see Figure 7).The results indicate that these images have significant nonlinear geometric distortions, and different image regions correspond to different misregistration shifts.The main reason for that is the Sentinel-1 SAR images have not undergone the terrain correction.The above results demonstrate that there are evident misregistration shifts between the Sentinel-1 SAR and Sentinel-2 optical images.Accordingly, a further co-registration process for the SAR and optical images is presented in the following.

Co-registration accuracy analysis and evaluation
In this subsection, we will evaluate the registration accuracy of different geometric transformation models for the Sentinel-1 SAR and Sentinel-2 optical images.Firstly, the obtained 143 correspondences (in subsection 4.2) are divided into checkpoints and CPs (see Figure 8), where 48 correspondences are used as checkpoints and the others are used as CPs (their number is variable).Then, we compare the performance of the polynomials (from 1st up to 5st order), the projective transformations (for 10 parameters up to 38 parameters), and the RFMs (from 1st up to 3rd) under different number of CPs (from 25 to 95) and different terrains.Finally, an optimum geometric model is determined for the co-registration of the two types of images.The experimental analysis for different terrains is given in the following.

Accuracy analysis of flat areas
Let us first analyze the registration accuracy of the polynomial models.It can be found from Table 5 that the RMSEs of the 1st.Order polynomial are more than 10 pixels for the two flat areas, which are much larger than these of the other polynomials.Accordingly, we mainly compare the registration accuracy of the polynomials (from 2nd.Order to 5th.Order) because their RMSEs are relatively close.Figure 9 shows the checkpoint RMSEs versus the number of CPs for these polynomials.It can be observed that the 2nd.Order and the 3rd.Order polynomials present the stable performance when the number of CPs changes.The 3rd Order polynomial achieves higher registration accuracy than the 2nd.Order polynomial, and its accuracy are kept at about 0.4 pixels for study area 1 and about 0.75 pixels for study area 2, respectively.For the 4th.Order and 5th.Order polynomials, although their accuracy is close to that of the 3rd.Order polynomial under a large number of control points (e.g., more than 80), their performance has a significant fluctuation with the change of the number of CPs.Moreover, they require more CPs for image registration compared with the 3rd.Order polynomial, which will increase computational cost and is not beneficial to practical application.Based on the above analysis, the 3rd.Order polynomial performs better compared with the other polynomials.For projective models and RFMs, we can see from Table 5 that the 22-parameter and 38parameter projective models perform better than the other projective models, and the 2nd.Order and 3rd.Order RFMs (different denominator) achieves higher accuracy than the others.Accordingly, these models are compared with the 3rd.Order polynomial.Figure 10 shows their checkpoint RMSEs versus the number of CPs.It can be clearly observed that the 3rd.Order polynomial achieve the smallest RMSEs under the any amount of CPs among these models.Moreover, the performance of the 3rd.Order polynomial are more stable than that of the other models when the number of CPs changes.These results demonstrate that the 3rd.Order polynomial is the best geometric transformation model for the co-registration of the Sentinel-1 SAR and Sentinel-2 optical images in the flat areas.

Accuracy analysis of hilly areas
The experimental analysis is the similar to that in the flat areas.We also first compare and analyze the registration accuracy of the polynomials.From Table 6, we can see that the 1st Order polynomial performs much worse than the other polynomials.Accordingly, the comparative analysis focuses on the polynomials of 2nd.Order to 5th.Order.Figure 11 shows the checkpoint RMSEs versus the number of CPs for these polynomials.It can be seen that the 2nd.Order polynomial obtains lower accuracy than the other polynomials, while the 3rd.Order, the 4th.Order, and the 5th.Order polynomials achieve the similar accuracy.However, in study area 4, the accuracy of the 4th.Order polynomial has a sharp fluctuation when the number of CPs changes (see Figure 11(b)), which shows its instability for image registration.The 3rd.Order and the 5th.Order polynomials present the stable performance and have the same level of accuracy, but the 3rd.Order polynomial requires less CPs and is more computational efficient than the 5th.Order polynomial.Accordingly, the 3rd.Order polynomial is more preferable than the other polynomials.For the other geometric models, the ones with higher accuracy are selected to compare with the 3rd.Order polynomial.We can see from Table 6 that these models are the 22-parameter projective transformation, the 38-paremeter projective transformation, the 2nd.Order and the 3rd.Order RFMs (different denominator).Figure 12 shows the checkpoint RMSEs versus the number of CPs for these models.Apparently, the 3rd.Order polynomial achieves a registration accuracy of about 1.5 pixels for both study area 3 and area 4, and outperforms the other models under any mount of CPs.These results confirm that the 3rd Order polynomial is optimal among these models for the co-registration of the Sentinel-1 SAR and Sentinel-2 optical images in the hilly areas.

Accuracy analysis of mountainous areas
Figure 13 shows the checkpoint RMSEs versus the number of CPs for the polynomials.We can see that the 2nd.Order polynomial perform much worse than the other polynomials.The accuracy of the 5th.Order polynomial has a significant fluctuation when the number of control points changes.By comparison, the 3rd.Order and the 4th.Order polynomials have more stable performance and achieves higher accuracy.Compared the 4th.Order polynomial, the performance of the 3rd.Order polynomial is more stable for study areas 6, and it requires less CPs.Accordingly, the 3rd.Order polynomial is more suitable than the other polynomials for the co-registration of these images.The same as the analysis scheme in the flat areas and hilly areas, we choose other several models with higher accuracy from Table 7, and compare them with the 3rd.Order polynomial.Figure 14 shows the checkpoint RMSEs versus the number of CPs for these models.For study area 5, the accuracy of the 3rd.Order polynomial is about 1.7 pixels, which is much better than that of the other models.In the case of study area 6, the accuracy of the 3rd.RFM model improves with the increase of the number of CPs, and its accuracy is close to that of the 3rd.Order polynomial when the number of CPs reaches 95.However, the 3rd.Order polynomial still yields slightly better accuracy than the 3rd.RFM model.The accuracy is about 2.3 pixels.Furthermore, the 3rd.Order polynomial require less control points compared with the 3rd.RFM model.Therefore, the 3rd.Order polynomial outperforms the other geometric transformation models for the co-registration of the Sentinel-1 SAR and Sentinel-1 optical images in the mountainous areas.

Evaluation of registration results
Based on the above experimental analysis, the 3rd.Order polynomial is the optimal geometric transformation model for the co-registration of the Sentinel-1 SAR and Sentinel-2 optical images, and thereby this model is used to perform the image registration.This process just takes about 3 seconds for one image pair by using our PC.Since the previous image matching process takes about 6.7 seconds, the proposed method costs less than 10 seconds to complete the whole co-registration.
To illustrate the advantage of the proposed method, it is compared with the terrain correction process using the SNAP toolbox.Table 8 gives the registration accuracy and run time of the two methods.We can see that the proposed method outperforms the terrain correction process for most test cases in registration accuracy, especially for the images covering flat and hilly areas.Moreover, the proposed method is almost 100 times faster than the terrain correction process, which is quite beneficial for the response of emergency events such as earthquakes and floods.The registration results is shown in Figure 15.It can be observed that the Sentinel-1 SAR and Sentinel-2 optical images have been aligned well.

Discussions and Conclusions
This paper first proposed a fast and robust block-based match scheme based on structural features and 3D PC for correspondence detection between the Sentinel-1 SAR L1 IW GRD 10m images and the Sentinel-2 optical L1C 10m images.Then the obtained correspondences are used to measure the misregistration shifts of the two types of images and analyse the performance of various geometric transformation models for their co-registration.Finally, an optimal geometric model is determined to improve the registration accuracy.
In the image matching, we apply the various techniques to ensure the matching quality.The block-based scheme aims to make the correspondences evenly distributed over the images, the CFOG structural feature descriptor can handle significant radiometric differences between the images, and 3D PC greatly improves the computational efficiency.The mismatches are removed by using the RANSAC algorithm.Accordingly, the obtained correspondences are reliable for the misregistration measurement and the accuracy evaluation of geometric transformation models.
The misregistration is measured by computing the geometric shifts of correspondences between the two types of images.Six pairs of images covering different terrains are matched and used for the misregistration measurement, as well as for the subsequent co-registration tests.Experimental results show that the misregistration shifts between them are about between 20 and 60 10m pixels, and the shifts vary drastically in different regions of images.Furthermore, the misregistration increases with the increase of topographic relief.Specifically speaking, the misregistration of the flat areas is kept at 20-30 10m pixels, and that of the hilly areas is kept at 20-40 10m pixels, and that of mountainous areas increase to 50-60 10m pixels.Such large of registration shifts result in that the two types of images cannot be effectively integrated for the subsequent remote sensing applications without a precise coregistration.Since only the six pairs of images are used to measure the misregistration shifts, the shifts may be different when using the images located in other areas and acquired at other times.However, this research finds that the misregistration shifts are common between the Sentinel-1 SAR L1 and Sentinel-2 optical L1C images.Moreover, the larger topographic relief is, the larger the misregistration shifts are.
To improve the co-registration accuracy between the Sentinel-1 SAR and Sentinel-2 optical images, we compare the performance of a variety of geometric transformation models including polynomials, projective models, and RFMs.In general, considering some factors such as registration accuracy, required number of CPs, and stability of performance, the 3rd.Order polynomial performs better than the other models, while the projective models don't achieve the satisfactory registration accuracy.For the RFMs, especially the 3rd.Order RFM, although it considers the influence of terrain height and has been widely used for the correction and registration of high-resolution images, its performance is not quite satisfactory for the co-registration of the Sentinel SAR and optical images.The main reason may be that the parameters of the RFMs are calculated by plenty of CPs in the case of high-resolution images [31].Whereas in our experiments, the number of CPs are limited within 95, which may not be able to provide the exact solution for the RFM parameters.Accordingly, the accuracy of the RFMs could be improved by increasing the number of CPs.In a word, our experiments show that the 3rd.Order polynomial is the optimal geometrical model under a limited number of CPs (e.g., less than 95) for the co-registration of the Sentinel-1 SAR and Sentinel-2 optical images.
Compared with the co-registration process pipeline that the terrain correction is applied on the Sentinel-1 SAR images by using the SNAP toolbox, the proposed method is almost 100 times faster, and obtains higher registration accuracy for most tested image pairs, especially for the images covering flat and hilly areas.Therefore, the proposed method can effectively improve the coregistration for the Sentinel SAR and optical images, and also can meet the requirement of real-time processing.In addition, the registration accuracy of the proposed method varies for different terrains.When using the optimal geometric model (i.e., the 3rd.Order polynomial), The flat areas achieve the highest accuracy that reaches the sub-pixel (probably 0.4 to 0.75 pixels), followed by the hilly areas which achieve an accuracy of about 1.5 pixels, whereas the mountainous areas have the lowest accuracy which is between 1.7 and 2.3 pixels.These results illustrate that the registration accuracy decrease with the increase of topographic relief.This is because large topographic relief increases local geometric distortions between images, and the 3rd.Order polynomial only can approximately fit these distortions.To track this problem, a possible solution is to perform a true ortho-rectification using the rigorous physical models and orbit ephemeris parameters of Sentinel-1 and Sentinel-2 satellites.Meanwhile, a high resolution DEM data also should be used in this process.These work will be carried out in future research.Moreover, we also will test some non-rigid geometric transformation models such as the piecewise-linear and thin-plate-spline functions for the coregistration of the Sentinel-1 SAR and Sentinel-2 optical images.

Figure 1 .
Figure 1.Extraction results of interest points by two different strategies.(a) Initial FAST operator.(b) Block-based FAST operator.

Figure 2 .
Figure 2. The construction process of CFOG.
can be obtained by the inverse Fourier transform of the normalized cross-power spectrum.The function has an obvious sharp peak value at the offset position, while the value of other positions is close to zero.Accordingly, the offset 0 0 ( , ) xy between the template and search windows can be determined by the peak position.
model parameters.The minimum number of CPs for the 10-parameter projective model is 5, that for 22-parameter projective model is 11, and that for the 38-parameter projective model is 14.
model parameters called rational polynomial coefficients.

Figure 3 .
Figure 3. Flowchart of misregistration measurement and improvement for the Sentinel-1 SAR and the Sentinel-2 optical images.

y
are the matched location for the correspondence i between the Sentinel SAR and optical images, , xy are the misregistration shifts (units 10m pixels) in x and y directions for one correspondence, and , xy  are the mean misregistration shifts in x and y directions.s

Figure 4 .
Figure 4. Study areas in China and Europe

Figure 7 .
Figure 7. Distribution of misregistration shifts (units pixels) x  and y  of different correspondences for all the study areas.(a) Study area 1.(b) Study area 2. (c) Study area 3. (d) Study area 4. (e) Study area 5. (f) Study area 6.

Figure 10 .
Figure 10.Checkpoint RMSEs of the 3rd.Order polynomial and other geometric models for the flat areas.(a) Study area 1.(b) Study area 2.

Figure 12 .
Figure 12.Checkpoint RMSEs of the 3rd.Order polynomial and the other geometric models for the hilly areas.(a) Study area 3. (b) Study area 4.

Figure 13 .
Figure 13.Checkpoint RMSEs of the polynomials for the mountainous areas.(a) Study area 5. (b) Study area 6.

Figure 14 .
Figure 14.Checkpoint RMSEs of the 3rd.Order polynomial and the other geometric models for the Mountainous areas.(a) Study area 5. (b) Study area 6.

Table 1 .
Nine cases for the RFM

Table 2 .
Detailed description of experimental data Image locates in the Midwest of France, centered at Tours city.From the DEM map in Figure5(d), the area has many hills and presents an undulating terrain.

Table 3 .
Matching statistics of all study areas

Table 4 .
Statistics of the misregistration shifts (units pixels) of all the study areas

Table 5 .
Checkpoint error statistics (units pixels) of different geometric models in the flat areas, where the number of CPs is 95 and the number of checkpoints is 48

Table 6 .
Checkpoint error statistics (units pixels) of different geometric models in the hilly area, where the number of CPs is 95 and the number of checkpoints is 48.

Table 7 .
Checkpoint error statistics (units pixels) of different geometric models in the mountainous areas, where the number of CPs is 95 and the number of checkpoints is 48.

Table 8 .
Comparison of registration results of the proposed method and the terrain correction process.