Near Real-Time Automatic Sub-Pixel Registration of Panchromatic and Multispectral Images for Pan-Sharpening

: This paper presents a near real-time automatic sub-pixel registration method of high-resolution panchromatic (PAN) and multispectral (MS) images using a graphics processing unit (GPU). In the ﬁrst step, the method uses differential geo-registration to enable accurate geographic registration of PAN and MS images. Differential geo-registration normalizes PAN and MS images to the same direction and scale. There are also some residual misalignments due to the geometrical conﬁguration of the acquisition instruments. These residual misalignments mean the PAN and MS images still have deviations after differential geo-registration. The second step is to use differential rectiﬁcation with tiny facet primitive to eliminate possible residual misalignments. Differential rectiﬁcation corrects the relative internal geometric distortion between PAN and MS images. The computational burden of these two steps is large, and traditional central processing unit (CPU) processing takes a long time. Due to the natural parallelism of the differential methods, these two steps are very suitable for mapping to a GPU for processing, to achieve near real-time processing while ensuring processing accuracy. This paper used GaoFen-6, GaoFen-7, ZiYuan3-02 and SuperView-1 satellite data to conduct an experiment. The experiment showed that our method’s processing accuracy is within 0.5 pixels. The automatic processing time of this method is about 2.5 s for 1 GB output data in the NVIDIA GeForce RTX 2080Ti, which can meet the near real-time processing requirements for most satellites. The method in this paper can quickly achieve high-precision registration of PAN and MS images. It is suitable for different scenes and different sensors. It is extremely robust to registration errors between PAN and MS.


Introduction
From the perspective of how sensors work, remote sensing image data are split into PAN images and MS images [1]. The detector must receive enough energy to maintain the signal-to-noise ratio. The MS single-band receiving energy is weak, and the size of the detector can only be increased, resulting in the spatial resolution of an MS image being lower than that of the PAN image. Although the spatial resolution of PAN images is high, in order to increase the intensity of the energy received, its spectral range is wider and its spectral resolution is lower than those of MS images. In general, a PAN image has relatively high spatial resolution, but the spectral resolution is low; an MS image data adopts a high spectral resolution, and the spatial resolution is low. A pansharpening method combines the advantages of the two: it is a process of merging a high-resolution PAN image and a lower resolution MS image to create a single high-resolution color image [2][3][4].
Due to the complementarity of various datasets, pansharpening technology has become one of the main research topics in satellite image processing. Early methods such as hue-intensity-saturation (HIS) [5], principal component analysis (PCA) [6] and high pass filter (HPF) [7], used in Landsat TM and SPOT, have had achieved great success [8].
matching method is required. In this study, the method of rectification with tiny facet primitive [43] was used for high-precision sub-pixel registration.
Finally, due to the development of remote sensing satellite manufacturing technology in recent years, the amount of satellite imaging single scene data has increased sharply, and applications such as disaster relief and military reconnaissance have extremely high requirements for the efficiency of algorithms. Traditional CPU-based algorithms are very inefficient when processing tens of gigabytes or even hundreds of gigabytes, and often cannot meet real-world demands. There is also an urgent need for automated processing [44,45]. In order to achieve near real-time automatic sub-pixel registration of PAN and MS images, our method maps parallel algorithms to a GPU for processing. The method in this paper can greatly improve the registration efficiency while ensuring accuracy.

Methods
To achieve near real-time automatic registration of high resolution PAN and MS images, the algorithm in this paper was designed with full consideration of parallelism. The differentiation method is one of the best methods for parallel processing, in which the basic theory is to treat a complex model as consisting of many simple models. These simple models can be processed in parallel. They can be efficiently mapped to a GPU for processing. We used differential geo-registration to geometrically register PAN and MS images to obtain co-registered enlarged images, and eliminated potential errors through differential rectification with tiny facet primitive to achieve sub-pixel accurate registration. The entire process can be automated and processed in near real-time. The steps of the algorithm in this paper are as follows: Step 1. Calculate the overlap range of PAN and MS images; Step 2. Differential geo-registration for PAN and MS images; Step 3. Differential rectification with tiny facet primitive for PAN and MS images.
The method in this paper was used as the preprocessing of PAN and MS images before fusion, which can accurately register them to improve the quality of fusion. We assumed that the bands of a given MS image were accurately registered, and only considered the misregistration between PAN and MS images. In addition, only the four bands of visible, near-infrared (VNIR) were considered in this experiment-that is, the bands of blue, green, red, and near-infrared. In fact, as long as the MS bands are accurately registered, no matter how many bands there are, the result should be the same. As the algorithm in this paper uses RGB to proportionally synthesize the pseudo-PAN image and the PAN image to match the control points and rectify all MS bands with the same parameters.

Calculate the Overlap Range
Although PAN and MS images are taken at the same time, in actual processing, one finds that the scene divisions of the two do not strictly correspond, especially in the case of different sources. Therefore, the overlap range of the two images needs to be calculated first.
As shown in Figure 1, the four corner points of PAN and MS were projected to the same elevation surface through the RFM model [46][47][48], and the overlap area on the object side was calculated. Then the overlap area was mapped back on their respective image coordinates to obtain overlay ranges for the PAN and MS images.
The RFM is a universal rational polynomial model that directly establishes the relationship between the pixel coordinates of an image point and the geographic coordinates of its counterpart. RFM hides the satellite sensor parameters and attitude orbit parameters, and has many advantages, such as versatility, high calculation efficiency, and coordinate back calculation without iteration. Therefore, it has been widely used and has become an international standard.  The RFM is a universal rational polynomial model that directly establishes tionship between the pixel coordinates of an image point and the geographic coo of its counterpart. RFM hides the satellite sensor parameters and attitude orbit ters, and has many advantages, such as versatility, high calculation efficiency, a dinate back calculation without iteration. Therefore, it has been widely used and come an international standard.
In order to ensure the stability of the calculation, RFM regularizes the imag nates (l, s), latitude and longitude coordinates (B, L), and the height of ellipsoid H the coordinate range between [−1, 1]. The formula for calculating the normalize coordinates (l, s) corresponding to the image coordinates ( , ) of the image po  In order to ensure the stability of the calculation, RFM regularizes the image coordinates (l, s), latitude and longitude coordinates (B, L), and the height of ellipsoid H to make the coordinate range between [−1, 1]. The formula for calculating the normalized image coordinates (l, s) corresponding to the image coordinates (l n , s n ) of the image point is: ( Among them, LineO f f , SampleO f f are the translation values of the image-side coordinates, respectively, and LineScale, SampleScale are the zoom values of the image-side coordinates, respectively. The formula for calculating the normalized coordinates (U, V, W) of the object coordinates (B, L, H) is: Among them, LonO f f , LatO f f , HeiO f f are the translation values of the object coordinates, and LonScale, LatScale, HeiScale are the zoom values of the object coordinates.
For each scene image, the relationship between image coordinates and object coordinates can be expressed as a polynomial ratio as follows: The numerator and denominator of the polynomial in the above formula are expressed as follows: Remote Sens. 2021, 13, 3674
When both b 1 , d 1 take the value 1, only 78 other rational polynomial coefficients need to be solved. The least square method can be used to solve RPCs through virtual control points. Thus Equation (1) can be written as follows: Then the error equation is: Among them, X l = ( a 1 a 2 a 3 · · · a 19 a 20 b 2 b 3 · · · b 19 b 20 ) T and It is not difficult to find that the coefficients X l and X s to be solved are independent unknown parameters and can be solved separately. Take the first equation in Equation (6) as an example. If there are n virtual control points, the matrix form of the observation equation can be expressed as follows: Among them, V n U n · · · U n 2 W n W n 3 −l nn V n −l nn U n · · · −l nn U n 2 W n −l nn U n 3 In the same way, the adjustment equation of the second equation can be obtained: Among them, V n U n · · · U n 2 W n W n 3 −s nn V n −s nn U n · · · −s nn U n 2 W n −s nn W n 3 Estimate the optimal solution of the error equation system by least squares:

Differential Geo-Registration
Different remote sensing images can be geometrically coarse-registered using geographic information, especially PAN and MS images, which are imaged at the same time; they are basically consistent geographically. Therefore, by setting a basic simulation plane, the PAN and MS images can be simulated on the same plane with geographic information, and then the PAN and MS geographic registration can be realized. In this study, the overlapping area of PAN and MS images was used as the simulation plane, which was calculated using geographic information.
As shown in Figure 2, the overlapping area obtained in the previous step was divided into small blocks by the method of differentiation. The smaller the block, the higher the accuracy. The corresponding points of the same name between each block of each PAN image and each block of each MS image were obtained by the RFM model. Additionally, we calculated the transformation model for each block through the points of the same name. The transformation model can adopt linear transformation, affine transformation, perspective transformation, etc. The computational burdens and accuracies of different transformation models vary. Generally speaking, the greater the computational burden, the higher the accuracy. This paper adopted perspective transformation as its transformation model to improve accuracy as a result of what is discussed in Section 3.4. Although different blocks had been rectified with different parameters, an overlapping area between them was used to maintain their continuity. The same design was also used in the differential rectification with tiny facet primitive.
When mapping to the GPU, we need some of the following principles. (1) The size of the thread block must be an integer multiple of the number of thread beams. (2) The thread block size should be at least 64 or more. (3) The thread block's maximum should exceed 1024; (4) Avoid using too much logical hype, such as a branch statement or a loop statement.
Perspective transformation [49] is a linear projection where three dimensional objects are projected on a picture plane, which can take three-dimensional information into consideration. As shown in the following formula, the perspective transformation first projects two-dimensional space (s, l) to three-dimensional space (x, y, z).
Among them, (x, y, z) are represented by (s, l) as follows: x = a 11 s + a 12 l + a 13 y = a 21 s + a 22 l + a 23 z = a 31 s + a 32 l + a 33 (11)  Then project from three-dimensional space (x, y, z) to another two-dimensional space (s , l ). s = x z = a 11 s+a 12 l+a 13 a 31 s+a 32 l+a 33 l = y z = a 21 s+a 22 l+a 23 a 31 s+a 32 l+a 33 (12) In the above formula, a 33 is generally equal to 1. There is a total of 8 unknowns, which can be calculated through 4 pairs of points with the same name. Additionally, when a 31 = 0, and a 32 = 0, the perspective transformation degenerates into an affine transformation. Perspective transformation [49] is a linear projection where three dimensional objects are projected on a picture plane, which can take three-dimensional information into consideration. As shown in the following formula, the perspective transformation first projects two-dimensional space (s, l) to three-dimensional space (x, y, z).
Among them, (x, y, z) are represented by (s, l) as follows: Then project from three-dimensional space (x, y, z) to another two-dimensional space (s', l').
In the above formula, is generally equal to 1. There is a total of 8 unknowns, which can be calculated through 4 pairs of points with the same name. Additionally, when After the projection relationship is established, resampling is required. In this study, each block was transformed on the GPU according to its projection relationship in parallel. Interpolation methods for resampling include nearest neighbor, bilinear, and bicubic. Considering the efficiency and accuracy, we adopted the bilinear interpolation for resampling. Bilinear interpolation is performed using linear interpolation first in one direction, and then again in the other direction. Although each step is linear in terms of the sampled values and in the position, the interpolation as a whole is not linear but rather quadratic by the sample location. As shown in Equation (13), first interpolate in x-direction.
Then it will be interpolated in y-direction to obtain the desired estimate.
Remote Sens. 2021, 13, 3674 8 of 24 PAN and MS images were normalized to the same direction and scale after differential geo-registration, which can be used directly for subsequent processing. However, there will be certain errors between them, such as the errors caused by tremor of the satellite platform [39,40] and random error by calibration [41,42], which needs to be further corrected.

Differential Rectification with Tiny Facet Primitive
PAN and MS images are taken almost at the same time, and they are also calibrated during preprocessing. In theory, the geometric relationship between the two images is strict. Its accuracy can reach the sub-pixel level after the differential geo-registration. However, during the actual processing, it was found that many data could not meet this standard, especially in the case of large roll, large pitch, and large terrain undulations. Therefore, this paper used differential rectification with tiny facet primitive to further correct the errors between PAN and MS images.
As can be seen in Figure 3, each geo-registered image was differentiated into many corresponding blocks. These blocks were uploaded to the GPU for parallel processing. The processing of each block included three steps: (1) matching to obtain the same name point; (2) calculating the transform model; (3) differential rectification with the transform model.
where , represent the row and column size of the matching window; , represent the row number and column number in the matching window; , ， , represent the gray values in the PAN and MS image matching windows, respectively; , are the row and column of the MS image for the PAN image coordinate difference. For MS images that have undergone registration processing, take = = 0. For images that have not undergone band registration, the size of , is determined according to the camera design value.
Divide the MS simulation Image into blocks according to the dx and dy. MS   After extracting the points with the same name, it is necessary to calculate the transform relationship between each block. The perspective transformation is also used at this stage; please refer to Section 2.2. Bilinear interpolation is also used in subsequent differential rectification; please refer to Section 2.2. Matching and rectification formulas were mapped to two different kernel functions for calculation. Extracting points with the same name is based on template matching. This paper adopted the correlation coefficient as the method of similarity measurement in template matching. The correlation coefficient [50,51] is a standardized covariance function, which Remote Sens. 2021, 13, 3674 9 of 24 can overcome the linear deformation of the gray level and maintain the invariance of the correlation coefficient. It is the most commonly used matching measure in image matching.
This paper assumed that the MS image was accurately registered, and only considered the misregistration between PAN and MS images. Therefore, we use the following approximate formula to use RGB to proportionally synthesize the pseudo-PAN image and PAN image to match the control points and rectify all MS bands with the same parameters.
where PAN pseudo is defined as pseudo-PAN image. For a certain point on the PAN image, a matching window of m × n is established with the point as the center, and a matching window of the same size and shape is established at the initial value point of the same name in the MS image, and the correlation coefficient is calculated as follows: where m,n represent the row and column size of the matching window; i,j represent the row number and column number in the matching window; g i,j , g i+r,j+c represent the gray values in the PAN and MS image matching windows, respectively; c,r are the row and column of the MS image for the PAN image coordinate difference. For MS images that have undergone registration processing, take c = r = 0. For images that have not undergone band registration, the size of c,r is determined according to the camera design value.
After extracting the points with the same name, it is necessary to calculate the transform relationship between each block. The perspective transformation is also used at this stage; please refer to Section 2.2. Bilinear interpolation is also used in subsequent differential rectification; please refer to Section 2.2. Matching and rectification formulas were mapped to two different kernel functions for calculation.

Registration Quality Assessment Measures
This paper evaluated the effect of the algorithm from two angles: accuracy and efficiency. In terms of accuracy, the correlation coefficient was used to evaluate the registration accuracy of the algorithm. For the formula of the correlation coefficient, see Section 2.3. After the points with the same name were matched by the correlation coefficient, the root mean square error (RMSE) of these points were calculated as the final registration accuracy. For n correct checkpoints whose correct coefficient is greater than 0.9, the RMSE xy calculation formula is as follows: In the above formula, are the image coordinates of the correct check point and its reference point. RMSE x and RMSE y are the root mean square error in x-direction and y-direction, respectively. In terms of efficiency, we used a pure CPU implementation method and a GPU method.

Pan-Sharpening Quality Assessment Measures
A pull resolution assessment infers the quality of the pansharpened image at the scale of the PAN image without resorting to a single reference image [16,52,53]. The quality with no reference (QNR) is one of the widely used full resolution assessments [16,52]. QNR consists of two parts: one is the spectral difference (D λ ), and the other is the spatial difference (D s ).
D λ is calculated between the low-resolution MS image and the fused MS image. Calculate the universal image quality index (UIQI) [54] values between the two sets of bands at low resolution and high resolution. The difference in the corresponding UIQI values on the two scales produces the spectral distortion introduced by the pansharpening process. D s combines the UIQI values calculated between each MS band and PAN images downgraded to MS resolution, and between pan-sharpened MS and full-resolution PAN images. The absolute difference between the corresponding UIQI values averaged over all frequency bands produces D s .
The effectiveness of the pansharpening method can be described from the two perspectives of spectrum and space. These two indicators can provide a unique quality index, referred to as QNR ∈ [0, 1], with 1 being the best attainable value:

Data Introduction
The data used in this paper were from the GF-6 (GaoFen-6) and GF-7 (GaoFen-7) satellites, a series of high-resolution earth observation satellites planned by China's "China High-resolution Earth Observation System (CHEOS)"; ZY3-02 (ZiYuan3-02) of the National Land Satellite Remote Sensing Application Center of the Ministry of Natural Resources; and SV-1 (SuperView-1), China Aerospace Science and Technology Corporation's commercial remote sensing satellite. Among them, the resolution ratio of ZY3-02 PAN images to MS images is not the common 4 to 1, which allowed us to verify the versatility of the method in this paper. The selected images also include various topographies and landforms, such as mountains, plains, and coastal areas. The details of the data are shown in Table 1.

Precision Analysis
In order to verify the accuracy of the algorithm, we first registered the PAN and MS images described above. Figure 4 shows the registration results of these four pairs of images. It can be seen in the figure that the PAN and MS images were normalized to the same resolution and direction, and differential corrections were also performed. It is difficult to see the deviations visually, as this time the PAN and MS images were strictly registered. The corresponding details can be seen in Figure 5. Columns a and dare PAN images, columns b and e are MS images, and columns c and f are PAN and MS contrast images. Lines 1 and 2 are GF-6, lines 3 and 4 are GF-7, lines 5 and 6 are ZY3-02, and lines 7 and 8 are SV-1. These detailed images further demonstrate the registration accuracy of the method in this paper. Both PAN and MS images were registered. The detailed pictures were evenly cut out from each group of pictures, which shows that the overall deviation distribution is basically the same. For the ZY3-03 and SV-1 images, the accuracy of the registration results of the mountainous terrain with large undulations was also very high.
In this study, RMSE was used to quantitatively analyze the registration results and further analyze the accuracy of the registration method. As shown in Table 2, we compared the accuracy of the registration performed using different algorithms. SIFT with affine transformation was registered by affine transformation with matched SIFT points. The Level 2 MS image was up-sampled to the scale of the Level 2 PAN image. The Level 1A MS image was up-sampled to the scale of the Level 1A PAN image. The Level 1A image, after differential geo-registration, was registered by differential geo-registration described in Section 2.2. It can be seen from the table that among the four different geomorphological areas captured by the four satellites, the method in this paper had the best accuracy, and the accuracy was kept within 0.5 pixels. The registration with affine transformation by matched SIFT points cannot eliminate the relative geometric distortion of PAN and MS images. Therefore, the accuracy of registration using that method is closely related to internal distortion. The accuracy of the ZY3-02 image with a large amount of internal distortion was poor, and there were still residual errors in the registered images. The actual remote sensing image products cannot completely eliminate internal distortion. In addition, the method of registration by extracting feature points is not suitable for many images, because it is difficult to match the feature points in these cases, such as large differences in radiation response characteristics, severe cloud occlusion, and large differences in resolution. As shown in Figures A1 and A2 in Appendix A, the hyperspectral image of ZY1E (ZiYuan 1E-spatial resolution is 30 m) and the panchromatic image of GF-7 (Spatial resolution is 0.8 m) cannot match the SIFT feature points. However, the method in this paper could complete the rapid registration as long as the geometric measurements were accurate, which also shows its potential for image registration using images from different sources.
The directly enlarged Level 1A image is a result of a subset of the differential georegistration, which is a special mode, and its accuracy will be lower than that of the differential geo-registered image. The accuracy of the differential geographic corrections of Level 1A image was better than that of the Level 2 image, as each block was strictly geographically aligned when differential geographic correction. This accuracy depends on the stability of the satellite and the accuracy of camera calibration. Therefore, differential rectification with tiny facet primitive was required for further correction.
Level 2 PAN images and MS images which were enlarged based on geographic location were the least accurate. PAN and MS images are generally corrected separately in blocks, and the corresponding block is usually not consistent with the actual size, which results in the corrected image not strictly corresponding with the former, causing errors. After the high-precision registration in this study, the next step of pansharpening was carried out.    As shown in Figures 6-9, we used the Gram-Schmidt [14] method to fuse the images registered by the method in this paper and the Level 2 enlarged image. The Gram-Schmidt method is sensitive to the registration accuracy. When the registration accuracy is poor, color deviation and ghosting will occur. It can be seen from the figures that the accurately registered images maintained good fusion effects in different scenes and different satellite images. The GS method exploits the intensity component as the first vector of the new orthogonal basis. The orthogonalization processes the MS vector. Then it finds its projection on the hyper-plane defined by the previously found orthogonal vectors and its orthogonal component such that the sum of the orthogonal and projection components is equal to the zero-mean version of the original vectorized band. Pansharpening is completed by substituting each intensity component with a histogram-matched P before performing the inverse transformation. Level 1A image was better than that of the Level 2 image, as each block was strictly geographically aligned when differential geographic correction. This accuracy depends on the stability of the satellite and the accuracy of camera calibration. Therefore, differential rectification with tiny facet primitive was required for further correction. Level 2 PAN images and MS images which were enlarged based on geographic location were the least accurate. PAN and MS images are generally corrected separately in blocks, and the corresponding block is usually not consistent with the actual size, which results in the corrected image not strictly corresponding with the former, causing errors. After the high-precision registration in this study, the next step of pansharpening was carried out.
As shown in Figures 6-9, we used the Gram-Schmidt [14] method to fuse the images registered by the method in this paper and the Level 2 enlarged image. The Gram-Schmidt method is sensitive to the registration accuracy. When the registration accuracy is poor, color deviation and ghosting will occur. It can be seen from the figures that the accurately registered images maintained good fusion effects in different scenes and different satellite images. The GS method exploits the intensity component as the first vector of the new orthogonal basis. The orthogonalization processes the MS vector. Then it finds its projection on the hyper-plane defined by the previously found orthogonal vectors and its orthogonal component such that the sum of the orthogonal and projection components is equal to the zero-mean version of the original vectorized band. Pansharpening is completed by substituting each intensity component with a histogram-matched P before performing the inverse transformation.    Take ZY3-02 with obvious ghost phenomenon in Figure 8. As an example to show the effect of the algorithm in this paper. As shown in Figure 10, the Level 2 PAN and upsampled MS images are not fully registered, which also leads to obvious ghosting and color cast in the fusion result. The algorithm in this paper made the PAN and MS images basically completely registered, the effect of fusion was clear and no ghosting, and the color was maintained well. The up-sampled Level 2 images have particularly obvious color deviations and ghosting in the cases of ZY3-02 and SV-1, which is also consistent with the Level 2 image error Take ZY3-02 with obvious ghost phenomenon in Figure 8. As an example to show the effect of the algorithm in this paper. As shown in Figure 10, the Level 2 PAN and up-sampled MS images are not fully registered, which also leads to obvious ghosting and color cast in the fusion result. The algorithm in this paper made the PAN and MS images basically completely registered, the effect of fusion was clear and no ghosting, and the color was maintained well. Take ZY3-02 with obvious ghost phenomenon in Figure 8. As an example the effect of the algorithm in this paper. As shown in Figure 10, the Level 2 PAN sampled MS images are not fully registered, which also leads to obvious ghost color cast in the fusion result. The algorithm in this paper made the PAN and MS basically completely registered, the effect of fusion was clear and no ghosting, color was maintained well. The up-sampled Level 2 images have particularly obvious color deviations an ing in the cases of ZY3-02 and SV-1, which is also consistent with the Level 2 ima The up-sampled Level 2 images have particularly obvious color deviations and ghosting in the cases of ZY3-02 and SV-1, which is also consistent with the Level 2 image error measured in the previous part of this paper. The Level 2 image errors of GF-6 and GF-7 are relatively small, and the ghost phenomenon is not obvious, but there are serious color deviations.
As shown in Table 3, we used the QNR index at full resolution to further quantitatively demonstrate our method's superiority. It can be seen from the table that the algorithm in this paper could effectively improve the spectral and spatial consistency when other conditions were the same. The registration result will greatly affect the fusion effect. Therefore, the precise registration of PAN and MS images is of great significance for pansharpening. Table 3. Global distortion/quality index integrated with fused data.

Satellites
Methods D λ (p = 1) In summary, the method in this paper could achieve high-precision registration of PAN and MS images, which is suitable for different scenes and different sensors. It is extremely robust to errors between PAN and MS images.

Efficiency Analysis
The CPU of the server used for efficiency analysis was an Intel Core i7-6700K @ 4.00 GHz, the GPU was an NVIDIA GeForce RTX 2080Ti, and the operating system was Ubuntu 18.04.4 LTS. In addition, we had 64 GB of RAM and a 256 GB high-speed Nvme SSD. The processing time discussed in this article does not include the time for reading and writing data. After the data were read in, they were simulated as the data flow in from the previous step. The specific processing time of each algorithm was analyzed through the log. The log printed out the time taken for each key step, accurate to the millisecond level.
It can be seen from Table 4 that the speedup ratio of the computational concepts which was mapped to the GPU was several hundred fold. However, the overall efficiency improvement was dozens of fold, because GPU processing requires additional processing, such as the CPU/GPU copying each other, which takes a long time and slows down the overall efficiency. In general, the processing time increased from ten minutes to seconds. The work that was completed in ten minutes can now be completed in tens of seconds. This is a great help to the completion of near real-time processing work. As can be seen in Figure 11, the processing efficiency of this method is almost proportional to the amount of data.
The reason why the improvement is so obvious is that the parallelism of the algorithms mapped to the GPU is very high. In differential geo-registration, the main time consuming factor is that each pixel is returned, and the pixels do not affect each other and can be handled simultaneously. In differential rectification, what is time consuming is the control point matching and block correction. Control points were obtained by calculating correlation coefficients. Not only can all control points be calculated at the same time, but each point can also have correlation coefficients calculated in parallel. The block rectification can also calculate all pixels at the same time, with high parallelism.
Take GF-7 as an example. Its PAN camera has a width of approximately 35,660 pixels, and it images approximately 7200 lines per second. The data volume per second is 490 MB.
The MS camera has a width of approximately 8915 pixels and images approximately 1800 lines per second. The amount of data per second is 122 MB. Therefore, for GF-7, the amount of PAN and MS data generated per second is about 612 MB; FUS (fusion) data totals about 1960 MB. It can be seen from Figure 11 that the algorithm in this paper ran almost as fast as said data could be collected, according to the conversion of the previous processing efficiency. The processing of one second of GF-7 data takes about 5.09 s, which is close to near real-time processing.
The reason why the improvement is so obvious is that the parallelism of the algo rithms mapped to the GPU is very high. In differential geo-registration, the main time consuming factor is that each pixel is returned, and the pixels do not affect each other and can be handled simultaneously. In differential rectification, what is time consuming is the control point matching and block correction. Control points were obtained by calculating correlation coefficients. Not only can all control points be calculated at the same time, bu each point can also have correlation coefficients calculated in parallel. The block rectifica tion can also calculate all pixels at the same time, with high parallelism.

Discussion
In the method in this paper, the resampling method and the transformation model in block correction have large impacts on the results. The resampling method uses bilinear sampling. The transformation model for block correction uses perspective transformation. Here we discuss the reasons for choosing the bilinear resampling method and perspective transformation. The data used for this were the data introduced in Section 3.1. The analysis of the transformation model used the four groups of PAN and MS images. The analysis of the resampling method used the GF-7 MS image.
The transformation model in block correction was applied in differential geo-registration efforts and differential rectification with tiny facet primitive. In order to illustrate the accuracy of the block transformation model in block correction, we tested affine transformation, linear transformation, and perspective transformation. It can be seen in Table 5 that the accuracy of perspective transformation is significantly better than those of affine transformation and linear transformation. The linear transformation only needs two points to fit, the affine transformation needs three points to fit, and the perspective transformation needs four points to fit. The four points of the perspective transformation correspond to the four corner points of a block, making it suitable for block correction. In the GF-6 image with relatively flat terrain (the Yellow River estuary), the difference in RMSE xy between these three models was not large, about 0.002. However, the RMSE xy of SV-1, which has large terrain undulations, was about 0.04. This shows that the more obvious is the terrain undulation, the more obvious the advantage of perspective transformation, a linear transformation or affine transformation can be used for flat areas. The areas with large terrain undulations require higher-precision perspective transformations.
After differential geographic registration, the image is normalized to the same scale and direction, and different interpolation algorithms have an impact on the similarity measure in the subsequent matching. We designed an experiment to demonstrate this effect. The experimental steps were as follows: 1.
Select a scene image and down-sample it to different scales in a Gaussian pyramid.

2.
Up-sample the down-sampling results to the original image size by nearest, bilinear, and bicubic methods.

3.
Analyze the matching results of the grid points of different up-sampling methods. The correlation coefficient of the correct matching point is 0.9.

4.
Analyze the peak signal-to-noise ratios (PSNRs) of the up-sampling methods. The higher the PSNR, the better. It can be seen in Table 6 that among the different sampling methods, bicubic had the best effect, followed by bilinear, and nearest was the worst. When the resolution difference reaches 16 times, the correlation coefficient is basically invalid. For the method in this paper, when the resolution difference is too large, the differential geographic registration is still applicable, but the differential correction based on the correlation coefficient is invalid. As shown in Figures A1 and A2 in Appendix A, it degenerated into differential geographic registration, as differential geographic registration does not depend on the relationship between images. Unlike the method based on feature point matching, the method in this paper does not depend on the matching accuracy of feature points, and is also suitable for images with large radiation differences, cloud occlusion, and large-scale differences. However, the method in this paper relies on the initial geometric accuracy and requires accurate geometric calibration. The method in this paper needs to be carried out on the basis of the preprocessing of the originally captured image, and depends on the accuracy of the matching points.

Conclusions
We proposed a near-real-time PAN and MS sub-pixel registration method. Firstly, differential geo-registration is used to perform accurate geographical registration of PAN and MS images, which can normalize PAN and MS images to the same direction and scale. Compared with the direct enlargement of Level 2 images, this method has higher registration accuracy. At this time, the main factors of the image registration accuracy are the relative calibration error of PAN and MS images, the error caused by the camera's highfrequency tremor, etc. These errors are difficult to eliminate on the object side. Therefore, this paper used sub-pixel differential rectification with tiny facet primitive on the image side to eliminate these potential errors. Through sub-pixel tiny facet element differential correction, the registration accuracy between PAN and MS images was basically stabilized within 0.5 pixels, so that PAN and MS images could be accurately registered.
Although the method in this paper could effectively improve the registration accuracy of PAN and MS images, the computational burden was very large-especially for the calculations of links, such as geographic registration, matching, and correction. Therefore, this article mapped these computationally large and highly parallel links to a GPU for processing. The processing efficiency of these links on the GPU was hundreds of times better that the processing efficiency on the CPU. Counting the extra operations brought about by GPU processing, the processing time of the GPU relative to the CPU was still multiples shorter. The processing time decreased from tens of minutes to tens of seconds. Additionally, taking GF-7 as an example, it took about 5.09 s to process the amount of data captured by GF-7 per second, which basically realizes near real-time processing.
In general, the algorithm in this paper could not only provide high-precision registration for PAN and MS fusion, but it has high processing efficiency and can basically achieve near real-time processing. In addition it is suitable for different scenes and different sensors. It is extremely robust to registration errors between PAN and MS images. It also has the potential for high-precision registration of sensors from different sources.
(a) (b) Figure A1. ZY1E hyperspectral image (spatial resolution is 30 m) and GF-7 panchromatic image (spatial resolution is 0.8 m) registered by the method in this paper, which have a resolution difference of about 37.5 times. (a) A hyperspectral image of ZY1E. (b) A panchromatic image of GF-7. Figure A2. The registration details of the ZY1E hyperspectral image and the GF-7 panchromatic image. Figure A2. The registration details of the ZY1E hyperspectral image and the GF-7 panchromatic image.