Geometric Positioning Accuracy Improvement of ZY-3 Satellite Imagery Based on Statistical Learning Theory

With the increasing demand for high-resolution remote sensing images for mapping and monitoring the Earth’s environment, geometric positioning accuracy improvement plays a significant role in the image preprocessing step. Based on statistical learning theory, we propose a new method to improve the geometric positioning accuracy without ground control points (GCPs). Multi-temporal images from the ZY-3 satellite are tested, and the bias-compensated rational function model (RFM) is applied as the block adjustment model in our experiment. An easy and stable weight strategy and the fast iterative shrinkage-thresholding (FIST) algorithm, which is widely used in the field of compressive sensing, are improved and utilized to build and solve the normal equations. Then, the residual errors after traditional block adjustment are acquired and tested with the newly proposed inherent error compensation model based on statistical learning theory. The final results indicate that the geometric positioning accuracy of ZY-3 satellite imagery can be improved greatly with our proposed method.


Introduction
ZY-3 is the first civilian high-resolution stereo mapping satellite in China [1], and it provides a tangible improvement in monitoring capabilities for many fields, such as agriculture, forestry, and geology. It was designed to meet 1:50,000 scale stereo mapping requirements, which is of great significance in strengthening China's ability to independently obtain geospatial information. The revisit period of the ZY-3 satellite is 5 days, so abundant high-resolution satellite (HRS) stereo image pairs can be collected over a short period. These stereo image pairs are acquired from a three-line camera (TLC) sensor on the satellite platform. The ground sampling distances (GSDs) of the forward-, backward-, and nadir-directed cameras are 3.5 m, 3.5 m, and 2.1 m, respectively.
Before making use of these remote sensing images for mapping and monitoring, their geometric characteristics must be considered [2], and block adjustment can improve the geometric consistency and accuracy [3]. There are three types of adjustment approaches used in current research, namely bundle adjustment [4], direct georeferencing (rigorous sensor model) [5], and the rational function model (RFM) [6]. All three methods were compared in [7]. The errors compensated with the help of rational polynomial coefficients (RPCs) contain the systematic errors of the satellite orientation (position, attitude, camera, jitter errors, and so on) and image point errors (systematic or random errors). The image point errors are mainly random errors arising in the acquisition of their coordinates. Compared with systematic errors, image point errors can be reduced to a small value with an accurate acquisition method. In fact, the traditional free block adjustment method without GCPs can reduce random errors in geometric positioning to a large extent [28]. Usually, the residual errors are distributed in a specific direction and within a certain range due to the inherent errors of the sensors on the satellite platform. Based on this, the inherent error compensation model is proposed to remedy these inherent errors of the satellite platform, which are also known as systematic errors. The density-based spatial clustering of applications with noise (DBSCAN) algorithm [29], a popular clustering method in the field of statistical learning, is well suited to analysing the error distribution. Compared with traditional classification methods such as K-means [30] and decision trees [31], the DBSCAN method can classify the dataset automatically, without a specified number of categories.
The remainder of this paper is organized as follows: the mathematical details of the control-point-free block adjustment with the improved FIST algorithm and the inherent error compensation model are presented in Section 2. Experiments verifying the proposed model are described and their results analysed in Section 3. Finally, the conclusions are presented in Section 4. Figure 1 shows the main steps of our proposed method. Unlike traditional methods, our proposed method consists of two stages, coarse correction and accurate correction, which comprise the following five steps. Firstly, the tie point sets are obtained from the image registration step, with the help of the cascade scale-invariant feature transform (SIFT) algorithm and the features from accelerated segment test (FAST) method, and the space intersection step. Secondly, the extracted dataset is clustered by the DBSCAN algorithm. Thirdly, traditional block adjustment with the improved FIST algorithm and the proposed weight strategy is applied to improve the geometric positioning accuracy. Then, the data clustering method is applied a second time to analyse the distribution of residual errors, in order to extract the inherent satellite platform errors of our experimental dataset. Finally, the inherent error compensation model is established with the help of the clustering method, aiming to reduce the inherent errors. These steps are described in detail in the following sections.

Cascade SIFT Method and Space Intersection Method
Initially, the tie point sets can be obtained with the help of the cascade SIFT method and the FAST method. The cascade SIFT method is derived from the traditional SIFT method [32]. Since the SIFT operator is invariant to position, scale, and rotation, it is suitable for remote sensing image registration. The Gaussian scale space can be written as L(x, y; kσ) = I(x, y) * G(x, y; kσ), and the difference of Gaussian (DoG) as D(x, y; kσ) = L(x, y; kσ) − L(x, y; σ), where k is the scale operator; I(x, y) is the original image space; G(x, y; kσ) is a Gaussian function with a standard deviation of σ; and L(x, y; kσ) and D(x, y; kσ) represent the Gaussian scale space and the DoG, respectively. The cascade SIFT method can be divided into two parts: the large coverage adapted anisotropic Gaussian-SIFT (AAG-SIFT) algorithm and the local-SIFT algorithm. The large coverage AAG-SIFT algorithm [33] is applied to a whole remote sensing image to achieve the coarse registration. Firstly, the DoG is applied at image scale to detect the points which best represent the integral structure features of the image. Removing unstable points with low contrast from all extremal points yields the final feature points at image scale. Next, the gradients at these feature points are calculated, and a histogram of the gradients around each detected feature point is accumulated. The main direction of each feature point is extracted from the histogram. After this, the descriptors of these feature points are generated, and the random sample consensus (RANSAC) algorithm is applied to screen out the correct matching points. With all coarse correct matching points in the image space detected, the coarse registration is finished.
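As a short sketch of this scale-space construction, the DoG layers can be computed as below. The values of σ, k, and the number of scales are illustrative assumptions, not the settings used in our experiments:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma=1.6, k=2 ** 0.5, num_scales=4):
    """Build L(x, y; k^i * sigma) by Gaussian blurring and return the
    difference-of-Gaussian layers D = L(x, y; k*sigma) - L(x, y; sigma)."""
    blurred = [gaussian_filter(image.astype(float), sigma * k ** i)
               for i in range(num_scales)]
    return [blurred[i + 1] - blurred[i] for i in range(num_scales - 1)]

img = np.random.rand(64, 64)
dogs = dog_stack(img)
print(len(dogs), dogs[0].shape)  # num_scales - 1 DoG layers
```

Extremal points of these DoG layers are then screened by contrast before descriptor generation.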
After the coarse registration step, a small window around the object-space coordinate is extracted with the help of the RPCs. A local SIFT matching method is utilized and accurate matching points are obtained with a local-scale DoG. With the local matching method, the matching points searching process is undertaken in the local region of the image with the interference of other unrelated regions removed. The coarse correct matching points of the same local area and the accurate matching points are combined together to ensure the accurate registration.
After the accurate image registration, an effective corner detection method, the features from accelerated segment test (FAST) method [34], is applied to acquire the tie point sets. A 9 × 9 window is used to strengthen the stability of the tie point matching process, as shown in Figure 2. The neighborhood of a tie point is characterized by the ratio r_k = µ(k)/µ(p), where µ(p) represents the pixel value at the central point p in Figure 2 and µ(k) denotes the pixel value at template pixel k. For each ratio value, the similarity is measured by comparing it with a threshold: d_k = 1 if r_k > 1 + Th, d_k = 2 if r_k < 1 − Th, and d_k = 0 otherwise, where Th is a threshold value set by experience. We consider the searched point a candidate tie point if more than half of the d_k values of the searched point are continuous and all equal to 1 or 2. After the detection process, many candidate matching points are selected, but there exist some false detections and duplicate detections. In order to solve this issue, a false point elimination method is employed. First, we assign a score function to each candidate point based on its d_k values [35]: for a candidate matching point, the more continuous nonzero values of d_k, the higher its score. Then, non-maximal suppression is utilized to select the best matching point. The tie point sets can be obtained accurately with this efficient method.
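The candidate test and the run-length score used for non-maximal suppression can be sketched as follows. The border-ring extraction, the threshold value, and the exact scoring rule are our own simplified assumptions in the spirit of the d_k test, not the implementation used in the paper:

```python
import numpy as np

def dk_values(patch, th=0.15):
    """Classify the border pixels of a 9 x 9 patch against its centre,
    mimicking the d_k test (th is an assumed threshold)."""
    centre = patch[patch.shape[0] // 2, patch.shape[1] // 2]
    ring = np.concatenate([patch[0, :], patch[1:-1, -1],
                           patch[-1, ::-1], patch[-2:0:-1, 0]])
    ratio = ring / max(centre, 1e-9)
    dk = np.zeros(ring.size, dtype=int)
    dk[ratio > 1 + th] = 1   # noticeably brighter than the centre
    dk[ratio < 1 - th] = 2   # noticeably darker than the centre
    return dk

def score(dk):
    """Length of the longest circular run of identical nonzero d_k values;
    a higher score wins during non-maximal suppression."""
    doubled = list(dk) + list(dk)
    best, run, prev = 0, 0, None
    for v in doubled:
        run = run + 1 if (v != 0 and v == prev) else (1 if v != 0 else 0)
        prev = v
        best = max(best, run)
    return min(best, len(dk))

patch = np.ones((9, 9))
patch[0, :] = 2.0                 # top edge much brighter than the centre
print(score(dk_values(patch)))    # 9
```

Candidates whose score is not maximal within their neighbourhood are discarded, which removes duplicate detections.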
Traditionally, object-space coordinates of these tie point sets have been calculated with the aid of DEM [36] or GPS data [19], which means their accuracy is highly dependent on the precision of the auxiliary data. However, as the satellite acquires overlapping images, the space intersection method is suitable for ZY-3 stereo image pairs to derive initial values of these tie point sets. Moreover, traditional methods may cause the problem of weak convergence. Figure 3a displays an example of weak convergence. A and A′ are different observations of the same object at different times. Due to the systematic errors of the platform and image distortion, the calculated coordinates of A and A′ obtained from each stereo image pair usually disagree. Therefore, the multi-observation dataset provides redundant information and strong constraints which can lead to faster convergence and more accurate solutions in object-space. In this procedure, all stereo image pairs which contain the same object are used together to form overdetermined equations: G_r^n = Line_Scale · Num_L(P + ∆P, L + ∆L, H + ∆H) / Den_L(P + ∆P, L + ∆L, H + ∆H) + Line_Off − r and G_c^n = Sample_Scale · Num_S(P + ∆P, L + ∆L, H + ∆H) / Den_S(P + ∆P, L + ∆L, H + ∆H) + Sample_Off − c, where (G_r^n, G_c^n) represent the error functions between the calculated and extracted image-space coordinates in the nth image; (r, c) are the extracted image-space coordinates of an object; (P, L, H) are the object-space coordinates and (∆P, ∆L, ∆H) are the corrections of the object-space coordinates to be solved; Line_Off, Line_Scale, Sample_Off, and Sample_Scale are the scale and translation operators of the calculated image-space coordinates with RPCs [6]; l is the vector of residual errors; Num_L, Den_L, Num_S, and Den_S are the rational polynomials, comprising 80 rational polynomial coefficients (RPCs) of degree no more than three; and u = [∆P ∆L ∆H]^T is the vector of unknowns. To solve the overdetermined equations, the least squares method (LSM) is applied.
The problem of Equation (4) can be simplified as G u = l, where G is the design matrix consisting of the partial derivatives of G_r and G_c with respect to P, L, and H; u is the vector of corrections; and l is the vector of residual errors between the calculated and observed image coordinates.
Through the singular value decomposition of the design matrix G, the least squares solution can be obtained and the coordinates of the object in all images are determined uniquely. With the help of multi-temporal images used together in the overdetermined equations, the different observations converge to the object A, as shown in Figure 3b.
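A minimal numeric sketch of the SVD-based least squares solve for the overdetermined intersection equations; the design matrix entries below are synthetic stand-ins, not values derived from real RPCs:

```python
import numpy as np

# Each stereo observation of the object contributes two rows (row/column
# error equations) to the design matrix G; values here are synthetic.
G = np.array([[1.0, 0.2, 0.1],
              [0.1, 1.0, 0.3],
              [0.9, 0.1, 0.2],
              [0.2, 1.1, 0.1],
              [0.3, 0.2, 1.0],
              [0.1, 0.3, 0.9]])
u_true = np.array([0.5, -0.2, 0.1])   # corrections (dP, dL, dH)
l = G @ u_true                        # simulated residual vector

# Least squares via SVD: u = V diag(1/s) U^T l
U, s, Vt = np.linalg.svd(G, full_matrices=False)
u = Vt.T @ ((U.T @ l) / s)
print(np.allclose(u, u_true))  # True
```

With more observations than unknowns, the redundant rows constrain the solution, which is why multi-observation intersection converges better than a single stereo pair.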

Data Classification and Preprocessing
The object-space coordinates of tie points are obtained through the space intersection step from the stereo image pairs. However, there are always some points whose geometric positioning errors are large compared with those of the other points, which may be caused by errors in the image registration step or other factors. Therefore, points with large geometric positioning errors must be separated from the other data. Fortunately, this step can be regarded as a data classification problem. The object-space coordinates of the tie points are calculated from the extracted image-space coordinates and the RPCs, and the extracted object-space coordinates can be obtained from the space intersection step. Ideally, the original errors between these two values should be distributed around the extracted object-space coordinates within a certain range. Therefore, the density-based spatial clustering of applications with noise (DBSCAN) method [29], based on statistical learning theory and widely used in the field of machine learning, is suitable here for data screening.
Suppose we have a dataset as shown in Figure 4a. All points in this dataset can be divided into three types: core points (green points distributed inside the dense regions in Figure 4b), border points (blue points distributed at the edges of the dense regions in Figure 4b), and noise points (red points distributed in sparse regions in Figure 4b). In order to separate these types of points with no a priori knowledge, their density distribution is the important information that we can rely on. Two parameters are set before the density-based searching: Eps, the maximum distance between neighboring points in the same category, and Minpts, the minimum number of points required to form a category. Starting the search from a random point in the tested dataset, if there are more than Minpts points in the Eps neighborhood of this point, it is regarded as a core point and forms a category. If a point that does not yet belong to this category lies within the Eps neighborhood of a core point of the formed category, this point is added to the category. Points which end up belonging to no category are regarded as noise points. In this way, different categories can be formed and the noise points are separated from each category, as shown in Figure 4c. An appropriate choice of Eps and Minpts can achieve a satisfactory output of the classification process. Compared with other classification algorithms, the DBSCAN algorithm can detect the number of categories in the test dataset with no bias toward a particular cluster shape. Moreover, noise points can be quickly separated from the other information in the dataset with no prior knowledge. With the help of the DBSCAN algorithm, we classify our dataset while identifying noise points to achieve better convergence.
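A minimal sketch of this screening with scikit-learn's DBSCAN; the synthetic data and the Eps/Minpts values are illustrative only, not our experimental settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.05, size=(60, 2))    # one dense error cluster
noise = np.array([[1.5, 1.5], [3.0, 0.0],      # isolated gross errors,
                  [-2.0, 1.0], [0.0, -3.0],    # far from the cluster and
                  [2.5, -2.5]])                # from each other
points = np.vstack([dense, noise])

# eps plays the role of Eps, min_samples that of Minpts.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(points)
print(sorted(set(labels)))  # -1 marks noise; non-negative labels are clusters
```

Points labelled -1 are the gross errors removed before block adjustment; no number of clusters has to be specified in advance.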

Free Block Adjustment
Because of its simplicity of implementation and standardization, the rational function model is widely used as a block adjustment model for the exterior orientation of high-resolution satellite images. It describes the relationship between image space and object space by ratios of cubic polynomials with 80 coefficients [6]: Y = Num_L(P, L, H) / Den_L(P, L, H) and X = Num_S(P, L, H) / Den_S(P, L, H), where (X, Y) are the normalized image-space coordinates and (P, L, H) are the normalized object-space coordinates. However, due to the lack of accuracy in measuring the exterior orientation elements of spaceborne sensors, the RPCs have a low precision. Therefore, an affine transformation model (AFM) is usually applied to compensate for the bias in the RPCs and improve their precision. The AFM error equations are defined as follows: F_r = a_0 + a_1 · Line + a_2 · Sample − r and F_c = b_0 + b_1 · Line + b_2 · Sample − c, where a_0, a_1, a_2, b_0, b_1, and b_2 are the affine transformation coefficients; Sample and Line represent the image-space coordinates determined by the RPCs; and (r, c) are the real image-space coordinates measured automatically. As a result, the error equations for the least squares solution can be derived in the form of [16] V = A t + B s − l, where V is the residual vector; A and B are the design matrices consisting of the partial derivatives of F_r and F_c with respect to a_0, a_1, a_2, b_0, b_1, b_2 and P, L, H; t and s are the correction vectors of the affine transformation parameters and object-space coordinates, respectively; and l is the difference between the calculated and observed image point coordinates [17]. Moreover, applying the Gauss-Newton model, Equation (9) can be rewritten in the matrix form [A^T P A, A^T P B; B^T P A, B^T P B] [t; s] = [A^T P l; B^T P l], where P is the weight matrix.
For simplicity, the normal equation of Equation (9) can be written in the block form [U, W; W^T, V] [t; s] = [L_t; L_s], with U = A^T P A and V = B^T P B. As a result, the matrices U and V are both (block) diagonal. s and t are dependent on each other, and usually the number of unknowns in s is much larger than in t. Therefore, we use a Gauss elimination method to reduce the size of the unknown matrix, and the normal equation is reduced to (U − W V^{-1} W^T) t = L_t − W V^{-1} L_s.
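The Gauss elimination (Schur complement) reduction can be illustrated numerically. The design matrices below are random stand-ins for A and B, and unit weights are assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(40, 4))    # affine-parameter design matrix (t block)
B = rng.normal(size=(40, 12))   # object-coordinate design matrix (s block)
l = rng.normal(size=40)
P = np.eye(40)                  # unit weights for this sketch

# Full normal equations:
# [A'PA  A'PB] [t]   [A'Pl]
# [B'PA  B'PB] [s] = [B'Pl]
N11, N12 = A.T @ P @ A, A.T @ P @ B
N21, N22 = B.T @ P @ A, B.T @ P @ B
w1, w2 = A.T @ P @ l, B.T @ P @ l

# Eliminate the large s block (Schur complement), then back-substitute:
S = N11 - N12 @ np.linalg.solve(N22, N21)
t = np.linalg.solve(S, w1 - N12 @ np.linalg.solve(N22, w2))
s = np.linalg.solve(N22, w2 - N21 @ t)

# The reduced solution matches solving the full system directly.
full = np.linalg.solve(np.block([[N11, N12], [N21, N22]]),
                       np.concatenate([w1, w2]))
print(np.allclose(np.concatenate([t, s]), full))  # True
```

Because the object-coordinate block is (block) diagonal, its inverse is cheap, so eliminating the many s unknowns first greatly reduces the size of the system actually factorized.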

Weight Strategy
In order to acquire better convergence of the results, a good weight strategy plays an important role in the iteration step; however, researchers have seldom described it in detail in their papers. With gross points removed from the experimental dataset in the data classification and preprocessing step, we can be sure that the values of all tie points are valid estimations. Thus, it is appropriate to apply a stable and effective weight strategy. In our study, all observations are independent and the measurement accuracy of each element in s is the same, while the biases in the RPCs of different images vary, which means that the elements in t of each image share the same weight. Thus, the initial weights of s and t are set to unity. During the iteration, the weights of s and t are updated separately under our weight strategy, where P^m_{s_i} and P^m_{t_j} represent the weight of the ith tie point set in s and the weight of the jth image in t in the mth iteration, respectively; ∆s^m_i and σ^{m−1}_{s_i} represent the bias and standard deviation of the elements in s, respectively; ∆x_t and ∆y_t are the image coordinate errors of the tth point in the jth image, and k_j is the number of image points in the jth image; σ^{m−1}_{l_j} is the standard deviation of the differences, over all tie points in the jth image, between the coordinates calculated in the iteration step and those extracted in the space intersection step; and (H_j, W_j) are the length and width of the jth image.

The Improved FIST Algorithm
The fast iterative shrinkage-thresholding algorithm is an advanced algorithm originating from the traditional iterative shrinkage-thresholding algorithm (ISTA). It is a popular algorithm for solving L1-norm minimization problems in the compressive sensing field. Based on the algorithm proposed by Nesterov for minimizing smooth convex functions, the FIST algorithm has been proved to be an "optimal" first-order method in terms of algorithmic complexity [25]. This method is attractive due to its simplicity, its suitability for large-scale problems, and its good convergence behaviour.
Considering a basic linear inverse problem A x = b with an ill-conditioned coefficient matrix A, regularization methods are required to stabilize the solution, among which l1 regularization has attracted a revival of interest and a considerable amount of attention. In this way, the linear inverse problem can be expressed as min_x { f(x) + λ‖x‖_1 } with f(x) = ‖A x − b‖_2^2, where L(∇f) is the Lipschitz constant of ∇f. The proximal mapping of the problem is [26] prox_ρ(x) = argmin_u { λ‖u‖_1 + (1/(2ρ)) ‖u − x‖_2^2 }, where ρ is the fixed step size, chosen within the range (0, 2/L(∇f)) to guarantee the convergence of Equation (18). To solve Equation (18), the FIST algorithm consists of the following three steps: (1) Initialization: set x_0 = z_1 and t_1 = 1. (2) Iteration: x_k = prox_ρ(z_k − ρ∇f(z_k)), t_{k+1} = (1 + sqrt(1 + 4 t_k^2))/2, z_{k+1} = x_k + ((t_k − 1)/t_{k+1})(x_k − x_{k−1}). (3) Estimation: if ‖x_k − x_{k−1}‖_2 < ε, then x_k is the expected result; else return to the iteration step. Here, x_k is the vector of unknowns and z_k is a combination vector of the unknowns.
To accelerate the rate of convergence of the above iteration, an improved step size operator ρ_k based on the Barzilai-Borwein (BB) algorithm [37] is applied, and the iteration step is rewritten with ρ_k in place of the fixed ρ, where ρ_k is the ratio of ∆x_k and ∆s_k, which represents the rate of convergence, and φ is an arithmetic factor to guarantee the convergence of the iteration. According to the BB algorithm [37], the iteration step factor is decided by the information from the current iteration and the former one, which represents the rate of convergence at x_k. Compared with the traditional FIST algorithm, the parameter ρ_k can not only optimize the iteration step of x_k, but also contribute to better convergence rates. Furthermore, the parameter z_{k+1} is a specific linear combination of x_{k−1} and x_k, which allows the method to significantly outperform the traditional IST algorithm and other classical gradient methods.
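The sketch below runs FISTA with a Barzilai-Borwein step on a toy l1-regularized problem. The safeguard that caps ρ_k at 1/L(∇f) is our own assumption, playing the role of a convergence guard; this illustrates the IFIST idea rather than reproducing the paper's exact algorithm:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista_bb(A, b, lam=0.01, iters=500, tol=1e-10):
    """FISTA with a Barzilai-Borwein step size for
    min_x 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    rho = 1.0 / L
    x = np.zeros(A.shape[1])
    z = x.copy()
    t = 1.0
    z_prev = g_prev = None
    for _ in range(iters):
        g = A.T @ (A @ z - b)          # gradient of the smooth part at z
        if g_prev is not None:
            dz, dg = z - z_prev, g - g_prev
            denom = dz @ dg
            if denom > 0:              # BB step, safeguarded into (0, 1/L]
                rho = min(dz @ dz / denom, 1.0 / L)
        x_new = soft_threshold(z - rho * g, rho * lam)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z_prev, g_prev = z, g
        z = x_new + (t - 1.0) / t_new * (x_new - x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x, t = x_new, t_new
    return x

A = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [1., 1., 1.]])
b = A @ np.array([1.0, 0.0, -2.0])
x_hat = fista_bb(A, b)
print(np.round(x_hat, 2))
```

The BB ratio adapts the step to the local curvature seen between consecutive iterates, which is what speeds up convergence relative to the fixed 1/L step.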

Inherent Error Compensation Model
With the random errors reduced efficiently by the above free block adjustment model, the remaining errors are mainly composed of three parts: errors due to the inaccuracy of the RPCs, systematic errors, and image point errors. Errors of image points and RPCs can be constrained to a small range by the affine transformation model added in the free block adjustment procedure. Furthermore, the RPCs are an approximation of the rigorous sensor model, which means that errors due to the inaccuracy of the RPCs originate, to some extent, from the systematic errors of the satellite platform.
The systematic errors include errors of the satellite orientation such as those due to position, attitude, camera, and jitter. Most of the systematic errors of the satellite platform can be corrected to a great extent. However, there will always be residual errors in the satellite orientation caused by the limitations of current technology, and these are the cause of the bias between the RPCs and the real sensor model. Therefore, the analysis of the distribution of the residual errors based on statistical learning theory is an important aspect of our experiment.
Firstly, the residual errors of all tie points after block adjustment are clustered by the DBSCAN algorithm and graphed with the aid of location information. Considering the statistical properties of our dataset, experimental data with few noise points are analysed and a cluster center is obtained. Since all images from a stable sensor on the satellite platform should share the same inherent errors, all residual errors should have the same distribution after control point free block adjustment. Without ground control points, the actual bias between the calculated coordinates and the actual coordinates is unknown. Thus, the average of the calculated coordinates contains the inherent errors of the sensor before they are removed from each point, given that the block adjustment method without GCPs cannot determine them. Corrections based on a rigorous sensor model are helpful in restricting the systematic errors of the sensor, but are not suitable for a general rational function model with RPCs. A control point free block adjustment method can achieve better convergence of the correspondence between images by reducing the random errors, which means that a high relative positioning accuracy can be achieved. Since systematic errors persist through the whole block adjustment process, statistical analysis is helpful in handling them.
According to the analysis above, the residual errors after control point free block adjustment are mainly the systematic errors of the sensor, with small RPC errors and image point errors; we refer to these as inherent errors. Therefore, the error distributions of all tie points are consistent. Furthermore, the relative positions of the images with respect to each other are improved greatly by the above free block adjustment method, so what remains is to improve the absolute geometric positioning accuracy. Considering that the cluster centers of all residual errors are the same, an affine transformation model in image-space is inefficient and not straightforward. In order to achieve accurate absolute positioning, a perspective transformation model in object-space is added as an additional condition. The scale of errors in object-space is quite different from that in image-space: a 1 pixel difference in image-space can cause a 3 m error in object-space. Therefore, a constraint in object-space based on the perspective transformation model is appended to a translation model in image-space, as in Equation (21), where (k_00, k_10) are the estimated inherent errors obtained from the former experiment on the homologous dataset, in the directions of longitude and latitude (or along-track and cross-track); (P, L, H) is the average of the calculated object-space coordinates (P_c, L_c, H_c); and (F_P, F_L, F_H) are the residual errors of the object-space coordinates. In order to simplify the computation, a linear transformation model of the image-space coordinates is applied, as permitted by the achievement of the free block adjustment. The last three lines of Equation (21) can be written in the matrix form of Equation (22), where the vector [k_00 k_10 0]^T is a translation coefficient in object-space obtained from the distribution of the inherent errors, which is the main compensation item of the formula.
The matrix comprising the elements from k_01 to k_23 in Equation (22) is aimed at strengthening the relative relationships between the images in both image-space and object-space. Equation (22) originates from the perspective transformation in object-space.
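The dominant translation part of the compensation can be sketched as follows. The residual values are synthetic stand-ins for the clustered residuals, and their cluster centre plays the role of (k_00, k_10):

```python
import numpy as np

# Synthetic residual errors (m) of check points after free block adjustment,
# scattered around a systematic offset like the one observed for ZY-3.
residuals = np.array([[5.00, 6.90], [5.20, 6.70],
                      [5.10, 6.80], [5.15, 6.85]])

k = residuals.mean(axis=0)           # estimate of the inherent translation
compensated = residuals - k          # translation-only compensation

rmse = lambda e: np.sqrt((e ** 2).sum(axis=1).mean())
print(rmse(residuals), rmse(compensated))
```

Subtracting the cluster centre removes the systematic part of the error while leaving the (already small) random scatter untouched.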

Study Area and Data Set
In our experiment, 95 multi-observation images of the Beijing and Songshan areas from ZY-3 were tested, covering a total area of about 35,000 km².

Tie Point Sets Acquisition
Based on the cascade SIFT and FAST algorithms, we developed an automatic matching program to detect the potential tie points in every test image. Firstly, the large coverage scale adapted anisotropic Gaussian-SIFT (AAG-SIFT) algorithm was applied at image scale, which helps to find the matching areas between images quickly. Then, 300 × 300 pixel windows covering the tie points were calculated with the RPCs to ensure the accuracy of the matching points during the accurate matching process. After this, the FAST method was applied in the local accurate registration regions to obtain the tie points in all images. Figure 6 shows some examples of the extracted tie point sets.

Control Point Free Block Adjustment
After data preprocessing, the traditional method of block adjustment was applied to the selected dataset. A stable and efficient weight strategy was applied after the previous clustering and filtering, which means that all test data are credible. To evaluate the performance of the free block adjustment method, we calculated the root mean square error (RMSE) of the absolute errors of the check points, and the in-plane positioning accuracy was measured by the distance µ_P = sqrt(µ_X² + µ_Y²), with µ_X = sqrt((1/N) Σ (X_c − X)²) and µ_Y = sqrt((1/N) Σ (Y_c − Y)²), where (X_c, Y_c) and (X, Y) represent the calculated and actual coordinates of the check points in object-space, and (µ_X, µ_Y, µ_P) are the accuracies in the X, Y, and resulting vector directions. With the aid of the DBSCAN algorithm, the errors of all tie points were clustered together. In other words, the distribution of the absolute values of these check points indicates that almost all "noise" points in the previous dataset were removed, which contributed to better convergence and faster computation. The error distributions of the check point sets of the Beijing and Songshan areas are compared in Figure 7. The directions of the arrows in the graphs indicate the directions of the error distribution, and their lengths represent the magnitudes of the errors. From the figure, we can see that the use of the stable weight strategy and the IFIST algorithm to solve the normal equations is efficient in handling the free block adjustment. With the help of the multi-temporal dataset, the random errors of the image-space coordinates could be reduced by the multi-observation images. With the effect of the random errors in image-space reduced, the residual errors were mainly the systematic errors. Considering the error distributions graphed in Figure 7, the absolute errors of the experimental areas were almost the same in direction and numeric value, and therefore not dependent on the geographical locations of the test datasets.
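The per-axis RMSE and resulting in-plane accuracy µ_P can be computed as in this short sketch (the check point coordinates are synthetic):

```python
import numpy as np

def plane_accuracy(calc, actual):
    """RMSE per axis and the resulting in-plane accuracy
    mu_P = sqrt(mu_X^2 + mu_Y^2)."""
    d = calc - actual
    mu_x = np.sqrt((d[:, 0] ** 2).mean())
    mu_y = np.sqrt((d[:, 1] ** 2).mean())
    return mu_x, mu_y, np.hypot(mu_x, mu_y)

calc = np.array([[3.0, 4.0], [0.0, 0.0]])      # calculated coordinates
actual = np.zeros((2, 2))                      # "true" check coordinates
mu_x, mu_y, mu_p = plane_accuracy(calc, actual)
print(round(mu_p, 4))
```

Using the root mean square (rather than the mean) penalizes large outlying residuals, which is why it is the standard accuracy measure for check points.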
As for the ZY-3 satellite, cameras of this satellite are static on the platform so that the systematic errors for all homologous data should be the same, which demonstrates consistency between the residual errors derived during the block adjustment process and the theoretical analysis. Furthermore, the error distribution converged in a certain direction, and the numerical values of the RMSE of these errors were small, as shown in Table 1. The absolute errors of all check points are displayed together in Figure 8. Based on the DBSCAN theory, the green point in Figure 8c-e is the cluster center point, which shows the convergence of the block adjustment. Comparing the results in Figure 8a,c, the error distribution after the block adjustment showed an obvious convergence. While considering the error distribution of Figure 8c,d, we can see that the errors of different experimental areas converged in almost the same direction, and the cluster center was almost the same for the Songshan area as for the Beijing test area.
At the same time, a verification of the proposed weight strategy and the improved FIST method was conducted. A unit weight strategy and some traditional solutions were tested. The following four methods were applied to the Beijing test dataset: the improved fast iterative shrinkage-thresholding (IFIST) algorithm, the fast iterative shrinkage-thresholding (FIST) algorithm, the preconditioned conjugate gradient (PCG) algorithm, and the spectrum correction (SC) algorithm, each with a unit weight strategy and with the proposed weight strategy. With the same dataset of the Beijing area and 20 iterations, the time consumed and the relative accuracy of the tie points were calculated. The relative accuracy was calculated as the mean of the quadratic sum of the differences between the calculated image-space coordinates and the average of these image-space coordinates in the same tie point set. Moreover, the absolute errors of the check points were calculated with the different weight strategies, and the RMSE and STD of the absolute errors in the resulting vector directions of latitude and longitude were calculated to validate the stability of our proposed weight strategy. The calculation formulae of the RMSE and STD are RMSE = sqrt((1/N) Σ X_i²) and STD = sqrt((1/N) Σ (X_i − X̄)²), where X_i is the residual error of the ith tie point in the resulting directions of latitude and longitude, X̄ is the mean of all residual errors X_i, and N is the number of tie points. The advantage of our proposed method on the Beijing dataset can be seen by comparing the outputs of each method, which are shown in Figure 9 and Table 2. The accuracy refers to the mean value of the residual errors in image-space of each tie point, and the time consumed was measured under the same condition of 15 iterations. Figure 9 displays the convergence of the different methods with the same dataset.
The IFIST algorithm was the fastest, converging in 13 iterations; the SC and PCG algorithms converged to stable results in almost 18 iterations, while the FIST and IST algorithms needed more than 22 iterations. The time consumed by each method is the product of the number of iterations and the time consumed per iteration. Thus, the time consumed for 15 iterations and the accuracy of each method are shown in Table 2. The convergence of our proposed weight strategy with the IFIST algorithm for the Beijing and Songshan areas is shown in Table 3.

Inherent Error Compensation
After the data clustering step for the test dataset following the control point free block adjustment, the residual errors of all tie points were clustered into the same category, in which the cluster center can represent most of their characteristics. With the random errors reduced by the free block adjustment, the distribution of the residual errors indicated a systematic error in the test dataset. Considering the error distributions of the different areas, the residual errors were similar, while the original errors were distributed differently. With the random errors reduced, the cluster center of the residual errors is considered to be the inherent error of the sensor of the ZY-3 satellite images.
In order to verify the theory of the inherent error compensation model, we developed a confirmatory experiment based on our test dataset, in which 23, 38, and 54 images were randomly chosen from the whole dataset, giving a total of about 1.8 × 10^27 possible combinations of images. With the free block adjustment applied to each group, the residual errors could be calculated and the cluster center of each group could be obtained with the help of the DBSCAN algorithm. Then, the convergence of the results of each group was verified, and all cluster centers extracted from each group were stored in an Excel file. To reduce the time consumed, the cluster centers of 3416 groups of the combinations of 23 images, 5468 groups of the combinations of 38 images, and 8437 groups of the combinations of 54 images were randomly chosen to be graphed; the cluster center distributions in each case are shown in Figure 10a–c. The purple points are the cluster centers of each combination and the green points are the newly calculated cluster centers of all purple points. The consistency of the residual errors indicates that all data in the test dataset share the same characteristics. Furthermore, the coordinates of the cluster centers of the three combinations were (5.08 m, 6.82 m), (5.17 m, 6.82 m), and (5.09 m, 6.87 m) in the vector directions of longitude and latitude, respectively, which indicates that the computations of the three groups converged to almost the same position. Therefore, the residual errors are considered an inherent error of the ZY-3 satellite images based on statistical learning theory, and the inherent error compensation model is proposed to improve the geo-positioning accuracy further. In order to acquire a more credible value of the inherent error, we put all images together to conduct the block adjustment process, and the cluster center of the residual errors of the whole dataset was located at (5.13 m, 6.83 m) in the directions of longitude and latitude.
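The cluster-center extraction step can be illustrated with a small self-contained sketch. Rather than depending on a clustering library, the following simplified routine works in the spirit of DBSCAN: points with at least `min_samples` neighbours within radius `eps` are treated as core points, outliers are discarded, and the centre is the mean of the core points. The function name and the example parameter values are our own illustrative choices, not the paper's:

```python
import numpy as np

def density_center(residuals, eps=0.5, min_samples=5):
    """Simplified density-based centre finder in the spirit of DBSCAN.

    residuals : (N, 2) array of residual errors (lon, lat offsets, in metres).
    Returns the mean of the core points, i.e. an estimate of the cluster centre.
    """
    pts = np.asarray(residuals, dtype=float)
    # Pairwise Euclidean distances between all residual points.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Neighbour count within eps, excluding the point itself.
    neighbour_counts = (d <= eps).sum(axis=1) - 1
    core = pts[neighbour_counts >= min_samples]
    if core.shape[0] == 0:
        return pts.mean(axis=0)   # no dense cluster found; fall back to the plain mean
    return core.mean(axis=0)
```

Because the mean is taken over core points only, isolated outlier residuals do not shift the estimated centre, which is the practical reason a density-based method is preferred here over a simple average.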
Based on experiments with all data in the test dataset, this result is closer to the true value of the inherent error. As a result, the cluster center of the whole dataset was taken as an approximation of the inherent error of ZY-3 satellite imagery, and its coordinates were used as k00 and k10 in the inherent error compensation model of Equation (21). With the help of the inherent error compensation model, the RMSE of the geometric positioning accuracy of the test dataset was improved greatly, to 4.27 m and 3.39 m in plane for the Beijing and Songshan areas, respectively. A comparison of the error distribution of the geometric positioning accuracy after applying the inherent error compensation model is detailed in Table 4.
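Equation (21) is not reproduced in this section; as a minimal sketch under that caveat, the constant terms k00 and k10 can be viewed as fixed offsets, equal to the cluster-centre coordinates, subtracted from each residual in the longitude and latitude directions. The function name and the list-of-tuples interface are our own illustrative choices:

```python
def compensate(points, k00=5.13, k10=6.83):
    """Subtract the constant inherent-error offsets from each residual.

    points : list of (lon_residual, lat_residual) tuples, in metres.
    k00, k10 : the cluster-centre coordinates of the whole dataset,
               (5.13 m, 6.83 m), used here as constant offsets.
    """
    return [(lon - k00, lat - k10) for lon, lat in points]
```

A residual lying exactly at the cluster centre maps to zero, so after compensation the residual cloud is re-centred on the origin, which is what drives the plane accuracy improvement reported above.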
Furthermore, the results in Figure 8e,f show the improvement of the geometric positioning accuracy of the tested datasets more intuitively, and the convergence of the computations was better than before the inherent error compensation. Figure 11 displays the improvement of the control point free block adjustment model and the inherent error compensation model: the blue line represents the error distribution of all check points in the resulting vector directions of latitude and longitude, the orange line represents the error distribution after control point free block adjustment, and the green line is the error distribution after the inherent error compensation.

Conclusions
In this paper, we put forward a new inherent error compensation model based on statistical learning theory to improve the geometric positioning accuracy of ZY-3 satellite images. Datasets of the Beijing and Songshan areas were processed separately to conduct the free block adjustment together with an affine transformation model to compensate the bias of the RPCs. With the help of a stable and efficient weight strategy and the improved FIST algorithm, which is widely used in the field of compressive sensing, the geometric positioning accuracy was reduced to 9.68 m and 8.47 m in plane for the Beijing and Songshan areas, respectively. Previous studies terminated once the improvement of the affine transformation model was achieved; we therefore focused on the distribution of the residual errors after the free block adjustment. With the random errors reduced, the distributions of the residual errors of all check points shared similar characteristics. With the aid of the DBSCAN algorithm, we obtained a regular distribution of the residual errors by statistical analysis. Based on this, the inherent error compensation model was put forward, and the geometric positioning accuracy of the test datasets was improved to 4.27 m and 3.39 m in plane for the Beijing and Songshan areas, respectively. The final experimental results showed that the residual errors of the test datasets converged to within 2 pixels of geometric positioning accuracy for ZY-3 remote sensing images. In further research, this method will be tested on remote sensing images from other platforms and on multi-source datasets.