An SRTM-Aided Epipolar Resampling Method for Multi-Source High-Resolution Satellite Stereo Observation

Binocular stereo observation with multi-source satellite images used to be challenging and impractical, but is now a valuable research issue with the introduction of powerful deep-learningbased stereo matching approaches. However, epipolar resampling, which is critical for binocular stereo observation, has rarely been studied with multi-source satellite images. The main problem is that, under the multi-source stereo mode, the epipolar-line-direction (ELD) at an image location may vary when computed with different elevations. Thus, a novel SRTM (Shuttle Radar Topography Mission)-aided approach is proposed, where a point is transformed from the original image-space to the epipolar image-space through a global rotation, followed by a block-wise homography transformation. The global rotation transfers the ELDs at the center of the overlapping area to the x-axis, and then block-wise transformation shifts the ELDs of all grid-points to the x-axis and eliminates the y-disparities between the virtual corresponding points. Experiments with both single-source and multi-source stereo images showed that the proposed method is obviously more accurate than the previous methods that do not use SRTM. Moreover, with some of the multi-source image pairs, only the proposed method ensured the y-disparities remained within ±1 pixel.


Introduction
Stereo observation has long been interested in computer vision, photogrammetry, and remote sensing.Conventional stereo observation depends on binocular stereo matching in image-space [1] or multi-view stereo matching in object-space [2,3].Recently, even a monocular view can be used to reconstruct depth information [4][5][6].However, binocular stereo matching is still the most widely used technique in industry, since its complexity is controllable and its results are stable.As the pre-step of binocular stereo observation, epipolar resampling eliminates the y-disparities between conjugate points, so that a left-image-point along the same x-axis can be matched on the right image.Afterwards, the x-disparity map is obtained to build the depth maps or the 3D models.Epipolar resampling greatly improves computational efficiency by facilitating the image matching task to be considered a one-dimensional correspondence-labeling task.Thus, many valuable stereo matching approaches have been inspired, among which the most widely used is semi-global matching (SGM) [7].The SGM method and its variants are now widely used in aerial/UAV photogrammetry and along-track satellite stereo observation.
With the rapid development of high-resolution remote sensing technologies, more and more users are choosing cheaper and more flexible high-resolution along-track satellite stereo images to replace the conventional aerial images when producing 3D digital terrain models (DTM).Currently, the stereo satellite images are mainly collected by two types of systems: multi-line-array sensors, which collect images from different viewing angles simultaneously (e.g., ZY-3 and TH-1, from China), and single-line-array sensors, which observe one place two times with different pitch angles (e.g., IKONOS, GeoEye-1, WorldView-1/2/3, and Pleiades).The along-track stereo images acquired by either of the two systems have time intervals of only seconds, which means the radiative conditions and the land-cover are almost the same on different views.The latest along-track stereo satellite imagery has been demonstrated to have very high imaging quality and very good accuracy in both the horizontal and altitude directions [8,9].
In addition to the along-track stereo satellite images, the images collected from different tracks can also be used for cross-track stereo observation as long as they have large enough convergence angles (>10 • ).Cross-track stereo observation is very useful in producing high-relative-accuracy digital ortho-maps (DOM), because the high-resolution digital elevation model (DEM) is inaccessible to most users since it is either too expensive or restricted by the government.With inaccurate DEMs, the DOMs produced by overlapping satellite images with large convergence angles will have poor mosaicking accuracy.The mosaicking error is randomly distributed and can only be eliminated with more accurate elevation.However, unlike along-track stereo observation, cross-track stereo observation is very challenging.The images from different tracks may be acquired in different years or seasons, which have different landcovers and different radiative properties.Meanwhile, some stereo image pairs are acquired by multi-sensors.Multi-source satellite images also have different geometric properties.As a result, the SGM-based methods can hardly obtain satisfying results, and therefore cross-track or multi-source satellite stereo observation has rarely been applied before.Recently, the introduction of powerful CNN-based stereo matching approaches [10,11] has brought new hope to this area.As the pre-step of the stereo observation, epipolar resampling of multi-source satellite stereo observations should be carried out.Currently, there is no research showing that an epipolar resampling method can work well with multi-source satellite stereo images, especially when their track directions are quite different.
In this paper, a novel SRTM-aided epipolar resampling method, using a global rotation followed by a block-wise homography transformation, is proposed, and is demonstrated with both along-track and cross-track as well as single-source and multi-source stereo observations.The proposed method uses the pre-corrected rational function model (RFM) and the elevations supplied by SRTM to compute accurate epipolar-line-directions (ELDs).The experiments used 13 pairs of stereo images acquired by ZY-3, GF-1, and GF-2 from China; IKONOS-2, WorldView-2, and WorldView-3 from Digital Globe Inc. of the USA; and Pleiades-HR 1a and Pleiades-HR 1b from Europe.

Related Works
In this section, we briefly review the existing studies on the model of epipolarity between binocular linear pushbroom satellite images and the existing approaches of epipolar resampling.
The epipolar geometry of frame images can be modeled in a very simple manner through the fundamental matrix F, which can be recovered with only eight conjugate points, or even fewer under different conditions [12,13].The epipolar geometry of frame images is based on the coplanarity of the two perspective centers, two conjugate image points, and the corresponding object-space point.The intersections of the epipolar plane and the two image planes result in two straightforward corresponding epipolar lines.Thus, epipolar resampling of frame images can also be implemented with very simple geometric transformations.
Unlike the frame imagery, the pushbroom satellite imagery has a very complicated rigorous epipolar relationship.The pushbroom images are formed by stitching the one-dimensional scanning lines captured as the sensor moves.Each scanning line has a unique perspective center and a unique camera attitude, since the satellite is moving in an elliptical orbit with varying velocity and is slowly rotating to keep the sensors facing the ground [14].It has been pointed out that, when modeling the exterior orientation parameters as second-order polynomials, the epipolar curve can be approximated as a straight line in a small range but not for the entire pushbroom image.Moreover, different points on the projection of a light ray may correspond to different epipolar curves, which implies that there is actually no conjugate epipolar curve on pushbroom images [15].
When undertaking epipolar resampling for pushbroom satellite images, the projection model is usually simplified.Some works [16,17] have used linear approximation with the pushbroom imaging model, which assumes that, over the acquisition duration of one image scene, the sensor is moving in a straight line with a constant velocity and the attitude of the sensor is fixed.With the approximation, the projection model can be recovered with several ground control points (GCPs), and the epipolar curves can be represented as hyperbola curves.Some works [18,19] have used the even simplified parallel projection model for high-resolution satellite imagery with a very narrow field-of-view (FOV).The parallel-projected image can be transformed from the original perspective-projected image through a simple image-space rectification.Under the parallel projection, the epipolar curves are straight lines and have unique correspondences.Thus, epipolar-resampling with the parallel-projection occurs in a similar manner to the frame images.Some researchers have found that the ELD is much more stable on what they called the projection reference plane than on the image plane.The projection reference plane is a virtual horizontal plane in object space [20,21].In these works, the images were firstly ortho-rectified onto the projection reference plane, and then transformed with a 2D affine model that rotates the images to make the ELD parallel to the x-axis.The georeferencing models used in these methods should be pre-corrected to eliminate the convergence error of conjugate points.Either the rigorous sensor model [22] or the rational function model (RFM) [23] can be used as the projection model.In recent years, some works have directly used the pre-corrected RFM as the projection model, without making any simplification when modeling the epipolar relationship.In other works [24,25], the authors used 2D polynomial functions calculated with conjugate image points to transform the epipolar curves into straight lines.In another paper [26], the authors used block-wise transformation to transform the epipolar curves into straight lines.
The above-mentioned methods make some assumptions to simplify the pushbroom projection model based on the property of high-resolution satellite imagery as having narrow FOVs.Most of the methods have been demonstrated to be effective with along-track stereo images.Wang et al. [21] also demonstrated their method with cross-track SPOT-5-HRG images, which have intervals of several days between the acquisition dates.However, multi-source satellite stereo image pairs are much more complicated to deal with [27][28][29][30][31][32].None of the methods have been demonstrated to be effective with multi-source imagery, and there is a lack of analysis on the accuracy loss of their assumptions, especially for those assuming that a unified ELD exists and the epipolar curves are straight.In reality, the ELD changes not only with different image locations, but also with different back-projection elevations at the same image location.Moreover, the angles between the orbits of different satellites may be large, and thus the simplifications may be inaccurate.To deal with these problems, a new SRTM-aided method is proposed in this article, and is demonstrated to be effective with both single-source and multi-source stereo satellite images.

Epipolar Geometry of Pushbroom Imagery
In this section, we briefly explain why medium-accuracy elevation should be used in computing the ELD for pushbroom images, and describe the process of obtaining the ELD and the corresponding epipolar-line-segment for an image-space location with pre-corrected RFMs.

Property of Epipolar-Line-Direction
Define (p, p ) as a pair of conjugate points on two-view pushbroom stereo images, where p is on the left image I and p is on the right image I .Define P as the corresponding object-space point of p and p .The ELD at the position of p, denoted as ed(p), is defined as the direction of the left-image-projection of a small segment, which is on the light of p and crosses the object-space point P. When the two-view images are relatively well oriented, the point p will be lying on the right-image-projection of the light of p.For frame images, the ELD ed(p) can be obtained without knowing the actual location of p because the coplanarity conveyed by the perspective projection ensures the ed(p) is not changed with different p s on the right-image-projection of the light of p.However, a similar property is not established with two-view pushbroom images.Figure 1 illustrates four typical types of geometric relationships of pushbroom stereo images.For the convenience of analysis, it is assumed that the satellites are moving in straight lines with constant velocities, and that the attitudes of the sensors are not changed during the acquisitions.Thus, the points O0, p, P1, P2, p1, p2, and O1 (O2) are in a same plane, and the ELD is fixed at the direction of the x-axis on both images.The condition in Figure 1b is called along-track stereo, where only one track exists and is lying on the plane formed by points O0, p, P1, P2, p1, p2, O1, and O2.As a result, the ELD is fixed at the direction of the y-axis in the image-space.Apparently, under the crosstrack coplanar stereo condition and the along-track stereo condition, the ELD will not be influenced The condition in Figure 1a is called cross-track coplanar stereo, where Track-1 and Track-2 are approximately parallel, and the scanning planes of the two tracks are approximately parallel.As a result, different points on the light of p, denoted as l (p), correspond to the same perspective center.Thus, the points O 0 , p, P 1 , P 2 , p 1 , p 2 , and O 1 (O 2 ) are in a same plane, and the ELD is fixed at the direction of the x-axis on both images.The condition in Figure 1b is called along-track stereo, where only one track exists and is lying on the plane formed by points O 0 , p, P 1 , P 2 , p 1 , p 2 , O 1 , and O 2 .As a result, the ELD is fixed at the direction of the y-axis in the image-space.Apparently, under the cross-track coplanar stereo condition and the along-track stereo condition, the ELD will not be influenced by different corresponding points, or, in other words, by different heights.This phenomenon explains why methods which do not consider the influence of different heights have still achieved relatively good results with some of the cross-track stereo images, and all of the along-track stereo images.Stereo observation with images captured by satellites that obtain nadir images with a fixed pitch angle (e.g., multi-track ZY3-NAD, GF-1, or GF-2) follows the cross-track coplanar stereo condition.The along-track stereo condition corresponds to the most widely used along-track stereo images captured by multi-line-array sensors (e.g., ZY3-TLC and TH1-TLC), or a single-line-array sensor with different pitch angles (e.g., IKONOS stereo, GeoEye stereo, and WorldView stereo).
Under the conditions shown in Figure 1c,d, coplanarity no longer established, thus the ELD will be influenced by different heights and there are no conjugate epipolar curves.Condition (c) is called the cross-track non-coplanar stereo, which usually happens for image pairs made up with a forward-view and a backward-view captured on different tracks (e.g., stereo observation with a ZY3 forward-view and a ZY-3 nadir-view in different tracks).Condition (d) is called the cross-multi-directional-track (CMDT) stereo, which is a special case of the cross-track non-coplanar condition.The CMDT condition usually happens for multi-source stereo images.In the experimental section of this paper, it is shown that there exist about 20 degree angles between the tracks of a Chinese mapping satellite image (e.g., ZY3, GF1, or GF2) and a high-resolution satellite image provided by DigitalGlobe, Inc. (e.g., IKONOS or WorldView-2).For better analysis, the image pairs following the CMDT condition are tested.
In summary, the objective of epipolar resampling for frame images-that is, transforming the conjugate epipolar-curves to the same row-can never be achieved under the cross-track non-coplanar stereo condition or CMDT stereo condition of pushbroom images, since the conjugate epipolar-curves do not exist [15].However, for the purpose of digital terrain model generating, we only need to focus on the ELD computed with the elevations near the ground.Many public data sources can supply medium-accuracy elevation information, such as the elevation information supplied by the Shuttle Radar Topography Mission (SRTM), which has a better-than-100 m accuracy in most areas and covers land all over the world within the latitude range from 54 • S to 60 • N [33].As a result, in the generic epipolar resampling method proposed in this paper, an accurate ELD is determined with the elevation supplied by the SRTM.

Obtaining Epipolar-Line-Directions
In this section, we introduce the geometric process used in the proposed epipolar resampling approach.When the stereo satellite images I (the left image) and I (the right image) have been relative-oriented accurately, the lights of the two conjugate points p and p will intersect somewhere in the object-space.As a result, the epipolar-curve ec(p) corresponding to the left-image-point p can be easily obtained by projecting the light of p onto the right image-space, as ec(p) will cross the point p .Similarly, the epipolar-curve ec(p ) which crosses the point p can be obtained by projecting the light of p .
The light of p or p can be obtained by the process of backprojection with the RFM model [23].The RFM model projects the geographic coordinates (L, P, H) to the image-space coordinates (x, y) with two fractions of three-order polynomials, where L is latitude; P is longitude; H is height; x is the column of the image (the sample coordinate); and y is the row of the image (the number of scanning lines).When fixing the height H, the RFM can also be used to calculate the horizontal coordinates (latitude and longitude) with the image-space coordinates (x, y) through solving a set of non-linear equations, which is known as the backprojection process.For the convenience of presentation, the process of projection is denoted as: and the process of backprojection is denoted as: Before obtaining the ELDs, we need to find the corresponding points.Some approaches [24,25] directly use the matched tie-points.However, this may not be a good choice for production since the resampling accuracy will be influenced by the matching error and the spatial distribution of the tie-points.In the proposed approach, we chose to use the pre-corrected RFMs and the SRTM to obtain virtual corresponding points.Although the SRTM has tens of meters of elevational error, the ELDs are quite accurate.Because the satellites are relatively stable carriers, the curvature of the epipolar curves on the high-resolution satellite images are very small, and treating a small segment of an epipolar curve as a straight line-segment will not cause the loss of any accuracy.The corresponding point of p, denoted as p , is determined in two steps.First, backproject the point p onto the elevation plane of SRTM through an iteration process [34] (which is called the ray-tracing process in computer graphics).Thus, the ground point P and its elevation H are obtained.Second, project the ground point P onto the right image I and, thus, the corresponding point p is obtained.The iterative process of backprojecting the p onto the elevation plane is denoted as: The process of obtaining the corresponding point with the aid of SRTM is denoted as: Then, the ELD ed(p) at the location of p is obtained by projecting a small segment of the light (denoted as P − P + ) corresponding to p around the ground point P onto the left image I (see Figure 2).The segment P − P + is obtained through the backprojection process (Equation ( 2)) on p with the heights (H-∇H) and (H+∇H).∇H, the height interval between point P and P − or P + , can be set as 20-100 m.
Conversely, the ELD ed p at the location of p is obtained by similar steps to the point p.The process of obtaining the ELD of p is denoted as: Remote Sens. 2019, 10, x FOR PEER REVIEW 6 of 18 resampling accuracy will be influenced by the matching error and the spatial distribution of the tiepoints.In the proposed approach, we chose to use the pre-corrected RFMs and the SRTM to obtain virtual corresponding points.Although the SRTM has tens of meters of elevational error, the ELDs are quite accurate.Because the satellites are relatively stable carriers, the curvature of the epipolar curves on the high-resolution satellite images are very small, and treating a small segment of an epipolar curve as a straight line-segment will not cause the loss of any accuracy.The corresponding point of p, denoted as p', is determined in two steps.First, backproject the point p onto the elevation plane of SRTM through an iteration process [34] (which is called the ray-tracing process in computer graphics).Thus, the ground point P and its elevation H are obtained.Second, project the ground point P onto the right image I' and, thus, the corresponding point p' is obtained.The iterative process of backprojecting the p onto the elevation plane is denoted as: , , , , The process of obtaining the corresponding point with the aid of SRTM is denoted as: Then, the ELD ( ) ed p  at the location of p is obtained by projecting a small segment of the light (denoted as − + ′ ′ P P ) corresponding to p' around the ground point P onto the left image I (see Figure 2).The segment − + ′ ′ P P is obtained through the backprojection process (Equation ( 2

Proposed Approach
The proposed approach was designed with two objectives in mind: (1) Eliminate the ydisparities between the image points and their correspondences obtained through the process in Equation ( 4). ( 2) Transform the ELDs obtained through the process in Equation ( 5) at all image locations to the positive direction of the x-axis.

Proposed Approach
The proposed approach was designed with two objectives in mind: (1) Eliminate the y-disparities between the image points and their correspondences obtained through the process in Equation ( 4).
(2) Transform the ELDs obtained through the process in Equation ( 5) at all image locations to the positive direction of the x-axis.

Workflow
The proposed approach has three steps: 1.
Prepare the global rotation matrix and the block-wise homography transformation models.

2.
Evaluate the prepared transformation model through virtual corresponding points.
In the proposed approach, a point either on the left image or on the right image needs two transformations from the original image-space to the epipolar image-space (see Figure 3).First, the point in the original image-space is rotated into the rotated image-space.In the rotated image space, only the ELD at the center of the overlapping area is ensured to be in the direction of the x-axis.Second, a block-wise homography transformation is made to transform the point from the rotated image-space to the epipolar image-space.Conversely, a point is transformed from the epipolar image-space by a reverse block-wise homography transformation, followed by a reverse global rotation.
Remote Sens. 2019, 10, x FOR PEER REVIEW 7 of 18 The proposed approach has three steps: 1. Prepare the global rotation matrix and the block-wise homography transformation models.
2. Evaluate the prepared transformation model through virtual corresponding points.
In the proposed approach, a point either on the left image or on the right image needs two transformations from the original image-space to the epipolar image-space (see Figure 3).First, the point in the original image-space is rotated into the rotated image-space.In the rotated image space, only the ELD at the center of the overlapping area is ensured to be in the direction of the x-axis.Second, a block-wise homography transformation is made to transform the point from the rotated image-space to the epipolar image-space.Conversely, a point is transformed from the epipolar image-space by a reverse block-wise homography transformation, followed by a reverse global rotation.
where the translating parameters t and t' are included to ensure the entirety of the rotated overlapping areas are in a proper image-coordinate for resampling.
After the global rotation, we need to obtain the corresponding grid locations at the rotated image-space and the epipolar image-space, which is the most complicated part of the proposed approach.The workflow is illustrated in Figure 4.After that, the homography transformation within a block can be easily solved with its four grid-point-correspondences.sin θ where the translating parameters t and t are included to ensure the entirety of the rotated overlapping areas are in a proper image-coordinate for resampling.
After the global rotation, we need to obtain the corresponding grid locations at the rotated image-space and the epipolar image-space, which is the most complicated part of the proposed approach.The workflow is illustrated in Figure 4.After that, the homography transformation within a block can be easily solved with its four grid-point-correspondences.
Finally, when the transformation models are prepared, the y-disparities of the virtual corresponding points-which are transformed to the epipolar image-space-are used to examine the accuracy of the models before image resampling.
Remote Sens. 2019, 10, x FOR PEER REVIEW 8 of 18 Finally, when the transformation models are prepared, the y-disparities of the virtual corresponding points-which are transformed to the epipolar image-space-are used to examine the accuracy of the models before image resampling.

Block-Wise Transformation of the Left Image
As explained in Section 3, the ELD may be influenced by different heights under the cross-track non-coplanar stereo condition and the CMDT stereo condition, which means the ELDs at different image locations may show small differences because of the uneven land.As a result, the block-wise homography transformation is chosen to ensure the ELDs at all the left-image locations point to the positive direction of the x-axis.
Denote the grids of the left image in the rotated image-space as: where rot  is a regular grid formed by squares, and covers the whole overlapping area of the left image in the rotated image-space.Denote the square surrounded by grid ( ) sq .The side length of the squares is denoted as h.


in the epipolar image-space as:

Block-Wise Transformation of the Left Image
As explained in Section 3, the ELD may be influenced by different heights under the cross-track non-coplanar stereo condition and the CMDT stereo condition, which means the ELDs at different image locations may show small differences because of the uneven land.As a result, the block-wise homography transformation is chosen to ensure the ELDs at all the left-image locations point to the positive direction of the x-axis.
Denote the grids of the left image in the rotated image-space as: where G rot is a regular grid formed by squares, and covers the whole overlapping area of the left image in the rotated image-space.Denote the square surrounded by grid g i+1,j , g i,j+1 , and g i+1,j+1 as sq i,j .The side length of the squares is denoted as h.
Denote the corresponding grids of G rot in the epipolar image-space as: The key is how to determine the location of g (epi) i,j s.In the proposed method, the x-coordinate of g is inherited from g (rot) i,j .Only the y-coordinates are adjusted to modify the ELDs.Thus, the transformation of grids from the rotated image-space to the epipolar image-space is: where the y-translation ∆y i,j is determined in the following steps.
Denote the ELD at g in the rotated image-space as ed (rot) i.j and denote the angle between ed (rot) i.j and the x-axis as θ (rot) i,j .Assume we can transform the neighboring area of g (rot) i,j from the rotated image-space to the epipolar image-space by a y-shearing and a y-translation: where (x, y) is the coordinate of a point in the neighboring area of g i,j .The y-shearing turns the ELD at the location of g (rot) i,j to the x-axis.The purpose of y-translation is to deal with the dislocation at the square center caused by the different y-shearings of its four grid points.For example, at the center of square sq i,j , of which the x-coordinate is x (rot) i.j + h/2 , four different combinations of a y-shearing and a y-translation may be applied according to Equation (9): + ∆y i,j ; with grid g (rot) i,j T i,j,1,0 = (h/2) • tan θ (rot) i+1,j + ∆y i+1,j ; with grid g (rot) i+1,j T i,j,0,1 = −(h/2) • tan θ (rot) i,j+1 + ∆y i,j+1 ; with grid g The difference between each pair of y-transformations at the center of square sq i,j should be minimized.There are six difference values, denoted as: dT i,j,1 = T i,j,0,0 − T i,j,0,1 dT i,j,2 = T i,j,0,0 − T i,j,1,0 dT i,j,3 = T i,j,0,0 − T i,j,1,1 dT i,j,4 = T i,j,0,1 − T i,j,1,0 dT i,j,5 = T i,j,0,1 − T i,j,1,1 dT i,j,6 = T i,j,1,0 − T i,j,1,1 (12) As a result, the y-translations ∆y i,j are computed by solving the following constrained optimization problem: where ∆y m/2 , n/2 is the y-translation at the center grid, and the purpose of this constraining condition is to ensure a unique solution.
With the solution of ∆y i,j s, the grids G rot in the epipolar image-space are obtained.Finally, the homography transformation H i,j within the square sq i,j is computed with the four pairs of grid points (g i,j+1 ), and (g i+1,j+1 ).

Block-Wise Transformation of the Right Image
The block-wise homography transformation of the right image has two objectives: one is to eliminate the y-disparity between the corresponding points (p, p ), and the other is to balance the ground resolution in the x-direction and the y-direction.
Denote the grid on the right image in the rotated image-space as: where g is the corresponding point of the left-image-grid g computed in three steps: (1) reverse global rotation on the left image; (2) the backprojection and projection process in Equation ( 4) with the aid of SRTM; and (3) rotation on the right image.Apparently, G rot is an irregular grid and the homography transformation H i,j is applied within the polygon P i,j formed by the grid point g i,j+1 , and g (rot) i+1,j+1 .Denote the corresponding grids of the right image in the epipolar image-space as: The locations of g s are determined as follows: x The y-coordinate of g is inherited from the y-coordinate of g (of the left image) to eliminate the y-disparity.The x-coordinate of g is adjusted with a linear transformation to balance the ground resolution.The parameters of the linear transformation A, B, and C are obtained by solving the following optimization problem: Finally, the homography transformation H i,j is computed with the four pairs of grid points on the right image: (g i,j+1 ), and (g i+1,j+1 ).

Experiments
Seventeen scenes of satellite images with vendor-provided RFMs were used to demonstrate the proposed epipolar resampling approach.Two methods that can be easily implemented with RFM were used for comparison.For the convenience of listing the results, the proposed approach in this paper is called RBH (rotation and block-wise homography transformation); the approach based on the object-space project-reference-plane [21] is called PRP; and the approach using tie-points (TPs) to solve the 2D-polynomial transformations [24] is called TPP.

Data and Process
Table 1 lists the meta-information of the satellite images used in the experiments.There were three test areas.The first one was in Guangdong Province, China, covering the city of Guangzhou, the city of Foshan, and some other areas on the west side of the Pearl River Estuary.The land is mostly plains, but some high hills exist in the north and in the west.The elevation in this area ranges from 0 m to 500 m.The plains are mostly dense urban areas, while heavy vegetation also exists on the hills.Many rivers exist in this area.The second test area included the plains near the city of Nanking, Jiangsu Province, China.The land is mostly plains, while some small hills exist.The elevation in this area ranges from 0 m to 230 m.The area is most rural.Many rivers and lakes exist in this area.The third test area was near the city of Seoul, South Korea.Half of the land is plains, and the other half is mountainous.The elevation in this area ranges from 10 m to 400 m.The plains are dense urban areas, while the mountains are covered by heavy vegetation.
The tested images were acquired by eight different satellites and nine sensors.In Table 1, ZY-3-NAD means a ZY-3 panchromatic nadir image; ZY-3-FWD means a ZY-3 panchromatic forward-view image; and ZY-3-BWD means a ZY-3 panchromatic backward-view image.ZY-3 is the first civil stereo mapping satellite of China.There are about 24 degree viewing-angles fixed between the ZY-3-NAD and the ZY-3-FWD/BWD.The ground sampling distance (GSD) on the ZY-3-NAD images is about 2.1 m, and the GSDs on the ZY-3-FWD/BWD images are about 3.2 m.GF-1-PMS and GF-2-PMS are the panchromatic images of the GF-1 satellite and GF-2 satellite, which were also launched in China.The GSD is about 2.0 m on the GF-1-PMS images and about 0.8 m on the GF-2-PMS images.The product level of the tested images captured by ZY-3, GF-1, and GF-2 is Level-1A, which means only radiative correction is made to the block-wise images, with an RFM generated with the initial rigorous georeferencing model.
Images captured by the panchromatic sensors on the IKONOS-2 and the WorldView-2/3, launched by Digital Globe Inc. from Westminster, Colorado, USA, were also tested.The GSD is about 1.0 m on the IKONOS-2 images, about 0.5 m on the WorldView-2 images, and about 0.4 m on the WorldView-3 images.The product level of the IKONOS-2 images is standard geometrically corrected.The product level of the WorldView-2/3 images is Level-1B.Both levels mean that radiative corrections and geometric corrections were made, and the images were not ortho-rectified with a DEM.
Two images captured by the Pleiades-HR 1A and the Pleiades-HR 1B were used in the South Korea test area.The product level of the Pleiades images is Sensor, which means the images were not ortho-rectified.The GSD is about 0.5 m on the Pleiades images.The relative positions of the images in the three test areas are shown in Figure 5. Combining the attitude angles in Table 1, we found that the pitch angles of the ZY-3-NAD images and GF-1/2-PMS images are very close to zero.Thus, the stereo image-pairs formed by ZY-3-NAD images and GF-1/2-PMS images are under the cross-track coplanar stereo condition (Figure 1a), and the image-pairs formed by a GF-1/2 PMS image and a ZY-3-FWD or ZY-3-BWD image follow the cross-track non-coplanar stereo condition (Figure 1c).The stereo image-pairs collected by ZY3 three-line-array and WorldView-2 in Guangdong and collected by WorldView-3 in South Korea are under the along-track-stereo condition (Figure 1b).For the images collected by IKONOS-2 in Guangdong, both the roll angle and the pitch angle were changed for side observation and stereo observation, thus the pair of IKONO-2 images is under the cross-track non-coplanar condition.As shown in Figure 5, the directions of the orbits of IKONOS-2, WorldView-2/3, and Pleiades are just south, while those of the ZY-3 and GF-1/2 have about 20-degree angles to the south direction.As a result, the stereo image pairs formed by a ZY-3/GF-1/GF-2 image and an IKONOS-2/WorldView-2/image are under the CMDT stereo condition (Figure 1d).Tie-point matching [35], outlier filtering [36], and block adjustment [37] were processed before testing the epipolar resampling approaches, because the ZY-3/GF-1/GF-2 images are Level-1A images that were not geometrically corrected by the vendors.Meanwhile, although the images captured by IKONOS-2/WorldView-2/WorldView-3 were geometrically corrected by the vendors, their standard product accuracy (6 m for IKONOS-2 and 3 m for WorldView-2/3) cannot ensure the disparity perpendicular to the ELD is within one pixel of their GSDs.
The settings of the epipolar resampling process are as follows.When processing RBH (the proposed approach), the number of grid-points is at least 20 in the x-direction or y-direction; ∇ H, the elevation interval for computing the ELD through the process in Equation ( 5), was set as 50 m.When processing PRP [21], the elevation of the projection reference plane was set as the average elevation of the overlapped area; the epipolar images were resampled in the same ground resolution of the original left image.When processing TPP [24], two three-order polynomials were used as transformation models; each epipolar curve was expressed by 10 pieces of line-segments.
After preparing the transformation models, the accuracy was evaluated by the y-disparities of the virtual corresponding points (VCPs) and the tie-points (TPs).The VCPs were random left-imagepoints with their correspondences collected through the process in Equation ( 4); that is, ray-tracing Tie-point matching [35], outlier filtering [36], and block adjustment [37] were processed before testing the epipolar resampling approaches, because the ZY-3/GF-1/GF-2 images are Level-1A images that were not geometrically corrected by the vendors.Meanwhile, although the images captured by IKONOS-2/WorldView-2/WorldView-3 were geometrically corrected by the vendors, their standard product accuracy (6 m for IKONOS-2 and 3 m for WorldView-2/3) cannot ensure the disparity perpendicular to the ELD is within one pixel of their GSDs.
The settings of the epipolar resampling process are as follows.When processing RBH (the proposed approach), the number of grid-points is at least 20 in the x-direction or y-direction; ∇H, the elevation interval for computing the ELD through the process in Equation ( 5), was set as 50 m.When processing PRP [21], the elevation of the projection reference plane was set as the average elevation of the overlapped area; the epipolar images were resampled in the same ground resolution of the original left image.When processing TPP [24], two three-order polynomials were used as transformation models; each epipolar curve was expressed by 10 pieces of line-segments.
After preparing the transformation models, the accuracy was evaluated by the y-disparities of the virtual corresponding points (VCPs) and the tie-points (TPs).The VCPs were random left-image-points with their correspondences collected through the process in Equation (4); that is, ray-tracing with SRTM followed by projection.The TPs used for evaluation were selected with the following steps.First, map the left-image-point p from to the right image I through the process in Equation ( 4) and denote it as p".Second, obtain the ELD through the process in Equation ( 5) at the location of p , which is the matched point of p on the right image I .Third, compute the point-to-line distance between the point p" and the straight line that crosses the point p and follows the direction of the ELD [36].The TPs with less-than-0.5 pixel point-to-line distances were used for evaluation.It should be noted that the images with larger GSDs were set as the left images in the epipolar resampling, to avoid up-sampling the images.The y-disparities were computed with the unit of the left-image-pixel.

Experimental Results
The experimental results are listed in different groups in Tables 2-5.Each group corresponds to a stereo condition illustrated in Figure 1.The optimal values (closest to zero) are in bold font.
The results of two cross-track coplanar stereo image-pairs are listed in Table 2.All four images in the Jiangsu test area have close-to-zero pitch angles.With both the image pairs, the accuracy of the proposed RBH method and that of the PRP method evaluated by either the TPs or VCPs were very close; the y-disparities of the TPs ranged from about −0.5 pixels to 0.5 pixels, and the y-disparities of the VCPs ranged from about −0.1 pixels to 0.1 pixels.The accuracy of the TPP method was lower; the y-disparities of the TPs ranged from about −0.8 pixels to 0.9 pixels, and the y-disparities of the VCPs ranged from about −0.3 pixels to 0.5 pixels.The results of four along-track stereo image pairs are listed in Table 3.The proposed RBH method still had the smallest y-disparity range for both TPs and VCPs.The y-disparities ranges of the PRP method and the TPP method, although a little wider than those of the RBH method, remained within the range of ±1 pixel, except on the WorldView-3 image pairs in South Korea.The results of four cross-track non-coplanar image pairs are listed in Table 4.In this group, the ranges of the y-disparities of VCPs with the RBH methods were obviously narrower than those with the PRP method or with the TPP method.For the pairs of GD-IK-1/GD-W2-1 and SK-P1A/SK-W3-2 with the TPP method, the ranges of the y-disparities of both TPs and VCPs exceeded the range of ±1 pixel, which may have had a bad influence on one-dimensional matched point searching.The results of three CMDT stereo image pairs are listed in Table 5.In this group, the accuracy of the proposed RBH method is much better than that of either the PRP method or the TPP method, whether evaluated with the y-disparities of the TPs or evaluated with the y-disparities the VCPs.In all three image pairs, only the RBH method kept the y-disparities of TPs within the range of −1 to 1 pixels.The y-disparity ranges with both the PRP method and the TPP method exceeded the range of ±1 pixel.In the pair of GD-Z3N-2/GD-IK-2, the y-disparity of the TPs exceeded 3 pixels with the PRP method and exceeded 5 pixels with the TPP method, which would result in failures in dense image matching.

Analysis and Discussion
In Section 3.1, we point out that different elevations will not influence the computation of the ELD in the cross-track coplanar stereo condition and along-track stereo condition, but will influence it in the cross-track non-coplanar stereo condition and CMDT stereo condition.The experimental results presented in Section 5.2 demonstrate the above conclusion.With the cross-track coplanar stereo image pairs and the along-track stereo image pairs, the proposed RBH method had almost the same accuracy as the PRP method and the TPP method, although they computed the ELDs with different strategies.However, with the cross-track non-coplanar stereo image pairs and the CMDT stereo image pairs, the proposed RBH method had obviously better accuracy than the PRP method and the TPP method, which demonstrates that medium-accuracy elevations can greatly improve the computing accuracy of the ELDs.Moreover, in some image pairs (e.g., GD-Z3N-2/GD-IK-2), only the proposed RBH method could obtain accurate enough results, since the PRP method and TPP method exceeded the y-disparity-range of ±1-pixel with some TPs.
For a better understanding, Figure 6 illustrates the θ In almost all tested image pairs, the PRP method achieved better accuracies than the TPP method even when evaluated by the TPs, demonstrating that a corresponding-point-free method is a better choice since both the ELD and the corresponding epipolar curve can be accurately determined with pre-corrected RFMs.In almost all tested image pairs, the PRP method achieved better accuracies than the TPP method even when evaluated by the TPs, demonstrating that a corresponding-point-free method is a better choice since both the ELD and the corresponding epipolar curve can be accurately determined with pre-corrected RFMs.

Conclusions
This paper classifies the geometric conditions of satellite-image-based binocular stereo observation into four classes: cross-track coplanar stereo, along-track stereo, cross-track non-coplanar stereo, and CMDT stereo.Under the cross-track coplanar stereo condition and along-track stereo condition, the ELDs are not influenced by the choice of elevation in computing.However, under the cross-track non-coplanar stereo condition and the CMDT stereo condition, the ELDs are influenced by elevation.To deal with this problem, this paper proposes a novel SRTM (or other public medium-accuracy elevation data source)-aided method of epipolar resampling for satellite stereo images.The proposed method transforms a point from the original image-space to the epipolar image-space through two transformations: a global rotation, followed by a block-wise homography transformation.The SRTM supplies medium-accuracy elevations to compute an accurate ELD when preparing the block-wise homography transformation.The experiments demonstrated that, compared with two other methods that compute the ELDs with a unified elevation plane, the proposed method is much more accurate.On some image pairs, only the proposed method could ensure the y-disparities of the TPs remained within the range of ±1 pixels.Since most multi-source stereo image pairs are cross-track non-coplanar stereo and CMDT stereo, the contributions of this paper are very important for research regarding the problem of multi-source satellite stereo observation.
There is still work to do to improve the proposed method in the future.First, the block-wise transformation from the epipolar image-space to the rotated image-space on both the left image and the right image uses irregular grids, which is time-consuming since the transformation is very frequently required in the resampling process.Thus, optimizations in efficiency are required.Second, the proportion between the x-disparity and the height difference is uneven since the proposed method works on the image-space, which makes the post-processing of the dense image matching inefficient because we need to transform the matching points from the epipolar image space to the original image-space, and then obtain accurate elevations through forward-intersecting. Similar approaches may be developed in the object-space to deal with this problem.
In this study, SRTM was used to constrain the ELD of an image location because the corresponding object-space point was close to the elevation plane of SRTM.In the research on multi-source satellite stereo, SRTM constrained dense image-matching is also a potential technical route that may lead to important breakthroughs.

Figure 1 .
Figure 1.Four typical types of geometric relationships of pushbroom stereo images: (a) cross-track coplanar stereo; (b) along-track stereo; (c) cross-track non-coplanar stereo; and (d) cross-multidirectional-track stereo.The condition in Figure 1a is called cross-track coplanar stereo, where Track-1 and Track-2 are approximately parallel, and the scanning planes of the two tracks are approximately parallel.As a result, different points on the light of p, denoted as ( ) l p  , correspond to the same perspective center.

Figure 2 .
Figure 2. The process of obtaining the corresponding point and the ELD with the aid of SRTM.

Figure 2 .
Figure 2. The process of obtaining the corresponding point and the ELD with the aid of SRTM.

Figure 3 .
Figure 3.The process of transforming points between the original image-space and the epipolar image-space in the proposed approach.The rotation matrix can be easily constructed according to the ELD computed with the process described in Section 3.1.Denote ( ) org c θ and ( ) org c θ ′ as the angle between the x-axis and the ELD at the center of the overlapping area on the left image and the right image, then the global rotation matrixes of the left image I and the right image I' are:

Figure 3 .
Figure 3.The process of transforming points between the original image-space and the epipolar image-space in the proposed approach.The rotation matrix can be easily constructed according to the ELD computed with the process described in Section 3.1.Denote θ (org) c and θ (org) c as the angle between the x-axis and the ELD at the center of the overlapping area on the left image and the right image, then the global rotation matrixes of the left image I and the right image I are:

Figure 4 .
Figure 4.The process of transforming the grid from the rotated image-space to the epipolar imagespace in the proposed approach.

Figure 4 .
Figure 4.The process of transforming the grid from the rotated image-space to the epipolar image-space in the proposed approach.
Remote Sens. 2019, 10, x FOR PEER REVIEW 12 of 18 of IKONO-2 images is under the cross-track non-coplanar condition.As shown in Figure 5, the directions of the orbits of IKONOS-2, WorldView-2/3, and Pleiades are just south, while those of the ZY-3 and GF-1/2 have about 20-degree angles to the south direction.As a result, the stereo image pairs formed by a ZY-3/GF-1/GF-2 image and an IKONOS-2/WorldView-2/image are under the CMDT stereo condition (Figure 1d).

Figure 5 .
Figure 5.The relative positions of the tested satellite images: (a) the Guangdong test area; (b) the Jiangsu test area; and (c) the South Korea test area.

Figure 5 .
Figure 5.The relative positions of the tested satellite images: (a) the Guangdong test area; (b) the Jiangsu test area; and (c) the South Korea test area.

Table 1 .
The meta-information of the satellite images.

Table 2 .
The results of the image pairs in the cross-track coplanar stereo condition.

Table 3 .
The results of the image pairs in along-track stereo condition.

Table 4 .
The results of the image pairs in the cross-track non-coplanar stereo condition.

Table 5 .
The results of the image pairs in the cross-multi-directional-track stereo condition.