A General Framework of Remote Sensing Epipolar Image Generation

Epipolar images can improve the efficiency and accuracy of dense matching by restricting the search range of correspondences from 2-D to 1-D, which play an important role in 3-D reconstruction. As most of the satellite images in archives are incidental collections, which do not have rigorous stereo properties, in this paper, we propose a general framework to generate epipolar images for both in-track and cross-track stereo images. We first investigate the theoretical epipolar constraints of single-sensor and multi-sensor images and then introduce the proposed framework in detail. Considering large elevation changes in mountain areas, the publicly available digital elevation model (DEM) is applied to reduce the initial offsets of two stereo images. The left image is projected into the image coordinate system of the right image using the rational polynomial coefficients (RPCs). By dividing the raw images into several blocks, the epipolar images of each block are parallel generated through a robust feature matching method and fundamental matrix estimation, in which way, the horizontal disparity can be drastically reduced while maintaining negligible vertical disparity for epipolar blocks. Then, stereo matching using the epipolar blocks can be easily implemented and the forward intersection method is used to generate the digital surface model (DSM). Experimental results on several in-track and cross-track images, including optical-optical, SAR-SAR, and SAR-optical pairs, demonstrate the effectiveness of the proposed framework, which not only has obvious advantages in mountain areas with large elevation changes but also can generate high-quality epipolar images for flat areas. The generated epipolar images of a ZiYuan-3 pair in Songshan are further utilized to produce a high-precision DSM.


Introduction
With the rapid development of high-resolution remote sensing sensors, digital stereo image data with large coverage, high precision, and timeliness can be easily obtained. Each pair of overlapping images forms a stereoscopic observation of the ground target, allowing the estimation of its 3D position. An important application of 3D information processing is the extraction of the digital surface model (DSM). DSM can truly reflect the situation on the ground. In addition to terrain elevation, DSM also contains the elevation information of ground objects such as buildings, bridges, and trees, and is widely used in various fields. Dense matching (stereo matching) is a key technology in DSM generation, and it has an important constraint, the epipolar geometry constraint. Using epipolar lines to retrieve conjugate points in image pairs can improve the efficiency and accuracy of stereo matching and it is a development trend to generate high-quality epipolar images from original satellite stereo image pairs [1]. Currently, optical satellite images are mostly obtained by linear array pushbroom imaging technology. However, unlike frame satellite imagery, linear pushbroom satellite images do not have a strict epipolar relationship and can only generate approximate epipolar images [2]. Therefore, how to generate epipolar images based on linear pushbroom satellite images has become an ongoing research topic.
As early as the 1980s, Zhang and Zhou proposed a SPOT image approximate epipolar arrangement method, namely, the polynomial fitting method [3]. This method only needs to use the coordinates of the conjugate points, without knowing other image information. Kim analyzed the epipolar curves generation method of the sensor model based on the collinear equations according to the projection trajectory method [4]. Morgan et al. proposed an approximate epipolar curves generation method based on a parallel projection model [2]. This method has certain requirements for the sensor field of view, terrain undulations, and conjugate point matching accuracy and lacks versatility [5]. Wang et al. proposed a new satellite stereo image epipolar resampling method, which directly establishes the correspondence between original image pixels and epipolar image pixels through a geometric sensor model [6].
A significant fundamental content of epipolar correction for remote sensing images is to establish the imaging equation between the image coordinates and the spatial coordinates. The general imaging equation is a rigorous model established from the imaging geometric relationship of the sensor, which needs to be calculated by the sensor parameters. However, due to confidentiality and other reasons, the parameters of many sensors cannot be disclosed, so the rational function model (RFM) has emerged. Within a certain elevation range, RFM can replace the rigorous model [7]. Therefore, we can use the rational polynomial coefficients (RPCs) in the RFM to establish the coordinate conversion relationship without knowing the specific parameters of the sensor. With the widespread application of RPCs, some RPC-based approximate epipolar image generation methods have been proposed. Oh et al. proposed a pushbroom satellite image epipolar resampling method based on RPCs [8], but for areas with large terrain changes, this method needs further research. Koh et al. proposed a unified piecewise epipolar resampling method that can generate stereo image pairs with near-zero y-parallax [9].
Since the 20th century, with the development of radar imaging theory, the concept of Synthetic Aperture Radar (SAR) was proposed. Due to the particularity of the SAR imaging mechanism, optical images and SAR images show different appearances and abilities [10]. Specifically, SAR images obtained by the active sensors reflect the electromagnetic characteristics of the surface targets and provide the ability to view in all-day, all-weather conditions and through clouds. At the same time, the optical images obtained by the passive sensors reflect the radiometric properties of the targets and can provide complementary information. Therefore, methods for generating epipolar images based on optical-SAR image pairs have begun to be proposed. First, Bagheri et al. mathematically proved that there is an epipolar geometry constraint relationship between SAR-optical image pairs, studied the possibility of 3-D reconstruction from very-high-resolution (VHR) SAR-optical image pairs, and indicated that SAR-optical image pairs in urban areas can be used to design a 3-D reconstruction framework [11]. Moreover, VHR SAR-optical image pairs have great potential for analyzing complex urban areas. If the accuracy of key points detection can be improved and the similarity between SAR images and optical images can be measured, then the accuracy of 3-D reconstruction can also be improved [12]. Therefore, how to generate high-quality epipolar images based on SAR-optical image pairs needs to be further studied.
In addition to optical image pairs and SAR-optical image pairs, the concept of epipolar geometry is also applicable to the cocircular geometry of SAR image pairs [13]. Pan et al. proposed an endpoints growing method based on RPCs to generate epipolar images and applied this method to IKONOS image pairs, TerraSAR-X image pairs, and mixture stereo pairs [14]. However, the feasibility of this method for mixture stereo pairs is subject to RPCs fitting. Karlheinz Gutjahr et al. proposed a method to create the epipolar geometry of a SAR sensor with an arbitrary three-dimensional structure through proper geometric image transformation and applied it to the TerraSAR-X stereo data set [13].
Although the above-mentioned existing methods have been used to generate approximate epipolar images of satellite stereo images, technical limitations still exist when considering the versatility, applicability, and simplicity of these models. Moreover, due to different viewing conditions and sun angles, in areas with large terrain undulations, nonlinear distortions will appear in the images, and it is difficult to perform image registration [15]. Moreover, in areas such as mountains and valleys with harsh terrain conditions, the commonly used methods are inefficient and poor in safety [16]. The existing methods are difficult to be used on multiple types of terrain and multi-sensor images to achieve ideal results at the same time.
Based on the above analysis, we propose an approximate epipolar image generation framework that can be used for images in areas with large terrain undulations and multi-sensor images, such as SAR-SAR images and SAR-optical images. By generating approximate epipolar images, the two-dimensional stereo matching problem is simplified to one-dimensional stereo matching, thereby improving the accuracy and efficiency of stereo matching [17]. Among them, the use of the digital elevation model (DEM) can reduce the initial offset of the two stereo images, thereby reducing the impact of terrain elevation changes on the generation of epipolar images. We first process the Level-1 images, which is the radiometric correction products, and use RPCs to project the left image into the image coordinate system of the right image to obtain a new image, which we name as Level-1P images, and then divide the entire image into blocks for parallel processing. Next, we generate approximate epipolar images and DSM for each block. Finally, all DSMs are spliced together to realize the generation of the whole DSM. The stereo satellite data used in this paper can be roughly divided into three categories. They are three-line array satellites, agile stereo satellites, and the general multiple view satellites. Three-line array satellites such as ZY-3 are equipped with three line-array cameras to image the same target from different angles along the direction of the satellite's flight, to obtain the in-track stereo images [18]. Agile satellites (e.g., GeoEye) can maneuver in three directions: roll, pitch, and yaw [19]. By adjusting these three directions in a short period, agile satellites can realize nearly simultaneous multi-angle observations of the same target. The general multiple views satellites collect stereo images by taking images of the ground from different views, such as GF-2 and GF-3, and this kind of stereo image has no rigorous stereo property.
The rest of this paper is organized as follows. Section 2 first verifies the epipolar constraint relationship of single-sensor and multi-sensor images and then describes the proposed approximate epipolar image generation framework. In Section 3, the experimental results of epipolar image generation are shown, and their accuracy is qualitatively evaluated and quantitatively analyzed. In Section 4, DSM generation based on our proposed framework is discussed. Conclusions are drawn in Section 5.

Methods
In this section, we first describe the epipolar constraint relationship and investigate the theoretical epipolar constraints of single-sensor and multi-sensor images. It is proved that the epipolar curve of the optical image pair is hyperbolic, the epipolar constraint relationship of the SAR image pair is a quartic polynomial. Moreover, the epipolar constraint relationship of the optical-SAR image pair and SAR-optical are quadratic polynomial and high order polynomial, respectively. Then, we take GF-2 and GF-3 images as examples to verify the characteristics of the epipolar constraint relationship between the pushbroom optical image pairs and the SAR image pairs. It is proved that the epipolar curves of these images can be approximated as straight lines within the scope of the image scene, and they satisfy the conjugacy and approximate parallel properties. Finally, we propose a general epipolar image generation framework and introduce it in detail based on the epipolar constraint relationship.

Theory of the Multi-Sensor Epipolar Constraint
In conventional 3D reconstruction frameworks, the epipolar geometry is the first clarity constraint that facilitates the procedure of image matching by reducing the search space from 2D to 1D [2]. Usually, the images to be processed are created by resampling the original images according to epipolar geometry before dense matching, and the epipolarity constraint always exists for optical stereo images captured by frame type cameras that follow a perspective projection [17]. This principle is illustrated in Figure 1. The effective generation of epipolar-rectified images requires high positioning accuracy of the satellite sensor models, and thus a few ground control points (GCPs) are chosen to improve the positioning accuracy of the orientation parameters.
In the left view of the stereo pair, there is point p l , which is the projection of the ground point P in the left image. If the point on the ray of point p l is projected into the right view through the image geometric model, the trajectory of these projection points forms the epipolar curve.

Epipolar Constraint of Optical Image Pairs
Supposing that the target P corresponds to two points p l and p r on the stereo pairs, and their image coordinates are denoted as (x l , y l ) and (x r , y r ). If the point p l is on the i-th scan line, and its coordinates on the focal plane can be denoted as (0, p l ), there exists a relationship between the two coordinates [6]: where p 0 is the translation coefficient and e stands for the detector size. We can draw the following equation from the collinear relationship of the imaging geometry: where X p , Y p , Z p is the spatial coordinates of P, X S i , Y S i , Z S i is the center of i-th scan line, λ denotes the scale coefficient and R i denotes the rotation matrix, and f represents the focal length. We can also draw a similar relationship of point p r : Combining the two collinear equations of points p l and p r , we can obtain where R j is the inverse matrix of R j , and it is denoted as R j = r j m×n 3×3 , and R ij = R j R i = (r m×n ) 3×3 . Assuming that the satellite is flying with high stability, the attitude remains unchanged during imaging, and thus the coordinates of the projection center can be regarded as a linear function that changes with the exposure time (scanning number), as follows: where u 1 ...u 12 are the known imaging parameters. All variables except (x r , y r ) can be regarded as known variables, if we use the left image as the reference images. Substituting the above function into Equation (4), we can derive the epipolar curve on the right image, which refers to the relationship between x r and y r can be derived as follows: where U k , k = 1, 2, ..., 7 are the combinations of known variables. We can see from Equation (6) that the epipolar curve of two linear sensors is a hyperbolic curve.

Epipolar Constraint of SAR Image Pairs
The imaging geometry of SAR sensors is a side-looking model based on the Range-Doppler (RD) geometry which is different from the linear array model of optical sensors. Similar to the optical stereo pairs, the spatial coordinates and image coordinates of the target P also fulfill two imaging models [20], where λ is the radar wavelength, f dc denotes the Doppler shift parameter and is generally set to 0, R is the slant-range of the first range pixel, and M denotes the range sampling rate.
where k denotes the reciprocal of the pulse repetition frequency. We also assume all variables except (x r , y r ) are known parameters when taking the left image as the reference images. Combining two Doppler equations in Equation (7), we can obtain The above equation can be simplified as And from the Doppler equation of the left image, we can derive that Substituting Z p and X p into the two Range equations in Equation (7), we can deduce Removing Y p from the above equations, we can derive the final relationship between (x r , y r ) as follows, where V 1 ...V 13 are known variables, and it is a quartic polynomial.

Epipolar Constraint of Optical-SAR Image Pairs
As for the target P imaged by the pushbroom optical and the R-D SAR models, its imaging equations of the two stereo pairs can be described as follows [6,20]: From the linear array equation, we can obtain that Substituting X p and Y p into the Doppler equation of the right image, we can obtain Z p as We also assume that all variables except (x r , y r ) are known parameters when taking the left image as the reference image. Substituting Similarly, X p and Y p are also linear functions of x r , denoted as Substituting X p , Y p , Z p into the Range equation, we can deduce and it is clear that the epipolar curve is a quadratic curve.

Epipolar Constraint of SAR-Optical Image Pairs
The epipolar geometries between optical-SAR pairs and SAR-optical pairs are also different since the optical and SAR imaging models are different. As for the target P imaged by the R-D SAR model and the pushbroom optical model, its imaging equations of the two stereo pairs can be described as follows [6,20]: Following the derivation procedure of Section 2.1.3, and by assuming that all variables except (x r , y r ) are known parameters, we can obtain the relationship between x r , y r as a high-order polynomial:

Verification of the Properties of Epipolar Constraint
As the rational function model can be regarded as an accurate alternative for rigorous imaging models for both optical and SAR images, we hereafter use RPCs for the sake of simplicity. For stereo images, the point p l in the left image can be projected to the object space by backward mapping using RPCs. By changing the elevation H, we can get the projected light in the object space. Then, using the RPCs of the right image to project the light into the right image, we can obtain the projection trajectory curve l. l is the epipolar curve corresponding to the point p l . Moreover, the conjugate point of p l in the right image will be on the curve l. This method of generating epipolar curves is known as the projection trajectory method (PTM) [21]. Then, we use PTM to verify the epipolar curve is a hyperbolic curve for optical acquisitions with pushbroom model, a quartic curve for SAR acquisitions with Range-Doppler model, and the epipolar curves of the optical-SAR image pair and the SAR-optical image pair satisfy the quadratic polynomial and the high order polynomial, respectively. In addition, we also further verify other properties for multi-sensor models. We first choose a point in the left image, set its height range to [−500 m:9000 m] which is set based on the altitude of the Earth's surface, then we get a series of points in the object space and map them to the right image. Finally, we use different functions to fit the epipolar curve. Figure 2 shows the epipolar curves generated from the optical satellite (GF-2) image pairs and the SAR satellite (GF-3) image pairs using the PTM. From this figure, we can find that the epipolar curves of the GF-2 image pairs and GF-3 image pairs are similar to straight lines. This means that within the scope of the image scene, the epipolar curve can be approximated as a straight line. To further verify the shape of the epipolar curve, we fit the coordinates of the points in the projection trajectory. The fitting results are shown in Figures 3-6.
The fitting results in Figure 3a show that when a linear function is used to fit the epipolar curve of the GF-2 image pair, the fitting residual can reach 3 pixels. However, when the fitting function is changed to a hyperbolic curve, the fitting residual is reduced by~80%. Therefore, the epipolar curve of the optical image pair can be approximated as a hyperbolic curve, which is in accordance with the theoretical conclusion. As shown in Figure 4, for the GF-3 image pair, the quartic curve fitting residual of the epipolar curve is only approximately 1/50 of the linear fitting residual. This means that the epipolar curve of the SAR image pair can be approximated by a quartic curve. For the optical-SAR image pair, when using linear fitting to fit the projection points, the fitting residual is between −25 and 8, which is shown in Figure 5a. However, the residual is within 1 pixel, if a quadratic polynomial is used for fitting. The results in Figure 6 show that by comparing the fitting residuals, the epipolar curve of the SAR-optical image pair can be approximated as a high order polynomial curve. These prove that the epipolar constraint relationship of the image pairs we derived is effective. In addition, we also verify the conjugacy of the epipolar curves. Conjugacy means that the corresponding points are located on the corresponding epipolar curves, and the points on the corresponding epipolar curves correspond one to one. We select six points (Point 1-6) from the same epipolar curve in Figure 2b and calculate the corresponding epipolar curve of each point in Figure 2a. As the epipolar curve can be approximated as a straight line in a local range, for the convenience of calculation, we perform linear fitting on the epipolar curves of the six points in the left image. The result is shown in Figure 7a. From the linear fitting results, it can be found that the epipolar curves fitted by these 6 points are not the same line, which indicates that there is no strict epipolar curve pair in the stereo images obtained by the linear pushbroom camera. However, the partial magnified image demonstrates that the difference between the fitting results of the epipolar curves at different points is very small, within 0.1 pixels. Therefore, within the scope of the image scene, these points' epipolar curves can be approximately regarded as the same epipolar curve, that is, the epipolar curves satisfy the conjugacy in the image range. Next, we verify the approximate parallel property of the epipolar curves. In Figure 2a

The Proposed General Epipolar Image Generation Framework
Based on the epipolar constraint relationship deduced in Section 2.1, we then introduce the proposed framework. Our framework can be divided into the following steps: (1) image pairs relative orientation, (2) image spatial conversion and overlapping region extraction based on DEM, (3) cropping the image pairs into small patches, (4) matching stereo image pairs based on spatial structure features and three-dimensional phase correlation algorithm, (5) estimating the fundamental matrix F based on the matching points, (6) image pairs correction and generate epipolar images, (7) calculating the coordinates in the original image and generate DSM using space intersection method, and (8) mosaic DSM. The flowchart of the proposed framework is presented in Figure 8. Next, we will introduce our framework in detail. We first preprocess the input Level-1 images, that is, we perform the relative orientation between image pairs. The relative orientation takes the principle that the corresponding rays are coplanar as the theoretical basis to restore or determine the relative positional relationship of the image pairs during imaging. The principle is to construct the conditional expression of the relative orientation elements based on the intersecting characteristic of the light pair corresponding to the stereo pair and solve the relative orientation elements by formulating equations [22]. We denote the preprocessed left and right images as Image 1 and Image 2 , respectively. Then, we generate epipolar images and DSM from Image 1 and Image 2 .

Image Spatial Conversion and Overlapping Region Extraction Based on DEM
The rational function model is a generalized sensor model that can replace the linear array pushbroom camera model and the SAR physical camera model [14]. This model makes full use of the auxiliary parameters of satellite imagery and solves the model coefficients by fitting with the existing strict geometric model. Because of the high accuracy of the rational polynomial coefficients, it is widely used in line array optical satellites. Therefore, our framework is based on the RPCs to perform image space conversion and overlapping region extraction on stereo images. The RFM expresses the image space coordinates (r, c) as a polynomial ratio of the object space coordinates (X, Y, Z), which is shown in Equation (21).
where P 1 , P 2 , P 3 , and P 4 are polynomials, and r n , c n , X n , Y n , and Z n are normalized coordinates. According to Equation (21), the corresponding image coordinates can be calculated according to the latitude, longitude, and elevation of a certain point in the object space. This process is called the RPCs positive solution. In addition, the RPCs inverse solution calculation formulas are as follows: where P 5 , P 6 , P 7 , and P 8 are polynomials. For the RPCs positive solution, by substituting the latitude and longitude coordinates and elevation into Equation (21), the row and column coordinates in the image space can be obtained. However, when (r, c) is known and uses the RPCs inverse solution formulas to calculate the spatial coordinates, it cannot be solved directly due to a lack of elevation information. Therefore, it is necessary to combine the DEM to obtain the corresponding spatial coordinates through multiple iterations, that is the Ray Tracing method combined with DEM from the image space to the object space [23]. Therefore, we first use RPCs to perform spatial conversion and overlapping region extraction on stereo satellite images based on DEM to obtain processed image pairs. For ease of description, we denote the RPCs corresponding to Image 1 and Image 2 as RPC 1 and RPC 2 . The schematic diagram of the image space conversion processing flow is shown in Figure 9. The specific steps are as follows: 1.
For each pixel in Image 1 , substitute the row and column coordinates into Equation (22) according to RPC 1 . The corresponding object space longitude, latitude, and elevation are iteratively calculated using the ray tracing method.

2.
Substituting the spatial coordinates obtained in the previous step into Equation (21) constructed by RPC 2 , the coordinates in the image space of Image 2 corresponding to the object points can be obtained.

3.
According to the corresponding relationship between the image space of Image 1 and the image space of Image 2 , the new left image, which is called Level-1P image and denoted as Image 1P , is obtained. Image 1P only contains the overlapping area of the image pair. Then, Image 1P and Image 2 are used as a new image pair to generate epipolar images and DSM.

Image Blocks Dividing in Image 1P and Image 2
We divide the image pairs into several square image block pairs to improve the accuracy and speed of epipolar images and DSM generation. We crop an image block every 512 pixels when dividing image blocks, so that adjacent image blocks have a 50% overlap area, to ensure that the final generated DSM can cover all areas of the image. The image blocks covering the same area in Image 1P and Image 2 are respectively denoted as img 1 and img 2 . By dividing the image pairs into several block pairs, it is helpful to extract more reliable conjugate features and increase the efficiency. The block size is set to 1024 × 1024 pixels, which can maintain the affine transformation characteristics [24].

Images Matching Based on Spatial Structure Features and 3D Phase Correlation
Considering the radiometric and geometric distortions caused by view conditions and sun angles, herein, we use a robust feature matching method to obtain high-precision correspondences. Based on our previous work [25], we extract intensity-invariant and sensor-invariant structure features and put them into the 3D phase correlation framework. Traditional phase correlation methods are based on two core ideas. The first is the assumption of brightness constancy, that is, the intensity of pixels in an image with a certain offset remains unchanged. The second is the famous Fourier transform characteristic, which means that the Fourier phases of two functions that have an offset in the spatial domain have a linear difference. These two core ideas are given as follows: where I 1 and I 2 are two images, F 1 and F 2 are the corresponding Fourier Transforms, and x 0 and y 0 are the offsets of the coordinates. The normalized cross power spectrum can be obtained: The inverse Fourier transform of the cross power spectrum is the impulse function in the spatial domain. By locating the extreme points in the spatial domain, the offsets in the two directions can be found: However, for multi-view and multi-sensor remote sensing images, there are often radiation and geometric differences between them, which make the first core idea of the phase correlation method no longer applicable. To solve this problem, we first map the input images to a feature space, which retains the invariance by capturing the inherent structural information. Two three-dimensional feature sets are formed and embedded in the three-dimensional phase correlation process to estimate the two-dimensional translational displacement by using dense spatial structure feature representation instead of image intensity.
For spatial structure features, we use dense scale invariant transform features [26]. First, we calculate the horizontal and vertical gradient response of each pixel. Then, we obtain the histogram structure of the gradient position and direction based on the gradient magnitude and direction to construct the spatial feature. Finally, we embed two 3-D feature images into the 3-D phase correlation process, as shown below: where V 1 and V 2 are two 3-D feature images. x and y are the coordinates of the pixels in the image. g represents the gradient magnitude of the pixel. The Fourier translation characteristics of the 3-D phase correlation are The normalized cross power spectrum of the above equation is where F 1 and F 2 are the three-dimensional Fourier transforms corresponding to V 1 and V 2 . d is a three-dimensional unit vector, and its direction is the direction of the gradient of the pixel. The three-dimensional inverse Fourier Transform of Equation (28) is an impulse function throughout three-dimensional space: The biggest difference between Q(x, y, g) and Q(x, y) is that we need to locate the impulse response in three-dimensional space instead of two-dimensional space. A simple and direct method is to sum the third dimension information in Q(x, y, g) and convert it into Q(x, y) to solve. Finally, two-dimensional translational displacement can be estimated by locating the maximum response, and then correspondences are obtained.

Fundamental Matrix Estimation Based on Conjugation Points
In Section 2.1, we introduced the epipolar constraint relationship between image pairs. The epipolar constraint can also be expressed in mathematical form, that is, the fundamental matrix. The fundamental matrix F represents the mapping from p l in the left image to the epipolar curve l r in the right image. This relationship is expressed as l r = Fp l [27].
Based on the epipolar constraint, the corresponding right image point p r must be on the epipolar curve l r , and p T r l r = p T r Fp l = 0.
When the coordinates of the matching points in the image pairs are obtained from the aforementioned feature matching method, combined with Equation (30), F between the image pairs can be calculated. As F is a 3 × 3 matrix with 9 unknowns in total, at least eight pairs of matching points are required to solve F. Therefore, we use the Eight-Point-Algorithm to calculate F [28].

Image Pairs Correction Based on Fundamental Matrix
Combining the matching points of the left and right images img 1 and img 2 , and the fundamental matrix F, the image pairs can be corrected to generate epipolar images. In this paper, we use the Hartley method [29] to correct the image pairs. This method can be implemented in two steps.
First of all, we need to calculate the projection transformation matrices H l and H r based on the fundamental matrix [30] and this method is the same as the epipolar rectification technique of the Ames Stereo pipeline [31]. The projection transformation matrix H can be obtained by multiplying four matrices, that is, H = CGRT. The four matrices C, G, R, and T, respectively, represent different transformation relations. We assume that the homogeneous coordinates of the pole of the right image is (u e , v e , 1) T , and the homogeneous coordinates of the center point of the right image is (u c , v c , 1) T . Then, we can obtain the projection transformation matrix H r corresponding to the right image as follows: where u 1 e , v 1 e , 1 T = T(u e , v e , 1) T and u 2 e , v 2 e , 1 Assuming that among the N pairs of matching points obtained in the Section 2.
Second, we resample the pair of image blocks img 1 and img 2 after calculating the projection transformation matrices H l and H r . We assume that these two matrices H l and H r can map vectors to the same plane J. Moreover, in the plane J, there is a window W with a size of 1024 × 1024 pixels. Then, img 1 and img 2 can be resampled through H 1 and H r , so that the resampled two images are within the range of window W. For any point p in W, its corresponding point in img 1 can be expressed as p l = H −1 l p . Similarly, the corresponding point of p in img 2 is p r = H −1 r p . Then, img 1 and img 2 can be resampled to the window W in plane J which are denoted as img 1 and img 2 , through bicubic interpolation.
Finally, we crop the middle part from img 1 and img 2 as the final epipolar images, denoted as ep_img 1 and ep_img 2 .

DSM Generation
We use Semi-Global Matching (SGM) [32] which was proposed by Hirschmüller to obtain pixelwise matching points from ep_img 1 and ep_img 2 . The realization process of the stereo matching algorithm can be summarized into the following four steps: matching cost calculation, matching cost aggregation, disparity calculation, and disparity optimization [33].
We set the left epipolar image ep_img 1 as the reference image, after obtaining the disparity map of the image pair based on the stereo matching method. Through the back-projection transformation, we calculate the coordinates of each pixel in ep_img 1 and ep_img 2 in Image 1P and Image 2 , respectively; their coordinates are denoted as u 1p , v 1p and (u 2 , v 2 ). It is necessary to convert the coordinates in Image 1P to the coordinates in the original image Image 1 in order to further generate DSM. We substitute RPC 2 into Equation (22) and use the ray tracing method based on DEM to calculate the spatial coordinates (X, Y, Z) corresponding to u 1p , v 1p . Then, by substituting (X, Y, Z) into Equation (21) formed by RPC 1 , we obtain the corresponding coordinates of u 1p , v 1p in Image 1 . Finally, we can produce DSM based on the space forward intersection method.

Results
In this section, we first evaluate the performance of the proposed framework by comparing it to the current state-of-the-art algorithms. The first method is an epipolar image generation algorithm based on the PTM [21]. The second method is the S2P method [24]. Various scenarios of different sensors, including urban, agriculture, and mountain areas, are used. Then, we explore the applicability of our framework on SAR-optical and SAR-SAR image pairs. In the comparative experiment, the disparities of the matching points are used as the quantitative metric to evaluate the quality of the generated epipolar images. The satellite images used in our experiment include ZY-3, GF-2, GF-3, and GeoEye images. Among them, ZY-3, GF-2, and GeoEye are optical satellites, and GF-3 is a SAR satellite. The coverage area of these image data and their information are shown in Table 1.

Experimental Results of Epipolar Images Generation
To verify the performance of the proposed approximate epipolar image generation framework, we first select the ZY-3 satellite images of the Songshan area in China, which is mostly mountainous. The elevation range of Songshan is approximately 67-1472 m. In order to further verify the performance of our framework, the results obtained by our framework are compared to two other methods.
The epipolar images generated by the proposed framework is shown in Figure 10c. From the results, it can be found that the vertical disparity of the generated epipolar image pair is approximately negligible, which means that satisfactory results can be obtained for areas with large terrain elevation changes that do not satisfy the affine transformation model. In addition, the epipolar images generated by two other comparison methods in the same region are shown in Figure 10a,b. In order to make the comparison of results more intuitive, we manually select two corresponding points in the epipolar images and display their coordinates (x, y), where x is the horizontal coordinate and y is the vertical coordinate. The results show that the y-disparity of the epipolar images generated by these three methods is relatively small and can be approximately ignored. However, in addition to the y-disparity, the x-disparity of the points in Figure 10c is almost negligible. Therefore, it is clear that compared with the other two methods, the proposed framework can get better epipolar images.
We further calculate the y-disparity of the SIFT matching points in the epipolar image pair to quantitatively evaluate the results. The smaller this y-disparity, the higher the accuracy of the generated epipolar images. Moreover, when performing stereo matching, the reduction of the search range of matching points will improve the accuracy and efficiency of image matching. Therefore, we also calculate the x-disparity of the SIFT matching points at the same time. When the x-disparity is small, the matching efficiency and accuracy during dense matching of epipolar images will be improved. The disparity of matching points is shown in Table 2. For the sake of comparison, the data in the tables are the statistical results of the absolute value of the disparity.  It can be found that the proposed framework can generate high-quality epipolar images in regions with large terrain undulations according to the image matching results. For the y-disparity, the results obtained by our framework are slightly smaller than the results obtained by the S2P and PTM-based methods. Although the proposed framework only shows a small advantage in y-disparity, it has obvious advantages in x-disparity. The epipolar images obtained by our framework also have higher similarity in the x-direction and the average x-disparity of the matching points can reach the sub-pixel level. The results show that the epipolar images generated by our framework can rearrange the original images better so that the corresponding points of the image pair are approximately on the same horizontal line, thereby reducing the search range of conjugate points in image matching to 1-D space. In addition, the generated epipolar images can also reduce the disparity in the x-direction, which provides a guarantee for the efficiency and accuracy of the dense matching before the subsequent DSM generation.
We also conduct experiments on ZY-3 images in Tianshan, China, which are shown in Figure 11. Compared with the Songshan area, the terrain changes in this area are more obvious, approximately 24-3091 m. The quality of the epipolar images generated in this area is quantitatively evaluated, and the results are shown in Table 3. Compared with the Songshan area, the elevation of the Tianshan area has greater fluctuations, so the x-disparity of the generated epipolar image will be greater. However, by comparing the results of the other two methods, it can be found that the x-disparity of the epipolar image generated by the proposed framework is still optimal among the three methods. The results show that in areas with large elevation fluctuations, not only the y-disparity can reach sub-pixels in the epipolar images generated by our framework, but most of the average x-disparity does not exceed 3 pixels. Therefore, our epipolar image generation framework has more obvious advantages for areas with large undulations like Tianshan.

Verification of the Applicability of the Proposed Framework
It has been verified in Section 3.1 that the proposed framework can generate good epipolar images for areas with large terrain elevation changes. We select stereo image pairs from other satellites and other regions to generate epipolar images to verify the applicability of the proposed framework.

The Results of Epipolar Image Generation for Different Terrains
We choose the Omaha region of the United States to verify the applicability of the proposed framework in regions with small terrain undulations. The results of the comparative experiment are shown in Figure 12 and Table 4. The results show that the proposed framework can still achieve ideal results for urban areas with smaller terrain undulations and larger elevation gradients than mountain areas. Compared with the S2P method, the proposed framework has no obvious advantages for the y-disparity of the epipolar image in the urban area. But the x-disparity is still relatively small. The mean values of the x-disparity of the corresponding points are both about 1 to 2 pixels. Therefore, the proposed epipolar image generation framework is also applicable to areas with small terrain undulations and large terrain undulation gradients.  We conduct experiments on satellite images from other sources to further verify the universality of the proposed epipolar image generation framework. We choose the stereo image pair consisting of GF-3 satellite image and GF-2 satellite image in Songshan in this experiment. We first generate epipolar images based on the SAR-optical image pair. Due to the particularity of SAR imaging, we adopt different structural features in the matching stage for SAR images [25]. The epipolar images obtained using the proposed framework are shown in Figure 13. The average y-disparity of the image matching points can reach approximately 1 pixel, and the average x-disparity is less than 2 pixels (see Table 5). It can be seen from the comparison results that although the S2P method can generate epipolar images with a small y-disparity, the x-disparity is large. Moreover, the epipolar images are distorted, which is not conducive to subsequent image matching. Therefore, the proposed epipolar images generation framework is also applicable for SAR-optical satellite stereo image pairs.  In addition to the epipolar image generation experiment on the SAR-optical image pairs, we also perform a verification experiment on the SAR-SAR image pair. We choose GF-3 images as our SAR image source and the corresponding area of the images used in Omaha. The epipolar images generated by GF-3 image pairs in the Omaha area are shown in Figure 14 and the results of the comparative experiment are shown in Table 6. By comparing with the S2P method, it can be found that the proposed framework has obvious advantages for the generation of epipolar images of SAR-SAR image pairs. The average value of the x-disparity and y-disparity of the epipolar images we generated can reach the sub-pixel level. And the epipolar image pair is not distorted, which is conducive to the dense matching of the epipolar image.  In summary, the proposed epipolar images generation framework can be applied to areas with large and small topographic undulations through the above experiments. Moreover, it can be used not only for the epipolar image generation of optical image pairs but also for the epipolar image generation of SAR-optical image pairs and SAR-SAR image pairs.

Discussion
The results in Section 3 have shown that the proposed framework can generate better quality epipolar images. Next, we evaluate the performance of DSM generation using the generated epipolar images based on the previous experiment.
In the DSM generation experiment, we use the ZY-3 satellite images with a ground sampling distance of 3.5 m. Then, the generated epipolar images of Songshan are tested to obtain the corresponding DSM through the SGM-based stereo matching method and the Space Intersection method, and the result is shown in Figure 15. We compare the generated DSM with the DSM generated by other methods and the existing DSM to evaluate the accuracy of the DSM generated by our framework, based on the GCPs. We choose AW3D30 DSM and DSM generated by ENVI as comparison data. Among them, AW3D30 (ALOS World 3D-30 m) is a DSM data set released by the Japan Aerospace Exploration Agency(JAXA), with a grid resolution of~30 m. This DSM data set is obtained by using PRISM on the land observation satellite ALOS to image the same object using three cameras with different perspectives [34]. In addition, the other DSM for comparison is obtained through the photogrammetry extension module in ENVI-v5.6 [35]. The ENVI software uses the metadata information of the ZY-3 original image pair to generate DSM. Comparing the generated DSM with GCPs, the absolute values of the difference between the elevation is mostly within 4 m, and the average absolute difference is 2.33 m. The histogram of the elevation accuracy is shown in Figure 16. Table 7 lists the accuracy of DSM at the GCPs marked in Figure 15. Among them, the vertical precision of AW3D30 DSM in this area is~20 m. Moreover, the vertical precision of ENVI-DSM and the generated DSM are both about 2 m. Therefore, the results show that the accuracy of the DSM we generated is greatly improved compared to AW3D30 DSM, and it can achieve higher accuracy than the DSM generated by ENVI. Therefore, the proposed framework can generate good quality epipolar images, which can be used to generate high accuracy DSM.

Conclusions
In this paper, we proposed a framework for generating epipolar images based on DEM constraints. DSM can be generated by stereo matching using the obtained approximate epipolar images. Through feature matching of the generated epipolar images, the epipolar images generated by our framework have higher precision and versatility, compared with the existing PTM method and the S2P method. The advantages of the proposed framework can be summarized as follows.

1.
The epipolar images generated by the proposed framework can better achieve the goal of restricting the search range of matching points from 2-D to 1-D for areas with large terrain undulations. Moreover, it is found that the matching points of the epipolar images have not only a small y-disparity, but also a small x-disparity. This shows that the search range of matching points can be further reduced, and the accuracy and speed of stereo matching will be improved, which is beneficial to ensure the efficiency of subsequent stereo matching work.

2.
This framework is also suitable for areas with small terrain undulations and can achieve ideal results. This can ensure that our framework can be used to generate epipolar images for the entire image with various scenarios. In addition to optical image pairs, this framework can be also applied to SAR-optical image pairs and SAR-SAR image pairs. This means that our framework has good versatility.

3.
The epipolar images generated by this framework can be applied to the generation of high-accuracy DSM. For optical image pairs, the average value of the elevation difference between the DSM generated by the proposed framework and the GCPs is no more than 3 m, which has good accuracy. That is, the proposed epipolar image generation framework can be used in subsequent DSM generation work.
Therefore, the epipolar image generation framework proposed in this paper is suitable for different terrains and various sensors, and a high-precision DSM can also be obtained.
The influence of the size of image blocks on the accuracy of the generated DSM needs to be further verified as our framework divides the image into blocks. In addition, we also conducted experiments on the generation of DSM from SAR-SAR image pairs, but there are still some invalid and distorted values in the DSM. Therefore, we will investigate robust stereo matching methods for SAR images in our future works.