A General Framework of Remote Sensing Epipolar Image Generation

Wang, Xuanqi; Wang, Feng; Xiang, Yuming; You, Hongjian

doi:10.3390/rs13224539

Open AccessArticle

A General Framework of Remote Sensing Epipolar Image Generation

¹

Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

²

Key Laboratory of Technology in Geo-Spatial Information Processing and Application Systems, Chinese Academy of Sciences, Beijing 100190, China

³

School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2021, 13(22), 4539; https://doi.org/10.3390/rs13224539

Submission received: 20 September 2021 / Revised: 27 October 2021 / Accepted: 5 November 2021 / Published: 11 November 2021

(This article belongs to the Special Issue Image Processing and Analysis: Trends in Registration, Data Fusion, 3D Reconstruction, and Change Detection II)

Download

Browse Figures

Versions Notes

Abstract

:

Epipolar images can improve the efficiency and accuracy of dense matching by restricting the search range of correspondences from 2-D to 1-D, which play an important role in 3-D reconstruction. As most of the satellite images in archives are incidental collections, which do not have rigorous stereo properties, in this paper, we propose a general framework to generate epipolar images for both in-track and cross-track stereo images. We first investigate the theoretical epipolar constraints of single-sensor and multi-sensor images and then introduce the proposed framework in detail. Considering large elevation changes in mountain areas, the publicly available digital elevation model (DEM) is applied to reduce the initial offsets of two stereo images. The left image is projected into the image coordinate system of the right image using the rational polynomial coefficients (RPCs). By dividing the raw images into several blocks, the epipolar images of each block are parallel generated through a robust feature matching method and fundamental matrix estimation, in which way, the horizontal disparity can be drastically reduced while maintaining negligible vertical disparity for epipolar blocks. Then, stereo matching using the epipolar blocks can be easily implemented and the forward intersection method is used to generate the digital surface model (DSM). Experimental results on several in-track and cross-track images, including optical-optical, SAR-SAR, and SAR-optical pairs, demonstrate the effectiveness of the proposed framework, which not only has obvious advantages in mountain areas with large elevation changes but also can generate high-quality epipolar images for flat areas. The generated epipolar images of a ZiYuan-3 pair in Songshan are further utilized to produce a high-precision DSM.

Keywords:

DEM; epipolar constraint; RPCs; epipolar image; DSM; stereo matching

1. Introduction

With the rapid development of high-resolution remote sensing sensors, digital stereo image data with large coverage, high precision, and timeliness can be easily obtained. Each pair of overlapping images forms a stereoscopic observation of the ground target, allowing the estimation of its 3D position. An important application of 3D information processing is the extraction of the digital surface model (DSM). DSM can truly reflect the situation on the ground. In addition to terrain elevation, DSM also contains the elevation information of ground objects such as buildings, bridges, and trees, and is widely used in various fields. Dense matching (stereo matching) is a key technology in DSM generation, and it has an important constraint, the epipolar geometry constraint. Using epipolar lines to retrieve conjugate points in image pairs can improve the efficiency and accuracy of stereo matching and it is a development trend to generate high-quality epipolar images from original satellite stereo image pairs [1]. Currently, optical satellite images are mostly obtained by linear array pushbroom imaging technology. However, unlike frame satellite imagery, linear pushbroom satellite images do not have a strict epipolar relationship and can only generate approximate epipolar images [2]. Therefore, how to generate epipolar images based on linear pushbroom satellite images has become an ongoing research topic.

As early as the 1980s, Zhang and Zhou proposed a SPOT image approximate epipolar arrangement method, namely, the polynomial fitting method [3]. This method only needs to use the coordinates of the conjugate points, without knowing other image information. Kim analyzed the epipolar curves generation method of the sensor model based on the collinear equations according to the projection trajectory method [4]. Morgan et al. proposed an approximate epipolar curves generation method based on a parallel projection model [2]. This method has certain requirements for the sensor field of view, terrain undulations, and conjugate point matching accuracy and lacks versatility [5]. Wang et al. proposed a new satellite stereo image epipolar resampling method, which directly establishes the correspondence between original image pixels and epipolar image pixels through a geometric sensor model [6].

A significant fundamental content of epipolar correction for remote sensing images is to establish the imaging equation between the image coordinates and the spatial coordinates. The general imaging equation is a rigorous model established from the imaging geometric relationship of the sensor, which needs to be calculated by the sensor parameters. However, due to confidentiality and other reasons, the parameters of many sensors cannot be disclosed, so the rational function model (RFM) has emerged. Within a certain elevation range, RFM can replace the rigorous model [7]. Therefore, we can use the rational polynomial coefficients (RPCs) in the RFM to establish the coordinate conversion relationship without knowing the specific parameters of the sensor. With the widespread application of RPCs, some RPC-based approximate epipolar image generation methods have been proposed. Oh et al. proposed a pushbroom satellite image epipolar resampling method based on RPCs [8], but for areas with large terrain changes, this method needs further research. Koh et al. proposed a unified piecewise epipolar resampling method that can generate stereo image pairs with near-zero y-parallax [9].

Since the 20th century, with the development of radar imaging theory, the concept of Synthetic Aperture Radar (SAR) was proposed. Due to the particularity of the SAR imaging mechanism, optical images and SAR images show different appearances and abilities [10]. Specifically, SAR images obtained by the active sensors reflect the electromagnetic characteristics of the surface targets and provide the ability to view in all-day, all-weather conditions and through clouds. At the same time, the optical images obtained by the passive sensors reflect the radiometric properties of the targets and can provide complementary information. Therefore, methods for generating epipolar images based on optical-SAR image pairs have begun to be proposed. First, Bagheri et al. mathematically proved that there is an epipolar geometry constraint relationship between SAR–optical image pairs, studied the possibility of 3-D reconstruction from very-high-resolution (VHR) SAR–optical image pairs, and indicated that SAR-optical image pairs in urban areas can be used to design a 3-D reconstruction framework [11]. Moreover, VHR SAR–optical image pairs have great potential for analyzing complex urban areas. If the accuracy of key points detection can be improved and the similarity between SAR images and optical images can be measured, then the accuracy of 3-D reconstruction can also be improved [12]. Therefore, how to generate high-quality epipolar images based on SAR-optical image pairs needs to be further studied.

In addition to optical image pairs and SAR-optical image pairs, the concept of epipolar geometry is also applicable to the cocircular geometry of SAR image pairs [13]. Pan et al. proposed an endpoints growing method based on RPCs to generate epipolar images and applied this method to IKONOS image pairs, TerraSAR-X image pairs, and mixture stereo pairs [14]. However, the feasibility of this method for mixture stereo pairs is subject to RPCs fitting. Karlheinz Gutjahr et al. proposed a method to create the epipolar geometry of a SAR sensor with an arbitrary three-dimensional structure through proper geometric image transformation and applied it to the TerraSAR-X stereo data set [13].

Although the above-mentioned existing methods have been used to generate approximate epipolar images of satellite stereo images, technical limitations still exist when considering the versatility, applicability, and simplicity of these models. Moreover, due to different viewing conditions and sun angles, in areas with large terrain undulations, nonlinear distortions will appear in the images, and it is difficult to perform image registration [15]. Moreover, in areas such as mountains and valleys with harsh terrain conditions, the commonly used methods are inefficient and poor in safety [16]. The existing methods are difficult to be used on multiple types of terrain and multi-sensor images to achieve ideal results at the same time.

Based on the above analysis, we propose an approximate epipolar image generation framework that can be used for images in areas with large terrain undulations and multi-sensor images, such as SAR-SAR images and SAR-optical images. By generating approximate epipolar images, the two-dimensional stereo matching problem is simplified to one-dimensional stereo matching, thereby improving the accuracy and efficiency of stereo matching [17]. Among them, the use of the digital elevation model (DEM) can reduce the initial offset of the two stereo images, thereby reducing the impact of terrain elevation changes on the generation of epipolar images. We first process the Level-1 images, which is the radiometric correction products, and use RPCs to project the left image into the image coordinate system of the right image to obtain a new image, which we name as Level-1P images, and then divide the entire image into blocks for parallel processing. Next, we generate approximate epipolar images and DSM for each block. Finally, all DSMs are spliced together to realize the generation of the whole DSM. The stereo satellite data used in this paper can be roughly divided into three categories. They are three-line array satellites, agile stereo satellites, and the general multiple view satellites. Three-line array satellites such as ZY-3 are equipped with three line-array cameras to image the same target from different angles along the direction of the satellite’s flight, to obtain the in-track stereo images [18]. Agile satellites (e.g., GeoEye) can maneuver in three directions: roll, pitch, and yaw [19]. By adjusting these three directions in a short period, agile satellites can realize nearly simultaneous multi-angle observations of the same target. The general multiple views satellites collect stereo images by taking images of the ground from different views, such as GF-2 and GF-3, and this kind of stereo image has no rigorous stereo property.

The rest of this paper is organized as follows. Section 2 first verifies the epipolar constraint relationship of single-sensor and multi-sensor images and then describes the proposed approximate epipolar image generation framework. In Section 3, the experimental results of epipolar image generation are shown, and their accuracy is qualitatively evaluated and quantitatively analyzed. In Section 4, DSM generation based on our proposed framework is discussed. Conclusions are drawn in Section 5.

2. Methods

In this section, we first describe the epipolar constraint relationship and investigate the theoretical epipolar constraints of single-sensor and multi-sensor images. It is proved that the epipolar curve of the optical image pair is hyperbolic, the epipolar constraint relationship of the SAR image pair is a quartic polynomial. Moreover, the epipolar constraint relationship of the optical-SAR image pair and SAR-optical are quadratic polynomial and high order polynomial, respectively. Then, we take GF-2 and GF-3 images as examples to verify the characteristics of the epipolar constraint relationship between the pushbroom optical image pairs and the SAR image pairs. It is proved that the epipolar curves of these images can be approximated as straight lines within the scope of the image scene, and they satisfy the conjugacy and approximate parallel properties. Finally, we propose a general epipolar image generation framework and introduce it in detail based on the epipolar constraint relationship.

2.1. Theory of the Multi-Sensor Epipolar Constraint

In conventional 3D reconstruction frameworks, the epipolar geometry is the first clarity constraint that facilitates the procedure of image matching by reducing the search space from 2D to 1D [2]. Usually, the images to be processed are created by resampling the original images according to epipolar geometry before dense matching, and the epipolarity constraint always exists for optical stereo images captured by frame type cameras that follow a perspective projection [17]. This principle is illustrated in Figure 1. The effective generation of epipolar-rectified images requires high positioning accuracy of the satellite sensor models, and thus a few ground control points (GCPs) are chosen to improve the positioning accuracy of the orientation parameters.

In the left view of the stereo pair, there is point

p_{l}

, which is the projection of the ground point P in the left image. If the point on the ray of point

p_{l}

is projected into the right view through the image geometric model, the trajectory of these projection points forms the epipolar curve.

2.1.1. Epipolar Constraint of Optical Image Pairs

Supposing that the target P corresponds to two points

p_{l}

and

p_{r}

on the stereo pairs, and their image coordinates are denoted as

(x_{l}, y_{l})

and

(x_{r}, y_{r})

. If the point

p_{l}

is on the i-th scan line, and its coordinates on the focal plane can be denoted as

(0, p_{l})

, there exists a relationship between the two coordinates [6]:

\begin{matrix} x_{l} = i, y_{l} = (p_{l} + p_{0}) / e, \end{matrix}

(1)

where

p_{0}

is the translation coefficient and e stands for the detector size. We can draw the following equation from the collinear relationship of the imaging geometry:

(\begin{matrix} X_{p} \\ Y_{p} \\ Z_{p} \end{matrix}) = (\begin{matrix} X_{S_{i}} \\ Y_{S_{i}} \\ Z_{S_{i}} \end{matrix}) + λ_{i} R_{i} (\begin{matrix} 0 \\ p_{l} \\ - f \end{matrix}), R_{i} = (r_{i m \times n})_{3 \times 3},

(2)

where

(X_{p}, Y_{p}, Z_{p})

is the spatial coordinates of P,

(X_{S_{i}}, Y_{S_{i}}, Z_{S_{i}})

is the center of i-th scan line,

λ

denotes the scale coefficient and

R_{i}

denotes the rotation matrix, and f represents the focal length. We can also draw a similar relationship of point

p_{r}

:

\begin{matrix} x_{r} = j \\ y_{r} = (p_{r} + p_{0}) / e \end{matrix} & (\begin{matrix} X_{p} \\ Y_{p} \\ Z_{p} \end{matrix}) = (\begin{matrix} X_{S_{j}} \\ Y_{S_{j}} \\ Z_{S_{j}} \end{matrix}) + λ_{j} R_{j} (\begin{matrix} 0 \\ p_{r} \\ - f \end{matrix}), R_{j} = (r_{j m \times n})_{3 \times 3} .

(3)

Combining the two collinear equations of points

p_{l}

and

p_{r}

, we can obtain

(\begin{matrix} 0 \\ p_{r} \\ - f \end{matrix}) = \frac{1}{λ_{j}} {R_{j}}^{'} (\begin{matrix} X_{S_{i}} - X_{S_{j}} \\ Y_{S_{i}} - Y_{S_{j}} \\ Z_{S_{i}} - Z_{S_{j}} \end{matrix}) + \frac{λ_{i}}{λ_{j}} {R_{j}}^{'} R_{i} (\begin{matrix} 0 \\ p_{l} \\ - f \end{matrix}),

(4)

where

{R_{j}}^{'}

is the inverse matrix of

R_{j}

, and it is denoted as

{R_{j}}^{'} = ({r^{'}}_{j m \times n})_{3 \times 3}

, and

R_{i j} = {R_{j}}^{'} R_{i} = {(r_{m \times n})}_{3 \times 3} .

Assuming that the satellite is flying with high stability, the attitude remains unchanged during imaging, and thus the coordinates of the projection center can be regarded as a linear function that changes with the exposure time (scanning number), as follows:

\begin{matrix} X_{S_{i}} = u_{1} i + u_{2}, Y_{S_{i}} = u_{3} i + u_{4}, Z_{S_{i}} = u_{5} i + u_{6}, \\ X_{S_{j}} = u_{7} j + u_{8}, Y_{S_{j}} = u_{9} j + u_{10}, Z_{S_{j}} = u_{11} j + u_{12}, \end{matrix}

(5)

where

u_{1} . . . u_{12}

are the known imaging parameters. All variables except

(x_{r}, y_{r})

can be regarded as known variables, if we use the left image as the reference images. Substituting the above function into Equation (4), we can derive the epipolar curve on the right image, which refers to the relationship between

x_{r}

and

y_{r}

can be derived as follows:

e y_{r} - p_{0} = - f \frac{U_{1} x_{r} + U_{2}}{U_{3} x_{r} + U_{4}} \Rightarrow U_{4} x_{r} y_{r} + U_{5} x_{r} + U_{6} y_{r} + U_{7} = 0,

(6)

where

U_{k}, k = 1, 2, . . ., 7

are the combinations of known variables. We can see from Equation (6) that the epipolar curve of two linear sensors is a hyperbolic curve.

2.1.2. Epipolar Constraint of SAR Image Pairs

The imaging geometry of SAR sensors is a side-looking model based on the Range-Doppler (RD) geometry which is different from the linear array model of optical sensors. Similar to the optical stereo pairs, the spatial coordinates and image coordinates of the target P also fulfill two imaging models [20],

\begin{matrix} {(X_{p} - X_{S_{i}})}^{2} + {(Y_{p} - Y_{S_{i}})}^{2} + {(Z_{p} - Z_{S_{i}})}^{2} = {(R_{i} + M_{i} y_{l})}^{2} \\ V_{X_{i}} (X_{p} - X_{S_{i}}) + V_{Y_{i}} (Y_{p} - Y_{S_{i}}) + V_{Z_{i}} (Z_{p} - Z_{S_{i}}) = - \frac{λ_{i} f_{d c}}{2} (R_{i} + M_{i} y_{l}), \\ {(X_{p} - X_{S_{j}})}^{2} + {(Y_{p} - Y_{S_{j}})}^{2} + {(Z_{p} - Z_{S_{j}})}^{2} = {(R_{j} + M_{j} y_{r})}^{2} \\ V_{X_{j}} (X_{p} - X_{S_{j}}) + V_{Y_{j}} (Y_{p} - Y_{S_{j}}) + V_{Z_{j}} (Z_{p} - Z_{S_{j}}) = - \frac{λ_{j} f_{d c}}{2} (R_{j} + M_{j} y_{r}), \end{matrix}

(7)

where

λ

is the radar wavelength,

f_{d c}

denotes the Doppler shift parameter and is generally set to 0, R is the slant-range of the first range pixel, and M denotes the range sampling rate.

(X_{S}, Y_{S}, Z_{S})

and

(V_{X}, V_{Y}, V_{Z})

denote the position and velocity vectors of the antenna phase center, and the position vectors are polynomial functions of the velocity vectors, time and acceleration vectors. For the sake of simplification, the movement trajectory of SAR sensors is approximated as a linear model, as shown below:

\{\begin{matrix} X_{S_{i}} = X_{S_{i 0}} + V_{X_{i 0}} k x_{l} \\ Y_{S_{i}} = Y_{S_{i 0}} + V_{Y_{i 0}} k x_{l} \\ Z_{S_{i}} = Z_{S_{i 0}} + V_{Z_{i 0}} k x_{l} \end{matrix}, \{\begin{matrix} X_{S_{j}} = X_{S_{j 0}} + V_{X_{j 0}} k x_{r} \\ Y_{S_{j}} = Y_{S_{j 0}} + V_{Y_{j 0}} k x_{r} \\ Z_{S_{j}} = Z_{S_{j 0}} + V_{Z_{j 0}} k x_{r} \end{matrix},

(8)

where k denotes the reciprocal of the pulse repetition frequency. We also assume all variables except

(x_{r}, y_{r})

are known parameters when taking the left image as the reference images. Combining two Doppler equations in Equation (7), we can obtain

(V_{X_{i}} V_{Z_{j}} - V_{X_{j}} V_{Z_{i}}) X_{p} = (V_{Y_{j}} V_{Z_{i}} - V_{Y_{i}} V_{Z_{j}}) Y_{p} + (V_{X_{i}} X_{S_{i}} + V_{Y_{i}} Y_{S_{i}} + V_{Z_{i}} Z_{S_{i}}) V_{Z_{j}} - (V_{X_{j}} X_{S_{j}} + V_{Y_{j}} Y_{S_{j}} + V_{Z_{j}} Z_{S_{j}}) V_{Z_{i}} .

(9)

The above equation can be simplified as

X_{p} = v_{1} Y_{p} + v_{2} x_{r} + v_{3} .

And from the Doppler equation of the left image, we can derive that

Z_{p} = (V_{X_{i}} X_{S_{i}} + V_{Y_{i}} Y_{S_{i}} + V_{Z_{i}} Z_{S_{i}} - V_{X_{i}} X_{p} - V_{Y_{j}} Y_{p}) / V_{Z_{i}} .

(10)

Substituting

Z_{p}

and

X_{p}

into the two Range equations in Equation (7), we can deduce

\begin{matrix} v_{20} Y_{P}^{2} + (v_{21} x_{r} + v_{22}) Y_{p} + v_{23} x_{r}^{2} + v_{24} x_{r} + v_{25} = 0, \\ v_{26} Y_{P}^{2} + (v_{27} x_{r} + v_{28}) Y_{p} + v_{29} x_{r}^{2} + v_{30} x_{r} + v_{31} = {(R_{j} + M_{j} y_{r})}^{2} . \end{matrix}

(11)

Removing

Y_{p}

from the above equations, we can derive the final relationship between

(x_{r}, y_{r})

as follows, where

V_{1} . . . V_{13}

are known variables, and it is a quartic polynomial.

V_{1} y_{r}^{4} + V_{2} x_{r}^{4} + V_{3} x_{r}^{2} y_{r}^{2} + V_{4} x_{r}^{3} + V_{5} y_{r}^{3} + V_{6} x_{r} y_{r}^{2} + V_{7} x_{r}^{2} y_{r} + V_{8} x_{r}^{2} + V_{9} y_{r}^{2} + V_{10} x_{r} y_{r} + V_{11} x_{r} + V_{12} y_{r} + V_{13} = 0

(12)

2.1.3. Epipolar Constraint of Optical–SAR Image Pairs

As for the target P imaged by the pushbroom optical and the R-D SAR models, its imaging equations of the two stereo pairs can be described as follows [6,20]:

\begin{matrix} (\begin{matrix} X_{p} \\ Y_{p} \\ Z_{p} \end{matrix}) = (\begin{matrix} X_{S_{i}} \\ Y_{S_{i}} \\ Z_{S_{i}} \end{matrix}) + λ_{i} R_{i} (\begin{matrix} 0 \\ p_{l} \\ - f \end{matrix}), \begin{matrix} x_{l} = i \\ y_{l} = (p_{l} + p_{0}) \end{matrix}, \\ {(X_{p} - X_{S_{j}})}^{2} + {(Y_{p} - Y_{S_{j}})}^{2} + {(Z_{p} - Z_{S_{j}})}^{2} = {(R_{j} + M_{j} y_{r})}^{2}, \\ V_{X_{j}} (X_{p} - X_{S_{j}}) + V_{Y_{j}} (Y_{p} - Y_{S_{j}}) + V_{Z_{j}} (Z_{p} - Z_{S_{j}}) = - \frac{λ_{j} f_{d c}}{2} (R_{j} + M_{j} y_{r}) . \end{matrix}

(13)

From the linear array equation, we can obtain that

X_{p} = X_{S_{i}} + \frac{(r_{i 12} p_{l} - r_{i 13} f)}{(r_{i 32} p_{l} - r_{i 33} f)} (Z_{p} - Z_{S_{i}}), Y_{p} = Y_{S_{i}} + \frac{(r_{i 22} p_{l} - r_{i 23} f)}{(r_{i 32} p_{l} - r_{i 33} f)} (Z_{p} - Z_{S_{i}}) .

(14)

Substituting

X_{p}

and

Y_{p}

into the Doppler equation of the right image, we can obtain

Z_{p}

as

Z_{p} = \frac{[V_{X_{j}} (X_{S_{j}} - X_{S_{i}}) + V_{Y_{j}} (Y_{S_{j}} - Y_{S_{i}}) + V_{Z_{j}} Z_{S_{j}}] (r_{i 32} p_{l} - r_{i 33} f) + [V_{X_{j}} (r_{i 12} p_{l} - r_{i 13} f) + V_{Y_{j}} (r_{i 22} p_{l} - r_{i 23} f)] Z_{S_{i}}}{V_{X_{j}} (r_{i 12} p_{l} - r_{i 13} f) + V_{Y_{j}} (r_{i 22} p_{l} - r_{i 23} f) + V_{Z_{j}} (r_{i 32} p_{l} - r_{i 33} f)} .

(15)

We also assume that all variables except

(x_{r}, y_{r})

are known parameters when taking the left image as the reference image. Substituting

X_{S_{j}} = X_{S_{j 0}} + V_{X_{j 0}} k x_{r}, Y_{S_{j}} = Y_{S_{j 0}} + V_{Y_{j 0}} k x_{r}, Z_{S_{j}} = Z_{S_{j 0}} + V_{Z_{j 0}} k x_{r}

into

Z_{p}

, we find out that

Z_{p}

is a linear function of

x_{r}

,

Z_{p} = w_{1} + w_{2} x_{r} .

(16)

Similarly,

X_{p}

and

Y_{p}

are also linear functions of

x_{r}

, denoted as

X_{p} = w_{3} + w_{4} x_{r}, Y_{p} = w_{5} + w_{6} x_{r} .

(17)

Substituting

X_{p}, Y_{p}, Z_{p}

into the Range equation, we can deduce

w_{7} x_{r}^{2} + w_{8} x_{r} + w_{9} y_{r}^{2} + w_{10} y_{r} + w_{11} = 0,

(18)

and it is clear that the epipolar curve is a quadratic curve.

2.1.4. Epipolar Constraint of SAR-Optical Image Pairs

The epipolar geometries between optical-SAR pairs and SAR-optical pairs are also different since the optical and SAR imaging models are different. As for the target P imaged by the R-D SAR model and the pushbroom optical model, its imaging equations of the two stereo pairs can be described as follows [6,20]:

\begin{matrix} {(X_{p} - X_{S_{i}})}^{2} + {(Y_{p} - Y_{S_{i}})}^{2} + {(Z_{p} - Z_{S_{i}})}^{2} = {(R_{i} + M_{i} y_{l})}^{2}, \\ V_{X_{i}} (X_{p} - X_{S_{i}}) + V_{Y_{i}} (Y_{p} - Y_{S_{i}}) + V_{Z_{i}} (Z_{p} - Z_{S_{i}}) = - \frac{λ_{i} f_{d c}}{2} (R_{i} + M_{i} y_{l}), \\ (\begin{matrix} X_{p} \\ Y_{p} \\ Z_{p} \end{matrix}) = (\begin{matrix} X_{S_{j}} \\ Y_{S_{j}} \\ Z_{S_{j}} \end{matrix}) + λ_{j} R_{j} (\begin{matrix} 0 \\ p_{r} \\ - f \end{matrix}), \begin{matrix} x_{r} = j \\ y_{r} = (p_{r} + p_{0}) \end{matrix} . \end{matrix}

(19)

Following the derivation procedure of Section 2.1.3, and by assuming that all variables except

(x_{r}, y_{r})

are known parameters, we can obtain the relationship between

x_{r}, y_{r}

as a high-order polynomial:

\begin{matrix} d_{21} x_{r}^{2} y_{r}^{4} + d_{22} x_{r} y_{r}^{4} + d_{23} x_{r}^{2} y_{r}^{3} + d_{24} y_{r}^{4} + d_{25} x_{r} y_{r}^{3} + d_{26} x_{r}^{2} y_{r}^{2} + d_{27} y_{r}^{3} + d_{28} x_{r}^{2} y_{r} \\ + d_{29} x_{r} y_{r}^{2} + d_{30} x_{r}^{2} + d_{31} y_{r}^{2} + d_{32} x_{r} y_{r} + d_{33} x_{r} + d_{34} y_{r} + d_{35} = 0 . \end{matrix}

(20)

2.1.5. Verification of the Properties of Epipolar Constraint

As the rational function model can be regarded as an accurate alternative for rigorous imaging models for both optical and SAR images, we hereafter use RPCs for the sake of simplicity. For stereo images, the point

p_{l}

in the left image can be projected to the object space by backward mapping using RPCs. By changing the elevation H, we can get the projected light in the object space. Then, using the RPCs of the right image to project the light into the right image, we can obtain the projection trajectory curve l. l is the epipolar curve corresponding to the point

p_{l}

. Moreover, the conjugate point of

p_{l}

in the right image will be on the curve l. This method of generating epipolar curves is known as the projection trajectory method (PTM) [21]. Then, we use PTM to verify the epipolar curve is a hyperbolic curve for optical acquisitions with pushbroom model, a quartic curve for SAR acquisitions with Range–Doppler model, and the epipolar curves of the optical–SAR image pair and the SAR–optical image pair satisfy the quadratic polynomial and the high order polynomial, respectively. In addition, we also further verify other properties for multi-sensor models. We first choose a point in the left image, set its height range to [−500 m:9000 m] which is set based on the altitude of the Earth’s surface, then we get a series of points in the object space and map them to the right image. Finally, we use different functions to fit the epipolar curve.

Figure 2 shows the epipolar curves generated from the optical satellite (GF-2) image pairs and the SAR satellite (GF-3) image pairs using the PTM. From this figure, we can find that the epipolar curves of the GF-2 image pairs and GF-3 image pairs are similar to straight lines. This means that within the scope of the image scene, the epipolar curve can be approximated as a straight line. To further verify the shape of the epipolar curve, we fit the coordinates of the points in the projection trajectory. The fitting results are shown in Figure 3, Figure 4, Figure 5 and Figure 6.

The fitting results in Figure 3a show that when a linear function is used to fit the epipolar curve of the GF-2 image pair, the fitting residual can reach 3 pixels. However, when the fitting function is changed to a hyperbolic curve, the fitting residual is reduced by ~80%. Therefore, the epipolar curve of the optical image pair can be approximated as a hyperbolic curve, which is in accordance with the theoretical conclusion. As shown in Figure 4, for the GF-3 image pair, the quartic curve fitting residual of the epipolar curve is only approximately

1 / 50

of the linear fitting residual. This means that the epipolar curve of the SAR image pair can be approximated by a quartic curve. For the optical-SAR image pair, when using linear fitting to fit the projection points, the fitting residual is between −25 and 8, which is shown in Figure 5a. However, the residual is within 1 pixel, if a quadratic polynomial is used for fitting. The results in Figure 6 show that by comparing the fitting residuals, the epipolar curve of the SAR–optical image pair can be approximated as a high order polynomial curve. These prove that the epipolar constraint relationship of the image pairs we derived is effective.

In addition, we also verify the conjugacy of the epipolar curves. Conjugacy means that the corresponding points are located on the corresponding epipolar curves, and the points on the corresponding epipolar curves correspond one to one. We select six points (Point 1–6) from the same epipolar curve in Figure 2b and calculate the corresponding epipolar curve of each point in Figure 2a. As the epipolar curve can be approximated as a straight line in a local range, for the convenience of calculation, we perform linear fitting on the epipolar curves of the six points in the left image. The result is shown in Figure 7a. From the linear fitting results, it can be found that the epipolar curves fitted by these 6 points are not the same line, which indicates that there is no strict epipolar curve pair in the stereo images obtained by the linear pushbroom camera. However, the partial magnified image demonstrates that the difference between the fitting results of the epipolar curves at different points is very small, within 0.1 pixels. Therefore, within the scope of the image scene, these points’ epipolar curves can be approximately regarded as the same epipolar curve, that is, the epipolar curves satisfy the conjugacy in the image range. Next, we verify the approximate parallel property of the epipolar curves. In Figure 2a, we randomly select 4 points and perform a linear fitting on the projection trajectory of each point in Figure 2b. The linear fitting results are shown in Figure 7b. Figure 7b indicates that the linear fitting slopes of the epipolar curves corresponding to these four points are approximately equal, which means that the epipolar curves have the parallel property in the image range.

2.2. The Proposed General Epipolar Image Generation Framework

Based on the epipolar constraint relationship deduced in Section 2.1, we then introduce the proposed framework. Our framework can be divided into the following steps: (1) image pairs relative orientation, (2) image spatial conversion and overlapping region extraction based on DEM, (3) cropping the image pairs into small patches, (4) matching stereo image pairs based on spatial structure features and three-dimensional phase correlation algorithm, (5) estimating the fundamental matrix F based on the matching points, (6) image pairs correction and generate epipolar images, (7) calculating the coordinates in the original image and generate DSM using space intersection method, and (8) mosaic DSM. The flowchart of the proposed framework is presented in Figure 8. Next, we will introduce our framework in detail.

We first preprocess the input Level-1 images, that is, we perform the relative orientation between image pairs. The relative orientation takes the principle that the corresponding rays are coplanar as the theoretical basis to restore or determine the relative positional relationship of the image pairs during imaging. The principle is to construct the conditional expression of the relative orientation elements based on the intersecting characteristic of the light pair corresponding to the stereo pair and solve the relative orientation elements by formulating equations [22]. We denote the preprocessed left and right images as

I m a g e_{1}

and

I m a g e_{2}

, respectively. Then, we generate epipolar images and DSM from

I m a g e_{1}

and

I m a g e_{2}

.

2.2.1. Image Spatial Conversion and Overlapping Region Extraction Based on DEM

The rational function model is a generalized sensor model that can replace the linear array pushbroom camera model and the SAR physical camera model [14]. This model makes full use of the auxiliary parameters of satellite imagery and solves the model coefficients by fitting with the existing strict geometric model. Because of the high accuracy of the rational polynomial coefficients, it is widely used in line array optical satellites. Therefore, our framework is based on the RPCs to perform image space conversion and overlapping region extraction on stereo images. The RFM expresses the image space coordinates

(r, c)

as a polynomial ratio of the object space coordinates

(X, Y, Z)

, which is shown in Equation (21).

\begin{matrix} r_{n} = \frac{P_{1} (X_{n}, Y_{n}, Z_{n})}{P_{2} (X_{n}, Y_{n}, Z_{n})}, c_{n} = \frac{P_{3} (X_{n}, Y_{n}, Z_{n})}{P_{4} (X_{n}, Y_{n}, Z_{n})}, \end{matrix}

(21)

where

P_{1}

,

P_{2}

,

P_{3}

, and

P_{4}

are polynomials, and

r_{n}

,

c_{n}

,

X_{n}

,

Y_{n}

, and

Z_{n}

are normalized coordinates.

According to Equation (21), the corresponding image coordinates can be calculated according to the latitude, longitude, and elevation of a certain point in the object space. This process is called the RPCs positive solution. In addition, the RPCs inverse solution calculation formulas are as follows:

\begin{matrix} X_{n} = \frac{P_{5} (r_{n}, c_{n}, Z_{n})}{P_{6} (r_{n}, c_{n}, Z_{n})}, Y_{n} = \frac{P_{7} (r_{n}, c_{n}, Z_{n})}{P_{8} (r_{n}, c_{n}, Z_{n})}, \end{matrix}

(22)

where

P_{5}

,

P_{6}

,

P_{7}

, and

P_{8}

are polynomials. For the RPCs positive solution, by substituting the latitude and longitude coordinates and elevation into Equation (21), the row and column coordinates in the image space can be obtained. However, when

(r, c)

is known and uses the RPCs inverse solution formulas to calculate the spatial coordinates, it cannot be solved directly due to a lack of elevation information. Therefore, it is necessary to combine the DEM to obtain the corresponding spatial coordinates through multiple iterations, that is the Ray Tracing method combined with DEM from the image space to the object space [23].

Therefore, we first use RPCs to perform spatial conversion and overlapping region extraction on stereo satellite images based on DEM to obtain processed image pairs. For ease of description, we denote the RPCs corresponding to

I m a g e_{1}

and

I m a g e_{2}

as

R P C_{1}

and

R P C_{2}

. The schematic diagram of the image space conversion processing flow is shown in Figure 9. The specific steps are as follows:

For each pixel in $I m a g e_{1}$ , substitute the row and column coordinates into Equation (22) according to $R P C_{1}$ . The corresponding object space longitude, latitude, and elevation are iteratively calculated using the ray tracing method.
Substituting the spatial coordinates obtained in the previous step into Equation (21) constructed by $R P C_{2}$ , the coordinates in the image space of $I m a g e_{2}$ corresponding to the object points can be obtained.
According to the corresponding relationship between the image space of $I m a g e_{1}$ and the image space of $I m a g e_{2}$ , the new left image, which is called Level-1P image and denoted as $I m a g e_{1 P}$ , is obtained. $I m a g e_{1 P}$ only contains the overlapping area of the image pair. Then, $I m a g e_{1 P}$ and $I m a g e_{2}$ are used as a new image pair to generate epipolar images and DSM.

Figure 9. The schematic diagram of the image space conversion process.

2.2.2. Image Blocks Dividing in $I m a g e_{1 P}$ and $I m a g e_{2}$

We divide the image pairs into several square image block pairs to improve the accuracy and speed of epipolar images and DSM generation. We crop an image block every 512 pixels when dividing image blocks, so that adjacent image blocks have a 50% overlap area, to ensure that the final generated DSM can cover all areas of the image. The image blocks covering the same area in

I m a g e_{1 P}

and

I m a g e_{2}

are respectively denoted as

i m g_{1}

and

i m g_{2}

. By dividing the image pairs into several block pairs, it is helpful to extract more reliable conjugate features and increase the efficiency. The block size is set to 1024 × 1024 pixels, which can maintain the affine transformation characteristics [24].

2.2.3. Images Matching Based on Spatial Structure Features and 3D Phase Correlation

Considering the radiometric and geometric distortions caused by view conditions and sun angles, herein, we use a robust feature matching method to obtain high-precision correspondences. Based on our previous work [25], we extract intensity-invariant and sensor-invariant structure features and put them into the 3D phase correlation framework. Traditional phase correlation methods are based on two core ideas. The first is the assumption of brightness constancy, that is, the intensity of pixels in an image with a certain offset remains unchanged. The second is the famous Fourier transform characteristic, which means that the Fourier phases of two functions that have an offset in the spatial domain have a linear difference. These two core ideas are given as follows:

I_{1} (x, y) = I_{2} (x - x_{0}, y - y_{0}), F_{1} (u, v) = F_{2} (u, v) exp \{- 2 π i (u x_{0} + v y_{0})\},

(23)

where

I_{1}

and

I_{2}

are two images,

F_{1}

and

F_{2}

are the corresponding Fourier Transforms, and

x_{0}

and

y_{0}

are the offsets of the coordinates. The normalized cross power spectrum can be obtained:

Q (u, v) = \frac{F_{1} (u, v) F_{2} {(u, v)}^{*}}{|F_{1} (u, v) F_{2} {(u, v)}^{*}|} = exp \{- 2 π i (u x_{0} + v y_{0})\} .

(24)

The inverse Fourier transform of the cross power spectrum is the impulse function in the spatial domain. By locating the extreme points in the spatial domain, the offsets in the two directions can be found:

Q (x, y) = δ (x - x_{0}, y - y_{0})

(25)

However, for multi-view and multi-sensor remote sensing images, there are often radiation and geometric differences between them, which make the first core idea of the phase correlation method no longer applicable. To solve this problem, we first map the input images to a feature space, which retains the invariance by capturing the inherent structural information. Two three-dimensional feature sets are formed and embedded in the three-dimensional phase correlation process to estimate the two-dimensional translational displacement by using dense spatial structure feature representation instead of image intensity.

For spatial structure features, we use dense scale invariant transform features [26]. First, we calculate the horizontal and vertical gradient response of each pixel. Then, we obtain the histogram structure of the gradient position and direction based on the gradient magnitude and direction to construct the spatial feature. Finally, we embed two 3-D feature images into the 3-D phase correlation process, as shown below:

V_{1} (x, y, g) = V_{2} (x - x_{0}, y - y_{0}, g),

(26)

where

V_{1}

and

V_{2}

are two 3-D feature images. x and y are the coordinates of the pixels in the image. g represents the gradient magnitude of the pixel. The Fourier translation characteristics of the 3-D phase correlation are

F \{V_{2} (x - x_{0}, y - y_{0}, g)\} = e^{- 2 π i (u x_{0} + v y_{0})} F \{V_{2} (x, y, g)\} .

(27)

The normalized cross power spectrum of the above equation is

\begin{matrix} Q (u, v, w) & = \frac{F_{1} (u, v, w) F_{2} {(u, v, w)}^{*}}{|F_{1} (u, v, w) F_{2} {(u, v, w)}^{*}|} = exp \{- 2 π i (u x_{0} + v y_{0})\} \vec{d}, \end{matrix}

(28)

where

F_{1}

and

F_{2}

are the three-dimensional Fourier transforms corresponding to

V_{1}

and

V_{2}

.

\vec{d}

is a three-dimensional unit vector, and its direction is the direction of the gradient of the pixel. The three-dimensional inverse Fourier Transform of Equation (28) is an impulse function throughout three-dimensional space:

Q (x, y, g) = F^{- 1} \{Q (u, v, w)\} = δ (x - x_{0}, y - y_{0}) \vec{d}

(29)

The biggest difference between

Q (x, y, g)

and

Q (x, y)

is that we need to locate the impulse response in three-dimensional space instead of two-dimensional space. A simple and direct method is to sum the third dimension information in

Q (x, y, g)

and convert it into

Q (x, y)

to solve. Finally, two-dimensional translational displacement can be estimated by locating the maximum response, and then correspondences are obtained.

2.2.4. Fundamental Matrix Estimation Based on Conjugation Points

In Section 2.1, we introduced the epipolar constraint relationship between image pairs. The epipolar constraint can also be expressed in mathematical form, that is, the fundamental matrix. The fundamental matrix F represents the mapping from

p_{l}

in the left image to the epipolar curve

l_{r}

in the right image. This relationship is expressed as

l_{r} = F p_{l}

[27].

Based on the epipolar constraint, the corresponding right image point

p_{r}

must be on the epipolar curve

l_{r}

, and

p_{r}^{T} l_{r} = p_{r}^{T} F p_{l} = 0 .

(30)

When the coordinates of the matching points in the image pairs are obtained from the aforementioned feature matching method, combined with Equation (30), F between the image pairs can be calculated. As F is a

3 \times 3

matrix with 9 unknowns in total, at least eight pairs of matching points are required to solve F. Therefore, we use the Eight-Point-Algorithm to calculate F [28].

2.2.5. Image Pairs Correction Based on Fundamental Matrix

Combining the matching points of the left and right images

i m g_{1}

and

i m g_{2}

, and the fundamental matrix F, the image pairs can be corrected to generate epipolar images. In this paper, we use the Hartley method [29] to correct the image pairs. This method can be implemented in two steps.

First of all, we need to calculate the projection transformation matrices

H_{l}

and

H_{r}

based on the fundamental matrix [30] and this method is the same as the epipolar rectification technique of the Ames Stereo pipeline [31]. The projection transformation matrix H can be obtained by multiplying four matrices, that is,

H = C G R T

. The four matrices C, G, R, and T, respectively, represent different transformation relations. We assume that the homogeneous coordinates of the pole of the right image is

{(u_{e}, v_{e}, 1)}^{T}

, and the homogeneous coordinates of the center point of the right image is

{(u_{c}, v_{c}, 1)}^{T}

. Then, we can obtain the projection transformation matrix

H_{r}

corresponding to the right image as follows:

H_{r} = C G R T = [\begin{matrix} 1 & 0 & u_{c} \\ 0 & 1 & v_{c} \\ 0 & 0 & 1 \end{matrix}] [\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ - \frac{1}{u_{e}^{2}} & 0 & 1 \end{matrix}] [\begin{matrix} \frac{u_{e}^{1}}{\sqrt{{(u_{e}^{1})}^{2} + {(v_{e}^{1})}^{2}}} & \frac{v_{e}^{1}}{\sqrt{{(u_{e}^{1})}^{2} + {(v_{e}^{1})}^{2}}} & 0 \\ - \frac{v_{e}^{1}}{\sqrt{{(u_{e}^{1})}^{2} + {(v_{e}^{1})}^{2}}} & \frac{u_{e}^{1}}{\sqrt{{(u_{e}^{1})}^{2} + {(v_{e}^{1})}^{2}}} & 0 \\ 0 & 0 & 1 \end{matrix}] [\begin{matrix} 1 & 0 & - u_{c} \\ 0 & 1 & - v_{c} \\ 0 & 0 & 1 \end{matrix}],

(31)

where

{(u_{e}^{1}, v_{e}^{1}, 1)}^{T} = T {(u_{e}, v_{e}, 1)}^{T}

and

{(u_{e}^{2}, v_{e}^{2}, 1)}^{T} = R {(u_{e}^{1}, v_{e}^{1}, 1)}^{T}

.

Assuming that among the N pairs of matching points obtained in the Section 2.2.3, the coordinates of the i-th pair of matching points in the left and right images are

p_{1}^{i} = {(u_{1}^{i}, v_{1}^{i})}^{T}

and

p_{2}^{i} = {(u_{2}^{i}, v_{2}^{i})}^{T}

, respectively. Then,

H_{l}

can be computed by the following equation, where d denotes the distance between

H p_{1}^{i}

and

H_{r} p_{2}^{i}

.

H_{l} = \underset{H}{a r g} m i n \sum_{i = 1}^{N} d {(H p_{1}^{i}, H_{r} p_{2}^{i})}^{2} .

(32)

Second, we resample the pair of image blocks

i m g_{1}

and

i m g_{2}

after calculating the projection transformation matrices

H_{l}

and

H_{r}

. We assume that these two matrices

H_{l}

and

H_{r}

can map vectors to the same plane J. Moreover, in the plane J, there is a window W with a size of 1024 × 1024 pixels. Then,

i m g_{1}

and

i m g_{2}

can be resampled through

H_{1}

and

H_{r}

, so that the resampled two images are within the range of window W. For any point

p^{'}

in W, its corresponding point in

i m g_{1}

can be expressed as

p_{l} = H_{l}^{- 1} p^{'}

. Similarly, the corresponding point of

p^{'}

in

i m g_{2}

is

p_{r} = H_{r}^{- 1} p^{'}

. Then,

i m g_{1}

and

i m g_{2}

can be resampled to the window W in plane J which are denoted as

i m g_{1}^{'}

and

i m g_{2}^{'}

, through bicubic interpolation.

Finally, we crop the middle part from

i m g_{1}^{'}

and

i m g_{2}^{'}

as the final epipolar images, denoted as

e p_i m g_{1}

and

e p_i m g_{2}

.

2.3. DSM Generation

We use Semi-Global Matching (SGM) [32] which was proposed by Hirschmüller to obtain pixelwise matching points from

e p_i m g_{1}

and

e p_i m g_{2}

. The realization process of the stereo matching algorithm can be summarized into the following four steps: matching cost calculation, matching cost aggregation, disparity calculation, and disparity optimization [33].

We set the left epipolar image

e p_i m g_{1}

as the reference image, after obtaining the disparity map of the image pair based on the stereo matching method. Through the back-projection transformation, we calculate the coordinates of each pixel in

e p_i m g_{1}

and

e p_i m g_{2}

in

I m a g e_{1 P}

and

I m a g e_{2}

, respectively; their coordinates are denoted as

(u_{1 p}, v_{1 p})

and

(u_{2}, v_{2})

. It is necessary to convert the coordinates in

I m a g e_{1 P}

to the coordinates in the original image

I m a g e_{1}

in order to further generate DSM. We substitute

R P C_{2}

into Equation (22) and use the ray tracing method based on DEM to calculate the spatial coordinates

(X, Y, Z)

corresponding to

(u_{1 p}, v_{1 p})

. Then, by substituting

(X, Y, Z)

into Equation (21) formed by

R P C_{1}

, we obtain the corresponding coordinates of

(u_{1 p}, v_{1 p})

in

I m a g e_{1}

. Finally, we can produce DSM based on the space forward intersection method.

3. Results

In this section, we first evaluate the performance of the proposed framework by comparing it to the current state-of-the-art algorithms. The first method is an epipolar image generation algorithm based on the PTM [21]. The second method is the S2P method [24]. Various scenarios of different sensors, including urban, agriculture, and mountain areas, are used. Then, we explore the applicability of our framework on SAR–optical and SAR–SAR image pairs. In the comparative experiment, the disparities of the matching points are used as the quantitative metric to evaluate the quality of the generated epipolar images. The satellite images used in our experiment include ZY-3, GF-2, GF-3, and GeoEye images. Among them, ZY-3, GF-2, and GeoEye are optical satellites, and GF-3 is a SAR satellite. The coverage area of these image data and their information are shown in Table 1.

3.1. Experimental Results of Epipolar Images Generation

To verify the performance of the proposed approximate epipolar image generation framework, we first select the ZY-3 satellite images of the Songshan area in China, which is mostly mountainous. The elevation range of Songshan is approximately 67–1472 m. In order to further verify the performance of our framework, the results obtained by our framework are compared to two other methods.

The epipolar images generated by the proposed framework is shown in Figure 10c. From the results, it can be found that the vertical disparity of the generated epipolar image pair is approximately negligible, which means that satisfactory results can be obtained for areas with large terrain elevation changes that do not satisfy the affine transformation model. In addition, the epipolar images generated by two other comparison methods in the same region are shown in Figure 10a,b. In order to make the comparison of results more intuitive, we manually select two corresponding points in the epipolar images and display their coordinates

(x, y)

, where x is the horizontal coordinate and y is the vertical coordinate. The results show that the y-disparity of the epipolar images generated by these three methods is relatively small and can be approximately ignored. However, in addition to the y-disparity, the x-disparity of the points in Figure 10c is almost negligible. Therefore, it is clear that compared with the other two methods, the proposed framework can get better epipolar images.

We further calculate the y-disparity of the SIFT matching points in the epipolar image pair to quantitatively evaluate the results. The smaller this y-disparity, the higher the accuracy of the generated epipolar images. Moreover, when performing stereo matching, the reduction of the search range of matching points will improve the accuracy and efficiency of image matching. Therefore, we also calculate the x-disparity of the SIFT matching points at the same time. When the x-disparity is small, the matching efficiency and accuracy during dense matching of epipolar images will be improved. The disparity of matching points is shown in Table 2. For the sake of comparison, the data in the tables are the statistical results of the absolute value of the disparity.

It can be found that the proposed framework can generate high-quality epipolar images in regions with large terrain undulations according to the image matching results. For the y-disparity, the results obtained by our framework are slightly smaller than the results obtained by the S2P and PTM-based methods. Although the proposed framework only shows a small advantage in y-disparity, it has obvious advantages in x-disparity. The epipolar images obtained by our framework also have higher similarity in the x-direction and the average x-disparity of the matching points can reach the sub-pixel level. The results show that the epipolar images generated by our framework can rearrange the original images better so that the corresponding points of the image pair are approximately on the same horizontal line, thereby reducing the search range of conjugate points in image matching to 1-D space. In addition, the generated epipolar images can also reduce the disparity in the x-direction, which provides a guarantee for the efficiency and accuracy of the dense matching before the subsequent DSM generation.

We also conduct experiments on ZY-3 images in Tianshan, China, which are shown in Figure 11. Compared with the Songshan area, the terrain changes in this area are more obvious, approximately 24–3091 m. The quality of the epipolar images generated in this area is quantitatively evaluated, and the results are shown in Table 3. Compared with the Songshan area, the elevation of the Tianshan area has greater fluctuations, so the x-disparity of the generated epipolar image will be greater. However, by comparing the results of the other two methods, it can be found that the x-disparity of the epipolar image generated by the proposed framework is still optimal among the three methods. The results show that in areas with large elevation fluctuations, not only the y-disparity can reach sub-pixels in the epipolar images generated by our framework, but most of the average x-disparity does not exceed 3 pixels. Therefore, our epipolar image generation framework has more obvious advantages for areas with large undulations like Tianshan.

3.2. Verification of the Applicability of the Proposed Framework

It has been verified in Section 3.1 that the proposed framework can generate good epipolar images for areas with large terrain elevation changes. We select stereo image pairs from other satellites and other regions to generate epipolar images to verify the applicability of the proposed framework.

3.2.1. The Results of Epipolar Image Generation for Different Terrains

We choose the Omaha region of the United States to verify the applicability of the proposed framework in regions with small terrain undulations. The results of the comparative experiment are shown in Figure 12 and Table 4. The results show that the proposed framework can still achieve ideal results for urban areas with smaller terrain undulations and larger elevation gradients than mountain areas. Compared with the S2P method, the proposed framework has no obvious advantages for the y-disparity of the epipolar image in the urban area. But the x-disparity is still relatively small. The mean values of the x-disparity of the corresponding points are both about 1 to 2 pixels. Therefore, the proposed epipolar image generation framework is also applicable to areas with small terrain undulations and large terrain undulation gradients.

3.2.2. The Results of Epipolar Image Generation for Different Sensors

We conduct experiments on satellite images from other sources to further verify the universality of the proposed epipolar image generation framework. We choose the stereo image pair consisting of GF-3 satellite image and GF-2 satellite image in Songshan in this experiment. We first generate epipolar images based on the SAR-optical image pair. Due to the particularity of SAR imaging, we adopt different structural features in the matching stage for SAR images [25]. The epipolar images obtained using the proposed framework are shown in Figure 13. The average y-disparity of the image matching points can reach approximately 1 pixel, and the average x-disparity is less than 2 pixels (see Table 5). It can be seen from the comparison results that although the S2P method can generate epipolar images with a small y-disparity, the x-disparity is large. Moreover, the epipolar images are distorted, which is not conducive to subsequent image matching. Therefore, the proposed epipolar images generation framework is also applicable for SAR-optical satellite stereo image pairs.

In addition to the epipolar image generation experiment on the SAR-optical image pairs, we also perform a verification experiment on the SAR–SAR image pair. We choose GF-3 images as our SAR image source and the corresponding area of the images used in Omaha. The epipolar images generated by GF-3 image pairs in the Omaha area are shown in Figure 14 and the results of the comparative experiment are shown in Table 6. By comparing with the S2P method, it can be found that the proposed framework has obvious advantages for the generation of epipolar images of SAR–SAR image pairs. The average value of the x-disparity and y-disparity of the epipolar images we generated can reach the sub-pixel level. And the epipolar image pair is not distorted, which is conducive to the dense matching of the epipolar image.

In summary, the proposed epipolar images generation framework can be applied to areas with large and small topographic undulations through the above experiments. Moreover, it can be used not only for the epipolar image generation of optical image pairs but also for the epipolar image generation of SAR–optical image pairs and SAR–SAR image pairs.

4. Discussion

The results in Section 3 have shown that the proposed framework can generate better quality epipolar images. Next, we evaluate the performance of DSM generation using the generated epipolar images based on the previous experiment.

In the DSM generation experiment, we use the ZY-3 satellite images with a ground sampling distance of 3.5 m. Then, the generated epipolar images of Songshan are tested to obtain the corresponding DSM through the SGM-based stereo matching method and the Space Intersection method, and the result is shown in Figure 15. We compare the generated DSM with the DSM generated by other methods and the existing DSM to evaluate the accuracy of the DSM generated by our framework, based on the GCPs. We choose AW3D30 DSM and DSM generated by ENVI as comparison data. Among them, AW3D30 (ALOS World 3D-30 m) is a DSM data set released by the Japan Aerospace Exploration Agency(JAXA), with a grid resolution of ~30 m. This DSM data set is obtained by using PRISM on the land observation satellite ALOS to image the same object using three cameras with different perspectives [34]. In addition, the other DSM for comparison is obtained through the photogrammetry extension module in ENVI-v5.6 [35]. The ENVI software uses the metadata information of the ZY-3 original image pair to generate DSM.

Comparing the generated DSM with GCPs, the absolute values of the difference between the elevation is mostly within 4 m, and the average absolute difference is 2.33 m. The histogram of the elevation accuracy is shown in Figure 16. Table 7 lists the accuracy of DSM at the GCPs marked in Figure 15. Among them, the vertical precision of AW3D30 DSM in this area is ~20 m. Moreover, the vertical precision of ENVI-DSM and the generated DSM are both about 2 m. Therefore, the results show that the accuracy of the DSM we generated is greatly improved compared to AW3D30 DSM, and it can achieve higher accuracy than the DSM generated by ENVI. Therefore, the proposed framework can generate good quality epipolar images, which can be used to generate high accuracy DSM.

5. Conclusions

In this paper, we proposed a framework for generating epipolar images based on DEM constraints. DSM can be generated by stereo matching using the obtained approximate epipolar images. Through feature matching of the generated epipolar images, the epipolar images generated by our framework have higher precision and versatility, compared with the existing PTM method and the S2P method. The advantages of the proposed framework can be summarized as follows.

The epipolar images generated by the proposed framework can better achieve the goal of restricting the search range of matching points from 2-D to 1-D for areas with large terrain undulations. Moreover, it is found that the matching points of the epipolar images have not only a small y-disparity, but also a small x-disparity. This shows that the search range of matching points can be further reduced, and the accuracy and speed of stereo matching will be improved, which is beneficial to ensure the efficiency of subsequent stereo matching work.
This framework is also suitable for areas with small terrain undulations and can achieve ideal results. This can ensure that our framework can be used to generate epipolar images for the entire image with various scenarios. In addition to optical image pairs, this framework can be also applied to SAR-optical image pairs and SAR-SAR image pairs. This means that our framework has good versatility.
The epipolar images generated by this framework can be applied to the generation of high-accuracy DSM. For optical image pairs, the average value of the elevation difference between the DSM generated by the proposed framework and the GCPs is no more than 3 m, which has good accuracy. That is, the proposed epipolar image generation framework can be used in subsequent DSM generation work.

Therefore, the epipolar image generation framework proposed in this paper is suitable for different terrains and various sensors, and a high-precision DSM can also be obtained. The influence of the size of image blocks on the accuracy of the generated DSM needs to be further verified as our framework divides the image into blocks. In addition, we also conducted experiments on the generation of DSM from SAR-SAR image pairs, but there are still some invalid and distorted values in the DSM. Therefore, we will investigate robust stereo matching methods for SAR images in our future works.

Author Contributions

Conceptualization, Y.X. and X.W.; methodology, X.W., F.W., and Y.X.; software, X.W. and Y.X.; validation, X.W. and Y.X.; formal analysis, X.W. and Y.X.; investigation, X.W. and Y.X.; resources, F.W. and Y.X.; data curation, X.W. and Y.X.; writing—original draft preparation, X.W.; writing—review and editing, Y.X., F.W., and H.Y.; visualization, X.W. and Y.X.; supervision, Y.X., F.W., and H.Y.; project administration, F.W. and H.Y.; funding acquisition, F.W. and H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant 61901439 and in part by the Key Research Program of Frontier Science, Chinese Academy of Sciences, under Grant ZDBS-LY-JSC036.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The AW3D30 DSM data were obtained from https://www.eorc.jaxa.jp/ALOS/en/aw3d30/data/index.htm accessed on 20 September 2021. And the data supporting the study are available from the authors upon reasonable request.

Acknowledgments

We thank the reviewers for their valuable comments and suggestions. We also thank the production team for revising the format of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Wang, M.; Hu, F.; Li, J. Epipolar arrangement of satellite imagery by projection trajectory simplification. Photogramm. Rec. 2010, 25, 422–436. [Google Scholar] [CrossRef]
Morgan, M.F. Epipolar Resampling of Linear Array Scanner Scenes. PhD Thesis, University of Calgary, Calgary, AB, Canada, 2004. [Google Scholar]
Zhang, Z.; Zhou, Y. A New Approach to Arrange the Approximate Epipolar Lines for SPOT Images. J. Wuhan Tech. Univ. Surv. Mapp. 1989. Available online: https://en.cnki.com.cn/Article_en/CJFDTotal-WHCH198902002.htm (accessed on 20 September 2021).
Kim, T. A Study on the Epipolarity of Linear Pushbroom Images. Photogramm. Eng. Remote. Sens. 2000, 66, 961–966. [Google Scholar]
Oh, J.H.; Shin, S.W.; Kim, K.O. Direct Epipolar Image Generation From IKONOS Stereo Imagery Based On RPC and Parallel Projection Model. Korean J. Remote. Sens. 2006, 22, 451–456. [Google Scholar]
Wang, M.; Hu, F.; Li, J. Epipolar resampling of linear pushbroom satellite imagery by a new epipolarity model. Isprs J. Photogramm. Remote. Sens. 2011, 66, 347–355. [Google Scholar] [CrossRef]
Grodecki, J. IKONOS stereo feature extraction—RPC approach. In ASPRS Annual Conference St. Louis; 2001; Available online: https://www.researchgate.net/profile/Jacek-Grodecki/publication/228845277_IKONOS_stereo_feature_extraction-RPC_approach/links/5cacdc78458515cd2b0d25e2/IKONOS-stereo-feature-extraction-RPC-approach.pdf (accessed on 20 September 2021).
Oh, J.; Lee, W.H.; Toth, C.K.; Grejner-Brzezinska, D.A.; Lee, C. A Piecewise Approach to Epipolar Resampling of Pushbroom Satellite Images Based on RPC. Photogramm. Eng. Remote. Sens. 2010, 76, 1353–1363. [Google Scholar] [CrossRef]
Koh, J.W.; Yang, H.S. Unified piecewise epipolar resampling method for pushbroom satellite images. Eurasip J. Image Video Process. 2016, 2016, 11. [Google Scholar] [CrossRef] [Green Version]
Xiang, Y.; Wang, F.; You, H. OS-SIFT: A Robust SIFT-Like Algorithm for High-Resolution Optical-to-SAR Image Registration in Suburban Areas. IEEE Trans. Geosci. Remote. Sens. 2018, 56, 3078–3090. [Google Scholar] [CrossRef]
Bagheri, H.; Schmitt, M.; D’Angelo, P.; Zhu, X.X. A Framework for SAR-Optical Stereogrammetry over Urban Areas. ISPRS J. Photogramm. Remote. Sens. 2018, 146, 389–408. [Google Scholar] [CrossRef]
Qiu, C.; Schmitt, M.; Zhu, X.X. Towards automatic SAR-optical stereogrammetry over urban areas using very high resolution imagery. ISPRS J. Photogramm. Remote. Sens. 2018, 138, 218–231. [Google Scholar] [CrossRef] [PubMed]
Gutjahr, K.; Perko, R.; Raggam, H.; Schardt, M. The Epipolarity Constraint in Stereo-Radargrammetric DEM Generation. IEEE Trans. Geosci. Remote. Sens. 2014, 52, 5014–5022. [Google Scholar] [CrossRef]
Pan, H.B.; Guo, Z.; Chen, T. A general method of generating satellite epipolar images based on RPC model. In Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2011, Vancouver, BC, Canada, 24–29 July 2011. [Google Scholar]
Perko, R.; Gutjahr, K.; Krüger, M.; Raggam, H.; Schardt, M. DEM-based Epipolar Rectification for Optimized Radargrammetry. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Fort Worth, TX, USA, 23–28 July 2017. [Google Scholar]
Zhang, Z.; Yang, S.; Zhang, J.; Tao, K.E. Multi-Baseline Digital Close-Range Photogrammetry. Geospat. Inf. 2007, 5, 1–4. [Google Scholar]
Cho, W.; Schenk, T.; Madani, M. Resampling Digital Imagery to Epipolar Geometry. In Reports of the Department of Geodetic Science and Surveying; Ohio State University: Columbus, OH, USA, 1992. [Google Scholar]
Jia, K.X.; Xi, C.; Zheng, F.H. Study on the DSM Production and Application Based on Rapid Produced Three- Line Array Images of ZY- 3. Geomat. Spat. Inf. Technol. 2015, 38, 111–113. [Google Scholar]
Lemaître, M.; Verfaillie, G.; Jouhaud, F.; Lachiver, J.M.; Bataille, N. Selecting and scheduling observations of agile satellites. Aerosp. Sci. Technol. 2002, 6, 367–381. [Google Scholar] [CrossRef]
Meric, S.; Fayard, F.; Pottier, É. A Multiwindow Approach for Radargrammetric Improvements. IEEE Trans. Geosci. Remote. Sens. 2011, 49, 3803–3810. [Google Scholar] [CrossRef] [Green Version]
Hu, F.; Wang, M.; Li, D.; Xiao, M. Approximate Epipolar Rearrangement Algorithm of Satellite Stereo-imagery by Projection Trajectory. Geomat. Inf. Sci. Wuhan Univ. 2009, 34, 1431–1435. [Google Scholar]
Wilson, W.S.; Apel, J.R. The Principles of Photogrammetry, 2nd ed.; Wuhan University Press: Wuhan, China, 2009. [Google Scholar]
Teo, T.A.; Chen, L.C.; Liu, C.L.; Tung, Y.C.; Wu, W.Y. DEM-Aided Block Adjustment for Satellite Images With Weak Convergence Geometry. IEEE Trans. Geosci. Remote. Sens. 2010, 48, 1907–1918. [Google Scholar]
de Franchis, C.; Meinhardt-Llopis, E.; Michel, J.; Morel, J.M.; Facciolo, G. On stereo-rectification of pushbroom images. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 5447–5451. [Google Scholar] [CrossRef]
Xiang, Y.; Tao, R.; Wan, L.; Wang, F.; You, H. OS-PC: Combining Feature Representation and 3-D Phase Correlation for Subpixel Optical and SAR Image Registration. IEEE Trans. Geosci. Remote. Sens. 2020, 58, 6451–6466. [Google Scholar] [CrossRef]
Lowe, D. Distinctive image features from scaleinvariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar] [CrossRef] [Green Version]
Hartley, R.I. In defense of the eight-point algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 580–593. [Google Scholar] [CrossRef] [Green Version]
Hartley, R.I. Theory and Practice of Projective Rectification. Int. J. Comput. Vis. 1999, 35, 115–127. [Google Scholar] [CrossRef]
Ozcanli, O.C.; Yi, D.; Mundy, J.L.; Webb, H.; Hammoud, R.; Tom, V. A comparison of stereo and multiview 3-D reconstruction using cross-sensor satellite imagery. In Proceedings of the Computer Vision & Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
Beyer, R.A.; Oleg, A.; Scott, M.M. The Ames Stereo Pipeline: NASA’s Open Source Software for Deriving and Processing Terrain Data. Earth Space Sci. 2018, 5, 537–548. [Google Scholar] [CrossRef]
Hirschmüller, H. Stereo Processing by Semi-Global Matching and Mutual Information. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 328–341. [Google Scholar] [CrossRef] [PubMed]
Scharstein, D.; Szeliski, R.; Zabih, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. In Proceedings of the IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001), Kauai, HI, USA, 9–10 December 2001; pp. 131–140. [Google Scholar] [CrossRef]
Takaku, J.; Tadono, T.; Doutsu, M.; Ohgushi, F.; Kai, H. UPDATES OF ‘AW3D30’ ALOS GLOBAL DIGITAL SURFACE MODEL IN ANTARCTICA WITH OTHER OPEN ACCESS DATASETS. Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 2021, 43, 401–408. [Google Scholar] [CrossRef]
ENVI-IDL Technology Hall. Extract DSM and Point Cloud Data Based on GF-7 Stereo Pair. 2021. Available online: http://blog.sina.com.cn/s/blog_764b1e9d010302tm.html (accessed on 15 April 2021).

Figure 1. A schematic diagram of the epipolar geometry relationship.

O_{l}

and

O_{r}

are respectively two cameras’ projection points. The projection of point P onto the left and right image are

p_{l}

and

p_{r}

.

l_{r}

is the epipolar curve of point

p_{l}

, and there has that

p_{r}

must be on

l_{r}

.

Figure 1. A schematic diagram of the epipolar geometry relationship.

O_{l}

and

O_{r}

are respectively two cameras’ projection points. The projection of point P onto the left and right image are

p_{l}

and

p_{r}

.

l_{r}

is the epipolar curve of point

p_{l}

, and there has that

p_{r}

must be on

l_{r}

.

Figure 2. Epipolar curves generated by PTM. The red and green dots represent conjugate point pairs. The red curves are the epipolar curves corresponding to the red dots. (a,b) GF-2 image pair. (c,d) GF-3 image pair.

Figure 3. The fitting results of points on the projection trajectory of the GF-2 image pair. (a) Linear fitting residual. (b) Hyperbolic fitting residual.

Figure 4. The fitting results of points on the projection trajectory of the GF-3 image pair. (a) The linear fitting residual. (b) The quartic polynomial fitting residual.

Figure 5. The fitting results of points on the projection trajectory of the GF-2–GF-3 image pair. (a) The linear fitting residual. (b) The quadratic polynomial fitting residual.

Figure 6. The fitting results of points on the projection trajectory of the GF-3–GF-2 image pair. (a) The linear fitting residual. (b) The high order polynomial fitting residual.

Figure 7. (a) The linear fitting results of the epipolar curves in the left image corresponding to Points 1–6. These 6 points are selected from the same epipolar curve in the right image. (b) is the linear fitting results of the epipolar curves in the right image. Points 1–4 are the four randomly selected points in the left image.

Figure 8. The flowchart of the proposed framework. The gray blocks indicate the processing of the entire image pairs, and the green blocks indicate the parallel processing of the image blocks.

Figure 10. The epipolar images of two regions generated from ZY-3 images in Songshan. (a) The epipolar images generated by the S2P method. (b,c) The epipolar images generated by the PTM method and the proposed framework, respectively.

(x, y)