1. Introduction
A Digital Surface Model (DSM) is a fundamental dataset in remote sensing applications. Synthetic Aperture Radar (SAR) images can be used to construct Digital Surface Models. Four methods can be used to extract height from SAR images: clinometry, interferometry, radargrammetry, and polarimetry [
1]. InSAR (Interferometric Synthetic Aperture Radar) and stereo radargrammetry however, are the most widely used techniques among them. For instance, SRTM DEM [
2] and TanDEM-X DEM [
3] are the most representative InSAR products, while StereoSAR DSMs are generated from SAR images with different incidence angles.
Depending on phase information of SAR imagery, the InSAR technique is more prone to decorrelation related to temporal changes in objects, large spatial baselines, or turbulent atmospheric conditions, hampering the achievable DSM quality. Radargrammetry however, is an alternative to InSAR processing for DSM retrieval as it makes use of amplitude information and thus is less susceptible to the decorrelation effect. For a given pair of images, usually either an InSAR or radargrammetry technique can be applied: the former requires small changes in illumination angle to acquire repeat-pass images, whereas the latter requires large baselines to yield stable intersection of line-of-sight rays. Radargrammetry is analogous to photogrammetry where disparities of correspondences on stereo images are computed by image matching and used to derive 3D point clouds. Therefore, the accuracy of image matching directly determines the quality of final DSMs. Much research has been devoted to stereo image matching.
According to Scharstein and Szeliski [
4], stereo matching methods can be divided into two categories: local matching and global matching algorithms. In addition, stereo matching algorithms are generally composed of four steps: matching cost calculation, cost aggregation, disparity computation, and disparity refinement. Based on this taxonomy and distinctions of the algorithmic building blocks, the performance of a variety of stereo algorithms can be effectively evaluated and improved.
Local matching algorithms, namely, window-based algorithms, compute disparity of correspondences utilizing only the intensity values within a finite window centered by the point to match. Usually, local algorithms consist of two steps: matching cost computation and cost aggregation. Matching cost is computed from area-based similarity measures. Common similarity measures include normalized cross-correlation (NCC), sum of absolute differences (SAD), sum of squared differences (SSD), as well as census transformation [
5], and mutual information (MI) [
6]. Cost aggregation is realized by operation on the intensity values within a support window such as box-filters or weighted sums. Disparity is determined simply with the minimum matching cost by the “winner-take-all” (WTA) strategy. Local algorithms have high computational efficiency but with several disadvantages. The use of a support window implicitly assumes that the disparities between the pixels within the window are consistent. This violates real world conditions as discontinuities are ubiquitous; local algorithms yield sparse and inaccurate correspondences and derive ambiguous results. Moreover, the obtained disparity for each pixel is locally optimized; no matter if a fixed or an adaptive window is selected. The size of the window also influences matching performance.
Global matching algorithms, however, use all the information from all the pixels in the image to estimate disparities. Typically, cost aggregation is skipped and disparity computation is the key step. Global algorithms are formulated as an energy function, combining a data term and a smoothness term. The data term is composed of matching costs, and the smoothness term is an explicit expression of shape priors, which can be specifically designed. By introducing the smoothness term, global algorithms can implement regularization constraints to adjacent pixel disparities to preserve discontinuities. Therefore, global algorithms behave robustly even in occlusive, discontinuous, or textureless regions. Disparities are obtained when the global energy is at the minimum. This, however, is an NP-hard problem [
7], thus optimization methods such as belief propagation (BP) [
8] and graph cut (GC) [
7,
9] are used to approximate the minimum solution of the global energy. The disparity quality of global algorithms is precise but at a high computational and memory cost.
The semi-global matching (SGM) algorithm proposed by Hirschmüller [
10,
11,
12] achieves a compromise between accuracy and computational efficiency. SGM approximates the 2D energy optimization problem by aggregating cost along several 1D scanlines, greatly reducing the processing time. Pixel-wise disparity is determined by adding up matching costs from all directions and selecting the disparity where the cost is minimal. Beyond its accuracy and fast matching speed, the ability to retrieve homogeneous surfaces as well as preserve sharp height discontinuities makes SGM one of the most effective matching algorithms. Modifications to the original SGM are an active area of research and there are a number of modified versions of SGM like tSGM [
13], weighted SGM [
14], iterative SGM [
15], SGM forest [
16], object-based SGM [
17], and so forth.
In terms of radargrammetric DSM generation, local matching has been the dominant matching approach since the 1980s [
18,
19,
20,
21,
22]. No work has been published to investigate the feasibility and effectiveness of performing pixel-wise dense matching methods for more than two decades. Until 2014, a hierarchical semi-global matching was used to retrieve DSM from TerraSAR-X stereo imagery, demonstrating that the epipolar geometry concept widely used in optical stereo methods could also be applicable to satellite SAR stereo pairs [
23]. The authors also applied the same approach to TerraSAR-X staring Spotlight triplets [
24] for 3D mapping. They further improved the epipolar rectification method in [
25] where the SAR image pairs are projected onto a coarse a priori DEM (e.g., SRTM, ASTER) instead of a plane. However, the work in [
23,
24,
25] focused on the study of the space-borne SAR epipolar model, and we can only know some major parameters and that a constant penalty function was adopted from [
23], no parameters or details of the SGM algorithm were presented in [
24] or [
25].
In 2016, Di Rita et al. developed a radargrammetric DSM workflow termed the DATE workflow [
26,
27,
28] that exploits the OpenCV library Semi-Global Block Matching (SGBM) algorithm [
29,
30]. The workflow also implements a coarse-to-fine matching method, which computes a disparity map on each pyramid level. The DATE workflow does not transfer disparity from the higher level to a lower level to reduce the disparity search range for a lower-level matching; instead, it converts the disparities to height correction values at every level. The final DSM is obtained by iteratively adding height corrections to a coarse a-priori DSM at each pyramid level. The core of the DATE workflow relies on the a priori DSM hierarchically refined to generate a final DSM. In addition, the SGBM algorithm in OpenCV sets a fixed disparity search range for all pixels, which could lead to more outliers and voids in the output disparity map. Furthermore, users must set and adjust the minimum disparity value, the disparity range, and the matched block size [
31], all of which influence the final DSM accuracy. In 2018, Bagheri et al. proposed a semi-global matching framework to investigate the potential of 3D reconstruction from SAR-optical image pairs and feasibility of generating point clouds with a median accuracy [
32], but the framework is designed for extracting sparse homologous points from SAR-optical stereo pair and not for DSM reconstruction.
In this study, we propose a hierarchical SGM dense matching pipeline to generate high-quality radargrammetric DSMs in densely vegetated and mountainous areas. The feasibility and effectiveness of the semi-global dense matching algorithm is investigated using Stripmap and Spotlight mode TerraSAR-X stereo data pairs covering Mount Song in central China. Furthermore, we investigated the influence of three penalty functions on the vertical accuracy of final DSMs. The pipeline was built in a user-friendly manner where only two penalty parameter values need to be set. Without any post-processing, the disparity map can be directly used for generating final DSMs. A high-resolution airborne photogrammetric DSM was used to validate the radargrammetric DSMs. NASA’s 30 m resolution SRTM DEM and DSMs extracted by hierarchical NCC matching approach were also included to compare the vertical accuracy. The results demonstrate that, our SGM pipeline not only produces DSMs with higher vertical accuracy than the SRTM DEM and NCC-derived DSMs, but is also more efficient than the NCC local matching method. In addition, a penalty function exploiting the Canny edge detector delivers higher vertical accuracy than the constant penalty function, or the gray gradient penalty function in stereo SAR semi-global matching. Semi-global matching is a powerful alternative tool to local matching algorithms with higher accuracy and efficiency for radargrammetric DSM generation in complex mountainous areas.
2. Materials and Methods
We implemented a hierarchical semi-global matching method—a tSGM-like method, for matching stereo SAR images. After preprocessing the SAR images, epipolar images are generated from stereo SAR amplitude images, and Gaussian pyramid images are derived from the epipolar pair. Consequently, SGM is executed on each pyramid level to produce a disparity map. The disparity map of a higher level is transferred to a lower level, dynamically shortening the disparity search range for every pixel at the lower level. Disparities of the lowest pyramid level form the final desired disparity map. Image coordinates of homologous points are calculated from the output disparities. In turn, these homologous points are intersected based on the range-Doppler model to derive 3D geographic coordinates, so the coordinates are used to yield a final gridded DSM by a triangulation interpolation. The formulation of the penalty function as expressed in Equation (1) influences the matching performance (discussed in
Section 2.4). Thus, we encoded three different formulations of the penalty parameter
in the hierarchical SGM pipeline and obtain corresponding disparity maps to assess the impact of different penalty functions on the final DSM accuracy. In addition, we implemented a hierarchical NCC local matching algorithm for comparison. In this section, we interpret principles and fundamental procedures of these matching algorithms.
2.1. Mount Song Test Site and Datasets
The test site is situated at Mount Song in the northwest of Henan province in China, as shown in
Figure 1. Mount Song is one of the prominent cradles of Chinese civilization and is well-known as the central mountain of the Five Great Mountains of China. With height ranging from 150 m to 1512 m above sea level, Mount Song possesses a complex terrain. Mount Song, with 3.6 billion years of geological history, is composed of metamorphic and sedimentary rocks from five geologic periods including the Archean, Proterozoic, Paleozoic, Mesozoic, and Cenozoic epochs. The mountain massif stretches from east to west and constitutes steep slopes and precipitous cliffs.
Our study area is covered with vegetation, thus the DEM generation by the InSAR approach is heavily affected by decorrelation, and therefore, radargrammetry is a suitable way to map this area. We selected two pairs of stereo TerraSAR-X/TanDEM-X images to generate DSMs for the Mount Song area. One was a Stripmap (SM) stereo pair (SM_0712, SM_0718) collected from an ascending orbit; the other was a Spotlight (SL) stereo pair (SL_0925, SL_1001) collected from a descending orbit. The image acquisition parameters are listed in
Table 1. The coverage of the stereo images is shown in
Figure 1.
As shown in
Figure 1, one Stripmap image is shielded by heavy rain, which affects the stereo matching. The influence is seen in the final radargrammetric DSMs generated from this Stripmap pair. An airborne photogrammetric DSM at 1 m resolution with a height precision of 1 m was used as the reference data for evaluating the quality of our radargrammetric DSMs. Although it was acquired in 2009, two years earlier than the stereo SAR image acquisition time, the study area remained relatively stable and demonstrated no big changes so that the reference DSM is sufficient to validate our radargrammetric method.
NASA’s 30 m spatial resolution SRTM DEM [
2] acquired with InSAR approach was also used for the height accuracy comparisons with the DSMs generated with our radargrammetric methods. The SRTM DEM and the radargrammetric DSMs are comparable because they were all generated from X-band SAR images, thus the extracted heights all refer to the upper surface of objects on earth. The SRTM DEM was also used in the production of the epipolar images for stereo matching, as discussed in the next subsection.
2.2. Epipolar Rectification
Epipolar images are the input data required for the SGM pipeline. After epipolar rectification, correspondences are aligned on epipolar lines so that the 2D image matching problem is simplified to 1D matching along epipolar lines. We applied multi-looking and Lee filtering to attenuate the inherent speckle and noise in SAR amplitude images, then adopted the projection trajectory method assisted by external DEM, proposed by Perko et al. [
25], to generate rectified epipolar images in object space. For the Stripmap mode images with the range resolution of 1.8 m and the azimuth resolution of 3.3 m, we performed 3 × 3 multi-looking on the amplitude images. For the Spotlight mode images with the range resolution of 1.2 m and the azimuth resolution of 1.6 m, we performed 5 × 5 multi-looking on the amplitude images. Therefore, the epipolar images of both stereo pairs were generated with a spatial resolution of 10 m, and DSMs were derived from both stereo pairs with the same resolution of 10 m. If amplitude images at full resolution are used to generate epipolar images, the matching process will introduce many mismatched disparities or voids due to the speckles and noise maintained in SAR images. The a priori SRTM DEM helps eliminate most of the SAR geometric distortions in range direction including layover and foreshortening effects. When such a priori DEMs are not available, we can generate them with commonly-used radargrammetric methods such as NCC matching, or simply by applying an identical mean height value as proposed by [
23]. Discussion of epipolar geometry, however, is not the focus of this study and readers can refer to [
23,
25,
33] for further details. We show the stereo anaglyphs of epipolar images generated from our TerraSAR-X stereo pairs in
Figure 2.
In
Figure 2, the size of Spotlight epipolar images in (a) is 1650 × 1689 pixels, and the size of Stripmap epipolar images in (b) is 5902 × 5261 pixels. The blue and red colors represent the disparity between the left and right epipolar images. Only one color showing in the overlapping areas indicates that there is no disparity between the homologous points, and most of the disparities were eliminated in each epipolar pair. The white regions refer to areas that lack matching texture. This will be shown in
Section 3.3.2. To further illustrate the geometric consistency of the epipolar images, a zoomed region was selected from each epipolar pair and shown in
Figure 3.
Figure 3 shows most of the homologous points match in the stereo anaglyphs, appearing as the same color. Only slight residual disparities remain along the epipolar direction shown in the red or blue color. In this case, the disparity search range shrinks. Thus, stereo matching is easier and faster.
2.3. Hierarchical Semi-Global Matching
The semi-global matching algorithm models the correspondence problem as an energy function which combines a data term and a smoothness term, in the same way as global matching algorithms. Semi-global matching minimizes a global cost function to extract pixel-wise disparity based on the following form:
where
represents the disparity image with the same size of epipolar images, in which the value of each pixel is the disparity of corresponding pixel in the base image.
is the global energy to be minimized by finding optimal disparities. To the right of the equal sign, the first term is the data term summing up matching cost of every pixel where
represent the pixel on base image and
is the pixel-wise photo consistency measure. The second term is one part of smoothness term regularizing small disparity changes where pixel
is in the vicinity of pixel
. Operator
T produces a value of one if the subsequent condition is true and otherwise produces a zero value. The third term is the other part of smoothness term controlling large disparities to maintain height discontinuities. The smoothness term forces a small constant penalty
when the disparity difference of pixel
and its neighbor pixel
equals to one, and a large constant penalty
when the disparity difference of pixel
and its neighbor pixel
exceeds one.
2.3.1. Matching Cost Calculation
The first step of SGM is calculating matching cost for every pixel on the rectified base image. Given a disparity
, the photo consistency is computed for pixel
on the left epipolar image and pixel
on the right epipolar image. Census transformation [
5] was chosen as the photo consistency measure because it delivers the most robust matching cost results among a variety of parametric and non-parametric matching cost methods, and the mutual information cost method [
34].
We adopted a 9 × 7 window to perform the census transform for every pixel. It compares the amplitude value of each pixel within the window with the center pixel to produce a binary string. By performing exclusive-OR operation between the binary strings of suspected correspondences, the matching cost was calculated by Hamming distance. For each pixel, the matching cost is calculated for every disparity in its disparity range and accumulated in the cost aggregated step.
2.3.2. Cost Aggregation
SGM aggregates matching costs following the concept of Dynamic Programming. It chooses multiple directions on image to perform the aggregation and finally adds up matching costs of all directions instead of optimizing along only one image row. The original SGM calculates the aggregated cost
for pixel
given disparity
along arbitrary direction
by Equation (2):
where
is the previous pixel of pixel
along the path.
As we implemented a hierarchical SGM method, we transferred the disparity from a higher pyramid level to a lower pyramid level in order to make an adaptive adjustment of disparity search range for every pixel, which is no longer an identical disparity search range like [–32, 32] for all pixels. Specifically, the disparity map of a higher level is upscaled by two to match the size of the next level, so the disparity value of pixel
is multiplied by two to serve as the initial disparity value
for the same pixel on a lower level. The complete disparity search range for pixel
on the lower level is defined as [
]. Thus, a pixel-wise disparity range is determined for each pixel on the lower level, and the radius of four guarantees that the range is not too large to increase ambiguities or too narrow to miss the correct disparity value. This is a coarse-to-fine matching strategy that decreases outliers by narrowing the disparity search range as well as prompts matching efficiency. Consequently, the aggregated cost is modulated according to Equation (3):
The aggregated cost is added up from all directions to obtain the final cost for pixel
at a given disparity
:
2.3.3. Disparity Computation and Refinement
To determine the final disparity within the disparity range for pixel
, we select the disparity with the minimum aggregated cost in accordance with the “Winner Takes All” approach.
The sub-pixel level disparity is estimated by fitting a quadratic curve through the determined pixel-level disparity and the two disparities corresponding to the aggregated costs on either side of the minimum cost. However, these sub-pixel level disparities contain outliers and mismatched points due to occlusions and weak texture. Left-right consistency check, removal of peaks, and median filtering are conducted according to Hirschmüller [
12] to remove unreliable disparities. At this stage, the hierarchical semi-global matching is completed and the final disparity map can be exported. The disparity map can be directly used for spatial forward intersection in the next step without any post-processing.
2.4. Penalty Function Schemes for
We encoded three penalty functions for in our hierarchical SGM pipeline for evaluation. The three SGM algorithms with each formulation are abbreviated with SGM_const, SGM_gray and SGM_canny hereinafter. The penalty functions to be evaluated are: constant penalty, gray gradient penalty, and Canny edge penalty. The formulations of the three penalty functions are as follows:
- (1)
is a constant value determined by experience:
- (2)
is inversely proportional to the absolute gray/amplitude gradient of the current processed pixel
and its proceeding pixel
along the path. The original SGM adopts this penalty formulation for
:
- (3)
is calculated by the Canny algorithm which is an advanced edge detection algorithm. This penalty function is applied in the tSGM method [
13]:
is the binary value of the Canny edge detector.
and
are empirically determined constant values. When an edge is detected, low penalty
is imposed. Otherwise, when no edge is detected, larger penalty
is applied. In practice, we found that making
equivalent to
could give reliable results for the census matching cost.
To find the optimal parameter values for each penalty function, we used a “coarse-to-fine” strategy to adjust the parameter values. This strategy guarantees that the step size is not too large to miss the optimal value and not too small to bring heavy computation. First, we set the tuning range for as [10, 200] with a step size of 20, and the tuning range for as [50, 1500] with a step size of 100. Notice that for every () group, should be larger than . In practice, we found that different combinations of and make a big difference on the completeness of the disparity map, i.e., the percent of effective disparity values, while the effective disparity values almost remain the same using different combinations. Therefore, the optimal values of are selected based on the disparity map with the highest completeness. Second, a smaller step size is used to further adjust the optimal values obtained in the first step. The new step size is 10 for and 50 for . The final optimal values are determined with the highest disparity completeness.
2.5. Hierarchical NCC Matching
For comparison, we also implemented a hierarchical normalized cross-correlation (NCC) matching algorithm for the epipolar pyramids to generate the DSMs for the Song Mountain study area. NCC is still the most common SAR imagery matching method, no matter for DSM reconstruction [
22,
35,
36] or for surface motion estimation by pixel offset tracking technique [
37,
38,
39]. The hierarchical NCC matching strategy is widely used in radargrammetry. Commercial software products such as PCI-Geometica [
40], SISAR [
20], and SARscape [
41], all implement a hierarchical NCC matching method in their radargrammetric modules.
NCC matching is a typical local algorithm in which correspondences are searched based on the similarity of two match windows. The NCC coefficient
of two window templates is formulated in Equation (9):
where
is the pixel on the base image for which the NCC coefficient is calculated,
is the pixel with disparity shift
along epipolar direction on the matching image:
;
represents the matching window on the base image and the matching image;
and
are the amplitude values at
in the matching window of the base and matching image respectively;
and
are the mean values of base and match window. For each pixel in the search window on the match image, a NCC coefficient is calculated. Abiding by the “Winner Takes All” (WTA) rule, the correspondences are determined with the largest NCC coefficient which exceeds a preset threshold. For comparison with SGM results, no post-processing is applied to the NCC disparity map.
In our experiment, we generated a Gaussian pyramid of epipolar images with five levels. Thus, stereo matching was performed on each level and disparities were transmitted from a higher level to a lower level. The main parameters of each stereo matching algorithm are summarized in
Table 2.
As shown in
Table 2, we adopted a matching window size of 5 × 5 pixels for the top two epipolar pyramid levels, a 7 × 7 window size for the third level, and a 9 × 9 window size for the last two levels, as window size influences the performance of local matching methods. For SGM algorithms, a window size of 9 × 7 was adopted to calculate matching cost based on census transformation at every pyramid level. The number of cost aggregation paths in our SGM pipeline is eight. The only difference of the three SGM algorithms is the penalty function applied.
2.6. Stereo SAR Intersection
Once the disparities are exported after image matching, we can calculate the image coordinates of homologous points following the same epipolar rectification steps. The homologous points are then used for spatial intersection to obtain corresponding geographic coordinates. The range-Doppler (RD) model is used for the stereo intersection. The mathematical formulation of RD model is shown in Equation (10) where the first is Range Equation and the second is Doppler Equation.
is the geolocation of target at range time
,
and
are the geolocation and velocity vector of satellite sensor at azimuth time
, which is implicitly included in the second equation. The distance between sensor and target equals to the product of light velocity
and range time
. In case of zero-Doppler, the squint angle
is zero.
Each point on the SAR image corresponds to a pair of range time and azimuth time . For each pair of homologous points on the stereo images, we obtain two sets of RD model expressions, four equations in total. Therefore, the 3D geolocation of homologous points can be derived by solving the condition equations with least squares adjustment. After the stereo forward intersection, the dense geographic coordinates were resampled into a final gridded DSM. No post-processing or outlier detection is conducted during this process. The final DSMs are direct products produced from the disparity maps.
3. Results
In our experiments, we compared the Mount Song DSMs generated by SGM with three penalty functions and NCC method. All the radargrammetric DSMs were resampled to a 10 m spatial resolution for height accuracy evaluation. The height accuracy was assessed in comparison with the reference airborne photogrammetric DSM. The height datum of radargrammetric DSMs and the reference airborne DSM is WGS84 ellipsoid, thus the SRTM DEM height was converted from EGM96 geoid to WGS84 datum for comparison. We processed epipolar images using stereo matching, spatial forward intersection, and triangulation interpolation to generate gridded DSMs. To genuinely reflect the matching performance of SGM, the disparities were directly put into the forward intersection step, and all the intersected spatial points were imported to the triangulation interpolation, no post-processing step or outlier detection was applied. The matching results and vertical accuracy assessment of the radargrammetric DSMs are presented in this section.
3.1. Optimal Penalty Parameter Values
Adopting the two-step “coarse-to-fine” parameter tuning strategy as presented in
Section 2.4, we calculated the completeness of the disparity maps. The optimal parameter values for the three SGM penalty functions were determined using the disparity map with highest completeness. The optimal penalty parameter values for the Spotlight (SL) and Stripmap (SM) stereo pair are listed in
Table 3.
As illustrated by
Table 3, the optimal parameter values depend on which penalty function was used. In terms of SGM_const and SGM_canny, we found the optimal parameter values of
and
are (150, 200) for both the SL and the SM stereo pair. With respect to SGM_gray, the optimal values for the SL data are the same as those for the other two functions; however, the parameter tuning for the SM data was more time-consuming and no regularity was found. These parameters provide an empirical reference to process other stereo SAR images with the SGM method. Meanwhile, the completeness of matching depends on which penalty function is applied.
3.2. Matching Completeness
Mismatched pixels on the epipolar image were marked as invalid disparity value on the disparity map. The mismatched pixels were detected by the disparity consistency checking in SGM pipeline and by NCC threshold judgement during NCC matching. Since the disparity values are directly used to generate the final DSMs without post-processing, the matching completeness was defined as the percentage of valid disparity values with respect to the total number of pixels of overlapping region in the epipolar image. The matching completeness of the two stereo pairs is listed in
Table 4.
Table 4 shows that the completeness of all the SGM-derived disparity maps was higher than 96%. For NCC matching, we selected the NCC threshold with the highest DSM precision after validation with the reference DSM and calculated the matching completeness. The disparity map completeness for each SGM algorithm was very close, but the height accuracy of DSM generated by each algorithm is different, as will be revealed in the next sub-section.
3.3. Quality Assessment of Radargrammetric DSMs
For each stereo matching algorithm, we obtained two gridded DSMs: one from the Spotlight pair and one from the Stripmap pair. The DSM reconstructed from the Stripmap pair using the SGM_canny algorithm is displayed with hillshading in
Figure 4: the landform and terrain relief was clearly rebuilt by the SGM method. Visually, there is no distinct noise or outlier except a small area rendered as black spots in the northwest. This is consistent with the area obscured by rain clouds seen in image SM_0718. We selected three sub-regions covered by the airborne photogrammetric DSM, marked by blue, yellow, and orange rectangles in
Figure 4, to quantitatively assess the quality of the DSM.
We subtracted the height of airborne reference DSM from the radargrammetric DSMs in the quantitative evaluation. Based on the derivation from reference height, several commonly used quality indicators were calculated to demonstrate and compare the accuracy of the radargrammetric DSMs. These metrics were mean error, mean absolute error (MAE), the root mean square error (RMSE), and the linear error of 90% (LE90). LE90 is the absolute height error located where 90% of the sorted absolute height errors are smaller than or equal to it. Unlike the first three metrics, LE90 is a more reliable and robust indicator even when height errors do not obey normal distribution. In the following sub-sections, we sum up the results and assess the height accuracy for each test area.
3.3.1. Height Accuracy of Test Area 1
The mountainous test area 1 marked by the blue rectangle in
Figure 4 is only within coverage of the SM data. The area has an East-West width of 8117 m and a North-South length of 7118 m. The height range is 260–1385 m. The DSM vertical accuracy indicators are summarized in
Table 5.
As shown, the indicator values from all the three SGM algorithms were smaller than those from the SRTM DEM. The LE90 from SGM_const and SGM_canny are more than 1 m smaller than that from SRTM DEM. The SGM_gray indicator values are slightly smaller than those from the SRTM DEM but the SGM_canny algorithm yielded the highest accuracy in terms of all accuracy indicators. The NCC method yielded the lowest accuracy. To demonstrate the results visually, all the DSM products for this test area are shown with hillshading in
Figure 5.
In
Figure 5, height is indicated by graduated colors. The figures coincide with the results presented in
Table 5: Compared to the SRTM DEM, the SGM-derived DSMs reveal more details and more clarity. Despite the smoothness, topography of the SRTM DEM is more blurred than the SGM DSMs. The SGM DSMs show similar hillshading effects and can only be distinguished by the sporadic noise mainly situated on the top area of the figures. The massive amount of noise in the NCC DSM conceals the topographic details. To further reveal the differences of these height products, two sets of height profiles along the line segments marked as “m” (
) and “n” (
) in
Figure 5c were extracted and plotted in
Figure 6 and
Figure 7. We first interpret the profiles along
in
Figure 6.
As seen in
Figure 6a, the range of the height profile along
is 440–840 m. The height curves of SGM DSMs and SRTM DEM are close to each other. To examine the difference between the height curves more closely, we zoomed in on a portion marked by a dotted box in
Figure 6b. The zoomed view shows the height profile from SGM_canny algorithm was closest to the reference DSM. The SRTM DEM profile generally deviates more from the reference DSM than the SGM profiles. NCC curve contains the biggest deviations in height. Profiles along
show a similar trend in
Figure 7.
As shown in
Figure 7a, the height range of the height profile along
is 350–1200 m. The height profiles of SGM DSMs and the SRTM DEM almost coincide. The zoomed view in
Figure 7b shows the profile from SGM_canny DSM approaches the reference DSM most. The SRTM DEM profile contains larger height deviations than the SGM ones. The NCC curve deviates most from the reference DSM.
To sum up, the two sets of height curves in
Figure 6 and
Figure 7 follow similar trends. The length of both
and
is approximately 7000 m long. Along the entire distance, height profiles from SGM DSMs show no systematic offset compared to the reference DSM profile. The zoomed views reveal that profiles produced using the SGM_canny DSM most closely matches the reference height. The deviation distribution of each profile verifies the accuracy indicator values.
3.3.2. Height Accuracy of Test Area 2
The mountainous test area 2, marked with a yellow rectangle in
Figure 4, is within the coverage of both the SM and SL data. It has an East-West width of 3313 m and a North-South length of 3368 m. The height range of this area is 440–1200 m. The accuracy indicators of DSM generated from Spotlight (SL) stereo pair and Stripmap pair are summarized in
Table 6 and
Table 7.
Table 6 shows that for the SL data, the vertical accuracy of SGM DSMs is slightly lower than that of the SRTM DEM: the RMSE values range from 11.3–11.6 m and the LE90 values range from 17.6–17.9 m, 0.6–1.7 m larger than those from the SRTM DEM. For the SM data in
Table 7, the vertical accuracy of SGM DSMs is the highest: the RMSE values range from 8.9 to 10.5 m, and the LE90 values range from 12.5 to 15.4 m which are generally 2 m smaller than those from the SRTM DEM. The NCC method provides indicator values about two times larger than SRTM. All in all, the SGM_canny algorithm provides the highest accuracy with the SM mode stereo pair. The DSM products for test area 2 are shown in
Figure 8 with hillshading.
In
Figure 8, SGM DSMs reveal more terrain details and more clarity than the SRTM DEM. Again, the NCC DSMs are full of noise. Comparing the Spotlight DSMs in the second row and Stripmap DSMs in the third row, we found the results show similar smoothness. However, notice the region marked with the red ellipse in
Figure 8b, the hillshading appears as triangulation facets. The triangulation facets also appear in all the SL mode SGM DSMs in
Figure 8d–f. Only the SGM_canny result in
Figure 8f shows less facets and more complete mountain terrains than the other Spotlight results. These facets indicate that stereo matching in this area found very low density of correspondences. Thus, the hillshading appeared as triangulation facets after the triangulation interpolation. To investigate the causes, the Spotlight stereo amplitude images, the epipolar images and the disparity map in this region are reviewed and shown in
Figure 9.
Figure 9a shows the subsets of ascending SL mode amplitude images corresponding to the facets, and a distinct layover can be seen on the slope facing towards the radar sensor. The geometric distortion is considerably corrected after epipolar rectification as shown in
Figure 9b. However, there is no texture information in the severe layover region which appears as white color on the rectified epipolar images. In
Figure 9c, the subset of the Spotlight mode disparity map generated by the SGM_gray algorithm indicates invalid disparity values by white-colored regions appearing in the layover area. Homologous points were determined by the disparities and intersected to derive 3D geographic points. We plotted the geographic points with black color overlaying on the reference DSM hillshade. The reference DSM is almost completely covered by the dense geographic point layer but no points exist where layover effects appear. Therefore, facets were produced after the triangulation interpolation and remained in the final gridded DSM. In turn, the layover impact also explains why the height accuracy of Spotlight results in
Table 6 is lower than that of Stripmap results in
Table 7. For the same area in the descending SM data, there are no layover distortions, so that the height was built successfully without any triangulation facets appearing. As we generated epipolar images and final DSMs for both SM and SL data at the same 10 m resolution, the layover impact is the main reason for the difference in accuracy.
3.3.3. Height Accuracy of Test Area 3
Test area 3, a flat area marked with an orange rectangle in
Figure 4, is within the coverage of both the SM and SL images. It has an East-West width of 3705 m and a North-South length of 3797 m. The height range is 410–525 m. The accuracy indicator values of DSM generated from Spotlight (SL) stereo pair and from Stripmap (SM) stereo pair are summarized in
Table 8 and
Table 9.
Table 8 and
Table 9 reveal that for both the SL data and SM data, SGM method yields accuracy values that are quite close to the SRTM DEM. The RMSE values and LE90 values from SGM results are only several decimeters larger than those from the SRTM DEM. In addition, the accuracy indicators of SGM_const, SGM_gray, and SGM_canny results are very close, almost identical. Still, NCC delivers the lowest accuracy. In
Figure 10, we plotted the DSM products for this test area with hillshading.
Figure 10 shows that NCC DSMs are too noisy to reflect small topographic changes, while the SGM DSMs reconstruct detailed topography in this area. With the same 10 m resolution, the SL mode SGM DSMs are less noisy than the SM mode DSMs, though the corresponding quality metric values visible in
Table 8 are very close to those in
Table 9. The SGM algorithms produced quite similar results for the Spotlight data, while for the Stripmap data, the SGM_canny algorithm generated the smoothest result with the least noise in
Figure 10i.
3.4. Matching Processing Time
We recorded the processing time of each radargrammetric matching algorithm to evaluate the efficiency. All radargrammetric algorithms were implemented with C++ language and ran on a notebook computer with an Intel Core i5-7300HQ CPU (4 cores, 2.5 GHz). The time consumption is summarized in
Table 10.
The matching for SL stereo pair took a shorter time than the SM stereo pair, since the size of SL mode epipolar image is one third of that of the SM mode epipolar image. For SL data, it cost less than six seconds for all SGM algorithms and 27.6 s for the NCC method. For SM data, all SGM algorithms spent less than 1 min, while NCC spent more than 13 min. Notice that SGM_const is the fastest for using constant penalty values, but SGM_gray and SGM_canny cost only a few more seconds.