1. Introduction
The emergence of Synthetic Aperture Radar (SAR) imagery has revolutionized geospatial information, particularly in photogrammetry and remote sensing. SAR offers distinct advantages over optical imaging, including all-weather, day-and-night operation, subsurface and surface-penetration capability, and complementary information that often surpasses conventional optical data. Among its key applications is the extraction of elevation information, such as in digital elevation models (DEMs) and digital surface models (DSMs) [
1,
2,
3].
Two primary methodologies dominate elevation extraction from SAR data: interferometry (InSAR) and radargrammetry [
4,
5,
6,
7,
8]. InSAR exploits the phase difference between two SAR images acquired from spatially separated sensor positions. The resulting interferogram—encoding elevation variations as phase values between 0 and 2π—suffers from inherent phase ambiguity. Resolving this ambiguity (phase unwrapping) is critical for deriving absolute elevation values and for mitigating SAR-specific speckle noise [
9,
10].
While InSAR can theoretically achieve high accuracy, practical implementation faces significant challenges [
9,
11,
12,
13]. Elevation-estimation accuracy is highly sensitive to the baseline distance between acquisitions: a larger baseline improves elevation precision but degrades coherence, particularly over homogeneous or vegetated regions (e.g., forest canopies). Precise, pixel-level registration of the image pair is also essential to ensure that phase differences correspond to the same ground targets. Phase unwrapping remains computationally intensive and often requires trade-offs between accuracy and processing time. Moreover, local matching during image registration further escalates computational demands, limiting real-time applicability [
6].
Radargrammetry, an alternative approach, leverages intensity information rather than phase, drawing on photogrammetric principles [
14,
15,
16,
17,
18,
19,
20]. As illustrated in
Figure 1, same-side stereo imaging is preferred because of its radiometric consistency, avoiding distortions inherent in opposite-side configurations. Although SAR and optical imaging share geometric similarities, SAR’s ground-to-image projection is circular rather than perspective [
15,
21,
22,
23,
24].
Although radargrammetric methods require appropriate acquisition conditions, they are less sensitive to imaging geometry than interferometry. Because they operate on intensity images, radargrammetric approaches can benefit from advances in photogrammetric matching techniques with suitable modifications. Given the significant progress in both sparse and dense matching methods, radargrammetry has garnered increasing attention as a viable tool for elevation extraction. It also provides auxiliary benefits for interferometry—improving image registration and supporting phase unwrapping—when used to generate complementary elevation data [
6]. Additionally, radargrammetry facilitates the integration of SAR and optical data, enabling hybrid geospatial information systems.
Despite its potential, radargrammetry must contend with challenges less pronounced in optical stereo imaging. These include speckle noise [
25], lower radiometric resolution, weaker texture patterns, and geometric distortions, such as foreshortening, layover, and shadowing. Radargrammetric elevation accuracy is highly sensitive to the precision of the pixel-matching process, particularly in same-side stereo configurations. Addressing these challenges begins with effective speckle reduction, which improves radiometric quality and facilitates more reliable matching.
Dense disparity maps generally require area-based matching; feature-based methods yield only sparse correspondences and must be supplemented by area-based techniques—particularly when working with rectified images that minimize the search space [
26,
27,
28,
29,
30,
31].
Feature-based correspondence techniques [
32,
33,
34,
35,
36,
37,
38,
39,
40], beyond traditional dense matching and deep learning methods, offer a sparse alternative, particularly effective on raw or unrectified imagery. These methods first detect salient key points—such as corners, circular features, and edges—that appear in both reference and secondary images. Because these features often reside in high-frequency regions (
Figure 2), they occupy only a small portion of the image, simplifying matching but limiting density.
This sparsity proves efficient for estimating initial geometric relationships or minimizing mismatch risk but inherently lacks coverage for full-image matching. Even when key points are artificially multiplied, computational costs escalate quadratically, making full-pixel comparison impractical. As a result, a small subset of points is used to derive epipolar geometry and resample images, narrowing the search space to a linear domain.
Moreover, defining robust descriptors for each point, which are often nonlinear due to stability and invariance demands, is central to performance, especially when handling multi-sensor data. Applying such descriptors across all pixels for dense matching introduces memory and complexity bottlenecks. Therefore, in elevation modeling via radargrammetry or photogrammetry, dense matching is typically deferred until after geometric rectification using feature-based methods, which excel in this preprocessing stage.
Over the years, numerous area-based matching methods have been developed. Broadly, these methods can be divided into two main groups: classical methods (from simple techniques to advanced hybrid models) and deep learning-based methods [
42,
43,
44,
45,
46,
47,
48,
49,
50,
51]. Deep learning-based approaches have achieved promising results; however, several considerations limit their effectiveness for SAR image processing. First, deep learning relies on large, labeled datasets for training. While such datasets are increasingly available for optical imagery, assembling extensive and labeled SAR datasets remains challenging. Second, in contrast to classification tasks—where the output space is discrete—dense disparity estimation involves a continuous, high-dimensional output space. Consequently, most deep learning-based stereo-matching methods are trained and tested on narrow, domain-specific datasets, limiting generalizability.
In contrast, area-based algorithms compute disparity maps image by image, eliminating dependence on prior training. These algorithms are generally implemented in two forms: local and global/semi-global [
43]. The general structure of area-based matching models consists of four main components: (1) computation of a similarity (or cost) metric, (2) aggregation of similarity values, (3) disparity calculation, (4) refinement of the disparity map.
In local methods, a square window centered on each pixel is used as the feature region. The corresponding pixel in the second image is identified by finding the most similar region. Because the window itself provides contextual information, an explicit cost-aggregation step is usually unnecessary. However, a well-known trade-off exists: small windows are desirable for precise localization, whereas larger windows help to reduce mismatches—particularly in low-texture or repetitive regions. This trade-off is critical in SAR intensity images, which often exhibit homogeneous textures and speckle noise.
Classical local methods include similarity metrics, such as normalized cross-correlation (NCC), sum of squared differences (SSD), and sum of absolute differences (SAD). Other metrics include the census transform [
52,
53,
54], which converts local neighborhoods into binary vectors matched via the Hamming distance, and mutual information [
55], which is effective when radiometric differences are present. Although local methods are computationally efficient—especially with small kernels—they often perform poorly in the low-texture regions common in satellite SAR imagery.
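As an illustration of the census transform and Hamming-distance matching mentioned above, the following minimal sketch may be helpful. It assumes a single-band image stored as a NumPy array; the function names `census_transform` and `hamming_distance`, the 3 × 3 default window, and the edge padding are illustrative choices and are not taken from the cited works.

```python
import numpy as np

def census_transform(img, radius=1):
    """Encode each pixel as a binary vector: neighbour < centre (illustrative)."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")  # replicate borders (assumption)
    bits = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue  # the centre pixel is not compared with itself
            neighbour = padded[radius + dr:radius + dr + h,
                               radius + dc:radius + dc + w]
            bits.append(neighbour < img)
    return np.stack(bits, axis=-1)  # shape (h, w, (2*radius + 1)**2 - 1)

def hamming_distance(census_a, census_b):
    """Number of differing bits per pixel between two census codes."""
    return np.count_nonzero(census_a != census_b, axis=-1)
```

Because only the ordering of gray levels is encoded, the codes are unchanged by any strictly increasing radiometric transformation (for example a gain and offset), which is one reason the census transform is attractive when the two images differ radiometrically.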
To address this limitation, global and semi-global methods [
41,
56] introduce regularization terms that enforce smoothness or penalize abrupt disparity changes [
57]. While these methods improve accuracy, they are more resource-intensive in memory and computation. Multi-scale pyramid structures, image partitioning with overlaps and adaptive cost functions are commonly employed to optimize performance. In the context of SAR imagery, semi-global matching (SGM) has been adapted successfully. Studies such as [
14,
17] applied SGM to radargrammetry and reported improved performance with pyramid-based implementations and adaptive penalties (SGM-Gray and SGM-Canny).
The core concept proposed in this study is to leverage local methods characterized by low computational complexity. Among dense matching techniques, local approaches theoretically offer the lowest complexity, and in practice this attribute underpins the proposed algorithm (see Figure 3). Consequently, the method inherently exhibits lower computational demand than non-local alternatives. To facilitate a clearer understanding, comparative summaries are provided in Table 1 and Table 2, offering an intuitive overview of the core dense matching strategies. These tables also help to contextualize deep learning-based methods within the broader complexity landscape.
Nevertheless, dense matching remains a major computational bottleneck. Selecting suitable comparison kernels and attaining sub-pixel accuracy dramatically increase processing time. Therefore, optimizing this step is essential. To solve this issue, the present study proposes a low-complexity radargrammetric algorithm for SAR intensity images.
As outlined in the proposed framework section, the algorithm is fundamentally built upon restructuring the input data from a 2D image space into a 3D volume (data cube), where the kernel surrounding each pixel—after dimensionality reduction—is stored along the third axis. This enables rapid cost volume construction without relying on iterative procedures during implementation. Unlike sparse, feature-based matching methods which operate on a limited set of candidate points, dense matching requires exhaustive pixel-wise evaluation. In such contexts, direct feature storage or descriptor encoding along the third axis becomes infeasible due to memory constraints. Moreover, although feature extraction is typically designed for discriminability, unless expressible via linear formulations, its implementation for large-scale pixel grids proves impractical. Accordingly, the proposed algorithm adopts a two-stage linear computation pipeline. The rationale for this design is to balance accuracy with computational and memory efficiency: by aggressively constraining the initial search space using a high-confidence heuristic, the second refinement stage can operate with minimal computational overhead while achieving improved precision.
The rest of this work is structured as follows:
Section 2 describes the datasets and methods;
Section 3 and
Section 4 present results and discussion, respectively. Finally,
Section 5 concludes this research.
2. Materials and Methods
The first subsection describes the datasets, their features, and the technical justification behind their selection. The second subsection outlines the theoretical foundations, particularly for comparative analysis, while the third subsection details the proposed algorithm and its implementation.
2.1. Data Used
To assess the performance of the proposed algorithm, this study utilizes both Sentinel-1 SAR imagery (
Figure 4) and a TerraSAR-X stereo pair (
Figure 5) in comparison with other techniques.
These images were selected due to their established efficacy in generating elevation models. The study area spans elevations of 1000 to 3300 m above sea level and encompasses diverse terrains, including the rugged Zagros Mountains and flat/semi-flat agricultural regions. Notably, the mountainous regions exhibit homogeneous texture (due to shrub cover) and significant speckle effects, in addition to geometric distortions.
Sentinel-1’s C-band images were chosen despite their lower radiometric and spatial resolution compared to X-band systems (e.g., TerraSAR-X/TanDEM-X;
Figure 6). This selection intentionally increases the challenge for matching algorithms; successful performance here suggests strong generalizability to higher-resolution data.
Further, the large baseline of the dataset introduces incidence angle variations between identical pixels in the two images, exacerbating geometric distortions. The ground spatial resolution is 10 m (range and azimuth), covering a broader area than higher-resolution imagery. Although TerraSAR-X data (1–2 m) are often resampled to 10 m to mitigate speckle effects [
17], the proposed algorithm can be adapted to higher-resolution data with minor modifications.
To assess algorithmic stability, the study area was divided into eight subsets. Ground range detected (GRD) SAR data were used directly and georeferenced using basic ground models, thereby minimizing the need for epipolar resampling. Nearly dense matching was conducted using NCC-based template matching, and the slight rotation between the stereo pairs was compensated for using a simple affine transformation. While this method may not be universally applicable to all stereo SAR intensity pairs, it proved effective in this study because only a small portion of the original scenes was used. The affine transformation approach successfully addressed the azimuth parallax with an accuracy of approximately one pixel (see
Figure 7). However, in some datasets residual displacements of several pixels may remain.
Although image rectification and search-space reduction are critical for accurate elevation modeling, these topics exceed the scope of the present study. Readers are referred to [
27,
28,
58] for detailed discussion on epipolar resampling techniques in non-georeferenced SAR imagery.
Preliminary verification of azimuth displacement was performed using 2D NCC template matching, confirming the assumption of a 1D search space in the range direction. Compared to optical images, SAR intensity images generally pose fewer epipolarity challenges due to their inherent geometric stability, despite being more susceptible to geometric distortions resulting from their side-looking acquisition geometry. Nonetheless, exceptions to this general observation do exist.
For validation, a reference elevation model with a vertical accuracy of 2.5 m was employed, derived from nationwide aerial imagery acquired approximately 15 years ago. Since the focus is on non-urban areas (agricultural/mountainous regions), temporal elevation changes are negligible relative to the model’s accuracy. As will be shown in the Results section, the vertical errors of the generated outputs exceed 30 m; therefore, alternative elevation datasets such as SRTM could also be used for comparative purposes without significantly affecting the conclusions. Example subsets (A1–A4) and their elevation models (B1–B4) are illustrated in
Figure 8.
In the second stage, a pair of TerraSAR-X stereo images was utilized to further analyze the performance of the proposed algorithm alongside other benchmarked methods. Acquisition parameters are listed in
Table 3. As illustrated in
Figure 9, severe distortions—especially across mountainous regions—have arisen due to the area’s topographic conditions. To mitigate these effects and reduce the influence on the output results, the normalized images used in the dense matching stage were epipolar resampled based on a DEM-driven approach [
27].
Furthermore, given the presence of broad homogeneous regions and minor discrepancies in ground range resolution caused by differing incidence angles (particularly in topographically variable zones), both images were resampled to a common ground pixel size of 10 m. Additionally, the stereo pair and the corresponding reference surface elevation model were projected onto the WGS84 ellipsoid and UTM Zone 39. The selected study area spans elevation values ranging from 700 to 3000 m above sea level.
Since the elevation model used as the reference DSM for the Sentinel-1 image was acquired as a nationwide surface coverage, the same model was employed to assess the geolocation accuracy of the TerraSAR-X data over the Tehran region. Additionally, to enable epipolar resampling, an SRTM elevation model with a 30 m vertical resolution was utilized. As shown in
Figure 9, layover-affected regions of the test area have been excluded from the evaluation section to facilitate more accurate result analysis.
2.2. State-of-the-Art Methods
In the present study, to compare the proposed algorithm with existing methods, a normalized cross-correlation-based method in a pyramid structure is applied to increase computational speed, alongside the basic and improved variants of the SGM algorithms, namely, SGM-const, SGM-grey, and SGM-canny. This section outlines the key components of the state-of-the-art algorithms. For a more in-depth examination, readers are referred to foundational works [
17,
56].
2.2.1. General Structure of Disparity Model Generation in Rectified Images
Across various elevation modeling algorithms, the generation of disparity models forms the core processing stage. Depending on whether the method is local or global, the following general steps are typically involved:
Step 1: Preprocessing of the input image pair (essential for SAR images).
Step 2: Selection of a similarity or dissimilarity (cost) metric, with parameter tuning.
Step 3: Construction of a cost volume model based on the chosen metric. In local models, this step uses neighborhood kernels; in global or semi-global models, it is calculated on a pixel-wise basis.
Step 4.1 (local models): Refinement of the cost volume model using adaptive or hybrid methods.
Step 4.2 (global/semi-global models): Cost aggregation with regularization to handle abrupt disparity variations.
Step 5.1: Generation of the final disparity model using either minimum cost or maximum similarity.
Step 5.2 (optional): Post-processing to refine the disparity model.
In both local and semi-global approaches, the bulk of computational load lies in the cost volume calculation and aggregation stages, respectively. For semi-global models, although the initial pixel-wise cost volume calculation is computationally efficient, the aggregation stage significantly increases processing time. Due to RAM limitations, cost volume size must be constrained, necessitating pyramid-based or tile-based processing for large images.
2.2.2. Normalized Cross-Correlation
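Among various similarity metrics, NCC is employed due to its robustness. The similarity between pixel p in the reference image and q = p + d in the target image is calculated as

$$\mathrm{NCC}(p,d)=\frac{\sum_{(i,j)\in W}\left(m_{i,j}-\mu_m\right)\left(s_{i,j}-\mu_s\right)}{\sqrt{\sum_{(i,j)\in W}\left(m_{i,j}-\mu_m\right)^{2}\,\sum_{(i,j)\in W}\left(s_{i,j}-\mu_s\right)^{2}}}$$

where
$W$: Neighborhood window;
$m_{i,j}$, $s_{i,j}$: Intensity values in the reference and matching images;
$\mu_m$, $\mu_s$: Mean intensities in $W$.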
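For readers who prefer code, the NCC between two equally sized windows can be sketched as follows. This is a minimal illustration assuming the standard zero-mean NCC definition; the function name `ncc` and the small constant `eps` (added to avoid division by zero in perfectly flat windows) are hypothetical and not part of the original formulation.

```python
import numpy as np

def ncc(ref_win, sec_win, eps=1e-12):
    """Zero-mean normalized cross-correlation of two same-shaped windows."""
    m = ref_win - ref_win.mean()   # m_ij - mu_m
    s = sec_win - sec_win.mean()   # s_ij - mu_s
    denom = np.sqrt((m ** 2).sum() * (s ** 2).sum()) + eps
    return float((m * s).sum() / denom)
```

The result lies between -1 and 1 and is insensitive to a linear gain and offset between the two windows, which is the robustness property referred to in the text.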
For pyramid-based implementation, assuming that the image dimensions are halved at each level, the maximum number of levels $L_{\max}$ is estimated as

$$L_{\max}=\left\lfloor \log_{2} N_{\min} \right\rfloor$$

Here, $N_{\min}$ is the smaller image dimension (rows or columns), and ⌊·⌋ denotes the floor function. Empirically, the coarsest pyramid levels are often not used if their dimensions fall below a few hundred pixels. Therefore, 7 (for a minimum of 128 rows or columns) or 8 (for a minimum of 256 rows or columns) is subtracted from $L_{\max}$. For example, an image with dimensions of about 5000 pixels gives ⌊log₂ 5000⌋ = 12, which leaves 4 to 5 usable pyramid levels. To transfer results between levels, image warping is generally applied before repeating the matching process at higher resolutions.
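The level count described above can be checked with a few lines of code. The sketch assumes that the maximum number of levels is the floor of the base-2 logarithm of the smaller image dimension and that the subtracted constant equals log2 of the minimum admissible size (7 for 128, 8 for 256); the function name `usable_pyramid_levels` is illustrative.

```python
import math

def usable_pyramid_levels(rows, cols, min_size=128):
    """Usable levels = floor(log2(min dimension)) - log2(min_size)."""
    max_levels = math.floor(math.log2(min(rows, cols)))
    # min_size is assumed to be a power of two (128 -> 7, 256 -> 8)
    return max(max_levels - int(math.log2(min_size)), 0)

# A 5000-pixel image: 5 levels with a 128-pixel floor, 4 with a 256-pixel floor
levels_128 = usable_pyramid_levels(5000, 5000, min_size=128)
levels_256 = usable_pyramid_levels(5000, 5000, min_size=256)
```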
2.2.3. Semi-Global Matching Model
In SGM, the cost function extends the basic local similarity metric by incorporating regularization terms to enforce smoothness in the disparity map. This makes it a quasi-global method [
59,
60]. As depicted in
Figure 10, the generated cost volume exhibits pseudo-random behavior; hence, regularization is necessary to obtain coherent disparity maps.
Minimizing such a global cost function is, in general, NP-hard [39], so an exact solution is usually intractable. The semi-global method [38] addresses this issue by approximating the solution within a reduced subspace of the original problem domain, thereby avoiding the need to solve the full global cost function.
The final cost function at pixel p with disparity d is defined as

$$S(p,d)=\sum_{i} L_{r_i}(p,d)$$

where $L_{r_i}(p,d)$ is the aggregated cost along the path $r_i$, $i$ indexes the paths used, and $S(p,d)$ is the final cost for pixel p at disparity d. Each path-based cost $L_r$ is defined recursively as

$$L_r(p,d)=C(p,d)+\min\Big(L_r(p-r,d),\;L_r(p-r,d-1)+P_1,\;L_r(p-r,d+1)+P_1,\;\min_{k}L_r(p-r,k)+P_2\Big)-\min_{k}L_r(p-r,k)$$

where $C(p,d)$ is the matching cost at pixel p, $P_1$ and $P_2$ are regularization parameters controlling disparity smoothness and edge preservation, and $p-r$ refers to the preceding pixel along path r.
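To make the recursion concrete, the following sketch aggregates costs along a single left-to-right path. It assumes the standard SGM recursion with constant penalties P1 and P2 and a cost volume stored as a NumPy array of shape (rows, columns, disparities); the function name `aggregate_left_to_right` is illustrative, and a full implementation would sum several such paths to obtain S(p, d).

```python
import numpy as np

def aggregate_left_to_right(cost, p1, p2):
    """Path cost L_r for r = (0, +1) on a (H, W, D) matching-cost volume."""
    h, w, d = cost.shape
    agg = np.empty((h, w, d), dtype=np.float64)
    agg[:, 0, :] = cost[:, 0, :]  # no predecessor in the first column
    for x in range(1, w):
        prev = agg[:, x - 1, :]                      # L_r(p - r, .)
        prev_min = prev.min(axis=1, keepdims=True)   # min_k L_r(p - r, k)
        minus = np.full_like(prev, np.inf)           # L_r(p - r, d - 1)
        minus[:, 1:] = prev[:, :-1]
        plus = np.full_like(prev, np.inf)            # L_r(p - r, d + 1)
        plus[:, :-1] = prev[:, 1:]
        jump = np.broadcast_to(prev_min + p2, prev.shape)
        best = np.minimum.reduce([prev, minus + p1, plus + p1, jump])
        # subtracting prev_min keeps the accumulated values bounded
        agg[:, x, :] = cost[:, x, :] + best - prev_min
    return agg
```

The loop over columns is inherently sequential, which is one reason the aggregation stage dominates the runtime of semi-global methods even though the pixel-wise cost volume itself is cheap to compute.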
The subtraction of the minimum term prevents unbounded cost accumulation along the path. The values $P_1$ and $P_2$ are key to algorithm performance. Although these parameters can be adapted based on similarity or cost functions, the cost metric is commonly used. According to [17], there are three typical formulations for $P_2$:
SGM-const, constant penalty:

$$P_2=\text{const}$$

In this method, all pixels in the image have the same coefficient for P2.
SGM-gray (SGM-2), intensity gradient-based penalty:

$$P_2=\frac{P_2'}{\left|I(p)-I(p-r)\right|}$$

where $I(p)$ is the grayscale intensity at pixel p and $P_2'$ is a constant base penalty.
SGM-canny, edge-based penalty:

$$P_2=\begin{cases}P_2^{a}, & E(p)=1\\ P_2^{b}, & E(p)=0\end{cases}$$

where $E(p)$ is the binary result of the Canny edge detector at pixel p, and $P_2^{a}$, $P_2^{b}$ are the respective penalties.
For a more comprehensive understanding of the aforementioned methods, the reader is encouraged to consult the relevant literature.
2.3. Proposed Methodology
As mentioned in the Introduction, elevation model generation is highly dependent on the accurate estimation of disparity models in stereo images. A brief examination reveals that, although adaptive modification of the cost function can be computationally intensive, the primary source of complexity in both local and non-local models lies in the computation of the cost function itself. In dense matching, the similarity metric must be computed for all pixels in the image, and this process is heavily influenced by the size of the kernel used.
In contrast to close-range imagery, optical satellite and SAR intensity images typically exhibit complex, repetitive textures and irregular patterns. These characteristics necessitate the use of larger kernels, which in practice offer more robust neighborhood information for comparison and result in cost volumes with reduced noise. However, due to the influence of elevation variation on the spatial configuration of neighboring pixels, the positional consistency within a neighborhood—especially when using a large kernel—tends to be highly non-stationary.
To address this, the present algorithm computes the disparity value in two stages. In the first stage, a large kernel size is used to obtain an initial disparity estimate at the multi-pixel level. In the second stage, a refined sub-pixel disparity calculation is performed using a small kernel. The use of large kernels in the first stage presents a challenge, as it significantly increases both the computation time and memory requirements. To mitigate this, the proposed algorithm introduces a method to reduce the computational complexity of large kernels by converting them into a small feature vector. This approach decreases computation time while maintaining the accuracy necessary for downstream processing.
Additionally, the method enables storage of neighborhood information in the third dimension, allowing for efficient computation of the three-dimensional cost volume. A simplified schematic of the proposed algorithm is presented in
Figure 11. The following sections provide a detailed, step-by-step explanation of the algorithm.
2.3.1. Preprocessing
SAR intensity images are inherently affected by speckle noise due to the coherent nature of radar imaging. Since radargrammetry relies directly on intensity images, similar to photogrammetric approaches, reducing speckle noise is crucial, as it directly influences the reliability of the similarity metrics that form the basis of stereo-matching. Speckle degrades these metrics and poses serious challenges in subsequent stages.
Various algorithms have been developed to suppress speckle noise, with adaptive methods that preserve image texture receiving particular attention. In the present algorithm, two approaches [
25,
61,
62] were used to reduce speckle noise. These algorithms were selected adaptively to balance noise suppression and texture preservation.
Furthermore, SAR intensity images acquired in stereo mode with a large baseline exhibit strong radiometric differences due to varying imaging angles. To address this, histogram matching is applied after speckle reduction to harmonize intensity distributions between the two images.
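A minimal sketch of histogram matching through cumulative distribution functions (CDFs) is given below. It is a generic illustration rather than the exact procedure used in this study; the function name `match_histogram` is hypothetical, and the inputs are assumed to be single-band NumPy arrays that have already been despeckled.

```python
import numpy as np

def match_histogram(source, reference):
    """Map source gray levels so that their CDF follows the reference CDF."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # for each source level, take the reference level with the same CDF value
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return mapped[src_idx].reshape(source.shape)
```

In a stereo pair, one image would typically be passed as `reference` and the other as `source`, so that both share a similar intensity distribution before the similarity metric is evaluated.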
Note that similarity metrics such as mutual information and the census transform are particularly sensitive to noise, making speckle reduction a critical preprocessing step. The impact of speckle reduction and histogram matching is illustrated in
Figure 12.
Table 4 and
Table 5 summarize the parameters applied in both preprocessing stages for Sentinel-1 and TerraSAR-X, respectively.
2.3.2. Storing Pixel Neighborhood Information as a Feature Vector in the Third Dimension and Constructing a Data Cube
If the neighborhood information of each pixel is stored in the third dimension, the cost function can be computed at high speed using matrix shifting, as in semi-global pixel-wise cost calculation. However, even for small kernel sizes, such as 5 by 5, the third dimension reaches a size of 25. As the kernel size increases, the memory required to store this neighborhood information becomes extremely large, rendering practical implementation infeasible.
Since the neighborhood data forms the feature vector of a pixel, the main question arises: Is it necessary to retain information from all neighboring pixels for effective comparison?
Consider a 1000 × 1000 image containing one million pixels. According to Equation (5), for a kernel of size $k$ by $k$ and an image with n-bit radiometric resolution, the size of the feature space is given by

$$N_f=\left(2^{n}\right)^{k\times k}=2^{\,n k^{2}} \qquad (5)$$

In this equation, $N_f$ denotes the size of the feature space, $k$ is the kernel dimension, and $n$ represents the radiometric resolution of the image in bits. Even for very small kernel sizes, this number becomes extremely large, forming a hyperspace where the majority of possible feature vectors are unpopulated. Consequently, the full kernel can be converted into a feature vector with smaller dimensions while still maintaining sufficient discriminatory power for pixel comparison.
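A short numerical check shows how sparsely the feature space is populated. The sketch assumes that each of the k × k kernel elements can take 2^n gray levels, so the number of distinct kernels is 2^(n·k²); the function name `feature_space_size` is illustrative.

```python
def feature_space_size(k, n_bits):
    """Number of distinct k-by-k kernels for an n-bit image: (2**n)**(k*k)."""
    return (2 ** n_bits) ** (k * k)

pixels = 1000 * 1000                 # feature vectors actually present
space = feature_space_size(5, 8)     # 5 x 5 kernel, 8-bit image
digits = len(str(space))             # order of magnitude of the space
occupancy = pixels / space           # fraction of the space that can be populated
```

For a 5 × 5 kernel on an 8-bit image, the space contains 2^200 possible vectors (a 61-digit number), whereas the image supplies at most one million of them, which motivates replacing the full kernel by a much shorter feature vector.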
To illustrate this,
Figure 13 presents the normalized autocorrelation of a large-dimensional feature kernel with respect to its neighboring pixels, shown both in its original form and in a reduced form obtained through interpolation.
As observed, even after a significant reduction in kernel size, the similarity metric—especially near the autocorrelation peak—remains stable. Therefore, for cost function computation, reduced samples of the original kernel can be used without a major loss in performance.
However, naively extracting the neighborhood around each pixel followed by dimensionality reduction does not reduce computational time. To address this, the proposed algorithm performs dimensionality reduction for all pixels simultaneously using shift operations and joint sampling, storing the results in the third dimension.
The process begins with selecting two initial parameters: the original kernel size (OK) and the reduced vector size (RV). According to the sampling step (OK/RV), sampling is performed around each pixel in both row and column directions. The mean of these two directions is computed and stored as a feature vector of length RV in the third dimension.
A key implementation detail is that direct shift operations on the original image are unreliable due to two primary issues:
Radiometric non-stationarity around pixels caused by elevation differences.
High radiometric frequency variations in some image regions that may violate the Nyquist sampling criterion, resulting in inaccurate neighborhood representation.
To mitigate these issues, both images are first smoothed radiometrically using a linear low-pass mean filter before sampling. A linear low-pass filter ensures a uniform shift (linear phase response) across the image without altering the elevation-induced nonlinear shifts. While this filter removes high-frequency radiometric noise, it does not eliminate high-frequency elevation features such as fine topographic changes.
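The construction described above (mean filtering, then joint sampling of all pixels by shifting) might be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: the names `mean_filter` and `build_feature_cube`, the default parameter values, the symmetric placement of the RV samples around the pixel, and the wrap-around border handling of `np.roll` are all assumptions made for illustration.

```python
import numpy as np

def mean_filter(img, size):
    """Box (mean) filter with an odd window size and replicated borders."""
    pad = size // 2
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dr in range(size):
        for dc in range(size):
            out += padded[dr:dr + h, dc:dc + w]
    return out / (size * size)

def build_feature_cube(img, ok=25, rv=5, smooth=5):
    """Store RV samples of an OK-wide neighbourhood along the third axis."""
    step = ok // rv                               # sampling step OK/RV
    offsets = (np.arange(rv) - rv // 2) * step    # e.g. -10, -5, 0, 5, 10
    smoothed = mean_filter(img, smooth)           # low-pass before subsampling
    cube = np.empty(img.shape + (rv,), dtype=np.float64)
    for k, off in enumerate(offsets):
        along_rows = np.roll(smoothed, -off, axis=0)  # value at (r + off, c)
        along_cols = np.roll(smoothed, -off, axis=1)  # value at (r, c + off)
        cube[..., k] = 0.5 * (along_rows + along_cols)  # mean of both directions
    return cube
```

Each slice of the cube is produced by two whole-image shifts, so the cost of building the feature vectors grows with RV rather than with the number of elements in the original OK × OK kernel. Note that `np.roll` wraps around at the image borders; a production version would pad or mask those pixels.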
To clarify this, consider a homogeneous region in an image with similar gray levels but substantial elevation variation. Radiometric smoothing may reduce gray-level contrast, potentially obscuring fine elevation changes. However, since elevation-induced variations also influence radiometric values—particularly along the column direction—the algorithm incorporates a two-stage design. This design ensures that initial smoothing does not significantly hinder the detection of fine elevation features.
Figure 14 illustrates the process of feature vector creation in both the conventional and the proposed algorithm.
2.3.3. Constructing the Cost Cube Using Shift Operations
Typically, the search region in the secondary image determines the size of the third dimension in the cost cube. This region can be extracted either manually or automatically. If prior knowledge about the expected elevation variations is available, a search region can be manually defined, usually with a slight overlap to ensure that constraints on the search space do not degrade the accuracy of the algorithm’s output. Alternatively, the search range can be estimated automatically using the phase correlation method, as described in Equation (6).
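$$PC=\mathrm{IFFT}\left(\frac{\mathrm{FFT}(I_1)\cdot\operatorname{conj}\left(\mathrm{FFT}(I_2)\right)}{\left|\mathrm{FFT}(I_1)\cdot\operatorname{conj}\left(\mathrm{FFT}(I_2)\right)\right|}\right) \qquad (6)$$

Here, $I_1$ and $I_2$ are the input images, FFT represents the Fourier Transform, IFFT is its inverse, and conj(⋅) is the complex conjugate operator. For images with a constant shift, the phase correlation function exhibits a sharp peak at the shift location. However, in stereo images, the correlation exhibits a broader region of high values, reflecting the range and mean of elevation differences across the image, as shown in Figure 15.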
Since the stereo images are rectified, displacement occurs only along the horizontal (column) direction, making only the first row of the phase correlation output relevant. To extract a reliable search range from this row, a moving maximum filter is applied to smooth out discontinuities and introduce overlap. Then, a threshold is used to determine the lower and upper bounds of the disparity range.
To extract the relevant disparity range, a binary mask is created by thresholding the first row of the phase correlation image (because the inputs are epipolar resampled, the relevant correlation appears only in this row) as follows:

$$M(c)=\begin{cases}1, & PC(1,c)>T\\ 0, & \text{otherwise}\end{cases}$$

The search area SA is then determined by

$$SA=\left[c_{\min},\,c_{\max}\right]$$

In this context, $M$ is the thresholded phase correlation image, $SA$ is the search range, and $c_{\min}$ and $c_{\max}$ denote the minimum and maximum column indices where the phase correlation exceeds the threshold $T$.
In the next step, the use of a large kernel size during the initial disparity estimation stage results in a broader full width at half maximum (FWHM) of the cost function. Note that the FWHM is defined as the distance between the two points at which a function that is symmetric about its maximum falls to half of its maximum value. This increased FWHM implies that fewer samples are required to extract the optimal disparity value, as the cost function becomes smoother and its maximum becomes easier to identify even at reduced resolution. Consequently, the cost function can be sampled at coarser intervals without sacrificing accuracy.
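The automatic search-range estimation described above (phase correlation, first row only, moving maximum, thresholding) can be sketched as follows. The function names `phase_correlation` and `search_range`, the relative threshold, the filter width, and the small constant `eps` are illustrative assumptions; the sign of the recovered shift depends on which image is taken as the reference.

```python
import numpy as np

def phase_correlation(img1, img2, eps=1e-12):
    """Inverse FFT of the normalized cross-power spectrum of two images."""
    cross = np.fft.fft2(img1) * np.conj(np.fft.fft2(img2))
    return np.real(np.fft.ifft2(cross / (np.abs(cross) + eps)))

def search_range(img1, img2, max_width=5, rel_threshold=0.5):
    """Lower and upper column shifts where the first PC row exceeds a threshold."""
    row = np.fft.fftshift(phase_correlation(img1, img2)[0])  # zero shift centred
    shifts = np.arange(row.size) - row.size // 2
    # moving maximum filter: widens peaks and adds overlap to the range
    pad = max_width // 2
    padded = np.pad(row, pad, mode="edge")
    windows = np.stack([padded[k:k + row.size] for k in range(max_width)])
    smoothed = windows.max(axis=0)
    mask = smoothed > rel_threshold * smoothed.max()   # binary mask M
    cols = np.flatnonzero(mask)
    return int(shifts[cols[0]]), int(shifts[cols[-1]])  # (c_min, c_max) as shifts
```

With this sign convention, a secondary image equal to the reference shifted by +s columns produces a peak at -s. For a real stereo pair the first row shows a broad plateau rather than a single peak, and the returned bounds bracket that plateau.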
To better illustrate this point,
Figure 16 compares the shape of the cost function for small (e.g., 5 × 5) and large (e.g., 125 × 125) kernel sizes in both original and reduced feature vector forms. As shown, larger kernels yield a smoother cost function with reduced noise and broader peaks, facilitating more efficient sampling.
Accordingly, instead of using a one-pixel step, a larger step size can be adopted in the disparity search process. Each sample’s index must be recorded separately, ensuring that the correct disparity value is retrieved even when step sizes exceed one pixel. A practical choice for the step size is the ratio of FWHMs for reduced and original kernels:
Here, the two widths are the half-maximum widths of the cost function generated using the reduced and original kernels, respectively. The step size s is typically chosen as 2 or 3 to avoid degrading the accuracy of optimal value extraction. In this way, the third dimension of the cost (similarity) model is reduced to one-half or one-third of its original size.
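As a rough illustration of this step-size rule, the sketch below measures the FWHM of two sampled cost curves and takes their ratio. The function names, the `max_step` cap, and the assumption that the cost curve has a zero baseline (so that "half maximum" is half of the peak value) are illustrative choices, not details from the text.

```python
import numpy as np


def fwhm(cost):
    """Width, in samples, of the run around the peak where cost >= half the peak.

    Assumes a non-negative cost curve with a zero baseline.
    """
    cost = np.asarray(cost, dtype=float)
    peak = int(np.argmax(cost))
    half = cost[peak] / 2.0
    left = peak
    while left > 0 and cost[left - 1] >= half:
        left -= 1
    right = peak
    while right < cost.size - 1 and cost[right + 1] >= half:
        right += 1
    return right - left


def step_size(cost_reduced, cost_original, max_step=3):
    """Step size s as the ratio of the two FWHMs, limited to 1..max_step."""
    ratio = fwhm(cost_reduced) / max(fwhm(cost_original), 1)
    return int(np.clip(round(ratio), 1, max_step))
```

Capping the result at 3 mirrors the statement that s is typically 2 or 3; a larger ratio would otherwise risk stepping over the peak.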
2.3.4. Refinement of the Disparity Model Using Dual-Mode Median Filtering with Gradient-Based Thresholding
After calculating the initial disparity model using Equations (3)–(9), a refinement step is applied to improve model continuity and suppress noise. This refinement is performed using a dual-mode median filtering strategy, which operates adaptively based on the local gradient of the disparity model. The primary objective is to eliminate abrupt jumps or discontinuities in the disparity surface that may result from matching errors or noise.
To begin, the gradient of the initial disparity model is computed in both the row and column directions to determine the elevation jump for each pixel. A threshold is then applied to this gradient to distinguish between regions with smooth transitions and those exhibiting abrupt disparities (e.g., object boundaries or matching artifacts). Depending on the gradient magnitude, the disparity values are selectively smoothed using one of two median filters:
In Equation (10), the terms are the refined disparity at pixel (i, j), the gradient magnitude of the disparity model at (i, j), the threshold value, and the two median filters. Empirical analysis suggests that a threshold value between 1 and 2 yields good continuity in the final disparity model. The smaller median filter is selected as 3 × 3 or 5 × 5, and the larger one is chosen with dimensions ranging from 9 × 9 to 11 × 11. The final gradient value is then calculated according to Equation (11).
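The dual-mode filtering idea can be sketched as below. The text does not state here which filter is applied on which side of the threshold, so the assignment used (small filter where the gradient is low, large filter where it exceeds the threshold) is an assumption, as are the function names, the default values, and the use of the Euclidean norm for the gradient magnitude.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(img, size):
    """Plain 2D median filter with edge padding (size must be odd)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


def refine_disparity(disp, threshold=1.5, small=3, large=9):
    """Gradient-guided choice between two median filters (illustrative)."""
    disp = np.asarray(disp, dtype=float)
    g_row, g_col = np.gradient(disp)         # row and column gradients
    grad_mag = np.hypot(g_row, g_col)        # assumed form of the magnitude
    gentle = median_filter(disp, small)
    strong = median_filter(disp, large)
    # Assumption: abrupt jumps get the larger filter, smooth areas the smaller.
    return np.where(grad_mag > threshold, strong, gentle)
```

On a gently sloping surface with an isolated spike, both the spike and its high-gradient neighbors are pulled back to the local median while the slope itself is preserved.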
2.3.5. Sub-Pixel Estimation of the Optimal Value of the Similarity Model
To achieve sub-pixel accuracy in disparity estimation, the proposed algorithm employs a quadratic interpolation model, which provides a reliable approximation of the cost function near its maximum value. The quadratic model used is defined as
In Equation (12), x represents the pixel location along the epipolar line (column number), while a, b, and c are the coefficients of the function.
To determine these coefficients, a least-squares fitting method with zero degrees of freedom is used, leveraging three known points around the peak of the cost function:
where the three cost values are, respectively, the maximum value of the cost function and its two lower values at the pixel with address (i, j), and t, u, and v are their corresponding positions along the epipolar line. Substituting these into the quadratic function yields the system of equations:
The maximum value at the sub-pixel level is equal to
By computing the inverse matrix parametrically, the values of a and b can be calculated, and the sub-pixel disparity can then be computed in parallel for all pixels:
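A minimal vectorized sketch of three-point quadratic peak interpolation is given below. It uses the standard closed form for a parabola through three equally spaced samples, with the vertex at x = -b / (2a); the function name and the (H, W, D) cost-volume layout are assumptions, and the closed form stands in for the parametric matrix inverse mentioned in the text.

```python
import numpy as np


def subpixel_peak(cost_volume, disparities):
    """Sub-pixel location of the cost maximum for every pixel.

    cost_volume: array of shape (H, W, D); disparities: D evenly spaced values.
    """
    cost_volume = np.asarray(cost_volume, dtype=float)
    disparities = np.asarray(disparities, dtype=float)
    depth = cost_volume.shape[2]

    # Index of the maximum, clipped so both neighbors exist.
    k = np.clip(np.argmax(cost_volume, axis=2), 1, depth - 2)
    idx = k[..., None]
    c_prev = np.take_along_axis(cost_volume, idx - 1, axis=2)[..., 0]
    c_peak = np.take_along_axis(cost_volume, idx, axis=2)[..., 0]
    c_next = np.take_along_axis(cost_volume, idx + 1, axis=2)[..., 0]

    # For unit spacing: a = (c_prev + c_next - 2 c_peak) / 2, b = (c_next - c_prev) / 2.
    denom = c_prev - 2.0 * c_peak + c_next   # equals 2a
    safe = np.where(denom == 0, 1.0, denom)
    offset = np.where(denom == 0, 0.0, 0.5 * (c_prev - c_next) / safe)

    step = disparities[1] - disparities[0]
    return disparities[k] + offset * step
```

Because every operation is an array operation, all pixels are processed at once, which matches the parallel evaluation described above.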
2.3.6. Image Warping Using the Refined Disparity Model and Repetition of the Similarity Volume Construction with a Small Kernel Size
At this stage of the algorithm, the cost volume construction process is repeated, but this time using a smaller kernel size to achieve finer disparity resolution. Before starting the process, the search image is warped using the initial refined disparity model obtained from the previous stage.
Without image warping, repeating the similarity calculation with a small kernel would necessitate re-evaluating the entire disparity search space at a step size of one pixel, leading to increased computational complexity and higher memory requirements. By contrast, warping the search image using the initial refined disparity model effectively aligns the stereo pair, significantly narrowing the search range required for matching and enabling more efficient and localized cost computation.
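The warping step can be illustrated with a simple row-wise resampling. This is a sketch under stated assumptions: the pair is rectified so that only columns move, linear interpolation is used, and the sign convention (the pixel at column j of the reference corresponds to column j + disparity in the search image) is chosen for illustration. The function name is hypothetical.

```python
import numpy as np


def warp_columns(search, disparity):
    """Resample each row of `search` at column j + disparity[i, j]."""
    search = np.asarray(search, dtype=float)
    h, w = search.shape
    cols = np.arange(w)[None, :] + np.asarray(disparity, dtype=float)
    cols = np.clip(cols, 0, w - 1)            # clamp samples at the image border
    left = np.floor(cols).astype(int)
    right = np.minimum(left + 1, w - 1)
    frac = cols - left
    rows = np.arange(h)[:, None]
    # Linear interpolation between the two nearest columns.
    return (1.0 - frac) * search[rows, left] + frac * search[rows, right]
```

After warping, the residual disparity between the reference image and the warped search image is small, so the second pass only needs to search a narrow interval around zero.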
The updated algorithm is outlined below:
1. Acquisition of stereo images (I1 and I2).
2. Selection of kernel sizes:
   - K1: initial (larger) kernel size.
   - K2: reduced kernel size.
3. Estimation of the sampling ratio K2/K1, followed by image smoothing using a mean filter applied in the frequency domain.
4. Conversion of 2D images to 3D by:
   - Sampling pixel neighborhoods using row and column averaging.
   - Storing reduced feature vectors in the third dimension.
5. Estimation of the search area (SA):
   - Apply phase correlation (PC) in the FFT domain.
   - Threshold the result to obtain a constrained disparity search interval.
6. Determination of the motion step (MS) from the structural shape of the cost function.
7. Matrix shifting:
   - Shift the reference image over the search image.
   - Compute the cost function at each candidate disparity using pixel-wise similarity.
8. Disparity estimation:
   - Identify the maximum of the cost function.
   - Store the corresponding disparity values as the initial disparity model (∆).
9. Disparity refinement:
   - Apply median filtering guided by disparity gradient thresholding.
   - Generate a refined disparity model (∆′).
10. Search image warping using ∆′:
   - Use the refined disparity model to realign the search image.
   - Reconstruct the data cube using a small kernel.
   - Set the motion step to 1 pixel.
   - Repeat the matching steps with the small kernel for higher precision.
11. Sub-pixel disparity estimation:
   - Fit a quadratic function to the cost values around each maximum.
   - Apply least-squares fitting to estimate sub-pixel disparity locations.
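The matrix-shifting and disparity-estimation steps in the outline can be sketched as follows. This is not the authors' implementation: it assumes the reduced feature vectors are already stored along the third axis of each image, uses a normalized correlation across that axis as the similarity, shifts the search features rather than the reference, and wraps around at the image border for brevity. All names are hypothetical.

```python
import numpy as np


def initial_disparity(feat_ref, feat_srch, d_min, d_max, step=1):
    """Build a cost volume by shifting and pick the best candidate per pixel.

    feat_ref, feat_srch: arrays of shape (H, W, K) holding per-pixel features.
    """
    # Sampled disparities are stored explicitly so that the correct value is
    # retrieved even when the step is larger than one pixel.
    candidates = np.arange(d_min, d_max + 1, step)

    def normalise(f):
        f = f - f.mean(axis=2, keepdims=True)
        return f / (np.linalg.norm(f, axis=2, keepdims=True) + 1e-12)

    a = normalise(np.asarray(feat_ref, dtype=float))
    b = normalise(np.asarray(feat_srch, dtype=float))

    cost = np.empty(a.shape[:2] + (candidates.size,))
    for k, d in enumerate(candidates):
        shifted = np.roll(b, -int(d), axis=1)   # shifted[:, j] = b[:, j + d]
        cost[:, :, k] = np.sum(a * shifted, axis=2)

    return candidates[np.argmax(cost, axis=2)], cost
```

The returned cost volume has one slice per sampled disparity, so a step of 2 or 3 shrinks its third dimension to one-half or one-third.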
This two-pass architecture—coarse-to-fine disparity estimation—enables the algorithm to achieve both robustness (in the initial pass with a large kernel) and high accuracy (in the refined pass with a small kernel), while maintaining computational efficiency through intelligent search space reduction using image warping.
To clarify the algorithmic procedure and illustrate the role of the associated equations, a diagram is provided in
Figure 17.
2.4. Expected Elevation Accuracy (Practical Perspective)
Before presenting the results, it is useful to review the theoretically achievable elevation accuracy derived from stereo imagery in radargrammetric and photogrammetric approaches. From a practical standpoint, the attainable elevation accuracy in these methods depends on four main factors: baseline (incidence angles), spatial resolution, radiometric resolution, and the accuracy of correspondence point extraction. Although deriving exact mathematical relationships—particularly for the latter two factors—is challenging, their general influence can be assessed to some extent. For illustration, the effects of two key factors (baseline and spatial resolution) are shown in
Figure 18 and
Figure 19, respectively.
Additionally, the main factors affecting elevation extraction accuracy are presented in Equations (20) and (21).
In Equation (20), the terms denote the elevation extraction accuracy, the coefficient associated with the baseline, a factor accounting for geometric distortions, the coefficient related to radiometric resolution, and the accuracy of correspondence point extraction. The remaining term indicates the attainable planimetric accuracy in the images. Since planimetric accuracy itself is affected by various factors such as ground pixel size and the Kell factor, Equation (21) replaces it with two terms: one representing the conversion effect from planimetric accuracy to pixel size, and P, a constant representing the nominal pixel size (in meters). The indices i and j correspond to the pixel’s spatial position in the image space, indicating that, in practice, elevation accuracy is spatially variable across the image.
While mathematically, improving correspondence extraction accuracy could reduce the influence of other factors, this is impractical for two main reasons. First, the tools required to verify the numerical accuracy of correspondence locations—typically a fraction of a pixel—are not readily available. Second, the aforementioned factors are not statistically independent but highly interdependent. For example, radiometric resolution and geometric distortions significantly impact correspondence extraction accuracy. Consequently, flight planning and sensor configuration during data acquisition are critical to achieving the desired accuracy, and post-processing methods have limited capability to compensate for these factors.
As previously noted, directly examining these relationships and isolating the influencing factors is extremely difficult due to their complexity and unknown dependencies. Therefore, a simplified estimation method can be adopted for assessing potential elevation accuracy. If the ground pixel size (in meters) of the stereo pair and the maximum elevation variation in the object space (in meters) are known, Equation (22) can be used to estimate the minimum extractable elevation step size. In this context, the elevation accuracy term is replaced by the minimal achievable elevation step size (larger values correspond to lower elevation accuracy):
In Equation (22), the inputs are the maximum elevation difference in the ground space (meters), the maximum and minimum operators applied to the disparity values in the image space (pixels), the nominal pixel size (meters), and γ, a fractional coefficient representing correspondence extraction accuracy. The output is the estimated minimum achievable elevation step size (meters). If disparities are measured manually, the best achievable accuracy is half a pixel, giving γ = 0.5. Equation (23) presents the relationship between elevation model resolution and elevation model accuracy. Although this estimate is approximate, using γ = 0.5 provides a reasonable expected value. While algorithmic matching methods can theoretically reach sub-pixel accuracies (as low as 0.1 pixel), validating such precision remains difficult. Therefore, adopting the more conservative operator-derived value (γ = 0.5) yields a more realistic expectation.
For example, in the present dataset, the maximum elevation difference in the ground space is approximately 2000 m, with a disparity range of about 55 pixels and a pixel size of 10 m. This results in an estimated elevation model resolution of roughly 18 m. Thus, under optimal conditions, an accuracy of around 18 m could be expected; however, SAR-specific distortions and lower radiometric resolution will likely degrade this theoretical limit.
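The arithmetic behind this example can be checked with a few lines. The exact form of Equation (22) is not reproduced here, so the sketch assumes the simplest reading that matches the quoted numbers: the elevation change per pixel of disparity is the maximum elevation difference divided by the disparity range, scaled by γ, with the pixel size cancelling out. The function and parameter names are hypothetical.

```python
def min_elevation_step(max_elev_diff_m, disparity_range_px, gamma=0.5):
    """Estimated minimum elevation step in meters (illustrative reading)."""
    # Meters of elevation represented by one pixel of disparity.
    metres_per_disparity_pixel = max_elev_diff_m / disparity_range_px
    # Scale by the correspondence accuracy, expressed as a fraction of a pixel.
    return gamma * metres_per_disparity_pixel


step = min_elevation_step(2000.0, 55.0)
print(round(step, 1))  # 18.2
```

With γ = 0.1, the sub-pixel figure quoted for algorithmic matching, the same inputs would give a step of about 3.6 m, which shows how strongly the estimate depends on the assumed matching accuracy.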
3. Results
This section presents the results obtained from the implementation of the proposed algorithm alongside the comparison methods (adaptive SGM variants and pyramidal NCC). The first subsection outlines the parameters used for each method in tabular form.
Since the images used are at the GRD level and are georeferenced, the final disparity models obtained from each method can be converted into DSMs (digital surface models). This is possible because the disparity values are directly related to the metric space of the images. To facilitate this conversion, the SAR images and the reference DSM (derived from aerial images) were first converted from the geographic coordinate system to a projected coordinate system (metric space). The WGS84 reference ellipsoid was selected, and the UTM projection was applied. Given that the stereo section of the image pair lies entirely within UTM Zone 38, both the SAR images and the reference DSM were exported in that zone. Because of the incidence angle differences along the range direction for the stereo pair, displacement and elevation in the image can be described by Equation (24):
In Equation (24), dp is the ground displacement (parallax) between two points in meters, the two angle terms are the incidence angles of the first and second sensors at the point of interest, and h is the elevation in ground space in meters. This relationship is why the SAR stereo pair has a higher elevation extraction precision in the far-range part of the images.
It should be noted that if SAR geometric models are used during the image normalization stage and the circular imaging geometry is not linearized, the incidence angles will influence the normalization. In such cases, normalized images maintain a linear relationship with the ground space, especially if imprecise elevation models are used. However, if simple models (e.g., affine transformations) are employed to project georeferenced images, the transformation from disparity space to absolute elevation becomes nonlinear, requiring the full intersection process (i.e., range and Doppler equations and coordinate transformations) for accurate results.
In this study, the images were projected assuming a constant elevation in the range direction on the WGS84 reference ellipsoid. With available incidence angle data for both images, Equation (24) can be used within the stereo area to convert disparity to elevation. Incidence angles for each range distance can be interpolated using the beginning and ending angle values provided in the image metadata.
To enhance accuracy and eliminate systematic errors in the DSM comparison, at least 50 control points were extracted from each of the stereo images. Due to the 10 m spatial resolution of the SAR images, Google Earth imagery (Google) was utilized to extract the control points, focusing on high reflectivity features such as small buildings, water reservoirs, or medium-sized rocks in mountainous areas. Of these, 35 points were used to model the elevation, and 15 points were used for testing.
To address potential temporal discrepancies between the aerial reference imagery and the SAR data, control points were carefully extracted from man-made structures visible in Google Earth imagery that matched the acquisition date of the SAR dataset as closely as possible. This approach minimized errors due to changes over time and ensured greater consistency between the reference and SAR data.
As portrayed in
Figure 20, even without explicit incidence angle data, a reasonably accurate ground–disparity relationship can be established if control points are uniformly distributed along the range direction of the entire stereo area.
In this simulation, exaggerated disparity values were used. However, smaller incidence angles—while increasing sensitivity to disparity-elevation conversion—can also reduce the absolute accuracy of the final DSM. Therefore, for radargrammetric purposes, incidence angles must be chosen to balance accuracy and minimize geometric distortion during the correspondence process.
Although opposite-side imaging can improve accuracy, it is rarely used due to geometric complications. The following subsections present the algorithm parameters and results from GCP-based analysis and DSM comparisons, both quantitatively and qualitatively.
The parameter settings for each algorithm were optimized through trial and error. While a full parameter sweep was impractical, the values were tuned to provide a fair comparison. The proposed algorithm uses the NCC similarity measure, whereas SGM-based methods use the SAD cost function. Despite preprocessing efforts to reduce speckle, SAR images still exhibit some fluctuations, which significantly affect the output of the MI measure. SAD, being less sensitive, was therefore preferred for SGM.
Large kernel sizes are more necessary for satellite SAR imagery due to weaker texture patterns. As demonstrated in [
17], TerraSAR images were resampled to 10 m resolution to mitigate this issue. In practice, high-resolution satellite images lack feature patterns at small scales, particularly over agricultural, forested, or mountainous areas with natural vegetation and grasslands; they exhibit very weak texture and can be considered homogeneous over areas smaller than roughly 10 to 15 m. Therefore, either the kernel size must be increased, or the image must be resampled to a coarser spatial resolution. This issue has also been examined for optical images in the present study, confirming the same point. For this reason, if small kernel sizes are used for the initial search range in correspondence estimation, the highly noisy behavior of the cost function will prevent the extraction of the correct optimal value. On the other hand, as shown in
Table 6, increasing the kernel size in local methods significantly increases computational complexity, which in practice limits their use.
3.1. Disparity Model Completeness
One of the key factors for evaluating the performance of correspondence algorithms is their success rate in correctly determining disparity values across all pixels in the image. This performance is typically quantified using the disparity model completeness parameter, which measures the proportion of pixels with valid disparity values. Assuming a total of P pixels in the stereo region and P_d as the number of pixels with valid disparity values, the completeness parameter C and bad pixel percentage B are defined as follows:
$$C = \frac{P_d}{P} \times 100, \qquad B = 100 - C$$
where C is the completeness (% of valid disparity values), B is the bad pixel percentage (% of invalid or missing disparity values), P is the total number of pixels in the stereo region, and P_d is the number of pixels with valid disparity values.
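The two percentages are straightforward to compute from a disparity map. The sketch below assumes that invalid or missing disparities are stored as NaN unless an explicit validity mask is supplied; that storage convention and the function name are illustrative assumptions.

```python
import numpy as np


def completeness(disparity, valid_mask=None):
    """Return (C, B): percentages of valid and of invalid disparity values."""
    disparity = np.asarray(disparity, dtype=float)
    if valid_mask is None:
        # Assumption: invalid or missing disparities are marked as NaN.
        valid_mask = np.isfinite(disparity)
    total = disparity.size                      # P
    valid = int(np.count_nonzero(valid_mask))   # P_d
    c = 100.0 * valid / total
    return c, 100.0 - c
```

For a 4 × 5 map with two missing values this gives C = 90 and B = 10.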
Table 7 presents the completeness percentages for each method across eight subsets of a high-resolution SAR stereo dataset (1596 × 4833 pixels).
Among all methods, SGM-3 (based on Canny edge detection) achieved the highest completeness across most subsets, producing smoother and denser disparity maps. The relatively lower percentage of SGM-2 (grayscale-based SGM) underscores the difficulty of disparity estimation in SAR imagery due to radiometric inconsistencies that hinder the detection of elevation jumps using intensity thresholds. Although the NCC algorithm yields high completeness, this result can be misleading, as completeness can be artificially inflated by increasing kernel size, which suppresses texture detail or allows larger absolute matching errors. A similar trend is observed in the proposed method, which can boost valid disparity counts through efficient vector encoding. Therefore, completeness alone is not a sufficient indicator of performance and should be interpreted in conjunction with accuracy-based metrics.
3.2. Processing Time Evaluation
Table 8 summarizes the total processing time for each algorithm, including a common preprocessing step (adaptive speckle reduction and histogram matching) lasting approximately 4 s for all methods.
The proposed algorithm demonstrates the lowest overall processing time.
Table 9 further details the breakdown of its temporal components. The results confirm the proposed method’s efficiency in both computational speed and disparity estimation density.
3.3. Quantitative and Qualitative Evaluation of Elevation Models
The performance of each algorithm is further assessed by comparing the elevation models derived from their disparity outputs to a reference DSM, using RMSE and standard deviation metrics.
Table 10 and
Table 11 present these results.
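The summary statistics used in this comparison can be computed as sketched below. The function name and the returned dictionary keys are hypothetical, and the sketch assumes both models are already on the same grid with missing cells stored as NaN; the 90th percentile of the absolute error is included because it is used later alongside the RMSE.

```python
import numpy as np


def elevation_error_stats(dsm, reference):
    """RMSE, standard deviation, and 90th percentile of |error| in meters."""
    diff = np.asarray(dsm, dtype=float) - np.asarray(reference, dtype=float)
    diff = diff[np.isfinite(diff)]              # ignore cells without data
    return {
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "std": float(np.std(diff)),
        "p90_abs": float(np.percentile(np.abs(diff), 90)),
    }
```

The RMSE includes any systematic offset between the two models, whereas the standard deviation does not, so a large gap between the two values points to a bias rather than to noise.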
The quantitative results obtained from each method, along with the residuals of the surface models relative to the reference model, are presented in
Figure 21 and
Figure 22, respectively. The proposed algorithm consistently achieves a low RMSE and standard deviation, confirming its effectiveness in generating accurate and stable elevation models. Visual inspection of residual maps (
Figure 23) further illustrates that while some localized artifacts remain, the proposed method produces smoother results compared to NCC, particularly in complex terrain.
Errors tend to increase in high-relief areas due to geometric distortions and texture variations intrinsic to SAR imaging.
Figure 23 provides a profile-based comparison along a representative image row, reinforcing these findings.
To validate geometric consistency, incidence angle corrections were applied, followed by establishing a linear transformation between disparity and elevation using at least 35 control points per image (plus 15 for validation), as shown in
Figure 24. This ensured the absence of systematic errors in the elevation models.
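The control-point step can be illustrated with an ordinary least-squares line fitted on one subset of points and checked on the rest. This is a simplified sketch: it fits a single global line and ignores the range-dependent incidence-angle correction applied beforehand, and the random split, the function name, and the parameters are assumptions.

```python
import numpy as np


def fit_disparity_to_elevation(disp_gcp, elev_gcp, n_train=35, seed=0):
    """Fit elevation = slope * disparity + intercept; report RMSE on held-out points."""
    disp_gcp = np.asarray(disp_gcp, dtype=float)
    elev_gcp = np.asarray(elev_gcp, dtype=float)
    order = np.random.default_rng(seed).permutation(disp_gcp.size)
    train, test = order[:n_train], order[n_train:]

    # First-degree polynomial fit on the modeling points.
    slope, intercept = np.polyfit(disp_gcp[train], elev_gcp[train], 1)

    # Independent check on the remaining points.
    residual = elev_gcp[test] - (slope * disp_gcp[test] + intercept)
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    return float(slope), float(intercept), rmse
```

With 50 control points and `n_train=35`, the remaining 15 points serve as the validation set, matching the split described above.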
Finally,
Figure 25 highlights disparity gaps in texture-poor regions (e.g., agricultural fields), where correspondence algorithms often fail due to homogeneous surface patterns. These gaps underscore the inherent challenges of SAR-based disparity estimation in low-texture scenes, which may require integration with other data sources (e.g., LiDAR or optical stereo) for completeness.
To facilitate a more insightful analysis of residual elevation errors from the reference surface model across different approaches, their error distributions have been visualized using histograms. These plots illustrate both the root mean square error (RMSE) and the 90th percentile of absolute error, while also depicting the overall statistical shape of the error distribution. As shown in
Figure 26, the outputs of most methods exhibit an approximately Rayleigh-like distribution. Ideally, a Gamma distribution is preferred, where the majority of errors concentrate near zero and the frequency of larger deviations decays exponentially. In contrast, the observed Rayleigh-like patterns in the current models show a peak (or mode) greater than zero but still demonstrate exponential decay in the tail. Accordingly, superior algorithms tend to produce error distributions with sharper peaks closer to zero and a more rapid exponential falloff, indicating enhanced performance. In the histogram model, the absolute error values are considered to avoid misleading interpretations caused by mean or mode values. This choice ensures a more robust visualization of error distribution, especially in cases where skewness or outliers might distort central tendency metrics. In addition, to assess the contribution of the preprocessing stage, the proposed algorithm was evaluated independently—without preprocessing—to highlight its role in reaching optimal accuracy. The result is illustrated in
Figure 27.
To thoroughly analyze the performance of the proposed algorithm and highlight the significance of its two-stage structure, a series of dense matching experiments have been conducted. These assessments emphasize the algorithm’s behavior during the matching process by employing feature vectors of varying lengths, as well as local matching techniques using kernels of different sizes—evaluated at both the first and second stages of the proposed algorithm. The obtained results are summarized in
Table 12 and
Table 13, respectively.
The performance of the proposed algorithm in comparison with existing methods on TerraSAR-X data is presented next.
Table 14 provides a detailed quantitative summary, including key evaluation metrics and the tuning parameters associated with each approach.
Figure 28 showcases the visual elevation models reconstructed by each algorithm, alongside the reference surface model. Consistent with the method applied to Sentinel-1 data, elevation error maps are separately illustrated in
Figure 29. To enable deeper statistical interpretation beyond standard metrics,
Figure 30 depicts the histogram-based distribution of absolute error values for each model. This visualization allows for comparative analysis of error concentration and tail behavior across methods.
4. Discussion
In the present study, an algorithm was proposed for extracting disparity models from stereo SAR intensity images using a radargrammetric approach [
64,
65,
66,
67,
68]. The proposed method, inspired by the basic local NCC algorithm and designed to address its structural and computational limitations, introduces a spatial–frequency domain mechanism for generating the cost volume in the correspondence algorithm. It exhibits low computational complexity and execution time while maintaining stable results in comparison to adaptive methods, and it could potentially be used in applications concerning ground deformation and land subsidence/uplift [
69,
70].
Since the core of the proposed algorithm for computing the cost volume relies on reducing the dimensionality of conventional kernels, it inherently shows low sensitivity to noise. Although a preprocessing step was added to all algorithms in this study for quantitative quality comparison, the proposed method, unlike the others, is capable of producing an output with limited elevation accuracy even without preprocessing—a capability not found in the other methods without such a step. As previously explained, this is because the proposed algorithm focuses more on the textural features surrounding each pixel rather than changing the similarity measure. Transforming the neighborhood structure of a pixel into a more compact and lightweight feature vector enables the storage of features in the third dimension for both the left and right images. This facilitates the real-time generation of the cost volume, similar to semi-global methods. Still, unlike semi-global and global pixel-based methods, the generated cost volume is not pixel-based and, in practice, does not require a cost aggregation step.
As discussed in the expected accuracy analysis (
Section 2.4), the achieved results fall short of ideal predictions due to multiple contributing factors. One major source of deviation stems from inherent geometric distortions present in radar imagery. As illustrated schematically, when the radar incidence angle approaches near-vertical orientations (close to 90°), the effective ground resolution in the range direction deteriorates significantly compared to the nominal sensor resolution. This causes aggregation of extended surface areas into individual pixels, leading to reduced spatial precision and overall matching accuracy. The degradation is particularly pronounced in mountainous or topographically complex regions.
Such structural limitations also influence the stereo-matching process, particularly the γ parameter in Equation (22). Although a nominal value of 0.5 is commonly used for estimating ideal accuracy, it cannot be assumed that all pixels conform to this level of precision. Among the influential parameters is the kernel size used during matching. To achieve optimal performance, both the proposed algorithm and conventional local methods are tested with varying kernel sizes, selecting the configuration that yields the best output. However, uniform kernel sizes across the image may not result in uniform accuracy. A kernel that performs optimally in one region may not be ideal elsewhere, particularly across heterogeneous landscapes.
This motivates the use of adaptive methods, where kernel dimensions are dynamically adjusted to compensate for local variations. While the proposed algorithm supports flexible kernel configurations, its current design—based on unified computation—does not accommodate per-pixel dynamic kernel adaptation within a single processing pass. This presents an opportunity for future work, where simultaneous optimization of kernel sizes across the image may further improve disparity estimation accuracy.
Although the proposed structure may be less effective than semi-global methods in regions with abrupt elevation changes, the integration of a dedicated post-processing stage helps suppress unwanted distortions in the final model. It is noteworthy that height model generation in areas with significant elevation variability—such as urban environments—using SAR intensity imagery has recently gained attention. Therefore, adapting dense matching algorithms to achieve performance comparable to those used in optical imagery warrants further investigation. As demonstrated in
Table 12 and
Table 13, leveraging an adaptively sized kernel window could address the trade-off between pixel-level matching precision and smoothing behavior. This approach, which may offer an alternative to the two-stage design, should be considered in future research. Additionally, due to the high computational load involved in both cost function computation and aggregation in the SGM algorithm, incorporating the proposed algorithm into its framework may retain its stability while significantly reducing computational complexity.
Parameter tuning in the proposed algorithm is significantly less complex than in other algorithms, primarily due to its two-stage architecture. In the first step, the focus is on significantly reducing the search space or disparity range by employing larger kernel sizes. In local algorithms, increasing the kernel size often imposes a hefty computational burden, particularly due to the instability of the surrounding texture and the invalidity of assuming a uniform disparity model for all pixels within a kernel. Moreover, vectorization or cost volume construction (as performed in the proposed algorithm) is typically not feasible in conventional local approaches. Even when implemented in two stages, superior results can be achieved by using a large kernel in the first stage followed by a smaller kernel in the second. The central concept of the proposed method is based on this principle, which, as discussed in earlier sections, commonly limits the applicability of local or adaptive methods, especially in real-time applications.
Up to now, most algorithms and methods for monitoring and detecting ground deformations—such as subsidence and landslides—have focused heavily on interferometric techniques. Given that, as mentioned earlier in the Introduction, interferometry poses several practical challenges, the proposed algorithm can contribute to this field from two perspectives: indirectly by supporting interferometric algorithms, particularly during the critical stages of co-registration and phase unwrapping, and directly as a standalone method.
Although radargrammetry may initially face limitations—especially when spatial displacements are within the centimeter or sub-centimeter range—similar to the development of Persistent Scatterer techniques in interferometry, radargrammetric approaches can also benefit from such features. These targets exhibit stable intensity values, in addition to consistent phase returns, and often appear as very bright pixels in the image. When the scatterer is located at a sub-pixel level, similar to what occurs in optical imagery due to limited spatial resolution and sensor constraints, the intensity affects not only the primary pixel but also its neighbors. Unlike typical SAR pixels, these effects are significant and meaningful. This phenomenon is known as the point spread function (PSF), or its frequency domain counterpart, the modulation transfer function (MTF) [
71,
72].
Due to PSF’s high sensitivity to sub-pixel displacements, this characteristic can be exploited for monitoring fine-scale spatial changes. However, accurate PSF estimation first requires reliable matching of similar features between stereo images, and also the use of the image’s original radiometric values (including speckle), as PSF is sensitive to grayscale levels. As previously discussed, the proposed algorithm is inherently suitable for application to raw images with speckle thanks to its use of a compact kernel, and it preserves gray-level consistency between stereo pairs by employing a linear-phase low-pass filter. This enables accurate extraction of the PSF and, consequently, the precise estimation of sub-pixel displacements in stable scatterers. Further exploration in this area could be highly valuable, especially considering that the latest sensors provide significantly higher spatial resolution capabilities.
5. Conclusions
This study introduces a novel algorithm for extracting corresponding pixels in SAR-intensity stereo images, aiming to balance the trade-off between output model quality and execution time. The performance of the proposed algorithm, when compared to adaptive methods (especially SGM), demonstrates its strong potential for generating disparity models from stereo images, particularly SAR intensity images. Compared to state-of-the-art techniques, the proposed algorithm achieves higher accuracy in areas with flatter topography and lower accuracy in mountainous regions. Since the algorithm was tested on datasets that included approximately equal proportions of both types of topography (in some cases, mountainous areas made up a larger portion of the image), the results can be considered highly reliable. Furthermore, the SAR data used in this study presents greater challenges than higher-resolution datasets, such as TerraSAR-X, due to two main factors: (1) lower radiometric resolution (Sentinel-1 images using the C-band versus TerraSAR-X images using the X-band), and (2) greater geometric distortion resulting from larger differences in incidence angles in the stereo pair. Therefore, it is reasonable to generalize the proposed algorithm to datasets with better spatial and radiometric resolution.
To enable a more rigorous evaluation, a pair of TerraSAR-X datasets was analyzed alongside the Sentinel-1 imagery, allowing the robustness and generalizability of the proposed algorithm to be assessed against existing methods. The results, both quantitative and qualitative, consistently demonstrated that the algorithm performs competitively with other approaches. Nonetheless, as noted in the discussion, structural refinements to the algorithm could further enhance its qualitative outcomes while substantially improving computational efficiency.
The proposed algorithm is primarily built upon computational complexity modeling, and it still requires further adaptability to diverse conditions (e.g., significant elevation discontinuities) to remain competitive with state-of-the-art methods. In particular, recent research trends show heightened interest in SAR imagery of high radiometric and spatial resolution as a promising alternative in scenarios where optical data or interferometric techniques face serious limitations. The core algorithm has been independently validated to assess its robustness and generalization capability. In future work, hybrid integration with semi-global models is envisioned. The aim is to leverage their dynamic performance for smoother elevation reconstruction, especially in areas with pronounced topographic variation, while significantly improving the efficiency of their cost function construction and aggregation, which remains their main computational bottleneck.
The very short time required to generate the cost volume also allows for the integration of a correction or editing stage, i.e., the inclusion of adaptive post-processing operations. Additionally, the method could serve as a preprocessor or initial stage for semi-global algorithms, potentially reducing the number of paths over which costs must be aggregated (e.g., the 16-path aggregation often used in SGM). This offers a promising direction for future research.
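To make the envisioned hybrid concrete, the sketch below applies the standard SGM path recurrence to a precomputed cost volume and lets the caller choose how many path directions are aggregated. It is a simplified illustration under stated assumptions, not the proposed algorithm: it works on a single scanline with at most two directions, the penalties `p1` and `p2` are arbitrary values, and the function names and sample costs are hypothetical.

```python
INF = float("inf")


def aggregate_path(costs, p1, p2):
    """Aggregate costs along one direction with the SGM recurrence.

    costs[x][d] is the matching cost of disparity d at pixel x.
    """
    aggregated = [list(costs[0])]  # first pixel has no predecessor
    for x in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d, cost in enumerate(costs[x]):
            best = min(
                prev[d],                                         # same disparity
                prev[d - 1] + p1 if d > 0 else INF,              # change of -1
                prev[d + 1] + p1 if d + 1 < len(prev) else INF,  # change of +1
                prev_min + p2,                                   # larger jump
            )
            row.append(cost + best - prev_min)
        aggregated.append(row)
    return aggregated


def disparities(costs, directions=(1, -1), p1=1.0, p2=4.0):
    """Sum aggregated costs over the given directions, then pick the minimum."""
    total = [[0.0] * len(row) for row in costs]
    for step in directions:  # 1 = left to right, -1 = right to left
        agg = aggregate_path(costs[::step], p1, p2)[::step]
        for x, row in enumerate(agg):
            for d, value in enumerate(row):
                total[x][d] += value
    return [min(range(len(row)), key=row.__getitem__) for row in total]


# Hypothetical scanline: disparity 1 is correct everywhere, but the raw
# costs at the middle pixel slightly favour disparity 2.
costs = [[5, 0, 5], [5, 0, 5], [5, 1, 0.5], [5, 0, 5], [5, 0, 5]]
raw = [min(range(len(row)), key=row.__getitem__) for row in costs]
one_path = disparities(costs, directions=(1,))
two_paths = disparities(costs, directions=(1, -1))
print(raw, one_path, two_paths)
```

In this toy case a single path already corrects the outlier, which is the kind of saving the text anticipates when the input cost volume is reliable. Whether fewer paths suffice on real SAR stereo pairs would have to be established experimentally.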