Dense Matching with Low Computational Complexity for Disparity Estimation in the Radargrammetric Approach of SAR Intensity Images

Jannati, Hamid; Valadan Zoej, Mohammad Javad; Ghaderpour, Ebrahim; Mazzanti, Paolo

doi:10.3390/rs17152693



¹ Department of Photogrammetry and Remote Sensing, Faculty of Geomatic Engineering, K. N. Toosi University of Technology, Tehran 19967-15433, Iran

² Department of Earth Sciences and CERI Research Centre, Sapienza University of Rome, Ple Aldo Moro, 5, 00185 Rome, Italy

³ NHAZCA s.r.l., Via Vittorio Bachelet, 12, 00185 Rome, Italy

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(15), 2693; https://doi.org/10.3390/rs17152693

Submission received: 12 June 2025 / Revised: 28 July 2025 / Accepted: 31 July 2025 / Published: 3 August 2025

(This article belongs to the Special Issue Advancing Synthetic Aperture Radar: Imaging, Processing, and Applications in Remote Sensing)


Abstract

Synthetic Aperture Radar (SAR) images and optical imagery have high potential for extracting digital elevation models (DEMs). The two main approaches for deriving elevation models from SAR data are interferometry (InSAR) and radargrammetry. Adapted from photogrammetric principles, radargrammetry relies on disparity model estimation as its core component. Matching strategies in radargrammetry typically follow local, global, or semi-global methodologies. Local methods can achieve high accuracy, but in low-texture SAR images they require large kernels, and their computational complexity grows quadratically with kernel size. Conversely, global and semi-global models produce more consistent and higher-quality disparity maps but are computationally more intensive than local methods with small kernels and require more memory (RAM). In this study, inspired by the advantages of local matching algorithms, a novel, computationally efficient model is proposed for extracting corresponding pixels in SAR intensity stereo images. To enhance accuracy, the proposed two-stage algorithm operates without an image pyramid structure. Notably, unlike traditional local and global models, the computational complexity of the proposed approach remains stable as the input size or kernel dimensions increase, while memory consumption stays low. Compared to a pyramid-based local normalized cross-correlation (NCC) algorithm and adaptive semi-global matching (SGM) models, the proposed method achieves accuracy comparable to adaptive SGM while reducing processing time by up to 50% relative to pyramid SGM and achieving a 35-fold speedup over the local NCC algorithm with an optimal kernel size. Validated on a Sentinel-1 stereo pair with a 10 m ground-pixel size, the proposed algorithm yields a DEM with an average accuracy of 34.1 m.

Keywords:

radargrammetry; dense matching; computational complexity; SAR stereo images; elevation model


1. Introduction

The emergence of Synthetic Aperture Radar (SAR) imagery has revolutionized geospatial information, particularly in photogrammetry and remote sensing. SAR offers distinct advantages over optical imaging, including all-weather, day-and-night operation, subsurface and surface-penetration capability, and complementary information that often surpasses conventional optical data. Among its key applications is the extraction of elevation information, such as in digital elevation models (DEMs) and digital surface models (DSMs) [1,2,3].

Two primary methodologies dominate elevation extraction from SAR data: interferometry (InSAR) and radargrammetry [4,5,6,7,8]. InSAR exploits the phase difference between two SAR images acquired from spatially separated sensor positions. The resulting interferogram—encoding elevation variations as phase values between 0 and 2π—suffers from inherent phase ambiguity. Resolving this ambiguity (phase unwrapping) is critical for deriving absolute elevation values and for mitigating SAR-specific speckle noise [9,10].

While InSAR can theoretically achieve high accuracy, practical implementation faces significant challenges [9,11,12,13]. Elevation-estimation accuracy is highly sensitive to the baseline distance between acquisitions: a larger baseline improves elevation precision but degrades coherence, particularly over homogeneous or vegetated regions (e.g., forest canopies). Precise, pixel-level registration of the image pair is also essential to ensure that phase differences correspond to the same ground targets. Phase unwrapping remains computationally intensive and often requires trade-offs between accuracy and processing time. Moreover, local matching during image registration further escalates computational demands, limiting real-time applicability [6].

Radargrammetry, an alternative approach, leverages intensity information rather than phase, drawing on photogrammetric principles [14,15,16,17,18,19,20]. As illustrated in Figure 1, same-side stereo imaging is preferred because of its radiometric consistency, avoiding distortions inherent in opposite-side configurations. Although SAR and optical imaging share geometric similarities, SAR’s ground-to-image projection is circular rather than perspective [15,21,22,23,24].

Although radargrammetric methods require appropriate acquisition conditions, they are less sensitive to imaging geometry than interferometry. Because they operate on intensity images, radargrammetric approaches can benefit from advances in photogrammetric matching techniques with suitable modifications. Given the significant progress in both sparse and dense matching methods, radargrammetry has garnered increasing attention as a viable tool for elevation extraction. It also provides auxiliary benefits for interferometry—improving image registration and supporting phase unwrapping—when used to generate complementary elevation data [6]. Additionally, radargrammetry facilitates the integration of SAR and optical data, enabling hybrid geospatial information systems.

Despite its potential, radargrammetry must contend with challenges less pronounced in optical stereo imaging. These include speckle noise [25], lower radiometric resolution, weaker texture patterns, and geometric distortions, such as foreshortening, layover, and shadowing. Radargrammetric elevation accuracy is highly sensitive to the precision of the pixel-matching process, particularly in same-side stereo configurations. Addressing these challenges begins with effective speckle reduction, which improves radiometric quality and facilitates more reliable matching.

Dense disparity maps generally require area-based matching; feature-based methods yield only sparse correspondences and must be supplemented by area-based techniques—particularly when working with rectified images that minimize the search space [26,27,28,29,30,31].

Feature-based correspondence techniques [32,33,34,35,36,37,38,39,40], beyond traditional dense matching and deep learning methods, offer a sparse alternative, particularly effective on raw or unrectified imagery. These methods first detect salient key points—such as corners, circular features, and edges—that appear in both reference and secondary images. Because these features often reside in high-frequency regions (Figure 2), they occupy only a small portion of the image, simplifying matching but limiting density.

This sparsity proves efficient for estimating initial geometric relationships or minimizing mismatch risk but inherently lacks coverage for full-image matching. Even when key points are artificially multiplied, computational costs escalate quadratically, making full-pixel comparison impractical. As a result, a small subset of points is used to derive epipolar geometry and resample images, narrowing the search space to a linear domain.

Moreover, defining robust descriptors for each point, which are often nonlinear owing to stability and invariance demands, is central to performance, especially when handling multi-sensor data. Applying such descriptors across all pixels for dense matching introduces memory and computational bottlenecks. Therefore, in elevation modeling via radargrammetry or photogrammetry, dense matching is typically deferred until after geometric rectification using feature-based methods, which excel in this preprocessing stage.

Over the years, numerous area-based matching methods have been developed. Broadly, these methods can be divided into two main groups: classical methods (from simple techniques to advanced hybrid models) and deep learning-based methods [42,43,44,45,46,47,48,49,50,51]. Deep learning-based approaches have achieved promising results; however, several considerations limit their effectiveness for SAR image processing. First, deep learning relies on large, labeled datasets for training. While such datasets are increasingly available for optical imagery, assembling extensive and labeled SAR datasets remains challenging. Second, in contrast to classification tasks—where the output space is discrete—dense disparity estimation involves a continuous, high-dimensional output space. Consequently, most deep learning-based stereo-matching methods are trained and tested on narrow, domain-specific datasets, limiting generalizability.

In contrast, area-based algorithms compute disparity maps directly from each image pair, eliminating dependence on prior training. These algorithms are generally implemented in two forms: local and global/semi-global [43]. The general structure of area-based matching models consists of four main components: (1) computation of a similarity (or cost) metric, (2) aggregation of similarity values, (3) disparity calculation, and (4) refinement of the disparity map.

In local methods, a square window centered on each pixel is used as the feature region. The corresponding pixel in the second image is identified by finding the most similar region. Because the window itself provides contextual information, an explicit cost-aggregation step is usually unnecessary. However, a well-known trade-off exists: small windows are desirable for precise localization, whereas larger windows help to reduce mismatches—particularly in low-texture or repetitive regions. This trade-off is critical in SAR intensity images, which often exhibit homogeneous textures and speckle noise.

Classical local methods include similarity metrics, such as normalized cross-correlation (NCC), sum of squared differences (SSD), and sum of absolute differences (SAD). Other metrics include the census transform [52,53,54], which converts local neighborhoods into binary vectors matched via the Hamming distance, and mutual information [55], which is effective when radiometric differences are present. Although local methods are computationally efficient—especially with small kernels—they often perform poorly in the low-texture regions common in satellite SAR imagery.
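The census transform and Hamming-distance matching described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code: the window radius, the edge padding, and the convention that the secondary pixel is q = p + d along the column (range) axis are assumptions, and `census_transform` and `hamming_cost` are hypothetical helper names.

```python
import numpy as np


def census_transform(img, radius=2):
    """Encode each pixel as a bit vector: is each neighbour in the
    (2r+1) x (2r+1) window darker than the centre pixel?"""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")  # replicate borders
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the centre pixel is not compared with itself
            neighbour = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            bits.append(neighbour < img)
    return np.stack(bits, axis=-1)  # shape (h, w, (2r+1)**2 - 1)


def hamming_cost(census_ref, census_sec, d):
    """Hamming distance between reference pixel (y, x) and secondary pixel (y, x + d).
    np.roll wraps around, so the |d| wrapped columns are not meaningful."""
    shifted = np.roll(census_sec, -d, axis=1)  # column x now holds secondary column x + d
    return np.count_nonzero(census_ref != shifted, axis=-1)
```

Because only the sign of each intensity comparison is kept, the census code is unchanged by any strictly increasing gray-level transform, which is one reason it tolerates radiometric differences between acquisitions.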

To address this limitation, global and semi-global methods [41,56] introduce regularization terms that enforce smoothness or penalize abrupt disparity changes [57]. While these methods improve accuracy, they are more resource-intensive in memory and computation. Multi-scale pyramid structures, image partitioning with overlaps and adaptive cost functions are commonly employed to optimize performance. In the context of SAR imagery, semi-global matching (SGM) has been adapted successfully. Studies such as [14,17] applied SGM to radargrammetry and reported improved performance with pyramid-based implementations and adaptive penalties (SGM-Gray and SGM-Canny).

The core concept proposed in this study is to leverage local methods, which are characterized by low computational complexity. Among dense matching techniques, local approaches theoretically offer the lowest complexity, and this attribute forms the foundation of the proposed algorithm (see Figure 3). Consequently, the method inherently exhibits lower computational demand than non-local alternatives. For a clearer overview of the core dense matching strategies, comparative summaries are provided in Table 1 and Table 2. These tables also help to place deep learning-based methods within the broader landscape of computational complexity.

Nevertheless, dense matching remains a major computational bottleneck. Selecting suitable comparison kernels and attaining sub-pixel accuracy dramatically increase processing time. Therefore, optimizing this step is essential. To solve this issue, the present study proposes a low-complexity radargrammetric algorithm for SAR intensity images.

As outlined in the proposed framework section, the algorithm is fundamentally built upon restructuring the input data from a 2D image space into a 3D volume (data cube), in which the kernel surrounding each pixel, after dimensionality reduction, is stored along the third axis. This enables rapid cost volume construction without relying on iterative procedures during implementation. Unlike sparse, feature-based matching methods, which operate on a limited set of candidate points, dense matching requires exhaustive pixel-wise evaluation. In such contexts, storing full features or descriptors along the third axis becomes infeasible due to memory constraints. Moreover, although feature extraction is typically designed for discriminability, applying it to large-scale pixel grids is impractical unless it can be expressed through linear operations. Accordingly, the proposed algorithm adopts a two-stage linear computation pipeline. The rationale for this design is to balance accuracy with computational and memory efficiency: by aggressively constraining the initial search space using a high-confidence heuristic, the second refinement stage can operate with minimal computational overhead while achieving improved precision.

The rest of this work is structured as follows: Section 2 describes the datasets and methods; Section 3 and Section 4 present results and discussion, respectively. Finally, Section 5 concludes this research.

2. Materials and Methods

The first subsection describes the datasets, their features, and the technical justification behind their selection. The second subsection outlines the theoretical foundations, particularly for comparative analysis, while the third subsection details the proposed algorithm and its implementation.

2.1. Data Used

To assess the performance of the proposed algorithm, this study utilizes both Sentinel-1 SAR imagery (Figure 4) and a TerraSAR-X stereo pair (Figure 5) in comparison with other techniques.

These images were selected because of their established efficacy in generating elevation models. The study area spans elevations of 1000 to 3300 m above sea level and encompasses diverse terrains, including the rugged Zagros Mountains and flat/semi-flat agricultural regions. Notably, the mountainous regions exhibit homogeneous texture (due to shrub cover) and significant speckle effects, in addition to geometric distortions.

Sentinel-1’s C-band images were chosen despite their lower radiometric and spatial resolution compared to X-band systems (e.g., TerraSAR-X/TanDEM-X; Figure 6). This selection intentionally increases the challenge for matching algorithms; successful performance here suggests strong generalizability to higher-resolution data.

Further, the large baseline of the dataset introduces incidence angle variations between corresponding pixels in the two images, exacerbating geometric distortions. The ground spatial resolution is 10 m (range and azimuth), covering a broader area than higher-resolution imagery. Although TerraSAR-X data (1–2 m) are often resampled to 10 m to mitigate speckle effects [17], the proposed algorithm can be adapted to higher-resolution data with minor modifications.

To assess algorithmic stability, the study area was divided into 8 subsets. Ground range detected (GRD) SAR data were directly utilized and georeferenced using basic ground models, thereby minimizing the need for epipolar resampling. Near-dense matching was conducted using NCC-based template matching, and the slight rotation between the stereo pairs was compensated for using a simple affine transformation. While this method may not be universally applicable to all stereo SAR intensity pairs, it proved effective in this study because only a small portion of the original scenes was used. The affine transformation approach successfully addressed the azimuth parallax with an accuracy of approximately one pixel (see Figure 7). However, in some datasets, residual displacements of several pixels may remain.
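An affine compensation of this kind can be estimated by least squares from tie points, for example the peak positions of the NCC template matches. The sketch below is a generic formulation under that assumption; `fit_affine`, `apply_affine`, and `rms_residual` are illustrative names, and the tie-point coordinates would come from the matching step rather than from this code.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points (N, 2) to dst points (N, 2).
    Returns a 2x3 matrix A such that dst ≈ A @ [x, y, 1]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])      # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape (3, 2)
    return params.T


def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to (N, 2) points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]


def rms_residual(A, src, dst):
    """Root-mean-square distance (in pixels) between transformed src and dst."""
    diff = apply_affine(A, src) - np.asarray(dst, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

With real tie points, `rms_residual` gives a quick check of whether the remaining azimuth parallax is at the one-pixel level reported above or whether several pixels of residual displacement remain.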

Although image rectification and search-space reduction are critical for accurate elevation modeling, these topics exceed the scope of the present study. Readers are referred to [27,28,58] for detailed discussion on epipolar resampling techniques in non-georeferenced SAR imagery.

Preliminary verification of azimuth displacement was performed using 2D NCC template matching, confirming the assumption of a 1D search space in the range direction. Compared to optical images, SAR intensity images generally pose fewer epipolarity challenges due to their inherent geometric stability, despite being more susceptible to geometric distortions resulting from their side-looking acquisition geometry. Nonetheless, exceptions to this general observation do exist.

For validation, a reference elevation model with a vertical accuracy of 2.5 m was employed, derived from nationwide aerial imagery acquired approximately 15 years ago. Since the focus is on non-urban areas (agricultural/mountainous regions), temporal elevation changes are negligible relative to the model's accuracy. As will be shown in the Results Section, the vertical errors of the generated outputs exceed 30 m; therefore, alternative elevation datasets such as SRTM could also be used for comparative purposes without significantly affecting the conclusions. Example subsets (A1–A4) and their elevation models (B1–B4) are illustrated in Figure 8.

In the second stage, a pair of TerraSAR-X stereo images was utilized to further analyze the performance of the proposed algorithm alongside other benchmarked methods. Acquisition parameters are listed in Table 3. As illustrated in Figure 9, severe distortions—especially across mountainous regions—have arisen due to the area’s topographic conditions. To mitigate these effects and reduce the influence on the output results, the normalized images used in the dense matching stage were epipolar resampled based on a DEM-driven approach [27].

Furthermore, given the presence of broad homogeneous regions and minor discrepancies in ground range resolution caused by differing incidence angles (particularly in topographically variable zones), both images were resampled to a common ground pixel size of 10 m. Additionally, the stereo pair and the corresponding reference surface elevation model were projected onto the WGS84 ellipsoid and UTM Zone 39. The selected study area spans elevation values ranging from 700 to 3000 m above sea level.

Since the elevation model used as the reference DSM for the Sentinel-1 image was acquired as a nationwide surface coverage, the same model was employed to assess the geolocation accuracy of the TerraSAR-X data over the Tehran region. Additionally, to enable epipolar resampling, an SRTM elevation model with a 30 m spatial resolution was utilized. As shown in Figure 9, layover-affected regions of the test area have been excluded from the evaluation section to facilitate more accurate result analysis.

2.2. State-of-the-Art Methods

In the present study, to compare the proposed algorithm with existing methods, a normalized cross-correlation-based method in a pyramid structure is applied to increase computational speed, alongside the basic and improved variants of the SGM algorithm, namely, SGM-const, SGM-Gray, and SGM-Canny. This section outlines the key components of the state-of-the-art algorithms. For a more in-depth examination, readers are referred to foundational works [17,56].

2.2.1. General Structure of Disparity Model Generation in Rectified Images

Across various elevation modeling algorithms, the generation of disparity models forms the core processing stage. Depending on whether the method is local or global, the following general steps are typically involved:

Step 1: Preprocessing of the input image pair (essential for SAR images).
Step 2: Selection of a similarity or dissimilarity (cost) metric, with parameter tuning.
Step 3: Construction of a cost volume model based on the chosen metric. In local models, this step uses neighborhood kernels; in global or semi-global models, it is calculated on a pixel-wise basis.
Step 4.1 (local models): Refinement of the cost volume model using adaptive or hybrid methods.
Step 4.2 (global/semi-global models): Cost aggregation with regularization to handle abrupt disparity variations.
Step 5.1: Generation of the final disparity model using either minimum cost or maximum similarity.
Step 5.2 (optional): Post-processing to refine the disparity model.
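Steps 3 and 5.1 of the local variant can be sketched with a windowed sum-of-absolute-differences (SAD) cost and a winner-takes-all decision. This is a simplified illustration rather than any of the benchmarked implementations: SAD stands in for the metric chosen in Step 2, the window mean is computed with `scipy.ndimage.uniform_filter`, and disparities are assumed to be non-negative integer shifts along the column (range) axis with q = p + d.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def sad_cost_volume(ref, sec, max_disp, kernel=5):
    """Step 3: C[d, y, x] = window-mean |ref(y, x) - sec(y, x + d)|; +inf where x + d leaves the image."""
    h, w = ref.shape
    volume = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        diff = np.abs(ref[:, : w - d] - sec[:, d:])  # pixel-wise cost on valid columns only
        volume[d, :, : w - d] = uniform_filter(diff, size=kernel, mode="nearest")  # window aggregation
    return volume


def winner_takes_all(volume):
    """Step 5.1: disparity with the minimum cost at each pixel."""
    return np.argmin(volume, axis=0)
```

In this layout the loop runs once per candidate disparity, while each iteration processes the whole image at once; the cost of each iteration grows with the window only through the box filter.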

In both local and semi-global approaches, the bulk of computational load lies in the cost volume calculation and aggregation stages, respectively. For semi-global models, although the initial pixel-wise cost volume calculation is computationally efficient, the aggregation stage significantly increases processing time. Due to RAM limitations, cost volume size must be constrained, necessitating pyramid-based or tile-based processing for large images.
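The memory pressure mentioned above is easy to quantify: a dense cost volume holds one value per pixel and candidate disparity. The helpers below estimate that footprint and split an image into overlapping tiles. They are hypothetical utilities, and the 4-byte (float32) cost, the tile size, and the overlap are illustrative assumptions rather than values used in this study.

```python
def cost_volume_bytes(rows, cols, n_disparities, bytes_per_value=4):
    """Memory needed for a dense H x W x D cost volume."""
    return rows * cols * n_disparities * bytes_per_value


def _tile_starts(length, tile, step):
    starts = list(range(0, max(length - tile, 0) + 1, step))
    if starts[-1] + tile < length:  # make sure the last tile reaches the border
        starts.append(length - tile)
    return starts


def tile_windows(height, width, tile=512, overlap=64):
    """Yield (row0, row1, col0, col1) windows covering the image with overlapping tiles."""
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    for r in _tile_starts(height, tile, step):
        for c in _tile_starts(width, tile, step):
            yield r, min(r + tile, height), c, min(c + tile, width)


# A 5000 x 5000 image with 128 candidate disparities in float32:
print(cost_volume_bytes(5000, 5000, 128) / 1e9, "GB")  # 12.8 GB
```

In practice, the matching window in the secondary image would also need to be widened by the disparity search range so that correspondences near tile edges stay in view, and results inside the overlap zone are usually discarded or blended.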

2.2.2. Normalized Cross-Correlation

Among various similarity metrics, NCC is employed due to its robustness. The similarity between pixel p in the reference image and q = p + d in the target image is calculated as

\gamma(p, q) = \frac{\left|\sum_{(i, j) \in W} (m_{i, j} - \mu_{m})(s_{i, j} - \mu_{s})\right|}{\sqrt{\sum_{(i, j) \in W} (m_{i, j} - \mu_{m})^{2}} \sqrt{\sum_{(i, j) \in W} (s_{i, j} - \mu_{s})^{2}}}

(1)

where

W: Neighborhood window;
m_i,j, s_i,j: Intensity values in the reference and matching images;
μ_m, μ_s: Mean intensities in W.
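Equation (1) translates directly into code. The sketch below is a minimal NumPy version with a small `eps` guard (an assumption) against zero-variance windows, which are common in homogeneous SAR regions; `match_along_range` is a hypothetical helper that scans candidate disparities along the range axis with q = p + d.

```python
import numpy as np


def ncc(ref_win, sec_win, eps=1e-12):
    """Eq. (1): absolute normalized cross-correlation of two equally sized windows."""
    m = ref_win.astype(float) - ref_win.mean()
    s = sec_win.astype(float) - sec_win.mean()
    denom = np.sqrt((m ** 2).sum()) * np.sqrt((s ** 2).sum())
    return float(abs((m * s).sum()) / max(denom, eps))


def match_along_range(ref, sec, y, x, half, disparities):
    """Return the disparity d (q = p + d along columns) with the highest NCC score."""
    win = ref[y - half:y + half + 1, x - half:x + half + 1]
    scores = [
        ncc(win, sec[y - half:y + half + 1, x + d - half:x + d + half + 1])
        for d in disparities
    ]
    return disparities[int(np.argmax(scores))]
```

Because of the absolute value in Eq. (1), a window that is an inverted copy of the reference (s = -m) also scores 1; this follows from the formula as written.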

For a pyramid-based implementation in which image dimensions are halved at each level, the maximum number of levels x is estimated as

x = \lfloor \log_{2}(n) \rfloor - \{7, 8\}

(2)

Here, n is the smaller image dimension (rows or columns), and ⌊·⌋ denotes the floor function. Empirically, the coarsest pyramid levels are often not used if their dimensions fall below a few hundred pixels. Therefore, 7 (for a minimum of 128 rows or columns) or 8 (for a minimum of 256 rows or columns) is subtracted. For example, an image with dimensions of about 5000 pixels (⌊log₂ 5000⌋ = 12) typically results in 4 to 5 usable pyramid levels. To transfer results between levels, image warping is generally applied before the matching process is repeated at the next finer resolution.
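Equation (2) and the level-by-level halving can be sketched as follows. Counting the full-resolution image as one of the x levels and using 2 × 2 block averaging for downsampling are assumptions made for illustration; `pyramid_levels` and `build_pyramid` are hypothetical names.

```python
import numpy as np


def pyramid_levels(n, min_exponent=7):
    """Eq. (2): floor(log2(n)) - min_exponent, with min_exponent = 7 or 8."""
    floor_log2 = int(n).bit_length() - 1  # exact floor(log2(n)) for n >= 1
    return max(floor_log2 - min_exponent, 0)


def build_pyramid(img, levels):
    """Full-resolution image first, then successive 2 x 2 block means."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2  # drop an odd row/column
        pyramid.append(prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyramid


print(pyramid_levels(5000, 7), pyramid_levels(5000, 8))  # 5 4
```

When moving from one level to the next finer one, disparities estimated at the coarse level are doubled before warping, because the pixel spacing halves.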

2.2.3. Semi-Global Matching Model

In SGM, the cost function extends the basic local similarity metric by incorporating regularization terms to enforce smoothness in the disparity map. This makes it a quasi-global method [59,60]. As depicted in Figure 10, the generated cost volume exhibits pseudo-random behavior; hence, regularization is necessary to obtain coherent disparity maps.

Minimizing such a global cost function is, in general, NP-hard [39], so it cannot be solved exactly in practical time for real images. The semi-global method [38] addresses this issue by approximating the solution within a reduced subspace of the original problem domain (one-dimensional paths through the image), thereby avoiding the need to solve the full global cost function.

The final cost function at pixel p with disparity d is defined as

S (p, d) = \sum_{i} L_{r_{i}} (p, d)

(3)

where L_{r_{i}}(p, d) is the aggregated cost along path r_i, i indexes the paths used, and S(p, d) is the final cost for pixel p at disparity d. Each path-based cost L_{r_{i}} is defined recursively as

L_{r} (p, d) = C (p, d) + \min (L_{r} (p - r, d), L_{r} (p - r, d - 1) + P_{1}, L_{r} (p - r, d + 1) + P_{1}, \min_{i} L_{r} (p - r, i) + P_{2}) - \min_{k} L_{r} (p - r, k)

(4)

where C(p, d) is the matching cost at pixel p, P₁ and P₂ are regularization parameters controlling disparity smoothness and edge preservation, and p − r refers to the preceding pixel along path r.
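The recursion in Equation (4) can be vectorized over image rows for a single horizontal direction, and Equation (3) then sums the directions. The sketch below assumes a cost volume of shape (H, W, D), shows only the left-to-right and right-to-left paths (full SGM normally aggregates more directions), and accepts P_2 either as a scalar or as a per-pixel map so that adaptive penalties can be plugged in. The function names are hypothetical.

```python
import numpy as np


def aggregate_left_to_right(cost, p1, p2):
    """Eq. (4) along r = (0, +1). cost: (H, W, D); p2: scalar or (H, W) map."""
    h, w, _ = cost.shape
    p2 = np.broadcast_to(np.asarray(p2, dtype=float), (h, w))
    agg = np.empty(cost.shape, dtype=float)
    agg[:, 0] = cost[:, 0]  # the first pixel on each path has no predecessor
    for x in range(1, w):
        prev = agg[:, x - 1]                        # L_r(p - r, .) for all rows, shape (H, D)
        prev_min = prev.min(axis=1, keepdims=True)  # min_k L_r(p - r, k)
        minus = np.full_like(prev, np.inf)
        plus = np.full_like(prev, np.inf)
        minus[:, 1:] = prev[:, :-1] + p1            # L_r(p - r, d - 1) + P1
        plus[:, :-1] = prev[:, 1:] + p1             # L_r(p - r, d + 1) + P1
        jump = prev_min + p2[:, x][:, None]         # min_i L_r(p - r, i) + P2(p)
        best = np.minimum(np.minimum(prev, minus), np.minimum(plus, jump))
        agg[:, x] = cost[:, x] + best - prev_min    # subtracting the minimum bounds the growth
    return agg


def sgm_two_paths(cost, p1, p2_fwd, p2_bwd=None):
    """Eq. (3) restricted to two paths; p2 maps are given in the original column order."""
    shape = cost.shape[:2]
    p2_fwd = np.broadcast_to(np.asarray(p2_fwd, dtype=float), shape)
    p2_bwd = p2_fwd if p2_bwd is None else np.broadcast_to(np.asarray(p2_bwd, dtype=float), shape)
    forward = aggregate_left_to_right(cost, p1, p2_fwd)
    backward = aggregate_left_to_right(cost[:, ::-1], p1, p2_bwd[:, ::-1])[:, ::-1]
    return forward + backward  # S(p, d); the disparity is argmin over the last axis
```

A property that follows from Eq. (4) and is useful for checking an implementation: each aggregated value exceeds the raw cost by at least 0 and at most P_2.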

The subtraction of the minimum term prevents unbounded cost accumulation along the path. The values of P_{1} and P_{2} are key to algorithm performance. Although these penalties can be defined for either a similarity or a cost function, cost metrics are more commonly used. According to [17], there are three typical formulations for P_{2}:

SGM-const (SGM-1), constant penalty:

P_{2}(p) = P_{2}^{0} = \text{constant}

In this method, all pixels in the image share the same P_{2} value.

SGM-Gray (SGM-2), intensity-gradient-based penalty:

P_{2}(p) = \max\left(\frac{P_{2}^{0}}{|I(p) - I(p - r)|}, P_{1}\right)

where I(p) is the grayscale intensity at pixel p.

SGM-Canny (SGM-3), edge-aware penalty using Canny edge detection:

P_{2}(p) = \begin{cases} P_{21} & \text{if } C(p) = 1 \\ P_{22} & \text{if } C(p) = 0 \end{cases}

where C(p) is the binary output of the Canny edge detector at pixel p, and P_{21} and P_{22} are the respective penalties.
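The three penalty formulations can be produced as per-pixel P_2 maps for a left-to-right path (r = (0, +1)) and passed to any path-aggregation routine for Equation (4). In this sketch, the small `eps` guard against zero gradients and the assumption that the Canny edge map is computed beforehand (for example with `skimage.feature.canny`) are illustrative choices, not the settings of [17]; the function names are hypothetical.

```python
import numpy as np


def p2_const(shape, p2_0):
    """SGM-const: the same P2 at every pixel."""
    return np.full(shape, float(p2_0))


def p2_gray(img, p2_0, p1, eps=1e-6):
    """SGM-Gray for r = (0, +1): P2(p) = max(P2^0 / |I(p) - I(p - r)|, P1)."""
    img = img.astype(float)
    grad = np.zeros_like(img)
    grad[:, 1:] = np.abs(img[:, 1:] - img[:, :-1])  # first column has no predecessor
    return np.maximum(p2_0 / np.maximum(grad, eps), p1)


def p2_canny(edges, p21, p22):
    """SGM-Canny: P21 on edge pixels (C(p) = 1), P22 elsewhere."""
    return np.where(np.asarray(edges, dtype=bool), float(p21), float(p22))
```

With the gray-level formulation, P_2 becomes very large in perfectly flat stretches (zero gradient), which strongly discourages disparity jumps there, whereas large intensity steps pull P_2 down towards P_1.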

For a more comprehensive understanding of the aforementioned methods, the reader is encouraged to consult the relevant literature.

2.3. Proposed Methodology

As mentioned in the Introduction, elevation model generation is highly dependent on the accurate estimation of disparity models in stereo images. A brief examination reveals that, although adaptive modification of the cost function can be computationally intensive, the primary source of complexity in both local and non-local models lies in the computation of the cost function itself. In dense matching, the similarity metric must be computed for all pixels in the image, and this process is heavily influenced by the size of the kernel used.

In contrast to close-range imagery, optical satellite and SAR intensity images typically exhibit complex, repetitive textures and irregular patterns. These characteristics necessitate the use of larger kernels, which in practice offer more robust neighborhood information for comparison and result in cost volumes with reduced noise. However, due to the influence of elevation variation on the spatial configuration of neighboring pixels, the positional consistency within a neighborhood—especially when using a large kernel—tends to be highly non-stationary.

To address this, the present algorithm computes the disparity value in two stages. In the first stage, a large kernel size is used to obtain an initial disparity estimate at the multi-pixel level. In the second stage, a refined sub-pixel disparity calculation is performed using a small kernel. The use of large kernels in the first stage presents a challenge, as it significantly increases both the computation time and memory requirements. To mitigate this, the proposed algorithm introduces a method to reduce the computational complexity of large kernels by converting them into a small feature vector. This approach decreases computation time while maintaining the accuracy necessary for downstream processing.
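The coarse-to-fine idea can be illustrated with a deliberately simple stand-in. In this hypothetical sketch, stage 1 uses a plain windowed SAD with a large kernel (not the reduced feature cube of the proposed method), and stage 2 searches only ±`radius` around the coarse estimate with a small kernel and then fits a parabola through the best cost and its two neighbours to obtain a sub-pixel value. Parabola fitting is a common sub-pixel choice assumed here; the paper's own refinement may differ.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def window_cost(ref, sec, disp, kernel):
    """Window-mean |ref(p) - sec(q)|, q = p + disp along columns; disp is a scalar or per-pixel map."""
    h, w = ref.shape
    rows, cols = np.indices((h, w))
    target = np.clip(cols + disp, 0, w - 1)  # clamp at the image border
    return uniform_filter(np.abs(ref - sec[rows, target]), size=kernel, mode="nearest")


def coarse_disparity(ref, sec, d_max, kernel=15):
    """Stage 1: integer disparity from a large-kernel cost, winner takes all."""
    costs = np.stack([window_cost(ref, sec, d, kernel) for d in range(d_max + 1)])
    return np.argmin(costs, axis=0)


def refine_disparity(ref, sec, coarse, radius=2, kernel=3):
    """Stage 2: small-kernel search in coarse +/- radius, then a parabola sub-pixel fit."""
    offsets = np.arange(-radius, radius + 1)
    costs = np.stack([window_cost(ref, sec, coarse + o, kernel) for o in offsets])
    best = np.argmin(costs, axis=0)
    interior = (best > 0) & (best < len(offsets) - 1)  # the parabola needs both neighbours
    idx = np.clip(best, 1, len(offsets) - 2)
    c0, c1, c2 = (np.take_along_axis(costs, (idx + k)[None], axis=0)[0] for k in (-1, 0, 1))
    denom = c0 - 2.0 * c1 + c2
    ok = interior & (denom > 0)
    frac = np.where(ok, 0.5 * (c0 - c2) / np.where(ok, denom, 1.0), 0.0)  # vertex offset in [-0.5, 0.5]
    return coarse + offsets[best] + frac
```

The second stage evaluates only 2 × `radius` + 1 candidates per pixel with a small window, so its cost is largely independent of the full disparity range handled in stage 1.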

Additionally, the method enables storage of neighborhood information in the third dimension, allowing for efficient computation of the three-dimensional cost volume. A simplified schematic of the proposed algorithm is presented in Figure 11. The following sections provide a detailed, step-by-step explanation of the algorithm.

2.3.1. Preprocessing

SAR intensity images are inherently affected by speckle noise due to the coherent nature of radar imaging. Since radargrammetry relies directly on intensity images, similar to photogrammetric approaches, reducing speckle noise is crucial, as it directly influences the reliability of the similarity metrics that form the basis of stereo-matching. Speckle degrades these metrics and poses serious challenges in subsequent stages.

Various algorithms have been developed to suppress speckle noise, with adaptive methods that preserve image texture receiving particular attention. In the present algorithm, two approaches [25,61,62] were used to reduce speckle noise. These algorithms were selected adaptively to balance noise suppression and texture preservation.
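The specific filters of [25,61,62] are not reproduced here. As a familiar point of reference only, the classic Lee filter for intensity images, one common adaptive speckle filter, can be sketched as below; the window size and the number of looks are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def lee_filter(img, size=7, looks=1.0):
    """Classic Lee filter for intensity images with multiplicative speckle."""
    img = img.astype(float)
    mean = uniform_filter(img, size=size)
    mean_sq = uniform_filter(img ** 2, size=size)
    var = np.maximum(mean_sq - mean ** 2, 0.0)  # local variance of the observed intensity
    cu2 = 1.0 / looks                           # squared speckle coefficient of variation
    signal_var = np.maximum((var - mean ** 2 * cu2) / (1.0 + cu2), 0.0)
    weight = np.where(var > 0, signal_var / np.where(var > 0, var, 1.0), 0.0)
    # weight is near 0 in homogeneous areas (output ~ local mean) and larger on strong texture
    return mean + weight * (img - mean)
```

The adaptive weight is what lets such filters smooth homogeneous regions while keeping more of the original signal around edges and texture, the balance described in the paragraph above.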

Furthermore, SAR intensity images acquired in stereo mode with a large baseline exhibit strong radiometric differences due to varying imaging angles. To address this, histogram matching is applied after speckle reduction to harmonize intensity distributions between the two images.
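Histogram matching maps each intensity of one image to the intensity with the same cumulative frequency in the other image. The NumPy sketch below is a minimal version of that standard technique; `match_histogram` is a hypothetical name, and scikit-image offers a ready-made alternative in `skimage.exposure.match_histograms`.

```python
import numpy as np


def match_histogram(source, reference):
    """Remap source intensities so that their empirical CDF follows that of reference."""
    _, src_index, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    mapped = np.interp(src_cdf, ref_cdf, ref_values)  # reference quantile for each source level
    return mapped[src_index].reshape(source.shape)
```

In the stereo setting, the secondary image would typically be matched to the reference image after speckle filtering, so that both enter the similarity computation with comparable intensity distributions.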

Note that similarity metrics, such as mutual information and the census transform, are particularly sensitive to noise, making speckle reduction a critical preprocessing step. The impact of speckle reduction and histogram matching is illustrated in Figure 12.

Table 4 and Table 5 summarize the parameters applied in both preprocessing stages for Sentinel-1 and TerraSAR-X, respectively.

2.3.2. Storing Pixel Neighborhood Information as a Feature Vector in the Third Dimension and Constructing a Data Cube

If the neighborhood information of each pixel is stored in the third dimension, the cost function can be computed at high speed using matrix shifts, as in the pixel-wise cost calculation of semi-global methods. However, even for a small kernel size, such as 5 × 5, the third dimension reaches a size of 25. As the kernel size increases, the memory required to store this neighborhood information becomes extremely large, rendering practical implementation infeasible.
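The memory problem can be made concrete. The hypothetical helper below stacks every pixel's full k × k neighbourhood along a third axis using shifted slices of an edge-padded image, which is the layout that makes shift-based cost computation fast, and a second helper reports the footprint for float32 storage (an assumed data type). An odd k is assumed so that the window is centred.

```python
import numpy as np


def neighborhood_stack(img, k):
    """(H, W, k*k) array whose third axis holds each pixel's k x k neighbourhood."""
    half = k // 2
    h, w = img.shape
    padded = np.pad(img, half, mode="edge")
    layers = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return np.stack(layers, axis=-1)


def stack_bytes(rows, cols, depth, bytes_per_value=4):
    """Memory for an H x W x depth array."""
    return rows * cols * depth * bytes_per_value


# Footprint of the full stack for a 1000 x 1000 image
for k in (5, 21, 51):
    print(f"k = {k:2d}: {stack_bytes(1000, 1000, k * k) / 1e9:.2f} GB")
# prints 0.10 GB, 1.76 GB and 10.40 GB for k = 5, 21 and 51
```

The footprint grows with k², which is why large kernels cannot be stored in full along the third axis.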

Since the neighborhood data forms the feature vector of a pixel, the main question arises: Is it necessary to retain information from all neighboring pixels for effective comparison?

Consider a 1000 × 1000 image containing one million pixels. For a kernel of size k × k and an image with n-bit radiometric resolution, the size of the feature space is given by Equation (5):

F = {(2^{n})}^{k \times k}

(5)

In this equation, F denotes the size of the feature space (the number of distinct k × k kernels that can occur), k is the kernel dimension, and n represents the radiometric resolution of the image in bits. Even for very small kernel sizes, this number becomes extremely large (for k = 3 and n = 8, F = 2^{72} ≈ 4.7 × 10^{21}, compared with only 10^{6} pixels), forming a hyperspace in which the vast majority of possible feature vectors are unpopulated. Consequently, the full kernel can be converted into a feature vector with smaller dimensions while still maintaining sufficient discriminatory power for pixel comparison.

To illustrate this, Figure 13 presents the normalized autocorrelation of a large-dimensional feature kernel with respect to its neighboring pixels, shown both in its original form and in a reduced form obtained through interpolation.

As observed, even after a significant reduction in kernel size, the similarity metric—especially near the autocorrelation peak—remains stable. Therefore, for cost function computation, reduced samples of the original kernel can be used without a major loss in performance.

However, naively extracting the neighborhood around each pixel followed by dimensionality reduction does not reduce computational time. To address this, the proposed algorithm performs dimensionality reduction for all pixels simultaneously using shift operations and joint sampling, storing the results in the third dimension.

The process begins with selecting two initial parameters: the original kernel size (OK) and the reduced vector size (RV). According to the sampling step (OK/RV), sampling is performed around each pixel in both row and column directions. The mean of these two directions is computed and stored as a feature vector of length RV in the third dimension.
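The sampling scheme can be sketched as follows. This is a hypothetical reading of the description, not the authors' implementation: RV offsets are spread evenly across the OK-pixel kernel, the smoothed image is shifted once along rows and once along columns for each offset, and the two shifted copies are averaged into one layer of the cube. The box-filter width (set here to the sampling step OK/RV) reflects the low-pass smoothing discussed next and is likewise an assumption.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def build_feature_cube(img, ok=31, rv=8, smooth=None):
    """Hypothetical reduced-neighbourhood data cube of shape (H, W, RV)."""
    if smooth is None:
        smooth = max(ok // rv, 1)  # assumed low-pass width ~ sampling step OK/RV
    low = uniform_filter(img.astype(float), size=smooth, mode="nearest")
    half = ok // 2
    offsets = np.rint(np.linspace(-half, half, rv)).astype(int)  # RV sample positions
    padded = np.pad(low, half, mode="edge")
    h, w = img.shape
    cube = np.empty((h, w, rv))
    for j, t in enumerate(offsets):
        along_cols = padded[half:half + h, half + t:half + t + w]  # low(y, x + t)
        along_rows = padded[half + t:half + t + h, half:half + w]  # low(y + t, x)
        cube[..., j] = 0.5 * (along_cols + along_rows)             # mean of both directions
    return cube
```

All pixels are processed together with RV whole-image slice operations, and the cube holds H × W × RV values instead of H × W × OK², for example 8 instead of 961 values per pixel for OK = 31 and RV = 8.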

A key implementation detail is that direct shift operations on the original image are unreliable due to two primary issues:

Radiometric non-stationarity around pixels caused by elevation differences.
High radiometric frequency variations in some image regions that may violate the Nyquist sampling criterion, resulting in inaccurate neighborhood representation.

To mitigate these issues, both images are first smoothed radiometrically using a linear low-pass mean filter before sampling. A linear low-pass filter ensures a uniform shift (linear phase response) across the image without altering the elevation-induced nonlinear shifts. While this filter removes high-frequency radiometric noise, it does not eliminate high-frequency elevation features such as fine topographic changes.
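The smoothing can be carried out in the frequency domain, as the algorithm outline later in this section specifies. The sketch below assumes an odd, square box kernel and circular (wrap-around) borders; mean_filter_fft is a hypothetical helper name. Because the box is centred on the origin and symmetric, the filter introduces no spatial shift.

```python
import numpy as np


def mean_filter_fft(image: np.ndarray, size: int) -> np.ndarray:
    """size x size mean (box) filter applied via the FFT (size must be odd).

    This is a circular convolution, so the image is treated as periodic.
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    half = size // 2
    idx = np.arange(-half, half + 1)
    kernel = np.zeros((rows, cols))
    # Box taps at wrapped positions around (0, 0): a centred kernel.
    kernel[np.ix_(idx % rows, idx % cols)] = 1.0 / size**2
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)))
```

For large windows the FFT route costs the same regardless of the box size, which fits the goal of keeping complexity independent of kernel dimensions.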

To clarify this, consider a homogeneous region in an image with similar gray levels but substantial elevation variation. Radiometric smoothing may reduce gray-level contrast, potentially obscuring fine elevation changes. However, since elevation-induced variations also influence radiometric values—particularly along the column direction—the algorithm incorporates a two-stage design. This design ensures that initial smoothing does not significantly hinder the detection of fine elevation features.

Figure 14 illustrates the process of feature vector creation in both the conventional and the proposed algorithm.

2.3.3. Constructing the Cost Cube Using Shift Operations

Typically, the search region in the secondary image determines the size of the third dimension in the cost cube. This region can be extracted either manually or automatically. If prior knowledge about the expected elevation variations is available, a search region can be manually defined, usually with a slight overlap to ensure that constraints on the search space do not degrade the accuracy of the algorithm’s output. Alternatively, the search range can be estimated automatically using the phase correlation method, as described in Equation (6).

PC(I_{1}, I_{2}) = \mathrm{IFFT}\left(\frac{\mathrm{FFT}(I_{1}) \cdot \mathrm{conj}(\mathrm{FFT}(I_{2}))}{\left|\mathrm{FFT}(I_{1}) \cdot \mathrm{conj}(\mathrm{FFT}(I_{2}))\right|}\right)

(6)

Here, I_{1} and I_{2} are the input images, FFT and IFFT denote the forward and inverse Fourier transforms, and conj(⋅) is the complex conjugate operator. For images related by a constant shift, the phase correlation function exhibits a sharp peak at the shift location. In stereo images, however, the correlation exhibits a broader region of high values that reflects the range and mean of the elevation differences across the image, as shown in Figure 15.
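Equation (6) maps directly onto NumPy's FFT routines. In this sketch, eps is an assumed small constant that guards against division by zero, and the function name is hypothetical.

```python
import numpy as np


def phase_correlation(i1: np.ndarray, i2: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Phase correlation surface of two equally sized images.

    Normalised cross-power spectrum, then the real part of its inverse FFT.
    """
    cross = np.fft.fft2(i1) * np.conj(np.fft.fft2(i2))
    return np.real(np.fft.ifft2(cross / (np.abs(cross) + eps)))
```

For a rectified pair only the first row of the returned surface is needed. In FFT ordering, column indices above half the image width correspond to negative shifts.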

Since the stereo images are rectified, displacement occurs only along the horizontal (column) direction, so only the first row of the phase correlation output (zero vertical shift) is relevant. To extract a reliable search range from this row, a moving maximum filter is first applied to smooth out discontinuities and introduce overlap. A threshold is then used to determine the lower and upper bounds of the disparity range.
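The moving-maximum and thresholding steps can be sketched as follows, given the first row of a phase correlation surface in FFT ordering. The text does not specify the moving-maximum window length, so the default of 5 is an assumption; the threshold fraction 0.05 follows Equation (7), here applied to the filtered row. The function name search_range is hypothetical, and it returns signed column shifts.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def search_range(pc_row: np.ndarray, window: int = 5, frac: float = 0.05):
    """Disparity search interval (c1, c2) from one phase-correlation row.

    pc_row is in FFT order (shift 0 at index 0, negative shifts at the end).
    `window` must be odd. Returns the smallest and largest signed shifts
    whose moving-max value exceeds frac * maximum.
    """
    row = np.fft.fftshift(np.asarray(pc_row, dtype=float))
    lags = np.arange(row.size) - row.size // 2  # signed shift per position
    half = window // 2
    padded = np.pad(row, half, mode="edge")
    smoothed = sliding_window_view(padded, window).max(axis=-1)
    mask = smoothed > frac * smoothed.max()
    return int(lags[mask].min()), int(lags[mask].max())
```

Taking the extreme indices, rather than only the largest connected block, keeps every disparity mode inside the interval, at the cost of a possibly wider search.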

Formally, a binary mask is created by thresholding the first row of the phase correlation image:

PCT(1, c) = \begin{cases} 1, & PC(1, c) > τ \\ 0, & PC(1, c) \leq τ \end{cases}, \quad τ = 0.05 \times \max_{c} PC(1, c), \quad \text{for all } c

(7)

The search area SA is then determined by

SA = [c_{1}, c_{2}], \quad c_{1} = \min\{c \mid PCT(1, c) = 1\}, \quad c_{2} = \max\{c \mid PCT(1, c) = 1\}

(8)

In this context, PCT is the thresholded phase correlation image, SA is the search range, and c_{1} and c_{2} denote the minimum and maximum column indices at which the phase correlation exceeds the threshold τ.

Next, the sampling density of the cost function can be reduced. Using a large kernel in the initial disparity estimation stage broadens the full width at half maximum (FWHM) of the cost function. The FWHM is defined as the distance between the two points on either side of the maximum at which a function that is symmetric about its maximum falls to half of its maximum value. A larger FWHM means that fewer samples are needed to extract the optimal disparity: the cost function becomes smoother, and its maximum remains easy to identify even at reduced sampling resolution. Consequently, the cost function can be sampled at coarser intervals without sacrificing accuracy.

To better illustrate this point, Figure 16 compares the shape of the cost function for small (e.g., 5 × 5) and large (e.g., 125 × 125) kernel sizes in both original and reduced feature vector forms. As shown, larger kernels yield a smoother cost function with reduced noise and broader peaks, facilitating more efficient sampling.
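The FWHM of a sampled cost curve can be measured by locating the half-maximum crossings on each side of the peak and interpolating linearly between samples. The sketch below does this and also evaluates the FWHM ratio that Equation (9) uses as the step size. The interpolation scheme and function names are assumptions, and the example uses synthetic triangular curves rather than real cost functions.

```python
import numpy as np


def fwhm(curve) -> float:
    """Full width at half maximum of a sampled, single-peaked curve.

    Walks outwards from the peak until the curve drops to half its maximum,
    then interpolates linearly between the bracketing samples. If a side
    never drops that far, the array end is used as the crossing.
    """
    y = np.asarray(curve, dtype=float)
    i = int(np.argmax(y))
    half = y[i] / 2.0
    left = i
    while left > 0 and y[left] > half:
        left -= 1
    right = i
    while right < y.size - 1 and y[right] > half:
        right += 1
    x_left = float(left)
    if left < i and y[left] <= half:
        x_left += (half - y[left]) / (y[left + 1] - y[left])
    x_right = float(right)
    if right > i and y[right] <= half:
        x_right -= (half - y[right]) / (y[right - 1] - y[right])
    return x_right - x_left


def step_length(curve_reduced, curve_original) -> float:
    """SL of Equation (9): FWHM of the reduced-kernel cost curve divided
    by the FWHM of the original-kernel cost curve."""
    return fwhm(curve_reduced) / fwhm(curve_original)


x = np.arange(101)
original = np.maximum(0.0, 10.0 - np.abs(x - 50))
reduced = np.maximum(0.0, 10.0 - np.abs(x - 50) / 3.0)
print(fwhm(original), fwhm(reduced), step_length(reduced, original))
```

In practice the ratio would be rounded to an integer step, and the text recommends keeping it at 2 or 3.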

Accordingly, instead of using a one-pixel step, a larger step size can be adopted in the disparity search process. Each sample’s index must be recorded separately, ensuring that the correct disparity value is retrieved even when step sizes exceed one pixel. A practical choice for the step size is the ratio of FWHMs for reduced and original kernels:

SL = \frac{FWHM_{RK}}{FWHM_{OK}}

(9)

Here, FWHM_{RK} and FWHM_{OK} denote the full widths at half maximum of the cost functions generated with the reduced and original kernels, respectively. The step size SL is typically chosen as 2 or 3 so that the accuracy of the optimal-value extraction is not reduced. In this way, the third dimension of the cost (similarity) volume is reduced to one-half or one-third of its original size.
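The coarse-step construction of the similarity volume can be sketched as follows. The sketch assumes NCC computed over the reduced feature vectors (the proposed algorithm uses NCC, as stated in Section 3) and a search range [c1, c2] obtained from Equation (8). Column shifts are circular (np.roll), which is a simplification at image borders. The candidate disparities are kept in an array so that each slice index of the volume maps back to its true disparity. Function names are hypothetical.

```python
import numpy as np


def ncc_volume(f1: np.ndarray, f2: np.ndarray, disparities) -> np.ndarray:
    """NCC similarity volume between two (rows, cols, rv) feature cubes.

    Slice k compares f1[r, c] with f2[r, c + disparities[k]].
    """
    a = f1 - f1.mean(axis=2, keepdims=True)
    a_norm = np.sqrt((a**2).sum(axis=2))
    vol = np.empty(f1.shape[:2] + (len(disparities),))
    for k, d in enumerate(disparities):
        b = np.roll(f2, -d, axis=1)  # b[r, c] == f2[r, c + d] (circular)
        b = b - b.mean(axis=2, keepdims=True)
        b_norm = np.sqrt((b**2).sum(axis=2))
        vol[..., k] = (a * b).sum(axis=2) / (a_norm * b_norm + 1e-12)
    return vol


def coarse_disparity(f1, f2, c1: int, c2: int, step: int) -> np.ndarray:
    """Sample [c1, c2] every `step` columns and keep, per pixel, the
    candidate disparity with the highest NCC."""
    disparities = np.arange(c1, c2 + 1, step)
    vol = ncc_volume(f1, f2, disparities)
    return disparities[np.argmax(vol, axis=2)]
```

With step = 2 the volume has roughly half as many slices as a one-pixel search over the same range, matching the reduction described above.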

2.3.4. Refinement of the Disparity Model Using Dual-Mode Median Filtering with Gradient-Based Thresholding

After calculating the initial disparity model using Equations (3)–(9), a refinement step is applied to improve model continuity and suppress noise. This refinement is performed using a dual-mode median filtering strategy, which operates adaptively based on the local gradient of the disparity model. The primary objective is to eliminate abrupt jumps or discontinuities in the disparity surface that may result from matching errors or noise.

To begin, the gradient of the initial disparity model is computed in both the row and column directions to determine the elevation jump for each pixel. A threshold is then applied to this gradient to distinguish between regions with smooth transitions and those exhibiting abrupt disparities (e.g., object boundaries or matching artifacts). Depending on the gradient magnitude, the disparity values are selectively smoothed using one of two median filters:

∆'p(i, j) = \begin{cases} MF_{1}(∆p)(i, j), & G(∆p(i, j)) < τ_{2} \\ MF_{2}(∆p)(i, j), & G(∆p(i, j)) \geq τ_{2} \end{cases}

(10)

In Equation (10), ∆'p(i, j) is the refined disparity at pixel (i, j), G(∆p(i, j)) is the gradient magnitude of the disparity model at (i, j), τ_{2} is the threshold value, and MF_{1} and MF_{2} are median filters. Empirical analysis suggests that a threshold value between 1 and 2 yields good continuity in the final disparity model. MF_{1} is a 3 × 3 or 5 × 5 median filter, and MF_{2} has dimensions between 9 × 9 and 11 × 11. The gradient magnitude is computed with backward differences as

G(∆p(i, j)) = \sqrt{(∆p(i, j) - ∆p(i - 1, j))^{2} + (∆p(i, j) - ∆p(i, j - 1))^{2}}

(11)
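Equations (10) and (11) can be sketched with NumPy as below. The threshold τ2 = 1.5 and the filter sizes 3 and 9 are picked from the ranges quoted above. Edge replication at the borders and the zero difference assumed for the first row and column are implementation choices, not details from the text. Function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(arr: np.ndarray, size: int) -> np.ndarray:
    """size x size median filter (size odd) with edge replication at borders."""
    half = size // 2
    padded = np.pad(arr, half, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


def gradient_magnitude(disp: np.ndarray) -> np.ndarray:
    """Equation (11) with backward differences; the first row and column are
    differenced against themselves, so their contribution is zero."""
    d_row = disp - np.vstack([disp[:1], disp[:-1]])
    d_col = disp - np.hstack([disp[:, :1], disp[:, :-1]])
    return np.sqrt(d_row**2 + d_col**2)


def refine_disparity(disp, tau2: float = 1.5, size1: int = 3, size2: int = 9):
    """Equation (10): small median MF1 where G < tau2, large median MF2
    where G >= tau2."""
    disp = np.asarray(disp, dtype=float)
    g = gradient_magnitude(disp)
    return np.where(g < tau2, median_filter(disp, size1), median_filter(disp, size2))
```

Both filtered images are computed in full and then selected per pixel, which keeps the code vectorized at the cost of some redundant work.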

2.3.5. Sub-Pixel Estimation of the Optimal Value of the Similarity Model

To achieve sub-pixel accuracy in disparity estimation, the proposed algorithm employs a quadratic interpolation model, which provides a reliable approximation of the cost function near its maximum value. The quadratic model used is defined as

f(x) = a x^{2} + b x + c

(12)

In Equation (12), x represents the pixel location along the epipolar line (column number), and a, b, and c are the coefficients of the function f.

To determine these coefficients, a least-squares fit with zero degrees of freedom (three equations in three unknowns, so the solution is exact) is used, based on three known points around the peak of the cost function:

\{(t(i, j), T(i, j)), \; (u(i, j), U(i, j)), \; (v(i, j), V(i, j))\}

(13)

where T, U, and V are, respectively, the maximum value of the cost function and the two lower values around it at pixel (i, j), and t, u, and v are their corresponding positions along the epipolar line. Substituting these into the quadratic function yields the system of equations:

\{\begin{matrix} T (i, j) = a t^{2} (i, j) + b t (i, j) + c \\ U (i, j) = a u^{2} (i, j) + b u (i, j) + c \\ V (i, j) = a v^{2} (i, j) + b v (i, j) + c \end{matrix}

(14)

In matrix form:

[\begin{matrix} T (i, j) \\ U (i, j) \\ V (i, j) \end{matrix}] = [\begin{matrix} t^{2} (i, j) & t (i, j) & 1 \\ u^{2} (i, j) & u (i, j) & 1 \\ v^{2} (i, j) & v (i, j) & 1 \end{matrix}] \times [\begin{matrix} a \\ b \\ c \end{matrix}]

(15)

[\begin{matrix} a \\ b \\ c \end{matrix}] = {[\begin{matrix} t^{2} (i, j) & t (i, j) & 1 \\ u^{2} (i, j) & u (i, j) & 1 \\ v^{2} (i, j) & v (i, j) & 1 \end{matrix}]}^{- 1} \times [\begin{matrix} T (i, j) \\ U (i, j) \\ V (i, j) \end{matrix}]

(16)

The sub-pixel location of the maximum is

∆ f (i, j) = \frac{- b (i, j)}{2 \times a (i, j)}

(17)

By inverting the matrix in closed form, a and b can be expressed directly in terms of the sample values, so the sub-pixel disparity can be computed in parallel for all pixels:

a (i, j) = [- 1 / (t u + t v - u v - t^{2}), - 1 / (t u - t v + u v - u^{2}), 1 / (t u - t v - u v + v^{2})] \times [\begin{matrix} T (i, j) \\ U (i, j) \\ V (i, j) \end{matrix}]

(18)

b (i, j) = [(u + v) / (t u + t v - u v - t^{2}), (t + v) / (t u - t v + u v - u^{2}), - (t + u) / (t u - t v - u v + v^{2})] \times [\begin{matrix} T (i, j) \\ U (i, j) \\ V (i, j) \end{matrix}]

(19)
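Equations (17) to (19) vectorize naturally: with t, u, v and T, U, V stored as arrays, every pixel is solved at once. This is a minimal sketch with a hypothetical function name; pixels where a = 0 (a flat cost curve) would need to be guarded separately.

```python
import numpy as np


def subpixel_peak(t, u, v, T, U, V):
    """Vertex -b / (2a) of the parabola through (t, T), (u, U), (v, V).

    Uses the closed-form coefficients of Equations (18) and (19); inputs may
    be scalars or arrays of the same shape.
    """
    t, u, v, T, U, V = (np.asarray(x, dtype=float) for x in (t, u, v, T, U, V))
    dt = t * u + t * v - u * v - t**2
    du = t * u - t * v + u * v - u**2
    dv = t * u - t * v - u * v + v**2
    a = -T / dt - U / du + V / dv
    b = (u + v) * T / dt + (t + v) * U / du - (t + u) * V / dv
    return -b / (2.0 * a)
```

When the neighbours are the adjacent columns (u = t − 1, v = t + 1), the result reduces to the familiar t + (U − V) / (2(U − 2T + V)).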

2.3.6. Image Warping Using the Refined Disparity Model and Repetition of the Similarity Volume Construction with a Small Kernel Size

At this stage of the algorithm, the cost volume construction process is repeated, but this time using a smaller kernel size to achieve finer disparity resolution. Before starting the process, the search image is warped using the initial refined disparity model obtained from the previous stage.

Without image warping, repeating the similarity calculation with a small kernel would necessitate re-evaluating the entire disparity search space at a step size of one pixel, leading to increased computational complexity and higher memory requirements. By contrast, warping the search image using the initial refined disparity model effectively aligns the stereo pair, significantly narrowing the search range required for matching and enabling more efficient and localized cost computation.
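Warping with the refined disparity model amounts to resampling each row of the search image at disparity-shifted column positions. The sketch assumes the convention that reference pixel (r, c) matches search pixel (r, c + ∆′(r, c)) and uses linear interpolation via np.interp, which clamps positions beyond the row ends. The function name is hypothetical.

```python
import numpy as np


def warp_search_image(search: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Resample each row of the search image at c + disparity[r, c].

    Rectified images are assumed, so disparity acts along columns only.
    """
    search = np.asarray(search, dtype=float)
    cols = np.arange(search.shape[1], dtype=float)
    warped = np.empty_like(search)
    for r in range(search.shape[0]):
        warped[r] = np.interp(cols + disparity[r], cols, search[r])
    return warped
```

After warping, the remaining disparity should be small, so the second pass can use a narrow search interval with a one-pixel step; the width of that interval is a tuning choice that the text does not fix.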

The complete procedure is summarized below:

Acquisition of stereo images (I₁ and I₂)
Selection of kernel sizes:
  ○ K1: initial (larger) kernel size.
  ○ K2: reduced kernel size.
Estimation of the sampling ratio K2/K1, followed by image smoothing using a mean filter applied in the frequency domain
Conversion of the 2D images into 3D feature cubes by:
  ○ Sampling pixel neighborhoods using row and column averaging.
  ○ Storing the reduced feature vectors in the third dimension.
Estimation of the search area (SA):
  ○ Apply phase correlation (PC) in the FFT domain.
  ○ Threshold the result to obtain a constrained disparity search interval.
Determination of the motion step (MS) from the shape of the cost function
Matrix shifting:
  ○ Shift the reference image over the search image.
  ○ Compute the cost function at each candidate disparity using pixel-wise similarity.
Disparity estimation:
  ○ Identify the maximum of the cost function.
  ○ Store the corresponding disparity values as the initial disparity model (∆).
Disparity refinement:
  ○ Apply median filtering guided by disparity gradient thresholding.
  ○ Generate a refined disparity model (∆′).
Search image warping using ∆′:
  ○ Use the refined disparity model to realign the search image.
  ○ Reconstruct the data cube using the small kernel.
  ○ Set the motion step to 1 pixel.
  ○ Repeat the cost computation and disparity estimation steps with the small kernel for higher precision.
Sub-pixel disparity estimation:
  ○ Fit a quadratic function to the cost values around each maximum.
  ○ Apply least-squares fitting to estimate sub-pixel disparity locations.

This two-pass, coarse-to-fine architecture gives the algorithm both robustness (in the initial pass with a large kernel) and high accuracy (in the refined pass with a small kernel), while image warping keeps the search space, and hence the computational cost, small.

Figure 17 provides a diagram that clarifies the algorithmic procedure and the role of the associated equations.

2.4. Expected Elevation Accuracy (Practical Perspective)

Before presenting the results, it is useful to review the theoretically achievable elevation accuracy derived from stereo imagery in radargrammetric and photogrammetric approaches. From a practical standpoint, the attainable elevation accuracy in these methods depends on four main factors: baseline (incidence angles), spatial resolution, radiometric resolution, and the accuracy of correspondence point extraction. Although deriving exact mathematical relationships—particularly for the latter two factors—is challenging, their general influence can be assessed to some extent. For illustration, the effects of two key factors (baseline and spatial resolution) are shown in Figure 18 and Figure 19, respectively.

Additionally, the main factors affecting elevation extraction accuracy are presented in Equations (20) and (21).

σ_{i, j}^{Z} = α_{i, j}^{1} \cdot α_{i, j}^{2} \cdot β_{i, j}^{0} \cdot β_{i, j}^{1} \cdot σ_{i, j}^{H}

(20)

σ_{i, j}^{Z} = α_{i, j}^{1} \cdot α_{i, j}^{2} \cdot α_{i, j}^{3} \cdot β_{i, j}^{0} \cdot β_{i, j}^{1} \cdot P = P \cdot \left(\prod_{k} β_{i, j}^{k}\right) \cdot \left(\prod_{k} α_{i, j}^{k}\right)

(21)

In Equation (20), σ_{i, j}^{Z} denotes the elevation extraction accuracy, α_{i, j}^{1} represents the coefficient associated with the baseline, α_{i, j}^{2} accounts for geometric distortions, β_{i, j}^{0} is the coefficient related to radiometric resolution, and β_{i, j}^{1} reflects the accuracy of correspondence point extraction. The term σ_{i, j}^{H} indicates the attainable planimetric accuracy in the images. Since planimetric accuracy is itself affected by various factors such as ground pixel size and the Kell factor, Equation (21) replaces σ_{i, j}^{H} with two terms: α_{i, j}^{3}, representing the conversion from planimetric accuracy to pixel size, and P, a constant representing the nominal pixel size (in meters). The indices i and j correspond to the pixel's spatial position in image space, indicating that, in practice, elevation accuracy varies spatially across the image.

Mathematically, improving correspondence extraction accuracy could offset the influence of the other factors, but this is impractical for two main reasons. First, the tools required to verify the numerical accuracy of correspondence locations, typically a fraction of a pixel, are not readily available. Second, the aforementioned factors are not statistically independent but highly interdependent; for example, radiometric resolution and geometric distortions significantly affect correspondence extraction accuracy. Consequently, flight planning and sensor configuration during data acquisition are critical to achieving the desired accuracy, and post-processing methods have limited capability to compensate for these factors.

As previously noted, directly examining these relationships and isolating the influencing factors is extremely difficult because of their complexity and unknown dependencies. A simplified method can therefore be adopted to assess the potential elevation accuracy. If the ground pixel size of the stereo pair (in meters), the maximum elevation variation in object space (in meters), and the disparity range of the pair (in pixels) are known, Equation (22) can be used to estimate the minimum extractable elevation step size. In this context, σ_{i, j}^{Z} is replaced by \tilde{S}_{I_{1} I_{2}}^{Z}, the minimal achievable elevation step size (larger values correspond to lower elevation accuracy):

\tilde{S}_{I_{1} I_{2}}^{Z} \geq \frac{γ \times P \times ∆Z}{(\max(disp) - \min(disp)) \times P} = \frac{γ \times ∆Z}{\max(disp) - \min(disp)}, \quad γ \in [0, 1]

(22)

\tilde{S}_{I_{1} I_{2}}^{Z} \propto \frac{1}{\frac{1}{m n} \sum_{j = 1}^{n} \sum_{i = 1}^{m} σ_{i, j}^{Z}}

(23)

In Equation (22), ∆Z is the maximum elevation difference in ground space (meters); max(⋅) and min(⋅) denote the maximum and minimum operators, respectively; disp represents the disparity values in image space (pixels); P is the nominal pixel size (meters); and γ is a fractional coefficient representing the correspondence extraction accuracy (in pixels). Finally, \tilde{S}_{I_{1} I_{2}}^{Z} is the estimated minimum achievable elevation step size (meters). In Equation (23), the sums run over the m × n pixels of the stereo area. If disparities are measured manually, the best achievable accuracy is half a pixel, giving γ = 0.5. Equation (23) expresses the relationship between elevation model resolution and elevation model accuracy. Although this estimate is approximate, γ = 0.5 provides a reasonable expected value. Algorithmic matching methods can theoretically reach sub-pixel accuracies (as fine as 0.1 pixel), but validating such precision remains difficult; adopting the more conservative operator-derived value (γ = 0.5) therefore yields a more realistic expectation.

For example, in the present dataset, the maximum elevation difference in the ground space is approximately 2000 m, with a disparity range of about 55 pixels and a pixel size of 10 m. This results in an estimated elevation model resolution of roughly 18 m. Thus, under optimal conditions, an accuracy of around 18 m could be expected; however, SAR-specific distortions and lower radiometric resolution will likely degrade this theoretical limit.
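The roughly 18 m figure can be reproduced by dividing γ·∆Z by the disparity range counted in pixels: one pixel of disparity then spans about 2000/55 ≈ 36 m of elevation, and γ = 0.5 halves that. The helper name is hypothetical.

```python
def min_elevation_step(delta_z_m: float, disp_min_px: float, disp_max_px: float,
                       gamma: float = 0.5) -> float:
    """Elevation change (m) spanned by gamma pixels of disparity, assuming
    the elevation range maps linearly onto the disparity range (pixels)."""
    return gamma * delta_z_m / (disp_max_px - disp_min_px)


# Values quoted in the text: about 2000 m of relief over about 55 pixels.
step = min_elevation_step(2000.0, 0.0, 55.0)
print(f"{step:.1f} m")  # prints "18.2 m"
```

With γ = 0.1, the optimistic sub-pixel level mentioned above, the same call gives about 3.6 m.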

3. Results

This section presents the results obtained from the implementation of the proposed algorithm alongside the comparison methods, adaptive SGM and pyramidal NCC. The first subsection lists the parameters used for each method in tabular form.

Since the images used are at the GRD level and are georeferenced, the final disparity models obtained from each method can be converted into DSMs (digital surface models), because the disparity values are directly related to the metric space of the images. To facilitate this conversion, the SAR images and the reference DSM (derived from aerial images) were first converted from the geographic coordinate system to a projected (metric) coordinate system: the WGS84 reference ellipsoid was selected, and the UTM projection was applied. Because the stereo section of the image pair lies entirely within UTM Zone 38, both the SAR images and the reference DSM were exported in that zone. Because the two acquisitions have different incidence angles along the range direction, the relationship between ground displacement and elevation can be described by Equation (24):

h = \frac{dp}{\tan(Ω_{1}) - \tan(Ω_{2})}

(24)

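In Equation (24), dp is the ground displacement (parallax) between two points in meters, Ω_{1} and Ω_{2} are the incidence angles of the first and second sensors at the point of interest, and h is the elevation in ground space in meters. This is why the SAR stereo pair offers higher elevation extraction precision in the far-range part of the images: because tan(⋅) grows increasingly fast with angle, for a roughly constant angular separation the denominator |tan(Ω_{1}) − tan(Ω_{2})| is larger at far range (larger incidence angles), so a given parallax error maps to a smaller elevation error.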

It should be noted that if SAR geometric models are used during the image normalization stage and the circular imaging geometry is not linearized, the incidence angles will influence the normalization. In such cases, normalized images maintain a linear relationship with the ground space, especially if imprecise elevation models are used. However, if simple models (e.g., affine transformations) are employed to project georeferenced images, the transformation from disparity space to absolute elevation becomes nonlinear, requiring the full intersection process (i.e., range and Doppler equations and coordinate transformations) for accurate results.

In this study, the images were projected assuming a constant elevation in the range direction on the WGS84 reference ellipsoid. With available incidence angle data for both images, Equation (24) can be used within the stereo area to convert disparity to elevation. Incidence angles for each range distance can be interpolated using the beginning and ending angle values provided in the image metadata.
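Converting disparity to elevation with Equation (24) requires the ground parallax and the two incidence angles for each range column. The sketch below assumes the disparity is measured in pixels of the projected images (so dp is the disparity times the pixel size), that the angles vary linearly between the first and last values in the metadata, and that angles are given in degrees. The angle values in the usage lines are placeholders, not the Sentinel-1 metadata used in the study, and the function names are hypothetical.

```python
import numpy as np


def incidence_angles(first_deg: float, last_deg: float, n_cols: int) -> np.ndarray:
    """Incidence angle per range column, linearly interpolated (degrees)."""
    return np.linspace(first_deg, last_deg, n_cols)


def disparity_to_height(disp_px, pixel_size_m, omega1_deg, omega2_deg):
    """Equation (24): h = dp / (tan(Omega1) - tan(Omega2)), where the ground
    parallax dp is the disparity in pixels times the pixel size in meters."""
    dp = np.asarray(disp_px, dtype=float) * pixel_size_m
    return dp / (np.tan(np.radians(omega1_deg)) - np.tan(np.radians(omega2_deg)))


# Placeholder angles for a 5-column strip (not real metadata).
omega1 = incidence_angles(39.0, 46.0, 5)
omega2 = incidence_angles(30.0, 37.0, 5)
heights = disparity_to_height(np.full(5, 10.0), 10.0, omega1, omega2)
print(np.round(heights, 1))
```

With a fixed 9° separation, the same 10-pixel disparity corresponds to a smaller height toward far range, which illustrates the higher elevation sensitivity noted above.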

To enhance accuracy and eliminate systematic errors in the DSM comparison, at least 50 control points were extracted from each of the stereo images. Due to the 10 m spatial resolution of the SAR images, Google Earth imagery (Google) was utilized to extract the control points, focusing on high reflectivity features such as small buildings, water reservoirs, or medium-sized rocks in mountainous areas. Of these, 35 points were used to model the elevation, and 15 points were used for testing.

To address potential temporal discrepancies between the aerial reference imagery and the SAR data, control points were carefully extracted from man-made structures visible in Google Earth imagery that matched the acquisition date of the SAR dataset as closely as possible. This approach minimized errors due to changes over time and ensured greater consistency between the reference and SAR data.

As portrayed in Figure 20, even without explicit incidence angle data, a reasonably accurate ground–disparity relationship can be established if control points are uniformly distributed along the range direction of the entire stereo area.
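When incidence angles are not used, the disparity-to-height relationship can instead be fitted from control points spread along range. The sketch assumes a model in which both the scale and the offset vary linearly with the range column, h ≈ (a0 + a1·col)·disp + b0 + b1·col. This form is an assumption chosen to mimic slowly varying incidence angles, not the model used in the paper. It fits 35 points and checks the remaining 15, mirroring the split described above; the data are synthetic and the function names hypothetical.

```python
import numpy as np


def fit_disparity_to_height(disp, col, h):
    """Least-squares fit of h ~ (a0 + a1*col) * disp + b0 + b1*col.

    The col terms let the disparity-to-height scale and offset drift along
    range (assumed model form). Returns [a0, a1, b0, b1].
    """
    disp, col, h = (np.asarray(x, dtype=float) for x in (disp, col, h))
    design = np.column_stack([disp, col * disp, np.ones_like(disp), col])
    coeffs, *_ = np.linalg.lstsq(design, h, rcond=None)
    return coeffs


def predict_height(coeffs, disp, col):
    a0, a1, b0, b1 = coeffs
    disp, col = np.asarray(disp, dtype=float), np.asarray(col, dtype=float)
    return (a0 + a1 * col) * disp + b0 + b1 * col


# Synthetic control points (not real data): 35 for fitting, 15 for checking.
rng = np.random.default_rng(3)
disp = rng.uniform(-20.0, 35.0, 50)
col = rng.uniform(0.0, 4800.0, 50)
height = (36.0 + 0.001 * col) * disp + 50.0 + 0.01 * col
coeffs = fit_disparity_to_height(disp[:35], col[:35], height[:35])
check = predict_height(coeffs, disp[35:], col[35:]) - height[35:]
print("check-point RMSE:", np.sqrt(np.mean(check**2)))
```

The column-dependent terms are only identifiable if the control points cover the whole range direction, which is why their uniform distribution matters.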

In this simulation, exaggerated disparity values were used. However, smaller incidence angles—while increasing sensitivity to disparity-elevation conversion—can also reduce the absolute accuracy of the final DSM. Therefore, for radargrammetric purposes, incidence angles must be chosen to balance accuracy and minimize geometric distortion during the correspondence process.

Although opposite-side imaging can improve accuracy, it is rarely used due to geometric complications. The following subsections present the algorithm parameters and results from GCP-based analysis and DSM comparisons, both quantitatively and qualitatively.

The parameter settings for each algorithm were optimized through trial and error. While a full parameter sweep was impractical, the values were tuned to provide a fair comparison. The proposed algorithm uses the NCC similarity measure, whereas SGM-based methods use the SAD cost function. Despite preprocessing efforts to reduce speckle, SAR images still exhibit some fluctuations, which significantly affect the output of the MI measure. SAD, being less sensitive, was therefore preferred for SGM.

Large kernel sizes are particularly necessary for satellite SAR imagery because of its weaker texture patterns. As demonstrated in [17], TerraSAR images were resampled to 10 m resolution to mitigate this issue. In practice, high-resolution satellite images lack feature patterns at small scales; images of agricultural, forested, or mountainous areas with natural vegetation and grasslands in particular exhibit very weak texture and can be considered homogeneous at scales smaller than about 10 to 15 m. Therefore, either the kernel size must be increased or the image must be resampled to a coarser spatial resolution. This issue was also examined for optical images in the present study, with the same conclusion. For this reason, if small kernel sizes are used for the initial search range in correspondence estimation, the highly noisy behavior of the cost function will prevent extraction of the correct optimal value. On the other hand, as shown in Table 6, increasing the kernel size in local methods significantly increases computational complexity, which in practice limits their use.

3.1. Disparity Model Completeness

One of the key factors for evaluating the performance of correspondence algorithms is their success rate in correctly determining disparity values across all pixels in the image. This performance is typically quantified using the disparity model completeness parameter, which measures the proportion of pixels with valid disparity values. Assuming a total of P pixels in the stereo region and P_d as the number of pixels with valid disparity values, the completeness parameter C and bad pixel percentage B are defined as follows:

C = \frac{P_{d}}{P} \times 100, \qquad B = \left(1 - \frac{P_{d}}{P}\right) \times 100 = 100 - C, \qquad P_{d} \leq P

(25)

where C is the completeness (% of valid disparity values), B is the bad pixel percentage (% of invalid or missing disparity values), P is the total number of pixels in the stereo region, and P_d is the number of pixels with valid disparity values.
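Equation (25) reduces to counting valid entries. The sketch assumes that invalid or unmatched pixels are stored as NaN (np.isfinite also treats infinities as invalid); the function name is hypothetical.

```python
import numpy as np


def completeness(disparity: np.ndarray) -> tuple:
    """Completeness C and bad-pixel percentage B of Equation (25), with
    invalid disparities stored as NaN or inf."""
    valid = np.isfinite(disparity)
    c = 100.0 * valid.sum() / disparity.size
    return float(c), float(100.0 - c)
```

As the following discussion notes, a high C says nothing about whether the valid disparities are correct, so it should be read together with the accuracy metrics.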

Table 7 presents the completeness percentages for each method across eight subsets of a high-resolution SAR stereo dataset (1596 × 4833 pixels).

Among all methods, SGM-3 (based on Canny edge detection) achieved the highest completeness across most subsets, producing smoother and denser disparity maps. The relatively lower percentage of SGM-2 (grayscale-based SGM) underscores the difficulty of disparity estimation in SAR imagery due to radiometric inconsistencies that hinder the detection of elevation jumps using intensity thresholds. Although the NCC algorithm yields high completeness, this result can be misleading, as completeness can be artificially inflated by increasing kernel size, which suppresses texture detail or allows larger absolute matching errors. A similar trend is observed in the proposed method, which can boost valid disparity counts through efficient vector encoding. Therefore, completeness alone is not a sufficient indicator of performance and should be interpreted in conjunction with accuracy-based metrics.

3.2. Processing Time Evaluation

Table 8 summarizes the total processing time for each algorithm, including a common preprocessing step (adaptive speckle reduction and histogram matching) lasting approximately 4 s for all methods.

The proposed algorithm demonstrates the lowest overall processing time. Table 9 further details the breakdown of its temporal components. The results confirm the proposed method’s efficiency in both computational speed and disparity estimation density.

3.3. Quantitative and Qualitative Evaluation of Elevation Models

The performance of each algorithm is further assessed by comparing the elevation models derived from their disparity outputs to a reference DSM, using RMSE and standard deviation metrics. Table 10 and Table 11 present these results.

The quantitative results obtained from each method, along with the residuals of the surface models relative to the reference model, are presented in Figure 21 and Figure 22, respectively. The proposed algorithm consistently achieves a low RMSE and standard deviation, confirming its effectiveness in generating accurate and stable elevation models. Visual inspection of residual maps (Figure 23) further illustrates that while some localized artifacts remain, the proposed method produces smoother results compared to NCC, particularly in complex terrain.

Errors tend to increase in high-relief areas due to geometric distortions and texture variations intrinsic to SAR imaging. Figure 23 provides a profile-based comparison along a representative image row, reinforcing these findings.

To validate geometric consistency, incidence angle corrections were applied, followed by establishing a linear transformation between disparity and elevation using at least 35 control points per image (plus 15 for validation), as shown in Figure 24. This ensured the absence of systematic errors in the elevation models.

Finally, Figure 25 highlights disparity gaps in texture-poor regions (e.g., agricultural fields), where correspondence algorithms often fail due to homogeneous surface patterns. These gaps underscore the inherent challenges of SAR-based disparity estimation in low-texture scenes, which may require integration with other data sources (e.g., LiDAR or optical stereo) for completeness.

To facilitate a more insightful analysis of residual elevation errors from the reference surface model across different approaches, their error distributions have been visualized using histograms. These plots illustrate both the root mean square error (RMSE) and the 90th percentile of absolute error, while also depicting the overall statistical shape of the error distribution. As shown in Figure 26, the outputs of most methods exhibit an approximately Rayleigh-like distribution. Ideally, a Gamma distribution is preferred, where the majority of errors concentrate near zero and the frequency of larger deviations decays exponentially. In contrast, the observed Rayleigh-like patterns in the current models show a peak (or mode) greater than zero but still demonstrate exponential decay in the tail. Accordingly, superior algorithms tend to produce error distributions with sharper peaks closer to zero and a more rapid exponential falloff, indicating enhanced performance. In the histogram model, the absolute error values are considered to avoid misleading interpretations caused by mean or mode values. This choice ensures a more robust visualization of error distribution, especially in cases where skewness or outliers might distort central tendency metrics. In addition, to assess the contribution of the preprocessing stage, the proposed algorithm was evaluated independently—without preprocessing—to highlight its role in reaching optimal accuracy. The result is illustrated in Figure 27.
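The summary statistics used in this section (RMSE, standard deviation, and the 90th percentile of the absolute error), together with the absolute-error histogram, can be computed as below. The sketch assumes co-registered DSM arrays with NaN marking missing values; the bin count and function name are arbitrary choices.

```python
import numpy as np


def error_summary(dsm: np.ndarray, reference: np.ndarray, bins: int = 50) -> dict:
    """RMSE, standard deviation, 90th percentile of |error| and an
    absolute-error histogram between a DSM and a reference DSM."""
    err = (np.asarray(dsm, dtype=float) - np.asarray(reference, dtype=float)).ravel()
    err = err[np.isfinite(err)]  # drop pixels missing in either model
    abs_err = np.abs(err)
    counts, edges = np.histogram(abs_err, bins=bins)
    return {
        "rmse": float(np.sqrt(np.mean(err**2))),
        "std": float(np.std(err)),
        "p90_abs": float(np.percentile(abs_err, 90)),
        "hist": (counts, edges),
    }
```

Unlike the standard deviation, the RMSE also absorbs any mean bias, so the two differ when a DSM is systematically offset from the reference.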

To thoroughly analyze the performance of the proposed algorithm and highlight the significance of its two-stage structure, a series of dense matching experiments have been conducted. These assessments emphasize the algorithm’s behavior during the matching process by employing feature vectors of varying lengths, as well as local matching techniques using kernels of different sizes—evaluated at both the first and second stages of the proposed algorithm. The obtained results are summarized in Table 12 and Table 13, respectively.

The proposed algorithm was also compared with existing methods on TerraSAR-X data. Table 14 provides a detailed quantitative summary of this comparison, including key evaluation metrics and the tuning parameters associated with each approach. Figure 28 shows the elevation models reconstructed by each algorithm alongside the reference surface model. As for the Sentinel-1 data, the elevation error maps are shown separately in Figure 29. To enable statistical interpretation beyond the standard metrics, Figure 30 depicts the histogram of absolute error values for each model, allowing a comparison of error concentration and tail behavior across methods.

4. Discussion

In the present study, an algorithm was proposed for extracting disparity models from stereo SAR intensity images using a radargrammetric approach [64,65,66,67,68]. The proposed method, inspired by the basic local NCC algorithm and designed to address its structural and computational limitations, introduces a spatial–frequency domain mechanism for generating the cost volume in the correspondence algorithm. It exhibits low computational complexity and execution time while producing results that remain stable in comparison with adaptive methods. The proposed algorithm could potentially be used in various applications concerning ground deformation and land subsidence/uplift [69,70].

Since the core of the proposed algorithm for computing the cost volume relies on reducing the dimensionality of conventional kernels, it inherently shows low sensitivity to noise. Although a preprocessing step was added to all algorithms in this study for quantitative quality comparison, the proposed method, unlike the others, can still produce an output of limited elevation accuracy without preprocessing. As previously explained, this is because the proposed algorithm focuses on the textural features surrounding each pixel rather than on changing the similarity measure. Transforming the neighborhood structure of a pixel into a compact, lightweight feature vector allows the features of both the left and right images to be stored in the third dimension. This facilitates the real-time generation of the cost volume, similar to semi-global methods. Still, unlike semi-global and global pixel-based methods, the generated cost volume is not pixel-based and, in practice, does not require a cost aggregation step.

As discussed in the expected accuracy analysis (Section 2.4), the achieved results fall short of the ideal predictions for several reasons. One major source of deviation is the geometric distortion inherent in radar imagery. As illustrated schematically, when the viewing geometry approaches the vertical (an incidence angle near 0° measured from the vertical, or equivalently a depression angle near 90°), the effective ground resolution in the range direction deteriorates significantly relative to the nominal sensor resolution. The same effect occurs locally on slopes facing the sensor, which reduce the local incidence angle. Extended surface areas are then aggregated into individual pixels, reducing spatial precision and overall matching accuracy. The degradation is particularly pronounced in mountainous or topographically complex regions.
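The resolution loss described above follows from the standard flat-terrain relation between slant-range resolution ρ_r and ground-range resolution, ρ_g = ρ_r / sin(θ), evaluated at the local incidence angle θ = incidence − slope for slopes facing the sensor. The sketch below assumes this simple planar model (no Earth curvature, slope measured in the range plane) and uses a hypothetical 3 m slant-range resolution; it is not the paper's Section 2.4 analysis.

```python
import math


def ground_range_resolution(slant_res_m, incidence_deg, slope_deg=0.0):
    """Ground-range resolution of a surface tilted by slope_deg toward the radar.

    Flat-terrain relation rho_g = rho_r / sin(theta), evaluated at the local
    incidence angle theta = incidence - slope (slopes facing the sensor > 0).
    Returns math.inf when the local incidence angle is <= 0 (the slope is
    perpendicular to the beam or in layover, so no unique mapping exists).
    """
    theta_local = math.radians(incidence_deg - slope_deg)
    if theta_local <= 0.0:
        return math.inf
    return slant_res_m / math.sin(theta_local)


# Hypothetical 3 m slant-range resolution viewed at 35 deg incidence
for slope in (0, 10, 20, 30):
    print(f"slope {slope:2d} deg -> {ground_range_resolution(3.0, 35.0, slope):5.1f} m")
```

Under these assumptions the ground pixel grows from about 5 m on flat terrain to more than 30 m on a 30° fore-slope, which is the aggregation effect the paragraph describes.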

Such structural limitations also influence the stereo-matching process, particularly the γ parameter in Equation (22). Although a nominal value of 0.5 is commonly used to estimate the ideal accuracy, it cannot be assumed that all pixels are matched to this level of precision. One influential parameter is the kernel size used during matching. To achieve optimal performance, both the proposed algorithm and conventional local methods were tested with varying kernel sizes, and the configuration yielding the best output was selected. However, a kernel size that is uniform across the image does not necessarily yield uniform accuracy: a kernel that performs optimally in one region may not be ideal elsewhere, particularly across heterogeneous landscapes.
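Equation (22) is not reproduced in this section. As a hedged stand-in, the sketch below uses the textbook same-side relation for ground-range imagery, in which a point at height h is displaced by h·cot(θ), so the stereo disparity is h·(cot θ1 − cot θ2). A matching error of γ pixels of size Δ then maps to a height error of γ·Δ / |cot θ1 − cot θ2|. The incidence angles below are illustrative only and do not describe the paper's acquisitions.

```python
import math


def expected_height_std(gamma_px, pixel_m, inc1_deg, inc2_deg):
    """Height precision for same-side stereo in ground-range geometry.

    Assumes disparity = h * (cot(inc1) - cot(inc2)), so a matching error of
    gamma_px pixels maps to gamma_px * pixel_m / |cot(inc1) - cot(inc2)| metres.
    """
    cot1 = 1.0 / math.tan(math.radians(inc1_deg))
    cot2 = 1.0 / math.tan(math.radians(inc2_deg))
    return gamma_px * pixel_m / abs(cot1 - cot2)


# Illustrative incidence angles only (not the paper's acquisition geometry)
for gamma in (0.5, 1.0, 2.0):
    print(gamma, round(expected_height_std(gamma, 10.0, 34.0, 43.0), 1))
```

With γ = 0.5 and a 10 m pixel, this assumed geometry gives roughly 12 m. Because the relation is linear in γ, pixels matched only to one or two pixels degrade the height error proportionally.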

This motivates the use of adaptive methods, in which kernel dimensions are adjusted dynamically to compensate for local variations. Although the proposed algorithm supports flexible kernel configurations, its current design, based on a unified computation for the whole image, does not accommodate per-pixel kernel adaptation within a single processing pass. This presents an opportunity for future work, in which jointly optimizing kernel sizes across the image may further improve disparity estimation accuracy.
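One common heuristic for per-pixel kernel adaptation picks, for each pixel, the smallest window that contains enough texture. The sketch below is a hypothetical illustration of that heuristic, not part of the proposed algorithm. It measures texture as local intensity variance, and the candidate sizes and the `min_var` threshold are arbitrary choices. For SAR intensity with multiplicative speckle, a coefficient of variation may be a better texture measure.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_variance(img, k):
    """Variance of the k x k neighborhood around every pixel (reflect padding)."""
    r = k // 2
    padded = np.pad(img.astype(np.float64), r, mode="reflect")
    return sliding_window_view(padded, (k, k)).var(axis=(2, 3))


def select_kernel_sizes(img, sizes=(5, 9, 15, 21), min_var=0.01):
    """Per-pixel kernel size: the smallest window whose local variance reaches min_var.

    Pixels that never reach the threshold (textureless areas) keep the largest size.
    """
    chosen = np.full(img.shape, max(sizes), dtype=int)
    settled = np.zeros(img.shape, dtype=bool)
    for k in sorted(sizes):
        ok = (local_variance(img, k) >= min_var) & ~settled
        chosen[ok] = k
        settled |= ok
    return chosen


# Toy scene: flat left half, checkerboard texture on the right half
img = np.full((64, 64), 0.5)
yy, xx = np.mgrid[0:64, 32:64]
img[:, 32:] = (yy + xx) % 2
kernel_map = select_kernel_sizes(img)
```

Textured pixels receive the small 5×5 window, while the flat region falls back to the largest one. A matcher would then need to evaluate each pixel with its own window, which is exactly what a single unified pass does not allow.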

Although the proposed structure may be less effective than semi-global methods in regions with abrupt elevation changes, a dedicated post-processing stage helps suppress unwanted distortions in the final model. Notably, height model generation from SAR intensity imagery in areas with high elevation variability, such as urban environments, has recently gained attention. Adapting dense matching algorithms to achieve performance comparable to that obtained with optical imagery therefore warrants further investigation. As demonstrated in Table 12 and Table 13, an adaptively sized kernel window could address the trade-off between pixel-level matching precision and smoothing behavior; this approach, which may offer an alternative to the two-stage design, should be considered in future research. Additionally, because both cost function computation and cost aggregation impose a high computational load in the SGM algorithm, incorporating the proposed cost-volume generation into the SGM framework could retain SGM's stability while significantly reducing its computational complexity.
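To show where a precomputed cost volume would enter SGM, the sketch below implements the standard single-path aggregation recurrence of Hirschmüller [42] for one direction (left to right). Full SGM sums such aggregations over 8 or 16 directions. The penalties P1 and P2 and the toy cost volume are illustrative assumptions, and `aggregate_left_to_right` is a hypothetical name.

```python
import numpy as np


def aggregate_left_to_right(C, P1=0.1, P2=1.0):
    """One SGM path (r = left-to-right) on a cost volume C of shape (H, W, D):

    L(p, d) = C(p, d) + min(L(p-r, d), L(p-r, d-1) + P1, L(p-r, d+1) + P1,
                            min_k L(p-r, k) + P2) - min_k L(p-r, k)
    """
    H, W, D = C.shape
    L = np.empty_like(C, dtype=np.float64)
    L[:, 0, :] = C[:, 0, :]
    for x in range(1, W):
        prev = L[:, x - 1, :]                         # (H, D)
        prev_min = prev.min(axis=1, keepdims=True)    # (H, 1)
        up = np.full_like(prev, np.inf)
        down = np.full_like(prev, np.inf)
        up[:, 1:] = prev[:, :-1] + P1                 # neighbor came from d - 1
        down[:, :-1] = prev[:, 1:] + P1               # neighbor came from d + 1
        best = np.minimum(np.minimum(prev, up), np.minimum(down, prev_min + P2))
        L[:, x, :] = C[:, x, :] + best - prev_min
    return L


# Toy cost volume: true disparity 3 everywhere, one column with a misleading minimum
D, W = 8, 30
C = np.tile(0.5 * np.abs(np.arange(D) - 3.0), (1, W, 1))   # shape (1, W, D)
C[0, 15, :] = 1.0
C[0, 15, 3] = 0.3
C[0, 15, 6] = 0.0
raw = C.argmin(axis=2)
agg = aggregate_left_to_right(C).argmin(axis=2)
```

The pixel-wise minimum at column 15 points to disparity 6, whereas the aggregated cost restores disparity 3. This smoothing comes at the price of the per-path loops that the paragraph identifies as SGM's main computational load.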

Parameter tuning in the proposed algorithm is considerably simpler than in the other algorithms, primarily because of its two-stage architecture. The first stage uses larger kernel sizes to substantially reduce the search space, i.e., the disparity range. In local algorithms, increasing the kernel size imposes a heavy computational burden, since the cost grows quadratically with the kernel size (Table 1). It also introduces further difficulties, because the surrounding texture becomes less stable and the assumption of a uniform disparity for all pixels within the kernel no longer holds. Moreover, vectorization or cost volume construction, as performed in the proposed algorithm, is typically not feasible in conventional local approaches. When local methods are implemented in two stages, better results can be achieved by using a large kernel in the first stage followed by a smaller kernel in the second. The proposed method builds on this principle; however, as discussed in earlier sections, the cost of large first-stage kernels commonly limits the applicability of local and adaptive methods, especially in real-time applications.
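The large-then-small kernel strategy can be sketched with a plain local matcher. In this hypothetical illustration, a window-averaged sum of absolute differences (SAD) stands in for the paper's cost, and the kernel sizes `k1`, `k2` and the search `band` are arbitrary. Stage 1 searches the full disparity range with a large window; stage 2 re-matches with a small window only within ±band of the stage-1 result. Unlike the proposed method, the stage-1 cost here still grows with k1².

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_mean(img, k):
    """Mean over the k x k window centered on each pixel (edge padding)."""
    r = k // 2
    padded = np.pad(img, r, mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(2, 3))


def sad_costs(left, right, disps, k, invalid=1.0):
    """Window-averaged absolute difference for each candidate disparity -> (H, W, len(disps)).

    Columns whose partner x - d falls outside the right image get the penalty
    `invalid` (assumes intensities scaled to [0, 1]).
    """
    H, W = left.shape
    out = np.empty((H, W, len(disps)))
    for i, d in enumerate(disps):
        ad = np.full((H, W), invalid)
        ad[:, d:] = np.abs(left[:, d:] - right[:, :W - d])
        out[:, :, i] = box_mean(ad, k)
    return out


def two_stage_disparity(left, right, max_disp, k1=21, k2=5, band=2):
    """Stage 1: large window over the full range; stage 2: small window within d1 +/- band."""
    d_all = np.arange(max_disp + 1)
    d1 = d_all[sad_costs(left, right, d_all, k1).argmin(axis=2)]
    # For clarity, stage 2 evaluates the union of all bands and masks each pixel to its
    # own band; an efficient version would evaluate only 2 * band + 1 candidates per pixel.
    d_sub = np.arange(max(d1.min() - band, 0), min(d1.max() + band, max_disp) + 1)
    cost = sad_costs(left, right, d_sub, k2)
    cost[np.abs(d_sub[None, None, :] - d1[:, :, None]) > band] = np.inf
    return d_sub[cost.argmin(axis=2)], d1


rng = np.random.default_rng(1)
right = rng.random((48, 64))
left = np.roll(right, 5, axis=1)        # true disparity 5 away from the left border
d2, d1 = two_stage_disparity(left, right, max_disp=12)
```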

Up to now, most algorithms and methods for monitoring and detecting ground deformations—such as subsidence and landslides—have focused heavily on interferometric techniques. Given that, as mentioned earlier in the Introduction, interferometry poses several practical challenges, the proposed algorithm can contribute to this field from two perspectives: indirectly by supporting interferometric algorithms, particularly during the critical stages of co-registration and phase unwrapping, and directly as a standalone method.

Radargrammetry may initially face limitations, especially when spatial displacements are in the centimeter or sub-centimeter range. However, just as Persistent Scatterer techniques were developed in interferometry, radargrammetric approaches can also benefit from persistent scatterers. These targets exhibit stable intensity values, in addition to consistent phase returns, and often appear as very bright pixels in the image. When such a scatterer lies at a sub-pixel position, as also occurs in optical imagery because of limited spatial resolution and sensor constraints, its intensity affects not only the primary pixel but also its neighbors. Unlike typical SAR pixels, these neighboring responses are significant and meaningful. This spreading is described by the point spread function (PSF) and by its frequency-domain counterpart, the modulation transfer function (MTF), which is the normalized magnitude of the Fourier transform of the PSF [71,72].
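The PSF–MTF relationship can be checked numerically. In the sketch below, a sampled 1-D Gaussian stands in for a sensor PSF; a real SAR impulse response is closer to a sinc. The MTF is computed as the magnitude of its discrete Fourier transform normalized at zero frequency. For a Gaussian of width σ, the result should match the analytic form exp(−2π²σ²f²). The width and sample count are illustrative.

```python
import numpy as np


def mtf_from_psf(psf_1d, dx=1.0):
    """MTF as the normalized magnitude of the PSF's Fourier transform (1-D cut)."""
    otf = np.fft.rfft(psf_1d)
    mtf = np.abs(otf) / np.abs(otf[0])
    freqs = np.fft.rfftfreq(len(psf_1d), d=dx)   # cycles per pixel when dx = 1
    return freqs, mtf


N, sigma = 256, 1.5
x = np.arange(N) - N // 2
psf = np.exp(-x**2 / (2 * sigma**2))              # Gaussian stand-in for a sensor PSF
freqs, mtf = mtf_from_psf(psf)
analytic = np.exp(-2 * np.pi**2 * sigma**2 * freqs**2)
```

A wider PSF gives an MTF that falls off faster, which is why the spreading of a bright target into its neighbors carries information about its sub-pixel position.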

Because the PSF is highly sensitive to sub-pixel displacements, it can be exploited to monitor fine-scale spatial changes. However, accurate PSF estimation first requires reliable matching of corresponding features between the stereo images and the use of the image's original radiometric values (including speckle), since the PSF is sensitive to gray levels. As previously discussed, the proposed algorithm is inherently suited to raw images with speckle thanks to its compact kernel, and it preserves gray-level consistency between stereo pairs by employing a linear-phase low-pass filter. This enables accurate extraction of the PSF and, consequently, precise estimation of sub-pixel displacements of stable scatterers. Further exploration in this area could be highly valuable, especially since the latest sensors offer significantly higher spatial resolution.
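The role of a linear-phase filter can be illustrated on a single bright target. A symmetric, odd-length FIR kernel applied in centered ("same") mode introduces no net shift, so the intensity centroid of the target, and hence its sub-pixel position, survives smoothing. In the hypothetical sketch below, a Gaussian stands in for the SAR impulse response, the centroid is a simple sub-pixel estimator chosen for illustration (not the paper's method), and the two epoch positions are invented.

```python
import numpy as np


def point_target(x0, n=64, sigma=1.2, amp=100.0):
    """Range profile of a bright point scatterer at sub-pixel position x0.

    A sampled Gaussian is used as a stand-in for the sensor's impulse response (PSF).
    """
    x = np.arange(n)
    return amp * np.exp(-((x - x0) ** 2) / (2 * sigma**2))


def centroid_position(profile, half=8):
    """Sub-pixel position as the intensity centroid around the brightest sample."""
    i = int(np.argmax(profile))
    idx = np.arange(max(i - half, 0), min(i + half + 1, len(profile)))
    w = profile[idx]
    return float(np.sum(idx * w) / np.sum(w))


# Symmetric, odd-length low-pass kernel: linear phase, centered so it adds no delay
lowpass = np.hanning(7)[1:-1]
lowpass = lowpass / lowpass.sum()


def smooth(profile):
    return np.convolve(profile, lowpass, mode="same")


p1 = centroid_position(smooth(point_target(30.30)))
p2 = centroid_position(smooth(point_target(30.55)))
displacement = p2 - p1
```

The recovered displacement stays at the simulated quarter pixel after filtering. Real data would add speckle and clutter, so the estimator and window size would need to be chosen with care.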

5. Conclusions

This study introduces a novel algorithm for extracting corresponding pixels in SAR-intensity stereo images, aiming to balance the trade-off between output model quality and execution time. Compared with adaptive methods (especially SGM), the proposed algorithm demonstrates strong potential for generating disparity models from stereo images, particularly SAR intensity images. Relative to state-of-the-art techniques, it achieves higher accuracy in flatter terrain and lower accuracy in mountainous regions. Because the test datasets contained approximately equal proportions of both terrain types (in some cases with mountainous areas covering a larger portion of the image), the results can be considered representative of both. Furthermore, the Sentinel-1 data used in this study pose greater challenges than higher-resolution datasets such as TerraSAR-X, for two main reasons: (1) lower radiometric resolution (C-band Sentinel-1 images versus X-band TerraSAR-X images) and (2) greater geometric distortion resulting from larger incidence-angle differences within the stereo pair. It is therefore reasonable to expect the proposed algorithm to generalize to datasets with better spatial and radiometric resolution.

To enable a more rigorous evaluation, a TerraSAR-X stereo pair was analyzed alongside the Sentinel-1 imagery, allowing a comprehensive assessment of the proposed algorithm's robustness and generalizability relative to existing methods. The quantitative and qualitative results consistently demonstrated the algorithm's competitive performance relative to the other approaches. Nonetheless, as noted in the Discussion, structural refinements to the algorithm could further enhance its qualitative outcomes while substantially improving computational efficiency.

The proposed algorithm is built primarily on computational complexity modeling, but it still requires further adaptation to diverse conditions (e.g., significant elevation discontinuities) to remain competitive with state-of-the-art methods. In particular, recent research has shown heightened interest in high-resolution SAR imagery, both radiometrically and spatially, as a promising alternative in scenarios where optical data or interferometric techniques face serious limitations. The core algorithm has been independently validated to assess its robustness and generalization capability. In future work, hybrid integration with semi-global models is envisioned, aiming to leverage their ability to produce smoother elevation reconstructions, especially in areas with pronounced topographic variation, while significantly improving the efficiency of their cost function construction and aggregation, which remain their main computational bottleneck.

The very short time required to generate the cost volume also allows for the integration of a correction or editing stage, i.e., the inclusion of adaptive post-processing operations. Additionally, the method could serve as a preprocessor or initial stage for semi-global algorithms, potentially reducing the need for cost function aggregation across multiple paths, e.g., the standard 16-path aggregation in SGM, offering a promising direction for future research.
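A widely used correction stage for disparity maps is the left–right consistency check, in which a left-image disparity is kept only if the right-image disparity map points back to it, followed by a fill of the rejected pixels. The sketch below is a hypothetical illustration of such a stage, not the authors' post-processing. It fills each invalid pixel with the smaller of the nearest valid disparities on its row (a background-favoring choice), and the tolerance and toy maps are arbitrary.

```python
import numpy as np


def left_right_check(disp_left, disp_right, tol=1):
    """Invalidate (NaN) left disparities not confirmed by the right-image map.

    Convention: left pixel x matches right pixel x - d.
    """
    H, W = disp_left.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xr = xs - disp_left
    inside = (xr >= 0) & (xr < W)
    back = np.full((H, W), -10**6)
    back[inside] = disp_right[ys[inside], xr[inside]]
    valid = inside & (np.abs(disp_left - back) <= tol)
    out = disp_left.astype(float)
    out[~valid] = np.nan
    return out


def fill_invalid_scanline(disp):
    """Fill NaNs with the smaller of the nearest valid disparities left/right on the same row."""
    out = disp.copy()
    for row in out:                          # rows are views, so edits are in place
        valid = np.flatnonzero(~np.isnan(row))
        if valid.size == 0:
            continue
        for x in np.flatnonzero(np.isnan(row)):
            lft, rgt = valid[valid < x], valid[valid > x]
            cands = ([row[lft[-1]]] if lft.size else []) + ([row[rgt[0]]] if rgt.size else [])
            row[x] = min(cands)
    return out


disp_right = np.full((5, 20), 3)
disp_left = np.full((5, 20), 3)
disp_left[2, 10] = 7                         # outlier that the right map does not confirm
checked = left_right_check(disp_left, disp_right)
filled = fill_invalid_scanline(checked)
```

The check requires a second, right-referenced cost volume. When cost volumes are cheap to generate, that overhead is small, which is what makes such an editing stage practical.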

Author Contributions

Conceptualization, H.J., M.J.V.Z., E.G. and P.M.; data curation, H.J.; methodology, H.J.; formal analysis, H.J.; writing—original draft, H.J.; supervision, M.J.V.Z., E.G. and P.M.; writing—review and editing, M.J.V.Z., E.G. and P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by Space It Up, funded by the Italian Space Agency and the Ministry of University and Research—Contract No. 2024-5-E.0—CUP No. I53D24000060005.

Data Availability Statement

The datasets and code utilized in this research can be found online at https://github.com/HamidJannati/DenseMatching (accessed on 1 August 2025).

Acknowledgments

The authors would like to thank the reviewers for their time and insightful comments. E. Ghaderpour and P. Mazzanti thank Space It Up, funded by the Italian Space Agency and the Ministry of University and Research—Contract No. 2024-5-E.0—CUP No. I53D24000060005.

Conflicts of Interest

The authors Ebrahim Ghaderpour and Paolo Mazzanti were employed by the company NHAZCA s.r.l. All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Ostrowski, J.A.; Cheng, P. DEM Extraction from Stereo SAR Satellite Imagery. In Proceedings of the IGARSS 2000. IEEE 2000 International Geoscience and Remote Sensing Symposium. Taking the Pulse of the Planet: The Role of Remote Sensing in Managing the Environment. Proceedings (Cat. No.00CH37120), Honolulu, HI, USA, 24–28 July 2000; Volume 5, pp. 2176–2178.
2. Toutin, T.; Gray, L. State-of-the-Art of Elevation Extraction from Satellite SAR Data. ISPRS J. Photogramm. Remote Sens. 2000, 55, 13–33.
3. Bagheri, H.; Schmitt, M.; d’Angelo, P.; Zhu, X.X. A Framework for SAR-Optical Stereogrammetry over Urban Areas. ISPRS J. Photogramm. Remote Sens. 2018, 146, 389–408.
4. Farr, T.; Rosen, P.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45, RG2004.
5. Schrader, H.; Fahrland, E.; Paschko, H.; Mania, R. SAR-Based Elevation Models—From Global to Local. In Proceedings of the EUSAR 2024—15th European Conference on Synthetic Aperture Radar, Munich, Germany, 23–26 April 2024; p. 5.
6. Wu, Y.; Zhang, H.; Wang, J.; Wang, R.; Zhao, F.; Wu, Z.; Cai, Y. Stereo-Radargrammetry Assisted InSAR Phase Unwrapping Method for DEM Generation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5233718.
7. Yang, W.; Li, X.; Yang, B.; Fu, Y. A Novel Stereo Matching Algorithm for Digital Surface Model (DSM) Generation in Water Areas. Remote Sens. 2020, 12, 870.
8. Balz, T.; Zhang, L.; Liao, M. Direct Stereo Radargrammetric Processing Using Massively Parallel Processing. ISPRS J. Photogramm. Remote Sens. 2013, 79, 137–146.
9. Zhang, T.; Chen, Y.; Zhang, L.; Wilson, J.P.; Zhu, R.; Chen, R.; Li, Z. Multibaseline Interferometry Based on Independent Component Analysis and InSAR Combinatorial Modeling for High-Precision DEM Reconstruction. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5205417.
10. Sterzai, P.; Creati, N.; Zanutta, A. Computation of the Digital Elevation Model and Ice Dynamics of Talos Dome and the Frontier Mountain Region (North Victoria Land/Antarctica) by Synthetic-Aperture Radar (SAR) Interferometry. Glacies 2025, 2, 3.
11. Zhang, L.; Huang, G.; Li, Y.; Yang, S.; Lu, L.; Huo, W. A Robust InSAR Phase Unwrapping Method via Improving the Pix2pix Network. Remote Sens. 2023, 15, 4885.
12. Crosetto, M. Calibration and Validation of SAR Interferometry for DEM Generation. ISPRS J. Photogramm. Remote Sens. 2002, 57, 213–227.
13. Zhang, S.; Wang, J.; Feng, Z.; Wang, T.; Li, J.; Liu, N. Verification of the Accuracy of Sentinel-1 for DEM Extraction Error Analysis under Complex Terrain Conditions. Int. J. Appl. Earth Obs. Geoinf. 2024, 133, 104157.
14. Wang, J.; Lv, X.; Wang, H.; Li, S.; Fu, X. A Novel Stereo Positioning Feedback Method for Multi-Image Radargrammetric DSM Generation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 3003605.
15. Chang, Y.; Xiong, X.; Xu, Q.; Jin, G.; Zhang, G.; Cui, R. Dense Matching Method for UAV SAR Images Without Epipolar Rectification. IEEE Geosci. Remote Sens. Lett. 2024, 21, 4006105.
16. Dubois, C.; Nascetti, A.; Thiele, A.; Crespi, M.; Hinz, S. SAR-SIFT for Matching Multiple SAR Images and Radargrammetry. PFG–J. Photogramm. Remote Sens. Geoinf. Sci. 2017, 85, 149–158.
17. Wang, J.; Gong, K.; Balz, T.; Haala, N.; Soergel, U.; Zhang, L.; Liao, M. Radargrammetric DSM Generation by Semi-Global Matching and Evaluation of Penalty Functions. Remote Sens. 2022, 14, 1778.
18. Toutin, T. Generating DEM from Stereo Images with a Photogrammetric Approach: Examples with VIR and SAR Data. Adv. Remote Sens. 1995, 4, 110–117.
19. Feng, S.; Lin, Y.; Wang, Y.; Yang, Y.; Shen, W.; Teng, F.; Hong, W. DEM Generation With a Scale Factor Using Multi-Aspect SAR Imagery Applying Radargrammetry. Remote Sens. 2020, 12, 556.
20. Guimarães, U.S.; da Silva Narvaes, I.; de Lourdes Bueno Trindade Galo, M.; de Queiroz da Silva, A.; de Oliveira Camargo, P. Radargrammetric Approaches to the Flat Relief of the Amazon Coast Using COSMO-SkyMed and TerraSAR-X Datasets. ISPRS J. Photogramm. Remote Sens. 2018, 145, 284–296.
21. Luo, Y.; Deng, Y.; Xiang, W.; Zhang, H.; Yang, C.; Wang, L. Radargrammetric 3D Imaging through Composite Registration Method Using Multi-Aspect Synthetic Aperture Radar Imagery. Remote Sens. 2024, 16, 523.
22. Capaldo, P.; Crespi, M.; Fratarcangeli, F.; Nascetti, A.; Pieralice, F.; Porfiri, M.; Toutin, T. DSMs Generation from COSMO-SkyMed, RADARSAT-2 and TerraSAR-X Imagery on Beauport (Canada) Test Site: Evaluation and Comparison of Different Radargrammetric Approaches. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2013, XL-1/W1, 41–46.
23. Wang, J.; Lv, X.; Huang, Z.; Fu, X. An Epipolar HS-NCC Flow Algorithm for DSM Generation Using GaoFen-3 Stereo SAR Images. Remote Sens. 2023, 15, 129.
24. Brunner, D.; Lemoine, G.; Bruzzone, L. Estimation of Building Heights from Detected Dual-Aspect VHR SAR Imagery Using an Iterative Simulation and Matching Procedure in Combination with Functional Analysis. In Proceedings of the 2009 IEEE Radar Conference, Pasadena, CA, USA, 4–8 May 2009; pp. 1–6.
25. Jannati, H.; Valadan Zoej, M.J. Intelligent Wavelet Coefficients Thresholding: Speckle Reduction Approach in SAR Imagery. J. Indian Soc. Remote Sens. 2024, 52, 681–701.
26. Hirschmüller, H. Stereo Vision in Structured Environments by Consistent Semi-Global Matching. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 2, pp. 2386–2393.
27. Perko, R.; Gutjahr, K.; Krüger, M.; Raggam, H.; Schardt, M. DEM-Based Epipolar Rectification for Optimized Radargrammetry. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017.
28. Gutjahr, K.; Perko, R.; Raggam, H.; Schardt, M. The Epipolarity Constraint in Stereo-Radargrammetric DEM Generation. IEEE Trans. Geosci. Remote Sens. 2014, 52, 5014–5022.
29. Gutjahr, K.; Perko, R.; Raggam, H.; Schardt, M. 3D-Mapping from TerraSAR-X Staring Spotlight Data. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 1817–1820.
30. Recla, M.; Schmitt, M. Deep Learning-Based DSM Generation from Dual-Aspect SAR Data. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2024, X-2–2024, 193–200.
31. Dong, Y.; Zhang, L.; Balz, T.; Luo, H.; Liao, M. Radargrammetric DSM Generation in Mountainous Areas through Adaptive-Window Least Squares Matching Constrained by Enhanced Epipolar Geometry. ISPRS J. Photogramm. Remote Sens. 2018, 137, 61–72.
32. Karami, E.; Prasad, S.; Shehata, M. Image Matching Using SIFT, SURF, BRIEF and ORB: Performance Comparison for Distorted Images. arXiv 2017, arXiv:1710.02726. Available online: https://arxiv.org/pdf/1710.02726 (accessed on 24 July 2025).
33. Mistry, D.; Banerjee, A. Comparison of Feature Detection and Matching Approaches: SIFT and SURF. GRD J. Eng. 2017, 2, 7–12.
34. Bansal, M.; Kumar, M.; Kumar, M. 2D Object Recognition: A Comparative Analysis of SIFT, SURF and ORB Feature Descriptors. Multimed. Tools Appl. 2021, 80, 18839–18857.
35. Li, J.; Hu, Q.; Ai, M. RIFT: Multi-modal Image Matching Based on Radiation-Variation Insensitive Feature Transform. IEEE Trans. Image Process. 2020, 29, 3296–3310.
36. Zhang, L.; Zhang, L.; Du, B. SAR Image Matching Using a Modified SIFT Algorithm. Remote Sens. 2010, 2, 707–719.
37. Zhang, Y.; Zhang, L.; Du, B. A Multi-class Feature Matching Framework for SAR Image Registration. Remote Sens. 2015, 7, 4565–4583.
38. Hirschmüller, H.; Scharstein, D. Evaluation of Stereo Matching Costs on Images with Radiometric Differences. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 1582–1599.
39. Boykov, Y.; Veksler, O.; Zabih, R. Fast Approximate Energy Minimization via Graph Cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 1222–1239.
40. Felzenszwalb, P.F.; Huttenlocher, D.P. Efficient Belief Propagation for Early Vision. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004, Washington, DC, USA, 27 June–2 July 2004; Volume 1, pp. I–I.
41. van der Schaaf, A.; Van Hateren, J.H. Modelling the Power Spectra of Natural Images: Statistics and Information. Vis. Res. 1996, 36, 2759–2770.
42. Hirschmüller, H. Stereo Processing by Semi-Global Matching and Mutual Information. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 328–341.
43. He, K.; Sun, J.; Tang, X. Guided Image Filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1397–1409.
44. Tomasi, C.; Manduchi, R. Bilateral Filtering for Gray and Color Images. In Proceedings of the Sixth International Conference on Computer Vision, Bombay, India, 7 January 1998; pp. 839–846.
45. Revaud, J.; Weinzaepfel, P.; Harchaoui, Z.; Schmid, C. DeepMatching: Hierarchical Deformable Dense Matching. Int. J. Comput. Vis. 2016, 120, 300–323.
46. Sun, D.; Yang, X.; Liu, M.Y.; Kautz, J. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8934–8943.
47. Boykov, J.; Kim, S.; Park, J.; Lee, K.M. DiffMatch: Diffusion Model for Dense Matching. arXiv 2023, arXiv:2305.19094.
48. Di Rita, M.; Nascetti, A.; Fratarcangeli, F.; Crespi, M. Upgrade of FOSS DATE Plug-In: Implementation of a New Radargrammetric DSM Generation Capability. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, XLI-B7, 821–825.
49. Scharstein, D.; Szeliski, R.; Zabih, R. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. In Proceedings of the IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001), Kauai, HI, USA, 9–10 December 2001; pp. 131–140.
50. Zhou, F.; He, F.; Gui, C.; Dong, Z.; Xing, M. SAR Target Detection Based on Improved SSD with Saliency Map and Residual Network. Remote Sens. 2022, 14, 180.
51. Jiang, C.; Tang, S.; Ren, Y.; Li, Y.; Zhang, J.; Li, G.; Zhang, L. Three-Dimensional Coordinate Extraction Based on Radargrammetry for Single-Channel Curvilinear SAR System. Remote Sens. 2022, 14, 4091.
52. Ko, J.; Ho, Y.-S. Stereo Matching Using Census Transform of Adaptive Window Sizes with Gradient Images. In Proceedings of the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Jeju, Republic of Korea, 13–16 December 2016; pp. 1–4.
53. Zabih, R.; Woodfill, J. Non-Parametric Local Transforms for Computing Visual Correspondence. In Computer Vision—ECCV ’94; Eklundh, J.-O., Ed.; Springer: Berlin/Heidelberg, Germany, 1994; pp. 151–158.
54. Zhao, G.; Du, Y.; Tang, Y. A New Extension of the Rank Transform for Stereo Matching. Adv. Eng. Forum 2011, 2–3, 182–187.
55. Viola, P.; Wells III, W.M. Alignment by Maximization of Mutual Information. Int. J. Comput. Vis. 1997, 24, 137–154.
56. Hirschmüller, H. Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 2, pp. 807–814.
57. Banz, C.; Pirsch, P.; Blume, H. Evaluation of Penalty Functions for Semi-Global Matching Cost Aggregation. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, XXXIX-B3, 1–6.
58. Wang, M.; Hu, F.; Li, J. Epipolar Resampling of Linear Pushbroom Satellite Imagery by a New Epipolarity Model. ISPRS J. Photogramm. Remote Sens. 2011, 66, 347–355.
59. Spangenberg, R.; Langner, T.; Rojas, R. Weighted Semi-Global Matching and Center-Symmetric Census Transform for Robust Driver Assistance. In Computer Analysis of Images and Patterns; Wilson, R., Hancock, E., Bors, A., Smith, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 34–41.
60. Bu, P.; Zhao, H.; Yan, J.; Jin, Y. Collaborative Semi-Global Stereo Matching. Appl. Opt. 2021, 60, 9757–9768.
61. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image Denoising by Sparse 3D Transform-Domain Collaborative Filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095.
62. MuraliMohanBahu, Y.; Subramanyam, M.V.; GiriPrasad, M.N. A Modified BM3D Algorithm for SAR Image Despeckling. Procedia Comput. Sci. 2015, 70, 69–75.
63. Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; ter Haar Romeny, B.; Zimmerman, J.B.; Zuiderveld, K. Adaptive Histogram Equalization and Its Variations. Comput. Vis. Graph. Image Process. 1987, 39, 355–368.
64. Wang, J.; Chai, H.; Li, X.; Lv, X. Improving Mountainous DSM Accuracy Through an Innovative Opposite-Side Radargrammetry Algorithm. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 6641–6653.
65. Persson, H.; Fransson, J.E.S. Forest Variable Estimation Using Radargrammetric Processing of TerraSAR-X Images in Boreal Forests. Remote Sens. 2014, 6, 2084–2107.
66. Liu, L.; Li, Z.; Zhu, Y.; Zhang, Y.; Cao, C.; Du, X.; Han, K.; Fu, H. Geolocation Error Compensation Method for Geocoded SAR Images Using Pixel-Offset Series Without Control Points. Int. J. Digit. Earth 2025, 18, 2482883.
67. Palamà, R.; Monserrat, O.; Crippa, B.; Crosetto, M.; Bru, G.; Ezquerro, P.; Bejar-Pizarro, M. Radargrammetry DEM Generation Using High-Resolution SAR Imagery Over La Palma During the 2021 Cumbre Vieja Volcanic Eruption. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4000705.
68. Renga, A.; Moccia, A. Effects of Orbit and Pointing Geometry of a Spaceborne Formation for Monostatic-Bistatic Radargrammetry on Terrain Elevation Measurement Accuracy. Sensors 2009, 9, 175–195.
69. Ghaderpour, E.; Masciulli, C.; Zocchi, M.; Bozzano, F.; Scarascia Mugnozza, G.; Mazzanti, P. Estimating Reactivation Times and Velocities of Slow-Moving Landslides via PS-InSAR and Their Relationship with Precipitation in Central Italy. Remote Sens. 2024, 16, 3055.
70. Ghaderpour, E.; Bozzano, F.; Scarascia Mugnozza, G.; Mazzanti, P. Ground Deformation Monitoring via PS-InSAR Time Series: An Industrial Zone in Sacco River Valley, Central Italy. Remote Sens. Appl. Soc. Environ. 2024, 34, 101191.
71. Song, F.; Chen, Q.; Tang, X.; Xu, F. Analytical Model of Point Spread Function under Defocused Degradation in Diffraction-Limited Systems: Confluent Hypergeometric Function. Photonics 2024, 11, 455.
72. Li, J.; Xing, F.; Sun, T.; You, Z. Efficient Assessment Method of On-Board Modulation Transfer Function of Optical Remote Sensing Sensors. Opt. Express 2015, 23, 6187–6208.

Figure 1. General structure of same-side stereo imaging using the radargrammetric approach in SAR images. GCS is short for geographic coordinate system.

Figure 2. Analysis of frequency domain behavior in optical and intensity SAR imagery, emphasizing the compact placement of distinct features within the frequency plot [41].

Figure 3. Graphical comparison of the computational complexity of dense matching algorithms by varying the basic parameters (disparity value $d$ and kernel size $k_{T}$).

Figure 4. Sentinel-1 dataset specifications and geographic location of the image acquisition site.

Figure 5. Geolocation of the stereo TerraSAR-X pair and the designated test region, followed by epipolar resampling using a DEM-based approximation (Tehran, Iran).

Figure 6. Radiometric resolution comparison between SAR images with higher (X-band) and lower (C-band) central frequencies.

Figure 7. Checking the epipolarity constraint by two-dimensional normalized cross-correlation template matching.

Figure 8. Anaglyph images of subsets A1 to A4 and corresponding elevation models B1 to B4 (4 of 8 subsets).

Figure 9. Layover-affected regions of the test area (marked as white pixels) were discarded after epipolar resampling, exclusively for evaluation, owing to poor texture resulting from distortion correction.

Figure 10. Three-dimensional cost volume and the optimal disparity model embedded within the cost volume.

Figure 11. Simplified schematic of the proposed algorithm.

Figure 12. The effect of speckle reduction and histogram matching on the SAR image pair (for better visualization, the image pair is displayed using a color code).

Figure 13. Comparison of the normalized autocorrelation criterion for a kernel and its neighbors in the original form versus a reduced feature vector using interpolation.

Figure 14. Comparison of the conventional and proposed approaches for generating compact feature vectors and storing them in the third dimension.

Figure 15. Phase correlation value for two SAR intensity stereo images.

Figure 16. Effect of kernel size on cost function shape for original and reduced feature vector forms.

Figure 17. Schematic representation of the algorithm execution steps and the role of referenced equations.

Figure 18. Influence of base-to-height ratio on disparity space.

Figure 19. Example illustrating how geometric distortions affect the real pixel size value and, consequently, the disparity space.

Figure 20. Effect of incidence angles on the relationship between disparity and ground elevation.

Figure 21. The reference surface elevation model alongside the proposed algorithm and other comparative methods (results correspond to dataset number 1).

Figure 22. Visualization of the residuals of the elevation models of the algorithms relative to the reference elevation model (for better visualization, the images are presented in a logarithmic and normalized format).

Figure 23. Comparison of elevation models obtained from different methods along a longitudinal profile (the displayed DSM model is the reference model).

Figure 24. The relationship between the elevation values of the control points and their corresponding disparity values in the image space (in pixels), after correcting for the distortions caused by the incidence angles of the two sensors in the range direction.

Figure 25. An area from subset 1 with weak image texture that leads to the generation of artifacts in the produced disparity model.

Figure 26. Histogram representation of algorithm-wise error distribution, enabling comparative behavioral analysis on Sentinel-1 dataset.

Figure 27. Performance of the proposed algorithm without preprocessing, highlighting its baseline accuracy and the impact of preprocessing.

Figure 28. The reference surface elevation model alongside the proposed algorithm and other comparative methods (results correspond to TerraSAR-X dataset).

Figure 29. Visualization of the residuals of the elevation models of the algorithms relative to the reference elevation model on TerraSAR-X dataset (for better visualization, the images are presented in a logarithmic and normalized format).

Figure 30. Histogram representation of algorithm-wise error distribution, enabling comparative behavioral analysis on TerraSAR-X dataset.

Table 1. Computational complexity of dense stereo-matching algorithms.

Category	Algorithm	Structure Type	Computational Complexity (Big-O)	Parameters
Local Matching	SAD/SSD/NCC/Census [38]	Fixed window	$O (n_{T} \times k_{T}^{2} \times d)$	$n_{T}, k_{T}, d$
Global Matching	Graph Cuts [39]	Global graph	$O (n_{T} \times d^{3})$	$n_{T}, d$
	Belief Propagation [40]	Message passing	$O (n_{T} \times d^{2})$	$n_{T}, d$
	Semi-Global Matching (SGM) [42]	Path-wise DP	$O (n_{T} \times k_{T}^{2} + n_{T} \times d \times p)$	$n_{T}, k_{T}, d, p$
Adaptive Local	Guided Filter Matching [43]	Adaptive window	$O (n_{T} \times k_{T}^{2} \times d)$	$n_{T}, k_{T}, d$
	Bilateral Filter Matching [44]	Adaptive window	$O (n_{T} \times k_{T}^{2} \times d)$	$n_{T}, k_{T}, d$
Learning-Based	DeepMatching [45]	Hierarchical CNN	Train: $O(e \times b \times m \times c_{in} \times c_{out})$; Infer: $O(n_{T} \times \log(n_{T}) \times d \times c_{in})$	$e, b, m, c_{in}, c_{out}$
	PWC-Net [46]	CNN and cost volume	Train: $O(e \times b \times m \times c_{in} \times c_{out})$; Infer: $O(n_{T} \times k_{T}^{2} \times d \times c_{in} \times c_{out})$	As above, plus $k_{T}$
	DiffMatch [47]	Diffusion model	Train: $O(e \times b \times m \times s \times c_{in} \times c_{out})$; Infer: $O(n_{T} \times k_{T}^{2} \times d \times c_{in} \times c_{out})$	As above, plus $s$ (diffusion steps)

$n_{T}$: total number of pixels in the image; $k_{T}$: kernel size (if applicable); $d$: disparity range; $p$: number of paths (SGM); $e$: number of training epochs; $b$: batch size; $m$: model size (number of parameters); $s$: number of diffusion steps (DiffMatch); $c_{in}$, $c_{out}$: input/output channels.
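To make the growth rates in Table 1 concrete, the sketch below evaluates the local-matching and SGM expressions for one 1596 × 4833 Sentinel-1 subset. The disparity range d = 64 and path count p = 8 are assumptions chosen only for illustration, and the constant factors hidden by Big-O notation are ignored, so only the relative magnitudes are meaningful.

```python
def local_ops(n_t, k_t, d):
    """Local window matching (SAD/SSD/NCC/Census): O(n_T * k_T^2 * d)."""
    return n_t * k_t ** 2 * d


def sgm_ops(n_t, k_t, d, p):
    """SGM as listed in Table 1: O(n_T * k_T^2 + n_T * d * p)."""
    return n_t * k_t ** 2 + n_t * d * p


n_t = 1596 * 4833  # pixels in one Sentinel-1 subset
d, p = 64, 8       # assumed disparity range and number of SGM paths

for k in (5, 15, 33):
    loc, sgm = local_ops(n_t, k, d), sgm_ops(n_t, k, d, p)
    # Local cost grows with k^2 * d; the SGM path term does not depend on k
    print(f"k={k:2d}  local={loc:.2e}  sgm={sgm:.2e}  local/sgm={loc / sgm:.1f}")
```

Tripling the kernel side (for example, from 11 × 11 to 33 × 33) multiplies the local-matching cost by nine, which is the quadratic dependence on kernel size discussed throughout the paper.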

Table 2. Computational complexity and keypoint density of feature-based sparse matching algorithms in SAR.

Algorithm	Feature Type	Preprocessing Complexity	Matching Complexity	Estimated Keypoint Density (%)	SAR Adaptation Notes
SIFT	Gradient-based [32]	$O (W \times H \times s)$	$O (N^{2})$	0.5–1.2%	Sensitive to speckle; Gaussian smoothing degrades edges
SURF	Haar wavelets [33]	$O (W \times H)$	$O (N \times \log (N))$	0.3–0.8%	Faster than SIFT; less robust to SAR distortions
ORB	Binary descriptors [34]	$O (W \times H)$	$O (N \times \log (N))$	0.2–0.6%	Efficient; binary patterns struggle with SAR speckle
HOPC	Phase congruency [35]	$O (W \times H \times f)$	$O (N^{2})$	1.5–3.0%	Designed for SAR; robust to radiometric differences
RIFT	Radiation-invariant [35]	$O (W \times H \times s)$	$O (N^{2})$	1.0–2.5%	Modality-agnostic; good SAR-optical matching
SAR-SIFT	Gradient and speckle-aware [36]	$O (W \times H \times s)$	$O (N^{2})$	1.2–2.8%	SAR-tuned; uses adaptive filtering before keypoint detection
Multi-class Feature Matching	Line and region features [37]	$O (W \times H \times k)$	$O (N \times M)$	2.0–4.5%	Combines LSD and template matching; robust to SAR distortions

W, H: Image width and height. s: Number of scales or octaves. f: Number of frequency bands (for phase congruency). N, M: Number of key points or regions in each image. k: Kernel size for speckle suppression or edge detection.
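The keypoint densities in Table 2 translate into absolute keypoint counts and matching workloads. The sketch below assumes a 1596 × 4833 image (the subset size used in the experiments) and takes the midpoint of each density range; the $O(N^{2})$ and $O(N \log N)$ expressions are evaluated without constant factors, so only relative magnitudes are meaningful. The multi-class method is omitted because its $O(N \times M)$ cost needs a second count.

```python
import math

W, H = 4833, 1596  # assumed image width and height (pixels)

# Midpoints of the keypoint-density ranges in Table 2 (percent of pixels)
densities = {"SIFT": 0.85, "SURF": 0.55, "ORB": 0.40,
             "HOPC": 2.25, "RIFT": 1.75, "SAR-SIFT": 2.0}
quadratic = {"SIFT", "HOPC", "RIFT", "SAR-SIFT"}  # O(N^2) matching in Table 2


def keypoints(density_pct, w=W, h=H):
    """Expected number of keypoints N for a given density (%)."""
    return round(w * h * density_pct / 100)


def matching_cost(name):
    """Unscaled matching cost: N^2 or N * log2(N) (log base is irrelevant in Big-O)."""
    n = keypoints(densities[name])
    return n * n if name in quadratic else n * math.log2(n)


for name in densities:
    print(f"{name:9s} N={keypoints(densities[name]):7d}  cost~{matching_cost(name):.2e}")
print(f"dense matching estimates one disparity for each of {W * H} pixels")
```

Even the densest detectors yield correspondences for only a few percent of pixels, which is why a dense disparity model relies on the dense matching strategies summarized in Table 1.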

Table 3. Acquisition parameters of TerraSAR-X stereo images.

Dataset	Satellite	Acquisition Time	Acquisition Mode/Polarization	Orbit Direction	Incidence Angle (°)	Resolution Slant Range/Azimuth (m)
1	TSX	10 June 2019	SM/HV	Descending	28	1.2/3.3
2	TSX	17 April 2019	SM/HV	Descending	39	1.2/3.3

Table 4. Preprocessing parameters tuned across two stages for Sentinel-1 stereo pairs.

Preprocessing Step	Histogram Matching [63]	Speckle Reduction
Parameters	The standard MATLAB R2017b imhistmatch() function is used.	Parameters	Intelligent Wavelet Thresholding [25]	Parameters	SAR Block Matching 3D [61,62]
$ref$	Reference Image Selection. The brighter image is used as the reference, which allows the mapping method to be set to ‘uniform’ for faster execution.	$p$	Parameter Extraction from Partial Image. Only 5% of the image is used for parameter extraction to speed up implementation; increasing this ratio yields no significant improvement in parameter estimation.	$σ^{2}$	Global Speckle Variance. This parameter is calculated as 0.016 and may optionally be used as input for noise modeling or adaptive filtering stages.
$nbins$	Reference Histogram Binning. The number of equally spaced bins in the reference histogram is experimentally set to 85. Increasing the bin count beyond this value yields diminishing returns, as fluctuations in histogram levels are already minimized and the grayscale output is sufficiently smooth.	$σ^{2}$	Global Speckle Variance. This value is computed as 0.016 and characterizes the overall noise level in the image.	$S$	Smoothing Level. This parameter is experimentally set to 0.12, with $S \in [0,1]$ for standard grayscale intensity ranges.
$methods$: ‘uniform’, ‘polynomial’	Mapping Method. The ‘uniform’ mapping method is applied by default. When the reference image is darker, the algorithm switches to the ‘polynomial’ method, which incurs a higher computational cost and a slower runtime.	$SNR$	Estimated Signal-to-Noise Ratio (SNR). The SNR is approximately 8.	$profile$: ‘np’, ‘lc’	Profile. ‘np’ denotes the normal profile, which is slower; ‘lc’ denotes the fast profile, which runs more quickly than ‘np’.
-	-	Wavelet levels	Wavelet Decomposition Levels. A two-level wavelet decomposition is used to capture both coarse- and fine-scale features in the image.	$IP$	Inner Parameters. This algorithm involves several internal parameters that are typically held constant and seldom require tuning, such as those governing wavelet decomposition, Wiener filter kernel selection, and related components.
-	-	$ε_{1}, ε_{2}$	Horizontal and Vertical Variation Coefficients. Both coefficients are set to 1.	$L$	Logarithmic Transformation. A log transform is applied to accommodate the multiplicative nature of speckle noise, so that downstream processing steps, which are often designed for additive noise, operate more effectively.
-	-	$S$	Search Method. To speed up implementation, only the Genetic Algorithm (GA) is used, with random mutation and 500 iterations.	-	-
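Histogram matching with a uniform mapping amounts to passing each source intensity through the source CDF and then through the inverse of the reference CDF. The NumPy function below is a simplified analogue written for illustration; it is not MATLAB's imhistmatch, it does not implement the ‘polynomial’ method, and the name `match_histogram` is hypothetical. It bins the reference histogram into `nbins = 85` equally spaced bins, as in Table 4.

```python
import numpy as np


def match_histogram(source, reference, nbins=85):
    """Map source intensities so their distribution follows the reference.

    Simplified CDF-matching sketch; not MATLAB's imhistmatch.
    """
    # Empirical CDF of the source, evaluated at each distinct source value
    _, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size

    # Reference CDF over nbins equally spaced bins; empty bins are dropped so
    # the CDF used for interpolation is strictly increasing
    ref_hist, edges = np.histogram(reference, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = ref_hist > 0
    ref_cdf = np.cumsum(ref_hist)[keep] / reference.size

    # Inverse reference CDF: source quantile -> reference intensity
    mapped = np.interp(src_cdf, ref_cdf, centers[keep])
    return mapped[src_idx].reshape(source.shape)


rng = np.random.default_rng(0)
dark = rng.gamma(2.0, 0.05, size=(256, 256))    # darker synthetic image
bright = rng.gamma(2.0, 0.20, size=(256, 256))  # brighter reference image
out = match_histogram(dark, bright)
print(f"means: source={dark.mean():.3f} matched={out.mean():.3f} ref={bright.mean():.3f}")
```

Because the mapping is monotonic, the ordering of pixel intensities is preserved; only the grey-level distribution changes, which is what makes the two images of a stereo pair radiometrically comparable before matching.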

Table 5. Preprocessing parameters tuned across two stages for the TSX stereo pair.

Preprocessing Step	Histogram Matching [63]	Speckle Reduction
Parameters	The standard MATLAB imhistmatch() function is used.	Parameters	Intelligent Wavelet Thresholding [25]	Parameters	SAR Block Matching 3D [61,62]
$ref$	Reference Image Selection. The brighter image is used as the reference, which allows the mapping method to be set to ‘uniform’ for faster execution.	$p$	Parameter Extraction from Partial Image. Only 10% of the image is used for parameter extraction to speed up implementation; increasing this ratio yields no significant improvement in parameter estimation.	$σ^{2}$	Global Speckle Variance. This parameter is calculated as 0.01 and may optionally be used as input for noise modeling or adaptive filtering stages.
$nbins$	Reference Histogram Binning. The number of equally spaced bins in the reference histogram is experimentally set to 85. Increasing the bin count beyond this value yields diminishing returns, as fluctuations in histogram levels are already minimized and the grayscale output is sufficiently smooth.	$σ^{2}$	Global Speckle Variance. This value is computed as 0.01 and characterizes the overall noise level in the image.	$S$	Smoothing Level. This parameter is experimentally set to 0.08, with $S \in [0,1]$ for standard grayscale intensity ranges.
$methods$: ‘uniform’, ‘polynomial’	Mapping Method. The ‘uniform’ mapping method is applied by default. When the reference image is darker, the algorithm switches to the ‘polynomial’ method, which incurs a higher computational cost and a slower runtime.	$SNR$	Estimated Signal-to-Noise Ratio (SNR). The SNR is approximately 10.	$profile$: ‘np’, ‘lc’	Profile. ‘np’ denotes the normal profile, which is slower; ‘lc’ denotes the fast profile, which runs more quickly than ‘np’.
-	-	Wavelet levels	Wavelet Decomposition Levels. A two-level wavelet decomposition is used to capture both coarse- and fine-scale features in the image.	$IP$	Inner Parameters. This algorithm involves several internal parameters that are typically held constant and seldom require tuning, such as those governing wavelet decomposition, Wiener filter kernel selection, and related components.
-	-	$ε_{1}, ε_{2}$	Horizontal and Vertical Variation Coefficients. Both coefficients are set to 1.	$L$	Logarithmic Transformation. A log transform is applied to accommodate the multiplicative nature of speckle noise, so that downstream processing steps, which are often designed for additive noise, operate more effectively.
-	-	$S$	Search Method. To speed up implementation, only the Genetic Algorithm (GA) is used, with random mutation and 800 iterations.	-	-
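Tables 4 and 5 pair a global speckle variance with an estimated SNR (0.016 and about 8 for Sentinel-1; 0.01 and about 10 for TerraSAR-X). Both pairs agree with the usual multiplicative-speckle relation $\mathrm{SNR} = \mu/\sigma = 1/\sqrt{\sigma^{2}}$, where $\sigma^{2}$ is the squared coefficient of variation over homogeneous areas. The tables do not state how $\sigma^{2}$ was computed, so the estimator below, and the synthetic Gamma speckle with an assumed number of looks, are illustrative assumptions.

```python
import numpy as np


def speckle_variance(patch):
    """Squared coefficient of variation (var / mean^2) of a homogeneous patch."""
    patch = np.asarray(patch, dtype=float)
    return float(patch.var() / patch.mean() ** 2)


def snr_from_variance(sigma2):
    """SNR = mean / std, i.e. 1 / sqrt(sigma^2) for unit-mean multiplicative speckle."""
    return float(1.0 / np.sqrt(sigma2))


# Synthetic homogeneous area: constant reflectivity times unit-mean speckle.
# Gamma(shape=L, scale=1/L) has mean 1 and variance 1/L (L = number of looks).
rng = np.random.default_rng(42)
looks = 62.5  # assumed, chosen so that 1 / L = 0.016
patch = 120.0 * rng.gamma(looks, 1.0 / looks, size=(512, 512))

print(f"estimated sigma^2: {speckle_variance(patch):.4f}")
print(f"SNR for sigma^2 = 0.016 (Sentinel-1): {snr_from_variance(0.016):.2f}")
print(f"SNR for sigma^2 = 0.010 (TerraSAR-X): {snr_from_variance(0.010):.2f}")
```

The same multiplicative model explains the log transform listed as parameter $L$: taking logarithms turns $I = R \cdot n$ into $\log I = \log R + \log n$, so filters designed for additive noise can be applied to the speckle term.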

Table 6. Optimum parameter settings for all methods (determined via trial and error).

Dataset	NCC	SGM-1	SGM-2	SGM-3	Proposed
1–8 (1596 × 4833)	Kernel size (K): 15 × 15 (larger kernels give a smoother disparity output, but at the cost of a sharp increase in processing time, of order $O(K^{2})$, and a loss of fine detail in the disparity map)	As recommended in [17], p1 and p2 were set to 150 and 200 (changing them by $\pm 15$ does not meaningfully change matching completeness; results fluctuate near the optimum)	As recommended in [17], p1 and p2 were set to 180 and 250 (changing them by $\pm 35$ does not meaningfully change matching completeness; results fluctuate near the optimum)	As recommended in [17], p1 and p2 were set to 150 and 200 (changing them by $\pm 15$ does not meaningfully change matching completeness; results fluctuate near the optimum)	Level 1: original kernel 33 × 33; reduced vector 9; std threshold not needed; 2D median filter not needed. Level 2: original kernel 9 × 9; reduced vector 7; threshold 0.35; 2D median filters 3 × 3 and 7 × 7.
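For reference, the NCC baseline in Table 6 is a local window method. The sketch below is a generic textbook zero-mean NCC search for a single pixel of a rectified pair, not the authors' pyramid implementation; the function names and the `d_max` search range are illustrative. The k × k window compared at every candidate shift is what makes the cost grow with K².

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def ncc_disparity(left, right, row, col, k=15, d_max=10):
    """Brute-force disparity for one pixel of a rectified pair.

    (row, col) must lie at least k // 2 pixels from the image borders.
    Every shift 0..d_max is scored with a k x k window, so the cost per
    pixel is O(k^2 * d), as for local matching in Table 1.
    """
    h = k // 2
    ref = left[row - h:row + h + 1, col - h:col + h + 1]
    scores = []
    for d in range(d_max + 1):
        c = col - d  # candidate column in the right image
        if c - h < 0:
            break
        cand = right[row - h:row + h + 1, c - h:c + h + 1]
        scores.append(ncc(ref, cand))
    return int(np.argmax(scores)), scores


rng = np.random.default_rng(1)
left = rng.random((64, 64))
right = np.roll(left, -3, axis=1)  # right[:, x] == left[:, x + 3]
d, scores = ncc_disparity(left, right, row=32, col=32, k=15, d_max=10)
print(d, round(scores[d], 3))
```

A full disparity map repeats this search for every pixel, which is why raising the kernel from 15 × 15 to 33 × 33 is expensive for NCC.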

Table 7. Disparity completeness (% of valid disparity pixels over total).

Dataset (1596 × 4833 pixels)	NCC	SGM-1	SGM-2	SGM-3	Proposed
Subset1	93.3	92.2	91.6	94.1	93.9
Subset2	92.1	93.9	93.2	93.6	92.5
Subset3	93.6	93.7	93.3	95.2	95.8
Subset4	90.1	92.5	92.1	94.8	94.6
Subset5	90.8	92.4	92.1	94.2	94.0
Subset6	93.5	93.8	93.3	95.1	94.2
Subset7	91.2	93.1	92.5	95.8	94.8
Subset8	91.7	93.9	92.4	95.3	95.3
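An unweighted mean over the eight subsets of Table 7 gives a single completeness figure per method. The script below only re-uses the tabulated values and weights all subsets equally, which is reasonable here since the tables list the same 1596 × 4833 size for every subset.

```python
methods = ["NCC", "SGM-1", "SGM-2", "SGM-3", "Proposed"]
# Completeness (%) per subset from Table 7; columns follow `methods`
completeness = [
    [93.3, 92.2, 91.6, 94.1, 93.9],  # Subset1
    [92.1, 93.9, 93.2, 93.6, 92.5],  # Subset2
    [93.6, 93.7, 93.3, 95.2, 95.8],  # Subset3
    [90.1, 92.5, 92.1, 94.8, 94.6],  # Subset4
    [90.8, 92.4, 92.1, 94.2, 94.0],  # Subset5
    [93.5, 93.8, 93.3, 95.1, 94.2],  # Subset6
    [91.2, 93.1, 92.5, 95.8, 94.8],  # Subset7
    [91.7, 93.9, 92.4, 95.3, 95.3],  # Subset8
]

mean = {m: sum(row[i] for row in completeness) / len(completeness)
        for i, m in enumerate(methods)}
for m, v in mean.items():
    print(f"{m:8s} {v:.2f}%")

# Subsets where the proposed method matches or exceeds SGM-3
p, s = methods.index("Proposed"), methods.index("SGM-3")
ties_or_wins = sum(row[p] >= row[s] for row in completeness)
print(f"Proposed >= SGM-3 on {ties_or_wins} of {len(completeness)} subsets")
```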

Table 8. Processing time per method (seconds).

Dataset	NCC	SGM-1	SGM-2	SGM-3	Proposed
1–8 (1596 × 4833)	4 + 287 s	4 + 11 s	4 + 18 s	4 + 16 s	4 + 7 s

Table 9. Processing time details of the proposed method (seconds, up to 2 decimal places).

Dataset	Preprocessing (Adaptive Speckle Reduction and Histogram Matching)	Parameter Setting	Cost Volume Creation	Cost Volume Refinement	Sub-Pixel Disparity Extraction
1–8 (1596 × 4833)	4.13 (all methods)	On hold	1.05 s	5.38 s	0.85 s
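Table 9 breaks the proposed method's 7 s matching time from Table 8 into stages. The short script below only re-uses the tabulated values: it checks that the stages sum to about 7 s and shows how much each contributes. The ratios it prints use the rounded Table 8 figures and exclude the shared preprocessing term.

```python
# Stage timings of the proposed method from Table 9 (seconds)
stages = {
    "cost volume creation": 1.05,
    "cost volume refinement": 5.38,
    "sub-pixel disparity extraction": 0.85,
}
preprocessing = 4.13  # shared by all methods (Table 9)

matching_time = sum(stages.values())
for name, t in stages.items():
    print(f"{name:32s} {t:5.2f} s ({100 * t / matching_time:4.1f}% of matching)")
print(f"matching total: {matching_time:.2f} s (+ {preprocessing} s preprocessing)")

# Matching times from Table 8, excluding the ~4 s preprocessing term
table8 = {"NCC": 287, "SGM-1": 11, "SGM-2": 18, "SGM-3": 16, "Proposed": 7}
for method, t in table8.items():
    print(f"{method:8s} {t:4d} s  ratio to Proposed: {t / table8['Proposed']:.1f}")
```

Cost volume refinement accounts for roughly three quarters of the proposed method's matching time, so it is the natural target for further optimization.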

Table 10. RMSE of DSMs compared to the reference surface model (meters).

Dataset (1596 × 4833 pixels)	NCC	SGM-1	SGM-2	SGM-3	Proposed
Subset1	63.8	49.9	45.0	35.6	31.6
Subset2	67.1	34.4	39.4	37.5	35.1
Subset3	73.6	35.9	36.6	31.6	33.9
Subset4	63.4	32.8	33.5	36.5	33.7
Subset5	69.5	34.4	36.3	32.6	36.5
Subset6	75.7	36.1	37.3	36.6	33.0
Subset7	63.9	47.1	33.0	40.7	34.2
Subset8	56.6	33.7	41.7	33.9	34.7
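Averaging Table 10 over the eight subsets reproduces the 34.1 m average accuracy quoted in the abstract for the proposed method. The script below uses only the tabulated RMSE values and also counts which method has the lowest RMSE on each subset.

```python
from collections import Counter

methods = ["NCC", "SGM-1", "SGM-2", "SGM-3", "Proposed"]
# RMSE (m) per subset from Table 10; columns follow `methods`
rmse = [
    [63.8, 49.9, 45.0, 35.6, 31.6],  # Subset1
    [67.1, 34.4, 39.4, 37.5, 35.1],  # Subset2
    [73.6, 35.9, 36.6, 31.6, 33.9],  # Subset3
    [63.4, 32.8, 33.5, 36.5, 33.7],  # Subset4
    [69.5, 34.4, 36.3, 32.6, 36.5],  # Subset5
    [75.7, 36.1, 37.3, 36.6, 33.0],  # Subset6
    [63.9, 47.1, 33.0, 40.7, 34.2],  # Subset7
    [56.6, 33.7, 41.7, 33.9, 34.7],  # Subset8
]

mean_rmse = {m: sum(row[i] for row in rmse) / len(rmse)
             for i, m in enumerate(methods)}
for m, v in mean_rmse.items():
    print(f"{m:8s} mean RMSE {v:5.2f} m")

# Method with the lowest RMSE on each subset
best = Counter(methods[row.index(min(row))] for row in rmse)
print(dict(best))
```

The proposed method has the lowest mean RMSE, although the per-subset winner varies among the SGM variants and the proposed method.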

Table 11. Standard deviation of DSM errors compared to the reference surface model (meters).

Dataset (1596 × 4833 pixels)	NCC	SGM-1	SGM-2	SGM-3	Proposed
Subset1	21.3	9.6	7.7	6.5	5.6
Subset2	11.0	6.7	7.0	6.3	6.0
Subset3	15.3	6.3	6.5	5.2	6.0
Subset4	16.2	7.2	5.8	6.5	6.5
Subset5	13.2	6.0	6.3	6.5	6.5
Subset6	13.8	6.3	6.5	6.5	5.8
Subset7	13.4	7.2	7.5	7.7	6.0
Subset8	15.6	5.8	5.7	6.0	5.7

Table 12. Performance of the proposed algorithm for varying feature-vector lengths on Dataset 1 (Level 1 configuration), compared with a local method at two kernel sizes.

	Kernel size (local method, $k \times k$)	Feature length (proposed algorithm, $k$ elements)
	$(11 \times 11)$	3	5	7	9	11	13
Runtime (seconds)	1.23	1.41	1.48	1.52	1.57	1.62	1.68
Completeness (%)	87.21	82.31	85.5	87.7	87.5	87.6	87.6
Mean Error Distance to Target Pixel (in pixels)	2.5	3.9	3.4	3.1	2.7	2.6	2.6
Standard Deviation of Error (in pixels)	4.9	5.5	5.2	5.1	5.0	4.9	4.9
	Kernel size (local method, $k \times k$)	Feature length (proposed algorithm, $k$ elements)
	$(33 \times 33)$	3	5	7	9	11	13
Runtime (seconds)	9.7	1.41	1.48	1.52	1.57	1.62	1.68
Completeness (%)	93.3	83.6	92.7	93.0	93.8	93.8	93.9
Mean Error Distance to Target Pixel (in pixels)	3.6	5.3	5.1	4.4	3.9	3.7	3.6
Standard Deviation of Error (in pixels)	2.1	3.6	2.8	2.3	2.3	2.2	2.1
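A k × k window compares k² values per candidate disparity, so the local method's cost should grow roughly ninefold from 11 × 11 to 33 × 33, while the proposed algorithm's runtime in Table 12 depends on the feature-vector length and is identical for both kernel settings. The script below re-uses the Table 12 runtimes to compare the measured growth with the $O(k^{2})$ prediction; the remaining gap is plausibly due to fixed overheads that do not scale with k, which the table does not isolate.

```python
# Runtimes (s) from Table 12, Level 1 configuration on Dataset 1
local_runtime = {11: 1.23, 33: 9.7}  # local method, keyed by kernel side k
proposed_runtime = {3: 1.41, 5: 1.48, 7: 1.52, 9: 1.57, 11: 1.62, 13: 1.68}

observed = local_runtime[33] / local_runtime[11]  # measured growth, 11 -> 33
ideal = (33 / 11) ** 2                            # O(k^2) window cost

# Values compared per candidate by the local method's full window
window_elements = {k: k * k for k in local_runtime}
# Runtime growth of the proposed algorithm across feature lengths 3 -> 13
growth = proposed_runtime[13] / proposed_runtime[3]

print(f"local method: observed x{observed:.2f}, O(k^2) predicts x{ideal:.0f}")
print(f"window elements: {window_elements}")
print(f"proposed, feature length 3 -> 13: x{growth:.2f}")
```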

Table 13. Performance of the proposed algorithm on Dataset 1 (Level 2 configuration) with and without Level 1 processing, compared with a local method at two kernel sizes.

	Kernel size (local method, $k \times k$)	Feature length (proposed algorithm, $k$ elements)
	$(5 \times 5)$	9 (without Level 1 processing)	9 (with Level 1 processing)
Runtime (seconds)	0.8	1.5	1.5
Completeness (%)	64.6	62.4	92.3
Mean Error Distance to Target Pixel (in pixels)	1.2	1.3	1.3
Standard Deviation of Error (in pixels)	10.6	11.2	2.3
	Kernel size (local method, $k \times k$)	Feature length (proposed algorithm, $k$ elements)
	$(9 \times 9)$	9 (without Level 1 processing)	9 (with Level 1 processing)
Runtime (seconds)	1.2	1.5	1.5
Completeness (%)	80.2	78.4	94.2
Mean Error Distance to Target Pixel (in pixels)	1.8	1.9	1.8
Standard Deviation of Error (in pixels)	5.5	6.8	1.1

Table 14. Performance indicators of the algorithms on the TerraSAR-X subset.

Method (TerraSAR-X subset, 1557 × 2546 pixels)	Parameter Settings	Completeness (%)	RMSE (m)	STD (m)	LE90 (m)	Run Time (s)
NCC	Kernel size (K): 15 × 15 (larger kernels give a smoother disparity output but sharply increase processing time)	92.5	35.1	7.48	69.2	52.9
SGM-1	As recommended in [17], p1 and p2 were set to 150 and 200	92.6	18.3	3.78	34.6	7.5
SGM-2	As recommended in [17], p1 and p2 were set to 180 and 250	93.5	17.4	3.53	32.5	8.1
SGM-3	As recommended in [17], p1 and p2 were set to 150 and 200	94.3	17.9	3.51	30.5	8.2
Proposed method	Level 1: original kernel 25 × 25; reduced vector 13; std threshold not needed; 2D median filter not needed. Level 2: original kernel 7 × 7; reduced vector 13; threshold 1.8; 2D median filters 5 × 5 and 9 × 9.	95.3	18.1	3.45	34.3	3.8

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jannati, H.; Valadan Zoej, M.J.; Ghaderpour, E.; Mazzanti, P. Dense Matching with Low Computational Complexity for Disparity Estimation in the Radargrammetric Approach of SAR Intensity Images. Remote Sens. 2025, 17, 2693. https://doi.org/10.3390/rs17152693

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
