Accurate 3D Terrain Reconstruction for Multi-View Thermal Infrared Images with Small Intersection Angles

Xu, Yixuan; Liang, Quan; Guo, Junhong; Du, Xinwang; Wu, Chao; Li, Xiaoyan; Chen, Fansheng

doi:10.3390/rs18050681

Open Access Article


by Yixuan Xu ^1,2, Quan Liang ^1,2, Junhong Guo ^1,2, Xinwang Du ^1,2, Chao Wu ^1,2, Xiaoyan Li ^1,2,* and Fansheng Chen ^1,2,3,4

¹ Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
² University of Chinese Academy of Sciences, Beijing 100049, China
³ Key Laboratory of Intelligent Infrared Perception, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
⁴ Shanghai Frontier Base of Intelligent Optoelectronics and Perception, Institute of Optoelectronics, Fudan University, Shanghai 200438, China
* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(5), 681; https://doi.org/10.3390/rs18050681

Submission received: 14 January 2026 / Revised: 18 February 2026 / Accepted: 24 February 2026 / Published: 25 February 2026


Highlights

What are the main findings?

An affine-initialized RPC framework enables stable 3D reconstruction from bidirectional whisk-broom thermal infrared imagery with small intersection angles.
A hierarchical, longitude/latitude-first RPC optimization improves numerical stability and mitigates height-error propagation under weak stereo geometry.

What are the implications of the main findings?

Reliable 3D terrain reconstruction can be achieved without access to confidential rigorous sensor models.
The framework offers a practical solution for DEM generation from low-texture thermal infrared stereo imagery under weak stereo geometry.

Abstract

Accurate 3D terrain reconstruction from multi-view whisk-broom thermal infrared imagery with small intersection angles remains challenging because stereo geometry is weak and height sensitivity is limited. To address this challenge, we develop an affine-initialized rational polynomial coefficient (RPC) reconstruction framework for 3D positioning under weak geometric conditions. An affine model is first used to estimate initial 3D coordinates from image tie points, which are then used to initialize RPC-based refinement. The refinement adopts an iterative scheme with hierarchical updates, where longitude and latitude are optimized before altitude to mitigate error propagation when height observability is low. The method is evaluated using multi-view data acquired by the SDGSAT-1 Thermal Infrared Spectrometer (TIS) over plain, hilly, and mountainous terrains, with intersection angles ranging from 0.57° to 6.5°. The results show that approximately 80% of the reconstruction errors fall within 2 pixels and more than 90% fall within 3 pixels, corresponding to 60 m and 90 m at the image resolution used in this study. The root-mean-square error (RMSE) remains below 0.3 pixels in plains, 1.3 pixels in hilly areas, and 1.8 pixels in mountainous areas. Overall, the proposed framework facilitates stable 3D terrain reconstruction from whisk-broom thermal infrared imagery and reduces reliance on confidential rigorous sensor models.

Keywords:

3D reconstruction; thermal infrared; bidirectional whisk-broom; affine model; RPC model

1. Introduction

1.1. Motivation

3D reconstruction from remote sensing imagery enables detailed characterization of surface geometry and has been widely used in natural hazard assessment, ecological studies, agricultural and forestry monitoring, and urban planning [1,2,3,4]. In particular, long-term 3D monitoring provides an effective means to support the analysis, prediction, and mitigation of adverse impacts associated with natural events such as volcanic eruptions. However, most existing 3D reconstruction studies rely primarily on high-resolution optical imagery, which depends on reflected sunlight and is therefore largely limited to daytime observations.

Thermal infrared remote sensing is well suited for nighttime observations and for conditions with reduced visibility, making it a valuable modality for near-all-weather monitoring. To achieve both high spatial resolution and wide swath coverage, the SDGSAT-1 TIS employs a bidirectional whisk-broom imaging mode. However, the small intersection angles between the forward- and backward-view images lead to weak stereo geometry, which substantially increases the difficulty of 3D reconstruction. In addition, thermal infrared (TIR) imagery typically exhibits low contrast and weak texture, which reduces structural detail and complicates reliable feature matching.

The data used in this study were acquired by the bidirectional whisk-broom TIS onboard the SDGSAT-1 satellite, which was launched in 2021. The TIS provides a spatial resolution of 30 m and a swath width of 300 km, and it was among the highest-resolution publicly available thermal infrared datasets at the time of acquisition [5]. With its combination of relatively high spatial resolution, wide swath coverage, and day–night observation capability, TIS offers a useful data source for 3D reconstruction across diverse terrain types. The intersection angles between the forward and backward scans range from approximately 0.57° to 6.5°. Although thermal infrared imagery is valuable for environmental monitoring, such small intersection angles lead to weak stereo geometry and pose substantial difficulties for 3D reconstruction based on rigorous sensor models and conventional forward intersection. In addition, the choice of a suitable generalized imaging model that relates 2D image observations to 3D ground coordinates is a key factor affecting both reconstruction accuracy and computational efficiency.

1.2. Related Work

In recent years, 3D reconstruction from remote sensing imagery has been widely studied and can be broadly categorized into two classes according to the sensor modeling strategy: rigorous-model-based methods and generalized-model-based methods. A rigorous sensor model describes the physical imaging process of a specific sensor by relating image and object points through collinearity equations [6]. Based on a rigorous sensor model (RSM), Zhang et al. proposed an automated workflow incorporating attitude compensation and self-calibration adjustment for large-scale high-resolution satellite imagery, including data from the ZY-3 three-line-array camera [7]. They reported a positioning accuracy of 1.87 m. However, constructing a rigorous model for a given camera is often challenging because it requires detailed proprietary hardware parameters, such as focal length and precise platform pose, and it must accommodate diverse and sometimes complex imaging modes [8]. Moreover, hardware information for many satellites is confidential, making these parameters difficult to obtain in practice [9]. As a result, rigorous-model-based methods have been developed mainly for three-line-array imagery [10]. They are generally not directly applicable to the whisk-broom images considered in this study due to fundamental differences in imaging geometry [11]. By contrast, generalized models do not rely on explicit interior and exterior orientation parameters. They include simplified forms such as the affine model [12] and more flexible representations such as the RPC model [13]. This reduced dependence on physical sensor parameters supports applicability across a wide range of sensors and imaging modes. Consequently, many recent 3D reconstruction studies adopt generalized sensor models.

The RPC model is a widely used generalized sensor model that relates 2D image observations to 3D ground coordinates through a rational function. In practice, RPC parameters can be fitted using a set of well-distributed ground control points (GCPs), and a sufficient number of control points is required to solve for the RPC coefficients and normalization terms [14]. A full third-order model has 78 unknown coefficients, and each GCP contributes two equations, so such a fit requires at least 39 GCPs. Although the RPC model does not explicitly represent the physical imaging mechanism, it can achieve accuracy comparable to that of rigorous sensor models. It can also approximate the imaging geometries of various sensors, including whisk-broom systems, which has made it a common choice for 3D reconstruction [15]. Seo et al. proposed a virtual RPC approach that supports 3D reconstruction without requiring the original RPC files [16]. The method alleviates confidentiality constraints for commercial satellite data and reported a reconstruction accuracy of 2.8 m RMSE. However, it relies on external references for altitude estimation and is sensitive to large-scale deformations in the GCP network. To improve RPC parameter accuracy for high-resolution satellite imagery, Grodecki et al. proposed an RPC block adjustment method that compensates for systematic errors and enhances numerical stability relative to conventional approaches [17]. However, its performance depends on the initial quality of exterior orientation, and it may require additional parameters for long image strips (e.g., >100 km). Noh et al. developed a high-latitude digital elevation model (DEM) generation method that integrates the RPC model with the SETSM (Surface Extraction with Triangulated Irregular Network based Search-space Minimization) pipeline [18]. Their framework automatically corrects RPC errors without manual intervention and reported a positioning accuracy of 0.2 m. However, the method can be computationally demanding because it requires iterative optimization and TIN construction, particularly for large-scale processing.

RPC-based methods typically adopt either a forward form that projects 3D ground points to 2D image coordinates or an inverse form that maps 2D image observations to 3D ground coordinates. In either case, triangulation can be formulated as a numerical optimization problem rather than as direct forward intersection, which can improve robustness when stereo geometry is weak and intersection angles are small [19]. However, the inverse RPC model is often not provided explicitly. In practice, 3D coordinates are therefore estimated by combining 2D image measurements with auxiliary altitude information, which can reduce accuracy or require prior elevation products such as a Digital Surface Model (DSM) for correction [20]. Singh et al. performed 3D reconstruction using Cartosat-1 stereo imagery and compared a rigorous orbit model with an RPC-based approach [21]. They reported that the RPC-based solution achieved lower RMSE in steep terrain, with 44.7 m versus 50.8 m for the rigorous model. Nevertheless, RPC-based refinement can be sensitive to the quality of the initial elevation input, and inaccurate DEM initialization may cause poor convergence or divergence. Tao et al. compared forward- and inverse-RPC-based reconstruction strategies [15]. They found that the forward model can achieve higher accuracy when solving 3D coordinates via least-squares adjustment, whereas the inverse model can be more efficient during altitude iteration. To improve accuracy without relying on the inverse model, Zheng et al. proposed a Gröbner-basis minimal solver [22]. However, its algebraic complexity can increase computation time. Forward-RPC-based approaches can also be computationally demanding because they require good initial values and careful normalization. When initial values are obtained from manual settings or low-order approximations, the optimization may become trapped in local optima or fail to converge.

In contrast, methods based on simplified models typically require lower computational cost and can therefore be faster [12]. Goossens et al. proposed an alternative approach based on a perspective camera model, in which GCPs and a homography-based approximation are used for DEM reconstruction [23]. They reported an altitude accuracy of 4.8–10.2 m, but the approach depends on a relatively large number of accurate and well-distributed GCPs. Wang et al. introduced an Affine-to-Euclidean Reconstruction (AE-Rec) framework to rapidly recover 3D scene structure without high-order polynomial iterations [24]. They reported an altitude accuracy of 0.239 m. However, approaches that rely on locally linear projection can be limited because they cannot fully capture the nonlinear imaging geometry of remote sensing sensors. As a result, systematic biases may arise in areas with pronounced terrain relief, particularly in altitude estimation. In addition, the affine model implicitly assumes locally smooth terrain and unobstructed visibility. It can degrade in steep or highly occluded landscapes, such as cliffs and canyons, where occlusions are frequent and the multi-view geometry deviates from affine assumptions. Under such conditions, affine-based reconstruction may produce distorted elevation estimates.

Recently, hybrid approaches that combine multiple generalized models have attracted increasing attention. De Franchis et al. developed the Satellite Stereo Pipeline (s2p), which couples the RPC model with a local affine approximation [25]. The approach partitions a large-swath image into smaller tiles and approximates the image geometry within each tile using an affine model. They reported sub-pixel epipolar errors of less than 0.05 pixels. However, tile-based processing can introduce elevation discontinuities at tile boundaries. These discontinuities may appear as visible seams in the resulting DEM, and additional post-processing is often required to mitigate such artifacts.

With recent advances in deep learning, learning-based approaches have attracted growing attention for 3D reconstruction from remote sensing imagery [26]. Mao et al. proposed an elevation semantic flow network based on the SFFDE model [27]. Their method estimates a DSM by predicting semantic flow fields and reported an RMSE of 1.133 m. It generates a gridded elevation product from single-view imagery, which can reduce data acquisition requirements. However, the recovered elevation scale depends on external references. Deformation in image control points can introduce systematic bias, causing the output to deviate toward above-ground-level heights rather than the intended DSM. In addition, many deep learning models offer limited interpretability, which can hinder error diagnosis and principled model refinement.

In summary, bidirectional whisk-broom imagery with small intersection angles poses substantial challenges for reliable 3D reconstruction. Rigorous-model-based approaches can be difficult to apply when detailed sensor parameters are unavailable or confidential. RPC-based methods are often computationally demanding, whereas simplified models may not provide sufficient accuracy under complex imaging geometry. To address these issues, this study proposes an affine-initialized RPC framework that supports 3D reconstruction without access to rigorous sensor parameters while preserving computational efficiency.

1.3. Contribution

This study aims to enable reliable 3D terrain reconstruction from bidirectional whisk-broom thermal infrared imagery under weak stereo geometry, where existing pipelines may exhibit numerical instability or convergence difficulties. The proposed approach is designed to improve reconstruction stability and accuracy under small intersection angles. The main contributions are summarized as follows:

(1): We propose a two-stage, affine-initialized RPC reconstruction framework, in which a fast affine-based 3D estimator provides stable initial values for subsequent RPC iterative refinement. This design reduces sensitivity to initialization under weak stereo geometry while maintaining computational efficiency.
(2): We develop an enhanced local affine preprocessing strategy by partitioning large whisk-broom images into overlapping sub-images for rapid estimation. By exploiting locally smooth imaging geometry, this strategy mitigates distortions introduced by affine linearization and helps alleviate boundary discontinuities in the initialization stage.
(3): We propose a hierarchical coordinate optimization strategy for RPC refinement, where longitude and latitude are optimized before altitude. This decoupling addresses the ill-conditioning of altitude estimation under small intersection angles and improves the numerical stability of the refinement process.

2. Materials and Methods

2.1. Overall Framework

The proposed algorithm consists of two main stages: (i) rapid estimation of initial 3D ground coordinates and (ii) iterative refinement of these coordinates. In the first stage, an affine model approximates the mapping between 3D object space and 2D image space as linear, which facilitates efficient initialization. In the second stage, an RPC model is employed to iteratively optimize the 3D coordinates using a hierarchical strategy, thereby improving the final reconstruction accuracy.

As illustrated in Figure 1, the proposed pipeline consists of six main steps: (1) preprocessing, where the original image pair is cropped into tiles; (2) affine reconstruction, in which sparse correspondences are extracted and an initial 3D structure is estimated in the affine space; (3) geographic upgrading, which converts the affine 3D points into coarse ground coordinates using GCPs; (4) RPC-based coordinate refinement, where a hierarchical strategy (optimizing longitude/latitude before altitude) is employed to obtain the final reconstructed 3D point cloud; (5) optional visualization, in which the point cloud can be rasterized onto a regular grid to generate a DEM using Gaussian-weighted interpolation solely for display; and (6) quantitative evaluation against ASTER GDEM V3. Importantly, all accuracy metrics are computed directly from the reconstructed 3D point cloud prior to any DEM gridding or visualization. Consequently, the DEM generation step does not affect the reported quantitative results.
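Step (5) is used only for display. The sketch below illustrates Gaussian-weighted gridding under stated assumptions. Each grid node receives the Gaussian-weighted mean of the reconstructed heights within a search radius, and nodes with no nearby points are left empty (NaN). The search is brute force, and `sigma`, `radius`, and the function name `gaussian_grid` are hypothetical, since none of them is specified in the text.

```python
import numpy as np

def gaussian_grid(x, y, z, xi, yi, sigma, radius):
    """Gaussian-weighted average of scattered heights z at grid nodes (xi, yi).

    Hypothetical helper for visualization only; sigma and radius are
    illustrative parameters, not values taken from the study.
    """
    gx, gy = np.meshgrid(xi, yi)                 # node coordinates, shape (len(yi), len(xi))
    out = np.full(gx.shape, np.nan)              # empty nodes stay NaN
    for idx in np.ndindex(gx.shape):
        d2 = (x - gx[idx]) ** 2 + (y - gy[idx]) ** 2
        near = d2 <= radius ** 2                 # points inside the search radius
        if near.any():
            w = np.exp(-d2[near] / (2.0 * sigma ** 2))
            out[idx] = np.sum(w * z[near]) / np.sum(w)
    return out
```

Because accuracy is assessed on the point cloud itself, the choice of kernel here affects only the appearance of the DEM.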

2.2. Rapid Estimation of Geographic Coordinates Based on an Affine Model

The objective of the rapid initial 3D estimation is to obtain a coarse preliminary 3D solution that serves as a reliable initialization for subsequent RPC-based iterative refinement. This initialization improves optimization efficiency and helps stabilize convergence. The procedure comprises two stages: (1) computing a local 3D approximation under an affine model and (2) transforming the resulting local 3D coordinates into geographic coordinates (longitude, latitude, and altitude).

2.2.1. Local Affine-Based Initial 3D Approximation

The imaging geometry of a whisk-broom sensor differs from that of a three-line pushbroom stereo sensor. Because of the bidirectional scanning, the assumptions underlying conventional rigorous physical sensor models and standard stereo positioning may not hold directly. In this setting, stereo triangulation is sensitive to the effective baseline. When the effective baseline is short (equivalently, when the intersection angle is small), the triangulation geometry tends to be ill-conditioned, which increases the uncertainty in the vertical component. In photogrammetry, a larger base-to-height ratio is generally associated with better height accuracy; however, bidirectional whisk-broom image pairs typically exhibit a narrow effective baseline. Consequently, directly applying ray-intersection triangulation may lead to unstable initial 3D estimates and reduced height accuracy for these data.
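The height-sensitivity argument can be made concrete with the textbook normal-case relation $\sigma_h \approx \sigma_p \cdot H/B$, where $B/H = 2\tan(\theta/2)$ for a symmetric intersection angle $\theta$. This is a deliberate simplification, not the error model used in this study. It ignores the whisk-broom geometry and assumes a parallax error expressed in pixels at the 30 m TIS resolution. The function name is hypothetical.

```python
import math

def height_uncertainty(intersection_angle_deg, gsd_m, parallax_err_px=1.0):
    """Approximate height error (m) for a symmetric stereo pair.

    Assumes the normal-case relation sigma_h = sigma_p * H / B with
    B / H = 2 * tan(theta / 2); illustrative only.
    """
    theta = math.radians(intersection_angle_deg)
    b_over_h = 2.0 * math.tan(theta / 2.0)       # base-to-height ratio
    sigma_p = parallax_err_px * gsd_m            # parallax error in ground units (m)
    return sigma_p / b_over_h

for angle in (0.57, 6.5):
    print(f"{angle:5.2f} deg -> {height_uncertainty(angle, 30.0):7.1f} m per pixel of parallax error")
```

Under these assumptions, one pixel of parallax error corresponds to roughly 3,000 m of height uncertainty at 0.57° and roughly 260 m at 6.5°. The figure scales linearly with sub-pixel matching accuracy. This illustrates why altitude is the weakly determined component at small intersection angles.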

Accordingly, it is beneficial to adopt a local model that is less susceptible to weak stereo geometry (e.g., short effective baselines and small intersection angles). The affine model represents the image formation in a local region using an affine transformation, providing a stable linear approximation to the underlying projection. Under this model, image measurements can be expressed as an affine projection of 3D points, which reduces perspective-induced nonlinear effects and helps mitigate numerical instability when the intersection angle is small.

During a short acquisition interval, the satellite platform motion can be approximated as linear, while the whisk-broom sensor performs cross-track scanning through an oscillating mirror. Given the large sensor-to-ground distance, the mapping between image coordinates and object-space coordinates varies smoothly within a small tile. Following the general camera-modeling perspective that image measurements can be locally approximated by low-order mappings, an affine model provides a practical local approximation for relating a 2D image point in a small tile of a bidirectional whisk-broom image to its corresponding 3D object-space coordinate [14]. The imaging geometry and the overall affine-to-RPC workflow are summarized in Figure 2.

To approximate large bidirectional whisk-broom images using a local affine model, each image is partitioned into a set of overlapping tiles. The tile size trades off local geometric fidelity against the numerical stability of affine parameter estimation. Tiles that are too small may contain too few reliable correspondences, whereas tiles that are too large may violate the local linearity assumption and reduce the approximation accuracy. Within each tile, the local affine model can be expressed as follows:

p_{i}^{k} = M_{i} P_{aff}^{k} + t_{i}

(1)

where $p_{i}^{k}$ denotes the 2D feature point extracted from the forward-view ($i = 1$) and backward-view ($i = 2$) whisk-broom images using the Scale-Invariant Feature Transform (SIFT) method [28]. Each point is represented by a line coordinate $r_{i}^{k}$ and a sample coordinate $c_{i}^{k}$, i.e., $p_{i}^{k} = (r_{i}^{k}, c_{i}^{k})^{T}$. $P_{aff}^{k}$ represents the corresponding 3D point of the $k$-th ground feature in the affine reconstruction space (in non-homogeneous form). The affine mapping for view $i$ is parameterized by a 2 × 3 linear transformation matrix $M_{i}$ and a 2D translation vector $t_{i}$.
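The overlapping tiling that precedes Eq. (1) can be sketched as follows. The tile size and overlap are illustrative placeholders, not the values used in this study, and `tile_windows` is a hypothetical helper. The last row and column of tiles are shifted so that they end flush with the image border, which guarantees that every pixel is covered.

```python
def tile_windows(height, width, tile=512, overlap=64):
    """Return (row0, row1, col0, col1) windows covering an image with overlap.

    tile and overlap are illustrative values, not those used in the study.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    step = tile - overlap

    def starts(n):
        # Start offsets along one axis; the last tile is aligned to the border
        if n <= tile:
            return [0]
        s = list(range(0, n - tile, step))
        s.append(n - tile)
        return s

    return [(r, min(r + tile, height), c, min(c + tile, width))
            for r in starts(height) for c in starts(width)]
```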

Outlier filtering was applied to the sparse matches to remove incorrect correspondences. Because each tile pair can be reasonably approximated by a local affine model, Random Sample Consensus (RANSAC) was used to robustly estimate an affine geometric relation (i.e., the tile-wise affine transformation) and to reject inconsistent matches [29]. This procedure effectively eliminates spurious correspondences and improves the reliability of subsequent reconstruction.
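Below is a minimal NumPy sketch of tile-wise RANSAC with a 2D affine model. It assumes that matched coordinates from the forward and backward tiles are given as `src` and `dst` arrays of shape (n, 2). Three non-collinear correspondences determine a 2D affine transform, and the model is refitted on the final inlier set. The threshold, iteration count, and function names are illustrative assumptions.

```python
import numpy as np

def fit_affine_2d(src, dst):
    """Least-squares 2D affine A (2x3) such that dst ~= A @ [x, y, 1]^T."""
    X = np.hstack([src, np.ones((len(src), 1))])        # (n, 3)
    A_t, *_ = np.linalg.lstsq(X, dst, rcond=None)       # (3, 2)
    return A_t.T

def ransac_affine_2d(src, dst, thresh=1.5, iters=500, seed=0):
    """Robust 2D affine estimate; returns (A, inlier_mask)."""
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)     # minimal sample
        A = fit_affine_2d(src[idx], dst[idx])
        err = np.linalg.norm(X @ A.T - dst, axis=1)      # transfer error in pixels
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit_affine_2d(src[best], dst[best]), best     # refit on all inliers
```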

In this work, the reconstruction pipeline follows a coarse-to-fine strategy: a sparse, affine-based reconstruction is first used to provide initialization, after which dense reconstruction is performed to improve spatial coverage. This design is adopted to enhance practical robustness for thermal infrared imagery acquired under weak stereo geometry. Dense matching approaches (e.g., SGM and ThermoStereoRT) can produce a large number of correspondences; however, for thermal infrared scenes with limited texture and extremely small intersection angles, the matching evidence may be less distinctive in some regions, and the confidence of the resulting disparities can vary accordingly [30,31]. Under such conditions, relying on dense correspondences alone for the initial 3D estimation may not always provide sufficiently stable starting coordinates for subsequent refinement. In contrast, sparse feature-based matching typically yields fewer but higher-confidence correspondences, which can serve as a conservative basis for establishing an initial 3D structure and the subsequent geodetic initialization. Accordingly, we first compute a sparse reconstruction from reliable feature matches to obtain a robust starting point, and then proceed to dense reconstruction to enhance completeness. Importantly, this choice is not intended to suggest that dense matching is generally ineffective; rather, in the present setting, the sparse-to-dense workflow offers a cautious way to balance initialization stability, coverage, and computational effort.

In this study, we adopted an enhanced SIFT-based feature matching strategy to improve robustness in low-texture thermal infrared imagery. The refinements primarily target correspondence reliability and keypoint localization accuracy, while the original SIFT descriptor is left unchanged. Specifically, keypoints are refined to sub-pixel (and sub-scale) precision via the standard DoG extremum adjustment with Hessian-based interpolation and stability tests (e.g., contrast and edge-response checks). Candidate matches are then screened using the nearest-neighbor distance ratio (NNDR) criterion. To further suppress mismatches, we apply an additional geometric verification step based on the fast sample consensus (FSC) scheme. In this step, a local transformation model is estimated from minimal samples, the largest inlier set is selected under a pixel-level error threshold, and the final model parameters are refined by least squares.
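The NNDR screening step can be sketched with a brute-force descriptor search. The sketch assumes that descriptors are stored as rows of NumPy arrays and that `desc2` has at least two rows. The ratio of 0.8, commonly cited from Lowe's work, is an illustrative threshold; the value used in this study is not stated here. The function name is hypothetical.

```python
import numpy as np

def nndr_match(desc1, desc2, ratio=0.8):
    """Return (i, j) index pairs that pass the nearest-neighbor distance ratio test."""
    # Pairwise Euclidean distances between all descriptors, shape (n1, n2)
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :2]            # nearest and second-nearest neighbor
    rows = np.arange(len(desc1))
    d1, d2 = d[rows, nn[:, 0]], d[rows, nn[:, 1]]
    keep = d1 < ratio * d2                       # reject ambiguous matches
    return np.stack([rows[keep], nn[keep, 0]], axis=1)
```

The surviving pairs would then be passed to the geometric verification step.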

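To obtain a linear constraint, we center the 2D image coordinates by subtracting the centroid of all feature-point coordinates from $g_{1}$ and $g_{2}$ (one centroid per view), which yields the centered coordinates $p_{ci}^{k}$. This centering removes the translation term. By Eq. (1), the image centroid equals $M_{i}\bar{P}_{aff} + t_{i}$, where $\bar{P}_{aff}$ is the centroid of the 3D points, so $p_{ci}^{k} = M_{i}(P_{aff}^{k} - \bar{P}_{aff})$. Placing the origin of the affine frame at $\bar{P}_{aff}$ then eliminates $t_{i}$. Stacking the centered coordinates of all $n$ correspondences gives the 4 × n matrix

W = \begin{bmatrix} p_{c1}^{1} & p_{c1}^{2} & \cdots & p_{c1}^{n} \\ p_{c2}^{1} & p_{c2}^{2} & \cdots & p_{c2}^{n} \end{bmatrix} = \begin{bmatrix} M_{1} \\ M_{2} \end{bmatrix} \begin{bmatrix} P_{aff}^{1} & P_{aff}^{2} & \cdots & P_{aff}^{n} \end{bmatrix}

(2)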

To account for measurement noise (outliers having already been removed by RANSAC), the problem is reformulated as the following least-squares minimization:

\min_{M, P_{aff}} \left\| \hat{W} - M P_{aff} \right\|_{F}^{2}

(3)

where $\hat{W}$ is the measurement matrix, i.e., the observed (noisy) counterpart of $W$; $M$ is the 4 × 3 linear observation matrix obtained by stacking $M_{1}$ and $M_{2}$ row-wise; $P_{aff}$ is the corresponding 3 × n point-cloud matrix in the affine coordinate system; and $\|\cdot\|_{F}$ denotes the Frobenius norm.

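Under the affine camera model, image measurements depend linearly on the 3D structure, so the measurement matrix $\hat{W}$ is approximately low rank; in the absence of noise, it has rank at most three. We therefore perform singular value decomposition (SVD) on $\hat{W}$ and construct a rank-3 approximation by keeping the three dominant singular values and their associated singular vectors. By the Eckart–Young theorem, this truncation minimizes Eq. (3). The resulting factorization provides estimates of the affine projection matrices $M_{1}$ and $M_{2}$ for the two views and of the corresponding affine 3D point set $P_{aff}$ [32]. These estimates are determined only up to an invertible 3 × 3 transformation, since $M P_{aff} = (MQ)(Q^{-1}P_{aff})$ for any invertible $Q$. For this reason, the affine structure must subsequently be upgraded to geographic coordinates using GCPs.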
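The centering and rank-3 factorization of Eqs. (2) and (3) can be sketched in NumPy as follows. The factorization is defined only up to an invertible 3 × 3 transformation, so splitting the singular values symmetrically between $M$ and $P_{aff}$ is one arbitrary choice among many. The function name and the (n, 2) argument layout are assumptions.

```python
import numpy as np

def affine_factorization(p1, p2):
    """Rank-3 factorization of centred two-view measurements.

    p1, p2: (n, 2) arrays of matched (line, sample) coordinates.
    Returns M (4x3, stacked [M1; M2]), P_aff (3xn) and the two image centroids.
    """
    c1, c2 = p1.mean(axis=0), p2.mean(axis=0)
    W = np.vstack([(p1 - c1).T, (p2 - c2).T])      # 4 x n centred measurement matrix
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:3])                        # keep the three dominant singular values
    M = U[:, :3] * sqrt_s                          # 4 x 3 affine projection matrices
    P_aff = sqrt_s[:, None] * Vt[:3]               # 3 x n affine structure
    return M, P_aff, c1, c2
```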

2.2.2. Global Geographic Coordinate Upgrading

For an image pair, the estimated 3D affine structure can be related to the geographic (ground) coordinate system via a 3D affine transformation. In homogeneous coordinates, this transformation consists of a 3 × 3 linear component and a 3 × 1 translation vector, yielding 12 degrees of freedom in total. Because each GCP contributes three equations, the parameters can be solved using at least four non-coplanar GCPs, each providing longitude $\lambda^{GCP}$, latitude $\varphi^{GCP}$, and orthometric height $H^{GCP}$.

Given the measured pixel coordinates of a GCP in the two images, we first center the observations by subtracting the corresponding image centroids (those used to center the feature points), which yields the centered coordinates. The affine 3D coordinate is then obtained by solving the overdetermined linear system (four equations, three unknowns) in the least-squares sense. For numerical robustness, we compute the solution using the Moore–Penrose pseudoinverse (implemented via SVD):

P_{aff}^{GCP} = M^{+} \begin{bmatrix} p_{c1}^{GCP} \\ p_{c2}^{GCP} \end{bmatrix}

(4)

where $M^{+}$ is the pseudoinverse of $M$ and $p_{ci}^{GCP}$ denotes the centered image coordinates of the GCP in image $i$ ($i = 1, 2$).

The affine coordinates $P_{aff}^{GCP}$ are mapped to geographic coordinates $P_{geo}^{GCP}$ through a 3D affine transformation expressed in homogeneous form:

P_{geo}^{GCP} = T \begin{bmatrix} P_{aff}^{GCP} \\ 1 \end{bmatrix}

(5)

where $T$ is a 3 × 4 affine transformation matrix that comprises a linear component and a translation term. Given at least four non-coplanar GCPs with known geographic coordinates and their corresponding affine coordinates, the parameters of $T$ are estimated by least-squares fitting:

T = \arg\min_{T} \sum_{j} \left\| T \begin{bmatrix} P_{aff, j}^{GCP} \\ 1 \end{bmatrix} - P_{geo, j}^{GCP} \right\|_{2}^{2}

(6)

After $T$ is obtained, each reconstructed affine 3D point $P_{aff}^{k}$ in the point cloud is converted to its geographic coordinate $P_{geo}^{k}$ by applying the same transformation:

P_{geo}^{k} = T \begin{bmatrix} P_{aff}^{k} \\ 1 \end{bmatrix}

(7)

The results from all tiles are then merged to obtain a set of global initial values for subsequent optimization, following the workflow in Figure 2.
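Eqs. (4) to (7) reduce to a pseudoinverse and a linear least-squares fit, as the following sketch shows. It makes several assumptions:

- Affine and geographic points are stored as columns.
- GCP pixel coordinates are given as (m, 2) arrays.
- Geographic coordinates are ordered as (longitude, latitude, height).
- The stacked matrix $M$ and the view centroids come from the affine factorization.

The function names are hypothetical.

```python
import numpy as np

def gcp_affine_coords(M, gcp_px1, gcp_px2, c1, c2):
    """Eq. (4): affine coordinates (3 x m) of GCPs from their pixel coordinates."""
    rhs = np.vstack([(gcp_px1 - c1).T, (gcp_px2 - c2).T])   # 4 x m centred observations
    return np.linalg.pinv(M) @ rhs

def fit_geo_transform(P_aff_gcp, P_geo_gcp):
    """Eq. (6): least-squares 3x4 affine T mapping affine -> geographic coordinates."""
    m = P_aff_gcp.shape[1]
    if m < 4:
        raise ValueError("need at least four non-coplanar GCPs")
    X = np.vstack([P_aff_gcp, np.ones((1, m))])               # 4 x m homogeneous
    T_t, *_ = np.linalg.lstsq(X.T, P_geo_gcp.T, rcond=None)   # solve X^T T^T = P_geo^T
    return T_t.T

def apply_geo_transform(T, P_aff):
    """Eqs. (5) and (7): map affine points (3 x n) to geographic coordinates."""
    return T @ np.vstack([P_aff, np.ones((1, P_aff.shape[1]))])
```

Mixing degrees and meters in one affine map follows the formulation above. In practice, centering and scaling the GCP coordinates before the fit may improve conditioning.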

However, the affine model provides only a simplified approximation to the rigorous imaging geometry. Consequently, it may introduce non-negligible reconstruction errors, which tend to be more evident over large spatial extents or in areas with strong terrain relief. Therefore, the reprojection accuracy achievable using the affine model alone is generally insufficient for high-quality final reconstruction. Importantly, in the proposed framework, affine reconstruction is not intended to deliver the final 3D coordinates. Instead, it acts as a fast and geometrically consistent initialization step that provides reasonable starting values for the subsequent RPC-based refinement, where the final reconstruction accuracy is achieved.

2.3. Iterative Coordinate Optimization Using the RPC Model

To further improve reconstruction accuracy under weak stereo geometry, we perform an iterative refinement based on the RPC model. In practice, the convergence of such refinement may be influenced by the choice of initial values. Accordingly, the proposed framework uses affine-based initialization and a hierarchical update strategy to facilitate stable and reliable refinement within the RPC formulation.

The RPC model is commonly used to relate 3D ground coordinates to 2D image coordinates through rational polynomial functions. It can be written as:

\left\{ \begin{aligned} r_{n} &= \frac{\mathrm{Num}_{r}(\lambda_{n}, \varphi_{n}, h_{n})}{\mathrm{Den}_{r}(\lambda_{n}, \varphi_{n}, h_{n})} \\ c_{n} &= \frac{\mathrm{Num}_{c}(\lambda_{n}, \varphi_{n}, h_{n})}{\mathrm{Den}_{c}(\lambda_{n}, \varphi_{n}, h_{n})} \end{aligned} \right.

(8)

where $r_{n}$ and $c_{n}$ denote the normalized line and sample coordinates, respectively. For each image coordinate $j \in \{r, c\}$, $\mathrm{Num}_{j}(\cdot)$ and $\mathrm{Den}_{j}(\cdot)$ are cubic polynomials of the normalized geographic variables: longitude $\lambda_{n}$, latitude $\varphi_{n}$, and ellipsoidal height $h_{n}$.

To relate the normalized image coordinates to the original image coordinate system, the normalized quantities are converted back via the scale and offset parameters:

\left\{ \begin{aligned} r &= r_{n} r_{S} + r_{O} \\ c &= c_{n} c_{S} + c_{O} \end{aligned} \right.

(9)

where $r$ and $c$ denote the original line and sample coordinates, respectively. The subscript $O$ in $r_{O}$ and $c_{O}$ indicates the offset parameters, and $r_{S}$ and $c_{S}$ are the corresponding scale factors.
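A minimal forward-projection sketch combining Eqs. (8) and (9) is shown below. It relies on three assumptions:

- The 20 cubic monomials follow the widely used RPC00B ordering. Other orderings, such as RPC00A, exist and must match the supplied coefficients.
- Ground coordinates are normalized with their own offset and scale parameters in the usual way.
- Coefficients are stored in a dictionary whose key names (e.g., `line_num`, `samp_scale`) are hypothetical.

```python
import numpy as np

def cubic_terms(L, P, H):
    """20 cubic monomials of normalized (lon, lat, h), assuming RPC00B ordering."""
    return np.array([
        np.ones_like(L), L, P, H, L * P, L * H, P * H, L**2, P**2, H**2,
        P * L * H, L**3, L * P**2, L * H**2, L**2 * P, P**3, P * H**2,
        L**2 * H, P**2 * H, H**3,
    ])

def rpc_project(rpc, lon, lat, h):
    """Forward RPC projection: ground (lon, lat, h) -> image (line, sample)."""
    # Normalize ground coordinates with offset/scale parameters
    L = np.asarray((lon - rpc["lon_off"]) / rpc["lon_scale"], dtype=float)
    P = np.asarray((lat - rpc["lat_off"]) / rpc["lat_scale"], dtype=float)
    H = np.asarray((h - rpc["h_off"]) / rpc["h_scale"], dtype=float)
    t = cubic_terms(L, P, H)
    # Eq. (8): rational functions give normalized image coordinates
    r_n = rpc["line_num"] @ t / (rpc["line_den"] @ t)
    c_n = rpc["samp_num"] @ t / (rpc["samp_den"] @ t)
    # Eq. (9): convert back to original line/sample coordinates
    r = r_n * rpc["line_scale"] + rpc["line_off"]
    c = c_n * rpc["samp_scale"] + rpc["samp_off"]
    return r, c
```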

To handle the nonlinearity of the RPC model in the refinement stage, we apply a first-order Taylor expansion around the current estimates at each iteration. This linearization yields approximate observation equations in a linear form, which can then be used to form the Jacobian (partial-derivative) matrix and the corresponding error equations for iterative least-squares updating:

\left\{ \begin{aligned} r &\approx \hat{r} + \frac{\partial r}{\partial \lambda} \Delta\lambda + \frac{\partial r}{\partial \varphi} \Delta\varphi + \frac{\partial r}{\partial h} \Delta h \\ c &\approx \hat{c} + \frac{\partial c}{\partial \lambda} \Delta\lambda + \frac{\partial c}{\partial \varphi} \Delta\varphi + \frac{\partial c}{\partial h} \Delta h \end{aligned} \right.

(10)

where $\hat{r}$ and $\hat{c}$ are the line and sample coordinates obtained by forward projection of the current geographic estimates $(\lambda^{(0)}, \varphi^{(0)}, h^{(0)})$, respectively.
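The partial derivatives in Eq. (10) can be obtained analytically from the polynomial coefficients. A simpler alternative, sketched below, is central differencing, assuming that a forward projection function `project(lon, lat, h) -> (r, c)` is available. The step sizes (degrees for longitude and latitude, meters for height) are illustrative assumptions, and the helper name is hypothetical.

```python
import numpy as np

def numerical_jacobian(project, x, steps=(1e-6, 1e-6, 1e-2)):
    """Central-difference 2x3 Jacobian of (r, c) with respect to (lon, lat, h) at x."""
    x = np.asarray(x, dtype=float)
    J = np.zeros((2, 3))
    for k, dk in enumerate(steps):
        e = np.zeros(3)
        e[k] = dk                                   # perturb one coordinate at a time
        f_plus = np.asarray(project(*(x + e)), dtype=float)
        f_minus = np.asarray(project(*(x - e)), dtype=float)
        J[:, k] = (f_plus - f_minus) / (2.0 * dk)
    return J
```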

Given the image coordinates

(r_{1}, c_{1})

,

(r_{2}, c_{2})

of corresponding feature points in the forward- and backward-looking images, the linearized residual equations can be written as:

\begin{bmatrix} v_{r1} \\ v_{c1} \\ v_{r2} \\ v_{c2} \end{bmatrix} = \begin{bmatrix} \frac{\partial r_{1}}{\partial λ} & \frac{\partial r_{1}}{\partial φ} & \frac{\partial r_{1}}{\partial h} \\ \frac{\partial c_{1}}{\partial λ} & \frac{\partial c_{1}}{\partial φ} & \frac{\partial c_{1}}{\partial h} \\ \frac{\partial r_{2}}{\partial λ} & \frac{\partial r_{2}}{\partial φ} & \frac{\partial r_{2}}{\partial h} \\ \frac{\partial c_{2}}{\partial λ} & \frac{\partial c_{2}}{\partial φ} & \frac{\partial c_{2}}{\partial h} \end{bmatrix} \begin{bmatrix} Δλ \\ Δφ \\ Δh \end{bmatrix} - \begin{bmatrix} r_{1} - \hat{r}_{1} \\ c_{1} - \hat{c}_{1} \\ r_{2} - \hat{r}_{2} \\ c_{2} - \hat{c}_{2} \end{bmatrix}

(11)

where $v = [v_{r1}\; v_{c1}\; v_{r2}\; v_{c2}]^{T}$ denotes the residual vector. The corrections $(Δλ, Δφ, Δh)$ to the geographic coordinates are then estimated via iterative least-squares adjustment (Shi et al., 2023) [33].
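In compact form, Equation (11) reads $v = AΔ - l$, and minimizing $v^{T}v$ gives $Δ = (A^{T}A)^{-1}A^{T}l$. The sketch below performs one such correction step for a point observed in two views. It approximates the partial derivatives by central differences, which is an assumption made for illustration (analytic RPC derivatives may be used instead), and each entry of `projectors` stands for any callable mapping $(λ, φ, h)$ to $(r, c)$, such as an RPC forward projection; the step sizes in `eps` are hypothetical.

```python
import numpy as np


def numeric_jacobian(project, x, eps=(1e-6, 1e-6, 1e-3)):
    """Central-difference 2x3 Jacobian of project(lon, lat, h) -> (r, c)."""
    x = np.asarray(x, dtype=float)
    J = np.zeros((2, 3))
    for k in range(3):
        dx = np.zeros(3)
        dx[k] = eps[k]
        J[:, k] = (np.asarray(project(x + dx)) - np.asarray(project(x - dx))) / (2.0 * eps[k])
    return J


def ls_correction(projectors, observations, x):
    """One least-squares step of Eq. (11) for a point seen in several views.

    projectors:   callables mapping x = (lon, lat, h) to (r, c), one per view.
    observations: measured (r, c) for each view, e.g. (r1, c1) and (r2, c2).
    Returns the correction (d_lon, d_lat, d_h).
    """
    x = np.asarray(x, dtype=float)
    A = np.vstack([numeric_jacobian(p, x) for p in projectors])       # 4 x 3 for two views
    l = np.concatenate([np.asarray(obs, dtype=float) - np.asarray(p(x), dtype=float)
                        for p, obs in zip(projectors, observations)])  # observed - projected
    d, *_ = np.linalg.lstsq(A, l, rcond=None)  # minimizes ||A d - l||^2
    return d
```

Under weak stereo geometry the height column of $A$ is small relative to the planar columns, so the height component of the solution is comparatively sensitive to residual noise in $l$; this is the situation the stepwise strategy below is meant to address.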

To obtain initial values for iterative RPC-based optimization, prior studies have commonly adopted either (i) a fixed initial altitude (e.g., 0 m, 100 m, or the mean height implied by the RPC offset parameters) or (ii) a simplified first-order RPC approximation obtained by omitting higher-order terms. Under weak stereo geometry, a fixed initial altitude may poorly represent the actual terrain variation in some scenes, which can impair the convergence of the subsequent iterative refinement. A first-order RPC approximation, on the other hand, is typically more involved to construct and solve and may incur additional computational cost, especially when applied repeatedly across many points or tiles.

Accordingly, in this study, the initial altitude for RPC refinement is obtained by inverse distance weighting (IDW) interpolation of the coarse 3D estimates from the affine stage:

h^{(0)} = \frac{\sum_{i = 1}^{k} w_{i} h_{i}}{\sum_{i = 1}^{k} w_{i}}, w_{i} = \frac{1}{d_{i}^{2}}

(12)

where $h^{(0)}$ denotes the initialized altitude at the query (interpolation) location, $k$ is the number of neighboring points used for interpolation, $h_{i}$ is the altitude of the $i$-th neighbor derived from the affine-based coarse estimates, and the weight $w_{i}$ is determined by the horizontal distance $d_{i}$ between the query location and the $i$-th neighbor.
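A minimal sketch of the IDW initialization in Equation (12) follows, using SciPy's `cKDTree` to find the $k$ nearest affine-stage points ($k = 10$ in the experiments of Section 3.1). It assumes planar coordinates (e.g., projected meters), so that Euclidean distance approximates the horizontal distance $d_{i}$; the small `eps` clamp against a zero distance is an added safeguard, not part of Equation (12).

```python
import numpy as np
from scipy.spatial import cKDTree


def idw_initial_height(xy_known, h_known, xy_query, k=10, eps=1e-12):
    """Eq. (12): h0 = sum(w_i * h_i) / sum(w_i), with w_i = 1 / d_i**2.

    xy_known: (n, 2) planar coordinates of the affine-stage points.
    h_known:  (n,) their coarse heights.
    xy_query: (m, 2) locations that need an initial height.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    h_known = np.asarray(h_known, dtype=float)
    xy_query = np.atleast_2d(np.asarray(xy_query, dtype=float))
    k = min(k, len(h_known))
    d, idx = cKDTree(xy_known).query(xy_query, k=k)
    # Force shape (m, k), since query() drops the neighbor axis when k == 1.
    d = np.asarray(d).reshape(len(xy_query), k)
    idx = np.asarray(idx).reshape(len(xy_query), k)
    # Clamp distances so a query coinciding with a known point does not
    # divide by zero (that point then dominates the weighted mean).
    w = 1.0 / np.maximum(d, eps) ** 2
    return (w * h_known[idx]).sum(axis=1) / w.sum(axis=1)
```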

This formulation assigns larger weights to closer neighbors, yielding a spatially smooth initialization that reflects local elevation variations. Using the geometrically consistent yet sparse 3D points obtained from the affine-SVD stage, the IDW-based initialization provides a practical starting point for the subsequent nonlinear RPC refinement with limited additional computational overhead. In this way, the refinement process is less dependent on an arbitrary altitude guess and can exhibit more reliable convergence behavior in weak-geometry settings.

Under small intersection angles, the RPC projection is typically less sensitive to altitude than to planar coordinates. As a result, jointly updating longitude, latitude, and altitude in a single optimization may lead to stronger parameter coupling, which can make the refinement numerically less stable and may adversely affect height estimation in weak-geometry cases. To alleviate this issue, we adopt a stepwise optimization strategy. In the first stage, longitude and latitude are updated by minimizing the reprojection error while keeping the altitude fixed at its initialized value. This stage aims to obtain a more stable planar estimate that is primarily constrained by the image measurements. In the second stage, the updated longitude and latitude are held fixed, and the altitude is refined by minimizing the reprojection error with respect to height only. This decoupled procedure reduces cross-parameter interference during refinement and tends to improve the convergence behavior of the RPC optimization under weak stereo geometry.

The overall iterative procedure is summarized in Algorithm 1.

Algorithm 1. Iterative Geocoordinate Optimization.
Input: initial geographic coordinates $(λ^{(0)}, φ^{(0)}, h^{(0)})$; thresholds $(ε_{λ}, ε_{φ}, ε_{h})$
Output: optimized coordinates $(λ_{F}, φ_{F}, h_{F})$
1: repeat
2:	$(Δ λ^{(i)}, Δ φ^{(i)})$ ← solve the error equation of Equation (11) for the planar corrections, with $h$ fixed at $h^{(0)}$
3:	$(λ^{(i + 1)}, φ^{(i + 1)})$ ← add the corrections $(Δ λ^{(i)}, Δ φ^{(i)})$ to the current values $(λ^{(i)}, φ^{(i)})$
4: until $|Δ λ^{(i)}| < ε_{λ}$ and $|Δ φ^{(i)}| < ε_{φ}$; set $(λ_{F}, φ_{F}) = (λ^{(i + 1)}, φ^{(i + 1)})$
5: repeat
6:	$Δ h^{(i)}$ ← solve the error equation of Equation (11) for the height correction, with $(λ, φ)$ fixed at $(λ_{F}, φ_{F})$
7:	$h^{(i + 1)}$ ← add the correction $Δ h^{(i)}$ to the current value $h^{(i)}$
8: until $|Δ h^{(i)}| < ε_{h}$; set $h_{F} = h^{(i + 1)}$
9: return $(λ_{F}, φ_{F}, h_{F})$
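The two-stage loop of Algorithm 1 can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: each stage solves the stacked error equations of Equation (11) restricted to the active unknowns (longitude and latitude first, then height alone), the Jacobian columns are approximated by central differences, and the tolerances, step sizes, and iteration cap are hypothetical values.

```python
import numpy as np


def _stacked_system(projectors, observations, x, cols, eps):
    """Residual vector l and the Jacobian columns `cols` of Eq. (11) at x."""
    l = np.concatenate([np.asarray(obs, dtype=float) - np.asarray(p(x), dtype=float)
                        for p, obs in zip(projectors, observations)])
    A = np.zeros((l.size, len(cols)))
    for j, k in enumerate(cols):
        dx = np.zeros(3)
        dx[k] = eps[k]
        plus = np.concatenate([np.asarray(p(x + dx), dtype=float) for p in projectors])
        minus = np.concatenate([np.asarray(p(x - dx), dtype=float) for p in projectors])
        A[:, j] = (plus - minus) / (2.0 * eps[k])  # central difference
    return A, l


def hierarchical_refine(projectors, observations, x0,
                        tol=(1e-9, 1e-9, 1e-3), eps=(1e-6, 1e-6, 1e-2),
                        max_iter=50):
    """Algorithm 1 sketch: refine (lon, lat) with h fixed, then h alone."""
    x = np.asarray(x0, dtype=float).copy()
    tol = np.asarray(tol, dtype=float)
    for cols in ((0, 1), (2,)):  # stage 1: lon/lat; stage 2: height
        active = list(cols)
        for _ in range(max_iter):
            A, l = _stacked_system(projectors, observations, x, cols, eps)
            d, *_ = np.linalg.lstsq(A, l, rcond=None)
            x[active] += d
            if np.all(np.abs(d) < tol[active]):
                break
    return x
```

Holding the weakly observed height fixed during the first stage is intended to keep the planar solve from being disturbed by it; the height is then estimated on its own with the planar coordinates frozen, mirroring steps 5 to 8 of Algorithm 1.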

A denser set of correspondences is obtained by warping one view using a global affine transformation estimated from sparse matches via RANSAC. The resulting dense matches are then fed into the stepwise RPC optimization described above to obtain dense 3D points, providing a practical balance between computational cost and refinement robustness.
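As an illustration of the global affine estimation step, the NumPy sketch below fits a 2 × 3 affine transform to sparse matches with a basic RANSAC loop: minimal samples of three matches, inliers within a pixel threshold, and a final least-squares refit on all inliers. The threshold, iteration count, and seed are hypothetical defaults; OpenCV's `cv2.estimateAffine2D` is an established alternative.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2x3 affine M with dst ≈ src @ M[:, :2].T + M[:, 2]."""
    X = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)  # shape (3, 2)
    return sol.T                                    # shape (2, 3)


def apply_affine(M, pts):
    """Map (n, 2) points through the 2x3 affine matrix M."""
    return pts @ M[:, :2].T + M[:, 2]


def ransac_affine(src, dst, thresh=1.0, iters=500, seed=0):
    """Robustly estimate a global affine transform from matched points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=3, replace=False)  # minimal sample
        M = fit_affine(src[sample], dst[sample])
        err = np.linalg.norm(apply_affine(M, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("too few inliers for an affine fit")
    # Refit on the full consensus set.
    return fit_affine(src[best], dst[best]), best
```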

After RPC refinement, the reprojection errors of most correspondences are typically reduced. Nevertheless, a small number of points may still exhibit large residuals, for example due to occasional mismatches or locally unfavorable scene conditions. To suppress such outliers, we apply a simple altitude-range consistency check based on the RPC height normalization parameters:

h_{O} - α h_{S} \leq h_{F} \leq h_{O} + α h_{S}

(13)

where $h_{F}$ denotes the final ellipsoidal height of a reconstructed point, $h_{O}$ and $h_{S}$ are the RPC height offset and scale factors, respectively, and $α$ is a user-defined margin coefficient that slightly expands the admissible height interval to reduce the risk of over-filtering when the RPC normalization parameters are not fully representative of local elevation variation.
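RPC normalization parameters are usually chosen so that normalized heights fall roughly within [−1, 1], so Equation (13) is the RPC height range widened by the factor $α$. A short NumPy sketch of the check follows; the value of `alpha` in the example is hypothetical, since the value adopted in this study is not reported here.

```python
import numpy as np


def height_range_mask(h_final, h_off, h_scale, alpha):
    """Eq. (13): keep points with h_O - alpha*h_S <= h_F <= h_O + alpha*h_S."""
    h = np.asarray(h_final, dtype=float)
    lower = h_off - alpha * h_scale
    upper = h_off + alpha * h_scale
    return (h >= lower) & (h <= upper)


# Example with hypothetical RPC height parameters and margin.
keep = height_range_mask([-150.0, -100.0, 500.0, 1100.0, 1200.0],
                         h_off=500.0, h_scale=400.0, alpha=1.5)
```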

A terrain model is constructed at 30 m resolution and referenced to the WGS-84 ellipsoid. For consistency with ASTER GDEM V3, which reports orthometric heights relative to the EGM96 geoid, the reconstructed ellipsoidal heights are converted to orthometric heights using the corresponding geoid undulation:

H_{F} = h_{F} - ζ

(14)

where $H_{F}$ denotes the final orthometric height of a reconstructed point and $ζ$ is the geoid undulation (i.e., the geoid height relative to the reference ellipsoid) at that location.

The DEM is obtained by resampling the reconstructed 3D point cloud onto a regular grid. For each grid cell, the height is computed as a kernel-weighted average of neighboring points within a predefined search radius. A Gaussian kernel is adopted to assign distance-dependent weights [34]:

w(x) = \exp\left(-σ \left(\frac{x}{\mathrm{grid\_size}}\right)^{2}\right)

(15)

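where $x$ is the horizontal distance between a reconstructed point and the grid-cell center, $\mathrm{grid\_size}$ is the DEM grid spacing (30 m in this study), and $σ$ controls the decay rate of the weighting function (larger values give faster decay).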
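The gridding step can be sketched as a normalized kernel-weighted average over all points within the search radius of each cell center, using the weight in Equation (15). In the sketch below, `radius`, `sigma`, and the cell-center arrays are hypothetical inputs (their values are not reported in the text), planar coordinates in meters are assumed, and cells with no points inside the radius are left as NaN.

```python
import numpy as np
from scipy.spatial import cKDTree


def gaussian_grid(xy, h, x_centers, y_centers, grid_size, radius, sigma):
    """Grid point heights with weights exp(-sigma * (x / grid_size)**2)."""
    xy = np.asarray(xy, dtype=float)
    h = np.asarray(h, dtype=float)
    tree = cKDTree(xy)
    dem = np.full((len(y_centers), len(x_centers)), np.nan)
    for i, cy in enumerate(y_centers):
        for j, cx in enumerate(x_centers):
            idx = tree.query_ball_point([cx, cy], r=radius)
            if not idx:
                continue  # no reconstructed points within the search radius
            d = np.hypot(xy[idx, 0] - cx, xy[idx, 1] - cy)
            w = np.exp(-sigma * (d / grid_size) ** 2)  # Eq. (15)
            dem[i, j] = np.sum(w * h[idx]) / np.sum(w)
    return dem
```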

The gridded DEM is used only as a convenient raster representation of the reconstructed surface. All quantitative accuracy metrics reported in this study are computed directly from the reconstructed 3D points before gridding; hence, the interpolation step does not affect the reported error statistics.

2.4. Accuracy Assessment

In this study, ASTER GDEM V3 is used as the reference DEM for accuracy assessment. To facilitate visualization, the reconstructed point cloud is rasterized onto a regular grid, and an optional Gaussian smoothing is applied to the gridded surface to reduce jagged appearance and small gaps that may arise from spatially irregular point sampling. This visualization step is not used in any quantitative evaluation. All reported error metrics as well as the correlation analyses are computed directly from the reconstructed 3D points prior to any DEM gridding or optional smoothing.

Gaussian smoothing is adopted for visualization because it is straightforward to implement and has a limited number of hyperparameters. Alternative edge-preserving filters (e.g., bilateral or median filtering) may better maintain sharp terrain features, but they often introduce additional parameter choices and can yield visually inconsistent results depending on local texture and sampling density. We note that Gaussian smoothing can attenuate high-frequency terrain details in rugged areas; accordingly, it is applied only for display, and future work will explore edge-preserving gridding and filtering strategies to improve visual fidelity.

RMSE is used to quantify the accuracy of the reconstructed terrain. It is defined as

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} \left(H_{F}^{i} - H_{R}^{i}\right)^{2}}

(16)

where $H_{F}^{i}$ is the reconstructed orthometric height of the $i$-th point and $H_{R}^{i}$ is the corresponding reference elevation, derived from the GDEM at the planimetric location (longitude and latitude) of that point. When the location does not coincide with a DEM grid node, the reference elevation is obtained by bilinear interpolation. $N$ is the total number of reconstructed 3D points.
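Sampling the reference DEM at each reconstructed point's longitude and latitude amounts to bilinear interpolation on a regular grid. A minimal sketch with SciPy's `RegularGridInterpolator` (whose `"linear"` method is bilinear in 2D) follows; the axis arrays are assumed to hold grid-node coordinates, and descending axes (common for north-up rasters) are flipped to ascending order first as a precaution.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def sample_reference(dem, lats, lons, query_lat, query_lon):
    """Bilinearly interpolate dem[lat_index, lon_index] at query points."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    dem = np.asarray(dem, dtype=float)
    if lats[0] > lats[-1]:  # north-up rasters often store latitude descending
        lats, dem = lats[::-1], dem[::-1, :]
    if lons[0] > lons[-1]:
        lons, dem = lons[::-1], dem[:, ::-1]
    interp = RegularGridInterpolator((lats, lons), dem, method="linear")
    pts = np.column_stack([np.ravel(query_lat), np.ravel(query_lon)])
    return interp(pts)
```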

In addition to RMSE, the mean absolute error (MAE) is reported to characterize the typical magnitude of the elevation error irrespective of its sign:

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} \left|H_{F}^{i} - H_{R}^{i}\right|

(17)

The median of the signed errors $H_{F}^{i} - H_{R}^{i}$ is also reported as a robust measure of the central tendency of the error distribution, as it is less sensitive to outliers than the mean.

Finally, the mean error (ME) is computed to assess systematic bias in the elevation estimates:

\mathrm{ME} = \frac{1}{N} \sum_{i = 1}^{N} \left(H_{F}^{i} - H_{R}^{i}\right)

(18)

A positive ME indicates overall overestimation, whereas a negative ME indicates overall underestimation.

Additionally, two threshold-based indicators are reported to further summarize reconstruction accuracy: the proportions of reconstructed 3D points whose absolute elevation errors fall within ±60 m and ±90 m, respectively. At the 30 m ground sampling distance of the TIS imagery, these thresholds correspond approximately to 2 and 3 pixels. Specifically,

R_{60} = \frac{N_{60}}{N_{total}}, R_{90} = \frac{N_{90}}{N_{total}}

(19)

where $R_{60}$ and $R_{90}$ denote the two indicators, $N_{60}$ and $N_{90}$ are the numbers of points with absolute errors within ±60 m and ±90 m, respectively, and $N_{total}$ is the total number of reconstructed 3D points.
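For reference, the point-level metrics of Equations (16) to (19), together with the median error, can be computed from the reconstructed heights $H_{F}$ and the sampled reference heights $H_{R}$ with a few NumPy operations. This is a sketch; treating the ±60 m and ±90 m bounds as inclusive is an assumption.

```python
import numpy as np


def elevation_metrics(H_F, H_R):
    """Point-level accuracy metrics of Section 2.4 (errors are H_F - H_R)."""
    e = np.asarray(H_F, dtype=float) - np.asarray(H_R, dtype=float)
    abs_e = np.abs(e)
    return {
        "RMSE": float(np.sqrt(np.mean(e ** 2))),   # Eq. (16)
        "MAE": float(np.mean(abs_e)),              # Eq. (17)
        "median": float(np.median(e)),             # signed median error
        "ME": float(np.mean(e)),                   # Eq. (18)
        "R60": float(np.mean(abs_e <= 60.0)),      # Eq. (19), as a fraction
        "R90": float(np.mean(abs_e <= 90.0)),
    }
```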

To provide a comprehensive and statistically robust assessment, multiple complementary metrics are employed. RMSE is used as the primary indicator to summarize overall accuracy while reflecting the impact of large residuals. To reduce sensitivity to outliers, MAE and the median error are also reported to better represent the typical error magnitude. In addition, the mean error is computed to quantify systematic bias in the estimated elevations. Beyond scalar metrics, error histograms are analyzed to examine the distribution of residuals (e.g., shape, symmetry), which helps distinguish predominantly random errors from potential systematic effects. Collectively, these indicators provide a balanced characterization of reconstruction performance under weak stereo geometry.

3. Results

3.1. Experimental Data

This study uses thermal infrared imagery acquired by the TIS onboard SDGSAT-1 (Figure 3). TIS employs an oscillating scanning mirror that sweeps across the orbital track, resulting in two acquisitions from different viewing directions (forward and backward scans). These acquisitions partially overlap over a shared ground area. Despite the small intersection angle, the overlap ensures that the same surface is observed in both scans, providing paired measurements that can support subsequent 3D reconstruction analysis. The key sensor parameters are summarized in Table 1.

In addition to the TIS imagery, auxiliary inputs include GCPs and image-specific RPC models. Each RPC model was fitted using no fewer than 39 GCPs. The GCPs were sampled from a coarse-resolution DEM.

To examine performance across different terrain conditions, the test scenes were grouped into three terrain types—plain, hill, and mountain—based on elevation characteristics (e.g., representative elevation levels of <200 m, ~500 m, and >500 m, respectively). The method was further evaluated using image pairs spanning three representative intersection-angle levels: a larger angle (≈6.5°), an intermediate angle (≈3.45°), and a smaller angle (≈0.57°). These levels correspond to different imaging time separations between the forward and backward scans. The scene settings are listed in Table 2.

For affine modeling of large bidirectional images, each scene is partitioned into overlapping tiles. Unless otherwise stated, the experiments use a tile size of 50 × 50 pixels, and the influence of this setting is further examined in Section 4.3. During inlier selection, RANSAC is applied to the matched keypoints by estimating a fundamental matrix with an inlier threshold of 1.0 pixel. For the IDW interpolation used to obtain initial altitude estimates, the number of neighboring samples is set to 10.
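Partitioning a scene into overlapping 50 × 50-pixel tiles can be done as in the sketch below. The overlap width is not reported in the text, so `overlap` is a hypothetical parameter; the last tile along each axis is shifted inward so that the image border is covered without padding.

```python
def _starts(n, tile, stride):
    """Tile start offsets along one axis; the last tile is aligned to the edge."""
    if n <= tile:
        return [0]
    starts = list(range(0, n - tile + 1, stride))
    if starts[-1] != n - tile:
        starts.append(n - tile)
    return starts


def tile_windows(height, width, tile=50, overlap=10):
    """(row0, col0, row1, col1) windows of overlapping tiles covering an image."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    return [(r, c, min(r + tile, height), min(c + tile, width))
            for r in _starts(height, tile, stride)
            for c in _starts(width, tile, stride)]
```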

The proposed algorithm is implemented in Python 3.8, using NumPy 1.24.4 and SciPy 1.10.1 for core numerical computations. All experiments are conducted on a desktop workstation equipped with an AMD Ryzen 9 5900HX CPU (8 cores, 16 threads) and 16 GB RAM, without GPU acceleration. Parallel execution is enabled via Python’s multiprocessing module, using up to eight worker processes.

3.2. Qualitative Comparison of Different 3D Scenes

To qualitatively assess the agreement between the terrain models reconstructed by the proposed method and the reference topography, we visualize the reconstructed surfaces in 3D and compare them with the corresponding GDEM, as shown in Figure 4. One representative scene is selected for each terrain type: a mountainous area in Lushi County, Henan Province (Scene 6), a hilly area in Nanyang City, Henan Province (Scene 9), and a plain area in Zhumadian City, Henan Province (Scene 12). These scenes exhibit elevations of >800 m, ~500 m, and <100 m, respectively, which are characteristic of the three terrain categories. The corresponding intersection angles are approximately 0.57°, 3.45°, and 6.5°, spanning weak to relatively favorable stereo geometries.

To better visualize elevation variations, the reconstructed terrain surfaces are rendered from three azimuth directions with a consistent viewing angle (Figure 5). Overall, the generated terrain models exhibit similar large-scale morphology to the reference GDEM across the mountain, hill, and plain scenes. Noticeable local discrepancies remain, which can be partly explained by differences in sensor characteristics and imaging geometry. Specifically, the SDGSAT-1 TIS imagery has a coarser ground sampling distance (30 m) than the ASTER optical imagery (15 m), limiting the representation of fine-scale topographic details. In addition, the ASTER stereo configuration provides a much larger intersection angle (27.6°), which generally yields stronger stereo constraints than the small-angle TIS bidirectional observations.

To further assess these differences in a more objective manner, we next perform a quantitative evaluation of reconstruction errors between the generated terrain models and the reference DEM. The results are presented in the following section.

3.3. Quantitative Assessment of Reconstructed 3D Scenes

To evaluate the practical contribution of the affine initialization under large-scale relief and strongly varying terrain, we further quantify the elevation errors obtained at the affine-initialization stage. The results show that, while the affine model alone does not achieve the accuracy required for final terrain reconstruction, it typically yields geometrically consistent and stable initial estimates that facilitate the subsequent RPC-based refinement. As summarized in Table 3, RPC refinement substantially reduces the RMSE across all terrain types, with relative reductions of 32.85–41.10%.

Figure 6 summarizes the altitude error characteristics of the mountainous Scene 5 using three complementary views: a histogram of signed errors, a scatter plot of reconstructed altitude versus reference DEM altitude, and a spatial map of absolute errors. Together, these plots provide a descriptive assessment of central tendency, dispersion, elevation dependence, and spatial variability.

Central tendency is reported using both the mean and the median. The mean error is −5.26 m and the median error is −1.34 m, indicating a small negative offset in the distribution center. Figure 6a shows that errors are concentrated around zero, while a non-negligible spread and visible tails are also present. The standard deviation is 50.09 m and the RMSE is 50.37 m, which are close in magnitude and therefore jointly reflect the overall dispersion of errors in this scene.

The scatter plot in Figure 6b compares reconstructed altitudes with the reference DEM across the full elevation range. Most samples align closely with the 1:1 line. In addition, a small number of points depart markedly from the main trend, consistent with the tail behavior observed in the histogram. These points are reported here as residual outliers without further attribution.

Figure 6c visualizes the spatial distribution of absolute errors. The error magnitudes are not spatially uniform: some subregions exhibit larger absolute residuals than others. This spatial pattern is presented as a qualitative observation that complements the global statistics in Figure 6a and the elevation-wise comparison in Figure 6b.

Overall, the RMSE is 50.37 m. The proportions of errors within ±60 m and ±90 m are 81.17% and 91.97%, respectively. These percentages provide an empirical summary of practical error bounds for this scene.

A similar quantitative assessment was performed for the hilly region in Scene 9 (Figure 7). The mean and median errors are −4.41 m and −1.68 m, respectively, indicating a small negative offset in the central tendency. The difference between the two statistics suggests mild asymmetry in the error distribution, which is consistent with the histogram in Figure 7a.

Figure 7a shows a relatively concentrated error distribution compared with the mountainous case. The peak is narrower and the spread is smaller, and the standard deviation is 28.36 m. These statistics indicate reduced dispersion of residuals in this scene. The RMSE is 28.70 m, which is close to the standard deviation, suggesting that the overall error magnitude is primarily characterized by scatter rather than by a large global bias.

In Figure 7b, most samples cluster around the 1:1 line over the elevation range, indicating that the reconstructed altitudes generally follow the reference trend. The distribution of points around the diagonal appears slightly asymmetric in some elevation intervals, as reflected by an unequal density of points on the positive and negative error sides. This observation is reported descriptively here and motivates further examination of elevation-dependent behavior and potential outliers.

The spatial map in Figure 7c shows that absolute errors are not spatially uniform. Larger residuals occur more frequently in areas with stronger local relief, whereas relatively flatter subregions tend to exhibit smaller errors. This spatial variability is consistent with the concentration of larger errors in the tails of Figure 7a.

Overall, the scene achieves an RMSE of 28.70 m. The proportions of errors within ±60 m and ±90 m are 95.56% and 98.82%, respectively, providing an empirical summary of practical error bounds for this hilly area.

Finally, we analyze the error characteristics of the plain-area Scene 12 (Figure 8). The mean and median errors are −0.81 m and −0.19 m, respectively, indicating that the distribution center is very close to zero with a slight negative offset. Figure 8a shows a sharply peaked histogram with a narrow spread, suggesting that most residuals are small in magnitude.

The standard deviation is 5.35 m and the RMSE is 5.41 m, both substantially lower than those observed in the mountainous and hilly scenes. In Figure 8b, the point cloud is tightly clustered around the 1:1 line across the (limited) elevation range, indicating close agreement between reconstructed altitudes and the reference DEM. Figure 8c further shows that the absolute errors exhibit only modest spatial variability within this scene.
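The statistics reported for the three scenes are consistent with the identity $\mathrm{RMSE}^{2} = s^{2} + \mathrm{ME}^{2}$, which holds exactly when $s$ is the population standard deviation of the signed errors (and approximately for the sample standard deviation at large $N$). Because $|\mathrm{ME}|$ is small relative to $s$ in each scene, RMSE is dominated by dispersion rather than bias. The short check below recombines the values reported above and reproduces each RMSE to within rounding.

```python
import math

# (standard deviation, mean error, reported RMSE) in meters, from the text above.
scenes = {
    "mountain (Scene 5)": (50.09, -5.26, 50.37),
    "hill (Scene 9)": (28.36, -4.41, 28.70),
    "plain (Scene 12)": (5.35, -0.81, 5.41),
}

for name, (std, me, rmse) in scenes.items():
    recombined = math.sqrt(std ** 2 + me ** 2)  # RMSE^2 = std^2 + ME^2
    print(f"{name}: sqrt(std^2 + ME^2) = {recombined:.2f} m, reported RMSE = {rmse:.2f} m")
```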

For this plain area, the RMSE is 5.41 m. All errors fall within ±60 m and ±90 m, providing an empirical summary of the error bounds for this scene.

Comparing the three terrain types, flatter areas are associated with smaller absolute errors in our experiments. At the same time, because the plain scene spans a narrower elevation range, the same absolute error may correspond to a larger relative error with respect to local relief, which should be considered when interpreting terrain-dependent performance.

4. Discussion

4.1. Applicability of Existing RPC-Based Methods

Existing RPC-based stereo reconstruction pipelines (e.g., s2p) have been widely used for satellite imagery and can perform well when stereo geometry is sufficiently strong and image quality meets the typical assumptions of dense stereo processing. However, for bidirectional whisk-broom thermal infrared images with ultra-small intersection angles, stable 3D reconstruction becomes substantially more challenging due to severely ill-conditioned geometry, further aggravated by low-texture radiometric patterns.

In this study, the SDGSAT-1 TIS data exhibit extremely small intersection angles, leading to very weak stereo geometry. We attempted to apply s2p to the same forward/backward TIS image pair using its standard workflow (with the provided RPC models). In our experiments, the pipeline did not yield usable terrain outputs on this dataset, which precludes a fair quantitative comparison of accuracy or runtime under the same conditions. This outcome is primarily related to the violation of key operating assumptions in such pipelines under ultra-weak geometry, rather than to implementation details. Similar outcomes were observed under several reasonable parameter settings.

Importantly, this observation does not imply a limitation or deficiency of s2p or similar RPC-based stereo methods. Instead, it reflects differences in problem settings and intended application domains. In many established pipelines, local affine models are mainly employed to facilitate local rectification and dense matching, while 3D recovery relies on sufficiently strong geometric constraints and reliable correspondences.

In contrast, the proposed method follows a model-centric reconstruction strategy: affine modeling is used to provide geometrically consistent initial values, and 3D point coordinates are directly refined through RPC-based formulations using image observations. By tightly coupling geometric initialization with hierarchical RPC optimization, the proposed approach is specifically designed to improve numerical stability under extremely weak stereo geometry. Therefore, it should be regarded as a complementary solution targeting extreme imaging scenarios characterized by ultra-small intersection angles, rather than as a replacement for established stereo reconstruction pipelines.

4.2. Overall Accuracy Performance and Interpretation

It should be noted that all quantitative accuracy metrics reported in this study are computed at the point level using the reconstructed 3D point cloud, with the reference DEM serving only as an external elevation benchmark. DEM gridding and surface visualization are included solely to aid interpretation and qualitative inspection and therefore do not influence the reported RMSE, MAE, median error, or correlation analysis. In particular, the optional DEM grid is obtained from the point cloud via Gaussian-distance–weighted neighborhood interpolation to assign grid values; no additional smoothing or filtering is applied beyond this gridding step.

This point-level evaluation is well suited to assessing reconstruction behavior under extremely small intersection angles, which is the primary focus of this work. By contrast, grid-based DEM accuracy, while important for downstream applications, depends on the interpolation strategy and the chosen grid resolution. A systematic investigation of these factors is beyond the scope of the present study and is left for future work.

In addition to RMSE and ME, we report complementary metrics—MAE and median error—in Appendix A, Table A1. Across the tested scenes, these metrics show patterns that are generally consistent with those indicated by RMSE, thereby providing additional support for the overall performance assessment and reducing reliance on any single statistic. To avoid redundancy in the main text, RMSE is used as the primary accuracy indicator, whereas MAE and median error are included in Appendix A for completeness.

The accuracy metrics for different terrain types are summarized in Table 4.

Table 4 reports RMSE, ME, and the threshold-based ratios $R_{60}$ and $R_{90}$ for different terrain types and intersection angles. Overall, mountainous scenes exhibit relatively larger RMSE values (mean RMSE: 43.84 m) and lower $R_{60}$ and $R_{90}$ (85.78% and 94.96%, respectively), whereas hill scenes show moderate errors (mean RMSE: 27.63 m) with higher $R_{60}$ and $R_{90}$ (95.21% and 98.86%). Plain scenes achieve the smallest errors (mean RMSE: 7.72 m) and reach 100% for both $R_{60}$ and $R_{90}$. Within each terrain category, scenes associated with larger intersection angles often present smaller RMSE values and higher $R_{60}$ and $R_{90}$, although the trend is not strictly monotonic across all cases.

ME summarizes the average signed deviation and provides an indication of overall bias direction. In Table 4, ME is generally smaller in magnitude than RMSE, suggesting that the error budget is dominated by dispersion rather than a single global bias term. Nevertheless, ME alone is not sufficient to characterize accuracy because it can be affected by outliers and may be reduced by cancellation between positive and negative errors. Therefore, ME is interpreted together with RMSE and the threshold-based ratios $R_{60}$ and $R_{90}$, which jointly describe both the typical error level and the proportion of points within specified elevation-error tolerances.

RMSE is a widely used metric for summarizing overall reconstruction error because it assigns a higher penalty to large deviations through the squared-error term. It is reported in meters, which facilitates physical interpretation. In Table 4, RMSE varies across scenes and terrain types: plain areas (Scenes 12–14) show smaller RMSE values, whereas mountainous scenes generally exhibit larger RMSE values. Nevertheless, RMSE can be sensitive to a small number of large absolute errors; therefore, it may not fully represent the typical accuracy achieved by most points.

To complement RMSE, we additionally report two threshold-based ratios, $R_{60}$ and $R_{90}$, defined as the percentages of reconstructed points whose absolute elevation errors are within ±60 m and ±90 m, respectively. These indicators provide an intuitive view of the fraction of points meeting a specified error tolerance. As shown in Table 4, $R_{60}$ and $R_{90}$ are high in most scenes, while their values decrease in more challenging cases, such as mountainous terrain under weaker stereo configurations.

Overall, the joint interpretation of RMSE, ME, and the threshold-based ratios suggests that reconstruction performance depends on scene conditions and terrain complexity. Plain scenes consistently achieve the smallest RMSE and reach 100% for both $R_{60}$ and $R_{90}$ in this dataset, while hilly scenes present intermediate performance, and mountainous scenes show larger dispersion. Meanwhile, $R_{90}$ remains above 91% for all scenes, indicating that the majority of reconstructed points fall within the ±90 m tolerance across the evaluated cases. These results highlight the importance of using multiple complementary metrics to characterize both typical error levels and within-tolerance performance.

Despite the overall performance, several limitations are observed. ME is not consistently close to zero across scenes, suggesting that a small residual bias may remain under certain conditions. In addition, accuracy degrades in mountainous scenes, where RMSE increases and $R_{60}$ decreases compared with hill and plain scenes. The results also vary noticeably with intersection angle: for the same terrain type, RMSE and $R_{60}$ can differ across stereo configurations, and this variation is more evident in mountain areas. These observations indicate that the proposed pipeline is more sensitive under weak stereo geometry (i.e., small intersection angles) and challenging imaging conditions, and further robustness improvements are still desirable.

The above performance differences are plausibly related to scene complexity and radiometric characteristics in thermal infrared imagery. Mountainous regions typically involve strong relief, steep slopes, and frequent self-occlusions, which may reduce the availability and reliability of correspondences. Shadows and spatially varying thermal emission can further decrease local contrast and introduce ambiguous patterns, potentially increasing matching uncertainty. Moreover, when the intersection angle is small, viewpoint differences between the stereo images may become less favorable for stable depth recovery, particularly over rugged terrain. In contrast, plains generally exhibit smoother topography and fewer occlusions, and their thermal appearance is often more spatially uniform, which can be more conducive to consistent matching. We note that these factors are presented as plausible explanations consistent with the observed trends, rather than as a definitive attribution.

A small intersection angle generally corresponds to weak stereo geometry (i.e., a small effective base-to-height ratio $B/H$). Under such configurations, depth/height estimation becomes more sensitive to disparity errors, and the sensitivity can be further exacerbated over steep terrain. In mountainous areas, large slopes and relief increase geometric nonlinearity and occlusions, so a given image-domain mismatch may translate into a larger elevation deviation than in relatively flat regions. In contrast, plains have near-zero slopes and fewer geometric discontinuities, which typically reduces error amplification and leads to more stable height estimates. We note that this discussion provides an intuitive geometric interpretation consistent with the observed performance differences, rather than a strict analytical derivation.

As summarized in Table 5, within the same terrain type, scenes associated with larger intersection angles often exhibit slightly smaller RMSE values, and $R_{90}$ also shows a modest increase in several cases. At the same time, the magnitude of these angle-related differences is generally smaller than the performance gap observed across terrain types. In this dataset, the variations between mountainous, hilly, and plain scenes are more pronounced than those among the three tested intersection angles.

From a geometric perspective, smaller intersection angles are commonly associated with weaker stereo constraints (i.e., a smaller effective $B/H$), under which height estimates can become more sensitive to matching noise and local surface orientation. This provides a plausible interpretation for the higher RMSE and lower within-threshold ratios observed in some small-angle cases, particularly over rugged terrain. Nevertheless, given that all evaluated configurations fall within a relatively small-angle regime, the improvement brought by increasing the intersection angle appears limited compared with the effect of terrain relief and slope. Therefore, terrain complexity remains a primary factor correlated with the error level in Table 5, while intersection-angle changes contribute a secondary but observable influence.

The terrain-related accuracy difference is substantial. For instance, RMSE remains below 9 m in plain scenes, whereas it exceeds 30 m in mountainous scenes in this dataset. By comparison, the variation associated with small changes in intersection angle is more limited.

Overall, the reconstruction performance varies with both terrain complexity and stereo configuration. Plain scenes consistently yield the lowest errors, which suggests that the proposed pipeline can achieve reliable results under relatively favorable surface conditions (e.g., gentle relief and more consistent image texture). In hilly and mountainous scenes, the error level increases, but the threshold-based ratios $R_{60}$ and $R_{90}$ (the percentages of points whose altitude errors fall within ±60 m and ±90 m, respectively) indicate that a large fraction of points still fall within these tolerances. These results imply that the method remains reasonably robust across different terrains, while its accuracy is reduced in more challenging topography.

4.3. Influence of Terrain and Key Parameters

To further examine how terrain characteristics relate to reconstruction performance under a small intersection angle, we conducted additional analyses on the most challenging mountain scene (intersection angle = 0.57°). Using the reference DEM, we computed two terrain descriptors on non-overlapping 50 × 50-pixel tiles: mean slope (average of per-pixel slope values derived with a standard DEM-based slope operator, in degrees) and local altitude range (maximum minus minimum orthometric elevation within each tile). We then analyzed their relationships with reconstruction error, as summarized in Figure 9, which shows the scatter plots of RMSE against (a) mean slope and (b) local altitude range. The results show that RMSE is moderately and positively correlated with both mean slope and altitude range (Pearson’s r = 0.597 and 0.609, respectively; p < 0.01), suggesting that larger errors tend to co-occur with stronger terrain relief in this scene. A plausible interpretation is that, under ultra-weak stereo geometry, greater relief may coincide with occlusion/shadow effects and less informative parallax cues, which can reduce the numerical stability of RPC-based height estimation and increase height uncertainty. This analysis is correlational and is intended to characterize error–terrain relationships rather than establish causality.
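The tile-level descriptors described above can be computed with a few lines of NumPy. The sketch assumes the reference DEM is a 2D array on a projected grid with a uniform 30 m spacing (matching the TIS resolution) and uses central-difference gradients from `np.gradient` as a stand-in for the unspecified "standard DEM-based slope operator" (Horn's method, for instance, would give slightly different values). The function names are hypothetical, and the per-tile RMSE is assumed to be computed separately by grouping reconstruction errors with the same tile indices.

```python
import numpy as np

def tile_terrain_descriptors(dem, tile=50, spacing_m=30.0):
    """Mean slope (degrees) and altitude range (m) on non-overlapping square tiles.

    dem: 2D array of elevations on a uniform projected grid (assumed spacing_m).
    Partial tiles at the right/bottom edges are dropped.
    """
    dem = np.asarray(dem, dtype=float)
    # Central differences in the interior, one-sided at the borders (metres per metre).
    dz_dy, dz_dx = np.gradient(dem, spacing_m)
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

    n_rows, n_cols = dem.shape[0] // tile, dem.shape[1] // tile
    mean_slope = np.empty((n_rows, n_cols))
    alt_range = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            win = np.s_[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
            mean_slope[i, j] = slope_deg[win].mean()
            # Local altitude range: maximum minus minimum elevation in the tile.
            alt_range[i, j] = dem[win].max() - dem[win].min()
    return mean_slope, alt_range

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally sized arrays (flattened)."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])
```

Given a per-tile RMSE array `tile_rmse` of the same shape, `pearson_r(mean_slope, tile_rmse)` returns the correlation coefficient; `scipy.stats.pearsonr` additionally reports a p-value.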

The larger reconstruction errors observed in mountainous regions may be attributed to a combination of factors, including terrain-related occlusion and shadow effects, increased geometric sensitivity under small intersection angles, and weak or repetitive texture in thermal infrared imagery. These conditions can reduce the availability and spatial coverage of reliable correspondences and, under weak stereo geometry, may further increase height uncertainty in the subsequent reconstruction.

Beyond terrain effects, the performance of the local affine initialization also depends on the tile-partition strategy. As summarized in Table 6, very small tiles (25 × 25 pixels) did not yield enough valid correspondences for stable affine estimation in our experiments; consequently, the subsequent RPC refinement could not be executed for this setting. Among the feasible configurations, a tile size of 50 × 50 pixels provides the best overall accuracy and, although its affine stage is the slowest, also the shortest total runtime, whereas larger tiles reduce local geometric adaptability and are associated with larger errors.
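The feasibility problem with very small tiles can be illustrated by counting matched points per tile. This is a minimal sketch under stated assumptions: tiles are non-overlapping here for simplicity (the pipeline itself uses a fixed overlap ratio), `min_points` is a hypothetical threshold because the required number of correspondences is not stated in this section, and the helper names are illustrative.

```python
import numpy as np

def correspondences_per_tile(rc, shape, tile):
    """Count matched points per non-overlapping tile.

    rc: (N, 2) array of (row, col) image coordinates inside [0, shape).
    Returns a 2D array with one count per tile (edge tiles may be partial).
    """
    rc = np.asarray(rc)
    n_r = -(-shape[0] // tile)          # ceiling division
    n_c = -(-shape[1] // tile)
    ti = (rc[:, 0] // tile).astype(int)
    tj = (rc[:, 1] // tile).astype(int)
    flat = np.bincount(ti * n_c + tj, minlength=n_r * n_c)
    return flat.reshape(n_r, n_c)

def feasible_tiles(counts, min_points):
    """Boolean mask of tiles with enough matches for a local affine fit.

    min_points is a hypothetical threshold, not a value from the paper.
    """
    return counts >= min_points

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    shape = (400, 400)
    pts = rng.uniform(0, 400, size=(800, 2))   # synthetic, uniformly spread matches
    for tile in (25, 50, 100, 200):
        counts = correspondences_per_tile(pts, shape, tile)
        ok = feasible_tiles(counts, min_points=6)
        print(f"tile {tile:3d}: mean matches/tile = {counts.mean():6.1f}, "
              f"feasible = {ok.mean():.0%}")
```

For a uniform match density, halving the tile edge length reduces the expected number of matches per tile by a factor of four, so the smallest tiles are the first to fall below any fixed threshold.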

The runtime reported in Table 6 corresponds to processing one complete evaluation scene (i.e., the same spatial extent used for quantitative accuracy assessment) under the stated CPU-only setting with parallel execution enabled. With a fixed overlap ratio, the total number of tiles increases with scene area, and most steps in the pipeline are tile-parallel. As a result, the end-to-end runtime tends to scale approximately with the tile count and can be further reduced by allocating more CPU resources or distributing tiles across additional workers.
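The scaling argument can be checked with a simple tile-grid generator. The sketch assumes square tiles placed with a fixed overlap ratio plus one extra tile snapped to the far edge; these placement rules and the function names are assumptions, not the paper's exact tiling scheme. Because the tiles are independent, the resulting list could be dispatched to a worker pool (for example, Python's `concurrent.futures.ProcessPoolExecutor`), in which case wall-clock time would scale roughly with the tile count divided by the number of workers.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles along one axis with a fixed overlap ratio in [0, 1)."""
    step = max(1, int(round(tile * (1.0 - overlap))))
    starts = list(range(0, max(length - tile, 0) + 1, step))
    if starts[-1] + tile < length:      # add one tile snapped to the far edge
        starts.append(length - tile)
    return starts

def tile_grid(height, width, tile, overlap):
    """All (row0, col0) tile origins; each tile can be processed independently."""
    return [(r, c)
            for r in tile_origins(height, tile, overlap)
            for c in tile_origins(width, tile, overlap)]

if __name__ == "__main__":
    for side in (500, 1000, 2000):
        n = len(tile_grid(side, side, tile=50, overlap=0.5))
        print(f"{side} x {side} px scene -> {n} tiles")
```

Doubling the scene side length roughly quadruples the tile count, apart from small edge effects.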

We further examined the sensitivity to the IDW neighborhood size $k$, which is used only to generate a coarse initial altitude estimate for the subsequent RPC refinement. As shown in Table 7, varying $k$ within a small range around 10 (i.e., $k$ = 8, 10, 12) results in very similar accuracy metrics and comparable runtimes. This observation suggests that the overall performance is relatively insensitive to the choice of $k$ within this practical range for our dataset and settings.
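For reference, a k-nearest-neighbor inverse distance weighting (IDW) interpolator can be written as follows. The power parameter (2 here), the use of Euclidean distance in planar coordinates, the brute-force neighbor search, and the choice of which points supply the known altitudes are assumptions for illustration; the section only states that IDW with neighborhood size $k$ provides a coarse initial altitude.

```python
import numpy as np

def idw_knn(known_xy, known_h, query_xy, k=10, power=2.0, eps=1e-12):
    """k-nearest-neighbor inverse distance weighting.

    known_xy: (N, 2) planar coordinates with known altitudes known_h (N,).
    query_xy: (Q, 2) locations to estimate. Returns (Q,) altitudes.
    power=2 is a common default and an assumption here.
    """
    known_xy = np.asarray(known_xy, dtype=float)
    known_h = np.asarray(known_h, dtype=float)
    query_xy = np.atleast_2d(np.asarray(query_xy, dtype=float))
    k = min(k, len(known_h))

    # Brute-force distance matrix (Q, N); a KD-tree would scale better.
    d = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=2)
    idx = np.argpartition(d, k - 1, axis=1)[:, :k]      # k nearest per query
    dk = np.take_along_axis(d, idx, axis=1)
    hk = known_h[idx]

    w = 1.0 / np.maximum(dk, eps) ** power
    est = (w * hk).sum(axis=1) / w.sum(axis=1)

    # A query that coincides with a known point takes that point's altitude.
    rows = np.flatnonzero(dk.min(axis=1) < eps)
    est[rows] = hk[rows, dk[rows].argmin(axis=1)]
    return est
```

Each estimate is a weighted average of the $k$ nearest altitudes, and neighbors added by a slightly larger $k$ are the farthest ones and therefore receive the smallest weights; this is one plausible reason why small changes in $k$ have little effect in Table 7.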

The affine stage is not intended to serve as a final accurate solution, especially in scenes with strong terrain relief. Instead, its primary purpose in our framework is to provide a geometrically consistent and numerically stable initialization for the subsequent RPC optimization.

4.4. Limitations and Future Work

The affine-based initialization in this framework is built on an assumption of locally smooth imaging geometry. It may therefore be less reliable in extremely rugged terrain or in very large scenes where occlusions are frequent. In such cases, the affine approximation can introduce non-negligible initialization errors, which may carry over into subsequent refinement. In our experiments, the hierarchical RPC refinement was generally able to proceed when the initialization maintained overall geometric consistency, even if the affine estimates were not highly accurate in an absolute sense. Future work may consider adaptive tiling designs and alternative local approximations to improve the stability of initialization in challenging terrain and large-scale scenarios.

Feature detection and matching are included as auxiliary components rather than being examined as a major focus of this study. Although more recent detectors may offer advantages under certain imaging conditions, the present work concentrates on the behavior of the overall pipeline when combining affine-based initialization with hierarchical RPC optimization under ultra-weak stereo geometry. A systematic comparison of different feature extraction and matching schemes is left for future work.

This work presents an initial investigation of combining affine-based initialization with RPC-based refinement for stereo pairs with extremely small intersection angles. To address the degradation observed in complex terrain, future studies may explore incorporating complementary data sources (e.g., LiDAR point clouds) to provide additional elevation and structural constraints. Such information has the potential to reduce the influence of occlusions, weak texture, and unfavorable stereo geometry in mountainous areas, but its effectiveness remains to be assessed through further experiments.

5. Conclusions

This study presents an affine-initialized, hierarchical RPC reconstruction framework for 3D terrain positioning from bidirectional whisk-broom thermal infrared stereo imagery under very weak stereo geometry. The framework combines a fast affine-based initialization with a stepwise RPC refinement procedure, with the goal of providing geometrically consistent starting values for subsequent optimization while avoiding reliance on confidential sensor parameters. Experiments on SDGSAT-1 TIS multi-view data with intersection angles ranging from 0.57° to 6.5° show that, for the evaluated scenes, approximately 80% of the reconstructed points fall within ±60 m (about 2 pixels) and more than 90% within ±90 m (about 3 pixels). The corresponding RMSEs are below 0.3 pixels in plain areas, 1.3 pixels in hilly terrain, and 1.8 pixels in mountainous regions. Overall, these results indicate that the proposed framework can provide usable reconstructions in the tested setting with extremely small intersection angles, where achieving stable estimation can be challenging.

Despite these results, several limitations should be noted. The affine-based initialization relies on a local approximation of imaging geometry and may become less reliable in very rugged terrain or large-scale scenes where occlusions are frequent. In addition, feature detection and matching are treated as supporting components in this study; therefore, the impacts of alternative matching strategies are not systematically evaluated. Performance degradation in complex mountainous areas further suggests that image-only constraints may be insufficient in some cases, particularly under ultra-weak geometry and low-texture conditions. Future work may explore incorporating additional structural constraints and complementary data sources (e.g., LiDAR point clouds or higher-resolution optical imagery) and further evaluate their effectiveness in improving robustness and accuracy, especially in mountainous regions.

Author Contributions

Conceptualization, J.G.; methodology, Y.X.; software, Y.X.; validation, Y.X. and X.D.; formal analysis, Q.L. and Y.X.; investigation, J.G.; resources, F.C.; data curation, Q.L.; writing—original draft preparation, Y.X.; writing—review and editing, Y.X. and X.L.; visualization, C.W.; supervision, X.L.; project administration, F.C.; funding acquisition, F.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Hangzhou Joint Fund of the Zhejiang Provincial Natural Science Foundation of China under Grant No. LHZSZ25F010001, and the Open Fund of State Key Laboratory of Infrared Physics, grant number SITP-NLIST-YB-2024-10.

Data Availability Statement

The data used in this study are available from the corresponding author upon request.

Acknowledgments

The authors would like to thank the SDG BIG DATA Center and National Space Science Center for providing us with data. The data utilized in this study is sourced from SDGSAT-1 and provided by CBAS.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1 reports MAE and the signed median error for all scenes as complementary metrics to the RMSE-based evaluation. Together, they summarize typical error magnitude and bias direction with reduced sensitivity to outliers. Overall, their variations are generally consistent with the RMSE trends and do not alter the main conclusions.

Table A1. MAE, signed median error, RMSE, and mean error for each scene.

Scene Number	MAE (m)	Median Error (m)	RMSE (m)	Mean Error (m)
1	21.35	−3.18	30.32	−4.94
2	24.27	5.32	31.89	1.41
3	38.23	−2.68	47.79	−4.68
4	30.82	−2.41	52.05	−11.08
5	39.42	−1.34	50.61	1.77
6	36.02	−1.25	50.37	−5.26
7	14.08	0.70	19.89	1.30
8	27.28	4.99	38.10	7.27
9	19.81	−1.68	28.70	−4.41
10	27.67	−5.27	32.71	−1.55
11	11.34	−2.68	35.48	−6.92
12	3.74	−0.19	5.41	−0.81
13	5.15	−0.10	7.77	−0.27
14	6.88	1.23	8.96	0.79
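The metrics in Table A1, together with $R_{60}$ and $R_{90}$ from Tables 4 and 5, can be computed from per-point signed altitude errors as sketched below. The sign convention (reconstructed minus reference), the inclusive threshold test, and the function name are assumptions for illustration.

```python
import numpy as np

def altitude_error_metrics(h_rec, h_ref, thresholds=(60.0, 90.0)):
    """Summary metrics for signed altitude errors e = h_rec - h_ref (metres)."""
    e = np.asarray(h_rec, dtype=float) - np.asarray(h_ref, dtype=float)
    out = {
        "MAE": float(np.mean(np.abs(e))),
        "median": float(np.median(e)),           # signed median error
        "RMSE": float(np.sqrt(np.mean(e ** 2))),
        "ME": float(np.mean(e)),                 # signed mean error (bias)
    }
    for t in thresholds:
        # Percentage of points whose absolute error is within +/- t metres.
        out[f"R{int(t)}"] = float(100.0 * np.mean(np.abs(e) <= t))
    return out
```

At the 30 m resolution used in this study, the ±60 m and ±90 m tolerances correspond to about 2 and 3 pixels, respectively.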

References

Ganci, G.; Cappello, A.; Neri, M. Data Fusion for Satellite-Derived Earth Surface: The 2021 Topographic Map of Etna Volcano. Remote Sens. 2023, 15, 198.
Colverd, G.; Takami, J.; Schade, L.; Bot, K.; Gallego-Mejia, J.A. Tomographic SAR Reconstruction for Forest Height Estimation. arXiv 2024, arXiv:2412.00903v2.
Petrović, I.; Sečnik, M.; Hočevar, M.; Berk, P. Vine Canopy Reconstruction and Assessment with Terrestrial Lidar and Aerial Imaging. Remote Sens. 2022, 14, 5894.
Li, Z.; Ji, S.; Fan, D.; Yan, Z.; Wang, F.; Wang, R. Introduction of 3D Information of Buildings from Single-View Images Based on Shadow Information. ISPRS Int. J. Geo-Inf. 2024, 13, 62.
Guo, H.D.; Liang, D.; Chen, F.; Sun, Z.C.; Liu, J. Big Earth Data Facilitates Sustainable Development Goals. Bull. Chin. Acad. Sci. 2021, 36, 874–884.
Schönberger, J.L.; Frahm, J.-M. Structure-from-Motion Revisited. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
Zhang, X.; Pan, H.; Zhou, S.; Zhu, X. Self-Calibration Strip Bundle Adjustment of High-Resolution Satellite Imagery. Remote Sens. 2024, 16, 2196.
Bullinger, S.; Bodensteiner, C.; Arens, M. 3D Surface Reconstruction from Multi-Date Satellite Images. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2021, XLIII-B2-2021, 313–320.
Hirschmüller, H.; Scholten, F.; Hirzinger, G. Stereo Vision Based Reconstruction of Huge Urban Areas from an Airborne Pushbroom Camera (HRSC). In Pattern Recognition: 27th DAGM Symposium; Springer: Berlin, Germany, 2005; pp. 58–66.
Sivakumar, V.; Kumar, B.; Srivastava, S.K.; Krishna, B.G.; Srivastava, P.K.; Kiran Kumar, A.S. DEM Generation for Lunar Surface using Chandrayaan-1 TMC Triplet Data. J. Indian Soc. Remote Sens. 2012, 40, 551–564.
Dong, Q.; Gao, X.; Cui, H.; Hu, Z. Robust Camera Translation Estimation via Rank Enforcement. IEEE Trans. Cybern. 2022, 52, 862–872.
Facciolo, G.; de Franchis, C.; Meinhardt-Llopis, E. Automatic 3D Reconstruction from Multi-Date Satellite Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017.
Gao, J.; Liu, J.; Ji, S. Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching. arXiv 2021, arXiv:2109.11121.
Hartley, R.I.; Saxena, T. The Cubic Rational Polynomial Camera Model; Technical Report; G.E. Corporate R&D and CMA Consulting: Niskayuna, NY, USA, 2001.
Tao, C.V.; Hu, Y. 3D reconstruction methods based on the rational function model. Photogramm. Eng. Remote Sens. 2002, 68, 705–714.
Seo, D.U.; Park, S.Y. 3D Reconstruction from Multi-view Google Earth Satellite Stereo Images by Generating Virtual RPC based on 3D Homography-based Georeferencing. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 48, 1075–1080.
Grodecki, J.; Dial, G. Block Adjustment of High-Resolution Satellite Images Described by Rational Polynomials. Photogramm. Eng. Remote Sens. 2003, 69, 59–68.
Noh, M.-J.; Howat, I.M. Automated stereo-photogrammetric DEM generation at high latitudes: Surface Extraction with TIN-based Search-space Minimization (SETSM) validation and demonstration over glaciated regions. GISci. Remote Sens. 2015, 52, 198–217.
d’Angelo, P.; Reinartz, P. DSM based orientation of large stereo satellite image blocks. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, 39, 209–214.
Jacobsen, K. Analysis and correction of systematic height model errors. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 333–339.
Singh, M.K.; Gupta, R.D.; Snehmani; Bhardwaj, A.; Ganju, A. Effect of sensor modelling methods on computation of 3-D coordinates from Cartosat-1 stereo data. Geocarto Int. 2015, 31, 506–526.
Zheng, E.; Wang, K.; Dunn, E.; Frahm, J.-M. Minimal Solvers for 3D Geometry from Satellite Imagery. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
Goossens, R.; Schmidt, M.; Menz, G. High resolution DEM and ortho-photomap generation from TERRA-ASTER data—Case study of Morocco. In Geoinformation for European-Wide Integration; Benes, T., Ed.; Millpress: Rotterdam, The Netherlands, 2003; pp. 19–24.
Wang, P.; Shi, L.; Chen, B.; Hu, Z.; Qiao, J.; Dong, Q. Pursuing 3-D Scene Structures with Optical Satellite Images from Affine Reconstruction to Euclidean Reconstruction. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5632214.
de Franchis, C.; Meinhardt-Llopis, E.; Michel, J.; Morel, J.-M.; Facciolo, G. An Automatic and Modular Stereo Pipeline for Pushbroom Images. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2014, II-3, 49–56.
Stucker, C.; Schindler, K. ResDepth: A Deep Residual Prior For 3D Reconstruction from High-resolution Satellite Images. ISPRS J. Photogramm. Remote Sens. 2022, 183, 560–580.
Mao, Y.; Chen, K.; Zhao, L.; Chen, W.; Tang, D.; Liu, W.; Wang, Z.; Diao, W.; Sun, X.; Fu, K. Elevation Estimation-Driven Building 3D Reconstruction from Single-View Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5608718.
Ma, W.; Wen, Z.; Wu, Y.; Jiao, L.; Gong, M.; Zheng, Y.; Liu, L. Remote Sensing Image Registration with Modified SIFT and Enhanced Feature Matching. IEEE Geosci. Remote Sens. Lett. 2017, 14, 3–7.
Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395.
Bu, P.; Wang, H.; Dou, Y.; Wang, Y.; Yang, T.; Zhao, H. Weighted omnidirectional semi-global stereo matching. Signal Process. 2024, 220, 109439.
Hu, A.; Li, A.; Jin, X.; Zou, D. ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement. arXiv 2025, arXiv:2504.07418.
Tomasi, C.; Kanade, T. Shape and Motion from Image Streams under Orthography: A Factorization Method. Int. J. Comput. Vis. 1992, 9, 137–154.
Shi, R.; Zhang, Z.; Qiu, X.; Ding, C. A Novel Gradient Descent Least-Squares (GDLS) Algorithm for Efficient Gridless Line Spectrum Estimation with Applications in Tomographic SAR Imaging. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5208313.
Gao, D.; Li, P.Z.X.; Sze, V.; Karaman, S. GEVO: Memory-Efficient Monocular Visual Odometry Using Gaussians. IEEE Robot. Autom. Lett. 2025, 10, 2774–2781.

Figure 1. Framework of the proposed method.

Figure 2. Forward-view ($t_{1}$) and backward-view ($t_{2}$) whisk-broom scans provide overlapping coverage for stereo reconstruction, with a minimum intersection angle of 0.57°. For a matched correspondence $p_{1}^{k}(r_{1}^{k}, c_{1}^{k})$ (forward) and $p_{2}^{k}(r_{2}^{k}, c_{2}^{k})$ (backward), an affine model yields the 3D point $P_{aff}^{k}$ in the affine reconstruction space. Using GCPs with known affine coordinates $P_{aff}^{GCP}$ and geographic coordinates $P_{geo}^{GCP}(\lambda^{GCP}, \varphi^{GCP}, h^{GCP})$, a transformation $T$ converts $P_{aff}^{k}$ into an initial geodetic estimate $P_{geo}^{k}(\lambda^{(0)}, \varphi^{(0)}, h^{(0)})$. An RPC model can then be applied to further refine the geodetic coordinates to $P_{geo}^{k}(\lambda_{F}, \varphi_{F}, h_{F})$.
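The first two steps in this caption can be illustrated with a compact NumPy sketch. It assumes (i) a rank-3, Tomasi-Kanade-style factorization of the two-view measurement matrix to obtain points in an affine reconstruction space, and (ii) a 12-parameter 3D affine transformation $T$ fitted to GCPs by linear least squares. Both choices, the helper names, and the treatment of the target coordinates as a frame that is locally affine-compatible with the reconstruction are assumptions for illustration; the formulation in Section 2.2 may differ, and the RPC refinement step is not shown.

```python
import numpy as np

def affine_factorize_two_views(p1, p2):
    """Rank-3 affine reconstruction from two views (Tomasi-Kanade-style sketch).

    p1, p2: (N, 2) matched image coordinates in the forward/backward views.
    Returns (N, 3) points defined only up to an unknown 3D affine transformation.
    """
    W = np.vstack([np.asarray(p1, float).T, np.asarray(p2, float).T])  # 4 x N
    W = W - W.mean(axis=1, keepdims=True)       # centering removes image translations
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep the three dominant components; noise-free affine data has rank <= 3.
    S = np.sqrt(s[:3])[:, None] * Vt[:3]        # 3 x N structure
    return S.T

def fit_affine_3d(src, dst):
    """Least-squares 3D affine map (12 parameters) from >= 4 non-coplanar pairs."""
    src_h = np.hstack([src, np.ones((len(src), 1))])   # N x 4 homogeneous
    X, *_ = np.linalg.lstsq(src_h, dst, rcond=None)    # 4 x 3
    return X

def apply_affine_3d(X, pts):
    """Apply the fitted map: [x, y, z, 1] @ X."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ X
```

In this construction, a small intersection angle makes the forward and backward rows of the measurement matrix nearly linearly dependent, so the third singular value becomes small relative to the first two; this is one way to see why the affine estimate is treated only as an initialization for the RPC stage.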


Figure 3. The SDGSAT-1 TIS original image pairs. (a–f) are the forward and backward images of mountain, hill, plain area, respectively.

Figure 4. 3D visualization of the generated terrain models and the corresponding GDEM for representative mountain, hill, and plain scenes.

Figure 5. Generated terrain model based on various terrains and intersection angles.

Figure 6. Error summaries for the mountainous scene in Fuxin City, Liaoning Province (Scene 5, intersection angle = 0.57°). (a) Histogram of signed altitude errors (10 m bin width), with ±60 m and ±90 m ranges highlighted. (b) Scatter plot of reconstructed altitude versus reference DEM altitude, colored by signed altitude error; the dashed line shows the 1:1 relationship and the solid lines indicate ±30 m and ±60 m bounds. (c) Spatial map of absolute altitude errors.

Figure 7. Error summaries for the hilly scene in Nanyang City, Henan Province (Scene 9, intersection angle = 3.45°): (a) signed-error histogram (10 m bins), (b) reconstructed altitude vs. reference DEM altitude colored by signed error, and (c) spatial map of absolute altitude errors.

Figure 8. Error summaries for the plain scene in Zhumadian City, Henan Province (Scene 12, intersection angle = 6.5°): (a) signed-error histogram (10 m bins), (b) reconstructed altitude vs. reference DEM altitude colored by signed error, and (c) spatial map of absolute altitude errors.

Figure 9. Scatter plots of per-tile RMSE versus (a) mean slope and (b) local altitude range.

Table 1. Key parameters of the TIS on the SDGSAT-1 satellite.

Parameter	Value
Orbit altitude	505 km
Bands	8–10.5 μm
	10.3–11.3 μm
	11.5–12.5 μm
Swath	300 km
Spatial resolution	30 m
Revisit period	11 days
Maximum intersection angle	6.5°
Minimum intersection angle	0.57°

Table 2. Terrain, intersection angle, and region of the scene data used.

Scene Number	Terrain	Intersection Angle (°)	Region
1	Mountain	6.5	Fuxin County, Liaoning Province
2	Mountain	6.5	Fuxin County, Liaoning Province
3	Mountain	3.45	Jinzhou City, Liaoning Province
4	Mountain	3.45	Zhenping County, Henan Province
5	Mountain	0.57	Fuxin City, Liaoning Province
6	Mountain	0.57	Lushi County, Henan Province
7	Hill	6.5	Kazuo County, Liaoning Province
8	Hill	3.45	Luohe City, Henan Province
9	Hill	3.45	Nanyang City, Henan Province
10	Hill	0.57	Dengzhou City, Henan Province
11	Hill	0.57	Chifeng City, Inner Mongolia Autonomous Region
12	Plain	6.5	Zhumadian City, Henan Province
13	Plain	3.45	Tanghe County, Henan Province
14	Plain	0.57	Jiamusi City, Heilongjiang Province

Table 3. RMSE comparison between the affine initialization and the final RPC refinement.

Terrain	Intersection Angle (°)	Affine RMSE (m)	Final RMSE (m)	Reduction (%)
Mountain	0.57	85.53	50.37	41.10
Hill	3.45	42.74	28.70	32.85
Plain	6.5	8.32	5.41	34.98

Table 4. RMSE, mean error (ME), $R_{60}$, and $R_{90}$ for different terrain types.


Scene Number	Terrain	Intersection Angle	RMSE (m)	Mean Error (m)	$R_{60}$ (%)	$R_{90}$ (%)
1	Mountain	6.5°	30.32	−4.94	93.99	98.54
2	Mountain	6.5°	31.89	1.41	93.49	98.97
3	Mountain	3.45°	47.79	−4.68	79.48	94.36
4	Mountain	3.45°	52.05	−11.08	88.06	94.10
5	Mountain	0.57°	50.61	1.77	78.48	91.82
6	Mountain	0.57°	50.37	−5.26	81.17	91.97
	Mountain	Mean	43.84	4.86	85.78	94.96
7	Hill	6.5°	19.89	1.30	98.58	99.94
8	Hill	3.45°	38.10	7.27	89.57	95.56
9	Hill	3.45°	28.70	−4.41	96.37	98.82
10	Hill	0.57°	32.71	−1.55	93.11	98.29
11	Hill	0.57°	35.48	−6.92	91.84	98.21
	Hill	Mean	27.63	3.29	95.21	98.86
12	Plain	6.5°	5.41	−0.81	100	100
13	Plain	3.45°	7.77	−0.27	100	100
14	Plain	0.57°	8.96	0.79	100	100
	Plain	Mean	7.72	0.51	100	100

Table 5. RMSE, mean error, $R_{60}$, and $R_{90}$ for different intersection angles.


Scene Number	Intersection Angle	Terrain	RMSE (m)	Mean Error (m)	$R_{60}$ (%)	$R_{90}$ (%)
1	6.5°	Mountain	30.32	−4.94	93.99	98.54
2	6.5°	Mountain	31.89	1.41	93.49	98.97
7	6.5°	Hill	19.89	1.30	98.58	99.94
12	6.5°	Plain	5.41	−0.81	100	100
	6.5°	Mean	18.99	1.71	97.21	99.49
3	3.45°	Mountain	47.79	−4.68	79.48	94.36
4	3.45°	Mountain	52.05	−11.08	88.06	94.10
8	3.45°	Hill	38.10	7.27	89.57	95.56
9	3.45°	Hill	28.70	−4.41	96.37	98.82
13	3.45°	Plain	7.77	−0.27	100	100
	3.45°	Mean	34.88	5.54	90.53	96.73
5	0.57°	Mountain	50.61	1.77	78.48	91.82
6	0.57°	Mountain	50.37	−5.26	81.17	91.97
10	0.57°	Hill	32.71	−1.55	93.11	98.29
11	0.57°	Hill	35.48	−6.92	91.84	98.21
14	0.57°	Plain	8.96	0.79	100	100
	0.57°	Mean	35.63	3.26	88.92	96.06

Table 6. Reconstruction accuracy and computation time for different tile sizes.

Tile Size (px × px)	MAE (m)	Median Error (m)	RMSE (m)	Affine Time (min)	RPC Time (min)	Total Time (min)
25 × 25	Fail (insufficient correspondences; RPC not executed)
50 × 50	29.73	−0.76	47.53	5.93	36.04	41.97
100 × 100	39.26	−2.9	56.96	1.57	45.67	47.24
200 × 200	54.29	2.42	72.94	0.42	46.50	46.92

Table 7. Reconstruction accuracy and computation time for different IDW neighborhood sizes $k$.


k	MAE (m)	Median Error (m)	RMSE (m)	Affine Time (min)	RPC Time (min)	Total Time (min)
8	30.07	−0.77	47.90	5.93	36.52	42.45
10	29.73	−0.76	47.53	5.93	36.04	41.97
12	29.46	−0.73	47.27	5.93	36.17	42.10

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xu, Y.; Liang, Q.; Guo, J.; Du, X.; Wu, C.; Li, X.; Chen, F. Accurate 3D Terrain Reconstruction for Multi-View Thermal Infrared Images with Small Intersection Angles. Remote Sens. 2026, 18, 681. https://doi.org/10.3390/rs18050681

AMA Style

Xu Y, Liang Q, Guo J, Du X, Wu C, Li X, Chen F. Accurate 3D Terrain Reconstruction for Multi-View Thermal Infrared Images with Small Intersection Angles. Remote Sensing. 2026; 18(5):681. https://doi.org/10.3390/rs18050681

Chicago/Turabian Style

Xu, Yixuan, Quan Liang, Junhong Guo, Xinwang Du, Chao Wu, Xiaoyan Li, and Fansheng Chen. 2026. "Accurate 3D Terrain Reconstruction for Multi-View Thermal Infrared Images with Small Intersection Angles" Remote Sensing 18, no. 5: 681. https://doi.org/10.3390/rs18050681

APA Style

Xu, Y., Liang, Q., Guo, J., Du, X., Wu, C., Li, X., & Chen, F. (2026). Accurate 3D Terrain Reconstruction for Multi-View Thermal Infrared Images with Small Intersection Angles. Remote Sensing, 18(5), 681. https://doi.org/10.3390/rs18050681

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
