1. Introduction
Synthetic aperture radar (SAR) is an active microwave imaging system that offers advantages over optical and optoelectronic sensors through its all-weather, all-day, high-resolution imaging capabilities, making it a crucial tool for long-range remote sensing applications [1,2]. The utilization of SAR imaging technology enables the acquisition of high-quality three-dimensional (3D) scene models, with significant implications in academic, military, commercial, and disaster management domains. With the rapid advancement of SAR theories and technologies, various SAR imaging techniques have emerged to gather 3D information of observed scenes, including interferometric SAR (InSAR) [3,4,5], tomographic SAR (TomoSAR) [6,7,8], and radargrammetry [9,10,11]. Among these techniques, radargrammetry uses images with parallax to calculate terrain elevation by substituting the image coordinates of corresponding pixels into the equation system of the 3D imaging model. Compared with InSAR and TomoSAR, radargrammetry can use images acquired at different times and locations, thereby imposing fewer restrictions on platforms and images. As a result, radargrammetry, alongside optical photogrammetry, has achieved numerous accomplishments in digital photogrammetry and surface elevation inversion [12,13].
The process of acquiring high-quality 3D imaging through radargrammetry involves two main steps: corresponding point measurement and analytical stereo positioning. Initially, corresponding points are derived by registering multi-aspect SAR images. These corresponding points are subsequently substituted into the radargrammetric equations to calculate the 3D model of the target scene. Conventional methods for corresponding point measurement in SAR images comprise statistical detection techniques and feature-based matching algorithms [14]. Additionally, leveraging prior information such as digital elevation models (DEMs) and ground control points enables high-precision image matching, although its applicability is limited [15,16]. Statistical detection methods typically rely on grayscale or gradient information in the images and employ matching measures (e.g., correlation and mutual information) to align image windows. Feature-based matching algorithms, on the other hand, extract features that are less affected by grayscale variations among different sources. The scale-invariant feature transform (SIFT) is the classic algorithm of this kind, invariant to image scaling and rotation; it has been widely applied to image registration and has received many successful improvements [17,18]. However, due to the presence of noise and complex distortions in SAR images, the accuracy and efficiency of feature extraction for complex scenes at a single scale are suboptimal. Furthermore, various geometric models for SAR image reconstruction have been proposed, including F. Leberl's Range–Doppler (RD) model [19], G. Konecny's geocentric range projection model [20], and the Range–Coplanarity model [21]. Studies comparing the applicability and accuracy of these mathematical models have revealed that the F. Leberl model requires fewer parameters and exhibits a broader range of applications [22,23].
However, the quality of SAR images directly impacts the matching of corresponding points; noise and ghosting can adversely affect the quality of the 3D imaging results. Furthermore, the accuracy of the multi-aspect SAR image registration and of the airborne platform parameters, which serve as inputs to the imaging equations, directly influences the final calculation results. During 2D SAR imaging, calibrating the platform parameters is also necessary. The distortion caused by parallax in multi-aspect SAR images can disrupt the measurement of corresponding points, and traditional registration methods suffer from low accuracy and efficiency, making the resolution of the parallax–registration contradiction a challenge in radargrammetry [15]. Additionally, the accuracy of 3D imaging depends on the resolution and quantity of SAR images, but practical efficiency constraints make high-precision 3D imaging difficult to achieve in radargrammetry.
To address the aforementioned issues, this paper proposes a multi-aspect SAR radargrammetric method with composite registration. It utilizes composite registration methods, combining SIFT and normalized cross-correlation (NCC) for the detection of corresponding points in multi-aspect SAR image pairs. After the segmentation of the large-scale SAR image based on the coarse registration results from SIFT, each region is subjected to NCC precise registration following resampling, enabling the efficient and rapid extraction of matching points. Then, the platform parameters are calibrated according to the imaging mode of the SAR images, allowing for high-precision stereo imaging of the target region using the F. Leberl imaging equations. Finally, a multi-aspect point cloud fusion algorithm based on DEMs is utilized to obtain the high-precision reconstruction of the target scene.
To obtain high-precision multi-aspect SAR images and platform parameters, we first calibrate the platform coordinates and other parameters during motion compensation in the SAR imaging process. Then, this paper adopts the registration of SAR images using stereo image groups to enhance the registration efficiency and accuracy, simultaneously mitigating the difficulty of registration caused by large disparities. In this way, the disparity threshold for image pairs is increased, resulting in higher 3D imaging accuracy. Moreover, this paper proposes a composite registration method that enables automatic and robust registration: feature-based coarse registration is applied first, followed by window-based precise registration, and 3D reconstruction is then performed based on the extracted features, thereby enhancing the registration accuracy and efficiency for crucial targets. There have also been many learning-based models for multimodal coregistration, which are widely used for image registration and 3D imaging [24,25]. However, considering the data and computation that network training demands, feature-based registration methods can improve the efficiency of coarse registration and of 3D imaging based on radargrammetry.
For validation purposes, this paper presents a comprehensive 3D reconstruction of the Five-hundred-meter Aperture Spherical radio Telescope (FAST) using the proposed method. The method effectively exploits the benefits of multi-aspect SAR images to remove overlaps and eliminate shadow areas, and the experiment validates the accuracy and effectiveness of the theories and methods introduced in this paper. The rest of the paper is organized as follows. Section 2 outlines the proposed methods for SAR image registration and 3D imaging. The experimental results of these methods are presented in Section 3, with a detailed analysis of the registration and 3D imaging results in Section 4. Finally, Section 5 provides a summary of this paper.
2. Methods
Radargrammetry applies radar imaging principles to calculate the 3D coordinates of a target. The 3D coordinates are obtained using construction equations (i.e., equations for 3D reconstruction based on geometric models), which are based on the observational information of the corresponding point from different aspect angles.
Figure 1 illustrates that multi-aspect SAR images have the capability to achieve 3D imaging when satisfying the requirements of the construction equations.
Construction equations require information of corresponding points in SAR images from different aspect angles, making image registration a crucial step in determining these corresponding points. Therefore, this paper proposes a composite registration method that combines SIFT coarse registration with NCC precise registration to address the disparity–registration contradiction. In the following, we explain the principles and fundamental procedures of these registration algorithms; the process of achieving high-precision 3D imaging for complex structured targets is described in detail and illustrated in Figure 2. The SAR imaging algorithm used in Step 1 (the back-projection (BP) algorithm) [26] will not be introduced in this paper. The subsequent sections focus on Steps 2 to 4.
Step 1: By filtering the original data and imaging results according to the platform trajectory and image disparity, we acquire a multi-aspect SAR image sequence composed of multiple groups of image pairs. The utilization of stereo image groups increases the disparity threshold in image pair registration, effectively mitigating the disparity–registration contradiction.
Step 2: This involves implementing coarse registration and segmentation, using multi-scale SIFT for two SAR images requiring registration. Subsequently, we resample each region and achieve a precise image registration using the NCC algorithm. The utilization of composite registration in sub-regions mitigates the impact of challenges such as layover and perspective contraction on the registration of multi-aspect SAR images.
Step 3: Based on the registration information of each pixel in the primary image, which is obtained from the image pairs, the 3D coordinates of corresponding points are computed using the RD equations. This process generates a main point cloud for image pairs and multiple regional point clouds for sub-regions. To enhance the accuracy of the imaging equations’ solution, a weighted cross-correlation coefficient-based method is adopted for the registration results of corresponding points in the stereo image group. Furthermore, motion compensation is applied to the trajectory data to reduce the impact of platform parameter errors on the results.
Step 4: This comprises performing point cloud fusion based on DEMs. In this process, the point clouds in selected regions are filtered based on the cross-correlation coefficient, enabling the fusion of high-density point clouds and obtaining the final point clouds of the scene.
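To make the workflow concrete, the following is a minimal orchestration sketch of Steps 1 to 4 in Python. The stage implementations are passed in as callables and are sketched individually in the subsections below; all names here are hypothetical placeholders, not the paper's code.

```python
# A minimal orchestration sketch of Steps 1-4 (Figure 2). The callables
# coarse_register, precise_register, solve_rd and fuse_dem stand in for
# the stages detailed in Sections 2.1-2.4 and are assumptions.
def radargrammetry_pipeline(stereo_groups, coarse_register, precise_register,
                            solve_rd, fuse_dem):
    """stereo_groups: iterable of (primary, [secondaries]) image groups."""
    clouds = []
    for primary, secondaries in stereo_groups:
        matches = []
        for secondary in secondaries:
            # Step 2: SIFT coarse registration + segmentation, then
            # per-region NCC precise registration on resampled images.
            affines, regions = coarse_register(primary, secondary)
            matches.append(precise_register(primary, secondary,
                                            affines, regions))
        # Step 3: weighted, overdetermined RD solution per pixel,
        # using motion-compensated platform parameters.
        clouds.append(solve_rd(primary, matches))
    # Step 4: DEM-based fusion of main and regional point clouds.
    return fuse_dem(clouds)
```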
2.1. Generation and Selection of Multi-Aspect SAR Images
This paper divides circular SAR data into multiple sub-apertures and utilizes a strip-fitting imaging method [26], as depicted in Figure 3, resulting in sub-aperture images. The SAR platform flies along the y-axis, looking towards its right side at an altitude of H; D is a point sampled when the aircraft is at A, whose ideal position would be B. Subsequently, all SAR images are selected and arranged based on their aspect angles, yielding the raw SAR image sequence and corresponding platform information of the circular SAR. The fitting method compensates the coordinate data during imaging, enabling strip-mode imaging of circular SAR sub-aperture data; theoretically, after this compensation, there is no difference between circular SAR sub-aperture data and strip SAR data. Therefore, both multiple strip SAR images and circular SAR sub-aperture images can be obtained. Generally, circular SAR data provide more image pairs but a narrower swath width, facilitating high-precision reconstruction of small areas.
By integrating image sequences from multiple trajectories, the original multi-aspect image database is obtained and primarily classified based on aspect angles and trajectories. Within this database, any image can be designated as the primary image. The secondary images are selected from the same or different trajectories with a certain disparity threshold. These secondary images, in conjunction with the primary image, form independent stereo image groups, respectively. The images in the database are further categorized to generate a multi-aspect SAR image sequence composed of multiple stereo image groups.
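The sketch below illustrates how stereo image groups might be assembled from such a database. The record layout is an assumption; the 5° and 15° disparity limits follow Section 2.2.

```python
# A sketch of stereo-image-group selection from the multi-aspect image
# database. Each record is assumed to carry its aspect angle and the
# trajectory it came from; the thresholds follow Section 2.2.
from dataclasses import dataclass
import numpy as np

@dataclass
class SarImage:
    data: np.ndarray    # SAR amplitude image
    aspect_deg: float   # central aspect angle of the sub-aperture
    track_id: int       # trajectory the image came from

def build_stereo_group(primary, database, pair_max=5.0, group_max=15.0):
    """Select secondary images for one stereo group around `primary`."""
    direct, via_transfer = [], []
    for img in database:
        if img is primary:
            continue
        disparity = abs(img.aspect_deg - primary.aspect_deg)
        if disparity <= pair_max:
            direct.append(img)        # registered against primary directly
        elif disparity <= group_max:
            via_transfer.append(img)  # reached by transferring coarse
                                      # registration parameters (Figure 4)
    return direct, via_transfer
```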
2.2. Multi-Aspect SAR Image Registration
To address the discrepancy between disparity and registration in SAR image pairs, this paper proposes a composite registration method that combines SIFT for coarse registration and NCC for precise registration. First, the secondary images are resampled based on the results of segmentation and the affine matrices of each region obtained through coarse registration. Subsequently, precise registration is performed on each sub-region to obtain the final registration results for the image pair.
Increasing the number of image pairs augments the input parameters of the RD equations, thereby enhancing the positional accuracy, while also increasing the computational complexity. The number of secondary images is constrained by the available data and the disparity threshold in coarse registration. To improve efficiency and increase the disparity threshold, this paper registers the stereo image group and establishes a connection between two image pairs by transferring parameters obtained from coarse registration. As illustrated in Figure 4, the disparity threshold is increased by transferring parameters through a secondary image, and the overall computational complexity is primarily determined by the number of precisely registered image pairs.
Although, in theory, the composite registration method can register images with large disparities, practical factors such as the anisotropy of the target scattering coefficient and disparity-induced shadowing restrict the achievable registration accuracy and efficiency. Consequently, the disparity threshold for direct registration generally does not exceed 5°, and the maximum disparity between the primary and secondary images within the same stereo image group does not exceed 15°.
2.2.1. Coarse Registration Using the SIFT Algorithm
Coarse registration first obtains feature descriptors for each image of an image pair at different scales and then performs feature matching. For each stereo image group, separate image scale spaces are established for the primary and secondary images [27]. The difference-of-Gaussian scale space is used to detect keypoints and to compute the main orientation and feature descriptor of each keypoint. The SIFT registration result for the image pair is obtained by calculating the Euclidean distance between the feature descriptors of the keypoints [28].
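A minimal sketch of this keypoint extraction and Euclidean-distance matching with OpenCV follows. Lowe's ratio test is a common screening step and is an assumption here, as is treating the SAR amplitude images as 8-bit grayscale input.

```python
# SIFT keypoint detection and descriptor matching by Euclidean (L2)
# distance, using OpenCV. img1/img2: 8-bit SAR amplitude images.
import cv2

def sift_match(img1, img2, ratio=0.75):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # Brute-force matcher with L2 distance between 128-D descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    # Keep matches whose best distance is clearly below the second best.
    good = [m for m, n in knn if m.distance < ratio * n.distance]
    pts1 = [kp1[m.queryIdx].pt for m in good]
    pts2 = [kp2[m.trainIdx].pt for m in good]
    return pts1, pts2
```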
Despite several preliminary screenings, such as the elimination of edge points and points with low contrast, incorrect matching points remain. To address this problem, the random sample consensus (RANSAC) method is employed to filter the matching points [29], facilitating the estimation of the affine matrix through the least-squares (LS) method. Specifically, $k$ random matching points are selected, and the unknown affine matrix $\mathbf{H}_{ij}$ is obtained by solving the LS problem:

$$\hat{\mathbf{H}}_{ij} = \arg\min_{\mathbf{H}_{ij}} \sum_{m=1}^{k} \left\| \mathbf{p}_m^i - \mathbf{H}_{ij}\,\mathbf{p}_m^j \right\|^2, \quad (1)$$

where $\mathbf{p}_m^i$ and $\mathbf{p}_m^j$ represent the homogeneous coordinates of the matching point $m$ in image $i$ and image $j$, respectively. The remaining matching points are filtered using the estimated affine matrix. The matching points that meet the error requirement are then included in the affine matrix estimation, resulting in filtered coarse registration points and an overall affine matrix estimate. Subsequently, the images are divided into sub-regions through the density-based spatial clustering of applications with noise (DBSCAN) algorithm, based on the distribution of the filtered feature points. The same coarse registration is conducted for each sub-region of the image pair after segmentation. This process yields resampling parameters, including regional affine matrices and predefined regional sampling rates.
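The following is a NumPy sketch of the RANSAC-filtered LS estimation of Equation (1). The iteration count and inlier tolerance are illustrative assumptions.

```python
# RANSAC-filtered LS affine estimation, Equation (1).
import numpy as np

def fit_affine_ls(src, dst):
    """LS affine matrix mapping src (N,2) to dst (N,2), homogeneous form."""
    ones = np.ones((src.shape[0], 1))
    A = np.hstack([src, ones])                  # homogeneous source coords
    # Solve A @ X ~= dst for the 2x3 affine parameters.
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return X.T                                  # shape (2, 3)

def ransac_affine(src, dst, k=3, iters=1000, tol=2.0):
    """Filter matches with RANSAC, then refit the affine on all inliers."""
    rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=k, replace=False)
        H = fit_affine_ls(src[idx], dst[idx])
        pred = np.hstack([src, np.ones((len(src), 1))]) @ H.T
        inliers = np.linalg.norm(pred - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Matching points that meet the error requirement enter the final fit.
    return fit_affine_ls(src[best_inliers], dst[best_inliers]), best_inliers
```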
2.2.2. Precise Registration Using NCC Matching
Due to the limited quantity and accuracy of the corresponding points obtained solely through SIFT feature matching in coarse registration, precise registration is necessary to achieve sub-pixel-level alignment. In this paper, the affine matrices and other parameters obtained from coarse registration are used to resample the secondary images with a sinc interpolation algorithm, which yields more interpolation points with high accuracy. The resampled secondary images are then precisely registered with the primary image. The NCC method can achieve sub-pixel registration results, which is crucial for obtaining accurate registration. After the affine transformation of a sub-region, the rotational and scaling discrepancies of the target are greatly reduced. The NCC method is employed to identify corresponding points in the images and calculate their correlation coefficients and offsets. Extending NCC to 2D images gives:

$$\rho(u,v) = \frac{\sum_{x}\sum_{y}\left[I_1(x,y) - \bar{I}_1\right]\left[I_2(x+u,y+v) - \bar{I}_2\right]}{\sqrt{\sum_{x}\sum_{y}\left[I_1(x,y) - \bar{I}_1\right]^2\,\sum_{x}\sum_{y}\left[I_2(x+u,y+v) - \bar{I}_2\right]^2}}, \quad (2)$$

where the window mean $\bar{I}$ is:

$$\bar{I} = \frac{1}{(2n+1)^2}\sum_{x}\sum_{y} I(x,y), \quad (3)$$

and $I(x,y)$ represents the amplitude value at the corresponding coordinate in the image, which is used to compute the similarity between the reference image $I_1$ and the matching image $I_2$. The NCC algorithm slides a window of size $(2n+1) \times (2n+1)$, centered at a selected point in the reference image $I_1$, over matching windows of the same size in image $I_2$. The similarity between the windows is computed via the cross-correlation coefficient. The window in $I_2$ is moved continuously, and the point with the maximum cross-correlation coefficient is selected as the corresponding point for the selected point in $I_1$. By iterating over all points in $I_1$ and repeating these steps, a precise registration result is obtained. To enhance imaging efficiency and accuracy, the registration results are filtered based on the final cross-correlation coefficients and the grayscale values of the window's center pixel.
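A direct NumPy sketch of this window matching follows. The window half-size and search radius are illustrative assumptions, and sub-pixel refinement (e.g., correlation-peak fitting) is omitted.

```python
# Window-based NCC matching, Equations (2)-(3). Assumes (x, y) lies at
# least n + search pixels away from the image borders.
import numpy as np

def ncc_coefficient(w1, w2):
    """Cross-correlation coefficient between two equal-size windows."""
    d1, d2 = w1 - w1.mean(), w2 - w2.mean()
    denom = np.sqrt((d1**2).sum() * (d2**2).sum())
    return (d1 * d2).sum() / denom if denom > 0 else 0.0

def ncc_match_point(I1, I2, x, y, n=7, search=10):
    """Find the point in I2 matching (x, y) in I1 by maximum NCC."""
    ref = I1[y-n:y+n+1, x-n:x+n+1]          # (2n+1) x (2n+1) window
    best, best_uv = -1.0, (0, 0)
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            cand = I2[y+v-n:y+v+n+1, x+u-n:x+u+n+1]
            rho = ncc_coefficient(ref, cand)
            if rho > best:
                best, best_uv = rho, (u, v)
    return best_uv, best                    # offset and its coefficient
```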
2.3. Three-Dimensional Coordinate Calculation Based on Radargrammetry
After completing the registration of a stereo image group, the corresponding pixel coordinates of the primary image in the different image pairs are obtained. The 3D coordinates of corresponding points can then be calculated using the RD equation system. These equations reflect the relationship between objects on the ground and the corresponding pixels in the image, as shown in Figure 1. The RD equation system consists of two components: the range sphere Equation (4) and the Doppler cone Equation (5). These equations relate the position vector of a target $\mathbf{R}_T$ with its image coordinate $(x_a, x_r)$ within the image. Furthermore, the equation system incorporates the platform position vector $\mathbf{R}_S$, the velocity vector $\mathbf{V}_S$, and additional imaging parameters corresponding to the azimuth coordinate $x_a$, namely the pixel spacing in the range direction $m_r$, the scanning delay $R_0$, and the squint angle of the radar signal $\theta$.

The range sphere equation:

$$\left| \mathbf{R}_S - \mathbf{R}_T \right| = R_0 + m_r x_r. \quad (4)$$

The Doppler cone equation:

$$\mathbf{V}_S \cdot \left( \mathbf{R}_T - \mathbf{R}_S \right) = \left| \mathbf{V}_S \right| \left| \mathbf{R}_T - \mathbf{R}_S \right| \sin\theta. \quad (5)$$

In the case of zero-Doppler-processed SAR data, the right-hand side of Equation (5) is set to zero, giving:

$$\mathbf{V}_S \cdot \left( \mathbf{R}_T - \mathbf{R}_S \right) = 0. \quad (6)$$
Each pixel corresponds to only two equations, so the system for a single image is underdetermined. However, registration provides the coordinates of identical points in different images, which simultaneously satisfy the RD equation systems of their respective images. For a pixel in the primary image $S_0$, an RD equation system exists for each corresponding point in the secondary images $S_i$, $i = 1, 2, \ldots, n$. This yields a total of $2n + 2$ equations for the point, resulting in an overdetermined equation system, which is employed to calculate the 3D coordinates of each point.
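A sketch of solving this overdetermined system with SciPy follows, for the zero-Doppler case of Equations (4) and (6). The observation record layout is an assumption.

```python
# Solving the overdetermined RD system for one ground point. Each
# observation carries the platform state and range parameters of one
# image in which the point appears.
import numpy as np
from scipy.optimize import least_squares

def rd_residuals(R_T, observations):
    """Stack range-sphere and Doppler-cone residuals over all images."""
    res = []
    for obs in observations:
        R_S, V_S = obs["R_S"], obs["V_S"]   # platform position / velocity
        slant = np.linalg.norm(R_S - R_T)
        # Range sphere, Eq. (4): |R_S - R_T| = R0 + m_r * x_r
        res.append(slant - (obs["R0"] + obs["m_r"] * obs["x_r"]))
        # Zero-Doppler cone, Eq. (6): V_S . (R_T - R_S) = 0
        res.append(V_S @ (R_T - R_S))
    return np.asarray(res)

def solve_point(observations, R_init):
    """LS estimate of the 3D coordinates from >= 2 observations."""
    sol = least_squares(rd_residuals, R_init, args=(observations,))
    return sol.x
```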
In addition to improving SAR image quality and enhancing registration precision through composite registration, the calibration of the platform coordinate and velocity parameters is necessary. To convert data from an inertial measurement unit (IMU) to a unified coordinate system, this paper adopts a fusion method that combines the platform IMU data $P_I$ with the GPS data $P_G$, as expressed in Equation (7). The LS method is applied to derive the coordinate transformation parameters $\mathbf{T}$ and $\mathbf{P}$, enabling transformation from the platform coordinate system to the imaging coordinate system:

$$P_G = \mathbf{T} P_I + \mathbf{P}. \quad (7)$$

The transformed IMU data are then compensated using linear fitting, resulting in platform parameters with higher accuracy than the GPS data, as depicted in Figure 5. Based on the compensated coordinates in the geodetic coordinate system, the corrected velocity and other parameters can be obtained.
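The transformation parameters of Equation (7) can be estimated from synchronized IMU/GPS samples via LS, as in the NumPy sketch below (the sample layout is an assumption).

```python
# LS estimation of T and P in Equation (7) from synchronized samples.
import numpy as np

def fit_imu_to_gps(P_I, P_G):
    """Estimate T (3x3) and P (3,) such that P_G ~= T @ P_I + P.

    P_I, P_G: (N, 3) arrays of synchronized IMU and GPS positions.
    """
    N = P_I.shape[0]
    A = np.hstack([P_I, np.ones((N, 1))])        # (N, 4) design matrix
    # Solve A @ X ~= P_G; X stacks T^T on top of P^T.
    X, *_ = np.linalg.lstsq(A, P_G, rcond=None)
    T, P = X[:3].T, X[3]
    return T, P

# The transformed IMU trajectory can then be compensated by linear
# fitting against the GPS track, as described above.
```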
For a pixel of the primary image, there can be a maximum of $2(n + 1)$ equations from $n$ image pairs. Initially, the image pair with the highest correlation coefficient in the registration result is selected, yielding four equations and an optimal solution $R_1$. Subsequently, the equation systems of the remaining image pairs are solved using $R_1$ as the initial value, generating additional results $R_i$, $i = 2, \ldots, n$. These results, together with the result $R_0$ of the system composed of all equations, constitute the candidate set $R_i$, $i = 0, 1, \ldots, n$, for this point within the stereo image group. Finally, employing the cross-correlation coefficients $k_i$, $i = 0, 1, \ldots, n$, from the image pairs' registration as weights, with $k_0 = 1$, the LS method is used to obtain the optimal estimate of the coordinates $R$:

$$R = \arg\min_{R'} \sum_{i=0}^{n} k_i \left\| R' - R_i \right\|^2 = \frac{\sum_{i=0}^{n} k_i R_i}{\sum_{i=0}^{n} k_i}. \quad (8)$$
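The weighted estimate of Equation (8) reduces to a few lines of NumPy:

```python
# Weighted combination of candidate solutions, Equation (8).
import numpy as np

def weighted_coordinate(R, k):
    """R: (n+1, 3) candidate 3D solutions; k: (n+1,) weights, k[0] = 1."""
    k = np.asarray(k, dtype=float)
    return (k[:, None] * np.asarray(R)).sum(axis=0) / k.sum()
```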
High-precision 3D reconstruction is achieved for a stereo image group, resulting in a main point cloud corresponding to the stereo image group. Additionally, independent 3D imaging is performed on the prominent regions of each image, resulting in multiple 3D point cloud results within individual image pairs. Therefore, the point cloud of each stereo image group comprises the main point cloud and multiple regional point clouds. Ultimately, a collection of 3D point clouds is obtained from the entire SAR image sequence, where each point includes relevant information such as its corresponding pixels in the image pairs.
2.4. Point Cloud Fusion
The point clouds obtained from different image pairs may have positioning errors ranging from several meters to tens of meters, originating from inaccuracies in the SAR images and platform parameters, and each point cloud may contain erroneous data. In this paper, a point cloud fusion method based on DEMs is used to fuse the point cloud sequence while removing incorrectly positioned results, yielding the final 3D point cloud of the target scene.
The fusion of two point clouds with correspondence points can be achieved easily. The correspondence points $q_m^1$ and $q_m^2$, $m = 1, 2, \ldots, k$, between the two point clouds correspond to identical image pixels. The transformation matrix $\mathbf{T}_{12}$ from point group $\{q_m^1\}$ to point group $\{q_m^2\}$ can be calculated as depicted by Equation (9), where $\mathbf{T}_{12}$ typically approximates a translation matrix:

$$\hat{\mathbf{T}}_{12} = \arg\min_{\mathbf{T}} \sum_{m=1}^{k} \left\| q_m^2 - \mathbf{T} q_m^1 \right\|^2. \quad (9)$$
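A NumPy sketch of Equation (9) follows, fitting a general LS transform in homogeneous coordinates; since the transform typically approximates a translation, the mean offset is also shown as the degenerate case.

```python
# LS estimation of the point-group transformation, Equation (9).
import numpy as np

def fit_point_cloud_transform(Q1, Q2):
    """LS transform mapping Q1 (k,3) onto Q2 (k,3), homogeneous 4x4 form."""
    ones = np.ones((Q1.shape[0], 1))
    A = np.hstack([Q1, ones])                   # (k, 4)
    X, *_ = np.linalg.lstsq(A, Q2, rcond=None)  # A @ X ~= Q2
    T = np.eye(4)
    T[:3, :4] = X.T                             # embed as a 4x4 matrix
    return T

def estimate_translation(Q1, Q2):
    """Degenerate case: pure translation between corresponding points."""
    return (Q2 - Q1).mean(axis=0)
```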
For the fusion of two point clouds without correspondence points, the first step is to calculate the corresponding DEM for each point cloud. This paper adopts elevation fitting to extract DEMs from point clouds. Specifically, it starts with density-based clustering within a small range around each pixel and then performs kriging interpolation to obtain the elevation for that pixel. In this way, points with low connectivity, which contain erroneously positioned results, are removed.
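The DEM extraction step might be sketched as below. The library choices (scikit-learn for DBSCAN, PyKrige for kriging) are assumptions, since the paper does not name implementations.

```python
# DEM extraction from a point cloud: density-based clustering removes
# low-connectivity (erroneously positioned) points, then kriging
# interpolates the elevation onto a regular grid.
import numpy as np
from sklearn.cluster import DBSCAN
from pykrige.ok import OrdinaryKriging

def point_cloud_to_dem(points, grid_x, grid_y, eps=2.0, min_samples=8):
    """points: (N, 3) array of x, y, z; returns a DEM on the given grid."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    kept = points[labels != -1]          # drop noise / isolated points
    krig = OrdinaryKriging(kept[:, 0], kept[:, 1], kept[:, 2],
                           variogram_model="spherical")
    dem, _variance = krig.execute("grid", grid_x, grid_y)
    return dem
```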
The main point cloud of an image pair is selected as the reference point cloud. The affine matrix R between a specific point cloud's DEM and the reference point cloud's DEM is obtained using the LS method. This transformation minimizes the elevation differences between the transformed DEM and the reference DEM over the same area, yielding the transformation coordinates of the point cloud relative to the reference point cloud and completing the fusion of the two point clouds. After all point clouds in the sequence have been fused, the 3D imaging results of the target scene are obtained.
6. Conclusions
This paper proposed a method for multi-aspect SAR 3D imaging using composite registration based on radargrammetry. By employing the composite registration of stereo image groups, it enables the registration of multi-aspect SAR images, facilitating high-precision 3D imaging in complex scenes without requiring prior knowledge of the target area. This method has the capability to acquire 3D imaging data from various perspectives, utilizing multi-aspect SAR images acquired at different times and trajectories, resulting in a comprehensive 3D representation of the unknown scene. It signifies an important advancement in the practical application of radargrammetry for 3D imaging.
The 3D imaging method proposed in this study is based on SAR image data, which are susceptible to issues such as layover. Consequently, the registration and point cloud fusion methods used in 3D imaging exhibit certain limitations, and delicate structures like ropes and support columns cannot be entirely reconstructed. We will further consider additional feature extraction and matching methods for the registration and segmentation processes to achieve the automatic identification, extraction, and high-density imaging of key targets. We will also utilize more multi-sensor SAR images for 3D imaging.
Furthermore, insufficient data in the experiments result in the incomplete reconstruction of specific scenes, such as mountainous regions and buildings. Although this paper only uses multi-aspect SAR images from two circular SAR trajectories, multi-source SAR images acquired through SAR imaging algorithms that comply with the geometric model can be used for 3D imaging, including satellite SAR images with special compensation. While the imaging area in this paper is not extensive, the computational complexity resulting from an expanded coverage area should also be considered. To address these challenges, we will concentrate on the 3D scattering characteristics of the target scene, explore novel and efficient data acquisition and processing methods, and advance the practical application of detailed multi-dimensional information acquisition in complex scenes.