TMP-Net: Terrain Matching and Positioning Network by Highly Reliable Airborne Synthetic Aperture Radar Altimeter

Lu, Yanxi; Song, Anna; Liu, Gaozheng; Tan, Longlong; Xu, Yushi; Li, Fang; Wang, Yao; Jiang, Ge; Yang, Lei

doi:10.3390/rs16162966

Open AccessArticle

TMP-Net: Terrain Matching and Positioning Network by Highly Reliable Airborne Synthetic Aperture Radar Altimeter

by

Yanxi Lu

¹,

Anna Song

²

,

Gaozheng Liu

¹,

Longlong Tan

¹,

Yushi Xu

¹,

Fang Li

^1,*,

Yao Wang

¹,

Ge Jiang

¹ and

Lei Yang

²

¹

The Institute of Electronic Engineering, China Academy of Engineering Physics, Mianyang 621000, China

²

Tianjin Key Laboratory for Advanced Signal Processing, Civil Aviation University of China, Tianjin 300300, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(16), 2966; https://doi.org/10.3390/rs16162966

Submission received: 11 July 2024 / Revised: 8 August 2024 / Accepted: 10 August 2024 / Published: 13 August 2024

Download

Browse Figures

Versions Notes

Abstract

Airborne aircrafts are dependent on the Global Navigation Satellite System (GNSS), which is susceptible to interference due to the satellite base-station and cooperative communication. Synthetic aperture radar altimeter (SARAL) provides the ability to measure the topographic terrain for matching with Digital Elevation Model (DEM) to achieve positioning without relying on GNSS. However, due to the near-vertical coupling in the delay-Doppler map (DDM), the similarity of DDMs of adjacent apertures is high, and the probability of successful matching is low. To this end, a novel neural network of terrain matching and aircraft positioning is proposed based on the airborne SARAL imagery. The model-driven terrain matching and aircraft positioning network (TMP-Net) is capable of realizing aircraft positioning by utilizing the real-time DDMs to match with the DEM-based DDM references, which are generated by a point-by-point coupling mechanism between the airborne routine and ground terrain DEM. Specifically, the training dataset is established by a numerical simulation method based on a semi-analytical model. Therefore, DEM-based DDM references can be generated by forward deduction when only regional DEM can be obtained. In addition to the model-based DDM generation, feature extraction, and similarity measurement, an aircraft positioning module is added. Three different positioning methods are designed to achieve the aircraft positioning, where three-point weighting exhibits the best performance in terms of positioning accuracy. Due to the fact that both the weighted triplet loss and softmax loss are employed in a cooperative manner, the matching accuracy can be improved and the positioning error can be reduced. Finally, both simulated and measured airborne datasets are used to validate the effectiveness of the proposed algorithm. Quantitative and qualitative evaluations show the superiority.

Keywords:

airborne synthetic aperture radar altimeter (SARAL); deep learning; image matching; aircraft positioning

1. Introduction

Airborne positioning is crucial for flight safety. Global Navigation Satellite System (GNSS) is widely used for airborne positioning at present. It is a satellite navigation system that transmits encoded signals in Earth orbit via a group of satellites. The receiver on the aircraft receives the echoes, which are used to decode the precise position of the aircraft [1]. However, due to the satellite base-station and excessive distances, the encoded signals are sensitive to interference. Meanwhile, the open encoding and decoding loop is susceptible to interference by manned interruption. Therefore, GNSS positioning is unreliable in the case of a complicated electromagnetic environment.

Synthetic aperture radar altimeter (SARAL) provides the ability to measure the topographic terrain for matching with the DEM references to achieve aircraft positioning without relying on GNSS. SARAL is a radar system used to measure the height of the radar platform with respect to the ocean and landscape surface [2,3,4,5,6], which is not limited by light and climate conditions [7,8,9] and can produce the two-dimensional delay-Doppler map (DDM). Conventional radar altimeters mainly use limited pulses to measure ground elevation parameters. Compared with conventional radar altimeters, the synthetic aperture principle is incorporated, which can provide high resolution and wide coverage. The airborne SARAL gives a significant opportunity to obtain elaborate terrain surface features, making them suitable for use in positioning, and they can be widely used in airborne applications [10,11]. SARAL works in down-looking mode, while the common synthetic aperture radar (SAR) works in boardside mode. Some studies are published to perform image matching with measured data acquired from airborne SAR. Specifically, Yunhao Chang et al. discuss a method using scale-invariant feature transform (SIFT) for matching SAR images despite rotation differences [12]. Oscar Sommervold et al. conduct a comprehensive survey review of various methods for registering SAR with optical images, and address the challenges and recent advancements [13]. These studies based on SAR to achieve image matching have made great progress. However, SARAL directs the center of the radar beam at the nadir of the flight. It directs the center of the beam-width to the nadir of the flight. Specifically, when the platform flies directly above a ground scatter, the radar transmits wideband signals and receives the echoes after a certain time delay. DDMs are obtained by processing the echoes received by the SARAL. The DDM features reflect the accidented relief of the nadir of the aircraft, and it can be used to detect terrain changes.

Digital Elevation Model (DEM) is a digital representation of the Earth’s surface topography; it records elevation information at different positions in a discrete manner. At present, DEM data on a global scale can be obtained effortlessly, and it is widely used in areas such as terrain analysis and geological research. Due to DEM’s ability to reflect the elevation of the ground, it is possible to use DEM as a reference to match with the DDM for aircraft positioning without relying on GNSS.

However, due to the radar beam center in SARAL being directed vertically at the nadir of the airborne flight, the SARAL echoes exhibit strong coupling of near-vertical terrain. On the one hand, due to the width of the antenna beam of SARAL, the SARAL will receive not only the echoes from the nadir, but also that from the scatters around the nadir. On the other hand, the terrain in the area covered by the radar beam is accidented, and the echo energy of the scatters is different. It may lead to the echoes of the surrounding scatters being received by the radar receiver before the nadir. The echoes from both on and off the nadir will be coupled together; this is known as vertical coupling [14]. Due to vertical coupling, the similarity of DDMs of adjacent apertures is high. Therefore, it is difficult to achieve accurate image matching with SARAL. Meanwhile, if the observation time is short in SARAL, the DDM resolution is low, which easily leads to low matching accuracy and large positioning errors.

To achieve accurate aircraft positioning, the real-time SARAL DDM is compared with the reference images generated by flight hypothesis and DEM datasets. If the robustness and scenario generalization ability of the matching methods are not strong, the probability of successful matching will decrease. In addition, DDMs are the two-dimensional images containing terrain variation obtained by SARAL. However, DEM is the discrete data reflecting the three-dimensional elevation information. Therefore, the DDM and the DEM cannot be matched directly.

Terrain matching can be realized by using image matching methods. Image matching methods can be mainly divided into two categorizes. They are based on image gray scale correlation [15] and image feature correlation [16], respectively. Due to the huge amount of calculation and the related calculation of many unnecessary areas, the matching methods based on gray scale are difficult to apply when there are too many images. The methods based on feature correlation have strong robustness [17,18,19,20,21], and their high-precision characteristics are suitable for matching in complicated terrain. However, conventional methods based on feature correlation have locality; they focus on the local area of the image rather than the whole image. The differences of DDMs in SARAL between adjacent apertures are very small, which makes it difficult to realize matching by using conventional matching methods based on feature correlation.

Deep learning is a possible way to incorporate feature correlation to distinguish DDM differences among adjacent apertures, since it utilizes a large amount of datasets for data-driven learning. Specifically, deep learning has a strong ability to fit nonlinear relationships and effectively process massive amounts of data, which is widely used in various downstream tasks [22,23,24,25,26,27,28,29,30]. It is a typical data-driven method and generally employs a neural network architecture to fit the training data for an accurate formulation. Due to the ability of convolutional neural network (CNN) to extract high-dimensional features, CNN has made progress in image matching methods. To accommodate practical applications, many CNN variants [23,31] have been invented. They attract wide interest in SAR imaging fields, and researchers are motivated to adopt deep learning to solve the image matching task in different scenarios [12,13,18,32,33,34]. However, the existing tomographic image matching methods rely on visual images that the human eye can recognize and the human brain can understand. DDMs are obtained by projecting the three-dimensional terrain onto the two-dimensional plane using SARAL. Due to the fact that the echoes from both sides of SARAL are coupled to each other, the resultant DDM only reflects the variation in terrain from the top view. Therefore, it is difficult for human eyes to directly interpret. It is difficult to match the DDM and references generated from the DEM. Firstly, the DDMs and DEM references belong to different modalities, so their similarity cannot be measured and they are difficult to match directly. Secondly, due to the lack of formulation guidelines and limited interpretation of CNNs, it is difficult to achieve the mapping between DEM and DDM with the help of CNNs. Thirdly, the raw data for practical DDMs are insufficient, and there are no public datasets. It is difficult to obtain sufficient training data without public datasets and data generation methods. Finally, due to SARAL’s near-vertical observation, the similarity of DDMs of adjacent apertures is high. Therefore, it is difficult for existing deep learning methods to match the DDM with the DEM references, and the accuracy for possible matching is poor.

In this paper, a novel and model-driven deep learning algorithm by airborne SARAL is proposed for aircraft positioning. Specifically, to solve the problem of high similarity between two adjacent aperture DDMs, a terrain matching and aircraft positioning network (TMP-Net) is designed. It is capable of accommodating low image quality, and the probability of successful matching is improved to a great extent. Firstly, the model-driven method is used to realize the mapping from DEM references to the DDM dataset. Therefore, it enhances the interpretability the proposed network, and generates the DEM-based DDM references for image matching. Secondly, the CNN is used to extract the fine features of DDMs, which realizes the effective differences of DDMs of adjacent apertures. Meanwhile, the triplet loss [35] and the softmax loss [36,37] are used to optimize the network parameters in a weighted way to improve the probability of successful matching. In addition, the cosine similarity is used to measure the similarity between the deep feature embedding vectors after feature extraction for real-time DDMs and DEM-based DDM references. Finally, due to the output of the conventional image matching network, there is similarity between the two images, which does not achieve positioning. The aircraft positioning module is added based on the image matching neural network. Three different positioning methods are selected to output positioning coordinates, respectively. Unlike conventional image matching, which uses visual images, DEM and SARAL DDM are used, and model-driven CNN is utilized to achieve inter-modal data mapping, accomplishing the terrain matching and aircraft positioning. A series of qualitative and quantitative comparison experiments are set up using simulated and measured data to demonstrate the effectiveness and adaptability of our network.

2. Airborne SARAL Geometry and Signal Model

Airborne SARAL is different from airborne SAR. SAR works in boardside mode. However, SARAL works in down-looking mode. The airborne SARAL geometry is provided in Figure 1a. The Doppler beam is equivalent according to the synthetic aperture principle, and the azimuthal resolution in the along-track direction is improved. In Figure 1a, the airborne SARAL flies in a straight line, transmits wideband pulses, and receives echoes from the scatter over a synthetic aperture. Within the beam-illuminated area, SARAL will observe the ground scatter not only from the near-vertical position, i.e., at position B in Figure 1a, but also some other scatters along the flight, such as positions A and C in Figure 1a. According to the basic principle that features at different angles in the beam area generate different Doppler frequencies, Doppler beam sharpening (DBS) can be employed. SARAL can make full use of the Doppler frequencies to further divide an antenna beam into several sub-beams [38].

Figure 1b represents the concentric circle footprint from the moment of position B. The pulse footprint in the form of concentric circles is subdivided by adding isometric and Doppler lines according to the Doppler frequency. In Figure 1b, the azimuth is the flight direction of the aircraft. However, only the nadir echoes are useful for terrain matching and aircraft positioning.

In Figure 1a, the airborne SARAL flies along a flight path in the X-axis direction at an height of

h_{0}

with a constant speed v. The coordinate of the ground scatter L is

(x_{L}, y_{L}, z_{L})

, where

x_{L}

=

v t_{s}

. The coordinates of position B for the airborne radar are

(x_{B}, y_{B}, z_{B})

, located directly above position L. The goal of the airborne SARAL is to obtain high-resolution DDM at position L. The airborne SARAL transmits wideband pulses with linear frequency modulation when it passes directly above position L. Then, the SARAL receives echoes after a time delay. Therefore, the received echoes can be expressed as

\begin{matrix} s_{p u l} (τ) = A_{0} w_{r} \{τ - \frac{2 R (η)}{c}\} w_{a} (η - η_{c}) \\ \times cos \{2 π f_{0} (τ - \frac{2 R (η)}{c}) + π K_{r} {(τ - \frac{2 R (η)}{c})}^{2} + ψ\} \end{matrix}

(1)

where

τ

is the fast-time regarding to the time delay of direct range,

η

is the slow-time of the pulse repetition,

K_{r}

is the chirp rate, and c is the speed of light. In (1),

A_{0}

is a constant representing the back-scattering coefficient,

ψ

represents the phase change of radar signals caused by the surface scattering process, and

R (η)

is the instantaneous range between the airborne SARAL and the ground scatter. After range de-ramping and range compression [8], the echoes in the baseband can be written as

s_{rc} (τ, η) = A_{1} w_{r} \{τ - \frac{2 R (η)}{c}\} w_{a} (η - η_{c}) exp \{- j \frac{4 π f_{0} R (η)}{c}\}

(2)

where

A_{1} = A_{0} exp {j ψ}

is the complex-valued constant, which is ignored in subsequent derivations. In (2),

w_{r} {\cdot}

is the range envelope that denotes the response after the range compression. Usually,

w_{r} {\cdot}

is a Sinc function. The instantaneous range

R (η)

of the scatter can be given as

R (η) = \sqrt{h_{0}^{2} + v^{2} η^{2}} \approx h_{0} + \frac{v^{2} η^{2}}{2 h_{0}}

(3)

where

h_{0}

is the reference height. Substituting (3) into (2), the echo can be given as

s_{r} (τ, η) = w_{r} \{τ - \frac{2 R (η)}{c}\} w_{a} (η - η_{c}) exp \{- j \frac{4 π f_{0} h_{0}}{c}\} exp \{- j \frac{2 π f_{0} v^{2}}{c h_{0}} η^{2}\}

(4)

Azimuth phase modulation can be clearly seen from the second exponential term in (4). Since the phase is a function of

η

, the azimuthal chirp rate can be calculated as

K_{a} \approx \frac{2 v^{2}}{λ h_{0}}

(5)

where the azimuthal chirp rate gives a linear relationship between the Doppler frequency

f_{η}

and the slow-time

η

as

f_{η} = - K_{a} η

(6)

Next, a Fourier transformation in the azimuthal direction is employed, and (4) is converted into the delay-Doppler domain as

s_{r} (τ, f_{η}) = w_{r} \{τ - \frac{2 R (f_{η})}{c}\} W_{a} (f_{η} - f_{η_{c}}) exp \{- j \frac{4 π f_{0} h_{0}}{c}\} exp \{- j π \frac{{f_{η}}^{2}}{K_{a}}\}

(7)

where

R (f_{η}) \approx h_{0} + \frac{λ h_{0} {f_{η}}^{2}}{8 v^{2}}

is the range migration. In (7),

W_{a} (f_{η} - f_{η_{c}})

is the frequency domain form of azimuth antenna pattern

w_{a} (η - η_{c})

.

3. Preliminaries

3.1. DDM Inversion

After the echoes are pre-processed in (1)−(7), the echoes are converted into the delay-Doppler domain. The two-dimensional DDMs of the ground scatters can be obtained by applying the range-oriented Fourier transform to (4) and transforming it into the range-compressed domain. Therefore, to ensure the stability of the input data of the network and guarantee a suitable range of values, the image data matrix

s_{r} (τ, f_{η})

obtained by (7) is normalized pre-processed. The maximum and minimum values of the matrix are extracted first, and then all the elements in the matrix are mapped to the range of 0–255 using normalization to obtain

X

. The matrix

X

represents the two-dimensional DDM. The image is shown in Figure 2a.

According to the definition of range migration, the last exponential phase in (7) represents the quadratic curve of range migration of the echoes. To achieve the focused DDM images, the range migration correction is required. Applying the range Fourier transformation to (7), the echoes in the two-dimensional spectrum domain can be given as

\begin{matrix} S_{r} (f_{τ}, f_{η}) = W_{r} \{\frac{f_{τ}}{K_{r}}\} W_{a} (f_{η} - f_{η_{c}}) exp \{j π \frac{f_{η}^{2}}{K_{a}}\} \\ \times exp \{- j \frac{4 π (f_{0} + f_{τ}) h_{0}}{c}\} exp \{- j \frac{4 π f_{τ} Δ R (f_{η})}{c}\} \end{matrix}

(8)

where

Δ R (f_{η}) = R (η) - h_{0}

and

f_{τ}

is the range frequency. In (8),

W_{r} \{\frac{f_{τ}}{K_{r}}\}

is the frequency domain form of

w_{r} \{τ - \frac{2 R (f_{η})}{c}\}

. To fulfill the range migration correction, a phase function is generated as

G (f_{τ}) = exp (j \frac{4 π f_{τ} Δ R (f_{η})}{c})

(9)

After multiplying (9) to (7), the range-compressed signal can be written as

{\tilde{s}}_{r} (τ, f_{η}) = w_{r} \{τ - \frac{2 h_{0}}{c}\} W_{a} (f_{η} - f_{η_{c}}) exp \{- j \frac{4 π f_{0} h_{0}}{c}\} exp \{j π \frac{f_{η}^{2}}{K_{a}}\}

(10)

For the baseband signal, the range migration correction is carried out according to (8)–(10). Then, the two-dimensional DDM after the range migration correction is obtained. Similarly, the image obtained by (10) is pre-processed in the same way as

X

, and all elements in the matrix are mapped between 0 and 255 to obtain

\tilde{X}

. The matrix

\tilde{X}

represents the two-dimensional DDM after range migration correction, and the image is shown in Figure 2b. The resolution of DDMs contains range resolution and azimuth resolution. The range resolution depends on the bandwidth of the radar transmitting signal. The range resolution of DDMs in the revised manuscript is 7.5 m. The azimuth resolution is determined by the synthetic aperture length of the radar antenna and the observation time. The azimuth resolution of DDMs provided in the revised manuscript is 14.94 m. In this paper, to realize aircraft positioning, the data-driven deep learning algorithm is employed. The required training dataset is established. The numerical stability rate of the deep learning is improved and the generalization ability of the model on unseen data is enhanced.

3.2. Data-Driven CNN

DDM contains rich ground features and flight status. However, it is difficult to extract the key information using the conventional model-driven methods. Model-driven methods possess a strong ability for mathematical generalization. However, as the complexity of problems increases, it becomes challenging to solve practical non-linear problems. Accordingly, data-driven methods are superior for solving non-linear problems. To achieve the fine elevation features of the DDMs, data-driven methods can be employed. The most classical data-driven method is CNN [39]. With the improvement of computing resources, significant success has been realized by CNN. Therefore, it is a reasonable choice to use CNN for the feature extraction of DDMs.

However, there are some problems in using data-driven CNNs alone to achieve terrain matching and aircraft positioning. Due to the fact that data-driven CNNs lack mathematical interpretability, it is difficult to realize mapping from DEM references to a DDM dataset. Meanwhile, the performance of the network depends on a large number of high-quality training data. When the amount of training data is insufficient or the DEM cannot be effectively mapped to the DDM, it is difficult to achieve terrain matching and aircraft positioning. Therefore, it is necessary to combine model-driven methods with data-driven CNNs to achieve terrain matching and aircraft positioning.

3.3. Digital Elevation Model (DEM)

DEM is a kind of discrete expression of terrain surface morphology; it realizes the digital simulation of terrain through finite terrain elevation data. DEM is a dataset of plane coordinates

(X, Y)

and elevation Z of regular grid points within a certain area. It mainly describes the spatial distribution of regional geomorphology. DEM collects data (including sampling and measurement) through contour lines or similar spatial models, followed by data interpolation.

The elevation of each sampling point at a certain sampling interval is recorded via DEM data, including both plane and elevation information. DEM represents a finite sequence of three-dimensional vectors on area

S

, described in a functional form as

{p_{S}}^{(i)} = (X_{i}, Y_{i}, Z_{i}); i = 1, 2, \dots, n \in S

(11)

where

(X_{i}, Y_{i})

is the plane coordinates and

Z_{i}

is the elevation corresponding to

(X_{i}, Y_{i})

. With the development of geological survey and computer technology, the DEM as a digital representation of terrain has great advantages in the storage and display of terrain. Therefore, DEM is used as an image dataset to realize terrain matching and aircraft positioning independent of external environments. Meanwhile, compared with other positioning methods, such as inertial navigation, the DEM-based terrain matching and aircraft positioning method has unique advantages. Its positioning errors will not accumulate over time, and it can be positioned for a long time without manual error correction.

However, DEM resolution is an important indicator of the accuracy of terrain characterization by DEM. The higher the resolution, the more accurate the terrain characterization. Meanwhile, the amount of data also increases geometrically. In addition, the elevation information is contained by DEM in discrete form, so DEM cannot be directly used for matching with DDM. Therefore, it is necessary to develop a model-driven neural network that can realize the DEM mapping to DDM and the DDM matching, simultaneously.

4. Terrain Matching and Positioning Network (TMP-Net)

In this paper, the near-vertical coupling between adjacent apertures, leading to a high DDM similarity, low image quality, and low matching accuracy are considered. A model-driven terrain matching and aircraft positioning network (TMP-Net) is proposed, which is capable of realizing terrain matching and aircraft positioning without relying on the GNSS. The framework of the TMP-Net is an end-to-end network, including a model-based DDM generation module, a feature extraction model, a similarity measurement module, and an aircraft positioning module. The four modules are embedded into a single architecture. The model-based DDM generation module, feature extraction model, similarity measurement module, and aircraft positioning module are designed for DEM-based DDM reference generation, DDMs feature extraction, feature vector similarity measurement, and output positioning coordinates, respectively. In addition to TMP-Net, a specific loss function is designed to effectively distinguish DDM at different apertures.

4.1. Model-Based DDM Generation Module

Research on terrain matching and aircraft positioning based on DEM is insufficient at present. Due to the time complexity of the methods and the large amount of data, the research can only be carried out in a small area. Meanwhile, real-time aircraft positioning cannot be guaranteed. In this paper, the model-driven CNN is used to expand the terrain matching and aircraft positioning based on DEM and DDM to a large area. It focuses on the breakthrough in real-time, accurate, and large-area terrain matching and aircraft positioning. Meanwhile, DDM inversion, DEM mapping to DDM, and terrain matching positioning are integrated to form a complete terrain matching and aircraft positioning system.

Due to the lack of mathematical interpretability of the data-driven CNN, it is difficult to implement DEM mapping to DDM effectively. Therefore, a novel DDM generation algorithm is proposed in this paper, which plays a semi-model-driven role in the intended network, so that the real-time DDMs and DEM references can be matched in the same dimension [40,41].

High-precision DDM training data are the basis of terrain matching and aircraft positioning, and the coordinates of each scatter can be obtained as

{p_{S}}^{(i)}

from (11) according to the DEM. The current position of the platform is

p_{L}

and the speed is v. Although SARAL observes scatters at the nadir, its power is mainly based on back-scattering [42]. The position vector can be given as

r^{(i)} = ({p_{S}}^{(i)} - p_{L}) / {∥{p_{S}}^{(i)} - p_{L}∥}_{2}

(12)

where

p_{L} = {(x_{L}, y_{L}, z_{L})}^{T}

is the current position of the platform and

{p_{S}}^{(i)} = {(x^{(i)}, y^{(i)}, z^{(i)})}^{T}

is the coordinates of each scatter,

i \in S

, and

S

represents the total set of scatters. The Doppler frequency of each scatter can be obtained from (12) as

D^{(i)} = 2 v^{T} r^{(i)} / λ

(13)

where

v = {({\dot{x}}_{L}, {\dot{y}}_{L}, {\dot{z}}_{L})}^{T}

is the real-time speed of the platform and

λ

is the radar wavelength. The relative range of each scatter is (14) and the back-scattering coefficient is (15).

R^{(i)} = {∥{p_{S}}^{(i)} - p_{L}∥}_{2}

(14)

σ^{(i)} = b_{1} + b_{2} exp (- b_{3} θ^{(i)}) + b_{4} cos (b_{5} θ^{(i)} + b_{6})

(15)

where

θ = arccos (|z^{(i)} - z_{L}|)

and

b_{1} \sim b_{6}

depend on the type of land cover medium [40,41,42]. From (14) and (15), the reflection of a single scatter is calculated as

s^{(i)} = \sqrt{\frac{λ^{2}}{{(4 π)}^{3}} σ^{(i)} \frac{1}{{(R^{(i)})}^{4}}}

(16)

Accumulate each element of the DDM matrix based on the sets of range and Doppler indexes as

D_{m n} = |\sum_{i \in R_{m} \cup D_{n}} s^{(i)}|

(17)

where

R_{m}

is the index set of the m-th range gate and

D_{n}

is the index set of the n-th Doppler channel. After the above-mentioned processing, the DDM power matrix can be obtained as

D \in R^{M \times N}

, where M is the number of pulses and N is the number of range gates. Therefore, sufficient DDMs can be obtained. Then, the amplitude value of the DDM can be mapped to the range of 0–255 to obtain

Y

. The matrix

Y

is the normalized DEM-based DDM reference.

4.2. Feature Extraction Model

Before feature extraction, grid point selection will be used to improve the matching efficiency and positioning accuracy of the proposed network. Due to the fact that the aircraft follows a preset flight in practice, a series of flight points in the preset flight are selected. A series of grid points around the flight points can be divided into the X direction with 92 m as the interval, Y direction with 90 m as the interval, and Z direction with 205 m as the interval. In such a case, the number of grid points of the simulated flight A1–A2 is 3146 and the number of grid points of the measured flight B1–B2 is 885.

It can limit matching area while traversing multiple points, taking into account the matching efficiency and positioning accuracy. The grid points are divided off-line. We assume that the start point, end point, and time interval of the aircraft flight are determined. In such a case, the approximate flight direction and speed can be assumed to be in a small and negligible confidence interval. In different flights, a series of grid points are established off-line along the flight. Then, a series of DEM-based DDM references are generated according to the predetermined flight and the model-based DDM generation.

The real-time DDMs are obtained by processing the SARAL echoes. Each real-time DDM need to be matched with each DEM-based DDM reference to measure similarity. It will definitely result in huge calculation and low efficiency. To some extent, the DEM-based DDM references of the grid points far away from the flight points are also used to match the real-time DDM. It is time-consuming and will reduce the matching efficiency. Meanwhile, the matching error is introduced, which increases the positioning error. To further improve the matching efficiency and reduce the positioning error, grid points need to be selected before matching. Therefore, spheres with the flight points as the center and a radius of 200 m are designed. Only the DEM-based DDM references of the grid points inside the spheres are kept for matching with real-time DDM. However, the grid points outside the spheres are selected. Subsequent experiments show that the positioning error can be significantly reduced by grid point selection.

On the basis of improving matching efficiency and positioning accuracy through grid point selection, a feature extraction module is used to realize the effective feature extraction of DDMs. Specifically, the feature extraction module mainly contains a backbone network and middle-level feature fusion.

4.2.1. Backbone Network

The data-driven CNN is often used to extract fine features of images of interests. The common architectures of it include ResNet [23] and VGG [31]. ResNet18 [23] in ResNet is widely used for downstream tasks. Therefore, ResNet18 is considered as the backbone network in this paper.

Specifically, ResNet18 contains five convolutional groups. Each convolutional group contains one or more convolution operations. Multiple similar residual blocks are contained in the second to fifth convolution groups, which can also be called stage1, stage2, stage3, and stage4. ResNet18 is widely used in image feature extraction because of its superior performance. However, ResNet18 is not sufficient to meet the requirements of the paper. Due to average pooling being used by ResNet18 to extract the regional mean, it is not beneficial for preserving the local and fine features of DDMs of interest. Therefore, an improved ResNet18 is designed in this paper. First, the convolution group is discarded. Second, in the first residual convolution group, the maximum pooling operation is discarded. Meanwhile, the step size in the first convolution operation is adjusted to 2. Next, the average pooling after the fourth residual convolution group is adjusted to the maximum pooling. This is because the regional mean value is extracted by the average pooling, and the regional maximum value is extracted by the maximum pooling. In contrast, the local features of the DDMs are more likely to be preserved by the maximum pooling. Meanwhile, the maximum pooling can reduce smoothness and focus on fine features of the DDMs. Finally, after the maximum pooling, the fully connected layer is added as a bottleneck structure to reduce the dimension. The improved ResNet18 network architecture is shown in Figure 3, where ‘conv’ indicates the convolution layer, and ‘/2’ indicates that the step size is 2. In Figure 3, ‘maxpool’ indicates the maximum pooling layer and ‘fc’ indicates the fully connected layer. The local and fine features of DDMs of interest can be extracted effectively by utilizing the improved ResNet18.

4.2.2. Middle-Level Feature Fusion

To efficiently extract the global and local features of DDMs, the improved ResNet18 is used as the backbone network. Then, the middle-level features are extracted and fused to obtain the intended DDM features. Specifically, the output of any layer in deep learning can be considered as the middle-level feature. In this paper, the outputs of stage2 and stage3 are adopted as the middle-level feature.

A convolution block is required to change the size of the middle-layer features, according to the size of the feature map in the backbone network. The convolution block is a bottleneck structure, and it contains two convolution layers. The complete convolution block is shown in the Figure 4. Moreover, after the convolution block, maximum pooling is adopted in the middle-layer feature fusion. Then, to obtain the intended DDM features, the middle-layer features of stage2 and stage3 are fused with the top-layer features of the backbone network.

To sum up, to effectively extract the global and local features of DDMs, the feature extraction module

H

is designed in this paper. The improved ResNet18 is used as the backbone network and middle-layer feature fusion is added. The complete feature extraction network is shown in Figure 5. Specifically, the black line represents the backbone network. The red line indicates the middle-level feature fusion with cross-layer connectivity. The feature extraction module

H

is used to extract depth feature embedding vectors of real-time DDMs and DEM-based DDM references.

Feature extraction module

H

is used to extract features from the real-time DDMs. The real-time DDM after range migration correction and normalization is obtained as

\tilde{X}

. The depth feature embedding vector of the

\tilde{X}

can be obtained as

F = H (\tilde{X})

(18)

Similarly, the feature extraction module

H

is used to extract features from the DEM-based DDM references. The DEM-based DDM references after normalization are obtained as

Y

. The depth feature embedding vector of

Y

can be obtained as

T = H (Y)

(19)

The network architecture shown in Figure 5 is used, and intended DDM features are extracted effectively. Meanwhile, the explicit mapping from input DDM to output depth feature embedding vectors is realized.

4.3. Loss Function

The loss function plays a crucial role in deep learning [43]. To improve the positioning accuracy and generalization ability of the model, model parameters are adjusted by minimizing the loss function. The selection of appropriate loss function can reflect the characteristics of the task and deal with the data imbalance and overfitting. It plays an important role in model training.

To solve the problem of high similarity between DDMs in adjacent apertures, a triplet loss function is used in this paper. The triplet loss function is widely used in deep learning. The core idea is to make the features of the same label as close as possible in spatial position, while making the features of different labels as far away as possible [35]. The triplet anchor, positive, and negative need to be set in the triplet loss. The anchor and positive are different samples of the same category, and the anchor and negative are different categories.

x_{i}^{A}

,

x_{i}^{P}

, and

x_{i}^{N}

are a set of data input into the CNN; the idea of a triplet loss function is expressed in the Euclidean norm as

\begin{matrix} {∥f (x_{i}^{A}) - f (x_{i}^{P})∥}_{2}^{2} + α < {∥f (x_{i}^{A}) - f (x_{i}^{N})∥}_{2}^{2} \\ \forall (f (x_{i}^{A}), f (x_{i}^{P}), f (x_{i}^{N})) \in V \end{matrix}

(20)

where i indicates batch number, and

α

is the threshold. In (20), A, P, and N represent the anchor, positive, and negative, respectively. In such cases,

f (x_{i}^{A})

,

f (x_{i}^{P})

, and

f (x_{i}^{N})

are the feature mappings to

x_{i}^{A}

,

x_{i}^{P}

, and

x_{i}^{N}

. According to (20), the triplet loss function can be written as

L = \sum_{i}^{M} {[{∥f (x_{i}^{A}) - f (x_{i}^{P})∥}_{2}^{2} - {∥f (x_{i}^{A}) - f (x_{i}^{N})∥}_{2}^{2} + α]}_{+}

(21)

where

{[\cdot]}_{+}

is the operator that will return the specific value when the value in

[\cdot]

is greater than zero, and otherwise return zero. In (21), d is used to represent the distance function between two arbitrary samples.

d (A, P) + α \leq d (A, N)

(22)

Meanwhile, the loss function in (21) can be further simplified as

L = \frac{1}{M} \sum_{i = 1}^{M} max \{d (A_{i}, P_{i}) - d (A_{i}, N_{i}) + α, 0\}

(23)

In (21) and (23), the distance between two arbitrary heterogeneous samples is larger than the distance between two arbitrary homogeneous samples plus

α

. When (20) is not satisfied, the loss is larger than zero, and the network updates the parameters through back-propagation. Otherwise, the loss is zero. The samples meet the training requirements, no gradient is generated, and the parameters of the network do not need to be updated.

4.4. Similarity Measurement Module

To determine if the terrain matching is successful, the similarity of the depth feature vectors between real-time DDMs and DEM-based DDM references is measured. Because the cosine similarity is efficient and suitable for high-dimensional data, it is employed for measuring the similarity of the DDMs in this paper. It is different from the similarity measurement methods that solve the linear distance between two vectors in multi-dimensional space, such as Euclidean norm [44]. The cosine similarity measures the difference between two vectors by solving the cosine value of the angle of two vectors in vector space [45,46]. It reflects the difference of direction in the vector space, regardless of the size of the vector. From (18) and (19), the depth feature embedding vectors F of real-time DDMs and the depth feature embedding vectors T of DEM-based DDM references can be obtained. Therefore, the cosine similarity can be defined as

sim (F, T) = cos (θ) = \frac{F \cdot T}{{∥F∥}_{2} {∥T∥}_{2}} = \frac{\sum_{i = 1}^{n} F_{i} T_{i}}{\sqrt{\sum_{i = 1}^{n} F_{i}^{2}} \cdot \sqrt{\sum_{i = 1}^{n} T_{i}^{2}}}

(24)

where the operator · is the inner product operator. In (24), the smaller the angle between two vectors, the higher the vector similarity.

To sum up, the cosine similarity between the depth feature embedding vector of real-time DDMs and the DEM-based DDM references can be calculated. Then, the similarity values are obtained and the descending indexes are sorted out, where

{sim}_{1} (F, T)

indicates the highest similarity,

{sim}_{2} (F, T)

indicates the second similarity, and

{sim}_{3} (F, T)

indicates the third similarity. They are used in the aircraft positioning module to realize the positioning of aircraft.

4.5. Aircraft Positioning Module

The real-time DDMs are matched with each of the DEM-based DDM references in turn, and the matching result is returned by the similarity value. Therefore, following the feature extraction module and similarity measurement module, an aircraft positioning module is designed, so that the positioning coordinates can be obtained according to the matched coordinates of the DEM-based DDM references. In this paper, three different positioning methods are considered, including single-point matching, three-point weighting, and three-point centroid. After theoretical analysis and experimental validation, three-point weighting is preferred in the scenario of SARAL DDM positioning.

Single-point matching is a method to obtain the positioning coordinates based on the matched DEM-based DDM references with the highest similarity. The calculation of single-point matching is relatively simple. The coordinates of the matched DDM reference are considered as the positioning coordinates, and the calculation process can be given as

x_{f 1} = x_{1}

(25)

y_{f 1} = y_{1}

(26)

z_{f 1} = z_{1}

(27)

where

(x_{f 1}, y_{f 1}, z_{f 1})

are the positioning coordinates of the single-point matching output and

(x_{1}, y_{1}, z_{1})

are the coordinates of the DEM-based DDM references with the highest similarity

{sim}_{1} (F, T)

.

The three-point weighting and three-point centroid are the methods to obtain the positioning coordinates based on the matched DEM-based DDM references with the top three similarities. Three-point weighting refers to the result obtained by weighting the coordinates of the top three matched DEM-based DDM references with their similarity for the positioning coordinates as

x_{f 2} = \frac{{sim}_{1} (F, T) x_{1} + {sim}_{2} (F, T) x_{2} + {sim}_{3} (F, T) x_{3}}{{sim}_{1} (F, T) + {sim}_{2} (F, T) + {sim}_{3} (F, T)}

(28)

y_{f 2} = \frac{{sim}_{1} (F, T) y_{1} + {sim}_{2} (F, T) y_{2} + {sim}_{3} (F, T) y_{3}}{{sim}_{1} (F, T) + {sim}_{2} (F, T) + {sim}_{3} (F, T)}

(29)

z_{f 2} = \frac{{sim}_{1} (F, T) z_{1} + {sim}_{2} (F, T) z_{2} + {sim}_{3} (F, T) z_{3}}{{sim}_{1} (F, T) + {sim}_{2} (F, T) + {sim}_{3} (F, T)}

(30)

where

(x_{f 2}, y_{f 2}, z_{f 2})

are the positioning coordinates of the three-point weighting output,

(x_{2}, y_{2}, z_{2})

are the coordinates of the DEM-based DDM reference with the second similarity

{sim}_{2} (F, T)

, and

(x_{3}, y_{3}, z_{3})

are the coordinates of the DEM-based DDM reference with the third similarity

{sim}_{3} (F, T)

.

The three-point centroid is the centroid of the coordinates of the matched DEM-based DDM references with the top three similarities for the positioning coordinates as

x_{f 3} = \frac{x_{1} + x_{2} + x_{3}}{3}

(31)

y_{f 3} = \frac{y_{1} + y_{2} + y_{3}}{3}

(32)

z_{f 3} = \frac{z_{1} + z_{2} + z_{3}}{3}

(33)

where

(x_{f 3}, y_{f 3}, z_{f 3})

are the positioning coordinates of the three-point centroid output. To sum up, three different positioning methods are used to match real-time DDMs and DEM-based DDM references, and the real-time position of the aircraft can be obtained.

5. Processing Flowchart

The processing flowchart of the proposed TMP-Net is demonstrated in Figure 6. In general, it is divided into four modules, namely a model-based DDM generation module, a feature extraction model, a similarity measurement module, and an aircraft positioning module. In the model-based DDM generation module, the real-time DDMs are obtained by processing the raw SARAL echoes. Meanwhile, according to the airborne flights, the flight points are selected. The grid points are divided around the flight points and the grid points are selected. Then, the DDM references are generated by numerical simulation method according to the DEM. In the similarity measurement module, the feature vectors extracted from the DEM-based DDM references and the real-time DDMs are used for measuring similarity. In the aircraft positioning module, the similarities obtained according to the previous modules are used. Then, the coordinates of the DEM-based DDM references are obtained, and the final positioning coordinates are output. To sum up, the proposed network architecture of TMP-Net is shown in Figure 7.

6. Experiments

6.1. Dataset Preparation

First of all, the training, validation, and test datasets are established. In this paper, a numerical simulation method is employed for the DEM-based DDM reference generation, as discussed in Section 4.1. To ensure a small generalization error, a rich training dataset in the region of interest is constructed. The parameters of the area covered by the training dataset are provided in Table 1. First, a large area 60 km long and 40 km wide is selected. The training dataset is constructed in this large area. Second, to take into account as many complicated terrains as possible, 5000 coordinates are randomly selected in the area. Third, the numerical simulation method proposed is used to generate DEM-based DDM references of 5000 coordinates. The complete process of the numerical simulation method is provided in Section 4.1. In such a case, 5000 DEM-based DDM references can be obtained. Specifically, data augmentation is a typical technology in deep learning. The primary aim of data augmentation is to expand the training dataset, thereby enhancing the interference and generalization capacity of the model. To increase the richness and diversity of the training dataset, different data augmentation methods are employed in this paper. For each training sample, brightness, contrast, sharpness, shear, solarize, and translate are performed, respectively [47]. Therefore, a training sample is selected, and its data augmentation diagram is given in Figure 8. Homogeneous samples are the DDM groups formed under the same index i, and heterogeneous samples are the DDM groups corresponding to different indexes i and j. Then, a series of data augmentation methods are used to expand DEM-based DDM references. A total of 5000 DEM-based DDM references are augmented, totaling 5000 × 12 DEM-based DDM references.

6.2. Model Training

To effectively extract global and local features of DDMs, the triplet loss and softmax loss are adopted for the loss function in (34). For each training data batch, Q coordinates are randomly selected, and K different DDMs after data augmentation are randomly selected under each of the coordinates. Then, a DDM group containing

Q \times K

images is constructed. To capture small differences between the adjacent aperture DDMs as much as possible, triples are used. The triple containing the hardest positive sample and hardest positive sample of each DDM in the batch are used.

L_{t h} = \frac{1}{Q \times K} \sum_{A \in b a t c h} {(max_{P \in A} d_{A, P} - \underset{N \in A}{min d_{A, N} + α})}_{+}

(34)

The intended network is co-constrained by the triplet loss and the softmax loss function, where each category corresponds to a training coordinate. Therefore, the total loss function can be written as

L = L_{triplet, 1} + L_{triplet, 2} + L_{softmax, 1} + L_{softmax, 2}

(35)

where

L_{t r i p l e t, 1}

and

L_{s o f t max, 1}

are, respectively, the triplet loss and softmax loss obtained from the features extracted by the ResNet18 backbone. In (35),

L_{t r i p l e t, 2}

and

L_{s o f t max, 2}

are, respectively, the triplet loss and softmax loss obtained from the features extracted by the middle feature-level fusion.

To give the training and validation results of the network, the training loss and validation loss images are drawn and provided in Figure 9. First, in Figure 9, with the increase in the training times, the training loss and validation loss decrease. Second, the gap between the training loss and validation loss gradually reduces after convergence. In Figure 9, the features of the DDMs are learned gradually by the model during the training process. The model also performs well on the validation dataset, which is not used in the training process. Therefore, the great training effect of the model is proved in Figure 9.

Regarding high-dimensional and large datasets, Adam [48] is used to adaptively adjust the learning rate of each parameter during training. The Adam method with step decay and a learning rate of 0.001 is applied in weight optimization [49].

6.3. Experimental Setup

To quantitatively analyze the positioning effect, some indexes are introduced. They are matching accuracy and positioning error. The matching accuracy is expressed by the average matching error in the X direction, the average matching error in the Y direction, the standard deviation of the matching error in the X direction, and the standard deviation of the matching error in the Y direction, respectively. The average matching error in the X direction indicates the deviation between the positioning coordinate and the real position in the X direction as

Δ \bar{x} = \frac{1}{N} \sum_{i = 1}^{N} Δ x_{i}

(36)

where

Δ x_{i}

represents the difference between the positioning coordinates and real positions in the X direction for each set of positioning coordinates. In (36), N represents the total number of matches.

The average matching error in the Y direction represents the deviation between the positioning coordinate and the real position in the Y direction as

Δ \bar{y} = \frac{1}{N} \sum_{i = 1}^{N} Δ y_{i}

(37)

where

Δ y_{i}

represents the difference between the positioning coordinates and real positions in the Y direction.

The standard deviation of the X direction matching error can be given as

σ_{x} = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} {(x_{i} - Δ \bar{x})}^{2}}

(38)

The standard deviation of the Y direction matching error can be given as

σ_{y} = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} {(y_{i} - Δ \bar{y})}^{2}}

(39)

In addition to the matching accuracy, positioning error is also introduced to analyze the positioning performance. The positioning error represents the Euclidean distance between the positioning coordinates and the real position. In this paper, the positioning error is represented by three-dimensional error, horizontal error, and vertical error. The three-dimensional error represents the error between the positioning coordinate and the real position. The horizontal error represents the error between the positioning coordinate and the real position in the horizontal plane. The vertical error represents the error between the positioning coordinate and the real position in the vertical direction.

6.4. Experimental Results

To verify the effectiveness of the proposed algorithm in practical applications, both simulated data and measured data of airborne SARAL in complicated terrain are employed. To evaluate the superiority of the TMP-Net, several qualitative and quantitative experiments are conducted. Then, the test sets consist of simulated DDM and measured DDM isolated from the training sets. A series of measured experiments are conducted. The testing dataset consists of simulated flight A1–A2 and measured flight B1–B2.The simulated and measured experiments of TMP-Net in near-vertical coupling are provided in Section 6.4.1 and Section 6.4.2, respectively.

6.4.1. Simulated Data Experiment

A flight is preset under a certain terrain, and its corresponding simulated DDMs are generated by the numerical simulation method. The system parameters of the airborne SARAL are shown in Table 2. One flight A1–A2 is selected, and the flight is shown in the DEM in Figure 10. The flight parameters of simulated flight A1–A2 are provided in Table 3.

In flight A1–A2, experimental results without grid point selection and with grid point selection are listed, respectively. The matching results and positioning errors qualitatively without grid point selection are shown in Figure 11 and Figure 12, respectively. The matching accuracy and average positioning errors are quantitatively calculated as shown in Table 4.

In addition, the matching results and positioning errors qualitatively with grid point selection are shown in Figure 13 and Figure 14, respectively. The matching accuracy and average positioning errors are quantitatively calculated, as shown in Table 5.

In Figure 11 and Figure 12 and Table 4, for the simulated flight A1–A2 without grid point selection, single-point matching has the best positioning error. The matching accuracy of single-point matching is the best in the X direction, while three-point weighting has the best matching accuracy in the Y direction. In Figure 13 and Figure 14 and Table 5, for the simulated flight A1–A2 with grid point selection, three-point weighting has the best matching accuracy and the smallest positioning error, and it is the best in terms of stability. In addition, compared with that without grid point selection, it can be concluded that the matching accuracy is significantly improved and the positioning error is significantly reduced with the grid point selection. It proves that the grid point selection can effectively improve the positioning effect and increase the effectiveness of the proposed algorithm.

To sum up, three different positioning methods can achieve different results for the same flight. Meanwhile, it is proved by experiments that the effect obtained by the grid point selection is better than that without grid point selection, and the operation efficiency is faster. Through a series of qualitative and quantitative experiments, the effectiveness of the proposed algorithm is proved.

6.4.2. Measured Data Experiments

In this paper, a real system is developed and built, as shown in Figure 15. The flight parameters of the real system are provided in Table 6. The real system includes a lightweight multi-rotated Unmanned Aerial Vehicle (UAV) and SARAL system, and the SARAL system is shown in Figure 1a. The detailed parameters of the UAV, including weight, size, and peak power consumption, are provided in Table 6. The radar altimeter sensor is shown in Figure 15a. The flight diagram of the real system is provided in Figure 15b. Raw data are collected to generate the real-time DDMs for the experiment. The multi-rotated Unmanned Aerial Vehicle (UAV) used in this paper is shown in Figure 15c. The flight diagram of the real system is shown in Figure 15d.

To prove the truth of the real system and real experiment, the measured DDM at six different coordinates are given in Figure 16a. Meanwhile, the six DDM references obtained by the simulated system with the same coordinates as in Figure 16a are provided. Specifically, the six DDM references are shown in Figure 16b. The simulated DDM and measured DDM under the same coordinate are represented by the small image in the same position in Figure 16a,b. Under the same coordinates, the simulated DDMs are similar to the measured DDM. It proves that the simulated system and the proposed numerical simulated method are effective.

Then, a flight is selected to set up a series of qualitative and quantitative experiments. To distinguish the measured flight from the simulated flight A1–A2 in Section 6.4.1, the measured flight is named B1–B2 here. The measured data are produced in terrain B. The whole DEM of terrain B is given in Figure 17a. Flight B1–B2 is shown in the DEM in Figure 17b. The DDM of a certain flight point on the flight B1–B2 is shown in Figure 17c. The flight parameters of measured flight B1–B2, including airspeed, altitude, and flight distance, are provided in Table 6. The qualitative matching results and positioning errors of flight B1–B2 with grid point selection are shown in Figure 18, Figure 19, and Figure 20, respectively. The quantitative matching accuracy and average positioning error are shown in Table 7. In Table 7, the matching accuracy and average positioning error obtained by the three-point weighting and three-point centroid are superior to single-point matching. Meanwhile, the matching accuracy and average positioning error of three-point weighting and three-point centroid are similar, and three-point weighting is slightly better than three-point centroid.

In Table 4, the matching accuracy and average positioning error of the single-point matching is the best, followed by three-point weighting. In Table 5, the three-point weighting positioning effect is the best, followed by the three-point centroid. Meanwhile, the positioning effect of three-point weighting and three-point centroid with grid point selection is significantly enhanced compared with single-point matching. In Table 7, the three-point weighting positioning effect is the best, followed by the three-point centroid, and the single-point matching effect is the worst. Therefore, it can be seen from Table 4, Table 5, and Table 7 that three-point weighting has strong robustness.

To sum up, in the simulated experiments and the measured experiments, the three-dimensional error and the horizontal error can be limited to 50 m by using the three-point weighting. Through the above qualitative and quantitative experimental analysis, it is proved that the proposed algorithm is highly reliable and has a certain generalization ability.

7. Conclusions

In this paper, a novel CNN framework for image matching is proposed to solve the problem that aircraft cannot be located when the GNSS signal is interfered with or locked, and the DDM generated by SARAL is used to realize terrain matching and positioning. The simulated DDM is generated using a numerical simulation method and the dataset is constructed to satisfy the training, validation, and testing. The triplet loss and softmax loss are selected as the loss function in the network. The cosine similarity of real-time DDMs and reference DDMs is measured, and then the aircraft positioning module is used to obtain the real-time positioning coordinates of the aircraft. The experimental results show that the positioning error of the proposed network is small, the positioning error of the three-dimensional and horizontal direction is about 40 m, and the vertical error is about 8m. Moreover, the method is suitable for migration scenarios and has generalization ability.

Author Contributions

Conceptualization, Y.L. and L.Y.; methodology, Y.L., Y.X., Y.W. and L.Y.; software, A.S., F.L. and G.J.; validation, L.Y.; formal analysis, F.L. and L.Y.; investigation, Y.L., G.L. and L.T.; data curation, G.L.; writing—original draft preparation, A.S.; writing—review and editing, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 62271487.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Spencer, J.; Frizzelle, B.; Page, P.; Vogler, J. Global Positioning System: A Field Guide for the Social Sciences; Wiley: Hoboken, NJ, USA, 2003. [Google Scholar]
Massachusetts Institute of Technology; Electronics Research Center (U.S.); Williams College. The Terrestrial Environment: Solid-Earth and Ocean Physics; NASA contractor report; National Aeronautics and Space Administration: Washington, WA, USA, 1970.
Raney, R. The delay/Doppler radar altimeter. IEEE Trans. Geosci. Remote Sens. 1998, 36, 1578–1588. [Google Scholar] [CrossRef]
Jensen, J.; Raney, R. Delay/Doppler radar altimeter: Better measurement precision. In Proceedings of the IGARSS ’98. Sensing and Managing the Environment. 1998 IEEE International Geoscience and Remote Sensing. Symposium Proceedings. (Cat. No.98CH36174), Seattle, WA, USA, 6–10 July 1998; Volume 4, pp. 2011–2013. [Google Scholar] [CrossRef]
Yu, H.; Cao, N.; Lan, Y.; Xing, M. Multisystem Interferometric Data Fusion Framework: A Three-Step Sensing Approach. IEEE Trans. Geosci. Remote Sens. 2021, 59, 8501–8509. [Google Scholar] [CrossRef]
Sahoo, A.K.; Ray, C.; Vaze, P. The SARAL/AltiKa mission: A step forward to the future of altimetry. Ocean Sci. 2014, 10, 1–10. [Google Scholar]
Chen, J.; Li, M.; Yu, H.; Xing, M. Full-Aperture Processing of Airborne Microwave Photonic SAR Raw Data. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–12. [Google Scholar] [CrossRef]
Xu, G.; Chen, Y.; Ji, A.; Zhang, B.; Yu, C.; Hong, W. 3-D High-Resolution Imaging and Array Calibration of Ground-Based Millimeter-Wave MIMO Radar. IEEE Trans. Microw. Theory Tech. 2024, 72, 4919–4931. [Google Scholar] [CrossRef]
Altiparmaki, O.; Amraoui, S.; Kleinherenbrink, M.; Moreau, T.; Maraldi, C.; Visser, P.N.; Naeije, M. Introducing the Azimuth Cutoff as an Independent Measure for Characterizing Sea-State Dynamics in SAR Altimetry. Remote Sens. 2024, 16, 1292. [Google Scholar] [CrossRef]
Strub, P.; James, C. Altimeter-derived surface circulation in the large-scale NE Pacific Gyres.: Part 2: 1997–1998 El Niño anomalies. Prog. Oceanogr. 2002, 53, 185–214. [Google Scholar] [CrossRef]
Moreau, T.; Cadier, E.; Boy, F.; Aublanc, J.; Mavrocordatos, C. High-performance altimeter Doppler processing for measuring sea level height under varying sea state conditions. Adv. Space Res. 2021, 67, 1870–1886. [Google Scholar] [CrossRef]
Chang, Y.; Xu, Q.; Xiong, X.; Jin, G.; Hou, H.; Man, D. SAR image matching based on rotation-invariant description. Sci. Rep. 2023, 13, 14510. [Google Scholar] [CrossRef]
Sommervold, O.; Gazzea, M.; Arghandeh, R. A survey on SAR and optical satellite image registration. Remote Sens. 2023, 15, 850. [Google Scholar] [CrossRef]
Moore, R.K.; Williams, C.S. Radar Terrain Return at Near-Vertical Incidence. Proc. IRE 1957, 45, 228–238. [Google Scholar] [CrossRef]
Barnea, D.I.; Silverman, H.F. A Class of Algorithms for Fast Digital Image Registration. IEEE Trans. Comput. 1972, C-21, 179–186. [Google Scholar] [CrossRef]
Schmid, K.; Lutz, P.; Tomić, T.; Mair, E.; Hirschmüller, H. Autonomous Vision-Based Micro Air Vehicle for Indoor and Outdoor Navigation. J. Field Robot. 2014, 31, 537–570. [Google Scholar] [CrossRef]
Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Proceedings of the Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Proceedings, Part I 9. Springer: Graz, Austria, 2006; pp. 404–417. [Google Scholar]
Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary Robust invariant scalable keypoints. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2548–2555. [Google Scholar] [CrossRef]
Viergever, M.A.; Maintz, J.B.A.; Klein, S.; Murphy, K.; Staring, M.; Pluim, J.P.W. A survey of medical image registration-under review. Med. Image Anal. 2016, 33, 140–144. [Google Scholar] [CrossRef] [PubMed]
Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 60, 84–90. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
Bordone Molini, A.; Valsesia, D.; Fracastoro, G.; Magli, E. DeepSUM: Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3644–3656. [Google Scholar] [CrossRef]
Han, Y.; Ye, J.C. Framing U-Net via Deep Convolutional Framelets: Application to Sparse-View CT. IEEE Trans. Med Imaging 2018, 37, 1418–1429. [Google Scholar] [CrossRef] [PubMed]
Seo, H.; Huang, C.; Bassenne, M.; Xiao, R.; Xing, L. Modified U-Net (mU-Net) With Incorporation of Object-Dependent High Level Features for Improved Liver and Liver-Tumor Segmentation in CT Images. IEEE Trans. Med Imaging 2020, 39, 1316–1325. [Google Scholar] [CrossRef]
Wang, P.; Zhang, H.; Patel, V.M. SAR Image Despeckling Using a Convolutional Neural Network. IEEE Signal Process. Lett. 2017, 24, 1763–1767. [Google Scholar] [CrossRef]
Zhou, S.; Wang, J.; Zhang, J.; Wang, L.; Huang, D.; Du, S.; Zheng, N. Hierarchical U-Shape Attention Network for Salient Object Detection. IEEE Trans. Image Process. 2020, 29, 8417–8428. [Google Scholar] [CrossRef] [PubMed]
Mullissa, A.G.; Marcos, D.; Tuia, D.; Herold, M.; Reiche, J. deSpeckNet: Generalizing Deep Learning-Based SAR Image Despeckling. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
Wang, P.; Liu, H.; Zhou, X.; Xue, Z.; Ni, L.; Han, Q.; Li, J. Multidimensional Evaluation Methods for Deep Learning Models in Target Detection for SAR Images. Remote Sens. 2024, 16, 1097. [Google Scholar] [CrossRef]
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. LIFT: Learned Invariant Feature Transform; Springer International Publishing: Amsterdam, The Netherlands, 2016. [Google Scholar]
Chen, X.; He, K. Exploring Simple Siamese Representation Learning. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 10–25 June 2021; pp. 15745–15753. [Google Scholar] [CrossRef]
Detone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar] [CrossRef]
Alpaydin, E. Neural Networks and Deep Learning. In Machine Learning: The New AI; MIT Press: Cambridge, MA, USA, 2016; pp. 85–109. [Google Scholar]
Domingos, P. A few useful things to know about machine learning. Commun. Acm 2016, 55, 78. [Google Scholar] [CrossRef]
Shuangbao, Y.; Zhen-he, Z.; Ke, X.; Zhi-sen, W.; Lingwei, S.; Lei, W.; Hai-ying, C.; Xi-yu, X. The Ground Process Segment of SAR Altimeter. Remote Sens. Technol. Appl. 2018, 32, 1083–1092. [Google Scholar]
Yu, H.; Yang, T.; Zhou, L.; Wang, Y. PDNet: A Lightweight Deep Convolutional Neural Network for InSAR Phase Denoising. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–9. [Google Scholar] [CrossRef]
Landy, J.C.; Tsamados, M.; Scharien, R.K. A Facet-Based Numerical Model for Simulating SAR Altimeter Echoes From Heterogeneous Sea Ice Surfaces. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4164–4180. [Google Scholar] [CrossRef]
Zhu, Z.; Zhang, H.; Xu, F. Raw signal simulation of synthetic aperture radar altimeter over complex terrain surfaces. Radio Sci. 2020, 55, 1–17. [Google Scholar] [CrossRef]
Dobson, M.C.; Ulaby, F.T.; Hallikainen, M.T.; El-rayes, M.A. Microwave Dielectric Behavior of Wet Soil-Part II: Dielectric Mixing Models. IEEE Trans. Geosci. Remote Sens. 1985, GE-23, 35–46. [Google Scholar] [CrossRef]
Bengio, Y. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 437–478. [Google Scholar]
Belongie, S.; Malik, J.; Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 509–522. [Google Scholar] [CrossRef]
Salton, G.; Wong, A.; Yang, C.S. A vector space model for automatic indexing. Commun. ACM 1974, 18, 613–620. [Google Scholar] [CrossRef]
Daniel, J.; James, H.M. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition; Prentice Hall: Saddle River, NJ, USA, 2000. [Google Scholar]
Zhou, K.; Liu, Z.; Qiao, Y.; Xiang, T.; Loy, C.C. Domain Generalization: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4396–4415. [Google Scholar] [CrossRef] [PubMed]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]

Figure 1. Airborne SARAL geometry: (a) airborne SARAL geometry. (b) Doppler pulse footprint.

Figure 2. DDM: (a) DDM without range migration correction. (b) DDM with range migration correction.

Figure 3. Backbone neural network architecture.

Figure 4. Convolution block.

Figure 5. Backbone neural network architecture.

Figure 6. TMP-Net flowchart.

Figure 7. TMP-Net network architecture.

Figure 8. Training sample augmentation diagram in the same state. (a) Brightness. (b) Contrast. (c) Sharpness. (d) Shear. (e) Solarize. (f) Translate.

Figure 9. Training and validation loss.

Figure 10. Flight in DEM database.

Figure 11. Matching results without grid point selection of simulated flight A1–A2. (a) Single point. (b) Three-point weighting. (c) Three-point centroid.

Figure 12. Positioning errors without grid point selection of simulated flight A1–A2. (a) Single point. (b) Three-point weighting. (c) Three-point centroid.

Figure 13. Matching results with grid point selection of simulated flight A1–A2. (a) Single point. (b) Three-point weighting. (c) Three-point centroid.

Figure 14. Positioning errors with grid point selection of simulated flight A1–A2. (a) Single point. (b) Three-point weighting. (c) Three-point centroid.

Figure 15. Real system. (a) Multi-rotated UAV positioning system demonstration. (b) Real system diagram. (c) Multi-rotated UAV Model KWT-X6L-15. (d) UAV carrying SARAL flight.

Figure 16. Real and simulated DDM. (a) Real DDM. (b) Simulated DDM.

Figure 17. Measured flight B1–B2. (a) Terrain B DEM. (b) Flight B1–B2 in DEM. (c) Real-time DDM.

Figure 18. Matching results with grid point selection of measured flight B1–B2. (a) Single point. (b) Three-point weighting. (c) Three-point centroid.

Figure 19. Matching results with grid point selection of measured flight B1–B2. (a) Single point. (b) Three-point weighting. (c) Three-point centroid.

Figure 20. Positioning errors with grid point selection of measured flight B1–B2. (a) Single point. (b) Three-point weighting. (c) Three-point centroid.

Table 1. Parameters of the training dataset.

Parameter	Value
Area Length	60 km
Area Width	40 km
Data Size	5000 × 12

Table 2. System parameters of the airborne SARAL.

Parameter	Value
Bandwidth	20 MHz
Altitude	2.06 km
Speed	15 m/s
Band	X
Pulse Repetition Interval (PRI)	20 µs
Pulses of Each Burst	125
BeamWidth	$60^{\circ}$
PulseWidth	5 µs

Table 3. Flight parameters of the simulated flight A1–A2.

Parameter	Value
Airspeed	15 m/s
Altitude	2.600 km
Flight Distance	8.874 km

Table 4. Average positioning errors without grid point selection of simulated flight A1–A2.

Conditions	Single	Weighting	Centroid
$Δ \bar{x}$	29.4860	49.1124	49.2628
$Δ \bar{y}$	36.2320	35.4576	35.5731
$σ_{x}$	20.3912	182.1755	182.8089
$σ_{y}$	36.8177	33.5617	33.6566
Three-Dimensional	73.7033	96.5331	96.9271
Horizontal	49.4916	70.2910	70.5272
Vertical	40.7725	48.3339	48.5609

Table 5. Average positioning errors with grid point selection of simulated flight A1–A2.

Conditions	Single	Weighting	Centroid
$Δ \bar{x}$	27.9288	21.3061	21.3240
$Δ \bar{y}$	36.8472	23.8934	23.9929
$σ_{x}$	16.5743	13.9867	14.0104
$σ_{y}$	36.7244	29.8962	30.1384
Three-Dimensional	49.7522	38.0175	38.1252
Horizontal	48.6481	36.3968	36.5028
Vertical	8.6003	8.6003	8.6003

Table 6. Flight parameters of the real system.

Parameter	Value
Weight	22.5 ± 0.2 kg
Size	2570 ± 10 mm
Peak Power Consumption	98 W
Airspeed	15 m/s
Altitude	2.060 km
Flight Distance (X distance)	2.285 km
Flight Distance (Y distance)	0.231 km

Table 7. Average positioning errors with grid point selection of measured flight B1–B2.

Conditions	Single	Weighting	Centroid
$Δ \bar{x}$	47.6171	25.0264	25.0235
$Δ \bar{y}$	66.1873	40.4711	40.5502
$σ_{x}$	39.4937	20.0844	20.1297
$σ_{y}$	54.5426	33.6899	33.6797
Three-Dimensional	87.4046	49.6955	49.7463
Horizontal	87.3999	49.6898	49.7407
Vertical	0.5069	0.5069	0.5069

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lu, Y.; Song, A.; Liu, G.; Tan, L.; Xu, Y.; Li, F.; Wang, Y.; Jiang, G.; Yang, L. TMP-Net: Terrain Matching and Positioning Network by Highly Reliable Airborne Synthetic Aperture Radar Altimeter. Remote Sens. 2024, 16, 2966. https://doi.org/10.3390/rs16162966

AMA Style

Lu Y, Song A, Liu G, Tan L, Xu Y, Li F, Wang Y, Jiang G, Yang L. TMP-Net: Terrain Matching and Positioning Network by Highly Reliable Airborne Synthetic Aperture Radar Altimeter. Remote Sensing. 2024; 16(16):2966. https://doi.org/10.3390/rs16162966

Chicago/Turabian Style

Lu, Yanxi, Anna Song, Gaozheng Liu, Longlong Tan, Yushi Xu, Fang Li, Yao Wang, Ge Jiang, and Lei Yang. 2024. "TMP-Net: Terrain Matching and Positioning Network by Highly Reliable Airborne Synthetic Aperture Radar Altimeter" Remote Sensing 16, no. 16: 2966. https://doi.org/10.3390/rs16162966

APA Style

Lu, Y., Song, A., Liu, G., Tan, L., Xu, Y., Li, F., Wang, Y., Jiang, G., & Yang, L. (2024). TMP-Net: Terrain Matching and Positioning Network by Highly Reliable Airborne Synthetic Aperture Radar Altimeter. Remote Sensing, 16(16), 2966. https://doi.org/10.3390/rs16162966

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

TMP-Net: Terrain Matching and Positioning Network by Highly Reliable Airborne Synthetic Aperture Radar Altimeter

Abstract

1. Introduction

2. Airborne SARAL Geometry and Signal Model

3. Preliminaries

3.1. DDM Inversion

3.2. Data-Driven CNN

3.3. Digital Elevation Model (DEM)

4. Terrain Matching and Positioning Network (TMP-Net)

4.1. Model-Based DDM Generation Module

4.2. Feature Extraction Model

4.2.1. Backbone Network

4.2.2. Middle-Level Feature Fusion

4.3. Loss Function

4.4. Similarity Measurement Module

4.5. Aircraft Positioning Module

5. Processing Flowchart

6. Experiments

6.1. Dataset Preparation

6.2. Model Training

6.3. Experimental Setup

6.4. Experimental Results

6.4.1. Simulated Data Experiment

6.4.2. Measured Data Experiments

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI