1. Introduction
Airborne positioning is crucial for flight safety. The Global Navigation Satellite System (GNSS) is currently the most widely used means of airborne positioning. It is a satellite navigation system in which a constellation of satellites in Earth orbit transmits encoded signals. The receiver on the aircraft decodes the received signals to obtain the precise position of the aircraft [1]. However, because the transmitters are space-based and the propagation distances are long, the encoded signals are sensitive to interference. Meanwhile, the open encoding and decoding loop is susceptible to intentional, man-made interference. Therefore, GNSS positioning is unreliable in a complicated electromagnetic environment.
Synthetic aperture radar altimeter (SARAL) provides the ability to measure the topographic terrain for matching with the DEM references to achieve aircraft positioning without relying on GNSS. SARAL is a radar system used to measure the height of the radar platform with respect to the ocean and landscape surface [
2,
3,
4,
5,
6], which is not limited by light and climate conditions [
7,
8,
9] and can produce the two-dimensional delay-Doppler map (DDM). Conventional radar altimeters mainly use a limited number of pulses to measure ground elevation parameters. Compared with conventional radar altimeters, SARAL incorporates the synthetic aperture principle, which provides high resolution and wide coverage. The airborne SARAL offers a significant opportunity to obtain elaborate terrain surface features, making it suitable for positioning and for a wide range of airborne applications [
10,
11]. SARAL works in down-looking mode, while the common synthetic aperture radar (SAR) works in broadside mode. Several studies have performed image matching with measured data acquired from airborne SAR. Specifically, Yunhao Chang et al. discuss a method using the scale-invariant feature transform (SIFT) for matching SAR images despite rotation differences [
12]. Oscar Sommervold et al. conduct a comprehensive survey review of various methods for registering SAR with optical images, and address the challenges and recent advancements [
13]. These SAR-based image matching studies have made great progress. However, SARAL directs the center of the radar beam at the nadir of the flight. Specifically, when the platform flies directly above a ground scatter, the radar transmits wideband signals and receives the echoes after a certain time delay. DDMs are obtained by processing the echoes received by the SARAL. The DDM features reflect the undulating relief at the nadir of the aircraft and can be used to detect terrain changes.
Digital Elevation Model (DEM) is a digital representation of the Earth’s surface topography; it records elevation information at different positions in a discrete manner. At present, DEM data on a global scale can be obtained effortlessly, and it is widely used in areas such as terrain analysis and geological research. Due to DEM’s ability to reflect the elevation of the ground, it is possible to use DEM as a reference to match with the DDM for aircraft positioning without relying on GNSS.
However, because the radar beam center in SARAL is directed vertically at the nadir of the flight, the SARAL echoes exhibit strong coupling of near-vertical terrain. On the one hand, owing to the width of the SARAL antenna beam, the SARAL receives not only the echoes from the nadir but also those from the scatters around it. On the other hand, the terrain in the area covered by the radar beam is rugged, and the echo energy of the scatters differs, so echoes from surrounding scatters may arrive at the radar receiver before the nadir echo. The echoes from both on and off the nadir are coupled together; this is known as vertical coupling [
14]. Due to vertical coupling, the similarity of DDMs of adjacent apertures is high. Therefore, it is difficult to achieve accurate image matching with SARAL. Meanwhile, if the observation time is short in SARAL, the DDM resolution is low, which easily leads to low matching accuracy and large positioning errors.
To achieve accurate aircraft positioning, the real-time SARAL DDM is compared with reference images generated from flight hypotheses and DEM datasets. If the matching method lacks robustness and the ability to generalize across scenarios, the probability of successful matching decreases. In addition, DDMs are two-dimensional images containing terrain variation obtained by SARAL, whereas the DEM is discrete data reflecting three-dimensional elevation information. Therefore, the DDM and the DEM cannot be matched directly.
Terrain matching can be realized by using image matching methods, which can be mainly divided into two categories. They are based on image gray scale correlation [
15] and image feature correlation [
16], respectively. Because of their huge computational cost, much of it spent correlating many unnecessary areas, matching methods based on gray scale are difficult to apply when there are too many images. The methods based on feature correlation have strong robustness [
17,
18,
19,
20,
21], and their high-precision characteristics make them suitable for matching in complicated terrain. However, conventional methods based on feature correlation are local: they focus on a local area of the image rather than the whole image. The differences between SARAL DDMs of adjacent apertures are very small, which makes it difficult to realize matching using conventional feature-correlation methods.
Deep learning is a possible way to incorporate feature correlation to distinguish DDM differences among adjacent apertures, since it performs data-driven learning on large datasets. Specifically, deep learning has a strong ability to fit nonlinear relationships and to process massive amounts of data effectively, and it is widely used in various downstream tasks [
22,
23,
24,
25,
26,
27,
28,
29,
30]. It is a typical data-driven approach and generally employs a neural network architecture to fit the training data and obtain an accurate formulation. Due to the ability of convolutional neural networks (CNNs) to extract high-dimensional features, CNNs have made progress in image matching. To accommodate practical applications, many CNN variants [
23,
31] have been invented. They attract wide interest in SAR imaging fields, and researchers are motivated to adopt deep learning to solve the image matching task in different scenarios [
12,
13,
18,
32,
33,
34]. However, existing image matching methods rely on visual images that the human eye can recognize and the human brain can understand. DDMs are obtained by projecting the three-dimensional terrain onto a two-dimensional plane using SARAL. Because the echoes from both sides of the SARAL track are coupled to each other, the resultant DDM only reflects the variation in terrain from the top view and is therefore difficult for human eyes to interpret directly. Matching the DDM with references generated from the DEM is difficult for four reasons. Firstly, the DDMs and DEM references belong to different modalities, so their similarity cannot be measured and they are difficult to match directly. Secondly, due to the lack of formulation guidelines and the limited interpretability of CNNs, it is difficult to achieve the mapping between DEM and DDM with the help of CNNs. Thirdly, raw data for practical DDMs are insufficient, and there are no public datasets, so sufficient training data are hard to obtain without public datasets or data generation methods. Finally, due to SARAL’s near-vertical observation, the similarity of DDMs of adjacent apertures is high. Therefore, it is difficult for existing deep learning methods to match the DDM with the DEM references, and the accuracy of possible matches is poor.
In this paper, a novel model-driven deep learning algorithm based on airborne SARAL is proposed for aircraft positioning. Specifically, to solve the problem of high similarity between the DDMs of two adjacent apertures, a terrain matching and aircraft positioning network (TMP-Net) is designed. It is capable of accommodating low image quality, and the probability of successful matching is improved to a great extent. Firstly, a model-driven method is used to realize the mapping from DEM references to the DDM dataset, which enhances the interpretability of the proposed network and generates the DEM-based DDM references for image matching. Secondly, a CNN is used to extract the fine features of DDMs, which effectively distinguishes the DDMs of adjacent apertures. Meanwhile, the triplet loss [
35] and the softmax loss [
36,
37] are used in a weighted combination to optimize the network parameters and improve the probability of successful matching. In addition, cosine similarity is used to measure the similarity between the deep feature embedding vectors extracted from real-time DDMs and from DEM-based DDM references. Finally, since a conventional image matching network only outputs the similarity between two images, matching alone does not achieve positioning; an aircraft positioning module is therefore added on top of the image matching network. Three different positioning methods are selected to output positioning coordinates, respectively. Unlike conventional image matching, which uses visual images, the DEM and the SARAL DDM are used here, and a model-driven CNN is utilized to achieve inter-modal data mapping, accomplishing terrain matching and aircraft positioning. A series of qualitative and quantitative comparison experiments on simulated and measured data demonstrates the effectiveness and adaptability of our network.
2. Airborne SARAL Geometry and Signal Model
Airborne SARAL is different from airborne SAR: SAR works in broadside mode, whereas SARAL works in down-looking mode. The airborne SARAL geometry is provided in
Figure 1a. Equivalent Doppler beams are formed according to the synthetic aperture principle, which improves the azimuthal resolution in the along-track direction. In
Figure 1a, the airborne SARAL flies in a straight line, transmits wideband pulses, and receives echoes from the scatter over a synthetic aperture. Within the beam-illuminated area, SARAL will observe the ground scatter not only from the near-vertical position, i.e., at position B in
Figure 1a, but also some other scatters along the flight, such as positions A and C in
Figure 1a. According to the basic principle that features at different angles in the beam area generate different Doppler frequencies, Doppler beam sharpening (DBS) can be employed. SARAL can make full use of the Doppler frequencies to further divide an antenna beam into several sub-beams [
38].
Figure 1b represents the concentric-circle footprint at the moment of position B. The pulse footprint in the form of concentric circles is subdivided by adding iso-range and iso-Doppler lines according to the Doppler frequency. In
Figure 1b, the azimuth is the flight direction of the aircraft. However, only the nadir echoes are useful for terrain matching and aircraft positioning.
In
Figure 1a, the airborne SARAL flies along a flight path in the $X$-axis direction at a height of $H$ with a constant speed $v$. The coordinate of the ground scatter $L$ is $(x_L, y_L, h_L)$, where $h_L$ is the terrain elevation at $(x_L, y_L)$. The coordinates of position B for the airborne radar are $(x_L, y_L, H)$, located directly above position $L$. The goal of the airborne SARAL is to obtain a high-resolution DDM at position $L$. The airborne SARAL transmits wideband pulses with linear frequency modulation when it passes directly above position $L$. Then, the SARAL receives echoes after a time delay. Therefore, the received echoes can be expressed as
$$ s\left(\tau, t_{m}\right)=\sigma_{0} \exp \left(j \varphi_{0}\right) w_{a}\left(t_{m}\right) \exp \left\{j \pi \gamma\left(\tau-\frac{2 R\left(t_{m}\right)}{c}\right)^{2}\right\} \exp \left\{-j \frac{4 \pi R\left(t_{m}\right)}{\lambda}\right\}, \quad (1) $$
where $\tau$ is the fast-time regarding the time delay of the direct range, $t_{m}$ is the slow-time of the pulse repetition, $\gamma$ is the chirp rate, $\lambda$ is the wavelength, $w_{a}(t_{m})$ is the azimuth antenna pattern, and $c$ is the speed of light. In (1), $\sigma_{0}$ is a constant representing the back-scattering coefficient, $\varphi_{0}$ represents the phase change of radar signals caused by the surface scattering process, and $R(t_{m})$ is the instantaneous range between the airborne SARAL and the ground scatter. After range de-ramping and range compression [
8], the echoes in the baseband can be written as
$$ s_{1}\left(\tau, t_{m}\right)=A_{0}\, w_{a}\left(t_{m}\right)\, p_{r}\!\left(\tau-\frac{2 R\left(t_{m}\right)}{c}\right) \exp \left\{-j \frac{4 \pi R\left(t_{m}\right)}{\lambda}\right\}, \quad (2) $$
where $A_{0}$ is the complex-valued constant, which is ignored in subsequent derivations. In (2), $p_{r}(\cdot)$ is the range envelope that denotes the response after the range compression. Usually, $p_{r}(\cdot)$ is a Sinc function. The instantaneous range $R(t_{m})$ of the scatter can be given as
$$ R\left(t_{m}\right)=\sqrt{H_{\mathrm{ref}}^{2}+\left(v t_{m}\right)^{2}} \approx H_{\mathrm{ref}}+\frac{\left(v t_{m}\right)^{2}}{2 H_{\mathrm{ref}}}, \quad (3) $$
where $H_{\mathrm{ref}}$ is the reference height. Substituting (3) into (2), the echo can be given as
$$ s_{1}\left(\tau, t_{m}\right)=w_{a}\left(t_{m}\right)\, p_{r}\!\left(\tau-\frac{2 R\left(t_{m}\right)}{c}\right) \exp \left\{-j \frac{4 \pi H_{\mathrm{ref}}}{\lambda}\right\} \exp \left\{-j \frac{2 \pi v^{2} t_{m}^{2}}{\lambda H_{\mathrm{ref}}}\right\}. \quad (4) $$
Azimuth phase modulation can be clearly seen from the second exponential term in (4). Since the phase is a function of $t_{m}$, the azimuthal chirp rate can be calculated as
$$ K_{a}=-\frac{2 v^{2}}{\lambda H_{\mathrm{ref}}}, \quad (5) $$
where the azimuthal chirp rate gives a linear relationship between the Doppler frequency $f_{d}$ and the slow-time $t_{m}$ as
$$ f_{d}=K_{a} t_{m}=-\frac{2 v^{2} t_{m}}{\lambda H_{\mathrm{ref}}}. \quad (6) $$
Next, a Fourier transformation in the azimuthal direction is employed, and (4) is converted into the delay-Doppler domain as
$$ S\left(\tau, f_{d}\right)=W_{a}\left(f_{d}\right)\, p_{r}\!\left(\tau-\frac{2\left(H_{\mathrm{ref}}+\Delta R\left(f_{d}\right)\right)}{c}\right) \exp \left\{-j \frac{4 \pi H_{\mathrm{ref}}}{\lambda}\right\} \exp \left\{-j \frac{\pi f_{d}^{2}}{K_{a}}\right\}, \quad (7) $$
where $\Delta R\left(f_{d}\right)=\lambda^{2} H_{\mathrm{ref}} f_{d}^{2} /\left(8 v^{2}\right)$ is the range migration. In (7), $W_{a}\left(f_{d}\right)$ is the frequency domain form of the azimuth antenna pattern $w_{a}\left(t_{m}\right)$.
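As a quick numerical illustration of the geometry above, the sketch below evaluates the azimuthal chirp rate $K_a = -2v^2/(\lambda H_{\mathrm{ref}})$, the linear Doppler relation, and the range migration $\Delta R(f_d) = \lambda^2 H_{\mathrm{ref}} f_d^2 / (8v^2)$. The platform parameters are illustrative placeholders, not values from the paper's experiments.

```python
import numpy as np

# Hypothetical airborne SARAL parameters (illustrative only).
v = 80.0            # platform speed [m/s]
wavelength = 0.02   # radar wavelength [m]
h_ref = 3000.0      # reference height above the scatter [m]

# Azimuthal chirp rate K_a = -2 v^2 / (lambda * H_ref).
k_a = -2.0 * v**2 / (wavelength * h_ref)

# Linear Doppler relation f_d = K_a * t_m over a few slow-time samples.
t_m = np.linspace(-0.5, 0.5, 5)     # slow-time [s]
f_d = k_a * t_m                     # Doppler frequency [Hz]

# Range migration: the quadratic delay offset of each Doppler channel
# relative to the nadir return.
delta_r = wavelength**2 * h_ref * f_d**2 / (8.0 * v**2)
```

Note that the migration is symmetric in Doppler and vanishes at the zero-Doppler (nadir) channel, which is why the nadir return sits at the apex of the characteristic parabolic signature in the DDM.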
4. Terrain Matching and Positioning Network (TMP-Net)
In this paper, the near-vertical coupling between adjacent apertures, which leads to high DDM similarity, low image quality, and low matching accuracy, is considered. A model-driven terrain matching and aircraft positioning network (TMP-Net) is proposed, which is capable of realizing terrain matching and aircraft positioning without relying on GNSS. TMP-Net is an end-to-end framework comprising a model-based DDM generation module, a feature extraction module, a similarity measurement module, and an aircraft positioning module, all embedded in a single architecture. They are designed, respectively, for DEM-based DDM reference generation, DDM feature extraction, feature vector similarity measurement, and the output of positioning coordinates. In addition, a specific loss function is designed to effectively distinguish the DDMs of different apertures.
4.1. Model-Based DDM Generation Module
Research on terrain matching and aircraft positioning based on DEM is insufficient at present. Due to the time complexity of the methods and the large amount of data, the research can only be carried out in a small area. Meanwhile, real-time aircraft positioning cannot be guaranteed. In this paper, the model-driven CNN is used to expand the terrain matching and aircraft positioning based on DEM and DDM to a large area. It focuses on the breakthrough in real-time, accurate, and large-area terrain matching and aircraft positioning. Meanwhile, DDM inversion, DEM mapping to DDM, and terrain matching positioning are integrated to form a complete terrain matching and aircraft positioning system.
Due to the lack of mathematical interpretability of the data-driven CNN, it is difficult to implement DEM mapping to DDM effectively. Therefore, a novel DDM generation algorithm is proposed in this paper, which plays a semi-model-driven role in the intended network, so that the real-time DDMs and DEM references can be matched in the same dimension [
40,
41].
High-precision DDM training data are the basis of terrain matching and aircraft positioning, and the coordinates of each scatter can be obtained as $\mathbf{p}_{i}=\left(x_{i}, y_{i}, h_{i}\right)$ from (11) according to the DEM. The current position of the platform is $\mathbf{p}_{s}=\left(x_{s}, y_{s}, z_{s}\right)$ and the speed is $v$. Although SARAL observes scatters at the nadir, its power is mainly based on back-scattering [42]. The position vector can be given as
$$ \mathbf{r}_{i}=\mathbf{p}_{i}-\mathbf{p}_{s}, \quad i \in \Omega, \quad (12) $$
where $\mathbf{p}_{s}$ is the current position of the platform, $\mathbf{p}_{i}$ is the coordinates of each scatter, and $\Omega$ represents the total set of scatters. The Doppler frequency of each scatter can be obtained from (12) as
$$ f_{d, i}=\frac{2\, \mathbf{v} \cdot \mathbf{r}_{i}}{\lambda\left\|\mathbf{r}_{i}\right\|}, \quad (13) $$
where $\mathbf{v}$ is the real-time velocity of the platform and $\lambda$ is the radar wavelength. The relative range of each scatter is (14) and the back-scattering coefficient is (15):
$$ R_{i}=\left\|\mathbf{r}_{i}\right\|, \quad (14) $$
$$ \sigma_{i}=\sigma^{0}\!\left(\theta_{i} ; a, b\right), \quad (15) $$
where $\theta_{i}$ is the local incidence angle and $a$ and $b$ depend on the type of land cover medium [
40,
41,
42]. From (14) and (15), the reflection of a single scatter is calculated as
$$ P_{i}=\frac{\sigma_{i}}{R_{i}^{4}}. \quad (16) $$
Each element of the DDM matrix is accumulated based on the sets of range and Doppler indexes as
$$ \operatorname{DDM}(m, n)=\sum_{i \in \mathcal{M}_{m} \cap \mathcal{N}_{n}} P_{i}, \quad (17) $$
where $\mathcal{M}_{m}$ is the index set of the $m$-th range gate and $\mathcal{N}_{n}$ is the index set of the $n$-th Doppler channel. After the above-mentioned processing, the DDM power matrix can be obtained as $\mathbf{D} \in \mathbb{R}^{M \times N}$, where $M$ is the number of pulses and $N$ is the number of range gates. Therefore, sufficient DDMs can be obtained. Then, the amplitude values of the DDM are mapped to the range of 0–255 to obtain $\overline{\mathbf{D}}$. The matrix $\overline{\mathbf{D}}$ is the normalized DEM-based DDM reference.
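The generation steps above (position vectors, per-scatter Doppler and range, power accumulation over range/Doppler bins, and 0–255 normalization) can be sketched as follows. All function and parameter names are illustrative, and the angular back-scattering model is replaced by user-supplied per-scatter coefficients for simplicity.

```python
import numpy as np

def generate_ddm(scatter_pos, scatter_sigma, platform_pos, velocity,
                 wavelength, range_edges, doppler_edges):
    """Model-based DDM generation sketch (names are illustrative).

    scatter_pos   : (K, 3) DEM scatter coordinates
    scatter_sigma : (K,) back-scattering coefficients
    platform_pos  : (3,) current platform position
    velocity      : (3,) platform velocity vector
    range_edges, doppler_edges : 1-D bin edges in range [m] / Doppler [Hz]
    """
    r_vec = scatter_pos - platform_pos                  # position vectors
    r = np.linalg.norm(r_vec, axis=1)                   # relative ranges
    f_d = 2.0 * (r_vec @ velocity) / (wavelength * r)   # per-scatter Doppler
    power = scatter_sigma / r**4                        # per-scatter reflection

    # Accumulate power into the (range gate, Doppler channel) grid.
    ddm, _, _ = np.histogram2d(r, f_d,
                               bins=[range_edges, doppler_edges],
                               weights=power)

    # Map amplitudes to 0-255 as the normalized DEM-based DDM reference.
    if ddm.max() > 0:
        ddm = 255.0 * ddm / ddm.max()
    return ddm.astype(np.uint8)
```

A small synthetic scene (a few scatters below a hypothetical platform) is enough to check that nadir and off-nadir scatters land in the expected range/Doppler cells.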
4.2. Feature Extraction Model
Before feature extraction, grid point selection is used to improve the matching efficiency and positioning accuracy of the proposed network. Because the aircraft follows a preset flight path in practice, a series of flight points along the preset path is selected. A series of grid points around the flight points is arranged with an interval of 92 m in the X direction, 90 m in the Y direction, and 205 m in the Z direction. In such a case, the number of grid points of the simulated flight A1–A2 is 3146 and the number of grid points of the measured flight B1–B2 is 885.
This limits the matching area while still traversing multiple candidate points, balancing matching efficiency and positioning accuracy. The grid points are divided off-line. We assume that the start point, end point, and time interval of the aircraft flight are determined. In such a case, the approximate flight direction and speed can be assumed to lie within a small, negligible confidence interval. For different flights, a series of grid points is established off-line along the flight path. Then, a series of DEM-based DDM references is generated according to the predetermined flight and the model-based DDM generation module.
The real-time DDMs are obtained by processing the SARAL echoes. If each real-time DDM had to be matched with every DEM-based DDM reference to measure similarity, the computation would be huge and the efficiency low. In particular, matching the real-time DDM against DEM-based DDM references of grid points far away from the flight points is time-consuming and reduces the matching efficiency; it also introduces matching errors, which increase the positioning error. To further improve the matching efficiency and reduce the positioning error, grid points need to be selected before matching. Therefore, spheres centered on the flight points with a radius of 200 m are designed. Only the DEM-based DDM references of the grid points inside the spheres are kept for matching with the real-time DDM, while the grid points outside the spheres are discarded. Subsequent experiments show that the positioning error can be significantly reduced by grid point selection.
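The sphere-based selection can be sketched as a simple distance test; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def select_grid_points(grid_points, flight_points, radius=200.0):
    """Keep only grid points within `radius` metres of at least one flight point.

    grid_points : (G, 3) candidate grid point coordinates
    flight_points : (F, 3) flight point (sphere center) coordinates
    """
    # Pairwise distances between every grid point and every flight point.
    d = np.linalg.norm(grid_points[:, None, :] - flight_points[None, :, :],
                       axis=2)
    keep = (d <= radius).any(axis=1)   # inside at least one sphere
    return grid_points[keep]
```

Only the references attached to the returned grid points would then be matched against the real-time DDM, which is what shrinks the search and removes far-away false candidates.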
On the basis of improving matching efficiency and positioning accuracy through grid point selection, a feature extraction module is used to realize the effective feature extraction of DDMs. Specifically, the feature extraction module mainly contains a backbone network and middle-level feature fusion.
4.2.1. Backbone Network
A data-driven CNN is often used to extract fine features of images of interest. Common architectures include ResNet [
23] and VGG [
31]. ResNet18 [
23] in ResNet is widely used for downstream tasks. Therefore, ResNet18 is considered as the backbone network in this paper.
Specifically, ResNet18 contains five convolutional groups, each of which contains one or more convolution operations. Multiple similar residual blocks are contained in the second to fifth convolutional groups, which can also be called stage1, stage2, stage3, and stage4. ResNet18 is widely used in image feature extraction because of its superior performance. However, the standard ResNet18 is not sufficient to meet the requirements of this paper. Because ResNet18 uses average pooling to extract the regional mean, it is not well suited to preserving the local and fine features of the DDMs of interest. Therefore, an improved ResNet18 is designed in this paper. First, the initial convolutional group is discarded. Second, in the first residual convolutional group, the maximum pooling operation is discarded; meanwhile, the step size of the first convolution operation is adjusted to 2. Next, the average pooling after the fourth residual convolutional group is replaced by maximum pooling. This is because average pooling extracts the regional mean value, whereas maximum pooling extracts the regional maximum; the local features of the DDMs are thus more likely to be preserved by maximum pooling, which also reduces smoothing and focuses on the fine features of the DDMs. Finally, after the maximum pooling, a fully connected layer is added as a bottleneck structure to reduce the dimension. The improved ResNet18 architecture is shown in
Figure 3, where ‘conv’ indicates the convolution layer, and ‘/2’ indicates that the step size is 2. In
Figure 3, ‘maxpool’ indicates the maximum pooling layer and ‘fc’ indicates the fully connected layer. The local and fine features of DDMs of interest can be extracted effectively by utilizing the improved ResNet18.
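The preference for maximum pooling over average pooling can be illustrated with a toy DDM patch: max pooling keeps a sharp local peak, while average pooling smears it. This is a minimal NumPy sketch, not the network implementation.

```python
import numpy as np

def pool2d(x, k, mode="max"):
    """Non-overlapping k x k pooling over a 2-D array (illustrative sketch)."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    blocks = x[:h, :w].reshape(h // k, k, w // k, k)
    # Max keeps the strongest response in each window; mean averages it away.
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

# A toy DDM patch: a single sharp peak on a flat background.
patch = np.zeros((4, 4))
patch[1, 2] = 8.0
```

Applying `pool2d(patch, 2, "max")` preserves the full peak value of 8 in the corresponding cell, whereas `pool2d(patch, 2, "mean")` dilutes it to 2, which is the intuition behind swapping the final average pooling for maximum pooling.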
4.2.2. Middle-Level Feature Fusion
To efficiently extract the global and local features of DDMs, the improved ResNet18 is used as the backbone network. Then, the middle-level features are extracted and fused to obtain the intended DDM features. Specifically, the output of any layer in a deep network can be considered a middle-level feature. In this paper, the outputs of stage2 and stage3 are adopted as the middle-level features.
A convolution block is required to change the size of the middle-layer features according to the size of the feature map in the backbone network. The convolution block is a bottleneck structure containing two convolution layers. The complete convolution block is shown in
Figure 4. Moreover, after the convolution block, maximum pooling is adopted in the middle-layer feature fusion. Then, to obtain the intended DDM features, the middle-layer features of stage2 and stage3 are fused with the top-layer features of the backbone network.
To sum up, to effectively extract the global and local features of DDMs, the feature extraction module $f_{\theta}(\cdot)$ is designed in this paper. The improved ResNet18 is used as the backbone network and middle-layer feature fusion is added. The complete feature extraction network is shown in
Figure 5. Specifically, the black line represents the backbone network. The red line indicates the middle-level feature fusion with cross-layer connectivity. The feature extraction module $f_{\theta}(\cdot)$ is used to extract depth feature embedding vectors of real-time DDMs and DEM-based DDM references.
The feature extraction module $f_{\theta}(\cdot)$ is used to extract features from the real-time DDMs. The real-time DDM after range migration correction and normalization is obtained as $\mathbf{D}_{r}$. The depth feature embedding vector of $\mathbf{D}_{r}$ can be obtained as
$$ \mathbf{F}=f_{\theta}\left(\mathbf{D}_{r}\right). \quad (18) $$
Similarly, the feature extraction module $f_{\theta}(\cdot)$ is used to extract features from the DEM-based DDM references. The DEM-based DDM references after normalization are obtained as $\overline{\mathbf{D}}$. The depth feature embedding vector of $\overline{\mathbf{D}}$ can be obtained as
$$ \mathbf{T}=f_{\theta}(\overline{\mathbf{D}}). \quad (19) $$
The network architecture shown in
Figure 5 is used, and intended DDM features are extracted effectively. Meanwhile, the explicit mapping from input DDM to output depth feature embedding vectors is realized.
4.3. Loss Function
The loss function plays a crucial role in deep learning [
43]. To improve the positioning accuracy and generalization ability of the model, model parameters are adjusted by minimizing the loss function. The selection of appropriate loss function can reflect the characteristics of the task and deal with the data imbalance and overfitting. It plays an important role in model training.
To solve the problem of high similarity between DDMs in adjacent apertures, a triplet loss function is used in this paper. The triplet loss function is widely used in deep learning. The core idea is to make the features of the same label as close as possible in spatial position, while making the features of different labels as far away as possible [
35]. The triplet anchor, positive, and negative need to be set in the triplet loss. The anchor and positive are different samples of the same category, and the anchor and negative are different categories.
$\mathbf{x}_{i}^{A}$, $\mathbf{x}_{i}^{P}$, and $\mathbf{x}_{i}^{N}$ are a set of data input into the CNN; the idea of a triplet loss function is expressed in the Euclidean norm as
$$ \left\|f\left(\mathbf{x}_{i}^{A}\right)-f\left(\mathbf{x}_{i}^{P}\right)\right\|_{2}^{2}+\alpha<\left\|f\left(\mathbf{x}_{i}^{A}\right)-f\left(\mathbf{x}_{i}^{N}\right)\right\|_{2}^{2}, \quad (20) $$
where $i$ indicates the batch number and $\alpha$ is the threshold. In (20), $A$, $P$, and $N$ represent the anchor, positive, and negative, respectively. In such cases, $f(\mathbf{x}_{i}^{A})$, $f(\mathbf{x}_{i}^{P})$, and $f(\mathbf{x}_{i}^{N})$ are the feature mappings of $\mathbf{x}_{i}^{A}$, $\mathbf{x}_{i}^{P}$, and $\mathbf{x}_{i}^{N}$. According to (20), the triplet loss function can be written as
$$ L_{\mathrm{tri}}=\sum_{i}\left[\left\|f\left(\mathbf{x}_{i}^{A}\right)-f\left(\mathbf{x}_{i}^{P}\right)\right\|_{2}^{2}-\left\|f\left(\mathbf{x}_{i}^{A}\right)-f\left(\mathbf{x}_{i}^{N}\right)\right\|_{2}^{2}+\alpha\right]_{+}, \quad (21) $$
where $[\cdot]_{+}$ is the operator that returns the enclosed value when it is greater than zero, and otherwise returns zero, and $d(\cdot, \cdot)$ is used to represent the distance function between two arbitrary samples.
Meanwhile, the loss function in (21) can be further simplified as
$$ L_{\mathrm{tri}}=\sum_{i}\left[d\left(\mathbf{x}_{i}^{A}, \mathbf{x}_{i}^{P}\right)-d\left(\mathbf{x}_{i}^{A}, \mathbf{x}_{i}^{N}\right)+\alpha\right]_{+}. \quad (23) $$
In (21) and (23), the distance between two arbitrary heterogeneous samples is required to be larger than the distance between two arbitrary homogeneous samples plus $\alpha$. When (20) is not satisfied, the loss is larger than zero, and the network updates the parameters through back-propagation. Otherwise, the loss is zero: the samples meet the training requirements, no gradient is generated, and the parameters of the network do not need to be updated.
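The margin behavior described above can be sketched with a small NumPy implementation of the batch triplet loss (squared Euclidean distances, hinge at the margin); the function signature is illustrative.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """Batch triplet loss with squared Euclidean distances.

    f_a, f_p, f_n : (B, D) embeddings of anchor, positive, and negative.
    """
    d_ap = np.sum((f_a - f_p) ** 2, axis=1)   # anchor-positive distance
    d_an = np.sum((f_a - f_n) ** 2, axis=1)   # anchor-negative distance
    # Hinge: zero loss (hence no gradient) once the margin is satisfied.
    return np.maximum(d_ap - d_an + margin, 0.0).mean()
```

With a negative that is already far from the anchor the loss is exactly zero, while a "hard" negative close to the anchor yields a positive loss that drives the parameter update, matching the two cases discussed above.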
4.4. Similarity Measurement Module
To determine if the terrain matching is successful, the similarity of the depth feature vectors between real-time DDMs and DEM-based DDM references is measured. Because the cosine similarity is efficient and suitable for high-dimensional data, it is employed for measuring the similarity of the DDMs in this paper. It is different from the similarity measurement methods that solve the linear distance between two vectors in multi-dimensional space, such as Euclidean norm [
44]. The cosine similarity measures the difference between two vectors by solving the cosine value of the angle of two vectors in vector space [
45,
46]. It reflects the difference in direction in the vector space, regardless of the magnitude of the vectors. From (18) and (19), the depth feature embedding vectors $\mathbf{F}$ of real-time DDMs and the depth feature embedding vectors $\mathbf{T}$ of DEM-based DDM references can be obtained. Therefore, the cosine similarity can be defined as
$$ \cos (\mathbf{F}, \mathbf{T})=\frac{\mathbf{F} \cdot \mathbf{T}}{\|\mathbf{F}\|\,\|\mathbf{T}\|}, \quad (24) $$
where the operator $\cdot$ is the inner product operator. In (24), the smaller the angle between two vectors, the higher the vector similarity.
To sum up, the cosine similarity between the depth feature embedding vectors of the real-time DDMs and the DEM-based DDM references can be calculated. Then, the similarity values are obtained and sorted in descending order, where $s_{1}$ indicates the highest similarity, $s_{2}$ the second-highest, and $s_{3}$ the third-highest. They are used in the aircraft positioning module to realize the positioning of the aircraft.
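The similarity measurement and descending sort can be sketched as follows; the helper names are illustrative, and the reference embeddings are assumed to be stacked row-wise.

```python
import numpy as np

def cosine_similarity(f, t):
    """Cosine of the angle between two embedding vectors."""
    return float(f @ t / (np.linalg.norm(f) * np.linalg.norm(t)))

def rank_references(f, refs):
    """Return reference indexes and similarities, sorted descending.

    f : (D,) real-time DDM embedding; refs : (R, D) reference embeddings.
    """
    sims = np.array([cosine_similarity(f, t) for t in refs])
    order = np.argsort(-sims)          # descending similarity
    return order, sims[order]
```

The first three entries of the returned ranking correspond to the top-three similarities used by the aircraft positioning module.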
4.5. Aircraft Positioning Module
The real-time DDMs are matched with each of the DEM-based DDM references in turn, and the matching result is returned by the similarity value. Therefore, following the feature extraction module and similarity measurement module, an aircraft positioning module is designed, so that the positioning coordinates can be obtained according to the matched coordinates of the DEM-based DDM references. In this paper, three different positioning methods are considered, including single-point matching, three-point weighting, and three-point centroid. After theoretical analysis and experimental validation, three-point weighting is preferred in the scenario of SARAL DDM positioning.
Single-point matching obtains the positioning coordinates from the matched DEM-based DDM reference with the highest similarity. The calculation of single-point matching is relatively simple: the coordinates of the matched DDM reference are taken as the positioning coordinates, and the calculation process can be given as
$$ \left(x_{p}, y_{p}, z_{p}\right)=\left(x_{1}, y_{1}, z_{1}\right), \quad (25) $$
where $\left(x_{p}, y_{p}, z_{p}\right)$ are the positioning coordinates of the single-point matching output and $\left(x_{1}, y_{1}, z_{1}\right)$ are the coordinates of the DEM-based DDM reference with the highest similarity $s_{1}$.
The three-point weighting and three-point centroid methods obtain the positioning coordinates from the matched DEM-based DDM references with the top three similarities. Three-point weighting computes the positioning coordinates by weighting the coordinates of the top three matched DEM-based DDM references with their similarities as
$$ \left(x_{p}, y_{p}, z_{p}\right)=\frac{\sum_{k=1}^{3} s_{k}\left(x_{k}, y_{k}, z_{k}\right)}{\sum_{k=1}^{3} s_{k}}, \quad (26) $$
where $\left(x_{p}, y_{p}, z_{p}\right)$ are the positioning coordinates of the three-point weighting output, $\left(x_{2}, y_{2}, z_{2}\right)$ are the coordinates of the DEM-based DDM reference with the second-highest similarity $s_{2}$, and $\left(x_{3}, y_{3}, z_{3}\right)$ are the coordinates of the DEM-based DDM reference with the third-highest similarity $s_{3}$.
The three-point centroid takes the centroid of the coordinates of the matched DEM-based DDM references with the top three similarities as the positioning coordinates:
$$ \left(x_{p}, y_{p}, z_{p}\right)=\frac{1}{3} \sum_{k=1}^{3}\left(x_{k}, y_{k}, z_{k}\right), \quad (27) $$
where $\left(x_{p}, y_{p}, z_{p}\right)$ are the positioning coordinates of the three-point centroid output. To sum up, three different positioning methods are used to match real-time DDMs and DEM-based DDM references, and the real-time position of the aircraft can be obtained.
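The three positioning options can be sketched in a few lines; the function name and input layout are illustrative, with the top-three matched coordinates and their similarity values assumed to be given in descending similarity order.

```python
import numpy as np

def position_estimates(coords, sims):
    """Three positioning options from the top-3 matched references.

    coords : (3, 3) coordinates of the three best DEM-based DDM references,
             in descending similarity order
    sims   : (3,) the corresponding similarity values
    """
    single = coords[0]                                       # single-point matching
    weighted = (sims[:, None] * coords).sum(axis=0) / sims.sum()  # three-point weighting
    centroid = coords.mean(axis=0)                           # three-point centroid
    return single, weighted, centroid
```

Because the weighted estimate pulls the result toward the most similar reference while still smoothing over the top three, it sits between the single-point and centroid outputs, which is consistent with three-point weighting being the preferred option.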