Glacier Motion Monitoring Using a Novel Deep Matching Network with SAR Intensity Images

: Synthetic Aperture Radar technology is highly convenient for monitoring the glacier surface motion in unfavorable areas due to its advantages of being independent of time and weather conditions. A novel glacier motion monitoring method based on the deep matching network (DMN) is proposed in this paper. The network learns the relationship between the glacier SAR image patch-pairs and the corresponding matching labels in an end-to-end manner. Unlike conventional methods that utilize shallow feature tracking, the DMN performs a similarity measurement of deep features, which comprises feature extraction and a metric network. Feature extraction adopts the framework of a Siamese neural network to improve the training efﬁciency and dense connection blocks to increase the feature utilization. In addition, a self-sample learning method is introduced to generate training samples with matching labels. The experiments are performed on simulated SAR images and real SAR intensity images of the Taku Glacier and the Yanong Glacier, respectively. The results conﬁrm the superiority of the DMN presented in the paper over other methods, even in case of strong noise. Furthermore, a quantitative 2D velocity ﬁeld of real glaciers is obtained to provide reliable support for high-precision, long-term and large-scale automatic glacier motion monitoring.


Introduction
The excessive melting of glaciers will lead to a series of ecological and environmental problems, such as a rising sea level and sudden natural disasters. On 7 February 2021, a massive mudslide occured in Chamori, Uttarakhand, India, causing severe damage and numerous casualties. Shugar et al. [1] proposed a rapid and comprehensive disaster cascade reconstruction method, which idetified the driving factors of the disaster by analyzing various types of data including satellite images, seismic data, numerical simulation and on-site videos. Multiple complex factors cause avalanches of rock and ice to quickly turn to the disaster. Therefore, increasing attention has been paid to the study of glacier motion changes. It is particularly significant and necessary to estimate glacier motion in order to provide quantitative records and predict the trend in motion [2].
At present, glacier motion monitoring mainly relies on remote sensing images acquired from different sensors. Satellite optical images are commonly used for glacier motion monitoring, with characteristics of high-resolution, wide coverage and low cost. However, they are susceptible to illumination condition and weather phenomena, resulting in discontinuous and insufficient data acquisition. As an active microwave remote sensing, SAR (Synthetic Aperture Radar) technology can collect reliable and long-term data even in harsh environments [3]. Along with the progress in SAR imaging techniques, increasing amounts of geometry information can be extracted from SAR imagery, which provides favorable support for improving glacier motion estimation.
Generally, glacier motion monitoring approaches using SAR images are divided into two categories: interferometric SAR (InSAR) technique-based and offset trackingbased approaches.
The basic principle of the InSAR technique-based approach is to utilize the phase information of complex SAR images obtained at different times [4]. The InSAR technique-based approach mainly includes three representative modes: differential InSAR (D-InSAR) [5], multi-aperture InSAR (MAI) [6] and coherence tracking [7]. Coherence tracking is usually utilized in combination with other methods, such as D-InSAR and feature tracking. Although these methods can provide precise estimation results, the main limitation is that InSAR mainly provides LOS (Line of Sight) aligned displacements. In addition, these methods are generally limited by the loss of coherence, which is influenced by many factors (i.e., ice melting, rapid flow velocity and large data acquisition intervals.) The offset tracking utilizes the "correlation like" methods and the amplitude information of SAR images to estimate glacier motion. There are three typical methods, including intensity tracking, feature tracking and speckle tracking. Intensity tracking calculates the relative displacements of image patches in different directions based on cross-correlation algorithms. Normalized cross-correlation (NCC) [8] and phase correlation (PC) [9,10] are the state-of-the-art methods in spatial and frequency domain, respectively. They are commonly used for glacier motion monitoring. However, the performance is influenced by the correlation of texture patterns on the glacier surface and the speckle noise of SAR images. Some researchers have come up with different methods to improve it. For example, Muhuri [11] took advantage of stokes vector correlation to obtain more robust and higher SNR (signalto-noise ratio) results. Feature tracking estimates the glacier motion velocity based on the relative displacement of identifiable objects or feature points on the glacier surface [3]. The features can be selected manually [12] or automatically through feature operators [13]. It is of great importance to select feature points that are relatively stable and not sensitive to time variation, such as point-like features or coherent scatterers [14]. Speckle tracking can be considered as a special SAR feature tracking method. The speckle pattern of an SAR image is a kind of high-frequency and inevitable noise and can be correlated between SAR image pairs in some conditions. The speckle tracking method is usually performed using correlation-based algorithms [15,16]. The performance of the offset tracking-based approach is related to the cross-correlation algorithms and feature operators, and it is affected by meteorological and flow characteristics. Although the accuracy of the offset tracking-based approach is lower than that of the InSAR technique-based approach, offset tracking is effective for large or discontinued glacier surface displacements [17].
Accurate glacier surface velocity estimation depends on precise geometric matching of the images acquired on different dates. During the past decades, various image matching methods have been proposed, which are mainly distinguished as area-based, feature-based and deep learning-based approaches.
Area-based approaches directly compare the similarity between the intensity of two images by some appropriate metric criteria. It is an optimization problem for similarity maximization. The statistic correlation or feature correlation of two images can be considered as the metric criterion, for instance, mutual information [18,19] and correlation methods [20,21]. These methods are easily implemented, but they are liable to obtain local optima when the images are occupied by serious noise.
Feature-based approaches generally establish the relationship between images by mining the correspondence between the salient features of images. They can deal with image distortion and deformation to some extent, making them more efficient in practical applications. A commonly used feature descriptor is the scale-invariant feature transform (SIFT) [22], but it is affected by the radiation and geometric characteristics of remote sensing technology. Some improvements are proposed to resolve this problem [23,24]. However, they require a wealth of professional domain knowledge to manually select or design the feature extractor, which causes them to lack a certain generalization.
The crux of the problem is to match and track the feature points on the glacier SAR intensity images. Therefore, the improvement of accurate feature matching is a key problem to be solved. In recent years, deep learning shows its superiority in remote sensing image processing [25][26][27]. A stacked multi-layer architecture is adopted to learn high-level features through nonlinear transformation and exploit the inherent structures and distribution characteristics of the data [28]. Compared with hand-crafted and shallow features, a deep learning network can extract deep features to obtain more precise and robust matching results [29,30]. Traditional glacier motion monitoring is based on shallow features of SAR intensity images for image matching. However, SAR intensity images used for glacier motion monitoring usually have differences in image intensity values, geomorphic features, texture information and speckle noise, as well as differences in shallow features extracted from them. It increases the difficulty of finding the correct correlation. Therefore, the feature learning ability of deep learning is suitable for the precise matching of glacier SAR intensity images, which can be used to obtain a more accurate estimation of glacier motion. Charrier [31] applied two co-registration methods, GeFolki and PWC-Net-multimodal, to ice flow estimation. GeFolki is based on optical flow equations, and PWC-Net-multimodal is based on the PWC-Net architecture, which is a compact CNN model for optical flow. The study confirmed the ability of optical flow methods to estimate ice flow using optical and SAR images, and PWC-Net-multimodal needs further research. Meanwhile, optical flow methods are more suitable for small changes of surface features.
Due to the ability to explore the temporal or spatial correlation information of images, deep convolutional neural networks (DCNNs) have achieved excellent results in image processing. In this paper, DCNNs are introduced into glacier motion monitoring to obtain more precise feature matching. However, this method faces great challenges, since glacier SAR intensity images have inherent speckle noise, a large number of repeated textures and non-grid deformation. Moreover, the glacier surface motion must conform to physical rules. The main problem is the lack of labeled training samples. Consequently, the urgent problems to be solved are the generation of sufficient labeled samples and the design of a reasonable and robust model, which can suppress the speckle noise, achieve accurate matching and have strong generalization ability. This paper proposes a novel glacier motion monitoring method based on a deep matching network (DMN), which can extract the high-level features from SAR intensity images and perform exact matching for glacier motion estimation. It is compared with other motion estimation methods using simulated SAR images, and it is verified on real SAR intensity images of the Taku Glacier and the Yanong Glacier. The specific contributions of this work are indicated as follows: • We exploit the application potential of deep learning in glacier surface motion monitoring, and we take advantage of deep features to achieve glacier motion estimation with high precision, providing method support for large-scale and long-term glacier dynamic monitoring. • We propose a deep matching network (DMN) to extract deep features from multitemporal glacier SAR intensity images and measure the features. The DMN directly learns the relationship between SAR image patches and their corresponding matching labels in an end-to-end manner. • A self-sample learning method is introduced to solve the problem of the lack of training samples with matching labels, which makes the DMN training more robust with strong generalization ability. • The proposed method uses the DMN to obtain accurate pixel-level matching, and it takes advantage of a peak centroid-based subpixel matching method and outlier removal approach to acquire subpixel displacement. The experiment results show that the proposed method can acquire precise estimation and is less affected by SAR noise.
The sections of this paper are arranged as follows. Section 2 introduces the DMN model and shows how to achieve glacier motion estimation in detail. Section 3 describes the experimental data and settings, and it demonstrates the results obtained from experimentation on simulated SAR images and real SAR intensity images, respectively. Section 4 discusses the experimental results and limitations of the proposed method. Finally, Section 5 gives conclusions with regard to the proposed method and prospects for future work.

Methodology
In order to conduct long-term and large-scale dynamic monitoring for glaciers and automatically obtain accurate flow velocity on the glacier surface, we propose a novel glacier motion measuring method based on the deep matching network (DMN) with SAR intensity images, in which deep learning is applied to the glacier SAR images to achieve automatic and precise matching. As shown in Figure 1, the framework mainly includes the following parts: SAR image pre-processing, deep matching network and glacier motion estimation.

Deep Matching Network
Peak centroid-based subpixel matching 2D shift calculation Glacier motion results Firstly, the initial registration and processing of SAR images are performed. The samples are prepared for network training by a self-sample learning method. In the second phase, the DMN is designed and trained to match the SAR image patch pairs densely and accurately. Finally, an effective subpixel matching method based on peak centroid is used to obtain the displacement with subpixel accurac so as to estimate the glacier surface velocity field. The details are given below.

Self-Sample Learning for Model Training
Sufficient samples are of great importance to the training of a robust deep learning network due to a large number of parameters. However, there are few image data and corresponding labels available for glacier SAR image matching. Therefore, a self-sample learning method is proposed to address these problems. The purpose is to generate training images with matching labels. In general, an image patch is matched with itself and its augmentation versions, and it is not matched with image patches in other locations of the image and their augmentation versions. As shown in Figure 2a, an image is divided into image patches with a size of 64 × 64 pixels. On the right of Figure 2a, patch A and patch B are selected as an example. Patch A is matched with itself and not matched with patch B. Figure 2b,c show some examples of augmentation versions of patch A and patch B, respectively. There are different data augmentation methods, such as image scaling, displacement, rotation, flip and shear transformations. The method is introduced to produce a large number of labeled SAR image pairs. With this data generation process, we are able to train the DMN without collecting new data.

Deep Matching Network
A deep matching network is proposed to extract deep features from glacier SAR intensity images and perform accurate matching. Inspired by the significant superiority of jointly learning feature representation and similarity measurement [29,32], the DMN adopts a unified framework, including a feature extraction network and a metric network.
Due to the large size of glacier SAR intensity images, we carry out the patch-based image processing and use deep convolutional neural networks (DCNNs) for feature extraction, which can retain the connection between the neighborhood and the local features. Compared with fully connected networks, DCNNs have fewer network parameters. However, the deeper a network is, the more likely vanishing gradients and overfitting will occur. Dense connection blocks (DCB) are introduced to alleviate this problem, which establish the connection between the subsequent layer and all the previous layers of each dense block and allow the features to be fully utilized within each dense block, achieving excellent performance with fewer network parameters [33]. Thus, we will combine deep convolutional neural networks with dense connection blocks (DCNNs-DCB) to extract complex features from image patches. The metric network learns to compare the similarity of the feature representations through several fully connected layers. The DMN establishes the relationship between the image patch-pairs and the labels through supervised training.
Suppose that the glacier SAR intensity images to be matched are expressed as X and Y. The image patch-pairs are acquired as {(x i , y i ); i = 1, 2, · · · , K}, where K is the number of patch-pairs. As shown in Figure 3, these patch-pairs are considered as the input of DMN. The feature extraction network takes advantage of Siamese network architecture to promote training and testing efficiency. Each branch introduces DCB to improve information transmission. For the k-th layer of each dense connection block, the feature maps obtained by all previous layers with the same spatial size are used as its input. It can be expressed as represents concatenating the feature maps of layers 0, 1, · · · , k − 1 and S(·) refers to the combination of frequently used nonlinear operations. As shown in Figure 3, the j-th channel of the l-th layer f l j is denoted as follows, where f l−1 i is the i-th output of the l − 1-th layer, S l−1 represents the number of feature maps at the l − 1-th layer, ω and b represent the weight and bias of the network, * denotes the convolution operator, and σ(·) adopts the ReLU activation function, which can be calculated by σ(x) = max(0, x).
The multiple inputs of S(·) are concatenated into a tensor. Table 1 gives the details of the network structure. Conv is the abbreviation of convolution layer and DCB represents dense connection block. Pooling layers are used to reduce the dimensionality of each feature map, commonly including max-pooling and average-pooling.
The metric network is used to compare the similarity between two deep features, which is determined by the Euclidean distance. During the network training, the distances between matched or similar inputs are minimized.
In the training process, we choose appropriate training patch-pairs and their labels to carry out joint end-to-end training based on stochastic gradient descent. The goal is to minimize the contrastive loss [34], where Label is the matching label to the image patch-pair x and y, which are valued as 1 or 0. Label = 1 means that the image patch-pair is matched, and Label = 0 if they are nonmatched.x andŷ are denoted as the learned features of patch x and y, respectively. m > 0 indicates the threshold value, which represents the largest distance between dissimilar features. N represents the number of samples. D W is the Euclidean distance betweenx and y, D W (x,ŷ) = x −ŷ 2 .

Subpixel Displacement Estimation and Post-Processing
Accurate image patch matching and pixel-level displacement can be achieved through DMN. However, it is not adequate for precise and compact displacement estimation, which relies on the quality and resolution of SAR intensity images and application requirements. Various methods have been developed to estimate the displacements with non-integer accuracy. The peak centroid is a straightforward and effective method to acquire the subpixel peak location in the correlation surface [9]. Firstly, two sub-images with a size of 64 × 64 pixels are taken around the pixel-level matching position. Secondly, the position of the correlation peak is searched again. Then, a subpixel position ( a, b) of the correlation peak is obtained by calculating the weighted average of the correlation values.
where a and b are the column and line indexes of Ω. Ω represents a neighborhood of the correlation peak. The size of Ω is set as 5 × 5 pixels. C(a, b) represents the correlation values between a and b, which can be obtained by calculating the cross-power spectrum.
The correlation values below the mean values in the neighborhood Ω are excluded. Due to the inherent defects of SAR images and the scene conditions of glacier motion, dense matching inevitably has some outliers. For example, regions outside the glacier which are not part of the glacier motion monitoring areas are stationary. Therefore, it is necessary to remove the outliers using post-processing methods. According to the neighborhood consistency of each velocity point, we take advantage of a mean filter with a window size of 3 × 3 pixels to remove the estimated results outside the glacier areas and correct the results of abrupt changes in value and direction with the surrounding velocity points.

Results
Both simulated SAR images and real SAR intensity images are used to evaluate the performance of the proposed method quantitatively and qualitatively. Section 3.1 introduces the training data and experimental settings. Section 3.2 describes the generation process of simulated SAR data and quantitatively compares the proposed method with other methods. Section 3.3 introduces the real SAR intensity images and measures the surface motion of real SAR image data from the Taku Glacier and the Yanong Glacier.

Training Data and Experimental Settings
In this paper, we choose a high-resolution remote sensing image with texture details for training, as shown in Figure 4. Because SAR intensity images used for glacier motion monitoring have already been orthorectified and initially registrated, precise displacement is of great importance. The size of training image is 512 × 512 pixels. We take image patches with a size of 64 × 64 pixels in both x and y directions with an interval of 8 pixels, and we collect training samples with matching labels following the self-sample learning method described in Section 2.1. For each training sample, a random data augmentation pipeline with a probability of 50% is used to obtain three different data augmentation samples. The random data augmentation pipeline includes Gaussian blur filtering with coefficients varying from 0 to 0.5, image rotation with the factor varying from −5 • to 5 • , and image contrast enhancement with the factor varying from 0.5 to 1.5. Finally, all the training samples are added with random multiplicative noise with a variance of 0.1. By using the self-sample learning method mentioned in Section 2.1, we consider that a training sample is matched with its three augmentation versions, and it is non-matched with three randomly selected samples at other locations or their augmentation versions. In this way, a total of 19,494 pairs of training samples are obtained, which were half matched and half non-matched. During the training process, 70% of training samples are randomly chosen as the training dataset and the rest are chosen as the validation dataset. We repeat the training process five times. All the models are trained using the Adam optimizer [35] with an initial learning of 10 −3 for 50 epochs and a batch size of 64. The weight decay is set to 10 −10 . The threshold value m of contrastive loss is set to 1.

Simulated SAR Data
In this paper, simulated SAR data are generated using optical image data to verify the effectiveness of the proposed method and provide quantitative comparisons with other methods. The generation mode follows the process described in [36]. SAR images have inevitable multiplicative noise following a Rayleigh distribution. Thus, such noise is multiplied with the optical images to produce simulated SAR images. Three different optical images are designed for the performance tests, as shown in Figure 5a-c. The three scenes are denoted as image A, image B and image C, respectively. Firstly, the simulated SAR images are achieved by multiplying random noises with the optical images. To verify the performance of DMN in different noise environments, the mean value of noise remains 1, and the variance increases from 0.1 to 1 with steps of 0.1. Two simulated SAR images are obtained by randomly adding the same noise variance to an optical image. Then, a subpixel displacement is performed on one of them based on the interpolation method, so a pair of simulated SAR images is obtained. In the experimental settings, the displacements in the x-direction range from 0.25 to 2.5 pixels with an increment of 0.25 pixels, while the displacement in the y-direction is fixed to 2.5 pixels. Therefore, for the same optical image, 100 simulated SAR image pairs with different displacements and speckle noises can be obtained. Figure 5d-f show some examples of the simulated SAR images.

Experiment Results
Root mean squared error (RMSE) is utilized to evaluate the change degree of the data and measure the average error. In our experiments, Centroid_NCC, Centroid_OC, SimiFit_NCC, SimiFit_PC and Upsampling are applied as comparisons to evaluate the performance, which adopt different combinations of similarity measurement and subpixel matching methods and are described in [37]. Centroid_NCC uses NCC to calculate the correlation functions of image patches in the search window and obtains subpixel displacement using the peak centroid method, which is implemented in the ImGRAFT toolbox [38]. Centroid_OC differs from Centroid_NCC in the utilization of OC (Orientation Correlation) [39], which takes advantage of orientation images to measure cross-correlation in the frequency domain. SimiFit_NCC performs a quadratic curve fitting in the x and y direction on the NCC correlation values to estimate the subpixel displacement [40]. SimiFit_PC searches the subpixel peak positions of PC correlation from the initial peak and its neighborhood, according to the 1D peak fitting model [41]. For upsampling, subpixel displacement is measured by resampling the PC similarity values around the peak to a higher resolution based on discrete Fourier transform [42].
In all methods, the image patch size is 64 × 64 pixels, and the size of the search window is 128 × 128 pixels with steps of 1 pixel. In order to avoid errors in comparison results due to null values of non-matched patches, we make an assumption that if there is no matching patch in the search window, the distance is set as the farthest distance in the search window. This assumption holds only for a quantitative comparison of simulated SAR images. In practice, a null value is assigned to the result when no matching patch is found.
For each scene as shown in Figure 5, we randomly conduct five network training and obtain five trained models, which are applied to 10 pairs of simulated SAR images under each noise level to obtain the average results. Figure 6 shows the average results of different matching methods on the simulated SAR images of three scenes in terms of RMSE with various noise variance levels. It can be seen that the estimation accuracies of all the matching methods tested reduce along with the increase of multiplicative noise variances. The results of SimiFit_PC and Centroid_OC fluctuate greatly with the change of speckle noise. Centroid_OC has the worst results on three scenes. A large mutation value indicates that no matching patch is found within the search window due to the influence of speckle noise. Centroid_NCC, SimiFit_NCC and Upsampling show better results than them. However, the proposed method proves the best performance in processing different simulated SAR images, and it is stable under various noise levels. It indicates that the proposed method is less affected by speckle noise than other methods. Table 2 shows each training time for five network trainings and the average testing time of applying five trained models to each scene for 10 image pairs with different noise variance. It can be seen that the average training time for each model is 272.4 s and the average testing time is 457.7 s. The matching process is conducted in a search window at an interval of 1 pixel during testing, which requires more testing times.

Real SAR Data
In this paper, we take the scenes of the Taku Glacier from TerraSAR X-band (TSX) as the main study area, and we apply the proposed method to the images of the Yanong Glacier from Sentinel-1 C-band (S1C) for further verification.
The Taku Glacier is one of 19 notable glaciers in the Juneau Icefield, winding through the mountains of southeastern Alaska. Figure 7 shows the geographic location of the Taku Glacier in the Juneau Icefield. It is of great importance to monitor the motion of the Taku Glacier since it is the only advancing glacier, which can provide insights into trends in the glaciers of southeastern Alaska. TSX radar orbits the Earth at an altitude of 514,000 m, collecting radar data day and night using an active antenna. Regardless of weather conditions and cloud cover, it can capture images with spatial resolution up to 1 m × 1 m in spotlight mode. The real glacier SAR data used in this paper were captured by TSX radar from the ascending pass direction with 11-day intervals, which were acquired in strip-map mode (SM) and HH polarization channels. The spatial resolution of the range image is 3 m × 3 m, which is resampled to 2.09 m × 2.09 m after the correction of topographic relief. Due to the relative high resolution, TSX is the primary data source for glacier motion monitoring. Figure 8 shows the real TSX images of the Taku Glacier in 2009, where Figure 8a was captured on July 11 and Figure 8b was acquired on July 22. Their size is 5248 × 5248 pixels. Due to the short observation time interval, the two SAR intensity images look similar, and the visible differences are highlighted in the images. They are snow-covered areas and the areas near the edge of the mountain.  The Yanong Glacier is the largest developed glacier in Gangrigabu Mountain of the southeast of Qinghai-Tibet Plateau, as shown in Figure 9. The S1C images of the Yanong Glacier used in this paper were imaged in interferometric wide swath mode (IW) from the ascending pass direction with 24-day intervals, which were obtained in VV polarization channels. After the topographic relief process, the resolution is resampled to 14.1 m × 14.1 m. Figure 10 shows the real S1C images of the Yanong Glacier, where Figure 10a was captured on 23 November 2016, and Figure 10b was acquired on 3 February 2017. Their size is 1152 × 1152 pixels.

Experiment Results
The real SAR intensity images are preprocessed by means of Range Doppler terrain correction, with an external digital elevation model employed. In addition, they are initially registered to cause the image coordinates to be aligned. The training samples are generated as described in Section 3.1, and the trained model is directly applied to real glacier SAR image matching. The TSX images shown in Figure 8 are cut into patches with a size of 64 × 64 pixels. For each patch in Figure 8a, the DMN is performed with steps of 1 pixel in the search window of 96 × 96 pixels centered on the image patch in Figure 8b. When a matched patch is obtained, the subpixel displacement is estimated by using a peak centroid-based method. To refine the outcome of DMN, a post-processing based on the neighborhood consistency which is described in Section 2.3 is applied to eliminate the outlier and matching errors. For simplicity, the geographical and scale information of real SAR images are not provided in the experimental results.
The final result is shown in Figure 11, where the colors of the velocity field map indicate magnitudes of glacier motion velocity and the direction of arrows represents the glacier motion direction. The length of arrows also indicates the relative magnitudes of velocity. The velocity field map shows the typical mode of glacier surface motion, where the glacier in the center has higher velocity, and the glacier near the boundary moves more slowly. It also reveals the motion trend of the Taku Glacier during a period of 11 days in the summer of 2009 and provides the precise values. In general, the glacier surface moves from the north (the upper part of the study area) to the south (the lower part of the study area). Two magnified motion fields in the right of Figure 11 show the complex and large glacier motion, respectively. It can be seen from motion field I that the glacier surface motion turns to different directions and magnitudes in the bend of the glacier. The velocity magnitudes of motion field II near the glacier's outlet are larger than other areas, and their velocity directions remain consistent. Figure 11. Velocity estimation of glacier surface motion in the Taku Glacier. Region I and II are given in the glacier areas with the most complex and largest movements; where the glacier diverges in region I and region II is close to the glacier's outlet. The red arrows indicate the relative velocity and direction of glacier motion.
In addition, qualitative comparisons with other methods are made on the Taku Glacier and the Yanong Glacier. The results are obtained without outlier removal, as shown in Figure 12 and Figure 13, respectively.
For the motion estimation of the Taku Glacier, except for the SimiFit_PC method, the results obtained by other methods are relatively consistent in the motion of the glacier body. The areas in yellow boxes of Figure 12 are covered by snow. It is difficult to extract the information of glacier motion in these areas, leading to it being a little messy in the velocity field map. Centroid_NCC, SimiFit_NCC and Upsampling obtain similar estimation results. However, there are some places with no velocity in the snow-covered areas, which are not covered by the color representing velocity magnitude. It indicates that no matching patches are captured in the search window. The result for Centroid_OC also has non-matched patches in other areas besides the yellow boxes. The proposed method obtains matching values in these areas, even if the values look a bit messy. It can be seen from the final results after outlier removal as shown in Figure 11 that some of these values are outliers and some of them contribute to the motion estimation.   Similar conclusions can be drawn from the estimation of the Yanong Glacier shown in Figure 13. According to the law of glacier flow, glacier velocity is relatively fast in the main body and outlet of the glacier and slow at the edge. However, SimiFit_PC results in a messy velocity field at the edges as well. It can be seen from the enlarged part of the boxes in Figure 13 that other methods have some non-matched image patches except for the proposed method. The result of the proposed method is not ideal. Because we only use a high-resolution remote sensing image as shown in Figure 4 for training data and SAR intensity images of the Yanong Glacier have low resolution, a large difference between the test images and training data leads to poor results. However, the proposed method presents the impact in the velocity direction at the intersections between glacier branches and the main body, which is consistent with the law of glacier motion.

Discussion
As can be seen from the experimental results given in Section 3, the proposed method can obtain more precise estimation than other traditional methods in the quantitative analysis of simulated SAR images. Meanwhile, a qualitative analysis is made for the motion estimaton of the Taku Glacier and the Yanong Glacier. As aforementioned in Section 3.3.1, the SAR intensity images of the Taku Glacier and Yanong Glacier were acquired by Ter-raSAR X-band (TSX) and Sentinel-1 C-band (S1C), respectively. In general, the longer the wavelength, the stronger the penetration ability, which will affect the detection performance of a glacier covered by floating soil. For marine applications, the HH polarization of the L-band is more sensitive, while the VV polarization of the C-band is better. However, it is noted that the results obtained by TSX images are clearer and have less errors than those obtained by S1C images. In fact, the Yanong Glacier is covered with soil, while the Taku Glacier is not. Theoretically, a C-band should be more advantageous in the case of soil covering, but the proposed method is performed on intensity images, and the image resolution has a greater impact on the motion estimation results. Matching errors not only occurs in the areas where features cannot be extracted (such as snow-covered), but they also are related to the quality of the SAR images obtained. However, the proposed method can still find matched patches in these regions and estimate the glacier's motion velocity.
From the quantitative measurement on simulated SAR images and qualitative comparisons on real SAR intensity images, it can be seen that the glacier motion monitoring based on deep learning proposed in this paper can extract deep features from SAR intensity images with speckle noise for accurate and dense matching. However, there are some limitations as follows.

•
As shown in Table 2, the proposed method takes a number of times to conduct model training and testing, because the deep learning method uses a large amount of training samples to acquire prior information for image matching. • A deep learning network requires a mass of parameters to be trained and adjusted, which is prone to over-fitting and under-fitting. • The proposed method shows better results than other traditional methods for images with complex textures, since the training data in this paper are high-resolution remote sensing images with rich texture. Therefore, the selection of training data affects the results of the proposed method. Increasing the diversity of training data can improve the performance of the deep learning model. Therefore, in order to obtain a robust deep learning network, there is a certain relationship among the model complexity, number of training samples and computation cost, which should be considered according to practical applications.

Conclusions
In this paper, we apply deep learning technology to glacier surface motion monitoring and propose a deep matching network for SAR image matching, which can capture the complex structure and deep information of SAR image data. A self-sample learning method is introduced to solve the lack of training samples and makes the network model robust with strong generalization. It can obtain better SAR image matching results compared with other matching algorithms even under certain speckle noise, resulting in precise and reliable glacier velocity estimation. The proposed method is validated by simulated SAR and real glacier SAR intensity images. The experimental results have confirmed that deep learning creates a new frontier for high-precision and long-term glacier motion monitoring using SAR intensity images.
Meanwhile, we also find some problems that need to be further improved during the process of experimental analysis. Firstly, taking into account practical application, we will concentrate on the actual situation of SAR image acquisition and conduct more experiments on real glacier SAR images in order to analyze which SAR images are more suitable for glacier motion monitoring. Secondly, for the application of deep learning in glacier motion monitoring, we will enhance the matching precision in the case of large speckle noise and perform the subpixel matching based on deep learning. More efficient deep feature learning models will be considered to reduce the computation costs. We will also conduct in-depth studies on nonlinear problems in the glacier motion domain in the future.