A Deep-Learning-Facilitated, Detection-First Strategy for Operationally Monitoring Localized Deformation with Large-Scale InSAR

Abstract: SAR interferometry (InSAR) has entered the big-data era, particularly benefitting from the acquisition capability and open-data policy of ESA's Sentinel-1 SAR mission. A large number of Sentinel-1 SAR images have been acquired and archived, allowing for the generation of thousands of interferograms covering millions of square kilometers. In such a large-scale interferometry scenario, many applications actually aim at monitoring localized deformation sparsely distributed in the interferogram. It is therefore not effective to apply time-series InSAR analysis to the whole image and identify the deformed targets from the derived velocity map. Here, we present a strategy facilitated by deep learning networks to first detect the localized deformation and then carry out the time-series analysis on small interferogram patches containing deformation signals. Specifically, we report follow-up studies of our proposed deep learning networks for masking decorrelation areas, detecting localized deformation, and unwrapping high-gradient phases. In the applications of mining-induced subsidence monitoring and slow-moving landslide detection, the presented strategy not only reduces the computation time but also avoids the influence of large-scale tropospheric delays and unwrapping errors. The presented detection-first strategy introduces deep learning into the time-series InSAR processing chain and makes operational monitoring of localized deformation feasible and efficient for large-scale InSAR.


Introduction
Synthetic Aperture Radar interferometry (InSAR) has entered the big-data era, particularly after the launches of ESA's Sentinel-1 satellites [1,2]. Given the acquisition capability and open-data policy of the European Space Agency (ESA), a large number of SAR images have been acquired and archived. We are currently working towards large-scale interferometry for measuring surface deformation on a continental scale [3][4][5][6][7][8]. Meanwhile, such a large number of images requires increasingly more computational resources for a full set of InSAR and time-series InSAR analyses, e.g., [2,[9][10][11][12][13]. For example, thousands of Sentinel-1 SAR images need to be processed for a long-frame track across the whole Tibetan Plateau, and this number is still rapidly increasing. In the near future, with the launches of new-generation SAR satellites, e.g., the NISAR mission [14], an efficient processing strategy is highly desired.
For large-scale InSAR applications, we are generally interested in two types of signals. One is long-wavelength deformation, e.g., earthquake-cycle deformation along active faults [15][16][17][18], deformation resulting from mass redistribution [19][20][21], and deformation related to the discharge and recharge of aquifer systems [22][23][24]. The other is localized deformation sparsely distributed over a large area, e.g., from mining activities [25,26], slow-moving landslides [27][28][29], sinkholes [30][31][32][33], infrastructure [34][35][36], etc. For the former, InSAR measurements from different tracks and frames need to be aligned to a geodetic reference frame, based either on an available GNSS network [4,37] or on an area assumed to be stable. Meanwhile, the ionospheric and tropospheric delays need to be corrected, or at least largely eliminated. Interferograms with large-scale unwrapping errors have to be identified and removed from the processing chain [2,38]. Given the abovementioned processing requirements, the derived velocity maps are often highly smoothed to enhance the long-wavelength deformation, e.g., [39].
On the other hand, many end users are interested in localized deformation, particularly that related to natural hazards. In the related applications, time-series InSAR analysis is implemented by retrieving the deformation series from multiple SAR images. However, these targets, e.g., slow-moving landslides, are sparsely distributed in mountainous areas and are often associated with low coherence and strong tropospheric turbulence. The associated phase noise makes phase unwrapping very challenging. Moreover, referring all the measurements to a consistent geodetic frame is very difficult when the processing area is large, e.g., [3,28]. In fact, the disaster-bearing body moves relative to its surrounding material, so we only need to refer the deformation to a nearby coherent target, or to an area with distributed targets, that can be assumed stable. Then, we can conduct the phase unwrapping, tropospheric correction, and time-series retrieval on a local scale. If well programmed, many similar local-scale analyses can be performed in parallel for a long-frame dataset on a distributed computational facility. To fulfill this strategy, the first step is to detect and locate, ideally from wrapped interferograms, the localized deformation on which the limited computational resources can be focused.
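As a minimal illustration of such local referencing (our sketch with a hypothetical patch and reference window, not the paper's implementation), the wrapped phase of a patch can be tied to a nearby stable area by subtracting the angle of the complex mean over that area:

```python
import numpy as np

def rereference_patch(phase, ref_slice):
    """Re-reference wrapped phase (radians) to a local, assumed-stable area.

    The reference phase is the angle of the complex mean over the reference
    window, which is robust to wrapped values; the result is re-wrapped to
    (-pi, pi].
    """
    ref = np.angle(np.mean(np.exp(1j * phase[ref_slice])))
    return np.angle(np.exp(1j * (phase - ref)))

# Toy patch: a localized signal of 2 rad on top of a constant 1 rad offset.
patch = np.full((64, 64), 1.0)
patch[20:30, 20:30] += 2.0
ref_window = (slice(0, 10), slice(0, 10))  # corner assumed stable
out = rereference_patch(patch, ref_window)
```

After re-referencing, the assumed-stable corner reads zero and the localized signal keeps its 2 rad amplitude, without any global geodetic frame being involved.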
The interferogram, a combination of phase delays resulting from deformation, water vapor, and decorrelation noise, is represented as an array of complex numbers. The phase component has to be correctly unwrapped to be meaningful for a programmed feature extractor, e.g., a threshold or a clustering criterion. Directly extracting features from wrapped interferograms is very challenging, because the mapping between the wrapped phases and the real signal is extremely nonlinear. Moreover, phase unwrapping is an NP-hard problem, often associated with non-Gaussian unwrapping errors that are difficult to correct, particularly in an operational InSAR processing system.
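The nonlinearity of the wrapping operator can be illustrated with a short NumPy sketch (ours, not part of the original processing chain): a linear deformation ramp becomes a sawtooth after wrapping, so a simple threshold on the wrapped values can no longer isolate the large-deformation end.

```python
import numpy as np

def wrap(phi):
    """Wrapping operator W(phi) = angle(exp(i*phi)), mapping into (-pi, pi]."""
    return np.angle(np.exp(1j * phi))

# A linear deformation ramp spanning three fringes (6*pi radians)...
phi = np.linspace(0.0, 6.0 * np.pi, 600)
wrapped = wrap(phi)

# ...becomes a sawtooth: the wrapped values revisit the same interval every
# 2*pi of true phase, so no threshold on `wrapped` separates large from
# small deformation. This is the nonlinearity that a feature extractor
# operating directly on wrapped interferograms must cope with.
```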
Deep learning represents a set of representation-learning approaches using a hierarchical architecture, usually consisting of simple but nonlinear modules [40]. Supervised deep learning models require a large number of labeled samples for training. During the training, these models continuously adjust their internal parameters through forward and backward propagation to learn high-level feature representations of the dataset and build input-output mappings. A neural network is iteratively trained to build the input-output mapping from a multilayer stack of simple modules by looking at many labeled samples. This procedure allows the learner to generalize well beyond the training samples, and thus to resolve the so-called selectivity-invariance dilemma [40].
Given the promising capability of deep learning networks in analyzing natural data, we propose a strategy to directly extract information from wrapped interferograms or the stack of their gradients. This can be done by training different deep learning networks to facilitate the time-series InSAR analysis. In the following sections of the paper, we first introduce and review the widely used deep learning architectures for different InSAR applications. Then, we report on follow-up applications of the networks we proposed in previous studies, namely in interferogram masking, localized deformation detection/localization, and high-gradient phase unwrapping [41][42][43]. In this paper, we focus on validating the generalization ability of these published networks when applied to different datasets, sites, and scenarios. We also discuss their implications and possible future developments in adopting deep learning methods to achieve an operational deformation monitoring system with large-scale InSAR.

Architectures of Deep Learning Networks in InSAR Applications
Based on their tasks, we categorize the deep learning networks in InSAR applications into four types, namely classification/identification, segmentation, target detection, and time-series analysis. They have different network architectures and input-output dimensions.
For classification or identification, we typically train the network to classify the whole input, e.g., an image or time-series data, into several categories. The framework of classification can be found in Figure 1. Users care more about the category information than the spatial information of the output. The spatial dimension of the output is always one by one, and the number of output channels is the number of categories; thus, the output of a classification network is a one-dimensional vector. Typical classification networks are the VGG Net [44] and ResNet [45], which are often applied for abnormal signal classification in InSAR studies. For instance, Schwegmann et al. used a convolutional neural network (CNN) to identify subsidence features from both the wrapped interferogram and the unwrapped displacement map [46]. They employed a nine-layer network to separate salient subsidence from deformation, i.e., false positives in the displacement maps [46]. Mirmazloumi et al. presented a CNN to distinguish five types of deformation series, namely stable, linear, quadratic, bilinear, and unwrapping errors, from interferograms over the Granada region in Spain [47].

Different from classification, for segmentation applications users care about both the spatial information and the channel information, meaning that the output is a tensor. The spatial resolution of the output tensor is mostly the same as that of the input, and the number of output channels varies with the task. For example, if a framework is applied to make a pixel-level segmentation of the input, the number of output channels equals the number of segments. If the network is applied for regression, the number of output channels is set to one. The framework of segmentation can be found in Figure 2.
Representative networks are the encoder-decoder series, e.g., the UNet [48], which can effectively integrate features from shallow and deep layers to make accurate and dense predictions. Segmentation networks are widely applied in InSAR data processing, such as phase unwrapping, denoising/filtering, DEM generation, and deformation extraction.
Liang et al. proposed the SA-UNet, which employs an encoder-decoder structure and a spatial attention module to perform phase unwrapping with a synthetic dataset. Experiments showed higher accuracy and improved robustness compared with traditional phase unwrapping methods [49]. Anantrasirichai et al. trained the AlexNet to extract volcanic deformation. They found that training with synthetic interferograms improved the capability of the neural networks to detect volcanic deformation in satellite imagery [50].
Costante et al. proposed an encoder-decoder architecture for obtaining DEMs from a single-pass image acquisition. This network extracts high-level features from the SAR image in the encoder part and then reconstructs a full-resolution DEM through the decoder part [51]. Vitale et al. adopted a multi-objective neural network for InSAR phase denoising. Evaluations showed that the proposed solution was robust to different noise levels, leading to good noise suppression and phase-variation preservation [52].
For detection, the network needs to predict the approximate location of a certain target along with the probability of it being that target. Its outputs are the corner coordinates of a bounding box and the category index of the detected target. The detection framework is presented in Figure 3. Two kinds of detection frameworks are often applied, i.e., the RCNN series [53][54][55] and the You Only Look Once (YOLO) series [56][57][58]. They are often employed to automatically detect hazard-related signals from InSAR data, e.g., deformation related to volcanic activities, earthquakes, landslides, mining activities, etc.
Guo et al. combined the Small Baseline Subset InSAR (SBAS-InSAR) method and the YOLO network to detect landslides in high-mountain areas along the Yunnan-Myanmar border. The results showed that landslide detection rates could be improved through the fusion of images acquired by the Sentinel-1 SAR and Gaofen-2 optical satellites [37]. Yu et al. proposed a lightweight model based on a modified YOLO-V5 network for detecting mining-induced subsidence funnels from interferograms. The experiments showed that the model could effectively identify localized subsidence in mining sites with less memory consumption and higher accuracy [59]. Li et al. proposed the SeMo-YOLO to address the challenges of multi-scale object detection and real-time monitoring with high accuracy. They showed that the model could achieve significant improvements in multi-scale target detection in near real time [60].
For extracting time-related information, deep learning frameworks are designed to learn contextual information from time-series data. Representative networks are the Recurrent Neural Network (RNN) [61], Long Short-Term Memory (LSTM) [62], and the Transformer [63]. These networks can be combined with other deep learning frameworks for different tasks, such as damage mapping, change detection, and deformation prediction. The framework of time-series analysis can be found in Figure 4.
Stephenson et al. trained an RNN to predict the probability of coherence between pre-event and post-event SAR images. The difference between the forecast and the observed co-event coherence provided a confidence measure of the damage, which could be used to detect anomalous changes of surface properties due to a natural disaster [64]. Kulshrestha et al. used a two-layered bidirectional LSTM model to classify and detect sinkholes from InSAR-derived time-series deformation on coherent scatterers [65]. Liu et al. developed heterogeneous long short-term memory (HLSTM) networks for large-scale land subsidence prediction with InSAR data. The results showed that the proposed method achieved the highest prediction accuracy compared with other methods [66]. Wang et al. proposed an InSAR deformation prediction system based on the Transformer network for predicting time-series deformation around salt lakes. This method achieved improved performance in predicting permafrost deformation trends compared with other models [67].
To summarize, deep learning methods have been widely introduced, adopted, and applied in the InSAR community. The two driving forces, big data and the rapid development of graphics processing units (GPUs), keep pushing the developments in deep learning forward. In the following sections of the paper, we report on our efforts aimed at building a deep-learning-facilitated operational InSAR system for monitoring localized deformation.

Results
In this section, we present three networks that have been applied in time-series InSAR analysis. Details on the network architectures, training datasets, and result validation have been published in [41][42][43]. Here, we review the basic concepts behind constructing these networks and present their performance in real applications. The main focus of this work is to show the generalization ability of the proposed networks when applied to new datasets and new sites, and to discuss their implications and possible future improvements.

Applications to Interferogram Masking
Generating a reliable mask for an interferogram is important for almost all InSAR applications. Manually drawing masks is the most precise approach, but it is very time-consuming and would be infeasible when thousands of interferograms must be processed in a time-series analysis. Setting a mean-coherence threshold and/or an amplitude-dispersion threshold can be efficient, but may lose many targets that maintain high coherence only in some interferograms. For single-interferogram applications, such as DEM generation and coseismic deformation studies, setting a coherence threshold is difficult due to the overestimation effect in low-coherence areas [68]. Therefore, although generating a good mask for an interferogram is a simple binary segmentation task, a generalized decorrelation-feature discriminator is difficult to achieve. A deep learning network is suitable for this task for two reasons: (1) a deep learning network can directly learn the features of decorrelation noise from raw data, i.e., interferogram, coherence, and amplitude maps; (2) for such a binary segmentation task, well-developed network architectures from the computer vision community can be adopted.
Masking an interferogram is a typical segmentation task for deep learning, i.e., classifying each pixel of the interferogram into two categories: decorrelated noise and valid phase. Because interferograms, coherence maps, and amplitude maps all provide useful information for identifying completely decorrelated areas, we stack them to form a three-dimensional array as the input. We adopted the widely applied encoder-decoder framework of the UNet [48] to build the main architecture of the Mask Net. Here, we used the SK-ResNet18 as the encoder and modified the decoder by replacing the 3-by-3 convolution kernels with Res-SK Blocks [43]. This modification improves the convolution kernel's capability to integrate different reception fields during the up-sampling procedure, which is beneficial for achieving a more reliable prediction.
We used TerraSAR-X interferograms with manually drawn masks, rather than simulated interferograms, to train the network. This is because a coherence threshold would still be required to determine which parts of a simulated interferogram are completely decorrelated. Coherence is defined as the measure of similarity between SAR complex signals and is estimated by spatially averaging the complex pixels in a moving window [68]. Therefore, in a real coherence map, the boundary of a decorrelated area is often blurred due to the moving-window effects. In our previous work [43], we only drew masks on two TerraSAR-X interferograms: one covering the Kilauea volcano in Hawaii, and the other covering Wuhan city in China. After training, the Mask Net showed an excellent performance in generating reliable masks for various TanDEM-X interferograms covering mountainous regions, cities, and plains with decorrelated areas due to water bodies, shadowing effects, and dense vegetation.
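The moving-window coherence estimator described here can be sketched in NumPy (an illustrative, unoptimized implementation of the standard sample-coherence estimator, not the authors' production code); the boxcar averaging is also what blurs the boundaries of decorrelated areas in real coherence maps:

```python
import numpy as np

def coherence(s1, s2, win=3):
    """Sample coherence of two co-registered SLC images.

    gamma = |<s1 s2*>| / sqrt(<|s1|^2> <|s2|^2>), where <.> is a
    win x win boxcar average over the moving window.
    """
    def boxcar(a):
        # Simple moving average via zero-padded shifts (edge-padded).
        pad = win // 2
        ap = np.pad(a, pad, mode="edge")
        out = np.zeros_like(a)
        for i in range(win):
            for j in range(win):
                out += ap[i:i + a.shape[0], j:j + a.shape[1]]
        return out / (win * win)

    num = np.abs(boxcar(s1 * np.conj(s2)))
    den = np.sqrt(boxcar(np.abs(s1) ** 2) * boxcar(np.abs(s2) ** 2))
    return num / np.maximum(den, 1e-12)
```

Identical signals give coherence 1; independent noise gives a small but positive value, the well-known estimation bias that makes a single global threshold unreliable.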
Surprisingly, with the network trained on only two TerraSAR-X interferograms, the Mask Net also performed well on Sentinel-1 interferograms with completely different wavelengths and resolutions. As shown in Figure 5, the Mask Net generated reliable masks for Sentinel-1 interferograms of the Tibetan Plateau and of the plain in southeast China. We also produced masks by setting coherence thresholds determined by the classic Otsu method, which separates the coherence values into two classes by maximizing the between-class variance [69]. Note that the high coherence values were attributed to the short temporal baselines of the two interferograms, i.e., 6 days and 12 days for the Taihu Lake and Tibetan cases, respectively. The results showed that the masks generated by the deep learning method clearly followed the boundaries of water bodies without leaving isolated pixels inside them. This follow-up application to Sentinel-1 interferograms demonstrates the strong generalization ability of the Mask Net in distinguishing completely decorrelated phases from interferometric fringes with high and moderate coherence.
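For reference, the Otsu threshold used for the coherence-based comparison masks can be computed from the coherence histogram; the following NumPy sketch (ours) implements the standard between-class-variance criterion:

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Otsu's method: the threshold maximizing the between-class variance.

    Works on values in [0, 1], e.g., a flattened coherence map.
    """
    hist, edges = np.histogram(values, bins=nbins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist.astype(float) / hist.sum()
    omega = np.cumsum(p)            # cumulative probability of class 0
    mu = np.cumsum(p * centers)     # cumulative mean of class 0
    mu_t = mu[-1]                   # global mean
    # Between-class variance for every candidate threshold
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return centers[np.argmax(sigma_b)]
```

On a clearly bimodal coherence histogram the threshold falls between the two modes; on unimodal histograms (e.g., uniformly high coherence from short temporal baselines) the split is less meaningful, which is one motivation for the learned Mask Net.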


Application to Slow-Moving Landslide Detection
Detecting slow-moving landslides from interferograms is difficult because the accumulated deformation is too small to produce any visible feature in a single interferogram. This is particularly the case when only short-term interferograms are formed to preserve coherence for phase unwrapping. Therefore, previous works have focused on manually flagging slow-moving landslides from velocity maps, or the gradients of velocity maps, derived from multi-temporal InSAR analysis [27,28,37,70]. However, unwrapping errors and tropospheric delays can make traditional time-series InSAR analysis difficult when extended to a large scale, e.g., multiple frames of Sentinel-1 images. Phase gradients, calculated by differencing adjacent pixels of wrapped interferograms, can enhance the features of localized deformation while suppressing long-wavelength signals. After stacking the wrapped phase gradients of multiple interferograms, the localized deformation, even when very small, can be significantly enhanced. Based on this feature, the phase-gradient stacking method has been successfully applied to identify small fractures after the 2019 Ridgecrest earthquake [71][72][73]. In these applications, the gradient of the localized deformation exhibits a coupled positive-to-negative (red-to-blue if a jet colormap is applied) pattern, making the learning and inference procedures efficient for the deep learning network.
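The phase-gradient computation avoids explicit unwrapping: differencing neighboring pixels through complex-conjugate multiplication yields the wrapped gradient directly, and any constant (long-wavelength) phase offset cancels. A minimal NumPy sketch of this idea (ours, simplified from the method described above):

```python
import numpy as np

def phase_gradient(ifg, axis):
    """Wrapped phase gradient along one axis via conjugate multiplication.

    The angle of z[k+1] * conj(z[k]) equals the wrapped phase difference
    between adjacent pixels, so no unwrapping is needed.
    """
    z = np.exp(1j * ifg)
    grad = np.angle(np.roll(z, -1, axis=axis) * np.conj(z))
    # zero the wrap-around edge introduced by np.roll
    if axis == 0:
        grad[-1, :] = 0.0
    else:
        grad[:, -1] = 0.0
    return grad

def stack_gradients(ifgs, axis=0):
    """Average the phase gradients of many short-term interferograms,
    suppressing temporally uncorrelated noise while preserving persistent
    localized deformation gradients."""
    return np.mean([phase_gradient(p, axis) for p in ifgs], axis=0)
```

A constant offset added to an interferogram (e.g., a long-wavelength tropospheric shift over the patch) leaves the gradient unchanged, which is why stacking gradients of wrapped interferograms is robust where stacking unwrapped phases is not.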
In our previous work, we combined the stacked phase gradients and a deep learning network to automatically detect slow-moving landslides from large-scale interferograms [41]. We adopted the YOLO network as the backbone architecture, as it is widely used for small-target recognition in the computer vision community [57,58,60]. Among the YOLO series, YOLOv3 was a classical, stable, and widely used version at the time of our previous work [58]. As newer YOLO versions became available, we also adopted some of their promising features, namely the mosaic data augmentation method and the CIoU loss from YOLOv4 [41]. We further utilized the CBAM and DropBlock modules from other advanced deep learning studies to improve the network's detection performance. The final Attention-YOLOv3 network was applied to detect and locate slow-moving landslides from the stacked phase-gradient maps.
Training is a main issue for localized deformation detection, because simulating the deformation of slow-moving landslides is still very challenging given the complex rheology and geometry of various types of landslides. We have to rely on real signals from the stacked phase gradients. However, the known slow-moving landslides are too few to fulfill the training requirements. We thus processed Sentinel-1 images covering several types of terrain in China, and manually labeled 712 azimuth-gradient and 581 range-gradient samples to train the Attention-YOLOv3 network. After training, the network successfully detected 3366 landslides from 349 ascending and 191 descending short-term Sentinel-1 SAR interferograms covering an area of ~180,000 km² in the western Sichuan province of China [41].
The combination of phase-gradient stacking and deep learning allows for detecting far more slow-moving landslides than traditional InSAR methods such as SBAS-InSAR or phase stacking [2,3,11,74]. However, it is very difficult, if not impossible, to demonstrate that all the targets detected by the Attention-YOLOv3 network are indeed associated with slow-moving landslides. On-site verification is even more difficult, because many small slow-moving landslides lack surface cracks and are often inaccessible. In our previous work, we verified the precision and recall rates of the network using (1) the manually labeled validation samples, (2) the velocity maps derived from the SBAS-InSAR and phase stacking methods, (3) the locations of previously published landslides, and (4) the geomorphic features from optical imagery. The validation, particularly the independently conducted cross-validation with optical imagery, showed that the Attention-YOLOv3 can detect all the slow-moving landslides that show clear velocity anomalies in InSAR-derived velocity maps, while ~30% of the detections cannot be identified as slow-moving landslides from optical imagery. In other words, our detection-first strategy gives detection a high priority at the expense of a relatively large false-alarm rate (30% in the western Sichuan case). This strategy thus guarantees the preservation of as many localized deformation signals associated with potential hazards as possible in an operational monitoring application. In the following, we report the results of applying the Attention-YOLOv3 network to the northwestern Yunnan province in southwest China.
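For reference, these rates map onto the standard precision/recall definitions; the counts below are hypothetical and only illustrate that a ~30% unconfirmed fraction among detections corresponds to a precision of about 0.7, independent of how many true targets were missed:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts: 70 confirmed detections, 30 unconfirmed detections
# (counted as false positives), and 5 missed targets.
p, r = precision_recall(tp=70, fp=30, fn=5)
# A detection-first strategy accepts the lower precision (0.7 here) in
# exchange for high recall on hazard-related targets.
```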
Northwestern Yunnan accounts for 88.64% of the total area of the Yunnan Province of China, and its landform is dominated by a mountainous plateau. The Nujiang, Lancang, Yangbi, and Jinsha Rivers flow in parallel in canyons among high mountains. The highest peak is Meili Snow Mountain, with an altitude of 6740 m, while the lowest point is only 210 m, a relief of nearly 6500 m. Precipitation is abundant, yet extremely unevenly distributed in both space and time. The mountain body is subject to external forces such as heavy rainfall and earthquakes, which often lead to mudflows, collapses, and large landslides. However, the steep terrain and dense vegetation make it difficult for traditional InSAR methods to derive reliable velocity measurements in this region.
We stacked the phase gradients calculated from about 1000 short-term interferograms (temporal baselines within 36 days) covering this region, acquired from 2014 to 2022 in both ascending and descending geometries. Because the tropospheric delays are temporally uncorrelated and spatially correlated within a distance of several kilometers, after stacking, the phase difference between adjacent pixels is likely due to localized deformation rather than tropospheric turbulence. Figure 6a shows that such localized deformation clearly forms a coupled red-blue pattern, with a positive gradient from the non-deforming area to the deformation peak and a negative gradient from the deformation peak back to the non-deforming area. Applying the Attention-YOLOv3 network trained on the manually labeled samples [41], we detected 555 slow-moving landslides in northwestern Yunnan. Notably, the localized deformation detected from the phase gradients may have resulted from other sources, such as mining activities and local subsidence in towns. To reduce the rate of such false alarms, we set a slope threshold of 10 degrees to remove detections on flat terrain. Figure 6b depicts the distribution of the detected slow-moving landslides together with the rivers and major cities in this region. It is evident that most of the detected slow-moving landslides are distributed along the large river gorges, i.e., the Lancang, Yangbi, and Jinsha Rivers. Similar to the results we obtained in the western Sichuan province [41], dense landslides have developed at the confluences of rivers, particularly in the north and middle segments of the Jinsha River.
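The slope-threshold screening can be sketched as follows (our illustration with a synthetic DEM and a hypothetical box format (row0, row1, col0, col1); the terrain-slope computation from a gridded DEM is standard, but the box representation and threshold application here are assumptions for the example):

```python
import numpy as np

def slope_deg(dem, spacing=30.0):
    """Terrain slope in degrees from a DEM via central differences.

    `spacing` is the pixel size in the same units as the DEM heights (m).
    """
    dzdy, dzdx = np.gradient(dem, spacing)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

def filter_detections(boxes, dem, spacing=30.0, min_slope=10.0):
    """Keep only detection boxes (r0, r1, c0, c1) whose mean terrain slope
    exceeds min_slope degrees, discarding flat-terrain detections such as
    mining subsidence in plains or urban settling."""
    slope = slope_deg(dem, spacing)
    return [b for b in boxes
            if slope[b[0]:b[1], b[2]:b[3]].mean() >= min_slope]
```

On a DEM split between a flat plain and a 20-degree hillside, only boxes on the hillside survive a 10-degree threshold.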
Deep learning plays an important role in the rapid detection and localization of slow-moving landslides sparsely distributed over a large area. The consistent characteristics of the detected slow-moving landslides in different regions further verify the generalization ability of our network in real applications. However, at this stage, the main limitation is that we are not able to reveal the landslide boundary from phase-gradient maps, because an anchor-based detector such as the YOLO network is applied. Nevertheless, the detection procedure requires a much smaller workload than time-series InSAR analysis of the whole region. It took around three days to obtain the detection results from the Sentinel-1 SLC images for the Sichuan case with three standard frames on ascending (145 acquisitions and 349 interferograms) and descending (97 acquisitions and 191 interferograms) tracks. Further work on building an operational slow-moving landslide detection system will focus on coherent target identification, phase unwrapping, and time-series inversion on the flagged areas that likely host moving targets.

Application to Mining-Induced Subsidence Detection with Time-Series Analysis
Monitoring mining-induced subsidence is another typical InSAR application targeting localized deformation. The rapid subsidence due to mining activities appears as enclosed, dense fringes in wrapped interferograms. This pattern makes the detection relatively easier than that for slow-moving landslides. The most challenging part is how to unwrap these dense fringes, which are often polluted by decorrelation noise.
Deep learning methods were first introduced for phase unwrapping as a segmentation problem [49,75–78]. This is an apparent and natural migration, because fringe edges, if clearly exhibited, split the wrapped interferogram into many segments. One can unwrap the phases if the deep learning network can precisely extract the fringe edges and assign the correct multiple of 2π to each segment. The U-Net is the commonly used deep learning architecture for image segmentation and was thus adopted in early phase-unwrapping applications. However, precisely extracting the fringe edges is difficult due to the downsampling procedure of the U-Net architecture. Specifically, downsampling wrapped interferograms often causes unstable solutions when the interferogram is polluted by decorrelation noise. A full-resolution encoding and decoding architecture is thus preferred. Deep learning regression can construct a direct, pixel-to-pixel mapping from the input to the output. We selected the DnCNN [79] as our backbone network because it is a well-developed CNN architecture for image denoising.
By combining the classic DnCNN [79], deep residual learning [45], and dilated convolutions [80], we designed two similar networks to detect mining-induced subsidence and to unwrap the corresponding interferograms, namely the DDNet and the PUNet [42]. Specifically, the DnCNN constructs a pixel-level mapping from the wrapped deformation to the normalized deformation and the unwrapped phases; deep residual learning avoids the vanishing-gradient problem by introducing shortcut connections during training; and the dilated convolution enlarges the receptive field of the network to include more details of the wrapped interferogram.
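As a back-of-the-envelope check of why dilation helps, the receptive field of a stack of convolution layers grows with the dilation rates (a simple calculation for stride-1 layers without pooling; it is not tied to the exact DDNet/PUNet configuration, which we do not reproduce here):

```python
def receptive_field(kernel_sizes, dilations):
    # Each stride-1 layer with kernel size k and dilation d extends the
    # receptive field by (k - 1) * d pixels.
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Three 3x3 layers: plain convolutions see 7 pixels across, while
# dilation rates of 1, 2, 4 see 15 pixels at the same parameter cost.
plain = receptive_field([3, 3, 3], [1, 1, 1])    # 7
dilated = receptive_field([3, 3, 3], [1, 2, 4])  # 15
```

A larger receptive field lets the network reason about a whole fringe pattern rather than isolated pixels, which is why dilated convolutions suit dense-fringe interferograms.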
From the perspective of training, phase unwrapping is an inherently suitable task for supervised learning, because a large number of interferograms with known unwrapped phases can be simulated as training samples. We developed an interferogram simulator for generating training datasets [42,81]. We adopted Perlin noise to mimic tropospheric turbulence and complex Gaussian noise for the decorrelation noise. A randomly distorted Gaussian surface was generated and added to simulate the mining-induced subsidence; it was then converted to the unwrapped phase for training the PUNet and to the probability of deformation for training the DDNet. The simulated interferograms, though perhaps not able to fully represent natural deformation signals, provide a perfect mapping between the input and output.
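A stripped-down version of such a simulator can be sketched as follows (a toy illustration under our own parameter choices; the actual simulator [42,81] additionally distorts the Gaussian surface randomly and adds Perlin-noise tropospheric turbulence, which are omitted here):

```python
import numpy as np

def simulate_training_pair(n=128, max_phase=25.0, noise_std=0.4, seed=None):
    """Return (wrapped, unwrapped): network input and regression target."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:n, 0:n].astype(float)
    # Subsidence bowl modeled as a Gaussian surface (phase in radians).
    cx, cy = rng.uniform(0.3 * n, 0.7 * n, 2)
    sx, sy = rng.uniform(0.05 * n, 0.15 * n, 2)
    unwrapped = -max_phase * np.exp(-(((x - cx) ** 2) / (2 * sx ** 2)
                                      + ((y - cy) ** 2) / (2 * sy ** 2)))
    # Additive phase noise stands in for decorrelation; wrapping keeps
    # the observable in (-pi, pi], exactly as in a real interferogram.
    noisy = unwrapped + rng.normal(0.0, noise_std, (n, n))
    wrapped = np.angle(np.exp(1j * noisy))
    return wrapped, unwrapped
```

Because the unwrapped phase is known exactly by construction, every simulated pair is a perfectly labeled training sample.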
When applying the PUNet and DDNet to real interferograms, convincing validation is as challenging as it is for slow-moving landslide detection. In addition to the evaluation on simulated interferograms, we used the PUNet to unwrap the 12-, 24-, 36-, and 48-day interferograms covering a detected sinking site. The phases unwrapped by the PUNet showed that the deformation was linearly correlated with time within the 48-day period, whereas the tested traditional unwrapping methods failed to unwrap the 36-day and 48-day interferograms with dense fringes, as they yielded smaller unwrapped phases than the 12-day and 24-day interferograms. We also validated the PUNet using an ALOS-2 interferogram covering the same area in the same period. With its longer wavelength, the L-band interferogram can reliably reveal rapid deformation. The PUNet successfully revealed rapid subsidence at the level of millimeters per day from the L-band interferograms.
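The unwrapped phase converts to line-of-sight displacement through the radar wavelength; for Sentinel-1 (C-band, wavelength ≈ 5.55 cm) one full 2π fringe corresponds to roughly 2.8 cm of motion. A minimal sketch of the conversion (the sign convention and constant are our assumptions, not taken from this study):

```python
import numpy as np

WAVELENGTH_S1 = 0.0555  # m, Sentinel-1 C-band (approximate)

def phase_to_los_m(unwrapped_phase, wavelength=WAVELENGTH_S1):
    # One fringe (2*pi radians) equals half a wavelength of two-way
    # line-of-sight path change, hence the factor lambda / (4*pi).
    return -unwrapped_phase * wavelength / (4.0 * np.pi)

# Line-of-sight motion represented by one full Sentinel-1 fringe, in cm:
cm_per_fringe = abs(phase_to_los_m(2.0 * np.pi)) * 100.0
```

This is why an L-band sensor (wavelength ≈ 24 cm) tolerates much larger displacement per fringe and can validate rapid C-band deformation estimates.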
We applied the DDNet and PUNet to the ascending Sentinel-1 images (September 2018 to November 2019) covering an area of about 180,000 km² of Shanxi Province, China. The dense mining activities in this region have caused many sinking sites with velocities exceeding 1 m per year. We generated sequential interferograms with 12-day temporal baselines. By applying the two networks, we detected more than 1300 localized subsidence areas. We then unwrapped the cropped interferograms centered on the detections, and converted the unwrapped phases to time series. The derived velocities ranged from tens of centimeters to 2 m per year, much higher than the values derived from interferograms unwrapped by the traditional phase unwrapping method. Unfortunately, among these detected sites, a deadly accident occurred after years of mining at the end of 2021.
At 23:00 local time on 15 December 2021, a water-leakage accident occurred in an illegal underground mining site near Duxigou village of Shanxi Province. As of 18:00 on 17 December, 20 trapped people had been successfully lifted out and sent for medical treatment. Sadly, two other trapped people were killed in this accident. As reported by the media, this mining site was constructed in 2018 and was closed in July after being denounced by local villagers. The mining site was reopened in August 2021, and the mining activity continued until this accident. Traditional time-series InSAR analysis failed to reveal a continual deformation field, likely due to the large deformation gradient (Figure 7a). We applied the PUNet to the Sentinel-1 images acquired from 2017 to 2021 to derive the deformation series in this area. As shown in Figure 7b, the velocity map clearly reveals two sinking sites near Duxigou village with a peak velocity of more than 30 cm/year. The deformation series of the site closer to Duxigou village (site B in Figure 7c) shows that the subsidence accelerated several times since 2017. Specifically, sudden accelerations occurred in the middle of 2020 and at the end of 2020, consistent with the time of the reopening of the mining site and before the accident in 2021.
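The kind of acceleration visible in the Duxigou deformation series can in principle be flagged automatically once a time series is available. A toy rate-comparison check (our own sketch for illustration, not the method used in this study) could look like:

```python
import numpy as np

def flags_acceleration(series, window=6, factor=1.5):
    # Fit a displacement rate over the most recent `window` epochs and
    # over the full record; flag when the recent rate exceeds the
    # long-term rate by `factor`. Assumes regularly sampled epochs.
    t = np.arange(len(series), dtype=float)
    long_rate = np.polyfit(t, series, 1)[0]
    recent_rate = np.polyfit(t[-window:], series[-window:], 1)[0]
    return bool(abs(recent_rate) > factor * abs(long_rate))
```

Running such a check at every update of an operational system would turn the retrospective observation of accelerated subsidence into a prospective alert.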
This emergency-response application shows the capability of deep-learning-facilitated InSAR in monitoring rapid subsidence due to mining activities, whether planned or illegal. Based on the Sentinel-1 InSAR processor (https://sarimggeodesy.github.io/software/, accessed on 27 April 2023) and the DDNet and PUNet [42], it is now feasible to build an operational system for nationwide mining-induced subsidence monitoring. We can use the archived images to detect existing sinking sites and their deformation series. The detection can then be regularly repeated, perhaps every year, with the newly acquired images. The time-series analysis can also be easily updated with the PUNet on the detected sinking areas. Such a system is much more efficient than traditional time-series InSAR analysis methods, and can operate without much human involvement.

Detection-First Strategy for the Operational InSAR System
Since the 2000s, InSAR has revolutionized the way people map surface deformation fields. Space-borne InSAR has grown from opportunistic science into a routine tool for monitoring natural hazards, such as earthquakes, volcanic activities, landslides, and ground instabilities [6]. For earthquakes and volcanoes, the area of interest is well known, and an operational monitoring system can be established to process the continually acquired SAR images covering these areas. However, for landslides and mining-induced subsidence, the areas of interest are often unknown and sparsely distributed over a large region. The traditional strategy is to process the whole set of interferograms to obtain the surface velocity map and then to identify potential hazards. This strategy suffers from two disadvantages: (1) most of the terrain in the illuminated area is actually stable, and conducting time-series analysis for the whole image thus wastes considerable computational resources; (2) the derived deformation measurements need to be referred to a geodetic frame, and during this procedure, the tropospheric and phase-unwrapping errors have to be eliminated on a large scale.
Localized deformations due to mining activity, landslides, etc., all have their own spatial features in the wrapped interferogram and can be enhanced in the stack of phase gradients. To apply a deep learning network to these phase images, we need to probe the deformation features according to the specific application. When the deformation gradient is large and the pattern is apparent in a single interferogram, we may design a regression network to detect them directly [42]. For slow-moving landslides, whose velocities are at the level of centimeters per year, a single interferogram, particularly a short-term one, is dominated by atmospheric turbulence. For such a case, phase-gradient stacking is a simple and efficient way to enhance the weak signal for the detection network [41].
The two applications presented here suggest that we can first detect and locate targets that are likely moving relative to their surroundings from wrapped interferograms. The computational resources can then be focused on these flagged areas for unwrapping and time-series analysis. In this procedure, deep learning methods have shown great performance in identifying moving targets, relying on their capability of extracting high-level information hidden in the wrapped interferograms. Deep learning plays an important role in this detection-first strategy by making the process automatic and efficient. Although this strategy is still in its infancy, and its reliability and precision still need to be validated in more ongoing deformation-monitoring projects, the presented results prompt us to drive it further.

Build Up Proper Deep Learning Networks for Different Tasks
As InSAR specialists, we are not in a position to design a deep learning network from scratch, but rather to construct a proper network by adopting available modules from well-established architectures. Of the three networks presented here, the Mask Net is based on the U-Net architecture [48], the Attention-YOLOv3 is based on the classical YOLOv3 network [58], and the DDNet/PUNet is based on the DnCNN [79]. The three architectures share some common features, but each also has its own modules and loss functions and is trained with different datasets, according to the needs of the different tasks.
Essentially, a deep learning network for learning spatial information consists of two main components: (1) the backbone, a CNN for learning the features in images (the feature extractor); and (2) the function head, also a CNN, the output of which is specifically designed to fulfill the requirements of the task. Here, the backbones of the three networks adopt widely applied modules, including residual blocks [79], the convolution layer, the batch-normalization (BN) layer, and the activation function layer. Constructing a backbone is a procedure of testing different parameters, such as the size of the convolution kernel, the number of convolution operations, the instances of downsampling, and whether to employ the BN layer and the activation function. We determine the optimal scheme for a network after multiple ablation experiments, i.e., the process of adjusting parameters and testing the effects of different modules.
Although all three backbones are composed of modules taking images as input, the function head differs according to the task. Masking, phase unwrapping, and mining subsidence detection require an output with the same dimension and resolution as the input; thus, we adopted the segmentation/regression head. For landslide detection, the network is designed to output the position information, i.e., the corner coordinates of the bounding box and the probability of there being a landslide, so a detection head has to be adopted from the YOLO series.
In addition to adopting proper network architectures, the loss functions associated with these tasks need to be specifically designed. Masking is a segmentation task that discriminates between the coherent and decorrelated phases; thus, we employed the BCE (binary cross-entropy) loss, as used in classification [43]. For phase unwrapping, the network needs to learn how to fit the data via a non-linear mapping function, which belongs to regression problems; we thus adopted a regression loss, the MSE (mean squared error) loss, for this task [42]. For slow-moving landslide detection, we designed a loss function that combines the regression and classification losses, because the network needs to simultaneously output both the corner coordinates of the bounding boxes and the categories of the boxes [41].
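The two loss families can be written down in a few lines of plain numpy; the combined detection loss is shown only as an illustrative weighted sum, not the exact Attention-YOLOv3 formulation:

```python
import numpy as np

def bce_loss(prob, label, eps=1e-7):
    # Binary cross-entropy, used for segmentation/classification heads.
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(p) + (1 - label) * np.log(1 - p)))

def mse_loss(pred, target):
    # Mean squared error, used for regression heads (e.g., unwrapped phase).
    return float(np.mean((pred - target) ** 2))

def detection_loss(box_pred, box_true, cls_prob, cls_label, w_cls=1.0):
    # Detection combines box-coordinate regression with box classification;
    # the weight w_cls balancing the two terms is an assumed hyperparameter.
    return mse_loss(box_pred, box_true) + w_cls * bce_loss(cls_prob, cls_label)
```

The choice of loss thus follows directly from the head: pixel-wise labels call for BCE, pixel-wise continuous targets for MSE, and bounding boxes for a weighted combination of both.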
It is important to properly identify the bottlenecks in the InSAR processing chain, particularly from two aspects: steps that require complicated, case-dependent algorithms, and steps that require substantial human intervention. Our previous attempts suggest that when a task is difficult for a programmed algorithm but can be easily resolved by humans, it is worth adopting a deep learning network. For example, masking an interferogram is difficult for conventional segmentation methods because it is impossible to set a single coherence or amplitude threshold to discriminate fringes from decorrelation noise. However, it is easy for humans, even without InSAR expertise, to manually draw a mask, given the clear differences between fringes and noise in the phases. It is thus efficient to introduce a deep learning network to replace both human intervention and conventional segmentation methods for masking interferograms. Similar principles can be applied to other steps in the InSAR processing chain, such as phase unwrapping and filtering. These procedures are even more suitable for deep learning networks because we can use simulated interferograms as labels to train the network.
Finally, as InSAR specialists, we should occupy a position that connects the artificial intelligence community with the end users of InSAR, migrating more powerful deep learning networks to improve operational deformation monitoring systems built on the big data of SAR images.

Training and Validation
Both training and validation are extremely important for a supervised deep learning network, particularly when it is integrated into an operational system. Here, the masking and landslide detection networks are trained on real interferograms with manually labeled samples, while the PUNet for phase unwrapping is trained on simulated interferograms. This is because the mapping between wrapped and unwrapped phases can be perfectly simulated, while the boundaries of decorrelated phase patches and the phase gradients of slow-moving landslides are difficult to simulate. Consequently, we can extensively retrain the unwrapping network with a large amount of well-labeled training samples, making phase unwrapping a very suitable task for deep learning networks to resolve.
For slow-moving landslide detection, the number of previously published landslides is too small for training the network. Even though we can obtain the locations of landslides from optical imagery or field investigations, most such inventories provide information on catastrophic landslides rather than slow-moving landslides. Therefore, we have to manually label the red-blue patterns in the stacked phase gradients to train the network. Some may argue that it is impossible to obtain optimal labels manually, because as long as humans are involved, errors exist. Nevertheless, compared with the large number of true labels, the effect of false labels is almost negligible [82]. Here, our priority is to detect all possible moving targets, and then to carry out the time-series InSAR analysis on them.
As training samples are extremely important for the generalization capability of a deep learning network, we developed an interferogram simulator to produce a pool of training samples for phase unwrapping, which can possibly be extended to other tasks, e.g., filtering and deformation detection [81]. The workflow and components of the simulator are shown in Figure 8. Simulated interferograms, though perhaps not able to fully represent the natural signals from deformation and atmospheric delays, precisely preserve the mapping between wrapped and unwrapped phases. If the deformation can be better simulated based on physical models, we can easily integrate them into the simulator to reproduce the natural signal.
Compared with training, validation is even more difficult for InSAR-based monitoring systems, particularly for slow-moving landslide detection, mainly because of InSAR's unique capability of measuring millimeter-level deformation. One option is to guarantee that the network detects all potential moving targets at the expense of a higher false-alarm rate. Although we cannot validate all of the detections, we can progressively conduct field investigations along with the operation of the monitoring system.

Future Development in Applying Deep Learning Networks in Operational InSAR Systems
All three of the networks presented here are still being iteratively developed for seamless integration into operational InSAR processing systems, such as those for global DEM generation [43], mining activity monitoring [42], and nationwide slow-moving landslide monitoring [41]. Based on this experience, we propose three directions for further work.
Training dataset: Because supervised learning requires a large number of training samples that also have to be close to the natural signal, a physics-based simulator, or even a neural-network-based generator, for different deformation processes is highly desired [83]. Plenty of work has been carried out on simulating mining-induced subsidence [84–86], and this can be adopted in our interferogram simulator. Landslide deformation simulation is more complicated, e.g., [87–90]; it is thus important to solidify the knowledge of landslide experts to form a database of labeled slow-moving landslides for training. Further study is still needed to reconcile the different interpretation results from geomorphic features, optical imagery, and InSAR results.
Multi-sensor monitoring system: The temporal resolution of most deformation monitoring systems is limited by the use of a single SAR system. Therefore, a network that combines multi-band SAR images, e.g., C-band, L-band, and X-band images, could significantly increase the temporal resolution for monitoring surface deformation. Furthermore, a network that can combine deformation from InSAR/pixel offsets with geomorphology data, optical imagery, and geological information may further improve the reliability of slow-moving landslide detection [28,89,91]. Beyond deformation, the temporal resolution of change detection can be significantly improved by training deep learning networks to combine optical and radar images [92,93]. Such applications require more effort to develop deep learning networks that can co-register images from different sensors.
From monitoring to prediction: Once we have a good collection of time-series deformation measurements from slow-moving landslides, it is possible to build a time-series network to learn the temporal behavior of the deformation, particularly the process from slow movement to catastrophic collapse [94–96]. In an operational InSAR system, we may design an RNN to compare the prediction with the measurements within each updating period, supervising the network to learn the evolution of the slow-moving landslides. The ultimate goal would be to provide a mid-term warning system for hazards associated with localized deformation, indicating where on-site instruments should be deployed.

Conclusions
In this paper, we report our recent attempts at integrating deep learning methods into large-scale InSAR applications, focusing on detecting and extracting localized deformation signals. We propose a detection-first strategy for time-series InSAR analysis, and apply it to monitoring slow-moving landslides and mining-induced subsidence. Our latest applications show that these networks have a strong generalization ability and work well with different datasets, monitoring sites, and scenarios. For their implementation in operational InSAR systems, we call for training networks for multi-sensor images with enriched training samples. We believe deep learning networks will play more important roles in the near future, not only in monitoring ground deformation, but also in providing in-depth analysis of time-series measurements from InSAR images.

Figure 1. The architecture of the classification framework. Conv: Convolution Layer; DS: Down-Sampling Layer; FC: Full Connection Layer.

Figure 2. The architecture of the segmentation framework. Conv: Convolution Layer; DS: Down-Sampling Layer; US: Up-Sampling Layer.

Figure 3. The architecture of the detection framework. Conv: Convolution Layer.

Guo et al. combined the Small Baseline Subset InSAR (SBAS-InSAR) method and the YOLO network to detect landslides in high mountain areas along the Yunnan-Myanmar border. The results showed that the landslide detection rates could be improved through the fusion of images acquired by the Sentinel-1 SAR and Gaofen-2 optical satellites [37]. Yu et al. proposed a lightweight model for detecting mining-induced funnels from interferograms based on a modified YOLO-V5 network; the experiments showed that the model could effectively identify localized subsidence in mining sites with less memory consumption and higher accuracy [59]. Li et al. proposed SeMo-YOLO to address the challenges of multi-scale object detection and real-time monitoring with high accuracy, achieving significant improvements in multi-scale target detection in near real time [60]. For extracting time-related information, deep learning frameworks are designed to learn the contextual information from time-series data. Representative networks include the Recurrent Neural Network (RNN) [61], Long Short-Term Memory (LSTM) [62], and the Transformer [63]. We can combine these networks with other deep learning frameworks for different tasks, such as damage mapping, change detection, and deformation prediction. The framework of the time-series analysis can be found in Figure 4.

Figure 5. Comparison of masks generated by the coherence threshold and the Mask Net for Sentinel-1 interferograms. The top row shows a two-frame interferogram covering the Tibetan Plateau with lakes. The bottom row shows part of an interferogram covering the Taihu Lake region in southeastern China. Red lines in the right column indicate the coherence thresholds derived from the OTSU method [69].

Figure 6. Landslides detected from the phase-gradient stacking map in northwestern Yunnan Province, China. (a) Azimuth and range phase-gradient maps with the detections indicated as black boxes. (b) The detection results from ascending and descending tracks with rivers and large towns in this region.

Figure 7. Emergency response to the 2021 water-leakage accident near Duxigou village, Shanxi Province, China. (a,b) InSAR line-of-sight velocities derived from standard time-series InSAR processing [2] and from the cropped interferograms unwrapped by the PUNet, respectively. (c,d) The derived time-series deformation from the PUNet at point A and point B, indicated in (b). The gray areas in (d) indicate different periods with accelerated subsidence.

Figure 8. Flowchart for generating simulated interferograms as the training set. The right panel shows five samples with simulated high-gradient deformation with increasing levels of noise from top to bottom.