Persistent Scatterer Pixel Selection Method Based on Multi-Temporal Feature Extraction Network

Hu, Zihan; Li, Mofan; Li, Gen; Wang, Yifan; Sun, Chuanxu; Dong, Zehua

doi:10.3390/rs17193319


by Zihan Hu ¹,², Mofan Li ³, Gen Li ¹,²,*, Yifan Wang ¹,², Chuanxu Sun ¹,² and Zehua Dong ¹,²,⁴

¹ Radar Research Lab, School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
² Chongqing Key Laboratory of Novel Civilian Radar, Chongqing 401120, China
³ Institute of Remote Sensing Satellite, China Academy of Space Technology, Beijing 100094, China
⁴ Key Laboratory of Electronic Information Technology in Satellite Navigation, Beijing Institute of Technology, Ministry of Education, Beijing 100081, China
* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(19), 3319; https://doi.org/10.3390/rs17193319 (registering DOI)

Submission received: 31 July 2025 / Revised: 23 September 2025 / Accepted: 23 September 2025 / Published: 27 September 2025

(This article belongs to the Special Issue Monitoring and Modelling of Geological Disasters Based on InSAR Observations: 3rd Edition)


Highlights

What are the main findings?

The Multi-Temporal Feature Extraction Network (MFN) combining the 3D U-Net and the convolutional long short-term memory (CLSTM) is proposed for persistent scatterer (PS) pixel selection.
The MFN balances the quality and quantity of PS pixel selection results better than the traditional ADI-based method, ultimately leading to significantly improved deformation measurement results.

What is the implication of the main finding?

The combination of the 3D U-Net and the CLSTM in MFN delivers superior spatiotemporal characteristics extraction capability, ensuring the selection of a large number of high-quality PS pixels. This study provides a valuable reference for future research initiatives.
As an end-to-end network, the MFN can directly utilize time-series SAR image raw data with automated characteristics extraction. This study provides an option requiring less human intervention for PS pixel selection.

Abstract

Persistent scatterer (PS) pixel selection is crucial in the PS-InSAR technique, ensuring the quality and quantity of PS pixels for accurate deformation measurements. However, traditional methods like the amplitude dispersion index (ADI)-based method struggle to balance the quality and quantity of PS pixels. To adequately select high-quality PS pixels, and thus improve the deformation measurement performance of PS-InSAR, the multi-temporal feature extraction network (MFN) is constructed in this paper. The MFN combines the 3D U-Net and the convolutional long short-term memory (CLSTM) to achieve time-series analysis. Compared with traditional methods, the proposed MFN can fully extract the spatiotemporal characteristics of complex SAR images to improve PS pixel selection performance. The MFN was trained with datasets constructed by reliable PS pixels estimated by the ADI-based method with a low threshold using ∼350 time-series Sentinel-1A SAR images, which contain man-made objects, farmland, parkland, wood, desert, and waterbody areas. To test the validity of the MFN, a deformation measurement experiment was designed for Tongzhou District, Beijing, China with 38 SAR images obtained by Sentinel-1A. Moreover, the similar time-series interferometric pixel (STIP) index was introduced to evaluate the phase stability of selected PS pixels. The experimental results indicate a significant improvement in both the quality and quantity of selected PS pixels, as well as a higher deformation measurement accuracy, compared to the traditional ADI-based method.

Keywords:

3D U-Net; CLSTM; deep learning; persistent scatterer selection

1. Introduction

The differential interferometric synthetic aperture radar (DInSAR) technique is a remote sensing technique capable of providing highly precise surface deformation measurements with millimeter-level accuracy, achieved by analyzing phase differences in the radar echoes [1,2,3]. It finds critical applications in geodesy, geophysics, and earth sciences, including the monitoring of tectonic plate movements [4], deformation due to activities like groundwater extraction or oil reservoir depletion [5,6], and the evaluation of volcanic and seismic hazards [7,8]. However, temporal and geometric decorrelation, as well as atmospheric effects, limit its deformation measurement accuracy.

To overcome these limitations, the persistent scatterer interferometric synthetic aperture radar (PS-InSAR) technique [9,10] has been proposed and developed in the last two decades. The PS-InSAR technique is an extension of the DInSAR technique and relies on persistent scatterer (PS) pixels, whose amplitude and phase values are stable over time and imaging geometry.

As the proposers of this technique, Ferretti et al. [9] designed a PS-InSAR processing flow that is primarily divided into two steps. First, the PS candidates (PSCs) are selected using the ADI-based method, and the atmospheric phase screen (APS) contributions are estimated and removed based on an iterative algorithm. Then, the digital elevation model (DEM) errors and line-of-sight (LOS) velocities are estimated by maximizing the pixel-by-pixel temporal coherence, and the PS-identification is performed again by setting a threshold on the maximum temporal coherence. Subsequently, Guoxang Liu et al. [11] proposed a more streamlined approach, requiring only a single ADI-based PS selection step. Neighboring PS pixels are connected to each other to build a network. Spatially correlated components, such as APS contributions, can be eliminated through the phase differencing of connected pixels. The DEM error difference and LOS velocity difference on each connection are solved by maximizing the model coherence. The resulting two systems of linear equations pertaining to DEM errors and LOS velocities, respectively, are independently solved using the least squares method. The deformation measurements in this paper are obtained through this approach.

However, in either processing flow, the quality and quantity of selected PS pixels in the PS-InSAR technique are crucial factors that influence the accuracy and reliability of deformation measurements [12,13]. High-quality PS pixels are valuable for accurate deformation measurements, but selecting too few of them may restrict the coverage of the study area. Conversely, high-density PS pixels may provide comprehensive coverage but can be less accurate due to the poor phase stability of some of the selected pixels [14,15]. Therefore, PS pixel selection is an essential step of the PS-InSAR technique.

Conventional PS pixel selection methods can be roughly divided into two categories: amplitude-based methods and phase-based methods. Amplitude-based methods, with the amplitude dispersion index (ADI) [9] as a representative, characterize the phase stability of pixels in time-series images through amplitude statistics. These methods perform well for very bright pixels and are therefore suitable for urban areas that contain significant man-made structures [16]. However, in sparsely built areas, the density of PS pixels selected by these methods is too low and generally insufficient to produce reliable deformation measurements [17]. Phase-based methods, with temporal phase coherence (TPC) [18] as a representative, estimate the phase stability of a pixel directly from its interferometric phase noise. Since the amplitude information is no longer utilized, these methods can identify high phase stability pixels with low amplitude and can effectively increase the density of PS pixels in non-urban areas. In addition, the integration of amplitude-based and phase-based methods has already demonstrated enhanced performance in certain specific application scenarios [13,19]. Taking StaMPS [19] as an example, it utilizes both ADI and TPC for PS pixel selection, which has proven to be reliable even in natural terrains. However, this method is more computationally expensive and still cannot adequately identify all possible PS pixels in both urban and non-urban areas [20]. Furthermore, all of the above-mentioned methods are sensitive to the choice of threshold value: a low threshold can ensure the selection of PS pixels with better phase quality but may also lead to the exclusion of valid PS pixels; conversely, a high threshold may increase the number of PS pixels but can introduce lower-quality pixels. This threshold, however, is traditionally chosen based on scholars’ experience and is highly subjective.

Deep learning is a machine learning technique with multiple levels of representation [21]. In recent years, deep learning methods have rapidly developed and been successfully applied in the field of SAR image processing [22,23,24,25]. In our case, the PS pixel selection task requires us to construct a function that can label pixels containing persistent scatterers based on time-series single look complex (SLC) SAR images. This function has a complex structure, making it difficult to represent in simple mathematical formulas. Conventional PS pixel selection methods try to approximate this function using features and thresholds chosen by researchers, and the subjectivity of these choices limits their effectiveness. In deep learning, however, complex functions can be efficiently learned by models with a sufficiently complex structure [21].

Some researchers have already made beneficial attempts to introduce deep learning methods into the task of PS pixel selection. Tiwari et al. [26] developed a two-dimensional convolution neural network (CNN) structure and a convolutional long short-term memory (CLSTM) structure to extract spatial and spatiotemporal characteristics of PS pixels from time-series interferometric phase. Subsequently, Zhang et al. [27] introduced a one-dimensional CNN to extract temporal characteristics from SAR amplitude and coherence of interferograms, achieving improved computational efficiency and lower training sample requirements. In the study by Chen et al. [28], a two-branch network named PSFNet was employed for PS pixel selection. Within this framework, the ResUNet structure extracts spatial characteristics from mean amplitude, amplitude dispersion, and average coherence of time-series SAR images, while the TANet extracts temporal characteristics from the time-series interferometric phase. The two-dimensional characteristics are ultimately concatenated and fused. Alternatively, Azadnejad et al. [29] designed a multi-layer perceptron (MLP) structure which requires a total of 18 time-domain and frequency-domain characteristics derived from time-series SAR amplitude as input. This approach seeks to reduce the complexity of deep learning models and the training computational cost by leveraging an ample set of pre-extracted characteristics. Although the above methods can all obtain higher-quality PS pixel selection results than conventional approaches, their implementation relies on a certain degree of feature pre-extraction, which complicates the processing flow. In an end-to-end deep learning model, by contrast, feature extraction is performed automatically during training without manual intervention and is therefore not influenced by the subjective choices of researchers.

Therefore, to improve the quality of PS pixel selection and simplify the processing flow, we construct the multi-temporal feature extraction network (MFN) in this paper. The MFN combines the 3D U-Net [30] and the CLSTM [31] to achieve time-series analysis and takes time-series SLC SAR images as input. 3D U-Net is a kind of convolutional neural network (CNN) that has been widely used for volumetric segmentation in medical images [32,33,34]. CLSTM is a kind of recurrent neural network (RNN) that was proposed for solving spatiotemporal sequence forecasting problems [31]. In the MFN, we first extract spatial characteristics from raw time-series SLC SAR images using 3D U-Net. Then, we pass the extracted spatial feature maps into CLSTM for temporal characteristics extraction and obtain the probability that each pixel is a PS pixel. Compared with traditional methods, MFN fully extracts the spatiotemporal characteristics of complex SAR images to improve selection performance.

The remainder of this paper is organized as follows. Section 2 introduces the characteristics of the PS pixel and analyzes the limitations of the traditional PS pixel selection method. Section 3 presents the proposed MFN including 3D U-Net and CLSTM in detail, and describes the training set construction method. Section 4 evaluates the effectiveness of MFN by using 38 Sentinel-1 single-look complex images of Tongzhou district, Beijing, China. Section 5 discusses the performance of MFN and its advantages over traditional PS selection methods, while also indicating some potential directions for future research. Section 6 concludes this article.

2. PS Pixel Statistical Characteristics

In SAR images, a resolution cell contains a large number of scatterers and the amplitude and phase of its corresponding pixel are jointly determined by all of them. Among all scatterers, targets such as building corners, railings, and exposed rocks exhibit strong scattering intensity and maintain coherence over long time intervals. These are referred to as PS, with the pixels containing them referred to as PS pixels. PS can dominate the value of the PS pixel due to its strong scattering intensity. The remaining weak scatterers only result in minimal change in amplitude and phase (see Figure 1).

For PS pixel value, let us consider a circular complex Gaussian noise n with variance σ_{n}^{2} in both real part n_{R} and imaginary part n_{I}. Without loss of generality, assuming that the complex reflectivity g of a pixel has 0° phase (i.e., ∠g = 0), then the amplitude values A obey the Rice distribution [9]

f_{A} (a) = \frac{a}{σ_{n}^{2}} e^{- \frac{a^{2} + g^{2}}{2 σ_{n}^{2}}} \cdot I_{0} \left( \frac{a g}{σ_{n}^{2}} \right), \quad a > 0

(1)

where I_{0} (\cdot) is the modified Bessel function. The shape of this distribution can be determined by the signal-to-noise ratio (SNR) (i.e., g / σ_{n}) of the pixel, and f_{A} (a) approximately obeys Gaussian distribution when the SNR is high enough. In SAR images, most pixels meet σ_{n} ≪ |g| and the amplitude standard deviation σ_{A} approximates the noise standard deviation σ_{n} [9].

σ_{A} \approx σ_{n_{R}} = σ_{n_{I}}

(2)

Under this fact, the dispersion of PS pixel values can be represented in the form shown in Figure 2, where μ_{A} and σ_{A} are the mean and standard deviation of the signal amplitude, respectively, and σ_{n} is the standard deviation of the circular complex Gaussian noise (its 3σ range is plotted). Assuming the PS pixel has a high SNR, which means σ_{n} is small, the phase standard deviation (PSD) satisfies σ_{φ} \approx \tan (σ_{φ}) \approx σ_{A} / μ_{A}. We define σ_{A} / μ_{A} as the ADI D_{A}; in the high SNR case, it can be regarded as an approximation of σ_{φ} [9].

D_{A} = \frac{σ_{A}}{μ_{A}} \approx \frac{σ_{n_{I}}}{g} \approx \tan (σ_{φ}) \approx σ_{φ}

(3)

Since the mean value and the standard deviation of each pixel can be easily obtained from time-series SAR images, PS pixel selection can be achieved by comparing the ADI of the candidate pixels with a set threshold. In the high SNR region of SAR images, this method can accurately identify high-quality PS pixels. Therefore, as a simple and effective method for PS pixel selection, the ADI-based method is widely used today (either individually [35,36,37] or in combination with other methods [38,39,40]).
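The thresholding step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the `(n_images, rows, cols)` array layout, and the handling of all-zero pixels are assumptions introduced here.

```python
import numpy as np

def amplitude_dispersion_index(slc_stack):
    """ADI (sigma_A / mu_A) per pixel for a complex stack shaped (n_images, rows, cols)."""
    amplitude = np.abs(slc_stack)
    mean_amp = amplitude.mean(axis=0)
    std_amp = amplitude.std(axis=0)
    # Assumption: pixels with zero mean amplitude get ADI = inf so they are never selected.
    safe_mean = np.where(mean_amp > 0, mean_amp, 1.0)
    return np.where(mean_amp > 0, std_amp / safe_mean, np.inf)

def select_ps_by_adi(slc_stack, threshold):
    """Boolean PS candidate mask: True where the ADI is below the chosen threshold."""
    return amplitude_dispersion_index(slc_stack) < threshold
```

The threshold is left as a required argument on purpose, since the text stresses that its choice is subjective and strongly affects the result.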

However, this method can hardly balance the quality and quantity of PS pixels. To illustrate the problem, a simulation of the correspondence between the PSD and the ADI is designed. As shown in Figure 3a, we set 50,000 pixels in the stack of 38 SAR images. The signal is fixed to 1 and the standard deviations of both real and imaginary parts of noise are randomly selected between 0 and 0.8. It can be seen that, with the increase in PSD, the distribution of points becomes increasingly scattered and deviates from the 1:1 ratio line, which means the ADI-based method loses stability at low SNR.

If we regard pixels with a PSD of 0.3 or less as valid PS candidates and set the ADI threshold to 0.3, the points with an ADI below 0.3 are selected (see Figure 3b). In this case, a large number of high-quality candidates are missed (shown in the blue area). To reduce the missed detection rate, the ADI threshold needs to be raised. As shown in Figure 3c, when the ADI threshold is adjusted to 0.4, the previously missed pixels are almost completely recognized. However, this change introduces more false PS pixels (shown in the red area), which will seriously affect the subsequent deformation measurement.

Limited by the high SNR assumption, the false alarm probability of the ADI-based method increases with the PSD. Therefore, to reduce the ratio of false PS pixels, only a low ADI threshold can be chosen in actual processing, which results in missing a large number of high-quality PS pixels.
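The simulation described in this section can be approximated with the following NumPy sketch. It follows the stated settings (50,000 pixels, 38 images, unit signal, noise standard deviation drawn between 0 and 0.8), but the random seed, the uniform sampling of the noise level, the use of the sample standard deviation of the wrapped phase as the PSD, and the helper name `confusion_counts` are assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
n_pixels, n_images, signal = 50_000, 38, 1.0

# One noise level per pixel, drawn uniformly from [0, 0.8).
sigma_n = rng.uniform(0.0, 0.8, size=(n_pixels, 1))
noise = sigma_n * (rng.standard_normal((n_pixels, n_images))
                   + 1j * rng.standard_normal((n_pixels, n_images)))
samples = signal + noise

amplitude = np.abs(samples)
adi = amplitude.std(axis=1) / amplitude.mean(axis=1)
psd = np.angle(samples).std(axis=1)  # radians; the noise-free phase is 0

def confusion_counts(adi_threshold, psd_limit=0.3):
    """Return (missed, false_alarms): good pixels rejected and bad pixels accepted."""
    good = psd <= psd_limit
    picked = adi < adi_threshold
    return int(np.sum(good & ~picked)), int(np.sum(~good & picked))

for threshold in (0.3, 0.4):
    missed, false_alarms = confusion_counts(threshold)
    print(f"ADI < {threshold}: missed = {missed}, false alarms = {false_alarms}")
```

Raising the threshold can only shrink the set of missed pixels and enlarge the set of falsely accepted ones, which is the trade-off discussed above.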

3. The Proposed MFN-Based Method

As mentioned above, the ADI-based method utilizes time-series amplitude information and determines whether a pixel is a PS pixel by a fixed threshold. However, the phase information is not employed, and the spatial distribution characteristics of PS pixels are not considered throughout the processing. In addition, the subjectivity of threshold selection can affect the quality of PS pixels.

To overcome the limitations of traditional PS pixel selection methods, the MFN is constructed in this paper. The MFN combines the 3D U-Net [30] for extracting spatial characteristics and the CLSTM [31] for extracting temporal characteristics. This structure presents several advantages:

Firstly, MFN can achieve spatiotemporal characteristics extraction on complex data. We believe that, compared to the ADI-based method, additional consideration of phase information and spatial characteristics extraction ability will facilitate the selection of high-quality PS pixels.

Secondly, as an end-to-end network, the MFN can directly utilize time-series SAR image raw data and can automatically implement feature extraction during training. This not only simplifies the processing flow but also prevents subjective factors from affecting the quality of the PS pixels.

The process of the MFN-based PS-InSAR algorithm is shown in Figure 4, and the MFN structure is shown in Figure 5. In this section, we will introduce the MFN’s structure and the construction of training datasets.

3.1. Spatial Characteristics Extraction Based on 3D U-Net

Previous studies have shown that PS pixels are always statistically inhomogeneous in their neighborhood [20]. Therefore, our first consideration in building the MFN is to enable spatial characteristics extraction from the raw time-series SAR images.

3D U-Net is a CNN that was proposed and applied to biomedical volumetric image segmentation in 2016 [30]. As an improvement based on the U-Net [41], it consists of an encoding path and a decoding path. The encoding path follows the typical CNN structure. However, in the decoding path, the pooling operators are replaced by up-sampling operators to increase the resolution of the output. In each resolution step, features from the encoding path are combined with the upsampled output. 3D U-Net transforms traditional 2D architecture into 3D architecture, which corresponds exactly to the 3D structure of time-series SAR images (image number × length × width). In addition, the time-series SAR image feature extraction task is not a conventional computer vision task, which requires autonomous data generation and annotation, with fewer samples available for training; thus, using a network structure with a large number of parameters may cause overfitting. So here we only set four resolution steps.

The left part of Figure 5 shows the 3D U-Net structure in the MFN. The encoding path expands the receptive field by 2 × 2 × 2 max pooling with 2 × 2 × 2 stride. Each step extracts the spatially relevant information around pixels using two 3 × 3 × 3 3D convolutions with the ReLU activation function. Normalization is essential to reduce the difficulty of network training. Among the most classic methods is Batch Normalization (BN), which normalizes along the batch dimension. However, because the time-series SAR image sets occupy a large memory space, the batch size during training cannot be set sufficiently large, which leads to increased errors in BN. To address this, we introduce Group Normalization (GN) [42] before each convolution. This method divides the channel dimension into groups and performs normalization within each group. Since GN does not rely on the batch dimension, its performance is unaffected by batch size. The decoding path restores the feature volume to the original scale through four upsampling steps that mirror the four max pooling steps above. The data in the same resolution step of the encoding and the decoding path are directly concatenated in the channel dimension, which allows the feature maps recovered from upsampling to contain more low-level information.
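To make the batch-size independence of GN concrete, the following NumPy sketch normalizes a 5D feature volume within channel groups. It is a minimal illustration only: the learnable scale and offset of GN [42] are omitted, and the number of groups is a hypothetical parameter because the text does not state the value used in the MFN.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize x of shape (batch, channels, depth, height, width) within channel groups."""
    batch, channels = x.shape[:2]
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    # Each sample is split into num_groups groups of consecutive channels.
    grouped = x.reshape(batch, num_groups, -1)
    mean = grouped.mean(axis=2, keepdims=True)
    var = grouped.var(axis=2, keepdims=True)
    normalized = (grouped - mean) / np.sqrt(var + eps)
    return normalized.reshape(x.shape)
```

Because the statistics are computed per sample, normalizing a single sample gives the same result as normalizing it inside a larger batch, which is the property that motivates GN when only small batches fit in memory.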

The network-required input is a voxel tile of 38 time-series SAR images with two channels (corresponding to the real and imaginary parts of SAR images). To mitigate demands on computer hardware resources [43], we employ an overlapping blocking strategy to feed large-coverage data into the network. Specifically, the time-series SAR images are divided along the length and width dimensions into blocks of 100 × 100 pixels. This means a single batch of data transmitted into the network has a size of 2 × 38 × 100 × 100 (channel × image number × length × width). The blocks are extracted with a stride of 40 along both the length and width dimensions, so consecutive blocks derived from the same time-series SAR image set overlap. The final probability that a pixel is a PS pixel is the average of the predictions from all blocks containing that pixel. Because spatial information is incomplete near block edges, the effectiveness of the network on block edge pixels is uncertain. Nevertheless, the overlapping strategy ensures that every internal pixel receives enough spatial information in at least one block. As a result, the prediction results of pixels remain unaffected by the blocking operation.
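The overlapping blocking and averaging strategy can be sketched as follows. This is a hedged illustration rather than the authors' code: `predict_block` is a hypothetical callable standing in for the trained network, and placing the last block flush with the image border when the stride does not divide the image size evenly is an assumption made here.

```python
import numpy as np

def block_starts(length, block, stride):
    """Start indices of blocks along one axis; the last block is flush with the border."""
    if length <= block:
        return [0]
    starts = list(range(0, length - block + 1, stride))
    if starts[-1] != length - block:
        starts.append(length - block)  # assumption: add one extra block to cover the border
    return starts

def predict_with_overlap(stack, predict_block, block=100, stride=40):
    """Average block-wise predictions over a stack shaped (2, n_images, rows, cols).

    predict_block maps a tile (2, n_images, h, w) to PS probabilities of shape (h, w).
    """
    rows, cols = stack.shape[2:]
    prob_sum = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    for r in block_starts(rows, block, stride):
        for c in block_starts(cols, block, stride):
            tile = stack[:, :, r:r + block, c:c + block]
            prob_sum[r:r + block, c:c + block] += predict_block(tile)
            count[r:r + block, c:c + block] += 1
    # Every pixel is covered by at least one block, so count is never zero.
    return prob_sum / count
```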

3.2. Temporal Characteristics Extraction Based on CLSTM

The phase stability of a pixel reflects the overall variation of its phase value across the time-series SAR images. The 3D convolution kernel of the previously mentioned 3D U-Net has one dimension corresponding to the image number of the time-series SAR images, which means that it has a certain capability of temporal characteristics extraction. However, the structure of the 3D U-Net is not designed for time-series analysis.

CLSTM is an RNN for time-series data analysis. As a generalization of the classical LSTM structure, it replaces the full connections in the input-to-state and state-to-state transitions with convolutions [31]. This allows CLSTM to preserve spatial structure while modeling temporal dependence, and thus to capture spatiotemporal characteristics better.

A complete CLSTM structure consists of several levels of sequentially connected CLSTM blocks. The structure of the CLSTM block corresponding to the t-th spatial feature map is shown in Figure 6. The block updates and maintains the input cell state through the control of a series of switching units, thus maintaining a long-time memory of the input data and outputting the results of the timing analysis. The switching units are categorized into forget gates, input gates, and output gates according to their functions, and the maintenance process of the network module can be expressed as follows [31].

f_{t} = σ (W_{x f} * X_{t} + W_{h f} * H_{t - 1} + W_{c f} ⊙ C_{t - 1} + b_{f})

(4)

i_{t} = σ (W_{x i} * X_{t} + W_{h i} * H_{t - 1} + W_{c i} ⊙ C_{t - 1} + b_{i})

(5)

C_{t} = f_{t} ⊙ C_{t - 1} + i_{t} ⊙ \tanh (W_{x c} * X_{t} + W_{h c} * H_{t - 1} + b_{c})

(6)

o_{t} = σ (W_{x o} * X_{t} + W_{h o} * H_{t - 1} + W_{c o} ⊙ C_{t} + b_{o})

(7)

H_{t} = o_{t} ⊙ \tanh (C_{t})

(8)

where σ (\cdot) denotes the Sigmoid activation function, * denotes the convolution operation, and ⊙ denotes the Hadamard product. W_{x} = [W_{x f}, W_{x i}, W_{x o}], W_{h} = [W_{h f}, W_{h i}, W_{h o}], and W_{c} = [W_{c f}, W_{c i}, W_{c o}] are weights used to adjust the switching units f_{t}, i_{t}, and o_{t}, respectively, and b = [b_{f}, b_{i}, b_{c}, b_{o}] is the corresponding bias. Benefiting from the existence of these gates in each block, CLSTM can overcome the long-term dependencies problem [44] and effectively discover the relationship between the changing pattern of pixels in the time-series and whether they are PS pixels or not.
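The gate updates in Equations (4) to (8) can be traced with a single-channel NumPy sketch. This is a simplified illustration, not the MFN implementation: it assumes one feature channel, odd-sized kernels with zero padding, a convolution implemented as cross-correlation (as is common in deep learning frameworks), and a hypothetical parameter dictionary `p` keyed by the symbol names used above.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Zero-padded 2D cross-correlation that keeps the input size (odd kernel sizes)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def clstm_step(x_t, h_prev, c_prev, p):
    """One CLSTM block update; returns the new hidden state H_t and cell state C_t."""
    # Forget gate, Eq. (4)
    f_t = sigmoid(conv2d_same(x_t, p["W_xf"]) + conv2d_same(h_prev, p["W_hf"])
                  + p["W_cf"] * c_prev + p["b_f"])
    # Input gate, Eq. (5)
    i_t = sigmoid(conv2d_same(x_t, p["W_xi"]) + conv2d_same(h_prev, p["W_hi"])
                  + p["W_ci"] * c_prev + p["b_i"])
    # Cell state update, Eq. (6)
    c_t = f_t * c_prev + i_t * np.tanh(
        conv2d_same(x_t, p["W_xc"]) + conv2d_same(h_prev, p["W_hc"]) + p["b_c"])
    # Output gate, Eq. (7); note that it uses the updated cell state C_t
    o_t = sigmoid(conv2d_same(x_t, p["W_xo"]) + conv2d_same(h_prev, p["W_ho"])
                  + p["W_co"] * c_t + p["b_o"])
    # Hidden state, Eq. (8)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

Processing a sequence amounts to calling `clstm_step` once per spatial feature map while carrying `h_t` and `c_t` forward.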

Based on the above two structures, our network can be viewed as a model that receives spatiotemporal correlation information in time-series SAR images. It accomplishes feature extraction and integration to achieve specific pixel recognition.

3.3. Loss Function

A loss function combining Dice Loss and Binary Cross-Entropy Loss (BCE Loss) is used for model training. In the majority of the study area, the number of PS pixels in the SAR images is significantly lower than the number of non-PS pixels. Dice Loss mainly considers the overall similarity of the segmentation result and can reduce the influence of class imbalance on network training. Suppose there are N pixels in the prediction matrix, the label of the i-th pixel is y_{i} (1 if it is a PS pixel and 0 otherwise), and the probability that the network predicts the pixel to be a PS pixel is p_{i}. The Dice Loss can then be calculated as follows [45]:

\mathrm{DiceLoss} = 1 - \frac{2 \sum_{i = 1}^{N} p_{i} y_{i}}{\sum_{i = 1}^{N} p_{i}^{2} + \sum_{i = 1}^{N} y_{i}^{2}}

(9)

BCE Loss mainly considers the pixel-level classification accuracy and can be calculated as follows [46]:

\mathrm{BCELoss} = - \frac{1}{N} \sum_{i = 1}^{N} \left( y_{i} \cdot \log (p_{i}) + (1 - y_{i}) \cdot \log (1 - p_{i}) \right)

(10)

The loss function we use is obtained by combining the above two loss functions.

\mathrm{BCEDiceLoss} = \mathrm{DiceLoss} + \mathrm{BCELoss}

(11)
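Equations (9) to (11) translate directly into NumPy, as in the sketch below. The function names and the clipping constant `eps` (added to keep the logarithms finite) are assumptions introduced here; the Dice term follows Equation (9) without any smoothing term, so it is undefined when both the prediction and the label are all zero.

```python
import numpy as np

def dice_loss(p, y):
    """Dice Loss of Eq. (9) for predicted probabilities p and binary labels y."""
    p = np.ravel(p).astype(float)
    y = np.ravel(y).astype(float)
    return 1.0 - 2.0 * np.sum(p * y) / (np.sum(p ** 2) + np.sum(y ** 2))

def bce_loss(p, y, eps=1e-7):
    """BCE Loss of Eq. (10); probabilities are clipped to [eps, 1 - eps]."""
    p = np.clip(np.ravel(p).astype(float), eps, 1.0 - eps)
    y = np.ravel(y).astype(float)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def bce_dice_loss(p, y):
    """Combined loss of Eq. (11)."""
    return dice_loss(p, y) + bce_loss(p, y)
```

As a quick check of the formulas, for labels [1, 0] and predictions [0.5, 0.5] the Dice term is 1 - 1/1.5 = 1/3 and the BCE term is ln 2.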

3.4. Training Dataset Construction

The training dataset we construct for our network consists of time-series SAR images from Sentinel-1A of five cities, as shown in Table 1. The main area of each city has a dense concentration of buildings that can provide rich PS pixels. Considering that the CNN has translation invariance, whether or not the image is segmented has no effect on the training results. Selecting areas containing more feature pixels is equivalent to expanding the training dataset. In addition, the selected areas contain a variety of ground object features including wood, parkland, farmland, desert, and waterbody.

The construction process of the training dataset is shown in Figure 7. The real and imaginary parts of the registered SLC SAR images are used as the two channels to form a four-dimensional dataset. After segmentation in terms of image length, width, and number dimensions, each dataset becomes a matrix of 2 × 38 × 500 × 2000 (channel × image number × length × width).

The PS pixel label of the training dataset can be obtained by some existing PS pixel selection methods. In this paper, we adopt the widely used ADI-based method. Although, as described in Section 2, this method can hardly balance the quality and quantity of PS pixels, the results obtained can be applied to network training as long as the accuracy of point selection is ensured by lowering the threshold (D_{A} < 0.32). Additionally, a sufficient number of images are included in the PS pixel selection to minimize misjudgment resulting from inadequate estimation samples. Figure 8 demonstrates the impact of the number of images on the reliability of the ADI-based method. Increasing the number of images results in a significant decrease in the deviation of the scattering distribution.

Finally, we performed data augmentation on the dataset using image symmetry, image rotation (90°, 180°, 270°), and Gaussian blurring.
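The geometric part of this augmentation can be sketched as follows. This is an assumption-laden illustration: the text does not say how symmetry and rotation are combined, so the sketch simply yields every rotation with and without a mirror flip, and Gaussian blurring is left out. The key point it shows is that the label map must undergo the same transform as the image volume.

```python
import numpy as np

def augment(volume, label):
    """Yield rotated and mirrored (volume, label) pairs.

    volume: (channel, n_images, rows, cols); label: (rows, cols).
    """
    for k in (0, 1, 2, 3):  # rotations by 0, 90, 180, and 270 degrees
        rotated_volume = np.rot90(volume, k=k, axes=(2, 3))
        rotated_label = np.rot90(label, k=k, axes=(0, 1))
        yield rotated_volume, rotated_label
        # Mirror along the width axis, applied identically to data and label.
        yield np.flip(rotated_volume, axis=3), np.flip(rotated_label, axis=1)
```

Note that rotating a non-square block by 90° or 270° swaps its length and width, so square blocks are more convenient when the network expects a fixed input size.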

4. Experimental Datasets and Results

4.1. Validation Area and Dataset

To assess the effectiveness of our MFN-based PS-InSAR algorithm, SAR images gathered by Sentinel-1A over Tongzhou District, Beijing, China, are used to investigate the deformation in the area.

Tongzhou District is located in the southeastern region of Beijing. At the beginning of the 21st century, urbanization in the district accelerated, leading to a significant rise in population density. This triggered ecological degradation and over-exploitation of groundwater, which caused rapid ground deformation in the area.

In this section, 38 single-look complex images obtained by Sentinel-1A from July 2015 to December 2017, as shown in Table 2, are applied to the deformation analysis in Tongzhou district. We focus on the region of 116.581°E–116.644°E longitude and 39.768°N–39.852°N latitude. Figure 9 shows the optical image of the area we concentrate on.

All SAR image data utilized in this section originate from Sentinel-1A, launched in 2014, carrying a C-band SAR instrument. Sentinel-1A has four operational modes. Among them, the Interferometric Wide Swath (IW) Mode, which is used for our data, is the main operational mode over land with a spatial resolution of 5-by-20 m.

4.2. Performance Analysis of the MFN

Being a PS pixel selection method that fully exploits the spatiotemporal information of time-series SAR images, the MFN-based method proposed in this paper can overcome the disadvantages of the traditional ADI-based method and can balance the quality and quantity of PS pixels.

To validate the above conclusion, we use the ADI-based method and the MFN-based method to select PS pixels in the study area. For the ADI-based method, the cases of 0.28 and 0.437 thresholds are selected. The effectiveness of these methods will be evaluated in terms of both PS pixel selection and deformation measurement.

4.2.1. PS Pixel Selection Effectiveness Evaluation

The number of PS pixels selected by the MFN-based method and the ADI-based method is presented in Table 3. The ADI-based method (D_{A} < 0.28) ensures the quality of PS pixels by choosing a lower threshold, but it selects fewer PS pixels than the other two configurations. The numbers of PS pixels selected by the ADI-based method (D_{A} < 0.437) and the MFN-based method are similar. Their PS pixel selection results are shown in Figure 10a, and it is clear that their distributions are very different.

The L1 region is situated in a densely built-up area in the upper part of the validation area. As shown in Figure 10b, the PS candidates identified by the MFN-based method are concentrated in built-up areas. However, the results of the ADI-based method, although there is an overlap with the results of the MFN-based method, are irregularly dispersed in the low-reflectivity areas between the buildings.

The L2 region contains mainly open space. As shown in Figure 10c, the additional predictions of the MFN-based method are mainly distributed on a road crossing the river. Although the scattering stability of the road is usually weaker than that of general man-made targets, there may be structures along the road, such as guardrails and street lamps, that produce more stable scattering. The proposed method exploits this information and compensates for the lack of PS pixels on the road. The results of the ADI-based method are mostly distributed on the river bank and bare ground, which suggests that the ADI-based method suffers from more false alarms at a higher threshold.

To assess the quality of selected pixels quantitatively, we introduce a metric that measures the temporal phase noise of pixels. This metric, named STIP (Similar Time-series Interferometric Pixel) index [20], is determined by counting the number of pixels in a search window that adhere to the following conditions:

\underset{n = -(N - 2), \ldots, N - 2}{\arg\max} \left( \sum_{m = 1}^{N - 1} e^{i φ_{x}(m)} \cdot e^{-i φ_{y}(m + n)} \right) = 0, \quad e^{-i φ_{y}(w)} = 0 \ \text{if} \ w < 1 \ \text{or} \ w > N - 1

(12)

Here, N is the number of time-series SAR images, x is the pixel under test, y is a pixel in its neighborhood, and

φ (w)

is the interferometric phase of a pixel in the w-th interferogram. The equation removes the spatially correlated part of the phase through a correlation operation between the phase series of the two pixels, thereby detecting phase stability. Thus, a higher STIP index for a pixel indicates higher phase stability.
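
The counting rule of Equation (12) can be sketched in Python as follows. This is an illustrative reading rather than the implementation of [20]: the function names, the (N − 1, H, W) layout of the phase stack, and the use of the magnitude of the correlation sum before taking the arg max are assumptions. A half window of 4 corresponds to a 9 × 9 search window.

```python
import numpy as np

def xcorr_magnitude(phi_x, phi_y):
    """Correlation magnitude for lags n = -(M-1)..(M-1), with M = N - 1 interferograms."""
    m_len = len(phi_x)
    zx = np.exp(1j * np.asarray(phi_x))
    zy_conj = np.exp(-1j * np.asarray(phi_y))
    lags = np.arange(-(m_len - 1), m_len)
    mags = np.empty(lags.size)
    for k, n in enumerate(lags):
        # Terms whose index m + n falls outside the series are treated as zero.
        lo, hi = max(0, -n), min(m_len, m_len - n)
        mags[k] = np.abs(np.sum(zx[lo:hi] * zy_conj[lo + n:hi + n]))
    return lags, mags

def is_stip(phi_x, phi_y):
    """True if the correlation magnitude peaks at zero lag (assumed reading of Eq. 12)."""
    lags, mags = xcorr_magnitude(phi_x, phi_y)
    return bool(lags[np.argmax(mags)] == 0)

def stip_index(phase_stack, row, col, half_window=4):
    """Count neighbors in the search window that satisfy the zero-lag condition."""
    _, height, width = phase_stack.shape
    count = 0
    for r in range(max(0, row - half_window), min(height, row + half_window + 1)):
        for c in range(max(0, col - half_window), min(width, col + half_window + 1)):
            if (r, c) == (row, col):
                continue  # the pixel under test is not its own neighbor
            if is_stip(phase_stack[:, row, col], phase_stack[:, r, c]):
                count += 1
    return count

# Toy check: a 3 x 3 patch in which every pixel shares the same phase history.
series = np.linspace(0.0, 3.0, 10) ** 2
toy_stack = np.tile(series[:, None, None], (1, 3, 3))
toy_index = stip_index(toy_stack, 1, 1, half_window=1)
```

For identical phase histories the zero-lag sum uses all N − 1 terms at full coherence, so every neighbor is counted; for noisy neighbors the peak tends to fall at a non-zero lag and the count drops.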

The STIP index distribution of pixels in the study area is shown in Figure 11a. Considering that PS pixels are essentially pixels with high phase stability, the performance of PS pixel selection methods can be assessed by calculating the distribution of the STIP index of their PS pixel selection results.

Figure 11b shows the STIP index distribution for the PS pixels selected by the two previously mentioned methods. It can be seen that, although the number of selected PS pixels is nearly the same, the statistical histogram of the MFN-based method is more biased towards the high STIP index region than that of the ADI-based method (

D_{A} < 0.437

). In other words, the overall STIP index of PS pixels selected by the MFN-based method takes a higher value, and it can be concluded that the MFN-based method produces a greater proportion of high-quality PS pixels.

4.2.2. Deformation Measurement Effectiveness Evaluation

The quality and quantity of PS pixel selection directly determine the effectiveness of deformation measurement. The deformation measurement process used in this section is shown in Figure 4. Firstly, the PS pixel selection results are employed to construct a Delaunay triangulation network. Secondly, the maximum likelihood estimation technique is utilized to solve the deformation parameters along the connections. Finally, the deformation rate measurements at the PS pixel locations are obtained through the least squares method.
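
The final least squares step can be illustrated with a short Python sketch. It assumes that the arcs of the triangulation network and the rate difference estimated along each arc are already available, and that one PS pixel serves as a zero-rate reference; the function name, the argument layout, and the reference convention are illustrative assumptions rather than details given in this paper.

```python
import numpy as np

def integrate_arc_rates(num_points, arcs, arc_rate_diffs, ref_index=0):
    """Least squares integration of arc-wise rate differences into point-wise rates.

    arcs is a sequence of (i, j) index pairs and arc_rate_diffs[e] approximates
    v[j] - v[i]. The returned rates are relative to the reference point.
    """
    arcs = np.asarray(arcs)
    rows = np.arange(len(arcs))
    design = np.zeros((len(arcs), num_points))
    design[rows, arcs[:, 0]] = -1.0
    design[rows, arcs[:, 1]] = 1.0
    # Drop the reference column to remove the constant-offset ambiguity.
    keep = np.arange(num_points) != ref_index
    observations = np.asarray(arc_rate_diffs, dtype=float)
    solution, *_ = np.linalg.lstsq(design[:, keep], observations, rcond=None)
    rates = np.zeros(num_points)
    rates[keep] = solution
    return rates

# Toy network: two triangles sharing an edge, with noise-free arc observations.
true_rates = np.array([0.0, -2.0, -5.0, 1.0])
toy_arcs = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
toy_diffs = [true_rates[j] - true_rates[i] for i, j in toy_arcs]
estimated = integrate_arc_rates(4, toy_arcs, toy_diffs)
```

With noise-free arc observations the integration reproduces the point-wise rates exactly. With real data, arcs that pass through unstable pixels bias the solution, which is why the quality of the selected PS pixels matters for this step.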

We perform deformation measurement with each of the three above-mentioned PS pixel selection results for the study area. In practice, to ensure the precision of deformation measurement, the ADI-based method typically applies a more rigorous threshold. Figure 12a shows the deformation measurement result obtained from the PS pixels selected by the ADI-based method with a threshold of 0.28. It can be seen that there is a clear deformation trend in the densely built-up area in the upper right of the study area, while there is an upward trend in the lower right. If the threshold of the ADI-based method is increased to 0.437, more PS pixels can be selected and the coverage of the measurement results will be greater (see Figure 12b). However, due to the increase in the false alarm rate (see Figure 3), a large number of pixels with low phase stability will be included in the triangulation network construction and the deformation analysis results will be inaccurate. As shown in Table 3, the number of PS pixels selected by the MFN-based method is close to that of the ADI-based method with a threshold of 0.437. However, its deformation measurement result (see Figure 12c) is close to that of the ADI-based method with a threshold of 0.28, meaning that the additional PS pixels introduced do not reduce the quality of the deformation measurement.

To quantitatively confirm the effectiveness of deformation measurements, dependable deformation measurements of the study area are required as a reference. The Stanford Method for Persistent Scatterers (StaMPS) software package (version 4.1 beta) developed by Andy Hooper et al. is used for extracting ground displacements from time series of SAR acquisitions. Currently, the SNAP-StaMPS integrated processing is extensively employed in processing Sentinel-1 data [47]. In this section, we take the deformation measurement of the study area produced by the StaMPS software package as a reference and perform a regression analysis of the three deformation measurements mentioned above against it. Specifically, we extracted the common parts of all four PS pixel selection results and calculated the Pearson correlation coefficient R according to (13).

R = \frac{n \sum_{1}^{n} v_{i}^{M} v_{i}^{T} - \sum_{1}^{n} v_{i}^{M} \sum_{1}^{n} v_{i}^{T}}{\sqrt{n \sum_{1}^{n} {(v_{i}^{M})}^{2} - {(\sum_{1}^{n} v_{i}^{M})}^{2}} \sqrt{n \sum_{1}^{n} {(v_{i}^{T})}^{2} - {(\sum_{1}^{n} v_{i}^{T})}^{2}}}

(13)

where n is the number of PS pixels involved in the analysis,

v_{i}^{M}

and

v_{i}^{T}

are the measured and reference values of the deformation rate, respectively.
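
For reference, Equation (13) can be evaluated directly as in the following Python sketch. The function name and the toy rate values are illustrative assumptions; the inputs are assumed to be the deformation rates at the PS pixels common to the tested method and the reference.

```python
import numpy as np

def pearson_r(v_measured, v_reference):
    """Pearson correlation coefficient written term by term as in Equation (13)."""
    v_m = np.asarray(v_measured, dtype=float)
    v_t = np.asarray(v_reference, dtype=float)
    n = v_m.size
    numerator = n * np.sum(v_m * v_t) - np.sum(v_m) * np.sum(v_t)
    denominator = (np.sqrt(n * np.sum(v_m ** 2) - np.sum(v_m) ** 2)
                   * np.sqrt(n * np.sum(v_t ** 2) - np.sum(v_t) ** 2))
    return numerator / denominator

# Toy rates (made-up values) at common PS pixels.
measured = np.array([-12.0, -8.5, -3.0, 0.5, 2.0])
reference = np.array([-11.0, -9.0, -2.0, 1.0, 1.5])
r_value = pearson_r(measured, reference)
```

The result agrees with `np.corrcoef(measured, reference)[0, 1]`, which can serve as a cross-check.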

Figure 13 shows the comparison of the deformation measurements of the above three methods with the StaMPS method. When the ADI threshold increases from 0.28 to 0.437, R decreases from 0.80421 to 0.4062, indicating that the effect of the ADI-based method is sensitive to the threshold setting. In contrast, the MFN-based method proposed in this paper improves R to 0.88593 while maintaining the quantity of PS pixels, indicating that increasing the number of high-quality PS points improves the deformation monitoring accuracy of the PS-InSAR algorithm.

4.3. Performance Analysis of Different Structures

The MFN proposed in this paper is composed of the 3D U-Net structure and the CLSTM structure. In order to elucidate the contribution of the two structures to the performance of the MFN, we utilize MFN, 3D U-Net, and CLSTM, respectively, for PS pixel selection and evaluate the quality of the results.

However, the definition of the PS pixel itself is fuzzy; in other words, the demarcation between high and low phase stability is not clear. This makes it difficult to obtain the so-called ‘true value’ of the PS pixel. In order to analyze the performance of the above three structures, a multi-structure voting evaluation method is employed.

PS pixels exhibit complex spatiotemporal characteristics in time-series SAR images, and the above three methods rely on some of them to distinguish between PS pixels and non-PS pixels. Considering that pixels with sufficient characteristics are more likely to be PS pixels, we regard pixels that are selected by two or more structures as true values (see Figure 14). The PS pixel selection results for each structure are then analyzed in terms of accuracy rate

R_{a}

and error rate

R_{e}

. Let the number of true values be denoted as T. For the PS pixel selection result of a certain structure, let the number of true values contained in it be

T P

and the number of remaining pixels be

F P

. These two metrics can then be expressed as follows:

R_{a} = \frac{T P}{T}

(14)

R_{e} = \frac{F P}{T}

(15)

R_{a}

reflects the ability of the structure to select PS pixels, while

R_{e}

reflects the ability of the structure to reject non-PS pixels; a lower value indicates better rejection.
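
The voting rule and Equations (14) and (15) can be sketched in Python as follows. The function name, the dictionary interface, and the toy masks are illustrative assumptions; each mask is a boolean array marking the pixels selected by one structure.

```python
import numpy as np

def voting_metrics(masks, min_votes=2):
    """Return {name: (R_a, R_e)} using pixels chosen by at least min_votes structures as truth."""
    bool_masks = {name: np.asarray(mask, dtype=bool) for name, mask in masks.items()}
    votes = np.sum(list(bool_masks.values()), axis=0)
    truth = votes >= min_votes
    t = int(truth.sum())  # T in Equations (14) and (15); assumed to be non-zero
    results = {}
    for name, mask in bool_masks.items():
        tp = int(np.sum(mask & truth))   # selected pixels that are true values
        fp = int(np.sum(mask & ~truth))  # remaining selected pixels
        results[name] = (tp / t, fp / t)
    return results

# Toy masks over six pixels (made-up selections).
toy_masks = {
    "MFN": [1, 1, 1, 1, 0, 0],
    "3D U-Net": [1, 1, 1, 0, 1, 0],
    "CLSTM": [1, 0, 0, 0, 0, 1],
}
toy_metrics = voting_metrics(toy_masks)
```

In this toy case the first three pixels receive at least two votes, so T = 3. Note that both metrics are normalized by T rather than by the number of pixels selected by each structure, as in Equations (14) and (15).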

After analysis, a total of 70647 PS pixels in the study area are considered true values, and the results of the three structures are presented in Table 4. It can be seen that the MFN structure selects the greatest number of PS pixels, and

R_{a}

of its PS pixel selection result is higher than that of the 3D U-Net and the CLSTM. In terms of

R_{e}

, the MFN performs similarly to the 3D U-Net, while the CLSTM exhibits suboptimal performance.

In summary, the MFN structure has superior PS pixel selection performance compared to its two components. Of the two, the 3D U-Net structure contributes most to the overall performance. Although the CLSTM structure is difficult to use for PS pixel selection on its own, its time-series analysis capability improves the overall performance when combined with the 3D U-Net structure.

5. Discussion

The proposed MFN integrates spatiotemporal feature extraction capabilities and can select a large number of high-quality PS pixels from time-series SAR images. To assess the performance of the MFN, we conduct PS pixel selection and deformation measurements in the Tongzhou District, Beijing, which exhibits swift ground deformation. The experimental results indicate that the number of PS pixels selected by the MFN in the study area is close to that of the ADI-based method with a threshold of 0.437. However, judged by the spatial distribution of the selected PS pixels, the STIP index distribution, and the quality of the corresponding deformation measurement, the ADI-based method evidently introduces a significant number of low-quality PS pixels. This is primarily due to the increase in the false alarm rate caused by the failure of the approximation of the ADI to the PSD in the case of lower SNR. In contrast, the proposed MFN, although trained using labels provided by the ADI-based method, does not exhibit the same issue. This suggests that the MFN can learn more intrinsic features of PS pixels in time-series SAR images than the ADI feature. When the threshold is lowered to 0.28, the quality of deformation measurements corresponding to the ADI-based method improves significantly. However, the sparse distribution of PS pixels adversely affects deformation measurement quality, resulting in slightly inferior performance compared to the MFN. Additionally, we analyze the contributions of different structures in the MFN. The results demonstrate that the 3D U-Net structure contributes most to the overall performance, while the CLSTM delivers additional performance gains.

5.1. Comparative Analysis of Computational Cost

Besides the quality and quantity of selected PS pixels, computational cost is also an important criterion for evaluating a PS pixel selection method. A more efficient PS selection method enhances the overall efficiency of deformation measurement. We therefore selected 38 scenes of SAR images with a size of 600 × 2000 in the study area and applied four methods (MFN, StaMPS, the ADI-based method, and the STIP-based method (search window size: 9 × 9)) for PS pixel selection, with the computation time recorded in Table 5. The CPU device we used was an AMD Ryzen 9 7945HX (Advanced Micro Devices, Santa Clara, CA, USA) and the GPU device was an NVIDIA RTX 4060 (NVIDIA Corporation, Santa Clara, CA, USA). Among these four methods, the ADI-based method is the fastest, mainly due to its simpler computational steps. The MFN is slower than the ADI-based method but faster than the remaining two methods, indicating a relative advantage in computational cost for a deep-learning-based method.

5.2. Sensitivity to the Number of Input Images

Generally, the performance of PS pixel selection methods is directly related to the number of input time-series SAR images. Incorporating more SAR images helps improve the quality of PS pixel selection. However, it is regrettable that the data sources available for processing are not always sufficient. Therefore, here we analyze the sensitivity of our proposed MFN method to the number of input images.

Figure 15 illustrates the variation in the mean STIP index and the number of selected PS pixels obtained by the MFN-based method as the number of input SAR images changes. It can be observed that, as the number of input SAR images decreases, the quantity of selected pixels increases, while the mean quality of these pixels declines. Notably, when the number of input SAR images exceeds 15, this change is gradual, indicating that the PS selection results of the MFN-based method are relatively insensitive to variations in the number of input SAR images within this range. However, when the number of input images falls below 15, the performance of the MFN-based method deteriorates rapidly.

5.3. Performance Evaluation Across Different Scenes

In Section 4, we have demonstrated the effectiveness of the proposed method using a scene in Tongzhou District. This area is located at the urban fringe with relatively low building density. To further evaluate the general applicability of our method, we additionally select a central urban area of Beijing (116.478°E–116.541°E longitude and 39.869°N–39.953°N latitude) based on the data presented in Table 2 for further experimentation. Figure 16 shows the optical image of the central urban area.

Figure 17 shows the STIP index distribution in the central urban area, along with statistical histograms of the STIP index for the selected PS pixels obtained by the StaMPS method, the ADI-based method (

D_{A} < 0.472

), and our proposed MFN-based method. In terms of quantity, the MFN-based method selected a total of 186616 PS pixels in this area, which is comparable to the 186057 PS pixels selected by the ADI-based method (

D_{A} < 0.472

), and both significantly exceed the 119872 PS pixels selected by the StaMPS method. In terms of quality, the mean STIP index for the PS pixels selected by the MFN-based method is 35.0851, compared to 30.3674 for the ADI-based method (

D_{A} < 0.472

) and 33.4186 for the StaMPS method, demonstrating that our proposed MFN-based method achieves the optimal selection quality.

In summary, the experiment conducted in the central urban area further demonstrates the performance of the proposed MFN-based method, confirming its general applicability.

5.4. Challenges and Prospects for X-Band Application

Sentinel-1A operates at C-band, featuring short spatiotemporal baselines and good global coverage, and its data are free of charge. The assessment of the proposed MFN in Section 4 is also conducted using Sentinel-1A SAR images. However, X-band sensors have particular advantages in small structure deformation measurement tasks in urban areas. Benefiting from higher resolution, PS-InSAR processing using X-band data typically provides more PS pixels than C-band PS-InSAR, thereby better revealing deformation patterns in the scene [48]. Next, we aim to improve the proposed MFN to adapt it for X-band data processing. This is a highly challenging task. The shorter wavelength of X-band data makes the interferometric phase more sensitive to deformation and DEM errors. For systems with larger baseline spans (such as COSMO-SkyMed), the impact of DEM errors on the interferometric phase becomes even more significant. The rapid variation in the interferometric phase will make phase unwrapping more difficult. Consequently, the selection of PS pixels must guarantee not only the quality and quantity of PS pixels but also moderate phase variation between PS pixels.

5.5. Limitation and Future Work

In this paper, the MFN is constructed and applied to the PS pixel selection task. While the method achieves notable performance in both the quality and quantity of selected PS pixels, it exhibits several limitations.

Firstly, as shown in Table 5, the proposed MFN-based method requires longer computation time compared to the traditional ADI-based method. While this increased computational demand does not present a pronounced disadvantage for analyses of a limited study area, it becomes a significant constraint as the study area expands. To address this limitation, our future work will focus on adjusting the blocking size of input SAR images to achieve a balance between memory constraints and computational efficiency.

Secondly, as shown in Figure 15, a decrease in the number of input SAR images adversely affects the performance of the MFN-based method, mainly due to insufficient temporal characteristics. To address this limitation, our future work will focus on enhancing the network’s architectures for feature extraction and utilization, thereby improving its performance under conditions of limited input images.

Finally, the proposed method primarily focuses on C-band Sentinel-1A data. Our future work will extend the validation to X-band radar data to more comprehensively demonstrate the generalization ability of the proposed method.

6. Conclusions

Considering that the traditional ADI-based method can hardly balance the quality and quantity of PS pixels, in this paper, the MFN is constructed to fully select high-quality PS pixels from time-series SAR images. The MFN combines the 3D U-Net and the CLSTM, which can effectively achieve spatiotemporal feature extraction.

The experimental results demonstrate that, in comparison to the ADI-based method, the MFN-based method surpasses the low threshold (

D_{A} < 0.28

) case in terms of quality and the high threshold (

D_{A} < 0.437

) case in terms of quantity of PS pixel selection results. Therefore, we believe that introducing the MFN-based method into the PS-InSAR technique can effectively improve the coverage and accuracy of the deformation measurement.

In addition, considering that any type of scatterer exhibits certain spatiotemporal characteristics in the time-series SAR images, the MFN can be used in the task of recognizing other types of scatterers as long as suitable training is imposed.

Author Contributions

Conceptualization, Z.H. and M.L.; methodology, Z.H. and M.L.; software, Z.H. and M.L.; validation, Z.H. and M.L.; formal analysis, Z.H., M.L., and G.L.; investigation, Z.H., G.L., Y.W., and C.S.; resources, Z.D.; data curation, Z.H., G.L., Y.W., and C.S.; writing—original draft preparation, Z.H.; writing—review and editing, Z.H., G.L., and Z.D.; visualization, Z.H., Y.W., and C.S.; supervision, G.L. and Z.D.; project administration, G.L.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62227901; the Postdoctoral Fellowship Program of CPSF, grant number GZC20233416; and the China Postdoctoral Science Foundation, grant number 2024M764135.

Data Availability Statement

The Sentinel-1A data used in this study were provided by the European Space Agency (ESA), https://search.asf.alaska.edu/#/ (accessed on 20 July 2025).

Acknowledgments

Sentinel-1A data used in this study were provided by the European Space Agency (ESA). We are very grateful for the above support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Bürgmann, R.; Rosen, P.A.; Fielding, E.J. Synthetic aperture radar interferometry to measure Earth’s surface topography and its deformation. Annu. Rev. Earth Planet. Sci. 2000, 28, 169–209.
Xue, F.; Lv, X.; Dou, F.; Yun, Y. A review of time-series interferometric SAR techniques: A tutorial for surface deformation analysis. IEEE Geosci. Remote Sens. Mag. 2020, 8, 22–42.
Yang, Z.; Li, Z.; Zhu, J.; Wang, Y.; Wu, L. Use of SAR/InSAR in mining deformation monitoring, parameter inversion, and forward predictions: A review. IEEE Geosci. Remote Sens. Mag. 2020, 8, 71–90.
Lazecký, M.; Spaans, K.; González, P.J.; Maghsoudi, Y.; Morishita, Y.; Albino, F.; Elliott, J.; Greenall, N.; Hatton, E.; Hooper, A.; et al. LiCSAR: An automatic InSAR tool for measuring and monitoring tectonic and volcanic activity. Remote Sens. 2020, 12, 2430.
Caló, F.; Notti, D.; Galve, J.P.; Abdikan, S.; Görüm, T.; Pepe, A.; Balik Şanli, F. Dinsar-Based detection of land subsidence and correlation with groundwater depletion in Konya Plain, Turkey. Remote Sens. 2017, 9, 83.
Métois, M.; Benjelloun, M.; Lasserre, C.; Grandin, R.; Barrier, L.; Dushi, E.; Koçi, R. Subsidence associated with oil extraction, measured from time series analysis of Sentinel-1 data: Case study of the Patos-Marinza oil field, Albania. Solid Earth 2020, 11, 363–378.
Corsa, B.; Barba-Sevilla, M.; Tiampo, K.; Meertens, C. Integration of DInSAR time series and GNSS data for continuous volcanic deformation monitoring and eruption early warning applications. Remote Sens. 2022, 14, 784.
Nofl, D.; Darwishe, H.; Chaaban, F.; Mohammad, A. Mapping surface displacements after the 6 February 2023 earthquake in Syria and Turkey using DInSAR and GIS techniques. Spat. Inf. Res. 2024, 32, 231–251.
Ferretti, A.; Prati, C.; Rocca, F. Permanent scatterers in SAR interferometry. IEEE Trans. Geosci. Remote Sens. 2001, 39, 8–20.
Ferretti, A.; Prati, C.; Rocca, F. Nonlinear subsidence rate estimation using permanent scatterers in differential SAR interferometry. IEEE Trans. Geosci. Remote Sens. 2000, 38, 2202–2212.
Liu, G.; Buckley, S.M.; Ding, X.; Chen, Q.; Luo, X. Estimating Spatiotemporal Ground Deformation with Improved Persistent-Scatterer Radar Interferometry. IEEE Trans. Geosci. Remote Sens. 2009, 47, 3209–3219.
Iglesias, R.; Mallorqui, J.J.; López-Dekker, P. DInSAR pixel selection based on sublook spectral correlation along time. IEEE Trans. Geosci. Remote Sens. 2013, 52, 3788–3799.
Foroughnia, F.; Nemati, S.; Maghsoudi, Y.; Perissin, D. An iterative PS-InSAR method for the analysis of large spatio-temporal baseline data stacks for land subsidence estimation. Int. J. Appl. Earth Obs. Geoinf. 2019, 74, 248–258.
Li, G.; Ding, Z.; Li, M.; Hu, Z.; Jia, X.; Li, H.; Zeng, T. Bayesian estimation of land deformation combining persistent and distributed scatterers. Remote Sens. 2022, 14, 3471.
Goel, K.; Adam, N. A distributed scatterer interferometry approach for precision monitoring of known surface deformation phenomena. IEEE Trans. Geosci. Remote Sens. 2013, 52, 5454–5468.
Esfahany, S.S. Exploitation of Distributed Scatterers in Synthetic Aperture Radar Interferometry. Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 2017.
Hooper, A.; Zebker, H.; Segall, P.; Kampes, B. A new method for measuring deformation on volcanoes and other natural terrains using InSAR persistent scatterers. Geophys. Res. Lett. 2004, 31.
Zhao, F.; Mallorqui, J.J. A temporal phase coherence estimation algorithm and its application on DInSAR pixel selection. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8350–8361.
Hooper, A.; Segall, P.; Zebker, H. Persistent scatterer interferometric synthetic aperture radar for crustal deformation analysis, with application to Volcán Alcedo, Galápagos. J. Geophys. Res. Solid Earth 2007, 112.
Narayan, A.B.; Tiwari, A.; Dwivedi, R.; Dikshit, O. Persistent scatter identification and look-angle error estimation using similar time-series interferometric pixels. IEEE Geosci. Remote Sens. Lett. 2017, 15, 147–150.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Malmgren-Hansen, D.; Nobel-J, M. Convolutional neural networks for SAR image segmentation. In Proceedings of the 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Abu Dhabi, United Arab Emirates, 7–10 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 231–236.
Ren, Y.; Li, X.; Xu, H. A deep learning model to extract ship size from Sentinel-1 SAR images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5203414.
Al-Najjar, H.A.; Pradhan, B.; Beydoun, G.; Sarkar, R.; Park, H.J.; Alamri, A. A novel method using explainable artificial intelligence (XAI)-based Shapley Additive Explanations for spatial landslide prediction using Time-Series SAR dataset. Gondwana Res. 2023, 123, 107–124.
Hu, J.; Wu, W.; Gui, R.; Li, Z.; Zhu, J. Deep learning-based homogeneous pixel selection for multitemporal SAR interferometry. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5234518.
Tiwari, A.; Narayan, A.B.; Dikshit, O. Deep learning networks for selection of measurement pixels in multi-temporal SAR interferometric processing. ISPRS J. Photogramm. Remote Sens. 2020, 166, 169–182.
Zhang, Y.; Wei, J.; Duan, M.; Kang, Y.; He, Q.; Wu, H.; Lu, Z. Coherent pixel selection using a dual-channel 1-D CNN for time series InSAR analysis. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102927.
Chen, S.; Zhao, C.; Jiang, M.; Yu, H. PSFNet: A Feature-Fusion Framework for Persistent Scatterer Selection in Multi-Temporal InSAR. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 19972–19985.
Azadnejad, S.; Kandiri, A.; Hrysiewicz, A.; O’Loughlin, F.; Holohan, E.; Dev, S.; Donohue, S. Application of deep learning for coherent pixel selection in time series InSAR for urban area and transport infrastructure monitoring. Int. J. Appl. Earth Obs. Geoinf. 2025, 142, 104718.
Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; Proceedings, Part II 19. Springer: Cham, Switzerland, 2016; pp. 424–432.
Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28. Available online: https://proceedings.neurips.cc/paper/2015/hash/07563a3fe3bbe7e3ba84431ad9d055af-Abstract.html (accessed on 20 July 2025).
Mehta, R.; Arbel, T. 3D U-Net for brain tumour segmentation. In Proceedings of the International MICCAI Brainlesion Workshop, Granada, Spain, 16 September 2018; Springer: Cham, Switzerland, 2018; pp. 254–266.
Wang, F.; Jiang, R.; Zheng, L.; Meng, C.; Biswal, B. 3d u-net based brain tumor segmentation and survival days prediction. In Proceedings of the International MICCAI Brainlesion Workshop, Shenzhen, China, 17 October 2019; Springer: Cham, Switzerland, 2019; pp. 131–141.
Hwang, H.; Rehman, H.Z.U.; Lee, S. 3D U-Net for skull stripping in brain MRI. Appl. Sci. 2019, 9, 569.
Ishitsuka, K.; Tsuji, T.; Lin, W.; Kagabu, M.; Shimada, J. Seasonal and transient surface displacements in the Kumamoto area, Japan, associated with the 2016 Kumamoto earthquake: Implications for seismic-induced groundwater level change. Earth Planets Space 2020, 72, 144.
Li, M.; Zhang, X.; Bai, Z.; Xie, H.; Chen, B. Land subsidence in Qingdao, China, from 2017 to 2020 based on PS-InSAR. Int. J. Environ. Res. Public Health 2022, 19, 4913.
Chai, L.; Xie, X.; Wang, C.; Tang, G.; Song, Z. Ground subsidence risk assessment method using PS-InSAR and LightGBM: A case study of Shanghai metro network. Int. J. Digit. Earth 2024, 17, 2297842.
Zhou, C.; Gong, H.; Zhang, Y.; Warner, T.A.; Wang, C. Spatiotemporal evolution of land subsidence in the Beijing plain 2003–2015 using persistent scatterer interferometry (PSI) with multi-source SAR data. Remote Sens. 2018, 10, 552.
Chen, Y.; Dong, X.; Qi, Y.; Huang, P.; Sun, W.; Xu, W.; Tan, W.; Li, X.; Liu, X. Integration of DInSAR-PS-stacking and SBAS-PS-InSAR methods to monitor mining-related surface subsidence. Remote Sens. 2023, 15, 2691.
Varugu, B.K.; Jones, C.E.; Wang, K.; Chen, J.; Osborne, R.L.; Voyiadjis, G.Z. Optimized GNSS cal/val site selection for expanding InSAR viability in areas with low phase coherence: A case study for southern Louisiana. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 4875–4889.
Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Cham, Switzerland, 2015; pp. 234–241.
Wu, Y.; He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
Hou, J.; Xu, B.; Li, Z.; Zhu, Y.; Feng, G. Block PS-InSAR ground deformation estimation for large-scale areas based on network adjustment. J. Geod. 2021, 95, 111.
Lu, Y.; Salem, F.M. Simplified gating in long short-term memory (lstm) recurrent neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1601–1604.
Milletari, F.; Navab, N.; Ahmadi, S.A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 565–571.
Nachmani, E.; Marciano, E.; Lugosch, L.; Gross, W.J.; Burshtein, D.; Be’ery, Y. Deep learning methods for improved decoding of linear codes. IEEE J. Sel. Top. Signal Process. 2018, 12, 119–131.
Mancini, F.; Grassi, F.; Cenni, N. A workflow based on SNAP–StaMPS open-source tools and GNSS data for PSI-Based ground deformation using dual-orbit sentinel-1 data: Accuracy assessment with error propagation analysis. Remote Sens. 2021, 13, 753.
Wang, Y.; Bai, Z.; Zhang, Y.; Qin, Y.; Lin, Y.; Li, Y.; Shen, W. Using TerraSAR X-band and sentinel-1 C-band SAR interferometry for deformation along Beijing-Tianjin intercity railway analysis. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4832–4841.

Figure 1. PS pixel scatterer composition.

Figure 2. PS pixel values dispersion. Under high SNR condition (small

σ_{n}

), PSD

(σ_{φ})

can be approximated by ADI

(D_{A} = σ_{A} / μ_{A})

.

Figure 3. Relationship between ADI and PSD. (a) Simulation result; (b) 0.3 ADI threshold; (c) 0.4 ADI threshold.

Figure 4. The process of the MFN-based PS-InSAR algorithm.

Figure 5. The structure of MFN, which combines the 3D U-Net [30] and the CLSTM [31].

Figure 6. The t-th CLSTM block structure [31], where * denotes the convolution operation.

Figure 7. Process of training dataset construction.

Figure 8. Approximate relationship between ADI and PSD in different cases of SAR image number N involved in PS pixel selection.

Figure 9. The optical image of the study area.

Figure 10. PS pixel selection result of MFN-based method, ADI-based method (

D_{A} < 0.437

), and their intersection in the (a) study area; (b) corresponds to L1; (c) corresponds to L2.

Figure 11. Assessing the quality of PS pixel selection of the ADI-based method (

D_{A} < 0.437

) and the MFN-based method by the STIP index. (a) STIP index distribution in the study area; (b) STIP index statistical histograms of the two methods.

Figure 12. Deformation rate of the study area measured by the PS-InSAR technique. The PS pixels used for the measurements are selected by (a) the ADI-based method ($D_{A} < 0.28$), (b) the ADI-based method ($D_{A} < 0.437$), and (c) the MFN-based method.

Figure 13. Deformation measurement quality assessment. The deformation rates obtained from the PS pixels selected by (a) the ADI-based method ($D_{A} < 0.28$), (b) the ADI-based method ($D_{A} < 0.437$), and (c) the MFN-based method are regressed against the measurements obtained by StaMPS, which serve as the reference.

Figure 14. The principle of the multi-structure voting evaluation method.

Figure 15. Sensitivity of the MFN-based PS pixel selection to the number of input images.

Figure 16. The optical image of the central urban area.

Figure 17. Assessing the quality of PS pixel selection of the StaMPS method, the ADI-based method ($D_{A} < 0.472$), and the MFN-based method by the STIP index. (a) STIP index distribution in the central urban area; (b) STIP index statistical histograms of the three methods.

Table 1. Data used to build the training dataset.

Location	Sensor	Orbit Direction	Temporal Coverage	SLC Number	Feature Type
Yueyang, Hunan, China	Sentinel-1A	Ascending	20190108-20201228	60	Building, Farmland, Waterbody
Washington, DC, USA	Sentinel-1A	Ascending	20181231-20211227	84	Building, Parkland, Waterbody
Wuhan, Hubei, China	Sentinel-1A	Ascending	20190108-20211223	90	Building, Farmland, Wood, Waterbody
Karachi, Sindh, Pakistan	Sentinel-1A	Ascending	20190103-20201223	60	Building, Desert, Waterbody
Bayannur, Inner Mongolia, China	Sentinel-1A	Ascending	20190106-20201226	57	Building, Farmland, Desert, Waterbody

Table 2. Test data obtained from Sentinel-1A.

Date	Interval [Days]	Baseline [m]	Date	Interval [Days]	Baseline [m]
30 July 2015	564	7.91	1 February 2017	12	−52.34
23 August 2015	540	−116.88	13 February 2017	0	0
16 September 2015	516	−8.8	9 March 2017	−24	21.49
10 October 2015	492	−17.25	20 May 2017	−96	−90.67
3 November 2015	468	−77.9	13 June 2017	−120	−29.59
13 May 2016	276	−114.22	25 June 2017	−132	−8.65
6 June 2016	252	−13.31	19 July 2017	−156	−30.03
5 August 2016	192	−46.06	31 July 2017	−168	−84.48
17 August 2016	180	−26.25	12 August 2017	−180	−71.74
29 August 2016	168	−16.46	24 August 2017	−192	13.73
4 October 2016	132	−8.48	5 September 2017	−204	25.18
16 October 2016	120	38.55	17 September 2017	−216	−54.19
28 October 2016	108	−5.53	11 October 2017	−240	−110.9
9 November 2016	96	−50.85	23 October 2017	−252	−41.78
21 November 2016	84	−97.87	4 November 2017	−264	44.99
3 December 2016	72	−27.13	16 November 2017	−276	20.54
15 December 2016	60	40.97	28 November 2017	−288	−51.23
27 December 2016	48	5.19	10 December 2017	−300	−20.78
8 January 2017	36	14.93	22 December 2017	−312	79.85
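The sign convention of the Interval column in Table 2 can be checked with a short sketch. It assumes the interval is the reference date minus the acquisition date, with 13 February 2017 (the row with interval 0 and baseline 0) as the reference image; the function and constant names are hypothetical.

```python
from datetime import date

# Reference acquisition: the Table 2 row listed with interval 0 and baseline 0.
REFERENCE = date(2017, 2, 13)


def interval_days(acquisition, reference=REFERENCE):
    """Days from the acquisition to the reference (positive = before the reference)."""
    return (reference - acquisition).days


print(interval_days(date(2015, 7, 30)))   # first row of Table 2 -> 564
print(interval_days(date(2017, 3, 9)))    # -> -24
print(interval_days(date(2017, 12, 22)))  # last row of Table 2 -> -312
```

Under this convention, acquisitions before the reference have positive intervals and later acquisitions have negative ones, which matches the tabulated values.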

Table 3. PS pixel number in the study area.

Method	PS Pixel Number
ADI-based method ($D_{A} < 0.28$)	11,958
ADI-based method ($D_{A} < 0.437$)	74,344
MFN-based method	74,644

Table 4. Effect of different network structures on PS pixel selection.

Structure	PS Pixel Number	$R_{a}$	$R_{e}$
MFN	74,644	96.4004%	9.2573%
3D U-Net	71,667	93.2708%	8.1730%
CLSTM	69,317	49.8606%	48.2568%

Table 5. Computational time of different methods.

Method	Device	Computation Time [s]
MFN	GPU	64.6511
StaMPS	CPU	339.3928
ADI-based method	CPU	1.0709
STIP-based method	CPU	3332.2350

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hu, Z.; Li, M.; Li, G.; Wang, Y.; Sun, C.; Dong, Z. Persistent Scatterer Pixel Selection Method Based on Multi-Temporal Feature Extraction Network. Remote Sens. 2025, 17, 3319. https://doi.org/10.3390/rs17193319

