Landslide Displacement Prediction via Attentive Graph Neural Network

Landslides are among the most common geological hazards and cause considerable human and economic losses worldwide. Researchers have worked on the landslide prediction problem for decades. Previous methods either analyze landslide inventory maps obtained from aerial photography and satellite images or train machine learning models on historical land deformation data to predict future displacement and sedimentation. However, existing approaches generally fail to capture complex spatial deformations and their inter-dependencies across different areas. This work presents a novel landslide prediction model based on graph neural networks, which uses graph convolutions to aggregate spatial correlations among different monitored locations. In addition, we introduce a novel locally historical transformer network to capture dynamic spatio-temporal relations and predict the surface deformation. We conduct extensive experiments on real-world data and demonstrate that our model significantly outperforms state-of-the-art approaches in terms of prediction accuracy and model interpretation.


Introduction
Landslides are among the most common geological hazards worldwide; they occur when the destabilizing forces acting on a slope exceed its frictional strength [1]. They can be aggravated by heavy rainfall, snowstorms, or other natural hazards in mountainous areas and other regions with deep ravines and steep terrain [2]. Landslides often wash away roads, railways, and even villages and towns, threatening human lives and causing enormous economic losses [3]. Monitoring and, consequently, preventing such disasters have received significant attention from both industry and academia [2,4-10].
Researchers have developed many methods to predict landslides in critical areas, e.g., around hydropower stations and in inhabited mountainous regions [7]. Earlier approaches often rely on expert knowledge to produce landslide susceptibility maps and analyze slope deformation [11,12]. For example, domain experts can evaluate the safety factor of slopes based on the detailed topography and geological characteristics of a specific site and often provide accurate forecasts. However, these approaches require a broad range of domain knowledge (including but not limited to geology, ecology, pedology, mechanics, and statistics) and still may not predict landslides in time. Therefore, enabled by the rapid development of Geographic Information System (GIS) technology and of wireless sensor networks that monitor landslides automatically, various machine learning methods have been employed for accurate and timely landslide prediction [13,14]. For example, typical statistics and machine learning methods such as Bayesian networks [15], logistic regression [16,17], decision trees, random [...] are capable of learning spatial relationships in graphs, they usually depend on recurrent neural networks (RNNs) to model and predict the time series [29]. In addition, researchers have used evolving graphs to incorporate temporal information, which provides a new perspective on spatio-temporal learning [27]. Recently, the Transformer, a simple network architecture proposed by Vaswani et al. [30], has been successfully applied to natural language processing, visual recognition, and spatio-temporal data modeling [31-33]. For instance, a recent study [34] applied a gated attention network to perform higher-level semantic segmentation on remote sensing data.
Although existing approaches have made significant progress in landslide prediction, little work has addressed continuous landslide susceptibility prediction using InSAR time-series measurements. Typically, local surface deformation is studied in isolation, without relating information across different timestamps. Moreover, the aforementioned studies use separate blocks to learn spatial and temporal characteristics, either in parallel or sequentially, and therefore fail to capture spatio-temporal characteristics jointly.
In this work, we present a novel deep learning landslide prediction model that combines graph neural networks and a new Transformer network. The proposed landslide forecasting attentive graph neural network (LandGNN) exploits graph convolutions to learn spatial correlations (e.g., geographic distance) among monitoring sites. Furthermore, a locally historical Transformer is designed to aggregate spatial and temporal features jointly, allowing consistent modeling of the land displacement dependencies and understanding of the interactions between sites. Our main contributions are three-fold:

•
We present a GNN-based landslide prediction model using accurate InSAR data, which outperforms traditional and deep learning methods in predicting land deformation.

•
We propose a variant of the typical self-attention mechanism, which we call the locally historical Transformer, to exploit spatial and temporal dependencies simultaneously.

•
We provide a new real-world dataset collected from a critical area prone to landslides, on which we conduct extensive experiments that evaluate and demonstrate the effectiveness of LandGNN.

Related Work
In this section, we review the related literature and position our work within the broader context of landslide prediction, graph neural networks, and self-attention mechanisms.

Land Displacement Prediction
A range of methods has been applied to predict landslides in the last decades, which can be classified into three categories: (i) knowledge-based methods, (ii) data-driven models, and (iii) deep learning-based techniques.
The first group of methods [10,11,35,36] relies on environmental conditions and experts' domain knowledge to evaluate the probability of landslide occurrence. For example, Kang et al. [10] analyze the pre-sliding displacement of the Guanling landslide with advanced land observing InSAR and study the mechanism of the landslide from the perspective of topography, geological structure, and historical rainfall records. Liu et al. [35] emphasize the kinematic uncertainties in the semi-empirical dynamics method expressed by the diffusion angle. Zhu et al. [11] characterize the importance of factors by extracting features from the relationship between landslide susceptibility and various factors identified by domain experts. Daniela et al. [36] developed a quantitative analysis based on high-quality, detailed digital datasets, jointly comparing four morphometric variables: slope, roughness, terrain ruggedness index, and elevation standard deviation. Knowledge-driven methods largely depend on an understanding of the fundamental causes of landslides. However, achieving high prediction accuracy from the many influential factors requires extensive expert domain knowledge.
Landslide displacement directly reflects the deformation and stability of a slope, so data-driven models can use it to learn the patterns of landslide behavior. The rapid development of measurement technology has inspired a large body of research on data-driven approaches, including statistical machine learning methods, e.g., random forest [7], logistic regression [17], naive Bayes trees [15], random subspace [37], and support vector machines [38]. Their promising results demonstrate the advantages of data-driven approaches over expert systems for landslide susceptibility mapping and displacement prediction.
Given their outstanding performance in forecasting time-series data, deep learning approaches have been extensively explored for landslide displacement prediction. Existing methods exploit various network layers to learn informative features and nonlinear dependencies from multidimensional data. Recurrent neural networks (RNNs) and their variants are widely used and have achieved impressive results. For instance, GC-GRU-N [8] is a multi-weight graph convolutional network that incorporates a gated recurrent unit (GRU) [39] to learn temporal dependencies. VMD-stacked LSTM-TAR [9] predicts the trend sequences with a stacked long short-term memory (LSTM) [40] network that models rainfall and reservoir water levels as influential factors of landslides. Besides RNNs, convolutional neural networks (CNNs) and graph neural networks (GNNs) have been adopted to learn spatial patterns among different monitored locations. For example, Lei et al. [23] propose a noise-insensitive approach that uses multivariate morphological reconstruction in image preprocessing. Ju et al. [41] use three image-based object detection methods for landslide susceptibility detection and obtain accurate and stable results. Compared with CNNs, GNNs can model neighborhood relations in non-Euclidean spaces such as point clouds and therefore capture spatial dependencies more accurately [8,42]; e.g., Jiang et al. [8] combine a GNN and an RNN to model spatio-temporal dependencies.
Our work is most closely related to the deep learning-based approaches. However, previous models mainly depend on specific architectural designs and do not jointly consider spatial and temporal features. This observation motivates us to introduce the self-attention mechanism to model spatio-temporal dependencies simultaneously.

InSAR Technology
InSAR technology is an advanced geodetic tool that offers fine spatial resolution, high measurement precision (centimeter level or better), and all-day, all-weather operation. InSAR systems emit electromagnetic waves and collect and analyze the amplitude and phase of the energy returned from a target; they are commonly used to retrieve complete 3D surface displacements [43,44]. Owing to their ability to capture the movement of active landslides, various InSAR techniques have been employed to detect potential slope failures [1,2,10,45], including traditional InSAR [10], corner reflector InSAR [35], and the SqueeSAR technique [46]. In this work, we analyze 3D InSAR data collected from slopes around the dam of a hydropower station prone to landslides. We describe the data format in the next section and otherwise focus on the displacement forecasting methodology.

Graph Neural Networks
Recent years have witnessed the success of GNNs in modeling graph data [47-49]. At their core, GNN models consist of extract and aggregate functions. The extract function collects useful information from the neighborhood of a target node, using the attributes of the edges between them as queries. The extracted features are then aggregated with pooling and normalization operations. For example, the graph convolutional network (GCN) [47] learns node representations on graphs by extending CNNs to graph-structured data through spectral methods. Benefiting from their ability to encode expressive spatial representations, many spatio-temporal GNNs have been designed to predict node attributes, including for climate forecasting [50], urban flow prediction [51], traffic estimation [28,29,32,52], and land displacement prediction [8].
However, most existing methods model spatial and temporal dependencies separately, using GNNs plus additional time-series modules, e.g., attention layers [28,29,32] or recurrent neural networks [8,50]. To capture the interactions between spatial and temporal dependencies, we propose to learn spatio-temporal characteristics simultaneously. Moreover, existing spatio-temporal GNNs are not directly applicable to point cloud data, which lack a predefined network such as the road network in traffic systems. Recent studies such as Point-GNN [53] apply GNNs to point cloud data, but they are designed for object detection and classification and are thus unsuitable for time-series prediction. In contrast, we carefully define the spatial graph structure on the point cloud and then incorporate a locally historical Transformer to model the mutual spatio-temporal interactions between the time series of different monitored locations.

Transformers
Since it was proposed in [30], the self-attention mechanism, on which the Transformer architecture is built, has attracted worldwide interest. Transformers are now pervasive and have achieved great success in natural language processing (e.g., the well-known BERT [54] and GPT-3 [55]) and computer vision (e.g., Vision Transformer [56] and Detection Transformer [57]). Owing to their excellent performance in modeling long-term time series and spatial patterns, Transformers have also been applied to various spatio-temporal forecasting tasks with state-of-the-art results [58], for example, traffic flow forecasting [29,59] and air quality prediction [60]. Despite their effectiveness across various domains, Transformers have barely been explored in landslide prediction and slope displacement forecasting. In this work, we design an attention-based spatio-temporal model that emphasizes critical local spatial dependencies when modeling the temporal patterns of displacement time series.

Methodology
In this section, we define the studied problem and describe the proposed model, LandGNN, in detail.

Problem Definition
InSAR measurements: The data consist of $N$ monitored sites $S = \{s_1, \dots, s_N\}$, where each site $s_i = (z_i, l_i)$ is determined by its 3D geographical coordinates $z_i \in \mathbb{R}^3$ (longitude, latitude, elevation). At each site, the displacement is measured by InSAR and consists of a sequence of records $l_i = \{l_i^1, \dots, l_i^{T+T'}\}$ describing how the deformation of the site evolves over time, where each record $l_i^t \in \mathbb{R}^p$ has $p$ features. Figure 1 illustrates the studied slopes (details are described in Section 4). Given the first $T$ records of all sites, the goal is to learn a function $f$ that predicts the next $T'$ records:

$$f\Big(\big\{l_i^{1}, \dots, l_i^{T}\big\}_{i=1}^{N};\, S\Big) = \big\{\hat{l}_i^{T+1}, \dots, \hat{l}_i^{T+T'}\big\}_{i=1}^{N} \tag{1}$$
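To make the notation concrete, the data layout can be sketched in a few lines of Python. This is an illustration only: the names `Site` and `split_history_future` are hypothetical, each record $l_i^t$ is assumed to be a list of $p$ floats, and the first $T$ records are assumed to be model inputs while the following $T'$ records are prediction targets.

```python
from dataclasses import dataclass
from typing import List, Tuple

Record = List[float]  # one displacement record l_i^t with p features


@dataclass
class Site:
    """A monitored site s_i = (z_i, l_i); field names are hypothetical."""
    coords: Tuple[float, float, float]  # z_i: (longitude, latitude, elevation)
    records: List[Record]               # l_i^1, ..., l_i^{T+T'}


def split_history_future(sites: List[Site], T: int, T_future: int):
    """Split each site's series into T observed records and T' records to predict."""
    history, future = [], []
    for site in sites:
        if len(site.records) < T + T_future:
            raise ValueError("each site needs at least T + T' records")
        history.append(site.records[:T])             # model input
        future.append(site.records[T:T + T_future])  # prediction target
    return history, future
```

For example, with four single-feature records per site, `split_history_future(sites, T=3, T_future=1)` returns the first three records of every site as history and the fourth as the target.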

LandGNN
The main idea behind LandGNN is depicted in Figure 2. First, it builds a spatial graph that encodes neighborhood relations and relative geographical positions. Next, a standard graph convolutional network fuses features from neighbors. Subsequently, a locally historical Transformer aggregates spatio-temporal features and makes the final predictions. With this architecture, LandGNN not only learns the inter-dependence of nearby sites but also accounts for the evolution of local interactions.

Spatial Graph
Because the InSAR records of different sites carry no explicit links to one another, the correlations among sites must be built into a spatial graph. Naturally, we can map the original 3D relative positions to a graph in which nearby sites are close to each other in Euclidean distance. A direct solution to the prediction problem is to convert the point cloud into a 2D image from which CNNs can learn spatial features. However, the prediction accuracy is then limited by the image resolution and by the information lost when mapping the 3D point cloud to a 2D image. Recently, methods operating directly on 3D point clouds have enabled significant improvements [53]. By connecting neighbors within a preset fixed distance, graphs containing spatial correlations can be easily constructed, an approach that has proven effective for object detection in computer vision. Following the same idea, we convert the InSAR data into a graph: given a threshold δ, we connect every pair of sites $s_i$ and $s_j$ whose Euclidean distance satisfies $\lVert z_i - z_j \rVert_2 < \delta$. The threshold is a critical hyperparameter that balances preserving more geographical information against discarding redundant edges.
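A minimal sketch of the threshold graph construction follows, assuming the coordinates $z_i$ have already been projected to a metric system (e.g., metres) so that δ is a distance in the same unit; raw longitude and latitude in degrees would first need such a projection. The function name `threshold_adjacency` is hypothetical, and self-loops are deliberately omitted because the self-weight λ is added during graph convolution.

```python
import math
from typing import List, Sequence


def threshold_adjacency(coords: Sequence[Sequence[float]], delta: float) -> List[List[int]]:
    """Adjacency matrix linking sites i != j with ||z_i - z_j||_2 < delta.

    Assumes coords are in a metric projection so that delta (e.g. 80 m) is a
    distance in the same unit. No self-loops: the GCN adds lambda * I later.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < delta:  # Euclidean distance
                adj[i][j] = adj[j][i] = 1                # undirected edge
    return adj
```

The all-pairs loop is O(N²). For many sites, a KD-tree radius query (for example, SciPy's `KDTree.query_pairs`) is a common way to find the same pairs more efficiently, although it includes pairs whose distance equals the radius.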

Spatial Feature Fusion
Catastrophic movements cause dramatic changes to the surface, which are usually severe and quickly transmitted to adjacent areas. No site on the surface is isolated from the others, and deformation spreads to surrounding locations, causing surface changes over a wider range. Therefore, signs precede a massive landslide: we can continuously monitor the ground and issue a warning when the range of deformation exceeds expectations. To this end, we apply graph convolution operations [47] to aggregate neighbors' features and capture this kind of interaction:

$$X^{(y+1)} = \mathrm{ReLU}\left((D + \lambda I)^{-\frac{1}{2}}\,(\lambda I + A)\,(D + \lambda I)^{-\frac{1}{2}}\,X^{(y)}\,\Phi^{(y)}\right) \tag{2}$$

where $A$ is the adjacency matrix of the spatial graph $G$, $D$ is the corresponding degree matrix, $I$ is the identity matrix, and $\lambda$ controls the self-weight. $X^{(y)}$ is the input of the $y$-th layer with trainable parameters $\Phi^{(y)}$, and there are $Y$ layers in total. We initialize $X^{(0)} = L$ and use ReLU as the activation function. Let $\hat{A} = \lambda I + A$ and $\hat{D} = D + \lambda I$ (the degree matrix of $\hat{A}$). Then Equation (2) can be rewritten as follows:

$$X^{(y+1)} = \mathrm{ReLU}\left(\hat{D}^{-\frac{1}{2}}\,\hat{A}\,\hat{D}^{-\frac{1}{2}}\,X^{(y)}\,\Phi^{(y)}\right) \tag{3}$$

Once $G$ is defined, $\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}$ is fixed and used throughout the entire graph convolution procedure. Since deformation magnitudes are generally far smaller than the distances between sites, we can treat the graph $G$ as invariant over the monitored and predicted periods and compute it in advance as a constant. Next, we obtain high-order features by applying Equation (3) recursively, aggregating information from up to the $Y$-th order neighborhood. We found that a 3-layer graph convolutional network is sufficient in our empirical evaluations, and we denote the feature fusion output by $Q = X^{(Y)}$. Note that more advanced GNNs, such as GraphSAGE and GIN, can easily replace the GCN.
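The propagation rule in Equation (3) can be illustrated with a dependency-free sketch using hypothetical helper names and plain lists instead of tensors. It forms $\hat{A} = \lambda I + A$, the diagonal of its degree matrix $\hat{D}$, the symmetric normalization, and one ReLU layer. Following the text, the normalized matrix is computed once and reused by every layer; rows of $X$ are sites, and how the $p$ features and $T$ timestamps are packed into the columns is left open here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-list matrix product (a is n x m, b is m x k)."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_propagation(adj: List[List[int]], lam: float) -> Matrix:
    """Return D_hat^{-1/2} A_hat D_hat^{-1/2} with A_hat = lam * I + A, D_hat = D + lam * I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]  # row sums = diag of D_hat
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(prop: Matrix, x: Matrix, phi: Matrix) -> Matrix:
    """One layer X^{(y+1)} = ReLU(prop X^{(y)} Phi^{(y)}); prop is computed once."""
    h = matmul(matmul(prop, x), phi)
    return [[max(0.0, v) for v in row] for row in h]
```

Because λ > 0, every diagonal entry of $\hat{D}$ is positive, so the inverse square root is defined even for isolated sites; stacking Y = 3 calls of `gcn_layer` with the same `prop` mirrors the 3-layer setting reported in the experiments.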

Locally Historical Transformer
Figure 3 shows observations at different timestamps, with the relative displacements rescaled to [0, 1]. We find that the displacements are continuous both spatially and temporally. On the one hand, the displacements of sites at a given time are very similar to those at the previous time step. On the other hand, deformations in spatially close areas are very similar and interact with each other. That is, spatio-temporal causality is critical for modeling and predicting landslide susceptibility.
Transformer architectures and attention mechanisms have been widely used to capture spatial, temporal [48,61], and spatio-temporal dependencies [31-33]. However, existing methods use two groups of attention blocks to model spatial and temporal dependencies separately and then concatenate the results. As a result, they cannot exploit the full potential of self-attention to learn spatio-temporal correlations jointly. This motivates us to propose novel locally historical attention blocks that systematically explore spatio-temporal causality. More specifically, we use spatio-temporal masks that encode positional information from both temporal and spatial perspectives. Different positional masks are applied in different attention heads to model multi-level spatial dependency.
Traditionally, a T × T triangular matrix represents temporal dependency and an N × N reachability matrix represents spatial geographical dependency. Since the key is to build a locally historical mask, as illustrated in Figure 2, we propose an NT × NT mask matrix that models spatial and temporal dependencies jointly. Indexing rows and columns by (site, timestamp) pairs, it can be formalized as:

$$M_k\big[(s_{n_i}, t_i),\,(s_{n_j}, t_j)\big] = \begin{cases} 1, & \text{if } t_j \le t_i \text{ and } s_{n_j} \in \mathcal{N}_k(s_{n_i}), \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$

where $\mathcal{N}_k(s_{n_i})$ are the k-hop neighbors of $s_{n_i}$, which can be found from the k-hop accessibility matrix $A^k$. Positions marked by 0 are ignored. Moreover, multi-head attention is used to handle multiple reachability orders: Q is duplicated h times and sent to h heads, and for each head we compute a customized mask for its k-hop reachability by Equation (4). With this mask design, the attention blocks model spatial and temporal dependencies concomitantly. Subsequently, we compute each self-attention head following [30], applying the mask through the Hadamard product ⊙ (i.e., element-wise matrix product), where Q, E, and U are the matrices of queries, keys, and values, respectively. Next, we concatenate the h heads:

$$\mathrm{MultiHead}(Q, E, U, k) = \mathrm{concat}(\mathrm{Head}_1, \dots, \mathrm{Head}_h)\,\Theta,$$
where Θ denotes trainable parameters that aggregate features from the attention heads and control the output size. Note that the reachability orders k are independent of the number of heads h and can be chosen as needed, and any number of locally historical attention blocks can be stacked to obtain the best performance. The locally historical Transformer can handle different adjacency matrices and spatio-temporal masks, allowing LandGNN to capture dynamic spatio-temporal relationships and adapt to different locations. Note, however, that a large number of monitored sites may cause an explosion in space complexity (the mask alone has (NT)² entries) and dispersed attention weights. Therefore, in practice, one can sample subsets of nodes, make predictions on the corresponding subgraphs iteratively, and average the results at every location to approximate the full predictions. Algorithm 1 summarizes the details of predicting land displacement via LandGNN.
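The mask in Equation (4) and its use inside a single attention head can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: tokens are ordered time-major (index t·N + n); a site counts as its own 0-hop neighbor, so k-hop reachability is computed by breadth-first search (equivalent to the nonzero pattern of (I + A)^k); and masked positions are excluded from the softmax by setting their scores to −∞, the common convention for attention masks, rather than by a literal Hadamard product, so that they receive exactly zero weight. Query/key/value projections are omitted, and all function names are hypothetical.

```python
import math
from collections import deque
from typing import List


def k_hop_reachability(adj: List[List[int]], k: int) -> List[List[int]]:
    """R[i][j] = 1 if site j is within k hops of site i (site i itself included)."""
    n = len(adj)
    reach = [[0] * n for _ in range(n)]
    for src in range(n):
        hops = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search truncated at depth k
            u = queue.popleft()
            if hops[u] == k:
                continue
            for v in range(n):
                if adj[u][v] and v not in hops:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        for v in hops:
            reach[src][v] = 1
    return reach


def locally_historical_mask(adj: List[List[int]], k: int, num_steps: int) -> List[List[int]]:
    """NT x NT mask with token index t * N + n; 1 = may attend, 0 = ignored."""
    n = len(adj)
    reach = k_hop_reachability(adj, k)
    mask = [[0] * (n * num_steps) for _ in range(n * num_steps)]
    for t_q in range(num_steps):
        for n_q in range(n):
            for t_k in range(t_q + 1):  # current and earlier timestamps only
                for n_k in range(n):
                    if reach[n_q][n_k]:  # spatially within k hops
                        mask[t_q * n + n_q][t_k * n + n_k] = 1
    return mask


def masked_attention(q, e, u, mask):
    """softmax(Q E^T / sqrt(d)) U with masked-out positions given zero weight."""
    d = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, e_j)) / math.sqrt(d) if mask[i][j]
                  else -math.inf for j, e_j in enumerate(e)]
        top = max(scores)  # finite as long as each row allows at least one position
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(w * u_j[c] for w, u_j in zip(weights, u)) / total
                    for c in range(len(u[0]))])
    return out
```

With the 1-, 2-, and 3-order masks used in the experimental settings, each of the three heads would build its own mask with `locally_historical_mask(adj, k, T)` for a different `k`, and the head outputs would then be concatenated and multiplied by Θ.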

Algorithm 1 Predicting via LandGNN.
Input: Adjacency matrix $A$, reachability orders on h heads $\{k_z\}_{z=1}^{h}$, N monitored sites $S$, displacement observations $L$, convolution layers $Y$.
1: Initialize $z = 1$ and $X^{(0)} = L$;
2: while $z \le h$ do
Calculate attention function.
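The memory-saving approximation described before Algorithm 1 (predict on sampled subgraphs, then average the predictions at each site) might look like the following sketch. The callback `predict_subgraph` is hypothetical and stands in for running LandGNN on the subgraph induced by the sampled sites; uniform random sampling and scalar per-site outputs are simplifying assumptions, and spatially contiguous patches may suit the local masks better.

```python
import random
from typing import Callable, Dict, List


def averaged_subgraph_predictions(
    num_sites: int,
    predict_subgraph: Callable[[List[int]], Dict[int, float]],
    subset_size: int,
    num_rounds: int,
    seed: int = 0,
) -> Dict[int, float]:
    """Average per-site predictions over randomly sampled site subsets.

    predict_subgraph is a hypothetical callback: it runs the model on the
    subgraph induced by the given site indices and returns {site: prediction}.
    Sites that are never sampled do not appear in the result.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for _ in range(num_rounds):
        subset = sorted(rng.sample(range(num_sites), subset_size))
        for site, value in predict_subgraph(subset).items():
            sums[site] = sums.get(site, 0.0) + value
            counts[site] = counts.get(site, 0) + 1
    # Each site's estimate is the mean over all rounds in which it was sampled.
    return {site: sums[site] / counts[site] for site in sums}
```

Increasing `num_rounds` raises the chance that every site is covered and reduces the variance of the averaged estimate, at the cost of more model evaluations.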

Objective
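Given the ground-truth displacements $L$ and the predicted displacements $\hat{L}$ of each node, we minimize the gap between them:

$$\mathrm{RMSE}(L, \hat{L}) = \sqrt{\frac{1}{N\,|\mathcal{T}|\,p} \sum_{i=1}^{N} \sum_{t \in \mathcal{T}} \big\lVert l_i^{t} - \hat{l}_i^{t} \big\rVert_2^2},$$

where $\mathcal{T}$ is the set of predicted timestamps; this is exactly the root mean square error (RMSE).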

Experiments
We now present the experimental results that demonstrate the effectiveness of LandGNN against the state-of-the-art landslide prediction algorithms.

Dataset
Our model and the baselines are evaluated on real-world InSAR data of the slopes around the Houziyan Dam, a large-scale hydropower station on the Dadu River in Danba County, Sichuan Province, China. Figure 1a illustrates the studied slopes, where dots along the river denote the monitored locations; red and cyan dots denote areas on the west and east sides, respectively. Figure 1b shows the plan view, where darker colors indicate lower elevations. For both slopes, we used eight months of data, from 1 January 2019 to 31 August 2019, for evaluation. Table 1 summarizes the statistics of the two slopes. Note that the land displacement on the east side is slightly larger.

Baselines and Experimental Settings
We compare our method against the following baselines: (1) Historical Average (HA), a time-series model that predicts the future displacement of each location as the average of its previous observations; (2) Support Vector Regression (SVR), a typical regression model that predicts the value at a future time step by minimizing a bound on the generalization error [17]; (3) Autoregressive Integrated Moving Average (ARIMA), one of the most established statistical approaches to time-series modeling and prediction; (4) LSTM and GRU, two well-known RNN variants that have been used for time-series prediction [62]; (5) STGCN, which combines gated temporal convolutions with graph convolutions over a predefined graph and is widely used for traffic prediction [27]; (6) DCRNN, which captures spatial features with diffusion convolution based on random walks and temporal features with a recurrent encoder-decoder structure [28]; and (7) STAL, an attention-based LSTM for spatio-temporal forecasting [62].
All models are trained on a server with a GeForce RTX 3090 GPU. We used the earliest 50% of the observations for training, the next 30% for validation, and the most recent 20% for testing. HA, SVR, and ARIMA are implemented with the machine learning toolkit scikit-learn (https://scikit-learn.org). The deep learning models are optimized with Adam at an initial learning rate of 3 × 10^-4. Both LSTM and GRU are 3-layer networks with 250 units per layer. LandGNN uses a 3-layer graph convolutional network with 50 units per layer. Its locally historical Transformer consists of a 3-layer decoder, where each layer has a 3-head attention block with 1-, 2-, and 3-order reachability matrices as masks. The graphs were built with thresholds δ = 80 m for the west side and δ = 100 m for the east side, and the default self-connection weight is λ = 15.
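The chronological 50/30/20 split can be sketched as follows; integer percentages avoid floating-point rounding in the index arithmetic. The function name is hypothetical, and the input is assumed to be already sorted by acquisition time.

```python
from typing import List, Sequence, Tuple


def chronological_split(items: Sequence, train_pct: int = 50, val_pct: int = 30
                        ) -> Tuple[List, List, List]:
    """Split time-ordered items into train/validation/test without shuffling.

    The earliest train_pct percent go to training, the next val_pct percent to
    validation, and the most recent remainder to testing.
    """
    n = len(items)
    n_train = n * train_pct // 100  # integer arithmetic avoids float rounding
    n_val = n * val_pct // 100
    train = list(items[:n_train])
    val = list(items[n_train:n_train + n_val])
    test = list(items[n_train + n_val:])
    return train, val, test
```

Keeping the split chronological (rather than shuffling) ensures the test set contains only timestamps after everything the model was trained or tuned on.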

Evaluation Metrics
We report the performance of all models using metrics widely used for evaluating time-series models: RMSE, mean absolute error (MAE), accuracy (ACC), coefficient of determination (R²), and explained variance score (EVS).
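For reference, the five metrics can be computed as below for flattened sequences of ground-truth and predicted displacements. RMSE, MAE, R², and EVS follow their standard definitions (matching scikit-learn's `mean_squared_error` followed by a square root, `mean_absolute_error`, `r2_score`, and `explained_variance_score` for 1-D inputs). The text does not define accuracy, so `accuracy` here assumes the definition 1 − ‖Y − Ŷ‖/‖Y‖ that is commonly used in spatio-temporal forecasting work; treat it as an assumption.

```python
import math
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def _var(xs: Sequence[float]) -> float:
    mu = _mean(xs)
    return _mean([(x - mu) ** 2 for x in xs])


def rmse(y, y_hat):
    return math.sqrt(_mean([(a - b) ** 2 for a, b in zip(y, y_hat)]))


def mae(y, y_hat):
    return _mean([abs(a - b) for a, b in zip(y, y_hat)])


def accuracy(y, y_hat):
    """Assumed definition: 1 - ||y - y_hat||_2 / ||y||_2 (not given in the text)."""
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
    return 1.0 - err / math.sqrt(sum(a ** 2 for a in y))


def r2(y, y_hat):
    mu = _mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - mu) ** 2 for a in y)                # total sum of squares
    return 1.0 - ss_res / ss_tot


def explained_variance(y, y_hat):
    residuals = [a - b for a, b in zip(y, y_hat)]
    return 1.0 - _var(residuals) / _var(y)
```

R² and EVS differ only in whether a constant bias in the predictions is penalized: EVS subtracts the mean residual, so a uniformly shifted forecast can still score highly on EVS but not on R².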

Table 2 reports the prediction performance of all models on both sides in terms of the five metrics. LandGNN consistently outperforms the other methods on both sides, which demonstrates that modeling spatial correlations among different sites is essential for predicting landslide susceptibility. Traditional models such as HA, SVR, and ARIMA perform poorly because they cannot capture non-linear interactions between locations. RNN models are good at modeling long- and short-term dependencies in time series and therefore significantly improve on the traditional methods. However, the difference between LSTM and GRU is small, and, most importantly, both ignore spatial interactions, which leads to inferior results compared with the spatio-temporal models. STGCN and DCRNN, in turn, model spatial and temporal dependencies separately and sequentially, whereas LandGNN models them as a whole, which explains their lower performance and shows that our locally historical Transformer captures spatio-temporal dependencies better than these GNN-based approaches. Finally, the comparison between STAL and LandGNN, both of which are attention-based, verifies the contribution of the locally historical attention block.

Parameter Sensitivity
We now investigate the influence of two crucial hyperparameters in LandGNN. First, we discuss the influence of the distance threshold δ, which determines how dense the constructed neighbor graph is. Naturally, a larger δ results in a denser graph, which requires more computation for feature aggregation. Figure 4 shows how δ affects the prediction performance and the training time. Initially, we hypothesized that the larger the value of δ, the better the prediction results. However, this hypothesis does not hold: thresholds of 80 m and 100 m are sufficient for the model to achieve the best performance on the west and east sides, respectively, and increasing δ further degrades performance. This phenomenon stems from the nature of land displacement. Nearby areas may have similar displacements (e.g., positive values) that do not generalize to distant locations, where displacements might be negative and would cancel out during feature aggregation in the GCN.
The other important hyperparameter is the self-connection weight λ used when normalizing the adjacency matrix. When more neighbors are considered, the influence of the node itself decreases. Hence, for an aggregation network with a given distance threshold δ, there exists an optimal λ that yields the best prediction performance. Figure 5 shows the mean RMSE and the range of standard errors over 10 runs. The optimal λ is around 20 for the west side and around 10 for the east side.

Visualizations
To explore how the proposed locally historical Transformer works, we visualize attention weights of the first self-attention block in Figure 6. When making predictions at timestamp 3, we randomly select one location, which lies at the center of the red circle, and plot the attention weights it assigns to locations at timestamps 2 and 1 under a 2-hop accessibility matrix. To distinguish small attention weights from the background, we set all background pixels to 0 and add 0.02 to the pixels of all monitored locations, so that their values sum to more than 1. The results show that the predictions for the selected area are mainly influenced by neighboring areas with higher attention values. Moreover, the attention paid to timestamp 1 is lower than that paid to timestamp 2, as the scales of the color bars indicate. In short, attention concentrates on more recent timestamps, which confirms the motivation of our locally historical Transformer: landslide forecasting should focus on spatially and temporally closer locations. We also plot the real deformation against the predicted values to assess the performance of LandGNN qualitatively. Figure 7 shows the results for several randomly selected monitored sites: the predictions of LandGNN follow the actual displacement trends, indicating that our method can capture real-time landslide susceptibility.

Discussion
In the previous section, we empirically showed that LandGNN performs well in forecasting land displacement. We compared LandGNN with three distinct groups of approaches: statistical, machine learning, and deep learning methods. The results on the five evaluation metrics, reported in Table 2, show the overall superiority of LandGNN. In this section, based on these results, we discuss the advantages and disadvantages of our model relative to the baselines, as well as the challenges of deploying it in the wild.
We observe that models relying solely on time series perform worse than spatio-temporal models. While there is extensive evidence that modeling spatio-temporal dependencies is critical for forecasting tasks [27,28,62], we show that capturing these dependencies jointly further improves performance. As shown in Figure 3, displacements are continuous both spatially and temporally, which motivated us to design the locally historical Transformer to handle these complicated interactions. Since STAL and LandGNN are both attention-based, their contrast strongly validates the proposed mechanism and our re-modeling of the standard Transformer. It also supports the argument that modeling spatio-temporal dependencies jointly is a better approach. Although LandGNN with the locally historical Transformer significantly outperforms the baselines, there is room for further improvement. For example, besides the joint modeling of spatio-temporal dependencies designed in LandGNN, other fusion mechanisms could be used to boost performance. Another potential improvement is computational efficiency, given the high costs of monitoring landslides and of real-time deformation computation. More effort is needed to investigate efficient model structures that maintain satisfactory performance.
Another concern is the construction of the adjacency matrix. The collected InSAR data are isolated 3D points with varying displacements, and most existing GNN approaches apply a k-nearest-neighbor (kNN) or distance-threshold rule to identify the neighbors of each point [27,50,51]. Following these methods, we construct the adjacency relationships by searching over δ, as illustrated in Figure 4 of Section 4, and determined the optimal δ for the two datasets. We have discussed potential reasons for the performance improvement and degradation and how these hyperparameters influence the subsequent aggregation mechanism, i.e., the GCN in our case. Notably, the performance of LandGNN varies substantially with the threshold, which indicates that hyperparameter tuning is indispensable for building a reliable forecasting system based on LandGNN. Since the threshold is not differentiable, tuning it usually requires considerable human effort, especially for large-scale datasets or when the model must be retrained as new InSAR data arrive. Moreover, threshold-based adjacency construction cannot model complicated correlations among nodes, because spatial distance is the only factor it considers. In other words, dynamic adjacency matrices are needed to represent varying node relationships. We therefore suggest three potential research directions for improving LandGNN: automatically identifying the neighbors of each node, mining time-varying spatio-temporal correlations, and constructing better adjacency matrices for feature aggregation.
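For comparison with the distance threshold, a kNN rule (mentioned above as the other common choice) can be sketched as follows; the function name is hypothetical, and this variant is not part of the experiments reported in this work. Symmetrizing the kNN relation guarantees that every site has at least k neighbors regardless of local point density, which a fixed δ does not, but k is just as non-differentiable as δ and still needs tuning.

```python
import math
from typing import List, Sequence


def knn_adjacency(coords: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Symmetric kNN graph: link i and j if either is among the other's k nearest sites."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # (distance, index) pairs sorted by distance; ties broken by index
        nearest = sorted((math.dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        for _, j in nearest[:k]:
            adj[i][j] = adj[j][i] = 1  # symmetrize so the graph stays undirected
    return adj
```

In a sparse region, such as the far point in a spread-out point cloud, the threshold graph may leave a site isolated, whereas the kNN graph always links it to its nearest monitored neighbor.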
Here we provide two more directions for future studies on landslide displacement prediction. First, InSAR measurements are disturbed by the environment, which inevitably introduces noise into the data; consequently, the correlations between nodes are usually uncertain. Existing models are still unable to learn data uncertainty or resist anomalies, and we believe probabilistic approaches could make the predictions more robust. Second, GNN-based models often suffer from over-smoothing, and GNNs are generally recommended to have no more than four layers. Intense displacements are therefore smoothed out when surrounding features are aggregated, which limits the capabilities of the subsequent Transformer.

Conclusions
We presented LandGNN, a spatio-temporal graph neural network-based model for predicting land displacement using the high-quality InSAR measurement data.

•
We apply graph convolution to aggregate spatial features on the defined graph structure and exploit transformer architecture to capture the locally historical dependencies between monitored nodes. Compared to traditional and deep learning-based methods, LandGNN explicitly models the spatio-temporal interactions between different locations and thus achieves better forecast performance.

•
The experiments conducted on real-world datasets show that LandGNN is superior to previous approaches because it considers the evolution of local interactions. We also report the sensitivity to two important hyperparameters to explore the effectiveness of the adjacency relations. Meanwhile, the visualization study shows how the attention mechanism works, further validating our motivation.

•
Our ongoing work aims to explore additional information, such as the azimuth between monitored locations and weather conditions, to improve the accuracy and robustness of LandGNN. In addition, incorporating data uncertainty and understanding the interactions between different areas while explaining the model's predictions are worthy of further investigation and could benefit the development of prediction approaches for various safety-critical applications.

Data Availability Statement: Not applicable.