Capture and Prediction of Rainfall-Induced Landslide Warning Signals Using an Attention-Based Temporal Convolutional Neural Network and Entropy Weight Methods

The capture and prediction of rainfall-induced landslide warning signals is the premise for the implementation of landslide warning measures. An attention-fusion entropy weight method (En-Attn) for capturing warning features is proposed. An attention-based temporal convolutional neural network (ATCN) is used to predict the warning signals. Specifically, the sensor data are analyzed using Pearson correlation analysis after obtaining data from the sensors on rainfall, moisture content, displacement, and soil stress. The comprehensive evaluation score is obtained offline using multiple entropy weight methods. Then, the attention mechanism is used to weight and sum different entropy values to obtain the final landslide hazard degree (LHD). The LHD realizes the warning signal capture of the sensor data. The prediction process adopts a model built by ATCN and uses a sliding window for online dynamic prediction. The input is the landslide sensor data at the last moment, and the output is the LHD at the future moment. The effectiveness of the method is verified by two datasets obtained from the rainfall-induced landslide simulation experiment.


Introduction
Rainfall-induced landslides are geological hazards triggered by prolonged rainfall or short-term heavy rainfall. Scholars have conducted in-depth research on landslide susceptibility mapping [1], data modeling [2], and mechanism analysis [3].
Machine learning (ML) and deep learning (DL) are important methods for landslide prediction because of their ability to achieve complex nonlinear modeling. Many ML and DL methods are used for landslide detection and prediction with better performance than traditional methods. Wei et al. proposed an attention-constrained neural network with overall cognition (OC-ACNN) to capture features to predict landslides [4]. Ghorbanzadeh et al. used different deep convolutional neural networks (CNNs) for landslide remote sensing images and achieved better results in landslide mapping [5]. An integrated framework of DL models with rule-based object-based image analysis (OBIA) to detect landslides was explored by Ghorbanzadeh et al. [6]. Wang et al. optimized the Elman neural network with the genetic algorithm and used it to implement the prediction of landslide displacement [7]. Wang et al. compared five machine learning methods for reservoir displacement prediction, and the Hodrick-Prescott filter decomposed the cumulative displacement into trend displacement and periodic displacement [8]. Wang et al. predicted the intrinsic evolution trend of landslide displacement by (double exponential smoothing, DES) DES-VMD-LSTM, based on the Gaussian process regression (GPR) model to assess the uncertainty in the first prediction [9]. Miao et al. applied the fruit fly optimization algorithm back-propagation neural network (FOA-BPNN) for the prediction of random displacements [10]. Gong et al. considered the problem of interval prediction of landslide displacements and proposed a new method of interval prediction of landslide displacements combining dual-output least squares support vector machine (DO-LSSVM) and particle swarm optimization (PSO) algorithms [11]. Time series analysis and long short-term memory neural networks are used in landslide displacement prediction [12,13]. Lin et al. analyzed the internal relationship between rainfall, reservoir water level, and periodic landslide displacement and used the double-bidirectional long short-term memory (Double-BiLSTM) model to predict landslide displacement [14]. Zhang et al. proposed a method based on Gated Recurrent Unit (GRU) and Fully Integrated Empirical Decomposition of Adaptive Noise (CEEMDAN) for the dynamic prediction of landslide displacement [15]. The application of hybrid methods based on metaheuristics (MH) in the field of geohazards is a recent research direction in disaster prediction. Ma et al. conducted a comparative study on MHs and proposed a new hybrid algorithm, namely MH-based support vector machine regression (SVR) [16]. The hybrid method has high performance in terms of accuracy and reliability for landslide displacement prediction. Meanwhile, the hybrid method combined with a multiverse optimization (MVO) for hyperparameter optimization of MHs [17] improves the reliability of disaster prediction modeling.
Rainfall is commonly used for early warning as an important trigger for landslides. Cost-sensitive rainfall thresholds were investigated by Sala et al. and sensitivity analysis was performed [18]. However, rainfall thresholds that are difficult to standardize cannot be used as early warning signals for the occurrence of landslides. Changes in soil moisture are an important factor in landslides. Domínguez-Cuesta et al. focused on the role of rainfall and soil moisture as triggering and evolutionary factors for unstable events [19]. Soil moisture saturation and sudden rainfall are more likely to lead to landslides. Chen et al. analyzed the role of soil moisture index (SWI) in landslides based on 279 mass movements that occurred in Taiwan during 2006-2017 [20].
These data-driven approaches effectively implement the displacement prediction problem for landslides; however, these models do not consider correlations among multiple sensor data and do not capture warning signals in sensor data well. Entropy value, as a physical quantity describing the degree of data chaos, has also been used to analyze landslide risk [21]. However, landslide hazard analysis using the information entropy value method does not take into account the effects of different entropy values on landslide sensor data. A single entropy value method for landslide warning feature analysis failure will result in the possibility of misclassification.
Challenges: First, there are many landslide monitoring sensors, but the methods of effectively capturing warning signals are less studied. Second, there are correlations among different types of landslide sensor data, which need to be analyzed. Third, the accuracy of data-driven rainfall-induced landslide hazard prediction models needs to be improved. Contributions: • We combine an attention mechanism with multiple entropy weight methods and propose an attention-fusion entropy weight method (En-Attn) to capture warning signals based on massive landslide sensor data.

•
We propose an attention-based temporal convolutional neural network for landslide warning signals prediction based on massive sensor data.

•
We carry out the experimental simulation of rainfall-induced landslides, collect sensor data when landslides occur, analyze the precursory warning characteristics of the data, and use a variety of entropy weight methods to analyze the characteristics of warning signals offline.

•
Our model is validated on two datasets obtained from rainfall-induced simulation experiments, and our model has high accuracy compared with similar landslide warning capture and prediction methods.

Capture Models of Landslide Warning Signal
We obtain massive sensor data from landslide simulation experiments, including rainfall, the soil moisture content in shallow layers, the soil moisture content in deep layers, soil stress, and displacement. The evaluation of landslide warning signals is to extract the warning features from these massive sensor data to characterize the landslide warning situation. The entropy weight methods (EWM) can be used to assess the degree of landslide hazard [21].

Entropy Weight Methods
Entropy is a measure of uncertain information. The smaller the entropy value, the greater the amount of information and the greater the weight. The entropy weight method (EWM) [22] is an objective weighting method. The canonical EWM uses information entropy (InEn) [23] as the basis for calculation. In fact, there are many entropy methods, namely approximate entropy [24], sample entropy [25], fuzzy entropy [26], and permutation entropy [27]. Therefore, an improved entropy method can be obtained by replacing the information entropy in the canonical entropy weight method with the following four entropy values: approximate entropy (ApEn), sample entropy (SampEn), fuzzy entropy (FuzzyEn), permutation entropy (PeEn).
The calculation process of the EWM [28] has five steps.
Step 3: Calculate the coefficient of variation using Equation (3).
where z ij is the raw data at row i and column j in the sensor dataset.
x ij is the data normalized by z ij . e j is the entropy value of x ij . f En is the method for calculating the entropy values using Equations (6)-(26) for the specific formula.
N is the number of rows in the sensor dataset. d j is the coefficient of variation of x ij . ω j is the corresponding weight of each column of data obtained by the EWM. s i is the weight entropy score.
M is the number of columns in the sensor dataset. Information entropy (InEn) [23] can be calculated by Equation (6).
where ln denotes the natural logarithm.
f InEn j denotes the information entropy value. The calculation of ApEn can also be understood as the degree of self-similarity of a sequence in the pattern. For the change of a signal sequence, the change of the approximate entropy value can be used to achieve the purpose of effective identification. The biggest advantage of the approximate entropy calculation is that it does not require a large amount of data, most of the measured time series can meet the requirements, and the obtained results are robust and reliable [29]. The calculation of approximate entropy (ApEn) is as follows: where d[X i , X j ] denotes the distance between the vector X i and X j . B i is the number of items that satisfy the condition d[X i , X j ] < r. r denotes the similarity tolerance threshold. Φ m i denotes the ratio of the approximate quantity to the total quantity, namely the approximate ratio.
f ApEn denotes the approximate entropy value of sequence X i . m is the dimension of X i , which is an artificially set parameter value.
ApEn characterizes the complexity of a sequence. The value of ApEn is less affected by the amount of data and is suitable for non-stationary and nonlinear sequences. ApEn preserves the time series information in the original signal sequence and reflects the characteristics of the signal sequence on the structural distribution. The entropy value of the fault signal will be greater for fault data present in a set of continuous data, so ApEn is often used to detect the fault signal. The fault signal here refers to the presence of multiple abnormal signals in a set of sequential signals.
SampEn is an improved method based on ApEn [29]. The SampEn has better consistency. If one time series has a higher SampEn value than another time series, then the other r and m values also have higher SampEn values. Meanwhile, SampEn is not sensitive to missing data [29].
The calculation of sample entropy (SampEn) is as follows: where B m i denotes the ratio of the number of d[X i , X j ] < r to the total number of vectors N-m, for a given threshold r (r > 0).
f SampEn denotes the sample entropy value of the sequence X i . In the definitions of ApEn and SampEn, the similarity of vectors is determined by the difference in absolute values of the data. Correct analysis results cannot be obtained when there are slight fluctuations in the data used or baseline drift. FuzzyEn removes the influence of baseline drift through mean operation, and the similarity of vectors is no longer determined by the absolute amplitude difference, but determined by the shape of the fuzzy function determined by the exponential function, thereby fuzzifying the similarity measure [26]. The FuzzyEn uses an exponential function to fuzzify the similarity measurement formula. The continuity of the exponential function makes the fuzzy entropy change continuously and smoothly with the parameter change.
The calculation of fuzzy entropy (FuzzyEn) is as follows: where m denotes the embedding dimension. Y denotes the sequence after the phase space reconstruction of X.
x 0 is the mean of m consecutive x(i + j). d m i,j denotes the maximum value of the difference between the corresponding endpoints of Y i and Y j . D m i,j is the similarity between Y i and Y j after using the fuzzy membership function. ψ m is a function defined like Φ m i and B m i . f FuzzyEn denotes the fuzzy entropy value of sequence X i . Permutation entropy (PeEn) is a method to detect the randomness and dynamic mutation behavior of time series. The PeEn has the characteristics of simple and fast calculation, strong anti-noise ability, and can realize the characteristics of online monitoring of mutation signals. PeEn introduces the idea of permutation when calculating the complexity between reconstructed subsequences.
S is a set of symbol sequences consisting of the index of each element position column after each reconstructed component is rearranged in ascending order. j m is the column index of the position of the mth element in the vector. P i is the probability of occurrence of each sort. PE denotes the permutation entropy value of the sequence. f PeEn denotes the normalized value of the permutation entropy. The matrix has k reconstruction components in total, and each reconstruction component has m-dimensional embedded elements. Arrange the jth category in the matrix in ascending order according to the size of the array using Equation (22). j 1 , j 2 , · · · , j m represents the subscript index value of each element in the reconstructed component. Note that the above sequence has a parameter τ, namely the time delay factor, which must be a positive integer. In fact, this parameter can be understood as the downsampling of the sequence. For example, when τ = 3, it is sampling every three data points. When τ = 1, the sequence is the same as the sequence definition of the ApEn and SampEn.

Attention-Fusion Entropy Method
The attention mechanism can pay attention to important parts of the sequence data [2,30]. Queries and key-value pairs are mapped to outputs. The calculation process of the attention mechanism is shown in Figure 1. Equation (27) shows the score function, and Equation (28) shows the attention calculation process. The score function is essentially seeking a degree of similarity, and the Softmax function is to normalize the weights at all positions so that the sum is equal to one [31].
where Q denotes the queries, and Q = W q i X t , where W q i is the weight corresponding to Q. K denotes the keys C denotes the result of the weighted summation of weights and variables.
1 √ d denotes the scaling factor. The role of the scaling factor is to keep the dot product of Q and K from becoming too large [31]. Once the dot product is too large, the activation function Softmax enters a region with a small gradient. The attention mechanism is used for the calculation to fuse multiple EWMs, and the fused entropy method is obtained, which is named as En-Attn. Figure 2 shows that the input of the En-Attn model is historical sensor data, including rainfall, shallow moisture content, deep moisture content, displacement, and soil stress. The three types of data are calculated by three EWMs for comprehensive evaluation scores. The difference between these three entropy weight methods is that the entropy is different, namely InEn, FuzzyEn, and PeEn. The reason why ApEn and SampEn are not used in the En-Attn model is that FuzzyEn is an improvement on SampEn and ApEn. Meanwhile, in the actual dataset, the difference between these three methods is not obvious. For the same datasets, the result of getting almost the same output needs to be computed three times, which consumes computation time and occupies the memory of the computation space. Therefore, FuzzyEn is chosen instead of the three EWMs to reduce the time and space complexity of the En-Attn method. The demonstration of the details of these three EWMs for landslide sensor data processing is presented in Section 4.1. The attention mechanism is used to fuse the outputs of the three EWMs (InEn, FuzzyEn, and PeEn) and finally outputs landslide hazard degree (LHD). Algorithm 1 elaborates the specific calculation steps.

Prediction Model of Landslide Warning Signal
The prediction model of the hazard degree of rainfall-induced landslides is based on temporal convolutional neural networks (TCNs). TCNs have a good predictive effect on the processing of time series data [32,33]. We add an attention module to the data before TCN input to extract the prediction features of the input data; we also add an attention module to the output data of TCN to extract the features of the output data to improve the performance of TCN.
The TCN incorporating the attention mechanism is shown in Figure 3, including the attention mechanism (I-Attn) in the input stage, the attention mechanism (T-Attn) after the TCN output, and the TCN that plays the main prediction role. The input of I-Attn is sensor data at time t and the hidden layer at time t − 1, and the output is the attention weight at time t. The input of T-Attn is the hidden layer at time t, and the output is the size of the attention weight at time t and the weight value of the TCN's output, which is the final predicted output value. TCN is composed of multiple residual blocks [32]. The output of the previous residual block is the input of the next residual block. The 1D convolution in TCN enables equal lengths of the input and output sequences [34]. Causal convolution ensures that the prediction process does not suffer from data leakage. TCN enlarges the convolutional field size, which can be obtained from Equation (29). The calculation of the number of residual blocks is obtained from Equation (30).
where k denotes the size of the convolutional kernel. B denotes the size of the dilated base. N denotes the number of residual blocks. L denotes the length of the input tensor. In the actual landslide experiment, the sensor data are transmitted back to the host computer as a continuous string of arrays. The dynamic sliding prediction of the ATCN model is implemented using a sliding window as a way to process the dynamic data, as shown in Figure 4. The input of the sliding window is the five-dimensional sensor data of T i length, and the output is the landslide hazard degree (LHD) of T o length. The sliding window moves forward with the time step while the predicted value is output. Algorithm 2 illustrates the specific steps of the landslide warning signals prediction model (ATCN). The performance of the ATCN is experimentally verified in Section 4.2.

Landslide Simulation Platform
The landslide simulation platform (LSP) is built to simulate the occurrence of rainfallinduced landslides. The landslide simulation platform (LSP) simulates a small monitoring area in a mountain rather than a large area such as a natural landslide itself. This is because simulating a mountain in nature is actually very challenging, and all we can do is simulate a certain monitoring area. In nature, multiple monitoring zones work together on a large mountain. The analysis of a monitoring zone is a prerequisite for data analysis and early warning of a large mountain. Figure 5 shows the physical objects of the LSP. The structure of the LSP includes the simulated rainfall system and the sensor measurement system. The experimental process includes five steps: Step 1: Place the rock and soil mass inside the soil box.
Step 2: Install five types of sensors at the appropriate positions.
Step 3: Use the hydraulic support rod to adjust the soil box to a suitab we chose 30°.
Step 4: Turn on the rain sprinklers for rainfall simulation and use t software to monitor the sensor data and save it to the database.
Step 5: Analyze and process the sensor data after the experiment is com In the landslide simulation experiment platform, we installed five type tipping bucket rain gauge, a draw-wire displacement sensor, a soil stress g moisture content sensors. The installation positions of the sensors are show The locations of the sensors installed in the experiment are as follows: 1. The tipping bucket rain gauge is located in the center of the soil-carry its opening facing upwards for better rain reception. 2. The position of the draw-wire displacement sensor is in the front thi carrying box. It monitors the change in soil displacement as the leadi landslide moves. 3. The soil stress gauge is positioned in the front third of the soil-carryin tor the stress changes within the soil at the leading edge of the landslid 4. The location of the soil moisture sensor for monitoring the shallow mo is about 30 cm from the surface, and the location of the soil moisture se itoring the deep moisture content is about 80 cm from the surface.
Note that the above sensor installation locations are limited by the LS used as a reference criterion for experiments.  The simulated rainfall system consists of the following components: rainfall sprinklers, soil-carrying box, hydraulic support rods, and lift bars. The rain sprinklers simulate the natural rainfall environment, and controlling the amount of rainfall can simulate the rainstorm. The soil-carrying box contains rock and soil mass to simulate natural slope conditions. The hydraulic support rods and the lifting bars can adjust the angle of the soil-carrying box to simulate the angle of the potential landslide body in nature. Water will seep out of the tube wall as it passes through the porous ceramic tube, simulating underground water in the rock and soil mass.
The experimental process includes five steps: Step 1: Place the rock and soil mass inside the soil box.
Step 2: Install five types of sensors at the appropriate positions.
Step 3: Use the hydraulic support rod to adjust the soil box to a suitable angle. Here, we chose 30 • .
Step 4: Turn on the rain sprinklers for rainfall simulation and use the monitoring software to monitor the sensor data and save it to the database.
Step 5: Analyze and process the sensor data after the experiment is completed. In the landslide simulation experiment platform, we installed five types of sensors: a tipping bucket rain gauge, a draw-wire displacement sensor, a soil stress gauge, and two moisture content sensors. The installation positions of the sensors are shown in Figure 6.
The locations of the sensors installed in the experiment are as follows: 1.
The tipping bucket rain gauge is located in the center of the soil-carrying box, with its opening facing upwards for better rain reception.

2.
The position of the draw-wire displacement sensor is in the front third of the soilcarrying box. It monitors the change in soil displacement as the leading edge of the landslide moves.

3.
The soil stress gauge is positioned in the front third of the soil-carrying box to monitor the stress changes within the soil at the leading edge of the landslide. 4.
The location of the soil moisture sensor for monitoring the shallow moisture content is about 30 cm from the surface, and the location of the soil moisture sensor for monitoring the deep moisture content is about 80 cm from the surface.
Note that the above sensor installation locations are limited by the LSP and are only used as a reference criterion for experiments.

Landslide Data Processing
We carry out two experiments on rainfall-induced landslides and obtain datasets for L 1 and L 2 . The rainfall, soil stress, and displacement in the datasets are normalized to obtain the sensor data curves in Figure 7. The ordinate on the left of Figure 7 is moisture content, and the ordinate on the right is the percentage of data. After a period of time, the moisture content of the soil in the shallow layer begins to rise, and the moisture content of the soil in the deep layer rises in response. The reason why the relationship between the two moisture contents in Figure 7b is not significant is that before rainfall, the deep soil moisture content is high and close to saturation.
The Pearson correlation coefficient method is used to analyze the landslide sensor datasets to analyze the correlation between different types of sensor data.
The Pearson correlation coefficient is suitable for two columns of spaced variables (continuous variables) in a normal distribution. The correlation coefficient and the probability of the correlation can be obtained for two columns of data using Equation (31) when they have the same number of data and correspond to each other.
where r p denotes Pearson correlation coefficient. X represents senor data. Y represents sensor data other than X. σ X denotes the standard deviation of X. σ Y denotes the standard deviation of Y. The Pearson correlation coefficient ranges between −1 and 1. When the Pearson correlation coefficient is 0, the X and Y vectors are not correlated. When its value is greater than 0.8, X and Y are highly correlated.
We let X and Y be one of the five types of sensor data, respectively, and the heatmaps are obtained in Figure 8 after the calculation of Equation (31). In Figure 8a, the rainfall and displacement show a high correlation with the magnitude of soil stress and a moderate correlation with the shallow moisture content and the deep moisture content. The shallow moisture content and the deep moisture content are highly correlated states. The shallow moisture content shows a weak correlation with the displacement amount. Soil stress shows a strong correlation with displacement. In Figure 8b, rainfall displays a strong correlation with displacement, soil stress, and deep moisture content and a moderate correlation with shallow moisture content. The correlation between shallow moisture content and other sensor data is weak. The relationship between the landslide process and different sensor data is analyzed as follows: 1.
The amount of rainfall directly affects the moisture content of the shallow soil. Surface water will exist when the surface seepage rate is less than the rainfall.

2.
The moisture content of deep soil is significantly higher than that of shallow soil due to groundwater action during the initial stage of rainfall. The moisture content in the deeper layers of the soil would gradually increase as surface water gradually infiltrates into the ground as rainfall continues. However, its moisture content does not exceed the shallow moisture content at this stage. The growth rate of the shallow moisture content would gradually decrease, and the size of the deep moisture content would eventually be approximately equal to the shallow moisture content throughout the entire landslide formation process.

3.
The soil stress also varies as the soil layer's moisture content varies. The shear strength of the soil is characterized by soil stress. The soil stress increases quickly for a while when there is no significant displacement of the surface, after which the surface gradually becomes significantly displaced during the sliding phase. As the soil's moisture content rises, the clay in the soil softens and loses some of its slip resistance. It also loses shear strength.

4.
The soil moisture content tends to become saturated before the landslide body enters the catastrophic slip phase. When the soil stress increases, the landslide body enters the severe sliding stage. When a landslide reaches the severe slip stage, the surface displacement dramatically rises, and erosion-created depressions and gullies start to show up near the body's front edge.

5.
After entering the stabilization stage, the surface displacement of the landslide body no longer increases, but due to the effects of rainfall and groundwater, the surface and underground runoff still play a role in triggering the secondary landslide.

Experiments and Results
In this section, we describe experiments on landslide warning signals and signal prediction. We present the results of two experiments to demonstrate the effectiveness of En-Attn as well as ATCN in landslide warning signal capture and prediction.

Landslide Hazard Degree and Results
We apply the En-Attn model to process the landslide datasets L 1 and L 2 . Figure 9 illustrates the landslide hazard degree (LHD) obtained by En-Attn as well as the three EWMs. The LHD obtained by all six methods shows an increasing trend, indicating a gradual increase in the characteristics of the hazard level during landslide formation. The LHD ranges from 0 to 1. LHD = 0 means no warning feature, and LHD = 1 means the landslide warning feature is significant and enters a very urgent warning situation. For dataset L 1 , the LHD increases gradually, and when the time step is greater than 14,000, the LHD increment rate increases. For dataset L 2 , the incremental rate of LHD increases when the time step is greater than 10,000, while the volatility of LHD is greater compared to L 1 . Note that the differences in the LHD obtained by ApEn, SampEn, and FuzzyEn are not significant, and the differences exhibited by the local enlarged image are shown in Figure 8a,b. The reason that only FuzzyEn is considered in the En-Attn model and not both ApEn and SampEn is because the differences between the three methods are not significant.
The single entropy value method is prone to fluctuations in the calculation of LHD, as in the case of PeEn in Figure 8b. The LHD obtained by the En-Attn model not only demonstrates landslide warning characteristics but also exhibits better stability and robustness. The En-Attn model overcomes the drawbacks of the single EWM and adapts better to the case of multi-sensor data to evaluate landslide warning features.

Prediction Experiments and Results
We apply the ATCN model to process the landslide datasets L 1 and L 2 and their LHD. The ATCN model is elaborated in Section 2.2. We conducted experiments to test the performance of the ATCN model, comparing long short-term memory neural networks (LSTM) [35], grated recurrent units (GRU) [36], temporal neural networks (TCN) [32,34], convolutional long short-term memory neural networks (ConvLSTM) [37], and dual-stage attention-based recurrent neural networks (DA-RNN) [30]. The metrics [2] for evaluating the performance are root mean square error (RMSE), mean absolute error (MAE) and mean absolute percent error (MAPE), and the specific equations are shown in Equations (32)- (34).
where N is the total number of test data. y t is the true value at the tth time step. y t is the predicted value at the tth time step. The model tests are divided into two types of sliding windows, "100-10" and "100-50", which reflect different input data lengths and prediction lengths. The hyperparameters of the TCN and ATCN models are set as follows: filters = 32, batch size = 128, kernel size = 8, where the activation function of the attention mechanism is Softmax. The hyperparameters of the LSTM and GRU models are set as follows: the number of units is 16. The activation function is ReLU, the optimization algorithm is Adam, the initial learning rate is 0.001, and the learning rate can be adjusted according to the loss function subsequently. The hyperparameter experiments of ATCN are shown in Appendix A. All models are run 20 times, and the predicted values are obtained after testing the datasets L 1 and L 2 . The average values of RMSE, MAE, and MAPE are shown in Tables 1 and 2. Tables 1 and 2 demonstrate the RMSE, MAE, and MAPE of ATCN and its counterparts. Table 1 shows that the RMSE, MAE, and MAPE metrics of ATCN are lower for dataset L 1 , which implies better performance of ATCN.
The ATCN outperforms other models in the prediction of LHD. Compared with the TCN model, the RMSE, MAE, and MAPE of ATCN decreased by 55.60%, 52.13%, and 51.17%, respectively, with the sliding window set to "100-10". The ATCN can effectively capture the characteristics of landslide prediction. The ATCN also outperforms other models when the sliding window is "100-50". In comparison to the TCN model, the performance of the three metrics is decreased by 43.30%, 35.63%, and 34.24%, respectively. The poor performance is due to the absence of attention mechanisms in the LSTM, GRU, and ConvLSTM, as well as the insignificant features obtained from the complex landslide sensor signals. Figure 2 displays the metrics for dataset L 2 , which is similar to dataset L 1 . The classical recurrent neural network models, LSTM and GRU, performed poorly because the predictive properties shown by the sensor data in dataset L 2 are not obvious. The performance of DA-RNN and ATCN with the addition of the attention mechanism is outstanding. The three metrics of ATCN are decreased by 33.74%, 30.15%, and 29.06%, respectively, in comparison to DA-RNN when the sliding window is set to "100-10". The three metrics of ATCN are decreased by 35.97%, 35.44%, and 35.10%, respectively, compared to DA-RNN when the sliding window is set to "100-50".
Comparing the model performance with different prediction lengths, it can be seen that the shorter the prediction length, the smaller the performance metrics, and the better the prediction effect. When the prediction length is long, the attention mechanism captures the long-term dependency characteristics more and more prominently, and the performance of DA-RNN and ATCN with the attention mechanism is better than the other models. Comparing the DA-RNN and ATCN models, ATCN has better prediction results and stable performance when the sliding windows are "100-10" and "100-50". The ATCN model has the lowest error and the best prediction, as seen in Tables 1 and 2. The two sliding windows can be compared to demonstrate that the model's error increases with prediction length. ATCN's prediction accuracy is greater.

Discussion and Conclusions
This work adopts the attention mechanism to integrate the multi-entropy values to capture the landslide warning signals and explores the ATCN to realize landslide hazard prediction. Compared with its counterparts, our model has the characteristics of higher accuracy. Compared with current landslide hazard prediction methods, our methods have the following characteristics: 1.
Exploring deep learning algorithms combined with big landslide data is an extension of deep learning application scenarios. This model uses a simple attention mechanism combined with a temporal convolutional neural network. Although this model is simple, its prediction effect is better than other complex deep learning models.

2.
Effective landslide hazard capture. In the traditional sense, the capture of rainfallinduced landslide hazards is either directly replaced by the landslide displacement or only a single EWM is used to realize the signals capture. The model uses the attention mechanism to integrate a variety of EWMs, and the obtained landslide warning signals are more reliable.

3.
Note that our model cannot be adapted for landslide hazard prediction with a small amount of data, as massive data is the basis of our model.
In the future, we intend to design a software system that integrates the algorithms for actual landslide sites. Further, we intend to consider different types of sensor data because more kinds of sensor data represent more comprehensive landslide disaster information. Furthermore, we plan to consider the sensor data of the landslide simulation platform in relation to soil thickness. We use landslide simulation experiments in this study. However, we could not achieve the exact same processes in the laboratory as in nature. For example, simulating different soil layers, which would take millions of years to form in nature. Our future research work will take into account multiple natural environmental factors to improve the experimental setup, including slope angle and dynamics of water extinction.   Table A1 shows the metrics of ATCN for different batch sizes tested with kernel size = 16, filters = 8. The results in Table A1 show that the RMSE, MAE, and MAPE metrics of the model for both sliding window cases are the smallest for batch size = 128. Table A2 provides the metrics of ATCN with different filters tested for batch size = 128 and kernel size = 16. The sliding window "100-50" model exhibits the smallest RMSE, MAE, and MAPE metrics when filter = 8, according to Table A2. Table A3 demonstrates the metrics of ATCN for different kernel sizes with batch size = 128 and filters = 8. The results in Table A3 demonstrate that for the sliding window "100-10" with kernel size = 16, the RMSE, MAE, and MAPE metrics are minimum. The smallest MAE and MAPE metrics are for the sliding window "100-50" with kernel size = 16. The optimal combination of hyperparameters for the ATCN model is batch size = 128, kernel size = 16, and filters = 8. Note that our model code runs on Windows 10, NVIDIA GeForce GTX 1650 GPU, and the deep learning framework is TensorFlow 2.6.0.