## 1. Introduction

As one of the most common natural hazards in the world, landslides pose a significant threat to public health and safety. According to statistics, landslides affected 4.8 million people and caused 18,275 deaths during the period of 2009–2019 [1]. Landslide displacement prediction, which provides the necessary information to determine the extent of an ongoing hazard, has proven to be the most cost-effective risk reduction measure [2,3,4]. However, landslide displacement prediction is complex and remains a key challenge in natural hazard research. This challenge arises because landslides are nonlinear, dynamic systems, and the associated movements can be induced by different causes, such as geological factors [5], hydrological factors [6,7], morphological factors, and human activities [4,8].

A large number of efforts in the literature have focused on the precise prediction of landslide displacement [9]. Currently, the approaches used for landslide displacement prediction are categorized as physical modelling approaches and data-driven approaches [10]. Physical models (also known as white-box models), which rely on detailed descriptions of landslide mechanism processes, can provide clear physical explanations of landslides. The commonly used physical models include the tertiary creep model [11], the Hayashi model [12], and the general creep model [13]. These physical models require numerous expensive geotechnical characterizations of the materials involved in landslides and therefore may be applicable only in limited cases [14].

Data-driven models differ from physical models because a characterization of the actual landslide mechanism processes is not fully required. Thus, the data-driven models are also known as black-box models. The main advantage of data-driven models is that the trained models can be easily updated on the basis of new and more recent data.

Data-driven models include but are not limited to statistical methods, artificial neural networks (ANNs), support vector machines (SVMs) [15], and extreme learning machines (ELMs) [16]. Owing to their capacity to approximate arbitrary, nonlinear, and dynamic systems with high precision, data-driven models achieve good model performance in the prediction of landslide displacement.

Despite their widespread application, the output of most existing data-driven models is a single estimate for each prediction horizon. These single estimates, which provide deterministic values, are referred to as point predictions [3]. The defining advantage of a point prediction is that it is easy to understand and operate on. Its main drawback is that it provides only the prediction error, with no information regarding the associated predictive uncertainties, which limits its use in decision-making applications.

The predictive uncertainties, consisting primarily of input uncertainty, parameter uncertainty, and model uncertainty, could be substantial. It is highly desirable to know the degree of uncertainty associated with a particular point prediction and to convert the point prediction into an informative resource for emergency landslide risk management [3,17]. Only limited studies have examined the quantification of uncertainty associated with landslide displacement prediction by constructing prediction intervals (PIs). The output of a PI is an interval composed of upper and lower bounds within which the predictive value of the series is expected to fall with some (prespecified) probability, which is deemed the PI nominal confidence (PINC). A hybrid approach based on an echo state network and mean-variance estimation was proposed by Yao et al. [18] to measure the uncertainty in landslide deformation prediction and perform interval prediction. A bootstrap-based approach was proposed by Ma et al. [4] to perform interval prediction of landslide displacement. Wang et al. [2] proposed a direct interval prediction method using least squares support vector machines for the construction of PIs of landslide displacement. Kernel-based support vector machine quantile regression (KSVMQR) was utilized in [3] for quantification of the predictive uncertainty of landslide displacement.

However, the traditional methods have certain disadvantages in displacement prediction and the quantification of predictive uncertainty. For example, the bootstrap-based approach requires high computational costs, especially for large datasets [2]. Additionally, the performance of SVM-based approaches is sensitive to the choice of kernel type and parameter values [19]. Therefore, more effort is still needed to improve prediction performance and the quantification of predictive uncertainty.

Ensemble prediction, a state-of-the-art artificial intelligence technique, aims to improve prediction robustness, accuracy, and uncertainty quantification [20,21]. Ensemble prediction has been successfully applied in a variety of fields, including prediction performance improvement and uncertainty quantification of remaining useful life [22], bankruptcy [23], shear capacity of reinforced-concrete deep beams [24], residential electricity consumption [25], wind power [26], flood susceptibility [27,28], and landslide susceptibility [29].

In this study, a probability-scheme combination ensemble prediction employing quantile regression neural networks and kernel density estimation (QRNNs-KDE) was proposed for robust and accurate prediction and uncertainty quantification of landslide displacement. The Fanjiaping landslide, with long-term and near real-time monitoring data, was selected as a case study to explore the performance of the QRNNs-KDE approach. The deformation characteristics were clarified to fully understand the triggering factors.

## 2. Methodology

#### 2.1. Description of Uncertainty Sources

Predictive uncertainty in data-driven models consists primarily of input uncertainty, parameter uncertainty, and model uncertainty [30,31,32].

The input uncertainty is related to the input data uncertainty and the input variable selection uncertainty. The input data uncertainty is primarily due to measurement and sampling errors and environmental noise. The input variable selection uncertainty accounts for uncertainty inherent in the selection of input variables from the candidate data set. For physical models, the required inputs are pre-determined, consistent with the considered rheological models. However, for data-driven models, the selection of input variables is problem-dependent and cannot be determined in advance. Only major and relevant variables are selected as final inputs to train the data-driven model. The selection of the variables to include in a data-driven model from the original data set is inherently uncertain, especially when the input candidate pool is very large. For example, in data-driven models that utilize decomposition algorithms, only a portion of the decomposed sub-components are selected as input variables. The candidate input pool, which consists of sub-components, grows very quickly with the decomposition level and potentially increases the input variable selection uncertainty.

The parameter uncertainty refers to the uncertainty in the model parameter vector and mainly arises from the inability to identify a unique set of best parameters for the model [33].

Model uncertainty arises primarily from the model structure uncertainty and model error. Model structure uncertainty is associated with the specific model settings of learning algorithms, such as the polynomial order in polynomial regression models, the number of hidden nodes in an ANN or ELM, and the type of kernel function in an SVM. The input uncertainty may also contribute to model structure uncertainty, because different input variables “automatically” produce different model structures. Model error refers to the difference between the model estimates and the corresponding targets and is caused by the inability to reproduce the real processes.

#### 2.2. Ensemble Prediction

Ensemble prediction is not a specific learning algorithm but a strategic combination of multiple predictions into a single output with a model combination process [21]. Based on the selection of the learning algorithm, ensemble prediction models can be further classified into homogeneous and heterogeneous ensemble models (Figure 1). A homogeneous ensemble model generates multiple learners with the same learning algorithm on different training datasets, which are produced by manipulating the original training data (schematic illustrated in Figure 1a). Bootstrap aggregation, also known as bagging for short, is the most straightforward and widely used method of manipulating the training dataset. By contrast, a heterogeneous ensemble model generates multiple learners with different learning algorithms on the same training data set (schematic illustrated in Figure 1b).

The base learner combination is the main step in the ensemble prediction model. Summation and averaging are simple combination schemes. A more general approach involves assigning a weight to each base learner. In the present study, a heterogeneous ensemble model was built based on QRNNs and KDE. QRNNs serve as base learning algorithms to produce multiple base learners, and the probability combination scheme based on KDE is used to combine the base learners into the final ensemble prediction.

#### 2.3. Quantile Regression Neural Network

#### 2.3.1. Quantile Regression

Quantile regression is a common statistical technique for conducting inferences concerning conditional quantile functions [34,35]. More formally, any real-valued random variable Y may be characterized by its distribution function as follows:

$F(y)=P(Y\le y),$

whereas for any $0<\tau <1$,

$Q(\tau )=\mathrm{inf}\left\{y:F(y)\ge \tau \right\}$

is called the $\tau \mathrm{th}$ quantile of Y.

Given a data set $({x}_{i}(t),Y(t))$ for $i=1,2,\cdots ,I$ and $t=1,2,\cdots ,N$, the linear quantile regression can be expressed as follows:

$Y(t)={\sum}_{i=1}^{I}{\theta}_{i}(\tau ){x}_{i}(t)+b,$

where $0<\tau <1$ is the quantile and b is an error with zero expectation.

The estimated parameters ${\theta}_{i}$ can be approximated by minimizing a sum of asymmetrically weighted absolute residual cost functions, expressed as follows:

$\underset{\theta}{\mathrm{min}}{\sum}_{t=1}^{N}{\rho}_{\tau}\left(Y(t)-{\sum}_{i=1}^{I}{\theta}_{i}(\tau ){x}_{i}(t)\right),$

where $Y(t)$ is the observation at time t and ${\rho}_{\tau}$ is the check function, also known as the pinball loss function, which is defined as follows:

${\rho}_{\tau}(u)=\left\{\begin{array}{ll}\tau u, & u\ge 0\\ (\tau -1)u, & u<0\end{array}\right.$
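As a concrete illustration, the check (pinball) loss can be sketched in Python (a minimal sketch for illustration, not the authors' implementation; the function name is ours):

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean check (pinball) loss: rho_tau(u) = tau*u for u >= 0,
    and (tau - 1)*u for u < 0, averaged over all observations."""
    u = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.where(u >= 0, tau * u, (tau - 1) * u)))
```

Minimizing this loss over a grid of quantile levels τ yields a family of conditional quantile estimates.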

#### 2.3.2. Quantile Regression Neural Network

Given inputs ${x}_{i}(t)$ and an output $Y(t)$, the output from a QRNN is calculated as follows. Consider a hidden-layer transfer function $h(\cdot )$; the output from the j-th hidden-layer node ${g}_{j}(t)$ is given by applying the hidden-layer transfer function to the inner product between ${x}_{i}(t)$ and the hidden-layer weights ${w}_{ij}^{(h)}$ plus the hidden-layer bias ${b}_{j}^{(h)}$, which can be calculated as follows:

${g}_{j}(t)=h\left({\sum}_{i=1}^{I}{w}_{ij}^{(h)}{x}_{i}(t)+{b}_{j}^{(h)}\right).$

An estimate of the conditional $\tau $-quantile ${\widehat{y}}_{\tau}(t)$ is

${\widehat{y}}_{\tau}(t)=f\left({\sum}_{j}{w}_{j}^{(o)}{g}_{j}(t)+{b}^{(o)}\right),$

where ${w}_{j}^{(o)}$ are the output-layer weights, ${b}^{(o)}$ is the output-layer bias, and $f(\cdot )$ is the output-layer transfer function. The transfer functions $h(\cdot )$ and $f(\cdot )$ are usually set as the hyperbolic tangent sigmoid and linear functions, respectively [36].

As an additional measure to prevent overfitting, weight decay regularization of the magnitude of the input-hidden layer weights can be applied by setting a nonzero penalty value.
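The forward pass described above can be sketched as follows (a minimal single-hidden-layer illustration with tanh hidden transfer and linear output transfer; all names are ours):

```python
import numpy as np

def qrnn_forward(x, W_h, b_h, w_o, b_o):
    """Single forward pass of a one-hidden-layer QRNN.

    x: input vector x_i(t); W_h, b_h: hidden-layer weights and biases;
    w_o, b_o: output-layer weights and bias. Returns the conditional
    tau-quantile estimate y_hat_tau(t)."""
    g = np.tanh(W_h @ x + b_h)   # hidden-layer node outputs g_j(t)
    return float(w_o @ g + b_o)  # linear output transfer f(.)
```

In training, the weights and biases would be fitted by minimizing the pinball loss at the chosen quantile level τ, optionally with a weight-decay penalty on `W_h`.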

#### 2.4. Kernel Density Estimation (KDE)

Nonparametric density estimation is the process of estimating the density of a random variable without assuming that the density belongs to a particular parametric family [37,38]. Various methods have been proposed for nonparametric density estimation, e.g., the k-nearest neighbors method, Parzen windows, the histogram, and KDE [38]. In the domain of nonparametric density estimation, the k-nearest neighbors method has a very limited scope of practical applications due to its poor performance. The Parzen windows method performs slightly better but produces discontinuities (stair-like curves) that are quite annoying in practice [38]. A histogram is a simple form of nonparametric density estimation. However, it suffers from serious and noticeable drawbacks. First, the resulting visualization strongly depends on the choice of binning. Second, the histogram is inherently discontinuous, which causes extreme difficulty if derivatives of the estimates are required.

Fortunately, the abovementioned drawbacks can be easily eliminated by using KDE [38,39]. In fact, KDE has been extensively studied and has become the most popular method in nonparametric density estimation. Given a random sample ${Y}_{1},{Y}_{2},\cdots ,{Y}_{m}$, the value of the density at the point $y$ estimated by the KDE method is given by the following:

$\widehat{f}(y)=\frac{1}{mh}{\sum}_{i=1}^{m}K\left(\frac{y-{Y}_{i}}{h}\right),$

where $h$ is the bandwidth with a positive real value and $K(\cdot )$ is the kernel function. In this study, the most effective Epanechnikov kernel [38] was adopted and expressed as

$K(y)=\frac{3}{4}\left(1-{y}^{2}\right)\mathbb{R}\left(|y|\le 1\right),$

where $\mathbb{R}(\cdot )$ is the indicator function, that is, $\mathbb{R}(y\in A)=1$ for $y\in A$ and $\mathbb{R}(y\in A)=0$ for $y\notin A$.

The selection of the bandwidth parameter is a crucial issue in KDE. The bandwidth parameter influences the smoothness of the KDE curve and also determines the tradeoff between the bias and variance. In general, the smaller the bandwidth, the smaller the bias and the larger the variance. A number of methods have been proposed to find the optimal bandwidth, such as Silverman’s rule of thumb and the Sheather-Jones method. Silverman’s rule of thumb bandwidth with a Gaussian kernel and an Epanechnikov kernel can be computed as follows:

${h}_{\mathrm{Gauss}}\approx 1.06\widehat{\sigma}{m}^{-1/5},\qquad {h}_{\mathrm{Epan}}\approx 2.34\widehat{\sigma}{m}^{-1/5},$

where $\widehat{\sigma}$ is the estimation of $\sigma $ (standard deviation of the input data) [38].

#### 2.5. Ensemble Prediction Employing QRNNs and KDE

The proposed ensemble prediction employing QRNNs and KDE is shown in Figure 2. The QRNNs-KDE approach consists of four stages: (1) data splitting and normalization, (2) QRNN modelling, (3) probability density function (PDF) estimation by KDE, and (4) final ensemble prediction.

Data splitting and normalization: The original landslide monitoring dataset is divided into training data and testing data. The training data are used for model construction, and the testing data are used to evaluate the performance of the constructed model. To eliminate the influence of differing data scales, the training data and testing data are first normalized to the range of 0 to 1.

QRNNs modelling: QRNNs serve as base learning algorithms to generate multiple base learners ${Y}_{1}(t),{Y}_{2}(t),\cdots ,{Y}_{m}(t)$ by applying a finite number of conditional quantiles ${\tau}_{1}\le {\tau}_{2}\le \cdots \le {\tau}_{m}$ within the domain $0<\tau <1$, e.g., $\tau $ = 0.01, 0.02, …, 0.98, 0.99. The base learners of landslide displacement are obtained after renormalizing the outputs from the QRNNs. To avoid overfitting in QRNN modelling, a penalty parameter with a nonzero value is applied.

PDF estimation by KDE: Multiple base learners from the QRNNs base model are treated as the input for KDE to estimate the probability density function (PDF) of the base learners. The kernel function and bandwidth influence the shape of the KDE curve. An appropriate kernel function and an optimal bandwidth should be chosen to best match the features of the original dataset.

Final ensemble prediction: In the present study, the final ensemble prediction was obtained through a probability combination scheme as follows:

$\widehat{Y}(t)=\frac{{\sum}_{i=1}^{m}{p}_{i}(t){Y}_{i}(t)}{{\sum}_{i=1}^{m}{p}_{i}(t)},$

where ${p}_{i}(t)$ is the probability value of the i-th base learner ${Y}_{i}(t)$ and is obtained from the KDE for monitoring period $t$.

#### 2.6. Evaluation Metrics and Uncertainty Quantification

In this study, five indices—the coefficient of determination (R^{2}), MSE, RMSE, NRMSE, and MAPE—were applied to assess the performance of point prediction. R^{2}, MSE, RMSE, NRMSE, and MAPE are defined as

${R}^{2}={\left[\frac{{\sum}_{t=1}^{N}({u}_{t}-\overline{u})({\widehat{u}}_{t}-\overline{\widehat{u}})}{\sqrt{{\sum}_{t=1}^{N}{({u}_{t}-\overline{u})}^{2}}\sqrt{{\sum}_{t=1}^{N}{({\widehat{u}}_{t}-\overline{\widehat{u}})}^{2}}}\right]}^{2},$

$\mathrm{MSE}=\frac{1}{N}{\sum}_{t=1}^{N}{({u}_{t}-{\widehat{u}}_{t})}^{2},\qquad \mathrm{RMSE}=\sqrt{\mathrm{MSE}},$

$\mathrm{NRMSE}=\frac{\mathrm{RMSE}}{{u}_{\mathrm{max}}-{u}_{\mathrm{min}}},\qquad \mathrm{MAPE}=\frac{100\%}{N}{\sum}_{t=1}^{N}\left|\frac{{u}_{t}-{\widehat{u}}_{t}}{{u}_{t}}\right|,$

where ${\widehat{u}}_{t}$ and ${u}_{t}$ denote the t-th predictive value and observation, respectively, and $\overline{u}$ and $\overline{\widehat{u}}$ denote the mean of the observations and the mean of the predictive values, respectively.

In the present study, the associated predictive uncertainties were quantified with PIs. After the above procedures, full PDFs of the future landslide displacement were obtained. An interval prediction with a $(1-\alpha )\times 100\%$ confidence interval can be obtained from the $\alpha /2$ and $1-\alpha /2$ quantiles of the obtained PDF. The $\alpha $ level, also called the significance level, ranges from 0 to 1 and is the probability of not capturing the value of the parameter. The predictive values of the $\alpha /2$ and $1-\alpha /2$ quantiles are set as the lower bound (${L}_{t}^{1-\alpha}$) and upper bound (${U}_{t}^{1-\alpha}$), respectively. For example, a 90% central PI can be obtained from the 0.05 and 0.95 quantiles of the PDF; the upper and lower bounds at the 90% confidence level correspond to the predictive values of the 0.95 and 0.05 quantiles, respectively.
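Extracting a central PI from a predictive sample can be sketched as follows (a minimal illustration using empirical quantiles; names are ours):

```python
import numpy as np

def central_pi(samples, alpha=0.10):
    """Central (1 - alpha) prediction interval from predictive samples:
    bounds at the alpha/2 (lower) and 1 - alpha/2 (upper) quantiles."""
    s = np.asarray(samples, dtype=float)
    return np.quantile(s, alpha / 2.0), np.quantile(s, 1.0 - alpha / 2.0)
```

With `alpha=0.10`, this returns the 0.05 and 0.95 quantiles, i.e., a 90% central PI.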

The prediction interval coverage probability (PICP), normalized mean PI width (NMPIW), and coverage width-based criterion (CWC) are three indices for evaluating the correctness of the approximated PIs. The PICP reflects the degree of reliability of the PIs and is defined as

$\mathrm{PICP}=\frac{1}{N}{\sum}_{t=1}^{N}{I}_{t}^{1-\alpha},$

where ${I}_{t}^{1-\alpha}$ is defined as follows:

${I}_{t}^{1-\alpha}=\left\{\begin{array}{ll}1, & {u}_{t}\in [{L}_{t}^{1-\alpha},{U}_{t}^{1-\alpha}]\\ 0, & \mathrm{otherwise}\end{array}\right.$

NMPIW measures the width of the PI; it is defined as

$\mathrm{NMPIW}=\frac{1}{N\varsigma}{\sum}_{t=1}^{N}({U}_{t}^{1-\alpha}-{L}_{t}^{1-\alpha}),$

where $\varsigma $ is the range of the underlying targets.

For high-quality PIs, narrow PIs (smaller NMPIW) with a high coverage probability (large PICP close to 100%) have great value [40,41]. Theoretically, NMPIW and PICP are conflicting objectives. Therefore, CWC, a balance criterion between PICP and NMPIW [42], is used to give a comprehensive assessment of PIs. CWC is defined as a function of PICP and NMPIW in which $\psi $ is a small positive value within the range of (0.1%, 0.5%), $\mu $ corresponds to the nominal confidence level associated with the PIs and is usually set to $1-\alpha $, and $\delta $ is a small positive value less than 1. $\gamma $ is set to 1 during the training process; for testing, it is defined by the following step function:

$\gamma =\left\{\begin{array}{ll}0, & \mathrm{PICP}\ge \mu \\ 1, & \mathrm{PICP}<\mu \end{array}\right.$
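The three PI-quality indices can be sketched as follows (PICP and NMPIW follow their standard definitions; the CWC shown is one common exponential-penalty form from the literature with illustrative constants, since the paper's exact expression is not reproduced here):

```python
import numpy as np

def picp(y, lower, upper):
    """PI coverage probability: fraction of observations inside their PIs."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def nmpiw(lower, upper, target_range):
    """Mean PI width normalized by the range of the underlying targets."""
    widths = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    return float(np.mean(widths) / target_range)

def cwc(picp_val, nmpiw_val, mu=0.90, eta=50.0):
    """Coverage width-based criterion: penalize width only when
    coverage falls below the nominal confidence level mu."""
    gamma = 0.0 if picp_val >= mu else 1.0  # step function for testing
    return float(nmpiw_val * (1.0 + gamma * np.exp(-eta * (picp_val - mu))))
```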

## 4. Results

PDFs: The PDFs of the predictive displacement at ZG289 and ZG291 constructed by the proposed QRNNs-KDE approach are shown in Figure 7 and Figure 8. Fast movement is the main concern in landslide displacement prediction; here, only the portion of the prediction describing the fast landslide movement is selected and shown. Figure 7 and Figure 8 show that rather than a single estimate, the range and complete PDF of the predictive displacement are provided by the proposed approach. All landslide displacement observations are distributed in the middle of the PDFs with high probability, except for the observations of May and June at ZG289, which appear at the tail of the probability density curve. The small fraction falling into the right tail grows with the prediction period, indicating that more uncertainty is associated with longer-term landslide predictions.

Final ensemble prediction: Figure 9 shows the final ensemble predictions. As shown in Figure 9, the ensemble predictions obtained via the probability combination scheme showed a high degree of consistency with the landslide displacement observations, with coefficient of determination values of 0.999932 and 0.999944. To further evaluate the prediction performance of ensemble prediction based on QRNNs-KDE, the evaluation metrics of the BP, RBF, ELM, and SVM approaches are shown in Table 2. As shown in Table 2, the final ensemble predictions using the QRNNs-KDE approach outperformed the benchmark methods, with the smallest MSE, RMSE, NRMSE, and MAPE and the largest R^{2}. Moreover, compared with predictions at monitoring point ZG289 using the Copula-KSVMQR approach in [3], the QRNNs-KDE approach provided more accurate predictions with smaller MAPE and RMSE.

Uncertainty quantification: Based on the PDFs shown in Figure 7 and Figure 8, PIs at a high confidence level (90%) were constructed for ZG289 and ZG291 (Figure 10a,c, respectively). To evaluate the prediction performance based on the QRNNs-KDE approach, 90% PIs were also constructed based on the bootstrap-ELM-ANN approach (Figure 10b,d). The corresponding evaluation metrics are shown in Table 3. As shown in Figure 10 and Table 3, the constructed PIs based on the QRNNs-KDE approach covered the observations with a high percentage, and the QRNNs-KDE approach outperformed the bootstrap-ELM-ANN approach with smaller NMPIW and CWC. For example, the performance indices NMPIW and CWC of the 90% PIs at ZG289 were 0.0215 and 0.1661, respectively, which were lower than those obtained using the bootstrap-ELM-ANN approach. The normalized mean PI width using the QRNNs-KDE approach was approximately 90% narrower than that of the bootstrap-ELM-ANN approach.

The experimental results show that the final ensemble predictions based on the QRNNs-KDE approach outperformed the traditional BP, RBF, ELM, SVM, and Copula-KSVMQR algorithms with regard to deterministic point prediction. The QRNNs-KDE approach was more informative than traditional algorithms because it provided the likely range of landslide displacement. The landslide observations were distributed in the middle of the prediction range with high probability. Moreover, regarding the aspect of uncertainty quantification, the QRNNs-KDE provided more satisfactory PIs than the bootstrap-ELM-ANN approach. Therefore, we believe that the final ensemble predictions based on the QRNNs-KDE approach have the advantages of accurate prediction and uncertainty quantification of landslide displacement.

## 5. Discussion

In this study, with regard to point prediction, the probability-scheme combination ensemble prediction employing QRNNs-KDE provided the best prediction. The fundamental reasons behind this can be explained from statistical, computational, and representational perspectives [47]. From a statistical perspective, the available training data set may not provide sufficient information for identifying the true model (h^{*} in Figure 11). Constructing an ensemble model (h^{'} in Figure 11) might not be better than the single best prediction model h^{*}, but it does reduce the risk of choosing a bad learner with poor generalizability (schematic in Figure 11a). From a computational perspective, the training algorithm of a single model might get stuck in local optima by only performing a local search. Constructing an ensemble model by searching from different starting positions might be a better alternative (schematic in Figure 11b). From a representational perspective, it is possible that the searched hypothesis space does not contain the true model h^{*}. Constructing an ensemble model might expand the representable space (schematic in Figure 11c).

In the proposed QRNNs-KDE approach, the probability combination scheme is employed to combine 99 base learners into one final ensemble to improve the model performance. However, a concern about computational time may be associated with this ensemble strategy, as the required computational time is highly related to the number of base learners. For the case of ZG291, the required computation time to train 99 base learners is 191.85 s in RStudio Version 1.2.5042 on an Intel(R) Xeon(R) E-2176M @ 2.70 GHz CPU with 64 GB RAM. Thus, we believe that the proposed approach is computationally efficient.

Nevertheless, the probability-scheme combination ensemble prediction, which employs QRNNs and KDE, also holds inherent limitations associated with data-driven models, such as the lack of an explicit input-output relationship and the requirement of large training datasets to maintain model performance.

In practical applications, the main motivation for constructing the predictive range and complete PDF is to quantify the predictive uncertainty associated with the deterministic point predictions. The availability of the range and complete PDF of the predictive displacement allows researchers and practitioners to efficiently quantify the level of predictive uncertainty in the deterministic point predictions and to consider multiple solutions/scenarios for the best and worst conditions. A wide range indicates a high level of uncertainty in the operation; this information can guide researchers and practitioners away from risky actions under uncertain conditions. In contrast, a narrow range means that decisions can be made more confidently, with less chance of confronting an unexpected condition in the future. For example, if a sharp displacement increment with a wide range is predicted, researchers and practitioners should carefully determine through comprehensive analysis whether the landslide is reaching the tertiary creep stage before issuing an alert. Under this circumstance, time-of-failure forecasting should be run in parallel, and multiple solutions/scenarios should be considered until either failure precursors are identified or the movements are suspended.

The proposed QRNNs-KDE approach is suitable for medium-term to long-term horizon forecasting. Results from previous studies [2,48] have shown that the performance of data-driven models varies for landslides with different deformation behaviors. Usually, for landslides with drastic step-like deformation, the prediction accuracy is lower, and the corresponding prediction error is larger. Therefore, in practical applications of medium-term to long-term horizon forecasting, when predicting landslides with drastic deformation, the proposed QRNNs-KDE approach should be applied with caution. To achieve excellent performance, sufficient data are recommended and needed for model training.

## 6. Conclusions

In this study, a QRNNs-KDE approach was proposed to improve the prediction accuracy and uncertainty quantification of landslide displacement. The Fanjiaping landslide in the TGRA was selected as a case study to explore the performance of the QRNNs-KDE approach. The following conclusions from the study were obtained:

The movements of the Fanjiaping landslide were especially pronounced under prolonged periods of dropping reservoir levels, particularly during periods of slight drawdown at the highest reservoir level; the minimum triggering threshold consists of episodes lasting one month with cumulative rainfall exceeding 158 mm.

The QRNNs-KDE approach achieves excellent performance and outperforms the traditional BP, RBF, ELM, SVM, bootstrap-ELM-ANN, and Copula-KSVMQR methods. Additionally, the proposed approach is more informative, providing the likely range and complete PDFs of landslide displacement. The landslide displacement observations are distributed in the middle of the prediction range with high probability.

In practical application, the proposed QRNNs-KDE approach is suitable for medium-term to long-term horizon forecasting. The range and complete PDF of the predictive displacement can supplement final point predictions for decision making.