Active-Learning Approaches for Landslide Mapping Using Support Vector Machines

Abstract: Ex post landslide mapping for emergency response and ex ante landslide susceptibility modelling for hazard mitigation are two important application scenarios that require the development of accurate, yet cost-effective spatial landslide models. However, the manual labelling of instances for training machine learning models is time-consuming given the data requirements of flexible data-driven algorithms and the small percentage of area covered by landslides. Active learning aims to reduce labelling costs by selecting more informative instances. In this study, two common active-learning strategies, uncertainty sampling and query by committee, are combined with the support vector machine (SVM), a state-of-the-art machine-learning technique, in a landslide mapping case study in order to assess their possible benefits compared to simple random sampling of training locations. By selecting more "informative" instances, the SVMs with active learning based on uncertainty sampling outperformed both random sampling and query-by-committee strategies when considering mean AUROC (area under the receiver operating characteristic curve) as performance measure. Uncertainty sampling also produced more stable performances with a smaller AUROC standard deviation across repetitions. In conclusion, under limited data conditions, uncertainty sampling reduces the amount of expert time needed by selecting more informative instances for SVM training. We therefore recommend incorporating active learning with uncertainty sampling into interactive landslide modelling workflows, especially in emergency response settings, but also in landslide susceptibility modelling.


Introduction
Despite significant progress in landslide hazard assessment and mitigation, these hazards still present a major challenge for policymakers to reduce monetary losses and casualties. The occurrence probability of landslides, which broadly include a large variety of downslope movement processes on hillslopes under the effects of gravity [1], varies greatly in space and time as a result of complex patterns of predisposing factors and temporal variation in triggering factors. Considering the ongoing global trends of urbanization, deforestation, and climate change, landslide science faces the growing challenge of having to update landslide hazard assessments and provide rapid post-disaster information in the event of regional triggering events such as rainstorms and earthquakes [2][3][4]. For example, an earthquake in Tomakomai, Japan triggered about 10,000 landslides causing 36 deaths [3], and in 2018, landslides triggered by seasonal heavy precipitation caused approximately 105 deaths and USD 212 million in losses in China [5]. In Italy, as in many other regions worldwide, landslides are mostly triggered by intense or prolonged rainfall [6]. These hazards often cause long-term economic loss, population displacement, and negative effects on the natural environment.
Landslide mapping refers to the manual or automated detection and delineation of actual landslides that are appreciable in remote-sensing imagery or based on their topographic footprint [7][8][9]. Additionally, the growing availability of light detection and ranging (LiDAR) derived high-resolution digital terrain models (HRDTM) allows us to detect landslides where passive optical sensors are limited (e.g., within the forest) [10,11]. This classification task is related to landslide susceptibility mapping, which focuses on estimating the probability of future landslide occurrences based on predisposing factors: usually topographic, geological, and land use/land cover conditions. Evidently, factors that control susceptibility can also provide valuable information for landslide mapping [12], which also requires post-event remote-sensing data (e.g., optical or LiDAR). Conversely, landslide inventories created by means of landslide mapping are a necessary input for landslide susceptibility mapping using supervised classification models. Together, landslide mapping and susceptibility modelling play a critical role in providing information that is necessary for decision making in emergency situations and for reducing risk in the development of spatial planning strategies.
Machine-learning techniques are increasingly being adopted in landslide modelling, as they have the potential to better adapt to complex nonlinear Earth surface processes and their interactions with land use than parametric statistical techniques such as logistic regression. Examples of black-box machine-learning models include the SVM, artificial neural networks, and random forests, whereas the generalized additive model (GAM) as an intermediate-complexity model is popular due to its nonlinear but more interpretable structure [13][14][15][16]. However, these data-driven supervised learning algorithms need a large number (e.g., thousands) of observations of landslide presence/absence, which are usually derived from manually digitized landslide inventories. Creating landslide inventories, and more generally the manual labelling or annotation of these instances, is a very time-consuming task, which increases the cost of landslide modelling studies and leads to delays in post-disaster situation awareness.
Active learning (AL) is a framework that promises to reduce this burden by selecting "informative" instances for the user to label [17,18]. In each of these queries to the user, an additional small batch of unlabelled instances (e.g., grid cells) is selected based on an informativeness measure and then presented to an "oracle" (i.e., a human annotator) for labelling. Active learning aims to achieve better accuracies using as few labelled instances as possible, thereby minimizing the cost of obtaining labelled data. Active learning has increasingly been adopted in remote-sensing classification [19][20][21] but has rarely been adopted in the context of landslide mapping [22].
The SVM has become increasingly popular in the context of landslide modelling along with other nonlinear techniques such as the generalized additive model [15,23,24]. Compared to the less flexible GAM, the SVM is capable of modelling nonlinear interactions among predictors while avoiding overfitting through regularization. Therefore, active learning based on the SVM for predicting landslides was adopted in this paper.
Hence, the main objective of this study is to assess the potential of different active-learning strategies for landslide mapping based on limited amounts of labelled data. We consider two popular active-learning query strategies in combination with the SVM in a case study from the Andes of southern Ecuador [25].

Materials and Methods
In this study, we use active-learning strategies [26] to sample "interesting" (i.e., informative) locations for training SVM models for landslide detection, and we compare this approach to a simple random sampling strategy (Figure 1). In active learning, a small training data set is initially retrieved to obtain a preliminary SVM fit. This model's classifications then allow us to identify relevant additional instances that have the greatest potential to improve the model fit. These instances are then labelled, and the SVM is retrained with the additional training data. This step is repeated to investigate changes in model performance with increasing instance size. Hyperparameters are tuned in each individual step, and model performances are estimated on identical test sets to ensure comparability. Details of this procedure are explained in the following sections, and Settles (2010) [26] provides a detailed overview of active-learning strategies.
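The iterative procedure described above can be illustrated with a minimal pool-based sketch. It is not the implementation used in this study: the classifier is a deliberately simple one-dimensional stand-in rather than an SVM, hyperparameter tuning is omitted, and all names (`fit_stand_in`, `predict_proba`, `active_learning`, the toy `pool_x` and `oracle`) as well as the small batch sizes are assumptions made for illustration.

```python
import math
import random


def fit_stand_in(xs, ys):
    """Stand-in for model training (NOT an SVM): returns a 1-D decision threshold."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:  # only one class has been labelled so far
        return sum(xs) / len(xs)
    return 0.5 * (sum(pos) / len(pos) + sum(neg) / len(neg))


def predict_proba(threshold, x):
    """Pseudo-posterior probability of the positive class under the stand-in model."""
    return 1.0 / (1.0 + math.exp(-(x - threshold)))


def active_learning(pool_x, oracle, n_init=10, batch=5, n_rounds=4, seed=0):
    """Pool-based active learning: initial random sample, then uncertainty queries."""
    rng = random.Random(seed)
    unlabelled = list(range(len(pool_x)))
    rng.shuffle(unlabelled)
    labelled, unlabelled = unlabelled[:n_init], unlabelled[n_init:]
    labels = {i: oracle(i) for i in labelled}
    sizes = [len(labelled)]
    for _ in range(n_rounds):
        # Retrain on everything labelled so far.
        model = fit_stand_in([pool_x[i] for i in labelled],
                             [labels[i] for i in labelled])
        # Query the batch whose predicted probability is closest to 0.5.
        unlabelled.sort(key=lambda i: abs(predict_proba(model, pool_x[i]) - 0.5))
        query, unlabelled = unlabelled[:batch], unlabelled[batch:]
        for i in query:
            labels[i] = oracle(i)  # the annotator supplies the true label
        labelled = labelled + query
        sizes.append(len(labelled))
    return labelled, sizes


# Toy pool: one feature, positives are the instances with x > 7 (hypothetical).
pool_x = [i / 10.0 for i in range(100)]
oracle = lambda i: int(pool_x[i] > 7.0)
labelled, sizes = active_learning(pool_x, oracle)
```

In this toy run, the labelled set grows by one batch per round, and no instance is queried twice because queried indices are removed from the pool.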

Active-Learning and Traditional Learning Strategies Used
Active learning is a subfield of machine learning that is also referred to as query learning or optimal experimental design in the statistical literature. It is a very broad field encompassing many different approaches [26][27][28][29][30]. In general, the purpose of active learning is to require as few labelled instances as possible while achieving a level of high accuracy. After starting with a small initial training set, small batches of additional "informative" instances are presented to an expert for labelling.
There are three different settings of active learning, namely membership query synthesis, stream-based selective sampling, and pool-based sampling [26,27]. For our study area, we have a large collection of unlabelled data. Due to uncertainty in the random generation process of instances using membership query synthesis [31] and the high cost of labelling selected instances one by one using stream-based selective sampling, pool-based sampling was adopted in this study. Pool-based sampling can be further divided into two main categories: uncertainty sampling and query by committee [19].
We denote the unlabelled data set as $x$ and the classes of the unlabelled data set as $y$. $P_\theta$ is the posterior probability as estimated by the current model $\theta$ in a given active-learning step.

Uncertainty Sampling
Uncertainty sampling chooses the instances that are predicted with the lowest confidence, i.e., that are associated with the greatest uncertainty in the current model [32]. It may be the most common and simplest active-learning approach. We briefly present three popular uncertainty-sampling strategies, but we choose only margin sampling due to the mathematical equivalence of these approaches in two-class situations.
(1) Least confidence. The least-confidence strategy queries the instances for which the current model has the least confidence in its prediction, i.e., for which the predicted classes are closest to being equally likely [33]. Therefore, the most "informative" instances are selected by

$$x^*_{LC} = \arg\max_x \left(1 - P_\theta(\hat{y} \mid x)\right),$$

where $\hat{y} = \arg\max_y P_\theta(y \mid x)$ is the most likely label, i.e., the class label with the highest posterior probability under the current model $\theta$. $x^*_{LC}$ represents the instance that the current model $\theta$ is most likely to mislabel.
(2) Margin sampling. Margin sampling (MS) was proposed to additionally take advantage of information regarding the posterior probabilities of all of the labels, not only the most likely one [34]. Here, instances with the smallest margin between the posterior probabilities of the first and the second most likely labels are selected. Since these instances are more ambiguous, the model has difficulty in differentiating between the two most likely class labels. Hence, knowing the true label would help the classifier to discriminate them more effectively. The instances are selected using

$$x^*_{MS} = \arg\min_x \left(P_\theta(\hat{y}_1 \mid x) - P_\theta(\hat{y}_2 \mid x)\right),$$

where $\hat{y}_1$ and $\hat{y}_2$ are the first and the second most likely labels, respectively.
(3) Entropy measure. One of the more general pool-based sampling strategies is based on the entropy measure [35]. This strategy uses entropy, which is an information-theoretic measure of the uncertainty of a random variable. It aims at using information from all of the remaining classes to detect the most informative instances. Intuitively, the entropy measure strategy should perform better than the least-confidence and MS strategies, especially for very large label sets. Instances are selected using

$$x^*_{EP} = \arg\max_x E(y, x), \qquad E(y, x) = -\sum_i P_\theta(y_i \mid x) \log P_\theta(y_i \mid x),$$

where $E(y, x)$ is the entropy value of class $y$ for instance $x$, and $y_i$ ranges over all possible class labels. Instances with the highest entropy value, which implies more uncertainty in the distribution, are selected as $x^*_{EP}$.
Each of these uncertainty sampling strategies has its own application scenarios. In binary classification, however, all three are equivalent in selecting instances with the posterior class probabilities closest to 0.5 [26]. We implemented uncertainty sampling using the equation given for margin sampling, and we therefore refer to it as margin sampling in the rest of the paper.
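The equivalence of the three measures in the two-class case can be checked numerically. The following sketch computes each measure for a handful of made-up posterior probabilities (the `pool` values are hypothetical, not data from this study) and compares the resulting query rankings.

```python
import math


def least_confidence(probs):
    """1 - P(most likely label); larger means more uncertain."""
    return 1.0 - max(probs)


def margin(probs):
    """Difference between the two largest posteriors; smaller means more uncertain."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def entropy(probs):
    """Shannon entropy of the predicted class distribution; larger means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical posterior probabilities of the landslide class for five pool instances.
pool = [0.05, 0.48, 0.90, 0.65, 0.30]
dists = [[1.0 - p, p] for p in pool]  # two-class distributions

idx = range(len(pool))
rank_lc = sorted(idx, key=lambda i: -least_confidence(dists[i]))
rank_ms = sorted(idx, key=lambda i: margin(dists[i]))
rank_en = sorted(idx, key=lambda i: -entropy(dists[i]))
```

All three rankings place the instance with posterior 0.48 first and the one with 0.05 last, i.e., they order the pool by distance from 0.5. With more than two classes, the rankings can differ.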

Query by Committee
Query by committee (QBC) is another, more theoretically motivated active-learning algorithm that selects informative unlabelled instances based on different models (a committee) trained on the current labelled training set [26,36]. Based on the posterior probabilities predicted by the different committee members, the unlabelled instances with the maximum disagreement are selected. Two important measures of disagreement among the models are the Kullback-Leibler (KL) divergence [37,38] and vote entropy [39]. Because the KL divergence quantifies the average difference between the label distributions predicted by the individual committee members and the committee consensus, it is considered the better approach to selecting informative instances [30]. Hence, KL divergence was adopted in this study. Let $\mathcal{C} = \{\theta^{(1)}, \ldots, \theta^{(C)}\}$ denote the set of models forming the committee. Then

$$P_{\mathcal{C}}(y_i \mid x) = \frac{1}{C} \sum_{c=1}^{C} P_{\theta^{(c)}}(y_i \mid x)$$

corresponds to the committee's average posterior probability of class label $y_i$, and

$$D\left(P_{\theta^{(c)}} \,\|\, P_{\mathcal{C}}\right) = \sum_i P_{\theta^{(c)}}(y_i \mid x) \log \frac{P_{\theta^{(c)}}(y_i \mid x)}{P_{\mathcal{C}}(y_i \mid x)}$$

denotes the KL divergence, which we try to maximize on average over all committee members:

$$x^*_{KL} = \arg\max_x \frac{1}{C} \sum_{c=1}^{C} D\left(P_{\theta^{(c)}} \,\|\, P_{\mathcal{C}}\right).$$

This strategy focuses on the instances $x^*_{KL}$ with the largest average difference between the label distribution of any one committee member and the consensus.
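As a worked illustration of the KL-based disagreement measure, the sketch below computes the committee consensus and the mean KL divergence for two hypothetical instances. The function names and the example probabilities are assumptions made for this illustration.

```python
import math


def consensus(committee_probs):
    """Average posterior per class over all committee members."""
    n_models = len(committee_probs)
    n_classes = len(committee_probs[0])
    return [sum(m[k] for m in committee_probs) / n_models for k in range(n_classes)]


def kl_divergence(p, q):
    """KL divergence D(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mean_kl_disagreement(committee_probs):
    """Mean KL divergence of each member's prediction from the consensus."""
    avg = consensus(committee_probs)
    return sum(kl_divergence(m, avg) for m in committee_probs) / len(committee_probs)


# Hypothetical class posteriors from a committee for two pool instances.
all_unsure = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]   # members agree
split_vote = [[0.9, 0.1], [0.1, 0.9]]               # members contradict each other

d_agree = mean_kl_disagreement(all_unsure)
d_split = mean_kl_disagreement(split_vote)
```

Both instances have a consensus posterior of 0.5, yet only the second one shows disagreement (about 0.37 nats versus 0). This highlights a difference from uncertainty sampling: QBC with KL divergence rewards disagreement among members, not uncertainty of the consensus.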
Various strategies can be applied to creating a committee, such as bootstrap resampling of the training data [40,41]. We decided to set up a committee of SVM classifiers trained using different hyperparameter values since the behaviour of the SVM strongly varies with its cost and bandwidth parameters, $C$ and $\gamma$ [42]. In each active-learning round, we randomly sampled 250 pairs of hyperparameter values ($\log_2 C$ between −12 and +15 and $\log_2 \gamma$ between −12 and +6). The best-performing 25 hyperparameter settings were then selected to form a committee for that round.
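The committee construction described above might be sketched as follows. Two points are assumptions rather than details given in the text: the hyperparameters are drawn uniformly on the log2 scale, and `placeholder_score` is an arbitrary stand-in for the performance estimate that would rank the candidates (for instance a cross-validated AUROC); it does not reflect any result of this study.

```python
import math
import random


def sample_hyperparameters(n=250, seed=0):
    """Draw n (C, gamma) pairs, uniform on the log2 scale (an assumption)."""
    rng = random.Random(seed)
    return [(2.0 ** rng.uniform(-12, 15), 2.0 ** rng.uniform(-12, 6))
            for _ in range(n)]


def select_committee(candidates, score, size=25):
    """Keep the `size` best-scoring hyperparameter settings as committee members."""
    return sorted(candidates, key=score, reverse=True)[:size]


def placeholder_score(params):
    """Arbitrary stand-in for a performance estimate; higher is better."""
    cost, gamma = params
    return -abs(math.log2(cost)) - abs(math.log2(gamma))


candidates = sample_hyperparameters()
committee = select_committee(candidates, placeholder_score)
```

In a real workflow, each committee member would be an SVM fitted with one of the selected settings, and the score would be recomputed in every active-learning round.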

Random Sampling as a Baseline
In addition to the active-learning strategies described above, simple random sampling (RS) was used for comparison. It randomly selects instances from the unlabelled data with equal probability and does not try to assess the utility of the data for landslide mapping.

Landslide Classification Model
In this study, we used a support vector machine (SVM) model, which is a flexible supervised machine-learning technique [43]. It has previously shown competitive performances in landslide modelling [12,15,44,45]. This technique is particularly appealing in active learning because its flexibility can be tuned extremely well, allowing it to transition from a strongly penalized simple model to a more complex one as the sample size grows larger. The SVM can be applied in both one-class and two-class cases. Yao et al. (2008) compared one-class and two-class SVMs in landslide analysis and concluded that the two-class SVM could have better prediction efficiency than the one-class SVM [46]. Therefore, in this study, we applied the two-class SVM as the active-learning classifier.
Because the flexibility of the SVM is controlled by its hyperparameters γ (bandwidth) and C (cost), a k-fold cross-validation was used in each active-learning iteration to optimize them [47]. In this cross-validation, the training set is split into k partitions, one of which is retained for testing the model, and the remaining k − 1 partitions are used as training data. This process is repeated k times, and every partition is used once as the validation data. Performance estimates are averaged over the k partitions to obtain a cross-validation estimate of the performance measure. We used k = 10, which is a commonly used setting. Given the spatial nature of our data, we used k-means clustering of the sample coordinates to generate spatial cross-validation partitions [48]. The choice of the SVM kernel function is less critical; therefore, the radial basis function kernel was adopted.
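The spatial partitioning step can be illustrated with a small, self-contained sketch that clusters sample coordinates with a basic k-means (Lloyd's algorithm) and uses each cluster as one cross-validation fold. The function names, the random toy coordinates, and the initialization from randomly chosen sample points are assumptions; the study itself does not specify these details.

```python
import random


def kmeans_folds(coords, k=10, n_iter=100, seed=0):
    """Assign each (x, y) coordinate to one of k spatial clusters (folds)."""
    rng = random.Random(seed)
    centres = rng.sample(coords, k)  # initialise centres at k sample points
    assignment = None
    for _ in range(n_iter):
        new_assignment = [
            min(range(k),
                key=lambda c: (x - centres[c][0]) ** 2 + (y - centres[c][1]) ** 2)
            for x, y in coords
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        for c in range(k):
            members = [coords[i] for i, a in enumerate(assignment) if a == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return assignment


def spatial_cv_splits(assignment):
    """Yield (train_indices, test_indices), one spatial cluster held out at a time."""
    for fold in sorted(set(assignment)):
        test = [i for i, a in enumerate(assignment) if a == fold]
        train = [i for i, a in enumerate(assignment) if a != fold]
        yield train, test


# Toy coordinates (hypothetical), 200 sample locations in a 1000 m x 1000 m window.
rng = random.Random(1)
coords = [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(200)]
folds = kmeans_folds(coords, k=10)
splits = list(spatial_cv_splits(folds))
```

Note that clusters obtained this way are spatially compact but generally not equally sized, so the folds differ in the number of instances they contain.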
The area under the receiver operating characteristic (ROC) curve [49,50] (AUROC) [51] was used as a measure of predictive performance. Its range is between 0.5 (no predictive skill) and 1 (perfect separation).
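For readers less familiar with this measure, the AUROC can be computed as the proportion of landslide/non-landslide pairs in which the landslide instance receives the higher score, with ties counted as one half. The following sketch uses this pairwise formulation on made-up scores; it is meant as an illustration, not as the code used in this study.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC requires at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores and true labels (1 = landslide).
auc_mixed = auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])     # 3 of 4 pairs correct
auc_perfect = auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])   # all pairs correct
auc_constant = auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])  # uninformative scores
```

A constant, uninformative score yields 0.5. Values below 0.5 are possible in principle and would indicate a ranking that is worse than chance.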

Repetition and Performance Estimation
The workflow described above was repeated 150 times in order to reduce the influence of random variability on our results. In active learning as well as random sampling, we draw an additional batch of 25 points in each iteration or epoch after an initial random sample of 210 points in the first step. This initial data set contains 10 landslide points and 200 non-landslide points, roughly representing the spatial landslide density within the study area. With the chosen batch size, on average, one additional landslide point will be drawn from the study area even in random sampling, given the 4% landslide density. We performed 50 iterations in order to observe the convergence of results for large instance sizes, although such large sample sizes (1460 labelled instances) are not of practical relevance.
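The sample sizes quoted in this paper follow directly from the initial sample of 210 points and the batch size of 25. The short calculation below reproduces them; the function and constant names are chosen for illustration only.

```python
N_INIT = 210              # 10 landslide + 200 non-landslide points
BATCH = 25                # points added per epoch
LANDSLIDE_DENSITY = 0.04  # approximate share of landslide cells


def labelled_after(epoch):
    """Number of labelled instances after a given number of epochs."""
    return N_INIT + BATCH * epoch


size_epoch_3 = labelled_after(3)
size_epoch_4 = labelled_after(4)
size_epoch_8 = labelled_after(8)
size_epoch_50 = labelled_after(50)
expected_landslides_per_random_batch = BATCH * LANDSLIDE_DENSITY
```

This gives 285, 310, 410, and 1460 labelled instances after 3, 4, 8, and 50 epochs, respectively, and an expectation of about one landslide point per randomly drawn batch.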
The entire study area (87,223 non-landslide and 2569 landslide grid cells), as the target area for landslide mapping, was used for analysing and comparing AUROC performances obtained by SVMs with active- (MS, QBC) and passive-learning (RS) strategies. We extracted 55,887 non-landslide and 1663 landslide grid cells from the overall data set to serve as a pool of candidate grid cells from which we sampled the training data set.
In order to gain insight into the importance of predictors at different active-learning stages, we further calculated the permutation-based variable importance as a simple overall measure of predictive importance [52]. This was applied in a spatial cross-validation framework, i.e., by making predictions on spatially disjoint cross-validation test sets [53]. Accumulated local effects (ALE) [54] plots were further generated to visualize the shape of the relationships between important predictor variables and SVM landslide classifications.
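The idea behind permutation-based variable importance can be sketched as follows: one predictor at a time is randomly shuffled, and the resulting drop in AUROC is recorded. This simplified version evaluates on a single data set rather than on spatial cross-validation test sets, and the toy model `predict` and the toy data are assumptions made for illustration.

```python
import random


def auroc(scores, labels):
    """AUROC as the fraction of correctly ranked (positive, negative) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in AUROC when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = auroc([predict(row) for row in X], y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auroc([predict(row) for row in X_perm], y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: feature 0 determines the label, feature 1 is ignored by the model.
X = [[i / 40.0, ((i * 7) % 40) / 40.0] for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]
predict = lambda row: row[0]  # hypothetical fitted model

importances = permutation_importance(predict, X, y)
```

Shuffling the informative feature lowers the AUROC, whereas shuffling the unused feature leaves the predictions, and hence the AUROC, unchanged.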

Study Area and Data
The study area of our case study is located in the Andes of southern Ecuador in the Reserva Biológica San Francisco (RBSF). The RBSF is located between the provincial capitals of Loja and Zamora (3°58′30″ S, 79°4′25″ W) [62]. The slopes are steep (1st quartile of slope angle: 28.8°, median: 36.5°) and covered with evergreen lower and upper mountain rainforest [63]. The annual precipitation in the study area ranges from 2000 mm in the lower parts to more than 6000 mm between 2900 and 3100 m a.s.l. with nearly daily rainfall [64]. This area is characterized by a high frequency of landslide occurrences, which underlines the potential utility of active-learning techniques for generating event-based landslide inventories on demand with as little labelling as possible. In this study area, landslides are important ecosystem disturbances that trigger local vegetation successions and thus contribute to habitat complexity in this unique hotspot of plant biodiversity [65,66]. Landslide processes were previously studied in more detail with a focus on geomorphic process rates and the possible effects of human land use [25,67]. In this case study, we focus on the "natural" part of the RBSF study area of Muenchow et al. (2012). The data set includes 178 landslides with a mean landslide size of 793 m². We refer the reader to Muenchow et al. (2012) for further geomorphological detail and an analysis of landslide susceptibility [25].
We used a high-resolution orthorectified aerial photograph of the study area as a direct optical indicator of vegetation disturbance by landslides. The image was acquired in 2001 and has a 0.3 m × 0.3 m spatial resolution (data source: E. Jordan and L. Ungerechts, Düsseldorf; DFG Research Unit FOR 816). Small cloudy patches and other errors were masked out manually (Figure 2). Landslides were mapped in this imagery by J. Muenchow (Erlangen), who analyzed the landslide distribution and characteristics in this study in more detail as part of a regional-scale comparison [25].
Vegetation indices (VIs) play an important role in mapping landslides and other forest disturbances [68,69]. Although the near-infrared part of the spectrum is particularly useful for identifying photosynthetically active, healthy plants, the imagery available for this case study is limited to the visible part of the spectrum, while having the benefit of offering the resolution required to detect the narrow, elongated landslides of this study area. Considering the spectral characteristics of the available orthophoto, we used the green chromatic coordinate (GCC) vegetation index [70], which has been shown to compare favourably to other indices in the visible part of the spectrum in distinguishing the forest from the soil [71]. The GCC is generally effective in suppressing the effects of changes in scene illumination [72], which is important in our mountainous study area. GCC is defined as

$$\mathrm{GCC} = \frac{G}{R + G + B},$$

where $R$, $G$, and $B$ represent the red, green, and blue bands of the orthophoto. The red (RCC) and blue (BCC) chromatic coordinates are calculated in the same way. Because RCC and BCC are strongly correlated (correlation coefficient: 0.95), we used only RCC, and not BCC, in addition to GCC.
A digital elevation model (DEM) of the RBSF at a 10 m × 10 m resolution produced by E. Jordan and L. Ungerechts (Düsseldorf) was generated from stereo aerial photographs from the year 1997. Following Muenchow et al. (2012) [25], we derived the following terrain attributes from the DEM, as they are commonly included in landslide distribution models as preparatory factors: local slope angle (slope), plan and profile curvature (plancurv and profcurv), and the slope angle (cslope) and the logarithm of the size (log.carea) of the upslope contributing area [15]. These terrain attributes are intended to act as proxies for destabilizing forces (slope, cslope), water availability (log.carea, concave curvatures), and exposure to wind (convex curvatures) as well as general variability in the characteristics of soil and vegetation [25]. Our expectation is that these terrain attributes will further improve landslide classification.
Overall, our feature set consisted of five terrain attributes and the GCC and RCC as remote-sensing variables. Predictors that presented outliers were winsorized at the 1st and 99th percentiles.
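The two preprocessing steps mentioned in this section, chromatic coordinates and winsorization, can be illustrated with a short sketch. The function names are chosen for illustration, and the percentile rule used here (order statistics selected by integer rank, without interpolation) is an assumption, since the text does not state which percentile definition was applied.

```python
def chromatic_coordinates(r, g, b):
    """Return (RCC, GCC, BCC) for one pixel; assumes r + g + b > 0."""
    total = r + g + b
    if total <= 0:
        raise ValueError("chromatic coordinates are undefined for a black pixel")
    return r / total, g / total, b / total


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp values to the lower and upper percentile (rank-based, no interpolation)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[(lower_pct * (n - 1)) // 100]     # rank rounded down
    hi = ordered[-((-upper_pct * (n - 1)) // 100)]  # rank rounded up
    return [min(max(v, lo), hi) for v in values]


# Hypothetical pixel with digital numbers R = 50, G = 100, B = 50.
rcc, gcc, bcc = chromatic_coordinates(50, 100, 50)

# Toy predictor with values 0..100: only the two extremes are clamped.
clamped = winsorize(list(range(101)))
```

Because the three chromatic coordinates sum to one, a uniformly brighter or darker pixel with the same band ratios yields the same GCC, which is the property that makes the index robust to illumination changes.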

Model Performance
Overall, active learning using margin sampling outperformed query by committee and random sampling after only four epochs, i.e., starting with a learning instance size of 310 grid cells (Figure 3). The mean AUROC of the SVM with MS increased continuously from this point, going from 0.80 in epoch 3 with only 285 grid cells to 0.83 after epoch 8, i.e., with label information for >410 grid cells. Mean AUROCs obtained with RS and QBC were very similar; they both reached ~0.79 only for large sample sizes. The similar performances of QBC as an active-learning strategy and RS for passive learning suggest that the instances labelled in QBC-based active learning were no more informative than the ones retrieved with simple random sampling. Nevertheless, SVM performances with QBC were less variable than those obtained with RS.
Similarly, the random variability of AUROC performances over the 150 repetitions revealed much less variable results for MS-based active learning than for QBC and RS, which show similar results (Figure 4). Differences in variability between MS and QBC/RS were at least twofold across all epochs, which indicates that informative instance data not only improves the performance but also reduces the probability of obtaining poor results due to random variability.
Considering the importance of the cost and gamma hyperparameters for the flexibility of the SVM, we examined the variability in optimal hyperparameters for different active-learning epochs in margin sampling. As the sample size increased, the optimal cost parameters across the repetitions were increasingly concentrated around the $2^0$ to $2^5$ region and the optimal γ around $2^{-5}$, although the optimal region extended diagonally towards higher cost values when combined with larger γ values (Figure 5).
Remote Sens. 2021, 13, 2588

Model Interpretation
A permutation-based variable importance assessment for the SVM with margin sampling revealed that the most important predictors were GCC and RCC, followed by the logarithm of the catchment area and the catchment slope (Figure 6). Thus, predictors that are commonly used in landslide susceptibility modelling helped to improve the performance of models for landslide mapping consistently across all epochs. Note that there are moderate to strong correlations between the slope variables as well as between the upslope contributing area and the two curvature variables (Table 1).
Broadly speaking and as expected, landslides are primarily characterized by low vegetation vigour as represented by a low vegetation index, and to a smaller extent, by a steep upslope area (Figure 7). They are also rarely found in the valley bottoms, where the upslope contributing area is large, or directly on ridges or hilltops, where the upslope contributing area would be small. Ridges and hilltops often show reduced vegetation canopy due to factors other than landslides, such as windthrow, and the inclusion of the upslope contributing area therefore reduces confounding with these patterns, resulting in a geomorphologically more plausible classification.
Landslide maps predicted by SVM with margin sampling clearly depict many of the landslide-affected areas even after only five epochs (Figure 8). There is little change in the spatial pattern of mapped landslides after more than five epochs, which is consistent with the relatively stable model performances and variable importance reported above. Despite the visual similarities between epochs 5 and 10, it should be remembered that quantitative performances in terms of the mean and standard deviation of AUROC did substantially improve from epoch 5 to epoch 10, as hyperparameter tuning started to stabilize as well around epoch 10.

Potential of SVM with AL
Overall, our results confirm the potential of AL for remote-sensing applications and for landslide mapping in particular [19,22], and demonstrate the suitability of uncertainty sampling strategies. AL retrieves the instances that the current model is most likely to misclassify [26]. During training, the SVM builds a margin to classify the instances based on their features. If a new candidate point's distance from the margin is very small, this instance is more likely to be misclassified by the model, and labelling such instances therefore has the greatest potential to improve model performance.
Landslide data are typically imbalanced, which poses a particular challenge in classification modelling that can be addressed using uncertainty sampling. In this study, landslides covered only 4% of the study area, and consequently, random sampling is very poor at collecting information on positives. In contrast, AL strategies can reduce the impact of imbalance by retrieving more "useful" instances, including more positives (Table 2 and Figure 9). This is especially true for MS.

Limitations of SVM with AL
A possible limitation of AL in the context of SVM classification is the interaction between the hyperparameters and the query strategy. Specifically, since AL queries depend on posterior probabilities, they are sensitive to SVM hyperparameters. If hyperparameters cannot be reliably estimated, as is the case in the initial epochs with small sample sizes, model performance can be highly variable and sometimes poor (Figures 3 and 4). When this occurs, AL query strategies may be close to random sampling, or they may even oversample irrelevant regions in the feature space [73,74]. The QBC strategy may be particularly sensitive to this problem since SVMs with 25 different hyperparameter settings were used to form a committee. In general, it can be difficult to strike the right balance between diversity and goodness-of-fit of committee members, and the use of the top hyperparameter settings may not achieve an optimal committee. Although other strategies could be used, it was beyond the scope of this work to experiment with these additional design options. Xu et al. (2020) used three different pre-trained SVMs as committee members to conduct classifications and concluded that pre-trained SVMs made QBC more robust in iterative training [42]. Stumpf et al. formed QBC committees using 500 fully grown trees from a random forest (RF) and concluded that they can achieve good results [22], but no comparison with other, model-independent QBC approaches or with other query strategies was made. Considering these limitations and the positive results achieved with uncertainty sampling strategies, we suggest that the latter offers several advantages ranging from a simpler, model-agnostic implementation to fewer design decisions and reduced computational cost.
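For comparison, the sketch below shows how a committee-based query could be scored. It uses vote entropy, one common disagreement measure for QBC; this choice is an assumption made for illustration and is not necessarily the measure used in this study. The committee size of 25 follows the text, while the votes and function names are hypothetical.

```python
from math import log


def vote_entropy(votes):
    """Disagreement of a committee on one candidate (0 means full agreement).

    votes: list with one predicted class label per committee member.
    """
    n = len(votes)
    entropy = 0.0
    for label in set(votes):
        share = votes.count(label) / n
        entropy -= share * log(share)
    return entropy


def query_by_committee(committee_votes, batch_size):
    """Return indices of the candidates on which the committee disagrees most."""
    ranked = sorted(range(len(committee_votes)),
                    key=lambda i: vote_entropy(committee_votes[i]),
                    reverse=True)
    return ranked[:batch_size]


# Hypothetical votes of 25 committee members for three candidates
committee_votes = [
    [0] * 25,             # unanimous: no disagreement
    [1] * 13 + [0] * 12,  # nearly even split: highest disagreement
    [1] * 5 + [0] * 20,   # moderate disagreement
]
print(query_by_committee(committee_votes, batch_size=2))  # [1, 2]
```

The sketch also makes the limitation visible: if poorly tuned committee members vote almost arbitrarily, the entropy values carry little information about the true decision boundary and the resulting ranking may approach random sampling.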

Conclusions
In this study, active-learning strategies for training landslide detection models outperformed models trained using randomly sampled data. The mean AUROC of the SVM with margin sampling as an active-learning strategy was 0.80 with only 285 instances and 0.83 with 410 instances. In contrast, SVMs with query-by-committee and random sampling achieved AUROCs of around 0.79, and only for large sample sizes. In addition, the SVM with margin sampling was more robust than the other strategies. Therefore, uncertainty sampling is particularly promising as it achieved the best performance, was best able to handle imbalanced data, and is straightforward to implement regardless of the machine-learning model being used.
Labelling a large number of instances using human experts is a time-consuming process that cannot be executed in sufficient detail under time constraints, e.g., in emergency situations. Additionally, human experts cannot recognize which additional instances would be the most "useful" for predicting the response, or in our case, for identifying landslides. Active-learning methods are therefore a promising strategy as part of an interactive landslide detection workflow, especially in an emergency response setting.
Author Contributions: Conceptualization, methodology, writing-original draft preparation, visualization, Z.W.; formal analysis, data curation, A.B. and Z.W.; writing-review and editing, supervision, A.B. All authors have read and agreed to the published version of the manuscript.
Funding: Z.W. was funded through a China Scholarship Council PhD scholarship, which is gratefully acknowledged.

Data Availability Statement: No new data were created or analyzed in this study. Data sharing is not applicable to this article.