Susceptibility Assessment for Landslide Initiated along Power Transmission Lines

The power network has long transmission spans and passes through wide areas with complex topographic settings and various human engineering activities. These conditions lead to frequent landslide hazards, which seriously threaten the safe operation of the power transmission system. It is therefore of great significance to carry out landslide susceptibility assessment for disaster prevention and mitigation in the power network. We undertake an extensive analysis and comparison of different data-driven methods using a case study from China. Several susceptibility maps were generated by applying a multivariate statistical method (logistic regression (LR)) and a machine learning technique (random forest (RF)) separately, with two different mapping units and predictor sets of differing configurations. The models' accuracies, advantages, and limitations are summarized and discussed using a range of evaluation criteria, including the confusion matrix, statistical indexes, and the area under the receiver operating characteristic curve (AUROC). The results show that the machine learning method is well suited to landslide susceptibility assessment along the transmission network over grid cell units, and that the accuracy of susceptibility models is improving rapidly as the field moves from statistically based models toward machine learning techniques. However, the multivariate statistical LR method performs better when computed over heterogeneous slope terrain units, probably because the number of units is significantly reduced. Moreover, high predictive performance of a model does not guarantee high plausibility and applicability of the resulting landslide susceptibility map. The choice of mapping unit can produce greater differences in the generated susceptibility maps than the choice of modeling method.
The study also provided a practical example for landslide susceptibility assessment along the power transmission network and its potential application in hazard early warning, prevention, and mitigation.


Introduction
As climate change has intensified and energy demand has expanded [1], power network failures have occurred frequently in recent years, such as the massive outage in Texas, United States, in 2021 [2,3], and the severe electricity shortage in northeast China since September 2021 [4]. As the backbone of the power network, high-voltage transmission lines must span wide areas with different geographic and climatic features. Power transmission infrastructure is usually built in mountainous and hilly areas to avoid interaction with human activity. However, such areas are also more prone to geohazards, especially landslides. According to statistics from the China Electric Power Research Institute (CEPRI), over the most recent 5 years more than 6000 transmission towers were under the potential threat of geohazards, and more than 25% of these cases were caused by landslides [5]. Therefore, identifying landslide-susceptible areas along the transmission network provides a basis for landslide early warning and integrated risk analysis, which contributes significantly to hazard prevention and the safe operation of the power network.
Landslide susceptibility assessment aims to estimate the spatial probability of potentially unstable slopes based on information about past and present landslide events. The quality of a landslide susceptibility map depends strongly on the input data, especially the explanatory variables and the landslide inventory [6][7][8][9]. Many approaches have been developed to assess landslide susceptibility quantitatively. They can be loosely grouped into three main categories: physically-driven models, knowledge-driven models, and data-driven models, the last including statistically based classification methods and the recently well-developed machine learning (ML) methods [10][11][12][13]. Each of these approaches has its advantages and limitations. The quality of a heuristic (knowledge-driven) model largely depends on the investigators' understanding of the actual causes and conditioning factors of landslides in an area; such models can deliver high prediction accuracy but inevitably introduce considerable subjectivity [14,15]. Among these approaches, only physically-driven methods consider the physical interactions between landslide occurrence and terrain conditions. However, such models require high-quality, site-specific geotechnical and hydrological input data [16][17][18] and are thus applicable only over small areas. In this article, we therefore focus on susceptibility assessment for landslide initiation along power transmission lines using data-driven modeling methods in an integrated Geographic Information System (GIS) environment.
Statistical methods rest on the basic assumption that landslides are recurrent events that occur independently, and that future landslides are more likely to occur under the same geologic and morphologic conditions that led to past landslides [19][20][21]. In addition, landslide conditioning factors are spatially linked and can therefore be used to predict future landslides [22,23]. Landslide development is governed by the geological and topographical conditions of individual slopes and is also triggered by external factors, such as hydrological and climatic conditions, human engineering activities, and earthquakes [24]. Many conditioning factors have been identified for landslide susceptibility assessment in the past literature [12,25]. Consequently, the selection of landslide conditioning factors is a crucial step in landslide susceptibility mapping (LSM).
Susceptibility mapping has long been a research hotspot, and essentially all of its issues have been studied extensively, including the selection and optimization of explanatory variables, the performance evaluation of various susceptibility models, and the validation of susceptibility assessment results [23,26]. Recently, comparisons between the performance of statistical and various novel machine learning models have accounted for the majority of landslide susceptibility studies [27][28][29][30]. Uncertainties related to landslide inventories, predisposing factors, and analytical tools have also been addressed in the literature [31,32], but little attention has been given to how the choice of terrain mapping unit used to represent the prediction results affects landslide susceptibility maps [33,34]. The type and size of the terrain mapping unit may exert a significant influence on the final results of the susceptibility assessment. The selection of the most appropriate terrain mapping unit depends on the scale and aim of the work, the quality and resolution of the available information, and the type and size of the landslides [20,[35][36][37]. Many authors have stated that the performance of landslide susceptibility models increases when slope units (SUs) are used, probably owing to their clear topographic meaning and their ability to exploit heterogeneous information in a consistent way [33,36].
Numerous landslide susceptibility analyses have been conducted in the selected region because of the large number of landslides and their potential threat to local settlements [38][39][40][41][42][43][44][45]. Most of them focused on enhancing the performance of the predictive model by using hybrid machine learning models and hyper-parameter optimization methods, while neglecting the practical interpretation and application of the landslide susceptibility results. Therefore, in this work, both SUs and raster units were employed to assess landslide susceptibility along power transmission lines, with the aim of obtaining the most applicable LSM results.
Only a limited number of contributions specifically address the quantitative estimation of landslide spatial probability for particular elements at risk, such as roads or power transmission lines, which have long linear distributions [46]. Jaiswal et al. presented a landslide hazard assessment along a transportation line in southern India, combining spatial, temporal, and volume probability estimates [47]. Peng et al. performed regional landslide susceptibility mapping in the Three Gorges Area considering different exposed elements, including roads [38]. Das et al. developed and applied a quantitative methodology for landslide hazard assessment in a national highway corridor in the Himalayan region, using homogeneous susceptible units [48]. Ge et al. compared five different landslide susceptibility models in a case study in Longnan City, Gansu Province, along a 330-kV transmission line [49].
In addition, the performance evaluation of tasks such as LSM involves some subjectivity. Both the performance of the susceptibility models and the overall quality of the resulting LSM require comprehensive evaluation. The set of tests includes the degree of model fit, the robustness of the model, the uncertainty associated with the probabilistic estimate, and the model's prediction skill [20,[50][51][52][53]. Usually, the overall model performance determined by quantitative evaluation metrics is not enough to offer a full assessment of model reliability; a deeper adequacy analysis of the landslide susceptibility maps generated by the models is essential [13]. To this end, Guzzetti et al. proposed a set of criteria, the Susceptibility Quality Level (SQL), to rank the quality of a landslide susceptibility assessment according to the types of tests performed to evaluate its results [20].
This study was designed and performed to produce a susceptibility map with special attention to landslides along power transmission networks. The high-voltage transmission lines in the Three Gorges Reservoir area (TGRA), China, were selected as a case study because no landslide susceptibility map existed for them. The results provide a comparison between LR and RF models employing two different mapping units. Furthermore, a comprehensive validation of the generated LSMs is presented.

Study Area
The study area comprises the environs of the high-voltage transmission lines in 5 counties of the TGRA, China, namely Wanzhou, Yunyang, Wushan, Fengjie, and Badong, between latitudes 30°49′ and 31°41′ N and longitudes 107°55′ and 110°19′ E (Figure 1). The environs were created as a buffer with a distance of 2000 m centered on the transmission lines. The distance was determined using the empirical equation for landslide runout distance proposed in Reference [54]. Landslides outside this range are considered to have no impact on the power transmission infrastructure. The main strata include: T₁d, T₁j, T₂b, and T₃xj: the Triassic Daye, Jialingjiang, Badong, and Xujiahe Formations, respectively; J₁: the Jurassic Zhenzhuchong and Ziliujing Formations; J₂: the Jurassic Shaximiao and Xingtiangou Formations; and J₃p and J₃s: the Jurassic Penglaizhen and Suining Formations, respectively.
The region is in southwestern China, close to the middle reaches of the Yangtze River, with a maximum elevation of 2469 m. A complete power transmission network has been established in the area, mainly composed of 7 high-voltage lines, such as the Shenwan, Panlong, and Wanpan Lines.
Influenced by the Himalayan orogeny, the study area lies mainly in the moderate- to low-altitude mountain and river valley sector of the TGRA. Fengjie County marks a topographic turning point in the current landscape, with low-altitude mountains and hills to the west and higher-altitude river valleys and moderate mountains to the east [55]. Meanwhile, solid evidence of intensive tectonic activity is observed in the area, including large-scale structures such as the Wanzhou synclinorium, Qiyueshan anticline, Wushan syncline, Guandukou syncline, and Xiannvshan fault [55].
The age of the bedrock ranges from Cambrian to Quaternary, and from west to east the lithology changes from Jurassic and Triassic clastic rocks (sandstone, mudstone, and sandstone interbedded with mudstone layers) to hard carbonate rocks (limestone, marlstone, and dolostone), with numerous interbedded hard and soft strata. The so-called red strata, which consist of sandstone, mudstone, and sandstone interbedded with mudstone layers, are widespread in this area and are mainly exposed west of Fengjie, while the hard rocks form the steep gorges and valleys east of Fengjie.

Methods
In this study, we aim to assess the susceptibility to landslides along the power transmission network with two data-driven methods and to critically evaluate the generated LSMs from the perspective of practical application by power transmission departments.
The procedure we implemented consisted of four main steps (Figure 2): (a) We constructed a spatial database from various data sources and extracted the landslide conditioning factors from it using two types of mapping units (raster and slope units). (b) We analyzed the landslide conditioning factors through optimization processes, which include multicollinearity diagnosis and factor contribution analysis; the optimized factors were then used to create the training and test datasets through a resampling strategy. (c) We established the susceptibility models using two data-driven methods: logistic regression and random forest. The parameters of the machine learning method were obtained by trial and error. In addition, we assessed and compared the models' performance using several evaluation methods and an independent landslide dataset. (d) Lastly, we generated the LSMs and comprehensively assessed their overall performance. The main processing was carried out in ArcGIS.
The study area has long been plagued by landslides, which are largely induced by intense rainfall and reservoir water level fluctuation [55]. Since the study area was cropped as a 4-km-wide belt along the transmission lines, all 264 landslides within this area were defined as potential threats to the power transmission lines. It should be noted that shallow landslides in colluvium are the most representative landslides in this area; however, rock falls and soil mass movements of small magnitude were excluded from this inventory because of their low degree of threat. Therefore, the remaining landslides in the record tend to have large volumes. According to the existing information, the inventory contains 41 rock slides, with volumes ranging from 3600 to 1.6 × 10⁷ m³, and 207 earth slides [56], with volumes ranging from 500 to 7 × 10⁶ m³. Medium-size (10⁵ to 10⁶ m³) to large-size (10⁶ to 10⁷ m³) landslides account for 74% of the total. A total of 50 landslides developed along the Yangtze River or its tributaries and were considered to be seriously affected by groundwater level fluctuation. More than 90% of the recorded landslides were triggered or greatly affected by the intense rainfall that frequently occurs during the rainy season. The main source of this landslide inventory is an older landslide inventory of the TGRA, supplemented with recent field investigation reports and landslide news.
An additional, more recent landslide dataset was obtained from the annual routine inspections of the transmission lines carried out by the local power operation and maintenance department and collected by CEPRI. In total, 14 landslides were reported to have exerted varying degrees of influence on transmission towers in the study area from January 2016 to June 2020.

Landslide Conditioning Factors
Landslide conditioning factors need to represent the totality of possible influences that govern landslide triggering mechanisms. Their acquisition path and classification scheme are directly related to their different natures and environmental characteristics [57]. Based on field investigation, work experience, past literature, and the data available for the study area, geospatial data covering various aspects were purposely collected and used in this study; the data sources are listed in Table 1. A total of 15 conditioning factors are considered in the analysis: elevation, slope, aspect, profile curvature, plan curvature, topographic wetness index (TWI), terrain roughness index (TRI), stream power index (SPI), lithology, distance from lineaments, distance from rivers, distance from roads, bedding structure, normalized difference vegetation index (NDVI), and land cover.
Slope, aspect, profile curvature, plan curvature, TRI, TWI, and SPI were derived from the ASTER GDEM digital elevation model (DEM) with a 30 m resolution. Among them, slope, aspect, profile curvature, plan curvature, and TRI are conventional factors. TWI and SPI describe the accumulation of water flow and the erosive power of water flow, respectively, and can be used to quantify the topographic influence on hydrological processes [24]. Lithology and faults were extracted by vectorizing geological maps at a scale of 1:200,000. Bedding structure indicates the intersection relationship between strata and slope; it was derived by reclassifying the combinations of stratum orientation and slope orientation based on topographic and geological maps [40]. NDVI was calculated from Landsat-8 OLI data. Distances from lineaments, rivers, and roads were derived by buffering the fault, river, and road vectors. The classification scheme for each conditioning factor is shown in Table 2.
Two types of assessment units, namely 30-m grid cells and SUs, are adopted in this study. An SU is defined as the region between ridges and valleys that is constrained by homogeneous slope aspect and inclination [58]. SUs are widely used because of their clear topographic meaning.
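The two hydrological indices above are commonly defined as TWI = ln(a_s / tan β) and SPI = a_s · tan β, where a_s is the specific catchment area (upslope area per unit contour width) and β is the local slope angle. The sketch below evaluates them for a single cell. It assumes that a_s is approximated as (flow accumulation + 1) × cell size, which is one common convention and not necessarily the one applied by the GIS tools in this study; the function names are hypothetical.

```python
from math import log, radians, tan

def specific_catchment_area(flow_acc_cells, cell_size=30.0):
    """Upslope area per unit contour width (m).

    Assumption: upslope area = (flow accumulation + 1) * cell_size**2 and the
    contour width equals one cell side, so the ratio is (acc + 1) * cell_size.
    """
    return (flow_acc_cells + 1) * cell_size

def twi(a_s, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(a_s / tan(beta)).

    A small minimum slope (an assumption) avoids division by zero on flat cells.
    """
    beta = radians(max(slope_deg, min_slope_deg))
    return log(a_s / tan(beta))

def spi(a_s, slope_deg):
    """Stream power index a_s * tan(beta)."""
    return a_s * tan(radians(slope_deg))

# One 30 m cell that receives flow from 9 upslope cells on a 20-degree slope
a_s = specific_catchment_area(9)
print(round(twi(a_s, 20.0), 2), round(spi(a_s, 20.0), 1))
```

High TWI marks flat, convergent cells where water tends to accumulate, while high SPI marks steep cells with large contributing areas where flowing water has more erosive power.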
SUs can be delineated manually from topographic maps or partitioned automatically by auxiliary software with or without input parameters [58][59][60]. The main procedures are as follows: (i) filling the original and the reversed DEM; (ii) extracting the surface flow direction and flow accumulation to establish the stream links; (iii) extracting the valley lines and ridge lines and establishing watersheds; and (iv) delineating the slope units based on the watersheds. All of these steps are described in previous literature [33,61,62]. The whole procedure was integrated and processed automatically with ArcGIS-based hydrologic analysis modules to avoid subjective uncertainty. Then, to correct unreasonable delineations caused by the insufficient resolution of the DEM, the automatically generated SU map was manually modified using the digital terrain map. Table 3 shows the general characteristics of the mapping units.
[40,63,64], since they disregard the scales and units of the landslide conditioning factors and allow preliminary ranking [13].
IG is defined as the reduction in the entropy E(Y) of a reference landslide inventory Y (with j classes) due to the information provided by a conditioning factor F (with n classes). It measures the amount of information gained about one random variable by observing another. A factor with a higher IGR value makes a larger contribution to the models. The information gain of landslide conditioning factor F_i with respect to the output class Y (landslide and non-landslide) is given by Equation (1):
$$\mathrm{IG}(Y, F_i) = E(Y) - E(Y \mid F_i) \qquad (1)$$
where E(Y) is the entropy of Y, calculated using Equation (2), and E(Y|F_i) is the entropy of Y after associating the values of landslide conditioning factor F_i, estimated using Equation (3):
$$E(Y) = -\sum_{k=1}^{j} P(Y_k)\,\log_2 P(Y_k) \qquad (2)$$
$$E(Y \mid F_i) = -\sum_{s=1}^{n} P(F_i = f_s) \sum_{k=1}^{j} P(Y_k \mid F_i = f_s)\,\log_2 P(Y_k \mid F_i = f_s) \qquad (3)$$
where P(Y_k) is the prior probability of class Y_k, P(F_i = f_s) is the proportion of samples in the s-th class of F_i, and P(Y_k | F_i = f_s) is the posterior probability of Y_k given that class of conditioning factor F_i.
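To make the information gain concrete, the plain-Python sketch below computes IG for one categorical conditioning factor against binary landslide labels, together with the information gain ratio (IGR) defined in the following paragraph. It is an illustration only, not the software used in this study. The function names are hypothetical, and the intrinsic information IntI is computed as the entropy of the factor's class proportions, which is the usual gain-ratio definition.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits, -sum(p * log2 p), over the observed classes."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(factor_classes, labels):
    """IG(Y, F) = E(Y) - E(Y | F) for one categorical factor F."""
    n = len(labels)
    conditional = 0.0
    for cls, count in Counter(factor_classes).items():
        subset = [y for f, y in zip(factor_classes, labels) if f == cls]
        conditional += (count / n) * entropy(subset)  # P(F = f) * E(Y | F = f)
    return entropy(labels) - conditional

def information_gain_ratio(factor_classes, labels):
    """IGR = IG / IntI, where IntI is the entropy of the factor's class proportions."""
    intrinsic = entropy(factor_classes)
    if intrinsic == 0:
        return 0.0  # a factor with a single class carries no information
    return information_gain(factor_classes, labels) / intrinsic

# Hypothetical toy data: lithology class per sample and landslide label (1/0)
lithology = ["J2", "J2", "J2", "T2b", "T2b", "T3xj"]
landslide = [1, 1, 0, 0, 0, 1]
print(round(information_gain_ratio(lithology, landslide), 3))
```

Dividing by IntI penalizes factors that are split into many small classes, which would otherwise obtain an inflated IG simply because each class holds few samples.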
The IGR of landslide conditioning factor F_i is then calculated as:
$$\mathrm{IGR}(Y, F_i) = \frac{\mathrm{IG}(Y, F_i)}{\mathrm{IntI}(T, F_i)} \qquad (4)$$
where IntI is the potential information generated by dividing the training data T into m subsets according to the classes of F_i. IntI is given by:
$$\mathrm{IntI}(T, F_i) = -\sum_{s=1}^{m} \frac{|T_s|}{|T|}\,\log_2 \frac{|T_s|}{|T|} \qquad (5)$$
where |T_s| is the number of samples in the s-th subset.
Multicollinearity occurs when there is a high linear correlation between specific conditioning factors [65]. The variance inflation factor (VIF) indicates how much the variance of an estimated regression coefficient is inflated by collinearity. The degree of multicollinearity can be quantified by calculating the standard error variations of the landslide conditioning factors. VIF is given as:
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2} = \frac{1}{\mathrm{TOL}_i} \qquad (6)$$
where R_i^2 measures the extent to which factor i is correlated with the other factors in a linear regression, and TOL_i = 1 − R_i^2 is the tolerance. The lower the standard error and VIF value, the lower the multicollinearity risk.
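The sketch below computes VIF and tolerance (TOL = 1/VIF) for a set of continuous factor columns. Instead of running one auxiliary regression per factor, it uses the standard identity that VIF_i equals the i-th diagonal element of the inverse of the predictors' correlation matrix. It is a plain-Python illustration with hypothetical function names and invented sample values. The flagging rule (VIF > 5 or TOL < 0.2) is the commonly used criterion also adopted in this paper.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pearson correlation matrix of equally long, non-constant numeric columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        centred.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in c)) for c in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]

def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear factors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif_and_tolerance(columns):
    """VIF_i = 1 / (1 - R_i^2) = [R^-1]_ii, and TOL_i = 1 / VIF_i."""
    inv = invert(correlation_matrix(columns))
    vif = [inv[i][i] for i in range(len(columns))]
    return vif, [1.0 / v for v in vif]

# Hypothetical factor columns; the third is nearly a copy of the first
elevation = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
ndvi = [0.5, 0.3, 0.6, 0.6, 0.3, 0.5]
elev_copy = [98.0, 205.0, 297.0, 404.0, 499.0, 597.0]
vif, tol = vif_and_tolerance([elevation, ndvi, elev_copy])
flagged = [i for i, (v, t) in enumerate(zip(vif, tol)) if v > 5 or t < 0.2]
print(flagged)
```

In this toy case, both near-duplicate columns are flagged while the uncorrelated NDVI column stays close to VIF = 1. Deciding which of two collinear factors to drop remains a judgment call.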

Preparation of the Sample Datasets
Both pixel-based units and slope units were cross-tabulated with the values of the landslide conditioning factors to obtain the corresponding attribute matrices for susceptibility assessment. It should be noted that each SU is assumed to have homogeneous terrain and lithologic conditions; thus, the mode of the categorized factor values within the area of the SU is used to represent each discrete attribute.
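The following minimal sketch shows how the mode of a categorical factor could be assigned to each SU from co-registered rasters. The rasters are represented as flattened Python lists, and the function name is hypothetical. The tie-breaking rule (smallest class code wins) is an assumption, since the text does not specify one.

```python
from collections import Counter, defaultdict

def slope_unit_mode(su_ids, factor_classes, nodata=None):
    """Return {slope unit id: most frequent factor class} for co-registered cells.

    su_ids and factor_classes are flattened rasters of equal length; cells whose
    factor class equals `nodata` are ignored.
    """
    counts = defaultdict(Counter)
    for su, cls in zip(su_ids, factor_classes):
        if cls != nodata:
            counts[su][cls] += 1
    # Highest count first; ties resolved by the smallest class code (assumption)
    return {su: min(c.items(), key=lambda kv: (-kv[1], kv[0]))[0]
            for su, c in counts.items()}

# Two slope units: SU 1 is mostly lithology class 3, SU 2 entirely class 4
print(slope_unit_mode([1, 1, 1, 2, 2], [3, 3, 5, 4, 4]))  # {1: 3, 2: 4}
```

Continuous factors such as slope or NDVI would instead be summarized with a mean or median per SU; the mode is appropriate only for the categorized attributes described above.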
The modeling dataset comprises positive cases (with landslides) and negative cases (without landslides), with the target class value set to '1' for landslide and '0' for non-landslide. In the raster-based models, the positive cases comprised 39,190 historical landslide cells, selected using the 264 landslide polygons. The same number of non-landslide cells was randomly selected from non-landslide areas, such as river networks or terrain with slope angles close to 0°. Finally, the whole modeling dataset was randomly separated into a training dataset (70%) and a test dataset (30%). In the slope-based models, the same sampling strategy was used for model construction, with 231 landslide SUs and 231 non-landslide SUs. The modeling process was implemented in SPSS Modeler.
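The sampling strategy described above can be sketched as follows: all landslide units become positives, an equal number of randomly drawn non-landslide units become negatives, and the pooled set is split 70/30 at random. The unit identifiers, the function name, and the fixed random seed are illustrative assumptions; the actual processing in this study was carried out in SPSS Modeler.

```python
import random

def balanced_train_test(landslide_ids, candidate_negative_ids, train_frac=0.7, seed=42):
    """Return (train, test) lists of (unit_id, label) pairs with a 1:1 class ratio.

    candidate_negative_ids are units outside mapped landslides (e.g. river
    networks or near-flat terrain); an equal number of them is drawn at random.
    """
    rng = random.Random(seed)
    negatives = rng.sample(candidate_negative_ids, len(landslide_ids))
    samples = [(u, 1) for u in landslide_ids] + [(u, 0) for u in negatives]
    rng.shuffle(samples)                      # random, non-stratified split
    cut = round(train_frac * len(samples))
    return samples[:cut], samples[cut:]

# Hypothetical SU case: 231 landslide SUs and a pool of 2000 non-landslide SUs
train, test = balanced_train_test(list(range(231)), list(range(1000, 3000)))
print(len(train), len(test))
```

A purely random split can leave the two classes slightly unbalanced within each subset. A stratified split (shuffling positives and negatives separately) is a common alternative when exact 1:1 ratios are needed in both subsets.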

Logistic Regression
Logistic regression (LR) [66] is a widely used multivariate statistical method for landslide susceptibility modeling that can reveal the empirical relationship between landslide occurrence and various independent explanatory variables [67,68]. In LR, the explanatory variables can be continuous, discrete, dichotomous, categorical, or any combination of these, and they need not be normally distributed [69]. The LR model is expressed as:
$$P = \frac{1}{1 + e^{-Y}} \qquad (7)$$
$$Y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \qquad (8)$$
where P is the estimated probability of landslide occurrence, x_1, x_2, ..., x_n are the explanatory variables, and Y is the linear combination of these variables. For predicting the presence or absence of landslides, the dependent variable is binary (1 or 0). The parameters b_1, b_2, ..., b_n are the coefficients at a normalized scale, which allow the relative importance of each independent variable on the response to be compared, and a is the intercept.
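To make the logistic model P = 1/(1 + e^(−Y)) with Y = a + b₁x₁ + … + bₙxₙ concrete, the plain-Python sketch below estimates a and b by batch gradient ascent on the mean log-likelihood. This is an illustrative estimator chosen for brevity; standard statistical software usually relies on faster Newton-type maximum likelihood solvers. The learning rate, iteration count, function names, and toy data are assumptions.

```python
from math import exp

def sigmoid(y):
    """P = 1 / (1 + e^(-Y)), written to avoid overflow for large |Y|."""
    if y >= 0:
        return 1.0 / (1.0 + exp(-y))
    e = exp(y)
    return e / (1.0 + e)

def linear_term(a, b, x):
    """Y = a + b1*x1 + ... + bn*xn."""
    return a + sum(bj * xj for bj, xj in zip(b, x))

def fit_logistic(X, labels, learning_rate=0.1, n_iter=2000):
    """Estimate (a, b) by batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    a, b = 0.0, [0.0] * k
    for _ in range(n_iter):
        grad_a, grad_b = 0.0, [0.0] * k
        for x, y in zip(X, labels):
            residual = y - sigmoid(linear_term(a, b, x))  # observed minus predicted
            grad_a += residual
            for j in range(k):
                grad_b[j] += residual * x[j]
        a += learning_rate * grad_a / n
        b = [bj + learning_rate * gj / n for bj, gj in zip(b, grad_b)]
    return a, b

# Hypothetical standardized factor (e.g. a slope z-score) vs landslide presence
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
a, b = fit_logistic(X, y)
print(round(a, 3), [round(v, 3) for v in b], round(sigmoid(linear_term(a, b, [1.5])), 3))
```

Because the factors are standardized, the magnitudes of the fitted coefficients b can be compared directly, which is the sense in which the text describes them as coefficients at a normalized scale.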

Random Forest
Random forest (RF) is a supervised classification algorithm and an ensemble of a large set of independently trained decision trees [70]. Each tree votes for class membership, and the predicted class is determined by the majority vote of all trees. By exploiting the variance among individual trees, such an ensemble method is considered robust, accurate, and less prone to overfitting, especially on complex datasets with undisclosed noisy variables [71]. RF is known to provide high accuracy rates and robustness to outliers in predictors owing to the random selection of training samples (bagging) and of predictors, and the subsequent combination of the constructed trees [72].
The main hyper-parameters of RF models include the number of trees, the maximum depth of the trees, and the maximum number of features considered at each split. These parameters are usually tuned by internal validation with the out-of-bag (OOB) method, which is also used to estimate variable importance and the internal classification error.
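The mechanics described above are illustrated by the compact plain-Python sketch below: bootstrap samples, a random subset of predictors at each split, majority voting, and out-of-bag (OOB) error estimation. It is a teaching sketch, not the SPSS Modeler implementation used in this study. It rests on simplifying assumptions: Gini impurity, numeric predictors, binary labels 0/1, and hypothetical function names. Its arguments n_trees, max_depth, and max_features correspond to the three hyper-parameters listed in the text. The share of trees voting for class 1 serves as a susceptibility-like score.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, idx, features):
    """Best (feature, threshold) among `features` by weighted Gini, or None."""
    best, best_score = None, gini([y[i] for i in idx])
    for f in features:
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in idx if X[i][f] <= t]
            right = [y[i] for i in idx if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(idx)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, idx, depth, max_depth, max_features, rng):
    """Grow a tree; leaves are class labels, internal nodes are (f, t, left, right)."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), max_features)  # random predictor subset
    split = best_split(X, y, idx, features)
    if split is None:
        return majority
    f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (f, t,
            build_tree(X, y, left, depth + 1, max_depth, max_features, rng),
            build_tree(X, y, right, depth + 1, max_depth, max_features, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=100, max_depth=8, max_features=2, seed=0):
    """Train trees on bootstrap samples; return (trees, out-of-bag error rate)."""
    rng = random.Random(seed)
    n = len(X)
    trees, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        tree = build_tree(X, y, boot, 0, max_depth, max_features, rng)
        trees.append(tree)
        in_bag = set(boot)
        for i in range(n):
            if i not in in_bag:  # sample i is out-of-bag for this tree
                oob_votes[i][predict_tree(tree, X[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    errors = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return trees, (errors / len(scored) if scored else float("nan"))

def vote_share(trees, x):
    """Fraction of trees voting for class 1 (landslide)."""
    return sum(predict_tree(tree, x) == 1 for tree in trees) / len(trees)

# Toy data: two informative predictors and one noise predictor
X = [[i, 39 - i, (i * 7) % 5] for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]
trees, oob_error = fit_forest(X, y, n_trees=25, max_depth=4, max_features=2, seed=1)
print(oob_error, vote_share(trees, [38, 1, 3]), vote_share(trees, [1, 38, 3]))
```

Each sample is left out of roughly a third of the bootstrap draws, so the OOB error is an internal estimate of generalization error that does not require a separate validation set. This is why it is convenient for trial-and-error tuning of the hyper-parameters.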

LSMs Performance and Validation
It is necessary to evaluate both the fit of the applied landslide susceptibility models and the overall quality of the generated LSMs [20].
In this study, the confusion matrix, statistical indexes, and ROC curve were used to analyze the accuracy of the susceptibility models [22,23]. In the confusion matrices, landslide samples are denoted as positive and non-landslide samples as negative. Correctly classified landslide samples are recorded as TP (true positive) and correctly classified non-landslide samples as TN (true negative); FP (false positive) are non-landslide samples predicted as landslides, and FN (false negative) are landslide samples predicted as non-landslides. A set of quantitative indexes, including accuracy (ACC), precision (PRE), TP rate, TN rate, and the Matthews correlation coefficient (MCC), was estimated from the confusion matrices using the following formulas:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (9)$$
$$\mathrm{PRE} = \frac{TP}{TP + FP} \qquad (10)$$
$$\mathrm{TPR} = \text{Sensitivity} = \frac{TP}{TP + FN} \qquad (11)$$
$$\mathrm{TNR} = \text{Specificity} = \frac{TN}{TN + FP} \qquad (12)$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (13)$$
Among them, the overall accuracy (ACC) describes the proportion of correctly classified samples, both landslide and non-landslide. Other confusion-matrix-based statistical indexes, such as precision, sensitivity, and specificity, were also used to evaluate the capabilities of the predictive models and to generate the receiver operating characteristic (ROC) curve, which plots 1 − specificity on the horizontal axis against sensitivity on the vertical axis [73]. The area under the ROC curve (AUC) can be used to evaluate model performance; a model with a larger AUC is considered to have a higher predictive capability. In addition, the mean absolute error (MAE) and root mean squared error (RMSE) are useful indicators of predictive accuracy for continuous variables [74].
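The evaluation indexes listed above (ACC, PRE, TPR, TNR, MCC, AUC, MAE, and RMSE) can be computed directly from predicted labels and susceptibility scores. The plain-Python sketch below does this. AUC is computed through its rank interpretation: the probability that a randomly chosen landslide sample scores higher than a randomly chosen non-landslide sample, with ties counted as one half. This equals the area under the empirical ROC curve. The function names, the 0.5 cut-off in the example, and the toy values are assumptions.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Counts (TP, TN, FP, FN) with landslide = 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def confusion_metrics(tp, tn, fp, fn):
    """ACC, PRE, TPR (sensitivity), TNR (specificity) and MCC.

    Assumes non-zero denominators for PRE, TPR and TNR; MCC falls back to 0.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PRE": tp / (tp + fp),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

def roc_auc(y_true, scores):
    """AUC as P(score of a landslide > score of a non-landslide), ties = 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def mae_rmse(y_true, scores):
    """Mean absolute error and root mean squared error of the scores."""
    errors = [s - t for s, t in zip(scores, y_true)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Hypothetical test samples: observed labels and model susceptibility scores
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off assumed
print(confusion_metrics(*confusion_counts(y_true, y_pred)), roc_auc(y_true, scores))
```

The confusion-matrix indexes depend on the chosen cut-off, whereas AUC summarizes ranking quality over all cut-offs. This is why both kinds of measure are reported.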
To test the quality of the generated LSMs, the distribution of landslides across the susceptibility levels was analyzed statistically, and the area covered by each susceptibility class was compared with the landslide density in that class. The independent dataset of landslides that occurred in the most recent 5 years (2016-2020) was also used in this procedure, and the overall performance of the LSMs is discussed comprehensively.
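One simple way to carry out this check is to tabulate three values for each susceptibility level: the percentage of the study area it covers, the percentage of landslide area that falls in it, and the ratio of the two (often called the landslide density or frequency ratio). The sketch below does this. It assumes that landslide area is counted in grid cells; the function names and the toy map are hypothetical. If the ratio rises monotonically from low to very high susceptibility, landslide density increases with susceptibility level.

```python
def class_statistics(class_of_cell, is_landslide_cell, levels):
    """Per-level share of study area (%), share of landslide cells (%) and their ratio."""
    total_cells = len(class_of_cell)
    total_landslide = sum(is_landslide_cell)
    stats = {}
    for level in levels:
        flags = [ls for c, ls in zip(class_of_cell, is_landslide_cell) if c == level]
        area_pct = 100.0 * len(flags) / total_cells
        landslide_pct = 100.0 * sum(flags) / total_landslide
        ratio = landslide_pct / area_pct if area_pct else float("nan")
        stats[level] = (area_pct, landslide_pct, ratio)
    return stats

def density_ratio_increases(stats, levels):
    """True if the density ratio rises strictly from the first to the last level."""
    ratios = [stats[level][2] for level in levels]
    return all(lo < hi for lo, hi in zip(ratios, ratios[1:]))

# Hypothetical 10-cell map: susceptibility level and landslide flag per cell
levels = ["low", "moderate", "high", "very high"]
classes = ["low"] * 4 + ["moderate"] * 3 + ["high"] * 2 + ["very high"]
flags = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
stats = class_statistics(classes, flags, levels)
print(stats["very high"], density_ratio_increases(stats, levels))
```

The same tabulation can be repeated with the independent 2016-2020 landslides, counting points rather than cells, to see whether new events also concentrate in the high and very high levels.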

Selection of Landslide Conditioning Factor
The selection of input factors is a major determinant of the accuracy of landslide susceptibility models. Fifteen factors were prepared and considered as initial conditioning factors for landslide susceptibility assessment. VIF, which detects and quantifies multicollinearity between conditioning factors, and IGR, which ranks factor contributions, were used for feature selection in this study. The feature analysis shows that none of the landslide conditioning factors is critically collinear and that all of them are effective for the LSM in this study. As shown in Table 4, no factor reaches the critical multicollinearity threshold (TOL < 0.2 or VIF > 5 [75]), and all factors exceed the contribution threshold of IGR > 0. However, two factors (SPI and TWI) were removed in the SU form because of the imbalance in sample size among their categories (i.e., too many samples fell into a single category).
Furthermore, two conditioning factors (altitude and land cover) ranked among the top three in factor contribution in both the raster and SU forms. Distance from rivers and slope showed high importance in the SU form but much less importance in the raster form.
However, three conditioning factors, i.e., aspect, plan curvature, and profile curvature, are relatively less important than the other topographic factors. It should be noted that geological factors (i.e., lithology and distance from lineaments) are normally identified as critical factors in susceptibility models in the TGRA, owing to its complex geological conditions. However, such factors seem to be less important for landslides along the power transmission network. Instead, environmental factors (i.e., land cover, NDVI, distance from rivers, and distance from roads) become the dominant factors.

Validation and Model Comparison
The RF machine learning model and the LR multivariate statistical model were applied to assess landslide susceptibility. To evaluate the performance of the applied models, several statistical index-based evaluation metrics were computed on both the training and testing datasets (described in Section 3.4).
In this case study, both the RF and the LR models show satisfactory performance, and the results on the testing dataset follow the same trend as those on the training dataset (see Table 5). In raster form, the RF model outperforms the LR model on all statistical indexes. When the LR models with different mapping units are compared, the LR-SU model shows a noticeable improvement over the LR-Raster model. The ROC curves in Figure 3 show the training and testing performance of the applied models. The RF model achieved excellent performance, with an AUC over 0.9, while LR also showed a good result, with an AUC over 0.8. It is interesting to note that LR-SU clearly outperformed LR-Raster despite its much smaller sample size.

Producing LSMs and Result Evaluation
Once the susceptibility models were trained, they were used to determine the landslide susceptibility index (LSI) for every pixel or SU, calculated as a floating-point number ranging from 0 to 1. The generated LSMs were then reclassified into four levels by setting limits on the cumulative distribution of the susceptibility values, namely low (40%), moderate (30%), high (20%), and very high (10%). This four-level classification system was designed mainly to correspond to the current criterion of the hazard early-warning system of CEPRI. Figure 4 shows the distribution of the landslide susceptibility classes and their defining LSI values for the three models. Comparing the distribution of LSI in each susceptibility level of the three generated LSMs, the RF model in raster units has the highest threshold for the very-high-susceptibility area (LSI > 0.898) and the lowest upper limit for the low-susceptibility area (LSI < 0.134). Such an LSI distribution indicates a higher probability of landslide occurrence in very-high-susceptibility areas and a lower likelihood of landslide occurrence in low-susceptibility areas. This is followed by the LR model in SUs, which has thresholds of LSI > 0.833 for the very-high-susceptibility level and LSI < 0.168 for the low-susceptibility level. This observation implies that, with a more reasonable LSI distribution, the RF model in raster units is less sensitive than the LR models in both raster units and SUs. This is probably because the RF model is more capable of dealing with redundant or scattered features and capturing the dominant factors. The statistics of the susceptibility maps are shown in Table 6. All the generated LSMs fulfill two basic spatial principles: (i) the landslide density ratio increases from low to high susceptibility classes; and (ii) high susceptibility classes cover small extents. For the LSM from the RF model, 61.9% of the total landslides, covering 61.5% of the total landslide area, occurred in 7.75% of the study area, which is categorized as very high susceptibility. Only 9.43% of the landslides, covering less than 5% of the total landslide area, occurred in low- to moderate-susceptibility areas, which take up 72.85% of the total study area. For the LR model in raster form, 49.8% of the total landslides, covering 48.73% of the total landslide area, occurred in 10% of the study area (classified as very high susceptibility). However, 48 landslides (18.11%) occurred in low- to moderate-susceptibility areas, which take up 70.14% of the total study area. Notably, the LR model showed a relatively weak predictive ability compared with the machine learning RF model. This outcome provides another example consistent with the current trend that highlights the predictive power of advanced and evolving machine learning models. As for the LR models with different mapping units, the results reveal that the proportions of landslides in each susceptibility level are quite close. Despite the difference in size and total number of mapping units, the LR-SU model performs better in landslide prediction than the LR-Raster model.
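The four-level reclassification by cumulative distribution (low 40%, moderate 30%, high 20%, very high 10%) can be sketched as below. The breaks are taken as the LSI values at the 40th, 70th, and 90th percentiles of the sorted values. This sketch counts every unit equally, which suits equal-size grid cells; for SUs of unequal size, the cumulative shares would have to be weighted by area, which is not shown. Ties in LSI can also shift the realized class shares. Function names and the synthetic LSI values are hypothetical.

```python
from collections import Counter

def cumulative_breaks(lsi_values, shares=(0.4, 0.3, 0.2, 0.1)):
    """Upper LSI limits of all but the last class, from cumulative unit shares."""
    values = sorted(lsi_values)
    n = len(values)
    breaks, cumulative = [], 0.0
    for share in shares[:-1]:
        cumulative += share
        k = max(1, round(cumulative * n))  # number of units at or below the break
        breaks.append(values[min(k, n) - 1])
    return breaks

def classify(lsi, breaks, labels=("low", "moderate", "high", "very high")):
    """Assign the first level whose upper limit is >= lsi; otherwise the top level."""
    for upper, label in zip(breaks, labels):
        if lsi <= upper:
            return label
    return labels[-1]

# Synthetic LSI values for 100 grid cells
lsi = [i / 100 for i in range(100)]
breaks = cumulative_breaks(lsi)
print(breaks, Counter(classify(v, breaks) for v in lsi))
```

Because the class shares are fixed in advance, the resulting LSI thresholds differ between models. This is what allows the thresholds of the RF and LR maps to be compared, as in the discussion above.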
Furthermore, landslide records from 2016 to 2020 were used for additional validation. Fourteen towers were reported to have suffered varying degrees of damage from landslides during this period. Overlaying the 14 georeferenced landslides on the LSMs generated by the LR-SU and RF models (Figure 5a,b) shows that most of the new landslides fell into the high- or very-high-susceptibility regions of both LSMs (Table 7), indicating some predictive ability. In addition, 30 towers fall within very-high-susceptibility levels in all the LSMs; these towers warrant special attention in future line inspections. Tower No. 200 of the Panlong line, a key transmission tower connecting the long-span line across the Yangtze River, was situated on the Yanzi ancient landslide. In February 2016, signs of reactivation of this landslide were found (Figure 6). Tower No. 200 was found to be noticeably inclined and was immediately relocated to a new site outside the landslide. Comparative analyses indicated that the Yanzi landslide lies in the very-high-susceptibility regions of the LSMs generated by both the LR-SU model and the RF model. Field investigation indicated that the reactivation was probably caused by excavation for road construction and related human engineering activities.
In July 2020, tower No. 152 of the Panlong line was severely threatened by a small landslide in Yunyang County (Figure 7). Field investigation revealed that the landslide was triggered by slope cutting for road construction (Figure 7) and short periods of intense rainfall. Comparative analyses indicated that the landslide lay in a low-susceptibility region of the LSM from the LR-SU model, but it fell within a series of grid cells with moderate to high susceptibility levels in the LSM generated by the RF model. In practice, a landslide usually spans several cells rather than a single one. For this reason, although the RF model predicted significantly better than the LR model, an LSM in SU form may sometimes be better suited to presenting results and to practical application.
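Checking which susceptibility class a tower or a newly reported landslide falls into amounts to sampling the classified LSM at a point. Because a landslide usually covers more than one cell, it can also help to look at a small window around the point. The sketch below assumes a north-up raster stored as a list of rows of class indices (0 = low to 3 = very high), an upper-left origin, and square cells in projected coordinates; the function names and the window radius are illustrative choices, not part of the original workflow.

```python
from collections import Counter


def world_to_cell(x, y, origin_x, origin_y, cell_size):
    """(row, col) of the cell containing projected point (x, y) on a north-up
    grid whose upper-left corner is (origin_x, origin_y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col


def window_values(grid, row, col, radius):
    """Class values in a (2 * radius + 1)^2 window, clipped at the grid edges."""
    values = []
    for r in range(max(0, row - radius), min(len(grid), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(grid[0]), col + radius + 1)):
            values.append(grid[r][c])
    return values


def summarize_point(grid, x, y, origin_x, origin_y, cell_size, radius=1):
    """Class at the point plus the maximum and majority class around it."""
    row, col = world_to_cell(x, y, origin_x, origin_y, cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the grid")
    window = window_values(grid, row, col, radius)
    return {
        "center": grid[row][col],
        "max": max(window),
        "majority": Counter(window).most_common(1)[0][0],
    }


if __name__ == "__main__":
    CLASS_NAMES = ["low", "moderate", "high", "very high"]
    lsm = [[0, 1, 1], [1, 1, 2], [1, 2, 3]]  # toy classified LSM, 10 m cells
    result = summarize_point(lsm, 15.0, 15.0, origin_x=0.0, origin_y=30.0, cell_size=10.0)
    print({k: CLASS_NAMES[v] for k, v in result.items()})
```

Reporting the maximum class in the window is a conservative choice for inspection planning, while the majority class is less affected by a single noisy cell.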

Conditioning Factors
The contribution of landslide conditioning factors may vary considerably between models because the models rely on different mechanisms. Consequently, methods that allow a preliminary ranking of factor contributions have become popular, since they can spare much repetitive trial-and-error work. The factor contribution analysis showed that terrain and environmental factors play the leading role, probably because the main type of geohazard in this study is colluvial landslides, which differ from the reservoir landslides that are most prominent in, and most studied for, the TGRA [40]. This result generally agrees with rainfall-triggered shallow landslide scenarios observed elsewhere. In addition, factors may differ significantly in classification and spatial distribution between mapping units; for example, distance from rivers and slope showed high importance in SU form but much lower importance in raster form. This is probably because each slope unit is assumed to have homogeneous attributes and because the sample size is much smaller than in raster form. Thus, converting the original data into another unit form also requires specific rules and classification standards.
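Section 3.1.4 names the information gain ratio (IGR) as the feature selection method. A common textbook form ranks a classified factor by its information gain with respect to the landslide/non-landslide label, divided by the factor's own split information. The sketch below assumes that form and categorical (already classified) factor values; the exact IGR variant and the discretization of continuous factors used in this study are not stated in this section, and the factor names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain_ratio(factor_classes, labels):
    """IGR of one classified factor with respect to binary landslide labels."""
    n = len(labels)
    groups = {}
    for f, y in zip(factor_classes, labels):
        groups.setdefault(f, []).append(y)
    # Expected label entropy after splitting the samples by factor class.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(factor_classes)  # intrinsic information of the split
    return gain / split_info if split_info > 0 else 0.0


def rank_factors(factors, labels):
    """Sort factor names by IGR, highest first."""
    scores = {name: information_gain_ratio(v, labels) for name, v in factors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]
    factors = {
        "slope_class": ["steep", "steep", "gentle", "gentle", "steep", "gentle"],
        "aspect_class": ["N", "S", "N", "S", "N", "S"],
    }
    for name, score in rank_factors(factors, labels):
        print(name, round(score, 3))
```

Dividing by the split information penalizes factors with many classes, which is why IGR is preferred over plain information gain when factors are classified into different numbers of intervals.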

Scale Effects and Problem of Suitable Mapping Unit
The scale effects of the employed mapping unit, particularly its type, size, and resolution, directly affect the precision and accuracy of LSMs [20,24]. Conditions within a single unit are assumed to be homogeneous, which clearly influences the form and categorization of landslide conditioning factors and, in turn, leads to differences in factor contributions; see Table 4.
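Converting raster factors to slope units requires an aggregation rule, and that rule is where the homogeneity assumption enters. A minimal sketch, assuming each raster cell already carries the ID of the slope unit it belongs to: continuous factors are summarized by the unit mean, categorical factors by the majority class, and the within-unit standard deviation shows how much variation the homogeneity assumption hides. These rules are common choices, not necessarily the ones used in this study, and the function names are hypothetical.

```python
import statistics
from collections import Counter, defaultdict


def group_by_unit(unit_ids, values):
    """Collect cell values per slope-unit ID; cells with ID None are skipped."""
    buckets = defaultdict(list)
    for uid, value in zip(unit_ids, values):
        if uid is not None and value is not None:
            buckets[uid].append(value)
    return buckets


def aggregate_to_units(unit_ids, values, how="mean"):
    """Summarize a flattened raster factor per slope unit.

    how="mean"     -> unit mean (continuous factors such as slope angle)
    how="std"      -> population standard deviation (within-unit heterogeneity)
    how="majority" -> most frequent class (categorical factors such as lithology)
    """
    buckets = group_by_unit(unit_ids, values)
    if how == "mean":
        return {u: statistics.fmean(v) for u, v in buckets.items()}
    if how == "std":
        return {u: statistics.pstdev(v) for u, v in buckets.items()}
    if how == "majority":
        return {u: Counter(v).most_common(1)[0][0] for u, v in buckets.items()}
    raise ValueError(f"unknown aggregation rule: {how}")


if __name__ == "__main__":
    unit_ids = [1, 1, 1, 2, 2, None]  # None: cell outside any slope unit
    slope_deg = [10.0, 20.0, 30.0, 40.0, 40.0, 99.0]
    print(aggregate_to_units(unit_ids, slope_deg, "mean"))
    print(aggregate_to_units(unit_ids, slope_deg, "std"))
```

A large within-unit standard deviation flags exactly the large, heterogeneous units for which the assumed homogeneity is least reliable.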
Compared with grid cells, slope terrain units offer an operational advantage and a lower computational burden; moreover, they have a clear physical meaning and avoid the low geomorphological representativeness of grid-based susceptibility mapping, which can assign different susceptibility conditions within the same slope [43]. Slope units also make it easy to relate susceptibility levels to exposed elements (e.g., transmission towers) and to compare them, which is of interest to decision-makers such as local authorities or power grid managers. However, the assumed homogeneity inside terrain units may lead to overestimation of the susceptibility level, especially for large units.
Previous studies suggest that pixel-based mapping units are applicable to medium to large study areas, whereas slope-based mapping units are more applicable to small study areas. For large regions, such as a province or a nation, SUs are rarely used because delineating the units requires more effort. In this study, to focus on landslides along the power transmission lines, the study region was defined as a long, narrow belt that spans a wide range but covers a medium-sized area. Combined with machine learning techniques, pixel-based mapping units provided more accurate LSMs, whereas SU-based mapping units gave clearer, more readable results that better fit future applications such as risk assessment and early-warning analysis. Consequently, accounting for scale effects and finding the most suitable mapping unit and its best-fit resolution for landslide susceptibility remains a challenge for the future, especially given the different needs of different target audiences.

Model Comparison and Performance Evaluation
Compared with evaluating the predictive performance of susceptibility models, evaluating the overall quality of an LSM remains more difficult and more uncertain. In past literature, a broad set of metrics reflecting the specific advantages and limitations of different models has been employed in model evaluation [73], whereas standards for comprehensively evaluating LSM quality, such as the Susceptibility Quality Level (SQL), are not widely used.
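For reference, the confusion-matrix statistics and the AUROC mentioned above can be computed from predicted probabilities as follows. This is a minimal sketch that assumes binary labels (1 = landslide, 0 = non-landslide) and a 0.5 classification threshold; the AUROC is computed through its rank interpretation (the probability that a randomly chosen landslide sample scores higher than a randomly chosen non-landslide sample, with ties counted as one half), which equals the area under the empirical ROC curve. The pairwise loop is quadratic and only meant for small samples.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Common statistical indexes derived from the confusion matrix."""
    def ratio(a, b):
        return a / b if b else 0.0
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # also called sensitivity
        "specificity": ratio(tn, tn + fp),
    }


def auroc(y_true, y_prob):
    """AUROC via its rank interpretation; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 1, 0, 0]
    p = [0.9, 0.4, 0.6, 0.1]
    print(summary_metrics(*confusion_counts(y, p)), auroc(y, p))
```

Note that these metrics score the model on sample points; they say little about the spatial plausibility of the resulting map, which is the harder problem raised above.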
In this study, the RF model in raster form performed better than the LR model on all evaluation indices. We also tried to construct RF models on SUs; however, the model failed to give satisfactory results on the testing samples, even though almost all samples in the training set were correctly classified.
Based on our experience, the sample size in SU form was significantly smaller than in raster form. When solving such problems with a small data volume, low data complexity, and nonlinearity, many complex machine learning models are prone to overfitting, whereas the LR model is relatively unlikely to overfit. This is probably why complex machine learning models are rarely employed in SU-based landslide susceptibility studies and why most researchers have used simpler models, such as LR, to avoid such problems. Conversely, with massive data samples, machine learning methods inevitably increase the computational burden and require greater computing and storage capacity. We therefore reaffirm that the balance between predictive ability and computational effort merits attention in different application scenarios.
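With the small sample sizes typical of SU-based modeling, a single train/test split can hide overfitting. One common safeguard is stratified k-fold cross-validation with a comparison of mean training and test scores; a large gap, like the near-perfect training fit and poor test result reported for RF on SUs, points to overfitting. The sketch below only produces the fold indices and the gap; model fitting is left out, and k = 5 and the seed are arbitrary illustrative values rather than settings from this study.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets a similar share.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test


def generalization_gap(train_scores, test_scores):
    """Mean training score minus mean test score across folds."""
    return sum(train_scores) / len(train_scores) - sum(test_scores) / len(test_scores)


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 10  # e.g., 20 slope units, half with landslides
    for train, test in cv_splits(labels, k=5, seed=1):
        print(len(train), test)
```

In use, each model would be fitted on `train`, scored on both index sets, and the per-fold scores passed to `generalization_gap`.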
Apart from this, we consider that the spatial distribution of model errors and uncertainties deserves special attention, even at the level of single mapping units. The landslide near tower No. 152 (Figure 7) is a significant example of such uncertainty. The field investigation showed that this landslide was mainly caused by slope cutting during road construction and was triggered by heavy rainfall in July. To some extent, the topographic change caused by artificial slope cutting can be reflected by the distance from roads. However, rainfall is a sporadic trigger that is not considered in landslide susceptibility assessment. Moreover, compared with raster units, slope units make it intrinsically difficult to extract the attributes of some conditioning factors, such as distance to roads and curvature. This difficulty stems mainly from the assumed homogeneity within each slope unit. It also introduces uncertainties that are closely related to the evaluation scale and the size of individual SUs, because a large slope unit tends to contain more heterogeneous information. Much of the uncertainty is also related to the landslide inventories. Information about the type, size, and exact boundaries of landslides could greatly benefit the LSM, and the representation of landslides (as points or polygons) could make a big difference to the susceptibility modeling process, including the sampling strategy, feature extraction method, and validation approach [33].
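The effect of landslide representation on sampling can be made concrete: a point inventory contributes one positive cell per landslide, whereas a polygon inventory contributes every cell whose center lies inside the boundary. The sketch below assumes a north-up grid with an upper-left origin and square cells, and uses even-odd ray casting for the point-in-polygon test; it illustrates the sampling difference only and is not the sampling strategy of this study. Function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cells_from_polygon(polygon, n_rows, n_cols, origin_x, origin_y, cell_size):
    """Positive cells for a polygon landslide: cells whose centers fall inside."""
    cells = []
    for r in range(n_rows):
        for c in range(n_cols):
            cx = origin_x + (c + 0.5) * cell_size
            cy = origin_y - (r + 0.5) * cell_size
            if point_in_polygon(cx, cy, polygon):
                cells.append((r, c))
    return cells


def cell_from_point(x, y, origin_x, origin_y, cell_size):
    """Single positive cell for a point landslide (e.g., its centroid)."""
    return int((origin_y - y) // cell_size), int((x - origin_x) // cell_size)


if __name__ == "__main__":
    boundary = [(10, 10), (30, 10), (30, 30), (10, 30)]  # toy landslide polygon
    print(cells_from_polygon(boundary, 4, 4, 0.0, 40.0, 10.0))
    print(cell_from_point(20.0, 20.0, 0.0, 40.0, 10.0))
```

For the toy polygon, the polygon representation yields four positive cells while its centroid yields one, so the balance of positive and negative samples, and hence the trained model, depends on this choice.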

Challenge and Future Directions
Although LSMs provide indicative information on landslide occurrence, geographical landslide early warning systems (LEWSs) have proven to be more effective tools for hazard prevention in practice, especially for a giant, complex power transmission system operating under high load. However, LEWSs are complex systems that involve design, implementation, operation, management, and verification [76], and current LEWSs usually combine landslide susceptibility assessment with the analysis of rainfall thresholds and forecasts [77]. For power transmission networks, which are seriously affected by geohazards, further application of LEWSs deserves more attention.
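Rainfall thresholds in LEWSs are often expressed as an intensity-duration (ID) power law, I = αD^(-β), where I is the mean rainfall intensity and D is the rainfall duration. The sketch below shows one way such a threshold might be combined with the four susceptibility levels to raise tower-level warnings. The values of α and β, the warning matrix, and the tower names are hypothetical placeholders, not values calibrated for the study area or used by the CEPRI system.

```python
CLASS_NAMES = ["low", "moderate", "high", "very high"]

# Hypothetical warning matrix: a warning is issued only when the rainfall
# threshold is exceeded, graded by the susceptibility class of the tower site.
WARNING_IF_EXCEEDED = {
    "low": "watch",
    "moderate": "watch",
    "high": "warning",
    "very high": "alert",
}


def id_threshold(duration_h, alpha, beta):
    """Critical mean intensity (mm/h) for a duration: I = alpha * D**(-beta)."""
    if duration_h <= 0:
        raise ValueError("duration must be positive")
    return alpha * duration_h ** (-beta)


def tower_warning(susceptibility_class, mean_intensity_mm_h, duration_h, alpha, beta):
    """Return 'none' below the threshold, else a level graded by susceptibility."""
    if mean_intensity_mm_h < id_threshold(duration_h, alpha, beta):
        return "none"
    return WARNING_IF_EXCEEDED[CLASS_NAMES[susceptibility_class]]


if __name__ == "__main__":
    # Illustrative parameters only: alpha = 10, beta = 0.4 are not calibrated.
    towers = {"tower A": 3, "tower B": 1}  # hypothetical towers and class indices
    for name, cls in towers.items():
        print(name, tower_warning(cls, 12.0, 1.0, alpha=10.0, beta=0.4))
```

In an operational system, the threshold parameters would be calibrated from local rainfall and landslide records, and forecast rainfall could be substituted for observed rainfall to gain lead time.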
Extreme climate and weather events also often have great impacts on infrastructure such as transport networks and power transmission systems [78]; examples include the widespread power failure in Texas, USA, in 2021 and the extreme rainstorm in Henan Province in July 2021. However, previous literature has noted a general lack of studies on extreme events and their influence on geohazards [79,80]. From the perspective of decision-makers, how to enhance the resistance of key facilities to such extreme events and the resulting geohazards remains an open question. Three technologies, namely the wide application of LEWSs, rapid hazard identification, and rapid safety protection for power transmission facilities, are highly recommended for application and require further development in the future.

Conclusions
Using different data-driven methods and two different mapping units, we presented several landslide susceptibility assessment results for an area along power transmission lines. The LSM results were validated and compared, supporting the following conclusions. In this study area, environmental factors, such as altitude, tend to contribute more to landslide occurrence. In raster form, the RF model performs better than LR, with training and validation accuracies of 0.927 and 0.915, respectively. However, the RF model fails to produce a reasonable LSM in SU form, probably owing to insufficient training samples. In contrast, the LR model in SU form performed better than its raster counterpart, with AUC values of 0.882 and 0.879 for the training and validation samples, respectively. In general, LSMs generated by machine learning methods could be a valuable tool for hazard prevention along power transmission lines.

Figure 2. The flowchart of the landslide susceptibility assessment.

Figure 3. The ROC curves of the RF and LR models in landslide susceptibility assessment: (a) training and (b) testing.

Figure 4. Landslide susceptibility maps obtained with different models and mapping units: (a) RF model in raster units; (b) LR model in raster units; (c) LR model in SUs.

Figure 5. Distribution of recent landslides from 2016-2021 in the generated landslide susceptibility maps, with two representative landslides shown in detail: (a) LSM by the RF model in raster units; (b) LSM by the LR model in SUs.

Figure 6. The Yanzi ancient landslide in Badong County: (a) general view of the Yanzi landslide, with yellow rectangles indicating the positions of the tower before and after relocation; (b-d) crack L1 on the foundation platform of the transmission tower; (e) cracks in one leg of the transmission tower.

Figure 7. Landslide that occurred in Yunyang County in July 2020: (a) general view of the landslide, which posed a threat to tower No. 152; (b) aerial image of the landslide, with the landslide boundary marked by a yellow line.

Author Contributions: S.L. organized the data and wrote the paper; W.L. and X.L. analyzed and processed the data; K.Y. and L.G. supervised and reviewed the work; C.Z. reviewed and edited the manuscript; B.Z. provided the necessary analytical data. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, grant number 41907253, and the National Key R&D Program of China, grant number 2018YFC0809400.

Table 1. Data and data sources.

Table 3. General characteristics of the mapping units.
3.1.4. Feature Selection Methods
Information gain ratio (IGR) is the most efficient and widely used feature selection method in susceptibility modeling

Table 4. Multicollinearity analysis and factor contribution analysis results for each landslide conditioning factor in raster and SU form.

Table 5. Evaluation metrics for the performance of the different models.

Table 6. Accuracy statistics of the generated LSMs.

Table 7. Susceptibility levels of the recent landslides from 2016 to 2020.