A Novel Historical Landslide Detection Approach Based on LiDAR and Lightweight Attention U-Net

Rapid and accurate identification of landslides is an essential part of landslide hazard assessment and is particularly useful for land use planning, disaster prevention, and risk control. Recent alternatives to manual landslide mapping are moving towards artificial intelligence-aided recognition of these surface processes. However, so far, technological advances have not produced robust automated mapping tools that remain valid in any area across the globe. For instance, capturing historical landslides in densely vegetated areas is still a challenge. This study proposes a deep learning method based on Light Detection and Ranging (LiDAR) data for the automatic identification of historical landslides and tests it in the earthquake-hit Jiuzhaigou region of Sichuan Province (China). Specifically, we generated a Red Relief Image Map (RRIM) from high-precision airborne LiDAR data and, on this basis, trained a Lightweight Attention U-Net (LAU-Net) to map a total of 1949 historical landslides. Overall, our model recognized these landslides with high accuracy and relatively low computational cost. We compared multiple performance indexes across several deep learning routines and different data types. The results showed that the mean Intersection over Union (MIOU) and the F1 score of LAU-Net with RRIM reached 82.29% and 87.45%, respectively, the best performance among the methods we tested.


Introduction
Landslides pose severe threats to human lives, activities, and infrastructure in mountainous terrains [1][2][3]. Understanding their genesis and dynamics is fundamental to reducing landslide risk [4,5]. In this context, records and monitoring data help provide insight into how landslides behave. The way the geoscientific community gathers landslide records fundamentally comes down to landslide identification routines. These were originally based on field surveys and expert recognition of landslides in orthophotos or satellite scenes [6]. However, such practices suffer from a high degree of subjectivity and require a significant amount of time and resources [7]. More recent technological advances have therefore promoted automated landslide recognition, which standardizes the procedure towards objective results and speeds it up significantly [8].
Among the automated landslide mapping procedures proposed so far, both the choice of input data and the algorithmic architecture impose limitations [9][10][11][12]. For instance, optical images provide a clear and interpretable overview of an area unless it is obscured by cloud and/or dense vegetation cover [13][14][15][16]. Radar images are less sensitive to these issues, but the signal they record mainly reflects surface deformation [17,18]. Therefore, these images may not be suitable for mapping historical landslides.
The study site we selected is located in Jiuzhaigou, Aba Tibetan Autonomous Prefecture, Sichuan Province, and covers an area of approximately 356.73 km² (Figure 1).
The study area lies almost entirely within Jiuzhaigou National Forest Park, which has a subtropical monsoon climate with an average annual rainfall of about 500-600 mm [45]. The altitude ranges from 1892 to 4359 m above mean sea level, and the underlying lithology consists mostly of bioclastic limestone and calcareous dolomite, with deep canyon landforms in places [46].
Historically, a number of geological disasters have occurred within the area. The Jiuzhaigou earthquake of 8 August 2017 is the most recent [47], and several earlier earthquakes have been reported in the literature. However, no comprehensive historical landslide inventory has been compiled so far, and public information on past slope failures derives mainly from simulations [48]. One reason is the vegetation, which covers approximately 79.4% of the study area [49].

Data Preparation
The point cloud was acquired by airborne LiDAR (provided by the Sichuan Bureau of Surveying and Mapping (SBSM), with a density of 30 points/m²) over the stretch from Wuhuahai to Rizegou, an area hit by the 2017 earthquake. Because vegetation and building returns had already been removed from the point cloud, we generated the DEM of this region directly in ArcGIS Pro. First, a LAS dataset was created using the data management tools. The LAS dataset was then converted to a 1 m resolution DEM, using natural neighbor interpolation with the sampling value set to 1. We also obtained UAV optical images (0.2 m × 0.2 m resolution) covering the same extent from SBSM (Figure 1a). These images provide an optical view of the geological setting of the area.
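The DEM step was carried out in ArcGIS Pro with natural neighbor interpolation. As a rough illustration of what rasterizing a ground-only point cloud at 1 m involves, the NumPy sketch below uses simple per-cell averaging instead. This is a deliberately simpler stand-in, not natural neighbor interpolation: empty cells are left as NaN, where a real interpolator would fill them. The function name `grid_ground_points` is hypothetical.

```python
import numpy as np

def grid_ground_points(x, y, z, cell=1.0):
    """Rasterize ground points into a DEM by averaging elevations per cell.

    Simplified stand-in for natural neighbor interpolation (assumption):
    cells without points stay NaN. Row 0 is the northernmost row (largest y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    x0, y1 = x.min(), y.max()
    ncols = int(np.floor((x.max() - x0) / cell)) + 1
    nrows = int(np.floor((y1 - y.min()) / cell)) + 1
    cols = np.floor((x - x0) / cell).astype(int)
    rows = np.floor((y1 - y) / cell).astype(int)
    total = np.zeros((nrows, ncols))
    count = np.zeros((nrows, ncols))
    np.add.at(total, (rows, cols), z)  # sum of elevations falling in each cell
    np.add.at(count, (rows, cols), 1)  # number of points in each cell
    dem = np.full((nrows, ncols), np.nan)
    filled = count > 0
    dem[filled] = total[filled] / count[filled]
    return dem
```

Per-cell averaging keeps the example short; at 30 points/m² most 1 m cells receive several ground returns, but gaps under dense canopy are exactly where a proper interpolator matters.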

Historical Landslides Data
Any data-driven model requires a dataset to build a suitable classifier. We therefore interpreted LiDAR-based terrain images and mapped historical landslides to train and validate our deep learning model. Notably, when mapping landslides from LiDAR data, shallow and small landslides may be hard to capture because of shadow effects [50]. To deal with this issue, we adopted the Red Relief Image Map (RRIM), a two-dimensional visualization of 3D terrain data proposed by Chiba [51,52]. Specifically, we first used the Topographic Openness tool in SAGA GIS to extract the positive and negative openness layers from the DEM [28]. From these layers we then computed the ridge-valley index $I$ as follows [28]:

$$I = \frac{O_P - O_N}{2}$$

where $O_P$ is the positive openness of the terrain and $O_N$ is the negative openness. By combining slope steepness and the ridge-valley index, we produced the RRIM layer [28]. This sequence is summarized graphically in Figure 2. RRIM is particularly effective at representing ambient lighting, which supports accurate interpretation. Figure 3 shows a visual comparison of the historical landslides we mapped using the LiDAR source data, hillshade, optical image, and RRIM. In the RRIM derived from the LiDAR DEM, the landslides exhibit distinct morphological features (e.g., semicircular niches, pressure ridges, depressions, and hummocky relief in the deposits), from which they can be clearly recognized (Figure 4). By visually inspecting the RRIM layer, we interpreted a total of 1949 historical landslides within the study site, covering a total area of 20.24 km². The smallest landslide is 502 m², and the largest is 0.67 km². We stress that most of them are covered by vegetation and could hardly be recognized from optical images alone. This is where the RRIM brings added value in mapping historical landslides.
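The ridge-valley index I = (O_P − O_N)/2 and a simplified RRIM-style composite can be sketched with NumPy. This assumes the positive and negative openness rasters have already been computed (for instance with the SAGA GIS Topographic Openness tool). The color mapping in the hypothetical `rrim_like_rgb` (lightness from the ridge-valley index, red saturation from slope) follows the general RRIM idea, but its scaling limits `max_slope` and `index_range` are illustrative assumptions, not Chiba's published parameters.

```python
import numpy as np

def ridge_valley_index(pos_openness, neg_openness):
    """Ridge-valley index I = (O_P - O_N) / 2: positive on ridges, negative in valleys."""
    return (np.asarray(pos_openness, float) - np.asarray(neg_openness, float)) / 2.0

def slope_degrees(dem, cell=1.0):
    """Slope angle in degrees from central-difference DEM gradients."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, float), cell)
    return np.degrees(np.arctan(np.hypot(dz_drow, dz_dcol)))

def rrim_like_rgb(slope_deg, index, max_slope=45.0, index_range=(-10.0, 10.0)):
    """Simplified RRIM-style composite: steeper -> redder, ridges -> brighter.

    max_slope and index_range are illustrative scaling limits (assumptions).
    Returns an array of shape (rows, cols, 3) with values in [0, 1].
    """
    lo, hi = index_range
    lightness = np.clip((np.asarray(index, float) - lo) / (hi - lo), 0.0, 1.0)
    redness = np.clip(np.asarray(slope_deg, float) / max_slope, 0.0, 1.0)
    rgb = np.empty(lightness.shape + (3,))
    rgb[..., 0] = lightness                    # red keeps the full lightness
    rgb[..., 1] = lightness * (1.0 - redness)  # green and blue fade on steep slopes
    rgb[..., 2] = lightness * (1.0 - redness)
    return rgb
```

Typical use would be `rrim_like_rgb(slope_degrees(dem), ridge_valley_index(op, on))`, where `op` and `on` are the openness rasters on the DEM grid. Neither input depends on a light direction, which is the property that distinguishes this kind of layer from a hillshade.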
Our research question is whether this added value can be carried into a deep learning architecture, thereby automating the mapping procedure. Any supervised deep learning model requires two sets of data. One is used for calibration (training), i.e., estimating the functional relations responsible for landslide mapping, and the other for validation, i.e., testing the model performance and its capacity to generalize the classification to unseen data. To avoid mutual interference between the training and test sets and to ensure the diversity of information in both, we screened the 1949 historical landslides. In the end, 1364 landslide samples were used as training data, augmented by rotation and mirroring, and 585 landslide samples were used as validation data for our deep learning model. The two sets are mapped in Figure 4. To create the image data, we cropped 512 × 512 pixel images centered on the geometric center of each landslide. This preserves the integrity of the landslide accumulation area and avoids mutual interference between training and test samples.
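The sample preparation described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: `mask_centroid`, `center_crop`, and `dihedral_augment` are hypothetical helpers, the landslide center is taken as the mean pixel position of a rasterized landslide mask, windows near the raster edge are zero-padded, and rotations are assumed to be multiples of 90 degrees (the text does not state the angles).

```python
import numpy as np

def mask_centroid(mask):
    """Geometric center (row, col) of a rasterized landslide mask."""
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

def center_crop(image, center_rc, size=512):
    """Crop a size x size window centered on (row, col), zero-padding at raster edges."""
    half = size // 2
    r, c = int(round(center_rc[0])), int(round(center_rc[1]))
    pad = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad)  # constant (zero) padding keeps edge windows full size
    # After padding, original pixel (r, c) sits at (r + half, c + half),
    # so this slice is centered on it.
    return padded[r:r + size, c:c + size]

def dihedral_augment(patch):
    """Return the 8 variants of a patch: 4 rotations by 90 degrees, each with its mirror."""
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k, axes=(0, 1))
        variants.append(rotated)
        variants.append(np.flip(rotated, axis=1))
    return variants
```

Augmentation would be applied to the 1364 training patches only (and identically to their label masks), so the 585 validation samples stay untouched.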

Detection Approach
Deep learning is a technique that is gaining momentum within the geoscientific community [53,54]. Unlike other artificial intelligence approaches, deep learning interprets images in a manner similar to how a human would. The available information is exploited to build a binary classifier on a gridded structure onto which terrain and spectral data are projected. In addition, a deep learning routine can solve complex problems by including the information from the neighbors of any grid cell under examination [55,56]. Several deep learning routines are currently available and under continuous development. In this paper, we opted for a Fully Convolutional Network (FCN), currently one of the most widely used deep learning models in image processing.
In this study, we designed an FCN to analyze LiDAR-derived information and ultimately identify historical landslides. The overall approach can be summarized in three steps: (1) DEM extraction from LiDAR data; (2) manual labeling of historical landslides on LiDAR topography; (3) deep learning-based classification. The detailed flow chart of this method is shown in Figure 5.

U-Net
Image recognition can be divided into three stages: image classification, target detection, and target segmentation. Image classification determines whether a given object is contained in an image. Target detection then locates the target, and target segmentation delineates the object boundary, which is the ultimate goal of the image recognition process. Among target segmentation tools, the U-Net family is one of the most popular FCN designs [57]. U-Net is an end-to-end image segmentation method that makes pixel-level predictions and directly outputs the label map. It is widely applied to medical image segmentation tasks [58] and has the advantages of a relatively simple architecture, fast training, mechanisms that reduce overfitting, and suitability even for small datasets. Like medical datasets, historical landslide datasets are difficult to acquire, contain few samples, and vary widely in shape and size. Therefore, in this work we use a U-Net architecture for this task, specifically for classifying, detecting, and segmenting the landslide polygons described in Section 3.2.

Attention Gate
Attention Gate is an approach proposed by Oktay et al. in 2018 [59]. As a deep network goes deeper into the data, two levels of information are extracted. Shallower levels capture the broader spatial characteristics of an image but carry less feature information (i.e., fewer landslide specifics). At deeper levels the opposite occurs: the deeper the level, the more the learning process focuses on the object of interest. Deep levels therefore carry richer feature-specific information but less spatial information. The additive attention coefficient formula [59] used in this paper is:

$$q_{att}^{l} = \phi^{T}\left(\sigma_1\left(W_x^{T} x_i^{l} + W_g^{T} g_i + b_g\right)\right) + b_{\phi}$$

$$\alpha_i^{l} = \sigma_2\left(q_{att}^{l}\left(x_i^{l}, g_i\right)\right)$$

where $q_{att}^{l}$ is the additive attention score and $\alpha_i^{l}$ the resulting attention coefficient at level $l$, $\sigma_1$ is the ReLU function, $\sigma_2$ is the sigmoid function, $b_g$ and $b_\phi$ are convolution bias terms, $x_i^{l}$ is a pixel vector, $g_i$ is a gating vector, and $W_x^{T}$, $W_g^{T}$, and $\phi^{T}$ are convolution kernels. The role of an Attention Gate is to balance these two levels of information and combine the highly detailed information on landslide characteristics in a way that generalizes well. This is achieved by a learning mechanism that de-emphasizes background information while emphasizing foreground information in a given image. Emphasis, or attention, translates into assigning higher weights to specific areas of an image (where landslides are mapped) and reducing the activation of the background to optimize the segmentation. This is why the approach is attracting growing attention within the geoscientific community: landslide inventories are extremely sparse by nature. In other words, areas covered by landslides are much smaller in number and extent than areas where landslides have not occurred [60].
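The additive attention gate of Oktay et al. computes a score q_att = φᵀ(ReLU(W_xᵀ x + W_gᵀ g + b_g)) + b_φ per pixel and turns it into a coefficient α = sigmoid(q_att) that rescales the skip features. The NumPy sketch below writes this out for a single decoder level. The 1 × 1 convolutions are expressed as matrix products over the channel axis, and the gating signal `g` is assumed to have already been resampled to the spatial grid of `x`; the weight shapes and the name `additive_attention_gate` are illustrative assumptions, and in a TensorFlow model these products would be learned 1 × 1 convolution layers.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def additive_attention_gate(x, g, w_x, w_g, b_g, phi, b_phi):
    """Additive attention gate on channel-last feature maps.

    x: skip-connection features, shape (H, W, Cx)
    g: gating features, shape (H, W, Cg), assumed already on x's grid
    w_x: (Cx, Ci), w_g: (Cg, Ci), b_g: (Ci,)  -> 1x1 convolutions as matmuls
    phi: (Ci,), b_phi: scalar                  -> 1x1 convolution to one channel
    Returns the gated features x * alpha and the coefficients alpha (H, W).
    """
    hidden = relu(x @ w_x + g @ w_g + b_g)  # sigma_1 = ReLU on the summed projections
    q_att = hidden @ phi + b_phi            # one attention score per pixel
    alpha = sigmoid(q_att)                  # sigma_2 = sigmoid, values in (0, 1)
    return x * alpha[..., None], alpha
```

Because α multiplies every channel of a pixel by the same factor, background pixels with low α are suppressed before the skip features reach the decoder, which is the mechanism the text relies on for sparse landslide targets.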

Lightweight Attention U-Net
We designed a Lightweight Attention U-Net (LAU-Net) for historical landslide identification by combining the U-Net and Attention Gate architectures. The network structure of LAU-Net is shown in Figure 6. This FCN model is mainly composed of an encoder, a bottleneck, a decoder, and skip connections. During the encoding phase, input images are first converted into an internal encoding and then projected to 32 dimensions (feature channels) through the convolution and pooling layers. Through this multi-stage encoding, the spatial information carried by a given image is progressively decomposed and compressed into a lower-dimensional representation, whose minimum size is commonly referred to as the bottleneck. After the essential information has reached the bottleneck, a symmetrical decoder, as in the traditional U-Net, restores the image to its original dimensions. The two processes described above are combined through skip connections, which bring the multi-scale structure of the information back into the network. Each block is described below.
Encoder: First, the two-channel images of 512 × 512 pixels are passed through the input layer, which leaves their feature dimension and resolution unchanged [61]. A convolution layer (Conv) contains several feature planes, and neurons in the same feature plane share weights. We set the convolution kernel to 3 × 3, used ReLU as the activation function, and added Batch Normalization (BN) to the convolution layers. Every two convolution layers are followed by a 2 × 2 max pooling layer for downsampling. This design reduces the number of connections between network layers, simplifies the model, and reduces the risk of overfitting.

Bottleneck: In the bottleneck, two successive convolution layers learn high-dimensional features, while the feature dimension (256) and resolution (64 × 64) of the data remain unchanged.
Decoder: As a decoder symmetric to the convolution-based encoder, we again used 3 × 3 convolution layers, and every two convolution layers are followed by a 2 × 2 upsampling layer. Each upsampling layer halves the feature dimension of the data and doubles the resolution of the feature map. After several decoding stages, a final 1 × 1 convolution layer converts the 32-channel feature vectors into the required classification results.
Skip connection: Unlike U-Net, which uses copy connections to simply concatenate shallow features with deep ones, Attention U-Net passes the multi-scale features extracted by the encoder, together with the upsampled features, through the Attention Gate before feeding them into the decoder. The Attention Gate adjusts the importance of features in landslide areas, improves landslide segmentation, and speeds up decoding.
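The resolution and channel bookkeeping implied by the block descriptions can be checked with a few lines of Python. The sketch assumes three pooling stages and that channels double at each encoder stage and halve at each decoder stage; these rules are inferred from the stated 512 × 512 input, the 32-channel stem, and the 256-channel 64 × 64 bottleneck, and the function name is hypothetical.

```python
def trace_lau_net_shapes(input_size=512, base_channels=32, depth=3):
    """Trace (resolution, channels) through an assumed U-Net-style encoder/decoder."""
    encoder = []
    size, ch = input_size, base_channels
    for _ in range(depth):
        encoder.append((size, ch))    # two 3x3 convs keep the resolution
        size, ch = size // 2, ch * 2  # 2x2 max pooling halves it; next stage doubles channels
    bottleneck = (size, ch)           # two 3x3 convs, shape unchanged
    decoder = []
    for _ in range(depth):
        size, ch = size * 2, ch // 2  # 2x2 upsampling doubles resolution, halves channels
        decoder.append((size, ch))    # attention-gated skip features join at this scale
    return encoder, bottleneck, decoder
```

Under these assumptions the decoder ends at 512 × 512 with 32 channels, which matches the final 1 × 1 convolution operating on 32-channel feature vectors.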

Optimization
Even with the LAU-Net network described above, the model could still suffer from issues related to the number and extent of landslides in an image. This type of deep learning classifier is usually applied where the objects of interest occupy a significant portion of a given image. For instance, in the original Attention U-Net paper by Oktay et al. (2018), the target to be mapped was the pancreas in abdominal CT scans. In landslide mapping, however, landslides usually occupy a mere fraction of an image; across our whole study site, for example, the mapped landslides cover 20.24 km² out of approximately 356.73 km², i.e., less than 6%. In other words, the background is much larger than the foreground one wishes to identify. As a result, the classification may still yield undesirably low True Positive Rates (the proportion of correctly identified landslides over the total number of landslides). To address this issue, we adopted the optimization step introduced by [62], in which a generalized loss function, the Tversky loss, is minimized as LAU-Net trains over the epochs. The Tversky loss function is expressed as follows:

$$L_{T} = 1 - \frac{\sum_{i} p_{0i}\, g_{0i}}{\sum_{i} p_{0i}\, g_{0i} + \alpha \sum_{i} p_{0i}\, g_{1i} + \beta \sum_{i} p_{1i}\, g_{0i}}$$

where $p_{0i}$ is the probability of pixel $i$ being a landslide and $p_{1i}$ is the probability of pixel $i$ being a non-landslide. Additionally, $g_{0i}$ is 1 for a landslide pixel and 0 for a non-landslide pixel, and vice versa for $g_{1i}$; $\alpha$ and $\beta$ weight the penalties for false positives and false negatives, respectively. Finally, the loss function is minimized using the Adam optimizer proposed by [63].
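The Tversky loss, 1 − Σp₀g₀ / (Σp₀g₀ + α Σp₀g₁ + β Σp₁g₀), can be written directly in NumPy as sketched below. The function name and the defaults α = β = 0.5 are assumptions, since the text does not report the α and β values used. With α = β = 0.5 the expression reduces to the Dice loss, and raising β relative to α penalizes missed landslide pixels (false negatives) more heavily, which is the lever for the low True Positive Rate problem. A small `eps` term guards against division by zero.

```python
import numpy as np

def tversky_loss(p_landslide, g_landslide, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky loss for binary landslide segmentation.

    p_landslide: predicted landslide probabilities p_0i (any shape)
    g_landslide: reference labels g_0i (1 = landslide, 0 = background)
    alpha weights false positives, beta weights false negatives.
    """
    p0 = np.asarray(p_landslide, float).ravel()
    g0 = np.asarray(g_landslide, float).ravel()
    p1, g1 = 1.0 - p0, 1.0 - g0  # background probability and background label
    tp = np.sum(p0 * g0)         # soft true positives
    fp = np.sum(p0 * g1)         # landslide probability assigned to background
    fn = np.sum(p1 * g0)         # background probability assigned to landslides
    tversky_index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky_index
```

For training, the same expression would be written with the framework's tensor operations so that gradients flow back to the network outputs; the NumPy version only shows the arithmetic.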

Experiment
The source code for all of the algorithms mentioned above was implemented using the open-source library TensorFlow, which offers a wide range of deep learning routines. To test our LAU-Net architecture, we benchmarked its landslide identification performance against other well-known image segmentation methods: ResU-Net, R2U-Net, DeepLabv3, SwinU-Net, and U-Net++ [64][65][66][67][68]. These models adopt the same parameter settings as LAU-Net. All these binary classifiers, including our LAU-Net, were run on a machine with the following characteristics:
For repeatability and reproducibility, we also list the hyperparameters we opted for: (1) the learning rate is 1 × 10⁻⁵ and decays by a factor of 0.7 if the model falls into a local optimum; (2) the batch size is 16; (3) the maximum number of epochs is 150.
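The learning-rate rule can be sketched as a small plateau scheduler in plain Python. Only the initial rate (1 × 10⁻⁵) and the decay factor (0.7) come from the text; reading "falls into a local optimum" as "the validation loss has not improved for `patience` epochs" is an interpretation, and `patience` and the class name `PlateauDecay` are hypothetical. Keras offers comparable behavior through `tf.keras.callbacks.ReduceLROnPlateau` with its `factor` and `patience` arguments; the hand-written class only makes the logic explicit.

```python
class PlateauDecay:
    """Multiply the learning rate by `factor` when the monitored loss stops improving.

    patience and min_delta are hypothetical settings, not values from the text.
    """

    def __init__(self, lr=1e-5, factor=0.7, patience=5, min_delta=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        """Update after one epoch and return the learning rate for the next one."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: remember it and reset the counter
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:  # stalled long enough: decay the rate
                self.lr *= self.factor
                self.wait = 0
        return self.lr
```

In a training loop, the returned rate would be passed to the Adam optimizer before the next epoch, for at most 150 epochs with batches of 16.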

Model Validation and Comparison
All the models mentioned above were trained and compared using datasets generated by the RRIM method. In this experiment, we use accuracy, F1 score (%), and MIOU as evaluation metrics, which are widely used together as a comprehensive evaluation system for image segmentation problems. For each model, we stored the test results corresponding to the minimum of the loss function, as well as the MIOU and the required computational time. An overview is presented in Table 1. As shown in Table 1, adding the Attention Gate to the Lightweight U-Net (LU-Net) network produces the lowest loss (a decrease of 0.30%) and better generalization on the validation set (accuracy improved by 0.76%, MIOU by 0.84%, and F1 by 0.95%). The Attention Gate also improves the decoding ability of the model in the deconvolution stage, as reflected in the computational times: LAU-Net takes 22 ms less per epoch than LU-Net. Compared with the other methods, several of which use deeper network structures and more parameters, LAU-Net is not inferior: it achieves the best generalization ability at a much smaller computational cost. These summary metrics support LAU-Net as the most suitable of the tested FCN models for historical landslide identification.

Table 1. Results of different deep learning models.
Loss (%) | Accuracy (%) | F1 (%) | MIOU (%) | Computational Time (s/epoch)
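The three evaluation metrics can be computed from a predicted and a reference binary mask as sketched below. The sketch assumes the standard definitions: F1 is computed for the landslide class, and MIOU averages the intersection over union of the landslide and background classes. The function name is hypothetical, and whether the authors averaged per image or over the whole validation set is not specified in the text.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Accuracy, F1 (landslide class), and MIOU (mean IoU of landslide and background)."""
    pred = np.asarray(pred, bool).ravel()
    truth = np.asarray(truth, bool).ravel()
    tp = np.sum(pred & truth)    # landslide pixels found
    tn = np.sum(~pred & ~truth)  # background pixels kept as background
    fp = np.sum(pred & ~truth)   # false alarms
    fn = np.sum(~pred & truth)   # missed landslide pixels
    accuracy = (tp + tn) / pred.size
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    iou_landslide = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    miou = (iou_landslide + iou_background) / 2
    return float(accuracy), float(f1), float(miou)
```

Because background dominates the images, accuracy alone can stay high even when landslides are missed; F1 and the landslide IoU term inside MIOU are the ones that expose such misses.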
Beyond the specific metrics, an automated procedure may still misclassify some small geomorphic units as landslides during prediction. The smallest geomorphic unit the method can detect is 190 m², but such small units are generally produced by processes other than landslides (e.g., mountain flood residues, human engineering activities, etc.). The smallest landslide in our manually mapped inventory is 544 m². Therefore, when converting the predicted landslide labels into polygons, we filtered out geomorphic units with an area of less than 500 m². Figure 7 shows the resulting segmentation by our LAU-Net. Blue labels mark the historical landslides in the training set, yellow labels mark those in the validation set, and red polygons indicate the landslide boundaries generated by our LAU-Net. The figure shows a convincing segmentation on both the training and validation sets, with the landslides in these remote sensing images identified accurately. We stress here that a LiDAR survey generates extremely finely resolved images, so mosaicking them into a single image would produce an object too large to load on most computers. We therefore kept each image separate. In turn, historical landslides at the edge of an original remote sensing image are inevitably divided across several patches, so part of their features is lost. Nevertheless, the model can still identify them effectively.
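The area filter can be approximated on the raster prediction before polygon conversion, as in the pure-Python sketch below. It labels 4-connected landslide regions with a breadth-first flood fill and drops those smaller than 500 m². Treating each pixel as 1 m² assumes the predictions are on the 1 m DEM grid, and 4-connectivity and the name `remove_small_regions` are assumptions rather than details from the text; in a GIS workflow the same filter would be applied to polygon areas after vectorization.

```python
from collections import deque

import numpy as np

def remove_small_regions(mask, min_area_m2=500.0, pixel_area_m2=1.0):
    """Drop 4-connected landslide regions smaller than min_area_m2 (raster stand-in)."""
    mask = np.asarray(mask, bool)
    keep = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    rows, cols = mask.shape
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first flood fill collects one connected region
            region = []
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            # Keep the region only if its area reaches the threshold
            if len(region) * pixel_area_m2 >= min_area_m2:
                for r, c in region:
                    keep[r, c] = True
    return keep
```

The 500 m² threshold sits between the 190 m² detection floor and the smallest manually mapped landslide, so it removes spurious small units without discarding any landslide of inventory size.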
Remote Sens. 2022, 14, x FOR PEER REVIEW

Identification Effect Analysis of Different Data Types
To corroborate our modeling protocol, we added another element of comparison. The literature reports a number of applications where landslide identification relies on other sources of information, such as optical images, the raw DEM itself, and shaded relief. Therefore, having established the performance of our LAU-Net, the remaining element to evaluate was the type of input image. In addition to the RRIM image (two channels), we tested an optical image captured during a UAV survey (three channels), the raw LiDAR DEM (two channels), and its hillshade derivative (one channel). The results are shown graphically in Figure 8 and summarized numerically in Table 2. Of all the data sources, the RRIM produced the best results, being roughly 7% more accurate than the model using the shaded relief, 13% more than the DEM, and about 17% more than the model using the UAV optical images. Figure 9 shows that, except for the RRIM data, all data sources have problems in their identification results. With the hillshade data, the detected landslide extents are noticeably smaller and contain holes (cavities). With the DEM data, small landslides are easily missed. Finally, the UAV data miss many targets and perform worst in several categories.
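For reference, the hillshade input compared above is derived from the DEM with a directional light source. The NumPy sketch below computes a standard Lambertian hillshade as the dot product between the unit surface normal and a unit sun vector. The azimuth (315°) and altitude (45°) defaults are common GIS conventions, not values reported in the text, and the function name is hypothetical. Because the result depends on a single illumination direction, slopes facing away from the light are rendered dark while identical slopes facing it are bright, a directional bias that RRIM, built from openness and slope, is designed to avoid. The sketch models only this self-shading, not shadows cast by distant peaks.

```python
import numpy as np

def hillshade(dem, cell=1.0, azimuth_deg=315.0, altitude_deg=45.0):
    """Lambertian hillshade in [0, 1] for a north-up DEM (row 0 = north).

    azimuth is measured clockwise from north; defaults are common conventions.
    """
    dem = np.asarray(dem, float)
    dz_drow, dz_dcol = np.gradient(dem, cell)
    dz_dx = dz_dcol   # gradient toward east
    dz_dy = -dz_drow  # rows increase southward, so flip sign for north
    # Unit surface normal in (east, north, up) coordinates
    norm = np.sqrt(dz_dx**2 + dz_dy**2 + 1.0)
    nx, ny, nz = -dz_dx / norm, -dz_dy / norm, 1.0 / norm
    az, alt = np.radians(azimuth_deg), np.radians(altitude_deg)
    # Unit vector pointing from the ground toward the sun
    lx, ly, lz = np.sin(az) * np.cos(alt), np.cos(az) * np.cos(alt), np.sin(alt)
    # Faces turned away from the sun are clipped to 0 (self-shadow only)
    return np.clip(nx * lx + ny * ly + nz * lz, 0.0, 1.0)
```

With the default northwest sun, a west-facing slope comes out brighter than flat ground and an east-facing slope of the same steepness comes out darker, even though both have identical morphology.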

Scale Parameter Analysis of Attention U-Net
In this paper, we designed LAU-Net specifically for the image recognition task of historical landslides. Compared with the traditional Attention U-Net (TAU-Net), which has nearly 8 million parameters, LAU-Net produces satisfactory results with one fourth of the parameters and one third of the computational burden. Figure 10 shows the training process of LAU-Net compared with TAU-Net, and Table 3 shows the validation results for both. With the same optimizer and loss function, TAU-Net performs slightly better than LAU-Net on the training set, but their generalization ability differs very little. We stress that, when solving image recognition tasks, the ability of a model to generalize its predictions is often more important than how well it fits the training data. This is an additional point to consider before promoting the use of similarly complex classifiers for landslide detection.
Table 3. Optimal validation results for LAU-Net and TAU-Net.
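Why a narrower network can have roughly one fourth of the parameters is easy to see by counting convolution weights: a 3 × 3 layer has 9 · c_in · c_out + c_out parameters, so halving every channel width divides the count by about four. The sketch below counts only the encoder's 3 × 3 convolutions (two per stage, ignoring BN, attention gates, and the decoder). The LAU-Net-like widths 32 to 256 follow the text, while the doubled widths 64 to 512 are a hypothetical comparison, since the TAU-Net configuration is not detailed here.

```python
def conv3x3_params(c_in, c_out):
    """Weights plus biases of one 3x3 convolution layer."""
    return 3 * 3 * c_in * c_out + c_out

def encoder_params(in_channels, widths):
    """Parameters of an encoder with two 3x3 convs per stage (BN and attention ignored)."""
    total, c = 0, in_channels
    for w in widths:
        total += conv3x3_params(c, w) + conv3x3_params(w, w)
        c = w  # the next stage starts from this stage's output width
    return total

light = encoder_params(2, [32, 64, 128, 256])  # widths stated for LAU-Net
wide = encoder_params(2, [64, 128, 256, 512])  # hypothetical doubled widths
print(light, wide, round(wide / light, 2))
```

The ratio comes out just under 4 because the first layer's two input channels do not scale with width; the quadratic dependence on channel count is what makes width reduction the cheapest way to lighten a U-Net.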

Considerations on Multiple Sources
When comparing the multiple data sources within the deep learning framework we presented, the RRIM data type achieves the best recognition performance. According to our analysis, this is because RRIM provides the most detailed depiction of terrain and landforms. As shown in Figure 3, a human interpreter can clearly distinguish a landslide from its background in RRIM data and can also separate the source area from the accumulation area of the landslide. The same visual contrast plausibly benefits automated recognition as well. The hillshade comes relatively close to the performance of RRIM, likely because hillshade clearly reflects geomorphic features; historical landslides therefore remain detectable, and more clearly so than with the elevation or optical alternatives. However, hillshade suffers from a shadow effect: peaks overlooking lowlands cast shadows along the illumination direction. The resulting darkness at these specific incidence angles may have limited the performance of our LAU-Net relative to the richer information provided by RRIM. The DEM raises interesting considerations of its own. This data type requires the least preprocessing before being fed to the deep learning model. However, when checking the classification results based on the DEM, we found that many small historical landslides were misclassified and that the boundaries of large historical landslides appeared extremely noisy. This may be due to the difficulty of normalizing DEM information effectively for the deep learning process. Optical images produced by far the worst results, likely because the dense vegetation masks the geomorphic signature of historical landslides. In summary, we consider RRIM data to be the most suitable data type for AI-aided historical landslide identification.
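To make the shadow argument concrete, the following is a minimal NumPy sketch of a conventional Lambertian hillshade, where brightness is the cosine of the angle between the surface normal and the sun direction. The function name `hillshade` is hypothetical, and the default azimuth of 315° and altitude of 45° are common GIS defaults assumed here rather than settings taken from this study. The sketch models self-shading only: slopes facing away from the light darken, and sufficiently steep ones are rendered completely black. Cast shadows from neighbouring peaks, which the text also mentions, would require an additional line-of-sight test.

```python
import numpy as np


def hillshade(dem: np.ndarray, cellsize: float = 1.0,
              azimuth_deg: float = 315.0, altitude_deg: float = 45.0) -> np.ndarray:
    """Lambertian hillshade in [0, 1]; self-shading only, no cast shadows.

    Assumes a north-up grid: rows run north -> south, columns west -> east.
    """
    d_row, d_col = np.gradient(dem.astype(float), cellsize)
    dz_dx = d_col    # elevation change towards the east
    dz_dy = -d_row   # elevation change towards the north (row index grows southward)
    # Unit surface normal (east, north, up) of the surface z = f(x, y)
    norm = np.sqrt(dz_dx**2 + dz_dy**2 + 1.0)
    nx, ny, nz = -dz_dx / norm, -dz_dy / norm, 1.0 / norm
    # Unit vector pointing to the sun; azimuth measured clockwise from north
    az, alt = np.radians(azimuth_deg), np.radians(altitude_deg)
    sx, sy, sz = np.sin(az) * np.cos(alt), np.cos(az) * np.cos(alt), np.sin(alt)
    # Cosine of the incidence angle; faces turned away from the sun clip to 0 (black)
    return np.clip(nx * sx + ny * sy + nz * sz, 0.0, 1.0)


# Synthetic north-south ridge with 45-degree flanks, lit from the northwest
cols = np.arange(21)
ridge = np.tile(20.0 - np.abs(cols - 10), (21, 1))
shade = hillshade(ridge)
print(f"west flank: {shade[10, 5]:.3f}  east flank: {shade[10, 15]:.3f}")
# prints: west flank: 0.854  east flank: 0.146
```

With the same light source, an east-facing plane with a gradient of 3 (about 72°) returns zero everywhere, which is the kind of information loss that RRIM, being illumination-independent, avoids.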

Historical Landslides in Jiuzhaigou
Owing to the abundant rainfall of the subtropical monsoon climate zone, coupled with its distinctive mountain-canyon landforms, Jiuzhaigou has frequently suffered geological disasters such as landslides and debris flows in the past. According to historical records, about 100 years ago a ravine debris flow at Zaru Temple destroyed a village, and a collapse-type landslide occurred at Guodu in 1952 [69]. As the environment changed, the valleys and slopes where historical landslides occurred were recolonized by vegetation, but this does not mean that the surface environment in the area is stable. On 8 August 2017, the Jiuzhaigou earthquake induced 1988 coseismic landslides in the study area (Figure 11). As this figure shows, in regions where historical landslides are extensively developed, such as regions A and B, the density of earthquake-induced coseismic landslides is relatively high. In contrast, coseismic landslides are rarely observed in region C, where historical landslides are sparse. Therefore, the effective identification of historical landslides can not only support risk assessment of geological disasters, but also deepen our understanding of the spatiotemporal pattern of geological disasters before and after an earthquake in mountainous regions.

Conclusions
In this study, a novel approach based on LiDAR data and LAU-Net is proposed to identify historical landslides in densely vegetated mountainous areas. We generated an RRIM from high-precision airborne LiDAR data to interpret historical landslides in Jiuzhaigou; this expert-based interpretation yielded an inventory of 1949 historical landslides. The LAU-Net was then compared with a number of competing models for AI-aided detection and produced the highest classification performance. Having confirmed this, we also compared the effects of different data types, highlighting the rich information condensed into RRIM data. Together, these results indicate that the LAU-Net built on an RRIM foundation is the optimal combination for mapping historical landslides. We attribute this to the ability of the proposed framework to extract deep, diagnostic feature information while mapping it back to the original spatial context. Although this task is particularly complex because the geomorphic signature of historical landslides is masked (mostly by vegetation cover), the RRIM still proved to be the best candidate for reflecting the body of the failed mass. We recommend combining appropriate data preprocessing methods with a concise network architecture to solve problems such as identifying specific geological hazards, an approach that often yields surprisingly good results.
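The classification performance summarized above is reported in this study as the mean intersection over union (MIoU) and the F1 score. The Python sketch below shows one common way to compute both from binary landslide/background masks. It assumes pixel-wise counting and that MIoU averages the landslide and background classes; the source does not spell out these details, and the helper names are hypothetical.

```python
import numpy as np


def confusion_counts(pred: np.ndarray, truth: np.ndarray) -> tuple[int, int, int, int]:
    """Pixel-wise TP, FP, FN, TN for binary masks (1 = landslide, 0 = background)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn


def f1_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """F1 of the landslide class: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)


def mean_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Mean of the landslide IoU and the background IoU.

    Assumes both classes occur, so no denominator is zero.
    """
    tp, fp, fn, tn = confusion_counts(pred, truth)
    iou_landslide = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)  # for background, FN and FP swap roles
    return (iou_landslide + iou_background) / 2


# Toy 4 x 4 example: the predicted 2 x 2 landslide is shifted one pixel east
truth = np.zeros((4, 4), dtype=int)
truth[0:2, 0:2] = 1
pred = np.zeros((4, 4), dtype=int)
pred[0:2, 1:3] = 1
print(f"F1 = {f1_score(pred, truth):.3f}, MIoU = {mean_iou(pred, truth):.3f}")
# prints: F1 = 0.500, MIoU = 0.524
```

Because background pixels usually dominate a landslide scene, the background IoU tends to lift MIoU above the landslide-only IoU, which is one reason to report F1 alongside it.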