1. Introduction
Landslides are geological phenomena that endanger and cause great losses to human life, property, resources and the environment. Landslides are frequent in China, with 23,952 landslides occurring nationwide during the five-year period from 2016 to 2020, accounting for 70% of the total number of geological disasters in the country. China is the country with the most widely developed loess in the world, mainly distributed in 34 prefecture-level cities in seven provinces and regions, including Shaanxi, Gansu and Shanxi, which is one of the high-risk areas for geological hazards in China [
1]. One-third of geological hazards in China occur on the Loess Plateau, causing casualties, road damage, and reduction of arable land, and the majority of landslides are distributed in the western mountainous areas with complex topographic and geological conditions [
2]. For example, 116 landslides have occurred in the Heifangtai area in Gansu Province, causing a total of 37 deaths, more than 100 injuries, more than 2,000,000 m² of arable land abandoned, 413,000 m² of farmland destroyed, and a total economic loss of 190 million yuan. As landslides can cause great loss to human life, property safety and the ecological environment, it is necessary to monitor and warn of loess landslides.
Landslides have complex characteristics, such as hidden, sudden and uncertain spatial and temporal evolution, sudden occurrence triggered by external factors, unclear disaster-causing conditions, a poorly understood internal structure, and unknown deformation and rupture processes and mechanisms [
3]. At present, there are three main methods of landslide monitoring, namely, the geological monitoring method [
4], remote sensing monitoring method [
5] and integrated space-air-ground monitoring method [
6], which are used to monitor the internal properties of landslides, external disaster-causing factors and other indirect information. The geological monitoring method is mainly based on the monitoring of the geological environment, including direct and indirect information, and the geological monitoring data are mainly obtained through external measurements by surveyors in a simple but labor-intensive and time-consuming process. Remote sensing monitoring methods are widely used in landslide monitoring because of their wide range, high speed and rich image information. Displacement changes are the most significant manifestation of landslide stability deterioration, and destabilization before damage and surface displacements are monitored quantitatively by optical remote sensing [
7], interferometric synthetic aperture radar (InSAR) [
8], and airborne LiDAR monitoring systems [
9]. By using a high-precision remote sensing monitoring method, we can accurately monitor the historical and present slope deformation by comparing and analyzing the multiperiod image data of the same area, thus realizing the monitoring of landslide potential from a “space-air” perspective. The integrated space-air-ground monitoring method uses geological monitoring methods to obtain geological disaster data and then integrates modern monitoring technology, such as InSAR and LiDAR, and geographic information technology to conduct a “census” of landslide hazards, monitor the dynamic evolution of landslides, and generate high-precision data for landslide prediction [
6].
Landslide monitoring generates a huge amount of multimodal data, including text data and image data, which are of limited use for current landslide hazard analysis methods. In terms of landslide text data, analysis methods for landslide hazards include landslide inventory-based analysis, deterministic methods, heuristic-based methods, and statistical methods. The landslide inventory-based analysis method is the most basic method for predicting regional landslide hazard susceptibility. DeGraff et al. used landslide history hazard maps to construct isopleth maps of landslide deposits and then qualitatively assessed landslide susceptibility classes based on geological conditions in the study area [
10]. Deterministic methods usually use a simple slope limit equilibrium model to assess slope stability for the purpose of regional landslide susceptibility prediction [
11]. Deterministic methods can better explore the relationships between landslide hazards and causal factors, but data collection is difficult, and the causal factors are spatially variable. Heuristic-based methods rank and weight factors affecting slope instability based on subjective experience to determine the probability of regional landslide occurrence, and include the analytic hierarchy process, the fuzzy logic method and weighted linear combination [
12]. In recent years, statistical methods, such as the weight-of-evidence method and logistic regression, have been widely used in landslide susceptibility analysis.
In terms of landslide image data, landslide monitoring as an image processing problem is very amenable to machine learning techniques. Support vector machines [
13], random forests [
14], artificial neural networks [
15], and deep learning [
16] have been widely used in landslide classification. Deep learning, a branch of machine learning, is used to construct deep artificial neural networks similar to human neural network systems that analyze and interpret the input data, extract the features of the data and combine them into abstract high-level features, and plays an important role in fields such as computer vision and natural language processing [
17]. Convolutional neural networks (CNNs) use convolutional operations to extract features from input images, effectively learning feature expressions from many samples, and have a stronger model generalization capability. LeNet was the first proposed CNN model and laid the foundation for the development of CNN [
18]. The AlexNet model increases the depth of the CNN model, employs a ReLU as the activation function, and uses a dropout technique [
19]. VGGNet inherits the framework of AlexNet and LeNet and increases the depth of the network by stacking convolutional layers with 3 × 3 convolutional kernels to improve the performance of the network [
20]. The Inception-v1 module of GoogLeNet uses sparse links to reduce the number of model parameters and ensure the efficient use of computational resources, and the network depth reaches 22 layers while improving the performance of the network [
21]. ResNet consists of several residual blocks that are connected across layers, weakening the strong links between each layer, and is used to solve degradation problems in deep networks [
22]. DenseNet uses a simple connectivity model consisting of dense blocks that connect all layers directly, reducing the number of required parameters and the computational costs of the model [
23]. EfficientNetV2 introduces the Fused-MBConv module and a progressive learning strategy that accelerates training by gradually increasing the image size during training while adaptively adjusting regularization [
24]. The Vision Transformer splits the whole input image into small image blocks, using the linear embedding sequence of these small image blocks as the input of the transformer, and then using supervised learning for image classification training [
25]. The Swin Transformer is a hierarchical transformer that limits the computation of self-attention to nonoverlapping local windows by shifting windows, while considering cross-window connections to improve efficiency and compatibility with a wide range of vision tasks [
26].
The massive landslide monitoring data accumulated over time are large in volume, broad in range of sources, and complex in structure, but only a small portion of these data can be transformed into useful knowledge. From the perspective of landslide causation mechanisms, landslides involve three constituent factors, i.e., disaster-causing factors, disaster-generating environments and disaster-bearing bodies, and landslides are the result of the joint action of these three factors. To improve disaster mitigation and public service capabilities, it is important to carry out landslide research, build a knowledge system for the study of landslides, deepen the knowledge of disaster-causing factors, the disaster-generating environment and the disaster-bearing bodies and of the relationships between them, and carry out spatiotemporal prediction of landslide hazards. The outcomes of such research will help scientists grasp the causes of landslide hazards, improve early monitoring, forecasting and early warning of landslides, protect people’s lives, and reduce property losses caused by landslides. Realizing the transformation of massive unstructured data into structured knowledge, iteratively enhancing data and knowledge, and providing intelligent landslide analysis services have become key bottlenecks in disaster management [
27].
A knowledge graph is a semantic network knowledge base composed of various entities and relationships in the objective world, which can describe various objects and their relationships intuitively, naturally and efficiently and can be used to discover hidden knowledge and patterns. Building upon this basis, a unique geological knowledge graph has been formed in the field of geology. For a typical geographic phenomenon such as a landslide, which is rich in mechanism knowledge and spatiotemporal process information, a landslide knowledge graph is constructed to store and manage multisource heterogeneous data. It is based on the landslide domain knowledge system and aims to reveal the specific manifestations and deep intrinsic factors of landslide occurrence. The TransE [
28] model, which is a knowledge representation learning model, can make full use of landslide data when sample sizes are small and the prediction index entities and their relationships are high-dimensional and heterogeneous. It predicts missing triples by leveraging the semantic network constructed from the knowledge graph and addresses the incompleteness of knowledge graphs in complex prediction tasks with high accuracy and scalability. Thus, it is a promising method for landslide spatial identification.
Currently, the analysis and applications of landslide monitoring mainly use unimodal data, such as text data or image data, while multimodal data are rarely fused for analysis. The current data sources for multimodal learning include images, text, audio, video, and other sources from the fields of video classification [
29], sentiment analysis [
30], cross-modal search [
31], and image synthesis [
32]. The advantages of multimodal learning are that it makes up for the limitations of unimodal information, is less affected by noise in individual modalities, has redundancy and complementarity among modalities, and can obtain information with richer features to improve the performance of the whole model.
A landslide knowledge graph is utilized to manage multimodal data generated from landslide monitoring and is incorporated into the landslide image recognition task. The information derived from different modalities complements each other, thereby enhancing the accuracy of landslide remote sensing image recognition. Therefore, we construct a landslide knowledge graph based on the landslide knowledge domain. Subsequently, we incorporate the landslide knowledge graph into the task of landslide remote sensing image recognition, proposing a recognition model that integrates the landslide knowledge graph with remote sensing images. Additionally, we present two distinct feature fusion methods.
4. Results
In this section, we first present the recognition results of the ResNet model for landslide remote sensing images and show the recognition results of the feature classifier for landslide knowledge graph data. Then, we discuss the recognition results of the FKGRNet model based on the feature classifier and the recognition results of the FKGRNet model based on feature splicing. Next, we outline the performance of the different models on the remote sensing image dataset and compare the results with the recognition results of the FKGRNet model. Finally, we conduct ablation experiments to investigate the role of knowledge in the landslide knowledge graph on the fusion model.
4.1. Identification Based on the ResNet Model
From
Section 2.2, a total of 307 landslide samples were found in the study area, and 307 sample points were randomly selected in the non-landslide area as non-landslide samples. The landslide image dataset was divided at a ratio of 8:2; the training set contained 492 images, and the validation set contained 122 images. The experiments were run on a Windows-based computer with an NVIDIA GeForce RTX 3080 Ti 16G GPU and an Intel Core i9-11900K processor. The ResNet model, implemented in PyTorch, was loaded with weights pretrained on the ImageNet dataset and trained for 50 epochs with a batch size of 32 using the Adam optimizer. The initial learning rate was set to 0.001 and was halved every 5 epochs. The input images were resized to 224 × 224 pixels, data augmentation such as random horizontal and vertical flipping was applied, and the image pixel values were normalized to (0,1) to speed up model convergence.
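As an illustration of the learning rate strategy described above, the following minimal sketch computes the rate at each epoch, assuming that epochs are counted from zero and that the halving is applied exactly at every fifth epoch boundary. The function name `step_lr` and its parameters are hypothetical and are not taken from the original implementation.

```python
def step_lr(epoch: int, base_lr: float = 0.001, step: int = 5, gamma: float = 0.5) -> float:
    """Learning rate at a 0-indexed epoch when the rate is multiplied by
    `gamma` every `step` epochs (assumed reading of the schedule)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return base_lr * gamma ** (epoch // step)


# Learning rate for each of the 50 training epochs.
schedule = [step_lr(epoch) for epoch in range(50)]
```

Under these assumptions, the rate stays at 0.001 for epochs 0 to 4, drops to 0.0005 for epochs 5 to 9, and has been halved nine times by the last epoch.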
We trained the ResNet model using different network depths.
Figure 7 shows the training accuracy, training loss, validation accuracy and validation loss of the ResNet model for different network depths as well as the convergence of the model training. As seen in
Table 2, as the network depth of the ResNet model increases, the landslide recognition accuracy increases, and the ability of ResNet to extract features increases, which illustrates the effectiveness of the ResNet residual module for landslide image feature extraction.
Figure 8 illustrates the feature maps produced for landslide images L026 and L259 by the ResNet model at various depths. The brightness of the color in each feature map represents the level of significant features captured by the respective convolutional layer at each location. Notably, the landslide features in image L026 are prominently visible, and they are effectively extracted across all three network depths. In terms of the network structure of ResNet34, it can be seen from b1-b5 in
Figure 8 that the ResNet convolutional module also becomes increasingly sensitive to the extraction of image features, with the model extracting significant features in the landslide region. In terms of the network depth of ResNet, it can be seen from a4, b4 and c4 in
Figure 8 that, as the network depth increases, ResNet becomes more focused on the landslide region of the image. However, image L259 is mistakenly classified as a non-landslide image after undergoing feature extraction by the ResNet network. A further analysis of the images reveals that the landslide features in image L259 are predominantly concentrated in the middle and lower parts, but they are not adequately extracted during the convolutional layer processing step. This inadequacy might explain the incorrect identification of image L259 as a non-landslide image.
4.2. Feature Classifier-Based Recognition
To train the entity vector of the landslide knowledge graph, training samples and validation samples need to be selected to construct the dataset required for the training of the TransE model, and both the training and validation sets must consist of landslide samples and non-landslide samples. The training set and validation set were divided according to an 8:2 ratio using the landslide samples and non-landslide samples, and the landslide knowledge was kept in one-to-one correspondence with the landslide images. According to
Section 2.2, the landslide event entities and their attribute features in the study area were obtained from the landslide knowledge graph and converted into a vector representation, and each landslide event was formally expressed as
In the formula, slope, elevation, precipitation, NDVI, water system distance, road distance, plane curvature, and profile curvature are numerical attribute features, and aspect, landform type, and stratigraphic lithology are literal attribute features.
Each landslide factor in the landslide knowledge graph dataset is processed. The slope, elevation, precipitation, NDVI, water system distance and road distance are numerical attribute features, and the values are kept constant. The plane curvature and profile curvature are equally divided into three intervals and numbered sequentially. The aspect is divided into 12 main slide intervals according to the main slide direction, the landform type is divided into 4 categories according to the geomorphology, and the stratigraphic lithology is divided into 74 categories according to the lithology. After the above processing, the text-based descriptions of landslide vector and non-landslide vector are converted into numerical descriptions.
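The factor processing described above can be sketched as follows. This is only an illustration under stated assumptions: the curvature intervals are taken to be equal-width over a given range, the 12 aspect intervals are taken to be equal 30° sectors, and all key names, lookup tables and sample values are hypothetical.

```python
# Numerical factors whose values are kept unchanged (key names are hypothetical).
NUMERIC_KEYS = ["slope", "elevation", "precipitation", "ndvi",
                "water_distance", "road_distance"]


def equal_interval_bin(value, lo, hi, n_bins):
    """Index (0..n_bins-1) of the equal-width interval of [lo, hi] containing value."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    width = (hi - lo) / n_bins
    idx = int((value - lo) // width)
    return min(max(idx, 0), n_bins - 1)  # clamp values on or outside the bounds


def aspect_bin(aspect_deg, n_bins=12):
    """Map an aspect in degrees to one of n_bins equal angular sectors (assumed)."""
    return int((aspect_deg % 360.0) // (360.0 / n_bins))


def encode_sample(sample, curvature_range, landform_ids, lithology_ids):
    """Convert one sample's attribute features into an 11-element numeric list."""
    lo, hi = curvature_range
    vec = [float(sample[key]) for key in NUMERIC_KEYS]
    vec.append(equal_interval_bin(sample["plane_curvature"], lo, hi, 3))
    vec.append(equal_interval_bin(sample["profile_curvature"], lo, hi, 3))
    vec.append(aspect_bin(sample["aspect"]))
    vec.append(landform_ids[sample["landform"]])    # 4 categories in the paper
    vec.append(lithology_ids[sample["lithology"]])  # 74 categories in the paper
    return vec


# Hypothetical sample and lookup tables, for illustration only.
sample = {"slope": 23.5, "elevation": 1480.0, "precipitation": 510.0, "ndvi": 0.31,
          "water_distance": 220.0, "road_distance": 75.0,
          "plane_curvature": 0.0, "profile_curvature": -0.9,
          "aspect": 45.0, "landform": "loess ridge", "lithology": "loess"}
encoded = encode_sample(sample, (-1.0, 1.0), {"loess ridge": 2}, {"loess": 17})
```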
The landslide vectors, the non-landslide vectors and the vectors to be measured are grouped into triples of similar relations, dissimilar relations and relations to be measured according to the rules in Table 3. For example, the similarity relationship between one landslide vector and another landslide vector constitutes a similar-relation triple, while the relationship between a vector to be measured and a landslide vector constitutes a triple whose relation is to be measured.
The numbers of triples constructed from the landslide vectors, the non-landslide vectors and the vectors to be measured are shown in Table 4. The training set includes 1968 similar and dissimilar relationship triples and 122 relationship triples to be measured, and the validation set includes 488 similar and dissimilar relationship triples.
After the hyperparameters are adjusted and optimized, the learning rate is set to 0.005, and the number of iteration epochs is set to 200. After modeling with TransE, the distributed vectors of entities and relationships are output, and each entity vector is converted to a 50-dimensional vector. The model loss is shown in
Figure 9, which shows that the model converges. Using the additivity of vectors, the relationship between a pair of entities is judged, and link prediction of the landslide knowledge graph is performed.
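The vector additivity used by TransE can be illustrated with a small sketch. In TransE, a triple (head, relation, tail) is considered plausible when head + relation lies close to tail, and training typically minimizes a margin-based ranking loss between true and corrupted triples. The code below assumes the L2 norm and a margin of 1.0; the toy vectors are three-dimensional stand-ins for the 50-dimensional entity vectors, and all names are hypothetical.

```python
import math


def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss for one true triple and one corrupted triple."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))


# Toy embeddings, for illustration only.
head = [0.2, 0.1, 0.0]
similar = [0.3, 0.0, 0.1]        # relation vector
tail_true = [0.5, 0.1, 0.1]      # satisfies head + similar = tail
tail_corrupt = [-0.5, 0.9, 0.4]  # unrelated entity

d_true = transe_distance(head, similar, tail_true)
d_corrupt = transe_distance(head, similar, tail_corrupt)
loss = margin_loss((head, similar, tail_true), (head, similar, tail_corrupt))
```

Link prediction then amounts to ranking candidate entities or relations by this distance and preferring the smallest.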
From Section 3.1, when a vector to be measured is input to the feature classifier for landslide identification, a discriminant vector is given randomly. Each element of the discriminant vector is an element vector group consisting of one randomly given landslide vector and one randomly given non-landslide vector. According to Formula (2), the vector to be measured is compared with each element vector group in the discriminant vector. If the distance of the similar relationship between the vector to be measured and the landslide vector is smaller than the distance of the similar relationship between the vector to be measured and the non-landslide vector, the vector to be measured is considered closer to the landslide vector in this element vector group, and the outcome is recorded as a binary (0 or 1) entry in the recognition result vector. After discriminating all the element vector groups in the discriminant vector, a 246-dimensional identification result vector is output. By calculating the frequencies of 0 and 1 in the recognition result vector, the recognition result vector is mapped to a two-dimensional classification probability vector. The model recognition results are shown in Table 5. From the table, the recognition accuracy of the feature classifier for the landslide entity vector in the landslide knowledge graph is 86.07%, and the F1-score reaches 85.85%.
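The voting procedure of the feature classifier can be sketched as follows. The sketch assumes a TransE-style distance for the similar relationship, assumes that an entry of 1 means "closer to the landslide vector", and orders the probability vector as (non-landslide, landslide). The exact form of Formula (2) is not reproduced here, and all names and toy vectors are hypothetical.

```python
import math


def similar_distance(x, relation, target):
    """Assumed TransE-style distance ||x + r - target|| for the similar relation."""
    return math.sqrt(sum((a + r - b) ** 2 for a, r, b in zip(x, relation, target)))


def classify(x, pairs, similar_rel):
    """Compare x with every (landslide, non-landslide) pair and vote.

    Returns the binary result vector and a two-dimensional probability
    vector (p_non_landslide, p_landslide) built from the vote frequencies.
    """
    if not pairs:
        raise ValueError("at least one vector pair is required")
    votes = [
        1 if similar_distance(x, similar_rel, landslide)
        < similar_distance(x, similar_rel, non_landslide) else 0
        for landslide, non_landslide in pairs
    ]
    p_landslide = sum(votes) / len(votes)
    return votes, (1.0 - p_landslide, p_landslide)


# Toy two-dimensional example with three pairs (the paper uses 246 pairs).
x = [1.0, 1.0]
similar_rel = [0.0, 0.0]
pairs = [([1.0, 1.0], [5.0, 5.0]),
         ([0.9, 1.1], [3.0, 3.0]),
         ([4.0, 4.0], [1.0, 1.2])]
votes, probabilities = classify(x, pairs, similar_rel)
```

With 246 pairs, the result vector has 246 binary entries, which matches the dimensionality reported above.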
4.3. Recognition Results of the FKGRNet Model Based on the Feature Classifier
The classification result vectors of the feature classifier and the ResNet model are fed into the feature classifier for fusion, and the landslide recognition results based on the fused feature classifier model are obtained.
Figure 10 shows the accuracy of the feature classifier-based FKGRNet model compared with the baseline model. From
Figure 10 and
Table 5, taking the ResNet34 baseline model as an example for analysis, the accuracy of identifying loess landslides is 86.07% for the feature classifier and 91.80% for ResNet34. The accuracy of the fused feature classifier model is 7.37% and 1.64% higher than that of the feature classifier and ResNet34, respectively, indicating that, with the combination of landslide knowledge graph data and landslide images, the accuracy of the fusion model for landslide identification has been significantly improved. In terms of performance, the F1-score of the fused feature classifier model is 7.58% and 1.63% higher than that of the feature classifier and ResNet34, respectively, indicating that the performance of the fused feature classifier model is better than that of the separate models. Meanwhile, the FKGRNet fusion model based on different network depths outperforms its corresponding baseline network in terms of accuracy and F1-score, which indicates the importance of landslide knowledge graph data processed by TransE and feature classifier in landslide image recognition. As the network depth increases, the recognition effect of the baseline model improves. The recognition accuracy of FKGRNet reaches its highest value when the network depth is 50, but the landslide recognition accuracy improvement exhibited by the fusion model becomes smaller compared with that of the baseline model.
4.4. Recognition Results of the FKGRNet Model Based on Feature Splicing
The landslide knowledge graph feature vectors based on the TransE model and the image feature vectors based on the ResNet model are feature spliced, and landslide binary classification is performed by the fully connected layer to obtain the landslide recognition results.
Figure 11 shows the training accuracy, training loss, validation accuracy and validation loss of the feature splicing-based FKGRNet model for different network depths. From
Figure 11 and
Table 6, the accuracy of the feature splicing-based FKGRNet34 model for landslide recognition reaches 95.08%, which is 3.28% higher than that of the baseline model (ResNet34), indicating that the fusion of the knowledge graph feature vectors and the landslide image feature vectors by splicing has a positive effect on the accuracy of landslide recognition. In terms of performance, the F1-score of the fused feature splicing model is 3.30% higher than the baseline model, indicating that the fused feature splicing model outperforms the baseline model. Furthermore, as seen in
Table 5 and
Table 6, the fused feature splicing FKGRNet model is superior to the baseline model at different network depths, and the accuracy and F1-score of the fused feature splicing FKGRNet model with different network depths are better than those of the fused feature classifier FKGRNet model with the same depth.
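A minimal sketch of feature splicing is given below, assuming that splicing means concatenating the image feature vector with the knowledge graph entity vector before a single fully connected layer with a softmax output. The dimensions are toy values (in practice the pooled ResNet34 features are 512-dimensional and the TransE entity vectors are 50-dimensional), and the weights are hypothetical and untrained.

```python
import math


def splice(image_features, kg_features):
    """Concatenate image features and knowledge graph features into one vector."""
    return list(image_features) + list(kg_features)


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


image_features = [0.8, 0.1, 0.4, 0.3]  # stand-in for pooled CNN features
kg_features = [0.2, -0.5, 0.7]         # stand-in for a TransE entity vector
weights = [[0.1] * 7, [-0.1] * 7]      # hypothetical two-class layer
bias = [0.0, 0.0]

fused = splice(image_features, kg_features)
class_probabilities = softmax(linear(fused, weights, bias))
```

Because the classifier sees both parts of the spliced vector, its decision can draw on image and knowledge information at the same time.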
4.5. Comparison Results with Other Models
We evaluate the mainstream deep learning architectures VGGNet16, GoogLeNet, EfficientNetV2, DenseNet201, the Swin Transformer and the Vision Transformer. From
Table 7, with ResNet34 as the baseline, the FKGRNet model under both fusion methods has an overall advantage over the other architectures, and it obtains the highest accuracy, precision, recall, and F1-score on the validation set. Therefore, combining a landslide knowledge graph with landslide image data to construct a multimodal landslide recognition model is helpful for improving landslide recognition accuracy.
4.6. Ablation Experiments
To investigate the role of knowledge from the landslide knowledge graph on the fusion model, we conducted ablation experiments on the proposed network and its factors, using ResNet34 as the baseline model. We also investigate the roles of different factors in the feature classifier, the feature classifier-based FKGRNet, and the feature splicing-based FKGRNet models. In
Table 8, the baseline rows represent the accuracy of the different baseline models, and the first column lists the factor removed in each ablation. For each baseline model, the accuracy and change columns give the accuracy after factor ablation and its change. First, the classification accuracies of the feature classifier, the feature classifier-based FKGRNet, and the feature splicing-based FKGRNet models all increase after ablating the profile curvature, indicating that the profile curvature factor has a negative effect on landslide classification and that its upper-level conceptual knowledge, the curvature feature, is less useful in this study area. Second, the accuracies of the feature classifier and the feature classifier-based FKGRNet model decrease after ablating the other evaluation factors, indicating that these factors and their upper-level conceptual knowledge play a positive role in the classification. Third, for the same ablation factor, the effect on accuracy is greater in the feature classifier and the feature classifier-based FKGRNet model, while the feature splicing-based FKGRNet model is less sensitive to the ablated factor, indicating that the feature splicing-based FKGRNet model is more stable under fluctuations of the data in the landslide knowledge graph.
To investigate the role of landslide knowledge in the fusion model, we perform further experiments. Since geological conditions, climatic conditions, and thematic index characteristics are only used as landslide factors in this experiment, the results of their ablation experiments are consistent with the corresponding factors. We conduct ablation experiments on the landform features, curvature characteristics and feature distance in the input landslide knowledge. First, the recognition accuracy of the feature classifier model based on landslide knowledge changes the most (decreases by 16.38%) after ablating the landform feature knowledge, and the accuracies of the feature classifier-based FKGRNet model and the feature splicing-based FKGRNet model in the landslide knowledge ablation experiment also decrease by 2.46% and 1.64%, respectively. This indicates that the role of landform feature knowledge in the fusion model is more important than that of other landslide knowledge. Second, after ablating the feature distance knowledge, the recognition accuracies of the three models do not change more than they do after ablating the corresponding landslide factors alone, which indicates that the role of landslide knowledge factors does not follow a simple linear superposition relationship. Third, after ablating the curvature characteristic knowledge, the recognition accuracies of all three models increase, indicating that this knowledge has a negative effect on the landslide recognition results of the fusion model; this is consistent with the performance trend of the profile curvature factor.
From the above ablation experiments, it can be seen that the knowledge contained in a landslide knowledge graph can play different roles in the landslide remote sensing image recognition task, that the FKGRNet model can effectively handle multimodal landslide information, and that the addition of this knowledge has a positive effect on the recognition accuracy achieved for landslide remote sensing images.
5. Discussion
In this study, we innovatively introduce a landslide knowledge graph representing external knowledge into landslide remote sensing image recognition and propose an FKGRNet-based remote sensing image landslide recognition method. We use ResNet as the baseline model and compare the FKGRNet model based on feature classification and the FKGRNet model based on feature splicing. Then, we compare the performance of FKGRNet models with different network depths on landslide recognition tasks and compare the recognition results with those of other deep learning models.
In terms of model fusion methods, we propose two ways of fusing the landslide knowledge graph with models, namely, the feature classifier and feature splicing. Since deep learning models have an upper limit on their performance in image recognition tasks, introducing external knowledge allows the knowledge and the remote sensing images to complement each other in recognition. The TransE model converts the knowledge in the landslide knowledge graph into entity vectors, the ResNet model extracts the features of landslide remote sensing images, and the two are fused by different combining methods. From
Table 5 and
Table 6, the fusion model significantly outperforms the ResNet baseline model in the landslide recognition task, while the feature splicing-based FKGRNet model outperforms the feature classifier-based FKGRNet model in both accuracy and F1-score in the landslide recognition task.
In terms of the network depth of the models, we compared the performance of the ResNet baseline model, the feature classifier-based FKGRNet model, and the feature splicing-based FKGRNet model on the landslide recognition task using network depths of 18, 34, and 50 layers. As seen in
Table 2,
Table 5 and
Table 6, at a network depth of 50 layers, the ResNet baseline model, the feature classifier-based FKGRNet model and the feature splicing-based FKGRNet model all obtained their highest accuracy and F1-score on the landslide recognition task. Additionally, as the network depth increased, the accuracy of the baseline model, the feature classifier-based FKGRNet model and the feature splicing-based FKGRNet model gradually increased on the landslide recognition task, which suggests that the stacking of ResNet residual modules has a positive effect on landslide image feature extraction.
In terms of the role of knowledge in the fusion model, we first extracted the image features using ResNet and then compared them to those of the fusion model for landslide recognition. As shown in
Figure 8, we used the ResNet34 model to extract features from landslide image L259, but the model did not recognize it as a landslide image. As seen from e5 in
Figure 8, after the image passed through multiple convolutional layers in ResNet, the model did not accurately focus on the landslide region of this image, which may have contributed to the failed recognition result. Two approaches were used to fuse the knowledge: a feature classifier-based fusion approach and a feature splicing-based fusion approach. In the feature classifier-based fusion approach, the feature classifier was first executed on the entity data in the landslide knowledge graph to obtain a prior probability vector of landslide knowledge, which was then fused with the ResNet output using Bayesian Equation (8). When it was difficult for ResNet to recognize landslide images, i.e., when the probability of ResNet recognizing the current image as a landslide image was approximately 0.5, the feature classifier-based fusion approach was of great help. This is because in this case, the Bayesian Equation (7) allowed the prior probability vector of landslide knowledge to dominate, thus improving the recognition effect. In the feature splicing-based fusion approach, the feature vectors of the landslide knowledge were spliced with the image features at the fully connected layer. This allowed both the image information and knowledge information to be considered when the model output its landslide recognition results.
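Why the knowledge prior dominates when the image model is undecided can be illustrated with a small sketch. The Bayesian equations of the paper are not reproduced in this section, so the code below assumes a common two-class form in which the knowledge probability acts as the prior and the image probability as independent evidence; the function name and the numerical values are hypothetical.

```python
def bayes_fuse(p_knowledge: float, p_image: float) -> float:
    """Fused landslide probability under an assumed two-class Bayesian rule.

    p_knowledge: prior landslide probability from the feature classifier.
    p_image: landslide probability output by the image model.
    """
    numerator = p_knowledge * p_image
    denominator = numerator + (1.0 - p_knowledge) * (1.0 - p_image)
    if denominator == 0.0:
        raise ValueError("the two inputs are certain and contradictory")
    return numerator / denominator


undecided = bayes_fuse(0.8, 0.5)  # image model undecided: result follows the prior
agreeing = bayes_fuse(0.8, 0.9)   # both sources agree: result is reinforced
conflict = bayes_fuse(0.8, 0.2)   # sources disagree equally: result is balanced
```

Under this assumed rule, an image probability of 0.5 cancels out of the ratio, so the fused result equals the knowledge prior, which is consistent with the behavior described above.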
In the ablation experiments, to investigate the role of knowledge in the landslide knowledge graph on the fusion model, we conducted ablation experiments on the landslide concept knowledge and its factors using ResNet34 as the baseline model for fusion. As shown in
Table 8, the role of each ablation factor was basically the same in the feature classifier, the feature classifier-based FKGRNet model and the feature splicing-based FKGRNet model, but the degree of the effect varied across fusion models. For example, after ablating the slope factor, the accuracy of the feature classifier was reduced by 6.57%, the accuracy of the feature classifier-based FKGRNet model was reduced by 1.64%, and the accuracy of the feature splicing-based FKGRNet model was reduced by 0.82%. This indicates that the FKGRNet model based on feature splicing is less sensitive to changes in ablation factors and is more stable under fluctuations of the data in the landslide knowledge graph. The accuracy of the feature classifier-based FKGRNet model remained unchanged under the ablation of factors such as Landform and NDVI; we believe this accuracy may be the upper limit of the recognition accuracy of the model.
The research in this paper has the following limitations: (1) the amount of landslide sample data is small, and the landslide identification task still needs publicly available datasets with a large number of landslide images and landslide point information for further study; (2) the amount of monitoring data in the landslide knowledge graph is too small, as only some publicly available loess landslide data are currently used; if more monitoring data of loess landslide sample points become available, the landslide knowledge graph can be enriched and its rich semantic and data information can be further utilized; and (3) due to certain limitations, we did not use high-resolution remote sensing satellite images such as the Gaofen series and only used Google Earth images for analysis.