Article

Deep Transfer Learning Approach for Identifying Slope Surface Cracks

School of Engineering and Technology, China University of Geosciences (Beijing), Beijing 100083, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(23), 11193; https://doi.org/10.3390/app112311193
Submission received: 30 October 2021 / Revised: 21 November 2021 / Accepted: 23 November 2021 / Published: 25 November 2021
(This article belongs to the Special Issue Applications of Machine Learning on Earth Sciences)

Abstract

Geohazards such as landslides, which are often accompanied by surface cracks, have caused great harm to public safety and property. If these surface cracks could be identified in time, this would be of great significance for the monitoring and early warning of geohazards. Currently, the most common method for crack identification is manual detection, which has low efficiency and accuracy. In this paper, a deep transfer learning approach is proposed to effectively and efficiently identify slope surface cracks for the sake of fast monitoring and early warning of geohazards, such as landslides. The essential idea is to employ transfer learning by training on (a) a large sample dataset of concrete cracks and (b) a small sample dataset of surface cracks in soil and rock masses. In the proposed approach, (1) pretrained crack identification models are constructed based on the large sample dataset of concrete cracks; (2) refined crack identification models are then constructed based on the small sample dataset of surface cracks in soil and rock masses. The proposed approach could be applied in UAV surveys of high and steep slopes to provide monitoring and early warning of landslides and thus help ensure the safety of people and property.

1. Introduction

Geological disasters (geohazards) frequently occur worldwide, and various geological disasters have caused great harm to public safety and property [1,2,3]. The occurrence of geological disasters, such as landslides, debris flows, rock collapses, surface subsidence, surface collapse, surface cracks, and earthquakes, is often accompanied by the occurrence of surface cracks in soil and rock masses.
Especially for high and steep slopes, which are more prone to geological disasters, such as rock collapses and landslides (as illustrated in Figure 1), a series of cracks will form during the early stage of geological disasters. The cracks caused by a collapse are illustrated in Figure 2 (source: www.news.sohu.com and www.heraldnet.com, accessed on 15 April 2021). Before a rock collapse occurs, there are cracks on the top of the slope; the cracks at the rock collapse area will gradually expand, and new cracks will appear at the foot of the slope over time [1,2].
Therefore, monitoring and identifying cracks on the top of a rock collapse slope can effectively provide an early warning of collapse and reduce the loss of human life and property [2,3]. Landslide cracks are illustrated in Figure 2. Landslide cracks are an important sign accompanying landslides and can be divided into tensile cracks, shear cracks, bulging cracks, and fan-shaped cracks [1,2]. In the early stage of landslide formation, intermittent cracks with roughly constant length appear on the rear edge of the landslide [4,5]. When continuous cracks appear on the trailing edge and their length shows an expanding trend, the landslide is slowly intensifying [6,7].
As the landslide slides down, a shear zone will form between the landslide and the parent body, and shear cracks will appear. The soil and rock masses at the front of the landslide body will also form open cracks, because the material at the toe is uplifted as it obstructs the sliding landslide body. At the same time, the spread of the landslide tongue to both sides will form fan-shaped cracks [8,9]. The change in landslide cracks reflects the development process and extent of the landslide. Therefore, the identification of cracks at different positions of the landslide is of great significance for the early monitoring and warning of landslides [10,11].
Currently, the most commonly used crack detection method is manual detection. However, manual detection has several problems, including blind spots in the detection process, high fieldwork intensity, low work efficiency, subjectivity of the detection results, and low accuracy [12,13].
To address the problems arising from manual detection, various methods based on image identification have been developed [12,14,15]. However, image-based identification methods are easily affected by environmental factors, such as light, shadows, and background, and their post-processing is often very complicated, resulting in low computational efficiency. The reliability of the detection results for crack pictures with complex backgrounds is also unsatisfactory [16].
In recent years, deep learning has achieved good performance in object identification tasks and has the advantages of high parallelism, good robustness, and strong generalization ability [17]. Deep learning does not need to design artificial feature extraction, especially in image classification; the accuracy of trained deep learning network models is significantly higher than that of traditional machine learning approaches [18,19,20].
Many deep learning models have been produced with high accuracy, generalization ability, and robustness. Applying them to crack detection can effectively improve the efficiency and accuracy of crack identification [21,22]. Commonly used identification models include LeNet5 [23], AlexNet [24], GoogLeNet [25,26], VGG [27], ResNet (Deep Residual Network) [28], DenseNet (Densely Connected Convolutional Network) [29], and MobileNets [30]. Recently, the Vision Transformer (ViT) [31] model was proposed for image identification and has been widely used.
Much research has been conducted to identify cracks using deep neural networks. For example, Zhang et al. [32] first applied deep learning to road crack detection, using a deep convolutional neural network to identify cracks in 500 smartphone images (3264 × 2448 pixels) captured in strongly noisy environments (vehicles, tree shadows, etc.), which broadened the application scope of deep learning crack recognition. Zhang et al. [33] proposed an automated crack detection method for three-dimensional asphalt pavement, an efficient detection framework based on a convolutional neural network (CrackNet). Cha et al. [34] proposed a damage assessment method based on deep learning, which is more robust and accurate in the detection of concrete cracks than traditional image detection methods.
Moreover, Chen et al. [35] proposed the NB-CNN framework, which fuses a convolutional neural network (CNN) with Naïve Bayes data fusion. The approach aggregates the information from each frame of a video to further perform crack detection, improving the overall performance and robustness of the detection system. Dorafshan et al. [36] compared six traditional edge detection methods with a deep convolutional neural network (DCNN) in terms of crack detection effects. The DCNN shows clear advantages in calculation time and accuracy, indicating its superiority in crack detection. Maeda et al. [37] used smartphones mounted in cars to obtain pictures, established a dataset, and successfully detected eight road damage scenes by training a convolutional neural network damage detection model. In addition, they compared the accuracy and operation speed of GPU server detection and smartphone detection and developed a smartphone application that has been put into use.
In addition, Dosovitskiy et al. [31] applied the Transformer from the NLP field to the computer vision field and proposed the Vision Transformer (ViT) to train image classification models while modifying the Transformer as little as possible. When the dataset is large, the training result of the ViT model is almost the same as that of current optimal convolutional network structures, and the computing resources required for training are significantly reduced. The emergence of the ViT promotes the application of the Transformer in the field of computer vision. For example, Jiang et al. [38] proposed a ViT–CNN ensemble model to classify cancer cell images and normal cell images to assist in the diagnosis of acute lymphoblastic leukemia; the ensemble model achieves 99.03% classification accuracy on the test set, which is better than other models in their experimental comparison. Bashmal et al. [39] presented an approach for the multi-label classification of remote sensing images based on data-efficient transformers; the validity of the model was demonstrated by experimental validation on two datasets with a ground resolution of 2 cm collected in the cities of Trento and Civezzano.
However, currently, there is little research work focusing on the use of convolutional neural network models or Transformer methods for the identification of slope surface cracks, especially for the surface cracks of high and steep slopes. This is because deep learning methods have a high dependence on the amount of data, and a large amount of training data is required for deep learning model training. However, it is difficult to obtain crack images of high and steep slopes, and the data sample size is small. This leads to difficulties in the construction of deep learning models.
To address the above problems, in this paper, a deep transfer learning approach is proposed to effectively and efficiently identify slope surface cracks for the early monitoring and warning of geohazards, such as landslides.
First, seven deep neural network models are used to identify concrete cracks, which are easier to obtain as a dataset, and pretrained crack identification models are constructed. Then, a transfer learning strategy is exploited by combining a small sample soil and rock masses’ crack dataset we compiled with a large sample concrete crack dataset. Pretrained deep neural network models are used to identify soil and rock masses’ cracks. Finally, deep learning models for the identification of slope cracks are obtained, which are expected to be used in the slope monitoring and early warning of geological disasters to ensure public safety and property.
The rest of this paper is organized as follows. Section 2 describes the proposed deep learning framework in detail. Section 3 analyzes the results obtained by this framework. Section 4 discusses the advantages, applicability, and disadvantages of the proposed framework, as well as possible future work. Section 5 concludes the paper.

2. Methods

2.1. Overview

In this paper, we propose a deep transfer learning approach to effectively and efficiently identify slope surface cracks for early monitoring and warning of geohazards, such as landslides.
First, the deep learning models are pretrained for the identification of slope surface cracks, and the models are compared and analyzed to screen for an optimized model. Then, a transfer learning strategy is utilized. A small sample dataset of surface cracks in soil and rock masses is established and combined with a large sample dataset of concrete cracks. The pretrained models are applied to the identification of soil and rock masses’ slope surface cracks, thereby constructing deep learning models for the identification of slope surface cracks.
The workflow of the proposed deep transfer learning approach is illustrated in Figure 3.
Step 1: Collecting and processing the concrete crack dataset and pretraining deep learning models for the identification of slope surface cracks.
(1)
Collecting and compiling a large sample dataset of concrete cracks.
(2)
Using convolutional neural network models (e.g., LeNet5, AlexNet, the LeNet5 models using InceptionA and InceptionE modules, ResNet18, MobileNet) and the Vision Transformer to identify concrete cracks. Establishing a series of pretrained deep learning models for identifying cracks in soil and rock masses' slopes.
(3)
Applying data augmentation to each model's training data. Comparing and analyzing the accuracy and calculation efficiency of each model. Analyzing the application range of each model and screening for an optimized model to prepare for the identification of soil and rock masses' slope surface cracks.
Step 2: Sorting out and labeling the dataset of surface cracks in soil and rock masses. Using a transfer learning strategy to obtain refined deep learning models for the identification of slope surface cracks.
(1)
For the difficult-to-obtain surface cracks in soil and rock masses, collecting, sorting, and labeling relevant data. Designing and producing a corresponding small sample dataset of surface cracks in soil and rock masses and applying it to the identification of soil and rock masses' slope surface cracks.
(2)
Employing a transfer learning strategy, the training set of the large sample dataset of concrete cracks is combined with the training set of the small sample dataset of surface cracks in soil and rock mass slopes. The established pretrained models are then trained further to obtain high-precision deep learning models that can be applied to the identification of surface cracks in slopes during UAV aerial surveys, providing fast monitoring and early warning of geological disasters.

2.2. Step 1: Construction of Pretrained Deep Learning Models for the Identification of Slope Surface Cracks

2.2.1. Data Collection and Cleaning

In this paper, we collect concrete crack images through publicly available datasets. The concrete images with cracks and the concrete images without cracks are collected separately. We crop the images to a uniform size, such as 227 × 227 pixels, remove the images with excessive interference, and save the same number of images with and without cracks. Eighty percent of the images are randomly selected as the training set and twenty percent as the test set. Finally, the required concrete crack dataset is obtained.
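As an illustration of this preprocessing step, the following is a minimal PyTorch sketch of loading cropped 227 × 227 images from class subfolders and performing a random 80/20 split; the folder names (crack/, no_crack/) and the fixed random seed are assumptions for illustration rather than the exact code used in this study.

```python
import torch
from torchvision import datasets, transforms

# Minimal sketch: load 227 x 227 crack / no-crack images from class subfolders
# and split them 80/20 into training and test subsets.
base_transform = transforms.Compose([
    transforms.Resize((227, 227)),   # enforce the uniform image size
    transforms.ToTensor(),
])

# Assumed layout: concrete_cracks/crack/*.jpg and concrete_cracks/no_crack/*.jpg
full_dataset = datasets.ImageFolder("concrete_cracks", transform=base_transform)

n_train = int(0.8 * len(full_dataset))
n_test = len(full_dataset) - n_train
train_set, test_set = torch.utils.data.random_split(
    full_dataset, [n_train, n_test],
    generator=torch.Generator().manual_seed(42)  # reproducible random split
)
```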

2.2.2. Data Augmentation

Data augmentation refers to the process of performing a series of transformation operations on a limited dataset to generate new data, reduce network overfitting, and improve the generalization ability of the training models. In this paper, we employ data augmentation to address the problem of insufficient data in the dataset and expand the amount of data.
Data augmentation includes random flip transformation, random rotation transformation, scale transformation, random cropping, and center cropping transformation, as well as brightness, contrast, and hue transformation, as illustrated in Figure 4. In this paper, the above-mentioned data augmentation methods are used in combination. Through the torchvision.transforms.Compose() function, a data augmentation operation is performed before each data acquisition, which greatly expands the content of the dataset.
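A minimal sketch of such a combined augmentation pipeline built with torchvision.transforms.Compose() is shown below; the specific transform parameters (rotation angle, crop scale, jitter strengths) are illustrative assumptions, not the exact values used in this study.

```python
from torchvision import transforms

# Minimal sketch of a combined augmentation pipeline applied on each data load:
# random flips, rotation, scaling/cropping, and color jitter, as described above.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                         # random flip transformation
    transforms.RandomRotation(degrees=15),                          # random rotation transformation
    transforms.RandomResizedCrop(227, scale=(0.8, 1.0)),            # scale transformation + random crop
    transforms.ColorJitter(brightness=0.2, contrast=0.2, hue=0.1),  # brightness/contrast/hue transformation
    transforms.ToTensor(),
])

# Passing `augment` to the dataset (e.g., datasets.ImageFolder(..., transform=augment))
# means the transforms are re-applied every time an image is fetched,
# which effectively expands the content of the dataset.
```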

2.2.3. Pretrained Model Construction

The pretrained models are constructed by applying six convolutional neural network models and the Vision Transformer, so that a total of seven deep neural network models are built: LeNet5, AlexNet, InceptionA, InceptionE, ResNet, MobileNet, and Vision Transformer. Each of the models is briefly described as follows.
(1)
LeNet5: The LeNet5 model is the most original image classification model. The entire neural network has 7 layers (excluding the input layer), each layer has multiple feature maps, and each feature map has multiple neurons. Sigmoid is used as the activation function, and the classification result is output through Softmax [23]. The LeNet5 model has a simple structure and a small number of network layers, but the image needs to be scaled to 32 × 32 pixels before it is used to train the LeNet5 model.
(2)
AlexNet: The AlexNet model is the first convolutional neural network model to attract wide attention. Compared with LeNet, AlexNet has a deeper network structure, cascaded convolutional layers, dropout to suppress overfitting, ReLU replacing Sigmoid as the activation function, and multi-GPU training [24]. This series of optimization measures allows it to achieve higher accuracy.
(3)
Inception module: The Inception module was first employed in GoogLeNet, which enhanced the function of the convolution module and further deepened the network. Inception modules stack multiple convolutional and pooling layers, and a single Inception module can contain several different types of convolutional and pooling operations at the same time [25,26]. Later, the InceptionV2, InceptionV3, and InceptionV4 network structures were developed. Because the GoogLeNet network is too large, only the InceptionA and InceptionE modules of InceptionV3 are applied to LeNet in this paper to further deepen the LeNet network and improve its accuracy.
(4)
ResNet: ResNet introduces the residual module (Residual Block) to alleviate the vanishing gradient problem caused by very deep networks, thus speeding up neural network training and improving the model's accuracy [28]. The main ResNet network structures are ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152, which contain different numbers of convolutional layers. In this paper, the ResNet18 network structure is employed for the established deep learning model.
(5)
MobileNet: This is a lightweight deep neural network model suitable for mobile terminals, embedded devices, and other devices that have low computing power and require speed and real-time performance. MobileNet mainly applies depthwise separable convolution (an Xception-style structure) instead of the traditional standard convolution operation [30]; a minimal sketch of such a block is given after this list. The main MobileNet network structures are MobileNetV1, MobileNetV2, and MobileNetV3. In this paper, we adopt the MobileNetV1 network structure and employ the depthwise separable convolution module to reduce the number of parameters and the amount of calculation.
(6)
Vision Transformer (ViT): The Vision Transformer is based on the Transformer model widely used in the field of natural language processing. ViT bridges CV and NLP: rather than using a traditional CNN, it addresses computer vision tasks based on the Transformer architecture and can achieve good results and high accuracy [31]. In this paper, the Vision Transformer model is employed for crack identification, and the effect of crack image identification is further tested.
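For reference to item (5) above, the following is a minimal PyTorch sketch of a depthwise separable convolution block of the kind MobileNetV1 stacks in place of standard convolutions; the channel numbers and input size are arbitrary examples rather than the configuration used in this study.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3 convolution
    followed by a 1x1 pointwise convolution, as used in MobileNetV1."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # groups=in_channels makes the 3x3 convolution act on each channel separately
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# Example: a 32-channel feature map from a crack image patch
block = DepthwiseSeparableConv(32, 64)
out = block(torch.randn(1, 32, 113, 113))  # -> shape (1, 64, 113, 113)
```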

2.3. Step 2: Construction of Refined Deep Learning Models for the Identification of Slope Surface Cracks

2.3.1. Dataset Construction

As images of surface cracks on high and steep slopes are difficult to obtain, and cracks in soil and rock masses on slopes have the same appearance as other surface cracks, we collected images of surface cracks in soil and rock masses ourselves and used them as the dataset for the identification of soil and rock masses' slope surface cracks. We sorted, cropped, and labeled the obtained crack images so that images of soil and rock masses with surface cracks and images without surface cracks, all of the same size, were obtained; the numbers of images with and without cracks are equivalent. Eighty percent of them are randomly selected as the training set and twenty percent as the test set to construct an image dataset for the identification of cracks in soil and rock mass slopes.

2.3.2. Refined Model Construction

Deep learning has been widely used because it can learn advanced features of the data. However, deep learning requires sufficient data to complete training and has a strong dependence on the amount of data. For many tasks, it is impossible to obtain a large number of trainable datasets, but transfer learning methods can be employed.
Transfer learning solves the problem of insufficient high-quality training data in the target field by transferring knowledge from an existing source field to the target field [40,41]. In this way, the corresponding problems can be solved even in the case of a lack of data, which reduces the data dependence of deep learning. The basic principles of transfer learning are illustrated in Figure 5.
When the amount of data is small, it is difficult to accurately classify the data. After employing similar problems for auxiliary training, although there will still be a certain deviation in the classification of the data, the classification task can be completed. For example, after the classification of five-pointed stars and circles, using transfer learning, the task of classifying six-pointed stars and circles can be accomplished.
In this paper, we employ the idea of transfer learning and use the concrete crack dataset, which is similar to the dataset of surface cracks in soil and rock masses, to pretrain the deep learning models, including LeNet5, AlexNet, the LeNet5 models using InceptionA and InceptionE modules, ResNet18, MobileNet, and Vision Transformer. After constructing the pretrained deep learning models, we combine the small sample dataset of surface cracks in soil and rock masses with the large sample concrete crack dataset to obtain an optimized dataset. The deep learning models are trained again after model screening and comparison. The refined deep learning models employed for the identification of soil and rock masses' slope surface cracks are finally obtained.
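A hedged sketch of this combination step is given below: the small sample training set of soil and rock mass cracks is concatenated with the large sample concrete crack training set before the pretrained models are trained again. The folder paths, batch size, and transform are illustrative assumptions.

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

to_tensor = transforms.Compose([transforms.Resize((227, 227)), transforms.ToTensor()])

# Assumed folder layout: each dataset has crack/ and no_crack/ class subfolders.
concrete_train = datasets.ImageFolder("concrete_cracks/train", transform=to_tensor)
soil_rock_train = datasets.ImageFolder("soil_rock_cracks/train", transform=to_tensor)

# Transfer learning step: merge the large-sample concrete crack training set with
# the small-sample soil and rock mass crack training set into one training set.
combined_train = ConcatDataset([concrete_train, soil_rock_train])
train_loader = DataLoader(combined_train, batch_size=64, shuffle=True)

# The concrete-crack pretrained model is then trained again on `train_loader`
# (see the training-loop sketch in Section 3.3) to obtain the refined model.
```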

3. Results

3.1. Experimental Environment

The software and hardware environment configurations used in this paper are listed in Table 1 and Table 2, respectively.

3.2. Experimental Data

3.2.1. Pretrained Model Dataset

In this paper, we first obtained a public concrete crack image dataset (source: www.kaggle.com, accessed on 15 April 2021) for deep learning model pretraining. There are a total of 40,000 images in the dataset, 20,000 with cracks and 20,000 without cracks, and the image size is 227 × 227 pixels. Eighty percent of the images with cracks and of the images without cracks are randomly selected as the training set, and the remaining twenty percent are used as the test set. Therefore, the training set has a total of 32,000 images, of which 16,000 contain cracks and 16,000 do not. The test set has a total of 8000 images, of which 4000 contain cracks and 4000 do not. Some images of concrete with and without cracks in the dataset are illustrated in Figure 6 and Figure 7.

3.2.2. Refined Model Dataset

Since there is little related research on the identification of surface cracks in soil and rock masses, the datasets of surface cracks in soil and rock masses required for the construction of the refined model cannot be obtained directly. Thus, we needed to produce such a dataset ourselves. In this paper, we collect, sort, and crop crack images of soil and rock masses and produce a smaller dataset for the identification of cracks in soil and rock masses. The dataset has a total of 400 images, and the size of each image is adjusted to the same size as in the original dataset: 227 × 227 pixels. Among them, 200 images contain soil and rock masses with surface cracks, and 200 images contain soil and rock masses without surface cracks. Eighty percent of them are randomly selected as the training set of the rock and soil crack recognition dataset, and twenty percent are selected as the test set. Therefore, there are a total of 320 images in the training set, 160 with cracks and 160 without cracks, and a total of 80 images in the test set, 40 with cracks and 40 without cracks. Some images of the dataset are illustrated in Figure 8 and Figure 9.

3.3. Test Results and Analysis of the Pretrained Deep Learning Models for the Identification of Slope Cracks

The deep neural networks used in this paper are LeNet5, AlexNet, the LeNet5 network model improved by InceptionV3's InceptionA and InceptionE modules (referred to as InceptionA and InceptionE in the following for easy identification), ResNet18, MobileNet, and Vision Transformer. The loss function used in the models is torch.nn.CrossEntropyLoss(), and the optimizer is torch.optim.SGD(). The comparison and analysis of each model in terms of accuracy and efficiency are as follows. The statistics of the total number of parameters and floating point operations for each model are listed in Table 3.
As can be seen from Table 3, the floating point operations and parameters of LeNet5 are very small, indicating a simple model structure and a small memory footprint. In contrast, the floating point operations and parameters of InceptionE are very large, which indicates that the model structure is complex and occupies more memory. Among the other models, the floating point operations of MobileNet, ResNet18, InceptionA, AlexNet, and Vision Transformer increase in order. Furthermore, the parameters are AlexNet, MobileNet, InceptionA, ResNet18, and Vision Transformer in descending order. The accuracy and GPU parallel computation time of each model are compared and analyzed as follows.
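As a minimal sketch of the training setup described above (torch.nn.CrossEntropyLoss() with torch.optim.SGD()) and of how a parameter count such as those in Table 3 can be obtained, consider the following; the chosen model, learning rate, and momentum are placeholders and assumptions, not the exact settings used in this study.

```python
import torch
import torch.nn as nn
from torchvision import models

# Placeholder model with a two-class output (crack / no crack); any of the seven
# networks compared in this paper could be substituted here.
model = models.resnet18(num_classes=2)

# Total number of trainable parameters, as reported in Table 3.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_params}")

criterion = nn.CrossEntropyLoss()                                        # loss function used in this paper
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)   # assumed lr / momentum

def train_one_epoch(loader):
    """One training epoch over a DataLoader of (image, label) batches."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```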

3.3.1. Accuracy

When data augmentation is not performed, the accuracy comparison of each model over 10 epochs is illustrated in Figure 10a, and the results are listed in Table 4. From Figure 10a and Table 4, it can be seen that before data augmentation, the accuracy of each model is high. Except for the slightly lower accuracy of Vision Transformer, the accuracy of the other models can reach more than 99%. Moreover, the accuracy of each model does not fluctuate much over 10 epochs, and the models are relatively stable. Vision Transformer has the lowest accuracy among the seven models, but it can still reach more than 94%, indicating that Vision Transformer has broad space for future applications in the field of computer vision. Taking the maximum accuracy of each model over 10 epochs for comparison, the corresponding bar chart is illustrated in Figure 10b.
From the comparative analysis in Figure 10b, it can be seen that before data augmentation, AlexNet has the highest accuracy, reaching 99.800%. Except for Vision Transformer, LeNet5 has the lowest accuracy, which is due to its relatively simple structure and shallow network. The accuracies of InceptionA and InceptionE, which improve on LeNet5, are higher than that of LeNet5 because the Inception modules further deepen the network. The accuracy of the ResNet18 network model can also reach 99.750%; because its depth is the greatest among the seven network models, its accuracy is also higher. However, due to the small dataset and fewer training images, ResNet18 failed to give full play to its capability, and its accuracy was slightly lower than that of AlexNet.
The quality of the original dataset is very high before data augmentation. There is not much noise in the dataset. In this case, the accuracy of nearly all models is very high. However, the original dataset is not generic; that is, only very good samples are collected in the dataset. Therefore, after data augmentation, many lower quality samples are created and used to train the models.
In this case, the data augmentation operation on the dataset increases the training deviation of the original dataset, resulting in a decrease in accuracy. To further analyze the generalization ability and stability of each model, Figure 11 illustrates the comparison of accuracy rates under 10 epochs before and after data augmentation for the same network model.
From the comparative analysis in Figure 11, it can be seen that after data augmentation, the accuracy of each model is lower than before data augmentation. Among the models, the accuracy of AlexNet and InceptionA decreases only slightly after data augmentation. Although the accuracy of ResNet18 fluctuates greatly, its highest accuracy is not much lower than before data augmentation. This shows that these three network models have strong generalization ability. LeNet5, InceptionE, and MobileNet show a larger decrease in accuracy after data augmentation, indicating lower generalization ability than the other models. The accuracy of Vision Transformer drops the most after data augmentation, indicating that its generalization ability in this setting is lower than that of the other models. However, its accuracy can still reach a relatively high level of approximately 87%, indicating that it still has good development prospects in the field of computer vision. The accuracy comparison of each model over 10 epochs is illustrated in Figure 12a and Table 5.
From the analysis of Figure 12a and Table 5, it can be seen that AlexNet, InceptionA, and ResNet18 still maintain a high level of accuracy after data augmentation, all reaching more than 99%. Among them, AlexNet has the highest accuracy, reaching 99.7%. After several rounds of training, LeNet5 and InceptionE can reach an accuracy of more than 98%. The accuracy of MobileNet and Vision Transformer is significantly lower than that of the other models. In addition, the accuracy of ResNet18, MobileNet, and Vision Transformer fluctuates greatly, and their stability is low. The accuracy of InceptionE continues to increase, suggesting that it has not yet reached its highest value; after increasing the number of training rounds, the highest accuracy of the InceptionE model is 98.800%, and this value is used when comparing the highest accuracy of the models. The comparison is illustrated in Figure 12b.
From the analysis of Figure 12b and Table 5, it can be seen that the accuracy rate of AlexNet after data augmentation is the highest among the seven models at 99.700%. Compared with the accuracy before data augmentation, the decrease in accuracy is the smallest, which means that when the dataset is relatively small, AlexNet has a strong generalization ability and high model stability.
Although the accuracy rates of InceptionA, InceptionE, and LeNet5 decreased and the accuracy rates of LeNet5 and InceptionE decreased relatively more, their accuracy rates remained high. In addition, the accuracy of 10 epochs has small fluctuations, the model converges quickly, and a high accuracy rate can be achieved in a short time.
The highest accuracy of ResNet18 is high, reaching more than 99%, but its accuracy fluctuates greatly over 10 epochs. This is due to its Residual Block module, which slows its convergence but avoids the vanishing gradient. The accuracy of MobileNet and Vision Transformer drops the most, and their accuracy over 10 epochs fluctuates greatly, indicating that the stability of these two models is low and that good training results are difficult to achieve without a large amount of data. However, the training time of these two models is relatively short, especially compared with deep network models such as GoogLeNet, reflecting their lightweight design and improved training efficiency.

3.3.2. Efficiency

The calculation time of 10 epochs for each model is illustrated in Table 6, and the comparison chart is illustrated in Figure 13. Since the data are enhanced, the training dataset of each model is different in each epoch round; in this case, the operation time after data augmentation is not comparable. We only compare and analyze the calculation time for each epoch of each model before data augmentation and make a preliminary judgment on the calculation efficiency of each model.
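For illustration, the per-epoch calculation time can be measured with a simple wall-clock timer, as in the hedged sketch below; train_one_epoch and train_loader are the placeholder names assumed in the earlier sketches, not the exact timing code used in this study.

```python
import time

# Minimal sketch: wall-clock time per training epoch over 10 epochs.
epoch_times = []
for epoch in range(10):
    start = time.perf_counter()
    train_one_epoch(train_loader)
    epoch_times.append(time.perf_counter() - start)
    print(f"epoch {epoch + 1}: {epoch_times[-1]:.1f} s")
```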
It can be seen in Figure 13 that the calculation times for each epoch of a given model are not very different, and the calculation time over 10 epochs is evenly distributed. Among the models, InceptionE has the longest calculation time and LeNet5 the shortest, which is due to the simplicity of the LeNet5 model, its minimal number of parameters and floating point operations, and its shallow depth. The Inception module used in the InceptionE model is more complex, which increases the depth and complexity of the model. The number of parameters and floating point operations of InceptionE is the highest among all models; thus, its computation time is long and its computation efficiency is low.
The computational efficiency of AlexNet, InceptionA, and MobileNet is similar, at approximately 3–4 min. Among these three models, AlexNet is the fastest, MobileNet is the second fastest, and InceptionA is the slowest. The AlexNet model is computationally intensive, but its number of parameters is small, so it can achieve a faster operation speed. The MobileNet model has a large number of parameters but a small number of floating point operations, and its operating speed is greatly improved by the depthwise separable convolution module; thus, it can maintain high operation efficiency even when the network model is deeper. InceptionA has the largest number of parameters among the three and larger floating point operations, so its operation speed is the slowest.
ResNet18 has a relatively long operation time due to a large number of layers and the deep network model. The operation time of the Vision Transformer is the longest except for InceptionE, which is due to its larger number of parameters and floating point operations, second only to InceptionE, indicating that the application of Vision Transformer in the field of computer vision still has a certain optimization space in terms of model lightweighting and improving computational efficiency.

3.4. Test Results and Analysis of the Refined Deep Learning Models for the Identification of Slope Surface Cracks

3.4.1. Comparison of Accuracy before and after Using Transfer Learning

When the small sample dataset of surface cracks in soil and rock masses is directly used for model training and testing, the obtained accuracy is low; taking LeNet5 as an example, it cannot even reach 75%. In this paper, we use transfer learning and merge the training set of the small sample dataset of surface cracks in soil and rock masses into the training set of the large sample dataset of concrete cracks. When LeNet5 is trained again on the combined data, the accuracy is significantly improved, reaching 98.094%, an increase of more than 23 percentage points. A comparison of accuracy before and after transfer learning with LeNet5 is illustrated in Table 7 and Figure 14.
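For reference, a minimal sketch of how such test accuracy values can be computed is given below; model and test_loader are placeholders for a trained network and a DataLoader over the test set of surface cracks in soil and rock masses.

```python
import torch

@torch.no_grad()
def test_accuracy(model, test_loader) -> float:
    """Fraction of test images whose predicted class matches the label."""
    model.eval()
    correct, total = 0, 0
    for images, labels in test_loader:
        predictions = model(images).argmax(dim=1)   # predicted class index per image
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    return correct / total

# e.g. print(f"accuracy: {100 * test_accuracy(model, test_loader):.3f}%")
```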

3.4.2. Comparison of the Accuracy of Each Model after Transfer Learning

The transfer learning method is adopted for the identification of surface cracks in soil and rock masses. Seven network models pretrained by deep learning, LeNet5, AlexNet, InceptionA, InceptionE, ResNet18, MobileNet, and Vision Transformer, are applied to identify surface cracks in soil and rock masses. After 10 epochs, the accuracy rate is illustrated in Table 8, and the comparison is illustrated in Figure 15.
From the analysis of Figure 15 and Table 8, it can be seen that the identification of surface cracks in soil and rock masses can reach a higher accuracy after transfer learning. Except for Vision Transformer, the accuracy of the other six models can reach more than 95%, and the highest accuracy of the Vision Transformer can also reach 89.597%, which is close to 90%. Therefore, the application of transfer learning has a better effect on the identification of surface cracks in soil and rock masses.
To make the training and test results obtained after inserting the soil and rock crack training set into the original dataset more realistic, the combined training set is processed with data augmentation. The accuracy curve for the identification of surface cracks in soil and rock masses after transfer learning has basically the same shape as the curve of the augmented original dataset before transfer learning. LeNet5, AlexNet, InceptionA, and InceptionE are all accurate and stable, while ResNet18, MobileNet, and Vision Transformer show relatively high volatility. Each model maintains its original characteristics after transfer learning and does not change greatly because of transfer learning.

3.4.3. Comparative Analysis of the Accuracy of the Refined Models and the Pretrained Models

Seven pretrained deep learning network models, LeNet5, AlexNet, InceptionA, InceptionE, ResNet18, MobileNet, and Vision Transformer, are employed for the identification of surface cracks in soil and rock masses. Compared with the identification of concrete cracks conducted by the original dataset training, the accuracy comparison analysis chart for 10 epochs is illustrated in Figure 16.
Taking the highest accuracy rate of each model when using the original dataset and comparing the highest accuracy rate of each model after using the dataset of surface cracks in soil and rock masses for transfer learning, a comparison is illustrated in Figure 17.
From the analysis in Figure 17, it can be seen that when the refined deep learning models are applied to identify surface cracks in soil and rock masses, the accuracy basically maintains the level of the original pretrained models. The accuracy of MobileNet and Vision Transformer even increased, which shows that the generalization ability of the refined deep learning models has improved. This indicates that it is feasible to employ transfer learning between slope surface cracks in soil and rock masses and concrete cracks, and that the training effect is good. Among the seven models, AlexNet has the highest accuracy on both the original dataset and the soil and rock crack dataset, further indicating that AlexNet is suitable for the identification of slope surface cracks. The increase in accuracy of MobileNet and Vision Transformer after transfer learning indicates their strong adaptability to the identification of soil and rock masses' slope surface cracks, and both have good development potential in this task in the future.

4. Discussion

In this paper, we present a deep transfer learning approach for identifying slope surface cracks. The following section will discuss the advantages and some shortcomings of the presented approach, as well as prospects for future work.

4.1. Advantages and Applicability

By exploiting the idea of transfer learning, a large sample dataset of concrete cracks (a total of 40,000 images) is combined with a small sample dataset of surface cracks in soil and rock masses (a total of 400 images), which we collected, sorted, and annotated. Pretrained deep learning models are employed for the identification of surface cracks in soil and rock masses, and a high accuracy is achieved. The limitation of the small dataset of slope surface cracks in soil and rock masses is overcome, and the massive amount of data usually required for deep learning model training is greatly reduced. Therefore, we hope to build a refined deep learning model that can be deployed on UAVs for aerial surveys to identify slope surface cracks and be applied to geological disaster monitoring and early warning.
The utilization of the transfer learning strategy is demonstrated to be effective. Taking LeNet5 as an example, before transfer learning, when only the small sample dataset is used, the highest accuracy can only reach 75%. After transfer learning, where the large sample dataset and the small sample dataset are combined, the highest accuracy for the identification of slope cracks is significantly improved, reaching 98.094%, an increase of more than 23 percentage points. The massive amount of data required for the identification of slope surface cracks is thus significantly reduced.
The flexibility of a model is related to its depth: the more layers a model has, the deeper it is, the stronger its flexibility, and the more potential features of the dataset it can extract. However, when faced with a small-scale dataset, models with many layers are prone to overfitting, whereas deep learning models with few layers and a shallow depth can achieve better learning effects. In this paper, the dataset is small; the combined large and small sample datasets contain only 40,400 images. Therefore, the AlexNet and LeNet models, with a small number of layers, achieve higher accuracy, while the accuracy of the ResNet18 and Vision Transformer models, with more layers, is lower.

4.2. Shortcomings

The refined models in this paper can only identify slope surface cracks, but the location, width, length, and depth of the slope surface cracks are still uncertain, and further optimization is needed. Moreover, in the construction of the dataset for the identification of slope cracks in this paper, the dataset is still small. For some models that are suitable for large sample datasets, such as Vision Transformer, good training or test results cannot be obtained. Fewer real datasets are available, and more in-depth work is needed to obtain more real image datasets through UAV aerial photography or other means.

4.3. Future Work

(1)
Due to the lack of image data of surface cracks in soil and rock masses, there is little research work focusing on the deep learning of surface cracks in soil and rock masses. Therefore, it is very important to collect the image data of surface cracks in soil and rock masses and establish a dataset of surface cracks in soil and rock masses. In the future, UAV aerial surveys can be used to collect crack images in the study area, and image cropping can be carried out through relevant codes to establish a dataset of surface cracks in soil and rock masses in the study area. In addition, the dataset sharing network platform of surface cracks in soil and rock masses can also be established, and users can upload crack pictures in real-time to share with other researchers, thereby establishing a huge soil and rock crack database, which is convenient for researchers to study and learn about surface cracks in soil and rock masses.
(2)
In addition, it is known from related work on the Vision Transformer [31] that when trained on large-scale datasets, the Vision Transformer performs better and achieves higher accuracy. Therefore, the Vision Transformer has strong potential for application to large-scale datasets. In the future, more real image data of slope surface cracks in soil and rock masses need to be collected to evaluate the performance of the Vision Transformer on large-scale datasets of slope surface cracks in soil and rock masses.
(3)
With the development of deep learning, there will be more deep neural networks with high accuracy and excellent performance in the future. This paper only conducts related research on the identification of cracks. In the future, more detailed research will be conducted on the location, length, width, and depth of the cracks. A deep learning model of slope crack detection suitable for UAV aerial surveys will be established to realize the practical application of deep learning in the field for the identification of slope cracks.
(4)
To achieve the expected objective of deploying deep learning models on UAVs, much preliminary research work was conducted in this paper to compare the recognition accuracy and computational efficiency of seven deep learning models and to analyze them in conjunction with the total number of parameters and floating-point operations of the models. Among the seven deep learning models, the AlexNet model has the highest accuracy and computational efficiency. MobileNet, on the other hand, has the smallest computational volume among the models with the same magnitude of total parameters and thus has relatively high computational efficiency. In practical applications, the deep learning models deployed on UAVs are generally models that have already been trained. Since training is performed before the model is deployed, the number of training samples has little effect on the final detection speed. As for the selection of the deployed model, it is necessary to consider not only the accuracy but also the number of model parameters and the computational size, because these determine the memory occupation and computational efficiency of the model; under the condition that accuracy is guaranteed, a model with a small memory occupation and high computational efficiency should be selected.
(5)
Currently, there are few deep learning models designed specifically for the identification of slope surface cracks. Therefore, future research work will focus on the design of related deep learning models for the identification of slope cracks. In addition, the deep learning model will be deployed on a UAV to monitor the high and steep slopes and other locations that are prone to geohazards and difficult to detect on-site cracks. Real-time monitoring of the surface cracks in soil and rock masses can be carried out to monitor geological disasters on high and steep slopes in a timely manner to ensure public safety and property.

5. Conclusions

In this paper, we propose a deep transfer learning approach to effectively and efficiently identify slope surface cracks for the early monitoring and warning of geohazards, such as landslides. The essential idea is to employ transfer learning by training on (a) the large sample dataset of concrete cracks and (b) the small sample dataset of surface cracks in soil and rock masses. In the proposed approach, (1) pretrained crack identification models are constructed based on the large sample dataset of concrete cracks; (2) refined crack identification models are further constructed based on the small sample dataset of surface cracks in soil and rock masses. To evaluate the effectiveness of the proposed approach, the accuracy of the deep learning models is tested. The results show that (1) AlexNet has the highest accuracy and computational efficiency, proving that the classic AlexNet model is effective in identifying slope cracks, and (2) each model achieves much higher accuracy by employing transfer learning (for example, the accuracy of LeNet5 is approximately 23 percentage points higher than when using only the small sample dataset), and the massive amount of data required for the identification of slope surface cracks is significantly reduced.
Future work is planned to achieve better results with a sufficiently large dataset. The deep transfer learning approach can be deployed for UAV aerial surveys, enabling the detection of surface cracks on high and steep slopes. The approach has important engineering significance for realizing timely monitoring to prevent geological disasters.

Author Contributions

Conceptualization, Y.Y. and G.M.; methodology, Y.Y. and G.M.; writing—original draft preparation, Y.Y. and G.M.; writing—review and editing, Y.Y. and G.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was jointly supported by the National Natural Science Foundation of China (Grant No. 11602235), the Fundamental Research Funds for China Central Universities (2652018091), the Major Program of Science and Technology of Xinjiang Production and Construction Corps (2020AA002), and the 2021 Graduate Innovation Fund Project of China University of Geosciences, Beijing (ZD2021YC009).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the editor and the reviewers for their helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ma, Z.; Mei, G.; Piccialli, F. Machine learning for landslides prevention: A survey. Neural Comput. Appl. 2021, 33, 10881–10907. [Google Scholar] [CrossRef]
  2. Mei, G.; Xu, N.; Qin, J.; Wang, B.; Qi, P. A Survey of Internet of Things (IoT) for Geohazard Prevention: Applications, Technologies, and Challenges. IEEE Internet Things J. 2020, 7, 4371–4386. [Google Scholar] [CrossRef]
  3. Fan, X.; Scaringi, G.; Korup, O.; West, A.J.; van Westen, C.J.; Tanyas, H.; Hovius, N.; Hales, T.C.; Jibson, R.W.; Allstadt, K.E.; et al. Earthquake-Induced Chains of Geologic Hazards: Patterns, Mechanisms, and Impacts. Rev. Geophys. 2019, 57, 421–503. [Google Scholar] [CrossRef] [Green Version]
  4. Wang, H.; Nie, D.; Tuo, X.; Zhong, Y. Research on crack monitoring at the trailing edge of landslides based on image processing. Landslides 2020, 17, 985–1007. [Google Scholar] [CrossRef]
  5. Parry, S.; Campbell, S.D.G. Deformation associated with a slow moving landslide, Tuen Mun, Hong Kong, China. Bull. Eng. Geol. Environ. 2007, 66, 135–141. [Google Scholar] [CrossRef]
  6. Du, Y.; Yan, E.; Gao, X.; Mwizerwa, S.; Yuan, L.; Zhao, S. Identification of the Main Control Factors and Failure Modes for the Failure of Baiyuzui Landslide Control Project. Geotech. Geol. Eng. 2021, 39, 3499–3516. [Google Scholar] [CrossRef]
  7. Lian, X.G.; Li, Z.J.; Yuan, H.Y.; Liu, J.B.; Zhang, Y.J.; Liu, X.Y.; Wu, Y.R. Rapid identification of landslide, collapse and crack based on low-altitude remote sensing image of UAV. J. Mt. Sci. 2020, 17, 2915–2928. [Google Scholar] [CrossRef]
  8. Du, G.L.; Zhang, Y.S.; Yao, X.; Guo, C.B.; Yang, Z.H. Formation mechanism analysis of Wulipo landslide-debris flow in Dujiangyan city. Rock Soil Mech. 2016, 37, 493–501. [Google Scholar] [CrossRef]
  9. Djerbal, L.; Melbouci, B. Ain El Hammam landslide (Algeria): Causes and evolution. Bull. Eng. Geol. Environ. 2012, 71, 587–597. [Google Scholar] [CrossRef]
  10. Cheng, Z.; Gong, W.; Tang, H.; Juang, C.H.; Deng, Q.; Chen, J.; Ye, X. UAV photogrammetry-based remote sensing and preliminary assessment of the behavior of a landslide in Guizhou, China. Eng. Geol. 2021, 289, 106172. [Google Scholar] [CrossRef]
  11. Huang, R. Mechanisms of large-scale landslides in China. Bull. Eng. Geol. Environ. 2012, 71, 161–170. [Google Scholar] [CrossRef]
  12. Cao, X.; Li, T.; Bai, J.; Wei, Z. Identification and classification of surface cracks on concrete members based on image processing. Trait. Signal 2020, 37, 519–525. [Google Scholar] [CrossRef]
  13. Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic Pixel-Level Crack Detection and Measurement Using Fully Convolutional Network. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 1090–1109. [Google Scholar] [CrossRef]
  14. Huang, W.; Zhang, N. A Novel Road Crack Detection and Identification Method Using Digital Image Processing Techniques. In Proceedings of the 2012 7th International Conference on Computing and Convergence Technology (ICCCT), Seoul, Korea, 3–5 December 2012; DalKwack, K., Kawata, S., Hwang, S., Han, D., Ko, F., Eds.; IEEE: Piscataway, NJ, USA, 2012; pp. 397–400. [Google Scholar]
  15. Azarafza, M.; Ghazifard, A.; Akgun, H.; Asghari-Kaljahi, E. Development of a 2D and 3D computational algorithm for discontinuity structural geometry identification by artificial intelligence based on image processing techniques. Bull. Eng. Geol. Environ. 2019, 78, 3371–3383. [Google Scholar] [CrossRef]
  16. Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Autom. Constr. 2019, 104, 129–139. [Google Scholar] [CrossRef]
  17. Nguyen, B.Q.V.; Kim, Y.T. Landslide spatial probability prediction: A comparative assessment of naive Bayes, ensemble learning, and deep learning approaches. Bull. Eng. Geol. Environ. 2021, 80, 4291–4321. [Google Scholar] [CrossRef]
  18. Chadaram, S.; Yadav, S.K. Identification of Cracks Length by XFEM and Machine Learning Algorithm; Lecture Notes in Intelligent Transportation and Infrastructure; Springer: Singapore, 2020; pp. 265–272. [Google Scholar] [CrossRef]
  19. Zhang, F.; Hu, Z.; Fu, Y.; Yang, K.; Wu, Q.; Feng, Z. A New Identification Method for Surface Cracks from UAV Images Based on Machine Learning in Coal Mining Areas. Remote Sens. 2020, 12, 1571. [Google Scholar] [CrossRef]
  20. Liu, L.; Meng, G. Crack detection in supported beams—Based on neural network and support vector machine. In Advances in Neural Networks—ISNN 2005; Wang, J., Liao, X., Yi, Z., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3498, pp. 597–602. [Google Scholar]
  21. Jang, K.; Kim, N.; An, Y.K. Deep learning–based autonomous concrete crack evaluation through hybrid image scanning. Struct. Health Monit. 2019, 18, 1722–1737. [Google Scholar] [CrossRef]
  22. Yang, J.; Zhang, G.; Chen, X.; Ban, Y. Quantitative identification of concrete surface cracks based on deep learning clustering segmentation and morphology. Laser Optoelectron. Prog. 2020, 57. [Google Scholar] [CrossRef]
  23. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2323. [Google Scholar] [CrossRef] [Green Version]
  24. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  25. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef] [Green Version]
  26. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826. [Google Scholar] [CrossRef] [Green Version]
  27. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar] [CrossRef] [Green Version]
  29. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef] [Green Version]
  30. Howard, A.G.; Menglong, Z.; Bo, C.; Dmitry, K.; Weijun, W.; Tobias, W.; Marco, A.; Hartwig, A. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  31. Alexey, D.; Lucas, B.; Alexander, K.; Dirk, W.; Xiaohua, Z.; Thomas, U.; Mostafa, D.; Matthias, M.; Georg, H.; Sylvain, G.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  32. Zhang, L.; Yang, F.; Daniel Zhang, Y.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the International Conference on Image Processing, ICIP, Phoenix, AZ, USA, 25–28 September 2016; pp. 3708–3712. [Google Scholar] [CrossRef]
  33. Zhang, A.; Wang, K.C.P.; Fei, Y.; Liu, Y.; Chen, C.; Yang, G.; Li, J.Q.; Yang, E.; Qiu, S. Automated Pixel-Level Pavement Crack Detection on 3D Asphalt Surfaces with a Recurrent Neural Network. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 213–229. [Google Scholar] [CrossRef]
  34. Cha, Y.J.; Choi, W.; Buyukozturk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [Google Scholar] [CrossRef]
  35. Chen, F.C.; Jahanshahi, M.R. NB-CNN: Deep Learning-Based Crack Detection Using Convolutional Neural Network and Naïve Bayes Data Fusion. IEEE Trans. Ind. Electron. 2018, 65, 4392–4400. [Google Scholar] [CrossRef]
  36. Dorafshan, S.; Thomas, R.J.; Maguire, M. Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete. Constr. Build. Mater. 2018, 186, 1031–1045. [Google Scholar] [CrossRef]
  37. Maeda, H.; Sekimoto, Y.; Seto, T.; Kashiyama, T.; Omata, H. Road Damage Detection and Classification Using Deep Neural Networks with Smartphone Images. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 1127–1141. [Google Scholar] [CrossRef]
  38. Jiang, Z.; Dong, Z.; Wang, L.; Jiang, W. Method for Diagnosis of Acute Lymphoblastic Leukemia Based on ViT-CNN Ensemble Model. Comput. Intell. Neurosci. 2021, 2021, 7529893. [Google Scholar] [CrossRef] [PubMed]
  39. Bashmal, L.; Bazi, Y.; Al Rahhal, M.M.; Alhichri, H.; Al Ajlan, N. UAV Image Multi-Labeling with Data-Efficient Transformers. Appl. Sci. 2021, 11, 3974. [Google Scholar] [CrossRef]
  40. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2021, 109, 43–76. [Google Scholar] [CrossRef]
  41. Shao, L.; Zhu, F.; Li, X. Transfer Learning for Visual Categorization: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1019–1034. [Google Scholar] [CrossRef] [PubMed]
Figure 1. High and steep slopes.
Figure 2. Real-world examples of slope surface cracks.
Figure 3. Workflow of the proposed deep learning approach for the identification of slope surface cracks.
Figure 4. Comparative illustrations before and after data augmentation.
Figure 5. A simple illustration of transfer learning.
Figure 6. Images in the concrete crack dataset that contain cracks.
Figure 7. Images in the concrete crack dataset that do not contain cracks.
Figure 8. Images in the soil and rock mass surface crack dataset that contain cracks.
Figure 9. Images in the soil and rock mass surface crack dataset that do not contain cracks.
Figure 10. Comparison of accuracy before data augmentation. (a) Accuracy of each model before data augmentation. (b) Highest accuracy of each model before data augmentation.
Figure 11. Comparison of the accuracy of each model before and after data augmentation.
Figure 12. Comparison of accuracy after data augmentation. (a) Accuracy of each model after data augmentation. (b) Highest accuracy of each model after data augmentation.
Figure 13. Comparison of the computing time of each model.
Figure 14. Comparison of accuracy before and after LeNet5 transfer learning.
Figure 15. Comparison of the accuracy of each model in identifying surface cracks of soil and rock masses.
Figure 16. Comparison of the accuracy of each model in identifying surface cracks of soil and rock masses and concrete cracks.
Figure 17. Comparison of the highest accuracy of each model in identifying surface cracks of soil and rock masses and concrete cracks.
Table 1. Software environment configurations used in this paper.
Software                | Details
OS                      | Windows 10 Professional
Programming language    | Python
Deep learning framework | PyTorch
Dependent libraries     | Torch, Torchvision, CUDA, PIL, etc.
Table 2. Hardware environment configurations used in this paper.
Hardware            | Details
CPU                 | Intel Xeon Gold 5118 CPU
CPU frequency (GHz) | 2.30
CPU cores           | 48
CPU RAM (GB)        | 128
GPU                 | Quadro P6000
GPU memory (GB)     | 24
CUDA cores          | 3840
CUDA version        | v9.0
Table 3. Total number of parameters and floating point operations for each model.
Model      | LeNet5   | AlexNet   | InceptionA | InceptionE | ResNet18  | MobileNet | Vision Transformer
Parameters | 61.33 K  | 56.12 M   | 29.75 M    | 697.84 M   | 11.18 M   | 3.21 M    | 85.80 M
FLOPs      | 657.35 K | 0.47 GMac | 0.73 GMac  | 59.59 GMac | 6.92 GMac | 0.58 GMac | 11.27 GMac
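The complexity figures in Table 3 are the kind of numbers produced by an automated model profiler; the GMac unit follows the convention of tools such as ptflops. The snippet below is a minimal sketch of how such counts can be reproduced for a torchvision model; the use of ptflops, the ResNet18 example, the two-class head, and the 224 × 224 input size are assumptions rather than details reported in the paper.

```python
# Minimal sketch (not the authors' script): counting parameters and
# multiply-accumulate operations (MACs) for a torchvision model.
from torchvision import models
from ptflops import get_model_complexity_info  # assumed profiling library

model = models.resnet18(num_classes=2)  # binary crack / no-crack classifier

# Exact parameter count directly from PyTorch.
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e6:.2f} M")

# MACs and parameters as human-readable strings (units: GMac / M);
# the 224 x 224 input resolution is an assumption.
macs, params = get_model_complexity_info(
    model, (3, 224, 224), as_strings=True, print_per_layer_stat=False
)
print(f"MACs: {macs}, Params: {params}")
```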
Table 4. Accuracy of crack identification before data augmentation.
Epoch | LeNet5 (%) | AlexNet (%) | InceptionA (%) | InceptionE (%) | ResNet18 (%) | MobileNet (%) | Vision Transformer (%)
1     | 98.600     | 99.312      | 98.513         | 97.588         | 95.875       | 98.013        | 94.263
2     | 98.812     | 99.375      | 98.875         | 99.000         | 99.562       | 99.025        | 94.588
3     | 98.912     | 99.575      | 99.237         | 99.263         | 99.638       | 99.400        | 94.388
4     | 99.000     | 99.688      | 99.263         | 99.275         | 99.200       | 99.638        | 94.513
5     | 98.800     | 99.125      | 99.312         | 98.862         | 99.688       | 98.950        | 94.912
6     | 98.737     | 99.650      | 99.287         | 99.075         | 99.700       | 99.525        | 94.588
7     | 99.025     | 99.675      | 99.213         | 99.013         | 99.575       | 99.513        | 94.662
8     | 99.050     | 99.150      | 99.100         | 99.150         | 99.750       | 99.588        | 92.463
9     | 99.138     | 99.675      | 99.362         | 99.188         | 99.438       | 99.388        | 94.237
10    | 99.037     | 99.800      | 99.138         | 99.100         | 99.388       | 99.537        | 94.263
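The per-epoch accuracies in Tables 4, 5, 7, and 8 are percentages of correctly classified test images. A generic evaluation helper of the following form would produce such values; it is a sketch of the standard procedure, not the authors' code, and the function name and device choice are illustrative.

```python
# Sketch of a standard accuracy evaluation loop: the fraction of correctly
# classified images in a test DataLoader, reported as a percentage.
import torch

@torch.no_grad()
def evaluate(model: torch.nn.Module, loader, device: str = "cuda") -> float:
    """Return the classification accuracy (%) of `model` over `loader`."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)  # predicted class per image
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total
```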
Table 5. Accuracy of each model after data augmentation.
Epoch | LeNet5 (%) | AlexNet (%) | InceptionA (%) | InceptionE (%) | ResNet18 (%) | MobileNet (%) | Vision Transformer (%)
1     | 96.875     | 98.838      | 93.912         | 71.050         | 86.625       | 83.412        | 80.912
2     | 96.475     | 98.350      | 98.662         | 80.787         | 90.763       | 87.725        | 78.375
3     | 97.050     | 99.287      | 98.463         | 91.425         | 95.375       | 83.825        | 81.075
4     | 97.675     | 99.537      | 98.700         | 96.850         | 96.812       | 88.088        | 80.537
5     | 96.412     | 99.638      | 98.850         | 98.138         | 92.338       | 89.450        | 86.550
6     | 97.675     | 99.463      | 99.050         | 98.112         | 95.800       | 86.075        | 80.388
7     | 97.062     | 99.700      | 99.175         | 98.388         | 99.025       | 95.713        | 85.062
8     | 98.088     | 99.562      | 99.175         | 98.412         | 95.600       | 90.125        | 87.812
9     | 97.950     | 99.612      | 99.125         | 98.438         | 92.700       | 96.612        | 82.888
10    | 98.100     | 99.675      | 99.200         | 98.562         | 96.612       | 95.112        | 77.550
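Tables 4 and 5 (together with Figure 4) contrast training with and without data augmentation. The exact transforms are not listed in this back matter, so the pipeline below is only a hedged sketch of a typical torchvision augmentation setup for crack images; the specific transforms, their parameters, and the dataset path are assumptions, not the configuration reported in the paper.

```python
# Hedged sketch of a typical torchvision augmentation pipeline for crack
# images; transforms, parameters, and paths are illustrative assumptions.
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# ImageFolder expects one subfolder per class, e.g. crack/ and no_crack/
# (hypothetical directory layout).
train_set = datasets.ImageFolder(
    "data/soil_rock_cracks/train", transform=train_transform
)
```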
Table 6. Computing time for each epoch of each model.
Epoch | LeNet5 | AlexNet | InceptionA | InceptionE | ResNet18 | MobileNet | Vision Transformer
1     | 1.34   | 3.40    | 4.76       | 53.39      | 6.62     | 4.31      | 15.02
2     | 1.33   | 3.32    | 4.30       | 53.24      | 6.73     | 4.25      | 15.02
3     | 1.33   | 3.32    | 4.30       | 53.24      | 6.70     | 4.21      | 15.05
4     | 1.34   | 3.31    | 4.28       | 53.30      | 6.69     | 4.37      | 15.00
5     | 1.33   | 3.33    | 4.29       | 53.25      | 6.70     | 4.51      | 15.00
6     | 1.33   | 3.33    | 4.26       | 53.25      | 6.69     | 4.54      | 15.01
7     | 1.33   | 3.35    | 4.31       | 53.26      | 6.71     | 4.54      | 14.99
8     | 1.34   | 3.33    | 4.29       | 53.29      | 6.78     | 4.54      | 15.00
9     | 1.32   | 3.36    | 4.25       | 53.26      | 6.69     | 4.53      | 14.98
10    | 1.32   | 3.32    | 4.31       | 53.27      | 6.72     | 4.52      | 15.00
Table 7. Accuracy comparison before and after LeNet5 transfer learning.
Epoch                             | 1 (%)  | 2 (%)  | 3 (%)  | 4 (%)  | 5 (%)  | 6 (%)  | 7 (%)  | 8 (%)  | 9 (%)  | 10 (%)
Accuracy before transfer learning | 72.360 | 75.000 | 72.500 | 72.500 | 70.000 | 72.500 | 72.500 | 72.500 | 72.500 | 72.500
Accuracy after transfer learning  | 96.101 | 97.129 | 97.203 | 97.908 | 97.809 | 97.859 | 98.094 | 98.069 | 97.686 | 97.314
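Table 7 compares LeNet5 trained from scratch on the small soil and rock crack dataset with LeNet5 initialized from weights pretrained on the concrete crack dataset. The following is a minimal PyTorch sketch of that weight-transfer and fine-tuning step; the network layout, checkpoint path, freezing strategy, and hyperparameters are illustrative assumptions, not the configuration used in the paper.

```python
# Hedged sketch of the transfer step behind Table 7: initialize a LeNet5-style
# classifier with weights pretrained on the concrete crack dataset, freeze the
# convolutional features, and fine-tune the classifier on the small soil and
# rock mass crack dataset.
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Classic LeNet5-style binary classifier for 32 x 32 RGB crack patches."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = LeNet5(num_classes=2)
# Hypothetical checkpoint produced by pretraining on the concrete crack dataset.
model.load_state_dict(torch.load("lenet5_concrete_pretrained.pt"))

# Freeze the convolutional feature extractor; only the classifier is retrained.
for param in model.features.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()
# ...followed by a standard training loop over the soil/rock crack DataLoader.
```

Whether all layers or only the classifier are fine-tuned is a design choice; freezing the convolutional features is one common option when the target dataset is small.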
Table 8. Comparison of the accuracy of each model in identifying surface cracks of soil and rock masses.
Epoch | LeNet5 (%) | AlexNet (%) | InceptionA (%) | InceptionE (%) | ResNet18 (%) | MobileNet (%) | Vision Transformer (%)
1     | 96.101     | 98.899      | 98.886         | 84.196         | 89.719       | 83.824        | 70.520
2     | 97.129     | 98.837      | 98.824         | 85.619         | 94.356       | 82.599        | 79.498
3     | 97.203     | 99.468      | 98.280         | 96.832         | 95.099       | 86.572        | 84.567
4     | 97.908     | 99.319      | 98.812         | 97.834         | 83.317       | 91.064        | 82.587
5     | 97.809     | 99.233      | 98.725         | 98.069         | 95.532       | 91.708        | 83.688
6     | 97.859     | 99.433      | 98.651         | 98.082         | 90.149       | 87.698        | 81.002
7     | 98.094     | 99.381      | 98.960         | 98.403         | 94.406       | 90.916        | 76.535
8     | 98.069     | 99.666      | 98.973         | 98.391         | 96.609       | 97.079        | 89.579
9     | 97.686     | 99.418      | 98.948         | 98.490         | 91.955       | 96.782        | 85.594
10    | 97.314     | 99.604      | 98.861         | 98.428         | 86.399       | 93.181        | 77.822
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
