Rock Crack Recognition Technology Based on Deep Learning

The changes in cracks on the surface of a rock mass reflect the development of geological disasters, so such cracks are early signs of geological disasters such as landslides, collapses, and debris flows. For geological disaster research, it is crucial to gather crack information on rock mass surfaces swiftly and precisely. Drone videography surveys effectively avoid the limitations of terrain and have become an essential method in disaster investigation. This manuscript proposes a rock crack recognition technology based on deep learning. First, images of cracks on the surface of a rock mass obtained by a drone were cut into 640 × 640 tiles. Next, a VOC dataset for crack object detection was produced by enhancing the data with augmentation techniques and labeling the images using LabelImg, and the data were divided into test and training sets in a ratio of 2:8. The YOLOv7 model was then improved by combining it with different attention mechanisms; this study is the first to combine YOLOv7 and an attention mechanism for rock crack detection. Finally, the best rock crack recognition technology was obtained through comparative analysis. The results show that the improved model using the SimAM attention mechanism reaches 100% precision, 75% recall, and 96.89% AP, with a processing time of 10 s per 100 images, making it the optimal model compared with the other five models. Relative to the original model, precision improved by 1.67%, recall by 1.25%, and AP by 1.45%, with no decrease in running speed. This proves that rock crack recognition technology based on deep learning can achieve rapid and precise results, and it provides a new research direction for identifying early signs of geological hazards.


Introduction
Devastating natural hazards such as landslides are massive threats to lives and property around the globe, especially in mountainous regions [1]. In recent years, intensifying climate change has made extreme weather more frequent, providing favorable conditions for landslides [2]. The Guangdong-Hong Kong-Macao Greater Bay Area [3] is prone to storm surges, heavy rain, floods, and the geological disasters they trigger. With climate change and rapid urbanization, natural disasters in the Greater Bay Area are becoming increasingly interlinked and cascading, with complex mechanisms and great potential for harm. From 2014 to 2020, there were 1446 landslide and collapse disaster points in the Greater Bay Area, most of them distributed in the transition zone between hilly or shallow mountainous areas and plains, where human engineering activity is intense. Under the combined influence of its complex natural geographical environment, climatic conditions, and human activities, mountain disasters such as collapses, landslides, and mudslides are frequent in the Greater Bay Area.
When intact rock masses undergo weathering and geological processes, their strength decreases and tiny cracks form. Root cracks and ice cracks occur,


Study Area
The Guangdong-Hong Kong-Macao Greater Bay Area city cluster lies south of the Nanling Mountain Range, on the northern shore of the South China Sea, at 111°59′ E~115°28′ E, 21°56′ N~24°51′ N. The topography and landforms of the region consist mainly of terraces, hills, and plains. The interaction between ocean, land, and atmosphere is strong, and the climate is complex and changeable, with an average annual temperature of 22 °C and average annual rainfall of 2300 mm. This combination of climatic and topographic conditions has made the Guangdong-Hong Kong-Macao Greater Bay Area a geological-disaster-prone area.

Data Collection and Preprocessing
The image data used in this study were obtained via UAV photography. The richness and complexity of a dataset help with the iterative optimization of convolutional neural models, and rock cracks captured in different postures help the model learn efficiently and avoid overfitting. Therefore, we used drones to shoot the same rock crack from multiple postures and angles, enlarging the dataset on the one hand and enriching its features on the other, which improves the model's ability to identify objects. The images captured by the UAV were cropped to 640 × 640.
Many pictures were obtained by UAV photography in the field. From these, 200 images with different morphologies and complex backgrounds were selected to improve the representativeness of the sample data. To avoid model overfitting, the images were converted to grayscale, and the dataset was enriched with methods such as rotation, expanding it from 200 to 714 images. The expanded dataset is shown in Figure 1. This study then used LabelImg to annotate the images and obtain the VOC dataset for crack object detection. Finally, the data were divided into test and training sets in a ratio of 2:8 for training and testing.
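The tiling and the 2:8 split described above can be sketched in a few lines. This is a minimal illustration; `tile_coords` and `split_dataset` are hypothetical helper names, and the exact cropping and shuffling used in the study may differ.

```python
import random

def tile_coords(width, height, tile=640):
    """Top-left corners of 640 x 640 tiles covering a UAV image (edges clipped)."""
    xs = list(range(0, max(width - tile, 0) + 1, tile))
    ys = list(range(0, max(height - tile, 0) + 1, tile))
    return [(x, y) for y in ys for x in xs]

def split_dataset(items, test_ratio=0.2, seed=42):
    """Shuffle and split into train/test at the paper's 8:2 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]
```

For the 714-image dataset, `split_dataset(range(714))` yields 572 training and 142 test images.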

Detection Principle
YOLOv7 is a new member of the YOLO series of object detection algorithms, proposed by the team behind YOLOv4. To balance and enhance both speed and precision, YOLOv7 was meticulously designed in terms of network structure and re-parameterization.
The network structure of the YOLOv7 object detection model comprises an input module, a backbone module, and a neck module; the flow chart of the model is shown in Figure 2. The input module employs the same data enhancement methods as YOLOv5, including Mosaic data enhancement, Mixup data enhancement, adaptive anchor frame calculation, adaptive image scaling, and other techniques. The backbone network module has a total of 50 layers. First come four convolutional (CBS) layers. The data are then processed through an ELAN module. ELAN consists of multiple CBS layers; the scale of the input and output features remains the same, the number of channels differs in the first two CBS layers, the subsequent input channels match the output channels, and the last CBS layer outputs the desired number of channels. Next come three MP-1 + ELAN blocks. The MP-1 block is partitioned into a Maxpool branch and a CBS branch, whose outputs are concatenated into a single output. Finally, three feature layers are formed from the outputs of the three MP-1 + ELAN modules. The deep features obtained through the backbone network are input into the neck module. The entire neck module contains one SPPCSPC layer, two MPConv layers, two UPSample layers, four ELAN-1 layers, four CBS layers, four CatConv layers, and three REPConv layers. The features acquired by the backbone network pass through this series of neck computations to produce the final result.
In the input module, Mosaic data enhancement merges four images by arbitrarily scaling, cropping, and arranging them, enriching the backgrounds of detection objects and improving performance on small objects. Mixup data enhancement combines two random samples proportionally, with the classification targets distributed in the same proportion; it enhances the generalization ability of the network and reduces the memorization of false labels. Adaptive anchor frame calculation computes the optimal anchor frame values for each training set. Adaptive image scaling reduces the black bars that must be filled in after the image is scaled, significantly reducing computation during inference and enhancing object detection performance.
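Mixup as described above reduces to a proportional blend of two samples and their targets. The sketch below is an illustrative NumPy version, not YOLOv7's internal implementation; the function name and the `alpha` default are assumptions.

```python
import numpy as np

def mixup(img_a, img_b, labels_a, labels_b, alpha=0.2, rng=None):
    """Blend two samples; classification targets mix in the same proportion."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)                  # mixing coefficient in (0, 1)
    mixed = lam * img_a + (1.0 - lam) * img_b
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b
    return mixed, mixed_labels, lam
```

Because the same coefficient weights both images and labels, the mixed one-hot targets always sum to 1.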
In the backbone module, an MP structure formed by max pooling and convolution improves the backbone's ability to extract features. As a convolutional neural network deepens, information can vanish or explode across many layers; adding shorter connections between layers close to the input and layers close to the output allows the depth of the model to increase significantly while greatly improving training efficiency. Based on this idea, the ELAN structure was designed, giving the backbone an efficient aggregation network. The MP structure is shown in Figure 3 and the ELAN structure in Figure 4.
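The Maxpool branch of the MP block halves the spatial resolution by taking the maximum over each 2 × 2 window. A single-channel NumPy sketch of that pooling step (illustrative only, not the YOLOv7 code, which also has a strided-convolution branch):

```python
import numpy as np

def maxpool2x2(x):
    """2 x 2, stride-2 max pooling over a single-channel feature map."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2   # drop odd edge rows/cols
    x = x[:h, :w]
    # Group pixels into 2 x 2 blocks, then take the max within each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

A 4 × 4 input thus becomes a 2 × 2 output holding each block's maximum.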
In the neck module, the SPPCSPC structure is the spatial pyramid pooling structure first proposed in YOLOv7. It is a significant improvement over the earlier SPP structure, although its parameter and computation counts are also much larger. The ELAN structure, modified for the neck as ELAN-1, is also applied in the neck module. The REP structure performs structural re-parameterization and is another important improvement in the neck of YOLOv7; it makes YOLOv7 run faster without sacrificing precision. The SPPCSPC structure is shown in Figure 5, the ELAN-1 structure in Figure 6, and the REP structure in Figure 7.

Attention Mechanism
The human visual system naturally focuses on important regions of an image, and once the information from these regions is obtained, recognition and judgment are greatly accelerated. Introducing this idea into computer vision, where adaptive attention in convolutional neural networks matters, can improve both the precision and speed of YOLOv7. To date, attention mechanisms have developed into several categories, among which the channel attention mechanism, the spatial attention mechanism, and combined channel-spatial attention mechanisms are the most mainstream.

Channel attention mechanism: The CNN features of a two-dimensional image usually have three dimensions: length, width, and channel. To improve feature representation, the channel attention mechanism weights the channels of the convolutional feature. The SENet attention mechanism [26] is the pioneering work on channel attention. SENet is divided into a squeeze module and an excitation module, which collect global information, capture channel relationships, and improve the ability to extract features. However, SENet cannot directly model the correspondence between the weight vector and the input, which reduces the quality of the results. The ECANet attention mechanism [27] uses one-dimensional convolution to determine interactions between channels, rather than dimensionality reduction, providing a fast, efficient module that can easily be inserted into various convolutional neural networks. Other channel attention mechanisms include GSoP [28], SRM [29], GCT [30], and FcaNet [31].
Spatial attention mechanism: The STN attention mechanism [32] transforms the spatial information in the original image into another space through a spatial transformation while preserving the key information. Meanwhile, the pooling layers in a convolutional neural network use maximum or average pooling to compress the image information, reducing computation and improving precision. The GENet attention mechanism [33] exploits recalibration within the spatial domain to capture long-range spatial context. The ViT attention mechanism [34] was the first pure transformer architecture used for image processing and obtained results comparable to modern convolutional neural networks. Other spatial attention mechanisms include DCN [35], PSANet [36], and SASA [37].
Channel and spatial attention mechanism: This family combines the channel attention mechanism and the spatial attention mechanism, inheriting the advantages of both. The CBAM attention mechanism [38] consists of two sub-modules, CAM and SAM, which perform channel and spatial attention, respectively; it saves parameters and computing power while being easy to insert into existing network architectures. Other channel-spatial attention mechanisms include CA [39], SCNet [40], SCA-CNN [41], scSE [42], and Triplet Attention [43].
The SimAM attention mechanism [44] is a simple, parameter-free attention module based on neuroscience theory. When designing it, Lingxiao Yang's team at Sun Yat-sen University considered both channel and spatial attention and derived a parameter-free 3D attention mechanism. When attention mechanisms are at work, neurons are assigned weights according to their importance. Neurons exhibit spatial inhibition [45], i.e., more active neurons suppress the activity of surrounding neurons; moreover, the most informative neurons in the visual system show firing patterns distinct from those of ordinary neurons. From these two properties, the stronger a neuron's inhibitory effect, the greater its importance. The linear separability between the target neuron and the other neurons can be used to measure the energy of the target neuron, for which the following energy function is defined:

$$e_t(w_t, b_t, \mathbf{y}, x_i) = (y_t - \hat{t})^2 + \frac{1}{M-1}\sum_{i=1}^{M-1}(y_o - \hat{x}_i)^2 \tag{1}$$

where $\hat{t} = w_t t + b_t$ and $\hat{x}_i = w_t x_i + b_t$ are linear transforms of $t$ and $x_i$; $t$ and $x_i$ represent the target neuron and the other neurons in one channel of the input feature, respectively; $i$ indexes the spatial dimension; $M$ is the number of neurons in that channel; and $w_t$ and $b_t$ are the weight and bias of the transform. Minimizing this equation gives the linear separability between the target neuron and all other neurons. Using binary labels ($y_t = 1$ and $y_o = -1$) and adding a regularizer, the final energy function is

$$e_t = \frac{1}{M-1}\sum_{i=1}^{M-1}\bigl(-1 - (w_t x_i + b_t)\bigr)^2 + \bigl(1 - (w_t t + b_t)\bigr)^2 + \lambda w_t^2 \tag{2}$$

In theory, each channel has $M$ such energy functions. Equation (2) has the closed-form solution

$$w_t = -\frac{2(t - \mu_t)}{(t - \mu_t)^2 + 2\sigma_t^2 + 2\lambda}, \qquad b_t = -\frac{1}{2}(t + \mu_t)\,w_t \tag{3}$$

where $\mu_t$ and $\sigma_t^2$ are the mean and variance of all neurons in the channel except the target neuron $t$. Substituting $w_t$ and $b_t$ back into Equation (2) yields the minimal energy

$$e_t^{*} = \frac{4(\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda} \tag{4}$$

and $1/e_t^{*}$ can be used to represent the importance of each neuron.
The regulation of attention in the mammalian brain manifests as a scaling of neuronal responses, so a scaling operator is used for feature refinement: $\widetilde{X} = \mathrm{sigmoid}\!\left(\frac{1}{E}\right) \odot X$, where $E$ groups the minimal energies over all channel and spatial dimensions and the sigmoid restrains overly large values.
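The minimal-energy weighting above reduces to a few array operations per channel. The sketch below is a NumPy rendition following the formulation in the SimAM paper [44] for a single (C, H, W) feature map; the default $\lambda = 10^{-4}$ follows the value suggested there, and this is not the training code used in this study.

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM: weight each activation by sigmoid(1 / e_t*).

    x: feature map of shape (C, H, W); lam: regularization lambda.
    """
    c, h, w = x.shape
    n = h * w - 1                                   # M - 1 other neurons per channel
    mu = x.mean(axis=(1, 2), keepdims=True)         # per-channel mean
    d = (x - mu) ** 2                               # squared deviation of each neuron
    var = d.sum(axis=(1, 2), keepdims=True) / n     # per-channel variance estimate
    e_inv = d / (4.0 * (var + lam)) + 0.5           # 1 / e_t* (neuron importance)
    return x / (1.0 + np.exp(-e_inv))               # x * sigmoid(e_inv)
```

Since the sigmoid output lies in (0, 1), the refined map keeps each activation's sign while shrinking less important neurons more strongly, and no learnable parameters are introduced.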
Based on this, this paper introduces attention mechanisms into the YOLOv7 object detection model to obtain a faster and more precise detector. We selected five attention mechanisms to introduce into YOLOv7, trained each separately on the crack dataset we collected, obtained the respective object detection models, and then compared their performance to select the best.

Design Method
The backbone network of YOLOv7 is already well designed, and inserting attention mechanisms inside it can disrupt its integrity, making training less effective than before. Between the backbone and the neck of YOLOv7, there are three connecting feature layers; we insert the attention mechanisms there.
The network architecture of the model proceeds as follows: the image is input to the backbone network; the three feature layers output by the backbone are refined by the attention mechanism; the refined features pass through the head layer, which continues to output feature maps of different sizes; and the final result for image prediction is output after the RepVGG block and a convolution.
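The wiring described above can be sketched as follows. This is purely illustrative; every name (`backbone`, `attention`, `neck`, `head`, and the `p3`/`p4`/`p5` feature layers) is a hypothetical stand-in for the actual YOLOv7 modules, not the implementation itself.

```python
def forward_with_attention(backbone, attention, neck, head, image):
    """Refine the three backbone feature layers with attention before the neck.

    backbone(image) -> (p3, p4, p5): the three feature layers connecting
    backbone and neck; the same attention module is applied to each of them.
    """
    p3, p4, p5 = backbone(image)
    p3, p4, p5 = attention(p3), attention(p4), attention(p5)
    return head(neck(p3, p4, p5))
```

Because the attention wraps only the three backbone-to-neck connections, the backbone's internal structure stays untouched, matching the design choice above.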
In this study, YOLOv7 was used as a baseline to insert different attention mechanisms to improve and construct rock crack recognition technology based on deep learning. To increase the credibility of the model experiments, this study used YOLOv5 as a benchmark for comparison.

Performance Metrics
Four outcomes are possible for a result in object detection: TP indicates that the prediction is positive and the label is positive; FP indicates that the prediction is positive and the label is negative; FN indicates that the prediction is negative and the label is positive; and TN indicates that the prediction is negative and the label is negative.
Criteria for object detection models include precision, recall, average precision, and so on. Precision is the proportion of all results predicted as positive whose label is positive.
Recall is the proportion of all results labeled as positive that are predicted as positive.
Average precision: Precision and recall conflict in some extreme cases, so a PR curve is drawn to judge between them. Average precision (AP) is, simply put, the mean of the precision values along the PR curve; it is calculated as the integral (area) under the curve.
Time100 indicates the time required to process 100 images.
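The three accuracy metrics above can be computed as follows. This is a minimal sketch: the AP here is a simple area-under-curve sum over PR points, which may differ from the exact interpolation scheme used by the evaluation code in this study.

```python
def precision(tp, fp):
    """Fraction of positive predictions whose label is positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Fraction of positive labels that were predicted positive."""
    return tp / (tp + fn) if tp + fn else 0.0

def average_precision(pr_points):
    """Area under the PR curve; pr_points are (recall, precision) pairs
    sorted by increasing recall."""
    ap, prev_r = 0.0, 0.0
    for r, p in pr_points:
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

For example, a detector with 3 true positives, 1 false positive, and 1 false negative has precision 0.75 and recall 0.75.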

Parameters and Calculation
Param gives the number of model parameters, and FLOPs gives the amount of computation of the model; together they determine the efficiency of model training. Both are listed in Table 1. It can be seen that the SimAM attention mechanism does not increase the number of parameters after insertion into YOLOv7, while the other attention mechanisms increase the model parameters to varying degrees. An increase in parameters reduces training efficiency to some extent: an attention mechanism that adds parameters when inserted into the model inevitably reduces its computational efficiency. As a parameter-free attention mechanism, SimAM avoids this problem.
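For intuition about where Param and FLOPs come from, the counts for a single convolutional layer follow the standard formulas below. This is a generic illustration, not the accounting used to produce Table 1.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Weights of a k x k convolution: c_out filters of size c_in * k * k (+ biases)."""
    return c_out * (c_in * k * k + (1 if bias else 0))

def conv2d_flops(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate count for one forward pass over an h x w output map."""
    return c_out * h_out * w_out * (c_in * k * k)
```

A 3 → 64 channel, 3 × 3 convolution thus has 1792 parameters, and summing such terms over every layer gives a model's totals; a parameter-free module like SimAM contributes zero to the first sum.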

Training Loss
During model training, the loss of the model decreases with training. Figure 8 and Tables 2 and 3 show the loss decline during the training of each attention mechanism combined with the YOLOv7 model, including the total loss and the val loss. With the help of the various attention mechanisms, the rate of loss decline improved to varying degrees. According to Table 2, the SimAM attention mechanism shows a rapid loss decline in the early stage of training, and its loss curve quickly converges to a stable state. According to Table 3, the SimAM attention mechanism decreases faster than the other attention mechanisms at the beginning of training and achieves the best loss at the end.


Performance
After training, the trained parameters of each model can be obtained and used to test the object detection models. Performance metrics for object detection include precision, recall, average precision, and Time100. The PR curve of each model is shown in Figure 9, and the detection performance of each model is shown in Table 4. The baseline model, YOLOv7, reached 98.33% precision, and the models incorporating the CBAM, SE, and SimAM attention mechanisms increased precision to 100%. ECA and CA combined with YOLOv7 reduced precision somewhat, attaining only 96.83% and 96.61%, respectively.
The recall rate of the baseline model, YOLOv7, was 73.75%. The ECA attention mechanism produced the largest improvement after insertion into the original model, reaching 76.25%, followed by the SimAM attention mechanism at 75%; the CBAM attention mechanism matched the original YOLOv7. However, the CA and SE attention mechanisms reduced the recall rate of the model to 71.25% and 68.75%, respectively.
Average precision, which combines precision and recall, better demonstrates overall model performance. The SimAM attention mechanism yielded the largest improvement after insertion into YOLOv7, raising the model's AP from 95.44% to 96.89%, an increase of 1.45%. Except for the ECA attention mechanism, which lowered the AP of the model, the other attention mechanisms improved it to varying degrees.

Discussion
Comparing the models in this study, a comprehensive view shows that YOLOv5 performed worst. This study inserted five different attention mechanisms between the backbone module and the neck module of the YOLOv7 object detection model, affecting the efficiency of the detector to varying degrees. The results show that the biggest performance improvement came from the SimAM attention mechanism: it added no parameters when inserted into the model, and overall it improved performance more than the other four attention mechanisms. Arunabha et al. proposed DenseSPH-YOLOv5, improving YOLOv5 with CBAM, additional feature fusion layers, and a Swin-Transformer Prediction Head (SPH) [46]. Ziang Cao et al. combined the CBAM attention mechanism with YOLOv5 for persimmon detection, reaching a precision of 92.69% and an AP of 95.53%, which were 1.51% and 0.63% higher than the original model [47]; these increases are broadly consistent with the results of this study. The spatial attention map of the CBAM attention mechanism is generated by convolution, so that part of the mechanism is limited by its receptive field to a certain extent, and its insertion added the most parameters. In this study, combining SENet with YOLOv7 improved precision by 1.67% and AP by 0.77% over the original model. Zhenrong Deng et al. combined the YOLOv3 model with an improved anchor box algorithm and SENet attention mechanisms for small-scale face detection in outdoor security [48]; their new model was 2.2% better in precision and 1.4% better than the original model. The SE attention mechanism cannot directly model the correspondence between the weight vector and the input, which reduces the quality of the results to a certain extent.
The ECA attention mechanism module applies a 1 × 1 convolutional layer directly after the global average pooling layer, removing the fully connected layer. In this paper, it increased the speed of the object detection model to some extent, but precision decreased somewhat. Yuanyang Cao et al. combined the ECA attention mechanism with YOLOv5x for dynamic sheep counting [49]; compared with the original model, precision improved by 0.76%, similar to the results of this study. The differences arise on the one hand from the different datasets and on the other from the different locations at which the attention mechanism was introduced. The loss of the CA attention mechanism decreased rapidly during training, but precision was poor when the trained parameters were used for object detection.
This study only examined models derived by inserting five attention mechanisms at the same position in the original model. Because the design concepts, mechanisms, and principles of the various attention mechanisms differ, so do their characteristics; the insertion location and the training dataset therefore also have a significant impact on the efficacy of the model. Accordingly, the next study will evaluate the insertion position of attention mechanisms, combinations of attention mechanisms, and the impact of different datasets on object detection performance in order to obtain a superior rock mass crack detector.

Conclusions
The YOLOv7 object detection model surpasses the more widely used YOLOv5 object detection model in both precision and speed. Attention mechanisms make convolutional neural networks focus on the regions that need attention, improving the performance of trained models and increasing the speed and precision of detection. Based on this, we inserted five attention mechanisms, CBAM, SE, ECA, CA, and SimAM, into YOLOv7 and compared the performance of the resulting models to select the best attention mechanism of the five. To obtain the best rock crack object detection model, we combined that attention mechanism with YOLOv7.
After comparison, it is evident that SimAM is the best attention mechanism of the five. First, the number of parameters and amount of computation did not increase, so the burden of training and detection did not grow. Second, the loss decreased faster during training and the loss curve converged rapidly, meaning the model trained more quickly than before. Third, precision reached 100%, recall was second only to ECA, and average precision reached 96.89%, the maximum among the attention mechanisms tested. Taken together, SimAM has a tremendous advantage over the other attention mechanisms. In summary, this study integrated the SimAM attention mechanism with the YOLOv7 object detection model to form an enhanced detector. This enhanced YOLOv7 model was applied for the first time to the identification of rock mass cracks, an early symptom of geological catastrophes. The model attained excellent performance, yielding an object detection method for cracks in rock masses as an early indicator of geological disasters.
At present, object detection methods are relatively mature, but few studies have used deep learning to identify early signs of disasters. In this study, an improved YOLOv7 object detection method based on an attention mechanism was applied for the first time to the identification of rock cracks in geohazards. Rock crack recognition technology based on deep learning is faster and more precise at discerning the presence and location of fractures in images. There is still room for more refined and quantified measurement of early signs of geological disasters: if the development pattern of fissures is to be studied more deeply, parameters such as fissure width, length, and area could be computed in combination with image segmentation.