Article

A Study on the Identification of Cracks in Mine Subsidence Based on YOLOv8n Improvement

College of Coal Engineering, Shanxi Datong University, Datong 037003, China
*
Author to whom correspondence should be addressed.
Processes 2024, 12(12), 2716; https://doi.org/10.3390/pr12122716
Submission received: 26 October 2024 / Revised: 20 November 2024 / Accepted: 29 November 2024 / Published: 1 December 2024
(This article belongs to the Section Energy Systems)

Abstract

Ground cracks resulting from coal mining subsidence can pose significant risks to coal mine safety, production, and the surrounding ecological environment. The use of drone technology coupled with deep learning holds practical value for accurately identifying ground cracks in mining areas. This study introduces an enhanced target detection algorithm based on YOLOv8n for identifying ground cracks in aerial images captured by unmanned aerial vehicles (UAVs), where the cracks are small targets widely distributed in shrubland. The C2f module of the backbone network is improved by incorporating deformable convolution (DCN), enhancing the model’s ability to adapt to the shapes of detected objects, focus on small targets, and suppress irrelevant background information. Subsequently, the global attention mechanism (GAM) is integrated into the neck network to minimize feature information loss during training. Finally, the wise IOU (WIOU) loss function replaces the complete IOU (CIOU) loss function to improve the model’s generalizability. Experimental results demonstrate that the enhanced algorithm improves precision (P), recall (R), and mean average precision (mAP@0.5) by 3.5%, 4.8%, and 5.1%, respectively, compared to the original YOLOv8n model. Furthermore, compared with other prevalent algorithms, the enhanced algorithm notably reduces model parameters, floating point operations (FLOPs), and model size while maintaining high detection accuracy. In conclusion, the enhanced YOLOv8n-based algorithm offers practical benefits for detecting cracks in mining subsidence areas.

1. Introduction

Mineral resources play a crucial role in the economic and social development of a country [1]. Prolonged and intensive coal mining can lead to geological disasters such as surface collapse and ground cracks [2]. Ground cracks, which are common and highly dangerous geological disasters, can cause varying degrees of damage to the land and ecology of mining areas, resulting in property and economic losses and harm to the surface environment [3]. With the rapid development of coal mining technology and equipment in recent years, mines in western China, such as those in the Shendong Coalfield, have widely adopted efficient, high-yield mining methods such as extra-long working faces and top-coal caving, which have larger working face dimensions and greater advance rates than the conventional and fully mechanized mining methods used in the past. Mining such large working faces at faster advance rates inevitably leads to more violent movement of the overburden and the ground surface, which in turn causes ground cracks, damage to surface structures, and other disasters. For example, the surface of the Daliuta 1203 working face in the Shendong Mining Area developed stepped ground cracks that penetrated into the goaf, causing hazards such as water inrush, air leakage into the goaf, and sand inrush, as well as severe damage to the ground surface [4]. Currently, UAV low-altitude remote sensing technology offers new technical capabilities for monitoring geological disasters rapidly, efficiently, and with short survey cycles [5,6,7]. Moreover, applying deep learning to establish a crack database enables quick and accurate detection of areas affected by ground cracks, which is crucial for preventing and managing geological hazards associated with ground cracks in mines [8].
The rest of the paper is structured as follows: Section 2 summarizes the related work on detecting ground cracks by combining UAV aerial imagery with deep learning. Section 3 first describes the YOLOv8n network architecture and details of its key modules, then introduces the improved YOLOv8n detection model in this paper and details the structure and role of each improved module in the improved model. Section 4 first describes the dataset and the experimental environment. Then, ablation experiments, comparison experiments, and comparative analyses of the detection effect graphs are carried out to fully validate the feasibility of the method proposed in this paper. Section 5 discusses the findings presented throughout the paper and provides an outlook on future research directions. Section 6 summarizes the contents of this paper.

2. Related Work

Traditional methods of identifying ground cracks in mining areas include field surveys, Interferometric Synthetic Aperture Radar (InSAR), satellite remote sensing image interpretation, and UAV low-altitude remote sensing. As UAV low-altitude remote sensing has become more modular, miniaturized, and intelligent, it has become an essential tool for investigating and preventing geological disasters. Despite significant progress in UAV-based detection, existing methods still struggle to balance detection accuracy, model size, and detection speed. Deep learning, with its strong feature representation and recognition ability, can extract crack information in complex environments and is currently the most commonly used crack recognition approach.
Hou Enke et al. [9] compared the accuracy of UAV remote sensing and satellite remote sensing in interpreting ground cracks; their findings favored UAV remote sensing for investigating ground cracks in mining areas, although complex and extensive surface environments can still lead to missed details during visual interpretation. Wei Bowen et al. [10] proposed a modified F-FDOG algorithm to extract fine cracks in loess areas, acknowledging its limitations in handling nonlinear crack distributions. Zhong Jingtao [11] employed UAVs and the YOLOv3 algorithm for pavement crack detection, but environmental factors caused missed and false detections, reducing detection precision. Kang et al. [12] used a Faster R-CNN network to locate railroad insulators, after which a multi-task neural network composed of an encoder and a classifier identified and evaluated anomalies to obtain insulator surface crack information. Liu et al. [13] used U-Net to detect concrete cracks and chose the focal loss function as the model evaluation function; the trained U-Net could recognize cracks under different conditions, such as varied lighting and complex backgrounds. Bang et al. [14] extracted frames from vehicle black-box (dashcam) video and combined ResNet-152 with a decoder to successfully extract pavement crack information. Zhu Suya et al. [15] used a U-Net network to extract bridge crack information; the background clutter and pseudo-cracks appearing in the detection results were refined using a thresholding method combined with an improved Dijkstra connection algorithm, improving detection accuracy. Quintana et al. [16] transferred learned image features and designed classifiers for different images to detect asphalt versus concrete cracks. Ding Wei et al. [17] combined UAV technology with deep learning to localize concrete cracks; they could locate cracks accurately, but only in scenes with simple backgrounds.
Traditional research on identifying ground cracks in mining areas is mainly limited in two respects: first, ground crack data collection is easily constrained by time and safety considerations, resulting in incomplete monitoring and evaluation information on ground cracks in mining areas; second, ground crack identification technology remains outdated. Although the studies above managed to extract information about ground cracks, the irregular distribution of cracks on the ground surface and their small width, typically ranging from a few centimeters to a few meters, make such information difficult to extract with traditional methods. At the same time, the mining area environment changes drastically as geological disasters occur, and ground crack identification is easily affected by the complex environment of the mining area, which further degrades detection performance.
Thus, to address the above problems, this paper improves the algorithm based on YOLOv8n. First, the C2f module of the backbone network is improved by introducing deformable convolution (DCN), which learns offsets according to the scale and shape of the detected object during training so that the sampling locations fit the shape of the detected target as closely as possible, enhancing the model’s ability to adapt to object shapes. Next, the global attention mechanism (GAM) is introduced into the neck network, which reduces the loss of feature information during training and improves model performance. Finally, the wise IOU (WIOU) loss function replaces the complete IOU (CIOU) loss function to improve the model’s overall performance and generalization ability.
Building on these improvements, this paper uses UAV technology in place of manual collection of ground crack geohazard data in mining areas, addressing the low precision of small-target detection and the susceptibility of detection results to environmental interference in traditional research on mining subsidence ground crack identification. The study applies the enhanced model to the surface collapse area of the 8207 working face of the Dongzhouyao Coal Mine, part of the Jinneng Holding Coal Group. Ground crack information is gathered from UAV aerial images to construct a dataset. The experimental platform runs Windows 10, with PyTorch 2.0 as the deep learning framework, Python 3.9, and CUDA 11.8 for computational acceleration. The improved YOLOv8n model is used for ground crack identification to verify its practical effectiveness in the complex environment of bushes.

3. Improved YOLOv8 Network Architecture

3.1. YOLOv8 Network Architecture

The YOLOv8 algorithm builds upon the foundation of the YOLOv5 algorithm and is released in five model sizes: n, s, m, l, and x. It offers enhanced efficiency and accuracy in target detection compared with other algorithms and is particularly adept at recognizing features in UAV aerial images.
The YOLOv8 model is structured with four key components: the input layer, the backbone network (backbone), the feature enhancement module (neck), and the output head [18].
The input layer receives the image to be detected and feeds it into the network. The backbone extracts image features at multiple scales, the neck integrates these diverse features, and the head produces the final prediction results. In YOLOv8, the backbone replaces the C3 module of YOLOv5 with the C2f module and retains the spatial pyramid pooling-fast (SPPF) module, ensuring a lightweight design without compromising detection accuracy across scales [19]. For the neck, YOLOv8 adopts the Path Aggregation Network and Feature Pyramid Network (PAN-FPN) structure, maintaining the PAN concept while modifying the upsampling convolutional structure used in YOLOv5; here, too, the C3 module is replaced with the C2f module. These network structure adjustments optimize efficiency and accuracy in target detection, particularly in UAV image analysis [20]. The output head employs a decoupled head structure that separates detection and classification tasks and uses an anchor-free framework to reduce computational complexity and accelerate detection [21]. The loss function integrates two components: varifocal loss (VFL) for classification and a combination of CIOU loss and distribution focal loss (DFL) for regression [22]. Figure 1 illustrates the improved network framework of this study’s model.
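For reference, the following minimal sketch (assuming the `ultralytics` Python package, which provides the reference YOLOv8 implementation) builds the stock YOLOv8n model and lists its layers, making the backbone/neck/head layout described above concrete; the attribute path and layer indices may differ between package versions.

```python
# Minimal sketch: inspect the stock YOLOv8n layer layout (backbone -> neck -> head).
# Assumes the `ultralytics` package; the .model.model attribute path reflects its
# current DetectionModel layout and may change between releases.
from ultralytics import YOLO

model = YOLO("yolov8n.yaml")              # build the nano variant from its config
for i, layer in enumerate(model.model.model):
    # Typical entries: Conv, C2f, SPPF (backbone), Upsample/Concat/C2f (neck), Detect (head)
    print(i, layer.__class__.__name__)
```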

3.2. Improvements to the Backbone Network

Improvement of the C2f Module

Conventional convolution typically utilizes a convolutional layer to extract features from the input image using a fixed-size rectangular box at a specific location. However, this method only roughly localizes the target and does not fully conform to its morphology. Consequently, it can be affected by irrelevant background information, leading to decreased model accuracy during target feature extraction. In UAV aerial photography scenes focused on detecting ground cracks, which often present complex backgrounds and small targets, the use of traditional convolutional methods may compromise the quality of feature information, thereby affecting detection effectiveness. To address these issues, deformable convolution is introduced. This technique allows for adjustments in offset to better accommodate the scale and shape of the detected object during model training, thus overcoming the limitations associated with fixed rectangular box sampling in traditional convolutional methods.
First, ordinary convolution, such as a 3 × 3 convolution kernel, is considered. This kernel comprises nine samples, and the output feature formula for traditional convolution is as follows:
$$y(P_0) = \sum_{P_n \in R} w(P_n) \cdot x(P_0 + P_n)$$
where R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)} is the regular sampling grid, Pn is a position in R, x is the input feature map, P0 is a position on the output feature map, and w(Pn) is the convolution kernel weight at position Pn.
Deformable convolution extends traditional convolution by introducing an offset adjustment at each sampling point across the input image. Figure 2 illustrates a schematic of deformable convolution.
Figure 3 illustrates the deformable convolution process. Initially, features are extracted from the input image; a sampling offset field with 2N channels is then obtained, which yields the offset matrix ΔPn for the sampled points [23].
The deformable convolutional output feature formula is as follows:
$$y(P_0) = \sum_{P_n \in R} w(P_n) \cdot x(P_0 + P_n + \Delta P_n)$$
DCNv2 further enhances DCNv1 by incorporating weight values for each sampling point, aimed at reducing extraneous information during the feature extraction process. The formula for the DCNv2 output feature is:
$$y(P_0) = \sum_{P_n \in R} w(P_n) \cdot x(P_0 + P_n + \Delta P_n) \cdot \Delta m_n$$
where Δmn is the weighting factor.
In this study, the C2f module at the end of the backbone network has been upgraded to include the C2f_DCNv2 convolution module, enhancing the model’s ability to prioritize small targets. Figure 4 depicts a diagram of the C2f module featuring the incorporation of DCNv2.
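The sketch below is a hedged PyTorch illustration of a DCNv2-style (modulated deformable) convolution of the kind the C2f_DCNv2 module builds on. It uses torchvision's `deform_conv2d` operator; the layer layout, initialization, and channel sizes are illustrative assumptions, not the authors' exact module.

```python
# Hedged sketch of a modulated deformable (DCNv2-style) convolution:
# one plain conv predicts the offsets (Delta P_n) and modulation scalars (Delta m_n),
# which are then fed to torchvision's deform_conv2d. Requires torchvision >= 0.9.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DCNv2(nn.Module):
    def __init__(self, c_in, c_out, k=3, stride=1, padding=1):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(c_out, c_in, k, k))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.bias = nn.Parameter(torch.zeros(c_out))
        # Predicts 2*k*k offsets (dx, dy per sample) plus k*k modulation scalars.
        self.offset_mask = nn.Conv2d(c_in, 3 * k * k, k, stride, padding)
        nn.init.zeros_(self.offset_mask.weight)
        nn.init.zeros_(self.offset_mask.bias)
        self.k, self.stride, self.padding = k, stride, padding

    def forward(self, x):
        om = self.offset_mask(x)
        offset = om[:, : 2 * self.k * self.k]
        mask = torch.sigmoid(om[:, 2 * self.k * self.k :])  # Delta m_n in (0, 1)
        return deform_conv2d(x, offset, self.weight, self.bias,
                             stride=self.stride, padding=self.padding, mask=mask)

# Quick shape check on a dummy feature map.
y = DCNv2(64, 64)(torch.randn(1, 64, 80, 80))  # -> torch.Size([1, 64, 80, 80])
```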

3.3. Improving the Neck Network

Introducing the GAM Attention Module

GAM is a global attention mechanism designed to enhance features across global dimensions while minimizing loss of feature information to improve model recognition performance. The GAM structure is depicted in Figure 5 [24]. It comprises two modules: channel attention and spatial attention. The channel attention module, illustrated in Figure 6, employs a three-dimensional configuration to preserve information across dimensions, followed by amplifying spatial dependencies among cross-dimensional channels using a two-layer multilayer perceptron (MLP). The spatial attention module, shown in Figure 7, integrates spatial information by means of two convolutional layers to focus on spatial details.
In the channel attention module, the feature map is first reordered by permutation to convert C × H × W into W × H × C. After this transformation, the feature map is passed through the MLP. It then undergoes an inverse permutation and is finally processed using the sigmoid activation function to produce the channel attention feature map. In the spatial attention module, two 7 × 7 convolutional layers are employed to merge spatial information, resulting in the spatial attention feature map through sigmoid activation function processing [25].
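To make the two submodules concrete, here is a hedged PyTorch sketch of a GAM block consistent with the description above (a channel MLP applied to a permuted tensor, followed by two 7 × 7 convolutions for spatial attention). The reduction ratio r = 4 and the use of batch normalization follow the published GAM design and are assumptions here, not values reported by the authors.

```python
# Hedged GAM sketch: channel attention via permute + two-layer MLP + sigmoid,
# then spatial attention via two 7x7 convolutions + sigmoid.
import torch
import torch.nn as nn

class GAM(nn.Module):
    def __init__(self, c, r=4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(c, c // r), nn.ReLU(inplace=True), nn.Linear(c // r, c))
        self.spatial = nn.Sequential(
            nn.Conv2d(c, c // r, 7, padding=3), nn.BatchNorm2d(c // r), nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, 7, padding=3), nn.BatchNorm2d(c))

    def forward(self, x):
        # Channel attention: C,H,W -> W,H,C, MLP over the channel dim, inverse permute, sigmoid.
        att = self.channel_mlp(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        x = x * torch.sigmoid(att)
        # Spatial attention: two 7x7 convolutions, then sigmoid.
        return x * torch.sigmoid(self.spatial(x))

y = GAM(128)(torch.randn(1, 128, 40, 40))  # -> torch.Size([1, 128, 40, 40])
```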

3.4. Improving the Loss Function

The current YOLOv8 bounding box regression loss function utilizes the CIOU function, computed as follows:
$$L_{CIOU} = 1 - IOU + \varphi\beta + \frac{\gamma^{2}\left(P, P^{gt}\right)}{D^{2}}$$
$$IOU = \frac{|A \cap B|}{|A \cup B|}$$
$$\varphi = \frac{\beta}{(1 - IOU) + \beta}$$
$$\beta = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$$
where P is the center point of the prediction box; Pgt denotes the center point of the ground-truth box; φ is the parameter used to balance the ratio; β reflects the consistency of the aspect ratios; γ is the Euclidean distance between the two center points; D is the diagonal distance of the smallest closure region containing both the ground-truth and predicted boxes; w and h are the width and height of the prediction box; and wgt and hgt are the width and height of the ground-truth box.
The CIOU loss helps the model capture the overall shape of the detection target. However, it considers only the distance between the centers of the two bounding boxes, so it cannot accurately assess how well the boundaries match, cannot respond effectively to changes in the target’s shape, and does not account for the impact of low-quality samples on model performance.
Therefore, WIOUv3 is adopted for the bounding box regression loss. WIOUv3 incorporates a dynamic nonmonotonic focusing mechanism and a reasonable gradient gain allocation strategy, reducing the competitiveness of high-quality anchor frames and mitigating harmful gradients from low-quality data. This enables the model to prioritize average-quality samples, thereby enhancing overall model performance [26].
WIOU first develops WIOUv1, which incorporates a two-layer attention mechanism that uses distance as a metric to enhance model generalizability. The coordinates (x, y) and (xgt, ygt) represent the center positions of the prediction frame and the target frame, respectively, while wc and hc denote the width and height of the smallest enclosing rectangle of the prediction frame and the target frame. RWIOU is the distance-based attention term that scales the IOU loss of the anchor frame. The equation for WIOUv1 is as follows:
$$L_{WIOUv1} = R_{WIOU} \cdot L_{IOU}$$
Among them:
$$R_{WIOU} = \exp\left(\frac{\left(x - x^{gt}\right)^{2} + \left(y - y^{gt}\right)^{2}}{\left(w_{c}^{2} + h_{c}^{2}\right)^{*}}\right)$$
β is defined as the nonmonotonic focusing factor, computed as the ratio of the current anchor frame’s IOU loss (with $L_{IOU}^{*}$ detached from the computational graph, as indicated by the asterisk) to its running mean $\overline{L_{IOU}}$:
$$\beta = \frac{L_{IOU}^{*}}{\overline{L_{IOU}}} \in [0, +\infty)$$
To effectively counteract detrimental gradients originating from low-quality data, WIOUv3 was formulated building upon WIOUv1 by applying a gradient gain γ to the WIOUv1 loss:
$$L_{WIOUv3} = \gamma L_{WIOUv1}, \qquad \gamma = \frac{\beta}{\delta \alpha^{\beta - \delta}}$$
where α and δ are hyperparameters that can be adjusted for application to different models.
Hence, to mitigate the pronounced harmful gradients from lower-quality data, the CIOU loss function was substituted with the WIOUv3 loss function in experiments to enhance the model’s generalization capability. Additionally, in UAV aerial photography scenes for target detection tasks, the presence of small targets complicates detection. WIOUv3 dynamically optimizes the loss weights for small targets, thereby enhancing model detection performance [27].
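As a concrete illustration, the following is a minimal PyTorch sketch of a WIOUv3-style bounding box loss assembled from the formulas above. The box format (x1, y1, x2, y2), the running IOU-loss mean passed in as `iou_mean`, and the hyperparameters α = 1.9 and δ = 3 (values suggested in the original WIoU paper) are assumptions for illustration; the authors' exact settings are not stated here.

```python
# Hedged sketch of a WIoU v3-style bounding-box regression loss.
import torch

def wiou_v3_loss(pred, target, iou_mean, alpha=1.9, delta=3.0, eps=1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2); iou_mean: running mean of L_IOU."""
    # Intersection over union and the plain IOU loss.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    l_iou = 1.0 - iou

    # R_WIOU: squared centre distance over the size of the smallest enclosing box
    # (the denominator is detached, matching the asterisk in the formula above).
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wc = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hc = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    r_wiou = torch.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2)
                       / (wc ** 2 + hc ** 2 + eps).detach())

    # Nonmonotonic focusing: beta = L_IOU* / mean(L_IOU), gamma = beta / (delta * alpha**(beta - delta)).
    beta = l_iou.detach() / (iou_mean + eps)
    gamma = beta / (delta * alpha ** (beta - delta))
    return (gamma * r_wiou * l_iou).mean()
```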

4. Experimental Results and Analysis

4.1. Overview of the Study Area

The experimental site is located in the mining subsidence area of the 8207 working face at Dongzhouyao Coal Mine, which belongs to the Jinneng Holding Coal Group. The working face is situated about 160 m southwest of Yanghuling village and around 560 m east of Donghongya village. The ground surface at the working face is covered with loess, and the terrain exhibits a generally higher elevation in the northeast, sloping lower towards the southwest. This area features a loess hilly terrain with gentle topographic variations. Diagonally crossing the southwest side of the working face are two high-voltage lines, and no large buildings are visible in the vicinity.
The mine extends 15.8 km from east to west and 14.4 km from north to south. Mining occurs between elevations of 1544.9 m and 700 m. The total mining area covers 101.4129 km2, and the mining depth is 389 m. The mineable strike length is 940 m, and the mining width is 231 m. Site investigation and comprehensive analysis of the 8207 working face show that, as the coal seam is mined, the overlying rock gradually breaks and collapses. The resulting surface movement forms ground cracks that are mainly concentrated near the open-off cut of the 8207 working face and are distributed perpendicular to the advancing direction of the working face.

4.2. Experimental Dataset

The aerial photography plan was determined based on the topography of the study area, the prevailing meteorological conditions, and the requirements of this study. A DJI Mavic 3 Pro drone was used for low-altitude data collection, operating at a relative altitude of 20 m with 80% forward overlap and 70% side overlap during the aerial survey of the experimental area conducted on 18 October 2023, by which time the working face had advanced 650 m. The flight path is illustrated in Figure 8.
The dataset consists of images obtained from the drone survey of the experimental area. To optimize model performance, the brightness and contrast of the samples were adjusted, resulting in a collection of 489 images of ground cracks that meet the clarity required for detection. The images were randomly split into test, validation, and training sets at a ratio of 1:2:7 and then labeled using LabelImg, with annotations saved in .txt format.
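A minimal sketch of such a 7:2:1 split is given below; the directory layout and file extensions are assumptions for illustration, not the authors' actual file organization.

```python
# Hedged sketch of a 7:2:1 train/val/test split of the labelled crack images.
# Paths are placeholders; assumes every image has a matching LabelImg .txt label.
import random, shutil
from pathlib import Path

random.seed(0)
images = sorted(Path("dataset/images").glob("*.jpg"))   # assumed image location
random.shuffle(images)
n = len(images)
splits = {"train": images[: int(0.7 * n)],
          "val":   images[int(0.7 * n): int(0.9 * n)],
          "test":  images[int(0.9 * n):]}
for name, files in splits.items():
    for img in files:
        label = Path("dataset/labels") / (img.stem + ".txt")
        for src, sub in ((img, "images"), (label, "labels")):
            dst = Path(f"dataset/{sub}/{name}")
            dst.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst / src.name)
```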

4.3. Experimental Environment and Parameter Configuration

This experiment was conducted using the PyTorch deep learning framework (version 2.0), Python 3.9, and CUDA 11.8 on a Windows 10 platform. The hardware setup includes a 12th Gen Intel(R) Core(TM) i5-12400F processor and an NVIDIA GeForce RTX 3060 GPU. The YOLOv8n model was employed for training the detection of ground cracks in mining areas. The input images were sized at 1080 pixels × 1080 pixels, with a training batch size of 8. Training consisted of 500 epochs, starting with a learning rate of 0.01 and a momentum factor of 0.937.
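For reference, the configuration above maps onto the ultralytics training API roughly as follows. The dataset YAML name is a placeholder, and this sketch trains the stock YOLOv8n rather than the authors' modified network; a modified model YAML would be supplied in the same way.

```python
# Hedged sketch of the reported training configuration using the ultralytics API.
# Argument names (data, epochs, imgsz, batch, lr0, momentum, device) follow the
# ultralytics documentation; "ground_cracks.yaml" is a placeholder dataset config.
from ultralytics import YOLO

model = YOLO("yolov8n.yaml")   # or a modified YAML including C2f_DCNv2 and GAM
model.train(
    data="ground_cracks.yaml",
    epochs=500,
    imgsz=1080,
    batch=8,
    lr0=0.01,                  # initial learning rate
    momentum=0.937,
    device=0,                  # NVIDIA GeForce RTX 3060
)
```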

4.4. Experimental Evaluation Indicators

The detection performance of the enhanced algorithm was assessed in the context of ground crack identification in real mining environments. This study employed precision (P), recall (R), mean average precision (mAP), the number of parameters (Params), floating point operations (FLOPs), frames per second (FPS), and model size as metrics to evaluate model performance. Detection effectiveness can be comprehensively assessed using the following formulas:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
True positive (TP) refers to the number of ground cracks correctly identified, and false positive (FP) refers to the number of instances where the background is mistakenly identified as ground cracks. False negative (FN) refers to instances where ground cracks are misidentified as background.
The mean average precision (mAP) represents the average AP across all detection categories, where ‘n’ represents the number of categories and mAP@0.5 represents the mAP at an IOU threshold of 0.5. The formula for mAP is as follows:
$$mAP = \frac{1}{n}\sum_{i=1}^{n}\int_{0}^{1} P(R)\,dR$$
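As a small worked example, the sketch below computes P and R from illustrative TP/FP/FN counts (the counts are made up, not taken from the experiments); with a single "ground crack" class, mAP@0.5 reduces to the AP of that class at an IOU threshold of 0.5.

```python
# Hedged sketch: precision and recall from TP/FP/FN counts (illustrative numbers).
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    p = tp / (tp + fp) if tp + fp else 0.0   # fraction of detections that are real cracks
    r = tp / (tp + fn) if tp + fn else 0.0   # fraction of real cracks that are detected
    return p, r

# Example with made-up counts: 84 cracks found correctly, 15 false alarms, 15 missed.
p, r = precision_recall(84, 15, 15)
print(f"P = {p:.3f}, R = {r:.3f}")           # P = 0.848, R = 0.848
```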

4.5. Results and Analysis of Ablation Experiments

To assess the enhancement of the proposed improvement algorithm on the performance of the target detection algorithm for UAV aerial images, three modification points were incorporated into the original YOLOv8n model for ablation experiments: the introduction of deformable convolution, the introduction of the GAM module, and the application of the WIOU loss function. The effects of each module on the algorithm were evaluated, with the results presented in Table 1.
In Table 1, YOLOv8n represents the original model. Configuration A involves substituting the CIOU loss function with the WIOU loss function, which enhances precision by 0.9% and mAP@0.5 by 2.2% compared to the original model. However, R and FPS decreased by 2% and 35.3%, with the number of parameters, the FLOPs, and the model size remaining unchanged. Configuration B introduces the GAM attention mechanism to YOLOv8n, improving P by 1.5%, R by 0.5%, and mAP@0.5 by 0.4%, without affecting the model size, while the number of parameters and the FLOPs increase by 0.01 M and 0.2 G, respectively, and FPS decreased by 18.4%. Configuration C implements deformable convolution, which boosts P, R, mAP@0.5, and FPS by 3.4%, 4%, 3.8%, and 1.6%, respectively, maintaining the model size consistent with the original. However, the FLOPs decrease by 0.1 G, and the number of parameters increases by 0.03 M. Configuration D improves the model by introducing deformable convolution and replacing the CIOU loss function, maintaining P unchanged but enhancing R by 0.7% and mAP@0.5 by 0.3%, with the number of parameters, the FLOPs, and the model size equivalent to those of Configuration C, but FPS decreased by 0.5%. Configuration E enhances the model by introducing deformable convolution and the GAM attention mechanism, improving P, R, and mAP@0.5 by 1.4%, 2%, and 1.6%, respectively. In contrast, the number of parameters, the FLOPs, and the model size increased by 0.04 M, 0.1 G, and 0.1 MB, respectively, and FPS decreased by 7.3%. The model presented in this paper enhances P, R, mAP@0.5, and FPS by 3.5%, 4.8%, 5.1%, and 12.1%, respectively, compared to YOLOv8n. The number of parameters and model size increase by only 0.04 M and 0.1 MB, respectively, while FLOPs are reduced by 0.1 G. The advantages of these improvements are incorporated into our model, which enhances detection accuracy and real-time performance without significant changes in parameter count, FLOPs, or model size, indicating the model’s efficacy in detection.

4.6. Comparison Results and Analysis with Other Algorithms

To evaluate the performance and advantages of the improved algorithm proposed in this study for the detection of ground cracks in mining areas, it was compared with mainstream algorithms Fast R-CNN, SSD, YOLOv3-tiny, YOLOv5s, YOLOv7, YOLOv7-tiny, YOLOv8n, and YOLOv10n under the same training dataset, experimental environment, and parameter configuration, with the results displayed in Table 2.
The table shows that the algorithm enhanced in this study improves R by 3.3%, 7.2%, 7%, 4.9%, 1.2%, 6%, 4.8%, and 22.8% compared with the other eight algorithms (Fast R-CNN, SSD, YOLOv3-tiny, YOLOv5s, YOLOv7, YOLOv7-tiny, YOLOv8n, and YOLOv10n, respectively), although its P is lower than that of the YOLOv3-tiny and YOLOv5s algorithms. mAP@0.5 is improved by 6.4%, 3.7%, 4.7%, 3.7%, 2.5%, 4.8%, 5.1%, and 22.3% over the same eight algorithms, respectively. Although the YOLOv10n model has fewer parameters, fewer total floating point operations, and a smaller model size than the improved model in this paper, and its FPS reaches 60.01 f·s−1, its P, R, and mAP@0.5 are all lower than those of the improved model. In summary, compared with mainstream algorithms, the enhanced model in this study reduces computation and model size while improving accuracy, showing clear advantages in detecting ground cracks in mining areas, enabling high-precision detection, and holding significant practical value.
To visually demonstrate the superiority of the enhanced algorithm in this paper, Figure 9, Figure 10 and Figure 11 compare the training precision, recall, and mAP@0.5 curves between the original YOLOv8n model and the improved YOLOv8n model, respectively. Both models achieve convergence after 500 iterations, with the precision, recall, and mAP@0.5 metrics of the improved model in this paper surpassing those of the original model.

4.7. Target Detection Results and Analysis of the Improved Algorithm in This Paper

In order to further verify the advantages of the improved algorithm in this paper, images of small and densely distributed cracks in bushes under bright and dim lighting conditions are selected for comparing the model training effects. Included are the original image of ground cracks in the mining area, the recognition results from the original YOLOv8n model, and the outcomes from the improved model presented in this paper.
Figure 12 and Figure 13 show the detection of ground cracks in the mine under bright light conditions. In Figure 12, the original YOLOv8n model and the improved model proposed in this study are used to detect fine cracks in the experimental area. Compared with the improved model, the original YOLOv8n model produces missed detections and false detections, whereas the enhanced model exhibits higher confidence levels, provides more accurate information on the direction and distribution of the ground cracks, and produces no missed or false detections. In Figure 13, both models are employed to identify densely distributed ground cracks in the experimental area; the improved model achieves higher detection accuracy, particularly for smaller ground crack targets within complex environments.
Figure 14 and Figure 15 show the detection of ground cracks in the mine under dim lighting conditions, again using the original YOLOv8n model and the improved model proposed in this study to detect fine and densely distributed ground cracks in the experimental area. In Figure 14, the original YOLOv8n model suffers from false detections and missed detections, identifying extraneous background as ground cracks and failing to detect fine ground cracks, and its confidence is lower than that of the improved model. In Figure 15, the confidence of the improved model is likewise significantly higher than that of the original YOLOv8n model.

5. Discussion

In recent years, with the rapid development of UAV low-altitude remote sensing technology, its application in the field of geologic hazards has become increasingly extensive. In addition, the combination of low-altitude remote sensing technology from drones and deep learning can also help to detect ground cracks accurately. For example, Xu et al. [28] used an uncrewed aerial vehicle (UAV) to collect images of coal mine ground cracks. They proposed an improved YOLOv8 instance segmentation network for the automatic and efficient detection of ground cracks in coal mining areas in complex environments. Chen et al. [29] proposed a ground crack detection method based on deep learning and multi-scale map convolution. The method utilizes the multi-scale convolution technique to obtain and fuse the high-level and low-level information of ground cracks, realizing the effective extraction of ground crack information. Although the above methods can extract ground crack information and have been applied in practice, the model performance has yet to be validated under different lighting conditions and in a dataset of fine ground cracks.
The study area of this paper is the 8207 working face of the Dongzhouyao Coal Mine, Zuoyun County, Datong City, Shanxi Province, and the experiment focuses on detecting mining subsidence cracks in a shrubland environment. The experiment also expands the ground crack dataset by adjusting image brightness to simulate different lighting conditions in the mining area, so that the trained model better matches field conditions. Most of the ground cracks in the study area are located in bushes, and the UAV images of ground cracks contain much irrelevant background, which increases detection difficulty during training. Degradation of image quality during training also makes it difficult to accurately extract the subtle features of ground cracks and reduces the detection accuracy of the model. To address these problems, this paper proposes an improved crack detection model for mine subsidence based on YOLOv8n. The model achieved 84.9%, 84.5%, 89%, and 55.25 f·s−1 for P, R, mAP@0.5, and FPS, respectively. The experimental results show that the gains in P, mAP@0.5, and FPS are achieved with essentially no increase in parameters or model size, which highlights the effectiveness and practicality of the improved algorithm for detecting ground cracks in bushes under different lighting conditions. UAV low-altitude remote sensing is used to quickly and efficiently acquire images of ground cracks in mining areas, providing a rich, high-quality dataset for ground crack detection. The UAV is also highly flexible and maneuverable and can easily cover the complex terrain of the mining area, further improving the comprehensiveness and accuracy of the inspection. Combining UAV low-altitude remote sensing with deep learning enables efficient and accurate detection of ground cracks in mining areas and provides timely, practical information to support safe production.
The results demonstrate the improved algorithm’s excellent detection performance, surpassing other mainstream detection algorithms in the field. However, because the sample data used for training were limited, the improved model needs further refinement for practical applications. The diversity of the dataset could be extended to include different types of geological structures and surface features, as well as a broader range of meteorological conditions. The deployment of the model in practical engineering applications could also be explored, with further work on making the algorithm more lightweight and capable of real-time analysis to enable real-time monitoring and early warning.

6. Conclusions

This paper introduces an enhanced YOLOv8n detection algorithm tailored for mining subsidence ground crack images to address the challenge of detecting ground cracks effectively in shrubland. Initially, DCNv2 is integrated at the end of the backbone network to enhance the model’s capability to focus on small targets. Subsequently, the GAM is incorporated during the feature fusion stage to further refine detection accuracy. Finally, the WIOU loss function replaces the CIOU loss function in the training process.
The experimental results demonstrate that the improved model achieves 84.9%, 84.5%, 89%, and 55.25 f·s−1 for precision, recall, mAP@0.5, and FPS, respectively, all higher than those of the original model, with minimal change in the number of parameters, FLOPs, and model size. Furthermore, compared with other mainstream algorithms, the enhanced algorithm markedly reduces the number of parameters, total floating point operations, and model size while maintaining effective detection performance.
In summary, the improved algorithm developed in this study is designed to detect cracks in mining subsidence areas captured by UAVs. However, given the limited training data used in our experiments, further enhancements are required before the improved model can be fully applied in practice.

Author Contributions

Writing—original draft preparation, L.Z.; writing—review and editing, X.L. and L.Z.; software, X.L. and L.Z.; investigation, S.H., Q.Y. and M.W.; funding acquisition, J.W.; data curation, X.L., L.Z., S.H., Q.Y., J.W. and M.W. All authors have read and agreed to the published version of the manuscript.

Funding

2022 Datong Science and Technology Plan Project (grant number: 2022005); Postgraduate Education Innovation Project of Shanxi Province (grant number: 2022Y766). “Research on the crushing mechanism and coal release law of gangue top coal in the synthesized discharge of extra-thick coal seam”, Shanxi Datong University project (grant number: 2024008).

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, L.K. Research on Characteristics of Mining Surface Cracks Based on UAV Images. Master’s Thesis, Xi’an University of Science and Technology, Xi’an, China, 2021. [Google Scholar]
  2. Zhang, J.Y.; Wang, K.; Zhao, T.B.; Fang, P.; Qi, K.; Wei, B.W.; Li, Z.Y. Status and development of UAV remote sensing technology in mining surface subsidence and fracture measuring. Coal Sci. Technol 2024, 11. [Google Scholar] [CrossRef]
  3. Chang, Y.C.; Chen, H.T.; Chuang, J.H.; Liao, I.C. Pedestrian Detection in Aerial Images Using Vanishing Point Transformation and Deep Learning. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 1917–1921. [Google Scholar]
  4. He, X. Research on Coordinated Failure Mechanism of Overburden-Surface and Damage Reduction in Shendong Mining Area. Ph.D. Thesis, China University of Mining and Technology, Beijing, China, 2021. [Google Scholar]
  5. Božić-Štulić, D.; Marušić, Ž.; Gotovac, S. Deep learning approach in aerial imagery for supporting land search and rescue missions. Int. J. Comput. Vis. 2019, 127, 1256–1278. [Google Scholar] [CrossRef]
  6. Lian, X.G.; Han, Y.; Liu, X.Y.; Hu, H.F.; Cai, Y.F. Study Progress and Development Trend of Mine Geological Disaster Monitoring by UAV Low-altitude Remote Sensing. Met. Mine 2023, 1, 17–29. [Google Scholar]
  7. Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Vehicle detection from UAV imagery with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6047–6067. [Google Scholar] [CrossRef] [PubMed]
  8. Cheng, L.B. Research on Landslide Disaster Detection Model Based on Deep Learning. Master’s Thesis, Yunnan Normal University, Kunming, China, 2021. [Google Scholar]
  9. Hou, E.K.; Zhang, J.; Xie, X.S.; Xu, Y.N. Contrast application of unmanned aerial vehicle remote sensing and satellite remote sensing technology relating to ground surface cracks recognition in coal mining area. Geol. Bull. China 2019, 38, 443–448. [Google Scholar]
  10. Wei, B.W.; Liu, G.X.; Wang, Z.H. Extracting Ground Fissures in Loess Landform Area Using Modified F-FDOG Algorithm and UAV Images. Surv. Mapp. 2018, 41, 51–56+61. [Google Scholar]
  11. Zhong, J.T. Pavement Distress Detection and Segmentation Using Convolutional Neural Networks with Images Captured via UAV. Master’s Thesis, Southeast University, Nanjing, China, 2024. [Google Scholar]
  12. Kang, G.; Gao, S.; Yu, L.; Zhang, D. Deep architecture for high-speed railway insulator surface defect detection: Denoising autoencoder with multitask learning. IEEE Trans. Instrum. Meas. 2018, 68, 2679–2690. [Google Scholar] [CrossRef]
  13. Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Autom. Constr. 2019, 104, 129–139. [Google Scholar] [CrossRef]
  14. Bang, S.; Park, S.; Kim, H.; Kim, H. Encoder–decoder network for pixel-level road crack detection in black-box images. Comput. Aided Civ. Infrastruct. Eng. 2019, 34, 713–727. [Google Scholar] [CrossRef]
  15. Zhu, S.Y.; Du, J.C.; Li, Y.S.; Wang, X.P. Method for bridge crack detection based on the U-Net convolutional networks. J. Xidian Univ. 2019, 46, 35–42. [Google Scholar]
  16. Quintana, M.; Torres, J.; Menéndez, J.M. A simplified computer vision system for road surface inspection and maintenance. IEEE Trans. Intell. Transp. Syst. 2015, 17, 608–619. [Google Scholar] [CrossRef]
  17. Ding, W.; Yu, K.; Shu, J.P. Method for detecting cracks in concrete structures based on deep learning and UAV. China Civ. Eng. J. 2021, 54 (Suppl. S1), 1–12. [Google Scholar] [CrossRef]
  18. Xie, J.; Deng, Y.M.; Wang, R.M. Improved YOLOv8s traffic sign detection algorithm. Comput. Eng. 2024, 50, 338–349. [Google Scholar]
  19. Pan, W.; Wei, C.; Qian, C.Y.; Yang, Z. Improved YOLOv8s model for small object detection from perspective of drones. Comput. Eng. Appl. 2024, 60, 140–150. [Google Scholar]
  20. Cheng, H.X.; Qiao, Q.Y.; Luo, X.L.; Yu, S.J. Object detection algorithm for UAV aerial image improved by YOLOv8. Radio Eng. 2024, 54, 871–881. [Google Scholar]
  21. Zhang, Y.; Chen, Y.J. Improved YOLOv8 algorithm for Small object detection on water surface. Comput. Syst. Appl. 2024, 33, 152–161. [Google Scholar]
  22. Dou, Z.; Gao, H.R.; Liu, G.Q.; Chang, B.-F. Small sample steel plate defect detection algorithm of lightweight YOLOv8. Comput. Eng. Appl. 2024, 60, 90–100. [Google Scholar] [CrossRef]
  23. Leng, R.X. Application of Foreign Objects Identification of Transmission Lines Based on YOLOv8 Algorithm. Master’s Thesis, Northeast Agricultural University, Harbin, China, 2023. [Google Scholar]
  24. Li, W.; Yao, X.M.; Zhang, P.C.; Yang, W.W.; Li, Y.F. Research on improved YOLO-V7 steel surface defect detection algorithm. Mech. Sci. Technol. Aerosp. Eng. 2024, 1–10. [Google Scholar] [CrossRef]
  25. He, Y.T.; Che, J.; Wu, J.M.; Ma, P.S. Research on pedestrian multi-object tracking algorithm under OMC Framework. Comput. Eng. Appl. 2024, 60, 172–182. [Google Scholar]
  26. Su, Y.; Tang, F.Q.; Li, J.X.; Wang, C.; Zhang, X.-Y. Improvement of YOLOv7 model for intelligent recognition of mining surface cracks. Saf. Coal Mines 2024, 55, 1–9. [Google Scholar]
  27. Li, Y.; Fan, Q.; Huang, H.; Han, Z.; Gu, Q. A modified YOLOv8 detection network for UAV aerial image recognition. Drones 2023, 7, 304. [Google Scholar] [CrossRef]
  28. Xu, Z.; Lin, Y.; Zhang, Z. FS_YOLOv8: A Deep Learning Network for Ground Fissures Instance Segmentation in UAV Images of the Coal Mining Area. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2024, 48, 777–785. [Google Scholar] [CrossRef]
  29. Chen, W.; Zhong, C.; Qin, X.; Wang, L. Deep Learning Based Intelligent Recognition of Ground Fissures. In Intelligent Interpretation for Geological Disasters: From Space-Air-Ground Integration Perspective; Springer Nature: Singapore, 2023; pp. 171–233. [Google Scholar]
Figure 1. Improved YOLOv8n network structure diagram.
Figure 2. Schematic diagram of deformable convolution.
Figure 3. Deformable convolution realization process.
Figure 4. Improved C2f module diagram.
Figure 5. GAM attention module.
Figure 6. Channel attention.
Figure 7. Spatial attention.
Figure 8. UAV flight path design.
Figure 9. Comparison of precision during training.
Figure 10. Comparison of recall during training.
Figure 11. Comparison of mAP@0.5 during training.
Figure 12. The effect of detecting small ground cracks under bright light conditions.
Figure 13. Detection effect of the dense distribution of ground cracks under bright light conditions.
Figure 14. The effect of detecting small ground cracks under dim light conditions.
Figure 15. The effect of detecting densely distributed ground cracks under dim light conditions.
Table 1. Ablation test.

Model | DCNv2 | GAM | WIOU | P/% | R/% | mAP@0.5/% | FPS/f·s−1 | Params/M | FLOPs/G | Model Size/MB
YOLOv8n | - | - | - | 0.814 | 0.797 | 0.839 | 49.26 | 3.01 | 8.1 | 6.5
A | - | - | ✓ | 0.823 | 0.777 | 0.861 | 31.84 | 3.01 | 8.1 | 6.5
B | - | ✓ | - | 0.829 | 0.802 | 0.843 | 40.16 | 3.02 | 8.3 | 6.5
C | ✓ | - | - | 0.848 | 0.837 | 0.877 | 50.08 | 3.04 | 8.0 | 6.5
D | ✓ | - | ✓ | 0.814 | 0.804 | 0.842 | 49.01 | 3.04 | 8.0 | 6.5
E | ✓ | ✓ | - | 0.828 | 0.817 | 0.855 | 45.66 | 3.05 | 8.2 | 6.6
Ours | ✓ | ✓ | ✓ | 0.849 | 0.845 | 0.89 | 55.25 | 3.05 | 8.0 | 6.6
Table 2. Comparative experimental results.

Comparison Models | P/% | R/% | mAP@0.5/% | FPS/f·s−1 | Params/M | FLOPs/G | Model Size/MB
Fast R-CNN | 0.839 | 0.812 | 0.826 | 49.62 | 40.0 | 207.1 | 113.6
SSD | 0.844 | 0.773 | 0.853 | 51.22 | 24.2 | 274.9 | 97.1
YOLOv3-tiny | 0.851 | 0.775 | 0.843 | 57.47 | 121.28 | 18.9 | 24.5
YOLOv5s | 0.863 | 0.796 | 0.853 | 49.85 | 7.01 | 15.8 | 15.1
YOLOv7 | 0.833 | 0.833 | 0.865 | 33.62 | 37.2 | 105.1 | 74.8
YOLOv7-tiny | 0.844 | 0.785 | 0.842 | 48.02 | 6.01 | 13.2 | 12.3
YOLOv8n | 0.814 | 0.797 | 0.839 | 49.26 | 3.01 | 8.1 | 6.5
YOLOv10n | 0.67 | 0.617 | 0.667 | 60.01 | 2.26 | 6.5 | 5.8
Ours | 0.849 | 0.845 | 0.89 | 55.25 | 3.01 | 8.0 | 6.6
