Improved Dual-Module YOLOv8 Algorithm for Building Crack Detection
Abstract
1. Introduction
- Firstly, routine inspection of building cracks still depends mainly on traditional measurement technology [8]. Inspectors must measure the crack width manually at regular intervals or collect data with equipment such as a crack measuring instrument. This process is slow and costly, and it exposes inspectors to considerable safety risks. Moreover, manual measurement usually requires comparing the crack with the scale marks on a microscope, which often leads to large reading errors. Traditional automated methods are mainly based on digital image processing, including edge detection, threshold segmentation and wavelet transform, but their accuracy in identifying cracks is only moderate. As a result, many difficulties arise in practical engineering applications.
- Secondly, although single-stage, two-stage and transformer-based networks have made good progress in many fields, research on fine-grained semantic detection remains insufficient.
- Thirdly, research on crack detection has focused mainly on either image classification or image segmentation, and the combined use of the two is often overlooked. Such a combined method is particularly important for engineering managers who need to evaluate the severity of structural damage.
- The purpose of this paper is to emphasize the importance of detecting structural defects in buildings and take a closer look at the recent scientific progress and real-world applications in this field.
- This study discusses in depth the core principles, network structures and working steps of deep-learning systems, and lists the advantages and disadvantages of the different methods. Taking into account the particular characteristics of structural crack identification, we lay a theoretical foundation for building high-performance crack-recognition models.
- This work combines the Transformer and CNN architectures after analyzing their respective advantages and disadvantages. We use imaging technology to collect crack data and study various detection and parameter calculation methods. Using segmentation technology, we establish an image segmentation model that quickly and accurately separates the shape of the crack and calculates the crack width and other geometric features. This method provides an important foundation for crack detection, repair, evaluation and post-repair inspection.
2. Literature Research
2.1. Traditional Image-Based Detection Methods
2.2. Deep-Learning-Based Detection Methods
3. Improved YOLOv8 Network Design
3.1. YOLOv8 Network Architecture
3.1.1. Input
3.1.2. Backbone
- Conv: The Conv module consists of three main parts: a 2D convolution operation (Conv2d), batch normalization (BN) and a SiLU activation function. To pad the edges of the input data, it uses an automatic padding strategy called autopad(k, p).
- C2f: The design of the C2f module is inspired by the C3 module of YOLOv5 and the ELAN module of YOLOv7. By splitting the gradient flow, the developers merged the previously sequential Bottleneck components and optimized the module structure. This design not only prevents slow convergence as the network becomes deeper but also keeps the computational cost of the model low. Moreover, it retains richer gradient-flow information, which improves the performance of the YOLOv8 model.
- SPPF: Inspired by SPPNet, the SPPF module applies several small pooling kernels in sequence. It replaces the three max-pooling kernels of different sizes in the SPP module (5 × 5, 9 × 9 and 13 × 13) with three cascaded 5 × 5 max-pooling kernels, because two successive 5 × 5 poolings cover the same window as a single 9 × 9 pooling and three successive 5 × 5 poolings cover the same window as a single 13 × 13 pooling. Compared with applying large kernels directly, cascading several small kernels considerably decreases computational complexity and markedly enhances detection efficiency. Experimental results demonstrate that, under identical input conditions, the outputs of the SPP module and the SPPF module are consistent; however, the SPPF module runs at twice the speed of the SPP module. Figure 2 shows the structural settings of Conv, Bottleneck, C2f and SPPF.
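The equivalence claimed for SPPF can be checked numerically. The following is a minimal sketch, not the YOLOv8 implementation: it assumes stride-1 max pooling with implicit negative-infinity padding (so the output keeps the input size), and the helper name `max_pool_same` is hypothetical.

```python
import random


def max_pool_same(img, k):
    """Stride-1 max pooling over a k x k window; the window is clipped at the
    image border, which is equivalent to padding with -inf."""
    h, w, r = len(img), len(img[0]), k // 2
    return [
        [
            max(
                img[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


random.seed(0)
image = [[random.random() for _ in range(16)] for _ in range(16)]

# SPPF-style cascade: three sequential 5 x 5 poolings.
p1 = max_pool_same(image, 5)
p2 = max_pool_same(p1, 5)
p3 = max_pool_same(p2, 5)

# SPP-style parallel poolings with larger windows.
same_as_9 = p2 == max_pool_same(image, 9)
same_as_13 = p3 == max_pool_same(image, 13)
print(same_as_9, same_as_13)
```

Under these assumptions both comparisons hold exactly, and the cascade reuses the intermediate results `p1` and `p2` instead of scanning 9 × 9 and 13 × 13 windows from scratch, which is where the saving comes from.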
3.1.3. Neck
3.1.4. Head
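- CIoU Loss: The YOLOv8 model uses CIoU instead of IoU, which can better capture the details of small objects and improve the recognition accuracy of small targets. The ratios $w^{gt}/h^{gt}$ and $w/h$ represent the aspect ratios of the real bounding box and the predicted bounding box, respectively. In its standard form, the mathematical expression is as follows:
  $$L_{CIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$
  where $\rho(b, b^{gt})$ is the distance between the centers of the predicted and real boxes and $c$ is the diagonal length of the smallest box enclosing both.
- Varifocal Loss: This loss improves detection accuracy by learning an IoU-aware classification score (IACS), which combines object confidence and localization accuracy. The VFL gives priority to difficult positive samples, thus improving the overall object detection effect. The loss function is asymmetric: it treats positive samples and negative samples differently. Here q is the label: for positive samples, q represents the IoU value; for negative samples, q is set to 0. Therefore, this function uses Binary Cross-Entropy (BCE) weighted by q for positive samples and Focal Loss for negative samples. In its standard form, the formula is as follows:
  $$VFL(p, q) = \begin{cases} -q\left(q\log p + (1 - q)\log(1 - p)\right), & q > 0 \\ -\alpha p^{\gamma}\log(1 - p), & q = 0 \end{cases}$$
  where $p$ is the predicted IACS and $\alpha$ and $\gamma$ are the weighting hyperparameters applied to negative samples.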
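The two losses can be illustrated with a small numerical sketch. It follows the standard published definitions of CIoU and Varifocal Loss rather than the exact YOLOv8 training code. The box format (x1, y1, x2, y2), the function names, and the defaults `alpha=0.75` and `gamma=2.0` are assumptions made for illustration.

```python
import math


def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with positive width and height."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def ciou_loss(pred, gt):
    """1 - IoU + center-distance penalty + aspect-ratio penalty."""
    iou = iou_xyxy(pred, gt)
    # Squared distance between the two box centers.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 + (
        (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    ) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (
        math.atan((gt[2] - gt[0]) / (gt[3] - gt[1]))
        - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))
    ) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v


def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """p: predicted IoU-aware score in (0, 1); q: target IoU (0 for negatives)."""
    eps = 1e-9
    p = min(max(p, eps), 1 - eps)  # keep the logarithms finite
    if q > 0:
        # Positive sample: BCE weighted by the target IoU q.
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    # Negative sample: focal-style down-weighting of easy negatives.
    return -alpha * p ** gamma * math.log(1 - p)


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))
print(varifocal_loss(0.1, 0.0), varifocal_loss(0.9, 0.0))
```

A perfectly aligned prediction gives a CIoU loss of zero, while disjoint boxes are still penalized through the center-distance term, which plain IoU cannot express. For negatives, a confident false score (p = 0.9) costs far more than a low one (p = 0.1).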
3.2. Improve the Swin Transformer Module
3.2.1. Swin Transformer Framework
3.2.2. Optimizing Self-Attention in Shifted Windows
3.3. Improving the U-Net Model
3.3.1. Pixel-Level Attention
3.3.2. Spatial Attention
4. Experiment and Results
4.1. Experimental Environment and Training Configuration
4.2. Crack Detection Experiments Based on YOLOv8
4.2.1. Evaluation Metrics for Crack Detection
- Precision represents the proportion of correctly detected crack instances among all detection outcomes. In the confusion matrix, True Positive (TP) denotes crack instances that are correctly detected and classified, False Positive (FP) denotes non-matching or misclassified detections, and False Negative (FN) denotes missed crack instances. Precision can be calculated as follows:
  $$Precision = \frac{TP}{TP + FP}$$
- Recall measures the ability of the detection model to identify all actual crack instances. It is defined as the ratio of correctly detected crack instances to the total number of ground-truth crack instances. Its calculation formula is
  $$Recall = \frac{TP}{TP + FN}$$
- F1 score is the harmonic mean of Precision and Recall, providing a balanced measure of detection performance, particularly in scenarios with imbalanced crack categories. Its calculation formula is the following equation:
  $$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
- Evaluating the model only by precision and recall is too simplistic, because the two indicators affect each other. We therefore use the average precision (AP), the area under the precision-recall curve, as a comprehensive index of the detection ability of the model, and the mean average precision (mAP), which is the mean of the AP values calculated from the detection results for each category. In this study, mAP represents the average detection accuracy over all building crack categories, and the higher the value, the better the overall effect of the model. The calculation formula is
  $$AP = \int_{0}^{1} P(R)\,dR, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_{i}$$
  where $P(R)$ is the precision at recall $R$ and $N$ is the number of crack categories.
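The detection metrics above reduce to a few lines of arithmetic. The sketch below is illustrative only: the counts and per-class AP values are made up, the function names are hypothetical, and the AP routine uses all-point interpolation of the precision-recall curve, which is one common convention and may differ from the one used in the experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).
    `recalls` must be sorted in ascending order."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed cracks.
prf = precision_recall_f1(tp=8, fp=2, fn=2)
ap = average_precision([0.5, 1.0], [1.0, 0.5])
# Hypothetical per-class AP values, not results from this study.
m_ap = mean_average_precision({"horizontal": 0.9, "vertical": 0.8, "diagonal": 0.7})
print(prf, ap, m_ap)
```

Because precision and recall trade off against each other as the confidence threshold moves, AP summarizes the whole curve in one number, and mAP averages that number across crack categories.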
4.2.2. Dataset Creation
- Dimension Standardization: Neural networks usually require input images of a preset size. Dimension standardization is an important step in neural network training that ensures the consistency of all inputs. It uses scaling, cropping and similar operations to make all images uniform in size, so that they meet the requirements of the network structure and can be processed in batches. This can speed up training, improve accuracy, reduce the risk of over-fitting, and enhance the generalization ability of the model.
- Data Normalization: Data normalization is an important preparation step before training a neural network. Its goal is to map the input data to a unified numerical range, such as 0 to 1 or −1 to 1, which makes training more effective and lets the network learn faster. Firstly, normalizing the data makes the gradient descent algorithm more stable. During training, this algorithm reduces the error by adjusting the parameters of the model; if the range of the input data is too large, the gradients may become too large or too small, and training becomes unstable. Secondly, normalization helps the model converge faster. In unprocessed image data, the numerical ranges of different features may differ greatly, which causes some features to be updated too quickly while others are updated too slowly.
- Image Enhancement: Image enhancement is a series of image processing operations that improve image quality or highlight certain details. Applied before training, these operations can increase the quality and quantity of the image dataset, so that the model performs better and generalizes more widely. Standard techniques include adjusting luminance and contrast, applying geometric transformations, reducing resolution via downsampling, and performing data augmentation. Due to the limited number of training images in this study, image augmentation techniques were employed on the collected dataset. A sequence of operations, including luminance modification, mirroring, flipping, and rotation, was applied to the dataset images, significantly increasing the number of crack images. A schematic representation of the image enhancement process is depicted in Figure 8.
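The three preparation steps above can be sketched on a tiny grayscale image stored as nested lists. This is an illustration under stated assumptions, not the preprocessing pipeline used in the study: nearest-neighbour resizing, 8-bit input scaled to the range 0 to 1, brightness factors of 1.2 and 0.8, and all function names are hypothetical choices.

```python
def resize_nearest(img, out_h, out_w):
    """Dimension standardization by nearest-neighbour resampling."""
    h, w = len(img), len(img[0])
    return [
        [img[i * h // out_h][j * w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def normalize(img, max_value=255.0):
    """Data normalization: map pixel values to the range 0 to 1."""
    return [[px / max_value for px in row] for row in img]


def adjust_brightness(img, factor):
    """Luminance modification on a normalized image, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, px * factor)) for px in row] for row in img]


def mirror(img):
    """Left-right mirroring."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Top-bottom flipping."""
    return img[::-1]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus five augmented variants."""
    return [
        img,
        adjust_brightness(img, 1.2),
        adjust_brightness(img, 0.8),
        mirror(img),
        flip_vertical(img),
        rotate90(img),
    ]


raw = [[0, 255], [128, 64]]  # hypothetical 2 x 2 grayscale patch
sample = normalize(resize_nearest(raw, 4, 4))
variants = augment(sample)
print(len(variants), len(variants[0]), len(variants[0][0]))
```

For detection or segmentation training, the same geometric transformation must also be applied to the bounding boxes or masks, which this sketch leaves out.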
4.2.3. Performance Evaluation and Effect Analysis
4.2.4. Comparative Experiment
4.2.5. Ablation Experiments
4.3. Improving the Segmentation Algorithm of U-Net
4.3.1. Performance Evaluation Metrics for Segmentation Algorithms
- To evaluate the segmentation effect, we use three standard metrics: the Dice coefficient, the average symmetric surface distance (ASSD) and Intersection over Union (IoU). The Dice coefficient compares the predicted result with the ground truth and is calculated by the following formula:
  $$Dice = \frac{2\,|G_a \cap G_b|}{|G_a| + |G_b|}$$
  where $G_a$ and $G_b$ indicate the predicted segmentation region produced by the enhanced U-Net network and the corresponding ground truth segmentation, respectively.
- Average symmetric surface distance (ASSD). Its formula is as follows:
  $$ASSD = \frac{1}{|T_a| + |S_b|}\left(\sum_{a \in T_a} d(a, S_b) + \sum_{b \in S_b} d(b, T_a)\right)$$
  Here, $T_a$ and $S_b$ denote the sets of boundary points of the improved U-Net segmentation and the ground truth, respectively. $d(a, S_b)$ represents the minimum straight-line distance from point $a$ in set $T_a$ to all points in set $S_b$, and $d(b, T_a)$ refers to the minimum straight-line distance from point $b$ in set $S_b$ to all points in set $T_a$.
- IoU: Intersection over Union (IoU) is a standard performance measure in computer vision and image processing, used in applications such as object detection, image segmentation and localization. IoU quantifies the degree of overlap between two regions, usually the region found by the algorithm and the real region. “Intersection” refers to the overlapping part of the two regions (usually bounding boxes or segmentation masks), while “union” refers to the total area covered by the two regions. IoU ranges from 0 to 1, where 0 means no overlap at all and 1 means complete overlap. Generally speaking, the higher the IoU score, the more similar the two regions are and the better the task performance is. Its calculation formula is
  $$IoU = \frac{|G_a \cap G_b|}{|G_a \cup G_b|}$$
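The three segmentation metrics can be sketched on masks represented as sets of (row, column) pixel coordinates. The representation, the 4-neighbour boundary definition, and the function names are assumptions made for illustration. Distances are measured in pixels, and both masks are assumed to be non-empty.

```python
import math


def dice(pred, truth):
    """Dice coefficient of two pixel sets."""
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def iou(pred, truth):
    """Intersection over Union of two pixel sets."""
    return len(pred & truth) / len(pred | truth)


def boundary(mask):
    """Pixels of the mask with at least one 4-neighbour outside the mask."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c)
        for (r, c) in mask
        if any((r + dr, c + dc) not in mask for dr, dc in steps)
    }


def assd(pred, truth):
    """Average symmetric surface distance between the two mask boundaries."""
    ta, sb = boundary(pred), boundary(truth)

    def nearest(point, points):
        return min(math.dist(point, q) for q in points)

    total = sum(nearest(a, sb) for a in ta) + sum(nearest(b, ta) for b in sb)
    return total / (len(ta) + len(sb))


# Hypothetical example: a 3 x 3 prediction and the same square shifted one column.
pred = {(r, c) for r in range(3) for c in range(3)}
truth = {(r, c) for r in range(3) for c in range(1, 4)}
print(dice(pred, truth), iou(pred, truth), assd(pred, truth))
```

Dice and IoU reward overlap in area, whereas ASSD measures how far the predicted outline lies from the true outline, so a lower ASSD indicates a better boundary fit. The brute-force nearest-point search is quadratic in the number of boundary pixels and is only suitable for small masks.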
4.3.2. Dataset
4.3.3. Performance Testing
4.3.4. Crack Information Calculation
- Use the proposed model to locate the crack region and obtain the segmented image.
- Denoise the segmented crack image, removing clutter while retaining the real crack pixels. Then use the Canny edge detection algorithm to extract the contour of the crack.
- Calculate the crack width with the rotating slice method. This method traverses all pixels along the edge and divides the image into multiple slices at five intervals. For each slice, find the intersection with the other edges and record the number of pixels between these points as w_i (i = 1, 2, 3, …, 30). Take the minimum value among w_1, w_2, w_3, …, w_30 and record it as w_min, which represents the crack width at this position on the pixel map. Repeating these steps yields the width information w of each crack pixel on the skeleton map.
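The width step can be illustrated with a simplified sketch of the rotating-slice idea. It is not the authors' implementation: it works directly on a binary mask (1 = crack) instead of Canny contours, skips denoising, and assumes 30 slice orientations evenly spaced over 180 degrees. The function names are hypothetical.

```python
import math


def chord_length(mask, r, c, theta, max_steps=1000):
    """Count the distinct crack pixels on the line through (r, c) at angle theta."""
    h, w = len(mask), len(mask[0])
    dr, dc = math.sin(theta), math.cos(theta)
    pixels = {(r, c)}
    for sign in (1, -1):  # walk away from the centre in both directions
        for step in range(1, max_steps):
            rr = round(r + sign * step * dr)
            cc = round(c + sign * step * dc)
            if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                break  # left the image or crossed the crack edge
            pixels.add((rr, cc))
    return len(pixels)


def local_width(mask, r, c, n_slices=30):
    """w_min: the smallest chord over n_slices orientations, in pixels."""
    widths = [
        chord_length(mask, r, c, i * math.pi / n_slices) for i in range(n_slices)
    ]
    return min(widths)


# Hypothetical mask: a horizontal crack three pixels thick in a 7 x 40 image.
mask = [[1 if 2 <= row <= 4 else 0 for _ in range(40)] for row in range(7)]
width_px = local_width(mask, 3, 20)
print(width_px)
```

The minimum is taken because a slice aligned with the crack direction runs along its length, while the slice closest to the perpendicular gives the true local width. Converting the pixel count to a physical width additionally requires the image scale, which is outside this sketch.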
5. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhang, D.; Wu, H. Smart City Construction and Urban Green Development: An Analysis from the Perspectives of Industrial Structure and Technological Progress. Urban Plan. Constr. 2025, 3, 63–78.
- Chen, K.; Reichard, G.; Xu, X.; Akanmu, A. Automated crack segmentation in close-range building façade inspection images using deep learning techniques. J. Build. Eng. 2021, 43, 102913.
- Alexander, M.; Beushausen, H. Durability, service life prediction, and modelling for reinforced concrete structures—Review and critique. Cem. Concr. Res. 2019, 122, 17–29.
- Wang, N.; Zhao, X.; Zhao, P.; Zhang, Y.; Zou, Z.; Ou, J. Automatic damage detection of historic masonry buildings based on mobile deep learning. Autom. Constr. 2019, 103, 53–66.
- Yao, Y.; Tung, S.-T.E.; Glisic, B. Crack detection and characterization techniques—An overview. Struct. Control Health Monit. 2014, 21, 1387–1413.
- Krishnan, S.S.R.; Karuppan, M.K.N.; Khadidos, A.O.; Khadidos, A.O.; Selvarajan, S.; Tandon, S.; Balusamy, B. Comparative analysis of deep learning models for crack detection in buildings. Sci. Rep. 2025, 15, 2125.
- Koch, C.; Georgieva, K.; Kasireddy, V.; Akinci, B.; Fieguth, P. A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure. Adv. Eng. Inform. 2015, 29, 196–210.
- Jin, T.; Ye, X.-W.; Que, W.-M.; Wang, M.-Y. Automatic detection, localization and quantification of structural cracks combining computer vision and crowd sensing technologies. Constr. Build. Mater. 2025, 476, 141150.
- Ruggieri, S.; Cardellicchio, A.; Nettis, A.; Renò, V.; Uva, G. Using attention for improving defect detection in existing RC bridges. IEEE Access 2025, 13, 18994–19015.
- Abdel-Qader, I.; Abudayyeh, O.; Kelly, M.E. Analysis of Edge-Detection Techniques for Crack Identification in Bridges. J. Comput. Civ. Eng. 2003, 17, 255–263.
- Subirats, P.; Dumoulin, J.; Legeay, V.; Barba, D. Automation of Pavement Surface Crack Detection using the Continuous Wavelet Transform. In Proceedings of the 2006 International Conference on Image Processing, Atlanta, GA, USA, 8–11 October 2006; pp. 3037–3040.
- Wu, N.; Wang, Q. Experimental studies on damage detection of beam structures with wavelet transform. Int. J. Eng. Sci. 2011, 49, 253–261.
- Peng, B.; Jiang, Y.-S.; Pu, Y. Pavement Crack Detection Algorithm Based on Bi-Layer Connectivity Checking. J. Highw. Transp. Res. Dev. 2014, 8, 37–46.
- Wang, T.; Sun, Q.; Ji, Z.; Chen, Q.; Fu, P. Multi-layer graph constraints for interactive image segmentation via game theory. Pattern Recognit. 2016, 55, 28–44.
- Deng, H. Research on Segmentation Algorithm of Gray Inhomogeneous Image Based on Cauchy Distribution. In Proceedings of the 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT), Shenyang, China, 13–15 November 2020; pp. 287–291.
- Noh, Y.; Koo, D.; Kang, Y.M.; Park, D.; Lee, D. Automatic crack detection on concrete images using segmentation via fuzzy C-means clustering. In Proceedings of the 2017 International Conference on Applied System Innovation (ICASI), Sapporo, Japan, 13–17 May 2017; pp. 877–880.
- Oliveira, H.; Correia, P.L. Automatic Road Crack Detection and Characterization. IEEE Trans. Intell. Transp. Syst. 2013, 14, 155–168.
- Abdellatif, M.; Peel, H.; Cohn, A.G.; Fuentes, R. Combining block-based and pixel-based approaches to improve crack detection and localisation. Autom. Constr. 2021, 122, 103492.
- Premachandra, C.; Waruna, H.; Premachandra, H.; Parape, C.D. Image Based Automatic Road Surface Crack Detection for Achieving Smooth Driving on Deformed Roads. In Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics, Manchester, UK, 13–16 October 2013; pp. 4018–4023.
- Di Mucci, V.M.; Cardellicchio, A.; Ruggieri, S.; Nettis, A.; Renò, V.; Uva, G. Artificial intelligence in structural health management of existing bridges. Autom. Constr. 2024, 167, 105719.
- Wang, Q.; Mao, J.; Zhai, X.; Gui, J.; Shen, W.; Liu, Y. Improvements of YoloV3 for road damage detection. J. Phys. Conf. Ser. 2021, 1903, 012008.
- Zhijun, W.; Wenjing, L.; Liang, L.; Meng, Y. Low-Rate DoS Attacks, Detection, Defense, and Challenges: A Survey. IEEE Access 2020, 8, 43920–43943.
- Dung, C.V.; Anh, L.D. Autonomous concrete crack detection using deep fully convolutional neural network. Autom. Constr. 2019, 99, 52–58.
- Li, S.; Zhao, X. Image-based concrete crack detection using convolutional neural network and exhaustive search technique. Adv. Civ. Eng. 2019, 2019, 6520620.
- Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1525–1535.
- Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Autom. Constr. 2019, 104, 129–139.
- Al-Huda, Z.; Peng, B.; Algburi, R.N.A.; Al-antari, M.A.; Al-Jarazi, R.; Zhai, D. A hybrid deep learning pavement crack semantic segmentation. Eng. Appl. Artif. Intell. 2023, 122, 106142.
- Lin, Z.; Wang, H.; Li, S. Pavement anomaly detection based on transformer and self-supervised learning. Autom. Constr. 2022, 143, 104544.
- Wu, Z.; Lu, T.; Zhang, Y.; Wang, B.; Zhao, X. Crack Detecting by Recursive Attention U-Net. In Proceedings of the 2020 3rd International Conference on Robotics, Control and Automation Engineering (RCAE), Online, 5–8 November 2020; pp. 103–107.
- Li, H.; Liu, L.; Du, J.; Jiang, F.; Guo, F.; Hu, Q.; Fan, L. An Improved YOLOv3 for Foreign Objects Detection of Transmission Lines. IEEE Access 2022, 10, 45620–45628.
- Zhang, X.; Fan, M.; Hou, M. Mobilenet V3-transformer, a lightweight model for image caption. Int. J. Comput. Appl. 2024, 46, 418–426.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Zakeri, H.; Nejad, F.M.; Fahimifar, A. Image Based Techniques for Crack Detection, Classification and Quantification in Asphalt Pavement: A Review. Arch. Comput. Methods Eng. 2017, 24, 935–977.
- Solimani, F.; Cardellicchio, A.; Dimauro, G.; Petrozza, A.; Summerer, S.; Cellini, F.; Renò, V. Optimizing tomato plant phenotyping detection: Boosting YOLOv8 architecture to tackle data complexity. Comput. Electron. Agric. 2024, 218, 108728.
- da Silva Sobrinho, A.S.; Czeremuszkin, G.; Latrèche, M.; Wertheimer, M.R. Defect-permeation correlation for ultrathin transparent barrier coatings on polymers. J. Vac. Sci. Technol. A 2000, 18, 149–157.
- Hua, Z.; Aranganadin, K.; Yeh, C.C.; Hai, X.; Huang, C.Y.; Leung, T.C.; Hsu, H.Y.; Lan, Y.C.; Lin, M.C. A Benchmark Review of YOLO Algorithm Developments for Object Detection. IEEE Access 2025, 13, 123515–123545.
- Ye, Y.; Chen, Y.; Wang, R.; Zhu, D.; Huang, Y.; Huang, Y.; Liu, J.; Chen, Y.; Shi, J.; Ding, B.; et al. Image segmentation using improved U-Net model and convolutional block attention module based on cardiac magnetic resonance imaging. J. Radiat. Res. Appl. Sci. 2024, 17, 100816.
- Feng, S.; Zhou, H.; Dong, H. Using deep neural network with small dataset to predict material defects. Mater. Des. 2019, 162, 300–310.
- Bakirci, M. Advanced aerial monitoring and vehicle classification for intelligent transportation systems with YOLOv8 variants. J. Netw. Comput. Appl. 2025, 237, 104134.
- Ai, L.; Ziehl, P. Advances in Digital Twin Technology in Industry: A Review of Applications, Challenges, and Standardization. J. Intell. Constr. 2025, 3, 1–19.









| Hardware | Type |
|---|---|
| Graphics Card Model | RTX 3090 |
| Graphics Memory | 24 GB |
| CPU | i5-13500H |

| Model | Depth_Multiple | Width_Multiple | Params (M) | FLOPs (G) | FPS (V100) |
|---|---|---|---|---|---|
| YOLOv8n | 0.33 | 0.25 | 3.2 | 8.7 | 120+ |
| YOLOv8s | 0.33 | 0.50 | 11.2 | 28.6 | 90+ |
| YOLOv8m | 0.67 | 0.75 | 25.9 | 78.9 | 60+ |
| YOLOv8l | 1.00 | 1.00 | 43.7 | 165.2 | 45+ |
| YOLOv8x | 1.00 | 1.25 | 68.2 | 257.8 | 30+ |

| Model | mAP | Horizontal Cracks | Vertical Cracks | Diagonal Cracks | Other Cracks |
|---|---|---|---|---|---|
| ViT | 90.58 | 91.34 | 90.15 | 89.37 | 89.16 |
| Faster R-CNN | 86.18 | 86.61 | 86.87 | 89.64 | 85.19 |
| YOLOv3 | 85.27 | 86.38 | 86.17 | 85.97 | 83.64 |
| YOLOv5 | 88.98 | 88.65 | 89.28 | 88.19 | 85.45 |
| YOLOv7 | 92.17 | 93.48 | 93.81 | 92.37 | 89.77 |
| Proposed model | 97.14 | 99.08 | 98.37 | 96.98 | 96.17 |

| YOLOv8 | Swin Transformer | mAP |
|---|---|---|
| ✓ | ✗ | 90.17 |
| ✗ | ✓ | 92.28 |
| ✓ | ✓ | 97.14 |

| Segmentation Network | Dice | ASSD | IoU |
|---|---|---|---|
| U-Net | 86.28% | 1.0814% | 83.92% |
| Attention U-Net | 89.17% | 0.7698% | 85.89% |
| Ours | 91.95% | 0.5618% | 86.87% |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zuo, X.; Almutairi, A.D.; Saeed, M.K.; Dai, Y. Improved Dual-Module YOLOv8 Algorithm for Building Crack Detection. Buildings 2026, 16, 461. https://doi.org/10.3390/buildings16020461

