Deep Learning-Based Intelligent Detection Algorithm for Surface Disease in Concrete Buildings
Abstract
1. Introduction
2. Related Work
3. The Algorithms in This Study
3.1. Overall Improvements Based on the YOLOv8 Network
3.2. Backbone Network Improvements
3.3. Improvements to the Head Network
3.3.1. Introducing Label Assignment Positioning Alignment Tasks
3.3.2. Adding Shared Convolution and Scaling Features Using Scale Layer
3.3.3. Replace Batch Normalization (BN) with Group Normalization (GN)
3.3.4. Dynamic Selection of Interaction Features
3.4. Improvements to the Neck Network
4. Experimentation and Analysis
4.1. Experimental Environment and Dataset
4.2. Data Training
- (1) Input Image Size. The input image size determines how much image information the model processes in each training step, and it directly affects the input layer and the complexity of subsequent feature extraction. An image that is too small may lose information, while one that is too large incurs excessive computational cost and memory usage. A size of 640 × 640 is a common choice that balances detail against computational requirements in many tasks.
- (2) Optimizer. The choice of optimizer affects both the speed of convergence and the final performance of the model. SGD is one of the most basic optimization algorithms and is simple to implement. Although more complex optimizers exist (e.g., Adam and RMSprop), SGD is still preferred for many standard tasks because of its stability and simplicity.
- (3) Momentum Factor. Momentum accelerates the convergence of SGD by weighting the historical gradient information, which helps the optimizer escape local minima. A momentum factor that is too small may make convergence too slow, while one that is too large may introduce oscillations. A value of 0.8 is used here, which usually provides stable convergence.
- (4) Number of Freezes. A frozen layer is a network layer whose weights remain unchanged during training. Freezing is usually used in transfer learning to avoid modifying the feature extraction part of the pretrained model. The number of frozen layers directly affects the training speed and effectiveness of the model, and too many frozen layers may prevent the model from adapting to the new task. Experiments showed that a freeze count of 100 retains most of the feature extraction ability of the pretrained network.
- (5) Batch Size. The batch size determines the number of samples used in each gradient update. Larger batches usually speed up training but require more memory, while smaller batches may make training noisier and less stable. Batch sizes of 16 and 8 were chosen as a balance under the available computational resources.
- (6) Learning Rate. The learning rate controls the step size of each gradient update. A learning rate that is too high can make training unstable, while one that is too low slows convergence. A value of 0.001 is a common starting point, which is then adjusted to find the optimal value.
- (7) Number of Defrosts (Unfreezing). The unfreeze count refers to the gradual unfreezing of frozen layers during training so that they can be fine-tuned. The unfreezing strategy can significantly affect training effectiveness, and unfreezing too early or too late may lead to poor results. Experiments showed that an unfreeze count of 200 helps the model adapt gradually to the new task while retaining the original features.
- (8) Total Cycles. The total number of training cycles sets how long the model is trained. Too few cycles may leave the model under-trained, while too many may lead to overfitting.
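The settings described above can be collected into a small configuration sketch. The example below is a hedged illustration in plain Python, not the authors' training script. The names `TrainConfig`, `stage_for_epoch`, and `sgd_momentum_step` are hypothetical, and the split into a frozen stage followed by an unfrozen stage, each with its own batch size and learning rate, is an assumption read from the order of the columns in the training parameter table. The momentum update follows the common formulation v ← μv + g, w ← w − ηv.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training settings listed in this section (field names are hypothetical)."""
    input_size: tuple = (640, 640)
    optimizer: str = "SGD"
    momentum: float = 0.8
    freeze_count: int = 100      # "Number of Freezes"
    freeze_batch_size: int = 16
    freeze_lr: float = 0.001
    unfreeze_count: int = 200    # "Number of Defrosts"
    unfreeze_batch_size: int = 8
    unfreeze_lr: float = 0.0001
    total_cycles: int = 300


def stage_for_epoch(cfg: TrainConfig, epoch: int) -> dict:
    """Return the assumed settings for a 0-based training cycle.

    Assumption: the freeze and unfreeze counts are numbers of training
    cycles, so cycles [0, 100) keep the feature extraction layers frozen
    and cycles [100, 300) train all layers.
    """
    if not 0 <= epoch < cfg.total_cycles:
        raise ValueError(f"epoch must be in [0, {cfg.total_cycles})")
    if epoch < cfg.freeze_count:
        return {"frozen": True, "batch_size": cfg.freeze_batch_size, "lr": cfg.freeze_lr}
    return {"frozen": False, "batch_size": cfg.unfreeze_batch_size, "lr": cfg.unfreeze_lr}


def sgd_momentum_step(weight, velocity, grad, lr, momentum):
    """One SGD-with-momentum update: v <- momentum * v + grad; w <- w - lr * v."""
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity


cfg = TrainConfig()
# The two stages are assumed to add up to the total number of cycles.
assert cfg.freeze_count + cfg.unfreeze_count == cfg.total_cycles

# Toy usage: minimise f(w) = w**2 (gradient 2w) with the frozen-stage settings.
w, v = 1.0, 0.0
for epoch in range(3):
    stage = stage_for_epoch(cfg, epoch)
    w, v = sgd_momentum_step(w, v, grad=2.0 * w, lr=stage["lr"], momentum=cfg.momentum)
```

Keeping the stage lookup separate from the update rule makes it easy to see which values change at the freeze/unfreeze boundary (batch size and learning rate) and which stay fixed throughout (optimizer and momentum).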
4.3. Evaluation Indicators
5. Experimental Results and Analysis
5.1. Ablation Experiment
5.2. Experimental Visualization Results
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Nomenclature
| Term | Definition |
|---|---|
| Adaptive Layer Attention Network | A network structure that improves a model’s responsiveness to specific features by dynamically adjusting attention layer weights. | 
| Attention Mechanism | A technique that enhances a model’s focus on important information by weighting features dynamically. | 
| Convolutional Neural Network (CNN) | A deep learning network structure widely used for image and video analysis tasks through convolution operations. | 
| C2f Module | An improved convolution module that enhances large-kernel convolution using parallel dilated convolution and structural reparameterization techniques. |
| Deep Learning | A machine learning method that uses neural networks to model and learn from complex patterns in data. | 
| Digital Image Processing Technology | A technique that uses computers to process and analyze image data. | 
| Graph Convolutional Network (GCN) | A neural network structure used for processing graph data that learns features through node neighborhood information. | 
| Image Segmentation | The process of dividing an image into multiple parts or regions for better analysis and processing. | 
| You Only Look Once (YOLO) | A real-time object detection algorithm that treats object detection as a regression problem. |
| YOLOv8 | A recent version of the “You Only Look Once” object detection model, used as the baseline network in this study. |
| Manual Detection | A traditional bridge inspection method in which technicians observe and record defects on-site; it is subjective, slow, and has a high missed-detection rate. |
| Non-Maximum Suppression (NMS) | A technique used to remove redundant detection boxes to retain the best detection results. | 
| Region Proposal Network (RPN) | A network component that generates candidate regions for object detection. | 
| Small Object Detection Layer | A feature recognition layer specialized for small-target detection. | 
| Soft-NMS | An improved version of non-maximum suppression that reduces the scores of overlapping detection boxes instead of removing them outright. | 
| Task Align Dynamic Detection Head | A structure for aligning predictions for classification and localization tasks to maintain consistency between them. | 
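To make the difference between the NMS and Soft-NMS entries concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in this study: boxes are assumed to be `(x1, y1, x2, y2)` tuples, the function names and threshold defaults are hypothetical, and the linear score decay shown is only one Soft-NMS variant (a Gaussian decay is also common).

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Hard NMS: keep the best box, drop every box that overlaps it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


def soft_nms(boxes, scores, iou_thresh=0.5, score_thresh=0.001):
    """Linear Soft-NMS: decay the scores of overlapping boxes instead of dropping them."""
    scores = list(scores)  # work on a copy so the caller's scores are untouched
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap > iou_thresh:
                scores[i] *= 1.0 - overlap
        # Boxes are discarded only once their decayed score becomes negligible.
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return keep


# Two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))       # indices of the boxes kept by hard NMS
print(soft_nms(boxes, scores))  # (index, possibly decayed score) pairs
```

In this toy input, hard NMS discards the second box entirely, whereas Soft-NMS keeps it with a reduced score, which can matter when two real defects lie close together.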

| Hardware or Software | Configuration |
|---|---|
| CPU | Intel(R) Xeon(R) Gold 5120 | 
| GPU | TITAN RTX | 
| Language | Python | 
| Development environment | PyCharm |
| Dependent libraries | Torch, NumPy, Matplotlib |
| Network model | YOLOv8 |

| Input Image Size | Optimizer | Momentum Factor | Number of Freezes | Batch Size (Frozen Stage) | Learning Rate (Frozen Stage) | Number of Defrosts | Batch Size (Unfrozen Stage) | Learning Rate (Unfrozen Stage) | Total Cycles |
|---|---|---|---|---|---|---|---|---|---|
| 640 × 640 | SGD | 0.8 | 100 | 16 | 0.001 | 200 | 8 | 0.0001 | 300 |

| Algorithm | mAP@0.5 | Detection Time |
|---|---|---|
| Faster R-CNN | 62.95 | 1.08 | 
| SSD | 61.82 | 0.73 | 
| YOLOv5 | 67.87 | 0.84 | 
| YOLOX | 68.46 | 0.85 | 
| YOLOv7 | 72.98 | 0.63 * | 
| YOLOv8 | 75.00 * | 0.67 |

| Algorithm | Crack | Spallation | Efflorescence | Exposed bars | Corrosion stain | mAP@0.5 |
|---|---|---|---|---|---|---|
| YOLOv8 | 82.6 | 59.7 | 83.8 | 75.1 | 74.0 | 75.0 | 
| C2fDD | 81.7 | 61.2 | 85.8 | 75.7 | 75.0 | 75.9 (+0.9) | 
| TADDH | 86.2 | 63.7 | 87.4 | 76.4 | 80.3 | 78.8 (+3.8) | 
| P2 | 82.1 | 64.6 | 87.9 | 78.6 | 77.3 | 78.1 (+3.1) | 
| C2fDD + TADDH | 87.0 | 66.2 | 88.9 | 79.2 | 80.5 | 80.4 (+5.4) | 
| TADDH + P2 | 85.1 | 65.5 | 90.1 | 81.1 | 79.9 | 80.3 (+5.3) | 
| C2fDD + P2 | 83.3 | 64.3 | 88.4 | 81.4 | 77.7 | 79.0 (+4.0) | 
| YOLOv8 Dynamic Plus | 87.8 * | 68.2 * | 92.6 * | 82.9 * | 80.6 * | 82.4 (+7.4) * | 
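As a quick consistency check on the ablation table, the mAP@0.5 column can be reproduced as the arithmetic mean of the five per-class values in each row, and the bracketed gain as the difference from the YOLOv8 baseline. The sketch below uses plain Python; the dictionary and function names are hypothetical, and only three rows are transcribed for brevity.

```python
# Per-class values transcribed from the ablation table, in the order:
# crack, spallation, efflorescence, exposed bars, corrosion stain.
ABLATION_AP = {
    "YOLOv8": [82.6, 59.7, 83.8, 75.1, 74.0],
    "TADDH": [86.2, 63.7, 87.4, 76.4, 80.3],
    "YOLOv8 Dynamic Plus": [87.8, 68.2, 92.6, 82.9, 80.6],
}


def mean_ap(per_class_ap):
    """mAP is the unweighted mean of the per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap)


baseline = round(mean_ap(ABLATION_AP["YOLOv8"]), 1)
for name, aps in ABLATION_AP.items():
    m = round(mean_ap(aps), 1)
    # Print the mean and the gain over the baseline, rounded as in the table.
    print(f"{name}: mAP@0.5 = {m:.1f} ({m - baseline:+.1f})")
```

The same check applies to the remaining rows: each reported mAP@0.5 agrees with the mean of its five class columns to one decimal place.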
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Gu, Jing, Yijuan Pan, and Jingjing Zhang. 2024. "Deep Learning-Based Intelligent Detection Algorithm for Surface Disease in Concrete Buildings" Buildings 14, no. 10: 3058. https://doi.org/10.3390/buildings14103058