Application of LMM-Derived Prompt-Based AIGC in Low-Altitude Drone-Based Concrete Crack Monitoring
Abstract
1. Introduction
- Step 1 (a, b): Engineers, researchers, and operators visit the field to observe and confirm the crack distribution (on-site close-range view; drone view).
- Step 2 (a, b): Collect crack image data (on-site close-range view; drone view).
- Step 3: Annotate the crack image data.
- Step 4: Train the YOLO model on the paired crack images and annotations.
- Step 5: Save the trained YOLO model.
- Step 6: Transfer the drone-view data for inference with the trained YOLO model.
- Step 7: After inference, engineers can determine the location and size of cracks and plan corresponding repairs.
- Step 8: Researchers can save crack distribution maps for further research.
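Steps 3 and 4 pair each image with a YOLO-format annotation file; for segmentation tasks, each label line holds a class index followed by a normalized polygon. A minimal parsing sketch (the example label line and image size are made up for illustration):

```python
# Parse one YOLO-format segmentation label line: "class x1 y1 x2 y2 ...".
# Coordinates are normalized to [0, 1] relative to image width/height.
def parse_seg_label(line, img_w, img_h):
    parts = line.split()
    cls = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    # De-normalize into pixel-space (x, y) vertex pairs.
    polygon = [(x * img_w, y * img_h) for x, y in zip(coords[0::2], coords[1::2])]
    return cls, polygon

# Example: a crack polygon (class 0) on a 640x640 image.
cls, poly = parse_seg_label("0 0.10 0.20 0.50 0.25 0.55 0.80", 640, 640)
print(cls, poly)
```

Each image in the training set has one such label file, so Step 4 can consume the images and annotations as matched pairs.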
- Step 2 (a): Apply Stable Diffusion to generate images. During this process, systematic prompt design should be considered; well-designed prompts can reduce on-site work.
- Step 3: Use the trained model to annotate the generated crack image data.
- Section 1: Train YOLOv8 on the ‘Visible Crack Dataset’ [24] (MIT License) to serve as the annotation generator, and use low-altitude drone-view crack monitoring datasets, namely the Concrete Crack Image for Classification Dataset (CCI4CD) [25,26] (CC BY 4.0 License) and the Concrete Crack Segmentation Dataset (CCSD) [27] (CC BY-SA 4.0 License), for evaluation.
- Section 2: Use an LMM (DeepSeek R1, MIT License) to generate prompts for Stable Diffusion (v1.4, CreativeML Open RAIL-M License) and collect the AIGC dataset (images and annotations).
- Section 3: Train YOLOv8 on the AIGC dataset and evaluate it on CCI4CD and CCSD.
2. Methods
2.1. YOLOv8 (Visible Cracks)
2.2. LMM
2.3. Stable Diffusion
2.4. YOLOv8 (ConcreteCrackImage4Classification, CCI4C)
3. Results
4. Discussions
5. Conclusions
6. Future Works
- Crack-related prompts;
- Texture-related prompts;
- Color-related prompts;
- Scene-related prompts;
- Photography style-related prompts;
- Scalability and automation of crack image generation;
- Performance of AIGC across different environments;
- Integration of AIGC with existing crack classification/detection/segmentation systems;
- Multiple crack classification/detection/segmentation algorithms for verification;
- Multiple study sites/locations/backgrounds for verification.
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| AIGC | Artificial Intelligence-Generated Content |
| CCI4CD | Concrete Crack Image for Classification Dataset |
| CCSD | Concrete Crack Segmentation Dataset |
| cls | Classification |
| conf | Confidence Value |
| det | Detection |
| dfl | Distribution Focal Loss |
| FN | False Negative |
| FP | False Positive |
| img2img | Image-to-Image |
| IoU | Intersection over Union |
| LLM | Large Language Model |
| LMM | Large Multimodal Model |
| mAP50 | Mean Average Precision at an IoU threshold of 0.50 |
| mAP50-95 | Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95 |
| SD (sd) | Stable Diffusion |
| seg | Segmentation |
| TL | True Label |
| TN | True Negative |
| TP | True Positive |
| txt2img | Text-to-Image |
| YOLOv8 | You Only Look Once version 8 |
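As a concrete illustration of the IoU, mAP50, and mAP50-95 entries above, a minimal box-IoU computation (boxes in (x1, y1, x2, y2) corner format; the example boxes are invented for illustration):

```python
def box_iou(a, b):
    # a, b: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # intersection area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)  # intersection over union

iou = box_iou((0, 0, 10, 10), (2, 0, 12, 10))
print(iou)  # 80 / 120 = 0.666...
```

At IoU ≈ 0.667, this prediction counts as a match under the mAP50 threshold (0.50) but fails the stricter thresholds (0.70 and above) that enter the mAP50-95 average.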
References
- OpenAI. Available online: https://openai.com/ (accessed on 6 June 2025).
- YuanBao. Available online: https://yuanbao.tencent.com/ (accessed on 6 June 2025).
- Ren, Y.; Zhang, T.; Han, Z.; Li, W.; Wang, Z.; Ji, W.; Qin, C.; Jiao, L. A Novel Adaptive Fine-Tuning Algorithm for Multimodal Models: Self-Optimizing Classification and Selection of High-Quality Datasets in Remote Sensing. Remote Sens. 2025, 17, 1748. [Google Scholar] [CrossRef]
- Pan, S.; Yoshida, K.; Yamada, Y.; Kojima, T. Monitoring human activities in riverine space using 4K camera images with YOLOv8 and LLaVA: A case study from Ichinoarate in the Asahi River. Intell. Inform. Infrastruct. 2024, 5, 89–97. [Google Scholar] [CrossRef]
- Pan, S.; Yoshida, K.; Yamada, Y.; Kojima, T. Trials of night-time 4K-camera-based human action recognition in riverine environments with multimodal and object detection technologies. Intell. Inform. Infrastruct. 2024, 5, 87–94. [Google Scholar] [CrossRef]
- Pan, S.; Yoshida, K.; Kojima, T. Application of the prompt engineering-assisted generative AI for the drone-based riparian waste detection. Intell. Inform. Infrastruct. 2023, 4, 50–59. [Google Scholar] [CrossRef]
- Shimoe, D.; Pan, S.; Yoshida, K.; Nishiyama, S.; Kojima, T. Application of image generation AI in model for detecting plastic bottles during river patrol using UAV. Jpn. J. JSCE 2025, 81, 24-16180. [Google Scholar] [CrossRef]
- Pan, S.; Yoshida, K.; Shimoe, D.; Kojima, T.; Nishiyama, S. Generating 3D Models for UAV-Based Detection of Riparian PET Plastic Bottle Waste: Integrating Local Social Media and InstantMesh. Drones 2024, 8, 471. [Google Scholar] [CrossRef]
- Pan, S.; Shimoe, D.; Yoshida, K.; Kojima, T. Local low-altitudes drone-based riparian waste benchmark dataset (LAD-RWB): A case study on the Asahi River Basin. Intell. Inform. Infrastruct. 2025, 6, 39–50. [Google Scholar] [CrossRef]
- Sun, Y.; Sheng, D.; Zhou, Z.; Wu, Y. AI hallucination: Towards a comprehensive classification of distorted information in artificial intelligence-generated content. Humanit. Soc. Sci. Commun. 2024, 11, 1278. [Google Scholar] [CrossRef]
- Lee, M.A. Mathematical Investigation of Hallucination and Creativity in GPT Models. Mathematics 2023, 11, 10. [Google Scholar] [CrossRef]
- Kumar, M.; Mani, U.; Tripathi, P.; Saalim, M.; Roy, S. Artificial Hallucinations by Google Bard: Think Before You Leap. Cureus J. Med. Sci. 2023, 15, e43313. [Google Scholar] [CrossRef]
- Kompanets, A.; Duits, R.; Leonetti, D.; van den Berg, N.; Snijder, H.H. Segmentation tool for images of cracks. In Advances in Information Technology in Civil and Building Engineering; Skatulla, S., Beushausen, H., Eds.; Springer International Publishing: Cham, Switzerland, 2024; pp. 93–110. [Google Scholar] [CrossRef]
- Kompanets, A.; Leonetti, D.; Duits, R.; Snijder, B. Cracks in Steel Bridges (CSB) Dataset; 4TU.ResearchData: Leiden, The Netherlands, 2024. [Google Scholar] [CrossRef]
- Song, Y.; Su, Y.; Zhang, S.; Wang, R.; Yu, Y.; Zhang, W.; Zhang, Q. CrackdiffNet: A Novel Diffusion Model for Crack Segmentation and Scale-Based Analysis. Buildings 2025, 15, 1872. [Google Scholar] [CrossRef]
- Jamshidi, M.; El-Badry, M.; Nourian, N. Improving Concrete Crack Segmentation Networks through CutMix Data Synthesis and Temporal Data Fusion. Sensors 2023, 23, 504. [Google Scholar] [CrossRef]
- Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. arXiv 2019, arXiv:1905.04899. [Google Scholar] [CrossRef]
- Li, H.-Y.; Huang, C.-Y.; Wang, C.-Y. Measurement of Cracks in Concrete Bridges by Using Unmanned Aerial Vehicles and Image Registration. Drones 2023, 7, 342. [Google Scholar] [CrossRef]
- Cao, H.; Gao, Y.; Cai, W.; Xu, Z.; Li, L. Segmentation Detection Method for Complex Road Cracks Collected by UAV Based on HC-Unet++. Drones 2023, 7, 189. [Google Scholar] [CrossRef]
- Humpe, A. Bridge Inspection with an Off-the-Shelf 360° Camera Drone. Drones 2020, 4, 67. [Google Scholar] [CrossRef]
- Shokri, P.; Shahbazi, M.; Nielsen, J. Semantic Segmentation and 3D Reconstruction of Concrete Cracks. Remote Sens. 2022, 14, 5793. [Google Scholar] [CrossRef]
- Yuan, Q.; Shi, Y.; Li, M. A Review of Computer Vision-Based Crack Detection Methods in Civil Infrastructure: Progress and Challenges. Remote Sens. 2024, 16, 2910. [Google Scholar] [CrossRef]
- Inácio, D.; Oliveira, H.; Oliveira, P.; Correia, P. A Low-Cost Deep Learning System to Characterize Asphalt Surface Deterioration. Remote Sens. 2023, 15, 1701. [Google Scholar] [CrossRef]
- Liu, F.; Liu, J.; Wang, L. Asphalt pavement crack detection based on convolutional neural network and infrared thermography. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22145–22155. [Google Scholar] [CrossRef]
- Özgenel, Ç.F.; Gönenç Sorguç, A. Performance comparison of pretrained convolutional neural networks on crack detection in buildings. In Proceedings of the 35th International Symposium on Automation and Robotics in Construction (ISARC 2018), Berlin, Germany, 20–25 July 2018; pp. 693–700. [Google Scholar]
- Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016. [Google Scholar] [CrossRef]
- Özgenel, Ç.F. Concrete Crack Segmentation Dataset. Mendeley Data, V1, 2019. Available online: https://data.mendeley.com/datasets/jwsn7tfbrp/1 (accessed on 6 June 2025).
- Ultralytics. YOLOv8 Documentation. Available online: https://docs.ultralytics.com/zh/models/yolov8/ (accessed on 6 June 2025).
- Li, X.L.; Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. arXiv 2021, arXiv:2101.00190. [Google Scholar]
- Lester, B.; Yurtsever, J.; Shakeri, S.; Constant, N. Reducing retraining by recycling parameter-efficient prompts. arXiv 2022, arXiv:2208.05577. [Google Scholar]
- Prompt Engineering Guide. Papers. Available online: https://www.promptingguide.ai/jp/papers (accessed on 6 June 2025).
- Wang, J.; Liu, Z.; Zhao, L.; Wu, Z.; Ma, C.; Yu, S.; Dai, H.; Yang, Q.; Liu, Y.; Zhang, S.; et al. Review of large vision models and visual prompt engineering. Meta-Radiology 2023, 1, 100047. [Google Scholar] [CrossRef]
- Sahoo, P.; Singh, A.K.; Saha, S.; Jain, V.; Mondal, S.; Chadha, A. A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv 2024, arXiv:2402.07927. [Google Scholar] [CrossRef]
- Choi, H.S.; Song, J.Y.; Shin, K.H.; Chang, J.H.; Jang, B.-S. Developing prompts from large language model for extracting clinical information from pathology and ultrasound reports in breast cancer. Radiat. Oncol. J. 2023, 41, 209. [Google Scholar] [CrossRef]
- White, J.; Fu, Q.; Hays, S.; Sandborn, M.; Olea, C.; Gilbert, H.; Elnashar, A.; Spencer-Smith, J.; Schmidt, D.C. A prompt pattern catalog to enhance prompt engineering with Chatgpt. arXiv 2023, arXiv:2302.11382. [Google Scholar]
- Chang, Y.-C.; Huang, M.-S.; Huang, Y.-H.; Lin, Y.-H. The influence of prompt engineering on large language models for protein–protein interaction identification in biomedical literature. Sci. Rep. 2025, 15, 15493. [Google Scholar] [CrossRef]
- Zhao, F.; Zhang, C.; Zhang, R.; Wang, T. Visual Prompt Learning of Foundation Models for Post-Disaster Damage Evaluation. Remote Sens. 2025, 17, 1664. [Google Scholar] [CrossRef]
- Liu, H.; Yang, S.; Long, C.; Yuan, J.; Yang, Q.; Fan, J.; Meng, B.; Chen, Z.; Xu, F.; Mou, C. Urban Greening Analysis: A Multimodal Large Language Model for Pinpointing Vegetation Areas in Adverse Weather Conditions. Remote Sens. 2025, 17, 2058. [Google Scholar] [CrossRef]
- Li, H.; Zhang, X.; Qu, H. DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark. Remote Sens. 2025, 17, 719. [Google Scholar] [CrossRef]
| Specification | Details |
|---|---|
| OS | Ubuntu 22.04.4 LTS |
| CUDA Version | 12.4 |
| NVIDIA-SMI | 550.144.03 |
| GPU | NVIDIA GeForce RTX 3090 |
| Specification | Details |
|---|---|
| Task | Segment |
| Model | yolov8x-seg |
| Early stopping patience | 100 |
| Batch size | 16 |
| Image size (imgsz) | 640 pixels |
| Epochs | 500 |
| Optimizer | Auto |
| Initial learning rate (lr0) | 0.01 |
| Momentum | 0.937 |
| Specification | Details |
|---|---|
| Object | Straight cracks perpendicular to the road, 3–10 mm width, clean edges, no branching, spaced 5–10 m apart |
| Texture | Character: rough and irregular. Surface quality: appears tactile and uneven, suggesting a dimensional rather than flat, smooth surface. Likely material: resembles a coated, painted, or plastered wall. Detail: no visible patterns or specific features (such as grooves or tiles) beyond the overall roughness and irregularity; the roughness is relatively fine-grained, not large pebbles or deep cracks. Visual effect: creates a sense of understated depth and physicality as light interacts with the uneven surface. |
| Color | Hue: light gray. Tone: uniform; the description emphasizes “overall tone uniform, no obvious color difference or pattern.” Saturation: implied to be low (true gray, not blue-gray or green-gray), reinforcing the described neutrality. Variation: the lack of significant color variation, striations, fading, or stains contributes strongly to the described simplicity and neutrality. |
| Summary | The image depicts a surface with a rough, irregular texture resembling a coated or plastered wall. The texture provides subtle depth without prominent patterns. The surface is uniformly light gray, lacking variation in hue, tone, or saturation, resulting in a clean, neutral, minimalist appearance. |
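The per-category descriptions above feed a single Stable Diffusion prompt. A minimal sketch of assembling the five prompt categories this study distinguishes (crack, texture, color, scene, photography style); the snippet strings below are shortened stand-ins, not the exact prompts used:

```python
# Assemble a txt2img prompt from per-category description snippets.
# The category contents here are abbreviated placeholders for illustration.
prompt_parts = {
    "crack": "straight cracks perpendicular to road, 3-10 mm width, clean edges, no branching",
    "texture": "rough, irregular, fine-grained plastered concrete surface",
    "color": "uniform light gray, low saturation, no stains or striations",
    "scene": "low-altitude drone view of a concrete surface",
    "style": "photorealistic, top-down photograph",
}
# Stable Diffusion accepts a single comma-separated prompt string.
prompt = ", ".join(prompt_parts.values())
print(prompt)
```

Keeping the categories in a dictionary makes it straightforward to swap one category (e.g., the scene) while holding the others fixed, which supports the systematic prompt comparisons discussed in the text.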
| Specification | Details |
|---|---|
| Sampling steps | 150 |
| Width/Height | 512/512 pixels |
| Batch count | 100 (maximum; default = 1) |
| Batch size | 8 (maximum; default = 1) |
| Specification | Details |
|---|---|
| Resize mode | Just resize (latent upscale) |
| Sampling steps | 150 |
| Switch at | 0.8 |
| Sampling method | DDIM |
| Schedule type | DDIM |
| Width/Height | 512/512 pixels |
| Batch count | 100 (maximum; default = 1) |
| Batch size | 8 (maximum; default = 1) |
| CFG scale | 7 (default) |
| Denoising strength | 0.75 (default) |
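In typical img2img implementations (e.g., the AUTOMATIC1111 web UI or diffusers), the denoising strength scales how many of the configured sampling steps are actually applied to the initial image. A sketch of that relationship under the settings above (the exact rounding rule varies by implementation):

```python
sampling_steps = 150       # from the img2img settings table
denoising_strength = 0.75  # default, as above

# Approximate number of denoising steps applied to the init image:
# strength 0.0 returns the input unchanged; 1.0 behaves like txt2img.
effective_steps = int(sampling_steps * denoising_strength)
print(effective_steps)  # 112
```

At 0.75, most of the schedule is re-denoised, so the generated image can deviate substantially from the initial image while retaining its coarse layout.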
| Specification | Details |
|---|---|
| Image size (imgsz) | 512 pixels |
| Confidence threshold (conf) | 0.01 |
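The very low confidence threshold (conf = 0.01) keeps nearly all candidate detections, which suits annotation generation, where recall matters more than precision. A sketch with hypothetical per-detection scores:

```python
# Hypothetical confidence scores for candidate crack detections.
scores = [0.92, 0.41, 0.08, 0.02, 0.005]

def keep(scores, conf):
    # Retain only detections at or above the confidence threshold.
    return [s for s in scores if s >= conf]

print(len(keep(scores, 0.01)))  # 4 candidates survive for annotation
print(len(keep(scores, 0.5)))   # 1 candidate under a stricter threshold
```

Low-confidence candidates can later be filtered or corrected, whereas detections suppressed at inference time are lost to the generated annotations.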
| Specification | Details |
|---|---|
| Task | Segment |
| Model | yolov8l-seg |
| Early stopping patience | 100 |
| Batch size | 32 |
| Image size (imgsz) | 512 pixels |
| Epochs | 500 |
| Optimizer | Auto |
| Initial learning rate (lr0) | 0.01 |
| Momentum | 0.937 |
| Confidence | Accuracy (%) (Positive, Cracks) | Accuracy (%) (Negative, No_Cracks) |
|---|---|---|
| 0.1 | 99.9 (19,992/20,000) | 95.6 (19,123/20,000) |
| 0.25 | 99.8 (19,953/20,000) | 97.5 (19,493/20,000) |
| 0.5 | 97.7 (19,545/20,000) | 98.9 (19,776/20,000) |

| Confidence | Accuracy (%) (Positive, Cracks) | Accuracy (%) (Negative, No_Cracks) |
|---|---|---|
| 0.1 | 99.9 (19,996/20,000) | 88.0 (17,598/20,000) |
| 0.25 | 99.5 (19,899/20,000) | 93.5 (18,698/20,000) |
| 0.5 | 82.2 (16,442/20,000) | 97.7 (19,531/20,000) |
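The percentages above follow directly from the correct/total counts in parentheses. A sketch reproducing one row (counts taken from the first table, confidence 0.25):

```python
def accuracy_pct(correct, total):
    # Per-class accuracy as a percentage of correctly classified images.
    return 100.0 * correct / total

pos = accuracy_pct(19953, 20000)  # positive class (Cracks)
neg = accuracy_pct(19493, 20000)  # negative class (No_Cracks)
print(f"{pos:.1f} {neg:.1f}")
```

The same computation applied across thresholds shows the usual trade-off: raising the confidence threshold improves negative-class accuracy at the cost of positive-class accuracy.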
| | Trained Model-1 | Trained Model-2 |
|---|---|---|
| 17 | 0.899 | 0.834 |
| 19 | 0.915 | 0.976 |
| 30 | 0.810 | 0.946 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Pan, S.; Fan, Z.; Yoshida, K.; Qin, S.; Kojima, T.; Nishiyama, S. Application of LMM-Derived Prompt-Based AIGC in Low-Altitude Drone-Based Concrete Crack Monitoring. Drones 2025, 9, 660. https://doi.org/10.3390/drones9090660