A Real-Time Mobile Robotic System for Crack Detection in Construction Using Two-Stage Deep Learning
Abstract
1. Introduction
1.1. Motivation and Problem Statement
1.2. Sensor Technologies and Automated Inspection
1.3. Research Scope and Contributions
- Residual-Based Deep Learning Model: A two-stage model (U-Net + Pix2Pix) trained on segmentation residuals that significantly enhances boundary precision for thin and branched cracks.
- Real-Time Edge Inference: A real-time detection pipeline optimized for edge deployment, evaluated on both onboard and distributed configurations to ensure consistent inference during autonomous navigation.
- Autonomous Navigation Integration: The integration of a UGV navigation stack utilizing ROS 2 (Humble), leveraging SLAM and adaptive Monte Carlo localization for consistent trajectory tracking.
- Dynamic Evaluation: A comprehensive quantitative evaluation establishing robustness via mIoU, F1-score, precision, and recall, comparing static benchmarks against dynamic field performance.
2. Related Work
2.1. Visual Inspection and Structural Health Monitoring
2.2. Deep Learning-Based Crack Detection
2.3. Integrated Autonomous Robotic Inspection Systems
3. System Architecture and Hardware Platform
3.1. System Overview
3.1.1. Hardware Components
- A Hesai Technology (Shanghai, China) PandarXT-32 LiDAR (see Figure 3), featuring a 360° horizontal field of view (FOV), a 31° vertical FOV (−16° to +15°), 32 laser channels, and a detection range of 0.05 m to 120 m. It delivers up to 640,000 points/second in single-return mode and 1,280,000 points/second in dual-return mode, with an IP6K7 ingress protection rating for outdoor durability.
- An Intel (Santa Clara, CA, USA) RealSense D455 RGB-D camera (see Figure 3), providing synchronized high-resolution color and depth streams up to 90 fps.
- The onboard computer is a Mini-ITX single-board system (Intel i3-9100TE quad-core CPU, 16 GB DDR4 RAM, 250 GB SSD) with Ethernet, USB 3.0, RS232, and PCIe 3.0 × 16 expansion. It runs a minimal ROS 2 Humble instance and is responsible for low-level robot control, sensor data acquisition, and real-time communication with the offboard unit.
- The offboard computer is a Dell (Round Rock, TX, USA) Precision 5690 mobile workstation powered by an Intel Core Ultra 7 processor and an NVIDIA (Santa Clara, CA, USA) RTX 2000 Ada Generation GPU. This unit handles computationally intensive tasks, including SLAM, path planning, and the two-stage AI crack detection pipeline.
- Mapping and Localization: The SLAM Toolbox performs LiDAR-based SLAM to generate a 2D occupancy grid map. Once mapped, AMCL localizes the robot within the map, and the NAV2 stack manages autonomous navigation with obstacle avoidance.
- AI Inference Pipeline: This pipeline implements a two-stage deep learning approach: (1) a U-Net model performs pixel-wise semantic segmentation to identify potential cracks in RealSense RGB-D frames, and (2) a Pix2Pix conditional GAN refines these predictions by enhancing crack continuity and suppressing noise or false positives. Both models are optimized for real-time inference using TensorRT.
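To make the two-stage flow concrete, the sketch below shows one plausible wiring of the residual refinement in PyTorch. The model objects, the sigmoid/tanh output conventions, and the channel-wise conditioning are assumptions for illustration, not the exact deployed graph (which runs through TensorRT engines):

```python
# Minimal sketch of the two-stage inference flow (PyTorch; wiring assumed).
import torch

@torch.no_grad()
def detect_cracks(frame: torch.Tensor, unet, pix2pix, thresh: float = 0.5):
    """frame: normalized RGB tensor of shape (3, 384, 384)."""
    x = frame.unsqueeze(0).cuda()                    # add batch dimension
    coarse = torch.sigmoid(unet(x))                  # stage 1: crack probabilities
    # Stage 2: the generator sees the RGB frame plus the coarse mask and
    # predicts a residual correction in [-1, 1] (+: add pixels, -: remove).
    residual = pix2pix(torch.cat([x, coarse], dim=1))
    refined = (coarse + residual).clamp(0.0, 1.0)    # apply the correction
    return (refined > thresh).squeeze(0).cpu()       # binary crack mask
```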
3.1.2. System Architecture and Data Flow
- Upon deployment, the robot begins streaming LiDAR and RGB-D data to the offboard workstation.
- The SLAM Toolbox incrementally builds a global map of the environment, which is visualized in real time.
- An operator defines a region of interest (ROI) for inspection via an intuitive graphical interface overlaid on the map.
- NAV2 computes and executes a safe trajectory to the ROI, with the onboard computer managing motor control and obstacle response. As the robot navigates the path, the RGB-D camera continuously captures surface imagery.
- Simultaneously, the offboard system processes each frame through the U-Net + Pix2Pix pipeline, producing refined crack detections that are logged, visualized, and made available for downstream decision-making.
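A minimal sketch of the offboard inference node, using standard ROS 2 Humble Python APIs; the topic names and the run_two_stage_pipeline helper are hypothetical placeholders:

```python
# Offboard ROS 2 node sketch: subscribes to the RealSense color stream,
# runs the two-stage pipeline, and republishes the refined crack mask.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class CrackDetectionNode(Node):
    def __init__(self):
        super().__init__("crack_detection")
        self.bridge = CvBridge()
        self.sub = self.create_subscription(
            Image, "/camera/color/image_raw", self.on_frame, 10)
        self.pub = self.create_publisher(Image, "/crack/mask", 10)

    def on_frame(self, msg):
        rgb = self.bridge.imgmsg_to_cv2(msg, "rgb8")
        mask = run_two_stage_pipeline(rgb)   # U-Net + Pix2Pix (defined elsewhere)
        self.pub.publish(self.bridge.cv2_to_imgmsg(mask, "mono8"))

def main():
    rclpy.init()
    rclpy.spin(CrackDetectionNode())

if __name__ == "__main__":
    main()
```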
4. Methodology
4.1. Dataset Preparation and Model Training
4.1.1. Dataset Description
4.1.2. Data Preprocessing and Augmentation
4.2. Model Architecture and Training Setup
4.2.1. U-Net Training Configuration
4.2.2. Pix2Pix Refinement Training
- R = +1 denotes false negatives (crack pixels missed by the U-Net);
- R = 0 denotes correct predictions (either crack or background);
- R = −1 denotes false positives (non-crack pixels incorrectly classified as cracks).
- +1 is mapped to 255 (white; pixels to be added);
- 0 is mapped to 127 (gray; no correction required);
- −1 is mapped to 0 (black; pixels to be removed).
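A minimal NumPy sketch of this encoding, assuming binary {0, 1} masks (function name hypothetical):

```python
# Residual target construction: R = ground_truth - unet_prediction,
# then mapped to the grayscale codes {255, 127, 0} described above.
import numpy as np

def residual_target(gt: np.ndarray, pred: np.ndarray) -> np.ndarray:
    r = gt.astype(np.int8) - pred.astype(np.int8)   # values in {-1, 0, +1}
    out = np.full(r.shape, 127, dtype=np.uint8)     # 0  -> 127 (no correction)
    out[r == 1] = 255                               # +1 -> 255 (missed crack pixels)
    out[r == -1] = 0                                # -1 -> 0   (false positives)
    return out
```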
4.3. Performance Metrics and Evaluation
- Intersection over Union (IoU): Measures the overlap between predicted and ground truth crack regions, computed as IoU = TP / (TP + FP + FN), where TP, FP, and FN denote pixel-level true positives, false positives, and false negatives.
- Mean IoU (mIoU): The average IoU across both classes (crack and background), providing a balanced metric for binary segmentation: mIoU = (IoU_crack + IoU_background) / 2.
- Precision: The ratio of correctly predicted crack pixels to all predicted crack pixels, quantifying the model’s ability to avoid false positives: Precision = TP / (TP + FP).
- Recall: The ratio of correctly predicted crack pixels to all actual crack pixels, measuring the model’s sensitivity to crack presence: Recall = TP / (TP + FN).
- F1 Score: The harmonic mean of precision and recall, providing a single metric that balances both: F1 = 2 × Precision × Recall / (Precision + Recall).
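All five metrics follow directly from pixel-level confusion counts; a minimal NumPy sketch:

```python
# Evaluation metrics from binary prediction and ground-truth masks.
import numpy as np

def crack_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-9) -> dict:
    tp = np.sum((pred == 1) & (gt == 1))        # crack pixels correctly found
    fp = np.sum((pred == 1) & (gt == 0))        # background flagged as crack
    fn = np.sum((pred == 0) & (gt == 1))        # crack pixels missed
    tn = np.sum((pred == 0) & (gt == 0))        # background correctly kept
    iou_crack = tp / (tp + fp + fn + eps)
    iou_bg = tn / (tn + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return {"mIoU": (iou_crack + iou_bg) / 2, "precision": precision,
            "recall": recall, "f1": f1}
```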
4.4. Robot Software Integration and Operational Workflow
4.4.1. Mapping and Localization
4.4.2. Navigation, Collision Avoidance, and Path Planning
4.4.3. Multi-Sensor Fusion and Real-Time Coordination
4.4.4. Operational Workflow
- Mapping Phase: The operator manually drives the robot to explore the environment. This manual traversal enables the system to construct a static occupancy grid, distinguishing between free space and obstacles to serve as the foundation for future navigation.
- Path Planning: An operator interacts with the system to define the mission. Using the generated map, the operator manually selects regions of interest or inspection zones, establishing the trajectory waypoints the robot will follow.
- Autonomous Inspection: The execution phase relies on NAV2 for robust navigation and collision avoidance. The robot autonomously traverses a sequence of predefined waypoints to reach inspection targets (a minimal waypoint-following sketch appears after this list). Simultaneously, the onboard camera captures high-resolution imagery, which is streamed to an offboard computer for real-time processing through the U-Net/Pix2Pix pipeline, enabling immediate crack detection without overburdening the robot’s onboard computing resources.
- Visualization: The final stage focuses on data presentation. Detected defects are spatially registered, marking cracks on the global map for localization. Additionally, the system generates detailed crack visualizations, displaying the raw imagery alongside the segmented crack detection results to facilitate assessment.
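The waypoint traversal step can be driven through the Nav2 simple-commander API; a minimal sketch, assuming mapping and AMCL localization are already running (waypoint coordinates are hypothetical):

```python
# Waypoint-following mission sketch using nav2_simple_commander.
import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator

def make_pose(nav, x, y):
    p = PoseStamped()
    p.header.frame_id = "map"
    p.header.stamp = nav.get_clock().now().to_msg()
    p.pose.position.x, p.pose.position.y = x, y
    p.pose.orientation.w = 1.0                # face along +x; yaw omitted
    return p

rclpy.init()
nav = BasicNavigator()
nav.waitUntilNav2Active()                     # wait for AMCL + Nav2 servers
waypoints = [make_pose(nav, 1.0, 0.5), make_pose(nav, 2.5, 1.0)]
nav.followWaypoints(waypoints)                # traverse inspection targets
while not nav.isTaskComplete():
    pass                                      # camera streaming runs in parallel
```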
5. Experimental Validation
5.1. Simulation and System Integration
5.2. Physical Environment Evaluation
5.2.1. Laboratory Testing
- (a) Floor crack scenario: A concrete slab with a surface crack was placed on the floor, as shown in Figure 11. The robot autonomously navigated to the regions of interest, and the camera captured high-fidelity RGB data. The recorded video of the laboratory floor experiment can be found at https://youtu.be/tSlldbQ8O78 [50].
- (b) Wall-mounted cracks: A vertical concrete slab with a surface crack was mounted on a wall to emulate bridge or retaining wall inspection contexts. The robot adjusted the arm-mounted camera to maintain near-perpendicular viewing geometry, ensuring consistent image scale and minimal perspective distortion. The recorded video of the wall experiment can be found at https://youtu.be/2fDa-pLnY50 [51].
- (c) Photorealistic 2D validation: To validate the model’s visual pattern recognition capabilities, high-resolution printed images of cracks were affixed to a wall. These images matched real defects in morphology and color but lacked physical depth.
5.2.2. Real-World Indoor Deployment
5.2.3. Visual Robustness Evaluation Using Real-World Images
5.3. Quantitative Evaluation
5.3.1. Static Dataset Analysis
5.3.2. Computational Cost Analysis
5.3.3. Real-Time System Performance
5.4. Qualitative Analysis and Real-Time Detection
5.5. Static vs. Real-Time Performance Comparison
6. Discussion
6.1. Contributions of the Two-Stage Architecture
6.2. Real-Time Performance and Robustness
6.3. Limitations and Practical Considerations
6.4. Future Research Directions
- Robotic Manipulation Integration: The current system will be extended to incorporate the full functionality of the robotic arm currently attached to the platform. This integration will enable physical interaction with detected defects, transforming the system from a passive inspection tool into an active intervention platform. The robotic arm will be programmed to autonomously position repair tools at crack locations identified by the vision system, enabling precise, targeted maintenance operations.
- 3D Crack Profiling: To complement the current 2D vision-based detection, laser scanning profiling will be integrated to acquire in-depth information about detected cracks. This capability will enable quantitative assessment of crack severity through measurements of depth, width, and volumetric extent, which will be critical parameters for structural health evaluation and repair planning. The fusion of visual detection with depth data will provide a more complete characterization of surface defects and improve prioritization of repair interventions.
- Autonomous Crack Repair: The ultimate goal is to develop a fully autonomous crack repair system that can detect, assess, and remediate surface defects without human intervention. This will involve investigating suitable repair materials and application methods compatible with robotic deployment, such as automated sealant injection or epoxy application. The integration of detection, profiling, and repair capabilities will enable continuous, proactive infrastructure maintenance that can address defects before they propagate into major structural problems.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Farrar, C.R.; Worden, K. Introduction. In Structural Health Monitoring; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2012; pp. 1–16. ISBN 978-1-118-44311-8.
- Estes, A.C.; Frangopol, D.M. Updating Bridge Reliability Based on Bridge Management Systems Visual Inspection Results. J. Bridge Eng. 2003, 8, 374–382.
- American Society of Civil Engineers. 2025 Report Card for America’s Infrastructure; American Society of Civil Engineers: Reston, VA, USA, 2025.
- Koch, C.; Doycheva, K.; Kasi, V.; Akinci, B.; Fieguth, P. A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure. Adv. Eng. Inform. 2015, 29, 196–210.
- Jayaram, M.A. Computer vision applications in construction material and structural health monitoring: A scoping review. Mater. Today Proc. 2023, in press.
- Percepto. Perception 2023: The State of Visual Inspection; Percepto Ltd.: Austin, TX, USA, March 2023. Available online: https://percepto.co/wp-content/uploads/2023/03/2023-State-of-visual-inspection.pdf (accessed on 7 January 2026).
- U.S. Bureau of Labor Statistics. Employment Projections—2023–2033. August 2024. Available online: https://www.bls.gov/news.release/archives/ecopro_08292024.pdf (accessed on 7 January 2026).
- U.S. Bureau of Labor Statistics. Occupational Outlook Handbook: Construction Laborers and Helpers. September 2024. Available online: https://www.bls.gov/ooh/construction-and-extraction/construction-laborers-and-helpers.htm (accessed on 7 January 2026).
- Kaartinen, E.; Dunphy, K.; Sadhu, A. LiDAR-Based Structural Health Monitoring: Applications in Civil Infrastructure Systems. Sensors 2022, 22, 4610.
- Yuan, Q.; Shi, Y.; Li, M. A Review of Computer Vision-Based Crack Detection Methods in Civil Infrastructure: Progress and Challenges. Remote Sens. 2024, 16, 2910.
- Ali, L.; Alnajjar, F.; Khan, W.; Serhani, M.A.; Al Jassmi, H. Bibliometric Analysis and Review of Deep Learning-Based Crack Detection Literature Published between 2010 and 2022. Buildings 2022, 12, 432.
- Weng, X.; Huang, Y.; Li, Y.; Yang, H.; Yu, S. Unsupervised domain adaptation for crack detection. Autom. Constr. 2023, 153, 104939.
- Wu, Y.; Li, S.; Li, Y. Depth-aware RGB-D concrete crack segmentation and quantification using progressive cross-modal attention. Measurement 2026, 258, 119453.
- Musa, A.; Kakudi, H.; Hassan, M.; Hamada, M.; Umar, U.; Salisu, M. Lightweight Deep Learning Models for Edge Devices—A Survey. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 2025, 17, 18.
- Halder, S.; Afsari, K. Robots in Inspection and Monitoring of Buildings and Infrastructure: A Systematic Review. Appl. Sci. 2023, 13, 2304.
- Huang, M.; Li, X.; Lei, Y.; Gu, J. Structural damage identification based on modal frequency strain energy assurance criterion and flexibility using enhanced Moth-Flame optimization. Structures 2020, 28, 1119–1136.
- Huang, M.; Ling, Z.; Sun, C.; Lei, Y.; Xiang, C.; Wan, Z.; Gu, J. Two-stage damage identification for bridge bearings based on sailfish optimization and element relative modal strain energy. Struct. Eng. Mech. 2023, 86, 715–730.
- Huang, M.; Zhao, W.; Gu, J.; Lei, Y. Damage identification of a steel frame based on integration of time series and neural network under varying temperatures. Adv. Civ. Eng. 2020, 2020, 4284381.
- Kirthiga, R.; Elavenil, S. A survey on crack detection in concrete surface using image processing and machine learning. J. Build. Pathol. Rehabil. 2023, 9, 15.
- Khan, M.A.-M.; Kee, S.-H.; Pathan, A.-S.K.; Nahid, A.-A. Image Processing Techniques for Concrete Crack Detection: A Scientometrics Literature Review. Remote Sens. 2023, 15, 2400.
- Lee, D.; Nie, G.-Y.; Ahmed, A.; Han, K. Development of Automated Welding System for Construction: Focused on Robotic Arm Operation for Varying Weave Patterns. Int. J. High-Rise Build. 2022, 11, 115–124.
- Lee, D.; Nie, G.-Y.; Han, K. Vision-based inspection of prefabricated components using camera poses: Addressing inherent limitations of image-based 3D reconstruction. J. Build. Eng. 2023, 64, 105710.
- Hamishebahar, Y.; Guan, H.; So, S.; Jo, J. A comprehensive review of deep learning-based crack detection approaches. Appl. Sci. 2022, 12, 1374.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
- Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Autom. Constr. 2019, 104, 129–139.
- Mohammed, M.A.; Han, Z.; Li, Y. Exploring the detection accuracy of concrete cracks using various CNN models. Adv. Mater. Sci. Eng. 2021, 2021, 9923704.
- Cha, Y.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Büyüköztürk, O. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput. Aided Civ. Infrastruct. Eng. 2018, 33, 731–747.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Yang, H.; Yang, L.; Wu, T.; Meng, Z.; Huang, Y.; Wang, P.S.-P.; Li, P.; Li, X. Automatic detection of bridge surface crack using improved YOLOv5s. Int. J. Pattern Recognit. Artif. Intell. 2022, 36, 2250047.
- Choi, W.; Cha, Y.-J. SDDNet: Real-time crack segmentation. IEEE Trans. Ind. Electron. 2019, 67, 8016–8025.
- Panella, F.; Lipani, A.; Boehm, J. Semantic segmentation of cracks: Data challenges and architecture. Autom. Constr. 2022, 135, 104110.
- Lee, D.; Nie, G.-Y.; Han, K. Automatic and Real-Time Joint Tracking and Three-Dimensional Scanning for a Construction Welding Robot. J. Constr. Eng. Manag. 2024, 150, 04023165.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Bae, H.; Jang, K.; An, Y.-K. Deep super resolution crack network (SrcNet) for improving computer vision–based automated crack detectability in in situ bridges. Struct. Health Monit. 2021, 20, 1428–1442.
- Amieghemen, G.E.; Ramezani, M.; Sherif, M.M. Residual Pyramidal GAN (RP-GAN) for crack detection and prediction of crack growth in engineered cementitious composites. Measurement 2025, 242, 115769.
- Dai, R.; Wang, R.; Shu, C.; Li, J.; Wei, Z. Crack Detection in Civil Infrastructure Using Autonomous Robotic Systems: A Synergistic Review of Platforms, Cognition, and Autonomous Action. Sensors 2025, 25, 4631.
- Yarovoi, A.; Cho, Y.K. Review of simultaneous localization and mapping (SLAM) for construction robotics applications. Autom. Constr. 2024, 162, 105344.
- Onatayo, D.; Onososen, A.; Oyediran, A.O.; Oyediran, H.; Arowoiya, V.; Onatayo, E. Generative AI Applications in Architecture, Engineering, and Construction: Trends, Implications for Practice, Education & Imperatives for Upskilling—A Review. Architecture 2024, 4, 877–902.
- Sampath, V.; Maurtua, I.; Aguilar Martín, J.J.; Iriondo, A.; Lluvia, I.; Aizpurua, G. Intraclass Image Augmentation for Defect Detection Using Generative Adversarial Neural Networks. Sensors 2023, 23, 1861.
- Kim, B.; Natarajan, Y.; Preethaa, K.S.; Song, S.; An, J.; Mohan, S. Real-time assessment of surface cracks in concrete structures using integrated deep neural networks with autonomous unmanned aerial vehicle. Eng. Appl. Artif. Intell. 2024, 129, 107537.
- Delgado, J.M.D.; Oyedele, L.; Ajayi, A.; Akanbi, L.; Akinade, O.; Bilal, M.; Owolabi, H. Robotics and automated systems in construction: Understanding industry-specific challenges for adoption. J. Build. Eng. 2019, 26, 100868.
- Xiao, B.; Chen, C.; Yin, X. Recent advancements of robotics in construction. Autom. Constr. 2022, 144, 104591.
- Lee, D.; Han, K. Vision-Based Construction Robot for Automated Welding in Real-Time: Proposing Fully Automated and Human-Robot Interaction. Autom. Constr. 2024, 168, 105782.
- Lee, D.; Han, K. Autonomous Navigation and Positioning of a Real-Time and Automated Mobile Robotic Welding System. J. Constr. Eng. Manag. 2025, 151, 04025033.
- Zhu, H.; Leighton, B.; Chen, Y.; Ke, X.; Liu, S.; Zhao, L. Indoor navigation system using the fetch robot. In Proceedings of the International Conference on Intelligent Robotics and Applications; Springer: Berlin/Heidelberg, Germany, 2019; pp. 686–696.
- Kulkarni, S.; Singh, S.; Balakrishnan, D.; Sharma, S.; Devunuri, S.; Korlapati, S.C.R. CrackSeg9k: A Collection and Benchmark for Crack Segmentation Datasets and Frameworks. arXiv 2022, arXiv:2208.13054.
- Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125.
- Lee, D. Real-Time Automated Mobile Robotic Crack Detection System_Gazebo Simulation. 2024. Available online: https://youtu.be/TNtV2WLi4f8 (accessed on 7 January 2026).
- Lee, D. Real-Time Automated Mobile Robotic Crack Detection System_Floor Experiment_Lab-Controlled. Available online: https://youtu.be/tSlldbQ8O78 (accessed on 7 January 2026).
- Lee, D. Real-Time Automated Mobile Robotic Crack Detection System_Wall Experiment_Lab-Controlled. 2024. Available online: https://youtu.be/2fDa-pLnY50 (accessed on 7 January 2026).
- Lee, D. Real-Time Automated Mobile Robotic Crack Detection System_2nd Floor in ERB at GSU. 2024. Available online: https://youtu.be/S_Qddbrxurk (accessed on 7 January 2026).
- Ogun, E.; Kim, J.; Lee, D. Advanced Crack Detection in Building Structures Using Pix2Pix and U-Net Architectures. In Proceedings of the Smart Materials, Adaptive Structures and Intelligent Systems; American Society of Mechanical Engineers: New York, NY, USA, 2025; p. V001T05A003.
- Noh, J.; Jang, J.; Jo, J.; Yang, H. Crack Segmentation Using U-Net and Transformer Combined Model. Appl. Sci. 2025, 15, 10737.
- Li, Y.; Ma, R.; Liu, H.; Cheng, G. HrSegNet: Real-time high-resolution neural network with semantic guidance for crack segmentation. arXiv 2023, arXiv:2307.00270.
- Zhang, T.; Qin, L.; Zou, Q.; Zhang, L.; Wang, R.; Zhang, H. CrackScopeNet: A Lightweight Neural Network for Rapid Crack Detection on Resource-Constrained Drone Platforms. Drones 2024, 8, 417.
- Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic Pixel-Level Crack Detection and Measurement Using Fully Convolutional Network. Comput. Aided Civ. Infrastruct. Eng. 2018, 33, 1090–1109.
| Category | Transformation | Parameters / Probability (p) | Purpose/Effect |
|---|---|---|---|
| Geometric Transformations | Horizontal Flip | p = 0.5 | Introduces left–right orientation variability |
| | Vertical Flip | p = 0.3 | Adds top–bottom orientation variability |
| | Random 90° Rotation | p = 0.5 | Ensures rotational invariance to crack orientation |
| | Shift Scale Rotate | Shift: ±10%, Scale: ±10%, Rotation: ±15°, p = 0.5 | Simulates perspective and viewpoint changes during mobile robot navigation |
| Photometric Transformations | Random Brightness & Contrast | brightness_limit = 0.2, contrast_limit = 0.2, p = 0.3 | Mimics illumination/exposure variations due to lighting and weather changes |
| Image Degradation Simulations | Gaussian Noise | Variance ∈ [10.0, 50.0], p = 0.2 | Simulates sensor noise and compression artifacts |
| | Elastic Transformation | α = 1, σ = 50, p = 0.2 | Introduces subtle local distortions resembling surface irregularities or lens aberrations |
| Validation & Testing Pre-processing | Resizing | 384 × 384 | Standardizes input image dimensions for inference |
| | Normalization | mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225] | Normalizes using ImageNet statistics for numerical stability and transfer learning |
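The parameters in this table map directly onto Albumentations transforms [48]; a minimal sketch of the pipeline, noting that argument names such as var_limit follow the classic API and may differ slightly across library versions:

```python
# Augmentation pipeline sketch built with Albumentations [48].
import albumentations as A

train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.3),
    A.RandomRotate90(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.10, scale_limit=0.10,
                       rotate_limit=15, p=0.5),
    A.RandomBrightnessContrast(brightness_limit=0.2,
                               contrast_limit=0.2, p=0.3),
    A.GaussNoise(var_limit=(10.0, 50.0), p=0.2),   # classic-API argument name
    A.ElasticTransform(alpha=1, sigma=50, p=0.2),
    A.Resize(384, 384),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

# Validation/testing uses only resizing and normalization:
eval_transform = A.Compose([
    A.Resize(384, 384),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
```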
| Method | mIoU (%) | F1 Score (%) | FPS |
|---|---|---|---|
| YOLOv8-seg (n) | 26.3 | 29.0 | 244 |
| YOLOv11-seg (n) | 29.5 | 36.4 | 200 |
| U-Net + ViT [54] | 71.8 | - | - |
| HrSegNet-B16 [55] | 78.4 | - | 182.0 |
| HrSegNet-B48 [55] | 80.3 | - | 140.3 |
| CrackScopeNet [56] | 82.1 | - | - |
| U-Net (baseline) | 70.2 ± 0.2 | 72.7 ± 0.5 | 45.2 |
| Proposed Method (U-Net + Pix2Pix) | 73.9 ± 0.6 | 76.4 ± 0.3 | 37.9 |
| Metric | U-Net Only | U-Net + Pix2Pix | Overhead |
|---|---|---|---|
| Parameters | 13.4 M | 42.6 M | +218% |
| FLOPs | 69.9 G | 109.1 G | +56% |
| Inference Time | 22.1 ms | 26.4 ms | +19% |
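Figures like these can be reproduced with a model profiler; the sketch below uses thop, which is an assumption (any FLOP counter works) and which strictly reports multiply-accumulate counts, commonly quoted as FLOPs:

```python
# Sketch: profiling a PyTorch model's parameter count and compute cost.
import torch
from thop import profile

def model_cost(model, input_size=(1, 3, 384, 384)):
    x = torch.randn(input_size)
    macs, params = profile(model, inputs=(x,))
    return params / 1e6, macs / 1e9   # params in millions, compute in GMACs
```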
| Metric | Value | Description |
|---|---|---|
| Processing Rate (U-Net only) | ~20 fps | Baseline segmentation inference rate |
| Processing Rate (U-Net + Pix2Pix) | ~16 fps | Two-stage pipeline inference rate |
| End-to-End Latency (U-Net) | 50 ms | Frame acquisition to result publication |
| End-to-End Latency (U-Net + Pix2Pix) | 62.5 ms | Complete two-stage processing delay |
| Temporal Filter Window | 5 frames | Sliding-window size for consistency check |
| Temporal Detection Threshold | 4/5 frames | Required persistence for confirmation |
| Minimum Crack Threshold | 5.0% | Pixel percentage for frame-level detection |
| Operating Frame Resolution | 384 × 384 | Processed-image dimensions |
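A sketch of the temporal consistency rule encoded by the last rows of this table: a detection is confirmed only when at least 4 of the last 5 frames exceed the 5.0% crack-pixel threshold (class and method names are hypothetical; the window, persistence, and threshold values are taken from the table):

```python
# Sliding-window temporal filter for frame-level crack confirmation.
from collections import deque

class TemporalFilter:
    def __init__(self, window=5, required=4, min_crack_pct=5.0):
        self.history = deque(maxlen=window)   # last `window` frame decisions
        self.required = required
        self.min_crack_pct = min_crack_pct

    def update(self, mask) -> bool:
        """mask: binary {0, 1} crack mask for the current frame (NumPy array)."""
        pct = 100.0 * mask.mean()                       # crack-pixel percentage
        self.history.append(pct >= self.min_crack_pct)  # frame-level detection
        return sum(self.history) >= self.required       # confirmed detection?
```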
| Configuration | Model | FPS | Latency (ms) |
|---|---|---|---|
| Static (inference only) | U-Net | 45 | 22.2 |
| | U-Net + Pix2Pix | 38 | 26.3 |
| Real-Time (offboard GPU) | U-Net | 20 | 50.0 |
| | U-Net + Pix2Pix | 16 | 62.5 |
| Real-Time (onboard CPU) | U-Net | 5 | 200.0 |
| | U-Net + Pix2Pix | 2 | 500.0 |