An Integrated Framework for Pavement Crack Segmentation and Severity Estimation

Alsharayah, Osama; Manasreh, Dmitry; Nazzal, Munir D.

doi:10.3390/buildings16030677

by Osama Alsharayah, Dmitry Manasreh and Munir D. Nazzal *

Center for Smart, Sustainable & Resilient Infrastructure (CSSRI), Department of Civil & Architectural Engineering & Construction Management, University of Cincinnati, Cincinnati, OH 45221, USA

* Author to whom correspondence should be addressed.

Buildings 2026, 16(3), 677; https://doi.org/10.3390/buildings16030677

Submission received: 30 December 2025 / Revised: 28 January 2026 / Accepted: 31 January 2026 / Published: 6 February 2026

(This article belongs to the Section Construction Management, and Computers & Digitization)

Abstract

Pavement maintenance programs rely on timely and accurate crack assessment to preserve roadway quality and reduce long-term rehabilitation costs. Manual inspection remains the prevailing practice, yet it is slow, subjective, and exposes crews to safety risks. Automating crack detection under real-world roadway conditions remains challenging due to inconsistent lighting, shadows, stains, and surface textures that obscure distress features. This study examines the applicability of an integrated, vehicle-mounted framework for automated pavement crack segmentation and width-based severity estimation under practical roadway operating conditions. Data were collected from a moving vehicle using a custom camera–GPS system operating under diverse conditions, capturing the variability encountered in practical surveys. The proposed approach employs a state-of-the-art segmentation model and a calibrated width estimation tool that converts pixel-level crack measurements into physical units using a position-dependent regression model. The key contribution of this work is a unified segmentation and severity evaluation pipeline supported by a novel pixel-to-inch calibration surface and validated using images acquired during normal driving operations and manual field crack measurements. By combining advanced computer vision techniques with practical field-oriented data collection, the proposed system provides a deployable solution for roadway crack assessment, enabling safer, faster, and more scalable network-level pavement monitoring.

Keywords:

pavement cracking; crack severity; crack width; computer vision; deep learning

1. Introduction

Maintaining safe and reliable roadway infrastructure is a major priority for transportation agencies, requiring systematic monitoring to identify pavement deterioration before it progresses into costly structural damage. Among the various pavement distresses, cracking is one of the earliest and most common indicators of degradation. If left untreated, cracks often widen and deepen over time, accelerating moisture infiltration and contributing to premature pavement failure. As noted by the Federal Highway Administration, the ability to detect and characterize cracks accurately plays a critical role in effective pavement management programs [1].

Traditional pavement inspection methods rely heavily on manual visual surveys, where trained personnel document crack characteristics by walking or driving along roadway segments. Although widely used, these surveys are slow, subjective, labor-intensive, and potentially hazardous for field crews. Moreover, the increasing size of roadway networks has highlighted the limitations of manual inspection and the need for scalable, automated alternatives.

Automating pavement crack detection in real-world field conditions, however, remains a significant challenge. Cracks are often thin and irregular, and they frequently blend into surrounding pavement textures. Shadows, oil stains, tire marks, sealed joints, and variable lighting complicate the visual appearance of cracks, especially when data are collected from a moving vehicle. While recent advancements in computer vision and deep learning, such as U-Net variants, attention-based networks, and YOLO-based models, have shown promising segmentation performance on benchmark datasets [2,3,4,5,6,7,8,9,10,11,12,13], these datasets commonly lack the environmental and operational variability present in practical roadway surveys. As a result, many existing models do not generalize well to field-collected imagery.

Beyond segmentation, accurate crack width estimation is essential for severity assessment and maintenance prioritization. Prior research by Ong et al. [14] has explored geometric and hybrid algorithms for crack width measurement, yet these methods are often evaluated under restricted or controlled imaging conditions. Few studies explicitly address perspective distortion, camera mounting geometry, or systematic field validation, all of which are critical when developing vehicle-mounted imaging systems for large-scale pavement assessment.

To address these gaps, this study develops an integrated framework for automated pavement crack segmentation and width-based severity measurement designed for real roadway environments. A custom camera-GPS data acquisition system was mounted on a vehicle to capture pavement imagery under diverse conditions. A YOLO11-based segmentation model was trained on 2000 images with pixel-level crack annotations, and a novel pixel-to-inch calibration surface was constructed to convert raw pixel distances into corrected physical widths across the full camera field of view. Width estimates can then be translated into severity categories using standard transportation agency guidelines.

The main contributions of this study are as follows:

A large, real-world pavement imaging dataset collected entirely from a moving vehicle under diverse environmental and pavement conditions, capturing the variability encountered in operational roadway surveys.
An integrated crack segmentation and width measurement pipeline built on a YOLO11-based model capable of handling surface artifacts and real-world noise challenges.
A novel pixel-to-inch calibration surface that compensates for spatial distortion across the image frame and is validated using both controlled calibration objects and real field-measured cracks.

By linking advanced computer vision methods with practical, field-oriented data collection, this work provides a scalable solution for automated pavement crack assessment, supporting more efficient and objective pavement monitoring practices.

2. Related Work

Research on automated pavement crack analysis has expanded rapidly over the past decade, driven by the limitations of manual inspection and the growing need for scalable and objective pavement condition evaluation. Traditional visual surveys are often labor-intensive and require personnel to operate in close proximity to live traffic, motivating the development of automated, vehicle-mounted inspection systems that reduce on-site exposure while enabling network-level monitoring. Existing studies generally fall into two closely related research streams: crack detection and segmentation, and crack width or severity estimation. While both areas have seen substantial methodological advances, they are often treated independently, limiting their applicability in operational roadway assessments.

2.1. Crack Detection and Segmentation

Early pavement crack detection studies relied mainly on handcrafted image processing techniques, including thresholding, edge detection, and morphological operations. Fernandez et al. [9] reported that these methods performed reasonably well for clearly visible cracks but had difficulty detecting thin and irregular crack patterns that blend into textured pavement surfaces. Kheradmandi and Mehranfar [5] similarly noted that traditional approaches often required heavy preprocessing to reduce noise and enhance contrast, yet their performance remained sensitive to changes in pavement material and lighting conditions.

The use of machine learning introduced more flexible approaches to crack detection. Patch-based convolutional neural network classifiers showed clear improvements over handcrafted feature-based methods. Zhang et al. [7] reported higher detection accuracy and better generalization using CNN-based classifiers. Extending this work, Zhang et al. [8] developed CrackNet II, which used deeper, fully learnable architectures to improve precision and recall while maintaining practical inference speeds. Around the same time, object detection frameworks such as YOLO were investigated by Liu and Wang [11] and later refined through enhanced YOLOv8-based models [13], making near real-time crack localization more feasible for large-scale roadway surveys.

Crack detection approaches are generally divided into object detection and instance segmentation methods. Object detection frameworks identify cracks using bounding boxes, which allows faster annotation and inference but provides limited geometric detail. Instance segmentation, on the other hand, produces pixel-level crack boundaries, resulting in more accurate spatial representation at the cost of increased annotation effort. Zhang et al. [15] discussed this tradeoff and showed that pixel-level labeling can require substantially more time per image than bounding-box annotation. Even with this added effort, instance segmentation has become the preferred approach for detailed crack analysis because of its improved spatial accuracy.

More recent research has focused on deep learning segmentation models based on encoder–decoder architectures and attention mechanisms. U-Net variants continue to be widely adopted, with Lau et al. [2] demonstrating strong performance using a ResNet-34 backbone across several benchmark datasets. Han et al. [3] proposed CrackW-Net, which introduced skip-level sampling blocks to reduce false detections caused by noise. Other studies presented specialized architectures such as MixCrackNet [4] and CrackFormer-II [6], which incorporated deformable convolutions, attention mechanisms, and transformer encoders to improve feature representation and robustness. Building upon these approaches, more recent segmentation models have explored hybrid and context-aware designs to further enhance crack delineation under complex surface conditions. Zim et al. [16] proposed EfficientCrackNet, which integrates convolutional and transformer modules to improve segmentation accuracy while reducing computational complexity, making it more suitable for large-scale applications. Zhu et al. [17] introduced transformer-based architectures with local perception and auxiliary convolution layers, demonstrating improved edge preservation and segmentation of fine crack structures. Agyei Kyem et al. [18] further advanced this direction with Context-CrackNet by explicitly modeling contextual information to enhance the detection of thin and discontinuous cracks across diverse pavement textures. Overall, these studies indicate that deep learning-based segmentation models consistently outperform traditional image processing methods on benchmark datasets.

Despite their strong performance on benchmark datasets, many recent segmentation models rely on deep encoder–decoder architectures, attention mechanisms, or transformer-based designs that substantially increase model complexity and computational cost. Such architectures often assume high-resolution imagery and controlled acquisition conditions, which may limit their practicality for large-scale, vehicle-mounted roadway surveys where real-time or near real-time processing is desirable. In response to these constraints, lightweight and single-stage frameworks such as YOLO-based segmentation models have gained increasing attention due to their favorable balance between accuracy and inference efficiency. Recent studies have demonstrated that YOLO-based segmentation can achieve competitive pixel-level performance while maintaining substantially lower computational overhead, making these models more suitable for field deployment under operational constraints.

2.2. Crack Width and Severity Estimation

Accurate crack width estimation is an important element of pavement condition assessment, since crack width directly affects severity classification and maintenance prioritization decisions. Early studies in this area relied mainly on geometric rules applied to skeletonized crack representations. Ong et al. [14] presented a detailed examination of commonly used geometric methods, including shortest-distance and orthogonal projection techniques. The shortest-distance method is relatively stable with respect to small pixel variations, but it often underestimates crack width because of irregular crack boundaries. Orthogonal projection methods provide a better representation of crack orientation, although they tend to be more sensitive to image noise and may overestimate width in complex pavement backgrounds. To address these limitations, Ong et al. [14] proposed hybrid approaches that combine both strategies, resulting in improved accuracy when evaluated on synthetic and controlled datasets.

Alongside geometric methods, data-driven approaches have also been investigated for crack severity assessment. Liu et al. [19] showed that combining visible imagery with infrared thermography within convolutional neural network models enables effective classification of crack severity levels. Their results indicated that models trained on fused imagery generally outperformed those trained on a single imaging modality, with EfficientNet-based architectures achieving the highest accuracy. These findings demonstrate the potential of deep learning for automated severity assessment. At the same time, such approaches often require large, labeled datasets and specialized sensing equipment, which can limit their practicality for large-scale or network-level pavement monitoring.

Although notable progress has been made in crack width measurement and severity classification, most existing studies remain focused on controlled imaging environments or idealized camera configurations. Important factors such as perspective distortion across the image frame and camera mounting geometry are rarely examined. In addition, validation using field-measured crack widths is still limited, particularly for systems designed to operate on vehicle-mounted platforms at traffic speeds.

2.3. Limitations of Existing Approaches and Research Gap

Although recent studies report strong performance in both crack segmentation and crack width estimation, these tasks are usually treated as separate problems. Segmentation models are commonly evaluated using pixel-level metrics on benchmark datasets, with little connection to physical crack properties or network-level pavement assessment needs. In a similar way, width estimation methods often emphasize algorithmic accuracy under controlled imaging conditions, while giving limited attention to spatial distortion, camera configuration, or constraints associated with real-world deployment.

This separation has created a gap between computer vision models that perform well in experimental settings and pavement assessment systems that can operate reliably in active roadway environments. Only a small number of studies attempt to integrate crack segmentation with calibrated width measurement into a single workflow. Even fewer validate such integrated systems using data collected from a moving vehicle under real traffic conditions.

To address these limitations, this study develops an integrated framework that combines deep learning-based crack segmentation with a calibrated width estimation approach tailored for real-world roadway applications. The framework emphasizes field-based data collection, spatial correction across the camera field of view, and validation using real crack measurements. By focusing on these aspects, the proposed approach moves beyond benchmark-focused evaluation and toward a practical and deployable solution for automated pavement crack assessment.

3. Methodology

Figure 1 presents an overview of the proposed research methodology, illustrating the main processing stages from data acquisition to crack severity classification.

3.1. System Description

The data acquisition system was mounted on a sport utility vehicle (SUV) and consisted of two primary components: a forward-facing camera and a GPS receiver used for frame synchronization and georeferencing. The camera was based on a Sony IMX490 CMOS sensor with a native resolution of 2320 × 1620 pixels and a maximum frame rate of 20.8 frames per second. The sensor supports high dynamic range imaging, which is beneficial for handling variable lighting conditions encountered during roadway data collection. A 6 mm lens was used, providing a field of view sufficient to capture the full lane width.

Initially, the camera was mounted perpendicular to the pavement surface, corresponding to a 90° orientation relative to the horizontal plane (Figure 2). While this configuration provided a near-orthographic view of the pavement, it limited image quality at higher vehicle speeds due to reduced exposure time. As a result, crack features appeared less distinct, particularly under non-ideal lighting conditions.

To address this issue, the camera was reconfigured and mounted at the rear of the vehicle at a fixed height of approximately 5.5 feet above the pavement surface, and at an angle of approximately 40° relative to the horizontal plane, facing downward toward the pavement surface (Figure 3). This oblique configuration increased the effective light capture and improved image clarity during motion, while still maintaining consistent coverage of the travel lane.

The camera and GPS data streams were integrated using the Robot Operating System (ROS), enabling synchronized recording of image frames and positional data. Data acquisition and system control were handled using a Lenovo ThinkPad P1 Gen 5 laptop (Lenovo Group Ltd., Beijing, China) equipped with an Intel® Core i9-12900H processor (Intel Corporation, Santa Clara, CA, USA), running Ubuntu 20.04.6 LTS (Canonical Ltd., London, UK). During field operation, image data were recorded at 10 frames per second, while GPS data were logged at 20 Hz. Because the camera was mounted at an oblique angle relative to the pavement surface, spatial distortion varies across the image frame; this effect is explicitly accounted for through the position-dependent pixel-to-inch calibration surface described in Section 3.5.
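As an illustration of how 10 fps image frames might be paired with 20 Hz GPS fixes, the following ROS-independent sketch matches each frame to the fix nearest in time. The source performs synchronization within ROS and does not describe the matching rule, so nearest-timestamp matching, the function names, and the assumption that both streams share one clock (timestamps in seconds) are all illustrative.

```python
from bisect import bisect_left


def nearest_fix(gps_times, frame_time):
    """Return the index of the GPS fix closest in time to frame_time.

    gps_times must be sorted in ascending order (seconds, shared clock).
    """
    if not gps_times:
        raise ValueError("gps_times is empty")
    i = bisect_left(gps_times, frame_time)
    if i == 0:
        return 0
    if i == len(gps_times):
        return len(gps_times) - 1
    before, after = gps_times[i - 1], gps_times[i]
    # Pick whichever neighbour is closer; ties go to the earlier fix.
    return i if (after - frame_time) < (frame_time - before) else i - 1


def georeference(frame_times, gps_fixes):
    """Pair each frame timestamp with its nearest (time, lat, lon) fix."""
    gps_times = [t for t, _, _ in gps_fixes]
    return [(ft, gps_fixes[nearest_fix(gps_times, ft)]) for ft in frame_times]
```

With GPS logged at twice the frame rate, the worst-case time offset under this rule is half a GPS period (25 ms).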

3.2. Data Collection

Image frames captured by the data acquisition system consistently covered the full width of the travel lane, as shown in Figure 3. Because vehicle speed influences image sharpness, data collection was conducted at vehicle speeds ranging from 40 to 55 miles per hour to ensure adequate image quality for annotation and segmentation. Data were collected under a broad range of environmental and surface conditions to reflect the variability encountered in real roadway environments. Collection trips took place under both sunny and cloudy conditions and spanned summer and winter seasons. Rainy conditions were intentionally avoided due to safety considerations and the potential impact of water on image quality and sensor visibility. The dataset includes a variety of pavement and surface types, as well as environmental conditions, to support general applicability under practical roadway operating conditions. Furthermore, data collection sites encompassed state routes, local roads, interstate highways, and enclosed parking garages, all within the state of Ohio.

Compared to many datasets reported in the literature, the imagery collected in this study presents a higher level of difficulty. Rather than relying on static, high-resolution photographs acquired under controlled conditions, pavement images were captured from a moving vehicle under variable lighting, surface texture, and environmental conditions. This approach better reflects the challenges associated with practical roadway surveys and supports the development of models intended for deployment in real-world applications. Figure 4 illustrates sample roadway segments on state routes from which data were collected.

The proposed framework is inherently dependent on RGB visual information and is therefore intended for daytime data collection under dry pavement conditions. Although the dataset spans a range of lighting, surface textures, and roadway environments, nighttime operation without active illumination, wet pavement surfaces, strong shadows, or severe surface contamination can substantially reduce visual contrast and hinder both manual annotation and automated crack segmentation. In practical deployments, these limitations can be mitigated by scheduling data collection during daylight hours, avoiding adverse weather conditions, and selecting acquisition times that minimize shadow effects.

3.3. Data Annotation

To enable supervised training of the crack segmentation model, a subset of the collected images was manually annotated at the pixel level, as shown in Figure 5. Annotation was performed using the LabelMe tool [20], an open-source annotation software developed by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), Cambridge, MA, USA, which supports polygon-based labeling of irregular crack geometries. The dataset includes longitudinal, transverse, and alligator cracks. Cracks were delineated using fine-grained polygon masks, while all non-crack pavement features were implicitly treated as background. Images containing visual artifacts commonly encountered in roadway imagery, including shadows, oil stains, sealed joints, surface discoloration, and tire marks, were intentionally labeled as background. This labeling strategy was adopted to reduce false positives and encourage the model to learn discriminative features associated with true crack patterns rather than superficial texture variations.

To ensure annotation consistency, a simple labeling guideline was followed throughout the process. Cracks were annotated only when a continuous fracture was visually recognizable, while ambiguous surface features were excluded.

3.4. Segmentation Model Training & Validation

Given the need for fast and reliable crack segmentation under real roadway operating conditions, the YOLO11 architecture was selected as the segmentation framework in this study. YOLO-based models are widely adopted in real-time computer vision applications due to their balance between inference speed and segmentation accuracy. YOLO11 provides multiple model scales, allowing performance to be adjusted based on computational and deployment constraints.

Among the available configurations, the YOLO11n-seg model was selected following comparative evaluation with larger variants. While larger models yielded marginal improvements in segmentation accuracy, these gains were accompanied by substantially higher computational cost. No consistent performance gains were observed for extremely small or discontinuous cracks, suggesting that segmentation performance in this study is primarily constrained by image resolution and visual contrast rather than model capacity alone. The Nano variant offered a favorable balance between accuracy and efficiency, making it more suitable for large-scale roadway analysis and potential real-time deployment. Crack segmentation was formulated as a single-class problem, with crack regions represented using pixel-level masks. The current framework does not explicitly distinguish other pavement distress types such as patches or raveling. While sealed cracks and patches did not result in systematic misclassification in the evaluated dataset, severe raveling may present visual patterns that warrant explicit multi-class modeling in future work.

A total of 2000 manually annotated images containing longitudinal, transverse, and alligator cracks were used for model training and validation. Annotation files were initially generated in JSON format and subsequently converted into the YOLO-compatible segmentation format. All images were resized to a fixed resolution of 1280 × 1280 pixels prior to training to ensure consistent input dimensions. This resizing may result in slight thinning or smoothing of very narrow cracks, which is a known limitation of the current training setup and is common in pixel-based segmentation frameworks.
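The JSON-to-YOLO conversion step can be sketched as follows. This is a minimal illustration rather than the authors' script: it assumes the JSON files follow the LabelMe layout (`imageWidth`, `imageHeight`, and a `shapes` list of polygons) and that the single crack class is assigned index 0. The function names are hypothetical.

```python
import json


def labelme_to_yolo_lines(labelme, class_id=0):
    """Convert a parsed LabelMe annotation dict to YOLO segmentation lines.

    Each output line is "class x1 y1 x2 y2 ..." with coordinates
    normalized to [0, 1] by the image width and height.
    """
    w = labelme["imageWidth"]
    h = labelme["imageHeight"]
    lines = []
    for shape in labelme["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # only polygon masks are converted in this sketch
        pts = shape["points"]
        if len(pts) < 3:
            continue  # a polygon needs at least three vertices
        coords = []
        for x, y in pts:
            # Normalize and clamp vertices that fall slightly outside the frame.
            coords.append(min(max(x / w, 0.0), 1.0))
            coords.append(min(max(y / h, 0.0), 1.0))
        lines.append(" ".join([str(class_id)] + [f"{c:.6f}" for c in coords]))
    return lines


def convert_file(json_path, txt_path):
    """Read one LabelMe JSON file and write the matching YOLO label file."""
    with open(json_path, "r", encoding="utf-8") as f:
        labelme = json.load(f)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("\n".join(labelme_to_yolo_lines(labelme)))
```

An image with no crack polygons yields an empty label file, which is how background-only images are typically represented in this label format.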

The dataset was randomly split into 85% for training and 15% for validation. Model training was conducted for up to 300 epochs using the stochastic gradient descent (SGD) optimizer with a batch size of 8. The maximum number of epochs was selected to allow sufficient convergence of the segmentation loss, while adaptive training controls were used to prevent overfitting.
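A random 85/15 split of this kind can be sketched as below. The fixed seed and the function name are assumptions added for reproducibility of the illustration; the source states only that the split was random.

```python
import random


def split_dataset(items, train_fraction=0.85, seed=0):
    """Shuffle items and split them into (train, validation) lists."""
    shuffled = list(items)
    # A local Random instance keeps the shuffle reproducible without
    # touching the global random state. The seed value is arbitrary.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```

Applied to 2000 annotated images, this produces 1700 training images and 300 validation images.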

Data augmentation was applied during training using the YOLO framework to improve robustness under real-world roadway conditions. Augmentations included random scaling, slight rotation, and photometric adjustments such as brightness and contrast variation. Mosaic augmentation was enabled during early training stages to expose the model to varied spatial contexts and background combinations.

Training incorporated an adaptive learning rate scheduler to reduce the learning rate as validation performance plateaued. Early stopping was also employed based on validation loss trends to prevent overfitting and unnecessary training beyond convergence. Hyperparameter tuning was limited to controlled adjustments of training dynamics rather than exhaustive optimization. All training was performed on an NVIDIA A100 GPU.

To reduce manual labeling effort and improve dataset quality, the trained YOLO11n-seg model was iteratively used to generate preliminary segmentation masks for additional unlabeled images. These initial predictions were reviewed and corrected by a human annotator before being incorporated into subsequent training cycles. This iterative annotation strategy, shown in Figure 6, in which model-generated labels are refined and reused for retraining, has been shown to improve both annotation efficiency and segmentation performance in similar computer vision applications [21].

Segmentation outputs were periodically reviewed to identify systematic errors, particularly false positives caused by visually similar non-crack features such as tire marks, crack sealant, and tar lines. To mitigate these errors, additional background images containing such features were included during training without crack annotations, providing negative examples that helped the model better distinguish true cracks from visually similar pavement artifacts.

3.5. Crack Width Estimation

Crack width estimation for longitudinal and transverse cracks is based on a position-dependent pixel-to-inch mapping function that compensates for perspective distortion caused by the oblique camera mounting configuration. Because pixel scale varies across the image frame, direct conversion from pixel length to physical width using a single global factor is not valid. Instead, a spatial mapping function was developed using repeated measurements of a reference object with known physical width.

The pixel-to-inch calibration surface is dependent on camera height and viewing angle. When the method is applied using an identical mounting configuration, the same calibration surface may be reused without recalibration. The reference object was placed on the pavement surface at multiple locations within the camera’s field of view to capture spatial variation in pixel measurements caused by perspective effects. Figures 7 and 8 show the reference object positioned at different locations across the image frame, illustrating how the same object appears with different apparent dimensions depending on its position.

To generate calibration data, multiple horizontal line segments were drawn across the width of the reference object along its longitudinal extent. These line segments represent repeated measurements of the same known physical width sampled at different image locations. In total, approximately 5000 line segments were generated from 366 object placements. For clarity, the annotation process is illustrated using a binary representation, where white pixels denote the reference object and black pixels denote the pavement background. Figure 9 shows an example of the binary calibration image with multiple horizontal annotation lines used to extract pixel-length measurements.
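One way to obtain such horizontal segments from a binary calibration image is to scan the mask row by row, as sketched below. The source does not say whether the lines were drawn manually or extracted automatically, so this automated version is an assumption. It also assumes the object forms a single run of white pixels in each row, and the function name is hypothetical.

```python
def row_segments(mask, step=1):
    """Extract one horizontal segment per sampled row of a binary mask.

    mask is a list of rows, each a list of 0/1 values (1 = reference
    object). Returns ((x_i, y), (x_f, y)) endpoint pairs, using the
    leftmost and rightmost object pixels in each sampled row.
    """
    segments = []
    for y in range(0, len(mask), step):
        xs = [x for x, value in enumerate(mask[y]) if value]
        if xs:  # skip rows that contain no object pixels
            segments.append(((xs[0], y), (xs[-1], y)))
    return segments
```

Increasing `step` thins the sampling along the object's length, which controls how many segments each placement contributes.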

For each line segment, the pixel length L_{p} was computed using the Euclidean distance between its endpoints (x_{i}, y_{i}) and (x_{f}, y_{f}):

L_{p} = \sqrt{{(x_{i} - x_{f})}^{2} + {(y_{i} - y_{f})}^{2}}

(1)

The midpoint coordinates of each line segment were then calculated as:

x_{m} = \frac{x_{i} + x_{f}}{2}, y_{m} = \frac{y_{i} + y_{f}}{2}

(2)
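Equations (1) and (2) translate directly into code. The sketch below uses hypothetical function names and takes each endpoint as an (x, y) tuple in pixel coordinates.

```python
import math


def pixel_length(p_i, p_f):
    """Euclidean distance in pixels between endpoints p_i and p_f (Eq. 1)."""
    return math.hypot(p_i[0] - p_f[0], p_i[1] - p_f[1])


def midpoint(p_i, p_f):
    """Midpoint (x_m, y_m) of the segment from p_i to p_f (Eq. 2)."""
    return ((p_i[0] + p_f[0]) / 2, (p_i[1] + p_f[1]) / 2)
```

For example, a segment from (100, 200) to (103, 204) has a pixel length of 5 and a midpoint of (101.5, 202).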

Figure 10 shows an annotated calibration image with example pixel coordinates and the corresponding extracted pixel length, illustrating how both geometric length and spatial location are recorded for each measurement.

Using the dataset of approximately 5000 measurements, a regression-based mapping function was developed to relate pixel length and image position to the known reference width:

W_{\mathrm{ref}} = f(L_{p}, x_{m}, y_{m})

(3)

Figure 11 presents a three-dimensional visualization of the measured pixel lengths as a function of image coordinates (x, y), highlighting the spatial distortion effects across the camera frame that motivate the use of a position-dependent mapping.

A linear regression model was used to estimate the mapping function. Once the mapping function was established, crack width measurement was performed directly on segmentation outputs. For a given crack, a single annotation line is drawn across the visible crack opening, representing the crack width. The pixel length and midpoint coordinates of this annotation are computed using Equations (1) and (2) and passed to the mapping function to estimate the physical crack width:

W_{\mathrm{crack}} = f(L_{p}, x_{m}, y_{m})

(4)

This measurement process is implemented through a Python-based workflow (Python 3.8.10, Python Software Foundation, Wilmington, DE, USA) that converts user-provided annotations into crack width estimates in inches. By using a spatially varying mapping function derived from field calibration data, the proposed approach enables practical and repeatable crack width estimation while accounting for perspective distortion across the image frame.
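The source states that f in Equations (3) and (4) is a linear regression but does not give its functional form. One possible reading, assumed here purely for illustration, is to regress the local scale W_ref / L_p (inches per pixel) on the midpoint coordinates and then multiply a crack's pixel length by the predicted scale. The function names, the three-coefficient model, and the normal-equation solver are all part of this sketch, not the authors' implementation.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: calibration points are degenerate")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_scale_surface(samples, ref_width_in):
    """Least-squares fit of scale = a + b*x_m + c*y_m (inches per pixel).

    samples is an iterable of (L_p, x_m, y_m) calibration measurements of
    a reference object whose true width is ref_width_in inches.
    """
    AtA = [[0.0] * 3 for _ in range(3)]
    Atb = [0.0] * 3
    for L_p, x_m, y_m in samples:
        row = (1.0, x_m, y_m)
        target = ref_width_in / L_p  # observed inches per pixel at this point
        for i in range(3):
            Atb[i] += row[i] * target
            for j in range(3):
                AtA[i][j] += row[i] * row[j]
    return tuple(solve3(AtA, Atb))


def estimate_width(coeffs, L_p, x_m, y_m):
    """Convert a pixel length at (x_m, y_m) to inches using the fitted scale."""
    a, b, c = coeffs
    return L_p * (a + b * x_m + c * y_m)
```

Under this reading, the fitted coefficients depend on camera height and viewing angle, so they would need to be refitted whenever the mounting configuration changes.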

4. Results

4.1. Segmentation Model Performance

The YOLO11-based segmentation model was evaluated on a held-out validation set of 300 pavement images containing manually annotated crack masks. These images were excluded from training to ensure an unbiased assessment of generalization performance. In addition to pixel-level evaluation, segmentation outputs were post-processed to extract spatial crack density statistics along the surveyed roadway.

The lightweight YOLO11n-based segmentation model is designed for efficient inference, making it suitable for real-time or near real-time deployment in vehicle-mounted roadway monitoring applications.

Crack density was aggregated over one-mile segments along the SR-49 northbound route, producing a distribution that reflects spatial variability in pavement condition. The resulting histogram, shown in Figure 12, demonstrates the model’s ability to support network-level assessment by identifying roadway segments with elevated cracking intensity.
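The per-mile aggregation can be sketched as below. The source does not define crack density, so this sketch assumes it is the fraction of pixels labeled as crack in each frame, averaged over all frames whose along-route distance falls in the same one-mile bin. The distance values are assumed to be precomputed from the GPS track, and the function name is hypothetical.

```python
from collections import defaultdict


def density_by_mile(frames):
    """Average per-frame crack fraction within each one-mile segment.

    frames is an iterable of (distance_miles, crack_fraction) pairs, where
    distance_miles is the distance along the route and crack_fraction is
    the share of pixels predicted as crack in that frame.
    Returns {mile_index: mean crack fraction}, ordered by mile index.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance_miles, crack_fraction in frames:
        mile = int(distance_miles // 1)  # mile 0 covers [0, 1), mile 1 covers [1, 2), ...
        sums[mile] += crack_fraction
        counts[mile] += 1
    return {mile: sums[mile] / counts[mile] for mile in sorted(sums)}
```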

To quantify segmentation accuracy, predicted masks were compared against ground-truth annotations to construct a confusion matrix (Figure 13), from which standard performance metrics were derived.

From the confusion matrix, the following standard object detection terms were derived:

True Positive (TP): Correctly predicted crack region matching a ground truth mask.
False Positive (FP): Predicted crack region where no ground truth mask exists.
True Negative (TN): Correct identification of non-crack regions.
False Negative (FN): Missed crack region where a ground truth mask exists.

These values were used to compute the evaluation metrics:

\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}

(5)

\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}

(6)

\text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(7)
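Equations (5) to (7) translate directly into a few lines of Python. The sketch below is illustrative only; the function names are hypothetical and the counts in the first example are made up. The second example checks Equation (7) against the precision and recall reported for YOLO11n-seg in Table 1.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall, Equation (7)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


# Made-up counts: 90 true positives, 10 false positives, 30 false negatives.
p, r, f1 = precision_recall_f1(90, 10, 30)

# Table 1 values for YOLO11n-seg: precision 0.9438, recall 0.9761.
f1_table = round(f1_from(0.9438, 0.9761), 4)  # 0.9597, matching Table 1
```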

Table 1 summarizes the segmentation performance and computational characteristics of the evaluated YOLO11 model variants across different model scales. Note: The YOLO11n-seg model is shown in bold as it represents the configuration adopted in the proposed framework.

Training convergence was monitored using mAP50 for mask predictions across epochs. As shown in Figure 14, performance stabilized well before the final training iteration, indicating that the model reached convergence without overfitting.

Qualitative comparisons between ground-truth annotations and predicted crack masks for representative pavement frames are provided in Figure 15, demonstrating the model’s ability to capture thin, irregular crack geometries under diverse surface and lighting conditions.

4.2. Crack Width Measurement Analysis

The crack width measurement framework for longitudinal and transverse cracks was evaluated using a two-stage validation strategy consisting of controlled image-based validation and field validation using physical crack measurements. This combined approach was adopted to assess both the spatial robustness of the pixel-to-length mapping across the camera field of view and the accuracy of the tool under real pavement conditions.

The proposed framework employs a position-dependent pixel-to-length correction surface derived from reference measurements distributed across the image plane. This mapping is intended to compensate for perspective effects and camera mounting geometry, enabling consistent conversion from pixel-based crack measurements to physical widths throughout the entire field of view.

Sample Output

An example of the automated output generated by the crack width estimation tool is presented in Table 2. Each detected crack instance is associated with an estimated physical width and a corresponding severity classification based on predefined threshold values. This output format demonstrates the framework’s ability to directly support crack severity assessment and condition evaluation without requiring manual post-processing.
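The width-to-severity step amounts to a threshold lookup, sketched below. The threshold values of 0.4 in and 0.6 in are placeholders, not the study's thresholds: the predefined values are not stated in this section, and these numbers were chosen only so that the example labels agree with the sample rows in Table 2. The function name is hypothetical.

```python
def classify_severity(width_in, low_max, moderate_max):
    """Map a crack width in inches to a severity class.

    low_max and moderate_max are the upper bounds (inclusive) of the
    Low and Moderate classes; anything wider is High.
    """
    if width_in < 0:
        raise ValueError("crack width must be non-negative")
    if width_in <= low_max:
        return "Low"
    if width_in <= moderate_max:
        return "Moderate"
    return "High"


# PLACEHOLDER thresholds (inches), for illustration only.
LOW_MAX, MODERATE_MAX = 0.4, 0.6
labels = [classify_severity(w, LOW_MAX, MODERATE_MAX)
          for w in (0.23, 0.52, 0.31, 0.64, 0.85, 0.36)]
```

Keeping the thresholds as parameters lets an agency substitute the limits from its own distress identification manual without changing the code.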

Image-based validation

Image-based validation was conducted to evaluate the spatial consistency of crack width estimation across the camera field of view. Reference objects with known physical widths were placed at multiple spatial locations within the image frame and imaged using the same camera configuration employed during crack data collection. Absolute width estimation errors were then computed by comparing predicted widths with known reference dimensions at each location.

Figure 16 shows the spatial distribution of absolute width estimation error for 110 validation samples. The results indicate a clear spatial pattern, with lower errors concentrated toward the central region of the frame and higher errors near the image corners. Estimates from these corner regions should therefore be excluded when aggregating results for network-level condition assessment. This behavior is consistent with perspective-induced distortion and highlights the limitations of applying a single global pixel-to-length conversion factor.

The observed spatial error distribution provides direct justification for the use of a position-dependent mapping, which explicitly accounts for spatial variation in scale and reduces systematic width estimation error across the image plane.

Field validation

Field validation was performed using real pavement cracks measured under in-service conditions. Crack widths were physically measured in the field using a ruler, and corresponding crack images were processed by the proposed framework to obtain predicted width estimates. Each field-measured crack was directly paired with its image-based prediction for quantitative comparison.

For this dataset (N = 60), the proposed method achieved a mean bias of −0.024 in, a mean absolute error (MAE) of 0.058 in, and a root mean square error (RMSE) of 0.073 in. Figure 17 illustrates the relationship between predicted and measured crack widths, demonstrating strong agreement across the observed width range.
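The three error statistics can be computed from paired predictions and field measurements as in the sketch below. It assumes the error is defined as predicted minus measured, so that a negative bias means the tool underestimates on average; the sign convention is an assumption, and the sample widths are made up rather than taken from the field dataset.

```python
import numpy as np


def width_error_metrics(predicted_in, measured_in):
    """Mean bias, MAE, and RMSE (inches) for paired width estimates.

    Error is taken as predicted minus measured (assumed convention).
    """
    err = np.asarray(predicted_in, dtype=float) - np.asarray(measured_in, dtype=float)
    return {
        "bias": float(err.mean()),
        "mae": float(np.abs(err).mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
    }


# Made-up paired widths in inches.
metrics = width_error_metrics([0.20, 0.50, 0.70], [0.25, 0.45, 0.80])
```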

When translated into severity categories, the framework achieved an overall classification accuracy of 83.3%, with a Cohen’s κ coefficient of 0.82, indicating substantial agreement beyond chance. The corresponding confusion matrix (Figure 18) shows that most misclassifications occurred near severity boundaries, where relatively small deviations in width estimation can result in category transitions. It should be noted that field measurements were obtained using manual tools and are therefore subject to inherent measurement uncertainty, particularly for irregular or partially occluded cracks.
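For reference, overall accuracy and Cohen's κ follow from a confusion matrix as sketched below, using the standard definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the row and column totals. The 3 × 3 matrix in the example is hypothetical and is not the matrix shown in Figure 18.

```python
import numpy as np


def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    # Chance agreement: product of matching row and column marginals.
    p_expected = float((cm.sum(axis=0) * cm.sum(axis=1)).sum()) / n ** 2
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return float(p_observed), float(kappa)


# Hypothetical matrix (rows: field class, columns: predicted class).
acc, kappa = accuracy_and_kappa([[8, 1, 0],
                                 [1, 7, 1],
                                 [0, 1, 6]])
```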

5. Conclusions

This study presented an integrated framework for automated pavement crack segmentation and width-based severity measurement designed for real-world roadway deployment. Unlike many prior works that rely on controlled, usually orthogonal, imaging conditions or static datasets, the proposed approach was developed and validated using pavement imagery collected from a moving vehicle under diverse environmental, pavement, and operational conditions. This emphasis on field realism addresses a critical gap between laboratory-scale computer vision research and practical pavement assessment applications.

A YOLO11n-based segmentation model was trained using pixel-level annotations to accurately delineate crack geometries in challenging roadway imagery. The lightweight architecture enabled efficient inference while maintaining sufficient spatial precision for downstream width analysis, making it suitable for large-scale or real-time deployment scenarios. The segmentation model achieved strong performance, with an mAP50 exceeding 0.92 and an F1 score of approximately 0.96 on field-collected roadway imagery, demonstrating reliable crack extraction under realistic operating conditions. Iterative refinement through model-assisted annotation further improved dataset quality and reduced labeling effort, contributing to robust segmentation performance across varied pavement types and surface conditions.

To enable physically meaningful crack width measurement, this work introduced a mapping-based pixel-to-inch conversion approach that explicitly accounts for spatial distortion across the camera field of view. By placing a reference object of known width at numerous locations within the image frame and extracting thousands of pixel-length samples, a continuous mapping surface was derived to correct raw pixel measurements. This approach moves beyond uniform scaling assumptions and provides a practical solution for width estimation under oblique camera mounting geometries. Validation results demonstrated that the proposed calibration approach improved width estimation reliability, achieving approximately 83% severity classification accuracy under practical field conditions, with residual width errors primarily concentrated near the image periphery.

Overall, the proposed framework bridges the gap between deep learning-based crack detection and actionable pavement condition metrics by tightly integrating segmentation, spatial correction, and field validation. While the current implementation focuses on single-class crack segmentation and two-dimensional width estimation, the methodology is extensible to additional distress types, multi-class segmentation, and improved geometric modeling. Future work will explore broader validation across roadway networks, enhanced distortion modeling near image boundaries, and simple periodic recalibration to account for long-term changes in camera geometry during extended deployments. In addition, future studies will examine performance across different pavement types, traffic environments, and climatic conditions to further assess robustness and generalizability. The results demonstrate the feasibility of deploying computer vision-based crack assessment systems for practical, objective, and scalable pavement monitoring.

Author Contributions

Conceptualization, M.D.N.; Methodology, D.M. and M.D.N.; Validation, O.A. and D.M.; Formal analysis, O.A. and D.M.; Investigation, O.A. and D.M.; Resources, M.D.N.; Data curation, O.A. and D.M.; Writing—original draft, O.A.; Writing—review & editing, D.M. and M.D.N.; Supervision, M.D.N.; Project administration, M.D.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are unavailable due to privacy concerns.

Conflicts of Interest

All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

Federal Highway Administration (FHWA). Pavement Condition Index Distress Identification Manual; U.S. Department of Transportation: Washington, DC, USA, 2014. Available online: https://www.fhwa.dot.gov/publications/research/infrastructure/pavements/ltpp/13092/13092.pdf (accessed on 9 September 2025).
Lau, S.L.H.; Chong, E.K.P.; Yang, X.; Wang, X. Automated pavement crack segmentation using U-Net-based convolutional neural network. IEEE Access 2020, 8, 114892–114899.
Han, C.; Ma, T.; Huyan, J.; Huang, X.; Zhang, Y. CrackW-Net: A Novel Pavement Crack Image Segmentation Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22135–22144.
Zhang, Y.; Liu, C. Network for robust and high-accuracy pavement crack segmentation. Autom. Constr. 2024, 162, 105375.
Kheradmandi, N.; Mehranfar, V. A critical review and comparative study on image segmentation-based techniques for pavement crack detection. Constr. Build. Mater. 2022, 321, 126162.
Liu, H.; Yang, J.; Miao, X.; Mertz, C.; Kong, H. CrackFormer Network for Pavement Crack Segmentation. IEEE Trans. Intell. Transp. Syst. 2023, 24, 9240–9252.
Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3708–3712.
Zhang, A.; Wang, K.C.; Fei, Y.; Liu, Y.; Tao, S.; Chen, C.; Li, J.Q.; Li, B. Deep learning-based fully automated pavement crack detection on 3D asphalt surfaces with an improved CrackNet. J. Comput. Civ. Eng. 2018, 32, 04018041.
Cubero-Fernandez, A.; Rodriguez-Lozano, F.J.; Villatoro, R.; Olivares, J.; Palomares, J.M. Efficient pavement crack detection and classification. J. Image Video Proc. 2017, 2017, 39.
Zhang, H.; Qian, Z.; Tan, Y.; Xie, Y.; Li, M. Investigation of pavement crack detection based on deep learning method using weakly supervised instance segmentation framework. Constr. Build. Mater. 2022, 358, 129117.
Liu, F.; Wang, L. UNet-based model for crack detection integrating visual explanations. Constr. Build. Mater. 2022, 322, 126265.
Nie, M.; Wang, C. Pavement Crack Detection based on yolo v3. In Proceedings of the 2019 2nd International Conference on Safety Produce Informatization (IICSPI), Chongqing, China, 28–30 November 2019; pp. 327–330.
Elsharkawy, Z.F.; Kasban, H.; Abbass, M.Y. Efficient surface crack segmentation for industrial and civil applications based on an enhanced YOLOv8 model. J. Big Data 2025, 12, 16.
Ong, J.C.; Ismadi, M.Z.P.; Wang, X. A hybrid method for pavement crack width measurement. Measurement 2022, 197, 111260.
Zhang, D.; Han, J.; Cheng, G.; Yang, M.-H. Weakly Supervised Object Localization and Detection: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5866–5885.
Zim, A.H.; Iqbal, A.; Al-Huda, Z.; Malik, A.; Kuribayashi, M. EfficientCrackNet: A Lightweight Model for Crack Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, 28 February–4 March 2025; pp. 6279–6289.
Zhu, Y.; Cao, T.; Yang, Y. A Transformer-Based Pavement Crack Segmentation Model with Local Perception and Auxiliary Convolution Layers. Electronics 2025, 14, 2834.
Agyei Kyem, B.; Asamoah, J.K.; Aboah, A. Context-CrackNet: A context-aware framework for precise segmentation of tiny cracks in pavement images. Constr. Build. Mater. 2025, 484, 141583.
Liu, F.; Liu, J.; Wang, L. Deep learning and infrared thermography for asphalt pavement crack severity classification. Autom. Constr. 2022, 140, 104383.
Wada, K. Labelme: Image Polygonal Annotation with Python; Version 4.6.0, [Computer software]; Zenodo: Geneva, Switzerland, 2021.
Al Oide, A.; Manasreh, D.; Karasneh, M.; Melhem, M.; Nazzal, M.D. Enhancing Road Safety on US Highways: Leveraging Advanced Computer Vision for Automated Guardrail Damage Detection and Evaluation. Buildings 2025, 15, 668.

Figure 1. Flowchart of the proposed crack segmentation and severity estimation framework.

Figure 2. Final camera setup mounted on the rear of the SUV.

Figure 3. Representative pavement image captured from the vehicle-mounted data acquisition system during field operation.

Figure 4. Sample data collection road segments in Ohio (state routes).

Figure 5. Illustration of raw pavement image and its corresponding pixel-level crack annotation. (a) raw image, (b) annotated image.

Figure 6. Iterative training and enhancement pipeline for crack segmentation model.

Figure 7. Reference object in 6 different positions within the camera field of view.

Figure 8. Spatial mapping reference object (white) positioned at multiple locations across the camera field of view.

Figure 9. Binary representation of the reference object used for calibration, showing multiple horizontal line segments drawn across the known object width to generate pixel-length measurements.

Figure 10. Annotated calibration image showing pixel coordinates and extracted line length.

Figure 11. 3D surface plot of pixel length as a function of image coordinates (x, y), illustrating spatial distortion effects.

Figure 12. Spatial distribution of crack density along the SR-49 route, aggregated over one-mile segments.

Figure 13. Confusion matrix comparing predicted crack masks with ground-truth annotations.

Figure 14. mAP50 (mask) progression over training epochs, illustrating model convergence.

Figure 15. Visual comparison of crack segmentation results: (a) ground-truth annotations and (b) YOLO11 predicted masks for representative pavement frames.

Figure 16. Spatial distribution of absolute width estimation error across the image (N = 110).

Figure 17. Predicted versus measured crack widths for field validation data (N = 60).

Figure 18. Confusion matrix for severity classification based on field-measured crack widths. Darker color intensity represents higher numbers of samples.

Table 1. Crack segmentation model metrics.

Detection Method	Recall	Precision	F1 score	mAP50
YOLO11n-seg	0.9761	0.9438	0.9597	0.9201
YOLO11s-seg	0.9774	0.9461	0.9615	0.9240
YOLO11m-seg	0.9759	0.9498	0.9626	0.9252
YOLO11l-seg	0.9782	0.9487	0.9632	0.9259

Table 2. Example output from the crack width measurement tool.

Crack ID	Measured Width (in)	Severity Class
001	0.23	Low
002	0.52	Moderate
003	0.31	Low
004	0.64	High
005	0.85	High
006	0.36	Low

