Article

Intelligent Measurement of Concrete Crack Width Based on U-Net Deep Learning and Binocular Vision 3D Reconstruction

1
Shandong Hi-Speed Infrastructure Construction Co., Ltd., Jinan 250101, China
2
Shandong Hi-Speed South Ring Expressway Co., Ltd., Jinan 250003, China
3
School of Mechanics and Civil Engineering, China University of Mining and Technology, 2 Daxue Rd., Tongshan District, Xuzhou 221116, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(5), 2355; https://doi.org/10.3390/app16052355
Submission received: 18 January 2026 / Revised: 19 February 2026 / Accepted: 20 February 2026 / Published: 28 February 2026
(This article belongs to the Section Civil Engineering)

Abstract

Concrete cracking can seriously affect the durability and safety of civil structures, and measuring crack width accurately and quickly helps control defect development in a timely manner. Current research mainly relies on pixel detection in two-dimensional images, which lacks real three-dimensional information about the crack. Detection results are also strongly affected by factors such as shooting distance and posture, resulting in poor accuracy. Therefore, this paper presents an engineering-integrated solution that combines U-Net-based crack segmentation with binocular vision 3D reconstruction. The focus is placed on the practical deployment of the integrated pipeline, the optimization of key parameters under real inspection conditions, and the experimental validation of measurement accuracy on actual concrete cracks. Firstly, the U-Net deep learning algorithm is used to automatically identify and segment the concrete crack region. Then, a binocular vision-based 3D reconstruction pipeline is adopted, a parallax rejection algorithm based on a “double-threshold” decision is proposed to improve the fidelity of crack disparity maps, and the effect of the filter window size on the concrete crack region is analyzed. Finally, an intelligent measurement method is proposed in which the crack width is calculated directly from the 3D reconstruction model. The results show that (1) the model can identify the characteristics of the crack, and the detection effect at 4:00 p.m. is the best, because at this time the light is more uniform, with less shadow and moderate contrast between the crack and its background; (2) the 3D point cloud model of the concrete crack is reconstructed best with a 9 × 9 filtering window; (3) the maximum error between the calculated and measured values of crack width is 0.31 mm, the minimum error is 0.07 mm, and the average error is 0.15 mm, which indicates that the measurement accuracy reaches the sub-millimetre level and verifies the validity of the proposed method.

1. Introduction

According to statistics from the traffic management department, the national highway mileage has reached 5,436,800 km. With economic development, the number of vehicles (especially large heavy-duty vehicles) continues to climb, maintenance work faces unprecedented challenges, and the maintenance of tunnel structures is crucial to traffic safety [1,2,3].
Under the influence of vehicle load and the natural environment (e.g., temperature and climate), various defects gradually appear in concrete structures. Under the joint action of rain erosion and vehicle load, cracks continue to expand, leading to a decline in performance and an increased risk to traffic safety. In addition, cracks are often the early signs of concrete defects, which may develop into more serious structural damage and accelerate concrete deterioration if not repaired in time. The early identification and effective management of cracks are therefore essential for the long-term stable operation of tunnels, and timely maintenance is an effective means of repairing cracks, restoring the service quality of the tunnel, and delaying the end of its life [4,5].
While the first step of the maintenance work is the detection and counting of concrete cracks, the traditional method of early inspection mainly relies on manual inspection, which faces many problems [6,7,8,9,10]. First of all, the efficiency of manual inspection is low; particularly for a wide range of long tunnels, manual inspection requires a lot of time and manpower. Recording and analyzing inspection data manually is not only time-consuming and inefficient but also often requires the closure of the inspection section due to the high labour intensity and long duration of the inspection process, which may expose the inspectors to traffic safety risks. In addition, manual inspection results are greatly influenced by human subjective factors and prone to misjudgment, omission, and other problems, which reduce the accuracy of detection. This uncertainty not only affects the accuracy of maintenance decisions but also may lead to more manpower and material resources invested in later highway maintenance, increasing operating costs. Therefore, it is of great importance that we improve the automation and intelligence level of inspection methods in order to enhance the efficiency and safety of tunnel maintenance.
The continuous development of computer vision has brought new opportunities and challenges to concrete cracking detection, and the high accuracy of computer vision-processed images can significantly improve the efficiency of crack detection [11,12]. Using digital image processing technology, a variety of digital features of cracks can be extracted, and then the tunnel cracks can be detected quickly and efficiently. Crack detection methods based on digital image processing usually include steps such as image denoising, feature extraction, crack classification, and segmentation. Based on the use of digital image-processing techniques such as fuzzy multilevel median filtering [13], grey scale correction [10], non-downsampled contour wavelet transform [14], and multiscale wavelet transform [15] to remove noise and enhance the image, a variety of digital features of cracks can be extracted. For example, Zhang X et al. [16] combined improved ICS-LBP texture features, locally correlated texture features, and relative standard deviation texture features to detect cracks.
For crack classification, Support Vector Machine (SVM) [17], beamlet transform [18], and other methods are widely used. Rababaah H et al. [19] investigated three types of classifiers based on projective and Hough features: genetic algorithms, multilayer perceptron, and self-organising mapping. For crack segmentation, Li JH et al. [20] constructed an eight-direction Sobel template for edge detection of crack-like defect images and subsequently processed the images using a weighted neighbourhood averaging algorithm and the Otsu image segmentation algorithm. Jian Z et al. [21] proposed crack detection and segmentation algorithms based on the wavelet transform, which can detect cracks in real time; Wang H et al. [22] proposed a method to segment cracks using fractal dimensions, which produced clear crack edge images with good edge continuity. Zhang J et al. [23] proposed an automatic crack identification method based on phase coding group, which can detect fine cracks as well as cracks with weak contrast.
In addition, another common method for measuring the width of a crack is the skeleton line method [24]. For example, Zhong A [25] used a crack width measurement algorithm based on the skeleton line, aiming to overcome the shortcomings of the traditional measurement method, and the results show that the crack width calculated by this method is closer to the real situation and has high reference value; meanwhile, the algorithm can accurately locate the maximum width of the crack and the most serious damage point, which helps to assess the crack development trend and provides a scientific basis for actual repair. Liu XY [26] obtained binary images of cracks after morphological segmentation of concrete cracks, calculated the pixel width of the cracks after skeleton extraction, and finally calculated the actual value of the width of the concrete cracks using the actual size of the individual pixel points that were calibrated.
However, current research is dominated by planar image pixel detection, which lacks the real 3D size information of crack lesions, and the detection results are obtained by pixel conversion, which is clearly affected by many factors such as shooting distance and shooting posture, resulting in poor accuracy in detection results.
Therefore, this paper presents an integrated pipeline that combines U-Net-based crack segmentation with binocular vision 3D reconstruction. The main contribution lies not in the individual algorithms themselves but in their tailored integration and targeted improvements, particularly a double-threshold-based parallax rejection algorithm and a systematic evaluation of filtering parameters, to enable sub-millimetre-level crack width measurement from 3D point clouds.

2. Automatic Identification and Segmentation of Concrete Cracks Based on U-Net Deep Learning

2.1. U-Net-Based Automatic Identification and Segmentation Algorithm

U-Net is a convolutional neural network originally designed for biomedical image segmentation. Its symmetric U-shaped architecture consists of a downsampling path (encoder) for feature extraction via convolution and pooling and an upsampling path (decoder) that restores spatial resolution through transposed convolutions. Skip connections between corresponding encoder and decoder layers preserve fine-grained details, making U-Net particularly effective for segmenting crack regions with complex boundaries. The network structure is illustrated in Figure 1.

2.2. Dataset and Training Protocol

To provide a clear and reproducible basis for the U-Net-based crack segmentation model, this section describes the dataset used in this study, including its collection and annotation, and the training protocol. The dataset comprises 2000 crack images under various lighting conditions and angles, which were manually annotated by domain experts using the LabelMe tool with high inter-annotator consistency (IoU > 0.95). The images were split into training (1500), validation (200), and test (300) sets. Online data augmentation (random rotations, flips, brightness/contrast adjustments, and Gaussian blur) was applied to improve generalization. The U-Net model was implemented in PyTorch and trained on an NVIDIA RTX 3090 GPU using the Adam optimizer (initial learning rate 0.001) with a combined Binary Cross-Entropy and Dice loss, a batch size of 8, and early stopping with a patience of 20 epochs. Documenting the protocol in this way provides a clear basis for reproduction by other researchers.
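The combined Binary Cross-Entropy and Dice loss mentioned above can be illustrated numerically. The following is a minimal NumPy sketch, not the training code used in this study: the equal weighting of the two terms, the smoothing constant `eps`, and the function name `bce_dice_loss` are assumptions made for illustration.

```python
import numpy as np

def bce_dice_loss(prob, target, eps=1e-7):
    """Sum of binary cross-entropy and soft Dice loss for one probability map.

    Equal weighting of the two terms is an assumption of this sketch.
    """
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    # Binary cross-entropy averaged over all pixels.
    bce = -np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    # Soft Dice coefficient; eps keeps the ratio defined for empty masks.
    inter = np.sum(prob * target)
    dice = (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)
    return bce + (1.0 - dice)

mask = np.array([[0, 1, 1, 0]])             # ground-truth crack pixels
good = np.array([[0.1, 0.9, 0.8, 0.2]])     # prediction close to the mask
bad = np.array([[0.9, 0.1, 0.2, 0.8]])      # prediction far from the mask
print(bce_dice_loss(good, mask), bce_dice_loss(bad, mask))
```

The Dice term responds to region overlap, which matters for thin cracks that occupy few pixels, while the cross-entropy term penalizes each pixel individually.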

2.3. Automatic Identification and Segmentation Effect Verification

To analyze the effect of light intensity on identification and segmentation, the same cracks were photographed at different times of day. Images were taken every two hours from 8:00 to 18:00, and the acquisition time was recorded for each image. Two types of cracks, longitudinal and transverse, were included, and the detection results are shown in Figure 2:
The best effect of identification and segmentation occurs at 16:00, where the body of the crack is continuous and there is no wrong identification and segmentation areas of the background; this is because at this time, the light is more uniform, there are fewer shadows, and the contrast between the crack and the background is moderate, allowing the model to identify the crack features more accurately.
There is poor continuity of the main body of the detected cracks at 8:00 and 18:00 due to a low light angle or lack of light, which leads to blurring of the edge of the cracks or loss of contrast, making it difficult for the model to accurately extract the crack features.
The detection results at the remaining times (10:00 to 14:00) contain some wrongly identified and segmented areas. Under strong light, overexposure or shadow interference may lead to poor continuity of the crack body and more wrongly segmented areas. Particularly when direct sunlight or ground reflections are strong, the model may misidentify non-cracked areas as cracks.

2.4. Quantitative Evaluation of Segmentation Performance

To quantify the lighting conditions beyond temporal information, we estimated the illumination intensity (lux) for each acquisition time using solar position algorithms and historical meteorological data. The estimates, listed in Table 1, were obtained with the illumipy Python library, which calculates illuminance from the geographic location (latitude/longitude), the date and time of acquisition, and cloud coverage data retrieved from OpenWeatherMap historical archives for the corresponding dates. The algorithm accounts for solar altitude, atmospheric effects, and cloud cover to estimate horizontal illuminance in lux. While these are estimated values rather than direct measurements, they provide a reasonable approximation of the relative lighting conditions across different time points.
To objectively assess the accuracy of the U-Net-based crack segmentation, we evaluated its performance on a test set of crack images (including the images shown in Figure 2) using four standard segmentation metrics: intersection over union (IoU), precision, recall, and F1-score. The ground-truth masks were manually annotated by domain experts. The quantitative results are summarized in Table 1. The results show that the segmentation performance varies significantly with lighting conditions, consistent with the visual observations in Figure 2. The best performance is achieved at 16:00, with an IoU of 88.6% and F1-score of 88.8%, confirming that uniform illumination with moderate contrast is optimal for crack segmentation. The poorest performance occurs at 18:00 (IoU 75.4%) due to insufficient lighting and loss of edge contrast.
To visualize the trend of segmentation accuracy across different times of day, Figure 3 plots the IoU and F1-score values from Table 1 against time of day. As shown in Figure 3, accuracy improves from morning (8:00) to mid-afternoon, peaks at 16:00 (IoU 88.6%, F1-score 88.8%), and then declines sharply toward evening (18:00). These visualizations confirm that lighting conditions are a critical factor for crack segmentation performance and provide practical guidance for scheduling image acquisition in inspection workflows.
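For reference, the four metrics reported in Table 1 can be computed from a predicted mask and a ground-truth mask as sketched below. This is a per-image illustration; the function name `segmentation_metrics` and the handling of empty masks are assumptions, and the averaging over the test set used in this study is not reproduced here.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """IoU, precision, recall and F1-score for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))     # crack pixels found
    fp = int(np.sum(pred & ~truth))    # background marked as crack
    fn = int(np.sum(~pred & truth))    # crack pixels missed
    # Empty-mask conventions below are assumptions of this sketch.
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}

truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0]]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

In this toy example three crack pixels are found, one is missed, and one background pixel is wrongly marked, so the IoU is 3/5 and precision, recall, and F1-score are all 3/4.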

3. 3D Reconstruction of Concrete Cracks Based on Binocular Vision

3.1. Basic Theory of Stereo Matching

Stereo matching is the core step of binocular 3D reconstruction. According to Scharstein and Szeliski, stereo matching can be divided into the following four steps [27]: ① matching cost calculation; ② cost aggregation; ③ parallax calculation; ④ parallax optimization. The most crucial steps are cost calculation and cost aggregation.
For cost calculation, this study adopts the Census transform, a non-parametric method that encodes local image structure by comparing pixel intensities within a window to the central pixel, generating a bit string representation (Equations (1)–(4)).
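$$C_e(u,v) = \bigotimes_{i=-n'}^{n'} \; \bigotimes_{j=-m'}^{m'} \xi\big(I(u,v),\, I(u+i,\,v+j)\big) \tag{1}$$
where $m$ and $n$ are the dimensions of the matching window and are odd; $m'$ and $n'$ are the largest integers not greater than half of $m$ and $n$, respectively; $(u,v)$ are the pixel coordinates; and $\otimes$ is the bit-by-bit concatenation operator. $\xi$ is calculated according to the following equation:
$$\xi(x,y) = \begin{cases} 0, & \text{if } x \le y \\ 1, & \text{if } x > y \end{cases} \tag{2}$$
$$C(u,v,d) = \mathrm{Hamming}\big(C_{el}(u,v),\, C_{er}(u-d,\,v)\big) \tag{3}$$
$$\mathrm{Hamming}(x,y) = \begin{cases} 0, & \text{if } x = y \\ 1, & \text{if } x \ne y \end{cases} \tag{4}$$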
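The Census transform and the Hamming matching cost of Equations (1)–(4) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: images are indexed as `[row, column]` with the disparity applied along columns, borders are padded by edge replication, the centre pixel is skipped (it always contributes a zero bit), and the per-bit comparison of Equation (4) is summed over the bit string. The function names are hypothetical.

```python
import numpy as np

def census_transform(img, m=3, n=3):
    """Return an (H, W, m*n - 1) boolean array; a bit is 1 where centre > neighbour."""
    half_m, half_n = m // 2, n // 2
    h, w = img.shape
    padded = np.pad(img, ((half_n, half_n), (half_m, half_m)), mode="edge")
    bits = []
    for i in range(-half_n, half_n + 1):
        for j in range(-half_m, half_m + 1):
            if i == 0 and j == 0:
                continue  # centre versus itself is always 0
            neighbour = padded[half_n + i:half_n + i + h, half_m + j:half_m + j + w]
            bits.append(img > neighbour)  # xi(x, y) = 1 if x > y, else 0
    return np.stack(bits, axis=-1)

def hamming_cost(census_left, census_right, d):
    """C(u, v, d): number of differing bits between left (u, v) and right (u - d, v)."""
    h, w, n_bits = census_left.shape
    cost = np.full((h, w), n_bits, dtype=np.int32)  # maximum cost where u - d < 0
    if d < w:
        diff = census_left[:, d:] != census_right[:, :w - d]
        cost[:, d:] = diff.sum(axis=-1)
    return cost

rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(8, 12))
right = np.roll(left, -2, axis=1)  # synthetic pair with a true disparity of 2
cost_d2 = hamming_cost(census_transform(left), census_transform(right), d=2)
```

Because the Census bits depend only on intensity ordering, the cost is insensitive to brightness differences between the two cameras that preserve that ordering.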
Cost aggregation associates the matching cost of each pixel with that of the surrounding pixels. Following Hirschmüller, the aggregated cost at a point p is calculated as in Equations (5) and (6):
$$S(p,d) = \sum_{r} L_r(p,d) \tag{5}$$
$$L_r(p,d) = C(p,d) + \min\Big(L_r(p-r,d),\; L_r(p-r,d-1)+P_1,\; L_r(p-r,d+1)+P_1,\; \min_{i} L_r(p-r,i)+P_2\Big) - \min_{k} L_r(p-r,k) \tag{6}$$
The aggregation direction r can be left–right, right–left, up–down, down–up, diagonal, etc. The cost aggregation value of the point p along a given direction consists of three terms. The first term is the matching cost C of the current point. The second term is the minimum of four candidates taken at the previous pixel p − r along that direction: its aggregated cost at the current parallax, its aggregated cost at a parallax that differs by 1 (d − 1 or d + 1) plus $P_1$, and its minimum aggregated cost over the remaining parallaxes plus $P_2$. The last term subtracts the minimum aggregated cost of the point p − r over all parallaxes, which keeps the value from growing along the path. Four or eight aggregation paths are generally used to achieve better results.
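The recursion in Equation (6) can be sketched for a single direction as follows. This is an illustrative sketch, not the implementation used in this study: only the left-to-right path is shown, the penalty values `p1` and `p2` are placeholders, and the full aggregated cost of Equation (5) would sum the results over all chosen directions.

```python
import numpy as np

def aggregate_left_to_right(cost, p1=1.0, p2=4.0):
    """Path cost L_r for r = left-to-right; cost has shape (H, W, D)."""
    h, w, d = cost.shape
    agg = np.empty((h, w, d), dtype=np.float64)
    agg[:, 0, :] = cost[:, 0, :]  # no previous pixel at the path start
    for x in range(1, w):
        prev = agg[:, x - 1, :]                      # L_r(p - r, .)
        prev_min = prev.min(axis=1, keepdims=True)   # min_k L_r(p - r, k)
        minus = np.full_like(prev, np.inf)
        minus[:, 1:] = prev[:, :-1]                  # L_r(p - r, d - 1)
        plus = np.full_like(prev, np.inf)
        plus[:, :-1] = prev[:, 1:]                   # L_r(p - r, d + 1)
        best = np.minimum(np.minimum(prev, minus + p1),
                          np.minimum(plus + p1, prev_min + p2))
        agg[:, x, :] = cost[:, x, :] + best - prev_min
    return agg

# One image row, five pixels, three candidate parallaxes; parallax 1 is correct.
cost = np.ones((1, 5, 3))
cost[0, :, 1] = 0.0
cost[0, 2, :] = [0.2, 0.5, 1.0]  # a noisy pixel whose raw minimum is at parallax 0
agg = aggregate_left_to_right(cost)
print(cost.argmin(axis=2), agg.argmin(axis=2))
```

In this toy row the raw cost at the noisy pixel favours parallax 0, while the aggregated cost favours parallax 1 because switching parallax between neighbours incurs the penalty $P_1$.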

3.2. Impact of Different Parameters

The parameter selection discussed in this section is motivated by practical engineering requirements—specifically, the need to balance noise suppression and detail preservation in weakly textured crack regions under real lighting conditions. The following analysis aims to identify optimal settings for deployable inspection systems.

3.2.1. Impact of the Small Connected Regions

To address the problem of inconsistent parallax in crack regions, this paper introduces a heuristic parallax elimination strategy based on a double-threshold judgment. While the Census transform and cost aggregation follow standard stereo matching practices, the proposed threshold-based region rejection scheme is specifically designed for crack-like discontinuity features and is not commonly applied in conventional crack width measurement pipelines. Specifically, starting from the parallax obtained after left-right consistency detection, two thresholds $\theta_1$ and $\theta_2$ are set. Each candidate region $S_i$ is marked by region tracking, and the difference between two neighbouring parallaxes within $S_i$ is recorded as $R_i$. If $S_i$ and $R_i$ satisfy Equation (7), all parallaxes in the region are eliminated, where $\theta_1$ is the threshold on the difference between the parallax of a point and that of its neighbours within the same connected region, and $\theta_2$ is the threshold on the area of the connected region.
$$\begin{cases} R_i \le \theta_1 \\ S_i \le \theta_2 \end{cases} \tag{7}$$
The values of $\theta_1$ and $\theta_2$ directly affect the quality of the three-dimensional reconstruction. The value of $\theta_1$ is related to the neighbouring parallax difference and is generally small. The value of $\theta_2$ is related to the size of the regions to be rejected. The maximum area of a region to be culled is denoted $s_0$, which is often no more than about a dozen. Therefore, the value of $\theta_2$ is discussed in the following two cases.
When $\theta_2 \le s_0$, the threshold is set too low. In this case, regions that satisfy $S_i \le \theta_2$ are removed, but some genuinely erroneous regions whose area satisfies $\theta_2 < S_i \le s_0$ do not meet the condition and are therefore retained. As shown in Figure 4a, this leads to residual erroneous 3D points in the reconstruction.
When $\theta_2 > s_0$, the threshold is set above the expected maximum area of erroneous regions. In this case, all genuinely erroneous regions (with area $S_i \le s_0$) satisfy $S_i \le \theta_2$ and are correctly identified and removed. As shown in Figure 4b, the resulting 3D reconstruction is clean and free of erroneous points, confirming that setting $\theta_2$ slightly above $s_0$ is optimal.
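One possible reading of the double-threshold rejection can be sketched as a region-growing pass over the parallax map. The sketch below is an interpretation under stated assumptions rather than the exact procedure of this study: regions are grown over 4-connected neighbours whose parallax difference does not exceed `theta1`, a region is cleared when its pixel count does not exceed `theta2`, and invalid parallax is encoded as 0. The function name and default values are hypothetical.

```python
from collections import deque
import numpy as np

def reject_small_regions(disp, theta1=1.0, theta2=20, invalid=0.0):
    """Clear connected regions with neighbour difference <= theta1 and area <= theta2."""
    disp = np.asarray(disp, dtype=np.float64).copy()
    h, w = disp.shape
    seen = np.zeros((h, w), dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if seen[sy, sx] or disp[sy, sx] == invalid:
                continue
            # Region tracking: breadth-first growth from the seed pixel.
            region, queue = [(sy, sx)], deque([(sy, sx)])
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny, nx]
                            and disp[ny, nx] != invalid
                            and abs(disp[ny, nx] - disp[y, x]) <= theta1):
                        seen[ny, nx] = True
                        region.append((ny, nx))
                        queue.append((ny, nx))
            if len(region) <= theta2:  # area test of the second threshold
                for y, x in region:
                    disp[y, x] = invalid
    return disp

disp = np.full((10, 10), 30.0)
disp[4:6, 4:6] = 60.0  # a 4-pixel mismatched patch
cleaned = reject_small_regions(disp, theta1=1.0, theta2=5)   # theta2 above the patch area
retained = reject_small_regions(disp, theta1=1.0, theta2=3)  # theta2 below the patch area
```

The two calls mirror the two cases discussed above: with the area threshold above the patch size the patch is removed, and with the threshold below it the patch survives.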

3.2.2. Impact of Filter Window Size

Although small connected regions and inconsistent disparities have been removed, residual noise in the disparity map may still affect reconstruction smoothness. A median filter is therefore applied to suppress noise while preserving edge details. The filter replaces each pixel with the median value within a sliding window of size h × w (both odd), as illustrated in Figure 5. In order to select the appropriate filtering window, this paper is designed to vary the window size from 3 × 3 to 11 × 11, and the parallax map filtering results are shown in Figure 6.
Some filtering results are selected for comparison. As the filtering window grows, the area of the marked region in the figure gradually increases, which shows that a larger window causes invalid parallax to erode the effective parallax. At the same time, the image gradually becomes brighter, because a larger window leads to larger parallax values. In Figure 6a, where the filtering window is small, the parallax is not smooth enough and the filtering effect is poor. As the window continues to increase, the smoothing effect improves significantly but more effective parallax is lost, as shown in Figure 6i. However, the filtered parallax maps alone are not sufficient to select the most appropriate filtering window.
Therefore, it is necessary to compare and analyze the spatial distribution of the local point clouds in combination with the 3D reconstruction model, as shown in Figure 7. When the filter window is smaller than 7 × 9, the spatial points in the 3D model are arranged irregularly and some of them are clustered, which indicates that the filtering is not smooth enough. As the filter window continues to increase, the arrangement of the spatial points gradually becomes regular, and once the window reaches 9 × 9 the arrangement essentially stops changing. Considering that a larger window also increases parallax erosion, a filtering window size of 9 × 9 was finally selected for this paper.
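The median filtering step can be sketched with NumPy alone. This is a minimal illustration, assuming edge replication at the image border and no special treatment of invalid parallax; the function name `median_filter` is hypothetical, and the default window matches the 9 × 9 size selected above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(disp, h=9, w=9):
    """Replace each pixel with the median of its h x w neighbourhood."""
    if h % 2 == 0 or w % 2 == 0:
        raise ValueError("window height and width must both be odd")
    padded = np.pad(disp, ((h // 2, h // 2), (w // 2, w // 2)), mode="edge")
    windows = sliding_window_view(padded, (h, w))  # shape (H, W, h, w)
    return np.median(windows, axis=(-2, -1))

disp = np.full((7, 7), 10.0)
disp[3, 3] = 100.0                      # an isolated parallax spike
smoothed = median_filter(disp, h=3, w=3)
print(smoothed[3, 3])
```

An isolated outlier never reaches the median of its window, which is why the filter removes spikes without averaging them into their neighbours.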

3.3. Results of 3D Reconstruction of Apparent Concrete Cracks

After the analysis and discussion in Section 3.2, the selected stereo matching scheme in this paper is shown in Table 2. Based on the parallax map and binocular camera parameters, the 3D coordinates of each point in the parallax map are calculated. The 3D high-definition full-colour model of the cracked area is shown in Figure 8.
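The conversion from a parallax map to 3D points can be illustrated with the standard rectified pinhole relations. The sketch below is an assumption-laden illustration and does not use the calibrated projection matrix of this study: the focal length `f` (pixels), baseline (same length unit as the output), and principal point `(cx, cy)` are hypothetical inputs, and depth follows $Z = fB/d$.

```python
import numpy as np

def disparity_to_points(disp, f, baseline, cx, cy):
    """Return an (N, 3) array of (X, Y, Z) for pixels with positive parallax."""
    v, u = np.nonzero(disp > 0)        # row (v) and column (u) indices
    d = disp[v, u].astype(np.float64)
    z = f * baseline / d               # depth from similar triangles
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.column_stack([x, y, z])

disp = np.zeros((4, 6))
disp[2, 4] = 50.0                      # a single valid pixel
cloud = disparity_to_points(disp, f=1000.0, baseline=100.0, cx=3.0, cy=2.0)
print(cloud)
```

Because depth is inversely proportional to parallax, a fixed parallax error produces a larger depth error for distant surfaces, which is one reason the quality of the parallax map matters for width measurement.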

4. Intelligent Measurement of Concrete Cracks Based on 3D Reconstruction Models

4.1. Intelligent Measurement Methods

The width of a crack at a given position (denoted by $P_O(u_O, v_O)$) is calculated as follows: through a point on the centre line of the crack, draw the normal to the centre line and intersect it with the contour lines of the crack to obtain two intersections $P_1$ and $P_2$. The distance between the three-dimensional points corresponding to the two intersections is the width of the crack at that position.
In detail, this method is divided into four steps, which are centre line curve fitting, calculation of the normal equation of the current point, calculation of the intersection point between the contour line and the normal line, and calculation of the width of the crack. The specific steps are as follows.
(1) Centre line Curve Fitting
Let the centre line matrix of a crack be $M_c$, an $a \times 2$ matrix whose first column is $u$ and whose second column is $v$. Take $m$ points to fit the curve at the current position; the fitted points form a set $J_m$. The set $J_m$ and the position $k$ of $P_O$ in the matrix $M_c$ should satisfy the following relations:
$$P_O \in J_m, \quad J_m \subseteq M_c \tag{8}$$
$$1 + \frac{m+1}{2} \le k \le a - \frac{m+1}{2} \tag{9}$$
where $m$ is an odd number so that the midpoint of the set is $P_O$. The curve fitting methods are divided into two categories: straight line fitting and quadratic curve fitting. Specifically, the correspondence between the pixel coordinates $u$, $v$ and $X$, $Y$ is determined first, and then one of the following fitting methods is chosen to fit the curve:
① Straight line fitting: $L_P: Y = kX + d$.
② Quadratic curve fitting: $L_P: Y = aX^2 + bX + c$.
(2) Calculation of the equation of the normal to the current point
Once the fitted function is known, the slope $f_x$ at the current position can be calculated from its derivative.
① Slope at the current point for the straight line fit: $f_x = k$.
② Slope at the current point for the quadratic curve fit: $f_x = 2aX_O + b$.
Then the slope $f_t$ of the normal at the current point and the normal equation $L_t$ can be calculated according to Equations (10) and (11):
$$f_t = -\frac{1}{f_x} \tag{10}$$
$$L_t: Y_t = f_t\,(X_t - X_O) + Y_O \tag{11}$$
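Steps (1) and (2) can be sketched with a polynomial fit. This is an illustrative sketch rather than the implementation of this study: the function name `fit_normal` is hypothetical, the input is assumed to be the $m$ fitted points already converted to $(X, Y)$, and a horizontal tangent is reported as an infinite normal slope.

```python
import numpy as np

def fit_normal(points, order=1):
    """Fit the centre line through m points and return (f_x, f_t, P_O).

    order=1 gives the straight line fit, order=2 the quadratic fit.
    """
    pts = np.asarray(points, dtype=float)
    if len(pts) % 2 == 0:
        raise ValueError("m must be odd so that P_O is the midpoint of the set")
    x_o, y_o = pts[len(pts) // 2]
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], order)
    f_x = float(np.polyval(np.polyder(coeffs), x_o))  # tangent slope at X_O
    # Normal slope of Equation (10); a horizontal tangent has a vertical normal.
    f_t = -1.0 / f_x if f_x != 0.0 else float("inf")
    return f_x, f_t, (float(x_o), float(y_o))

xs = np.arange(5.0)
line_fx, line_ft, _ = fit_normal(np.column_stack([xs, 2.0 * xs + 1.0]), order=1)
quad_fx, quad_ft, quad_po = fit_normal(np.column_stack([xs, xs ** 2]), order=2)
print(line_fx, line_ft, quad_fx, quad_ft)
```

For the line $Y = 2X + 1$ the tangent slope is 2 and the normal slope is $-0.5$; for $Y = X^2$ at the midpoint $X_O = 2$ the tangent slope is 4 and the normal slope is $-0.25$.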
(3) Calculation of the intersection of the contour line and the normal line
The normal line intersects the contour lines on both sides of the crack and has only two points of intersection. Define the two points on a contour line that are closest to the normal as a point pair, denoted $C_u$. Specify that the point in the point pair $C_u$ that is closer to the straight line $L_P$ is the intersection of the normal $L_t$ with the contour line $C$. The normal $L_t$ intersects the contour $C_1$ at the point pair $C_{u1}(p_{11}, p_{12})$ and the contour $C_2$ at the point pair $C_{u2}(p_{21}, p_{22})$. According to the above definition, the normal $L_t$ intersects the contour $C_1$ at $p_{12}$, i.e., $P_1$, and the contour $C_2$ at $p_{21}$, i.e., $P_2$. The key point, therefore, is how to solve for the coordinates of $P_1$ and $P_2$.
① Find the coordinate $P_1$
Let the $P_1$ coordinate be the nearest point in the matrix $M_d$ to the centre line fitting line $L_P$. The process of finding the x coordinate is similar to step ①. Take another point $P_a(X_a, Y_a)$ on the line that is different from $P_O(X_O, Y_O)$. Taking the straight line fit as an example, the coordinates of $P_a$ can be calculated from the $L_P$ equation as follows:
$$P_a(X_a, Y_a): \quad X_a = X_O + \theta, \quad Y_a = k\,X_a + d \tag{12}$$
Similarly, $P_a(X_a, Y_a)$ is converted to pixel coordinates $P_a(u_a, v_a)$. Each coordinate in the $M_d$ matrix is then taken and denoted $P_b(u_b, v_b)$. The distance from the point to the normal can be calculated as in Equation (13):
$$D_b^{L_P} = \frac{\left| \overrightarrow{P_a P_O} \times \overrightarrow{P_a P_b} \right|}{\left| \overrightarrow{P_a P_O} \right|} \tag{13}$$
Then, the coordinate $P_1$ is the coordinate corresponding to the minimum distance in $D_b^{L_P}$:
$$P_1 = f_{sele}\big(f_{sortup}(M_d,\, D_b^{L_P}),\, 1\big) \tag{14}$$
In the formula, the function $f_{sortup}(M_d, D_b^{L_P})$ arranges the coordinates in $M_d$ in ascending order of $D_b^{L_P}$, keeping the correspondence between the rows of $M_d$ and $D_b^{L_P}$ unchanged, and returns a matrix of the same size as $M_d$; the function $f_{sele}$ extracts the first row of that matrix.
② Find the coordinate $P_2$
Form vectors between the current position $P_O$ and the points in the matrix $M_d$, and find the scalar product of each of these vectors with $\overrightarrow{P_O P_1}$. Since $P_1$ and $P_2$ are located in the nearest positions on both sides of the centre line, the scalar product of $\overrightarrow{P_O P_1}$ and $\overrightarrow{P_O P_2}$ is negative and has the smallest value. The endpoint of $\overrightarrow{P_O P_2}$ is the coordinate $P_2$. The expressions are as follows:
$$\overrightarrow{P_O P_r} = P_O - M_d \tag{15}$$
$$D_{Or} = \overrightarrow{P_O P_1} \cdot \overrightarrow{P_O P_r} \tag{16}$$
$$\overrightarrow{P_O P_2} = f_{sele}\big(f_{sortup}(M_d(D_{Or} < 0),\, D_{Or}(D_{Or} < 0)),\, 1\big) \tag{17}$$
$$P_2 = P_O - \overrightarrow{P_O P_2} \tag{18}$$
where $(D_{Or} < 0)$ is the logical matrix given by the condition $D_{Or} < 0$, and the corresponding rows of $D_{Or}$ and $M_d$ are selected according to it.
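The selection of $P_1$ and $P_2$ can be sketched as follows. The sketch follows one reading of step (3) and is not a transcription of Equations (13)–(18): $P_1$ is taken as the contour point closest to the normal through $P_O$, and $P_2$ as the contour point closest to the normal among those with a negative scalar product against $\overrightarrow{P_O P_1}$, i.e., on the opposite side of the centre line. The function name and the direction-vector input are assumptions.

```python
import numpy as np

def contour_intersections(p_o, normal_dir, contour):
    """Return (P1, P2): contour points nearest the normal on either side of P_O."""
    p_o = np.asarray(p_o, dtype=float)
    contour = np.asarray(contour, dtype=float)
    n = np.asarray(normal_dir, dtype=float)
    n = n / np.linalg.norm(n)
    rel = contour - p_o                               # vectors from P_O to contour points
    # Perpendicular distance to the normal line (2D cross product magnitude).
    dist = np.abs(n[0] * rel[:, 1] - n[1] * rel[:, 0])
    i1 = int(np.argmin(dist))
    side = rel @ rel[i1]                              # scalar products with P_O -> P1
    opposite = np.flatnonzero(side < 0)
    if opposite.size == 0:
        raise ValueError("no contour point found on the opposite side of P_O")
    i2 = int(opposite[np.argmin(dist[opposite])])
    return contour[i1], contour[i2]

# Toy crack along the X axis with contours at Y = +1 and Y = -1.
xs = np.arange(-3.0, 4.0)
contour = np.vstack([np.column_stack([xs, np.ones_like(xs)]),
                     np.column_stack([xs, -np.ones_like(xs)])])
p1, p2 = contour_intersections((0.0, 0.0), (0.0, 1.0), contour)
print(p1, p2)
```

Restricting the search for $P_2$ to points with a negative scalar product guarantees that the two selected points lie on opposite contours even when one contour is much closer to the centre line than the other.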
(4) Calculation of crack width
Once the pixel points $P_1$ and $P_2$ on the contour line at $P_O$ are known, their 3D spatial coordinates in the 3D point cloud model can be calculated from the parallax map $D_I$ and the new projection matrix $M_{Pnew}$, and the distance between the two 3D spatial coordinates is the width of the crack at that location.
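Step (4) can be illustrated with a short worked example. The sketch assumes a rectified pinhole model with focal length `f` in pixels, a baseline in millimetres, and principal point `(cx, cy)` in place of the calibrated projection matrix $M_{Pnew}$; all numerical values and the function name `crack_width` are hypothetical.

```python
import numpy as np

def crack_width(p1, p2, disp, f, baseline, cx, cy):
    """Euclidean distance between the 3D points of two contour pixels (u, v)."""
    def to_3d(pixel):
        u, v = pixel
        d = float(disp[v, u])
        if d <= 0:
            raise ValueError("no valid parallax at the requested pixel")
        z = f * baseline / d  # depth from parallax
        return np.array([(u - cx) * z / f, (v - cy) * z / f, z])
    return float(np.linalg.norm(to_3d(p1) - to_3d(p2)))

disp = np.full((720, 1280), 240.0)  # flat surface at constant parallax
width = crack_width((640, 360), (642, 360), disp,
                    f=2000.0, baseline=60.0, cx=640.0, cy=360.0)
print(width)
```

With these values the surface lies 500 mm from the camera, so two contour pixels that are 2 pixels apart correspond to a width of 0.5 mm.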

4.2. Intelligent Measurement Results of Concrete Cracks

After determining the number of curve fitting points and the order of the polynomial fit, the crack width was calculated on this basis. To verify the measurement accuracy of the calculated crack widths, a total of nine points (as shown in Figure 9) were selected on the crack, the true width of the crack at each point was measured with a percentage scale, and the calculated values were then compared with the true values. The results (Table 3) show that the maximum difference between the calculated and true values of crack width is 0.31 mm, the minimum difference is 0.07 mm, and the average difference is 0.15 mm, which indicates that the measurement accuracy reaches the sub-millimetre level and verifies the validity of the method proposed in this paper.
In structural health monitoring, crack width is a key indicator for assessing deterioration severity and prioritizing maintenance actions. According to commonly referenced guidelines such as China’s JTG H12-2015 [28], crack widths below 0.2 mm are generally considered acceptable with routine monitoring recommended; widths between 0.2 mm and 0.5 mm indicate active deterioration requiring closer inspection; and widths exceeding 0.5 mm suggest significant structural concern typically triggering maintenance or repair actions. The proposed method achieves an average measurement error of 0.15 mm, which is below the 0.2 mm threshold distinguishing acceptable conditions from those requiring heightened attention. This level of accuracy is sufficient to reliably classify cracks into appropriate maintenance categories and support data-driven maintenance decisions with quantifiable measurement uncertainty.
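The three width bands described above can be expressed as a simple classification rule. The sketch below is illustrative only: the function name and category labels are hypothetical, and assigning the boundary values 0.2 mm and 0.5 mm to the middle band is an assumption, since the text does not state how the boundaries are treated.

```python
def maintenance_category(width_mm):
    """Map a crack width in millimetres to one of the three bands described above."""
    if width_mm < 0:
        raise ValueError("crack width cannot be negative")
    if width_mm < 0.2:
        return "routine monitoring"
    if width_mm <= 0.5:        # boundary handling is an assumption
        return "closer inspection"
    return "maintenance or repair"

for w in (0.12, 0.35, 0.62):
    print(w, maintenance_category(w))
```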
To benchmark the performance of the proposed 3D reconstruction-based method, a classical 2D skeleton-based width measurement approach was implemented as a baseline, involving crack segmentation, skeleton extraction, normal-based width calculation, and pixel-to-millimeter conversion using a pre-calibrated scaling factor. Comparative evaluation on the same nine measurement points as shown in Table 4 demonstrates that the proposed method consistently outperforms the 2D approach across all error metrics. It achieves a mean absolute error of 0.15 mm—a 46% reduction from the 2D method’s 0.28 mm; its maximum error of 0.31 mm remains below the critical 0.5 mm maintenance threshold, while the 2D method occasionally exceeds it (0.52 mm); relative error improves to 6.2% versus 8.3%; and a lower standard deviation (0.07 mm vs. 0.12 mm) confirms more consistent performance across varied crack types and conditions.

4.3. Statistical Analysis Across Crack Types and Lighting Conditions

To provide a more comprehensive evaluation of the proposed method’s performance, we analyzed the measurement errors across different crack types based on their morphological characteristics. The nine measurement points were classified into three categories—transverse cracks: B3, B4; longitudinal cracks: B7, B8, B9; and networked cracks: B1, B2, B5, B6. For each category, we calculated the mean error and standard deviation to characterize both accuracy and consistency. The results are summarized in Table 5. In addition, to evaluate the method’s robustness to varying illumination, we analyzed measurement errors under different lighting conditions corresponding to the time points in Figure 2. For each lighting condition, all nine measurement points were evaluated, and the mean error and standard deviation are shown in Table 5.
The proposed method was evaluated across different crack types and lighting conditions. By crack type, transverse and longitudinal cracks show good consistency with mean errors of 0.13 ± 0.02 mm and 0.12 ± 0.02 mm, respectively, while networked cracks exhibit a higher mean error of 0.18 ± 0.10 mm due to their complex morphology. By lighting condition, optimal illumination at 16:00 yields the best accuracy and stability with a mean error of 0.15 ± 0.07 mm. Performance degrades progressively under strong light (0.19 ± 0.10 mm), low light (0.21 ± 0.11 mm), and dim light (0.26 ± 0.15 mm), confirming that measurement accuracy is strongly correlated with illumination quality.
These validation results, obtained from actual crack samples under realistic lighting and surface conditions, demonstrate that the proposed integrated framework is not only theoretically sound but also practically deployable for engineering inspection tasks. The sub-millimetre accuracy achieved (average error 0.15 mm) confirms its suitability for real-world structural health monitoring.
It should be noted that the experimental validation presented above was conducted under representative inspection conditions, with crack images acquired at specific times and using a fixed camera baseline. While the results demonstrate sub-millimetre accuracy under these settings, the generalizability of the method to broader scenarios requires further investigation. Factors such as varying surface textures (e.g., smooth vs. rough concrete finishes), different crack morphologies (e.g., branched, or filled cracks), and camera baseline configurations may influence segmentation accuracy and 3D reconstruction fidelity.

5. Conclusions

This study presents an integrated engineering solution for concrete crack width measurement by combining U-Net-based segmentation with binocular vision 3D reconstruction. The primary contribution lies in the effective integration and task-specific adaptation of existing techniques, the systematic optimization of key parameters under realistic inspection conditions, and comprehensive experimental validation achieving sub-millimetre accuracy on actual cracks. The main conclusions obtained are as follows.
The light intensity at different times has an effect on the automatic identification and segmentation of cracked areas. The model can identify the characteristics of the crack. In particular, the detection effect at 4:00 p.m. is the best, because at this time, the light is more uniform, with less shadow and moderate contrast between the crack and its background.
A binocular vision-based 3D reconstruction framework is adopted for concrete cracks. To improve reconstruction quality in weakly textured crack regions, a parallax rejection algorithm based on a “double-threshold” decision is proposed. It eliminates mismatched regions in crack disparity maps that are extremely incongruous with their surroundings, removing spurious disparities while preserving crack edge details, an aspect not addressed in standard stereo matching workflows. The algorithm thereby enhances local fidelity in 3D reconstruction and addresses a key limitation of conventional stereo matching when applied to fine crack features.
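To make the idea of a two-threshold rejection concrete, the sketch below shows one possible form of such a rule. It is not the authors' algorithm: the decision rule is an assumption chosen for illustration, in which neighbouring pixels are grouped into one region when their disparities differ by at most `theta1`, and a region is rejected when its area S is below `theta2`. The function name, the `INVALID` marker, and the example values are all hypothetical.

```python
from collections import deque

INVALID = 0.0  # marker for rejected disparities (assumption)


def reject_regions(disp, theta1, theta2):
    """Invalidate 4-connected regions whose area S is below theta2.

    Two neighbouring pixels join the same region when their disparities
    differ by at most theta1. Hypothetical rule, for illustration only.
    """
    rows, cols = len(disp), len(disp[0])
    out = [row[:] for row in disp]  # leave the input map untouched
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or disp[r0][c0] == INVALID:
                continue
            # Breadth-first region growing from the seed pixel.
            region = [(r0, c0)]
            seen[r0][c0] = True
            queue = deque(region)
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and disp[nr][nc] != INVALID
                            and abs(disp[nr][nc] - disp[r][c]) <= theta1):
                        seen[nr][nc] = True
                        region.append((nr, nc))
                        queue.append((nr, nc))
            if len(region) < theta2:  # area S below the second threshold
                for r, c in region:
                    out[r][c] = INVALID
    return out


# Toy disparity map with one isolated mismatch (30.0) in a smooth surface.
disp = [
    [10.0, 10.2, 10.1, 10.0],
    [10.1, 30.0, 10.2, 10.1],
    [10.0, 10.1, 10.0, 10.2],
]
cleaned = reject_regions(disp, theta1=1.0, theta2=3)
```

In this toy map only the isolated 30.0 pixel is invalidated, while the smooth surrounding surface forms one large region and is kept.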
The effect of filter window size on the spatial distribution regularity of 3D point clouds is systematically analyzed. If the filter window is small, the parallax is not smooth enough and the filtering effect is poor. As the window grows, the parallax becomes markedly smoother, but more effective parallax is lost. When the filter window is smaller than 7 × 9, the spatial points in the 3D model are arranged irregularly, with a few points concentrated together; once the window is increased to 9 × 9, the arrangement of spatial points remains basically unchanged. The 9 × 9 median filter window is therefore selected as the balance between noise suppression and preservation of effective parallax.
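The trade-off between smoothing and loss of effective parallax can be illustrated with a plain median filter. The sketch below is an assumption-laden toy, not the authors' implementation: the function name, the border handling (windows clipped at the map edge), and the synthetic map with a three-pixel-wide stripe of valid disparity are all hypothetical, and the result does not reproduce the 9 × 9 selection reported above.

```python
from statistics import median


def median_filter(disp, h, w):
    """Apply an h x w median filter (h, w odd) to a 2D disparity map.

    Border pixels use only the part of the window inside the map
    (assumption; the border handling is not stated in the source).
    """
    rows, cols = len(disp), len(disp[0])
    rh, rw = h // 2, w // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [disp[i][j]
                      for i in range(max(0, r - rh), min(rows, r + rh + 1))
                      for j in range(max(0, c - rw), min(cols, c + rw + 1))]
            out[r][c] = median(window)
    return out


# Synthetic 11 x 11 map: background disparity 12.0, a three-pixel-wide
# stripe of valid disparity 20.0 (columns 4..6), and one noise spike.
disp = [[20.0 if 4 <= c <= 6 else 12.0 for c in range(11)] for _ in range(11)]
disp[5][1] = 40.0  # isolated outlier

small = median_filter(disp, 3, 3)
large = median_filter(disp, 9, 9)
```

Here the 3 × 3 window already removes the isolated spike and keeps the stripe, whereas the 9 × 9 window also replaces the centre of the stripe with the background value. How much valid structure a given window erases depends on the size of that structure in pixels, which is why the window has to be tuned on real disparity maps.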
An intelligent measurement method for concrete cracks based on the 3D reconstruction model is proposed, which calculates the crack width directly from the 3D model. Experimental validation on real cracks demonstrates that the proposed method achieves sub-millimetre measurement accuracy, with a maximum error of 0.31 mm, a minimum error of 0.07 mm, an average error of 0.15 mm, and a standard deviation of 0.07 mm, which confirms the effectiveness and engineering reliability of the integrated 3D measurement pipeline. A comparison on the same nine measurement points shows that the proposed method outperforms the 2D skeleton-based baseline on every error metric.
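Measuring width in 3D rather than in pixels can be illustrated with the standard relations for a rectified stereo pair, where depth is Z = f·B/d for focal length f (pixels), baseline B, and disparity d (pixels). The sketch below is a simplified stand-in for the paper's procedure: it takes two opposite crack-edge pixels as given rather than locating them from a fitted centre line, and the function names and all camera parameters are hypothetical.

```python
import math


def pixel_to_3d(u, v, d, f, cx, cy, baseline):
    """Back-project pixel (u, v) with disparity d to camera coordinates.

    Uses the rectified-stereo relation Z = f * baseline / d; the result
    is in the same length unit as the baseline.
    """
    if d <= 0:
        raise ValueError("disparity must be positive")
    z = f * baseline / d
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return (x, y, z)


def crack_width(edge_a, edge_b, f, cx, cy, baseline):
    """Euclidean distance between two crack-edge points given as (u, v, d)."""
    pa = pixel_to_3d(*edge_a, f, cx, cy, baseline)
    pb = pixel_to_3d(*edge_b, f, cx, cy, baseline)
    return math.dist(pa, pb)


# Hypothetical camera: f = 2000 px, baseline = 100 mm, principal point (960, 540).
width_mm = crack_width((950, 540, 400.0), (962, 540, 400.0),
                       f=2000.0, cx=960.0, cy=540.0, baseline=100.0)
# Both points lie at Z = 500 mm, so the 12-pixel gap corresponds to 3.0 mm.
```

Because the scale factor Z/f is recovered per point from the disparity, the result does not rely on a single pre-calibrated pixel-to-millimetre factor, which is the dependence on shooting distance that the 2D baseline carries.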
The proposed method, combining U-Net-based crack segmentation with binocular vision 3D reconstruction, is not intended as a standalone tool but as a modular component that can be embedded within broader structural health monitoring (SHM) and inspection frameworks. Its modular design, compatibility with various inspection platforms, and ability to generate decision-relevant data make it a viable component for automated, data-driven infrastructure management.

6. Limitations and Future Work

While the proposed method achieves sub-millimetre accuracy on the tested crack samples, several limitations should be acknowledged. First, the experimental dataset is limited in size and diversity, with images collected under specific lighting conditions and from a single environment. The performance of the method with varying surface textures, crack patterns, and camera configurations has not been fully explored. Second, the current pipeline assumes relatively clear crack boundaries; highly degraded or filled cracks may require additional preprocessing or alternative segmentation strategies. Third, the stereo matching parameters (e.g., baseline distance, filtering window size) were optimized for the current setup and may need recalibration for different camera systems or inspection platforms.
In addition, although it was developed for crack inspection, the core method—U-Net segmentation combined with binocular vision—is conceptually transferable to other concrete structures like slabs or bridge decks. However, its generalizability depends on several factors: surface texture (affecting crack–background contrast), inspection environment (outdoor lighting variability), crack morphology and orientation (impacting stereo matching), and camera setup (requiring recalibration for different working distances or platforms). While the framework provides a solid foundation, task-specific adaptations and re-validation are necessary before deployment on other structural elements.
Future work will focus on (1) expanding the dataset to include a wider range of civil environments, surface conditions, and crack types; (2) investigating adaptive parameter selection strategies to enhance robustness across varying inspection scenarios; and (3) exploring the integration of the proposed method with portable or drone-based inspection platforms for real-time structural health monitoring [29].

Author Contributions

Conceptualization, D.X. and G.W.; methodology, D.X., K.W. and S.L.; software, G.W. and Q.-A.W.; validation, D.X., G.W. and K.W.; formal analysis, G.W. and Z.C.; investigation, S.L., G.S. and X.F.; resources, M.H. and R.L.; data curation, K.W. and Z.C.; writing—original draft preparation, D.X. and G.W.; writing—review and editing, Q.-A.W., X.F., M.H., R.L. and G.C.; visualization, G.W. and Z.C.; supervision, R.L. and G.C.; project administration, G.C.; funding acquisition, G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2024YFF0507903, and the National Natural Science Foundation of China, grant number 52379114.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Authors Huaijian Li, Kai Wang, Guangbin Shang and Xiaohua Fan were employed by the companies Shandong Hi-Speed Infrastructure Construction Co., Ltd. and Shandong Hi-Speed South Ring Expressway Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

| Symbol | Description |
|---|---|
| u, v | Pixel coordinates in image |
| I(⋅) | Image intensity value |
| C(p, d) | Matching cost at pixel p with disparity d |
| d | Disparity value |
| L_r(p, d) | Aggregated cost at pixel p along direction r |
| m, n | Dimensions of matching window |
| h, w | Height and width of filter window |
| ξ(⋅) | Census transform bit string |
| | Bitwise concatenation operation |
| θ_1, θ_2 | Double thresholds for parallax rejection |
| S | Connected region area in disparity map |
| k | Number of points used for centre line fitting |
| X, Y | Fitted curve coordinates |
| D_I | Parallax map |
| M_Pnew | New projection matrix |
| M_c | Centre line matrix of a crack |
| P_0(u_0, v_0) | Width of a crack at a certain place |
| f_x | Current point slope |

References

  1. Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Autom. Constr. 2019, 104, 129–139.
  2. Wei, S.; Zhang, H.; Wang, C.; Wang, Y.; Xu, L. Multi-temporal SAR data large-scale crop mapping based on U-Net model. Remote Sens. 2019, 11, 68.
  3. Zhang, J.; Yao, Y.; Deng, B. Fast and robust iterative closest point. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3450–3466.
  4. Xiong, C.; Zayed, T.; Abdelkader, E.M. A novel YOLOv8-GAM-Wise-IoU model for automated detection of bridge surface cracks. Constr. Build. Mater. 2024, 414, 135025.
  5. Sun, W.; Wang, R. Fully Convolutional Networks for Semantic Segmentation of Very High Resolution Remotely Sensed Images Combined With DSM. IEEE Geosci. Remote Sens. Lett. 2018, 15, 474–478.
  6. Zhang, J.; Lu, C.; Wang, J.; Wang, L.; Yue, X.G. Concrete cracks detection based on FCN with dilated convolution. Appl. Sci. 2019, 9, 2686.
  7. Li, S.; Gu, X.; Xu, X.; Xu, D.; Zhang, T.; Liu, Z.; Dong, Q. Detection of concealed cracks from ground penetrating radar images based on deep learning algorithm. Constr. Build. Mater. 2021, 273, 121949.
  8. Buchinger, D.; Rosso, R.S.U., Jr. A divide-and-conquer algorithm for curve fitting. Comput.-Aided Des. 2022, 151, 103362.
  9. Deng, J.; Lu, Y.; Lee, V.C.-S. Imaging-based crack detection on concrete surfaces using You Only Look Once network. Struct. Health Monit. 2021, 20, 484–499.
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  11. Qi, J.; Liu, L. The stereo matching algorithm based on an improved adaptive support window. IET Image Process. 2022, 16, 2803–2816.
  12. Lin, Y.; Gao, Y.; Wang, Y. An improved sum of squared difference algorithm for automated distance measurement. Front. Phys. 2021, 9, 737336.
  13. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE Computer Society: Los Alamitos, CA, USA, 2017; pp. 1–10.
  14. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2767–2773.
  15. Yoon, J.; Shin, H.; Song, M.; Gil, H.; Lee, S. A crack width measurement method of UAV Images using high-resolution algorithms. Sustainability 2022, 15, 478.
  16. Zhang, X.; Chen, Y.; Hong, H. Pavement Crack Detection Based on Texture Feature. In Proceedings of the MIPPR 2011: Automatic Target Recognition and Image Analysis, Guilin, China, 4–6 November 2011; SPIE: Washington, DC, USA, 2011; Volume 8003, p. 6.
  17. Kim, H.; Sim, S.H.; Spencer, B.F. Automated concrete crack evaluation using stereo vision with two different focal lengths. Autom. Constr. 2022, 135, 104136.
  18. Ying, L.; Salari, E. Beamlet Transform-Based Technique for Pavement Crack Detection and Classification. Comput.-Aided Civ. Infrastruct. Eng. 2010, 25, 572–580.
  19. Rababaah, H.; Vrajitoru, D.; Wolfer, J. Asphalt Pavement Crack Classification: A Comparison of GA, MLP, and SOM. In Proceedings of the Genetic and Evolutionary Computation Conference, Washington, DC, USA, 25–29 June 2005; Late-Breaking Paper: Seattle, WA, USA, 2005.
  20. Li, J.H. Pavement crack diseases detecting by image processing algorithm. J. Chang. Univ. (Nat. Sci. Ed.) 2004, 24, 24–29. (In Chinese)
  21. Jian, Z.; Huang, P.S.; Chiang, F.P. Wavelet-Based Pavement Distress Detection and Evaluation. Opt. Eng. 2006, 45, 409–411.
  22. Wang, H.; Zhu, N.; Wang, Q. Segmentation of pavement cracks using differential box-counting approach. J. Harbin Inst. Technol. 2007, 39, 142–144. (In Chinese)
  23. Zhang, J.; Sha, A.M.; Sun, Z.Y.; Gao, H.G. Pavement Crack Automatic Recognition Based on Phase-grouping Method. China J. Highw. Transp. 2008, 2, 39–42. (In Chinese)
  24. Payab, M.; Abbasina, R.; Khanzadi, M. A brief review and a new graph-based image analysis for concrete crack quantification. Arch. Comput. Methods Eng. 2018, 26, 347–365.
  25. Zhong, A.E. Research on Pavement Crack Recognition Based on Image Processing. Master’s Thesis, Wuhan University of Science and Technology, Wuhan, China, 2020. (In Chinese)
  26. Liu, X.Y. Research on Bridge Crack Detection Algorithm Based on Image Measure. Master’s Thesis, Inner Mongolia University of Science and Technology, Hohhot, China, 2022. (In Chinese)
  27. Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42.
  28. Ministry of Transport of the People’s Republic of China. Technical Specifications of Maintenance for Highway Tunnel: JTG H12—2015; Communications Press: Beijing, China, 2015.
  29. Sarfarazi, S.; Modano, M.; Fulgione, M. Artificial Intelligence for Dynamic Characterization of Composite Panel Structures: A Structured Review. Mech. Res. Commun. 2026, 151, 104607.
Figure 1. U-Net network architecture diagram.
Figure 2. Detection results of U-Net algorithm at different times. (a) Photo; (b) detection result; (c) photo; (d) detection result.
Figure 3. Segmentation accuracy of U-Net as a function of time of day; (a) IoU trends across six time points (8:00–18:00); (b) F1-score trends across six time points (8:00–18:00).
Figure 4. Disparity maps and three-dimensional reconstruction results under different thresholds. (a) case 1; (b) case 2.
Figure 5. Schematic diagram of median filtering.
Figure 6. Filtering results with different window sizes; (a) h = 3, w = 3; (b) h = 3, w = 5; (c) h = 5, w = 5; (d) h = 5, w = 7; (e) h = 7, w = 7; (f) h = 7, w = 9; (g) h = 9, w = 9; (h) h = 9, w = 11; (i) h = 11, w = 11.
Figure 7. Analysis of local spatial point alignment regularity with different window sizes; (a) h = 3, w = 3; (b) h = 3, w = 5; (c) h = 5, w = 5; (d) h = 5, w = 7; (e) h = 7, w = 7; (f) h = 7, w = 9; (g) h = 9, w = 9; (h) h = 9, w = 11; (i) h = 11, w = 11.
Figure 8. Result of three-dimensional model of cracks.
Figure 9. Recognition results of crack contour.
Table 1. Segmentation performance of U-Net under different lighting conditions.

| Time | Estimated Illumination (lux) | IoU (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|---|
| 8:00 | 320 | 78.3 | 82.1 | 76.5 | 79.2 |
| 10:00 | 1250 | 82.5 | 85.3 | 81.2 | 83.2 |
| 12:00 | 2100 | 81.7 | 84.6 | 80.4 | 82.4 |
| 14:00 | 1850 | 83.2 | 86.1 | 82.3 | 84.2 |
| 16:00 | 950 | 88.6 | 90.2 | 87.5 | 88.8 |
| 18:00 | 180 | 75.4 | 79.3 | 73.8 | 76.4 |
Table 2. Stereo matching scheme.

| Stereo Matching Step | Method and Parameter Selection |
|---|---|
| Cost Calculation | Census |
| Cost Aggregation | Four-path cost aggregation |
| Parallax Calculation | Winner-takes-all algorithm |
| Parallax Optimisation | Sub-pixel fitting |
| | Uniqueness detection |
| | Left–right consistency detection |
| | Removal of small connected regions (x = 1, x = 16) |
| | Region filling (x = 20) |
| | Median filtering (filter window size 9 × 9) |
Table 3. Calculated and measured crack widths.

| Point | Calculated Width (mm) | Measured Width (mm) | Error (mm) | Average Error (mm) |
|---|---|---|---|---|
| B1 | 4.16 | 4.29 | 0.13 | 0.15 |
| B2 | 2.66 | 2.73 | 0.07 | |
| B3 | 1.12 | 0.98 | 0.14 | |
| B4 | 1.06 | 0.95 | 0.11 | |
| B5 | 4.61 | 4.30 | 0.31 | |
| B6 | 3.05 | 3.26 | 0.21 | |
| B7 | 3.47 | 3.61 | 0.14 | |
| B8 | 2.92 | 3.02 | 0.10 | |
| B9 | 2.94 | 2.83 | 0.11 | |
Table 4. Comparison between proposed 3D method and 2D skeleton-based method.

| Metric | Proposed 3D Method | 2D Skeleton-Based Method | Improvement |
|---|---|---|---|
| Mean Absolute Error (mm) | 0.15 | 0.28 | 46% reduction |
| Max Error (mm) | 0.31 | 0.52 | 40% reduction |
| Min Error (mm) | 0.07 | 0.14 | 50% reduction |
| Average Relative Error (%) | 6.2 | 8.3 | 2.1 percentage points lower |
| Standard Deviation (mm) | 0.07 | 0.12 | 42% reduction |
Table 5. Measurement error statistics across crack types and lighting conditions.

| Category | Subcategory | Points | Number of Points | Mean Error (mm) | Std Dev (mm) |
|---|---|---|---|---|---|
| Crack Type | Transverse | B3, B4 | 2 | 0.13 | 0.02 |
| | Longitudinal | B7, B8, B9 | 3 | 0.12 | 0.02 |
| | Networked | B1, B2, B5, B6 | 4 | 0.18 | 0.10 |
| Lighting Condition | Low light (8:00) | All 9 points | 9 | 0.21 | 0.11 |
| | Strong light (12:00) | All 9 points | 9 | 0.19 | 0.10 |
| | Optimal light (16:00) | All 9 points | 9 | 0.15 | 0.07 |
| | Dim light (18:00) | All 9 points | 9 | 0.26 | 0.15 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xiao, D.; Wang, G.; Wang, K.; Liu, S.; Shang, G.; Wang, Q.-A.; Fan, X.; Hu, M.; Liu, R.; Chen, G.; et al. Intelligent Measurement of Concrete Crack Width Based on U-Net Deep Learning and Binocular Vision 3D Reconstruction. Appl. Sci. 2026, 16, 2355. https://doi.org/10.3390/app16052355

