Accuracy Enhancement of Homography-Based Crack Width Calculation Using RGB-D Sensors

Zhou, Shijie; Li, Yuxuan; Wang, Shuo; Narazaki, Yasutaka

doi:10.3390/buildings16112282

¹ Zhejiang University/University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining 314400, China

² College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China

* Author to whom correspondence should be addressed.

Buildings 2026, 16(11), 2282; https://doi.org/10.3390/buildings16112282

Submission received: 2 May 2026 / Revised: 30 May 2026 / Accepted: 3 June 2026 / Published: 5 June 2026

(This article belongs to the Section Building Materials, and Repair & Renovation)

Abstract

Accurate crack width measurement is important for structural condition assessment, but image-based methods are sensitive to oblique viewing angles and varying imaging distances. To address this challenge, this study proposes a method and evaluation framework for crack width measurement under non-orthogonal imaging conditions using RGB-D sensors. The proposed method integrates plane fitting and homography-based geometric rectification to transform imaged cracks into standard orthogonal viewpoints. It then applies dynamic masking and hybrid global–local binarization to the rectified image to improve measurement accuracy and robustness. Finally, this study develops an evaluation framework for comparing the proposed and baseline methods under different viewing angles and imaging distances. The framework establishes correspondences between physical locations along the same crack across RGB-D images captured under different imaging conditions, enabling quantitative analysis of performance variations. Experiments on two cracks in concrete buildings show that the proposed method outperforms the baseline method without geometric rectification, reducing the fitted surface error by 19.3–52.3% while maintaining a validity rate above 99%. The results indicate that incorporating surface geometry offers a practical pathway for quantitative crack assessment in close-range image-based inspection using handheld or UAV-mounted RGB-D cameras.

Keywords: crack width measurement; RGB-D imaging; geometric rectification; homography transformation; viewpoint variation; concrete crack inspection

1. Introduction

Concrete is one of the most widely used engineering materials in modern infrastructure, forming the physical basis of bridges, tunnels, buildings, rail systems, marine facilities, and many other mission-critical assets [1]. Despite its excellent compressive capacity and broad constructability, concrete remains a quasi-brittle material whose tensile resistance is comparatively weak [2]. Under long-term service conditions, the combined effects of mechanical loading, shrinkage, thermal gradients, carbonation, chloride ingress, alkali-aggregate reaction, and other environmental actions make surface cracking nearly unavoidable [3]. Once cracks appear and propagate, they do not merely affect visual appearance; more importantly, they create direct pathways for water, oxygen, chlorides, and carbon dioxide to reach reinforcing steel [4]. This accelerates corrosion, induces cover delamination and spalling, reduces effective load-carrying area, and may eventually compromise structural safety and durability [5].

From the perspectives of macroeconomics and public safety, aging infrastructure has become a severe global challenge. According to the Global Infrastructure Health White Paper released in 2025, approximately 35% of the bridges in use worldwide have exceeded their design life and require high-frequency, high-precision monitoring and maintenance [6]. The annual investment in repairing transportation infrastructure defects in the United States and China alone is estimated at hundreds of billions of dollars, and this figure is still growing at a compound annual rate of about 5% [7]. This situation requires engineers to shift their focus from “damage repair” to “damage prediction”. In this shift, the precise extraction and quantitative assessment of crack characteristics (such as position, direction, length, and width) have become the core basis for judging the safety level of a structure and formulating maintenance plans.

In current operation and maintenance practice, crack monitoring still relies largely on manual visual inspection. This approach has played an important role over the past half century, but it has significant limitations for modern large-scale infrastructure. First, manual inspection is highly subjective: results vary with the inspector's professional experience, visual acuity, and even physical and mental condition on the day, which leads to poor consistency and repeatability of the data [8]. Second, for high and hazardous locations such as the undersides of long-span bridges and very tall piers, manual inspection often requires scaffolding or rented aerial inspection vehicles, which poses significant operational risks and incurs high logistics and time costs [9]. Furthermore, manual methods can hardly provide high-frequency coverage of large structures, and the resulting lag in data collection may cause managers to miss the best time for reinforcement [10].

To address these problems, automated crack identification based on computer vision (CV) has been investigated. Deep-learning-based approaches, such as convolutional neural networks (CNNs) and Vision Transformer (ViT) models, have achieved significant advances in crack classification, object detection, and pixel-level semantic segmentation [11]. Compared with manual methods, vision-based inspection is non-contact, low-cost, efficient, and digitally traceable, which makes digital inspection of infrastructure feasible [12].

Despite these advances, CV methods still struggle with precise quantification because image data are inherently two-dimensional. In ideal laboratory tests, the camera is typically placed directly in front of the specimen, with the optical axis orthogonal to the structural surface. Under this configuration, the spatial resolution of the pixels is uniform, and pixel widths can be converted directly to physical widths once the scale factor is known. In actual inspections, however, the limited on-site working space, the geometric irregularity of the structural surface, and random deviations in the attitude of the unmanned aerial vehicle (UAV) mean that crack images are often captured at an oblique angle [7]. This non-orthogonal view causes severe projective distortion, so the surface area covered by each pixel is no longer uniform across the image, and the apparent pixel width of a crack relates nonlinearly to its true physical width. Furthermore, variable lighting and shadows, dirt, water seepage, and the texture of the concrete itself in outdoor environments all make the reliable extraction of sub-millimeter cracks under varying viewpoints difficult [13].

To improve the robustness of crack quantification under various perspectives, researchers have introduced the use of stereo vision, laser-assisted ranging, RGB-D sensors, and various three-dimensional reconstruction strategies [14]. These methods are promising because they provide geometric information that can support plane estimation and scale recovery. However, current solutions still face several practical challenges. Many systems depend on expensive cameras, long focal length optics, rigid multi-sensor mounting, and frequent calibration. In outdoor or large-span inspection settings, these requirements increase cost, reduce portability, and complicate deployment. Even when depth sensing is available, strong sunlight, long-range measurement, multipath interference, and noisy point clouds can degrade geometric accuracy. Furthermore, complex backgrounds in real structures, such as stair edges, construction joints, stains, and surface texture, often mimic crack appearance, creating false positives during segmentation.

This study proposes a method and evaluation framework for crack width measurement under non-orthogonal imaging conditions using RGB-D sensors. The method first estimates a major plane in the collected RGB-D image using a plane-fitting algorithm based on Random Sampling and Consensus (RANSAC) [15]. Based on the equation of the fitted plane, a homography matrix is computed to rectify the plane to a standard view, such that the transformed plane is parallel to the image plane and located at a predetermined distance from it. At the same time, the RGB image is binarized to obtain crack masks by combining multiple binarization mechanisms. The regions of interest (ROI) for crack quantification are extracted by combining the binarized images and inliers of the RANSAC algorithm, a process termed dynamic masking in this research.

The novelty of the proposed crack width measurement is that it can be applied in scenes containing multiple planes. In conventional image-based approaches, crack width is usually calculated directly from pixel distances, which makes the result sensitive to viewpoint changes and imaging distance and requires the crack to lie in a single plane. In this study, the crack-bearing surface is first estimated from RGB-D data and then used as a geometric reference for rectification and width reconstruction. This allows the measured width to be related to the physical crack plane rather than to the distorted image plane. In addition, the evaluation is designed to compare the same physical crack regions across different viewpoints, which provides a more consistent basis for assessing measurement robustness.

This research evaluates the robustness of existing binarization-based methods and the proposed method using an RGB-D image dataset capturing a crack from different perspectives. The research aligns planes in different images in the dataset to enable the consistent evaluation of the same physical parts of the crack under different perspectives. The evaluation showed that the proposed approach reduces the overall error by 52.3% compared with the baseline method and provides better robustness under varying distance and viewing-angle conditions, with greater improvements observed at larger distances and wider viewing angles. The contributions of this research can be summarized into the following two points:

An RGB-D-assisted geometric rectification for crack width measurement under non-orthogonal imaging conditions: This study develops a geometric rectification approach to reduce the sensitivity of conventional image-based crack width measurement to viewing angle and imaging distance. Planes are fitted to RGB-D images, and the estimated plane equations are used to analytically derive homography transformations from the observed plane regions to standard views with orthogonal viewing angles and controlled imaging distances. This rectification standardizes the crack detection and width estimation problem, improving the robustness and accuracy of subsequent image processing algorithms.
An evaluation framework for crack width measurement methods under different viewing angles and imaging distances: this study quantifies the variations in crack width measurement accuracy under different imaging conditions by establishing correspondences between physical locations along the same crack across RGB-D images captured under different imaging conditions. The evaluation framework enables the quick quantification and visualization of the viewpoint dependence of different crack width measurement algorithms, facilitating the development, selection, and deployment of crack width measurement in the field.

Section 2 discusses related work. Section 3 discusses methodology in detail, including plane fitting, geometric rectification, hybrid global–local binarization, dynamic masking, crack width calculation, and evaluation methodology. Section 4 presents the experiment and results. Section 5 presents the discussion, followed by the conclusion in Section 6.

2. Related Works

2.1. RGB-D Sensor Fusion for Crack Detection and Quantification

Sensor fusion based on RGB-D cameras has become one of the mainstream solutions in recent years for inspecting concrete structures from non-orthogonal viewpoints. Such systems aim to build a hardware platform that quantitatively recovers the geometry of cracks in three-dimensional space by integrating depth sensors with high-resolution imaging equipment. Research usually focuses on three-dimensional reconstruction algorithms for the spatial alignment of different sensors and for multi-scale information registration, from which the equation of the plane containing the cracks can be fitted.

Kim et al. proposed an algorithm combining the low-resolution Intel RealSense D435 RGB-D camera and a high-resolution RGB camera [14]. In this algorithm, a plane is fitted to the point cloud data generated by the RGB-D camera, and a linear coordinate transformation between the two cameras is established. Each crack pixel identified in the high-resolution image is then projected onto the fitted plane, thereby achieving precise quantification of fine cracks at a working distance of 2.5 m. Yang et al. proposed a semantic metric 3D reconstruction framework that fused RGB-D visual SLAM with pixel-level defect segmentation, enabling crack and spalling information to be mapped into a global 3D metric representation with width, depth, and area-related measurements [16]. Shokri et al. further combined deep-learning-based crack segmentation with stereo-camera calibration and 3D reconstruction to identify and model concrete cracks in three-dimensional space [17]. Feng et al. proposed a crack assessment method integrating multi-sensor fusion SLAM and image super-resolution, through which textured point clouds of bridge structures could be obtained directly to improve both reconstruction efficiency and crack-width estimation accuracy [18]. Wu et al. further introduced a depth-aware RGB-D fusion strategy for crack segmentation and quantification [19]. Wójcik et al. used low-cost RGB-D images for crack-width measurement, but the influence of imaging distance and oblique viewing angle was not systematically investigated [20]. However, these approaches mainly enhance the detection network using depth information, whereas the present study uses the fitted crack plane as an explicit geometric reference for rectification and metric width calculation.

Overall, these studies generally follow a two-stage pipeline: crack detection in the image domain and projection of the detected crack into 3D physical space. The accuracy of crack quantification therefore depends on both the reliability of image-domain crack detection and the accuracy of 3D geometric projection. However, performance can degrade markedly in complex scenes or under challenging viewing conditions, such as long distances or oblique viewing angles. Therefore, a crack quantification method that remains robust under such conditions is still needed.

2.2. Image Rectification for Robust Geometric Quantification

For planar targets, image rectification is commonly achieved through homography-based transformation, which compensates for perspective distortion and improves the geometric consistency of subsequent metric measurements. In structural inspection, Valença et al. integrated terrestrial laser scanning with image processing and orthorectified the captured images using TLS-derived geometric information, thereby improving crack detection on large concrete structures [21]. Liu et al. further combined UAV imaging with three-dimensional scene reconstruction for crack detection of bridge piers, showing that geometric reconstruction can support more reliable crack mapping and measurement [9]. Beyond crack inspection, homography-based rectification has also been adopted in other structural measurement tasks. For example, Chen et al. used a homography-based strategy to suppress UAV motion effects in bridge vibration measurement, while Min et al. incorporated homography formulation into vision-based inspection of port structures for geometric displacement estimation [22,23]. More recently, Yu et al. addressed oblique optical-axis conditions in full-field deformation and crack measurement by compensating homography calibration errors through a Perspective-n-Point-based correction scheme [24]. An et al. [25] also investigated crack quantification in oblique images and proposed a method assisted by 3D laser point clouds to recover the true physical size represented by image pixels. These studies indicate that geometric rectification is a key prerequisite for robust quantitative measurement under non-orthogonal imaging conditions. Existing 3D crack mapping workflows often depend on photogrammetric reconstruction and camera-pose estimation, while the present method focuses on local RGB-D plane fitting and homography-based rectification for crack-width calculation under controlled viewpoint variations [26]. 
However, existing methods usually rely on external geometric references, orthogonal rectification workflows, or additional sensing support, and their robustness may still be limited in complex inspection scenes. Therefore, a lightweight and robust rectification-oriented crack quantification framework remains needed.

2.3. Crack Detection Algorithms in Images

Crack detection in images can be performed by classical binarization-based methods or deep-learning-based methods. The Otsu algorithm is a binarization-based method that determines the optimal separation threshold by maximizing the between-class variance [27]. Concrete surfaces often exhibit significant illumination gradients or local shadows, causing the grayscale histogram to lack typical bimodal characteristics [28]. To address the deficiencies of global operators, local adaptive methods have been introduced. Sauvola’s method adaptively computes the threshold from local image statistics and is therefore more suitable for concrete surfaces affected by nonuniform illumination or local shadows [29]. Deep-learning-based methods, including CNN-based crack classification, U-Net-based segmentation, DeepCrack, and vision-transformer-based frameworks, have substantially improved image-domain crack detection accuracy and robustness. However, their performance may degrade when crack appearance changes with imaging distance or viewing angle, unless this issue is explicitly addressed during the training process [30,31,32,33]. Li et al. [34] proposed OrthoBoundary and edge shortest distance and improved the crack width calculation algorithm in the image domain to correct the propagation direction of skeleton points. However, it still mainly relied on image geometry. Ge et al. proposed a foundation-model-based approach to improve cross-scene crack segmentation. However, such methods still operate primarily in the image domain and do not explicitly correct geometric distortion caused by oblique imaging [35].

These methods operate primarily in the image domain and do not explicitly incorporate the 3D geometry of the observed crack surface. The apparent scale, morphology, and texture of the same crack may change with imaging distance and viewing angle. Such viewpoint-induced variations can limit the robustness and generalization of image-based methods, particularly deep-learning-based ones. When the imaging conditions deviate from those represented during model development, the resulting detection outputs may become less reliable, which in turn affects subsequent crack quantification.

2.4. Summary of Research Gaps

While previous studies leverage depth data for projection or detection, they do not transform crack-width measurement into a standardized problem that can be readily handled by various image-based algorithms. In contrast, this study introduces a geometric rectification step that can be combined with different image processing algorithms to improve measurement accuracy and robustness. The rectification helps image processing algorithms, particularly simple or classical ones that are easy to deploy, operate more reliably. Moreover, this study extends the crack width measurement problem by developing an evaluation framework for quantifying and visualizing the viewpoint dependence of different measurement algorithms.

3. Methodology

This section is organized into two methodological components. Sections 3.1–3.6 describe the proposed RGB-D-assisted crack-width measurement framework. Section 3.7 presents the viewpoint-consistent evaluation methodology used to assess same-location measurement performance under viewpoint variability.

3.1. Overview

The proposed approach estimates the crack width from an RGB-D image taken under various viewpoint conditions in five steps (Figure 1): plane fitting, geometric rectification, hybrid global–local binarization, dynamic masking, and crack width calculation. These steps are designed to maximize the performance of relatively simple classical binarization algorithms under challenging viewpoints. The following sections detail these steps, as well as the methodology used to evaluate the proposed and existing RGB-D-image-based crack width estimation methods.

3.2. Plane Fitting

The proposed method begins with fitting a plane to the RGB-D image data. First, the 3D coordinates $X(u, v)$, $Y(u, v)$, $Z(u, v)$ of an image point $(u, v)$ near the crack can be calculated based on the projection Equation (1):

$$\mathit{sf} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_{x} & \mathit{skew} & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X(u, v) \\ Y(u, v) \\ Z(u, v) \end{bmatrix} \tag{1}$$

where $u$ and $v$ represent the pixel coordinates of the crack surface in the RGB image; $\mathit{sf}$ is the scale factor; $\mathit{skew}$ is the skew coefficient; $f_{x}$ and $f_{y}$ are the focal lengths of the camera; and $c_{x}$ and $c_{y}$ are the coordinates of the principal point. According to the intrinsic parameters, $\mathit{sf}$ is 1, $\mathit{skew}$ is zero, and $f_{x}$, $f_{y}$, $c_{x}$, and $c_{y}$ are known. Since $Z(u, v)$ is known as the depth value of point $(u, v)$, $X(u, v)$ and $Y(u, v)$ can be derived from Equation (1):

$$X(u, v) = \frac{(u - c_{x})\, Z(u, v)}{f_{x}} \tag{2}$$

$$Y(u, v) = \frac{(v - c_{y})\, Z(u, v)}{f_{y}} \tag{3}$$

In this way, the three-dimensional coordinates of image pixels can be determined.
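The back-projection in Equations (2) and (3) can be sketched in a few lines of NumPy. This is only an illustrative sketch under the zero-skew pinhole model described above; the function name `backproject` and the intrinsic values in the example are hypothetical and are not taken from the sensor used in this study.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Return an (H, W, 3) array of camera-frame points from a depth map.

    Assumes a zero-skew pinhole model and a depth map already aligned
    with the RGB image (both are assumptions of this sketch).
    """
    h, w = depth.shape
    # u is the column index, v is the row index of each pixel
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    X = (u - cx) * depth / fx   # Equation (2)
    Y = (v - cy) * depth / fy   # Equation (3)
    return np.stack([X, Y, depth], axis=-1)

# Hypothetical example: a flat wall 2 m in front of the camera
depth = np.full((4, 6), 2.0)
pts = backproject(depth, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
```

The pixel at the principal point maps to a point on the optical axis, and pixels farther from it map to proportionally larger lateral offsets at the same depth.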

The plane fitting in this research is based on the RANSAC algorithm [14]. Minimal sets of points (3 in this research) are sampled to hypothesize planes in the following form:

$$a x + b y + c z + d = 0 \tag{4}$$

$$\sqrt{a^{2} + b^{2} + c^{2}} = 1 \tag{5}$$

where $a$, $b$, $c$, $d$ are the parameters of the fitted plane. Equation (5) enforces the normalization condition on the plane coefficients, such that the normal vector has unit length. Under this normalization, $d$ represents the signed distance parameter of the plane, and $|d|$ corresponds to the perpendicular distance from the camera center to the plane. The distances from all candidate points to those planes are evaluated, and inliers are determined. The case with the largest number of inliers and the lowest residual is selected, and the final plane equation is determined using those inliers.
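A minimal sketch of this sampling-and-consensus loop is given below. It is not the implementation used in this study: the inlier threshold, the iteration count, the selection by inlier count alone, and the least-squares refit by singular value decomposition are all assumptions made for illustration, and the function name `fit_plane_ransac` is hypothetical.

```python
import numpy as np

def fit_plane_ransac(points, threshold=0.003, iterations=200, seed=0):
    """Fit a unit-normal plane (a, b, c, d) to an (N, 3) point array."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(iterations):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        length = np.linalg.norm(normal)
        if length < 1e-12:          # collinear sample, no plane defined
            continue
        normal = normal / length    # Equation (5): unit normal
        d = -normal @ p0            # Equation (4) evaluated at p0
        inliers = np.abs(points @ normal + d) < threshold
        if best is None or inliers.sum() > best.sum():
            best = inliers
    if best is None:
        raise ValueError("no valid plane hypothesis found")
    # Refit on the inliers (assumed here: total least squares via SVD)
    centroid = points[best].mean(axis=0)
    _, _, vt = np.linalg.svd(points[best] - centroid)
    normal = vt[-1]
    return np.append(normal, -normal @ centroid), best

# Synthetic check: a tilted surface z = 1 + 0.1 x with noise, plus clutter
rng = np.random.default_rng(1)
xy = rng.uniform(-0.5, 0.5, size=(300, 2))
z = 1.0 + 0.1 * xy[:, 0] + rng.normal(0.0, 0.0005, size=300)
surface = np.column_stack([xy, z])
clutter = rng.uniform([-0.5, -0.5, 0.5], [0.5, 0.5, 1.5], size=(60, 3))
plane, inliers = fit_plane_ransac(np.vstack([surface, clutter]))
```

The sign of the returned normal is arbitrary in this sketch, so a later step that relies on a particular orientation has to fix it explicitly.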

According to the coordinate Equations (2) and (3), and the plane Equation (4), the depth of any point $(u_{0}, v_{0})$ in the image can be calculated by the following:

$$Z = \frac{-d}{a \left(\frac{u_{0} - c_{x}}{f_{x}}\right) + b \left(\frac{v_{0} - c_{y}}{f_{y}}\right) + c} \tag{6}$$

The three-dimensional coordinates $X$ and $Y$ can then be calculated by Equations (2) and (3). After obtaining the plane equation, the normalized plane normal vector $n$ is obtained by the following:

$$n = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \tag{7}$$
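Equation (6) follows from substituting Equations (2) and (3) into Equation (4) and solving for $Z$. The short sketch below evaluates it for a single pixel; the function name `plane_depth` and the numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np

def plane_depth(u0, v0, plane, fx, fy, cx, cy):
    """Depth of the fitted plane along the viewing ray of pixel (u0, v0)."""
    a, b, c, d = plane
    denom = a * (u0 - cx) / fx + b * (v0 - cy) / fy + c
    return -d / denom           # Equation (6)

# Fronto-parallel plane z = 2: every pixel sees depth 2
z_front = plane_depth(10, 20, (0.0, 0.0, 1.0, -2.0), 500.0, 500.0, 320.0, 240.0)

# Tilted plane 0.1 x - z + 1 = 0, normalized to a unit normal
s = np.sqrt(1.01)
z_tilt = plane_depth(420, 240, (0.1 / s, 0.0, -1.0 / s, 1.0 / s),
                     500.0, 500.0, 320.0, 240.0)
```

Because the depth comes from the plane model rather than from the raw depth map, it remains defined at pixels where the sensor reading is noisy or missing, as long as the viewing ray is not parallel to the plane.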

3.3. Geometric Rectification

After obtaining the fitted plane and its normal vector, the crack surface observed under an oblique viewpoint is transformed into an orthogonal view with a consistent geometric scale. For points lying on the same 3D plane, the mapping between the baseline image and a virtual rectified image can be described by a plane-induced homography [36]. Let the pixel coordinate in the baseline RGB image be written in homogeneous form as follows:

$$x = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \tag{8}$$

and the corresponding point in the rectified image is as follows:

$$x' = \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} \tag{9}$$

Then, the geometric relationship between the two images can be expressed as follows:

$$x' \sim H x \tag{10}$$

where $H$ is the homography matrix.

To construct this mapping, a virtual camera is defined such that the fitted crack plane becomes parallel to the image plane after transformation. Let the target optical axis of the virtual camera be the following:

$$e_{z} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \tag{11}$$

A rotation matrix $R$ is then introduced to align the fitted plane normal vector $n$ with $e_{z}$:

$$R n = e_{z} \tag{12}$$

The corresponding rotation angle $\theta$ and rotation axis $k$ are given by the following:

$$\theta = \arccos (n^{T} e_{z}) \tag{13}$$

$$k = \frac{n \times e_{z}}{\| n \times e_{z} \|} \tag{14}$$

where $k = [k_{x}, k_{y}, k_{z}]^{T}$ is the unit rotation axis. The rotation matrix $R$ can then be obtained by the Rodrigues formula:

$$R = I + \sin\theta \, [k]_{\times} + (1 - \cos\theta) \, [k]_{\times}^{2} \tag{15}$$

where $I$ is the $3 \times 3$ identity matrix, and $[k]_{\times}$ is the skew-symmetric matrix of $k$, defined as follows:

$$[k]_{\times} = \begin{bmatrix} 0 & -k_{z} & k_{y} \\ k_{z} & 0 & -k_{x} \\ -k_{y} & k_{x} & 0 \end{bmatrix} \tag{16}$$

This formulation converts the axis-angle representation into a rotation matrix and is the standard Rodrigues rotation formula used in geometric vision and in OpenCV implementations. If $\| n \times e_{z} \|$ is sufficiently small, the plane normal is already aligned with the target optical axis, and $R$ degenerates to the identity matrix [37,38]. This operation reorients the inclined crack surface into an orthogonal plane in the virtual view.
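The construction in Equations (12)–(16) can be sketched as follows. The function name `rotation_to_z` and the tolerance `eps` are hypothetical. The handling of a normal pointing exactly opposite to $e_{z}$ (a half-turn about the x-axis) is an addition of this sketch, since the cross product also vanishes in that case and the identity matrix would not satisfy Equation (12).

```python
import numpy as np

def rotation_to_z(n, eps=1e-8):
    """Rotation matrix R such that R @ n is the unit z-axis."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    ez = np.array([0.0, 0.0, 1.0])
    axis = np.cross(n, ez)
    s = np.linalg.norm(axis)        # sin(theta)
    c = float(n @ ez)               # cos(theta), Equation (13)
    if s < eps:
        # Already aligned: identity. Exactly opposite: half-turn about x
        # (an assumption of this sketch, not stated in the text).
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    k = axis / s                    # Equation (14)
    k_cross = np.array([[0.0, -k[2], k[1]],
                        [k[2], 0.0, -k[0]],
                        [-k[1], k[0], 0.0]])   # Equation (16)
    # Equation (15): Rodrigues formula
    return np.eye(3) + s * k_cross + (1.0 - c) * (k_cross @ k_cross)

normal = np.array([0.3, -0.2, 0.9])
normal = normal / np.linalg.norm(normal)
R = rotation_to_z(normal)
```

Using $\sin\theta = \| n \times e_{z} \|$ and $\cos\theta = n^{T} e_{z}$ directly avoids an explicit call to the inverse cosine.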

To further unify the image scale after rectification, a target rectification depth $d_{target}$ is introduced. This parameter specifies the desired distance between the fitted plane and the virtual camera. The corresponding translation vector is expressed as follows:

$$t = -R \left( (d_{p} - d_{target}) \, n \right) \tag{17}$$

where $d_{p}$ denotes the perpendicular distance from the baseline camera center to the fitted plane. When $d_{target}$ is selected close to the true physical depth of the crack surface, the resampled image can better preserve the actual geometric scale of the crack pattern and reduce the apparent compression caused by perspective projection.

Let $K$ and $K'$ denote the intrinsic matrices of the baseline camera and the virtual rectified camera, respectively. The plane-induced homography is then written as follows:

$$H = K' \left( R + \frac{t \, n^{T}}{d_{p}} \right) K^{-1} \tag{18}$$

Using this homography matrix $H$, the baseline oblique image is warped into a new rectified image through inverse perspective mapping. This rectified image is the direct output of the geometric rectification step and represents the crack surface in an orthogonal manner with a unified geometric scale. Consequently, crack features that are compressed in the baseline oblique view are restored to proportions closer to their true physical geometry. Such pre-rectification is particularly important for sub-millimeter cracks, as it improves the effective sampling density of fine crack features prior to binarization and segmentation, thereby providing a geometrically consistent image for subsequent crack width estimation.
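The assembly of Equations (17) and (18) can be checked numerically with the sketch below. Two points are assumptions of this sketch rather than statements from the text: the plane normal is oriented so that $n^{T} X > 0$ for surface points (that is, $d < 0$), and $d_{p}$ is taken as $-d$ under that orientation. The function names and all numerical values are hypothetical.

```python
import numpy as np

def rotation_aligning(n, target):
    """Rodrigues rotation with R @ n = target (unit vectors, not opposite)."""
    axis = np.cross(n, target)
    s, c = np.linalg.norm(axis), float(n @ target)
    if s < 1e-8:
        return np.eye(3)
    k = axis / s
    kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + s * kx + (1.0 - c) * (kx @ kx)

def rectifying_homography(plane, K, K_virtual, d_target):
    """Return (H, R, t) mapping baseline pixels to the rectified view."""
    a, b, c, d = plane
    n = np.array([a, b, c], dtype=float)
    if d > 0:                       # orient the normal so that n . X = -d > 0
        n, d = -n, -d
    d_p = -d                        # camera-to-plane distance (assumed convention)
    R = rotation_aligning(n, np.array([0.0, 0.0, 1.0]))
    t = -R @ ((d_p - d_target) * n)                              # Equation (17)
    H = K_virtual @ (R + np.outer(t, n) / d_p) @ np.linalg.inv(K)  # Equation (18)
    return H, R, t

K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
n = np.array([0.3, -0.2, 0.9])
n = n / np.linalg.norm(n)
H, R, t = rectifying_homography(np.append(n, -1.2), K, K, d_target=0.5)

# Consistency check on one surface point seen at pixel (400, 200)
x = np.array([400.0, 200.0, 1.0])
ray = np.linalg.inv(K) @ x
X = ray * (1.2 / (n @ ray))         # intersection of the ray with the plane
X_virtual = R @ X + t               # same point in the virtual camera frame
x_rect = H @ x
x_rect = x_rect / x_rect[2]
```

In this example the surface point lands at depth `d_target` in the virtual camera frame, and the homography reproduces its projection there. The resulting matrix could then be passed to a perspective-warp routine such as OpenCV's `cv2.warpPerspective` to resample the image.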

3.4. Dynamic Masking

Since geometric rectification and homography transformation may introduce unmapped pixels and irregular invalid regions near the image boundary, a dynamic masking strategy was further employed to constrain subsequent crack segmentation to the valid surface region. Without such a constraint, black borders, empty pixels, and boundary artifacts may contaminate local statistics and lead to erroneous crack responses.

In the proposed method, a dense validity mask was generated from the full-resolution depth image and the fitted plane model. For each pixel, its three-dimensional coordinates $(X, Y, Z)$ were first recovered from the depth value and the camera intrinsics. The point-to-plane distance was then computed as follows:

$$D(x, y) = | a X(x, y) + b Y(x, y) + c Z(x, y) + d | \tag{19}$$

where $(a, b, c, d)$ are the parameters of the fitted plane. A raw mask $M_{raw}(x, y)$ was subsequently defined by the following:

$$M_{raw}(x, y) = \begin{cases} 1, & Z(x, y) > T_{d} \ \text{and} \ D(x, y) < \tau_{d}, \\ 0, & \text{otherwise}, \end{cases} \tag{20}$$

where $T_{d}$ (0 in this research) denotes the minimum valid depth threshold and $\tau_{d}$ (3 mm in this research) is the allowable point-to-plane distance. In this way, only pixels with reliable depth observations and sufficient geometric consistency with the target surface were retained as valid candidates.

To improve spatial continuity and suppress isolated noisy regions, the raw mask was further refined by morphological opening and closing operations. After that, connected-component analysis was applied, and the principal surface region was preserved as the final valid mask. This refinement step helps remove small spurious regions and yields a more compact and stable support region for crack extraction.

The final dynamic mask was then used to restrict the statistical domain of the subsequent binarization process. Only pixels within the valid region participated in threshold estimation, while pixels outside the mask were excluded from crack classification. Therefore, the proposed masking strategy effectively suppresses artifacts introduced by rectification, reduces the influence of invalid boundary regions, and improves the reliability of crack segmentation under skewed viewing conditions.
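The raw mask of Equations (19) and (20) can be sketched as follows, assuming coordinates in metres so that the 3 mm tolerance becomes 0.003. The function name `raw_validity_mask` is hypothetical. The morphological refinement and the connected-component selection described above are not included in this sketch.

```python
import numpy as np

def raw_validity_mask(points, plane, min_depth=0.0, tau=0.003):
    """Boolean mask of pixels with valid depth that lie near the fitted plane.

    points: (H, W, 3) camera-frame coordinates in metres.
    plane:  (a, b, c, d) with a unit normal.
    """
    a, b, c, d = plane
    X, Y, Z = points[..., 0], points[..., 1], points[..., 2]
    dist = np.abs(a * X + b * Y + c * Z + d)      # Equation (19)
    return (Z > min_depth) & (dist < tau)          # Equation (20)

# Hypothetical 3 x 3 patch of a wall at z = 1 m
pts = np.zeros((3, 3, 3))
pts[..., 2] = 1.0
pts[0, 0, 2] = 0.0      # missing depth reading
pts[1, 1, 2] = 1.01     # 10 mm off the plane, e.g. a protruding object
mask = raw_validity_mask(pts, plane=(0.0, 0.0, 1.0, -1.0))
```

Both the pixel without a depth reading and the pixel 10 mm off the surface are rejected, while the remaining seven pixels are kept.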

3.5. Hybrid Global–Local Binarization

To improve crack segmentation under nonuniform illumination and complex surface texture, a hybrid global–local binarization strategy was adopted in the proposed method. Specifically, Sauvola thresholding was used to preserve fine crack boundaries under spatially varying brightness, whereas Otsu thresholding was introduced to provide a global intensity reference for suppressing large-scale non-crack dark regions. By combining these two thresholding mechanisms, the proposed method improves robustness against background interference while maintaining sensitivity to narrow cracks [27,29]. For each pixel

(x, y)

the Sauvola local threshold is defined as follows:

T_{S (x, y)} = m (x, y) [1 - k (1 - \frac{s (x, y)}{R})]

(21)

where

m (x, y)

and

s (x, y)

are the local mean and standard deviation computed within a rectangular window centered at

(x, y)

respectively;

k

is a sensitivity parameter; and

R

is the normalization factor for the standard deviation.

In addition, a global threshold T_{O} was determined using Otsu’s method, which selects the threshold that maximizes the inter-class variance of the grayscale histogram:

T_{O} = \arg \max_{t} [ω_{0} (t) ω_{1} (t) {(μ_{0} (t) - μ_{1} (t))}^{2}]

(22)

where ω_{0} (t) and ω_{1} (t) denote the probabilities of the two classes separated by threshold t, and μ_{0} (t) and μ_{1} (t) are the corresponding class mean intensities.
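Equation (22) can be evaluated directly from the histogram. The sketch below is one possible NumPy formulation for 8-bit images; the optional `mask` argument reflects the statement in Section 3.4 that only pixels inside the valid region take part in threshold estimation, and the convention that class 0 contains intensities less than or equal to t is an assumption.

```python
import numpy as np


def otsu_threshold(gray, mask=None):
    """Return the gray level t that maximizes omega0 * omega1 * (mu0 - mu1)^2."""
    values = gray[mask] if mask is not None else np.asarray(gray).ravel()
    hist = np.bincount(values.astype(np.int64).ravel(), minlength=256)[:256]
    p = hist.astype(np.float64) / hist.sum()
    levels = np.arange(256)
    omega0 = np.cumsum(p)             # probability of class 0 (intensity <= t)
    omega1 = 1.0 - omega0             # probability of class 1
    cum_mean = np.cumsum(p * levels)
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = cum_mean / omega0
        mu1 = (total_mean - cum_mean) / omega1
        between = omega0 * omega1 * (mu0 - mu1) ** 2
    # Thresholds that leave one class empty give 0/0; treat them as zero variance
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))
```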

The final threshold was then formulated as a linear combination of the local Sauvola threshold and the global Otsu threshold:

T (x, y) = (1 - λ) T_{S} (x, y) + λ T_{O}

(23)

where λ (= 0.6 in this study) is the fusion weight. The Sauvola threshold T_{S} (x, y) was calculated using a window size of 21 × 21 pixels, with k = 0.25 and R = 128. These parameters were kept constant across all tested images to ensure reproducibility. They were selected based on preliminary trials to balance local crack-detail preservation and background-noise suppression.

The binarized image B (x, y) is finally obtained by the following:

B (x, y) = \begin{cases} 0, & G (x, y) < T (x, y), \\ 255, & \text{otherwise}, \end{cases}

(24)

where G (x, y) denotes the grayscale intensity at pixel (x, y). Since cracks generally appear as dark line-like structures in the image, pixels with intensities lower than the threshold were classified as crack pixels. In this way, the proposed hybrid strategy can suppress irrelevant dark patterns at the global scale while preserving thin crack details at the local scale.
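Equations (23) and (24) amount to a per-pixel blend followed by a comparison. A small sketch follows, assuming the Sauvola threshold map and the Otsu threshold are already available as inputs; forcing pixels outside the valid mask to background (255) is an assumption about how "excluded from crack classification" is realized.

```python
import numpy as np


def hybrid_binarize(gray, t_sauvola, t_otsu, lam=0.6, valid_mask=None):
    """Blend local and global thresholds (Eq. 23), then binarize (Eq. 24)."""
    t = (1.0 - lam) * np.asarray(t_sauvola, dtype=np.float64) + lam * float(t_otsu)
    # Crack pixels (darker than the blended threshold) become 0, all others 255
    out = np.where(np.asarray(gray, dtype=np.float64) < t, 0, 255).astype(np.uint8)
    if valid_mask is not None:
        out[~valid_mask] = 255  # assumption: invalid pixels are treated as background
    return out


# Illustrative values only: three pixels with different local thresholds
gray = np.array([[40, 90, 200]], dtype=np.uint8)
t_s = np.array([[60.0, 120.0, 150.0]])
binary = hybrid_binarize(gray, t_s, t_otsu=100)
```

With λ = 0.6 the blended thresholds in this toy example are 84, 108, and 120, so the first two pixels are labeled as crack and the third as background.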

3.6. Crack Width Calculation

After geometric rectification and binarization, crack width is obtained by combining image-domain edge localization with metric reconstruction in physical space. First, the binary crack region is skeletonized using a thinning operation based on the Zhang–Suen algorithm [39], and each sampled point is assigned to its nearest skeleton pixel as the local center point. Then, the local tangent direction is estimated from neighboring skeleton points, and the corresponding normal direction is used as the measurement direction so that the width is measured perpendicular to the local crack trend.

Starting from the local center point, the two crack edges are identified by searching along the positive and negative normal directions within the binary crack region. For the detected edge points, their 3D coordinates can be reconstructed using the pixel-to-coordinate relations in Equations (2) and (3) together with the plane-constrained depth formulation in Equation (6), under the fitted plane a x + b y + c z + d = 0 given in Equation (4).

The crack-width sampling points are determined from the extracted crack centerline within the selected crack region. Specifically, the crack region of interest is first selected, and the centerline pixels are sampled at a fixed interval. For each sampled centerline point, the local normal direction is estimated using neighboring centerline pixels. Then, two crack edge points are searched along the positive and negative normal directions. The physical crack width is calculated as the Euclidean distance between the reconstructed 3D coordinates of these two edge points.
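The normal estimation and edge search described above can be sketched as follows. The tangent is taken here as the principal direction of a few neighboring centerline pixels, and the edges are found by marching one pixel at a time; the neighborhood size `half`, the step limit `max_steps`, and the function names are illustrative assumptions rather than the authors' settings.

```python
import numpy as np


def local_normal(centerline, i, half=3):
    """Unit normal at centerline point i, from the principal direction of its neighbors."""
    lo, hi = max(0, i - half), min(len(centerline), i + half + 1)
    pts = np.asarray(centerline[lo:hi], dtype=np.float64)
    centered = pts - pts.mean(axis=0)
    # The dominant right-singular vector approximates the local tangent
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    tangent = vt[0]
    return np.array([-tangent[1], tangent[0]])


def find_edges(crack, center, normal, max_steps=50):
    """March from the center along +normal and -normal until leaving the crack.

    crack is a boolean image (True = crack pixel); points are (x, y) tuples.
    Returns the last in-crack pixel on each side, or None for an invalid sample.
    """
    h, w = crack.shape
    edges = []
    for sign in (1.0, -1.0):
        last = None
        for step in range(max_steps + 1):
            x = int(round(center[0] + sign * step * normal[0]))
            y = int(round(center[1] + sign * step * normal[1]))
            if not (0 <= x < w and 0 <= y < h):
                return None   # ran off the image before finding an edge
            if not crack[y, x]:
                break         # first background pixel: the previous one is the edge
            last = (x, y)
        else:
            return None       # still inside the crack after max_steps
        if last is None:
            return None       # the center itself is not a crack pixel
        edges.append(last)
    return edges[0], edges[1]
```

Returning `None` mirrors the invalid-sample handling in the text: a sample without two detectable edges is dropped rather than assigned a width.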

To ensure robustness, samples for which a reliable local measurement cannot be established are excluded from width statistics. These invalid samples include cases where the local normal direction cannot be stably estimated from the skeleton neighborhood, where valid edge points cannot be identified along the normal direction, or where numerical instability occurs during pixel-to-plane projection. In this way, the crack width is converted from an apparent pixel distance into a physically meaningful metric distance on the fitted plane, improving measurement reliability under varying viewpoints and imaging distances.
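A minimal sketch of the metric reconstruction is given below. It assumes an undistorted pinhole camera with intrinsics (fx, fy, cx, cy) and intersects each pixel's viewing ray with the fitted plane; this is a standard formulation and may differ in detail from Equations (2), (3), and (6), which are defined earlier in the paper. The tolerance `1e-9` and the function names are illustrative.

```python
import numpy as np


def pixel_to_plane(u, v, intrinsics, plane):
    """Intersect the viewing ray of pixel (u, v) with the plane a x + b y + c z + d = 0."""
    fx, fy, cx, cy = intrinsics
    a, b, c, d = plane
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])  # ray direction at unit depth
    denom = a * ray[0] + b * ray[1] + c * ray[2]
    if abs(denom) < 1e-9:
        return None   # ray nearly parallel to the plane: numerically unstable
    z = -d / denom
    if z <= 0:
        return None   # intersection lies behind the camera
    return z * ray


def crack_width(edge1, edge2, intrinsics, plane):
    """Euclidean distance between the two edge pixels after projection onto the plane."""
    p1 = pixel_to_plane(edge1[0], edge1[1], intrinsics, plane)
    p2 = pixel_to_plane(edge2[0], edge2[1], intrinsics, plane)
    if p1 is None or p2 is None:
        return None
    return float(np.linalg.norm(p1 - p2))


# Illustrative numbers: a fronto-parallel plane 200 mm away, focal length 1000 px
width_mm = crack_width((500, 500), (505, 500), (1000.0, 1000.0, 500.0, 500.0),
                       (0.0, 0.0, 1.0, -200.0))
```

In this toy configuration, five pixels at a depth of 200 mm and a focal length of 1000 px correspond to 5 / 1000 × 200 = 1 mm.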

Figure 2 presents two representative invalid-sample cases that can be illustrated visually. Figure 2a,b show overviews of the two samples, crack 1 and crack 2. The yellow and blue labels help locate the ROI in the binarized images. Figure 2e shows an invalid sample caused by crack branching, and Figure 2f shows an invalid sample caused by failure of edge detection. Numerical instability during pixel-to-plane projection, the third cause listed above, is difficult to visualize in the image domain and is therefore not shown.

3.7. Viewpoint-Consistent Evaluation Methodology

To enable crack width comparison at the same physical crack location under viewpoint variability, a reference-image-based correspondence transfer strategy is designed. First, one image is selected from the rectified image set as the standard reference image, typically the one with the smallest viewpoint deviation. Because this image suffers the least geometric distortion, it serves as a stable evaluation baseline. A region of interest containing the target crack is manually defined on the reference image, while the crack center points within this region are automatically extracted as evaluation samples. In this way, manual intervention is only used to delimit the evaluation region, whereas the generation of actual measurement points remains automatic. This procedure is illustrated in Figure 3.

Next, MASt3R is employed to establish dense correspondences between the reference image and the target images acquired under different depths and viewpoint angles. Instead of estimating correspondences over the entire scene indiscriminately, the proposed method introduces a plane-constrained strategy so that only points belonging to the crack plane are retained for homography estimation. Specifically, the dense geometric coordinates recovered by MASt3R are filtered using plane information, which removes points associated with background planes and irrelevant structures and preserves only the correspondences lying on the target crack surface. This strategy reduces multi-plane interference and improves the stability and geometric consistency of point transfer across viewpoints.

To improve the reliability of point transfer, the MASt3R-derived correspondences were constrained to the crack plane. Dense correspondences were first obtained between the reference and target images using MASt3R. Then, only correspondences whose recovered 3D coordinates were located on or close to the fitted crack plane were retained for homography estimation. Points from background surfaces, irrelevant objects, or out-of-plane regions were discarded. This filtering reduces the influence of multi-plane mismatches and helps ensure that the transferred sample points correspond to the same physical crack locations across different viewpoints.

Based on the estimated homography, the sampled points in the reference image are transferred to two groups of target images: (1) baseline images captured under different depths and viewpoint angles, and (2) the corresponding rectified images acquired under the same conditions. Since both measurements originate from the same set of sampled points in the same reference image, crack width comparison between the baseline and rectified images is conducted at identical physical crack locations rather than at independently selected positions in each image. Therefore, the resulting difference more faithfully reflects the influence of viewpoint variation and the effectiveness of geometric rectification in improving measurement consistency.
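The plane-constrained filtering and point transfer can be sketched as below. The sketch assumes that matched pixel pairs and their recovered 3D coordinates are already available as arrays (the MASt3R interface is not reproduced here), uses a plain direct linear transform without outlier rejection, and introduces a distance tolerance `tol` that is purely illustrative.

```python
import numpy as np


def on_plane(points_3d, plane, tol=2.0):
    """Boolean mask of 3D points within tol (same unit as the points) of the plane."""
    n, d = np.asarray(plane[:3], dtype=float), float(plane[3])
    dist = np.abs(points_3d @ n + d) / np.linalg.norm(n)
    return dist <= tol


def fit_homography(src, dst):
    """Direct linear transform mapping src -> dst, both of shape (N, 2) with N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right-singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes H[2, 2] is not close to zero


def transfer(H, pts):
    """Apply homography H to an (N, 2) array of pixel coordinates."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

A typical call sequence would be `keep = on_plane(pts3d, plane)`, then `H = fit_homography(src[keep], dst[keep])`, then `transfer(H, sample_points)`. A robust estimator such as RANSAC would normally be added for real correspondences, since the plain transform above is sensitive to any remaining mismatches.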

This evaluation strategy partly relies on the quality of the MASt3R-derived correspondences. Poor correspondences may shift the transferred sample points away from the true crack locations and introduce local errors in the relative error calculation. To reduce this effect, only correspondences located on the fitted crack-bearing plane and within the valid evaluation region were retained. This limitation is acknowledged, and the reported results should be interpreted as measurement accuracy under the retained valid correspondences.

4. Experiments and Results

4.1. Experiment Setup and Overview of Experiments

A series of experiments was conducted on two concrete cracks on buildings at the International Campus of Zhejiang University. RGB-D images were captured using a handheld Azure Kinect DK RGB-D camera, with the color image resolution set to 4096 × 3072. Figure 4 illustrates the experimental setup and the ground-truth measurement procedure. Before image-based evaluation, reference crack widths were measured manually using a digital caliper (Syntek JS20-GTG manufactured by Shengtai Xin Electronic Technology Co. Ltd. of Deqing, China) at selected physical locations along the crack. The digital caliper used in this study had a nominal resolution of 0.01 mm. For each selected reference location, three repeated measurements were performed, and the average value was used as the reference crack width for quantitative error evaluation. Note that although the nominal resolution of the caliper was 0.01 mm, the actual measurement uncertainty was larger because of irregular crack boundaries, local surface roughness, and manual alignment of the caliper with the local crack-opening direction. Nevertheless, the measured values are used as reference values for evaluating the algorithmic estimates.

The experimental dataset was acquired from the two cracks shown in Figure 5, where the yellow and blue labels help locate the ROI in the binarized images. The viewing angle varied approximately from 0° to 60°, and the imaging depth ranged approximately from 170 mm to 550 mm. A schematic diagram defining imaging depth, viewing angle, and target rectification depth is provided in Figure 6. The same RGB-D acquisition device, image-processing pipeline, geometric rectification procedure, and crack-width calculation method were applied to the two tested cracks, yielding 43 and 50 RGB-D images, respectively. The dataset constructed from the first crack is referred to as the main dataset in this study because it includes richer viewpoint variations and more crack width estimation points.

4.2. Evaluation Baseline and Metrics

This study compares the proposed crack width measurement method with geometric rectification against a baseline method without rectification. In the baseline method, crack width is measured directly from the baseline oblique images using the same crack extraction and width estimation procedure, but without transforming the crack plane into an orthogonal view. The proposed method introduces geometric rectification before crack binarization and width measurement, so that the crack pattern is evaluated in a geometrically normalized image domain.

To quantitatively evaluate the measurement performance, both point-level and global metrics are employed. At the point level, the relative error is defined as follows:

\text{relative error} = \frac{|w_{measured} - w_{true}|}{w_{true}}

(25)

where w_{measured} is the estimated crack width and w_{true} is the corresponding ground-truth width. In this study, the ground-truth width was measured manually with a digital caliper.

In addition, the validity rate is introduced to describe the proportion of transferred sample points that can produce valid width measurements. A point is considered valid only when a stable local normal direction can be estimated, and two crack edge points can be successfully detected along the opposite normal directions, so that the corresponding physical crack width can be computed. The validity rate is defined as follows:

\text{validity rate} = \frac{N_{valid}}{N_{all}} \times 100 [%]

(26)

where N_{valid} is the number of valid measurement points and N_{all} is the total number of transferred sample points.
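Both point-level quantities are simple to compute. The following sketch marks an invalid sample with `None`, which is an illustrative convention rather than the authors' data format; the numbers in the usage lines are made up for the example.

```python
def relative_error(w_measured, w_true):
    """Point-level relative error of Equation (25)."""
    return abs(w_measured - w_true) / w_true


def validity_rate(widths):
    """Validity rate of Equation (26) in percent; None marks an invalid sample."""
    n_valid = sum(w is not None for w in widths)
    return 100.0 * n_valid / len(widths)


# Illustrative values: four transferred points, one of which produced no width
widths_mm = [0.31, None, 0.28, 0.35]
rate = validity_rate(widths_mm)      # three of four points are valid
err = relative_error(0.33, 0.30)     # estimate of 0.33 mm against a 0.30 mm reference
```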

For dataset-level evaluation, the mean relative error is used to summarize the measurement accuracy over all valid points. Furthermore, to characterize the overall error trend under joint depth and angle variations, an integrated fitted surface error reduction metric is introduced. Specifically, the relative errors of the baseline and proposed methods are modeled in the depth–angle space by quadratic surfaces:

E (d, θ) = β_{0} + β_{1} d + β_{2} θ + β_{3} d^{2} + β_{4} d θ + β_{5} θ^{2}

(27)

The coefficients are estimated using the Huber loss:

L_{δ} (r) = \begin{cases} \frac{1}{2} r^{2}, & |r| \leq δ, \\ δ (|r| - \frac{1}{2} δ), & |r| > δ, \end{cases}

(28)

which improves robustness against locally unstable or outlying error observations during trend fitting [40].
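One common way to minimize the Huber loss for a model that is linear in its coefficients is iteratively reweighted least squares. The sketch below fits the six coefficients of Equation (27) this way; the solver choice, the value of `delta`, and the iteration limit are assumptions, since the text does not state how the optimization was carried out.

```python
import numpy as np


def design_matrix(d, theta):
    """Columns 1, d, theta, d^2, d*theta, theta^2 of the quadratic surface."""
    d, theta = np.asarray(d, dtype=float), np.asarray(theta, dtype=float)
    return np.column_stack([np.ones_like(d), d, theta, d ** 2, d * theta, theta ** 2])


def fit_huber_surface(d, theta, err, delta=0.1, iters=50):
    """Estimate beta_0..beta_5 under the Huber loss by reweighted least squares."""
    X, y = design_matrix(d, theta), np.asarray(err, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(iters):
        r = y - X @ beta
        abs_r = np.maximum(np.abs(r), 1e-12)
        # Huber weights: 1 in the quadratic zone, delta / |r| in the linear zone
        w = np.where(abs_r <= delta, 1.0, delta / abs_r)
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.allclose(new_beta, beta, atol=1e-10)
        beta = new_beta
        if converged:
            break
    return beta
```

Because large residuals receive weights proportional to 1 / |r|, a few strongly outlying error observations pull the fitted surface far less than they would in an ordinary least-squares fit. Rescaling d and θ to comparable ranges before fitting is advisable for numerical conditioning.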

The fitted surfaces are then integrated over the common evaluation domain Ω, and the reduction ratio is defined as follows:

R_{int} = \frac{\iint_{Ω} E_{base} (d, θ) \, \mathrm{d}d \, \mathrm{d}θ - \iint_{Ω} E_{prop} (d, θ) \, \mathrm{d}d \, \mathrm{d}θ}{\iint_{Ω} E_{base} (d, θ) \, \mathrm{d}d \, \mathrm{d}θ} \times 100 [%]

(29)

where E_{base} (d, θ) and E_{prop} (d, θ) denote the fitted relative-error surfaces of the baseline and proposed methods, respectively. A larger value of R_{int} indicates a greater overall reduction in error achieved by the proposed method.
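If Ω is taken to be a rectangle in the depth–angle plane (an assumption; the shape of the common evaluation domain is not specified here), the integral of the quadratic surface in Equation (27) has a closed form, and R_{int} follows directly. The function names are illustrative.

```python
def integrate_surface(beta, d_range, theta_range):
    """Closed-form integral of the quadratic surface over a rectangular domain."""
    (d0, d1), (t0, t1) = d_range, theta_range
    # Moments I_k = integral of x^k over the interval, for k = 0, 1, 2
    Id = [(d1 ** (k + 1) - d0 ** (k + 1)) / (k + 1) for k in range(3)]
    It = [(t1 ** (k + 1) - t0 ** (k + 1)) / (k + 1) for k in range(3)]
    b0, b1, b2, b3, b4, b5 = beta
    return (b0 * Id[0] * It[0] + b1 * Id[1] * It[0] + b2 * Id[0] * It[1]
            + b3 * Id[2] * It[0] + b4 * Id[1] * It[1] + b5 * Id[0] * It[2])


def integrated_reduction(beta_base, beta_prop, d_range, theta_range):
    """R_int of Equation (29), in percent."""
    base = integrate_surface(beta_base, d_range, theta_range)
    prop = integrate_surface(beta_prop, d_range, theta_range)
    return 100.0 * (base - prop) / base
```

As a sanity check, two constant surfaces at 0.4 and 0.2 give a reduction of 50% regardless of the domain, because the area of Ω cancels in the ratio.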

4.3. Selection of Target Rectification Depth

To determine an appropriate target rectification depth d_{target} for the proposed method, five candidate values (100 mm, 150 mm, 200 mm, 250 mm, and 300 mm) were evaluated using the main dataset. For each setting, the number of valid measurement points, validity rate, mean relative error, and integrated fitted surface error reduction were calculated, as summarized in Table 1.

Among all candidates, d_{target} = 100 mm achieved the lowest mean relative error (0.109) and the highest integrated fitted surface error reduction (54.3%). However, its validity rate dropped to 98.5%, with 21 invalid points observed among 1376 transferred sample points. Figure 7 shows an example of valid and invalid points at d_{target} = 100 mm. Although most valid points had lower local relative errors than under the d_{target} = 150 mm condition, several points in crack regions became invalid. This indicates that an overly small target rectification depth can amplify local geometric deformation, leading to unstable point localization.

By contrast, d_{target} = 150 mm maintained a low mean relative error (0.137) and a comparable integrated fitted surface error reduction (52.3%), while increasing the validity rate to 99.6%. Figure 8 shows an example of valid and invalid points at d_{target} = 150 mm. The number of invalid points was clearly reduced, and the valid points were more consistently distributed along the sampled crack region. These results suggest that 150 mm preserves most of the rectification benefit while reducing local instability and providing better robustness for crack-width sampling.

A comparison of Figures 7 and 8 supports this choice: at 150 mm, fewer points are invalid and the rectified crack morphology remains more stable than at 100 mm.

For larger values of d_{target}, the validity rate remained high, but the mean relative error increased while the integrated fitted surface error reduction decreased progressively. Therefore, d_{target} = 150 mm was selected for all subsequent experiments, as it provided the best balance between accuracy and robustness.

4.4. Crack Width Estimation Error in the Depth–Angle Space

Figure 9 gives an overview of the error distribution in the depth–angle domain for the main dataset, together with the fitted surfaces for the baseline method and the proposed method at d_{target} = 150 mm. Two viewing perspectives are presented to make the global trend easier to interpret visually. The red scatter points and red fitted surface represent the baseline method, whereas the blue scatter points and blue fitted surface represent the proposed method.

Although the raw scatter points exhibit noticeable dispersion, the fitted trend surfaces reveal a clear and consistent overall pattern. In most parts of the evaluated depth–angle domain, the fitted surface of the proposed method remains below that of the baseline method, especially in regions with larger observation depth and viewing angle, indicating that the proposed method achieves a lower overall relative error level. This result suggests that the geometric rectification strategy effectively reduces the systematic error caused by joint variations in observation depth and viewpoint angle.

The two fitted surfaces also show different sensitivities to geometric changes. The fitted surface of the baseline method rises more noticeably as depth and viewing angle increase, whereas the proposed method maintains a lower and more stable fitted error surface. This indicates that the baseline method is more vulnerable to geometric distortion under higher angle and depth conditions. It also indicates that the proposed method provides improved robustness across the evaluated depth–angle domain.

From the two perspectives shown in Figure 9a,b, it can also be observed that the separation between the two fitted surfaces becomes more evident in the regions associated with larger viewing angles and greater depths. This implies that the advantage of the proposed method is more pronounced when geometric distortion becomes stronger. Therefore, the 3D global trend analysis provides an intuitive overall confirmation that the proposed method is not only more accurate on average but also more robust to the coupled influence of depth and viewpoint variability.

The grouped 2D analyses in the following section further decompose this global trend by examining the relative error variation with depth under shared angle groups and with angle under shared depth groups.

4.5. Relative Error Analysis Under Depth and Angle Variability

4.5.1. Relative Error Variation with Depth Under Shared Angle Groups

To examine the effect of depth under comparable viewpoint conditions, the viewing-angle samples were partitioned into four groups using the K-Means clustering algorithm, a classical partition-based clustering method that minimizes within-cluster variance [41]. Given the angle samples x_{i}, K-Means divides them into K clusters C_{1}, \dots, C_{K} by minimizing the within-cluster sum of squares:

\min_{C_{1}, \dots, C_{K}} \sum_{k = 1}^{K} \sum_{x_{i} \in C_{k}} {|x_{i} - μ_{k}|}^{2}

(30)

where μ_{k} denotes the centroid of the k-th cluster. In this study, K = 4 was adopted to partition the full angle range into four representative groups with relatively homogeneous intra-group angular variation. The resulting clusters were reordered according to the centroid angle from small to large and denoted as Angle Clusters 1–4, which, respectively, represent near-orthogonal, low-oblique, moderate-oblique, and strongly oblique viewing conditions.
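For scalar angle data, the grouping and reordering can be sketched with a short Lloyd-style iteration. The quantile-based initialization is an assumption chosen to make the example deterministic (the paper does not state its initialization), and the angle values in the usage lines are invented for illustration.

```python
import numpy as np


def kmeans_1d(x, k=4, iters=100):
    """Cluster scalar samples and return labels ordered by ascending centroid."""
    x = np.asarray(x, dtype=float)
    # Assumption: deterministic start at evenly spaced quantiles of the data
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Relabel so that cluster 0 has the smallest centroid, cluster k-1 the largest
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels], centers[order]


# Illustrative viewing angles in degrees
angles = [2, 4, 5, 18, 20, 22, 35, 37, 40, 55, 58, 60]
labels, centroids = kmeans_1d(angles, k=4)
```

Label 0 then corresponds to Angle Cluster 1 (near-orthogonal) and label 3 to Angle Cluster 4 (strongly oblique).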

Figure 10 shows the variation in relative errors for different depths within K-Means angle groups. In Angle Cluster 1, both methods maintain relatively low error, but the baseline method shows a clearer increase with depth, whereas the proposed method remains comparatively stable. In Angle Clusters 2 and 3, the advantage of the proposed method becomes more evident, with the mean error curve remaining below that of the baseline method at nearly all depth levels. This suggests that the proposed method can more effectively suppress depth-related error accumulation under moderate viewing angles. The difference is most pronounced in Angle Cluster 4. At large viewing angles, both methods become more sensitive to depth, but the proposed method still preserves substantially lower mean error and reduced fluctuation. This indicates that the benefit of geometric rectification becomes more significant when depth variation is coupled with strong viewpoint obliqueness. Overall, the proposed method consistently yields lower relative error than the baseline method across all angle groups, indicating better robustness to depth variation.

4.5.2. Relative Error Variation with Angle Under Shared Depth Groups

Figure 11 presents the variation in relative errors for different angles within K-Means depth groups. In the nearest depth group, both methods show relatively low error at small angles, but the error increases at larger angles. In the intermediate depth groups, the angular effect becomes more evident: the baseline method exhibits a clearer upward trend with angle, whereas the proposed method increases more slowly and remains at a lower level throughout the angular range. For the larger-depth groups, the difference between the two methods remains significant. Although local fluctuations are present, the proposed method generally produces lower mean error and a more concentrated scatter distribution. This is particularly evident in the farthest depth group, where the baseline method stays at a relatively high error level, while the proposed method maintains a lower and flatter trend. Overall, the results confirm that viewing angle is a major source of relative error, especially when combined with larger depth. The proposed method effectively suppresses the error growth induced by increasing obliqueness and therefore improves measurement accuracy and stability under varying viewpoint conditions.

4.6. Validation on a Second Crack

To assess the generalizability of the findings from the main dataset, an additional experiment was conducted on the second crack. The target rectification depth was fixed at 150 mm, following the parameter selection results obtained from the main experiment. The second validation dataset covered imaging depths of approximately 172–221 mm and viewing angles of approximately 12–48°. The proposed method again outperformed the baseline method without geometric rectification, reducing the fitted surface error by 19.3% while maintaining a validity rate above 99.4%.

Figure 12 gives an overview of the error distribution in the depth–angle domain for Crack 2, together with the fitted surfaces for the baseline method and the proposed method at d_{target} = 150 mm. The two viewing perspectives show the same tendency as for Crack 1: the scatter points and fitted surface of the proposed method indicate a lower relative error than those of the baseline method.

Figure 13 shows the relative-error variation with viewing angle under shared K-Means depth groups for the second crack. Overall, the proposed method maintains a lower mean relative error than the baseline method in most depth groups. In the first two depth clusters, where the imaging depth is relatively small, both methods exhibit relatively low errors, but the proposed method still produces comparable or lower mean error. As the viewing angle increases, especially in depth Clusters 3 and 4, the error level of the baseline method increases more evidently, whereas the proposed method remains lower and shows a more controlled trend. This result is consistent with the main experiment, indicating that geometric rectification is particularly beneficial when viewpoint obliqueness becomes stronger.

Figure 14 further presents the relative-error variation with imaging depth under shared K-Means angle groups for the second crack. For the near orthogonal and low angle groups, both methods yield relatively small errors, and the difference between the two methods is limited. This is expected because perspective distortion is weak under near-orthogonal imaging conditions. In the moderate-angle and large-angle groups, the proposed method generally achieves lower mean relative error than the baseline method. The improvement is most evident in angle Cluster 4, where the baseline mean error remains high, while the proposed method reduces the mean error across the evaluated depth range. These findings again suggest that the proposed rectification strategy is more effective when the geometric distortion caused by oblique observation becomes more significant.

Overall, the additional experiment provides supplementary evidence that the proposed rectification strategy can reduce relative errors on a crack imaged under non-orthogonal viewing conditions. However, the specific improvement patterns are not consistent with the main experiment, especially in low-angle or small-depth groups. Further research is needed to fully characterize the robustness in this regard under various practical imaging conditions.

5. Discussion

The improved performance of the proposed method can be mainly attributed to the introduction of geometry-aware rectification before crack extraction and width estimation. In conventional image-domain measurement, crack morphology is directly affected by viewpoint-dependent perspective distortion, especially under oblique imaging conditions. By incorporating RGB-D-based surface information and performing homography rectification on the crack plane, the proposed framework provides a more geometrically consistent representation of the crack region. This geometric normalization improves the reliability of subsequent binarization, edge localization, and width calculation, thereby reducing the sensitivity of the overall measurement process to viewpoint changes.

The experimental results further indicate that both imaging depth and viewing angle influence crack-width measurement accuracy, but their effects are not equally significant. Compared with depth variation, the influence of viewing angle is more pronounced because angle directly alters the degree of geometric compression and perspective distortion in the observed crack pattern. This explains why the baseline method shows a clearer increase in relative error as the angle becomes larger. In contrast, the proposed method maintains lower error levels under all tested grouping conditions, indicating that the rectification step effectively compensates for distortion induced by non-orthogonal viewpoints. These results confirm that explicit geometric correction is particularly important for robust crack measurement in multi-view inspection scenarios.

The findings are consistent with previous RGB-D and three-dimensional reconstruction studies showing that geometric information is important for physical crack measurement [14]. The results also support previous rectification studies showing that perspective correction is necessary for quantitative inspection under non-orthogonal views [9,21,24]. Building on the importance of geometric information and the potential of rectification for crack detection and quantification, the proposed framework uses depth information to transform the problem into a simpler, standardized form and quantifies the viewpoint dependence of different crack-width estimation algorithms, enabling better control of the crack-width measurement process.

From an application perspective, the proposed framework is meaningful for practical infrastructure inspection, where crack images are often captured under non-ideal viewpoints and varying acquisition distances. In such cases, purely image-based measurement methods may suffer from unstable accuracy due to uncontrolled pose changes. By integrating geometric correction, crack extraction, and metric width estimation into a unified pipeline, the proposed method improves not only average accuracy but also measurement stability, which is essential for reliable structural condition assessment.

Although the proposed method shows improved accuracy and robustness, several limitations should be acknowledged. First, the current validation is based on a limited number of crack specimens and surface conditions. This limits the generalizability of the observed performance improvement patterns across different viewpoint conditions. Second, the method relies on the assumption that the crack-bearing region can be represented by a dominant plane. For rough, curved, or multi-plane surfaces, plane fitting and homography rectification may become less reliable. Third, the crack-width calculation still depends on the quality of binarization, skeleton extraction, local normal estimation, and edge-point detection. Errors in any of these steps may lead to invalid samples or biased width estimation.

These limitations indicate that further study is needed. Future work should test the method on more crack specimens, different surface textures, wider depth–angle ranges, and more complex field environments. Adaptive selection of the target rectification depth should also be investigated to balance geometric correction and crack-shape preservation. Further extensions to curved or multi-plane surfaces would also improve the practical applicability of the proposed framework.

6. Conclusions

This study proposed an RGB-D-assisted crack-width measurement framework to improve measurement accuracy under varying viewpoint conditions. The method integrates plane fitting, geometric rectification, hybrid global–local binarization, dynamic masking, and crack-width calculation into a unified pipeline. By introducing surface-based homography rectification before crack extraction, the proposed framework reduces viewpoint-induced perspective distortion and provides a more consistent geometric basis for quantitative crack-width estimation.

The experimental results showed that the proposed method consistently achieved lower relative error than the baseline image-based measurement method across different depth and viewing-angle settings. With the target rectification depth set to 150 mm, the proposed method reduced the overall fitted-surface error by 19.3–52.3% compared with the baseline method. The improvement was more evident under larger viewing angles and longer imaging distances, where perspective distortion had a stronger influence on image-domain measurement.

These findings indicate that incorporating surface geometry into crack-width measurement is effective for improving robustness under non-orthogonal imaging conditions. The study also demonstrates that viewpoint-consistent evaluation is important for assessing measurement reliability, because the same physical crack regions can be compared across different imaging geometries. Therefore, the proposed framework provides a practical geometry-based solution for more reliable quantitative crack assessment in infrastructure inspection.

Several limitations remain. The current validation is based on a limited number of crack specimens and relatively controlled imaging conditions. The method also relies on the assumption that the crack region can be approximated by a dominant plane. Future work should further validate the framework using more crack types, different surface textures, wider depth–angle ranges, and more complex field environments. Adaptive rectification-depth selection and uncertainty quantification for crack-width measurement should also be investigated to improve the generalizability of the proposed method.

Author Contributions

Conceptualization, S.Z. and Y.N.; Methodology, S.Z. and Y.N.; Software, S.Z.; Validation, S.Z.; Formal analysis, S.Z., Y.L. and Y.N.; Investigation, S.Z.; Resources, S.Z.; Data curation, S.Z.; Writing—original draft, S.Z.; Writing—review & editing, S.Z., Y.L., S.W. and Y.N.; Visualization, S.Z. and S.W.; Supervision, S.Z., Y.L. and Y.N.; Project administration, Y.N.; Funding acquisition, Y.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Science Foundation of China, International (Regional) Cooperation and Exchange Program (Grant No. 52361165658).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

Mehta, P.K.; Monteiro, P.J.M. Concrete: Microstructure, Properties, and Materials; McGraw-Hill: Columbus, OH, USA, 2013.
Li, V.C. Micromechanics and Engineered Cementitious Composites (ECC) Design Basis. In Engineered Cementitious Composites (ECC): Bendable Concrete for Sustainable and Resilient Infrastructure; Li, V.C., Ed.; Springer: Berlin/Heidelberg, Germany, 2019; pp. 11–71.
Tuutti, K. Corrosion of Steel in Concrete. Doctoral Dissertation, Swedish Cement and Concrete Research Institute, Stockholm, Sweden, 1982.
Ahn, E.; Kim, H.; Sim, S.-H.; Shin, S.W.; Shin, M. Principles and Applications of Ultrasonic-Based Nondestructive Methods for Self-Healing in Cementitious Materials. Materials 2017, 10, 278.
Abo Alarab, L.A.; Poursaee, A.; Ross, B.E. An Experimental Method for Evaluating Reinforcement Corrosion in Cracked Concrete. J. Struct. Integr. Maint. 2019, 4, 43–50.
ASCE. 2025 Infrastructure Report Card; ASCE: Reston, VA, USA, 2025.
Spencer, B.F.; Hoskere, V.; Narazaki, Y. Advances in Computer Vision-Based Civil Infrastructure Inspection and Monitoring. Engineering 2019, 5, 199–222.
Oh, J.-K.; Jang, G.; Oh, S.; Lee, J.H.; Yi, B.-J.; Moon, Y.S.; Lee, J.S.; Choi, Y. Bridge Inspection Robot System with Machine Vision. Autom. Constr. 2009, 18, 929–941.
Liu, Y.-F.; Nie, X.; Fan, J.-S.; Liu, X.-G. Image-Based Crack Assessment of Bridge Piers Using Unmanned Aerial Vehicles and Three-Dimensional Scene Reconstruction. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 511–529.
Cha, Y.-J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 361–378.
Ni, F.; Zhang, J.; Chen, Z. Pixel-level Crack Delineation in Images with Convolutional Feature Fusion. Struct. Control Health Monit. 2018, 26, e2286.
Koch, C.; Georgieva, K.; Kasireddy, V.; Akinci, B.; Fieguth, P. A Review on Computer Vision Based Defect Detection and Condition Assessment of Concrete and Asphalt Civil Infrastructure. Adv. Eng. Inform. 2015, 29, 196–210.
Kim, H.; Ahn, E.; Shin, M.; Sim, S.-H. Crack and Noncrack Classification from Concrete Surface Images Using Machine Learning. Struct. Health Monit. 2018, 18, 725–738.
Kim, H.; Lee, S.; Ahn, E.; Shin, M.; Sim, S.-H. Crack Identification Method for Concrete Structures Considering Angle of View Using RGB-D Camera-Based Sensor Fusion. Struct. Health Monit. 2021, 20, 500–512.
Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. In Readings in Computer Vision; Fischler, M.A., Firschein, O., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1987; pp. 726–740.
Yang, L.; Li, B.; Li, W.; Jiang, B.; Xiao, J. Semantic Metric 3D Reconstruction for Concrete Inspection. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: New York, NY, USA, 2018; pp. 1543–1551.
Shokri, P.; Shahbazi, M.; Nielsen, J. Semantic Segmentation and 3D Reconstruction of Concrete Cracks. Remote Sens. 2022, 14, 5793.
Feng, C.-Q.; Li, B.-L.; Liu, Y.-F.; Zhang, F.; Yue, Y.; Fan, J.-S. Crack Assessment Using Multi-Sensor Fusion Simultaneous Localization and Mapping (SLAM) and Image Super-Resolution for Bridge Inspection. Autom. Constr. 2023, 155, 105047.
Wu, Y.; Li, S.; Li, Y. Depth-Aware RGB-D Concrete Crack Segmentation and Quantification Using Progressive Cross-Modal Attention. Measurement 2026, 258, 119453.
Wójcik, B.; Żarski, M.; Salamak, M.; Miszczak, J.A. Extracting Crack Characteristics from RGB-D Images. In 2nd Workshop on Engineering Optimization—WEO 2021; Institute of Fundamental Technological Research of the Polish Academy of Sciences: Warsaw, Poland, 2021.
Valença, J.; Puente, I.; Júlio, E.; González-Jorge, H.; Arias-Sánchez, P. Assessment of Cracks on Concrete Bridges Using Image Processing Supported by Laser Scanning Survey. Constr. Build. Mater. 2017, 146, 668–678.
Chen, G.; Liang, Q.; Zhong, W.; Gao, X.; Cui, F. Homography-Based Measurement of Bridge Vibration Using UAV and DIC Method. Measurement 2021, 170, 108683.
Min, J.; Bang, Y.; Bang, H.; Jeon, H. Port Structure Inspection Based on 6-DOF Displacement Estimation Combined with Homography Formulation and Genetic Algorithm. Appl. Sci. 2021, 11, 6470.
Yu, S.; Zhang, J.; Zhu, C.; Sun, Z.; Dong, S. Full-Field Deformation Measurement and Cracks Detection in Speckle Scene Using the Deep Learning-Aided Digital Image Correlation Method. Mech. Syst. Signal Process. 2024, 209, 111131.
An, Y.; Zhu, Q.; Huang, S.; Ou, J.; Lei, J. Automated Crack Parameter Identification in Oblique Imaging Using 3D Laser Point Cloud. Autom. Constr. 2025, 178, 106411.
Merkle, D.; Solass, J.; Schmitt, A.; Rosin, J.; Reiterer, A.; Stolz, A. Semi-Automatic 3D Crack Map Generation and Width Evaluation for Structural Monitoring of Reinforced Concrete Structures. ITcon 2023, 28, 774–805.
Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
Fan, C.; Ding, Y.; Liu, X.; Yang, K. A Review of Crack Research in Concrete Structures Based on Data-Driven and Intelligent Algorithms. Structures 2025, 75, 108800.
Sauvola, J.; Pietikäinen, M. Adaptive Document Image Binarization. Pattern Recognit. 2000, 33, 225–236.
Lee, J.; Kim, H.-S.; Kim, N.; Ryu, E.-M.; Kang, J.-W. Learning to Detect Cracks on Damaged Concrete Surfaces Using Two-Branched Convolutional Neural Network. Sensors 2019, 19, 4796.
Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer Vision-Based Concrete Crack Detection Using U-Net Fully Convolutional Networks. Autom. Constr. 2019, 104, 129–139.
Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A Deep Hierarchical Feature Learning Architecture for Crack Segmentation. Neurocomputing 2019, 338, 139–153.
Asadi Shamsabadi, E.; Xu, C.; Rao, A.S.; Nguyen, T.; Ngo, T.; Dias-da-Costa, D. Vision Transformer-Based Autonomous Crack Detection on Asphalt and Concrete Surfaces. Autom. Constr. 2022, 140, 104316.
Li, C.; Qin, H.; Tang, Y.; Zhao, H.; Pan, S.; Liu, J.; Luo, W. An Image-Based Concrete-Crack-Width Measurement Method Using Skeleton Pruning and the Edge-OrthoBoundary Algorithm. Buildings 2025, 15, 2489.
Ge, K.; Wang, C.; Guo, Y.; Tang, Y.; Hu, Z.; Chen, H. Fine-Tuning Vision Foundation Model for Crack Segmentation in Civil Infrastructures. Constr. Build. Mater. 2024, 431, 136573.
Liebowitz, D.; Zisserman, A. Combining Scene and Auto-Calibration Constraints. In Proceedings of the Seventh IEEE International Conference on Computer Vision; IEEE: Kerkyra, Greece, 1999; Volume 1, pp. 293–300.
Ma, Y.; Soatto, S.; Košecká, J.; Sastry, S.S. An Invitation to 3-D Vision; Interdisciplinary Applied Mathematics; Springer: New York, NY, USA, 2004; Volume 26.
Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2003.
Zhang, T.Y.; Suen, C.Y. A Fast Parallel Algorithm for Thinning Digital Patterns. Commun. ACM 1984, 27, 236–239.
Huber, P.J. Robust Estimation of a Location Parameter. Ann. Math. Stat. 1964, 35, 73–101.
MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics; University of California Press: Berkeley, CA, USA, 1967; Volume 5.1, pp. 281–298.

Figure 1. Framework of the proposed crack width estimation method.

Figure 2. Representative invalid-sample cases in crack-width measurement.

Figure 3. Viewpoint-consistent evaluation framework for same-location crack width comparison.

Figure 4. Experimental setup and ground-truth crack-width measurement.

Figure 5. Experimental validation of the proposed approach on multiple concrete surfaces.

Figure 6. Schematic diagram of $d_{target}$.

Figure 7. Validity of sample points in the rectified image at $d_{target}$ = 100 mm.

Figure 8. Validity of sample points in the rectified image at $d_{target}$ = 150 mm.

Figure 9. 3D relative error of Crack 1, shown from two views.

Figure 10. Relative error vs. depth for Crack 1, grouped by shared-angle K-means clusters.

Figure 11. Relative error vs. angle for Crack 1, grouped by shared-depth K-means clusters.

Figure 12. 3D relative error of Crack 2, shown from two views.

Figure 13. Relative error vs. depth for Crack 2, grouped by shared-angle K-means clusters.

Figure 14. Relative error vs. angle for Crack 2, grouped by shared-depth K-means clusters.

Table 1. Comparison of candidate target rectification distances for parameter selection.

$d_{target}$ (mm)	$N_{v}$	$N_{t}$	$R_{v}$ (%)	$\bar{E}_{rel}$	$R_{int}$ (%)
100	1355	1376	98.5	0.109	54.3
150	1370	1376	99.6	0.137	52.3
200	1376	1376	100	0.168	34.5
250	1375	1376	99.9	0.189	23.7
300	1376	1376	100	0.203	16.2

where $N_{v}$ and $N_{t}$ denote the numbers of valid and total evaluation points, respectively; $R_{v}$ denotes the validity rate; $\bar{E}_{rel}$ denotes the mean relative error; and $R_{int}$ denotes the integrated fitted-surface error reduction. The results are summarized in Table 1.
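As a quick consistency check on Table 1, the validity rates can be recomputed from the reported counts. The sketch below assumes that the validity rate is the simple ratio of valid to total evaluation points, expressed in percent; that formula is not stated in this section, and the function name `validity_rate` is introduced here purely for illustration.

```python
# Counts taken from Table 1: d_target (mm) -> (N_v, N_t).
TABLE_1_COUNTS = {
    100: (1355, 1376),
    150: (1370, 1376),
    200: (1376, 1376),
    250: (1375, 1376),
    300: (1376, 1376),
}


def validity_rate(n_valid: int, n_total: int) -> float:
    """Validity rate in percent, ASSUMING R_v = 100 * N_v / N_t."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_valid <= n_total:
        raise ValueError("n_valid must lie between 0 and n_total")
    return 100.0 * n_valid / n_total


# Round to one decimal place, matching the precision reported in Table 1.
rates = {
    d_target: round(validity_rate(n_v, n_t), 1)
    for d_target, (n_v, n_t) in TABLE_1_COUNTS.items()
}

for d_target, rate in rates.items():
    print(f"d_target = {d_target} mm: R_v = {rate}%")
```

Under this assumption, the recomputed values agree with the $R_{v}$ column of Table 1 at one-decimal precision. The other two metrics cannot be recomputed from the table alone, because they depend on per-point measurements that are not listed here.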

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
