Article

Vision-Based Detection, Localization, and Optimized Path Planning for Rebar Intersections in Automated Construction

1 School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
2 Zhengzhou Research Institute, Beijing Institute of Technology, Zhengzhou 450000, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(13), 7186; https://doi.org/10.3390/app15137186
Submission received: 4 June 2025 / Revised: 22 June 2025 / Accepted: 25 June 2025 / Published: 26 June 2025

Abstract

The accurate detection and precise spatial localization of rebar intersection points are essential for advancing automation in construction tasks, such as robotic rebar tying. This paper presents a vision-based methodology that integrates RGB-D sensing, camera calibration, and coordinate transformation techniques to robustly detect and localize rebar crossing points. A structured detection framework efficiently extracts intersection coordinates from RGB-D imagery, subsequently mapping these points to a global reference frame using extrinsic camera calibration parameters. To achieve comprehensive site coverage and optimize operational efficiency, the path planning challenge is reformulated as a sequencing optimization problem of the identified intersections. We propose a greedy optimization algorithm that generates smooth, snake-like traversal paths in an efficient manner. Experimental validation confirms the effectiveness of our approach, demonstrating detection accuracy exceeding 99%, an average processing time below 125 ms per intersection point, and a maximum coordinate transformation error under 2 mm. The presented solution offers a lightweight, precise, and scalable framework, significantly facilitating the integration of vision-based methods into automated construction workflows.

1. Introduction

With the rapid development of the construction industry and increasing labor costs, integrating robotics into construction processes has attracted growing interest [1]. Robotics can greatly improve construction efficiency, enhance accuracy, and reduce workplace accidents. Traditionally, the construction industry has relied heavily on labor-intensive, repetitive work in hazardous environments, making robotics a promising solution to these issues [2,3]. Advances in construction robotics not only promote automation but also expand their applications, from material transport to the precise assembly of structural elements [4]. The use of robotics helps to streamline workflows, boost productivity, improve safety, and significantly lower workers’ exposure to dangerous conditions on construction sites [5].
Traditional manual rebar tying is inefficient, labor-intensive, and highly dependent on the worker’s skill and physical condition. As a critical task in construction, rebar tying demands precision and sustained high-intensity labor, often resulting in low productivity, inconsistent quality, and safety risks [6,7]. With recent advances in deep learning and artificial intelligence, neural network-based rebar tying robots have been developed. These robots leverage machine vision and image processing to detect rebar intersections and employ robotic arms for precise automatic tying, effectively replacing manual labor in repetitive and strenuous tasks [8,9]. They significantly reduce physical workload, improve consistency, and enhance structural reliability. Currently, rebar tying robots are increasingly adopted in large-scale infrastructure projects such as bridges, high-speed railways, and underground facilities [10]. They help to improve construction efficiency, ensure timely project delivery, and shorten construction timelines. As technology continues to advance and costs decrease, the market potential of these robots is expanding. In the future, rebar tying robots are expected to be integrated with broader construction automation systems, enabling more intelligent workflows and further improving overall construction efficiency and quality [11].
In rebar tying robotics, vision recognition systems based on deep learning often have limited generalization capability. When used on different construction sites or with various rebar types, these models may fail to maintain high accuracy and can easily overfit [12,13]. Although Time-of-Flight (TOF) camera-based systems offer better generalization, they are highly sensitive to ambient light, which can reduce recognition accuracy [14,15]. Moreover, coordinate transformations between the camera and the robotic arm during operation are complex and can cause significant localization errors, hindering precise tying [16,17]. To address these issues, this paper presents a vision-based detection and path planning system for automated rebar tying. The system captures RGB-D images using a TOF camera and applies depth range filtering and Gaussian smoothing to suppress environmental noise. It then uses the Hough Transform to detect rebar intersections and extract their 2D pixel coordinates. Through extrinsic calibration, these coordinates are converted to 3D positions in the robot arm’s coordinate frame, providing accurate spatial data for efficient and precise tying. This study aims to develop and validate an automated framework that reliably detects rebar intersections and generates optimal tying paths under typical site conditions.

2. Related Work

As the construction industry demands higher efficiency and safety, manual rebar tying faces challenges such as low productivity and significant safety risks. To address these issues, rebar tying robots have been developed to automate the process, improving both precision and efficiency while reducing physical labor. These robots not only lower labor costs but also enhance the quality of the tying process. High-quality rebar tying is critical for structural durability and construction safety [18]. With advances in robotics, extensive research has been conducted on vision recognition, path planning, and motion control, providing a solid foundation for the practical application of rebar tying robots.
Several autonomous rebar tying robots have been successfully deployed on construction sites, each with specific advantages and limitations. TyBOT, developed by Advanced Construction Robotics, targets large-scale infrastructure projects such as bridges and roads. It can complete more than 1200 ties per hour in all weather conditions, greatly improving efficiency and reducing labor demands. However, its use is mainly limited to standardized scenarios and it struggles with irregular rebar arrangements or smaller projects. In addition, its high cost and sensitivity to factors like lighting variations and corroded rebar hinder wider adoption [19]. SkyTy, developed by SkyMul, leverages drone technology for automated rebar tying. It consists of a ground unit, a mapping drone, and multiple working drones, aiming to cut labor costs, reduce errors, and speed up construction. Yet, as a new technology, SkyTy still faces challenges in recognition accuracy and operational stability in complex environments, while high equipment and training costs further restrict its use [20]. Similarly, T-iROBO Rebar, developed by Taisei Corporation, can accurately detect rebar intersections and perform tying tasks, improving both efficiency and safety. However, it requires a controlled environment and has limited adaptability to complex or irregular layouts, and its high cost also limits widespread implementation [21].
In vision recognition, researchers have improved the accuracy of rebar detection and localization by integrating computer vision and deep learning methods. Wang et al. proposed a fast synthetic dataset generation approach using Building Information Modeling (BIM) and rendering software for rebar instance segmentation. By automating data annotation, this method greatly reduced the time and cost of creating training datasets. They trained a Mask R-CNN model with mixed real and synthetic data in various proportions, demonstrating that synthetic data can enhance prediction performance and provide a certain level of generalization. However, this approach focuses mainly on 2D segmentation and does not address detailed 3D attributes such as position, spacing, and diameter, limiting its use in rebar tying robots [22]. Feng et al. developed an innovative planar rebar tying robot and introduced a two-stage recognition method that combines depth cameras and industrial cameras. By using the YOLOv5 deep learning model and the DBSCAN clustering algorithm, they achieved accurate detection and condition assessment of rebar intersections. Autonomous navigation and path planning were implemented with the ROS system. Experiments on both lab platforms and actual construction sites verified the system’s effectiveness. However, this method still faces challenges related to adaptation in complex environments, real-time performance, long-term stability, and cost efficiency [23].
In rebar intersection detection, deep learning methods provide high accuracy but have drawbacks such as large data requirements, high computational costs, and limited robustness to environmental variations. As an alternative, Hough Transform-based detection offers simplicity, computational efficiency, and lower data demands, making it suitable for fast rebar intersection detection. The traditional Hough Transform mainly detects straight lines, but rebar intersections require the precise localization of both crossing lines and corner points. To address this, researchers have developed an improved Hough Transform that can detect both corner points and line segments simultaneously. This variant introduces a new voting mechanism in the parameter space, enabling a more accurate identification of line segment endpoints and their intersections. By combining corner and line detection, the accuracy and efficiency of intersection recognition are significantly enhanced, providing reliable support for rebar tying automation [24]. However, the Hough Transform may struggle with noisy images. Noise can produce false edge points, leading to excessive meaningless votes in the Hough space and increasing the false detection rate. This also raises computational complexity and processing time [25]. Therefore, effective image preprocessing is essential before applying the Hough Transform to improve detection accuracy [26,27].
In path planning and motion control, the Traveling Salesman Problem (TSP) is commonly used to optimize the robot’s path among multiple rebar intersections. By planning the shortest route, the robot can visit each intersection sequentially, minimizing travel distance and time, thus improving overall tying efficiency. An improved A*-based path planning method has also been proposed, combining priority constraints and dead zone escape algorithms to achieve the complete coverage of the work area and meet the requirements of rebar tying tasks. However, this approach still faces issues such as high computational complexity and poor adaptability to dynamic environments [28]. Another method utilizes Building Information Modeling (BIM) data for task planning in legged rebar tying robots. This approach generates initial intersection points from BIM, filters them based on the robot’s operational constraints, and optimizes the task sequence using a constrained genetic algorithm. Experiments show that this method achieves an average tying efficiency of about 1500 points per hour, nearly twice that of manual work. Nevertheless, challenges remain in adapting to complex site conditions, managing hardware inconsistencies across robots, and scaling to large projects [29]. Shen et al. proposed a novel motion and tying control strategy for rebar tying robots. The robot integrates multiple sensors, such as photoelectric sensors and limit switches, to ensure precise movement and operation within a confined rebar grid. With machine vision, the robot locates rebar nodes, adjusts the tying gun position, and ties multiple nodes sequentially. Experiments confirm that the system achieves a positioning accuracy within 5 mm and can tie four nodes at a single location. This design reduces unnecessary movement within the grid, enabling the efficient execution of repetitive tying tasks while maintaining quality standards [30].
To address the aforementioned issues, this paper proposes a robust image preprocessing method using a bilateral filter to reduce the impact of environmental noise on RGB-D images. The Hough Transform and line segment merging processes are further optimized to enhance the accuracy of intersection point detection. Additionally, a greedy serpentine path planning algorithm is introduced to improve operational efficiency. To increase the precision of intersection localization, a coordinate transformation approach based on coordinate system alignment is also presented. These improvements enable the robot to perform rebar tying tasks more efficiently and accurately.

3. Overall System

This paper presents a rebar intersection recognition and planning system based on autonomous environmental sensing to support automated rebar tying operations. The system comprises three main modules: visual recognition, path planning, and coordinate transformation. The visual recognition module uses computer vision techniques to capture and process real-time image data from the construction site, accurately identifying the locations of rebar intersections. Based on the detected points, the path planning module applies optimization algorithms to generate an efficient traversal sequence, ensuring safe and smooth operation. The coordinate transformation module converts the image coordinates from the recognition system into spatial coordinates required for robotic manipulation, enabling precise localization and actuation. Through the integrated operation of these modules, the system autonomously perceives and adapts to complex construction environments, providing reliable position and path information for robotic tying tasks. This improves operational accuracy and efficiency while reducing labor intensity in reinforced concrete construction. Figure 1 shows the overall architecture of the system.

3.1. Visual Recognition System

The tasks of the visual recognition system are to identify and localize rebar intersection points in a complex, interference-prone environment using RGB-D images captured by a camera and to output the accurate pixel coordinates of these intersection points. To achieve this objective, several key aspects need to be considered:
  • Extracting rebar layers from complex grid backgrounds remains a significant challenge;
  • Ambient light interference must be minimized to ensure reliable TOF camera imaging and system recognition accuracy;
  • The sensor precision must be sufficiently high to guarantee the accurate localization of intersection points.
Therefore, we divide the visual recognition system into several components, including an image preprocessing system, a robust depth data augmentation system, and an intersection point recognition system, each addressing the respective issues. Figure 2 illustrates the overall workflow of the visual recognition system.

3.1.1. Image Preprocessing System

The image preprocessing system mainly uses a Depth Range Filter to address challenges in complex construction environments. In practice, rebar grids are usually multi-layered, but only the top layer requires tying. Relying solely on RGB images makes it difficult to isolate the top layer from lower layers, often leading to false detections in intersection recognition. Furthermore, construction sites contain various irrelevant objects, such as other materials and tools, which add background noise and hinder accurate rebar localization. By applying a Depth Range Filter, the system effectively filters out layers and background clutter, improving the precision of rebar intersection detection.
Since our camera is installed perpendicular to the rebar grid, the distance between the camera and the rebar at different positions on the top layer is approximately the same. Based on this configuration, we introduce a Depth Range Filter based on RGB-D images. The function of the filter is illustrated in Figure 3.
The function of the filter is to retain only those pixels of the depth image whose values fall within a specific asymmetric depth range. Specifically, a fixed depth value $D_0$ is first set, and then deviations $\Delta_1$ and $\Delta_2$ are defined for the lower and upper bounds, respectively. These parameters determine the filtering range, which is used to construct a masking function. The mathematical expression for this function is given by Equation (1).
$$M(u,v) = \begin{cases} 1, & \text{if } D_0 - \Delta_1 \le D(u,v) \le D_0 + \Delta_2 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
The masking function operates as follows: when the depth value $D(u,v)$ lies within the range $[D_0 - \Delta_1,\, D_0 + \Delta_2]$, $M(u,v) = 1$, indicating that the pixel is retained; otherwise, $M(u,v) = 0$, indicating that the pixel is filtered out.
The filtered depth image D′(u,v) is obtained by applying the masking function to the original depth image, as shown in Equation (2).
$$D'(u,v) = D(u,v) \times M(u,v) \tag{2}$$
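For illustration, a minimal NumPy sketch of the depth range filter defined by Equations (1) and (2) is given below; the function name and the example parameter values (a depth band in millimetres) are illustrative rather than the exact implementation used in the system.

```python
import numpy as np

def depth_range_filter(depth, d0, delta1, delta2):
    """Apply the asymmetric depth band of Eq. (1) and return D'(u,v) = D(u,v) * M(u,v)."""
    # M(u, v) = 1 inside [d0 - delta1, d0 + delta2], 0 elsewhere.
    mask = ((depth >= d0 - delta1) & (depth <= d0 + delta2)).astype(depth.dtype)
    return depth * mask, mask

# Example with illustrative values (depth in millimetres):
# filtered, mask = depth_range_filter(depth_img, d0=700.0, delta1=20.0, delta2=30.0)
```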

3.1.2. Robust Depth Data Augmentation System

The objective of this system is to reduce or eliminate the instability and errors caused by factors such as changes in ambient light and noise interference in depth images, thereby enhancing the clarity and accuracy of the depth data. This, in turn, supports high-precision visual recognition tasks in complex environments.
Due to the sensitivity of TOF cameras to ambient lighting, depth images often contain holes or missing values. To address this, an interpolation algorithm based on bilateral filtering is employed to fill these gaps. This method considers both spatial distance and depth value differences when computing weights, enabling a weighted-average interpolation that better preserves image structure. Unlike traditional interpolation methods that rely solely on spatial proximity, bilateral filtering incorporates depth disparities, resulting in more accurate weight calculations and improved edge preservation. This approach effectively maintains edges and fine details, preventing blurring across object boundaries in the depth image. The working principle of the bilateral filter used in this study is illustrated in Figure 4.
The specific calculation method is shown in Equation (3), where the weight function is composed of the product of spatial Gaussian weights and depth difference Gaussian weights.
$$w(i,j) = \exp\left( -\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\sigma_s^2} \right) \times \exp\left( -\frac{(D_i - D_j)^2}{2\sigma_r^2} \right) \tag{3}$$
where $w(i,j)$ represents the combined weight between pixel i and pixel j; $\mathbf{x}_i = (x_i, y_i)$ and $\mathbf{x}_j = (x_j, y_j)$ are the coordinates of pixels i and j, respectively; $D_i$ and $D_j$ are the depth values of pixels i and j, respectively; $\sigma_s$ is the spatial standard deviation that controls the spatial weight decay; and $\sigma_r$ is the intensity standard deviation that controls the decay of the depth difference weight.
In the interpolation process, for each pixel i with a missing depth value, the interpolated depth value $\hat{D}_i$ is calculated as a weighted average over the set of neighboring non-missing pixels $\Omega_i$. The calculation formula is shown in Equation (4).
$$\hat{D}_i = \frac{\sum_{j \in \Omega_i} w(i,j) \cdot D_j}{\sum_{j \in \Omega_i} w(i,j)} \tag{4}$$
In this formula, the numerator is the weighted sum of the depth values of the non-missing pixels in the neighborhood, while the denominator is the total sum of the corresponding weights. In this way, neighboring pixels that are closer to the missing pixel and have similar depth values contribute more to the interpolation result, effectively preserving the edges and fine details of the image.
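The following sketch illustrates this hole-filling scheme. Because the depth $D_i$ of a missing pixel is itself unknown, the range weight of Equation (3) is evaluated here against the mean of the valid neighbours as a stand-in reference; this substitution, the assumption that holes are stored as zeros, and the parameter values are simplifications of the illustration rather than the system's exact implementation.

```python
import numpy as np

def fill_depth_bilateral(depth, radius=3, sigma_s=2.0, sigma_r=15.0):
    """Fill missing depth values (assumed to be stored as 0) with the bilateral
    weighted average of valid neighbours, following Eqs. (3)-(4). Plain-Python
    sketch; a vectorised or C++ version would be needed for real-time use."""
    h, w = depth.shape
    out = depth.astype(np.float64)
    ys, xs = np.where(depth == 0)                     # hole pixels
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        patch = depth[y0:y1, x0:x1].astype(np.float64)
        valid = patch > 0
        if not valid.any():
            continue                                  # no valid support, keep the hole
        yy, xx = np.mgrid[y0:y1, x0:x1]
        # Spatial Gaussian weight: distance of each neighbour to the hole pixel.
        w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2.0 * sigma_s ** 2))
        # Range Gaussian weight: depth difference to the local mean of the valid
        # neighbours, used here as a stand-in for the unknown depth of the hole.
        d_ref = patch[valid].mean()
        w_r = np.exp(-((patch - d_ref) ** 2) / (2.0 * sigma_r ** 2))
        weights = w_s * w_r * valid
        out[y, x] = (weights * patch).sum() / weights.sum()
    return out
```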
We can validate the effectiveness of the image preprocessing system by aligning the RGB image with the depth image and then inputting them into the depth range filter, followed by interpolation using the bilateral filter. Additionally, the preprocessed RGB-D image needs to be binarized. This step is intended to facilitate subsequent image processing tasks.

3.1.3. Intersection Point Recognition System

The raw RGB-D image obtained from the camera, after processing by the image preprocessing system and the robust depth data augmentation system, results in a binarized image. This processed image is then used in conjunction with the intersection point recognition system for rebar intersection detection. The workflow of the intersection point detection system is shown in Figure 5.
After binarization, the image needs to undergo thinning. The purpose of the image thinning algorithm is to progressively reduce the binary image (such as the lines in the rebar image) until it generates a skeleton that is one pixel wide. Specifically, the algorithm iteratively removes pixels that meet certain conditions, simplifying the image into a skeleton for rebar intersection point detection using the Hough Transform. The main objectives of thinning are as follows:
  • Reduce redundant information: By thinning the image, the lines in the original image are simplified to a one-pixel-wide skeleton, eliminating the thickness and redundant information of the lines. This ensures that no extra detection results are generated due to line width during the Hough Transform, thereby improving computational efficiency.
  • Improve line detection accuracy: In the unthinned image, wide lines can lead to multiple similar Hough parameters being generated for different pixels, increasing unnecessary interference. The thinned skeleton retains only the centerline, making the representation of each line in Hough space more consistent, which enhances the accuracy of line detection.
  • Enhance the reliability of intersection point detection: After skeletonization, the intersections become clearer, especially in complex structures where overlapping lines do not cause misjudgments due to thickness. This makes the intersection points detected by the Hough Transform more reliable and consistent with the actual structure.
  • Reduce computational overhead: The Hough Transform typically calculates every pixel in the image, but the thinned image contains fewer pixels, significantly reducing the computational load. Therefore, skeletonization helps to lower the computational cost of the Hough Transform.
Skeletonized images typically preserve the main structural lines of an object; however, the skeletonization process may also produce irrelevant short branches or protrusions. These short branches can be mistakenly detected as independent line segments during the Hough Transform, increasing false positives. To address this, morphological pruning is applied to remove isolated points or short protrusions that are unconnected or connected by only a single pixel. Pruning the skeleton before the Hough Transform improves detection accuracy, reduces false lines, lowers computational complexity, and enhances the clarity and stability of the results. This ensures that the Hough Transform focuses on the main skeleton structure, leading to more reliable line detection. In this method, pruning removes short branches while preserving the endpoints of the main structure to avoid incorrectly truncating line segments near intersections. As a result, the primary structure becomes more distinct, intersection points remain intact and noise is reduced, enabling more accurate and continuous line detection. The pruning process applied in this study is illustrated in Figure 6.
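A compact sketch of the thinning and pruning stage is given below, using scikit-image's skeletonization and a simple endpoint-erosion prune; unlike the pruning described above, this simplified version does not explicitly protect the endpoints of the main structure, so it should be read as an illustration rather than the exact procedure used here.

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import skeletonize

def skeleton_and_prune(binary, prune_iters=5):
    """Thin a binary rebar mask to a 1-pixel-wide skeleton, then prune short spurs.
    Each pruning iteration removes skeleton pixels with exactly one 8-connected
    neighbour (branch endpoints), so spurs shorter than `prune_iters` pixels vanish."""
    skel = skeletonize(binary > 0)
    kernel = np.ones((3, 3), dtype=int)
    kernel[1, 1] = 0                                  # count neighbours, not the pixel itself
    for _ in range(prune_iters):
        neighbours = convolve(skel.astype(int), kernel, mode="constant")
        endpoints = skel & (neighbours == 1)
        if not endpoints.any():
            break
        skel = skel & ~endpoints
    return skel
```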
Subsequently, probabilistic Hough Transform is applied to extract line segments from the pruned image. In the standard Hough Transform, a line can be represented in polar coordinates, as shown in Equation (5).
$$\rho = x \cos\theta + y \sin\theta \tag{5}$$
where ( x , y ) represents a pixel in the image, ρ is the perpendicular distance from the origin of the image coordinates to the line, and θ is the angle between the perpendicular line and the x-axis. The Hough Transform detects a line by converting each edge point into a curve in the parameter space. When voting points accumulate in the parameter space to form a peak, this indicates that a line has been detected.
In the standard Hough Transform, each edge point votes across the entire parameter space, resulting in high computational cost. To improve efficiency, this study adopts the Probabilistic Hough Transform (PHT). Unlike the standard method, PHT randomly samples a subset of edge points for voting, significantly reducing redundant computations. Moreover, PHT not only estimates line parameters but also directly detects the start and end points of line segments, enabling the results to be visualized as actual segments in the image. This approach enhances both computational efficiency and detection clarity.
The specific steps of the Probabilistic Hough Transform (PHT) are as follows. First, the pruned skeleton image is input, and the skeleton points are extracted. PHT detection is then performed only on these points. A random subset of edge points is selected for voting. For each selected point $(x_0, y_0)$, its corresponding curve in the parameter space $(\rho, \theta)$ is computed, and the associated cells in the accumulator are incremented. After voting, cells with the highest votes indicate that multiple points are aligned on the same line, and thus a line is detected. Unlike the standard Hough Transform, PHT directly outputs the start and end points of the detected line segments by connecting the actual skeleton points, avoiding the unnecessary extension of lines to image boundaries and ensuring that the extracted segments accurately represent the true object contours.
In the probabilistic Hough Transform, the accumulator is used to record the vote count for each line in the parameter space. Typically, the accumulator is a two-dimensional matrix A ( ρ , θ ) , where each cell corresponds to a specific combination of ρ and θ . In the standard Hough Transform, this matrix reserves a cell for every possible combination of ρ and θ , even though many cells may not receive any votes. This results in a large number of empty cells, especially when there are few line segments in the image.
To address this issue, we utilize a hash table to efficiently store accumulator data, recording only the points that have received votes, which significantly reduces memory consumption. For the Hough Transform accumulator, the hash table maps the parameters ( ρ , θ ) to their corresponding vote counts. The key is the ( ρ , θ ) pair, and the value is the vote count. For example, if a particular ( ρ , θ ) cell has accumulated three votes, the hash table will store this as { ( ρ , θ ) : 3 } . If no votes are cast for a particular cell, no corresponding record is made.
A hash table provides constant time complexity for operations such as lookup, insertion, and deletion, ensuring that execution time remains nearly constant, regardless of data size. This makes it highly suitable for frequently updating vote counts in the accumulator. Moreover, the hash table stores only parameter pairs with actual votes, avoiding unnecessary memory allocation for empty accumulator cells. This is especially beneficial in images with few lines, where most cells would otherwise remain empty. Additionally, the hash table enables the faster retrieval of high-vote cells, reducing traversal time and accelerating the detection process, particularly for large images or high-resolution accumulators. The flowchart of the Hough Transform detection process is shown in Figure 7.
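The sparse, hash-table-based accumulator can be sketched as follows; the sampling ratio, the parameter resolutions, and the omission of segment-endpoint tracking are simplifications of this illustration rather than the full detection pipeline.

```python
import math
import random
from collections import defaultdict

def sparse_hough_votes(points, rho_res=1.0, theta_res=math.pi / 180,
                       sample_ratio=0.5, seed=0):
    """Randomly sample skeleton points and vote in (rho, theta) space using a
    dict as a sparse accumulator: only cells that actually receive votes are
    stored, so empty cells consume no memory. `points` is a list of (x, y)."""
    acc = defaultdict(int)                    # {(rho_idx, theta_idx): votes}
    if not points:
        return acc
    rng = random.Random(seed)
    sampled = rng.sample(points, max(1, int(len(points) * sample_ratio)))
    n_theta = int(round(math.pi / theta_res))
    for x, y in sampled:
        for t in range(n_theta):
            theta = t * theta_res
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(int(round(rho / rho_res)), t)] += 1
    return acc

# Cells with the most votes correspond to the strongest candidate lines:
# top = sorted(acc.items(), key=lambda kv: kv[1], reverse=True)[:10]
```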
After the Hough Transform detection, multiple line segments are obtained. To eliminate redundant line segments and merge adjacent parallel lines, we identify parallel lines and calculate the distance between them. Based on these criteria, line segments are deduplicated and merged. For those segments that are parallel and close to each other, we extend or merge them by taking the midpoint of their endpoints, forming longer, continuous lines. The process of endpoint merging is determined by checking whether the distance between the endpoints is small enough to justify merging, thus generating a simpler and more accurate line representation. During the merging process, a vector-based approach is employed, treating line segments as vectors and directly calculating the weighted average of the endpoints for those lines that are similar in direction and position. This method reduces redundant midpoint calculations and efficiently generates the merged vector.
Before merging the line segments, the following conditions must be satisfied:
  • Parallelism: The direction vectors of the two line segments should be approximately parallel. This is typically determined by calculating the angular difference between the segments or by evaluating the dot product of their direction vectors.
  • Proximity: The distance between the endpoints or midpoints of the two line segments should be within a certain threshold, ensuring that they are sufficiently close. A distance threshold can be set to determine whether the proximity condition is met.
The specific calculation method is as follows. Assume that there are two line segments, L 1 and L 2 . First, the direction vectors of each line segment are calculated, followed by computing the cosine of the angle between the vectors to determine if the two line segments are parallel. If cos θ is close to 1 or −1 (within a small threshold), the two line segments are considered parallel. The proximity condition is determined by calculating the Euclidean distance between all pairs of endpoints from L 1 and L 2 and selecting the minimum distance as the proximity measure. If the minimum distance d min is smaller than the set proximity threshold, the two line segments are considered sufficiently close and eligible for merging.
In this paper, a bidirectional projection method is used to merge two closely positioned line segments, as shown in Figure 8. The bidirectional projection method projects the endpoints of one line segment onto the direction of the other line segment, allowing for a more accurate determination of each segment’s coverage along the other’s direction. The bidirectional projection ensures that, even if the two line segments are not perfectly aligned, a reasonable merging area can still be identified, rather than relying solely on a single directional judgment. The projection $P'$ of any point P onto the line segment AB is given by Equation (6).
$$\overrightarrow{OP'} = \overrightarrow{OA} + \frac{\overrightarrow{AP} \cdot \mathbf{v}_1}{\lVert \mathbf{v}_1 \rVert^2} \, \mathbf{v}_1 \tag{6}$$
By calculating the dot product of the vector from point A to point P with the direction vector $\mathbf{v}_1$, the projection length onto the direction vector is obtained. This projection length is then multiplied by the direction vector to determine the position of the projected point $P'$. After the projection, the two projection points closest to the image boundary are selected to form a new line segment.
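A possible implementation of the parallelism and proximity tests together with the projection-based merge is sketched below; for simplicity it projects all four endpoints onto the direction of the first segment only, and the thresholds are illustrative values rather than the ones tuned in this work.

```python
import numpy as np

def try_merge_segments(seg1, seg2, angle_cos_thresh=0.98, dist_thresh=10.0):
    """Merge two nearly parallel, nearby segments into one.
    seg = ((x1, y1), (x2, y2)). Returns the merged segment or None."""
    a, b = np.asarray(seg1, float)
    c, d = np.asarray(seg2, float)
    v1, v2 = b - a, d - c
    # Parallelism: |cos(angle)| between the direction vectors close to 1.
    cosang = abs(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9))
    if cosang < angle_cos_thresh:
        return None
    # Proximity: minimum distance between any pair of endpoints.
    d_min = min(np.linalg.norm(p - q) for p in (a, b) for q in (c, d))
    if d_min > dist_thresh:
        return None
    # Project all endpoints onto the unit direction of seg1 (cf. Eq. (6)) and
    # keep the two extreme projections as the merged segment's endpoints.
    u = v1 / (np.linalg.norm(v1) + 1e-9)
    ts = [np.dot(p - a, u) for p in (a, b, c, d)]
    return tuple(a + min(ts) * u), tuple(a + max(ts) * u)
```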
The line segments in the image are detected using the Hough Transform. Each pair of line segments is then compared to ensure that they meet the angular criterion, avoiding the misidentification of parallel segments as intersections. For pairs that satisfy the angular condition, the line equations are computed and the intersection point is found by solving the system of equations. The intersection point is then checked to ensure that it lies within the image boundaries and the actual range of the segments, excluding any points outside the segment extensions. The confidence of the intersection is calculated based on factors such as the distance to the segment midpoints and segment lengths. Higher confidence is given to intersections closer to the midpoints and with longer segments, allowing for the retention of reliable intersections while filtering out ambiguous ones. The formula for calculating the confidence is shown in Equation (7).
$$\beta = \frac{|\sin\theta|}{1 + \left( d_{AB}^2 + d_{CD}^2 \right)} \times 100\% \tag{7}$$
In this equation, θ represents the angle between the two lines, and $d_{AB}^2$ and $d_{CD}^2$ are the squared distances from the detected intersection point to the midpoints of the two line segments. The formula uses a nonlinear function to increase the weight of the distance, ensuring that the confidence of the intersection point decreases rapidly as it moves further from the segment midpoints. Additionally, the angle between the two line segments is considered. If the segments are nearly parallel, the confidence of the intersection should be lower. Therefore, an angular factor is included to optimize the confidence calculation.
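A sketch of the intersection computation and the confidence score of Equation (7) is given below; the distances are taken directly in pixel units, as in the formula, and no additional scaling or boundary check is applied in this illustration.

```python
import numpy as np

def intersection_confidence(seg_ab, seg_cd):
    """Intersect two segments (treated as infinite lines) and score the result
    with Eq. (7): beta = |sin(theta)| / (1 + d_AB^2 + d_CD^2) * 100%, where the
    distances run from the intersection to each segment's midpoint."""
    a, b = np.asarray(seg_ab, float)
    c, d = np.asarray(seg_cd, float)
    v1, v2 = b - a, d - c

    def cross2(u, w):                                # scalar 2D cross product
        return u[0] * w[1] - u[1] * w[0]

    denom = cross2(v1, v2)
    if abs(denom) < 1e-9:
        return None, 0.0                             # (nearly) parallel segments
    t = cross2(c - a, v2) / denom
    p = a + t * v1                                   # intersection point
    sin_theta = abs(denom) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    d_ab = np.linalg.norm(p - (a + b) / 2.0)
    d_cd = np.linalg.norm(p - (c + d) / 2.0)
    beta = sin_theta / (1.0 + d_ab ** 2 + d_cd ** 2) * 100.0
    return p, beta
```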
After the intersection points are detected, the path planning system is used to assign labels to the detected points, providing the robot arm with the binding sequence. The coordinate transformation system is then used to convert the intersection points into 3D coordinates in the robot arm’s coordinate system.

3.2. Path Planning System

The objective of the path planning problem in this paper is to visit each intersection point in a grid environment without obstacles, ensuring both completeness and efficiency. Because the detected 2D pixel coordinates of the intersection points must be transformed into 3D coordinates in the robot arm’s coordinate system, the original path planning problem is effectively reduced to a sorting problem of the detected pixel points, with tying operations executed sequentially. Another challenge is to determine the optimal traversal route between the start and end points as quickly as possible while maintaining the robot arm’s motion stability [31]. To address this, a greedy algorithm is employed to optimize a serpentine path, which achieves efficient area coverage for repetitive tasks, offers a simple implementation, keeps computational complexity low, and ensures high stability during robot motion [32].
The traditional serpentine path starts at the origin and progresses in a specific direction (e.g., from left to right). When the end of a grid row is reached, the direction is reversed (e.g., from right to left), and the robot continues along the next row. This process is repeated until all grid cells are covered. Each time the direction changes, the robot must ensure that the path remains continuous and does not revisit previously scanned areas. Since the initial direction of the path is fixed, this can lead to a situation where, as shown in Figure 9 and Figure 10, the total path length increases due to the differing horizontal and vertical spacing between the grid cells.
The basic idea of the greedy algorithm in solving path planning problems is as follows: starting from a point, repeatedly select the nearest unvisited point until all points have been visited. The greedy algorithm assumes that, by selecting the locally optimal solution at each step, a globally optimal solution will eventually be reached. However, this is not always the case. For simple rectangular grid-based path planning problems, the greedy algorithm typically chooses the current optimal solution at each step, avoiding multiple searches and backtracking. This results in faster computation and generally yields a shorter path. In most greedy algorithms, the Euclidean distance is used to calculate the nearest point. However, since the robot arm in this paper typically moves only along the x-axis and y-axis, Manhattan distance is more suitable for grid-based path calculations, especially when the allowed movement directions are limited to horizontal and vertical directions. For two points in a 2D plane, ( x 1 , y 1 ) and ( x 2 , y 2 ) , the formula for calculating the Manhattan distance is shown in Equation (8).
$$D_{\text{Manhattan}} = |x_1 - x_2| + |y_1 - y_2| \tag{8}$$
Manhattan distance does not involve square roots or multiplication, making it simpler and faster to compute. Compared to Euclidean distance, using Manhattan distance as a heuristic function in the greedy algorithm [33] allows the algorithm to efficiently guide the path toward the target, particularly in grid-based path planning problems where only horizontal and vertical movements are allowed. After completing the tying task for the current grid, the robotic arm and camera are moved to the next grid to perform the same tying operation.
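The greedy selection step can be sketched as follows; in the full system it is combined with the serpentine row-by-row sweep described above, whereas the sketch shows only the Manhattan-distance nearest-neighbour rule applied to a list of intersection points.

```python
def greedy_manhattan_order(points, start=0):
    """Order intersection points with a greedy nearest-neighbour rule using
    Manhattan distance (Eq. (8)): from the current point, always move to the
    closest unvisited point. `points` is a list of (x, y) tuples."""
    unvisited = set(range(len(points)))
    order = [start]
    unvisited.discard(start)
    current = start
    while unvisited:
        nxt = min(unvisited,
                  key=lambda i: abs(points[i][0] - points[current][0]) +
                                abs(points[i][1] - points[current][1]))
        order.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return order
```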

3.3. Coordinate Transformation System

Since we obtain the intersection points in pixel coordinates from the image, to enable the robotic arm to perform the tying task at these intersections, the three-dimensional coordinates of the intersection points in the robot’s coordinate system are required. This involves a key problem in robotics known as hand–eye calibration [34]. The purpose of hand–eye calibration is to determine the spatial relationship between the camera and the robot’s end-effector (such as the robotic arm), thus aligning the camera’s pixel coordinate system with the robot’s working coordinate system. This process is commonly used in robotic vision applications, such as object grasping and visual guidance.
In recent years, several optimized hand–eye calibration methods have been proposed based on classical theories. Koide K et al. [35] proposed an optimization method based on minimizing re-projection errors. By converting the calibration problem into the minimization of re-projection errors, they achieved a more accurate calibration of the spatial relationship between the camera and the robotic arm. This method is not limited to specific scenarios and is applicable to general hand–eye calibration needs. Wu J et al. [36] proposed a globally optimal symbolic hand–eye calibration method. By introducing symbolic computation methods, they developed an algorithm capable of globally optimizing the hand–eye calibration process. This method ensures a global optimal solution during the calibration process, significantly improving calibration accuracy while reducing computational complexity, making it suitable for complex robotic vision systems.
The process of converting pixel coordinates to the robotic arm’s coordinates typically involves multiple steps of coordinate transformation. Starting from the image plane coordinates, the transformation proceeds through camera calibration, including both intrinsic and extrinsic parameters, and ultimately results in the three-dimensional position in the robotic arm’s coordinate system.
First, the pixel coordinates ( u , v ) in the image are converted to the three-dimensional coordinates ( X c , Y c , Z c ) in the camera coordinate system. This transformation requires the camera’s intrinsic matrix K, as given in Equation (9), along with the depth information Z c .
$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \tag{9}$$
where f x and f y are the focal lengths of the camera, and ( c x , c y ) are the coordinates of the principal point (the center of the optical axis).
Based on the perspective projection model, given the pixel coordinates ( u , v ) and the depth Z c , the three-dimensional coordinates ( X c , Y c , Z c ) in the camera coordinate system can be obtained, as shown in Equation (10).
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = K^{-1} \begin{bmatrix} u \cdot Z_c \\ v \cdot Z_c \\ Z_c \end{bmatrix} \tag{10}$$
where K 1 is the inverse of the camera’s intrinsic matrix.
Through hand–eye calibration, the transformation matrix $T_C^E$ between the camera coordinate system and the robotic arm’s end-effector coordinate system can be obtained. This transformation matrix is typically acquired through calibration experiments and consists of the rotation matrix $R_C^E$ (a 3 × 3 matrix) and the translation vector $t_C^E$ (a 3 × 1 vector), as shown in Equation (11).
$$T_C^E = \begin{bmatrix} R_C^E & t_C^E \\ \mathbf{0} & 1 \end{bmatrix} \tag{11}$$
The camera coordinates are extended to homogeneous coordinates by adding a fourth component of 1, resulting in ( X C , Y C , Z C , 1 ) . As shown in Equation (12), this can be transformed into the three-dimensional point ( X e , Y e , Z e , 1 ) in the robotic arm’s end-effector coordinate system using the matrix T C E .
$$\begin{bmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{bmatrix} = T_C^E \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{12}$$
To convert the camera coordinates to the robotic arm’s base coordinate system, we also need to consider the transformation between the end-effector and the base. Let the transformation matrix from the end-effector coordinate system to the base coordinate system be T E B , which is typically obtained from the kinematics of the robotic arm. Given the point ( X e , Y e , Z e , 1 ) in the end-effector coordinate system, as shown in Equation (13), it can be transformed into the three-dimensional point ( X b , Y b , Z b , 1 ) in the base coordinate system using T E B .
$$\begin{bmatrix} X_b \\ Y_b \\ Z_b \\ 1 \end{bmatrix} = T_E^B \begin{bmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{bmatrix} \tag{13}$$
Therefore, the transformation from the camera coordinate system to the base coordinate system can be expressed as Equation (14).
$$\begin{bmatrix} X_b \\ Y_b \\ Z_b \\ 1 \end{bmatrix} = T_E^B \cdot T_C^E \cdot \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{14}$$
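The complete chain of Equations (10)–(14) amounts to a back-projection followed by two homogeneous transforms, as sketched below; the function and matrix names are illustrative.

```python
import numpy as np

def pixel_to_base(u, v, z_c, K, T_cam2ee, T_ee2base):
    """Map a pixel (u, v) with measured depth z_c into the robot base frame
    following Eqs. (10)-(14). K is the 3x3 intrinsic matrix; T_cam2ee (hand-eye)
    and T_ee2base (forward kinematics) are 4x4 homogeneous transforms."""
    # Eq. (10): back-project the pixel to camera coordinates.
    p_cam = np.linalg.inv(K) @ np.array([u * z_c, v * z_c, z_c], dtype=float)
    p_cam_h = np.append(p_cam, 1.0)          # homogeneous coordinates
    # Eqs. (12)-(14): base <- end-effector <- camera.
    p_base_h = T_ee2base @ T_cam2ee @ p_cam_h
    return p_base_h[:3]
```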
The transformation of pixel coordinates of intersection points to the robotic arm’s base coordinate system may be affected by errors resulting from inaccuracies in hand–eye calibration and kinematic calibration of the robot. These errors are mainly caused by the limitations of calibration algorithms, measurement inaccuracies in the robot’s joints and links, flexible deformations, measurement noise, and environmental fluctuations. Inaccurate hand–eye calibration can lead to misalignment between the camera and the robotic arm coordinate systems, while kinematic calibration errors can further magnify these deviations, ultimately compromising the end-effector’s positioning accuracy [17]. Therefore, it is essential to provide the robotic arm with precise three-dimensional coordinates of the intersection points. In this study, a chessboard calibration plate and a probe are used to implement a chessboard-based coordinate transformation method that converts pixel coordinates into the robotic arm’s coordinate system.
This method simplifies the coordinate transformation by aligning the origin and axes of the robotic arm’s coordinate system with those of the chessboard calibration plate, effectively making the two coordinate systems coincide. This alignment offers several advantages: it reduces computational complexity and avoids unnecessary coordinate conversions, thereby minimizing error accumulation; it enhances operational accuracy, improving system performance in high-precision tasks; and it streamlines the calibration and verification process, making the system more intuitive and efficient. This approach is particularly well-suited for industrial scenarios that demand high precision and repeatability.
First, the origin of the robotic arm coordinate system is aligned with that of the calibration plate using a probe. The robotic arm is positioned so that its coordinate origin coincides with the predefined location of the calibration plate placed at the end-effector probe, ensuring that both origins match as closely as possible. Next, the coordinate axes (x, y, z) of both systems are aligned to ensure directional consistency. Once the origins and axes are properly aligned, external parameter calibration is performed using a chessboard pattern to obtain the extrinsic matrix, as shown in Equation (15). The resulting intrinsic and extrinsic matrices are presented in Figure 11. The extrinsic matrix is then used to transform intersection point coordinates from the camera coordinate system to the chessboard coordinate system, as described in Equation (16). For the z-axis value, since the depth camera can accurately capture depth information, the transformed z-coordinate is replaced with the depth value directly measured by the camera, thereby improving the overall accuracy.
$$T_W^C = \begin{bmatrix} R & t \\ \mathbf{0} & 1 \end{bmatrix} \tag{15}$$
$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \left( T_W^C \right)^{-1} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{16}$$
where $(X_w, Y_w, Z_w)^T$ is the coordinate in the chessboard coordinate system, $(X_c, Y_c, Z_c)^T$ is the coordinate in the camera coordinate system, and $\left( T_W^C \right)^{-1}$ is the inverse of the extrinsic matrix.
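A sketch of this chessboard-aligned transformation, including the replacement of the z value by the directly measured depth, is given below; the function and argument names are illustrative rather than the system's actual interface.

```python
import numpy as np

def camera_to_board(p_cam, T_board_to_cam, z_depth=None):
    """Transform a camera-frame point into the chessboard (world) frame by
    inverting the extrinsic matrix, per Eqs. (15)-(16). If the board frame has
    been aligned with the robot base as described above, the result can be sent
    to the arm directly. Optionally replace the z value with the depth measured
    by the RGB-D sensor."""
    p_h = np.append(np.asarray(p_cam, dtype=float), 1.0)
    p_world = np.linalg.inv(T_board_to_cam) @ p_h
    x, y, z = p_world[:3]
    if z_depth is not None:
        z = z_depth
    return np.array([x, y, z])
```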

4. Experiments

This section focuses on evaluating the feasibility and robustness of the proposed rebar crossing point detection and path planning algorithm under controlled experimental conditions that closely replicate real operational scenarios. The experiments include camera calibration using a checkerboard pattern to obtain precise intrinsic and extrinsic parameters, coordinate transformation validation, crossing point localization, and trajectory planning tests. In addition, the system’s recognition accuracy and positioning errors are thoroughly analyzed to demonstrate its effectiveness and reliability within the tested setup.

4.1. Camera Calibration

To ensure high-precision measurements, this study conducts comprehensive intrinsic and extrinsic calibration of the RGB-D camera using a standard checkerboard pattern. Multiple images of the checkerboard with known dimensions are captured from various angles and positions within the camera’s field of view to accurately determine the intrinsic parameters and distortion coefficients. Simultaneously, based on the spatial geometric relationships, the extrinsic transformation matrix from the camera coordinate system to the robotic arm’s base coordinate system is computed. During the calibration, images are taken from different orientations and poses to enhance the robustness and accuracy of parameter estimation. Figure 12 shows examples of the checkerboard images acquired under various poses and viewpoints, which are used for corner point detection and intrinsic/extrinsic parameter computation.

4.1.1. Calibration Method

The calibration process adopted in this study is based on Zhang’s method, a widely used planar pattern calibration technique. A standard black-and-white checkerboard with known grid size was used as the calibration target. The intrinsic parameters—including the focal lengths ( f x , f y ) , principal point ( c x , c y ) , and lens distortion coefficients (radial: k 1 , k 2 , k 3 ; tangential: p 1 , p 2 )—were estimated by minimizing the reprojection error of the detected corner points.
To ensure robustness and accuracy, multiple images were captured with the checkerboard placed at various positions and orientations within the camera’s field of view. This multi-view coverage enhances the numerical stability of parameter estimation and reduces the sensitivity to noise or local distortion.
The extrinsic parameters, which describe the rigid-body transformation from the camera coordinate frame to the reference world frame (in this case, the manipulator base), were simultaneously estimated based on the known spatial geometry of the checkerboard corners and their corresponding image projections.
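A minimal OpenCV sketch of this Zhang-style calibration procedure is shown below; the checkerboard dimensions, square size, and image folder are placeholders rather than the actual values used in this study.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)                 # inner corners per row/column (assumed)
square = 25.0                    # square size in mm (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):                 # placeholder image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Returns the RMS reprojection error, intrinsic matrix K, distortion
# coefficients (k1, k2, p1, p2, k3), and per-view extrinsics (rvecs, tvecs).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```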

4.1.2. Calibration Results and Reprojection Error Analysis

This section presents the calibrated intrinsic and extrinsic matrices, summarizes the distortion parameters with their uncertainties, and analyzes the reprojection error. For completeness, the optimized intrinsic matrix K is given in Equation (17). This 3 × 3 matrix encodes the camera’s focal lengths and principal point. As shown, the focal length in the x direction is approximately 748.29 pixels and in the y direction is 747.88 pixels, while the principal point is at ( 640.24 , 356.21 ) in pixel coordinates (close to the image center):
$$K = \begin{bmatrix} 748.29024 & 0 & 640.23685 \\ 0 & 747.87601 & 356.21422 \\ 0 & 0 & 1 \end{bmatrix} \tag{17}$$
Likewise, the extrinsic calibration result is represented by the 3 × 4 matrix in Equation (18), which combines the rotation matrix R and translation vector T (in the form [ R | T ] ).
$$[R \,|\, T] = \begin{bmatrix} 0.9997 & 0.0217 & 0.0055 & 372.1355 \\ 0.0217 & 0.9997 & 0.0073 & 104.9251 \\ 0.0054 & 0.0074 & 1.0000 & 698.5552 \end{bmatrix} \tag{18}$$
Here, the first three columns form the rotation matrix R, and the final column is the translation vector T. These extrinsic parameters transform points from the world coordinate system to the camera’s coordinate system.
In addition to the intrinsics and extrinsics, the lens distortion coefficients were estimated as part of the calibration. Table 1 summarizes the five distortion parameters of the standard radial–tangential model, namely three radial coefficients ($k_1$, $k_2$, $k_3$) and two tangential coefficients ($p_1$, $p_2$), along with their standard deviations (uncertainties). The results show that the primary radial distortion coefficients $k_1$ and $k_2$ are on the order of $10^{-2}$ and have opposite signs (with $k_1$ positive and $k_2$ negative), indicating a mild barrel distortion that is slightly compensated by the second-term correction. The third radial coefficient $k_3$ is effectively zero within the uncertainty, suggesting that higher-order radial distortion is negligible. Similarly, the tangential distortion coefficients $p_1$ and $p_2$ are very close to zero, implying negligible tangential distortion.
The reprojection accuracy of the calibration is evaluated in Table 2 and visualized in Figure 13. The mean reprojection error is 0.07293 pixels in the x-direction and 0.07531 pixels in the y-direction, with the vast majority of corner points exhibiting errors below 0.1 pixels. As shown in Figure 13, the error distribution is tightly clustered around the origin, with no significant directional bias or outliers, indicating that the calibration model provides an excellent fit across all calibration images.

4.2. Coordinate Transformation Error Analysis

To assess the accuracy of coordinate transformation from image space to the robotic workspace, a structured translation error test was conducted. Specifically, the end-effector of the manipulator was commanded to move along the x and y directions by fixed distances (50 mm and 60 mm, respectively). The positions of a selected rebar intersection point were recorded before and after each movement using the vision system. The measured displacement was then compared against the expected ground-truth motion to evaluate the spatial consistency of the transformation pipeline.
This method simulates a known geometric displacement on a fixed local feature, effectively constructing an artificial ground truth, which is a widely adopted strategy in 3D vision system accuracy assessment. To further validate the transformation precision, a high-accuracy probe was mounted on the robotic arm and used to physically touch the center of each rebar intersection point. As shown in Figure 14, the actual end-effector coordinates recorded during probing are considered as the true reference positions in the robot coordinate frame.
Following coordinate transformation, sensor-based alignment verification was performed to evaluate the consistency between the vision-derived coordinates and the robotic control system. By comparing the positions of the same intersection point across both coordinate systems before and after the predefined motion, the transformation error could be quantitatively assessed. This step is especially critical for precision tasks such as rebar binding, where millimeter-level accuracy is required.
The experimental results are summarized in Table 3, which presents representative sample points along with their corresponding displacement errors. Statistically, the average absolute error along the x-axis is 0.757 mm, with a relative error of 1.514%, while along the y-axis the average absolute error is 0.914 mm, with a relative error of 1.523%. Most samples exhibit displacement errors of around 1 mm with a uniform distribution, indicating that the transformation from image coordinates to the robotic coordinate system is both accurate and robust, meeting the requirements for high-precision spatial localization. The z-coordinate in the camera frame is directly obtained from the depth value measured by the RGB-D sensor. By using the native depth measurement instead of computing the z-coordinate through coordinate transformation, this approach effectively minimizes projection errors and enhances the overall accuracy of 3D localization, particularly in the depth direction.

4.3. Rebar Crossing Point Detection and Path Planning

To accurately identify the positions of rebar intersection points, the proposed system utilizes RGB-D images captured by the calibrated camera. The 2D pixel coordinates of each detected crossing point are first extracted using a vision-based detection module. These pixel coordinates are then transformed into 3D positions in the robot base frame using the previously obtained extrinsic parameters.
The path planning objective is to visit each detected rebar crossing point in a collision-free grid environment, ensuring complete coverage and execution efficiency. Since the detection results are represented in pixel coordinates, the original spatial planning problem is transformed into an ordered sequencing problem over the detected pixel points. The sequence determines the order in which the robot performs tasks (e.g., tying) at each intersection.
To ensure that the manipulator maintains stable motion while minimizing total traversal time, a greedy algorithm is employed to optimize a snake-like path. This path offers efficient area coverage with low computational complexity and high stability—qualities especially desirable in repetitive industrial tasks such as automated rebar tying.
Figure 15 shows the output of the crossing point detection and the corresponding optimized path planning sequence. Each labeled grid cell contains a detected intersection point and its identification index, which serves as the robot’s operation order.
Table 4 summarizes the core performance metrics of the proposed rebar crossing point detection and localization system. In this experiment, a total of 30 RGB-D images were collected, covering 735 rebar crossing points under various grid configurations. The detection algorithm achieves an accuracy exceeding 99% on this dataset, demonstrating stable and reliable performance across different scenarios. The average processing time for each crossing point is below 200 ms, ensuring real-time performance suitable for robotic execution. Furthermore, the maximum observed coordinate transformation error is less than 2 mm, which fully meets the precision requirements for robotic manipulation tasks in construction environments.
Compared to existing rebar tying solutions such as TyBOT, SkyTy, and T-iROBO, the proposed vision-guided system emphasizes cost-effectiveness, adaptability, and scalability rather than maximum binding speed. TyBOT, as a large-scale commercial solution, offers high-speed operation on standard, regular grids but lacks flexibility for confined or irregular sites. SkyTy, as a drone-based approach, provides mobile wire deployment but is limited by payload, precision, and environmental conditions. T-iROBO, as a compact mobile ground robot, is versatile in tight areas but relies on complex path planning and limited battery capacity. In contrast, our system integrates a low-cost RGB-D camera with a robotic arm and robust vision algorithms, achieving over 99% detection accuracy and millimeter-level positioning precision. Moreover, the modular design and generalizable perception pipeline make it scalable to different task sizes and diverse construction environments.

5. Conclusions

This study introduces a comprehensive and practical pipeline designed for accurately detecting rebar crossing points and automating path planning for robotic operations in construction settings. Through meticulous camera calibration, both intrinsic and extrinsic parameters were precisely determined, enabling reliable transformations from image coordinates to the robot’s control coordinate frame. Reprojection errors were maintained at sub-pixel levels, confirming the robustness and accuracy of the calibration procedure.
Structured probing experiments further demonstrated that the coordinate transformations achieved millimeter-level precision, with maximum errors within 2 mm.
The proposed RGB-D vision system effectively detects intersection points within grid-structured steel reinforcements and converts the detected pixel coordinates into precise three-dimensional robot coordinates using calibrated extrinsic parameters. The path planning component redefines the problem as a sequencing optimization task and applies a greedy algorithm to generate efficient snake-like trajectories, ensuring high coverage efficiency, smooth manipulator motion, and low computational cost. Experimental results verified that the system consistently achieves detection accuracy above 99%, with average processing times below 125 ms per point and transformation errors under 2 mm, thus meeting practical requirements for robotic rebar tying tasks.
While these results demonstrate strong feasibility and reliability in semi-controlled test scenarios, challenges remain in scaling the system for deployment on large and dynamic construction sites with varying lighting, occlusions, and environmental disturbances. To address these challenges and further advance this research, future work will focus on integrating deep learning methods to enhance the semantic understanding of complex rebar arrangements and to improve detection robustness in the presence of occlusions and clutter. Additionally, incorporating multi-sensor fusion and adaptive calibration techniques will be explored to increase system resilience and accuracy under changing on-site conditions.
This work lays a solid foundation for the intelligent automation of rebar tying and contributes valuable insights towards broader applications of robotic systems in the construction industry.

Author Contributions

C.L.: Writing—original draft, Conceptualization, Methodology, Software, Validation, Investigation, Data curation. W.Z.: Writing—review and editing, Supervision, Resources, Funding acquisition. F.L.: Project administration. M.G.: Investigation, Data curation. S.F.: Visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Technology and Development Joint Research Foundation of Henan Province (Grant No. 225200810070).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Xiao, B.; Chen, C.; Yin, X. Recent advancements of robotics in construction. Autom. Constr. 2022, 144, 104591. [Google Scholar] [CrossRef]
  2. Asadi, K.; Ramshankar, H.; Pullagurla, H.; Bhandare, A.; Shanbhag, S.; Mehta, P.; Kundu, S.; Han, K.; Lobaton, E.; Wu, T. Building an integrated mobile robotic system for real-time applications in construction. arXiv 2018, arXiv:1803.01745. [Google Scholar]
  3. Liu, Y.; AH, A.; Haron, N.A.; NA, B.; Wang, H. Robotics in the Construction Sector: Trends, Advances, and Challenges. J. Intell. Robot. Syst. 2024, 110, 72. [Google Scholar] [CrossRef]
  4. Bock, T.; Linner, T. Robotic Industrialization; Cambridge University Press: Cambridge, UK, 2015. [Google Scholar]
  5. Wang, T.; Mao, C.; Sun, B.; Li, Z. Genealogy of construction robotics. Autom. Constr. 2024, 166, 105607. [Google Scholar] [CrossRef]
  6. Vi, P. Effects of Rebar Tying Machine on Trunk Flexion and Productivity. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Orlando, FL, USA, 26–30 September 2005; SAGE Publications Sage CA: Los Angeles, CA, USA, 2005; Volume 49, pp. 1349–1353. [Google Scholar]
  7. Aires, M.M.; Alonso, M.L.; Gago, E.J.; Pacheco-Torres, R. Technological advances in rebar tying jobs: A comparative analysis of the associated yields and illnesses. System 2015, 3, 4. [Google Scholar]
  8. Xu, K.; Lu, X.; Shen, T.; Zhu, X.; Wang, S.; Wang, X.; Wang, J. Rebar binding point location method based on improved YOLOv5 and thinning algorithm. Measurement 2025, 242, 116029. [Google Scholar] [CrossRef]
  9. Cardno, C.A. Robotic rebar-tying system uses artificial intelligence. Civ. Eng. Mag. Arch. 2018, 88, 38–39. [Google Scholar] [CrossRef]
  10. Alijani, S.; Fayyad, J.; Najjaran, H. Vision transformers in domain adaptation and domain generalization: A study of robustness. Neural Comput. Appl. 2024, 36, 17979–18007. [Google Scholar] [CrossRef]
  11. Anane, W.; Iordanova, I.; Ouellet-Plamondon, C. BIM-driven computational design for robotic manufacturing in off-site construction: An integrated Design-to-Manufacturing (DtM) approach. Autom. Constr. 2023, 150, 104782. [Google Scholar] [CrossRef]
  12. Lingard, H.; Raj, I.S.; Lythgo, N.; Troynikov, O.; Fitzgerald, C. The impact of tool selection on back and wrist injury risk in tying steel reinforcement bars: A single case experiment. Constr. Econ. Build. 2019, 19, 1–19. [Google Scholar] [CrossRef]
  13. Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Commun. ACM 2021, 64, 107–115. [Google Scholar] [CrossRef]
  14. Wang, T.L.; Ao, L.; Zheng, J.; Sun, Z.B. Reconstructing depth images for time-of-flight cameras based on second-order correlation functions. Photonics 2023, 10, 1223. [Google Scholar] [CrossRef]
  15. Alenyà, G.; Foix, S.; Torras, C. ToF cameras for active vision in robotics. Sens. Actuators A Phys. 2014, 218, 10–22. [Google Scholar] [CrossRef]
  16. Tsai, R.Y.; Lenz, R.K. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robot. Autom. 1989, 5, 345–358. [Google Scholar] [CrossRef]
  17. Enebuse, I.; Foo, M.; Ibrahim, B.S.K.K.; Ahmed, H.; Supmak, F.; Eyobu, O.S. A comparative review of hand-eye calibration techniques for vision guided robots. IEEE Access 2021, 9, 113143–113155. [Google Scholar] [CrossRef]
  18. Kim, B.; K.R., S.P.; Natarajan, Y.; Danushkumar, V.; An, J.; Lee, D.E. Real-time assessment of rebar intervals using a computer vision-based DVNet model for improved structural integrity. Case Stud. Constr. Mater. 2024, 21, e03707. [Google Scholar] [CrossRef]
  19. Robotics, A.C. TyBOT: Autonomous Rebar Tying Robot. 2018. Available online: https://www.constructionrobots.com/tybot (accessed on 8 January 2025).
  20. Ventures, C. SkyMul, SkyTy Automated Rebar Tying System. 2021. Available online: https://www.cemexventures.com/skymul (accessed on 8 January 2025).
  21. Nishimura, K.T. Development of the rebar tying robot “T-iROBO Rebar”. Constr. Mach. 2018, 54, 13–18. [Google Scholar]
  22. Wang, H.; Ye, Z.; Wang, D.; Jiang, H.; Liu, P. Synthetic datasets for rebar instance segmentation using mask r-cnn. Buildings 2023, 13, 585. [Google Scholar] [CrossRef]
  23. Feng, R.; Jia, Y.; Wang, T.; Gan, H. Research on the System Design and Target Recognition Method of the Rebar-Tying Robot. Buildings 2024, 14, 838. [Google Scholar] [CrossRef]
  24. Bachiller-Burgos, P.; Manso, L.J.; Bustos, P. A variant of the Hough Transform for the combined detection of corners, segments, and polylines. EURASIP J. Image Video Process. 2017, 2017, 32. [Google Scholar] [CrossRef]
  25. James, T.O. A Hardware Track-Trigger for CMS: At the High Luminosity LHC; Springer Nature: Berlin/Heidelberg, Germany, 2019. [Google Scholar]
  26. Hassanein, A.S.; Mohammad, S.; Sameer, M.; Ragab, M.E. A survey on Hough transform, theory, techniques and applications. arXiv 2015, arXiv:1502.02160. [Google Scholar]
  27. Adatrao, S.; Mittal, M. An analysis of different image preprocessing techniques for determining the centroids of circular marks using hough transform. In Proceedings of the 2016 2nd International Conference on Frontiers of Signal Processing (ICFSP), Warsaw, Poland, 15–17 October 2016; pp. 110–115. [Google Scholar]
  28. Cheng, B.; Deng, L. Vision detection and path planning of mobile robots for rebar binding. J. Field Robot. 2024, 41, 1864–1886. [Google Scholar] [CrossRef]
  29. Cao, S.; Duan, H.; Guo, S.; Wu, J.; Ai, T.; Jiang, H. BIM-based task planning method for wheeled-legged rebar binding robot. Archit. Eng. Des. Manag. 2024, 20, 656–672. [Google Scholar] [CrossRef]
  30. Shen, D.H.; Guo, S.; Duan, H.; Ji, K.; Jiang, H. Movement and binding control strategy based on a new type of rebar-binding robot. Ind. Robot. Int. J. Robot. Res. Appl. 2024, 51, 837–846. [Google Scholar] [CrossRef]
  31. Karur, K.; Sharma, N.; Dharmatti, C.; Siegel, J.E. A survey of path planning algorithms for mobile robots. Vehicles 2021, 3, 448–468. [Google Scholar] [CrossRef]
  32. Aryan, A.; Bosché, F.; Tang, P. Planning for terrestrial laser scanning in construction: A review. Autom. Constr. 2021, 125, 103551. [Google Scholar] [CrossRef]
  33. Yan, Y. Research on the A Star Algorithm for Finding Shortest Path. Highlights Sci. Eng. Technol. 2023, 46, 154–161. [Google Scholar] [CrossRef]
  34. Horaud, R.; Dornaika, F. Hand-eye calibration. Int. J. Robot. Res. 1995, 14, 195–210. [Google Scholar] [CrossRef]
  35. Koide, K.; Menegatti, E. General hand–eye calibration based on reprojection error minimization. IEEE Robot. Autom. Lett. 2019, 4, 1021–1028. [Google Scholar] [CrossRef]
  36. Wu, J.; Liu, M.; Zhu, Y.; Zou, Z.; Dai, M.Z.; Zhang, C.; Jiang, Y.; Li, C. Globally optimal symbolic hand-eye calibration. IEEE/ASME Trans. Mechatron. 2020, 26, 1369–1379. [Google Scholar] [CrossRef]
Figure 1. System framework for rebar intersection detection and path planning.
Figure 2. Overall workflow of the visual recognition system for rebar intersection detection.
Figure 3. Depth Range Filter for rebar layer separation.
Figure 4. Bilateral filtering-based interpolation for depth image hole filling.
Figure 5. Workflow diagram of the rebar intersection point detection system. The process starts from a binarized image and involves skeletonization, pruning, and Hough Transform steps, culminating in the final intersection point detection result on the original RGB image.
Figure 6. Pruning process of skeletonized image for enhanced line detection.
Figure 7. Flowchart of the Probabilistic Hough Transform with hash table optimization.
Figure 8. Process of merging line segments through bidirectional projection and endpoint combination.
Figure 9. When the length spacing is 1 and the width spacing is 3, the two algorithms plan the same path and the total path length is identical (with the starting point at the lower-left corner).
Figure 10. When the length spacing is 3 and the width spacing is 1, the total path length obtained by the optimized greedy algorithm is shorter than that of the unoptimized algorithm (with the starting point at the lower-left corner).
Figure 11. Calibration results for the intrinsic and extrinsic camera matrices.
Figure 12. Checkerboard images captured from multiple angles for camera calibration.
Figure 13. Calibration result analysis: (a) Reprojection error distribution of all calibration images; (b) Bar chart of the mean reprojection error for each calibration image. The error is measured in pixels and reflects the accuracy of camera calibration. Most images exhibit mean errors below 0.1 pixel, indicating high calibration precision and consistency; (c) Rotation estimation error (in radians) for each calibration image. The error is derived from the standard deviation of the optimized rotation vector, indicating the uncertainty in camera orientation estimation. As shown in the figure, most errors remain below 0.0025 radians, demonstrating high precision and stability in the pose calibration process; (d) Translation estimation error (in millimeters) for each calibration image. The error is computed from the standard deviation of the optimized translation vectors, indicating the uncertainty in position estimation. As shown in the figure, most errors lie between 0.6 mm and 0.8 mm with moderate variation, demonstrating reliable and consistent spatial localization accuracy in the calibration process.
Figure 14. Probe-based ground-truth acquisition system for coordinate transformation validation.
Figure 15. Detection and path planning results for rebar crossing points. Red labels indicate the identification index and confidence at each detected intersection. The planned sequence ensures full coverage with minimal path redundancy.
Table 1. Estimated lens distortion coefficients (radial k1, k2, k3 and tangential p1, p2) with their standard deviations.

Parameter   Value ± Std. Dev.
k1          0.06987 ± 0.00094
k2          0.07504 ± 0.00169
p1          0.00025 ± 0.00023
p2          0.00030 ± 0.00025
k3          0.00000 ± 0.00000
Table 2. Mean reprojection error per image axis after calibration. Both values are well below one pixel, indicating high precision in the camera parameter estimation.

Image Axis   Mean Reprojection Error (Pixels)
x            0.07293
y            0.07531
Table 3. Coordinate system alignment deviation verification table.

Original Coordinate (mm)   X-Axis Movement 50 mm (mm)   Inaccuracy (mm)   Original Coordinate (mm)   Y-Axis Movement 60 mm (mm)   Inaccuracy (mm)
324.411                    275.142                      0.731             267.476                    209.153                      1.677
202.619                    153.094                      0.475             267.402                    208.618                      1.216
198.508                    148.855                      0.347             144.848                    85.973                       1.125
321.635                    272.374                      0.739             144.015                    85.160                       1.145
318.071                    269.090                      1.019             21.230                     −38.932                      0.162
194.379                    144.795                      0.416             18.750                     −40.853                      0.397
567.562                    518.825                      1.263             267.852                    208.366                      0.514
437.910                    388.973                      1.063             20.590                     −38.924                      1.076
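The inaccuracy values in Table 3 appear to equal the absolute difference between the coordinate measured after the commanded displacement and the value expected from the original reading; this interpretation is an assumption (it is consistent with all but one of the listed entries) and is checked for the first row below.

```python
# Assumed convention for Table 3: expected = original - commanded move,
# inaccuracy = |measured - expected| (checked here for the first X-axis row).
original, measured, move = 324.411, 275.142, 50.0
expected = original - move                   # 274.411 mm
print(round(abs(measured - expected), 3))    # 0.731 mm, matching the table entry
```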
Table 4. Performance metrics of the rebar crossing point detection and localization system.

Metric                                    Value
Detection accuracy                        >99%
Detection time per crossing point         <125 ms
Maximum coordinate transformation error   <2 mm