1. Introduction
Shipbuilding is a typical major equipment manufacturing industry, and welding quality directly affects hull structural strength, sealing performance, and service safety. In ship compartments, decks, and frame structures, weld seams are numerous and widely distributed. In confined spaces such as double-bottom grid compartments, the installation space and operating accessibility of sensing devices are often limited, making it difficult to acquire and localize the target points of common seam types such as fillet welds. Meanwhile, welding operations are accompanied by high temperature, arc light, fumes, and harmful gases, and traditional approaches that rely heavily on manual experience face evident constraints in efficiency, stability, and safety. These factors motivate robotic pre-localization and guidance; however, the present study focuses on the pre-arc stage and does not attempt to solve visual perception under active arc light, dense smoke, or severe spatter. With the development of welding robots and vision sensing technologies, vision-assisted robotic welding, seam tracking, depth perception, and active vision have formed a relatively systematic research foundation [1,2]. Weld seam tracking has gradually become an important direction in robotic welding vision research, and related studies have systematically reviewed the development of vision/laser-based seam tracking methods [3]. At the same time, research on spatial information acquisition and active vision has continued to advance, and robotic welding has gradually evolved from two-dimensional recognition to three-dimensional perception and guidance control [4,5]. From the perspective of automation and intelligent welding, adaptability to complex environments, system integration complexity, and engineering deployability remain important issues that vision sensing technologies need to address continuously [6].
In terms of welding start-point localization, Chen et al. previously proposed a vision-based initial weld-point localization method that achieved sub-pixel acquisition of the start position through template matching and pixel-level interpolation [7]. Fang et al. further exploited the geometric relationship between two seams in the initialization and welding stages to realize vision-based localization of initial weld points for container-manufacturing scenarios [8]. More recently, Liu et al. proposed a one-shot integrated localization method for initial weld points based on the co-mapping of cross and parallel stripes, advancing start-point localization from early local image matching toward a realization path that emphasizes geometric constraints and one-shot positioning [9]. Overall, start-point acquisition has always been a fundamental issue in welding automation; however, for pre-localization before arc ignition, studies on how to stably obtain the 3D coordinates of the target point under limited sensing information remain relatively insufficient.
With the development of vision sensing technologies, research has gradually expanded to online seam tracking and guidance control during the welding process. Xu et al. systematically investigated computer-vision methods for seam tracking in robotic GTAW and GMAW scenarios, showing that vision sensing can play a critical role in welding deviation detection [10,11]. Xiao et al. proposed an adaptive feature extraction method for multiple typical weld seams to enhance recognition capability under complex seam conditions [12]. Wang et al. proposed a robust weld seam recognition method based on structured-light vision [13], while Wu et al. investigated the robustness of laser-vision feature extraction for fillet-weld scenarios under different reflective materials [14]. These studies have promoted the transition of weld seam detection and tracking from ideal laboratory conditions to more complex industrial interference environments.
In addition to online tracking, researchers have also begun to focus on visual guidance under limited-teaching or teaching-free conditions. Hou et al. proposed a teaching-free robotic GMAW method based on a laser-vision sensing system, reducing reliance on manual teaching during welding [15]. Wu et al. further proposed a teaching-free welding-position guidance method for fillet welds, enabling coordinated welding-position guidance and robotic execution [16]. Li et al. developed a robotic welding guidance system by combining improved YOLOv5 with a RealSense depth camera, thereby improving the automation level of target detection and guidance [17]. These studies indicate that welding vision research has gradually evolved from isolated recognition toward the coordinated realization of perception, localization, and guidance.
In terms of spatial information acquisition, binocular vision, structured light, and active vision methods have been widely used for 3D weld seam reconstruction, path extraction, and robotic guidance. Tan et al. proposed a weld seam localization method based on polarization 3D reconstruction and linear structured-light imaging [18]; Xiao et al. proposed a robotic welding visual-guidance framework based on binocular cooperation [19]; and Geng et al. used 3D vision to achieve weld seam extraction and path planning for medium-thickness plate structural components [20]. Furthermore, Han et al. investigated pose and position calibration for laser displacement sensors, providing a more reliable calibration basis for spatial coordinate acquisition based on laser ranging [21]. Wang et al. unified laser localization, trajectory fitting, and real-time tracking within a 3D coordinate recognition task, emphasizing the unified role of 3D seam-center coordinates across different welding modes [22]. Liu et al. further proposed a weld seam type recognition and 3D localization method based on cross-structured light for complex pre-weld conditions [23]. These studies show that 3D spatial information is of great significance for welding localization and guidance; however, for pre-localization tasks before arc ignition, how to reduce system complexity while maintaining accuracy remains a question worthy of further discussion.
As shown in Table 1, existing studies have achieved abundant results in welding start-point localization, online seam tracking, 3D vision reconstruction, and robotic visual guidance. However, many laser-vision and hybrid 3D sensing methods are designed for seam-profile recovery, continuous seam tracking, or trajectory extraction, and they usually involve laser-stripe/profile extraction, stereo/depth reconstruction, or multi-step 3D processing. For pre-arc localization of discrete welding target points, a complete 3D sensing chain may be unnecessary. In contrast, this paper focuses on the point-wise acquisition of the 3D coordinate of the currently aligned target point under confined installation and deployment constraints. The proposed framework uses 2D vision for target-to-laser pre-alignment and combines a single point-laser distance constraint with the current TCP pose to compute the target coordinate. After one target point is localized, the same procedure can be repeated for subsequent start, end, or intermediate target points according to the task requirement.
In this paper, “confined workspace” mainly refers to limited sensor installation space and restricted deployment flexibility in ship structures. It does not imply that point-laser ranging is insensitive to arbitrary laser incidence angles. During measurement, the manipulator still needs to adjust the end-effector to maintain a feasible incidence condition whenever the workspace permits. Based on this task positioning, the proposed “2D visual planar guidance + 1D point-laser distance constraints” explores a low-complexity route for point-wise target localization, rather than replacing all 3D vision solutions.
Accordingly, this paper addresses the pre-localization problem at the start stage of ship welding and proposes a low-complexity point-wise three-dimensional localization method based on “2D visual planar guidance + 1D point-laser distance constraints.” In this method, 2D vision provides planar observation and target-to-laser pre-alignment, point-laser ranging provides a one-dimensional distance constraint, and the TCP pose provides the spatial reference for coordinate computation. Compared with dense 3D reconstruction and structured-light-based seam profiling, the proposed method does not require disparity-map generation, dense point-cloud reconstruction, laser-stripe extraction, or complete seam-profile fitting. Its lower complexity is mainly reflected in the reduced sensing-data dimension, shorter 3D processing chain, and point-wise deployment procedure for discrete welding target points before arc ignition.
The main contributions of this paper are as follows: (1) a point-wise coordinate computation framework is established by coupling 2D visual pre-alignment, TCP pose, and a one-dimensional point-laser distance constraint to solve the LIP coordinate without dense reconstruction or laser-stripe/profile extraction; (2) a distance- and spatial-level stabilization module is introduced to improve the consistency of the coordinate output under quasi-static pre-measurement conditions; and (3) the feasibility of the proposed point-wise localization workflow is validated on the current experimental platform through 3D computation tests, passive binocular-depth baseline comparison, target alignment experiments, and task-level trajectory verification. The scope of experimental validation is limited to pre-localization and guidance scenarios before arc ignition under controllable illumination and feasible measurement-pose conditions, and it does not involve robust online perception under intense arc light, persistent fumes, and spatter throughout real welding processes.
2. Methods
This section presents the proposed fusion-sensing method for point-wise welding target localization before arc ignition. As illustrated in Figure 1, the workflow consists of target planar observation, distance-constraint introduction, spatial coordinate computation, and pre-measurement alignment.
2.1. System and Task Definition
This subsection defines the sensing units and target task of the proposed method so that the subsequent 3D modeling, stabilization design, and experimental validation can be interpreted consistently.
The system consists of a binocular camera, a point-laser ranging module, and a six-degree-of-freedom collaborative manipulator, as shown in Figure 2. The binocular camera is mounted outside the workspace in an eye-to-hand configuration to identify the target point and provide planar visual observations. In this type of vision–robot collaborative system, it is generally necessary to establish a unified spatial transformation through extrinsic calibration between the camera and the robot [24] so that visual observations can be used consistently in subsequent pose computation and motion execution. The point-laser ranging module is mounted near the manipulator end effector to provide a one-dimensional distance constraint along the beam direction, while the manipulator performs target alignment, measurement, and subsequent trajectory motion.
The task considered in this paper is not arc-welding seam tracking during actual welding, but pre-localization and guidance at the start stage of welding. The core objective is to obtain reliable 3D target coordinates under confined installation and operation conditions with relatively low system complexity, thereby providing the basis for subsequent welding initiation or trajectory planning. In the current implementation, this objective assumes that the manipulator can adjust the end-effector to a feasible measurement pose, preferably making the point-laser beam approximately normal to the local work surface when the workspace permits.
Based on this task definition, the subsequent 3D model is developed to solve the spatial coordinate of the currently aligned target point under limited sensing information. In the proposed workflow, welding target points are localized in a point-wise manner: at each measurement step, the system aligns the point laser with the current target region and computes the corresponding 3D coordinate. The same procedure can then be repeated for subsequent start, end, or intermediate target points according to the task requirement. Therefore, the method is intended for sequential localization of discrete welding target points before arc ignition, rather than continuous seam-profile reconstruction.
2.2. Point-Laser-Constrained 3D Coordinate Computation Model
After clarifying the system composition and task objective, this subsection further addresses the core question of how the target 3D coordinates are obtained. Since the present work does not rely on dense 3D reconstruction, it is necessary to establish a spatial computation model that can directly unify manipulator pose, laser geometric parameters, and real-time ranging data. To this end, the internal coordinate chain of the robot and the geometric definition of the point-laser module relative to the tool reference point must first be specified.
As shown in Figure 3, the coordinate systems involved include the robot base coordinate system {B}, the end-flange coordinate system {E}, and the tool center point coordinate system {TCP}. The fixed transformation from {E} to {TCP} is obtained through TCP calibration. Since the reference point that actually participates in localization and measurement is the TCP rather than the flange center, all subsequent spatial computations use the TCP as the unified reference for end-effector pose representation.
Furthermore, as shown in Figure 4, the geometric relationship of the point-laser module relative to the TCP can be described by the translational offset vector $\mathbf{t}_L$, the unit direction vector $\mathbf{u}_L$, and the real-time measured distance $d$. Here, $\mathbf{t}_L$ denotes the position of the laser emission point relative to the TCP origin in the TCP coordinate system, $\mathbf{u}_L$ denotes the emission direction of the laser beam in the TCP coordinate system, and $d$ denotes the real-time distance from the emission point to the target surface along $\mathbf{u}_L$.
At each sampling instant, the current joint angles of the manipulator are first read and the joint-angle vector is denoted as
$$\mathbf{q} = [q_1, q_2, \ldots, q_6]^{\mathsf{T}} \tag{1}$$
Combined with the forward kinematics model of the robot, the pose transformation of the end-flange coordinate system {E} relative to the base coordinate system {B} can be obtained. The definitions of the robot link coordinate systems and the homogeneous transformations between adjacent links can be formulated in a unified manner using Denavit–Hartenberg notation [25]:
$${}^{B}\mathbf{T}_{E}(\mathbf{q}) = \prod_{i=1}^{6} {}^{i-1}\mathbf{A}_{i}(q_i) \tag{2}$$
Here, ${}^{i-1}\mathbf{A}_{i}(q_i)$ denotes the homogeneous transformation matrix between adjacent joint coordinate systems determined by the $i$-th joint variable. Further combining the completed TCP calibration result, the pose of the tool center point coordinate system {TCP} in the base coordinate system {B} can be expressed as
$${}^{B}\mathbf{T}_{TCP} = {}^{B}\mathbf{T}_{E}\,{}^{E}\mathbf{T}_{TCP} \tag{3}$$
Here, ${}^{E}\mathbf{T}_{TCP}$ denotes the fixed transformation from the end-flange coordinate system to the tool center point coordinate system. Writing ${}^{B}\mathbf{T}_{TCP}$ in block form yields
$${}^{B}\mathbf{T}_{TCP} = \begin{bmatrix} \mathbf{R} & \mathbf{p} \\ \mathbf{0}^{\mathsf{T}} & 1 \end{bmatrix} \tag{4}$$
From this, the position vector $\mathbf{p}$ and the rotation matrix $\mathbf{R}$ of the TCP in the base coordinate system can be extracted. Here, $\mathbf{p}$ represents the spatial position of the TCP origin in the base coordinate system, and $\mathbf{R}$ represents the rotational mapping from the TCP coordinate system to the base coordinate system.
Since both $\mathbf{t}_L$ and $\mathbf{u}_L$ are defined in the TCP coordinate system, they must first be mapped into the base coordinate system through $\mathbf{R}$. The position of the laser emission point in the base coordinate system is
$$\mathbf{p}_L = \mathbf{p} + \mathbf{R}\,\mathbf{t}_L \tag{5}$$
The laser direction vector in the base coordinate system can be expressed as
$$\mathbf{u}_B = \mathbf{R}\,\mathbf{u}_L \tag{6}$$
Accordingly, the spatial coordinates of the laser incident point (LIP) are given by
$$\mathbf{P}_{LIP} = \mathbf{p}_L + d\,\mathbf{u}_B \tag{7}$$
Further rearrangement gives
$$\mathbf{P}_{LIP} = \mathbf{p} + \mathbf{R}\left(\mathbf{t}_L + d\,\mathbf{u}_L\right) \tag{8}$$
This model has a simple algebraic structure. For each aligned target point, the 3D coordinate is obtained from the current TCP pose, the calibrated laser geometry, and one scalar ranging value, without disparity-map generation, point-cloud reconstruction, or structured-light stripe extraction. To further clarify the sensitivity of this model, Equation (8) can be interpreted from a first-order perturbation perspective. The LIP coordinate error can be approximately expressed as
$$\delta\mathbf{P}_{LIP} \approx \delta\mathbf{p} + \delta\mathbf{R}\left(\mathbf{t}_L + d\,\mathbf{u}_L\right) + \mathbf{R}\left(\delta\mathbf{t}_L + \mathbf{u}_L\,\delta d + d\,\delta\mathbf{u}_L\right) \tag{9}$$
This expression indicates that TCP translational error, laser-offset error, and ranging noise are directly propagated to the final LIP coordinate, whereas TCP rotational error and laser-direction error may be amplified by the offset-distance term. Therefore, calibration consistency, stable ranging, and a feasible laser incidence condition are critical for maintaining the localization accuracy of the proposed point-laser-constrained model.
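As an illustration of the closed-form relation in Equation (8), the following minimal Python sketch computes the laser incident point from a TCP position, a TCP rotation matrix, a laser offset, a laser direction, and one ranging value. The function names (`lip_from_pose`, `rot_x`, `mat_vec`) and all numerical values are hypothetical and chosen only for illustration; they are not the calibration results of the experimental platform. The last lines perturb the TCP rotation by 0.1° to show how a small rotational error is scaled by the offset-distance term.

```python
import math

def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def rot_x(angle_rad):
    """Rotation matrix about the x-axis, used here only to build example poses."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def lip_from_pose(p, R, t_laser, u_laser, d):
    """Laser incident point in the base frame: p + R (t_laser + d * u_laser)."""
    norm = math.sqrt(sum(c * c for c in u_laser))
    u = [c / norm for c in u_laser]                      # enforce unit length
    in_tcp = [t_laser[i] + d * u[i] for i in range(3)]   # point in the TCP frame
    in_base = mat_vec(R, in_tcp)                         # rotate into the base frame
    return [p[i] + in_base[i] for i in range(3)]

# Hypothetical values in millimetres (not taken from the experiments).
p = [400.0, 50.0, 300.0]        # TCP position in the base frame
R = rot_x(math.pi)              # tool z-axis pointing towards the work plane
t_laser = [0.0, 30.0, 20.0]     # laser emission point in the TCP frame
u_laser = [0.0, 0.0, 1.0]       # laser direction in the TCP frame
d = 255.0                       # one scalar ranging value

P = lip_from_pose(p, R, t_laser, u_laser, d)   # approximately [400, 20, 25]

# A 0.1 degree TCP rotation error about x moves the computed LIP by about 0.48 mm,
# because the rotation acts on the whole lever arm (t_laser + d * u_laser).
R_err = rot_x(math.pi + math.radians(0.1))
shift = math.dist(P, lip_from_pose(p, R_err, t_laser, u_laser, d))
```

In this sketch the lever arm is about 277 mm long, so the rotational term dominates the sub-millimetre budget well before the ranging noise does, which is consistent with the qualitative reading of the perturbation expression.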
2.3. Dual-Stage Stabilization for Ranging and Spatial Coordinate Outputs
After the 3D computation model is established, fluctuations in the ranging signal and small variations in the measurement pose may still lead to unstable LIP outputs. Since the LIP coordinate is computed through the coordinate chain involving the TCP pose, laser offset, laser direction, and distance value, the final spatial fluctuation is not only a direct copy of the ranging noise. Residual pose variation, extrinsic-parameter inconsistency, and occasional abnormal ranging values may appear as spatial jitter after coordinate transformation.
Therefore, a dual-stage stabilization scheme is introduced, in which ranging-level filtering and spatial-output smoothing are used for different purposes. The distance-level filter suppresses random fluctuations in the raw ranging sequence, whereas the spatial-level smoothing improves the temporal consistency of the final LIP coordinate sequence after coordinate transformation. Thus, the two stages are not repeated filtering of the same signal. Specifically, the Kalman filter operates on the scalar point-laser ranging sequence $d$ and outputs a stabilized distance value, whereas the sliding-window averaging is applied to the transformed three-dimensional LIP coordinate sequence $\mathbf{P}_{LIP}$ to suppress short-term spatial fluctuations before the final coordinate output. The filter settings were selected for the quasi-static post-alignment measurement stage, prioritizing output stability while keeping the delay acceptable for coordinate acquisition rather than high-speed seam tracking. A residual-threshold mechanism is further introduced to reduce the influence of occasional abnormal observations [26,27].
It should be noted that this serial stabilization design is an implementation choice for the current pre-localization workflow. It is not intended to prove that the proposed filtering structure is universally superior to a single-stage three-dimensional filter or an extended Kalman filter. A systematic comparison with alternative filtering structures should be further investigated in future work.
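The serial structure described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used on the platform: the constant-distance process model, the noise variances `q` and `r`, the gate of three standard deviations, the window length, and the placement of the residual threshold inside the distance-level filter are all assumptions, and the class names are hypothetical.

```python
from collections import deque

class ScalarKalman:
    """Scalar Kalman filter for a quasi-static distance, with a residual gate."""

    def __init__(self, q=1e-3, r=0.25, gate=3.0):
        self.q, self.r, self.gate = q, r, gate   # process var, measurement var, gate (sigma)
        self.x = None                            # distance estimate (mm)
        self.p = 1.0                             # estimate variance

    def update(self, z):
        if self.x is None:                       # initialise on the first reading
            self.x = z
            return self.x
        self.p += self.q                         # predict with a constant-distance model
        s = self.p + self.r                      # innovation variance
        residual = z - self.x
        if residual * residual > self.gate ** 2 * s:
            return self.x                        # abnormal reading: keep the prediction
        k = self.p / s                           # Kalman gain
        self.x += k * residual
        self.p *= 1.0 - k
        return self.x

class SlidingMean3D:
    """Sliding-window mean applied to the transformed 3D coordinate sequence."""

    def __init__(self, window=5):
        self.buf = deque(maxlen=window)

    def update(self, point):
        self.buf.append(tuple(point))
        n = len(self.buf)
        return tuple(sum(pt[i] for pt in self.buf) / n for i in range(3))

# Synthetic ranging sequence (mm); 300.0 plays the role of an abnormal observation.
raw_d = [255.0, 255.4, 254.7, 300.0, 255.1, 254.9]
kf = ScalarKalman()
smoother = SlidingMean3D(window=3)
filtered_d, smoothed_points = [], []
for z in raw_d:
    d_hat = kf.update(z)                         # stage 1: distance-level filtering
    filtered_d.append(d_hat)
    point = (400.0, 20.0, 280.0 - d_hat)         # placeholder for the LIP computation
    smoothed_points.append(smoother.update(point))  # stage 2: spatial-level smoothing
```

A gate of this kind suits the quasi-static stage only: if the true distance changed abruptly, every subsequent reading would be rejected, so a tracking application would need a reset or a different outlier rule.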
2.4. Target Alignment and Measurement Procedure
A spatial computation model and stabilization strategy alone are still insufficient for practical measurement, because subsequent ranging and coordinate computation are meaningful only when the point laser effectively falls within the target region. Accordingly, this subsection explains why a visual alignment step is still required in the proposed method and where this process is positioned within the overall framework.
To ensure that point-laser ranging acts on the target region, an image-error-based target alignment process is introduced during task execution; its basic idea falls within the typical framework of image-based visual servoing (IBVS) [28]. After the blue target point and the laser spot are identified in the binocular images, the pixel error between them is used to drive incremental planar correction of the manipulator end effector until the laser spot stably falls near the target, after which ranging and 3D coordinate computation are performed. It should be emphasized that visual alignment ensures the coincidence between the laser spot and the target region, but it does not by itself eliminate the influence of laser incidence angle on ranging accuracy.
This process is used mainly as a pre-measurement step rather than as the core method emphasized in this paper. Its role is to create stable conditions for point-laser ranging and 3D computation and to support the complete closed-loop workflow from target recognition to trajectory execution in task-level validation. In terms of image processing, target-point detection mainly relies on color and region features, whereas laser-spot extraction is based on high-intensity response and local-extremum features. Considering that local specular reflections may occur on metallic surfaces, the pre-measurement alignment stage reduces the influence of non-target bright regions by restricting the region of interest, using the color prior of the target marker, and constraining the expected laser-spot location. These measures are intended to suppress occasional non-target highlights during pre-arc alignment, but they do not fully solve multi-path reflection or strong specular reflection on metallic surfaces. If strong reflection or large oblique incidence cannot be avoided during pre-measurement alignment, the computed coordinate should be interpreted as having increased uncertainty.
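The pixel-error-driven correction can be illustrated with a simple proportional loop. The sketch below is an idealized simulation under explicit assumptions: the image axes are taken to be aligned with the planar motion axes, a single hypothetical scale `mm_per_px` stands in for the calibrated camera-to-robot mapping, and the gain and pixel tolerance are illustrative values rather than the settings used in the experiments.

```python
import math

def alignment_step(target_px, spot_px, gain=0.5, mm_per_px=0.2, tol_px=2.0):
    """Return the planar correction (mm), the pixel error, and a convergence flag."""
    ex = target_px[0] - spot_px[0]
    ey = target_px[1] - spot_px[1]
    err = math.hypot(ex, ey)
    if err <= tol_px:
        return (0.0, 0.0), err, True             # close enough: trigger ranging
    return (gain * mm_per_px * ex, gain * mm_per_px * ey), err, False

def simulate_alignment(target_px, spot_px, mm_per_px=0.2, max_iter=50):
    """Idealized loop: each planar move shifts the laser spot by move / mm_per_px."""
    spot = list(spot_px)
    err = float("inf")
    for iteration in range(max_iter):
        (dx, dy), err, done = alignment_step(target_px, spot, mm_per_px=mm_per_px)
        if done:
            return spot, iteration, err
        spot[0] += dx / mm_per_px                # simulated response of the spot
        spot[1] += dy / mm_per_px
    return spot, max_iter, err

# Hypothetical pixel coordinates of the target marker and the laser spot.
final_spot, n_moves, final_err = simulate_alignment((320.0, 240.0), (280.0, 270.0))
```

With a gain of 0.5 the pixel error halves at every move, so the initial 50-pixel error drops below the 2-pixel tolerance after five corrections. On a real platform the image-to-motion mapping is not a pure scale, which is one reason a conservative gain is preferable in an incremental scheme of this kind.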
2.5. Experimental Design
This section addresses how the proposed method should be validated. Since the method includes a 3D computation model, a dual-stage stabilization strategy, and a pre-measurement alignment step, the experimental design should correspondingly evaluate the feasibility of coordinate computation, the improvement in output consistency, the role of point-laser ranging on the current platform, and the executability of the complete workflow at the task level.
2.5.1. Experimental Platform and Calibration Method
The experimental platform consists of a DECXIN-SM-2322V1 binocular camera, a Ruixing GJD-01 point-laser ranging module, and a FAIR Innovation FR3 six-degree-of-freedom collaborative manipulator. Binocular calibration was performed with the MATLAB Stereo Camera Calibrator [29], using a 10 × 7 checkerboard with a square size of 21 mm. The average reprojection error of the system was 0.2836 pixels, calculated as the mean pixel deviation between the detected checkerboard corner points and the reprojected points after calibration.
The purpose of constructing this experimental platform was to provide unified conditions for observation, ranging, and execution of the fusion-sensing method. Because this paper focuses on start-point pre-localization, the platform design emphasizes the clarity of coordinate mapping relationships and experimental repeatability rather than reproducing all disturbances in real welding as completely as possible.
The reference coordinates used in the 3D localization experiments were obtained in the robot base coordinate system after TCP calibration and workpiece–plane calibration. The marked target points on the working plane were contacted or taught using the calibrated TCP, and the corresponding robot-base coordinates were recorded as reference coordinates for error evaluation. Therefore, these reference coordinates should be understood as platform-level reference values rather than metrology-grade absolute ground truth. Possible uncertainty introduced by TCP calibration, contact judgment, and workpiece–plane calibration is included in the experimental limitation of this study.
2.5.2. Evaluation Metrics
To comprehensively evaluate the method, three categories of metrics are adopted. The first is spatial computation accuracy, including the 3D Euclidean error and the error at independent validation points. The second is stabilization performance, including ranging fluctuation metrics under static and dynamic conditions and the convergence of spatial point clouds. The third is task-level performance, including closed-loop planar error, success rate, and the execution performance of the simulated welding task.
For the point-laser-constrained 3D computation results, the 3D Euclidean error is used to measure the spatial deviation between the computed point and the reference point, defined as
$$E_{3D} = \left\lVert \mathbf{P}_{LIP} - \mathbf{P}_{ref} \right\rVert_2 \tag{10}$$
Here, $\mathbf{P}_{LIP}$ denotes the computed 3D coordinates of the laser incident point, and $\mathbf{P}_{ref}$ denotes the reference coordinate of the target point in the robot base coordinate system. In this study, $\mathbf{P}_{ref}$ is used as a platform-level reference obtained after TCP and workpiece–plane calibration, rather than as an absolute metrology-grade ground truth.
For the binocular-depth comparison experiment, a linear correction model is used to provide first-order compensation for the raw binocular-predicted depth:
$$Z_{c} = a\,Z_{raw} + b \tag{11}$$
Here, $Z_{raw}$ denotes the raw binocular-predicted depth, $Z_{c}$ denotes the corrected predicted depth, and $a$ and $b$ are the scale factor and the bias term, respectively.
Furthermore, the root mean square error (RMSE) is used to evaluate the overall error level of the binocular depth-measurement results, defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Z_{c,i} - Z_{ref,i}\right)^{2}} \tag{12}$$
Here, $Z_{c,i}$ is the corrected predicted depth of the $i$-th sample, $Z_{ref,i}$ is the corresponding reference depth obtained from the calibrated experimental setup, and $N$ is the number of samples.
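The three quantities defined in this subsection can be computed with a few lines of standard-library Python. The sketch below assumes that the scale factor and bias of the linear correction are obtained by ordinary least squares, which the text does not specify; the function names are hypothetical, and the depth values are synthetic numbers generated only to exercise the code, not measurements from the experiments.

```python
import math

def euclidean_error_3d(p_computed, p_reference):
    """3D Euclidean error between a computed point and its reference (same units)."""
    return math.dist(p_computed, p_reference)

def fit_linear_correction(z_raw, z_ref):
    """Ordinary least-squares fit of z_ref ~ a * z_raw + b (an assumed fitting rule)."""
    n = len(z_raw)
    mean_x = sum(z_raw) / n
    mean_y = sum(z_ref) / n
    sxx = sum((x - mean_x) ** 2 for x in z_raw)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(z_raw, z_ref))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def rmse(predicted, reference):
    """Root mean square error over paired samples."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)

# Synthetic example: raw depths with an artificial scale error and negative bias.
z_ref = [500.0, 600.0, 700.0, 800.0, 900.0, 1000.0]
z_raw = [0.9 * z - 10.0 for z in z_ref]

a, b = fit_linear_correction(z_raw, z_ref)
z_corrected = [a * z + b for z in z_raw]
rmse_raw = rmse(z_raw, z_ref)
rmse_corrected = rmse(z_corrected, z_ref)
e3d = euclidean_error_3d((401.0, 21.0, 26.0), (400.0, 20.0, 25.0))
```

Because the synthetic bias here is exactly linear, the corrected RMSE collapses to numerical zero; real data would leave a residual that reflects the non-systematic part of the depth error.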
These metrics correspond to the three key concerns of this work: whether the 3D coordinates can be obtained with sufficient accuracy, whether the obtained results exhibit sufficient consistency, and whether they can genuinely support subsequent motion execution within the complete workflow.
2.5.3. Comparison and Validation Settings
The experimental settings include (1) verification of 3D computation accuracy under near-normal incidence, oblique incidence, and same-point-different-pose (SPDP) conditions, where the oblique-incidence test is used to reveal the sensitivity and boundary condition of the point-laser ranging constraint; (2) analysis of distance- and spatial-level stabilization performance under the current filtering configuration; (3) analysis of passive binocular depth measurement as a platform-level baseline, used to evaluate whether binocular depth alone is sufficient for the present task; and (4) target alignment and simulated fillet-weld task validation, used to verify the executability of the proposed workflow. The experimental scenario is shown in Figure 5.
These experiments are not simply juxtaposed; instead, they are progressively organized according to the logic of “the model is solvable—the output consistency is improved—the point-laser distance constraint is practically needed on the current platform—the workflow can be executed in closed loop.” The first three categories mainly serve to analyze the effectiveness of the method, whereas the last category connects the preceding stages and demonstrates that the proposed approach, as a fusion-sensing pipeline, has a practical basis for operation.
3. Results and Discussion
According to the method design and experimental arrangement described above, this section analyzes the results in the order of “basic computation capability—output-consistency improvement—practical need for point-laser constraints—task-level workflow validation.” This organization not only corresponds to the specific issues addressed by each module when the method was proposed but also helps avoid mixing experimental conclusions from different levels.
3.1. Accuracy of Point-Laser-Constrained 3D Computation
This subsection mainly discusses the performance of the 3D computation model itself under different conditions, with emphasis on whether the model provides acceptable basic computation capability and which factors affect its error.
Under approximately normal laser incidence, the same target point was measured three consecutive times. The three 3D Euclidean errors were 1.72 mm, 1.73 mm, and 1.73 mm, respectively, with a range of only 0.01 mm, as shown in Table 2, indicating good repeatability of the system.
Under oblique incidence, although the TCP coordinates remained essentially unchanged, the measured distance still fluctuated between 253 and 257 mm, causing a clear shift in the Z component of the LIP result. Consequently, the 3D error increased from 1.36 mm to 4.35 mm, as shown in Table 3, indicating that oblique incidence on metallic surfaces significantly aggravates ranging instability. This result reveals a boundary condition under which the point-laser ranging constraint becomes less reliable. Therefore, in practical use, the measurement pose should be planned to keep the laser beam approximately normal to the local work surface whenever possible, or the output should be assigned a larger uncertainty under large oblique incidence.
In the same-point-different-pose (SPDP) experiment, the 3D Euclidean errors of the 10 measurements ranged from 1.21 to 2.35 mm, with a mean error of 1.82 mm, as shown in Table 4. The relatively concentrated error range indicates that the repeated measurements under different poses maintained acceptable repeatability on the current platform, although the Z-axis error remained the dominant source of deviation. After least-squares optimization of the extrinsic parameters based on these 10 samples, the average 3D error under local conditions could be further reduced to about 1.0 mm. These results indicate that the point-laser-constrained model has good basic computation capability under feasible measurement-pose conditions, but the final accuracy is strongly affected by extrinsic-parameter consistency, pose consistency, incidence condition, and ranging stability. In other words, the proposed method can form an effective 3D solution path for the current pre-localization task, provided that the laser measurement is conducted under an acceptable incidence condition. Its performance is still jointly constrained by installation parameters, measurement pose, surface reflectance, and calibration coverage.
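The text does not state how the least-squares optimization of the extrinsic parameters was formulated. One possible formulation, shown here purely as a sketch, exploits the fact that for a fixed reference point the model is linear in the laser offset and direction: rotating the residual between the reference point and the TCP position back into the TCP frame gives a vector that equals the offset plus the distance times the direction, so each axis reduces to a straight-line fit against the measured distance. All names (`fit_laser_extrinsics`, `rot_x`) and the synthetic poses are hypothetical.

```python
import math

def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose_mul(R, v):
    """Compute R^T v for a 3x3 matrix R given as a list of rows."""
    return [sum(R[j][i] * v[j] for j in range(3)) for i in range(3)]

def rot_x(angle_rad):
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def fit_laser_extrinsics(samples, p_ref):
    """Fit offset t and direction u from samples (p, R, d) of one reference point.

    Each sample satisfies R^T (p_ref - p) = t + d * u, which is a line in d per axis.
    The direction is returned without renormalization so its norm can be inspected.
    """
    ds = [d for _, _, d in samples]
    ys = [transpose_mul(R, [p_ref[k] - p[k] for k in range(3)]) for p, R, _ in samples]
    n = len(ds)
    d_mean = sum(ds) / n
    sdd = sum((d - d_mean) ** 2 for d in ds)
    if sdd == 0.0:
        raise ValueError("distances must vary, otherwise t and u cannot be separated")
    t_fit, u_fit = [], []
    for k in range(3):
        y_mean = sum(y[k] for y in ys) / n
        slope = sum((d - d_mean) * (y[k] - y_mean) for d, y in zip(ds, ys)) / sdd
        u_fit.append(slope)
        t_fit.append(y_mean - slope * d_mean)
    return t_fit, u_fit

# Synthetic, noise-free check with 10 poses observing the same reference point.
true_t, true_u = [0.0, 30.0, 20.0], [0.0, 0.0, 1.0]
p_ref = [400.0, 20.0, 25.0]
samples = []
for i in range(10):
    R = rot_x(math.pi + math.radians(2.0 * i - 9.0))
    d = 240.0 + 3.0 * i
    offset = mat_vec(R, [true_t[k] + d * true_u[k] for k in range(3)])
    samples.append(([p_ref[k] - offset[k] for k in range(3)], R, d))

t_fit, u_fit = fit_laser_extrinsics(samples, p_ref)
```

The sketch also makes an identifiability condition explicit: the samples must span a range of distances, otherwise the offset and the direction cannot be separated. A fit on only 10 local samples reduces the error near those samples and should not be read as a global calibration.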
3.2. Analysis of Dual-Stage Stabilization Performance
This subsection focuses more on whether the results can be computed stably under continuous measurement. Accordingly, the analysis centers on suppression of distance fluctuations and improvement in the convergence of spatial point locations.
The static ranging results are shown in Figure 6 and Table 5. Figure 6 shows that the raw ranging sequence contains clear random fluctuations; sliding-window averaging provides a certain smoothing effect, whereas the output of Kalman filtering is more stable. Table 5 further shows that Kalman filtering achieves the minimum values for both the standard deviation and the standard deviation of first differences, indicating stronger suppression of random fluctuations under static conditions. The drift slope a is obtained by linearly fitting the distance sequence with respect to time, and its unit is mm·s⁻¹. A smaller absolute value of a indicates weaker slow drift during static measurement, which is beneficial for maintaining a stable ranging constraint before computing the LIP coordinate. It should be noted that this drift slope is used only to characterize slow temporal drift in the ranging sequence, rather than final welding localization accuracy.
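For reference, the three static indicators named here (standard deviation, standard deviation of first differences, and drift slope from a linear fit against time) can be computed as in the following sketch. The use of the sample standard deviation and the function name are assumptions, and the short sequence is synthetic.

```python
import statistics

def static_ranging_metrics(times_s, dist_mm):
    """Standard deviation, std of first differences, and linear drift slope (mm/s)."""
    diffs = [b - a for a, b in zip(dist_mm, dist_mm[1:])]
    t_mean = statistics.fmean(times_s)
    d_mean = statistics.fmean(dist_mm)
    stt = sum((t - t_mean) ** 2 for t in times_s)
    std_cov = sum((t - t_mean) * (d - d_mean) for t, d in zip(times_s, dist_mm))
    return {
        "std_mm": statistics.stdev(dist_mm),
        "std_first_diff_mm": statistics.stdev(diffs),
        "drift_slope_mm_per_s": std_cov / stt,   # least-squares slope of distance vs. time
    }

# Synthetic sequence with a pure 0.1 mm/s drift and no random noise.
times_s = [0.0, 1.0, 2.0, 3.0, 4.0]
dist_mm = [255.0, 255.1, 255.2, 255.3, 255.4]
metrics = static_ranging_metrics(times_s, dist_mm)
```

The example separates the two effects on purpose: a pure drift yields a nonzero standard deviation and slope but an almost zero standard deviation of first differences, which is why the first-difference statistic is a useful indicator of sample-to-sample jitter.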
The dynamic ranging results are shown in Figure 7 and Table 6. Figure 7 indicates that during continuous distance variation, sliding-window averaging exhibits obvious response lag, whereas Kalman filtering can follow the true variation trend more effectively. Table 6 further provides a quantitative comparison between sliding-window averaging and Kalman filtering. Under the current parameter setting, Kalman filtering shows a lower dynamic tracking error and a shorter equivalent time delay than sliding-window averaging, while maintaining a slightly higher slope-retention coefficient. However, the equivalent time delay of 1.43 s also indicates that the filtered result is more suitable for post-alignment quasi-static coordinate output than for high-speed online seam tracking.
The spatial-level stabilization results are shown in Figure 8 and Table 7. As Figure 8 shows, after three-dimensional Kalman filtering the LIP point cloud converges markedly and its spatial dispersion is substantially reduced. Table 7 further quantifies this effect in terms of the 3D confidence ellipsoid volume. The reduced ellipsoid volume indicates improved convergence of the LIP point cloud under the current static measurement condition. This result supports the usefulness of spatial-level smoothing for improving output consistency, but it should not be interpreted as a complete comparison with all possible single-stage 3D filtering or EKF-based fusion schemes.
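For readers unfamiliar with the indicator, the following sketch shows one common way to compute a 3D confidence ellipsoid volume from a point cloud, V = (4/3)·π·c^(3/2)·sqrt(det Σ), where Σ is the sample covariance and c is a chi-square quantile with 3 degrees of freedom. The 95% confidence level and the synthetic points are assumptions; the confidence level used for Table 7 is not restated here.

```python
import math

CHI2_95_3DOF = 7.815  # 95% quantile of the chi-square distribution, 3 dof

def covariance_3d(points):
    """Sample covariance matrix (3x3 nested list) of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        dev = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def ellipsoid_volume(points, chi2=CHI2_95_3DOF):
    """Volume (mm^3) of the confidence ellipsoid of a 3D point cloud."""
    det = det3(covariance_3d(points))
    return 4.0 / 3.0 * math.pi * chi2 ** 1.5 * math.sqrt(max(det, 0.0))

# Synthetic stand-in clouds: cube corners at +/-2 mm and at +/-1 mm.
raw = [(x, y, z) for x in (-2.0, 2.0) for y in (-2.0, 2.0) for z in (-2.0, 2.0)]
filt = [(0.5 * x, 0.5 * y, 0.5 * z) for x, y, z in raw]
v_raw = ellipsoid_volume(raw)
v_filt = ellipsoid_volume(filt)
print(f"raw: {v_raw:.1f} mm^3, filtered: {v_filt:.1f} mm^3")
```

Because the volume scales with the cube of the linear dispersion, halving the scatter along every axis reduces the volume by a factor of eight, so the indicator is sensitive to modest improvements in convergence.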
Overall, the stabilization results should be interpreted within the task boundary of this paper. The proposed workflow focuses on pre-arc target-point localization, where the final coordinate is computed after visual alignment and short-term measurement, rather than on continuous high-speed seam tracking during welding. Therefore, the introduced filtering strategy is mainly used to improve the stability of the coordinate output in a quasi-static measurement stage. For applications requiring real-time tracking with strict latency constraints, filter-parameter optimization, delay compensation, or alternative filtering structures should be further evaluated.
3.3. Applicability of Binocular Depth Measurement and Platform-Level Baseline Analysis
To evaluate the applicability of pure binocular depth measurement at the task scale considered in this paper and to clarify the necessity of introducing active distance observation on the current platform, this subsection analyzes the binocular depth-measurement results. The geometric basis of binocular depth estimation is the triangulation relationship between corresponding rays in a calibrated binocular system [30].
To evaluate whether passive binocular depth estimation alone is sufficient on the current experimental platform, a binocular depth-measurement experiment was designed. This comparison is used as a platform-level baseline for the present task, rather than as a comprehensive benchmark against optimized industrial 3D sensing systems. Distance points were set every 50 mm within the range of 500–1000 mm, and 10 pairs of binocular images were acquired at each distance, giving a total of 110 samples. The raw triangulation results exhibited a clear systematic negative bias, and the overall RMSE reached 67.80 mm. Figure 9 further shows that the raw binocular depth estimates lie overall below the ideal reference line and that the deviation accumulates continuously as the distance increases. Although the error is markedly reduced after linear correction, the RMSE on the independent validation set remains 6.16 mm. This residual error is related to the current platform configuration, including camera resolution, baseline configuration, calibration accuracy, working distance, and image-feature localization uncertainty. Therefore, under the current hardware and calibration conditions, passive binocular depth estimation alone is still insufficient to meet the millimeter-level requirement of the present pre-localization task. This result supports the introduction of point-laser ranging as a one-dimensional active distance constraint in the proposed workflow.
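The linear-correction step can be sketched as follows, assuming an ordinary least-squares line fitted on one subset and evaluated on a held-out subset. The bias model, the perturbation, and the even/odd split are synthetic stand-ins chosen for illustration; they do not reproduce the measured data or the actual partition used in the experiment.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y ~ k * x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    k = sxy / sxx
    return k, my - k * mx

def rmse(est, ref):
    """Root-mean-square error between estimates and reference values."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref))

# Synthetic stand-in data: 11 distances x 10 repeats = 110 samples.
truth = [500.0 + 50.0 * i for i in range(11) for _ in range(10)]
# Hypothetical negative bias that grows with distance, plus a small perturbation.
raw = [0.9 * z - 10.0 + 2.0 * math.sin(k) for k, z in enumerate(truth)]

# Hypothetical split: even indices for fitting, odd indices for validation.
fit_idx = range(0, len(truth), 2)
val_idx = range(1, len(truth), 2)
k, b = fit_line([raw[i] for i in fit_idx], [truth[i] for i in fit_idx])

val_truth = [truth[i] for i in val_idx]
val_raw = [raw[i] for i in val_idx]
val_corr = [k * z + b for z in val_raw]
rmse_raw = rmse(val_raw, val_truth)
rmse_corr = rmse(val_corr, val_truth)
print(f"validation RMSE raw={rmse_raw:.2f} mm, corrected={rmse_corr:.2f} mm")
```

A linear correction removes scale and offset bias but cannot remove the sample-to-sample scatter, which is why a residual error remains on the validation set even after correction.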
3.4. Workflow Validation and Discussion of Results
Beyond 3D computation accuracy, stabilization performance, and the comparison experiments, it is also necessary to examine how the individual components operate together within the complete workflow. Accordingly, this subsection validates the overall executability of the proposed method for welding start-point pre-localization by combining target alignment, 3D coordinate acquisition, and trajectory execution.
In 10 independent target-alignment experiments, the system successfully converged in all cases, yielding a success rate of 100%. As shown in Table 8, the final mean pixel error was 2.494 px and the corresponding mean planar error was 1.124 mm, with standard deviations of 0.243 px and 0.109 mm, respectively. Furthermore, as shown in Figure 10, the final errors of the 10 experiments were all concentrated within a small range, indicating that the target-alignment process has good convergence stability and repeatability and can provide stable measurement conditions for subsequent ranging and 3D coordinate computation.
At the scale of the working plane, if the locally optimal extrinsic parameters obtained by fitting a single point under multiple poses are directly applied to the entire workspace, systematic offsets with regular patterns occur. To improve the applicability of the extrinsic parameters over the whole plane, nine representative points were uniformly selected on the working plane, and each point was measured three times, yielding 27 samples for least-squares re-estimation; their spatial distribution is shown in Figure 11. The mean measurement results of the correction points under the original extrinsic parameters are listed in Table 9, from which it can be seen that systematic deviations of varying magnitude exist at different positions. After re-estimation, the corrected extrinsic offset was (46.27, −8.87, 126.95) mm.
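The idea of workplane-level least-squares re-estimation can be sketched under a simplified, hypothetical measurement model: the target point in the base frame is p = p_tcp + R·(t + d·u), where R is the TCP rotation, t is the unknown extrinsic offset, u is a known unit laser direction in the tool frame, and d is the measured distance. This model, the symbols, and all numerical values below are assumptions for illustration and may differ from the formulation and data used in this paper.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def mat_t_vec(R, v):
    """Multiply the transpose of a 3x3 matrix by a 3-vector."""
    return [sum(R[j][i] * v[j] for j in range(3)) for i in range(3)]

def lip_point(p_tcp, R, t, u, d):
    """Hypothetical forward model: p = p_tcp + R (t + d u), in the base frame."""
    local = [t[i] + d * u[i] for i in range(3)]
    world = mat_vec(R, local)
    return [p_tcp[i] + world[i] for i in range(3)]

def estimate_offset(samples, u):
    """Least-squares estimate of t from (p_tcp, R, d, p_ref) samples.

    Since every R is orthonormal, minimizing the summed squared residual
    reduces to averaging R^T (p_ref - p_tcp) - d u over all samples.
    """
    acc = [0.0, 0.0, 0.0]
    for p_tcp, R, d, p_ref in samples:
        diff = [p_ref[i] - p_tcp[i] for i in range(3)]
        local = mat_t_vec(R, diff)
        for i in range(3):
            acc[i] += local[i] - d * u[i]
    return [a / len(samples) for a in acc]

# Synthetic check: 9 grid points x 3 repeats = 27 noise-free samples.
t_true = [40.0, -10.0, 120.0]            # hypothetical offset in mm
u = [0.0, 0.0, 1.0]                      # hypothetical laser direction
rotations = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # identity
    [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # 90 deg about z
]
samples = []
for gx in (0.0, 100.0, 200.0):
    for gy in (0.0, 100.0, 200.0):
        for rep in range(3):
            p_tcp = [gx, gy, 300.0]
            R = rotations[rep % 2]
            d = 150.0 + rep
            samples.append((p_tcp, R, d, lip_point(p_tcp, R, t_true, u, d)))

t_hat = estimate_offset(samples, u)
print([round(v, 3) for v in t_hat])
```

With real measurements the reference points carry noise, so the averaged estimate trades local optimality at any single point for smaller systematic offsets over the whole plane, which is the motivation for distributing the correction points across the working area.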
Five independent validation points that were not involved in the fitting were further used for testing, and their distribution on the working plane is shown in Figure 12. The corresponding validation results are summarized in Table 10. All point errors were below 2 mm, with a mean error of 1.54 mm and a standard deviation of 0.28 mm. These results indicate that, after workplane-level extrinsic correction, the proposed method not only has good computation capability under local pose conditions but also exhibits good global applicability over the entire target measurement area.
Finally, a red marker was used in place of a real welding torch, and blue feature points on the working plane were used to simulate visible start and end target indications in a pre-weld scenario. The system sequentially completed target indication acquisition, target alignment, LIP 3D computation, and MoveL trajectory execution. Representative results of the simulated fillet-weld task are shown in Figure 13, including two typical task forms: straight-line connection between two points and multi-segment path drawing with turning features. As shown in Figure 13, the manipulator end effector can perform continuous and stable trajectory motion according to the target-point coordinates obtained through visual guidance. The traces left by the red marker are clear, the start and end positions are relatively accurate, and the path direction is basically consistent with the target weld geometry. These results indicate that the basic coordinate-guided workflow, from target indication to alignment, 3D coordinate computation, and trajectory execution, is executable for welding start-point pre-localization.
The low-complexity characteristic of the proposed method should be interpreted at the task level. Compared with binocular/depth reconstruction or structured-light/line-laser seam profiling, the proposed method does not require stereo matching, dense depth-map generation, point-cloud reconstruction, laser-stripe extraction, laser-plane profile reconstruction, or complete seam-profile fitting. For each welding target point, the 3D output is obtained from the current TCP pose, the calibrated point-laser geometric parameters, and one scalar distance measurement after visual pre-alignment. Therefore, the method reduces the sensing-data dimension and shortens the 3D processing chain, which can reduce computational burden and simplify deployment for discrete pre-arc target-point localization. This advantage is task-specific and does not imply that the proposed method can replace structured-light or line-laser systems in tasks requiring complete seam morphology recovery or continuous path extraction.
Several limitations should be noted. First, the proposed stabilization strategy is designed for quasi-static pre-measurement after visual alignment. Although it improves the consistency of the coordinate output in the current experiments, the introduced delay indicates that it should not be directly interpreted as a high-speed tracking filter. Its relative performance compared with single-stage 3D filtering or EKF-based fusion still requires further investigation. Second, the reference coordinates used in the validation experiments are platform-level reference values obtained within the calibrated robot–workpiece coordinate framework, rather than metrology-grade absolute ground truth. Third, the binocular-depth experiment is used only as a baseline on the current platform to explain the need for an active distance constraint; it does not constitute a full comparison with optimized industrial structured-light or line-laser sensing systems. Finally, the current experiments are designed to validate the proposed point-wise pre-localization workflow under controllable pre-arc conditions, rather than to reproduce all disturbances in real shipyard welding. The obtained results demonstrate the feasibility of the key technical chain, including target-to-laser alignment, point-laser-constrained 3D computation, workplane-level extrinsic correction, and coordinate-guided trajectory execution. Nevertheless, in practical shipyard environments, reflective metallic surfaces with different roughness, partial target occlusion, unstable illumination, limited workspace accessibility, smoke, spatter, and large oblique incidence may affect target detection, laser-spot extraction, and ranging reliability. The oblique-incidence experiment in this study has already shown that unfavorable incidence conditions can increase localization error. Therefore, more challenging shipyard-like experiments are still required before practical deployment.
4. Conclusions
This paper addresses the pre-localization problem at the start stage of ship welding and proposes a low-complexity point-wise three-dimensional localization method that integrates 2D visual planar guidance with 1D point-laser distance constraints. Based on the TCP pose and point-laser ranging constraints, the method establishes a direct computation model for the currently aligned target point in the robot base coordinate system and realizes spatial coordinate acquisition of discrete welding target points without relying on complete 3D reconstruction.
Experimental results show that the proposed method possesses good basic computation capability under near-normal incidence and SPDP conditions. After workplane-level extrinsic correction, the independent validation points yielded a mean 3D Euclidean error of 1.54 mm with a standard deviation of 0.28 mm on the current experimental platform. The distance- and spatial-level stabilization strategy improves the consistency of the point-output results under quasi-static pre-measurement conditions. The average planar error of the closed-loop alignment experiments was 1.124 mm, validating the supporting role of visual pre-alignment in the measurement workflow. Passive binocular depth measurement on the current platform still produced an RMSE of 6.16 mm after linear correction, indicating the practical need for introducing point-laser ranging as a one-dimensional distance constraint in the proposed workflow.
Task-level experiments further show that the system can complete the full workflow from target recognition and target alignment to 3D computation and trajectory execution, indicating that the proposed method can serve as a front-end perception route for pre-localization and guidance at the welding start stage under controllable illumination and feasible measurement-pose conditions. At the same time, the increased ranging error under oblique incidence and the systematic offsets observed when local extrinsic parameters are transferred to the global working plane indicate that the performance of the method is still affected by surface reflectance characteristics, laser incidence conditions, measurement-pose consistency, installation consistency, and calibration coverage.
Future work may proceed in several directions: further improving perception robustness under strong specular reflection, multi-path reflection, arc light, smoke interference, and large oblique incidence; introducing incidence-angle evaluation, surface-angle or curvature measurement, uncertainty weighting, delay compensation, and safety-constraint mechanisms; conducting system-level validation in environments closer to real shipyards; quantitatively comparing the proposed point-wise localization route with industrial structured-light or line-laser sensing systems in terms of deployment time, calibration complexity, hardware configuration, and computation load; and exploring multi-source depth-fusion methods that combine binocular priors, point-laser constraints, local surface-orientation information, and multi-pose verification to improve adaptability to complex curved surfaces, complex working conditions, and larger workspaces.