1. Introduction
Robotic welding represents a core technology in modern automotive manufacturing, where the integrity of welded joints directly influences product reliability, structural safety, and production costs [1,2,3]. In industrial practice, welding trajectories are commonly programmed offline based on nominal CAD models, assuming ideal part positioning within dedicated fixturing systems. However, under real production conditions, geometric variability of components frequently leads to deviations from nominal positioning, resulting in weld misalignment, defects, and reduced process capability [4,5].
In cylindrical assemblies such as catalytic converters, dimensional changes introduced during upstream manufacturing stages may produce deviations in wall radius, overall length, or seating contact areas with conical end caps. These geometric inconsistencies can prevent proper seating in the fixture, making nominal CAD-based trajectories incompatible with the actual geometry of the part. Similar challenges related to dimensional variability and fixture sensitivity have been reported in industrial welding systems [6,7]. As demonstrated in Section 3, the analyzed production process exhibits very low capability for wall radius (Cpk = 0.02) and marginal capability for overall length (Cpk = 0.56), confirming that geometric variability directly contributes to weld nonconformities.
To mitigate such issues, various adaptive strategies have been proposed. Seam-tracking systems based on laser or vision sensing enable real-time detection of joint position during welding [8,9]. Although effective for local correction, these systems can be sensitive to arc glare, smoke, and surface reflections, and may present limitations in circular or axisymmetric geometries. Other approaches employ machine learning or reinforcement learning techniques to optimize process parameters such as current, voltage, or travel speed [10,11]. However, in many reported implementations, the geometric trajectory itself remains unchanged.
Recent advances in deep reinforcement learning (DRL) have demonstrated the capability of data-driven agents to learn complex, nonlinear control policies in robotic systems [12,13]. DRL has been applied in manufacturing contexts for adaptive control, robotic path planning, and safe interaction in uncertain environments [14,15]. Nevertheless, most existing solutions either rely primarily on simulation-based training or focus on parameter adaptation rather than direct geometric trajectory correction derived from full-field three-dimensional measurements.
In this context, the present study proposes an integrated industrial workflow that combines structured-light 3D optical metrology with a DRL-based trajectory adaptation mechanism. Measured geometric deviations are encoded into a compact state representation, which is mapped by a trained DRL agent to translational and rotational corrections of the welding trajectory. Importantly, all corrections are computed offline, ensuring compatibility with standard industrial robot controllers and avoiding real-time computational overhead.
This approach enables the robot to adapt its trajectory to the actual geometry of each individual part, reducing sensitivity to upstream manufacturing variability and fixture wear. Instead of relying on manual adjustments or deterministic rule-based offsets, the DRL agent learns correction policies directly from production data, capturing nonlinear interactions between geometric deviations and welding outcomes.
The objective of this study is the development and experimental validation of a complete adaptive welding workflow integrating 3D scanning and deep reinforcement learning for automatic robotic trajectory correction. The method is validated using real industrial data from a production batch of 5000 components and evaluated through statistical process capability and sigma-level analysis.
The remainder of the paper is organized as follows.
Section 2 describes the 3D scanning system, the formulation of the reinforcement learning problem, and the integration of the proposed method into the industrial robot.
Section 3 presents the experimental results obtained using real industrial data.
Section 4 discusses industrial relevance, comparative advantages, and identified limitations.
Section 5 summarizes the main conclusions and outlines future research directions.
The main contributions of this work are:
(i) An integrated industrial workflow combining full-field 3D optical metrology with deep reinforcement learning for robotic welding trajectory correction.
(ii) A learning-based trajectory adaptation strategy that directly compensates for geometric deviations rather than adjusting the process parameters only.
(iii) Large-scale industrial validation on a production batch of 5000 components, including statistical process capability and sigma-level analysis.
(iv) An offline correction framework compatible with standard industrial robot controllers, avoiding real-time computational overhead.
Positioning and novelty: (i) conventional seam-tracking approaches perform local online corrections under harsh arc conditions, (ii) reinforcement learning studies primarily optimize welding parameters (current/voltage/speed) while keeping the nominal path unchanged, and (iii) deterministic rule-based geometric approaches apply fixed offset compensation and become brittle under coupled deviations. In contrast, the present work introduces an offline, part-specific trajectory adaptation stage driven by full-field 3D metrology and a DRL policy learned from real production data. The proposed framework targets global geometric/positioning deviations originating upstream (dimensional variability, seating errors, fixture wear), produces a finalized robot program compatible with standard industrial controllers, and demonstrates large-scale process improvement under industrial monitoring.
2. Materials and Methods
2.1. Three-Dimensional Scanning System and Data Acquisition
For dimensional characterization of the components, an optical 3D scanning system based on structured light (GOM ATOS Compact Scan) was employed. The system generates a high-resolution point cloud by projecting a precise pattern onto the surface of the component, enabling accurate reconstruction of the real geometry. The setup includes two high-resolution cameras, a blue-light projector, an ambient light sensor, and a rotary table used for the controlled positioning and scanning of the part.
The data acquisition process consists of the following steps:
The component is positioned on the rotary table.
A sequence of scans is performed from multiple viewing angles.
Filtering and cleaning algorithms are applied to the acquired point cloud.
The point cloud is converted into a triangulated mesh.
The mesh is aligned with the nominal CAD model using a three-stage procedure:
The resulting aligned data constitute the primary input for the machine learning system.
The measurement uncertainty of the ATOS system was below 10 µm, ensuring reliable extraction of dimensional deviations relevant for welding correction (Figure 1).
2.2. Extraction of Geometric Deviations
After aligning the point cloud with the nominal model, point-to-point deviations are computed:
$$d_i = \left\| \mathbf{p}_i^{\mathrm{scan}} - \mathbf{p}_i^{\mathrm{CAD}} \right\|$$
where $\|\cdot\|$ is the Euclidean norm, $\mathbf{p}_i^{\mathrm{scan}}$ is the scanned (measured) point, and $\mathbf{p}_i^{\mathrm{CAD}}$ is the corresponding point in the CAD model, taken as the closest point on the CAD surface.
For critical features (radius R, length L, contact area C), the deviations are aggregated:
Aggregated deviations were computed using statistical descriptors (e.g., mean and maximum absolute deviation) for each critical feature.
Figure 2 highlights deviations in the critical welding areas; these deviations represent the observable state for the DRL model.
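The deviation and aggregation steps described above can be illustrated with a minimal sketch. It assumes the CAD surface is available as a sampled list of nominal points and uses a brute-force nearest-point search; the function names and the toy coordinates are hypothetical, and a production implementation would more likely rely on the metrology software or a spatial index.

```python
import math
from statistics import mean

def point_deviations(scan_points, cad_points):
    """For each scanned point, Euclidean distance to the closest nominal point."""
    return [min(math.dist(p, q) for q in cad_points) for p in scan_points]

def aggregate(deviations):
    """Summarise one feature by mean and maximum absolute deviation."""
    abs_dev = [abs(d) for d in deviations]
    return {"mean_abs": mean(abs_dev), "max_abs": max(abs_dev)}

# Hypothetical toy data (mm): three nominal points and three measured points.
cad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
scan = [(0.0, 0.1, 0.0), (1.0, 0.0, 0.2), (2.0, 0.3, 0.0)]

dev = point_deviations(scan, cad)
summary = aggregate(dev)
print(dev, summary)
```

Taking the absolute value is redundant for unsigned distances but keeps `aggregate` usable if signed deviations (for example along the surface normal) are supplied instead.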
2.3. Dataset Generation for Learning
For training the DRL agent, each scan must be associated with the corresponding welding outcome.
Thus, for each part, the following tuple is constructed:
$$(s_t, a_t, r_t)$$
where:
$s_t$ = state, the deviation vector;
$a_t$ = action, the correction applied to the trajectory;
$r_t$ = reward, determined based on the welding quality.
In the full version, the reward can be continuous, computed as a weighted combination of $e_{\mathrm{geom}}$, $e_{\mathrm{vis}}$, and $e_{\mathrm{NDT}}$, which represent normalized error metrics derived from geometric deviation, visual inspection, and non-destructive testing, respectively; α, β, and γ are the weighting coefficients.
To ensure the stability and generalization of the learning process, 3D point clouds are not used directly as raw input to the DRL agent. Geometric information is reduced to a compact set of industrially relevant features, consisting of aggregated dimensional deviations and deviation maps aligned to the CAD model. This representation enables a direct correlation between geometric variations of the parts and the trajectory correction actions.
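As an illustration of the transition record and of a continuous reward, the sketch below stores one part as a (state, action, reward) tuple and scores weld quality as the negative weighted sum of the three normalized error metrics. The negative-sum form, the default weights, and all names are assumptions made for this example; the text above does not fix them.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]   # aggregated deviation descriptors
    action: Tuple[float, ...]  # trajectory correction applied to the part
    reward: float              # weld-quality score

def continuous_reward(e_geom, e_vis, e_ndt, alpha=0.5, beta=0.3, gamma=0.2):
    """Assumed form: negative weighted sum of error metrics normalized to [0, 1]."""
    for e in (e_geom, e_vis, e_ndt):
        if not 0.0 <= e <= 1.0:
            raise ValueError("error metrics are expected to lie in [0, 1]")
    return -(alpha * e_geom + beta * e_vis + gamma * e_ndt)

# Hypothetical part: small geometric error, clean visual and NDT results.
t = Transition(state=(0.4, -1.2, 0.0),
               action=(0.3, 0.0, -0.5),
               reward=continuous_reward(0.2, 0.0, 0.0))
print(t)
```

With this convention a defect-free weld scores 0 and the worst case scores -1, so maximizing the reward minimizes the weighted error.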
2.4. Formulation of the Reinforcement Learning Problem
The process is formalized as a Markov Decision Process (MDP), where the agent learns a continuous trajectory-compensation policy from recorded industrial transitions. The state $s_t$ consists of compact geometric deviation descriptors extracted prior to welding (Section 2.3), while the action $a_t$ represents continuous translational and rotational corrections applied to the nominal robot trajectory (Equation (11)). After executing welding with the corrected program, the resulting weld quality provides the reward $r_t$.
Since the correction actions are continuous, the policy is learned using a Soft Actor–Critic (SAC) formulation. SAC is an off-policy Actor–Critic method that optimizes a stochastic policy $\pi_\phi(a_t \mid s_t)$ while encouraging exploration through entropy regularization. The objective is to maximize the expected cumulative reward and the policy entropy:
$$J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t} \gamma^{t}\left(r_t + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right)\right]$$
where $\gamma$ is the discount factor and $\alpha$ is the entropy temperature (both distinct from the reward weights of Section 2.3). In practice, two Critic networks $Q_{\theta_1}$, $Q_{\theta_2}$ are used to improve stability and reduce positive (overestimation) bias.
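The role of the twin Critics and of the entropy term can be made concrete with the standard SAC temporal-difference target, in which the smaller of the two next-state Q-values is taken and the log-probability of the next action is subtracted. The sketch below is a scalar illustration of that textbook formula, not the authors' implementation; the default temperature of 0.2 is a placeholder, since the paper tunes it automatically.

```python
def soft_q_target(reward, q1_next, q2_next, log_prob_next,
                  gamma=0.99, alpha=0.2, done=False):
    """Entropy-regularized TD target using the minimum of the twin Critics."""
    if done:
        # Terminal transition: no bootstrapping from the next state.
        return reward
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * soft_value

# Hypothetical numbers chosen so the result is easy to check by hand:
# 1.0 + 0.5 * (min(2.0, 3.0) - 0.5 * (-1.0)) = 2.25
y = soft_q_target(1.0, 2.0, 3.0, -1.0, gamma=0.5, alpha=0.5)
print(y)
```

Taking the minimum of the two estimates is what counteracts the positive bias mentioned above, because a single Critic tends to overestimate returns for rarely seen actions.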
2.5. Deep Reinforcement Learning Agent Architecture
The agent follows an Actor–Critic architecture consistent with the SAC algorithm. Geometric deviations extracted from 3D metrology are provided to two parallel encoders: (i) a CNN encoder that processes a fixed-resolution deviation map aligned to the CAD model, and (ii) an MLP that encodes low-dimensional numerical deviation descriptors (e.g., the aggregated deviations of radius, length, and contact area). The encoded representations are fused into a shared latent vector.
The fused representation is then used by:
An Actor network that outputs the parameters of a squashed Gaussian policy (mean and standard deviation) for continuous corrections, i.e., $a_t \sim \pi_\phi(\cdot \mid s_t)$.
Two Critic networks $Q_{\theta_1}$ and $Q_{\theta_2}$ (twin Q) that estimate the expected return for state–action pairs.
This design preserves industrial interpretability at the input/output level (explicit deviation descriptors and bounded correction actions) while enabling robust learning of nonlinear compensation policies from production data (Figure 3).
The agent integrates geometric deviation information extracted from 3D scans through a convolutional neural network (CNN) and numerical deviation features through a multilayer perceptron (MLP). The fused representation forms a shared latent vector that feeds both the stochastic policy network (Actor) and two Critic networks (Twin Q-functions).
The Actor outputs continuous corrective actions applied to the robot welding trajectory, while the Critics estimate the expected return of state–action pairs to guide policy optimization.
The CNN processes fixed-resolution deviation maps aligned with the CAD model, whereas the MLP encodes low-dimensional numerical deviation descriptors.
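To show how a squashed Gaussian head yields bounded corrections, the sketch below samples an action from a mean and log-standard-deviation vector, applies tanh, and rescales each component to a per-axis limit. The encoders are omitted; the mean, log-std, and limit values are hypothetical stand-ins for what the Actor would output.

```python
import math
import random

def sample_squashed_gaussian(mean, log_std, bounds, rng=random):
    """Sample tanh(mean + std * eps) per dimension and scale it to +/- bound."""
    action = []
    for mu, ls, b in zip(mean, log_std, bounds):
        u = mu + math.exp(ls) * rng.gauss(0.0, 1.0)  # unbounded Gaussian sample
        action.append(b * math.tanh(u))              # squash, then rescale
    return action

# Hypothetical Actor output for three translations (mm) and one rotation (rad).
rng = random.Random(0)
bounds = [2.0, 2.0, 2.0, 0.05]
a = sample_squashed_gaussian([0.1, -0.3, 0.0, 0.2], [-1.0, -1.0, -1.0, -2.0],
                             bounds, rng)
print(a)
```

Because tanh maps any real number into (-1, 1), every sampled component stays inside its limit regardless of the Gaussian draw, which is what makes the action space bounded by construction.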
2.6. Exploration and Update Policy
In SAC, exploration is achieved through the stochastic policy itself, without ε-greedy action selection. At each step, the Actor samples a continuous correction action within predefined safety bounds. Transitions are stored in an experience replay buffer and reused for off-policy learning.
Training updates are performed by minimizing (i) the Critic losses for the twin Q-functions and (ii) the Actor loss derived from maximizing expected Q-value while maintaining policy entropy. Target networks are updated using soft updates to ensure stability. The entropy temperature can be automatically tuned to maintain a desired exploration level.
Importantly, all learning is performed offline using recorded industrial transitions, ensuring that no unsafe online exploration is executed on the production cell.
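Two of the mechanisms named above, the experience replay buffer and the soft target update, are simple enough to sketch directly. The example uses plain Python lists as stand-ins for network weights; the class and function names are hypothetical, while the capacity and the coefficient τ default to the values reported in Section 2.6.2.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""
    def __init__(self, capacity=1_000_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng=random):
        # Converting to a list keeps the sketch simple; it is not optimized.
        return rng.sample(list(self.data), batch_size)

def soft_update(target, online, tau=0.005):
    """Polyak averaging: move each target weight a fraction tau toward the online one."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]

buf = ReplayBuffer(capacity=2)
for i in range(3):
    buf.add(("state", "action", float(i)))
batch = buf.sample(2, random.Random(0))
new_target = soft_update([0.0, 1.0], [1.0, 1.0])
print(len(buf.data), new_target)
```

In the offline setting described here the buffer would be filled once from recorded production transitions instead of being appended to during live exploration.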
2.6.1. Industrial Training Configuration and Deployment Context
The DRL agent was implemented and trained within the industrial automation environment of the Ford Craiova welding cell. Training was performed offline using recorded industrial transition data collected during controlled production runs. The learning configuration followed a standard Soft Actor–Critic (SAC) framework with experience replay and entropy-regularized stochastic exploration. Hyperparameters were selected and validated through iterative tuning under production engineering supervision to ensure stable convergence and compatibility with industrial safety constraints. Training relied on recorded production datasets rather than online exploration, thereby avoiding any risk to active manufacturing operations. The resulting policy was validated under supervised industrial conditions prior to full deployment in the production environment.
2.6.2. Training Configuration and Reproducibility Protocol
The SAC agent was trained offline using recorded industrial transition data collected from the welding production line. The training configuration followed a standard Soft Actor–Critic implementation with twin Q-functions and entropy regularization.
Training was conducted for 120,000 episodes, with a maximum of 64 steps per episode. A replay buffer of 1,000,000 transitions was used to enable stable off-policy learning. A warm-up phase of 10,000 steps was applied before policy updates.
Mini-batches of size 256 were sampled from the replay buffer. The discount factor was set to γ = 0.99, and soft target updates were performed using a coefficient τ = 0.005. Both Actor and Critic networks were optimized using the Adam optimizer with a learning rate of 3 × 10⁻⁴ (Table 1).
The neural architecture consisted of three fully connected hidden layers with 256 neurons per layer, using ReLU activations. The entropy temperature parameter was automatically tuned during training.
Training was executed on an NVIDIA RTX A5000/RTX 4090 GPU platform, requiring approximately 18 h for full convergence. Inference in the production environment is performed on an industrial PC CPU with latency below 40 ms, ensuring no impact on welding cycle time.
To improve robustness and mitigate initialization bias, training was repeated with five independent random seeds, and convergence was assessed based on stabilization of the residual RMS welding error.
The dataset was partitioned using a hold-out strategy (70/15/15), separating training, validation, and test sets to prevent data leakage between model development and final industrial evaluation.
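For reproducibility, the reported settings can be collected in one configuration object, and the 70/15/15 hold-out partition can be written as a seeded shuffle followed by slicing. The dictionary keys, the helper name, and the 1000 part identifiers used in the usage lines are hypothetical; the numeric values are those stated in this subsection.

```python
import random

SAC_CONFIG = {
    "episodes": 120_000,
    "max_steps_per_episode": 64,
    "replay_capacity": 1_000_000,
    "warmup_steps": 10_000,
    "batch_size": 256,
    "gamma": 0.99,                   # discount factor
    "tau": 0.005,                    # soft target-update coefficient
    "learning_rate": 3e-4,           # Adam, Actor and Critics
    "hidden_layers": (256, 256, 256),
    "activation": "relu",
    "num_seeds": 5,
}

def holdout_split(items, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(fractions[0] * len(items))
    n_val = round(fractions[1] * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = holdout_split(range(1000))
print(len(train), len(val), len(test))
```

Splitting by part identifier before any feature extraction is one way to keep all data from a given part in a single subset.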
2.7. Generation of the Corrected Trajectory
The action proposed by the agent is applied to the nominal trajectory:
$$\mathbf{p}_{\mathrm{corr}} = \mathbf{p}_{\mathrm{nom}} + a$$
where the action $a$ contains positional and orientation corrections.
The correction is applied point-wise along the nominal trajectory.
The nominal Cartesian trajectory is defined as a spatial curve parameterized by arc length:
$$\mathbf{p}_{\mathrm{nom}}(s), \quad s \in [0, L]$$
where $s$ represents the arc length and $L$ is the total path length.
To obtain a time-parameterized trajectory suitable for robot execution, a time-scaling function $s(t)$ is introduced, mapping time to arc length. The executed trajectory is therefore:
$$\mathbf{p}(t) = \mathbf{p}_{\mathrm{nom}}\big(s(t)\big)$$
In practice, the continuous trajectory is discretized at controller sampling intervals $\Delta t$, generating a sequence of Cartesian positions:
$$\mathbf{p}_k = \mathbf{p}_{\mathrm{nom}}\big(s(k\,\Delta t)\big), \quad k = 0, 1, \dots, N$$
The correction $a$ produced by the DRL agent is applied point-wise to the discretized trajectory prior to export. Final smoothing is performed using cubic interpolation to ensure compatibility with continuous path (CP) execution mode of the industrial robot (Figure 4).
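The discretization and point-wise correction can be sketched for a circular seam. The example samples the nominal path at uniform arc-length steps, which corresponds to a constant-speed time scaling, and applies a rigid correction to every point. For brevity the orientation correction is reduced to one rotation about the z-axis and the cubic smoothing step is omitted; both simplifications, along with the function names and numeric values, are assumptions of this sketch.

```python
import math

def discretize_circle(radius, n_points):
    """Sample a circular seam at uniform arc-length steps over its full length."""
    length = 2.0 * math.pi * radius          # total path length
    points = []
    for k in range(n_points):
        s = length * k / n_points            # arc length of sample k
        theta = s / radius
        points.append((radius * math.cos(theta), radius * math.sin(theta), 0.0))
    return points

def apply_correction(points, translation, yaw):
    """Rotate each point about the z-axis by yaw (rad), then translate it."""
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy, dz = translation
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz)
            for x, y, z in points]

nominal = discretize_circle(radius=50.0, n_points=8)
shifted = apply_correction(nominal, translation=(0.5, 0.0, -0.2), yaw=0.0)
rotated = apply_correction(nominal, translation=(0.0, 0.0, 0.0), yaw=math.pi / 2)
print(shifted[0], rotated[0])
```

A full six-degree-of-freedom correction would replace the single yaw angle with a rotation matrix or quaternion and would also update the torch orientation at each point.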
2.8. Integration into the FANUC Industrial Robot
All stages of data acquisition, geometric processing, and DRL agent inference are performed offline, prior to welding execution. Trajectory corrections are fully computed before loading the program into the industrial robot controller, so that the actual welding process does not involve additional computations or real-time decision making.
The corrected trajectory is exported in a compatible format (LS, TP, or Cartesian Path):
Calculation of offsets in the robot coordinate system;
Generation of new points;
Application of smoothing (cubic interpolation);
Loading into the controller.
2.9. Complete Method Pseudocode
Algorithm 1 summarizes the complete offline trajectory compensation pipeline adopted in this study, from 3D scanning and CAD alignment to seam feature extraction, policy inference, and the generation of bounded translational/rotational offsets applied to the robot program. The method follows an actor–critic deep reinforcement learning formulation based on the Soft Actor–Critic (SAC) framework, which is suitable for continuous action spaces and stable learning. For clarity and reproducibility, the pseudocode reports the key processing stages, the safety envelope enforcement, and the logging/feedback loop used during offline training.
Algorithm 1. SAC-based Offline Trajectory Compensation
Input: 3D geometric deviation descriptors, trained Actor parameters $\phi$, nominal trajectory $\mathbf{p}_{\mathrm{nom}}$
Output: Corrected trajectory $\mathbf{p}_{\mathrm{corr}}$
Step 1: State Construction. Acquire the 3D scan of the component and extract geometric deviation descriptors. Construct the state vector $s$.
Step 2: Policy Inference. Sample the correction action: $a \sim \pi_\phi(\cdot \mid s)$ (13), where $\pi_\phi$ is a stochastic Gaussian policy with bounded outputs.
Step 3: Safety Constraint Enforcement. Clip the action within predefined industrial safety bounds.
Step 4: Trajectory Update. Apply the correction to the nominal trajectory: $\mathbf{p}_{\mathrm{corr}} = \mathbf{p}_{\mathrm{nom}} + a$ (14).
Step 5: Welding Execution and Reward Computation (Training Phase Only). Execute welding, measure weld quality, and compute the reward $r$. Store the transition $(s, a, r)$ in the replay buffer $\mathcal{D}$.
Step 6: Offline SAC Update (Training Phase Only). Update the Critic networks. Update the Actor using the entropy-regularized objective. Soft-update the target networks.
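The safety constraint enforcement step of Algorithm 1 amounts to clamping each component of the proposed correction to a fixed envelope before it reaches the robot program. A minimal sketch follows; the symmetric limits of 2 mm per translation axis and 0.05 rad per rotation axis are hypothetical placeholders, not the bounds used in the production cell.

```python
def clip_action(action, bounds):
    """Clamp each correction component to its symmetric limit [-b, +b]."""
    if len(action) != len(bounds):
        raise ValueError("action and bounds must have the same length")
    return [max(-b, min(b, a)) for a, b in zip(action, bounds)]

# Hypothetical envelope: three translations (mm), then three rotations (rad).
BOUNDS = [2.0, 2.0, 2.0, 0.05, 0.05, 0.05]
raw = [0.4, -3.1, 1.0, 0.08, 0.0, -0.01]
safe = clip_action(raw, BOUNDS)
print(safe)  # [0.4, -2.0, 1.0, 0.05, 0.0, -0.01]
```

Keeping this check outside the learned policy means the envelope holds even if the policy output is wrong, which is the usual reason for enforcing it as a separate step.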
Raw point clouds (PLY/STL);
Complete deviation maps;
SPC tables (Excel);
Original scan figures;
Before/after correction plots;
Extended pseudocode (full version);
Detailed CNN + SAC architecture (Actor–Twin Critic);
Examples of nominal and corrected trajectories.
3. Results
This section presents the statistical analysis of dimensional variations of the components, the evaluation of the effect of deviations on welding quality, and the validation of the performance of the proposed method. The results are obtained by combining three-dimensional measurements, statistical process control (SPC) techniques, and deep reinforcement learning (DRL) agent training, using real datasets from industrial production.
3.1. Analysis of Geometric Variations of the Catalytic Converter
For the critical features—wall radius (R), total length (L), and contact area (C)—100 parts were analyzed under normal production conditions. The data originate from 3D optical measurements, processed and aligned to the nominal model (Figure 5).
3.1.1. Wall Radius (R)
SPC analysis indicates a wide distribution of deviations, with values consistently exceeding the specified limits:
These results indicate a completely incapable process, in which wall radius variations cannot be compensated for by mechanical positioning and lead to major errors in the welded joint.
3.1.2. Total Length (L)
Length analysis highlights a marginal process:
In particular, parts with L < 188 mm require additional operations (rework) due to the way the end caps seat and overlap on the converter body (Figure 6).
Observation: a reduction in length leads to positioning deviations that cause non-uniform opening of the welding gap.
3.1.3. Contact Area (C)
According to go/no-go inspection:
This feature has a limited influence on trajectory variations, with major deviations concentrated in R and L.
3.1.4. Industrial Pre- and Post-Implementation Quality Assessment
To rigorously evaluate the impact of the proposed DRL-based trajectory compensation framework, a direct comparison between baseline production and post-implementation performance was conducted using industrial quality monitoring data.
The baseline production batch consisted of 100 units, for which 5 defects were recorded. Each unit was evaluated with respect to one critical weld seam (one defect opportunity per unit). This corresponds to:
Defects (d) = 5;
Units (n) = 100;
Opportunities per unit (o) = 1;
Total opportunities = 100;
Defect rate = 5%;
Process sigma level = 3.145.
Following the deployment of the proposed trajectory correction framework, a monitored production batch of 5000 units was analyzed. In this case, two critical weld seams per unit were evaluated, resulting in two defect opportunities per unit. The recorded data were as follows:
Defects (d) = 1;
Units (n) = 5000;
Opportunities per unit (o) = 2;
Total opportunities = 10,000;
Defect rate (per unit) = 0.02%;
Process sigma level = 5.219.
The sigma level was computed using standard Six Sigma methodology based on defects per opportunity (DPO) and corresponding DPMO conversion. Since sigma accounts for the number of defect opportunities per unit, the comparison remains valid despite the difference in weld seam count between the two production stages.
It should be noted that the baseline batch (n = 100 units) represents the standard monitored production window prior to deployment of the compensation framework, while the post-implementation evaluation (n = 5000 units) corresponds to extended industrial validation under stable operating conditions.
Although the sample sizes differ, the statistical comparison is performed at the defect-per-opportunity (DPO) level, which normalizes for both unit count and defect opportunities per unit. Furthermore, the statistical validation analysis presented in Section 3.8 confirms that the observed improvement is not attributable to sampling size differences but reflects a genuine shift in process performance.
Table 2 summarizes the key performance indicators before and after implementation.
Figure 7 illustrates the resulting improvement in process capability.
The increase from 3.145σ to 5.219σ corresponds to a substantial reduction in defects per opportunity and reflects a significant enhancement in welding process robustness. From an industrial perspective, this improvement translates into reduced rework, lower scrap rates, and increased production stability.
3.2. Effect of Deviations on the Weld Bead
The measured deviations lead, in production, to the following issues:
Misalignment between the end caps and the body;
Excessively open or closed welding gaps;
Changes in the actual torch angle;
Axial offsets of the welding points;
Local material deformations.
These effects were repeatedly observed in non-compliant parts, confirming the direct relationship between dimensional variations and welding defects (Figure 8).
3.3. Performance of the DRL Agent in Trajectory Correction
The DRL agent was trained on a dataset composed of:
Geometric deviations extracted from 3D scanning;
Correction actions manually applied in production;
Welding outcomes (OK/defective);
Corrected and nominal trajectories.
Convergence of the Learning Policy
During the training episodes, the DRL agent gradually improves its performance:
3.4. Evaluation of the Corrections Proposed by the Agent
Based on the measured deviations, the DRL agent generates corrections in the space of:
Consistency of the Adjustments
For parts exhibiting large radius deviations (Figure 9):
The agent proposes an increase in radial offset;
Modification of the torch angle to compensate for gap opening;
Adjustment of the travel speed (in certain cases).
3.5. Industrial Results After Implementation of the Corrections
The method was validated on an extended batch of 5000 parts.
Evaluated parameters:
Defect rate;
Rework rate;
Process stability;
Sigma level.
3.5.1. Defect Reduction
The implementation of adaptive trajectory correction led to:
A reduction in welding defects by more than 90%;
Elimination of rework associated with short parts (<188 mm);
Automatic correction of improper positioning.
For comparison, the reference (baseline) process relied exclusively on welding trajectories programmed offline based on the nominal CAD model, without adaptive trajectory corrections. Under these conditions, geometric variations of the components frequently led to misalignment of the joining areas, generation of welding defects, and the need for rework operations. The performance reported in this section corresponds to the implementation of the proposed method and is evaluated relative to this reference process.
3.5.2. Process Capability Calculation
The process capability was evaluated using the Six Sigma methodology based on defects per opportunity (DPO). The Defects Per Million Opportunities (DPMO) were calculated as:
$$\mathrm{DPMO} = \frac{D}{N \times O} \times 10^{6}$$
where D represents the number of detected defects, N the number of produced units, and O the number of critical weld opportunities per unit.
The corresponding sigma level was determined using the standard Six Sigma conversion, including the conventional 1.5σ shift adopted in industrial practice. This formulation enables a normalized comparison between production stages with different numbers of defect opportunities per unit.
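The conversion from defect counts to a sigma level can be checked numerically. The sketch below computes DPMO from D, N, and O as defined above and obtains the sigma level as the standard-normal quantile of the yield plus the conventional 1.5σ shift; the function names are hypothetical. Applied to the counts reported in Section 3.1.4, it reproduces the stated values of 3.145 and 5.219.

```python
from statistics import NormalDist

def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def sigma_level(defects, units, opportunities_per_unit, shift=1.5):
    """Short-term sigma level: normal quantile of the yield plus the 1.5 sigma shift."""
    dpo = defects / (units * opportunities_per_unit)
    return NormalDist().inv_cdf(1.0 - dpo) + shift

baseline = sigma_level(defects=5, units=100, opportunities_per_unit=1)
post = sigma_level(defects=1, units=5000, opportunities_per_unit=2)
print(round(dpmo(5, 100, 1)), round(baseline, 3))   # 50000 3.145
print(round(dpmo(1, 5000, 2)), round(post, 3))      # 100 5.219
```

Because the calculation works per opportunity, the post-implementation batch is scored on 10,000 opportunities even though only 5000 units were produced.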
3.6. Qualitative Performance Analysis
The implementation of the AI-based method integrated with 3D scanning brought clear benefits:
The robot becomes independent of perfect part positioning.
It compensates for dimensional deviations directly in the trajectory.
It reduces process sensitivity to manufacturing variations.
It improves process stability without mechanical modifications or fixture changes.
It enables generalization to new parts with similar geometries.
3.7. Operational Overhead and Industrial Feasibility
The practical deployment of the proposed trajectory compensation framework was evaluated in terms of additional operational overhead introduced into the production workflow.
The structured-light 3D scanning process required approximately 5–10 s per component, depending on positioning and surface conditions. Deviation extraction and data preprocessing were executed automatically within the metrology software environment and required only a few additional seconds.
The trajectory correction computation was performed offline within the industrial software ecosystem and did not introduce real-time computational load during welding execution. The corrected trajectory was generated prior to welding and transferred to the robot controller as a standard program.
The average main welding cycle time for the analyzed component is approximately 45 s per unit, depending on seam configuration and robot speed settings. The additional preprocessing stage (3D scanning and deviation extraction), requiring approximately 8–15 s in total, represents less than one third of the overall production cycle and is executed prior to arc ignition. Since the trajectory correction is fully computed offline, the arc-on time and takt time of the welding cell remain unchanged.
Importantly, the welding cycle time itself remained unchanged, since no adaptive control or real-time sensing was required during arc operation. The method therefore preserves production takt time while significantly improving process capability.
From an industrial perspective, the additional preprocessing time is negligible compared to the benefits obtained through defect reduction, elimination of rework, and increased production stability.
3.8. Statistical Validation of Process Improvement
To ensure that the observed improvements were not attributable to random variation, a formal statistical comparison was performed between the baseline production stage (CAD-based trajectory) and the SAC-compensated production stage.
For geometric residual error (RMS), normality was first assessed using the Shapiro–Wilk test. Since no significant deviation from normality was detected (p > 0.05), a two-sample independent t-test (Welch’s correction for unequal variances) was applied.
The null hypothesis H0 assumed equal mean residual RMS errors between baseline and SAC-based compensation. The alternative hypothesis H1 assumed a reduction in RMS error under SAC compensation.
The obtained test statistic indicated a statistically significant reduction in residual RMS error:
p = 0.0031 (two-tailed), α = 0.05.
The 95% confidence interval for the mean reduction in RMS error was:
CI95 = [0.38 mm, 0.51 mm].
The computed effect size (Cohen’s d = 0.84) corresponds to a large practical effect according to conventional interpretation thresholds.
For defect occurrence rates, a comparison of proportions was conducted using a two-proportion z-test based on defects per opportunity (DPO). The reduction from 5 defects/100 opportunities to 1 defect/10,000 opportunities was statistically significant (p < 0.001), confirming that the observed increase in sigma level (from 3.145σ to 5.219σ) is not attributable to sampling variability.
These results confirm that the observed industrial performance gains are statistically robust and reflect a true improvement in process capability.
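The comparison of defect proportions can be reproduced from the reported counts with a pooled two-proportion z-test. The sketch below is a generic textbook implementation with hypothetical function names, not the authors' analysis script. With only one post-implementation defect the normal approximation is coarse, so an exact test would be a reasonable cross-check.

```python
import math
from statistics import NormalDist

def two_proportion_z(d1, n1, d2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = d1 / n1, d2 / n2
    pooled = (d1 + d2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Baseline: 5 defects in 100 opportunities; post: 1 defect in 10,000.
z, p = two_proportion_z(5, 100, 1, 10_000)
print(z, p)
```

For these counts the statistic is far above any conventional critical value, in line with the reported p < 0.001.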
4. Discussion
The integration of three-dimensional measurement with a deep reinforcement learning (DRL) agent for adaptive adjustment of the welding trajectory represents a significant advancement over classical offline programming methods, as well as over the most recent commercial seam-tracking systems. This section analyzes the relevance of the method, its comparison with other techniques in the field, the industrial impact, the identified limitations, and directions for further extension.
4.1. Industrial Relevance of the Proposed Method
The presented method addresses a critical problem in high-volume production: parts may be dimensionally inaccurate, and their positioning in the fixture cannot be mechanically corrected.
Under these conditions:
Nominal trajectories become inadequate;
Fixturing systems cannot eliminate deviations;
Manual adjustment is slow, costly, and dependent on operator experience.
The proposed system automates this stage by:
Identifying real deviations through 3D scanning;
Correlating them with welding outcomes;
Automatically generating trajectory corrections via the DRL agent.
The industrial impact is significant: the robot becomes capable of directly compensating for natural production variations without hardware modifications, significantly reducing dependence on upstream precision and eliminating additional costs related to calibration or fixture redesign.
4.2. Comparison with Traditional Methods
Offline programming based on the nominal CAD model
Major limitations:
Manual trajectory adjustment
Although effective in some cases:
Seam-tracking–based systems
These use cameras or laser sensors to detect the joint in real time.
Limitations:
Not optimal for complex circular geometries;
Data acquisition can be affected by reflections, smoke, and the welding arc;
Real-time correction is limited to only one dimension of the process.
Unlike classical seam-tracking systems, the proposed method does not replace local real-time control, but acts in a complementary manner by globally correcting the trajectory prior to welding execution.
In industrial arc welding, real-time sensing and correction are often constrained by harsh optical conditions (arc glare, fumes, spatter, reflections) and by strict requirements for deterministic execution on certified robot controllers. For these reasons, the proposed approach is deliberately designed as an offline trajectory adaptation stage: the full-field 3D scan is acquired prior to welding, the correction is computed once per part, and the robot executes a finalized program without additional online inference. This design choice prioritizes robustness, repeatability, and straightforward integration into existing production cells.
From an automation perspective, the method is complementary to seam-tracking. Seam-tracking can be advantageous for local, short-range deviations detected directly in the joint region during welding, whereas the present method targets global geometric/positioning inconsistencies that originate upstream (part variability, seating errors, fixture wear) and would otherwise propagate along the entire nominal path. In practice, the two strategies can be combined: offline correction can bring the torch into a corrected global alignment window, while optional in-line sensing can refine the seam locally when feasible.
4.3. Comparison with Modern AI-Based Solutions
Recent literature presents several major directions:
Adaptive visual control (CNN + real-time vision)
Advantages:
Limitations:
Requires controlled illumination;
Difficult to use directly in welding environments (intense light, smoke, optical noise);
High costs.
Parameter optimization via reinforcement learning (RL)
Some works adjust:
Current;
Voltage;
Travel speed.
Major limitation:
A straightforward alternative to learning-based trajectory adaptation consists of deterministic geometric compensation rules derived from measured deviations (e.g., fixed offsets proportional to radius or length errors). While such rule-based strategies can partially mitigate systematic deviations, industrial observations indicate that their effectiveness rapidly degrades when multiple deviation sources interact simultaneously. In particular, coupled effects between part length, radial deformation, seating asymmetry, and local gap opening lead to nonlinear correction requirements that cannot be robustly captured by a small set of predefined rules.
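For illustration, a deterministic rule of the kind described above might look like the following sketch. The function name and the gains `k_radius` and `k_length` are hypothetical placeholders, not values from the study.

```python
def rule_based_offset(radius_error_mm, length_error_mm, k_radius=1.0, k_length=0.5):
    """Fixed proportional rule: each deviation maps independently to one offset."""
    radial_offset = k_radius * radius_error_mm
    axial_offset = k_length * length_error_mm
    return radial_offset, axial_offset

# The rule is separable: the response to combined deviations is exactly the
# sum of the responses to each deviation alone, so it cannot express coupling.
combined = rule_based_offset(0.4, -1.0)
only_radius = rule_based_offset(0.4, 0.0)
only_length = rule_based_offset(0.0, -1.0)
is_separable = combined == (only_radius[0] + only_length[0],
                            only_radius[1] + only_length[1])
```

The separability check makes the limitation concrete: whatever gains are chosen, a rule of this form responds to a radius error identically regardless of the length error, whereas the text argues that the required correction depends on how the deviations interact.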
The DRL-based approach overcomes these limitations by implicitly learning nonlinear correction policies from real production data, directly correlating geometric deviation patterns with welding outcomes. Instead of enforcing a predefined compensation model, the agent adapts its actions based on observed success or failure, allowing it to handle interacting deviations and edge cases that are difficult to formalize analytically. In this sense, the learning agent acts as a data-driven generalization layer over classical geometric compensation, preserving industrial interpretability at the input/output level while avoiding brittle hand-tuned heuristics.
Hybrid fuzzy–neural models
Limitations:
Justification for the Selection of Soft Actor–Critic
The choice of the Soft Actor–Critic (SAC) algorithm was motivated by the continuous nature of the trajectory correction actions and by the need for stable off-policy learning under limited and noisy industrial datasets.
Compared to deterministic Actor–Critic variants such as DDPG or TD3, SAC introduces entropy regularization, encouraging stochastic exploration and improving robustness to local optima. In industrial welding applications, where geometric deviations may interact nonlinearly and defect feedback is sparse, entropy-regularized policies help avoid premature convergence to suboptimal correction strategies.
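For reference, the entropy regularization mentioned above corresponds to the standard maximum-entropy objective used by SAC. The notation below is not taken from this paper and is introduced only for illustration: $s_t$ denotes the state (here, the geometric deviation descriptors), $a_t$ the action (the trajectory correction), $r$ the reward, $\rho_\pi$ the state-action distribution induced by the policy $\pi$, and $\alpha$ the temperature weighting the entropy term.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = - \mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

Setting $\alpha = 0$ recovers the conventional expected-return objective; a positive $\alpha$ rewards the policy for remaining stochastic, which is the mechanism behind the exploration benefit described above.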
On-policy methods such as PPO were not selected because of their higher sample complexity, which makes them less suitable for industrial contexts where data collection is constrained and safety-critical.
Furthermore, SAC has demonstrated strong empirical stability in continuous control benchmarks and robotic manipulation tasks, making it particularly appropriate for learning bounded translational and rotational trajectory corrections.
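One common way to obtain bounded corrections from a SAC policy is to pass the unbounded network output through a tanh squashing function and scale it to per-axis limits. The sketch below assumes this mechanism; the paper does not state how its bounds are enforced, and the limit values and axis names in `BOUNDS` are hypothetical.

```python
import math

# Hypothetical per-axis limits: translations in mm, rotation in degrees.
BOUNDS = {"dx": 2.0, "dy": 2.0, "dz": 2.0, "yaw": 1.5}

def squash_to_bounds(raw_action):
    """Map unbounded policy outputs to bounded corrections using tanh."""
    return {name: BOUNDS[name] * math.tanh(raw_action[name]) for name in BOUNDS}

# Even extreme raw outputs cannot exceed the configured limits.
correction = squash_to_bounds({"dx": 0.0, "dy": 100.0, "dz": -100.0, "yaw": 0.3})
```

Since tanh maps every real input into the interval from -1 to 1, the scaled output can never leave the configured envelope, which is a useful safety property when the result is written into a robot program.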
The selection of SAC therefore reflects a trade-off between stability, sample efficiency, and compatibility with safe offline training, all of which are critical factors in industrial deployment.
4.4. Distinctive Advantage of the Proposed Method
The method presented in this work represents an integrated approach that combines full 3D scanning of components with a deep reinforcement learning agent for direct correction of the robot trajectory based on real geometric deviations.
The primary methodological contribution of this work lies in coupling full-field industrial metrology with entropy-regularized reinforcement learning trained exclusively on real production data. Unlike simulation-dominated RL studies, the proposed approach operates entirely within a measured industrial state–action–reward loop, enabling direct transferability and avoiding the sim-to-real gap commonly encountered in robotic learning systems.
This enables the robot to:
Compensate for large deviations;
Generate customized trajectories for each part;
Maintain welding quality even in the presence of positioning deviations that would otherwise produce severe defects.
4.5. Identified Limitations
Although the method is effective, several inherent limitations exist. It is important to emphasize that the proposed method is not intended to replace all forms of adaptive welding control, nor to eliminate the need for local sensing in applications dominated by fast, high-frequency disturbances. Its primary scope is the compensation of part-level geometric and positioning deviations that are stable over the duration of a welding cycle and originate from upstream manufacturing variability. Within this scope, the offline correction paradigm offers a favorable trade-off between robustness, industrial deployability, and performance gains. Applications characterized by rapidly evolving joint geometry during welding may require hybrid strategies that combine offline trajectory adaptation with in-line sensing and control.
There are additional costs associated with scanning each part. Even though the scanning time is relatively short (5–10 s), full integration requires optimization of the production flow.
There is a need for an initial training dataset.
The DRL agent requires:
Defective parts;
Manual corrections;
Welding quality data.
This imposes an initial data collection phase.
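The records gathered in such a collection phase could be organized as in the sketch below. The class `WeldRecord`, its field names, the binary reward, and the numeric entries are all hypothetical placeholders chosen for illustration; they are not measured data and do not reflect the reward design used in the study.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class WeldRecord:
    """One logged part: measured deviations, applied correction, weld result."""
    deviations_mm: Tuple[float, ...]  # state: scan-derived deviation descriptors
    correction: Tuple[float, ...]     # action: manual or agent trajectory correction
    weld_ok: bool                     # outcome of the weld quality inspection

def reward(record: WeldRecord) -> float:
    """Hypothetical binary reward: +1 for a conforming weld, -1 otherwise."""
    return 1.0 if record.weld_ok else -1.0

# Placeholder entries, not real measurements.
dataset = [
    WeldRecord((0.8, -1.2), (0.5, 0.0, -0.2), True),
    WeldRecord((1.5, 0.3), (0.0, 0.0, 0.0), False),
]
rewards = [reward(r) for r in dataset]
```

Each record pairs a state, an action, and an outcome, which is the minimum needed to populate the replay buffer of an off-policy method.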
Training time: the full training of the agent requires accelerated simulations or a large number of episodes.
Implementation complexity: complete integration requires:
4.6. Future Development Directions
Although the experimental validation focuses on a specific industrial component, the proposed framework is not tied to a particular product geometry or welding task. The learning agent operates on abstracted geometric deviation descriptors and trajectory correction actions, which are common across a wide range of robotic joining and processing operations. As a result, the same methodology can be transferred to other axisymmetric or quasi-axisymmetric components, as well as to different robotic processes where nominal trajectories are systematically affected by part-level geometric variability.
The method provides a solid foundation for extension to more advanced industrial applications. Future directions include continual learning (the ability of the agent to continue learning over time from new parts), integration of in-line sensors (combining offline 3D scanning with laser sensors during welding), and generalization to different geometries.
The same agent can be adapted through transfer learning to:
Tanks;
Valves;
Cylindrical housings.
Joint optimization of process parameters and trajectory: an extended DRL model can act on:
Trajectory;
Current;
Voltage;
Travel speed.
A cloud or edge computing implementation could reduce processing time on the shop floor.
5. Conclusions
This work presented an industrially validated framework for offline, part-specific robotic welding trajectory correction driven by full-field 3D metrology and entropy-regularized reinforcement learning. By combining structured-light scanning with a Soft Actor–Critic (SAC) agent, the system learns nonlinear compensation policies that directly address geometric variability originating upstream in the manufacturing chain.
Unlike conventional CAD-based programming or deterministic offset rules, the proposed approach treats geometric deviations as a measurable state representation and maps them to bounded translational and rotational trajectory corrections through data-driven policy learning. The correction is computed offline and deployed as a finalized robot program, ensuring full compatibility with standard industrial controllers and preserving welding cycle time.
The large-scale industrial validation demonstrates that metrology-informed reinforcement learning can convert dimensional variability from a process limitation into a controllable parameter. The framework does not replace seam-tracking or local sensing but complements them by correcting global positioning and geometric inconsistencies prior to arc initiation.
Beyond the specific catalytic converter case study, the methodology is transferable to other robotic joining operations where nominal trajectories are systematically affected by part-level geometric deviations. The proposed architecture establishes a scalable pathway toward adaptive, data-driven robotic manufacturing systems capable of operating under realistic industrial variability constraints.