3D Scanner-Based Identification of Welding Defects—Clustering the Results of Point Cloud Alignment

This paper describes a framework for detecting welding errors using 3D scanner data. The proposed approach employs density-based clustering to compare point clouds and identify deviations. The discovered clusters are then classified according to standard welding fault classes. Six welding deviations defined in the ISO 5817:2014 standard were evaluated. All defects were represented through CAD models, and the method was able to detect five of these deviations. The results demonstrate that the errors can be effectively identified and grouped according to the location of the different points in the error clusters. However, the method cannot separate crack-related defects as a distinct cluster.


Introduction
Industrial digitalization has opened up new dimensions to make industrial processes more efficient. The quality control of the welding is a significant challenge to achieve autonomous manufacturing [1]. The issue of compliance and the existence of quality has become increasingly prominent with the concept of quality management [2]. Various machine learning (ML) techniques can be used to solve problems related to welding processes and make manufacturing more efficient [3]. The accuracy of welding processes improved significantly with ML algorithms [4].
Welding technology encompasses a wide range of processes and variations, making it a subject of extensive research. The use of raw materials in welding has been studied in depth, with research focusing on the structural changes in the materials used for joining [5]. This research typically includes laboratory testing with both destructive and non-destructive methods, as well as microscopic examinations [6,7]. Additionally, there is ongoing research on seam detection using image processing algorithms (based on laser vision) and Kalman filter for trajectory optimization [8]. The laser vision technique [9] is also used for adaptive measuring the actual weldments. The algorithm extracts the shape features by constructing the characteristic triangle of the weld trajectory [10]. In recent years, there has been an increase in research on artificial intelligence (AI) in the field of welding, which represents a promising new direction for research [11]. Breakthroughs in this area, as well as related areas, are expected in the near future [12,13].
The test to verify the conformity of a welded joint is complex, due to the thermal processes used in the technology. Standards, process safety, and economic aspects are important factors to be incorporated. The test order is standardized. The listing surface and volume deviations using defect categories and codes are one of the significant achievements in welding quality management [2]. The standard shows three quality categories with limits for each deviation. On this basis, clear definitions are introduced for the acceptability of welded joints. A welded joint is considered to be in compliance if the extent of any defects it may contain is within the limits; otherwise, it is considered to be non-compliant. In some cases, such as cracks, they are not acceptable in either quality class (not permitted) [1]. The physical principle of the various non-contact inspection tools, successfully used in many industrial applications to detect quality defects, belongs to the research of image recognition and image processing [14]. Image processing tools are used for 3D reconstruction, for example, in medical applications [15] and in specialized fields such as identification of underwater objects [16].
The question of inhomogeneity of welds is also a challenge for non-destructive testing using ultrasonic waves, which is applied in power plants to inspect welds. Among different ultrasonic testing techniques, phased array ultrasonic testing (PAUT) determines the angle and focal law and transmits pulses through the electromagnetic time delays of multiple elements arrayed in phased array (PA) probes to generate a focused beam. It can detect flaws present in multiple directions of the target under inspection [17].
Three-dimensional scanning is an accurate and fast non-contact measurement process that creates a 3D point cloud of physical objects. Active non-contact scanning methods can work using time of flight, triangulation, or structured light. The first two are specific to laser scanning [18]. This process emits laser light or infrared or other special colored light and then uses the returned light to create a spatial point mesh as a digital model by capturing the surface coordinates (x, y, z) of the actual object (up to millions of points per second). The technology is often used for reverse engineering, which creates a digital duplicate of reality [19]. Technology is becoming increasingly accurate to meet design challenges. The optics measure both the distortion of the grating and the intensity of the reflected light, giving similarly accurate results as laser versions [20]. The accuracy of five microns experienced in 3D scanning processes [21] is strongly setting-dependent (scanning conditions, part surface quality). A laser scanning test has been considered for future work to achieve similar accuracy, as 3D visualization models of complex and irregular schemes can be constructed quickly and accurately [22]. These scanners have fixed and handheld peripherals that can be adapted in the laboratory or to a robot. Increasingly sophisticated equipment has enabled advanced point cloud processing (after scanning of parts) and faster non-destructive material testing (comparison).
The point clouds should be sampled for more efficient data processing. One possible solution to the processing of point clouds is fast global registration (FGR) and the iterative closest point (ICP) algorithm, which creates pairwise correspondences between two point clouds [23]. After fitting the point clouds, the distances can be calculated, and the deviation can be identified with density-based clustering [24]. Density-based spatial clustering of applications with noise (DBSCAN) is the pioneer of density-based clustering techniques [23], which can discover clusters of arbitrary shape and effectively handle noise or outliers. The DBSCAN algorithm has a quadratic time complexity with dataset size and can be extended to large datasets [25]. DBSCAN allows the identification of clusters of different shapes and can handle noisy patterns in the data by classifying points in low-density regions as noise and skipping them. Its operation is based on spatial density, which has several advantages over other classical clustering algorithms. One of these advantages is its ability to identify clusters of arbitrary shapes correctly. Another is that it performs divisive clustering by automatically determining the number of clusters, so it is not necessary to predetermine their number.
The novelties and objectives of this article are as follows: • A comprehensive discussion of the classification of weld defects is made based on image processing and data analysis, including the qualitative variation of each weld defect; • A methodology is developed for the visualization of weld defects using computeraided design (CAD) with a results-oriented approach; • The point clouds processed by the 3D camera and modeled with CAD programs are determined (point feature histogram-PFH); the descriptive characteristics of the points prepares the clouds for coarse (FGR) and fine (ICP) matching; • The weld defects are detected by density-based clustering (DBSCAN) by sorting the top distance between point pairs, and the clustered errors are aligned to the standard.
The paper is structured as follows. The developed framework is detailed in Section 2. In Section 2.1, the modeling of the defective CAD parts and the reference CAD workpiece are presented, converting them into a point cloud, and then calculating point properties (Section 2.2). The fitting process is performed using FGR and ICP algorithms (Section 2.3). The clustering process using the DBSCAN clustering algorithm is shown in Section 2.4. Finally, the results of the developed framework are presented in Section 3.

The Proposed CAD Model-Based Method of Generation and Analysis of Error Distributions
A clustering-based framework is developed to detect qualitative variations of welded joint defects (possible geometric variations of joints, weld convexity, root size smaller or larger, edge efflorescence, inclusions, end crater, and cracks). The proposed framework (see in Figure 1) started with the creation of point clouds. For the fitting, we need two point clouds: a reference point cloud, which is a point cloud of a flawless part, and a test point cloud, which is a point cloud of a defective part (Input block on Figure 1). The process continues with sampling. After determining the point descriptor properties, a coarse and fine-fitting process can follow (Cloud points matching block on Figure 1). Finally, the clustering step localizes the centers of the errors (Clustering block on Figure 1). A 3D reconstruction of the object was performed using structured light 3D scanning. The resulting high-density point cloud was then converted into a mesh 3D model. An initial step was added to the testing procedure to account for potential defects in real weldments. CAD models of fillet welded models were created, incorporating non-conformities as outlined in ISO 5817:2014. These non-conformities were grouped into three quality categories, with specific limits defined for each. For example, imperfection 5213, with a deviation larger than the specified root size, was included as a separate CAD model. The point clouds were compared from both the compliant and non-compliant CAD models, eliminating other uncertainties present in real parts [26]. In other words, the system is trained with CAD models, and its operation is validated through a 3D point cloud of a real welded joint. These point clouds with errors are compared one by one with the point cloud of the CAD model without errors. The purpose of the analysis was to obtain the deviations associated with each error from the point cloud comparison and to explore their correlations.
The steps are described in the following subsections: • The prepared welded parts are scanned with the 3D scanner. CAD parts with defects and a reference CAD workpiece are modeled (see Input block on Figure 1) and converted into a point cloud for the training and test set (Section 2.1). • Sampling generates a point cloud from the models (mesh files) for the validation set (see Cloud points matching block on Figure 1). The point features should be calculated to compare the two point clouds (Section 2.2). • After computing the point properties, the technique of matching the point clouds using the FGR and ICP algorithms (see Cloud points matching block on Figure 1) is described (Section 2.3). • The optimal percentile value is selected for PFH to facilitate efficient clustering (see Clustering block on Figure 1). DBSCAN clustering is applied (Section 2.4).

Training and Validation Datasets
In the next step, point clouds of the different CAD models are created to compare the real weldments. The following mathematical basis is used to compare the point clouds: The scanning of the workpiece ( Figure 2) for the validation set consists of the following steps: 1.
Tray calibration according to the size of the workpiece. The projection grid is projected by the projector onto the surface of the tray, followed by scanning of other details of the background (rotation table).

2.
The flawless workpiece is placed in the center of the tray, adjusting the rotation angle, and the scanning process begins. The three-dimensional model, usually fused from six or seven images, can be exported as a mesh model in various (.stl/.obj) formats.

3.
Since the scanned file is currently a mesh model consisting of 3D points and triangles (called surfaces), a point cloud is created from the models and used as the experimental subject (data source) for the study.

4.
The process is repeated with the workpiece containing the weld defect, with recommended setup.

Determination of the Point Features
In the next step, the point feature descriptors (fast point feature histogram-FPFH) [27] in P and Q are calculated (see Figure 3). According query points are p s to p i and q s to q i neighbors. After calculating the point descriptors, point correspondences can be established by nearest-neighbor feature matching. The FGR filters the results with multiple tests (correspondence rejection) to exclude false correspondences. It then estimates a transformation to align the two clouds using the remaining correspondences. This transformation is the initial transformation for the next step (transformation refinement with ICP). Our assumption is that the two objects are roughly identical, but due to the welding errors, there are differences between the ideal and the point cloud at the welding area. Therefore, we intended to match the two point clouds by performing a rough feature-based registration. For this, point feature descriptors must compute the points in both clouds. A point descriptor is a high-dimensional vector that encapsulates information and describes the underlying surface at a query point. There are many feature descriptor algorithms. We apply the FPFH [28], following the authors of FGR. The FPFH can compute efficiently and provides good matching accuracy.
The FPFH is based on the PFH, which was an earlier and slightly different version with higher computational complexity and dimensionality. Both of them represent the relationship (Equation (1)) between normal vectors of point pairs (n s , n t ) using different angular features (α, β, γ). A point pair (p 1 , p s ) needs to first define (Equation (2)) a fixed coordinate frame (uwv) to calculate the angular features (Equation (3)) as follows: The three angular features can be computed for a pair of points: Let N p s (Equation (4)) be the neighborhood of a query point p s in a given radius (r): The simplified point feature histogram (SPFH) of a query point p s (Equation (5)) is the histogram of the angular features of all point pairs, in which the first point is the query point (p s ) and the second point is one of its neighbors: In two steps (Equation (6)), we can obtain the FPFH feature of a query point p s . First, calculate the SPFH for the query point p s itself. In the second step, calculate the SPFH for each point p i in the neighborhood (p i ∈ N p s ) (see Figure 3). After weighing the histogram of neighboring points with the distance, we get the final FPFH histogram: where ω i is the distance between the query point p s and its neighbor p i (p i ∈ N p s ). The same steps are performed for Q.

Fitting of the Point Clouds, FGR then ICP
An approximate registration between p i and q i is performed to match the two point clouds, which requires the calculation of a point descriptor. FGR estimates a transformation that alights two point clouds together based on point correspondences between the two clouds. Correspondences are established by feature matching. These algorithms produce fewer tight matching results. The utility function represents the transformed source and target point clouds. Although the approach is faster than ICP, its performance could be better on large sample point clouds, which alone does not give valuable results. Therefore, a local refinement algorithm needs to refine the matching further [29]. Because ICP needs an initial transformation, it is used mainly in conjunction with FGR. Iterative matching uses the ICP method between the P and Q matrices. The ICP algorithm iterates through the following steps:

1.
Determination of correspondences First, we search for a correspondence set C, which contains matching index pairs. For each point in the P cloud, we search for its nearest neighbor from Q. Their indexes form a pair.

2.
Transformation estimation • Find the center of mass (Equations (7) and (8)) of two sets of points: • Subtract the corresponding center of mass (Equations (9) and (10)) from every point: • Find a translation vector t and rotation matrix R (Equation (11)) that minimizes the sum of squared errors: • This problem can be solved through singular value decomposition (SVD, Equation (12)), which is used to minimize the distance between the points [30]. We compute the cross-covariance matrix: • We then use the SVD (Equation (13)) to decompose: where U, V are 3 × 3 rotation matrices. • If rank W is 3, the parameters minimizing E(R, t) are unique and given by Equations (14) and (15): 3.

Transformation
Finally, the translated and rotated point (in Equation (16)) can be obtained: 4.

Iteration
Steps 1-3 are repeated, refining the transformation itself, getting closer to the overlap between the two point clouds so that the clouds fit together better.
After ICP, we get a final transformation that can be used to transform the point cloud. Now that the two point clouds have aligned, the distance between them can be calculated.

Clustering Based on Density
After the ICP, we can calculate the distance between two point clouds using the method compute point cloud distance provided by Open3D. It calculates the distance to the nearest point in the target point cloud for each source point cloud. We use Euclidean distance (Equation (17)) to calculate the distance: where NN Q (p) returns with the nearest point top from Q. The values will approach zero where the two clouds are nearly identical. However, the distance will increase if a discrepancy or a welding error is valid. For the distance values, a threshold is set to select the few thousand points that are "furthest" apart in terms of the position of the two point clouds. This can be a percentage threshold or a specific distance. The NN 10 Q (p) is clustered as the top ten percent of NN Q (p). Because the location of these points varies with density, they are suitable for density-based analysis. Thus, the (sorted) distance values are considered the input dataset for the densitybased algorithm.
Clustering is based on distance or similarity measures (e.g., density). The DBSCAN algorithm is a traditional density-based clustering method, currently presented in a OPEN3D Python environment. One of the key advantages of DBSCAN is that it does not require the user to specify the number of clusters. DBSCAN identifies points in a dataset in high-density regions and marks them as core points, as shown in black in Figure 4. These core points are then used to define clusters, data regions densely populated with points. It then expands outward from these core points, adding points within a certain distance (called the eps parameter) to the cluster. Core points contain at least minpoints points in their eps neighborhood. Border points (shown in orange in Figure 4) does not contain enough points in their neighborhood, but they fall in the neighborhood of some core points [31]. Points that are not within the eps distance of a core point are considered noise (red in Figure 4) and are not included in the clusters [32]. In our case, the eps value is related to the pcr value (Equation (18)), which has the magnitude of the arithmetic mean of the distance between each point in the point cloud and its neighbors (where the quotient has the number defined for sampling): The minpoints parameter of the DBSCAN is the minimum number of points per cluster required for a point to be considered a dense region or valid cluster. The cluster number indicates the minimum number of points a cluster can consist of.
After clustering, we can select the cluster that seems to be the problem and use the rootmean-square error (RMSE) formula to determine its distance from the master cloud. RMSE is calculated by following formula: where i is the estimated time series, and the denominator is the sampling volume. For each point in the identified clusters (errors), we examine how far it is from its nearest neighbor in the reference. The higher RMSE value means a more problematic cluster.

Application of the Developed Clustering Method for Welding Defects Identification
In this section, we will prove the applicability of the developed framework on generated workpieces (see Section 3.1) and an actual physical workpiece (see Section 3.2). We will show how we can detect welding defects based on the proposed framework in both cases. Welding technology standards are used to assess the variation in weld defects, which can be used to accurately determine the adequacy of welds. One of these standards is ISO 6520-1:2007, which deals with the main classification of defects and their precise description. The six main groups of defects are described in tabular form, each with an example (see Table 1). The other standard that shows in detail the individual seam deviations and their limits is ISO 5817:2014. The welding defects are categorized based on whether they can be detected with the developed ML-based solution. The standard includes limits in three quality categories or indicates acceptability by the terms not permitted and permitted. The acceptability of a welded joint is based on the visible absence of defects in the weld and the adequacy of all the tests described in the relevant procedure test standard. All types of welds-related deviations have a "catalog number" or classification, given in Table 2 with limits. The number of imperfections is the number of the deviation recorded in the standard. We calculated this for A5 weld sizes.
We applied the following software and hardware elements for the tests of the developed framework:  [26].

Performing Tests on Generated Workpieces
After fitting the CAD point clouds, the results of density-based clustering (from the density characteristic (radius = 15 × pcr) and a minimum number of points per cluster is 100) are shown in Figure 5. The generated defects, except for two (crack and end crater), form a separate cluster and are visible. The evaluation is defined as different marginal deviations in the three quality categories (B, C, D). As you move down to BCD, the standard becomes more permissive; for example, the height of the excessive convexity can be up to 3 mm for category B, 4 mm for category C, and up to 5 mm for category D.
In the case of cracking, the defects generated do not show up as separate clusters, even when moving towards the more permissive direction of the standard. The other main groups of defects (solid inclusion, lack of fusion, imperfect shape, and miscellaneous imperfection) are detectable by clusters (see Table 2): • Crack (number of imperfection 100): For the test, a "brick body cavity" of 0.5 mm × 0.2 mm × 0.1 mm was created in the CAD model ( Figure 5a). Regardless of the defect type, the system detects the same percentage of the parts remaining outside the scattered areas together with the cavitation defect cluster for both defect clusters (225, 000 points for 90%). The least sensitive group for the test, due to the size difference (100 points/250, 000 points), small and fragmented clusters are not visible to the system, and the test cannot detect clusters with such low scores. Clusters with roughly the same number of points for both error groups can respond differently to the test because they have different geometric properties (for the case of excessive convexity, a normal vector plus the reference). Above 90% (a score of approximately 25,000), the two groups separate in terms of cluster scores, whereas for bump-like defects, defects can still be detected with a minimal decrease; for other imperfection groups, after a marked decrease, no defect-presenting cluster can be detected at 99%.

•
End crater pipe (number of imperfection 5012) (Figure 5d) and Intermittent undercut (number of imperfection 2025) (Figure 5e): In these cases, minus normal vector errors proportional to the reference point cloud are detected. The number of points in the cluster detected as an error decreases steadily, and at 90%, the cluster product of the generated error contains 5700 points. The linear decrease is also true for the error group of the end craters, but the number of points in the cluster clouds is smaller, from 288 to 176. • Excessive asymmetry of fillet weld (number of imperfection 512) and dimension: The most sensitive group of errors to analyze, the diversity of errors means that detection is roughly 90% successful (Figure 5f). At this optimum percentage, the test detects five clusters (all the errors represented).
In conclusion, density-based clustering cannot detect crack and end crater defects due to their small size. Based on the location and extent of the clusters, the asymmetry defect shows two distinct clusters on the seam, with excessive convexity and intermittent undercut showing one cluster each along the entire length of the seam.

Performing Tests on Real Physical Workpieces
We used a Telwin Inverpulse 320 welding machine to create welded samples: 8 − 8 specimens, 50 mm long, with a5 welds (dimension of the height dimension of the triangle enclosing the corner weld) were T-welded. We will show the lack of cutting accuracy required to produce the part, and various burn-in problems can also be detected by examining the joints. The defective test piece was fitted to a reference workpiece (validated best workpiece). As shown in Figure 6, in addition to the differences in weld size, differences in geometrical size also appeared separately (workpiece corners or deviations from planes). It is important to note that, whereas in Figure 6, the red colors represent the furthest points from the reference workpiece, in Figure 7, red is just one of the colors of the clusters sign.  We derived the value of eps from the density characteristic (15 × pcr). According to this radius, we have checked whether each point has a neighbor. The minimum number of points per cluster has been set to 100 in this case, based on the number of dimensions present in the dataset and the number of dimensions in the data [34]. The top 10% of the distances are obtained after the fits are clustered: the center of clusters and size distributions exhibit characteristics typical of the types of weld defect. As many clusters as detectable, as many errors are identified.
The testing process is similar to that described for generated workpieces. However, after DBSCAN, not only a "perfect" defect is displayed. Therefore, the defect clusters were sorted by their RMSE value (most significant value, upfront) and labeled by their center of mass location on the workpiece (brown point-maximum RMSE value). As shown in Figure 7, two of the center of mass points fall on welding seams (brown-down, blue-up), but based on the sorted RMSE values, only the lower cluster requires further investigation. The long red cluster does not exceed the tolerance values. The center of mass of the other visible clusters is located inside the part.
Based on the RMSE values and the center of gravity values, whether the defect is currently an excessive convexity defect (above the allowable value) or an intermittent undercut defect (below the value) can be further investigated. The cluster indicates a weld defect if the marking falls on a weld. In this case, the product must be repaired or scrapped. The values of the clusters to be listed are shown in Table 3. Typical weld defects are excessive convexity defects, where one of the centers of mass of the defective clusters has a value beyond the tolerance, higher than the weld line height of the defect-free part. The tolerance mentions three categories (B, C, D). As noted, the actual size is outside the B tolerance category in one case. The furthest values for fitting generated CAD models were 4.803 mm (B), 3.939 mm (C), and 2.936 mm (D). The values are close to the standard (5-4-3 mm).
In conducting these experiments, we found the following: • It is inefficient for the framework of the CAD model to contain errors of too small a size; below a score of 200 (out of 250,000), no error is detectable. (Crack and end crater defects cannot be detected.) • Sub-surface defects cannot be detected due to weld seam gassing, and additional investigations are needed. • Comparing the image of the CAD model with the image of the welded workpiece is unfortunate due to geometric differences, as differences not related to the weld scan are also considered errors. Dimensional differences caused by inaccurate cuts generate huge clusters. • A glossy, transparent surface or all-black-and-white color will spoil the expected scanning result. However, environmental preparation is also essential (from the projector light, no brighter light in the room).
• Care had to be taken to fine-tune the scan in the setup, but it is highly hardwaredependent. Generating a 3D model from more than six (rotation angles less than 60 degrees) matching images makes the process slow.

Conclusions
This study presents a method for classifying weld defects in corner welded joints using imaging techniques and data clustering. The developed framework employs a resultsoriented approach to visualize defects using CAD, with point clouds prepared for coarse (FGR) and fine (ICP) matching by point features (PFH). Density-based clustering (DBSCAN) is applied to the sorted distance between point pairs, resulting in several clusters. The location of these clusters is then clarified by sorting and marking them on the workpiece. The excessive convexity defect is presented as an example, with RMSE values of the clusters compared to the standard. Clusters exceeding the limit value are marked by their center of mass, which can be used to check whether the cluster with the critical value is located on a weld seam. The generated and physical workpieces demonstrate the applicability of the developed framework.
The limitation of the developed framework depends on the 3D scanning technology. The quality of the scan is strongly influenced by the physical conditions: the settings used in the preparation; the camera position away from the subject; the angle of the camera to the horizontal or camera to the projector; the light conditions in the scanning room. In addition, the machine vision methods can lack robustness when faced with variations in the manufacturing process or imaging conditions [35]. Using the DBSCAN algorithm parameters eps = 100 can cause a failed defect detection for extremely small (0.5 mm × 0.2 mm × 0.1 mm) weld defects (e.g., cracks), as less than 100 points are generated on the defect cluster surface out of 250, 000. Further reducing the value of eps, however, thousands of clusters will be misclassified. It is important to note that for generated CAD models if a fault passes through a full cross-section, some bias occurs in the calculation of its maximum distance. It is emphasized that this method alone cannot be used for a complete weld analysis, but it can speed up and automate the process. Based on standardized tests, it is concluded most of the listed defects can be detected, with some limitations. The center-of-mass coordinates of point sets can be used for graphical applications, allowing for the visualization of the defect location for quality control purposes, and potentially launching further research in this area.
The proposed method can be integrated into digital production processes, as it enables the automation of the inspection process. This automation can either support or completely replace human decision-making. However, it may result in incorrect decisions caused by surface errors during rendering. To overcome this limitation, future work could consider combining multiple methods and utilizing advanced convolutional neural techniques as a potential solution.  Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/abonyilab/3D-scanner-data (accessed on 28 December 2022).

Conflicts of Interest:
The authors declare no conflict of interest.

Nomenclature
The following abbreviations are used in this manuscript: