2.2. Point Cloud Denoising Based on the BPM-ADBSCAN-SOR Algorithm
When the line chromatic confocal sensor collects point cloud data from the surface roughness sample set, disturbances from the surrounding environment introduce noise into the data, which strongly affects the accuracy of the final roughness calculation. In addition, with the development of three-dimensional scanning technology, point cloud data are growing ever larger in scale, and the efficiency of subsequent processing steps drops significantly if noise and other spurious data are not eliminated. Therefore, the point cloud data collected by the system must be preprocessed. At present, point cloud preprocessing faces three main problems:
In the point cloud data collected by the line chromatic confocal sensor, the noise is mixed: it mainly comprises outlier noise around the main body of the roughness sample cloud and near-field noise in the gaps between sample blocks. Traditional point cloud denoising algorithms, such as statistical filtering, pass-through filtering, and low-pass filtering, handle a single noise type well but perform only moderately on mixed noise.
Since the nominal values of the surface roughness sample blocks are mostly at the micron level, the three-dimensional surface profile fluctuation is small. To ensure high accuracy of the calculation results, the point cloud collected by the line chromatic confocal sensor is a large-scale, high-density cloud containing tens of millions to hundreds of millions of points, which places heavy demands on computing power and memory. Traditional filtering algorithms struggle to support computation at this scale.
Since the purpose of this study is to process point cloud data from multiple roughness sample sets, and since, according to the relevant provisions of JJF 1099-2018 [22], roughness sample blocks with different nominal values produced under different machining conditions require different evaluation lengths during calibration, the point clouds of the individual sample blocks must be separated during preprocessing.
Based on the preceding analysis, this paper proposes a combined denoising algorithm: ADBSCAN-SOR with a back projection mechanism (BPM-ADBSCAN-SOR). The algorithm enables rapid and efficient removal of mixed noise from roughness sample point clouds as well as clustering of multiple sample sets. Its overall process is illustrated in Figure 2.
Firstly, considering that the point cloud data collected by the line spectral confocal sensor form a large-scale dense cloud, the data should first be simplified to reduce computing requirements and improve efficiency; voxel down-sampling based on an octree is adopted. The octree [23], a typical hierarchical spatial partition data structure, recursively divides the bounding volume of a scene into eight uniform sub-regions until the number of elements in each sub-node meets a preset threshold or another termination condition, thereby organizing and managing the point cloud in an orderly manner. To establish the topological relationship of the octree, all points are first traversed, and the extreme values in the X, Y, and Z directions are recorded to establish a minimal three-dimensional grid enclosing all points. The grid size is:

$$L = \max\left\{ x_{\max} - x_{\min},\; y_{\max} - y_{\min},\; z_{\max} - z_{\min} \right\}$$
With the octree level set to d, the voxel mesh size of each leaf node of the octree is:

$$l = \frac{L}{2^{d}}$$
Each leaf node is then traversed, and voxel down-sampling is completed by computing the voxel center of the current node and replacing all points within the voxel with that center. In this paper, the octree hierarchy is set to 10 levels. The original point cloud data and the octree-based hierarchical division results after down-sampling are shown in Figure 3 and Figure 4, respectively. The point clouds before and after down-sampling were viewed in CloudCompare v2.13.0. The comparison of point cloud density before and after octree down-sampling is shown in Figure 5.
After octree-based down-sampling, the point cloud density decreases significantly, with the number of points reduced from 38,720,845 to 842,642. The point cloud structure is effectively simplified, which greatly reduces the computational burden and accelerates the subsequent denoising and clustering.
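As an illustration of this step, the following minimal sketch implements fixed-depth octree voxel down-sampling in Python with NumPy; the function name and default depth follow the description above, but the code itself is an assumption rather than the authors' implementation.

```python
import numpy as np

def octree_voxel_downsample(points: np.ndarray, depth: int = 10) -> np.ndarray:
    """Replace every occupied leaf voxel at the given octree depth with its
    geometric center, as described in the text (illustrative sketch)."""
    p_min = points.min(axis=0)
    # Side length of the minimal bounding cube (the octree root node).
    L = (points.max(axis=0) - p_min).max()
    voxel = L / (2 ** depth)                    # leaf-voxel edge length l = L / 2^d
    idx = np.floor((points - p_min) / voxel).astype(np.int64)
    occupied = np.unique(idx, axis=0)           # one entry per occupied leaf voxel
    return p_min + (occupied + 0.5) * voxel     # voxel centers replace the points
```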
After obtaining the low-resolution point cloud data, the noise in the main body of the cloud is analyzed; it mainly comprises outlier noise distributed around the main body and near-field noise in the gaps between roughness sample blocks. The specific noise distribution is shown in Figure 6.
For outlier noise, and given the requirement to cluster the sample blocks, the DBSCAN algorithm can remove outliers effectively while simultaneously clustering the data; it uses the concepts of neighborhood and data point similarity to classify the data. However, when the line chromatic confocal sensor acquires the three-dimensional point cloud of a roughness sample block, placement or platform errors often tilt the measurement surface, superimposing a low-frequency trend item on the original point cloud data.
This trend item would affect the subsequent DBSCAN clustering and roughness parameter calculation, so the sample tilt must be corrected. The tilt appears in the point cloud as a first-order linear trend term; to eliminate it, the trend term must first be estimated:

$$z_{\mathrm{trend}}(x, y) = a x + b y + c$$
The least squares method is commonly used for plane fitting, with the objective function:

$$E(a, b, c) = \sum_{i=1}^{N} \left( a x_i + b y_i + c - z_i \right)^{2}$$

where $(x_i, y_i, z_i)$ are the spatial coordinates of the points and N is the number of points. Its matrix form is:

$$\mathbf{A}\boldsymbol{\theta} = \mathbf{z}, \qquad \mathbf{A} = \begin{bmatrix} x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots \\ x_N & y_N & 1 \end{bmatrix}, \qquad \mathbf{z} = \begin{bmatrix} z_1 \\ \vdots \\ z_N \end{bmatrix}$$
The plane parameter vector is then $\boldsymbol{\theta} = (a, b, c)^{T}$. According to the least squares method:

$$\boldsymbol{\theta} = \left( \mathbf{A}^{T}\mathbf{A} \right)^{-1} \mathbf{A}^{T}\mathbf{z}$$

Thus, the optimal plane parameters a, b, and c are obtained. After obtaining the fitted plane, it is subtracted from the original point cloud, giving the corrected point cloud:

$$z_i' = z_i - \left( a x_i + b y_i + c \right), \qquad i = 1, \dots, N$$
Because the plane fit uses a global trend estimate rather than neighborhood averaging, local height differences are unchanged: only the low-frequency trend item is removed, and the surface undulation is not smoothed. Once the inclination has been corrected, the subsequent clustering and denoising steps can be carried out.
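A minimal sketch of this tilt-correction step is given below, assuming the point cloud is an N×3 NumPy array; the names are illustrative, not taken from the paper.

```python
import numpy as np

def remove_tilt(points: np.ndarray) -> np.ndarray:
    """Fit the plane z = a*x + b*y + c by least squares and subtract it
    from the heights, removing the first-order linear trend term."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x, y, np.ones_like(x)])       # design matrix
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)  # theta = (A^T A)^-1 A^T z
    corrected = points.copy()
    corrected[:, 2] = z - (a * x + b * y + c)          # z' = z - (ax + by + c)
    return corrected
```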
DBSCAN [24] can detect arbitrary cluster structures without specifying the number of target clusters, and it automatically detects and eliminates noise points without an additional removal step. ε and MinPts are the two key parameters used to measure the neighborhood density of data points: ε is the distance threshold defining the neighborhood radius, and MinPts is the minimum number of samples required within that ε radius. DBSCAN's clustering concept is a density-reachability-based classification that groups data points in higher-density regions into clusters, each usually containing one or more core objects.
However, traditional DBSCAN relies on manually specified ε and MinPts, which have a decisive influence on the clustering results. In real scanned point clouds, the density often varies significantly between regions, and fixed parameters frequently fail to accommodate both high-density and low-density areas, causing low-density regions to be misclassified as noise or high-density regions to be merged into a single structure. Therefore, this study adds to the traditional DBSCAN framework an adaptive parameter selection mechanism based on the k-nearest neighbor distance distribution, which automatically estimates the optimal ε and improves the robustness of the algorithm.
Specifically, for a point cloud sample set p, the distance dk(pi) from each data point to its kth nearest neighbor is first computed, forming a set of k-nearest neighbor distances:

$$D_k = \left\{ d_k(p_i) \mid i = 1, 2, \dots, N \right\}$$
After sorting this distance set, a typical “elbow phenomenon” can be observed, reflecting how the point cloud density changes from dense to sparse regions. In general, three strategies can be used to estimate ε automatically: quantile estimation, the elbow method, and the local adaptive method. The quantile method is the most stable but insensitive to density changes; the elbow method is more sensitive than the quantile method but handles very large point clouds poorly; the local adaptive method is the most sensitive to density but the least stable. The denoising effects of the three adaptive strategies are shown in Table 1.
The data in Table 1 show that the elbow method and the local adaptive method clearly outperform the 90% quantile method in noise reduction, while the elbow method is markedly faster than the local adaptive method. The local adaptive method is intended mainly for data sets with uneven density distributions, improving clustering by assigning different neighborhood radii to different regions. However, the roughness sample point clouds processed here are acquired by the line spectral confocal system with a fixed sampling interval, so the overall density distribution is relatively uniform and density variation is not significant. In this situation, the local adaptive method increases algorithmic complexity and the number of parameters while offering only limited clustering gains. Moreover, roughness sample calibration is a precision measurement application demanding high stability and repeatability, whereas the local adaptive method typically introduces additional instability. By contrast, the elbow method determines a uniform neighborhood radius through global analysis and offers better stability and computational efficiency. In summary, the elbow method is used to estimate ε in this study. First, for the sample set p, the ε neighborhood is defined as:

$$N_{\varepsilon}(p_i) = \left\{ p_j \in p \mid \lVert p_j - p_i \rVert \le \varepsilon \right\}$$
For each sample point, the k-nearest neighbor distance dk(pi) is calculated with k = MinPts, and all dk(pi) are arranged in ascending order to obtain the sequence:

$$d_{(1)} \le d_{(2)} \le \dots \le d_{(N)}$$
Define the function $f(i) = d_{(i)}$; this function is the k-distance curve. Treating f(i) as a discrete function, its first-order difference is:

$$\Delta f(i) = f(i + 1) - f(i)$$

The second-order difference is:

$$\Delta^{2} f(i) = f(i + 1) - 2 f(i) + f(i - 1)$$

Where |Δ²f(i)| reaches a local maximum, the location can be regarded as the point of maximum curvature. The corresponding adaptive estimate is:

$$\varepsilon^{*} = f(i^{*}), \qquad i^{*} = \arg\max_{i} \left| \Delta^{2} f(i) \right|$$
Then ε* is the adaptive neighborhood radius. In addition, MinPts is set equal to k, so that the density definition is consistent and the stability of the algorithm across different density intervals is enhanced. After ε estimation is completed, the subsequent calculation steps are identical to the traditional DBSCAN algorithm, realizing automatic calculation of the filter parameters, avoiding manual parameter adjustment, and improving processing efficiency. As for the value of k, this study tested several commonly used k values; the processing results are shown in Figure 7 and Table 2.
As Figure 7 and Table 2 show, when k is 100, the original sample block is cut apart, producing an incorrect number of clusters. When k is 200, cross-clustering occurs: part of the point cloud of an adjacent sample block is absorbed into the current cluster. When k = 250, cluster adhesion occurs, reducing the number of clusters. Therefore, this study sets k = 150; the corresponding k-distance curve is shown in Figure 8.
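The adaptive estimation just described can be sketched as follows, combining the elbow-based ε estimate with scikit-learn's standard DBSCAN; this is a hedged illustration of the procedure with k = 150 as selected above, not the authors' exact code.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def adaptive_dbscan(points: np.ndarray, k: int = 150):
    """Estimate eps at the elbow of the sorted k-distance curve (maximum
    |second-order difference|), then run DBSCAN with MinPts = k."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    dists, _ = nn.kneighbors(points)            # column 0 is the point itself
    f = np.sort(dists[:, -1])                   # ascending k-distance curve f(i)
    d2 = f[2:] - 2.0 * f[1:-1] + f[:-2]         # second-order difference
    eps = f[1 + np.argmax(np.abs(d2))]          # elbow -> adaptive eps*
    labels = DBSCAN(eps=eps, min_samples=k).fit_predict(points)
    return labels, eps                          # label -1 marks outlier noise
```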
After processing by the ADBSCAN algorithm, low-resolution point cloud clusters of the individual roughness sample blocks are obtained. Because the characteristic scale of roughness is at the micron level, and the roughness calculation must preserve the fine features of the sample blocks as far as possible, the low-resolution clusters are back-projected onto the original point cloud data, and the original large-scale cloud is trimmed accordingly. This is approximately equivalent to performing the clustering at the original point cloud scale, achieving both outlier removal and point cloud clustering. For back-projection clipping, the boundary of each low-resolution cluster must first be extracted; this paper adopts point cloud bounding box technology [25], in which, according to the geometric characteristics of the point cloud model, a box close to a cuboid in shape encloses the point cloud data. The AABB (axis-aligned bounding box) is used here: a cuboid that encloses the surface of an object without fitting it tightly. Suppose there is a point cloud $p = \{(x_i, y_i, z_i)\}_{i=1}^{N}$; its minimum and maximum values in the X, Y, and Z directions are:

$$x_{\min} = \min_{i} x_i,\quad x_{\max} = \max_{i} x_i,\quad y_{\min} = \min_{i} y_i,\quad y_{\max} = \max_{i} y_i,\quad z_{\min} = \min_{i} z_i,\quad z_{\max} = \max_{i} z_i$$
A rectangular bounding box is constructed from these extrema, with boundary:

$$\left[ x_{\min}, x_{\max} \right] \times \left[ y_{\min}, y_{\max} \right] \times \left[ z_{\min}, z_{\max} \right]$$

The minimum and maximum coordinate points of the AABB are recorded as $p_{\min} = (x_{\min}, y_{\min}, z_{\min})^{T}$ and $p_{\max} = (x_{\max}, y_{\max}, z_{\max})^{T}$; these two diagonal points form the basic structure of an AABB. The center coordinate of the AABB is $c = (c_x, c_y, c_z)^{T} = (p_{\min} + p_{\max})/2$, and the difference between the center point and $p_{\min}$ (or $p_{\max}$) is $d = (d_x, d_y, d_z)^{T}$. The size of the AABB bounding box is then:

$$2d = \left( 2 d_x,\ 2 d_y,\ 2 d_z \right)^{T} = p_{\max} - p_{\min}$$
The bounding box size is calculated and displayed according to Formula (12). After the bounding range has been determined, the region enclosed by the bounding box is projected back onto the original roughness sample block point cloud, and the original data are trimmed, as shown in Figure 9.
As the figure shows, the clipped point cloud retains the main structure well, with no loss of sample block data; however, near-field noise remains around the sample blocks and must be removed in a further step.
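The back-projection step can be sketched as below: the AABB of each low-resolution cluster selects the matching full-resolution points. The optional margin parameter is an assumption for illustration, not something specified in the paper.

```python
import numpy as np

def crop_by_aabb(original: np.ndarray, cluster: np.ndarray,
                 margin: float = 0.0) -> np.ndarray:
    """Keep original-resolution points that fall inside the AABB of a
    low-resolution cluster (back-projection clipping sketch)."""
    p_min = cluster.min(axis=0) - margin        # AABB corner p_min
    p_max = cluster.max(axis=0) + margin        # AABB corner p_max
    inside = np.all((original >= p_min) & (original <= p_max), axis=1)
    return original[inside]
```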
After the BPM-ADBSCAN stage, the main body of the point cloud data changes from the original point cloud of the whole sample set to the point cloud of each individual sample, and the near-field noise originally inside the small gaps can be regarded as random single-point noise near the main body of each new cloud. Exploiting this characteristic, a statistical filter can operate on the local density of the point cloud [26]: when the local density in a region falls below a set threshold, the points in that region are judged to be outliers and eliminated.
For each point Pi in the cloud, its neighborhood is first determined by selecting the k points closest to Pi. With the Euclidean distance between Pi and the jth neighborhood point denoted dj, the average distance davg can be expressed as:

$$d_{\mathrm{avg}} = \frac{1}{k} \sum_{j=1}^{k} d_j$$
The standard deviation of the distances within the neighborhood of point Pi is then calculated. This standard deviation dσ indicates how much the distances to neighboring points vary, reflecting the local smoothness of the point cloud:

$$d_{\sigma} = \sqrt{ \frac{1}{k} \sum_{j=1}^{k} \left( d_j - d_{\mathrm{avg}} \right)^{2} }$$
In the outlier determination process, the degree of abnormality of each point is measured by its outlier degree, defined as the ratio of the average neighborhood distance of the point to the standard deviation. When the outlier degree exceeds a preset threshold T, the point is determined to be an outlier; otherwise, it is regarded as a normal point. The denoising effect of statistical filtering is shown in Figure 10.
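For illustration, the sketch below implements the classic statistical-outlier-removal variant found in common point cloud libraries (global mean plus t standard deviations), which differs slightly from the ratio-based criterion described above; the defaults for k and t are illustrative, not values from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def statistical_outlier_removal(points: np.ndarray, k: int = 50,
                                t: float = 2.0) -> np.ndarray:
    """Discard points whose mean k-neighbor distance exceeds the global
    mean by more than t standard deviations (classic SOR sketch)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    dists, _ = nn.kneighbors(points)
    d_avg = dists[:, 1:].mean(axis=1)           # mean distance to k neighbors
    threshold = d_avg.mean() + t * d_avg.std()  # global rejection threshold
    return points[d_avg <= threshold]
```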
2.3. Roughness Calculation Algorithm Based on Steady-State Confidence-Weighted Robust Gaussian Filtering
In designing a surface roughness calibration algorithm, the relevant provisions of the JJF 1099-2018 surface roughness comparison block calibration specification require that the evaluation length be determined first. After the evaluation position is selected, the roughness value is measured via Gaussian filtering. However, two problems inevitably arise when a traditional Gaussian filter is applied: (1) the boundary effect; (2) sensitivity to outliers. Schematic diagrams illustrating the boundary effect and outlier sensitivity of traditional Gaussian filtering are presented in Figure 11 and Figure 12, respectively.
To solve these two problems, this paper designs the SSCW-RGF (Steady-State Confidence-Weighted Robust Gaussian Filtering) algorithm to fit an accurate reference centerline and obtain the corresponding roughness curve, thereby solving for the roughness value.
The prerequisite for filtering is acquiring the corresponding surface contour. To obtain it, this study employs profile extraction, treating the point cloud distribution in the side view of the extracted profile as the surface contour. Given that different machining processes exhibit distinct texture orientations, the roughness specimens should be positioned before scanning so that their texture directions are as perpendicular as possible to the line laser direction. After the denoising and clustering algorithm described earlier has been applied, the centroid coordinates of each specimen's point cloud are determined, and ten random intervals are then selected outward from the centroid in both directions along the X-axis for cropping. The cropping direction is perpendicular to the X-axis, yielding ten YOZ profile contours, on which the subsequent filtering operation is conducted.
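A possible sketch of this profile-extraction step is shown below; the slab width and the way the random X positions are drawn are illustrative assumptions, since the paper does not specify them.

```python
import numpy as np

def extract_profiles(points: np.ndarray, n_profiles: int = 10,
                     slab: float = 0.01) -> list:
    """Cut thin slabs perpendicular to the X-axis at random positions
    around the centroid; each slab's (y, z) points form one YOZ profile."""
    cx = points[:, 0].mean()                         # centroid X coordinate
    half_span = 0.5 * np.ptp(points[:, 0])           # half the X extent
    offsets = np.random.uniform(-half_span, half_span, n_profiles)
    profiles = []
    for x0 in cx + offsets:
        mask = np.abs(points[:, 0] - x0) < slab / 2  # thin slab around x0
        profiles.append(points[mask][:, 1:])         # keep (y, z) columns
    return profiles
```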
To perform filtering in the steady-state region, that region must first be determined. According to the mathematical model of finite-domain convolution, assume the measured profile is a discrete sequence z(xn), xn = n∆x, n = 0, …, N − 1, over the finite measurement interval D = [x0, xN−1]. The essence of Gaussian filtering is convolution, but no real data exist outside the measured contour, so the convolution can only be performed over a finite domain:

$$w(x_n) = \sum_{m} s(x_n - x_m)\, \tilde{z}(x_m)\, \Delta x$$

where s(x) is the Gaussian weighting function and $\tilde{z}(x)$ is the contour after continuation. Common continuation methods include reflection, periodic extension, and zero padding, which can be expressed as:

$$\tilde{z}(x) = \begin{cases} z(x), & x \in D \\ z_{\mathrm{ext}}(x), & x \notin D \end{cases}$$
Thus, the convolution value can be decomposed into:

$$w(x_n) = \sum_{x_m \in D} s(x_n - x_m)\, z(x_m)\, \Delta x + \underbrace{\sum_{x_m \notin D} s(x_n - x_m)\, \tilde{z}(x_m)\, \Delta x}_{\varepsilon(x_n)}$$

With the error term denoted ε(xn), the convolution is completely error-free if:

$$s(x_n - x_m) \approx 0, \qquad \forall\, x_m \notin D$$
The Gaussian kernel theoretically has infinite support, so the error is zero in the engineering sense only when xn lies farther than 3σ from the boundary. The steady-state region can therefore be defined as Dsteady = [x0 + 3σ, xN−1 − 3σ], giving a steady-state length Lsteady = (xN−1 − x0) − 6σ ≈ L − 3λc. The value of λc is specified by the JJF 1099-2018 calibration specification; in accordance with it, the λc value corresponding to each sample block is set in advance according to the placement order before scanning. Using the BPM-ADBSCAN-SOR algorithm, the point cloud data of each roughness sample block are obtained, and λc is assigned through case logic, with each sample point cloud corresponding to one case. The calibration specification also stipulates that the evaluation length must exceed five times the sampling length, the sampling length being numerically equal to the cutoff wavelength λc. Therefore, for the evaluation length to fall completely within the steady-state region, Lsteady must exceed 5λc; that is, when the measurement length is at least eight times the cutoff wavelength, the evaluation length can lie entirely in the steady-state region.
In this case, a contour signal of length 5λc can be intercepted directly from the steady-state region as the output. However, because contour extension is introduced artificially during Gaussian filtering, the filtering result is instead weighted according to its degree of steadiness: a credibility function is constructed that lowers the credibility of filtering results in the boundary region without extending the contour. First, according to ISO 16610-21 [27], the effective support range of the Gaussian kernel is defined as R = Lc·λc, where Lc is a dimensionless constant, usually 0.5, indicating that the effective support length is 0.5 times the cutoff wavelength; beyond this range, the effect of the filter gradually decreases until the signal is considered unfilterable. For a given profile signal z(x), each point must be classified as inside or outside the steady-state region according to its distance from the signal boundary. With the signal boundaries denoted xmin and xmax, the minimum distance from any point x to the nearest boundary is defined as:

$$d(x) = \min\left( x - x_{\min},\ x_{\max} - x \right)$$
The confidence function c(x) of the steady-state region is then defined as:

$$c(x) = \begin{cases} 1, & d(x) \ge R \\ \dfrac{d(x)}{R}, & d(x) < R \end{cases}$$
At this point, the reference center line w(x) fitted by the traditional Gaussian filter is weighted with the steady-state reliability function c(x), and the weighted reference center line ŵ(x) is calculated as:

$$\hat{w}(x) = c(x)\, w(x)$$
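A minimal sketch of this confidence weighting is given below, assuming the linear ramp form above; the cutoff passed in is whatever λc applies to the sample block, and all names are illustrative.

```python
import numpy as np

def confidence_weight(x: np.ndarray, cutoff: float,
                      lc_ratio: float = 0.5) -> np.ndarray:
    """Steady-state confidence c(x): 1 in the interior, decaying linearly
    within R = Lc * lambda_c of either signal boundary (assumed ramp form)."""
    R = lc_ratio * cutoff                        # effective support R = Lc * lambda_c
    d = np.minimum(x - x.min(), x.max() - x)     # distance to nearest boundary
    return np.clip(d / R, 0.0, 1.0)

# Weighted reference center line: w_hat = confidence_weight(x, cutoff) * w
```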
From the Gaussian center line extracted in Equation (29), the roughness signal r(i) can be calculated from Equation (30):

$$r(i) = z(i) - \hat{w}(i)$$
Building on the reliability-weighted Gaussian filter for the steady-state region, M-estimation theory is introduced to design a robust Gaussian filtering algorithm weighted by steady-state reliability. This algorithm fits the bulk of the data while identifying possible outliers, avoiding their influence on the filtering results and remaining robust when the measured data contain large deviations.
The SSCW-RGF algorithm builds on the SSCW-Gaussian algorithm by introducing a vertical weight function, with the objective function:

$$\min_{w} \sum_{i=1}^{N} \rho\big( z(x_i) - w(x_i) \big)$$

where ρ(x) is the vertical weight function. The choice of ρ should be based on the actual distribution of the signal; a suitable choice improves computational efficiency while ensuring reliable signal processing.
Robust estimation theory holds that different robust weight functions may produce different results when processing the same observation signal, and that improper selection of the robust function can sometimes cause the evaluation to fail outright [28]. According to ISO 16610-31 [29], robust filtering of surface contours often uses the Tukey biweight estimate to construct the vertical weight function, which is:

$$\delta_i = \begin{cases} \left[ 1 - \left( \dfrac{\chi_i}{c_B} \right)^{2} \right]^{2}, & \left| \chi_i \right| \le c_B \\ 0, & \left| \chi_i \right| > c_B \end{cases}$$
where i is the iteration number, χ is the residual after the ith iteration, and cB = 4.4·median(|z(x) − w(x)|) is the residual scale. The Tukey estimate thus assigns weights to the measurement data according to the actual statistics of the residuals. The convergence precision t governs the behavior of the whole SSCW-RGF algorithm; t typically ranges from 10^−6 to 10^−3. The results of SSCW-RGF under different convergence precisions are shown in Figure 13.
As the figure shows, when the convergence precision is 10^−3, the filtering result is still affected by outliers; when it is reduced to 10^−4, the filtering results have essentially stabilized, indicating that the algorithm reaches a steady state near 10^−4. Reducing t further significantly increases the number of iterations and the computation time. Balancing filtering accuracy against computational efficiency, this paper therefore selects 10^−4 as the convergence threshold of SSCW-RGF.
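The following compact sketch illustrates how the pieces above can fit together as an iteratively reweighted robust Gaussian regression filter; the kernel constant follows ISO 16610-21, while the structure and names are assumptions, not the authors' implementation.

```python
import numpy as np

def sscw_rgf(x: np.ndarray, z: np.ndarray, cutoff: float,
             tol: float = 1e-4, max_iter: int = 50) -> np.ndarray:
    """Iteratively reweighted Gaussian regression with Tukey biweight
    vertical weights; the result is scaled by the steady-state confidence."""
    alpha = np.sqrt(np.log(2.0) / np.pi)        # ISO 16610-21 kernel constant
    # Gaussian weights between every pair of profile points.
    S = np.exp(-np.pi * ((x[:, None] - x[None, :]) / (alpha * cutoff)) ** 2)
    c = np.clip(np.minimum(x - x.min(), x.max() - x) / (0.5 * cutoff), 0.0, 1.0)
    delta = np.ones_like(z)                     # Tukey vertical weights
    w = np.zeros_like(z)
    for _ in range(max_iter):
        W = S * delta[None, :]
        w_new = (W @ z) / W.sum(axis=1)         # weighted local mean line
        r = z - w_new                           # residuals after this iteration
        cB = 4.4 * np.median(np.abs(r)) + 1e-12 # residual scale from the text
        delta = np.where(np.abs(r) <= cB, (1.0 - (r / cB) ** 2) ** 2, 0.0)
        if np.max(np.abs(w_new - w)) < tol:     # convergence precision t
            w = w_new
            break
        w = w_new
    return c * w                                # confidence-weighted center line
```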
After the stable reference center line has been fitted, the roughness signal can be separated according to the relevant provisions of the JJF 1099-2018 calibration specification, and the roughness value can then be calculated. The surface roughness of a workpiece is generally expressed by the arithmetic mean deviation Ra of the profile, calculated according to Equation (25):

$$R_a = \frac{1}{l} \int_{0}^{l} \left| z(x) \right| \mathrm{d}x$$

where l is the sampling length and z(x) is the surface profile height relative to the profile reference center line. The Ra value over the evaluation length is calculated from Equation (26):

$$R_a = \frac{1}{5} \sum_{n=1}^{5} R_{a_n}$$

where Ran represents the Ra value calculated in the nth sampling length.
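To close the loop, a short sketch of the Ra computation from the separated roughness signal follows; splitting the evaluation length into five sampling lengths reflects the convention noted above, and the helper name is illustrative.

```python
import numpy as np

def ra_value(r: np.ndarray, n_segments: int = 5) -> float:
    """Arithmetic mean deviation Ra: mean |r| per sampling length,
    averaged over the evaluation length (n_segments sampling lengths)."""
    segments = np.array_split(r, n_segments)
    ra_n = [np.mean(np.abs(seg)) for seg in segments]  # Ra_n per sampling length
    return float(np.mean(ra_n))                        # Ra over evaluation length
```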