An Improved Supervoxel Clustering Algorithm of 3D Point Clouds for the Localization of Industrial Robots

Abstract: Supervoxels have a widespread application in instance segmentation on account of their merit of providing a close approximate representation with less data. However, low accuracy, mainly caused by point cloud adhesion, is a crucial issue in the localization of industrial robots. An improved bottom-up clustering method based on supervoxels is proposed for better accuracy. Firstly, the point cloud data were preprocessed to eliminate noise points and the background. Then, improved supervoxel over-segmentation with moving least squares (MLS) surface fitting was employed to segment the point clouds of workpieces into supervoxel clusters. Every supervoxel cluster can be refined by MLS surface fitting, which reduces the occurrence of over-segmentation dividing the point clouds of two objects into one patch. Additionally, an adaptive merging algorithm based on fusion features and convexity judgment was proposed to accomplish the clustering of individual workpieces. An experimental platform was set up to verify the proposed method. The experimental results showed that the recognition accuracy and the recognition rate for three different kinds of workpieces were over 0.980 and 0.935, respectively. Combined with sample consensus initial alignment (SAC-IA) coarse registration and iterative closest point (ICP) fine registration, a coarse-to-fine strategy was adopted to obtain the locations of the segmented workpieces in the experiments. The experimental results demonstrate that the proposed clustering algorithm can accomplish the localization of industrial robots with higher accuracy and lower registration time.


Introduction
Without the application of industrial robots, it is impossible to accomplish automation and modernization of a manufacturing process in any industrial branch. Due to their strong adaptability and flexibility, industrial robots are commonly assigned for dull, dangerous, and unpleasant jobs to replace human beings, including automatic spray painting [1], welding [2], grinding [3], logistics sorting [4], assembling [5] and so on.
Computer vision, as the eyes of industrial robots, has become a significant component of robotic systems in recent decades. Due to the advantage of obtaining richer physical information about objects, extensive research on 3D vision has been carried out. However, 3D segmentation is still one of the most challenging tasks in computer vision. The results of 3D segmentation directly affect object recognition [6], pose estimation [7] and positioning [5] in the application of industrial robots. The goal of the segmentation process is to group points that belong to the same objects into clusters or sets, where each cluster has similar properties by some criteria. Three-dimensional segmentation methods include traditional pointwise methods and segmentation methods based on deep learning, which have emerged in recent years. The segmentation method based on deep learning is one solution; however, the training data must be manually labeled, which is more challenging than labeling 2D images [8][9][10]. It is impractical to adopt the deep learning approach in engineering applications due to the difficulty of training data preparation. The pointwise methods are more widely used in engineering projects, as no training data are required and they have high adaptability [11]. However, the pointwise methods are inefficient for point clouds with large data volumes. Furthermore, in the application of industrial robots, noise and the adhesion of objects' point clouds are generated for many reasons, including similar and colorless workpieces stacked in a mass, light reflection, and the shooting angle of 3D cameras. Without eliminating the useless noise and adhesion, it is easy to generate mis-segmentation and under-segmentation.
The supervoxel-based method, inspired by superpixels, can effectively reduce point clouds' data volume. The superpixel method is widely used in 2D image processing to effectively reduce the computation of subsequent processing [12]. In recent years, the efficient supervoxel method has been introduced to 3D semantic segmentation [13][14][15]. Supervoxels were applied in a convolution operation (SVConv) by Huang, Ma et al. to effectively accomplish online 3D semantic segmentation [16]. In Sha, Chen et al.'s work, road contours were extracted efficiently, based entirely on a supervoxel method, without any trajectory data [17]. The Euclidean clustering algorithm was optimized with supervoxels to improve the anti-noise ability of the clustering process by Chen et al. [18]. Li, Liu et al. proposed a multi-resolution supervoxel method to improve accuracy in regions of inconsistent density [19]. Lin, Wang et al. adopted an adaptive resolution for each supervoxel to preserve object boundaries effectively [20]. Although the existing supervoxel segmentation methods offer data reduction and noise robustness, the adhesion problem remains unsolved, which results in low accuracy when segmenting objects, especially in complex industrial applications.
To improve the accuracy and efficiency of 3D instance segmentation under the condition of stacked workpieces with weak texture, a bottom-up clustering method based on supervoxels was proposed. In the supervoxel-based over-segmentation algorithm, moving least squares (MLS) surface fitting was utilized to refine the supervoxel clusters, which can eliminate noises and adhesion. In the merging algorithm, the precise geometric and spatial features are extracted from refined supervoxel clusters, which are generated from over-segmentation. Then, according to the convexity-concavity judgment and the distance metric consisting of feature information, the supervoxel patches are merged to complete 3D instance segmentation.
In summary, the main contributions of this paper are as follows:
1. An improved supervoxel over-segmentation algorithm with MLS surface fitting was proposed to effectively eliminate the adhesion caused by shooting angles and reflections. Additionally, the over-segmentation method performs data simplification.
2. A multi-feature metric combined with convexity-concavity judgment was proposed. An adaptive approach was added to this metric to normalize different features. According to the metric, over-segmentation patches can be merged via the proposed merging algorithm.
The organization of this paper is as follows: in Section 2, the proposed methodology is introduced, including preprocessing, over-segmentation based on supervoxels and MLS, and multi-feature region merging. In Section 3, the experimental results with quantitative and visual outputs are demonstrated to analyze the viability and advantages of the proposed method. Finally, the conclusion is summarized in Section 4.

Methods
The proposed bottom-up method includes data preprocessing, an over-segmentation algorithm, and a region merging algorithm, as shown in Figure 1b. The point clouds are obtained by a binocular structured light camera in Figure 1a. After object instance segmentation in Figure 1b, the objects' point clouds with shape and location information are extracted. As shown in Figure 1c, combined with the sample consensus initial alignment and iterative closest point (SAC-ICP) registration, bin-picking experiments are conducted to test the improvements and feasibility of the proposed method.

Figure 1. The process of our method and its application. Subfigures (a-c) display the acquisition of point cloud data, the objects instance segmentation, and bin-picking experiments, respectively.

Figure 2. Workflow of data acquisition and preprocessing: 3D point cloud data acquisition, voxel-grid down sampling, RANSAC plane removal, and statistical outlier removal.
Data Acquisition and Preprocessing
The workflow of this section is shown in Figure 2, including data acquisition, down sampling, plane removal, and outlier points removal.


A binocular structured light 3D camera was applied to attain high-quality 3D point cloud data with a resolution of up to 0.02 mm. The large amount of point cloud data results in high computational complexity. Thus, the voxel grid algorithm was implemented to perform down sampling while maintaining the input data's shape characteristics and geometric properties. Given the voxel grid size, the point clouds can be divided into multiple voxel grids by octree, as shown in Figure 3. The entire point clouds are approximately expressed by the centroids of those voxel grids to achieve down sampling. The coordinates (X, Y, Z) of the centroids can be calculated as follows:

x̄ = (1/n)∑ x_i, ȳ = (1/n)∑ y_i, z̄ = (1/n)∑ z_i (1)

where n is the total number of points in the voxel V.

The points include not only objects but also planes and other noisy points that go against the subsequent instance segmentation. The random sample consensus (RANSAC) algorithm was utilized to remove the plane. The RANSAC algorithm randomly sampled three points as the minimum point set to generate a hypothetical plane in every iteration. Then, the distance between the remaining points and the plane generated by these three points was calculated by the following Formula (2):

D_i = |ax_i + by_i + cz_i + d| / √(a² + b² + c²) (2)

where a, b, c, d are the parameters of the calculated plane equation. For a given distance threshold (δ = 2 mm), the points whose distance was below the threshold were counted as inliers. After iteration, the RANSAC algorithm returned the plane with the highest percentage of inliers.

The statistical outlier removal algorithm was adopted to eliminate the noisy points [21]. After traversing each point's k-nearest (k = 6) neighbors, this approach deleted the points whose average distance to their neighbors was more than a multiple of the standard deviation of the mean distance to the query point. The points in accordance with Formula (3) remained:

d̄_p ≤ µ_p + std_mul · σ_p (3)

where d̄_p represents the average distance between p and its k-nearest neighbors, std_mul represents the standard deviation multiple threshold (usually std_mul = 1), and µ_p and σ_p are the mean and standard deviation of the Gaussian distribution generated by the average neighbor distances of the points, respectively.

The results of preprocessing are shown in Figure 4. These steps reduce the number of points in the original data and remove noise. They help to decrease the downstream processing calculation consumption and increase the accuracy of the proposed method. However, there are still noise and adhesion that cannot be eliminated, as shown in Figure 4d, which we aim to remove in the following section.
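The down-sampling and outlier-removal steps above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the paper's implementation (the paper uses a PCL-style pipeline); the brute-force neighbor search is only suitable for small clouds.

```python
import numpy as np

def voxel_grid_downsample(points, voxel_size):
    """Replace every occupied voxel by the centroid of its points, as in Formula (1)."""
    idx = np.floor(points / voxel_size).astype(np.int64)          # voxel index per point
    _, inverse, counts = np.unique(idx, axis=0, return_inverse=True, return_counts=True)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)                              # accumulate per voxel
    return sums / counts[:, None]                                 # centroids

def statistical_outlier_removal(points, k=6, std_mul=1.0):
    """Keep only the points whose mean k-NN distance satisfies Formula (3)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))                           # brute-force distances
    np.fill_diagonal(dist, np.inf)                                # ignore self-distance
    mean_d = np.sort(dist, axis=1)[:, :k].mean(axis=1)            # mean k-NN distance
    mu, sigma = mean_d.mean(), mean_d.std()
    return points[mean_d <= mu + std_mul * sigma]
```

The RANSAC plane removal step is omitted here; a three-point plane hypothesis loop scored with the Formula (2) distance test would fill that role.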

Over-Segmentation Based on Supervoxels and MLS Surface Fitting
Unsupervised over-segmentation is one of the most widely used point cloud processing methods, which has been extensively used in computer vision. Similar to superpixels, the point cloud is divided into voxel regions with analogous properties by supervoxel segmentation. One of the most widely used supervoxel methods is voxel cloud connectivity segmentation (VCCS) [22]. However, mis-segmentation commonly occurs using VCCS for unclear boundaries. Research has been performed to refine supervoxels. Guarda et al. proposed a C2NO algorithm to generate constant size, compact, nonoverlapping supervoxel clusters [23]. In Xiao et al.'s work, a merge-swap optimization framework was introduced to generate regular, compact supervoxels with adaptive sizes using an energy function [24]. The points that belong to two separate objects are grouped into one cluster in the segmentation of stacked industrial workpieces, which is caused by noises and adhesion. An improved over-segmentation approach was proposed to address this issue based on supervoxels and MLS surface fitting. The noisy points and adhesion, which cannot be removed by preprocessing, can be effectively eliminated. Consequently, the proposed method realizes the goals of minimizing the mis-segmentation occurrence and enhancing the accuracy of workpiece instance segmentation. The process of this method is shown in Figure 5. The details of the proposed method will be elaborated on in this section.

For the given resolution of a voxel, the over-segmentation algorithm begins with the voxelization that is generated from the point cloud by octree.
The process of supervoxel over-segmentation is similar to polycrystalline nuclear crystallization of the supersaturated saline solution, where all the crystal nuclei grow simultaneously. Therefore, seed voxels need to be selected to initialize the supervoxels after the voxelization. The spatial relationship among those voxels was created by building an adjacency graph on 26-adjacent of the voxel. Assuming that each seed is evenly distributed in the three-dimensional space, the voxels most approximated to the centers of the given seed resolution are selected as the candidates of seed voxels.
Some candidates of seed voxels isolated from their neighbors need to be deleted. Seed voxels that do not have a sufficient number (over min_n) of voxels surrounding them in the search area should be removed. The filter criterion is as follows: where R_search represents the search radius of the seed voxels, R_seed represents the resolution of seed voxels, which decides the distance between adjacent supervoxels, and R_voxel represents the size of voxels generated by voxelization. R_seed should be much larger than R_voxel; otherwise, the seeds will not be selected correctly, which may cause mis-segmentation. sd represents the seed voxels; only the candidates that fit Formula (6) will remain as the initial supervoxels. In Equations (4)-(6), only the parameters R_seed and R_voxel need to be assigned, based on the size of the objects; the other parameters can be calculated from these two. Figure 6 shows the geometric representation of those parameters.
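The seed-selection step can be sketched as follows. The grid layout, the isolation test, and the value of min_n are our assumptions made for illustration; the paper's exact criterion is given by Formulas (4)-(6).

```python
import numpy as np

def select_seed_voxels(voxel_centers, r_seed, r_search, min_n=3):
    """Pick the voxel closest to each cell center of an r_seed grid,
    then drop isolated seeds with fewer than min_n voxels inside r_search."""
    cell = np.floor(voxel_centers / r_seed).astype(np.int64)
    seeds = []
    for c in np.unique(cell, axis=0):
        pts = voxel_centers[np.all(cell == c, axis=1)]
        center = (c + 0.5) * r_seed                     # center of this seed cell
        seeds.append(pts[np.argmin(((pts - center) ** 2).sum(1))])
    seeds = np.asarray(seeds)
    # keep only seeds with enough voxel neighbors inside the search radius
    d = np.linalg.norm(voxel_centers[None, :, :] - seeds[:, None, :], axis=2)
    return seeds[(d < r_search).sum(axis=1) >= min_n]
```

A seed surrounded by a dense voxel cluster survives the filter, while a lone voxel far from the rest is discarded as an isolated candidate.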

Voxel Feature Distance
In our implementation of supervoxel generation, the voxel feature distance is utilized to determine the similarity of seed voxels and their adjacent voxels. First, the spatial distance is normalized to limit the search scope of every clustering iteration. The algorithm will stop searching when it approaches the cluster center of the adjacent supervoxel by using a maximum range of √ 3R seed to normalize the spatial distance. Then, the normal difference is calculated, which characterizes the degree of surface bending. Thus, the boundary properties of 3D voxel data can be represented by spatial distance and normal.
where x, y, z are spatial coordinates, N is the normal of a voxel, and D_s and D_n are the spatial distance and normal difference, respectively. D is the fusion distance of the two features, where λ and µ are the parameters that allocate the influential proportion of the two feature distances.
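As a sketch, the fused distance might look as follows. The exact fusion form is not fixed by the text, so the weighted sum and the use of (1 − |n1·n2|) as the normal difference are our assumptions, with lam and mu standing in for λ and µ.

```python
import numpy as np

def voxel_feature_distance(c1, n1, c2, n2, r_seed, lam=0.4, mu=0.6):
    """Fuse a normalized spatial distance with a normal-difference term."""
    d_s = np.linalg.norm(c1 - c2) / (np.sqrt(3) * r_seed)  # spatial term, roughly [0, 1]
    d_n = 1.0 - abs(np.dot(n1, n2))                        # 0 for parallel unit normals
    return lam * d_s + mu * d_n
```

Two coincident voxels with parallel normals score 0, so during region growing a voxel joins the neighboring supervoxel with the smallest fused distance.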
By iteratively traversing the adjacent voxels of all the initial supervoxels, voxels are absorbed into their neighboring supervoxels according to the spatial distance and normal difference. The lower voxels are searched and processed layer by layer until all the adjacent voxels of the supervoxels are traversed. Then, after updating the cluster centers of the supervoxels, the supervoxels regrow until the cluster centers are stable or the algorithm reaches the maximum number of iterations.

MLS Surface Fitting
Due to the noise and adhesion of point clouds, surface fitting is adopted to refine the supervoxels. The least squares method has a widespread application in curve and surface fitting. It has been improved by many researchers, including total least squares (TLS), recursive least squares (RLS), weighted least squares (WLS), generalized least squares (GLS), partial least squares (PLS) and segmented least squares (SLS). However, given the large amount and the irregular, scattered distribution of point cloud data, the above-mentioned methods are not suitable on account of their global approximation strategies. The moving least squares (MLS) [25] method is utilized, which is a local approximation to represent the surface of supervoxel clusters. Compared with the traditional least squares method, every point in the fitting region is projected onto the locally weighted fitting surface in the MLS method. On a local subdomain of the fitting region, the fitting surface is obtained by minimizing the weighted functional

J = ∑_{I=1}^{n} w(x − x_I) [p^T(x_I) a(x) − f_I]² (10)

with the quadratic polynomial basis

p(x, y) = [1, x, y, x², xy, y²]^T (11)

where n represents the number of points in the local reference domain of a given radius at the target point, and w(x − x_I) is a weight function that guarantees an increasing contribution to the optimization function J with decreasing distance from the sampling point to the target point.
To consider MLS's sensitivity to outliers, radius outlier removal is adopted to eliminate the isolated points while avoiding excessive fitting deviation before performing MLS fitting. After adding the MLS fitting filter, the adhesion of separate objects' point clouds caused by the structured-light projection angle and stacking can be removed without affecting the shape characterization of the objects. The two examples of the denoising results are shown in Figure 7. There are significant changes that occurred in the boxes, where the adhesion of two separate objects' point clouds can be removed, while the point clouds still convey shapes.
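The weighted fit of Formulas (10) and (11) can be illustrated for a height-field patch. For brevity, this sketch fits z over the global xy-plane, whereas the full MLS method first builds a local reference plane per point; the Gaussian weight is also our assumption.

```python
import numpy as np

def mls_project(target, neighbors, h):
    """Project target's height onto a locally weighted quadratic surface
    z = a^T p(x, y) fitted over the neighbors (Gaussian weights, bandwidth h)."""
    x, y, z = neighbors[:, 0], neighbors[:, 1], neighbors[:, 2]
    P = np.stack([np.ones_like(x), x, y, x**2, x*y, y**2], axis=1)  # basis of Formula (11)
    w = np.exp(-((neighbors[:, :2] - target[:2]) ** 2).sum(1) / h**2)
    A = P.T @ (w[:, None] * P)          # weighted normal equations of Formula (10)
    b = P.T @ (w * z)
    a = np.linalg.lstsq(A, b, rcond=None)[0]
    px = np.array([1.0, target[0], target[1],
                   target[0]**2, target[0]*target[1], target[1]**2])
    return np.array([target[0], target[1], px @ a])     # projected point
```

For points sampled exactly from a quadratic surface, the projection reproduces the surface height, so noisy points off the surface are pulled back onto it.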

Region Merging Based on Multi-Feature with Convexity Judgment
The patches produced by over-segmentation should be merged into object clusters. The supervoxel patches contain precise geometric and other information about the objects. If we add constraints based on the geometric characteristics and structural relationships between patches, the instance segmentation can be accomplished without a training dataset.
The distance metric was proposed to decide whether the patches are clustered or not, which plays a significant role in the merging algorithm. The distance metric is a fusion of the following two features: a geometric feature distance δ G that represents the geometric distance between any two adjoining supervoxel patches; a spatial distance δ D that captures the Euclidean distance of any two adjacent patches' centroids.
As shown in Figure 8a, for any two patches p_s and p_t, their centroids are represented by x_s and x_t, their normal vectors by n_s and n_t, and the unit vector lying on the line connecting the two centroids by d. To represent the locational and geometrical relationship of the two patches, the feature distances δ_D and δ_G can be described as follows:

Owing to the geometric data and spatial distance data being intrinsically different types of data, the two features need to be transformed into a unified domain for normalization. Therefore, the proposed distance value between two supervoxel patches p_s and p_t can be described as follows: where T_G and T_D are two transformations defined to normalize the two feature distances into unified ranges between 0 and 1. Either of the two feature items in Formula (17) can be changed; consequently, an adaptive value λ is proposed as a weight to fit the specific needs of different applications. Presumptively, δ_G and δ_D have unknown distributions with unknown means µ_G and µ_D. We can then define the adaptive λ and the transformations T_G, T_D by the following equation:

The process of the region merging algorithm is shown in Figure 9. The locally convex connected patches (LCCP) [26,27] method is a segmentation method based on the concave and convex relations between two patches, which was also exploited in other works [28,29]. In our work, the geometrical feature distance δ_G is incorporated with a convexity criterion inspired by the LCCP method. The new δ_G is defined by the following equation:

As illustrated in Figure 8b,c, if the connection defined by the angles α_s, α_t between the normal vectors n_s, n_t of the two adjoining patches and the line connecting their centroids d is deemed convex, the geometric distance δ_G is halved. Two patches with a valid convex property are more likely to be merged into parts of the same object. Two patches are evaluated as convex if and only if they comply with both the convex-concave criterion and the sanity criterion. If α_s < α_t or ∠(n_s, n_t) < 1°, i.e., the two patches are almost parallel, they are regarded as convex by the convex-concave criterion. However, the convex property must be validated by the sanity criterion to be valid. Two surfaces are disconnected when there is only a singular connection between them; the sanity criterion checks the angle between the cross product of the normal vectors and the connecting direction d to filter out such singular connections.
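The convex-concave criterion and the halving of δ_G can be sketched as follows. The orientation convention for d (pointing from the target patch to the source patch) is our assumption, chosen so that a roof-like pair of patches tests as convex; the sanity criterion is omitted for brevity.

```python
import numpy as np

def convex_connection(x_s, n_s, x_t, n_t, parallel_tol_deg=1.0):
    """Convex-concave criterion: convex when alpha_s < alpha_t,
    or when the two normals are almost parallel."""
    d = (x_s - x_t) / np.linalg.norm(x_s - x_t)     # unit vector between centroids
    alpha_s = np.degrees(np.arccos(np.clip(np.dot(n_s, d), -1.0, 1.0)))
    alpha_t = np.degrees(np.arccos(np.clip(np.dot(n_t, d), -1.0, 1.0)))
    parallel = np.degrees(np.arccos(np.clip(np.dot(n_s, n_t), -1.0, 1.0))) < parallel_tol_deg
    return alpha_s < alpha_t or parallel

def merged_geometric_distance(delta_g, is_convex):
    """Halve delta_G for valid convex connections, as in the modified metric."""
    return 0.5 * delta_g if is_convex else delta_g
```

With this convention, two roof-like patches whose normals lean away from each other are judged convex, while a valley-like pair is judged concave and keeps its full geometric distance.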

Evaluation
To evaluate the segmentation results, some experimental performance indicators are described as follows.

The ground-truth partition G = {G_1, G_2, G_3, . . . , G_n} is defined as a set of artificially labeled point sets G_i, and the segmentation result S = {S_1, S_2, S_3, . . . , S_m} is defined as a set of regions generated by the algorithm in the same point cloud. Additionally, N_G = n represents the number of ground-truth regions. The precision is defined to evaluate the segmentation result of our algorithm compared with the ground truth, as described by Formula (22):

Precision_i = TP_i / (TP_i + FP_i) (22)

where TP_i represents the number of points of an object in region i that are accurately segmented as the object in the segmentation result. TP_i is calculated by figuring the overlapping point cloud between G_i and S_i. FP_i represents the number of points actually belonging to the object but not segmented as the object in region i. Due to noise and other reasons, the artificially annotated objects may not be quite correct, so a workpiece with a precision larger than 95% is regarded as successfully segmented. The number of successfully segmented workpieces is defined as N_T. The recognition rate is defined by Formula (23):

Recognition rate = N_T / N_G (23)

The results of instance segmentation significantly and directly affect positioning accuracy, which is related to registration. Consequently, to test the validity and efficiency of the proposed algorithm in industrial robot applications, registration experiments were performed. Many point cloud registration algorithms have been proposed, including singular value decomposition (SVD) [30], random sample consensus (RANSAC) [31], normal distributions transform (NDT) [32], sample consensus initial alignment (SAC-IA) [33], iterative closest point (ICP) [34] and its improved algorithms [35,36]. The SAC-IA coarse registration and ICP fine registration algorithms were adopted for their precision and high efficiency.
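The precision and recognition-rate computations of Formulas (22) and (23) amount to a set overlap and a thresholded count. A minimal sketch (note that, following the paper's definition, FP_i counts the object points missed in region i):

```python
def segment_precision(gt_ids, seg_ids):
    """Precision of one segmented region against its ground-truth region:
    TP = points shared by both sets, FP = ground-truth points missed (Formula (22))."""
    gt, seg = set(gt_ids), set(seg_ids)
    tp = len(gt & seg)
    fp = len(gt - seg)
    return tp / (tp + fp)

def recognition_rate(precisions, threshold=0.95):
    """Share of ground-truth regions segmented with precision above the
    threshold (Formula (23))."""
    n_t = sum(p > threshold for p in precisions)
    return n_t / len(precisions)
```

Here point identities stand in for actual 3D points; in practice TP_i would be computed from the overlap of the two point clouds.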
Firstly, the SAC-IA algorithm was utilized to perform coarse registration, using the fast point feature histogram (FPFH) [33] as the point cloud feature description. The transformation matrix obtained by SAC-IA was used as the initial matrix in the ICP algorithm. Then, the target point cloud was aligned to the template point cloud by iteratively minimizing the distance to attain the fine matrix. The fitness score, i.e., the mean square error (MSE) between the target workpiece and the template workpiece, was calculated using Formula (24):

MSE = (1/m) ∑_{i=1}^{m} ||p_i − q_i||² (24)

where P = {p_i | i = 1, 2, 3 . . .} and Q = {q_i | i = 1, 2, 3 . . .} represent the points in the target point cloud after transformation and the points in the template point cloud, respectively, and m is the number of point pairs. Objects with a fitness score below 1.2 mm² are defined as high-matching objects. Therefore, the high registration rate means the proportion of high-matching workpieces.
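The fitness score of Formula (24) is a mean squared point-pair distance. In the sketch below, correspondences are taken as nearest neighbors of the aligned cloud in the template, which is our assumption on top of the text.

```python
import numpy as np

def fitness_score(aligned, template):
    """Mean squared distance between each aligned target point and its
    nearest template point (Formula (24), nearest-neighbor pairing assumed)."""
    diff = aligned[:, None, :] - template[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(-1)).min(axis=1)   # closest template point
    return float((nearest ** 2).mean())
```

A perfectly registered cloud scores 0; in the experiments, a score below 1.2 mm² marks a high-matching workpiece.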

Experimental Results and Discussion
The experimental setup is shown in Figure 10. The experiments were conducted in the following two parts: instance segmentation compared with the ground truth, and segmentation performance tests combined with SAC-ICP registration in an industrial application. The experimental platform was an Intel Core i7-8750 with 8 GB memory, the Windows 10 64-bit operating system, a VS2015 VC++ Win64 console application, and the open-source Point Cloud Library PCL 1.9.1. Three kinds of workpieces were used in the experiments to test the feasibility in different scenes, as shown in Figure 11.

Instance Segmentation Experiments
Ten experiments were conducted to demonstrate the data simplification ability of our over-segmentation method, whereas VCCS at the same resolution cannot simplify point cloud data. Table 1 illustrates that the simplification ratio in all ten experiments exceeds 65%. Our method can simplify the data because the MLS-based over-segmentation removes useless and noisy points. Ten experiments were then conducted on three groups of different workpieces to study the performance of the proposed method compared with other methods. The parameters we adopted are listed in Table 2. Different thresholds were set for the different kinds of workpieces to obtain the best accuracy; however, the same voxel size, including the search radius, was used across the different methods. The segmentation results of the different methods are shown in Figure 12. The proposed method can segment stacked workpieces accurately, while under-segmentation occurs with the other methods. The segmentation accuracy and recognition rate results are listed in Table 3. Exchange tests were performed to analyze the contributions of the two stages of our bottom-up method: VCCS combined with our merging method, and our over-segmentation method combined with LCCP. The results demonstrate that the average precision of the proposed method reached 0.988, 0.984, and 0.988 for Tee pipe 1, Tee pipe 2, and the Two-way elbow, respectively, with recognition rates of 0.936, 0.975, and 0.958. These results illustrate that the proposed method is more accurate than the other methods, and that the proposed over-segmentation stage plays a significant role in enhancing segmentation accuracy. To check the position accuracy after segmentation, the mean errors between the centroids of the segmented workpieces and the manually annotated point clouds along the XYZ axes were analyzed. Comparisons of the mean errors of the different methods on the same point cloud data are shown in Figure 13.
In each group of Figure 13a-c, the mean errors of all the workpieces along the XYZ axes are calculated for every experiment. The mean errors of the workpieces' centroids are volatile in the other methods (blue and black lines); the mean error of the Euclidean method even exceeds 19 mm along the X-axis for Tee pipe 2. With the proposed method, the mean errors of the segmented workpieces' centroids are mostly below 2 mm, which is lower and more stable than the other methods. Consequently, the proposed method maintains the shape characteristics and location information of the workpieces while also simplifying the point cloud data.
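The centroid-error analysis above can be sketched in a few lines; the code below (again with a hypothetical `Point3` stand-in for a PCL point type) computes a cloud's centroid and its per-axis absolute error against the annotated centroid, as plotted in Figure 13.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };  // stand-in for a PCL point type

// Centroid of a point cloud: the per-axis mean of all points.
Point3 centroid(const std::vector<Point3>& cloud) {
    assert(!cloud.empty());
    Point3 c{0.0, 0.0, 0.0};
    for (const Point3& p : cloud) { c.x += p.x; c.y += p.y; c.z += p.z; }
    const double n = static_cast<double>(cloud.size());
    return {c.x / n, c.y / n, c.z / n};
}

// Per-axis absolute error between the segmented workpiece's centroid and the
// centroid of the manually annotated point cloud.
Point3 axisError(const Point3& segmented, const Point3& annotated) {
    return {std::fabs(segmented.x - annotated.x),
            std::fabs(segmented.y - annotated.y),
            std::fabs(segmented.z - annotated.z)};
}
```
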

SAC-ICP Registration Experiments
The registration results are shown in Figure 14, and the results compared with the other methods under the same registration parameters are listed in Table 4. The fitness scores of the different workpieces vary because of the object shapes and the registration algorithm. Compared with the other methods, the target and template point clouds were matched more accurately by the proposed method owing to its accurate instance segmentation. Thanks to its effective data simplification, the proposed method also reduces the registration time while maintaining a low error and a high registration rate.

Conclusions
In this paper, an improved instance segmentation method based on supervoxels for the localization of industrial robots has been proposed, which processes point cloud data more accurately, robustly, and efficiently. An over-segmentation algorithm with MLS surface fitting was presented, which generates supervoxel patches while eliminating noisy points and point cloud adhesion through refinement. Additionally, an adaptive region merging algorithm based on multiple features and convex-concave judgment was applied to accomplish instance segmentation. The experimental results demonstrate the feasibility and stability of the proposed method for application in industrial robots. Compared with traditional methods, the proposed method achieves instance segmentation of workpieces with a higher precision and recognition rate under the complex condition of multiple similar stacked objects. Furthermore, the registration time is reduced owing to the data simplification of the proposed method. In future work, an energy function will be considered on top of the proposed method to avoid boundary overlap, and the supervoxel-based over-segmentation clustering will be further developed for application in semantic segmentation.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest:
The authors declare no conflict of interest.