FEC: Fast Euclidean Clustering for Point Cloud Segmentation

Segmentation of point cloud data is essential in many applications such as remote sensing, mobile robots, and autonomous cars. However, the point clouds captured by 3D range sensors are commonly sparse and unstructured, which makes efficient segmentation challenging. In this paper, we present a fast solution to point cloud instance segmentation with small computational demands. To this end, we propose a novel fast Euclidean clustering (FEC) algorithm which applies a point-wise scheme instead of the cluster-wise scheme used in existing works. Our approach is conceptually simple, easy to implement (40 lines in C++), and runs two orders of magnitude faster than classical segmentation methods while producing high-quality results.


Introduction
A point cloud is a data structure containing a large number of 3D points, obtained by using LiDAR or 2D images. Segmentation of point clouds has wide applications spanning from 3D perception and remote sensing 3D data processing to 3D reconstruction in virtual reality. For example, a robot must identify obstacles in a scene so that it can interact with and move around the scene [1]. Achieving this goal requires distinguishing different semantic labels as well as various instances with the same semantic label. Thus, it is essential to investigate the problem of segmentation.
Note that the term "segmentation of point cloud" is commonly used in robotics and remote sensing, while the same task is referred to as instance segmentation in computer vision.

Related Works
There is a substantial amount of work that targets acquiring a global point cloud and segmenting it off-line, which can be classified into four main categories:

Edge-based Method
The most important procedure for edge-based approaches is finding the boundaries of each object section. [5] extends edge detection methods from two dimensions (2D) to three dimensions (3D) in a geometrical way. This notion hypothesizes that the 3D scene can be divided by planes, and that finding the edges is an intrinsically optimal way to obtain units. The segmentation problem then simplifies to edge detection based on gradients, thresholds, morphological transforms, filtering, and template matching [6][7][8][9][10]. [11] introduce a parameter-free edge detection method with the assistance of kernel regression. [4] summarise that this kind of method can be divided into two stages, namely border outlining and inner boundary grouping. However, the main drawback of edge-based methods is that they are ineffective on noisy data [10].

Region Growing Based Method
Region growing (RG)-based methods focus more on extracting geometrically homogeneous surfaces than on detecting boundaries. The assumption is that different parts of objects or scenes are visually distinguishable or separable. Generally speaking, these methods form point groups by checking each point (or derivatives, like voxels/supervoxels [12,13]) around one or more seeds against specific criteria. Researchers advocating RG-based methods have developed several criteria considering orientation [4,14], surface normals [4,15,16], curvatures [17], etc. [18] explore the possibility of applying the RG algorithm to multiple conclusion cases (i.e., leaf phenotyping) and show it to be feasible. Typically, the conventional region growing algorithm employs the RANSAC algorithm to generate seed points. Considering internal features of point clouds, [19] randomly select seeds and segment multiple classes at once. [20] propose an RG-based method, which filters seed candidates by a roughness measurement, for robust context-free segmentation of unordered point clouds based on geometrical continuities. However, the selection strategy for seed points is crucial because it affects both the computing speed and over- or under-segmentation [12]. In addition, these approaches are highly sensitive to the imprecise calculation of normals and curvatures close to region borders [21].

Clustering Based Method
Clustering algorithms segment or simplify point cloud elements into categories based on their similarities or Euclidean/non-Euclidean distances. Accordingly, k-means [22], mean shift [23], DBSCAN [24], and Euclidean cluster (EC) extraction [3] have been employed for this task. K-means clustering aims at grouping point data into k divisions constrained by average distances [25]. Since point cloud data is often acquired unevenly, k-means is an ideal tool to remove redundant dense points [26]. [27] introduces a k-plane approach to categorize laser footprints that cannot be accurately classified using the standard k-means algorithm. DBSCAN assumes that the density distribution is the critical hint for deciding sample categories as dense regions, which makes it noise-resistant [24]. Even so, the DBSCAN method suffers from high memory requirements for storing distance data between points. [28] extend normal clustering methods to a hierarchical one, which is shown to be suitable for microrelief terrains. In order to compress point clouds without distortion or loss of information, [29] demonstrate a 3D range-image oriented clustering scheme. Although the clustering-based methods are simple, the high number of iterations over each point in the point cloud leads to a heavy computational burden and reduces efficiency.

Learning Based Method
While deep learning-based methods often provide interesting results, an understanding of the type of coding solutions is essential to improve their design so that they can be used effectively. To alleviate the cost of collecting and annotating large-scale point cloud data, [30] propose an unsupervised learning approach to learn features from an unlabeled point cloud dataset by using part contrasting and object clustering with deep graph convolutional neural networks (GCNNs). [31] incorporate a clustering method with the proposed FPCC-Net, designed specifically for industrial bin-picking. PointNet [32] is the first machine learning neural network to deal with raw 3D points, and is sensitive to each point's immediate structure and geometry. Other current methods use deep learning directly on point clouds [33][34][35][36] or on projections into a camera image [37] to segment instances in point clouds. Learning-based methods perform well in indoor scenes but commonly suffer from long runtimes and struggle to process large-scale point clouds.

Motivations
Although the classical segmentation approaches mentioned above achieve promising segmentation results, one main drawback is their heavy computational cost, which restricts their application to real-world, large-scale point cloud processing. We summarize the two main existing strategies for accelerating point cloud segmentation:

GPU vs CPU.
Conventional segmentation methods depend on the CPU's processing speed to run computations in a sequential fashion. [38] provide a GPU-based version of EC [3] in the PCL [39] library, which was further extended by [40] to achieve a 10× speedup over the CPU-based EC.
Although a GPU enables faster segmentation, it is not practical for hardware devices with limited memory capacity and computational resources, such as mobile phones and small robotic systems (e.g., UAVs), not to mention the steep price.

Pre-knowledge vs unorganized.
Taking advantage of pre-knowledge about the point cloud, such as layer-based organized indexing in LiDAR data [41] or the relative pose between the LiDAR sensor and the point cloud in each 3D scan (frame) [42], accelerates segmentation. These assumptions provided by the structured point cloud hold in specific scenarios such as autonomous vehicle driving.
However, these premises are not available in many applications, since not all point cloud data are generated by a vehicle-mounted LiDAR. For example, airborne laser scanning, RGB-D sensors, and image-based 3D reconstruction supply general unorganized data instead, which makes the pre-knowledge approaches [41,42] fail.
From the discussion above, we conclude that an efficient and low-cost solution to general point cloud segmentation is vital for real-world applications but absent from the research literature.

Contributions
As shown in Table 1, we attack the general point cloud segmentation problem and place an emphasis on computational speed compared to the works that are considered state-of-the-art. Our approach completes the segmentation in two parts: (i) ground points removal and (ii) the clustering of the remaining points into meaningful sets. The proposed solution was tested extensively on both synthetic and real data. The results show that our method is superior to existing approaches. A fast segmentation frees precious hardware resources for more computationally demanding processes in many application pipelines. The following is a condensed summary of the contributions that this work makes:
• We present a new Euclidean clustering algorithm for the point cloud instance segmentation problem, which uses a point-wise scheme instead of the cluster-wise scheme applied in existing works.

Materials and Methods
Our method consists of two steps: (i) ground points removal and (ii) the clustering of the remaining points.

Ground Surface Removal
Points on the ground constitute the majority of the input data and reduce the computation speed. Besides, the ground surface affects segmentation quality since it changes the input connectivity. Therefore, it is essential to remove the ground surface as a pre-processing step. Many ground point extraction methods, such as grid-based [44] and plane fitting [43] methods, have been used in existing works. In this work, we use the cloth simulation filter (CSF) [45], which is robust on complicated terrain surfaces, to extract and eliminate ground points.
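As a rough illustration of ground removal as a pre-processing step, the sketch below uses a simple grid-based heuristic rather than CSF [45]: it drops every point that lies within a small height tolerance of the lowest point in its horizontal grid cell. The function name, the `cell` size, and the `height_tol` value are assumptions made for this example and do not come from the paper.

```python
import math
from collections import defaultdict


def remove_ground_grid(points, cell=1.0, height_tol=0.2):
    """Keep only points that rise more than height_tol above the lowest
    point of their (x, y) grid cell. This is a crude stand-in, not CSF."""
    lowest = defaultdict(lambda: math.inf)

    # First pass: record the minimum height observed in each grid cell.
    for x, y, z in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        lowest[key] = min(lowest[key], z)

    # Second pass: keep points that are clearly above the local minimum.
    kept = []
    for x, y, z in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        if z - lowest[key] > height_tol:
            kept.append((x, y, z))
    return kept
```

A heuristic like this breaks down on steep or complicated terrain, which is the situation the text cites as the reason for choosing CSF.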

Fast Euclidean Clustering
Similar to EC [3], we employ the Euclidean (L2) distance metric to measure the proximity of unorganized points and aggregate neighboring points into the same cluster, which can be described as $\min \lVert p_i - p_j \rVert_2 \geq d_{th}$ for $p_i \in C_i$, $p_j \in C_j$, $i \neq j$, where $C_i$ and $C_j$ are distinct clusters and $d_{th}$ is a maximum distance threshold. Algorithm 1 describes the algorithmic process, which is illustrated with an example in Figure 2. Note that the proposed algorithm uses a point-wise scheme, which loops over points in the input numbering order, instead of the cluster-wise scheme used in EC and RG. The implementation of the proposed FEC is simple, requiring only 40 lines of C++ code.
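Algorithm 1 is not reproduced in this text, so the following is only one possible reading of a point-wise scheme, not the authors' implementation. It visits every point once in input index order, performs a single radius query for that point, and merges the labels of all neighbors found. The union-find bookkeeping, the brute-force neighbor search (standing in for a kd-tree radius search), and the function name are assumptions made for this sketch.

```python
def pointwise_euclidean_clustering(points, d_th):
    """Label points so that points closer than d_th end up in one cluster."""
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    d2 = d_th * d_th
    for i in range(n):  # point-wise loop in input index order
        xi, yi, zi = points[i]
        # Brute-force radius search; a kd-tree query would replace this.
        neighbors = [
            j for j, (x, y, z) in enumerate(points)
            if (x - xi) ** 2 + (y - yi) ** 2 + (z - zi) ** 2 < d2
        ]
        for j in neighbors:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Merge the two label sets, keeping the smaller index as root.
                parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

Calling `pointwise_euclidean_clustering(cloud, d_th=0.6)` on a list of `(x, y, z)` tuples returns one integer label per point. The key property this sketch tries to convey is that the number of neighbor queries equals the number of points, regardless of how many clusters exist.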
Time complexity. The complexity of constructing the kd-tree in Big-O notation is $O(3N \log N)$, where $N$ is the total number of input points.

Input: • $p_i \in P$: unorganized point cloud

Experiments and Results

Method Comparison
In our experiments, the proposed method FEC was compared to five state-of-the-art point cloud segmentation solutions:
• EC: Classical Euclidean clustering algorithm [3] implemented in the PCL library [39].
• RG: Classical region growing based point cloud segmentation solution [4] implemented in the PCL library [39].
• SGPN: Recent learning-based method [33] which is designed for small-scale indoor scenes.
• VoxelNet: Recent learning-based method [34] which learns sparse point-wise features for voxels and uses 3D convolutions for feature propagation.
• LiDARSeg: State-of-the-art instance segmentation [36] for large-scale outdoor LiDAR point clouds.
Note that the values of $d_{th}$ and $Th_{max}$ were set to be the same for EC and RG. For a fair comparison, we remove the ground points from the point cloud (Section 2.1) and then use the remaining points as the input for EC, RG, and FEC.

Metrics Evaluation
We provide three metrics (average precision, time complexity, space complexity) to demonstrate that the proposed FEC outperforms the baselines in efficiency without a penalty in effectiveness.
• Average precision: The average precision (AP) is a widely accepted point-based metric to evaluate segmentation quality [36], and is similar to the criterion used in the COCO instance segmentation challenges [46]. The equation for AP can be written as $\mathrm{AP} = \frac{TP}{TP + FP}$, where TP is the number of true positives and FP is the number of false positives. We use 0.75 as the point-wise threshold for true positives in the following experiments.
• Complexity: We measure both running time and memory consumption in the real-data experiments to evaluate the time complexity and the space complexity. Our method was executed on a computer with 32GB of RAM and an Intel Core i9 CPU, and compared to two other classical geometry-based methods: EC and RG.
An NVIDIA 3090 GPU was used to evaluate the learning-based methods SGPN and VoxelNet.
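To make the AP definition concrete, here is a minimal sketch that counts a predicted instance as a true positive when its point-wise overlap with some ground-truth instance reaches the threshold. Interpreting the 0.75 point-wise threshold as an intersection-over-union (IoU) over point indices is an assumption of this example, as are the function names; the exact matching rule of [36] may differ.

```python
from collections import defaultdict


def group_by_label(labels):
    """Map each instance label to the set of point indices carrying it."""
    groups = defaultdict(set)
    for idx, lab in enumerate(labels):
        groups[lab].add(idx)
    return groups


def average_precision(pred_labels, gt_labels, threshold=0.75):
    """AP = TP / (TP + FP), with TP decided by point-wise IoU >= threshold."""
    pred = group_by_label(pred_labels)
    gt = group_by_label(gt_labels)
    tp = 0
    for p in pred.values():
        # Best IoU between this predicted instance and any ground-truth one.
        best = max((len(p & g) / len(p | g) for g in gt.values()), default=0.0)
        if best >= threshold:
            tp += 1
    fp = len(pred) - tp  # every unmatched prediction is a false positive
    return tp / (tp + fp) if pred else 0.0
```

Because predicted instances are disjoint and the threshold exceeds 0.5, a ground-truth instance can satisfy the test for at most one prediction, so no explicit one-to-one matching step is needed in this sketch.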

Synthetic Data
In this experiment, we evaluate the performance of EC, RG, and FEC on synthetic point cloud data of increasing scale. Note that we generate unorganized points, so the state-of-the-art learning-based methods SGPN, VoxelNet, and LiDARSeg, which take a single LiDAR scan as input, are not applicable in this setting.

Setting
We first divide 3D space into multiple voxels, and then generate clusters (segments) by filling m 3D points evenly inside n randomly selected voxels. Under such a setting, we can control the number of clusters by varying the value of n. Similarly, the cluster density can be determined by varying the value of m.
Besides, we can also simulate clusters with different uniformities by changing the strategy for filling 3D points into a voxel from even to random (even filling followed by a random shift with variance σ). Note that we randomize the 3D point indices to make the synthetic data an unorganized point cloud.
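The generation procedure can be sketched as follows. This is an illustrative reading of the setting, not the authors' generator: the grid size, the `fill` fraction that keeps clusters in adjacent voxels apart, the lattice-based "even" filling, and the function name are assumptions, and `sigma` is treated here as the standard deviation of a Gaussian shift.

```python
import random


def make_synthetic_cloud(n, m, grid=10, voxel=1.0, fill=0.5, sigma=0.0, seed=0):
    """Return (points, labels): n clusters of m points in random voxels."""
    rng = random.Random(seed)
    cells = [(i, j, k) for i in range(grid) for j in range(grid) for k in range(grid)]
    chosen = rng.sample(cells, n)  # n distinct voxels, one cluster each

    # Smallest cubic lattice with at least m sites; the first m are used.
    side = 1
    while side ** 3 < m:
        side += 1
    lattice = [(a, b, c) for a in range(side) for b in range(side) for c in range(side)][:m]

    margin = (1.0 - fill) / 2.0  # empty border inside each voxel
    points, labels = [], []
    for label, cell in enumerate(chosen):
        for offs in lattice:
            p = tuple(
                voxel * (cell[d] + margin + fill * (offs[d] + 0.5) / side)
                + (rng.gauss(0.0, sigma) if sigma > 0 else 0.0)
                for d in range(3)
            )
            points.append(p)
            labels.append(label)

    # Shuffle the point order so the cloud is unorganized.
    order = list(range(len(points)))
    rng.shuffle(order)
    return [points[i] for i in order], [labels[i] for i in order]
```

Varying `n` changes the number of clusters, varying `m` changes the density, and a positive `sigma` makes each cluster less uniform, mirroring the three experimental axes described above.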

Varying Density of Cluster
In this experiment, we increase the cluster density of the synthetic data from 10 to 500 while varying the total number of clusters from 100 to 2200. As illustrated in Figure 4, the running times of both RG and EC grow significantly with increasing cluster density for all cluster numbers. An interesting observation is that the growth curve of RG is linear while that of EC is exponential, so that EC overtakes RG in running time at cluster densities larger than 500. In contrast, the proposed method FEC provides stable running times and is at least 100× faster than EC and RG as the density (number of points per unit volume) of each cluster increases.

Fig. 4: Running time for EC (yellow), RG (red) and FEC (green) with increasing cluster density for different numbers of clusters.

Varying Number of Clusters
In this experiment, we fix the density of the clusters to 200 and increase the number of clusters from 100 to 1900. Figure 5(a) demonstrates that the running times of EC and RG grow dramatically with an increasing number of clusters, from the millisecond level to the hundred-second level. In contrast, FEC runs significantly faster, taking less than 1 s under all configurations.

Fig. 6: Running time (in seconds) for EC (yellow), RG (red) and FEC (green) with increasing dual-mixing of cluster number and density.

Varying Uniformity of Cluster
We alter the uniformity of each cluster in this experiment by simulating an increasing number of normally distributed sub-clusters. The results in Figure 5(b) reveal that decreasing cluster uniformity clearly slows down EC and RG, with a hundredfold increase in running time. In contrast, cluster uniformity only slightly affects FEC, with no significant growth in running time.

Segmentation Quality
In the synthetic data experiment, we observe that all three geometry-based methods EC, RG, and FEC provide a segmentation precision of approximately 1 without significant differences.

Real Data
In this experiment, we evaluate the performance of all methods on a publicly available dataset, namely the KITTI odometry task [2] with point-wise instance labels from semanticKITTI [47]. Following the instructions of [33,34,36], we trained SGPN, VoxelNet, and LiDARSeg using semanticKITTI. We then compare the proposed method FEC against the classical geometry-based approaches EC and RG and the learning-based solutions SGPN, VoxelNet, and LiDARSeg on 12 sequences of semanticKITTI.

FEC vs Geometry-based Methods on KITTI
We tested EC, RG, and the proposed FEC on real point cloud datasets from the KITTI odometry point cloud sequences [2] with two segmentation styles that are common in practice, namely:
• Inter-class segmentation: the input for inter-class segmentation is a point cloud representing a single class, such as car, building, or tree. Following the completion of the classification step, instance segmentation is carried out separately for each of the classes.
• Intra-class segmentation: intra-class segmentation takes multiple-class point clouds as input. In this mode, the original LiDAR point cloud is used as input without classification.
Efficiency. Table 2 shows that FEC achieves an average 50× speedup over EC and 100× over RG under the intra-class segmentation mode on 12 sequences from #03 to #11. In the inter-class mode, FEC achieves average speedups of 30× and 40× for car and building segmentation, and of 10× and 20× for tree segmentation, against EC and RG. Besides, an interesting observation is that the running time of the three methods on #03 and #11 is nearly 2-3 times longer than on the remaining sequences under the intra-class mode. This is because the total number of instances in #03 and #11 is larger than in the others, especially tiny objects such as bicycles, trunks, fences, poles, and traffic signs. Since the geometry-based approaches invoke more cluster processes in the loop when there are many small instances, RG, EC, and FEC take much more running time on sequences #03 and #11 than on the others.
Memory consumption. Table 3 shows that the proposed FEC consumes only one third and one half of the memory used by EC and RG, respectively.
Effectiveness. As shown quantitatively in Table 3, all three geometry-based methods provide similar instance segmentation quality with AP larger than 60%. Specifically, EC provides the best segmentation accuracy at 65.8% while the proposed FEC achieves a slightly lower score at 65.5%. The qualitative segmentation results are shown in Figure 7 with sequences #00 and #11 as two examples. They support our statement that our method and the baseline methods provide similar segmentation quality globally. Interestingly, we found that the proposed FEC provides better segmentation quality when handling detailed segments. As shown in Fig. 8, for the points on trees, which are sparser and less structured than those of the other classes, EC and RG often suffer from over-segmentation and under-segmentation problems. For example, in the second row of Fig. 8, the three independent trees are clustered into the building by EC and RG while FEC successfully detects them. In summary, Table 3 demonstrates that our method achieves a significant improvement in efficiency without a penalty in performance (quality).
Fig. 9: Comparison between the state-of-the-art learning-based solution LiDARSeg [36] and FEC on the KITTI odometry dataset [47] with semanticKITTI labeling [47]. Note that LiDARSeg [36] only applies to a single LiDAR scan as input instead of a long sequence.

FEC vs Learning-based Methods on KITTI
In this experiment, we compare the proposed method FEC to state-of-the-art learning-based solutions on sequences #00-#11 of the KITTI odometry dataset [47]. We trained the learning approaches SGPN [33], VoxelNet [34], and LiDARSeg [36] with semanticKITTI labeling [47], and evaluated the running time and average precision (AP) defined in [36]. Note that all the learning-based methods were tested with a GPU while the geometry-based solutions EC [3], RG [4], and FEC run on the CPU only.
Data input. Note that in Section 3.4.1 we use the point cloud of the whole sequence as input to the three geometry-based methods. However, such large-scale input is not feasible for the state-of-the-art deep learning based approaches for instance segmentation or 3D detection, namely SGPN, VoxelNet, and LiDARSeg. Thus, in this experiment, we instead split the KITTI odometry dataset into single scans as input. Besides, we also report the performances of EC and RG.
Training and hardware. In particular, we use the ground-truth point-wise instance labels from the semanticKITTI panoptic segmentation dataset [47] to train SGPN, VoxelNet, and LiDARSeg with an NVIDIA 3090 GPU, while forcing the CPU-only mode for EC, RG, and the proposed FEC.
Efficiency. Please note that since LiDARSeg [36] is designed for segmenting a single LiDAR scan, the running time we report below is the average processing time of each scene instead of the whole sequence. Table 4 shows that FEC achieves a 5× speedup over EC, VoxelNet, and LiDARSeg, 10× over RG, and 20× over SGPN. Since all the geometry-based methods require ground surface removal as a pre-process, for a fair comparison the time consumed by ground surface detection and removal is included in the total running time. It is important to note that FEC relies on CPU computation only while all the learning-based approaches are accelerated with a GPU in inference, not to mention the large amount of time consumed in the training stage.
Effectiveness. As shown quantitatively in Table 4, all three geometry-based methods EC, RG, and FEC achieve similar instance segmentation quality with AP around 62%. Note that the segmentation accuracy of the geometry-based approaches is even slightly better than that of the learning-based ones, with AP at 59% for LiDARSeg, 55.7% for VoxelNet, and 46.3% for SGPN. The qualitative comparisons are shown in Figure 9 with 2 scans in sequence #00 as examples. We can observe that the proposed FEC method provides similar segmentation quality to the state-of-the-art learning-based method LiDARSeg.

Discussion
Is FEC faster?
Both the synthetic (Section 3.3) and real data (Section 3.4) experiments demonstrate that the proposed FEC outperforms the state-of-the-art geometry-based and learning-based methods by a significant margin in running time. Specifically, our method is an order of magnitude faster than RG [4] and nearly 5 times faster than EC [3]. Besides, without GPU acceleration, FEC is nearly 5 times faster than LiDARSeg [36], 6 times faster than VoxelNet [34], and nearly 20 times faster than SGPN [33].

Does FEC lose accuracy?
One may be concerned that FEC loses segmentation accuracy in order to accelerate the processing speed. Both the synthetic (Section 3.3) and real data (Section 3.4) experiments verify that FEC provides similar segmentation quality to the conventional geometry-based methods RG [4] and EC [3], and even slightly better quality than the learning-based solutions LiDARSeg [36], VoxelNet [34], and SGPN [33]. Thus, the proposed solution FEC achieves a significant improvement in efficiency without a penalty in performance (quality).

Why is FEC faster?
Based on the intuitive analysis in section.??, we attribute the significant improvement in efficiency that FEC brings mainly to the point-wise scheme, which replaces the cluster-wise scheme used in the existing works RG [4] and EC [3]. Such a novel point-wise scheme leads to significantly fewer calls to the kd-tree search in the loop, which is the key to reducing the running time.

Where can we use FEC?
The proposed solution FEC is a GPU-free, fast point cloud instance segmentation solution. It can handle general point cloud data as input without relying on the scene scale (e.g., a single LiDAR scan) or on structural pre-knowledge (e.g., scan line id). Thus, we can apply FEC to large-scale point cloud instance segmentation in various 3D perception (computer vision, remote sensing) and 3D reconstruction (autonomous driving, virtual reality) tasks.

Conclusions
This paper introduces an efficient solution to the general point cloud segmentation task based on a novel algorithm named fast Euclidean clustering. Our experiments have shown that our method provides similar segmentation results to the existing approaches but at 100× higher speed. We attribute this improved efficiency to the use of the point-wise scheme instead of the cluster-wise scheme in existing works.
Future work. The current implementation of FEC is based on a serial computation strategy. Since the point-wise scheme contains multiple associations between the outer and inner loops, a parallel computation strategy could be applied to FEC for potential acceleration.

Fig. 2: An example of FEC applied to point cloud segmentation. Note that FEC uses a point-wise scheme that follows the point index order.

Fig. 5: Running time for EC (yellow), RG (red) and FEC (green) with increasing number of clusters (a), and non-uniformity (b) of each cluster.

Table 1: Summary of the related works.

• $d_{th}$: the neighbor radius threshold • $Th_{max}$: the max number of neighbor points

The cost of the main loop is $O(N^2 \nu)$, where $\nu$ is a constant determined by the 3D point density $\rho$ and the neighbor radius threshold $d_{th}$ as $\nu = \frac{4}{3} \pi d_{th}^3 \rho$. Since $N^2 \nu > 3N \log N$, the overall cost is $O(N^2 \nu)$.
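The constant ν can be read as the expected number of points inside a ball of radius d th when points are spread with uniform density ρ, since the volume of that ball is 4/3 π d th 3. The short sketch below evaluates it for illustrative values; the function name and the sample numbers (a radius of 0.5 and a density of 100 points per unit volume) are assumptions for this example and are not taken from the experiments.

```python
import math


def expected_neighbors(d_th, rho):
    """nu = (4/3) * pi * d_th^3 * rho: ball volume times point density."""
    return 4.0 / 3.0 * math.pi * d_th ** 3 * rho


# Illustrative values only: radius 0.5, density 100 points per unit volume.
nu = expected_neighbors(0.5, 100.0)
print(f"nu is about {nu:.2f} neighbors per query")
```

Because ν scales with the cube of the radius, doubling d th multiplies the expected neighbor count, and hence the per-query work in the main loop, by eight.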

Table 2: Experimental results on point clouds from the KITTI vision benchmark [2]. Running times (in seconds) of EC, RG and FEC on 11 sequences (#00-11 from the odometry task) under the inter-class and intra-class styles are reported. Best results are shown in green.

Table 3: Running time (in seconds), memory consumption (in MB), and AP (average precision defined by [36]) over 11 sequences in the KITTI [2] odometry task are reported. Best and second best results are shown in green and blue.

Table 4: Comparisons to state-of-the-art learning-based solutions on sequences #00-#11 of the KITTI odometry dataset [47]. Note that since LiDARSeg [36] is designed for segmenting a single LiDAR scan, the running time we report is the average processing time of each scene instead of the whole sequence. Best results are shown in green.

the KITTI odometry dataset into a single scan as input. Besides, we also report the performances of EC and RG.