Article

3DSG: A 3D LiDAR-Based Object Detection Method for Autonomous Mining Trucks Fusing Semantic and Geometric Features

1 School of Transportation Science and Engineering, Beihang University, Beijing 100191, China
2 Key Laboratory of Autonomous Transportation Technology for Special Vehicles, Ministry of Industry and Information Technology, Beijing 100191, China
3 Hefei Innovation Research Institute, Beihang University, Hefei 230012, China
4 School of Research Institute for Frontier Science, Beihang University, Beijing 100191, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(23), 12444; https://doi.org/10.3390/app122312444
Submission received: 29 October 2022 / Revised: 25 November 2022 / Accepted: 28 November 2022 / Published: 5 December 2022

Abstract

Object detection is an essential task for autonomous vehicles to ensure safety. Due to the complexity of mining environments, it is difficult to detect objects accurately and robustly. To address these issues, this paper proposes 3DSG, a novel 3D LiDAR-based object detection method that fuses semantic and geometric features for autonomous trucks in mining environments. A road region extraction method is presented that combines a semantic segmentation network with a region searching strategy to eliminate off-road point clouds. To deal with the complexity of ground point-cloud segmentation on unstructured roads, we propose a cascaded ground detection algorithm that performs semantic segmentation filtering followed by rectangular grid map filtering. A clustering method is proposed that fuses adaptive Euclidean clustering distance thresholds with semantic segmentation categories to solve the over- and undersegmentation of objects caused by the sparsity of point clouds. The performance of the proposed method is examined on a real mining dataset named TG-Mine-3D. Compared with state-of-the-art methods, our method achieved a higher precision of 66.39%. Moreover, for the truck and pedestrian categories, the performance of our method was significantly improved by 2.66% and 5.80%, respectively. Running at 51.35 ms per frame, the proposed method achieved real-time performance.

1. Introduction

With the rapid development of autonomous driving technology [1,2,3], open-pit mining environments as a closed scenario are suitable for autonomous truck operation. However, the environment of a mining area is characterized by heavy dust and bumpy roads, as shown in Figure 1, which sets high requirements for the implementation of autonomous driving. For autonomous trucks, one essential task is object detection, which is directly related to the safety and efficiency of mining operations.
Object detection has attracted the interest of numerous scholars with different types of sensors explored (e.g., vision sensors and LiDAR) [1,2,4,5]. Vision sensors are easily affected by undesirable light and weather conditions, and cannot acquire accurate depth information. By comparison, LiDAR, as a three-dimensional (3D) sensor, is advantageous due to not being affected by lighting and providing large amounts of depth information with high accuracy and long-range detection. Several LiDAR-based environment perception methods have been proposed to improve the performance of object detection for autonomous mining trucks, and encouraging results have been achieved [6,7]. However, due to the harsh environmental conditions of open-pit mining areas, as shown in Figure 1a, there remain a series of challenges to be overcome.
The first challenge is that road boundaries in a mining environment are not clear, which hinders extracting the region of interest (ROI) used to filter out LiDAR points beyond the road area, as shown in the red rectangles in Figure 1b. The second arises from the bumpiness of the road, which leads to instability when detecting the ground from the height information of the point clouds, as shown in Figure 1c. Furthermore, given the wide range of object sizes (e.g., pedestrians, vehicles, and trucks), over- or undersegmentation decreases clustering precision [8,9].
To address the above problems, we propose 3DSG, a novel 3D LiDAR-based object detection method fusing semantic and geometric features for autonomous mining trucks. As shown in Figure 2, the pipeline of the proposed method has three main parts: road region extraction, ground detection, and object clustering. To reduce the computational burden, semantic segmentation of the point clouds is first performed to extract the ROI. Next, a cascaded ground detection method is employed to distinguish ground and nonground point clouds. Furthermore, a clustering algorithm that adjusts the Euclidean distance threshold according to semantic categories is designed to separate the point clouds of different objects. The proposed strategy, based on a 3D LiDAR deployed on autonomous trucks, is adapted to object detection in mining environments with unstructured roads.
The main contributions of this study are summarized as follows:
  • A cascaded ground detection algorithm based on semantic segmentation point filtering and rectangular grid map filtering was designed, which performs reliably in unstructured terrains such as a mining environment.
  • To overcome over- and undersegmentation caused by the sparsity of point clouds, we propose a clustering method of adaptive Euclidean clustering distance thresholds according to semantic segmentation categories.
  • Semantic and geometric features are fused to enhance the object detection performance in terms of both efficiency and accuracy. The quantitative comparisons on the TG-Mine-3D dataset illustrate that our method achieved state-of-the-art performance on truck and pedestrian detection accuracy, and promising computational speed.
The rest of this study is organized as follows. Section 2 presents a review of the related work. Section 3 details the proposed 3D LiDAR-based object detection method for autonomous mining trucks. Experimental results are shown in Section 4. Lastly, Section 5 presents the conclusions and suggestions for future work.

2. Related Work

This study is focused on LiDAR-based object detection for autonomous trucks in mining environments. Generally, LiDAR-based detection methods can be divided into two types: traditional and deep-learning-based [6].

2.1. Research on Traditional Methods

Traditional methods perform object detection according to the geometric features of point clouds under certain assumptions, such as the road surface being fitted by one or multiple planes and a distinct height difference between obstacles and the ground. Three basic steps are generally involved: ground and nonground cloud segmentation, nonground cloud clustering, and bounding box fitting. Although these methods have high computational efficiency, they usually have a large number of parameters to be calibrated and validated, resulting in limited robustness.
(1) Ground segmentation: Various types of ground segmentation methods have been proposed. They can be roughly divided into six categories [10]; the most widely used are plane-fitting and elevation-map methods. Specifically, traditional methods are more applicable to object detection in urban environments with a flat ground and clear road structures. For example, the ground surface was estimated by fitting planes on the basis of the RANSAC approach to a single or multiregion plane [10,11]. The ground and nonground points were separated on the basis of their distance from the plane. However, the plane-fitting method assumes that the ground is flat or has a gradual slope. Roads in a mining environment have rugged characteristics and cannot be represented in planes. Thus, the plane-fitting method is not suitable for mining environments. In the literature [12,13], LiDAR points were projected onto a grid map to classify nonground clouds. For each cell of the grid map, feature values were calculated on the basis of the average height, height variance, or maximal and minimal height differences. The grid-map method has high computational efficiency because it projects raw point clouds onto a fixed-size map, which reduces the amount of processed data. However, due to the principal characteristics of grid map methods, the ground under overhanging objects such as bridges is detected as a nonground point cloud [10]. Moreover, the sloped terrain in mining environments can be problematic for grid-map methods, as the feature values are similar to the nonground point clouds.
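For illustration, the following is a minimal C++/PCL sketch of single-plane RANSAC ground fitting of the kind discussed in [10,11]; the distance threshold and function name are illustrative assumptions, not the exact parameterization of those works.

    #include <pcl/ModelCoefficients.h>
    #include <pcl/point_types.h>
    #include <pcl/point_cloud.h>
    #include <pcl/segmentation/sac_segmentation.h>
    #include <pcl/filters/extract_indices.h>

    // Fit one ground plane with RANSAC and split the cloud into ground and
    // nonground points by their distance to that plane (illustrative sketch).
    void segmentGroundRansac(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud,
                             pcl::PointCloud<pcl::PointXYZ>::Ptr& ground,
                             pcl::PointCloud<pcl::PointXYZ>::Ptr& nonground,
                             double distance_threshold = 0.2 /* m, illustrative */) {
      pcl::ModelCoefficients coefficients;
      pcl::PointIndices::Ptr inliers(new pcl::PointIndices);

      pcl::SACSegmentation<pcl::PointXYZ> seg;
      seg.setOptimizeCoefficients(true);
      seg.setModelType(pcl::SACMODEL_PLANE);
      seg.setMethodType(pcl::SAC_RANSAC);
      seg.setDistanceThreshold(distance_threshold);
      seg.setInputCloud(cloud);
      seg.segment(*inliers, coefficients);

      pcl::ExtractIndices<pcl::PointXYZ> extract;
      extract.setInputCloud(cloud);
      extract.setIndices(inliers);
      extract.setNegative(false);   // plane inliers -> ground
      extract.filter(*ground);
      extract.setNegative(true);    // everything else -> nonground
      extract.filter(*nonground);
    }

Such a single-plane assumption is exactly what breaks down on the rugged roads of a mining area.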
(2) Object Clustering: Scholars have proposed many point-cloud clustering methods, roughly divided into several categories, such as distance-based Euclidean distance clustering [8,14,15] and density-based DBSCAN [16,17]. In [14], a Euclidean-based clustering method was proposed with a radially bounded nearest-neighbor strategy (RBNN). The authors further extended RBNN by considering the normal vector, Euclidean, and angular distance to previously clustered points [15], which rendered the algorithm prone to segmenting surfaces. In [8], the authors proposed an adaptive distance threshold Euclidean clustering method to solve the problems of over- and undersegmentation caused by the fixed parameters. However, clustering methods using only the spatial distance between points cannot address the problem of oversegmentation when an object is occluded, and undersegmentation when multiple objects are close to each other. The density-based clustering method is suitable for arbitrary shapes and is robust to noise. DSets-DBSCAN [18] is an extension to DBSCAN [16] to achieve parameter-free clustering. In [17], the authors proposed a modified version of the DBSCAN method that improved clustering stability for the case of objects being close to each other. However, the density-based clustering method was not effective for point clouds with uneven density.
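As a reference point for the distance-based family, a minimal C++/PCL sketch of fixed-threshold Euclidean clustering is shown below; the tolerance and cluster-size limits are illustrative, and they are precisely the fixed parameters that adaptive-threshold variants such as [8] seek to avoid.

    #include <vector>
    #include <pcl/point_types.h>
    #include <pcl/point_cloud.h>
    #include <pcl/search/kdtree.h>
    #include <pcl/segmentation/extract_clusters.h>

    // Fixed-threshold Euclidean clustering of nonground points (illustrative sketch).
    std::vector<pcl::PointIndices> clusterEuclidean(
        const pcl::PointCloud<pcl::PointXYZ>::Ptr& nonground,
        double tolerance = 0.5 /* m, illustrative */) {
      pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
      tree->setInputCloud(nonground);

      std::vector<pcl::PointIndices> clusters;
      pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
      ec.setClusterTolerance(tolerance);   // one fixed distance threshold for all ranges
      ec.setMinClusterSize(10);            // illustrative size limits
      ec.setMaxClusterSize(100000);
      ec.setSearchMethod(tree);
      ec.setInputCloud(nonground);
      ec.extract(clusters);
      return clusters;
    }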

2.2. Research on Deep-Learning-Based Methods

Recently, convolutional neural networks (CNNs) have been successfully applied to 2D [1,19,20] and 3D [21,22,23] object detection, which has become a research hotspot. Deep-learning-based methods are data-driven, learning the characteristics of annotated samples, which makes them robust across different scenarios.
Many methods project 3D point clouds onto 2D images; then, image-based detection methods are applied to realize 3D object detection. Chen et al. [24,25] proposed an energy function to score a predefined 3D bounding box from 3D geometric information. In [26,27], geometric constraints between 3D and 2D bounding boxes were used to recover the pose of 3D objects. However, these methods only generated coarse results due to the lack of depth information.
Many researchers prefer voxel-based 3D object detection due to its fast speed and high accuracy. VoxelNet [28], SECOND [29], and PointPillars [21] project point clouds onto a voxel grid and learn features from each voxel. PointPillars is an extension of SECOND that encodes point clouds as pillars, achieving outstanding performance with real-time efficiency. TANet [23] outperforms other methods for pedestrians and is robust to noisy point clouds by considering channelwise, pointwise, and voxelwise attention within each voxel.
Other methods operate directly on point clouds. Qi et al. [30] presented the PointNet architecture to learn point features directly from raw point clouds; its applications include object classification and segmentation, and scene semantic segmentation. The follow-up work [31] further improved the extracted feature quality with a hierarchical neural network. PointRCNN [22] is a two-stage 3D detector that extracts features through PointNets and then estimates the final results with a two-stage detection network.
Although the above deep-learning-based methods achieve high detection accuracy, they can only detect the object categories they were trained on. Detection performance is poor for untrained objects, especially in driving scenarios that are complicated and volatile.
A common limitation of the above methods is that they either require complex parameter tuning or can only detect trained object categories. This paper proposes 3DSG, which fuses semantic and geometric features for autonomous trucks in mining environments; it tolerates wide parameter thresholds and does not depend on trained categories.

3. Methodology

To achieve accurate and robust object detection performance for autonomous trucks, a LiDAR-based object detection method is presented. The proposed method operates in a series of steps, including road region extraction, ground detection, and object clustering.

3.1. Road Region Extraction

The road region is the most important area for perception, as objects in the road region may affect the passing of autonomous trucks. Road region extraction can provide the ROI for object detection, which can filter out invalid areas to reduce the computational burden. Most road region detection algorithms have been applied to structured environments [32,33] where road features are obvious: the road surface is flat, and the sidewalks are higher on both sides. As roads in mining environments have large undulations, and the boundaries of the road are not clear, it is a challenge to accurately extract the road region. In this section, a novel road region extraction method for mining environments is proposed. The method mainly includes two stages: first, a semantic segmentation network based on a deep learning method [34,35] is created to obtain the semantic information of point clouds. In this study, the semantic segmentation method is RangeNet++ [36]; then, a search-based strategy is applied for ROI point extraction.

3.1.1. Cloud Semantic Segmentation

In this study, RangeNet++ was applied to segment LiDAR points into multiple categories, including ‘truck’, ‘vehicle’, ‘pedestrian’, and ‘ground’. RangeNet++ is an efficient 3D semantic segmentation architecture that runs in real time. To solve the problem of point-cloud disorder, a range representation method is used that converts each point into image coordinates. The representation method is defined by
\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\left[\,1 - \arctan(y, x)\,\pi^{-1}\,\right] w \\ \left[\,1 - \left(\arcsin(z\, r^{-1}) + f_{up}\right) f^{-1}\,\right] h \end{pmatrix} \qquad (1)
where (u, v) are the image coordinates of a range image with height h and width w, f = f_up + f_down is the vertical field of view of the sensor, and r = ‖p_i‖₂ is the range of each point.
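For clarity, a minimal C++ sketch of this spherical projection is given below; the variable names, the use of the downward field of view for normalization (as in the open-source RangeNet++ implementation), and the clamping are our assumptions.

    #include <algorithm>
    #include <cmath>

    // Project a 3D point (x, y, z) to range-image pixel coordinates (u, v) for an
    // image of width w and height h; fov_up and fov_down are the positive vertical
    // field-of-view extents in radians, cf. Equation (1).
    void projectToRangeImage(float x, float y, float z,
                             int w, int h, float fov_up, float fov_down,
                             int& u, int& v) {
      constexpr float kPi = 3.14159265358979f;
      const float r = std::sqrt(x * x + y * y + z * z);  // range ||p_i||_2
      const float yaw = std::atan2(y, x);                // horizontal angle
      const float pitch = std::asin(z / r);              // vertical angle
      const float fov = fov_up + fov_down;               // total vertical FOV f

      const float uf = 0.5f * (1.0f - yaw / kPi) * static_cast<float>(w);
      const float vf = (1.0f - (pitch + fov_down) / fov) * static_cast<float>(h);
      u = std::min(w - 1, std::max(0, static_cast<int>(uf)));  // clamp to image bounds
      v = std::min(h - 1, std::max(0, static_cast<int>(vf)));
    }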
An illustration of point-cloud semantic segmentation is shown in Figure 3.
In order to facilitate the observation of different categories, points were marked with different colors according to categories. As shown in Figure 3, ‘ground’ is green, ‘truck’ is red, ‘vehicle’ is blue, ‘pedestrian’ is pink, and ‘other’ is gray. As the road boundary shape features in the mine scenarios are not prominent, we set it in the ‘other’ category for semantic segmentation. Due to the mining environment being a closed scenario, object categories are controllable. Thus, semantic segmentation results are stable.
As the truck moves, one area can be scanned multiple times by the LiDAR. However, due to semantic segmentation noise, the same area may be classified into different classes across scans. Inspired by [37], we build a cloud map by stitching point clouds with RTK GNSS/INS poses. We then apply a statistical filter to remove this label noise [37], improving the accuracy and robustness of the semantic segmentation.
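One simple way to realize such statistical filtering is a per-voxel majority vote over the stitched map; the voxel size, hashing, and data layout below are illustrative assumptions rather than the exact filter of [37].

    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct LabeledPoint { float x, y, z; int label; };   // label index in [0, kNumClasses)
    constexpr int kNumClasses = 5;                        // ground, truck, vehicle, pedestrian, other

    // Map a point in the stitched map frame to a voxel key (0.2 m voxels, illustrative);
    // hash collisions between distant voxels are ignored in this sketch.
    int64_t voxelKey(const LabeledPoint& p, float res = 0.2f) {
      const int64_t ix = static_cast<int64_t>(std::floor(p.x / res));
      const int64_t iy = static_cast<int64_t>(std::floor(p.y / res));
      const int64_t iz = static_cast<int64_t>(std::floor(p.z / res));
      return (ix * 73856093) ^ (iy * 19349663) ^ (iz * 83492791);
    }

    // Accumulate label votes from all stitched scans, then relabel every point with
    // the majority label of its voxel to suppress per-frame segmentation noise.
    void filterLabelsByVoxelVote(std::vector<LabeledPoint>& map_points) {
      std::unordered_map<int64_t, std::array<int, kNumClasses>> votes;
      for (const auto& p : map_points) votes[voxelKey(p)][p.label]++;
      for (auto& p : map_points) {
        const auto& v = votes[voxelKey(p)];
        p.label = static_cast<int>(std::max_element(v.begin(), v.end()) - v.begin());
      }
    }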

3.1.2. Road Region Searching

According to the method mentioned above, the LiDAR points are segmented into categories. Because the 'other' category includes not only road boundaries but also unmarked obstacles, the road boundary cannot be directly extracted from the points in the 'other' category.
Considering that the road region includes the road surface and the obstacles on it, the road surface corresponds to the 'ground' category, and the obstacles correspond to the 'truck', 'vehicle', and 'pedestrian' categories. However, the semantic segmentation method may produce false segmentation, especially on rough road surfaces and at the edges of small obstacles such as pedestrians; as shown in Figure 4, parts of the pedestrian point clouds are falsely segmented into the 'other' and 'ground' categories.
To address this problem, a grid map method was used to search road region point clouds, as shown in Figure 5. Because the point clouds had disorganized characteristics, it was difficult to extract the outer contour of the ground and obstacles. Therefore, semantic segmentation point clouds were projected onto a 2D grid map, and the grid map was searched to extract the outer contours.
The pipeline for road region searching is shown in Figure 5.
After the semantic segmentation of LiDAR points, as shown in Figure 5a, 'ground' is green, 'truck' is red, and 'other' is gray. The semantic segmentation point clouds were then projected onto a grid map with M rows and N columns, with different categories shown in different colors, as shown in Figure 5b. For the left boundary, each row was searched from left to right: if a grid cell stored both 'ground' and 'other' category points, the 'other' category points were taken as the left boundary points, and the search in this row was stopped. For the right boundary, the same strategy was used with the search direction from right to left. The left and right boundaries are shown as red lines in Figure 5b. After searching every row of the grid map, the points between the left and right boundaries of each row were extracted as the points in the ROI, as shown in Figure 5c.
The grid map search method is illustrated in Algorithm 1.

3.2. Ground Detection

Ground detection separates points into ground points and nonground points. The ground is usually represented by a plane model, which may not be robust for different environments. Considering the points with different category information after semantic segmentation, most of the ground points can be extracted correctly with only a small number of false detections. In order to optimize the ground detection performance, a cascaded ground detection method inspired by [10] is proposed, as shown in Figure 6. The method included two stages: semantic segmentation point filtering and rectangular grid map filtering.
Algorithm 1 Road region searching.
Require: Point clouds with semantic segmentation categories P = {p_1, p_2, ..., p_n}; grid map size M, N
Ensure: Points on the left boundary P_lb, points on the right boundary P_rb, points in the ROI P_roi
  Project P onto the grid map GM(M, N)
  for i = 1 : M do
      for j = 1 : N do
          if GM(i, j) contains 'ground' points ∧ GM(i, j) contains 'obstacle' points then
              P_lb.push_back(j)
              break
          end if
          if j == N then
              P_lb.push_back(N)
          end if
      end for
      for j = N : 1 do
          if GM(i, j) contains 'ground' points ∧ GM(i, j) contains 'obstacle' points then
              P_rb.push_back(j)
              break
          end if
          if j == 1 then
              P_rb.push_back(1)
          end if
      end for
  end for
For semantic segmentation point-cloud filtering, points in the obstacle categories were directly classified as nonground point clouds. Since the semantic segmentation algorithm is robust for obstacle points, this filtering step is robust. However, the precision of semantic segmentation is low at obstacle edges and for small obstacles such as pedestrians; as shown in Figure 4, such points may be falsely segmented into the 'ground' or 'other' category.
Rectangular grid map filtering was then applied to improve the accuracy of ground detection, especially at obstacle edges and for small obstacles. Because the grid map is used to filter out false semantic segmentation and thus improve segmentation stability and accuracy, wide parameter thresholds were used to decrease false segmentation. The point clouds processed with semantic segmentation filtering were projected onto a grid map with a resolution of 0.2 m, represented as a dataset GM = {cell_00, ..., cell_mn}, where m and n denote the vertical and horizontal indices of the grid map, respectively. For each grid map cell, the height variance δ_H² and the maximal-minimal height difference ΔH were calculated. To retain all point-cloud information, the indices of the points projected onto each cell were stored. Therefore, the grid map feature is denoted as f = [δ_H², ΔH, indices]. The cell in row i and column j was classified with
GM[i][j] = \begin{cases} \text{ground}, & \text{if } \delta_H^2 < tr_{\delta} \text{ and } \Delta H < tr_H \\ \text{nonground}, & \text{otherwise} \end{cases} \qquad (2)
where tr_δ and tr_H denote the thresholds of the height variance and the maximal height difference, respectively.
The final nonground point clouds are the combined output of the semantic segmentation point-cloud filter and the rectangular grid map filter.
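A minimal C++ sketch of the per-cell test in Equation (2) is shown below; the cell data structure and the treatment of empty cells are our assumptions.

    #include <algorithm>
    #include <vector>

    struct GridCell {
      std::vector<int> indices;    // indices of the points projected into this cell
      std::vector<float> heights;  // z values of those points
    };

    // Classify one grid cell as ground or nonground from its height statistics,
    // following Equation (2): ground if both the height variance and the
    // max-min height difference are below their thresholds.
    bool isGroundCell(const GridCell& cell, float tr_var, float tr_dh) {
      if (cell.heights.empty()) return true;  // assumption: empty cells treated as ground
      float mean = 0.0f, min_h = cell.heights[0], max_h = cell.heights[0];
      for (float h : cell.heights) {
        mean += h;
        min_h = std::min(min_h, h);
        max_h = std::max(max_h, h);
      }
      mean /= static_cast<float>(cell.heights.size());
      float var = 0.0f;                        // height variance delta_H^2
      for (float h : cell.heights) var += (h - mean) * (h - mean);
      var /= static_cast<float>(cell.heights.size());
      const float dh = max_h - min_h;          // height difference Delta_H
      return var < tr_var && dh < tr_dh;
    }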
In summary, the ground detection algorithm is shown in Algorithm 2.
Algorithm 2 Ground detection.
Require: Point clouds with semantic segmentation categories in the road region P = {p_1, p_2, ..., p_n}; grid map size M, N; thresholds tr_δ, tr_H
Ensure: Ground point clouds P_g, nonground point clouds P_ng, nonground candidate point clouds P_candidate
  for i = 1 : size(P) do
      if P_i.label == 'truck' ∨ P_i.label == 'vehicle' ∨ P_i.label == 'pedestrian' then
          P_ng.push_back(P_i)
      else
          P_candidate.push_back(P_i)
      end if
  end for
  Project P_candidate onto the grid map GM(M, N)
  for i = 1 : M do
      for j = 1 : N do
          if GM(i, j).δ_H² > tr_δ ∨ GM(i, j).ΔH > tr_H then
              for m = 1 : GM(i, j).indices.size do
                  P_ng.push_back(Point(GM(i, j).indices(m)))
              end for
          else
              for m = 1 : GM(i, j).indices.size do
                  P_g.push_back(Point(GM(i, j).indices(m)))
              end for
          end if
      end for
  end for

3.3. Object Clustering

The object clustering method segments nonground points into independent objects. Due to the sparse characteristics of LiDAR points, the point clouds change from dense to sparse as the scan range increases. A fixed distance threshold cannot satisfy the clustering requirements at both near and far range, so over- or undersegmentation appears. To address this problem, we propose a clustering method fusing Euclidean clustering with adaptive distance thresholds according to semantic segmentation category information, as shown in Figure 7.
Inspired by [38], the clustering distance thresholds were set to different values according to the density of the point clouds in different regions in order to address the problem of remote oversegmentation under fixed clustering distance thresholds. The category of clusters is determined according to the category of each point in the cluster. If all the points in the cluster are in the same category, the category is the clustering category. Otherwise, the clustering category is uncertain. As shown in Figure 7b, points in each red rectangle correspond to the same clustering result, labeled as the clustering category. As the truck length is large, it is oversegmented into three clusters.
Point clouds with semantic categories can be used as a clustering criterion to address the problem of oversegmentation. However, the result of semantic segmentation may include false segmentation, as shown in Figure 7b, and there may be multiple categories in the points of segmented clustering results. The clustering categories should be filtered.
To overcome this problem, we present a statistical category filtering method to filter out semantic segmentation noise in each cluster. For each cluster, we counted the number of points in each category; the category with the maximal number of points was taken as the category of the cluster, and all points of the cluster were set to this category. As shown in Figure 7c, since the truck category had the maximal number of points, the uncertain category was corrected to the truck category.
For each cluster, the merge threshold was calculated according to the clustering category and the distance of the cluster center to the origin of coordinates. For different clustering categories, such as ‘truck’, ‘vehicle’, and ‘pedestrian’, different distance thresholds were set to merge the clustering results. As shown in Figure 7d, three clustering results were merged into a cluster to address the problem of oversegmentation. On the basis of the category filter method, the clustering category was determined to be ‘truck’. The merging threshold was defined as in Equation (3).
d_m = w_c \cdot w_r \cdot r \qquad (3)
where d_m is the cluster merging threshold, w_c is a weight determined by the clustering category, w_r is a weight associated with the distance of the cluster center from the coordinate origin, and r is the distance of the cluster center from the coordinate origin.
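To make the two steps concrete, the following C++ sketch shows the statistical category vote and the merge test of Equation (3); the category weights, the distance weight, and the cluster structure are illustrative assumptions, not the calibrated values used in our experiments.

    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <vector>

    enum Category { TRUCK = 0, VEHICLE, PEDESTRIAN, UNCERTAIN, NUM_CATEGORIES };

    struct Cluster {
      std::vector<Category> point_labels;  // per-point semantic labels
      float cx = 0.0f, cy = 0.0f;          // cluster center in the vehicle frame
      Category label = UNCERTAIN;
    };

    // Statistical category filtering: assign the majority per-point label to the cluster.
    void voteClusterCategory(Cluster& c) {
      std::array<int, NUM_CATEGORIES> votes{};
      for (Category l : c.point_labels) votes[l]++;
      c.label = static_cast<Category>(
          std::max_element(votes.begin(), votes.end()) - votes.begin());
    }

    // Merge test following Equation (3): d_m = w_c * w_r * r, where r is the distance
    // of the cluster center to the coordinate origin. Weights below are illustrative.
    bool shouldMerge(const Cluster& a, const Cluster& b) {
      static const std::array<float, NUM_CATEGORIES> w_c = {2.0f, 1.0f, 0.5f, 1.0f};
      const float w_r = 0.05f;                               // illustrative distance weight
      const float r = std::hypot(a.cx, a.cy);                // range of cluster a's center
      const float d_m = w_c[a.label] * w_r * r;              // adaptive merge threshold
      const float gap = std::hypot(a.cx - b.cx, a.cy - b.cy);
      return a.label == b.label && gap < d_m;
    }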

4. Experimental Results

To evaluate the performance of the proposed method, extensive experiments were carried out in various scenarios. We collected and annotated a mining dataset, and compared the proposed method with state-of-the-art methods.

4.1. Experimental Setup

The algorithm was implemented with C++ and PCL on Ubuntu 18.04. The experiment was performed on a computer with a 3.5 GHz Intel Core i7-7800X processor, 32 GB RAM, and an RTX 3080 GPU.
The autonomous truck was equipped with an Ouster OS1-64 LiDAR on the front to acquire 3D point clouds, as shown in Figure 8. The LiDAR has 64 channels in the vertical direction, and its horizontal resolution is configurable as 512, 1024, or 2048 with an output frequency of 10 or 20 Hz. To balance detection accuracy and computational efficiency, a horizontal resolution of 1024 with 10 Hz output was configured. The field of view was 360° in the horizontal plane and −16.6° to 16.6° in the vertical plane.
Because the autonomous truck is large, the installation angle of the LiDAR had to consider both the nearest blind area and the farthest detection requirement. For the chosen installation location, the effective horizontal field of view was from 0 to 180 degrees in the front direction. To reduce the computational burden, the point clouds from 180 to 360 degrees in the rear direction were filtered out.
The y axis points toward the front of the truck, the z axis is perpendicular to the ground, and the x axis is determined by the right-handed coordinate system. The raw points of the LiDAR are in polar coordinates; according to the LiDAR characteristics, the polar coordinate data were converted into a Cartesian coordinate system.
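A minimal C++ sketch of this conversion under one possible convention (azimuth measured from the forward y axis toward the right-hand x axis) is given below; the angle conventions are our assumptions, not the exact driver output of the sensor.

    #include <cmath>

    // Convert a raw measurement (range r, azimuth phi, elevation theta, in radians)
    // into the truck frame: y forward, z up, x to the right (right-handed).
    void polarToCartesian(float r, float phi, float theta,
                          float& x, float& y, float& z) {
      y = r * std::cos(theta) * std::cos(phi);  // forward component
      x = r * std::cos(theta) * std::sin(phi);  // rightward component
      z = r * std::sin(theta);                  // upward component
    }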
The configuration parameters are presented in Table 1. The parameters H, θ_y, θ_p, and θ_r are associated with the LiDAR installation.

4.2. Dataset

LiDAR data were collected using the robot operating system (ROS) in a real mining scenario in Inner Mongolia, China. The collected dataset contained 3000 frames of point clouds and image data, including the whole process of the mining operation for the autonomous truck on straight, turning, uphill, and downhill roads. There were 1828 frames of data containing moving obstacles such as trucks, vehicles, and pedestrians. The image data could be used for comparison with point clouds.
Datasets were annotated in KITTI format, including a point-cloud semantic segmentation dataset and an obstacle detection dataset. The semantic segmentation dataset comprised 3000 frames for RangeNet++ training and testing. The obstacle detection dataset, called TG-Mine-3D, comprised 1828 frames for performance evaluation. According to the objects in the mine, the TG-Mine-3D dataset has three categories of objects: 'truck', 'vehicle', and 'pedestrian'. We used the custom TG-Mine-3D dataset to evaluate the performance of 3DSG since, to our knowledge, there are no open-source datasets for mining areas.

4.3. Results

The performance of the proposed method was examined in a series of mining scenarios, including the detection of trucks, vehicles, and pedestrians, using the TG-Mine-3D dataset for both quantitative and qualitative analyses. To compare the detection results with those of other methods, we separated objects into L-shaped and point models according to the number of points and clustering categories, inspired by [39].
Quantitative Analysis. We evaluated the performance regarding 3D average precision (AP) and mean average precision (mAP), which are consistent with the KITTI evaluation detection metrics. For a fair comparison, the same intersection over union (IoU) thresholds were used for the same category of all methods: 0.7 for ‘truck’ and ‘vehicle’, and 0.5 for ‘pedestrian’. This is unlike KITTI, which divides objects into three difficulty levels (easy, moderate, and hard) depending on the size, occlusion level, and truncation of 3D objects. We considered that all objects had a similar level of difficulty owing to the low probability of occlusion for each object in a mining environment.
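For reference, the sketch below shows a simplified IoU test between axis-aligned bird's-eye-view boxes; the actual evaluation uses KITTI-style oriented 3D boxes, so this is only an illustration of how the per-class thresholds (0.7 and 0.5) are applied.

    #include <algorithm>

    struct Box2D { float xmin, ymin, xmax, ymax; };  // axis-aligned BEV box

    // Intersection over union of two axis-aligned BEV boxes.
    float iou(const Box2D& a, const Box2D& b) {
      const float ix = std::max(0.0f, std::min(a.xmax, b.xmax) - std::max(a.xmin, b.xmin));
      const float iy = std::max(0.0f, std::min(a.ymax, b.ymax) - std::max(a.ymin, b.ymin));
      const float inter = ix * iy;
      const float area_a = (a.xmax - a.xmin) * (a.ymax - a.ymin);
      const float area_b = (b.xmax - b.xmin) * (b.ymax - b.ymin);
      return inter / (area_a + area_b - inter);
    }

    // A detection counts as a true positive when its IoU with a ground-truth box
    // reaches the per-class threshold (0.7 for 'truck'/'vehicle', 0.5 for 'pedestrian').
    bool isTruePositive(const Box2D& det, const Box2D& gt, float threshold) {
      return iou(det, gt) >= threshold;
    }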
The proposed method was compared to our method without the ROI segment, the method in [11,40], SECOND [29], PointPillars [21], PointRCNN [22], and TANET [23], whose performance was highlighted in 3D object detection. The results of AP, mAP, and inference time are shown in Table 2.
The proposed method, 3DSG, achieved a 3D mAP of 66.39%, outperforming traditional state-of-the-art methods [11,40], and deep learning methods SECOND [29], PointPillars [21], and PointRCNN [22] by about 43.11%, 31.63%, 4.17%, 1.49%, and 1.97%, respectively. Although TANet [23] outperformed our method by 0.02% for 3D mAP, our method showed more robustness for the categories of ‘truck’ and ‘pedestrian’.
Compared with the traditional methods [11,40], the proposed method performed better in all categories. As the unstructured road is uneven, the detection performance of the plane fitting method [11] decreased. Due to the complex geometric structure of unstructured road boundaries, the traditional detection methods [11,40] falsely detected road boundaries as objects such as trucks, resulting in poor detection performance. Owing to the road region extraction strategy, our method filtered out most point clouds outside the road region, such as the road boundaries, which improved the robustness of the object detection results.
Compared with the deep learning methods SECOND [29], PointPillars [21], PointRCNN [22], and TANet [23], the proposed method achieved the best performance in the truck and pedestrian categories, but showed poorer detection performance for vehicles. For large objects (e.g., trucks), the proposed method achieved improvements of 8.09%, 3.60%, 4.33%, and 2.66% over SECOND, PointPillars, PointRCNN, and TANet, respectively. This is because the proposed method fuses semantic segmentation information with geometric features to classify object categories, which achieves high classification accuracy for large objects such as trucks. For challenging objects (e.g., pedestrians), the proposed method achieved improvements of 12.70%, 10.48%, 12.20%, and 5.80% over SECOND, PointPillars, PointRCNN, and TANet, respectively. This is because the point clouds are sparse and pedestrians are small, so the deep learning methods struggle to learn pedestrian features, whereas our method builds on the traditional geometric approach, which can robustly extract points that exceed the set thresholds. For vehicles, our method achieved poorer detection performance due to interference from additional devices mounted on the vehicles, such as safety flags. The deep learning methods, by contrast, learn vehicle appearance well, and their detection performance is not affected by the flags.
Qualitative Analysis. The qualitative results are shown in Figure 9 and Figure 10 from bird’s eye view (BEV) and image perspectives. Figure 10 shows the failure cases. As shown in Figure 10a, the truck was too far away and was falsely detected as two vehicles. For false semantic segmentation, part of the road boundary was detected as ‘vehicle’, as shown in Figure 10b. For safety, flags are placed on the back of vehicles and are sometimes falsely detected as pedestrians, as shown in Figure 10c.
The failure cases in Figure 10 contributed to the lower mAP in Table 2. However, in these cases the system still output detection results, and safety was not compromised, which is important for autonomous driving.
Inference time. Table 2 shows that the running time of the proposed method was 51.35 ms, which meets the real-time requirement for the 10 Hz output frequency of the LiDAR. Compared with the deep learning methods [21,22,23,29], the traditional methods [11,40] are more efficient. Due to its unique encoding method, PointPillars achieves the fastest computing speed among all the compared methods. Compared with our method without ROI, our method had better computational efficiency: since the ROI module filters out point clouds outside the road, the amount of data input to the detection method is reduced, which improves its computing efficiency.

5. Conclusions

In this study, we presented 3DSG, a novel 3D LiDAR object detection method for autonomous trucks in mining environments. The method was tested on an autonomous truck over the whole process of mining operations under different road conditions in a real mining scenario. The performance of the proposed method was evaluated using the TG-Mine-3D dataset. Compared with state-of-the-art methods, our method achieved a higher precision of 66.39%. Moreover, for the 'truck' and 'pedestrian' categories, the performance of our method was significantly improved by 2.66% and 5.80%, respectively. Running at 51.35 ms per frame, the proposed method achieved real-time performance. In conclusion, our proposed method solves the problem of object detection for autonomous trucks and provides guidance for application in mining environments.
The proposed 3DSG method achieved high accuracy for object detection in a mining environment. However, the method operates on single data frames rather than on continuous multiframe sequences, which may lead to fluctuations in the detection results. In the future, we will focus on a tracking algorithm for continuous multiframe object detection to improve stability.

Author Contributions

Conceptualization, H.L., Z.W., G.Y., Z.G., B.Z., P.C., F.Z.; methodology, H.L., G.Y., Z.G., F.Z.; software, Z.G., F.Z.; validation, H.L., Z.W., P.C.; formal analysis, H.L., G.Y.; data curation, H.L., B.Z.; writing—original draft preparation, H.L., Z.W., G.Y., Z.G., B.Z., P.C., F.Z.; writing—review and editing, H.L., Z.W., G.Y., Z.G., B.Z., P.C., F.Z.; visualization, H.L., P.C.; supervision, G.Y., Z.W.; project administration, B.Z., P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Technologies R&D Program of China (2020YFB1600301) and the National Natural Science Foundation of China (grant no. 52072020).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Detailed data are contained within the article. More data that support the findings of this study are available from the author H.L. upon reasonable request.

Acknowledgments

The authors are thankful for the support of the School of Transportation and Science, and the School of Research Institute for Frontier Science of Beihang University, Key Laboratory of Autonomous Transportation Technology for Special Vehicles.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Zheng, W.; Xie, H.; Chen, Y.; Roh, J.; Shin, H. PIFNet: 3D Object Detection Using Joint Image and Point Cloud Features for Autonomous Driving. Appl. Sci. 2022, 12, 3686. [Google Scholar] [CrossRef]
  3. Sabou, S.; Oniga, S.; Lung, C. Magnetic sensors in inertial navigation system. In Proceedings of the 2014 IEEE 20th International Symposium for Design and Technology in Electronic Packaging (SIITME), Bucharest, Romania, 23–26 October 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 211–214. [Google Scholar]
  4. Wang, W.; Chang, X.; Yang, J.; Xu, G. LiDAR-Based Dense Pedestrian Detection and Tracking. Appl. Sci. 2022, 12, 1799. [Google Scholar] [CrossRef]
  5. Hoang, L.; Lee, S.H.; Lee, E.J.; Kwon, K.R. GSV-NET: A Multi-Modal Deep Learning Network for 3D Point Cloud Classification. Appl. Sci. 2022, 12, 483. [Google Scholar] [CrossRef]
  6. Wang, G.; Wu, J.; Xu, T.; Tian, B. 3D vehicle detection with RSU LiDAR for autonomous mine. IEEE Trans. Veh. Technol. 2021, 70, 344–355. [Google Scholar] [CrossRef]
  7. Tang, J.; Lu, X.; Ai, Y.; Tian, B.; Chen, L. Road Detection for autonomous truck in mine environment. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 839–845. [Google Scholar]
  8. Yan, Z.; Duckett, T.; Bellotto, N. Online learning for human classification in 3D LiDAR-based tracking. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 864–871. [Google Scholar]
  9. Zhao, Y.; Zhang, X.; Huang, X. A technical survey and evaluation of traditional point cloud clustering methods for lidar panoptic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 2464–2473. [Google Scholar]
  10. Narksri, P.; Takeuchi, E.; Ninomiya, Y.; Morales, Y.; Akai, N.; Kawaguchi, N. A slope-robust cascaded ground segmentation in 3D point cloud for autonomous vehicles. In Proceedings of the 2018 21st International Conference on intelligent transportation systems (ITSC), Maui, HI, USA, 4–7 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 497–504. [Google Scholar]
  11. Zermas, D.; Izzat, I.; Papanikolopoulos, N. Fast segmentation of 3d point clouds: A paradigm on lidar data for autonomous vehicle applications. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 5067–5073. [Google Scholar]
  12. Li, N.; Su, B. 3D-Lidar based obstacle detection and fast map reconstruction in rough terrain. In Proceedings of the 2020 5th International Conference on Automation, Control and Robotics Engineering (CACRE), Dalian, China, 19–20 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 145–151. [Google Scholar]
  13. Anand, B.; Senapati, M.; Barsaiyan, V.; Rajalakshmi, P. LiDAR-INS/GNSS-Based Real-Time Ground Removal, Segmentation, and Georeferencing Framework for Smart Transportation. IEEE Trans. Instrum. Meas. 2021, 70, 1–11. [Google Scholar] [CrossRef]
  14. Klasing, K.; Wollherr, D.; Buss, M. A clustering method for efficient segmentation of 3D laser data. In Proceedings of the 2008 IEEE International Conference on Robotics and Automation, Bangkok, Thailand, 14–17 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 4043–4048. [Google Scholar]
  15. Klasing, K.; Wollherr, D.; Buss, M. Realtime segmentation of range data using continuous nearest neighbors. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 2431–2436. [Google Scholar]
  16. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), Portland, OR, USA, 2–4 August 1996; Volume 96, pp. 226–231. [Google Scholar]
  17. Tran, T.N.; Drab, K.; Daszykowski, M. Revised DBSCAN algorithm to cluster data with dense adjacent clusters. Chemom. Intell. Lab. Syst. 2013, 120, 92–96. [Google Scholar] [CrossRef]
  18. Hou, J.; Gao, H.; Li, X. DSets-DBSCAN: A parameter-free clustering algorithm. IEEE Trans. Image Process. 2016, 25, 3182–3193. [Google Scholar] [CrossRef] [PubMed]
  19. Sarcinelli, R.; Guidolini, R.; Cardoso, V.B.; Paixão, T.M.; Berriel, R.F.; Azevedo, P.; De Souza, A.F.; Badue, C.; Oliveira-Santos, T. Handling pedestrians in self-driving cars using image tracking and alternative path generation with Frenét frames. Comput. Graph. 2019, 84, 173–184. [Google Scholar] [CrossRef]
  20. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: New York, NY, USA, 2016; pp. 21–37. [Google Scholar]
  21. Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705. [Google Scholar]
  22. Shi, S.; Wang, X.; Li, H. Pointrcnn: 3d object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 770–779. [Google Scholar]
  23. Liu, Z.; Zhao, X.; Huang, T.; Hu, R.; Zhou, Y.; Bai, X. Tanet: Robust 3d object detection from point clouds with triple attention. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11677–11684. [Google Scholar]
  24. Chen, X.; Kundu, K.; Zhu, Y.; Berneshawi, A.G.; Ma, H.; Fidler, S.; Urtasun, R. 3d object proposals for accurate object class detection. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28, pp. 424–432. [Google Scholar]
  25. Chen, X.; Kundu, K.; Zhang, Z.; Ma, H.; Fidler, S.; Urtasun, R. Monocular 3d object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2147–2156. [Google Scholar]
  26. Mousavian, A.; Anguelov, D.; Flynn, J.; Kosecka, J. 3d bounding box estimation using deep learning and geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7074–7082. [Google Scholar]
  27. Li, B.; Ouyang, W.; Sheng, L.; Zeng, X.; Wang, X. Gs3d: An efficient 3d object detection framework for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1019–1028. [Google Scholar]
  28. Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499. [Google Scholar]
  29. Yan, Y.; Mao, Y.; Li, B. Second: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef] [PubMed]
  30. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
  31. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 5099–5108. [Google Scholar]
  32. Zhao, G.; Yuan, J. Curb detection and tracking using 3D-LIDAR scanner. In Proceedings of the 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 437–440. [Google Scholar]
  33. Jung, Y.; Seo, S.W.; Kim, S.W. Curb detection and tracking in low-resolution 3d point clouds based on optimization framework. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3893–3908. [Google Scholar] [CrossRef]
  34. Kong, X.; Zhai, G.; Zhong, B.; Liu, Y. Pass3d: Precise and accelerated semantic segmentation for 3d point cloud. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 4–8 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3467–3473. [Google Scholar]
  35. Cortinhal, T.; Tzelepis, G.; Erdal Aksoy, E. SalsaNext: Fast, uncertainty-aware semantic segmentation of LiDAR point clouds. In Proceedings of the International Symposium on Visual Computing, San Diego, CA, USA, 5–7 October 2020; Springer: New York, NY, USA, 2020; pp. 207–222. [Google Scholar]
  36. Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. Rangenet++: Fast and accurate lidar semantic segmentation. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 4–8 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4213–4220. [Google Scholar]
  37. Qin, T.; Zheng, Y.; Chen, T.; Chen, Y.; Su, Q. A Light-Weight Semantic Map for Visual Localization towards Autonomous Driving. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 11248–11254. [Google Scholar]
  38. Zhou, B.; Huang, R. Segmentation Algorithm for 3D LiDAR Point Cloud Based on Region Clustering. In Proceedings of the 2020 7th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS), Guangzhou, China, 13–15 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 52–57. [Google Scholar]
  39. Ye, Y.; Fu, L.; Li, B. Object detection and tracking using multi-layer laser for autonomous urban driving. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 259–264. [Google Scholar]
  40. Rachman, A.S.A. 3D-LIDAR Multi Object Tracking for Autonomous Driving: Multi-Target Detection and Tracking under Urban Road Uncertainties. Master’s Thesis, Delft University of Technology, Delft, The Netherlands, 2017. [Google Scholar]
Figure 1. Illustration of a mining environment. (a) Mining environment. (b) Road boundary. (c) Bumpy road.
Figure 2. The pipeline of the proposed method.
Figure 3. Illustration of cloud-semantic segmentation.
Figure 4. Illustration of false semantic segmentation.
Figure 5. Pipeline for road region searching. (a) Semantic segment point clouds. (b) Grid map searching. (c) ROI point clouds.
Figure 6. The pipeline of ground detection.
Figure 7. Pipeline of the clustering method combining adaptive distance threshold Euclidean clustering with semantic segmentation category information.
Figure 8. Truck equipped with an Ouster OS1-64 LiDAR.
Figure 9. Qualitative analysis of TG-Mine-3D dataset. The first row shows BEV results. Ground truth boxes are shown in green, and detection results are in red. The second row shows the image corresponding to the first row.
Figure 10. Failure cases using the same visualized setup as that in Figure 9. The yellow rectangle represents the target of the failed detection.
Table 1. Configuration parameters for the LiDAR.

Parameter | Definition | Value
H | Height | 4.1 m
θ_y | Yaw | 0.12 deg
θ_p | Pitch | −16.1 deg
θ_r | Roll | −0.031 deg
α | Horizontal angular resolution | 1024 / 0.703 deg
β | Vertical angular resolution | 0.518 deg
f | Output frequency | 10 Hz
Table 2. Quantitative comparison of 3D AP (%) per category, 3D mAP (%), and running time on the TG-Mine-3D dataset.

Method | Truck | Vehicle | Pedestrian | 3D mAP (%) | Running Time (ms)
Method in [11] | 31.85 | 20.38 | 17.62 | 23.28 | 16.25
Method in [40] | 52.35 | 25.16 | 26.76 | 34.76 | 25.32
SECOND [29] | 75.86 | 71.18 | 39.61 | 62.21 | 49.52
PointPillars [21] | 80.35 | 72.52 | 41.83 | 64.90 | 15.63
PointRCNN [22] | 79.62 | 73.53 | 40.11 | 64.42 | 98.76
TANet [23] | 81.29 | 71.44 | 46.51 | 66.41 | 31.26
3DSG without ROI | 56.15 | 31.25 | 28.31 | 38.57 | 110.69
3DSG | 83.95 | 62.91 | 52.31 | 66.39 | 51.35
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
