Pedestrian Detection with Lidar Point Clouds Based on Single Template Matching

Liu, Kaiqi; Wang, Wenguang; Wang, Jun

doi:10.3390/electronics8070780

Open AccessArticle

Pedestrian Detection with Lidar Point Clouds Based on Single Template Matching

by

Kaiqi Liu

,

Wenguang Wang

^* and

Jun Wang

^*

School of Electronic and Information Engineering, Beihang University, Beijing 100191, China

^*

Authors to whom correspondence should be addressed.

Electronics 2019, 8(7), 780; https://doi.org/10.3390/electronics8070780

Submission received: 7 June 2019 / Revised: 5 July 2019 / Accepted: 8 July 2019 / Published: 11 July 2019

(This article belongs to the Section Electrical and Autonomous Vehicles)

Download

Browse Figures

Versions Notes

Abstract

:

In the field of intelligent transportation systems, pedestrian detection has become a problem that is urgently in need of a solution. Effective pedestrian detection reduces accidents and protects pedestrians from injuries. A pedestrian-detection algorithm, namely, single template matching with kernel density estimation clustering (STM-KDE), is proposed in this paper. First, the KDE-based clustering method is utilized to extract candidate pedestrians in point clouds. Next, the coordinates of the point clouds are transformed into the pedestrians’ local coordinate system and projection images are generated. Locally adaptive regression kernel features are extracted from the projection image and matched with the template features by using cosine similarity, based on which pedestrians are distinguished from other columnar objects. Finally, comparative experiments using KITTI datasets are conducted to verify pedestrian-detection performance. Compared with the STM with radially bounded nearest neighbor (STM-RBNN) algorithm and the KDE-based pedestrian-detection algorithm, the proposed algorithm can segment gathering pedestrians and distinguish them from other columnar objects in real scenarios.

Keywords:

Lidar; pedestrian detection; point clouds; template matching

1. Introduction

With the development of intelligent transportation systems, environmental perception is becoming increasingly significant in the fields of advanced driver assistance systems (ADAS) and autonomous vehicles [1]. Considering that there is a large number of traffic accidents every year, pedestrian detection has become an urgent challenge in need of a solution. Pedestrians are vulnerable in traffic. Millions of people in the world are killed or injured by traffic accidents every year (https://www.who.int/violence_injury_prevention/road_safety_status/2015/zh/). To some extent, effective pedestrian detection would reduce traffic accidents to protect pedestrians from vehicle injuries. Meanwhile, detecting the position and movement of a pedestrian has a great effect on the motion planning of autonomous vehicles.

There are dynamic and stationary pedestrians on road scenarios, in which their motion and position are random. Compared with vehicles, pedestrian profile sizes are smaller. In addition, pedestrians in these scenarios often gather together, which makes it difficult to accurately segment them. All these factors make pedestrian detection difficult.

In the field of pedestrian detection, the camera is one of the most commonly used sensors. It has the closest structure to that of human eyes and can directly reflect the world how humans perceive it. In optical images, it is easy to obtain color, texture and contour features of targets. These features, combined with learning methods, can classify objects in road scenarios. Dalal proposed the histogram of oriented gradient (HOG) feature for pedestrian detection [2]. HOG is insensitive to light and shadows, which widely interests researchers [3]. Scale-invariant feature transform (SIFT) and Haar-like features are also commonly used in pedestrian detection [4,5]. For autonomous vehicles, determining the position of pedestrians is significant to designate driving strategies. However, it is difficult to obtain accurate pedestrian-position information with monocular images.

A Lidar sensor can obtain the contour, position and distance information of objects, which makes it an excellent sensor in 3D environment perception [6,7]. Generally, pedestrian-detection methods with Lidar can be divided into three categories: training-based methods, foreground and background segmentation methods and kernel density estimation (KDE)-based methods.

1.1. Training-Based Methods

Features play an important role in training-based methods. A pedestrian classification system with three features, including the total number of points in a cluster, distance to the Lidar and spatial distribution direction of clustered points, was utilized in Reference [8]. The backpropagate autoencoder-artificial neural network was trained for the classification. An AdaBoost classifier, combined with a radius feature, a convexity feature, a local minimum feature and a robust compactness feature were utilized to detect pedestrians in Reference [9]. Fifteen unique features of Lidar point clouds, such as the number of points, the normalized Cartesian dimension and the central moment, were used to detect pedestrians in urban scenarios [10]. The segmentation method in Reference [11] was used for point-cloud segmentation in Reference [12]. Thirty-one features and a radial basis fuction additive kernel support vector machine (SVM) were also used for pedestrian detection. Three-dimensional features from point clouds and two-dimensional features from projection images were extracted to classify pedestrians with a radial basis function kernel SVM [13]. In order to detect pedestrians in a long-range area with sparse point clouds, radial basis function-based interpolation was implemented to robustly extract features and SVMs are applied to learn the classifiers [14].

In general, training-based methods perform well in pedestrian detection with a large number of training data but complex and changeable real scenarios make it difficult to obtain sufficient negative training samples, which limit the performance of these methods.

1.2. Foreground and Background Segmentation Methods

In addition to directly extracting pedestrians from scenarios, there is also a kind of method based on the foreground and background segmentation of point clouds. “Background” refers to fixed ground, buildings, trees and stationary pedestrians and vehicles. “Foreground” refers to moving objects such as cyclists and moving pedestrians and vehicles. Fixed sensors are required in these methods, which are common in monitoring scenarios or when the vehicle is stationary at the roadside or an intersection. A visual monitoring system that passively observes moving objects was proposed in Reference [15]. Over time, the pixel values of the particular pixel in the background obey adaptive Gaussian distribution. With illumination changing, the background pixel values obey the adaptive mixture of Gaussians. After the background model is constructed, pixel values that do not satisfy the distribution of the background pixel values are regarded as moving foreground pixels and foreground targets are obtained by morpholological processing. A similar segmentation method was applied in Lidar monitoring scenarios [16] and a point-cloud foreground detection method based on the dynamic Markov model was proposed to detect moving pedestrians.

Although this kind of method can adaptively detect background and foreground pixels, it has some drawbacks. First, the utilized sensors in this method should be fixed in the scenarios in order to segment foreground and background. If the sensors have movements, the background information is changed, which makes it difficult to collect a large number of data to establish the background model. Second, this method needs to accumulate the pixel values in each pixel, which demands much computation and many computing resources.

1.3. KDE-Based Method

In the field of pedestrian detection with Lidar, single-line Lidar has been applied to perceive the environment earlier. In Reference [17], four single-line Lidars were constructed in the scenarios whose height was set to sixteen centimeters in order to detect the legs of pedestrians with KDE. With the development of Lidar sensors, 2.5D and 3D Lidar have attracted increasing attention. In Reference [18], a fusion method based on KDE was proposed for pedestrian detection. The geometric features of pedestrians and the returned layer numbers are applied to determine their positions.

The KDE-based method can effectively detect upright pedestrians in a scenario but it is difficult for it to accurately distinguish columnar objects with similar sizes to pedestrians, such as mailboxes and guideboards.

The main objective of the paper is to propose a pedestrian detection system using Lidar point clouds with the ability to detect gathering pedestrians and avoid the training process of a large number of data, thereby improving the ability of the intelligent transportation system in environmental perception. In this work, our contributions are demonstrated by the detection system named single template matching with kernel density estimation clustering (STM-KDE). Compared with the existing methods, STM-KDE can well segment the gathering pedestrians with the KDE-based clustering, which ignores the problem that the point clouds are distributed uneven but only focuses on the accumulation of the point clouds at the object’s upright position. Meanwhile, the system does not need a large amount of data for model training and only a single pedestrian template is needed for detecting pedestrians in the scenarios. In the end, the qualitative and quantitative experiments validate the performance of the proposed detection system. In the range of 15 m, by considering the detection results and the false alarms, the

F_{1}

score of STM-KDE exceeds the comparative methods by 4% and 19%, respectively. Meanwhile, the proposed detection method does not improve the computational complexity on the basis of improving the detection efficiency compared with existing method.

The remainder of the paper is organized as follows. In Section 2, the implementation process of the proposed STM-KDE detection system is depicted. The experiments are shown in Section 3, in which two KITTI datasets are employed. Finally, concluding remarks and the contributions are given in Section 4.

2. Materials and Methods

With the influence of environmental interference and measurement noise, it is difficult to distinguish pedestrians from objects with similar shapes. In addition, pedestrians in road scenarios often appear in groups. The segmentation performance of point clouds is poor within a short distance. In order to solve these problems, a pedestrian-detection method based on STM-KDE is proposed. The method uses one template to match objects in scenarios and detect pedestrians. It should be noted that pedestrian detection in this paper includes pedestrians and cyclists, because the state of people while cycling is similar to that of walking and, compared with the human body, the number of point clouds scanned by Lidar on the bicycle is smaller, which can be ignored. The detection flowchart is shown in Figure 1.

As shown in Figure 1, there are three main processes in the proposed detection algorithm.

Point-cloud preprocessing. This process includes ground segmentation and grid filtering, which is mainly utilized to reduce the number of point clouds and improve the efficiency of the algorithm.
Pedestrian clustering based on KDE, which is employed to extract candidate pedestrians from point clouds with size limitation.
Template matching. Three-dimensional point clouds are projected onto the 2D plane, from which contour features are extracted. Cosine Similarity between the features of the template and the projection image is calculated for pedestrian detection.

2.1. Preprocessing

In the field of autonomous vehicles, Velodyne HDL-64E [19,20] is a commonly used Lidar with 64 laser emitters and 64 receivers, which is equivalent to using 64 laser lines to scan the environment. Every second, Velodyne HDL-64E receives millions of point clouds, half of which are returned from the ground. Ground point clouds not only increase computation but also interfere with target detection. The purpose of point-cloud preprocessing is to reduce the point clouds to be processed and improve the efficiency of the subsequent detection. In this paper, preprocessing mainly consists of two processes: ground segmentation and grid filtering. In general, ground modeling is utilized to segment point clouds into ground points and nonground points [19,21,22,23]. Grid filtering [20] filters outliers and tall structural objects that are obviously nonhuman beings. Grid filtering is implemented according to the height information of point clouds that fall into the same grid. The detailed process is as follows.

First, nonground point clouds are projected onto a grid map that is evenly divided along the x and y directions. The points in grid cell

(i, j)

are expressed as

P_{i j} = \{p_{1}^{i j}, \dots, p_{λ}^{i j}\}

, where

λ

is the number of point clouds in grid cell

(i, j)

. The maximum height, minimum height and mean height of the points in grid

(i, j)

are denoted as

z_{i j}^{m a x}

,

z_{i j}^{m i n}

and

z_{i j}^{m e a n}

, respectively. Next, the point clouds are filtered by the following criteria.

Generally, points that are returned by a human are dense. If the point number contained in a grid cell is smaller than a preset value, these points may come from the measurement noise, mis-segmented ground points and other interference objects. Therefore, these points are filtered out.
If the height difference of the point clouds in a grid cell that can be calculated with $z_{i j}^{m a x}$ and $z_{i j}^{m i n}$ is small and the average height $z_{i j}^{m e a n}$ of the point clouds is small, these points may come from roads or low obstacles and are filtered out.
If the height difference of the point clouds in a grid cell is large or the maximum height $z_{i j}^{m a x}$ is big, these points are likely to come from tall buildings and are therefore filtered out.

2.2. Clustering Based on KDE

Before detecting, it is necessary to determine the detection region of interest. Sliding windows in point clouds is a useful way to extract objects in training-based pedestrian-detection methods [24]. However, searching with sliding windows in whole point clouds results in a large amount of computation. The distance relationship between point clouds can also be utilized to perform 3D clustering [25,26]. The purpose of clustering is to segment point clouds based on their characteristics. For example, in Reference [25], a radially bounded nearest neighbor (RBNN) algorithm is proposed to cluster the point clouds. RBNN is easy to implement and segments point clouds based on their relative spatial position. However, RBNN uses a fixed spherical radius to search for neighbor points. For adjacent objects, such as multiple pedestrians gathering together, its segmentation performance is poor. Undersegmentation occurs and makes multiple pedestrians group into one segmentation. To solve the above problems, a clustering method based on KDE is proposed to extract candidate pedestrians.

2.2.1. Hierarchical Segmentation

Hierarchical segmentation uses scanning breakpoints in each layer to segment the point clouds. The distance between adjacent points

p_{i - 1}

and

p_{i}

is compared with the threshold in Equation (1).

η_{s e g} = ε \times r \times sin (α)

(1)

where

ε

is a constant, r denotes the distance between the point and Lidar and

α

expresses the horizontal scanning angular resolution of the Lidar. If the distance between

p_{i - 1}

and

p_{i}

is smaller than

η_{s e g}

, points

p_{i - 1}

and

p_{i}

belong to the same segment. On the contrary, they are used as the end point and start point of two segments. Subsequently, the above process is implemented on adjacent points

p_{i}

and

p_{i + 1}

. After comparing the distances in the scanning layer in order, the hierarchical-segmentation results of all scanning layers

C

are obtained.

Next, geometric conditions are utilized to extract the candidate pedestrian segment. As shown in Equation (2),

l_{C_{i}}

and

w_{C_{i}}

represent the length and width of the segment, respectively.

η_{p e d}

expresses the pedestrian-size that is applied to extract the candidate pedestrian segmentation in single layer.

C_{p} = {C_{i} | l_{C_{i}} \leq η_{p e d}, w_{C_{i}} \leq η_{p e d}, C_{i} \in C}

(2)

The clustering process is shown in Algorithm 1.

Algorithm 1 Hierarchical segmentation.

Input:

P = {(p_{1}, p_{2}, \dots, p_{κ})}^{T}

,

ε

,

α

,

η_{p e d}

;

1:: $C = Ø$ , $C_{p} = Ø$ , $i d x_{s t a r t} = 1$ ;
2:: for $i = 2, \dots, κ$ do
3:: $d i s_{i} = ∥p_{i} - p_{i - 1}∥$ ;
4:: $η_{s e g}^{i} = ε \cdot ∥p_{i}∥ \cdot sin (α)$ ;
5:: if $d i s_{i} > η_{s e g}^{i}$ then
6:: $C_{s} = {p_{x} | x = i d x_{s t a r t}, \dots, i - 1}$ ;
7:: $C = C \cup C_{s}$ ;
8:: $i d x_{s t a r t} = i$ ;
9:: end if
10:: end for
11:: $C_{s} = {p_{x} | x = i d x_{s t a r t}, \dots, κ}$ ;
12:: $C = C \cup C_{s}$ ;
13:: for $C_{i} \in C$ do
14:: if $l_{C_{i}} < η_{p e d}$ and $w_{C_{i}} < η_{p e d}$ then
15:: $C_{p} = C_{p} \cup C_{i}$ ;
16:: end if
17:: end for

Output: A set of candidate pedestrian clusters

C_{p}

in each scanning layer

2.2.2. Multilayer Fusion

Generally, pedestrians in road scenarios remain upright. The body of the pedestrian returns multiple scanning layers. The accumulation of scanning points at the position of pedestrians is larger than at the position of noise and other environmental objects. Therefore, KDE is utilized to calculate accumulation probability. In each scanning layer, Gaussian distribution is established in geometric center

c_{p} = (x_{p}, y_{p})

of candidate pedestrian cluster

C_{p}

. Clusters in multiple scanning layers are fused to determine whether

C_{p}

is a disturbance in a single scanning layer or a columnar object that exists at a similar position in each scanning layer. The probability function is

p (c) = \frac{1}{N_{s}} \sum_{p = 1}^{n} φ (c - c_{p}, w)

(3)

N_{s} = \sum_{p = 1}^{l_{N}} N_{p}

(4)

N_{p} = \{\begin{matrix} 1, & i f 0 < h_{l i d a r} + r \times tan (ϕ_{p}) < h_{p e d} \\ 0, & e l s e \end{matrix}

(5)

where

c = (x, y)

denotes the coordinates of the points in the

x - y

plane, n denotes the number of clusters in

C_{p}

, w is the window length that depends on pedestrian size and

N_{s}

can be regarded as the theoretical number of scanning layers returned by a pedestrian.

l_{N}

is the total number of scanning layers.

Φ = \{ϕ_{1}, \dots, ϕ_{N}\}

represents the pitch angle corresponding to each scanning layer.

h_{p e d}

denotes the average pedestrian height. In Equation (3),

φ (\cdot)

denotes the probability density function of Gaussian distribution.

φ (c, w) = \frac{1}{{(2 π)}^{d / 2} w^{d} {|Σ|}^{1 1 / 2 2}} exp (- \frac{c^{T} Σ^{- 1} c}{2 w^{2}})

(6)

where

Σ

is a covariance matrix and the identity matrix is utilized in this paper [17]. It is considered that the coefficient in Equation (6) is a constant; the equation can be simplified to the following form.

p (c) = \frac{1}{N_{s}} \sum_{i = 1}^{n} exp (- \frac{{(c - c_{i})}^{T} (c - c_{i})}{2 w^{2}})

(7)

After fusing multiple layers, a local extreme value appears at the position of upright pedestrians. The mean shift algorithm in Reference [27] was utilized to extract the position of the local maxima.

It is assumed that the obtained positions of the m local maximum values are expressed by

P_{p e a k} = \{p_{1}, \dots, p_{m}\}

,

p_{i} = (f_{p i}, x_{p i}, y_{p i})

, where

f_{p i}

is the probability of the ith extremum, whose position is represented by

(x_{p i}, y_{p i})

. The positions of candidate pedestrians are selected from the positions of probability values that are greater than a preset threshold. The positions of the selected

γ

maximum extremums are denoted by

P_{p} = \{p_{1}, \dots, {peak}_{γ}\}

. With the local maximum position as the center, the point clouds of candidate pedestrian

c_{p e d}

are extracted according to the probable pedestrian size. All candidate pedestrian clusters are expressed as

C_{p e d} = \{c_{p e d}^{1}, \dots, c_{p e d}^{γ}\}

.

2.3. Template Matching

2.3.1. Projection-Image Generation

For each candidate pedestrian cluster

c_{p e d}^{i}

in

C_{p e d}

, principal component analysis (PCA) [28] was implemented to determine the main plane and local coordinate system. The plane determined by eigenvectors corresponding to the two large eigenvalues is defined as the main plane of the pedestrian point clouds. In these two eigenvectors, the eigenvector that has the smallest angle with vector

(0, 0, 1)

is defined as the

z^{'}

axis and another is defined as the

y^{'}

axis. The eigenvector corresponding to the smallest eigenvector is the

x^{'}

axis. As shown in Figure 2,

O x y z

is the Lidar coordinate system.

O^{'} x^{'} y^{'} z^{'}

is the local coordinate system in which the pedestrian is located.

O^{'} y^{'} z^{'}

denotes the main plane of the pedestrian.

O x^{'} y^{'} z^{'}

is the coordinate system that has the same origin as

O x y z

and each coordinate axis of

O x^{'} y^{'} z^{'}

is parallel to coordinate system

O^{'} x^{'} y^{'} z^{'}

.

(x, y, z)

,

(x^{'}, y^{'}, z^{'})

and

(x^{″}, y^{″}, z^{″})

represent the coordinates in

O x y z

,

O^{'} x^{'} y^{'} z^{'}

and

O x^{'} y^{'} z^{'}

, respectively. The coordinate transformation from Lidar coordinate system

O x y z

to local coordinate system

O^{'} x^{'} y^{'} z^{'}

is as follows [14].

(\begin{matrix} x^{″} \\ y^{″} \\ z^{″} \end{matrix}) = (\begin{matrix} cos (ψ_{o x^{'}, o x}) & cos (ψ_{o x^{'}, o y}) & cos (ψ_{o x^{'}, o z}) \\ cos (ψ_{o y^{'}, o x}) & cos (ψ_{o y^{'}, o y}) & cos (ψ_{o y^{'}, o z}) \\ cos (ψ_{o z^{'}, o x}) & cos (ψ_{o z^{'}, o y}) & cos (ψ_{o z^{'}, o z}) \end{matrix}) (\begin{matrix} x \\ y \\ z \end{matrix})

(8)

(\begin{matrix} x^{'} \\ y^{'} \\ z^{'} \end{matrix}) = (\begin{matrix} x^{″} - O_{x}^{'} \\ y^{″} - O_{y}^{'} \\ z^{″} - O_{z}^{'} \end{matrix})

(9)

where

ψ

denotes the angle between the two axes; and

O_{x}^{'}

,

O_{y}^{'}

and

O_{z}^{'}

represent the x, y and z coordinates of point

O^{'}

, which is determined by the geometric center of the candidate pedestrian cluster.

After transforming the coordinates of clusters in the Lidar coordinate system to the local coordinate system, the point clouds are projected onto the main plane. As shown in Figure 2,

O^{'} y^{'} z^{'}

is the main plane, in which the grid is divided with a fixed length in the direction of the

y^{'}

and

z^{'}

axes. In each grid, the minimum distance of the points to the main plane is regarded as the pixel value. If there are no points in the grid, the pixel value is set to 0. Figure 3c is the projection result of point clouds in Figure 3b.

Since the Lidar discretely collects 3D points, the projection image obtained in Figure 3c is a discrete range image. The discrete image is filled into a complete contour image by using morphological processing, such as morphological expansion and hole filling. Subsequent feature extraction and template matching are implemented in the projection image like Figure 3d.

2.3.2. Feature Extraction

As shown in Figure 3d, the projection image of pedestrian point clouds shows a clear pedestrian outline that can clearly distinguish the shape of the pedestrian’s head, waist, arms and legs by visual observation. The features from the contour of objects can be extracted to build the template for matching. The locally adaptive regression kernel (LARK) feature can estimate the contour of an object by calculating the similarity of the current pixel with its surrounding pixels. It has achieved success in the field of face detection [29,30]. The LARK feature of pixel

x

is calculated as follows.

K (x_{l} - x, Σ_{l}) = \frac{\sqrt{det (Σ_{l})}}{2 π h^{2}} exp (- \frac{{(x_{l} - x)}^{T} Σ_{l} (x_{l} - x)}{2 h^{2}}), l = 1, \dots, P^{2}

(10)

The local window is established with

x

as its center and P as the length of the window. There are

P^{2}

pixels in total in the local window. The pixel coordinates in the local window are denoted as

x_{l} = {(x_{1}, x_{2})}^{T}

, h denotes the smoothing parameter,

Σ_{l}

represents the covariance matrix calculated by the spatial gradient vectors in a local window of

x

. The calculation of

Σ_{l}

is as follows [31].

Σ_{l} = (\begin{matrix} \sum_{x_{j} \in w_{l}} g_{x_{1}} (x_{j}) g_{x_{1}} (x_{j}) & \sum_{x_{j} \in w_{l}} g_{x_{1}} (x_{j}) g_{x_{2}} (x_{j}) \\ \sum_{x_{j} \in w_{l}} g_{x_{2}} (x_{j}) g_{x_{1}} (x_{j}) & \sum_{x_{j} \in w_{l}} g_{x_{2}} (x_{j}) g_{x_{2}} (x_{j}) \end{matrix})

(11)

where

w_{l}

represents the local window of

x_{l}

.

g_{x_{1}} (\cdot)

and

g_{x_{2}} (\cdot)

are the first derivatives in the x and y directions.

For each pixel in the projection image, Equations (10) and (11) are utilized to extract LARK features

K_{i} \in R^{P \times P}

. Subsequently, the LARK feature is normalized as a column vector

l_{i} \in R^{P^{2} \times 1}

. The LARK vectors of all pixels constitute the LARK matrix of image

L = [l_{1}, \dots, l_{i}, \dots, l_{M}] \in R^{P^{2} \times M},

(12)

where M is the pixel number of the projection image.

2.3.3. Cosine Similarity

The angle between the two vectors can be used to describe the difference between them. After LARK feature extraction, cosine similarity is utilized to characterize the matching degree between the template feature matrix and the feature matrix of the candidate pedestrian. Cosine similarity is widely used in many fields [32,33]. The cosine similarity of LARK matrices is defined as follows

ρ (L_{t e s t}, L_{t e m p}) = \sum_{i = 1}^{M} \frac{{(l_{t e s t}^{i})}^{T} l_{t e m p}^{i}}{{∥L_{t e s t}∥}_{F} {∥L_{t e m p}∥}_{F}},

(13)

where

L_{t e s t}

is the LARK feature matrix of the candidate pedestrian and

L_{t e m p}

is the LARK feature matrix of the template. M is the column number of the feature matrix, which is the number of pixels in Equation (12).

l_{t e s t}^{i}

and

l_{t e m p}^{i}

are the ith column vector of the corresponding kernel matrix, respectively.

{∥\cdot∥}_{F}

denotes the Frobenius norm.

2.4. Pedestrian Detection

A pedestrian-detection algorithm based on STM-KDE is proposed in this paper. The algorithm used only one template to detect pedestrians in the whole frame of point clouds without a large number of training data. The algorithm process is shown in Algorithm 2.

Algorithm 2 Pedestrian-detection method based on single template.

Input:

P_{w} = {(p_{1}, p_{2}, \dots, p_{n})}^{T}

,

η_{c}

,

η_{d e t}

,

L_{t e m p}

;

1:: $P_{a} = GroundFilter (P_{w})$ ;
2:: $P_{e} = GridFilter (P_{a})$ ;
3:: $C_{p e d} = CandidatePedestrianExtract (P_{e}, η_{c})$ ;
4:: for $c_{p e d}^{i} \in C_{p e d}$ do
5:: $I_{i} = RangeImage (c_{p e d}^{i})$ ;
6:: $L_{t e s t}^{i} = LarkExtract (I_{i})$ ;
7:: $ρ_{i} = CosineSimilarity (L_{t e m p}, L_{t e s t}^{i})$ ;
8:: if $ρ_{i} > η_{d e t}$ then
9:: $c_{p e d}^{i} \in p e d$ ;
10:: else
11:: $c_{p e d}^{i} \notin p e d$ ;
12:: end if
13:: end for

Output: Pedestrians point clouds.

The inputs of the algorithm include: original point clouds

P_{w}

, the LARK feature matrix of template

L_{t e m p}

, threshold of clustering

η_{c}

and the threshold of template matching

η_{d e t}

. The output is the detected pedestrian point clouds. Before detection, a pedestrian cluster with a complete contour is manually extracted from the point clouds to establish the template, which is projected onto the main plane to construct the template projection image. The template matrix of LARK feature

L_{t e m p}

is extracted from the projection image. Then, ground segmentation and grid filtering in Lines 1 and 2 are implemented to filter out ground points and tall buildings. Next, the clustering method based on KDE is utilized to obtain candidate pedestrian

c_{p e d}^{i}

, which is shown in Line 3. After coordinate transforming and main-plane projection, the LARK feature matrix is extracted from the projection image with morphological processing. Cosine similarity is calculated to detect pedestrians with preset threshold

η_{d e t}

.

2.5. Comparing with Existing Methods

There are three important distinctions between the proposed STM-KDE algorithm and existing pedestrian-detection methods. First, while the KDE-based method in Reference [18] uses the KDE process twice to fuse the information for pedestrian detection, the proposed algorithm only uses KDE once to fuse the central information of the segmentation results in multiple scanning layers for pedestrian clustering, which reduces the amount of computation in this process.

Second, the single template-matching method was first utilized for face detection in optical images [29]. This paper applies this detection idea to pedestrian detection with point clouds and proposes the novel STM-KDE detection method that avoids the need for a large number of training data.

Third, compared with existing methods, the proposed algorithm solves the problem of neighboring pedestrian detection with the combination of KDE clustering and LARK contour feature matching, which avoids the problems caused by sliding-window processing or undersegmentation.

2.6. Computation Complexity

The complexity of the proposed STM-KDE detection method is analyzed in this section. The detection method mainly includes two stages: the candidate pedestrian extraction and pedestrian detection with single template.

For candidate pedestrian detection, it is assumed that there are totally n point clouds and m hierarchical segmentations with k point clouds in the scenario. The complexity of Hierarchical segmentation is

O (n)

. Since only center position is utilized to extract the candidate pedestrian, the complexity of KDE-based segmentation is

O (m), m < k < n

.

For pedestrian detection with single template, since the LARK feature is extracted for each pixel, therefore, the complexity of single template matching process is

r \times O (m)

as r denotes the pixel number of the projection image. By integrating the computation of the two processes, the total complexity of the STM-KDE is

O (n) + O (m) + r \times O (m) = O (n)

.

While in Reference [18], there are two-stage of KDE process and the complexity of the first-stage KDE process is the sum of

O (n)

and

O (k)

. The complexity of the second-stage KDE is

O (c)

with c is the number of selected cluster center in each scanning layer,

c < n

. Therefore, the total complexity of the KDE-based detection is

O (n) + O (k) + O (c) = O (n)

.

In conclusion, the total complexities of the two methods have the same order of magnitude. The proposed detection method does not improve the computational complexity comparing with existing method.

3. Results

KITTI datasets [34] were used to verify the detection efficiency of the proposed STM-KDE algorithm. The 0047 and 0016 datasets were used in the experiments. In the 0047 dataset, the vehicle was driving in the campus and the Lidar was on a mobile platform. The optical image of the scenario is shown in Figure 4a. The sensor platform in the 0016 dataset was static. The sensor was fixed at the road junction in the campus, where there was a large number of pedestrians. The scenario is shown in Figure 4b. Since these two datasets did not provide pedestrian ground truth, we manually labeled the position of each pedestrian. The 0047 dataset contained 31 frames of point clouds and a total of 118 pedestrians. The 0016 dataset contained 186 frames of point clouds and a total of 2174 pedestrians in the scenario.

Three experiments were implemented in the paper. The size of grid filtering

l_{g r i d}

was set to

0.2

m. Since the segmentation in Hierarchical segmentation is a rough selection process for candidate pedestrians,

ε

was set to 20. Thresholds in grid filtering

η_{n}

and pedestrian clustering

η_{p e d}

,

η_{c}

were set to 4 and

0.8

m and

0.4

m, respectively, considering the distribution of the point clouds and the size of the pedestrians.

3.1. Clustering

In this section, some areas in the 0016 dataset were utilized to verify the efficiency of the proposed clustering based on KDE. The areas were close to the Lidar. The point clouds were relatively dense and many pedestrians gathered in these areas. In the original point-cloud coordinate system, the x and y range of the selected areas were at

x \in [0, 20] (m)

and

y \in [- 7, 13] (m)

, respectively. Since there was no ground truth provided, the clustering results are qualitatively described.

Figure 5 shows the clustering results selected every 10 frames. The first column represents the original point clouds. The second column shows the RBNN clustering results, in which the radius of the sphere was set to

0.5

m. The third column is the results of the proposed KDE clustering. Only the extracted candidate-pedestrian clusters are shown in the results. In Figure 5, different types of point-cloud clusters are represented by different colors.

As shown in Figure 5, the clustering results of the proposed KDE method are more accurate for the clustering of gathering pedestrians. In the black box 1 in Figure 5b,c, there are four pedestrians in total. With the RBNN algorithm, there are three pedestrians among them are undersegmented (in orange points). While the KDE-based clustering method could segment the four gathering pedestrians well. The similar results are shown in box 2. With KDE-based clustering, the two pedestrians are segmented well. However, it is worth noting that if the pedestrians stood within a close distance and there was occlusion in the scenario, neither algorithm could accurately segment the pedestrians, as shown in the black box of Figure 5e,f. Therefore, it can be seen that pedestrian size is small and at the same time occlusion often happens in the environment, which causes many difficulties for subsequent pedestrian detection.

3.2. Template Matching

The density of Lidar point clouds decreases with the increase of distance due to Lidar’s scanning characteristics. In real scenarios, pedestrians may encounter occlusion, which leads to the reduction of returned point clouds. Figure 6 shows the projection image of pedestrians with different numbers of point clouds and their cosine similarities with the template. The pedestrians are extracted in different positions based on the ground truth. Some pedestrians that are occluded by obstacles are split into multiple small areas, like Figure 6b,d,e. As shown in Figure 6, with the number of point clouds increasing, the contour of pedestrians is gradually completed and similarity is also higher. When a pedestrian is occluded, as shown in Figure 6b,d,e, occlusion causes the contour to be missing and reduces similarity between pedestrian and template.

Figure 7 shows the statistical relationship between the number of point clouds and cosine similarity. Each red point in Figure 7a represents a test sample. Figure 7b shows a curve of the similarity variation trend with the number of point clouds. The x-coordinate of the curve is quantization value

X

of number of point clouds x using Equation (14), in which s is the quantization unit. For pedestrians that are quantized to the same value, similarity is calculated by their mean value. It is assumed that the samples are denoted by

(X, Y)

, in which

X = x_{1}, \dots, x_{n}

represents the set of point clouds numbers,

Y = y_{1}, \dots, y_{n}

represents the set of cosine similarities between the samples and the template. Then X is quantified to

X

with Equation (14) and the cosine similarities of the obtained same quantified

X

are averaged to

\bar{Y}

. Figure 7b shows the relationship of

(X, \bar{Y})

. As can be seen from Figure 7, similarity gradually increases with the increase of number of point clouds.

X = ⌈\frac{x}{s}⌉ \times s

(14)

Figure 8 shows similarity variation with pedestrian distance. Figure 8a shows the similarity and pedestrian distance of each test sample. Figure 8b shows the curve of similarity and quantized distance, which is obtained by using the same quantified process as Figure 7b. Similarity gradually decreases with the increase of distance. In summary, it can be seen that a template-based detection method can obtain good matching results at close distance with dense point clouds.

3.3. Pedestrian Detection

The pedestrian-detection method in Reference [18] and single template matching with RBNN (STM-RBNN) were utilized to compare detection efficiency. A receiver operating characteristic (ROC) curve was adopted to evaluate pedestrian-detection performance with detection rate and false-alarm number. The calculation of the detection rate is in Equation (15).

R_{D} = \frac{N_{P}}{N_{P A}},

(15)

where

N_{P}

denotes the number of detected pedestrians and

N_{P A}

represents the total number of pedestrians in the scenarios.

Figure 9 shows the ROC curve of pedestrian detection in the range of 50 m. The black solid line shows the performance of the STM-KDE algorithm. The blue dash-and-dotted line shows the performance of the STM-RBNN algorithm. The red dotted line is the results of Gidel’s method based on KDE. It can be seen that, under the same false-alarm number, the detection rate of the two STM-based methods achieves better performance than the method based on KDE. The KDE-based pedestrian-detection method models upright pedestrians as cylinders and regards objects with high probability as candidate pedestrians. However, there are many objects that have a similar size to pedestrians in the scenario, such as street signs and mailboxes. Figure 10 shows a detection example of objects that may cause false alarms. Figure 10a is the optical image of the scenario. The object in the blue circle is a street sign that has similar height to a pedestrian. Detection results are shown in Figure 10b,c. The STM-based method can distinguish the object, while the KDE method detects it as a pedestrian. In addition, a tree trunk in the bush was wrongly detected as a pedestrian with the KDE-based method. Therefore, the KDE-based pedestrian-detection method is more suitable in simple scenarios with fewer columnar targets.

In Figure 9, it can also be concluded that the detection rate of the STM-RBNN algorithm is higher than that of STM-KDE in the case of low false alarms. The reason is that RBNN clustering results for most of the objects in the environment were good, though undersegmentation may occur when objects are close to each other. In KDE clustering, some point clouds of walls and street signs in special angles may be accumulated and extracted as candidate pedestrians, which produces false alarms. However, with threshold decrease, the STM-KDE algorithm achieves a higher detection rate. The reason that limits the detection rate with RBNN clustering is that there were many gathering pedestrians in the scenario, which made them undersegmented with RBNN clustering and filtered them out.

According to the ROC curve, appropriate thresholds were selected to show the statistics of pedestrian-detection results in different distances. Equations (16) and (17) were used to evaluate detection performance, in which the precision P indicates the ratio of the number of correct detections to the total number of detections and the recall R indicates the ratio of the number of correct detections to the total number of targets.

P = \frac{T P}{T P + F P}, R = \frac{T P}{T P + F N},

(16)

F_{1} = \frac{2 P R}{P + R},

(17)

where

T P

, number of true detections;

F P

, number of false alarms,

F N

, number of missing detections.

F_{1}

score is the harmonic average of precision P and recall R and it is a comprehensive measure of detection results. Generally, the higher the

F_{1}

score is, the better the detection performance of the algorithm is. The probability threshold in the KDE-based detection method was set to

η_{d e t}^{K D E} = 0.6

. The cosine similarity thresholds in STM with RBNN and KDE clustering were set to

η_{d e t}^{S T} = 0.6

. Detection results in the range of 15 m, 25 m and 50 m are listed in Table 1, Table 2 and Table 3, respectively.

As can be seen in Table 1, Table 2 and Table 3, as the distance of the Lidar and the objects increases, the

F_{1}

score of the three detection algorithms decreases to different degrees, which shows that pedestrian-detection efficiency has a decreasing trend. As distance increases, the number of point clouds returned by the pedestrian decreases and the contour of pedestrian gradually blurs, which causes a lot of miss alarms. At the same time, due to sparse point clouds in distant areas, the contour difference between pedestrians and other columnar objects may be reduced, resulting in many false alarms.

In addition, as can be seen from the results, in the range of 25–50 m, the detection performance of the KDE-based method obviously increases. The number of detections increases from 980 to 1059, while the detection number of STM-KDE only increased from 1042 to 1067. From Figure 7 and Figure 8, it can be seen that cosine similarity decreases as the number of point clouds decreases, which makes detection more difficult. In the 50 m range, the number of false alarms of the two STM-based algorithms was 217 and 339, respectively, while the false-alarm number of the KDE-based number was increased to 1165. Therefore, compared with the KDE-based algorithm, the STM-based algorithm has a significant suppression effect on false alarms.

Since the moving speed of pedestrians is slow, and the safety of pedestrians who are closer to vehicles is more threatened by vehicles, pedestrian detection in close areas is more significant for environmental perception. Figure 11 shows the ROC curve in the range of 15 m. The KDE-based method performs well in the case of low false alarms. However, it cannot distinguish between upright pedestrians and columnar objects. The low detection threshold leads to a high detection rate but at the same time brings many false alarms, which limits the improvement of the detection rate. It can be seen from Figure 5 that RBNN clustering cannot accurately segment gathering pedestrians, so improvement of the detection rate is also limited. The STM-KDE algorithm could solve the problem of point-cloud segmentation and it has good detection performance for pedestrians in close distances.

3.4. Operation Time

All experiments in this work are implemented on a MacBook Pro with a 2.9 GHz Inter Core i5 CPU and 8 GB main memory. The MATLAB R2017b is employed to process the pedestrian detection with KITTI datasets. The operation time in 0016 dataset is shown in Table 4. Each process is calculated seprately. There are totally 31 frames of point clouds in 0016 dataset. The algorithm will detect a total of 583 candidate pedestrians in the scenarios, from which the real pedestrians are finally determined.

As shown in Table 4, in the consumed 28.4 s, there are 248 ms to read the point clouds and 242 ms to mark the layer index, which is caused due to processing the data in MATLAB platform. After removing the consuming time of data reading and scan layer index marking, it takes an average of 48 ms to detect one cluster. With the advanced hardware platform and parallel computing, the detection method can get real-time performance.

4. Conclusions

In this paper, a pedestrian-detection algorithm based on STM-KDE was proposed. The experiment results show that the proposed algorithm can cluster gathering pedestrians well and effectively distinguish pedestrians from other columnar objects. The main contributions of the paper are listed as follows.

First, considering the size of pedestrians and gathering characteristics, the candidate-pedestrian clustering method based on hierarchical segmentation and multilayer fusion can solve the segmentation problem of gathering pedestrians. The KDE-based method can distinguish between columnar objects and other structural objects in the scenario and reduce the searching region of pedestrian detection.

Second, the proposed single template-matching pedestrian-detection method can distinguish upright pedestrians from other columnar objects in the scenario. The contour feature matrix is established by projecting the point clouds on the main plane and constructing a projection image. The proposed method applies the idea of template matching to pedestrian detection with point clouds, which avoids model training with a large number of data.

Third, the proposed detection method does not improve the computation complexity on the basis of improving the detection efficiency compared with existing detection method. With the advanced hardware platform and parallel computing, the proposed detection will perform well in operation time.

The proposed algorithm still has the problem of a decrease in detection rate when point clouds are sparse or occluded. In the future, point-cloud density enhancement and sensor fusion could be applied to obtain more object information for better detection.

Author Contributions

K.L. presented the algorithm and wrote the paper; K.L. and W.W. analyzed the data. W.W. and J.W. reviewed and revised the paper.

Funding

This research was funded by the National Natural Science Foundation of China under grant No. 61771028.

Acknowledgments

The authors would like to thank the Jianqiang Wang of Tsinghua University for his constructive suggestions on our manuscript. The authors are also grateful to the reviewers and editor for their valuable comments and remarks to improve the quality of the contributions.

Conflicts of Interest

The authors declare no conflict of interest.

References

Ren, R.; Fu, H.; Wu, M. Large-Scale Outdoor SLAM Based on 2D Lidar. Electronics 2019, 8, 613. [Google Scholar] [CrossRef]
Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
Ouyang, W.; Zeng, X.; Wang, X. Single-Pedestrian Detection Aided by Two-Pedestrian Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1875–1889. [Google Scholar] [CrossRef] [PubMed]
Hurney, P.; Waldron, P.; Morgan, F.; Jones, E.; Glavin, M. Review of Pedestrian Detection Techniques in Automotive Far-Infrared Video. IET Intell. Transp. Syst. 2015, 9, 824–832. [Google Scholar] [CrossRef]
Inan, T.; Halici, U. 3-D Face Recognition with Local Shape Descriptors. IEEE Trans. Inf. Forensics Secur. 2012, 7, 577–587. [Google Scholar] [CrossRef]
Wang, R.; Xu, Y.; Sotelo, M.A.; Ma, Y.; Sarkodie-Gyan, T.; Li, Z.; Li, W. A Robust Registration Method for Autonomous Driving Pose Estimation in Urban Dynamic Environment Using LiDAR. Electronics 2019, 8, 43. [Google Scholar] [CrossRef]
Goodin, C.; Carruth, D.; Doude, M.; Hudson, C. Predicting the Influence of Rain on LIDAR in ADAS. Electronics 2019, 8, 89. [Google Scholar] [CrossRef]
Zhao, J.; Xu, H.; Wu, J.; Zheng, Y.; Liu, H. Trajectory Tracking and Prediction of Pedestrian’s Crossing Intention Using Roadside LiDAR. IET Intell. Transp. Syst. 2019, 13, 789–795. [Google Scholar] [CrossRef]
Arras, K.O.; Mozos, O.M.; Burgard, W. Using Boosted Features for the Detection of People in 2D Range Data. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007; pp. 3402–3407. [Google Scholar]
Premebida, C.; Ludwig, O.; Nunes, U. Exploiting LIDAR-based Features on Pedestrian Detection in Urban Scenarios. In Proceedings of the 2009 12th International IEEE Conference on Intelligent Transportation Systems, St. Louis, MO, USA, 4–7 October 2009; pp. 1–6. [Google Scholar]
Kim, B.; Choi, B.; Yoo, M.; Kim, H.; Kim, E. Robust Object Segmentation Using A Multi-layer Laser Scanner. Sensors 2014, 14, 20400–20418. [Google Scholar] [CrossRef]
Kim, B.; Choi, B.; Park, S.; Kim, H.; Kim, E. Pedestrian/Vehicle Detection Using a 2.5-D Multi-Layer Laser Scanner. IEEE Sens. J. 2016, 16, 400–408. [Google Scholar] [CrossRef]
Tang, H.L.; Chien, S.C.; Cheng, W.H.; Chen, Y.Y.; Hua, K.L. Multi-cue Pedestrian Detection from 3D Point Cloud Data. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 1279–1284. [Google Scholar]
Li, K.; Wang, X.; Xu, Y.; Wang, J. Density Enhancement-Based Long-Range Pedestrian Detection Using 3-D Range Data. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1368–1380. [Google Scholar] [CrossRef]
Stauffer, C.; Grimson, W.E.L. Learning Patterns of Activity Using Real-time Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 747–757. [Google Scholar] [CrossRef]
Benedek, C.; Gálai, B.; Nagy, B.; Jankó, Z. Lidar-Based Gait Analysis and Activity Recognition in a 4D Surveillance System. IEEE Trans. Circuits Syst. Video Technol. 2016, 28, 101–113. [Google Scholar] [CrossRef] [Green Version]
Cui, J.; Zha, H.; Zhao, H.; Shibasaki, R. Laser-Based Detection and Tracking of Multiple People in Crowds. Comput. Vis. Image Underst. 2007, 106, 300–312. [Google Scholar] [CrossRef]
Gidel, S.; Checchin, P.; Blanc, C.; Chateau, T.; Trassoudaine, L. Pedestrian Detection and Tracking in an Urban Environment Using a Multilayer Laser Scanner. IEEE Trans. Intell. Transp. Syst. 2010, 11, 579–588. [Google Scholar] [CrossRef]
Liu, K.; Wang, W.; Tharmarasa, R.; Wang, J. Dynamic Vehicle Detection With Sparse Point Clouds Based on PE-CPD. IEEE Trans. Intell. Transp. Syst. 2019, 20, 1964–1977. [Google Scholar] [CrossRef]
Börcs, A.; Nagy, B.; Benedek, C. Instant Object Detection in LiDAR Point Clouds. IEEE Geosci. Remote Sens. Lett. 2017, 14, 992–996. [Google Scholar] [CrossRef]
Liu, K.; Wang, W.; Tharmarasa, R.; Wang, J.; Zuo, Y. Ground Surface Filtering of 3D Point Clouds Based on Hybrid Regression Technique. IEEE Access 2019, 7, 23270–23284. [Google Scholar] [CrossRef]
Giorgini, M.; Barbieri, F.; Aleotti, J. Ground Segmentation from Large-Scale Terrestrial Laser Scanner Data of Industrial Environments. IEEE Robot. Autom. Lett. 2017, 2, 1948–1955. [Google Scholar] [CrossRef]
Wei, P.; Cagle, L.; Reza, T.; Ball, J.; Gafford, J. LiDAR and Camera Detection Fusion in a Real-Time Industrial Multi-Sensor Collision Avoidance System. Electronics 2018, 7, 84. [Google Scholar] [CrossRef]
Jun, W.; Wu, T.; Zheng, Z. LIDAR and Vision Based Pedestrian Detection and Tracking System. In Proceedings of the 2015 IEEE International Conference on Progress in Informatics and Computing (PIC), Nanjing, China, 18–20 December 2015; pp. 118–122. [Google Scholar]
Klasing, K.; Wollherr, D.; Buss, M. A Clustering Method for Efficient Segmentation of 3D Laser Data. In Proceedings of the 2008 IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, 19–23 May 2008; pp. 4043–4048. [Google Scholar]
Chavez-Garcia, R.O.; Aycard, O. Multiple sensor fusion and classification for moving object detection and tracking. IEEE Trans. Intell. Transp. Syst. 2015, 17, 525–534. [Google Scholar] [CrossRef]
Cheng, Y. Mean Shift, Mode Seeking, and Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1995, 17, 790–799. [Google Scholar] [CrossRef]
Navarro-Serment, L.E.; Mertz, C.; Hebert, M. Pedestrian Detection and Tracking Using Three-dimensional Ladar Data. Int. J. Robot. Res. 2010, 29, 1516–1528. [Google Scholar] [CrossRef]
Seo, H.J.; Milanfar, P. Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1688–1704. [Google Scholar] [PubMed]
Wang, Y.; Wang, W. Face Detection Using Skin Color and Locally Adaptive Regression Kernels. In Proceedings of the 2017 20th International Conference on Information Fusion (Fusion), Xi’an, China, 10–13 July 2017; pp. 1–8. [Google Scholar]
Takeda, H.; Farsiu, S.; Milanfar, P. Kernel Regression for Image Processing and Reconstruction. IEEE Trans. Image Process. 2007, 16, 349–366. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Wang, D.; Lu, H.; Bo, C. Visual Tracking via Weighted Local Cosine Similarity. IEEE Trans. Cybern. 2015, 45, 1838–1850. [Google Scholar] [CrossRef] [PubMed]
Pirlo, G.; Impedovo, D. Cosine Similarity for Analysis and Verification of Static Signatures. IET Biom. 2013, 2, 151–158. [Google Scholar] [CrossRef]
Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision Meets Robotics: The KITTI Dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]

Figure 1. Flowchart of pedestrian detection.

Figure 2. Pedestrian-detection coordinates.

O x y z

, Lidar coordinate system;

O^{'} x^{'} y^{'} z^{'}

, local coordinate system.

Figure 2. Pedestrian-detection coordinates.

O x y z

, Lidar coordinate system;

O^{'} x^{'} y^{'} z^{'}

, local coordinate system.

Figure 3. Projection-image generation. (a) Pedestrian optical image; (b) corresponding point clouds; (c) projection image generated by projecting point clouds onto main plane; (d) result of morphological processing on (c).

Figure 4. Scenario optical images: (a) 0047 and (b) 0016 dataset.

Figure 5. Clustering results. Images in the first column are the original point clouds. Results in the second column are clustering with radially bounded nearest neighbor (RBNN). The results in the third column are the clustering with kernel density estimation (KDE). (a) Original point clouds (Frame 1); (b) RBNN (Frame 1); (c) KDE (Frame 1); (d) original point clouds (Frame 11); (e) RBNN (Frame 11); (f) KDE (Frame 11); (g) original point clouds (Frame 21); (h) RBNN (Frame 21); (i) KDE (Frame 21); (j) original point clouds (Frame 31); (k) RBNN (Frame 31); (l) KDE (Frame 31).

Figure 6. Projection image with different number of point clouds. Numbers in parentheses indicate cosine similarities. (a) Five points (0.36); (b) 10 points (0.38); (c) 15 points (0.46); (d) 22 points (0.49); (e) 26 points (0.53); (f) 30 points (0.54); (g) 35 points (0.58); (h) 40 points (0.59); (i) 43 points (0.61); (j) 49 points (0.62).

Figure 7. Relationship between number of point clouds and cosine similarity. (a) Relationship between number of point clouds and similarity for each sample. (b) Statistical curve of relationship between quantized number of point clouds and average similarity.

Figure 8. Relationship between pedestrian distance and cosine similarity. (a) Relationship between distance and similarity for each sample. (b) Statistical curve of relationship between quantized distance and average similarity.

Figure 9. Receiver operating characteristic (ROC) curve in the range of 50 m.

Figure 10. Detection results of objects that may cause false alarms. (a) Scenario optical image. (b) Detection results with single template matching. (c) Detection result with the KDE-based method.

Figure 11. ROC curve in the range of 15 m.

Table 1. Pedestrian detection in the range of 15 m.

Algorithm	Datasets	Total Number	Truely Detected	False Alarms	Precision	Recall	$F_{1}$ Score
KDE-based method	0016	852	579	23	96.2%	68.0%	0.80
	0047	30	22	11	66.7%	73.3%	0.70
	Total	882	601	34	94.6%	68.1%	0.79
STM-RBNN	0016	852	427	77	84.7%	50.1%	0.63
	0047	30	29	10	74.4%	96.7%	0.84
	Total	882	456	87	84.0%	51.7%	0.64
STM-KDE	0016	852	629	34	94.9%	73.8%	0.83
	0047	30	28	14	66.7%	93.3%	0.78
	Total	882	657	48	93.2%	74.5%	0.83

Table 2. Pedestrian detection in the range of 25 m.

Algorithm	Datasets	Total Number	Truely Detected	False Alarms	Precision	Recall	$F_{1}$ Score
KDE-based method	0016	1591	890	27	97.1%	55.9%	0.71
	0047	100	90	33	73.2%	90.0%	0.81
	Total	1691	980	60	94.2%	58.0%	0.72
STM-RBNN	0016	1591	711	27	96.3%	44.7%	0.61
	0047	100	97	18	84.3%	97.0%	0.90
	Total	1691	808	45	94.7%	47.8%	0.64
STM-KDE	0016	1591	945	38	96.1%	59.4%	0.73
	0047	100	97	23	80.8%	97.0%	0.88
	Total	1691	1042	61	94.4%	61.6%	0.75

Table 3. Pedestrian detection in the range of 50 m.

Algorithm	Datasets	Total Number	Truely Detected	False Alarms	Precision	Recall	$F_{1}$ Score
KDE-based method	0016	2174	962	1106	46.5%	44.3%	0.45
	0047	118	97	59	62.2%	82.2%	0.71
	Total	2292	1059	1165	47.6%	46.2%	0.47
STM-RBNN	0016	2174	731	196	78.9%	33.6%	0.47
	0047	118	98	21	82.4%	83.1%	0.83
	Total	2292	829	217	79.3%	36.2%	0.50
STM-KDE	0016	2174	965	313	75.5%	44.3%	0.56
	0047	118	102	26	79.7%	86.4%	0.83
	Total	2292	1067	339	75.9%	46.6%	0.58

Table 4. Operation time of the pedestrian detection.

Total Time	Time/ Frame	Read Data	Preprocessing	Scan Layer Mark	Cluster	Template Match	Cluster Number	Detection/ Cluster
28.4 s	916 ms	248 ms	1791 ms	242 ms	11231 ms	11831 ms	583	48 ms

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, K.; Wang, W.; Wang, J. Pedestrian Detection with Lidar Point Clouds Based on Single Template Matching. Electronics 2019, 8, 780. https://doi.org/10.3390/electronics8070780

AMA Style

Liu K, Wang W, Wang J. Pedestrian Detection with Lidar Point Clouds Based on Single Template Matching. Electronics. 2019; 8(7):780. https://doi.org/10.3390/electronics8070780

Chicago/Turabian Style

Liu, Kaiqi, Wenguang Wang, and Jun Wang. 2019. "Pedestrian Detection with Lidar Point Clouds Based on Single Template Matching" Electronics 8, no. 7: 780. https://doi.org/10.3390/electronics8070780

APA Style

Liu, K., Wang, W., & Wang, J. (2019). Pedestrian Detection with Lidar Point Clouds Based on Single Template Matching. Electronics, 8(7), 780. https://doi.org/10.3390/electronics8070780

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Pedestrian Detection with Lidar Point Clouds Based on Single Template Matching

Abstract

1. Introduction

1.1. Training-Based Methods

1.2. Foreground and Background Segmentation Methods

1.3. KDE-Based Method

2. Materials and Methods

2.1. Preprocessing

2.2. Clustering Based on KDE

2.2.1. Hierarchical Segmentation

2.2.2. Multilayer Fusion

2.3. Template Matching

2.3.1. Projection-Image Generation

2.3.2. Feature Extraction

2.3.3. Cosine Similarity

2.4. Pedestrian Detection

2.5. Comparing with Existing Methods

2.6. Computation Complexity

3. Results

3.1. Clustering

3.2. Template Matching

3.3. Pedestrian Detection

3.4. Operation Time

4. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI