Distance-Aware DBSCAN–STM Pipeline with Centralized Point Augmentation for LiDAR-Based Pedestrian Candidate Generation

Yeom, Jihwan; Kim, Jinman; Kook, Joongjin

doi:10.3390/app16136286


by Jihwan Yeom ¹, Jinman Kim ² and Joongjin Kook ¹,*

¹ Department of Information Security Engineering, Sangmyung University, 31 Sangmyungdae-gil, Dongnam-gu, Cheonan-si 31066, Chungcheongnam-do, Republic of Korea
² Department of General Education Center, Sangmyung University, 31 Sangmyungdae-gil, Dongnam-gu, Cheonan-si 31066, Chungcheongnam-do, Republic of Korea
* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(13), 6286; https://doi.org/10.3390/app16136286 (registering DOI)

Submission received: 27 May 2026 / Revised: 19 June 2026 / Accepted: 19 June 2026 / Published: 23 June 2026

(This article belongs to the Section Computing and Artificial Intelligence)


Abstract

This paper presents a non-learning-based, seed-dependent, semi-automatic pedestrian candidate generation pipeline for LiDAR point clouds. The proposed method is designed to support 3D annotation workflows by reducing irrelevant candidate clusters while improving the reliability of pedestrian candidate selection under distance-dependent point sparsity. The pipeline integrates distance-aware DBSCAN clustering, Single Template Matching (STM), and Centralized Point Augmentation (CPA). First, LiDAR points within the camera field of view are preprocessed, and pedestrian candidate clusters are generated using DBSCAN parameters configured according to distance intervals. Ground-snapping-based bounding-box refinement and height-based filtering are then applied to improve geometric consistency and reduce non-pedestrian candidates. In the second stage, STM compares PCA-aligned projected silhouettes of candidate clusters with a seed pedestrian template to suppress false positives. To address silhouette instability caused by sparse mid-range pedestrian points, CPA adds centroid-contracted points in the projection-relevant plane before template matching. Experiments on pedestrian-containing frames from the KITTI dataset show that STM improves precision from 27.6% to 60.5% and increases the F1-score from 36.8% to 51.4% compared with the initial DBSCAN-based candidate generation stage. The final CPA configuration improves recall from 44.7% to 46.7% and the overall F1-score from 51.4% to 52.1%, while revealing a precision–recall trade-off. Supplementary IoU analysis shows that the final DBSCAN–STM–CPA configuration maintains meaningful spatial overlap with pedestrian ground-truth boxes, achieving 88.9% at 3D IoU ≥ 0.10 and 81.6% at BEV IoU ≥ 0.25. Runtime analysis further shows that height-based filtering reduces the average per-frame processing time from 151.5 ms to 125.1 ms, while the final CPA configuration introduces only a small overhead, resulting in 126.2 ms per frame. 
These results demonstrate that the proposed DBSCAN–STM–CPA pipeline can provide reliable pedestrian candidates for semi-automatic 3D labeling without requiring class-specific detector training.

Keywords: LiDAR point cloud; pedestrian candidate generation; semi-automatic 3D labeling; DBSCAN; single template matching; centralized point augmentation; distance-aware clustering

1. Introduction

In physical AI systems such as autonomous vehicles and mobile robots, reliable perception of surrounding objects is essential for safe navigation and decision-making [1]. Among various object categories, pedestrians are particularly important because they are small, dynamic, and safety-critical targets. LiDAR sensors provide accurate geometric measurements and are robust to illumination changes, making them widely used in autonomous driving and robotic perception systems [2]. However, LiDAR-based pedestrian perception remains challenging because the number of points reflected from a pedestrian decreases rapidly as the distance from the sensor increases [3].

This distance-dependent point sparsity causes several practical difficulties. In near-range regions, pedestrian bodies are usually represented by relatively dense point distributions, allowing object-level clusters and shape features to be formed. In contrast, in medium- and long-range regions, pedestrians may be represented by only partial or fragmented point sets. Such sparsity weakens clustering stability, degrades shape representation, and reduces the reliability of template-based or learning-based detection. These limitations are especially problematic for pedestrian detection because small variations in point distribution can significantly affect candidate localization and silhouette similarity [4,5,6].

Another important challenge is the high cost of 3D bounding-box annotation. Compared with 2D image labeling, 3D point cloud annotation is more time-consuming because annotators must manipulate object boxes in three-dimensional space while considering sparse, occluded, or incomplete object shapes [7,8,9]. Although detector-assisted labeling methods can reduce manual effort by generating candidate boxes using pre-trained 3D detectors, they depend on object categories and domains learned during training. This limits their usefulness in early-stage dataset construction, domain adaptation, or scenarios where sufficient labeled data is not yet available [7,10].

To address these issues, this paper proposes a non-learning-based, seed-dependent, semi-automatic pedestrian candidate generation pipeline using LiDAR point clouds from the KITTI dataset [11,12]. The proposed pipeline combines distance-aware DBSCAN, Single Template Matching (STM), and Centralized Point Augmentation (CPA). In the first stage, distance-aware DBSCAN generates pedestrian candidate clusters by adjusting clustering parameters according to distance-dependent point density [13]. Two further steps are then applied to improve geometric consistency and reduce irrelevant candidate clusters: ground-snapping-based bounding-box refinement, which uses RANSAC-based plane estimation, and height-based filtering [14]. In the second stage, STM refines the remaining candidates by comparing their projected silhouettes with a seed pedestrian template [15]. Finally, CPA is introduced to stabilize sparse mid-range pedestrian silhouettes by adding centroid-contracted points in the projection-relevant plane.

The proposed method should not be interpreted as a fully unsupervised detector because the STM stage depends on a seed pedestrian instance [15]. Instead, it is designed as a semi-automatic labeling support framework in which clustering generates object candidates and a seed-dependent template refines pedestrian-like clusters. This design is suitable for practical labeling scenarios where a human operator can provide or verify a representative seed instance and then use the system-generated candidates to reduce manual annotation effort [7,8,9].

The main contributions of this paper are summarized as follows.

First, we propose a distance-aware DBSCAN-based candidate generation strategy for LiDAR pedestrian point clouds. By configuring clustering parameters according to distance intervals, the method addresses the density variation between near-, mid-, and far-range pedestrian observations [13].

Second, we integrate seed-dependent STM as a precision-oriented candidate refinement module. The STM stage suppresses false positives by comparing PCA-aligned projected silhouettes with a seed pedestrian template, thereby improving candidate reliability for semi-automatic labeling [15].

Third, we introduce centralized point augmentation as a targeted mid-range silhouette stabilization strategy. CPA reinforces sparse projected pedestrian shapes by adding centroid-contracted points, improving the retention of valid mid-range pedestrian candidates while revealing a clear precision–recall trade-off.

Fourth, we provide a distance-wise experimental analysis on KITTI pedestrian instances, including DBSCAN parameter sensitivity, STM refinement effects, CPA step sensitivity, and final candidate generation performance [6,11]. The results clarify where the proposed pipeline is effective and where its limitations remain under severe long-range LiDAR sparsity.

The remainder of this paper is organized as follows. Section 2 reviews related work on semi-automatic 3D point cloud labeling, LiDAR-based 3D object detection, clustering-based candidate generation, template matching, and representation enhancement. Section 3 describes the proposed DBSCAN–STM–CPA pipeline in detail. Section 4 presents experimental results and distance-wise analyses. Section 5 discusses the implications and limitations of the proposed method, and Section 6 concludes the paper.

2. Related Work

2.1. Semi-Automatic 3D Point Cloud Labeling

The construction of large-scale LiDAR datasets requires accurate 3D bounding-box annotations, but manual labeling of point clouds is considerably more difficult and time-consuming than 2D image annotation. Unlike images, point clouds provide sparse and irregular geometric measurements, and object boundaries are often ambiguous due to occlusion, truncation, and distance-dependent point density degradation. This difficulty is particularly severe for small objects such as pedestrians, whose visible point distributions can be highly incomplete in medium- and long-range regions.

To reduce annotation effort, several 3D point cloud labeling tools and semi-automatic annotation frameworks have been proposed. 3D BAT provides a web-based annotation environment for multi-modal driving data and supports semi-automatic object tracking and 3D bounding-box manipulation [9]. labelCloud was introduced as a lightweight, domain-agnostic labeling tool that supports direct 3D point cloud annotation with relatively simple dependencies and flexible input formats [8]. More recent work has further explored semi-automatic LiDAR annotation using cross-scene adaptability and temporal consistency, aiming to reduce the burden of large-scale point cloud labeling [16].

Another common strategy is to use pre-trained 3D object detectors to generate initial bounding-box candidates that are subsequently corrected by human annotators [7]. Although detector-assisted labeling can substantially reduce annotation cost, it is inherently limited by the object categories and domains learned during training. In early-stage dataset construction or in scenarios involving novel or underrepresented object classes, pre-trained detectors may fail to generate reliable candidates. Therefore, non-learning-based candidate generation methods that exploit geometric point distributions remain useful as complementary components in semi-automatic labeling pipelines.

The present study follows this semi-automatic labeling perspective. The proposed method does not aim to fully replace human annotation. Instead, it generates pedestrian candidate clusters that can support subsequent human verification or box refinement. Because the STM stage uses a seed pedestrian instance as a shape reference, the overall framework is described as a seed-dependent, non-learning-based semi-automatic candidate generation method rather than a fully unsupervised detector.

2.2. LiDAR-Based 3D Object Detection and Long-Range Sparsity

LiDAR-based 3D object detection has been widely studied for autonomous driving and robotic perception. The KITTI dataset has served as a representative benchmark for evaluating 3D object detection methods in autonomous driving scenarios [11,12]. Modern 3D detectors represent point clouds using different structures, including point-based, voxel-based, pillar-based, and multi-view representations. PointPillars converts raw point clouds into vertical pillar features and applies a 2D convolutional backbone for efficient object detection [17]. Frustum PointNets use 2D image detections to define frustum proposals and then perform 3D instance segmentation and box estimation from point clouds [18]. AVOD combines feature representations from multiple views for 3D proposal generation and detection [19], while PointCNN learns convolution-like operations directly from unordered point sets using an X-transformation [20].

Despite the strong performance of learning-based 3D detectors, their detection accuracy can degrade substantially when LiDAR point density decreases with distance. The KITTI dataset was collected using a Velodyne HDL-64E LiDAR sensor [11,12,21].

This sparsity directly affects pedestrian detection because pedestrians are small objects and may be represented by only a few points in far-range regions. Previous long-range pedestrian detection studies have shown that conventional 3D detectors often suffer from severe performance degradation for distant pedestrians, especially when the number of valid points is insufficient to form reliable object-level features [4,5,6].

Several recent studies have addressed this limitation using multi-modal fusion, range-aware feature learning, point completion, or far-field detection strategies. These methods typically aim to compensate for sparse LiDAR geometry by incorporating image cues, semantic information, or learned feature enhancement. However, most of them still require annotated training data and confidence-score-based detection outputs. In contrast, the objective of this study is to investigate a non-learning-based alternative for semi-automatic pedestrian candidate generation. The proposed method focuses on distance-aware clustering, seed-based template refinement, and simple geometric augmentation rather than end-to-end detector training.

2.3. Cluster-Based 3D Candidate Generation

Clustering-based methods have long been used for object candidate generation in point cloud processing because they can directly exploit spatial distributions without requiring annotated training data. Among them, DBSCAN is a representative density-based clustering algorithm that groups points according to local density conditions defined by eps and min_samples [13]. DBSCAN is suitable for point cloud candidate extraction because it can discover irregularly shaped clusters and identify low-density noise points.

However, applying DBSCAN to LiDAR data requires careful parameter selection. A single global eps value cannot adequately handle the distance-dependent density variation in LiDAR point clouds. In near-range regions, pedestrian points are relatively dense, and a small eps value may be sufficient to form compact object clusters. In contrast, medium- and long-range pedestrians are represented by sparse and fragmented points, requiring a larger eps value to prevent valid object points from being separated or classified as noise. Conversely, excessively large eps values can merge adjacent objects or background structures into a single cluster, increasing false positives and degrading bounding-box quality.

To address this issue, range-aware clustering strategies have been investigated in LiDAR-based perception. These approaches adjust clustering parameters or preprocessing rules according to distance, point density, height, or ground-plane structure. Such strategies are especially important for pedestrian candidate generation because the physical size of the pedestrian remains similar, while the observed point density changes significantly with distance.

The proposed method adopts this principle by configuring DBSCAN parameters separately for near-, mid-, and far-range regions. The distance-aware DBSCAN stage serves as the initial candidate generation module, and the resulting clusters are refined through ground snapping, height-based filtering, and seed-dependent template matching.

2.4. Seed-Based Template Matching for Pedestrian Refinement

Template matching provides an interpretable mechanism for comparing object shapes without training a class-specific detector. In LiDAR-based pedestrian detection, Single Template Matching (STM) constructs a pedestrian template from a single seed instance and compares candidate clusters with this seed template after PCA-based alignment and 2D projection [15]. The typical STM process includes candidate extraction, local coordinate alignment, projection onto a 2D plane, binary silhouette generation, morphological refinement, and similarity computation.

STM is attractive for semi-automatic labeling because a human operator can provide one representative seed object, allowing the system to search for geometrically similar candidates without requiring a trained detector. However, this advantage also implies that STM is seed-dependent. Its performance can be affected by the quality, distance, and representativeness of the selected seed. In addition, because STM compares projected silhouettes, it can be unstable when candidate clusters contain only sparse or partial points. This limitation becomes more critical in medium- and long-range pedestrian detection, where LiDAR sparsity causes fragmented or distorted silhouettes.

For this reason, this study uses STM as a second-stage candidate refinement module rather than as a standalone detector. The first stage generates pedestrian candidate clusters through distance-aware DBSCAN and height-based filtering, while the second stage suppresses false positives by comparing candidate silhouettes with the seed template. Since the seed provides class-related shape information, the proposed framework should be interpreted as a seed-dependent, semi-automatic labeling support method, not as a fully unsupervised object detector.

2.5. Evaluation Metrics for Candidate Localization

Official 3D object detection benchmarks generally evaluate detections using Intersection over Union (IoU)-based matching and Average Precision (AP). The KITTI benchmark, for example, relies on precision–recall curves and AP computed under predefined overlap criteria [22]. These metrics are suitable for confidence-score-based detectors but are less directly applicable to non-learning-based candidate generation pipelines that do not produce calibrated confidence scores.

For sparse pedestrian clusters, IoU-based evaluation can also be overly strict. Cluster-derived axis-aligned bounding boxes may tightly follow the observed point distribution, while ground-truth boxes are manually annotated and may include unobserved object extents. As a result, a candidate cluster can be located near the pedestrian center but still obtain a low IoU due to small box size, partial observation, or orientation mismatch. To complement IoU-based evaluation, center-distance-based matching has been used in 3D detection benchmarks and related studies to assess localization consistency [6,23].

In this study, center-distance-based matching is adopted as the primary evaluation criterion because the proposed method is designed for candidate generation and semi-automatic labeling support. In addition, a supplementary IoU-based analysis is reported to assess overlap-based localization quality and to clarify the relationship between the proposed evaluation protocol and conventional 3D detection benchmarks.
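To make the difference between the two criteria concrete, the following minimal sketch compares a center-distance check with a 3D IoU for axis-aligned boxes. The box format (a pair of min/max corner tuples), the function names, and the example coordinates are assumptions chosen for illustration; no matching threshold is implied, since the value used in the experiments is not stated in this section.

```python
import math

def center(box):
    """Center of an axis-aligned box given as (min_xyz, max_xyz)."""
    lo, hi = box
    return tuple((a + b) / 2.0 for a, b in zip(lo, hi))

def center_distance(box_a, box_b):
    """Euclidean distance between the two box centers."""
    return math.dist(center(box_a), center(box_b))

def aabb_iou_3d(box_a, box_b):
    """3D IoU of two axis-aligned boxes (orientation is ignored)."""
    (alo, ahi), (blo, bhi) = box_a, box_b
    inter = 1.0
    for k in range(3):
        overlap = min(ahi[k], bhi[k]) - max(alo[k], blo[k])
        if overlap <= 0:
            return 0.0  # no overlap along this axis
        inter *= overlap
    vol_a = math.prod(h - l for l, h in zip(alo, ahi))
    vol_b = math.prod(h - l for l, h in zip(blo, bhi))
    return inter / (vol_a + vol_b - inter)

# Hypothetical example: a full-extent annotated box and a smaller
# cluster-derived box that lies entirely inside it.
gt_box = ((10.0, -0.4, -1.7), (10.8, 0.4, 0.1))
cand_box = ((10.2, -0.2, -1.2), (10.6, 0.2, 0.0))

print(center_distance(gt_box, cand_box))  # centers are about 0.2 m apart
print(aabb_iou_3d(gt_box, cand_box))      # IoU is only about 0.17
```

In this example the candidate is well centered on the annotated pedestrian, yet its IoU is low because the cluster covers only part of the annotated extent, which is the situation described above for sparse clusters.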

2.6. Representation Enhancement and Point Augmentation

A common challenge in sparse object detection is that the target object may be incompletely represented, weakly contrasted, or confused with background structures. Recent studies in remote sensing change detection and salient object detection have addressed similar representation problems through semantic compensation, adaptive fusion, progressive interaction, and saliency-guided enhancement. For example, SCAFNet introduces a semantic compensated adaptive fusion strategy to address semantic misalignment and non-adaptive fusion in remote sensing image change detection [24]. Similarly, ORSI salient object detection using progressive interaction and saliency-guided enhancement aims to strengthen object-relevant regions while suppressing background interference in optical remote sensing images [25].

Although these remote sensing image methods are not directly designed for LiDAR point clouds, they are conceptually related to the present study in that they attempt to reinforce weak or incomplete object representations before final decision-making. In the proposed pipeline, centralized point augmentation plays a similar role in a geometric and non-learning-based manner. Instead of learning semantic compensation from image features, the proposed method adds contracted points around the cluster centroid in the projection-relevant y–z plane to stabilize sparse pedestrian silhouettes during STM-based comparison.

However, centralized point augmentation should not be interpreted as physical surface reconstruction. It is a geometric silhouette stabilization strategy intended to reduce fragmentation in sparse projected shapes. Excessive augmentation may also increase the similarity of non-pedestrian clusters to the seed template, thereby increasing false positives. Therefore, the contribution of centralized augmentation is best understood as a targeted mid-range stabilization mechanism rather than a universal performance enhancer.
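The idea of adding centroid-contracted points can be sketched as follows. This is only an illustrative reading of the description above, not the procedure defined later in the paper: the function name, the contraction factors in `scales`, and the choice to leave the x coordinate unchanged are all assumptions.

```python
import numpy as np

def centralized_point_augmentation(points, scales=(0.75, 0.5, 0.25)):
    """Append copies of a cluster contracted toward its centroid in the y-z plane.

    points: (N, 3) array of x, y, z coordinates.
    scales: hypothetical contraction factors in (0, 1); smaller values pull
            the copied points closer to the centroid.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    augmented = [points]
    for s in scales:
        contracted = points.copy()
        # Contract only y and z, the coordinates that form the projected silhouette.
        contracted[:, 1:] = centroid[1:] + s * (points[:, 1:] - centroid[1:])
        augmented.append(contracted)
    return np.vstack(augmented)
```

Because every added point lies between an original point and the centroid, the outer extent of the projected shape is unchanged while its interior becomes denser. The same mechanism fills the interior of any cluster, which is consistent with the false-positive risk noted above.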

2.7. Positioning of the Proposed Method

The proposed DBSCAN–STM pipeline with centralized point augmentation is positioned between fully manual 3D labeling tools and learning-based 3D object detectors. Compared with manual labeling tools, it provides automatic candidate generation and shape-based refinement to reduce annotation effort. Compared with learning-based 3D detectors, it does not require class-specific training, confidence scores, or large annotated datasets. Compared with clustering-only methods, it improves candidate reliability by combining distance-aware DBSCAN, ground snapping, height-based filtering, seed-dependent STM refinement, and controlled point augmentation.

The main objective of this study is therefore not to outperform state-of-the-art LiDAR detectors under official AP-based benchmarks but to provide a practical semi-automatic pedestrian candidate generation pipeline for sparse LiDAR point clouds. This positioning is particularly relevant when labeled data are limited, when a human operator can provide or verify a seed instance, and when center-aligned candidate localization is more important than confidence-score-based ranking.

3. Proposed Method

This section describes the proposed distance-aware DBSCAN–STM pipeline with centralized point augmentation (CPA) for LiDAR-based pedestrian candidate generation and semi-automatic labeling support. The proposed method is designed to generate reliable pedestrian candidate clusters from raw LiDAR point clouds without training a class-specific detector. It combines distance-aware density-based clustering, ground-aware bounding-box refinement, seed-dependent template matching, and geometric silhouette stabilization.

Unlike fully supervised 3D object detectors, the proposed framework does not require network training or confidence-score estimation. However, because the STM stage uses a seed pedestrian instance as a shape reference, the overall method should be interpreted as a seed-dependent, semi-automatic candidate generation framework rather than a fully unsupervised detector.

The overall procedure consists of the following stages:

I. KITTI LiDAR data preprocessing and field-of-view filtering
II. Distance-aware DBSCAN clustering
III. Ground-snapping-based bounding-box coordinate refinement
IV. Seed pedestrian selection
V. Height-based pedestrian candidate filtering
VI. STM-based template matching
VII. Centralized point augmentation for sparse candidate stabilization
VIII. Performance evaluation using center-distance and supplementary IoU metrics

Figure 1 shows the overall architecture of the proposed semi-automatic DBSCAN–STM–CPA pipeline. The pipeline consists of preprocessing, distance-aware DBSCAN clustering, ground-snapping-based bounding-box refinement, seed pedestrian selection, height-based filtering, STM-based template matching, CPA, and final candidate evaluation.

3.1. Dataset Configuration and Evaluation Scope

The input data used in this study are Velodyne LiDAR point clouds from the KITTI dataset [11]. The evaluation is performed on frames that contain at least one pedestrian ground-truth (GT) instance. As a result, 1779 pedestrian-containing frames are selected, including a total of 4487 pedestrian GT instances. The pedestrian GT instances are grouped according to their horizontal distance from the LiDAR sensor. The distance intervals are defined as follows:

Near range: 0–15 m
Mid range: 15–30 m
Far range: 30–45 m
Ultra-far range 1: 45–60 m
Ultra-far range 2: beyond 60 m
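A minimal sketch of this grouping is given below. The function and constant names are hypothetical, and the handling of values that fall exactly on an interval boundary (assigned here to the farther interval) is an assumption, since the text does not specify it.

```python
import math
from bisect import bisect_right

# Interval edges in meters, taken from the list above.
RANGE_EDGES_M = (15.0, 30.0, 45.0, 60.0)
RANGE_LABELS = ("near", "mid", "far", "ultra-far 1", "ultra-far 2")

def range_label(x, y):
    """Return the distance interval for a GT instance at LiDAR position (x, y)."""
    d = math.hypot(x, y)  # horizontal distance from the sensor
    return RANGE_LABELS[bisect_right(RANGE_EDGES_M, d)]

print(range_label(10.0, 0.0))   # near
print(range_label(20.0, 20.0))  # mid (about 28.3 m)
print(range_label(30.0, 40.0))  # ultra-far 1 (50 m)
```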

The distribution of pedestrian GT instances is summarized in Table 1.

The near- and mid-range intervals contain most pedestrian instances, whereas the ultra-far ranges contain relatively few valid samples. Therefore, the 45–60 m and beyond-60 m ranges are analyzed as reference intervals rather than primary evaluation categories. Riding persons, such as cyclists and motorcyclists, are excluded from pedestrian evaluation because their point cloud shapes and bounding-box characteristics are different from those of standing or walking pedestrians.

3.2. Preprocessing and Field-of-View Filtering

For each frame, the raw KITTI LiDAR point cloud is first loaded in the LiDAR coordinate system. The x-axis corresponds to the forward direction of the sensor, while the y- and z-axes represent the lateral and vertical directions, respectively. Since this study focuses on pedestrian candidate generation in the forward driving scene, points outside the camera field of view are removed using the KITTI calibration parameters.

Ground and non-ground points are then separated using a height-based threshold. The ground threshold is conservatively selected to avoid removing lower-body pedestrian points. The extracted ground candidate points are subsequently used for RANSAC-based ground plane estimation in the bounding-box refinement stage. The remaining non-ground points within the camera field of view are used as input for distance-aware DBSCAN clustering.
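The height-based separation can be sketched as a single comparison on the z coordinate. The threshold value used below is a placeholder chosen for illustration only, not the setting used in this study, and the function name is hypothetical.

```python
import numpy as np

def split_ground(points, ground_z_max=-1.5):
    """Split an (N, 3) LiDAR point array into ground candidates and the rest.

    ground_z_max is a hypothetical threshold in the LiDAR frame (meters);
    points at or below it are treated as ground candidates.
    """
    points = np.asarray(points, dtype=float)
    is_ground = points[:, 2] <= ground_z_max
    return points[is_ground], points[~is_ground]
```

Under this sketch, the first returned array would feed the RANSAC plane estimation and the second would be passed to clustering. A more conservative (lower) threshold keeps more lower-body points in the non-ground set at the cost of leaving some road points in the clustering input.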

Figure 2 illustrates the preprocessing procedure, including ground segmentation and camera field-of-view filtering.

To improve reproducibility, all fixed preprocessing and candidate generation parameters are summarized in Table 2.

3.3. Distance-Aware DBSCAN Clustering

After preprocessing, pedestrian candidate clusters are generated using DBSCAN. DBSCAN is a density-based clustering algorithm that groups points according to local density conditions defined by eps and min_samples. It is suitable for LiDAR point cloud candidate generation because it can extract irregularly shaped clusters and identify low-density points as noise.

However, a single global DBSCAN parameter setting is not appropriate for LiDAR point clouds because point density decreases significantly with distance. Near-range pedestrian points are relatively dense, so a small eps value can form compact object clusters. In contrast, mid- and far-range pedestrians are represented by sparser and more fragmented points, requiring larger eps values to prevent valid pedestrian points from being separated or classified as noise. Conversely, excessively large eps values may merge adjacent objects or background structures, increasing false positives.

To address this issue, this study configures DBSCAN eps separately for each distance range. The horizontal distance of a point from the LiDAR sensor is computed as

$$d_{i} = \sqrt{x_{i}^{2} + y_{i}^{2}} \tag{1}$$

where $x_{i}$ and $y_{i}$ denote the forward and lateral coordinates of the point in the LiDAR coordinate system. Based on the parameter exploration results, eps is set to 0.15 for the near range, 0.30 for the mid range, and 0.50 for the far and ultra-far ranges. The min_samples value is fixed to 10 across all ranges.
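One way to realize this configuration is to partition the non-ground points by horizontal distance and cluster each partition with its own eps. The sketch below shows only the partitioning step. The eps values, the 15 m and 30 m boundaries, and min_samples come from the text, whereas the function names and the choice to cluster each band independently are assumptions; the text does not state how points near a band boundary are handled.

```python
import numpy as np

RANGE_EDGES_M = np.array([15.0, 30.0])       # near | mid | far and ultra-far
EPS_PER_BAND = np.array([0.15, 0.30, 0.50])  # eps values stated in the text
MIN_SAMPLES = 10                             # fixed across all ranges

def horizontal_distance(points):
    """Equation (1): d_i = sqrt(x_i^2 + y_i^2) for an (N, 3) point array."""
    points = np.asarray(points, dtype=float)
    return np.hypot(points[:, 0], points[:, 1])

def partition_by_band(points):
    """Return a list of (eps, subset) pairs, one per distance band."""
    points = np.asarray(points, dtype=float)
    band = np.digitize(horizontal_distance(points), RANGE_EDGES_M)
    return [(float(EPS_PER_BAND[b]), points[band == b])
            for b in range(len(EPS_PER_BAND))]
```

Each subset could then be passed to any DBSCAN implementation, for example scikit-learn's `DBSCAN(eps=eps, min_samples=MIN_SAMPLES)`. A pedestrian that straddles a band boundary may be split under this simple partitioning, so an overlap margin between bands may be needed in practice.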

Each DBSCAN cluster is represented as an axis-aligned bounding box (AABB). Let a cluster point set be defined as

$$P = \{\, p_{i} \mid p_{i} = (x_{i}, y_{i}, z_{i}),\ i = 1, 2, \dots, N \,\} \tag{2}$$

where $N$ is the number of points in the cluster. The minimum and maximum coordinates of the cluster are computed as

$$p_{\min} = (x_{\min}, y_{\min}, z_{\min}) \tag{3}$$

$$p_{\max} = (x_{\max}, y_{\max}, z_{\max}) \tag{4}$$

The corresponding AABB is defined by $p_{\min}$ and $p_{\max}$. This AABB representation is used for height filtering, center-distance matching, supplementary IoU evaluation, and subsequent STM-based candidate refinement.
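The AABB computation amounts to a per-axis minimum and maximum over the cluster points, as in the following minimal sketch. The function names are hypothetical, and the center and size helper is added only because those quantities are needed by the later matching and filtering steps.

```python
import numpy as np

def aabb(points):
    """Return (p_min, p_max) for an (N, 3) cluster, as in Equations (3) and (4)."""
    points = np.asarray(points, dtype=float)
    return points.min(axis=0), points.max(axis=0)

def aabb_center_and_size(p_min, p_max):
    """Box center and edge lengths derived from the two corner points."""
    return (p_min + p_max) / 2.0, p_max - p_min

cluster = np.array([[10.0, -0.2, -1.5], [10.3, 0.1, -0.4], [10.1, 0.2, 0.1]])
p_min, p_max = aabb(cluster)
box_center, box_size = aabb_center_and_size(p_min, p_max)
print(p_min, p_max)   # [10.  -0.2 -1.5] [10.3  0.2  0.1]
```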

3.4. Ground-Snapping-Based Bounding-Box Refinement

Clusters generated from sparse LiDAR points may produce bounding boxes whose lower faces are lifted above the actual ground surface. This problem frequently occurs when lower-body pedestrian points are missing or when only partial object points are observed. Such vertical misalignment can increase localization error and reduce the consistency between predicted candidate boxes and GT annotations.

To mitigate this problem, a ground-snapping refinement step is applied. First, ground candidate points extracted during preprocessing are used to estimate the local ground plane using RANSAC. The ground plane is represented as

$$a x + b y + c z + d = 0 \tag{5}$$

where $a$, $b$, $c$, and $d$ are the estimated plane parameters. For each candidate cluster, the bottom coordinate of its AABB is adjusted toward the estimated ground plane. This refinement aligns the candidate box with the local road surface and reduces floating-box artifacts.
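A compact sketch of this step is shown below, assuming a basic three-point RANSAC and a snapping rule that sets the box bottom to the plane height at the horizontal center of the box. The iteration count, the inlier tolerance, the fixed random seed, and the snapping rule itself are illustrative assumptions; the text only states that the bottom coordinate is adjusted toward the estimated plane.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, inlier_tol=0.05, seed=0):
    """Estimate (a, b, c, d) with a*x + b*y + c*z + d = 0 and an upward unit normal."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_plane, best_count = None, -1
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # collinear sample, no unique plane
        normal = normal / norm
        if normal[2] < 0:
            normal = -normal  # keep the normal pointing upward
        d = -normal.dot(p0)
        count = int((np.abs(points @ normal + d) < inlier_tol).sum())
        if count > best_count:
            best_plane, best_count = (*normal, d), count
    return best_plane

def snap_box_bottom(p_min, p_max, plane):
    """Move the AABB bottom to the plane height below the box center.

    Assumes a near-horizontal plane, so that c is not close to zero.
    """
    a, b, c, d = plane
    cx = (p_min[0] + p_max[0]) / 2.0
    cy = (p_min[1] + p_max[1]) / 2.0
    ground_z = -(a * cx + b * cy + d) / c  # solve Equation (5) for z
    snapped_min = np.array(p_min, dtype=float)
    snapped_min[2] = ground_z
    return snapped_min, np.asarray(p_max, dtype=float)
```

Only the bottom face is moved in this sketch, so a floating box grows downward to the road surface while its top and horizontal extent are left unchanged.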

Figure 3 illustrates the ground-snapping-based coordinate refinement process, including the original cluster-derived box, bottom-face snapping to the estimated ground plane, and the final height-corrected candidate box.

The ground-snapping step is important for semi-automatic labeling because candidate boxes should provide stable geometric cues to human annotators. Although this process does not reconstruct the full physical extent of the pedestrian, it improves the vertical consistency of cluster-derived boxes before height-based filtering and template matching.

3.5. Seed Pedestrian Selection and Height-Based Candidate Filtering

The STM stage requires a seed pedestrian instance to construct a reference shape template. In the original STM framework, the seed can be provided manually by an operator. In the current experimental setting, the seed is selected as the nearest pedestrian GT instance in each frame to provide a stable and sufficiently dense reference shape. Because this selection uses pedestrian GT information, the proposed method should not be interpreted as a fully unsupervised detector. Rather, it is a seed-dependent, semi-automatic candidate generation method. In practical labeling scenarios, the seed can be selected by an operator through a single click on a reliable near-range pedestrian candidate.

The seed pedestrian is used for two purposes. First, its height provides a reference for filtering candidate clusters. Second, its projected shape is used as the STM template. Let $h_{\mathrm{seed}}$ denote the height of the seed pedestrian and $h_{j}$ denote the height of the $j$-th candidate cluster. A candidate cluster is retained if its height satisfies

$$\lvert h_{j} - h_{\mathrm{seed}} \rvert \leq \Delta h \tag{6}$$

where $\Delta h$ is the predefined height tolerance. This filtering step removes clusters whose vertical extents are clearly inconsistent with pedestrian objects. It also reduces the number of candidates passed to STM, thereby lowering the computational cost of PCA alignment, projection, morphological processing, and cosine similarity computation.
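The height test can be sketched in a few lines. The box format (a pair of min/max corner tuples), the function name, and the tolerance of 0.3 m used in the example are assumptions for illustration; the tolerance applied in this study is a fixed parameter whose value is not restated here.

```python
def filter_by_height(boxes, seed_height, delta_h):
    """Keep boxes whose vertical extent is within delta_h of the seed height.

    boxes: iterable of (p_min, p_max) corner pairs, each an (x, y, z) tuple.
    """
    kept = []
    for p_min, p_max in boxes:
        height = p_max[2] - p_min[2]
        if abs(height - seed_height) <= delta_h:
            kept.append((p_min, p_max))
    return kept

# Hypothetical candidates with heights of 1.7 m, 0.9 m, 2.6 m and 1.5 m.
candidates = [
    ((10.0, -0.3, -1.7), (10.5, 0.3, 0.0)),
    ((12.0, 1.0, -1.7), (13.0, 2.0, -0.8)),
    ((20.0, -2.0, -1.7), (24.0, 0.0, 0.9)),
    ((18.0, 3.0, -1.7), (18.4, 3.5, -0.2)),
]
kept = filter_by_height(candidates, seed_height=1.75, delta_h=0.3)
print(len(kept))  # 2
```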

Figure 4 shows an illustrative single-frame example of height-based candidate filtering.

Height-based filtering is motivated by the observation that pedestrian height is more stable than width or length in sparse LiDAR observations. Width and length may vary due to occlusion, truncation, or partial observation, whereas vertical extent remains a useful cue for excluding non-pedestrian clusters. Nevertheless, height filtering alone is insufficient for final classification because objects with similar heights may still be non-pedestrians. Therefore, STM-based shape comparison is applied as the second-stage refinement.

3.6. STM-Based Template Matching

After height-based filtering, the remaining candidate clusters are refined using Single Template Matching (STM). STM compares the shape of each candidate cluster with the seed pedestrian template. The process consists of the following steps:

I. PCA-based local coordinate alignment
II. Projection of aligned points onto the y–z plane
III. Binary image generation from projected points
IV. Morphological closing and dilation for silhouette enhancement
V. Cosine similarity computation between the seed template and candidate silhouettes
VI. Pedestrian candidate classification based on a similarity threshold

For each cluster, PCA is applied to define a local coordinate system and reduce orientation-related variation. The aligned points are projected onto the y–z plane because this plane reflects the lateral and vertical silhouette of a pedestrian, which is directly relevant to shape comparison. The projected point image is then binarized and refined using morphological operations to reduce small holes and fragmented contours.
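Steps I to IV can be sketched as follows. Several details are assumptions made only for illustration, because they are not specified in this section: PCA is restricted to the horizontal coordinates so that z stays vertical, the image is a 32 × 32 grid covering ±1.2 m around the cluster centroid, and the morphological operators use a 3 × 3 structuring element implemented directly in NumPy. All function names are hypothetical.

```python
import numpy as np


def align_horizontal(points):
    """Center the cluster and express x, y in the horizontal principal axes.
    The dominant horizontal direction becomes y; axis signs are arbitrary."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered[:, :2], full_matrices=False)
    out = centered.copy()
    out[:, 1] = centered[:, :2] @ vt[0]  # major horizontal axis -> y
    out[:, 0] = centered[:, :2] @ vt[1]  # minor horizontal axis -> x
    return out


def binary_silhouette(points, size=32, extent=1.2):
    """Rasterize the y-z projection of centered points into a binary image."""
    img = np.zeros((size, size), dtype=bool)
    scale = (size - 1) / (2.0 * extent)
    cols = np.round((points[:, 1] + extent) * scale).astype(int)
    rows = np.round((extent - points[:, 2]) * scale).astype(int)  # row 0 is the top
    ok = (cols >= 0) & (cols < size) & (rows >= 0) & (rows < size)
    img[rows[ok], cols[ok]] = True
    return img


def dilate(img):
    """3x3 binary dilation built from shifted copies of a zero-padded image."""
    padded = np.pad(img, 1)
    h, w = img.shape
    out = np.zeros_like(img)
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + h, dc:dc + w]
    return out


def erode(img):
    """3x3 binary erosion; pixels outside the image count as background."""
    padded = np.pad(img, 1)
    h, w = img.shape
    out = np.ones_like(img)
    for dr in range(3):
        for dc in range(3):
            out &= padded[dr:dr + h, dc:dc + w]
    return out


def enhance(img):
    """Closing (dilation then erosion) followed by one additional dilation."""
    return dilate(erode(dilate(img)))


# Synthetic pedestrian-sized cluster (hypothetical data, not KITTI points).
rng = np.random.default_rng(0)
cluster = rng.uniform([-0.2, -0.3, 0.0], [0.2, 0.3, 1.7], size=(60, 3))
raw = binary_silhouette(align_horizontal(cluster))
sil = enhance(raw)
print(sil.shape, int(raw.sum()), int(sil.sum()))
```

The enhanced silhouette contains at least as many foreground pixels as the raw projection, which is the intended effect of closing and dilation on sparse point images.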

Figure 5 shows an example of pedestrian template generation for STM using the KITTI data used in this study.

Let T denote the binary template image generated from the seed pedestrian and I_{j} denote the binary projection image generated from the j-th candidate cluster. Both images are vectorized before similarity computation. The cosine similarity is computed as

S(T, I_{j}) = \frac{T \cdot I_{j}}{\|T\| \, \|I_{j}\|}    (7)

where S(T, I_{j}) represents the shape similarity between the seed template and the candidate cluster. A candidate is classified as a pedestrian if

S(T, I_{j}) \geq τ    (8)

where τ is the cosine similarity threshold. In this study, τ is set to 0.40.
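Equations (7) and (8) can be illustrated with a small NumPy sketch. The toy 6 × 4 images and the function names are hypothetical, and returning 0 for an empty image is an assumed convention rather than something stated in the paper. For binary images, the cosine similarity reduces to the overlap count divided by the square root of the product of the two foreground pixel counts.

```python
import numpy as np

TAU = 0.40  # cosine similarity threshold reported in this study


def cosine_similarity(template, candidate):
    """Equation (7) on vectorized images; returns 0.0 if either image is empty."""
    t = np.asarray(template, dtype=float).ravel()
    c = np.asarray(candidate, dtype=float).ravel()
    denom = np.linalg.norm(t) * np.linalg.norm(c)
    return 0.0 if denom == 0.0 else float(t @ c / denom)


def is_pedestrian(template, candidate, tau=TAU):
    """Equation (8): accept the candidate if S(T, I_j) >= tau."""
    return cosine_similarity(template, candidate) >= tau


# Toy template with 8 foreground pixels.
T = np.zeros((6, 4), dtype=bool)
T[1:5, 1:3] = True

I_half = np.zeros_like(T)
I_half[1:3, 1:3] = True      # upper half of the template: 4 overlapping pixels

I_sparse = np.zeros_like(T)
I_sparse[1, 1] = True        # a single overlapping pixel

print(cosine_similarity(T, I_half), is_pedestrian(T, I_half))
print(cosine_similarity(T, I_sparse), is_pedestrian(T, I_sparse))
```

The half-observed silhouette scores 4/√32 ≈ 0.71 and is accepted, whereas the single-pixel silhouette scores 1/√8 ≈ 0.35 and is rejected, even though both lie entirely inside the template. This mirrors how sparse projections of true pedestrians can fall below the threshold.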

The STM stage functions as a precision-oriented refinement module. It suppresses false positives among the initial DBSCAN candidates by selecting clusters whose projected silhouettes are similar to the seed pedestrian. However, because STM relies on projected shape similarity, it may reject valid pedestrians when their point distributions are sparse, fragmented, or partially observed. This limitation motivates the centralized point augmentation strategy described in the next section.

3.7. Centralized Point Augmentation

Centralized point augmentation is introduced to stabilize projected pedestrian silhouettes under mid-range sparsity. In the mid and far ranges, pedestrian clusters often contain only partial point distributions. As a result, their projected y–z silhouettes may be fragmented, which reduces cosine similarity with the seed template even when the cluster corresponds to an actual pedestrian.

The key idea of CPA is to add geometrically contracted points around the cluster centroid before projection. This process does not physically reconstruct the pedestrian surface. Instead, it aims to reduce silhouette fragmentation and improve the stability of STM similarity computation.

Given the original point set of a candidate cluster defined in Equation (2), the centroid of the cluster is computed as

c = (c_{x}, c_{y}, c_{z}) = \frac{1}{N} \sum_{i=1}^{N} p_{i}    (9)

Because STM compares the projected shape on the y–z plane, CPA modifies only the y- and z-coordinates while preserving the x-coordinate. For the k-th contraction step, the scaling ratio α_{k} is defined as

α_{k} = \max(α_{min}, 1 - k Δα)    (10)

where Δα is the scaling decrement and α_{min} is the minimum scaling ratio. The augmented point generated from p_{i} = (x_{i}, y_{i}, z_{i}) at step k is defined as

p_{i}^{(k)} = (x_{i}, c_{y} + α_{k} (y_{i} - c_{y}), c_{z} + α_{k} (z_{i} - c_{z}))    (11)

The final augmented point set is constructed by combining the original points and the contracted points:

P_{aug} = P \cup P^{(1)} \cup P^{(2)} \cup \dots \cup P^{(K)}    (12)

where P^{(k)} denotes the set of contracted points p_{i}^{(k)} generated at step k and K is the number of augmentation steps. If the original cluster contains N points, the augmented set contains N(K + 1) points before duplicate removal. Figure 6 illustrates the proposed centralized point augmentation process for a sparse mid-range pedestrian candidate. The original point distribution is progressively contracted toward the cluster centroid in the projection-relevant y–z plane to stabilize the binary silhouette used for STM-based comparison.
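A minimal sketch of Equations (9) to (12) is given below. The values Δα = 0.2 and α_min = 0.5, the four-point example cluster, and the function name `cpa_augment` are assumptions for illustration; only the structure of the computation follows the equations above. Duplicate removal is omitted.

```python
import numpy as np


def cpa_augment(points, num_steps, delta_alpha, alpha_min):
    """Return the original points stacked with K contracted copies (Eq. (12))."""
    points = np.asarray(points, dtype=float)
    c = points.mean(axis=0)                             # centroid, Eq. (9)
    sets = [points]
    for k in range(1, num_steps + 1):
        alpha = max(alpha_min, 1.0 - k * delta_alpha)   # scaling ratio, Eq. (10)
        contracted = points.copy()                      # x is left unchanged
        contracted[:, 1] = c[1] + alpha * (points[:, 1] - c[1])  # y, Eq. (11)
        contracted[:, 2] = c[2] + alpha * (points[:, 2] - c[2])  # z, Eq. (11)
        sets.append(contracted)
    return np.vstack(sets)


# Hypothetical sparse cluster: four points around y = 0, spanning z = 0 to 1.6 m.
P = np.array([
    [10.0, -0.2, 0.0],
    [10.0,  0.2, 0.0],
    [10.1, -0.2, 1.6],
    [10.1,  0.2, 1.6],
])
P_aug = cpa_augment(P, num_steps=2, delta_alpha=0.2, alpha_min=0.5)
print(P_aug.shape)   # N(K + 1) rows, 3 columns
print(P_aug[4])      # first point after the k = 1 contraction
```

With these assumed settings, the first point (10.0, -0.2, 0.0) moves to (10.0, -0.16, 0.16) at k = 1 and to (10.0, -0.12, 0.32) at k = 2, so the added points fill the interior of the y–z silhouette while the x-coordinate is untouched.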

In this study, CPA is applied only to candidate clusters located in the distance range where point sparsity begins to affect STM stability. The final setting applies two contraction steps to the 15–30 m range. This design is based on the observation that moderate CPA improves mid-range silhouette stability, whereas excessive contraction increases false positives by making non-pedestrian clusters more similar to the seed template.

It should be noted that CPA is a non-physical geometric transformation. Excessive contraction can distort the original cluster shape and reduce the discriminative power of STM. Therefore, CPA is used as a controlled mid-range stabilization mechanism rather than a universal augmentation strategy for all distance ranges.

3.8. Performance Evaluation

The proposed method is evaluated using center-distance-based matching as the primary criterion. This choice is motivated by the semi-automatic labeling objective of the pipeline. Cluster-derived AABBs may tightly follow observed points and may not fully overlap with manually annotated GT boxes, especially when pedestrians are sparsely observed. Therefore, a candidate that is spatially close to a pedestrian center may still obtain a low IoU. Center-distance matching provides a localization-oriented measure of whether the generated candidate is useful for annotation support.

Let c_{pred} and c_{gt} denote the center coordinates of the predicted candidate box and the GT box, respectively. They are computed as

c_{pred} = \frac{P_{min}^{pred} + P_{max}^{pred}}{2}    (13)

c_{gt} = \frac{P_{min}^{gt} + P_{max}^{gt}}{2}    (14)

The center distance is then defined as the Euclidean distance between the two centers:

D_{CD} = \| c_{pred} - c_{gt} \|_{2}    (15)

A candidate is regarded as a true positive if D_{CD} is less than or equal to the predefined threshold. In the final evaluation, the center-distance threshold is set to 1.0 m. One-to-one matching is applied to prevent multiple predictions from being assigned to the same GT instance.
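The matching rule built on Equations (13) to (15) can be sketched as follows. The paper does not state how the one-to-one assignment is resolved, so the greedy nearest-first strategy used here is an assumption, as are the function names and the example boxes. Each box is given as a pair of corner tuples (P_min, P_max).

```python
import math


def box_center(p_min, p_max):
    """Midpoint of the two box corners (Equations (13) and (14))."""
    return tuple((a + b) / 2.0 for a, b in zip(p_min, p_max))


def match_by_center_distance(pred_boxes, gt_boxes, threshold=1.0):
    """Greedy one-to-one matching by ascending D_CD; returns (TP, FP, FN)."""
    pairs = []
    for i, (p_min, p_max) in enumerate(pred_boxes):
        c_pred = box_center(p_min, p_max)
        for j, (g_min, g_max) in enumerate(gt_boxes):
            d_cd = math.dist(c_pred, box_center(g_min, g_max))  # Equation (15)
            if d_cd <= threshold:
                pairs.append((d_cd, i, j))
    pairs.sort()  # closest pairs are assigned first (assumed tie-breaking rule)
    used_pred, used_gt = set(), set()
    for _, i, j in pairs:
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
    tp = len(used_gt)
    return tp, len(pred_boxes) - tp, len(gt_boxes) - tp


# Hypothetical example: one GT pedestrian and three candidate boxes.
gt = [((9.7, -0.3, 0.0), (10.3, 0.3, 1.8))]
preds = [
    ((9.8, -0.2, 0.8), (10.2, 0.2, 1.7)),   # D_CD = 0.35 m, matched
    ((9.9, 0.3, 0.0), (10.5, 0.9, 1.8)),    # within 1.0 m, but the GT is already taken
    ((20.0, 5.0, 0.0), (21.0, 6.0, 1.0)),   # far from any GT
]
print(match_by_center_distance(preds, gt))
```

The second candidate lies within the 1.0 m threshold but is still counted as a false positive, because the one-to-one constraint allows each GT instance to be matched only once.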

Based on this matching result, true positives (TP), false positives (FP), and false negatives (FN) are defined as follows:

TP: GT instances matched with a candidate cluster under the center-distance threshold
FP: candidate clusters not matched with any GT instance
FN: GT instances not matched with any candidate cluster

Precision, recall, and F1-score are computed as follows:

\text{Precision} = \frac{TP}{TP + FP}

\text{Recall} = \frac{TP}{TP + FN}

\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
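The three metrics can be computed from the matching counts as sketched below. The counts in the first example are hypothetical. The second example recomputes the F1-score from the precision and recall values quoted in the abstract (60.5% and 44.7% for STM, 58.8% and 46.7% for the final CPA configuration) as a consistency check. Returning 0 when a denominator is zero is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from TP, FP, and FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical counts: 45 matched GT instances, 30 unmatched candidates, 55 missed GT.
print(precision_recall_f1(tp=45, fp=30, fn=55))

# Consistency check against the percentages quoted in the abstract.
print(round(100 * f1_score(0.605, 0.447), 1))  # STM without CPA
print(round(100 * f1_score(0.588, 0.467), 1))  # final CPA configuration
```

The recomputed values round to 51.4% and 52.1%, in agreement with the reported F1-scores.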

In addition to center-distance-based evaluation, supplementary 3D IoU and BEV IoU analyses are conducted to examine overlap-based localization quality. IoU-based results are not used as the primary metric because the proposed method does not generate confidence scores for AP computation and is designed for candidate generation rather than confidence-ranked detection. Nevertheless, IoU provides useful complementary evidence regarding the spatial overlap between candidate boxes and GT boxes.

3.9. Summary of the Proposed Pipeline

The proposed method consists of two main stages. Stage 1 generates initial pedestrian candidates using distance-aware DBSCAN, ground snapping, and height-based filtering. Stage 2 refines these candidates using seed-dependent STM and, when appropriate, centralized point augmentation. Distance-aware DBSCAN improves candidate generation under varying point density, while STM suppresses false positives using shape similarity to the seed pedestrian. CPA further stabilizes sparse mid-range silhouettes by reinforcing the projection-relevant y–z shape distribution.

Because the method depends on a seed pedestrian for template construction, it is positioned as a seed-dependent, semi-automatic labeling support pipeline. This positioning is consistent with the practical objective of reducing manual annotation effort rather than providing a fully supervised or fully unsupervised 3D detector.

4. Experimental Results

4.1. Experimental Setup and Evaluation Protocol

This section evaluates the proposed distance-aware DBSCAN–STM pipeline with centralized point augmentation (CPA) using pedestrian instances in the KITTI LiDAR dataset. The evaluation is conducted on 1779 frames containing at least one pedestrian ground-truth (GT) instance. In total, 4487 pedestrian GT instances are used for evaluation. The distance ranges are divided into 0–15 m, 15–30 m, 30–45 m, 45–60 m, and beyond 60 m, as described in Section 3.1.

The proposed pipeline is evaluated in two main stages. Stage 1 denotes the initial candidate generation stage, which consists of distance-aware DBSCAN clustering, ground-snapping-based bounding-box refinement, and height-based filtering. Stage 2 denotes the STM-based candidate refinement stage, where candidate clusters are classified by comparing their projected silhouettes with the seed pedestrian template. CPA is further applied to selected distance ranges to analyze its effect on sparse silhouette stabilization.

The primary evaluation metric is based on 3D center-distance (CD) matching between predicted cluster boxes and pedestrian GT boxes. A candidate is considered a true positive if the Euclidean distance between the predicted box center and the GT box center is less than or equal to 1.0 m. One-to-one matching is applied to avoid assigning multiple predictions to the same GT instance. Precision, recall, and F1-score are then computed from TP, FP, and FN.

Although CD-based matching is used as the main evaluation criterion because the proposed method is designed for semi-automatic candidate generation, IoU-based evaluation is additionally conducted as a supplementary analysis. In particular, 3D IoU and BEV IoU are reported to clarify the spatial overlap between cluster-derived boxes and GT boxes. Since the proposed method does not generate confidence scores, AP computation based on ranked detections is not directly applicable. Therefore, the IoU-based results should be interpreted as supplementary localization indicators rather than official KITTI AP results.

The fixed parameters used in the experiments are summarized in Table 2. The DBSCAN eps values are set to 0.15 m, 0.30 m, and 0.50 m for the 0–15 m, 15–30 m, and ≥30 m ranges, respectively. The min_samples value is fixed at 10. The STM cosine similarity threshold τ is set to 0.40. For the final CPA configuration, two contraction steps are applied only to the 15–30 m range.
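The distance-dependent eps setting can be expressed as a simple lookup, sketched below. The constant and function names are hypothetical, and assigning a point at exactly 15 m or 30 m to the farther range is an assumed boundary convention. How the distance of a point or region is measured follows Section 3.1 and is not reproduced here.

```python
import bisect

RANGE_EDGES_M = [15.0, 30.0]         # boundaries between the 0–15, 15–30, and >=30 m ranges
EPS_BY_RANGE_M = [0.15, 0.30, 0.50]  # DBSCAN eps per range, as listed above
MIN_SAMPLES = 10                     # fixed for all ranges


def eps_for_distance(distance_m):
    """Return the DBSCAN eps (in metres) for a given sensor distance."""
    return EPS_BY_RANGE_M[bisect.bisect_right(RANGE_EDGES_M, distance_m)]


for d in (5.0, 22.0, 41.0, 63.0):
    print(d, eps_for_distance(d), MIN_SAMPLES)
```

The returned value and `MIN_SAMPLES` correspond to the `eps` and `min_samples` arguments of a DBSCAN implementation such as `sklearn.cluster.DBSCAN`. Whether the original experiments used that library is not stated in this section.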

4.2. DBSCAN Parameter Exploration

To determine appropriate DBSCAN parameters for pedestrian candidate generation, eps and min_samples are evaluated separately for each distance range. During this parameter exploration, a stricter CD threshold of 0.5 m is used instead of the final 1.0 m threshold. This stricter criterion is adopted to reduce ambiguous matches between neighboring pedestrian instances and to make the effect of DBSCAN parameter changes more distinguishable.

Figure 7 shows the DBSCAN parameter sensitivity results for the near, mid, and far ranges.

As shown in Figure 7, the parameter range that produces meaningful pedestrian candidate clusters changes according to distance. In the near range, eps values around 0.15–0.20 m produce high recall because pedestrian points are relatively dense. In the mid range, the effective eps range shifts to approximately 0.25–0.35 m as pedestrian point density decreases. In the far range, meaningful clustering is observed mainly at larger eps values around 0.40–0.50 m.

These results confirm that a single global DBSCAN parameter setting is not suitable for LiDAR pedestrian candidate generation. If eps is too small, sparse mid- and far-range pedestrian points are fragmented or classified as noise. Conversely, if eps is too large, adjacent objects and background structures may be merged, increasing false positives. Based on the parameter exploration results, eps is set to 0.15 m for the near range, 0.30 m for the mid range, and 0.50 m for the far and ultra-far ranges. The min_samples value is fixed at 10 because increasing min_samples consistently reduces recall under sparse point conditions.

4.3. Effect of STM-Based Candidate Refinement

Table 3 compares the overall pedestrian detection performance between Stage 1 and Stage 2. Stage 1 corresponds to DBSCAN-based candidate generation with height filtering, whereas Stage 2 corresponds to STM-based refinement.

In Stage 1, the proposed pipeline achieves a recall of 55.5%, precision of 27.6%, and F1-score of 36.8%. This result indicates that distance-aware DBSCAN and height-based filtering can generate a relatively broad set of pedestrian candidates, but many false positives remain.

After applying STM in Stage 2, recall decreases from 55.5% to 44.7%, while precision increases from 27.6% to 60.5%, a factor of approximately 2.2 relative to the Stage-1 precision. As a result, the F1-score increases from 36.8% to 51.4%, an absolute improvement of 14.6 percentage points.

These results show that STM functions as a precision-oriented candidate refinement module. It suppresses many false positives generated during the clustering stage while preserving 80.6% of the GT instances that were already detected in Stage 1. However, the recall decrease also indicates that some valid pedestrian clusters are rejected because their projected silhouettes are not sufficiently similar to the seed template.

4.4. Distance-Wise Analysis of STM Performance

Table 4 presents the distance-wise comparison between Stage 1 and Stage 2.

The distance-wise results show that the proposed STM refinement is most effective in the near range. In the 0–15 m range, Stage 1 achieves a recall of 76.2%, precision of 53.7%, and F1-score of 63.0%. After STM refinement, precision increases to 69.4% while recall remains relatively high at 73.0%, resulting in an F1-score of 71.2%. The STM recall relative to Stage 1 (the share of Stage-1 true positives retained after STM) reaches 95.8%, indicating that STM preserves most of the valid near-range pedestrian candidates while suppressing false positives.

In the 15–30 m range, STM improves precision from 24.2% to 48.4%, but recall decreases from 43.5% to 24.7%. The resulting F1-score increases only slightly from 31.1% to 32.7%. This result indicates that STM still suppresses false positives in the mid range, but sparse and incomplete pedestrian silhouettes make it more difficult to preserve true pedestrian candidates.

In the 30–45 m range, recall decreases substantially after STM refinement, from 26.1% to 8.5%. Although precision increases from 6.2% to 22.1%, the final F1-score remains low at 12.3%. This result suggests that the main limitation in the far range is not only false-positive suppression but also the lack of sufficiently stable pedestrian point distributions for template matching.

In the ultra-far ranges beyond 45 m, the number of valid GT instances is small, and the initial DBSCAN stage generates very few meaningful candidate clusters. Therefore, STM and CPA have limited opportunities to improve detection performance. These results indicate that severe raw point sparsity imposes an upper bound on the effectiveness of non-learning-based clustering and template matching.

4.5. Effect of Centralized Point Augmentation

CPA is evaluated to determine whether sparse pedestrian silhouettes can be stabilized before STM-based shape comparison. Since near-range pedestrian clusters are already represented by relatively dense points, CPA is not applied to the 0–15 m range. The analysis focuses on candidate clusters located in the 15–60 m range, where sparsity begins to degrade the stability of projected silhouettes.

Table 5 summarizes the overall STM performance according to the number of CPA contraction steps.

The results show a clear precision–recall trade-off. As the number of CPA steps increases, recall and STM recall relative to Stage 1 gradually increase. This indicates that CPA helps more candidate clusters pass the STM similarity threshold. However, precision consistently decreases as the number of CPA steps increases, meaning that non-pedestrian clusters also become more likely to resemble the seed template after excessive contraction.

Therefore, CPA should not be interpreted as a universal performance enhancer. Instead, it should be understood as a controlled silhouette stabilization mechanism. Excessive CPA reduces the discriminative power of STM and increases false positives. This observation is consistent with the non-physical nature of centroid contraction: while moderate contraction may fill fragmented pedestrian silhouettes, excessive contraction may distort object-specific shape differences.

Figure 8 presents the distance-wise changes in recall, precision, and F1-score according to CPA steps.

The distance-wise analysis shows that the main benefit of CPA appears in the 15–30 m range. Without CPA, the F1-score in this range is 32.7%. With two CPA steps, the F1-score increases to 35.6%, mainly due to improved recall. However, additional CPA steps reduce precision and eventually decrease the F1-score. In the 30–45 m range, CPA increases recall but also causes a larger precision drop, resulting in no clear F1-score improvement. In the 45–60 m range, the small number of detections makes the results unstable and difficult to generalize.

These results indicate that CPA is most useful as a targeted mid-range stabilization method. In the final configuration, CPA is therefore applied only to the 15–30 m range with two contraction steps.

4.6. Final Configuration of the Proposed Pipeline

Based on the CPA sensitivity analysis, the final pipeline applies two CPA contraction steps only to the 15–30 m range. Table 6 compares Stage 1, Stage 2 without CPA, and the final DBSCAN–STM–CPA configuration.

Compared with STM without CPA, the final configuration increases recall from 44.7% to 46.7% and STM recall relative to Stage 1 from 80.6% to 83.7%. Precision decreases slightly from 60.5% to 58.8%. As a result, the overall F1-score increases from 51.4% to 52.1%.

The absolute F1-score improvement of 0.7 percentage points is modest. Therefore, the contribution of CPA should be interpreted carefully. The main effect of CPA is not a large global performance improvement but a targeted improvement in mid-range candidate preservation. In particular, CPA improves the stability of STM for sparse 15–30 m pedestrian clusters. However, it also introduces a precision–recall trade-off and may increase false positives if applied too strongly or to unsuitable distance ranges.

4.7. Comparison with Existing 3D Detectors

Table 7 compares the proposed method with representative KITTI pedestrian detection results reported in the LRPD study under a center-distance-based evaluation protocol. Because the proposed method is a non-learning-based candidate generation pipeline and does not produce confidence-ranked detections, this comparison should not be interpreted as an official KITTI AP comparison. Instead, it provides contextual evidence regarding candidate localization quality under a CD-based matching criterion.

The final DBSCAN–STM–CPA configuration achieves an F1-score of 0.52, precision of 0.58, and recall of 0.46 under the CD-based evaluation protocol. These results are close to the reported PointPillars F1-score of 0.52 in the referenced evaluation setting. However, unless all methods are evaluated on the identical frame subset, distance distribution, and matching implementation, the comparison should be regarded as indicative rather than strictly head-to-head.

The proposed method differs from learning-based detectors in its objective and operating conditions. It does not require annotated training data, class-specific model learning, or confidence-score calibration. Instead, it generates candidate clusters using distance-aware DBSCAN and refines them using a seed-dependent template. Therefore, its practical value lies in semi-automatic labeling support rather than direct competition with state-of-the-art supervised 3D detectors.

4.8. Supplementary IoU and Runtime Analyses

To further assess localization quality and computational practicality, supplementary IoU-based evaluation and runtime analysis were conducted. Although the main evaluation in this study is based on center-distance matching, IoU-based metrics provide additional evidence regarding the spatial overlap between predicted candidate boxes and GT boxes. In addition, runtime analysis was performed to examine whether height-based filtering reduces the computational burden of the STM stage by decreasing the number of candidate clusters.

4.8.1. Supplementary IoU-Based Localization Analysis

Table 8 summarizes the supplementary 3D IoU and BEV IoU results of the proposed pipeline. Since the proposed method generates candidate boxes without confidence scores, IoU is not used for AP computation. Instead, IoU-based matching is used as a complementary localization indicator. Relaxed IoU thresholds are adopted because cluster-derived AABBs often tightly follow the observed LiDAR points, whereas manually annotated GT boxes include the full physical extent of pedestrians, including partially unobserved regions.

The IoU-based results show that the proposed method produces candidate boxes that are not only close to GT centers but also maintain meaningful spatial overlap with pedestrian GT boxes. The final DBSCAN–STM–CPA configuration slightly improves the threshold-based 3D and BEV IoU matching rates compared with Stage 2, while the mean IoU values remain nearly unchanged. This suggests that CPA helps retain additional valid candidates without substantially degrading box-overlap quality, rather than directly improving precise box localization.

The difference between center-distance performance and IoU-based performance also reflects the nature of sparse LiDAR pedestrian observations. A candidate box may be located close to the pedestrian center while still having limited IoU because the observed point cluster represents only a partial portion of the pedestrian body. Therefore, the IoU results should be interpreted as supplementary evidence of localization quality rather than as a replacement for the center-distance-based evaluation used throughout this study.
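The gap between center-distance matching and IoU can be illustrated with a small sketch for axis-aligned boxes. Treating the GT box as axis-aligned is a simplifying assumption (KITTI annotations include a yaw angle), and the box coordinates and function names are hypothetical. The example models a cluster box that tightly covers only the upper half of a pedestrian.

```python
import math


def overlap_1d(a_min, a_max, b_min, b_max):
    """Length of the overlap between two intervals (0 if disjoint)."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def aabb_iou(box_a, box_b, axes=(0, 1, 2)):
    """IoU of two axis-aligned boxes over the given axes.
    axes=(0, 1, 2) gives 3D IoU; axes=(0, 1) gives BEV IoU."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    inter = size_a = size_b = 1.0
    for ax in axes:
        inter *= overlap_1d(a_min[ax], a_max[ax], b_min[ax], b_max[ax])
        size_a *= a_max[ax] - a_min[ax]
        size_b *= b_max[ax] - b_min[ax]
    union = size_a + size_b - inter
    return inter / union if union > 0 else 0.0


def center(box):
    """Midpoint of the two box corners."""
    return tuple((lo + hi) / 2.0 for lo, hi in zip(*box))


gt_box = ((9.7, -0.3, 0.0), (10.3, 0.3, 1.8))        # full pedestrian extent
cluster_box = ((9.8, -0.2, 0.9), (10.2, 0.2, 1.8))   # only the upper body was observed

print(math.dist(center(gt_box), center(cluster_box)))  # center distance in metres
print(aabb_iou(gt_box, cluster_box))                   # 3D IoU
print(aabb_iou(gt_box, cluster_box, axes=(0, 1)))      # BEV IoU
```

In this toy case the center distance is 0.45 m, well within the 1.0 m threshold, while the 3D IoU is only about 0.22 and the BEV IoU about 0.44. The candidate would therefore pass the relaxed 3D IoU ≥ 0.10 and BEV IoU ≥ 0.25 criteria but not a stricter 3D threshold.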

4.8.2. Runtime Analysis

Runtime analysis was conducted to evaluate the computational practicality of the proposed pipeline. Table 9 reports the average per-frame processing time for each major component, including preprocessing, DBSCAN clustering, ground snapping, height filtering, STM, CPA, and total runtime. The analysis compares three configurations: without height filtering, with height filtering, and the final DBSCAN–STM–CPA configuration.

The runtime results show that height-based filtering reduces the number of candidate clusters passed to STM and thereby decreases the computational cost of template matching. Although height filtering itself introduces a small additional processing cost, this overhead is offset by the reduction in STM computation. As a result, the configuration with height filtering achieves a lower total runtime than the configuration without height filtering.

The final DBSCAN–STM–CPA configuration introduces additional computation due to centralized point augmentation. However, because CPA is applied only to the 15–30 m range and uses only two contraction steps in the final setting, its runtime overhead remains limited. This result supports the use of CPA as a targeted mid-range stabilization module rather than as a global augmentation process applied to all candidate clusters.

Overall, the supplementary IoU and runtime analyses provide additional support for the practical applicability of the proposed method. The IoU results show that center-distance-based matches retain meaningful spatial overlap with GT boxes, while the runtime results show that height-based filtering and range-limited CPA can be incorporated without excessive computational cost. These findings support the use of the proposed DBSCAN–STM–CPA pipeline as a semi-automatic pedestrian candidate generation method for LiDAR-based 3D labeling workflows.

4.9. Summary

The experimental results demonstrate that the proposed pipeline is effective as a semi-automatic pedestrian candidate generation and refinement method. Distance-aware DBSCAN improves initial candidate generation under varying point density, while STM substantially improves precision by suppressing false positives. CPA provides a targeted benefit in the mid-range region by stabilizing sparse projected silhouettes, but its overall F1-score improvement is modest and accompanied by a precision–recall trade-off.

Therefore, the final DBSCAN–STM–CPA configuration should be interpreted as a seed-dependent, non-learning-based labeling support pipeline that improves candidate reliability, particularly in near- and mid-range pedestrian scenarios. Its main contribution lies not in outperforming supervised 3D detectors under official AP-based benchmarks but in providing a practical candidate generation framework that can reduce manual annotation effort when labeled data or trained detectors are limited.

5. Discussion

The experimental results demonstrate that the proposed DBSCAN–STM–CPA pipeline is effective as a seed-dependent, semi-automatic pedestrian candidate generation method rather than as a fully supervised or fully unsupervised 3D detector. The main strength of the proposed framework lies in its ability to combine distance-aware geometric clustering with seed-based shape refinement without requiring class-specific network training. This positioning is consistent with the practical objective of supporting 3D annotation workflows, especially when annotated training data are limited or when a human operator can provide a representative pedestrian seed.

5.1. Effectiveness of Distance-Aware Candidate Generation

The DBSCAN parameter sensitivity analysis confirms that LiDAR pedestrian candidate generation is strongly affected by distance-dependent point density. A small eps value is effective in the near range because pedestrian point distributions are relatively dense, whereas larger eps values are required in mid- and far-range regions to avoid fragmenting sparse pedestrian clusters. This observation supports the use of distance-aware DBSCAN parameters rather than a single global parameter setting.

The Stage-1 results show that distance-aware DBSCAN combined with height-based filtering can generate a broad set of pedestrian candidates. However, the relatively low precision indicates that clustering and height filtering alone are insufficient for reliable pedestrian candidate selection. This is expected because non-pedestrian objects or background structures can have spatial extents similar to pedestrians. Therefore, the role of Stage 1 should be understood primarily as recall-oriented candidate generation rather than final pedestrian classification.

5.2. Role of STM as a Precision-Oriented Refinement Module

The STM stage substantially improves precision by comparing candidate silhouettes with a seed pedestrian template. The overall precision increases from 27.6% in Stage 1 to 60.5% in Stage 2, while the F1-score increases from 36.8% to 51.4%. This result indicates that seed-dependent template matching is effective for suppressing false positives generated by the initial clustering stage.

However, the improvement in precision is accompanied by a decrease in recall. This trade-off occurs because STM relies on projected silhouette similarity. When valid pedestrian clusters are sparse, fragmented, partially occluded, or observed from different viewpoints, their binary silhouettes may not sufficiently match the seed template. Therefore, STM is most effective when candidate clusters preserve recognizable pedestrian shapes, particularly in the near range. In the mid- and far-range regions, LiDAR sparsity reduces the stability of silhouette representation and limits the recall-preserving ability of STM.

This result also reinforces the need to interpret the proposed method as a semi-automatic labeling support pipeline. Since STM uses a seed pedestrian instance, the method does not operate as a fully unsupervised detector. Instead, it is more appropriate for scenarios where an operator provides or verifies a representative seed and then uses the generated candidates for subsequent annotation refinement.

5.3. Interpretation of Centralized Point Augmentation

CPA is introduced to stabilize sparse projected silhouettes before STM-based similarity computation. The results show that CPA increases recall and STM recall relative to Stage 1 as the number of contraction steps increases. This suggests that centroid-contracted points can help fragmented candidate silhouettes pass the STM similarity threshold.

Nevertheless, CPA also causes a clear precision–recall trade-off. As the number of CPA steps increases, precision consistently decreases. This means that excessive augmentation can make non-pedestrian clusters appear more similar to the pedestrian seed template. Therefore, CPA should not be interpreted as a general-purpose performance enhancer or as a physical reconstruction of missing pedestrian surfaces. Rather, it should be understood as a controlled geometric transformation that stabilizes mid-range silhouettes for template matching.

The final configuration applies two CPA contraction steps only to the 15–30 m range. This setting reflects the empirical observation that the main benefit of CPA appears in the mid range, where pedestrian clusters are sparse enough to suffer from silhouette fragmentation but still contain sufficient geometric information for shape stabilization. In contrast, the near range does not require CPA because point density is already sufficient, while the far and ultra-far ranges often contain too few points for augmentation to recover meaningful pedestrian shapes.

5.4. Comparison with Learning-Based 3D Detectors

The comparison with representative 3D detectors provides contextual evidence for the candidate localization quality of the proposed method. The final DBSCAN–STM–CPA configuration achieves an F1-score close to the PointPillars result reported under the referenced center-distance-based protocol. However, this comparison should be interpreted carefully. Unless all methods are evaluated on the identical frame subset, distance distribution, preprocessing procedure, and matching implementation, the results should be regarded as indicative rather than strictly head-to-head.

The proposed method differs fundamentally from supervised 3D detectors. Learning-based detectors such as PointPillars, Frustum PointNets, AVOD, and PointCNN are designed to learn class-specific representations from annotated training data and usually generate confidence-ranked detections. In contrast, the proposed method does not learn a detector model or produce calibrated confidence scores. Its practical value lies in providing candidate clusters for semi-automatic labeling, particularly in early-stage dataset construction or data-limited scenarios.

5.5. Limitations

Several limitations remain. First, the current experimental seed selection uses the nearest pedestrian GT instance, which provides a stable and high-quality seed. Although this setting is useful for evaluating the upper-bound behavior of STM-based refinement, it does not fully represent practical labeling scenarios. In real applications, the seed would be selected by a human operator or automatically chosen from candidate clusters without GT class information. Therefore, additional experiments using operator-clicked seeds, randomly selected near-range clusters, or non-GT seed candidates are necessary to evaluate seed sensitivity.

Second, although supplementary 3D IoU and BEV IoU analyses were included to complement center-distance matching, the proposed method still does not provide confidence-ranked detections for official AP-based evaluation. Therefore, future work should further examine stricter IoU thresholds and confidence-score estimation strategies.
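As a rough illustration of how a threshold-based IoU matching rate responds to stricter thresholds, the sketch below computes a BEV IoU for axis-aligned boxes and the fraction of pairs at or above a threshold. Ignoring box yaw is a simplifying assumption (KITTI boxes are oriented), and the box layout, function names, and the choice of denominator are hypothetical rather than taken from the paper.

```python
from typing import Sequence, Tuple

Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def bev_iou_axis_aligned(a: Box2D, b: Box2D) -> float:
    """IoU of two axis-aligned rectangles in the bird's-eye-view plane."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matching_rate(ious: Sequence[float], threshold: float) -> float:
    """Fraction of candidate-GT pairs whose IoU meets the threshold."""
    if not ious:
        return 0.0
    return sum(1 for v in ious if v >= threshold) / len(ious)
```

Sweeping `threshold` over a wider range (for example 0.10, 0.25, and 0.50) on the same set of matched pairs would show how quickly the matching rate degrades under stricter localization requirements.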

Third, the distinct contribution of CPA should be further isolated. Since both morphological operations and CPA aim to reduce silhouette fragmentation, their effects may partially overlap. Additional ablation studies comparing no augmentation, stronger morphological dilation, CPA only, and combined morphology plus CPA would help clarify whether CPA provides benefits beyond conventional silhouette processing.

Fourth, the current method remains limited under severe long-range sparsity. In far and ultra-far ranges, pedestrian clusters often contain too few points to form stable geometric structures. In such cases, neither DBSCAN parameter adjustment nor centroid-based augmentation can fully recover missing object information. This limitation suggests that the proposed approach is most suitable for near- and mid-range semi-automatic labeling support, while long-range pedestrian detection may require additional cues such as image information, temporal accumulation, or learning-based feature enhancement.

Finally, although runtime analysis was included, the current implementation was evaluated in an offline experimental environment. Future work should further validate the pipeline in an interactive annotation interface to assess real-time usability.

5.6. Practical Implications

Despite these limitations, the proposed pipeline provides a practical framework for semi-automatic LiDAR pedestrian labeling. The method can reduce the number of irrelevant candidate clusters before human verification and can improve candidate reliability through seed-based shape matching. Because it does not require class-specific detector training, it can be useful in data-limited situations, early-stage dataset construction, or domain adaptation scenarios where supervised detectors may not yet be available or reliable.

The results also suggest that distance-aware design is important for LiDAR annotation support. Candidate generation, template matching, and augmentation should not be applied uniformly across all distance ranges. Instead, each processing component should be configured according to the point density and shape stability of the target distance range. From this perspective, the proposed DBSCAN–STM–CPA pipeline provides a modular baseline for future semi-automatic labeling systems that combine geometric clustering, operator-provided seeds, and adaptive representation enhancement.

6. Conclusions

This paper proposed a distance-aware DBSCAN–STM pipeline with centralized point augmentation for LiDAR-based pedestrian candidate generation and semi-automatic labeling support. The proposed method was designed to address three practical challenges in sparse LiDAR pedestrian perception: distance-dependent point cloud sparsity, sensitivity of density-based clustering parameters, and instability of template matching under incomplete pedestrian silhouettes [3,4,5,6].

The proposed pipeline first generates pedestrian candidate clusters using distance-aware DBSCAN [13]. By assigning different eps values to different distance ranges, the clustering stage better reflects the density variation in LiDAR point clouds. Ground-snapping-based bounding-box refinement using RANSAC improves the vertical consistency of candidate boxes [14], and height-based filtering reduces irrelevant clusters before template matching. The STM stage then refines the candidate set by comparing PCA-aligned projected silhouettes with a seed pedestrian template [15]. This refinement raises overall precision from 27.6% to 60.5% by suppressing false positives, while recall decreases from 55.5% to 44.7% (Table 3), showing that seed-dependent template matching is useful as a second-stage candidate filtering mechanism.
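The three decision rules summarized above can be sketched with the fixed parameters listed in Table 2. This is only an outline under stated assumptions: the function names are hypothetical, the height tolerance is treated as symmetric, the acceptance test uses "greater than or equal to" the threshold, and silhouettes are represented as flattened binary lists. The clustering itself, PCA alignment, and projection are omitted.

```python
import math
from typing import Sequence

# (lower bound, upper bound, eps) per distance range, values from Table 2.
DBSCAN_EPS_BY_RANGE = [
    (0.0, 15.0, 0.15),
    (15.0, 30.0, 0.30),
    (30.0, math.inf, 0.50),
]
MIN_SAMPLES = 10           # fixed across all distance ranges
HEIGHT_TOLERANCE_M = 0.2   # allowed height difference from the seed pedestrian
STM_THRESHOLD = 0.40       # cosine similarity threshold tau


def eps_for_distance(distance_m: float) -> float:
    """Select the DBSCAN neighborhood radius for a given distance."""
    for lo, hi, eps in DBSCAN_EPS_BY_RANGE:
        if lo <= distance_m < hi:
            return eps
    raise ValueError("distance must be a non-negative number")


def passes_height_filter(candidate_height_m: float, seed_height_m: float,
                         tol: float = HEIGHT_TOLERANCE_M) -> bool:
    """Keep candidates whose box height is close to the seed height."""
    return abs(candidate_height_m - seed_height_m) <= tol


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equally sized flattened silhouettes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty silhouette cannot match the template
    return dot / (norm_a * norm_b)


def stm_accept(candidate: Sequence[float], template: Sequence[float],
               threshold: float = STM_THRESHOLD) -> bool:
    """Accept a candidate when its silhouette is similar enough to the seed."""
    return cosine_similarity(candidate, template) >= threshold
```

Keeping the range table, the height tolerance, and the similarity threshold as separate constants mirrors the modular structure of the pipeline, so each stage can be tuned or ablated independently.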

The experimental results show that the proposed method is most effective in near-range pedestrian scenarios, where LiDAR point density is sufficient to form stable clusters and reliable projected silhouettes. In the near range, STM preserves most valid Stage-1 candidates while improving precision. In the mid range, point sparsity begins to degrade silhouette stability, and centralized point augmentation provides a targeted benefit by improving candidate retention. However, the overall gain from CPA is modest: in the final configuration, recall rises from 44.7% to 46.7% and the F1-score from 51.4% to 52.1%, while precision falls from 60.5% to 58.8% (Table 6), revealing a clear precision–recall trade-off. Excessive augmentation increases false positives because centroid contraction can also make non-pedestrian clusters more similar to the seed template.

Therefore, CPA should not be interpreted as a universal performance enhancer or physical point cloud reconstruction method. Its contribution is better understood as a controlled mid-range stabilization mechanism for seed-dependent template matching. In the final configuration, applying two CPA steps only to the 15–30 m range improves mid-range candidate preservation while avoiding excessive precision loss. In far- and ultra-far-range regions, however, the effectiveness of both STM and CPA is limited because the raw LiDAR observations often contain too few points to form meaningful pedestrian clusters [4,5,6].

The proposed method is also distinct from fully supervised 3D detectors such as PointPillars, Frustum PointNets, AVOD, and PointCNN-based detectors [17,18,19,20]. It does not require class-specific network training, confidence-score calibration, or large annotated datasets. Instead, it provides a non-learning-based, seed-dependent candidate generation framework that can support semi-automatic 3D labeling. For this reason, the method should be evaluated primarily in terms of candidate reliability and annotation assistance rather than direct competition with official AP-based detector benchmarks [6,22].

Several limitations remain. First, the current experimental seed selection relies on a pedestrian GT instance, which provides an upper-bound condition for seed quality. Future work should evaluate operator-clicked seeds or automatically selected non-GT clusters to better reflect practical labeling scenarios. Second, although supplementary 3D IoU and BEV IoU analyses were included, the proposed method is still primarily designed for center-distance-based candidate generation rather than confidence-ranked AP-based 3D detection. Third, the effect of CPA should be validated through additional ablation studies, including comparisons with stronger morphological operations and repeated evaluations with confidence intervals. Finally, the current approach is limited under severe long-range sparsity, where density-based clustering fails to generate reliable initial candidates.

Future work will focus on adaptive CPA control based on distance, point count, and cluster shape quality. In addition, the seed selection strategy will be extended to operator-assisted and non-GT seed scenarios, and the pipeline will be integrated into a practical semi-automatic labeling interface [7,8,9]. Further research will also investigate alternative clustering methods, multi-seed template construction, and hybrid integration with learning-based detectors to improve candidate generation under sparse and long-range LiDAR conditions. The runtime behavior of the pipeline will also be further validated in an interactive annotation environment.

Author Contributions

Conceptualization, J.K. (Joongjin Kook); methodology, J.K. (Joongjin Kook) and J.Y.; software, J.K. (Jinman Kim) and J.Y.; validation, J.K. (Joongjin Kook), J.K. (Jinman Kim) and J.Y.; formal analysis, J.K. (Joongjin Kook), J.K. (Jinman Kim) and J.Y.; investigation, J.K. (Joongjin Kook), J.K. (Jinman Kim) and J.Y.; resources, J.K. (Joongjin Kook), J.K. (Jinman Kim) and J.Y.; data curation, J.K. (Joongjin Kook); writing—original draft preparation, J.K. (Joongjin Kook); writing—review and editing, J.K. (Joongjin Kook); visualization, J.Y.; supervision, J.K. (Joongjin Kook); project administration, J.K. (Joongjin Kook); funding acquisition, J.K. (Joongjin Kook). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a 2025–2026 Research Grant from Sangmyung University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The KITTI dataset used in this study is publicly available from the KITTI Vision Benchmark Suite. The parameter configuration and implementation details are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ko, J.-H.; Choi, D.; Jeong, G.-H.; Lee, H.-W.; Yu, S.; Park, H.; Ma, S.-Y.; Jung, T.-U. Development Trends and Future Prospects of Autonomous Vehicles (Part 1: Technology and Systems). J. Semicond. Disp. Technol. 2025, 24, 21–26.
2. Chen, J.; Jia, K.; Wei, Z. Small but Mighty: A Lightweight Feature Enhancement Strategy for LiDAR Odometry in Challenging Environments. Remote Sens. 2025, 17, 2656.
3. Gupta, S.; Kanjani, J.; Li, M.; Ferroni, F.; Hays, J.; Ramanan, D.; Kong, S. Far3Det: Towards Far-Field 3D Detection. arXiv 2022, arXiv:2211.13858.
4. Duan, Z.; Shao, J.; Zhang, M.; Zhang, J.; Zhai, Z. A Small-Object-Detection Algorithm Based on LiDAR Point-Cloud Clustering for Autonomous Vehicles. Sensors 2024, 24, 5423.
5. Zhang, H.; Yang, D.; Yurtsever, E.; Redmill, K.A.; Özgüner, Ü. Faraway-Frustum: Dealing with LiDAR Sparsity for 3D Object Detection Using Fusion. In Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC); IEEE: New York, NY, USA, 2021; pp. 2646–2652.
6. Fürst, M.; Wasenmüller, O.; Stricker, D. LRPD: Long Range 3D Pedestrian Detection Leveraging Specific Strengths of LiDAR and RGB. In Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC); IEEE: New York, NY, USA, 2020; pp. 1–7.
7. Wu, A.; He, P.; Li, X.; Chen, K.; Ranka, S.; Rangarajan, A. An Efficient Semi-Automated Scheme for Infrastructure LiDAR Annotation. IEEE Trans. Intell. Transp. Syst. 2024, 25, 8237–8247.
8. Sager, C.; Zschech, P.; Kühl, N. labelCloud: A Lightweight Labeling Tool for Domain-Agnostic 3D Object Detection in Point Clouds. Comput.-Aided Des. Appl. 2022, 19, 1191–1206.
9. Zimmer, W.; Rangesh, A.; Trivedi, M.M. 3D BAT: A Semi-Automatic, Web-Based 3D Annotation Toolbox for Full-Surround, Multi-Modal Data Streams. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV); IEEE: New York, NY, USA, 2019; pp. 1816–1821.
10. Soum-Fontez, L.; Deschaud, J.-E.; Goulette, F. Open-Set 3D Object Detection in LiDAR Data as an Out-of-Distribution Problem. arXiv 2024, arXiv:2410.23767.
11. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2012; pp. 3354–3361.
12. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision Meets Robotics: The KITTI Dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
13. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD); AAAI Press: Washington, DC, USA, 1996; pp. 226–231.
14. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395.
15. Liu, K.; Wang, W.; Wang, J. Pedestrian Detection with LiDAR Point Clouds Based on Single Template Matching. Electronics 2019, 8, 780.
16. Choi, J.; Cho, H.; Jung, H.; Lee, W. SALT: A Flexible Semi-Automatic Labeling Tool for General LiDAR Point Clouds with Cross-Scene Adaptability and 4D Consistency. arXiv 2025, arXiv:2503.23980.
17. Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast Encoders for Object Detection from Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2019; pp. 12697–12705.
18. Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2018; pp. 918–927.
19. Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: New York, NY, USA, 2018; pp. 1–8.
20. Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. PointCNN: Convolution on X-Transformed Points. In Advances in Neural Information Processing Systems; NeurIPS: San Diego, CA, USA, 2018; pp. 828–838.
21. Velodyne Acoustics, Inc. HDL-64E S2 High Definition LiDAR Sensor—Data Sheet; Document 63-9194 Rev-B; Velodyne: San Jose, CA, USA, 2014.
22. The KITTI Vision Benchmark Suite. Available online: https://www.cvlibs.net/datasets/kitti/ (accessed on 12 June 2026).
23. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2020; pp. 11621–11631.
24. Zhang, Y.; Zhen, J.; Sun, S.; Liu, T.; Huo, L.; Wang, T. SCAFNet: A Semantic Compensated Adaptive Fusion Network for Remote Sensing Images Change Detection. IEEE Geosci. Remote Sens. Lett. 2026, 23, 6003405.
25. Zhang, Y.; Wang, T.; Xue, L.; Lian, W.; Tao, R. ORSI Salient Object Detection via Progressive Interaction and Saliency-Guided Enhancement. IEEE Geosci. Remote Sens. Lett. 2026, 23, 6002105.

Figure 1. Overall architecture of the proposed semi-automatic DBSCAN–STM–CPA pipeline.

Figure 2. Ground segmentation and camera field-of-view filtering applied to the KITTI LiDAR point cloud.

Figure 3. Bounding-box coordinate refinement using ground snapping: (1) original cluster-derived box, (2) snapping of the bottom face to the estimated ground plane, and (3) final height-corrected candidate box.

Figure 4. Example of height-based candidate filtering in a single KITTI frame: (a) initial candidate clusters generated by distance-aware DBSCAN and ground-snapping refinement and (b) filtered candidate clusters retained after applying the seed-height tolerance. Red boxes indicate candidate clusters before filtering, blue boxes indicate candidate clusters retained after filtering, and the green box indicates the seed pedestrian used as the height reference. The number of candidate clusters is reduced from 36 to 15 after filtering. This example is provided only for visual explanation and does not represent aggregate performance.

Figure 5. Example of pedestrian template generation for STM: (a) camera-view pedestrian instance, (b) corresponding LiDAR point cluster, (c) y–z plane projection after PCA alignment, and (d) binary silhouette after morphological refinement.

Figure 6. Illustration of centralized point augmentation for a mid-range pedestrian candidate. The original sparse point distribution is progressively contracted toward the cluster centroid in the y–z plane. In the final configuration, two contraction steps are applied to stabilize the projected silhouette for STM-based comparison.

Figure 7. DBSCAN parameter sensitivity analysis for pedestrian candidate generation under CD = 0.5 m: (a) near range, 0–15 m; (b) mid range, 15–30 m; and (c) far range, 30–45 m.

Figure 8. Distance-wise effect of centralized point augmentation on STM performance: (a) recall, (b) precision, and (c) F1-score.

Table 1. Pedestrian GT distribution according to distance range.

Distance Range (m)	Number of Pedestrian GT Instances
$0 \leq d < 15$ (Near range)	2093
$15 \leq d < 30$ (Mid range)	1771
$30 \leq d < 45$ (Far range)	468
$45 \leq d < 60$ (Ultra-far range 1)	118
$d \geq 60$ (Ultra-far range 2)	37
Total	4487

Table 2. Fixed parameter settings for preprocessing and candidate generation.

Category	Parameter	Value	Description
Ground Segmentation	z-threshold	−1.3 m	Threshold for initial ground candidate extraction
Field-of-view filtering	Camera FoV	KITTI calibration based	Forward camera-view point selection
RANSAC ground fitting	distance threshold	0.1 m	Inlier threshold for ground plane estimation
RANSAC ground fitting	max iterations	2000	Maximum number of RANSAC iterations
DBSCAN near range	eps	0.15	DBSCAN neighborhood radius for the 0–15 m distance range
DBSCAN mid range	eps	0.30	DBSCAN neighborhood radius for the 15–30 m distance range
DBSCAN far range	eps	0.50	DBSCAN neighborhood radius for distances ≥30 m
DBSCAN	min_samples	10	Fixed across all distance ranges
Height filtering	height tolerance Δh	0.2 m	Allowed height difference from seed pedestrian
STM	cosine similarity threshold τ	0.40	Threshold for template-based classification
Projection	projection plane	y–z plane	Plane used for shape image generation
Projection	projection window size	80 × 32	Fixed window for silhouette generation
Morphology	closing kernel size	3 × 3	Silhouette filling
Morphology	dilation kernel size	3 × 3	Silhouette reinforcement
CPA	application range	15–30 m in final setting	Target range for final augmentation
CPA	scaling decrement Δα	0.10	Reduction interval per contraction step
CPA	minimum scaling ratio α_min	0.20	Lower bound of contraction ratio
CPA	final number of steps	2	Final selected setting

Table 3. Overall pedestrian detection performance of Stage 1 and STM-based Stage 2.

Metric	Stage-1 (DBSCAN-Height Filter)	Stage-2 (DBSCAN-STM)
Recall	55.5%	44.7%
Precision	27.6%	60.5%
F1-score	36.8%	51.4%
STM Recall Relative to Stage 1	-	80.6%
3D Euclidean Error	0.15 m	0.14 m

Table 4. Distance-wise comparison of STM-based detection performance (Stage-1: DBSCAN-Height Filter; Stage-2: DBSCAN-STM).

Distance Range	GT	Stage-1 F1-Score	Stage-1 Precision	Stage-1 Recall	Stage-1 3D Euc. Error	Stage-2 F1-Score	Stage-2 Precision	Stage-2 Recall	Stage-2 3D Euc. Error	STM Recall Relative to Stage 1
0–15 m	2093	63.0%	53.7%	76.2%	0.12 m	71.2%	69.4%	73.0%	0.12 m	95.8%
15–30 m	1771	31.1%	24.2%	43.5%	0.18 m	32.7%	48.4%	24.7%	0.20 m	56.8%
30–45 m	468	10.0%	6.2%	26.1%	0.28 m	12.3%	22.1%	8.5%	0.22 m	32.8%
45–60 m	118	3.2%	4.3%	2.5%	0.50 m	1.4%	4.3%	0.8%	0.13 m	33.3%
≥60 m	37	0.0%	0.0%	0.0%	-	-	-	-	-	-

Table 5. Overall STM performance according to CPA contraction steps.

CPA Steps	STM Recall	STM Precision	STM F1-Score	STM Recall Relative to Stage 1	3D Euclidean Error
0	44.7%	60.5%	51.4%	80.6%	0.14 m
1	46.4%	56.6%	51.0%	83.1%	0.14 m
2	46.9%	52.8%	49.7%	85.8%	0.12 m
3	47.9%	50.3%	49.1%	86.1%	0.14 m
4	48.5%	47.2%	47.9%	86.7%	0.14 m
5	48.6%	45.3%	46.9%	86.9%	0.14 m
6	49.2%	43.1%	45.9%	87.8%	0.12 m
7	49.5%	42.2%	45.6%	88.7%	0.14 m
8	49.8%	41.1%	45.1%	88.8%	0.14 m

Table 6. Final STM Pipeline Results with Centralized Augmentation for the 15–30 m Range.

Metric	Stage-1 (DBSCAN-Height Filter)	Stage-2 (DBSCAN-STM)	DBSCAN–STM–CPA (Final)
Recall	55.5%	44.7%	46.7%
Precision	27.6%	60.5%	58.8%
F1-score	36.8%	51.4%	52.1%
STM Recall Relative to Stage 1	-	80.6%	83.7%
3D Euclidean Error	0.15 m	0.14 m	0.16 m

Table 7. Comparison of KITTI Pedestrian Detection Performance Based on 3D Center Distance.

Method	F1-Score	Precision	Recall	3D Euc. Err.
AVOD	0.48	0.57	0.43	0.14 m
PointPillars	0.52	0.66	0.43	0.11 m
F-PointNet	0.33	0.22	0.62	0.16 m
PointCNN	0.37	0.28	0.55	0.12 m
DBSCAN-Height Filter (Stage-1, Ours)	0.36	0.27	0.55	0.15 m
DBSCAN-STM (Stage-2, Ours)	0.51	0.60	0.44	0.14 m
DBSCAN-STM-CPA (Final, Ours)	0.52	0.58	0.46	0.16 m

Table 8. Supplementary IoU-based localization analysis of the proposed pipeline (threshold-based IoU matching rates are reported as percentages, while mean IoU values are reported as absolute scores).

Method	3D IoU ≥ 0.10	3D IoU ≥ 0.25	BEV IoU ≥ 0.10	BEV IoU ≥ 0.25	Mean 3D IoU	Mean BEV IoU
Stage 1: DBSCAN-Height Filter	88.4%	76.7%	88.6%	79.7%	0.346	0.371
Stage 2: DBSCAN-STM	88.6%	79.0%	89.0%	81.4%	0.354	0.377
Final DBSCAN-STM-CPA	88.9%	79.0%	89.2%	81.6%	0.353	0.376

Table 9. Runtime analysis of the proposed pipeline (ms/frame).

Configuration	Preprocessing	DBSCAN	Ground Snapping	Height Filtering	STM	CPA	Total Runtime
Without height filtering	6.400	65.187	48.529	0.000	31.384	0.000	151.500
With height filtering	6.400	65.187	48.529	0.220	4.786	0.000	125.121
Final DBSCAN-STM-CPA	6.400	65.187	48.529	0.220	5.757	0.130	126.223


