Search Results (11)

Search Parameters:
Keywords = Waymo open dataset

22 pages, 989 KiB  
Article
Intra-Frame Graph Structure and Inter-Frame Bipartite Graph Matching with ReID-Based Occlusion Resilience for Point Cloud Multi-Object Tracking
by Shaoyu Sun, Chunhao Shi, Chunyang Wang, Qing Zhou, Rongliang Sun, Bo Xiao, Yueyang Ding and Guan Xi
Electronics 2024, 13(15), 2968; https://doi.org/10.3390/electronics13152968 - 27 Jul 2024
Cited by 2 | Viewed by 1007
Abstract
Three-dimensional multi-object tracking (MOT) using lidar point cloud data is crucial for applications in autonomous driving, smart cities, and robotic navigation. It involves identifying objects in point cloud sequence data and consistently assigning unique identities to them throughout the sequence. Occlusions can lead to missed detections, resulting in incorrect data associations and ID switches. To address these challenges, we propose a novel point cloud multi-object tracker called GBRTracker. Our method integrates an intra-frame graph structure into the backbone to extract and aggregate spatial neighborhood node features, significantly reducing detection misses. We construct an inter-frame bipartite graph for data association and design a sophisticated cost matrix based on the center, box size, velocity, and heading angle. We then use a minimum-cost flow algorithm to achieve globally optimal matching, thereby reducing ID switches. For unmatched detections, we design a motion-based re-identification (ReID) feature embedding module, which uses velocity and the heading angle to calculate similarity and association probability, reconnecting unmatched detections with their corresponding trajectory IDs or initializing new tracks. Our method maintains high accuracy and reliability, significantly reducing ID switches and trajectory fragmentation, even in challenging scenarios. We validate the effectiveness of GBRTracker through comparative and ablation experiments on the nuScenes and Waymo Open Datasets, demonstrating its superiority over state-of-the-art methods.
(This article belongs to the Section Computer Science & Engineering)
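To make the association step concrete, here is a minimal sketch of a cost matrix built from center distance, box-size difference, velocity difference, and heading-angle difference, solved with the Hungarian algorithm as a stand-in for the paper's minimum-cost-flow solver. The weights and the [x, y, z, l, w, h, v, yaw] layout are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def association_cost(tracks, dets, w_center=1.0, w_size=0.5, w_vel=0.5, w_yaw=0.3):
    """Pairwise cost between existing tracks and new detections.

    Both inputs are arrays of shape (N, 8): [x, y, z, l, w, h, v, yaw].
    The weights are illustrative, not the values used in the paper.
    """
    cost = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            c_center = np.linalg.norm(t[:3] - d[:3])
            c_size = np.linalg.norm(t[3:6] - d[3:6])
            c_vel = abs(t[6] - d[6])
            c_yaw = abs(np.arctan2(np.sin(t[7] - d[7]), np.cos(t[7] - d[7])))  # wrapped angle difference
            cost[i, j] = (w_center * c_center + w_size * c_size
                          + w_vel * c_vel + w_yaw * c_yaw)
    return cost

tracks = np.array([[0, 0, 0, 4, 2, 1.5, 5, 0.0],
                   [10, 5, 0, 4, 2, 1.5, 3, 1.2]])
dets = np.array([[0.5, 0.2, 0, 4, 2, 1.5, 5, 0.1],
                 [10.4, 5.1, 0, 4, 2, 1.5, 3, 1.1]])
row, col = linear_sum_assignment(association_cost(tracks, dets))   # globally optimal one-to-one matching
print(dict(zip(row.tolist(), col.tolist())))                       # {0: 0, 1: 1}: track index -> detection index
```

For a plain one-to-one assignment the Hungarian and min-cost-flow formulations coincide; the flow formulation additionally lets tracks and detections remain unmatched, which this sketch omits.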

18 pages, 10878 KiB  
Article
Edge-Triggered Three-Dimensional Object Detection Using a LiDAR Ring
by Eunji Song, Seyoung Jeong and Sung-Ho Hwang
Sensors 2024, 24(6), 2005; https://doi.org/10.3390/s24062005 - 21 Mar 2024
Cited by 1 | Viewed by 1801
Abstract
Recognition technology for autonomous driving must be able to quickly and accurately recognize even small objects in high-speed situations. This study proposes an object point extraction method using rule-based LiDAR ring data and edge triggers to increase both speed and performance. The LiDAR’s ring information is interpreted as a digital pulse to remove the ground, and object points are extracted by detecting discontinuous edges of the z value aligned with the ring ID and azimuth. Bounding boxes were then created from the extracted object points using DBSCAN and PCA to check recognition performance. Ground removal and point extraction through the ring-edge method were verified on SemanticKITTI and the Waymo Open Dataset, and both F1 scores were superior to RANSAC. In addition, the extracted bounding boxes also showed higher PDR index performance when verified on open datasets, in virtual driving environments, and in actual driving environments.
(This article belongs to the Section Intelligent Sensors)
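A rough sketch of the edge-trigger idea under simplifying assumptions (points already carry a ring index and azimuth, ground already removed, and a z-jump threshold picked purely for illustration):

```python
import numpy as np

def ring_edge_points(points, ring_ids, z_jump=0.15):
    """Flag candidate object points by detecting discontinuous z edges per ring.

    points:   (N, 4) array of [x, y, z, azimuth]
    ring_ids: (N,)   integer ring index per point
    z_jump:   illustrative threshold on the z discontinuity between
              azimuth-adjacent points of the same ring
    """
    keep = np.zeros(len(points), dtype=bool)
    for ring in np.unique(ring_ids):
        idx = np.where(ring_ids == ring)[0]
        order = idx[np.argsort(points[idx, 3])]      # sort the ring by azimuth
        dz = np.abs(np.diff(points[order, 2]))       # z difference to the next point
        edges = np.where(dz > z_jump)[0]             # rising/falling "edge triggers"
        keep[order[edges]] = True
        keep[order[edges + 1]] = True
    return keep
```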

23 pages, 2966 KiB  
Article
DVST: Deformable Voxel Set Transformer for 3D Object Detection from Point Clouds
by Yaqian Ning, Jie Cao, Chun Bao and Qun Hao
Remote Sens. 2023, 15(23), 5612; https://doi.org/10.3390/rs15235612 - 3 Dec 2023
Cited by 6 | Viewed by 3116
Abstract
The use of a transformer backbone in LiDAR point-cloud-based models for 3D object detection has recently gained significant interest. The larger receptive field of the transformer backbone improves its representation capability but also results in excessive attention being given to background regions. To solve this problem, we propose a novel approach called deformable voxel set attention, which we use to create a deformable voxel set transformer (DVST) backbone for 3D object detection from point clouds. The DVST aims to effectively integrate the flexible receptive field of the deformable mechanism and the powerful context modeling capability of the transformer. Specifically, we introduce the deformable mechanism into voxel-based set attention to selectively transfer candidate keys and values of foreground queries to important regions. An offset generation module was designed to learn the offsets of the foreground queries. Furthermore, a globally responsive convolutional feed-forward network with residual connection is presented to capture global feature interactions in hidden space. We verified the validity of the DVST on the KITTI and Waymo Open Datasets by constructing single-stage and two-stage models. The findings indicate that the DVST enhances the average precision of the baseline model while preserving computational efficiency, achieving performance comparable to state-of-the-art methods.
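For intuition about what deformable set attention over voxels looks like, the following is a toy single-head layer: each query predicts a few 3D offsets, gathers the nearest voxel feature at each shifted location, and attends over that small set. It is a sketch under strong simplifications (nearest-neighbour gathering instead of interpolation, no foreground selection, no globally responsive feed-forward network), not the DVST code.

```python
import torch
import torch.nn as nn

class SimpleDeformableSetAttention(nn.Module):
    """Toy single-head deformable attention over a set of voxel features."""
    def __init__(self, dim: int, num_samples: int = 4):
        super().__init__()
        self.num_samples = num_samples
        self.offset = nn.Linear(dim, num_samples * 3)   # 3D offset per sample, predicted from the query
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.scale = dim ** -0.5

    def forward(self, feats, coords):
        # feats: (N, C) voxel features, coords: (N, 3) voxel centres
        offsets = self.offset(feats).view(-1, self.num_samples, 3)   # (N, K, 3)
        sample_xyz = coords[:, None, :] + offsets                    # shifted sampling locations
        # nearest-neighbour gather as a stand-in for trilinear interpolation
        d = torch.cdist(sample_xyz.reshape(-1, 3), coords)           # (N*K, N)
        idx = d.argmin(dim=1).view(-1, self.num_samples)             # (N, K)
        sampled = feats[idx]                                         # (N, K, C)
        q = self.q(feats)[:, None, :]                                # (N, 1, C)
        k, v = self.kv(sampled).chunk(2, dim=-1)                     # (N, K, C) each
        attn = ((q @ k.transpose(1, 2)) * self.scale).softmax(dim=-1)
        return (attn @ v).squeeze(1)                                 # (N, C)

layer = SimpleDeformableSetAttention(dim=64)
out = layer(torch.randn(128, 64), torch.rand(128, 3) * 50)           # 128 voxels in a 50 m cube
```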

13 pages, 2399 KiB  
Article
Voxel Transformer with Density-Aware Deformable Attention for 3D Object Detection
by Taeho Kim and Joohee Kim
Sensors 2023, 23(16), 7217; https://doi.org/10.3390/s23167217 - 17 Aug 2023
Cited by 2 | Viewed by 3354
Abstract
The Voxel Transformer (VoTr) is a prominent model in the field of 3D object detection, employing a transformer-based architecture to capture long-range voxel relationships through self-attention. However, although self-attention expands the receptive field, VoTr’s flexibility is constrained because that receptive field is predefined. In this paper, we present a Voxel Transformer with Density-Aware Deformable Attention (VoTr-DADA), a novel approach to 3D object detection. VoTr-DADA leverages density-guided deformable attention for a more adaptable receptive field. It efficiently identifies key areas in the input using density features, combining the strengths of both VoTr and Deformable Attention. We introduce the Density-Aware Deformable Attention (DADA) module, which is specifically designed to focus on these crucial areas while adaptively extracting more informative features. Experimental results on the KITTI dataset and the Waymo Open Dataset show that our proposed method outperforms the baseline VoTr model in 3D object detection while maintaining a fast inference speed.
(This article belongs to the Section Sensing and Imaging)

15 pages, 3037 KiB  
Article
3D-DIoU: 3D Distance Intersection over Union for Multi-Object Tracking in Point Cloud
by Sazan Ali Kamal Mohammed, Mohd Zulhakimi Ab Razak and Abdul Hadi Abd Rahman
Sensors 2023, 23(7), 3390; https://doi.org/10.3390/s23073390 - 23 Mar 2023
Cited by 11 | Viewed by 3705
Abstract
Multi-object tracking (MOT) is a prominent and important task in point cloud processing and computer vision. The main objective of MOT is to predict full tracklets of several objects in a point cloud sequence. Occlusion and similar objects are two common problems that reduce an algorithm’s performance throughout the tracking phase. Current MOT techniques, which adopt the ‘tracking-by-detection’ paradigm, suffer degrading tracking performance, as evidenced by increasing numbers of identification (ID) switches and tracking drifts, because it is difficult to accurately predict the location of objects in complex scenes. Since an occluded object may have been visible in former frames, we use the speed and position of the object in previous frames to estimate where the occluded object is likely to be. In this paper, we employ a unique intersection over union (IoU) method in three-dimensional (3D) space, namely distance-IoU non-maximum suppression (DIoU-NMS), to accurately detect objects, and we then use 3D-DIoU for the object association process in order to increase tracking robustness and speed. By using a hybrid 3D DIoU-NMS and 3D-DIoU method, the tracking speed improves significantly. Experimental findings on the Waymo Open Dataset and the nuScenes dataset demonstrate that our multistage data association and tracking technique has clear benefits over previously developed algorithms in terms of tracking accuracy. In comparison with other 3D MOT methods, our proposed approach demonstrates a significant enhancement in tracking performance.
(This article belongs to the Section Intelligent Sensors)
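As a concrete reference for the distance-IoU term, here is a minimal axis-aligned version: the usual 3D IoU penalized by the squared center distance normalized by the squared diagonal of the smallest enclosing box. Box rotation, which the paper handles, is ignored in this sketch.

```python
import numpy as np

def diou_3d(box_a, box_b):
    """3D distance-IoU for axis-aligned boxes [x, y, z, l, w, h] (centre + size).

    DIoU = IoU - d^2 / c^2, where d is the centre distance and c the diagonal
    of the smallest box enclosing both. Heading angles are ignored in this sketch.
    """
    a_min, a_max = box_a[:3] - box_a[3:] / 2, box_a[:3] + box_a[3:] / 2
    b_min, b_max = box_b[:3] - box_b[3:] / 2, box_b[:3] + box_b[3:] / 2

    inter = np.prod(np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None))
    union = np.prod(box_a[3:]) + np.prod(box_b[3:]) - inter
    iou = inter / union

    d2 = np.sum((box_a[:3] - box_b[:3]) ** 2)                      # squared centre distance
    enclose = np.maximum(a_max, b_max) - np.minimum(a_min, b_min)  # enclosing box extents
    c2 = np.sum(enclose ** 2)                                      # squared diagonal
    return iou - d2 / c2

print(diou_3d(np.array([0, 0, 0, 4, 2, 1.5]), np.array([0.5, 0.3, 0, 4, 2, 1.5])))
```

In DIoU-NMS, this value simply replaces plain IoU in the suppression test against the highest-scoring box.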

18 pages, 4232 KiB  
Article
Vehicle Detection on Occupancy Grid Maps: Comparison of Five Detectors Regarding Real-Time Performance
by Nils Defauw, Marielle Malfante, Olivier Antoni, Tiana Rakotovao and Suzanne Lesecq
Sensors 2023, 23(3), 1613; https://doi.org/10.3390/s23031613 - 2 Feb 2023
Cited by 5 | Viewed by 4465
Abstract
Occupancy grid maps are widely used as an environment model that allows the fusion of different range sensor technologies in real time for robotics applications. In an autonomous vehicle setting, occupancy grid maps are especially useful for their ability to accurately represent the position of surrounding obstacles while being robust to discrepancies between the fused sensors through the use of occupancy probabilities representing uncertainty. In this article, we propose to evaluate the applicability of real-time vehicle detection on occupancy grid maps. State-of-the-art detectors in sensor-specific domains, such as YOLOv2/YOLOv3 for images or PIXOR for LiDAR point clouds, are modified to use occupancy grid maps as input and produce oriented bounding boxes enclosing vehicles as output. The five proposed detectors are trained on the Waymo Open automotive dataset and compared regarding the quality of their detections, measured in terms of Average Precision (AP), and their real-time capabilities, measured in Frames Per Second (FPS). Of the five detectors presented, one inspired by the PIXOR backbone reaches the highest AP0.7 of 0.82 and runs at 20 FPS. Comparatively, two other proposed detectors inspired by YOLOv2 achieve an almost as good AP0.7 of 0.79 while running at 91 FPS. These results validate the feasibility of real-time vehicle detection on occupancy grids.
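To illustrate the input representation, the snippet below rasterizes a LiDAR sweep into a plain binary bird's-eye-view grid; the extent and cell size are illustrative, and the grids used in the paper carry occupancy probabilities fused over sensors rather than a single binary flag.

```python
import numpy as np

def to_occupancy_grid(points, x_range=(-40.0, 40.0), y_range=(-40.0, 40.0), cell=0.2):
    """Rasterise LiDAR points (N, 3) into a binary bird's-eye-view occupancy grid."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((ny, nx), dtype=np.uint8)

    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    grid[iy[valid], ix[valid]] = 1          # mark occupied cells
    return grid

grid = to_occupancy_grid(np.random.uniform(-50, 50, size=(10000, 3)))
print(grid.shape, grid.sum())               # (400, 400) and the number of occupied cells
```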

19 pages, 9394 KiB  
Article
Comparison of Pedestrian Detectors for LiDAR Sensor Trained on Custom Synthetic, Real and Mixed Datasets
by Paweł Jabłoński, Joanna Iwaniec and Wojciech Zabierowski
Sensors 2022, 22(18), 7014; https://doi.org/10.3390/s22187014 - 16 Sep 2022
Cited by 8 | Viewed by 3491
Abstract
Deep learning algorithms for object detection used in autonomous vehicles require a huge amount of labeled data. Data collection and labeling are time-consuming and, most importantly, in most cases useful only for a single specific sensor application. Therefore, in the course of the research presented in this paper, the LiDAR pedestrian detection algorithm was trained on synthetically generated data and mixed (real and synthetic) datasets. The road environment was simulated with the 3D rendering Carla engine, while the data for analysis were obtained from the LiDAR sensor model. In the proposed approach, the data generated by the simulator are automatically labeled, reshaped into range images, and used as training data for a deep learning algorithm. Real data from the Waymo Open Dataset are used to validate the performance of detectors trained on synthetic, real, and mixed datasets. The YOLOv4 neural network architecture is used for pedestrian detection from the LiDAR data. The goal of this paper is to verify whether synthetically generated data can improve the detector’s performance. The presented results show that the YOLOv4 model trained on a custom mixed dataset achieved an increase in precision and recall of a few percent, giving an F1-score of 0.84.
(This article belongs to the Section Radar Sensors)
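A minimal spherical-projection sketch of the "reshaped into range images" step; the field of view, resolution, and last-point-wins overwrite rule are illustrative assumptions, not the paper's sensor-model parameters.

```python
import numpy as np

def to_range_image(points, h=64, w=1024, fov_up=15.0, fov_down=-25.0):
    """Project LiDAR points (N, 3) onto an (h, w) range image of distances in metres."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)

    yaw = np.arctan2(y, x)                                    # horizontal angle in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1, 1))

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w     # column from azimuth
    v = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).astype(int)
    v = np.clip(v, 0, h - 1)                                  # row from elevation

    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r                                           # keep the last point per pixel (simplification)
    return image
```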

20 pages, 7143 KiB  
Article
AFE-RCNN: Adaptive Feature Enhancement RCNN for 3D Object Detection
by Feng Shuang, Hanzhang Huang, Yong Li, Rui Qu and Pei Li
Remote Sens. 2022, 14(5), 1176; https://doi.org/10.3390/rs14051176 - 27 Feb 2022
Cited by 11 | Viewed by 3449
Abstract
The point clouds scanned by lidar are generally sparse, which can result in fewer sampling points on objects. To perform precise and effective 3D object detection, it is necessary to improve the feature representation ability so as to extract more feature information from the object points. Therefore, we propose an adaptive feature enhanced 3D object detection network based on point clouds (AFE-RCNN). AFE-RCNN is a point-voxel integrated network. We first voxelize the raw point clouds and obtain the voxel features through a 3D voxel convolutional neural network. Then, the 3D feature vectors are projected to the 2D bird’s eye view (BEV), and the relationship between the features in both the spatial and channel dimensions is learned by the proposed residual of dual attention proposal generation module. High-quality 3D box proposals are generated based on the BEV features and an anchor-based approach. Next, we sample key points from the raw point clouds to summarize the information of the voxel features, and obtain the key point features through a multi-scale feature extraction module based on adaptive feature adjustment. Neighboring contextual information is integrated into each key point through this module, and the robustness of feature processing is also guaranteed. Lastly, we aggregate the features of the BEV, voxels, and point clouds as the key point features that are used for proposal refinement. In addition, to ensure the correlation among the vertices of the bounding box, we propose a refinement loss function module with vertex associativity. AFE-RCNN exhibits performance comparable to state-of-the-art methods on the KITTI dataset and the Waymo Open Dataset. On the KITTI 3D detection benchmark, for the moderate difficulty level of the car and cyclist classes, the 3D detection mean average precisions of AFE-RCNN reach 81.53% and 67.50%, respectively.
(This article belongs to the Special Issue Advances in Deep Learning Based 3D Scene Understanding from LiDAR)
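The key-point sampling step can be pictured with a plain farthest-point-sampling routine like the one below; the abstract does not state which sampler AFE-RCNN uses, so this is only a common stand-in for "sample key points from raw point clouds".

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Greedy farthest point sampling over (N, 3) coordinates; returns selected indices."""
    n = len(points)
    selected = np.zeros(n_samples, dtype=int)
    dist = np.full(n, np.inf)
    selected[0] = 0                               # start from an arbitrary point
    for i in range(1, n_samples):
        delta = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, delta)            # distance to the nearest already-selected point
        selected[i] = int(dist.argmax())          # pick the point farthest from the selected set
    return selected

keypoints = farthest_point_sampling(np.random.rand(5000, 3) * 70, 2048)
```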

23 pages, 5360 KiB  
Article
On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data
by Manuel Carranza-García, Jesús Torres-Mateo, Pedro Lara-Benítez and Jorge García-Gutiérrez
Remote Sens. 2021, 13(1), 89; https://doi.org/10.3390/rs13010089 - 29 Dec 2020
Cited by 198 | Viewed by 19286
Abstract
Object detection using remote sensing data is a key task of the perception systems of self-driving vehicles. While many generic deep learning architectures have been proposed for this problem, there is little guidance on their suitability when used in a particular scenario such as autonomous driving. In this work, we aim to assess the performance of existing 2D detection systems on a multi-class problem (vehicles, pedestrians, and cyclists) with images obtained from the on-board camera sensors of a car. We evaluate several one-stage (RetinaNet, FCOS, and YOLOv3) and two-stage (Faster R-CNN) deep learning meta-architectures under different image resolutions and feature extractors (ResNet, ResNeXt, Res2Net, DarkNet, and MobileNet). These models are trained using transfer learning and compared in terms of both precision and efficiency, with special attention to the real-time requirements of this context. For the experimental study, we use the Waymo Open Dataset, which is the largest existing benchmark. Despite the rising popularity of one-stage detectors, our findings show that two-stage detectors still provide the most robust performance. Faster R-CNN models outperform one-stage detectors in accuracy and are also more reliable in the detection of minority classes. Faster R-CNN with a Res2Net-101 backbone achieves the best speed/accuracy trade-off but needs lower-resolution images to reach real-time speed. Furthermore, the anchor-free FCOS detector is a slightly faster alternative to RetinaNet, with similar precision and lower memory usage.
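The Average Precision figures quoted above can be reproduced from matched detections with a short all-point AP routine such as this sketch; the scores and true-positive flags are placeholders for detections already matched to ground truth at IoU 0.7.

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """All-point AP from detection confidences and true-positive flags."""
    order = np.argsort(-scores)                    # sort detections by confidence
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(~is_tp[order])
    recall = tp / n_gt
    precision = tp / (tp + fp)
    # make precision monotonically decreasing, then integrate it over recall
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.3])
is_tp = np.array([True, True, False, True, False])
print(average_precision(scores, is_tp, n_gt=4))    # 0.6875 for this toy example
```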

14 pages, 6420 KiB  
Article
An LSTM-Based Autonomous Driving Model Using a Waymo Open Dataset
by Zhicheng Gu, Zhihao Li, Xuan Di and Rongye Shi
Appl. Sci. 2020, 10(6), 2046; https://doi.org/10.3390/app10062046 - 18 Mar 2020
Cited by 43 | Viewed by 13239
Abstract
The Waymo Open Dataset was released recently, providing a platform to crowdsource some fundamental challenges for automated vehicles (AVs), such as 3D detection and tracking. While the dataset provides a large amount of high-quality and multi-source driving information, researchers in academia are more interested in the underlying driving policy programmed into Waymo self-driving cars, which is inaccessible due to AV manufacturers’ proprietary protection. Accordingly, academic researchers have to make various assumptions to implement AV components in their models or simulations, which may not represent the realistic interactions in real-world traffic. Thus, this paper introduces an approach to learning a long short-term memory (LSTM)-based model that imitates the behavior of Waymo’s self-driving model. The proposed model has been evaluated based on Mean Absolute Error (MAE). The experimental results show that our model outperforms several baseline models in driving action prediction. In addition, a visualization tool is presented for verifying the performance of the model.
(This article belongs to the Special Issue Intelligent Transportation Systems: Beyond Intelligent Vehicles)
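As a sketch of the kind of model described, here is a minimal LSTM regressor that maps a short history of kinematic features to a next-step driving action and is trained with an L1 (MAE) objective; the feature layout, layer sizes, and action definition are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class DrivingLSTM(nn.Module):
    """Predict the next acceleration and yaw rate from a history of kinematic states."""
    def __init__(self, n_features=6, hidden=64, n_actions=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x):              # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # regress from the last time step

model = DrivingLSTM()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()                  # mean absolute error, matching the evaluation metric

history = torch.randn(32, 10, 6)       # 32 sequences of 10 past states (illustrative features)
target = torch.randn(32, 2)            # next-step [acceleration, yaw rate]
loss = loss_fn(model(history), target)
loss.backward()
optim.step()
```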

18 pages, 6308 KiB  
Article
Real-Time Vehicle Detection Framework Based on the Fusion of LiDAR and Camera
by Limin Guan, Yi Chen, Guiping Wang and Xu Lei
Electronics 2020, 9(3), 451; https://doi.org/10.3390/electronics9030451 - 7 Mar 2020
Cited by 41 | Viewed by 7300
Abstract
Vehicle detection is essential for driverless systems. However, the current single-sensor detection mode is no longer sufficient in complex and changing traffic environments. Therefore, this paper combines a camera and light detection and ranging (LiDAR) to build a vehicle-detection framework that is highly adaptable, real-time capable, and robust. First, a multi-adaptive high-precision depth-completion method is proposed to convert the sparse 2D LiDAR depth map into a dense depth map, so that the two sensors are aligned with each other at the data level. Then, the You Only Look Once Version 3 (YOLOv3) real-time object detection model is used to detect vehicles in both the color image and the dense depth map. Finally, a decision-level fusion method based on bounding box fusion and improved Dempster–Shafer (D–S) evidence theory is proposed to merge the two results of the previous step and obtain the final vehicle position and distance information, which not only improves the detection accuracy but also improves the robustness of the whole framework. We evaluated our method on the KITTI dataset and the Waymo Open Dataset, and the results show the effectiveness of the proposed depth-completion method and multi-sensor fusion strategy.
(This article belongs to the Section Electrical and Autonomous Vehicles)
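To make the decision-level fusion step concrete, below is a tiny classic Dempster's-rule combination of two detectors' beliefs over a {vehicle, other} frame with an "unknown" (full-frame) mass; the mass values are illustrative, and the paper uses an improved D–S variant rather than this plain rule.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions over the frame {'vehicle', 'other'} plus 'unknown'.

    'unknown' stands for the full frame of discernment (total ignorance).
    Classic Dempster's rule with conflict renormalisation.
    """
    hypotheses = ("vehicle", "other", "unknown")
    combined = dict.fromkeys(hypotheses, 0.0)
    conflict = 0.0
    for a in hypotheses:
        for b in hypotheses:
            product = m1[a] * m2[b]
            if a == b:
                combined[a] += product
            elif a == "unknown":
                combined[b] += product   # intersecting the full frame with a singleton keeps the singleton
            elif b == "unknown":
                combined[a] += product
            else:
                conflict += product      # 'vehicle' and 'other' are disjoint: conflicting evidence
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

camera = {"vehicle": 0.7, "other": 0.1, "unknown": 0.2}   # illustrative masses, not paper values
lidar = {"vehicle": 0.6, "other": 0.2, "unknown": 0.2}
print(dempster_combine(camera, lidar))                     # fused belief, e.g. vehicle 0.85
```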
