Special Issue "Deep Perception in Autonomous Driving"

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electrical and Autonomous Vehicles".

Deadline for manuscript submissions: 31 December 2023 | Viewed by 8994

Special Issue Editors

Departments of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zürich, Switzerland
Interests: autonomous driving; deep learning; image/video segmentation
School of Software, Shandong University, Ji'nan 250100, China
Interests: autonomous driving; computer vision; deep learning
Departments of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zürich, Switzerland
Interests: autonomous driving; deep learning; embodied AI; human-centric visual understanding; vision-language reasoning

Special Issue Information

Dear Colleagues,

The perception of the physical environment plays an essential role in autonomous driving. Starting with the technical equipment inside vehicles, autonomous driving is ushering in fundamental changes: cameras and various other sensors are installed so that autonomous driving systems can better recognize their surroundings. This opens remarkable opportunities to realize innovative autonomous driving functions, but it also poses significant challenges for the perception system and the associated multimodal data processing and understanding modules. With this Special Issue, we aim to showcase the latest advances and trends in deep learning-based techniques for building ‘autonomous driving friendly’ perception models.

This Special Issue will feature original research papers related to models and algorithms for perception tasks in autonomous driving. The main topics of interest include (but are not limited to):

  • Visual, LiDAR and radar perception
  • 2D/3D object detection, 2D/3D object tracking
  • Domain adaptation for classification/detection/segmentation
  • Scene parsing, semantic segmentation, instance segmentation and panoptic segmentation
  • Human-centric visual understanding, human–human/object interaction understanding
  • Human activity understanding, human intention modeling
  • Person re-identification, pose estimation and part parsing
  • Vehicle detection, pedestrian detection and road detection
  • New benchmark datasets and survey papers related to the topics

Dr. Tianfei Zhou
Prof. Dr. Xiankai Lu
Dr. Wenguan Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • autonomous vehicles
  • artificial intelligence
  • visual perception
  • deep learning

Published Papers (11 papers)


Research

Article
Image-Based Pothole Detection Using Multi-Scale Feature Network and Risk Assessment
Electronics 2023, 12(4), 826; https://doi.org/10.3390/electronics12040826 - 06 Feb 2023
Viewed by 518
Abstract
Potholes on road surfaces pose a serious hazard to vehicles and passengers because they are difficult to detect and leave little time to respond. Therefore, many government agencies are applying various pothole-detection algorithms for road maintenance. However, it is unclear whether current object-detection-based methods can run in real time on low-spec hardware systems. In this study, SPFPN-YOLOv4 tiny was developed by combining spatial pyramid pooling and a feature pyramid network with CSPDarknet53-tiny. A total of 2665 images were obtained via data augmentation, such as gamma adjustment, horizontal flipping, and scaling, to compensate for the lack of data, and were divided into training, validation, and test sets at a 70%, 20%, and 10% ratio, respectively. In a comparison of YOLOv2, YOLOv3, YOLOv4 tiny, and SPFPN-YOLOv4 tiny, SPFPN-YOLOv4 tiny showed an approximately 2–5% improvement in mean average precision (intersection over union = 0.5). In addition, a risk assessment based on the proposed SPFPN-YOLOv4 tiny was computed by comparing the tire contact patch size with the pothole size using the pinhole camera model and a distance estimation equation. In conclusion, we developed an end-to-end algorithm that can detect potholes and classify their risks in real time from 2D pothole images.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
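The risk-assessment step above converts pixel measurements into metric pothole size with a pinhole camera model and a distance estimate, then compares the result with the tire contact patch. A minimal sketch of that idea follows; the focal length, camera height, and 0.20 m tire contact patch width are assumed values, not figures from the paper.

```python
def distance_from_ground_plane(v_bottom_px, v0_px, focal_length_px, camera_height_m):
    """Rough ground-plane distance from the image row of the pothole's lower
    edge, assuming a level road and zero camera pitch."""
    rows_below_horizon = v_bottom_px - v0_px
    if rows_below_horizon <= 0:
        raise ValueError("pothole must lie below the horizon line")
    return camera_height_m * focal_length_px / rows_below_horizon

def pothole_width_m(pixel_width, distance_m, focal_length_px):
    """Pinhole model: real width = pixel width * distance / focal length (in pixels)."""
    return pixel_width * distance_m / focal_length_px

def risk_level(width_m, tire_patch_width_m=0.20):
    """Hypothetical three-level rule comparing pothole size with the tire
    contact patch width (0.20 m is an assumed value)."""
    if width_m >= tire_patch_width_m:
        return "high"
    if width_m >= 0.5 * tire_patch_width_m:
        return "medium"
    return "low"

# Example with assumed camera parameters (focal length 1400 px, mounting height 1.3 m).
d = distance_from_ground_plane(v_bottom_px=620, v0_px=360, focal_length_px=1400, camera_height_m=1.3)
w = pothole_width_m(pixel_width=55, distance_m=d, focal_length_px=1400)
print(f"distance {d:.2f} m, width {w:.2f} m, risk {risk_level(w)}")  # ~7.00 m, ~0.28 m, high
```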

Article
3D Point Cloud Stitching for Object Detection with Wide FoV Using Roadside LiDAR
Electronics 2023, 12(3), 703; https://doi.org/10.3390/electronics12030703 - 31 Jan 2023
Viewed by 526
Abstract
Light Detection and Ranging (LiDAR) is widely used in perceiving the physical environment for object detection and tracking tasks. Current methods and datasets are mainly developed for autonomous vehicles and cannot be directly used for roadside perception. This paper presents a 3D point cloud stitching method for object detection with a wide horizontal field of view (FoV) using roadside LiDAR. Firstly, the base detection model is trained on the KITTI dataset and achieves a detection accuracy of 88.94. Then, a new detection range of 180° can be inferred to break the limitation of the camera's FoV. Finally, multiple sets of detection results from a single LiDAR are stitched together to build a 360° detection range and solve the problem of overlapping objects. The effectiveness of the proposed approach has been evaluated using the KITTI dataset and collected point clouds. The experimental results show that the point cloud stitching method offers a cost-effective solution for achieving a larger FoV, and the number of output objects increased by 77.15% over the base model, which improves the detection performance of roadside LiDAR.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
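The stitching step runs the front-sector base detector on rotated copies of one roadside sweep and maps the boxes back into a single 360° result. The numpy sketch below illustrates one plausible version of that pipeline; the `detector` callable stands in for the KITTI-trained base model, and the 90° sector width is an assumption.

```python
import numpy as np

def yaw_matrix(theta):
    """Rotation about the vertical (z) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def detect_360(points, detector, sector_deg=90.0):
    """Run a front-sector detector on rotated copies of one LiDAR sweep and
    map the boxes back, yielding a 360-degree result from a single model.

    `detector` takes an (N, 3) point array and returns boxes as
    (x, y, z, l, w, h, yaw) in the rotated frame.
    """
    boxes = []
    for k in range(int(round(360.0 / sector_deg))):
        theta = np.deg2rad(k * sector_deg)
        rotated = points @ yaw_matrix(-theta).T          # bring sector k to the front
        for b in detector(rotated):
            centre = yaw_matrix(theta) @ np.asarray(b[:3])   # map the centre back
            boxes.append([*centre, b[3], b[4], b[5], b[6] + theta])
    return np.asarray(boxes)   # duplicates in overlapping regions still need NMS

# Example with a dummy detector that returns one fixed box per call.
dummy = lambda pts: [(10.0, 0.0, -1.0, 4.0, 1.8, 1.5, 0.0)]
print(detect_360(np.random.rand(1000, 3), dummy).shape)   # -> (4, 7)
```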

Article
D-STGCN: Dynamic Pedestrian Trajectory Prediction Using Spatio-Temporal Graph Convolutional Networks
Electronics 2023, 12(3), 611; https://doi.org/10.3390/electronics12030611 - 26 Jan 2023
Viewed by 765
Abstract
Predicting pedestrian trajectories in urban scenarios is a challenging task with a wide range of applications, from video surveillance to autonomous driving. The task is difficult because pedestrian behavior is affected by each individual's path history, their interactions with others, and the environment. For predicting pedestrian trajectories, an attention-based, interaction-aware spatio-temporal graph neural network is introduced. This paper presents an approach based on two components: a spatial graph neural network (SGNN) for interaction modeling and a temporal graph neural network (TGNN) for motion feature extraction. The SGNN uses an attention method to periodically collect spatial interactions between all pedestrians. The TGNN also employs an attention method, this time to collect each pedestrian's temporal motion pattern. Finally, a time-extrapolator convolutional neural network (CNN) is applied along the temporal dimension of the graph features to predict the trajectories. With a smaller data and model size and better accuracy, the proposed method is more compact and efficient than Social-STGCNN. Moreover, on three video surveillance datasets (ETH, UCY, and SDD), D-STGCN achieves better experimental results in terms of the average displacement error (ADE) and final displacement error (FDE) metrics, in addition to predicting more social trajectories.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
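The SGNN described above weighs pedestrian-to-pedestrian interactions with attention. As a rough illustration only, the sketch below applies scaled dot-product attention over per-pedestrian embeddings in a single frame; the embedding dimension and the use of the features themselves as queries and keys are assumptions rather than the paper's exact design.

```python
import numpy as np

def spatial_attention(features):
    """Scaled dot-product attention over pedestrians in one frame.

    features: (N, d) array, one embedding per pedestrian (e.g. encoded
    position/velocity). Returns interaction-aware features (N, d) and the
    (N, N) attention matrix used as a soft adjacency.
    """
    n, d = features.shape
    scores = features @ features.T / np.sqrt(d)        # pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)            # row-wise softmax
    return attn @ features, attn

# Four pedestrians with hypothetical 8-dimensional embeddings.
emb = np.random.randn(4, 8)
mixed, adj = spatial_attention(emb)
print(mixed.shape, adj.shape)   # (4, 8) (4, 4)
```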

Article
Time Synchronization and Space Registration of Roadside LiDAR and Camera
Electronics 2023, 12(3), 537; https://doi.org/10.3390/electronics12030537 - 20 Jan 2023
Viewed by 640
Abstract
A sensing system consisting of Light Detection and Ranging (LiDAR) and a camera provides complementary information about the surrounding environment. To take full advantage of the multi-source data provided by different sensors, an accurate fusion of multi-source sensor information is needed. Time synchronization and space registration are the key technologies that affect the fusion accuracy of multi-source sensors. Due to differences in data acquisition frequency and deviations in startup time between the LiDAR and the camera, their data acquisition easily becomes asynchronous, which has a significant influence on subsequent data fusion. Therefore, a time synchronization method for multi-source sensors based on frequency self-matching is developed in this paper. Without changing the sensor frequencies, the sensor data are processed to obtain the same number of data frames and assigned the same ID numbers, so that the LiDAR and camera data correspond one to one. Finally, the data frames are merged into new data packets to realize time synchronization between the LiDAR and camera. Building on time synchronization, spatial synchronization is achieved with a nonlinear optimization algorithm for the joint calibration parameters, which can effectively reduce the reprojection error during sensor spatial registration. The accuracy of the proposed time synchronization method is 99.86% and the space registration accuracy is 99.79%, which is better than the calibration method of the Matlab calibration toolbox.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
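The frequency self-matching idea, pairing frames so that each LiDAR sweep and its closest camera frame share an ID without changing either sensor's rate, can be sketched as nearest-timestamp matching; the 10 Hz/30 Hz rates and the field names below are assumptions.

```python
import numpy as np

def match_frames(lidar_ts, camera_ts):
    """Pair each LiDAR sweep with the camera frame closest in time and give
    both the same ID, so the two streams correspond one to one without
    changing either sensor's acquisition frequency."""
    cam_idx = np.searchsorted(camera_ts, lidar_ts)
    cam_idx = np.clip(cam_idx, 1, len(camera_ts) - 1)
    left, right = camera_ts[cam_idx - 1], camera_ts[cam_idx]
    nearest = np.where(np.abs(lidar_ts - left) <= np.abs(lidar_ts - right),
                       cam_idx - 1, cam_idx)
    return [{"id": i, "lidar_t": float(lt), "camera_t": float(camera_ts[j])}
            for i, (lt, j) in enumerate(zip(lidar_ts, nearest))]

# Assumed rates: 10 Hz LiDAR, 30 Hz camera, with a small start-up offset.
lidar_ts = 0.037 + np.arange(0, 5, 0.1)
camera_ts = np.arange(0, 5, 1 / 30)
packets = match_frames(lidar_ts, camera_ts)
print(packets[0])   # e.g. {'id': 0, 'lidar_t': 0.037, 'camera_t': 0.0333...}
```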

Article
VMLH: Efficient Video Moment Location via Hashing
Electronics 2023, 12(2), 420; https://doi.org/10.3390/electronics12020420 - 13 Jan 2023
Cited by 1 | Viewed by 604
Abstract
Video moment location by query is a hot topic in video understanding. However, most existing methods ignore the importance of location efficiency in practical application scenarios; the video and query sentence have to be fed into the network together during retrieval, which leads to low efficiency. To address this issue, we propose efficient video moment location via hashing (VMLH). In the proposed method, query sentences and video clips are converted into hash codes and hash code sets, respectively, in which the semantic similarity between query sentences and video clips is preserved. The location prediction network is designed to predict the corresponding timestamp according to the similarity among hash codes, and the videos do not need to be fed into the network during retrieval and location. Furthermore, unlike existing methods, which require complex interactions and fusion between video and query sentences, the proposed VMLH method only needs a simple XOR operation among codes to locate the video moment with high efficiency. This work lays the foundation for fast video clip positioning and makes large-scale video clip positioning practical. The experimental results on two public datasets demonstrate the effectiveness of the method.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
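The efficiency claim rests on matching query and clip hash codes with an XOR followed by a bit count (Hamming distance). A minimal sketch of that lookup on packed 64-bit codes follows; the code length is an assumption, and in VMLH the codes would come from the learned hashing networks rather than a random generator.

```python
import numpy as np

def hamming_distances(query_code, clip_codes):
    """XOR the query code against every clip code and count the differing
    bits. Codes are binary vectors packed into uint8 with np.packbits."""
    xor = np.bitwise_xor(clip_codes, query_code)          # broadcast over clips
    return np.unpackbits(xor, axis=1).sum(axis=1)         # popcount per clip

# Hypothetical 64-bit codes for 1000 video clips and one query sentence.
rng = np.random.default_rng(0)
clip_codes = np.packbits(rng.integers(0, 2, size=(1000, 64), dtype=np.uint8), axis=1)
query_code = np.packbits(rng.integers(0, 2, size=(1, 64), dtype=np.uint8), axis=1)
dist = hamming_distances(query_code, clip_codes)
print(dist.shape, dist.argmin())   # nearest clip by Hamming distance
```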

Article
Few-Shot Learning Based on Double Pooling Squeeze and Excitation Attention
Electronics 2023, 12(1), 27; https://doi.org/10.3390/electronics12010027 - 21 Dec 2022
Viewed by 798
Abstract
Training a generalized, reliable model is a great challenge, since sufficient labeled data are unavailable in some open application scenarios. Few-shot learning (FSL), which aims to learn new problems from only a few examples, can tackle this problem and has attracted extensive attention. This paper proposes a novel few-shot learning method based on double pooling squeeze and excitation attention (dSE), which improves the discriminative ability of the model through a novel feature expression. Specifically, the proposed dSE module adopts two types of pooling to emphasize features responding to foreground object channels. We employed both a pixel descriptor and a channel descriptor to capture locally identifiable channel features and pixel features of an image (in contrast to traditional few-shot learning methods). Additionally, to improve the robustness of the model, we designed a new loss function. To verify the performance of the method, a large number of experiments were performed on multiple standard few-shot image benchmark datasets, showing that our framework outperforms several existing approaches. Moreover, we performed extensive experiments on three more challenging fine-grained few-shot datasets, and the results demonstrate that the proposed method achieves state-of-the-art performance. In particular, this work achieves 92.36% accuracy under the 5-way 5-shot classification setting on the Stanford Cars dataset.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
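A squeeze-and-excitation block driven by two kinds of pooling, roughly in the spirit of the dSE module, might look as follows in PyTorch; the reduction ratio and the shared MLP over the two pooled descriptors are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DoublePoolSE(nn.Module):
    """Squeeze-and-excitation with both average and max pooling, so channels
    that respond to foreground objects are emphasised (a sketch of the dSE idea)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        avg = x.mean(dim=(2, 3))                 # average-pooled channel descriptor
        mx = x.amax(dim=(2, 3))                  # max-pooled channel descriptor
        weights = torch.sigmoid(self.mlp(avg) + self.mlp(mx))   # shared MLP
        return x * weights.unsqueeze(-1).unsqueeze(-1)

feat = torch.randn(2, 64, 21, 21)                # e.g. a few-shot feature map
print(DoublePoolSE(64)(feat).shape)              # torch.Size([2, 64, 21, 21])
```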

Article
Selection of Relevant Geometric Features Using Filter-Based Algorithms for Point Cloud Semantic Segmentation
Electronics 2022, 11(20), 3310; https://doi.org/10.3390/electronics11203310 - 14 Oct 2022
Viewed by 690
Abstract
Semantic segmentation of mobile LiDAR point clouds is an essential task in many fields, such as road network management, mapping, urban planning, and 3D High Definition (HD) city maps for autonomous vehicles. This study presents an approach to improving the evaluation metrics of deep-learning-based point cloud semantic segmentation using 3D geometric features and filter-based feature selection. The Information Gain (IG), Chi-square (Chi2), and ReliefF algorithms are used to select relevant features. RandLA-Net and Superpoint Graph (SPG), two current and effective deep learning networks, were chosen for semantic segmentation. RandLA-Net and SPG were fed the geometric features in addition to the 3D coordinates (x, y, z) directly, without any change in the structure of the point clouds. Experiments were carried out on three challenging mobile LiDAR datasets: Toronto3D, SZTAKI-CityMLS, and Paris. The study demonstrated that selecting relevant features improved accuracy on all datasets. For RandLA-Net, the mean Intersection-over-Union (mIoU) was 70.1% with the features selected with Chi2 on the Toronto3D dataset, 84.1% with the features selected with IG on the SZTAKI-CityMLS dataset, and 55.2% with the features selected with IG and ReliefF on the Paris dataset. For SPG, 69.8% mIoU was obtained with Chi2 on Toronto3D, 77.5% with IG on SZTAKI-CityMLS, and 59.0% with IG and ReliefF on Paris.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
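The filter-based selection step can be approximated with standard scikit-learn scorers; the sketch below assumes the per-point geometric features (planarity, linearity, and so on) are already computed and non-negative, as Chi-square requires, and uses mutual information as a stand-in for information gain.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

def select_geometric_features(geom, labels, k=5, method="chi2"):
    """Rank per-point geometric features with a filter method and keep the
    top k; the kept columns would then be appended to the (x, y, z)
    coordinates fed to RandLA-Net or SPG."""
    scorer = chi2 if method == "chi2" else mutual_info_classif   # IG stand-in
    selector = SelectKBest(scorer, k=k).fit(geom, labels)
    return selector.get_support(indices=True), selector.transform(geom)

# Hypothetical data: 10,000 points, 12 geometric features, 8 semantic classes.
rng = np.random.default_rng(0)
geom = rng.random((10_000, 12))            # non-negative, as chi2 requires
labels = rng.integers(0, 8, size=10_000)
idx, selected = select_geometric_features(geom, labels)
print(idx, selected.shape)                 # indices of the 5 kept features
```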

Article
Anchor-Free Object Detection with Scale-Aware Networks for Autonomous Driving
Electronics 2022, 11(20), 3303; https://doi.org/10.3390/electronics11203303 - 13 Oct 2022
Cited by 1 | Viewed by 611
Abstract
Current anchor-free object detectors do not rely on anchors and obtain accuracy comparable to anchor-based detectors. However, anchor-free detectors that adopt a single-level feature map and lack a feature pyramid network (FPN) have no prior information about an object's scale; thus, they adapt insufficiently to large variations in object scale, especially for autonomous driving in complex road scenes. To address this problem, we propose a divide-and-conquer solution and attempt to introduce prior information about object scale variation into the model while maintaining a streamlined network structure. Specifically, for small-scale objects, we add dense layer-jump connections between the shallow high-resolution feature layers and the deep high-semantic feature layers. For large-scale objects, dilated convolution is used to cover their features. Based on this, a scale adaptation module is proposed. In this module, different dilated convolution expansion rates are utilized to change the network's receptive field size, which can adapt to changes from small to large scales. The experimental results show that the proposed model has better detection performance across different object scales than existing detectors.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
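A scale adaptation module built from parallel dilated convolutions with different expansion rates, as described above, can be sketched in PyTorch as follows; the number of branches and the specific dilation rates are assumptions.

```python
import torch
import torch.nn as nn

class ScaleAdaptation(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation rates; their outputs
    are fused so a single-level feature map covers several object scales."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x):
        outs = [torch.relu(branch(x)) for branch in self.branches]
        return self.fuse(torch.cat(outs, dim=1))

x = torch.randn(1, 256, 64, 64)                  # single-level backbone feature
print(ScaleAdaptation(256)(x).shape)             # torch.Size([1, 256, 64, 64])
```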

Article
Monoscopic Phase Measuring Deflectometry Simulation and Verification
Electronics 2022, 11(10), 1634; https://doi.org/10.3390/electronics11101634 - 20 May 2022
Cited by 1 | Viewed by 758
Abstract
The three-dimensional (3D) shape of specular surfaces is important in aerospace, precision instrumentation, and automotive manufacturing. Phase measuring deflectometry (PMD) is an efficient and highly accurate technique for measuring specular surfaces. A novel simulation model with simulated fringe patterns for monoscopic PMD is developed in this study. Based on the pre-calibration and the ray-tracing model of the monoscopic PMD system, a comprehensive model from deformed pattern generation to shape reconstruction was constructed. Experimental results showed that this model achieved high measuring accuracy for both planar and concave surfaces. In planar surface measurement, the peak-to-valley (PV) and root mean square (RMS) values of the reconstructed shape reach 26.93 nm and 10.32 nm, respectively. In addition, the accuracy of the reconstructed concave surface reaches the micrometre scale. This work potentially fills critical gaps in monoscopic PMD simulation and provides a cost-effective approach to PMD study.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
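One common building block behind simulated fringe patterns is sinusoidal fringe generation with four-step phase shifting and wrapped-phase recovery; the sketch below shows that standard formulation, which is a typical choice rather than necessarily the exact scheme used in the paper.

```python
import numpy as np

def fringe_patterns(width=640, height=480, period_px=32):
    """Four sinusoidal fringes shifted by 90 degrees each, as displayed on the screen."""
    x = np.arange(width)
    phase = 2 * np.pi * x / period_px
    shifts = [0, np.pi / 2, np.pi, 3 * np.pi / 2]
    return [np.tile(0.5 + 0.5 * np.cos(phase + s), (height, 1)) for s in shifts]

def wrapped_phase(i1, i2, i3, i4):
    """Standard four-step phase-shifting recovery of the wrapped phase."""
    return np.arctan2(i4 - i2, i1 - i3)

patterns = fringe_patterns()
phi = wrapped_phase(*patterns)
print(phi.shape, phi.min(), phi.max())   # (480, 640), values in (-pi, pi]
```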

Article
Few-Shot Object Detection Method Based on Knowledge Reasoning
Electronics 2022, 11(9), 1327; https://doi.org/10.3390/electronics11091327 - 22 Apr 2022
Viewed by 988
Abstract
Human beings can quickly recognize novel concepts with the help of scene semantics. Reproducing this ability is a meaningful challenge for the field of machine learning. At present, object recognition methods based on deep learning achieve excellent results using large-scale labeled data. However, the data scarcity of novel objects significantly affects the performance of these recognition methods. In this work, we investigated utilizing knowledge reasoning together with visual information in the training of a novel object detector. We trained a detector to project the image representations of objects into an embedding space. Knowledge subgraphs were extracted to describe the semantic relations of the specified visual scenes. Spatial relationships, functional relationships, and attribute descriptions were defined to realize reasoning about novel classes. The designed few-shot detector, named KR-FSD, is robust and stable to variations in the number of shots of novel objects, and it also has advantages when detecting objects in complex environments due to the flexible extensibility of knowledge graphs (KGs). Experiments on the VOC and COCO datasets showed that the performance of the detector increased significantly when the novel class was strongly associated with some of the base classes, owing to better knowledge propagation between the novel class and the related groups of classes.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)

Article
Multi-Task Learning Using Gradient Balance and Clipping with an Application in Joint Disparity Estimation and Semantic Segmentation
Electronics 2022, 11(8), 1217; https://doi.org/10.3390/electronics11081217 - 12 Apr 2022
Viewed by 742
Abstract
In this paper, we propose a novel multi-task learning (MTL) strategy from the gradient optimization view that automatically learns the optimal gradient from different tasks. In contrast to current multi-task learning methods, which rely on careful network architecture adjustment or elaborate loss function optimization, the proposed gradient-based MTL is simple and flexible. Specifically, we introduce multi-task stochastic gradient descent optimization (MTSGD) to learn task-specific and shared representations in a deep neural network. In MTSGD, we decompose the total gradient into multiple task-specific sub-gradients and find the optimal sub-gradient via gradient balance and clipping operations. In this way, the learned network can satisfy the requirements of each specific task while maintaining the shared representation. We take the joint learning of semantic segmentation and disparity estimation as an exemplar to verify the effectiveness of the proposed method. Extensive experimental results on a large-scale dataset show that our proposed algorithm outperforms the baseline methods by a large margin. Meanwhile, we perform a series of ablation studies to analyze gradient descent for MTL in depth.
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)
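A gradient balance and clipping step of the kind described, computing per-task sub-gradients, rescaling them to comparable magnitudes, clipping, and applying the combined update, can be sketched with PyTorch autograd; the balancing rule used here (normalising each task gradient to the mean norm) and the element-wise clipping are assumptions, not necessarily the paper's exact formulas.

```python
import torch

def balanced_gradient_step(model, losses, optimizer, clip_norm=1.0):
    """One multi-task SGD step: per-task sub-gradients are rescaled to a common
    norm (balance), clipped element-wise, summed, and applied to the shared weights."""
    params = [p for p in model.parameters() if p.requires_grad]
    task_grads = []
    for loss in losses:                                   # one loss per task
        grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
        task_grads.append([g if g is not None else torch.zeros_like(p)
                           for g, p in zip(grads, params)])

    norms = [torch.sqrt(sum(g.pow(2).sum() for g in grads)) for grads in task_grads]
    target = torch.stack(norms).mean()                    # common norm target
    for p, *per_task in zip(params, *task_grads):
        combined = sum((target / (n + 1e-12)) * g for n, g in zip(norms, per_task))
        p.grad = combined.clamp(-clip_norm, clip_norm)    # element-wise clipping
    optimizer.step()

# Toy shared trunk with two task losses (segmentation- and disparity-like stand-ins).
model = torch.nn.Linear(16, 2)
x, y = torch.randn(8, 16), torch.randn(8, 2)
out = model(x)
losses = [torch.nn.functional.mse_loss(out[:, 0], y[:, 0]),
          torch.nn.functional.l1_loss(out[:, 1], y[:, 1])]
opt = torch.optim.SGD(model.parameters(), lr=0.1)
balanced_gradient_step(model, losses, opt)
print([p.grad.norm().item() for p in model.parameters()])
```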
