Robust BEV 3D Object Detection for Vehicles with Tire Blow-Out
Abstract
1. Introduction
- Based on the vibration characteristics of vehicles with a tire blow-out, we establish noise models for cameras at different mounting positions to simulate camera deviation in real scenarios.
- A geometry-guided auto-resizable kernel transformer method, namely GARKT, is proposed to address the perception problem in tire blow-out situations robustly and efficiently (a minimal sketch of the underlying geometry guidance follows this list).
- Experimental results demonstrate that GARKT can handle tire blow-out situations and achieve acceptable 3D object detection performance, which greatly enhances driving safety.
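As context for the GARKT design: the geometry-guided kernel transformer (GKT) that it extends (Chen et al., arXiv:2206.04584, in the reference list) restricts each BEV query's attention to a small kernel of image features around that query's geometric projection into the camera view. The NumPy sketch below illustrates this projection and kernel-indexing step under a simple pinhole model; all function and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

def project_bev_to_image(bev_xyz, K, R, t):
    """Project BEV grid centers (N, 3) in the ego frame to pixel coordinates.

    K: (3, 3) intrinsics; R (3, 3), t (3,): ego-to-camera extrinsics.
    Returns (N, 2) pixel coordinates (u, v); points behind the camera
    should additionally be masked out in a full implementation.
    """
    cam = bev_xyz @ R.T + t                          # ego frame -> camera frame
    uvw = cam @ K.T                                  # pinhole projection
    return uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)

def kernel_offsets(uv, k):
    """Integer pixel locations of a k x k kernel around each projected point.

    These are the image-feature positions each BEV query attends to; GARKT
    varies k per camera according to the deviation degree (Section 3.3).
    """
    offs = np.arange(k) - k // 2
    du, dv = np.meshgrid(offs, offs, indexing="xy")   # (k, k) grids
    centers = np.round(uv).astype(np.int64)          # (N, 2)
    return centers[:, None, None, :] + np.stack([du, dv], axis=-1)  # (N, k, k, 2)
```

With accurate extrinsics the kernel lands on the correct image region; a tire blow-out perturbs R and t, which is what Section 3.1 models and what the auto-resizable kernels of Section 3.3 compensate for.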
2. Related Work
2.1. Camera-Based 3D Perception
2.2. Camera Calibration
3. Method
3.1. Modeling of Camera Deviation for Vehicles with Tire Blow-Out
Algorithm 1 Modeling of camera deviation: the deviation quantities are computed step by step according to Formulas (6) and (7) (separately), then Formula (1), Formulas (3)–(5), and finally Formula (2).
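Formulas (1)–(7) are not reproduced in this excerpt, so the following sketch only illustrates the effect of Algorithm 1: each camera's extrinsic rotation is perturbed by small pitch/roll angles whose magnitudes depend on the vibration amplitude at its mounting position. The Gaussian noise model, function names, and parameters are assumptions standing in for the paper's formulas.

```python
import numpy as np

def rot_x(a):
    """Rotation about the x-axis (roll) by angle a in radians."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    """Rotation about the y-axis (pitch) by angle a in radians."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def perturb_extrinsics(R, sigma_pitch, sigma_roll, rng=None):
    """Apply a random pitch/roll deviation to an extrinsic rotation matrix.

    sigma_pitch and sigma_roll stand in for the vibration amplitudes that
    Formulas (1)-(7) derive per mounting position; cameras closer to the
    blown tire would receive larger values.
    """
    if rng is None:
        rng = np.random.default_rng()
    return rot_y(rng.normal(0.0, sigma_pitch)) @ rot_x(rng.normal(0.0, sigma_roll)) @ R
```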
The Position of Tire Blow-Out | Degree 1 | Degree 2 | Degree 3 | Degree 4 | Degree 5 |
---|---|---|---|---|---|
Left-front tire | FL | F, BL | FR | BR | B |
Left-rear tire | B | BL | FL, BR | F | FR |
Right-front tire | FR | F, BR | FL | BL | B |
Right-rear tire | B | BR | FR, BL | F | FL |
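For implementation purposes, the table can be transcribed into a lookup so that each camera's deviation degree drives the downstream kernel selection. The dictionary below is a direct transcription of the table (F, FL, FR, B, BL, BR denote the six surround-view cameras); the data structure itself is our own illustration, not the paper's code.

```python
# Deviation degree (1 = largest deviation) of each surround-view camera
# for a given blow-out position, transcribed from the table above.
DEGREE_BY_BLOWOUT = {
    "left-front":  {"FL": 1, "F": 2, "BL": 2, "FR": 3, "BR": 4, "B": 5},
    "left-rear":   {"B": 1, "BL": 2, "FL": 3, "BR": 3, "F": 4, "FR": 5},
    "right-front": {"FR": 1, "F": 2, "BR": 2, "FL": 3, "BL": 4, "B": 5},
    "right-rear":  {"B": 1, "BR": 2, "FR": 3, "BL": 3, "F": 4, "FL": 5},
}
```

For example, with the left-front tire blown, the front-left camera deviates most (degree 1) while the back camera deviates least (degree 5).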
3.2. Overall Architecture
3.3. Configuration of Kernel
4. Experiment
4.1. Experiment Setting
4.2. Main Results
4.3. Noisy Extrinsics Analysis
4.4. Convergence and Inference Speed
4.5. Visualization Results
4.6. Real Tire Blow-Out Experiment
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915.
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705.
- Bewley, A.; Sun, P.; Mensink, T.; Anguelov, D.; Sminchisescu, C. Range conditioned dilated convolutions for scale invariant 3D object detection. arXiv 2020, arXiv:2005.09927.
- Ma, X.; Wang, Z.; Li, H.; Zhang, P.; Ouyang, W.; Fan, X. Accurate monocular 3D object detection via color-embedded 3D reconstruction for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6851–6860.
- Zhang, R.; Qiu, H.; Wang, T.; Xu, X.; Guo, Z.; Qiao, Y.; Gao, P.; Li, H. MonoDETR: Depth-aware transformer for monocular 3D object detection. arXiv 2022, arXiv:2203.13310.
- Rukhovich, D.; Vorontsova, A.; Konushin, A. ImVoxelNet: Image to voxels projection for monocular and multi-view general-purpose 3D object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 2397–2406.
- Ma, Y.; Wang, T.; Bai, X. Vision-Centric BEV Perception: A Survey. arXiv 2022, arXiv:2208.02797.
- Qian, R.; Lai, X.; Li, X. 3D Object Detection for Autonomous Driving: A Survey. Pattern Recognit. 2022, 130, 108796.
- Reading, C.; Harakeh, A.; Chae, J.; Waslander, S.L. Categorical Depth Distribution Network for Monocular 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
- Chen, X.; Kundu, K.; Zhang, Z.; Ma, H.; Fidler, S.; Urtasun, R. Monocular 3D Object Detection for Autonomous Driving. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2147–2156.
- Mousavian, A.; Anguelov, D.; Košecká, J.; Flynn, J. 3D bounding box estimation using deep learning and geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 5632–5640.
- Liu, Z.; Wu, Z.; Toth, R. SMOKE: Single-stage monocular 3D object detection via keypoint estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020.
- Wang, T.; Zhu, X.; Pang, J.; Lin, D. FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Montreal, QC, Canada, 11–17 October 2021; pp. 913–922.
- Park, D.; Ambruş, R.; Guizilini, V.; Li, J.; Gaidon, A. Is Pseudo-Lidar needed for Monocular 3D Object detection? In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 3122–3132.
- Roddick, T.; Kendall, A.; Cipolla, R. Orthographic feature transform for monocular 3D object detection. In Proceedings of the 30th British Machine Vision Conference (BMVC 2019), Cardiff, UK, 9–12 September 2019.
- Xie, E.; Yu, Z.; Zhou, D.; Philion, J. M2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Bird's-Eye View Representation. arXiv 2022, arXiv:2204.05088.
- Wang, Y.; Guizilini, V.; Zhang, T.; Wang, Y.; Zhao, H.; Solomon, J. DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries. arXiv 2021, arXiv:2110.06922.
- Li, Z.; Wang, W.; Li, H.; Xie, E.; Sima, C.; Lu, T.; Qiao, Y.; Dai, J. BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers. In Computer Vision–ECCV 2022, Proceedings of the 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 1–18.
- Li, H.; Sima, C.; Dai, J.; Wang, W.; Lu, L.; Wang, H.; Geng, X.; Zeng, J.; Li, Y.; Yang, J.; et al. Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 2151–2170.
- Huang, J.; Huang, G.; Zhu, Z.; Ye, Y.; Du, D. BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View. arXiv 2021, arXiv:2112.11790.
- Li, Y.; Ge, Z.; Yu, G.; Yang, J.; Wang, Z.; Shi, Y.; Sun, J.; Li, Z. BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection. Proc. AAAI Conf. Artif. Intell. 2023, 37, 1477–1485.
- Zhang, Y.; Zhu, Z.; Zheng, W.; Huang, J.; Huang, G.; Zhou, J.; Lu, J. BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving. arXiv 2022, arXiv:2205.09743.
- Philion, J.; Fidler, S. Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 194–210.
- Jiang, Y.; Zhang, L.; Miao, Z.; Zhu, X.; Gao, J.; Hu, W.; Jiang, Y.-G. PolarFormer: Multi-camera 3D Object Detection with Polar Transformers. arXiv 2022, arXiv:2206.15398.
- Chen, S.; Wang, X.; Cheng, T.; Zhang, Q.; Huang, C.; Liu, W. Polar Parametrization for Vision-based Surround-View 3D Detection. arXiv 2022, arXiv:2206.10965.
- Liu, Y.; Yan, J.; Jia, F.; Li, S.; Gao, A.; Wang, T.; Zhang, X.; Sun, J. PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images. arXiv 2022, arXiv:2206.01256.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2020, arXiv:2010.04159.
- Xu, H.; Lan, G.; Wu, S.; Hao, Q. Online Intelligent Calibration of Cameras and LiDARs for Autonomous Driving Systems. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC 2019), Auckland, New Zealand, 27–30 October 2019.
- Schneider, N.; Piewak, F.; Stiller, C.; Franke, U. RegNet: Multimodal sensor registration using deep neural networks. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017.
- Kodaira, A.; Zhou, Y.; Zang, P.; Zhan, W.; Tomizuka, M. SST-Calib: Simultaneous Spatial-Temporal Parameter Calibration between LIDAR and Camera. In Proceedings of the 2022 IEEE International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 2896–2902.
- Fan, S.; Wang, Z.; Huo, X.; Wang, Y.; Liu, J. Calibration-free BEV Representation for Infrastructure Perception. arXiv 2023, arXiv:2303.03583.
- Jiang, H.; Meng, W.; Zhu, H.; Zhang, Q.; Yin, J. Multi-Camera Calibration Free BEV Representation for 3D Object Detection. arXiv 2022, arXiv:2210.17252.
- Zhou, Y.; He, Y.; Zhu, H.; Wang, C.; Li, H.; Jiang, Q. Monocular 3D Object Detection: An Extrinsic Parameter Free Approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7552–7562.
- Chen, S.; Cheng, T.; Wang, X.; Meng, W.; Zhang, Q.; Liu, W. Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer. arXiv 2022, arXiv:2206.04584.
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 9–15 June 2019.
- Liu, Y.; Wang, T.; Zhang, X.; Sun, J. PETR: Position embedding transformation for multi-view 3D object detection. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022.
- Huang, B.; Li, Y.; Xie, E.; Liang, F.; Wang, L.; Shen, M.; Liu, F.; Wang, T.; Luo, P.; Shao, J. Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception. arXiv 2023, arXiv:2301.07870.
 | Degree 1 | Degree 2 | Degree 3 | Degree 4 | Degree 5
---|---|---|---|---|---
Kernel Size | 11 × 11 | 9 × 9 | 7 × 7 | 5 × 5 | 3 × 3
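Together with the degree assignment table in Section 3.1, this gives each camera its kernel size at inference time. A minimal sketch (the degree-to-size mapping is transcribed from the table above; the helper function is hypothetical):

```python
# Kernel size per deviation degree, transcribed from the table above.
KERNEL_BY_DEGREE = {1: 11, 2: 9, 3: 7, 4: 5, 5: 3}

def kernel_sizes(degrees_by_camera):
    """Resolve each camera's attention kernel size from its deviation degree."""
    return {cam: KERNEL_BY_DEGREE[d] for cam, d in degrees_by_camera.items()}

# Example: left-front blow-out, degrees taken from the Section 3.1 table.
print(kernel_sizes({"FL": 1, "F": 2, "BL": 2, "FR": 3, "BR": 4, "B": 5}))
# {'FL': 11, 'F': 9, 'BL': 9, 'FR': 7, 'BR': 5, 'B': 3}
```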
Method | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE | FPS |
---|---|---|---|---|---|---|---|---|
BEVFormer | 0.479 | 0.378 | 0.775 | 0.274 | 0.406 | 0.440 | 0.205 | 1.7 |
PolarFormer | 0.405 | 0.341 | 0.834 | 0.278 | 0.432 | 0.896 | 0.215 | 12.1 |
BEVDet | 0.386 | 0.315 | 0.729 | 0.266 | 0.533 | 1.016 | 0.275 | 4.2 |
PETR | 0.388 | 0.324 | 0.812 | 0.269 | 0.535 | 0.986 | 0.231 | 10.0 |
Fast-BEV | 0.343 | 0.253 | 0.798 | 0.314 | 0.394 | 0.392 | 0.227 | 33 |
GARKT-EfficientNet-B4 | 0.439 | 0.351 | 0.706 | 0.268 | 0.379 | 0.812 | 0.200 | 20.577 |
GARKT-ResNet101 | 0.452 | 0.424 | 0.640 | 0.265 | 0.480 | 1.572 | 0.216 | 18.412 |
Method | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE | FPS |
---|---|---|---|---|---|---|---|---|
BEVFormer | 0.401 | 0.285 | 0.933 | 0.284 | 0.483 | 0.504 | 0.210 | 1.7 |
PolarFormer | 0.310 | 0.260 | 0.952 | 0.317 | 0.627 | 1.332 | 0.427 | 12.1 |
BEVDet | 0.321 | 0.238 | 0.880 | 0.278 | 0.632 | 1.163 | 0.282 | 4.2 |
PETR | 0.326 | 0.244 | 0.984 | 0.279 | 0.633 | 1.128 | 0.236 | 10.0 |
Fast-BEV | 0.286 | 0.190 | 0.960 | 0.325 | 0.466 | 0.447 | 0.234 | 33 |
GARKT-EfficientNet-B4 | 0.431 | 0.347 | 0.763 | 0.248 | 0.383 | 0.821 | 0.206 | 20.577 |
GARKT-ResNet101 | 0.440 | 0.376 | 0.688 | 0.253 | 0.407 | 1.194 | 0.132 | 18.412 |
Kernel Configuration (Degrees 1–5) | 3-3-3-3-3 | 5-5-5-5-3 | 7-7-7-5-3 | 9-9-7-5-3 | 11-9-7-5-3
---|---|---|---|---|---
FPS | 68.710 | 51.988 | 36.147 | 27.135 | 20.577 |