See the Unseen: Grid-Wise Drivable Area Detection Dataset and Network Using LiDAR
Abstract
1. Introduction
- A novel DA detection scheme termed grid-wise DA detection, which enhances the robustness and accuracy of DA detection across varied driving environments.
- Argoverse-Grid, a newly proposed large-scale dataset specifically curated for training and benchmarking grid-wise DA detection models.
- A grid-wise DA detection network, termed Grid-DATrNet, that utilizes global attention mechanisms to detect DA effectively, especially in distant regions where conventional methods often falter (an illustrative sketch of such a grid-wise attention head follows this list).
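To make the grid-wise formulation concrete, the following is a minimal sketch, an assumption for illustration rather than the authors' exact Grid-DATrNet architecture, of a per-grid DA head in which every BEV grid cell attends globally to all other cells before being classified as DA or non-DA; the channel count, layer count, and grid size are placeholders.

```python
# Minimal sketch of a grid-wise DA head with global attention (illustrative only;
# not the exact Grid-DATrNet architecture). BEV features from any encoder
# (e.g., a PointPillars-style backbone) are flattened into grid tokens, passed
# through a standard transformer encoder, and classified per grid cell.
import torch
import torch.nn as nn

class GridAttentionDAHead(nn.Module):
    def __init__(self, channels=64, num_layers=2, num_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(channels, 1)      # one DA logit per grid cell

    def forward(self, bev_feat):                      # bev_feat: (B, C, H, W)
        b, c, h, w = bev_feat.shape
        tokens = bev_feat.flatten(2).transpose(1, 2)  # (B, H*W, C): each grid cell becomes a token
        tokens = self.encoder(tokens)                 # global attention across all grid cells
        logits = self.classifier(tokens)              # (B, H*W, 1)
        return logits.transpose(1, 2).reshape(b, 1, h, w)

# Usage: a 64-channel BEV feature map over a small grid -> per-grid DA logits.
head = GridAttentionDAHead()
da_logits = head(torch.randn(1, 64, 30, 25))          # (1, 1, 30, 25)
```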
2. Related Works
2.1. Camera-Based DA Detection
2.2. Point-Wise DA Detection
2.3. Grid-Wise DA Detection
3. Methods
3.1. Argoverse-Grid
- Convert the polygon map to a rasterized map: Since the Argoverse 1 dataset provides DA maps as polygon files, we first convert the DA map from polygons into a rasterized grid map. All grids enclosed by the polygon boundary, which marks the edge of the DA, are labeled as DA grids, and all grids outside the boundary are labeled as non-DA grids. The result of converting the polygons into a rasterized DA map is shown in Figure 3 (a hedged rasterization sketch follows this list).
- Convert the ego-vehicle position to DA map coordinates: Next, we convert the ego-vehicle position from world coordinates into DA map coordinates and recover the yaw orientation of the ego-vehicle in DA map coordinates. This transformation uses the rotation and translation information in the Argoverse 1 calibration files and can be described as follows:

  $\mathbf{p}_{\mathrm{map}} = \mathbf{R}^{T}\,(\mathbf{p}_{\mathrm{world}} - \mathbf{t})$

  Here, $\mathbf{p}_{\mathrm{map}}$ represents the ego-vehicle position in DA map coordinates, $\mathbf{p}_{\mathrm{world}}$ represents the ego-vehicle position in world coordinates, $\mathbf{R}$ represents the rotation matrix from the ego-vehicle world frame to the DA map frame, $(\cdot)^{T}$ denotes the matrix transpose, and $\mathbf{t}$ represents the translation vector from the ego-vehicle world frame to the DA map frame. The yaw orientation of the ego-vehicle in DA map coordinates is extracted from the rotation matrix using the standard conversion:

  $\theta_{\mathrm{yaw}} = \operatorname{atan2}(R_{10},\, R_{00})$

  Here, $R_{00}$ and $R_{10}$ are the entries of $\mathbf{R}$ at row 0, column 0 and row 1, column 0, respectively (see the pose-transform sketch after this list).
- Crop and rotate the DA map around the ego-vehicle: Finally, we crop the DA map around the ego-vehicle based on the ego-vehicle coordinates in the DA map. The rotation and cropping process can be described as follows:

  $M_{\mathrm{crop}} = \operatorname{Crop}\bigl(\operatorname{Rotate}(M,\, \theta_{\mathrm{yaw}} + \theta_{\mathrm{offset}}),\, \mathbf{p}_{\mathrm{map}}\bigr)$

  Here, $M$ represents the whole rasterized DA map of the ego-vehicle's city, $\operatorname{Rotate}(\cdot, \theta)$ is a function that rotates the DA map by an angle $\theta$ in the counterclockwise direction, and $\operatorname{Crop}(\cdot, \mathbf{p}_{\mathrm{map}})$ is a function that crops the DA map around $\mathbf{p}_{\mathrm{map}}$. Since the Argoverse dataset does not provide the angular offset from ego-vehicle world coordinates to DA map coordinates, we adjusted $\theta_{\mathrm{offset}}$ until the DA label around the ego-vehicle had an upward orientation. The value of $\theta_{\mathrm{offset}}$ we found is approximately 89.8°.
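Rasterization sketch referenced in the first step above: a minimal Python example, assuming the DA boundary polygons are given as (N, 2) vertex arrays in metric city coordinates; the grid resolution and map extent used here are illustrative, not the exact Argoverse-Grid settings.

```python
# Minimal polygon-to-grid rasterization sketch (assumed polygon format and
# resolution). Grids enclosed by a DA boundary polygon are labeled 1 (DA);
# everything outside is labeled 0 (non-DA).
import numpy as np
from PIL import Image, ImageDraw

def rasterize_da_polygons(polygons, x_min, y_min, x_max, y_max, res=0.5):
    """polygons: list of (N, 2) arrays of DA boundary vertices in city coords.
    Returns an (H, W) uint8 map indexed as [row = y, col = x]."""
    w = int(np.ceil((x_max - x_min) / res))
    h = int(np.ceil((y_max - y_min) / res))
    canvas = Image.new("L", (w, h), 0)                # 0 = non-DA everywhere by default
    draw = ImageDraw.Draw(canvas)
    for poly in polygons:
        px = (poly[:, 0] - x_min) / res               # metric x -> pixel column
        py = (poly[:, 1] - y_min) / res               # metric y -> pixel row
        draw.polygon(list(zip(px.tolist(), py.tolist())), fill=1)  # fill polygon interior as DA
    return np.array(canvas, dtype=np.uint8)
```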
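Pose-transform and crop sketch referenced in the second and third steps: a hedged example of converting the ego pose into map pixels, recovering the yaw, and rotating/cropping the DA map around the ego-vehicle. The transform convention, the 240 × 200 pixel window (120 m × 100 m at 0.5 m resolution), and the omission of map-boundary handling are assumptions for illustration, not the exact dataset-generation code.

```python
# Hedged sketch: ego pose (world coords) -> DA-map pixel coords, yaw recovery,
# then rotate-and-crop the rasterized map around the ego-vehicle.
import numpy as np
from scipy.ndimage import rotate as nd_rotate

def ego_pose_in_map(p_world, R, t, x_min, y_min, res=0.5):
    """p_world: ego position in world coords; R, t: rotation/translation from the
    Argoverse calibration. Returns pixel coords (u, v) and yaw in degrees."""
    p_map = R.T @ (p_world - t)                       # world -> DA map frame (assumed convention)
    yaw_deg = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    u = (p_map[0] - x_min) / res                      # metric -> pixel column
    v = (p_map[1] - y_min) / res                      # metric -> pixel row
    return u, v, yaw_deg

def crop_da_around_ego(da_map, u, v, yaw_deg, offset_deg=89.8, crop_hw=(240, 200)):
    """Rotate a window of the DA map so the local label points 'up' (using the
    ~89.8 deg offset reported above), then crop crop_hw pixels around the ego cell.
    Handling of ego poses near the map boundary is omitted in this sketch."""
    h, w = crop_hw
    pad = int(np.ceil(np.hypot(h, w)))                # padding so the rotated window still covers the ROI
    vi, ui = int(round(v)), int(round(u))
    patch = da_map[vi - pad: vi + pad, ui - pad: ui + pad]
    patch = nd_rotate(patch, angle=yaw_deg + offset_deg, reshape=False, order=0)  # nearest-neighbour keeps binary labels
    cy, cx = patch.shape[0] // 2, patch.shape[1] // 2
    return patch[cy - h // 2: cy + h // 2, cx - w // 2: cx + w // 2]
```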
3.2. Metrics Evaluation
3.3. Proposed Network: Grid-DATrNet
4. Experiment and Discussion
4.1. Experiment Setup
- Camera-based DA detection: We utilized the TwinLiteNet model [17], originally designed for DA and lane segmentation from images, and the MapTR model [4], originally designed to detect DA edges from multi-camera input. The DA regions detected by TwinLiteNet in all seven camera images of the Argoverse-Grid dataset were projected onto the BEV map using calibration information, while MapTR predicted directly from the seven camera inputs (a ground-plane projection sketch follows this list).
- Heuristic-based grid-wise DA detection: grid-wise DA detection using a Gaussian Bayesian kernel [31] was used as a rule-based baseline to assess the performance of Grid-DATrNet.
- Point-wise DA detection: a comparison was also made with point-wise DA detection as proposed in GndNet [6], a state-of-the-art LiDAR-based method for DA detection.
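Projection sketch referenced in the camera-based baseline above: one hedged way a per-image DA mask could be projected onto the BEV grid using the camera calibration and a flat-ground assumption; the grid extent, minimum depth, and all names here are illustrative and may differ from the procedure actually used with TwinLiteNet.

```python
# Hedged sketch: project a binary image-space DA mask onto a BEV grid by
# sampling ground-plane (z = 0) points and checking where they land in the image.
import numpy as np

def image_da_to_bev(mask, K, R_ce, t_ce, x_range=(0.0, 60.0), y_range=(-50.0, 50.0), res=0.5):
    """mask: (H, W) binary DA mask. K: (3, 3) intrinsics. R_ce, t_ce: rotation and
    translation taking ego-frame points into the camera frame.
    Returns a BEV grid marking cells whose ground point projects onto DA pixels."""
    xs = np.arange(x_range[0], x_range[1], res)
    ys = np.arange(y_range[0], y_range[1], res)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    ground = np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=0)  # (3, N) ego-frame ground points
    cam = R_ce @ ground + t_ce.reshape(3, 1)          # ego -> camera frame
    valid = cam[2] > 0.1                              # keep points in front of the camera
    pix = K @ cam
    z = np.where(valid, cam[2], 1.0)                  # avoid division by zero for points behind the camera
    u = (pix[0] / z).round().astype(int)
    v = (pix[1] / z).round().astype(int)
    h, w = mask.shape
    inside = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    hits = np.zeros(gx.size, dtype=bool)
    hits[inside] = mask[v[inside], u[inside]] > 0     # DA pixel hit for each visible grid cell
    bev = np.zeros(gx.shape, dtype=np.uint8)
    bev[hits.reshape(gx.shape)] = 1
    return bev
```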
4.2. Implementation Details
4.3. Comparison with Heuristic DA Detection Using LiDAR
4.4. Comparison of Grid-DATrNet Using Various BEV Encoders and Feature Extractors
4.5. Comparison with Point-Wise DA Detection Using LiDAR
4.6. Comparison with Camera-Based Methods
4.7. Comparison of Grid-DATrNet in Various ROI
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Álvarez, J.M.; López, A.M.; Gevers, T.; Lumbreras, F. Combining Priors, Appearance, and Context for Road Detection. IEEE Trans. Intell. Transp. Syst. 2014, 15, 1168–1178. [Google Scholar] [CrossRef]
- Wang, C.; Zhang, H.; Yang, M.; Wang, X.; Ye, L.; Guo, C. Automatic parking based on a bird’s eye view vision system. Adv. Mech. Eng. 2014, 6, 847406. [Google Scholar] [CrossRef]
- Liu, Y.; Yuan, T.; Wang, Y.; Wang, Y.; Zhao, H. Vectormapnet: End-to-end vectorized hd map learning. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR: Birmingham, UK; pp. 22352–22369. [Google Scholar]
- Liao, B.; Chen, S.; Wang, X.; Cheng, T.; Zhang, Q.; Liu, W.; Huang, C. MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction. arXiv 2023, arXiv:2208.14437. [Google Scholar]
- Li, Q.; Wang, Y.; Wang, Y.; Zhao, H. HDMapNet: An Online HD Map Construction and Evaluation Framework. arXiv 2022, arXiv:2107.06307. [Google Scholar]
- Paigwar, A.; Erkent, O.; Sierra-Gonzalez, D.; Laugier, C. GndNet: Fast Ground Plane Estimation and Point Cloud Segmentation for Autonomous Vehicles. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 2150–2156. [Google Scholar] [CrossRef]
- Chang, M.F.; Lambert, J.; Sangkloy, P.; Singh, J.; Bak, S.; Hartnett, A.; Wang, D.; Carr, P.; Lucey, S.; Ramanan, D.; et al. Argoverse: 3D Tracking and Forecasting with Rich Maps. arXiv 2019, arXiv:1911.02620. [Google Scholar]
- Yuan, Y.; Jiang, Z.; Wang, Q. Video-based road detection via online structural learning. Neurocomputing 2015, 168, 336–347. [Google Scholar] [CrossRef]
- Aly, M. Real time detection of lane markers in urban streets. In Proceedings of the 2008 IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 4–6 June 2008; IEEE: New York, NY, USA, 2008. [Google Scholar] [CrossRef]
- Kong, H.; Audibert, J.Y.; Ponce, J. General Road Detection From a Single Image. IEEE Trans. Image Process. 2010, 19, 2211–2220. [Google Scholar] [CrossRef]
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. arXiv 2016, arXiv:1606.02147. [Google Scholar]
- Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding Convolution for Semantic Segmentation. arXiv 2018, arXiv:1702.08502. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. arXiv 2017, arXiv:1612.01105. [Google Scholar]
- Mehta, S.; Rastegari, M.; Caspi, A.; Shapiro, L.; Hajishirzi, H. ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation. arXiv 2018, arXiv:1803.06815. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. arXiv 2019, arXiv:1809.02983. [Google Scholar]
- Wu, D.; Liao, M.W.; Zhang, W.T.; Wang, X.G.; Bai, X.; Cheng, W.Q.; Liu, W.Y. YOLOP: You Only Look Once for Panoptic Driving Perception. Mach. Intell. Res. 2022, 19, 550–562. [Google Scholar] [CrossRef]
- Che, Q.H.; Nguyen, D.P.; Pham, M.Q.; Lam, D.K. TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars. In Proceedings of the 2023 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), Quy Nhon, Vietnam, 5–6 October 2023; IEEE: New York, NY, USA, 2023; pp. 1–6. [Google Scholar]
- Yuan, T.; Liu, Y.; Wang, Y.; Wang, Y.; Zhao, H. StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction. arXiv 2023, arXiv:2308.12570. [Google Scholar]
- Qiao, L.; Ding, W.; Qiu, X.; Zhang, C. End-to-End Vectorized HD-Map Construction with Piecewise Bezier Curve. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 13218–13228. [Google Scholar]
- Blayney, H.; Tian, H.; Scott, H.; Goldbeck, N.; Stetson, C.; Angeloudis, P. Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 15365–15374. [Google Scholar]
- Liu, R.; Yuan, Z. Compact HD Map Construction via Douglas-Peucker Point Transformer. Proc. AAAI Conf. Artif. Intell. 2024, 38, 3702–3710. [Google Scholar] [CrossRef]
- Zhu, T.; Leng, J.; Zhong, J.; Zhang, Z.; Sun, C. LaneMapNet: Lane Network Recognization and HD Map Construction Using Curve Region Aware Temporal Bird’s-Eye-View Perception. In Proceedings of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju Island, Republic of Korea, 2–5 June 2024; pp. 2168–2175. [Google Scholar] [CrossRef]
- Jia, P.; Wen, T.; Luo, Z.; Yang, M.; Jiang, K.; Lei, Z.; Tang, X.; Liu, Z.; Cui, L.; Sheng, K.; et al. DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model. arXiv 2024, arXiv:2405.02008. [Google Scholar] [CrossRef]
- Hao, X.; Wei, M.; Yang, Y.; Zhao, H.; Zhang, H.; Zhou, Y.; Wang, Q.; Li, W.; Kong, L.; Zhang, J. Is Your HD Map Constructor Reliable under Sensor Corruptions? arXiv 2024, arXiv:2406.12214. [Google Scholar]
- Zhong, C.; Li, B.; Wu, T. Off-Road Drivable Area Detection: A Learning-Based Approach Exploiting LiDAR Reflection Texture Information. Remote Sens. 2023, 15, 27. [Google Scholar] [CrossRef]
- Paek, D.H.; Kong, S.H.; Wijaya, K.T. K-lane: Lidar lane dataset and benchmark for urban roads and highways. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4450–4459. [Google Scholar]
- Ali, A.; Gergis, M.; Abdennadher, S.; El Mougy, A. Drivable Area Segmentation in Deteriorating Road Regions for Autonomous Vehicles using 3D LiDAR Sensor. In Proceedings of the 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–17 July 2021; pp. 845–852. [Google Scholar] [CrossRef]
- Zhang, W. LIDAR-based road and road-edge detection. In Proceedings of the 2010 IEEE Intelligent Vehicles Symposium, La Jolla, CA, USA, 21–24 June 2010; pp. 845–848. [Google Scholar] [CrossRef]
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12697–12705. [Google Scholar]
- Nagy, I.; Oniga, F. Free Space Detection from Lidar Data Based on Semantic Segmentation. In Proceedings of the 2021 IEEE 17th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, 28–30 October 2021; pp. 95–100. [Google Scholar] [CrossRef]
- Raguraman, S.J.; Park, J. Intelligent Drivable Area Detection System using Camera and Lidar Sensor for Autonomous Vehicle. In Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, IL, USA, 31 July–1 August 2020; pp. 429–436. [Google Scholar] [CrossRef]
- Wang, L.; Huang, Y. LiDAR–camera fusion for road detection using a recurrent conditional random field model. Sci. Rep. 2022, 12, 11320. [Google Scholar] [CrossRef]
- Shaban, A.; Meng, X.; Lee, J.; Boots, B.; Fox, D. Semantic Terrain Classification for Off-Road Autonomous Driving. In Proceedings of the 5th Conference on Robot Learning, London, UK, 8–11 November 2022; Proceedings of Machine Learning Research. Faust, A., Hsu, D., Neumann, G., Eds.; PMLR: Birmingham, UK, 2022; Volume 164, pp. 619–629. [Google Scholar]
- Caltagirone, L.; Scheidegger, S.; Svensson, L.; Wahde, M. Fast LIDAR-based Road Detection Using Fully Convolutional Neural Networks. arXiv 2017, arXiv:1703.03613. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. arXiv 2016, arXiv:1511.00561. [Google Scholar] [CrossRef] [PubMed]
- Graham, B. Sparse 3D convolutional neural networks. arXiv 2015, arXiv:1505.02890. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D proposal generation and object detection from view aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; IEEE: New York, NY, USA, 2018; pp. 1–8. [Google Scholar]
- Simony, M.; Milzy, S.; Amendey, K.; Gross, H.M. Complex-yolo: An euler-region-proposal for real-time 3D object detection on point clouds. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2018, arXiv:1708.02002. [Google Scholar]
- Tolstikhin, I.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. MLP-Mixer: An all-MLP Architecture for Vision. arXiv 2021, arXiv:2105.01601. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv 2021, arXiv:2103.14030. [Google Scholar]
- Xia, Z.; Pan, X.; Song, S.; Li, L.E.; Huang, G. Vision Transformer with Deformable Attention. arXiv 2022, arXiv:2201.00520. [Google Scholar]
Detector | Sensor | Accuracy (%) ↑ | F1-Score ↑ | Speed (ms) ↓ |
---|---|---|---|---|
TwinLiteNet (7 cameras) [17] | C | 72.88 | 0.5273 | 125 |
MapTr (7 cameras) [4] | C | 74.88 | 0.5324 | 182 |
Gaussian Bayes Kernel [31] | L | 82.33 | 0.6255 | 2400 |
GndNet [6] | L | 88.65 | 0.7225 | 200 |
Grid-DATrNet (Ours) | L | | | |
- Transformer + PointPillar | | 93.28 | 0.8328 | 231 |
- Transformer + Point Projection | | 93.40 | 0.8321 | 280 |
- MLP Mixer + PointPillar | | 91.40 | 0.8145 | 205 |
- MLP Mixer + Point Projection | | 91.63 | 0.8233 | 213 |
Detector | Computation (FLOPS) ↓ |
---|---|
Gaussian Bayes Kernel | 20 × |
GndNet | 100 × |
Grid-DATrNet (Ours) | |
- Transformer + PointPillar | 142 × |
- Transformer + Point Projection | 180 × |
- MLP Mixer + PointPillar | 110 × |
- MLP Mixer + Point Projection | 130 × |
ROI Size | Accuracy (%) | F1 Score | Inference Time (ms) | Computation (FLOPS) |
---|---|---|---|---|
120 m × 100 m | 93.28 | 0.8328 | 231 | 142 × |
200 m × 200 m | 88.35 | 0.7732 | 324 | 182 × |
400 m × 400 m | 75.42 | 0.6588 | 502 | 268 × |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).