TARTS: Training-Free Adaptive Reference-Guided Traversability Segmentation with Automated Footprint Supervision and Experimental Verification
Abstract
1. Introduction
1. We propose a training-free traversability segmentation framework that combines reference-guided one-shot prototype initialization with trajectory-based online adaptation, enabling both immediate deployment and continuous improvement through embodied interaction with the environment.
2. We demonstrate that decoupling semantic recognition from fine-grained spatial localization, by leveraging SLIC for perceptual grouping and patch-level DINO features for semantic discrimination, effectively alleviates the spatial inconsistency inherent in vision foundation models (a minimal sketch of this idea follows the list).
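The sketch below is a hedged Python illustration of that decoupling, not the paper's implementation: SLIC provides the spatial grouping, every pixel inherits the feature of its ViT patch, superpixel features are mean-pooled, and each superpixel is scored by cosine similarity to a traversability prototype. The image size, patch size, random stand-in features (real ones would come from a DINO backbone), pooling choice, and superpixel count are all illustrative assumptions.

```python
# Hedged sketch: superpixel-based aggregation of patch-level features and
# prototype matching. Shapes and the random stand-ins are illustrative.
import numpy as np
from skimage.segmentation import slic

H, W, P, D = 480, 640, 16, 384           # image size, ViT patch size, feature dim
image = np.random.rand(H, W, 3)          # stand-in for an RGB frame
n_ph, n_pw = H // P, W // P
feat = np.random.randn(n_ph * n_pw, D)   # stand-in for DINO patch features

# 1) Perceptual grouping: SLIC superpixels computed on the RGB image.
labels = slic(image, n_segments=200, compactness=10, start_label=0)
n_sp = labels.max() + 1

# 2) Patch-level alignment: map every pixel to the index of its ViT patch.
patch_idx = (np.arange(H)[:, None] // P) * n_pw + np.arange(W)[None, :] // P

# 3) Aggregate: each superpixel's feature is the pixel-count-weighted mean
#    of the patch features it overlaps, then L2-normalized.
counts = np.zeros((n_sp, n_ph * n_pw))
np.add.at(counts, (labels.ravel(), patch_idx.ravel()), 1.0)
sp_feat = counts @ feat / counts.sum(axis=1, keepdims=True)
sp_feat /= np.linalg.norm(sp_feat, axis=1, keepdims=True) + 1e-8

# 4) Score each superpixel by cosine similarity to a traversability prototype
#    (random here; seeded from the reference image in the actual pipeline).
prototype = np.random.randn(D)
prototype /= np.linalg.norm(prototype)
scores = sp_feat @ prototype             # one similarity per superpixel
score_map = scores[labels]               # dense per-pixel similarity map
```

This division of labor is the point of the second contribution: the crisp SLIC boundaries supply spatial precision, while the coarse but semantically rich patch features supply recognition.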
2. Related Work
2.1. Semantic Traversability Analysis
2.2. Traversability from Self-Supervision
3. Methodology
3.1. System Overview
3.2. Semantic Feature Extraction via Vision Foundation Model
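The outline does not state which backbone or checkpoint the paper uses, so the following is only a plausible setup: extracting patch-level features from the publicly released DINOv2 ViT-S/14 via torch.hub. The input resolution is an arbitrary multiple of the 14-pixel patch size.

```python
# Hedged sketch: patch-level feature extraction with a public DINOv2 model.
# The model variant (ViT-S/14) and input size are assumptions; the paper may
# use a different DINO backbone.
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

img = torch.rand(1, 3, 476, 644)            # H and W must be multiples of 14
with torch.no_grad():
    out = model.forward_features(img)
tokens = out["x_norm_patchtokens"]          # (1, (H/14)*(W/14), 384)
feat = tokens.reshape(1, 476 // 14, 644 // 14, -1)
print(feat.shape)                           # torch.Size([1, 34, 46, 384])
```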
3.3. Reference-Guided One-Shot Traversability Prototype Seeding
3.4. Segmentation Inference
3.4.1. Superpixel-Based Feature Aggregation
3.4.2. Similarity Matching and Adaptive Thresholding
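Among the threshold strategies ablated in Section 4.3.2, Otsu with valley emphasis (Ng [13]) performs best: it maximizes the standard Otsu between-class variance weighted by (1 − p_t), which biases the threshold toward valleys of the similarity-score histogram. Below is a hedged NumPy sketch; the bin count and the framing of the input as a flat vector of similarity scores are assumptions.

```python
# Hedged sketch of valley-emphasis Otsu thresholding (Ng [13]): standard
# Otsu's between-class variance, weighted by (1 - p_t) so that thresholds
# lying in histogram valleys are favored. Bin count is an assumption.
import numpy as np

def valley_emphasis_threshold(scores: np.ndarray, bins: int = 256) -> float:
    hist, edges = np.histogram(scores, bins=bins)
    p = hist / hist.sum()                      # probability mass per bin
    centers = 0.5 * (edges[:-1] + edges[1:])
    w1 = np.cumsum(p)                          # class weight up to each bin
    w2 = 1.0 - w1
    m = np.cumsum(p * centers)                 # cumulative first moment
    with np.errstate(invalid="ignore", divide="ignore"):
        mu1 = m / w1                           # class means on either side
        mu2 = (m[-1] - m) / w2
        objective = (1.0 - p) * (w1 * mu1**2 + w2 * mu2**2)
    objective = np.nan_to_num(objective, nan=-np.inf)
    return float(centers[np.argmax(objective)])

# Example: bimodal similarity scores; the threshold falls in the valley.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.2, 0.05, 1000),
                         rng.normal(0.8, 0.05, 1000)])
print(valley_emphasis_threshold(scores))       # roughly 0.5
```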
3.5. Online Prototype Adaptation via Trajectory-Guided Self-Supervision
3.5.1. Self-Supervision via Retrospective Footprint Projection
- Timestep index i
- The patch-level feature map extracted from the corresponding image
- Odometry data encoding the robot’s motion
- The camera intrinsic parameters K
- The camera extrinsic parameters, specifically the translation (see the projection sketch after this list)
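As a hedged illustration of the projection these inputs feed, the sketch below takes a past footprint point already expressed in the current camera frame (odometry supplies that transform) and projects it through the pinhole model to find the ViT patch the robot actually drove over; that patch's features become positive, traversable samples. The intrinsics, image size, and frame convention (x right, y down, z forward) are assumptions.

```python
# Hedged sketch of retrospective footprint projection. The intrinsics K, the
# image size, the ViT patch size, and the example point are all illustrative.
import numpy as np

W_IMG, H_IMG, P = 640, 480, 16
K = np.array([[525.0,   0.0, 320.0],     # assumed pinhole intrinsics
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_footprint(p_cam: np.ndarray) -> tuple[int, int] | None:
    """Project a 3D footprint point in the camera frame (x right, y down,
    z forward) to the (row, col) index of the ViT patch it lands in."""
    if p_cam[2] <= 0.0:                  # behind the camera: not visible
        return None
    u, v = (K @ p_cam)[:2] / p_cam[2]
    if not (0.0 <= u < W_IMG and 0.0 <= v < H_IMG):
        return None                      # outside the image bounds
    return int(v) // P, int(u) // P

# Example: a ground point 0.4 m below the camera and 2 m ahead of it.
patch = project_footprint(np.array([0.0, 0.4, 2.0]))
print(patch)  # (21, 20): this patch's features become traversable samples
```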
3.5.2. Prototype Update with Exponential Moving Average
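The update this subsection names typically takes the form p ← (αp + (1 − α)f̄) / ‖αp + (1 − α)f̄‖, where f̄ is the mean feature of newly self-labeled footprint patches and α is a momentum coefficient. The sketch below assumes both the momentum value and the re-normalization step; neither is confirmed by this outline.

```python
# Hedged sketch of an EMA prototype update; the momentum value and the
# re-normalization are assumptions, not the paper's confirmed settings.
import numpy as np

def ema_update(prototype: np.ndarray, footprint_feat: np.ndarray,
               momentum: float = 0.95) -> np.ndarray:
    """Blend the running prototype with a new footprint feature and
    re-normalize so cosine similarities stay on a consistent scale."""
    updated = momentum * prototype + (1.0 - momentum) * footprint_feat
    return updated / (np.linalg.norm(updated) + 1e-8)
```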
4. Experiments
4.1. Dataset
4.1.1. Reference-Guided Traversability Segmentation Dataset (RTSD)
4.1.2. Off-Road Freespace Detection (ORFD)
4.2. Implementation Details
4.3. Quantitative Results
4.3.1. Comparison to Baseline and SOTA
4.3.2. Threshold Selection Strategy Ablation
4.3.3. Feature-Superpixel Alignment Strategy Ablation
4.4. Qualitative Results
4.5. Computational Performance
4.6. Analysis of Typical Failure Cases and Environmental Boundaries
4.6.1. Reflective and Dynamic Surfaces (e.g., Water, Sheet Ice)
4.6.2. Semantic–Geometric Ambiguity
4.6.3. Extreme Photometric Degradation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| TARTS | Training-free Adaptive Reference-guided Traversability Segmentation |
| ViT | Vision Transformer |
| EMA | Exponential Moving Average |
| ORFD | Off-Road Freespace Detection |
| RTSD | Reference-guided Traversability Segmentation Dataset |
References
1. Kim, Y.; Lee, J.H.; Lee, C.; Mun, J.; Youm, D.; Park, J.; Hwangbo, J. Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy. IEEE Robot. Autom. Lett. 2024, 9, 10423–10430.
2. Fankhauser, P.; Hutter, M. A universal grid map library: Implementation and use case for rough terrain navigation. In Robot Operating System (ROS): The Complete Reference (Volume 1); Springer: Berlin/Heidelberg, Germany, 2016; pp. 99–120.
3. Papadakis, P. Terrain traversability analysis methods for unmanned ground vehicles: A survey. Eng. Appl. Artif. Intell. 2013, 26, 1373–1385.
4. Ægidius, S.; Hadjivelichkov, D.; Jiao, J.; Embley-Riches, J.; Kanoulas, D. Watch Your STEPP: Semantic traversability estimation using pose projected features. In 2025 IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2025; pp. 2376–2382.
5. Mattamala, M.; Frey, J.; Libera, P.; Chebrolu, N.; Martius, G.; Cadena, C.; Hutter, M.; Fallon, M. Wild visual navigation: Fast traversability learning via pre-trained models and online self-supervision. Auton. Robot. 2025, 49, 19.
6. Li, J.; Zhang, Y.; Yun, P.; Zhou, G.; Chen, Q.; Fan, R. RoadFormer: Duplex transformer for RGB-normal semantic road scene parsing. IEEE Trans. Intell. Veh. 2024, 9, 5163–5172.
7. Liao, Y.; Kang, S.; Li, J.; Liu, Y.; Liu, Y.; Dong, Z.; Yang, B.; Chen, X. Mobile-Seed: Joint semantic segmentation and boundary detection for mobile robots. IEEE Robot. Autom. Lett. 2024, 9, 3902–3909.
8. Guan, T.; Kothandaraman, D.; Chandra, R.; Sathyamoorthy, A.J.; Weerakoon, K.; Manocha, D. GA-Nav: Efficient terrain segmentation for robot navigation in unstructured outdoor environments. IEEE Robot. Autom. Lett. 2022, 7, 8138–8145.
9. Zhang, Y.; Yin, M.; Bi, W.; Yan, H.; Bian, S.; Zhang, C.H.; Hua, C. ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models. IEEE Trans. Robot. 2025, 41, 1568–1580.
10. Wellhausen, L.; Dosovitskiy, A.; Ranftl, R.; Walas, K.; Cadena, C.; Hutter, M. Where should I walk? Predicting terrain properties from images via self-supervised learning. IEEE Robot. Autom. Lett. 2019, 4, 1509–1516.
11. Gasparino, M.V.; Sivakumar, A.N.; Chowdhary, G. WayFASTER: A self-supervised traversability prediction for increased navigation awareness. In 2024 IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2024; pp. 8486–8492.
12. Siméoni, O.; Vo, H.V.; Seitzer, M.; Baldassarre, F.; Oquab, M.; Jose, C.; Khalidov, V.; Szafraniec, M.; Yi, S.; Ramamonjisoa, M.; et al. DINOv3. arXiv 2025, arXiv:2508.10104.
13. Ng, H.F. Automatic thresholding for defect detection. Pattern Recognit. Lett. 2006, 27, 1644–1649.
14. Ewen, P.; Li, A.; Chen, Y.; Hong, S.; Vasudevan, R. These maps are made for walking: Real-time terrain property estimation for mobile robots. IEEE Robot. Autom. Lett. 2022, 7, 7083–7090.
15. Jung, S.; Lee, J.; Meng, X.; Boots, B.; Lambert, A. V-STRONG: Visual self-supervised traversability learning for off-road navigation. In 2024 IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2024; pp. 1766–1773.
16. Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. DINOv2: Learning robust visual features without supervision. arXiv 2024, arXiv:2304.07193.
17. Zürn, J.; Burgard, W.; Valada, A. Self-supervised visual terrain classification from unsupervised acoustic feature learning. IEEE Trans. Robot. 2020, 37, 466–481.
18. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: New York, NY, USA, 2023; pp. 4015–4026.
19. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the 9th International Conference on Learning Representations, Virtual, 3–7 May 2021.
20. Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging properties in self-supervised vision transformers. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: New York, NY, USA, 2021; pp. 9650–9660.
21. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8748–8763.
22. Pariza, V.; Salehi, M.; Burghouts, G.; Locatello, F.; Asano, Y.M. NeCo: Improving DINOv2’s spatial representations in 19 GPU hours with Patch Neighbor Consistency. arXiv 2024, arXiv:2408.11054.
23. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
24. Wigness, M.; Eum, S.; Rogers, J.G.; Han, D.; Kwon, H. A RUGD dataset for autonomous navigation and visual perception in unstructured outdoor environments. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: New York, NY, USA, 2019; pp. 5000–5007.
25. Min, C.; Jiang, W.; Zhao, D.; Xu, J.; Xiao, L.; Nie, Y.; Dai, B. ORFD: A dataset and benchmark for off-road freespace detection. In 2022 International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2022; pp. 2532–2538.
26. Hazirbas, C.; Ma, L.; Domokos, C.; Cremers, D. FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 213–228.
27. Fan, R.; Wang, H.; Cai, P.; Liu, M. SNE-RoadSeg: Incorporating surface normal information into semantic segmentation for accurate freespace detection. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 340–356.
28. Sun, Y.; Zuo, W.; Liu, M. RTFNet: RGB-thermal fusion network for semantic segmentation of urban scenes. IEEE Robot. Autom. Lett. 2019, 4, 2576–2583.
29. Ha, Q.; Watanabe, K.; Karasawa, T.; Ushiku, Y.; Harada, T. MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: New York, NY, USA, 2017; pp. 5108–5115.
30. Jin, Z.; Li, H.; Qin, Z.; Wang, Z. Gradient-free cooperative source-seeking of quadrotor under disturbances and communication constraints. IEEE Trans. Ind. Electron. 2024, 72, 1969–1979.
31. Jin, Z. Global asymptotic stability analysis for autonomous optimization. IEEE Trans. Autom. Control 2025, 70, 6953–6960.
Overall performance of TARTS and the TARTS- variant on RTSD and ORFD-All.

| Method | RTSD P | RTSD R | RTSD F | RTSD IoU | ORFD-All P | ORFD-All R | ORFD-All F | ORFD-All IoU |
|---|---|---|---|---|---|---|---|---|
| TARTS- | 93.6 | 99.7 | 96.5 | 93.3 | 93.5 | 96.2 | 94.8 | 90.2 |
| TARTS | 95.1 | 99.3 | 97.2 | 94.5 | 94.4 | 97.0 | 95.7 | 91.7 |
Comparison with baseline and state-of-the-art methods on ORFD-Test.

| Method | Modality | P | R | F | IoU |
|---|---|---|---|---|---|
| FuseNet [26] | RGB + Sparse Depth | 74.5 | 85.2 | 79.5 | 66.0 |
| SNE-RoadSeg [27] | RGB + Surface Normal | 86.7 | 92.7 | 89.6 | 81.2 |
| OFF-Net [25] | RGB + Surface Normal | 86.6 | 94.3 | 90.3 | 82.3 |
| RTFNet [28] | RGB + Surface Normal | 93.8 | 96.5 | 95.1 | 90.7 |
| MFNet [29] | RGB + Surface Normal | 89.6 | 90.3 | 89.9 | 81.7 |
| RoadFormer [6] | RGB + Surface Normal | 95.1 | 97.2 | 96.1 | 92.5 |
| TARTS | RGB | 96.4 | 97.5 | 97.0 | 94.1 |
Threshold selection strategy ablation on RTSD.

| Method | Threshold Strategy | P | R | F | IoU |
|---|---|---|---|---|---|
| TARTS- | Median | 72.2 | 99.9 | 82.9 | 72.2 |
| TARTS | Median | 72.1 | 99.9 | 83.5 | 72.0 |
| TARTS- | Mean | 91.0 | 99.9 | 95.3 | 91.0 |
| TARTS | Mean | 91.3 | 99.7 | 95.1 | 91.0 |
| TARTS- | Otsu-standard | 91.2 | 98.8 | 94.9 | 90.2 |
| TARTS | Otsu-standard | 95.0 | 99.1 | 97.0 | 94.2 |
| TARTS- | Otsu-valley-emphasis | 93.6 | 99.7 | 96.5 | 93.3 |
| TARTS | Otsu-valley-emphasis | 95.1 | 99.3 | 97.2 | 94.5 |
Feature-superpixel alignment strategy ablation on RTSD and ORFD-All.

| Method | Alignment Strategy | RTSD P | RTSD R | RTSD F | RTSD IoU | ORFD-All P | ORFD-All R | ORFD-All F | ORFD-All IoU |
|---|---|---|---|---|---|---|---|---|---|
| TARTS- | Bilinear Interpolation | 91.0 | 98.9 | 94.8 | 90.1 | 92.3 | 96.9 | 94.5 | 89.6 |
| TARTS- | Patch-level Alignment | 93.6 | 99.7 | 96.5 | 93.3 | 93.5 | 96.2 | 94.9 | 90.2 |
| TARTS | Bilinear Interpolation | 94.7 | 99.2 | 97.0 | 94.1 | 93.0 | 97.7 | 95.3 | 91.0 |
| TARTS | Patch-level Alignment | 95.1 | 99.3 | 97.2 | 94.5 | 94.4 | 97.0 | 95.7 | 91.7 |
Computational performance: per-stage latency (ms) and overall throughput across input resolutions.

| Resolution | Stage 1 (ms) | Stage 2 (ms) | Stage 3 (ms) | Total (ms) | FPS |
|---|---|---|---|---|---|
| — | 25.16 | 6.15 | 8.22 | 41.52 | 24.1 |
| — | 30.23 | 6.73 | 7.62 | 44.57 | 22.4 |
| — | 28.60 | 7.73 | 8.50 | 44.83 | 22.3 |
| — | 33.30 | 7.58 | 9.10 | 49.97 | 20.0 |
| — | 36.42 | 7.85 | 8.87 | 53.14 | 18.8 |
| — | 39.88 | 8.10 | 9.01 | 57.00 | 17.5 |
| — | 40.17 | 8.40 | 8.95 | 57.53 | 17.4 |
