Drones
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

31 January 2026

A Fine-Grained Difficulty and Similarity Framework for Dynamic Evaluation of Path-Planning Generalization in UGVs

,
,
,
,
and
1 Department of Automation, Tsinghua University, Beijing 100084, China
2 Department of Avionics and Ordnance Engineering, Army Aviation Institute, Beijing 100123, China
* Author to whom correspondence should be addressed.
Drones 2026, 10(2), 101; https://doi.org/10.3390/drones10020101 (registering DOI)

Abstract

The generalization capability of the decision-making modules in unmanned ground vehicles (UGVs) is critical for their safe deployment in unseen environments. Prevailing evaluation methods, which rely on aggregated performance over static benchmark sets, lack the granularity to diagnose the root causes of model failure, because they often conflate the distinct influences of scenario similarity and intrinsic difficulty. To overcome this limitation, we introduce a fine-grained, dynamic evaluation framework that deconstructs generalization along two axes: multi-level difficulty and similarity. First, scenario similarity is quantified through a four-layer hierarchical decomposition, and the layer results are aggregated into a composite similarity score. Independently, test scenarios are classified into ten discrete difficulty levels via a consensus mechanism that integrates large language models and task-specific proxy models. By constructing a three-dimensional (3D) performance landscape over similarity, difficulty, and task performance, the framework enables detailed behavioral diagnosis. Robustness is assessed by analyzing performance within the high-similarity band (90–100%), while the full 3D landscape characterizes generalization under distribution shift. Seven interpretable metrics are derived to quantify distinct facets of both generalization and robustness. This initial validation focuses on the path-planning layer under full state observability and serves as a foundational proof-of-concept for the framework. The framework not only ranks algorithms but also reveals non-trivial behavioral patterns, such as the decoupling between in-distribution robustness and out-of-distribution generalization. It thereby provides a reliable and interpretable foundation for evaluating whether UGVs are ready for safe deployment in unseen environments.
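To make the evaluation pipeline described in the abstract more concrete, the sketch below shows one possible way to wire its pieces together. It aggregates four per-layer similarity scores into a composite score, forms a difficulty consensus from several raters' votes, bins results into a similarity × difficulty performance landscape, and computes mean performance in the 90–100% similarity band. The abstract does not specify the aggregation rule, the consensus rule, the binning, or the metric formulas. The equal-weight mean, the rounded-median consensus, and all names here (`LAYER_WEIGHTS`, `composite_similarity`, `consensus_difficulty`, `performance_landscape`, `high_similarity_robustness`) are therefore hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean, median

# HYPOTHETICAL: equal weights for the four hierarchy layers. The abstract only
# says layer results are aggregated into a composite score; the rule is assumed.
LAYER_WEIGHTS = (0.25, 0.25, 0.25, 0.25)


def composite_similarity(layer_scores):
    """Weighted mean of four per-layer similarity scores, each in [0, 1]."""
    if len(layer_scores) != len(LAYER_WEIGHTS):
        raise ValueError("expected one score per hierarchy layer")
    weighted = sum(w * s for w, s in zip(LAYER_WEIGHTS, layer_scores))
    return weighted / sum(LAYER_WEIGHTS)


def consensus_difficulty(votes):
    """Combine difficulty votes (levels 1..10) from LLM and proxy-model raters.

    HYPOTHETICAL rule: round the median vote (Python's round-half-to-even),
    then clamp to the valid range 1..10.
    """
    level = round(median(votes))
    return min(max(level, 1), 10)


@dataclass
class TestResult:
    similarity: float   # composite similarity in [0, 1]
    difficulty: int     # consensus difficulty level in 1..10
    performance: float  # task performance, e.g. success rate in [0, 1]


def performance_landscape(results, n_sim_bins=10):
    """Mean performance per (similarity bin, difficulty level) cell."""
    cells = {}
    for r in results:
        # Map similarity in [0, 1] to a bin index 0..n_sim_bins-1.
        sim_bin = min(int(r.similarity * n_sim_bins), n_sim_bins - 1)
        cells.setdefault((sim_bin, r.difficulty), []).append(r.performance)
    return {key: mean(vals) for key, vals in cells.items()}


def high_similarity_robustness(results, low=0.9):
    """Mean performance over results whose similarity lies in [low, 1.0]."""
    band = [r.performance for r in results if r.similarity >= low]
    return mean(band) if band else float("nan")


if __name__ == "__main__":
    demo = [
        TestResult(composite_similarity((0.95, 0.92, 0.97, 0.96)),
                   consensus_difficulty([3, 3, 4]), 1.0),
        TestResult(composite_similarity((0.40, 0.55, 0.35, 0.50)),
                   consensus_difficulty([7, 6, 8]), 0.0),
    ]
    print(performance_landscape(demo))
    print(high_similarity_robustness(demo))
```

Keying the landscape by (similarity bin, difficulty level) keeps both axes separate, so a drop in performance can be attributed to distribution shift, to harder scenarios, or to both. This is the kind of disentanglement the framework aims for.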
