Abstract
This work addresses the computational inefficiency of object detection in ultra-wide-area remote sensing images (RSIs). Traditional homogeneous tiling strategies enforce computational symmetry by processing all image regions uniformly. This ignores the intrinsic spatial asymmetry of target distribution, in which target-dense regions coexist with vast target-sparse areas (e.g., deserts, farmlands), and therefore wastes computational resources. To overcome this symmetry mismatch, we propose a heat-guided adaptive blocking and dual-model collaboration (HAB-DMC) framework. First, a lightweight EfficientNetV2 classifies the initial 1024 × 1024 tiles into semantic scenes (e.g., airports, forests). A target-scene relevance metric then converts the scene probabilities into a heatmap that identifies high-attention regions (HARs, e.g., airports) and low-attention regions (LARs, e.g., forests). HARs undergo fine-grained tiling (640 × 640 with 20% overlap) to preserve small targets, while LARs use coarse tiling (1024 × 1024) to minimize processing. Crucially, a dual-model strategy deploys (1) a high-precision LSK-RTDETR-base detector (with a Large Selective Kernel backbone) for HARs to capture multi-scale features and (2) a streamlined LSK-RTDETR-lite detector for LARs to accelerate inference. Experiments show 23.9% faster inference on 30k-pixel images and a 72.8% relative reduction in invalid computations (from 50% to 13.6%) compared with traditional methods, while maintaining a competitive mAP of 74.2%. The key innovation lies in repurposing heatmaps from localization tools into dynamic computation schedulers, enabling system-level efficiency for ultra-wide-area RSIs.
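The heat-guided scheduling idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the heat of each coarse tile is a relevance-weighted sum of its scene probabilities, that a fixed threshold separates HARs from LARs, and that fine tiles are cut inside each hot coarse tile. The function names, the relevance weights, and the 0.5 threshold are hypothetical. Only the tile sizes (1024, 640), the 20% overlap, and the base/lite model split come from the abstract.

```python
import numpy as np

def heat_scores(scene_probs, relevance):
    """Relevance-weighted sum of scene probabilities per coarse tile.

    scene_probs: array of shape (rows, cols, n_scenes)
    relevance:   array of shape (n_scenes,), hypothetical weights in [0, 1]
    """
    return np.asarray(scene_probs) @ np.asarray(relevance)

def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` covering [0, length)."""
    if length <= tile:
        return [0]
    stride = tile - round(tile * overlap)  # 640 px with 20% overlap -> 512
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        # Shift the last tile back so it ends at the boundary; its overlap
        # with the previous tile may then exceed the nominal value.
        starts.append(length - tile)
    return starts

def plan_tiles(heat, coarse=1024, fine=640, overlap=0.2, threshold=0.5):
    """Return a list of (x, y, size, model) jobs, one per tile to process."""
    plan = []
    rows, cols = heat.shape
    for r in range(rows):
        for c in range(cols):
            x0, y0 = c * coarse, r * coarse
            if heat[r, c] >= threshold:
                # High-attention region: fine tiles, high-precision detector.
                for dy in tile_starts(coarse, fine, overlap):
                    for dx in tile_starts(coarse, fine, overlap):
                        plan.append((x0 + dx, y0 + dy, fine, "base"))
            else:
                # Low-attention region: one coarse tile, lightweight detector.
                plan.append((x0, y0, coarse, "lite"))
    return plan

# Toy 2 x 2 grid with two scenes (airport, forest); values are illustrative.
probs = [[[0.9, 0.1], [0.2, 0.8]],
         [[0.1, 0.9], [0.3, 0.7]]]
relevance = [1.0, 0.0]  # hypothetical: airports relevant, forests not
plan = plan_tiles(heat_scores(probs, relevance))
```

In this toy grid only the top-left tile exceeds the threshold, so it is split into fine tiles routed to the "base" detector, while the other three coarse tiles are each sent once to the "lite" detector. A real pipeline would likely merge adjacent hot tiles before fine tiling so that objects on coarse-tile borders are not cut.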