This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Communication

Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation

1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
3 University of Chinese Academy of Sciences, Beijing 100094, China
4 School of Aeronautic Engineering, Changsha University of Science and Technology, Changsha 410004, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(12), 1875; https://doi.org/10.3390/rs18121875 (registering DOI)
Submission received: 6 May 2026 / Revised: 4 June 2026 / Accepted: 5 June 2026 / Published: 6 June 2026
(This article belongs to the Section AI Remote Sensing)

Abstract

Semantic segmentation of large-scale airborne point clouds traditionally relies on labor-intensive manual 3D annotation. Recent zero-shot methods attempt to alleviate this burden by distilling knowledge from 2D Vision–Language Models (VLMs) via 2D-to-3D projection, but their performance degrades in complex urban environments. Specifically, because 2D VLMs lack 3D geometric awareness, they frequently exhibit “semantic bleeding”, where large-scale background categories (e.g., ground) erroneously submerge small-scale targets (e.g., vehicles and street elements). To address this issue, we propose a geometry-constrained pseudo-label generation and purification framework. Our approach uses a dual-branch design: one branch extracts open-vocabulary semantics via SAM3-based multi-view projection, while the other derives sharp, class-agnostic instances using SAM2 on gamma-transformed elevation maps. A geometric–semantic consistency module then evaluates the internal semantic purity and external spatial homogeneity of these instances to detect and filter out semantic misclassifications. The purified pseudo-labels are used to supervise a 3D sparse convolutional network via a Masked Cross-Entropy Loss. Experiments on the H3D and Turin3D datasets demonstrate that our method recovers small-scale targets that are prone to being submerged, outperforming existing zero-shot baselines by improving mIoU from 52.15% to 63.45% on H3D and from 29.52% to 58.51% on Turin3D, thereby narrowing the performance gap with fully supervised approaches.
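The purification and masked-supervision steps described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The function names, the `IGNORE` marker, the gamma value, and the purity threshold are all hypothetical, and only a simple majority-vote purity test is shown; the abstract does not specify the actual criteria, and the external spatial homogeneity check is omitted. A gamma below 1, as assumed here, stretches contrast among low elevations.

```python
import math
from collections import Counter, defaultdict

IGNORE = -1  # hypothetical marker for points whose pseudo-label was filtered out


def gamma_transform(elevations, gamma=0.5):
    """Normalize elevations to [0, 1], then apply a power-law (gamma) curve."""
    lo, hi = min(elevations), max(elevations)
    span = (hi - lo) or 1.0  # avoid division by zero on perfectly flat input
    return [((z - lo) / span) ** gamma for z in elevations]


def purify_labels(labels, instance_ids, purity_threshold=0.7):
    """Assumed rule: keep an instance's majority label only if it is pure enough."""
    members = defaultdict(list)
    for idx, inst in enumerate(instance_ids):
        members[inst].append(idx)
    purified = list(labels)
    for idxs in members.values():
        majority, count = Counter(labels[i] for i in idxs).most_common(1)[0]
        keep = count / len(idxs) >= purity_threshold
        for i in idxs:
            # Pure instance: propagate the majority label. Impure: mask it out.
            purified[i] = majority if keep else IGNORE
    return purified


def masked_cross_entropy(logits, labels):
    """Mean cross-entropy over points whose label is not IGNORE."""
    total, n = 0.0, 0
    for row, y in zip(logits, labels):
        if y == IGNORE:
            continue  # masked points contribute nothing to the loss
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum - row[y]
        n += 1
    return total / n if n else 0.0


if __name__ == "__main__":
    noisy = [0, 0, 0, 1, 0, 1]       # per-point semantic labels from 2D projection
    instances = [7, 7, 7, 7, 8, 8]   # class-agnostic instance ids
    print(purify_labels(noisy, instances))
```

In this toy input, instance 7 is 75% label 0, so the stray label 1 is overwritten, while instance 8 is evenly split and is masked out rather than trusted. Masking ambiguous points instead of forcing a label is what allows the loss to ignore unreliable supervision.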
Keywords: airborne point cloud; semantic segmentation; zero-shot; foundation model

Share and Cite

MDPI and ACS Style

Gao, Y.; Zhao, J.; Xia, S.; Nie, S.; Wang, C.; Xi, X. Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation. Remote Sens. 2026, 18, 1875. https://doi.org/10.3390/rs18121875

AMA Style

Gao Y, Zhao J, Xia S, Nie S, Wang C, Xi X. Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation. Remote Sensing. 2026; 18(12):1875. https://doi.org/10.3390/rs18121875

Chicago/Turabian Style

Gao, Yuan, Jindong Zhao, Shaobo Xia, Sheng Nie, Cheng Wang, and Xiaohuan Xi. 2026. "Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation" Remote Sensing 18, no. 12: 1875. https://doi.org/10.3390/rs18121875

APA Style

Gao, Y., Zhao, J., Xia, S., Nie, S., Wang, C., & Xi, X. (2026). Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation. Remote Sensing, 18(12), 1875. https://doi.org/10.3390/rs18121875
