Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation

Gao, Yuan; Zhao, Jindong; Xia, Shaobo; Nie, Sheng; Wang, Cheng; Xi, Xiaohuan

doi:10.3390/rs18121875

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Communication


by Yuan Gao ^1,2,3, Jindong Zhao ^4, Shaobo Xia ^4, Sheng Nie ^1,2,3, Cheng Wang ^1,2,3 and Xiaohuan Xi ^1,2,3,*

^1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
^2 International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
^3 University of Chinese Academy of Sciences, Beijing 100094, China
^4 School of Aeronautic Engineering, Changsha University of Science and Technology, Changsha 410004, China
^* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(12), 1875; https://doi.org/10.3390/rs18121875 (registering DOI)

Submission received: 6 May 2026 / Revised: 4 June 2026 / Accepted: 5 June 2026 / Published: 6 June 2026

(This article belongs to the Section AI Remote Sensing)


Abstract

Semantic segmentation of large-scale airborne point clouds traditionally relies on labor-intensive manual 3D annotation. Recent zero-shot methods attempt to alleviate this burden by distilling knowledge from 2D Vision–Language Models (VLMs) via 2D-to-3D projection, but their performance degrades in complex urban environments. Specifically, because 2D VLMs lack 3D geometric awareness, they frequently exhibit “semantic bleeding”, where large-scale background categories (e.g., ground) erroneously submerge small-scale targets (e.g., vehicles and street elements). To address this issue, we propose a geometry-constrained pseudo-label generation and purification framework built on a dual-branch design: one branch extracts open-vocabulary semantics via SAM3-based multi-view projection, while the other derives sharp, class-agnostic instances by applying SAM2 to gamma-transformed elevation maps. A geometric–semantic consistency module then evaluates the internal semantic purity and external spatial homogeneity of these instances to detect and filter out semantic misclassifications. The purified pseudo-labels supervise a 3D sparse convolutional network through a masked cross-entropy loss. Experiments on the H3D and Turin3D datasets show that our method recovers small-scale targets that are prone to being submerged and outperforms existing zero-shot baselines, improving mIoU from 52.15% to 63.45% on H3D and from 29.52% to 58.51% on Turin3D, thereby narrowing the performance gap with fully supervised approaches.
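The abstract names three concrete mechanisms: a gamma transform of the elevation map fed to SAM2, an instance-level purity check on the projected semantic labels, and a masked cross-entropy loss that ignores filtered points. The pure-Python sketch below illustrates them under stated assumptions and is not the authors' implementation. The min-max normalization, the default `gamma`, the majority-label definition of purity, the threshold `tau`, and the `IGNORE = -1` marker are all hypothetical choices. The external spatial-homogeneity criterion is not specified in the abstract and is deliberately not reproduced here.

```python
import math
from collections import Counter

IGNORE = -1  # hypothetical marker for points excluded from supervision


def gamma_elevation_map(heights, gamma=0.5):
    """Min-max normalize per-pixel heights to [0, 1], apply a gamma curve,
    and quantize to 8-bit grey values (assumed input format for SAM2).
    gamma < 1 stretches contrast among low heights (an assumed choice)."""
    lo, hi = min(heights), max(heights)
    span = (hi - lo) or 1.0  # avoid division by zero on flat tiles
    return [round(255 * ((h - lo) / span) ** gamma) for h in heights]


def purify_labels(sem_labels, inst_ids, tau=0.8):
    """Hypothetical internal-purity filter over class-agnostic instances.

    For each instance, find the majority semantic label among its points.
    If that label covers at least `tau` of the points, copy it to every
    point of the instance; otherwise mark the whole instance IGNORE.
    Points with inst_id == IGNORE (no instance) keep their projected label.
    """
    members = {}
    for idx, inst in enumerate(inst_ids):
        if inst != IGNORE:
            members.setdefault(inst, []).append(idx)

    out = list(sem_labels)
    for idxs in members.values():
        counts = Counter(sem_labels[i] for i in idxs)
        label, n = counts.most_common(1)[0]
        purity = n / len(idxs)
        new_label = label if purity >= tau else IGNORE
        for i in idxs:
            out[i] = new_label
    return out


def masked_cross_entropy(logits, labels):
    """Mean cross-entropy over points whose label is not IGNORE.
    Returns 0.0 when every point is masked."""
    total, count = 0.0, 0
    for row, y in zip(logits, labels):
        if y == IGNORE:
            continue
        m = max(row)  # log-sum-exp stabilization
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]  # -log softmax(row)[y]
        count += 1
    return total / count if count else 0.0
```

For example, with `sem = [0, 0, 1, 1, 1, 0, 2, 0]`, `inst = [-1, -1, 3, 3, 3, 3, 4, 4]` and `tau=0.7`, instance 3 (purity 0.75) is relabeled to class 1 and instance 4 (purity 0.5) is masked, so `purify_labels` returns `[0, 0, 1, 1, 1, 1, -1, -1]` and only the first six points contribute to `masked_cross_entropy`. Masking impure instances, rather than forcing a label onto them, trades label coverage for pseudo-label precision, which is the trade-off a masked loss is meant to support.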

Keywords: airborne point cloud; semantic segmentation; zero-shot; foundation model

Share and Cite

MDPI and ACS Style

Gao, Y.; Zhao, J.; Xia, S.; Nie, S.; Wang, C.; Xi, X. Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation. Remote Sens. 2026, 18, 1875. https://doi.org/10.3390/rs18121875

AMA Style

Gao Y, Zhao J, Xia S, Nie S, Wang C, Xi X. Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation. Remote Sensing. 2026; 18(12):1875. https://doi.org/10.3390/rs18121875

Chicago/Turabian Style

Gao, Yuan, Jindong Zhao, Shaobo Xia, Sheng Nie, Cheng Wang, and Xiaohuan Xi. 2026. "Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation" Remote Sensing 18, no. 12: 1875. https://doi.org/10.3390/rs18121875

APA Style

Gao, Y., Zhao, J., Xia, S., Nie, S., Wang, C., & Xi, X. (2026). Consistency-Guided Distillation from Vision Foundation Models for Zero-Shot Airborne Point Cloud Segmentation. Remote Sensing, 18(12), 1875. https://doi.org/10.3390/rs18121875

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
