Article

29 Pages

Bridging Vision Foundation and Vision–Language Models for Open-Vocabulary Semantic Segmentation of UAV Imagery

Fan Li, Zhaoxiang Zhang, Xuanbin Wang, Xuan Wang and Yuelei Xu

Remote Sens. 2025, 17(22), 3704; https://doi.org/10.3390/rs17223704 (13 November 2025)

Open-vocabulary semantic segmentation (OVSS) is of critical importance for unmanned aerial vehicle (UAV) imagery, as UAV scenes are highly dynamic and characterized by diverse, unpredictable object categories. Current OVSS approaches mainly rely on t...

603 Results Found

Bridging Vision Foundation and Vision–Language Models for Open-Vocabulary Semantic Segmentation of UAV Imagery

VL-Meta: Vision-Language Models for Multimodal Meta-Learning

Comparative Evaluation of Vision–Language Models for Detecting and Localizing Dental Lesions from Intraoral Images

Transforming Product Discovery and Interpretation Using Vision–Language Models

Few-Shot Image Classification of Crop Diseases Based on Vision–Language Models

CoCM: Conditional Cross-Modal Learning for Vision-Language Models

Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques

CrackCLIP: Adapting Vision-Language Models for Weakly Supervised Crack Segmentation

Research Progress on Vision–Language Multimodal Pretraining Model Technology

Preliminary Study on Image-Finding Generation and Classification of Lung Nodules in Chest CT Images Using Vision–Language Models

Prototype-Guided Zero-Shot Medical Image Segmentation with Large Vision-Language Models

Towards Robust Industrial Control Interpretation Through Comparative Analysis of Vision–Language Models

Application of Vision-Language Models in the Automatic Recognition of Bone Tumors on Radiographs: A Retrospective Study

An Empirical Evaluation of Low-Rank Adapted Vision–Language Models for Radiology Image Captioning

Cross-Cultural Safety Judgments in Child Environments: A Semantic Comparison of Vision-Language Models and Humans

Estimating Age and Sex from Dental Panoramic Radiographs Using Neural Networks and Vision–Language Models

A Novel Approach for Reliable Classification of Marine Low Cloud Morphologies with Vision–Language Models

CSP-DCPE: Category-Specific Prompt with Deep Contextual Prompt Enhancement for Vision–Language Models

Analyzing Diagnostic Reasoning of Vision–Language Models via Zero-Shot Chain-of-Thought Prompting in Medical Visual Question Answering

Multimodal AI for UAV: Vision–Language Models in Human–Machine Collaboration

CLIP-Llama: A New Approach for Scene Text Recognition with a Pre-Trained Vision-Language Model and a Pre-Trained Language Model

V-PRUNE: Semantic-Aware Patch Pruning Before Tokenization in Vision–Language Model Inference

Fast and Lightweight Vision-Language Model for Adversarial Traffic Sign Detection

Public Perception of Urban Recreational Spaces Based on Large Vision–Language Models: A Case Study of Beijing’s Third Ring Area

Auto-Rad: End-to-End Report Generation from Lumber Spine MRI Using Vision–Language Model

MVL-Loc: Leveraging Vision-Language Model for Generalizable Multi-Scene Camera Relocalization

SADAMB: Advancing Spatially-Aware Vision-Language Modeling Through Datasets, Metrics, and Benchmarks

A Vision–Language Model-Based Traffic Sign Detection Method for High-Resolution Drone Images: A Case Study in Guyuan, China

Mitigating Context Bias in Vision–Language Models via Multimodal Emotion Recognition

Cross-Modal Data Fusion via Vision-Language Model for Crop Disease Recognition

RelVid: Relational Learning with Vision-Language Models for Weakly Video Anomaly Detection

Vision-Language Model-Based Local Interpretable Model-Agnostic Explanations Analysis for Explainable In-Vehicle Controller Area Network Intrusion Detection

Application of Vision Language Models in the Shoe Industry

RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery

An Experimental Evaluation of Smart Sensors for Pedestrian Attribute Recognition Using Multi-Task Learning and Vision Language Models

Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy

Vision-Language Models for Zero-Shot Classification of Remote Sensing Images

Evaluation of Thermal Comfort in Urban Commercial Space with Vision–Language-Model-Based Agent Model

DL-VLM: A Dynamic Lightweight Vision-Language Model for Bridge Health Diagnosis

CracksGPT: Exploring the Potential and Limitations of Multimodal AI for Building Crack Analysis

Think-to-Detect: Rationale-Driven Vision–Language Anomaly Detection

A Survey of Robot Intelligence with Large Language Models

Real-Time Vision–Language Analysis for Autonomous Underwater Drones: A Cloud–Edge Framework Using Qwen2.5-VL

From Vision-Only to Vision + Language: A Multimodal Framework for Few-Shot Unsound Wheat Grain Classification

Detailed Image Captioning and Hashtag Generation

Parameter-Efficient Adaptation of Large Vision–Language Models for Video Memorability Prediction

An Exploratory Study on Workover Scenario Understanding Using Prompt-Enhanced Vision-Language Models

LLaVA-Pose: Keypoint-Integrated Instruction Tuning for Human Pose and Action Understanding

GPTArm: An Autonomous Task Planning Manipulator Grasping System Based on Vision–Language Models

Urban Road Anomaly Monitoring Using Vision–Language Models for Enhanced Safety Management