Search Results (223)

Search Parameters:
Keywords = segmentation anything model

31 pages, 95637 KB  
Article
Promptable Foundation Models for SAR Remote Sensing: Adapting the Segment Anything Model for Snow Avalanche Segmentation
by Riccardo Gelato, Carlo Sgaravatti, Jakob Grahn, Giacomo Boracchi and Filippo Maria Bianchi
Remote Sens. 2026, 18(3), 519; https://doi.org/10.3390/rs18030519 - 5 Feb 2026
Abstract
Remote sensing solutions for avalanche segmentation and mapping are key to supporting risk forecasting and mitigation in mountain regions. Synthetic Aperture Radar (SAR) imagery from Sentinel-1 can be effectively used for this task, but training an effective detection model requires gathering a large dataset with high-quality annotations from domain experts, which is prohibitively time-consuming. In this work, we aim to facilitate and accelerate the annotation of SAR images for avalanche mapping. We build on the Segment Anything Model (SAM), a segmentation foundation model trained on natural images, and tailor it to Sentinel-1 SAR data. Adapting SAM to our use case requires addressing several domain-specific challenges: (1) domain mismatch, since SAM was not trained on satellite or SAR imagery; (2) input adaptation, because SAR products typically provide more than three channels while the SAM is constrained to RGB images; (3) robustness to imprecise prompts that can affect target identification and degrade the segmentation quality, an issue exacerbated in small, low-contrast avalanches; and (4) training efficiency, since standard fine-tuning is computationally demanding for the SAM. We tackle these challenges through a combination of adapters to mitigate the domain gap, multiple encoders to handle multi-channel SAR inputs, prompt-engineering strategies to improve avalanche localization accuracy, and a training algorithm that limits the training time of the encoder, which is recognized as the major bottleneck. We integrate the resulting model into a segmentation tool and show experimentally that it speeds up the annotation of SAR images. Full article
(This article belongs to the Section Environmental Remote Sensing)
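The abstract above describes keeping SAM's encoder frozen and training lightweight adapters to bridge the SAR domain gap. As a rough illustration of that general pattern (not the authors' architecture), a bottleneck adapter in PyTorch might look like the following; the stand-in encoder block, dimensions, and module names are assumptions.

```python
# Minimal sketch (PyTorch): a bottleneck adapter of the kind commonly inserted
# around a frozen ViT encoder such as SAM's. Illustrative only; module names and
# dimensions are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, non-linearity, up-project, residual add."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as identity so early training is stable
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Typical usage: freeze the pretrained encoder, train only the adapters (and the decoder).
encoder = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)  # stand-in for a SAM ViT block
adapter = BottleneckAdapter(dim=768)
for p in encoder.parameters():
    p.requires_grad = False

tokens = torch.randn(2, 196, 768)   # (batch, patches, dim)
out = adapter(encoder(tokens))      # adapted features would flow on to the mask decoder
print(out.shape)
```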
22 pages, 11216 KB  
Article
A Multi-Scale Remote Sensing Image Change Detection Network Based on Vision Foundation Model
by Shenbo Liu, Dongxue Zhao and Lijun Tang
Remote Sens. 2026, 18(3), 506; https://doi.org/10.3390/rs18030506 - 4 Feb 2026
Abstract
As a key technology in the intelligent interpretation of remote sensing, remote sensing image change detection aims to automatically identify surface changes from images of the same area acquired at different times. Although vision foundation models have demonstrated outstanding capabilities in image feature representation, their inherent patch-based processing and global attention mechanisms limit their effectiveness in perceiving multi-scale targets. To address this, we propose a multi-scale remote sensing image change detection network based on a vision foundation model, termed SAM-MSCD. This network integrates an efficient parameter fine-tuning strategy with a cross-temporal multi-scale feature fusion mechanism, significantly improving change perception accuracy in complex scenarios. Specifically, the Low-Rank Adaptation mechanism is adopted for parameter-efficient fine-tuning of the Segment Anything Model (SAM) image encoder, adapting it for the remote sensing change detection task. A bi-temporal feature interaction module (BIM) is designed to enhance the semantic alignment and the modeling of change relationships between feature maps from different time phases. Furthermore, a change feature enhancement module (CFEM) is proposed to fuse and highlight differential information from different levels, achieving precise capture of multi-scale changes. Comprehensive experimental results on four public remote sensing change detection datasets, namely LEVIR-CD, WHU-CD, NJDS, and MSRS-CD, demonstrate that SAM-MSCD surpasses current state-of-the-art (SOTA) methods on several key evaluation metrics, including the F1-score and Intersection over Union (IoU), indicating its broad prospects for practical application. Full article
(This article belongs to the Section AI Remote Sensing)
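For the LoRA-based fine-tuning of the SAM image encoder mentioned above, a minimal sketch with Hugging Face `transformers` and `peft` might look like the following; the checkpoint name and the fused `qkv` target-module name are assumptions about the SAM vision encoder, not details taken from the paper.

```python
# Minimal sketch of parameter-efficient LoRA fine-tuning of the SAM image encoder,
# in the spirit of SAM-MSCD. The "qkv" target module is an assumption about the
# fused attention projection in the vision encoder.
from transformers import SamModel
from peft import LoraConfig, get_peft_model

sam = SamModel.from_pretrained("facebook/sam-vit-base")

lora_cfg = LoraConfig(
    r=8,                      # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["qkv"],   # attention projections inside the vision encoder
)
sam = get_peft_model(sam, lora_cfg)
sam.print_trainable_parameters()   # only the LoRA matrices are trainable

# Downstream, bi-temporal features from the adapted encoder would feed a
# change-detection head; that head is task-specific and not shown here.
```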
31 pages, 1633 KB  
Article
Foundation-Model-Driven Skin Lesion Segmentation and Classification Using SAM-Adapters and Vision Transformers
by Faisal Binzagr and Majed Hariri
Diagnostics 2026, 16(3), 468; https://doi.org/10.3390/diagnostics16030468 - 3 Feb 2026
Viewed by 40
Abstract
Background: The precise segmentation and classification of dermoscopic images remain prominent obstacles in automated skin cancer evaluation due, in part, to variability in lesions, low-contrast borders, and additional artifacts in the background. There have been recent developments in foundation models, with a particular emphasis on the Segment Anything Model (SAM)—these models exhibit strong generalization potential but require domain-specific adaptation to function effectively in medical imaging. The advent of new architectures, particularly Vision Transformers (ViTs), expands the means of implementing robust lesion identification; however, their strengths are limited without spatial priors. Methods: The proposed study lays out an integrated foundation-model-based framework that uses SAM-Adapter fine-tuning for lesion segmentation and a ViT-based classifier that incorporates lesion-specific cropping derived from the segmentation and cross-attention fusion. The SAM encoder is kept frozen while only the lightweight adapters are fine-tuned, introducing skin-surface-specific capacity. Segmentation priors are incorporated during the classification stage through fusion with patch embeddings from the images, enabling lesion-centric reasoning. The entire pipeline is trained with a joint multi-task approach on the ISIC 2018, HAM10000, and PH2 datasets. Results: In extensive experiments, the proposed method outperforms state-of-the-art segmentation and classification approaches across all datasets. On the ISIC 2018 dataset, it achieves a Dice score of 94.27% for segmentation and an accuracy of 95.88% for classification. On PH2, a Dice score of 95.62% is achieved, and on HAM10000, an accuracy of 96.37%. Ablation analyses confirm that the SAM-Adapters, lesion-specific cropping, and cross-attention fusion each contribute substantially to performance. Paired t-tests confirm the statistical significance of the improvements over strong baselines, with p < 0.01 for most comparisons and large effect sizes. Conclusions: The results indicate that combining segmentation priors from foundation models with transformer-based classification consistently and reliably improves lesion boundary quality and diagnostic accuracy. The proposed SAM-ViT framework thus demonstrates robust, generalizable, lesion-centric automated dermoscopic analysis and represents a promising step towards a clinically deployable skin cancer decision-support system. Next steps include model compression, improved pseudo-mask refinement, and evaluation on real-world multi-center clinical cohorts. Full article
(This article belongs to the Special Issue Medical Image Analysis and Machine Learning)
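One concrete piece of the pipeline above is the lesion-specific cropping derived from the segmentation mask before classification. Below is a minimal, hypothetical sketch of that step; the margin ratio, output size, and fallback behaviour are assumptions, not the paper's settings.

```python
# Minimal sketch of lesion-specific cropping: derive a bounding box from a
# predicted segmentation mask and crop (with a margin) before classification.
import numpy as np
import cv2

def crop_from_mask(image: np.ndarray, mask: np.ndarray, margin: float = 0.15,
                   out_size: int = 224) -> np.ndarray:
    ys, xs = np.nonzero(mask > 0)
    if len(xs) == 0:                      # no lesion found: fall back to the full image
        return cv2.resize(image, (out_size, out_size))
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    pad_x = int((x1 - x0 + 1) * margin)
    pad_y = int((y1 - y0 + 1) * margin)
    h, w = mask.shape
    x0, x1 = max(0, x0 - pad_x), min(w, x1 + pad_x + 1)
    y0, y1 = max(0, y0 - pad_y), min(h, y1 + pad_y + 1)
    return cv2.resize(image[y0:y1, x0:x1], (out_size, out_size))
```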
20 pages, 4296 KB  
Article
Occlusion-Aware Multi-Object Tracking in Vineyards via SAM-Based Visibility Modeling
by Yanan Wang, Hagsong Kim, Muhammad Fayaz, Lien Minh Dang, Hyeonjoon Moon and Kang-Won Lee
Electronics 2026, 15(3), 621; https://doi.org/10.3390/electronics15030621 - 1 Feb 2026
Viewed by 109
Abstract
Multi-object tracking (MOT) in vineyard environments remains challenging due to frequent and long-term occlusions caused by dense foliage, overlapping grape clusters, and complex plant structures. These characteristics often result in identity switches and fragmented trajectories when using conventional tracking methods. This paper proposes OATSAM-Track, an occlusion-aware multi-object tracking framework designed for vineyard fruit monitoring. The framework integrates lightweight MobileSAM-assisted instance segmentation to estimate target visibility and occlusion severity. Occlusion-state reasoning is further incorporated into temporal association, appearance memory updating, and identity recovery. An adaptive temporal memory mechanism selectively updates appearance features according to predicted occlusion states, reducing identity drift under partial and severe occlusions. To facilitate occlusion-aware evaluation, an extended vineyard multi-object tracking dataset (GrapeOcclusionMOTS) with SAM-refined instance masks and fine-grained occlusion annotations is constructed. The experimental results demonstrate that OATSAM-Track improves identity consistency and tracking robustness compared to representative baseline trackers, particularly under medium and severe occlusion scenarios. These results indicate that explicit occlusion modeling is beneficial for reliable fruit monitoring in precision agriculture. Full article
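The occlusion-state reasoning above hinges on estimating how much of a tracked target its instance mask still covers. A minimal sketch of such a visibility ratio and a coarse occlusion state follows; the threshold values are illustrative assumptions rather than the paper's calibrated ones.

```python
# Minimal sketch of mask-based visibility estimation: the fraction of a tracked
# object's box that the (MobileSAM-style) instance mask actually covers.
import numpy as np

def visibility_ratio(mask: np.ndarray, box_xyxy) -> float:
    x0, y0, x1, y1 = [int(v) for v in box_xyxy]
    region = mask[y0:y1, x0:x1]
    if region.size == 0:
        return 0.0
    return float((region > 0).mean())

def occlusion_state(vis: float) -> str:
    if vis > 0.7:
        return "visible"     # safe to update the appearance memory
    if vis > 0.3:
        return "partial"     # update cautiously / down-weight the new features
    return "occluded"        # freeze appearance features, keep the identity alive
```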
20 pages, 17064 KB  
Article
PriorSAM-DBNet: A SAM-Prior-Enhanced Dual-Branch Network for Efficient Semantic Segmentation of High-Resolution Remote Sensing Images
by Qiwei Zhang, Yisong Wang, Ning Li, Quanwen Jiang and Yong He
Sensors 2026, 26(2), 749; https://doi.org/10.3390/s26020749 - 22 Jan 2026
Viewed by 172
Abstract
Semantic segmentation of high-resolution remote sensing imagery is a critical technology for the intelligent interpretation of sensor data, supporting automated environmental monitoring and urban sensing systems. However, processing data from dense urban scenarios remains challenging due to sensor signal occlusions (e.g., shadows) and the complexity of parsing multi-scale targets from optical sensors. Existing approaches often exhibit a trade-off between the accuracy of global semantic modeling and the precision of complex boundary recognition. While the Segment Anything Model (SAM) offers powerful zero-shot structural priors, its direct application to remote sensing is hindered by domain gaps and the lack of inherent semantic categorization. To address these limitations, we propose a dual-branch cooperative network, PriorSAM-DBNet. The main branch employs a Densely Connected Swin (DC-Swin) Transformer to capture cross-scale global features via a hierarchical shifted window attention mechanism. The auxiliary branch leverages SAM’s zero-shot capability to exploit structural universality, generating object-boundary masks as robust signal priors while bypassing semantic domain shifts. Crucially, we introduce a parameter-efficient Scaled Subsampling Projection (SSP) module that employs a weight-sharing mechanism to align cross-modal features, freezing the massive SAM backbone to ensure computational viability for practical sensor applications. Furthermore, a novel Attentive Cross-Modal Fusion (ACMF) module is designed to dynamically resolve semantic ambiguities by calibrating the global context with local structural priors. Extensive experiments on the ISPRS Vaihingen, Potsdam, and LoveDA-Urban datasets demonstrate that PriorSAM-DBNet outperforms state-of-the-art approaches. By fine-tuning only 0.91 million parameters in the auxiliary branch, our method achieves mIoU scores of 82.50%, 85.59%, and 53.36%, respectively. The proposed framework offers a scalable, high-precision solution for remote sensing semantic segmentation, particularly effective for disaster emergency response where rapid feature recognition from sensor streams is paramount. Full article
15 pages, 6022 KB  
Perspective
A Multidimensional Approach to Cereal Caryopsis Development: Insights into Adlay (Coix lacryma-jobi L.) and Emerging Applications
by Xiaoyu Yang, Jian Zhang, Maohong Ao, Jing Lei and Chenglong Yang
Plants 2026, 15(2), 320; https://doi.org/10.3390/plants15020320 - 21 Jan 2026
Viewed by 177
Abstract
Adlay (Coix lacryma-jobi L.) stands out as a vital health-promoting cereal due to its dual nutritional and medicinal properties; however, it remains significantly underdeveloped compared to major crops. The lack of mechanistic understanding of its caryopsis development and trait formation severely constrains targeted genetic improvement. While transformative technologies, specifically micro-computed tomography (micro-CT) imaging combined with AI-assisted analysis (e.g., Segment Anything Model (SAM)) and multi-omics approaches, have been successfully applied to unravel the structural and physiological complexities of model cereals, their systematic adoption in adlay research remains fragmented. Going beyond a traditional synthesis of these methodologies, this article proposes a novel, multidimensional framework specifically designed for adlay. This forward-looking strategy integrates high-resolution 3D phenotyping with spatial multi-omics data to bridge the gap between macroscopic caryopsis architecture and microscopic metabolic accumulation. By offering a precise digital solution to elucidate adlay’s unique developmental mechanisms, the proposed framework aims to accelerate precision breeding and advance the scientific modernization of this promising underutilized crop. Full article
(This article belongs to the Special Issue AI-Driven Machine Vision Technologies in Plant Science)
25 pages, 19621 KB  
Article
Scrap-SAM-CLIP: Assembling Foundation Models for Typical Shape Recognition in Scrap Classification and Rating
by Guangda Bao, Wenzhi Xia, Haichuan Wang, Zhiyou Liao, Ting Wu and Yun Zhou
Sensors 2026, 26(2), 656; https://doi.org/10.3390/s26020656 - 18 Jan 2026
Viewed by 339
Abstract
To address the limitation of 2D methods in inferring absolute scrap dimensions from images, we propose Scrap-SAM-CLIP (SSC), a vision-language model integrating the segment anything model (SAM) and contrastive language-image pre-training in Chinese (CN-CLIP). The model enables identification of canonical scrap shapes, establishing a foundational framework for subsequent 3D reconstruction and dimensional extraction within the 3D recognition pipeline. Individual modules of SSC are fine-tuned on the self-constructed scrap dataset. For segmentation, the combined box-and-point prompt yields optimal performance among various prompting strategies. MobileSAM and SAM-HQ-Tiny serve as effective lightweight alternatives for edge deployment. Fine-tuning the SAM decoder significantly enhances robustness under noisy prompts, improving accuracy by at least 5.55% with a five-positive-points prompt and up to 15.00% with a five-positive-points-and-five-negative-points prompt. In classification, SSC achieves 95.3% accuracy, outperforming Swin Transformer V2_base by 2.9%, with t-SNE visualizations confirming superior feature learning capability. The performance advantages of SSC stem from its modular assembly strategy, enabling component-specific optimization through subtask decoupling and enhancing system interpretability. This work refines the scrap 3D identification pipeline and demonstrates the efficacy of adapted foundation models in industrial vision systems. Full article
(This article belongs to the Section Intelligent Sensors)
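The abstract reports that a combined box-and-point prompt performed best among the prompting strategies. With Meta's `segment_anything` package, such a prompt is issued as in the sketch below; the checkpoint path, image name, and coordinates are placeholders.

```python
# Minimal sketch of combined box-and-point prompting with segment_anything.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")   # placeholder checkpoint
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("scrap.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

box = np.array([120, 80, 560, 430])      # xyxy box around one scrap piece (placeholder)
points = np.array([[340, 255]])          # one positive click inside it (placeholder)
labels = np.array([1])                   # 1 = foreground, 0 = background

masks, scores, _ = predictor.predict(
    point_coords=points, point_labels=labels, box=box, multimask_output=False
)
print(masks.shape, scores)               # (1, H, W) boolean mask and its predicted IoU
```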
25 pages, 4064 KB  
Article
Application of CNN and Vision Transformer Models for Classifying Crowns in Pine Plantations Affected by Diplodia Shoot Blight
by Mingzhu Wang, Christine Stone and Angus J. Carnegie
Forests 2026, 17(1), 108; https://doi.org/10.3390/f17010108 - 13 Jan 2026
Viewed by 250
Abstract
Diplodia shoot blight, caused by an opportunistic fungal pathogen, affects many conifer species and has a global distribution. Depending on the duration and severity of the disease, affected needles appear yellow (chlorotic) for a brief period before becoming red or brown in colour. These symptoms can occur on individual branches or over the entire crown. Aerial sketch-mapping or the manual interpretation of aerial photography for tree health surveys are labour-intensive and subjective. Recently, however, the application of deep learning (DL) techniques to detect and classify tree crowns in high-spatial-resolution imagery has gained significant attention. This study evaluated two complementary DL approaches for the detection and classification of Pinus radiata trees infected with diplodia shoot blight across five geographically dispersed sites with varying topographies over two acquisition years: (1) object detection using YOLOv12 combined with the Segment Anything Model (SAM) and (2) pixel-level semantic segmentation using U-Net, SegFormer, and EVitNet. The three damage classes for the object detection approach were ‘yellow’, ‘red-brown’ (both whole-crown discolouration) and ‘dead tops’ (partially discoloured crowns), while for the semantic segmentation the three classes were yellow, red-brown, and background. The YOLOv12m model achieved an overall mAP50 score of 0.766 and mAP50–95 of 0.447 across all three classes, with red-brown crowns demonstrating the highest detection accuracy (mAP50: 0.918, F1 score: 0.851). For semantic segmentation models, SegFormer showed the strongest performance (IoU of 0.662 for red-brown and 0.542 for yellow) but at the cost of the longest training time, while EVitNet offered the most cost-effective solution, achieving accuracy comparable to SegFormer with superior training efficiency thanks to its lighter architecture. The accurate identification and classification of crown damage symptoms support the calibration and validation of satellite-based monitoring systems and assist in the prioritisation of ground-based diagnosis or management interventions. Full article
(This article belongs to the Section Forest Health)
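The detect-then-segment pairing above (YOLO boxes prompting SAM) can be sketched as follows with the `ultralytics` and `segment_anything` packages; the weight filenames and image path are placeholders, and this is a generic illustration rather than the study's exact pipeline.

```python
# Minimal sketch of a detect-then-segment pipeline: detector boxes prompt SAM.
import cv2
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

detector = YOLO("yolo12m.pt")                                  # hypothetical weights
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder checkpoint
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("crowns.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

det = detector(image)[0]                          # one image -> one Results object
crown_masks = []
for box in det.boxes.xyxy.cpu().numpy():          # each detected crown box prompts one mask
    masks, _, _ = predictor.predict(box=box, multimask_output=False)
    crown_masks.append(masks[0])
print(f"{len(crown_masks)} crown masks generated")
```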
30 pages, 16273 KB  
Article
PMG-SAM: Boosting Auto-Segmentation of SAM with Pre-Mask Guidance
by Jixue Gao, Xiaoyan Jiang, Anjie Wang, Yongbin Gao, Zhijun Fang and Michael S. Lew
Sensors 2026, 26(2), 365; https://doi.org/10.3390/s26020365 - 6 Jan 2026
Viewed by 352
Abstract
The Segment Anything Model (SAM), a foundational vision model, struggles with fully automatic segmentation of specific objects. Its “segment everything” mode, reliant on a grid-based prompt strategy, suffers from localization blindness and computational redundancy, leading to poor performance on tasks like Dichotomous Image Segmentation (DIS). To address this, we propose PMG-SAM, a framework that introduces a Pre-Mask Guided paradigm for automatic targeted segmentation. Our method employs a dual-branch encoder to generate a coarse global Pre-Mask, which then acts as a dense internal prompt to guide the segmentation decoder. A key component, our proposed Dense Residual Fusion Module (DRFM), iteratively co-refines multi-scale features to significantly enhance the Pre-Mask’s quality. Extensive experiments on challenging DIS and Camouflaged Object Segmentation (COS) tasks validate our approach. On the DIS-TE2 benchmark, PMG-SAM boosts the maximal F-measure from SAM’s 0.283 to 0.815. Notably, our fully automatic model’s performance surpasses even the ground-truth bounding box prompted modes of SAM and SAM2, while using only 22.9 M trainable parameters (58.8% of SAM2-Tiny). PMG-SAM thus presents an efficient and accurate paradigm for resolving the localization bottleneck of large vision models in prompt-free scenarios. Full article
(This article belongs to the Section Intelligent Sensors)
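SAM's predictor already accepts a dense mask prompt, which is the hook the pre-mask-guided paradigm builds on. Below is a minimal sketch of feeding a coarse mask through `mask_input`; the coarse mask here is a synthetic placeholder, not the output of the paper's dual-branch encoder or DRFM.

```python
# Minimal sketch of the "pre-mask as dense prompt" idea with segment_anything:
# a coarse mask is resized to SAM's 256x256 low-resolution prompt format and
# passed to the decoder via mask_input.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")   # placeholder checkpoint
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("object.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

coarse = np.zeros(image.shape[:2], dtype=np.float32)   # stand-in for a predicted pre-mask
coarse[100:400, 150:500] = 1.0
low_res = cv2.resize(coarse, (256, 256))
mask_logits = (low_res * 20.0 - 10.0)[None]            # map {0,1} to +/-10 logits, shape (1, 256, 256)

masks, scores, _ = predictor.predict(mask_input=mask_logits, multimask_output=False)
```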
21 pages, 11031 KB  
Article
CF-SAM: An Efficient and Precise SAM Model for Instance Segmentation of Cotton Top Leaves
by Yanliang Mao, Kubwimana Olivier, Guangzhi Niu and Liping Chen
Agronomy 2026, 16(1), 30; https://doi.org/10.3390/agronomy16010030 - 22 Dec 2025
Viewed by 428
Abstract
The complexity of field environments poses significant challenges for the segmentation of cotton top leaves, a critical step for apical bud localization in intelligent topping systems. Conventional segmentation models typically rely on large annotated datasets and high computational costs to achieve high precision and robustness. To address these challenges, this paper proposes an efficient and accurate segmentation model, CF-SAM, built upon the Segment Anything Model (SAM) framework. CF-SAM integrates a lightweight Tiny-ViT encoder to reduce computational overhead and employs a LoRA-based fine-tuning strategy for domain adaptation, achieving improved performance with minimal parameter increments. In addition, an Adaptive Prompting Strategy (APS) is introduced to automatically generate high-quality point prompts, enabling fully automated and end-to-end instance segmentation. Trained on only 1000 field images, CF-SAM achieves 98.0% mask accuracy and an mAP@0.5 of 97.83%, while maintaining real-time inference at 58 FPS with only 0.091 M (0.8%) additional parameters. These results demonstrate that CF-SAM achieves an excellent balance between segmentation accuracy and computational cost, providing a reliable technical foundation for apical bud localization and precision agricultural operations. Full article
(This article belongs to the Special Issue Agricultural Imagery and Machine Vision)
15 pages, 3989 KB  
Article
YOLO-SAM AgriScan: A Unified Framework for Ripe Strawberry Detection and Segmentation with Few-Shot and Zero-Shot Learning
by Partho Ghose, Al Bashir, Yibin Wang, Cristian Bua and Azlan Zahid
Sensors 2025, 25(24), 7678; https://doi.org/10.3390/s25247678 - 18 Dec 2025
Cited by 1 | Viewed by 487
Abstract
Traditional segmentation methods are slow and rely on manual annotations, which are labor-intensive. To address these limitations, we propose YOLO-SAM AgriScan, a unified framework that combines the fast object detection capabilities of YOLOv11 with the zero-shot segmentation power of the Segment Anything Model 2 (SAM2). Our approach adopts a hybrid paradigm for on-plant ripe strawberry segmentation, wherein YOLOv11 is fine-tuned using a few-shot learning strategy with minimal annotated samples, and SAM2 performs mask generation without additional supervision. This architecture eliminates the bottleneck of pixel-wise manual annotation and enables the scalable and efficient segmentation of strawberries in both controlled and natural farm environments. Experimental evaluations on two datasets, a custom-collected dataset and a publicly available benchmark, demonstrate strong detection and segmentation performance in both full-data and data-constrained scenarios. The proposed framework achieved a mean Dice score of 0.95 and an IoU of 0.93 on our collected dataset and maintained competitive performance on public data (Dice: 0.95, IoU: 0.92), demonstrating its robustness, generalizability, and practical relevance in real-world agricultural settings. Our results highlight the potential of combining few-shot detection and zero-shot segmentation to accelerate the development of annotation-light, intelligent phenotyping systems. Full article
18 pages, 5536 KB  
Article
Automated Particle Size Analysis of Supported Nanoparticle TEM Images Using a Pre-Trained SAM Model
by Xiukun Zhong, Guohong Liang, Lingbei Meng, Wei Xi, Lin Gu, Nana Tian, Yong Zhai, Yutong He, Yuqiong Huang, Fengmin Jin and Hong Gao
Nanomaterials 2025, 15(24), 1886; https://doi.org/10.3390/nano15241886 - 16 Dec 2025
Cited by 2 | Viewed by 739
Abstract
This study addresses the challenges associated with transmission electron microscopy (TEM) image analysis of supported nanoparticles, including low signal-to-noise ratio, poor contrast, and interference from complex substrate backgrounds. We propose an automated segmentation and particle size analysis method based on a large-scale deep learning model, namely the Segment Anything Model (SAM). Using Ru/TiO2 and related materials as representative systems, the pretrained SAM is employed for zero-shot segmentation of nanoparticles and is further integrated with a custom image processing pipeline, including an optical character recognition (OCR) module, morphological optimization, and connected component analysis, to achieve high-precision particle size quantification. Experimental results demonstrate that the method retains robust performance under challenging imaging conditions, with a size estimation error between 3% and 5% and a per-image processing time under 1 min, significantly outperforming traditional manual annotation and threshold-based segmentation approaches. This framework provides an efficient and reliable analytical tool for morphological characterization and structure–performance correlation studies in supported nanocatalysts. Full article
(This article belongs to the Section Theory and Simulation of Nanostructures)
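Zero-shot mask generation plus area-to-diameter conversion is the core of the sizing step described above. A minimal sketch with `SamAutomaticMaskGenerator` follows; the checkpoint, image path, and nm-per-pixel scale (which the paper derives from the OCR-read scale bar) are placeholders.

```python
# Minimal sketch of zero-shot particle sizing: each automatically generated mask's
# pixel area is converted to an equivalent circular diameter.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")   # placeholder checkpoint
generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("tem.png"), cv2.COLOR_BGR2RGB)
masks = generator.generate(image)            # list of dicts with 'segmentation', 'area', ...

nm_per_px = 0.42                             # placeholder scale from the OCR'd scale bar
diameters_nm = [2.0 * np.sqrt(m["area"] / np.pi) * nm_per_px for m in masks]
print(f"{len(diameters_nm)} particles, mean size {np.mean(diameters_nm):.1f} nm")
```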
20 pages, 14411 KB  
Article
An Integrated Framework with SAM and OCR for Pavement Crack Quantification and Geospatial Mapping
by Nut Sovanneth, Asnake Adraro Angelo, Felix Obonguta and Kiyoyuki Kaito
Infrastructures 2025, 10(12), 348; https://doi.org/10.3390/infrastructures10120348 - 15 Dec 2025
Viewed by 638
Abstract
Pavement condition assessment using computer vision has emerged as an efficient alternative to traditional manual surveys, which are often labor-intensive and time-consuming. Leveraging deep learning, pavement distress such as cracks can be automatically detected, segmented, and quantified from high-resolution images captured by survey vehicles. Although numerous segmentation models have been proposed to generate crack masks, they typically require extensive pixel-level annotations, leading to high labeling costs. To overcome this limitation, this study integrates the Segment Anything Model (SAM), which produces accurate segmentation masks from simple bounding box prompts while leveraging its zero-shot capability to generalize to unseen images with minimal retraining. However, since SAM alone is not an end-to-end solution, we incorporate YOLOv8 for automated crack detection, eliminating the need for manual box annotation. Furthermore, the framework applies local refinement techniques to enhance mask precision and employs Optical Character Recognition (OCR) to automatically extract embedded GPS coordinates for geospatial mapping. The proposed framework is empirically validated using open-source pavement images from Yamanashi, demonstrating effective automated detection, classification, quantification, and geospatial mapping of pavement cracks. The results support automated pavement distress mapping onto real-world road networks, facilitating efficient maintenance planning for road agencies. Full article
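The OCR step that recovers embedded GPS coordinates can be sketched as follows with `pytesseract`; the banner crop region and coordinate format are assumptions about how the metadata is overlaid on the survey frames, not details from the paper.

```python
# Minimal sketch of the OCR step: read the text strip burned into a survey image
# and pull out latitude/longitude with a regular expression.
import re
import cv2
import pytesseract

image = cv2.imread("pavement_frame.jpg")
strip = image[-60:, :]                        # assume the GPS text sits in a bottom banner
text = pytesseract.image_to_string(strip)

match = re.search(r"(-?\d{1,3}\.\d+)[,\s]+(-?\d{1,3}\.\d+)", text)
if match:
    lat, lon = float(match.group(1)), float(match.group(2))
    print(f"Crack image geotagged at ({lat}, {lon})")
```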
19 pages, 6617 KB  
Article
Domain-Adaptive Segment Anything Model for Cross-Domain Water Body Segmentation in Satellite Imagery
by Lihong Yang, Pengfei Liu, Guilong Zhang, Huaici Zhao and Chunyang Zhao
J. Imaging 2025, 11(12), 437; https://doi.org/10.3390/jimaging11120437 - 9 Dec 2025
Viewed by 389
Abstract
Monitoring surface water bodies is crucial for environmental protection and resource management. Existing segmentation methods often struggle with limited generalization across different satellite domains. We propose DASAM, a domain-adaptive Segment Anything Model for cross-domain water body segmentation in satellite imagery. The core innovation of DASAM is a contrastive learning module that aligns features between source and style-augmented images, enabling robust domain generalization without requiring annotations from the target domain. Additionally, DASAM integrates a prompt-enhanced module and an encoder adapter to capture fine-grained spatial details and global context, further improving segmentation accuracy. Experiments on the China GF-2 dataset demonstrate superior performance over existing methods, while cross-domain evaluations on GLH-water and Sentinel-2 water body image datasets verify its strong generalization and robustness. These results highlight DASAM’s potential for large-scale, diverse satellite water body monitoring and accurate environmental analysis. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
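The contrastive alignment between source and style-augmented images described above can be written, in its generic InfoNCE form, as the short sketch below; this is a standard formulation of the idea, not DASAM's exact module.

```python
# Minimal sketch of contrastive feature alignment: pull an image and its
# style-augmented view together, push apart other images in the batch.
import torch
import torch.nn.functional as F

def alignment_loss(feat_src: torch.Tensor, feat_aug: torch.Tensor, tau: float = 0.07):
    """feat_src, feat_aug: (B, D) pooled features of original / style-augmented views."""
    z1 = F.normalize(feat_src, dim=1)
    z2 = F.normalize(feat_aug, dim=1)
    logits = z1 @ z2.t() / tau                 # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)    # matching pairs on the diagonal are positives

loss = alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```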
28 pages, 29492 KB  
Article
RSAM: Vision-Language Two-Way Guidance for Referring Remote Sensing Image Segmentation
by Zilong Zhao, Xin Xu, Bingxin Huang, Hongjia Chen and Fangling Pu
Remote Sens. 2025, 17(24), 3960; https://doi.org/10.3390/rs17243960 - 8 Dec 2025
Cited by 1 | Viewed by 644
Abstract
Referring remote sensing image segmentation (RRSIS) aims to accurately segment target objects in remote sensing images based on natural language instructions. Despite its growing relevance, progress in this field is constrained by limited datasets and weak cross-modal alignment. To support RRSIS research, we construct referring image segmentation in optical remote sensing (RISORS), a large-scale benchmark containing 36,697 instruction–mask pairs. RISORS provides diverse and high-quality samples that enable comprehensive experiments in remote sensing contexts. Building on this foundation, we propose Referring-SAM (RSAM), a novel framework that extends Segment Anything Model 2 to support text-prompted segmentation. RSAM integrates a Two-Way Guidance Module (TWGM) and a Multimodal Mask Decoder (MMMD). TWGM facilitates a two-way guidance mechanism that mutually refines image and text features, with positional encodings incorporated across all attention layers to significantly enhance relational reasoning. MMMD effectively separates textual prompts from spatial prompts, improving segmentation accuracy in complex multimodal settings. Extensive experiments on RISORS, as well as the RefSegRS and RRSIS-D datasets, demonstrate that RSAM achieves state-of-the-art performance, particularly in segmenting small and diverse targets. Ablation studies further validate the individual contributions of TWGM and MMMD. This work provides a solid foundation for further developments in integrated vision-language analysis within remote sensing applications. Full article
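The two-way guidance idea, text tokens attending to image tokens and vice versa, can be illustrated with a single pair of cross-attention layers as below; the dimensions and the one-layer design are assumptions for illustration, not RSAM's TWGM.

```python
# Minimal sketch of two-way vision-language guidance with cross-attention.
import torch
import torch.nn as nn

class TwoWayGuidance(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.txt_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_to_txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, txt_tokens):
        txt_refined, _ = self.img_to_txt(txt_tokens, img_tokens, img_tokens)    # text queries image
        img_refined, _ = self.txt_to_img(img_tokens, txt_refined, txt_refined)  # image queries refined text
        return img_refined, txt_refined

img = torch.randn(1, 4096, 256)   # flattened image embedding (e.g., 64x64 patches)
txt = torch.randn(1, 20, 256)     # encoded referring expression
img_out, txt_out = TwoWayGuidance()(img, txt)
```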