
Search Results (63)

Search Parameters:
Keywords = Grounding SAM

29 pages, 7711 KB  
Article
Fundamentals of Controlled Demolition in Structures: Real-Life Applications, Discrete Element Methods, Monitoring, and Artificial Intelligence-Based Research Directions
by Julide Yuzbasi
Buildings 2025, 15(19), 3501; https://doi.org/10.3390/buildings15193501 - 28 Sep 2025
Abstract
Controlled demolition is a critical engineering practice that enables the safe and efficient dismantling of structures while minimizing risks to the surrounding environment. This study presents, for the first time, a detailed, structured framework for understanding the fundamental principles of controlled demolition by outlining key procedures, methodologies, and directions for future research. Through original, carefully designed charts and full-scale numerical simulations, including two 23-story building scenarios with different delay and blasting sequences, this paper provides real-life insights into the effects of floor-to-floor versus axis-by-axis delays on structural collapse behavior, debris spread, and toppling control. Beyond traditional techniques, this study explores how emerging technologies, such as real-time structural monitoring via object tracking, LiDAR scanning, and Unmanned Aerial Vehicle (UAV)-based inspections, can be further advanced through the integration of artificial intelligence (AI). The potential deep learning (DL)- and machine learning (ML)-based applications of tools like Convolutional Neural Network (CNN)-based digital twins, YOLO object detection, and XGBoost classifiers are highlighted as promising avenues for future research. These technologies could support real-time decision-making, automation, and risk assessment in demolition scenarios. Furthermore, vision-language models such as SAM and Grounding DINO are discussed as enabling technologies for real-time risk assessment, anomaly detection, and adaptive control. By sharing insights from full-scale observations and proposing a forward-looking analytical framework, this work lays a foundation for intelligent and resilient demolition practices. Full article
(This article belongs to the Section Building Structures)

25 pages, 7964 KB  
Article
DSCSRN: Physically Guided Symmetry-Aware Spatial-Spectral Collaborative Network for Single-Image Hyperspectral Super-Resolution
by Xueli Chang, Jintong Liu, Guotao Wen, Xiaoyu Huang and Meng Yan
Symmetry 2025, 17(9), 1520; https://doi.org/10.3390/sym17091520 - 12 Sep 2025
Viewed by 328
Abstract
Hyperspectral images (HSIs), with their rich spectral information, are widely used in remote sensing; yet the inherent trade-off between spectral and spatial resolution in imaging systems often limits spatial details. Single-image hyperspectral super-resolution (HSI-SR) seeks to recover high-resolution HSIs from a single low-resolution input, but the high dimensionality and spectral redundancy of HSIs make this task challenging. In HSIs, spectral signatures and spatial textures often exhibit intrinsic symmetries, and preserving these symmetries provides additional physical constraints that enhance reconstruction fidelity and robustness. To address these challenges, we propose the Dynamic Spectral Collaborative Super-Resolution Network (DSCSRN), an end-to-end framework that integrates physical modeling with deep learning and explicitly embeds spatial–spectral symmetry priors into the network architecture. DSCSRN processes low-resolution HSIs with a Cascaded Residual Spectral Decomposition Network (CRSDN) to compress redundant channels while preserving spatial structures, generating accurate abundance maps. These maps are refined by two Synergistic Progressive Feature Refinement Modules (SPFRMs), which progressively enhance spatial textures and spectral details via a multi-scale dual-domain collaborative attention mechanism. The Dynamic Endmember Adjustment Module (DEAM) then adaptively updates spectral endmembers according to scene context, overcoming the limitations of fixed-endmember assumptions. Grounded in the Linear Mixture Model (LMM), this unmixing–recovery–reconstruction pipeline restores subtle spectral variations alongside improved spatial resolution. Experiments on the Chikusei, Pavia Center, and CAVE datasets show that DSCSRN outperforms state-of-the-art methods in both perceptual quality and quantitative performance, achieving an average PSNR of 43.42 and a SAM of 1.75 (×4 scale) on Chikusei. 
The integration of symmetry principles offers a unifying perspective aligned with the intrinsic structure of HSIs, producing reconstructions that are both accurate and structurally consistent. Full article
(This article belongs to the Section Computer)
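The SAM value quoted above (1.75 at ×4 scale) is the mean spectral angle, in degrees, between reconstructed and reference spectra. As a hedged illustration of how that metric is computed (the function name and cube shapes are ours, not the paper's):

```python
import numpy as np

def spectral_angle(pred, ref):
    """Mean Spectral Angle Mapper (SAM) in degrees between two
    hyperspectral cubes of shape (H, W, bands)."""
    p = pred.reshape(-1, pred.shape[-1]).astype(float)
    r = ref.reshape(-1, ref.shape[-1]).astype(float)
    cos = np.sum(p * r, axis=1) / (
        np.linalg.norm(p, axis=1) * np.linalg.norm(r, axis=1) + 1e-12)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())

# Two orthogonal spectra are 90 degrees apart; identical spectra give ~0.
a = np.zeros((1, 1, 2)); a[0, 0, 0] = 1.0
b = np.zeros((1, 1, 2)); b[0, 0, 1] = 1.0
print(spectral_angle(a, b))  # 90.0
```

Lower is better: a mean angle of 1.75° means the reconstructed spectra point almost exactly along the reference spectra.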

15 pages, 2654 KB  
Article
The Evaluation of a Deep Learning Approach to Automatic Segmentation of Teeth and Shade Guides for Tooth Shade Matching Using the SAM2 Algorithm
by KyeongHwan Han, JaeHyung Lim, Jin-Soo Ahn and Ki-Sun Lee
Bioengineering 2025, 12(9), 959; https://doi.org/10.3390/bioengineering12090959 - 6 Sep 2025
Cited by 1 | Viewed by 612
Abstract
Accurate shade matching is essential in restorative and prosthetic dentistry yet remains difficult due to subjectivity in visual assessments. We develop and evaluate a deep learning approach for the simultaneous segmentation of natural teeth and shade guides in intraoral photographs using four fine-tuned variants of Segment Anything Model 2 (SAM2: tiny, small, base plus, and large) and a UNet baseline trained under the same protocol. The spatial performance was assessed using the Dice Similarity Coefficient (DSC), the Intersection over the Union (IoU), and the 95th-percentile Hausdorff distance normalized by the ground-truth equivalent diameter (HD95). The color consistency within masks was quantified by the coefficient of variation (CV) of the CIELAB components (L*, a*, b*). The perceptual color difference was measured using CIEDE2000 (ΔE00). On a held-out test set, all SAM2 variants achieved a high overlap accuracy; SAM2-large performed best (DSC: 0.987 ± 0.006; IoU: 0.975 ± 0.012; HD95: 1.25 ± 1.80%), followed by SAM2-small (0.987 ± 0.008; 0.974 ± 0.014; 2.96 ± 11.03%), SAM2-base plus (0.985 ± 0.011; 0.971 ± 0.021; 1.71 ± 3.28%), and SAM2-tiny (0.979 ± 0.015; 0.959 ± 0.028; 6.16 ± 11.17%). UNet reached a DSC = 0.972 ± 0.020, an IoU = 0.947 ± 0.035, and an HD95 = 6.54 ± 16.35%. The CV distributions for all of the prediction models closely matched the ground truth (e.g., GT L*: 0.164 ± 0.040; UNet: 0.144 ± 0.028; SAM2-small: 0.164 ± 0.038; SAM2-base plus: 0.162 ± 0.039). The full-mask ΔE00 was low across models, with the summary statistics reported as the median (mean ± SD): UNet: 0.325 (0.487 ± 0.364); SAM2-tiny: 0.162 (0.410 ± 0.665); SAM2-small: 0.078 (0.126 ± 0.166); SAM2-base plus: 0.072 (0.198 ± 0.417); SAM2-large: 0.065 (0.167 ± 0.257). These ΔE00 values lie well below the ≈1 just noticeable difference threshold on average, indicating close chromatic agreement between the predictions and annotations. 
Within a single dataset and training protocol, fine-tuned SAM2, especially its larger variants, provides robust spatial accuracy, boundary reliability, and color fidelity suitable for clinical shade-matching workflows, while UNet offers a competitive convolutional baseline. These results indicate technical feasibility rather than clinical validation; broader baselines and external, multi-center evaluations are needed to determine its suitability for routine shade-matching workflows. Full article
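For reference, the DSC and IoU figures reported above reduce to simple overlap arithmetic on binary masks; a minimal sketch (the toy masks and function name are illustrative, not from the paper):

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice Similarity Coefficient and Intersection over Union for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + 1e-12)
    iou = inter / (union + 1e-12)
    return float(dice), float(iou)

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 1, 0], [0, 0, 0]])
d, i = dice_iou(pred, gt)
print(round(d, 3), round(i, 3))  # 0.8 0.667
```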

30 pages, 8388 KB  
Article
ASTER and Hyperion Satellite Remote Sensing Data for Lithological Mapping and Mineral Exploration in Ophiolitic Zones: A Case Study from Lasbela, Baluchistan, Pakistan
by Saima Khurram, Zahid Khalil Rao, Amin Beiranvand Pour, Khurram Riaz, Arshia Fatima and Amna Ahmed
Mining 2025, 5(3), 53; https://doi.org/10.3390/mining5030053 - 2 Sep 2025
Viewed by 636
Abstract
This study evaluates the capabilities of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) and Hyperion remote sensing sensors for mapping ophiolitic sequences and identifying manganese mineralization in the Bela Ophiolite region, located along the axial fold–thrust belt northwest of Karachi, Pakistan. The study area comprises tholeiitic basalts, gabbros, mafic and ultramafic rocks, and sedimentary formations where manganese occurrences are associated with jasperitic chert and shale. To delineate lithological units and Mn mineralization, advanced image processing techniques were applied, including band ratio (BR), Principal Component Analysis (PCA), and Spectral Angle Mapper (SAM) on visible and near-infrared (VNIR) and shortwave infrared (SWIR) bands of ASTER. Using these methods, gabbros, basalts, and mafic-ultramafic rocks were effectively mapped, and previously unrecognized basaltic outcrops and gabbroic outcrops were also discovered. The ENVI Spectral Hourglass Wizard was used to analyze the hyperspectral data, integrating the Minimum Noise Fraction (MNF), Pixel Purity Index (PPI), and N-Dimensional Visualizer to extract the spectra of end-members associated with Mn-bearing host rocks. In addition, the Hyperspectral Material Identification (HMI) tool was tested to recognize Mn minerals. The remote sensing results were validated by petrographic analysis and ground-truth data, confirming the effectiveness of these techniques in ophiolite mapping and mineral exploration. This study shows that ASTER band combinations (3-6-7, 3-7-9) and band ratios (1/4, 4/9, 9/1 and 3/4, 4/9, 9/1) provide optimal results for lithological discrimination. The results show that remote sensing-based image processing is a powerful tool for mapping ophiolites on a regional scale and can help geologists identify potential mineralization zones in ophiolitic sequences. Full article
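The band ratios listed above (e.g., 1/4, 4/9, 9/1) are per-pixel divisions of one ASTER band by another, commonly stacked into a false-color composite for lithological discrimination. A minimal sketch under our own assumptions (1-indexed band numbering, per-channel min-max scaling for display; not the paper's exact processing chain):

```python
import numpy as np

def band_ratio(cube, num, den, eps=1e-6):
    """Per-pixel ratio of two bands of a (H, W, bands) cube, bands 1-indexed."""
    return cube[..., num - 1] / (cube[..., den - 1] + eps)

def ratio_composite(cube, ratios=((1, 4), (4, 9), (9, 1))):
    """Stack band ratios into an RGB-like composite, min-max scaled per channel."""
    chans = []
    for num, den in ratios:
        r = band_ratio(cube, num, den)
        chans.append((r - r.min()) / (r.max() - r.min() + 1e-12))
    return np.stack(chans, axis=-1)

cube = np.random.rand(8, 8, 9) + 0.1   # stand-in for 9 ASTER VNIR/SWIR bands
rgb = ratio_composite(cube)
print(rgb.shape)  # (8, 8, 3)
```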

29 pages, 6246 KB  
Article
DASeg: A Domain-Adaptive Segmentation Pipeline Using Vision Foundation Models—Earthquake Damage Detection Use Case
by Huili Huang, Andrew Zhang, Danrong Zhang, Max Mahdi Roozbahani and James David Frost
Remote Sens. 2025, 17(16), 2812; https://doi.org/10.3390/rs17162812 - 14 Aug 2025
Viewed by 617
Abstract
Limited labeled imagery and tight response windows hinder accurate damage quantification for post-disaster assessment. The objective of this study is to develop and evaluate a deep learning-based Domain-Adaptive Segmentation (DASeg) workflow to detect post-disaster damage using the limited information available shortly after an event. DASeg unifies three Vision Foundation Models in an automatic workflow: fine-tuned DINOv2 supplies attention-based point prompts, fine-tuned Grounding DINO yields open-set box prompts, and a frozen Segment Anything Model (SAM) generates the final masks. In the earthquake-focused case study DASeg-Quake, the pipeline boosts mean Intersection over Union (mIoU) by 9.52% over prior work and 2.10% over state-of-the-art supervised baselines. In a zero-shot setting, DASeg-Quake achieves an mIoU of 75.03% for geo-damage analysis, closely matching expert-level annotations. These results show that DASeg delivers superior infrastructure damage segmentation without needing pixel-level annotation, providing a practical solution for early-stage disaster response. Full article
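The mIoU numbers above average the per-class Intersection over Union between predicted and ground-truth label maps; a minimal sketch (the toy label maps are illustrative):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU across classes for integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:                 # class absent in both maps: skip it
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 0, 1], [1, 1, 2]])
gt   = np.array([[0, 0, 1], [1, 2, 2]])
print(round(mean_iou(pred, gt, 3), 3))  # 0.722
```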

22 pages, 9981 KB  
Article
Design and Experiment of Autonomous Shield-Cutting End-Effector for Dual-Zone Maize Field Weeding
by Yunxiang Li, Yinsong Qu, Yuan Fang, Jie Yang and Yanfeng Lu
Agriculture 2025, 15(14), 1549; https://doi.org/10.3390/agriculture15141549 - 18 Jul 2025
Viewed by 431
Abstract
This study presented an autonomous shield-cutting end-effector for maize surrounding weeding (SEMSW), addressing the challenges of the low weed removal rate (WRR) and high seedling damage rate (SDR) in northern China’s 3–5 leaf stage maize. The SEMSW integrated seedling positioning, robotic arm control, and precision weeding functionalities: a seedling positioning sensor identified maize seedlings and weeds, guiding XYZ translational motions to align the robotic arm. The seedling-shielding anti-cutting mechanism (SAM) enclosed crop stems, while the contour-adaptive weeding mechanism (CWM) activated two-stage retractable blades (TRWBs) for inter/intra-row weeding operations. The following key design parameters were determined: 150 mm inner diameter for the seedling-shielding disc; 30 mm minimum inscribed-circle for retractable clamping units (RCUs); 40 mm ground clearance for SAM; 170 mm shielding height; and 100 mm minimum inscribed-circle diameter for the TRWB. Mathematical optimization defined the shape-following weeding cam (SWC) contour and TRWB dimensional chain. Kinematic/dynamic models were introduced alongside an adaptive sliding mode controller, ensuring lateral translation error convergence. A YOLOv8 model achieved 0.951 precision, 0.95 mAP50, and 0.819 mAP50-95, striking a balance between detection accuracy and localization precision. Field trials of the prototype showed 88.3% WRR and 2.2% SDR, meeting northern China’s agronomic standards. Full article

20 pages, 1669 KB  
Article
Automated Pneumothorax Segmentation with a Spatial Prior Contrast Adapter
by Yiming Jia and Essam A. Rashed
Appl. Sci. 2025, 15(12), 6598; https://doi.org/10.3390/app15126598 - 12 Jun 2025
Viewed by 851
Abstract
Pneumothorax is a critical condition that requires rapid and accurate diagnosis from standard chest radiographs. Identifying and segmenting the location of the pneumothorax are essential for developing an effective treatment plan. nnUNet is a self-configuring, deep learning-based framework for medical image segmentation. Although nnUNet adjusts its parameters automatically through data-driven optimization strategies and offers robust feature extraction and segmentation capabilities across diverse datasets, our initial experiments revealed that nnUNet alone struggled to achieve consistently accurate segmentation for pneumothorax, particularly in challenging scenarios where subtle intensity variations and anatomical noise obscure the target regions. This study aims to enhance the accuracy and robustness of pneumothorax segmentation in low-contrast chest radiographs by integrating spatial prior information and an attention mechanism into the nnUNet framework. We introduce the spatial prior contrast adapter (SPCA)-enhanced nnUNet by implementing two modules. First, we integrate an SPCA utilizing the MedSAM foundation model to incorporate spatial prior information of the lung region, effectively guiding the segmentation network to focus on anatomically relevant areas. Meanwhile, a probabilistic atlas, which shows the probability of an area being prone to pneumothorax, is generated from the ground truth masks. Both the lung segmentation results and the probabilistic atlas are used as attention maps in nnUNet. Second, we combine the two attention maps as additional input into nnUNet and integrate an attention mechanism into standard nnUNet using a convolutional block attention module (CBAM). We validate our method on CANDID-PTX, a benchmark dataset of 19,237 chest radiographs.
By introducing spatial awareness and intensity adjustments, the model reduces false positives and improves the precision of boundary delineations, ultimately overcoming many of the limitations associated with low-contrast radiographs. The SPCA-enhanced nnUNet achieves an average Dice coefficient of 0.81, a 15% improvement over standard nnUNet. This study provides a novel approach to enhancing the segmentation of low-contrast pneumothorax in chest X-ray radiographs. Full article
(This article belongs to the Special Issue Applications of Computer Vision and Image Processing in Medicine)
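The probabilistic atlas described above is, at its core, a per-pixel frequency map over the stack of ground-truth masks; a minimal sketch of that step (array shapes and names are ours; the paper's pipeline feeds such a map into nnUNet as an attention input):

```python
import numpy as np

def probabilistic_atlas(masks):
    """Per-pixel pneumothorax frequency over binary masks of shape (N, H, W);
    the resulting [0, 1] map can serve as a spatial attention prior."""
    return np.asarray(masks, dtype=float).mean(axis=0)

masks = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
    [[1, 0], [0, 1]],
]
atlas = probabilistic_atlas(masks)
print(atlas[0][0], atlas[1][0])  # 1.0 0.0
```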

17 pages, 3455 KB  
Article
Segment Anything Model (SAM) and Medical SAM (MedSAM) for Lumbar Spine MRI
by Christian Chang, Hudson Law, Connor Poon, Sydney Yen, Kaustubh Lall, Armin Jamshidi, Vadim Malis, Dosik Hwang and Won C. Bae
Sensors 2025, 25(12), 3596; https://doi.org/10.3390/s25123596 - 7 Jun 2025
Cited by 1 | Viewed by 2279
Abstract
Lumbar spine Magnetic Resonance Imaging (MRI) is commonly used for intervertebral disc (IVD) and vertebral body (VB) evaluation during low back pain. Segmentation of these tissues can provide useful quantitative information such as shape and volume. The objective of the study was to determine the performances of Segment Anything Model (SAM) and medical SAM (MedSAM), two “zero-shot” deep learning models, in segmenting lumbar IVD and VB from MRI images and compare against the nnU-Net model. This cadaveric study used 82 donor spines. Manual segmentation was performed to serve as ground truth. Two readers processed the spine MRI using SAM and MedSAM by placing points or drawing bounding boxes around regions of interest (ROI). The outputs were compared against ground truths to determine Dice score, sensitivity, and specificity. Qualitatively, results varied but overall, MedSAM produced more consistent results than SAM, but neither matched the performance of nnU-Net. Mean Dice scores for MedSAM were 0.79 for IVDs and 0.88 for VBs, and significantly higher (each p < 0.001) than those for SAM (0.64 for IVDs, 0.83 for VBs). Both were lower compared to nnU-Net (0.99 for IVD and VB). Sensitivity values also favored MedSAM. These results demonstrated the feasibility of “zero-shot” DL models to segment lumbar spine MRI. While performance falls short of recent models, these zero-shot models offer key advantages in not needing training data and faster adaptation to other anatomies and tasks. Validation of a generalizable segmentation model for lumbar spine MRI can lead to more precise diagnostics, follow-up, and enhanced back pain research, with potential cost savings from automated analyses while supporting the broader use of AI and machine learning in healthcare. Full article
(This article belongs to the Section Sensing and Imaging)

19 pages, 8306 KB  
Article
Plant Sam Gaussian Reconstruction (PSGR): A High-Precision and Accelerated Strategy for Plant 3D Reconstruction
by Jinlong Chen, Yingjie Jiao, Fuqiang Jin, Xingguo Qin, Yi Ning, Minghao Yang and Yongsong Zhan
Electronics 2025, 14(11), 2291; https://doi.org/10.3390/electronics14112291 - 4 Jun 2025
Viewed by 935
Abstract
Plant 3D reconstruction plays a critical role in precision agriculture and plant growth monitoring, yet it faces challenges such as complex background interference, difficulties in capturing intricate plant structures, and a slow reconstruction speed. In this study, we propose PlantSamGaussianReconstruction (PSGR), a novel method that integrates Grounding SAM with 3D Gaussian Splatting (3DGS) techniques. PSGR employs Grounding DINO and SAM for accurate plant–background segmentation, utilizes algorithms such as Scale-Invariant Feature Transform (SIFT) for camera pose estimation and sparse point cloud generation, and leverages 3DGS for plant reconstruction. Furthermore, a 3D–2D projection-guided optimization strategy is introduced to enhance segmentation precision. The experimental results of various multi-view plant image datasets demonstrate that PSGR effectively removes background noise under diverse environments, accurately captures plant details, and achieves peak signal-to-noise ratio (PSNR) values exceeding 30 in most scenarios, outperforming the original 3DGS approach. Moreover, PSGR reduces training time by up to 26.9%, significantly improving reconstruction efficiency. These results suggest that PSGR is an efficient, scalable, and high-precision solution for plant modeling. Full article
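The PSNR threshold quoted above (values exceeding 30) is a log-scale measure of reconstruction error; a minimal sketch of the metric, assuming images scaled to [0, 1]:

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((img.astype(float) - ref.astype(float)) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

ref = np.random.rand(16, 16, 3)
noisy = np.clip(ref + np.random.normal(0.0, 0.01, ref.shape), 0.0, 1.0)
print(psnr(noisy, ref) > 30.0)  # True
```

On this scale, a PSNR above 30 dB corresponds to a mean squared error below 0.001, i.e., a visually small reconstruction error.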

17 pages, 2122 KB  
Article
Improving Dynamic Gesture Recognition with Attention-Enhanced LSTM and Grounding SAM
by Jinlong Chen, Fuqiang Jin, Yingjie Jiao, Yongsong Zhan and Xingguo Qin
Electronics 2025, 14(9), 1793; https://doi.org/10.3390/electronics14091793 - 28 Apr 2025
Viewed by 964
Abstract
Dynamic gesture detection is a key topic in computer vision and deep learning, with applications in human–computer interaction and virtual reality. However, traditional methods struggle with long sequences, complex scenes, and multimodal data, facing issues such as high computational cost and background noise. This study proposes an Attention-Enhanced dual-layer LSTM (Long Short-Term Memory) network combined with Grounding SAM (Grounding Segment Anything Model) for gesture detection. The dual-layer LSTM captures long-term temporal dependencies, while a multi-head attention mechanism improves the extraction of global spatiotemporal features. Grounding SAM, composed of Grounding DINO for object localization and SAM (Segment Anything Model) for image segmentation, is employed during preprocessing to precisely extract gesture regions and remove background noise. This enhances feature quality and reduces interference during training. Experiments show that the proposed method achieves 96.3% accuracy on a self-constructed dataset and 96.1% on the SHREC 2017 dataset, outperforming several baseline methods by an average of 4.6 percentage points. It also demonstrates strong robustness under complex and dynamic conditions. This approach provides a reliable and efficient solution for future dynamic gesture-recognition systems. Full article

26 pages, 20953 KB  
Article
Optimization-Based Downscaling of Satellite-Derived Isotropic Broadband Albedo to High Resolution
by Niko Lukač, Domen Mongus and Marko Bizjak
Remote Sens. 2025, 17(8), 1366; https://doi.org/10.3390/rs17081366 - 11 Apr 2025
Viewed by 486
Abstract
In this paper, a novel method for estimating high-resolution isotropic broadband albedo is proposed, by downscaling satellite-derived albedo using an optimization approach. At first, broadband albedo is calculated from the lower-resolution multispectral satellite image using standard narrow-to-broadband (NTB) conversion, where the surfaces are considered Lambertian with isotropic reflectance. The high-resolution true orthophoto for the same location is segmented with the deep learning-based Segment Anything Model (SAM), and the resulting segments are refined with a classified digital surface model (cDSM) to exclude small transient objects. Afterwards, the remaining segments are grouped using K-means clustering, considering orthophoto-visible (VIS) and near-infrared (NIR) bands. These segments represent surfaces with similar materials and underlying reflectance properties. Next, the Differential Evolution (DE) optimization algorithm is applied to approximate albedo values for these segments so that their spatial aggregate matches the coarse-resolution satellite albedo, using two novel objective functions. Extensive experiments with different DE parameters were carried out over a 0.75 km² urban area in Maribor, Slovenia, where Sentinel-2 Level-2A NTB-derived albedo was downscaled to 1 m spatial resolution. In the spatiospectral analysis performed, the proposed method achieved absolute differences of 0.09 per VIS band and below 0.18 per NIR band, in comparison to lower-resolution NTB-derived albedo. Moreover, the proposed method achieved a root mean square error (RMSE) of 0.0179 and a mean absolute percentage error (MAPE) of 4.0299% against ground truth broadband albedo annotations of characteristic materials in the given urban area.
The proposed method outperformed the Enhanced Super-Resolution Generative Adversarial Networks (ESRGANs), which achieved an RMSE of 0.0285 and an MAPE of 9.2778%, and the Blind Super-Resolution Generative Adversarial Network (BSRGAN), which achieved an RMSE of 0.0341 and an MAPE of 12.3104%. Full article
(This article belongs to the Section AI Remote Sensing)
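The RMSE and MAPE figures above follow their standard definitions; a minimal sketch against hypothetical albedo annotations (the numbers are ours, not the paper's):

```python
import numpy as np

def rmse(pred, truth):
    """Root mean square error."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def mape(pred, truth, eps=1e-12):
    """Mean absolute percentage error; assumes nonzero truth values."""
    return float(np.mean(np.abs((pred - truth) / (truth + eps))) * 100.0)

truth = np.array([0.20, 0.25, 0.30])   # hypothetical ground-truth albedos
pred  = np.array([0.21, 0.24, 0.33])   # hypothetical downscaled estimates
print(round(rmse(pred, truth), 4), round(mape(pred, truth), 2))  # 0.0191 6.33
```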

19 pages, 10070 KB  
Article
SAR Image Target Segmentation Guided by the Scattering Mechanism-Based Visual Foundation Model
by Chaochen Zhang, Jie Chen, Zhongling Huang, Hongcheng Zeng, Zhixiang Huang, Yingsong Li, Hui Xu, Xiangkai Pu and Long Sun
Remote Sens. 2025, 17(7), 1209; https://doi.org/10.3390/rs17071209 - 28 Mar 2025
Cited by 1 | Viewed by 965
Abstract
As a typical visual foundation model, SAM has been extensively utilized for optical image segmentation tasks. However, synthetic aperture radar (SAR) employs a unique imaging mechanism, and its images are very different from optical images. Directly transferring a pretrained SAM from optical scenes to SAR image instance segmentation tasks can lead to a substantial decline in performance. Therefore, this paper fully integrates the SAR scattering mechanism, and proposes a SAR image target segmentation method guided by the SAR scattering mechanism-based visual foundation model. First, considering the discrete distribution features of strong scattering points in SAR imagery, we develop an edge enhancement morphological adaptor. This adaptor is designed to incorporate a limited set of trainable parameters aimed at effectively boosting the target’s edge morphology, allowing quick fine-tuning within the SAR realm. Second, an adaptive denoising module based on wavelets and soft-thresholding techniques is implemented to reduce the impact of SAR coherent speckle noise, thus improving the feature representation performance. Furthermore, an efficient automatic prompt module based on a deep object detector is built to enhance the ability of rapid target localization in wide-area scenes and improve image segmentation performance. Our approach has been shown to outperform current segmentation methods through experiments conducted on two open-source datasets, SSDD and HRSID. When the ground-truth is used as a prompt, SARSAM improves mIOU by more than 10%, and APmask50 by more than 5% from the baseline. In addition, the computational cost is greatly reduced because the number of parameters and FLOPs of the structures that require fine-tuning are only 13.5% and 10.1% of the baseline, respectively. Full article
(This article belongs to the Special Issue Physics Informed Foundational Models for SAR Image Interpretation)
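The wavelet-and-soft-thresholding denoising module described above relies on the standard soft-thresholding operator, which zeroes small (presumably noise-dominated) coefficients and shrinks the rest toward zero; a minimal sketch (the threshold value and toy coefficients are illustrative):

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding: zero out |x| <= t, shrink the rest toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

coeffs = np.array([-2.0, -0.3, 0.1, 0.5, 3.0])   # toy wavelet coefficients
den = soft_threshold(coeffs, 0.4)                # -> [-1.6, 0, 0, 0.1, 2.6]
print(den)
```

In the paper's setting this operator would be applied to wavelet subband coefficients of the SAR image before the inverse transform; here it is shown on a toy vector.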

21 pages, 2437 KB  
Article
Evaluating the FLUX.1 Synthetic Data on YOLOv9 for AI-Powered Poultry Farming
by Stevan Cakic, Tomo Popovic, Srdjan Krco, Ivan Jovovic and Dejan Babic
Appl. Sci. 2025, 15(7), 3663; https://doi.org/10.3390/app15073663 - 27 Mar 2025
Viewed by 1578
Abstract
This research explores the role of synthetic data in enhancing the accuracy of deep learning models for automated poultry farm management. A hybrid dataset was created by combining real images of chickens with 400 FLUX.1 [dev] generated synthetic images, aiming to reduce reliance on extensive manual data collection. The YOLOv9 model was trained on various dataset compositions to assess the impact of synthetic data on detection performance. Additionally, automated annotation techniques utilizing Grounding DINO and SAM2 streamlined dataset labeling, significantly reducing manual effort. Experimental results demonstrate that models trained on a balanced combination of real and synthetic images performed comparably to those trained on larger, augmented datasets, confirming the effectiveness of synthetic data in improving model generalization. The best-performing model trained on 300 real and 100 synthetic images achieved mAP = 0.829, while models trained on 100 real and 300 synthetic images reached mAP = 0.820, highlighting the potential of generative AI to bridge data scarcity gaps in precision poultry farming. This study demonstrates that synthetic data can enhance AI-driven poultry monitoring and reduce the importance of collecting real data. Full article
(This article belongs to the Special Issue Applied Computer Vision in Industry and Agriculture)
16 pages, 1769 KB  
Article
Advanced Brain Tumor Segmentation Using SAM2-UNet
by Rohit Viswakarma Pidishetti, Maaz Amjad and Victor S. Sheng
Appl. Sci. 2025, 15(6), 3267; https://doi.org/10.3390/app15063267 - 17 Mar 2025
Cited by 2 | Viewed by 2183
Abstract
Image segmentation is one of the key factors in diagnosing glioma patients with brain tumors. It helps doctors identify the type of tumor a patient is carrying and leads to a prognosis that can help save patients' lives. Medical image analysis is a specialized domain of computer vision and image processing that extracts meaningful information from medical images to support treatment planning and patient monitoring. Deep learning models such as CNNs have shown promising results in image segmentation by identifying complex patterns in image data, including tumor segmentation and anomaly identification that assist healthcare professionals in treatment planning. Despite these advances, precise tumor segmentation remains challenging because tumor structures vary widely across patients. Existing models, such as traditional U-Net- and SAM-based architectures, either lack efficiency in handling class-specific segmentation or require extensive computational resources. This study bridges that gap by proposing SAM2-UNet (Segment Anything Model 2-UNet), a hybrid model that leverages the strengths of both architectures to improve segmentation accuracy while consuming fewer computational resources. The proposed model performs particularly well on scarce data; we trained it on the Brain Tumor Segmentation Challenge 2020 (BraTS) dataset. The architecture follows the U-Net encoder-decoder design, with a pre-trained Hiera backbone to capture multi-scale features and adapters embedded in the encoder for parameter-efficient fine-tuning. 
The dataset contains four MRI channels (T1, T1ce, T2, and T2-FLAIR) for 369 glioma patients, together with a ground-truth segmentation mask per patient labeling non-tumor (NT), necrotic and non-enhancing tumor (NCR/NET), and peritumoral edema or GD-enhancing tumor (ET). With minimal hardware resources (16 GB RAM, 30 epochs), the experiments achieved balanced segmentation performance across tumor regions: a mean Dice score (mDice) of 0.771, a mean Intersection over Union (mIoU) of 0.569, an Sα score of 0.692, a weighted F-beta score (Fβw) of 0.267, an F-beta score (Fβ) of 0.261, an Eϕ score of 0.857, and a Mean Absolute Error (MAE) of 0.04 on the BraTS 2020 dataset. Full article
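The Dice and IoU figures reported above follow the standard overlap definitions for binary masks: Dice = 2|P∩G| / (|P| + |G|) and IoU = |P∩G| / |P∪G|. A minimal NumPy sketch of both metrics (a generic illustration, not the paper's evaluation code):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

def iou_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over Union between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float((inter + eps) / (union + eps))

# Toy 2x2 masks: prediction covers one extra pixel beyond the ground truth.
pred = np.array([[1, 1], [0, 0]], dtype=bool)
gt = np.array([[1, 0], [0, 0]], dtype=bool)
# dice_score(pred, gt) ≈ 0.667, iou_score(pred, gt) ≈ 0.5
```

The mDice/mIoU values in the abstract would be these scores averaged over patients (and, for class-specific reporting, over the NCR/NET and ET label regions).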
(This article belongs to the Special Issue Artificial Intelligence Techniques for Medical Data Analytics)
19 pages, 9426 KB  
Article
Ensemble Streamflow Simulations in a Qinghai–Tibet Plateau Basin Using a Deep Learning Method with Remote Sensing Precipitation Data as Input
by Jinqiang Wang, Zhanjie Li, Ling Zhou, Chi Ma and Wenchao Sun
Remote Sens. 2025, 17(6), 967; https://doi.org/10.3390/rs17060967 - 9 Mar 2025
Cited by 2 | Viewed by 1865
Abstract
Satellite and reanalysis-based precipitation products have played a crucial role in addressing the challenges associated with limited ground-based observational data. These products are widely utilized in hydrometeorological research, particularly in data-scarce regions like the Qinghai–Tibetan Plateau (QTP). This study proposes an ensemble streamflow simulation method that uses remote sensing precipitation data as input. A 1D Convolutional Neural Network (1D CNN) integrates the streamflow simulations from multiple models, and a Shapley Additive exPlanations (SHAP) interpretability analysis examines the contribution of each individual model to the ensemble simulation. The method is demonstrated using GPM IMERG (Global Precipitation Measurement Integrated Multi-satellite Retrievals) remote sensing precipitation data for streamflow estimation in the region upstream of the Ganzi gauging station in the Yalong River basin of the QTP for 2010–2019. Streamflow simulations were carried out with models of diverse structure: the physically based BTOPMC (Block-wise use of TOPMODEL) and two machine learning models, Random Forest (RF) and Long Short-Term Memory (LSTM) neural networks. Three ensemble approaches were then compared: the Simple Average Method (SAM), the Weighted Average Method (WAM), and the proposed 1D CNN method. For the individual models, the Kling–Gupta Efficiency (KGE) values during the validation period were 0.66 for BTOPMC, 0.71 for RF, and 0.74 for LSTM. Among the ensemble approaches, the validation-period KGE values for SAM, WAM, and the nonlinear 1D CNN method were 0.74, 0.73, and 0.82, respectively, indicating that the nonlinear 1D CNN approach achieved the highest accuracy. 
The SHAP-based interpretability analysis further showed that RF made the most significant contribution to the ensemble simulation, while LSTM contributed the least. These findings highlight that the proposed 1D CNN ensemble framework has great potential to improve streamflow estimation from remote sensing precipitation data and may offer new insight into how deep learning can advance the application of remote sensing in hydrological research. Full article
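The two linear baselines the study compares against, SAM and WAM, are simple combinations of the member simulations: SAM is the unweighted mean across models, while WAM applies per-model weights (typically tied to each model's skill). A toy NumPy sketch, where the flow values and weights are invented for illustration:

```python
import numpy as np

# Toy streamflow simulations (m^3/s) from three members at four time steps,
# standing in for BTOPMC, RF, and LSTM outputs. Values are made up.
sims = np.array([
    [10.0, 12.0, 9.0, 11.0],   # BTOPMC
    [11.0, 13.0, 8.0, 10.0],   # RF
    [12.0, 11.0, 10.0, 12.0],  # LSTM
])

# Simple Average Method (SAM): unweighted mean across members.
sam = sims.mean(axis=0)

# Weighted Average Method (WAM): weights are assumptions here, e.g.,
# proportional to each member's validation skill; they must sum to 1.
w = np.array([0.25, 0.35, 0.40])
wam = w @ sims
```

The paper's 1D CNN replaces these fixed linear combinations with a learned nonlinear mapping from the member simulations to the observed flow, which is why it can outperform both.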