Search Results (78)

Search Parameters:
Keywords = Segformer model

27 pages, 4104 KB  
Article
CropCLR-Wheat: A Label-Efficient Contrastive Learning Architecture for Lightweight Wheat Pest Detection
by Yan Wang, Chengze Li, Chenlu Jiang, Mingyu Liu, Shengzhe Xu, Binghua Yang and Min Dong
Insects 2025, 16(11), 1096; https://doi.org/10.3390/insects16111096 - 25 Oct 2025
Abstract
To address prevalent challenges in field-based wheat pest recognition—namely, viewpoint perturbations, sample scarcity, and heterogeneous data distributions—a pest identification framework named CropCLR-Wheat is proposed, which integrates self-supervised contrastive learning with an attention-enhanced mechanism. By incorporating a viewpoint-invariant feature encoder and a diffusion-based feature filtering module, the model significantly enhances pest damage localization and feature consistency, enabling high-accuracy recognition under limited-sample conditions. In 5-shot classification tasks, CropCLR-Wheat achieves a precision of 89.4%, a recall of 87.1%, and an accuracy of 88.2%; these metrics further improve to 92.3%, 90.5%, and 91.2%, respectively, under the 10-shot setting. In the semantic segmentation of wheat pest damage regions, the model attains a mean intersection over union (mIoU) of 82.7%, with precision and recall reaching 85.2% and 82.4%, respectively, markedly outperforming advanced models such as SegFormer and Mask R-CNN. In robustness evaluation under viewpoint disturbances, a prediction consistency rate of 88.7%, a confidence variation of only 7.8%, and a prediction consistency score (PCS) of 0.914 are recorded, indicating strong stability and adaptability. Deployment results further demonstrate the framework’s practical viability: on the Jetson Nano device, an inference latency of 84 ms, a frame rate of 11.9 FPS, and an accuracy of 88.2% are achieved. These results confirm the efficiency of the proposed approach in edge computing environments. By balancing generalization performance with deployability, the proposed method provides robust support for intelligent agricultural terminal systems and holds substantial potential for wide-scale application. Full article
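Several results in this listing report mean intersection over union (mIoU), as in the 82.7% figure above. For readers unfamiliar with the metric, a minimal per-class IoU/mIoU computation over integer label maps can be sketched as follows (a generic illustration, not code from any of the listed papers; the arrays and class count are invented for the example):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes, over integer label maps."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip it, don't inflate the mean
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 3x3 label maps with three classes (values invented for the example).
pred   = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 2]])
target = np.array([[0, 0, 1], [1, 2, 2], [2, 2, 2]])
miou = mean_iou(pred, target, num_classes=3)  # (1.0 + 2/3 + 0.8) / 3 ≈ 0.822
```

Note that published implementations differ on how absent classes are handled; skipping them, as here, is one common convention.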

20 pages, 45835 KB  
Article
Computer Vision-Assisted Spatial Analysis of Mitoses and Vasculature in Lung Cancer
by Anna Timakova, Alexey Fayzullin, Vladislav Ananev, Egor Zemnuhov, Vadim Alfimov, Alexey Baranov, Yulia Smirnova, Vitaly Shatalov, Natalia Konukhova, Evgeny Karpulevich, Peter Timashev and Vladimir Makarov
J. Clin. Med. 2025, 14(21), 7526; https://doi.org/10.3390/jcm14217526 - 23 Oct 2025
Viewed by 188
Abstract
Background/Objectives: Lung cancer is characterized by significant microstructural heterogeneity among different histological types. Artificial intelligence and digital pathology instruments can facilitate morphological analysis by introducing calculated metrics that allow different tissue patterns to be distinguished. Methods: We used computer vision models to calculate a number of morphometric features of tumor vascularization and proliferation. We used two frameworks to process whole-slide images: (1) the LVI-PathNet framework for vascular detection, based on the SegFormer architecture; and (2) the Mito-PathNet framework for mitotic figure detection, based on the RetinaNet detector and an ensemble classification model. The results were visualized as segmented and gradient heatmaps. Results: SegFormer for vessel segmentation achieved the following quality metrics: IoU = 0.96, FBeta-score = 0.98, and AUC-ROC = 0.98. The RetinaNet + CNN ensemble achieved the following quality metrics: specificity = 0.96 and sensitivity = 0.97. The analysis of the obtained parameters allowed us to identify trophic patterns of lung cancer (proliferative-vascular, hypoxic, proliferative, vascular, and inactive) according to the degree of aggressiveness, which can serve as potential targets for therapy. Conclusions: The analysis of the obtained parameters allowed us to identify distinct quantitative characteristics for each histological type of lung cancer. These patterns could potentially become markers for therapeutic choices, such as antiangiogenic and hypoxia-induced factor therapy. Full article

28 pages, 16418 KB  
Article
Hybrid-SegUFormer: A Hybrid Multi-Scale Network with Self-Distillation for Robust Landslide InSAR Deformation Detection
by Wenyi Zhao, Jiahao Zhang, Jianao Cai and Dongping Ming
Remote Sens. 2025, 17(21), 3514; https://doi.org/10.3390/rs17213514 - 23 Oct 2025
Viewed by 197
Abstract
Landslide deformation monitoring via InSAR is crucial for assessing the risk of hazards. Quick and accurate detection of active deformation zones is essential for early warning and mitigation planning. While the application of deep learning has substantially improved detection efficiency, several challenges persist, such as poor multi-scale perception, blurred boundaries, and limited model generalization. This study proposes Hybrid-SegUFormer to address these limitations. The model integrates the SegFormer encoder’s efficient feature extraction with the U-Net decoder’s superior boundary restoration. It introduces a multi-scale fusion decoding mechanism to structurally enhance context perception and incorporates a self-distillation strategy to significantly improve generalization capability. Hybrid-SegUFormer achieves strong detection performance (98.79% accuracy, 80.05% F1-score) while demonstrating superior multi-scale adaptability (IoU degradation of only 6.99–8.83%) and strong cross-regional generalization capability. The synergistic integration of its core modules enables an optimal balance between precision and recall, making it particularly effective for complex landslide detection tasks. This study provides a new approach for the intelligent interpretation of InSAR deformation in complex mountainous areas. Full article
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)

33 pages, 20327 KB  
Article
Automated Detection of Beaver-Influenced Floodplain Inundations in Multi-Temporal Aerial Imagery Using Deep Learning Algorithms
by Evan Zocco, Chandi Witharana, Isaac M. Ortega and William Ouimet
ISPRS Int. J. Geo-Inf. 2025, 14(10), 383; https://doi.org/10.3390/ijgi14100383 - 30 Sep 2025
Viewed by 258
Abstract
Remote sensing provides a viable alternative for understanding landscape modifications attributed to beaver activity. The central objective of this study is to integrate multi-source remote sensing observations in tandem with a deep learning (DL) (convolutional neural net or transformer) model to automatically map beaver-influenced floodplain inundations (BIFI) over large geographical extents. We trained, validated, and tested eleven different model configurations in three architectures using five ResNet and five B-Finetuned encoders. The training dataset consisted of >25,000 manually annotated aerial image tiles of BIFIs in Connecticut. The YOLOv8 architecture outperformed competing configurations and achieved an F1 score of 80.59% and a pixel-based map accuracy of 98.95%. SegFormer and U-Net++’s highest-performing models had F1 scores of 68.98% and 78.86%, respectively. The YOLOv8l-seg model was deployed at a statewide scale based on 1 m resolution multi-temporal aerial imagery acquired from 1990 to 2019 under leaf-on and leaf-off conditions. Our results enable a variety of inferences when comparing leaf-on and leaf-off conditions of the same year. The model exhibits limitations in identifying BIFIs in panchromatic imagery in occluded environments. Study findings demonstrate the potential of harnessing historical and modern aerial image datasets with state-of-the-art DL models to increase our understanding of beaver activity across space and time. Full article

18 pages, 1597 KB  
Article
A Comparative Analysis of SegFormer, FabE-Net and VGG-UNet Models for the Segmentation of Neural Structures on Histological Sections
by Igor Makarov, Elena Koshevaya, Alina Pechenina, Galina Boyko, Anna Starshinova, Dmitry Kudlay, Taiana Makarova and Lubov Mitrofanova
Diagnostics 2025, 15(18), 2408; https://doi.org/10.3390/diagnostics15182408 - 22 Sep 2025
Viewed by 464
Abstract
Background: Segmenting nerve fibres in histological images is challenging because of the high variability in tissue appearance. Modern neural network architectures, including U-Net and transformers, demonstrate varying degrees of effectiveness in this area. The aim of this study is to conduct a comparative analysis of the SegFormer, VGG-UNet, and FabE-Net models in terms of segmentation quality and speed. Methods: The training sample consisted of more than 75,000 pairs of images of different tissues (original slice and corresponding mask), scaled from 1024 × 1024 to 224 × 224 pixels to optimise computations. Three neural network architectures were used: the classic VGG-UNet, FabE-Net with attention and global context perception blocks, and the SegFormer transformer model. For an objective assessment of the quality of the models, expert validation was carried out with the participation of four independent pathologists, who evaluated the quality of segmentation according to specified criteria. Quality metrics (precision, recall, F1-score, accuracy) were calculated as averages based on the assessments of all experts, which made it possible to take into account variability in interpretation and increase the reliability of the results. Results: SegFormer achieved stable convergence of the loss function faster than the other models—by the 20–30th epoch, compared to 45–60 epochs for VGG-UNet and FabE-Net. Despite taking longer to train per epoch, SegFormer produced the best segmentation quality, with the following metrics: precision 0.84, recall 0.99, F1-score 0.91 and accuracy 0.89. It also annotated a complete histological section in the fastest time. Visual analysis revealed that, compared to other models, which tended to produce incomplete or excessive segmentation, SegFormer more accurately and completely highlighted nerve structures.
Conclusions: Using attention mechanisms in SegFormer compensates for morphological variability in tissues, resulting in faster and higher-quality segmentation. Image scaling does not impair training quality while significantly accelerating computational processes. These results confirm the potential of SegFormer for practical use in digital pathology, while also highlighting the need for high-precision, immunohistochemistry-informed labelling to improve segmentation accuracy. Full article
(This article belongs to the Special Issue Pathology and Diagnosis of Neurological Disorders, 2nd Edition)
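The F1-score of 0.91 reported above is consistent with the stated precision (0.84) and recall (0.99), since F1 is the harmonic mean of the two; a quick check (generic arithmetic, not code from the paper):

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.84, 0.99)  # ≈ 0.909, which rounds to the reported 0.91
```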

40 pages, 9065 KB  
Article
Empirical Evaluation of Invariances in Deep Vision Models
by Konstantinos Keremis, Eleni Vrochidou and George A. Papakostas
J. Imaging 2025, 11(9), 322; https://doi.org/10.3390/jimaging11090322 - 19 Sep 2025
Viewed by 650
Abstract
The ability of deep learning models to maintain consistent performance under image transformations, termed invariances, is critical for reliable deployment across diverse computer vision applications. This study presents a comprehensive empirical evaluation of modern convolutional neural networks (CNNs) and vision transformers (ViTs) concerning four fundamental types of image invariances: blur, noise, rotation, and scale. We analyze a curated selection of thirty models across three common vision tasks (object localization, recognition, and semantic segmentation) using benchmark datasets including COCO, ImageNet, and a custom segmentation dataset. Our experimental protocol introduces controlled perturbations to test model robustness and employs task-specific metrics such as mean Intersection over Union (mIoU) and classification accuracy (Acc) to quantify models’ performance degradation. Results indicate that while ViTs generally outperform CNNs under blur and noise corruption in recognition tasks, both model families exhibit significant vulnerabilities to rotation and extreme scale transformations. Notably, segmentation models demonstrate higher resilience to geometric variations, with SegFormer and Mask2Former emerging as the most robust architectures. These findings challenge prevailing assumptions regarding model robustness and provide actionable insights for designing vision systems capable of withstanding real-world input variability. Full article
(This article belongs to the Special Issue Advances in Machine Learning for Computer Vision Applications)
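The evaluation protocol described above (perturb inputs at controlled severities, then measure the performance drop) can be illustrated with a toy classifier. Everything here is invented for illustration: the nearest-centroid model, the synthetic two-class data, and the noise levels stand in for the study's real models, datasets, and perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_predict(x, centroids):
    """Toy classifier: assign each sample to its nearest class centroid."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Synthetic two-class "features" standing in for image representations.
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])
labels = rng.integers(0, 2, size=200)
x = centroids[labels] + rng.normal(0.0, 0.5, size=(200, 2))

# Robustness protocol: measure accuracy at increasing perturbation severity.
degradation = {}
for sigma in [0.0, 1.0, 2.0]:
    noisy = x + rng.normal(0.0, sigma, size=x.shape)
    degradation[sigma] = (nearest_centroid_predict(noisy, centroids) == labels).mean()
```

Plotting accuracy against severity, per transformation type, is essentially what the task-specific mIoU and Acc degradation curves in such studies report.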

25 pages, 3025 KB  
Article
QiGSAN: A Novel Probability-Informed Approach for Small Object Segmentation in the Case of Limited Image Datasets
by Andrey Gorshenin and Anastasia Dostovalova
Big Data Cogn. Comput. 2025, 9(9), 239; https://doi.org/10.3390/bdcc9090239 - 18 Sep 2025
Viewed by 548
Abstract
The paper presents a novel probability-informed approach to improving the accuracy of small object semantic segmentation in high-resolution imagery datasets with imbalanced classes and a limited volume of samples. Small objects are those with a small pixel footprint in the input image, for example, ships in the ocean. Informing in this context means using mathematical models to represent data in the layers of deep neural networks. Thus, the ensemble Quadtree-informed Graph Self-Attention Networks (QiGSANs) are proposed. New architectural blocks, informed by types of Markov random fields such as quadtrees, have been introduced to capture the interconnections between features in images at different spatial resolutions during the graph convolution of superpixel subregions. It has been analytically proven that quadtree-informed graph convolutional neural networks, a part of QiGSAN, tend to achieve faster loss reduction compared to convolutional architectures. This justifies the effectiveness of probability-informed modifications based on quadtrees. To empirically demonstrate the processing of real small data with imbalanced object classes using QiGSAN, two open datasets of synthetic aperture radar (SAR) imagery (up to 0.5 m per pixel) are used: the High Resolution SAR Images Dataset (HRSID) and the SAR Ship Detection Dataset (SSDD). The results of QiGSAN are compared to those of the transformers SegFormer and LWGANet, which are recent state-of-the-art models for UAV (Unmanned Aerial Vehicle) and SAR image processing. They are also compared to convolutional neural networks and several ensemble implementations using other graph neural networks. QiGSAN significantly increases the F1-score values by up to 63.93%, 48.57%, and 9.84% compared to transformers, convolutional neural networks, and other ensemble architectures, respectively. QiGSAN also outperformed the base segmentors on the mIoU (mean intersection over union) metric: the highest increase was 35.79%.
Therefore, our approach to knowledge extraction using mathematical models allows us to significantly improve modern computer vision techniques for imbalanced data. Full article
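QiGSAN's informing mechanism builds on quadtrees, which recursively split an image into quadrants until each region is homogeneous. A minimal decomposition sketch follows; it shows only the underlying data structure, not the paper's Markov-random-field-informed network layers, and the 4×4 map is invented for the example:

```python
import numpy as np

def quadtree(img, x=0, y=0, size=None):
    """Recursively split a square binary map into homogeneous quadrants.

    Returns a list of (x, y, size, value) leaf regions.
    """
    if size is None:
        size = img.shape[0]
    block = img[y:y + size, x:x + size]
    if size == 1 or block.min() == block.max():  # homogeneous: stop splitting
        return [(x, y, size, int(block[0, 0]))]
    h = size // 2
    leaves = []
    for dx, dy in [(0, 0), (h, 0), (0, h), (h, h)]:
        leaves += quadtree(img, x + dx, y + dy, h)
    return leaves

# A 4x4 map with one "small object" pixel in the top-left quadrant.
img = np.zeros((4, 4), dtype=int)
img[0, 0] = 1
leaves = quadtree(img)
```

The single object pixel forces full-depth subdivision in one quadrant while the empty quadrants stay coarse, which is why quadtrees suit small-object, imbalanced-class settings.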

23 pages, 10375 KB  
Article
Extraction of Photosynthetic and Non-Photosynthetic Vegetation Cover in Typical Grasslands Using UAV Imagery and an Improved SegFormer Model
by Jie He, Xiaoping Zhang, Weibin Li, Du Lyu, Yi Ren and Wenlin Fu
Remote Sens. 2025, 17(18), 3162; https://doi.org/10.3390/rs17183162 - 12 Sep 2025
Viewed by 567
Abstract
Accurate monitoring of the coverage and distribution of photosynthetic (PV) and non-photosynthetic vegetation (NPV) in the grasslands of semi-arid regions is crucial for understanding the environment and addressing climate change. However, the extraction of PV and NPV information from Unmanned Aerial Vehicle (UAV) remote sensing imagery is often hindered by challenges such as low extraction accuracy and blurred boundaries. To overcome these limitations, this study proposed an improved semantic segmentation model, designated SegFormer-CPED. The model was developed based on the SegFormer architecture, incorporating several synergistic optimizations. Specifically, a Convolutional Block Attention Module (CBAM) was integrated into the encoder to enhance early-stage feature perception, while a Polarized Self-Attention (PSA) module was embedded to strengthen contextual understanding and mitigate semantic loss. An Edge Contour Extraction Module (ECEM) was introduced to refine boundary details. Concurrently, the Dice Loss function was employed to replace the Cross-Entropy Loss, thereby more effectively addressing the class imbalance issue and significantly improving both the segmentation accuracy and boundary clarity of PV and NPV. To support model development, a high-quality PV and NPV segmentation dataset for Hengshan grassland was also constructed. Comprehensive experimental results demonstrated that the proposed SegFormer-CPED model achieved state-of-the-art performance, with a mIoU of 93.26% and an F1-score of 96.44%. It significantly outperformed classic architectures and surpassed all leading frameworks benchmarked here. Its high-fidelity maps can bridge field surveys and satellite remote sensing. Ablation studies verified the effectiveness of each improved module and its synergistic interplay. 
Moreover, this study successfully utilized SegFormer-CPED to perform fine-grained monitoring of the spatiotemporal dynamics of PV and NPV in the Hengshan grassland, confirming that the model-estimated fPV and fNPV were highly correlated with ground survey data. The proposed SegFormer-CPED model provides a robust and effective solution for the precise, semi-automated extraction of PV and NPV from high-resolution UAV imagery. Full article
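The abstract above swaps cross-entropy for the Dice loss to counter class imbalance. In its soft binary form, the Dice loss is 1 - 2*sum(p*t) / (sum(p) + sum(t)) over predicted probabilities p and binary targets t. A minimal numpy sketch (illustrative only, not the SegFormer-CPED implementation; the toy mask is invented):

```python
import numpy as np

def dice_loss(prob, target, eps=1e-7):
    """Soft Dice loss for binary segmentation: 1 - 2|P∩T| / (|P| + |T|)."""
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

# Toy ground-truth mask: two foreground pixels out of twelve.
target = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 0]], dtype=float)
perfect = dice_loss(target, target)                   # ≈ 0: perfect overlap
uniform = dice_loss(np.full_like(target, 0.5), target)  # 0.75: uninformative prediction
```

Because the loss depends only on the overlap ratio rather than per-pixel counts, the dominant background class cannot swamp the gradient the way it does with unweighted cross-entropy.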

26 pages, 10494 KB  
Article
Data-Model Complexity Trade-Off in UAV-Acquired Ultra-High-Resolution Remote Sensing: Empirical Study on Photovoltaic Panel Segmentation
by Zhigang Zou, Xinhui Zhou, Pukaiyuan Yang, Jingyi Liu and Wu Yang
Drones 2025, 9(9), 619; https://doi.org/10.3390/drones9090619 - 3 Sep 2025
Viewed by 505
Abstract
With the growing adoption of deep learning in remote sensing, the increasing diversity of models and datasets has made method selection and experimentation more challenging, especially for non-expert users. This study presents a comprehensive evaluation of photovoltaic panel segmentation using a large-scale ultra-high-resolution benchmark of over 25,000 manually annotated unmanned aerial vehicle image patches, systematically quantifying the impact of model and data characteristics. Our results indicate that increasing the spatial diversity of training data has a more substantial impact on training stability and segmentation accuracy than simply adding spectral bands or enlarging the dataset volume. Across all experimental settings, moderate-sized models (DeepLabV3_50, ResUNet50, and SegFormer B4) often provided the best trade-off between segmentation performance and computational efficiency, achieving an average Intersection over Union (IoU) of 0.8966, comparable to the 0.8970 of larger models. Moreover, model architecture plays a more critical role than model size, as the ResUNet models consistently achieved higher mean IoU than both DeepLabV3 and SegFormer models, with average improvements of 0.047 and 0.143, respectively. Our findings offer quantitative guidance for balancing architectural choices, model complexity, and dataset design, ultimately promoting more robust and efficient deployment of deep learning models in high-resolution remote sensing applications. Full article

27 pages, 6315 KB  
Article
A Method for the Extraction of Apocynum venetum L. Spatial Distribution in Yuli County, Xinjiang, via an Improved SegFormer Network
by Yixuan Wang, Hong Wang and Xinhui Wang
Remote Sens. 2025, 17(17), 3039; https://doi.org/10.3390/rs17173039 - 1 Sep 2025
Viewed by 888
Abstract
Efficient and accurate acquisition of spatial distribution information for Apocynum venetum L. is highly important for the sustainable development of agriculture in Yuli County, Xinjiang. As an important cash crop, Apocynum relies on specific natural conditions for growth, and its survival environment is currently under severe threat. Therefore, accurately quantifying its spatial distribution information is crucial. This research takes Yuli County in Xinjiang as the study area and proposes an enhanced SegFormer model based on deep learning, aiming to realize the effective identification and extraction of Apocynum. The study indicates the following. (1) The improved SegFormer model adds smaller-scale feature layers in the encoder stage, allowing the improved model’s encoder to extract features at five scales: 1/4, 1/8, 1/16, 1/32, and 1/64; meanwhile, integrating the T2T-ViT backbone network into the encoder significantly enhances the precision and efficiency of Apocynum’s spatial distribution extraction. (2) Compared with Unet, TransUNet, and the original SegFormer, the improved SegFormer model outperforms the other models in terms of the mIoU, OA, and mPA metrics, achieving values of 88.22%, 93.98%, and 89.66%, respectively. (3) Ablation experiments show that the T2T_vit_14 model performs best among all the T2T-ViT configurations, with superior extraction effects on fragmented small plots compared with the other models. Therefore, the T2T_vit_14 model is integrated into the SegFormer model. This work improves the extraction accuracy and efficiency of the spatial distribution of Apocynum via an improved SegFormer model, which has strong stability and robustness and offers scientific evidence for resource protection, restoration planting, and germplasm breeding in Yuli County, Xinjiang. Full article

13 pages, 2141 KB  
Article
Transformer-Based Semantic Segmentation of Japanese Knotweed in High-Resolution UAV Imagery Using Twins-SVT
by Sruthi Keerthi Valicharla, Roghaiyeh Karimzadeh, Xin Li and Yong-Lak Park
Information 2025, 16(9), 741; https://doi.org/10.3390/info16090741 - 28 Aug 2025
Viewed by 673
Abstract
Japanese knotweed (Fallopia japonica) is a noxious invasive plant species that requires scalable and precise monitoring methods. Current visually based ground surveys are resource-intensive and inefficient for detecting Japanese knotweed in landscapes. This study presents a transformer-based semantic segmentation framework for the automated detection of Japanese knotweed patches using high-resolution RGB imagery acquired with unmanned aerial vehicles (UAVs). We used the Twins Spatially Separable Vision Transformer (Twins-SVT), which utilizes a hierarchical architecture with spatially separable self-attention to effectively model long-range dependencies and multiscale contextual features. The model was trained on 6945 annotated aerial images collected in three sites infested with Japanese knotweed in West Virginia, USA. The results of this study showed that the proposed framework achieved superior performance compared to other transformer-based baselines. The Twins-SVT model achieved a mean Intersection over Union (mIoU) of 94.94% and an Average Accuracy (AAcc) of 97.50%, outperforming SegFormer, Swin-T, and ViT. These findings highlight the model’s ability to accurately distinguish Japanese knotweed patches from surrounding vegetation. The method and protocol presented in this research provide a robust, scalable solution for mapping Japanese knotweed through aerial imagery and highlight the successful use of advanced vision transformers in ecological and geospatial information analysis. Full article
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence with Applications)

19 pages, 14441 KB  
Article
Study on Forest Extraction and Ecological Network Construction of Remote Sensing Images Combined with Dynamic Large Kernel Convolution
by Feiyue Wang, Fan Yang, Xinyue Chang and Yang Ye
Forests 2025, 16(8), 1342; https://doi.org/10.3390/f16081342 - 18 Aug 2025
Viewed by 633
Abstract
As an important input parameter of the ecological network, the accuracy and detail with which forest cover is extracted directly constrain the accuracy of forest ecological network construction. The development of medium- and high-resolution remote sensing technology has provided an opportunity to obtain accurate and high-resolution forest coverage data. Because forests present diverse contours and complex scenes in remote sensing images, extraction models are disturbed by the natural distribution characteristics of complex forests, which in turn affects extraction accuracy. In this study, we first constructed a large, complex, diverse, and scene-rich forest extraction dataset based on Sentinel-2 multispectral images, comprising 20,962 manually and accurately labeled images with a spatial resolution of 10 m. At the same time, this paper proposes the Dynamic Large Kernel Segformer and conducts forest extraction experiments in Liaoning Province, China. We then used forest coverage as an input parameter and classified the forest landscape patterns in the study area using a landscape spatial pattern characterization method, based on which a forest ecological network was constructed. The results show that the Dynamic Large Kernel Segformer obtains 80.58% IoU, 89.29% precision, 88.63% recall, and an 88.96% F1-score in extraction accuracy, which is 4.02% higher than that of the Segformer network, and achieves large-scale forest extraction in the study area. The forest area in Liaoning Province increased during the 5-year period from 2019 to 2023. With respect to the overall spatial pattern change, the Core area of Liaoning Province saw an increase in 2019–2023, and the overall quality of the forest landscape improved. Finally, we constructed the forest ecological network for Liaoning Province in 2023, which consists of ecological sources, ecological nodes, and ecological corridors based on circuit theory.
This method can be used to extract large areas of forest based on remote sensing images, which is helpful for constructing forest ecological networks and achieving coordinated regional, ecological, and economic development. Full article
(This article belongs to the Special Issue Long-Term Monitoring and Driving Forces of Forest Cover)

24 pages, 18845 KB  
Article
ProtoLeafNet: A Prototype Attention-Based Leafy Vegetable Disease Detection and Segmentation Network for Sustainable Agriculture
by Yuluxin Fu and Chen Shi
Sustainability 2025, 17(16), 7443; https://doi.org/10.3390/su17167443 - 18 Aug 2025
Viewed by 707
Abstract
In response to the challenges posed by visually similar disease symptoms, complex background noise, and the need for fine-grained disease classification in leafy vegetables, this study proposes ProtoLeafNet—a prototype attention-based deep learning model for multi-task disease detection and segmentation. By integrating a class-prototype–guided attention mechanism with a prototype loss function, the model effectively enhances the focus on lesion areas and improves category discrimination. The architecture leverages a dual-task framework that combines object detection and semantic segmentation, achieving robust performance in real agricultural scenarios. Experimental results demonstrate that the model attains a detection precision of 93.12%, recall of 90.27%, accuracy of 91.45%, and mAP scores of 91.07% and 90.25% at IoU thresholds of 50% and 75%, respectively. In the segmentation task, the model achieves a precision of 91.79%, recall of 90.80%, accuracy of 93.77%, and mAP@50 and mAP@75 both reaching 90.80%. Comparative evaluations against state-of-the-art models such as YOLOv10 and TinySegformer verify the superior detection accuracy and fine-grained segmentation ability of ProtoLeafNet. These results highlight the potential of prototype attention mechanisms in enhancing model robustness, offering practical value for intelligent disease monitoring and sustainable agriculture. Full article
26 pages, 3316 KB  
Article
Land8Fire: A Complete Study on Wildfire Segmentation Through Comprehensive Review, Human-Annotated Multispectral Dataset, and Extensive Benchmarking
by Anh Tran, Minh Tran, Esteban Marti, Jackson Cothren, Chase Rainwater, Sandra Eksioglu and Ngan Le
Remote Sens. 2025, 17(16), 2776; https://doi.org/10.3390/rs17162776 - 11 Aug 2025
Abstract
Early and accurate wildfire detection is critical for minimizing environmental damage and ensuring a timely response. However, existing satellite-based wildfire datasets suffer from limitations such as coarse ground truth, poor spectral coverage, and class imbalance, which hinder progress in developing robust segmentation models. In this paper, we introduce Land8Fire, a new large-scale wildfire segmentation dataset composed of over 20,000 multispectral image patches derived from Landsat 8 and manually annotated for high-quality fire masks. Building on the ActiveFire dataset, Land8Fire improves ground truth reliability and offers predefined splits for consistent benchmarking. We evaluate a range of state-of-the-art convolutional and transformer-based models, including UNet, DeepLabV3+, SegFormer, and Mask2Former, and investigate the impact of different objective functions (cross-entropy and focal losses) and spectral band combinations (B1–B11). Our results reveal that focal loss, though effective for small object detection, underperforms in scenarios with clustered fires, leading to reduced recall. In contrast, spectral analysis highlights the critical role of the short-wave infrared 1 (SWIR1) and short-wave infrared 2 (SWIR2) bands, with further gains observed when including near infrared (NIR) to penetrate smoke and cloud cover. Land8Fire sets a new benchmark for wildfire segmentation and provides valuable insights for advancing fire detection research in remote sensing. Full article
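The focal loss benchmarked against cross-entropy above follows the standard formulation of Lin et al., which down-weights easy examples by the factor (1 − p_t)^γ. A minimal binary sketch — the parameter defaults are the commonly cited ones, not necessarily those used in this study:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted foreground probabilities in [0, 1]
    y: binary ground-truth labels (0 or 1)
    """
    p = np.clip(p, eps, 1.0 - eps)          # avoid log(0)
    p_t = np.where(y == 1, p, 1.0 - p)      # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))
```

With γ = 0 and α = 0.5 this reduces to a scaled cross-entropy; larger γ suppresses the contribution of well-classified pixels, which helps with the sparse fire pixels the abstract describes but, as the results show, can hurt recall when fires are clustered.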
22 pages, 6482 KB  
Article
Surface Damage Detection in Hydraulic Structures from UAV Images Using Lightweight Neural Networks
by Feng Han and Chongshi Gu
Remote Sens. 2025, 17(15), 2668; https://doi.org/10.3390/rs17152668 - 1 Aug 2025
Abstract
Timely and accurate identification of surface damage in hydraulic structures is essential for maintaining structural integrity and ensuring operational safety. Traditional manual inspections are time-consuming, labor-intensive, and prone to subjectivity, especially for large-scale or inaccessible infrastructure. Leveraging advancements in aerial imaging, unmanned aerial vehicles (UAVs) enable efficient acquisition of high-resolution visual data across expansive hydraulic environments. However, existing deep learning (DL) models often lack architectural adaptations for the visual complexities of UAV imagery, including low-texture contrast, noise interference, and irregular crack patterns. To address these challenges, this study proposes a lightweight, robust, and high-precision segmentation framework, called LFPA-EAM-Fast-SCNN, specifically designed for pixel-level damage detection in UAV-captured images of hydraulic concrete surfaces. The developed DL-based model integrates an enhanced Fast-SCNN backbone for efficient feature extraction, a Lightweight Feature Pyramid Attention (LFPA) module for multi-scale context enhancement, and an Edge Attention Module (EAM) for refined boundary localization. The experimental results on a custom UAV-based dataset show that the proposed damage detection method achieves superior performance, with a precision of 0.949, a recall of 0.892, an F1 score of 0.906, and an IoU of 87.92%, outperforming U-Net, Attention U-Net, SegNet, DeepLab v3+, I-ST-UNet, and SegFormer. Additionally, it reaches a real-time inference speed of 56.31 FPS, significantly surpassing other models. The experimental results demonstrate the proposed framework’s strong generalization capability and robustness under varying noise levels and damage scenarios, underscoring its suitability for scalable, automated surface damage assessment in UAV-based remote sensing of civil infrastructure. Full article
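The pixel-level precision, recall, F1, and IoU figures reported above can all be derived from the confusion counts of a binary mask. A minimal sketch of that computation — a hypothetical helper, not the authors' evaluation code:

```python
import numpy as np

def seg_metrics(pred, gt):
    """Pixel-level precision, recall, F1, and IoU for binary masks.

    pred, gt: arrays of 0/1 values with the same shape.
    """
    tp = np.sum((pred == 1) & (gt == 1))  # damage pixels found
    fp = np.sum((pred == 1) & (gt == 0))  # false alarms
    fn = np.sum((pred == 0) & (gt == 1))  # missed damage pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou
```

Note that IoU is always at most F1 for the same mask pair, which is consistent with the reported F1 of 0.906 against an IoU of 87.92%.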