Search Results (904)

Search Parameters:
Keywords = Attention UNet

22 pages, 6482 KiB  
Article
Surface Damage Detection in Hydraulic Structures from UAV Images Using Lightweight Neural Networks
by Feng Han and Chongshi Gu
Remote Sens. 2025, 17(15), 2668; https://doi.org/10.3390/rs17152668 - 1 Aug 2025
Viewed by 102
Abstract
Timely and accurate identification of surface damage in hydraulic structures is essential for maintaining structural integrity and ensuring operational safety. Traditional manual inspections are time-consuming, labor-intensive, and prone to subjectivity, especially for large-scale or inaccessible infrastructure. Leveraging advancements in aerial imaging, unmanned aerial vehicles (UAVs) enable efficient acquisition of high-resolution visual data across expansive hydraulic environments. However, existing deep learning (DL) models often lack architectural adaptations for the visual complexities of UAV imagery, including low texture contrast, noise interference, and irregular crack patterns. To address these challenges, this study proposes a lightweight, robust, and high-precision segmentation framework, called LFPA-EAM-Fast-SCNN, specifically designed for pixel-level damage detection in UAV-captured images of hydraulic concrete surfaces. The model integrates an enhanced Fast-SCNN backbone for efficient feature extraction, a Lightweight Feature Pyramid Attention (LFPA) module for multi-scale context enhancement, and an Edge Attention Module (EAM) for refined boundary localization. Experimental results on a custom UAV-based dataset show that the proposed method achieves superior performance, with a precision of 0.949, a recall of 0.892, an F1 score of 0.906, and an IoU of 87.92%, outperforming U-Net, Attention U-Net, SegNet, DeepLab v3+, I-ST-UNet, and SegFormer. It also reaches a real-time inference speed of 56.31 FPS, significantly surpassing the other models. Further experiments demonstrate the framework’s strong generalization capability and robustness under varying noise levels and damage scenarios, underscoring its suitability for scalable, automated surface damage assessment in UAV-based remote sensing of civil infrastructure.
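
The Edge Attention Module is only described at a high level here; a minimal PyTorch sketch of the general idea (a fixed Sobel operator whose response gates the feature map) might look as follows. The module name and internals are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeAttention(nn.Module):
    """Illustrative edge-attention gate: a fixed depthwise Sobel filter
    produces an edge-magnitude map that re-weights the features."""
    def __init__(self, channels):
        super().__init__()
        sobel = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", sobel.reshape(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.register_buffer("ky", sobel.t().reshape(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        c = x.shape[1]
        gx = F.conv2d(x, self.kx, padding=1, groups=c)   # horizontal gradients
        gy = F.conv2d(x, self.ky, padding=1, groups=c)   # vertical gradients
        edge = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)      # edge magnitude
        return x + x * torch.sigmoid(self.proj(edge))    # residual edge gating

print(EdgeAttention(32)(torch.randn(1, 32, 64, 64)).shape)  # (1, 32, 64, 64)
```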

23 pages, 3099 KiB  
Article
Explainable Multi-Scale CAM Attention for Interpretable Cloud Segmentation in Astro-Meteorological Applications
by Qing Xu, Zichen Zhang, Guanfang Wang and Yunjie Chen
Appl. Sci. 2025, 15(15), 8555; https://doi.org/10.3390/app15158555 - 1 Aug 2025
Viewed by 149
Abstract
Accurate cloud segmentation is critical for astronomical observations and solar forecasting. However, traditional threshold- and texture-based methods suffer from limited accuracy (65–80%) under complex conditions such as thin cirrus or twilight transitions. Although U-Net-based deep-learning segmentation methods effectively capture low-level and high-level features and have achieved significant gains in accuracy, current methods still lack interpretability and multi-scale feature integration, and usually produce fuzzy boundaries or fragmented predictions. In this paper, we propose multi-scale CAM, an explainable AI (XAI) framework that integrates class activation mapping (CAM) with hierarchical feature fusion to quantify pixel-level attention across hierarchical features, thereby enhancing the model’s discriminative capability. To achieve precise segmentation, we integrate CAM into an improved U-Net architecture, incorporating multi-scale CAM attention for adaptive feature fusion and dilated residual modules for large-scale context extraction. Experimental results on the SWINSEG dataset demonstrate that our method outperforms existing state-of-the-art methods, improving recall by 3.06%, F1 score by 1.49%, and MIoU by 2.21% over the best baseline. The proposed framework balances accuracy, interpretability, and computational efficiency, offering a trustworthy solution for cloud detection systems in operational settings.
(This article belongs to the Special Issue Explainable Artificial Intelligence Technology and Its Applications)
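
For readers unfamiliar with class activation mapping, the CAM this entry builds on reduces to a weighted sum of the last convolutional feature maps. A generic sketch (vanilla CAM, not the paper's multi-scale variant):

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx):
    """Vanilla CAM: weight the final conv feature maps by the classifier
    weights of one class (assumes global average pooling + linear head).
    features:  (1, C, H, W); fc_weight: (num_classes, C)."""
    w = fc_weight[class_idx].view(1, -1, 1, 1)
    cam = F.relu((features * w).sum(dim=1, keepdim=True))  # keep positive evidence
    return cam / (cam.amax() + 1e-8)                       # normalized (1, 1, H, W)

feats = torch.randn(1, 512, 16, 16)
w_fc = torch.randn(2, 512)            # e.g. a cloud / clear-sky classifier head
print(class_activation_map(feats, w_fc, class_idx=0).shape)
```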

27 pages, 6715 KiB  
Article
Structural Component Identification and Damage Localization of Civil Infrastructure Using Semantic Segmentation
by Piotr Tauzowski, Mariusz Ostrowski, Dominik Bogucki, Piotr Jarosik and Bartłomiej Błachowski
Sensors 2025, 25(15), 4698; https://doi.org/10.3390/s25154698 - 30 Jul 2025
Viewed by 280
Abstract
Visual inspection of civil infrastructure for structural health assessment, as performed by structural engineers, is expensive and time-consuming. Automating this process is therefore highly attractive and has received significant attention in recent years. With the increasing capabilities of computers, deep neural networks have become a standard tool for structural health inspections. A key challenge, however, is the availability of reliable datasets. In this work, the U-Net and DeepLab v3+ convolutional neural networks are trained on the synthetic Tokaido dataset, which comprises images representative of unmanned aerial vehicle (UAV) imagery together with corresponding ground truth. The dataset includes semantic segmentation masks both for categorizing structural elements (slabs, beams, and columns) and for assessing structural damage (concrete spalling or exposed rebars). Data augmentation, including both image quality degradation (e.g., brightness modification, added noise) and image transformations (e.g., image flipping), is applied to the synthetic dataset. The selected architectures achieve excellent performance, reaching 97% accuracy and 87% Mean Intersection over Union (mIoU) on the validation data, and demonstrate promising results in the semantic segmentation of real-world structures captured in photographs, despite being trained solely on synthetic data. Based on the obtained segmentation results, DeepLab v3+ outperforms U-Net in structural component identification, although not in the damage identification task.
(This article belongs to the Special Issue AI-Assisted Condition Monitoring and Fault Diagnosis)
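
The augmentation recipe described (brightness modification, added noise, flips) can be reproduced with a joint image/mask pipeline; the sketch below uses albumentations, and all parameter values are assumptions rather than the authors' settings:

```python
import numpy as np
import albumentations as A

aug = A.Compose([
    A.HorizontalFlip(p=0.5),                                   # image transformation
    A.RandomBrightnessContrast(brightness_limit=0.3, p=0.5),   # quality degradation
    A.GaussNoise(p=0.3),                                       # added noise
])

image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
mask = np.random.randint(0, 4, (480, 640), dtype=np.uint8)     # bg/slab/beam/column
out = aug(image=image, mask=mask)      # the mask follows geometric transforms only
image_aug, mask_aug = out["image"], out["mask"]
```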

20 pages, 19642 KiB  
Article
SIRI-MOGA-UNet: A Synergistic Framework for Subsurface Latent Damage Detection in ‘Korla’ Pears via Structured-Illumination Reflectance Imaging and Multi-Order Gated Attention
by Baishao Zhan, Jiawei Liao, Hailiang Zhang, Wei Luo, Shizhao Wang, Qiangqiang Zeng and Yongxian Lai
Spectrosc. J. 2025, 3(3), 22; https://doi.org/10.3390/spectroscj3030022 - 29 Jul 2025
Viewed by 139
Abstract
Bruising in ‘Korla’ pears represents a prevalent phenomenon that leads to progressive fruit decay and substantial economic losses. Early-stage bruising is difficult to detect owing to the absence of visible external characteristics, and existing deep learning models have limitations in extracting weak features under complex optical interference. To address the postharvest latent damage detection challenges in ‘Korla’ pears, this study proposes a collaborative detection framework integrating structured-illumination reflectance imaging (SIRI) with multi-order gated attention mechanisms. First, an SIRI optical system was constructed, employing spatial frequency modulation at 150 cycles·m⁻¹ and a three-phase demodulation algorithm to extract subtle interference signal variations, thereby generating RT (Relative Transmission) images with significantly enhanced contrast in subsurface damage regions. To improve the detection accuracy of latent damage areas, the MOGA-UNet model was developed with three key innovations: (1) a lightweight VGG16 encoder is integrated into the feature extraction network to improve computational efficiency while retaining detail; (2) a multi-order gated aggregation module is added at the end of the encoder to fuse features at different scales through a specialized convolution scheme; and (3) a channel attention mechanism is embedded in the decoding stage to dynamically increase the weight of damage-related feature channels. Experimental results demonstrate that the proposed model achieves 94.38% mean Intersection over Union (mIoU) and a 97.02% Dice coefficient on RT images, outperforming the baseline UNet model by 2.80% and offering superior segmentation accuracy and boundary localization compared with mainstream models. This approach provides an efficient and reliable technical solution for intelligent postharvest agricultural product sorting.
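
The three-phase demodulation step has a standard closed form: from three images captured under sinusoidal illumination shifted by 120°, the modulated (AC) and planar (DC) components are recovered pixel-wise. A sketch under that standard formulation (the paper's exact RT definition may differ):

```python
import numpy as np

def demodulate_three_phase(i1, i2, i3):
    """i1, i2, i3: float images captured at phase offsets 0, 2*pi/3, 4*pi/3."""
    ac = (np.sqrt(2.0) / 3.0) * np.sqrt(
        (i1 - i2) ** 2 + (i2 - i3) ** 2 + (i3 - i1) ** 2)   # modulated component
    dc = (i1 + i2 + i3) / 3.0                                # planar component
    rt = ac / (dc + 1e-8)           # one common relative-transmission normalization
    return ac, dc, rt
```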

23 pages, 7839 KiB  
Article
Automated Identification and Analysis of Cracks and Damage in Historical Buildings Using Advanced YOLO-Based Machine Vision Technology
by Kui Gao, Li Chen, Zhiyong Li and Zhifeng Wu
Buildings 2025, 15(15), 2675; https://doi.org/10.3390/buildings15152675 - 29 Jul 2025
Viewed by 180
Abstract
Structural cracks significantly threaten the safety and longevity of historical buildings, which are essential parts of cultural heritage. Conventional inspection techniques, which depend heavily on manual visual evaluations, tend to be inefficient and subjective. This research introduces an automated framework for crack and damage detection using advanced YOLO (You Only Look Once) models, aiming to improve both the accuracy and efficiency of monitoring heritage structures. A dataset comprising 2500 high-resolution images was gathered from historical buildings and categorized into four levels of damage: no damage, minor, moderate, and severe. Following preprocessing and data augmentation, a total of 5000 labeled images were used to train and evaluate four YOLO variants: YOLOv5, YOLOv8, YOLOv10, and YOLOv11. Model performance was measured using precision, recall, mAP@50, and mAP@50–95, as well as losses for bounding box regression, classification, and distribution. Experimental findings reveal that YOLOv10 surpasses the other models in multi-target detection and in identifying minor damage, achieving higher localization accuracy and faster inference. YOLOv8 and YOLOv11 demonstrate consistent performance and strong adaptability, whereas YOLOv5 converges rapidly but shows weaker validation results. Further testing confirms YOLOv10’s effectiveness across different structural components, including walls, beams, and ceilings. This study highlights the practicality of deep learning-based crack detection for preserving built heritage. Future advancements could include combining semantic segmentation networks (e.g., U-Net) with attention mechanisms to further refine detection accuracy in complex scenarios.
(This article belongs to the Special Issue Structural Safety Evaluation and Health Monitoring)
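
Training and comparing YOLO variants of the kind benchmarked here is typically done through the Ultralytics API; a hedged sketch, where the dataset file and hyperparameters are placeholders rather than the authors' configuration:

```python
from ultralytics import YOLO

model = YOLO("yolov10n.pt")                             # pretrained checkpoint
model.train(data="damage.yaml", epochs=100, imgsz=640)  # 4 damage classes assumed
metrics = model.val()                                   # precision, recall, mAP...
print(metrics.box.map50, metrics.box.map)               # mAP@50 and mAP@50-95
```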

31 pages, 103100 KiB  
Article
Semantic Segmentation of Small Target Diseases on Tobacco Leaves
by Yanze Zou, Zhenping Qiang, Shuang Zhang and Hong Lin
Agronomy 2025, 15(8), 1825; https://doi.org/10.3390/agronomy15081825 - 28 Jul 2025
Viewed by 244
Abstract
The application of image recognition technology plays a vital role in agricultural disease identification. Existing approaches primarily rely on image classification, object detection, or semantic segmentation. A major challenge for current semantic segmentation methods, however, lies in accurately identifying small target objects. The common tobacco leaf diseases studied here, such as frog-eye disease, climate spots, and wildfire disease, are characterized by small lesion areas, with an average target size of only 32 pixels, which makes precise segmentation difficult for existing techniques. To address this issue, we propose two attention mechanisms, cross-feature-map attention and dual-branch attention, which are incorporated into the semantic segmentation network to improve performance on small lesions. Moreover, given the lack of publicly available datasets for tobacco leaf disease segmentation, we constructed a training dataset via image splicing. Extensive experiments were conducted on baseline segmentation models, including UNet, DeepLab, and HRNet. Experimental results demonstrate that the proposed method improves the mean Intersection over Union (mIoU) by 4.75% on the constructed dataset, with only a 15.07% increase in computational cost. These results validate the effectiveness of our attention-based strategy in the specific context of tobacco leaf disease segmentation.
(This article belongs to the Section Pest and Disease Management)
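
The mIoU figure reported above (and throughout these results) comes from a per-class confusion matrix; a generic NumPy sketch:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union from integer label maps."""
    cm = np.bincount(num_classes * target.ravel() + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)                            # true positives per class
    union = cm.sum(0) + cm.sum(1) - inter          # pred + target - overlap
    return (inter / np.maximum(union, 1)).mean()

pred = np.random.randint(0, 3, (64, 64))
target = np.random.randint(0, 3, (64, 64))
print(mean_iou(pred, target, num_classes=3))
```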

21 pages, 5527 KiB  
Article
SGNet: A Structure-Guided Network with Dual-Domain Boundary Enhancement and Semantic Fusion for Skin Lesion Segmentation
by Haijiao Yun, Qingyu Du, Ziqing Han, Mingjing Li, Le Yang, Xinyang Liu, Chao Wang and Weitian Ma
Sensors 2025, 25(15), 4652; https://doi.org/10.3390/s25154652 - 27 Jul 2025
Viewed by 300
Abstract
Segmentation of skin lesions in dermoscopic images is critical for the accurate diagnosis of skin cancers, particularly malignant melanoma, yet it is hindered by irregular lesion shapes, blurred boundaries, low contrast, and artifacts such as hair interference. Conventional deep learning methods, typically based on UNet or Transformer architectures, often fail to fully exploit lesion features and incur high computational costs, compromising precise lesion delineation. To overcome these challenges, we propose SGNet, a structure-guided network integrating a hybrid CNN–Mamba framework for robust skin lesion segmentation. SGNet employs the Visual Mamba (VMamba) encoder to efficiently extract multi-scale features, followed by the Dual-Domain Boundary Enhancer (DDBE), which refines boundary representations and suppresses noise through spatial- and frequency-domain processing. The Semantic-Texture Fusion Unit (STFU) adaptively integrates low-level texture with high-level semantic features, while the Structure-Aware Guidance Module (SAGM) generates coarse segmentation maps to provide global structural guidance. The Guided Multi-Scale Refiner (GMSR) further optimizes boundary details through a multi-scale semantic attention mechanism. Comprehensive experiments on the ISIC2017, ISIC2018, and PH2 datasets demonstrate SGNet’s superior performance, with average improvements of 3.30% in mean Intersection over Union (mIoU) and 1.77% in Dice Similarity Coefficient (DSC) compared to state-of-the-art methods. Ablation studies confirm the effectiveness of each component, highlighting SGNet’s accuracy and robust generalization for computer-aided dermatological diagnosis.
(This article belongs to the Section Biomedical Sensors)
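
The frequency-domain half of a dual-domain boundary enhancer can be pictured as a high-pass re-weighting of the feature spectrum; the sketch below uses a fixed radial gain, and the cutoff and gain values are illustrative assumptions:

```python
import torch

def frequency_boundary_boost(x, cutoff=0.1, gain=1.5):
    """Amplify high-frequency components (where boundaries live) of a
    (B, C, H, W) feature map via a radial high-pass gain in Fourier space."""
    B, C, H, W = x.shape
    spec = torch.fft.rfft2(x, norm="ortho")
    fy = torch.fft.fftfreq(H, device=x.device).reshape(H, 1)
    fx = torch.fft.rfftfreq(W, device=x.device).reshape(1, W // 2 + 1)
    radius = torch.sqrt(fy ** 2 + fx ** 2)                   # normalized frequency
    weight = 1.0 + (gain - 1.0) * (radius > cutoff).float()  # boost high frequencies
    return torch.fft.irfft2(spec * weight, s=(H, W), norm="ortho")

print(frequency_boundary_boost(torch.randn(1, 16, 32, 32)).shape)
```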

27 pages, 13439 KiB  
Article
Swin-ReshoUnet: A Seismic Profile Signal Reconstruction Method Integrating Hierarchical Convolution, ORCA Attention, and Residual Channel Attention Mechanism
by Jie Rao, Mingju Chen, Xiaofei Song, Chen Xie, Xueyang Duan, Xiao Hu, Senyuan Li and Xingyue Zhang
Appl. Sci. 2025, 15(15), 8332; https://doi.org/10.3390/app15158332 - 26 Jul 2025
Viewed by 166
Abstract
This study proposes a Swin-ReshoUnet architecture with a three-level enhancement mechanism to address inefficient multi-scale feature extraction and gradient degradation in deep networks for high-precision seismic exploration. The encoder uses a hierarchical convolution module to build a multi-scale feature pyramid, enhancing cross-scale geological signal representation. The decoder replaces traditional self-attention with ORCA attention to enable global context modeling at lower computational cost. Skip connections integrate a residual channel attention module that mitigates gradient degradation via dual-pooling feature fusion and activation optimization, forming a full-link optimization from low-level feature enhancement to high-level semantic integration. Experiments on simulated and real datasets show that at decimation ratios of 0.1–0.5 the method significantly outperforms SwinUnet, TransUnet, and other baselines in reconstruction performance, and residual signals and F-K spectra verify high-fidelity reconstruction. Although reconstruction becomes harder at higher sparsity, the method remains the best performer by notable margins, demonstrating strong robustness. The proposed hierarchical feature enhancement and cross-scale attention strategies offer an efficient solution for seismic profile signal reconstruction and show potential for migration to complex visual tasks, advancing interdisciplinary innovation between geophysics and computer vision.
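
The residual channel attention with dual-pooling fusion described for the skip connections can be sketched CBAM-style: average- and max-pooled channel descriptors share an MLP, and the re-weighted features are added back residually. An illustrative reading, not the paper's exact module:

```python
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(              # shared across both pooled paths
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):
        avg = x.mean(dim=(2, 3), keepdim=True)             # (B, C, 1, 1) descriptor
        mx = x.amax(dim=(2, 3), keepdim=True)
        w = torch.sigmoid(self.mlp(avg) + self.mlp(mx))    # dual-pooling fusion
        return x + x * w                                   # residual connection
```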

25 pages, 27219 KiB  
Article
KCUNET: Multi-Focus Image Fusion via the Parallel Integration of KAN and Convolutional Layers
by Jing Fang, Ruxian Wang, Xinglin Ning, Ruiqing Wang, Shuyun Teng, Xuran Liu, Zhipeng Zhang, Wenfeng Lu, Shaohai Hu and Jingjing Wang
Entropy 2025, 27(8), 785; https://doi.org/10.3390/e27080785 - 24 Jul 2025
Viewed by 170
Abstract
Multi-focus image fusion (MFIF) is an image-processing method that aims to generate fully focused images by integrating source images from different focal planes. However, the defocus spread effect (DSE) often leads to blurred or jagged focus/defocus boundaries in fused images, degrading image quality. To address this issue, this paper proposes a novel model that embeds the Kolmogorov–Arnold network in parallel with convolutional layers within the U-Net architecture (KCUNet). The model keeps the spatial dimensions of the feature map constant to preserve high-resolution detail while progressively increasing the number of channels to capture multi-level features during encoding. In addition, KCUNet incorporates a content-guided attention mechanism to enhance edge information processing, which is crucial for DSE reduction and edge preservation. The model’s performance is optimized through a hybrid loss function that evaluates several aspects, including edge alignment, mask prediction, and image quality. Finally, comparative evaluations against 15 state-of-the-art methods demonstrate KCUNet’s superior performance in both qualitative and quantitative analyses.
(This article belongs to the Section Signal and Data Analysis)

18 pages, 3368 KiB  
Article
Segmentation-Assisted Fusion-Based Classification for Automated CXR Image Analysis
by Shilu Kang, Dongfang Li, Jiaxin Xu, Aokun Mei and Hua Huo
Sensors 2025, 25(15), 4580; https://doi.org/10.3390/s25154580 - 24 Jul 2025
Viewed by 299
Abstract
Accurate classification of chest X-ray (CXR) images is crucial for diagnosing lung diseases in medical imaging. Existing deep learning models for CXR image classification struggle to distinguish non-lung features. In this work, we propose a new segmentation-assisted fusion-based classification method with two stages: first, a lightweight segmentation model, the Partial Convolutional Segmentation Network (PCSNet), built on an encoder–decoder architecture, accurately obtains lung masks from CXR images; then, the masked CXR image is fused with the original image and classified by an improved lightweight ShuffleNetV2 model. The proposed method is trained and evaluated on segmentation datasets including the Montgomery County (MC) and Shenzhen Hospital (SH) datasets, and on classification datasets such as Chest X-Ray Images for Pneumonia (CXIP) and COVIDx. Evaluated against seven segmentation models (U-Net, Attention-Net, SegNet, FPNNet, DANet, DMNet, and SETR), five classification models (ResNet34, ResNet50, DenseNet121, Swin Transformer, and ShuffleNetV2), and other state-of-the-art methods, PCSNet achieved high segmentation performance on CXR images. Compared to the state-of-the-art Attention-Net model, PCSNet improved accuracy by 0.19% (98.94% vs. 98.75%) and boundary accuracy by 0.3% (97.86% vs. 97.56%) while requiring 62% fewer parameters. For pneumonia classification on the CXIP dataset, the proposed strategy outperforms the current best model by 0.14% in accuracy (98.55% vs. 98.41%). For COVID-19 classification on the COVIDx dataset, the model reached an accuracy of 97.50%, a 0.1% absolute improvement over CovXNet, and clinical metrics show larger gains: specificity increased from 94.7% to 99.5%. These results highlight the model’s effectiveness in medical image analysis, demonstrating clinically meaningful improvements over state-of-the-art approaches.
(This article belongs to the Special Issue Vision- and Image-Based Biomedical Diagnostics—2nd Edition)
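
The two-stage idea (segment the lungs, then classify a fusion of the masked and original images) reduces to a few tensor operations; a sketch in which seg_model stands in for any binary lung-mask network and the channel layout is an assumption:

```python
import torch

def segmentation_assisted_input(image, seg_model, threshold=0.5):
    """Build the classifier input from a CXR batch of shape (B, C, H, W)."""
    with torch.no_grad():
        mask = (torch.sigmoid(seg_model(image)) > threshold).float()  # (B, 1, H, W)
    masked = image * mask                       # suppress non-lung features
    return torch.cat([masked, image], dim=1)    # (B, 2C, H, W) fused input
```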

19 pages, 9361 KiB  
Article
A Multi-Domain Enhanced Network for Underwater Image Enhancement
by Tianmeng Sun, Yinghao Zhang, Jiamin Hu, Haiyuan Cui and Teng Yu
Information 2025, 16(8), 627; https://doi.org/10.3390/info16080627 - 23 Jul 2025
Viewed by 168
Abstract
Owing to the intricate variability of underwater environments, images suffer from degradation caused by light absorption, scattering, and color distortion. U-Net architectures limit global context utilization due to fixed-receptive-field convolutions, while traditional attention mechanisms incur quadratic complexity and fail to efficiently fuse spatial–frequency features. Unlike local enhancement-focused methods, the proposed HMENet integrates a transformer sub-network for long-range dependency modeling and dual-domain attention for bidirectional spatial–frequency fusion. This design enlarges the receptive field while maintaining linear complexity. On the UIEB and EUVP datasets, HMENet achieves PSNR/SSIM of 25.96/0.946 and 27.92/0.927, surpassing HCLR-Net by 0.97 dB and 1.88 dB in PSNR, respectively.

18 pages, 2028 KiB  
Article
Research on Single-Tree Segmentation Method for Forest 3D Reconstruction Point Cloud Based on Attention Mechanism
by Lishuo Huo, Zhao Chen, Lingnan Dai, Dianchang Wang and Xinrong Zhao
Forests 2025, 16(7), 1192; https://doi.org/10.3390/f16071192 - 19 Jul 2025
Viewed by 248
Abstract
The segmentation of individual trees holds considerable significance in the investigation and management of forest resources. Point clouds generated from smartphone-captured imagery with image-based 3D reconstruction techniques can serve as a more accessible and potentially cost-efficient data source than conventional LiDAR acquisition. In this study, we present a Sparse 3D U-Net framework for single-tree segmentation built on a multi-head attention mechanism. The mechanism projects the input data into multiple subspaces, referred to as “heads”, computes attention independently within each subspace, and aggregates the outputs into a comprehensive representation. Multi-head attention thus allows the model to capture diverse contextual information, enhancing performance across a wide range of applications. The framework enables efficient, intelligent, end-to-end instance segmentation of forest point cloud data by integrating multi-scale features and global contextual information, and an iterative mechanism at the attention layer lets the model learn more compact feature representations, significantly improving its convergence speed. Dongsheng Bajia Country Park and Jiufeng National Forest Park, situated in Haidian District, Beijing, China, were selected as test sites, and eight representative sample plots within these areas were systematically sampled. Sequential photographs of the forest stands were captured with an iPhone and processed to generate point cloud data for each sample plot, allowing a comprehensive assessment of the model’s single-tree segmentation capability. The generalization performance of the proposed model was further validated on the publicly available TreeLearn dataset. The model shows advantages in data processing efficiency, training robustness, and single-tree segmentation speed, achieving an F1 score of 91.58% on the customized dataset and 97.12% on the TreeLearn dataset.
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
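
The multi-head attention recipe spelled out above (project into per-head subspaces, attend independently, aggregate) is exactly the pattern PyTorch's built-in module implements; a minimal self-attention example over point features:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

points = torch.randn(2, 1024, 256)            # (batch, points, feature dim)
out, weights = attn(points, points, points)   # self-attention across the cloud
print(out.shape)                              # torch.Size([2, 1024, 256])
```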

23 pages, 5668 KiB  
Article
MEFA-Net: Multilevel Feature Extraction and Fusion Attention Network for Infrared Small-Target Detection
by Jingcui Ma, Nian Pan, Dengyu Yin, Di Wang and Jin Zhou
Remote Sens. 2025, 17(14), 2502; https://doi.org/10.3390/rs17142502 - 18 Jul 2025
Viewed by 300
Abstract
Infrared small-target detection encounters significant challenges due to a low image signal-to-noise ratio, limited target size, and complex background noise. To address the sparse feature loss for small targets during the down-sampling phase of the traditional U-Net and the semantic gap in the feature fusion process, a multilevel feature extraction and fusion attention network (MEFA-Net) is designed. Specifically, the dilated direction-sensitive convolution block (DDCB) is devised to collaboratively extract local detail features, contextual features, and Gaussian salient features via ordinary convolution, dilated convolution, and parallel strip convolutions. Furthermore, the encoder attention fusion module (EAF) generates spatial and channel attention weights using dual-path pooling to achieve adaptive fusion of deep and shallow features. Lastly, an efficient up-sampling block (EUB) is constructed, integrating a hybrid up-sampling strategy with multi-scale dilated convolution to refine the localization of small targets. Experimental results confirm that the proposed model surpasses most existing recent methods. Compared with the baseline, the intersection over union (IoU) and probability of detection (Pd) of MEFA-Net on the IRSTD-1k dataset increase by 2.25% and 3.05%, respectively, achieving better detection performance and a lower false alarm rate in complex scenarios.
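
The DDCB's collaborative extraction via ordinary, dilated, and strip convolutions can be pictured as summed parallel branches; kernel sizes and the fusion rule below are illustrative assumptions, not the paper's exact block:

```python
import torch
import torch.nn as nn

class ParallelConvBranches(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.local = nn.Conv2d(c, c, 3, padding=1)                # local detail
        self.context = nn.Conv2d(c, c, 3, padding=2, dilation=2)  # wider context
        self.strip_h = nn.Conv2d(c, c, (1, 7), padding=(0, 3))    # horizontal strip
        self.strip_v = nn.Conv2d(c, c, (7, 1), padding=(3, 0))    # vertical strip

    def forward(self, x):
        return self.local(x) + self.context(x) + self.strip_h(x) + self.strip_v(x)

print(ParallelConvBranches(16)(torch.randn(1, 16, 64, 64)).shape)
```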

21 pages, 2308 KiB  
Article
Forgery-Aware Guided Spatial–Frequency Feature Fusion for Face Image Forgery Detection
by Zhenxiang He, Zhihao Liu and Ziqi Zhao
Symmetry 2025, 17(7), 1148; https://doi.org/10.3390/sym17071148 - 18 Jul 2025
Viewed by 314
Abstract
The rapid development of deepfake technologies has led to the widespread proliferation of facial image forgeries, raising significant concerns over identity theft and the spread of misinformation. Although recent dual-domain detection approaches that integrate spatial and frequency features have achieved noticeable progress, they still suffer from limited sensitivity to local forgery regions and inadequate interaction between spatial and frequency information in practical applications. To address these challenges, we propose a novel forgery-aware guided spatial–frequency feature fusion network. A lightweight U-Net is employed to generate pixel-level saliency maps by leveraging structural symmetry and semantic consistency, without relying on ground-truth masks. These maps dynamically guide the fusion of spatial features (from an improved Swin Transformer) and frequency features (via Haar wavelet transforms). Cross-domain attention, channel recalibration, and spatial gating are introduced to enhance feature complementarity and regional discrimination. Extensive experiments on two benchmark face forgery datasets, FaceForensics++ and Celeb-DFv2, show that the proposed method consistently outperforms existing state-of-the-art techniques in detection accuracy and generalization capability. Future work includes improving robustness under compression, incorporating temporal cues, extending to multimodal scenarios, and evaluating model efficiency for real-world deployment.
(This article belongs to the Section Computer)
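
The Haar wavelet transform used for the frequency branch has a simple closed form; a one-level 2-D sketch that returns the low-pass band and three high-pass detail bands (assumes even spatial dimensions):

```python
import torch

def haar_dwt(x):
    """One-level 2-D Haar transform of a (B, C, H, W) tensor."""
    a = x[..., 0::2, 0::2]    # top-left samples of each 2x2 block
    b = x[..., 0::2, 1::2]    # top-right
    c = x[..., 1::2, 0::2]    # bottom-left
    d = x[..., 1::2, 1::2]    # bottom-right
    ll = (a + b + c + d) / 2  # low-pass approximation
    lh = (a - b + c - d) / 2  # horizontal-frequency detail
    hl = (a + b - c - d) / 2  # vertical-frequency detail
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh
```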

21 pages, 5917 KiB  
Article
VML-UNet: Fusing Vision Mamba and Lightweight Attention Mechanism for Skin Lesion Segmentation
by Tang Tang, Haihui Wang, Qiang Rao, Ke Zuo and Wen Gan
Electronics 2025, 14(14), 2866; https://doi.org/10.3390/electronics14142866 - 17 Jul 2025
Viewed by 503
Abstract
Deep learning has advanced medical image segmentation, yet existing methods struggle with complex anatomical structures. Mainstream models, including CNN, Transformer, and hybrid architectures, face challenges such as insufficient information representation and redundant complexity, which limit their clinical deployment. Developing efficient, lightweight networks is crucial for accurate lesion localization and optimized clinical workflows. We propose VML-UNet, a lightweight segmentation network whose core innovations are the CPMamba module and the multi-scale local supervision module (MLSM). The CPMamba module integrates the visual state space (VSS) block with a channel prior attention mechanism to model spatial relationships efficiently, with linear computational complexity, through dynamic channel-space weight allocation while preserving channel feature integrity. The MLSM enhances local feature perception and reduces the inference burden. Comparative experiments were conducted on three public datasets (ISIC2017, ISIC2018, and PH2), with ablation experiments performed on ISIC2017. VML-UNet has 0.53 M parameters, 2.18 MB memory usage, and a computational cost of 1.24 GFLOPs, and it outperforms the comparison networks on these datasets, validating its effectiveness. This study provides a valuable reference for developing lightweight, high-performance skin lesion segmentation networks.
(This article belongs to the Section Bioelectronics)
