Search Results (41)

Search Parameters:
Keywords = dilated pixel attention network

29 pages, 50937 KB  
Article
MAFT: A Lightweight Network for Martian Rock Segmentation Based on an Adaptive Frequency Transformer
by Chu Li, Yutong Jia, Gang Wan, Qifang Ma, Jia Liu, Yang Wang, Biao Wang, Jia Liu and Zhanji Wei
Remote Sens. 2026, 18(11), 1794; https://doi.org/10.3390/rs18111794 - 1 Jun 2026
Viewed by 322
Abstract
The segmentation of rocks on the Martian surface is crucial for navigation and obstacle avoidance by Mars rovers. However, frequent dust storms degrade rock surface textures, and the wide range of rock scales, from sub-meter to ten-meter, further complicates segmentation, especially under the strict computational constraints of rover hardware. This paper proposes a lightweight network named MAFT, specifically designed for Martian rock segmentation. The network builds upon the Adaptive Frequency Transformer (AFFormer) and constructs an improved backbone termed the Improved Adaptive Frequency Transformer (IAFFormer). By replacing the traditional self-attention mechanism with a frequency-domain approach, it captures global feature dependencies while reducing the computational complexity from quadratic to linear. The spatially isolated 1 × 1 convolutions in the pixel descriptor module are further replaced with Adaptive Kernel Convolution (AKConv), enabling the backbone to dynamically adjust its sampling positions to conform to the irregular and diverse morphologies of Martian rocks. An Enhanced Multidimensional Convolutional Attention (EMCA) module is introduced as the decoding structure. By integrating max-pooling in the squeeze stage and adaptive dilated convolutions in the excitation stage, EMCA strengthens the boundary perception and long-range dependency modeling of dust-covered rocks without increasing the parameter count. Additionally, we constructed a dataset of Martian rocks for the Zhurong rover (TWMARS-V2) and conducted experiments using a synthetic dataset (SynMars) and a real dataset (MarsData-V2). Experimental results demonstrate that MAFT achieves the highest segmentation accuracy among all compared methods, with only 2.97 M parameters and 15.49 G FLOPs. On the TWMARS-V2 dataset, Pixel Accuracy (PA) reaches 98.17%, and IoU reaches 88.90%.
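The abstract reports Pixel Accuracy (PA) and IoU. As a minimal sketch, assuming binary masks (1 = rock, 0 = background) and that the IoU figure refers to the rock class (the abstract does not say whether it is averaged over classes), the two metrics can be computed as follows:

```python
from typing import List

Mask = List[List[int]]  # 1 = rock, 0 = background (assumed binary labelling)


def pixel_accuracy(pred: Mask, gt: Mask) -> float:
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = correct = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            total += 1
            correct += int(p == g)
    return correct / total


def rock_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of the rock class: |pred AND gt| / |pred OR gt|."""
    inter = union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            inter += int(p == 1 and g == 1)
            union += int(p == 1 or g == 1)
    return inter / union if union else 1.0  # no rock in prediction or label: perfect match


pred = [[1, 1, 0], [0, 1, 0]]
gt = [[1, 0, 0], [0, 1, 1]]
print(round(pixel_accuracy(pred, gt), 3), rock_iou(pred, gt))  # prints 0.667 0.5
```

PA also counts correctly classified background pixels, so on images where background dominates it tends to sit well above the rock-class IoU.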

23 pages, 8187 KB  
Article
DCFENet: A Dual-Branch Collaborative Feature Enhancement Network for Farmland Boundary Detection
by Mengyao Lan, Bangjun Huang and Peng Wu
Agronomy 2026, 16(10), 964; https://doi.org/10.3390/agronomy16100964 - 12 May 2026
Viewed by 329
Abstract
Farmland resources are fundamental to human survival and play a vital role in ensuring global food security. However, farmland boundary detection remains a significant technical challenge due to the low proportion of boundary pixels, multi-scale variations, and weak boundary continuity. To address these issues, this study proposes DCFENet, a dual-branch collaborative feature enhancement network. Specifically, a multi-scale feature fusion attention module TA-ASPP (Task-Aware Atrous Spatial Pyramid Pooling) is designed, which effectively enhances the network’s perception of farmland boundary features by integrating multi-scale dilated convolutions with skeleton-aware attention. In addition, a dual-branch decoding structure is proposed to enhance boundary localization and global topology modeling through boundary-aware gating and cross-branch feature fusion, thereby improving the boundary continuity. Furthermore, a collaborative constraint mechanism is proposed for dual-branch decoding, which supervises the two decoders using boundary loss and skeleton loss, thereby enhancing structural consistency and topology preservation. Experimental results demonstrate that DCFENet achieves precision, recall, and boundary IoU of 74.5%, 68.1%, and 77.4%, respectively, representing an improvement of 26.8%, 36.3%, and 13.2% compared with ResNet18_UNet. It also outperforms mainstream methods such as UNet, EdgeNAT, and EDTER. In terms of computational efficiency, DCFENet contains 26.43 M parameters and 37.43 G FLOPs, with a memory usage of 1.03 GB and an inference speed of 97.97 FPS, achieving a good balance between accuracy and efficiency. The results demonstrate the efficiency and accuracy of DCFENet in extracting farmland boundaries from high-resolution remote sensing images, providing technical support for farmland management and the advancement of precision and digital agriculture.
(This article belongs to the Special Issue Remote Sensing and GIS in Sustainable and Precision Agriculture)
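TA-ASPP builds on multi-scale dilated convolutions. The abstract does not list the dilation rates, so the sketch below assumes the rates (1, 6, 12, 18) used in DeepLab-style ASPP purely for illustration. It shows that a k × k kernel with dilation d spans k + (k − 1)(d − 1) pixels per side while keeping the same number of weights:

```python
from typing import Dict, Sequence


def effective_kernel_size(k: int, dilation: int) -> int:
    """Side length covered by a k-tap kernel whose taps are `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)


def aspp_spans(k: int, rates: Sequence[int]) -> Dict[int, int]:
    """Map each (hypothetical) ASPP dilation rate to the span of its branch kernel."""
    return {d: effective_kernel_size(k, d) for d in rates}


spans = aspp_spans(3, (1, 6, 12, 18))
weights_per_branch = 3 * 3  # unchanged by dilation: only the tap spacing grows
print(spans, weights_per_branch)  # prints {1: 3, 6: 13, 12: 25, 18: 37} 9
```

Running the branches in parallel therefore gives several context sizes at the cost of one 3 × 3 kernel each.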

27 pages, 3514 KB  
Article
ECAB-SegFormer: A Boundary-Aware and Efficient Channel Attention Network for Ulva prolifera Semantic Segmentation in Remote Sensing Imagery
by Yue Liang, Danyang Cao, Zice Ji, Hao Yang, Maohua Guo, Xiaoya Liu, Xutong Guo, Jiahao Wu, Yulong Song and Shanzhe Zhang
Sensors 2026, 26(7), 2166; https://doi.org/10.3390/s26072166 - 31 Mar 2026
Viewed by 535
Abstract
To achieve high-precision Ulva prolifera semantic segmentation from remote sensing imagery and address issues such as boundary fragmentation, contour dilation, and missed segmentation of scattered patches under complex marine backgrounds, this paper proposes an improved SegFormer-based network termed ECAB-SegFormer. The proposed method enhances near-infrared feature representation and boundary perception by embedding an Efficient Channel Attention (ECA) module into shallow features and introducing a boundary supervision branch. Experimental results on the HYU dataset demonstrate that the proposed method achieves consistent improvements over classical baseline models and also outperforms several strong, representative modern segmentation models. Compared with advanced methods such as DeepLabV3+, Swin-Unet, and Gated-SCNN, the proposed model achieves maximum improvements of 2.77% in mIoU, 5.80% in BFScore, and 4.26 pixels in Hausdorff Distance (HD), while also obtaining superior Precision and F1 Scores. These results demonstrate significant advantages in both regional segmentation accuracy and boundary localization quality, validating the effectiveness, robustness, and practical potential of the proposed method for Ulva prolifera semantic segmentation in remote sensing applications.
(This article belongs to the Section Sensing and Imaging)
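ECAB-SegFormer embeds an Efficient Channel Attention (ECA) module into shallow features. A minimal sketch of the ECA computation as described in the ECA-Net paper follows: global average pooling per channel, a 1-D convolution across neighbouring channels whose kernel size is chosen adaptively from the channel count, and a sigmoid gate that rescales each channel. The convolution weights here are hypothetical stand-ins for learned parameters, and each channel is flattened to a single list for brevity:

```python
import math
from typing import List


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size k = |(log2(C) + b) / gamma|, bumped up to an odd number."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(features: List[List[float]], weights: List[float]) -> List[List[float]]:
    """features: one flattened H*W list per channel; weights: odd-length 1-D kernel."""
    c, k = len(features), len(weights)
    pad = k // 2
    # Squeeze: one descriptor per channel via global average pooling.
    gap = [sum(ch) / len(ch) for ch in features]
    gates = []
    for i in range(c):
        # Local cross-channel interaction: 1-D conv over neighbouring descriptors (zero padding).
        s = sum(weights[j] * gap[i + j - pad]
                for j in range(k) if 0 <= i + j - pad < c)
        gates.append(1.0 / (1.0 + math.exp(-s)))
    # Excitation: rescale every value in a channel by that channel's gate.
    return [[v * g for v in ch] for ch, g in zip(features, gates)]


print(eca_kernel_size(64), eca_kernel_size(256))  # prints 3 5
```

Unlike a squeeze-and-excitation block, ECA has no dimensionality-reducing fully connected layers; its only parameters are the k kernel weights.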

23 pages, 3875 KB  
Article
Attention-Weighted Hierarchical Decoding for Few-Shot Semantic Segmentation: A Case Study on Batik Cultural Heritage Patterns
by Yuzhou Ma, Haolong Qian and Wei Li
Electronics 2026, 15(6), 1242; https://doi.org/10.3390/electronics15061242 - 17 Mar 2026
Viewed by 409
Abstract
Few-shot semantic segmentation aims to learn accurate pixel-level classification from limited annotated samples, a critical capability for real-world applications where data acquisition is expensive or impractical. However, existing methods often struggle with fine-grained texture details and complex boundaries under data-scarce conditions, particularly when applied to domains with intricate visual patterns (such as batik patterns). To address this few-shot learning challenge, we construct a few-shot batik pattern dataset and propose a novel network architecture centered on attention weighting and hierarchical decoding. Our method leverages a pre-trained ResNet101 backbone for transfer learning to establish a strong feature foundation. It incorporates a dual-attention module that combines spatial and channel attention to dynamically highlight semantically rich regions and intricate texture boundaries specific to batik. For multi-scale context aggregation, a lightweight module utilizing parallel dilated convolutions is introduced to efficiently capture features from varying receptive fields. Finally, a hierarchical decoder progressively integrates these enhanced, multi-scale features with high-resolution shallow features to reconstruct precise segmentation maps. Comprehensive evaluations on a dedicated batik dataset show that our model achieves state-of-the-art performance, with a mean Intersection over Union (mIoU) of 79.22% and a pixel accuracy (PA) of 92.47%. It notably improves over the strong DeepLabV3+ baseline by 3.3% in mIoU and 0.95% in PA, demonstrating its effectiveness for the task of batik pattern segmentation under data-scarce conditions.
(This article belongs to the Section Artificial Intelligence)
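The multi-scale context module described above uses parallel dilated convolutions. The abstract gives no rates or fusion rule, so this 1-D toy sketch assumes rates (1, 2) and fuses the branches by summation (real modules often concatenate instead). Feeding in an impulse shows how each branch samples a different neighbourhood with the same three weights:

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], w: Sequence[float], d: int) -> List[float]:
    """'Same'-length 1-D dilated cross-correlation with zero padding (odd-length w)."""
    k = len(w)
    half = (k // 2) * d
    out = []
    for i in range(len(x)):
        s = 0.0
        for j in range(k):
            idx = i - half + j * d  # taps are d samples apart
            if 0 <= idx < len(x):
                s += w[j] * x[idx]
        out.append(s)
    return out


def parallel_dilated(x: Sequence[float], w: Sequence[float], rates: Sequence[int]) -> List[float]:
    """Run one branch per dilation rate and fuse the branches by element-wise sum."""
    branches = [dilated_conv1d(x, w, d) for d in rates]
    return [sum(vals) for vals in zip(*branches)]


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
print(parallel_dilated(impulse, [1, 1, 1], (1, 2)))
# prints [0.0, 0.0, 1.0, 1.0, 2.0, 1.0, 1.0, 0.0, 0.0]
```

The rate-1 branch responds at positions 3 to 5 and the rate-2 branch at positions 2, 4 and 6, so the fused output covers a wider neighbourhood than either branch alone.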

25 pages, 24156 KB  
Article
MLCANet: Multi-Level Composite Attention-Guided Network for Non-Homogeneous Image Dehazing in Adverse Weather Conditions
by Yongsheng Qiu
Sensors 2026, 26(5), 1505; https://doi.org/10.3390/s26051505 - 27 Feb 2026
Viewed by 424
Abstract
Image dehazing is a challenging ill-posed problem in low-level computer vision tasks, requiring the restoration of high-quality, haze-free images from complex and foggy conditions. Deep learning-based dehazing methods struggle to effectively remove non-homogeneous fog distributions due to the uneven and dense nature of fog patches, making it difficult to clear real-world fog variations. A key challenge for non-homogeneous image dehazing algorithms is efficiently capturing the spatial distribution of haze in areas with varying fog densities while restoring fine image details. To address these challenges, we propose MLCANet, a multi-level composite attention-guided network for non-homogeneous image dehazing. MLCANet mitigates the impact of uneven haze areas through two main components: the Multi-level Composite Attention Generation Network (MCAGN) and the Dehazed Image Reconstruction Network (DIRN). The MCAGN integrates channel attention (CA), spatial attention (SA), and multi-scale pixel attention (MSPA) to capture haze features at different spatial scales. The DIRN, based on a decoder-encoder architecture, combines multi-scale dilated convolutions and deformable convolutions to restore fine image details more flexibly and efficiently. Extensive qualitative and quantitative experiments, along with ablation studies, demonstrate the effectiveness and feasibility of this method for non-homogeneous image dehazing.
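The DIRN combines dilated and deformable convolutions. In a deformable convolution, each kernel tap reads the input at its regular position plus a learned fractional offset, interpolating between neighbouring samples. The 1-D sketch below illustrates only that sampling rule: the offsets are set by hand here, whereas a real network predicts them with an extra convolution, and the 2-D bilinear case is reduced to linear interpolation:

```python
import math
from typing import List, Sequence


def sample_linear(x: Sequence[float], pos: float) -> float:
    """Linearly interpolate x at a fractional position, treating out-of-range samples as 0."""
    i0 = math.floor(pos)
    frac = pos - i0

    def at(i: int) -> float:
        return x[i] if 0 <= i < len(x) else 0.0

    return (1 - frac) * at(i0) + frac * at(i0 + 1)


def deformable_conv1d(x: Sequence[float], w: Sequence[float],
                      offsets: Sequence[Sequence[float]]) -> List[float]:
    """offsets[i][j] shifts tap j at output position i (hand-set here, learned in practice)."""
    half = len(w) // 2
    out = []
    for i in range(len(x)):
        out.append(sum(w[j] * sample_linear(x, i + (j - half) + offsets[i][j])
                       for j in range(len(w))))
    return out


x = [1.0, 2.0, 3.0, 4.0]
w = [0.0, 1.0, 0.0]
zero = [[0.0] * 3 for _ in x]
half_step = [[0.0, 0.5, 0.0] for _ in x]
print(deformable_conv1d(x, w, zero))       # prints [1.0, 2.0, 3.0, 4.0]
print(deformable_conv1d(x, w, half_step))  # prints [1.5, 2.5, 3.5, 2.0]
```

With zero offsets the operation reduces to an ordinary convolution; non-zero offsets let the kernel follow structures that do not line up with the pixel grid.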

27 pages, 18987 KB  
Article
YOLO11s-UAV: An Advanced Algorithm for Small Object Detection in UAV Aerial Imagery
by Qi Mi, Jianshu Chao, Anqi Chen, Kaiyuan Zhang and Jiahua Lai
J. Imaging 2026, 12(2), 69; https://doi.org/10.3390/jimaging12020069 - 6 Feb 2026
Cited by 6 | Viewed by 2955
Abstract
Unmanned aerial vehicles (UAVs) are now widely used in various applications, including agriculture, urban traffic management, and search and rescue operations. However, several challenges arise, including the small size of objects occupying only a sparse number of pixels in images, complex backgrounds in aerial footage, and limited computational resources onboard. To address these issues, this paper proposes an improved UAV-based small object detection algorithm, YOLO11s-UAV, specifically designed for aerial imagery. Firstly, we introduce a novel FPN, called Content-Aware Reassembly and Interaction Feature Pyramid Network (CARIFPN), which significantly enhances small object feature detection while reducing redundant network structures. Secondly, we apply a new downsampling convolution for small object feature extraction, called Space-to-Depth for Dilation-wise Residual Convolution (S2DResConv), in the model’s backbone. This module effectively eliminates information loss caused by strided convolution or pooling operations and facilitates the capture of multi-scale context. Finally, we integrate a simple, parameter-free attention module (SimAM) with C3k2 to form Flexible SimAM (FlexSimAM), which is applied throughout the entire model. This improved module not only reduces the model’s complexity but also enables efficient enhancement of small object features in complex scenarios. Experimental results demonstrate that on the VisDrone-DET2019 dataset, our model improves mAP@0.5 by 7.8% on the validation set (reaching 46.0%) and by 5.9% on the test set (increasing to 37.3%) compared to the baseline YOLO11s, while reducing model parameters by 55.3%. Similarly, it achieves a 7.2% improvement on the TinyPerson dataset and a 3.0% increase on UAVDT-DET. Deployment on the NVIDIA Jetson Orin NX SUPER platform shows that our model achieves 33 FPS, which is 21.4% lower than YOLO11s, confirming its feasibility for real-time onboard UAV applications.
(This article belongs to the Section Computer Vision and Pattern Recognition)
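FlexSimAM builds on SimAM, a parameter-free attention module. The abstract does not describe how SimAM is combined with C3k2, so this sketch covers only the SimAM weighting itself, following the closed-form energy in the original SimAM paper: each activation is scaled by the sigmoid of (x − μ)² / (4(σ² + λ)) + 0.5, where μ and σ² are its channel's mean and variance, so activations that stand out from their channel receive larger weights. The value λ = 1e-4 follows common SimAM reference code and should be treated as an assumption:

```python
import math
from typing import List


def simam_channel(x: List[float], e_lambda: float = 1e-4) -> List[float]:
    """Apply SimAM weighting to one channel flattened to H*W values (needs at least 2)."""
    n = len(x) - 1  # SimAM normalises the variance by (H*W - 1)
    mu = sum(x) / len(x)
    sq = [(v - mu) ** 2 for v in x]
    var = sum(sq) / n
    out = []
    for v, d in zip(x, sq):
        inv_energy = d / (4 * (var + e_lambda)) + 0.5  # larger for more distinctive neurons
        out.append(v / (1.0 + math.exp(-inv_energy)))  # v * sigmoid(inv_energy)
    return out


channel = [1.0, 1.0, 1.0, 5.0]
weights = [o / v for o, v in zip(simam_channel(channel), channel)]
print([round(wt, 3) for wt in weights])
```

The outlier activation (5.0) gets a larger multiplier than its three identical neighbours, and no learned parameter is involved.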

31 pages, 64042 KB  
Article
Adaptive Dual-Frequency Denoising Network-Based Strip Non-Uniformity Correction Method for Uncooled Long Wave Infrared Camera
by Ajun Shao, Hongying He, Guanghui Gao, Mengxu Zhang, Pengqiang Ge, Xiaofang Kong, Weixian Qian, Guohua Gu, Qian Chen and Minjie Wan
Appl. Sci. 2026, 16(2), 1052; https://doi.org/10.3390/app16021052 - 20 Jan 2026
Cited by 1 | Viewed by 811
Abstract
The imaging quality of uncooled long wave infrared (IR) cameras is always limited by the stripe non-uniformity mainly caused by fixed pattern noise (FPN). In this paper, we propose an adaptive dual-frequency denoising network-based stripe non-uniformity correction (NUC) method, namely ADFDNet, to balance FPN removal and image detail preservation. At the core of ADFDNet is a dual-frequency feature deconstruction module, which decomposes the IR image into high-frequency and low-frequency features and processes them in a targeted way through detail enhancement branches and sparse denoising branches. The former improves detail preservation through multi-scale convolution and a pixel attention mechanism, while the latter combines a sparse attention mechanism with dilated convolution to suppress high-frequency FPN. Furthermore, an adaptive dual-frequency fusion module fuses the features with dynamic weights, which better integrates detail information. In our study, a 420-pair image dataset covering different noise levels is constructed for better model training and evaluation. Experiments verify that the presented ADFDNet method significantly improves image clarity in both real and simulated noise scenes, and achieves a better balance between FPN suppression and detail preservation than other existing methods.
(This article belongs to the Section Optics and Lasers)
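ADFDNet's core module splits the IR image into low- and high-frequency features. The abstract does not say how the split is computed; the sketch below assumes a simple box-filter low-pass along one image row and takes the residual as the high-frequency part. A vertical stripe (one column carrying a fixed offset, a typical form of stripe FPN) then shows up mostly in the high-frequency component, and the two parts sum back to the input exactly, which is what allows separate branches to process them before fusion:

```python
from typing import List, Sequence, Tuple


def box_lowpass(x: Sequence[float], radius: int = 1) -> List[float]:
    """Mean over a window of 2*radius+1 samples, shrinking the window at the borders."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def split_frequencies(x: Sequence[float], radius: int = 1) -> Tuple[List[float], List[float]]:
    """Return (low, high) with low + high == x element-wise."""
    low = box_lowpass(x, radius)
    high = [a - b for a, b in zip(x, low)]
    return low, high


row = [10.0, 10.0, 14.0, 10.0, 10.0]  # column 2 carries a +4 stripe offset
low, high = split_frequencies(row)
print([round(h, 2) for h in high])  # prints [0.0, -1.33, 2.67, -1.33, 0.0]
```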

33 pages, 4303 KB  
Article
Artificial Intelligence-Based Plant Disease Classification in Low-Light Environments
by Hafiz Ali Hamza Gondal, Seong In Jeong, Won Ho Jang, Jun Seo Kim, Rehan Akram, Muhammad Irfan, Muhammad Hamza Tariq and Kang Ryoung Park
Fractal Fract. 2025, 9(11), 691; https://doi.org/10.3390/fractalfract9110691 - 27 Oct 2025
Cited by 4 | Viewed by 3202
Abstract
The accurate classification of plant diseases is vital for global food security, as diseases can cause major yield losses and threaten sustainable and precision agriculture. The classification of plant diseases in low-light noisy environments is crucial because it allows crops to be monitored continuously, even at night. Important visual cues of disease symptoms can be lost due to the degraded quality of images captured under low illumination, resulting in poor performance of conventional plant disease classifiers. Researchers have proposed various techniques for classifying plant diseases in daylight, but no studies have been conducted for low-light noisy environments. Therefore, we propose a novel model for classifying plant diseases from low-light noisy images, called the dilated pixel attention network (DPA-Net). DPA-Net uses a pixel attention mechanism and multi-layer dilated convolution with a large receptive field, which extracts essential features while highlighting the most relevant information under this challenging condition, allowing more accurate classification. Additionally, we performed fractal dimension estimation on diseased and healthy leaves to analyze their structural irregularities and complexities. For the performance evaluation, experiments were conducted on two public datasets: the PlantVillage and Potato Leaf Disease datasets. In both datasets, the image resolution is 256 × 256 pixels in joint photographic experts group (JPG) format. For the first dataset, DPA-Net achieved an average accuracy of 92.11% and a harmonic mean of precision and recall (F1-score) of 89.11%. For the second dataset, it achieved an average accuracy of 88.92% and an F1-score of 88.60%. These results show that the proposed method outperforms state-of-the-art methods. On the first dataset, our method achieved an improvement of 2.27% in average accuracy and 2.86% in F1-score compared to the baseline. Similarly, on the second dataset, it attained an improvement of 6.32% in average accuracy and 6.37% in F1-score over the baseline. In addition, we confirm that our method is effective on a real low-illumination dataset that we constructed by capturing images at 0 lux with a smartphone at night. This approach provides farmers with an affordable, practical tool for early disease detection, which can support crop protection worldwide.
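DPA-Net combines pixel attention with multi-layer dilated convolution. The abstract does not give the layer configuration, so this is a minimal sketch under stated assumptions: the gate is computed per pixel from a single 1 × 1 projection across channels with hypothetical weights (pixel attention as popularised by FFA-Net uses two convolutions with a ReLU in between), and it is applied to the raw input rather than to dilated-convolution features. Where channel attention assigns one weight per channel, pixel attention assigns one weight per spatial position:

```python
import math
from typing import List

Feature = List[List[List[float]]]  # indexed as [channel][row][col]


def pixel_attention(x: Feature, w: List[float], b: float = 0.0) -> Feature:
    """Scale every channel at (row, col) by sigmoid(b + sum_c w[c] * x[c][row][col])."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for r in range(height):
        for col in range(width):
            score = b + sum(w[c] * x[c][r][col] for c in range(channels))  # 1x1 projection
            gate = 1.0 / (1.0 + math.exp(-score))
            for c in range(channels):
                out[c][r][col] = x[c][r][col] * gate
    return out


# Two channels, one row, two pixels: a dark (zero) pixel and a responsive one.
feat = [[[0.0, 2.0]], [[0.0, 2.0]]]
att = pixel_attention(feat, w=[1.0, 1.0])
print(round(att[0][0][1], 3))  # prints 1.964
```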

24 pages, 3480 KB  
Article
MFPI-Net: A Multi-Scale Feature Perception and Interaction Network for Semantic Segmentation of Urban Remote Sensing Images
by Xiaofei Song, Mingju Chen, Jie Rao, Yangming Luo, Zhihao Lin, Xingyue Zhang, Senyuan Li and Xiao Hu
Sensors 2025, 25(15), 4660; https://doi.org/10.3390/s25154660 - 27 Jul 2025
Cited by 4 | Viewed by 1467
Abstract
To improve semantic segmentation performance for complex urban remote sensing images with multi-scale object distribution, class similarity, and small object omission, this paper proposes MFPI-Net, an encoder–decoder-based semantic segmentation network. It includes four core modules: a Swin Transformer backbone encoder, a diverse dilation rates attention shuffle decoder (DDRASD), a multi-scale convolutional feature enhancement module (MCFEM), and a cross-path residual fusion module (CPRFM). The Swin Transformer efficiently extracts multi-level global semantic features through its hierarchical structure and window attention mechanism. The DDRASD’s diverse dilation rates attention (DDRA) block combines convolutions with diverse dilation rates and channel-coordinate attention to enhance multi-scale contextual awareness, while Shuffle Block improves resolution via pixel rearrangement and avoids checkerboard artifacts. The MCFEM enhances local feature modeling through parallel multi-kernel convolutions, forming a complementary relationship with the Swin Transformer’s global perception capability. The CPRFM employs multi-branch convolutions and a residual multiplication–addition fusion mechanism to enhance interactions among multi-source features, thereby improving the recognition of small objects and similar categories. Experiments on the ISPRS Vaihingen and Potsdam datasets show that MFPI-Net outperforms mainstream methods, achieving 82.57% and 88.49% mIoU, validating its superior segmentation performance in urban remote sensing.
(This article belongs to the Section Sensing and Imaging)
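The Shuffle Block raises resolution by pixel rearrangement. Its internal layout is not given in the abstract; the sketch below shows the standard depth-to-space (pixel shuffle) mapping with upscale factor r, using the same channel ordering as PyTorch's `nn.PixelShuffle`. Each output pixel is copied from exactly one input value, so no output position receives uneven overlapping contributions, which is the usual explanation for why this avoids the checkerboard artifacts of transposed convolution:

```python
from typing import List

Feature = List[List[List[float]]]  # indexed as [channel][row][col]


def pixel_shuffle(x: Feature, r: int) -> Feature:
    """Depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r), PyTorch nn.PixelShuffle ordering."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]  # sub-pixel (i, j) of output channel c
                for y in range(h):
                    for xx in range(w):
                        out[c][y * r + i][xx * r + j] = src[y][xx]
    return out


print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # prints [[[1, 2], [3, 4]]]
```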

15 pages, 1770 KB  
Article
PSHNet: Hybrid Supervision and Feature Enhancement for Accurate Infrared Small-Target Detection
by Weicong Chen, Chenghong Zhang and Yuan Liu
Appl. Sci. 2025, 15(14), 7629; https://doi.org/10.3390/app15147629 - 8 Jul 2025
Cited by 1 | Viewed by 1351
Abstract
Detecting small targets in infrared imagery remains highly challenging due to sub-pixel target sizes, low signal-to-noise ratios, and complex background clutter. This paper proposes PSHNet, a hybrid deep-learning framework that combines dense spatial heatmap supervision with geometry-aware regression for accurate infrared small-target detection. The network generates position–scale heatmaps to guide coarse localization, which are further refined through sub-pixel offset and size regression. A Complete IoU (CIoU) loss is introduced as a geometric regularization term to improve alignment between predicted and ground-truth bounding boxes. To better preserve fine spatial details essential for identifying small thermal signatures, an Enhanced Low-level Feature Module (ELFM) is incorporated using multi-scale dilated convolutions and channel attention. Experiments on the NUDT-SIRST and IRSTD-1k datasets demonstrate that PSHNet outperforms existing methods in IoU, detection probability, and false alarm rate, achieving higher IoU and robust performance under low-SNR conditions.
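PSHNet adds a Complete IoU (CIoU) loss as a geometric regularizer. A minimal sketch of the standard CIoU loss, L = 1 − IoU + ρ²/c² + αv, follows, where ρ is the distance between box centres, c the diagonal of the smallest enclosing box, v = (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))² an aspect-ratio term, and α = v / ((1 − IoU) + v). The (cx, cy, w, h) box format is an assumption, and how PSHNet weights this term against its heatmap losses is not stated in the abstract:

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), assumed format


def ciou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """Complete-IoU loss between a predicted and a ground-truth box."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # Corner coordinates of both boxes.
    p1x, p1y, p2x, p2y = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g1x, g1y, g2x, g2y = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    # Plain IoU.
    iw = max(0.0, min(p2x, g2x) - max(p1x, g1x))
    ih = max(0.0, min(p2y, g2y) - max(p1y, g1y))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Squared centre distance over squared diagonal of the smallest enclosing box.
    cw = max(p2x, g2x) - min(p1x, g1x)
    ch = max(p2y, g2y) - min(p1y, g1y)
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


print(round(ciou_loss((1.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)), 4))  # prints 0.7436
```

In the example the boxes share shape, so v = 0 and the loss is 1 − 1/3 plus the centre penalty 1/13.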

20 pages, 4198 KB  
Article
HiDRA-DCDNet: Dynamic Hierarchical Attention and Multi-Scale Context Fusion for Real-Time Remote Sensing Small-Target Detection
by Jiale Wang, Zhe Bai, Ximing Zhang, Yuehong Qiu, Fan Bu and Yuancheng Shao
Remote Sens. 2025, 17(13), 2195; https://doi.org/10.3390/rs17132195 - 25 Jun 2025
Cited by 2 | Viewed by 1307
Abstract
Small-target detection in remote sensing presents three fundamental challenges: limited pixel representation of targets, multi-angle imaging-induced appearance variance, and complex background interference. This paper introduces a dual-component neural architecture comprising Hierarchical Dynamic Refinement Attention (HiDRA) and Densely Connected Dilated Block (DCDBlock) to address these challenges systematically. The HiDRA mechanism implements a dual-phase feature enhancement process: channel competition through bottleneck compression for discriminative feature selection, followed by spatial-semantic reweighting for foreground–background decoupling. The DCDBlock architecture synergizes multi-scale dilated convolutions with cross-layer dense connections, establishing persistent feature propagation pathways that preserve critical spatial details across network depths. Extensive experiments on AI-TOD, VisDrone, MAR20, and DOTA-v1.0 datasets demonstrate our method’s consistent superiority, achieving average absolute gains of +1.16% (mAP50), +0.93% (mAP95), and +1.83% (F1-score) over prior state-of-the-art approaches across all benchmarks. With a computational cost of 8.1 GFLOPs and an inference time of 2.6 ms per image, our framework demonstrates practical efficacy for real-time remote sensing applications, achieving a superior accuracy–efficiency trade-off compared to existing approaches.
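The DCDBlock pairs multi-scale dilated convolutions with dense cross-layer connections. The abstract gives no widths or rates, so the following bookkeeping sketch uses hypothetical values (64 input channels, a growth of 32 channels per layer, dilation rates 1, 2 and 4, stride-1 3 × 3 convolutions) and assumes DenseNet-style concatenation. It tracks how many channels each layer receives and the receptive field of the deepest path, 1 + Σ(k − 1)·d:

```python
from typing import List, Sequence, Tuple


def dense_dilated_block(in_channels: int, growth: int, rates: Sequence[int],
                        k: int = 3) -> Tuple[List[Tuple[int, int]], int]:
    """Return ([(channels entering layer, receptive field after layer)], output channels)."""
    layers = []
    channels, rf = in_channels, 1
    for d in rates:
        rf += (k - 1) * d   # each stride-1 dilated layer widens the field by (k-1)*d
        layers.append((channels, rf))
        channels += growth  # concatenation keeps all earlier maps and appends the new one
    return layers, channels


layers, out_channels = dense_dilated_block(64, 32, (1, 2, 4))
print(layers, out_channels)  # prints [(64, 3), (96, 7), (128, 15)] 160
```

Because the concatenated output still contains the block input and every intermediate map, fine detail from small receptive fields travels alongside the 15-pixel context, which matches the abstract's stated goal of preserving spatial detail across depths.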

19 pages, 5574 KB  
Article
Low-Damage Grasp Method for Plug Seedlings Based on Machine Vision and Deep Learning
by Fengwei Yuan, Gengzhen Ren, Zhang Xiao, Erjie Sun, Guoning Ma, Shuaiyin Chen, Zhenlong Li, Zhenhong Zou and Xiangjiang Wang
Agronomy 2025, 15(6), 1376; https://doi.org/10.3390/agronomy15061376 - 4 Jun 2025
Cited by 3 | Viewed by 1156
Abstract
In the process of plug seedling transplantation, the cracking and dropping of seedling substrate or the damage of seedling stems and leaves will affect the survival rate of seedlings after transplantation. Currently, most research focuses on the reduction of substrate loss, while ignoring damage to the plug seedlings themselves. To address the high damage rate during transplantation of plug seedlings, we propose an adaptive grasp method based on machine vision and deep learning, and design a lightweight real-time grasp detection network (LRGN). The lightweight network MobileNet is used as the feature extraction network to reduce the number of network parameters. Meanwhile, a dilated refinement module (DRM) is designed to effectively increase the receptive field and capture more contextual information. Further, a pixel-attention-guided fusion module (PAG) and a depth-guided fusion module (DGFM) are proposed to effectively fuse deep and shallow features to extract multi-scale information. Lastly, a mixed attention module (MAM) is proposed to enhance the network’s attention to important grasp features. The experimental results show that the proposed network reaches grasp detection accuracies of 98.96% and 98.30% on the image-wise and object-wise splits of the Cornell dataset, respectively. The grasp detection accuracy on the plug seedling grasp dataset is up to 98.83%, and the detection speed is up to 113 images/s, with only 12.67 M parameters. Compared with the other networks evaluated, the proposed network not only has a smaller computational cost and parameter count, but also significantly improves the accuracy and speed of grasp detection. The generated grasps effectively avoid the seedlings, reduce the damage rate in the grasp phase, and realize low-damage grasping, which provides a theoretical basis and method for low-damage mechanical transplantation equipment.
(This article belongs to the Section Precision and Digital Agriculture)
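LRGN uses MobileNet as its feature extractor to reduce parameters. MobileNet's building block is the depthwise separable convolution: a k × k depthwise convolution per input channel followed by a 1 × 1 pointwise convolution. The sketch below compares weight counts (biases ignored) for hypothetical layer sizes; the abstract does not say which MobileNet version or channel widths are used:

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a regular k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise projection."""
    return k * k * c_in + c_in * c_out


std = standard_conv_params(3, 128, 256)
sep = depthwise_separable_params(3, 128, 256)
print(std, sep, round(sep / std, 3))  # prints 294912 33920 0.115
```

The ratio equals 1/c_out + 1/k², so for 3 × 3 kernels the separable layer needs roughly one ninth of the weights.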

22 pages, 1077 KB  
Article
SECrackSeg: A High-Accuracy Crack Segmentation Network Based on Proposed UNet with SAM2 S-Adapter and Edge-Aware Attention
by Xiyin Chen, Yonghua Shi and Junjie Pang
Sensors 2025, 25(9), 2642; https://doi.org/10.3390/s25092642 - 22 Apr 2025
Cited by 10 | Viewed by 3313
Abstract
Crack segmentation is essential for structural health monitoring and infrastructure maintenance, playing a crucial role in early damage detection and safety risk reduction. Traditional methods, including digital image processing techniques, have limitations in complex environments. Deep learning-based methods have shown potential, but still face challenges, such as poor generalization with limited samples, insufficient extraction of fine-grained features, feature loss during upsampling, and inadequate capture of crack edge details. This study proposes SECrackSeg, a high-accuracy crack segmentation network that integrates an improved UNet architecture, Segment Anything Model 2 (SAM2), MI-Upsampling, and an Edge-Aware Attention mechanism. The key innovations include: (1) using a SAM2 S-Adapter with a frozen backbone to enhance generalization in low-data scenarios; (2) employing a Multi-Scale Dilated Convolution (MSDC) module to promote multi-scale feature fusion; (3) introducing MI-Upsampling to reduce feature loss during upsampling; and (4) implementing an Edge-Aware Attention mechanism to improve crack edge segmentation precision. Additionally, a custom loss function incorporating weighted binary cross-entropy and weighted IoU loss is utilized to emphasize challenging pixels. This function also applies Multi-Granularity Supervision by optimizing segmentation outputs at three different resolution levels, ensuring better feature consistency and improved model robustness across varying image scales. Experimental results show that SECrackSeg achieves higher precision, recall, F1-score, and mIoU scores on the CFD, Crack500, and DeepCrack datasets compared to state-of-the-art models, demonstrating its excellent performance in fine-grained feature recognition, edge segmentation, and robustness.
(This article belongs to the Collection Sensors and Sensing Technology for Industry 4.0)

26 pages, 5977 KB  
Article
Hyperspectral Image Classification Using a Multi-Scale CNN Architecture with Asymmetric Convolutions from Small to Large Kernels
by Xun Liu, Alex Hay-Man Ng, Fangyuan Lei, Jinchang Ren, Xuejiao Liao and Linlin Ge
Remote Sens. 2025, 17(8), 1461; https://doi.org/10.3390/rs17081461 - 19 Apr 2025
Cited by 15 | Viewed by 2754
Abstract
Deep learning-based hyperspectral image (HSI) classification methods, such as Transformers and Mamba models, have attracted considerable attention. However, several challenges persist: (1) Transformers suffer from quadratic computational complexity due to the self-attention mechanism; and (2) both the local and global feature extraction capabilities of large kernel convolutional neural networks (LKCNNs) need to be enhanced. To address these limitations, we introduce a multi-scale large kernel asymmetric CNN (MSLKACNN) for HSI classification, with kernel sizes as large as 1×17 and 17×1. MSLKACNN comprises a spectral feature extraction module (SFEM) and a multi-scale large kernel asymmetric convolution (MSLKAC). Specifically, the SFEM is first used to suppress noise, reduce the number of spectral bands, and capture spectral features. Then, MSLKAC, with a large receptive field, combines two parallel multi-scale asymmetric convolution components to extract both local and global spatial features: (C1) a multi-scale large kernel asymmetric depthwise convolution (MLKADC) captures short-range, middle-range, and long-range spatial features; and (C2) a multi-scale asymmetric dilated depthwise convolution (MADDC) aggregates spatial features between pixels across diverse distances. Extensive experiments on four widely used HSI datasets show that MSLKACNN significantly outperforms ten state-of-the-art methods, with overall accuracy (OA) gains ranging from 4.93% to 17.80% on Indian Pines, 2.09% to 15.86% on Botswana, 0.67% to 13.33% on Houston 2013, and 2.20% to 24.33% on LongKou. These results validate the effectiveness of the proposed MSLKACNN.
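The abstract names depthwise asymmetric convolutions but not their exact wiring. The NumPy sketch below assumes each branch is a 1 × k depthwise convolution followed by a k × 1 one (the sequential strip-convolution pattern used, for example, in SegNeXt) and that an MLKADC-like block sums branches of several kernel sizes. The sizes 5 and 11, the four channels, and reusing the same branches at dilation 2 as a MADDC-like variant are hypothetical. It also shows why the design is cheap: per channel, a full 17 × 17 kernel has 289 weights while a 1 × 17 plus 17 × 1 pair has 34, yet the pair still covers a 17 × 17 region, and dilation d stretches a k-tap kernel to d(k - 1) + 1 pixels (33 for k = 17, d = 2).

```python
import numpy as np


def dw_conv_along_axis(x: np.ndarray, kernels: np.ndarray, axis: int, dilation: int = 1) -> np.ndarray:
    """Depthwise 1-D correlation along one spatial axis with 'same' zero padding.

    x: (C, H, W); kernels: (C, k) with odd k; axis: 1 for height, 2 for width.
    """
    k = kernels.shape[1]
    pad = dilation * (k - 1) // 2
    pad_width = [(0, 0), (0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    xp = np.pad(x, pad_width)
    n = x.shape[axis]
    out = np.zeros(x.shape)
    for t in range(k):
        window = [slice(None)] * 3
        window[axis] = slice(t * dilation, t * dilation + n)
        # Each channel uses its own tap (depthwise): (C, 1, 1) broadcasts over H, W.
        out += kernels[:, t, None, None] * xp[tuple(window)]
    return out


def asymmetric_dw_conv(x, k_row, k_col, dilation=1):
    """A 1 x k depthwise conv along the width, then a k x 1 one along the height
    (assumed sequential order; the paper's exact arrangement is not given)."""
    y = dw_conv_along_axis(x, k_row, axis=2, dilation=dilation)
    return dw_conv_along_axis(y, k_col, axis=1, dilation=dilation)


def multi_scale_asymmetric(x, branches, dilation=1):
    """Sum of asymmetric branches; `branches` is a list of (k_row, k_col) kernel pairs."""
    return sum(asymmetric_dw_conv(x, k_row, k_col, dilation) for k_row, k_col in branches)


def params_per_channel(k: int):
    """Weights per channel: full k x k kernel vs. a 1 x k plus k x 1 pair."""
    return k * k, 2 * k


def dilated_extent(k: int, dilation: int) -> int:
    """Pixels spanned along one axis by a k-tap kernel at the given dilation."""
    return dilation * (k - 1) + 1


# Usage with hypothetical kernel sizes (5, 11, 17) and 4 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32, 32))
branches = [(rng.standard_normal((4, k)), rng.standard_normal((4, k))) for k in (5, 11, 17)]
y_dense = multi_scale_asymmetric(x, branches)                # MLKADC-like (dilation 1)
y_dilated = multi_scale_asymmetric(x, branches, dilation=2)  # MADDC-like (dilation 2)
```

The trade-off is that a sequential 1 × k and k × 1 pair is mathematically equivalent to a single k × k kernel equal to the outer product of the two, so it cannot express arbitrary k × k filters; summing several such branches partly compensates.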

22 pages, 32472 KB  
Article
Multi-Scale Feature Fusion GANomaly with Dilated Neighborhood Attention for Oil and Gas Pipeline Sound Anomaly Detection
by Yizhuo Zhang, Zhengfeng Sun, Shen Shi and Huiling Yu
Information 2025, 16(4), 279; https://doi.org/10.3390/info16040279 - 30 Mar 2025
Cited by 1 | Viewed by 2066
Abstract
Anomaly detection in oil and gas pipelines based on acoustic signals currently faces challenges, including limited anomalous samples, varying audio data distributions across different operating conditions, and interference from background noise. These challenges reduce the accuracy and efficiency of pipeline anomaly detection. The primary challenge in reconstruction-based pipeline audio anomaly detection is to prevent the loss of critical information and ensure high-quality reconstruction of feature maps. This paper proposes a pipeline anomaly detection method termed Multi-scale Feature Fusion GANomaly with Dilated Neighborhood Attention (MFDNA-GANomaly). First, to mitigate information loss as the network deepens, a Multi-scale Feature Fusion module merges the encoded and decoded feature maps at different dimensions, enhancing both low-level detail and high-level semantic information. Second, a Dilated Neighborhood Attention module assigns varying weights to neighborhoods at various dilation rates, extracting channel interactions and spatial relationships between the current pixel and its neighborhoods. Finally, to enhance the quality of the reconstructed spectrum, a loss function based on the Structural Similarity Index Measure (SSIM) is designed, considering both pixel-level and structural differences to preserve the structural characteristics of the reconstructed spectrum. MFDNA-GANomaly achieved 92.06% AUC, 93.96% accuracy, and an F1-score of 0.955 on the test set, demonstrating that the proposed method can effectively enhance pipeline anomaly detection performance. Additionally, MFDNA-GANomaly exhibited competitive performance on the ToyTrain and Bearing subsets of the development dataset in DCASE Challenge 2023 Task 2, confirming the generalization capability of the model.
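The abstract does not describe the attention module's internals. As a rough illustration of the general mechanism it names, the NumPy sketch below implements generic single-head dilated neighbourhood attention: each pixel attends only to a kernel × kernel grid of neighbours spaced `dilation` pixels apart, so the cost per pixel grows with kernel² rather than with the full map size. The channel-interaction part, the way several dilation rates are fused, and the function name are not covered here and are assumptions; out-of-bounds neighbours are simply masked, whereas the original neighbourhood attention formulation shifts the window at borders so every query keeps a full neighbourhood.

```python
import numpy as np


def dilated_neighborhood_attention(q, k, v, kernel=3, dilation=1):
    """Single-head dilated neighbourhood attention (hypothetical sketch).

    q, k, v: (H, W, D). Each pixel attends to a kernel x kernel grid of neighbours
    spaced `dilation` pixels apart and centred on itself. Neighbours outside the
    map are masked out (a simplification of border handling).
    Returns the attended values (H, W, D) and the weights (H, W, kernel * kernel).
    """
    h, w, d = q.shape
    r = kernel // 2
    offsets = [(dy * dilation, dx * dilation)
               for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    ys, xs = np.mgrid[0:h, 0:w]
    scores = np.full((h, w, len(offsets)), -np.inf)
    gathered_v = np.zeros((h, w, len(offsets), d))
    for n, (oy, ox) in enumerate(offsets):
        ny, nx = ys + oy, xs + ox
        valid = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
        ny_c, nx_c = np.clip(ny, 0, h - 1), np.clip(nx, 0, w - 1)
        dot = (q * k[ny_c, nx_c]).sum(axis=-1) / np.sqrt(d)  # scaled dot product
        scores[..., n] = np.where(valid, dot, -np.inf)
        gathered_v[:, :, n] = v[ny_c, nx_c]
    # Softmax over the neighbourhood; the centre is always valid, so no row is all -inf.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = (weights[..., None] * gathered_v).sum(axis=2)
    return out, weights


# Usage: the same 3 x 3 neighbourhood at two hypothetical dilation rates.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 8, 4)) for _ in range(3))
out_d1, w_d1 = dilated_neighborhood_attention(q, k, v, kernel=3, dilation=1)
out_d2, w_d2 = dilated_neighborhood_attention(q, k, v, kernel=3, dilation=2)
```

With dilation 1 the neighbourhood is the 3 × 3 block around each pixel; with dilation 2 the same nine weights cover a 5 × 5 area, which is how larger dilation rates widen the context without adding parameters.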
(This article belongs to the Section Artificial Intelligence)
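The MFDNA-GANomaly abstract states that its reconstruction loss combines pixel-level and structural differences via SSIM but does not give the formula. A common way to do this, assumed here, is a weighted sum of a pixel error and 1 - SSIM, where SSIM compares local means, variances, and covariance: SSIM(x, y) = ((2μxμy + C1)(2σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (0.01·L)² and C2 = (0.03·L)² for data range L. The 7 × 7 uniform window, MSE as the pixel term, the weight alpha = 0.5, and the function names are hypothetical; the window statistics use population (not sample-corrected) variances, so values can differ slightly from library implementations.

```python
import numpy as np


def window_mean(x: np.ndarray, win: int = 7) -> np.ndarray:
    """Mean over every win x win window fully inside x (no padding)."""
    s = np.zeros((x.shape[0] + 1, x.shape[1] + 1))
    s[1:, 1:] = x.cumsum(0).cumsum(1)
    return (s[win:, win:] - s[:-win, win:] - s[win:, :-win] + s[:-win, :-win]) / win ** 2


def ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0, win: int = 7) -> float:
    """Mean SSIM with a uniform window and the usual constants K1 = 0.01, K2 = 0.03."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = window_mean(x, win), window_mean(y, win)
    var_x = window_mean(x * x, win) - mx * mx
    var_y = window_mean(y * y, win) - my * my
    cov_xy = window_mean(x * y, win) - mx * my
    num = (2 * mx * my + c1) * (2 * cov_xy + c2)
    den = (mx * mx + my * my + c1) * (var_x + var_y + c2)
    return float((num / den).mean())


def reconstruction_loss(recon: np.ndarray, target: np.ndarray, alpha: float = 0.5) -> float:
    """Pixel-level MSE plus a structural (1 - SSIM) term; alpha is a hypothetical weight."""
    mse = float(((recon - target) ** 2).mean())
    return alpha * mse + (1.0 - alpha) * (1.0 - ssim(recon, target))


# Usage on a synthetic "spectrogram" in [0, 1] and a noisy reconstruction of it.
rng = np.random.default_rng(0)
spec = rng.random((32, 32))
noisy = np.clip(spec + 0.2 * rng.standard_normal(spec.shape), 0.0, 1.0)
loss_clean = reconstruction_loss(spec, spec)
loss_noisy = reconstruction_loss(noisy, spec)
```

The MSE term alone treats every pixel independently, while the SSIM term penalizes reconstructions whose local contrast and correlation structure drift from the target even when average intensities match.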