Search Results (250)

Search Parameters:
Keywords = lightweight attention U-Net

21 pages, 1572 KB  
Article
Efficient Glare Suppression Network for Nighttime Images with Lightweight Parallel Attention and Ghost Convolution
by Ruoyu Yang, Huaixin Chen, Sijie Luo and Zhixi Wang
Sensors 2026, 26(12), 3773; https://doi.org/10.3390/s26123773 (registering DOI) - 12 Jun 2026
Abstract
Aiming at the problems of glare interference, local overexposure and detail loss caused by artificial light sources such as vehicle lamps and street lamps in nighttime road scenes, as well as the challenges of existing glare suppression models with large parameters, high computational complexity and difficulty in deploying on edge devices, this paper proposes a lightweight glare suppression network (LGSNet) based on ghost depthwise separable convolution and Lightweight Parallel Attention. Based on the U-Net architecture, the network introduces ghost depthwise separable convolution blocks (GhostDSC) in the encoder and decoder, which generates ghost features through cheap linear transformations by exploiting feature map redundancy, significantly reducing model parameters and computational costs while maintaining feature representation ability. Meanwhile, a Lightweight Parallel Attention (LPA) module is designed in the decoder stage, which integrates channel attention and pixel attention in parallel, enhancing the network’s attention to glare regions and edge details with extremely low parameter increment to improve detail recovery accuracy. In addition, a joint loss function consisting of background loss, glare loss and reconstruction loss is constructed to collaboratively optimize glare suppression and detail preservation. Experimental results on the public Flare7K++ dataset and the self-built nighttime road glare dataset NRGD show that the proposed method has only 7.45 M parameters, much lower than standard U-Net and Uformer. It achieves competitive results on full-reference metrics such as PSNR, SSIM, LPIPS and no-reference metrics such as NIQE, BRISQUE, PIQE, and can effectively suppress various types of glare interference and restore obscured scene details. 
It achieves a superior trade-off between model complexity and enhancement performance, significantly reducing the parameter count and computational overhead compared to heavy baselines, thereby offering a highly efficient solution for resource-aware glare suppression tasks.
(This article belongs to the Section Intelligent Sensors)
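
The parameter saving that the abstract attributes to ghost features can be illustrated with a rough weight count. The sketch below assumes a GhostNet-style module (a primary convolution producing a fraction of the output channels, plus cheap depthwise filters producing the rest); it is not LGSNet's GhostDSC block, whose exact layout the abstract does not give. The function names, channel sizes, and `ratio` are hypothetical, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def ghost_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weight count of a GhostNet-style module (bias ignored).

    Assumes c_out is divisible by ratio. This is an illustrative layout,
    not the GhostDSC block described in the abstract.
    """
    intrinsic = c_out // ratio
    # Primary convolution produces only the intrinsic feature maps.
    primary = conv_params(c_in, intrinsic, k)
    # Each intrinsic map yields (ratio - 1) ghost maps through one
    # depthwise cheap_k x cheap_k filter per ghost map.
    cheap = cheap_k * cheap_k * intrinsic * (ratio - 1)
    return primary + cheap


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernels.
standard = conv_params(64, 128, 3)
ghost = ghost_params(64, 128, 3)
print(standard, ghost, round(standard / ghost, 2))
```

With `ratio=2` the weight count drops to roughly half, which is the kind of reduction the ghost approach targets. The 7.45 M figure in the abstract depends on the whole network and cannot be reproduced from this sketch.
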
30 pages, 68434 KB  
Article
A Lightweight and High-Precision Citrus Detection Model for Unstructured Orchard Environments
by Junjie Yang, Haorong Wu, Dong Lv, Wei Ma, Hao Teng and Dehua Chen
Horticulturae 2026, 12(6), 718; https://doi.org/10.3390/horticulturae12060718 (registering DOI) - 11 Jun 2026
Viewed by 143
Abstract
This study was conducted to address the challenges of detecting citrus fruits in complex orchard environments characterized by overlap, occlusion, and variable lighting conditions. To tackle these issues, an improved detection model named YOLO-MGP was developed based on the YOLOv8n architecture. Four key enhancements were introduced to the core components of the detection framework. First, the primary backbone network was replaced with MobileNetV3, which substantially reduced computational requirements while preserving the capability for multi-scale feature extraction. Second, a C2f-GLU module was incorporated into the neck network. By leveraging Gated Linear Units, this module strengthens the feature selection and fusion processes. Third, an additional P2 detection layer was added to improve the detection of small targets. This modification was complemented by the integration of a Coordinate Attention mechanism, which refines the distribution of feature weights across spatial and channel dimensions. Finally, the CIoU loss was replaced by PIoU to enhance the accuracy of bounding box regression, particularly for occluded and overlapping targets. Experimental results demonstrate that the YOLO-MGP model achieved a precision of 94.2%, a recall of 89.7%, and a mAP50 of 95.7% on our custom citrus dataset. By substantially reducing the number of parameters while maintaining competitive detection performance, the proposed method offers a practical and lightweight solution for fruit detection in automated harvesting systems. Full article
(This article belongs to the Special Issue Emerging Technologies in Smart Agriculture)
22 pages, 19870 KB  
Article
SIG-Net: A Spectral-Index-Guided Network for Red Tide Extraction from Sentinel-2 Multispectral Imagery
by Lei Zhou, Hongping Li, Xiaojun Chen and Zhanqiang Li
Remote Sens. 2026, 18(12), 1928; https://doi.org/10.3390/rs18121928 - 11 Jun 2026
Viewed by 158
Abstract
Red tide events pose substantial threats to marine ecosystems, aquaculture, and coastal public health. Timely and accurate delineation of red tide extent from satellite imagery is therefore essential for operational monitoring and early warning. However, existing deep learning-based semantic segmentation methods generally treat multispectral bands as homogeneous inputs and do not fully exploit the domain knowledge embodied in spectral indices commonly used in traditional remote sensing analysis. To address this limitation, this study proposes a spectral-index-guided network (SIG-Net) that explicitly incorporates spectral-index priors into deep feature extraction through a dual-branch architecture. SIG-Net comprises three components: a spectral encoder based on a Mix Vision Transformer (MiT-B2) that learns spatial-spectral representations from the original Sentinel-2 bands; a lightweight CNN-based index encoder that extracts discriminative features from four spectral indices, namely the red-green index (RGI), blue-green index (BGI), normalized difference vegetation index (NDVI), and the normalized difference Noctiluca index (NDNI) proposed in this study; and a spectral-index-guided fusion (SIGF) module that adaptively integrates multi-scale features from the two branches using spatial-reduction cross-attention and a gated fusion mechanism. Experiments on a Sentinel-2 red tide dataset show that SIG-Net outperforms single-branch baselines, including U-Net, DeepLabV3+, and SegFormer, as well as naive multi-source fusion strategies. Ablation studies further confirm the contributions of the SIGF module, the gating mechanism, and the proposed NDNI to performance improvements. The proposed method provides an effective framework for integrating domain knowledge with deep learning for red tide remote sensing monitoring. Full article
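
Of the four spectral indices named in the abstract, NDVI has a standard definition, (NIR - Red) / (NIR + Red), which for Sentinel-2 is usually computed from bands B8 and B4. A minimal sketch follows. The function name, the sample reflectances, and the `eps` guard are assumptions for illustration. The abstract does not define RGI, BGI, or NDNI, so they are not shown.

```python
def ndvi(nir, red, eps=1e-12):
    """Normalized difference vegetation index for one pixel.

    nir and red are surface reflectances (for Sentinel-2, typically
    bands B8 and B4). eps avoids division by zero on empty pixels and
    is an implementation choice, not part of the index definition.
    """
    return (nir - red) / (nir + red + eps)


# Hypothetical reflectance values, not taken from the paper's dataset.
print(ndvi(0.5, 0.1))
print(ndvi(0.0, 0.0))
```

An index map of this kind would be one channel of the input to an index encoder such as the one the abstract describes.
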
17 pages, 1905 KB  
Article
DAS-Net: A Lightweight Dynamic Convolution Network with Attention Gates and Deep Supervision for UAV Semantic Segmentation
by Young Jae Kim and Sang-Chul Kim
Appl. Sci. 2026, 16(11), 5688; https://doi.org/10.3390/app16115688 - 5 Jun 2026
Viewed by 117
Abstract
Anti-UAV surveillance demands real-time pixel-level UAV localization on resource-constrained gimbal-mounted platforms, yet existing lightweight segmentation models suffer from low recall that propagates to downstream tracking failure. Building on our prior dataset of 605,045 paired visible-light and infrared images, we extend the lightweight ThinDyUNet baseline with three architectural improvements: (1) symmetric dynamic convolution applied to both the encoder and decoder, (2) attention gates filtering skip connections, and (3) deep supervision with auxiliary loss heads. The resulting DAS-Net is evaluated under a three-seed Monte Carlo cross-validation protocol on the full 174,008-image test set. DAS-Net achieves a mean test mIoU of 0.6780 and Dice coefficient of 0.7509 across three independent seeds, outperforming the ThinDyUNet baseline by +6.65 percentage points (pp) in mIoU with statistical significance (one-sided paired t-test, p = 0.045, Cohen’s d = 1.74; full variance and significance analysis in the experimental section). DAS-Net matches the best-performing external baseline (UNet) and exceeds the others (MobileUNet, PAN, PSPNet) while using approximately 14.7× fewer parameters than ResNet-34-based variants. DAS-Net runs at 8.83 ms per image on an NVIDIA A6000 GPU (113 FPS) and 38.44 ms on an NVIDIA Jetson AGX Orin (26 FPS at FP16), demonstrating real-time deployability across server-class and embedded edge platforms. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
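
The frame rates quoted in the DAS-Net abstract follow from the per-image latencies, assuming one image is processed at a time with no batching or pipelining (an assumption; the abstract does not state the measurement setup). A quick arithmetic check, with a hypothetical helper name:

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second when each frame takes latency_ms milliseconds."""
    return 1000.0 / latency_ms


a6000_fps = fps_from_latency_ms(8.83)   # NVIDIA A6000 latency from the abstract
orin_fps = fps_from_latency_ms(38.44)   # Jetson AGX Orin (FP16) latency from the abstract
print(round(a6000_fps), round(orin_fps))
```

Both values round to the figures reported in the abstract (113 FPS and 26 FPS).
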
33 pages, 18240 KB  
Article
Disagreement-Guided Knowledge Distillation for Efficient Kidney Segmentation in Abdominal CT
by Coşku Öksüz
Appl. Sci. 2026, 16(11), 5573; https://doi.org/10.3390/app16115573 - 2 Jun 2026
Viewed by 310
Abstract
Accurate kidney segmentation in abdominal computed tomography (CT) images is important for quantitative analysis and computer-assisted clinical workflows, yet deploying deep learning models in practice remains challenging due to computational constraints. To address this, a disagreement-guided knowledge distillation (KD) framework is proposed for efficient kidney segmentation. The method introduces a spatial disagreement mask (Ω) to identify regions where teacher and student predictions diverge, enabling selective knowledge transfer focused on informative and error-prone areas while avoiding redundant supervision in well-predicted regions. In addition, a pixel-level annotated kidney segmentation dataset is created by extending a previously published abdominal CT dataset with kidney masks. The experimental results on both the in-house dataset and the KiTS19 benchmark show improved overlap-based segmentation performance, particularly in Dice and IoU, compared with supervised training and conventional pixel-wise KD. On the in-house dataset, Dice increases from 0.9239 to 0.9335 and IoU from 0.8720 to 0.8859, together with improved boundary-based distance metrics. On KiTS19, Dice improves from 0.8732 to 0.8812, primarily driven by improved kidney recall and a reduction in under-segmentation errors; however, boundary-based distance metrics remain more favorable for the conventional pixel-wise KD under domain shift. Additional experiments with a compact Attention U-Net-small student and stronger teacher sources further show that KD-Ω can improve compact student performance, although the magnitude of improvement depends on the teacher prediction profile. These findings indicate that the proposed framework provides an efficient and practical approach for enhancing lightweight segmentation models by prioritizing clinically relevant foreground preservation and reducing missed kidney regions under computational constraints. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
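
The idea of restricting distillation to regions where teacher and student disagree can be sketched as follows. This is only one plausible reading of the abstract: the 0.5 threshold, the squared-error distillation term, and all names here are assumptions, since the abstract does not specify how Ω or the loss is computed.

```python
def disagreement_mask(teacher_probs, student_probs, threshold=0.5):
    """Return 1 where the binarized teacher and student predictions differ."""
    return [
        [int((t >= threshold) != (s >= threshold)) for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_probs, student_probs)
    ]


def masked_distillation_loss(teacher_probs, student_probs, mask):
    """Mean squared teacher-student gap over masked pixels only.

    Returns 0.0 when the mask is empty (no disagreement anywhere).
    """
    total, count = 0.0, 0
    for t_row, s_row, m_row in zip(teacher_probs, student_probs, mask):
        for t, s, m in zip(t_row, s_row, m_row):
            if m:
                total += (t - s) ** 2
                count += 1
    return total / count if count else 0.0


# Hypothetical 2 x 2 foreground probability maps.
teacher = [[0.9, 0.8], [0.2, 0.7]]
student = [[0.8, 0.3], [0.1, 0.4]]
omega = disagreement_mask(teacher, student)
loss = masked_distillation_loss(teacher, student, omega)
print(omega, loss)
```

Pixels where both models already agree contribute nothing to the distillation term, which corresponds to the abstract's aim of avoiding redundant supervision in well-predicted regions.
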
24 pages, 8126 KB  
Article
Lightweight and Accurate Forest Canopy Segmentation and Cover Estimation via Text-Prompted Pre-Annotation
by Hongbing Chen, Zhipeng Li, Mingming Li, Zhihang Xu, Yubo Zhang, Shuwen Zhang, Libo Liu and Changji Wen
Remote Sens. 2026, 18(11), 1767; https://doi.org/10.3390/rs18111767 - 1 Jun 2026
Viewed by 201
Abstract
Traditional high-precision canopy segmentation heavily relies on tedious pixel-level manual annotation, while general-purpose zero-shot visual detection algorithms are prone to boundary adhesion and excessive computational load in dense forest areas. To address this, this study proposes a human–machine collaborative, efficient canopy segmentation and canopy cover inversion paradigm, combining the zero-shot pre-annotation capabilities of text-driven object detection with the high-precision segmentation advantages of the lightweight proprietary network LGBU-Net. In the offline annotation stage, this method automatically locates candidate canopy regions using Grounding DINO combined with text prompts and generates initial pixel-level masks using SAM. A high-quality training set is then constructed through minimal manual correction, significantly reducing the cost of traditional fully manual annotation. Subsequently, an improved LGBU-Net designed for complex forest conditions is used for supervised learning. In the feature extraction stage, a lightweight phantom-coordinate attention module (LG-CAM) is introduced to enhance the network’s focus on the geometric center of the tree canopy and suppress semantic interference caused by the forest background, light spots, and shadows. In the decoding stage, a boundary difference fusion module (BDF-Block) is deployed to alleviate the problem of adjacent tree canopy boundaries adhering by utilizing high-frequency gradient information from the underlying layers of UAV imagery. Combined with a boundary-aware hybrid loss function, the clarity of individual tree boundaries is further improved in the gradient domain. 
Experiments based on UAV imagery of high-density mixed and coniferous forests in Baishan, Jilin Province, show that, with low manual annotation costs, LGBU-Net achieves a canopy segmentation IoU of 90.45% and an individual tree separation F1 score of 89.35%, significantly outperforming general visual algorithms with zero-shot direct inference, and with only 4.85 M model parameters. Furthermore, the segmentation results are used for plot-level canopy vertical cover (CC) inversion, and the estimated values are highly consistent with ground-based measurements. This research provides a high-precision, low-annotation-cost technical solution with good edge deployment potential for large-scale forest resource surveys and forest understory light environment assessment.
(This article belongs to the Section Forest Remote Sensing)
25 pages, 5985 KB  
Article
FLIC: A Real-World Dataset for Visual Estimation of Food Leftovers in Canteens
by Flavio Piccoli, Damiano Callegaro, Davide Marelli, Marco Buzzelli, Cinzia Franchini, Lorenzo Stella, Simone Bianco, Gianluigi Ciocca, Raimondo Schettini and Francesca Scazzina
Appl. Sci. 2026, 16(11), 5465; https://doi.org/10.3390/app16115465 - 31 May 2026
Viewed by 279
Abstract
We present FLIC, a real-world annotated dataset designed for the visual estimation of food leftovers in canteens and other collective catering environments using standard 2D RGB imagery. Collected over 22 days in an operational university canteen, the dataset includes 401 paired image acquisitions of full and leftover trays, each associated with pixel-precise semantic segmentation masks and physically measured food mass. The goal is to support research on the estimation of leftover food mass from tray images, a task that has received limited attention compared to pre-consumption food recognition, despite its relevance for sustainability and operational decision making in food services. Unlike existing food datasets, FLIC jointly provides paired before–after visual observations and reliable mass ground truth, enabling quantitative analysis of food leftovers under realistic conditions without relying on depth or multi-view information. To demonstrate the dataset’s applicability, we rely on the concept of digital density, relating pixel area to food mass, and implement a lightweight, interpretable baseline mass estimation pipeline. This includes an automatic food/no-food segmentation stage, evaluated across multiple deep learning models (U-Net, DABNet, DINOv2+FeatUp, and SAM), followed by an assisted food recognition stage that leverages the fixed daily menu to map broad user input (e.g., “first course” vs. “second course”) to a specific food class. Experimental results highlight both the potential and the intrinsic challenges of visual food leftover estimation. Full article
(This article belongs to the Section Food Science and Technology)
26 pages, 22796 KB  
Article
Farmland Visual Navigation with Semantic Segmentation Under Leaf Occlusion
by Jiahao Liang, Chao Liu, Yuting Zhai, Mingfu Zhang and Yanlei Xu
Agriculture 2026, 16(11), 1205; https://doi.org/10.3390/agriculture16111205 - 29 May 2026
Viewed by 223
Abstract
In agricultural machinery visual navigation, accurately identifying the navigation line extraction region (NLER) at the center of the field of view is crucial for obtaining a precise navigation centerline. Although deep learning is the predominant method for NLER extraction, existing approaches face challenges in farmland environments characterized by densely distributed and irregularly extended leaves. These challenges result in unstable predictions, slow inference, and large model sizes that impede real-time applications. To address these issues, we propose a lightweight navigation segmentation residual network (LNS-ResNet), which integrates an inhibition–enhancement module (IEM) and a global convolutional residual block (GCRB). The IEM uses row–column one-dimensional convolutions to enhance vertical features between crop rows and suppress leaf-edge interference, producing more robust input features. The GCRB incorporates a full convolutional global attention (FCGA) mechanism to capture global context while preserving local spatial information. LNS-ResNet effectively reduces foliage interference and achieves accurate segmentation, with intersection over union (IoU) scores of 84.71% for crop row and 93.77% for path regions. Based on the segmentation output, we further propose a mask region determination-based navigation line extraction algorithm (MRD-Line), which directly identifies the NLER and connects the centerline within the mask without relying on line fitting. Deployed experiments on the Jetson TX2 demonstrate that the proposed method achieves both accuracy and efficiency, with mean angular deviations of 0.138° (path) and 0.425° (crop row), with average processing times of 64.1 ms (path) and 62.6 ms (crop row). Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
23 pages, 1836 KB  
Article
Long-Tail Aware Cross-Modal Graph Attention Network for Fine-Grained Indoor 3D Semantic Segmentation of Point Clouds
by Erdal Özbay and Feyza Altunbey Özbay
Sensors 2026, 26(11), 3401; https://doi.org/10.3390/s26113401 - 27 May 2026
Viewed by 384
Abstract
Accurate and efficient semantic segmentation of point cloud data is critical in many application areas involving indoor scene understanding. In particular, fine-grained object categories, high data density, and class imbalance in high-resolution indoor datasets significantly limit class discrimination in 3D semantic segmentation. The multimodal data structure, high-fidelity geometry, and long-tail class distribution of the recently popular ScanNet++ dataset further exacerbate these challenges. This study proposes a novel Long-Tail Aware Cross-Modal Graph Attention Network (LT-CM-GACNet++) to address fine-grained 3D semantic segmentation under long-tail distributions. The proposed method integrates dynamic graph-based geometric feature extraction with a lightweight visual feature extractor based on MobileNetV3, enabling effective fusion of geometric and RGB-based information. The proposed Cross-Modal Graph Attention (CMGA) module facilitates adaptive information transfer between modalities, enabling more effective representation learning of both local and global contextual features. To mitigate the adverse effects of long-tail class distributions, prototype-based representation learning and a class frequency-aware loss function are jointly employed. This strategy improves the learning of rare classes while enhancing the discrimination between visually and geometrically similar categories. In the preprocessing stage, density-based sampling, normal vector estimation, and block-based fixed-size point cloud generation are applied to high-resolution mesh-derived data. The proposed model is evaluated on 50 scenes and 100 semantic classes selected from the ScanNet++ dataset. Experimental results demonstrate that the proposed method achieves significant improvements over existing approaches in terms of both overall segmentation performance and rare-class performance. 
In particular, notable gains are observed in mean Intersection over Union (mIoU) and rare-class mIoU metrics. These results highlight the effectiveness of cross-modal learning for high-resolution 3D scene segmentation under long-tail distributions.
(This article belongs to the Special Issue Advances in Point Clouds for Sensing Applications)
22 pages, 31225 KB  
Article
SAR-Based Flood Extent Mapping with a Lightweight Siamese U-Net and Differential Attention Mechanism
by Ahmet Kaçmaz and Ugur Alganci
Earth 2026, 7(3), 87; https://doi.org/10.3390/earth7030087 - 25 May 2026
Viewed by 291
Abstract
Floods are among the most catastrophic natural disasters globally, causing significant damage to both life and infrastructure. Consequently, immediate and accurate assessment of inundated areas is critical for effective emergency response. While optical remote sensing is typically used for flood assessment, it is often ineffective during active flood events due to persistent cloud cover and precipitation. To address this, this research develops a deep learning method utilizing Synthetic Aperture Radar (SAR), which offers all-weather, 24 h imaging capabilities. Specifically, an attention-based differential Siamese U-Net was developed to detect temporal changes in bi-temporal SAR imagery (e.g., Sentinel-1) acquired before and after flood events. The method was evaluated on the S1GFloods dataset, comprising 5360 bi-temporal Sentinel-1 SAR image pairs across 46 flood incidents on six continents. Experimental results demonstrate a flood Intersection over Union (IoU) of 92.43%, an F1 score of 96.07%, and a recall of 97.64%. These metrics rank the proposed approach third overall among top-performing methods on this dataset. Notably, the high recall rate indicates the model is particularly beneficial for emergency response, as it minimizes the number of undetected flooded areas. Despite utilizing a CNN-based architecture that is less complex than Vision Transformer models, this method achieves results comparable to the state-of-the-art DAM-Net, with a performance difference of only 0.77%. Full article
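
The three metrics quoted in the flood-mapping abstract are linked algebraically when they are computed for a single class from the same pooled pixel counts (an assumption; the abstract does not state the averaging scheme). In that case IoU = F1 / (2 - F1) and precision = F1 · R / (2R - F1). The sketch below applies these identities to the reported values; the function names are hypothetical.

```python
def iou_from_f1(f1):
    """IoU implied by F1 for one class on pooled counts: F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


def precision_from_f1_recall(f1, recall):
    """Precision implied by F1 and recall, from F1 = 2PR / (P + R)."""
    return f1 * recall / (2.0 * recall - f1)


# F1 and recall as reported in the abstract.
iou = iou_from_f1(0.9607)
precision = precision_from_f1_recall(0.9607, 0.9764)
print(round(iou, 4), round(precision, 4))
```

The implied IoU is about 0.924, in line with the reported 92.43% up to rounding. The implied precision of roughly 0.945 is not stated in the abstract and holds only under the pooled-count assumption.
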
27 pages, 4438 KB  
Article
DOM-MUSE: A Deformable Omnidirectional State Space Architecture for Efficient Speech Enhancement
by Tsung-Jung Li, Bo-Yu Su, Jung-Shan Lin and Jeih-Weih Hung
Electronics 2026, 15(10), 2159; https://doi.org/10.3390/electronics15102159 - 18 May 2026
Viewed by 242
Abstract
Transformer-based speech enhancement (SE) architectures suffer from high computational complexity, while existing lightweight state space model (SSM) approaches are constrained to fixed one-dimensional scanning that cannot fully exploit the two-dimensional time–frequency structure of speech spectrograms. To address these limitations, we propose DOM-MUSE, a lightweight U-Net-style SE framework built upon the Mamba-2 SSM with four targeted innovations. First, a Deformable Feature Extractor (DFE) predicts per location spatial offsets that warp the feature sampling grid to align with speech formant trajectories and harmonic structures, providing geometrically coherent inputs to the state space model. Second, a DOM Mamba Block with Cross-Dimensional Gated Fusion (CDGF) deploys two parallel Mamba-2 instances scanning the time and frequency axes independently, and uses Taylor Channel Attention (TCA) to derive semantic gates that modulate each SSM output before fusion. Third, a Phase-Guided Feature Conditioner (PGFC) computes local phase-gradient gates that suppress noise-dominated activations prior to the SSM stage, making the feature extraction pathway implicitly phase-aware. Fourth, an Attention-Based Skip Connection (ABSC) replaces conventional concatenation skip connections with a learned channel gate, adaptively controlling the information flow from the encoder to the decoder. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DOM-MUSE outperforms the reproduced MUSE baseline on all five evaluation metrics—including PESQ (+0.077), CSIG (+0.058), CBAK (+0.026), COVL (+0.070), and STOI (+0.002)—while reducing the parameter count by 24% (0.51 M to 0.39 M). 
Notably, DOM-MUSE also surpasses MUSE++ on perceptual quality metrics (PESQ +0.061, COVL +0.032) despite MUSE++ employing dynamic SNR augmentation and an augmented multi-objective loss that DOM-MUSE deliberately omits, demonstrating that the proposed architectural innovations yield genuine improvements independent of training strategy. When DOM-MUSE is additionally trained under the same augmented protocol as MUSE++, it achieves PESQ of 3.46 and COVL of 4.22, further confirming the complementary nature of architectural and training improvements.
24 pages, 5438 KB  
Article
An Improved DeepLabV3+-Based Method for Crop Row Segmentation and Navigation Line Extraction in Agricultural Fields
by Letian Wu, Yongzhi Cui, Huifeng Shi, Xiaoli Sun, Jiayan Yang, Xinwei Cao, Ping Zou and Ya Liu
Sensors 2026, 26(10), 3142; https://doi.org/10.3390/s26103142 - 15 May 2026
Viewed by 398
Abstract
Accurate crop row detection is a critical prerequisite for autonomous agricultural navigation, yet it remains challenging in complex field environments. To balance segmentation accuracy, robustness, and real-time performance, an improved crop row segmentation and navigation method based on the DeepLabV3+ framework was developed. MobileNetV2 was adopted as the backbone to minimize computational costs, while feature representation was enhanced through integrated attention mechanisms and multi-scale fusion. Specifically, split-attention convolution was integrated into the backbone, a DenseASPP + SP module was employed for multi-scale contextual capture, and a Convolutional Block Attention Module (CBAM) was added to refine feature responses. Experimental results demonstrated that the proposed method outperformed mainstream models, achieving a mean Intersection over Union (mIoU) of 93.42% and an F1-score of 96.8%. The model maintained a lightweight architecture with 8.35 M parameters and a real-time speed of 32 FPS. Furthermore, crop row anchor points were extracted and processed via DBSCAN clustering and RANSAC fitting to generate high-precision navigation lines. Validation showed that the middle crop row yielded the highest fitting accuracy with minimal angular and lateral errors. This study provides an efficient visual perception solution for intelligent field operations.
(This article belongs to the Section Smart Agriculture)
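
The RANSAC fitting step mentioned in the abstract can be sketched in plain Python. The example assumes the anchor points of one crop row have already been grouped (the DBSCAN step is omitted) and fits x as a linear function of y, since crop rows tend to run near-vertically in a forward-facing image. The tolerance, iteration count, seed, and sample points are all hypothetical, not values from the paper.

```python
import random


def ransac_line(points, n_iters=100, tol=1.0, seed=0):
    """Fit x = a * y + b to (x, y) points with a basic RANSAC loop.

    Returns the best (a, b) and its inlier list. A least-squares refit
    on the inliers, which many pipelines add, is left out for brevity.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if y1 == y2:
            continue  # degenerate pair: x cannot be expressed as a function of y
        a = (x2 - x1) / (y2 - y1)
        b = x1 - a * y1
        inliers = [(x, y) for x, y in points if abs(x - (a * y + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers


# Hypothetical anchor points: ten on the row x = 0.5 * y + 100, two outliers.
anchors = [(100 + 0.5 * y, y) for y in range(0, 200, 20)]
anchors += [(300.0, 50), (20.0, 150)]
model, inliers = ransac_line(anchors)
print(model, len(inliers))
```

Because each candidate line is scored by how many points fall within `tol` of it, the two stray anchors do not pull the fitted row off course, as they would in an ordinary least-squares fit.
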
30 pages, 6946 KB  
Article
ISDG-Net: Efficient RGB–Infrared Object Detection for Remote Sensing Imagery
by Yaoyue Gao, Xinru Cheng, Yimeng Li, Dawei Xu, Desheng Sun and Yaoyi Hu
Remote Sens. 2026, 18(10), 1570; https://doi.org/10.3390/rs18101570 - 14 May 2026
Viewed by 302
Abstract
In all-weather Earth observation and complex unstructured environments, traditional single-modal remote sensing object detection often fails due to low illumination and strong background interference. While RGB–infrared fusion provides complementary information, existing methods are typically computationally intensive and struggle with dense small objects and modality discrepancies, limiting their deployment on resource-constrained platforms. To address these challenges, we propose ISDG-Net, a lightweight and efficient visible-infrared dual-modal object detection framework specifically tailored for edge deployment. ISDG-Net integrates four core components: (1) a channel-separated inverted bottleneck backbone (IBC-Conv) that reduces parameter redundancy while preserving modality-specific semantics; (2) a dynamic sparse attention module (DySparse) based on Bi-Level Routing Attention, enabling long-range dependency modeling with low computational cost; (3) an adaptive spatial fusion detection head (Detect-SASD) that aligns visible and infrared features at the pixel level to resolve semantic inconsistency and scale mismatch; and (4) a geometry-aware IoU selector (GIS) that mitigates over-suppression in crowded scenes by incorporating multi-dimensional geometric constraints into post-processing. Extensive experiments on the VEDAI, M3FD, and LLVIP datasets demonstrate the effectiveness and efficiency of ISDG-Net. It achieves 55.1% and 77.1% mAP@0.5 on VEDAI and M3FD, respectively, and 93.7% mAP@0.5 with 89.7% recall on LLVIP, while maintaining a compact model size of 4.2 M parameters and 11.3 GFLOPs. These results validate that accurate RGB–infrared detection is achievable under strict resource constraints, making ISDG-Net well-suited for deployment in edge-based remote sensing systems. Full article
(This article belongs to the Section Remote Sensing Image Processing)

29 pages, 17443 KB  
Article
Per-SAM-MCPA: A Lightweight Framework for Individual Tree Crown Segmentation from UAV Imagery
by Chuting Hu, Size Dai, Shifan Wu, Qiaolin Ye and He Yan
Remote Sens. 2026, 18(10), 1559; https://doi.org/10.3390/rs18101559 - 13 May 2026
Viewed by 300
Abstract
Accurate individual tree crown (ITC) segmentation from unmanned aerial vehicle (UAV) imagery is important for fine-scale forest inventory, plantation management, and ecological monitoring. However, delineating ITCs in dense plantation environments remains difficult because crowns are strongly adjacent, canopy structures are highly homogeneous, and crown boundaries are often blurred, making it hard for existing methods to preserve both regional integrity and boundary continuity. This study proposes the Perceptual Segment-Anything Model with Multi-head Cross-Parallel Attention (Per-SAM-MCPA), a lightweight and effective framework for fine-grained ITC segmentation in dense plantation scenes. Based on a compact ResNet-50 backbone, the framework integrates perceptual target-aware representation, multi-scale detail enhancement, global contextual modeling, and semantic-boundary collaborative refinement to improve crown discrimination and structural consistency. A perceptual relation module is used to strengthen pixel-level semantic dependency modeling, and a Multi-head Cross-Parallel Attention (MCPA) mechanism is designed to capture long-range contextual interactions through orthogonally decomposed spatial attention, improving global geometric consistency with limited computational overhead. A Composite Constraint Loss (CCL) that combines a weighted cross-entropy loss, a structural similarity loss, and a boundary term based on Hausdorff distance is introduced to jointly optimize region-level segmentation quality and boundary fidelity. Experiments on the Catalpa bungei UAV dataset show that the proposed method achieves an intersection over union (IoU) of 87.3% and an F1-score of 91.0%, outperforming representative baseline methods such as SAM and Mask R-CNN while maintaining an inference speed of 35.7 FPS on a single GPU. These results indicate that Per-SAM-MCPA offers an accurate, efficient, and practical solution for ITC segmentation in dense plantation environments. 

19 pages, 5202 KB  
Article
LASH-SegNet: A Lightweight Deep Learning Network for Multi-Trait Segmentation of Early-Stage Soybean Plants
by Liqiang Qi, Jinhua Liu, Chuntao Yu, Bo Zhang, Jinyang Li, Chen Zhao and Wei Zhang
Agriculture 2026, 16(10), 1025; https://doi.org/10.3390/agriculture16101025 - 8 May 2026
Viewed by 644
Abstract
Accurate segmentation of multiple phenotypic traits in early-stage soybean plants is essential for automated phenotyping and early-stage breeding analysis. However, the morphological diversity and heterogeneous visual characteristics of key traits, including hypocotyls, flowers, pubescence, and leaves, make unified segmentation challenging under complex backgrounds. To address this problem, this study proposes LASH-SegNet, a lightweight deep learning network for multi-trait segmentation of early-stage soybean plants. The network integrates dynamic snake convolution to model elongated and non-rigid structures and incorporates a SegNeXt-Attention module to enhance multi-scale feature representation and boundary awareness. In addition, the WIoUv3 loss function is adopted to improve localization accuracy and boundary alignment, particularly for slender targets. Experimental results show that LASH-SegNet achieves a precision of 88.82%, recall of 89.78%, and an F1-score of 89.30%, with an mAP50 of 91.24%, while maintaining a compact model size of 5.9 M parameters and 11.3 MB. These results demonstrate that LASH-SegNet provides an accurate and efficient solution for high-throughput multi-trait early-stage soybean plant phenotyping.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
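
The precision, recall, and F1-score quoted in the LASH-SegNet abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the F1-score is the standard harmonic mean of the stated precision and recall; the abstract does not say how the metric was aggregated across traits, so this is only a consistency check, not the authors' evaluation code. The function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values quoted in the LASH-SegNet abstract, in percent.
precision, recall = 88.82, 89.78
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")
```

Under that assumption the result rounds to the 89.30% given in the abstract, so the three reported figures are mutually consistent.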