With the increasing demand for low-carbon energy, automated defect detection using unmanned aerial vehicle (UAV)-based thermal inspection has become essential for maintaining the reliability of photovoltaic systems. However, existing methods still suffer from low-contrast thermal imagery, large-scale variations of defects, and subtle thermal anomalies. To address these challenges, this study proposes Grouped-Hypergraph-Modulation DEIM (GHM-DEIM), a robust end-to-end detection framework based on an improved DEIM architecture. Specifically, a grouped multi-scale aggregation attention network is introduced to enhance global thermal perception and recover discriminative features from blurred backgrounds. In addition, an enhanced encoder incorporating a hypergraph-based context encoding mechanism is designed to model high-order non-local relationships and improve feature representation across different defect scales. Furthermore, a modulation fusion module is employed to adaptively refine multi-scale feature responses and suppress environmental noise interference. Extensive experiments conducted on the ThermoSolar-PV and PV-HSD-2025 datasets demonstrate that the proposed method consistently outperforms state-of-the-art detectors, achieving mAP@50 values of 88.6% and 74.2%, respectively, with improvements of 4.7% and 2.9% over the baseline. These results demonstrate the effectiveness and robustness of GHM-DEIM for UAV-based PV thermal defect inspection. Full article

(This article belongs to the Section Sensors and Robotics)

25 pages, 5937 KB

Open Access Article

CGSTA-Net: A Cross-Domain Generative Prior-Assisted Structure–Texture Adaptive Network for Remote Sensing Image Dehazing

by Xiaoyan Li, Yankun Zhao and Na Niu

Symmetry 2026, 18(6), 1027; https://doi.org/10.3390/sym18061027 (registering DOI) - 14 Jun 2026

Abstract

Image dehazing is important for the reliable interpretation of optical remote sensing images. However, current dehazing networks tend to suffer from limited receptive fields, texture information loss caused by conventional downsampling, and a failure to exploit complementary cross-domain information. To address these problems, we propose a Cross-domain Generative Prior-assisted Structure–Texture Adaptive Network (CGSTA-Net) for remote sensing image dehazing. It is a dual-stream encoder–decoder framework that enhances the domain-specific information of the RGB image and the generated prior, and then integrates them adaptively for haze-free reconstruction. To minimize information loss during downsampling, wavelet pooling is introduced to preserve frequency-aware structural and textural features. Additionally, a Structure–Texture Calibration Block is designed to simultaneously improve local frequency textures and construct sparse long-range structural dependencies, so as to achieve better restoration performance under spatially non-uniform haze. To fuse the representations from the RGB and generated prior images, a Prior-aware Gated Adaptive Fusion module is developed to dynamically balance the domain-specific features and preserve fine details during multi-level feature fusion. Finally, we use pixel-level contrastive learning to guide the latent space away from hazy distributions, thus enhancing the discriminability of the features. Extensive experiments on three datasets, namely RSID, RICE-I, and HRSD, demonstrate that CGSTA-Net effectively restores images under varying haze conditions and significantly outperforms the latest dehazing methods in terms of visual quality and quantitative performance. Specifically, compared with the strongest competing method, CGSTA-Net increases PSNR by 22.9% on RSID, by 13.2% on RICE-I, and by 7.2% on HRSD. Full article
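
The wavelet pooling idea mentioned in the abstract can be illustrated with a small sketch. It assumes a single-level 2 × 2 orthonormal Haar transform on a single-channel image; the function name `haar_pool` and the band labels are hypothetical, and this is not the CGSTA-Net implementation. Unlike max or average pooling, the four half-resolution sub-bands together retain all of the input's energy, which is the property that makes wavelet-based downsampling attractive for texture preservation.

```python
# Minimal sketch of 2x2 Haar "wavelet pooling" on a single-channel image.
# Assumption: orthonormal Haar filters; names and band labels are illustrative.

def haar_pool(image):
    """Split an H x W image (even H and W) into four half-resolution sub-bands."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    bands = {"low": [], "col_diff": [], "row_diff": [], "diag": []}
    for r in range(0, rows, 2):
        out = {name: [] for name in bands}
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]          # top-left, top-right
            d, e = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            out["low"].append((a + b + d + e) / 2)       # scaled local average
            out["col_diff"].append((a - b + d - e) / 2)  # left column minus right column
            out["row_diff"].append((a + b - d - e) / 2)  # top row minus bottom row
            out["diag"].append((a - b - d + e) / 2)      # diagonal detail
        for name in bands:
            bands[name].append(out[name])
    return bands


def energy(grid):
    """Sum of squared values of a 2D grid."""
    return sum(v * v for row in grid for v in row)


image = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 7.0, 6.0, 5.0],
]
bands = haar_pool(image)
```

In this toy input the `low` band carries most of the energy and the detail bands carry the rest, so nothing is discarded by the downsampling step itself.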

(This article belongs to the Section Computer)

38 pages, 2895 KB

Open Access Article

A Two-View Hierarchical Contrastive Learning-Driven Method for Community Detection

by Shun Liu, Yuzhi Xiao, Tao Huang, Yuanli Zhang and Yifei Wang

Mathematics 2026, 14(12), 2121; https://doi.org/10.3390/math14122121 (registering DOI) - 14 Jun 2026

Abstract

Effectively integrating graph topology and node attributes, while assigning nodes with both semantic similarity and structural closeness to the same community, remains a key challenge in attributed graph community detection. To address this challenge, this study proposes TVHCL-CD, a two-view hierarchical contrastive learning-driven method for community detection. The proposed method constructs an attribute view and a modularity view from the node attribute matrix and the modularity matrix, respectively, to model attribute semantics and high-order community structure priors. Structure-aware two-view representations are then learned in parallel through dual-view graph attention encoders incorporating multi-order neighborhood priors. Furthermore, a structure-enhanced Graph Transformer fusion module is designed to achieve node-level adaptive fusion of the two-view representations by introducing a learnable adjacency bias into global self-attention and a view-aware gating mechanism into the feed-forward network. To align the optimization objective with community semantics, a hierarchical contrastive learning strategy is further developed. Specifically, view-level consistency contrastive learning constructs modularity-guided augmented views to improve representation robustness, while community-level semantic contrastive learning incorporates partial ground-truth labels to enhance intra-community compactness and inter-community separation. Finally, clustering is performed on the fused representations to obtain community partitions. Experimental results on eight real-world attributed graphs and the generated tree-like attributed graph Tree-2500 indicate that TVHCL-CD achieves competitive performance under the semi-supervised transductive setting, while ablation results support the contributions of its main components. Full article
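
For readers unfamiliar with the modularity matrix from which the modularity view is built, the following sketch computes the standard Newman form B_ij = A_ij − k_i k_j / (2m) for an undirected graph, together with the modularity score of a given partition. The function names and the toy graph are illustrative assumptions; how TVHCL-CD actually processes this matrix is not specified in the abstract.

```python
# Minimal sketch: Newman modularity matrix and modularity score.
# Assumption: undirected, unweighted graph given as a symmetric 0/1 adjacency matrix.

def modularity_matrix(adj):
    """Return B with B[i][j] = A[i][j] - k_i * k_j / (2m)."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # equals twice the number of edges
    if two_m == 0:
        raise ValueError("graph has no edges")
    return [
        [adj[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
        for i in range(n)
    ]


def modularity(adj, labels):
    """Modularity Q of the partition encoded by `labels` (one label per node)."""
    b = modularity_matrix(adj)
    two_m = sum(sum(row) for row in adj)
    n = len(adj)
    same = sum(b[i][j] for i in range(n) for j in range(n) if labels[i] == labels[j])
    return same / two_m


# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

B = modularity_matrix(adj)
q_good = modularity(adj, [0, 0, 0, 1, 1, 1])  # split along the bridge
q_bad = modularity(adj, [0, 1, 0, 1, 0, 1])   # arbitrary split
```

Each row of B sums to zero, and the partition that follows the two triangles scores higher than the arbitrary one, which is why B is a natural carrier of community structure priors.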

(This article belongs to the Section E1: Mathematics and Computer Science)

23 pages, 2717 KB

Open Access Article

3DWaFusion: Three-Dimensional Multiscale Wavelet Convolutional Neural Network for Multimodal Medical Image Fusion

by Yu Wang, Rui Zhang, Zhiqiang Zhang, Ningzhong Liu and Xiulai Wang

Sensors 2026, 26(12), 3784; https://doi.org/10.3390/s26123784 (registering DOI) - 14 Jun 2026

Abstract

Multimodal image fusion is a promising technology designed to fuse information from different medical sensors, offering structured insights for disease diagnosis and treatment. However, existing 2D-centric fusion methods fail to capture 3D spatial continuity, and conventional wavelet-based approaches lack adaptability to diverse lesion regions and suffer from background artifacts. To address these issues, we propose a 3D multiscale wavelet convolutional neural network for multimodal medical image fusion. Specifically, a 3D Discrete Wavelet Transformation (3D DWT) is introduced to decompose input volumes into multi-frequency bands, isolating anatomical structures and lesion details while reducing 3D spatial redundancy. We embed the hierarchical multi-frequency bands into a Global and Local Feature Calibration (GLFC) module to adaptively enhance single-modal features by fusing global contextual information and local details. Furthermore, a pyramid group-wise multiscale feature interaction is proposed for capturing complementary features across different spatial scales. Finally, a voxel-wise weighted averaging strategy reconstructs the fused image by adaptively assigning contributions to each modality at every spatial position, effectively eliminating artifacts and improving the visual fidelity of the result. Extensive experiments on the BraTS2020 and Hecktor datasets demonstrate that our proposed method outperforms state-of-the-art (SOTA) fusion methods in both subjective visual quality and objective metrics. Moreover, downstream segmentation validation confirms that fused images from our method significantly improve tumor segmentation accuracy. The source code and pre-trained models will be publicly available. Full article
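
The voxel-wise weighted averaging step can be sketched as follows. This is a minimal illustration under stated assumptions: volumes are flattened to lists of voxel intensities, and the per-voxel scores (`score_a`, `score_b`) are hypothetical inputs standing in for whatever the network predicts; a two-way softmax turns them into weights that sum to one at every voxel.

```python
import math

# Minimal sketch of voxel-wise weighted averaging for two modalities.
# Assumption: per-voxel scores are given; in a trained model they would be predicted.

def fuse_voxelwise(vol_a, vol_b, score_a, score_b):
    """Fuse two flattened volumes with per-voxel softmax weights."""
    if not (len(vol_a) == len(vol_b) == len(score_a) == len(score_b)):
        raise ValueError("all inputs must have the same number of voxels")
    fused = []
    for xa, xb, sa, sb in zip(vol_a, vol_b, score_a, score_b):
        peak = max(sa, sb)                  # subtract the max for numerical stability
        ea, eb = math.exp(sa - peak), math.exp(sb - peak)
        weight_a = ea / (ea + eb)           # weight of modality A at this voxel
        fused.append(weight_a * xa + (1.0 - weight_a) * xb)
    return fused


vol_a = [0.2, 0.8, 0.5]      # e.g. intensities from one modality
vol_b = [0.6, 0.4, 0.5]      # e.g. intensities from another modality
score_a = [0.0, 5.0, -5.0]   # hypothetical per-voxel confidence scores
score_b = [0.0, 0.0, 0.0]
fused = fuse_voxelwise(vol_a, vol_b, score_a, score_b)
```

Because the weights form a convex combination, each fused voxel lies between the two input intensities, which limits the kind of overshoot artifacts that unconstrained blending can introduce.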

(This article belongs to the Section Biomedical Sensors)

26 pages, 18173 KB

Open Access Article

MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion

by Bo Li, Chunhao Li and Yuheng Li

Appl. Sci. 2026, 16(12), 5998; https://doi.org/10.3390/app16125998 (registering DOI) - 13 Jun 2026

Abstract

Robust autonomous-driving detection requires using RGB texture and infrared thermal cues without sacrificing real-time inference. Existing RGB-IR detectors often rely on static feature concatenation or quadratic attention, which makes them sensitive to modality imbalance, small spatial offsets, and deployment cost. We propose MobileMamba-DETR, a lightweight DETR-style detector that treats dual-modal fusion as a selective state-space process. Its principal design is an SS2D-based cross-modal interaction module that uses normalized RGB-IR contrast as a guide, while a MobileMamba backbone, spectral–spatial encoder, and dynamic convolutional decoder provide efficient multi-scale representation and query localization. On M3FD and FLIR-ADAS, MobileMamba-DETR achieves mAP50 of 83.6% and 78.3%, respectively, with 38.7M parameters and 42 FPS inference at 640 × 640 on an RTX 3090. The results, ablations, and seed-based validation show that selective state-space fusion improves accuracy while retaining real-time throughput. Full article

(This article belongs to the Special Issue AI-Based Methods for Object Detection and Path Planning)

24 pages, 15476 KB

Open Access Article

Chrs-Net: A Dual-Stream YOLO Network for Underwater RGB–Sonar Object Detection

by Chuheng Zhang, Hongli Xu, Pangyi Xiao, Han Wang, Jingyu Ru and Hongxu Yang

J. Mar. Sci. Eng. 2026, 14(12), 1094; https://doi.org/10.3390/jmse14121094 (registering DOI) - 13 Jun 2026

Abstract

Underwater RGB–sonar object detection remains challenging due to severe optical degradation, strong sonar noise, and spatial misalignment between heterogeneous modalities. Existing multimodal detectors usually rely on simple feature aggregation or limited structural coupling, which cannot effectively model global cross-modal dependencies or address modality-specific degradation. To address these challenges, we propose Chrs-Net, a YOLOv12-based dual-stream framework for underwater RGB–sonar object detection. The proposed network integrates three key components: a Transformer-based Cross-Modal Communication Fusion module (C-mcf) for global cross-modal interaction and semantic alignment, a Multi-Layer Feature Enhancement module (MLFE) for degraded optical feature enhancement, and a Pinwheel-Shaped Convolution module (PConv) for sonar-side structural feature extraction. In addition, an RGB–sonar object detection dataset is constructed for experimental evaluation by relabeling part of the RGBS benchmark, combining simulator-collected samples, and introducing style-transfer-based augmentation to improve data diversity. Experiments on the constructed dataset yield 94.91% mAP@0.5 and 61.10% mAP@0.5:0.95 on the RGB branch, and 94.00% and 57.13% on the sonar branch, respectively, with an inference speed of 53.6 FPS. Compared with representative single-modality and multimodal detectors, Chrs-Net consistently yields superior detection accuracy and localization performance. These results demonstrate that the combination of global cross-modal communication and modality-specific enhancement is effective for robust underwater RGB–sonar object detection in complex environments. Full article

(This article belongs to the Topic Applications and Development of Underwater Robotics and Underwater Vision Technology, 2nd Edition)

33 pages, 3096 KB

Open Access Article

Multimodal Uncertainty-Aware Gating Fusion and Iterative Feedback Refinement for HSI-LiDAR Open-Set Classification

by Davaajargal Myagmarsuren, Haibin Wu and Aili Wang

Remote Sens. 2026, 18(12), 1963; https://doi.org/10.3390/rs18121963 (registering DOI) - 12 Jun 2026

Abstract

Open-set classification for remote sensing requires models that simultaneously achieve high accuracy on known land-cover types and reliably detect novel classes absent from the training distribution, a capability essential for real-world deployment where new classes routinely emerge. Existing multimodal fusion approaches for hyperspectral imagery (HSI) and LiDAR are primarily designed for closed-set scenarios and lack robust uncertainty modeling for unknown detection. We propose a post hoc calibrated multimodal open-set framework with three tightly integrated components. First, an Uncertainty-Aware Gating Fusion (UAGF) module dynamically weights HSI and LiDAR features per sample based on modality reliability and produces a gating uncertainty signal reflecting fusion confidence. Second, an Iterative Feedback Refinement (IFR) module progressively refines fused representations over multiple iterations and captures convergence dynamics, where stable convergence indicates known samples while high feature-change variance identifies potential unknowns. Third, a compact two-signal open-set detector combines gating uncertainty and refinement variance through an EVT (Weibull)-based post hoc calibration mechanism fitted exclusively on known validation samples. The framework follows a strict zero-unknown-supervision protocol: the multimodal backbone is trained using only known-class samples, and the open-set decision threshold is derived solely from the known validation score distribution. This design decouples representation learning from open-set decision learning, improving robustness and avoiding the objective conflicts that arise in joint training.
Comprehensive experiments on three benchmark datasets—Houston2013, Muufl, and Augsburg—demonstrate that the proposed method achieves 92.79%, 84.47%, and 80.99% overall accuracy and 76.48%, 63.91%, and 56.81% unknown accuracy, outperforming the closest multimodal competitor HyLiOSR by up to 32.4 pp in unknown accuracy while maintaining competitive closed-set performance. Full article

(This article belongs to the Special Issue Deep Learning for Multi-Sensor Remote Sensing: Advancements in Image Classification and Semantic Segmentation)

18 pages, 1579 KB

Open Access Article

A Lightweight Algal Bloom Detection Algorithm for Water Surfaces Based on Improved YOLOv26

by Haoran Wang, Zifei Ma, Mi Zhou, Yunfeng Pan, Jing Wang and Yanji Yao

Appl. Sci. 2026, 16(12), 5969; https://doi.org/10.3390/app16125969 (registering DOI) - 12 Jun 2026

Abstract

Monitoring water surface algal blooms from surveillance perspectives faces challenges such as small objects, low texture contrast, dynamic background interference, and limited labeled datasets. In this study, we propose GECA-YOLOv26, a lightweight model that integrates Ghost Convolution (GhostConv) and Efficient Channel Attention (ECA) modules. First, the GhostConv lightweight module is introduced in the first layer of the YOLOv26 backbone, reducing parameters from 4608 to 2704 and achieving a 41% reduction in computational cost. Second, eight ECA modules are embedded at key locations after backbone downsampling and neck feature fusion to enhance feature representation and mitigate degradation caused by model lightweighting. Finally, the MuSGD optimizer is used for training, with adaptive modifications to resolve tensor shape conflicts with the ECA modules. Experimental results indicate that the model achieves an mAP50 of 82.16%. Compared with the YOLOv26 baseline, our model improves mAP50 by 6.42%, while mAP@0.5:0.95 decreases by 0.79% and inference speed drops from 143 FPS to 123 FPS. The model also reduces parameters and size, achieving 5.19 MB and 1864 fewer parameters. Compared with YOLOv8, YOLOv10, and YOLOv11, the proposed model improves mAP50 by 2.12%, 5.99%, and 2.79%, respectively. To evaluate the stability of the results under small-sample conditions, we conducted 3-fold and 5-fold cross-validation experiments, which demonstrated that the model performs robustly across different folds and random seeds. Ablation studies further confirm the effectiveness of each module. Heatmap analysis demonstrates that the proposed model effectively highlights small object regions, remains robust under limited-sample conditions, and reduces model complexity. This study provides a novel solution for algal bloom detection in surveillance scenarios. Full article
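
As background for the ECA modules mentioned above, the sketch below follows the commonly described ECA recipe: global average pooling per channel, a small 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. It is an illustration under stated assumptions, not the GECA-YOLOv26 code: the kernel weights are fixed by hand here (they are learned in a real module), and the function name `eca` is hypothetical.

```python
import math

# Minimal sketch of Efficient Channel Attention (ECA) on a C x H x W feature map.
# Assumption: hand-picked 1D kernel; in practice the kernel weights are learned.

def eca(feature_map, kernel):
    """Return (rescaled feature map, per-channel gates)."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    # 1) Global average pooling: one descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) 1D convolution across channels; zero padding keeps the length at C.
    padded = [0.0] * pad + desc + [0.0] * pad
    mixed = [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(desc))]
    # 3) Sigmoid gate per channel.
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    # 4) Rescale every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
    return scaled, gates


feature_map = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0
    [[2.0, 2.0], [2.0, 2.0]],  # channel 1
    [[0.0, 0.0], [0.0, 0.0]],  # channel 2
]
scaled, gates = eca(feature_map, kernel=[0.5, 1.0, 0.5])
```

The module adds only `k` weights per instance, which is consistent with the abstract's aim of recovering representational strength without undoing the parameter savings from GhostConv.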

(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)

28 pages, 24246 KB

Open Access Article

Multimodal Prompt Learning for Spatial Reasoning in Remote Sensing Image Scene

by Yan Ren, Haizhong Qian, Bingchuan Jiang, Tingting Li, Xiao Wang, Long Sun and Li Yang

Remote Sens. 2026, 18(12), 1959; https://doi.org/10.3390/rs18121959 (registering DOI) - 12 Jun 2026

Abstract

A remote sensing scene graph (RSSG) enables machines to interpret interactions among ground objects in remote sensing images and supports semantic reasoning and description, thus making it a fundamental technique in the field. However, most existing scene reasoning approaches cannot fully utilize multimodal information, resulting in limited performance when inferring spatial relationships among ground objects. To this end, we propose a Unified Visual-Semantic Triple Prompt Learning (UVSTPL) framework, which integrates visual features with matched geospatial object labels, leverages a prompt learning module for multimodal feature extraction, and employs a refined UVTransE model to predict spatial relationships. The core principle of UVSTPL is to enhance semantic feature extraction and improve relationship prediction performance via the collaborative fusion of visual and linguistic modalities. To strengthen the model’s ability to reason about the spatial relationships among ground objects in images, a novel Geo-RSSG dataset is constructed, which includes precise annotations of geographic entities, spatial relationships, and attributes. Extensive experiments demonstrate that the proposed UVSTPL method outperforms benchmark models on the spatial relationship prediction task. In comparison with the best baseline method, our approach improves prediction precision by 1.85%, mean precision by 8.49%, mean recall by 17.46%, and mean F1-score by 12.97%. This study offers valuable insights for advancing the understanding and cognitive capabilities of remote sensing scenes. Full article

(This article belongs to the Special Issue Vision–Language Multimodal Learning for Remote Sensing and Geospatial Artificial Intelligence)

23 pages, 517 KB

Open Access Article

Design and Experimental Evaluation of an Open-Architecture Multi-Sensor Telemetry System for Real-Time Motorcycle Dynamics Acquisition

by Andrei García Cuadra, Alberto Brunete González and Francisco Santos Olalla

Electronics 2026, 15(12), 2604; https://doi.org/10.3390/electronics15122604 (registering DOI) - 12 Jun 2026

Abstract

Real-time telemetry is essential for performance optimization and safety in motorcycle racing, yet commercial solutions remain proprietary, expensive, and poorly extensible. This paper presents the design, implementation, and experimental evaluation of an open-architecture embedded telemetry unit built around the STM32H745 dual-core microcontroller. The system integrates a u-blox ZED-F9P RTK-GNSS receiver, a Bosch BNO085 9-DoF IMU with on-chip sensor fusion, a CAN-FD interface for powertrain data acquisition, and a SIM7600E-H 4G/LTE module for real-time remote streaming, all housed in a 3D-printed vibration-resistant enclosure. The firmware employs deterministic dual-core task partitioning: the Cortex-M7 core handles sensor fusion and CAN-FD at high frequency, while the Cortex-M4 core manages 4G communication and microSD logging. We explicitly delimit the scope of the evidence presented: CAN-FD powertrain acquisition and end-to-end operational reliability are experimentally validated on real circuit data spanning four campaigns, over 100 laps, and 5.8 h of logging—with sustained acquisition of 13 powertrain channels at speeds up to 185 km/h and zero system resets or data-integrity errors. In contrast, RTK positioning accuracy (2.5 cm CEP), sensor-fusion latency (sub-2 ms at the 99th percentile), 4G-uplink reliability, and thermal margins are characterized through manufacturer specifications, Monte Carlo simulation, and analytical models, with a fully instrumented end-to-end measurement campaign identified as the immediate next step. The 50 Hz effective positioning rate combines 25 Hz GNSS with IMU interpolation. With a bill of materials of approximately EUR 265, the platform offers an order-of-magnitude cost reduction over commercial alternatives while providing full openness and extensibility for distributed intelligence applications. Full article
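
One way to picture how a 50 Hz positioning rate can be obtained from 25 Hz GNSS fixes plus IMU data is sketched below. It is a simplified, hypothetical illustration and not the authors' firmware: it works in one along-track dimension, and it assumes each GNSS fix supplies position and velocity while the IMU supplies an along-track acceleration, so that one dead-reckoned sample can be inserted halfway between fixes.

```python
# Minimal sketch: 25 Hz GNSS fixes upsampled to 50 Hz with one IMU-propagated
# sample between fixes. Assumptions: 1D along-track motion, ideal sensors.

GNSS_PERIOD_S = 1.0 / 25.0    # 40 ms between GNSS fixes
OUTPUT_PERIOD_S = 1.0 / 50.0  # 20 ms between output samples


def upsample_track(fixes, accels):
    """fixes: (position_m, velocity_mps) per GNSS epoch; accels: m/s^2 per epoch.

    Returns a list of (time_s, position_m) samples at 50 Hz.
    """
    if len(fixes) != len(accels):
        raise ValueError("need one acceleration sample per GNSS fix")
    samples = []
    for i, ((pos, vel), acc) in enumerate(zip(fixes, accels)):
        t = i * GNSS_PERIOD_S
        samples.append((t, pos))  # measured GNSS position
        dt = OUTPUT_PERIOD_S
        # Constant-acceleration propagation from the latest fix.
        predicted = pos + vel * dt + 0.5 * acc * dt * dt
        samples.append((t + dt, predicted))
    return samples


# Constant 50 m/s (180 km/h): GNSS positions advance 2 m every 40 ms.
fixes = [(0.0, 50.0), (2.0, 50.0), (4.0, 50.0)]
accels = [0.0, 0.0, 0.0]
track = upsample_track(fixes, accels)
```

A production system would typically blend the two sources with a filter rather than propagate open-loop, but the sketch shows why the interpolated samples depend on IMU quality between fixes.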

(This article belongs to the Topic Electronic Communications, IOT and Big Data, 2nd Volume)

32 pages, 7334 KB

Open Access Article

Text Semantic Guided Spatial–Frequency Fusion Network for HSI–LiDAR Land-Cover Classification

by Aili Wang, Manman Yao, Haoran Lv and Haisong Chen

Remote Sens. 2026, 18(12), 1957; https://doi.org/10.3390/rs18121957 (registering DOI) - 12 Jun 2026

Abstract

Joint classification of hyperspectral images (HSI) and light detection and ranging (LiDAR) data is important for land-cover recognition, as it can exploit both spectral discrimination and structural elevation information. However, existing methods mainly focus on visual feature fusion and insufficiently utilize class-level semantic priors, which limits their discriminative capability in complex boundaries, visually similar categories, and limited-sample scenarios. To address these issues, this paper proposes a text-guided multimodal semantic fusion network for HSI–LiDAR classification. Specifically, a Channel-Modulated Mobile Convolution Module (CMMC) is designed to extract modality-specific features, a Spatial–Frequency Feature Enhancement Module (SFFE) is introduced to enhance spatial-boundary and frequency-domain structural representations, and a Bidirectional Cross-Modal Fusion Module (BCMF) is developed to promote complementary interaction between spectral and structural information. Meanwhile, class-level textual descriptions are constructed from class names, color attributes, and geographical contexts, and a text encoder is employed to obtain semantic prototypes. Furthermore, a multi-branch vision–text semantic alignment mechanism projects HSI features, LiDAR features, and fused features into a shared semantic space for joint constraints, improving semantic consistency and class separability. Experiments on the Houston2013, Augsburg, and Trento datasets demonstrate the effectiveness of the proposed method. It achieves an overall accuracy of 98.76% on Houston2013, with improvements of 0.62%, 0.52%, and 0.67 in overall accuracy, average accuracy, and Kappa coefficient × 100 over the best competing results, respectively. The proposed method also obtains the best overall metrics on Augsburg and Trento, and ablation studies verify the effectiveness of the proposed components. Full article

(This article belongs to the Special Issue Deep Learning for Multi-Sensor Remote Sensing: Advancements in Image Classification and Semantic Segmentation)

29 pages, 2267 KB

Open Access Article

EdgeElderCare: A Resource-Aware, Scene-Adaptive Edge-Cloud Collaborative System for Long-Term Elderly Safety and Health Monitoring

by Lihao Luo, Yuting Li, Lin Wei, Di Han, Ruifeng Cao, Bo Chen, Yuechen Pan and Yunfan Chen

Electronics 2026, 15(12), 2601; https://doi.org/10.3390/electronics15122601 (registering DOI) - 12 Jun 2026

Abstract

Driven by global population aging, long-term in-home and institutional elderly care faces challenges in delivering continuous, privacy-aware, and resource-efficient safety and health monitoring. Existing edge-based solutions struggle to jointly balance detection accuracy, privacy, and resource overhead during continuous operation, and often have limited situational awareness and inflexible management. We propose EdgeElderCare, a resource-aware, scene-adaptive edge-cloud collaborative system for continuous elderly safety and health monitoring. Its contributions are threefold: (1) a scene-adaptive multi-sensor task-sharing architecture that deploys vision-based fall detection in public areas and privacy-aware millimeter-wave radar in private spaces; combined with edge-side task scheduling, it provides spatially complementary coverage of public and private areas, mitigates the accuracy–privacy conflict, and reduces computing and bandwidth consumption relative to data-level fusion; (2) a lightweight myocardial infarction detection module deployed on an edge platform, enabling local ECG analysis with low resource overhead; (3) a 3D digital-twin edge-cloud management platform that maps multi-source sensing data to a virtual scene in real time and supports hierarchical visual alerting. Experiments in a real nursing home environment show that the system operated stably on resource-constrained edge hardware: UWB positioning achieved centimeter-level RMSE, visual fall detection reached a recall of 0.90, millimeter-wave radar fall detection achieved accuracy and F1 above 0.90, and myocardial infarction detection exceeded 0.99 accuracy on the public PTB/PTB-XL benchmark. These results indicate an engineering-feasible approach to intelligent elderly care. Larger-scale and longer-term validation remains the focus of future work. Full article

(This article belongs to the Special Issue Resource-Aware Edge/on-Device Intelligence for Long-Term Autonomous Mobile Systems)

30 pages, 6714 KB

Open Access Article

Study on a Method for Identifying Particles Causing High-Speed Fluid Wear Based on Multi-Source Information Fusion

by Long Feng, Zhiyu Xiang, Junming Liu, Feng Zhu, Zhenzhen Zhang and Hongxin Xu

Processes 2026, 14(12), 1918; https://doi.org/10.3390/pr14121918 (registering DOI) - 12 Jun 2026

Abstract

Mechanical wear particle recognition is an important approach for equipment health monitoring and fault early warning. However, flow-field disturbances and high-speed particle motion in high-speed fluid environments can lead to image degradation, non-stationary electrostatic signals, and insufficient reliability of single-source recognition methods. Therefore, this study proposes a wear particle recognition method based on multi-source information fusion for high-speed fluid environments. The method establishes a multi-scale electrostatic sensing model to characterize the coupling relationship among particle material properties, motion states, and electrostatic response characteristics. Empirical mode decomposition and independent component analysis are combined for adaptive electrostatic signal denoising, and a Transformer network is used to extract multi-domain features. Meanwhile, an ECA-CNN model with an efficient channel attention mechanism is introduced to enhance the feature representation of degraded particle images. On this basis, a meta-learning-based sample-adaptive decision fusion framework is developed to achieve dynamic and complementary fusion of electrostatic and visual information. The experimental results demonstrate that the proposed method exhibits excellent recognition accuracy and robustness in the tested high-speed fluid environment of 10 m/s, achieving a fusion recognition accuracy of 96.0%, which is significantly superior to single-source recognition methods. Ablation experiments further show that removing the global scaling factor, guidance loss, interpolation loss, and category-specific weight generator decreases the average recognition accuracy by 0.7%, 1.2%, 0.4%, and 1.8%, respectively, confirming the contribution of each key module to fusion recognition performance.
These findings provide a new technical approach for the online intelligent recognition of wear particles under high-speed fluid conditions and offer theoretical support and methodological guidance for condition monitoring, health assessment, and intelligent operation and maintenance of large-scale equipment. Full article

(This article belongs to the Section Process Control, Modeling and Optimization)

15 pages, 2984 KB

Open Access Article

GG-YOLO: A Lightweight Dual-Path Attention Detector with Dynamic Sampling for Dense Wheat Spike Detection

by Guohong Gao, Fucheng Zhou, Lijun Xu, Jiaxin Zhang and Xueyong Li

Agronomy 2026, 16(12), 1156; https://doi.org/10.3390/agronomy16121156 (registering DOI) - 12 Jun 2026

Abstract

Accurate wheat spike detection is essential for crop phenotyping and yield estimation, but real-world field conditions—such as dense spike overlap, environmental domain shifts, and degradation-induced failures like motion blur—pose significant challenges. Achieving robust perception under these circumstances while maintaining a strict accuracy-efficiency trade-off for edge devices remains a pressing research problem. To overcome these limitations, we propose GG-YOLO, a unified lightweight detection framework specifically tailored for complex agricultural environments. Rather than a simple recombination of existing lightweight modules, GG-YOLO integrates three original structural adaptations: First, a Dual-path Attentive Ghost Mechanism (DAGM) introduces gradient-guided attention modulation to enhance feature discrimination and explicitly resolve feature confusion in dense, overlapping regions. Second, a C3Ghost module combines multi-branch aggregation with linear feature generation, mitigating parameter redundancy in the prediction head by approximately 31% compared to the standard YOLOv8s without sacrificing semantic capacity. Third, DSample, a dynamic upsampling operator featuring an original dual-mode adaptive mechanism, robustly recovers fine-grained spatial details during multi-scale feature pyramid fusion. Extensive cross-dataset experiments on the GlobalWheat2020 and HNKJXYwheat datasets validate the model’s exceptional resilience to domain shifts and varying growth stages. GG-YOLO achieves a precision of 94.35%, a recall of 91.93%, and a state-of-the-art mAP@50 of 96.47%. Furthermore, the model contains only 7.89 M parameters and requires 20.4 GFLOPs, reaching an inference speed of 165 FPS on a desktop GPU and a validated real-time speed of 64 FPS on an NVIDIA Jetson edge computing platform. 
These results demonstrate that GG-YOLO establishes a superior accuracy-efficiency frontier, making it highly reliable for real-time field deployment in precision agriculture. Full article

(This article belongs to the Section Precision and Digital Agriculture)

18 pages, 7317 KB

Open Access Article

ASM-DBNet: Introducing Adaptive Differentiable Binarization, Spatial-Channel Self-Attention and Multi-Scale Context-Enhanced Dynamic Upsampling for Natural Scene Text Detection

by Xiaoliang Qian, Pengfei Wang, Li Zeng, Mengyang Chen, Wandian Chen, Jinchao Guo and Yanfang Mao

Information 2026, 17(6), 585; https://doi.org/10.3390/info17060585 - 12 Jun 2026

Abstract

Text detection models based on DBNet have demonstrated strong performance in natural scene text detection. However, these models still suffer from three issues. Firstly, the amplifying factor hyperparameter in differentiable binarization (DB) makes it difficult for the text detection model to achieve optimal performance. Secondly, the integration of low-level and high-level features within the backbone’s feature pyramid lacks specific optimization strategies. Thirdly, the deconvolution operation in the prediction head may damage text contours. To tackle these issues, this paper presents a text detection model termed ASM-DBNet, which mainly consists of three innovations. For the first issue, an adaptive differentiable binarization (ADB) scheme is proposed. It independently predicts an amplifying factor for feature points at different spatial locations and replaces the original amplifying factor hyperparameter, thereby improving the overall optimization performance of the model. For the second issue, a spatial-channel self-attention (SCA) module is proposed to optimize the fusion of high-level and low-level features. On the one hand, spatial self-attention is used to enhance the spatial localization ability of high-level features; on the other hand, channel self-attention based on a grouped transformer is used to optimize the fusion results of high-level and low-level features. For the third issue, a multi-scale context-enhanced dynamic upsampling (MC-DyUpS) module is proposed to replace the deconvolution operation in the prediction head. It enhances contextual perception in the region of interpolation points through multi-scale context feature extraction, and then accurately predicts coordinate offsets of interpolation points. The position correction based on these offsets effectively suppresses the spatial deviation caused by deconvolution.
Ablation studies demonstrate the effectiveness of the SCA module, MC-DyUpS module, ADB scheme, and their arbitrary combinations. Comprehensive quantitative evaluations demonstrate that ASM-DBNet achieves competitive F1-scores of 84.1%, 84.2%, and 85.7% on the ICDAR 2015, Total-Text, and MSRA-TD500 datasets, respectively, with improvements of 1.8%, 1.4%, and 2.9% over the baseline model. Full article
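
The role of the amplifying factor in differentiable binarization can be illustrated with the standard DB formulation, B = 1 / (1 + exp(−k (P − T))), where P is the probability map, T the threshold map, and k the amplifying factor. The sketch below accepts either a single scalar k or a per-pixel map of k values; the per-pixel map is a hand-written stand-in for the factors that the ADB scheme is described as predicting, and the function name and the values are illustrative assumptions only.

```python
import math

# Minimal sketch of differentiable binarization with a scalar or per-pixel
# amplifying factor. Assumption: the per-pixel map is given, not predicted.

def differentiable_binarization(prob, thresh, k):
    """prob, thresh: H x W lists; k: float or H x W list of amplifying factors."""
    binary = []
    for r, (p_row, t_row) in enumerate(zip(prob, thresh)):
        row = []
        for c, (p, t) in enumerate(zip(p_row, t_row)):
            k_rc = k[r][c] if isinstance(k, list) else k
            # Steep sigmoid: approaches a hard step at p == t as k grows.
            row.append(1.0 / (1.0 + math.exp(-k_rc * (p - t))))
        binary.append(row)
    return binary


prob = [[0.9, 0.3], [0.5, 0.52]]
thresh = [[0.5, 0.5], [0.5, 0.5]]

fixed = differentiable_binarization(prob, thresh, 50.0)  # one global factor
k_map = [[50.0, 50.0], [50.0, 200.0]]                    # hypothetical per-pixel factors
adaptive = differentiable_binarization(prob, thresh, k_map)
```

With a single global factor, a pixel just above the threshold receives a soft response; a larger local factor sharpens that pixel's decision without changing the others, which is the kind of flexibility a spatially varying factor provides.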

(This article belongs to the Section Artificial Intelligence)
