
Fine-grained insect pest classification presents a particularly demanding visual recognition challenge due to severe class imbalance, pronounced intra-class morphological variability across developmental stages, and high inter-class visual similarity among taxonomically related species. Existing deep learning approaches typically rely on a single feature representation extracted from a single network depth, overlooking complementary discriminative cues distributed across multiple abstraction levels. Furthermore, classical attention mechanisms perform spatial weighting deterministically, without explicitly modeling the underlying statistical structure of the feature space, which is inherently multimodal on long-tailed benchmarks such as IP102. This study proposes a Multi-Scale Gaussian Mixture Model-Gated Mixture of Experts (GMM-MoE) architecture that operates as a plug-in module insertable into any convolutional backbone, evaluated here on DenseNet-121 at three distinct feature depths. The proposed module computes analytic GMM posterior responsibilities in closed form, softly assigning each spatial location to dedicated convolutional expert sub-networks. At the same time, a conditional prior mechanism π(x) adapts the routing strategy to individual image content rather than relying on fixed priors. The architecture is evaluated on the IP102 benchmark (102 pest classes, ~75,000 images) under a two-stage training protocol. Ablation experiments confirm that increasing the number of experts consistently improves accuracy across all three routing depths, and that multi-scale fusion surpasses any single-scale configuration. The proposed model achieves a mean top-1 accuracy of 74.12% (±0.25%, 95% CI) across three independent runs on the IP102 test set. 
To the best of our knowledge, this is the first work to employ GMM posterior responsibilities as a spatial routing mechanism within a multi-scale CNN feature hierarchy for fine-grained insect pest classification, establishing a principled probabilistic alternative to deterministic attention weighting in visual recognition systems. Full article
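The closed-form GMM posterior responsibilities mentioned in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussians and a fixed prior vector; the function name `gmm_responsibilities` and all numeric values are hypothetical, and the paper's actual module (with its conditional prior π(x) and convolutional experts) is not reproduced here.

```python
import math

def gmm_responsibilities(x, means, variances, priors):
    """Return r_k = pi_k N(x | mu_k, diag(var_k)) / sum_j pi_j N(x | mu_j, diag(var_j))."""
    log_terms = []
    for mu, var, pi in zip(means, variances, priors):
        # Log-density of a diagonal-covariance Gaussian, summed over dimensions.
        log_pdf = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_pdf += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(math.log(pi) + log_pdf)
    # Normalise with the log-sum-exp trick for numerical stability.
    peak = max(log_terms)
    weights = [math.exp(t - peak) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 2-D feature vector and two mixture components (illustrative values only).
feature = [0.9, 1.1]
resp = gmm_responsibilities(
    feature,
    means=[[0.0, 0.0], [1.0, 1.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
    priors=[0.5, 0.5],
)
```

In a gated mixture of experts, soft weights of this kind would scale each expert's output at a given spatial location; how the paper combines them is not stated in the abstract.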


32 pages, 4734 KB

Open Access Article

Multi-Source Remote Sensing–Driven Spatiotemporal Monitoring and SHAP-Based Driver Attribution of Soil Salinization in Arid Northwest China

by Yanrun Ren, Yaonan Zhang, Yufang Min and Yanbo Zhao

Land 2026, 15(6), 903; https://doi.org/10.3390/land15060903 (registering DOI) - 23 May 2026

Abstract

Soil salinization threatens agricultural sustainability in arid zones, yet quantitative attribution of its spatiotemporal dynamics to multi-source drivers remains scarce at regional scales. To address this, we developed an explainable framework merging Sentinel-1/2, ERA5-Land, and topographic-hydrological indices with XGBoost, trained under weak supervision with proxy labels and independently validated using field-measured ECe. A 7-group, 44-feature ensemble with spatial block 5-fold cross-validation ensured robust assessment. SHapley Additive exPlanations (SHAP) quantified driver contributions and enabled a novel dominant driver zoning (DDZ) framework. Monitoring the Hexi Corridor and Tarim Basin (2017–2024) revealed contrasting trajectories: Hexi’s dynamics were primarily climate-driven (Aridity Index), whereas 19.2% of Tarim showed significant salinization along oasis–desert margins co-dominated by elevation, soil indices, and temperature. The model achieved spatial cross-validation R² values around 0.65. DDZ mapping showed climate dominance in 98.2% of Hexi compared to 76.5% in Tarim, where terrain and optical factors were more influential. The weak supervision strategy overcomes scarce in-situ measurements, while the DDZ maps identified that Land-use-dominated zones recorded the highest salinity, offering clear directives for targeted salinity control in arid basins. Full article

(This article belongs to the Section Land Use, Impact Assessment and Sustainability)

26 pages, 2506 KB

Open Access Article

Nationwide Daily Wildfire Occurrence Prediction Using Time Proxy Variables and the Canadian Fire Weather Index (FWI)

by Boksoo Choi and Gye-Young Kim

Fire 2026, 9(6), 217; https://doi.org/10.3390/fire9060217 (registering DOI) - 23 May 2026

Abstract

Climate change has intensified global wildfire risks, yet national-scale prediction remains challenging due to the difficulty of consistently monitoring fuel conditions and human ignition factors. This study introduces calendar-based time proxy variables as structural surrogates for these unobservable drivers and integrates them with the Canadian Fire Weather Index (FWI) within a parsimonious framework for seasonally fire-prone regions such as South Korea. Using 15 years of nationwide wildfire records and daily observations from 100 ASOS stations (2011–2025), predictive performance was evaluated across eight models and five feature sets (Time-only, Weather-only, Weather + Time, FWI-only, and FWI + Time). Based on test-set mean AUC, the Time-only feature set reached 0.7374, clearly exceeding the random-classifier baseline (AUC = 0.5) and indicating the independent predictive value of time proxy variables. Furthermore, integrating time proxies with FWI improved performance, with the best model (CatBoost) achieving test AUC = 0.8394 and Recall = 0.6019. Multi-model SHAP analysis revealed complementary contributions of FWI components (53.7% ± 4.7%) and time proxy variables (46.3% ± 4.7%). Overall, the results demonstrate that a simple yet structured input design based on time proxy variables provides meaningful predictive performance for nationwide wildfire early warning systems. Full article
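For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how the area under the ROC curve can be computed as the probability that a randomly chosen positive day is scored above a randomly chosen negative day. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data (made up): 1 = fire day, 0 = no-fire day.
auc_value = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# A model that gives every day the same score cannot rank days at all.
auc_constant = roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])
```

The constant-score case evaluates to 0.5, which is the random-classifier baseline the abstract refers to.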

(This article belongs to the Topic AI for Natural Disasters Detection, Prediction and Modeling)

23 pages, 5981 KB

Open Access Article

High-Accuracy Prediction of Chunmee Tea Grade via DeepSpectra Model and Near-Infrared Spectroscopy

by Yatong Zhang, Mobing Ren, Xiaohong Wu and Bin Wu

Foods 2026, 15(11), 1848; https://doi.org/10.3390/foods15111848 (registering DOI) - 23 May 2026

Abstract

Chunmee tea quality is critical to its grading, and accurate identification is essential for quality evaluation and market valuation. However, traditional machine learning relies on manual feature extraction and causes spectral information loss, while conventional one-dimensional convolutional neural networks (1D-CNNs) are restricted by fixed kernels and narrow receptive fields, making multi-scale feature capture difficult. In this study, an improved DeepSpectra model integrated with the Inception module and residual connections was proposed for end-to-end automatic grading of Chunmee tea. A total of 360 samples across six grades (60 samples per grade) were collected using an Antaris II near-infrared spectrometer and preprocessed by multiplicative scatter correction (MSC). The proposed model was compared with other models. Results showed that under a 7:1:2 train–validation–test split, the proposed DeepSpectra achieved an average test accuracy of 96.39 ± 1.63% across ten random sample divisions, significantly outperforming the other models (p < 0.05). The model also exhibited excellent stability in five-fold cross-validation and superior generalization in small-sample scenarios, and a lightweight structure with low inference latency of 2.2 ms, which is suitable for real-time industrial applications. This work provides a reliable, efficient, and end-to-end method for grading Chunmee tea and offers a promising strategy for intelligent and rapid quality control of green tea. Full article
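The multiplicative scatter correction (MSC) step named in the abstract can be sketched as follows, assuming the common formulation in which each spectrum is regressed on the mean spectrum and then corrected by the fitted intercept and slope. The function name `msc` and the toy spectra are hypothetical; the authors' exact preprocessing settings are not given in the abstract.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is fitted as s ~ intercept + slope * reference by least
    squares, then replaced by (s - intercept) / slope.
    """
    n_samples = len(spectra)
    n_points = len(spectra[0])
    reference = [sum(s[j] for s in spectra) / n_samples for j in range(n_points)]
    ref_mean = sum(reference) / n_points
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_points
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected

# Toy example (made up): the second spectrum is the first with an additive
# offset and a multiplicative gain, i.e. the distortion MSC is meant to remove.
base = [1.0, 2.0, 4.0, 3.0]
distorted = [2.0 * v + 1.0 for v in base]
corrected = msc([base, distorted])
```

After correction both toy spectra collapse onto the same curve, which is the intended effect of removing scatter-induced offset and gain differences.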

(This article belongs to the Special Issue Hyperspectral Imaging and Other Nondestructive Methods for Analyzing Food Quality)


34 pages, 13304 KB

Open Access Article

Wavelet-Fourier Network Combined with Advanced Preprocessing Techniques for Univariate Daily Rainfall Prediction

by Md. Jobayer Parvez Ratul, Usmi Akter, Tajrian Mollick, Eshrat Jahan Mumu, Nondita Deb Nath, Syeda Wasifa Adila, Wafa Saleh Alkhuraiji, Padam Jee Omar and Mohamed Zhran

Water 2026, 18(11), 1264; https://doi.org/10.3390/w18111264 (registering DOI) - 23 May 2026

Abstract

Rainfall prediction is essential for the enhanced understanding of several issues related to water resources and agriculture, such as flood and drought alerts and flood management. Neural network models are frequently used due to their capability of effectively handling large datasets and addressing the non-stationarity of rainfall data series, resulting in better accuracy and affordable solutions. However, further study is necessary to comprehend the dynamic nature and extreme events of rainfall. Therefore, we implemented a novel wavelet Fourier-enhanced network (W-FENet) that included a Fourier enhancement module (FEMEX) and an improved U-Net mechanism to strengthen the predictive accuracy of daily rainfall. The adopted U-Net structure facilitated efficient multiscale feature extraction and preservation of temporal rainfall information through encoder–decoder connections and residual learning. The results of the developed models for one-day-ahead rainfall prediction were evaluated against two traditional neural network models, i.e., artificial neural networks and long short-term memory networks. Mongla, being a coastal station and having a highly non-linear rainfall pattern, operated by the Bangladesh Meteorological Department, was selected as the study area. Four preprocessing techniques were incorporated to enhance the robustness of the models: empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD), variational mode decomposition (VMD), and successive variational mode decomposition (SVMD). The SVMD-enhanced W-FENet model (abbreviated as W5) demonstrated significant improvements over existing literature with RMSE = 2.226 mm, MAE = 1.131 mm, PCC = 0.988, NSE = 0.974, and WI = 0.993 at the testing phase. Full article
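The error metrics reported above can be made concrete with a short sketch. It assumes that NSE denotes the Nash-Sutcliffe efficiency and WI denotes Willmott's index of agreement, which are the usual meanings in hydrology but are not spelled out in the abstract. The function name `rainfall_metrics` and the sample values are hypothetical.

```python
import math

def rainfall_metrics(observed, predicted):
    """Compute RMSE, MAE, NSE and WI for paired observed/predicted series."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))
    # Spread of the observations around their mean (NSE denominator).
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    # "Potential error" term used by Willmott's index of agreement.
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return {
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        "NSE": 1.0 - sq_err / obs_ss,
        "WI": 1.0 - sq_err / potential,
    }

# Made-up daily rainfall values in mm, for illustration only.
scores = rainfall_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Both NSE and WI equal 1 for a perfect forecast, so values such as NSE = 0.974 and WI = 0.993 indicate predictions close to the observations.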

(This article belongs to the Special Issue Climate Change and Hydrological Processes, 3rd Edition)

26 pages, 3619 KB

Open Access Article

Rapid Detection of Mixed Gases from Lithium Battery Thermal Runaway Based on ISA-LSTM-TCN

by Ruqi Guo, Qian Yu, Hao Li, Zilong Pu and Mingzhi Jiao

Batteries 2026, 12(6), 188; https://doi.org/10.3390/batteries12060188 (registering DOI) - 23 May 2026

Abstract

As new energy vehicles and energy storage systems become more common, safety accidents caused by lithium-ion batteries overheating have become more of a concern. Early detection based on distinctive gases (such as H₂ and CO) can give an earlier warning than typical monitoring methods like temperature, voltage, or impedance. Nonetheless, attaining high-precision identification in intricate mixed-gas settings continues to be difficult because of the considerable cross-sensitivity of metal oxide semiconductor (MOS) gas sensors. This research presents an ISA-LSTM-TCN multi-task learning model utilizing an enhanced spatial attention mechanism for the swift identification and concentration forecasting of distinctive gases during lithium-ion battery thermal runaway. The model improves key feature extraction and anti-noise performance by combining the long-term temporal modeling ability of the Long Short-Term Memory (LSTM) network with the multi-scale feature extraction ability of the Temporal Convolutional Network (TCN). It also adds an Improved Spatial Attention (ISA) module with a residual multiplication structure. Moreover, in a multi-task learning framework, joint optimization of gas categorization and concentration regression is facilitated using a hard parameter-sharing method. Tests using a built MOS sensor array dataset show that the model is 99.23% accurate at classifying gases and that the R² values for predicting H₂ and CO concentrations are 0.9510 and 0.8400, respectively. Tests on public datasets and in different noisy environments show that the model is even better at generalizing and is more robust. The results show that the suggested method allows for quick, accurate detection of thermal runaway gases. This makes it an effective and smart way to monitor battery safety warning systems. Full article

(This article belongs to the Special Issue Advances in Lithium-Ion Battery Safety and Fire: 2nd Edition)


20 pages, 11051 KB

Open Access Article

A Cross-Scale Decoder with Token Refinement for Off-Road Semantic Segmentation

by Seongkyu Choi and Jhonghyun An

Appl. Sci. 2026, 16(11), 5238; https://doi.org/10.3390/app16115238 (registering DOI) - 23 May 2026

Abstract

Off-road semantic segmentation is challenging due to irregular terrain, vegetation clutter, class-level similarity, and ambiguous boundary annotations. Existing decoder designs often rely on compact bottlenecks that oversmooth fine structures or repeated multi-scale fusion that can amplify annotation noise and increase computational cost. To address these limitations, we propose a Cross-Scale Decoder for robust off-road semantic segmentation. The proposed decoder first stabilizes semantic representations through Global–Local Token Refinement (GLTR) on a compact bottleneck lattice. It then selectively incorporates fine-scale structural cues using Boundary-Guided Correction (BGC) and Gated Cross-Scale Interaction (GCS), avoiding dense and repeated feature fusion. In addition, uncertainty-guided class-aware point refinement focuses computation on ambiguous and low-confidence regions. Experiments on standard off-road benchmarks demonstrate that the proposed method improves segmentation accuracy and boundary consistency over existing approaches while maintaining practical inference efficiency. Full article

(This article belongs to the Special Issue Advances in Autonomous Driving: Detection and Tracking)

16 pages, 1495 KB

Open Access Article

DDCATNet: Effective Deep Learning-Based Illumination Color Cast Estimation Approach for Achieving Computational Color Constancy

by Ho-Hyoung Choi

Sensors 2026, 26(11), 3313; https://doi.org/10.3390/s26113313 (registering DOI) - 23 May 2026

Abstract

Digital camera sensors are designed to capture a wide range of incident illuminants, enabling the creation of high-quality images. However, these sensors lack the capability to differentiate between the color of the source illuminant and the actual color (or original color) of the object being captured. For this reason, the computational color constancy (CCC) was introduced and has been developed over decades. The CCC is an approach to modeling the color perception of the human visual system (HVS) by ensuring accurate object color determination under varying source illuminant conditions. At the core of human visual perception (HVP)-based CCC is attaining higher accuracy in scene illuminant estimation. The emergence of deep convolutional neural networks (DCNNs) was a recent innovation in accurate illuminant estimation, fundamentally transforming the CCC research landscape. Nevertheless, accurate illuminant estimation still remains a huge challenge for both traditional and state-of-the-art (SOTA) approaches. To further advance precision in illuminant estimation, this article presents a novel learning-based illumination color cast estimation approach to HVP-based CCC. Most importantly, the proposed approach is intended to integrate informative features into both channel and spatial regions while preserving long-term dependency feature information with the use of dense skip connections. To achieve these objectives, the proposed Dense Dual Connection Aggregated Transform Network (DDCATNet) architecture is designed to comprise several modules: shallow feature extraction, channel-wise and spatial feature-based Dense Dual Connection (DDC), fusion of the dense channel-wise attention (CA) and spatial attention (SA) branches through a gate mechanism (GM) unit, and aggregate transform. 
It is worth noting that both the CA blocks and the SA blocks in the DDC module are characterized by dense and cascading connections, meant to preserve long-term feature information and modulate different-level feature information at both global and local scales. The densely connected CA branch (DCA) and the densely connected SA branch (DSA) are also highly effective in securing high-contribution information while suppressing redundant data. The GM unit is integrated at the back of the DDC module, fusing the two DCA and DSA branches to ensure the adaptive merging of useful hierarchical feature information and the extraction of more valuable feature information. As a result, the proposed DDCATNet architecture significantly enhanced precision in illuminant estimation, thereby improving performance. In rigorous experiments on a wide range of datasets, the proposed DDCATNet approach outperformed its SOTA counterparts, validating the efficacy and generalization capabilities, as well as robust camera-invariance, across diverse, single- and multi-illuminant datasets and model architectures. Full article

(This article belongs to the Section Sensing and Imaging)

15 pages, 3512 KB

Open Access Article

A Robust Multi-Branch CNN-LSTM Architecture for Cross-Subject Motor Imagery Classification

by Simone Zini, Federico Bidone and Paolo Napoletano

Sensors 2026, 26(11), 3310; https://doi.org/10.3390/s26113310 (registering DOI) - 23 May 2026

Abstract

Brain–computer interfaces (BCIs) based on motor imagery (MI) aim to convert electroencephalographic (EEG) activity into reliable device commands across users and recording setups. However, low signal-to-noise ratio and strong inter-subject variability still limit true “plug-and-play” deployment without lengthy calibration. To address these challenges, we propose a multi-branch convolutional long short-term memory (CNN-LSTM) architecture that jointly performs multi-scale temporal feature extraction and within-trial sequence modeling. The model employs four parallel 1D convolutional branches with distinct kernel sizes, each followed by an LSTM module and late fusion, combined with group normalization and supervision over sequences of sub-windows within each trial. We evaluate the approach on the EEG Motor Movement/Imagery (EEGMMI) dataset from PhysioNet under strictly subject-independent conditions, and on the ISLab-MI Dataset, a 32-channel wearable-EEG collection designed to assess cross-setup robustness. On EEGMMI, the network achieves up to 82.63% accuracy for binary left/right MI and 74.10% for a four-class task using 4 s trials under 5-fold cross-validation, outperforming an EEGNet-style baseline by 1–10% depending on class count and window length. Under a leave-one-subject-out protocol, the model attains 74.9% mean accuracy for a three-class MI task. Zero-shot transfer to ISLab-MI yields 64.60% and 63.02% accuracy in three- and four-class settings, respectively, while brief subject-specific fine-tuning using only 20% of each session improves performance to 81.38% and 73.48%. These findings show that combining multi-scale convolutional feature extraction with explicit sequence modeling and robust normalization yields accurate, data-efficient, and portable MI decoders suitable for practical BCI applications. Full article

(This article belongs to the Section Biomedical Sensors)


19 pages, 5072 KB

Open Access Article

MDCL-DETR: Multi-Domain Enhancement and Cross-Layer Feature Fusion for Small Object Detection

by Tianran Hao, Xiao Zhang and Bing Zhou

Sensors 2026, 26(11), 3305; https://doi.org/10.3390/s26113305 - 22 May 2026

Abstract

Small object detection in uncrewed aerial vehicle (UAV) imagery is hindered by limited pixels, insufficient detailed information, and strong background interference, leading to weak feature representation and poor contextual modeling. To address these issues, we propose a multi-domain enhancement and cross-layer feature fusion detection Transformer (MDCL-DETR) with progressive feature processing. First, a multi-domain enhancement module (MDEM) based on CSP (cross stage partial) structure is proposed, which fuses spatial and frequency-domain features in a lightweight manner to enhance object detail and global structures while effectively distinguishing object features from background interference. Second, a cross-layer feature extraction module (CLEM) is introduced to aggregate multi-scale features across layers, alleviate information loss caused by downsampling, and preserve spatial details of small objects while integrating high-level contextual semantics. Meanwhile, a gated Mamba fusion module (GMFM) is proposed, which adopts the Mamba architecture for long-range dependency modeling of multi-scale features and integrates a gating mechanism to realize the dynamic weighted fusion of local details and global context, further improving feature discriminability and global modeling capability. Finally, a fine-grained enhancement module (FGEM) is designed, which leverages feature reorganization and adaptive feature extraction to reinforce and compensate fine-grained features. Extensive experimental results validate the effectiveness and generalization of the proposed method, achieving mAP50 scores of 54.1% and 56.2% on the VisDrone2019 and AI-TOD datasets. Full article
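The mAP50 metric quoted above counts a detection as correct when its box overlaps a ground-truth box with an intersection over union (IoU) of at least 0.5. A minimal sketch of that matching criterion follows; the function name `box_iou`, the `(x1, y1, x2, y2)` box layout, and the sample boxes are assumptions for illustration, and the full average-precision computation is not shown.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Hypothetical ground-truth box and two candidate detections.
truth = (0.0, 0.0, 10.0, 10.0)
loose = (5.0, 0.0, 15.0, 10.0)   # overlaps half of the ground truth
tight = (2.0, 0.0, 12.0, 10.0)   # shifted by only two pixels

iou_loose = box_iou(truth, loose)
iou_tight = box_iou(truth, tight)
is_match_loose = iou_loose >= 0.5
is_match_tight = iou_tight >= 0.5
```

For small objects a shift of a few pixels changes IoU sharply, which is one reason small-object benchmarks are demanding under this criterion.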

(This article belongs to the Section Sensing and Imaging)

19 pages, 1326 KB

Open Access Article

A Lightweight Network for Encrypted Traffic Classification Based on Convolutional Positional Encoding and Efficient Multi-Scale Attention

by Yuan Feng, Yifan Ren, Jianwei Zhang, Zengyu Cai, Juncheng Yang and Liang Zhu

Electronics 2026, 15(11), 2248; https://doi.org/10.3390/electronics15112248 - 22 May 2026

Abstract

Network traffic classification is a cornerstone of network management and security. Addressing the challenges of feature extraction in encrypted traffic and the deployment limitations of traditional deep learning models on resource-constrained edge devices due to their large parameter sizes, this paper proposes a lightweight network for encrypted traffic classification, termed CEMA-Net (Convolutional Positional Encoding and Efficient Multi-scale Attention Network). Specifically, the proposed model integrates an Efficient Multi-scale Attention (EMA) mechanism with a Convolutional Positional Encoding (CPE) strategy to jointly capture global dependencies and local contextual information. To enable efficient adaptation to traffic data, an Efficient Multi-scale Attention Adapter (EMAAdapter) is designed, which reconstructs one-dimensional traffic sequences into a pseudo-2D representation and extracts horizontal, vertical, and local features in parallel. This design facilitates effective modeling of complex cross-scale dependencies in encrypted traffic with minimal computational overhead. Experimental results on three public datasets demonstrate that the proposed method, with only 0.66 M parameters, achieves superior classification performance compared with mainstream vision-based models such as ResNet-101, while significantly reducing computational cost. These results highlight the effectiveness of combining convolutional positional encoding with multi-scale attention mechanisms and provide an efficient solution for encrypted traffic classification in resource-constrained environments. Full article

(This article belongs to the Section Networks)

22 pages, 1543 KB

Open Access Article

Bridging Annotation Gaps: Hierarchical Self-Support Learning for Brain Tumor Segmentation

by Saqib Qamar, Mohd Fazil and Zubair Ashraf

Diagnostics 2026, 16(11), 1588; https://doi.org/10.3390/diagnostics16111588 - 22 May 2026

Abstract

Background: Accurate brain tumor segmentation from Magnetic Resonance Imaging (MRI) depends on the fusion of multiple complementary modalities. However, clinical practice often faces incomplete modality sets due to acquisition failures, patient contraindications, or protocol variations. Current methods either treat each modality feature extractor in isolation or depend on computationally expensive teacher networks for cross-modal knowledge transfer. Objective: This paper presents Hierarchical Adaptive Group Self-Support Learning with Boundary-Aware Calibration (HAGSS), a framework that overcomes three key limitations of existing group self-support methods: static group formation that ignores temporal prediction quality, uniform treatment of boundary and interior voxels, and distribution mismatch across heterogeneous modality logits. Methods: We propose a hierarchical adaptive group formation mechanism that reassigns group leader roles at each epoch based on voxel-level prediction confidence scores instead of fixed sensitivity priors. We also introduce a boundary-aware calibration module that applies spatially varied distillation weights with greater emphasis on tumor boundary regions. In addition, we design a cross-scale consistency regularization term that enforces agreement between multi-resolution predictions to stabilize the self-support target. Results: Experiments on BraTS2020, BraTS2018, and BraTS2021 datasets show that HAGSS achieves consistent improvements over state-of-the-art baselines. The average Dice gains across the whole tumor, tumor core, and enhancing tumor regions reach 1.30% on BraTS2020 and 1.61% on BraTS2021 compared to existing methods. All improvements are statistically significant (p < 0.05). Conclusions: HAGSS operates exclusively during training, adds no parameters or inference cost, and can be applied as a plug-in module to any multi-encoder incomplete multi-modal segmentation architecture. Code is publicly available at GitHub. Full article
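The Dice gains reported above refer to the Dice similarity coefficient between a predicted mask and a reference mask. A minimal sketch on flattened binary masks is shown below; the function name `dice_coefficient`, the choice to return 1.0 when both masks are empty, and the toy masks are assumptions for illustration rather than the evaluation code used in the study.

```python
def dice_coefficient(prediction, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flat binary masks of equal length."""
    if len(prediction) != len(target):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(prediction, target) if p == 1 and t == 1)
    size_sum = sum(prediction) + sum(target)
    # Assumed convention: two empty masks agree perfectly.
    if size_sum == 0:
        return 1.0
    return 2.0 * overlap / size_sum

# Toy flattened masks (made up): 1 = tumor voxel, 0 = background.
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 1, 0]
dice_value = dice_coefficient(predicted_mask, reference_mask)
dice_perfect = dice_coefficient(reference_mask, reference_mask)
```

On BraTS-style tasks the coefficient is typically computed separately for each tumor sub-region and then averaged, which matches the per-region averaging described in the abstract.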

(This article belongs to the Special Issue 3rd Edition: AI/ML-Based Medical Image Processing and Analysis)

30 pages, 3472 KB

Open Access Article

Dynamic Recency-Weighted Multi-Scale PatchTST with Physically Motivated Statistical Anchors for Robust BDS-3 Clock Bias Prediction

by Chengling Cai, Shuai Wang, Shaohui Li, Weijia Huang and Kun Xie

Eng 2026, 7(6), 252; https://doi.org/10.3390/eng7060252 - 22 May 2026

Abstract

High-precision satellite clock offset prediction is a core prerequisite for the BeiDou-3 Global Navigation Satellite System to achieve precise single-point positioning and timing. However, because of space radiation and the physical aging of the clock itself, the operational state of onboard atomic clocks exhibits a high degree of physical heterogeneity and time-varying drift characteristics. Traditional physical models struggle to capture complex nonlinear residuals, while existing deep learning methods often face boundary discontinuities caused by baseline separation when handling long-sequence forecasts. Furthermore, channel crosstalk in multivariate prediction and insufficient sensitivity to dynamic multiscale features limit the robustness of long-term predictions. To address these issues, this paper proposes a clock offset prediction architecture that integrates physically motivated statistical constraints with dynamic adaptive feature learning. Extensive experiments conducted using real BDS-3 precise clock difference products provided by Wuhan University demonstrate that the proposed method effectively mitigates the performance degradation often observed in existing models on heterogeneous satellites during the evaluated period. In the 24-h extrapolation task, the architecture achieved an average root-mean-square error as low as 0.507 ns, significantly improving prediction accuracy. It outperformed mainstream physical models and advanced deep learning baseline algorithms, providing a promising framework with good interpretability for high-precision clock error forecasting under dynamic space weather conditions. Full article

27 pages, 1685 KB

Open Access Article

EMWMS-YOLO: Efficient Multi-Scale Detection Framework for Small Objects in Challenging Remote Sensing Scenes

by Shuo Tian, Yuguo Li, Jian Li, Wenzheng Sun, Longfa Chen and Na Meng

Remote Sens. 2026, 18(11), 1682; https://doi.org/10.3390/rs18111682 - 22 May 2026

Abstract

Nowadays, remote sensing images are characterized by significant scale variations, a high density of small targets, and complex background conditions, which pose substantial challenges for small-object detection. To address these issues, we propose EMWMS-YOLO, a lightweight and efficient detection framework built upon YOLOv11n. Specifically, an Efficient Multi-Scale Cross-Layer Extraction (EMSCLE) backbone is designed by integrating the Dual-Branch Feature Extraction (DBFE), Multi-Scale Feature Perception (MSFP), and Spatial Pyramid Pooling Fast with Large Separable Kernel Attention (SPPF-LSKA) modules, enabling effective multi-scale feature extraction and cross-channel interaction. Furthermore, a Multi-Scale Adaptive Feature Fusion (MSAFF) neck architecture, composed of the Channel-Enhanced Convolution (CEC) and Multi-Scale Gated Feature Fusion (MSGFF) modules, is introduced to dynamically fuse cross-scale features and enhance salient target responses while suppressing background noise. In addition, the WaveletPool module replaces conventional pooling operations to reduce information loss and feature aliasing while preserving structural details. A Detect-MultiSEAM detection head is constructed by embedding a multi-scale spatial enhancement attention mechanism, which improves feature representation under complex conditions and reduces missed detections and false positives. Finally, the ShapeIoU loss function is employed to better model geometric and morphological properties, thereby improving localization accuracy. Experimental results on the VEDAI and NWPU-VHR-10 datasets demonstrate that the proposed method achieves improvements of 9.8% and 4.1% in mAP50 over the YOLOv11n baseline, respectively, verifying its effectiveness in small-object detection. Full article

(This article belongs to the Section Remote Sensing Image Processing)

23 pages, 1978 KB

Open Access Article

A Multi-Scale Attention-Enhanced YOLOv26 Framework for Steel Structure Corrosion Detection and Segmentation

by Hongmei Hou, Zhixin Wang, Jianbo Zheng, Jinzhen Xi and Libin Tian

Buildings 2026, 16(11), 2057; https://doi.org/10.3390/buildings16112057 - 22 May 2026

Abstract

Steel structures in complex service environments are highly susceptible to corrosion, making accurate detection challenging. This study proposes an improved YOLOv26-based method for corrosion damage segmentation. A diverse dataset is constructed by combining field-collected and public data with varying lighting conditions and multi-scale features. Enhancements to the YOLOv26-seg architecture include integrating Efficient Channel Attention (ECA) in the backbone to strengthen low-contrast feature representation, designing a multi-branch attention mechanism (ECA + CBAM) in the detection head to improve small- and medium-scale target recognition, and introducing Selective Kernel Attention (SKA) in the segmentation branch to refine boundary details. The resulting YOLOv26-ECS model achieves an mAP50 of 0.920 and mAP50–95 of 0.851 on the self-constructed dataset, outperforming the baseline by 5.0% and 6.0%, respectively, while maintaining 28.34 FPS. Experiments on public datasets further demonstrate strong generalization. A GUI system is also developed for visualization and practical deployment. Overall, the proposed method delivers accurate and efficient corrosion detection and segmentation, showing strong potential for engineering applications. Full article

(This article belongs to the Section Building Structures)


Displaying article 1-50 on page 1 of 186.


Search Results (9,251)
