Search Results (331)

Search Parameters:
Keywords = multiscale dilated convolution

16 pages, 1218 KB  
Article
A Gradient-Compensated Feature Learning Network for Infrared Small Target Detection
by Yanwei Wang, Haitao Zhang, Xiangyue Zhang and Xinhao Zheng
Electronics 2026, 15(4), 868; https://doi.org/10.3390/electronics15040868 - 19 Feb 2026
Abstract
Infrared small target detection under complex backgrounds remains challenging due to the extremely small target size and low contrast with the surrounding background. These factors make contour information difficult to extract and often cause target features to attenuate or disappear during deep feature learning. To address these issues, this paper proposes a Gradient-Compensated Feature Learning Network (GCFLNet). GCFLNet adopts a multi-module collaborative design to enhance feature representation and fusion. First, an Edge Enhancement Module (EEM) is introduced to accurately capture fine-grained edge information of infrared small targets while suppressing background noise through smoothing operations. This provides reliable structural cues for subsequent feature extraction. Second, the extracted edge features are embedded into a Global–Local Feature Interaction (GLFI) module, which pairs self-attention-style interactions with dilated convolutions to strengthen global semantic dependencies and local detail representation, enabling effective enhancement of target features. In addition, a Multi-Scale Information Compensation (MSIC) module is designed to exploit the complementary characteristics of multi-scale features across spatial and channel dimensions, guiding efficient fusion of high-level and low-level information. Experimental results on the NUDT and IRSTD-1K datasets demonstrate that GCFLNet outperforms existing state-of-the-art methods, achieving higher detection accuracy and robustness for infrared small targets in complex backgrounds.
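The edge-enhancement idea above (gradient cues plus smoothing-based noise suppression) can be illustrated with a minimal PyTorch sketch. This is a hypothetical reading of the EEM, not the authors' code: fixed Sobel filters supply the gradients, average pooling supplies the smoothing, and all module and parameter names are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeEnhanceSketch(nn.Module):
    """Hypothetical edge-enhancement block in the spirit of GCFLNet's EEM:
    fixed depthwise Sobel filters extract gradients, an average-pooling branch
    smooths away background noise, and a 1x1 conv fuses both back in."""
    def __init__(self, channels: int):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        kernel = torch.stack([sobel_x, sobel_x.t()]).unsqueeze(1)  # (2,1,3,3)
        # One fixed (non-learned) Sobel pair per input channel.
        self.register_buffer("sobel", kernel.repeat(channels, 1, 1, 1))
        self.fuse = nn.Conv2d(channels * 3, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Depthwise conv: each channel gets its own sobel_x / sobel_y pair.
        grad = F.conv2d(x, self.sobel, padding=1, groups=c).view(b, c, 2, h, w)
        edge = torch.sqrt(grad[:, :, 0] ** 2 + grad[:, :, 1] ** 2 + 1e-6)
        smooth = F.avg_pool2d(x, 3, stride=1, padding=1)  # noise suppression
        return self.fuse(torch.cat([x, edge, smooth], dim=1)) + x

x = torch.randn(1, 16, 64, 64)
print(EdgeEnhanceSketch(16)(x).shape)  # torch.Size([1, 16, 64, 64])
```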

27 pages, 18819 KB  
Article
DSAFNet: Dilated–Separable Convolution and Attention Fusion Network for Real-Time Semantic Segmentation
by Wencong Lv, Xin Liu, Jianjun Zhang, Dongmei Luo and Ping Han
Electronics 2026, 15(4), 866; https://doi.org/10.3390/electronics15040866 - 19 Feb 2026
Abstract
Real-time semantic segmentation has been widely adopted in resource-constrained applications such as mobile devices, autonomous driving, and drones due to its high efficiency. However, existing lightweight networks often compromise segmentation accuracy to reduce parameter count and improve inference speed. To achieve an optimal balance among accuracy, latency, and model size, we propose the Dilated–Separable Convolution and Attention Fusion Network (DSAFNet), a lightweight real-time semantic segmentation network based on an asymmetric encoder–decoder framework. DSAFNet integrates three core components: (i) the Double-Layer Multi-Branch Depthwise Convolution (DL-MBDC) module, which fuses channel splitting and multi-branch depthwise convolutions to efficiently extract multi-scale features with minimal parameters; (ii) the Multi-scale Dilated Fusion Attention (MDFA) module, which utilizes factorized dilated convolutions and channel–spatial collaborative attention to expand the receptive field and reinforce key contextual features; (iii) the Multi-scale Attention Lightweight Decoder (MALD), which integrates multi-scale feature maps to generate attention-guided segmentation results. Experiments conducted on an RTX 3090 platform demonstrate that DSAFNet, with only 1.00 M parameters, achieves 74.78% mIoU at 74.74 FPS on the Cityscapes dataset and 70.5% mIoU at 89.5 FPS on the CamVid dataset.
(This article belongs to the Section Artificial Intelligence)
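As a rough illustration of the DL-MBDC recipe (channel splitting plus multi-branch depthwise convolutions), here is a minimal PyTorch sketch; the layer layout, dilation rates, and names are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class SplitMultiBranchDW(nn.Module):
    """Hypothetical channel-split multi-branch depthwise block: split the
    channels, run each chunk through a depthwise conv at a different
    dilation, then mix everything with a cheap pointwise conv."""
    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        assert channels % len(dilations) == 0
        c = channels // len(dilations)
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=d, dilation=d, groups=c)  # depthwise
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels, channels, 1)  # pointwise mixing
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        chunks = x.chunk(len(self.branches), dim=1)
        out = torch.cat([b(ch) for b, ch in zip(self.branches, chunks)], dim=1)
        return self.act(self.fuse(out) + x)  # residual keeps gradients stable

print(SplitMultiBranchDW(64)(torch.randn(1, 64, 32, 32)).shape)
```

Splitting before the depthwise branches is the main parameter saver: each branch only touches channels/len(dilations) channels, so the multi-scale context comes almost for free.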

19 pages, 3610 KB  
Article
LCS-Net: Learnable Color Correction and Selective Multi-Scale Fusion for Underwater Image Enhancement
by Gang Li and Xiangfei Zhao
Sensors 2026, 26(4), 1323; https://doi.org/10.3390/s26041323 - 18 Feb 2026
Abstract
Underwater images are frequently degraded by wavelength-dependent absorption and scattering, which introduce strong color casts, reduce contrast, and obscure fine structures. Although learning-based enhancement methods have recently improved perceptual quality, many remain computationally intensive, limiting deployment on resource-constrained underwater platforms. To address this challenge, we propose LCS-Net, a lightweight framework for single underwater image enhancement that targets a favorable quality–efficiency trade-off. LCS-Net first applies a dynamic Learnable Color Correction Module (LCCM) that predicts image-specific correction parameters from global color statistics, enabling low-overhead cast compensation and stabilizing the input distribution. Feature extraction is conducted using efficient inverted residual blocks equipped with squeeze-and-excitation (SE) to recalibrate channel responses and facilitate detail recovery under scattering-induced degradation. At the bottleneck, a Selective Multi-Scale Dilated Block (SMSDB) aggregates complementary context via parallel dilated convolutions and global cues and adaptively reweights the fused features to handle diverse water conditions. Extensive experiments on public benchmarks demonstrate that LCS-Net achieves competitive performance, yielding a PSNR of 26.46 dB and an SSIM of 0.92 on UIEB, along with 28.71 dB and 0.86 on EUVP, while maintaining a compact model size and low computational cost, highlighting its potential for practical deployment.
(This article belongs to the Section Sensing and Imaging)
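The SMSDB combination of parallel dilated convolutions, global cues, and adaptive reweighting can be sketched as a selective-kernel-style block. The following PyTorch snippet is an assumed reading of that design; the branch count, gating layout, and names are hypothetical.

```python
import torch
import torch.nn as nn

class SelectiveDilatedBlock(nn.Module):
    """Selective multi-scale dilated block sketch: parallel dilated convs
    provide multi-scale context, and a global descriptor predicts softmax
    weights that adaptively reweight the branches before fusion."""
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.gate = nn.Sequential(          # global cue -> per-branch weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(dilations), 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B,K,C,H,W)
        w = self.gate(x).unsqueeze(2)                              # (B,K,1,1,1)
        return x + (feats * w).sum(dim=1)   # weighted fusion plus residual

print(SelectiveDilatedBlock(32)(torch.randn(2, 32, 24, 24)).shape)
```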

39 pages, 9763 KB  
Article
SAR-DRBNet: Adaptive Feature Weaving and Algebraically Equivalent Aggregation for High-Precision Rotated SAR Detection
by Lanfang Lei, Sheng Chang, Zhongzhen Sun, Xinli Zheng, Changyu Liao, Wenjun Wei, Long Ma and Ping Zhong
Remote Sens. 2026, 18(4), 619; https://doi.org/10.3390/rs18040619 - 16 Feb 2026
Abstract
Synthetic aperture radar (SAR) imagery is widely used for target detection in complex backgrounds and adverse weather conditions. However, high-precision detection of rotated small targets remains challenging due to severe speckle noise, significant scale variations, and the need for robust rotation-aware representations. To address these issues, we propose SAR-DRBNet, a high-precision rotated small-target detection framework built upon YOLOv13. First, we introduce a Detail-Enhanced Oriented Bounding Box detection head (DEOBB), which leverages multi-branch enhanced convolutions to strengthen fine-grained feature extraction and improve oriented bounding box regression, thereby enhancing rotation sensitivity and localization accuracy for small targets. Second, we design a Ck-MultiDilated Reparameterization Block (CkDRB) that captures multi-scale contextual cues and suppresses speckle interference via multi-branch dilated convolutions and an efficient reparameterization strategy. Third, we propose a Dynamic Feature Weaving module (DynWeave) that integrates global–local dual attention with dynamic large-kernel convolutions to adaptively fuse features across scales and orientations, improving robustness in cluttered SAR scenes. Extensive experiments on three widely used SAR rotated object detection benchmarks (HRSID, RSDD-SAR, and DSSDD) demonstrate that SAR-DRBNet achieves a strong balance between detection accuracy and computational efficiency compared with state-of-the-art oriented bounding box detectors, while exhibiting superior cross-dataset generalization. These results indicate that SAR-DRBNet provides an effective and reliable solution for rotated small-target detection in SAR imagery.
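The "algebraically equivalent aggregation" of dilated branches rests on a standard reparameterization fact: a 3×3 convolution with dilation 2 equals a 5×5 convolution whose kernel is the 3×3 weights scattered on a dilated grid, so parallel branches can be folded into one dense kernel at inference. The sketch below verifies this identity in PyTorch; it illustrates the general trick, not the CkDRB itself.

```python
import torch
import torch.nn.functional as F

def merge_dilated_branch(dense5, w3, dilation=2):
    """Fold a dilation-2 3x3 branch into a dense 5x5 kernel: scatter the 3x3
    weights onto every other position of a zero 5x5 kernel and sum. This is
    a generic sketch of the reparameterization idea, not the authors' code."""
    sparse5 = torch.zeros_like(dense5)
    sparse5[:, :, ::dilation, ::dilation] = w3   # taps at offsets -2, 0, +2
    return dense5 + sparse5

x = torch.randn(1, 8, 32, 32)
dense5 = torch.randn(8, 8, 5, 5)
w3 = torch.randn(8, 8, 3, 3)

# Two-branch, training-time form ...
y_branches = F.conv2d(x, dense5, padding=2) + F.conv2d(x, w3, padding=2, dilation=2)
# ... equals a single merged inference-time convolution.
y_merged = F.conv2d(x, merge_dilated_branch(dense5, w3), padding=2)
print(torch.allclose(y_branches, y_merged, atol=1e-4))  # True
```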

19 pages, 2559 KB  
Article
A CPO-Optimized BiTCN–BiGRU–Attention Network for Short-Term Wind Power Forecasting
by Liusong Huang, Adam Amril bin Jaharadak, Nor Izzati Ahmad and Jie Wang
Energies 2026, 19(4), 1034; https://doi.org/10.3390/en19041034 - 15 Feb 2026
Abstract
Short-term wind power prediction is pivotal for maintaining the stability of power grids characterized by high renewable energy penetration. However, wind power time series exhibit complex characteristics, including local turbulence-induced fluctuations and long-term temporal dependencies, which challenge traditional forecasting models. Furthermore, the performance of hybrid deep learning models is often compromised by the difficulty of tuning hyperparameters over non-convex optimization surfaces. To address these challenges, this study proposes a novel framework, CPO–BiTCN–BiGRU–Attention. Adopting a physically motivated "Filter–Memorize–Focus" strategy, the model first employs a Bidirectional Temporal Convolutional Network (BiTCN) with dilated causal convolutions to extract multi-scale local features and denoise raw data. Subsequently, a Bidirectional Gated Recurrent Unit (BiGRU) captures global temporal evolution, while an attention mechanism dynamically weights critical time steps corresponding to ramp events. To mitigate hyperparameter uncertainty, the Crowned Porcupine Optimization (CPO) algorithm is introduced to adaptively tune the network structure, balancing global exploration and local exploitation more effectively than traditional swarm algorithms. Experimental results obtained from real-world wind farm data in Xinjiang, China, demonstrate that the proposed model consistently outperforms state-of-the-art benchmark models. Compared with the best competing methods, the proposed framework reduces MAE and MAPE by approximately 30–45%, while maintaining competitive RMSE performance, indicating improved average forecasting accuracy and robustness under varying operating conditions. The results confirm that the proposed architecture effectively decouples local noise from global trends, providing a robust and practical solution for short-term wind power forecasting in grid dispatching applications.
(This article belongs to the Section A3: Wind, Wave and Tidal Energy)
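The dilated causal convolution at the heart of a TCN branch is easy to sketch: left-only padding preserves causality, and doubling the dilation per block grows the receptive field exponentially. The PyTorch block below is a generic single-direction sketch; a bidirectional TCN would additionally run the same stack over a time-reversed copy. Names and sizes are illustrative only.

```python
import torch
import torch.nn as nn

class DilatedCausalBlock(nn.Module):
    """Generic dilated causal conv block: padding only on the left by
    (k-1)*d means output t never sees inputs after t, and stacking blocks
    with d = 1, 2, 4, ... covers long histories cheaply."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # causal left padding
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                          # x: (batch, channels, time)
        y = nn.functional.pad(x, (self.pad, 0))    # pad the past side only
        return self.act(self.conv(y)) + x          # residual connection

tcn = nn.Sequential(*[DilatedCausalBlock(16, dilation=d) for d in (1, 2, 4, 8)])
print(tcn(torch.randn(4, 16, 96)).shape)  # torch.Size([4, 16, 96])
```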

20 pages, 2405 KB  
Article
Confidence-Guided Adaptive Diffusion Network for Medical Image Classification
by Yang Yan, Zhuo Xie and Wenbo Huang
J. Imaging 2026, 12(2), 80; https://doi.org/10.3390/jimaging12020080 - 14 Feb 2026
Abstract
Medical image classification is a fundamental task in medical image analysis and underpins a wide range of clinical applications, including dermatological screening, retinal disease assessment, and malignant tissue detection. In recent years, diffusion models have demonstrated promising potential for medical image classification owing to their strong representation learning capability. However, existing diffusion-based classification methods often rely on oversimplified prior modeling strategies, which fail to adequately capture the intrinsic multi-scale semantic information and contextual dependencies inherent in medical images. As a result, the discriminative power and stability of feature representations are constrained in complex scenarios. In addition, fixed noise injection strategies neglect variations in sample-level prediction confidence, leading to uniform perturbations being imposed on samples with different levels of semantic reliability during the diffusion process, which in turn limits the model’s discriminative performance and generalization ability. To address these challenges, this paper proposes a Confidence-Guided Adaptive Diffusion Network (CGAD-Net) for medical image classification. Specifically, a hybrid prior modeling framework is introduced, consisting of a Hierarchical Pyramid Context Modeling (HPCM) module and an Intra-Scale Dilated Convolution Refinement (IDCR) module. These two components jointly enable the diffusion-based feature modeling process to effectively capture fine-grained structural details and global contextual semantic information. Furthermore, a Confidence-Guided Adaptive Noise Injection (CG-ANI) strategy is designed to dynamically regulate noise intensity during the diffusion process according to sample-level prediction confidence. Without altering the underlying discriminative objective, CG-ANI stabilizes model training and enhances robust representation learning for semantically ambiguous samples. Experimental results on multiple public medical image classification benchmarks, including HAM10000, APTOS2019, and Chaoyang, demonstrate that CGAD-Net achieves competitive performance in terms of classification accuracy, robustness, and training stability. These results validate the effectiveness and application potential of confidence-guided diffusion modeling for two-dimensional medical image classification tasks, and provide valuable insights for further research on diffusion models in the field of medical image analysis.
(This article belongs to the Section Medical Imaging)
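Confidence-guided noise injection can be pictured as scaling the diffusion perturbation by each sample's softmax confidence. The sketch below is only a plausible reading of CG-ANI; the abstract does not specify the schedule, so the direction and range of the scaling are assumptions, and all names are hypothetical.

```python
import torch

def confidence_scaled_noise(features, logits, s_min=0.1, s_max=1.0):
    """Hypothetical confidence-guided noise injection: high-confidence
    samples receive stronger Gaussian noise, semantically ambiguous
    (low-confidence) samples are perturbed more gently. The linear
    schedule between s_min and s_max is an assumption, not the paper's."""
    conf = torch.softmax(logits, dim=1).max(dim=1).values       # (B,)
    scale = s_min + (s_max - s_min) * conf                      # (B,)
    noise = torch.randn_like(features)
    # Broadcast the per-sample scale over the remaining feature dims.
    return features + scale.view(-1, *([1] * (features.dim() - 1))) * noise

feats = torch.randn(8, 256)
logits = torch.randn(8, 7)   # e.g., 7 skin-lesion classes on HAM10000
print(confidence_scaled_noise(feats, logits).shape)  # torch.Size([8, 256])
```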

29 pages, 9489 KB  
Article
Lightweight Gearbox Fault Diagnosis Under High Noise Based on Improved Multi-Scale Depthwise Separable Convolution and Efficient Channel Attention
by Xiubin Liu, Wei Li, Haoming Li, Yong Zhu and Ramesh K. Agarwal
Sensors 2026, 26(4), 1196; https://doi.org/10.3390/s26041196 - 12 Feb 2026
Abstract
Gearbox fault diagnosis under strong-noise conditions remains challenging due to the difficulty of extracting weak fault-related features from noise-dominated vibration signals, inefficient modeling of multi-scale impulsive characteristics under limited computational resources, and degraded diagnostic stability across varying noise levels. To address these issues, this paper proposes a lightweight fault diagnosis model (DSMC-ECA) that integrates an improved multi-scale depthwise separable convolution scheme with efficient channel attention. The proposed model adopts a dual-branch parallel feature extraction architecture: the SMC branch captures local fine-grained impulsive features, while the SMDC branch expands the receptive field via multi-scale separable dilated convolutions to model long-range dependencies. Meanwhile, ECA is embedded into the multi-scale features for channel-wise recalibration, highlighting fault-relevant discriminative information and suppressing noise disturbances. The model contains only 0.204 M parameters and requires 10.037 M FLOPs, achieving a favorable trade-off between performance and efficiency. Experimental results on the XJTU and SEU datasets demonstrate that DSMC-ECA consistently outperforms baseline methods across a wide range of signal-to-noise ratios (from −6 dB to noise-free conditions). Notably, under the most challenging −6 dB setting, it achieves the highest average diagnostic accuracies of 95.11% (XJTU) and 86.84% (SEU).
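ECA here refers to the widely used efficient channel attention design (Wang et al., 2020): a global average pool followed by a tiny 1D convolution across the channel descriptor, giving channel-wise recalibration with only about k parameters. Below is a generic 1D-signal version as a sketch of that component, assuming PyTorch; it is not the paper's implementation.

```python
import torch
import torch.nn as nn

class ECA1d(nn.Module):
    """Efficient channel attention for 1D vibration features: squeeze each
    channel to a scalar, slide a k-tap 1D conv across the channel axis to
    model local cross-channel interaction, then gate with a sigmoid."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                     # x: (B, C, L)
        y = x.mean(dim=2)                     # squeeze: (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # 1D conv over channels
        return x * torch.sigmoid(y).unsqueeze(-1)  # channel recalibration

print(ECA1d()(torch.randn(2, 64, 1024)).shape)  # torch.Size([2, 64, 1024])
```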

16 pages, 3837 KB  
Article
DKTransformer: An Accurate and Efficient Model for Fine-Grained Food Image Classification
by Hongjuan Wang, Chenxi Wang and Xinjun An
Sensors 2026, 26(4), 1157; https://doi.org/10.3390/s26041157 - 11 Feb 2026
Abstract
With the rapid development of dietary analysis and health computing, food image classification has attracted increasing attention. However, this task remains challenging due to the fine-grained nature of food categories. Different classes are visually similar, whereas samples within the same class exhibit large appearance variations. Existing methods often rely excessively on either global or local features, limiting their effectiveness in complex food scenes. To address these challenges, this paper proposes DKTransformer, a lightweight hybrid architecture that combines Vision Transformers (ViT) and convolutional neural networks (CNNs) for fine-grained food image classification. Specifically, DKTransformer introduces a Local Feature Extraction (LDE) module based on depthwise separable convolution to enhance local detail modeling. Furthermore, a Multi-Scale Dilated Attention (MSDA) module is designed to capture long-range dependencies with reduced computational cost while suppressing background interference. In addition, an Efficient Kolmogorov–Arnold Network (EfficientKAN) is employed to replace the conventional feedforward network, further reducing parameter redundancy. Experimental results on three public food image datasets (ETH Food-101, Vireo-Food-172, and ISIA Food-500) demonstrate the effectiveness of the proposed method. In particular, DKTransformer achieves a Top-1 accuracy of 92.71% on the ETH Food-101 dataset with 47 M parameters and 7.21 G FLOPs. Moreover, DKTransformer attains 90.70% Top-1 accuracy on Vireo-Food-172 and 66.89% on ISIA Food-500, indicating strong generalization across different food styles and dataset scales. These results suggest that DKTransformer achieves a favorable balance between accuracy and efficiency for fine-grained food image classification.
(This article belongs to the Section Sensing and Imaging)
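The depthwise separable convolution underlying the local-feature module can be sketched in a few lines: a per-channel 3×3 convolution for spatial detail followed by a 1×1 convolution for channel mixing, cutting parameters from roughly C·C·9 to C·9 + C·C. The block below is a generic PyTorch illustration with assumed normalization and activation choices, not the LDE module itself.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableLocal(nn.Module):
    """Generic depthwise separable conv block: the depthwise 3x3 conv models
    local detail within each channel, the pointwise 1x1 conv mixes channels,
    and a residual connection keeps the block easy to stack."""
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3,
                                   padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.act(self.norm(self.pointwise(self.depthwise(x))))

print(DepthwiseSeparableLocal(96)(torch.randn(1, 96, 28, 28)).shape)
```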

27 pages, 18987 KB  
Article
YOLO11s-UAV: An Advanced Algorithm for Small Object Detection in UAV Aerial Imagery
by Qi Mi, Jianshu Chao, Anqi Chen, Kaiyuan Zhang and Jiahua Lai
J. Imaging 2026, 12(2), 69; https://doi.org/10.3390/jimaging12020069 - 6 Feb 2026
Abstract
Unmanned aerial vehicles (UAVs) are now widely used in various applications, including agriculture, urban traffic management, and search and rescue operations. However, several challenges arise, including the small size of objects occupying only a sparse number of pixels in images, complex backgrounds in aerial footage, and limited computational resources onboard. To address these issues, this paper proposes an improved UAV-based small object detection algorithm, YOLO11s-UAV, specifically designed for aerial imagery. Firstly, we introduce a novel feature pyramid network, called the Content-Aware Reassembly and Interaction Feature Pyramid Network (CARIFPN), which significantly enhances small object feature detection while reducing redundant network structures. Secondly, we apply a new downsampling convolution for small object feature extraction, called Space-to-Depth for Dilation-wise Residual Convolution (S2DResConv), in the model’s backbone. This module effectively eliminates information loss caused by pooling operations and facilitates the capture of multi-scale context. Finally, we integrate a simple, parameter-free attention module (SimAM) with C3k2 to form Flexible SimAM (FlexSimAM), which is applied throughout the entire model. This improved module not only reduces the model’s complexity but also enables efficient enhancement of small object features in complex scenarios. Experimental results demonstrate that on the VisDrone-DET2019 dataset, our model improves mAP@0.5 by 7.8% on the validation set (reaching 46.0%) and by 5.9% on the test set (increasing to 37.3%) compared to the baseline YOLO11s, while reducing model parameters by 55.3%. Similarly, it achieves a 7.2% improvement on the TinyPerson dataset and a 3.0% increase on UAVDT-DET. Deployment on the NVIDIA Jetson Orin NX SUPER platform shows that our model achieves 33 FPS; although this is 21.4% lower than YOLO11s, it remains sufficient for real-time onboard UAV applications.
(This article belongs to the Section Computer Vision and Pattern Recognition)
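Space-to-depth downsampling, the core idea behind SPD-style modules such as S2DResConv, replaces pooling by folding each 2×2 spatial block into channels so that no pixels are discarded. A minimal PyTorch sketch follows; the projection layer after the rearrangement is an assumption.

```python
import torch
import torch.nn as nn

class SpaceToDepthDown(nn.Module):
    """Space-to-depth downsampling sketch: PixelUnshuffle moves each 2x2
    spatial block into the channel axis, so resolution halves losslessly,
    and a conv then compresses the 4x channels to the target width."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.s2d = nn.PixelUnshuffle(2)              # (B,C,H,W) -> (B,4C,H/2,W/2)
        self.proj = nn.Conv2d(4 * in_ch, out_ch, 3, padding=1)

    def forward(self, x):
        return self.proj(self.s2d(x))

print(SpaceToDepthDown(32, 64)(torch.randn(1, 32, 64, 64)).shape)
# torch.Size([1, 64, 32, 32])
```

Unlike stride-2 pooling, every input pixel still contributes to the downsampled map, which is exactly why this style of module helps tiny objects that occupy only a handful of pixels.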

22 pages, 1982 KB  
Article
Enhanced 3D DenseNet with CDC for Multimodal Brain Tumor Segmentation
by Bekir Berkcan and Temel Kayıkçıoğlu
Appl. Sci. 2026, 16(3), 1572; https://doi.org/10.3390/app16031572 - 4 Feb 2026
Abstract
Precise tumor segmentation in multimodal MRI is crucial for glioma diagnosis and treatment planning; yet, deep learning models still struggle with irregular boundaries and severe class imbalance under computational constraints. An Enhanced 3D DenseNet with CDC architecture was proposed, integrating Central Difference Convolution, attention gates, and Atrous Spatial Pyramid Pooling for brain tumor segmentation on the BraTS 2023-GLI dataset. CDC layers enhance boundary sensitivity by combining intensity-level semantics and gradient-level features. Attention gates selectively emphasize relevant encoder features during skip connections, whereas ASPP captures multi-scale context through multiple dilation rates. A hybrid loss function spanning three levels was introduced, consisting of a region-based Dice loss for volumetric overlap, a GPU-native 3D Sobel boundary loss for edge precision, and a class-weighted focal loss for handling class imbalance. The proposed model achieved a mean Dice score of 91.30% (ET: 87.84%, TC: 92.73%, WT: 93.34%) on the test set. Notably, these results were achieved with approximately 3.7 million parameters, representing a 17–76× reduction compared to the 50–200 million parameters required by transformer-based approaches. The Enhanced 3D DenseNet with CDC architecture demonstrates that the integration of gradient-sensitive convolutions, attention mechanisms, multi-scale feature extraction, and multi-level loss optimization achieves competitive segmentation performance with significantly reduced computational requirements.
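Central Difference Convolution augments a vanilla convolution with a central-difference term, which algebraically reduces to subtracting θ times the kernel-weight sum applied at the center voxel. The following 3D PyTorch sketch follows the published CDC formulation; its use here as a standalone layer is an illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv3d(nn.Module):
    """3D central difference convolution sketch: output = conv(x) minus
    theta * (sum of kernel weights) applied at the center voxel, which mixes
    intensity-level and gradient-level responses in one layer."""
    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, 3, padding=1, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        # Central-difference term via a 1x1x1 conv with the summed weights.
        kernel_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        return out - self.theta * F.conv3d(x, kernel_sum)

cdc = CentralDifferenceConv3d(4, 8)   # e.g., 4 MRI modalities in
print(cdc(torch.randn(1, 4, 16, 32, 32)).shape)  # torch.Size([1, 8, 16, 32, 32])
```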

27 pages, 14177 KB  
Article
Lite-BSSNet: A Lightweight Blueprint-Guided Visual State Space Network for Remote Sensing Imagery Segmentation
by Jiaxin Yan, Yuxiang Xie, Yan Chen, Yanming Guo and Wenzhe Liu
Remote Sens. 2026, 18(3), 441; https://doi.org/10.3390/rs18030441 - 30 Jan 2026
Abstract
Remote sensing image segmentation requires balancing global context and local detail across multi-scale objects. However, convolutional neural network (CNN)-based methods struggle to model long-range dependencies, while transformer-based approaches suffer from quadratic complexity and become inefficient for high-resolution remote sensing scenarios. In addition, the semantic gap between deep and shallow features can cause misalignment during cross-layer aggregation, and information loss in upsampling tends to break thin continuous structures, such as roads and roof edges, introducing pronounced structural noise. To address these issues, we propose the lightweight Lite-BSSNet (Blueprint-Guided State Space Network). First, a Structural Blueprint Generator (SBG) converts high-level semantics into an edge-enhanced structural blueprint that provides a topological prior. Then, a Visual State Space Bridge (VSS-Bridge) aligns multi-level features and projects axially aggregated features into a linear-complexity visual state space, smoothing high-gradient edge signals for sequential scanning. Finally, a Structural Repair Block (SRB) enlarges the effective receptive field via dilated convolutions and uses spatial/channel gating to suppress upsampling artifacts and reconnect thin structures. Experiments on the ISPRS Vaihingen and Potsdam datasets show that Lite-BSSNet achieves the highest segmentation accuracy among the compared lightweight models, with mIoU of 83.9% and 86.7%, respectively, while requiring only 45.4 GFLOPs, thus achieving a favorable trade-off between accuracy and efficiency.
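The repair-block recipe (dilated context plus spatial gating over a residual) might look like the following PyTorch sketch; the dilation rates, the pooled-statistics gate, and all names are assumptions rather than the SRB's actual definition.

```python
import torch
import torch.nn as nn

class StructuralRepairSketch(nn.Module):
    """Hypothetical repair block: stacked dilated convs enlarge the effective
    receptive field, then a sigmoid spatial gate built from channel mean/max
    statistics decides where the context is trusted before the residual add,
    suppressing upsampling artifacts around thin structures."""
    def __init__(self, channels: int):
        super().__init__()
        self.context = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=4, dilation=4),
        )
        self.gate = nn.Sequential(          # spatial gate from pooled stats
            nn.Conv2d(2, 1, 7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        ctx = self.context(x)
        stats = torch.cat([ctx.mean(1, keepdim=True),
                           ctx.amax(1, keepdim=True)], dim=1)  # (B,2,H,W)
        return x + ctx * self.gate(stats)   # gated residual repair

print(StructuralRepairSketch(48)(torch.randn(1, 48, 32, 32)).shape)
```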

20 pages, 4637 KB  
Article
A Lightweight YOLOv13-G Framework for High-Precision Building Instance Segmentation in Complex UAV Scenes
by Yao Qu, Libin Tian, Jijun Miao, Sergei Leonovich, Yanchun Liu, Caiwei Liu and Panfeng Ba
Buildings 2026, 16(3), 559; https://doi.org/10.3390/buildings16030559 - 29 Jan 2026
Abstract
Accurate building instance segmentation from UAV imagery remains a challenging task due to significant scale variations, complex backgrounds, and frequent occlusions. To tackle these issues, this paper proposes an improved lightweight YOLOv13-G-based framework for building extraction in UAV imagery. The backbone network is enhanced by incorporating cross-stage lightweight connections and dilated convolutions, which improve multi-scale feature representation and expand the receptive field with minimal computational cost. Furthermore, a coordinate attention mechanism and an adaptive feature fusion module are introduced to enhance spatial awareness and dynamically balance multi-level features. Extensive experiments on a large-scale dataset, which includes both public benchmarks and real UAV images, demonstrate that the proposed method achieves superior segmentation accuracy with a mean intersection over union of 93.12% and real-time inference speed of 38.46 frames per second while maintaining a compact model size of 5.66 MB. Ablation studies and cross-dataset experiments further validate the effectiveness and generalization capability of the framework, highlighting its strong potential for practical UAV-based urban applications.
(This article belongs to the Topic Application of Smart Technologies in Buildings)
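Coordinate attention is a published mechanism (Hou et al., 2021): pooling along height and width separately preserves positional information per axis, and the resulting direction-aware weights rescale the feature map. The sketch below shows the generic form assumed here, not the paper's exact variant.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Generic coordinate attention: average-pool along W and along H to get
    two 1D positional descriptors, encode them jointly through a shared
    bottleneck, then split back into height and width attention maps."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True)
        )
        self.to_h = nn.Conv2d(mid, channels, 1)
        self.to_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        ph = x.mean(dim=3, keepdim=True)                       # (B,C,H,1)
        pw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (B,C,W,1)
        y = self.shared(torch.cat([ph, pw], dim=2))            # (B,mid,H+W,1)
        yh, yw = y.split([h, w], dim=2)
        ah = torch.sigmoid(self.to_h(yh))                        # (B,C,H,1)
        aw = torch.sigmoid(self.to_w(yw.permute(0, 1, 3, 2)))    # (B,C,1,W)
        return x * ah * aw

print(CoordAttention(64)(torch.randn(1, 64, 20, 20)).shape)
```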

27 pages, 101439 KB  
Article
YOLO-WL: A Lightweight and Efficient Framework for UAV-Based Wildlife Detection
by Chang Liu, Peng Wang, Yunping Gong and Anyu Cheng
Sensors 2026, 26(3), 790; https://doi.org/10.3390/s26030790 - 24 Jan 2026
Abstract
Accurate wildlife detection in Unmanned Aerial Vehicle (UAV)-captured imagery is crucial for biodiversity conservation, yet it remains challenging due to the visual similarity of species, environmental disturbances, and the small size of target animals. To address these challenges, this paper introduces YOLO-WL, a wildlife detection algorithm specifically designed for UAV-based monitoring. First, a Multi-Scale Dilated Depthwise Separable Convolution (MSDDSC) module, integrated with the C2f-MSDDSC structure, expands the receptive field and enriches semantic representation, enabling reliable discrimination of species with similar appearances. Next, a Multi-Scale Large Kernel Spatial Attention (MLKSA) mechanism adaptively highlights salient animal regions across different spatial scales while suppressing interference from vegetation, terrain, and lighting variations. Finally, a Shallow-Spatial Alignment Path Aggregation Network (SSA-PAN), combined with a Spatial Guidance Fusion (SGF) module, ensures precise alignment and effective fusion of multi-scale shallow features, thereby improving detection accuracy for small and low-resolution targets. Experimental results on the WAID dataset demonstrate that YOLO-WL outperforms existing state-of-the-art (SOTA) methods, achieving 94.2% mAP@0.5 and 58.0% mAP@0.5:0.95. Furthermore, evaluations on the Aerial Sheep and AI-TOD datasets confirm YOLO-WL’s robustness and generalization ability across diverse ecological environments. These findings highlight YOLO-WL as an effective tool for enhancing UAV-based wildlife monitoring and supporting ecological conservation practices.
(This article belongs to the Section Intelligent Sensors)
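A multi-scale dilated depthwise separable convolution can be sketched as parallel depthwise branches with increasing dilation whose summed responses are fused by a shared pointwise convolution. The PyTorch block below is one plausible instantiation of the MSDDSC idea; the dilation rates and names are assumed, not taken from the paper.

```python
import torch
import torch.nn as nn

class MSDDSCSketch(nn.Module):
    """Hypothetical multi-scale dilated depthwise separable conv: parallel
    depthwise branches at growing dilation see increasingly wide context,
    and a shared pointwise conv fuses the summed multi-scale responses."""
    def __init__(self, channels: int, dilations=(1, 3, 5)):
        super().__init__()
        self.dw = nn.ModuleList(
            nn.Conv2d(channels, channels, 3,
                      padding=d, dilation=d, groups=channels)  # depthwise
            for d in dilations
        )
        self.pw = nn.Conv2d(channels, channels, 1)             # pointwise fuse
        self.act = nn.SiLU()

    def forward(self, x):
        multi = sum(branch(x) for branch in self.dw)  # multi-scale aggregation
        return self.act(self.pw(multi)) + x

print(MSDDSCSketch(128)(torch.randn(1, 128, 40, 40)).shape)
```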

20 pages, 3656 KB  
Article
Efficient Model for Detecting Steel Surface Defects Utilizing Dual-Branch Feature Enhancement and Downsampling
by Quan Lu, Minsheng Gong and Linfei Yin
Appl. Sci. 2026, 16(3), 1181; https://doi.org/10.3390/app16031181 - 23 Jan 2026
Abstract
Surface defect evaluation in steel production demands both high inference speed and accuracy for efficient production. However, existing methods face two critical challenges: (1) the diverse dimensions and irregular morphologies of surface defects reduce detection accuracy, and (2) computationally intensive feature extraction slows inference. In response to these challenges, this study proposes an innovative network based on dual-branch feature enhancement and downsampling (DFED-Net). First, an atrous convolution and multi-scale dilated attention fusion module (AMFM) is developed, incorporating local–global feature representation. By emphasizing local details and global semantics, the module suppresses noise interference and enhances the capability of the model to separate small-object features from complex backgrounds. Additionally, a dual-branch downsampling module (DBDM) is developed to preserve the fine details related to scale that are typically lost during downsampling. The DBDM efficiently fuses semantic and detailed information, improving consistency across feature maps at different scales. A lightweight dynamic upsampling operator (DySample) is introduced to replace traditional fixed upsampling with a learnable, adaptive approach, which retains critical feature information more flexibly while reducing redundant computation. Experimental evaluation shows a mean average precision (mAP) of 81.5% on the Northeastern University surface defect detection (NEU-DET) dataset, a 5.2% increase compared to the baseline, while maintaining a real-time inference speed of 120 FPS (versus 118 FPS for the baseline). The proposed DFED-Net provides strong support for the development of automated visual inspection systems for detecting defects on steel surfaces.
(This article belongs to the Section Computing and Artificial Intelligence)
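Dual-branch downsampling typically pairs a learnable strided convolution with a pooling branch that keeps the sharpest local responses. The sketch below shows such a layout as one plausible reading of the DBDM; the exact branch design is assumed.

```python
import torch
import torch.nn as nn

class DualBranchDownsample(nn.Module):
    """Hypothetical dual-branch downsampling: a strided conv carries
    learnable semantics, a max-pool branch keeps the strongest local
    details, and concatenation preserves both at half resolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.semantic = nn.Conv2d(in_ch, out_ch // 2, 3, stride=2, padding=1)
        self.detail = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(in_ch, out_ch // 2, 1),   # match channel width
        )

    def forward(self, x):
        return torch.cat([self.semantic(x), self.detail(x)], dim=1)

print(DualBranchDownsample(64, 128)(torch.randn(1, 64, 80, 80)).shape)
# torch.Size([1, 128, 40, 40])
```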

24 pages, 4209 KB  
Article
Stability-Oriented Deep Learning for Hyperspectral Soil Organic Matter Estimation
by Yun Deng and Yuxi Shi
Sensors 2026, 26(2), 741; https://doi.org/10.3390/s26020741 - 22 Jan 2026
Abstract
Soil organic matter (SOM) is a key indicator for evaluating soil fertility and ecological functions, and hyperspectral technology provides an effective means for its rapid and non-destructive estimation. However, in practical soil systems, the spectral response of SOM is often highly covariant with mineral composition, moisture conditions, and soil structural characteristics. Under small-sample conditions, hyperspectral SOM modeling results are usually highly sensitive to spectral preprocessing methods, sample perturbations, and model architecture and parameter configurations, leading to fluctuations in predictive performance across independent runs and thereby limiting model stability and practical applicability. To address these issues, this study proposes a multi-strategy collaborative deep learning modeling framework for small-sample conditions (SE-EDCNN-DA-LWGPSO). Under unified data partitioning and evaluation settings, the framework integrates spectral preprocessing, data augmentation based on sensor perturbation simulation, multi-scale dilated convolution feature extraction, an SE channel attention mechanism, and a linearly weighted generalized particle swarm optimization algorithm. Subtropical red soil samples from Guangxi served as the study material. Samples were partitioned using the SPXY method, and multiple independent repeated experiments were conducted to evaluate the predictive performance and training consistency of the model under fixed validation conditions. The results indicate that the combination of Savitzky–Golay filtering and first-derivative transformation (SG–1DR) exhibits superior overall stability among various preprocessing schemes. In model structure comparison and ablation analysis, as dilated convolution, data augmentation, and channel attention mechanisms were progressively introduced, the fluctuations of prediction errors on the validation set gradually converged, and the performance dispersion among different independent runs was significantly reduced. Under ten independent repeated experiments, the final model achieved R² = 0.938 ± 0.010, RMSE = 2.256 ± 0.176 g·kg⁻¹, and RPD = 4.050 ± 0.305 on the validation set, demonstrating that the proposed framework has good modeling consistency and numerical stability under small-sample conditions.
(This article belongs to the Section Environmental Sensing)
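For 1D spectra, multi-scale dilated convolution plus SE channel attention composes naturally, as in the sketch below. This PyTorch snippet only illustrates how the two named ingredients fit together; the paper's actual layer configuration is not given in the abstract, so all sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class SpectralSEBlock(nn.Module):
    """Hypothetical spectral block: parallel dilated 1D convs pick up
    absorption features at several spectral widths, and a squeeze-and-
    excitation gate reweights the concatenated channels before regression."""
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4), r: int = 4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, 3, padding=d, dilation=d) for d in dilations
        )
        c = out_ch * len(dilations)
        self.se = nn.Sequential(             # squeeze-and-excitation gate
            nn.Linear(c, c // r), nn.ReLU(inplace=True),
            nn.Linear(c // r, c), nn.Sigmoid(),
        )

    def forward(self, x):                    # x: (B, in_ch, bands)
        y = torch.cat([b(x) for b in self.branches], dim=1)
        w = self.se(y.mean(dim=2))           # channel descriptor -> weights
        return y * w.unsqueeze(-1)

print(SpectralSEBlock(1, 16)(torch.randn(8, 1, 200)).shape)
# torch.Size([8, 48, 200])
```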
