MDPI - Publisher of Open Access Journals

20 pages, 2546 KB

Open Access Article

MCC-Net: Efficient Dual-Attention Network for Infrared Small-Target Detection

by Xiaotian Zhou, Xin Wang, Yan Tian, Kai Jiang, Min Guo, Xuezheng Lian, Lu Ding, Quanyu Zhang and Yaqi Xue

Remote Sens. 2026, 18(11), 1858; https://doi.org/10.3390/rs18111858 - 5 Jun 2026

Viewed by 243

Abstract

Recent years have witnessed the emergence of numerous U-shaped deep learning segmentation methods for infrared small-target detection (IRSTD). However, increasingly complex models still suffer from false and missed detections in challenging scenarios with cluttered backgrounds and weak targets while incurring escalating computational costs. To address these limitations, this paper proposes MCC-Net, an efficient IRSTD framework that improves detection performance while significantly reducing computational complexity. First, we integrate Magnitude-Aware Linear Attention (MALA) and Conditionally Parameterized Convolutions (CondConv) to replace conventional attention mechanisms in skip connections and standard convolutions, respectively, endowing the model with spatial contextual modeling and enhanced feature extraction capabilities at minimal computational overhead. Second, we design a Conditional Cross-Channel Fusion (CondCCF) module that establishes a complementary spatial-channel dual-attention mechanism with MALA, enabling efficient multi-scale feature fusion. Extensive comparative and ablation experiments on three public benchmarks (SIRST-v1, NUDT-SIRST, and IRSTD-1K) show that MCC-Net achieves mIoU scores of 77.98%, 95.43%, and 70.46%, respectively, surpassing previous state-of-the-art methods by 1.07%, 1.95%, and 0.95%. MCC-Net also outperforms existing approaches across multiple evaluation metrics while maintaining substantially lower computational complexity.
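The mIoU figures quoted in the abstract above are intersection-over-union scores on binary target masks. The following is a minimal sketch of that metric, assuming a per-image average; IRSTD papers also use a dataset-level variant that accumulates intersections and unions before dividing, and the abstract does not say which one applies. The function names `iou` and `mean_iou` are illustrative.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty: treat as a perfect match (a convention, not a rule).
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

def mean_iou(preds, targets):
    """Average the per-image IoU over a collection of mask pairs."""
    return float(np.mean([iou(p, t) for p, t in zip(preds, targets)]))

if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(iou(pred, target))  # one shared pixel out of two in the union
```

Because infrared targets cover only a few pixels, a single mislabeled pixel moves the IoU a lot, which is one reason small differences in mIoU are hard to compare across evaluation protocols.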

(This article belongs to the Special Issue New Insights in Remote Sensing Image Interpretation with Deep Learning)

23 pages, 7625 KB

Open Access Article

MultiDecNet: An Ensemble-Based Semantic Segmentation Architecture for Urban Scene Understanding

by Büşra Emek Soylu and Mehmet Serdar Güzel

Information 2026, 17(6), 540; https://doi.org/10.3390/info17060540 - 1 Jun 2026

Viewed by 237

Abstract

Semantic segmentation is a fundamental task in computer vision that aims to assign a categorical label to each pixel in an image, facilitating dense and detailed scene understanding. This pixel-level classification is especially crucial in autonomous driving, where accurate environmental perception is vital for dependable object detection and safe decision-making. In this study, we propose MultiDecNet, a novel multi-decoder semantic segmentation framework designed to capture both macroscopic scene layouts and fine-grained spatial boundaries in complex urban environments. Drawing inspiration from classical networks, MultiDecNet incorporates a parallel dual-branch decoding strategy that simultaneously leverages the multi-scale context modeling of the Pyramid Pooling Module (PPM) and the structural refinement capabilities of Atrous Spatial Pyramid Pooling (ASPP). To explore the impact of modern backbone representations, we structurally modernize the feature extraction pipeline by introducing the contemporary ConvNeXt convolutional architecture as an alternative to traditional ResNet101 backbones. We extensively evaluate and compare the baseline configurations alongside our proposed MultiDecNet using both ResNet101 and ConvNeXt-Large backbones on the benchmark Cityscapes dataset. The quantitative assessments demonstrate that the MultiDecNet architecture consistently provides highly competitive performance within the scope of this comparative study, with the MultiDecNet-ConvNeXt variant achieving favorable overall scores among the evaluated methods. Furthermore, a granular, class-wise IoU and training dynamics analysis reveals that while traditional networks retain competitive boundaries for localized minority targets, the modern ConvNeXt backbone ensures faster convergence stability and balanced contextual mastery over large-scale driving layouts. 
Ultimately, these findings offer critical insights into architectural synergy and backbone selection, presenting a robust, scalable, and well-balanced solution for advanced autonomous navigation systems.
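The class-wise IoU analysis mentioned in the abstract is usually computed from a pixel-level confusion matrix. The sketch below assumes rows are ground-truth classes and columns are predicted classes; the function names and the handling of absent classes (reported as NaN and skipped in the mean) are illustrative choices, not details taken from the paper.

```python
import numpy as np

def per_class_iou(confusion):
    """IoU per class from a confusion matrix (rows: truth, columns: prediction)."""
    confusion = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = confusion.sum(axis=1) - tp  # belongs to the class, but missed
    denom = tp + fp + fn
    iou = np.full(tp.shape, np.nan)
    # Classes that never occur and are never predicted stay NaN.
    np.divide(tp, denom, out=iou, where=denom > 0)
    return iou

def mean_iou(confusion):
    """Mean IoU over the classes that actually appear."""
    return float(np.nanmean(per_class_iou(confusion)))

if __name__ == "__main__":
    cm = [[3, 1],
          [1, 5]]
    print(per_class_iou(cm), mean_iou(cm))
```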

(This article belongs to the Special Issue Computer Vision for Security Applications, 2nd Edition)

21 pages, 3412 KB

Open Access Article

MCA-YOLO: An Improved YOLOv11n-Based Model for Precise Detection of Cotton Apical Buds

by Shuhua Yang, Chongwu Wang, Jianhe Wang, Bo Peng, Ran Yan and Jianjun Hao

Agriculture 2026, 16(11), 1189; https://doi.org/10.3390/agriculture16111189 - 28 May 2026

Viewed by 201

Abstract

Precise detection of cotton apical buds is the primary step toward achieving intelligent topping operations. Existing object detection models still struggle to accurately recognize dense small targets under complex field conditions. In this study, we propose an improved model, MCA-YOLO, based on YOLOv11n, and optimize it in three respects: feature extraction, computational efficiency, and multi-scale feature fusion. First, we introduce the MLCA attention mechanism into the PSABlock to construct the C2PSA_MLCA module, enhancing the model’s capability to represent both local and global features. Second, a CSPHet module is reconstructed using heterogeneous convolution (HetConv) combined with a dual-path design to reduce convolutional redundancy and improve feature extraction efficiency. Finally, the original YOLOv11n detection head is replaced with an ASFFHead, enabling adaptive multi-scale feature fusion and thereby improving detection performance for small, dense, and scale-varying targets. Experimental results show that MCA-YOLO achieves Precision, Recall, mAP@0.5, and F1-score of 89.0%, 83.1%, 90.6%, and 85.9%, respectively, corresponding to improvements of 3.0, 8.1, 7.1, and 5.8 percentage points over YOLOv11n. Compared with YOLOv11n, the parameters and GFLOPs increase by 50.0% and 31.7%, respectively. Even with this increase in model complexity, MCA-YOLO achieves 75 FPS with a model size of 7.76 MB, indicating that it maintains real-time detection capability while improving detection accuracy.
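The F1-score in the abstract is the harmonic mean of precision and recall, so the reported figures can be cross-checked with a few lines of arithmetic. The baseline values used below (86.0% precision, 75.0% recall) are derived here by subtracting the stated gains from the MCA-YOLO results; they are not quoted from the source.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Reported MCA-YOLO precision and recall, in percent.
    print(round(f1_score(89.0, 83.1), 1))  # 85.9
    # Baseline derived by subtraction (an assumption, see the note above).
    print(round(f1_score(86.0, 75.0), 1))  # 80.1
```

Both results agree with the abstract: 85.9% for MCA-YOLO, and 80.1% for the baseline, which is 5.8 percentage points lower.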

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

22 pages, 1580 KB

Open Access Article

Input-Adaptive Dynamic Neural Network for Efficient Object Detection Toward Resource-Constrained Deployment

by Jungwoo Lee, Hyogon Kim, Sung-Jo Yun and Youngho Choi

Electronics 2026, 15(11), 2310; https://doi.org/10.3390/electronics15112310 - 26 May 2026

Viewed by 167

Abstract

The deployment of object detection models on resource-constrained edge devices remains a substantial challenge, primarily because conventional static networks expend the same worst-case computational cost on every input, regardless of intrinsic difficulty. This paper proposes an input-adaptive dynamic neural network architecture for object detection in embedded environments. The study investigates two orthogonal axes of input-adaptive inference: depth adaptivity, implemented through Early Exit, and width adaptivity, implemented through group-wise Adaptive Routing. The proposed framework is built on a frozen Ultralytics YOLO26s backbone and incorporates two YOLO-style early-exit heads positioned at approximately 33% and 66% of the backbone depth. It also incorporates two Straight-Through Gumbel-Softmax routers, appended after Layers 4 and 8, with group-wise hard gating. Both axes additionally accept an explicit external control signal that allows the host system to override the input-conditional policy at inference time. This dual-mode design lets the trained checkpoint operate either as an input-adaptive policy, in which the depth and width are determined per sample from the input distribution, or as an externally controlled policy. The experimental findings demonstrate two strongly asymmetric input-adaptive policies on a frozen YOLO26s backbone. The early-exit profile reduces the compute per sample from 12.739 to 10.532 GFLOPs (a 17.32% reduction according to our in-house Conv2d/Linear MAC-based GFLOPs estimator) while preserving baseline accuracy (mAP50 = 0.1545 vs. baseline = 0.1528; ΔmAP50 = +0.0017, within evaluation noise; mAP50–95 = −0.0033).
Evaluating the router-only profile in the same validator pipeline with a sparsity penalty of γ = 0.05 results in a 12.3% decrease in logical GFLOPs (from 12.739 to 11.172), while maintaining an accuracy level that is at or above the PEFT baseline (mAP50 = 0.2324 and mAP50–95 = 0.1040). In our small-domain PEFT setup, training the dynamic-policy modules yields per-checkpoint mAP shifts of this magnitude. Therefore, we interpret the width-axis accuracy result as preservation of the baseline rather than an improvement. Our contribution on the width axis is reducing compute while maintaining baseline accuracy. Importantly, the router profile’s logical GFLOP savings are not currently reflected in wall-clock latency under our dense-kernel PyTorch implementation. Achieving practical speedup requires sparse-kernel deployment, such as structured-sparse kernels, TensorRT, TVM, or Triton paths. We will address this in future deployment-level work. Our results indicate that the depth axis can yield genuine end-to-end speedup today, while the width axis offers deployment-pending compute reduction.
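The abstract attributes its compute figures to an in-house Conv2d/Linear MAC-based estimator, which is not described further. The sketch below shows the standard multiply-accumulate counts such an estimator would typically sum, plus the relative-reduction arithmetic behind the 17.32% and 12.3% figures. The function names are hypothetical, and whether one MAC is counted as one or two FLOPs is a convention the source does not state.

```python
def conv2d_macs(in_channels, out_channels, kernel_size, out_height, out_width, groups=1):
    """Multiply-accumulates of one Conv2d layer (bias ignored)."""
    k_h, k_w = kernel_size
    return out_height * out_width * out_channels * (in_channels // groups) * k_h * k_w

def linear_macs(in_features, out_features):
    """Multiply-accumulates of one Linear layer (bias ignored)."""
    return in_features * out_features

def percent_reduction(baseline, reduced):
    """Relative saving of `reduced` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - reduced) / baseline

if __name__ == "__main__":
    # A 3x3 convolution, 3 -> 16 channels, producing a 32x32 output map.
    print(conv2d_macs(3, 16, (3, 3), 32, 32))            # 442368
    # GFLOPs figures reported in the abstract.
    print(round(percent_reduction(12.739, 10.532), 2))   # 17.32 (early exit)
    print(round(percent_reduction(12.739, 11.172), 1))   # 12.3 (router only)
```

An estimator of this kind counts logical operations only, which matches the abstract's caveat that the router savings do not yet appear in wall-clock latency under dense kernels.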

(This article belongs to the Special Issue Implementation of Neural Network Models on Resource-Constrained Devices)

22 pages, 3661 KB

Open Access Article

Industrial Weld Defect Detection Based on Monocular Depth Estimation and Dual-Attention Point Cloud Network

by Nannan Zhao and Shijie Chen

Sensors 2026, 26(11), 3321; https://doi.org/10.3390/s26113321 - 23 May 2026

Viewed by 373

Abstract

In industrial quality control, the precise identification of severe structural weld defects is paramount. Traditional 2D image-based detection methods are susceptible to illumination and texture interference, while high-precision 3D laser scanning solutions are costly and impractical for large-scale deployment. To achieve reliable geometric defect detection at low cost, this paper proposes a detection framework based on monocular depth estimation and a dual-attention point cloud network. First, YOLOv8 is employed for rapid region of interest extraction, and an advanced monocular depth estimation model generates 3D pseudo-point clouds containing geometric information. Second, because missed weld defects have distinct spatial orientation features that are prone to confusion, this paper introduces a dual-attention-enhanced point cloud classification network named DA-PointNet++. This model embeds dual-attention modules within the PointNet++ backbone network, enhancing key feature representation in both the channel and spatial dimensions. Experimental results demonstrate that this approach achieves an accuracy of 93.67% and a recall rate of 90.51% in a unified binary classification task for general weld defect detection, effectively identifying both normal welds and complex missed weld defects. Compared to PointConv, Dynamic Graph Convolutional Neural Network (DGCNN), and the mainstream Point Cloud Transformer, this method significantly reduces false negative rates while maintaining low computational costs, offering a cost-effective solution for industrial automation.
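The step from an estimated depth map to a pseudo-point cloud is commonly done by back-projecting each pixel through a pinhole camera model. The sketch below illustrates that idea only; the intrinsics `fx`, `fy`, `cx`, `cy` and the function name are assumptions, and the abstract does not say how the authors perform the conversion. Monocular depth is often relative rather than metric, so the resulting coordinates may be correct only up to scale.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W) into an N x 3 array of (x, y, z) points."""
    depth = np.asarray(depth, dtype=np.float64)
    height, width = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Drop pixels with no valid depth.
    return points[points[:, 2] > 0]

if __name__ == "__main__":
    depth_map = [[2.0, 2.0],
                 [2.0, 0.0]]  # the zero marks an invalid pixel
    print(depth_to_points(depth_map, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

In a pipeline like the one described, the depth map would be cropped to the detected region of interest before back-projection, and the points would then be sampled to a fixed count for the point cloud classifier.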

(This article belongs to the Section Industrial Sensors)

15 pages, 1802 KB

Open Access Article

FusionTyphoonPredictor: Dual-Branch Enhanced Spatiotemporal Prediction for Typhoon Cloud Images

by Haipeng Li, Jun Liu, Yan Liu and Zelin Liu

Atmosphere 2026, 17(6), 536; https://doi.org/10.3390/atmos17060536 - 23 May 2026

Viewed by 271

Abstract

Accurate forecasting of typhoon evolution from satellite cloud imagery is critical for disaster preparedness and mitigation, yet remains challenging due to the complex spatiotemporal dynamics of typhoon systems. While deep learning models have shown promise in spatiotemporal sequence prediction, existing approaches often struggle to balance the modeling of large-scale structural evolution with fine-grained local dynamics. In this paper, we propose FusionTyphoonPredictor, a novel dual-branch encoder–decoder framework designed for typhoon cloud image prediction. The model integrates a Global Fusion Module to capture multi-scale spatial interactions using large-kernel attention and multi-scale convolution, and an ST Recurrent Refiner to enhance temporal consistency and local detail through recurrent processing with ConvGRU and residual blocks. Extensive experiments on the Digital Typhoon dataset demonstrate that our approach achieves improved performance compared to existing methods (including PredFormer and PhyDNet) across most metrics and forecasting horizons. Specifically, FusionTyphoonPredictor shows consistent advantages in SSIM, MAE, and MSE, with particular strength in short-term forecasting. Comprehensive ablation studies validate the complementary design of the two branches and confirm the effectiveness of each proposed component. Our work advances typhoon forecasting and has potential for real-time operational deployment.

(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

23 pages, 4189 KB

Open Access Article

DARE-YOLO: A Lightweight Object Detection Algorithm and Its FPGA Acceleration for Sustainable PV Panel Inspection

by Yuchuan Yang, Feng Xing, Caiyan Qin, Shuxu Chen, Hyundong Shin and Sungyoung Lee

Sustainability 2026, 18(10), 4999; https://doi.org/10.3390/su18104999 - 15 May 2026

Viewed by 221

Abstract

As a critical component of sustainable energy systems, the efficient maintenance of photovoltaic (PV) panels is essential. While deep learning is an important approach for PV panel defect detection, the high complexity of existing models and their substantial computational demand make deployment on edge platforms difficult. This paper studies an acceleration method for PV panel defect detection on the Zynq-7020 heterogeneous platform. We design DARE-YOLO, a lightweight network for PV panel defect detection, together with a Zynq-based accelerator. In DARE-YOLO, we introduce RepConv and a lightweight single-path backbone to reduce the memory bandwidth overhead caused by multi-branch structures. We further design a Dilated Context Block (DCB) and a Dual-scale Decoupled Head (DDH), which effectively improve the detection accuracy of DARE-YOLO. On the Zynq platform, we develop the accelerator through a mixed fixed-point quantization strategy, a custom convolution IP core, and pipeline unrolling. These optimizations reduce data access latency, improve computational parallelism, and increase computational throughput. Experimental results show that DARE-YOLO achieves 93.84% mAP@0.5 with only 6.4 M parameters. The accelerator has a total on-board power consumption of only 1.95 W while delivering a throughput of 37.5 GOPS and an energy efficiency of 19.23 GOPS/W. The image inference latency is 661.3 ms. This low-power, high-efficiency co-design paradigm ensures the long-term reliability of renewable energy facilities.
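The accelerator figures in the abstract are linked by simple ratios: energy efficiency is throughput divided by power, and the frame rate implied by a latency is its reciprocal. A short check, using only the numbers reported above (the helper names are illustrative):

```python
def energy_efficiency(throughput_gops, power_watts):
    """Energy efficiency in GOPS/W."""
    return throughput_gops / power_watts

def frames_per_second(latency_ms):
    """Frame rate implied by a per-image latency, assuming one image at a time."""
    return 1000.0 / latency_ms

if __name__ == "__main__":
    print(round(energy_efficiency(37.5, 1.95), 2))  # 19.23
    print(round(frames_per_second(661.3), 2))       # 1.51
```

The first result reproduces the stated 19.23 GOPS/W. The second shows that a 661.3 ms latency corresponds to roughly 1.5 images per second, assuming images are processed strictly one after another.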

(This article belongs to the Special Issue Sustainable Solar Power Systems and Applications)

24 pages, 6147 KB

Open Access Article

Multi-Scale Transformer-Based Neural Architecture Search for Hyperspectral Image Classification

by Aili Wang, Xinyu Liu and Haisong Chen

Remote Sens. 2026, 18(10), 1586; https://doi.org/10.3390/rs18101586 - 15 May 2026

Viewed by 232

Abstract

Hyperspectral image classification (HSIC) is a crucial task for remote sensing applications, requiring accurate pixel-level labeling while effectively capturing both spectral and spatial information. Traditional convolutional neural network architectures often struggle to balance local texture detail and global contextual consistency, and existing neural architecture search (NAS) methods rarely incorporate attention mechanisms, limiting their performance. To address these challenges, this study proposes a multi-scale Transformer-based NAS framework (TR-NAS) for fine-grained hyperspectral image classification. The framework combines local cube sampling, shallow and deep multi-scale convolutions, and a searchable Transformer module that adaptively selects global, local window, and multi-scale attention operators. Lightweight enhanced convolution operators, including dual-gated (DG-Conv) and mixed depthwise (MixConv) convolutions, are incorporated to improve spectral discrimination and scale robustness. Extensive experiments on the PU and Hanchuan datasets demonstrate that TR-NAS achieves superior classification accuracy, stability, and boundary consistency compared to traditional methods and existing NAS architectures, showing improved robustness to spectral similarity and spatial heterogeneity in complex remote sensing scenes.

(This article belongs to the Special Issue Deep Learning for Multi-Sensor Remote Sensing: Advancements in Image Classification and Semantic Segmentation)

30 pages, 6946 KB

Open Access Article

ISDG-Net: Efficient RGB–Infrared Object Detection for Remote Sensing Imagery

by Yaoyue Gao, Xinru Cheng, Yimeng Li, Dawei Xu, Desheng Sun and Yaoyi Hu

Remote Sens. 2026, 18(10), 1570; https://doi.org/10.3390/rs18101570 - 14 May 2026

Viewed by 302

Abstract

In all-weather Earth observation and complex unstructured environments, traditional single-modal remote sensing object detection often fails due to low illumination and strong background interference. While RGB–infrared fusion provides complementary information, existing methods are typically computationally intensive and struggle with dense small objects and modality discrepancies, limiting their deployment on resource-constrained platforms. To address these challenges, we propose ISDG-Net, a lightweight and efficient visible-infrared dual-modal object detection framework specifically tailored for edge deployment. ISDG-Net integrates four core components: (1) a channel-separated inverted bottleneck backbone (IBC-Conv) that reduces parameter redundancy while preserving modality-specific semantics; (2) a dynamic sparse attention module (DySparse) based on Bi-Level Routing Attention, enabling long-range dependency modeling with low computational cost; (3) an adaptive spatial fusion detection head (Detect-SASD) that aligns visible and infrared features at the pixel level to resolve semantic inconsistency and scale mismatch; and (4) a geometry-aware IoU selector (GIS) that mitigates over-suppression in crowded scenes by incorporating multi-dimensional geometric constraints into post-processing. Extensive experiments on the VEDAI, M3FD, and LLVIP datasets demonstrate the effectiveness and efficiency of ISDG-Net. It achieves 55.1% and 77.1% mAP@0.5 on VEDAI and M3FD, respectively, and 93.7% mAP@0.5 with 89.7% recall on LLVIP, while maintaining a compact model size of 4.2 M parameters and 11.3 GFLOPs. These results validate that accurate RGB–infrared detection is achievable under strict resource constraints, making ISDG-Net well-suited for deployment in edge-based remote sensing systems.
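The over-suppression that the fourth component targets comes from the usual post-processing step, greedy non-maximum suppression (NMS), which discards any box whose IoU with a higher-scoring kept box exceeds a threshold. The sketch below implements that standard baseline only; it is not the paper's geometry-aware selector, whose constraints the abstract does not specify. Boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
```

In the example, the second box overlaps the first with an IoU of about 0.68 and is removed even if it covers a different, tightly packed object. That failure mode in crowded scenes is what a more selective criterion aims to reduce.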

(This article belongs to the Section Remote Sensing Image Processing)

25 pages, 1880 KB

Open Access Article

A Dual-Branch Deep Learning Framework with Explainability for Dental Caries Classification Using Intra-Oral Photographs and Radiographs

by Lijuan Ren and Jinjing Chen

J. Imaging 2026, 12(5), 207; https://doi.org/10.3390/jimaging12050207 - 12 May 2026

Viewed by 274

Abstract

The accurate detection of dental caries is often hindered by modality-specific imaging challenges, such as illumination artifacts in intra-oral photographs and low lesion contrast in radiographs. This study proposes a comprehensive framework comprising three key components: (1) HybridAugment+, an entropy-guided adaptive augmentation strategy that applies stronger transformations to low-information images; (2) DBAttNet, a dual-branch attention network featuring illumination–reflection aware attention (IRAA) for photographs and contrast–frequency-aware attention (CFA) for radiographs; and (3) a CAM-based explainability method, selected through a systematic evaluation of five advanced techniques. This study utilized two datasets derived from public sources, comprising 639 intra-oral photographs (481 caries, 158 healthy) and 456 radiographs (268 caries, 188 healthy). These were annotated by two dentists, with established inter-rater reliability (κ = 0.82 for photographs, κ = 0.79 for radiographs). The experimental results demonstrate that HybridAugment+ improved performance over conventional augmentation by up to 8.72% on photographs and 7.67% on radiographs. Furthermore, DBAttNet achieved F1-scores of 97.90% on photographs and 95.72% on radiographs, outperforming ResNet50, InceptionV3, MSDNet, DCANet, and ARM-Net. A comparative evaluation identified XGrad-CAM as the most suitable explainability method, with optimal visualization thresholds of 30% for photographs and 20% for radiographs. Generalization experiments on ophthalmology (APTOS 2019, Messidor-2) and chest radiography datasets (Kermany CXR, NIH ChestX-ray14) demonstrated consistent performance gains over domain-specific methods (DT-Net, ConvNeXt-Tiny). These results confirm that the core design principles effectively transfer to other modalities facing analogous imaging challenges.
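The inter-rater reliability values quoted in the abstract are κ statistics. Assuming they refer to Cohen's kappa for two annotators, the statistic compares observed agreement with the agreement expected by chance from each rater's label frequencies. A minimal sketch with made-up labels (the data below are illustrative, not from the study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

if __name__ == "__main__":
    dentist_1 = [1, 1, 0, 0]  # 1 = caries, 0 = healthy (illustrative labels)
    dentist_2 = [1, 1, 0, 1]
    print(cohens_kappa(dentist_1, dentist_2))  # 0.5
```

In the example, the raters agree on three of four items (0.75 observed) against 0.5 expected by chance, giving κ = 0.5.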

(This article belongs to the Special Issue Artificial Intelligence for Medical Imaging and Applications)

22 pages, 8004 KB

Open Access Article

DESA-YOLO: A Growth-Stage Adaptive Pig Face Recognition Algorithm Based on Multi-Scale Feature Fusion

by Xin Li, Jinghan Cai, Tonghai Liu, Fanzhen Wang, Xiaomeng Zheng and Meng Wang

Animals 2026, 16(10), 1468; https://doi.org/10.3390/ani16101468 - 10 May 2026

Viewed by 423

Abstract

This paper proposes a pig face individual recognition algorithm named DESA-YOLO based on an improved YOLO11 model, aiming to address the adaptability issue of pig face recognition across different growth stages. With the large-scale development of pig farming, traditional individual identification methods suffer from low efficiency and high cost, while pig face recognition technology has great application potential as an important tool for precision suckling and disease prevention. Because facial features differ significantly among pigs at different growth stages, this study proposes an improved YOLO11 architecture to address this challenge. The method improves detection accuracy and adaptability by introducing a DualConv structure, an EMA module, a SEAM attention mechanism, and an ASFF detection head. Experimental results show that DESA-YOLO achieves significant improvements over traditional models such as YOLOv5 and YOLOv8 in precision, recall, mAP, and F1 score. It obtains an mAP of 93.7%, and its precision, recall, and mAP are 6.3%, 3.5%, and 3% higher, respectively, than those of the YOLO11 baseline model. Ablation experiments and heatmap visualizations further validate the effectiveness of the proposed improvement modules. The improved model demonstrates higher adaptability and stability across different pig growth stages, while maintaining real-time inference performance for practical deployment.

(This article belongs to the Section Pigs)

20 pages, 2795 KB

Open Access Article

A U-Net Improved Version for Crop and Weed Segmentation from Aerial Images

by Alexandru Bunica-Mihai, Dan Popescu and Loretta Ichim

Sensors 2026, 26(10), 2997; https://doi.org/10.3390/s26102997 - 9 May 2026

Viewed by 707

Abstract

The optimization of herbicide application is one of the most important topics in Precision Agriculture, driven by both economic efficiency and ecological sustainability. Excessive herbicide use can lead to soil degradation, water contamination, and negative impacts on biodiversity, while also contributing to human health risks and climate-related concerns. Developing accurate, automated approaches for distinguishing crops from weeds is therefore essential to support sustainable agricultural practices. In this paper, a novel architecture for crop and weed segmentation in tobacco plantations is proposed: a U-Net variant that incorporates several specific design elements, including deep supervision, a Vegetation Global Context block, and a dual-headed output that separately predicts vegetation and crop masks. Weed regions are derived as the difference between vegetation and crop predictions, allowing the model to enforce logical consistency directly within a single framework, in contrast to two-step approaches. The proposed architecture was evaluated using multiple modern encoder backbones (ConvNextV2, FastViT, RepViT, MambaVision). Experimental results demonstrate that this architecture not only improves segmentation accuracy compared to prior approaches, with best Dice scores of 94.24% for crop segmentation and 93.72% for weeds, but also significantly reduces inference time by avoiding multi-stage pipelines, making it well-suited for real-time deployment.
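The rule that weeds are vegetation minus crop can be written as a mask operation, and the Dice score used for evaluation follows directly. The sketch below assumes each head outputs a per-pixel probability map that is binarized at a hypothetical threshold of 0.5; the function names are illustrative, and the paper may combine the two heads differently.

```python
import numpy as np

def derive_weed_mask(vegetation_prob, crop_prob, threshold=0.5):
    """Weed pixels are vegetation pixels that are not predicted as crop."""
    vegetation = np.asarray(vegetation_prob) >= threshold
    crop = np.asarray(crop_prob) >= threshold
    return vegetation & ~crop

def dice(pred, target):
    """Dice coefficient of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as a perfect match
    return float(2.0 * np.logical_and(pred, target).sum() / total)

if __name__ == "__main__":
    vegetation = [[0.9, 0.8, 0.1],
                  [0.7, 0.2, 0.0]]
    crop = [[0.9, 0.1, 0.0],
            [0.2, 0.6, 0.0]]
    weeds = derive_weed_mask(vegetation, crop)
    print(weeds.astype(int))
```

Because the weed mask is computed from the other two, a pixel can never be labeled weed without also being vegetation, which is the logical consistency the abstract refers to. In the example, the crop prediction at row 1, column 1 falls outside the vegetation mask and therefore has no effect on the weed mask.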

(This article belongs to the Special Issue Image Processing and Pattern Recognition Based on Deep Learning for Sensing Applications—3rd Edition)

15 pages, 2523 KB

Open Access Article

Small-Sample Ctenopharyngodon idella Disease Recognition via Dual-Stream Data Augmentation and Supervised Contrastive Learning

by Yuzhu Wang and Dexing Wang

Appl. Sci. 2026, 16(9), 4460; https://doi.org/10.3390/app16094460 - 2 May 2026

Viewed by 357

Abstract

Addressing the challenges of extreme sample scarcity, complex underwater optical environments, and significant variations in lesion scales in real-world aquaculture, this paper proposes a small-sample grass carp disease recognition method, namely Swin Transformer with Supervised Contrastive Learning (ST-SCL), integrating dual-stream data augmentation and supervised contrastive learning. First, a frequency-spatial dual-stream augmentation strategy is constructed. In the frequency domain, the Amplitude-Mix technique is introduced to simulate diverse lighting and turbidity styles by mixing background amplitude spectra, thereby enhancing environmental generalization. In the spatial domain, a pathology-mask-guided instance-level Copy-Paste strategy is employed to directionally expand scarce lesion samples and address data imbalance. Second, the Swin Transformer is adopted as the backbone network, leveraging its hierarchical shifted window attention mechanism to effectively capture multi-scale features, balancing the detection of tiny parasites and extensive superficial ulcerations. Finally, supervised contrastive learning is incorporated to maximize both intra-class compactness and inter-class separability within the feature space, effectively reducing the overfitting inherent to small-sample learning. Experimental results demonstrate that the proposed method achieves a macro-average F1-score of 95.86% across six disease categories. Compared with mainstream models such as ResNet and ConvNeXt, ST-SCL exhibits notable performance improvements and enhanced robustness in small-sample scenarios, offering a promising technical path for precise fish disease diagnosis in complex aquatic environments.
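Supervised contrastive learning pulls embeddings with the same label together and pushes embeddings with different labels apart. The sketch below is a NumPy version of the widely used supervised contrastive loss, in which each anchor's positives are all other samples sharing its label. It illustrates the general technique only: the temperature of 0.1, the function name, and the toy embeddings are assumptions, and the abstract does not give the paper's exact formulation.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings (N x D)."""
    z = np.asarray(features, dtype=np.float64)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize each row
    labels = np.asarray(labels)
    n = z.shape[0]
    sim = z @ z.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)        # shift for numerical stability
    not_self = ~np.eye(n, dtype=bool)
    # Softmax denominator runs over every sample except the anchor itself.
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0                             # anchors with at least one positive
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())

if __name__ == "__main__":
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supcon_loss(embeddings, [0, 0, 1, 1]))  # labels match the clusters: small loss
    print(supcon_loss(embeddings, [0, 1, 0, 1]))  # labels cut across clusters: large loss
```

The loss is low when same-class embeddings are close and other classes are far, so minimizing it tightens each class while separating classes from one another. The batch must contain at least one pair of samples with the same label for the result to be defined.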

(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision, 2nd Edition)

26 pages, 7586 KB

Open AccessArticle

RFA2Net: A Receptive Field and Global Attention Enhanced Model for Semantic Segmentation of High-Resolution Remote-Sensing Images

by Xingyi Zhong, Junhao Liu, Yiqiu Mao, Yubin Zhong and Guanquan Zhu

AI 2026, 7(5), 156; https://doi.org/10.3390/ai7050156 - 29 Apr 2026

Viewed by 1027

Abstract

Semantic segmentation of high-resolution remote-sensing images is critical for urban planning, land-cover mapping, and ecological monitoring. However, existing methods face limitations in handling complex land-cover types, multi-scale objects, and modeling long-range dependencies. To address these challenges, we propose RFA2Net, an enhanced semantic segmentation model based on the DeepLabv3+ framework. The key innovations are threefold: the integration of the RFCSA-Conv module into the ResNet101 backbone to enhance feature representation and long-range dependency modeling; the design of the RFA-DASPP structure, built upon the Dense ASPP framework with the novel RFCA-DConv dilated convolution module, to reduce information loss during multi-scale feature fusion and enhance the model's ability to perceive long-range directional structures; and the introduction of a Dual-Branch Fusion Network to improve segmentation accuracy for small-scale objects. Experimental results demonstrate that RFA2Net outperforms several CNN and Transformer-based models, achieving 78.94% and 59.46% mean intersection over union (mIoU) on the ISPRS Potsdam and LoveDA datasets, respectively, with improvements of 3.19% and 3.08% over the original DeepLabv3+. Ablation studies and comparative experiments further confirm the model's effectiveness, robustness, and practical applicability in high-resolution remote-sensing image segmentation, with particular relevance to environmental monitoring and sustainable energy applications.
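For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of the usual definition: per-class intersection over union, averaged over the classes that occur. It assumes flat lists of integer class labels. The function name and the toy labels are hypothetical, and benchmark protocols such as those for ISPRS Potsdam or LoveDA may differ in details like ignored classes or dataset-level accumulation.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes (illustrative sketch).

    pred and target are equal-length sequences of integer class labels,
    one per pixel. Classes absent from both pred and target are skipped
    so they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with four pixels and two classes.
# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In this toy case the per-class values are 1/2 and 2/3, so the mean is 7/12.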

(This article belongs to the Section AI Systems: Theory and Applications)

32 pages, 1505 KB

Open AccessArticle

Assessing the Transferability and Structural Sensitivity of Convolutional Neural Networks in Art Media Classification

by Juan M. Fortuna-Cervantes, Mayra D. Govea-Tello, Carlos Soubervielle-Montalvo, Rafael Peña-Gallardo, Luis J. Ontañon-García and Isaac Campos-Cantón

Mathematics 2026, 14(9), 1414; https://doi.org/10.3390/math14091414 - 23 Apr 2026

Viewed by 830

Abstract

While convolutional neural networks (CNNs) excel at image classification, their generalization across domains and robustness to nonlinear degradation remain challenges in art media classification (AMC). To address these challenges, this article presents a dual-stage analytical framework: first, an evaluation of seven discrete CNN architectures (ranging from VGG16 to ConvNeXt) subjected to domain shift using the New Spain (Mexico) Art Media Dataset; and second, a formal robustness analysis using an artistic corruption benchmark (Art-C). This benchmark simulates nonlinear degradations, including cracking, oxidized varnish, and pictorial abstraction. Our results demonstrate that while deep convolutional representations maintain acceptable transferability (accuracy > 70%), significant variability exists in architectural stability (mean 0.0607) under progressive stochastic degradation. Notably, Xception exhibited the highest robustness (Art-mCE = 0.8039), whereas VGG16 showed the greatest relative performance decay. Severity analysis further indicates that structural perturbations induce higher error rates than chromatic shifts, suggesting that CNNs are more sensitive to topological features (depth and residual connections) than color-space distributions. We provide quantitative evidence characterizing the relationship between architectural topology and empirical stability in non-natural image domains.

(This article belongs to the Special Issue Advances, Challenges, and Applications of Deep Learning Models in Computer Vision and Image Processing and Analysis)


Search Results (139)
