Search Results (32)

Search Parameters:
Keywords = lightweight, multi-task deep learning framework

21 pages, 976 KB  
Article
A Spatio-Temporal Prototypical Network for Few-Shot Modulation Recognition
by Song Li, Yong Wang, Jun Xiong and Jiankai Huang
Electronics 2026, 15(5), 1036; https://doi.org/10.3390/electronics15051036 - 2 Mar 2026
Abstract
Though deep learning has brought transformative advances to the field of modulation recognition, conventional approaches typically rely on a large amount of labeled data, which is often difficult to obtain in real-world communication scenarios. Few-shot modulation recognition (FSMR), which aims to identify modulation formats with extremely limited training samples, serves as a key enabler for next-generation cognitive radio, intelligent spectrum management, and non-cooperative communications. However, existing neural network models are not inherently designed for few-shot learning (FSL) and cannot be directly applied to FSMR tasks. To address this gap, this paper proposes a spatio-temporal prototypical network (STPN) trained within a meta-learning framework. Through a lightweight multi-module design that sequentially captures spatial patterns and temporal dependencies, STPN effectively integrates hybrid feature extraction with prototype-based classification. In contrast to existing approaches, STPN features a streamlined architecture free from intricate operations that could compromise generalization. This advantage is especially crucial when the model is trained on numerous meta-tasks with only a few samples. Comprehensive experiments on public benchmarks show that STPN achieves superior classification accuracy over several baseline models, while also offering advantages in parameter efficiency and computational cost. Further analysis investigates the key parameters influencing model performance, and ablation studies confirm the individual contribution of each module. This work not only deepens the theoretical understanding of prototype-based FSL techniques but also establishes a practical framework applicable to other signal processing tasks that demand robust performance under limited labeled data.
(This article belongs to the Special Issue Application of Artificial Intelligence in Wireless Communications)
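The prototype-based classification that STPN builds on follows the standard prototypical-network recipe: embed the few labeled support samples, average them per class into prototypes, and assign each query to the nearest prototype. A minimal sketch of that step; the 2-D embeddings and the toy 2-way 2-shot episode are illustrative stand-ins for the output of a trained encoder, not the paper's setup:

```python
import numpy as np

def prototypes(support_emb, support_lbl, n_classes):
    """Class prototype = mean embedding of that class's support samples."""
    return np.stack([support_emb[support_lbl == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Toy 2-way 2-shot episode with 2-D embeddings.
support = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, n_classes=2)
queries = np.array([[0.2, 0.5], [4.8, 5.4]])
pred = classify(queries, protos)  # → [0, 1]
```

Because the classifier is just a nearest-mean rule, the burden falls entirely on the embedding network, which is why the abstract stresses a streamlined, generalization-friendly architecture.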

30 pages, 7439 KB  
Article
Traffic Forecasting for Industrial Internet Gateway Based on Multi-Scale Dependency Integration
by Tingyu Ma, Jiaqi Liu, Panfeng Xu and Yan Song
Sensors 2026, 26(3), 795; https://doi.org/10.3390/s26030795 - 25 Jan 2026
Viewed by 313
Abstract
Industrial gateways serve as critical data aggregation points within the Industrial Internet of Things (IIoT), enabling seamless data interoperability that empowers enterprises to extract value from equipment data more efficiently. However, their role exposes a fundamental trade-off between computational efficiency and prediction accuracy—a contradiction yet to be fully resolved by existing approaches. The rapid proliferation of IoT devices has led to a corresponding surge in network traffic, posing significant challenges for traffic forecasting methods. While deep learning models like Transformers and GNNs demonstrate high accuracy in traffic prediction, their substantial computational and memory demands hinder effective deployment on resource-constrained industrial gateways; and while simple linear models offer relative simplicity, they struggle to effectively capture the complex characteristics of IIoT traffic, which often exhibits high nonlinearity, significant burstiness, and a wide distribution of time scales. The inherent time-varying nature of traffic data further complicates achieving high prediction accuracy. To address these interrelated challenges, we propose the lightweight and theoretically grounded DOA-MSDI-CrossLinear framework, redefining traffic forecasting as a hierarchical decomposition–interaction problem. Unlike existing approaches that simply combine components, we recognize that industrial traffic inherently exhibits scale-dependent temporal correlations requiring explicit decomposition prior to interaction modeling. The Multi-Scale Decomposable Mixing (MDM) module implements this concept through adaptive sequence decomposition, while the Dual Dependency Interaction (DDI) module simultaneously captures dependencies across time and channels. Ultimately, decomposed patterns are fed into an enhanced CrossLinear model to predict flow values for specific future time periods. The Dream Optimization Algorithm (DOA) provides bio-inspired hyperparameter tuning that balances exploration and exploitation—particularly suited for the non-convex optimization scenarios typical in industrial forecasting tasks. Extensive experiments on real industrial IoT datasets thoroughly validate the effectiveness of this approach.
(This article belongs to the Section Industrial Sensors)
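The paper's MDM module is not specified here, but the trend–residual decomposition that such multi-scale mixing blocks conventionally build on can be sketched as follows; the moving-average windows (4, 8, 16 steps) and the synthetic series are illustrative assumptions:

```python
import numpy as np

def decompose(x, window):
    """Split a 1-D series into a moving-average trend and a residual
    (seasonal/burst) component; one pair per scale."""
    pad = window // 2
    padded = np.pad(x, (pad, window - 1 - pad), mode="edge")
    trend = np.convolve(padded, np.ones(window) / window, mode="valid")
    return trend, x - trend

t = np.arange(64)
series = 0.1 * t + np.sin(2 * np.pi * t / 8)        # slow trend + fast cycle
trends = {w: decompose(series, w)[0] for w in (4, 8, 16)}  # three scales
```

Each scale's trend and residual pair then feeds the interaction stage; the wider the window, the more of the burstiness ends up in the residual component.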

16 pages, 1428 KB  
Article
StrDiSeg: Adapter-Enhanced DINOv3 for Automated Ischemic Stroke Lesion Segmentation
by Qiong Chen, Donghao Zhang, Yimin Chen, Siyuan Zhang, Yue Sun, Fabiano Reis, Li M. Li, Li Yuan, Huijuan Jin and Wu Qiu
Bioengineering 2026, 13(2), 133; https://doi.org/10.3390/bioengineering13020133 - 23 Jan 2026
Viewed by 462
Abstract
Deep vision foundation models such as DINOv3 offer strong visual representation capacity, but their direct deployment in medical image segmentation remains difficult due to the limited availability of annotated clinical data and the computational cost of full fine-tuning. This study proposes an adaptation framework called StrDiSeg that integrates lightweight bottleneck adapters between selected transformer layers of DINOv3, enabling task-specific learning while preserving pretrained knowledge. An attention-enhanced U-Net decoder with multi-scale feature fusion further refines the representations. Experiments were performed on two publicly available ischemic stroke lesion segmentation datasets—AISD (Non Contrast CT) and ISLES22 (DWI). The proposed method achieved Dice scores of 0.516 on AISD and 0.824 on ISLES22, outperforming baseline models and demonstrating strong robustness across different clinical imaging modalities. These results indicate that adapter-based fine-tuning provides a practical and computationally efficient strategy for leveraging large pretrained vision models in medical image segmentation.
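Bottleneck adapters of the kind described follow a common pattern: down-project, non-linearity, up-project, residual add, with only the two small projections trained while the surrounding transformer stays frozen. A minimal NumPy sketch; the dimensions (d = 768, bottleneck r = 64) are hypothetical, not necessarily those used with DINOv3 in the paper:

```python
import numpy as np

class BottleneckAdapter:
    """h -> h + ReLU(h @ W_down) @ W_up; only W_down/W_up are trainable."""
    def __init__(self, d, r, rng):
        self.w_down = rng.normal(0.0, 0.02, (d, r))
        self.w_up = np.zeros((r, d))   # zero-init: adapter starts as identity
    def __call__(self, h):
        z = np.maximum(h @ self.w_down, 0.0)   # ReLU bottleneck
        return h + z @ self.w_up               # residual add

rng = np.random.default_rng(0)
adapter = BottleneckAdapter(d=768, r=64, rng=rng)
h = rng.normal(size=(4, 768))                  # 4 token embeddings
out = adapter(h)                               # equals h at initialization
```

The zero-initialized up-projection is a common trick: at the start of fine-tuning the adapter is a no-op, so the pretrained model's behavior is preserved exactly.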

16 pages, 6353 KB  
Article
Research on Encrypted Transmission and Recognition of Garbage Images in Low-Illumination Environments
by Zhenwei Lv, Yapeng Diao, Chunnian Zeng, Weiping Wang and Shufan An
Electronics 2026, 15(2), 302; https://doi.org/10.3390/electronics15020302 - 9 Jan 2026
Viewed by 274
Abstract
Low-illumination conditions significantly degrade the performance of vision-based garbage recognition systems in practical smart city applications. To address this issue, this paper presents a garbage recognition framework that combines low-light image enhancement with attention-guided feature learning. A multi-branch low-light enhancement network (MBLLEN) is employed as the enhancement backbone, and a Convolutional Block Attention Module (CBAM) is integrated to alleviate local over-enhancement and guide feature responses under uneven illumination. The enhanced images are then used as inputs for a deep learning-based garbage classification model. In addition, a lightweight encryption mechanism is considered at the system level to support secure data transmission in practical deployment scenarios. Experiments conducted on a self-collected low-light garbage dataset show that the proposed framework achieves improved image quality and recognition performance compared with baseline approaches. These results suggest that integrating low-light enhancement with attention-guided feature learning can be beneficial for garbage recognition tasks under challenging illumination conditions.

29 pages, 79553 KB  
Article
A2Former: An Airborne Hyperspectral Crop Classification Framework Based on a Fully Attention-Based Mechanism
by Anqi Kang, Hua Li, Guanghao Luo, Jingyu Li and Zhangcai Yin
Remote Sens. 2026, 18(2), 220; https://doi.org/10.3390/rs18020220 - 9 Jan 2026
Viewed by 290
Abstract
Crop classification of farmland is of great significance for crop monitoring and yield estimation. Airborne hyperspectral systems can provide large-format hyperspectral farmland images. However, traditional machine learning-based classification methods rely heavily on handcrafted feature design, resulting in limited representation capability and poor computational efficiency when processing large-format data. Meanwhile, mainstream deep-learning-based hyperspectral image (HSI) classification methods primarily rely on patch-based input methods, where a label is assigned to each patch, limiting the full utilization of hyperspectral datasets in agricultural applications. In contrast, this paper focuses on the semantic segmentation task in the field of computer vision and proposes a novel HSI crop classification framework named All-Attention Transformer (A2Former), which combines CNN and Transformer based on a fully attention-based mechanism. First, a CNN-based encoder consisting of two blocks, the overlap-downsample and the spectral–spatial attention weights block (SSWB), is constructed to extract multi-scale spectral–spatial features effectively. Second, we propose a lightweight C-VIT block to enhance high-dimensional features while reducing parameter count and computational cost. Third, a Transformer-based decoder block with gated-style weighted fusion and interaction attention (WIAB), along with a fused segmentation head (FH), is developed to precisely model global and local features and align semantic information across multi-scale features, thereby enabling accurate segmentation. Finally, a checkerboard-style sampling strategy is proposed to avoid information leakage and ensure the objectivity and accuracy of model performance evaluation. Experimental results on two public HSI datasets demonstrate the accuracy and efficiency of the proposed A2Former framework, which outperforms several well-known patch-free and patch-based methods.
(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)

42 pages, 3251 KB  
Article
Efficient and Accurate Epilepsy Seizure Prediction and Detection Based on Multi-Teacher Knowledge Distillation RGF-Model
by Wei Cao, Qi Li, Anyuan Zhang and Tianze Wang
Brain Sci. 2026, 16(1), 83; https://doi.org/10.3390/brainsci16010083 - 9 Jan 2026
Viewed by 562
Abstract
Background: Epileptic seizures are unpredictable, and while existing deep learning models achieve high accuracy, their deployment on wearable devices is constrained by high computational costs and latency. To address this, this work proposes the RGF-Model, a lightweight network that unifies seizure prediction and detection within a single causal framework. Methods: By integrating Feature-wise Linear Modulation (FiLM) with a Ring-Buffer Gated Recurrent Unit (Ring-GRU), the model achieves adaptive task-specific feature conditioning while strictly enforcing causal consistency for real-time inference. A multi-teacher knowledge distillation strategy is employed to transfer complementary knowledge from complex teacher ensembles to the lightweight student, significantly reducing complexity without sacrificing accuracy. Results: Evaluations on the CHB-MIT and Siena datasets demonstrate that the RGF-Model outperforms state-of-the-art teacher models in terms of efficiency while maintaining comparable accuracy. Specifically, on CHB-MIT, it achieves 99.54% Area Under the Curve (AUC) and 0.01 False Prediction Rate per hour (FPR/h) for prediction, and 98.78% Accuracy (Acc) for detection, with only 0.082 million parameters. Statistical significance was assessed using a random predictor baseline (p < 0.05). Conclusions: The results indicate that the RGF-Model provides a highly efficient solution for real-time wearable epilepsy monitoring.
(This article belongs to the Section Neurotechnology and Neuroimaging)
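A generic multi-teacher distillation objective of the kind described softens each teacher's logits at a temperature, averages them into one target distribution, and penalizes the student's divergence from it. A sketch of that loss; the temperature (T = 4), uniform teacher weighting, and cross-entropy form (equivalent to KL up to a constant) are assumptions, not the paper's settings:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=4.0):
    """Cross-entropy of student soft predictions against the averaged
    soft teacher targets, scaled by T^2 (standard distillation scaling)."""
    target = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    log_p = np.log(softmax(student_logits, T) + 1e-12)
    return float(-(target * log_p).sum(axis=-1).mean() * T * T)

rng = np.random.default_rng(1)
student = rng.normal(size=(8, 2))             # 8 EEG windows, 2 classes
teachers = [rng.normal(size=(8, 2)) for _ in range(3)]
loss = multi_teacher_kd_loss(student, teachers)
```

In practice this distillation term is weighted against the hard-label loss; averaging several teachers is what lets complementary knowledge reach the 0.082 M-parameter student.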

28 pages, 2812 KB  
Article
An Integrated Machine Learning-Based Framework for Road Roughness Severity Classification and Predictive Maintenance Planning in Urban Transportation System
by Olusola O. Ajayi, Anish M. Kurien, Karim Djouani and Lamine Dieng
Appl. Sci. 2025, 15(24), 12916; https://doi.org/10.3390/app152412916 - 8 Dec 2025
Viewed by 547
Abstract
Recent advances in vibration-based pavement assessment have enabled the low-cost monitoring of road conditions using inertial sensors and machine learning models. However, most studies focus on isolated tasks, such as roughness classification, without integrating statistical validation, anomaly detection, or maintenance prioritization. This study presents a unified framework for road roughness severity classification and predictive maintenance using multi-axis accelerometer data collected from urban road networks in Pretoria, South Africa. The proposed pipeline integrates ISO-referenced labeling, ensemble and deep classifiers (Random Forest, XGBoost, MLP, and 1D-CNN), McNemar’s test for model agreement validation, feature importance interpretation, and GIS-based anomaly mapping. Stratified cross-validation and hyperparameter tuning ensured robust generalization, with accuracies exceeding 99%. Statistical outlier detection enabled the early identification of deteriorated segments, supporting proactive maintenance planning. The results confirm that vertical acceleration (accel_z) is the most discriminative signal for roughness severity, validating the feasibility of lightweight single-axis sensing. The study concludes that combining supervised learning with statistical anomaly detection can provide an intelligent, scalable, and cost-effective foundation for municipal pavement management systems. The modular design further supports integration with Internet-of-Things (IoT) telematics platforms for near-real-time road condition monitoring and sustainable transport asset management.
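The single-axis sensing the study validates suggests a simple windowed pipeline over `accel_z`: compute an energy statistic per window, then bin it into severity classes. A sketch of that idea; the RMS statistic and the threshold values below are illustrative placeholders, not the ISO-referenced labeling actually used in the paper:

```python
import math

# Illustrative RMS thresholds (m/s^2) -> severity label; placeholders only.
THRESHOLDS = [(0.5, "smooth"), (1.5, "moderate"), (float("inf"), "rough")]

def severity(accel_z_window):
    """Label one window of vertical acceleration by its RMS energy."""
    rms = math.sqrt(sum(a * a for a in accel_z_window) / len(accel_z_window))
    for limit, label in THRESHOLDS:
        if rms <= limit:
            return rms, label

rms, label = severity([0.1, -0.2, 0.15, -0.1])    # gentle vibration
rms2, label2 = severity([2.0, -2.5, 1.8, -2.2])   # strong vibration
```

In the paper's setting such window-level features (alongside others) feed the Random Forest/XGBoost/MLP/1D-CNN classifiers rather than fixed thresholds.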

31 pages, 1530 KB  
Article
Towards Resilient Agriculture: A Novel UAV-Based Lightweight Deep Learning Framework for Wheat Head Detection
by Na Luo, Yao Yang, Xiwei Yang, Di Yang, Jiao Tang, Siyuan Duan, Hou Huang and He Zhu
Mathematics 2025, 13(23), 3844; https://doi.org/10.3390/math13233844 - 1 Dec 2025
Viewed by 465
Abstract
Precision agriculture increasingly relies on unmanned aerial vehicle (UAV) imagery for high-throughput crop phenotyping, yet existing deep learning detection models face critical constraints limiting practical deployment: computational demands incompatible with edge computing platforms and insufficient accuracy for multi-scale object detection across diverse environmental conditions. We present LSM-YOLO, a lightweight detection framework specifically designed for aerial wheat head monitoring that achieves state-of-the-art performance while maintaining minimal computational requirements. The architecture integrates three synergistic innovations: a Lightweight Adaptive Extraction (LAE) module that reduces parameters by 87.3% through efficient spatial rearrangement and adaptive feature weighting while preserving critical boundary information; a P2-level high-resolution detection head that substantially improves small object recall in high-altitude imagery; and a Dynamic Head mechanism employing unified multi-dimensional attention across scale, spatial, and task dimensions. Comprehensive evaluation on the Global Wheat Head Detection dataset demonstrates that LSM-YOLO achieves 91.4% mAP@0.5 and 51.0% mAP@0.5:0.95—representing 21.1% and 37.1% improvements over baseline YOLO11n—while requiring only 1.29 M parameters and 3.4 GFLOPs, constituting 50.0% parameter reduction and 46.0% computational cost reduction compared to the baseline.

22 pages, 18974 KB  
Article
Lightweight 3D CNN for MRI Analysis in Alzheimer’s Disease: Balancing Accuracy and Efficiency
by Kerang Cao, Zhongqing Lu, Chengkui Zhao, Jiaming Du, Lele Li, Hoekyung Jung and Minghui Geng
J. Imaging 2025, 11(12), 426; https://doi.org/10.3390/jimaging11120426 - 28 Nov 2025
Viewed by 1122
Abstract
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder characterized by subtle structural changes in the brain, which can be observed through MRI scans. Although traditional diagnostic approaches rely on clinical and neuropsychological assessments, deep learning-based methods such as 3D convolutional neural networks (CNNs) have recently been introduced to improve diagnostic accuracy. However, their high computational complexity remains a challenge. To address this, we propose a lightweight magnetic resonance imaging (MRI) classification framework that integrates adaptive multi-scale feature extraction with structural pruning and parameter optimization. The pruned model achieves a compact architecture with approximately 490k parameters (0.49 million), 4.39 billion floating-point operations, and a model size of 1.9 MB, while maintaining high classification performance across three binary tasks. The proposed framework was evaluated on the Alzheimer’s Disease Neuroimaging Initiative dataset, a widely used benchmark for AD research. Notably, the model achieves a performance density (PD) of 189.87, where PD is a custom efficiency metric defined as the classification accuracy per million parameters (%/M), approximately 70× higher than the base model, reflecting its balance between accuracy and computational efficiency. Experimental results demonstrate that the proposed framework significantly reduces resource consumption without compromising diagnostic performance, providing a practical foundation for real-time and resource-constrained clinical applications in Alzheimer’s disease detection.
(This article belongs to the Special Issue AI-Driven Image and Video Understanding)

24 pages, 39644 KB  
Article
Locate then Calibrate: A Synergistic Framework for Small Object Detection from Aerial Imagery to Ground-Level Views
by Kaiye Lin, Zhexiang Zhao and Na Niu
Remote Sens. 2025, 17(22), 3750; https://doi.org/10.3390/rs17223750 - 18 Nov 2025
Viewed by 629
Abstract
Detection of small objects in aerial images captured by Unmanned Aerial Vehicles (UAVs) is a critical task in remote sensing. It is vital for applications like urban monitoring and disaster assessment. This task, however, is challenged by unique viewpoints, diminutive target sizes, and dense scenes. To surmount these challenges, this paper introduces the Locate then Calibrate (LTC) framework. It is a deep learning architecture designed to enhance visual perception systems, specifically for the accurate and robust detection of small objects. Our model builds upon the YOLOv8 architecture and incorporates three synergistic innovations. (1) An Efficient Multi-Scale Attention (EMA) mechanism is employed to ‘Locate’ salient targets by capturing critical cross-dimensional dependencies. (2) We propose a novel Adaptive Multi-Scale (AMS) convolution module to ‘Calibrate’ features, using dynamically learned weights to optimally fuse multi-scale information. (3) An additional high-resolution P2 detection head preserves the fine-grained details essential for localizing diminutive targets. Extensive experimental evaluations demonstrate that the proposed model substantially outperforms the YOLOv8n baseline. Notably, it achieves significant performance gains on the challenging VisDrone aerial dataset. On this dataset, the model achieves a remarkable 11.7% relative increase in mean Average Precision (mAP50). The framework also shows strong generalization. Considerable improvements are recorded on ground-level autonomous driving benchmarks such as KITTI and TT100K_mini. This validated effectiveness proves that LTC is a robust solution for high-accuracy detection: it achieves significant accuracy gains at the cost of a deliberate increase in computational GFLOPs, while maintaining a lightweight parameter count. This design choice positions LTC as a solution for edge applications where accuracy is prioritized over minimal computational cost.
(This article belongs to the Section Remote Sensing Image Processing)

19 pages, 11078 KB  
Article
A Unified Framework for Cross-Domain Space Drone Pose Estimation Integrating Offline Domain Generalization with Online Domain Adaptation
by Yingjian Yu, Zhang Li and Qifeng Yu
Drones 2025, 9(11), 774; https://doi.org/10.3390/drones9110774 - 7 Nov 2025
Viewed by 800
Abstract
In this paper, we present a Unified Framework for cross-domain Space drone Pose Estimation (UF-SPE), addressing the simulation-to-reality gap that limits the deployment of deep learning models in real space missions. The proposed UF-SPE framework integrates offline domain generalization with online unsupervised domain adaptation. During offline training, the model relies exclusively on synthetic images. It employs advanced augmentation techniques and a multi-task architecture equipped with Domain Shifting Uncertainty modules to improve the learning of domain-invariant features. In the online phase, normalization layers are fine-tuned using unlabeled real-world imagery via entropy minimization, allowing the system to adapt to target domain distributions without manual labels. Experiments on the SPEED+ benchmark demonstrate that the UF-SPE achieves competitive accuracy with just 12.9 M parameters, outperforming the comparable lightweight baseline method by 37.5% in pose estimation accuracy. The results validate the framework’s efficacy and efficiency for robust cross-domain space drone pose estimation, indicating promise for applications such as on-orbit servicing, debris removal, and autonomous rendezvous.
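Entropy minimization of the kind used in the online phase fine-tunes normalization parameters so that predictions on unlabeled target images become more confident. The objective itself is just the mean Shannon entropy of the softmax outputs; a minimal sketch with placeholder probability vectors standing in for model predictions:

```python
import numpy as np

def prediction_entropy(probs):
    """Mean Shannon entropy of softmax outputs; the online phase
    backpropagates this loss through the normalization layers only."""
    return float(-(probs * np.log(probs + 1e-12)).sum(axis=-1).mean())

confident = np.array([[0.98, 0.01, 0.01]])
uncertain = np.array([[1/3, 1/3, 1/3]])
# Uniform 3-class predictions have the maximum entropy, log(3) ≈ 1.0986.
gap = prediction_entropy(uncertain) - prediction_entropy(confident)
```

Because the loss needs no labels, it can run continuously on the real-world imagery stream; restricting updates to normalization layers keeps adaptation cheap and limits drift from the offline-trained weights.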

21 pages, 9302 KB  
Article
Research on Small Object Detection in Degraded Visual Scenes: An Improved DRF-YOLO Algorithm Based on YOLOv11
by Yan Gu, Lingshan Chen and Tian Su
World Electr. Veh. J. 2025, 16(11), 591; https://doi.org/10.3390/wevj16110591 - 23 Oct 2025
Cited by 2 | Viewed by 1564
Abstract
Object detection in degraded environments such as low-light and nighttime conditions remains a challenging task, as conventional computer vision techniques often fail to achieve high precision and robust performance. With the increasing adoption of deep learning, this paper aims to enhance object detection under such adverse conditions by proposing an improved version of YOLOv11, named DRF-YOLO (Degradation-Robust and Feature-enhanced YOLO). The proposed framework incorporates three innovative components: (1) a lightweight Cross Stage Partial Multi-Scale Edge Enhancement (CSP-MSEE) module that combines multi-scale feature extraction with edge enhancement to strengthen feature representation; (2) a Focal Modulation attention mechanism that improves the network’s responsiveness to target regions and contextual information; and (3) a self-developed Dynamic Interaction Head (DIH) that enhances detection accuracy and spatial adaptability for small objects. In addition, a lightweight unsupervised image enhancement algorithm, Zero-DCE (Zero-Reference Deep Curve Estimation), is introduced prior to training to improve image contrast and detail, and Generalized Intersection over Union (GIoU) is employed as the bounding box regression loss. To evaluate the effectiveness of DRF-YOLO, experiments are conducted on two representative low-light datasets: ExDark and the nighttime subset of BDD100K, which include images of vehicles, pedestrians, and other road objects. Results show that DRF-YOLO achieves improvements of 3.4% and 2.3% in mAP@0.5 compared with the original YOLOv11, demonstrating enhanced robustness and accuracy in degraded environments while maintaining lightweight efficiency.
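The GIoU regression loss mentioned above extends plain IoU with a penalty based on the smallest box enclosing both prediction and target, which gives a useful gradient even when the boxes do not overlap. A self-contained version for axis-aligned boxes in (x1, y1, x2, y2) format:

```python
def giou(a, b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box; GIoU subtracts its "wasted" fraction.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou - (c_area - union) / c_area

same = giou((0, 0, 2, 2), (0, 0, 2, 2))      # identical boxes -> 1.0
apart = giou((0, 0, 1, 1), (2, 2, 3, 3))     # disjoint boxes -> negative
```

The training loss is then 1 − GIoU, so disjoint predictions are penalized more the farther apart they are, unlike 1 − IoU, which saturates at 1 for all non-overlapping boxes.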

14 pages, 2127 KB  
Article
CycleGAN with Atrous Spatial Pyramid Pooling and Attention-Enhanced MobileNetV4 for Tomato Disease Recognition Under Limited Training Data
by Yueming Jiang, Taizeng Jiang, Chunyan Song and Jian Wang
Appl. Sci. 2025, 15(19), 10790; https://doi.org/10.3390/app151910790 - 7 Oct 2025
Cited by 1 | Viewed by 803
Abstract
To address the challenges of poor model generalization and suboptimal recognition accuracy stemming from limited and imbalanced sample sizes in tomato leaf disease identification, this study proposes a novel recognition strategy. This approach synergistically combines an enhanced image augmentation method based on generative adversarial networks with a lightweight deep learning model. Initially, an Atrous Spatial Pyramid Pooling (ASPP) module is integrated into the CycleGAN framework. This integration enhances the generator’s capacity to model multi-scale pathological lesion features, thereby significantly improving the diversity and realism of synthesized images. Subsequently, the Convolutional Block Attention Module (CBAM), incorporating both channel and spatial attention mechanisms, is embedded into the MobileNetV4 architecture. This enhancement boosts the model’s ability to focus on critical disease regions. Experimental results demonstrate that the proposed ASPP-CycleGAN significantly outperforms the original CycleGAN across multiple disease image generation tasks. Furthermore, the developed CBAM-MobileNetV4 model achieves a remarkable average recognition accuracy exceeding 97% for common tomato diseases, including early blight, late blight, and mosaic disease, representing a 1.86% improvement over the baseline MobileNetV4. The findings indicate that the proposed method offers exceptional data augmentation capabilities and classification performance under small-sample learning conditions, providing an effective technical foundation for the intelligent identification and control of tomato leaf diseases.
(This article belongs to the Section Agricultural Science and Technology)
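The channel-attention half of CBAM squeezes each channel with average- and max-pooling, passes both descriptors through a shared bottleneck MLP, sums them, and applies a sigmoid to produce per-channel gates. A NumPy sketch; the random (untrained) MLP weights, channel count, and reduction ratio are illustrative assumptions:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """x: (C, H, W). Returns x reweighted by per-channel gates in (0, 1)."""
    avg = x.mean(axis=(1, 2))            # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))              # (C,) max-pooled descriptor
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2   # shared bottleneck MLP
    gate = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid
    return x * gate[:, None, None]       # reweight channels

rng = np.random.default_rng(2)
C, r = 8, 2                              # channels, reduction ratio
w1 = rng.normal(size=(C, C // r))
w2 = rng.normal(size=(C // r, C))
feat = rng.normal(size=(C, 16, 16))
out = channel_attention(feat, w1, w2)
```

CBAM's spatial half works analogously along the channel axis; applying the two in sequence is what lets the network attend to both *which* feature maps and *where* in the image the lesions are.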

20 pages, 162180 KB  
Article
Annotation-Efficient and Domain-General Segmentation from Weak Labels: A Bounding Box-Guided Approach
by Ammar M. Okran, Hatem A. Rashwan, Sylvie Chambon and Domenec Puig
Electronics 2025, 14(19), 3917; https://doi.org/10.3390/electronics14193917 - 1 Oct 2025
Cited by 2 | Viewed by 1191
Abstract
Manual pixel-level annotation remains a major bottleneck in deploying deep learning models for dense prediction and semantic segmentation tasks across domains. This challenge is especially pronounced in applications involving fine-scale structures, such as cracks in infrastructure or lesions in medical imaging, where annotations are time-consuming, expensive, and subject to inter-observer variability. To address these challenges, this work proposes a weakly supervised and annotation-efficient segmentation framework that integrates sparse bounding-box annotations with a limited subset of strong (pixel-level) labels to train robust segmentation models. The fundamental element of the framework is a lightweight Bounding Box Encoder that converts weak annotations into multi-scale attention maps. These maps guide a ConvNeXt-Base encoder, and a lightweight U-Net–style convolutional neural network (CNN) decoder—using nearest-neighbor upsampling and skip connections—reconstructs the final segmentation mask. This design enables the model to focus on semantically relevant regions without relying on full supervision, drastically reducing annotation cost while maintaining high accuracy. We validate our framework on two distinct domains, road crack detection and skin cancer segmentation, demonstrating that it achieves performance comparable to fully supervised segmentation models using only 10–20% of strong annotations. Given the ability of the proposed framework to generalize across varied visual contexts, it has strong potential as a general annotation-efficient segmentation tool for domains where strong labeling is costly or infeasible.

30 pages, 5137 KB  
Article
High-Resolution Remote Sensing Imagery Water Body Extraction Using a U-Net with Cross-Layer Multi-Scale Attention Fusion
by Chunyan Huang, Mingyang Wang, Zichao Zhu and Yanling Li
Sensors 2025, 25(18), 5655; https://doi.org/10.3390/s25185655 - 10 Sep 2025
Cited by 2 | Viewed by 1716
Abstract
The accurate extraction of water bodies from remote sensing imagery is crucial for water resource monitoring and flood disaster warning. However, this task faces significant challenges due to complex land cover, large variations in water body morphology and spatial scales, and spectral similarities between water and non-water features, leading to misclassification and low accuracy. While deep learning-based methods have become a research hotspot, traditional convolutional neural networks (CNNs) struggle to represent multi-scale features and capture global water body information effectively. To enhance water feature recognition and precisely delineate water boundaries, we propose the AMU-Net model. Initially, an improved residual connection module was embedded into the U-Net backbone to enhance complex feature learning. Subsequently, a multi-scale attention mechanism was introduced, combining grouped channel attention with multi-scale convolutional strategies for lightweight yet precise segmentation. Thereafter, a dual-attention gated modulation module dynamically fusing channel and spatial attention was employed to strengthen boundary localization. Furthermore, a cross-layer geometric attention fusion module, incorporating grouped projection convolution and a triple-level geometric attention mechanism, optimizes segmentation accuracy and boundary quality. Finally, a triple-constraint loss framework synergistically optimized global classification, regional overlap, and background specificity to boost segmentation performance. Evaluated on the GID and WHDLD datasets, AMU-Net achieved remarkable IoU scores of 93.6% and 95.02%, respectively, providing an effective new solution for remote sensing water body extraction.
(This article belongs to the Section Remote Sensors)
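The IoU scores reported above are computed from boolean segmentation masks: intersection over union of predicted and reference water pixels. A minimal version, with tiny hand-built masks as the example:

```python
import numpy as np

def mask_iou(pred, target):
    """IoU between two boolean segmentation masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0  # both empty: define as 1

pred = np.zeros((4, 4), dtype=bool); pred[:2, :] = True      # top half
target = np.zeros((4, 4), dtype=bool); target[:, :2] = True  # left half
score = mask_iou(pred, target)   # 4 overlapping cells / 12 in the union
```

The same ratio generalizes directly to full scenes; reporting it per dataset (93.6% on GID, 95.02% on WHDLD) averages this pixel-set overlap over the test imagery.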
