Search Results (183)

Search Parameters:
Keywords = SE-CNN

27 pages, 5361 KB  
Article
Dual-Stream 2D and 3D-SE-ResNet Architectures for Crop Mapping Using EnMAP Hyperspectral Time-Series
by László Mucsi, Márkó Sóti, Dorottya Litkey-Kovács, János Mészáros, Dóra Vigh-Szabó, Elemér Szalma, Zalán Tobak and József Szatmári
Remote Sens. 2026, 18(6), 884; https://doi.org/10.3390/rs18060884 - 13 Mar 2026
Viewed by 205
Abstract
Deep learning-based crop mapping from hyperspectral satellite data offers immense potential for capturing subtle phenological differences, yet leveraging sparse time series remains a major methodological challenge. This study evaluates the ability of the EnMAP sensor to identify nine major crop types in the intensive agricultural landscape of Southeastern Hungary. We utilized a limited time series (November, March, August) to benchmark two modeling strategies: a single-date dual-stream spatial–spectral 2D-CNN (DSS-2D) and a multi-temporal 3D-SE-ResNet. Model performance was assessed using parcel-level spatial cross-validation to ensure realistic accuracy estimates and reduce spatial autocorrelation bias. The results demonstrate that the DSS-2D model achieved superior single-date accuracy (OA > 97%), significantly outperforming pixel-based baselines. Furthermore, the multi-temporal 3D-SE-ResNet achieved a robust seasonal accuracy of 92.9%, effectively compensating for temporal sparsity by exploiting the deep spectral information of the SWIR domain. This study confirms that treating hyperspectral data as a 3D volume enables the extraction of phenological traits even from limited observations. These findings provide a strong proof-of-concept for the operational feasibility of future missions such as Copernicus CHIME for continental-scale food security monitoring. Full article

20 pages, 3228 KB  
Article
Symmetry-Aware Byzantine Resilience in Federated Learning via Dual-Channel Attention-Driven Anomaly Detection
by Yuliang Zhang, Jian Hou, Xianke Zhou, Linjie Ruan, Xianyu Luo and Lili Wang
Symmetry 2026, 18(3), 478; https://doi.org/10.3390/sym18030478 - 11 Mar 2026
Viewed by 131
Abstract
Byzantine failures remain a critical threat to Federated Learning (FL), where malicious clients inject adversarial updates to disrupt global model convergence. From the perspective of symmetry, benign client updates typically exhibit statistical symmetry around the global consensus, whereas Byzantine attacks function as “symmetry-breaking” events that introduce skewness and distributional anomalies. Existing defenses often rely on unrealistic assumptions or fail to capture these asymmetric deviations under high-dimensional non-IID settings. In this paper, we propose a symmetry-aware Byzantine-resilient FL framework driven by a Dual-Channel Attention-Driven Anomaly Detector (DAAD). Specifically, DAAD transforms inter-client behaviors into geometrically symmetric interaction matrices—encoding Gradient Cosine Similarities and Loss Euclidean Distances—to construct dual-channel spatial representations. These representations are processed via a Convolutional Neural Network (CNN) enhanced with Squeeze-and-Excitation (SE) attention blocks, which leverage the inherent symmetry of benign consensus to extract robust adversarial signatures. The detector is pre-trained offline on a synthetic dataset incorporating a diverse portfolio of simulated attacks (e.g., Gaussian noise and label flipping). Crucially, this pre-trained model is seamlessly embedded into the online FL loop to filter updates without requiring ground-truth labels. By jointly encoding client behaviors and learning cross-modal attack signatures, our framework enables reliable detection even when over half of the clients are Byzantine. Extensive experiments on MNIST, CIFAR-10, and FEMNIST datasets demonstrate that DAAD consistently outperforms existing robust aggregation baselines in both anomaly detection accuracy and global model performance, especially under high Byzantine ratios and non-IID conditions. Full article
(This article belongs to the Section Computer)
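The dual-channel input described in this abstract (pairwise Gradient Cosine Similarities and Loss Euclidean Distances between clients) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy gradients, and the choice to represent each client's loss as a short vector are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def interaction_matrices(gradients, losses):
    """Build two symmetric n x n matrices, one per channel.

    gradients: one flattened update vector per client (hypothetical input).
    losses: one loss vector per client (hypothetical input).
    """
    n = len(gradients)
    cos = [[cosine_similarity(gradients[i], gradients[j]) for j in range(n)]
           for i in range(n)]
    dist = [[euclidean_distance(losses[i], losses[j]) for j in range(n)]
            for i in range(n)]
    return cos, dist


# Toy data: clients 0 and 1 roughly agree, client 2 sends an opposed update.
gradients = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
losses = [[0.5], [0.6], [2.0]]
cos_channel, dist_channel = interaction_matrices(gradients, losses)
```

In this toy case the opposed client stands out as a row of negative similarities and large loss distances, which is the kind of asymmetric pattern a downstream detector could learn to flag.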

22 pages, 3598 KB  
Article
Fractional Tchebichef-ResNet-SE: A Hybrid Deep Learning Framework Integrating Fractional Tchebichef Moments with Attention Mechanisms for Enhanced IoT Intrusion Detection
by Islam S. Fathi, Ahmed R. El-Saeed, Mohammed Tawfik and Gaber Hassan
Fractal Fract. 2026, 10(3), 172; https://doi.org/10.3390/fractalfract10030172 - 5 Mar 2026
Viewed by 200
Abstract
The Internet of Things (IoT) faces critical security challenges stemming from resource-constrained devices and inadequate intrusion detection capabilities. Traditional machine learning approaches struggle with high-dimensional network traffic data due to the curse of dimensionality, severe class imbalance between benign and malicious traffic, and dependence on manual feature engineering that fails to capture complex non-linear attack patterns. Although deep neural networks offer automatic feature extraction, they suffer from two fundamental limitations: the degradation problem, where increasing network depth paradoxically raises training error rather than improving performance, and uniform channel weighting, which prevents the network from adaptively emphasizing attack-relevant features while suppressing irrelevant noise. This research proposes a novel hybrid framework integrating Fractional Tchebichef moment-based feature preprocessing with deep Residual Networks enhanced by Squeeze-and-Excitation (ResNet-SE) attention mechanisms. Fractional Tchebichef moments provide compact, noise-resistant representations by operating directly in the discrete domain, eliminating discretization errors inherent in continuous moment approaches. Network traffic features are transformed into 232 × 232 moment-based matrices capturing discriminative patterns across multiple scales. Comprehensive evaluation on Bot-IoT and Leopard Mobile IoT datasets demonstrates superior performance, achieving 99.78% accuracy and a 99.37% F1-score, substantially outperforming K-Nearest Neighbors (84.7%), Support Vector Machines (87.5%), and baseline CNNs (99.3%). Ablation studies confirm synergistic contributions, with residual connections contributing 0.18% and SE attention adding 0.14% improvements. Cross-dataset evaluation achieves 96.34% and 97.12% accuracy on UNSW-NB15 and IoT-Bot datasets without retraining, while the framework processes 127.9 samples per second across diverse attack taxonomies. 
Full article
(This article belongs to the Section Optimization, Big Data, and AI/ML)

15 pages, 1404 KB  
Article
A Deep Learning-Based Decision Support System for Cholelithiasis in MRI Data
by Ebru Hasbay, Caglar Cengizler, Mahmut Ucar, Nagihan Durgun, Hayriye Ulkucan Disli and Deniz Bolat
J. Clin. Med. 2026, 15(5), 1891; https://doi.org/10.3390/jcm15051891 - 2 Mar 2026
Viewed by 276
Abstract
Background: Cholelithiasis can lead to significant complications if not diagnosed and treated promptly. Recent advances in deep learning and the improved ability of computer systems to detect clinically significant textural and morphological patterns in magnetic resonance imaging (MRI) can help reduce the time and resources required for the radiological evaluation of the gallbladder and cholelithiasis. Objective: To detect cholelithiasis, a support system with a graphical user interface for magnetic resonance (MR) images of the gallbladder was implemented to reduce the manual effort and time required to identify gallstones. Method: A commonly used deep learning model for pixel-level mask generation and instance segmentation, Mask Region Based Convolutional Neural Network (Mask R-CNN), was modified, trained, and evaluated to provide a robust pipeline for automated analysis. The primary aim was to automatically locate and label the gallbladder in T2-weighted axial MR images to detect gallstones and highlight the visual characteristics of the target region, thereby supporting radiologists. All automation was designed to operate on a single optimal slice instead of the entire volume. While this approach limits generalisability, it offers a practical starting point for method development. This setup reflects a feasibility-oriented design, rather than a comprehensive diagnostic capability. The dataset included 788 axial MR images from different patients. Each image was labeled and segmented by an experienced radiologist to train and test the models at the image level. Results: The proposed model with squeeze and excitation (SE) modification improved classification accuracy, and at the image level, stone detection improved in terms of accuracy, precision, and specificity, although recall and F1 scores slightly decreased. 
Conclusions: The results show that the modified Mask R-CNN model can detect gallstones with up to 0.89 accuracy, supporting the clinical applicability of the proposed method. Full article
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)

17 pages, 1091 KB  
Article
ASD Recognition Through Weighted Integration of Landmark-Based Handcrafted and Pixel-Based Deep Learning Features
by Asahi Sekine, Abu Saleh Musa Miah, Koki Hirooka, Najmul Hassan, Md. Al Mehedi Hasan, Yuichi Okuyama, Yoichi Tomioka and Jungpil Shin
Computers 2026, 15(2), 124; https://doi.org/10.3390/computers15020124 - 13 Feb 2026
Viewed by 469
Abstract
Autism Spectrum Disorder (ASD) is a neurological condition that affects communication and social interaction skills, with individuals experiencing a range of challenges that often require specialized care. Automated systems for recognizing ASD face significant challenges due to the complexity of identifying distinguishing features from facial images. This study proposes an incremental advancement in ASD recognition by introducing a dual-stream model that combines handcrafted facial-landmark features with deep learning-based pixel-level features. The model processes images through two distinct streams to capture complementary aspects of facial information. In the first stream, facial landmarks are extracted using MediaPipe (v0.10.21), with a focus on 137 symmetric landmarks. The face’s position is adjusted using in-plane rotation based on eye-corner angles, and geometric features along with 52 blendshape features are processed through Dense layers. In the second stream, RGB image features are extracted using pre-trained CNNs (e.g., ResNet50V2, DenseNet121, InceptionV3) enhanced with Squeeze-and-Excitation (SE) blocks, followed by feature refinement through Global Average Pooling (GAP) and DenseNet layers. The outputs from both streams are fused using weighted concatenation through a softmax gate, followed by further feature refinement for classification. This hybrid approach significantly improves the ability to distinguish between ASD and non-ASD faces, demonstrating the benefits of combining geometric and pixel-based features. The model achieved an accuracy of 96.43% on the Kaggle dataset and 97.83% on the YTUIA dataset. Statistical hypothesis testing further confirms that the proposed approach provides a statistically meaningful advantage over strong baselines, particularly in terms of classification correctness and robustness across datasets. 
While these results are promising, they show incremental improvements over existing methods, and future work will focus on optimizing performance to exceed current benchmarks. Full article
(This article belongs to the Special Issue Machine and Deep Learning in the Health Domain (3rd Edition))
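The fusion step in this abstract ("weighted concatenation through a softmax gate") might look roughly like the sketch below. The abstract does not say how the gate is parameterised, so the two gate logits here are hypothetical stand-ins for whatever the trained model produces, and the function names are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_concat(landmark_features, pixel_features, gate_logits):
    """Scale each stream by its softmax weight, then concatenate.

    gate_logits: two scalars, one per stream (assumed to come from a
    learned gate; passed in directly here for illustration).
    """
    w_landmark, w_pixel = softmax(gate_logits)
    return ([w_landmark * x for x in landmark_features]
            + [w_pixel * x for x in pixel_features])


# Equal logits give each stream a weight of 0.5.
fused = gated_concat([1.0, 2.0], [4.0, 6.0, 8.0], gate_logits=[0.0, 0.0])
```

Because the weights sum to one, raising one stream's logit automatically down-weights the other, which is one plausible reason to gate before concatenating rather than concatenating raw features.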

29 pages, 103124 KB  
Article
Enhancing Cross-Regional Generalization in UAV Forest Segmentation Across Plantation and Natural Forests with Attention-Refined PP-LiteSeg Networks
by Xinyu Ma, Shuang Zhang, Kaibo Li, Xiaorui Wang, Hong Lin and Zhenping Qiang
Remote Sens. 2026, 18(3), 523; https://doi.org/10.3390/rs18030523 - 5 Feb 2026
Viewed by 326
Abstract
Accurate fine-scale forest mapping is fundamental for ecological monitoring and resource management. While deep learning semantic segmentation methods have advanced the interpretation of high-resolution UAV imagery, their generalization across diverse forest regions remains challenging due to high spatial heterogeneity. To address this, we propose two enhanced versions based on the PP-LiteSeg architecture for robust cross-regional forest segmentation. Version 01 (V01) integrates a multi-branch attention fusion module composed of parallel channel, spatial, and pixel attention branches. This design enables fine-grained feature enhancement and precise boundary delineation in structurally regular artificial forests, such as the Huayuan Forest Farm. As a result, V01 achieves a mIoU of 92.64% and an F1-score of 96.10%, representing an approximately 18 percentage-point mIoU improvement over PSPNet and DeepLabv3+. Building on this, Version 02 (V02) introduces a lightweight residual connection that directly shortcuts the fused features, thereby improving feature stability and robustness under complex textures and illumination, and demonstrates stronger performance in naturally heterogeneous forests (Longhai Township), attaining an mIoU of 91.87% and an F1-score of 95.77% (5.72 percentage-point mIoU gain over DeepLabv3+). We further conduct comprehensive comparisons against conventional CNN baselines as well as representative lightweight and transformer-based models (BiSeNetV2 and SegFormer-B0). In bidirectional cross-region transfer (train on one region and directly test on the other), V02 exhibits the most stable performance with minimal degradation, highlighting its robustness under domain shift. On a combined cross-regional dataset, V02 achieves a leading mIoU of 91.50%, outperforming U-Net, DeepLabv3+, and PSPNet. 
In summary, V01 excels in boundary delineation for regular plantation forests, whereas V02 shows more stable generalization across highly varied natural forest landscapes, providing practical solutions for region-adaptive UAV forest segmentation. Full article
(This article belongs to the Special Issue Remote Sensing-Assisted Forest Inventory Planning)
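For readers comparing the mIoU figures quoted in this abstract, the following sketch shows one common way to compute mean intersection-over-union from a confusion matrix. The row/column convention (rows are reference classes, columns are predictions) and the toy counts are assumptions; the abstract does not describe the authors' evaluation code.

```python
def mean_iou(confusion):
    """Mean IoU over classes, given a square confusion matrix.

    confusion[r][c] counts pixels of reference class r predicted as class c
    (assumed convention). Classes with an empty union are skipped.
    """
    k = len(confusion)
    ious = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(k)) - tp   # pixels wrongly given class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy two-class example (forest vs. background), counts are made up.
toy_confusion = [[8, 2],
                 [1, 9]]
toy_miou = mean_iou(toy_confusion)
```

A "percentage-point" gain, as used in the abstract, is the plain difference between two such scores expressed in percent, not a relative improvement.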

28 pages, 7334 KB  
Article
I-GhostNetV3: A Lightweight Deep Learning Framework for Vision-Sensor-Based Rice Leaf Disease Detection in Smart Agriculture
by Puyu Zhang, Rui Li, Yuxuan Liu, Guoxi Sun and Chenglin Wen
Sensors 2026, 26(3), 1025; https://doi.org/10.3390/s26031025 - 4 Feb 2026
Cited by 1 | Viewed by 455
Abstract
Accurate and timely diagnosis of rice leaf diseases is crucial for smart agriculture leveraging vision sensors. However, existing lightweight convolutional neural networks (CNNs) often struggle in complex field environments, where small lesions, cluttered backgrounds, and varying illumination complicate recognition. This paper presents I-GhostNetV3, an incrementally improved GhostNetV3-based network for RGB rice leaf disease recognition. I-GhostNetV3 introduces two modular enhancements with controlled overhead: (1) Adaptive Parallel Attention (APA), which integrates edge-guided spatial and channel cues and is selectively inserted to enhance lesion-related representations (at the cost of additional computation), and (2) Fusion Coordinate-Channel Attention (FCCA), a near-neutral SE replacement that enables efficient spatial–channel feature fusion to suppress background interference. Experiments on the Rice Leaf Bacterial and Fungal Disease (RLBF) dataset show that I-GhostNetV3 achieves 90.02% Top-1 accuracy with 1.831 million parameters and 248.694 million FLOPs, outperforming MobileNetV2 and EfficientNet-B0 under our experimental setup while remaining compact relative to the original GhostNetV3. In addition, evaluation on PlantVillage-Corn serves as a supplementary transfer sanity check; further validation on independent real-field target domains and on-device profiling will be explored in future work. These results indicate that I-GhostNetV3 is a promising efficient backbone for future edge deployment in precision agriculture. Full article

13 pages, 1780 KB  
Article
Dual-Branch CNN for Direction-of-Arrival and Number-of-Sources Estimation
by Yufeng Jiang and Lin Zou
Sensors 2026, 26(3), 809; https://doi.org/10.3390/s26030809 - 26 Jan 2026
Viewed by 252
Abstract
Despite the many conventional direction-of-arrival (DOA) methods, the relationship between the number of sources (NOS) and DOA, which could yield meaningful estimation information, is often ignored. Therefore, a dual-branch Convolutional Neural Network (CNN) integrated with squeeze-and-excitation (SE) blocks that performs DOA and NOS estimation simultaneously is proposed to address this limitation. Extensive simulations demonstrate the superiority of the proposed model over several traditional algorithms, especially under low signal-to-noise ratio (SNR) conditions, with limited snapshots, and in closely spaced incident angle scenarios. Full article
(This article belongs to the Section Radar Sensors)

16 pages, 1206 KB  
Article
HASwinNet: A Swin Transformer-Based Denoising Framework with Hybrid Attention for mmWave MIMO Systems
by Xi Han, Houya Tu, Jiaxi Ying, Junqiao Chen and Zhiqiang Xing
Entropy 2026, 28(1), 124; https://doi.org/10.3390/e28010124 - 20 Jan 2026
Viewed by 368
Abstract
Millimeter-wave (mmWave) massive multiple-input, multiple-output (MIMO) systems are a cornerstone technology for integrated sensing and communication (ISAC) in sixth-generation (6G) mobile networks. These systems provide high-capacity backhaul while simultaneously enabling high-resolution environmental sensing. However, accurate channel estimation remains highly challenging due to intrinsic noise sensitivity and clustered sparse multipath structures. These challenges are particularly severe under limited pilot resources and low signal-to-noise ratio (SNR) conditions. To address these difficulties, this paper proposes HASwinNet, a deep learning (DL) framework designed for mmWave channel denoising. The framework integrates a hierarchical Swin Transformer encoder for structured representation learning. It further incorporates two complementary branches. The first branch performs sparse token extraction guided by angular-domain significance. The second branch focuses on angular-domain refinement by applying discrete Fourier transform (DFT), squeeze-and-excitation (SE), and inverse DFT (IDFT) operations. This generates a mask that highlights angularly coherent features. A decoder combines the outputs of both branches with a residual projection from the input to yield refined channel estimates. Additionally, we introduce an angular-domain perceptual loss during training. This enforces spectral consistency and preserves clustered multipath structures. Simulation results based on the Saleh–Valenzuela (S–V) channel model demonstrate that HASwinNet achieves significant improvements in normalized mean squared error (NMSE) and bit error rate (BER). It consistently outperforms convolutional neural network (CNN), long short-term memory (LSTM), and U-Net baselines. Furthermore, experiments with reduced pilot symbols confirm that HASwinNet effectively exploits angular sparsity. The model retains a consistent advantage over baselines even under pilot-limited conditions. 
These findings validate the scalability of HASwinNet for practical 6G mmWave backhaul applications. They also highlight its potential in ISAC scenarios where accurate channel recovery supports both communication and sensing. Full article

25 pages, 6809 KB  
Article
Sound Insulation Prediction and Analysis of Vehicle Floor Systems Based on Squeeze-and-Excitation ResNet Method
by Yan Ma, Jingjing Wang, Dianlong Pan, Wei Zhao, Xiaotao Yang, Xiaona Liu, Jie Yan and Weiping Ding
Electronics 2026, 15(1), 184; https://doi.org/10.3390/electronics15010184 - 30 Dec 2025
Viewed by 407
Abstract
The floor acoustic package is a crucial component of a vehicle’s overall acoustic insulation system, and its performance directly influences the interior sound field distribution and acoustic comfort. Conventional investigations of acoustic package performance primarily rely on experimental testing and computer-aided engineering (CAE) simulations. However, these methods often suffer from limited accuracy control, high computational cost, and low efficiency. In contrast, data-driven modeling approaches have recently demonstrated strong potential in addressing these challenges. In this paper, a Squeeze-and-Excitation Residual Network (SE-ResNet) is proposed to predict and analyze the sound insulation performance of vehicle floor systems based on the original structural and material parameters of acoustic package components. By replacing the conventional CAE process with a data-driven framework, the proposed method enhances prediction accuracy and computational efficiency. With the lowest recorded RMSE of 0.4048 dB across the 200–8000 Hz spectrum, the SE-ResNet model ranks first in overall performance. It substantially outperforms the SE-CNN (0.9207 dB) and also shows a clear advantage over both the SE-LSTM (0.4591 dB) and the ResNet (0.4593 dB). Validation using the acoustic package data of a new vehicle model further confirms the robustness of the proposed approach, yielding an overall RMSE = 0.4089 dB and CORR = 0.9996 on the test dataset. These results collectively demonstrate that the SE-ResNet-based method presents a promising and robust solution for forecasting the sound insulation performance of vehicle floor systems. Moreover, the proposed framework offers methodological and technical support for the data-driven prediction and analysis of other vehicle noise and vibration problems. Full article
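The two figures of merit reported in this abstract, RMSE (in dB) and CORR, can be computed as in the sketch below. It assumes CORR means the Pearson correlation coefficient between predicted and measured sound insulation curves, which the abstract does not state explicitly; the sample values are invented.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between two equally long sequences."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_corr(predicted, measured):
    """Pearson correlation coefficient (assumed meaning of CORR)."""
    n = len(measured)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)


# Made-up sound insulation values (dB) at a few frequency bands.
measured_db = [20.0, 25.0, 31.0, 38.0]
predicted_db = [20.5, 24.5, 31.5, 37.5]
error_db = rmse(predicted_db, measured_db)
corr = pearson_corr(predicted_db, measured_db)
```

The two metrics are complementary: RMSE penalises absolute offsets in dB, while a correlation near 1 only says the predicted curve follows the shape of the measured one.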

19 pages, 3910 KB  
Article
Defect Detection Algorithm of Galvanized Sheet Based on S-C-B-YOLO
by Yicheng Liu, Gaoxia Fan, Hanquan Zhang and Dong Xiao
Mathematics 2026, 14(1), 110; https://doi.org/10.3390/math14010110 - 28 Dec 2025
Viewed by 381
Abstract
Galvanized steel sheets are vital anti-corrosion materials, yet their surface quality is prone to defects that impact performance. Manual inspection is inefficient, while conventional machine vision struggles with complex, small-scale defects in industrial settings. Although deep learning offers promising solutions, standard object detection models like YOLOv5 (which is short for ‘You Only Look Once’) exhibit limitations in handling the subtle textures, scale variations, and reflective surfaces characteristic of galvanized sheet defects. To address these challenges, this paper proposes S-C-B-YOLO, an enhanced detection model based on YOLOv5. First, a Squeeze-and-Excitation (SE) attention mechanism is integrated into the deep layers of the backbone network to adaptively recalibrate channel-wise features, improving focus on defect-relevant information. Second, a Transformer block is combined with a C3 module to form a C3TR module, enhancing the model’s ability to capture global contextual relationships for irregular defects. Finally, the original path aggregation network (PANet) is replaced with a bidirectional feature pyramid network (Bi-FPN) to facilitate more efficient multi-scale feature fusion, significantly boosting sensitivity to small defects. Extensive experiments on a dedicated galvanized sheet defect dataset show that S-C-B-YOLO achieves a mean average precision (mAP@0.5) of 92.6% and an inference speed of 62 FPS, outperforming several baseline models including YOLOv3, YOLOv7, and Faster R-CNN. The proposed model demonstrates a favorable balance between accuracy and speed, offering a robust and practical solution for automated, real-time defect inspection in galvanized steel production. Full article
(This article belongs to the Special Issue Advance in Neural Networks and Visual Learning)
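Since Squeeze-and-Excitation (SE) channel recalibration is the common thread in these search results, a dependency-free sketch of the basic operation may help. It follows the usual squeeze (global average pooling), excitation (two fully connected layers with ReLU and sigmoid), and scale steps. The weights below are arbitrary, biases are omitted for brevity, and nothing here reproduces the S-C-B-YOLO code.

```python
import math


def se_block(feature_map, w_reduce, w_expand):
    """Apply SE recalibration to a feature map.

    feature_map: list of C channels, each an H x W nested list.
    w_reduce: (C // r) x C weight matrix of the first FC layer (hypothetical).
    w_expand: C x (C // r) weight matrix of the second FC layer (hypothetical).
    Returns the rescaled feature map and the per-channel scales.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields a scale in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    scales = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
              for row in w_expand]
    # Scale: multiply every value in a channel by that channel's scale.
    recalibrated = [[[scales[c] * v for v in row] for row in ch]
                    for c, ch in enumerate(feature_map)]
    return recalibrated, scales


# Two 2x2 channels, reduced to a single hidden unit and expanded back.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.0, 1.0], [1.0, 0.0]]]
recalibrated, channel_scales = se_block(
    features, w_reduce=[[0.5, 0.5]], w_expand=[[1.0], [-1.0]])
```

With these toy weights the first channel receives a scale above 0.5 and the second a scale below 0.5, showing how the block can emphasise some channels while suppressing others.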

26 pages, 8192 KB  
Article
Enhancing Deep Learning Models with Attention Mechanisms for Interpretable Detection of Date Palm Diseases and Pests
by Amine El Hanafy, Abdelaaziz Hessane and Yousef Farhaoui
Technologies 2025, 13(12), 596; https://doi.org/10.3390/technologies13120596 - 18 Dec 2025
Viewed by 650
Abstract
Deep learning has become a powerful tool for diagnosing pests and plant diseases, although conventional convolutional neural networks (CNNs) generally suffer from limited interpretability and suboptimal focus on important image features. This study examines the integration of attention mechanisms into two prevalent CNN architectures—ResNet50 and MobileNetV2—to improve the interpretability and classification of diseases impacting date palm trees. Four attention modules—Squeeze-and-Excitation (SE), Efficient Channel Attention (ECA), Soft Attention, and the Convolutional Block Attention Module (CBAM)—were systematically integrated into ResNet50 and MobileNetV2 and assessed on the Palm Leaves dataset. Using transfer learning, the models were trained and evaluated through accuracy, F1-score, Grad-CAM visualizations, and quantitative metrics such as entropy and Attention Focus Scores. Analysis was also performed on the model’s complexity, including parameters and FLOPs. To confirm generalization, we tested the improved models on field data that was not part of the dataset used for learning. The experimental results demonstrated that the integration of attention mechanisms substantially improved both predictive accuracy and interpretability across all evaluated architectures. For MobileNetV2, the best performance and the most compact attention maps were obtained with SE and ECA (reaching 91%), while Soft Attention improved accuracy but produced broader, less concentrated activation patterns. For ResNet50, SE achieved the most focused and symptom-specific heatmaps, whereas CBAM reached the highest classification accuracy (up to 90.4%) but generated more spatially diffuse Grad-CAM activations. Overall, these findings demonstrate that attention-enhanced CNNs can provide accurate, interpretable, and robust detection of palm tree diseases and pests under real-world agricultural conditions. Full article

16 pages, 3051 KB  
Article
Automated Classification of Enamel Caries from Intraoral Images Using Deep Learning Models: A Diagnostic Study
by Faris Yahya I. Asiri
J. Clin. Med. 2025, 14(24), 8959; https://doi.org/10.3390/jcm14248959 - 18 Dec 2025
Viewed by 1456
Abstract
Background: Dental caries is a prevalent global oral health issue. The early detection of enamel caries, the initial stage of decay, is critical to preventive dentistry but is often limited by the subjectivity and variability of conventional diagnostic methods. Objective: This study aims to develop and evaluate two explainable deep learning models for the automated classification of enamel caries from intraoral images. Dataset and Methodology: A publicly available dataset of 2000 intraoral images showing early-stage enamel caries, advanced enamel caries, or no caries was used. The dataset was split into training, validation, and test sets in a 70:15:15 ratio, and data preprocessing and augmentation were applied to the training set to balance the dataset and prevent model overfitting. Two models were developed: ExplainableDentalNet, a custom lightweight CNN, and Interpretable ResNet50-SE, a fine-tuned ResNet50 model with Squeeze-and-Excitation blocks. Both were integrated with Gradient-Weighted Class Activation Mapping (Grad-CAM) for visual interpretability. Results: As evaluated on the test set, ExplainableDentalNet achieved an overall accuracy of 96.66% and a Matthews Correlation Coefficient (MCC) of 0.95, while Interpretable ResNet50-SE achieved 98.30% accuracy (MCC = 0.975). McNemar’s test indicated no significant prediction bias, with p > 0.05, and internal bootstrap and cross-validation analyses indicated stable performance. Conclusions: The proposed explainable models demonstrated high diagnostic accuracy in enamel caries classification on the studied dataset. While the present findings are promising, future clinical applications will require external validation on multi-center datasets. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) in Dental Clinical Practice)
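For readers unfamiliar with the Matthews Correlation Coefficient reported in the abstract above, the following is a minimal sketch of the standard multiclass MCC computed from a confusion matrix. The function name `multiclass_mcc` and the example matrix are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def multiclass_mcc(confusion):
    """Multiclass MCC from a confusion matrix (rows: true, columns: predicted)."""
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()                 # s: number of samples
    correct = np.trace(cm)           # c: correctly classified samples
    true_counts = cm.sum(axis=1)     # t_k: samples that truly belong to class k
    pred_counts = cm.sum(axis=0)     # p_k: samples predicted as class k
    numerator = correct * total - np.dot(pred_counts, true_counts)
    denominator = np.sqrt(
        (total**2 - np.dot(pred_counts, pred_counts))
        * (total**2 - np.dot(true_counts, true_counts))
    )
    # By convention MCC is 0 when the denominator vanishes
    # (for example, when every sample is predicted as a single class).
    return 0.0 if denominator == 0 else float(numerator / denominator)

# Hypothetical 3-class confusion matrix, for illustration only.
example_cm = [[95, 4, 1],
              [3, 96, 1],
              [0, 2, 98]]
example_score = multiclass_mcc(example_cm)
```

Unlike accuracy, MCC uses every cell of the confusion matrix, so it stays informative when classes are imbalanced.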

28 pages, 33315 KB  
Article
Hyperspectral Image Classification with Multi-Path 3D-CNN and Coordinated Hierarchical Attention
by Wenyi Hu, Wei Shi, Chunjie Lan, Yuxia Li and Lei He
Remote Sens. 2025, 17(24), 4035; https://doi.org/10.3390/rs17244035 - 15 Dec 2025
Cited by 2 | Viewed by 1203
Abstract
Convolutional Neural Networks (CNNs) have been extensively applied to extract deep features in hyperspectral imagery tasks. However, traditional 3D-CNNs are limited by their fixed-size receptive fields and inherent locality. This restricts their ability to capture multi-scale objects and model long-range dependencies, ultimately hindering the representation of large-area land-cover structures. To overcome these drawbacks, we present a new framework that integrates multi-scale feature fusion and a hierarchical attention mechanism for hyperspectral image classification. Channel-wise Squeeze-and-Excitation (SE) attention and Convolutional Block Attention Module (CBAM) spatial attention are combined to enhance feature representation across both spectral bands and spatial locations, allowing the network to emphasize critical wavelengths and salient spatial structures. Finally, a local-global feature fusion module is developed by integrating the self-attention inherent in the Transformer architecture with a Cross-Attention Fusion (CAF) mechanism. This module captures long-range dependencies in hyperspectral remote sensing images and fuses local and global features. On the Salinas Valley dataset, the proposed method delivers an Overall Accuracy (OA) of 0.9929 and an Average Accuracy (AA) of 0.9949, attaining perfect recognition accuracy for certain classes. The proposed model demonstrates commendable class balance and classification stability. Across multiple publicly available hyperspectral remote sensing image datasets, it consistently produces classification outcomes that significantly outperform those of established benchmark methods, exhibiting distinct advantages in feature representation, structural modeling, and the discrimination of complex ground objects.
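The channel-wise Squeeze-and-Excitation attention mentioned in the abstract above can be illustrated with a minimal NumPy sketch of a generic SE block applied to a single feature map. This is not the authors' implementation: the function name `se_block`, the reduction ratio, the tensor shapes, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Generic Squeeze-and-Excitation gating for a feature map x of shape (C, H, W)."""
    z = x.mean(axis=(1, 2))                    # squeeze: global average pool, shape (C,)
    h = np.maximum(0.0, w1 @ z + b1)           # excitation, step 1: bottleneck + ReLU, shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # excitation, step 2: sigmoid gates in (0, 1), shape (C,)
    return x * s[:, None, None], s             # rescale each channel by its gate

# Illustrative shapes and random weights (assumptions, not values from the paper).
rng = np.random.default_rng(0)
channels, height, width, reduction = 8, 5, 5, 4
x = rng.normal(size=(channels, height, width))
w1 = rng.normal(size=(channels // reduction, channels))
b1 = np.zeros(channels // reduction)
w2 = rng.normal(size=(channels, channels // reduction))
b2 = np.zeros(channels)
y, gates = se_block(x, w1, b1, w2, b2)
```

Each channel receives a single scalar gate, so the block reweights channels (for example, spectral feature channels) without changing the spatial layout of the feature map.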

18 pages, 3112 KB  
Article
Denatured Recognition of Biological Tissue Using Ultrasonic Phase Space Reconstruction and CBAM-EfficientNet-B0 During HIFU Therapy
by Bei Liu, Haitao Zhu and Xian Zhang
Fractal Fract. 2025, 9(12), 819; https://doi.org/10.3390/fractalfract9120819 - 15 Dec 2025
Viewed by 426
Abstract
This study proposes an automatic method for recognizing denatured biological tissue during high-intensity focused ultrasound (HIFU) therapy. The technique integrates ultrasonic phase space reconstruction (PSR) with an EfficientNet-B0 model enhanced by a Convolutional Block Attention Module (CBAM-EfficientNet-B0). Ultrasonic echo signals are first transformed by PSR into high-dimensional phase space trajectory diagrams, which reveal distinct fractal and chaotic characteristics useful for analyzing tissue complexity. The CBAM module is incorporated into EfficientNet-B0 to enhance feature extraction from these nonlinear dynamic representations by focusing on critical channels and spatial regions. The network is further optimized with Dropout and Scaled Exponential Linear Units (SELUs) to prevent overfitting, alongside a cosine annealing learning rate scheduler. Experimental results demonstrate the superior performance of the proposed CBAM-EfficientNet-B0 model, which achieves a recognition accuracy of 99.57% and outperforms five benchmark CNN models (EfficientNet-B0, ResNet101, DenseNet201, ResNet18, and VGG16). The method avoids the subjectivity and uncertainty inherent in traditional manual feature extraction, enabling effective identification of HIFU-induced tissue denaturation. This work confirms the significant potential of combining nonlinear dynamics, fractal analysis, and deep learning for accurate, real-time monitoring in HIFU therapy.
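Phase space reconstruction is commonly performed by time-delay embedding, and the sketch below shows that generic technique on a synthetic signal. The abstract does not state the embedding dimension, the delay, or how the trajectory diagrams are rendered, so the function name `delay_embed`, the parameters `dim` and `tau`, and the test signal are all assumptions for illustration.

```python
import numpy as np

def delay_embed(signal, dim, tau):
    """Time-delay embedding: row i is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]]."""
    x = np.asarray(signal, dtype=float)
    n_points = x.size - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal is too short for the requested dim and tau")
    # Stack delayed copies of the signal as columns, giving shape (n_points, dim).
    return np.column_stack([x[k * tau : k * tau + n_points] for k in range(dim)])

# Synthetic stand-in for an echo signal (an assumption, not real ultrasound data).
t = np.linspace(0.0, 1.0, 500)
echo = np.sin(2 * np.pi * 15 * t) * np.exp(-3.0 * t)
trajectory = delay_embed(echo, dim=3, tau=5)   # points in a 3-D reconstructed phase space
```

Plotting the columns of `trajectory` against one another yields the kind of trajectory image that could then be passed to an image classifier.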