Search Results (1,925)

Search Parameters:
Keywords = VGG-11

25 pages, 2563 KB  
Article
LungVisionNet: A Hybrid Deep Learning Model for Chest X-Ray Classification—A Case Study at King Hussein Cancer Center (KHCC)
by Iyad Sultan, Hasan Gharaibeh, Azza Gharaibeh, Belal Lahham, Mais Al-Tarawneh, Rula Al-Qawabah and Ahmad Nasayreh
Technologies 2025, 13(11), 517; https://doi.org/10.3390/technologies13110517 - 12 Nov 2025
Abstract
Early diagnosis and rapid treatment of lung diseases such as pneumonia, tuberculosis, cancer, and other pulmonary conditions depend on accurate and fast classification of chest X-ray images. Current manual diagnosis is subjective, labour-intensive, and error-prone, leading to delayed diagnosis and insufficient treatment. To tackle this pressing healthcare issue, this work investigates several deep convolutional neural network (CNN) architectures, including VGG16, VGG19, ResNet50, InceptionV3, Xception, DenseNet121, NASNetMobile, and NASNetLarge, and proposes LungVisionNet (LVNet), a hybrid model that combines a MobileNetV2 backbone with multilayer perceptron (MLP) layers. In a thorough evaluation on two publicly available datasets covering various chest abnormalities and normal cases, LungVisionNet outperformed the other models, reaching 96.91% accuracy, 97.59% recall, and a 97.01% F1-score, along with higher precision, specificity, and area under the curve (AUC). Evaluation on an independent, real-world clinical dataset from King Hussein Cancer Center (KHCC), on which the model achieved 95.3% accuracy, 95.3% precision, 78.8% recall, 99.1% specificity, and an 86.4% F1-score, confirmed its robustness, generalizability, and clinical usefulness. We also created a simple mobile application that lets clinicians quickly classify and evaluate chest X-ray images in hospitals, supporting clinical integration, fast decision-making, and better patient outcomes. Full article
(This article belongs to the Section Assistive Technologies)
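For readers wanting a concrete starting point, the backbone-plus-MLP design described in the abstract can be sketched in a few lines of PyTorch; the hidden sizes, dropout, and class count below are illustrative assumptions, not the published LVNet configuration.

```python
# Minimal sketch of a MobileNetV2 backbone with an MLP classification head
# (illustrative only; layer sizes are assumptions, not the published LVNet).
import torch
import torch.nn as nn
from torchvision import models

class HybridCNNMLP(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = models.mobilenet_v2(weights=None)   # pretrained weights could be loaded instead
        self.features = backbone.features               # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)              # global average pooling
        self.mlp = nn.Sequential(                        # MLP head replacing the default classifier
            nn.Linear(1280, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.mlp(x)

model = HybridCNNMLP(num_classes=2)
logits = model(torch.randn(1, 3, 224, 224))              # one dummy chest X-ray sized input
```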

28 pages, 15283 KB  
Article
A Study on the Interpretability of Diabetic Retinopathy Diagnostic Models
by Zerui Zhang, Hongbo Zhao, Li Dong, Lin Luo and Hao Wang
Bioengineering 2025, 12(11), 1231; https://doi.org/10.3390/bioengineering12111231 - 10 Nov 2025
Abstract
This study focuses on the interpretability of diabetic retinopathy classification models. Seven widely used interpretability methods—Gradient, SmoothGrad, Integrated Gradients, SHAP, DeepLIFT, Grad-CAM++, and ScoreCAM—are applied to assess the interpretability of four representative deep learning architectures, VGG, ResNet, DenseNet, and EfficientNet, on fundus images. Through saliency map visualization, perturbation curve analysis, and trend correlation analysis, combined with four quantitative metrics—saliency map entropy, AOPC score, Recall, and Dice coefficient—the interpretability performance of the models is comprehensively assessed from both qualitative and quantitative perspectives. The results show that model architecture greatly influences interpretability quality: models with simpler structures and clearer feature extraction paths (such as VGG) perform better in terms of interpretability, while deeper or lightweight architectures exhibit certain limitations. Full article
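Of the seven attribution techniques listed, the plain Gradient saliency map is the simplest to reproduce; the sketch below uses a placeholder VGG16 and a random input rather than the paper's fundus images.

```python
# Minimal sketch of the plain "Gradient" saliency method, the simplest of the
# seven attribution techniques compared (model and input are placeholders).
import torch
from torchvision import models

model = models.vgg16(weights=None).eval()            # any differentiable classifier works here
image = torch.randn(1, 3, 224, 224, requires_grad=True)

scores = model(image)
target = scores.argmax(dim=1).item()                 # class whose evidence we explain
scores[0, target].backward()                         # d(class score) / d(pixel)

saliency = image.grad.abs().max(dim=1).values        # max over RGB channels -> per-pixel heat map
```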

31 pages, 15908 KB  
Review
Fusion of Robotics, AI, and Thermal Imaging Technologies for Intelligent Precision Agriculture Systems
by Omar Shalash, Ahmed Emad, Fares Fathy, Abdallah Alzogby, Mohamed Sallam, Eslam Naser, Mohamed El-Sayed and Esraa Khatab
Sensors 2025, 25(22), 6844; https://doi.org/10.3390/s25226844 - 8 Nov 2025
Abstract
The world population is expected to grow to over 10 billion by 2050, placing further stress on food production. Precision agriculture has become the main approach to enhancing productivity and sustainability in agricultural production. This paper conducts a technical review of how robotics, artificial intelligence (AI), and thermal imaging (TI) technologies transform precision agriculture operations, focusing on sensing, automation, and farm decision making. Agricultural robots address labour shortages and improve efficiency by using their sensing devices and kinematics in planting, spraying, and harvesting. Through accurate assessment of pests and diseases and quality assurance of harvested crops, AI and TI bring efficiency to crop monitoring. Different deep learning models are employed for plant disease diagnosis and resource management, namely VGG16, InceptionV3, and MobileNet, using the PlantVillage, PlantDoc, and FieldPlant datasets, respectively. To reduce crop losses, AI–TI integration enables early recognition of fluctuations caused by pests or diseases, allowing timely control and mitigation. While issues of cost and environmental variability (illumination, canopy moisture, and microclimate instability) remain, advances in artificial intelligence, robotics, and their combination will offer sustainable solutions to the existing gaps. Full article

19 pages, 5595 KB  
Article
Improving Oriental Melon Leaf Disease Classification via DCGAN-Based Image Augmentation
by Myeongyong Kang, Niraj Tamrakar and Hyeon Tae Kim
Agriculture 2025, 15(22), 2324; https://doi.org/10.3390/agriculture15222324 - 8 Nov 2025
Abstract
Deep learning-based plant disease classification models often suffer from performance degradation when training data are limited, so generative models offer a promising way to improve model performance in plant disease classification. In this work, images of powdery mildew, downy mildew, and healthy plant leaves were generated using traditional augmentation methods as well as both a DCGAN and a modified DCGAN featuring residual connection blocks with varied activation functions. The Inception Score (IS) and Fréchet Inception Distance (FID) revealed that the modified DCGAN consistently produced images with strong class-distinctive features and greater overall diversity than basic GAN methods, with an IS increase of 7.9% to 11.54% and an FID decrease of 6.6% to 7.8%. After selecting the best augmentation method, we added the generated images to the training sets of the classification models AlexNet, VGG16, and GoogLeNet to measure improvements in disease recognition. All classifiers benefited from the augmented datasets, with the modified DCGAN-based augmentation yielding the highest precision, recall, and accuracy. GoogLeNet outperformed all classification models, with an overall precision, recall, and F1-score of 98%. Notably, this generative approach minimized errors between visually similar categories, such as powdery mildew and healthy samples, by capturing subtle morphological differences. The results confirm that class-aware generative augmentation can both expand the number of training images and preserve the critical features necessary for discrimination, significantly boosting model effectiveness. These advances show the practical potential of generative models not only to enrich datasets but also to improve the accuracy and robustness of plant disease detection in real-world agricultural scenarios. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
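The standard DCGAN generator that the augmentation pipeline builds on can be sketched as a stack of transposed convolutions; the paper's residual-connection modification and exact channel counts are not reproduced here, so this is an assumption-laden illustration only.

```python
# Minimal DCGAN-style generator sketch for leaf-image augmentation
# (plain DCGAN blocks; the modified residual variant is not reproduced).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim: int = 100, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),  # 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),    # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),      # 16x16
            nn.ConvTranspose2d(64, channels, 4, 2, 1), nn.Tanh(),                         # 32x32
        )

    def forward(self, z):
        return self.net(z)

fake_leaves = Generator()(torch.randn(16, 100, 1, 1))   # 16 synthetic 32x32 images
```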

25 pages, 2898 KB  
Article
Framework and Layer-Wise Word-Line Activation Method Design for CIM
by Wei-Kai Cheng, Shin-Yi Pai and Shih-Hsu Huang
Electronics 2025, 14(22), 4367; https://doi.org/10.3390/electronics14224367 - 7 Nov 2025
Abstract
Convolutional Neural Networks (CNNs) have excellent performance in various fields, such as machine learning, computer vision, and image recognition. With the development of CNNs, the huge quantities of data involved in computing and transmission have placed significant pressure on circuit and architecture design, and RRAM-based computing-in-memory (CIM) is one of the promising solutions to alleviate this problem. However, because of the current deviation phenomenon and the resistance on/off ratio (R ratio) issue in RRAM, there is a trade-off between computational accuracy and computational efficiency in CIM. In this paper, we propose a layer-wise activated word-line (AWL) strategy to configure the appropriate number of AWLs for each layer. Based on the observed risk factors, we design a risk-index-to-AWL mapping methodology. Meanwhile, based on the proposed quantization and current deviation error calculation methods, we design a CIM simulation framework to simulate the accuracy of CNNs in the inference stage. We evaluate our methodology on the CIFAR-10 dataset with VGG-8 and ResNet-18. The proposed methodology improves computational efficiency with only slight accuracy loss. In comparison with a fixed-AWL configuration, our methodology achieves better accuracy with a small resistance on/off ratio. For higher resistance on/off ratios, our methodology achieves a significant improvement in computational efficiency over the baseline. In the exploration of different R ratios, experimental results show that our layer-wise AWL configuration methodology offers a more flexible planning space and better computational efficiency. Full article
(This article belongs to the Section Circuit and Signal Processing)
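As a rough illustration of the quantization step such a CIM simulation framework performs before mapping weights to RRAM conductance levels, a uniform symmetric quantizer is sketched below; the bit-width and rounding scheme are assumptions, not the paper's method.

```python
# Sketch of uniform symmetric weight quantization of the kind a CIM simulator
# applies before mapping weights to RRAM levels (bit-width is an assumption).
import torch

def quantize_weights(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    levels = 2 ** n_bits - 1
    scale = w.abs().max() / (levels / 2)          # symmetric range [-max, +max]
    q = torch.round(w / scale).clamp(-(levels // 2), levels // 2)
    return q * scale                              # dequantized values used in inference simulation

w = torch.randn(64, 3, 3, 3)                      # a conv layer's weights
w_q = quantize_weights(w, n_bits=4)
print((w - w_q).abs().mean())                     # average quantization error
```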

17 pages, 4836 KB  
Article
A Deep Learning-Based Approach for Explainable Microsatellite Instability Detection in Gastrointestinal Malignancies
by Ludovica Ciardiello, Patrizia Agnello, Marta Petyx, Fabio Martinelli, Mario Cesarelli, Antonella Santone and Francesco Mercaldo
J. Imaging 2025, 11(11), 398; https://doi.org/10.3390/jimaging11110398 - 7 Nov 2025
Abstract
Microsatellite instability represents a key biomarker in gastrointestinal cancers with significant diagnostic and therapeutic implications. Traditional molecular assays for microsatellite instability detection, while effective, are costly, time-consuming, and require specialized infrastructure. In this paper we propose an explainable deep learning-based method for microsatellite instability detection based on the analysis of histopathological images. We consider a set of convolutional neural network architectures (MobileNet, Inception, VGG16, and VGG19) and a Vision Transformer model, and we provide clinical explainability for the model predictions through three Class Activation Mapping techniques. To further strengthen trustworthiness in the predictions, we introduce a set of robustness metrics that quantify the consistency of the highlighted discriminative regions across the different Class Activation Mapping methods. Experimental results on a real-world dataset demonstrate that the VGG16 and VGG19 models achieve the best accuracy: VGG16 obtains an accuracy of 0.926, while VGG19 reaches 0.917. Furthermore, the Class Activation Mapping techniques confirmed that the developed models consistently focus on similar tissue regions, while the robustness analysis highlighted high agreement between the different Class Activation Mapping techniques. These results indicate that the proposed method not only achieves strong predictive accuracy but also provides explainable predictions, with the aim of boosting the integration of deep learning into real-world clinical practice. Full article
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis—2nd Edition)
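One simple way to quantify agreement between two Class Activation Maps, in the spirit of the robustness metrics described here, is to binarize each map and measure their overlap; the quantile threshold and the IoU choice below are assumptions, not the paper's exact metrics.

```python
# Sketch: agreement between two CAM heat maps as IoU of their top-activated regions.
import numpy as np

def cam_agreement(cam_a: np.ndarray, cam_b: np.ndarray, q: float = 0.8) -> float:
    mask_a = cam_a >= np.quantile(cam_a, q)       # keep the top 20% most activated pixels
    mask_b = cam_b >= np.quantile(cam_b, q)
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

cam1 = np.random.rand(224, 224)                   # e.g., a Grad-CAM++ heat map
cam2 = np.random.rand(224, 224)                   # e.g., a ScoreCAM heat map
print(cam_agreement(cam1, cam2))
```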
20 pages, 5440 KB  
Article
RepSAU-Net: Semantic Segmentation of Barcodes in Complex Backgrounds via Fused Self-Attention and Reparameterization Methods
by Yanfei Sun, Junyu Wang and Rui Yin
J. Imaging 2025, 11(11), 394; https://doi.org/10.3390/jimaging11110394 - 6 Nov 2025
Abstract
In the digital era, commodity barcodes serve as a bridge between the physical and digital worlds and are widely used in retail checkout systems. To meet broader application demands for product identification, this paper proposes a method for locating and semantically segmenting barcodes in complex backgrounds, decoding their hidden information, and recovering these barcodes in wide field-of-view images. The method integrates self-attention mechanisms and reparameterization techniques to construct a RepSAU-Net model. Specifically, this paper first introduces a barcode image dataset synthesis strategy suited to deep learning models, constructing the SBS (Screen Stego Barcodes) dataset, which comprises 2000 wide field-of-view background images (Type A) and 400 information-hidden barcode images (Type B), totaling 30,000 images. On this basis, a network architecture (RepSAU-Net) combining a self-attention mechanism and RepVGG reparameterization was designed, with a parameter count of 32.88 M. Experimental results demonstrate that the network performs well in barcode segmentation tasks, achieving an inference speed of 4.88 frames/s, a Mean Intersection over Union (MIoU) of 98.36%, and an Accuracy (Acc) of 94.96%. This research effectively enhances global information capture and feature extraction capabilities without significantly increasing computational load, providing technical support for the application of data-embedded barcodes. Full article
(This article belongs to the Section Image and Video Processing)
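The RepVGG reparameterization that RepSAU-Net relies on fuses parallel 3x3, 1x1, and identity branches into a single 3x3 convolution at inference time; the sketch below omits BatchNorm fusion for brevity, so it is a simplified illustration rather than the full procedure used in the paper.

```python
# Sketch of RepVGG-style structural reparameterization (BatchNorm fusion omitted).
import torch
import torch.nn.functional as F

c = 8
w3 = torch.randn(c, c, 3, 3)                      # 3x3 branch weights
w1 = torch.randn(c, c, 1, 1)                      # 1x1 branch weights
x = torch.randn(1, c, 32, 32)

# Training-time output: sum of the three parallel branches.
y_branches = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1) + x

# Fused kernel: pad the 1x1 kernel to 3x3 and add an identity kernel.
w1_as_3 = F.pad(w1, [1, 1, 1, 1])                 # place the 1x1 weight at the kernel centre
w_id = torch.zeros(c, c, 3, 3)
w_id[torch.arange(c), torch.arange(c), 1, 1] = 1.0
w_fused = w3 + w1_as_3 + w_id

y_fused = F.conv2d(x, w_fused, padding=1)
print(torch.allclose(y_branches, y_fused, atol=1e-5))   # True: same function, one conv
```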

21 pages, 8098 KB  
Article
Multi-Sensor AI-Based Urban Tree Crown Segmentation from High-Resolution Satellite Imagery for Smart Environmental Monitoring
by Amirmohammad Sharifi, Reza Shah-Hosseini, Danesh Shokri and Saeid Homayouni
Smart Cities 2025, 8(6), 187; https://doi.org/10.3390/smartcities8060187 - 6 Nov 2025
Abstract
Urban tree detection is fundamental to effective forestry management, biodiversity preservation, and environmental monitoring—key components of sustainable smart city development. This study introduces a deep learning framework for urban tree crown segmentation that exclusively leverages high-resolution satellite imagery from GeoEye-1, WorldView-2, and WorldView-3, thereby eliminating the need for additional data sources such as LiDAR or UAV imagery. The proposed framework employs a Residual U-Net architecture augmented with Attention Gates (AGs) to address major challenges, including class imbalance, overlapping crowns, and spectral interference from complex urban structures, using a custom composite loss function. The main contribution of this work is to integrate data from three distinct satellite sensors with varying spatial and spectral characteristics into a single processing pipeline, demonstrating that such well-established architectures can yield reliable, high-accuracy results across heterogeneous resolutions and imaging conditions. A further advancement of this study is the development of a hybrid ground-truth generation strategy that integrates NDVI-based watershed segmentation, manual annotation, and the Segment Anything Model (SAM), thereby reducing annotation effort while enhancing mask fidelity. In addition, by training on 4-band RGBN imagery from multiple satellite sensors, the model exhibits generalization capabilities across diverse urban environments. Despite being trained on a relatively small dataset comprising only 1200 image patches, the framework achieves state-of-the-art performance (F1-score: 0.9121; IoU: 0.8384; precision: 0.9321; recall: 0.8930). These results stem from the integration of the Residual U-Net with Attention Gates, which enhance feature representation and suppress noise from urban backgrounds, as well as from hybrid ground-truth generation and the combined BCE–Dice loss function, which effectively mitigates class imbalance. Collectively, these design choices enable robust model generalization and clear performance superiority over baseline networks such as DeepLab v3 and U-Net with VGG19. Fully automated and computationally efficient, the proposed approach delivers cost-effective, accurate segmentation using satellite data alone, rendering it particularly suitable for scalable, operational smart city applications and environmental monitoring initiatives. Full article
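The combined BCE–Dice loss mentioned above can be written compactly; the equal weighting and smoothing constant in this sketch are assumptions rather than the paper's exact composite loss.

```python
# Sketch of a combined BCE-Dice segmentation loss used to counter class
# imbalance in sparse tree-crown masks (weighting and smoothing are assumptions).
import torch
import torch.nn.functional as F

def bce_dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    dice = 1 - (2 * inter + eps) / (probs.sum() + target.sum() + eps)
    return 0.5 * bce + 0.5 * dice

logits = torch.randn(2, 1, 256, 256)                   # raw network output for two patches
target = (torch.rand(2, 1, 256, 256) > 0.9).float()    # sparse crown mask (imbalanced)
print(bce_dice_loss(logits, target))
```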

20 pages, 95851 KB  
Article
Swin Transformer Based Recognition for Hydraulic Fracturing Microseismic Signals from Coal Seam Roof with Ultra Large Mining Height
by Peng Wang, Yanjun Feng, Xiaodong Sun and Xing Cheng
Sensors 2025, 25(21), 6750; https://doi.org/10.3390/s25216750 - 4 Nov 2025
Abstract
Accurate differentiation between microseismic signals induced by hydraulic fracturing and those from roof fracturing is vital for optimizing fracturing efficiency, assessing roof stability, and mitigating mining-induced hazards in coal mining operations. We propose an automatic identification method for microseismic signals generated by hydraulic fracturing in coal seam roofs. The method first transforms the microseismic signals induced by hydraulic fracturing and roof fracturing into time-frequency feature images using the Frequency Slice Wavelet Transform (FSWT), and then employs a shifted-window (Swin) Transformer network to automatically identify and classify these two types of time-frequency feature maps. A comparative analysis is conducted against three methods—the signal energy distribution method, a Residual Network (ResNet) model, and a VGG network (VGGNet) model—for identifying microseismic signals from hydraulic fracturing in coal seam roofs. The results demonstrate that the Swin Transformer recognition model combined with FSWT achieves an accuracy of 92.49% and an F1-score of 92.96% on the test set of field-acquired microseismic signals from hydraulic fracturing and roof fracturing. These metrics are significantly superior to those of the signal energy distribution method (accuracy: 64.70%, F1-score: 64.70%), the ResNet model (accuracy: 88.04%, F1-score: 89.24%), and the VGGNet model (accuracy: 88.47%, F1-score: 89.52%). This advancement provides a reliable technical approach for monitoring hydraulic fracturing effects and ensuring roof safety in coal mines. Full article
(This article belongs to the Section Environmental Sensing)

24 pages, 5518 KB  
Article
PropNet-R: A Custom CNN Architecture for Quantitative Estimation of Propane Gas Concentration Based on Thermal Images for Sustainable Safety Monitoring
by Luis Alberto Holgado-Apaza, Jaime Cesar Prieto-Luna, Edgar E. Carpio-Vargas, Nelly Jacqueline Ulloa-Gallardo, Yban Vilchez-Navarro, José Miguel Barrón-Adame, José Alfredo Aguirre-Puente, Dalmiro Ramos Enciso, Danger David Castellon-Apaza and Danny Jesus Saman-Pacamia
Sustainability 2025, 17(21), 9801; https://doi.org/10.3390/su17219801 - 3 Nov 2025
Abstract
Liquefied petroleum gas (LPG), composed mainly of propane and butane, is widely used as an energy source in residential, commercial, and industrial sectors; however, its high flammability poses a critical risk in the event of accidental leaks. In Peru, where LPG constitutes the main domestic energy source, leakage emergencies affect thousands of households each year. This pattern is replicated in developing countries with limited energy infrastructure. Early quantitative detection of propane, the predominant component of Peruvian LPG (~60%), is essential to prevent explosions, poisoning, and greenhouse gas emissions that hinder climate change mitigation strategies. This study presents PropNet-R, a convolutional neural network (CNN) designed to estimate propane concentrations (ppm) from thermal images. A dataset of 3574 thermal images synchronized with concentration measurements was collected under controlled conditions. PropNet-R, composed of four progressive convolutional blocks, was compared with SqueezeNet, VGG19, and ResNet50, all fine-tuned for regression tasks. On the test set, PropNet-R achieved MSE = 0.240, R2 = 0.614, MAE = 0.333, and Pearson’s r = 0.786, outperforming SqueezeNet (MSE = 0.374, R2 = 0.397), VGG19 (MSE = 0.447, R2 = 0.280), and ResNet50 (MSE = 0.474, R2 = 0.236). These findings provide empirical evidence that task-specific CNN architectures outperform generic transfer learning models in thermal image-based regression. By enabling continuous and quantitative monitoring of gas leaks, PropNet-R enhances safety in industrial and urban environments, complementing conventional chemical sensors. The proposed model contributes to the development of sustainable infrastructure by reducing gas-related risks, promoting energy security, and strengthening resilient, safe, and environmentally responsible urban systems. Full article
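The reported regression metrics (MSE, MAE, R2, and Pearson's r) can be computed directly from predictions and ground truth; the sketch below uses placeholder values rather than PropNet-R outputs.

```python
# Sketch of the regression metrics reported for PropNet-R, computed with NumPy
# on placeholder predictions (values are illustrative only).
import numpy as np

y_true = np.array([1.2, 0.8, 2.5, 3.1, 1.9])      # normalised propane concentration, placeholder
y_pred = np.array([1.0, 0.9, 2.2, 3.4, 1.7])

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
r = np.corrcoef(y_true, y_pred)[0, 1]             # Pearson correlation coefficient
print(f"MSE={mse:.3f} MAE={mae:.3f} R2={r2:.3f} r={r:.3f}")
```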

56 pages, 17528 KB  
Review
A Practical Tutorial on Spiking Neural Networks: Comprehensive Review, Models, Experiments, Software Tools, and Implementation Guidelines
by Bahgat Ayasi, Cristóbal J. Carmona, Mohammed Saleh and Angel M. García-Vico
Eng 2025, 6(11), 304; https://doi.org/10.3390/eng6110304 - 2 Nov 2025
Abstract
Spiking neural networks (SNNs) provide a biologically inspired, event-driven alternative to artificial neural networks (ANNs), potentially delivering competitive accuracy at substantially lower energy. This tutorial-study offers a unified, practice-oriented assessment that combines critical review and standardized experiments. We benchmark a shallow fully connected network (FCN) on MNIST and a deeper VGG7 architecture on CIFAR-10 across multiple neuron models (leaky integrate-and-fire (LIF), sigma–delta, etc.) and input encodings (direct, rate, temporal, etc.), using supervised surrogate-gradient training implemented in Intel Lava, SLAYER, SpikingJelly, Norse, and PyTorch. Empirically, we observe a consistent but tunable trade-off between accuracy and energy. On MNIST, sigma–delta neurons with rate or sigma–delta encodings achieve 98.1% accuracy (ANN baseline: 98.23%). On CIFAR-10, sigma–delta neurons with direct input reach 83.0% accuracy at just two time steps (ANN baseline: 83.6%). A GPU-based operation-count energy proxy indicates that many SNN configurations operate below the ANN energy baseline; some frugal codes minimize energy at the cost of accuracy, whereas accuracy-oriented settings (e.g., sigma–delta with direct or rate coding) narrow the performance gap while remaining energy-conscious—yielding up to threefold efficiency compared with matched ANNs in our setup. Thresholds and the number of time steps are decisive factors: intermediate thresholds and the minimal time window that still meets accuracy targets typically maximize efficiency per joule. We distill actionable design rules—choose the neuron–encoding pair according to the application goal (accuracy-critical vs. energy-constrained) and co-tune thresholds and time steps. Finally, we outline how event-driven neuromorphic hardware can amplify these savings through sparse, local, asynchronous computation, providing a practical playbook for embedded, real-time, and sustainable AI deployments. Full article
(This article belongs to the Section Electrical and Electronic Engineering)
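The leaky integrate-and-fire (LIF) neuron benchmarked throughout the tutorial reduces to a simple state update per time step; the decay factor, threshold, and reset rule below are illustrative defaults, not the paper's exact hyperparameters.

```python
# Sketch of a leaky integrate-and-fire (LIF) neuron update over discrete time steps.
import torch

def lif_step(v: torch.Tensor, x: torch.Tensor, beta: float = 0.9, v_th: float = 1.0):
    v = beta * v + x                               # leaky integration of the input current
    spikes = (v >= v_th).float()                   # emit a spike where the threshold is crossed
    v = v - spikes * v_th                          # soft reset after spiking
    return v, spikes

v = torch.zeros(10)                                # membrane potentials of 10 neurons
for t in range(5):                                 # unroll a few time steps
    v, s = lif_step(v, torch.rand(10))
    print(t, int(s.sum()), "spikes")
```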

26 pages, 5481 KB  
Article
MCP-X: An Ultra-Compact CNN for Rice Disease Classification in Resource-Constrained Environments
by Xiang Zhang, Lining Yan, Belal Abuhaija and Baha Ihnaini
AgriEngineering 2025, 7(11), 359; https://doi.org/10.3390/agriengineering7110359 - 1 Nov 2025
Abstract
Rice, a dietary staple for over half of the global population, is highly susceptible to bacterial and fungal diseases such as bacterial blight, brown spot, and leaf smut, which can severely reduce yields. Traditional manual detection is labor-intensive and often results in delayed intervention and excessive chemical use. Although deep learning models like convolutional neural networks (CNNs) achieve high accuracy, their computational demands hinder deployment in resource-limited agricultural settings. We propose MCP-X, an ultra-compact CNN with only 0.21 million parameters for real-time, on-device rice disease classification. MCP-X integrates a shallow encoder, multi-branch expert routing, a bi-level recurrent simulation encoder–decoder (BRSE), an efficient channel attention (ECA) module, and a lightweight classifier. Trained from scratch, MCP-X achieves 98.93% accuracy on PlantVillage and 96.59% on the Rice Disease Detection Dataset, without external pretraining. Mechanistically, expert routing diversifies feature branches, ECA enhances channel-wise signal relevance, and BRSE captures lesion-scale and texture cues—yielding complementary, stage-wise gains confirmed through ablation studies. Despite slightly higher FLOPs than MobileNetV2, MCP-X prioritizes a minimal memory footprint (~1.01 MB) and deployability over raw speed, running at 53.83 FPS (2.42 GFLOPs) on an RTX A5000. It achieves 16.7×, 287×, 420×, and 659× fewer parameters than MobileNetV2, ResNet152V2, ViT-Base, and VGG-16, respectively. When integrated into a multi-resolution ensemble, MCP-X attains 99.85% accuracy, demonstrating exceptional robustness across controlled and field datasets while maintaining efficiency for real-world agricultural applications. Full article
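The efficient channel attention (ECA) module used in MCP-X is compact enough to sketch directly: global average pooling followed by a 1D convolution across channels; the kernel size here is an assumption, not the paper's setting.

```python
# Sketch of an Efficient Channel Attention (ECA) block: pooled channel
# descriptors reweighted via a small 1D convolution (kernel size assumed).
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                          # x: (N, C, H, W)
        y = x.mean(dim=(2, 3))                     # global average pool -> (N, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # 1D conv over the channel dimension
        return x * torch.sigmoid(y)[:, :, None, None]   # channel-wise reweighting

out = ECA()(torch.randn(2, 64, 32, 32))
print(out.shape)                                   # torch.Size([2, 64, 32, 32])
```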

29 pages, 3642 KB  
Article
Securing IoT Vision Systems: An Unsupervised Framework for Adversarial Example Detection Integrating Spatial Prototypes and Multidimensional Statistics
by Naile Wang, Jian Li, Chunhui Zhang and Dejun Zhang
Sensors 2025, 25(21), 6658; https://doi.org/10.3390/s25216658 - 1 Nov 2025
Abstract
The deployment of deep learning models in Internet of Things (IoT) systems is increasingly threatened by adversarial attacks. To address the challenge of effectively detecting adversarial examples generated by Generative Adversarial Networks (AdvGANs), this paper proposes an unsupervised detection method that integrates spatial statistical features and multidimensional distribution characteristics. First, a collection of adversarial examples under four different attack intensities was constructed on the CIFAR-10 dataset. Then, based on the VGG16 and ResNet50 classification models, a dual-module collaborative architecture was designed: Module A extracted spatial statistics from convolutional layers and constructed category prototypes to calculate similarity, while Module B extracted multidimensional statistical features and characterized distribution anomalies using the Mahalanobis distance. Experimental results showed that the proposed method achieved a maximum AUROC of 0.9937 for detecting AdvGAN attacks on ResNet50 and 0.9753 on VGG16. Furthermore, it achieved AUROC scores exceeding 0.95 against traditional attacks such as FGSM and PGD, demonstrating its cross-attack generalization capability. Cross-dataset evaluation on Fashion-MNIST confirms its robust generalization across data domains. This study presents an effective solution for unsupervised adversarial example detection, without requiring adversarial samples for training, making it suitable for a wide range of attack scenarios. These findings highlight the potential of the proposed method for enhancing the robustness of IoT systems in security-critical applications. Full article
(This article belongs to the Special Issue IoT Network Security (Second Edition))
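Module B's Mahalanobis-distance scoring can be illustrated as fitting per-class feature means and a shared covariance on clean data, then scoring new inputs by their minimum class distance; the feature dimensionality and data below are placeholders, not the paper's configuration.

```python
# Sketch of Mahalanobis-distance scoring on penultimate-layer features:
# a large minimum class distance suggests an adversarial or anomalous input.
import numpy as np

def fit_gaussian(features: np.ndarray, labels: np.ndarray):
    means = np.stack([features[labels == c].mean(axis=0) for c in np.unique(labels)])
    centered = features - means[labels]
    cov = centered.T @ centered / len(features)    # shared (tied) covariance
    return means, np.linalg.pinv(cov)

def mahalanobis_score(x: np.ndarray, means: np.ndarray, prec: np.ndarray) -> float:
    d = x - means                                  # (num_classes, dim)
    return float(np.min(np.einsum("cd,de,ce->c", d, prec, d)))   # min squared distance

feats = np.random.randn(1000, 64)                  # clean-sample features (placeholder)
labels = np.random.randint(0, 10, 1000)
means, prec = fit_gaussian(feats, labels)
print(mahalanobis_score(np.random.randn(64), means, prec))
```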

9 pages, 1474 KB  
Proceeding Paper
Comparative Study of MRI Modality Embeddings for Glioma Survival Prediction
by Fatima-Ezzahraa Ben-Bouazza, Saadia Azeroual, Bassma Jioudi and Zakaria Hamane
Eng. Proc. 2025, 112(1), 57; https://doi.org/10.3390/engproc2025112057 - 30 Oct 2025
Abstract
Accurately predicting survival for patients diagnosed with diffuse glioma remains one of the most difficult issues in neuro-oncology. While most prior research has focused on multimodal fusion or clinical data, we introduce a modality-specific deep learning framework that uses preoperative MRI alone to predict mortality outcomes. Using the UCSF-PDGM dataset, which contains structural, diffusion, and perfusion imaging of 495 glioma patients, we trained VGG16 models on each MRI modality individually, including T1, T2, FLAIR, SWI, DWI, ASL, HARDI-derived metrics, and segmentation maps. Our findings revealed that segmentation-based and diffusion-derived features, particularly fractional anisotropy (FA) and tensor eigenvalues, had the greatest predictive strength, surpassing standard structural MRI in binary survival classification. Training modality-specific models allows for clearer explanation of the prediction process than fused approaches and is more practical in scenarios where not all MRI sequences are acquired. The approach demonstrates the strong predictive power of individual MRI sequences for mortality in glioma cases, providing a modular, adaptable, and clinically actionable deep learning framework. Further enhancements could incorporate volumetric models, longitudinal imaging, and non-imaging data, including genomic and clinical information. Full article

27 pages, 3492 KB  
Article
Filter-Wise Mask Pruning and FPGA Acceleration for Object Classification and Detection
by Wenjing He, Shaohui Mei, Jian Hu, Lingling Ma, Shiqi Hao and Zhihan Lv
Remote Sens. 2025, 17(21), 3582; https://doi.org/10.3390/rs17213582 - 29 Oct 2025
Abstract
Pruning and acceleration have become essential and promising techniques for convolutional neural networks (CNNs) in remote sensing image processing, especially for deployment on resource-constrained devices. However, maintaining model accuracy while achieving satisfactory acceleration remains a challenging and valuable problem. To break this limitation, we introduce a novel pruning pattern, the filter-wise mask, which enforces extra filter-wise structural constraints on pattern-based pruning and achieves the benefits of both unstructured and structured pruning. The filter-wise mask enhances fine-grained sparsity with more hardware-friendly regularity. We further design an acceleration architecture with optimized calculation parallelism and memory access, aiming to fully translate weight pruning into hardware performance gains. The proposed pruning method is first proven on classification networks: the pruning rate reaches 75.1% for VGG-16 and 84.6% for ResNet-50 without compromising accuracy. We then apply our method to the widely used you only look once (YOLO) object detection model. On an aerial image dataset, the pruned YOLOv5s achieves a pruning rate of 53.43% with a slight accuracy degradation of 0.6%. Meanwhile, we implement the acceleration architecture on a field-programmable gate array (FPGA) to evaluate its practical execution performance. The throughput reaches up to 809.46 MOPS. The pruned networks achieve speedups of 2.23× and 4.4×, with compression rates of 2.25× and 4.5×, respectively, effectively converting model compression into execution speedup. The proposed pruning and acceleration approach provides key technology for applying CNNs in remote sensing, especially in scenarios such as on-board real-time processing, emergency response, and low-cost monitoring. Full article
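A generic per-filter magnitude mask conveys the basic idea of filter-wise masking, although the paper's pattern constraints and hardware mapping are more involved; everything below is an illustration, not the published method.

```python
# Sketch of a per-filter magnitude pruning mask: within every convolutional
# filter, only the largest-magnitude weights are kept (keep ratio assumed).
import torch

def filter_wise_mask(weight: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    out_ch = weight.shape[0]
    flat = weight.abs().reshape(out_ch, -1)        # one row per filter
    k = max(1, int(keep_ratio * flat.shape[1]))
    thresh = flat.topk(k, dim=1).values[:, -1:]    # per-filter magnitude threshold
    return (flat >= thresh).reshape_as(weight).float()

w = torch.randn(64, 32, 3, 3)                      # a conv layer with 64 filters
mask = filter_wise_mask(w, keep_ratio=0.25)
pruned = w * mask                                  # roughly 75% of weights zeroed per filter
print(1 - mask.mean().item())                      # achieved sparsity
```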