Search Results (14)

Search Parameters:
Keywords = quantization-aware training (QAT)

19 pages, 1957 KiB  
Article
Resource-Efficient Cotton Network: A Lightweight Deep Learning Framework for Cotton Disease and Pest Classification
by Zhengle Wang, Heng-Wei Zhang, Ying-Qiang Dai, Kangning Cui, Haihua Wang, Peng W. Chee and Rui-Feng Wang
Plants 2025, 14(13), 2082; https://doi.org/10.3390/plants14132082 - 7 Jul 2025
Cited by 2 | Viewed by 405
Abstract
Cotton is the most widely cultivated natural fiber crop worldwide, yet it is highly susceptible to various diseases and pests that significantly compromise both yield and quality. To enable rapid and accurate diagnosis of cotton diseases and pests—thus supporting the development of effective control strategies and facilitating genetic breeding research—we propose a lightweight model, the Resource-efficient Cotton Network (RF-Cott-Net), alongside an open-source image dataset, CCDPHD-11, encompassing 11 disease categories. Built upon the MobileViTv2 backbone, RF-Cott-Net integrates an early exit mechanism and quantization-aware training (QAT) to enhance deployment efficiency without sacrificing accuracy. Experimental results on CCDPHD-11 demonstrate that RF-Cott-Net achieves an accuracy of 98.4%, an F1-score of 98.4%, a precision of 98.5%, and a recall of 98.3%. With only 4.9 M parameters, 310 M FLOPs, an inference time of 3.8 ms, and a storage footprint of just 4.8 MB, RF-Cott-Net delivers outstanding accuracy and real-time performance, making it highly suitable for deployment on agricultural edge devices and providing robust support for in-field automated detection of cotton diseases and pests.
(This article belongs to the Special Issue Precision Agriculture in Crop Production)
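The QAT step described above follows the standard fake-quantization recipe: train with simulated int8 rounding so the weights adapt before conversion. A minimal sketch of that recipe using PyTorch's eager-mode torch.ao.quantization API, on a stand-in convolutional model rather than the authors' MobileViTv2-based RF-Cott-Net:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Stand-in classifier; the paper's model is a MobileViTv2 with early exits."""
    def __init__(self, num_classes=11):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()      # float -> int8 boundary
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, num_classes)
        self.dequant = torch.ao.quantization.DeQuantStub()  # int8 -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.features(x).flatten(1)
        return self.dequant(self.head(x))

model = TinyNet().train()
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("fbgemm")
qat_model = torch.ao.quantization.prepare_qat(model)

# Fine-tune with fake-quant ops in place (random stand-in data shown here).
opt = torch.optim.SGD(qat_model.parameters(), lr=1e-3)
for _ in range(3):
    x, y = torch.randn(8, 3, 64, 64), torch.randint(0, 11, (8,))
    loss = nn.functional.cross_entropy(qat_model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

int8_model = torch.ao.quantization.convert(qat_model.eval())  # real int8 weights
```

The converted model stores int8 weights and activations, which is where storage footprints in the few-megabyte range come from.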
23 pages, 2426 KiB  
Article
SUQ-3: A Three Stage Coarse-to-Fine Compression Framework for Sustainable Edge AI in Smart Farming
by Thavavel Vaiyapuri and Huda Aldosari
Sustainability 2025, 17(12), 5230; https://doi.org/10.3390/su17125230 - 6 Jun 2025
Viewed by 533
Abstract
Artificial intelligence of things (AIoT) has become a pivotal enabler of precision agriculture by supporting real-time, data-driven decision-making at the edge. Deep learning (DL) models are central to this paradigm, offering powerful capabilities for analyzing environmental and climatic data in a range of agricultural applications. However, deploying these models on edge devices remains challenging due to constraints in memory, computation, and energy. Existing model compression techniques predominantly target large-scale 2D architectures, with limited attention to one-dimensional (1D) models such as gated recurrent units (GRUs), which are commonly employed for processing sequential sensor data. To address this gap, we propose a novel three-stage coarse-to-fine compression framework, termed SUQ-3 (Structured, Unstructured Pruning, and Quantization), designed to optimize 1D DL models for efficient edge deployment in AIoT applications. The SUQ-3 framework sequentially integrates (1) structured pruning with an M×N sparsity pattern to induce hardware-friendly, coarse-grained sparsity; (2) unstructured pruning to eliminate low-magnitude weights for fine-grained compression; and (3) quantization, applied after quantization-aware training (QAT), to support low-precision inference with minimal accuracy loss. We validate the proposed SUQ-3 by compressing a GRU-based crop recommendation model trained on environmental and climatic data from an agricultural dataset. Experimental results show a model size reduction of approximately 85% and an 80% improvement in inference latency while preserving high predictive accuracy (F1 score: 0.97 vs. baseline: 0.9837). Notably, when deployed on a mobile edge device using TensorFlow Lite, the SUQ-3 model achieved an estimated energy consumption of 1.18 μJ per inference, representing a 74.4% reduction compared with the baseline and demonstrating its potential for sustainable low-power AI deployment in agricultural environments. Although demonstrated in an agricultural AIoT use case, the generality and modularity of SUQ-3 make it applicable to a broad range of DL models across domains requiring efficient edge intelligence.
(This article belongs to the Collection Sustainability in Agricultural Systems and Ecosystem Services)
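The three-stage order (coarse structured pruning, then fine unstructured pruning, then QAT) can be sketched with torch.nn.utils.prune. Here row-wise structured pruning stands in for the paper's M×N pattern, and the model is a placeholder GRU head, not the authors' network:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in 1D model: a GRU encoder with a linear head (not the authors' net).
gru = nn.GRU(input_size=7, hidden_size=64, batch_first=True)
head = nn.Linear(64, 22)   # e.g., 22 crop classes

# Stage 1: coarse structured pruning -- remove whole output rows by L2 norm,
# a hardware-friendly stand-in for the paper's M x N sparsity pattern.
prune.ln_structured(head, name="weight", amount=0.25, n=2, dim=0)

# Stage 2: fine unstructured pruning -- drop low-magnitude weights, here on
# both the head and the GRU's recurrent weight matrix.
prune.l1_unstructured(head, name="weight", amount=0.5)
prune.l1_unstructured(gru, name="weight_hh_l0", amount=0.5)

# Make the masks permanent; stage 3 (QAT, then int8 conversion) would follow.
prune.remove(head, "weight")
prune.remove(gru, "weight_hh_l0")
print("head sparsity:", (head.weight == 0).float().mean().item())
```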
24 pages, 3848 KiB  
Article
Efficient Deep Learning Model Compression for Sensor-Based Vision Systems via Outlier-Aware Quantization
by Joonhyuk Yoo and Guenwoo Ban
Sensors 2025, 25(9), 2918; https://doi.org/10.3390/s25092918 - 5 May 2025
Viewed by 862
Abstract
With the rapid growth of sensor technology and computer vision, efficient deep learning models are essential for real-time image feature extraction in resource-constrained environments. However, most existing quantized deep neural networks (DNNs) are highly sensitive to outliers, leading to severe performance degradation in low-precision settings. Our study reveals that outliers extending beyond the nominal weight distribution significantly increase the dynamic range, thereby reducing quantization resolution and affecting sensor-based image analysis tasks. To address this, we propose an outlier-aware quantization (OAQ) method that effectively reshapes weight distributions to enhance quantization accuracy. By analyzing previous outlier-handling techniques using structural similarity (SSIM) measurement results, we demonstrated that OAQ significantly reduced the negative impact of outliers while maintaining computational efficiency. Notably, OAQ was orthogonal to existing quantization schemes, making it compatible with various quantization methods without additional computational overhead. Experimental results on multiple CNN architectures and quantization approaches showed that OAQ effectively mitigated quantization errors. In post-training quantization (PTQ), our 4-bit OAQ ResNet20 model achieved improved accuracy compared with full-precision counterparts, while in quantization-aware training (QAT), OAQ enhanced 2-bit quantization performance by 43.55% over baseline methods. These results confirmed the potential of OAQ for optimizing deep learning models in sensor-based vision applications.
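The underlying effect is easy to reproduce: a few outlier weights stretch the quantizer's dynamic range, so the remaining weights get almost no resolution. A toy numpy illustration of range clipping before uniform quantization (the paper's OAQ reshapes the distribution rather than simply clipping, so treat this as a motivating sketch only):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, 10_000)          # typical weight distribution
w[:5] = [0.9, -0.8, 0.7, -0.95, 0.85]      # a handful of outliers

def quantize(x, n_bits=4):
    # symmetric uniform quantizer spanning the tensor's full range
    scale = np.abs(x).max() / (2 ** (n_bits - 1) - 1)
    return np.round(x / scale) * scale

plain = quantize(w)
clipped = quantize(np.clip(w, *np.percentile(w, [0.1, 99.9])))

inliers = np.abs(w) < 0.1                  # judge resolution on normal weights
print("inlier MSE, full range:", np.mean((plain - w)[inliers] ** 2))
print("inlier MSE, clipped   :", np.mean((clipped - w)[inliers] ** 2))
```

With 4 bits, nearly all inlier weights round to zero under the full range, while the clipped quantizer recovers orders of magnitude finer resolution.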
21 pages, 1565 KiB  
Article
A KWS System for Edge-Computing Applications with Analog-Based Feature Extraction and Learned Step Size Quantized Classifier
by Yukai Shen, Binyi Wu, Dietmar Straeussnigg and Eric Gutierrez
Sensors 2025, 25(8), 2550; https://doi.org/10.3390/s25082550 - 17 Apr 2025
Viewed by 831
Abstract
Edge-computing applications demand ultra-low-power architectures for both feature extraction and classification tasks. In this manuscript, a Keyword Spotting (KWS) system tailored for energy-constrained portable environments is proposed. A 16-channel analog filter bank is employed for audio feature extraction, followed by a digital Gated Recurrent Unit (GRU) classifier. The filter bank is behaviorally modeled, making use of second-order band-pass transfer functions, simulating the analog front-end (AFE) processing. To enable efficient deployment, the GRU classifier is trained using a Learned Step Size Quantization (LSQ) and Look-Up Table (LUT)-aware quantization method. The resulting quantized model, with 4-bit weights and 8-bit activation functions (W4A8), achieves 91.35% accuracy across 12 classes, including 10 keywords from the Google Speech Command Dataset v2 (GSCDv2), with less than 1% degradation compared to its full-precision counterpart. The model is estimated to require only 34.8 kB of memory and 62,400 multiply–accumulate (MAC) operations per inference in real-time settings. Furthermore, the robustness of the AFE against noise and analog impairments is evaluated by injecting Gaussian noise and perturbing the filter parameters (center frequency and quality factor) in the test data, respectively. The obtained results confirm a strong classification performance even under degraded circuit-level conditions, supporting the suitability of the proposed system for ultra-low-power, noise-resilient edge applications.
(This article belongs to the Section Intelligent Sensors)
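The classifier's quantizer follows the Learned Step Size idea: the quantization step is itself a trainable parameter, updated through a straight-through estimator with the gradient scaling proposed in the original LSQ paper. A compact PyTorch sketch of such a weight quantizer (an illustration of the technique, not the authors' W4A8 training code):

```python
import torch
import torch.nn as nn

class LSQQuantizer(nn.Module):
    """Learned Step Size quantization for signed tensors (e.g., 4-bit weights)."""
    def __init__(self, n_bits=4):
        super().__init__()
        self.qmin = -(2 ** (n_bits - 1))
        self.qmax = 2 ** (n_bits - 1) - 1
        self.step = nn.Parameter(torch.tensor(1.0))  # learnable step size

    def forward(self, w):
        if self.training and self.step.item() == 1.0:
            # crude one-off init in the spirit of the LSQ paper
            self.step.data = 2 * w.abs().mean() / (self.qmax ** 0.5)
        g = 1.0 / ((w.numel() * self.qmax) ** 0.5)   # LSQ gradient scale
        step = self.step * g + (self.step - self.step * g).detach()
        q = torch.clamp(w / step, self.qmin, self.qmax)
        q = q + (q.round() - q).detach()             # straight-through rounding
        return q * step

wq = LSQQuantizer(n_bits=4)
w = torch.randn(64, 32, requires_grad=True)
loss = wq(w).pow(2).sum()
loss.backward()            # gradients reach both w and the learned step size
```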
11 pages, 677 KiB  
Article
Benchmarking In-Sensor Machine Learning Computing: An Extension to the MLCommons-Tiny Suite
by Fabrizio Maria Aymone and Danilo Pietro Pau
Information 2024, 15(11), 674; https://doi.org/10.3390/info15110674 - 28 Oct 2024
Cited by 1 | Viewed by 1790
Abstract
This paper proposes a new benchmark specifically designed for in-sensor digital machine learning computing to meet an ultra-low embedded memory requirement. With the exponential growth of edge devices, efficient local processing is essential to mitigate economic costs, latency, and privacy concerns associated with centralized cloud processing. Emerging intelligent sensors, equipped with computing assets to run neural network inferences and embedded in the same package that hosts the sensing elements, present new challenges due to their limited memory resources and computational capabilities. This benchmark evaluates models trained with Quantization Aware Training (QAT) and compares their performance with Post-Training Quantization (PTQ) across three use cases: Human Activity Recognition (HAR) by means of the SHL dataset, Physical Activity Monitoring (PAM) by means of the PAMAP2 dataset, and superficial electromyography (sEMG) regression with the NINAPRO DB8 dataset. The results demonstrate the effectiveness of QAT over PTQ in most scenarios, highlighting the potential for deploying advanced AI models on highly resource-constrained sensors. The INT8 versions of the models always outperformed their FP32 counterparts in memory and latency reductions, except for the activations of the CNN. The CNN model exhibited reduced memory usage and latency with respect to its Dense counterpart, allowing it to meet the stringent 8 KiB data RAM and 32 KiB program RAM limits of the ISPU. The TCN model proved to be too large to fit within the memory constraints of the ISPU, primarily due to its greater capacity in terms of the number of parameters, designed for processing more complex signals like EMG. This benchmark aims to guide the development of efficient AI solutions for in-sensor machine learning computing, fostering innovation in the field of Edge AI benchmarking, such as that conducted by the MLCommons-Tiny working group.
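The QAT side of such a comparison is typically set up by wrapping a Keras model with fake-quantization before fine-tuning, while PTQ converts an already-trained float model. A minimal sketch with the TensorFlow Model Optimization toolkit on a placeholder model for windowed sensor data (shapes and layer choices are illustrative assumptions, not the benchmark's networks):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model for windowed 6-axis sensor data.
model = tf.keras.Sequential([
    tf.keras.layers.Input((128, 6)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(8, activation="softmax"),
])

# QAT: insert fake-quant ops so training sees int8 rounding/clamping noise.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
# qat_model.fit(x_train, y_train, epochs=...)  # fine-tune, then convert to INT8

# PTQ, by contrast, converts the trained float model directly and calibrates
# activation ranges on a representative dataset -- cheaper to run, but the
# option this benchmark found weaker in most scenarios.
```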
28 pages, 24699 KiB  
Article
Enhancing Autism Spectrum Disorder Classification with Lightweight Quantized CNNs and Federated Learning on ABIDE-1 Dataset
by Simran Gupta, Md. Rahad Islam Bhuiyan, Sadia Sultana Chowa, Sidratul Montaha, Rashik Rahman, Sk. Tanzir Mehedi and Ziaur Rahman
Mathematics 2024, 12(18), 2886; https://doi.org/10.3390/math12182886 - 16 Sep 2024
Cited by 3 | Viewed by 2679
Abstract
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition that presents significant diagnostic challenges due to its varied symptoms and nature. This study aims to improve ASD classification using advanced deep learning techniques applied to neuroimaging data. We developed an automated system leveraging the ABIDE-1 dataset and a novel lightweight quantized one-dimensional (1D) Convolutional Neural Network (Q-CNN) model to analyze fMRI data. Our approach employs the NIAK pipeline with multiple brain atlases and filtering methods. Initially, the Regions of Interest (ROIs) are converted into feature vectors using tangent space embedding to feed into the Q-CNN model. The proposed 1D-CNN is quantized through Quantization-Aware Training (QAT). Int8 quantization is employed, making the model both robust and lightweight. We propose a federated learning (FL) framework to ensure data privacy, which allows decentralized training across different data centers without compromising local data security. Our findings indicate that the CC200 brain atlas, within the NIAK pipeline’s filt-global filtering methods, provides the best results for ASD classification. Notably, the ASD classification outcomes have achieved a significant test accuracy of 98% using the CC200 and filt-global filtering techniques. To the best of our knowledge, this performance surpasses previous studies in the field, highlighting a notable enhancement in ASD detection from fMRI data. Furthermore, the FL-based Q-CNN model demonstrated robust performance and high efficiency on a Raspberry Pi 4, underscoring its potential for real-world applications. We demonstrate the efficacy of the Q-CNN model by comparing its inference time, power consumption, and storage requirements with those of the 1D-CNN, quantized CNN, and the proposed int8 Q-CNN models. This research has made several key contributions, including the development of a lightweight int8 Q-CNN model, the application of FL for data privacy, and the evaluation of the proposed model in real-world settings. By identifying optimal brain atlases and filtering methods, this study provides valuable insights for future research in the field of neurodevelopmental disorders.
(This article belongs to the Special Issue Advances in Mathematics Computation for Software Engineering)
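The FL side of such a system boils down to federated averaging: each site trains on its private data and only model weights are pooled. A minimal FedAvg sketch in PyTorch (the site models and sample counts are hypothetical; the paper's exact aggregation protocol is not reproduced):

```python
import torch
import torch.nn as nn

def federated_average(global_model, site_models, site_sizes):
    """FedAvg: weight each site's parameters by its local sample count."""
    total = float(sum(site_sizes))
    avg = {
        key: sum(m.state_dict()[key].float() * (n / total)
                 for m, n in zip(site_models, site_sizes))
        for key in global_model.state_dict()
    }
    global_model.load_state_dict(avg)
    return global_model

# Toy demo: two sites with different sample counts (e.g., ABIDE imaging sites).
global_model = nn.Linear(10, 2)
site_a, site_b = nn.Linear(10, 2), nn.Linear(10, 2)  # locally fine-tuned copies
federated_average(global_model, [site_a, site_b], site_sizes=[871, 403])
```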
20 pages, 4604 KiB  
Article
On-Edge Deployment of Vision Transformers for Medical Diagnostics Using the Kvasir-Capsule Dataset
by Dara Varam, Lujain Khalil and Tamer Shanableh
Appl. Sci. 2024, 14(18), 8115; https://doi.org/10.3390/app14188115 - 10 Sep 2024
Cited by 6 | Viewed by 2626
Abstract
This paper aims to explore the possibility of utilizing vision transformers (ViTs) for on-edge medical diagnostics by experimenting with the Kvasir-Capsule image classification dataset, a large-scale image dataset of gastrointestinal diseases. Quantization techniques made available through TensorFlow Lite (TFLite), including post-training float-16 (F16) quantization and quantization-aware training (QAT), are applied to achieve reductions in model size, without compromising performance. The seven ViT models selected for this study are EfficientFormerV2S2, EfficientViT_B0, EfficientViT_M4, MobileViT_V2_050, MobileViT_V2_100, MobileViT_V2_175, and RepViT_M11. Three metrics are considered when analyzing a model: (i) F1-score, (ii) model size, and (iii) performance-to-size ratio, where performance is the F1-score and size is the model size in megabytes (MB). In terms of F1-score, we show that MobileViT_V2_175 with F16 quantization outperforms all other models with an F1-score of 0.9534. On the other hand, MobileViT_V2_050 trained using QAT was scaled down to a model size of 1.70 MB, making it the smallest model amongst the variations this paper examined. MobileViT_V2_050 also achieved the highest performance-to-size ratio of 41.25. Despite preferring smaller models for latency and memory concerns, medical diagnostics cannot afford poor-performing models. We conclude that MobileViT_V2_175 with F16 quantization is our best-performing model, with a small size of 27.47 MB, providing a benchmark for lightweight models on the Kvasir-Capsule dataset.
(This article belongs to the Special Issue AI Technologies for eHealth and mHealth)
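The F16 variants come from TensorFlow Lite's post-training float-16 quantization, which halves weight storage relative to float32 at typically negligible accuracy cost. A short sketch on a stand-in Keras model (the seven ViTs themselves are not reconstructed here):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Input((224, 224, 3)),
                             tf.keras.layers.Conv2D(8, 3, activation="relu"),
                             tf.keras.layers.GlobalAveragePooling2D(),
                             tf.keras.layers.Dense(14)])  # stand-in classifier

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as float16
tflite_f16 = converter.convert()

print(f"size on disk: {len(tflite_f16) / 2**20:.2f} MB")  # the paper's size metric
```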
19 pages, 748 KiB  
Article
Eye-Net: A Low-Complexity Distributed Denial of Service Attack-Detection System Based on Multilayer Perceptron
by Ramzi Khantouchi, Ibtissem Gasmi and Mohamed Amine Ferrag
J. Sens. Actuator Netw. 2024, 13(4), 45; https://doi.org/10.3390/jsan13040045 - 12 Aug 2024
Viewed by 2273
Abstract
Distributed Denial of Service (DDoS) attacks disrupt service availability, leading to significant financial setbacks for individuals and businesses. This paper introduces Eye-Net, a deep learning-based system optimized for DDoS attack detection that combines feature selection, balancing methods, Multilayer Perceptron (MLP), and quantization-aware training (QAT) techniques. An Analysis of Variance (ANOVA) algorithm is initially applied to the dataset to identify the most distinctive features. Subsequently, the Synthetic Minority Oversampling Technique (SMOTE) balances the dataset by augmenting samples for under-represented classes. Two distinct MLP models are developed: one for the binary classification of flow packets as regular or DDoS traffic and another for identifying six specific DDoS attack types. We store MLP model weights at 8-bit precision by incorporating the quantization-aware training technique. This adjustment slashes memory use by a factor of four and reduces computational cost similarly, making Eye-Net suitable for Internet of Things (IoT) devices. Both models are rigorously trained and assessed using the CICDDoS2019 dataset. Test results reveal that Eye-Net excels, surpassing contemporary DDoS detection techniques in accuracy, recall, precision, and F1 Score. The multiclass model achieves an impressive accuracy of 96.47% with an error rate of 8.78%, while the binary model showcases an outstanding 99.99% accuracy, maintaining a negligible error rate of 0.02%.
(This article belongs to the Section Network Security and Privacy)
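The preprocessing chain (ANOVA feature selection, then SMOTE balancing, then an MLP) maps directly onto scikit-learn and imbalanced-learn. A sketch on synthetic stand-in flow features (feature counts and layer sizes are illustrative, and the QAT step that stores weights at 8-bit precision is omitted here):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neural_network import MLPClassifier
from imblearn.over_sampling import SMOTE

# Stand-in flow features; CICDDoS2019 preprocessing is not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 40))
y = rng.choice([0, 1], size=2000, p=[0.9, 0.1])  # imbalanced binary labels

# 1) ANOVA F-test keeps the most class-discriminative features.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)

# 2) SMOTE synthesizes minority-class samples to balance the training set.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_sel, y)

# 3) A small MLP classifier, analogous to Eye-Net's binary model.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=50).fit(X_bal, y_bal)
print("training accuracy:", clf.score(X_bal, y_bal))
```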
17 pages, 2591 KiB  
Article
An Enhanced LightGBM-Based Breast Cancer Detection Technique Using Mammography Images
by Abdul Rahaman Wahab Sait and Ramprasad Nagaraj
Diagnostics 2024, 14(2), 227; https://doi.org/10.3390/diagnostics14020227 - 22 Jan 2024
Cited by 10 | Viewed by 2946
Abstract
Breast cancer (BC) is the leading cause of mortality among women across the world. Earlier screening of BC can significantly reduce the mortality rate and assist the diagnostic process to increase the survival rate. Researchers employ deep learning (DL) techniques to detect BC using mammogram images. However, these techniques are resource-intensive, leading to implementation complexities in real-life environments. The performance of convolutional neural network (CNN) models depends on the quality of mammogram images. Thus, this study aimed to build a model to detect BC using a DL technique. Image preprocessing techniques were used to enhance image quality. The authors developed a CNN model using the EfficientNet B7 model’s weights to extract the image features. Multi-class classification of BC images was performed using the LightGBM model. The Optuna algorithm was used to fine-tune LightGBM for image classification. In addition, a quantization-aware training (QAT) strategy was followed to implement the proposed model in a resource-constrained environment. The authors generalized the proposed model using the CBIS-DDSM and CMMD datasets. Additionally, they combined these two datasets to ensure the model’s generalizability to diverse images. The experimental findings revealed that the suggested BC detection model produced a promising result. The proposed BC detection model obtained an accuracy of 99.4%, 99.9%, and 97.0%, and Kappa (K) values of 96.9%, 96.9%, and 94.1% on the CBIS-DDSM, CMMD, and combined datasets, respectively. The recommended model streamlined the BC detection process to achieve an exceptional outcome. It can be deployed in a real-life environment to support physicians in making effective decisions. Graph convolutional networks can be used to improve the performance of the proposed model.
(This article belongs to the Special Issue Artificial Intelligence in Cancers)
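The tuning loop pairs LightGBM with Optuna in the usual way: define a search space, score each trial by cross-validation, and keep the best parameters. A hedged sketch on synthetic features (the paper's inputs are EfficientNet-B7 embeddings of mammograms; the search space below is an assumption):

```python
import lightgbm as lgb
import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for extracted image embeddings.
X, y = make_classification(n_samples=1000, n_features=64, n_classes=3,
                           n_informative=10, random_state=0)

def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 15, 127),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 50, 400),
    }
    model = lgb.LGBMClassifier(**params)
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("best params:", study.best_params)
```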
35 pages, 4597 KiB  
Article
DDD TinyML: A TinyML-Based Driver Drowsiness Detection Model Using Deep Learning
by Norah N. Alajlan and Dina M. Ibrahim
Sensors 2023, 23(12), 5696; https://doi.org/10.3390/s23125696 - 18 Jun 2023
Cited by 25 | Viewed by 7595
Abstract
Driver drowsiness is one of the main causes of traffic accidents today. In recent years, integrating deep learning (DL)-based driver drowsiness detection with Internet-of-Things (IoT) devices has been hampered by the limited resources of IoT devices, which cannot accommodate DL models that demand large storage and computation. This makes it challenging to meet the requirements of real-time driver drowsiness detection applications, which need short latency and lightweight computation. To this end, we applied Tiny Machine Learning (TinyML) to a driver drowsiness detection case study. In this paper, we first present an overview of TinyML. After conducting some preliminary experiments, we proposed five lightweight DL models that can be deployed on a microcontroller. We applied three DL models: SqueezeNet, AlexNet, and CNN. In addition, we adopted two pretrained models (MobileNet-V2 and MobileNet-V3) to find the best model in terms of size and accuracy. We then optimized the DL models using quantization. Three quantization methods were applied: quantization-aware training (QAT), full-integer quantization (FIQ), and dynamic range quantization (DRQ). In terms of model size, the CNN model achieved the smallest size of 0.05 MB using the DRQ method, followed by SqueezeNet, AlexNet, MobileNet-V3, and MobileNet-V2, with 0.141 MB, 0.58 MB, 1.16 MB, and 1.55 MB, respectively. In terms of accuracy, the MobileNet-V2 model with DRQ achieved the best result of 0.9964, outperforming the other models, followed by the SqueezeNet and AlexNet models with 0.9951 and 0.9924 accuracies, respectively, also using DRQ.
(This article belongs to the Section Internet of Things)
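The three quantization methods differ mainly in TensorFlow Lite converter settings; DRQ and FIQ are shown below on a stand-in model (QAT additionally requires fine-tuning with tensorflow_model_optimization before conversion):

```python
import numpy as np
import tensorflow as tf

# Stand-in drowsiness classifier (the paper's CNNs/MobileNets are not rebuilt).
model = tf.keras.Sequential([tf.keras.layers.Input((64, 64, 1)),
                             tf.keras.layers.Conv2D(4, 3, activation="relu"),
                             tf.keras.layers.GlobalAveragePooling2D(),
                             tf.keras.layers.Dense(2)])

# DRQ: int8 weights, activations stay float at runtime; no calibration needed.
drq = tf.lite.TFLiteConverter.from_keras_model(model)
drq.optimizations = [tf.lite.Optimize.DEFAULT]
drq_bytes = drq.convert()

# FIQ: weights AND activations in int8, so calibration samples are required.
def rep_data():
    for _ in range(50):
        yield [np.random.rand(1, 64, 64, 1).astype(np.float32)]

fiq = tf.lite.TFLiteConverter.from_keras_model(model)
fiq.optimizations = [tf.lite.Optimize.DEFAULT]
fiq.representative_dataset = rep_data
fiq.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
fiq_bytes = fiq.convert()

print(f"DRQ: {len(drq_bytes)} B, FIQ: {len(fiq_bytes)} B")
```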
17 pages, 2597 KiB  
Article
Neuron-by-Neuron Quantization for Efficient Low-Bit QNN Training
by Artem Sher, Anton Trusov, Elena Limonova, Dmitry Nikolaev and Vladimir V. Arlazarov
Mathematics 2023, 11(9), 2112; https://doi.org/10.3390/math11092112 - 29 Apr 2023
Cited by 10 | Viewed by 2801
Abstract
Quantized neural networks (QNNs) are widely used to achieve computationally efficient solutions to recognition problems. Overall, eight-bit QNNs have almost the same accuracy as full-precision networks while running several times faster. However, networks with lower quantization levels demonstrate inferior accuracy in comparison to their classical analogs. To solve this issue, a number of quantization-aware training (QAT) approaches have been proposed. In this paper, we study QAT approaches for two- to eight-bit linear quantization schemes and propose a new combined QAT approach: neuron-by-neuron quantization with straight-through estimator (STE) gradient forwarding. It is suitable for quantizations with two- to eight-bit widths and eliminates significant accuracy drops during training, which results in better accuracy of the final QNN. We experimentally evaluate our approach on CIFAR-10 and ImageNet classification and show that it is comparable to other approaches for four to eight bits and outperforms some of them for two to three bits while being easier to implement. For example, the proposed approach to three-bit quantization on CIFAR-10 results in 73.2% accuracy, while the direct and layer-by-layer baselines result in 71.4% and 67.2% accuracy, respectively. The results for two-bit quantization of ResNet18 on the ImageNet dataset are 63.69% for our approach and 61.55% for the direct baseline.
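The STE gradient forwarding at the core of this family of methods is a few lines of PyTorch: round in the forward pass, pass gradients through untouched in the backward pass. A generic sketch of low-bit uniform quantization with STE (the paper's neuron-by-neuron schedule is not shown):

```python
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)          # non-differentiable rounding

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                # straight-through: identity gradient

def quantize_ste(w, n_bits=3):
    """Uniform signed quantization with STE, e.g., for 2-8 bit QAT."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max().detach() / qmax
    return RoundSTE.apply(torch.clamp(w / scale, -qmax - 1, qmax)) * scale

w = torch.randn(16, 16, requires_grad=True)
loss = quantize_ste(w).pow(2).mean()
loss.backward()                        # gradients reach w despite the rounding
print(w.grad.abs().mean())
```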
20 pages, 6643 KiB  
Article
Simplification of Deep Neural Network-Based Object Detector for Real-Time Edge Computing
by Kyoungtaek Choi, Seong Min Wi, Ho Gi Jung and Jae Kyu Suhr
Sensors 2023, 23(7), 3777; https://doi.org/10.3390/s23073777 - 6 Apr 2023
Cited by 16 | Viewed by 3347
Abstract
This paper presents a method for simplifying and quantizing a deep neural network (DNN)-based object detector to embed it into a real-time edge device. For network simplification, this paper compares five methods for applying channel pruning to a residual block, because special care must be taken regarding the number of channels when summing two feature maps. Based on a comparison in terms of detection performance, parameter count, computational complexity, and processing time, this paper identifies the most suitable method for the edge device. For network quantization, this paper compares post-training quantization (PTQ) and quantization-aware training (QAT) using two datasets with different detection difficulties. This comparison shows that both approaches are recommended for the easy-to-detect dataset, but QAT is preferable for the difficult-to-detect dataset. Through experiments, this paper shows that the proposed method can effectively embed the DNN-based object detector into an edge device equipped with Qualcomm’s QCS605 System-on-Chip (SoC), while achieving real-time operation at more than 10 frames per second.
(This article belongs to the Special Issue Emerging Technologies in Edge Computing and Networking)
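The "special care" around the residual sum is the crux: filters removed from a block's last convolution must also be removed from the shortcut path, or the elementwise addition no longer shape-matches. A toy index-aligned pruning sketch (one simple policy for illustration, not necessarily the method the paper settles on):

```python
import torch
import torch.nn as nn

conv_out = nn.Conv2d(32, 32, 3, padding=1)   # last conv of a residual block
x = torch.randn(1, 32, 28, 28)               # block input (also the shortcut)

# Rank output channels by filter L1 norm and keep the strongest half.
l1 = conv_out.weight.detach().abs().sum(dim=(1, 2, 3))
keep = torch.argsort(l1, descending=True)[:16].sort().values

# Prune the conv's output channels...
pruned = nn.Conv2d(32, 16, 3, padding=1)
pruned.weight.data = conv_out.weight.data[keep]
pruned.bias.data = conv_out.bias.data[keep]

# ...and select the SAME channels on the shortcut so the sum still works.
out = pruned(x) + x[:, keep]
print(out.shape)   # torch.Size([1, 16, 28, 28])
```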
21 pages, 13730 KiB  
Article
An SSD-MobileNet Acceleration Strategy for FPGAs Based on Network Compression and Subgraph Fusion
by Shoutao Tan, Zhanfeng Fang, Yanyi Liu, Zhe Wu, Hang Du, Renjie Xu and Yunfei Liu
Forests 2023, 14(1), 53; https://doi.org/10.3390/f14010053 - 27 Dec 2022
Cited by 6 | Viewed by 3099
Abstract
Over the last decade, various deep neural network models have achieved great success in image recognition and classification tasks. The vast majority of high-performing deep neural network models have a huge number of parameters and often require sacrificing performance and accuracy when they are deployed on mobile devices with limited area and power consumption. To address this problem, we present an SSD-MobileNet-v1 acceleration method based on network compression and subgraph fusion for Field-Programmable Gate Arrays (FPGAs). First, a regularized pruning algorithm based on sensitivity analysis and Filter Pruning via Geometric Median (FPGM) is proposed. Second, a Quantization-Aware Training (QAT)-based full network quantization algorithm is designed. Finally, a computing subgraph fusion strategy is proposed for FPGAs to achieve continuous scheduling of Programmable Logic (PL) operators. The experimental results show that using the proposed acceleration strategy can reduce the number of model parameters by a factor of 11 and increase the inference speed on the FPGA platform by a factor of 9–10. The acceleration algorithm is applicable to various mobile edge devices and can be applied to the real-time monitoring of forest fires to improve the intelligence of forest fire detection.
(This article belongs to the Section Natural Hazards and Risk Management)
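FPGM, used in the pruning stage, ranks filters by how close they sit to the layer's geometric median: the closest filters are the most redundant and are pruned first. A compact sketch of that ranking, using the sum of pairwise distances as the customary proxy for median proximity:

```python
import torch

def fpgm_prune_indices(weight, amount=0.3):
    """Return indices of conv filters to prune, following the FPGM idea:
    filters closest to all the others (i.e., to the geometric median)
    are the most replaceable."""
    filters = weight.flatten(1)                   # (out_ch, in_ch * k * k)
    dist = torch.cdist(filters, filters, p=2)     # pairwise L2 distances
    redundancy = dist.sum(dim=1)                  # small sum => near the median
    n_prune = int(amount * weight.size(0))
    return torch.argsort(redundancy)[:n_prune]

conv_weight = torch.randn(64, 32, 3, 3)           # a stand-in conv layer
print(fpgm_prune_indices(conv_weight, 0.3))       # 19 filter indices to drop
```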
21 pages, 8145 KiB  
Article
The Possibility of Combining and Implementing Deep Neural Network Compression Methods
by Bratislav Predić, Uroš Vukić, Muzafer Saračević, Darjan Karabašević and Dragiša Stanujkić
Axioms 2022, 11(5), 229; https://doi.org/10.3390/axioms11050229 - 13 May 2022
Cited by 14 | Viewed by 4520
Abstract
In this paper, the possibility of combining deep neural network (DNN) model compression methods to achieve better compression results was considered. To compare the advantages and disadvantages of each method, all methods were applied to a ResNet18 model pretrained on the NCT-CRC-HE-100K dataset, with CRC-VAL-HE-7K used as the validation dataset. In the proposed method, quantization, pruning, weight clustering, QAT (quantization-aware training), preserve-cluster QAT (hereinafter PCQAT), and distillation were performed for the compression of ResNet18. The final evaluation of the obtained models was carried out on a Raspberry Pi 4 device using the validation dataset. The greatest model compression on disk was achieved by applying the PCQAT method, which reduced the size of the initial model by as much as 45 times, whereas the greatest model acceleration was achieved via distillation into the MobileNetV2 model. All methods led to compression of the initial model size, with a slight loss in model accuracy, or an increase in model accuracy in the case of QAT and weight clustering. INT8 quantization and knowledge distillation also led to a significant decrease in model execution time.
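PCQAT, the combination that compressed best here, is exposed through the TensorFlow Model Optimization toolkit's collaborative-optimization API: cluster first, then run QAT with a cluster-preserving scheme. A hedged sketch on a stand-in model (the exact scheme class name is taken from recent tfmot releases and should be verified against the installed version):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([tf.keras.layers.Input((32,)),
                             tf.keras.layers.Dense(64, activation="relu"),
                             tf.keras.layers.Dense(9)])  # stand-in classifier

# 1) Weight clustering: constrain each layer to a small centroid codebook.
clustered = tfmot.clustering.keras.cluster_weights(
    model, number_of_clusters=16,
    cluster_centroids_init=tfmot.clustering.keras.CentroidInitialization.KMEANS_PLUS_PLUS)
# ... fine-tune `clustered`, then strip the clustering wrappers ...
clustered = tfmot.clustering.keras.strip_clustering(clustered)

# 2) Cluster-preserving QAT: quantize without destroying the codebook.
annotated = tfmot.quantization.keras.quantize_annotate_model(clustered)
pcqat = tfmot.quantization.keras.quantize_apply(
    annotated,
    tfmot.experimental.combine.Default8BitClusterPreserveQuantizeScheme())
pcqat.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# ... fine-tune `pcqat`, then convert with TFLite for on-device evaluation ...
```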