Search Results (13)

Search Parameters:
Keywords = conductance-aware quantization

21 pages, 13698 KB  
Article
Edge-Oriented Adaptive Multi-Task Network for Modulation and Signal Type Classification
by Peixin Zhao and Chengqun Wang
Future Internet 2026, 18(6), 275; https://doi.org/10.3390/fi18060275 - 22 May 2026
Viewed by 250
Abstract
Modulation and signal classification are two highly correlated core tasks in wireless communications and form the foundation of intelligent spectrum management in Future Internet and 6G networks. Although their objectives differ, the two tasks often share a substantial amount of underlying information in the feature space. However, focusing solely on their commonalities while neglecting their intrinsic differences may lead to suboptimal model performance. Therefore, by taking into account both the correlation and inherent differences between the two tasks, we propose TAMTNet, a task-adaptive multi-task network for edge deployment in Future Internet. TAMTNet introduces an Extremely Efficient Spatial Pyramid (EESP) into the shared layer to efficiently extract multi-scale temporal information. In addition, a multi-gate mixture-of-experts (MMoE) mechanism is employed after the shared layer to enhance the modeling capability of task-specific features. Furthermore, to address the difficulty of deploying deep models on resource-constrained edge devices, a joint lightweight framework combining quantization-aware training and knowledge distillation is proposed, which significantly reduces model complexity while maintaining performance. Extensive experiments conducted on simulated and real-world over-the-air transmission datasets demonstrate that the TAMTNet model achieves excellent performance on both modulation and signal classification tasks across a wide range of signal-to-noise ratios and radio transmit gain conditions. Meanwhile, the low-bitwidth lightweight models maintain classification performance comparable to the full-precision model while significantly reducing model storage and computational complexity.

28 pages, 4683 KB  
Article
Acoustic Intelligence with Multi-Stage Model Optimization for Environmental Sound Classification
by Pasan Sarathchandra, Senuri Mallikarachchi, Dimalsha Madushani and Dulani Meedeniya
Smart Cities 2026, 9(5), 86; https://doi.org/10.3390/smartcities9050086 - 16 May 2026
Viewed by 490
Abstract
Environmental sound classification is an important component of smart city sensing systems, supporting applications such as urban noise analysis, public safety monitoring, and real-time situational awareness. However, high-accuracy models are often difficult to deploy on low-power edge devices because of memory, computational, and latency constraints. This study aims to address this deployment gap by developing a lightweight compression pipeline for a hybrid convolutional and Kolmogorov–Arnold Network-based model. The proposed pipeline consists of three stages. First, structured channel pruning is applied to remove redundant convolutional filters while preserving hardware-efficient dense operations. Second, selective quantization-aware training is applied to the most computation-dominant layers, namely the third convolutional layer and the fully connected layer. Third, knowledge distillation is used to recover accuracy by training the compressed model under the guidance of the baseline model. Experiments were conducted on ESC-10, ESC-50, FSC22, and UrbanSound8K. The proposed pipeline reduced the average parameter count from 511,033 to 50,774 and reduced the model size while maintaining competitive accuracy across all benchmarks. The final model preserved the baseline accuracy of 96.75% on ESC-10, while accuracy decreased only from 88.25% to 86.50% on ESC-50, from 87.92% to 86.38% on FSC22, and from 85.13% to 84.52% on UrbanSound8K. These results show that the proposed compression pipeline provides an effective accuracy–efficiency trade-off for real-time audio classification on resource-constrained devices. Therefore, the resulting compressed model supports the scalable deployment of distributed acoustic sensing systems for real-time smart city monitoring and decision-making.

43 pages, 2338 KB  
Article
Micro-Attention CNN Hybrid Architecture for Real-Time Stress Detection Using Minimalistic Bio-Signals
by Chaymae Yahyati, Ismail Lamaakal, Yassine Maleh, Khalid El Makkaoui and Ibrahim Ouahbi
Technologies 2026, 14(5), 300; https://doi.org/10.3390/technologies14050300 - 13 May 2026
Viewed by 336
Abstract
Real-time psychological stress detection on wearable and edge devices requires models that are accurate, computationally efficient, and small enough for on-device deployment. This paper proposes a Micro-Attention CNN Hybrid Architecture for stress recognition using wearable bio-signals. The model uses six sensor channels, namely tri-axial acceleration, electrodermal activity, heart rate, and skin temperature, and classifies three stress levels: no stress, low stress, and high stress. This study is conducted on a public wearable sensor dataset collected from 15 nurses during hospital work, providing a realistic benchmark for continuous stress monitoring under practical conditions. The proposed architecture combines one-dimensional and depthwise separable convolutions with a lightweight attention module to emphasize the most informative temporal patterns in short multivariate signal segments. To support deployment on resource-constrained devices, we further apply structured pruning, selective quantization-aware training, and post-training quantization. The full-precision model achieves a Macro-F1 score of 99.63%, while the final compressed model retains 98.03% Macro-F1 with a model size of 1.76 kilobytes and a CPU inference latency of 0.40 ms. Additional analyses show that most residual errors occur near the boundary between low stress and neighboring classes, while simple post-compression calibration improves reliability. These results demonstrate that accurate and low-latency stress detection using wearable bio-signals is feasible on compact edge hardware without transmitting raw sensor streams off-device.
(This article belongs to the Special Issue AI-Enabled Smart Healthcare Systems)

21 pages, 6540 KB  
Article
HAPQ: A Hardware-Aware Pruning and Quantization Pipeline for Event-Based SNN Detection
by Zhengyinan Li and Jing Wu
Sensors 2026, 26(9), 2910; https://doi.org/10.3390/s26092910 - 6 May 2026
Viewed by 782
Abstract
Autonomous driving perception demands low latency, high temporal resolution, and stringent hardware efficiency. While event-based spiking neural networks (SNNs) offer bio-inspired sparse computation, their deployment on edge field-programmable gate arrays (FPGAs) is obstructed by irregular execution patterns and temporal state storage overhead. To address this, we propose HAPQ, a unified hardware-aware pruning and quantization pipeline for compact event-based object detection. Starting from an end-to-end adaptive sampling SNN detector (EAS-SNN), HAPQ conducts hardware-aware configuration search within discrete digital signal processor (DSP) and block RAM (BRAM) budgets, applies single-instruction-multiple-data (SIMD)-aligned structured pruning for computational regularity, and jointly quantizes synaptic weights and membrane potentials via a shift-friendly fixed-point recurrence. Evaluation on the Prophesee Gen1 dataset and an FPGA accelerator shows that HAPQ improves detection accuracy from 0.284 to 0.425 in mean average precision (mAP50:95) and achieves 0.722 mAP50. Hardware implementation reveals a reduction in lookup table (LUT) usage to 1680, complete DSP elimination, and a maximum operating frequency of 920.81 MHz at 0.630 W. These results confirm that effective temporal SNN deployment requires joint optimization of model architecture, state precision, and hardware-aligned workload organization.

18 pages, 972 KB  
Article
CPU Deployment-Oriented Evaluation of Compact Neural Networks for Remaining Useful Life Prediction
by Ali Naderi Bakhtiyari, Vahid Hassani and Mohammad Omidi
Machines 2026, 14(4), 375; https://doi.org/10.3390/machines14040375 - 28 Mar 2026
Viewed by 615
Abstract
Remaining Useful Life (RUL) prediction is a key component of prognostics and health management for modern industrial systems. While deep learning methods have significantly improved prediction accuracy, many existing approaches rely on large neural networks that are difficult to deploy on resource-constrained edge devices. This study presents a deployment-oriented evaluation of compact neural networks for RUL prediction using the NASA C-MAPSS turbofan engine benchmark. Two lightweight hybrid architectures, CNN–GRU and CNN–TCN, were developed with approximately 28k–32k parameters to represent realistic models for CPU-based edge inference. A systematic experimental analysis was conducted across all four C-MAPSS subsets (FD001–FD004), which represent increasing levels of operational and fault complexity. In addition to baseline performance, two post-training compression techniques (i.e., global unstructured magnitude pruning and dynamic INT8 quantization) were evaluated. To assess real deployment behavior, inference latency was measured on both a high-performance Intel x86 workstation and a resource-constrained ARM platform. Results show that CNN–GRU generally achieves higher predictive accuracy, whereas CNN–TCN provides more consistent and lower inference latency due to its convolution-only temporal modeling. Unstructured pruning can yield modest improvements in prediction accuracy, suggesting a regularization effect, but it does not reliably reduce model size or latency on standard CPUs due to the overhead associated with pruning masks. Dynamic quantization substantially reduces model size (particularly for CNN–GRU) while preserving predictive accuracy; however, it increases runtime latency because of additional quantization and dequantization operations. These findings demonstrate that compression techniques commonly used for large models do not necessarily translate into deployment benefits for already compact RUL architectures and highlight the importance of hardware-aware evaluation when designing edge prognostics systems.

29 pages, 18308 KB  
Article
Optimizing Computer Vision for Edge Deployment in Industry 4.0: A Framework and Experimental Evaluation
by Eman Azab, Mohamed Ehab, Lamia Shihata and Maggie Mashaly
Technologies 2026, 14(2), 126; https://doi.org/10.3390/technologies14020126 - 17 Feb 2026
Viewed by 1006
Abstract
Integrating high-performance computer vision (CV) into Industry 4.0 environments remains a challenge due to the computational disparity between state-of-the-art (SOTA) models and resource-constrained edge hardware. This study proposes a hardware-aware optimization framework designed to bridge this gap, focusing on real-time object detection for high-speed, omnidirectional conveyor systems. Unlike conventional benchmarking, the proposed framework employs a multi-stage optimization pipeline (integrating backbone refinement, hyperparameter tuning, and quantization) to transition diverse architectures from baseline configurations (M_base) to hardware-optimized variants (M_opt). The framework’s efficacy is validated using a custom-built standalone experimental platform detecting package features, brands, and disruptions on an omnidirectional-wheeled conveyor. A comprehensive comparative analysis is conducted across a heterogeneous edge ecosystem, including the NVIDIA Jetson Nano (GPU), Raspberry Pi 4 (CPU), and Google Coral (TPU). Our findings demonstrate that, through systematic tuning, the YOLOv10n variant emerged as the superior architecture, achieving a precision of 98.1% and an mAP50:95 of 81.22%. Post-deployment characterization reveals that the optimized YOLOv10n model on the NVIDIA Jetson Nano achieved a peak inference speed of 25 frames per second (FPS), successfully striking the “Pareto-optimal” balance between predictive accuracy and real-time processing. The primary contributions of this work include a reproducible optimization methodology, a comparative performance map across three distinct hardware backends, and the release of a specialized industrial conveyor dataset.

15 pages, 2618 KB  
Article
Multi-Agent Collaboration for 3D Human Pose Estimation and Its Potential in Passenger-Gathering Behavior Early Warning
by Xirong Chen, Hongxia Lv, Lei Yin and Jie Fang
Electronics 2026, 15(1), 78; https://doi.org/10.3390/electronics15010078 - 24 Dec 2025
Cited by 1 | Viewed by 854
Abstract
Passenger-gathering behavior often triggers safety incidents such as stampedes due to overcrowding, posing significant challenges to public order maintenance and passenger safety. Traditional early warning algorithms for passenger-gathering behavior typically perform only global modeling of image appearance, neglecting the analysis of individual passenger actions in practical 3D physical space, leading to high false-alarm and missed-alarm rates. To address this issue, we decompose the modeling process into two stages: human pose estimation and gathering behavior recognition. Specifically, the pose of each individual in 3D space is first estimated from images, and then fused with global features to complete the early warning. This work focuses on the former stage and aims to develop an accurate and efficient human pose estimation model capable of real-time inference on resource-constrained devices. To this end, we propose a 3D human pose estimation framework that integrates a hybrid spatio-temporal Transformer with three collaborative agents. First, a reinforcement learning-based architecture search agent is designed to adaptively select among Global Self-Attention, Window Attention, and External Attention for each block to optimize the model structure. Second, a feedback optimization agent is developed to dynamically adjust the search process, balancing exploration and convergence. Third, a quantization agent is employed that leverages quantization-aware training (QAT) to generate an INT8 deployment-ready model with minimal loss in accuracy. Experiments conducted on the Human3.6M dataset demonstrate that the proposed method achieves a mean per joint position error (MPJPE) of 42.15 mm with only 4.38 M parameters and 19.39 GFLOPs under FP32 precision, indicating substantial potential for subsequent gathering behavior recognition tasks.
(This article belongs to the Section Computer Science & Engineering)

29 pages, 5099 KB  
Article
Configurable Multi-Layer Perceptron-Based Soft Sensors on Embedded Field Programmable Gate Arrays: Targeting Diverse Deployment Goals in Fluid Flow Estimation
by Tianheng Ling, Chao Qian, Theodor Mario Klann, Julian Hoever, Lukas Einhaus and Gregor Schiele
Sensors 2025, 25(1), 83; https://doi.org/10.3390/s25010083 - 26 Dec 2024
Cited by 5 | Viewed by 1930
Abstract
This study presents a comprehensive workflow for developing and deploying Multi-Layer Perceptron (MLP)-based soft sensors on embedded FPGAs, addressing diverse deployment objectives. The proposed workflow extends our prior research by introducing greater model adaptability. It supports various configurations (spanning layer counts, neuron counts, and quantization bitwidths) to accommodate the constraints and capabilities of different FPGA platforms. The workflow incorporates a custom-developed, open-source toolchain, ElasticAI.Creator, that facilitates quantization-aware training, integer-only inference, automated accelerator generation using VHDL templates, and synthesis alongside performance estimation. A case study on fluid flow estimation was conducted on two FPGA platforms: the AMD Spartan-7 XC7S15 and the Lattice iCE40UP5K. For precision-focused and latency-sensitive deployments, a six-layer, 60-neuron MLP accelerator quantized to 8 bits on the XC7S15 achieved an MSE of 56.56, an MAPE of 1.61%, and an inference latency of 23.87 μs. Moreover, for low-power and energy-constrained deployments, a five-layer, 30-neuron MLP accelerator quantized to 8 bits on the iCE40UP5K achieved an inference latency of 83.37 μs, a power consumption of 2.06 mW, and an energy consumption of just 0.172 μJ per inference. These results confirm the workflow’s ability to identify optimal FPGA accelerators tailored to specific deployment requirements, achieving a balanced trade-off between precision, inference latency, and energy efficiency.
(This article belongs to the Section Intelligent Sensors)

13 pages, 1603 KB  
Article
The Impact of 8- and 4-Bit Quantization on the Accuracy and Silicon Area Footprint of Tiny Neural Networks
by Paweł Tumialis, Marcel Skierkowski, Jakub Przychodny and Paweł Obszarski
Electronics 2025, 14(1), 14; https://doi.org/10.3390/electronics14010014 - 24 Dec 2024
Cited by 12 | Viewed by 8014
Abstract
In the field of embedded and edge devices, efforts have been made to make deep neural network models smaller due to the limited size of the available memory and the low computational efficiency. Typical model footprints are under 100 KB. However, for some applications, models of this size are too large. In low-voltage sensors, signals must be processed, classified or predicted with an order of magnitude less memory. Model downsizing can be performed by limiting the number of model parameters or quantizing their weights. These types of operations have a negative impact on the accuracy of the deep network. This study tested the effect of model downscaling techniques on accuracy. The main idea was to reduce neural network models to 3 k parameters or less. Tests were conducted on three different neural network architectures in the context of three separate research problems, modeling real tasks for small networks. The impact of the reduction on the accuracy of the network depends mainly on its initial size. For a network reduced from 40 k parameters, a decrease in accuracy of 16 percentage points was observed, and for a network with 20 k parameters, a decrease of 8 points was observed. To obtain the best results, knowledge distillation and quantization-aware training methods were used for training. Thanks to this, the accuracy of the 4-bit networks did not differ significantly from the 8-bit ones and their results were approximately four percentage points worse than those of the full precision networks. For the fully connected network, synthesis to ASIC (application-specific integrated circuit) was also performed to demonstrate the reduction in the silicon area occupied by the model. The 4-bit quantization reduces the silicon area footprint by 90%.

16 pages, 582 KB  
Article
A Mixed-Methods Study Exploring Coping Self-Insights Associated with Resilience
by Kirsten J. Bucknell, Scott Hoare, Maria Kangas, Eyal Karin and Monique F. Crane
Behav. Sci. 2024, 14(11), 1018; https://doi.org/10.3390/bs14111018 - 1 Nov 2024
Cited by 3 | Viewed by 4629
Abstract
Self-insight has been associated with psychological resilience; however, less is understood about the role coping-specific self-insights play in strengthening resilience. This study used a convergent mixed-methods approach to investigate the coping self-insights triggered by self-reflection on coping experiences and their effects on perceived resilience. Australian ministry workers (n = 79) provided up to five qualitative self-reflective workbook entries, and quantitative online self-report survey responses before and six months after training. Hierarchical regression analyses of weighted quantized coping-specific self-insights on perceived resilience were conducted. Results suggest two pathways for the strengthening of resilience. A set of three self-insights related to greater perceived resilience appear to reinforce and sustain resilient beliefs across six months to increase perceived resilience. Another set of four self-insights is related to lesser perceived resilience after six months. It is suggested that the first set of self-insights may enhance beliefs that support resilience in the mid-term, whereas the second set may promote self-awareness that reduces perceived resilience in the mid-term. These findings support further exploration of coping self-insights, and the use and on-going testing of self-reflection resilience training.

11 pages, 677 KB  
Article
Benchmarking In-Sensor Machine Learning Computing: An Extension to the MLCommons-Tiny Suite
by Fabrizio Maria Aymone and Danilo Pietro Pau
Information 2024, 15(11), 674; https://doi.org/10.3390/info15110674 - 28 Oct 2024
Cited by 5 | Viewed by 4232
Abstract
This paper proposes a new benchmark specifically designed for in-sensor digital machine learning computing to meet an ultra-low embedded memory requirement. With the exponential growth of edge devices, efficient local processing is essential to mitigate economic costs, latency, and privacy concerns associated with centralized cloud processing. Emerging intelligent sensors, equipped with computing assets to run neural network inferences and embedded in the same package that hosts the sensing elements, present new challenges due to their limited memory resources and computational capabilities. This benchmark evaluates models trained with Quantization Aware Training (QAT) and compares their performance with Post-Training Quantization (PTQ) across three use cases: Human Activity Recognition (HAR) by means of the SHL dataset, Physical Activity Monitoring (PAM) by means of the PAMAP2 dataset, and surface electromyography (sEMG) regression with the NINAPRO DB8 dataset. The results demonstrate the effectiveness of QAT over PTQ in most scenarios, highlighting the potential for deploying advanced AI models on highly resource-constrained sensors. The INT8 versions of the models always outperformed their FP32 counterparts in memory and latency, except for the activations of the CNN. The CNN model exhibited reduced memory usage and latency with respect to its Dense counterpart, allowing it to meet the stringent 8 KiB data RAM and 32 KiB program RAM limits of the ISPU. The TCN model proved to be too large to fit within the memory constraints of the ISPU, primarily due to its greater capacity in terms of number of parameters, designed for processing more complex signals like EMG. This benchmark aims to guide the development of efficient AI solutions for In-Sensor Machine Learning Computing, fostering innovation in the field of Edge AI benchmarking, such as the one conducted by the MLCommons-Tiny working group.

35 pages, 4597 KB  
Article
DDD TinyML: A TinyML-Based Driver Drowsiness Detection Model Using Deep Learning
by Norah N. Alajlan and Dina M. Ibrahim
Sensors 2023, 23(12), 5696; https://doi.org/10.3390/s23125696 - 18 Jun 2023
Cited by 49 | Viewed by 10622
Abstract
Driver drowsiness is one of the main causes of traffic accidents today. In recent years, driver drowsiness detection has suffered from issues integrating deep learning (DL) with Internet-of-things (IoT) devices due to the limited resources of IoT devices, which make it challenging to run DL models that demand large storage and computation. Thus, there are challenges to meeting the requirements of real-time driver drowsiness detection applications that need short latency and lightweight computation. To this end, we applied Tiny Machine Learning (TinyML) to a driver drowsiness detection case study. In this paper, we first present an overview of TinyML. After conducting some preliminary experiments, we proposed five lightweight DL models that can be deployed on a microcontroller. We applied three DL models: SqueezeNet, AlexNet, and CNN. In addition, we adopted two pretrained models (MobileNet-V2 and MobileNet-V3) to find the best model in terms of size and accuracy results. After that, we applied the optimization methods to DL models using quantization. Three quantization methods were applied: quantization-aware training (QAT), full-integer quantization (FIQ), and dynamic range quantization (DRQ). In terms of model size, the CNN model achieved the smallest size of 0.05 MB using the DRQ method, followed by SqueezeNet, AlexNet, MobileNet-V3, and MobileNet-V2, with 0.141 MB, 0.58 MB, 1.16 MB, and 1.55 MB, respectively. After optimization, the MobileNet-V2 model with DRQ achieved an accuracy of 0.9964, outperforming the other models, followed by the SqueezeNet and AlexNet models, with accuracies of 0.9951 and 0.9924, respectively, using DRQ.
(This article belongs to the Section Internet of Things)

13 pages, 1244 KB  
Article
Conductance-Aware Quantization Based on Minimum Error Substitution for Non-Linear-Conductance-State Tolerance in Neural Computing Systems
by Chenglong Huang, Nuo Xu, Wenqing Wang, Yihong Hu and Liang Fang
Micromachines 2022, 13(5), 667; https://doi.org/10.3390/mi13050667 - 24 Apr 2022
Cited by 2 | Viewed by 2761
Abstract
Emerging resistive random-access memory (ReRAM) has demonstrated great potential for realizing the in-memory computing paradigm to overcome the well-known “memory wall” in the current von Neumann architecture. The ReRAM crossbar array (RCA) is a promising circuit structure to accelerate the vital multiplication-and-accumulation (MAC) operations in deep neural networks (DNNs). However, due to the nonlinear distribution of conductance levels in ReRAM, a large deviation arises in the mapping process when trained weights that are quantized by linear relationships are directly mapped to the nonlinear conductance values of a realistic ReRAM device. This deviation degrades the inference accuracy of the RCA-based DNN. In this paper, we propose a minimum error substitution based on a conductance-aware quantization method to eliminate the deviation in the mapping process from the weights to the actual conductance values. The method is suitable for multiple ReRAM devices with different non-linear conductance distributions and is also immune to device variation. The simulation results on LeNet5, AlexNet and VGG16 demonstrate that this method largely recovers the accuracy lost to the non-linear resistance distribution of ReRAM devices compared to the linear quantization method.
(This article belongs to the Special Issue Advances in Emerging Nonvolatile Memory)
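The mapping problem described in this abstract can be illustrated with a minimal sketch. It is not the authors' minimum error substitution algorithm; it only shows the general idea of snapping each weight to the nearest achievable level of a non-uniformly spaced conductance set, rather than to a uniform grid. The function names and the example conductance values are hypothetical.

```python
from bisect import bisect_left


def nearest_level(value, levels):
    """Return the entry of the sorted list `levels` closest to `value`."""
    i = bisect_left(levels, value)
    if i == 0:
        return levels[0]
    if i == len(levels):
        return levels[-1]
    lo, hi = levels[i - 1], levels[i]
    return lo if value - lo <= hi - value else hi


def quantize_to_conductance(weights, conductances):
    """Snap each weight to the nearest level of a non-uniform conductance set.

    Assumption: weights and conductances are both min-max normalized to
    [0, 1] before matching, then mapped back to the weight range.
    """
    g_min, g_max = min(conductances), max(conductances)
    levels = sorted((g - g_min) / (g_max - g_min) for g in conductances)
    w_min, w_max = min(weights), max(weights)
    span = (w_max - w_min) or 1.0  # avoid division by zero for constant weights
    quantized = []
    for w in weights:
        x = (w - w_min) / span          # normalized weight
        q = nearest_level(x, levels)    # closest achievable level
        quantized.append(w_min + q * span)
    return quantized


# Hypothetical device with four non-linearly spaced conductance states.
print(quantize_to_conductance([-1.0, -0.5, 0.0, 1.0], [1.0, 2.0, 4.0, 8.0]))
```

With these example values the normalized levels are 0, 1/7, 3/7, and 1, so the weight 0.0 is represented as roughly -0.143 instead of the 1/3 a uniform 2-bit grid would offer; that gap is the kind of mapping deviation the abstract refers to.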
