This paper analyzes a compression method based on non-uniform binary scalar quantization, designed for a memoryless Laplacian source with zero mean and unit variance. Two quantizer design approaches are presented that investigate the effect of clipping on the quantization noise, with the optimal clipping factor determined by minimizing the mean-squared error (MSE) distortion. A detailed comparison of the two models is provided, and their performance is evaluated over a wide dynamic range of input data variances. The binary scalar quantization models are applied to standard signal processing tasks, such as speech and image quantization, and also to the quantization of neural network parameters. The motivation for binary quantization of neural network weights is model compression by a factor of 32, which is crucial for deployment on mobile or embedded devices with limited memory and processing power. The experimental results agree well with the theoretical models, confirming their practical applicability.
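To make the setting concrete, the following is a minimal sketch of a plain sign-based binary quantizer applied to a synthetic unit-variance Laplacian source. It is not the clipping-based design analyzed in the paper: the representation level is simply set to the sample mean of |x|, which is the MSE-optimal choice for a symmetric two-level quantizer with its threshold at zero. The function names, the sample size, and the assumption that the factor of 32 is measured against 32-bit floating-point storage are illustrative choices, not taken from the paper.

```python
import numpy as np

def binary_quantize(x):
    """Map each sample to +level or -level according to its sign.

    level = mean(|x|) minimizes the MSE for a symmetric two-level
    quantizer with threshold 0 (illustrative baseline, no clipping).
    """
    x = np.asarray(x, dtype=np.float64)
    level = np.mean(np.abs(x))
    return np.where(x >= 0, level, -level), level

def sqnr_db(x, xq):
    """Signal-to-quantization-noise ratio in decibels."""
    return 10.0 * np.log10(np.mean(x ** 2) / np.mean((x - xq) ** 2))

rng = np.random.default_rng(0)
# A Laplacian with scale b has variance 2*b**2, so b = 1/sqrt(2) gives unit variance.
x = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=200_000)

xq, level = binary_quantize(x)
sqnr = sqnr_db(x, xq)

# Storage comparison (assumption: the original samples are 32-bit floats).
# One sign bit per sample is packed into bytes; the single shared level is ignored.
sign_bits = np.packbits(x >= 0)
ratio = x.astype(np.float32).nbytes / sign_bits.nbytes

print(f"level = {level:.4f}, SQNR = {sqnr:.2f} dB, compression ratio = {ratio:.1f}")
```

For the unit-variance Laplacian, E|x| = 1/√2, so this baseline has a theoretical distortion of 1 − 1/2 = 0.5 and an SQNR of about 3.01 dB; the empirical values from the sketch should lie close to these figures.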
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.