ClipQ: Clipping Optimization for the Post-Training Quantization of Convolutional Neural Network
Abstract
1. Introduction
2. Related Work
3. Method
3.1. Linear Quantization
3.2. Weight Clipping Optimization
3.3. Activation Clipping Optimization
4. Experiments
4.1. Image Classification
4.2. Object Detection and Semantic Segmentation
4.3. Ablation Study
4.4. FPGA Implementation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
- Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2820–2828.
- Zhan, Z.; Ren, H.; Xia, M.; Lin, H.; Wang, X.; Li, X. Amfnet: Attention-guided multi-scale fusion network for bi-temporal change detection in remote sensing images. Remote Sens. 2024, 16, 1765.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Song, L.; Xia, M.; Xu, Y.; Weng, L.; Hu, K.; Lin, H.; Qian, M. Multi-granularity siamese transformer-based change detection in remote sensing imagery. Eng. Appl. Artif. Intell. 2024, 136, 108960.
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1520–1528.
- Li, H.; Xiong, P.; An, J.; Wang, L. Pyramid attention network for semantic segmentation. arXiv 2018, arXiv:1805.10180.
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Jiang, J.; Liu, J.; Fu, J.; Zhu, X.; Lu, H. Point Set Attention Network For Semantic Segmentation. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Virtual, 25–28 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2186–2190.
- Jiang, S.; Lin, H.; Ren, H.; Hu, Z.; Weng, L.; Xia, M. Mdanet: A high-resolution city change detection network based on difference and attention mechanisms under multi-scale feature fusion. Remote Sens. 2024, 16, 1387.
- Liu, Y.; Wang, W.; Xu, X.; Guo, X.; Gong, G.; Lu, H. Lightweight real-time stereo matching algorithm for AI chips. Comput. Commun. 2023, 199, 210–217.
- Wang, Z.; Gu, G.; Xia, M.; Weng, L.; Hu, K. Bitemporal attention sharing network for remote sensing image change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10368–10379.
- Liu, Y.; Wang, J.; Song, Y.; Liang, S.; Xia, M.; Zhang, Q. Lightning nowcasting based on high-density area and extrapolation utilizing long-range lightning location data. Atmos. Res. 2025, 321, 108070.
- Liu, Y.; Cheng, Y.; Song, Y.; Cai, D.; Zhang, N. Oral screening of dental calculus, gingivitis and dental caries through segmentation on intraoral photographic images using deep learning. BMC Oral Health 2024, 24, 1287.
- Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2704–2713.
- Choi, J.; Wang, Z.; Venkataramani, S.; Chuang, P.I.J.; Srinivasan, V.; Gopalakrishnan, K. Pact: Parameterized clipping activation for quantized neural networks. arXiv 2018, arXiv:1805.06085.
- Zhu, S.; Duong, L.H.; Liu, W. XOR-Net: An efficient computation pipeline for binary neural network inference on edge devices. In Proceedings of the 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, China, 2–4 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 124–131.
- Sung, W.; Shin, S.; Hwang, K. Resiliency of deep neural networks under quantization. arXiv 2015, arXiv:1511.06488.
- Migacz, S. 8-bit Inference with TensorRT. NVIDIA, May 2017. Available online: https://www.cse.iitd.ac.in/~rijurekha/course/tensorrt.pdf (accessed on 31 March 2025).
- Zhao, R.; Hu, Y.; Dotzel, J.; De Sa, C.; Zhang, Z. Improving neural network quantization without retraining using outlier channel splitting. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 7543–7552.
- Choukroun, Y.; Kravchik, E.; Yang, F.; Kisilev, P. Low-bit quantization of neural networks for efficient inference. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3009–3018.
- Nagel, M.; Amjad, R.A.; Van Baalen, M.; Louizos, C.; Blankevoort, T. Up or down? Adaptive rounding for post-training quantization. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 7197–7206.
- Wu, D.; Tang, Q.; Zhao, Y.; Zhang, M.; Fu, Y.; Zhang, D. Easyquant: Post-training quantization via scale optimization. arXiv 2020, arXiv:2006.16669.
- Banner, R.; Nahshan, Y.; Hoffer, E.; Soudry, D. Aciq: Analytical Clipping for Integer Quantization of Neural Networks. 2018. Available online: https://openreview.net/forum?id=B1x33sC9KQ (accessed on 31 March 2025).
- Ji, H.; Xia, M.; Zhang, D.; Lin, H. Multi-supervised feature fusion attention network for clouds and shadows detection. ISPRS Int. J. Geo-Inf. 2023, 12, 247.
- Zhu, T.; Zhao, Z.; Xia, M.; Huang, J.; Weng, L.; Hu, K.; Lin, H.; Zhao, W. FTA-Net: Frequency-Temporal-Aware Network for Remote Sensing Change Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 3448–3460.
- Fowers, J.; Ovtcharov, K.; Papamichael, M.K.; Massengill, T.; Liu, M.; Lo, D.; Alkalay, S.; Haselman, M.; Adams, L.; Ghandi, M.; et al. Inside Project Brainwave’s Cloud-Scale, Real-Time AI Processor. IEEE Micro 2019, 39, 20–28.
- Nagel, M.; Van Baalen, M.; Blankevoort, T.; Welling, M. Data-free quantization through weight equalization and bias correction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1325–1334.
- Cai, Y.; Yao, Z.; Dong, Z.; Gholami, A.; Mahoney, M.W.; Keutzer, K. Zeroq: A novel zero shot quantization framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 13169–13178.
- Li, Y.; Gong, R.; Tan, X.; Yang, Y.; Hu, P.; Zhang, Q.; Yu, F.; Wang, W.; Gu, S. Brecq: Pushing the limit of post-training quantization by block reconstruction. arXiv 2021, arXiv:2102.05426.
- Finkelstein, A.; Almog, U.; Grobman, M. Fighting quantization bias with bias. arXiv 2019, arXiv:1906.03193.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 448–456.
- Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 2015, 28.
- Gale, T.; Elsen, E.; Hooker, S. The state of sparsity in deep neural networks. arXiv 2019, arXiv:1902.09574.
- Renda, A.; Frankle, J.; Carbin, M. Comparing rewinding and fine-tuning in neural network pruning. arXiv 2020, arXiv:2003.02389.
- Muhammad, M.B.; Yeasin, M. Eigen-cam: Class activation map using principal components. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034.
- Faghri, F.; Fleet, D.J.; Kiros, J.R.; Fidler, S. Vse++: Improving visual-semantic embeddings with hard negatives. arXiv 2017, arXiv:1707.05612.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Liu, F.; Qiao, R.; Chen, G.; Gong, G.; Lu, H. CASSANN-v2: A high-performance CNN accelerator architecture with on-chip memory self-adaptive tuning. IEICE Electron. Express 2022, 19, 20220124.
| Model | FP32 | ClipQ |
|---|---|---|
| ResNet18 | 69.76% | 69.69% |
| ResNet50 | 76.13% | 76.01% |
| VGG11 | 70.37% | 70.28% |
| DenseNet121 | 74.65% | 74.54% |
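The table compares FP32 baselines with ClipQ's quantized models, which build on the linear quantization of Section 3.1. As background, here is a minimal NumPy sketch of symmetric linear quantization with a clipping threshold; this is our own illustrative example, not the authors' implementation, and the naive max-based threshold it uses is the baseline that clipping optimization improves on:

```python
import numpy as np

def linear_quantize(x: np.ndarray, clip: float, n_bits: int = 8) -> np.ndarray:
    """Symmetric linear quantization: clip to [-clip, clip], round onto a
    uniform signed integer grid, then map back to float for evaluation."""
    qmax = 2 ** (n_bits - 1) - 1                        # 127 for signed 8-bit
    scale = clip / qmax                                 # step size of the grid
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)   # integer codes
    return q * scale                                    # dequantize

# Naive "Max" clipping: threshold at the largest absolute weight.
w = np.random.randn(256, 128).astype(np.float32)
w_q = linear_quantize(w, clip=float(np.abs(w).max()))
print("quantization MSE:", np.mean((w - w_q) ** 2))
```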
| | RetinaNet | YoloV3 | SSD | YoloX |
|---|---|---|---|---|
| Model Size (MB) | 143.2 | 13.39 | 16.2 | 19.4 |
| FP32 | 36.5% ± 0.1% | 23.9% ± 0.12% | 21.3% ± 0.18% | 31.8% ± 0.1% |
| ClipQ-8bit | 36.3% ± 0.2% | 23.5% ± 0.22% | 21.1% ± 0.23% | 31.7% ± 0.2% |
| Jacob-6bit | 35.3% ± 0.23% | 16.3% ± 0.25% | 19.1% ± 0.33% | 27.3% ± 0.26% |
| ACIQ-6bit | 35.3% ± 0.24% | 18.7% ± 0.28% | 18.4% ± 0.25% | 27.9% ± 0.23% |
| ClipQ-6bit | 35.9% ± 0.25% | 21.4% ± 0.27% | 20.3% ± 0.21% | 30.2% ± 0.25% |
| | FCN | PSPNet | DeepLabV3+ | PSANet | GCNet |
|---|---|---|---|---|---|
| Model Size (MB) | 51.4 | 54.7 | 61.2 | 236.9 | 198.8 |
| FP32 | 70.8% ± 0.15% | 69.82% ± 0.1% | 75.33% ± 0.13% | 77.92% ± 0.1% | 78.12% ± 0.15% |
| ClipQ-8bit | 70.79% ± 0.26% | 70.76% ± 0.25% | 75.49% ± 0.2% | 78.2% ± 0.22% | 77.98% ± 0.25% |
| Jacob-6bit | 69.6% ± 0.28% | 59.07% ± 0.36% | 67.5% ± 0.34% | 72.17% ± 0.31% | 74.17% ± 0.35% |
| ACIQ-6bit | 70.4% ± 0.31% | 61.92% ± 0.35% | 71.11% ± 0.32% | 77.03% ± 0.3% | 76.8% ± 0.3% |
| ClipQ-6bit | 70.4% ± 0.2% | 69.69% ± 0.23% | 72.97% ± 0.22% | 77.62% ± 0.23% | 77.22% ± 0.22% |
| Model | Max | CS | MSE | ACIQ | ClipQ- | ClipQ |
|---|---|---|---|---|---|---|
| MobileNetV2 | 66.61% ± 0.1% | 69.46% ± 0.2% | 70% ± 0.2% | 66.59% ± 0.3% | 69.27% ± 0.3% | 70.37% ± 0.2% |
| MnasNetV1 | 36.53% ± 0.2% | 56.13% ± 0.3% | 51.78% ± 0.3% | 35.36% ± 0.4% | 55.31% ± 0.4% | 64.33% ± 0.2% |
| PSPNet | 64.28% ± 0.1% | 66.9% ± 0.2% | 67.2% ± 0.2% | 63.82% ± 0.3% | 66.63% ± 0.3% | 69.82% ± 0.3% |
| DeepLabV3+ | 72.58% ± 0.1% | 74.46% ± 0.2% | 74.42% ± 0.2% | 71.88% ± 0.3% | 73.83% ± 0.2% | 74.75% ± 0.2% |
| YoloX | 30.7% ± 0.2% | 30.9% ± 0.3% | 30.9% ± 0.2% | 28.7% ± 0.4% | 30.2% ± 0.3% | 31.1% ± 0.2% |
| SSD | 20.2% ± 0.2% | 20.4% ± 0.1% | 20.5% ± 0.1% | 19.6% ± 0.4% | 19.6% ± 0.2% | 20.7% ± 0.1% |
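The columns above are the weight-clipping criteria compared in the ablation: the naive maximum (Max), the CS and MSE metrics, ACIQ's analytical threshold, and the ClipQ variants. To make the comparison concrete, the sketch below shows a typical MSE-based clip search, a standard post-training-quantization baseline written under our own assumptions rather than reproducing the paper's exact procedure: it grid-searches the threshold that minimizes the error between the original and quantized weights.

```python
import numpy as np

def mse_clip_search(w: np.ndarray, n_bits: int = 8, n_steps: int = 100) -> float:
    """Grid-search the clipping threshold minimizing the MSE between the
    original weights and their quantized counterpart (illustrating the
    'MSE' baseline column in the ablation table above)."""
    qmax = 2 ** (n_bits - 1) - 1
    w_max = float(np.abs(w).max())
    best_clip, best_err = w_max, np.inf
    for step in range(1, n_steps + 1):
        clip = w_max * step / n_steps                   # candidate threshold
        scale = clip / qmax
        w_q = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
        err = float(np.mean((w - w_q) ** 2))
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip
```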
| Model | Max | ACIQ | Percent | KL | ClipQ- | ClipQ |
|---|---|---|---|---|---|---|
| MobileNetV2 | 66.3% ± 0.2% | 68.58% ± 0.3% | 67.78% ± 0.3% | 70.25% ± 0.2% | 70.24% ± 0.2% | 70.54% ± 0.1% |
| MnasNetV1 | 25.04% ± 0.3% | 52.72% ± 0.4% | 57.65% ± 0.3% | 58.68% ± 0.3% | 58.83% ± 0.2% | 60.45% ± 0.3% |
| PSPNet | 62.04% ± 0.2% | 69.21% ± 0.3% | 69.25% ± 0.3% | 69.54% ± 0.2% | 69.63% ± 0.1% | 70.01% ± 0.2% |
| DeepLabV3+ | 69.56% ± 0.2% | 73.96% ± 0.3% | 74.69% ± 0.2% | 74.71% ± 0.1% | 74.75% ± 0.1% | 74.92% ± 0.1% |
| YoloX | 28.6% ± 0.3% | 30.4% ± 0.3% | 29.7% ± 0.2% | 30.1% ± 0.2% | 30.2% ± 0.1% | 30.7% ± 0.2% |
| SSD | 20% ± 0.1% | 20.3% ± 0.2% | 20.7% ± 0.1% | 19% ± 0.3% | 20.4% ± 0.2% | 20.7% ± 0.1% |
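Here the columns are activation-clipping strategies, which pick a threshold from activation statistics gathered on a small calibration set rather than from the weights themselves. Assuming "Percent" denotes a percentile rule (and with the 99.99th percentile as an illustrative choice, not a value from the paper), percentile calibration can be sketched as:

```python
import numpy as np

def percentile_clip(activations: list, pct: float = 99.99) -> float:
    """Calibrate an activation clipping threshold as the pct-th percentile
    of absolute activation values collected over a calibration set."""
    flat = np.concatenate([np.abs(a).ravel() for a in activations])
    return float(np.percentile(flat, pct))

# Example: calibrate from a few random stand-in "activation" batches.
batches = [np.random.randn(32, 64, 14, 14) for _ in range(4)]
print("clip threshold:", percentile_clip(batches))
```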
| Model | Fake-Quantization | FPGA |
|---|---|---|
| MnasNetV1 | 66.5% | 66.31% |
| MobileNetV2 | 71.57% | 71.52% |
| FCN | 70.79% | 70.6% |
| SSD | 21.1% | 21% |
| YoloV3 | 23.5% | 23.4% |
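The Fake-Quantization column simulates integer inference in floating point (quantize, then immediately dequantize), while the FPGA column runs true integer arithmetic; the two agree up to rounding, which is why the gaps above stay small. A toy illustration of why the two paths coincide on a single dot product (our own sketch, not the paper's FPGA pipeline):

```python
import numpy as np

qmax = 127
x = np.random.randn(64).astype(np.float32)   # stand-in activations
w = np.random.randn(64).astype(np.float32)   # stand-in weights
sx, sw = float(np.abs(x).max()) / qmax, float(np.abs(w).max()) / qmax

qx = np.clip(np.round(x / sx), -128, 127).astype(np.int32)
qw = np.clip(np.round(w / sw), -128, 127).astype(np.int32)

fake = float((qx * sx) @ (qw * sw))  # fake quantization: float arithmetic
real = int(qx @ qw) * sx * sw        # deployment: int32 accumulate, then rescale
print(fake, real)                    # equal up to floating-point rounding
```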