Efficient Integer Quantization for Compressed DETR Models
Abstract
1. Introduction
- Stable and accurate integer approximation for Transformer-based non-linear operations: Replacing the ResNet-50 backbone with Swin-T places all non-linear transformations in the backbone, encoder, and decoder under a unified Transformer architecture. This unification lets the non-linear approximations target the Transformer structure alone, without simultaneously accounting for a CNN architecture, so that floating-point non-linear layers such as Softmax, LayerNorm, and GELU can be approximated in the integer domain. A stability-enhanced Sigmoid approximation is also introduced to reduce errors under high-dynamic-range inputs. After full integer quantization, the average precision drops only slightly, from 44.6 to 44.1, demonstrating stable performance and suitability for edge deployment.
- Fully integer quantization optimization for linear operations: This paper applies QAT to quantize all linear operations to integers. QAT updates parameters through backpropagation, allowing the model to learn to compensate for quantization-induced distortions and mitigating cumulative quantization error. In addition, the low-bit integer representation significantly reduces computational complexity during inference, regardless of whether QAT or PTQ is used. Compared with the non-quantized model, bit operations are reduced by 93.7% (from 94.2 TBOPs to 5.9 TBOPs).
- Energy-efficient and hardware-friendly fully integer computation: This paper proposes a fully integer computation scheme that achieves end-to-end integer computation through fully quantized linear operations and integer-approximated non-linear operations. Unlike existing methods, this approach relies entirely on integer computation units during inference, eliminating the need for floating-point computation units. This reduction lowers hardware manufacturing costs and decreases dependence on high-power computing architectures. Furthermore, the proposed method compresses the model storage requirement to 25% of its original size (reducing from 173.95 MB to 43.49 MB), significantly lowering the deployment threshold for edge devices.
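The integer approximation of Softmax can be illustrated with an I-BERT-style sketch: for non-positive inputs, the exponential is decomposed as exp(x) = 2^(−z)·exp(p) with p ∈ (−ln 2, 0], and exp(p) is replaced by a second-order polynomial evaluated entirely in integers. The polynomial constants and the fixed-point output format below are illustrative assumptions, not this paper's exact implementation:

```python
import numpy as np

def i_exp(q, S):
    """Integer approximation of exp(S*q) for q <= 0 (I-BERT-style).
    Returns (q_out, S_out) such that exp(S*q) ≈ S_out * q_out."""
    q_ln2 = max(1, int(np.floor(np.log(2) / S)))
    z = (-q) // q_ln2                 # x = -z*ln2 + p, with p in (-ln2, 0]
    p = q + z * q_ln2                 # integer residual
    # exp(p') ≈ 0.3585*(p' + 1.353)^2 + 0.344 on (-ln2, 0]
    b = int(np.floor(1.353 / S))
    c = int(np.floor(0.344 / (0.3585 * S * S)))
    q_poly = (p + b) * (p + b) + c
    return q_poly >> z, 0.3585 * S * S   # exp(x) = exp(p) * 2^(-z)

def i_softmax(q_row, S):
    """Softmax over an integer row using only integer arithmetic."""
    q = q_row - q_row.max()              # shift so every input is <= 0
    exps = np.array([i_exp(int(v), S)[0] for v in q], dtype=np.int64)
    return (exps << 15) // exps.sum()    # fixed-point, denominator 2^15
```

For scale S = 0.01 and logits [100, 0, −100] (i.e., real values [1.0, 0.0, −1.0]), the result divided by 2^15 lies within about 0.01 of the floating-point softmax [0.665, 0.245, 0.090].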
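For the linear layers, QAT inserts "fake quantization" into the forward pass: weights and activations are rounded to the Int8 grid and dequantized, while the backward pass treats the rounding as identity via the straight-through estimator (STE). A minimal NumPy sketch of a symmetric per-tensor quantizer follows; the per-tensor symmetric scheme is an illustrative assumption, and the paper's exact quantizer may differ:

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Round x to a symmetric signed integer grid, then dequantize.
    In QAT the backward pass uses the straight-through estimator,
    passing gradients through round() as if it were the identity."""
    qmax = 2**(num_bits - 1) - 1            # 127 for Int8
    scale = np.abs(x).max() / qmax          # symmetric per-tensor scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale, q.astype(np.int8), scale
```

Because the grid step is `scale`, the per-element dequantization error is bounded by `scale / 2`, which is the distortion the QAT training loop learns to compensate for.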
2. Related Work
2.1. Quantization Methods for Object Detection Models
2.2. Model Compression and Lightweight Techniques for Transformers
3. Proposed Method
3.1. Model Overview
3.2. Integer Quantization Implementation for Linear Operation Layers
3.3. Integer Approximation for Non-Linear Operation Layers
4. Experiments
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
AP | average precision
BOPs | bit operations
CNN | convolutional neural network
COCO | common objects in context
DETR | DEtection TRansformer
DN | dyadic number
FP32 | 32-bit floating point
FQN | fully quantized network
GELU | Gaussian error linear unit
Int8 | 8-bit integer
MLP | multi-layer perceptron
NLP | natural language processing
NMS | non-maximum suppression
PTQ | post-training quantization
Q-DETR | quantized detection Transformer
QAT | quantization-aware training
ReLU | rectified linear unit
Swin-T | Swin Transformer-tiny
STE | straight-through estimator
TBOPs | tera bit operations
ViT | vision Transformer
Model | AP | BOPs (T) | Params |
---|---|---|---|
ResNet-50 DETR | 42.0 | 88.06 | 41.6 M |
Swin-T DETR | 44.6 | 93.79 | 45.6 M |
Bits (W/A/Attention) | AP | Size (MB) | BOPs (T) |
---|---|---|---|
32/32/32 (FP model) | 44.6 | 173.95 | 94.2 |
8/8/8 (INT model) | 44.1 | 43.49 | 5.9 |
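The reductions in the table follow directly from the bit widths: under the common convention BOPs = MACs × (weight bits) × (activation bits), moving from 32/32 to 8/8 scales compute by 64/1024 = 1/16, and storing Int8 weights instead of FP32 scales model size by 1/4. A quick arithmetic check (the MAC count is inferred from the table, not taken from the paper):

```python
# BOPs = MACs × weight-bits × activation-bits (a common convention)
macs = 94.2e12 / (32 * 32)      # MAC count implied by the FP32 row
bops_int8 = macs * 8 * 8        # ≈ 5.9 TBOPs, matching the table
size_int8 = 173.95 / 4          # FP32 -> Int8 storage, ≈ 43.49 MB
```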
Model | Method | Bits (W/A/Attention) | Size (MB) | AP | Int.-only | Diff. (%)
---|---|---|---|---|---|---
ResNet-50 DETR | Real-valued | 32-32-32 | 159.3 | 42.0 | × | −
ResNet-50 DETR | VT-PTQ | 8-8-8 | 39.8 | 41.2 | × |
Swin-T DETR | Real-valued | 32-32-32 | 173.9 | 44.6 | × | −
Swin-T DETR | Our Method | 8-8-8 | 43.5 | 44.1 | ✓ | −1.1
Share and Cite
Liu, P.; Li, C.; Zhang, N.; Yang, J.; Wang, L. Efficient Integer Quantization for Compressed DETR Models. Entropy 2025, 27, 422. https://doi.org/10.3390/e27040422