Block-Wisely Supervised Network Pruning with Knowledge Distillation and Markov Chain Monte Carlo
Abstract
1. Introduction
1. We propose block-wisely supervised network pruning (BNP) to compress a baseline network by removing semantically similar structures without heavily sacrificing model accuracy.
2. By combining block-wise knowledge distillation (KD), which learns effective feature representations, with an MCMC scheme that samples subnets from the posterior, BNP is able to find the optimal reduced architecture (a code sketch of the block-wise distillation step follows this list).
3. Evaluation of BNP on multiple network architectures and datasets shows that it achieves higher pruning rates with smaller accuracy drops than state-of-the-art (SOTA) methods.
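To make the block-wise supervision in contribution 2 concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) in which a pruned student block is trained to reproduce the output feature maps of the corresponding teacher block. The module, loader, and hyperparameter names are illustrative assumptions.

```python
# Illustrative block-wise distillation: one pruned student block regresses the
# feature maps produced by the corresponding teacher block.
import torch
import torch.nn as nn


def distill_block(teacher_block: nn.Module,
                  student_block: nn.Module,
                  loader,                 # yields the preceding block's feature maps
                  epochs: int = 10,
                  lr: float = 1e-3,
                  device: str = "cuda") -> None:
    """Train one student block to mimic its teacher block's output features."""
    teacher_block.to(device).eval()
    student_block.to(device).train()
    optimizer = torch.optim.SGD(student_block.parameters(), lr=lr, momentum=0.9)
    criterion = nn.MSELoss()  # feature-map regression between teacher and student

    for _ in range(epochs):
        for block_input in loader:
            block_input = block_input.to(device)
            with torch.no_grad():
                target = teacher_block(block_input)   # teacher supervision signal
            loss = criterion(student_block(block_input), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Regressing intermediate feature maps with an MSE loss follows the spirit of hint-based training (FitNets); the exact block-level objective used in BNP may differ.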
2. Related Works
2.1. Network Pruning
2.2. Knowledge Distillation
3. Method
3.1. Block-Wise Network Pruning
3.2. MCMC-Based Substructure Search
Algorithm 1: MCMC scheme in BNP.
Input: A student block S, input data X from the validation dataset, the number of restarts N, and the length of each MCMC repetition M.
Output: The reduced substructure.
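Algorithm 1 can be read as the following illustrative Python sketch of a Metropolis-style search with N restarts and M steps per chain. The proposal distribution, acceptance rule, and temperature below are assumptions made for illustration, not the authors' exact scheme.

```python
# Illustrative MCMC substructure search: N random restarts, each running a
# Metropolis-style chain of length M that proposes neighbouring substructures
# (e.g., per-layer filter counts) and accepts them based on a block-level loss.
import math
import random


def mcmc_search(evaluate_loss, propose_neighbor, init_substructure,
                num_restarts: int, chain_length: int, temperature: float = 0.05):
    """Return the substructure with the lowest block-level validation loss found.

    evaluate_loss(s)    -> float, block-wise loss of substructure s on validation data
    propose_neighbor(s) -> a randomly perturbed copy of s (e.g., change one layer's filter count)
    init_substructure() -> a random initial substructure for each restart
    """
    best_s, best_loss = None, math.inf
    for _ in range(num_restarts):                  # N restarts
        current = init_substructure()
        current_loss = evaluate_loss(current)
        for _ in range(chain_length):              # M MCMC steps per restart
            candidate = propose_neighbor(current)
            candidate_loss = evaluate_loss(candidate)
            # Metropolis acceptance: always accept improvements, otherwise
            # accept with probability exp(-(loss increase) / temperature).
            accept_prob = math.exp(min(0.0, (current_loss - candidate_loss) / temperature))
            if random.random() < accept_prob:
                current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best_s, best_loss = current, current_loss
    return best_s
```

Here, evaluate_loss stands in for the block-level objective computed on the validation data X named in Algorithm 1.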
3.3. Filter Selection for Each Layer
4. Experiments
4.1. Implementation Details
4.2. Results
4.2.1. Results on CIFAR-10
4.2.2. Results on CIFAR-100
4.2.3. Results on ImageNet
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- He, Y.; Zhang, X.; Sun, J. Channel pruning for accelerating very deep neural networks. In Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1389–1397. [Google Scholar]
- He, Y.; Lin, J.; Liu, Z.; Wang, H.; Li, L.; Han, S. Amc: Automl for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 784–800. [Google Scholar]
- Gan, J.; Wang, W.; Lu, K. Compressing the CNN architecture for in-air handwritten Chinese character recognition. Pattern Recognit. Lett. 2020, 129, 190–197. [Google Scholar] [CrossRef]
- Liebenwein, L.; Baykal, C.; Lang, H.; Feldman, D.; Rus, D. Provable filter pruning for efficient neural networks. In Proceedings of the International Conference on Learning Representations (ICLR), Online, 26 April–1 May 2020. [Google Scholar]
- Chin, T.W.; Ding, R.; Zhang, C.; Marculescu, D. Towards efficient model compression via learned global ranking. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Lin, S.; Ji, R.; Yan, C.; Zhang, B.; Cao, L.; Ye, Q.; Huang, F.; Doermann, D. Towards optimal structured cnn pruning via generative adversarial learning. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2790–2799. [Google Scholar]
- You, Z.; Yan, K.; Ye, J.; Ma, M.; Wang, P. Gate decorator: Global filter pruning method for accelerating deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 2133–2144. [Google Scholar]
- Wang, H.; Zhao, H.; Li, X.; Tan, X. Progressive blockwise knowledge distillation for neural network acceleration. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; pp. 2769–2775. [Google Scholar]
- Gasparini, M. Markov chain Monte Carlo in practice. Technometrics 1999, 39, 338. [Google Scholar] [CrossRef]
- Li, C.; Peng, J.; Yuan, L.; Wang, G.; Liang, X.; Lin, L.; Chang, X. Block-wisely supervised neural architecture search with knowledge distillation. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Wang, Z.; Li, C.; Wang, X. Convolutional neural network pruning with structural redundancy reduction. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Online, 19–25 June 2021; pp. 14913–14922. [Google Scholar]
- Osaku, D.; Gomes, J.; Falcão, A. Convolutional neural network simplification with progressive retraining. Pattern Recognit. Lett. 2021, 150, 235–241. [Google Scholar]
- Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 1135–1143. [Google Scholar]
- Ashok, A.; Rhinehart, N.; Beainy, F.; Kitani, K.M. N2n learning: Network to network compression via policy gradient reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Lin, M.; Ji, R.; Wang, Y.; Zhang, Y.; Zhang, B.; Tian, Y.; Shao, L. Hrank: Filter pruning using high-rank feature map. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1526–1535. [Google Scholar]
- Ide, H.; Kobayashi, T.; Watanabe, K.; Kurita, T. Robust pruning for efficient cnns. Pattern Recognit. Lett. 2020, 135, 90–98. [Google Scholar] [CrossRef]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Zaras, A.; Passalis, N.; Tefas, A. Improving knowledge distillation using unified ensembles of specialized teachers. Pattern Recognit. Lett. 2021, 146, 215–221. [Google Scholar] [CrossRef]
- Li, M.; Lin, J.; Ding, Y.; Liu, Z.; Zhu, J.-Y.; Han, S. Gan compression: Efficient architectures for interactive conditional gans. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5284–5294. [Google Scholar]
- Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. Fitnets: Hints for thin deep nets. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning filters for efficient convnets. In Proceedings of the Advances in Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–7 December 2017. [Google Scholar]
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; Citeseer: Princeton, NJ, USA, 2009. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, Z.; Wang, N. Data-driven sparse structure selection for deep neural networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 304–320. [Google Scholar]
- He, Y.; Liu, P.; Wang, Z.; Hu, Z.; Yang, Y. Filter pruning via geometric median for deep convolutional neural networks acceleration. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 4340–4349. [Google Scholar]
- He, Y.; Ding, Y.; Liu, P.; Zhu, L.; Zhang, H.; Yang, Y. Learning filter pruning criteria for deep convolutional neural networks acceleration. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2006–2015. [Google Scholar]
- Ning, X.; Zhao, T.; Li, W.; Lei, P.; Wang, Y.; Yang, H. DSA: More efficient budgeted pruning via differentiable sparsity allocation. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 592–607. [Google Scholar]
- Ruan, X.; Liu, Y.; Li, B.; Yuan, C.; Hu, W. Dpfps: Dynamic and progressive filter pruning for compressing convolutional neural networks from scratch. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 2495–2503. [Google Scholar]
- He, Y.; Kang, G.; Dong, X.; Fu, Y.; Yang, Y. Soft filter pruning for accelerating deep convolutional neural networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
- Luo, J.-H.; Wu, J. Autopruner: An end-to-end trainable filter pruning method for efficient deep model inference. Pattern Recognit. 2020, 107, 107461. [Google Scholar] [CrossRef]
- Kang, M.; Han, B. Operation-aware soft channel pruning using differentiable masks. In Proceedings of the International Conference on Machine Learning, PMLR, Online, 12–18 July 2020; pp. 5122–5131. [Google Scholar]
- Lin, M.; Ji, R.; Zhang, Y.; Zhang, B.; Wu, Y.; Tian, Y. Channel pruning via automatic structure search. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Online, 7–15 January 2020; pp. 673–679. [Google Scholar]
Method | Params↓ % | FLOPs↓ % | Baseline Acc % | Acc↓ % |
---|---|---|---|---|
SSS [26] | 66.70 | 36.30 | 93.96 | 0.33 |
SSS [26] | 73.80 | 41.60 | 93.96 | 0.94 |
GAL [6] | 77.60 | 39.60 | 93.96 | 0.19 |
GAL [6] | 88.20 | 45.20 | 93.96 | 0.54 |
HRank [15] | 82.90 | 53.50 | 93.96 | 0.53 |
BNP () | 64.88 | 54.00 | 93.51 | 0.37 |
BNP () | 59.73 | 62.94 | 93.51 | 1.80 |
Method | Params↓ % | FLOPs↓ % | Baseline Acc % | Acc↓ % |
---|---|---|---|---|
AMC [2] | - | 50.00 | 92.80 | 0.90 |
GAL [6] | 11.80 | 37.60 | 93.26 | −0.12 |
GAL [6] | 65.90 | 60.20 | 93.26 | 1.68 |
FPGM [27] | - | 52.60 | 93.59 | 0.70 |
LFPC [28] | - | 52.90 | 93.59 | 0.35 |
HRank [15] | 16.80 | 29.30 | 93.26 | −0.26 |
HRank [15] | 42.40 | 50.00 | 93.26 | 0.09 |
DSA [29] | - | 52.20 | 93.12 | 0.22 |
DSA [29] | - | 67.40 | 93.12 | 0.92 |
DPFPS [30] | - | 52.86 | 93.81 | 0.61 |
BNP () | 40.25 | 46.97 | 93.32 | 0.11 |
BNP () | 52.92 | 62.97 | 93.32 | 0.16 |
BNP () | 60.18 | 66.29 | 93.32 | 0.66 |
Method | Params↓ % | FLOPs↓ % | Baseline Acc % | Acc↓ % |
---|---|---|---|---|
SFP [31] | - | 52.30 | 93.67 | 0.70 |
GAL [6] | 4.10 | 18.70 | 93.50 | −0.09 |
GAL [6] | 44.80 | 48.50 | 93.50 | 0.76 |
FPGM [27] | - | 52.30 | 93.68 | −0.05 |
LFPC [28] | - | 60.30 | 93.68 | −0.11 |
HRank [15] | 39.40 | 41.20 | 93.50 | −0.73 |
HRank [15] | 59.20 | 58.20 | 93.50 | 0.14 |
BNP () | 59.90 | 61.24 | 93.54 | −0.16 |
BNP () | 67.86 | 61.44 | 93.54 | −0.01 |
BNP () | 70.93 | 69.21 | 93.54 | 0.50 |
Method | Params↓ % | FLOPs↓ % | Baseline Acc % | Acc↓ % |
---|---|---|---|---|
SFP [31] | - | 52.60 | 71.40 | 2.61 |
FPGM [27] | - | 52.60 | 71.41 | 1.75 |
LFPC [28] | - | 51.60 | 71.41 | 0.58 |
BNP () | 58.08 | 53.78 | 70.53 | 0.55 |
BNP () | 51.94 | 61.34 | 70.53 | 0.97 |
BNP () | 59.73 | 62.94 | 70.53 | 1.80 |
Method | Params↓ % | FLOPs↓ % | Baseline Top-1 Acc % | Baseline Top-5 Acc % | Top-1 Acc↓ % | Top-5 Acc↓ % |
---|---|---|---|---|---|---|
CP [1] | - | 50.00 | - | 92.20 | - | 1.40 |
AP [32] | - | 51.20 | 76.15 | 92.87 | 1.39 | 0.72 |
DSA [29] | - | 50.00 | 76.02 | 92.86 | 1.33 | 0.80 |
FPGM [27] | - | 53.50 | 76.15 | 92.87 | 2.02 | 0.93 |
GAL [6] | - | 55.00 | 76.15 | 92.87 | 4.35 | 2.05 |
SCP [33] | - | 54.30 | 75.89 | 92.98 | 1.69 | 0.98 |
ABC [34] | - | 54.00 | 76.01 | 92.96 | 2.15 | 1.27 |
HRank [15] | 46.00 | 62.10 | 76.15 | 92.87 | 4.17 | 1.86 |
BNP () | 47.73 | 51.31 | 77.73 | 93.83 | 1.27 | 0.73 |
BNP () | 41.89 | 55.00 | 77.73 | 93.83 | 1.50 | 0.91 |
BNP () | 59.91 | 63.04 | 77.73 | 93.83 | 1.93 | 1.09 |
BNP () | 61.05 | 70.63 | 77.73 | 93.83 | 3.10 | 2.37 |
 | Params↓ % | FLOPs↓ % | Baseline Acc % | Acc↓ %
---|---|---|---|---|
0.01 | 30.32 | 38.74 | 92.17 | 0.84 |
0.02 | 54.83 | 51.28 | 92.17 | 1.34 |
0.03 | 58.26 | 60.89 | 92.17 | 2.16 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).