Electronics
  • Article
  • Open Access

3 April 2023

A Novel Channel Pruning Compression Algorithm Combined with an Attention Mechanism

1 School of Computer Science, Yangtze University, Jingzhou 434023, China
2 Department of Creative Technologies and Product Design, National Taipei University of Business, Taipei 100, Taiwan
* Author to whom correspondence should be addressed.
This article belongs to the Section Computer Science & Engineering

Abstract

To solve the problem of complex network models containing a large number of redundant parameters, a pruning algorithm combined with an attention mechanism is proposed. First, basic training is performed once to establish a baseline, and the network model is then re-trained with the attention mechanism. The obtained model is pruned based on channel correlation, and a simplified model is finally obtained via continuous cyclic iteration while the accuracy is kept as close as possible to that of the baseline model. The algorithm was experimentally validated on ResNet with different datasets, and the results showed that the algorithm adapts well to different datasets and different network structures. For the CIFAR-100 dataset, pruning ResNet50 reduced the number of model parameters by 80.3% and the amount of computation by 69.4% while maintaining accuracy. For the ImageNet dataset, the ResNet50 parameters were compressed by a factor of 2.49 and the computation by a factor of 3.01, while the ResNet101 parameters were reduced by 61.2% and the computation by 68.5%. Compared with traditional fixed-threshold pruning, the model achieves better results in terms of detection accuracy, compression effect, and inference speed.

1. Introduction

Deep learning has been widely used in many fields, such as image classification, object detection, and semantic segmentation. In order to achieve excellent performance, researchers have proposed VggNet [1], GoogleNet [2], ResNet [3], DenseNet [4], and other backbone network architectures. More recently, attention mechanisms have been introduced to further improve network accuracy. Jiang et al. [5] introduced a convolutional attention mechanism into a residual network to reduce the redundant mapping of remote sensing scene features at a reasonable extra time cost. Zheng et al. [6] added channel and spatial modules based on the self-attention mechanism to the backbone network and the enhanced feature extraction network of pyramid scene parsing, respectively, to extract more important feature detail information from images. However, the large computational resource requirements and high power consumption of deep learning networks limit their potential applications. In order to reduce memory consumption and speed up inference, researchers have proposed many compression strategies, which can be divided into weight quantization [7], knowledge distillation [8], low-rank decomposition [9], and network pruning [10]. Pruning, as a method for accelerating large pre-trained models, is a common way to compress networks. Wang et al. [11] studied network structure and identified the structural redundancy of a CNN, pruning the least important filters in the layers with the most structural redundancy. Shao et al. [12] proposed a novel filter pruning method that combines convolutional filters and feature map information for convolutional neural network compression, i.e., network pruning using cluster similarity and large eigenvalues. Kim [13] proposed a new technique that shrinks a previously trained style transfer network to eliminate redundancy in terms of memory consumption and computational cost.
The aforementioned methods have produced some progress in neural network model streamlining and related applications, but the degree of model compression and computational acceleration is not sufficient, and the resulting models are not necessarily suitable for deployment on mobile terminal devices. For this reason, a cyclic pruning compression algorithm combined with an attention mechanism is proposed.
The remainder of this paper is organized as follows. A basic introduction to the attention mechanism and pruning algorithms is presented in Section 2. A detailed description of the proposed algorithm is then presented in Section 3. Section 4 gives the experimental results and analyzes them. Lastly, the paper is summarized in Section 5.

3. Algorithm

Within this section, we first provide an overview of channel correlation pruning based on the channel–spatial attention mechanism (CCPCSA). We then present the structural components of the BAM attention module. Lastly, we describe the CCPCSA algorithm, which performs network model pruning guided by BAM.
An overview of our CCPCSA approach is illustrated in Figure 2. The BAM module, which captures channel importance, is integrated into the original network, and the resultant network is then trained. Subsequently, based on the relevance of the channels, the final network model is obtained by performing channel pruning on top of this trained network.
Figure 2. Overview of CCPCSA method.

3.1. Attention Mechanism

In accordance with reference [24], Figure 3 depicts the overall structure of our BAM (spatial and channel attention) module. If we solely employ spatial attention, the channel dimension information is disregarded because features in distinct channels are treated equally. Similarly, if we solely utilize channel attention, the spatial information within each channel is overlooked. Hence, we believe that better performance can be achieved by integrating the spatial and channel attention modules into a unified module.
Figure 3. Schematic diagram of clustering.
For a given input feature map $F \in \mathbb{R}^{C \times H \times W}$, BAM infers a three-dimensional attention map $M(F) \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels of the feature map, $H$ is its height, and $W$ is its width. The 3D attention map $M(F)$ is multiplied element by element with the input feature map $F$ and the result is then added to the original input feature map to obtain the refined feature map $F'$. The refined feature map $F'$ is calculated using Equation (1):

$F' = F + F \otimes M(F) \quad (1)$

where $\otimes$ denotes element-by-element multiplication. The channel attention $M_C(F) \in \mathbb{R}^{C}$ and the spatial attention $M_S(F) \in \mathbb{R}^{H \times W}$ are computed in two separate branches, and the attention map $M(F)$ is then calculated using Equation (2):

$M(F) = \sigma\bigl(M_C(F) + M_S(F)\bigr) \quad (2)$
where $\sigma$ is the sigmoid activation function, and both attention maps are expanded to $\mathbb{R}^{C \times H \times W}$ before the two branch outputs are combined.
In the channel branch, inter-channel relationships are used to aggregate the feature map over each channel: global average pooling is performed on the feature map $F$, producing a channel vector $F_C \in \mathbb{R}^{C \times 1 \times 1}$. This vector encodes global information in each channel. To estimate the cross-channel attention from the channel vector, a multilayer perceptron (MLP) is used. To save parameter overhead, the hidden activation size is set to $\mathbb{R}^{C/r \times 1 \times 1}$, where $r$ is the reduction ratio. For ease of description, we specify the reduction ratio rather than the compressed size directly. After the MLP, a batch normalization (BN) layer is added to adjust the scale to that of the spatial branch output. The channel attention is calculated using Equation (3):
$M_C(F) = \Phi_{BN}\bigl(\mathrm{MLP}(\mathrm{AvgPool}(F))\bigr) = \Phi_{BN}\bigl(W_1(W_0\,\mathrm{AvgPool}(F) + b_0) + b_1\bigr) \quad (3)$
where $\Phi_{BN}$ denotes batch normalization, $W_0 \in \mathbb{R}^{C/r \times C}$, $b_0 \in \mathbb{R}^{C/r}$, $W_1 \in \mathbb{R}^{C \times C/r}$, $b_1 \in \mathbb{R}^{C}$, and $F$ denotes the input features.
The spatial branch produces a spatial attention map $M_S(F) \in \mathbb{R}^{H \times W}$ to emphasize or suppress features at different spatial locations. We use dilated convolution to enlarge the receptive field and make effective use of global information; dilated convolution helps to construct a more effective spatial feature map than standard convolution. The features $F \in \mathbb{R}^{C \times H \times W}$ are projected into the reduced dimension $\mathbb{R}^{C/r \times H \times W}$ using a $1 \times 1$ convolution, which integrates and compresses the feature map across the channel dimension; the same reduction ratio $r$ as in the channel branch is used. After this reduction, two $3 \times 3$ dilated convolutions are applied to utilize contextual information efficiently. Finally, the feature map is reduced again to a spatial attention map in $\mathbb{R}^{1 \times H \times W}$ using a $1 \times 1$ convolution, and batch normalization (BN) is performed at the end of the spatial branch. The spatial attention is calculated using Equation (4):
$M_S(F) = \mathrm{BN}\bigl(f_3^{1 \times 1}\bigl(f_2^{3 \times 3}\bigl(f_1^{3 \times 3}\bigl(f_0^{1 \times 1}(F)\bigr)\bigr)\bigr)\bigr) \quad (4)$
where $f$ denotes a convolution, BN denotes batch normalization, and the superscript denotes the size of the convolution kernel. Two $1 \times 1$ convolutions are used to narrow the channels, and two $3 \times 3$ dilated convolutions are used to aggregate contextual information with a larger receptive field. The main purpose of introducing this attention mechanism into the network is to reweight the different features so as to eliminate the influence of background and other factors as much as possible, and to focus more on the extraction of effective features.
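To make the two branches concrete, the following is a minimal PyTorch sketch of a BAM-style module implementing Equations (1)–(4). The class name, the default reduction ratio r = 16, the dilation value of 4, and the placement of the ReLU activations are illustrative assumptions rather than the exact configuration of [24] or of our experiments.

```python
import torch
import torch.nn as nn

class BAM(nn.Module):
    """Minimal BAM-style block: channel branch (Eq. 3), spatial branch (Eq. 4),
    combined as in Eq. (2) and applied to the input as in Eq. (1)."""
    def __init__(self, channels: int, r: int = 16, dilation: int = 4):
        super().__init__()
        # Channel branch: global average pooling -> MLP (hidden size C/r) -> BN
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
        )
        self.channel_bn = nn.BatchNorm1d(channels)
        # Spatial branch: 1x1 reduce -> two 3x3 dilated convs -> 1x1 to one map -> BN
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // r, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels // r, kernel_size=3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels // r, kernel_size=3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, 1, kernel_size=1),
            nn.BatchNorm2d(1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
        b, c, _, _ = x.size()
        # M_C(F): (B, C), broadcast to (B, C, H, W)
        mc = self.channel_bn(self.channel_mlp(x.mean(dim=(2, 3))))
        mc = mc.view(b, c, 1, 1).expand_as(x)
        # M_S(F): (B, 1, H, W), broadcast to (B, C, H, W)
        ms = self.spatial(x).expand_as(x)
        # Eq. (2): M(F) = sigma(M_C(F) + M_S(F)); Eq. (1): F' = F + F ⊗ M(F)
        m = torch.sigmoid(mc + ms)
        return x + x * m
```

In our experiments a module of this kind is attached to the convolutional output of each ResBlock (see Section 4), e.g. `out = bam(out)` before the block output is passed on.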

3.2. Pruning Algorithm Based on Channel Correlation

Filter pruning is primarily a data-driven technique. When the importance of a filter is determined solely by the feature map it produces, the importance scale can become unstable and vulnerable to minor perturbations in the input data. Conversely, when importance is evaluated based on the information present in multiple feature maps, interference from input data variations can be minimized, leading to more dependable and robust importance rankings, provided the evaluation is implemented accurately. Cross-channel strategies have an inherent advantage here: they enable more precise modeling of the relationships between different channels and thereby allow inter-channel correlations to be identified more accurately. In the context of model compression, correlations identified through cross-channel strategies are viewed as redundancies at the architectural level and serve as the target of filter pruning, the aim being to eliminate unnecessary filters while keeping the model effective and accurate. By adopting a channel-to-channel strategy, it therefore becomes possible to apply more aggressive pruning while still achieving high accuracy. Building on this insight, we investigate the significance of filters from an inter-channel perspective [41]. Our primary concept is to leverage channel correlations to determine the importance of each feature map (and its corresponding filter). If a particular feature map exhibits substantial linear interdependence with feature maps from other channels, the content it carries is already extensively encoded in those other feature maps. Thus, even if the corresponding filter is removed, the information carried by the highly correlated feature map can still be preserved and effectively reconstructed by the remaining filters through approximation during fine-tuning. This suggests that filters producing highly correlated feature maps have greater “interchangeability” and lower importance. Consequently, it is reasonable to remove filters associated with highly correlated feature map channels while maintaining model capacity.
To extract the linear dependence information of each feature map within the 3D feature tensor produced by a single layer, a methodology based on linear algebra is proposed. This approach addresses the challenge of analyzing the large and complex feature sets often encountered in deep learning applications. Specifically, assuming that the output feature maps of the $l$-th layer are $\mathcal{F}^l$, we first transform $\mathcal{F}^l$ into a matrix $H^l \triangleq [\alpha_1^{l\,T}, \alpha_2^{l\,T}, \ldots, \alpha_{c_l}^{l\,T}]^T$, where the row vector $\alpha_i^l \in \mathbb{R}^{hw}$ is the vectorized feature map $H_i^l$, and $h$ and $w$ are the height and width of the feature map. In this form, the linear correlation of each vectorized feature map $\alpha_i^l$, viewed as a row of the matrixed set of feature maps $H^l$, can be measured with standard matrix analysis tools. The most straightforward solution is to use the rank to determine the correlation of $\alpha_i^l$, since the rank mathematically represents the maximum number of linearly independent rows/columns of a matrix. For example, one can remove a row from the matrix, calculate the resulting rank change, and use it to judge the impact and importance of the removed row: the smaller the rank change, the higher the correlation of the removed row with the rest of the matrix.
However, in the context of filter pruning, we believe that the change in the nuclear norm of the whole set of feature maps is a better indicator for quantifying the correlation of each feature map. This is because the nuclear norm, i.e., the $\ell_1$-norm of the matrix singular values, reveals more about the effect of deleting a row on the matrix, whereas the rank, i.e., the $\ell_0$-norm of the singular values, cannot reflect such graded changes.
For the output feature maps of the $l$-th layer, $\mathcal{F}^l = \{H_1^l, H_2^l, \ldots, H_{c_l}^l\} \in \mathbb{R}^{c_l \times h \times w}$, the channel correlation $C$ of the $i$-th channel feature map $H_i^l \in \mathbb{R}^{h \times w}$ is defined and calculated using Equation (5):

$C(H_i^l) \triangleq \|H^l\|_* - \|M_i^l \odot H^l\|_* \quad (5)$

where $H^l \in \mathbb{R}^{c_l \times hw}$ is the matrixed $\mathcal{F}^l$, $\|\cdot\|_*$ is the nuclear norm, $\odot$ is the Hadamard product, and $M_i^l \in \mathbb{R}^{c_l \times hw}$ is a row mask matrix whose entries in the $i$-th row are zero and whose other entries are one.
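As a concrete illustration, the sketch below evaluates Equation (5) for every channel of one layer's output for a single sample. The function name and the use of torch.linalg.svdvals (the nuclear norm computed as the sum of singular values) are illustrative assumptions, not the implementation used in our experiments.

```python
import torch

def channel_correlations(feature_maps: torch.Tensor) -> torch.Tensor:
    """Eq. (5): C(H_i^l) = ||H^l||_* - ||M_i^l ∘ H^l||_* for every channel i.
    feature_maps: output of one layer for a single sample, shape (c_l, h, w)."""
    c_l = feature_maps.size(0)
    H = feature_maps.reshape(c_l, -1)              # matrixed feature maps: one row per channel
    nuc = torch.linalg.svdvals(H).sum()            # nuclear norm = l1-norm of the singular values
    scores = torch.empty(c_l)
    for i in range(c_l):
        H_masked = H.clone()
        H_masked[i] = 0.0                          # M_i^l ∘ H^l: zero out the i-th row
        scores[i] = nuc - torch.linalg.svdvals(H_masked).sum()
    return scores                                  # a small score means channel i is highly correlated
```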
Equation (5) defines the channel correlation measurement for a single feature map. However, practical filter pruning removes multiple filters, which requires computing correlations for combinations of feature maps. For filter pruning, this entails evaluating the change in the nuclear norm of $H^l$ when $m$ of its original $c_l$ rows ($\alpha_i^l$) are removed. One solution is to calculate this change for all $C_{c_l}^m$ possible $m$-row removal options and select the option with the smallest change. However, this approach is computationally expensive and quickly becomes intractable for large $c_l$.
To address this computational challenge, we propose to use the individual feature map correlations as an approximation of the correlation of their combination, which mitigates the computational burden of analyzing combined sets of feature maps. To identify the $m$ rows of $H^l$ that are most linearly dependent on the others, we iteratively remove a single row $\alpha_i^l$ from $H^l$ and calculate the change in the nuclear norm between the reduced $(c_l - 1)$-row matrix and the original $c_l$-row matrix $H^l$. Among the $c_l$ computed changes, we select the $m$ smallest and remove the corresponding $m$ rows $\alpha_i^l$ from $H^l$. The chosen $m$ vectorized feature maps $\alpha_i^l$ are deemed to have higher correlations with the other feature maps; consequently, the corresponding filters are identified as less important and are pruned. In practice, such individual correlation-based measures yield a highly accurate approximation of the combined correlation of multiple feature maps, and this approximation requires far less computation while still achieving excellent filter pruning performance.
For the output feature maps of the $l$-th layer, $\mathcal{F}^l \in \mathbb{R}^{c_l \times h \times w}$, the channel correlation of a combination of $m$ feature maps $\{H_{b_i}^l\}_{i=1}^m$, where $H_{b_i}^l \in \mathbb{R}^{h \times w}$ is the feature map of the $b_i$-th channel, is defined and approximated using Equation (6):

$C\bigl(\{H_{b_i}^l\}_{i=1}^m\bigr) \triangleq \|H^l\|_* - \|M_{b_1,\ldots,b_m}^l \odot H^l\|_* \approx \sum_{i=1}^m C(H_{b_i}^l) \quad (6)$

where $M_{b_1,\ldots,b_m}^l$ is a multi-row mask matrix in which the rows $b_1, \ldots, b_m$ are zero and all the other rows are one.
Given that our proposed filter pruning method based on the channel correlation property is data-driven, its reliability under different input data distributions should be carefully checked. To accomplish this, we empirically assessed the channel correlation across multiple input images and consistently observed a high degree of stability in the average channel correlation of each feature map within each batch of samples. Consequently, we compute the average channel correlation over small batches of image samples and use this value to estimate the channel correlation for all input data.
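A minimal sketch of this selection step is given below: the per-channel scores of Equation (5) are averaged over a small batch, and, following the additive approximation of Equation (6), the $m$ channels with the smallest combined score are returned for pruning. It reuses the illustrative channel_correlations helper sketched above and is an assumption-laden sketch, not the experimental code.

```python
import torch

def select_channels_to_prune(feature_batch: torch.Tensor, m: int):
    """feature_batch: (batch, c_l, h, w) feature maps of one layer for a small batch.
    Averages the per-channel scores of Eq. (5) over the batch and, per Eq. (6),
    returns the indices of the m channels whose removal changes the nuclear norm
    the least (i.e., the most correlated, least important channels)."""
    scores = torch.stack([channel_correlations(sample) for sample in feature_batch])
    avg_scores = scores.mean(dim=0)                # batch-averaged channel correlation
    return torch.argsort(avg_scores)[:m].tolist()  # smallest scores => prune these channels
```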

3.3. Overall Flow of the Algorithm

The attention mechanism can explicitly describe the importance of relationships between channels in the same layer and continuously adjusts the parameters of the fully connected layer during back-propagation [23,42]. By inserting the attention module, the network gradually learns to enhance or suppress certain channels. Then, based on channel similarity, channels with high similarity are continuously filtered out while maintaining high accuracy, and the network is fine-tuned after each removal as restorative training. Figure 4 shows the pruning flow chart.
Figure 4. Flow chart of the pruning process guided by the hybrid attention mechanism.
The flow of Algorithm 1 is shown below.
Algorithm 1. CCPCSA Algorithm
Input:
M: the initial network model
D: the training dataset
L: the number of layers in M
m: the number of channel groups removed per epoch
Output:
M′: the pruned network model
Steps:
1: Obtain Acc by training the model M
2: Obtain a model M′ by training M with the BAM modules inserted
3: For each batch $d \in D$
4:   Train M′ with d
5:   For each layer $l \in \{1, \ldots, L\}$
6:     For each channel $i \in \{1, \ldots, C_l\}$
7:       Calculate $C(H_i^l)$
8: Sort all the $C(H_i^l)$ and prune the m channels with the smallest values
9: Fine-tune M′ and obtain Acc′
10: If Acc − Acc′ < 0.5%, go back to Step 3
11: Return M′
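The control flow of Algorithm 1 can be summarized by the following hedged Python sketch. The callables train_fn, eval_fn, score_fn, and prune_fn are placeholders for the corresponding steps (restorative fine-tuning, accuracy evaluation, batch-averaged Equation (5) scoring, and physical channel removal); they are assumptions of this sketch, not components provided with the paper.

```python
import torch

def ccpcsa_prune(model, base_acc, train_fn, eval_fn, score_fn, prune_fn, m, tol=0.005):
    """Control-flow sketch of Algorithm 1. `model` is assumed to already contain BAM
    modules and to have been trained (Steps 1-2), with `base_acc` the baseline accuracy.
    train_fn fine-tunes the model, eval_fn returns its accuracy, score_fn returns the
    batch-averaged C(H_i^l) scores of a layer, and prune_fn removes the given channels."""
    while True:
        for layer in model.modules():                      # Steps 3-7: visit prunable layers
            if not isinstance(layer, torch.nn.Conv2d):
                continue
            scores = score_fn(layer)                       # per-channel correlations, Eq. (5)
            weakest = torch.argsort(scores)[:m].tolist()   # Step 8: m most correlated channels
            prune_fn(layer, weakest)
        acc = eval_fn(train_fn(model))                     # Step 9: restorative fine-tuning
        if base_acc - acc >= tol:                          # Step 10: repeat while the drop < 0.5%
            return model
```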

4. Experiment

To validate the efficacy of our model pruning algorithm, we conducted experiments on the ResNet model using the PyTorch framework. The experimental setup was a Windows 10 system equipped with an Intel(R) Core(TM) i5-10300H CPU @ 2.50 GHz and an NVIDIA GeForce GTX 1650 graphics card, run in a virtual environment using PyCharm 2020.1.2 (Professional Edition) with PyTorch 1.8.0, TorchVision 0.9.0, and CUDA 10.2.89.
We conducted image classification experiments to evaluate the feasibility of the method. In the ResNet network, the attention module was placed on the convolutional output of the ResBlock. Experiments were performed on CIFAR-100 and ImageNet datasets and compared with other representative pruning schemes.
FLOPs stands for floating-point operations; FLOPs(M) is measured in millions and GFLOPs in billions. The symbol “↑” denotes an increase relative to the original model, and “↓” denotes a decrease. OTO* indicates the best-performing result reported in the cited paper, marked with *. The results obtained by compressing the model as much as possible are shown in Table 1 and Table 2. Based on these results, the CCPCSA method comprehensively outperformed the other advanced pruning methods on both datasets, indicating its effectiveness, with very little loss of inference accuracy. CCPCSA achieves high accuracy particularly when the pruning ratio is low, which demonstrates the effectiveness of the proposed channel pruning approach in maintaining model accuracy while reducing model complexity; a larger pruning ratio leads to a partial loss of accuracy. On both the CIFAR-100 and ImageNet datasets, the accuracy of the CCPCSA algorithm was almost the same as the original accuracy, and its compression results were much better than those of the other methods. In particular, the proposed CCPCSA pruning algorithm significantly speeds up inference because of the reduced parameter count and computational effort. All these observations clearly show that pruning channels under the guidance of the attention module is beneficial.
Table 1. Comparison of different pruning algorithms for ResNet50 on CIFAR-100.
Table 2. Comparison of different pruning algorithms for different ResNet backbones on ImageNet.
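For reference, the parameter-reduction figures of the kind reported in Tables 1 and 2 can be reproduced with a simple count of model parameters before and after pruning, as in the hedged snippet below; resnet50 from torchvision is used only as an illustrative baseline, and FLOP/GFLOP counts would additionally require an external profiler.

```python
import torch
from torchvision.models import resnet50  # illustrative baseline for the usage comment below

def param_reduction(original: torch.nn.Module, pruned: torch.nn.Module) -> float:
    """Percentage of parameters removed by pruning (compare with the '↓' entries)."""
    def count(m: torch.nn.Module) -> int:
        return sum(p.numel() for p in m.parameters())
    return 100.0 * (1.0 - count(pruned) / count(original))

# Illustrative usage (the pruned model would come from the CCPCSA procedure):
# original = resnet50()
# print(f"Parameters removed: {param_reduction(original, pruned_model):.1f}%")
# FLOPs/GFLOPs are usually measured with a profiler such as fvcore or ptflops.
```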

5. Conclusions

The proposed method, called CCPCSA, is a new channel pruning approach that utilizes a channel–spatial attention mechanism to compress network models for use in edge computing. The core idea of CCPCSA is to use the attention statistics provided by a new attention module called BAM and prune the network based on channel correlation information to achieve model efficiency and compression. The BAM module combines spatial attention and channel attention as a whole, which not only enhances the representation capability of the network model, but also reveals the impact of the presence of channels on the inference performance. The pruning operation is then completed by removing channels with high similarity based on inter-channel correlation. The comprehensive experiments we conducted on two benchmark datasets validated the superior effectiveness of the CCPCSA approach compared with other state-of-the-art solutions. In future work, we plan to combine this approach with other model compression strategies (e.g., quantization) and other edge intelligence techniques (e.g., edge cloud collaboration) to further reduce model size and inference costs.

Author Contributions

Methodology, M.Z. and J.T.; funding acquisition, M.Z. and J.T.; supervision, M.Z. and J.T.; project administration, M.Z. and J.T.; writing—original draft preparation, T.L.; writing—review and editing, M.Z. and J.T.; validation, T.L.; formal analysis, S.-L.P.; investigation, S.-L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the New Generation Information Technology Innovation Project 2021, “Intelligent loading system based on artificial intelligence”, grant number 2021ITA05050.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data in this study are publicly available. The data were obtained from the Internet, but access is required due to privacy or ethical concerns. Access can be obtained by contacting the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. Comput. Sci. 2014. [Google Scholar] [CrossRef]
  2. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. IEEE Comput. Soc. 2014. [Google Scholar] [CrossRef]
  3. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  4. Huang, G.; Liu, Z.; Laurens, V.; Weinberger, K. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  5. Jiang, Z.F.; He, T.; Shi, Y.L.; Long, X.; Yang, S. Remote sensing image classification based on convolutional block attention module and deep residual network. Laser J. 2022, 43, 76–81. [Google Scholar] [CrossRef]
  6. Zheng, Q.M.; Xu, L.K.; Wang, F.H.; Lin, C. Pyramid scene parsing network based on improved self-attention mechanism. Comput. Eng. 2022, 1–9. [Google Scholar] [CrossRef]
  7. Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. In Proceedings of the ICLR, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
  8. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. Comput. Sci. 2015, 14, 38–39. [Google Scholar]
  9. Jaderberg, M.; Vedaldi, A.; Zisserman, A. Speeding up Convolutional Neural Networks with Low Rank Expansions. arXiv 2014, arXiv:1405.3866. [Google Scholar]
  10. Setiono, R.; Liu, H. Neural-network feature selector. IEEE Trans. Neural Netw. 1997, 8, 654–662. [Google Scholar] [CrossRef]
  11. Wang, Z.; Li, C.; Wang, X. Convolutional neural network pruning with structural redundancy reduction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14913–14922. [Google Scholar]
  12. Shao, M.; Dai, J.; Wang, R.; Kuang, J.; Zuo, W. CSHE: Network pruning by using cluster similarity and matrix eigenvalues. Int. J. Mach. Learn. Cybern. 2022, 13, 371–382. [Google Scholar] [CrossRef]
  13. Kim, M.; Choi, H.-C. Compact Image-Style Transfer: Channel Pruning on the Single Training of a Network. Sensors 2022, 22, 8427. [Google Scholar] [CrossRef]
  14. Xue, Z.; Yu, X.; Liu, B.; Tan, X.; Wei, X. HResNetAM: Hierarchical Residual Network with Attention Mechanism for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3566–3580. [Google Scholar] [CrossRef]
  15. Chen, Y.; Liu, L.; Phonevilay, V.; Gu, K.; Xia, R.; Xie, J.; Zhang, Q.; Yang, K. Image super-resolution reconstruction based on feature map attention mechanism. Appl. Intell. 2021, 51, 4367–4380. [Google Scholar] [CrossRef]
  16. Cai, W.; Zhai, B.; Liu, Y.; Liu, R.; Ning, X. Quadratic Polynomial Guided Fuzzy C-means and Dual Attention Mechanism for Medical Image Segmentation. Displays 2021, 70, 102106. [Google Scholar] [CrossRef]
  17. Luong, M.T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-based Neural Machine Translation. arXiv 2015, arXiv:1508.04025. [Google Scholar]
  18. Liu, M.; Li, L.; Hu, H.; Guan, W.; Tian, J. Image caption generation with dual attention mechanism. Inf. Process. Manag. 2020, 57, 102178. [Google Scholar] [CrossRef]
  19. Li, W.; Liu, K.; Zhang, L.; Cheng, F. Object detection based on an adaptive attention mechanism. Sci. Rep. 2020, 10, 11307. [Google Scholar] [CrossRef]
  20. Dollar, O.; Joshi, N.; Beck DA, C.; Pfaendtner, J. Attention-based generative models for de novo molecular design. Chem. Sci. 2021, 12, 8362–8372. [Google Scholar] [CrossRef] [PubMed]
  21. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  22. Li, X.; Hu, X.; Yang, J. Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks. arXiv 2019, arXiv:1905.09646. [Google Scholar]
  23. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  24. Park, J.; Woo, S.; Lee, J.-Y.; Kweon, I. BAM: Bottleneck attention module. arXiv 2018, arXiv:1807.06514. [Google Scholar]
  25. Zhang, X.; Colbert, I.; Das, S. Learning Low-Precision Structured Subnetworks Using Joint Layerwise Channel Pruning and Uniform Quantization. Appl. Sci. 2022, 12, 7829. [Google Scholar] [CrossRef]
  26. Zhang, T.; Ye, S.; Zhang, K.; Tang, J.; Wen, W.; Fardad, M.; Wang, Y. A systematic dnn weight pruning framework using alternating direction method of multipliers. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 184–199. [Google Scholar]
  27. Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both Weights and Connections for Efficient Neural Networks. In Proceedings of the NIPS 2015, Montreal, QC, Canada, 7–10 December 2015. [Google Scholar]
  28. Luo, J.H.; Wu, J. An Entropy-based Pruning Method for CNN Compression. arXiv 2017, arXiv:1706.05791. [Google Scholar]
  29. Xiang, K.; Peng, L.; Yang, H.; Li, M.; Cao, Z.; Jiang, S.; Qu, G. A novel weight pruning strategy for light weight neural net-works with application to the diagnosis of skin disease. Appl. Soft Comput. 2021, 111, 107707. [Google Scholar] [CrossRef]
  30. Wen, W.; Wu, C.; Wang, Y.; Chen, Y.; Li, H. Learning Structured Sparsity in Deep Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
  31. He, Y.; Zhang, X.; Sun, J. Channel Pruning for Accelerating Very Deep Neural Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
  32. Min, C.; Wang, A.; Chen, Y.; Xu, W.; Chen, X. 2PFPCE: Two-Phase Filter Pruning Based on Conditional Entropy. arXiv 2018, arXiv:1809.02220. [Google Scholar]
  33. Yang, C.; Yang, Z.; Khattak, A.M.; Yang, L.; Zhang, W.; Gao, W.; Wang, M. Structured pruning of convolutional neural networks via l1 regularization. IEEE Access 2019, 7, 106385–106394. [Google Scholar] [CrossRef]
  34. Zhuang, L.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, G. Learning Efficient Convolutional Networks through Network Slimming. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  35. Luo, J.H.; Wu, J.; Lin, W. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  36. Yu, R.; Li, A.; Chen, C.-F.; Lai, J.-H.; Morariu, V.; Han, X.; Gao, M.; Lin, Y.; Davis, L. Nisp: Pruning networks using neuron importance score propagation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  37. Song, F.; Wang, Y.; Guo, Y.; Zhu, C. A channel-level pruning strategy for convolutional layers in cnns. In Proceedings of the 2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC), Guiyang, China, 22–24 August 2018. [Google Scholar]
  38. Yamamoto, K.; Maeno, K. PCAS: Pruning Channels with Attention Statistics for Deep Network Compression. arXiv 2018, arXiv:1806.05382. [Google Scholar]
  39. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  40. Gao, Z.; Xie, J.; Wang, Q.; Li, P. Global second-order pooling convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  41. Sui, Y.; Yin, M.; Xie, Y.; Phan, H.; Aliari Zonouz, S.; Yuan, B. Chip: Channel independence-based pruning for compact neural networks. Adv. Neural Inf. Process. Syst. 2021, 34, 24604–24616. [Google Scholar]
  42. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; Bengio, Y. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Comput. Sci. 2015, 2048–2057. [Google Scholar] [CrossRef]
  43. Wang, J.; Jiang, T.; Cui, Z.; Cao, Z. Filter pruning with a feature map entropy importance criterion for convolution neural networks compressing. Neurocomputing 2021, 461, 41–54. [Google Scholar] [CrossRef]
  44. Hu, Y.; Sun, S.; Li, J.; Wang, X.; Gu, Q. A novel channel pruning method for deep neural network compression. arXiv 2018, arXiv:1805.11394. [Google Scholar]
  45. Shao, W.; Yu, H.; Zhang, Z.; Xu, H.; Li, Z.; Luo, P. BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening. arXiv 2021, arXiv:2105.06423. [Google Scholar]
  46. Wang, Z.; Li, F.; Shi, G.; Xie, X.; Wang, F. Network pruning using sparse learning and genetic algorithm. Neurocomputing 2020, 404, 247–256. [Google Scholar] [CrossRef]
  47. Aflalo, Y.; Noy, A.; Lin, M.; Friedman, I.; Zelnik, L. Knapsack Pruning with Inner Distillation. arXiv 2020, arXiv:2002.08258. [Google Scholar]
  48. Chen, T.; Ji, B.; Ding, T.; Fang, B.; Wang, G.; Zhu, Z.; Liang, L.; Shi, Y.; Yi, S.; Tu, X. Only Train Once: A One-Shot Neural Network Training and Pruning Framework. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021. [Google Scholar]
  49. Molchanov, P.; Mallya, A.; Tyree, S.; Frosio, I.; Kautz, J. Importance Estimation for Neural Network Pruning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
