Sensors
  • Article
  • Open Access

23 October 2020

Compressing Deep Networks by Neuron Agglomerative Clustering

1 Department of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
2 Innovation Center, Ocean University of China, Qingdao 266100, China
3 Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
4 Department of Electrical and Electronical Engineering, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China

Abstract

In recent years, deep learning models have achieved remarkable successes in various applications, such as pattern recognition, computer vision, and signal processing. However, high-performance deep architectures often require large storage space and long computational time, which makes it difficult to fully exploit many deep neural networks (DNNs), especially in scenarios in which computing resources are limited. In this paper, to tackle this problem, we introduce a method for compressing the structure and parameters of DNNs based on neuron agglomerative clustering (NAC). Specifically, we utilize the agglomerative clustering algorithm to find similar neurons, and these similar neurons and the connections linked to them are then agglomerated together. Using NAC, the number of parameters and the storage space of DNNs are greatly reduced, without the support of an extra library or hardware. Extensive experiments demonstrate that NAC is very effective for the neuron agglomeration of both the fully connected and convolutional layers, which are common building blocks of DNNs, delivering similar or even higher network accuracy. Specifically, on the benchmark CIFAR-10 and CIFAR-100 datasets, when NAC is used to compress the parameters of the original VGGNet by 92.96% and 81.10%, respectively, the compact networks obtained still outperform the original networks.

1. Introduction

In order to solve challenging deep learning problems, such as pattern recognition and computer vision [1,2,3], researchers tend to design deep neural networks (DNNs) with complex structures and many neurons. However, as DNNs go deeper and deeper, the number of parameters increases dramatically. Therefore, huge storage space and long inference time are usually required by DNNs, which leads them to be only deployed on computational servers with graphics processing units (GPUs) [4]. Nevertheless, for more common mobile devices with limited storage and computing resources, network storage and running efficiency are very important factors. Therefore, it is almost impossible to directly apply large-scale DNNs on them. In contrast, shallow neural networks are much easier to store and more computationally efficient. However, shallow neural networks cannot generally match the performance of DNNs. To that end, it is necessary to compress DNNs and deploy them on devices with limited storage and computing resources.
In most DNNs, fully connected layers and convolutional layers are two widely used building blocks. In particular, the fully connected layers have dense connections and correspondingly a large number of parameters. In contrast, the convolutional layers are important for learning layer-wise representations of the input data and are computationally intensive. For example, in VGGNet-16, the parameters of the fully connected layers account for more than 90% of the total, whilst their floating-point operations (FLOPs) account for less than 1% of the total FLOPs [5]. However, some existing network compression methods can only work on fully connected layers [6], while some approaches to compress convolutional layers require an additional sparse BLAS library or special hardware support [7,8,9].
In this paper, we introduce a systematic DNN compression method built on neuron agglomerative clustering (NAC), which is mainly applied to the neurons/feature maps of fully connected layers and convolutional layers. In particular, NAC agglomerates neurons/feature maps in the network instead of pruning individual weights one at a time. Concretely, we identify similar neurons/feature maps in the neural network through agglomerative clustering and then agglomerate them and their related connections together. As NAC does not cause sparse connections, it does not require an additional library or hardware support. Last but not least, during the process of agglomerative clustering, NAC does not need the original training data, but only the weights and biases connected to the neurons/feature maps.
The rest of this paper is organized as follows: In Section 2, we briefly introduce some related work, including some neural network compression approaches and the agglomerative clustering algorithm used in the proposed network compression method. In Section 3, we describe the proposed method, NAC, in detail. The experimental results are reported in Section 4, while Section 5 concludes this paper with remarks.

3. The Proposed Approach

In this section, we first introduce the proposed network compression algorithm based on neuron agglomerative clustering (NAC) in detail. Then, we specify the application of NAC to the fully connected and convolutional layers.

3.1. Network Compression Based on Neuron Agglomerative Clustering

In general, neurons in the same layer of DNNs are often redundant, and this redundancy costs extra storage space and running time. Here, we present a network compression algorithm based on neuron agglomerative clustering (NAC), which can be applied to both fully connected and convolutional layers and greatly reduces the redundancy of the neurons/feature maps.
In order to facilitate the following analysis, we first specify some notations. In a neural network, the activation output of a neuron is denoted as:
$$a_i^l = f\left(\sum_{j=1}^{n_{l-1}} W_{i,j}^l \, a_j^{l-1} + b_i^l\right), \tag{1}$$
where $f$ is the activation function, $n_{l-1}$ represents the number of neurons in the $(l-1)$th layer, $W_{i,j}^l$ represents the weight between the $i$th neuron in the $l$th layer and the $j$th neuron in the $(l-1)$th layer, $a_j^{l-1}$ is the activation output of the $j$th neuron in the $(l-1)$th layer, and $b_i^l$ is the bias of the $i$th neuron in the $l$th layer. According to Equation (1), the activation output of the $i$th neuron in the $l$th layer is uniquely determined by the weights $[W_{i,1}^l, W_{i,2}^l, W_{i,3}^l, \ldots, W_{i,n_{l-1}}^l]$, the bias $b_i^l$, and the outputs of the previous layer. Therefore, each neuron can be expressed as:
$$neu_i^l = [W_{i,1}^l, W_{i,2}^l, W_{i,3}^l, \ldots, W_{i,n_{l-1}}^l, b_i^l],$$
where $neu_i^l$ represents the $i$th neuron in the $l$th layer.
For all neurons of the lth layer in the network, we write them together as a set:
$$S^l = \{neu_1^l, neu_2^l, neu_3^l, \ldots, neu_{n_l}^l\}.$$
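The neuron representation above can be illustrated with a small NumPy sketch. The function name `neuron_set` and the toy values are hypothetical and chosen only for illustration; the sketch assumes the weights of layer $l$ are stored as a matrix with one row per neuron.

```python
import numpy as np

def neuron_set(W, b):
    """Stack each neuron's incoming weights and its bias into one row vector.

    W has shape (n_l, n_{l-1}) and b has shape (n_l,).
    Row i of the result is [W[i, 0], ..., W[i, n_{l-1} - 1], b[i]].
    """
    return np.hstack([W, b[:, None]])

# Toy layer with 2 neurons and 3 inputs (illustrative values only).
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])
b = np.array([1.0, 2.0])

S = neuron_set(W, b)
print(S.shape)  # (2, 4)
```

Each row of `S` plays the role of one $neu_i^l$, so clustering the rows of `S` clusters the neurons of the layer.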
For each hidden layer $l$ in DNNs, to find similar neurons, we perform the agglomerative clustering algorithm on the neuron set $S^l$, which includes all the neurons in that layer. Concretely, at the beginning of the agglomerative clustering algorithm, we consider each neuron as its own cluster and measure similarity using the Euclidean distance. Then, the most similar pair of neurons is merged, and their mean is taken as the new cluster center. Following that, we repeatedly agglomerate the clusters so as to minimize their variances and update the cluster means, until the given compression ratio is reached [62]. In the end, the neurons in the same cluster are regarded as similar to each other. Furthermore, the center of each cluster is retained as a new neuron that represents the similar neurons in the corresponding cluster (i.e., multiple similar neurons in the same layer are agglomerated into a single cluster centroid). Specifically, in a network, neurons in layer $l$ have connections with neurons in layers $l-1$ and $l+1$. After agglomerating multiple neurons in layer $l$, we need to adjust the architecture and weights to maintain the performance of the network. Therefore, we must retain the cluster assignment of each neuron (i.e., the cluster into which each neuron of the original network was placed), which is essential for the subsequent architecture adjustment. We use $k_l$ to denote the number of cluster centers in the $l$th layer, and the set of cluster centroids in the $l$th layer can be expressed as:
$$P^l = \{p_1^l, p_2^l, p_3^l, \ldots, p_{k_l}^l\},$$
where $p_i^l$ represents the centroid of the $i$th cluster in the $l$th layer. The cluster assignment information of the $l$th layer is expressed as:
$$R^l = [r_1^l, r_2^l, r_3^l, \ldots, r_{n_l}^l];$$
here, $r_i^l$ indicates the cluster to which the $i$th neuron in the $l$th layer is assigned. In the $l$th layer, neurons assigned to the same cluster are agglomerated to the cluster center. To be more concrete, for the neuron $neu_i^l$, $r_i^l = m$ denotes that $neu_i^l$ is assigned to the $m$th cluster, and $neu_i^l$ is then replaced by the cluster center $p_m^l$. After the neurons in a cluster are agglomerated, the connections between these neurons and related neurons must also be agglomerated accordingly. In other words, all related connections of the agglomerated neurons must be agglomerated into a new connection attached to the cluster center $p_m^l$. Since neurons in the same cluster are similar, we use the addition operation to agglomerate connections. Here, we give a simple derivation. First, we suppose that the neurons in one cluster are identical. With this assumption, if $r_i^l = r_j^l$, then $neu_i^l = neu_j^l$. In particular, if the same input is fed into the network, these neurons generate the same activation output. As a result, we can calculate the activation output of the $i$th neuron in the $(l+1)$th layer as:
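$$
\begin{aligned}
a_i^{l+1} &= f\left(\sum_{j=1}^{n_l} W_{i,j}^{l+1} a_j^l + b_i^{l+1}\right) \\
&= f\left(\sum_{p=1}^{k_l} \Big(\sum_{r_j^l = p} W_{i,j}^{l+1} a_j^l\Big) + b_i^{l+1}\right) \\
&\approx f\left(\sum_{p=1}^{k_l} \Big(\sum_{r_j^l = p} W_{i,j}^{l+1} \hat{a}_p^l\Big) + b_i^{l+1}\right) \\
&= f\left(\sum_{p=1}^{k_l} \tilde{W}_{i,p}^{l+1} \hat{a}_p^l + b_i^{l+1}\right),
\end{aligned}
$$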
where $\tilde{W}_{i,p}^{l+1} = \sum_{r_j^l = p} W_{i,j}^{l+1}$ denotes the agglomeration of the connections in the network and $\hat{a}_p^l$ represents the activation output of the $p$th cluster center in the $l$th layer. In the end, the connections corresponding to the similar neurons are also agglomerated together.
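The derivation can be checked numerically in the idealized case it assumes, namely that neurons in one cluster are exactly identical. The following is a minimal sketch with made-up weights and a ReLU activation (both are assumptions made for illustration, not values from the paper): two duplicate neurons in layer $l$ are replaced by their centroid, their outgoing weights are summed, and the output of layer $l+1$ is unchanged.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Layer l: 3 neurons with 2 inputs; neurons 0 and 2 are exact duplicates.
W_l = np.array([[1.0, -1.0],
                [0.5,  2.0],
                [1.0, -1.0]])
b_l = np.array([0.1, -0.2, 0.1])

# Layer l+1: 2 neurons reading the 3 activations of layer l.
W_next = np.array([[ 0.3, 0.7, -0.4],
                   [-1.0, 0.2,  0.6]])
b_next = np.array([0.05, -0.05])

r = np.array([0, 1, 0])  # cluster assignment R^l
k = 2                    # number of clusters k_l

# Cluster centroids: mean of the member neurons (weights and bias).
P_W = np.stack([W_l[r == p].mean(axis=0) for p in range(k)])
P_b = np.array([b_l[r == p].mean() for p in range(k)])

# Agglomerated outgoing weights: sum the columns of W_next within each cluster.
W_tilde = np.stack([W_next[:, r == p].sum(axis=1) for p in range(k)], axis=1)

x = np.array([0.8, -0.3])
original = relu(W_next @ relu(W_l @ x + b_l) + b_next)
merged = relu(W_tilde @ relu(P_W @ x + P_b) + b_next)

print(np.allclose(original, merged))  # True
```

When the clustered neurons are only similar rather than identical, the two outputs differ slightly, which corresponds to the approximation step in the derivation.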
Informally, the proposed network compression method based on NAC is shown in Algorithm 1.
Algorithm 1 Neuron agglomerative clustering.
1: For each layer $l$ in the network:
2: Let $neu_i^l = [W_{i,1}^l, W_{i,2}^l, W_{i,3}^l, \ldots, W_{i,n_{l-1}}^l, b_i^l]$;
3: Construct neuron set $S^l = \{neu_1^l, neu_2^l, neu_3^l, \ldots, neu_{n_l}^l\}$;
4: Cluster the neurons in set $S^l$ into $k_l$ groups using agglomerative clustering;
5: Construct a set of cluster centroids $P^l = \{p_1^l, p_2^l, p_3^l, \ldots, p_{k_l}^l\}$;
6: Agglomerate neurons of the same cluster into its cluster centroid;
7: Remember the agglomerating list $R^l = [r_1^l, r_2^l, r_3^l, \ldots, r_{n_l}^l]$;
8: Calculate $\tilde{W}_{i,p}^{l+1} = \sum_{j=1}^{n_l} I(r_j^l = p)\, W_{i,j}^{l+1}$, $i = 1, \ldots, n_{l+1}$, $p = 1, \ldots, k_l$, where $I(\cdot)$ is the indicator function;
9: Agglomerate connections of layer $l+1$ into $\tilde{W}_{i,p}^{l+1}$. Bias remains unchanged.
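The steps of Algorithm 1 for a single fully connected layer can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `ward_agglomerate` and `nac_layer` are hypothetical, the variance-minimizing merge is assumed to follow Ward's criterion, and the clustering loop is a naive version intended only for small layers.

```python
import numpy as np

def ward_agglomerate(S, k):
    """Naive agglomerative clustering of the rows of S into k clusters.

    Assumption: at each step, merge the pair of clusters whose union gives the
    smallest increase in within-cluster variance (Ward's criterion).
    Returns (centroids of shape (k, d), assignments r of shape (n,)).
    """
    clusters = [[i] for i in range(len(S))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = S[clusters[a]].mean(axis=0) - S[clusters[b]].mean(axis=0)
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    r = np.empty(len(S), dtype=int)
    for p, members in enumerate(clusters):
        r[members] = p
    P = np.stack([S[m].mean(axis=0) for m in clusters])
    return P, r

def nac_layer(W_l, b_l, W_next, k):
    """Compress layer l to k neurons and merge the incoming weights of layer l+1."""
    S = np.hstack([W_l, b_l[:, None]])        # steps 2-3: neuron set S^l
    P, r = ward_agglomerate(S, k)             # steps 4-7: centroids P^l, list R^l
    W_new, b_new = P[:, :-1], P[:, -1]
    # Steps 8-9: sum the columns of W_next that belong to the same cluster.
    W_tilde = np.stack([W_next[:, r == p].sum(axis=1) for p in range(k)], axis=1)
    return W_new, b_new, W_tilde, r

# Toy layer: 6 neurons built as 3 pairs of near-duplicates (illustrative data).
rng = np.random.default_rng(0)
base_W, base_b = rng.normal(size=(3, 4)), rng.normal(size=3)
W_l = np.vstack([base_W, base_W + 1e-4])
b_l = np.concatenate([base_b, base_b])
W_next, b_next = rng.normal(size=(2, 6)), rng.normal(size=2)

W_new, b_new, W_tilde, r = nac_layer(W_l, b_l, W_next, k=3)

relu = lambda z: np.maximum(z, 0.0)
x = np.full(4, 0.5)
original = relu(W_next @ relu(W_l @ x + b_l) + b_next)
compressed = relu(W_tilde @ relu(W_new @ x + b_new) + b_next)
print(W_l.shape, "->", W_new.shape)
print(np.max(np.abs(original - compressed)))
```

In this toy setting the near-duplicate pairs end up in the same clusters, and the compressed layer reproduces the original output up to a small error. For realistic layer sizes, a library implementation of agglomerative clustering would be a more practical choice than the quadratic search shown here.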

3.2. Applying NAC to Fully Connected Layers and Convolutional Layers

In this section, we specify how to apply the proposed NAC method to the fully connected layers and the convolutional layers.
The process of agglomerating neurons in fully connected layers is illustrated in Figure 1. The upper part of Figure 1 shows the structure of the neural network before agglomerating. The two neurons drawn with dashed lines in the $l_{i+1}$th layer are two similar neurons divided into the same cluster by the agglomerative clustering algorithm. The colored dashed lines indicate their connections with the neurons in the $l_i$th layer and the $l_{i+2}$th layer. After the two neurons and their related connections are agglomerated, we can obtain the compressed network structure as shown in the lower part of Figure 1. The two similar neurons in the $l_{i+1}$th layer are agglomerated into a cluster centroid obtained by the agglomerative clustering algorithm, and the connections between the neurons are also merged into new connections with the same colors, constructing a compact network structure.
Figure 1. Agglomerating neurons in the fully connected layers.
For the application of NAC to the convolutional layers, we only need some small modifications to the convolutional filters. The rest of the processes are exactly the same as NAC applied to the fully connected layers. In the following, we introduce the application of NAC to the convolutional layers.
In the convolutional layers, neurons in the same feature map share the same parameters (convolutional kernels and biases). These neurons also include the spatial information determined by their specific location. Therefore, we do not agglomerate these neurons directly to avoid destroying the effectiveness of the convolutional network. We consider agglomerating neurons from different feature maps. We represent the neurons in the same feature map as a tuple consisting of a convolutional kernel and bias. Before performing the agglomerating operation on the neurons, we first reshape this tuple $\{kernel, bias\}$ into a one-dimensional vector, and then, the agglomerating operation is applied the same as for the fully connected layer. Finally, the agglomerated neurons are reshaped into the tuple composed of the convolutional kernel and bias, and the corresponding parameters are put back into the convolutional neural network to complete the agglomerating process and maintain the structure and performance of the entire neural network. The specific neuron agglomerating process of the convolutional layers is shown in Figure 2.
Figure 2. Agglomerating neurons in the convolutional layers.
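The reshaping described for convolutional layers can be sketched as follows. The helper names are hypothetical, and the kernel layout `(c_out, c_in, kh, kw)` is an assumption made for this sketch (the paper does not specify a storage layout). The cluster assignment `r` is given by hand here; in practice it would come from the agglomerative clustering step.

```python
import numpy as np

def flatten_feature_maps(kernels, biases):
    """Turn each {kernel, bias} tuple into one row vector.

    kernels: (c_out, c_in, kh, kw), biases: (c_out,)
    returns: (c_out, c_in * kh * kw + 1)
    """
    c_out = kernels.shape[0]
    return np.hstack([kernels.reshape(c_out, -1), biases[:, None]])

def unflatten_feature_maps(vectors, kernel_shape):
    """Reshape k row vectors back into kernels (k, c_in, kh, kw) and biases (k,)."""
    k = vectors.shape[0]
    kernels = vectors[:, :-1].reshape((k,) + tuple(kernel_shape[1:]))
    return kernels, vectors[:, -1]

def merge_next_layer_kernels(next_kernels, r, k):
    """Sum the input channels of the next layer that read merged feature maps.

    next_kernels: (c_out2, c_in2, kh, kw) with c_in2 == len(r)
    returns:      (c_out2, k, kh, kw)
    """
    return np.stack([next_kernels[:, r == p].sum(axis=1) for p in range(k)], axis=1)

# Toy example: 4 feature maps merged into 3 (maps 0 and 2 share a cluster).
rng = np.random.default_rng(0)
kernels = rng.normal(size=(4, 3, 3, 3))
biases = rng.normal(size=4)
next_kernels = rng.normal(size=(5, 4, 3, 3))
r = np.array([0, 1, 0, 2])
k = 3

S = flatten_feature_maps(kernels, biases)
P = np.stack([S[r == p].mean(axis=0) for p in range(k)])  # cluster centroids
new_kernels, new_biases = unflatten_feature_maps(P, kernels.shape)
new_next = merge_next_layer_kernels(next_kernels, r, k)

print(S.shape, new_kernels.shape, new_biases.shape, new_next.shape)
```

Summing over input channels mirrors the addition of connections in the fully connected case, since a convolution output is a sum over its input channels.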
After agglomerating neurons in the fully connected layers and the convolutional layers, we can obtain a compact neural network, which greatly reduces the parameters of the original network. However, the agglomeration of a large number of neurons in the network may cause a loss of accuracy, especially when the compression ratio is relatively high. We therefore fine-tune the compressed network to improve its accuracy, and the experiments showed that by fine-tuning the compressed network, we can even get higher accuracy than the original network, as shown in the following section.
Based on the above introduction, the overall process of the proposed network compression method based on NAC is summarized in Figure 3. First, we fully train the original network; then, we apply the NAC algorithm to perform network compression to obtain the compact network. Finally, we fine-tune the compressed network and adjust the parameters of the network. After the above steps, we obtain the final compact network. Note that the proposed network compression method based on NAC can be easily combined with other network compression approaches (e.g., quantization and pruning) to learn more compact networks.
Figure 3. The process of the network compression based on neuron agglomerative clustering.

4. Experiments and Results

To evaluate the proposed network compression method, we conducted extensive experiments on the Mixed National Institute of Standards and Technology Database (MNIST), CIFAR-10, and CIFAR-100 datasets. Specifically, we applied NAC on three networks: a deep belief network (DBN) and two convolutional neural networks (CNNs). In the following, we report the results of the experiments separately, where the best results shown in the tables are highlighted in boldface.

4.1. The Used Datasets

4.1.1. MNIST

MNIST is a commonly used dataset in the computer vision field to test methods for handwritten digit recognition. The dataset has 10 categories and contains 70,000 grayscale images in total. Among them, 60,000 images are used to train the neural networks, and the other 10,000 images are used for testing. Each image is 28 × 28 in size.

4.1.2. CIFAR

The CIFAR-10 dataset contains 60,000 color images in 10 classes, and each class has 6000 images. It is divided into 50,000 training images and 10,000 test images. Each image is 32 × 32 × 3 in size. Specifically, the training data are randomly divided into five training batches, and each batch has 10,000 images. The CIFAR-100 dataset has 100 classes, and each class contains 600 images: 500 training images and 100 test images. The image size is 32 × 32 × 3.

4.2. Results on the MNIST Dataset

The experiments in this section were performed on the deep belief network (DBN) as presented in Table 1. Figure 4 visualizes the neurons in the first hidden layer of the well-trained DBN. It can be seen from Figure 4 that several neurons in the first hidden layer learn similar features. For instance, the 44th (highlighted with the blue rectangle) and 183rd (highlighted with the red rectangle) neurons almost show the same pattern. This fact indicates that neurons in the original DBNs are severely redundant. Therefore, to obtain a compact network structure, it is necessary to reduce the redundancy of the neurons in it.
Table 1. The result of the original and compressed DBN on the MNIST dataset.
Figure 4. Visualization of neurons in the first hidden layer of the original DBN.
In our experiments, we first reduced the neurons in each layer of the original DBN to 300, 300, and 1000, respectively, and obtained a compact network with a compression ratio of 61.68%. The experimental results are shown in Table 1. From Table 1, we can see that even without fine-tuning, our network compression method NAC has almost no loss of accuracy compared to the original network. This indicates that the proposed network compression method effectively agglomerates redundant neurons in the network. Furthermore, this also demonstrates the feasibility and rationality of the network compression method proposed in this paper.
To further verify the effectiveness of NAC at a higher compression ratio, we set the number of neurons in the three layers of the DBN to 200, 100, and 100, respectively, for network compression. The classification results are shown in Table 2. When the compression ratio reaches 88.62%, the compressed network after fine-tuning obtains a higher accuracy than the original network, which supports the effectiveness of the network compression method proposed in this paper.
Table 2. Architecture and performance of the compressed network with a high compression ratio.
In order to investigate whether the good performance of the compressed network can be attributed to the new network architecture, and for a fairer comparison, we built a neural network with the same architecture (the numbers of neurons in the hidden layers are 200, 100, and 100). We fully trained it and fine-tuned it. In the end, the error rate of this rebuilt network is 1.16%, higher than that of the compressed network. Therefore, the good performance of the compressed network cannot be attributed to its new network architecture, but to the reduction of the redundant information in the original network.
Subsequently, we verified the effectiveness of the proposed NAC method on a convolutional neural network (CNN). The structure of the convolutional neural network we used is shown in Table 3. After we fully trained this CNN, we used the proposed NAC method to compress it. The classification results are shown in the third column of Table 3, and as we can see, the compressed network after fine-tuning obtained a higher accuracy than the original network. Comparing the original network and the compressed network, we can see that our network compression method is also quite effective for CNNs. Moreover, this demonstrates the feasibility and versatility of the proposed NAC method again.
Table 3. The classification results of the original and compressed convolutional neural networks tested on the Mixed National Institute of Standards and Technology Database (MNIST).

4.3. Results on the CIFAR Datasets

In the following experiments, we verified the effectiveness of our proposed NAC method on a relatively deeper network, the classic VGGNet-16. To show the superiority of the agglomerative clustering algorithm, we compared it with the k-means clustering method [2], which is one of our methods built for network compression.
Table 4 shows the experimental results of compressing VGGNet-16 on the CIFAR-10 and CIFAR-100 datasets. VGGNet (Model-A) corresponds to the results of network compression based on the k-means clustering of neurons, while VGGNet (Model-B) corresponds to the network compression based on agglomerative clustering of neurons. From Table 4, we can see that no matter whether k-means clustering or agglomerative clustering is used for network compression on the CIFAR-10 dataset, when the compression ratio is as high as 92.96%, our network compression method can still obtain higher accuracy than the original network. This demonstrates that the proposed network compression method is very effective for deep CNNs. By comparing the results of Model-A and Model-B on both the CIFAR-10 and CIFAR-100 datasets, we can see that the results obtained using agglomerative clustering for network compression are consistently better than those obtained using k-means clustering. This is because the k-means clustering algorithm needs to randomly initialize the cluster centers. Different initializations may lead to different cluster centers, which may in turn affect the performance of the compressed network. Therefore, agglomerative clustering is more suitable for the network compression task, as it is a much more stable algorithm than k-means clustering, basically delivering the same positions of the cluster centers for a given compression ratio. Overall, the results shown in Table 4 demonstrate the effectiveness of the proposed network compression method, NAC.
Table 4. The results obtained on the CIFAR-10 and CIFAR-100 datasets. P-Pruned refers to the pruned ratio of parameters, and F-Pruned refers to the pruned ratio of floating-point operations (FLOPs). The best results are highlighted in boldface.
Table 5 shows the comparison results of network compression on the CIFAR-10 and CIFAR-100 datasets using agglomerative clustering, k-means clustering, and randomly merging the neurons without any clustering method. For a fair comparison, the three methods compressed the same number of neurons in each layer of the original network, and the test errors in the tables were obtained by directly testing the compressed network without fine-tuning. By comparing NAC with the method based on randomly merging neurons, we can see from Table 5 that NAC does almost no harm to the performance of the original neural network, whereas randomly merging neurons causes severe damage to the performance of the original network. At the same time, comparing the results obtained by applying agglomerative clustering and k-means clustering, it is clear that the results obtained with agglomerative clustering are consistently better than those obtained with k-means clustering. This is not because of the influence of fine-tuning, but because the agglomerative clustering algorithm itself is more suitable for clustering similar neurons.
Table 5. Comparison results between neuron agglomerative clustering (NAC) and network compression by randomly merging neurons and using k-means clustering on the CIFAR-10 and CIFAR-100 datasets.
To further demonstrate the effectiveness of the proposed network compression method, we compared NAC with three other state-of-the-art network compression methods. Among them, the work in [63] enforced the channel-level sparsity of CNNs; the method presented in [5] pruned filters from CNNs that were identified as having a small effect on the output accuracy; while [64] categorized all the parameters into two parts at each training iteration and updated them using different rules. The results obtained on the CIFAR-10 and CIFAR-100 datasets are shown in Table 6. For a fair comparison, we used the same network structure provided in [63]. From Table 6, we can see that the model obtained by NAC has higher accuracy when the compression ratio is similar for all the compared methods. This justifies the superiority of the network compression method based on NAC over existing approaches.
Table 6. Comparison between NAC and two related approaches on the CIFAR-10 and CIFAR-100 datasets.

5. Conclusions

In this paper, we propose a novel neural network compression method based on neuron agglomerative clustering (NAC). Built upon the fact that neurons and connections in deep neural networks are redundant, we use NAC to agglomerate similar neurons and their corresponding connections. Concretely, we apply the agglomerative clustering algorithm to cluster neurons in the same layer and find similar neurons in each layer, and then, the similar neurons and their corresponding connections are agglomerated together. Finally, we fine-tune the compressed network to obtain a compact network with no loss of accuracy compared to the original network. We conducted experiments on a DBN and two CNNs, including the classic VGGNet, and obtained excellent experimental results. In particular, we compared NAC with two state-of-the-art network compression approaches on VGGNet and obtained better classification results. These experimental results demonstrate the effectiveness of the proposed NAC method for network compression.

Author Contributions

Conceptualization, L.-N.W. and J.D.; methodology, G.Z. and W.L.; software, W.L., X.L., and P.P.R.; validation, L.-N.W., W.L., X.L., and G.Z.; formal analysis, K.H.; investigation, L.-N.W., X.L., and K.H.; resources, L.-N.W., W.L., and X.L.; writing, original draft preparation, L.-N.W. and W.L.; writing, review and editing, G.Z., J.D. and K.H.; visualization, P.P.R.; supervision, G.Z., J.D. and K.H.; project administration, J.D. and K.H.; funding acquisition, G.Z. and J.D. All authors read and agreed to the published version of the manuscript.

Funding

This research was funded by the Major Project for the New Generation of AI under Grant No. 2018AAA0100400, the National Natural Science Foundation of China (NSFC) under Grant Nos. 41706010, U1706218 and 41927805, the Joint Fund of the Equipments Pre-Research and Ministry of Education of China under Grant No. 6141A020337, the Project for Graduate Student Education Reformation and Research of Ocean University of China under Grant No. HDJG19001, and the Fundamental Research Funds for the Central Universities of China under Grant No. 201964022.

Acknowledgments

The authors would like to thank the guest editors and the anonymous reviewers for their work and time on the publication of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. LeCun, Y.; Bengio, Y.; Hinton, G.E. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  2. Zhong, G.; Yan, S.; Huang, K.; Cai, Y.; Dong, J. Reducing and Stretching Deep Convolutional Activation Features for Accurate Image Classification. Cogn. Comput. 2018, 10, 179–186. [Google Scholar] [CrossRef]
  3. Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P.P. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537. [Google Scholar]
  4. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.S.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  5. Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning Filters for Efficient ConvNets. arXiv 2017, arXiv:1608.08710. [Google Scholar]
  6. Srinivas, S.; Babu, R.V. Data-free Parameter Pruning for Deep Neural Networks. arXiv 2015, arXiv:1507.06149. [Google Scholar]
  7. Iandola, F.N.; Moskewicz, M.W.; Ashraf, K.; Han, S.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  8. Wang, K.; Liu, Z.; Lin, Y.; Lin, J.; Han, S. HAQ: Hardware-Aware Automated Quantization With Mixed Precision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–19 June 2019; pp. 8612–8620. [Google Scholar]
  9. Han, S.; Cai, H.; Zhu, L.; Lin, J.; Wang, K.; Liu, Z.; Lin, Y. Design Automation for Efficient Deep Learning Computing. arXiv 2019, arXiv:1904.10616. [Google Scholar]
  10. Peng, B.; Tan, W.; Li, Z.; Zhang, S.; Xie, D.; Pu, S. Extreme Network Compression via Filter Group Approximation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 307–323. [Google Scholar]
  11. Son, S.; Nah, S.; Lee, K.M. Clustering Convolutional Kernels to Compress Deep Neural Networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 225–240. [Google Scholar]
  12. Li, Y.; Gu, S.; Gool, L.V.; Timofte, R. Learning Filter Basis for Convolutional Neural Network Compression. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 5622–5631. [Google Scholar]
  13. Molchanov, P.; Tyree, S.; Karras, T.; Aila, T.; Kautz, J. Pruning Convolutional Neural Networks for Resource Efficient Inference. arXiv 2017, arXiv:1611.06440. [Google Scholar]
  14. Liu, X.; Li, W.; Huo, J.; Yao, L.; Gao, Y. Layerwise Sparse Coding for Pruned Deep Neural Networks with Extreme Compression Ratio. In Proceedings of the AAAI, New York, NY, USA, 7–12 February 2020; pp. 4900–4907. [Google Scholar]
  15. Frankle, J.; Carbin, M. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv 2019, arXiv:1803.03635. [Google Scholar]
  16. Yu, J.; Tian, S. A Review of Network Compression Based on Deep Network Pruning. In Proceedings of the 3rd International Conference on Mechatronics Engineering and Information Technology (ICMEIT 2019), Dalian, China, 29–30 March 2019; pp. 308–319. [Google Scholar]
  17. LeCun, Y.; Denker, J.S.; Solla, S.A. Optimal Brain Damage. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 26–29 November 1989; pp. 598–605. [Google Scholar]
  18. Hassibi, B.; Stork, D.G. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 30 November–3 December 1989; pp. 164–171. [Google Scholar]
  19. Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 10–16 December 2016.
  20. Han, S.; Pool, J.; Tran, J.; Dally, W.J. Learning both Weights and Connections for Efficient Neural Network. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 1135–1143.
  21. Anwar, S.; Hwang, K.; Sung, W. Structured Pruning of Deep Convolutional Neural Networks. ACM J. Emerg. Technol. Comput. Syst. 2017, 13, 32.
  22. Figurnov, M.; Ibraimova, A.; Vetrov, D.P.; Kohli, P. PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 10–16 December 2016; pp. 947–955.
  23. Hu, H.; Peng, R.; Tai, Y.; Tang, C. Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures. arXiv 2016, arXiv:1607.03250.
  24. Rueda, F.M.; Grzeszick, R.; Fink, G.A. Neuron Pruning for Compressing Deep Networks Using Maxout Architectures. In Lecture Notes in Computer Science; CGC Press: New York, NY, USA, 2017; Volume 10496, pp. 177–188.
  25. Denton, E.L.; Zaremba, W.; Bruna, J.; LeCun, Y.; Fergus, R. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1269–1277.
  26. Lin, S.; Ji, R.; Guo, X.; Li, X. Towards Convolutional Neural Networks Compression via Global Error Reconstruction. In Proceedings of the 2016 International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, USA, 9–15 July 2016; pp. 1753–1759.
  27. Wolter, M.; Lin, S.; Yao, A. Towards deep neural network compression via learnable wavelet transforms. arXiv 2020, arXiv:2004.09569.
  28. Jaderberg, M.; Vedaldi, A.; Zisserman, A. Speeding up Convolutional Neural Networks with Low Rank Expansions. arXiv 2014, arXiv:1405.3866.
  29. Zhang, X.; Zou, J.; He, K.; Sun, J. Accelerating Very Deep Convolutional Networks for Classification and Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1943–1955.
  30. Denil, M.; Shakibi, B.; Dinh, L.; Ranzato, M.; de Freitas, N. Predicting Parameters in Deep Learning. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 2148–2156.
  31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
  32. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
  33. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
  34. Qi, W.; Su, H.; Yang, C.; Ferrigno, G.; Momi, E.D.; Aliverti, A. A Fast and Robust Deep Convolutional Neural Networks for Complex Human Activity Recognition Using Smartphone. Sensors 2019, 19, 3731.
  35. Liu, J.; Chen, F.; Yan, J.; Wang, D. CBN-VAE: A Data Compression Model with Efficient Convolutional Structure for Wireless Sensor Networks. Sensors 2019, 19, 3445.
  36. Salakhutdinov, R.; Mnih, A.; Hinton, G.E. Restricted Boltzmann Machines for Collaborative Filtering. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 791–798.
  37. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2014, arXiv:1312.6114.
  38. Pu, Y.; Gan, Z.; Henao, R.; Yuan, X.; Li, C.; Stevens, A.; Carin, L. Variational Autoencoder for Deep Learning of Images, Labels and Captions. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 10–16 December 2016; pp. 2352–2360.
  39. Ba, J.; Caruana, R. Do Deep Nets Really Need to be Deep? In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2654–2662.
  40. Hinton, G.E.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
  41. Aguinaldo, A.; Chiang, P.; Gain, A.; Patil, A.; Pearson, K.; Feizi, S. Compressing GANs using Knowledge Distillation. arXiv 2019, arXiv:1902.00159.
  42. Chen, G.; Choi, W.; Yu, X.; Han, T.X.; Chandraker, M. Learning Efficient Object Detection Models with Knowledge Distillation. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 742–751.
  43. Li, T.; Li, J.; Liu, Z.; Zhang, C. Knowledge Distillation from Few Samples. arXiv 2018, arXiv:1812.01839.
  44. Luo, P.; Zhu, Z.; Liu, Z.; Wang, X.; Tang, X. Face Model Compression by Distilling Knowledge from Neurons. In Proceedings of the AAAI, Phoenix, AZ, USA, 12–17 February 2016; pp. 3560–3566.
  45. Yim, J.; Joo, D.; Bae, J.; Kim, J. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7130–7138.
  46. Li, M.; Lin, J.; Ding, Y.; Liu, Z.; Zhu, J.; Han, S. GAN Compression: Efficient Architectures for Interactive Conditional GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020.
  47. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.C.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
  48. Chen, H.; Wang, Y.; Xu, C.; Yang, Z.; Liu, C.; Shi, B.; Xu, C.; Xu, C.; Tian, Q. Data-Free Learning of Student Networks. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019.
  49. Peng, B.; Jin, X.; Liu, J.; Zhou, S.; Wu, Y.; Liu, Y.; Li, D.; Zhang, Z. Correlation Congruence for Knowledge Distillation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019.
  50. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In Lecture Notes in Computer Science; ECCV: Prague, Czech, 2016; Volume 9908, pp. 525–542.
  51. Li, F.; Liu, B. Ternary Weight Networks. arXiv 2016, arXiv:1605.04711.
  52. Zhu, C.; Han, S.; Mao, H.; Dally, W.J. Trained Ternary Quantization. arXiv 2016, arXiv:1612.01064.
  53. Miao, H.; Li, A.; Davis, L.S.; Deshpande, A. Towards Unified Data and Lifecycle Management for Deep Learning. In Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, CA, USA, 19–22 April 2017; pp. 571–582.
  54. Louizos, C.; Ullrich, K.; Welling, M. Bayesian Compression for Deep Learning. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
  55. Li, Z.; Ni, B.; Zhang, W.; Yang, X.; Gao, W. Performance Guaranteed Network Acceleration via High-Order Residual Quantization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2603–2611.
  56. Hu, Y.; Li, J.; Long, X.; Hu, S.; Zhu, J.; Wang, X.; Gu, Q. Cluster Regularized Quantization for Deep Networks Compression. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019.
  57. Cheng, Y.; Yu, F.X.; Feris, R.S.; Kumar, S.; Choudhary, A.N.; Chang, S. An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 3–7 December 2015; pp. 2857–2865.
  58. Ma, Y.; Suda, N.; Cao, Y.; Seo, J.; Vrudhula, S.B.K. Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA. In Proceedings of the International Conference on Field Programmable Logic and Applications, FPL, Lausanne, Switzerland, 29 August–2 September 2016; pp. 1–8.
  59. Gysel, P. Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks. arXiv 2016, arXiv:1605.06402.
  60. Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 10–16 December 2016; pp. 4107–4115.
  61. Aggarwal, C.C.; Reddy, C.K. (Eds.) Data Clustering: Algorithms and Applications, 1st ed.; Data Mining and Knowledge Discovery; Chapman and Hall/CRC: New York, NY, USA, 2014.
  62. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; Wiley: New York, NY, USA, 2001.
  63. Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning Efficient Convolutional Networks through Network Slimming. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2755–2763.
  64. Ding, X.; Ding, G.; Zhou, X.; Guo, Y.; Han, J.; Liu, J. Global Sparse Momentum SGD for Pruning Very Deep Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 6382–6394.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
