Article

Feature Map Analysis-Based Dynamic CNN Pruning and the Acceleration on FPGAs

Department of Electronic and Computer Engineering, Ritsumeikan University, Kusatsu 525-8577, Japan
*
Author to whom correspondence should be addressed.
Current address: College of Science and Engineering, Ritsumeikan University, 1-1-1, Nojihigashi, Kusatsu 525-8577, Japan.
Electronics 2022, 11(18), 2887; https://doi.org/10.3390/electronics11182887
Submission received: 23 August 2022 / Revised: 7 September 2022 / Accepted: 7 September 2022 / Published: 12 September 2022
(This article belongs to the Collection Graph Machine Learning)

Abstract

Deep-learning-based applications bring impressive results to graph machine learning and are widely used in fields such as autonomous driving and language translation. Nevertheless, the tremendous capacity of convolutional neural networks makes it difficult to implement them on resource-constrained devices. Channel pruning provides a promising solution to compress networks by removing redundant calculations. Existing pruning methods measure the importance of each filter and discard the less important ones until a fixed compression target is reached. However, this static approach limits the pruning effect. Thus, we propose a dynamic channel-pruning method that dynamically identifies and removes less important filters based on a redundancy analysis of their feature maps. Experimental results show that 77.10% of the floating-point operations (FLOPs) and 91.72% of the parameters are reduced on VGG16BN with only a 0.54% accuracy drop. Furthermore, the compressed models were implemented on a field-programmable gate array (FPGA), and a significant speed-up was observed.

1. Introduction

Convolutional Neural Networks (CNNs) have achieved impressive performance in a wide range of applications [1,2,3]: classification [4], cultural heritage protection [5], environment monitoring [6] and disease detection [7,8]. However, the success of CNNs is accompanied by an obvious increase in memory and computational cost. Thus, it is not practical to deploy heavy CNNs on resource-constrained computing devices such as embedded systems and mobile devices. To address these problems, various network compression methods have been proposed: channel pruning [9], weight quantization [10], and low-rank decomposition [11]. Channel pruning significantly reduces the number of floating-point operations (FLOPs) and parameters by removing redundant structures such as entire filters. In addition, it is more hardware-friendly, as it only changes the width of the network without adding operations or structures [12].
Because the inference of a CNN depends on its parameters, it is difficult to discard filters without causing a performance drop. Hence, channel pruning entails a trade-off between the compression ratio and performance [13,14]. Many works [15] measure the importance of filters based on proposed criteria, thereby identifying the less important filters, which can be removed with only a slight accuracy loss. The network is then fine-tuned to regain an accuracy close to the baseline. Moreover, some methods [16,17,18,19] adopt iterative processing, which further improves compression by repeating the pruning and fine-tuning stages.
However, the pruning compression target becomes a potential problem. To achieve a compact network, some researchers set a compression target (e.g., reducing a certain amount of computation [13,20], or reducing a fixed percentage of filters [21,22,23]) and then prune the least important filters until the target is reached. The target of those methods is determined empirically from a large amount of experimental data. Because of the limitations of this static approach, mistaken pruning of important filters or inadequate compression is inevitable, and the low adaptability exacerbates this problem when iterative processing is adopted. On the other hand, dynamic methods have advantages in iterative processing and are robust across network models.
Therefore, this paper proposes a dynamic channel pruning method to compress CNNs. Specifically, the importance of filters is estimated by analyzing the corresponding feature maps. Then, a binary search algorithm is used to predict fine-tuning results and achieve dynamic identification and pruning of unimportant filters. Experimental results indicate that our method can be applied to various datasets and models, and it achieves excellent compression results compared to other approaches. Furthermore, the pruned networks are implemented on a field-programmable gate array (FPGA) for network acceleration.
The main contributions of the paper are as follows:
  • A dynamic method to evaluate the redundancy of filters by observing the intermediate feature maps. The evaluation process needs no additional constraints or structures and has good generalization ability.
  • Implementing the proposed pruned networks on an FPGA and achieving significant acceleration.
The rest of the paper is organized as follows: Section 2 introduces related works. Section 3 details the methodology. Section 4 shows the experimental results. Section 5 concludes the paper.

2. Related Work

Channel pruning is effective for CNN compression, and research into this technique has led to many achievements. However, removing structures damages the network, so many methods [24,25,26] tend to discriminate and remove unimportant filters. Li et al. [21] measured the relative importance of a filter by calculating the sum of its absolute weights, then pruned the least important ones and their corresponding feature maps. Afterwards, the pruned network was retrained to avoid a serious accuracy drop. Various pruning strategies were applied in the experiment, such as setting different percentage targets for the filter reduction, adopting different retraining schemes, and skipping the pruning of some sensitive layers. The final pruning strategy was determined by analyzing and comparing the experimental results. However, the efficiency of this approach needs to be improved.
There are also methods [27,28] that modify the objective function to affect the filter parameter updates, which pushes the network to produce significantly sparse structures during training. These structures are then easily identified and removed without degrading performance. Lin et al. [29] incorporated a structured-sparsity constraint into the objective function to produce a structured sparse network and prune the least important filters. Furthermore, alternative updating with Lagrange multipliers (AULM) was proposed to identify the importance of filters and update the parameters by alternating between promoting the structured sparsity and optimizing the loss. In the retraining process, AULM removed the sparse filters and updated the remainder. The pruning ended after several iterations. However, training with constraints can affect network performance.
In addition, adding temporary constructs to the network is an effective approach [30,31]. Ding et al. [13] added a structure to the network to assist with pruning. First, a compactor consisting of 1 × 1 filters was appended to the convolutional layer, which was an equivalent conversion. Second, a penalty term was added to the objective function to produce structural sparsity in the compactor during fine-tuning. Third, the importance of the filters in the compactor was evaluated by Euclidean norms, and the unimportant ones were removed until the reduced FLOPs reached the predetermined target. The above steps were repeated for further compression. Finally, the compactor was equivalently converted back to the original parameters to prune the network.
We noted that many methods adopt a static approach when selecting filters to prune based on the proposed criteria, and this inspired us to introduce dynamic methods to improve compression efficiency further.

3. Methods

3.1. Preliminary

First, the convolution operation and feature maps are introduced. Let $K$ and $C$ be the filter kernel size and the number of input channels, respectively. $F_j \in \mathbb{R}^{C \times K \times K}$ denotes the filter applied on the $j$th output channel, and all the filters together constitute the parameters of the convolutional layer. The feature map is a matrix containing feature information. $M_i$ and $O_i$ denote the input and output feature maps on the $i$th channel. If we let $\circledast$ be the convolution operator and $b$ be the bias, the output feature map is generated as

$$\sum_{i=1}^{C} M_i \circledast F_j + b = O_j. \quad (1)$$
Each input feature map is converted into an intermediate result by the same filter, and then all intermediate results are summed and biased to obtain the output feature map. Note that each output feature map corresponds to one filter.
In the channel pruning process, when a filter $F_j$ is pruned, the feature map $O_j$ is not generated. Since $O_j$ is also the input feature map $M_j$ for the next layer, the number of input channels is reduced and each filter in the next layer is modified to $F_j \in \mathbb{R}^{(C-1) \times K \times K}$.
This paper determines pruning targets by analyzing the redundancy of feature maps: when an output feature map $O_i$ is considered redundant, the corresponding filter $F_i$ is pruned and the filters in the next layer are modified as above (all the abbreviations and definitions are listed in Appendix A).
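For concreteness, the following is a minimal PyTorch sketch of this shape bookkeeping (our illustration, not the paper's implementation; the function name is hypothetical): it removes the $j$th filter from one convolutional layer and the matching input channel from the next.

```python
import torch.nn as nn

def prune_output_channel(conv: nn.Conv2d, next_conv: nn.Conv2d, j: int):
    """Remove filter F_j from `conv` and input channel M_j from `next_conv`."""
    keep = [c for c in range(conv.out_channels) if c != j]
    new_conv = nn.Conv2d(conv.in_channels, len(keep),
                         kernel_size=conv.kernel_size, stride=conv.stride,
                         padding=conv.padding, bias=conv.bias is not None)
    new_conv.weight.data = conv.weight.data[keep].clone()   # drop filter F_j
    if conv.bias is not None:
        new_conv.bias.data = conv.bias.data[keep].clone()
    new_next = nn.Conv2d(len(keep), next_conv.out_channels,
                         kernel_size=next_conv.kernel_size,
                         stride=next_conv.stride, padding=next_conv.padding,
                         bias=next_conv.bias is not None)
    # Drop the input channel of the next layer that consumed M_j.
    new_next.weight.data = next_conv.weight.data[:, keep].clone()
    if next_conv.bias is not None:
        new_next.bias.data = next_conv.bias.data.clone()
    return new_conv, new_next
```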

3.2. Redundancy Analysis

This part analyzes the redundancy of feature maps, each of which represents feature information essential for the network. However, according to our investigation, the activation function, a common structure in mainstream networks, results in huge sparsity in the feature maps [32]. A high degree of sparsity means a high percentage of zero elements, which are redundant data in the feature map.
Due to the characteristics of convolution, a highly sparse feature map is converted into a slightly less sparse intermediate result during convolution. According to Equation (1), the output feature map is the sum of multiple intermediate results, which means the zero elements do not affect the result. Since only non-zero elements affect the convolution result, highly sparse feature maps are considered redundant and can be removed with only slight harm to performance. Therefore, sparsity is an effective criterion for determining the importance of a feature map and its corresponding filter.
Moreover, the sparsity of the feature map can be enhanced without reducing network performance. According to our analysis, there are a large number of elements with values close to 0 (i.e., low-value elements). Since all input feature maps are converted by the same filter during convolution, the conversion results for low-value elements are insignificant compared to others and differ very little from those for zero elements. Therefore, low-value elements can be converted to 0 without harming the network.
To measure the sparsity of feature maps more exactly, the feature penalty method is proposed to enhance sparsity: it zeros out the elements in the feature map whose values are below a preset threshold. Note that a penalty threshold is qualified only when it does not degrade the network accuracy.
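A minimal sketch of the feature penalty and the per-channel sparsity it yields, assuming post-activation feature maps in (N, C, H, W) layout; the function names are ours:

```python
import torch

def feature_penalty(fmap: torch.Tensor, threshold: float) -> torch.Tensor:
    # Zero out low-value elements of an (N, C, H, W) feature map.
    return torch.where(fmap < threshold, torch.zeros_like(fmap), fmap)

def channel_sparsity(fmap: torch.Tensor, threshold: float) -> torch.Tensor:
    # Fraction of zero elements per output channel after the penalty,
    # averaged over the batch and spatial dimensions: shape (C,).
    penalized = feature_penalty(fmap, threshold)
    return (penalized == 0).float().mean(dim=(0, 2, 3))
```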
Next, how to determine the threshold of the feature penalty is explained. Figure 1 illustrates the accuracy drop after setting different thresholds for the feature penalty. It is noticeable that the accuracy remains around the baseline at first and begins to decrease when the threshold exceeds 0.49. This indicates that the existing low-value elements can be converted to zero without affecting network accuracy. Based on the pattern in the figure, an approach to explore the qualified thresholds was proposed: apply the feature penalty to the output feature maps of one layer, keep increasing the threshold by a fixed step, and observe the variation in accuracy until it shows a continuous decreasing trend. With this process, all the qualified thresholds are determined, and the maximum among them is selected as the optimal threshold.
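The search can be sketched as follows, where the step size, the patience used to detect a continuous decreasing trend, and the helpers `apply_penalty` and `validate` are assumptions for illustration rather than values from the paper:

```python
def find_optimal_threshold(layer, baseline_acc, apply_penalty, validate,
                           step=0.01, patience=3):
    """Return the maximum qualified feature-penalty threshold for one layer."""
    best, threshold, decreases = 0.0, 0.0, 0
    while decreases < patience:
        threshold += step
        apply_penalty(layer, threshold)  # zero elements below `threshold`
        acc = validate()                 # accuracy with the penalty applied
        if acc >= baseline_acc:
            best, decreases = threshold, 0   # still qualified: keep going
        else:
            decreases += 1               # count consecutive drops
    return best
```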
Specifically, the optimal threshold differs for each layer. Thus, the feature penalty was applied one layer at a time and the optimal threshold was explored separately. Afterwards, the feature penalties were applied to all layers with their optimal thresholds to measure the sparsity of each feature map. The sparsity obtained by this method is defined as En-sparsity, which is the criterion for pruning.
Furthermore, pruning by En-sparsity was compared with pruning by random selection. In all, 30 En-sparsity thresholds were set with values from 1.0 to 0.7, and feature maps with En-sparsity greater than the threshold were pruned. Corresponding to each En-sparsity-based pruning, the same number of feature maps were randomly pruned. The results are summarized in Figure 2. It can be seen that removing feature maps with high En-sparsity hurt the network less, while random selection did not. As the En-sparsity of the removed feature maps decreased, the impact on accuracy increased. In conclusion, En-sparsity reflects the influence of feature maps on network accuracy.

3.3. Dynamic Unimportant Filter Identification

This section describes how filters that can be removed without accuracy loss are identified dynamically based on En-sparsity. First, $P$ is a threshold for En-sparsity that distinguishes between important and unimportant feature maps. During filter pruning, when the En-sparsity of a feature map is greater than $P$, the feature map and its corresponding filter are removed. After each pruning operation, the network is fine-tuned to regain high accuracy. Here, the value of $P$ seriously affects compression and the accuracy after fine-tuning. A relatively low $P$ yields a high compression ratio; however, it can seriously harm the network and make fine-tuning difficult. Conversely, a high $P$ results in a smaller or no accuracy drop, but only a small number of filters are removed. Consequently, it is important to adjust $P$ to an appropriate value.
For a clear description, let $ACC_D$ be the accuracy drop constraint. Pruning is only considered successful when the accuracy drop is less than $ACC_D$. The target $P$ ensures that the accuracy drop is within $ACC_D$ while achieving a high compression ratio.
Obtaining the fine-tuned accuracy requires a high-cost fine-tuning process, which means it is not practical to adjust $P$ through multiple attempts. To analyze this problem, the same model was pruned with values of $P$ from 0.5 to 1.0. The results are shown in Figure 3. It can be seen that the accuracy increases steadily as the value of $P$ increases, and after the accuracy approaches the baseline, it remains stable. In addition, as $P$ increases, the compression ratio drops significantly. From the experimental results of various models, we empirically concluded the following: (1) the decrease in accuracy caused by pruning can be reduced by increasing $P$; (2) when the damage caused by pruning is low, high accuracy can be regained by fine-tuning, which is critical to improving the compression effect. Although it is impractical to analyze the damage to the network quantitatively, the direction in which to adjust $P$ can be determined from the fine-tuning results.
Therefore, a binary search algorithm was introduced to adjust $P$. Algorithm 1 describes our proposal in detail: (1) $P_u$ and $P_l$ are the two endpoints of the target $P$ interval and are initialized to 1 and 0.5, respectively. (2) $P_c$, the current En-sparsity threshold, is set to the middle value of $P_u$ and $P_l$, and the original network is pruned based on the current $P_c$. (3) The pruned network is fine-tuned and then tested on the validation set. (4) If the accuracy drop is below $ACC_D$, the current network is saved as the pruning result and the upper endpoint $P_u$ is updated to $P_c$; otherwise, the lower endpoint $P_l$ is updated to $P_c$. The above steps loop until the gap between $P_l$ and $P_u$ is less than 0.02. When the loop ends, the upper limit $P_u$ is the target $P$, and the saved model is the result of this dynamic pruning.
Algorithm 1: Algorithm for dynamic pruning and adjustment of P .
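A minimal Python sketch of Algorithm 1, following the description above; `prune`, `fine_tune`, and `evaluate` are hypothetical helpers standing in for the pruning, fine-tuning, and validation steps:

```python
def dynamic_pruning(net, baseline_acc, acc_d, prune, fine_tune, evaluate):
    p_l, p_u = 0.5, 1.0            # endpoints of the target P interval
    result = None                  # last successfully pruned network
    while p_u - p_l >= 0.02:
        p_c = (p_u + p_l) / 2      # current En-sparsity threshold P_c
        # Prune feature maps with En-sparsity > P_c, then fine-tune.
        candidate = fine_tune(prune(net, p_c))
        if baseline_acc - evaluate(candidate) < acc_d:
            result, p_u = candidate, p_c   # success: try a lower (stronger) P
        else:
            p_l = p_c                      # failure: relax the threshold
    return result, p_u             # P_u is the target P
```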
To compress the network further, redundancy analysis and dynamic selection were repeated on the result of the previous iteration. The iterative process ends when the target value is not obtained in the dynamic selection phase.

4. Experiments

4.1. Experimental Configuration

The proposal was applied to the VGG16BN [33], ResNet56, ResNet110 [34] and MobileNetV2 [35] models to evaluate the compression effect on different network structures. The Cifar10 and Cifar100 datasets [36] were adopted, both consisting of 50k training and 10k testing images. The 32 × 32 images are divided into 10 and 100 classes, respectively. Considering the size of the input image, the stride of the first convolutional layer of MobileNetV2 was set to 1. For VGG16BN, the fully connected layer contributes little to performance while requiring a large amount of memory, so it was replaced by a convolutional layer consisting of 1 × 1 filters. According to our evaluation, the performance of the modified VGG16BN was slightly better than the original.
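A minimal sketch of this head modification, assuming the last convolution stage of VGG16BN outputs 512 channels (the layer sizes here are assumptions):

```python
import torch.nn as nn

num_classes = 10  # Cifar10; 100 for Cifar100
# The fully connected classifier is replaced by a 1x1 convolution,
# followed by global average pooling and a flatten.
classifier = nn.Sequential(
    nn.Conv2d(512, num_classes, kernel_size=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
```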
After 5 warm-up epochs, each base model was trained for 320 epochs on Cifar10 and Cifar100 from scratch. The momentum was 0.9, the weight decay factor was $10^{-4}$, and the batch size was 64. Table 1 shows the accuracy and FLOPs of each model. All models were trained and pruned with an Nvidia GeForce RTX 3080 Ti GPU and an Intel i9-10900 CPU, and implemented in PyTorch.
For the fine-tuning operation, the SGD optimizer with a learning rate initialized to 0.01 was adopted. The learning rate was decayed by cosine annealing [37] with a period of 320 epochs and restarted at epoch 160. The network was fine-tuned on the training set for 320 epochs. After epoch 160, if the best accuracy was not updated for more than 20 epochs, the fine-tuning was stopped early.
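A minimal sketch of this fine-tuning schedule; the training and evaluation steps are abstracted behind hypothetical helpers, and the restart at epoch 160 is implemented by re-creating the cosine scheduler:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, 3)      # stand-in for the pruned network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=320)

def train_one_epoch(m): pass    # hypothetical helper
def validate(m): return 0.0     # hypothetical helper

best_acc, stall = 0.0, 0
for epoch in range(320):
    train_one_epoch(model)
    if epoch == 160:            # restart the cosine schedule at epoch 160
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                               T_max=320)
    scheduler.step()
    acc = validate(model)
    if acc > best_acc:
        best_acc, stall = acc, 0
    elif epoch >= 160:
        stall += 1
        if stall > 20:          # early stop: 20 epochs without improvement
            break
```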
In the pruning phase, different pruning strategies were adopted depending on the network structure. For residual networks such as ResNet56 and ResNet110, considering the shortcut connection, which adds the output feature map to the input, the output channels of the last layer in each bottleneck were not modified. For MobileNetV2, the feature penalty was applied to the input and output feature maps of the depthwise separable convolutions, and then the En-sparsity of the input feature maps was measured. Referring to the design of the width multiplier in the original paper, which scales the width of the network, the numbers of input and output channels in the depthwise separable convolution layers were kept equal in the pruning phase. When all the filters in a layer were selected for pruning, the layer was removed; in this case, the conv-BN-activation structures of VGG16BN, the bottlenecks of ResNet, and the inverted residuals of MobileNetV2 were removed.
Since pruning is a trade-off between accuracy and the compression ratio, the limitation on the accuracy drop $ACC_D$ was set to 1%, 2%, and 3% in the experiments.

4.2. Results on Cifar10

The compression results on Cifar10 are shown in Table 2. The compressed models were evaluated by Top-1 accuracy, FLOPs, and the number of parameters. Within a 1% accuracy drop, the maximum FLOP compression ratio reached 77.10% on VGG16BN; for the other models, the compression ratio reached 56.27% or more. Regarding the reduction in parameters within 1% $ACC_D$, it reached 91.72% on VGG16BN and 67.77% or more on the others. When $ACC_D$ was set to 2% and 3%, the FLOPs and parameters of each model decreased further compared to the $ACC_D$ of 1%. When $ACC_D$ was increased to 2%, there was a significant increase in the FLOP compression ratio: 16.48% for ResNet56 and 12.02% for VGG16BN. In conclusion, an appropriate increase in $ACC_D$ improved the compression effect significantly at the cost of a small decrease in accuracy.

4.3. Results on Cifar100

Moreover, Table 3 provides the experimental results on the more complex dataset Cifar100. With 1% $ACC_D$, MobileNetV2 was compressed by 68.48% for FLOPs and 70.92% for parameters, while VGG16BN was reduced by 52.38% for FLOPs and 82.94% for parameters. For ResNet56 and ResNet110, the FLOPs decreased by 35.25% and 37.40%, and as $ACC_D$ increased, these figures rose to 56.17% and 60.69%, respectively.

4.4. Analysis

Figure 4 illustrates the fine-tuning of ResNet110 on Cifar10 with 1% $ACC_D$, based on the target $P$. There were four iterations in this fine-tuning, but no qualified $P$ was obtained in the fourth iteration, so it is not shown. It is noticeable that although the accuracy dropped due to pruning, it recovered to a high level after fine-tuning for a few epochs, indicating that our method removed filters without hurting the network.
The compression of the FLOPs and parameters at the end of each iteration is summarized in Figure 5. Overall, the compression effect of the first iteration was the most significant, with the highest reaching a 67.56% reduction in FLOPs. For VGG16BN and MobileNetV2, the compression ratio improved considerably in the subsequent iterations, with MobileNetV2 reducing FLOPs by up to 8.26% in one iteration. In the iterations near the end, only a small number of filters were removed and about 1% of the FLOPs were reduced; at this phase the model was compact. In summary, iterative processing achieved further compression.
Furthermore, the experimental results on Cifar10 with 1% $ACC_D$ were compared with other compression methods in Table 4. It can be seen that, on ResNet56, the pruned Top-1 accuracy of our method was only 0.15% lower than that of Hinge [20], which combines channel pruning and decomposition, and was higher than those of the other channel pruning methods: HRank [38], SCP [39] and FPGM [40]. Moreover, our method achieved better parameter and FLOP reductions; the FLOP compression ratio was 6.27% higher than those of HRank and Hinge. Furthermore, the Top-1 accuracy of ResNet110 pruned by our method was higher than those of HRank and FPGM, while the remaining FLOPs were fewer than for HRank (27.57 vs. 41.80%) and FPGM (27.57 vs. 47.70%).
In addition, other studies adopted hyperparameters to trade off the compression ratio against pruning accuracy, similar to our experiments. In Hinge, compression targets of 50% and 75% FLOP reduction were set, and different filter compression targets were adopted in HRank. Accordingly, the compression results of our method with various $ACC_D$ were compared with HRank and Hinge under different targets in Table 4. The HRank variant with higher FLOP compression is denoted by HRank-β, and the same applies to Hinge. From the table, it can be seen that in the compression of ResNet56, compared to HRank-β, almost the same FLOP compression was achieved by our method (3% $ACC_D$), with higher Top-1 accuracy and parameter reduction. Compared to Hinge-β, the FLOP compression ratio of our method (2% $ACC_D$) was slightly lower (by 3.25%), but our method achieved a higher parameter compression with nearly the same Top-1 accuracy. In the compression of ResNet110, our method (2% $ACC_D$) achieved a better compression effect for FLOPs (81.61 vs. 68.60%) and parameters (84.64 vs. 68.70%) than HRank-β did, with a higher accuracy.

4.5. Implementation on the FPGA

To evaluate the acceleration effect of our proposal on resource-limited hardware, the compressed networks were implemented on an FPGA. Vitis AI [41] was used to implement the pruned models, and the Xilinx ZCU102 was the experimental evaluation platform. Before deployment on the edge board, the network was processed as follows: First, in the quantization process, 32-bit floating-point weights and activation values were converted to 8-bit integer format. Second, because the accuracy dropped after quantization, fast fine-tuning was applied to recover accuracy. This phase used the AdaQuant algorithm [42], which uses part of the unlabeled data to fine-tune the parameters and activations. Third, the network was compiled for deployment on a Deep-learning Processing Unit (DPU). After deployment, the Top-1 accuracy was measured on the test set, and the same image was processed 10,000 times to measure the average inference time. The times on the CPU and GPU were measured by the same approach. Table 5 provides the average time required for the base models to process an input image on the CPU, GPU, and FPGA. The CPU and GPU times were measured on an Intel Core i9-12900 and an Nvidia GeForce RTX 3090.
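A minimal sketch of the latency measurement harness on the CPU/GPU side (the on-board measurement goes through the Vitis AI runtime instead; this harness is our reconstruction, not the paper's code):

```python
import time
import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, 3).eval()   # stand-in for a trained base model
image = torch.randn(1, 3, 32, 32)   # one Cifar-sized input image

with torch.no_grad():
    for _ in range(100):            # warm-up iterations
        model(image)
    start = time.perf_counter()
    for _ in range(10_000):         # process the same image 10,000 times
        model(image)
    elapsed = time.perf_counter() - start

print(f"average inference time: {elapsed / 10_000 * 1e3:.3f} ms")
```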
The compressed models were implemented on the FPGA, and the Top-1 accuracy, average inference time, and accuracy drop compared to the base models are summarized in Table 6. The decrease due to quantization and compilation was very slight, about 0.3%. Moreover, ResNet56 and ResNet110 produced relatively significant drops in accuracy, about 1%.
The average inference time of the compressed models on the FPGA is compared to that of the base models on the CPU, GPU, and FPGA in Table 7. With the combination of our method and the FPGA accelerator, the inference speed of the various networks improved significantly. Compared to the CPU, the pruned networks processed an image in only about 20% of the time; in comparison with the GPU and FPGA, the speed-up of our method differed by model. The inference time of ResNet56 and ResNet110 was reduced by about 70% compared to the GPU and only by about 20% compared to the FPGA. Regarding the pruned VGG16BN, the speed-up was about 2× compared to the GPU and about 5× compared to the FPGA. These results prove that our compression method with the FPGA accelerator considerably improves the speed of CNN-based applications.

5. Conclusions

In this paper, we proposed a dynamic channel pruning method to compress mainstream CNNs. Specifically, we provided a method to evaluate the redundancy of feature maps and designed an algorithm that dynamically prunes filters based on this redundancy. In addition, we adopted iterative pruning to improve the compression effect further. Compression experiments were conducted on various CNN models, reducing the network computational cost by up to 77.10% and the number of parameters by 91.72% with only a 0.54% decrease in accuracy. The analysis results showed that the approach identified the less important filters and pruned the network with only slight damage. Furthermore, the compressed models were implemented on the FPGA and accomplished the recognition task in only 24.47% of the GPU time. In future work, the pruning strategy should be improved to achieve efficient pruning of shortcut connections for ResNet and MobileNetV2. Furthermore, we plan to apply the proposal to object detection models, such as YOLO [43] and SSD, and to services in the Internet of Things [44].

Author Contributions

Conceptualization, L.M. and H.L.; methodology, Q.L.; software, Q.L.; validation, Q.L. and H.L.; formal analysis, Q.L.; investigation, Q.L.; resources, L.M.; data curation, Q.L.; writing—original draft preparation, Q.L.; writing—review and editing, H.L. and L.M.; visualization, Q.L.; supervision, L.M.; project administration, L.M.; funding acquisition, L.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. List of abbreviations and definitions.

| Abbreviation | Explanation |
| --- | --- |
| CNN | Convolutional Neural Network |
| FLOP | Floating-point operation |
| FPGA | Field-programmable gate array |
| GPU | Graphics processing unit |
| CPU | Central processing unit |
| $K$ | Filter kernel size |
| $C$ | Number of input channels |
| $F_j$ | The filter applied on the $j$th output channel |
| $M_i$ | Input feature map on the $i$th channel |
| $O_i$ | Output feature map on the $i$th channel |
| $\circledast$ | Convolution operator |
| $b$ | Bias |
| En-sparsity | Sparsity obtained by the feature penalty method |
| $P$ | Threshold value of En-sparsity |
| $ACC_D$ | Constraint on accuracy drop |
| $P_u$ | Upper endpoint of the target $P$ interval |
| $P_l$ | Lower endpoint of the target $P$ interval |
| $P_c$ | Current demarcation point of En-sparsity |
| DPU | Deep-learning Processing Unit |
| Ours-λ | Proposed method with λ $ACC_D$ |
| method-β | A variant of the method |
| AULM | Proposal in paper [29] |
| Hinge | Proposal in paper [20] |
| HRank | Proposal in paper [38] |
| SCP | Proposal in paper [39] |
| FPGM | Proposal in paper [40] |

References

  1. Jiang, W.; Ren, Y.; Liu, Y.; Leng, J. Artificial Neural Networks and Deep Learning Techniques Applied to Radar Target Detection: A Review. Electronics 2022, 11, 156.
  2. Chen, X.; Liu, L.; Tan, X. Robust Pedestrian Detection Based on Multi-Spectral Image Fusion and Convolutional Neural Networks. Electronics 2022, 11, 1.
  3. Avazov, K.; Mukhiddinov, M.; Makhmudov, F.; Cho, Y.I. Fire Detection Method in Smart City Environments Using a Deep-Learning-Based Approach. Electronics 2022, 11, 73.
  4. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 84–90.
  5. Yue, X.; Li, H.; Fujikawa, Y.; Meng, L. Dynamic Dataset Augmentation for Deep Learning-based Oracle Bone Inscriptions Recognition. J. Comput. Cult. Herit. (JOCCH) 2022.
  6. Meng, L.; Hirayama, T.; Oyanagi, S. Underwater-drone with panoramic camera for automatic fish recognition based on deep learning. IEEE Access 2018, 6, 17880–17886.
  7. Paradisa, R.H.; Bustamam, A.; Mangunwardoyo, W.; Victor, A.A.; Yudantha, A.R.; Anki, P. Deep Feature Vectors Concatenation for Eye Disease Detection Using Fundus Image. Electronics 2022, 11, 23.
  8. Saho, K.; Hayashi, S.; Tsuyama, M.; Meng, L.; Masugi, M. Machine Learning-Based Classification of Human Behaviors and Falls in Restroom via Dual Doppler Radar Measurements. Sensors 2022, 22, 1721.
  9. Zhou, H.; Alvarez, J.M.; Porikli, F. Less Is More: Towards Compact CNNs. In Computer Vision—ECCV 2016; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 662–677.
  10. Gong, C.; Chen, Y.; Lu, Y.; Li, T.; Hao, C.; Chen, D. VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization. IEEE Trans. Comput. 2021, 70, 696–710.
  11. Lin, S.; Ji, R.; Chen, C.; Tao, D.; Luo, J. Holistic CNN Compression via Low-Rank Decomposition with Knowledge Transfer. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2889–2905.
  12. Guo, K.; Zeng, S.; Yu, J.; Wang, Y.; Yang, H. [DL] A survey of FPGA-based neural network inference accelerators. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 2019, 12, 1–26.
  13. Ding, X.; Hao, T.; Tan, J.; Liu, J.; Han, J.; Guo, Y.; Ding, G. ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 4510–4520.
  14. Chen, T.; Ji, B.; Ding, T.; Fang, B.; Wang, G.; Zhu, Z.; Liang, L.; Shi, Y.; Yi, S.; Tu, X. Only train once: A one-shot neural network training and pruning framework. Adv. Neural Inf. Process. Syst. 2021, 34, 19637–19651.
  15. Li, B.; Wu, B.; Su, J.; Wang, G. Eagleeye: Fast sub-net evaluation for efficient neural network pruning. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 639–654.
  16. Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both Weights and Connections for Efficient Neural Network. In Advances in Neural Information Processing Systems; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2015; Volume 28.
  17. Molchanov, P.; Tyree, S.; Karras, T.; Aila, T.; Kautz, J. Pruning Convolutional Neural Networks for Resource Efficient Inference. arXiv 2016, arXiv:1611.06440.
  18. Ding, X.; Ding, G.; Han, J.; Tang, S. Auto-balanced filter pruning for efficient convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
  19. Li, H.; Yue, X.; Wang, Z.; Chai, Z.; Wang, W.; Tomiyama, H.; Meng, L. Optimizing the deep neural networks by layer-wise refined pruning and the acceleration on FPGA. Comput. Intell. Neurosci. 2022, 2022, 8039281.
  20. Li, Y.; Gu, S.; Mayer, C.; Gool, L.V.; Timofte, R. Group sparsity: The hinge between filter pruning and decomposition for network compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8018–8027.
  21. Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning Filters for Efficient ConvNets. arXiv 2016, arXiv:1608.08710.
  22. Luo, J.H.; Wu, J.; Lin, W. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
  23. He, Y.; Kang, G.; Dong, X.; Fu, Y.; Yang, Y. Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks. arXiv 2018, arXiv:1808.06866.
  24. Molchanov, D.; Ashukha, A.; Vetrov, D. Variational dropout sparsifies deep neural networks. arXiv 2017, arXiv:1701.05369.
  25. Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning Efficient Convolutional Networks Through Network Slimming. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
  26. Zhao, C.; Ni, B.; Zhang, J.; Zhao, Q.; Zhang, W.; Tian, Q. Variational Convolutional Neural Network Pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
  27. Ye, J.; Lu, X.; Lin, Z.; Wang, J.Z. Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers. arXiv 2018, arXiv:1802.00124.
  28. Huang, Z.; Wang, N. Data-Driven Sparse Structure Selection for Deep Neural Networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
  29. Lin, S.; Ji, R.; Li, Y.; Deng, C.; Li, X. Toward compact convnets via structure-sparsity regularized filter pruning. IEEE Trans. Neural Networks Learn. Syst. 2019, 31, 574–588.
  30. Dong, X.; Huang, J.; Yang, Y.; Yan, S. More is less: A more complicated network with less inference complexity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5840–5848.
  31. Ding, X.; Chen, H.; Zhang, X.; Huang, K.; Han, J.; Ding, G. Re-parameterizing Your Optimizers rather than Architectures. arXiv 2022, arXiv:2205.15242.
  32. Li, H.; Wang, Z.; Yue, X.; Wang, W.; Tomiyama, H.; Meng, L. An architecture-level analysis on deep learning models for low-impact computations. Artif. Intell. Rev. 2022, 1–40.
  33. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
  34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  35. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
  36. CIFAR-10 and CIFAR-100 Datasets. Available online: https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 3 October 2021).
  37. Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2016, arXiv:1608.03983.
  38. Lin, M.; Ji, R.; Wang, Y.; Zhang, Y.; Zhang, B.; Tian, Y.; Shao, L. HRank: Filter Pruning Using High-Rank Feature Map. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
  39. Kang, M.; Han, B. Operation-aware soft channel pruning using differentiable masks. arXiv 2020, arXiv:2007.03938.
  40. He, Y.; Liu, P.; Wang, Z.; Hu, Z.; Yang, Y. Filter pruning via geometric median for deep convolutional neural networks acceleration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4340–4349.
  41. VitisAI Develop Environment. Available online: https://japan.xilinx.com/products/design-tools/vitis/vitis-ai.html (accessed on 2 March 2022).
  42. Hubara, I.; Nahshan, Y.; Hanani, Y.; Banner, R.; Soudry, D. Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming. arXiv 2020, arXiv:2006.10518.
  43. Yue, X.; Li, H.; Shimizu, M.; Kawamura, S.; Meng, L. YOLO-GD: A Deep Learning-Based Object Detection Algorithm for Empty-Dish Recycling Robots. Machines 2022, 10, 294.
  44. Xue, X.; Zhou, D.; Chen, F.; Yu, X.; Feng, Z.; Duan, Y.; Meng, L.; Zhang, M. From SOA to VOA: A Shift in Understanding the Operation and Evolution of Service Ecosystem. IEEE Trans. Serv. Comput. 2021.
Figure 1. The increase in accuracy drop with increasing feature penalty threshold.
Figure 2. Results of pruning by En-sparsity and random selection. The accuracy is without fine-tuning.
Figure 3. Experimental results of pruning based on a $P$ from 0.5 to 1.0.
Figure 4. Fine-tuning details of ResNet110 on Cifar10 with 1% $ACC_D$. Three iterations are shown from left to right. Only the data when the best accuracy was updated are presented.
Figure 5. The graphs show the compression ratio for FLOPs and the parameters of pruned models after each iteration in the experiments on Cifar10 with 1% $ACC_D$.
Table 1. Training results of base models.

| Dataset | Model | Top-1 (%) | FLOPs (M) | Param. (M) |
| --- | --- | --- | --- | --- |
| Cifar10 | VGG16BN | 94.14 | 314.03 | 14.73 |
| | ResNet56 | 94.45 | 126.56 | 0.85 |
| | ResNet110 | 94.49 | 254.99 | 1.73 |
| | MobileNetV2 | 92.78 | 25.55 | 2.24 |
| Cifar100 | VGG16BN | 73.47 | 314.08 | 14.77 |
| | ResNet56 | 71.61 | 126.56 | 0.86 |
| | ResNet110 | 73.72 | 255.00 | 1.73 |
| | MobileNetV2 | 67.01 | 25.67 | 2.35 |
Table 2. Pruning results of the experiment on Cifar10. "Top-1 ↓", "FLOPs ↓" and "Param. ↓" denote reductions compared to the base networks. The other tables and figures follow the same convention.

| Model | $ACC_D$ | Pruned Top-1 (%) | Pruned FLOPs (M) | Pruned Param. (M) | Top-1 ↓ (%) | FLOPs ↓ (%) | Param. ↓ (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VGG16BN | 1% | 93.60 | 71.92 | 1.22 | 0.54 | 77.10 | 91.72 |
| | 2% | 92.44 | 34.18 | 0.43 | 1.70 | 89.12 | 97.09 |
| | 3% | 91.39 | 43.00 | 0.52 | 2.75 | 86.31 | 96.45 |
| ResNet56 | 1% | 93.54 | 55.34 | 0.27 | 0.91 | 56.27 | 67.77 |
| | 2% | 92.54 | 34.49 | 0.15 | 1.95 | 72.75 | 82.63 |
| | 3% | 91.83 | 32.88 | 0.12 | 2.62 | 74.02 | 86.27 |
| ResNet110 | 1% | 93.94 | 70.31 | 0.36 | 0.76 | 72.43 | 79.28 |
| | 2% | 92.95 | 46.90 | 0.21 | 1.75 | 81.61 | 87.64 |
| | 3% | 91.76 | 34.37 | 0.13 | 2.94 | 86.52 | 92.38 |
| MobileNetV2 | 1% | 91.87 | 9.78 | 0.64 | 0.91 | 61.72 | 71.47 |
| | 2% | 90.82 | 8.80 | 0.54 | 1.96 | 65.56 | 75.70 |
| | 3% | 89.84 | 7.78 | 0.56 | 2.94 | 69.55 | 75.16 |
Table 3. Pruning results of the experiments on Cifar100.

| Model | $ACC_D$ | Pruned Top-1 (%) | Pruned FLOPs (M) | Pruned Param. (M) | Top-1 ↓ (%) | FLOPs ↓ (%) | Param. ↓ (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VGG16BN | 1% | 72.63 | 149.57 | 2.52 | 0.84 | 52.38 | 82.94 |
| | 2% | 71.81 | 115.05 | 1.47 | 1.66 | 63.37 | 90.05 |
| | 3% | 70.55 | 119.54 | 1.26 | 2.92 | 61.94 | 91.47 |
| ResNet56 | 1% | 70.80 | 81.95 | 0.49 | 0.81 | 35.25 | 43.02 |
| | 2% | 69.72 | 64.71 | 0.36 | 1.89 | 48.87 | 58.35 |
| | 3% | 68.76 | 55.47 | 0.30 | 2.85 | 56.17 | 64.75 |
| ResNet110 | 1% | 73.07 | 159.64 | 0.90 | 0.65 | 37.40 | 48.11 |
| | 2% | 72.33 | 128.53 | 0.70 | 1.39 | 49.60 | 59.62 |
| | 3% | 70.78 | 100.24 | 0.44 | 2.94 | 60.69 | 74.49 |
| MobileNetV2 | 1% | 66.04 | 8.09 | 0.68 | 0.97 | 68.48 | 70.92 |
| | 2% | 65.10 | 8.26 | 0.67 | 1.91 | 67.82 | 71.50 |
| | 3% | 64.03 | 7.77 | 0.65 | 2.98 | 69.73 | 72.55 |
Table 4. Pruning results on Cifar10. Ours-1%, Ours-2% and Ours-3% refer to our method with 1%, 2% and 3% $ACC_D$. HRank-β is HRank with a different compression target, and the same applies to the other methods.

| Model | Method | Pruned Top-1 (%) | FLOPs ↓ (%) | Param. ↓ (%) |
| --- | --- | --- | --- | --- |
| ResNet56 | HRank | 93.17 | 50.00 | 42.40 |
| | SCP | 93.23 | 51.50 | 48.47 |
| | FPGM | 93.49 | 52.60 | - |
| | Ours-1% | 93.54 | 56.27 | 67.77 |
| | Hinge | 93.69 | 50.00 | 51.27 |
| | HRank-β | 90.72 | 74.10 | 68.10 |
| | Ours-3% | 91.83 | 74.02 | 86.27 |
| | Ours-2% | 92.54 | 72.75 | 82.63 |
| | Hinge-β | 92.65 | 76.00 | 79.20 |
| ResNet110 | HRank | 93.36 | 58.20 | 59.20 |
| | FPGM | 93.74 | 52.30 | - |
| | Ours-1% | 93.94 | 72.43 | 79.28 |
| | HRank-β | 92.65 | 68.60 | 68.70 |
| | Ours-2% | 92.95 | 81.61 | 84.64 |
Table 5. Average inference time (ms) of the base models on the CPU, GPU, and FPGA.

| Model | Cifar10 CPU | Cifar10 GPU | Cifar10 FPGA | Cifar100 CPU | Cifar100 GPU | Cifar100 FPGA |
| --- | --- | --- | --- | --- | --- | --- |
| VGG16BN | 2.42 | 0.87 | 2.44 | 2.59 | 0.89 | 2.45 |
| ResNet56 | 3.22 | 1.55 | 0.68 | 3.29 | 1.57 | 0.68 |
| ResNet110 | 6.77 | 3.02 | 1.22 | 6.15 | 3.04 | 1.23 |
| MobileNetV2 | 3.54 | 1.20 | 1.49 | 2.96 | 1.19 | 1.51 |
Table 6. Implementation results of the pruned models on the FPGA.

| Model | $ACC_D$ | Cifar10 Top-1 (%) | Cifar10 Time (ms) | Cifar10 Top-1 ↓ (%) | Cifar100 Top-1 (%) | Cifar100 Time (ms) | Cifar100 Top-1 ↓ (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VGG16BN | 1% | 93.59 | 0.47 | 0.55 | 72.31 | 0.704 | 1.16 |
| | 2% | 92.21 | 0.32 | 1.93 | 71.58 | 0.556 | 1.89 |
| | 3% | 91.30 | 0.34 | 2.84 | 70.41 | 0.552 | 3.06 |
| ResNet56 | 1% | 93.36 | 0.52 | 1.09 | 69.73 | 0.608 | 1.88 |
| | 2% | 92.05 | 0.41 | 2.40 | 68.43 | 0.562 | 3.18 |
| | 3% | 91.40 | 0.41 | 3.05 | 68.08 | 0.536 | 3.53 |
| ResNet110 | 1% | 93.50 | 0.74 | 0.99 | 71.68 | 1.06 | 2.04 |
| | 2% | 92.57 | 0.65 | 1.92 | 71.10 | 0.98 | 2.62 |
| | 3% | 91.60 | 0.55 | 2.89 | 68.97 | 0.892 | 4.75 |
| MobileNetV2 | 1% | 91.66 | 0.59 | 1.12 | 65.30 | 0.623 | 1.71 |
| | 2% | 90.42 | 0.55 | 2.36 | 64.62 | 0.628 | 2.39 |
| | 3% | 89.43 | 0.51 | 3.35 | 63.21 | 0.607 | 3.80 |
Table 7. Time reduction of the compressed models compared to the base models' inference on the CPU, GPU, and FPGA.

| Model | $ACC_D$ | Cifar10 vs. CPU ↓ (%) | Cifar10 vs. GPU ↓ (%) | Cifar10 vs. FPGA ↓ (%) | Cifar100 vs. CPU ↓ (%) | Cifar100 vs. GPU ↓ (%) | Cifar100 vs. FPGA ↓ (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VGG16BN | 1% | 80.47 | 45.47 | 80.66 | 72.85 | 21.12 | 71.24 |
| | 2% | 86.97 | 63.61 | 87.09 | 78.56 | 37.70 | 77.29 |
| | 3% | 86.14 | 61.30 | 86.27 | 78.71 | 38.15 | 77.45 |
| ResNet56 | 1% | 83.82 | 66.41 | 22.81 | 81.54 | 61.37 | 10.98 |
| | 2% | 87.33 | 73.70 | 39.56 | 82.94 | 64.29 | 17.72 |
| | 3% | 87.21 | 73.44 | 38.96 | 83.73 | 65.94 | 21.52 |
| ResNet110 | 1% | 89.10 | 75.53 | 39.56 | 82.77 | 65.12 | 13.68 |
| | 2% | 90.37 | 78.38 | 46.60 | 84.07 | 67.75 | 20.20 |
| | 3% | 91.92 | 81.86 | 55.20 | 85.50 | 70.65 | 27.36 |
| MobileNetV2 | 1% | 83.28 | 50.90 | 60.20 | 78.98 | 47.56 | 58.63 |
| | 2% | 84.36 | 54.06 | 62.76 | 78.81 | 47.14 | 58.30 |
| | 3% | 85.63 | 57.80 | 65.79 | 79.52 | 48.91 | 59.69 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
