# Block-Based Compression and Corresponding Hardware Circuits for Sparse Activations


## Abstract


## 1. Introduction

- Within a single layer of a CNN model, most feature maps are either highly dense or highly sparse. Consider layer 2 of vgg16: Figure 1 shows the feature maps of its first eight channels, with zero values drawn in white and nonzero values drawn in black. Channels CH2, CH3 and CH5 are highly dense, while channels CH1, CH4, CH6, CH7 and CH8 are highly sparse. Consequently, within one feature map, two adjacent pixel locations usually have the same color, that is, both are zero or both are nonzero. Two adjacent pixel locations, which we call a block, can therefore very often share a single indication bit.
- Within a single layer of a CNN model, the feature maps of different channels are often similar. Consider again the first eight feature maps in layer 2 of vgg16. As Figure 1 shows, channels CH2, CH3 and CH5 depict a white dog, while channels CH1, CH4, CH6, CH7 and CH8 depict a black dog; all eight feature maps are essentially pictures of the same dog. If the colors of CH2, CH3 and CH5 are inverted, we obtain the eight feature maps shown in Figure 2, which closely resemble one another. Because of this similarity, multiple channels can be considered together during compression.
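The two observations above can be checked numerically on any feature map. The following is a minimal sketch, not the proposed Algorithm 1: the helper names (`nonzero_mask`, `block_agreement`, `channel_agreement`), the horizontal two-pixel block shape, and the tiny 4×4 example channels are all assumptions made for illustration.

```python
from typing import List

Mask = List[List[int]]  # 1 = nonzero activation, 0 = zero activation


def nonzero_mask(fmap: List[List[float]]) -> Mask:
    """Per-pixel indication bits: 1 where the activation is nonzero."""
    return [[1 if v != 0 else 0 for v in row] for row in fmap]


def block_agreement(mask: Mask) -> float:
    """Fraction of horizontal 2-pixel blocks whose two bits are equal.

    Assumption: a block is two horizontally adjacent pixels.
    """
    same = total = 0
    for row in mask:
        for x in range(0, len(row) - 1, 2):
            total += 1
            same += row[x] == row[x + 1]
    return same / total if total else 0.0


def channel_agreement(a: Mask, b: Mask) -> float:
    """Fraction of positions where two channels' bits match.

    The better of the direct and the inverted comparison is returned,
    mirroring the color reversal used to obtain Figure 2.
    """
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    frac = same / total
    return max(frac, 1.0 - frac)


# Hypothetical 4x4 channels: one mostly zero, one mostly nonzero.
ch_sparse = [[0, 0, 0, 0], [0, 7, 3, 2], [0, 0, 1, 4], [0, 0, 0, 0]]
ch_dense = [[5, 6, 7, 8], [1, 2, 0, 0], [3, 4, 0, 0], [9, 1, 2, 3]]

m_sparse, m_dense = nonzero_mask(ch_sparse), nonzero_mask(ch_dense)
print(block_agreement(m_sparse))             # share of uniform blocks
print(channel_agreement(m_sparse, m_dense))  # similarity after inversion
```

A block agreement close to 1 indicates that one indication bit per block loses little information, and a channel agreement close to 1 indicates that the indication bits of one channel largely predict those of another.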

## 2. Related Works

## 3. Proposed Approach

Algorithm 1: Proposed Block-Based Compression

## 4. Experiment Results

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Aloysius, N.; Geetha, M. A Review on Deep Convolutional Neural Networks. In Proceedings of the IEEE International Conference on Communication and Signal Processing (ICCSP), Chennai, India, 6–8 April 2017; pp. 588–592.
2. Sze, V.; Yang, T.-J.; Chen, Y.-H.; Emer, J. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proc. IEEE **2017**, 105, 2295–2329.
3. Amato, G.; Carrara, F.; Falchi, F.; Gennaro, C.; Meghini, C.; Vairo, C. Deep Learning for Decentralized Parking Lot Occupancy Detection. Expert Syst. Appl. **2017**, 32, 327–334.
4. Chun, C.; Lee, T.; Kwon, S.; Ryu, S.K. Classification and Segmentation of Longitudinal Road Marking Using Convolutional Neural Networks for Dynamic Retroreflection Estimation. Sensors **2020**, 20, 5560.
5. Cheong, J.Y.; Park, I.K. Deep CNN-Based Super-Resolution Using External and Internal Examples. IEEE Signal Process. Lett. **2017**, 24, 1252–1256.
6. Vargas, E.; Hopgood, J.R.; Brown, K.; Subr, K. On Improved Training of CNN for Acoustic Source Localisation. IEEE Trans. Audio Speech Lang. Process. **2021**, 29, 720–732.
7. Gupta, H.; Jin, K.H.; Nguyen, H.Q.; McCann, M.T.; Unser, M. CNN-Based Projected Gradient Descent for Consistent CT Image Reconstruction. IEEE Trans. Med. Imaging **2018**, 37, 1440–1453.
8. Marsi, S.; Bhattacharya, J.; Molina, R.; Ramponi, G. A Non-Linear Convolution Network for Image Processing. Electronics **2021**, 10, 201.
9. Zhang, S.; Du, Z.; Zhang, L.; Lan, H.; Liu, S.; Li, L.; Guo, Q.; Chen, T.; Chen, Y. Cambricon-X: An Accelerator for Sparse Neural Networks. In Proceedings of the IEEE International Symposium on Microarchitecture (MICRO), Taipei, Taiwan, 15–19 October 2016; pp. 1–12.
10. Sze, V.; Chen, Y.-H.; Emer, J.; Suleiman, A.; Zhang, Z. Hardware for Machine Learning: Challenges and Opportunities. In Proceedings of the IEEE Custom Integrated Circuits Conference (CICC), Austin, TX, USA, 30 April–3 May 2017; pp. 1–8.
11. Chen, Y.; Krishna, T.; Emer, J.S.; Sze, V. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE J. Solid-State Circuits **2017**, 52, 127–138.
12. Zhang, X.; Wang, J.; Zhu, C.; Lin, Y.; Xiong, J.; Hwu, W.M.; Chen, D. DNNBuilder: An Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs. In Proceedings of the IEEE International Conference on Computer-Aided Design (ICCAD), San Diego, CA, USA, 5–8 November 2018; pp. 1–8.
13. Chen, Y.; Yang, T.; Emer, J.; Sze, V. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. IEEE J. Emerg. Sel. Top. Circuits Syst. **2019**, 9, 292–308.
14. Lin, W.-H.; Kao, H.-Y.; Huang, S.-H. A Design Framework for Hardware Approximation of Deep Neural Networks. In Proceedings of the IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Taipei, Taiwan, 3–6 December 2019; pp. 1–2.
15. Nabavinejad, S.M.; Baharloo, M.; Chen, K.C.; Palesi, M.; Kogel, T.; Ebrahimi, M. An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators. IEEE J. Emerg. Sel. Top. Circuits Syst. **2020**, 10, 268–282.
16. Yuan, Z.; Liu, Y.; Yue, J.; Yang, Y.; Wang, J.; Feng, X.; Zhao, J.; Li, X.; Yang, H. STICKER: An Energy-Efficient Multi-Sparsity Compatible Accelerator for Convolutional Neural Networks in 65-nm CMOS. IEEE J. Solid-State Circuits **2020**, 55, 465–477.
17. Zhao, Y.; Lu, J.; Chen, X. An Accelerator Design Using a MTCA Decomposition Algorithm for CNNs. Sensors **2020**, 20, 5558.
18. Kao, H.-Y.; Chen, X.-J.; Huang, S.-H. Convolver Design and Convolve-Accumulate Unit Design for Low-Power Edge Computing. Sensors **2021**, 21, 5081.
19. Chen, T.; Du, Z.; Sun, N.; Wang, J.; Wu, C.; Chen, Y.; Temam, O. DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine Learning. In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Salt Lake City, UT, USA, 1–5 March 2014; pp. 269–284.
20. Parashar, A.; Rhu, M.; Mukkara, A.; Puglielli, A.; Venkatesan, R.; Khailany, B.; Emer, J.; Keckler, S.W.; Dally, W.J. SCNN: An Accelerator for Compressed-Sparse Convolutional Neural Networks. In Proceedings of the IEEE International Symposium on Computer Architecture (ISCA), Toronto, ON, Canada, 24–28 June 2017; pp. 27–40.
21. Wang, Y.; Li, H.; Li, X. A Case of On-Chip Memory Subsystem Design for Low-Power CNN Accelerators. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. **2018**, 37, 1971–1984.
22. Keras Applications. Available online: https://keras.io/api/applications/ (accessed on 18 September 2021).
23. Albericio, J.; Judd, P.; Hetherington, T.; Aamodt, T.; Jerger, N.E.; Moshovos, A. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. ACM SIGARCH Comput. Archit. News **2016**, 44, 1–13.
24. Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016; pp. 1–9. Available online: https://arxiv.org/abs/1510.00149 (accessed on 18 September 2021).
25. Zhu, M.H.; Gupta, S. To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression. Available online: https://arxiv.org/abs/1710.01878 (accessed on 18 September 2021).
26. Lin, C.-Y.; Lai, B.-C. Supporting Compressed-Sparse Activations and Weights on SIMD-like Accelerator for Sparse Convolutional Neural Networks. In Proceedings of the IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), Jeju, Korea, 22–25 January 2018; pp. 105–110.
27. Lai, B.-C.; Pan, J.-W.; Lin, C.-Y. Enhancing Utilization of SIMD-Like Accelerator for Sparse Convolutional Neural Networks. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. **2019**, 27, 1218–1222.
28. Abdelgawad, A.; Bayoumi, M. High Speed and Area-Efficient Multiply Accumulate (MAC) Unit for Digital Signal Processing Applications. In Proceedings of the IEEE International Symposium on Circuits and Systems, New Orleans, LA, USA, 27–30 May 2007; pp. 3199–3202.
29. Hoang, T.T.; Sjalander, M.; Larsson-Edefors, P. A High-Speed Energy-Efficient Two-Cycle Multiply-Accumulate (MAC) Architecture and Its Application to a Double-Throughput MAC Unit. IEEE Trans. Circuits Syst. **2010**, 57, 3073–3081.
30. Tung, C.-W.; Huang, S.-H. A High-Performance Multiply-Accumulate Unit by Integrating Additions and Accumulations into Partial Product Reduction Process. IEEE Access **2020**, 8, 87367–87377.
31. Choukroun, Y.; Kravchik, E.; Yang, F.; Kisilev, P. Low-bit Quantization of Neural Networks for Efficient Inference. In Proceedings of the IEEE International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27–28 October 2019; pp. 3009–3018.
32. Kim, H.; Lee, K.; Shin, D. Towards Accurate Low Bit DNNs with Filter-wise Quantization. In Proceedings of the IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Seoul, Korea, 1–3 November 2020; pp. 1–4.

**Figure 3.** The hardware design of the direct indexing module [9].

**Figure 4.** The hardware design of the dual indexing module [26].

**Figure 5.** The hardware design of the encoder [26].

| vgg16 | ResNet50 | ResNet50v2 | MobileNet | MobileNetv2 | DenseNet121 |
|---|---|---|---|---|---|
| 49.0% | 33.1% | 39.5% | 49.4% | 65.7% | 48.7% |

| Cnvlutin [23] | Cambricon-X [9] | Dual Indexing [26] | Ours |
|---|---|---|---|
| 16,208 $\mu\mathrm{m}^2$ | 12,112 $\mu\mathrm{m}^2$ | 30,108 $\mu\mathrm{m}^2$ | 13,599 $\mu\mathrm{m}^2$ |
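To make the comparison in the table above concrete, the short sketch below computes how much larger or smaller the proposed design is than each baseline. It assumes the values are circuit areas in µm², as the units suggest; the helper name `relative_change` is introduced here for illustration only.

```python
# Areas in square micrometers, copied from the table above.
areas_um2 = {
    "Cnvlutin [23]": 16208,
    "Cambricon-X [9]": 12112,
    "Dual Indexing [26]": 30108,
    "Ours": 13599,
}


def relative_change(ours: float, baseline: float) -> float:
    """Signed percentage change of `ours` relative to `baseline`."""
    return (ours - baseline) / baseline * 100.0


for name, area in areas_um2.items():
    if name != "Ours":
        change = relative_change(areas_um2["Ours"], area)
        print(f"{name}: {change:+.1f}%")
```

A negative percentage means the proposed design occupies less area than that baseline, and a positive one means it occupies more.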

| CNN Model | Cnvlutin | Cambricon-X | Dual Indexing | Ours |
|---|---|---|---|---|
| vgg16 | 71.3% | 71.3% | 71.3% | 71.3% |
| ResNet50 | 74.9% | 74.9% | 74.9% | 74.9% |
| ResNet50v2 | 76.0% | 76.0% | 76.0% | 76.0% |
| MobileNet | 70.4% | 70.4% | 70.4% | 70.4% |
| MobileNetv2 | 71.3% | 71.3% | 71.3% | 71.3% |
| DenseNet121 | 75.0% | 75.0% | 75.0% | 75.0% |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Weng, Y.-K.; Huang, S.-H.; Kao, H.-Y. Block-Based Compression and Corresponding Hardware Circuits for Sparse Activations. *Sensors* **2021**, *21*, 7468.
https://doi.org/10.3390/s21227468
