Pre-Computing Batch Normalisation Parameters for Edge Devices on a Binarized Neural Network
Abstract
1. Introduction
1.1. Binarized Neural Networks
1.2. Batch Normalisation
1.3. Implementing Batch Normalisation on Hardware
1.4. Resource-Constrained Edge Devices
1.4.1. TensorFlow Lite
1.4.2. Larq Compute Engine
1.5. Further Discussion
- Parameter Quantisation: Reducing the precision of real-valued parameters comes at a minor cost to accuracy but can significantly reduce memory usage.
- Layer Grouping and Computation Re-ordering: Equations can be simplified by grouping layers and re-ordering computations so that fewer logic units are needed. This can be performed post-training, before the parameters are loaded onto an edge device. Because the equations are simplified, fewer parameters need to be stored, which lowers memory usage.
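The grouping idea can be sketched with the standard batch-normalisation definition y = γ(x − μ)/√(σ² + ε) + β: after training, the stored values collapse into a single linear transform y = a·x + b that is pre-computed once. A minimal sketch follows; the function name and sample values are illustrative, not taken from the paper.

```python
import numpy as np

def fold_batchnorm(gamma, beta, mean, variance, eps=1e-3):
    """Fold the stored batch-normalisation values (gamma, beta, moving
    mean, moving variance, epsilon) into one linear transform
    y = a * x + b, which can be pre-computed after training."""
    a = gamma / np.sqrt(variance + eps)
    b = beta - a * mean
    return a, b

# Per-channel parameters (illustrative values only).
gamma = np.array([1.0, 0.5])
beta = np.array([0.1, -0.2])
mean = np.array([0.3, 0.7])
var = np.array([4.0, 1.0])

a, b = fold_batchnorm(gamma, beta, mean, var, eps=0.0)
x = np.array([0.3, 0.7])  # inputs equal to the moving means
y = a * x + b             # recovers beta, as the definition requires
```

Folding in this way is what makes the post-training simplification possible: the inference graph only ever sees `a` and `b`, never the division or square root.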
2. Specification and Proposal
2.1. Methodology
2.1.1. Processing Parameters
2.1.2. Grouped Layer Operations
2.2. Reference Network
Pico MNIST BinaryNet
2.3. Proposed Quantization Method
2.3.1. Parameter Quantization
2.3.2. Reduction of Arithmetic Operations
- The process uses 32-bit FLP to pre-compute part of the equation before the parameters are exported. This reduces the quantization error carried by the exported parameters.
- Reducing the parameter count from five to three lowers the amount of on-chip memory required.
- Additional arithmetic units for division or square-root operations are no longer needed; these units occupy more area and take more clock cycles to complete.
- Because the result is a linear equation, the logic block can be reused in a CNN as a MAC operation. Figure 6c shows a generic MAC PE unit.
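The data formats in the tables below are written {sign, integer, fraction} bits. A minimal sketch of such a fixed-point quantiser, assuming round-to-nearest with saturation; `to_fixed_point` is a hypothetical helper for illustration, not the paper's exporter (the authors reference the fxpmath library for this step).

```python
def to_fixed_point(value, int_bits, frac_bits):
    """Quantise a real value to signed fixed-point with the format
    {1, int_bits, frac_bits}: one sign bit, int_bits integer bits and
    frac_bits fractional bits, saturating at the representable range."""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits))      # most negative code
    hi = (1 << (int_bits + frac_bits)) - 1   # most positive code
    code = max(lo, min(hi, round(value * scale)))
    return code / scale                      # quantised real value

# 16-bit FiP with format {1,7,8}, as used for the Pico networks.
q = to_fixed_point(0.1, 7, 8)      # 0.1 rounds to 26/256 = 0.1015625
s = to_fixed_point(300.0, 7, 8)    # saturates at 32767/256
```

Exporting the pre-computed `a` and `b` through a quantiser like this is what trades a small amount of accuracy for the memory reductions reported in the results tables.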
2.4. Proposal Discussion
3. Results
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
ASIC | Application-Specific Integrated Circuit
BN | Batch Normalisation
BNN | Binarized Neural Network
CIFAR | Canadian Institute for Advanced Research
CNN | Convolutional Neural Network
DLNN | Deep Learning Neural Network
FC | Fully-Connected
FiP | Fixed-Point
FLP | Floating-Point
MAC | Multiply-Accumulate
MNIST | Modified National Institute of Standards and Technology
PE | Processing Element
ReLU | Rectified Linear Unit
SRAM | Static Random-Access Memory
XNOR | Exclusive-NOR Gate
References
1. Sayed, R.; Azmi, H.; Shawkey, H.; Khalil, A.H.; Refky, M. A Systematic Literature Review on Binary Neural Networks. IEEE Access 2023, 11, 27546–27578.
2. Chang, J.; Chen, Y.H.; Chan, G.; Cheng, H.; Wang, P.S.; Lin, Y.; Fujiwara, H.; Lee, R.; Liao, H.J.; Wang, P.W.; et al. 15.1 A 5nm 135Mb SRAM in EUV and High-Mobility-Channel FinFET Technology with Metal Coupling and Charge-Sharing Write-Assist Circuitry Schemes for High-Density and Low-VMIN Applications. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 238–240.
3. Chang, C.H.; Chang, V.; Pan, K.; Lai, K.; Lu, J.H.; Ng, J.; Chen, C.; Wu, B.; Lin, C.; Liang, C.; et al. Critical Process Features Enabling Aggressive Contacted Gate Pitch Scaling for 3nm CMOS Technology and Beyond. In Proceedings of the 2022 International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 3–7 December 2022; pp. 27.1.1–27.1.4.
4. Geiger, L.; Team, P. Larq: An Open-Source Library for Training Binarized Neural Networks. J. Open Source Softw. 2020, 5, 1746.
5. Simons, T.; Lee, D.J. A Review of Binarized Neural Networks. Electronics 2019, 8, 661.
6. Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv 2016, arXiv:1602.02830.
7. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv 2014, arXiv:1409.4842.
8. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015.
9. TensorFlow. tf.keras.layers.BatchNormalization. Available online: https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization (accessed on 30 March 2023).
10. Chen, T.; Zhang, Z.; Ouyang, X.; Liu, Z.; Shen, Z.; Wang, Z. “BNN − BN = ?”: Training Binary Neural Networks without Batch Normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Virtual, 19–25 June 2021.
11. Nurvitadhi, E.; Sheffield, D.; Sim, J.; Mishra, A.; Venkatesh, G.; Marr, D. Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC. In Proceedings of the 2016 International Conference on Field-Programmable Technology (FPT), Xi’an, China, 7–9 December 2016.
12. Zhao, R.; Song, W.; Zhang, W.; Xing, T.; Lin, J.H.; Srivastava, M.; Gupta, R.; Zhang, Z. Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2017; pp. 15–24.
13. Noh, S.H.; Park, J.; Park, D.; Koo, J.; Choi, J.; Kung, J. LightNorm: Area and Energy-Efficient Batch Normalization Hardware for On-Device DNN Training. In Proceedings of the 2022 IEEE 40th International Conference on Computer Design (ICCD), Olympic Valley, CA, USA, 23–26 October 2022.
14. Courbariaux, M.; Bengio, Y.; David, J.P. BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015.
15. Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29.
16. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv 2016, arXiv:1603.04467.
17. TensorFlow. Deploy Machine Learning Models on Mobile and Edge Devices. Available online: https://www.tensorflow.org/lite (accessed on 30 March 2023).
18. FPL. FlatBuffers White Paper. Available online: https://flatbuffers.dev/flatbuffers_white_paper.html (accessed on 30 March 2023).
19. David, R.; Duke, J.; Jain, A.; Reddi, V.J.; Jeffries, N.; Li, J.; Kreeger, N.; Nappier, I.; Natraj, M.; Regev, S.; et al. TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems. Proc. Mach. Learn. Syst. 2020, 3, 800–811.
20. Lai, L.; Suda, N.; Chandra, V. CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs. arXiv 2018, arXiv:1801.06601.
21. Bannink, T.; Bakhtiari, A.; Hillier, A.; Geiger, L.; de Bruin, T.; Overweel, L.; Neeven, J.; Helwegen, K. Larq Compute Engine: Design, Benchmark, and Deploy State-of-the-Art Binarized Neural Networks. Proc. Mach. Learn. Syst. 2020, 3, 680–695.
22. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018.
23. LeCun, Y.; Cortes, C.; Burges, C.J. MNIST Handwritten Digit Database. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 30 March 2023).
24. Deng, L. The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Process. Mag. 2012, 29, 141–142.
25. Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 30 March 2023).
26. IEEE Std 754-2019; IEEE Standard for Floating-Point Arithmetic. IEEE: Piscataway, NJ, USA, 2019; pp. 1–84.
27. Alcaraz, F. Fxpmath. Available online: https://github.com/francof2a/fxpmath (accessed on 30 March 2023).
28. Spagnolo, F.; Perri, S.; Corsonello, P. Approximate Down-Sampling Strategy for Power-Constrained Intelligent Systems. IEEE Access 2022, 10, 7073–7081.
29. Yan, F.; Zhang, Z.; Liu, Y.; Liu, J. Design of Convolutional Neural Network Processor Based on FPGA Resource Multiplexing Architecture. Sensors 2022, 22, 5967.
30. Zhang, L.; Bu, X.; Li, B. XNORCONV: CNNs Accelerator Implemented on FPGA Using a Hybrid CNNs Structure and an Inter-Layer Pipeline Method. IET Image Process. 2020, 14, 105–113.
31. González, E.; Luna, W.D.V.; Ariza, C.A.F. A Hardware Accelerator for the Inference of a Convolutional Neural Network. Cienc. Ing. Neogranadina 2019, 30, 107–116.
32. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
33. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
34. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In Computer Vision—Proceedings of the ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part IV; Springer International Publishing: Cham, Switzerland, 2016; pp. 525–542.
35. Martinez, B.; Yang, J.; Bulat, A.; Tzimiropoulos, G. Training Binary Neural Networks with Real-to-Binary Convolutions. arXiv 2020, arXiv:2003.11535.
36. ARM Limited. Cortex-M4 Technical Reference Manual r0p0; ARM Limited: Cambridge, UK, 2010.
Layer Type | Output Size | Weight (1-Bit) | Parameter (32-Bit) | Memory (kB)
---|---|---|---|---
Convolution | | 72 | 8 | 0.04
Max-pooling | | 0 | 0 | 0
BatchNorm | | 0 | 32 | 0.12
Convolution | | 1152 | 16 | 0.2
Max-pooling | | 0 | 0 | 0
BatchNorm | | 0 | 64 | 0.24
Flatten | | 0 | 0 | 0
Dense | | 4000 | 10 | 0.53
BatchNorm | | 0 | 40 | 0.16
Softmax | | 0 | 0 | 0
Total | | 5224 | 170 | 1.29
Network Name | Remarks | Parameter Precision | Data Format 3 | Memory (kB)
---|---|---|---|---
Pico(A) 1 | BNN, with γ and β | 32 FLP | {1,8,23} | 1.29
 | | 32 FiP 2 | {1,8,23} | 1.03
 | | 16 FiP | {1,7,8} | 0.84
 | | 14 FiP | {1,7,6} | 0.81
 | | 12 FiP (A) | {1,7,4} | 0.79
 | | 10 FiP | {1,5,4} | 0.76
Pico(B) 1 | BNN, no γ or β | 32 FLP | {1,8,23} | 1.03
 | | 32 FiP 2 | {1,8,23} | 1.03
 | | 16 FiP | {1,7,8} | 0.84
 | | 14 FiP | {1,7,6} | 0.81
 | | 12 FiP (A) | {1,7,4} | 0.79
 | | 12 FiP (B) | {1,5,6} | 0.76
[28] | CNN, Pooling Estimation | 32 FiP | - | 5.35
[29] | CNN, Reduced Params | 16 FiP | - | 9.13
[30] | Hybrid XNOR-CNN | 1b, 2b, 32 FLP | - | 220.1
[31] | CNN, Hardware and Software Co-Process | 16 FiP | {1,7,8} | 86.8
 | | 12 FiP | {1,5,6} | 65.1
Network Name | Parameter Precision | Accuracy (%) | Memory (kB) | Normalised Accuracy | Normalised Memory
---|---|---|---|---|---
Pico(A) | 32 FLP | 96.11 | 1.29 | 1.0 | 1.0
 | 32 FiP | 96.11 | 1.03 | 1.0 | 0.8
 | 16 FiP | 96.15 | 0.84 | 1.0 | 0.65
 | 14 FiP | 96.00 | 0.81 | 1.0 | 0.63
 | 12 FiP (A) | 94.74 | 0.79 | 0.99 | 0.61
 | 10 FiP | 43.18 | 0.76 | 0.45 | 0.59
Pico(B) | 32 FLP | 94.58 | 1.03 | 0.98 | 0.8
 | 32 FiP | 94.58 | 1.03 | 0.98 | 0.8
 | 16 FiP | 94.37 | 0.84 | 0.98 | 0.65
 | 14 FiP | 92.94 | 0.81 | 0.97 | 0.63
 | 12 FiP (A) | 11.35 | 0.79 | 0.12 | 0.61
 | 12 FiP (B) | 55.44 | 0.76 | 0.58 | 0.61
[28] | 32 FiP | 96.3 | 5.35 | 1.0 | 4.1
[29] | 16 FiP | 97.3 | 9.13 | 1.01 | 7.1
[30] | 1b, 2b, 32 FLP | 98.4 | 220.1 | 1.02 | 170.6
[31] | 16 FiP | 98.70 | 86.8 | 1.03 | 67.3
 | 12 FiP | 97.59 | 65.1 | 1.02 | 50.5
Network | Method | Accuracy (Top-1%) | Parameter Utilisation 1 (%) | Memory Savings (kB)
---|---|---|---|---
MobileNet [33] | Typical | 70.6 | 0.96 | -
 | Ours | - | | 80
XNOR-Net [34] | Typical | 45.0 | 0.48 | -
 | Ours | - | | 0
Real-to-Binary [35] | Typical | 65.0 | 1.29 | -
 | Ours | - | | 34
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Phipps, N.; Shang, J.-J.; Teo, T.H.; Wey, I.-C. Pre-Computing Batch Normalisation Parameters for Edge Devices on a Binarized Neural Network. Sensors 2023, 23, 5556. https://doi.org/10.3390/s23125556