HW-ADAM: FPGA-Based Accelerator for Adaptive Moment Estimation
Abstract
1. Introduction
2. Background Knowledge
2.1. Overview of Adaptive Moment Estimation
Algorithm 1 The workflow of the original ADAM [31].

Require: $\alpha$: Stepsize
Require: $\beta_1, \beta_2 \in [0, 1)$: Exponential decay rates for the moment estimates
Require: $f(\theta)$: Stochastic objective function with parameters $\theta$
Require: $\theta_0$: Initial parameter vector
1: $m_0 \leftarrow 0$ (initialize 1st moment vector)
2: $v_0 \leftarrow 0$ (initialize 2nd moment vector)
3: $t \leftarrow 0$ (initialize timestep)
4: while $\theta_t$ not converged do
5:  $t \leftarrow t + 1$
6:  $g_t \leftarrow \nabla_\theta f_t(\theta_{t-1})$ (gradient of the stochastic objective at timestep $t$)
7:  $m_t \leftarrow \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$ (update biased 1st moment estimate)
8:  $v_t \leftarrow \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2$ (update biased 2nd moment estimate)
9:  $\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$, where $\hat{m}_t = m_t / (1 - \beta_1^t)$ and $\hat{v}_t = v_t / (1 - \beta_2^t)$ (bias-correct and update parameters)
end while
return $\theta_t$ (resulting parameters)
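For concreteness, one update step (lines 5–9) can be sketched in NumPy. This is a minimal software reference only; the function name `adam_step` and the default hyperparameter values (the defaults suggested by Kingma and Ba [31]) are illustrative choices, not part of the hardware design.

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update of parameters theta given gradient g at timestep t >= 1."""
    m = beta1 * m + (1.0 - beta1) * g        # line 7: biased 1st moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g    # line 8: biased 2nd moment estimate
    m_hat = m / (1.0 - beta1 ** t)           # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # line 9: update
    return theta, m, v
```

Note that line 9 costs one square root and one division per parameter update, which is exactly the arithmetic the designs in Section 3 target.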
2.2. Overview of Fast Inverse Square Root
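The technique analyzed by Lomont [32] approximates $1/\sqrt{x}$ for a positive float32 $x$ with a magic-constant integer manipulation of the floating-point bit pattern, followed by one Newton–Raphson refinement. A minimal scalar sketch in Python, using the classic constant 0x5f3759df studied in [32]:

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for x > 0 with the Fast InvSqrt bit trick."""
    # Reinterpret the float32 bit pattern of x as a signed 32-bit integer.
    i = struct.unpack('<i', struct.pack('<f', x))[0]
    # Magic constant minus half the bits yields a coarse initial estimate.
    i = 0x5f3759df - (i >> 1)
    y = struct.unpack('<f', struct.pack('<i', i))[0]
    # One Newton-Raphson refinement: y <- y * (1.5 - 0.5 * x * y * y).
    return y * (1.5 - 0.5 * x * y * y)
```

With a single refinement step the relative error stays below roughly 0.2%; for example, `fast_inv_sqrt(4.0)` returns about 0.499 against the exact 0.5.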
3. Proposed Method
3.1. Design of Efficient-ADAM
Algorithm 2 The workflow of E-ADAM.

Require: $\alpha$: Stepsize
Require: $\beta_1, \beta_2$: Exponential decay rates for the moment estimates
Require: $f(\theta)$: Stochastic objective function with parameters $\theta$
Require: $\theta_0$: Initial parameter vector
1: $m_0 \leftarrow 0$ (initialize 1st moment vector)
2: $v_0 \leftarrow 0$ (initialize 2nd moment vector)
3: $t \leftarrow 0$ (initialize timestep)
4: while $\theta_t$ not converged do
5:  $t \leftarrow t + 1$
6:  $g_t \leftarrow \nabla_\theta f_t(\theta_{t-1})$
7:  $m_t \leftarrow \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$
8:  $v_t \leftarrow \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2$
9:  $r_t \leftarrow \mathrm{InvSqrt}(\hat{v}_t)$ (the Fast InvSqrt module of Section 2.2 approximates $1/\sqrt{\hat{v}_t}$, with $\hat{v}_t = v_t / (1 - \beta_2^t)$)
10: $\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t \cdot r_t$ (update with a multiplication instead of a division; $\hat{m}_t = m_t / (1 - \beta_1^t)$)
end while
return $\theta_t$
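Under this reading of Algorithm 2, the only change to the update arithmetic relative to Algorithm 1 is that the square root and division of line 9 become a multiplication with the Fast InvSqrt output. A NumPy sketch under that assumption; the vectorized helper `fast_inv_sqrt_np` and the `eps` guard on $\hat{v}_t$ are illustrative additions of this sketch, standing in for the hardware InvSqrt module:

```python
import numpy as np

def fast_inv_sqrt_np(x):
    """Vectorized Fast InvSqrt on float32 data: bit trick plus one Newton step."""
    x = np.asarray(x, dtype=np.float32)
    i = x.view(np.int32)
    i = np.int32(0x5f3759df) - (i >> 1)       # magic-constant initial guess
    y = i.view(np.float32)
    return y * (np.float32(1.5) - np.float32(0.5) * x * y * y)

def e_adam_step(theta, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One E-ADAM update: multiply by ~1/sqrt(v_hat) instead of dividing."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Lines 9-10: r_t = InvSqrt(v_hat), then theta <- theta - alpha * m_hat * r_t.
    theta = theta - alpha * m_hat * fast_inv_sqrt_np(v_hat + eps)
    return theta, m, v
```

Trading the divider and square-root unit for shifts, adds, and multiplies is consistent with the resource figures in Section 4.3, where the E-ADAM unit uses 2 DSP slices against 10 for the baseline ADAM unit.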
3.2. Design of Fast-ADAM
4. Experimental Results
4.1. Accuracy Validation of the Proposed Approximation
4.2. Functionality Validation of the Proposed Design
4.3. Efficiency Validation of the Proposed Design
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ADAM | Adaptive moment estimation |
GD | Gradient descent |
SGD | Stochastic gradient descent |
MUOP/s | Million update operations per second
DNN | Deep neural network |
CNN | Convolutional neural network |
DRL | Deep reinforcement learning |
RL | Reinforcement learning |
GPU | Graphics processing unit
ASIC | Application-specific integrated circuit
FPGA | Field programmable gate array |
FP | Forward propagation |
BP | Backward propagation |
WG | Weight gradient update |
Fast InvSqrt | Fast inverse square root |
PPO | Proximal policy optimization |
FF | Flip-flop |
LUT | Look-up table |
DSP | Digital signal processing unit
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Chen, C.; Seff, A.; Kornhauser, A.; Xiao, J. DeepDriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2722–2730.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Sermanet, P.; Hadsell, R.; Scoffier, M.; Grimes, M.; Ben, J.; Erkan, A.; Crudele, C.; Miller, U.; LeCun, Y. A multirange architecture for collision-free off-road robot navigation. J. Field Robot. 2009, 26, 52–87.
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of Go without human knowledge. Nature 2017, 550, 354–359.
- Machupalli, R.; Hossain, M.; Mandal, M. Review of ASIC accelerators for deep neural network. Microprocess. Microsyst. 2022, 89, 104441.
- Shawahna, A.; Sait, S.M.; El-Maleh, A. FPGA-based accelerators of deep learning networks for learning and classification: A review. IEEE Access 2018, 7, 7823–7859.
- Misra, J.; Saha, I. Artificial neural networks in hardware: A survey of two decades of progress. Neurocomputing 2010, 74, 239–255.
- Esmaeilzadeh, H.; Sampson, A.; Ceze, L.; Burger, D. Neural acceleration for general-purpose approximate programs. In Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, Vancouver, BC, Canada, 1–5 December 2012; pp. 449–460.
- Han, S.; Liu, X.; Mao, H.; Pu, J.; Pedram, A.; Horowitz, M.A.; Dally, W.J. EIE: Efficient inference engine on compressed deep neural network. ACM SIGARCH Comput. Archit. News 2016, 44, 243–254.
- Du, L.; Du, Y.; Li, Y.; Su, J.; Kuan, Y.C.; Liu, C.C.; Chang, M.C.F. A reconfigurable streaming deep convolutional neural network accelerator for Internet of Things. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 65, 198–208.
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
- Zhang, C.; Li, P.; Sun, G.; Guan, Y.; Xiao, B.; Cong, J. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2015; pp. 161–170.
- Yuan, Z.; Yue, J.; Yang, H.; Wang, Z.; Li, J.; Yang, Y.; Guo, Q.; Li, X.; Chang, M.F.; Yang, H.; et al. Sticker: A 0.41–62.1 TOPS/W 8-bit neural network processor with multi-sparsity compatible convolution arrays and online tuning acceleration for fully connected layers. In Proceedings of the 2018 IEEE Symposium on VLSI Circuits, Honolulu, HI, USA, 18–22 June 2018; pp. 33–34.
- Ueyoshi, K.; Ando, K.; Hirose, K.; Takamaeda-Yamazaki, S.; Kadomoto, J.; Miyata, T.; Hamada, M.; Kuroda, T.; Motomura, M. QUEST: A 7.49 TOPS multi-purpose log-quantized DNN inference engine stacked on 96 MB 3D SRAM using inductive-coupling technology in 40 nm CMOS. In Proceedings of the 2018 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 5–9 February 2018; pp. 216–218.
- Lee, J.; Lee, J.; Han, D.; Lee, J.; Park, G.; Yoo, H.J. An energy-efficient sparse deep-neural-network learning accelerator with fine-grained mixed precision of FP8–FP16. IEEE Solid-State Circuits Lett. 2019, 2, 232–235.
- Dai, P.; Yang, J.; Ye, X.; Cheng, X.; Luo, J.; Song, L.; Chen, Y.; Zhao, W. SparseTrain: Exploiting dataflow sparsity for efficient convolutional neural networks training. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 20–24 July 2020; pp. 1–6.
- Zhang, W.; Jiang, Y.; Farrukh, F.U.D.; Zhang, C.; Xie, X. A portable accelerator of proximal policy optimization for robots. In Proceedings of the 2021 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), Zhuhai, China, 24–26 November 2021; pp. 171–172.
- Imani, M.; Gupta, S.; Kim, Y.; Rosing, T. FloatPIM: In-memory acceleration of deep neural network training with high precision. In Proceedings of the 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA), Phoenix, AZ, USA, 22–26 June 2019; pp. 802–815.
- Yang, Y.; Deng, L.; Wu, S.; Yan, T.; Xie, Y.; Li, G. Training high-performance and large-scale deep neural networks with full 8-bit integers. Neural Netw. 2020, 125, 70–82.
- Zhu, F.; Gong, R.; Yu, F.; Liu, X.; Wang, Y.; Li, Z.; Yang, X.; Yan, J. Towards unified INT8 training for convolutional neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1969–1979.
- Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 2015, 28, 1135–1143.
- Yang, D.; Ghasemazar, A.; Ren, X.; Golub, M.; Lemieux, G.; Lis, M. Procrustes: A dataflow and accelerator for sparse deep neural network training. In Proceedings of the 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece, 17–21 October 2020; pp. 711–724.
- Choi, D.; Shallue, C.J.; Nado, Z.; Lee, J.; Maddison, C.J.; Dahl, G.E. On empirical comparisons of optimizers for deep learning. arXiv 2019, arXiv:1910.05446.
- Robbins, H.; Monro, S. A stochastic approximation method. Ann. Math. Stat. 1951, 22, 400–407.
- Cho, H.; Oh, P.; Park, J.; Jung, W.; Lee, J. FA3C: FPGA-accelerated deep reinforcement learning. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Providence, RI, USA, 13–17 April 2019; pp. 499–513.
- Yang, J.; Hong, S.; Kim, J.Y. FIXAR: A fixed-point deep reinforcement learning platform with quantization-aware training and adaptive parallelism. In Proceedings of the 2021 58th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 5–9 December 2021; pp. 259–264.
- Zhao, Y.; Liu, C.; Du, Z.; Guo, Q.; Hu, X.; Zhuang, Y.; Zhang, Z.; Song, X.; Li, W.; Zhang, X.; et al. Cambricon-Q: A hybrid architecture for efficient training. In Proceedings of the 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, 14–18 June 2021; pp. 706–719.
- Kara, K.; Alistarh, D.; Alonso, G.; Mutlu, O.; Zhang, C. FPGA-accelerated dense linear machine learning: A precision-convergence trade-off. In Proceedings of the 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Napa, CA, USA, 30 April–2 May 2017; pp. 160–167.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Lomont, C. Fast Inverse Square Root; Technical Report; Purdue University: Indianapolis, IN, USA, 2003.
- Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17.
- Nesterov, Y.E. A method for solving the convex programming problem with convergence rate O(1/k²). Dokl. Akad. Nauk SSSR 1983, 269, 543–547.
- Ellenberger, B. PyBullet Gymperium. 2018–2019. Available online: https://github.com/benelot/pybullet-gym (accessed on 6 September 2021).
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Tieleman, T.; Hinton, G. Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 2012, 4, 26–31.
Resource comparison of the proposed updating units against the original ADAM unit and the RMSProp-based FA3C accelerator [27].

| | E-ADAM | F-ADAM | Original ADAM | FA3C [27] |
|---|---|---|---|---|
| Algorithm | ADAM | ADAM | ADAM | RMSProp |
| FF | 729 | 686 | 2630 | 8.1k |
| LUT | 1043 | 2297 | 7082 | 6.7k |
| DSP | 2 | 14 | 10 | 28 |
Implementation results of the baseline ADAM unit and the proposed E-ADAM and F-ADAM units.

| | ADAM | E-ADAM | F-ADAM |
|---|---|---|---|
| FF | 2630 | 729 | 686 |
| LUT | 7082 | 1043 | 2297 |
| DSP | 10 | 2 | 14 |
| Execution clocks | 112 | 53 | 7 |
| Max frequency (MHz) | 115 | 153 | 117 |
| Throughput (MUOP/s) | 1.02 | 2.89 | 16.7 |
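As a consistency check, the throughput row is simply the maximum frequency divided by the execution clocks per update: 115 MHz/112 ≈ 1.02 MUOP/s for the baseline ADAM unit, 153 MHz/53 ≈ 2.89 MUOP/s for E-ADAM, and 117 MHz/7 ≈ 16.7 MUOP/s for F-ADAM.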