Low-Power FPGA Implementation of Convolution Neural Network Accelerator for Pulse Waveform Classification
Abstract
1. Introduction
2. System Design
2.1. System Design Flow
2.2. Data Collection and Preprocessing
2.3. Algorithm Design: Structure of CNN
2.4. Hardware System Design
Algorithm 1. Algorithm of CONV layer.

    for j in range(n):                  # loop 1
        for i in range(m):              # loop 1.1
            load input[i][:];
            load weights[j][i][3];
            conv_out[j][i][:] = CONV(input[i][:], weights[j][i]);
            store conv_out[j][i][:];
    for j in range(n):                  # loop 2
        for k in range(l):              # loop 2.1
            load conv_out[j][:][k];
            accu_out[j][k] = ACCU(conv_out[j][:][k]);
            accu_out[j][k] += bias[j]
    for j in range(n):                  # loop 3
        for k in range(l):              # loop 3.1
            relu_out[j][k] = accu_out[j][k]>0 ? accu_out[j][k]:0;
        for k in range(l/2):            # loop 3.2
            maxp_out[j][k] = max(relu_out[j][2*k], relu_out[j][2*k+1]);
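The three loop nests of Algorithm 1 can be mirrored in a short software reference model, which may help when checking hardware output against expected values. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes a 3-tap kernel (suggested by `weights[j][i][3]`), zero padding so the output length equals the input length `l`, and an even `l`; the helper names `conv1d_same` and `conv_layer` are hypothetical.

```python
def conv1d_same(x, w):
    """Slide kernel w over x with zero padding so the output has len(x) samples.
    Assumption: zero 'same' padding; the source does not state the padding mode."""
    pad = len(w) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(padded[k + t] * w[t] for t in range(len(w)))
            for k in range(len(x))]


def conv_layer(inputs, weights, bias):
    """Reference model of one CONV layer: CONV, ACCU + bias, ReLU, 2:1 max pooling.

    inputs  : m input channels, each a list of l samples
    weights : n x m kernels (weights[j][i] is the kernel for output j, input i)
    bias    : n bias values
    """
    m, n, l = len(inputs), len(weights), len(inputs[0])

    # Loop 1: convolve every input channel i with the kernel for output channel j.
    conv_out = [[conv1d_same(inputs[i], weights[j][i]) for i in range(m)]
                for j in range(n)]

    # Loop 2: accumulate across input channels, then add the per-channel bias.
    accu_out = [[sum(conv_out[j][i][k] for i in range(m)) + bias[j]
                 for k in range(l)]
                for j in range(n)]

    # Loop 3.1: ReLU.
    relu_out = [[v if v > 0 else 0.0 for v in row] for row in accu_out]

    # Loop 3.2: max pooling over non-overlapping pairs halves the length.
    return [[max(row[2 * k], row[2 * k + 1]) for k in range(l // 2)]
            for row in relu_out]


if __name__ == "__main__":
    # One input channel, one output channel, identity kernel.
    print(conv_layer([[1, 2, 3, 4]], [[[0, 1, 0]]], [0]))
```

With the identity kernel in the usage lines, the convolution leaves the signal unchanged, so the result is just the pairwise maximum of the input samples.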
2.4.1. System Architecture Design
2.4.2. Computation Modules Design
2.4.3. Control Modules Design
3. Optimization Methods and Results
3.1. Network Model Design and Parameter Reduction
- Downsample the data. Downsampling the input waveform significantly reduces the amount of computation while classification accuracy remains high.
- Use more CONV layers to reduce the FC layer’s weights. The more CONV layers there are, the shorter the FC layer’s input tensor and the fewer parameters the FC layer needs.
- Modify the CONV layers’ structure to reduce their parameters. Reducing the ratio of output channels to input channels effectively reduces the number of parameters in the CONV layers.
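The effect of the last two points can be estimated with simple parameter counting. The sketch below is illustrative only and rests on several assumptions: 3-tap kernels (suggested by `weights[j][i][3]` in Algorithm 1), one bias per output channel, 2:1 max pooling after every CONV layer, and a hypothetical input length of 512 samples chosen because it yields the 128-element FC input implied by the (128,100) FC layer. The channel lists are the first and last configurations of the network structure table, assuming the two trailing (32–32) CONV layers are shared by all configurations.

```python
KERNEL_SIZE = 3  # assumption: 3-tap 1-D kernels


def conv_params(channels, kernel_size=KERNEL_SIZE):
    """Weights plus biases for a chain of 1-D CONV layers.
    channels is a list of (input_channels, output_channels) pairs."""
    return sum(c_in * c_out * kernel_size + c_out for c_in, c_out in channels)


def fc_params(shapes):
    """Weights plus biases for FC layers given (inputs, outputs) pairs."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


def fc_input_size(length, n_conv, out_channels):
    """Flattened FC input size when each CONV layer halves the signal length."""
    return (length >> n_conv) * out_channels


SHARED_TAIL = [(32, 32), (32, 32)]  # assumed common to all configurations
STRUCT_1 = [(1, 16), (16, 32), (32, 32), (32, 32), (32, 32)] + SHARED_TAIL
STRUCT_6 = [(1, 2), (2, 4), (4, 8), (8, 16), (16, 32)] + SHARED_TAIL
FC_LAYERS = [(128, 100), (100, 4)]

if __name__ == "__main__":
    print("CONV parameters, configuration 1:", conv_params(STRUCT_1))
    print("CONV parameters, configuration 6:", conv_params(STRUCT_6))
    print("FC parameters:", fc_params(FC_LAYERS))
    # More CONV layers (each with 2:1 pooling) shrink the FC input.
    for n_conv in (5, 6, 7):
        print(n_conv, "CONV layers -> FC input size",
              fc_input_size(512, n_conv, 32))
```

Under these assumptions, growing the channel count gradually (configuration 6) roughly halves the CONV parameter count relative to configuration 1, and each extra pooled CONV layer halves the first FC layer's input.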
3.2. Hardware System Optimization
3.2.1. Memory Access Optimization Method 1: Continuous Read Mode
3.2.2. Memory Access Optimization Method 2: Task Pipelining
3.2.3. Memory Access Optimization Method 3: Use BRAM
3.2.4. Memory Access Optimization Results
3.3. Comparison with Related Studies
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Wang, N.; Yu, Y.; Huang, D.; Xu, B.; Liu, J.; Li, T.; Xue, L.; Shan, Z.; Chen, Y.; Wang, J. Pulse diagnosis signals analysis of fatty liver disease and cirrhosis patients by using machine learning. Sci. World J. 2015, 2015.
2. Charbonnier, S.; Galichet, S.; Mauris, G.; Siché, J.P. Statistical and fuzzy models of ambulatory systolic blood pressure for hypertension diagnosis. IEEE Trans. Instrum. Meas. 2000, 49, 998–1003.
3. He, D.; Wang, L.; Fan, X.; Yao, Y.; Geng, N.; Sun, Y.; Xu, L.; Qian, W. A new mathematical model of wrist pulse waveforms characterizes patients with cardiovascular disease—A pilot study. Med. Eng. Phys. 2017, 48, 142–149.
4. Gomes Ribeiro Moura, N.; Sá Ferreira, A. Pulse waveform analysis of Chinese pulse images and its association with disability in hypertension. JAMS J. Acupunct. Meridian Stud. 2016, 9, 93–98.
5. Zhang, Z.; Zhang, Y.; Yao, L.; Song, H.; Kos, A. A sensor-based wrist pulse signal processing and lung cancer recognition. J. Biomed. Inform. 2018, 79, 107–116.
6. Fei, Z. Contemporary Sphygmology in Traditional Chinese Medicine; People’s Medical Publishing House: Beijing, China, 2003.
7. Hu, X.; Zhu, H.; Xu, J.; Xu, D.; Dong, J. Wrist pulse signals analysis based on Deep Convolutional Neural Networks. In Proceedings of the 2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2014), Honolulu, HI, USA, 21–24 May 2014.
8. Wang, Y.-Y.L.; Hsu, T.-L.; Jan, M.-Y.; Wang, W.-K. Theory and applications of the harmonic analysis of arterial pressure pulse wave. J. Med. Biol. Eng. 2010, 30, 125–131.
9. Lu, G.; Jiang, Z.; Ye, L.; Huang, Y. Pulse feature extraction based on improved Gaussian model. In Proceedings of the 2014 International Conference on Medical Biometrics, ICMB 2014, Shenzhen, China, 30 May–1 June 2014; pp. 90–94.
10. Tang, A.C.Y.; Chung, J.W.Y.; Wong, T.K.S. Digitalizing traditional Chinese medicine pulse diagnosis with artificial neural network. Telemed. e-Health 2012, 18, 446–453.
11. Xu, L.S.; Meng, M.Q.H.; Wang, K.Q. Pulse image recognition using fuzzy neural network. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Lyon, France, 22–26 August 2007; Volume 36, pp. 3148–3151.
12. Chen, Y.; Zhang, L.; Zhang, D.; Zhang, D. Wrist pulse signal diagnosis using modified Gaussian models and Fuzzy C-Means classification. Med. Eng. Phys. 2009, 31, 1283–1289.
13. Shu, J.J.; Sun, Y. Developing classification indices for Chinese pulse diagnosis. Complement. Ther. Med. 2007, 15, 190–198.
14. Liu, Y.H.; Yang, Q.H.; Shi, H.F. Pulse feature analysis and extraction based on pulse mechanism analysis. In Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering, CSIE 2009, Los Angeles, CA, USA, 31 March–2 April 2009; Volume 7, pp. 53–56.
15. Hudoba, G. Vascular health diagnosis by pulse wave analysis. In Proceedings of the SAMI 2010—8th International Symposium on Applied Machine Intelligence and Informatics, Herlany, Slovakia, 28–30 January 2010; pp. 89–91.
16. Sareen, M.; Abhinav, A.; Prakash, P.; Anand, S. Wavelet decomposition and feature extraction from pulse signals of the radial artery. In Proceedings of the 2008 International Conference on Advanced Computer Theory and Engineering, Phuket, Thailand, 20–22 December 2008; pp. 551–555.
17. Zhang, P.Y.; Wang, H.Y. A framework for automatic time-domain characteristic parameters extraction of human pulse signals. EURASIP J. Adv. Signal Process. 2008, 2008.
18. Joshi, A.; Chandran, S.; Jayaraman, V.K.; Kulkarni, B.D. Arterial pulse system modern methods for traditional Indian. In Proceedings of the 2007 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Lyon, France, 22–26 August 2007; pp. 608–611.
19. Li, J.; Cao, Y.; Liu, Q.; Jiao, Q. Determination of urinary L-citrulline by enzymatic method. Chin. J. Anal. Chem. 2006, 34, 379–381.
20. Wang, K.; Wang, L.; Wang, D.; Xu, L. SVM classification for discriminating cardiovascular disease patients from non-cardiovascular disease controls using pulse waveform variability analysis. Lect. Notes Comput. Sci. 2005, 109–119.
21. Wang, H.; Cheng, Y. A quantitative system for pulse diagnosis in traditional Chinese medicine. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Shanghai, China, 17–18 January 2006; Volume 7, pp. 5676–5679.
22. Qiu, J.; Wang, J.; Yao, S.; Guo, K.; Li, B. Going deeper with embedded FPGA Platform for Convolutional Neural Network. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 21 February 2016; pp. 26–35.
23. Ma, Y.; Suda, N.; Cao, Y.; Seo, J.S.; Vrudhula, S. Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA. In Proceedings of the FPL 2016—26th International Conference on Field-Programmable Logic and Applications, Lausanne, Switzerland, 29 August–2 September 2016.
24. Ma, Y.; Cao, Y.; Vrudhula, S.; Seo, J.S. Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks. In Proceedings of the FPGA 2017—The 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2017; pp. 45–54.
25. Zhang, C. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the FPGA 2015—The 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2015; pp. 161–170.
26. Li, S.; Sun, K.; Luo, Y.; Yadav, N.; Choi, K. Novel CNN-based AP2D-net accelerator: An area and power efficient solution for real-time applications on mobile FPGA. Electronics 2020, 9, 832.
27. Gong, L.; Wang, C.; Li, X.; Chen, H.; Zhou, X. MALOC: A fully pipelined FPGA accelerator for convolutional neural networks with all layers mapped on chip. IEEE Trans. Comput. Des. Integr. Circuits Syst. 2018, 37, 2601–2612.
28. Zhang, C.; Wu, D.; Sun, J.; Sun, G.; Luo, G.; Cong, J. Energy-efficient CNN implementation on a deeply pipelined FPGA cluster. In Proceedings of the 2016 International Symposium on Low Power Electronics and Design, ISLPED 2016, San Francisco, CA, USA, 8–10 August 2016; pp. 326–331.
29. Di Cecco, R.; Lacey, G.; Vasiljevic, J.; Chow, P.; Taylor, G.; Areibi, S. Caffeinated FPGAs: FPGA framework for convolutional neural networks. In Proceedings of the 2016 International Conference on Field-Programmable Technology, FPT 2016, Xi’an, China, 7–9 December 2016; pp. 265–268.
30. Guo, K.; Sui, L.; Qiu, J.; Yu, J.; Wang, J.; Yao, S.; Han, S.; Wang, Y.; Yang, H. Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA. IEEE Trans. Comput. Des. Integr. Circuits Syst. 2018, 37, 35–47.
31. Geng, T.; Wang, T.; Sanaullah, A.; Yang, C.; Patel, R.; Herbordt, M. A framework for acceleration of CNN training on deeply-pipelined FPGA clusters with work and weight load balancing. In Proceedings of the 28th International Conference on Field Programmable Logic and Applications (FPL), Dublin, Ireland, 27–31 August 2018; pp. 394–398.
32. Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv 2016, arXiv:1602.02830.
33. Chen, C.; Li, Z.; Zhang, Y.; Zhang, S.; Hou, J.; Zhang, H. A 3D wrist pulse signal acquisition system for width information of pulse wave. Sensors 2020, 20, 11.
No. | Layer 1 | Layer 2 | Layer 3 | Layer 4 | Layer 5 | Layer 6–9 |
---|---|---|---|---|---|---|
1 | (1–16)¹ | (16–32) | (32–32) | (32–32) | (32–32) | (32–32), (32–32), (128,100)², (100,4) |
2 | (1–8) | (8–32) | (32–32) | (32–32) | (32–32) | |
3 | (1–8) | (8–16) | (16–32) | (32–32) | (32–32) | |
4 | (1–4) | (4–16) | (16–32) | (32–32) | (32–32) | |
5 | (1–4) | (4–8) | (8–16) | (16–32) | (32–32) | |
6 | (1–2) | (2–4) | (4–8) | (8–16) | (16–32) | |
Layer | Solution 1 (cycles) | Solution 2 (cycles) | Solution 3 (cycles) | Solution 4 (cycles) |
---|---|---|---|---|
CONV1 | 33,908 | 10,320 | 4,027 | 3,411 |
CONV2 | 77,650 | 14,309 | 6,069 | 4,710 |
CONV3 | 136,134 | 21,171 | 9,884 | 6,648 |
CONV4 | 267,820 | 35,892 | 19,037 | 11,292 |
CONV5 | 535,148 | 67,012 | 37,794 | 20,868 |
CONV6 | 533,049 | 60,047 | 30,894 | 20,484 |
CONV7 | 298,832 | 30,694 | 14,344 | 10,812 |
FC1 | 76,335 | 9,981 | 4,614 | 4,254 |
FC2 | 4,184 | 530 | 303 | 238 |
Total | 1,963,060 | 249,956 | 126,966 | 82,717 |
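The totals and the relative gain of each optimization step can be re-derived directly from the per-layer cycle counts. The short script below is a sketch for checking the arithmetic; the cycle values are copied from the table above, while the names `CYCLES`, `column_totals`, and `speedups` are hypothetical.

```python
# Per-layer cycle counts for Solutions 1 to 4, copied from the table above.
CYCLES = {
    "CONV1": (33908, 10320, 4027, 3411),
    "CONV2": (77650, 14309, 6069, 4710),
    "CONV3": (136134, 21171, 9884, 6648),
    "CONV4": (267820, 35892, 19037, 11292),
    "CONV5": (535148, 67012, 37794, 20868),
    "CONV6": (533049, 60047, 30894, 20484),
    "CONV7": (298832, 30694, 14344, 10812),
    "FC1": (76335, 9981, 4614, 4254),
    "FC2": (4184, 530, 303, 238),
}


def column_totals(table):
    """Sum each solution's column across all layers."""
    return tuple(sum(col) for col in zip(*table.values()))


def speedups(totals):
    """Speedup of every solution relative to the first (baseline) column."""
    base = totals[0]
    return tuple(base / t for t in totals)


if __name__ == "__main__":
    totals = column_totals(CYCLES)
    for idx, (total, gain) in enumerate(zip(totals, speedups(totals)), start=1):
        print(f"Solution {idx}: {total} cycles, {gain:.2f}x vs Solution 1")
```

The computed column sums match the Total row, and Solution 4 needs roughly 23.7 times fewer cycles than Solution 1.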
Component (Total) | Solution 1 | Solution 2 | Solution 3 | Solution 4-1 | Solution 4-2 |
---|---|---|---|---|---|
Clock (MHz) | 100 | 100 | 100 | 100 | 170 |
BRAMs (36 Kb) | 26.30% | 35.93% | 68.52% | 39.63% | 39.63% |
DSPs | 61.25% | 61.25% | 61.25% | 61.25% | 61.25% |
LUT (63,400) | 37.63% | 37.80% | 39.92% | 29.76% | 29.83% |
LUTRAM (19,000) | 12.35% | 12.36% | 12.39% | 7.42% | 7.43% |
Flip-flop (F/F) (126,800) | 28.30% | 28.66% | 28.25% | 23.72% | 23.75% |
Latency (ms) | 19.631 | 2.499 | 1.270 | 0.827 | 0.487 |
Power (W) | 1.63 | 1.638 | 1.645 | 0.714 | 1.089 |
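The latency row follows from the total cycle counts and the clock frequency, and combining latency with power gives a rough energy per classification. The sketch below assumes that Solution 4-2 executes the same 82,717 cycles as Solution 4-1 at the higher clock, which is consistent with the listed 0.487 ms; the function names are hypothetical and the energy figures are derived here, not reported in the tables.

```python
def latency_ms(cycles, clock_mhz):
    """Latency in milliseconds for a cycle count at a clock given in MHz."""
    return cycles / (clock_mhz * 1e6) * 1e3


def energy_mj(power_w, latency_in_ms):
    """Energy per inference in millijoules (W x ms = mJ)."""
    return power_w * latency_in_ms


# (label, total cycles, clock in MHz, power in W), taken from the tables.
SOLUTIONS = [
    ("Solution 1", 1963060, 100, 1.63),
    ("Solution 2", 249956, 100, 1.638),
    ("Solution 3", 126966, 100, 1.645),
    ("Solution 4-1", 82717, 100, 0.714),
    ("Solution 4-2", 82717, 170, 1.089),
]

if __name__ == "__main__":
    for label, cycles, clock, power in SOLUTIONS:
        t = latency_ms(cycles, clock)
        print(f"{label}: {t:.3f} ms, {energy_mj(power, t):.3f} mJ per inference")
```

The Solution 2 value computes to 2.49956 ms, which rounds to 2.500 ms; the listed 2.499 ms appears to be truncated rather than rounded. Although Solution 4-2 draws more power than Solution 4-1, its shorter latency gives a slightly lower estimated energy per inference under these assumptions.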
Item | [27] | [30] | [31] | [26] | Our Work |
---|---|---|---|---|---|
CNN Model | AlexNet | VGG16 | VGG16 | AP2D-Net | Self-Design |
Platform | Virtex-7 VX690T | Zynq XC7Z020 | Virtex-7 VX690T | Ultra96 | Artix XC7A100T |
Clock (MHz) | 150 | 214 | 150 | 300 | 100 |
BRAMs (36 Kb) | 2192 | 85.5 | 1220 | 162 | 53.5 |
DSPs | 2980 | 190 | 2160 | 287 | 147 |
Flip-flop (F/F) | 281.8 K | 35.5 K | - | 94.3 K | 30.08 K |
Latency (ms) | - | 364 | 106.6 | 32.8 | 0.827 |
Power (W) | 31.2 | - | 35 | 5.59 | 0.714 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, C.; Li, Z.; Zhang, Y.; Zhang, S.; Hou, J.; Zhang, H. Low-Power FPGA Implementation of Convolution Neural Network Accelerator for Pulse Waveform Classification. Algorithms 2020, 13, 213. https://doi.org/10.3390/a13090213