# Hardware Platform-Aware Binarized Neural Network Model Optimization


## Abstract


## 1. Introduction

We propose **Deepbit**, a method that explores optimal BNN models for target hardware platforms by using binary search and hardware cost estimation charts. More specifically, our method proceeds in three steps:

- **Pre-Training**: Before training, we develop cost estimation charts by deploying various models on the target hardware and measuring their actual costs.
- **Training**: We apply the **Deepbit** method, which searches over a large hyperspace and outputs a series of efficient BNN models with variable depths.
- **Post-Training**: Finally, we use the cost estimation charts to predict the performance of our networks on actual hardware and, based on the predictions, choose the most efficient network.

In the scope of this paper, FPGA, GPU, and RRAM are the target hardware platforms, while the MNIST dataset is used for training to demonstrate the effectiveness of our method.
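As an illustration of the post-training selection step, the following is a minimal sketch that assumes the cost estimation charts are stored as a table keyed by model shape; all names, chart values, and candidate models here are hypothetical, not the paper's measurements:

```python
# Hypothetical post-training selection: pick the cheapest model that still
# meets the accuracy threshold, using a pre-built cost estimation chart.

# chart[(num_layers, channels_per_layer)] -> estimated cost on the target device
power_chart = {
    (3, 20): 0.77, (3, 25): 0.966,
    (4, 20): 1.053, (4, 25): 1.117,
}

# Candidate BNN models produced by the training step (illustrative values).
candidates = [
    {"name": "md_a", "layers": 3, "channels": 25, "accuracy": 98.5},
    {"name": "md_b", "layers": 4, "channels": 20, "accuracy": 98.6},
]

def estimated_cost(model, chart):
    """Look up the estimated hardware cost of a candidate model in the chart."""
    return chart[(model["layers"], model["channels"])]

def pick_most_efficient(candidates, chart, min_accuracy):
    """Among models meeting the accuracy threshold, choose the lowest-cost one."""
    acceptable = [m for m in candidates if m["accuracy"] >= min_accuracy]
    return min(acceptable, key=lambda m: estimated_cost(m, chart))

best = pick_most_efficient(candidates, power_chart, min_accuracy=98.4)
```

The key point is that the selection consults measured hardware costs rather than a device-agnostic proxy such as the MAC count.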

## 2. Background and Related Work

#### 2.1. Related Work

#### 2.2. Binarized Neural Networks (BNNs)

## 3. Basic Design Strategies

## 4. Proposed Architecture Search Solution

#### 4.1. Hardware Cost Estimation

#### 4.2. Architectural Search via Deepbit Method

The **Deepbit** algorithm is the most crucial part of the proposed solution; it is responsible for training and hardware evaluation to explore the optimal model for a specific hardware platform. In particular, training is performed first for a series of BNN models to reduce the search scope. Next, the hardware cost of the output BNN models is estimated according to the hardware cost estimation charts. Finally, the optimal BNN model is determined by comparing the hardware costs of the candidate models. The configuration used in our experiments is described by the following inputs:

- The set L defines the range for the depth (the number of convolution layers) of the BNN model. In our experiments, $L = \{3, 4, 5, 6, 7, 8\}$.
- The maximum number of uniform channels per layer. In our experiments, we set this value at 50.
- The threshold for an acceptable model. In most cases, the threshold is the minimum acceptable accuracy.

These inputs are used by the **Deepbit** algorithm during the searching process. Similar to FPGA devices with a streaming BNN architecture, RRAM devices, which are affected by IR drop and sneak paths, also treat minimizing the number of channels in each layer as essential. In particular, as shown in Figure 6, as the number of channels per layer increases, accuracy is affected more severely, and the accuracy deviation consequently grows. Therefore, to reduce the impact on the critical target metrics for RRAM and FPGA devices, the **Deepbit** algorithm in this work aims to minimize the number of channels in each layer. Its output is a series of optimal BNN models, each with the minimum number of channels per layer at a depth drawn from the input set L; the number of output models therefore equals the size of L.

The **Deepbit** method is performed in three phases. In the first phase, the width of the network (the number of output channels of each convolution layer) is reduced uniformly; in other words, for a given model, the width remains constant across all layers. The phase starts from a model whose depth equals the minimum value in set L and whose width equals the second input (50 in our experiments). A binary search is then used to find the BNN model with the minimal uniform width that still meets the accuracy threshold given by the third input. Phase 1 is detailed in Algorithm 1.

Algorithm 1: Optimal uniform channel search for a given BNN model

Algorithm 2: Optimal width search for a given BNN model
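Phase 1 amounts to a standard binary search over the uniform width. A minimal sketch follows, in which the `train_and_evaluate` callback is a hypothetical stand-in for actually training a BNN with the given per-layer channel counts and returning its validation accuracy (the real Algorithm 1 may differ in detail):

```python
def uniform_width_search(depth, max_width, acc_threshold, train_and_evaluate):
    """Binary-search the smallest uniform channel count whose trained BNN
    still meets the accuracy threshold (a sketch of phase 1).

    train_and_evaluate(widths) trains a BNN with the given per-layer channel
    counts and returns its validation accuracy. Returns None if no width in
    [1, max_width] reaches the threshold.
    """
    lo, hi = 1, max_width
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        acc = train_and_evaluate([mid] * depth)  # same width in every layer
        if acc >= acc_threshold:
            best = mid       # feasible: remember it and try narrower
            hi = mid - 1
        else:
            lo = mid + 1     # infeasible: must widen
    return best

# Toy evaluator where accuracy grows linearly with width:
# uniform_width_search(3, 50, 98.4, lambda ws: 90 + ws[0] * 0.2) returns 42
```

The binary search assumes accuracy grows roughly monotonically with width, which is why it can discard half of the remaining width range at each step.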

The **Deepbit** algorithm is summarized in Algorithm 4. More specifically, based on the initial inputs (the number of layers, the accuracy threshold, and the initial maximum width), the $uniform\_width\_search$ procedure explores the $optimal\_uniform\_width$ (phase 1). Then, the $optimal\_uniform\_width$ is used to find the $optimal\_width$ for each layer via the $optimal\_width\_search$ procedure (phase 2). Finally, the $optimal\_widths$ of all layers are used to construct a unique model, which is further optimized with the $optimal\_bnn\_search$ procedure until each layer has its minimum width and the final model still satisfies the accuracy threshold (phase 3).

Algorithm 3: Optimal BNN search based on optimal width configuration

Algorithm 4: Deepbit architecture search algorithm
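The overall flow over the depth set L can be sketched as follows. This is an illustrative skeleton only: the three phase procedures are passed in as callables standing in for Algorithms 1-3, and their internals are not reproduced here.

```python
def deepbit(depths, max_width, acc_threshold,
            uniform_width_search, optimal_width_search, optimal_bnn_search):
    """Sketch of the Deepbit outer loop: one optimized model per depth in L.

    The three callables are hypothetical stand-ins for the paper's
    Algorithms 1-3 (phase 1, phase 2, and phase 3, respectively).
    """
    models = []
    for depth in depths:
        # Phase 1: smallest uniform width meeting the accuracy threshold.
        uniform = uniform_width_search(depth, max_width, acc_threshold)
        # Phase 2: shrink each layer independently, starting from the
        # uniform-width model.
        widths = optimal_width_search(depth, uniform, acc_threshold)
        # Phase 3: refine the combined per-layer configuration until no
        # layer can shrink further without violating the threshold.
        widths = optimal_bnn_search(widths, acc_threshold)
        models.append({"depth": depth, "widths": widths})
    return models  # len(models) == len(depths), one candidate per depth
```

The hardware cost charts are then consulted over this list of candidates to pick the final model for each target device.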

## 5. Experimental Results and Discussion

#### 5.1. Optimal BNN Search

To illustrate the **Deepbit** method, this subsection describes the searching process performed in our experiments with a specific configuration, together with the corresponding results, which demonstrate the efficiency of the proposed method. In particular, for FPGA, the target for the optimal BNN architecture is 98.4% accuracy with minimal hardware resources, while the target for GPU and RRAM is simply a BNN architecture with maximum accuracy. As mentioned in the previous section, the **Deepbit** method takes three inputs:

- L = {3, 4, 5, 6, 7, 8}
- maximum number of channels = 50
- minimum accuracy threshold = 98.4%

These values are used as the inputs to the **Deepbit** architecture search method.

#### 5.2. Estimating the Hardware Costs for Optimal Models

#### 5.3. Analysis and Discussion

From the model **md1**, which has the fewest MAC operations, we can see that the proposed method can reduce the power consumption of the target hardware compared with selecting models by MAC count alone. In addition, according to Table 9, the estimation method provides valuable data on hardware cost relative to the actual hardware overhead. Therefore, network designs can be considerably optimized on the critical cost metric when the proposed method is applied to complex networks and datasets. For example, in our experiments, the model **md4**, with six convolution layers, has almost 1.5 times more MAC operations than the three-layer model but similar hardware statistics. This result shows that MAC operation counts can be misleading and are not a reasonable estimate of actual hardware performance.
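The "almost 1.5 times" figure can be checked directly from the MAC counts reported in the model table:

```python
# MAC operation counts taken from the model table (md1: 3 layers, md4: 6 layers).
mac_md1 = 5_989_368
mac_md4 = 8_916_432

ratio = mac_md4 / mac_md1
print(f"md4/md1 MAC ratio: {ratio:.2f}")  # ~1.49
```

Despite this ~1.49x gap in MAC operations, the measured LUT, flip-flop, and power figures for the two models are close, which is the mismatch the argument above rests on.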

#### 5.4. Future Work

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 2.** The building module for a network architecture: (**a**) layer module for our network architecture; (**b**) layer module for our network architecture with max pooling.

**Figure 5.** The hardware and power cost estimation for a list of BNN models with different L and C values on FPGA: (**a**) hardware resources for BNN models with different L and C values on FPGA; (**b**) power consumption for BNN models with different L and C values on FPGA.

**Figure 6.** An example of accuracy deviation on RRAM devices (40 nm, 20 nm, 10 nm) affected by IR drop and sneak paths, in which the bitline number represents the number of channels in each layer and read inaccuracy is the accuracy deviation: (**a**) accuracy deviation with bitline numbers from 1 to 8; (**b**) accuracy deviation with bitline numbers from 1 to 32.

Pooling Layers (#) | Channels per Layer (#) | Accuracy (%) | MAC Operations
---|---|---|---
0 | 32 | 98.54 | 29,352,960
1 | 32 | 98.48 | 28,976,640
2 | 32 | 98.44 | 18,063,360

Type of Pooling | Channels per Layer (#) | Accuracy (%)
---|---|---
Average | 32 | 98.35
Max | 32 | 98.44

Channels\Layers | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---|---|---|---|---|---|---|---|---
5 | 89.79 | 91.87 | 92.14 | 91.61 | 93.07 | 94.09 | 93.69 | 93.41
10 | 95.46 | 96.23 | 96.66 | 96.97 | 96.86 | 97.09 | 97.57 | 97.29
15 | 96.61 | 97.46 | 97.56 | 98.01 | 97.95 | 97.91 | 98.19 | 98.15
20 | 97.72 | 97.85 | 98.05 | 98.25 | 98.44 | 98.32 | 98.24 | 98.47
25 | 97.85 | 98.33 | 98.37 | 98.35 | 98.51 | 98.44 | 98.7 | 98.4
30 | 98.15 | 98.34 | 98.48 | 98.59 | 98.55 | 98.57 | 98.71 | 98.62
35 | 98.45 | 98.44 | 98.38 | 98.6 | 98.65 | 98.82 | 98.7 | 98.6
40 | 98.49 | 98.57 | 98.74 | 98.71 | 98.7 | 98.83 | 98.64 | 98.75
45 | 98.4 | 98.48 | 98.68 | 98.76 | 98.85 | 98.8 | 98.78 | 98.86
50 | 98.47 | 98.48 | 98.82 | 98.88 | 98.84 | 98.75 | 98.84 | 98.88

Channels\Layers | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---|---|---|---|---|---|---|---|---
10 | 0.611 | 0.676 | 0.74 | 0.791 | 0.836 | | |
15 | 0.721 | 0.877 | 0.985 | 1.127 | 1.26 | 1.422 | 1.561 |
20 | 0.77 | 1.053 | 1.281 | 1.494 | 1.718 | 1.983 | 2.251 | 2.499
25 | 0.966 | 1.117 | 1.692 | 2.143 | 2.525 | | |
30 | 1.429 | 2.079 | | | | | |

Layers | md1 (3 Layers) | md2 (4 Layers) | md3 (5 Layers) | md4 (6 Layers) | md5 (7 Layers) | md6 (8 Layers)
---|---|---|---|---|---|---
1 | 26 | 23 | 20 | 19 | 11 | 8
2 | 24 | 21 | 21 | 19 | 9 | 15
3 | 31 | 22 | 21 | 14 | 20 | 17
4 | | 21 | 21 | 16 | 15 | 17
5 | | | 20 | 18 | 20 | 17
6 | | | | 19 | 20 | 10
7 | | | | | 20 | 17
8 | | | | | | 17
MAC OPs | 5,989,368 | 7,756,896 | 10,206,896 | 8,916,432 | 9,964,640 | 9,854,684

Channels\Layers | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---|---|---|---|---|---|---|---|---
10 | 9192 | 10,329 | 11,935 | 12,381 | 13,259 | | |
15 | 11,003 | 13,340 | 15,546 | 17,823 | 20,058 | 22,268 | 24,592 |
20 | 12,648 | 16,230 | 19,387 | 23,299 | 26,911 | 31,188 | 38,827 | 43,374
25 | 16,845 | 22,801 | 29,206 | 38,535 | 45,095 | | |
30 | 25,465 | 35,192 | | | | | |

Channels\Layers | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---|---|---|---|---|---|---|---|---
10 | 6037 | 6732 | 7862 | 8590 | 8859 | | |
15 | 7076 | 8288 | 9511 | 10,755 | 11,551 | 12,720 | 13,975 |
20 | 7449 | 9157 | 10,483 | 12,258 | 13,987 | 16,231 | 17,553 | 19,777
25 | 8723 | 11,123 | 13,970 | 16,379 | 18,788 | | |
30 | 11,296 | 14,528 | | | | | |

Channels\Layers | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---|---|---|---|---|---|---|---|---
10 | 0.611 | 0.676 | 0.74 | 0.791 | 0.836 | | |
15 | 0.721 | 0.877 | 0.985 | 1.127 | 1.26 | 1.422 | 1.561 |
20 | 0.77 | 1.053 | 1.281 | 1.494 | 1.718 | 1.983 | 2.251 | 2.499
25 | 0.966 | 1.117 | 1.692 | 2.143 | 2.525 | | |
30 | 1.429 | 2.079 | | | | | |

Model | Layers | LUTs | FFs | Power | MAC OPs
---|---|---|---|---|---
md1 | 3 | 20,293 | 9752.2 | 1.1512 | 5,989,368
md2 | 4 | 18,764 | 10,000.5 | 1.151 | 7,756,896
md3 | 5 | 20,565 | 10,901 | 1.33 | 10,206,896
md4 | 6 | 19,422 | 10,884.5 | 1.2395 | 8,916,432
md5 | 7 | 20,422.17 | 11,679 | 1.296 | 9,964,640
md6 | 8 | 19,651 | 11,366 | 1.234 | 9,854,684

Channels\Layers | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---|---|---|---|---|---|---|---|---
10 | 95.43 | 95.33 | 96.2 | 96.52 | 96.25 | 96.39 | 97.14 | 96.96
30 | 97.38 | 98.07 | 97.72 | 98.15 | 98.45 | 97.78 | 98.37 | 98.24
50 | 96.93 | 97.78 | 97.86 | 97.34 | 97.94 | 95.63 | 97.61 | 96.92

Model | Layers (#) | LUTs | FFs | Power | MAC OPs
---|---|---|---|---|---
md1 | 3 | 19,211 | 9104 | 1.126 | 5,989,368
md2 | 4 | 20,692 | 10,082 | 1.123 | 7,756,896
md4 | 6 | 21,410 | 10,936 | 1.256 | 8,916,432

Device | Layers | Width Configuration | Accuracy on Target Device (%)
---|---|---|---
FPGA | 4 | 23-21-22-21 | 98.37
RRAM | 7 | 30-30-30-30-30-30-30 | 98.45
GPU | 6 | 50-50-50-50-50-50 | 98.88


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Vo, Q.H.; Asim, F.; Alimkhanuly, B.; Lee, S.; Kim, L.
Hardware Platform-Aware Binarized Neural Network Model Optimization. *Appl. Sci.* **2022**, *12*, 1296.
https://doi.org/10.3390/app12031296
