# Entropy-Driven Adaptive Filtering for High-Accuracy and Resource-Efficient FPGA-Based Neural Network Systems


## Abstract


## 1. Introduction

- We show how the accuracy of a low-precision neural network operating on real video data can be improved by increasing the processing rate.
- We propose a novel entropy-based uncertainty estimation scheme and compare its performance against other estimation techniques.
- We create a novel neural network pipeline whose software and hardware components dynamically adjust the processing rate based on the uncertainty present in the neural network output.

## 2. Background and Related Work

## 3. Methodology

## 4. Proposed Window Filter

## 5. Baseline and Regions Definition

## 6. Window Filter Evaluation

- Scheme A, with accuracy as the goal: adopts the window filter settings with the highest accuracy from region I of Table 3.
- Scheme B, with accuracy and efficiency as goals: adopts the window filter settings with the highest accuracy from region II of Table 3.
- Scheme C, with accuracy and efficiency as goals and more aggressive computational savings: adopts the window filter settings with the most frames decimated from region II of Table 3.
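To make the roles of the two filter parameters concrete, the following Python sketch applies a window filter to a stream of per-frame class predictions: a window of `window_length` results is aggregated, and the window then advances by `step_size` frames, so when the step size exceeds the window length the frames between windows are decimated. The majority-vote aggregation and all names here are illustrative assumptions, not details of the hardware implementation.

```python
from collections import Counter

def window_filter(predictions, step_size, window_length):
    """Illustrative window filter over a stream of per-frame class
    predictions. Each window of `window_length` predictions is reduced
    to its majority class; the window then advances by `step_size`
    frames. Frames falling between windows when step_size >
    window_length would never need to be classified, which is where
    the computational saving comes from.
    """
    outputs = []
    start = 0
    while start + window_length <= len(predictions):
        window = predictions[start:start + window_length]
        outputs.append(Counter(window).most_common(1)[0][0])
        start += step_size
    return outputs

# Two windows of 3 frames, advancing 3 frames at a time:
filtered = window_filter(['cat', 'cat', 'dog', 'dog', 'dog', 'cat'],
                         step_size=3, window_length=3)
```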

## 7. Proposed Uncertainty Estimation Measures

#### 7.1. Scheme I: Entropy
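A minimal sketch of the entropy measure, under the assumption that the score is the Shannon entropy of the network's per-frame softmax output; the function name is illustrative:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (natural log) of a class-probability vector,
    e.g. the softmax-normalised classifier output for one frame.
    Zero-probability entries contribute nothing to the sum."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# A confident prediction yields low entropy; a uniform (maximally
# uncertain) distribution yields the maximum, log(num_classes).
low = shannon_entropy([0.97, 0.01, 0.01, 0.01])
high = shannon_entropy([0.25, 0.25, 0.25, 0.25])
```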

#### 7.2. Scheme II: VAR
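A sketch of a streaming variance (VAR) estimator using Welford's online update, one of the standard-deviation methods compared in Cook's article cited in the references; the class and method names are illustrative, and the exact update used in the pipeline is an assumption:

```python
class RunningVariance:
    """Online mean/variance via Welford's update: numerically stable
    and O(1) per sample, which suits a frame-by-frame pipeline."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance; 0.0 until at least two samples arrive."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

rv = RunningVariance()
for score in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    rv.update(score)
# rv.mean is 5.0 and rv.variance() is 32/7 for this sequence
```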

#### 7.3. Scheme III: Autocorrelation
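A sketch of the normalised sample autocorrelation at a given lag, in the spirit of Witt's characterisation of measurement time series cited in the references; the function name and normalisation are illustrative assumptions:

```python
def autocorrelation(series, lag):
    """Normalised sample autocorrelation of `series` at `lag`.
    Returns a value in [-1, 1]: near 1 for slowly varying scores,
    negative for scores that alternate frame to frame."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0.0:
        return 1.0  # a constant series is perfectly correlated
    num = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return num / denom
```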

## 8. Uncertainty Estimation Schemes Evaluation

- In S2 (graph A): scores remain below $1\times {10}^{-3}$.
- In S3 (graph B): scores exceed $4\times {10}^{-3}$ at the point of change.
- In S4 (graph C): scores lie roughly between $1\times {10}^{-3}$ and $2\times {10}^{-3}$ after the event.
- In S5 (graph D): scores lie roughly between $2\times {10}^{-3}$ and $4\times {10}^{-3}$ after the event.

## 9. Adaptive Filtering

## 10. Overall Accuracy and Performance Analysis

#### 10.1. Energy Consumption of Various Setups

#### 10.2. Accuracy Gain under Diverse Setups

#### 10.3. Overall Performance Gain

## 11. Conclusions

- Energy savings: Our adaptive processing strategy could be coupled with the energy-proportional binarized network presented in [6] to measure the energy savings obtained while processing real video streams.
- Scale up to larger networks: Our FINN accelerator is limited to low-resolution (32 × 32) inputs due to the memory constraints of the Zynq Z7020 device used in this research. Future work will consider larger networks for more complex datasets such as ImageNet.
- Adopt more powerful hardware: Our current approach utilizes both the PS and the PL because, on the Zynq Z7020, the PL BRAMs are already fully occupied by the neural network hardware. In the future, it would be useful to explore implementing the system on more powerful FPGAs, such as the Zynq UltraScale+ MPSoC. This would enable our design to run exclusively on the PL, yielding better performance and a higher maximum processing rate.
- Performance on a system with fixed processing resources: With Scheme B, we explored dynamically decimating frames while maintaining high accuracy. It would be interesting to compare this against a system with fixed processing resources pursuing the same goal. Future work can therefore explore the performance of such fixed-resource systems, enabling a more comprehensive comparison between static and dynamic approaches.

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Nurvitadhi, E.; Venkatesh, G.; Sim, J.; Marr, D.; Huang, R.; Ong Gee Hock, J.; Liew, Y.T.; Srivatsan, K.; Moss, D.; Subhaschandra, S.; et al. Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’17), Monterey, CA, USA, 22–24 February 2017; ACM: New York, NY, USA, 2017; pp. 5–14.
- Umuroglu, Y.; Fraser, N.J.; Gambardella, G.; Blott, M.; Leong, P.; Jahre, M.; Vissers, K. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’17), Monterey, CA, USA, 22–24 February 2017; ACM: New York, NY, USA, 2017; pp. 65–74.
- Courbariaux, M.; Bengio, Y. BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv **2016**, arXiv:1602.02830.
- Nurvitadhi, E.; Sheffield, D.; Sim, J.; Mishra, A.; Venkatesh, G.; Marr, D. Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC. In Proceedings of the 2016 International Conference on Field-Programmable Technology (FPT), Xi’an, China, 7–9 December 2016; pp. 77–84.
- Moss, D.J.M.; Nurvitadhi, E.; Sim, J.; Mishra, A.; Marr, D.; Subhaschandra, S.; Leong, P.H.W. High Performance Binary Neural Networks on the Xeon+FPGA Platform. In Proceedings of the 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 4–8 September 2017; pp. 1–4.
- Nunez-Yanez, J. Energy Proportional Neural Network Inference with Adaptive Voltage and Frequency Scaling. IEEE Trans. Comput. **2019**, 68, 676–687.
- Jokic, P.; Emery, S.; Benini, L. BinaryEye: A 20 kfps Streaming Camera System on FPGA with Real-Time On-Device Image Recognition Using Binary Neural Networks. In Proceedings of the 2018 IEEE 13th International Symposium on Industrial Embedded Systems (SIES 2018), Graz, Austria, 6–8 June 2018; pp. 1–7.
- Sharma, H.; Park, J.; Mahajan, D.; Amaro, E.; Kim, J.K.; Shao, C.; Mishra, A.; Esmaeilzadeh, H. From High-Level Deep Neural Models to FPGAs. In Proceedings of the 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Taipei, Taiwan, 15–19 October 2016; pp. 1–12.
- Teerapittayanon, S.; McDanel, B.; Kung, H.T. BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 2464–2469.
- Haim Barad, H.T. Fast Inference with Early Exit. Available online: https://www.intel.com/content/www/us/en/artificial-intelligence/posts/fast-inference-with-early-exit.html (accessed on 6 August 2020).
- Huang, G.; Chen, D.; Li, T.; Wu, F.; van der Maaten, L.; Weinberger, K.Q. Multi-Scale Dense Convolutional Networks for Efficient Prediction. arXiv **2017**, arXiv:1703.09844.
- Knuth, D. Art of Computer Programming, 3rd ed.; Addison-Wesley: Reading, MA, USA, 1968; Volume 2, p. 232.
- Cook, J.D. Comparing Three Methods of Computing Standard Deviation. Available online: https://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ (accessed on 27 April 2020).
- Witt, T.J. Using the Autocorrelation Function to Characterize Time Series of Voltage Measurements. Metrologia **2007**, 44, 201–209.

| LUTs | FFs | DSPs | BRAMs |
|---|---|---|---|
| 25,980 (49%) | 36,115 (34%) | 32 (15%) | 132 (94%) |

| Datasets | No. of Frames per Object | Obj. Size | Obj. Speed | Rotating | Occasionally out of Frame | Properties |
|---|---|---|---|---|---|---|
| S1 | 200 (0.8 Hz) | 80% | 0 m/s | 0 rad/s | No | Stationary and large object size |
| S2 | 200 (0.8 Hz) | 50% | 0.32 m/s | 0 rad/s | No | Slow motion |
| S3 | 100 (1.6 Hz) | 50% | 0.32 m/s | 0 rad/s | No | High switching frequency |
| S4 | 200 (0.8 Hz) | 50% | 0.32 m/s | 6.22 rad/s | No | Rotating |
| S5 | 200 (0.8 Hz) | 50% | 0.71 m/s | 0 rad/s | Yes | Fast motion |
| U1 | Object motion changes from stationary to 0.32 m/s (S1 to S2) | | | | | |
| U2 | Object switching frequency changes from 0.8 Hz to 1.6 Hz (S1 to S3) | | | | | |
| U3 | Object rotational speed changes from 0 rad/s to 6.22 rad/s (S1 to S4) | | | | | |
| U4 | Object motion changes from stationary to 0.71 m/s (S1 to S5) | | | | | |

**Table 3.** Window filter configurations with the highest accuracy for each region–scenario pair according to Figure 5 (notation: ${\ell}_{SS}$ = step size, ${\ell}_{WL}$ = window length; columns are grouped by region, R1–R3).

| Scenario | R1 ${\ell}_{SS}$ | R1 ${\ell}_{WL}$ | R1 Acc. (%) | R2 ${\ell}_{SS}$ | R2 ${\ell}_{WL}$ | R2 Acc. (%) | R3 ${\ell}_{SS}$ | R3 ${\ell}_{WL}$ | R3 Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| S1: Stationary objects that occupy >80% of the screen | 2 | 11 | 85.0 | 10 | 9 | 83.5 | 14 | 3 | 58.0 |
| | 10 | 13 | 85.0 | 10 | 10 | 83.5 | 15 | 3 | 54.5 |
| | 10 | 14 | 85.0 | 11 | 11 | 83.0 | 10 | 2 | 54.0 |
| S2: Objects with an average speed of 0.32 m/s | 10 | 15 | 80.0 | 15 | 15 | 78.5 | 14 | 3 | 51.5 |
| | 1 | 15 | 79.0 | 10 | 10 | 78.0 | 15 | 3 | 51.5 |
| | 5 | 15 | 79.0 | 10 | 8 | 77.5 | 10 | 2 | 51.0 |
| S3: Switch objects with a frequency of 1.6 Hz instead of 0.8 Hz | 12 | 15 | 89.0 | 15 | 12 | 85.0 | 14 | 3 | 27.0 |
| | 1 | 14 | 86.0 | 12 | 10 | 84.0 | 15 | 3 | 24.0 |
| | 2 | 14 | 86.0 | 15 | 14 | 82.0 | 10 | 2 | 22.0 |
| S4: Objects with a rotation speed of 6.22 rad/s | 1 | 10 | 53.0 | 15 | 10 | 52.5 | 14 | 3 | 39.5 |
| | 5 | 10 | 53.0 | 5 | 4 | 51.5 | 15 | 3 | 36.5 |
| | 2 | 5 | 52.5 | 7 | 7 | 51.5 | 5 | 1 | 36.0 |
| S5: Objects with an average speed of 0.71 m/s | 10 | 13 | 66.0 | 10 | 8 | 67.0 | 14 | 3 | 44.5 |
| | 11 | 15 | 64.5 | 13 | 13 | 63.5 | 11 | 2 | 42.0 |
| | 1 | 11 | 64.0 | 15 | 13 | 63.5 | 10 | 2 | 41.0 |

**Table 4.** Recommended window filter configurations based on the rules defined (window step size–window length).

| Scenarios | Scheme A | Scheme B | Scheme C |
|---|---|---|---|
| Initialization | 1–1 | 1–1 | 1–1 |
| S2 | 10–15 | 15–15 | 10–8 |
| S3 | 12–15 | 15–12 | 15–12 |
| S4 | 1–10 | 15–10 | 15–10 |
| S5 | 10–13 | 10–8 | 10–6 |

| | Entropy | VAR | Autocorrelation |
|---|---|---|---|
| Avg. Processing Time ($\mathsf{\mu}\mathrm{s}$) | 93 | 96 | 119 |

| Mode | Entropy Uncertainty ($\times {10}^{-3}$) | Scenarios | Scheme A | Scheme B | Scheme C |
|---|---|---|---|---|---|
| 0 | / | Initialization | 1–1 | 1–1 | 1–1 |
| 1 | 0–1 | Transit to normal motion (S2) | 10–15 | 15–15 | 10–8 |
| 2 | 1–2 | Transit to rotational motion (S4) | 1–10 | 15–10 | 15–10 |
| 3 | 2–4 | Transit to fast motion (S5) | 10–13 | 10–8 | 10–6 |
| 4 | >4 | Change to another object quickly (S3) | 12–15 | 15–12 | 15–12 |

The last three columns give the window filter configuration (step size–length) for each scheme.
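The mode table above can be read as a simple threshold map from the entropy-based uncertainty score to a window filter configuration. The sketch below illustrates this for Scheme B; the function and dictionary names are our own, and the thresholds are taken directly from the table (scores given in units of $1\times {10}^{-3}$):

```python
def select_mode(score_e3):
    """Return the operating mode for an uncertainty score expressed in
    multiples of 1e-3 (e.g. 0.5 means 0.5e-3). Thresholds follow the
    mode table; mode 0 (initialization) is assigned outside this map."""
    if score_e3 <= 1.0:
        return 1  # transit to normal motion (S2)
    if score_e3 <= 2.0:
        return 2  # transit to rotational motion (S4)
    if score_e3 <= 4.0:
        return 3  # transit to fast motion (S5)
    return 4      # quick change to another object (S3)

# Scheme B window filter configurations (step size, window length)
# per mode, taken from the table; mode 0 is initialization.
SCHEME_B = {0: (1, 1), 1: (15, 15), 2: (15, 10), 3: (10, 8), 4: (15, 12)}

step, length = SCHEME_B[select_mode(0.4)]  # low uncertainty -> mode 1
```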

| | Base | Scheme A | Scheme B | Scheme C |
|---|---|---|---|---|
| Average Energy per Second (mJ) | 76 | 381 | 264 | 224 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kwan, E.Y.L.; Nunez-Yanez, J.
Entropy-Driven Adaptive Filtering for High-Accuracy and Resource-Efficient FPGA-Based Neural Network Systems. *Electronics* **2020**, *9*, 1765.
https://doi.org/10.3390/electronics9111765
