# The Limits of SEMA on Distinguishing Similar Activation Functions of Embedded Deep Neural Networks


## Abstract


## 1. Introduction

- We simulated the timing attack performed by Batina et al. [15] on various similar activation functions to recover the activation function in use. The timing behavior of the Leaky ReLU [20], SELU [17], Swish [18], and Fast Sigmoid [19] functions was analyzed for the first time in this work, and all of these activation functions could be identified. However, different implementations of the same activation function made classification more difficult, which also shows the potential versatility of the SEMA-based attack across activation function implementations. This contribution corresponds to Section 3.
- We utilized the idea proposed by Takatoi et al. [16] of using SEMA to identify activation functions, further upgrading the attack and analyzing the limits of SEMA-based identification. Our attack was implemented and demonstrated on an Arduino Uno. We improved the signal processing for noisy measurements by erasing trend patterns that are not intrinsic to the data. We then report the results of the SEMA attack performed on an MLP with eight similar activation functions to show the limits of this attack. The SEMA attack can observe differences in operations, but not differences in operand values, from the traces. Although this makes it difficult for the SEMA attack to identify activation functions that perform the same operations with different constant values, the attack remains an effective method to distinguish many popular activation functions. This contribution corresponds to Section 5.
- We discuss the activation functions that are difficult to identify with the timing attack and SEMA attack and the potential of the SEMA attack to be used on different platforms, such as a GPU. This contribution corresponds to Section 5.6.

## 2. Background

#### 2.1. Neural Network

#### 2.1.1. Multilayer Perceptron

#### 2.1.2. Activation Functions

**Sigmoid** - The Sigmoid (logistic) function, i.e., Equation (2), is a nonlinear function that maps arbitrary inputs to outputs in the range $(0,1)$.$$h(a)=\frac{1}{1+e^{-a}}$$
**Tanh** - The Tanh function, i.e., Equation (3), is a rescaling of the Sigmoid function. It maps arbitrary inputs to outputs in the range $(-1,1)$. Unlike Sigmoid, Tanh is symmetric about the origin.$$h(a)=\frac{e^{a}-e^{-a}}{e^{a}+e^{-a}}=\frac{2}{1+e^{-2a}}-1$$
**Softmax** - The Softmax function, i.e., Equation (4), maps a vector of values to several outputs (or classes) whose sum is 1. The output range is $(0,1)$, so the output can be interpreted as a probability distribution, and the function is used for classification problems.$$h(\mathbf{a})_{j}=\frac{e^{a_{j}}}{\sum_{k=1}^{K}e^{a_{k}}},\quad\text{for }j=1,\dots,K\text{ and }\mathbf{a}=(a_{1},\dots,a_{K})\in\mathbb{R}^{K}$$
**ReLU** - The ReLU function, i.e., Equation (5), is an extremely simple function that reduces training and computation time. It is mainly used as an activation function for deep neural networks (DNNs).$$h(a)=\begin{cases}0,&\text{for }a\le 0\\a,&\text{for }a>0\end{cases}$$
**Leaky ReLU** - The Leaky ReLU function, i.e., Equation (6), is similar to the ReLU function, except that it enables back propagation even for negative input values by giving them the small positive slope $\beta$ proposed in [20], where $\beta=0.01$.$$h(a)=\begin{cases}\beta a,&\text{for }a\le 0\\a,&\text{for }a>0\end{cases}$$
**SELU** - The SELU function, i.e., Equation (7), is an improved version of the ReLU function for self-normalizing neural networks, a type of feed-forward neural network [17]. The SELU function has parameters $\lambda$ and $\alpha$, whose values can vary with the application; we fixed them as $\lambda=1.0507$, $\alpha=1.6733$.$$h(a)=\lambda\begin{cases}\alpha(e^{a}-1),&\text{for }a\le 0\\a,&\text{for }a>0\end{cases}$$
**Swish** - The Swish function, i.e., Equation (8), proposed in [18], multiplies the input by its Sigmoid.$$h(a)=\frac{a}{1+e^{-a}}$$
**Fast Sigmoid** - Some activation functions include the exponential function, which is expensive to compute. Timmons et al. evaluated several function approximation techniques that replace the original function with a faster one at the cost of some loss of accuracy [19]. We used one of them in our work, which we call the Fast Sigmoid function, i.e., Equation (9). This function is a cheap alternative to the original Sigmoid function with good enough accuracy. The flexible $n$ parameter was fixed as $n=1$ and $n=9$ in this work.$$h(a)=\frac{1}{1+e^{-a}},\quad e^{a}\approx{\left(1+\frac{a}{2^{n}}\right)}^{2^{n}}$$
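For reference, the eight activation functions above can be written out in Python. This is an illustrative NumPy sketch only, not the embedded C implementation analyzed in the experiments; the parameter defaults follow the values fixed in the text.

```python
import numpy as np

def sigmoid(a):
    # Equation (2): maps inputs to (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    # Equation (3): rescaled Sigmoid, symmetric about the origin
    return np.tanh(a)

def softmax(a):
    # Equation (4): outputs sum to 1 (shifted by the max for stability)
    e = np.exp(a - np.max(a))
    return e / e.sum()

def relu(a):
    # Equation (5)
    return np.maximum(0.0, a)

def leaky_relu(a, beta=0.01):
    # Equation (6): small positive slope for negative inputs
    return np.where(a > 0, a, beta * a)

def selu(a, lam=1.0507, alpha=1.6733):
    # Equation (7)
    return lam * np.where(a > 0, a, alpha * (np.exp(a) - 1.0))

def swish(a):
    # Equation (8): input times its Sigmoid
    return a * sigmoid(a)

def fast_sigmoid(a, n=1):
    # Equation (9): Sigmoid with e^(-a) approximated by (1 - a/2^n)^(2^n)
    return 1.0 / (1.0 + (1.0 - a / 2**n) ** (2**n))
```

Note that several pairs (ReLU vs. Leaky ReLU, Sigmoid vs. Fast Sigmoid) differ only in constant values or in an approximated sub-expression, which is exactly what makes them hard to tell apart from side-channel traces.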

#### 2.2. Simple Electromagnetic Analysis

## 3. Identification Using Timing

#### 3.1. Timing Attack on Activation Functions

- The Sigmoid and Swish functions are the same pattern, with a slightly different minimum computation time.
- The Tanh function's timing behavior is not symmetrical, unlike the Sigmoid function's, although the two have a similar computation time.
- The ReLU and Leaky ReLU functions are mirror images of each other with a similar computation time.
- The SELU function is similar to the Sigmoid function for negative inputs and similar to the ReLU function for positive inputs.
- The Fast Sigmoid function is symmetrical, yet significantly different from the other functions in the computation time.
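The observations above rest on the fact that an activation function's execution time can depend on its input. A minimal host-side Python sketch of the measurement idea follows; `avg_time` and the input sweep are illustrative choices, not the paper's procedure, and the actual attack times the microcontroller rather than host code.

```python
import math
import time

def relu(a):
    return a if a > 0 else 0.0

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def avg_time(f, x, reps=10000):
    """Average execution time of f(x) in seconds."""
    t0 = time.perf_counter()
    for _ in range(reps):
        f(x)
    return (time.perf_counter() - t0) / reps

# A timing profile over a sweep of inputs; input-dependent behavior
# (e.g., ReLU's a > 0 branch, or exp's argument-dependent cost on a
# microcontroller) shows up as an input-dependent timing curve.
inputs = (-2.0, -1.0, 0.0, 1.0, 2.0)
relu_profile = [avg_time(relu, x) for x in inputs]
sigmoid_profile = [avg_time(sigmoid, x) for x in inputs]
```

Plotting such a profile against the input value yields the characteristic shapes compared in Figure 2.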

#### 3.2. Limitations

## 4. Experimental Setup

#### 4.1. Threat Model

- The attacker has no insight on the architecture of the network.
- The attacker is capable of accessing the target device and acquiring EM emanations. However, the attacker is non-invasive and can only operate the device normally, for example by querying the network with random inputs.
- The attacker knows what set of activation functions could be implemented on the architecture.

#### 4.2. Hardware Setup

## 5. Identification Using SEMA

- We first acquired the side-channel trace from the device. In our work, we used EM traces.
- For the next step, we applied signal processing to the acquired trace to obtain the desired trace for visual inspection.
- We then observed the trace and split the trace into two major parts, the multiplication operation of the weights and the activation function phase.
- We classified the traces by the computation time of the activation functions. This was to compare the patterns of peaks in the next step with only similar activation functions that are difficult to distinguish with just timing.
- Finally, we distinguished the similar activation functions by the pattern of the peaks.
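The signal-processing and segmentation steps above can be sketched as follows. This is a hypothetical NumPy illustration: the moving-average detrending and the amplitude threshold are assumptions for the sketch, not the exact processing applied to our traces.

```python
import numpy as np

def detrend(trace, window=101):
    """Subtract a moving average to erase slow trend patterns that are
    not intrinsic to the data, keeping the fast EM variation."""
    kernel = np.ones(window) / window
    return trace - np.convolve(trace, kernel, mode="same")

def active_duration(trace, sample_rate, threshold):
    """Crude segmentation: count samples whose amplitude exceeds the
    threshold and convert the count to seconds of active computation."""
    return int((np.abs(trace) > threshold).sum()) / sample_rate
```

The duration returned by a segmentation step like `active_duration` drives the coarse classification by computation time, after which only the peak patterns of similar functions need to be compared.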

#### 5.1. Signal Processing

#### 5.2. Classification Using Calculation Time

#### 5.3. Sigmoid, Tanh, and Swish

#### 5.4. ReLU, Leaky ReLU, and SELU

#### 5.5. Softmax and Fast Sigmoid

#### 5.6. Discussion

## 6. Related Works

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–6 December 2012; Curran Associates, Inc.: New York, NY, USA, 2012; pp. 1106–1114. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Tariq, Z.; Shah, S.K.; Lee, Y. Speech Emotion Detection using IoT based Deep Learning for Health Care. In Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), Los Angeles, CA, USA, 9–12 December 2019; pp. 4191–4196. [Google Scholar]
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Robot. Res. **2013**, 32, 1238–1274. [Google Scholar] [CrossRef][Green Version]
- Teufl, P.; Payer, U.; Lackner, G. From NLP (Natural Language Processing) to MLP (Machine Language Processing). In Proceedings of the Computer Network Security, 5th International Conference on Mathematical Methods, Models and Architectures for Computer Network Security, MMM-ACNS 2010, St. Petersburg, Russia, 8–10 September 2010; Kotenko, I.V., Skormin, V.A., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6258, pp. 256–269. [Google Scholar]
- Gilad-Bachrach, R.; Dowlin, N.; Laine, K.; Lauter, K.E.; Naehrig, M.; Wernsing, J. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York, NY, USA, 19–24 June 2016; Volume 48, pp. 201–210. [Google Scholar]
- Fredrikson, M.; Lantz, E.; Jha, S.; Lin, S.M.; Page, D.; Ristenpart, T. Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing. In Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, 20–22 August 2014; Fu, K., Jung, J., Eds.; USENIX Association: Berkeley, CA, USA, 2014; pp. 17–32. [Google Scholar]
- Xu, X.; Liu, C.; Feng, Q.; Yin, H.; Song, L.; Song, D. Neural Network-Based Graph Embedding for Cross-Platform Binary Code Similarity Detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, 30 October–3 November 2017; Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D., Eds.; ACM: New York, NY, USA, 2017; pp. 363–376. [Google Scholar]
- Kucera, M.; Tsankov, P.; Gehr, T.; Guarnieri, M.; Vechev, M.T. Synthesis of Probabilistic Privacy Enforcement. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, 30 October–3 November 2017; Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D., Eds.; ACM: New York, NY, USA, 2017; pp. 391–408. [Google Scholar]
- Riscure. Available online: https://www.riscure.com/ (accessed on 5 January 2022).
- Yu, H.; Ma, H.; Yang, K.; Zhao, Y.; Jin, Y. DeepEM: Deep Neural Networks Model Recovery through EM Side-Channel Information Leakage. In Proceedings of the 2020 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2020, San Jose, CA, USA, 7–11 December 2020; pp. 209–218. [Google Scholar]
- Wei, L.; Luo, B.; Li, Y.; Liu, Y.; Xu, Q. I Know What You See: Power Side-Channel Attack on Convolutional Neural Network Accelerators. In Proceedings of the 34th Annual Computer Security Applications Conference, ACSAC 2018, San Juan, PR, USA, 3–7 December 2018; ACM: New York, NY, USA, 2018; pp. 393–406. [Google Scholar]
- Yan, M.; Fletcher, C.W.; Torrellas, J. Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures. In Proceedings of the 29th USENIX Security Symposium, Baltimore, MD, USA, 15–17 August 2018. [Google Scholar]
- Hong, S.; Davinroy, M.; Kaya, Y.; Dachman-Soled, D.; Dumitras, T. How to 0wn the NAS in Your Spare Time. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Batina, L.; Bhasin, S.; Jap, D.; Picek, S. CSI NN: Reverse Engineering of Neural Network Architectures Through Electromagnetic Side Channel. In Proceedings of the 28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, 14–16 August 2019; Heninger, N., Traynor, P., Eds.; USENIX Association: Berkeley, CA, USA, 2019; pp. 515–532. [Google Scholar]
- Takatoi, G.; Sugawara, T.; Sakiyama, K.; Li, Y. Simple Electromagnetic Analysis Against Activation Functions of Deep Neural Networks. In Proceedings of the International Conference on Applied Cryptography and Network Security, Rome, Italy, 19–22 October 2020; pp. 181–197. [Google Scholar]
- Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-Normalizing Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; pp. 971–980. [Google Scholar]
- Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for Activation Functions. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Timmons, N.G.; Rice, A. Approximating Activation Functions. arXiv **2020**, arXiv:2001.06370. [Google Scholar]
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier Nonlinearities Improve Neural Network Acoustic Models. In Proceedings of the International Conference on Machine Learning (ICML) Workshop on Deep Learning for Audio, Speech and Language Processing, Atlanta, GA, USA, 16 June 2013. [Google Scholar]
- Kocher, P.C.; Jaffe, J.; Jun, B. Differential Power Analysis. In Proceedings of the Advances in Cryptology—CRYPTO’99, 19th Annual International Cryptology Conference, Santa Barbara, CA, USA, 15–19 August 1999; Wiener, M.J., Ed.; Springer: Berlin/Heidelberg, Germany, 1999; Volume 1666, pp. 388–397. [Google Scholar]
- Kocher, P.C. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Proceedings of the Advances in Cryptology—CRYPTO’96, 16th Annual International Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 1996; Koblitz, N., Ed.; Springer: Berlin/Heidelberg, Germany, 1996; Volume 1109, pp. 104–113. [Google Scholar]
- Tinkercad. Available online: https://www.tinkercad.com/ (accessed on 5 January 2022).
- Patranabis, S.; Mukhopadhyay, D. Fault Tolerant Architectures for Cryptography and Hardware Security; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
- Luo, C.; Fei, Y.; Luo, P.; Mukherjee, S.; Kaeli, D.R. Side-channel power analysis of a GPU AES implementation. In Proceedings of the 33rd IEEE International Conference on Computer Design, ICCD 2015, New York, NY, USA, 18–21 October 2015; pp. 281–288. [Google Scholar]
- Chmielewski, L.; Weissbart, L. On Reverse Engineering Neural Network Implementation on GPU. In Proceedings of the Applied Cryptography and Network Security Workshops—ACNS 2021 Satellite Workshops, AIBlock, AIHWS, AIoTS, CIMSS, Cloud S&P, SCI, SecMT, and SiMLA, Kamakura, Japan, 21–24 June 2021; Zhou, J., Ahmed, C.M., Batina, L., Chattopadhyay, S., Gadyatskaya, O., Jin, C., Lin, J., Losiouk, E., Luo, B., Majumdar, S., et al., Eds.; Springer: Amsterdam, The Netherlands, 2021; Volume 12809, pp. 96–113. [Google Scholar]
- Dubey, A.; Cammarota, R.; Aysu, A. MaskedNet: The First Hardware Inference Engine Aiming Power Side-Channel Protection. In Proceedings of the 2020 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2020, San Jose, CA, USA, 7–11 December 2020; pp. 197–208. [Google Scholar]
- Regazzoni, F.; Bhasin, S.; Alipour, A.; Alshaer, I.; Aydin, F.; Aysu, A.; Beroulle, V.; Natale, G.D.; Franzon, P.D.; Hély, D.; et al. Machine Learning and Hardware security: Challenges and Opportunities Invited Talk. In Proceedings of the IEEE/ACM International Conference on Computer Aided Design, ICCAD 2020, San Diego, CA, USA, 2–5 November 2020; pp. 141:1–141:6. [Google Scholar]
- Hu, X.; Liang, L.; Li, S.; Deng, L.; Zuo, P.; Ji, Y.; Xie, X.; Ding, Y.; Liu, C.; Sherwood, T.; et al. DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints. In Proceedings of the ASPLOS’20: Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 16–20 March 2020; ACM: New York, NY, USA, 2020; pp. 385–399. [Google Scholar]
- Duddu, V.; Samanta, D.; Rao, D.V.; Balas, V.E. Stealing Neural Networks via Timing Side Channels. arXiv **2018**, arXiv:1812.11720. [Google Scholar]
- Maji, S.; Banerjee, U.; Chandrakasan, A.P. Leaky Nets: Recovering Embedded Neural Network Models and Inputs Through Simple Power and Timing Side-Channels—Attacks and Defenses. IEEE Internet Things J. **2021**, 8, 12079–12092. [Google Scholar] [CrossRef]
- Wang, H.; Hafiz, S.M.; Patwari, K.; Chuah, C.N.; Shafiq, Z.; Homayoun, H. Stealthy Inference Attack on DNN via Cache-based Side-Channel Attacks. Available online: https://web.cs.ucdavis.edu/~zubair/files/dnn-sc-date2022.pdf (accessed on 15 April 2022).
- Maji, S.; Banerjee, U.; Fuller, S.H.; Chandrakasan, A.P. A Threshold-Implementation-Based Neural-Network Accelerator Securing Model Parameters and Inputs Against Power Side-Channel Attacks. In Proceedings of the 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; Volume 65, pp. 518–520. [Google Scholar]

**Figure 2.** Timing Behavior of Similar Activation Functions. (**a**) Sigmoid. (**b**) Swish. (**c**) Tanh. (**d**) ReLU. (**e**) Leaky ReLU. (**f**) SELU. (**g**) Fast Sigmoid when $n=1$. (**h**) Fast Sigmoid when $n=9$.

**Figure 4.** Comparison of the Patterns of Sigmoid, Tanh, and Swish Functions for Positive Input. (**a**) Sigmoid Function. (**b**) Tanh Function. (**c**) Swish Function.

**Figure 5.** Comparison of the Patterns of Sigmoid and Swish Functions for Input 0. (**a**) Sigmoid Function. (**b**) Swish Function.

| Activation Function | Calculation Time | Approximate Range of Computation Time (in µs) |
|---|---|---|
| ReLU, Leaky ReLU, SELU | Short | 5∼15 |
| Sigmoid, Tanh, Swish | Medium | 80∼210 |
| Softmax, Fast Sigmoid | Variable | 50∼ |
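The ranges in the table above suggest a simple first-pass filter on computation time. The sketch below assumes those cut-offs generalize; they come from our measurements, and since the "Variable" class overlaps the "Medium" range, the peak-pattern comparison is still needed afterwards.

```python
def candidates_by_time(duration_us):
    """Return the set of activation functions consistent with one
    measured computation time in microseconds (cut-offs assumed from
    the measured ranges; overlapping classes are returned together)."""
    if duration_us <= 15:
        return {"ReLU", "Leaky ReLU", "SELU"}            # Short: 5~15 us
    if 80 <= duration_us <= 210:
        # Medium overlaps the open-ended Variable range (50 us and up)
        return {"Sigmoid", "Tanh", "Swish", "Softmax", "Fast Sigmoid"}
    return {"Softmax", "Fast Sigmoid"}                    # Variable: 50 us~
```

For example, a 10 µs call narrows the candidates to the ReLU family, while a 100 µs call still leaves five candidates to be separated by their peak patterns.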

| | ReLU with PI | Leaky ReLU with PI | SELU with PI |
|---|---|---|---|
| ReLU with NI | Same | Same | - |
| Leaky ReLU with NI | Same | - | Same |
| SELU with NI | - | - | - |


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Takatoi, G.; Sugawara, T.; Sakiyama, K.; Hara-Azumi, Y.; Li, Y. The Limits of SEMA on Distinguishing Similar Activation Functions of Embedded Deep Neural Networks. *Appl. Sci.* **2022**, *12*, 4135.
https://doi.org/10.3390/app12094135
