# Fast Approximations of Activation Functions in Deep Neural Networks when using Posit Arithmetic


## Abstract


## 1. Introduction

- The operation evaluation is faster than its exact counterpart, with little to no degradation in accuracy.
- Since these operations only need integer arithmetic and logic operations, they can be executed directly on the already existing ALU, which also allows a faster software emulation of Posits.
- Being able to write functions as a sequence of arithmetic-logic operations allows us to vectorize them, exploiting already existing SIMD (Single Instruction, Multiple Data) engines.

#### Paper Structure

## 2. Posit Arithmetic

- Sign field: 1-bit
- Regime field: variable length, composed of a string of bits equal to 1 or 0 ended respectively by a 0 or 1 bit.
- Exponent field: at most es bits
- Fraction field: variable length mantissa

- Count leading zeros: implemented with the embedded `__builtin_clz` C function, which several CPU families provide in hardware [13].
- Next power of two: used to extract the fraction. An efficient way to obtain the next power of two, given a representation X on 32 bits, is the following:

  ```
  next_p2(X) -> Y
    Y = X - 1
    Y = Y | Y >> 1
    Y = Y | Y >> 2
    Y = Y | Y >> 4
    Y = Y | Y >> 8
    Y = Y | Y >> 16
    Y = Y + 1
  ```

  This approach copies the highest set bit into all the lower bit positions. Adding one to the resulting string triggers a chain of carries that clears all the bits from the highest set bit down to the least significant one and sets the next more significant bit to 1, thus producing the next power of two. Let us use an example. Suppose $X = (5)_{10} = (0101)_2$. After the first step, $Y = (0100)_2$. After the second step, $Y = (0100)_2 \,|\, (0010)_2 = (0110)_2$. After the next step, $Y = (0110)_2 \,|\, (0001)_2 = (0111)_2$. The remaining shifts leave $Y$ unchanged. At the last step, $Y = (0111)_2 + (0001)_2 = (1000)_2 = (8)_{10}$, which is the next power of two starting from 5.
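Under the assumption of a 32-bit unsigned representation, the two helpers above can be sketched in C++ as follows (the function names are illustrative, not the cppPosit API; `__builtin_clz` is the GCC/Clang builtin cited above):

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros of a 32-bit value; maps to a hardware instruction on
// most CPUs via the GCC/Clang builtin [13]. The builtin is undefined for 0,
// so that case is handled explicitly.
inline int clz32(uint32_t x) {
    return x == 0 ? 32 : __builtin_clz(x);
}

// Round up to the next power of two: smear the highest set bit into every
// lower position, then add one to trigger the carry chain described above.
inline uint32_t next_p2(uint32_t x) {
    uint32_t y = x - 1;
    y |= y >> 1;
    y |= y >> 2;
    y |= y >> 4;
    y |= y >> 8;
    y |= y >> 16;
    return y + 1;
}
```

Note that `next_p2` returns its argument unchanged when it is already a power of two (e.g., 8 maps to 8), and the worked example from the text gives `next_p2(5) == 8`.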

#### 2.1. The Case of No Exponent Bits (esbits = 0)

```
inv(x) -> y
  X = x.v                          // 'v' field: bit-string representing the Posit
  msb = 1 << (N-1)
  signmask = ~((msb | (msb - 1)) >> 1)
  Y = X ^ (~signmask)              // bitwise NOT, then XOR (C-style operators)
  y(Y)
```

```
compl1(x) -> y                     // computes 1 - x for x in [0, 1]
  X = x.v                          // 'v' field: bit-string representing the Posit
  invert_bit = 1 << (N-2)
  Y = invert_bit - X
  y(Y)
```
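As a concrete illustration, the two operators can be exercised on raw 8-bit Posit bit patterns (esbits = 0). The sketch below is written for this paper's reader, not taken from cppPosit; the bit patterns in the comments are worked out by hand (for `Posit<8,0>`, `0x40` encodes 1.0, `0x20` encodes 0.5, `0x10` encodes 0.25):

```cpp
#include <cassert>
#include <cstdint>

constexpr int N = 8;  // Posit<8,0>

// Fast (approximated) reciprocal: XOR flips every bit except the sign bit.
inline uint8_t inv(uint8_t X) {
    uint8_t msb = 1u << (N - 1);                              // 0x80
    uint8_t signmask = ~((uint8_t)((msb | (msb - 1)) >> 1));  // 0x80
    return X ^ (uint8_t)~signmask;                            // X ^ 0x7F
}

// Exact 1 - x for x in [0, 1]: subtract from the encoding of 1.0.
inline uint8_t compl1(uint8_t X) {
    uint8_t invert_bit = 1u << (N - 2);                       // 0x40 = 1.0
    return invert_bit - X;
}
```

For example, `compl1(0x20)` yields `0x20` (1 − 0.5 = 0.5, exact) and `compl1(0x10)` yields `0x30`, the encoding of 0.75. The reciprocal is only approximate, as stated in Table 1: `inv(0x20)` yields `0x5F`, which decodes to 1.96875 ≈ 2.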

#### 2.2. FastSigmoid

When esbits = 0, the sigmoid function can be approximated directly on the Posit bit string by flipping the sign bit and then shifting the string right by two positions, so that it now starts with `00` (see Figure 4).

```
fastSigmoid(x) -> y
  X = x.v                          // 'v' field: bit-string representing the Posit
  invert_bit = 1 << (N-2)
  Y = (invert_bit + (X >> 1)) >> 1 // arithmetic shifts on the N-bit string
  y(Y)
```
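To make the approximation concrete, the following self-contained sketch implements `fastSigmoid` on raw `Posit<8,0>` bit patterns together with a minimal decoder (both written here for illustration; this is not the cppPosit code). Signed arithmetic makes the shifts arithmetic, which is what handles negative Posits correctly:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode a Posit<8,0> bit pattern into a double.
double decode_p8(uint8_t v) {
    if (v == 0x00) return 0.0;
    if (v == 0x80) return NAN;                  // Not-a-Real
    bool neg = v & 0x80;
    uint8_t x = neg ? (uint8_t)(0u - v) : v;    // 2's complement if negative
    uint8_t bits = (uint8_t)(x << 1);           // drop the sign bit
    bool lead = bits & 0x80;                    // regime polarity
    int run = 0;
    uint8_t t = bits;
    while (run < 7 && (((t & 0x80) != 0) == lead)) { run++; t <<= 1; }
    int k = lead ? run - 1 : -run;              // regime value
    int fracbits = 7 - run - 1;
    double frac = 0.0;
    if (fracbits > 0) {
        uint8_t fb = (uint8_t)(t << 1);         // skip regime terminator
        frac = (double)(fb >> (8 - fracbits)) / (double)(1 << fracbits);
    }
    double val = std::ldexp(1.0 + frac, k);     // useed = 2 when esbits = 0
    return neg ? -val : val;
}

// fastSigmoid on the raw bit pattern, using signed (arithmetic) shifts.
uint8_t fastSigmoid(uint8_t v) {
    int8_t X = (int8_t)v;
    int invert_bit = 1 << (8 - 2);              // 0x40 encodes 1.0
    return (uint8_t)((invert_bit + (X >> 1)) >> 1);
}
```

`fastSigmoid` maps the encoding of 0 (`0x00`) to the encoding of 0.5 (`0x20`) exactly, matching sigmoid(0) = 0.5, and stays close to the true sigmoid elsewhere (e.g., it maps 0.5 to 0.625 versus the exact 0.622).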

## 3. CppPosit Library

#### 3.1. Tabulated Posits

Given the storage type `T` that holds the Posit bit string, each table entry will occupy `b = 8*sizeof(T)` bits. Typically, there will be between $N = 8$ and $N = 10$ tables for a Posit configuration, so the overall space occupation is $S = N \cdot (R \cdot C) \cdot b$, where $R$ and $C$ are the numbers of rows and columns of each table.

- Addition and subtraction are respectively symmetric and antisymmetric. The two tables can therefore be merged into one, of which only one half (above or below the main diagonal) needs to be stored.
- Multiplication and division can be simplified through logarithm properties. Given $p = x \cdot y$, we can apply the $\log$ operator to both sides (see Reference [14]), obtaining $\log(p) = \log(x \cdot y)$. From the logarithm properties, this results in $\log(p) = \log(x) + \log(y)$. Finally, going back with exponentiation, we get $p = e^{\log(x) + \log(y)}$. Since the tabulation of single-argument operators scales linearly with the number of Posit values, while two-argument operators scale quadratically, it is only feasible to store the $\exp$ and $\log$ tables instead of the multiplication and division ones, exploiting the addition/subtraction LUT for the computation.
- We can compact the multiplication tables even more by exploiting the fast inversion (L1) shown in Section 2. Suppose we have two Posit numbers $x, y$ and their reciprocals; if we wanted to provide every multiplication or division combination, we would build a LUT like the one in Table 3. This table would require 16 entries for only 4 numbers, hence it would not grow manageably with the Posit size. If we apply the L1 inversion and the symmetry of negative values, we only need to store the operations $x \cdot y$ and $x/y$, resulting in a LUT size of only 2 elements for the same set of numbers, as shown in Table 4.
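The log-domain trick can be sketched in ordinary floating point (a toy illustration of the idea only: the table lookups are replaced here by direct calls to `log` and `exp`, and the function names are made up for this example):

```cpp
#include <cassert>
#include <cmath>

// Multiply two nonzero values using only log, add, and exp: the operators
// that end up tabulated. Signs are tracked separately, since the logarithm
// needs a positive argument.
double mul_via_logs(double x, double y) {
    double sign = ((x < 0) != (y < 0)) ? -1.0 : 1.0;
    return sign * std::exp(std::log(std::fabs(x)) + std::log(std::fabs(y)));
}

// Division reuses the same tables, with a subtraction instead of an addition.
double div_via_logs(double x, double y) {
    double sign = ((x < 0) != (y < 0)) ? -1.0 : 1.0;
    return sign * std::exp(std::log(std::fabs(x)) - std::log(std::fabs(y)));
}
```

In the tabulated setting, `log` and `exp` become single-argument LUT lookups and the addition/subtraction reuses the existing two-argument LUT, so no dedicated multiplication or division table is needed.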

#### 3.2. Type Proxying

When a Posit with storage type `T1` is proxied by a larger Posit with storage type `T2` (with `sizeof(T2)` ≫ `sizeof(T1)`), the conversion operation is the following:

```
convert0(p1) -> p2
  v1 = p1.v                        // 'v' field: bit-string representing the Posit
  v2 = cast<T2>(v1) << (Z - X)     // Z, X: total bits of destination and source
  p2.v = v2
```
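Assuming $X = 8$ source bits and $Z = 16$ destination bits, both with esbits = 0, the conversion is just a cast plus a left shift (an illustrative sketch, not the cppPosit code). For instance, the `Posit<8,0>` pattern `0x20`, which encodes 0.5, maps to the `Posit<16,0>` pattern `0x2000`, which encodes the same value; the shift also preserves the 2's-complement sign, so negative patterns convert correctly:

```cpp
#include <cassert>
#include <cstdint>

// Widen a Posit<8,0> bit pattern to a Posit<16,0> bit pattern by casting to
// the larger storage type and shifting the bits into the top positions.
uint16_t convert0(uint8_t v1) {
    const int Z = 16, X = 8;       // destination and source total bit counts
    return (uint16_t)((uint16_t)v1 << (Z - X));
}
```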

#### 3.3. Brain Posits

## 4. Hyperbolic Tangent, Extended Linear Unit, and their Approximations

```
FastTanh(x) -> y
  x_n = x > 0 ? -x : x             // work on the non-positive half
  s = x > 0
  y_n = neg(compl1(twice(FastSigmoid(twice(x_n)))))
  y = s ? -y_n : y_n
```

```
FastELU(x) -> y
  y_n = neg(twice(compl1(half(reciprocate(FastSigmoid(neg(x)))))))
  y = x > 0 ? x : y_n
```
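Both chains follow from simple identities on the exact functions: $\tanh(x) = 2\sigma(2x) - 1$ and, for $x \le 0$, $\mathrm{ELU}(x) = e^x - 1 = -2\left(1 - \tfrac{1}{2} \cdot \tfrac{1}{\sigma(-x)}\right)$, where $\sigma$ is the sigmoid. The following sketch checks these identities in double precision (using the exact sigmoid rather than FastSigmoid, so it validates the algebra behind the pseudocode, not the Posit approximation itself):

```cpp
#include <cassert>
#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// tanh via sigmoid: the chain twice -> FastSigmoid -> compl1 -> neg in the
// pseudocode computes -(1 - 2*sigmoid(2x)) = tanh(x).
double tanh_via_sigmoid(double x) {
    return -(1.0 - 2.0 * sigmoid(2.0 * x));
}

// ELU via sigmoid: for x <= 0 the chain neg -> FastSigmoid -> reciprocate ->
// half -> compl1 -> twice -> neg computes e^x - 1.
double elu_via_sigmoid(double x) {
    return x > 0 ? x : -(2.0 * (1.0 - 0.5 * (1.0 / sigmoid(-x))));
}
```

Since every step in each chain is an L1 operator from Table 1, both activations reduce to a handful of integer operations on the Posit bit string when esbits = 0.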

## 5. Implementation Results

The experiments were run on an `Intel i7-7560U` processor, running Ubuntu Linux `18.04` and using `GCC 8.3`. The benchmark data are publicly available in Reference [17]. The C++ source code can be downloaded from Reference [19].

## 6. Conclusions and Future Work

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Alcaide, S.; Kosmidis, L.; Tabani, H.; Hernandez, C.; Abella, J.; Cazorla, F.J. Safety-Related Challenges and Opportunities for GPUs in the Automotive Domain. IEEE Micro **2018**, 38, 46–55.
- Benedicte, P.; Abella, J.; Hernandez, C.; Mezzetti, E.; Cazorla, F.J. Towards Limiting the Impact of Timing Anomalies in Complex Real-Time Processors. In Proceedings of the 24th Asia and South Pacific Design Automation Conference (ASPDAC’19), Tokyo, Japan, 21–24 January 2019; pp. 27–32.
- 10 Useful Tips for Using the Floating Point Unit on the Cortex-M4. Available online: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/10-useful-tips-to-using-the-floating-point-unit-on-the-arm-cortex--m4-processor (accessed on 4 March 2020).
- Köster, U.; Webb, T.; Wang, X.; Nassar, M.; Bansal, A.K.; Constable, W.; Elibol, O.; Gray, S.; Hall, S.; Hornof, L.; et al. Flexpoint: An adaptive numerical format for efficient training of deep neural networks. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 1742–1752.
- Popescu, V.; Nassar, M.; Wang, X.; Tumer, E.; Webb, T. Flexpoint: Predictive Numerics for Deep Learning. In Proceedings of the 25th IEEE Symposium on Computer Arithmetic (ARITH’18), Amherst, MA, USA, 25–27 June 2018; pp. 1–4.
- NVIDIA Turing GPU Architecture, Graphics Reinvented. Available online: https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf (accessed on 4 March 2020).
- Malossi, A.C.I.; Schaffner, M.; Molnos, A.; Gammaitoni, L.; Tagliavini, G.; Emerson, A.; Tomás, A.; Nikolopoulos, D.S.; Flamand, E.; Wehn, N. The transprecision computing paradigm: Concept, design, and applications. In Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE’18), Dresden, Germany, 19–23 March 2018; pp. 1105–1110.
- Gustafson, J.L.; Yonemoto, I.T. Beating Floating Point at its Own Game: Posit Arithmetic. Supercomput. Front. Innov. **2017**, 4, 71–86.
- Cococcioni, M.; Rossi, F.; Ruffaldi, E.; Saponara, S. Novel Arithmetics to Accelerate Machine Learning Classifiers in Autonomous Driving Applications. In Proceedings of the 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS’19), Genoa, Italy, 27–29 November 2019; pp. 779–782.
- Cococcioni, M.; Ruffaldi, E.; Saponara, S. Exploiting Posit arithmetic for Deep Neural Networks in Autonomous Driving Applications. In Proceedings of the 2018 IEEE International Conference of Electrical and Electronic Technologies for Automotive (Automotive ’18), Milan, Italy, 9–11 July 2018; pp. 1–6.
- Carmichael, Z.; Langroudi, H.F.; Khazanov, C.; Lillie, J.; Gustafson, J.L.; Kudithipudi, D. Deep Positron: A Deep Neural Network Using the Posit Number System. In Proceedings of the 2019 Design, Automation Test in Europe Conference Exhibition (DATE), Florence, Italy, 29 March 2019; pp. 1421–1426.
- Cococcioni, M.; Rossi, F.; Ruffaldi, E.; Saponara, S. A Fast Approximation of the Hyperbolic Tangent when Using Posit Numbers and its Application to Deep Neural Networks. In Proceedings of the International Workshop on Applications in Electronics Pervading Industry, Environment and Society (ApplePies’19), Pisa, Italy, 18 September 2019.
- Other Built-in Functions Provided by GCC. Available online: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html (accessed on 4 March 2020).
- Arnold, M.G.; Garcia, J.; Schulte, M.J. The interval logarithmic number system. In Proceedings of the 16th IEEE Symposium on Computer Arithmetic (ARITH’03), Santiago de Compostela, Spain, 15–18 June 2003; pp. 253–261.
- Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML’10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
- Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-Normalizing Neural Networks. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Long Beach, CA, USA, 4–9 December 2017; pp. 971–980.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE **1998**, 86, 2278–2324.
- Stallkamp, J.; Schlipsing, M.; Salmen, J.; Igel, C. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN’11), San Jose, CA, USA, 31 July–5 August 2011; pp. 1453–1460.
- European Processor Initiative: Posit-Based TinyDNN. Available online: https://gitlab.version.fz-juelich.de/epi-wp1-public/tinyDNN (accessed on 4 March 2020).

**Figure 2.** An example of a 16-bit Posit with 3 bits for the exponent ($esbits = 3$): Given the sequence on top of the figure, after detecting that it starts with a 1 (i.e., the number is negative), we have to compute the 2's complement of all the remaining bits (passing from 001-110-111011001 to 110-001-000100111). Then, we can proceed to decode the Posit. The final value is therefore $-512 \cdot (1 + 39/512) = -551$ (exact value, i.e., no rounding, in this case).

**Figure 3.**Resolution of a 12-bit Posit when varying the exponent size. With a 0-bit exponent, the Posit resolution in the $[0,1]$ range is the one of a 12-bit fixed point format.

**Figure 6.**Five-bit Posit mapping to the Posit circle: As reported, the tanh function manages to cover the lower half of the circle while the sigmoid one covers only the quarter $[0,1]$.

**Figure 7.** The Posit circle when the total number of bits is 5: The extended linear unit uses all the numbers in $[-1,\infty)$, while the ReLU function uses only the ones in $[0,\infty)$.

**Figure 8.**Comparison between the exact and approximated versions of hyperbolic tangent (TANH) and extended linear unit (ELU).

**Figure 9.** Flowchart for the proposed method: models are trained using formats with a high bit count, like Float32 or, in the future, high-precision Posits. The models obtained this way are then converted to Posit formats with a lower bit count to increase space efficiency and bandwidth.

**Table 1.** Most interesting L1 operators implemented in cppPosit and the requirements on the argument $x$ for them to be applied.

Operation | Approximated | Requirements |
---|---|---|
$2 \cdot x$ | no | esbits = 0 |
$x/2$ | no | esbits = 0 |
$1/x$ | yes | esbits = 0 |
$1 - x$ | no | esbits = 0, $x \in [0,1]$ |
FastSigmoid(x) | yes | esbits = 0 |
FastTanh(x) | yes | esbits = 0 |
FastELU(x) | yes | esbits = 0 |

**Table 2.** Per-table space occupation when tabulating a two-argument Posit operation, for different total bit counts.

Total Bits (X) | Storage Type Bits (b) | Per-Table Occupation |
---|---|---|
8 | 8 | 64 KB |
10 | 16 | 2 MB |
12 | 16 | 32 MB |
14 | 16 | 512 MB |
16 | 16 | 8 GB |

**Table 3.** All the possible combinations for multiplying and dividing two Posit numbers $x$, $y$ and their reciprocals.

| | 1/x | x | −1/x | −x |
|---|---|---|---|---|
| 1/y | 1/xy | x/y | −1/xy | −x/y |
| y | y/x | xy | −y/x | −xy |
| −1/y | −1/xy | −x/y | 1/xy | x/y |
| −y | −y/x | −xy | y/x | xy |

**Table 4.** All the possible combinations for multiplying and dividing two Posit numbers: all the cells in italics correspond to the same LUT entry, and all the remaining ones correspond to another LUT entry.

| | 1/x | x | −1/x | −x |
|---|---|---|---|---|
| 1/y | *1/xy* | x/y | *−1/xy* | −x/y |
| y | y/x | *xy* | −y/x | *−xy* |
| −1/y | *−1/xy* | −x/y | *1/xy* | x/y |
| −y | −y/x | *−xy* | y/x | *xy* |

**Table 5.** Standard Posit configurations and their Brain Posit counterparts.

Standard Posits | Brain Posits |
---|---|
Posit<16,1> | Posit<8,2> |
Posit<32,2> | Posit<16,3> |
Posit<64,3> | Posit<32,4> |

**Table 6.**Comparison using Posits for the MNIST dataset for three different activation functions: fast approximated version of Tanh (FastTanh), exact Tanh, and FastSigmoid. Accuracy of the neural network and mean sample inference time are reported.

Activation | FastTanh (This Paper) Acc. (%) | FastTanh Time (ms) | Tanh Acc. (%) | Tanh Time (ms) | FastSigmoid Acc. (%) | FastSigmoid Time (ms) |
---|---|---|---|---|---|---|
SoftFloat32 | - | - | 99.4 | 8.3 | 97.1 | - |
Posit$\left(\right)$ | 99.1 | 3.2 | 99.4 | 5.28 | 97.1 | 3.31 |
Posit$\left(\right)$ | 99.1 | 2.9 | 99.4 | 4.64 | 97.1 | 3.09 |
Posit$\left(\right)$ | 99.1 | 2.9 | 99.4 | 4.66 | 97.1 | 3.04 |
Posit$\left(\right)$ | 99.1 | 2.9 | 99.3 | 4.62 | 96.9 | 3.08 |
Posit$\left(\right)$ | 98.6 | 2.9 | 98.5 | 4.84 | 94.2 | 3.01 |

**Table 7.** Comparison using Posits for the GTSRB dataset (see Table 6).

Activation | FastTanh (This Paper) Acc. (%) | FastTanh Time (ms) | Tanh Acc. (%) | Tanh Time (ms) | FastSigmoid Acc. (%) | FastSigmoid Time (ms) |
---|---|---|---|---|---|---|
SoftFloat32 | - | - | 94.2 | 15.2 | 82.0 | - |
Posit$\left(\right)$ | 93.5 | 5.3 | 93.5 | 6.2 | 81.9 | 5.0 |
Posit$\left(\right)$ | 93.4 | 4.2 | 93.5 | 5.1 | 81.9 | 4.3 |
Posit$\left(\right)$ | 93.4 | 4.2 | 93.4 | 5.1 | 81.9 | 4.3 |
Posit$\left(\right)$ | 93.4 | 4.2 | 93.3 | 5.1 | 81.0 | 4.2 |
Posit$\left(\right)$ | 93.0 | 4.0 | 92.3 | 5.0 | 72.1 | 4.0 |

**Table 8.**Comparison using Posits for the MNIST dataset for three different activation functions: fast approximated version of ELU (FastELU), exact ELU, and ReLU. Accuracy of the neural network and mean sample inference time are reported.

Activation | FastELU (This Paper) Acc. (%) | FastELU Time (ms) | ELU Acc. (%) | ELU Time (ms) | ReLU Acc. (%) | ReLU Time (ms) |
---|---|---|---|---|---|---|
SoftFloat32 | - | - | 98.6 | 8.8 | 89.1 | 6.3 |
Posit$\left(\right)$ | 98.5 | 3.2 | 98.6 | 3.9 | 89.1 | 2.0 |
Posit$\left(\right)$ | 98.5 | 2.4 | 98.6 | 3.1 | 89.05 | 2.0 |
Posit$\left(\right)$ | 98.5 | 2.3 | 98.6 | 3.1 | 89.0 | 2.0 |
Posit$\left(\right)$ | 98.3 | 2.3 | 98.5 | 3.0 | 89.0 | 1.9 |
Posit$\left(\right)$ | 91.1 | 2.2 | 90.1 | 3.0 | 88.4 | 1.9 |

**Table 9.** Comparison using Posits for the GTSRB dataset (see Table 8).

Activation | FastELU (This Paper) Acc. (%) | FastELU Time (ms) | ELU Acc. (%) | ELU Time (ms) | ReLU Acc. (%) | ReLU Time (ms) |
---|---|---|---|---|---|---|
SoftFloat32 | - | - | 94.2 | 15.86 | 92.0 | 8.2 |
Posit$\left(\right)$ | 94.0 | 5.8 | 94.2 | 6.37 | 92.0 | 5.0 |
Posit$\left(\right)$ | 94.0 | 4.6 | 94.2 | 5.21 | 92.0 | 4.3 |
Posit$\left(\right)$ | 94.0 | 4.6 | 94.2 | 5.08 | 92.0 | 4.3 |
Posit$\left(\right)$ | 94.0 | 4.6 | 94.2 | 5.0 | 92.0 | 4.2 |
Posit$\left(\right)$ | 92.0 | 4.6 | 91.8 | 5.0 | 86.8 | 4.0 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cococcioni, M.; Rossi, F.; Ruffaldi, E.; Saponara, S.
Fast Approximations of Activation Functions in Deep Neural Networks when using Posit Arithmetic. *Sensors* **2020**, *20*, 1515.
https://doi.org/10.3390/s20051515
