# A Low-Voltage, Low-Power Reconfigurable Current-Mode Softmax Circuit for Analog Neural Networks


## Abstract


^{2} and consumes only 3 µW of power, representing a very compact and energy-efficient option compared to the corresponding digital implementations.

## 1. Introduction

^{2}, consuming power in the range of 0.5 to 5 mW [17,18,19,20,21,22]. On the other hand, as reported in [15], an analog softmax can be realized with only N transistors, where N is the number of inputs and outputs: it uses a single transistor for each input/output pair. The input and the output share the same node, since input data are provided as a drain voltage while the drain current is the output. The method in [15] claims good precision in a very compact, very low-power solution. However, this straightforward implementation is not adequate for practical applications requiring current-mode inputs and distinct input/output nodes. In addition, transistors operated in the subthreshold regime are very sensitive to process and temperature variations. A different analog softmax circuit, proposed in [16], has a relatively high power cost, consuming 690 µW at a supply voltage of 5 V for N = 5 inputs. This is not a fixed limit, since the operating power can likely be scaled down by using more advanced CMOS technology nodes. However, that topology only approximates the softmax model, because the exponential terms are replaced by their second-order Taylor series.

## 2. Analytical Model of the Proposed Softmax Analog Implementation

_{p} and V_{TH} represent the pMOS transconductance coefficient and threshold voltage, respectively. The channel-length modulation is neglected by appropriately sizing the transistor length. The following relations for the input (Equation (4)) and output (V_{x} in Equation (5)) voltages can be derived as:

_{s} is the reverse saturation current of the source and drain p-diffusion/n-well junctions, n is the subthreshold slope factor, and V_{t} is the thermal voltage. Equation (7) implements the I–V converter and exponential blocks in Figure 2. For an M-sized softmax function, M + 1 replicas of these functional blocks are required.

_{SG}s highlighted in Figure 3b must equal 0. This means that the sum of the V_{SG}s oriented in the clockwise (CW) direction must equal the sum of the V_{SG}s oriented in the counterclockwise (CCW) direction. Due to the exponential current–voltage relation, this implies that the product of the CW device currents equals the product of the CCW device currents. By arbitrarily selecting three currents as inputs and one as output, both multiplication and division operations can be realized [24]. This circuit uses dynamic-threshold-voltage (DVT) transistors with shorted body and gate terminals in order to improve the transient response for a given supply voltage.
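As a numerical illustration of the translinear principle described above, the following Python sketch assumes an idealized subthreshold law I = I0·exp(V_SG/(n·V_t)) for every device in the loop. The parameter values (`I0`, `N_SLOPE`, `V_T`) and the function names are hypothetical and are not taken from the paper. The sketch checks that equal CW and CCW V_SG sums force equal current products, so the fourth current is a multiply/divide of the other three.

```python
import math

# Assumed, illustrative device parameters (not from the paper).
I0 = 1e-15       # pre-exponential current, A
N_SLOPE = 1.3    # subthreshold slope factor n
V_T = 0.02585    # thermal voltage near room temperature, V


def vsg_from_current(i):
    """Invert the idealized law I = I0 * exp(V_SG / (n * V_t))."""
    return N_SLOPE * V_T * math.log(i / I0)


def current_from_vsg(vsg):
    """Idealized subthreshold current for a given source-gate voltage."""
    return I0 * math.exp(vsg / (N_SLOPE * V_T))


def translinear_output(i_cw1, i_cw2, i_ccw1):
    """Solve the four-device loop for the remaining CCW current.

    Loop equation: V_SG(cw1) + V_SG(cw2) = V_SG(ccw1) + V_SG(out).
    """
    v_out = (vsg_from_current(i_cw1)
             + vsg_from_current(i_cw2)
             - vsg_from_current(i_ccw1))
    return current_from_vsg(v_out)


i_a, i_scale, i_b = 4e-9, 10e-9, 8e-9
i_out = translinear_output(i_a, i_scale, i_b)
print(i_out)  # numerically close to i_a * i_scale / i_b
```

In this idealized model the result does not depend on `I0`, `N_SLOPE`, or `V_T`, provided they are identical for all four devices; mismatch between devices would break that cancellation.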

_{DS} of the transistor is higher than 4·V_{t}, the $e^{-V_{SD}/V_{t}}$ term in Equation (8) can be neglected.
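To gauge the size of this approximation, assume (as is usual for subthreshold models, although the exact form of Equation (8) is an assumption here) that the neglected term enters the drain current through a factor $1 - e^{-V_{SD}/V_{t}}$. Then:

$$1 - e^{-V_{SD}/V_{t}} \;\ge\; 1 - e^{-4} \approx 0.982 \qquad \text{for } V_{SD} \ge 4\,V_{t}$$

Under that assumption, dropping the term changes the current by less than about 2%. With $V_{t} \approx 26$ mV near room temperature, the condition corresponds to a source-drain voltage of roughly 0.1 V.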

_{SG}s in Equation (9), we finally obtain the following relation:

_{SCALE} is set to a fixed value, it is possible to obtain the analog division between the other two inputs, i.e., I_{A}/I_{B}. The relation obtained by joining Equations (7) and (10) is:

_{SCALE} represent the softmax slope and amplitude, respectively.
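A minimal sketch of the ideal behaviour this relation describes, assuming it takes the standard softmax form with slope α and amplitude I_SCALE, i.e. I_SCALE·exp(α·x_i)/Σ_j exp(α·x_j). The function name `softmax_currents` and its defaults are hypothetical; this models the target function only, not the transistor-level circuit.

```python
import math


def softmax_currents(inputs, alpha=1.0, i_scale=10e-9):
    """Ideal softmax with slope alpha and amplitude i_scale (in amperes).

    Assumed form: i_scale * exp(alpha * x_i) / sum_j exp(alpha * x_j).
    """
    scaled = [alpha * x for x in inputs]
    # Subtract the largest exponent for numerical stability;
    # the ratio of exponentials is unchanged by this shift.
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [i_scale * e / total for e in exps]


outputs = softmax_currents([0.0, 0.0])
print(outputs)  # two equal currents, each i_scale / 2
print(sum(softmax_currents([1.5, -0.5, 0.0])))  # outputs sum to i_scale
```

Changing `alpha` steepens or flattens the transition without affecting the full-scale level, while `i_scale` rescales all outputs together.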

## 3. Analog Softmax Circuit Design and Performance

_{DD}) of 500 mV. We selected a current of 10 nA as the nominal full-scale output current, corresponding to the ‘1’ output level of the softmax operation (i.e., 100% probability). As for the number of inputs, which equals the number of outputs, N = 2 was used as the nominal case. The behavior as a function of the full-scale output current and of increasing N was also explored. Softmax transfer characteristics were simulated by sweeping a single normalized input from −5 to 5 while keeping the other one (or the other ones, when N > 2) at 0. The input scale was normalized to obtain a nominal slope α equal to 1, for easy comparison with the theoretical equation.

#### 3.1. Softmax Nominal Operation and Impact of the Full-Scale Output Current and Number of Inputs

_{SCALE} = 10 nA is the same. Our intrinsic softmax proposal features a bell-shaped error, with a peak of 2.2% in the central part, which can be ascribed to an input offset. On the other hand, the error of the topology proposed in [15] shows two peaks, for inputs close to −2.5 and 2.5 (of 0.8% and 1%, respectively). In addition, if we consider the impact of the current-to-voltage converter, there is an additional error contribution in regions I and III. This is ascribed to the upper and lower bounds of the conversion circuit given in Equation (6): only the one in region III can be compensated by appropriate trimming of I_{SCALE} (already done in the figure). Nevertheless, the error remains within 2.2% over the whole range, with an average value of ~1.4% in the investigated operating range (the corresponding value when the input current-to-voltage converter is not considered is <1%).

_{SCALE} varied from 10 nA to 100 nA. This plot highlights that, in our proposed softmax, the error increases only marginally with increasing I_{SCALE}, because the slope parameter is practically independent of I_{SCALE}. This is not the case for the counterpart, where the slope and the output current scale both vary when I_{SCALE} is changed, so that they cannot be optimized independently. For this reason, our proposal shows a lower relative error for a variable output scale (e.g., ~3.4% versus ~6.8% at I_{SCALE} = 100 nA).

_{SCALE} of 10, 25 and 50 nA (with M = 2) is shown. Given that we are considering only two inputs, the softmax probability for each of them is 50% when their values are the same (i.e., zero in this example). Figure 5b shows a similar plot, but for a fixed I_{SCALE} of 10 nA and for M = 2, 5 and 10. In this case too, only one input is swept, while all the other inputs are held at 0. The softmax probability equals 1/M when the values of all inputs are the same.
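The 1/M level follows directly from the softmax definition. Assuming the ideal form with slope $\alpha$ and all $M$ inputs equal to the same value $x$ (the symbol $I_{OUT,i}$ for the i-th output current is introduced here for illustration):

$$\frac{I_{OUT,i}}{I_{SCALE}} = \frac{e^{\alpha x}}{\sum_{j=1}^{M} e^{\alpha x}} = \frac{e^{\alpha x}}{M\,e^{\alpha x}} = \frac{1}{M}$$

This gives normalized outputs of 0.5, 0.2 and 0.1 for M = 2, 5 and 10, independently of $\alpha$ and of the common input value.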

_{SCALE} and for different values of M, resulting in 3.41 µs, 1.66 µs, and 1.39 µs for I_{SCALE} of 10, 25, and 50 nA, respectively, while no significant dependence on the number of inputs was observed.

#### 3.2. Impact of Voltage and Temperature on the Softmax Slope

_{SCALE} current, the original property of our softmax circuit is the electrical adjustability of the slope α by varying V_{DD} (see Equation (11)). This property can be exploited when temperature variations are considered, given that temperature and voltage affect the softmax characteristics in a similar way. This can be observed in Equation (11), where the term α shows a similar dependence on the voltage and temperature parameters.

_{DD} variations, as shown in Figure 6b, where V_{DD} is varied from 700 mV down to 400 mV. The similarity between the impact of V_{DD} and thermal voltage (and temperature) variations is consistent with the analytical model in Equation (11). The proposed softmax circuit exhibits different voltage sensitivity in different voltage ranges. More precisely, the voltage sensitivity is higher for lower V_{DD} values: the slope exhibits a variation of 45.19% from 400 mV to 500 mV, while a variation of 28.14% occurs for a V_{DD} variation from 700 mV to 800 mV.

_{DD} variations, it is possible to easily implement a correction at circuit level to obtain an almost constant softmax slope, for example, through an external circuit that lowers V_{DD} as the temperature rises. This concept is also shown in Figure 6c, where we calculated the V_{DD} needed to keep the same softmax slope as the temperature changes. This flexibility allows our circuit to achieve a lower temperature sensitivity than the one proposed in [15], as highlighted in Figure 6d, where a linear V_{DD}–temperature correction is implemented, i.e., V_{DD} = 500 mV + (27 − T) × 2.064 mV/°C (where T is expressed in °C).
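The linear correction above can be evaluated with a short Python sketch. The function name `corrected_vdd_mv` and the sample temperatures (0, 27 and 85 °C) are illustrative choices, not values taken from the paper; only the 500 mV nominal supply, the 27 °C reference and the 2.064 mV/°C coefficient come from the text.

```python
def corrected_vdd_mv(temp_c, vdd_nominal_mv=500.0, t_nominal_c=27.0,
                     slope_mv_per_c=2.064):
    """Linear V_DD-temperature correction, returned in millivolts.

    Implements V_DD = 500 mV + (27 - T) * 2.064 mV/°C with T in °C.
    """
    return vdd_nominal_mv + (t_nominal_c - temp_c) * slope_mv_per_c


# Sample temperatures chosen only for illustration.
for temp_c in (0.0, 27.0, 85.0):
    print(f"T = {temp_c:5.1f} °C -> V_DD = {corrected_vdd_mv(temp_c):.1f} mV")
```

The supply is raised below 27 °C and lowered above it, which is the direction needed to offset the slope change caused by the thermal voltage.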

#### 3.3. Mismatch and Process Variations

#### 3.4. Area and Power Consumption

^{3} or 53 × 10^{3} µm^{2} (requiring 22, 190 or 10,900 transistors) for M = 2, 10 or 100, respectively. Figure 9b shows the power consumption as a function of the output scale (for several values of M). It can be observed that the power consumption of our proposal depends strongly on the number of inputs, because the conversion blocks are the most power-hungry circuits, while I_{SCALE} has a lower impact. A two-input design operated with V_{DD} = 500 mV and I_{SCALE} = 10 nA shows an average power consumption of only 431 nW, of which almost 65% is dissipated by the input current-to-voltage conversion (280 nW). For a ten-input/ten-output case, the power increases to 3 µW for I_{SCALE} = 10 nA, or to 3.55 µW for I_{SCALE} = 100 nA.
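The reported figures can be cross-checked with a few lines of Python. The expression M·(M + 9) used below is only a quadratic fit that happens to reproduce the three transistor counts quoted above; it is an inference from those data points, not a formula stated by the authors, and the names are hypothetical.

```python
# Transistor counts reported in the text for M = 2, 10 and 100.
reported_counts = {2: 22, 10: 190, 100: 10_900}


def fitted_transistor_count(m):
    """Quadratic fit M * (M + 9), inferred from the three reported points."""
    return m * (m + 9)


for m, count in reported_counts.items():
    print(m, count, fitted_transistor_count(m))

# Share of the two-input power budget (431 nW) spent in the input
# current-to-voltage conversion (280 nW).
conversion_share = 280.0 / 431.0
print(f"{conversion_share:.1%}")
```

The quadratic growth of the fit is consistent with the structure in Figure 2, where the exponential blocks and the divider are replicated for every output.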

#### 3.5. Impact of the Technology Node Scaling

_{SCALE} being adjusted to match the upper part, a worsened matching in the linear region can be observed, resulting in a higher relative error, with a peak value close to 6.5%. This can still be acceptable, since simple DNNs can operate with a reduced equivalent number of bits [3].

## 4. Conclusions

^{2} and consumes 3 µW when operated at V_{DD} = 500 mV for an output scaling current of 10 nA, making it a very attractive option compared to its digital counterparts. These improvements are achieved with limited precision degradation: the maximum and average relative errors with respect to the theoretical softmax equation are only 2.2% and 0.9%, respectively.

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Sarpeshkar, R. Analog Versus Digital: Extrapolating from Electronics to Neurobiology. Neural Comput. **1998**, 10, 1601–1638.
2. Haensch, W.; Gokmen, T.; Puri, R. The Next Generation of Deep Learning Hardware: Analog Computing. Proc. IEEE **2018**, 107, 108–122.
3. Paliy, M.; Strangio, S.; Ruiu, P.; Rizzo, T.; Iannaccone, G. Analog Vector-Matrix Multiplier Based on Programmable Current Mirrors for Neural Network Integrated Circuits. IEEE Access **2020**, 8, 203525–203537.
4. Danial, L.; Pikhay, E.; Herbelin, E.; Wainstein, N.; Gupta, V.; Wald, N.; Roizin, Y.; Daniel, R.; Kvatinsky, S. Two-terminal floating-gate transistors with a low-power memristive operation mode for analogue neuromorphic computing. Nat. Electron. **2019**, 2, 596–605.
5. Veire, L.V.; De Boom, C.; De Bie, T. Sigmoidal NMFD: Convolutional NMF with Saturating Activations for Drum Mixture Decomposition. Electronics **2021**, 10, 284.
6. Xing, S.; Wu, C. Implementation of A Neuron Using Sigmoid Activation Function with CMOS. In Proceedings of the 2020 IEEE 5th International Conference on Integrated Circuits and Microsystems (ICICM), Nanjing, China, 23–25 October 2020; pp. 201–204.
7. Shamsi, J.; Amirsoleimani, A.; Mirzakuchaki, S.; Ahmade, A.; Alirezaee, S.; Ahmadi, M. Hyperbolic tangent passive resistive-type neuron. In Proceedings of the 2015 IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal, 24–27 May 2015; pp. 581–584.
8. Fan, D.; Shim, Y.; Raghunathan, A.; Roy, K. STT-SNN: A Spin-Transfer-Torque Based Soft-Limiting Non-Linear Neuron for Low-Power Artificial Neural Networks. IEEE Trans. Nanotechnol. **2015**, 14, 1013–1023.
9. Valle, M. Analog VLSI Implementation of Artificial Neural Networks with Supervised On-Chip Learning. Analog. Integr. Circuits Signal Process. **2002**, 33, 263–287.
10. Ghomi, A.; Dolatshahi, M. Design of a new CMOS Low-Power Analogue Neuron. IETE J. Res. **2017**, 64, 67–75.
11. Joubert, A.; Belhadj, B.; Temam, O.; Héliot, R. Hardware spiking neurons design: Analog or digital? In Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), Brisbane, QLD, Australia, 10–15 June 2012; pp. 1–5.
12. Khodabandehloo, G.; MirHassani, M.; Ahmadi, M. Analog Implementation of a Novel Resistive-Type Sigmoidal Neuron. IEEE Trans. Very Large Scale Integr. Syst. **2011**, 20, 750–754.
13. Koosh, V.F.; Goodman, R. VLSI neural network with digital weights and analog multipliers. In Proceedings of the ISCAS 2001. The 2001 IEEE International Symposium on Circuits and Systems (Cat. No.01CH37196), Sydney, NSW, Australia, 6–9 May 2001; Volume 2, pp. 233–236.
14. Koosh, V.; Goodman, R. Analog VLSI neural network with digital perturbative learning. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. **2002**, 49, 359–368.
15. Elfadel, I.M.; Wyatt, J.L. The “Softmax” nonlinearity: Derivation using statistical mechanics and useful properties as a multiterminal analog circuit element. In Proceedings of the 6th International Conference on Neural Information Processing Systems (NIPS’93), Denver, CO, USA, 1 January 1993; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1993; pp. 882–887.
16. Zunino, R.; Gastaldo, P. Analog implementation of the SoftMax function. In Proceedings of the 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat. No.02CH37353), Phoenix-Scottsdale, AZ, USA, 26–29 May 2002; pp. II.117–II.120.
17. Mohammed, A.A.; Umaashankar, V. Effectiveness of Hierarchical Softmax in Large Scale Classification Tasks. In Proceedings of the 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 19–22 September 2018; pp. 1090–1094.
18. Kouretas, I.; Paliouras, V. Hardware Implementation of a Softmax-Like Function for Deep Learning. Technologies **2020**, 8, 46.
19. Li, Z.; Li, H.; Jiang, X.; Chen, B.; Zhang, Y.; Du, G. Efficient FPGA Implementation of Softmax Function for DNN Applications. In Proceedings of the 2018 12th IEEE International Conference on Anti-Counterfeiting, Security, and Identification (ASID), Xiamen, China, 9–11 November 2018; pp. 212–216.
20. Dong, X.; Zhu, X.; Ma, D. Hardware Implementation of Softmax Function Based on Piecewise LUT. In Proceedings of the 2019 IEEE International Workshop on Future Computing (IWOFC), Hangzhou, China, 14–15 December 2019; pp. 1–3.
21. Kagalkar, A.; Raghuram, S. CORDIC Based Implementation of the Softmax Activation Function. In Proceedings of the 2020 24th International Symposium on VLSI Design and Test (VDAT), Bhubaneswar, India, 23–25 July 2020; pp. 1–4.
22. Alabassy, B.; Safar, M.; El-Kharashi, M.W. A High-Accuracy Implementation for Softmax Layer in Deep Neural Networks. In Proceedings of the 2020 15th Design & Technology of Integrated Systems in Nanoscale Era (DTIS), Marrakech, Morocco, 1–3 April 2020; pp. 1–6.
23. Serrano-Gotarredona, T.; Linares-Barranco, B.; Andreou, A.G. A general translinear principle for subthreshold MOS transistors. IEEE Trans. Circuits Syst. I Regul. Pap. **1999**, 46, 607–616.
24. Al-Absi, M.A.; Hussein, A.; Abuelma’Atti, M.T. A Novel Current-Mode Ultra Low Power Analog CMOS Four Quadrant Multiplier. In Proceedings of the 2012 International Conference on Computer and Communication Engineering (ICCCE), Kuala Lumpur, Malaysia, 3–5 July 2012; pp. 13–17.

**Figure 1.** Block diagram of an artificial neuron: it takes the weighted sum of N inputs and passes the result x_{(i)} through a nonlinear activation function f_{NL} to produce the elaborated output.

**Figure 2.** Softmax diagram, composed of M conversion blocks, M + 1 exponentials, and one analog divider. Exponential blocks and the analog divider must be replicated to produce the other outputs.

**Figure 3.** Transistor-level schematics of (**a**) the input conversion block (current-to-voltage linear conversion and exponential conversion) and (**b**) the analog divider block.

**Figure 4.** (**a**) Simulated transfer function of the proposed softmax design and theoretical analytical model (M = 2). The simulated input signals have been arbitrarily normalized to obtain a softmax slope α = 1, while the output has been normalized to the output full scale (10 nA in this plot). (**b**) Relative error of the proposed softmax design and of the one proposed in [15]. The error of our proposal is shown with and without the impact of the input current-to-voltage linear converter. (**c**) Impact of I_{SCALE} on the error averaged over the (−5, 5) input range.

**Figure 5.** Softmax transfer characteristics (**a**) at I_{SCALE} = 10, 25 and 50 nA with M = 2 and (**b**) for different numbers of inputs M at I_{SCALE} = 10 nA.

**Figure 6.** Softmax circuit transfer characteristics (**a**) at different temperatures and (**b**) at different V_{DD}. (**c**) V_{DD} required to keep the softmax slope constant at different temperatures, with linear interpolation. (**d**) Softmax slope as a function of temperature for the proposed softmax circuit (with the linear correction given in (**c**)) and for the one proposed by Elfadel et al. in [15].

**Figure 7.** Impact of (**a**) mismatch and (**b**) process variations on the softmax transfer characteristics (two inputs, I_{SCALE} = 10 nA) for 100 MC runs.

**Figure 8.** Softmax transfer-characteristic parameters extracted for 1000 MC runs (two inputs, I_{SCALE} = 10 nA). Histograms of (**a**) slope, (**b**) amplitude, and (**c**) offset error extracted for mismatch and process variability simulations.

**Figure 9.** (**a**) Area overhead as a function of the number of inputs and outputs (M) of the softmax, assuming an implementation in a 180 nm CMOS technology node. The required number of transistors is also reported for some conditions. (**b**) Power consumption as a function of I_{SCALE} for different numbers of inputs. V_{DD} = 500 mV.

**Figure 10.** (**a**) Transfer characteristics and (**b**) corresponding relative error of the softmax circuit in three technology nodes (180 nm, 65 nm, 40 nm).


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Vatalaro, M.; Moposita, T.; Strangio, S.; Trojman, L.; Vladimirescu, A.; Lanuzza, M.; Crupi, F.
A Low-Voltage, Low-Power Reconfigurable Current-Mode Softmax Circuit for Analog Neural Networks. *Electronics* **2021**, *10*, 1004.
https://doi.org/10.3390/electronics10091004
