Analog Gaussian Function Circuit: Architectures, Operating Principles and Applications

This review paper explores existing architectures, operating principles, performance metrics and applications of analog Gaussian function circuits. Architectures based on the translinear principle, the bulk-controlled approach, the floating gate approach, the use of multiple differential pairs, compositions of different fundamental blocks and others are considered. Applications involving analog implementations of Machine Learning algorithms, neuromorphic circuits, smart sensor systems and fuzzy/neuro-fuzzy systems are discussed, focusing on the role of the Gaussian function circuit. Finally, a general discussion and concluding remarks are provided.


Introduction
Since the first Gaussian function circuit (also called the Bump circuit) was introduced by Delbruck (Delbruck's Simple Bump) in 1991 [1,2] (shown in Figure 1), many research groups have focused on improving this architecture and/or incorporating it in various fields [3-14]. Some of the implementations are used for Machine Learning (ML) applications [15,16], neuromorphic systems [17,18], smart sensors [19,20] and fuzzy or neuro-fuzzy systems [21,22]. Gaussian function circuits are analog circuits whose output current is a Gaussian function [23] or a bell-shaped curve [24] of the input. The three main characteristics of a Gaussian function curve, shown in Figure 2, are the height (amplitude), the mean value (center) and the variance (width) [1,2]. Depending on the architecture or the application, a Gaussian function circuit can operate in the sub-threshold [25] or the strong inversion region [26].
The first Gaussian function circuit (Simple Bump) is used for computing the similarity of analog voltages (or, more generally, the distance between input signals) [1,2]. The Simple Bump circuit consists of two sub-circuits, a current correlator and a simple differential pair, as shown in Figure 3. The (non-symmetric) current correlator is a compact circuit consisting of four PMOS transistors ($M_{p1}$ to $M_{p4}$), shown in Figure 1. When all of its transistors operate in the sub-threshold region, the current correlator measures the similarity between two input signals (the two input currents $I_1$ and $I_2$). If one of the transistors operates above the threshold, the correlation is no longer commutative, which leads to a more complicated mathematical model. Normally, the output current $I_{out}$ is a self-normalized correlation of the two input currents that resembles the Gaussian function, based on Equation (1), shown in Figure 4. The output current of Delbruck's Simple Bump is given by Equation (1), where $I_{bias}$ and $V_{mean}$ are the bias current and the voltage parameter, controlling the height and the mean value of the Gaussian function curve, respectively, κ is the slope factor and $V_{in}$ is the input voltage. The quantity $S$ is an important circuit parameter for the simple current correlator, given by:
$$S = \frac{(W/L)_{3,4}}{(W/L)_{1,2}}. \qquad (2)$$
In the case of the Simple Bump circuit, the height and the mean value are independently tuned via the circuit's parameters, while the deviation is altered by the effective W/L ratio (via the transistors' dimensions). We note that a primary aim in the design flow of most Gaussian function circuits is the independent and electronic tunability of the Gaussian curve's characteristics (height, mean value and variance) [27-29]. By designing a fully tunable architecture, the Gaussian function circuit can be used as a general purpose circuit in multiple applications [22].
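To make the roles of the bias current, the mean voltage and the ratio S more concrete, the following minimal Python sketch evaluates a bump-shaped output current. It assumes the form commonly quoted for Delbruck's bump circuit, I_out = I_bias / (1 + (4/S) cosh²(κ(V_in − V_mean)/(2V_T))), which may differ in notation from Equation (1); the function name and the default values of κ and V_T are illustrative assumptions.

```python
import math

def simple_bump_current(v_in, v_mean, i_bias, s, kappa=0.7, v_t=0.0258):
    """Bump-shaped output current (assumed textbook form, not a verified copy of Eq. (1)).

    v_in, v_mean : input and centre voltages in volts
    i_bias       : bias current in amperes (sets the height)
    s            : (W/L) ratio of the correlator transistors (sets height and width)
    kappa, v_t   : slope factor and thermal voltage (assumed typical values)
    """
    x = kappa * (v_in - v_mean) / (2.0 * v_t)
    return i_bias / (1.0 + (4.0 / s) * math.cosh(x) ** 2)

if __name__ == "__main__":
    # Sweep the input around a 0.9 V centre with a 10 nA bias and S = 4.
    for dv_mv in (-200, -100, 0, 100, 200):
        i_out = simple_bump_current(0.9 + dv_mv * 1e-3, 0.9, 10e-9, s=4.0)
        print(f"dV = {dv_mv:+4d} mV -> I_out = {i_out * 1e9:.3f} nA")
```

In this model the peak value is I_bias / (1 + 4/S), so a larger S raises and widens the curve, which is consistent with the statement that the width is set by transistor dimensions rather than by an electronic control.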
On the other hand, a design lacking electronic tunability is focused on a specific application [9].
In order to properly present the existing Gaussian function circuit implementations, we divide them into five main categories based on their operating principles:
• Architectures based on the translinear principle, which use absoluters, squarers, current to voltage (I-V) converters, exponentiators and compensators as building blocks;
• Bulk-controlled circuits based on modifications of Delbruck's Simple Bump, which use the body effect in order to tune the variance;
• Circuits including floating-gate transistors, which use floating nodes in direct current and capacitively connected inputs in order to achieve tunability in the characteristics;
• Designs using exclusively differential pairs and current mirrors;
• Implementations that add extra components, for example Operational Transconductance Amplifiers (OTAs), Digital to Analog Converters (DACs), mixed-mode circuits, and so forth, which provide the appropriate tunability in the variance.
These categories are not mutually exclusive, since a few implementations may belong to more than one. Each category groups the circuits that share the most similarities, since an absolute criterion for categorization is difficult to define. Moreover, there are other implementations, which use different design methodologies for the realization of the Gaussian function. Therefore, we add an extra category (other implementations), which consists of the designs that are not part of the previous categories.
The requirements placed on Gaussian function circuits range from low power and area efficiency to high-speed computation and high accuracy, since each application has its own limitations [30,31]. For example, in the case of a wearable classification application, the design of a low-power and area efficient Gaussian function circuit is necessary, because its realization consists of many cells which have to operate in parallel [30]. On the other hand, for object recognition or fuzzy systems, high accuracy, high speed and real-time computation are needed [31]. The most popular domains and applications of Gaussian function circuits are the following:
• Analog-hardware implementations of ML algorithms, for example Radial Basis Function Neural Networks (RBF NNs), Support Vector Machines (SVMs), the K-means clustering algorithm and so forth;
• Neuromorphic systems, architectures which use physical artificial neurons for computations or design artificial neural systems;
• Smart sensor systems, devices that take input from the physical environment and use built-in computing resources;
• Fuzzy or neuro-fuzzy systems with main applications in controllers and object recognition.
This paper presents an overview of Gaussian function circuits and systems, focusing on integrated implementations. All related architectures, operating principles, design methods and applications are provided, to the best of the authors' knowledge. The rest of the paper is organized as follows: Architectures and operating principles are reviewed in Section 2. System level implementations and applications are summarized in Section 3. Section 4 discusses and summarizes the performance of different Gaussian function circuits. Concluding remarks are drawn in Section 5.

Architectures and Operating Principles
Many Gaussian function circuits have been proposed in an effort to achieve low power consumption, area efficiency and high accuracy, for example [5,9,32,33]. This Section presents and analyzes the existing Gaussian function circuits, and categorizes them based on their operating principle. The five main categories include the architectures based on the translinear principle, the bulk-controlled approach, circuits built with floating-gate transistors, circuits built exclusively with differential pairs and designs combining different fundamental blocks. Additionally, a sixth category is formed of all other types of Gaussian function circuits.

Architectures Based on the Translinear Principle
The translinear principle, introduced by Barry Gilbert in 1975 [34], results in a direct and elegant methodology to analyze and synthesize circuits realizing certain nonlinear mathematical functions, like multiplication, power-law, etc., using exclusively analog circuits [35,36]. The core of such circuits is a closed translinear loop, containing a number of translinear elements (e.g. bipolar and sub-threshold MOS transistors exhibiting an exponential current-voltage relationship). A typical translinear loop contains an even number of only one type of transistors (p-type or n-type).
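As a brief worked example of the principle, the relation below sketches how a closed loop of exponential elements yields a product rule, and how a four-element loop can act as a squarer-divider. It assumes ideal, matched elements with identical pre-exponential current $I_0$; the symbols $I_{\mathrm{in}}$, $I_{\mathrm{ref}}$ and $I_{\mathrm{sq}}$ are illustrative and are not taken from the cited works.

```latex
\begin{align}
  % Voltages around the loop cancel; each element obeys V_k = (V_T/\kappa) \ln(I_k / I_0)
  \sum_{k \in \mathrm{CW}} V_k = \sum_{k \in \mathrm{CCW}} V_k
  \quad &\Longrightarrow \quad
  \prod_{k \in \mathrm{CW}} I_k = \prod_{k \in \mathrm{CCW}} I_k , \\
  % Two clockwise elements carry I_in; the counter-clockwise ones carry I_ref and I_sq
  I_{\mathrm{in}} \cdot I_{\mathrm{in}} = I_{\mathrm{ref}} \cdot I_{\mathrm{sq}}
  \quad &\Longrightarrow \quad
  I_{\mathrm{sq}} = \frac{I_{\mathrm{in}}^{2}}{I_{\mathrm{ref}}} .
\end{align}
```

The requirement of equal numbers of clockwise and counter-clockwise elements is what makes the pre-exponential currents and the factor $V_T/\kappa$ cancel in this idealized picture.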
Typically, this realization of a Gaussian function circuit is done by implementing and combining the absolute value, the square and the exponential function circuits using the translinear principle [35,36]. The design flow (Figure 5) involves three basic building blocks, an absoluter (for example Figures 6 and 7), a squarer (for example Figures 8 and 9) and an exponentiator (for example Figure 10) [25,37-39]. Some of the implementations combine the absoluter with the squarer or use a squarer directly [3,4,40-44]. The exponentiator can be excluded for the generation of the Gaussian function's logarithm [45]. Some implementations use additional components to improve the accuracy of the Gaussian function. Specifically, an I-V converter (Figure 11) [4,40-42] or a transimpedance amplifier [37] is used between the squarer and the exponentiator. Such components are used because typical translinear squarers have a current output while typical exponentiators have a voltage input. Alternatively, in [43] a compensation circuit is added targeting the reduction of the offset caused by the body effect.
Figure 10. A schematic of a simple exponential generation circuit.
The absoluter derives the absolute value of the difference between the input current and the current setting the Gaussian mean value by using both PMOS and NMOS current mirrors. The squaring circuits are typically translinear multipliers (using the same input twice) [35,36]. The multiplication is performed by the translinear loop; the product of the currents flowing in the clockwise elements is equal to the product of the currents of the counter-clockwise elements with exponential characteristic. Some squaring circuits include an absoluter, while others require an external one. Also, some of them include a divider (squarer divider circuit). The exponentiator is generally based on the exponential current to voltage law of a single MOS transistor operating in the sub-threshold region.
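The block chain described above (absoluter, squarer, I-V converter, exponentiator) can be illustrated with a purely behavioral Python sketch. This is an idealized model under stated assumptions, not a reproduction of any cited circuit: the reference current `i_ref`, the conversion resistance `r_conv` and the default κ and V_T values are hypothetical parameters introduced only for illustration.

```python
import math

KAPPA = 0.7    # assumed sub-threshold slope factor
V_T = 0.0258   # assumed thermal voltage in volts

def absoluter(i_in, i_mean):
    """Absolute difference between the input current and the mean-setting current."""
    return abs(i_in - i_mean)

def squarer_divider(i_abs, i_ref):
    """Ideal translinear squarer-divider: i_abs * i_abs = i_ref * i_sq."""
    return i_abs ** 2 / i_ref

def iv_converter(i_sq, r_conv):
    """Ideal current-to-voltage conversion feeding the exponentiator input."""
    return i_sq * r_conv

def exponentiator(v_ctrl, i_bias):
    """Sub-threshold exponential law: the current decays exponentially with v_ctrl."""
    return i_bias * math.exp(-KAPPA * v_ctrl / V_T)

def translinear_gaussian(i_in, i_mean, i_bias, i_ref=100e-9, r_conv=1e6):
    """Cascade of the four ideal blocks; the result is Gaussian in i_in."""
    i_sq = squarer_divider(absoluter(i_in, i_mean), i_ref)
    return exponentiator(iv_converter(i_sq, r_conv), i_bias)

def gaussian_sigma(i_ref=100e-9, r_conv=1e6):
    """Equivalent standard deviation (in amperes) of the modelled curve."""
    return math.sqrt(i_ref * V_T / (2.0 * KAPPA * r_conv))
```

In this sketch the height follows `i_bias`, the center follows `i_mean`, and the width depends on `i_ref` and `r_conv`, which mirrors the independent tunability that the translinear architectures aim for.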
The expression of the output current of a Gaussian function circuit can be a good approximation of the Gaussian function, or an exact realization (at least with the standard simple models). For example, the output current in [25] is given by Equation (3), where $I_{pre}$ is the transistor's pre-exponential current, κ is the slope factor, $V_T$ is the thermal voltage and $I_{bias}$, $I_m$ and $V_{width}$ are the bias current and the current and voltage parameters, controlling the height, the mean value and the variance of the Gaussian function curve, respectively; the expression also includes a generic error term. The value $k_{tri} = \mu_p C_{ox} (W/L)_{tri}$, where $\mu_p$ is the hole mobility, $C_{ox}$ is the oxide capacitance per unit area and $(W/L)_{tri}$ is the effective (W/L) ratio of the exponentiator input transistor. The theoretical output current of the Gaussian circuit, according to (3), is presented in Figure 12.
In Table 1, we summarize the technology used, the minimum operational characteristics (power consumption, power supply, bias current), the operation region and the number of transistors for each implementation. The power consumption of the Gaussian function circuit ranges from 350 nW to 1.534 µW and the power supply ranges from 0.7 V to 3.3 V, with the minimum operational bias current being less than 0.8 µA (except for [44]). All of the architectures are designed to operate in the sub-threshold region, except for [3,44]. The number of transistors is usually high, as the Gaussian function is implemented with multiple stages. It should be noted, though, that the Gaussian function's dimensionality can be increased by adding extra squarers and using the same exponentiator circuit, therefore decreasing the overall number of transistors compared to a fully cascaded implementation. In the case of [4,39], the number of transistors and the presented power consumption refer to the realized Support Vector Regression algorithm [4] or Self Organized Map (SOM) [39], respectively.

Bulk-Controlled Implementations
MOS transistors are four-terminal devices (Gate, Drain, Source, Bulk), in which traditionally the Gate terminal is used as a signal input. Depending on the type of the CMOS technology (for example P-well, N-well, twin-tub), the Bulk terminal is usually connected to either the negative (for NMOS) or the positive (for PMOS) supply voltage or even the related Source terminal (isolating the Bulk from the P-substrate) [46-51]. However, there are cases (PMOS transistors, triple N-well technologies) in which voltage signals are applied to the Bulk terminal directly. By using bulk-driven or bulk-controlled transistors, the threshold voltage limitation is removed from the signal path [46-51]. Therefore, lower power supply voltages and bias currents can be used and hence, using mainly sub-threshold region techniques, the power consumption is decreased. Additionally, the control voltage connected to the Bulk offers a wide-range tunable parameter, directly affecting the transistor's Drain current.
The aforementioned benefits motivated researchers to implement new Gaussian function circuits built with bulk-controlled transistors, which achieve electronic tunability in the Gaussian output curve's variance. The variance tunability is also enhanced by altering Delbruck's Simple Bump [1,2], which consists of a non-symmetric current correlator, shown in Figure 13a, and a simple differential pair. Some of the proposed modifications include a symmetric current correlator [24,32], shown in Figure 13b, a differential difference pair [5,6,17,18,24,52-54], shown in Figure 14, and/or additional transistors added to the standard differential pair [24,27,32,55]; an example is shown in Figure 15. Any combination of current correlators and differential blocks can implement a Gaussian function curve, for example, Figure 16. Moreover, there are researchers who made significant modifications to Delbruck's Simple Bump [17,18,24,52-54]. Specifically, the variable width (VW) Bump, shown in Figure 17 and used in [17,18,52-54], adds multiple current mirrors to the (non-symmetric) current correlator, while [24] combines the symmetric current correlator with current subtractors.
In general, the desired tunability in the Gaussian function curve's variance is achieved by connecting a control voltage to the Bulk node of some of the differential block's transistors. Incorporating a differential difference pair [5,6,17,18,24,52-54] provides increased linearity [56], along with additional output currents. In either case, the control voltage provides sigmoidal shaped curves $I_1$, $I_2$ with adjustable slopes or a symmetric displacement of the currents. The correlation of these currents results in a tunable Gaussian function circuit. The standard current correlator is, by design, not an ideal circuit, having inherent asymmetries in the output current. The output of a symmetric current correlator is the sum of two non-symmetric current correlator cells and therefore reduces such asymmetries. By using this correlator, the Gaussian function is realized more accurately at the cost of increased power consumption and circuit complexity.
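The idea that correlating two sigmoidal currents produces a bump whose width follows the sigmoid slope can be illustrated with the following Python sketch. It is an idealized behavioral model under stated assumptions: the differential stage and the correlator are treated as separate ideal blocks, the parameter `slope` (in 1/V) is a hypothetical stand-in for the effect of the bulk control voltage, and the self-normalized form S·I1·I2/(I1 + I2) is assumed for the correlator.

```python
import math

def differential_currents(dv, i_bias, slope):
    """Two complementary sigmoidal currents that always sum to i_bias."""
    i1 = i_bias / (1.0 + math.exp(-slope * dv))
    return i1, i_bias - i1

def current_correlator(i1, i2, s=1.0):
    """Assumed self-normalized correlation of two currents."""
    total = i1 + i2
    return s * i1 * i2 / total if total > 0.0 else 0.0

def bump(dv, i_bias, slope, s=1.0):
    """Bell-shaped output obtained by correlating the two sigmoidal currents."""
    i1, i2 = differential_currents(dv, i_bias, slope)
    return current_correlator(i1, i2, s)

def full_width_half_max(i_bias, slope, s=1.0, step=1e-5):
    """Numerically estimated full width at half maximum, in volts."""
    half = 0.5 * bump(0.0, i_bias, slope, s)
    dv = 0.0
    while bump(dv, i_bias, slope, s) > half:
        dv += step
    return 2.0 * dv

if __name__ == "__main__":
    for slope in (5.0, 10.0, 20.0):
        width_mv = full_width_half_max(10e-9, slope) * 1e3
        print(f"slope = {slope:4.1f} 1/V -> FWHM = {width_mv:.1f} mV")
```

In this model the width scales inversely with the slope while the peak stays at S·I_bias/4, so changing the slope alone tunes the variance without affecting the height or the center.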

Figure 13. Transistor level implementation of (a) a non-symmetric current correlator and (b) a symmetric current correlator.
Figure 16. An example of a fully tunable, bulk-controlled Gaussian function circuit.
Figure 17. Transistor level implementation of the VW Bump, which has electrical control over the Gaussian function curve's variance.
The analysis of the bulk-controlled designs is based on the MOS model described in [57]. Since all transistors operate in the sub-threshold region, the currents for the PMOS and NMOS devices are given by Equations (4) and (5), respectively. Here, $\kappa_p$ and $\kappa_n$ are the slope factors for PMOS and NMOS transistors, respectively, $V_G$, $V_S$, $V_D$ and $V_w$ are the gate voltage, source voltage, drain voltage and bulk voltage, respectively, $V_T$ is the thermal voltage and $I_{op}$ and $I_{on}$ are the characteristic (pre-exponential) currents for PMOS and NMOS transistors, respectively [57]. Specifically, by using (4) and (5), the output current of [5] is expressed by Equation (6), where the variable $M$ and the parameter $x$ are defined by Equations (7) and (8), respectively. Here, $V_{ss}$ is the lower supply voltage, $V_{in}$ is the input voltage and $I_{bias}$, $V_r$ and $V_c$ are the bias current and the voltage parameters, controlling the height, the mean value and the variance of the Gaussian function curve, respectively. This circuit [5] consists of a non-symmetric current correlator and a differential difference pair. The theoretical output current of the Gaussian circuit, according to (6), is presented in Figure 18.
A summary of each implementation's characteristics is presented in Table 2. This Table includes information regarding the technology used, the minimum operational characteristics (power consumption, power supply, bias current), the operation region and the number of transistors. Regarding the technology, most implementations are in a CMOS process, while two of them are tested with discrete components. In the case of [6,24], PTM transistor models and quad MOS transistor arrays (Advanced Linear Devices) are used, respectively. In general, the power consumption ranges from a few nW to a couple of µW, with the power supply being lower than 1.8 V (except for [6]). Moreover, for [54], the power consumption of the entire system instead of a single bump cell is provided. The consumption of 50 µW is relatively low for the realized application (neuromorphic Spiking NN for Electromyography (EMG) signals). The transistors of all the implementations operate in the sub-threshold region, and therefore the minimum operational bias currents are less than 50 nA (except for [24]). All of the architectures add transistors to achieve independent tunability of the Gaussian function's characteristics. As a result, their number of transistors is higher than that of Delbruck's Simple Bump.

Circuits Built with Floating-Gate Transistors
The floating-gate transistor, also called Floating-gate MOSFET (FGMOS), is a type of metal-oxide-semiconductor field-effect transistor that can hold an electrical charge and is therefore used in memory devices to store data [58-60]. In comparison with a typical MOS transistor, it has an additional electrode (the floating gate) between the gate and the semiconductor. The name floating is derived from this floating gate terminal, which is not connected to a voltage source. As in a typical MOS transistor, all the other terminals (Gate, Drain, Source, Bulk) can be connected to a voltage source. In the case of a high current, electrons become trapped in the floating terminal. Because the floating gate is not connected to anything, it retains the stored charge and hence the stored data.
In all the presented implementations, a classic bump circuit architecture is modified by replacing existing transistors with floating gate ones or by simply adding extra floating gate transistors. There are implementations directly inspired by Delbruck's Simple Bump [1,2], which add an inverse generation stage [61,62], shown in Figure 19, or a folded differential pair [63]. Moreover, a compact design based on Delbruck's non-symmetric current correlator with an integrated differential pair, using floating gate input transistors, is proposed in [64], shown in Figure 20. Similarly, there are architectures inspired by [65], replacing the input transistors with floating gate ones [66-68], depicted in Figure 21. Some designs also incorporate floating gate transistors in architectures following the mathematical approach of the translinear principle. Specifically, [26,69] modify the exponentiator ([26] is shown in Figure 22), [7] creates a squaring circuit using floating gate transistors and [8] enhances a Gilbert multiplier [70] with a floating gate memory cell.
In general, the Gaussian function curve's characteristics are controlled via the floating gate transistor's parameters. In particular, the voltage stored in the floating terminal can be used as an additional parameter. In the case of [61-63], an inverse generation block composed of floating gate transistors is used, shown in Figure 19. This block provides the appropriate input voltages to the variable gain amplifier (VGA). By altering the gain of the VGA, the width tunability is achieved. An architecture which replaces the input transistors of Delbruck's Simple Bump with FGMOS ones is provided in [64]. It uses FGMOS in order to subtract the stored mean value from the gate input and achieves the width tunability by setting the value of the input capacitors. A simple exponential based design is presented in [66-68]. It consists of three MOS and two FGMOS transistors, shown in Figure 21. The height of the Gaussian output curve is controlled via the parameter $V_{GG}$, and References [67,68] achieve the width tunability using extra digital components and signals. References [7,8,26,69] are based on the translinear principle and use FGMOS to reduce the complexity (number of transistors) of typical translinear architectures.
Figure 19. Schematic of a Gaussian function circuit with a floating gate transistor based inverse generation block.
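The way a stored charge and the input capacitors can shift and widen a bump can be sketched in Python with the standard capacitive-divider relation for a floating node, V_fg = (Σ C_i·V_i + Q) / C_total. This is only an illustration under stated assumptions: the sech² bell used for the output is a generic stand-in for an idealized bump, and all function and parameter names are hypothetical rather than taken from the cited designs.

```python
import math

def floating_gate_voltage(inputs, q_stored, c_parasitic=0.0):
    """Floating-node voltage from (capacitance, voltage) pairs and stored charge."""
    c_total = sum(c for c, _ in inputs) + c_parasitic
    return (sum(c * v for c, v in inputs) + q_stored) / c_total

def fg_bump(v_in, v_ref, i_bias, c_in, c_other, q_stored, kappa=0.7, v_t=0.0258):
    """Generic bell-shaped current driven by the floating-gate voltage (assumed shape)."""
    v_fg = floating_gate_voltage([(c_in, v_in)], q_stored, c_parasitic=c_other)
    x = kappa * (v_fg - v_ref) / (2.0 * v_t)
    return i_bias / math.cosh(x) ** 2

def fg_bump_center(v_ref, c_in, c_other, q_stored):
    """Input voltage at which the floating gate sits exactly at v_ref."""
    return ((c_in + c_other) * v_ref - q_stored) / c_in

if __name__ == "__main__":
    # 1 fF input capacitor, 1 fF of other capacitance, 0.1 fC of stored charge.
    center = fg_bump_center(0.5, 1e-15, 1e-15, 1e-16)
    print(f"bump center at V_in = {center:.3f} V")
```

In this model the stored charge moves the center of the curve, while the coupling ratio C_in / C_total attenuates the input and therefore widens the curve, which is in line with setting the width through the input capacitors.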
Figure 20. A floating gate transistor based modification of a bump-antibump circuit.

The output current of a circuit incorporating floating gate transistors depends on the original (without floating gate transistors) architecture. For a design inspired by the Simple Bump (specifically for [63]), the output is given by Equation (9), where the variable γ is defined by Equation (10) and $\beta_1$ is set by Equation (11). Here, κ is the slope factor, $V_T$ is the thermal voltage, $I_{bias}$, $V_{mean}$ and $V_{width}$ are the bias current and the voltage parameters, controlling the height, the mean value and the variance of the Gaussian function curve, respectively, and $V_{in}$ is the input voltage. $I_b$ is a tail current of the core differential pair, $I_0$ is the pre-exponential current, and $C_c$ and $C_d$ are floating gate input capacitances. Gaussian function curves based on Equation (9) are shown in Figure 23.
The characteristics of each implementation are summarized in Table 3. The power consumption for the presented circuits is higher than 90 µW (except for [7]), with a power supply ranging from 0.75 V to 10 V and a minimum operational bias current varying from a couple of nA to a couple of µA. The power consumption of [66] refers to the realized handwritten digit recognition system. The number of transistors depends on the original architecture and the methodology used to achieve the electronic tuning (replacing existing transistors or adding new ones). Regarding the operation regime, there are implementations in both the above-threshold and the sub-threshold regions.

Circuits Built Exclusively with Differential Pairs
Some researchers follow a simpler approach to realize Gaussian function circuits and base their designs on multiple differential pairs. The Gaussian function is formed by adding and subtracting currents using mainly differential pairs and current mirrors, unlike the Delbruck inspired architectures that combine differential pairs with current or voltage correlators. Most of the architectures have the same operating principles. Some of the implementations produce a Gaussian function curve by adding two currents (each generated from a different input voltage) from two differential pairs [10,21,23,71-75]. A characteristic example of such circuits is shown in Figure 24. In [75] an extra resistor is used to determine the height of the Gaussian function curve. Some architectures follow the same principle but use folded cascode differential pairs [76], include multiple mirrors [77,78] or produce more than one Gaussian curve [79]. Furthermore, some designs [9,80] are inspired by Gilbert's Gaussian circuit, shown in Figure 25, which is not fundamentally different from the previous implementations but is based on the Gilbert multiplier [70]; an example is shown in Figure 26.
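The principle of forming a bell-shaped curve by adding currents from two differential pairs can be sketched behaviorally in Python. The model below is an assumption-laden illustration, not the transfer function of any cited circuit: each branch is given a logistic characteristic (as for an ideal sub-threshold pair, whereas above-threshold pairs follow a square law), the two pairs are offset by a hypothetical `half_width` voltage around the mean, and the final subtraction of the bias current is assumed to be done by a current mirror.

```python
import math

def diff_pair_branch(v_plus, v_minus, i_bias, gain):
    """Current of one branch of an idealized differential pair (logistic shape)."""
    return i_bias / (1.0 + math.exp(-gain * (v_plus - v_minus)))

def two_pair_bump(v_in, v_mean, half_width, i_bias, gain):
    """Sum of a rising and a falling branch current, minus the bias current."""
    rising = diff_pair_branch(v_in, v_mean - half_width, i_bias, gain)
    falling = diff_pair_branch(v_mean + half_width, v_in, i_bias, gain)
    return rising + falling - i_bias

if __name__ == "__main__":
    for v in (0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4):
        i_out = two_pair_bump(v, 1.0, 0.1, 1e-6, gain=20.0)
        print(f"V_in = {v:.1f} V -> I_out = {i_out * 1e6:.4f} uA")
```

In this sketch the peak equals I_bias·tanh(gain·half_width/2), so the height follows the bias current while the width depends on the branch gain and the offset between the two pairs.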
The output current of a typical circuit built exclusively with differential pairs (for example [23]) is given by Equation (12), where $\beta_i = K \cdot W_i/L_i$, with $i = 1, 2$ indexing the input transistors, $K$ is a process related constant, $V_{mean}$ and $I_{bias}$ are the voltage and current parameters controlling the mean value and height of the Gaussian curve, respectively, and $V_{in}$ is the input voltage. The variance is controlled via the parameter $\beta_1$. The theoretical output current of the Gaussian circuit, according to (12), is presented in Figure 27.
A summary that includes the technology used, the minimum operational characteristics (power consumption, power supply, bias current), the operation region and the number of transistors for each implementation is presented in Table 4. Regarding the technology used, all the implementations are in a CMOS process, with the exception of [10,76], which are tested with discrete components. All the power consumption values in Table 4 refer to the realized system (except for [23]). The power supply is generally higher than 2 V, with most of the implementations operating mainly in the above-threshold region and the minimum operational bias current being around a few µA. Regarding the number of transistors, most implementations are generally compact. The implementation with the minimum number of transistors (only four) belongs to this category. Moreover, the implementations that are marked with a single star have a simplified schematic in which the bias transistors are replaced with current sources, and therefore the actual number of transistors is higher.
Figure 24. An example of a Gaussian function circuit using only differential pairs.
Each implementation uses the extra components differently, but the general concept is to enhance the operation of a simple Gaussian function core, for example, to provide variance tunability to the circuit. In particular, multiplexers and switches are used to select the appropriate value from multiple parallel outputs in order to achieve the tunability in the variance [11,15,81-85]. In a similar manner, series of resistors alter the Gaussian function output by changing the total resistance value [19,87,88]. Moreover, DACs, multipliers, squarers or tunable current mirrors usually directly affect the height of the Gaussian function [22,28,31,33,87,89,90]. There are implementations that use OTAs as current to voltage converters [91] or deploy three OTAs along with multiple resistors as basic building blocks to design tunable Gaussian function circuits [92-94]. Similarly, CCIIs, exponentiators, additional current correlators or minimum value circuits are used as basic building blocks in [87,90,96,97]. The operational amplifier in [87] is used to bias a BJT transistor in the exponential region, while the sense amplifier in [12] operates as a CMFB, similar to the extra components in [22,95].
In Table 5, we summarize the characteristics of each implementation. The provided power consumption for most of the implementations refers to the realized system and varies from a couple to many mW (except for [88]). For the rest, the power consumption ranges from 13.5 nW to 220 µW. The increased power consumption is reasonable due to the scale of the extra components. The power supply is different for each application, with most implementations operating above the threshold (in the saturation region), and the minimum operational bias current ranges from nA to µA. There are implementations for which the provided power consumption or number of transistors, mentioned in Table 5, refers to the bump circuit core without including all or any of the added components. These implementations are marked accordingly.

Other Implementations
The following implementations do not belong to any of the previously described categories and do not share a distinct design methodology of their own. Nonetheless, they can be collected in a general group. In particular, there are architectures [13,14,20,30] based on Delbruck's Simple Bump, designs [65,98] inspired by Anderson [12] or based on other function generation circuits, like a triangular [29,99], an exponential [16,100] or a Euclidean [101] one. An example for each group is shown in Figures 33-36, respectively. The characteristics of each implementation are summarized in Table 6. Regarding the technology, most architectures are in a CMOS process, except for one [101]. In the case of the power consumption, only three designs provide the appropriate value and two of them refer to the entire system's consumption. The power supply ranges from 1.8 V to 5 V and the minimum operational bias current is less than 1 µA (except for [29]). There are implementations operating in either the sub-threshold or the saturation region, with two of them having transistors that operate in the triode region [13,30]. Regarding the number of transistors, most architectures are generally compact. The implementations that are marked with a single star have simplified the schematic by replacing the bias transistors with current sources or have added resistors or capacitors to their designs.
Figure 35. An example of a Gaussian function circuit based on an exponentiator circuit.

Gaussian Function Circuit Applications
Gaussian function circuits are used as building blocks in various applications and domains. This Section discusses the applications and describes the role of the Gaussian function circuits in system level implementations. Various realizations are presented and categorized in four main fields. These categories are the following: (a) Analog-hardware implementation of ML algorithms; (b) neuromorphic circuits/systems; (c) smart sensor systems; and (d) fuzzy/neuro-fuzzy systems. The use of Gaussian function circuits in (a)-(d) is extensively explained.

Analog-Hardware ML
The volume of available data (words, pictures, videos, etc.) is enormous and its growth shows no sign of slowing down [102,103]. ML promises to derive meaning from all of that data. As an interdisciplinary field, ML shares common threads with the mathematical fields of statistics, information theory, game theory, and optimization [104,105]. ML is a combination of tools and technology that can be used in order to process all these data. Moreover, these automated techniques (algorithms) may be able to find meaningful patterns (or hypotheses) that a human observer may have missed. Traditionally, all these algorithms are implemented in software. However, there is a trend in which hardware-friendly implementations are used in order to realize these algorithms and models [57,106].
There are three different hardware design approaches, each with its own advantages and disadvantages: analog, digital, and mixed-mode implementations. In general, digital circuits for ML applications can achieve high classification accuracy, flexibility, and programmability, but they consume considerable power and area due to the large amount of data transactions and the high operation speed. On the other hand, analog-hardware ML enables low-cost parallelism with low-power computation, but its accuracy is degraded by inaccurate circuit parameters caused by noise and low precision. Several mixed-mode architectures take advantage of both analog and digital implementations, obtaining low power consumption within small areas, but they suffer from domain conversion overhead costs.
There are dedicated analog-hardware architectures for ML algorithms and models that are based on Gaussian function circuits. In Table 7, we summarize some common characteristics of the system-level implementations that are presented along with the Gaussian function circuit. The proposed ML systems are RBF NNs [11,12,14,16,24,28,61,81,94,101] (a general design flow is shown in Figure 37), other NNs such as a Multi-layer Perceptron (MLP) / RBF network (RBFN) [15] or a Gaussian RBF NN (GRBF NN) [87], Support Vector Machine (SVM) [41,62], regression (SVR) [4] or domain description (SVDD) [42] algorithms, pattern-matching classifiers [66,68], vector quantizers [64,99], a Deep ML (DML) engine [45], a similarity evaluation circuit [67] and an SOM [39]. A typical example of an analog-hardware implementation of the SVM algorithm is shown in Figure 38. Gaussian function circuits are used to implement two functions that are useful for many ML algorithms: (a) kernel density computation and (b) distance computation. Most of the applications are designed for fewer than 65 input dimensions, while some do not specify an upper bound [15,24,64,87] and are able to categorize high-definition images. Additionally, the simulation level and the circuit area of the layout and chip implementations (or, for [39], an estimate), where provided, can be found in Table 7.
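As a behavioral illustration of how a Gaussian cell can serve both as a distance measure and as a kernel in an RBF NN, the following minimal Python sketch may help. It models an ideal Gaussian response rather than any specific circuit in Table 7; the function names, the shared width `sigma`, and the assumption that a multi-dimensional kernel is formed as a product of one-dimensional responses are illustrative choices and are not taken from the cited works.

```python
import math

def gaussian_bump(x, mean, sigma, height=1.0):
    """Ideal 1-D Gaussian response: height * exp(-(x - mean)^2 / (2 sigma^2))."""
    return height * math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def rbf_kernel(x, center, sigma):
    """Multi-dimensional kernel built as a product of per-dimension responses.

    Assumption: one ideal Gaussian cell per input dimension. The product of
    1-D Gaussians equals exp(-||x - center||^2 / (2 sigma^2)), so the result
    decays with the squared Euclidean distance between x and center.
    """
    out = 1.0
    for xi, ci in zip(x, center):
        out *= gaussian_bump(xi, ci, sigma)
    return out

def rbf_network(x, centers, weights, sigma):
    """Weighted sum of kernel activations (output layer of an RBF NN)."""
    return sum(w * rbf_kernel(x, c, sigma) for w, c in zip(weights, centers))

# Hypothetical two-centre network evaluated at one input point.
if __name__ == "__main__":
    centers = [[0.0, 0.0], [1.0, 0.0]]
    weights = [2.0, 1.0]
    print(rbf_network([0.0, 0.0], centers, weights, sigma=1.0))
```

The kernel output is largest when the input coincides with the stored center and falls off with distance, which is why the same cell can act as a similarity (distance) measure or as a kernel.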

Neuromorphic Systems
Traditional computing systems based on the von Neumann architecture face many problems related to power efficiency (high power consumption) and memory limitations [102,103]. Indeed, the amount of data to be processed is ever increasing, and new computing paradigms are necessary. To address these problems, an emerging approach that demonstrates promising results in computing hardware is that of neuromorphic systems [107,108]. This design methodology is inspired by synaptic plasticity in the brain, which is capable of in-memory computing and is suitable for multi-valued or analog arithmetic. The basic building blocks for the implementation of neuromorphic systems are analog spike-based circuits and memristors. There are also design flows that are based on Gaussian function circuits; an example is shown in Figure 39.
Neuromorphic computing represents a novel paradigm for non-Turing computation that aims to reproduce aspects of the ongoing dynamics and computational functionality found in biological brains. This endeavor entails an abstraction of the brain's neural architecture that retains an amount of biological fidelity sufficient to reproduce its functionality while disregarding unnecessary detail. Models of neurons, which are considered the computational unit of the brain, can be emulated using electronic circuits or simulated using specialized digital systems. Analog designs offer power and area efficiency, which is necessary for large parallel neuron arrays. On the other hand, their digital counterparts provide reconfigurability (FPGA), Hardware Description Language (HDL) designs that can be ported or scaled from one technology to another, invariance to process, voltage and temperature (PVT) variations, a simpler design of complex functions and fast design of high-level architectures.
The applications, the simulation level of the designs and the use (or not) of memristive devices are summarized in Table 8. A spike-based circuit with a configurable stop-learning feature for always-on online learning applications is presented in [17]. The Bump circuit used (VW Bump) is a necessary building block for the implementation of the Delta rule (a learning algorithm), which compares the rate of the neuron spikes to a target value. A high-accuracy Spiking NN (SNN) based on an error-triggered learning rule with the aforementioned stop-learning capability is explained in [18]. Here, the Bump circuit (VW Bump) is used to indicate the stop-learning condition and/or to drive the weight update mechanism. A stochastic learning rule based on the stochastic nature of memristors is proposed in [52], where the VW Bump is used similarly to [18]. An SNN architecture based on synaptic elements and mixed-mode circuits is realized in [53]. The VW Bump compares the rate of the neuron spikes to a target value and outputs the direction of the weight update. All the previous designs [17,18,52,53] are based on memristive devices, and their simulation results are extracted from the circuit schematic. A mixed-mode neuromorphic processor for the discrimination of EMG signals is presented in [54]. The EMG signals are converted into spikes using a delta encoding scheme. Here, the VW Bump is used for the weight update mechanism, similarly to the previous implementations [17,18,52,53]. This architecture does not include memristive devices, and its performance is verified on a manufactured chip.
Figure 38. A hardware-friendly implementation of the Support Vector Machine algorithm (learning and classification).
Figure 39. An example of a neuromorphic network high-level architecture based on the VW Bump.
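The role of the Bump circuit in the learning rules described above (comparing a spike rate to a target, signalling stop-learning, and giving the direction of the weight update) can be sketched behaviorally as follows. This is a minimal sketch under stated assumptions: the Bump is modeled as an ideal Gaussian similarity, and the names `stop_threshold`, `step` and `rate_of` are hypothetical and are not taken from [17,18,52,53,54].

```python
import math

def bump(x, mean, width):
    """Ideal Gaussian similarity between x and mean (1.0 when they are equal)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * width ** 2))

def weight_update_direction(rate, target, width, stop_threshold=0.5):
    """Return +1 (increase), -1 (decrease) or 0 (stop-learning).

    Assumption: learning stops once the similarity between the measured
    spike rate and the target exceeds stop_threshold; 'width' plays the role
    of the tunable width of a variable-width Bump.
    """
    if bump(rate, target, width) >= stop_threshold:
        return 0
    return 1 if rate < target else -1

def train_weight(weight, rate_of, target, width, step=0.05, max_iters=1000):
    """Repeatedly nudge a single weight until the stop-learning region is hit.

    rate_of is a hypothetical callable mapping a weight to a spike rate.
    """
    for _ in range(max_iters):
        direction = weight_update_direction(rate_of(weight), target, width)
        if direction == 0:
            break
        weight += step * direction
    return weight

# Hypothetical neuron whose spike rate grows linearly with the weight.
if __name__ == "__main__":
    final = train_weight(0.0, lambda w: 100.0 * w, target=50.0, width=5.0)
    print(final)
```

A wider Bump enlarges the stop-learning region, so training halts earlier and tolerates a larger residual error; a narrower Bump enforces a tighter match at the cost of more updates.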

Smart Sensor Systems
A typical sensor is a device, sub-system or machine that detects changes in an environment and sends this information to a related electronic system or a simple processor, which derives meaning from these data (signal detection, signal processing, data validation, etc.) [109]. In the case of a smart sensor system, multiple sensors are included [110,111]. Their operating properties can be set by an embedded microprocessor. All smart sensors have four main functions: measurement, configuration, verification and communication. This means that, apart from a microprocessor, it is necessary to include a wireless communication system. There is therefore a need for application-specific integrated circuits that will be part of smart sensor systems. Apart from the analog front-end, which consists of analog signal conditioning circuitry, there is a need for circuits that will be used for back-end acceleration [103].
Smart sensor systems are based on analog signal processing because they collect real-time data from the environment. Analog signals are easier to process, are well suited to audio and video transmission, have a higher density and can present more refined information. Additionally, they provide a more accurate representation of changes in physical phenomena, such as sound, light, temperature, position, or pressure. Their drawbacks are that they are prone to generation loss and are subject to noise and distortion, in contrast to digital signals, which have much higher noise immunity; as a result, analog signals are generally of lower quality than digital ones. Digital circuits, in turn, require DAC/ADC converters and suffer from round-off noise due to quantization.
Gaussian function circuits are used as a building block for the implementation of the detector in smart sensor systems. Specifically, to the best of the authors' knowledge, two exemplary implementations exist in the literature [19,20], and their characteristics are summarized in Table 9. The first [19] is a mixed-mode real-time anomaly detection system for sensor stream statistics, shown in Figure 40. This system can operate with any type of sensor without the need for pre-training on the sensor's data. The probability density function (PDF) learner, which is a part of the implemented system, consists of parallel-connected Gaussian function circuits that realize the kernel-density-based statistics estimation. The second [20] is a fully analog edge detection circuit directly integrated with a photodiode, shown in Figure 41. The edge detection is performed on the analog output of the photodiode, which greatly reduces the power consumption and the need for data transfer.
Here, the output of the active pixel sensor (APS) is directed to the Gaussian function circuit, which provides a high output current if the pixel is an edge and a low output current otherwise. In contrast to [19], this implementation [20] is tested on a fabricated chip with an area per pixel of 225 µm².
Table 9. Smart Sensor Systems Summary.
                  [19]                 [20]
Application       Anomaly detection    Edge detection
Type of sensor    General              Photodiode
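The kernel-density-based estimation performed by the parallel Gaussian function circuits of the PDF learner can be illustrated with a short behavioral sketch. It assumes ideal, normalized Gaussian kernels centered on previously observed sensor values and a fixed density threshold; how [19] stores or updates its kernels online is not modeled, and the names `samples`, `sigma` and `density_threshold` are hypothetical.

```python
import math

def gaussian_kernel(x, center, sigma):
    """Normalized 1-D Gaussian kernel centered on a stored sample."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2)) / norm

def kde(x, samples, sigma):
    """Kernel density estimate: average of one kernel per stored sample.

    In a parallel analog realization each kernel would be one Gaussian cell
    and the sum would be a current summation at a common node (assumption).
    """
    return sum(gaussian_kernel(x, s, sigma) for s in samples) / len(samples)

def is_anomaly(x, samples, sigma, density_threshold):
    """Flag x as anomalous when the estimated density falls below the threshold."""
    return kde(x, samples, sigma) < density_threshold

# Hypothetical sensor readings clustered around 1.0.
if __name__ == "__main__":
    readings = [1.0, 1.1, 0.9, 1.05, 0.95]
    print(is_anomaly(1.0, readings, sigma=0.1, density_threshold=0.1))
    print(is_anomaly(2.0, readings, sigma=0.1, density_threshold=0.1))
```

A reading close to the previously observed values receives a high estimated density, whereas a reading far from all stored values receives a density near zero and is flagged.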

Fuzzy and Neuro-Fuzzy Systems
Fuzzy systems are based on fuzzy logic, which provides the theory for modeling real-world phenomena that are inherently vague and ambiguous [112]. This theory provides the tools (fuzzy techniques) for their processing and mathematical representation. A neuro-fuzzy system is based on a fuzzy system that is trained by a learning algorithm derived from NN theory [113,114]. Therefore, a neuro-fuzzy system can be represented as a special multilayer feedforward NN and can be used as a universal approximator. Moreover, it can be interpreted as a system of fuzzy rules. It also uses fuzzy logic criteria for increasing the size of an NN. NNs are used to tune the membership functions of fuzzy systems that are employed as decision-making systems for controlling equipment.
Apart from software-based implementations, there are many realizations of membership functions that are based on Gaussian function circuits. The fuzzy/neuro-fuzzy systems realized with these membership functions are categorized according to the application example. Most of them are mixed-mode implementations that take advantage of both analog and digital circuits. The existing categories include controllers [72,74,79,80,85,88], object recognition inference [82][83][84] or neural perception [31] engines or processors [22], function approximators [71,75] and a min-max network [21]. The presented controllers are hardware-friendly implementations based on classic fuzzy control theory; an example is shown in Figure 42. Specifically, References [74,79,80] design a Takagi-Sugeno based controller and [72] realizes a Type-2 fuzzy controller. The object recognition applications [22,31,82,83,84] are based on neuro-fuzzy logic; an example is shown in Figure 43. In particular, a fuzzy system pre-processes the input data for the following perceptron (a single NN layer with an activation function). Both function approximators [71,75] combine different membership functions to produce more complex nonlinear functions. The min-max network [21] uses the Gaussian function circuits in the fuzzification block prior to the min-max operators. In Table 10, we summarize the category, the complexity, the simulation level and the area of each implementation. The fuzzy rule complexity ranges from 4 to 50 rules. A system with 50 rules is considered a high-complexity system [113,114]. Almost all of the designs are tested on fabricated chips, except for [21], and the chip area varies from 0.08 mm² to 50 mm².
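To make the role of Gaussian membership functions in a Takagi-Sugeno style controller concrete, a minimal single-input, zero-order inference sketch is given below. It is a textbook-style formulation under stated assumptions (one input, constant rule consequents, weighted-average defuzzification) and does not reproduce the specific controllers of [74,79,80]; the rule values are hypothetical.

```python
import math

def gaussian_membership(x, center, sigma):
    """Degree of membership in a fuzzy set with a Gaussian shape (0 to 1)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def takagi_sugeno(x, rules):
    """Zero-order Takagi-Sugeno inference for a single input.

    rules is a list of (center, sigma, consequent) tuples. Each rule fires
    with a strength given by its Gaussian membership, and the output is the
    firing-strength-weighted average of the constant consequents.
    """
    strengths = [gaussian_membership(x, c, s) for c, s, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    weighted = sum(w * y for w, (_, _, y) in zip(strengths, rules))
    return weighted / total

# Hypothetical two-rule controller: "low" maps to -1, "high" maps to +1.
if __name__ == "__main__":
    rules = [(0.0, 1.0, -1.0), (2.0, 1.0, 1.0)]
    for x in (0.0, 1.0, 2.0):
        print(x, takagi_sugeno(x, rules))
```

In a hardware realization along these lines, each membership evaluation would correspond to one Gaussian function circuit, while the normalization and weighted sum would be handled by additional blocks.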

Summary and Discussion
Throughout the years, there have been many different analog implementations of the Gaussian function, using various design techniques. These circuits are implemented targeting specific characteristics, for example low power consumption, area efficiency, high computation speed, better tunability or increased similarity with the theoretical response. Unfortunately, reliable data are available only for the power consumption, the number of transistors and the minimum operational bias current; the remaining characteristics are not reported by most of the research teams. Based on Section 2, a summary containing five architectures for each characteristic is presented in Tables 11-13.
Table 11 includes the implementations with the lowest power consumption. The power consumption ranges from 3.3 nW to 6 nW using bulk-controlled transistors [5,27,32,55], except for [95], which adds a CMFB circuit and consumes 18.9 nW. They are all compact implementations, consisting of 10 to 14 transistors with a power supply of only 0.6 V [5,27,32,55] (except [95]), and their transistors operate in the sub-threshold region. All five designs are also fully electronically tunable.
Table 12 includes the implementations with the smallest number of transistors. The number of transistors ranges from 4 to 8, but without tunability of the Gaussian curve's characteristics, except for [64]. These compact implementations are designed using floating-gate transistors [26,64], differential pairs [9] or other design techniques [20,100], with a power supply ranging from 1.3 V to 5 V. The transistors of [20,100] operate in the above-threshold region, while those of [64] operate in the sub-threshold region. The design of [26] can operate in either the sub-threshold or the above-threshold region, and [9] has transistors operating in both regions.
Table 13 includes the implementations with the smallest minimum operational bias current. These bias currents vary from 1 nA to 3 nA, and the designs are fully electronically tunable. Moreover, small bias currents are directly related to the operation of transistors in the sub-threshold region. The number of transistors is relatively low (9 to 14). However, in the case of [95,97], the schematic is simplified by replacing current mirrors with current sources.
Architectures based on the translinear principle have both advantages and drawbacks. Translinear circuits offer high-frequency operation, high parameter tunability, low supply voltage, low power consumption, low noise, low third-order intermodulation distortion, low total harmonic distortion, immunity to body effects, extended dynamic range, compactness, design modularity, and low circuit complexity. Although translinear circuits efficiently realize many analog nonlinear signal processing functions with a small number of MOS transistors, the implementation of the Gaussian function curve requires three separate components (absoluter, squarer and exponentiator circuits). Therefore, the number of transistors is higher than in other architectures. The trade-off lies between the accuracy in the realization of the Gaussian function curve and the architecture's complexity. Circuits based on the translinear principle offer Gaussian function curves that are closer to the theoretical Gaussian function, since they implement the exact mathematical equations, at the expense of extra transistors.
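The three-block composition mentioned above (absoluter, squarer and exponentiator) can be sketched as a chain of ideal current-mode functions. This is only a behavioral sketch: the normalization by a reference current `i_ref`, the sign convention of the exponent and all numerical values are assumptions made for illustration and do not correspond to a particular translinear design.

```python
import math

def absoluter(i_in):
    """Ideal absolute-value block: |i_in|."""
    return abs(i_in)

def squarer(i_in, i_ref):
    """Ideal squarer: i_in^2 / i_ref, so that the output is again a current."""
    return i_in ** 2 / i_ref

def exponentiator(i_in, i_ref, i_bias):
    """Ideal exponentiator with a negative exponent: i_bias * exp(-i_in / i_ref)."""
    return i_bias * math.exp(-i_in / i_ref)

def translinear_gaussian(i_in, i_mean, i_ref, i_bias):
    """Chain the three blocks to obtain i_bias * exp(-(i_in - i_mean)^2 / i_ref^2).

    With this (assumed) scaling, i_bias sets the height, i_mean the center
    and i_ref the width, which corresponds to sigma = i_ref / sqrt(2).
    """
    distance = absoluter(i_in - i_mean)
    return exponentiator(squarer(distance, i_ref), i_ref, i_bias)

# Hypothetical currents in amperes.
if __name__ == "__main__":
    for i_in in (5e-9, 10e-9, 15e-9):
        print(i_in, translinear_gaussian(i_in, i_mean=10e-9, i_ref=5e-9, i_bias=20e-9))
```

In this idealized model the absolute value is mathematically redundant before squaring; it is kept only to mirror the three separate blocks named in the text.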
Bulk-controlled designs reduce the need for extra transistors by using the fourth terminal (the bulk terminal) to offer the desired tunability. The bulk-controlled transistor also alleviates threshold voltage limitations, and the whole topology is biased more easily with a lower power supply (based on sub-threshold region techniques). An additional advantage of connecting a parameter voltage to the bulk terminal of differential pairs is that the bulks are no longer connected to the power supply rails, which reduces possible supply noise. Consequently, this type of circuit has a better power supply rejection ratio. Possible drawbacks include higher leakage currents, lower computation speed, an approximate realization of the Gaussian function and the need for a triple-well technology.
Circuits built with floating-gate transistors use an extra terminal, just like bulk-controlled architectures. Apart from the desired tunability, this terminal also provides a non-volatile data storage capability. As a result, these implementations are relatively compact. However, FGMOS implementations require a high power supply voltage, which leads to higher power consumption. Moreover, the incorporation of FGMOS presents challenges in terms of IC fabrication.
Gaussian function circuits based on differential pairs are compact and modular. However, they operate at low speeds and have limited parameter tunability, a high supply voltage and high power consumption. These designs are used as a simple solution for realizing a Gaussian function curve. Architectures using extra components achieve higher tunability at the cost of higher complexity (area) and power consumption.
Despite the numerous works in the literature on the implementations and applications of Gaussian function circuits, analog realizations have not yet been established in commercial or real-world applications to the same extent as digital or software-based ones. Therefore, to further motivate researchers to develop new analog realizations, development should focus on the advantages of analog-hardware implementations. New computing paradigms should lead to a new domain of smart industry based on low power consumption, area efficiency, high computation speed and parallelization. In this way, analog accelerators should gain popularity and establish themselves as a stable counterpart to digital ones. In this case, hardware implementations (both analog and digital) will gain a new role in the artificial intelligence domain, and new demanding applications will be developed in the future.

Conclusions
This paper has provided a review of the architectures, operating principles and applications of Gaussian function circuits. Furthermore, a number of the commonly used design architectures for current correlators, differential blocks, current-mode circuits, analog computational circuits and sub-threshold region methods have been discussed in detail, together with possible trade-offs. In the context of current applications, state-of-the-art high-level implementations have subsequently been described to illustrate the challenges in their realization, together with different approaches and techniques. Having collected and presented the referenced architectures and applications, we conclude that it is necessary to upgrade these implementations and to design new high-speed, ultra-low-power, area-efficient and accurate Gaussian function circuits, which can be used as building blocks in different wearable or portable applications.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: