The programmable, energy-efficient analog MLP system contains neurons, winner-take-all (WTA) circuits, and switching structures as well as several support structures. The neurons are composed of several multipliers and one sigmoid circuit that are implemented via weak inversion operation, providing increased power efficiency while creating exponential characteristics within the circuits [
16]. The support circuitry consists of numerous copies of biasing cells, switching cells, and shift registers that provide power, connectivity, and programmability, respectively. The system structure has been constructed with many copies of these subsystems to create an analog configurable neural network. The block diagram for the MLP is detailed in
Figure 13 without the bias cells and shift registers. The system has four inputs that are routed to the neurons with two different types of switching matrices (detailed further in
Section 3.4). The switching matrices offer the ability to create an MLP structure with 12 neurons in a fashion that the user requires. For example, the user could utilize a single neuron in the first layer that outputs to the next layer containing 4 neurons that in turn output to the next layer of 2 neurons that could then output to the winner-take-all circuitry before the signals exit the integrated circuit. The system limits each layer to four since each neuron only has four input multipliers, which is to simplify the overall structure to focus on the programmable and low-power system design. The main objective for this small MLP design is to merge the programmability capable within digital circuits with the energy efficiency achievable within analog circuits to implement a system that contains constructs that are scalable and improve the power efficiency of the simple neural network.
Section 3.4 details the general operation of the overall architecture, while
Section 3.1,
Section 3.2 and
Section 3.3 describe in further detail the circuitry that composes the system architecture.
3.1. Multiplier Design
The multiplier in this work is based on the translinear principle developed by Gilbert [
17]. The basic translinear principle states that any closed loop containing an equal number of devices oriented in both directions (clockwise and counterclockwise loops) creates a circuit where the product of currents in one orientation equals the product of currents in the other orientation. Gilbert created the first translinear cells consisting of bipolar transistors, whereas today designers can utilize MOSFETs in weak inversion to create the same operation [
18].
As the following equations will demonstrate, the multiplication and division operation of the Gilbert translinear cell is completely reliant upon four currents (
through
) and not upon device parameters.
Equation (
3) is based on the translinear principle equating two current loops of equal
pn junctions where
J represents the current density of a bipolar transistor.
Equation (
4) is a manipulation of Equation (
3) to isolate a single device’s current density relative to that of the other three devices.
Equation (
5) represents removing the device sizing characteristics from the current and is only true when Equation (
6) is true. Equation (
6) states that the device sizing (in this case, the emitter area of the bipolar devices) of the first two transistors must equal that of the second two for the current multiplication or division in Equation (
5) to be valid. Gilbert states in [
19] that the devices need not all be the same size but rather only equal in the same amount of area when their area is multiplied with the area of their corresponding device. The Gilbert cell configuration coupled with weak inversion operation creates the opportunity to develop a MOSFET circuit capable of operating with improved energy efficiency via current-mode operation in a neural network application.
With control of individual transistor characteristics, the multiplier circuit can be biased to operate in weak inversion at low current levels. This ability opens up the possibility of a low-power solution for the multiplier circuit. Weak inversion utilization offers exponential transfer characteristics and low power consumption [
7]. The MOSFET weak inversion equation is compared with the bipolar exponential equation in the following two equations:
Equation (
7) depicts the MOSFET weak inversion drain current, where
is a process-dependent constant,
W is the width of the transistor,
L is the length of the transistor,
is the gate coupling coefficient,
is the gate voltage,
is the source voltage, and
is the thermal voltage [
16]. Equation (
8) represents the collector current in a bipolar transistor, where
is the saturation current and
is the base-emitter voltage [
20]. Equations (
7) and (
8) are highly similar to the largest difference coming from the gate coupling coefficient (
) in the MOSFET equation. If assuming that
equals one, then the transconductance of a MOSFET and a bipolar transistor are the same, leading to a similar operation between the weak inversion MOSFET drain current and the bipolar collector current. Furthermore, Equation (
7) directly changes Equation (
3) to be the following for currents:
While the
term does not cancel out from both sides of Equation (
9), the variance experienced by the inclusion of the
term is taken care of by the biasing/weighting constructs implemented within the system and does not appreciably affect the functionality of the multiplier within the overall system. While the MOSFET weak inversion operation varies greatly within real devices, these equations demonstrate the ability to utilize MOSFETs as exponential devices similar to BJTs for the implementation of the system circuitry.
The multiplier design consists of two different sections. The first section implements the multiplication operation through a circuit similar to Gilbert’s bipolar cell. The second section implements two Minch cascode current mirrors to provide the bias or weight signals (
and
in Equation (
5)). These signals scale the input signal according to the desired weight characteristic being implemented.
Figure 14 and
Figure 15 show the circuit diagrams for the multiplier cell and the Minch cascode current mirrors, respectively. The dimensions of the transistors are not shown on the figures, but the minimum-sized devices are used for maximum area density within a neural network application (width of 360 nm and length of 240 nm). These devices are thick oxide transistors used to take advantage of their higher threshold voltage for better control of the weak inversion signals.
The multiplier utilizes a current-mode approach to decrease the propagation delay of the circuit from long wire traces on-chip. The current-mode operation along with the weak inversion biasing allows the circuit to operate at increased energy efficiency. Current-mode operation offers the capability for higher data rates due to lower signal propagation delay times due to the parasitics seen by the current signal. Lower-power operation is obtained by biasing the circuitry in the weak inversion region, which also maintains the exponential transistor characteristics required for successful multiplication.
Figure 14 depicts the core circuitry that implements the multiplication operation and is similar to Gilbert’s cell. The five core transistors exhibit a similar translinear loop seen in Gilbert’s cell due to the exponential characteristics of the MOSFET weak inversion operation. The addition of the cascode device in the
input pathway allows the multiplier to operate faster while maintaining signal integrity via a low impedance input node for the current input signals. Otherwise, there would be significant current division between the input node and the source impedance, resulting in attenuation in the signal pathway. The circuit operation is as follows: the input signal enters via the
pathway, the input signal is scaled with the
and
weight signals from the Minch cascode circuits, and
sinks the current, producing a scaled version of the input signal with some variations due to transistor mismatch and weak inversion operation.
The circuits in
Figure 15 represent two Minch cascode biasing schemes that transfer the weight inputs to the multiplier for adjustment of the input signal. The Minch cascode circuit from [
21] creates a low-voltage cascode structure that allows for weak inversion bias weights while maintaining a lower-voltage rail for the circuit. The Minch cascode circuit also provides a more accurate current mirroring operation than other simpler current mirror designs. This effect allows for the multiplier weights to be more reliably reproduced as they are programmed for physical circuits, even accounting for transistor mismatch and operation in the weak inversion region. The key to this increased accuracy comes from the input bias current signal (
) that helps reduce the weak inversion current mismatch [
21]. The Minch cascode scheme allows the transistors to operate effectively during switching operations within the minimal voltage headroom seen in modern process technologies. Since the weight signals are DC values and do not require high-speed operation like the input signal
, the Minch cascode cells can easily be integrated with the multiplier cell while maintaining the weak inversion weights.
Figure 16 details the simulation results for a 100 nA input signal into the multiplier when the weight and saturation currents are also set to 100 nA. The current output signal follows the input signal with some variation due to the utilization of subthreshold signals, but this variation will be resolved at the system level via adjustment of weights further down the signal processing line (at either the sigmoid circuit or the next neuron). Additionally, the added current is beneficial to maintaining the desired current signal levels while the signal propagates through the system.
Figure 17 demonstrates some of the DC characteristics of the multiplier for set weighting signals of 100 nA each, while the input current is swept from 0 to 500 nA. The output signal at the bottom shows current variations due to the subthreshold design, but these variances will not impact the functionality of the following sigmoid circuit significantly and can be adjusted via the control of bias currents in either the sigmoid circuit or the multiplier circuit itself.
3.2. Sigmoid Design
The sigmoid circuit is an important base element in the MLP system, creating either a logistic function or hyperbolic tangent function output (
Figure 18) at the end of the neuron signal pathway that will propagate onto another neuron within the neural network. For this work, the logistic function is utilized since it does not require a negatively biased circuit to operate as in the hyperbolic tangent function. The utilization of current signals in the sigmoid circuit is desired to allow the MOSFETs to function in the weak inversion region while operating as quickly as possible.
The core circuitry for the sigmoid function consists of three transistors, a differential pair, and a tail (bias) current [
22]. The differential output current of the MOSFET pair is quite similar to that of the bipolar pair with some differences that stem from using MOSFETs over bipolar transistors. Analyzing the circuit from Mead, the following equations are generated:
Applying the saturated drain current equation to the differential pair yields:
The drain currents added together must equal the tail bias current:
Solving for
and substituting into Equations (
11) and (
12) produces:
Taking the difference of Equations (
14) and (
15) gives the final tanh function:
For all of the above equations,
is a constant that represents the ratio of the MOSFET surface potential of the gate voltage [
22]. The basis of these equations requires the differential pair to have voltage inputs. Since the multiplier design from the previous section outputs a current signal and the reference signal for the sigmoid circuit being a current signal, both need to be converted via diode-connected devices represented by Equations (
17) and (
18), leading to Equation (
19), which is the output current difference based on the input current and reference current signals. Therefore, it has been established that a circuit including a differential pair will produce the desired sigmoid function if the input currents are transformed appropriately to voltages via the diode-connected devices, and the output currents are effectively subtracted from each other.
Figure 19 is the design obtained that accommodates high-frequency current signals and outputs the desired logistic function. The sigmoid design is composed of several current mirrors that relay the input, bias, and output signals to and from the differential transistor pair at the core of the circuit with all transistors having a minimum width and length of 360 and 240 nm, respectively. These width and length are the minimum size for the thicker gate oxide devices available within the 130 nm technology node that offer a higher threshold voltage, which in turn provides the capability for greater design margin within the subthreshold region of each device. The basic functionality of the circuit is that when the main input signal
is below the reference input signal
, the output signal
will be near zero current. When
becomes much higher in magnitude than
,
will then output a current that is near that of the bias current
for the sigmoid. This behavior is typical of a logistic function at the positive and negative extremes on the horizontal axis. The reference current
allows for the logistic function’s half point to be shifted further up the horizontal axis (see
Figure 18). This functionality requires the circuit to have more input current in order to reach the fully saturated bias current output. The functionality of the sigmoid circuit can only be obtained by operating in the weak inversion region, which enables the output to take the shape of the desired logistic function for proper signal propagation.
Figure 20 details the simulation results for the sigmoid circuit with a 100 nA switching current input signal and reference and biasing currents that produce an output signal that is similar in amplitude to the input signal. Chosen current values were based on the utilization of the bias cells (discussed in
Section 3.4) with a power of 2 increments up and down from a system reference current of 100 nA. Therefore, a reference current of 25 nA was utilized alongside a bias current of 200 nA to produce an output current signal of around 93 nA for propagation to the next neuron in the system.
Figure 21 shows the DC characteristics of the sigmoid circuit as the input signal is swept from 0 to 500 nA with a bias current of 200 nA and a reference current of 25 nA. The output current signal shows the activation function and the signal magnitude required for the logistic curve to start. The subthreshold variance in the sigmoid circuit keeps the output signal from saturating at the bias current, but this variance can be dealt with in a similar fashion to the variance seen in the multiplier with the bias control system (adjustment of either the reference or bias currents to achieve a sufficient signal magnitude).
3.3. Winner-Take-All Design
The final standalone circuit before the system circuitry is that of the thresholding circuit (TC) that creates a WTA circuit when combined with a 2-input OR gate. The TC receives neuron outputs and outputs a signal based on the highest signal level it receives at its input.
Figure 22 shows a block diagram for the two WTA structures that correspond to the two system outputs. Each “complete” WTA design consists of a multiplier circuit (
Figure 14) and a TC cell (
Figure 23). The multiplier circuit sums the neuron current outputs at its input terminal and then scales the summed currents to create a better signal for comparison in the TC cell. The TC cell takes the multiplier output and compares it with a reference current level. This comparison determines if the signal should remain “high” or “low” by creating a voltage at the comparison node that is either just above the threshold voltages of the following inverters or just below their threshold voltages. The use of the inverters provides the final output off-chip to be a digital voltage instead of an analog current signal that then needs to be converted. The inverters have different minimum widths (160 nm) and lengths (120 nm) compared with that of all the other transistors to provide faster functionality as they create the digital output voltage and utilize less physical space. The different minimum width and length stem from utilizing the standard gate oxide devices that are the main transistor devices available within the 130 nm technology node.
The TC cell in
Figure 23 contains three current inputs and one voltage output. The main input signal
that comes from the multiplier (and the neurons before that) goes into a PMOS Minch cascode current mirror to maintain the signal level and integrity before the comparison node. The reference current
for current comparison is input into an NMOS Minch cascode current mirror similar to those in the multiplier circuit mentioned previously in this chapter. The Minch cascode structure requires a bias current
to operate effectively. The reference and bias current inputs are taken from system circuits and are DC values. The inputs
and
are mirrored and compared against each other at the comparison node before the two inverters. As mentioned before, this node fluctuates based on the input signal
around the inverter threshold voltages. The output signal
is a digital voltage signal that is then passed on to an OR2 gate. The OR2 gate provides another level of comparison with another signal chain, ensuring that the MLP system output follows the winner-take-all concept.
Figure 24 demonstrates the winner-take-all circuit functioning in a simulation environment. The input signal to one of the multipliers is a 100 nA switching current signal. The weight and saturation currents for the multiplier are both 100 nA with the biasing signal for the TC cell being the same level. These current signals lead to the winner-take-all circuit, producing a high digital output signal when the current input signal is high.
3.4. MLP Hardware System Architecture
The system design is based on the structure of field-programmable analog arrays (FPAAs) or FPGAs such that a similar programmable construct can be achieved. The architecture contains several hundred biasing cells that need to be programmed for the proper operation of the neural network. The basic biasing scheme flows as follows: first, the master bias current is sent on-chip; next, the master bias is mirrored to the bias control circuitry that controls whether currents are sent to the neuron/WTA blocks; lastly, the bias current is sent and mirrored into the bias cells based on how many of the current mirrors are programmed in each neuron/WTA block. The bias programming is controlled by a string of shift registers that are made up of basic D-type flip-flops, where an example of a 4-bit shift register can be seen in
Figure 25.
Figure 26 shows the master bias input current structure and one of the current mirrors that would be controlled to send current to a single neuron or WTA block.
Figure 27 details the bias input structure for the neuron and a single current mirror for the biasing of the neuron or WTA block. The programming of the neuron or WTA biasing determines the number of current mirrors in parallel for each bias current input.
The weighting of the multiplier and sigmoid circuits relies upon the programming of hundreds of bias cells similar to
Figure 26 and
Figure 27 via shift registers connected directly to the “Sw” node between the transistors that set the current sink or source and VSS and VDD, respectively. Several bias cells have been put in parallel for each required bias current in the multiplier and sigmoid circuits to provide the weights required for an appropriate system operation. The bias cells for a single current signal would range from 0.25× to 8× or 16×, the master bias input current, which is 100 nA supplied externally. This methodology offers discrete current steps from 25 nA up to greater than 800 or 1600 nA, depending upon the circuit’s needed biasing levels. These digitized current levels additionally provided programming for the user and increased design flexibility.
The system in
Figure 13 consists of 12 neurons, 2 WTA blocks for the main system outputs, S-switch matrices, and C-switch matrices. Starting with the neurons, each contains four multipliers and a single sigmoid, which is shown in
Figure 28. Normally, a neuron for a configurable architecture would have a much larger number of multipliers per neuron as each neuron should be capable of receiving inputs from every other neuron in the previous layer. The total inputs are limited to four per neuron due to IC process limits with metal layers and interconnects. The constraint on the inputs maintains signal integrity by decreasing routing and switching characteristics that would be needed for a higher number of multipliers, and helps constrain the required chip area for the hardware implementation.
The next structures that will be discussed are the two switch matrices. The C-switch and S-switch matrices are developed from the switch matrix topology in [
23] and take the forms shown in
Figure 29a,b, respectively. The main difference is that the switches in the matrices are a single PMOS transistor whose gate is controlled by the output of a shift register instead of a floating gate node. The use of a single PMOS transistor instead of a floating gate cell for each switching node for connectivity reduces the complexity of the switching network and the programming complexity. The basic C-switch matrix has eight vertical routes that can be connected to five horizontal routes (four for neuron inputs and one for the output).
Figure 30 shows these vertical and horizontal routes with the C-switch (single transistor) linking them together when activated. The S-switch matrix provides the ability to route a signal north, south, east, or west with as few switches as possible and can be seen in
Figure 31. Each S-switch construct contains six transistors for the desired signal connections and a 3-bit address decoder to simplify the number of shift registers required to program a single S-switch structure. Additionally, the S-switch matrix offers routing through a layer of neurons to the next layer if desired. The connectivity is also stored within blocks of shift registers, providing a single desired configuration prior to the application of any input signals.
The system operates by routing one or more of the input signals to the first layer of neurons. Next, those neurons will weigh their inputs at the multipliers, sum the multiplier currents together before the sigmoid input, and then output a signal to the next layer of neurons. This operation will continue until the desired number of layers is achieved or all resources on the chip are utilized. The last layer of neurons will output to one or both of the WTAs that will then amplify and compare the signal with a reference before outputting the digital version of the final signal. The system requires four data streams for switch programming, four data streams for neuron and WTA bias programming, and one data stream for master bias control programming.
The system implementation offers increased configurability for analog neural network hardware. The improved configurability stems from implementing an FPGA/FPAA-type routing scheme that allows signal pathways to be created going back and forth throughout the system to reach the desired resources. Additionally, the simplistic circuit structures utilized promote the scalability of the overall architecture by offering a “plug and play” type of circuit blocks that can readily be placed in a system architecture with little adjustment required to fine-tune the circuit performance. The biasing cells included within each neuron also provide greater flexibility to control multiplier weighting, sigmoid reference and biasing, and thresholding circuit response time. In this manner, the designer has the ability to control the power consumption and speed of the overall architecture via the programming of the neural network hardware. The programmability of the system offers the user the ability to create a diverse range of very basic neural networks that are capable of improved energy efficiency. The programmability also offers the capability to circumvent the effects that the subthreshold circuit variations have on the signal pathways. For example, the signal variations caused by the subthreshold design attribute to generally higher currents on the “high” levels of the signals. This variation can be neutralized by either adjusting the sigmoid circuit’s reference or biasing levels or altering the biasing levels of other circuits downstream to be slightly lower to attest for the higher input levels. In this fashion, the analog variations that occur can be readily adjusted with little to no system level effects.
Figure 32 shows the final layout view of the MLP programmable architecture that was sent for fabrication within the 130 nm technology node. The design with all circuit constructs was constrained to a 1 mm by 1 mm square layout space.