Here, we introduce the basic definitions associated with Artificial Neural Networks and Spiking Neural Networks. Then, we describe the contribution of this work.
2.1. Artificial Neural Network Fundamentals
Artificial neural networks (ANNs) take inspiration from nature. Biological neurons, as depicted in
Figure 1a, respond to positive and negative stimuli and generate a stimulus to other neurons. The stimulus from one neuron to another crosses a small gap called a
synapse. The neuron’s activity level is an electrical potential across the neuron cell membrane. The signal is collected along the dendrite structures, and it is integrated as a
membrane potential, until the neuron has a high-enough potential to fire an outgoing pulse. This pulse, called an action potential, travels along the axon structure to one or more axon terminals at synapses with further neurons [16].
Figure 1b depicts the membrane voltage observed during a firing event. It has a spike-like shape, so action potential pulses are often called spikes. Once the action potential pulse travels down the axon, the potential drops below the nominal baseline level, and there is a period (the refractory period) during which the neuron cannot fire another significant pulse. Thus, each neuron has an upper limit on its overall spike rate, which can be considered its fully-saturated activity level; for example, a refractory period on the order of 1 ms would cap the sustained firing rate near 1 kHz.
Once the action potential reaches an axon terminal, the signal propagates onto the next neurons. This mechanism can be inhibitory (suppressing the downstream neuron's output) or excitatory (increasing the downstream neuron's activity level). In addition, within a group of synapses, a synapse may be larger/smaller, more/less sensitive, or repeated more/fewer times (or not at all) than the others. This can increase or decrease the effect of a neuron's stimulus impulse on the following neuron. Thus, there is a variable gain or weight for each connection from one neuron to another, which can be modeled as a gain factor (positive, negative, or zero) relative to a baseline level. A model of the biological neuron, called an artificial neuron, is shown with its connections in Figure 2.
ANNs are organized into layers, where all the neurons in one layer receive their inputs only from neurons in a prior (upstream) layer and provide outputs only to the neurons in the subsequent (downstream) layer.
Figure 3a shows a neural network with 3 layers, and
Figure 3b shows the notation for the weights, bias, inputs, and output of a neuron. Note that we can use a vector to denote the set of input activations arriving at a neuron from the connected neurons in the prior layer, and likewise for the set of weight factors of these connections. Here, we denote $L$ as the number of layers, with $l$ being the layer index, $l = 1, \ldots, L$. For a neuron $i$ in layer $l$, the membrane potential is denoted by $z_i^{(l)}$, the bias by $b_i^{(l)}$, and the action potential by $a_i^{(l)}$. Most ANN implementations model $z_i^{(l)}$ as the sum of each input connection's signal level $a_j^{(l-1)}$ weighted by the synaptic gain factor $w_{ij}^{(l)}$ from neuron $j$ in the upstream layer (prior layer $l-1$) to neuron unit $i$ in the current layer $l$ [17], as per Equation (1):

$$z_i^{(l)} = \sum_{j} w_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)} \tag{1}$$
It is common to represent the computation of all membrane potentials in layer $l$ in matrix form as per Equation (2):

$$\mathbf{z}^{(l)} = \mathbf{W}^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)} \tag{2}$$

Here, $\mathbf{z}^{(l)}$ is a column vector containing all the membrane potentials of layer $l$, $\mathbf{a}^{(l-1)}$ is a column vector containing all the input signals to layer $l$, $\mathbf{W}^{(l)}$ is the weight matrix containing all the synaptic gain factors from layer $l-1$ to layer $l$, and $\mathbf{b}^{(l)}$ is a column vector containing the biases of all neurons of layer $l$.
Figure 4 depicts the matrix operation.
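As an illustration of Equation (2), a minimal NumPy sketch of the layer computation might look as follows; the array names and layer sizes are our own illustrative choices, not those of any specific implementation:

```python
import numpy as np

def layer_forward(W, a_prev, b):
    """Membrane potentials of one layer: z = W @ a_prev + b (Equation (2))."""
    return W @ a_prev + b

# Illustrative sizes: 4 neurons in layer l fed by 3 neurons in layer l-1.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))        # synaptic gain factors from layer l-1 to layer l
a_prev = rng.uniform(size=(3, 1))  # activations of the prior layer (column vector)
b = np.zeros((4, 1))               # biases of layer l
z = layer_forward(W, a_prev, b)    # column vector of membrane potentials, shape (4, 1)
```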
The action potential intensity (activation function) of a neuron $i$ in a layer $l$, denoted by $a_i^{(l)}$, is modeled as a scalar function $f$ of the membrane potential $z_i^{(l)}$. We can also denote all action potentials of layer $l$ as $\mathbf{a}^{(l)}$, where $\mathbf{a}^{(l)} = f\big(\mathbf{z}^{(l)}\big)$ with $f$ applied element-wise (Equation (3)). Note that for $l = 1$, $\mathbf{a}^{(1)}$ represents the outputs of the first layer, which are defined as the inputs to the neural network.
Early research into so-called Perceptron neural networks determined that purely linear weighting combiners as in Equation (1) lack some important processing abilities [18], and that some non-linear response is needed. Natural neurons mostly exhibit little response to low input signals, a rapidly increasing response over some sensitive portion of the overall range, and then a modest or no further response for additional input (i.e., saturation). Various functions have been proposed based on their computational cost and system effectiveness. Specific examples include the biased, thresholded step function (Equation (4)), the Rectified Linear Unit (ReLU) (Equation (5)), and true sigmoids based on the hyperbolic tangent, arc-tangent, or logistic functions (Equation (6)).
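For reference, a common way to write these three activation functions is sketched below; these are the standard textbook forms (with the threshold parameter $\theta$ being our own notation), assumed here rather than copied from the original equations:

```latex
% Biased, thresholded step function (cf. Equation (4)); \theta is a threshold parameter.
f(z) = \begin{cases} 1 & \text{if } z \geq \theta \\ 0 & \text{otherwise} \end{cases}

% Rectified Linear Unit (cf. Equation (5)).
f(z) = \max(0, z)

% Logistic sigmoid, one of the smooth saturating choices (cf. Equation (6)).
f(z) = \frac{1}{1 + e^{-z}}
```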
While there are mathematical arguments for smooth sigmoid functions, such as full differentiability, in practice linear rectification (ReLU) tends to work nearly as well and is much simpler to implement in software and hardware [19]. The thresholded step function, in contrast, does not provide much expressiveness, leading to information loss that is unsuitable for many applications.
The action potential between neurons can be encoded with numeric formats such as 16-bit, 32-bit, or 64-bit floating point, or variable-size fixed point (down to 8 bits or less). Likewise, the weights for each synapse-like connection between neurons can range from 64 bits down to just a few bits. With fewer bits to represent the numbers, the range and precision of the encoded value are reduced, resulting in loss of accuracy. However, such methods reduce hardware resource usage, thereby improving dimensions not directly related to accuracy, such as power, end-to-end latency, throughput bandwidth, heat dissipation, and system cost at a given level of performance.
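To make the precision trade-off concrete, the following sketch quantizes floating-point weights to a signed fixed-point grid; the 8-bit width and the 6-bit fractional scaling are illustrative assumptions, not the encoding used in this work:

```python
import numpy as np

def quantize_fixed_point(x, total_bits=8, frac_bits=6):
    """Round x to a signed fixed-point grid with `frac_bits` fractional bits,
    saturating at the two's-complement representable range."""
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1)) / scale      # most negative representable value
    hi = (2 ** (total_bits - 1) - 1) / scale   # most positive representable value
    return np.clip(np.round(x * scale) / scale, lo, hi)

w = np.array([0.7134, -0.0291, 1.9, -2.5])
print(quantize_fixed_point(w))  # [ 0.71875 -0.03125  1.90625 -2. ] -- steps of 1/64, saturated at -2
```

Fewer bits coarsen the grid and tighten the saturation range, which is precisely the accuracy-versus-resources trade-off described above.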
2.2. Spiking Neural Networks
These are ANN designs that encode the neuron activation as a series of pulses and implement an integrate-and-fire neuron model, where the neuron unit receives spike inputs over a period of time and fires once a certain threshold is reached [11]. For such Spiking Neural Networks (SNNs), the most direct realization of a biological neural signal would be a series of short-duration binary pulses whose frequency is proportional to the overall excitation level of that neuron. Since the voltage spikes of a given neuron's action potential signal tend to have the same size and shape, using a simple, short 1-bit digital pulse is a reasonable simplification. How biological neurons encode higher-level information onto clusters/trains of spike events is not well understood, but some researchers suspect that it is more than straightforward frequency encoding. Conceivably, pulse clusters of various durations, spacing, and intra-cluster spike density, or finer-grained spike timing patterns, may encode multi-dimensional information in a given neuron's axonal output. However, useful applications are possible with simple spike patterns, as detailed in this work.
Often, the SNN model includes a leakage decay term so that the activation is suppressed when there is a low rate of input activation. The bias term is a first-order approximation of the leakage effect, at least when the SNN network activity is structured into fixed, synchronized time frames. In an asynchronous SNN, where each unit fires independently once it has met its threshold, the loss mechanism can be modeled as a fractional loss rate applied to the current membrane potential; a minimal sketch of such a leaky integrate-and-fire update is shown below.
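To make these dynamics concrete, here is a minimal discrete-time sketch of a leaky integrate-and-fire unit; the leak rate, threshold, and reset-to-zero convention are illustrative assumptions rather than the specific model used in this work:

```python
def lif_step(v, spikes_in, weights, leak=0.05, threshold=1.0):
    """One discrete time step of a leaky integrate-and-fire unit.

    v         : current membrane potential
    spikes_in : 0/1 input spikes, one per synapse
    weights   : synaptic gain factors (positive = excitatory, negative = inhibitory)
    """
    v *= (1.0 - leak)                                    # fractional leakage of stored potential
    v += sum(w * s for w, s in zip(weights, spikes_in))  # integrate weighted input spikes
    fired = v >= threshold
    if fired:
        v = 0.0                                          # reset after firing (one simple convention)
    return v, int(fired)

# Drive the unit with a steady spike pattern and watch it charge up and fire.
v = 0.0
for t in range(6):
    v, out = lif_step(v, spikes_in=[1, 1, 0], weights=[0.3, 0.2, -0.4])
    print(t, round(v, 3), out)  # fires at t = 2, then begins charging again
```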
SNNs have some features that reduce hardware complexity, but at the same time they present some implementation challenges. The features that can be exploited for optimization are as follows:
The output of a given unit can be represented by a single wire rather than by an 8-bit or 16-bit bus. This simplifies routing in the FPGA and reduces the size of registers and other elements;
Spikes (as pulses) can be easily counted (up/down counter), making the integration a straightforward process rather than utilizing multi-bit accumulators;
Weights (gain factors) for input connections can be realized by repeated sampling of the input pulse as per the gain level, or by ‘step counting’ after receiving each input pulse (see the sketch after this list);
In the analog domain, specially-designed transistors may be used to better implement the integrate-and-fire behavior with very few active components by using a special charge-accumulating semiconductor layer. Also, analog signals are more robust to modest amounts of signal pulse collision (two signals hitting the bus at the same time), within the limits of saturation.
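As a hedged sketch of the repeated-sampling idea from the list above, an integer gain can be applied to a 1-bit spike by stepping an up/down counter several times per received pulse; the gain values below are illustrative:

```python
def integrate_spikes(spike_train, gain, potential=0):
    """Integrate 1-bit spikes into a counter, stepping `gain` times per pulse.
    A negative gain models an inhibitory connection (down-counting)."""
    for spike in spike_train:
        if spike:
            potential += gain  # |gain| up/down counter steps per input pulse
    return potential

p = integrate_spikes([1, 0, 1], gain=3)          # two excitatory pulses at gain +3
p = integrate_spikes([1], gain=-2, potential=p)  # one inhibitory pulse at gain -2
print(p)  # 4
```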
However, there are some challenges in applying SNNs, such as:
Many existing processors, input, and output devices are designed to work with multi-bit digitized signals. Spike-oriented encoding methods are uncommon among current electronic components and system designs, although there are some examples of duty-cycle encoding (variable pulse width at a set pulse frequency) or pulse-rate encoding. Pure analog-domain transducers exist, and components such as voltage-controlled oscillators (VCOs) may serve as a useful front-end interface. If SNNs become more prominent, interfacing mechanisms will improve;
If spikes are represented by 1-bit pulses and two or more outputs feeding a single input fire at the same time, they will be received as a single spike pulse, and any multiplicity of spikes will be lost. For a sufficiently large and robust network, the signal loss due to spike collision may not impact accuracy significantly. This collision effect can be minimized if the pulses are kept short and each unit producing an output delays its output by a variable amount within a longer time window; this delay could be random, pre-configured to avoid conflict with other units, or externally triggered and coordinated to minimize conflict (see the collision sketch after this list). Alternatively, multiple output spikes can be held over a longer period so the receiving unit may sample each one using a series of sub-spike sampling periods. These techniques, however, introduce latency;
Some types of neural network layers, such as the softmax layer, which emphasizes one output among a set of outputs (the best classification out of multiple possibilities) and normalizes them into a likelihood-based ratio, are difficult to implement in an SNN. It is possible to make do without such layers, but interpretation of the SNN output becomes somewhat less intelligible;
Perhaps the most fundamental challenge is that training SNNs is not as straightforward as training conventional ANNs. Back-propagation of the network error relative to the expected results is a gradient-descent operation, where the local slope of the error surface is expected to be smooth due to the differentiability of each neuron's internal operation. Depending on the SNN design, there may be internal variables that provide a smooth error surface for gradient descent. Alternatively, we can train an ANN proxy of the intended SNN with gradient descent and map the gain and bias values to the SNN representation with reasonable overall performance. We also note that there are training approaches unique to SNNs, such as Spike-Timing-Dependent Plasticity (STDP), which reinforces connections in the network where an input spike is temporally coincident with, or proximal to, a rewarded output spike, while weakening others [15].
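To illustrate the collision effect and one mitigation from the list above, the sketch below ORs several 1-bit spike trains onto a shared line and counts the surviving pulses; the window size and the use of distinct pre-configured offsets are illustrative assumptions:

```python
def deliver(spike_slots, window):
    """OR unit pulses onto a shared 1-bit line; coincident pulses merge into one."""
    line = [0] * window
    for t in spike_slots:
        line[t] = 1
    return sum(line)

n_units, window = 8, 16
# All units fire at slot 0 of the frame: the 8 pulses merge into a single spike.
received_naive = deliver([0] * n_units, window)
# Distinct per-unit offsets (pre-configured to avoid conflict) preserve every pulse.
received_offset = deliver(list(range(n_units)), window)
print(received_naive, received_offset)  # 1 8
```

The price of spreading pulses across the window is the added latency noted above.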
2.3. Our Implementation
Based on the advantages and drawbacks of SNNs, we propose an SNN-like design that features: (i) fixed-frequency duty-cycle encoding of excitation-level inputs rather than frequency encoding; (ii) sub-pulse sampling over all connections via a control circuit; (iii) use of back-propagation on a proxy ANN with corresponding encoding precision for weights, biases, and excitation levels.
Our work seeks to address the design of a neural network (SNN-like) that optimizes resource usage and that is trainable with conventional back-propagation techniques. Fully analog neural network implementations exist that offer high efficiency in power and performance, as do custom digital chip designs for neural networks. We preferred an FPGA implementation, as its flexibility lets us explore various hardware designs: we can model and train a neural network in a software application, then map it into an FPGA hardware design that can be synthesized, simulated, and validated. In addition, while some SNNs use an alternative to classic gradient-descent back-propagation, such as spike-timing-dependent plasticity (STDP), we prefer to use back-propagation techniques, for which a large body of work already exists. A rough sketch of the duty-cycle encoding in feature (i) is given below.
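As an illustration of feature (i), duty-cycle encoding at a fixed frame frequency could look like the following; the 16-slot frame length and rounding behavior are our own illustrative assumptions, not the exact circuit behavior:

```python
def duty_cycle_encode(level, frame_slots=16):
    """Encode an excitation level in [0, 1] as a 1-bit pulse held high for a
    proportional number of slots in a fixed-length frame (fixed frequency,
    variable pulse width)."""
    high = round(max(0.0, min(1.0, level)) * frame_slots)
    return [1] * high + [0] * (frame_slots - high)

def duty_cycle_decode(frame):
    """Recover the excitation level by counting the high slots in the frame."""
    return sum(frame) / len(frame)

frame = duty_cycle_encode(0.7)
print(frame)                     # eleven 1s followed by five 0s
print(duty_cycle_decode(frame))  # 0.6875 -- level quantized to steps of 1/16
```

A receiving unit that samples each slot (feature (ii)) can then weight and accumulate these samples much like the counter-based integration sketched earlier.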
Finally, for our target application, we selected the problem of classification of images of hand-written numeric digits in the MNIST data set [
20]. This is a well-known and popular neural network test application because the data set is easy to access and use; input test images are of modest size; the possible digit classes 0 to 9 are well-bounded to a small, definite set, allowing correspondingly modest-sized neural networks to be designed, trained, implemented and evaluated in hardware. However, the digit-classification problem offers a set of non-trivial challenges, including a wide range of digit styles, various levels of neatness/clarity, different sizes, different orientations, etc., and requires learning generalization in order to perform robustly [
21].