Solving Overlapping Pattern Issues in On-Chip Learning of Bio-Inspired Neuromorphic System with Synaptic Transistors

Recently, bio-inspired neuromorphic systems have been attracting widespread interest thanks to their energy efficiency compared to conventional von Neumann architecture computing systems. Previously, we reported a silicon synaptic transistor with an asymmetric dual-gate structure for the direct connection between synaptic devices and neuron circuits. In this study, we investigate a hardware-based spiking neural network for pattern recognition using a binary Modified National Institute of Standards and Technology (MNIST) dataset with a device model. A total of three systems were compared with regard to learning methods, and it was confirmed that the feature extraction of each pattern is the most crucial factor in avoiding overlapping pattern issues and obtaining a high pattern classification ability.


Introduction
Even though computing systems based on von Neumann architecture still dominate computer architecture, this architecture is considered inefficient for dealing with big data in the training of deep neural networks (DNNs) because of its serial signal processing [1]; therefore, a totally new computing system is required for the next generation of artificial intelligence. Recently, bio-inspired neuromorphic systems based on spiking neural networks (SNNs) have been widely investigated because of their power efficiency and parallel signal processing properties [2][3][4][5]. With regard to applications, the neuromorphic system, which is a hardware implementation of an artificial neural network, has been utilized mostly for pattern recognition [6][7][8][9][10], but also as a denoising autoencoder [11], for color image reconstruction [12], and for speech recognition [13]. In addition, various kinds of electronic devices have been studied as artificial synaptic devices, a crucial building block for constructing neuromorphic systems, including resistive switching materials [14][15][16][17], phase change materials [18][19][20], ferroelectric materials [21,22], and transistors [23][24][25]. Among them, transistor-based synaptic devices are considered to offer better reliability and smaller device-to-device variation for very-large-scale integration (VLSI) implementation of neural networks than their counterparts.
In our previous works, we reported a synaptic transistor with an asymmetric dual-gate structure exhibiting short- and long-term memory and spike-timing dependent plasticity (STDP) characteristics [26][27][28], and its fabrication method [29]. In this work, a system-level study of a SNN for pattern recognition is presented with a binary Modified National Institute of Standards and Technology (MNIST) handwritten dataset. The necessity of an inhibitory synaptic component is analyzed in order to solve an overlapping pattern issue when it comes to pattern recognition for on-chip learning of bio-inspired neuromorphic systems in the form of SNNs.

Device Model of Synaptic Transistor for System-Level Study
A schematic view of weight modulation in the synaptic transistor is illustrated in Figure 1a. As the pre-synaptic spikes are applied to the first gate (G1) and the drain, excess holes are generated by impact ionization and accumulate in the floating body region. The impact generation region expands as a result of the positive feedback between the impact generation rate and the accumulated holes. Afterwards, newly generated hot carriers near the second gate (G2) are injected into the nitride layer depending on the second gate voltage (VG2). The device is potentiated or depressed as holes or electrons, respectively, are stored in the nitride layer, because of the resulting threshold voltage (VT) change. These weight modulation characteristics of the synaptic transistor are incorporated into a device model with a voltage-controlled current source (VCCS) [30] based on the gate current caused by hot carrier injection [31], as shown in Figure 1b. The VCCS delivers the second gate current (IG2) to the nitride layer; IG2 is modeled as the gate current flowing by hot carrier injection as a function of VG2, with a fitting coefficient α. The type and number of injected carriers are determined by VG2, so that the amount of VT change (ΔVT) per pre- and post-synaptic spike pair is calculated, providing good agreement with the measured data [28].

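The saturating weight update described above can be sketched numerically. The functional form and the constants below are illustrative assumptions for this sketch, not the fitted hot-carrier-injection model of [28]: the only property carried over from the text is that the VT shift per spike shrinks as the nitride layer fills with trapped charge.

```python
# Illustrative device model: the VT shift per spike is proportional to the
# remaining room in the memory window, so repeated spikes drive VT toward
# +/- VT_MAX asymptotically. ALPHA and VT_MAX are hypothetical constants.
ALPHA = 0.05     # update-rate coefficient (assumed)
VT_MAX = 1.0     # half-width of the VT memory window in volts (assumed)

def delta_vt(vt, potentiate):
    """Return the VT change for one pre-/post-synaptic spike pair."""
    if potentiate:                        # holes injected -> VT decreases
        return -ALPHA * (vt + VT_MAX)     # room left toward -VT_MAX
    else:                                 # electrons injected -> VT increases
        return ALPHA * (VT_MAX - vt)      # room left toward +VT_MAX

# Repeated depressing spikes: the update shrinks as charge accumulates.
vt = 0.0
for _ in range(50):
    vt += delta_vt(vt, potentiate=False)
```

The saturation seen in the loop mirrors the "potential inhibition by already stored carriers" discussed in the results below: the first update moves VT by ALPHA·VT_MAX, while later updates become progressively smaller.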

Results and Discussion
With the help of the developed device model, the performance of the SNN composed of the synaptic transistors was studied with regard to pattern recognition. A 784 × 10 single-layer SNN was constructed to train and test 28 × 28 binary MNIST images (60,000 training images and 10,000 testing images). A total of 784 synaptic transistors were connected to each output node, as shown in Figure 2. Charges were integrated at a capacitor node while pre-synaptic spikes were applied to each synaptic transistor, and a post-synaptic neuron circuit generated post-synaptic spikes at the output node when the node voltage of the capacitor exceeded the VT of the neuron circuit [32]. The spike generation rate of each post-synaptic neuron circuit was taken as the intensity of that output node; the system was therefore considered successful in pattern recognition when the answer node fired most among all the output nodes during the test operation. Recognition accuracy was calculated in this manner because the weighted sum of transferred currents (IE) to the output node most congruous to the testing sample was expected to be the largest, owing to the synaptic transistors potentiated in the shape of the digit, which leads to high current flows.
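The integrate-and-fire readout described above can be summarized in a few lines. This is a minimal behavioral sketch, not the reported neuron circuit: the threshold value and the unit-charge weight encoding are assumptions, and the capacitor is simply reset to zero after each firing.

```python
# Minimal integrate-and-fire output node: each pre-synaptic spike deposits
# weighted charge on a capacitor node; a post-synaptic spike is emitted when
# the node voltage crosses the neuron threshold. Constants are illustrative.
V_THRESHOLD = 1.0   # neuron firing threshold (assumed)

def count_output_spikes(weights, pixels):
    """Accumulate weighted input charge and count threshold crossings."""
    v, spikes = 0.0, 0
    for w, x in zip(weights, pixels):
        v += w * x                  # charge integrated per pre-synaptic spike
        if v >= V_THRESHOLD:
            spikes += 1             # post-synaptic spike emitted
            v = 0.0                 # capacitor reset after firing
    return spikes

def classify(weight_maps, pixels):
    """The recognized digit is the output node that fires most often."""
    rates = [count_output_spikes(w, pixels) for w in weight_maps]
    return rates.index(max(rates))
```

With this readout, a node whose weight map matches the input pattern accumulates charge fastest and therefore produces the highest firing rate, which is exactly the classification criterion used in the text.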

Figure 3a shows how the system was trained using the binary MNIST images and the STDP characteristics. The pre-synaptic spikes were applied to the corresponding synaptic transistors with different timing depending on their colors: black with Δt = 0.5 µs and white with Δt = −0.5 µs, relative to a teaching signal given to the output node matching the digit of the training sample. Therefore, VT was increased (depression) for the synaptic transistors representing black pixels (background) and decreased (potentiation) for those representing white pixels (handwritten digit). Figure 3b shows the classification rate of the SNN on untrained testing samples as a function of the number of trained samples.

The accuracy rate saturated rapidly because of the nonlinear weight modulation characteristics arising from the hot carrier injection model: the more electrons or holes were already trapped in the nitride layer, the less likely additional electrons or holes were to be injected, owing to the potential inhibition by the stored carriers. The saturated accuracy rate beyond 3000 trained samples was about 60%, which is quite low compared to other SNN systems because of the overlapping pattern issue. Figure 3c describes how the overlapping pattern issue degrades the classification rate. The output nodes having more white pixels in their weight maps, such as those for eight or zero, have a higher probability of firing even when they do not match the digits of the test samples, leading to a low recognition rate for the digits that have fewer white pixels (such as digit 1).
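The supervised STDP rule described above can be sketched as follows. This is an illustrative reading of the training scheme, not the simulated circuit: the teaching spike is assumed to reach only the label-matching output node, the ±0.5 µs timing is reduced to a pixel-color branch, and LEARN_RATE and VT_MAX are hypothetical constants mirroring the saturating hot-carrier update.

```python
# Sketch of the supervised STDP update: for the output node matching the
# label, white pixels (dt = -0.5 us) potentiate their synapses (VT down) and
# black pixels (dt = +0.5 us) depress them (VT up), with a saturating step.
LEARN_RATE, VT_MAX = 0.05, 1.0   # assumed constants

def stdp_update(vt_map, image, label_node, node):
    """Apply one binary training image to one output node's synapses."""
    if node != label_node:
        return vt_map                          # teaching spike absent: no update
    updated = []
    for vt, pixel in zip(vt_map, image):
        if pixel:   # white pixel (digit): potentiation, VT decreases
            vt -= LEARN_RATE * (vt + VT_MAX)
        else:       # black pixel (background): depression, VT increases
            vt += LEARN_RATE * (VT_MAX - vt)
        updated.append(vt)
    return updated
```

Because every white pixel of every answer sample is potentiated regardless of how digit-specific it is, this rule "carves" the average digit shape into the weight map, which is the root of the overlapping pattern issue discussed next.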
Figure 4a compares, at the same scale, two weight maps in the form of ΔVT: one learned through the STDP rule and one transferred from an artificial neural network (ANN) through off-chip learning. For the transfer, the synaptic weights of the ANN were converted to SNN weights proportional to their square roots, so that the transferred IE can be in line with the weighted sum of the ANN with a rectified linear unit (ReLU), one of the most popular activation functions in ANNs because it avoids the vanishing gradient problem of alternatives such as the sigmoid or hyperbolic tangent [33][34][35]. The former weight map looks like digits carved into the synaptic devices, whereas the latter is well characterized by the features of each digit. This is why the hardware-based SNN has a poor accuracy of 60%: its weight map does not reflect the characteristics of each digit. In the case of the STDP method, VT is modulated only according to whether a training sample is the answer or not, and the amount of VT change is determined by the amount of carriers already stored in the nitride layer; in the ANN, by contrast, the amount of weight change is adjusted according to the backpropagation algorithm. Illustrated in Figure 4b are the transformation processes of the weight maps for digit 8 as the training progressed in the two cases. The pattern carved on the weight map by the STDP method becomes clearer in the direction in which it can fire frequently for digit input samples; however, the weight map transferred from the ANN exhibits its unique features in fine detail, so that all the weight maps yield higher classification accuracies. This means that even a narrower memory window of the synaptic transistors can provide a higher accuracy when the weight map reflects the unique characteristics of the training images to be classified. Figure 4c plots the classification rates for each digit depending on the method.
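The square-root weight conversion can be motivated with a small sketch. The quadratic current model and the constant K below are assumptions for illustration (a transistor in saturation draws a current roughly quadratic in its gate overdrive); the paper states only that the transferred weights are proportional to the square roots of the ANN weights, and this sketch shows why that choice makes the hardware current track the ANN weighted sum. Negative ANN weights are simply clipped here, which is a simplification.

```python
import math

# Illustrative off-chip transfer: map an ANN weight w to an effective gate
# overdrive of sqrt(w), so a square-law device current K * overdrive**2
# reproduces K * w, i.e. the ANN weighted sum up to a constant factor.
K = 1.0  # transconductance-like constant (assumed)

def transfer_weight(w_ann):
    """Map a non-negative ANN weight to an overdrive-like quantity."""
    return math.sqrt(max(w_ann, 0.0))   # negative weights clipped (simplification)

def device_current(overdrive):
    """Assumed square-law saturation current of the synaptic transistor."""
    return K * overdrive ** 2
```

Under these assumptions, device_current(transfer_weight(w)) equals K·w for any non-negative w, so summing device currents over all 784 synapses reproduces the ANN's ReLU-layer weighted sum.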
The poor accuracies, especially for digit 1 and digit 9, were greatly improved by adopting the transferred synaptic weights, leading to a total accuracy of 87.6%. In addition, it is noteworthy that the classification rates of the transferring method and of the ANN itself are almost the same for every single digit. It is believed that the SNN using the transferred weight maps and the ANN with ReLU are equivalent in their operation, in the respect that the intensity of the output nodes can correspond to the firing rate [36].

In order to reduce the classification error caused by the overlapping pattern issue discussed above, inhibitory synaptic devices with the same weight maps as the excitatory ones are added, as shown in Figure 5a. As in the previous method, the input signals are applied to the excitatory synapses corresponding to the white pixels; at the same time, the input signals are applied to the inhibitory synapses corresponding to the black pixels. With this change in the manner of classification, if a testing sample covers not only its own digit but also parts of other digits, the remaining parts contribute a subtraction from the weighted sum through the current flows (II) of the inhibitory synaptic transistors, as shown in Figure 5b. The overlapping pattern issue can be significantly mitigated in this way, because it mainly comes from the contribution of the remaining parts to undesired firings.

Figure 6a shows the accuracies as a function of the number of training samples for various ratios between the channel widths of the excitatory synaptic transistors (Wex) and the inhibitory ones (Win). The accuracy is improved by 10% at Win/Wex = 0.1; however, it starts decreasing beyond that ratio and reaches the bottom (nearly 0%, even below the 10% chance level) when Win/Wex = 0.5. This is because the output nodes cannot fire when Win is too wide: the number of black pixels is larger than that of white pixels in most testing samples, so II exceeds IE once Win/Wex exceeds 0.5. Figure 6b compares the classification rate of each digit for the two SNN systems. It is noteworthy that the accuracy for the digits which have a small number of white pixels, such as one, is significantly enhanced from 19% to 60%, while the accuracies of the other digits maintain similar values. It is confirmed that the addition of an inhibitory synapse part can effectively resolve the misclassified cases stemming from the overlapping pattern problem.
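The excitatory/inhibitory readout described above can be sketched as a net-current comparison. This is a behavioral sketch only: weights stand in directly for unit currents, and the channel-width ratio Win/Wex appears as a single scaling factor on the inhibitory sum.

```python
# Sketch of the readout with inhibition: white pixels drive the excitatory
# synapses (IE) and black pixels drive inhibitory synapses sharing the same
# weight map (II), scaled by the channel-width ratio Win/Wex.
def net_current(weight_map, pixels, ratio):
    """Return IE - (Win/Wex) * II for one output node (binary pixels)."""
    ie = sum(w * x for w, x in zip(weight_map, pixels))          # white pixels
    ii = sum(w * (1 - x) for w, x in zip(weight_map, pixels))    # black pixels
    return ie - ratio * ii

def classify(weight_maps, pixels, ratio=0.1):
    """Pick the output node with the largest net current."""
    currents = [net_current(w, pixels, ratio) for w in weight_maps]
    return currents.index(max(currents))
```

As a toy illustration of the overlap fix: a node whose weight map covers many white pixels (say, the one for digit 8) matches a "1"-shaped input just as strongly as the true "1" node when ratio = 0, but its extra, unmatched weights now fall on black pixels and are subtracted once ratio > 0, so the "1" node wins.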

Conclusions
In conclusion, we presented a system-level study of pattern recognition with the help of a device model. The device model was developed with a VCCS based on measured data and the gate current from hot carrier injection. A total of three SNN systems were constructed and analyzed using binary MNIST images. A SNN with only excitatory synaptic transistors trained under the STDP rule had a poor classification rate, with a total accuracy of 60%, because of the pattern overlapping issue. This improved dramatically to 87.6% in the case of a SNN with synaptic weights transferred from an ANN using ReLU. The difference between those two systems was whether the region representing the unique features of each digit was potentiated or the handwritten digit region was merely carved. The addition of inhibitory synaptic transistors with the same weight maps improved the classification accuracy by 10% by solving the overlapping pattern problem, which comes from the fact that the output nodes having more white pixels tend to fire for unmatched input samples. These results lead us to conclude that these SNN systems and learning methods provide a framework for future studies of hardware-based neuromorphic systems using both excitatory and inhibitory synaptic devices for pattern recognition applications.
