Hardware Demonstration of SRDP Neuromorphic Computing with Online Unsupervised Learning Based on Memristor Synapses

Neuromorphic computing has shown great advantages in cognitive tasks, offering high speed and remarkable energy efficiency. The memristor is considered one of the most promising candidates for the electronic synapse of neuromorphic computing systems due to its scalability, power efficiency and capability to emulate biological behaviors. Several memristor-based hardware demonstrations have explored unsupervised learning with the spike-rate-dependent plasticity (SRDP) learning rule. However, their learning capacity is limited, and few of them have explored online unsupervised learning at the network level with an SRDP algorithm. Here, we construct a memristor-based hardware system and demonstrate the online unsupervised learning of SRDP networks. The neuromorphic system consists of multiple memristor arrays as the synapses and discrete CMOS circuit units as the neurons. Unsupervised learning and online weight update of 10 MNIST handwritten digits are realized by the constructed SRDP network, and the recognition accuracy is above 90% with 20% device variation. This work paves the way towards the realization of large-scale and efficient networks for more complex tasks.


Introduction
The human brain is a highly efficient system, which consists of approximately 10^11 neurons and 10^15 synapses and consumes merely 20 W [1][2][3]. Neuromorphic computing is a new computing paradigm inspired by the brain, with the advantages of massive parallelism and distributed storage, and is regarded as a promising technology to enhance information analysis in the data-rich era [4][5][6][7]. However, existing hardware demonstrations are far from competing with biological systems in terms of efficiency and power consumption [8][9][10]. One reason is that these systems are built from CMOS devices, whose complex synapse and neuron circuits occupy a large area [11][12][13]. Therefore, a compact nanoelectronic device that can successfully emulate the biological elements is essential to construct efficient networks [2]. Recently, the memristor, with its high density, low power consumption and tunable conductance, has shown great promise as a synapse [14][15][16][17]. Another source of inefficiency is that recognition tasks are realized via supervised learning, which demands a large amount of training data and additional feedback circuits, leading to latency and energy consumption [18][19][20][21], especially when online training is required [22][23][24]. Thus, recent studies focus on unsupervised learning [25][26][27][28][29][30], where the synaptic weights are usually updated according to bio-inspired local learning rules [31][32][33], such as spike-timing-dependent plasticity (STDP) [34,35] and spike-rate-dependent plasticity (SRDP) [36]. STDP refers to the learning principle that the relative timing between pre-synaptic and post-synaptic spikes determines the direction and magnitude of the weight change [37][38][39]. SRDP is another learning rule, which modulates the synaptic weights according to the frequency of pre- and post-neuron activities and is one of the most critical learning algorithms for neuromorphic computing [40][41][42].
Early studies have shown that memristor devices can exhibit SRDP-like behaviors, including the SiOxNy:Ag-based diffusive memristor [43], the HfOx-based memristor [44], the TiOx/AlOy-based oxide memristor [45], the AgInSbTe-based chalcogenide memristor [46], hybrid CMOS/memristor structures [26,28], and devices based on many other materials [47][48][49]. Going beyond device demonstrations, several hardware implementations of pattern learning by SRDP have been proposed [26,28]. Milo et al. demonstrated online unsupervised learning of patterns with 8 × 8 pixels by SRDP based on the 4T1R structure [26]. Nevertheless, the learning ability of such a small-scale network is limited, and it cannot accomplish challenging tasks such as classification of different inputs and recognition of data sets. Recently, Huang et al. proposed a single-layer fully-connected network to classify 10 images by SRDP and constructed a CNN-SRDP network to recognize the whole MNIST data set with up to 92% accuracy, which extends the learning abilities [4]. However, only simulation results were presented, and the device demonstration was performed on discrete cells. Therefore, hardware demonstration of SRDP at the network level is of great importance for addressing more practical tasks [2][50][51][52].
In this work, we present a neuromorphic hardware system that comprises multiple memristor arrays, DACs, ADCs and other components, and supports both inference and training. The SRDP characteristic is implemented experimentally by the memristor synapses and CMOS neurons. A 196 × 10 SRDP neural network is constructed to demonstrate the online unsupervised learning of 10 MNIST digits, and about 90% classification accuracy is achieved.

Memristor-Based Neuromorphic Hardware System
Here, a memristor-based neuromorphic system is constructed for the hardware demonstration of SRDP neural networks. The system consists of three parts: memristor crossbar arrays, a customized printed circuit board (PCB) and a personal computer (PC), as shown in Figure 1a. The memristor array provides the hardware synapses, and the device conductance serves as the analog of the synaptic weight. Vector-matrix multiplication and weight update can be performed on the array. The PCB implements part of the neuron functions and primarily consists of digital-to-analog converters (DACs), trans-impedance amplifiers (TIAs), analog-to-digital converters (ADCs) and multiplexers (MUXs), as shown in Figure 1b. DACs in the pre-neuron module generate the input signal and the noise signal applied to the word lines (WL) of the array under the control of a reference random signal. The post-neuron is made up of an integrator, a comparator and a multiplexer, as can be seen in Figure 1c. DACs in the post-neuron module generate a constant voltage for inference tasks and spike pulses to the bit line (BL) or source line (SL) for weight update. ADCs together with TIAs read the integral current across the synapses through SL. MUXs select different memristor chips and operation modes, including inference and weight update. A microcontroller unit (MCU) controls the discrete components and processes data. A Matlab script running on the PC controls the generation of signals and performs some calculations of the leaky-integrate-and-fire (LIF) post-neuron, including the accumulation of the membrane voltage (Vm) and the comparison between Vm and the threshold voltage (Vth). The computer sends control commands and communicates with the MCU via a serial port.
Figure 2a shows the micrograph of the memristor chip. Each packaged chip integrates 256 × 16 1T1R cells and multiplexers to control and select word lines. The crossbar array is constructed by connecting the gates of the transistors in the same row (WL) and the top electrodes (TE) of the memristors in the same column (BL). The sources of the transistors are wired to the same SL, which is parallel to BL, as can be seen in Figure 2b. The structure of the array is designed to meet the requirements of the SRDP algorithm, where input signals of the pre-neurons are sent to WL and the top electrodes of the memristors corresponding to the same post-neuron should be connected for synchronous weight update. Figure 2c shows the memristor device with a TiN/TaOx/HfOx/TiN structure. TiN is used as the bottom electrode, on top of which an 8-nm HfO2 resistive layer was deposited by atomic layer deposition (ALD) at 250 °C. Then, a 45-nm TaOx capping layer was deposited by magnetron sputtering in an Ar/N2 atmosphere. The TiN top electrode was grown by physical vapor deposition and patterned by dry etching.
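The LIF calculation that the text assigns to the PC (accumulating Vm and comparing it with Vth) can be illustrated with a minimal Python sketch. It is not the authors' Matlab script: the forward-Euler update, the class name `LIFNeuron`, and the values of R, C and Vth are assumptions chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass
class LIFNeuron:
    """Discrete-time leaky-integrate-and-fire neuron (illustrative sketch)."""
    r_leak: float = 1e6    # leaky resistance R in ohms (hypothetical value)
    c_mem: float = 1e-10   # integrator capacitance C in farads (hypothetical value)
    v_th: float = 1.0      # threshold voltage Vth in volts (hypothetical value)
    v_m: float = 0.0       # membrane voltage Vm in volts

    def step(self, current: float, dt: float = 1e-6) -> bool:
        """Advance by dt seconds with the given input current (A).

        Forward-Euler form of C * dVm/dt = I - Vm / R.
        Returns True when Vm exceeds Vth; Vm is then cleared to zero.
        """
        self.v_m += dt * (current / self.c_mem
                          - self.v_m / (self.r_leak * self.c_mem))
        if self.v_m > self.v_th:
            self.v_m = 0.0
            return True
        return False


if __name__ == "__main__":
    neuron = LIFNeuron()
    # 16 µA corresponds to a hypothetical 0.2 V read voltage across 80 µS.
    for t in range(10):
        if neuron.step(16e-6):
            print("fire at step", t + 1)
```

With a smaller R the leak term dominates and a weak input current may never reach the threshold, which is one reason the circuit parameters of the post-neuron have to be chosen before training.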

Memristor Synapse with SRDP Characteristic
The basic properties and SRDP characteristics of the memristor are shown in Figure 3. The typical I-V characteristic is presented in Figure 3a. The distribution of the high conductance state (HGS) and low conductance state (LGS) of ten randomly selected memristors is shown in Figure 3b. On average, HGS is around 80.0 µS and LGS is 2.7 µS, giving a conductance window of approximately 30×. The variation of HGS is below 20% and that of LGS is about 80%. The SRDP learning rules and the circuit of the memristor synapse and CMOS neurons have been illustrated in Section 2. To prove the feasibility of the SRDP algorithm, we perform experiments on the hardware system. According to previous work, the learning efficiency and accuracy are sensitive to the circuit parameters of the post-neuron [4]. Thus, the circuit parameters must be selected first, including the leaky resistance R and the capacitance C of the integrator and the threshold voltage Vth of the comparator in the post-neuron module. The various signals are generated as binary sequences with given probabilities, where a high level "1" represents a spike of 1 µs width and "0" represents the absence of a spike. We randomly select a device in the array to demonstrate the SRDP behavior. Initially, the device is in HGS with probability Pg = 0.5. The SRDP training process comprises three stages: accumulation, potentiation and depression. When training starts, the system first enters the accumulation stage. The DAC in the pre-neuron module generates Vg according to the input signal and sends it to the selected WL, while the DAC in the post-neuron module applies a small constant voltage (Vs) to BL. When the transistor of the memristor synapse is switched on, a current is generated following Ohm's law and read out by the TIAs and ADCs. The current data is processed in the MCU and then transferred to the computer, where Vm is calculated and compared with Vth. Once Vm exceeds Vth, a fire event occurs and Vm is cleared to zero. If the fire spike coincides with a high level of the reference random signal, the neuron enters the depression stage. The computer instructs the DACs to select the proper signals, thereby acting as the MUX. The pre-neuron DAC sends Vg to WL according to the noise signal, and the post-neuron DAC applies Vreset to SL while BL is grounded. When the RESET spike overlaps with a noise spike, the device is RESET to LGS. Otherwise, if the fire spike does not coincide with the reference random signal, the neuron turns to the potentiation stage: Vg is generated according to the input signal, Vset is applied to BL and SL is switched to ground. When the SET spike overlaps with an input spike, the device is SET to HGS. In all other cases, the neuron remains in the accumulation stage.
After training, the conductance at the final epoch is recorded as the learned weight. Note that the weight update is performed without write-verify, so device variation exists, as shown in Figure 3b. Figure 3c presents the measured and simulated SRDP characteristics. The frequency of the input signal is normalized to 1 MHz. For each frequency point, the weight is the mean of 300 experiments, each taken after 100 training epochs. The measurement agrees with the simulation. Because Pg is 0.5, the initial weight is about 40.0 µS. When the normalized input frequency is higher than 0.3, the synapse is potentiated; otherwise, synaptic depression is triggered. The result shows that the relationship between the trained weights and the input frequency is consistent with the biological SRDP phenomenon, where LTP (LTD) is achieved with a high (low) input frequency [37,38].
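The three-stage protocol described above (accumulation, then depression or potentiation after a fire event) can be sketched for a single binary synapse as follows. This is an illustrative model rather than the authors' simulation: the values of `p_r`, `p_n`, `gain`, `leak` and `v_th` are hypothetical because the text does not list them for the single-device test, and applying the update one epoch after the fire event is an assumption about the timing. Only the HGS and LGS means, Pg = 0.5, 100 epochs and 300 repetitions are taken from the text, and the sketch is not calibrated to reproduce Figure 3c.

```python
import random

G_HGS = 80.0  # µS, mean high conductance state reported in the text
G_LGS = 2.7   # µS, mean low conductance state reported in the text


def train_synapse(f_in, epochs=100, p_g=0.5, p_r=0.3, p_n=0.3,
                  gain=0.125, leak=0.98, v_th=1.0, rng=random):
    """Return the final conductance (µS) of one binary synapse.

    f_in is the normalized input frequency (spike probability per epoch).
    p_r, p_n, gain, leak and v_th are hypothetical illustration values.
    """
    g = G_HGS if rng.random() < p_g else G_LGS  # random initial state
    v_m = 0.0
    fired = False
    for _ in range(epochs):
        if fired:
            # Update stage, entered one epoch after the fire event (assumption).
            fired = False
            if rng.random() < p_r:        # reference high: depression stage
                if rng.random() < p_n:    # RESET overlaps a noise spike
                    g = G_LGS
            elif rng.random() < f_in:     # potentiation: SET overlaps an input spike
                g = G_HGS
            continue
        # Accumulation stage: leaky integration of the synaptic current.
        v_m *= leak
        if rng.random() < f_in:
            v_m += gain * g
        if v_m > v_th:
            v_m = 0.0
            fired = True
    return g


def mean_weight(f_in, trials=300, seed=0):
    """Average final conductance over repeated runs, as done per frequency point."""
    rng = random.Random(seed)
    return sum(train_synapse(f_in, rng=rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for f in (0.1, 0.3, 0.5, 0.9):
        print(f, round(mean_weight(f), 1))
```

With Pg = 0.5 the expected initial weight is 0.5 × 80.0 µS + 0.5 × 2.7 µS ≈ 41 µS, in line with the value of about 40 µS quoted in the text. Under the assumed settings a high input rate should, on average, leave the synapse at a higher conductance than a low one, because the potentiation probability scales with the input rate while the depression probability does not.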

Online Unsupervised Learning of SRDP Network
We partition a 196 × 10 area of the array to construct a single-layer, fully-connected network consisting of 196 pre-neurons, 1960 synapses and 10 post-neurons. Ten handwritten digits from the MNIST data set [53] are selected. The input images are rescaled to 14 × 14 pixels to match the size of the array and then binarized. The pixel values of each image are unrolled into a 196 × 1 vector and then mapped to signals with different frequencies. Before training, each device is initialized into HGS with probability Pg. The training parameters and their definitions are listed in Table 1; they are optimized according to the fire-properly principle in ref. [4]. When training starts, the accumulation process is conducted first. DACs corresponding to pixels in the digit region send input signals with the same frequency Pin but different temporal sequences to the WLs, and those in the background region generate signals with the low frequency Pb. For input signals at a high level, the transistors in the corresponding row are switched on. Meanwhile, Vs is applied to the BLs of all post-neurons. The currents sharing the same column are summed according to Kirchhoff's current law. Due to the random distribution of the initial weights, post-neurons sharing the same input signals accumulate membrane voltage at different speeds and compete with each other. The PC compares every membrane voltage with Vth. Once any Vm exceeds Vth, the corresponding post-neuron experiences a weight update. No matter which post-neuron becomes the winner, Vm of all post-neurons is cleared to zero. If the reference random signal with frequency Pr is at a high level, the system enters the depression stage. Different noise signals with frequency Pn are generated by the DACs. The post-neuron whose Vm exceeds Vth sends a RESET spike to SL, so that only the winner experiences depression. The weights of the synapses connected to the winner are tuned according to the noise signal. If the reference random signal is at a low level, the system enters the potentiation stage and the synapses of the winner are enhanced with a certain probability. During the training process, the images are forwarded to the pre-neurons in sequence and each image is held for 600 training epochs.
Figure 4 shows the experimental learning process of digit "0". In order to present more details, the training is slowed down by decreasing the parameter Pin and increasing the number of training epochs tn. Figure 4a shows the evolution of the integral current, the membrane voltage and the TE voltage during the first 300 epochs. The current across the synapses charges the capacitor of the LIF post-neuron, which increases the membrane voltage. When Vth is reached, a positive spike is transferred to TE and Vm is cleared to zero. Figure 4b shows the change of the weights over the whole 1000 epochs, indicating that the weights in pattern regions approach HGS and those in background regions tend to LGS. Figure 4c displays the evolution of the mean weight corresponding to pixels in different regions. The results suggest that potentiation (depression) occurs at high (low) frequency due to the larger probability for the weight to be enhanced (depressed), which is consistent with the SRDP phenomenon [40][41][42].
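The network-level training procedure described above can be condensed into a short Python sketch. It is a simplified software model, not the hardware flow: conductances are normalized to the mean HGS, the leak of the LIF neuron is omitted, ties between post-neurons are broken by taking the largest Vm, the 2 × 2 average pooling used for rescaling is only one possible choice, and `v_th` is a free parameter with no value given in the text. The default probabilities follow Table 1.

```python
import random

G_HGS, G_LGS = 1.0, 2.7 / 80.0  # conductances normalized to the mean HGS


def downsample_binarize(img, threshold=0.5):
    """Average 2x2 blocks of a square grayscale image (values in [0, 1]) and
    threshold the result; a 28x28 input gives a flat list of 196 binary pixels.
    The pooling method is an assumption, the text only says 'rescaled'."""
    n = len(img) // 2
    return [1 if (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
                  + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4 >= threshold
            else 0
            for r in range(n) for c in range(n)]


def spike_vector(pixels, p_in, p_b, rng):
    """One epoch of rate-coded input: pattern pixels spike with probability
    p_in, background pixels with probability p_b."""
    return [rng.random() < (p_in if px else p_b) for px in pixels]


def train(images, n_post, v_th, p_g=0.65, p_r=0.15, p_n=0.04,
          p_in=1.0, p_b=0.0, t_n=600, seed=0):
    """Return the learned weights as a list of n_post rows of n_pre values."""
    rng = random.Random(seed)
    n_pre = len(images[0])
    w = [[G_HGS if rng.random() < p_g else G_LGS for _ in range(n_pre)]
         for _ in range(n_post)]
    v_m = [0.0] * n_post
    for pixels in images:
        for _ in range(t_n):
            # Accumulation: each column sums the currents of its active rows.
            spikes = spike_vector(pixels, p_in, p_b, rng)
            for j in range(n_post):
                v_m[j] += sum(w[j][i] for i in range(n_pre) if spikes[i])
            winner = max(range(n_post), key=lambda j: v_m[j])
            if v_m[winner] <= v_th:
                continue
            v_m = [0.0] * n_post           # every post-neuron is cleared
            if rng.random() < p_r:         # reference high: depression stage
                for i in range(n_pre):
                    if rng.random() < p_n:  # RESET overlaps a noise spike
                        w[winner][i] = G_LGS
            else:                          # reference low: potentiation stage
                spikes = spike_vector(pixels, p_in, p_b, rng)
                for i in range(n_pre):
                    if spikes[i]:          # SET overlaps an input spike
                        w[winner][i] = G_HGS
    return w
```

A call such as `train(images, n_post=10, v_th=150.0)` with ten 196-pixel binary images mirrors the 196 × 10 configuration; the threshold of 150 is a hypothetical value, and whether each digit ends up on a separate post-neuron depends on it and on the other parameters.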
Micromachines 2022, 13, x 6 of 11
Table 1. Training parameters and the corresponding definitions.
Parameter | Definition | Value | Unit
Pg | Probability to be in HGS of synaptic weights in the initial state | 0.65 | a.u.
Pr | Frequency of the reference random signal | 0.15 | a.u.
Pn | Frequency of the noise signal | 0.04 | a.u.
Pin | Frequency of the input signal in the pattern pixels | 1 | a.u.
Pb | Frequency of the input signal in the background pixels | 0 | a.u.
tn | Training epoch of each image | 600 | #
The learned synaptic weights of the 10 handwritten digits are displayed in Figure 5. As the training goes on, the images are learned more clearly and the distinctions between inputs are enlarged, showing the learning ability of the SRDP network. The inference results before and after training are shown in Figure 6a,b. The post-neurons have been reordered according to the fire sequence. The results show that the network fails to distinguish the inputs before training but succeeds in classifying the digits after unsupervised learning. Figure 6c shows the normalized fire frequency for each digit. As can be seen, one post-neuron fires for one digit, and the 10 digits are learned by different post-neurons, indicating a successful classification. The gradual evolution of the post-neuron dynamics in Figure 4 and the learned synaptic weights in Figure 5 are consistent with previous simulation outcomes in ref. [4], proving the SRDP network feasible. Considering that the synaptic weights are modulated in an unsupervised way without write-verify operations, the influence of device variation on accuracy should be taken into account. Here, the accuracy is defined as the ratio of the number of successful classifications to the total number of measurements. We perform the measurement 10 times and 9 of the runs succeed, which is in accordance with the simulation results shown in Figure 6d. The result suggests that the variation of HGS has a more negative effect than that of LGS. When the HGS variation reaches 20%, the accuracy is 93.5%, which shows the strong robustness of the SRDP network.
The influence of the network parameters on training accuracy and energy consumption is simulated as shown in Figure 7. The accuracy is the most crucial criterion for the network. Pin and Pg control the learning speed and have a great impact on accuracy, as shown in Figure 7a. These parameters cannot be too small, because the post-neurons will not learn images if they seldom fire. Pg cannot be too large, because when the number of set events for devices in HGS is smaller than that of reset events for devices in LGS, the training will fail; namely, forgetting is faster than learning for the neuron which should have been the winner at the next epoch. For this task, the larger Pin is, the higher the accuracy becomes.
This is because the overlap between images is relatively small; thus, if the training speed is fast, the images are learned distinctly and the difference between images is enlarged. For more complicated applications, the impact of Pin will be different, and all of the parameters should be re-optimized. Pr and Pn are also two critical parameters, and we only discuss the influence of Pr in Figure 7b. The product of Pr and Pn determines the probability of depression. If Pn is large, too many depression events in pattern pixels may happen in a single epoch, which makes the learning fail in the worst case. However, if Pn is small, the few reset events per epoch make the training process smoother, and the probability of forgetting can be kept unchanged by tuning Pr. Thus, we adjust the probability of forgetting through Pr while keeping Pn fixed at a small value. Pr cannot be too large, in order to avoid catastrophic forgetting, and cannot be too small, because the winner would then have no time to reset the devices in background pixels and would continue to be the winner in the following learning period. The energy consumption shown in Figure 7c,d is calculated by integrating the current across the memristor array. As Pin and Pg increase, the fire frequency rises, leading to a higher energy cost. Meanwhile, a larger probability of forgetting decreases the fire frequency and thus reduces the power consumption. The outcomes indicate that the network parameters have a crucial impact on accuracy and energy consumption, and need to be fine-tuned and optimized for hardware demonstration.
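The role of the product of Pr and Pn can be made concrete with a small calculation. The sketch below assumes that the reference, noise and input signals are statistically independent, which the text implies but does not state; the function names are introduced here for illustration only.

```python
def update_probabilities(p_r, p_n, p_pixel):
    """Per-fire-event update probabilities for one synapse of the winner.

    p_pixel is the input spike probability of that pixel (Pin or Pb).
    Independence of the reference, noise and input signals is assumed.
    """
    p_depress = p_r * p_n                 # reference high AND noise spike
    p_potentiate = (1.0 - p_r) * p_pixel  # reference low AND input spike
    return p_depress, p_potentiate


def expected_resets(n_pre, p_n):
    """Mean number of the winner's synapses hit by a noise spike
    during a single depression stage."""
    return n_pre * p_n


if __name__ == "__main__":
    # Values from Table 1: Pr = 0.15, Pn = 0.04, Pin = 1, Pb = 0.
    for label, p_pixel in (("pattern", 1.0), ("background", 0.0)):
        dep, pot = update_probabilities(0.15, 0.04, p_pixel)
        print(f"{label}: depress={dep:.4f} potentiate={pot:.4f}")
    print("mean resets per depression stage:",
          round(expected_resets(196, 0.04), 2))
```

With the Table 1 values the depression probability per fire event is about 0.006 for every synapse, while the potentiation probability is about 0.85 for pattern pixels and 0 for background pixels. The same product is obtained with, for example, Pr = 0.3 and Pn = 0.02, but the mean number of devices reset in one depression stage (196 × Pn) is then halved, which corresponds to the smoother training described for small Pn.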

Conclusions
In conclusion, we have constructed a neuromorphic hardware system with memristor synapses and CMOS neurons. The SRDP characteristic of the memristor synapse is verified experimentally. The online unsupervised learning of 10 handwritten digits at the network level is successfully demonstrated by the SRDP algorithm with above 90% accuracy. The proposition of a bio-inspired SRDP algorithm and the construction of a neuromorphic hardware system pave the way towards the realization of large-scale and highly efficient neuromorphic systems.