A Highly Robust Binary Neural Network Inference Accelerator Based on Binary Memristors

Abstract: Since the memristor was first demonstrated, it has shown great application potential in neuromorphic computing. Currently, most neural networks based on memristors exploit the analog characteristics of the memristor. However, owing to limitations of the manufacturing process, non-ideal characteristics such as non-linearity, asymmetry, and inconsistent device periodicity appear frequently and inevitably; it is therefore a challenge to employ analog memristors on a massive scale. By contrast, a binary neural network (BNN) requires its weights to be either +1 or −1, which can be mapped by digital memristors with high technical maturity. Building on this, a highly robust BNN inference accelerator with a binary sigmoid activation function is proposed. In the accelerator, the inputs of each network layer are either +1 or 0, which facilitates feature encoding and reduces the peripheral circuit complexity of the memristor hardware. The proposed two-column reference memristor structure, together with a current controlled voltage source (CCVS) circuit, not only solves the problem of mapping positive and negative weights onto the memristor array, but also eliminates the sneak current effect under the minimum conductance status. Compared with the traditional differential pair structure of a BNN, the proposed two-column reference scheme reduces both the number of memristors and the latency to refresh the memristor array by nearly 50%. The influence of non-ideal factors of the memristor array, such as array yield, conductance fluctuation, and reading noise, on the accuracy of the BNN is investigated in detail based on a new memristor circuit model with non-ideal characteristics. The experimental results demonstrate that when the array yield α ≥ 95%, or the reading noise σ ≤ 0.25, a recognition accuracy greater than 97% on the MNIST data set is achieved.


Introduction
In recent years, deep learning algorithms such as deep neural networks (DNNs) have achieved great success in a wide range of artificial intelligence (AI) applications, including, but not limited to, image recognition, natural language processing, and pattern recognition [1][2][3][4][5][6][7]. To obtain high computing performance, current DNNs are mainly implemented or accelerated on the Von Neumann computer architecture using traditional circuits such as CPUs, GPUs, and FPGAs [8,9]. Due to the limitations of this architecture, these deployments require a large amount of data to be transported between memory and the core computing unit, which incurs both huge latency and immense power consumption. In addition, in the post-Moore's-Law era, performance enhancement of DNN hardware cannot always keep up with the computing requirements, and the gap is widening quickly [10]. Therefore, it is urgent to design new computing architectures to address the well-known memory wall problem.
In 2008, Hewlett Packard (HP) Labs announced that they had experimentally confirmed the existence of the memristor, also known as resistive random access memory (RRAM) [11]. Since then, a large number of researchers have been attracted to the device because of its beneficial physical properties, such as high scalability, low operating voltage, high endurance, and a large ON/OFF ratio, which make it particularly suitable to act as a biological synapse. Furthermore, it offers a high degree of circuit integration and can be combined with CMOS technology, showing extremely high feasibility and superiority in the fields of logical computing, neural networks, and chaotic computing; it has thus become one of the outstanding candidates for the next generation of computing architecture [12][13][14][15]. However, the memristor is not without drawbacks. Non-linearity, inconsistent periodicity, and asymmetry in the memristor's conductance modulation, drift and failure in read/write operations, and other non-ideal factors can cause tremendous difficulties when the memristor is introduced into pattern recognition and similar AI applications [16][17][18][19].
To solve these problems, one possible method is to improve the manufacturing process of memristors to increase device performance, but there is still a long way to go. A more practical alternative is to design specific neural network structures matched to the performance level of existing memristors. The binary neural network (BNN) was recently proposed to compress DNN algorithms while maintaining acceptable recognition accuracy on various data sets (MNIST [20], CIFAR-10, and so on), and some works have shown it to be promising for accelerating neuromorphic computing [21]. Nowadays, industry is able to manufacture binary memristors with stable performance, which provides a strong basis for implementing a memristive binary neural network (MBNN). Compared with analog memristive neural network implementations, the MBNN replaces full-precision floating point weights with the binarized constants ±1 and no other weight values; neuron values then become binary 0/+1, which simplifies the network structure and especially the peripheral circuits. In particular, it is well suited to the existing performance level of memristors [22]. In addition to storing information and performing computation in place, binary memristors integrated in a highly parallel architecture offer other advantages: small feature size (down to 2 nm), fast switching (faster than 100 ps), and low energy per conductance update (lower than 3 fJ) [23,24]. Lately, binary memristor-based neuromorphic integrated circuits (ICs) are gradually becoming a trend in In-Memory Computing. An initial work reported a 16 Mb RRAM macro chip using binary memristors, whose BNN achieves a relatively high recognition accuracy (96.5% on MNIST) despite imperfect bit yield and endurance [25]. Another work designed, fabricated, and tested a binary memristor array that obtained classification accuracies of 98.3%, 87.5%, and 69.7% on MNIST, CIFAR-10, and ImageNet, respectively [26].
These works were mainly based on the differential pair structure; thus, new BNN algorithms and memristive crossbar structures are needed to improve the performance of MBNNs.
The single-column reference structure applied in previous studies can provide nearly 50% area reduction and power saving for a memristive array. Nevertheless, the sneak current effect in this scheme limits the scale of the memristive crossbar array [27,28]. More recently, Yi-Fan Qin et al. proposed a binary memristor-based BNN inference accelerator, which demonstrated that it is promising to obtain a robust BNN acceleration hardware implementation with less chip area, energy consumption, and peripheral circuit complexity, albeit only theoretically [29].
In this paper, a novel memristive crossbar structure for a BNN accelerator with a binary sigmoid activation function is proposed. In the accelerator, the inputs of each BNN layer are set to either +1 or 0, which not only facilitates feature encoding but also reduces the peripheral circuit complexity of the memristor-based circuits. Its two-column reference memristor structure, together with a current controlled voltage source (CCVS) circuit, not only solves the problem of mapping positive and negative weights onto the memristor array, but also eliminates the sneak current effect due to the minimum memristor conductance. Furthermore, compared with the traditional differential pair structure of a BNN, the proposed two-column reference scheme reduces both the number of memristors and the latency to refresh the memristor array by nearly 50%. The influence of non-ideal factors of the memristor array, such as array yield, conductance fluctuation, and reading noise, on the accuracy of the MBNN is investigated in detail based on a new memristor circuit model with non-ideal characteristics. The experimental results demonstrate that when the array yield is α ≥ 95%, or the reading noise is σ ≤ 0.25, a recognition accuracy greater than 97% on the MNIST data set can be achieved.

Binary Neural Network
The full process of the BNN algorithm is shown in Figure 1 [22]. In its forward inference stage, the input feature vector X and the binary weight matrix W b are multiplied by vector-matrix multiplication (VMM) operations to obtain the intermediate value T, where the elements of W b take only the two values +1 and −1 produced by a binarization procedure. Then batch normalization (BN) and the activation function are applied to calculate the output feature vector Z. Parameter g is the gradient of the loss function, which depends on the output feature Z and its target value Y. In the training stage, the binary weight W b is adopted to calculate the weight gradient ∇W by ∇W = f(W b, g, X i). In order to maintain a sufficiently high accuracy of the BNN algorithm, a 32-bit floating point weight matrix W is used in the weight updating stage. At the end of the training stage, the following binarization rule converts the 32-bit floating point weight W into the binary W b:

W b = +1 if W ≥ 0, and W b = −1 if W < 0.

To make the BNN algorithm suitable for hardware implementation on a memristor crossbar, the original BNN algorithm is improved as follows. The input features of the first layer of the BNN are the binarized image pixels of the MNIST test data set; that is, the gray value of each original image pixel is mapped to either 0 or 1. The input features of the other layers are the intermediate data produced by the activation function applied to the output of the previous layer, where the binary sigmoid activation function is:

z = 1 if sigmoid(t) ≥ 0.5 (equivalently, t ≥ 0), and z = 0 otherwise.

In this way, all features remain either 0 or 1 during the inference stage, which facilitates the coding of input features and reduces peripheral circuit complexity; the specific coding method is detailed in Section 2.3. In order to accelerate training and obtain better convergence, the original images are cropped from 28 × 28 to 20 × 20 pixels, which is sufficient to demonstrate the performance of our proposed memristor-based network circuit.
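The improved forward pass can be sketched in NumPy as follows. This is a minimal illustration, not the paper's Theano implementation: batch normalization is omitted, and the random weights merely stand in for trained ones.

```python
import numpy as np

def binarize(w):
    # Binarization rule applied at the end of training:
    # W_b = +1 if W >= 0, else -1.
    return np.where(w >= 0.0, 1.0, -1.0)

def binary_sigmoid(t):
    # Binary sigmoid: sigmoid(t) >= 0.5 is equivalent to t >= 0,
    # so all activations stay in {0, 1}.
    return (t >= 0.0).astype(np.float64)

def forward(x, float_weights):
    # Hidden layers emit 0/1 features; the output layer returns raw
    # scores so that the largest one selects the predicted digit.
    z = x
    for w in float_weights[:-1]:
        z = binary_sigmoid(z @ binarize(w))
    return z @ binarize(float_weights[-1])

rng = np.random.default_rng(0)
x = (rng.random(400) > 0.5).astype(np.float64)   # binarized 20x20 image
w1 = rng.standard_normal((400, 1000))            # hidden layer weights
w2 = rng.standard_normal((1000, 10))             # output layer weights
scores = forward(x, [w1, w2])
digit = int(np.argmax(scores))                   # classification result
```

During inference only the binarized weights are needed; the float matrices exist solely so the training stage can accumulate small gradient updates.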
The MNIST database contains 70,000 images representing the digits zero through nine. The data are split into two subsets, with 60,000 images belonging to the training set and 10,000 images belonging to the testing set. This separation ensures that an adequately trained model can accurately classify relevant images it has not previously examined. For all BNNs in this work, 8000 training images are randomly picked from the training set (60,000 images) per epoch, and training lasts 125 epochs. The network topology of the BNNs in this work is a multilayer perceptron (MLP) [30] with one or more hidden layers between the input layer and the output layer. All layers in the BNN are fully connected layers, in which all the inputs from one layer are connected to every unit of the next layer. To match both the size of the MNIST test data and the classification goal of 10 digits, the numbers of input and output neurons of the proposed MBNN are 400 and 10, respectively; the scale of the hidden layers directly determines the network complexity. In general, the larger the network scale, the higher the recognition accuracy, but larger networks need more time to train. From Figure 2, we find that when the hidden layer has more than 1000 neurons, the recognition accuracy nearly saturates; meanwhile, the recognition accuracy difference between networks with one and three hidden layers is only about 1%, a slight degradation. Thus, in our work, a 400 (input layer)-1000 (hidden layer)-10 (output layer) MBNN is employed to explore the effects of non-ideal factors. The detailed results are given in Table 1.

Memristor Model
The model proposed by Pershin et al. [31] has been widely referenced in research on memristor systems [32][33][34][35] because of its simplicity. So far, researchers have developed many excellent memristor devices with reliable switching behaviour, high switching ratio, and extended endurance over set/reset cycling [36][37][38]. Since we focus on binary memristors rather than analog memristors, utilizing fabricated or measured memristor data would bring little improvement to the simulation. In order to obtain fast simulation when the BNN is mapped into the MBNN, this bipolar threshold-voltage memristor model is utilized in our work, which can be expressed by the following equations:

I(t) = V(t)/R(t),
dR/dt = −f(V(t)) [θ(V(t)) θ(R − R ON) + θ(−V(t)) θ(R OFF − R)],
f(V) = βV + 0.5 (α − β) (|V + V TH| − |V − V TH|),

where I is the memristor current, R is the memristor resistance, and V(t) is the voltage applied to the memristor to modulate its conductance. R OFF is the maximum memristor resistance (i.e., the resistance when the memristor is turned off), and R ON is the minimum memristor resistance (i.e., the resistance when the memristor is turned on). α and β are two critical parameters that adjust the change rate of the memristor resistance: when α = 0, the resistance remains unchanged while the applied voltage is below the threshold voltage V TH, and β adjusts the resistance change rate when the applied voltage V(t) exceeds V TH; the larger β is, the higher the change rate, and vice versa. θ(x) is the step function:

θ(x) = 1 for x ≥ 0, and θ(x) = 0 for x < 0.

The two θ(x) window terms constrain the memristor resistance between R OFF and R ON. The model is realized in the Verilog-AMS language, and the parameters of the model are: R OFF = 200,000 Ω, R ON = 2000 Ω, α = 0, β = 10^16, V TH = 4.6 V. A 6 V, 1000 Hz sinusoidal voltage signal is applied, and the current-voltage (I-V) characteristic curve is shown in Figure 3, where the hysteresis curve shows a nearly ideal memristor switching between the OFF and ON states. The memristor conductance modulated by positive and negative voltage pulses of 100 ns width is shown in Figure 4.
It can be seen that when the 6 V positive voltage is applied, the conductance reaches its maximum value. When the applied voltage is below the 4.6 V threshold, the conductance remains unchanged. When the 6 V negative voltage is applied, the conductance drops to its minimum value. All these characteristics make this model suitable for the MBNN implementation.
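A minimal discrete-time sketch of this threshold behaviour (a simplification of the Verilog-AMS model, with the sign convention taken from Figure 4) shows why the device behaves as a binary switch at these parameter values: with β = 10^16, any pulse above V TH drives the resistance all the way to the opposite bound within a single 100 ns pulse.

```python
import numpy as np

R_ON, R_OFF, V_TH = 2e3, 2e5, 4.6    # ohms, ohms, volts (model parameters)

def apply_pulse(R, V, width=100e-9, beta=1e16):
    """One voltage pulse applied to the bipolar threshold memristor."""
    # Below the threshold the resistance is frozen (alpha = 0).
    if abs(V) <= V_TH:
        return R
    # Above the threshold the change rate scales with beta; positive V
    # drives R toward R_ON (maximum conductance), negative V toward R_OFF.
    dR = -beta * (V - np.sign(V) * V_TH) * width
    return float(np.clip(R + dR, R_ON, R_OFF))

r = apply_pulse(R_OFF, 6.0)    # set pulse: switches fully to R_ON
r = apply_pulse(r, 3.0)        # sub-threshold pulse: no change
r = apply_pulse(r, -6.0)       # reset pulse: switches back to R_OFF
```

Because the single-step change |dR| is on the order of 10^9 Ω, far beyond the R ON-R OFF window, every supra-threshold pulse saturates the device, reproducing the binary switching seen in Figures 3 and 4.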

Implementation of MBNN
In the proposed MBNN, each memristor acts as a synapse, with voltage pulses modulating its conductance. The weight matrix W m×n is represented by the conductance matrix G m×n of the memristor array, the input feature vector x m×1 is encoded into the voltage signal vector V m×1 applied to the memristor array, and the output feature is the current I k of each column of the array:

I k = ∑ (i = 1 to m) V i G (k,i)

which implements the VMM functionally, as shown in Figure 5. In the whole forward inference stage of the MBNN, a weight takes only the two values +1 and −1, so binary memristors are used to implement the circuit. Since memristor conductance can only be positive, mapping both positive and negative weights into the memristor array is a critical problem; the three methods shown in Figure 6 are investigated.

The traditional differential pair structure is shown in Figure 6a, where the conductance difference of two memristors (a differential pair) represents one synapse. Each of the two memristors has two resistive states, the high-resistance state (HRS) and the low-resistance state (LRS), with corresponding conductances denoted G HRS and G LRS. This is a classic scheme for mapping positive and negative weights onto a memristive crossbar array and has been used in previous work [39][40][41][42]. A differential operation on the two memristors realizes the mapping of the two weight values +1 and −1, where the output feature, namely the current I k of each column of the memristor array, is expressed as:

I k = ∑ (i = 1 to m) V i (G (k,i+) − G (k,i−))

where G (k,i+) and G (k,i−) can each take only the two values G HRS and G LRS: G HRS − G LRS represents the weight value −1, and G LRS − G HRS represents the weight value +1.
Figure 6. The schematic of memristive array structures to implement the MBNN: (a) Traditional differential pair structure. (b) The select column scheme: memristors in the select column are turned into G LRS, and the current mirror converts Isel to half of itself [27]. (c) The proposed two-column reference memristor scheme: memristors in the first reference column are turned into G HRS, and memristors in the second reference column are turned into G LRS.
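The differential-pair column current can be checked numerically. The sketch below uses the conductances implied by the model parameters (G LRS = 1/R ON, G HRS = 1/R OFF) and a 0 V / 0.2 V input encoding; both are illustrative assumptions.

```python
import numpy as np

G_LRS, G_HRS = 1 / 2e3, 1 / 2e5     # conductances of LRS and HRS (siemens)
V_READ = 0.2                        # volts; input 1 -> V_READ, input 0 -> 0 V

def diff_pair_currents(x, W_b):
    # Each weight is one memristor pair; the column current is
    # I_k = sum_i V_i * (G_(k,i+) - G_(k,i-)).
    V = x * V_READ
    G_pos = np.where(W_b > 0, G_LRS, G_HRS)   # +1 -> (LRS, HRS) pair
    G_neg = np.where(W_b > 0, G_HRS, G_LRS)   # -1 -> (HRS, LRS) pair
    return V @ (G_pos - G_neg)

x = np.array([1.0, 0.0, 1.0])               # binary input features
W_b = np.array([[ 1, -1],
                [ 1,  1],
                [ 1, -1]])                  # weights in {-1, +1}
I = diff_pair_currents(x, W_b)
# Each active input contributes +-V_READ * (G_LRS - G_HRS) per column,
# so I is proportional to the ideal VMM result x @ W_b.
assert np.allclose(I, V_READ * (G_LRS - G_HRS) * (x @ W_b))
```

The sign of each column current therefore directly encodes the sign of the ideal vector-matrix product.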
As shown in Figure 6b, the select column scheme was proposed in a previous study and can provide nearly 50% area reduction and power saving for a memristive array [27]. Nevertheless, the sneak current effect in that scheme limits the scale of the memristive crossbar array; this effect is suppressed by the two-column reference memristor scheme proposed in our work. The proposed circuit is shown in Figure 6c, where only one memristor realizes each synapse. In the blue box there are two columns of reference memristors, against whose conductances the conductance of each synapse memristor is referenced to realize the mapping of the positive and negative weights (+1 and −1). The output current I k can be expressed as:

I k = ∑ (i = 1 to m) V i (G (k,i) − (G HRS + G LRS)/2)

That is, 1/2 (G HRS − G LRS) represents the weight value −1, and 1/2 (G LRS − G HRS) represents the weight value +1. It should be pointed out that the output of the memristor reference columns is a current, and every neuron needs this reference output, which would cause the current to shunt. Therefore, a CCVS circuit is used to convert the current signal into a voltage signal to solve the current shunting problem. With this design, the number of memristors is reduced by nearly 50% compared with the traditional differential pair structure. In addition, owing to the reduced device count, the time to write trained weights into the memristor array is also nearly halved. The hardware implementation of the proposed BNN system is similar to the select column scheme [19], with the major difference that the sneak current effect under the minimum conductance status is considered, thereby eliminating the sneak paths induced by the minimum conductance (HRS) states of the memristors in the crossbar array, which is a valid method to promote the robustness of the MBNN.
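The halving of the per-synapse contribution can be verified numerically: subtracting the averaged reference-column current leaves each synapse contributing ±(G LRS − G HRS)/2. The device values and input encoding below are illustrative assumptions, with G LRS = 1/R ON and G HRS = 1/R OFF.

```python
import numpy as np

G_LRS, G_HRS = 1 / 2e3, 1 / 2e5     # LRS / HRS conductances (siemens)
V_READ = 0.2                        # read voltage encoding input 1

def two_column_currents(x, W_b):
    # One memristor per synapse: +1 -> G_LRS, -1 -> G_HRS.  Subtracting
    # the mean current of the two reference columns (all-G_HRS and
    # all-G_LRS) gives I_k = sum_i V_i * (G_(k,i) - (G_HRS + G_LRS)/2).
    V = x * V_READ
    G = np.where(W_b > 0, G_LRS, G_HRS)
    I_ref = V.sum() * 0.5 * (G_HRS + G_LRS)   # averaged reference current
    return V @ G - I_ref

x = np.array([1.0, 0.0, 1.0])
W_b = np.array([[ 1, -1],
                [ 1,  1],
                [ 1, -1]])
I = two_column_currents(x, W_b)
# Each weight contributes +-(G_LRS - G_HRS)/2, i.e. exactly half the
# differential-pair contribution, with half the devices per synapse.
assert np.allclose(I, 0.5 * V_READ * (G_LRS - G_HRS) * (x @ W_b))
```

The classification decision depends only on the relative ordering of the column currents, so the factor of 1/2 does not affect accuracy.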
The operation of the MBNN acceleration circuit is mainly divided into two steps: (1) writing the weights; (2) inferring the result. In the first step, memristor conductances are modulated by voltage pulses, and the weights obtained through software training are written into the memristor array. 1T-1M (one transistor and one memristor per cell) is a common structure designed for memristive crossbar arrays to overcome the shortcomings of the conventional 1M structure: a MOSFET transistor is added to each memristor as a selector to reduce the sneak current effect [43,44]. Compared with a 1T-1M crossbar array, a 1M crossbar array has a simpler structure and higher integration density and is easier to fabricate. As the sneak path problem has been solved by the proposed two-column reference memristor scheme, a 1M crossbar memristor array is adopted for implementation in our work. For the 1M crossbar array, the 1/2 voltage bias scheme is adopted to update the weights, as shown in Figure 7. In this way, most memristors in the array stay at 0 V when no conductance update is required, which significantly reduces power consumption. In the actual circuit simulation, we initialize the resistance of every memristor model to the maximum value, so the memristor array can be operated column by column when updating weights to reduce the writing time. In our case, memristors in red are programmed to +1 when biased between V w and ground, while memristors in blue are programmed to 0 when biased between ground and V w (the so-called 1/2 voltage bias scheme). In the inference phase, the input features are encoded as voltage pulse series applied to the memristor array. As mentioned above, we have adapted the BNN algorithm to memristor array implementation: the input value of each fully connected layer is 0 or +1, so it can be succinctly coded, with input feature +1 coded as the voltage V READ (0.2 V) and input feature 0 coded as 0 V.
This process can greatly simplify the peripheral circuit design.
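The bias arithmetic of Figure 7 can be sketched as follows; the line-voltage assignments are our own illustrative reading of the scheme, in which only the selected cells of the selected column see the full V w, while every other cell sees at most V w/2, safely below the 4.6 V threshold.

```python
import numpy as np

V_W, V_TH = 6.0, 4.6   # write voltage and device threshold (volts)

def cell_voltages(target_col, w_col, n_cols):
    # Rows whose cell must be set carry V_W, the others V_W/2; the
    # selected column line is grounded, unselected columns are held at
    # V_W/2.  The voltage across each cell is (row line - column line).
    row_v = np.where(w_col > 0, V_W, V_W / 2)
    col_v = np.full(n_cols, V_W / 2)
    col_v[target_col] = 0.0
    return row_v[:, None] - col_v[None, :]

# Program column 0 of a 3x4 array with target weights (+1, -1, +1).
V = cell_voltages(target_col=0, w_col=np.array([1, -1, 1]), n_cols=4)
selected = V[:, 0]
others = V[:, 1:]
assert np.all(selected[[0, 2]] == V_W)       # programmed cells: full V_W
assert np.all(np.abs(others) <= V_W / 2)     # everything else: below V_TH
```

Note that unselected cells in rows held at V w/2 end up at 0 V, which matches the power-saving property described above.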

Simulation Results of the Two Schemes
Verilog-AMS introduces the connect module construct [45]. To facilitate verification of large digital-analog hybrid circuits, designers can develop models representing the functionality of blocks using Verilog-AMS modeling constructs. Commonly, these models are simulated in conjunction with the corresponding transistor-level circuit to ensure consistency between the models and the actual circuit implementation [46]. The memristor model, a connect module, provides two ports and has a behavioral description of the transformation process. In Cadence Virtuoso, the memristors are automatically arranged into a crossbar array by a script. Unlike software-language descriptions, circuit simulation in this way can be conducted at a low and detailed level, which is the key to probing and addressing the sneak path problem. Both memristive BNNs in Figure 6a,c are implemented in the industry-standard mixed-signal circuit design language Verilog-AMS and simulated with Cadence Spectre. Since our main purpose is to validate the feasibility of the proposed memristor circuit at the real circuit level, we set the size of the neural network to 400 (input layer)-100 (hidden layer)-10 (output layer) to reduce the simulation time, which mainly depends on the size of the hidden layer. The whole circuit system uses three voltage sources: the writing voltage source has an amplitude of 6 V and a period of 0.2 µs; the 1/2 writing voltage source is 3 V (i.e., below the 4.6 V memristor threshold) with the same 0.2 µs period; and the reading voltage source is 2 V, also with a 0.2 µs period. The initial conductance states of the memristors implementing the MBNN are all G HRS, and the weights are then sequentially written into the crossbar one column at a time.
For the traditional differential pair scheme of the memristor array, the simulation time is set to (100 × 2 × 0.2 + 10,000 × 0.2) µs = 2040 µs, where 100 is the number of hidden layer neurons, 2 is the number of memristors required to realize one synapse, and 0.2 µs is the voltage writing cycle time. In other words, 40 µs of resistance tuning time is required for a network with a scale of 400 × 100 × 10, in which the memristor array between the input layer and the hidden layer and the array between the hidden layer and the output layer are modulated simultaneously. The inference time required to test the 10,000 digits of the MNIST data set is 2000 µs. As a result, the voltage signals of the output neurons are obtained, corresponding to the probability of each of the ten digits (0-9), and the circuit finally chooses the one with the highest probability as the classification result. Figure 8 shows waveforms extracted from the output neurons: in the first 0.2 µs cycle, the maximum voltage magnitude indicates that the winning digit is 7, which is consistent with the correct digit, and the same analysis applies to the rest of the simulations. Overall, the performance of the MBNN based on the differential pair scheme is consistent with the software results, and a recognition accuracy of 91.8% is achieved.
For the proposed MBNN with the two-column memristor reference structure, the simulation time is set to (100 + 2) × 0.2 µs + 10,000 × 0.2 µs = 2020.4 µs, where (100 + 2) is the number of memristor columns that need to be tuned. Figure 9 shows the voltage signals of the output neurons, and the recognition accuracy also reaches 91.8%, again consistent with the software results.
Comparisons are shown in Table 2. The proposed two-column reference scheme uses 42,000 memristors, a reduction of 48.78% compared with the 82,000 memristors of the traditional differential pair scheme, and its writing time of 20.4 µs is 49% less than the 40 µs of the traditional scheme. Both schemes achieve 91.8% recognition accuracy on the MNIST test set: although their schematics differ, both correctly map the same BNN algorithm, which is why they reach the same accuracy.
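The figures in Table 2 follow directly from the array geometry, and the arithmetic can be reproduced in a few lines:

```python
# 400-100-10 network: two crossbar arrays, written one column per 0.2 us,
# with both arrays modulated in parallel so the wider layer sets the time.
T = 0.2                                   # write cycle (microseconds)
layers = [(400, 100), (100, 10)]          # (rows, columns) of each array

# Differential pair: two memristors per synapse, two device columns
# per neuron column.
diff_count = sum(2 * m * n for m, n in layers)        # 82,000 devices
diff_time = 2 * 100 * T                               # 40.0 us

# Two-column reference: one memristor per synapse plus 2 reference
# columns per array.
two_col_count = sum(m * (n + 2) for m, n in layers)   # 42,000 devices
two_col_time = (100 + 2) * T                          # 20.4 us

saving = 1 - two_col_count / diff_count               # ~48.78 %
```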

Non-Ideal Effects on Memristor Crossbar
To speed up the study of non-ideal effects on the memristor crossbar of MBNNs, the simulation is further carried out with a chip-like framework implemented in C++ [18]. We considered various non-ideal effects, such as the yield rate of the memristor array, memristor conductance fluctuation, and reading noise in the memristor crossbar, to evaluate the robustness of the proposed method. Compared with the Verilog-AMS approach, the pure-software simulation is faster and more convenient for analyzing the impact of non-ideal factors.
As shown in Figure 10, the network size used to explore non-ideal factors of the MBNN is set to 400-1000-10, and the recognition accuracy is 97.41% on the MNIST test data set under ideal conditions. This shows that there is no error in the mapping of the BNN onto the memristor array, which also indirectly proves the superiority of the MBNN built in this paper. Figure 10. Schematic diagram of the MBNN structure used to study non-ideal effects.

Yield Rate of Memristor Array
Commonly, an actual memristor array suffers from device failures; for example, the resistance of a memristor can be stuck at a value between the maximum and minimum resistance, regardless of set or reset operations [47]. The ratio of memristors that work well to the total number of memristors is called the yield rate α of the array. For example, if a memristor array has a scale of m × n and its yield rate is α, then the number of normally working memristors is α(m × n) and the number of failing memristors is (1 − α)(m × n); these failing memristors are assumed to be evenly distributed over the array, as shown in Figure 11, where a black dot represents a memristor that cannot work normally and a white one a memristor that works well.

Figure 12 shows the effect of memristor array yield on the MBNNs with the two schemes, and detailed values are given in Table 3. The recognition accuracy of both schemes increases with the yield rate. For the traditional differential pair scheme, when the yield rate is greater than 92%, the recognition accuracy is greater than 97%; even at a yield rate of 90%, the accuracy is still above 96.9%. For the proposed two-column reference scheme, when the yield rate is greater than 94%, the accuracy is greater than 97%; even at 90% yield, it remains above 96.7%. At the same array yield, the recognition accuracy of the traditional differential pair scheme is slightly better than that of the two-column reference scheme, which may be explained by how a failure affects the mapped weight: in the differential pair scheme, when one memristor of a pair fails, the two memristors constituting the synapse can end up in the same state, so the differential current contribution is 0, that is, the weight turns from +1 or −1 into 0.
By contrast, when a memristor fails in the two-column reference structure, the weight of the corresponding synapse can flip to the opposite value. In short, both structures show good robustness with respect to the yield rate of the memristor array.
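The stuck-device model can be emulated as follows; drawing the frozen conductance uniformly between the two extreme states is our own assumption, since [47] only states that a failed device sticks somewhere between them.

```python
import numpy as np

def apply_yield(G_target, alpha, G_min, G_max, rng):
    # A fraction (1 - alpha) of cells ignores set/reset operations and
    # stays frozen at some conductance between the two extreme states;
    # the failing cells are spread uniformly over the array.
    stuck = rng.random(G_target.shape) >= alpha
    frozen = rng.uniform(G_min, G_max, G_target.shape)
    return np.where(stuck, frozen, G_target)

rng = np.random.default_rng(1)
G = np.full((400, 100), 1 / 2e5)            # all cells programmed to HRS
G_faulty = apply_yield(G, alpha=0.9, G_min=1 / 2e5, G_max=1 / 2e3, rng=rng)
frac_stuck = np.mean(G_faulty != G)
# With alpha = 0.9, roughly 10% of the 40,000 cells deviate from target.
```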

Memristor Resistance Fluctuation
Memristor resistance fluctuation refers to the difference between the target conductance state and the actual conductance state of a memristor. It is mainly caused by two factors: (1) owing to limitations of the manufacturing process, the electrical characteristics of individual memristors are not completely consistent, and the conductance fluctuates around the expected value; (2) owing to the working principle of the conductive filament, when the state of a memristor is modulated by a voltage pulse, there is a certain difference between the actual state and the target state [48]. Hence, the actual conductance state may be slightly smaller or larger than the target state. In our study, we assume that the target conductance state of a memristor is G aim and that the actual conductance state obeys a normal distribution with mean G aim and standard deviation σ. Figure 13 is a schematic diagram of memristor fluctuation in the array: the darker a dot, the greater the fluctuation of that memristor; the lighter a dot, the smaller the fluctuation.

Figure 14 shows that the recognition accuracy of both schemes decreases as the memristor fluctuation increases, and detailed values are given in Table 4. For the traditional differential pair scheme, when σ is smaller than 0.2, the recognition accuracy is greater than 97%, and when σ equals 0.5, the accuracy is still close to 90%. For the proposed two-column reference scheme, the recognition accuracy is greater than 90% only when σ is smaller than 0.1. Compared with the traditional structure, the MBNN with the two-column reference structure is therefore less robust to device fluctuation.
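Programming fluctuation can be emulated by drawing each conductance once from the stated normal distribution. Here conductances are normalized to the weight range {−1, +1} so that σ is directly comparable to the values in Table 4; this normalization is a modeling assumption on our part.

```python
import numpy as np

def program_with_fluctuation(W_b, sigma, rng):
    # Programming fluctuation: the realized (normalized) conductance is
    # drawn once from N(G_aim, sigma^2) and then stays fixed for all
    # subsequent reads.
    return rng.normal(loc=W_b.astype(float), scale=sigma)

rng = np.random.default_rng(2)
W_b = rng.choice([-1.0, 1.0], size=(400, 100))
G_real = program_with_fluctuation(W_b, sigma=0.2, rng=rng)
flipped = np.mean(np.sign(G_real) != W_b)
# With sigma = 0.2 a weight flips sign only if its deviation exceeds 1,
# a 5-sigma event, so essentially no weight changes sign outright; the
# accuracy loss comes from the accumulated analog error in each column.
```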

Reading Noise
Reading noise (thermal noise, 1/f noise, random telegraph noise, etc.) causes the current read from a memristor to differ from the true value [49][50][51]. The conductance change between reading cycles due to noise follows a Gaussian distribution with a standard deviation σ around the trained state of the memristor [49]. Figure 15 shows the effect of reading noise on the MBNNs with the two schemes, and detailed values are given in Table 5.
It can be seen that the recognition accuracy of both schemes decreases as the reading noise increases. For the traditional structure, when σ is less than 0.45, the recognition accuracy of the MBNN is greater than 97%, and even when σ equals 0.5, the accuracy is still greater than 96.8%. For the proposed two-column reference scheme, when σ is smaller than 0.45, the recognition accuracy is likewise greater than 97%, and even when σ equals 0.5, it is still greater than 96.08%. At the same level of reading noise, the traditional differential pair structure has slightly higher recognition accuracy than the two-column scheme. Overall, the experimental results show that both schemes are highly robust against reading noise: even when σ reaches 0.5, both still achieve good recognition accuracy. Figure 15. The recognition accuracy of the two schemes; the x-axis represents the level of reading noise.
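The key difference from programming fluctuation is that reading noise is redrawn at every read cycle [49]. A sketch under the same normalized-conductance assumption:

```python
import numpy as np

def read(G_trained, sigma, rng):
    # Reading noise: a fresh Gaussian deviation (standard deviation
    # sigma) from the trained state is added at every read cycle,
    # independently of all previous reads.
    return G_trained + rng.normal(0.0, sigma, np.shape(G_trained))

rng = np.random.default_rng(3)
G = rng.choice([-1.0, 1.0], size=(1000,))
r1, r2 = read(G, 0.25, rng), read(G, 0.25, rng)
# Two reads of the same array differ (unlike one-time fluctuation),
# but each read stays centred on the trained conductance, so the
# column-current errors average out across the 400 inputs of a layer.
assert not np.allclose(r1, r2)
```

This averaging across many inputs is one plausible reason both schemes tolerate reading noise better than programming fluctuation, whose error is fixed per device.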

Experiment Environment and Configuration
In the training stage, the optimized BNN algorithm was implemented using the Theano framework, a Python library. Theano provides a large number of functions commonly used in deep learning and efficiently calculates the gradients of neural network parameters. The analysis of non-ideal factors was implemented in C++. Both were run under the Windows 10 platform. The hardware platform included an NVIDIA GeForce GTX 1050 Ti GPU and an Intel(R) Core(TM) i5-8300H CPU at 2.30 GHz with 16 GB of memory. The computational times of tasks completed under the Windows 10 platform are shown in Tables 6 and 7 for reference.
The hardware implementation was simulated in the Cadence Virtuoso version 17.0 environment under the Linux platform. The server contained two Intel Xeon Platinum 8124M processors at 3.00 GHz with 36 cores, 72 threads, and 440 GB of memory, and Visual Studio Code was used to control the server remotely. The computational times of tasks completed under the Linux platform are shown in Table 8.

Analysis of Non-Ideal Factors
The Traditional Differential Structure

Discussion
The ex situ training method is adopted in this work; that is, the weights trained in software are mapped into the conductances of the crossbar array before the inference stage, whereas in the in situ training method the weight of each memristor device is updated after each iteration of the training process in hardware. One of the major limitations of our work is that the MBNN implementation does not include a training peripheral circuit. However, an advantage of the ex situ method is that optimizing the training algorithm is more convenient, because there is no need to redesign the inference circuit, which would be difficult with the in situ method.
Almost every topic related to Internet of Things (IoT) systems refers to cloud computing and edge computing. The cloud sits at the top level and is responsible for large-scale data processing; in contrast, edge computing processes real-time data near the data source without sending it to the cloud. Good scalability and low energy consumption make memristor circuits a promising solution for edge-computing devices, which means that nanoscale edge-computing systems are candidate applications for the proposed MBNN. There is no denying that BNNs perform worse than analog networks on complex and heavy tasks because of their low-precision weights. Nevertheless, recent research shows that up-to-date memristive devices can achieve more than 64 stable resistive states [52], which means that one of the major bottlenecks of memristive neural networks is being removed. Hence, the proposed weight-mapping method is of great significance for the future hardware implementation of both binary and analog memristive neural networks.

Conclusions
In this work, we proposed a novel MBNN with a two-column reference structure that effectively eliminates the sneak current effect, and we validated it by Cadence circuit simulation. Specifically, we improved the BNN algorithm to make it suitable for memristor array implementation. Compared with the traditional differential pair structure of the MBNN, the proposed two-column reference scheme reduces both the number of memristors and the latency to refresh the memristor array by nearly 50%. To further explore robustness, the memristor yield rate, conductance fluctuation, and reading noise were considered. The experimental results demonstrate that when the array yield is α ≥ 95% or the reading noise is σ ≤ 0.25, a recognition accuracy greater than 97% on the MNIST data set is achieved. The results show that the proposed scheme is robust to memristor array yield and reading noise, and slightly sensitive to memristor conductance fluctuation. To sum up, the proposed scheme provides a promising solution for the on-chip circuit implementation of BNNs. Admittedly, the MBNN designed and implemented in this work has a fully connected structure. In future work, we will explore memristive neural networks with more complex structures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which require further optimization of the function and timing of the peripheral control circuit.

Conflicts of Interest:
The authors declare no conflict of interest.