Minimization of the Line Resistance Impact on Memdiode-Based Simulations of Multilayer Perceptron Arrays Applied to Pattern Recognition

In this paper, we extend the application of the Quasi-Static Memdiode model to the realistic SPICE simulation of memristor-based single (SLPs) and multilayer perceptrons (MLPs) intended for large dataset pattern recognition. By considering ex-situ training and the classification of the hand-written characters of the MNIST database, we evaluate the degradation of the inference accuracy due to the interconnection resistances for MLPs involving up to three hidden neural layers. Two approaches to reduce the impact of the line resistance are considered and implemented in our simulations, they are the inclusion of an iterative calibration algorithm and the partitioning of the synaptic layers into smaller blocks. The obtained results indicate that MLPs are more sensitive to the line resistance effect than SLPs and that partitioning is the most effective way to minimize the impact of high line resistance values.


Introduction
In-memory-computation [1] has been recently proposed as an alternative approach to overcome the inherent bottleneck that limits the performance improvement of traditional Von-Neuman architectures, while also allowing significant energy saving. The key elements to enable the further maturing of this technology are the memory cells, which are required to be nonvolatile (nonvolatile memory, NVM) and to operate at low power [2]. Resistive memories (RRAM) [1] were found to meet these requirements as well as allowing dense memory integration (up to 4F 2 , F being the feature size of the technology node [3]) via architectures such as memristor cross-bar arrays (MCA see Figure 1a). In particular, MCAs are of great interest for the development of hardware-based deep neural networks (DNN, Figure 1b) as they are suitable for implementing the matrix-vector-multiplication (MVM) method necessary to perform operations and propagate signals through the neural layers [2] with reduced power consumption. Such applications have been extensively studied in previous works [4][5][6][7][8] considering various MCA architectures as well as different memristor models. For instance, Li et al. reported in [9] the case of character classification using an MCA-based multilayer perceptron (MLP) of 64 × 54 × 10 neurons with a single layer of hidden neurons.
However, despite these promising studies, the development of in-memory computation is still hindered by the many practical limitations faced by MCAs, such as the line or wire resistances (RL), the limited resistance window of the devices (RON and ROFF) as well as the inherent features associated with the integration of memristors in an MCA such as the so-called sneakpath problem (see Figure 1a). While the former are mainly a consequence of the RL increase as the fabrication technology node scales down [8,10] and which in combination with a reduced resistance window or low RON causes a significant voltage drop across the MCA lines, the latter refers to the non-negligible current flowing through the unselected devices. This causes errors in the read and write processes [10]. Although hardware-based techniques were proposed to address these challenges, they are in general both time, power and cost demanding [9]. Instead, software solutions [4][5][6][7][8][10][11][12][13][14] allow a more systematic study and thus different approaches have been proposed. Among them, SPICE simulation appears to be the most suitable approach as it allows studying the full system, i.e., the MCA and the control electronics necessary to operate the network. However, this approach is normally constrained to the limitations of the memristor model considered and to the size of the memristor-based MLP given the high computational requirements [15,16].
In this regard, the results presented by Aguirre et al. in [17] represent a step forward in the realistic circuital modeling of MCA-based single-layer perceptrons (SLP) involving thousands of devices intended for the classifications of large pattern datasets. A key element in that study is the Quasi-Static Memdiode Model (QMM), a memristor model originally proposed by Miranda in [18,19], that provides high simulation accuracy at reduced computational cost. The closed-form expression for the transport equation, i.e., the current-voltage (I-V) curve (continuous and differentiable) and the recursive nature of the state variable computation, makes the QMM suitable for dealing with arbitrary input signals (continuous or discontinuous, differentiable or nondifferentiable). This is a significant advantage when compared to other widely explored memristor models such as the general phenomenological models (Yakopcic [20], TEAM [21], VTEAM [22], Eshraghian [23], etc.) that although capable of successfully fitting experimental data, rely on various internal equations or artificial window functions (commonly used for modeling the SET/RE-SET transitions) in the memory equation (ME, a first order differential equation that links the current flowing through or the voltage applied to the structure with its internal memory state) that can seriously affect the model's convergence [24,25]. Nevertheless, the extension of the results obtained for the SLP test structures to more practical implementations such as MLPs considering the aforementioned line parasitics is still to be addressed. It is worth pointing out that other memristor device nonidealities threaten the performance of MCA DNNs and are currently the focus of intense research: nonlinearity in the I-V characteristics [26], retention failures [27][28][29], nonuniformity [17,30], Device-to-Device (D2D) and Cycle-to-Cycle (C2C) variability are some of the most representative challenges. However, nonlinearity factors well below 10 have been obtained by optimizing the device fabrication process [31,32] and have also been addressed through specific training [33,34] and voltage mapping [26] methodologies. Additionally, the use of devices with a higher ROFF/RON ratio has been shown to reduce the impact of the D2D variability [17]. Moreover, for the specific case of on-line training, nonlinear weight update [35][36][37] is another relevant source of inaccuracy. In this regard, it has been shown that activation function engineering and threshold weight update schemes effectively suppress training noise [36]. Particularly, the write-verify approach, as the one described in [17,38], allows to mitigate the impact of this effect while also providing robustness against C2C and D2D variability [39]. Line resistance (RL) is another nonideal factor that worsens as the technology scales down [8,10]. Therefore, the realistic simulation and optimization of DNNs considering the impact of line resistance is of utmost importance to enable robust implementation of neuromorphic circuits independently of the technology node and RRAM device optimization.
In this paper, we demonstrate the applicability of the QMM to SPICE simulations of MCA-based MLPs and evaluate the inference accuracy degradation as a function of RL. Ex-situ training is considered and the classification of the grayscale images from the MNIST dataset [40] is assumed for benchmarking purposes. The simulation workflow presented in [17] was modified so as to account for multiple synaptic layers and hidden neural layers. In order to minimize the impact of RL, two approaches were evaluated, they are the divisions of each synaptic layer into smaller partitions and the inclusion of a calibration procedure that compensates the effects associated with RL. The rest of this paper is organized as follows: the fundamentals (I-V and ME characteristics) of the QMM are presented in Section 2. Section 3 explains the MCA-based MLP training and simulation procedures, including the MCA partitioning and RL-dependent calibration. In Section 4, the obtained simulation results in terms of the aforementioned features are discussed. Finally, in Section 5, the general conclusions of this paper are presented. To the best of the authors knowledge, the study of MLPs including the parasitic effects by means of SPICE simulations and considering a realistic memristor model has not been published before.

Quasi-Static Memdiode Model
The resistive switching (RS) mechanism is the fundamental phenomenon behind RRAM devices. In the particular cases of CBRAMs and OxRAMs, RS relies on the displacement of metal ions/oxygen vacancies within the dielectric film in a metal-insulatormetal (MIM) structure originated from the application of an external electrical stimulus, current or voltage [41][42][43][44]. Such migration of ions causes the alternate completion and destruction of a conductive filament (CF) spanning across the insulating film. For a ruptured CF, the device is in the high resistance state (HRS), often characterized by an exponential I-V relationship, while the completion of the CF leads to the low resistance state (LRS), which often exhibits a linear I-V curve [45,46]. In between these two extreme situations, the modulation of the CF transport properties renders intermediate states by voltage-controlled redox reactions. From the modeling viewpoint, the compact model originally proposed by Miranda in [18] and later extended by Patterson et al. in [19] is able to describe the major and minor I-V loops and the gradual transitions in bipolar resistive switches. This is accomplished, as shown in the inset of Figure 2a, by considering a nonlinear transport equation based on two identical opposite-biased diodes in series with a resistor. The I-V relationship resembles a diode with memory and that is why this device was termed memdiode. Notice that the antiparallel connected diodes allow the bidirec-tional current flow through the memdiode device, as for both positive and negative polarities there will be a forward biased diode. For the sake of completeness, the QMM is succinctly reviewed in the next paragraphs.  (4)). Ω is the space of feasible states S. The red thick faded line superimposed to the hysteron model indicates the trajectory of the state variable λ inside Ω from an initial S1 to a final S2 state. The inset in the right shows the equivalent circuit model for the current equation (Equation (1)) including the series resistance. The diodes are driven by the memory state of the device and one diode is activated at a time. Typical I-V characteristic for a memdiode obtained via simulation of the proposed model are superimposed. Current evolution is indicated by the blue arrows. (b) I-V characteristics of the memdiode showing the exponential (HRS) to lineal (LRS) transition by varying λ. The red shaded region indicates the possible voltages applied to the device as the read margin reduces. IHRS and ILRS currents are pinpointed at nominal Vread with the grey and white circle markers, respectively. Overestimation of IHRS may occur when considering a linear model for the HRS regime and lower effective Vread voltages as indicated by the cyan, blue and black ball markers. (c) Experimental I-V loops of different materials reported in the literature fitted with the QMM model: Al2O3 [47] and TaOX [46]. Physically, the memdiode is associated with a potential barrier that controls the electron flow in the CF. The conduction properties of this nonlinear device change according to the variation of this barrier. Due to the uncertainty in the area of the CF, instead of the potential barrier height, the diode current amplitude is used as the reference variable. Following Chua's memristive device theory, the proposed model comprises two equations, one for the electron transport and a second equation for the memory state of the device (ME), which is controlled by a hysteresis operator. The equation for the I-V characteristic of a memdiode is given by the expression: where I0(λ) = Imin(1 − λ) + Imaxλ is the diode current amplitude, α a fitting constant, and R a series resistance. Equation (1) is the solution of a diode with series resistance and W is the Lambert function. Imin and Imax are the minimum and maximum values of the current amplitude, respectively. abs(V) is the absolute value of the applied bias and sgn() the sign function. As I0 increases in Equation (1), the I-V curve changes its shape from exponential to linear through a continuum of states as experimentally observed for this kind of device. λ is a control parameter that runs between 0 (HRS) and 1 (LRS) and is given by the recursive operator (Equation (2)): where min() and max() are the minimum and maximum functions, respectively, and ⃖� is the voltage a timestep before V. The positive and negative ridge functions in Equation (2), Γ + (V) and Γ − (V) represent the transitions from HRS to LRS (SET) and vice versa (RESET) and can be physically linked to the completion and destruction of the CF [45,46], respectively. They are defined by Equations (3) and (4) where η + and η − are the transition rates and V + and V − the threshold voltages for SET and RESET, respectively. λ(V) defines the so-called logistic hysteron or memory map of the device and keeps track of the history of the device as a function of the applied voltage (see Figure 2a). λ calculated from Equation (2) (1) and (2) results in a I-V loop such as that superimposed to the hysteron loop in Figure 2a, which starts in HRS (λ = 0) and evolves as indicated by the blue arrows. The name quasi-static comes from the fact that the characteristic time of the ions/vacancies responsible of the switching phenomenon is assumed to be infinite for a state within the hysteron structure. This implies that for a state located inside the hysteron loop no changes occur in the conduction characteristics, unless it reaches the ridge functions Γ + (V) or Γ − (V). The QMM can be transformed into a dynamic model by incorporating the time module described in [19]. Figure 2b shows the HRS (exponential) to LRS (linear) transition, altogether with some intermediate states (solid blue lines). Note that the memdiode model can successfully describe both HRS and LRS curves by solely changing a single parameter in the transport equation. As λ is swept from 10 −7 to 1, I0 in Equation (1) varies between and , causing the I-V curve to gradually change its shape from linear-exponential (HRS regime) to linear (LRS regime). This is a consequence of the potential drop in the series resistance which linearizes the transport equation. In a neuromorphic application such as the one discussed in this paper, the intermediate conductance states are achieved by means of a Write-Verify iterative loop approach. In such method, pulses of incremental amplitude are applied to the devices (Write) until the required conductance is reached (Verify) [17,38]. If the target conductance is exceeded, then increasing pulses with the opposite polarity are applied in a similar fashion to gradually reduce the conductance value (within an error margin). This writing methodology implies a transition as the one depicted in Figure 2a by the red-thick faded line, where the incremental pulses cause the system to evolve from the initial state S1 up to the final state S2 following Γ + . If the conductance target is exceeded, then the system moves down along Γ − by the application of voltage pulses with the appropriate polarity. Another relevant feature of the proposed model is that it can be described by a simple SPICE script as shown in [17]. Finally, the accuracy of the model is reported in Figure 2c by fitting experimental data extracted from different published works. In particular, results obtained for Al2O3 [47] and TaOx [46] structures at room temperature under DC voltage sweeps are presented. In summary, the proposed QMM not only provides a simple SPICE-compatible implementation for the resistive memory devices but also a versatile one, as it can accurately fit the major and minor I-V loops measured in a wide variety of RRAM devices

MCA-Based MLP Modeling and RL Calibration
Based on the procedure previously reported in [17] to create and simulate realistic circuital MCA-based SLPs intended for large dataset pattern recognition tasks, a novel procedure is derived here to account for a more practical case such as the MLP. For simplicity, ex-situ (off-line) supervised learning will remain as the training method of choice. To evaluate the MLP performance, the recognition of patterns from the MNIST [40] database (see Figure 3a,b) will be considered. Besides the extension to MLP classifiers, this modified workflow also involves an iterative calibration algorithm intended to minimize the RL-induced degradation of the inference accuracy. The chart depicted in Figure 3c summarizes the workflow. The tasks can be split into three parts: the first one comprises a set of MATLAB subroutines for creating, training, and writing the SPICE netlist for an ideal feed-forward MLP. The second part creates an idealized fully linear model of the MCA-based artificial neural networks (ANNs) in Python to calibrate the synaptic weights obtained during the training to account for the parasitic line resistances (the details can be seen in Figure 3d). Last but not least, the third part relates to the SPICE simulation of the proposed circuit during the inference phase.

Simulation Flow
Regarding the MATLAB-implemented part of the procedure, the first step consists in creating the image (n × n pixels) database. This includes rescaling each image of the original database (item (1) in the flowchart shown in Figure 3c). The MNIST (Modified National Institute of Standards and Technology) is a large database of handwritten digits from 0 to 9 commonly used for training and testing image processing systems including ANNs in the field of machine learning. This database contains 60,000 training images and 10,000 testing images, both in grayscale and with a 28 × 28 pixels resolution [40]. A few examples of these images can be seen in Figure 3a where the x and y axes stand for the pixel index. Pixel brightness is codified into 256 gray levels between 0 (fully OFF, black) and 1 (fully ON, white). Resizing to different resolutions can be seen in Figure 3b.
Then, a software-based SLP or MLP with n 2 inputs, 10 outputs and a number N of hidden neural layers (each of them comprising mi neurons) is created (2) and trained (3) using the previously rescaled database of training images (4). The MLP (or SLP) is ex-situ trained considering the scaled conjugate gradient (SCG) [48] as the training algorithm, as proposed in [17]. Further details concerning the training function are beyond the scope of this work, as we focus on the MCA-based implementation of the MLP. This produces N + 1 weight matrices ∈ ℝ, with k ∈ {1,2,…,N + 1} (5) (for instance for two hidden layers with m1 and m2 neurons each, three weight matrices 1 , 2 and 3 are obtained, with sizes n 2 × m1, m1 × m2 and m2 × 10, respectively). To allow rendering both the positive and negative elements of with the always positive conductance of the MCA, each synaptic weight is implemented using two memdiodes as suggested in [49][50][51] resulting in two MCAs per synaptic layer. Thereby, each matrix is split into two matrices + and − as: each of them containing only positive weights, so that = + − − . In the next step, the conductance matrices + and − ( (6) and (7)) to be mapped onto the MCAs are calculated by the linear transformation [52]:  (8), adding the parasitic wire resistance, connection scheme, and control logic necessary to perform the inference phase. As reported in [17], a single MCA is not efficient for implementing large matrices. Given that both RL and RON/ROFF are normally defined by the selected fabrication node and RS mechanism, respectively, a widely accepted [51,53] alternative design consists of dividing the large matrices into smaller partitions, whose reduced size improves the voltage effectively delivered to the memristive cell. Figure 4a shows the simplified circuit schematic of the partitioned MCA and the interconnections required to realize the complete matrixvector multiplication (MVM) in the 1st synaptic layer. Exploding the integrability of the MCA with CMOS circuitry, vertical interconnects used to connect the outputs of the vertical MCA partitions may be placed under the partitioned structure, as well as the analogue sensing electronics, allowing the partitioned MCA to maintain a similar area consumption than the original nonpartitioned case [51]. The vertical interconnects are grounded through the sensing circuit to absorb the currents within the same vertical wire. As in this work we focus on the artificial synapses modeling using the memdiode model, hidden neurons in the k th hidden neural layer connecting the two adjacent layers of synapses k − 1 and k + 1 are implemented in terms of a behavioral SPICE model. The model for each neuron involves a trans-impedance amplifier (TIA) that translates the output current in the associated bitline on the i − 1 synaptic layer to a voltage which is fed to a nonlinear activation function and then propagated to the corresponding wordline in the i + 1 synaptic layer. In this paper, we consider a log-sigmoidal (1/(1 + e −x )) activation function, though a tan-sigmoidal activation function could be used as well. The input stimulus in each synaptic layer is delivered following a dual side connection (DSC) scheme, as shown in the simplified equivalent circuit in Figure 4a. Despite the increased peripheral circuitry complexity, this scheme improves the voltage delivery to each synapse [10] by connecting the two wordline terminals to the same input stimuli. The input stimuli of the 1st synaptic layer is obtained by unrolling each of the rescaled grayscale n × n images of the test database (9) into an equivalent n 2 × 1 vector and scaling it by a voltage . is chosen such as to prevent altering the memdiode states during the inference simulation. In this way, during the inference process each of the test images is presented to the MLP as a vector of analogue voltages in the range [0, Vread]. Once the circuit netlist has been generated, it is passed to the HSPICE simulator (10) which evaluates the voltage and current distributions in the MCA-based MLP circuit while it processes and classifies the input images (11) and then passes the resulting waveforms back to the MATLAB routine for metrics extraction (12).

Calibration Tool Workflow
In an ideal case scenario, that is with RL negligible, the output current for a column (or bitline) in a MCA of the ℎ synaptic layer of a MLP or SLP is given by Equation (8), where the , +(−) elements are the junction conductances along one bitline and , the wordline voltages. Note that the voltage applied to each artificial synapse (conductances) is independent of the device location within the MCA provided that no voltage drops occur in the interconnection lines. Instead, in a real case scenario, the voltage applied across each junction depends on the device location in the MCA partition as indicated by Equation (9). This happens because a significant IR-drop occurs in the line resistances. Consequently, the voltage applied across each junction is always lower than the applied wordline voltage, and so it is the resulting output current An interesting approach to compensate for the smaller currents was presented by Lee et al. in [13]. In their study, the authors propose to increase the conductance level of each individual memory cell proportionally to the voltage reduction. Let us then consider a To speed-up the iterative calibration process, a parametric fully linear model of the MCA-based MLP was developed. In this scenario each memristor is represented as a resistor of fixed value and the overall MCA model (see Figure 4b) is expressed in terms of a system of coupled equations arising from considering the Current Kirchhoff Law at each junction of the MCA (see Figure 4c). The details of such modeling approach first considered in [10] are included in Appendix A. This method avoids calculating the required values of λ for each memdiode in each iteration, which significantly reduces the calibration time, especially for large networks. The simulation code was implemented in Python, taking advantage of the object oriented programming characteristic of such language. In this context, each MCA in a partitioned multilayer perceptron can be easily created as an instance of a unique class that describes the properties and behavior of a single MCA.
The details of the iterative calibration process (block (5) in the flowchart of Figure 3c) are illustrated in Figure 3d and Algorithm 1. First, the synaptic weight matrices , delivered from the training process (block (4) in the flowchart of Figure 3c) are mapped onto each MCA of the complete MLP (block (C2) in the flowchart of Figure 3d). The input stimuli feed to each MCA during calibration consists of a vector of analogue voltages obtained from averaging the brightness of each pixel from the images of the training set (C3). By solving the system of coupled equations, the effective voltage delivered to each memristor is calculated and used to compute the required calibration factor  (6) and (7) in the flowchart of Figure 3c).

Simulation Results and Discussion
The line resistance between adjacent cells can be calculated as RL = ρ ·L/(W·T), where L and W are the wire length between adjacent cells and wire width, respectively. For simplicity, they are taken equal to the feature size F. T is the metal thickness which is assumed >10 nm. In this context, RL ranges from 1 to 10 Ω, as the resistivity of conventional metal wires (ρ) ranges from 10 −8 to 10 −7 Ω·m. Thereby, RL can be estimated to be ≈4.53, 2.97, and 1.55 Ω for the 16,22, and 32 nm technology nodes, respectively [13]. However, in Cu-wires there is a non-negligible size-dependent resistivity for technology nodes below the 10 nm limit, caused by the surface and grain boundary scattering as the mean free path of electrons becomes comparable to the wire dimensions. According to the Fuchs-Sondheimer (FS) and the Mayadas-Shatzkes (MS) models [53], RL for highly scaled nodes can be as large as ≈100 kΩ. Considering the QMM model, the influence of the line resistance is evaluated in the following Sections 4.1 and 4.2, assuming a different number of hidden layers and two alternative approaches to minimize the parasitic voltage drop, respectively.

Influence of the Number of Hidden Layers
Unlike the case of SLP, where the network size (in terms of the number of devices) is fixed by the pattern features and possible output classes, in case of MLP, the introduction of hidden neural layers results in multiple possible networks for the classification of a given pattern dataset [54]. In this regard, it is known that as the number of hidden layers increases, so does the overall network accuracy. Nevertheless, when considering a realistic memristor-based implementation as done in this paper, there is a degradation of the signals propagated across each synaptic layer due to the line resistance in combination with the sneakpath effect. Consequently, the hidden neurons are prone to propagate erroneous signals, thus threatening the accuracy of the MLP. To shed light on this issue, five MLPs comprising different numbers of hidden layers and neurons per layer were simulated. The obtained results are summarized in Table 1, considering for all cases the MNIST images resized to 8 × 8 px, dual-side-connection, no partitioning of the MCA used for each synaptic layer, Vread = 300 mV and RL swept from 100 mΩ to 1 kΩ. The synaptic connections are modeled with the QMM SPICE subcircuit described in [17] and considering the following set of parameters: Imin = 85 nA, Imax = 52 µA, αmin = 4.5, αmax = 2.5, Rmin = Rmin = 110 Ω, and β = 0.5. This combination of values renders (at the nominal Vread) resistances ROFF ≈ 577 kΩ and RON ≈ 7.5 kΩ (approx. an ROFF/RON ratio in the order of 100). The simulation results are graphically reported in Figure 5, where the inference accuracy as function of RL is shown normalized against the inference accuracy for RL→0 Ω. A central point to highlight here is the notorious increase of the MLP sensitivity to RL when compared against the reference SLP, regardless of the number of hidden layers and neurons per layer. This could be explained by taking into account the larger size of the synaptic layers involved for the MLPs cases (the MCAs used for the SLP has a maximum size of 64 × 10 while for the MLP-#a it increases up to 64 × 54). The use of larger MCAs with no partitions to implement the synaptic connections degrades the effective voltage delivered to the synapses located away from the driving ports of the MCA. The ratio between the effective voltage delivered to each synapse and the nominal applied voltage is known as the read margin, and it has been shown in [17] that for a given value of RL, it decreases as the size of the MCA grows, directly degrading the inference accuracy. This interpretation is also supported by the results obtained for the set of MLPs named MLP-#b. For these simulations, the RL sensitivity further increases as it could be expected given the bigger size of the largest MCA involved in the network (64 × 100 for the set MLP-#b against 64 × 54 for MLP-#a). It is also worth noting that both the MLP-#a and MLP-#b sets follow unique decreasing trends with RL regardless of the number of layers. Thereby the increase in the number of hidden layers does not significantly compromise the RL sensitivity but allows a non-negligible increase in the inference accuracy, as shown in the inset of Figure 5. Instead, the number of neurons per layer causes a sensible increase of the inference accuracy degradation caused by RL, as it implies changes in the MCAs used to implement the MLP.  Table I for details) are considered as well as a SLP for comparison purposes. The inset in the lower left shows the inference accuracy at RL→0 Ω for the different cases considered. Note that the RL dependency of the inference accuracy is determined by the size of the largest MLP layer, and it presents a very shallow dependence on the number of hidden layers. In fact, the increase in the number of hidden layers allows boosting the inference accuracy as RL decreases without compromising the MLP sensitivity to RL variations.

Techniques to Minimize the Impact of the Line Resistance (RL)
As mentioned in the previous subsection, the voltage drop occurring across the parasitic line resistances imposes a serious limitation to the number of neurons that can be included in each neural layer without causing a major reduction of the inference accuracy. Methods to minimize this problem are thereby mandatory to allow rendering MLPs capable of dealing with large input patterns. For instance, in [55], Truong et al. proposed a circuit to compensate the voltage drop across the interconnections. Although capable of improving the inference metrics, the proposed method implies a significant circuit overhead and might be not suitable for networks involving a large number of neurons. Therefore, the search for alternative solutions requiring lesser additional circuitry is encouraged. Two of them were discussed in Sections 3.1 and 3.2, namely the MCA partitioning and the iterative calibration of the synaptic weights. Although tested in [13,17], their applicability in MLP has not yet been addressed considering realistic electrical simulations. Thus, in this section the capability of such techniques to mitigate the line resistance impact on MLP is studied based on the framework defined in Section 3.1 and using the same values for the QMM as in Section 4.1. Only one hidden layer is considered as it was shown in Section 4.1 that the number of layers does not significantly alter the RL dependency.
Let us first consider the nonpartitioned (NP = 1), uncalibrated cases (blue empty markers). As it can be seen in Figure 6, in all cases (MLP and SLP for 8 × 8 px. and 14 × 14 px. images) the inference accuracy approaches the ideal case as RL tends to zero. Nonetheless, when considering the 14 × 14 px. images (Figure 6b,d) a higher accuracy degradation is observed, as expected for the use of a larger MCA as the first synaptic layer. This can be seen as a left-shift of the accuracy vs. RL curves when the image size is increased and occurs both for the SLP (see the displacement of the trend from Figure 6a,b) and the MLP (Figure 6c,d) cases. Note that for the 14 × 14 px. images there is a significant accuracy loss even for low values of the line resistance (see Figure 6d for instance, where a value of RL of approx. 5 Ω obtained for a feature size of 16 nm causes the inference accuracy to drop from approx. 96% to 73%). It is also worth mentioning that the steeper decrease of the inference accuracy vs. RL observed in MLPs vs. SLP trained to classify the 8 × 8 px. images in Figure 5 is also present for the 14 × 14 px. images (see Figure 6b,d). To improve the metrics discussed in the previous paragraph, the post-training iterative calibration of the synaptic weights is first performed on the nonpartitioned SLP (filled blue lines). This process has two different outcomes: on one hand when considering the SLP case, a clear improvement of up to approx. 30% for the 8 × 8 px images (Figure 6a) can be observed for RL values approaching 100 Ω in highly scaled fabrication nodes [53]. Beyond this limit, the capability of the calibration method to reduce the voltage drop in the interconnections is not enough and thereby the accuracy improvement becomes smaller. A very similar behavior is shown in Figure 6b for the 14 × 14 px images. However, given the larger size of the MCAs involved, the improvement is smaller (not bigger than 20%). As the target calibration factor passed to the calibration routine is defined by the user, in this paper we performed an iterative loop to automatically determine the calibration factor that allows maximizing the inference accuracy. The resulting factors are plotted against the inference accuracy in the inset of Figure 6a for the 8 × 8 px. images. Note that for low RL values, the calibration factor plays no role as no calibration is indeed required (the parasitic voltage drop due to the line resistance is negligible). Then the factor is tightened and progressively relaxed as the line resistance increases, as if the calibration factor is too exigent the calibration cannot yield a real accuracy improvement.
When addressing the case of the MLPs with different sizes, it was found that the calibration process produces a marginal improvement, resulting in identical inference vs. RL trends as in the noncalibrated cases (and thereby not plotted in Figure 6c,d as they would coincide with the noncalibrated trends). Instead, the use of partitioned schemes for the realization of the complete MCA-based synaptic layers is shown to be efficient both for SLPs and MLPs. For instance, when the 64 × 10 SLP shown in Figure 6a is partitioned into four blocks of 16 × 10 the inference accuracy notably increases (note the empty red markers). The same effect is observed for the 196 × 10 MCA (partitioned into four blocks of 49 × 10) from which the results presented in Figure 6b were extracted. Furthermore, the inference accuracy of the partitioned SLP can also be improved by using the calibration algorithm (filled red markers in Figure 6a,b). For the MLP, the enhancement achieved with the partitioning is seen as a right shift in the accuracy vs. RL trends. Note that in these cases, the first layer in the 64 × 54 × 10 MLP (Figure 6c) was implemented with 12 blocks of 16 × 18 and the second layer with three blocks of 18 × 10 and for the 196 × 20 × 10 MLP (Figure 6d), the first layer was implemented using four partitions of 49 × 20 and the second layer was not partitioned (20 × 10).

Conclusions
In this paper we extended the use of the Quasi-static Memdiode Model (QMM) previously proven for single-layer perceptrons (SLPs) to the SPICE modeling and simulation of multilayer perceptrons (MLPs) intended for large dataset pattern recognition. The versatility and reduced computational cost of this model allow performing electrical simulations without losing accuracy. The inference performance was tested considering the MNIST dataset of grey-scale handwritten digits, rescaled to different resolutions to test MLPs of different sizes. Two aspects were analyzed: the impact of the MLP structure (number of layers and neurons per layer) on the inference accuracy and alternative techniques to mitigate the impact of the line resistance. Concerning the first point, it was found that the number of hidden layers does not cause major variations in the line resistance dependence of the inference accuracy. Instead, it is the size of the largest synaptic layer what acts as a bottleneck, severely limiting the overall accuracy. Thereby the addition of memristive-based synaptic layers helps improving the accuracy without inducing further RL-related degradation. Concerning the second point, the use of partitioned schemes was shown to provide the best performance results both in SLP and MLP when compared to the calibration technique. In fact, the calibration technique resulted in no gain in terms of accuracy when applied to MLP networks. This should be taken into account by circuit designers.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Each MCA is described using the equivalent circuit schematic represented in Figure  4b, by considering the 2nd Kirchhoff's law on the terminals of each memristive device (see Figure 4c), which will have the form of any of the Equations (A1)-(A6) depending on the device location within the MCA. GL is the line conductance (1/RL), , = , are the wordline and bitline access conductances (resistance in the BL/WL terminals) ( , =1/ , and , =1/ , ), , are the applied voltages in the WL terminals, corresponding to the MNIST images and , are grounded through a sensing resistor. Node voltages in the WLs are indicated as , and, in the same way, , refers to node voltages in the BLs. Six different equations arise as they account for the elements located at the BL/WL terminals (Equations (A2), (A3), (A5) and A6) or somewhere in between (Equations (A1) and (A4)). This results in a system of 2mn coupled equations, with 2mn unknown voltages corresponding to the WL ( V WL = �V 1,1 WL , V 1,2 WL , … , V 1,n WL , V 2,1 WL , … , V n,m WL � T ) and BL ( V BL = �V 1,1 BL , V 1,2 BL , … , V 1,n BL , V 2,1 BL , … , V n,m BL � T ) voltages. By defining the column vectors EWL and EBL as �G 1,in WL V 1,in WL , 0, … , G 2,in WL V 2,in WL , 0, … , G m,in WL V m,in WL � and �G 1,in BL V 1,in BL , 0, … , G 2,in BL V 2,in BL , 0, … , G n,in BL V n,in BL � respectively, Equations (A1)-(A6) can be represented following a matrix formulation as in Equation (A7): where all A, B, C, and D matrix are m × n. Further details regarding the structure of these matrices can be found in [10]. Then the output of the m×n MCA is a row vector of 1 × n currents, defined as = , , with 1 ≤ ≤ , obtained by solving Equation (A7). It should be noted that the system of coupled equations presented in the matrix formulation of Equation (A7) allow representing both the case of the input voltage being applied from one single side or from both sides of WLs.