Bitwise Logical Operations in VCMA-MRAM

Today's technology demands compact, portable, fast, and energy-efficient devices. One approach to energy efficiency is in-memory computing, which addresses the memory bottleneck of present computing systems by using a spintronic device, namely the magnetic tunnel junction (MTJ). Area and energy can be reduced further through approximate computation. We present a circuit design based on the logic-in-memory computing paradigm using voltage-controlled magnetic anisotropy magnetoresistive random access memory (VCMA-MRAM). During computation, multiple bit cells within the memory array are selected in parallel by activating multiple word lines. The designed circuit performs the logic operations NOT, AND-NAND, and OR-NOR, as well as the arithmetic operation SUM (a 1-bit approximate adder with 75% accuracy for SUM and an accurate carry out), with slight modification through control signals. All simulations have been performed at the 45 nm CMOS technology node with a VCMA-MTJ compact model using the HSPICE simulator. Simulations show that, compared to its counterpart 1-bit exact adder, the 1-bit approximate adder saves 52% energy, reduces hardware count by 72%, and reduces delay by 44.3%.


I. INTRODUCTION
In the von Neumann computing architecture, memory and processing units are separated by power-hungry, highly capacitive buses. Performing any task therefore requires transferring data from memory to the processing unit and, after processing, back to memory, which causes problems such as an energy bottleneck, memory access latency, and input-output congestion [1]-[3]. Furthermore, the scaling of CMOS-based devices is becoming challenging due to the associated static and dynamic energy dissipation [4], [5]. One of the most promising ways to alleviate these issues is to compute and store information in the same place using new physical variables, such as the electron spin (spintronics) [6] or the material phase, crystalline or amorphous (phase change memory) [7]. By exploring these novel concepts together with new architectures such as in-memory computing [8]-[12] and neuromorphic computing, we can achieve a more sustainable future for computing and storage technology.
The MTJ is a spintronic device capable of both storing and processing data in the same place. It consists of three layers: an oxide layer, a free layer, and a fixed (reference) layer. The oxide layer is sandwiched between the two ferromagnetic layers, as shown in Fig. 1(a). The magnetization direction of the reference layer is fixed, while that of the free layer can rotate; the magnetization dynamics of the free layer are governed by the Landau-Lifshitz-Gilbert (LLG) equation [6], [13]-[16]. Depending on the relative magnetization directions of the two ferromagnetic layers, the MTJ offers two configurations: if the directions are parallel (antiparallel), the MTJ acts as a low (high) resistance device, as illustrated in Fig. 1(b)-(c). The relative difference between the high and low resistance is expressed by the tunnel magnetoresistance (TMR) ratio. An MTJ stores data (logic low or logic high) in the magnetization of its free layer. The free layer's magnetization can be switched by various schemes, e.g. spin-transfer torque (STT) [6], [13], the spin Hall effect (SHE) [14], voltage-controlled magnetic anisotropy (VCMA) [15], SHE-assisted STT [14], and VCMA-assisted STT [16]. Additionally, approximate computing [17]-[20] is an effective way to reduce energy, area, and delay in error-resilient applications, e.g. voice processing, image processing, data mining, and pattern recognition. In light of the aforementioned challenges, this paper presents a fully non-volatile hybrid MTJ/CMOS logic-in-memory multi-functional circuit that performs the basic logic operations NOT, AND-NAND, and OR-NOR, as well as the arithmetic operation SUM (a 1-bit approximate adder with 75% accuracy for sum and an accurate carry out), with slight modification through control signals. All simulations have been performed at the 45 nm CMOS technology node with a VCMA-MTJ compact model using the HSPICE simulator.
The proposed multi-functional circuit is intended to be the main building block of future MTJ-based processors, in which CMOS transistors will be replaced by hybrid MTJ/CMOS or MTJ-only logic.
The rest of the paper is organized as follows. Section II covers previous state-of-the-art structures and the basics of the VCMA-MRAM bank. Section III describes the working principle of the proposed multi-functional circuit. Simulation results are discussed in Section IV. Finally, Section V concludes the paper.

II. PRIOR WORKS
C. Wang et al. designed a circuit that performs read, AND, OR, sum, and carry operations [21]. Its logic tree consists of eight MTJs and eleven MOSFETs, excluding the writing circuit for the MTJs. W. Kang et al. presented a circuit in [9] that performs the basic logic operations NOT, NAND-AND, and OR-NOR, as well as memory read. Its logic tree needs only six MTJs, excluding the writing circuit.
The schematic diagram of the VCMA-MRAM bank is illustrated in Fig. 2. A conventional bit cell consists of one MTJ in series with one transistor, and an array of bit cells is built with multiple word lines, source lines, and bit lines [22]. The spintronic memory bank contains a bit-cell array, sense amplifier, word-line driver, write driver, row/column decoder, and input/output interfaces. The row and column decoders select a particular bit cell according to its row and column address. Data are written to the MTJs by passing a shaped voltage pulse through the write driver [23]. The shaped voltage pulse changes the magnetization state of the free layer of an MTJ, which is how data are written to the bit cell. In this paper, the circuit is designed by selecting five bit cells from the memory array: three bit cells are connected to one arm of a differential sense amplifier and the other two bit cells are connected to the other arm. The implementation and operation of the other peripheral circuits are not the focus of this paper.

III. PROPOSED MULTI-FUNCTIONAL CIRCUIT
The schematic diagram of the proposed logic-in-memory multi-functional circuit is shown in Fig. 3. The hybrid MTJ-CMOS circuit performs the basic logic operations NOT, AND-NAND, and OR-NOR, as well as the arithmetic operation SUM (a 1-bit approximate adder with 75% accuracy for sum and an accurate carry out), with slight modification through the control signals listed in Table I.
The circuit shown in Fig. 3 consists of two parts: (1) the sense amplifier, an SRAM-based design with two cross-coupled inverters that senses the small voltage difference between its nodes and, in the sensing mode, differentially pulls one node to the full supply voltage and the other to zero; and (2) the logic tree, made up of MTJs that store the input logic values in their magnetization state. The reconfigurable MTJs use the VCMA-assisted STT switching scheme. The multi-functional circuit is a dynamic circuit that works in two phases: (1) the pre-charge phase and (2) the evaluate phase. In the pre-charge phase, when Clk = 0, all the transistors turn ON except transistor N3, and the output nodes are pre-charged to the supply voltage minus the threshold voltage of the PMOS transistor. In the evaluate phase, when Clk = 1, all the transistors turn ON except transistor P2; depending on the input logic values stored in the MTJs, one output node discharges to zero through the lower-resistance path and pulls the other output node to the full supply voltage, VDD. The functionality of the circuit is controlled by the left-arm MTJs. The right-arm MTJs have fixed resistances, R1L and R2L, and both are in the low-resistance state. The circuit performs the two-input AND-NAND operation on input logic values A and B when the control MTJ Ci stores logic zero (high resistance), as shown in Table I; the output nodes OUT L and OUT R then hold the AND and NAND values, respectively. When the control MTJ Ci is kept at logic high, the circuit performs the OR and NOR operations on input logic values A and B, and the output nodes OUT L and OUT R give the OR and NOR values, respectively. The circuit performs a logical NOT operation when MTJ A and MTJ B store the same logic value, which is treated as the input to the NOT gate, irrespective of the value of MTJ Ci. The output node OUT L gives the logical NOT value.
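The three operating modes described above can be summarized at the Boolean level. The following is a minimal sketch, assuming the left-arm/right-arm comparison behaves like a three-input majority vote over A, B, and Ci; this reading is an interpretation of the text, not the authors' stated model. The function names are hypothetical, and the sketch returns a function and its complement without modeling which physical output node carries which value (that mapping follows Table I).

```python
from itertools import product


def majority(a: int, b: int, ci: int) -> int:
    """Return 1 when at least two of the three stored bits are 1.

    Assumption: the sense amplifier decision is modeled as a majority vote.
    """
    return 1 if (a + b + ci) >= 2 else 0


def evaluate(a: int, b: int, ci: int) -> tuple:
    """Return (function, complement) as a differential output pair."""
    f = majority(a, b, ci)
    return f, 1 - f


if __name__ == "__main__":
    print("A B | AND NAND (Ci=0) | OR NOR (Ci=1)")
    for a, b in product((0, 1), repeat=2):
        and_val, nand_val = evaluate(a, b, 0)
        or_val, nor_val = evaluate(a, b, 1)
        print(f"{a} {b} |  {and_val}    {nand_val}         |  {or_val}   {nor_val}")

    # NOT mode: A and B hold the same value, so Ci cannot change the vote.
    for a, ci in product((0, 1), repeat=2):
        _, not_a = evaluate(a, a, ci)
        print(f"A=B={a}, Ci={ci} -> NOT(A)={not_a}")
```

Under this assumption, Ci = 0 removes one vote so both A and B must be 1 (AND), Ci = 1 supplies one vote so either input suffices (OR), and equal inputs decide the vote by themselves (NOT on the complementary output).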
Moreover, the circuit performs the arithmetic SUM operation. MTJ A and MTJ B hold the one-bit inputs and MTJ Ci holds the carry-in; the operation is performed on these inputs, and the logical SUM (Asum) with 75% accuracy and the carry output (Cout) with 100% accuracy, as per the truth table in Table II, are available at output nodes OUT L and OUT R, respectively. To illustrate, suppose that MTJ A stores logic 1, MTJ B stores logic 0, and the carry-in MTJ Ci stores logic 1. The total resistance of the left arm is then 11.40 kΩ and that of the right arm is 13.25 kΩ, so output node OUT L discharges faster than output node OUT R. Hence OUT L settles at zero voltage while OUT R swings to the full supply voltage.
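The worked example above can be reproduced with an idealized resistance model. The sketch below rests on several assumptions that are not taken from the paper: each arm is treated as a pure parallel combination of MTJs with transistor resistances ignored, the values `R_P_KOHM` and `R_AP_KOHM` are hypothetical numbers picked only so that the idealized arms land near the quoted 11.40 kΩ and 13.25 kΩ, and the approximate sum is modeled as the complement of the carry out. All function names are illustrative.

```python
from itertools import product

# Hypothetical MTJ resistances (not from the compact model), in kilo-ohms.
R_P_KOHM = 26.5   # parallel state, stores logic 1 (low resistance)
R_AP_KOHM = 81.6  # antiparallel state, stores logic 0 (high resistance)


def parallel(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def left_arm(a, b, ci):
    """Left arm: three MTJs (A, B, Ci) in parallel."""
    return parallel([R_P_KOHM if bit else R_AP_KOHM for bit in (a, b, ci)])


def right_arm():
    """Right arm: two fixed MTJs, both in the low-resistance state."""
    return parallel([R_P_KOHM, R_P_KOHM])


def approximate_adder(a, b, ci):
    """Return (asum, cout); assumes asum is the complement of cout."""
    cout = 1 if left_arm(a, b, ci) < right_arm() else 0
    asum = 1 - cout
    return asum, cout


def exact_adder(a, b, ci):
    """Reference 1-bit full adder, returns (sum, cout)."""
    total = a + b + ci
    return total % 2, total // 2


def accuracy():
    """Fraction of the 8 input rows where asum and cout match the exact adder."""
    rows = list(product((0, 1), repeat=3))
    sum_hits = sum(approximate_adder(*r)[0] == exact_adder(*r)[0] for r in rows)
    cout_hits = sum(approximate_adder(*r)[1] == exact_adder(*r)[1] for r in rows)
    return sum_hits / len(rows), cout_hits / len(rows)


if __name__ == "__main__":
    print(f"left arm  (A=1, B=0, Ci=1): {left_arm(1, 0, 1):.2f} kOhm")
    print(f"right arm (fixed)         : {right_arm():.2f} kOhm")
    print("accuracy (sum, cout):", accuracy())
```

With these assumptions, the sum is wrong only for inputs 000 and 111, which gives 6 correct rows out of 8 and is consistent with the stated 75% accuracy, while the carry out matches the exact adder on every row. Note that the idealized comparison resolves all eight rows correctly only when the antiparallel resistance exceeds twice the parallel resistance.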

IV. SIMULATIONS AND DISCUSSION
Simulations have been performed at the 45 nm CMOS technology node using the predictive technology model (PTM) and the voltage-controlled magnetic anisotropy (VCMA) assisted spin-transfer torque (STT) switching mechanism, with a VCMA coefficient of 105 fJ·V⁻¹·m⁻¹, using the VCMA-MTJ compact model [15] and the HSPICE simulator. The simulations validate the functionality of the circuit and compare its performance against the counterpart exact adder (EA) [24]. For a fair comparison, the comparison circuit is simulated with the same MTJ and CMOS models; the results are given in Table III.
The transient response of the NOT operation is shown in Fig. 4, that of the AND, NAND, OR, and NOR operations in Fig. 5, and that of the arithmetic operation SUM (approximate adder) in Fig. 6, for all possible input combinations. A clock pulse with a peak-to-peak amplitude of 0.9 V, a period of 8 ns, and a duty cycle of 50% is applied for all the operations. When CLK = 0, the reconfigurable MTJs are in the writing phase. The MTJs store logic values through a shaped voltage pulse Vpulse: an amplitude of 0.8 V for VCMA switching, followed by a lower voltage for STT switching of +0.55 V for a logic high value or -0.55 V for a logic low value, as shown in Fig. 4. When CLK = 1, the circuit is in the evaluation phase, so it gives the logical NOT value, NOT (A), of the applied input A as shown in Fig. 4, and the logical AND-NAND and OR-NOR values of inputs A and B as shown in Fig. 5. The input-output waveform of the approximate adder (SUM) is illustrated in Fig. 6 and follows the truth table in Table II.
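The clocking and write-pulse levels described above can be captured in a small helper. This is a sketch only: the voltage levels, clock period, and duty cycle are taken from the text, while the assumption that each clock period starts with its low half, the omission of the durations of the two pulse steps (which are not stated here), and all names are hypothetical.

```python
CLK_PERIOD_NS = 8.0  # clock period from the text
CLK_DUTY = 0.5       # 50% duty cycle from the text
V_VCMA = 0.8         # first pulse step, VCMA switching (volts)
V_STT = 0.55         # magnitude of second pulse step, STT switching (volts)


def clock_phase(t_ns: float) -> str:
    """Return the circuit phase at time t_ns.

    Assumption: every period begins with CLK = 0 (write / pre-charge),
    followed by CLK = 1 (evaluate).
    """
    t = t_ns % CLK_PERIOD_NS
    low_time = CLK_PERIOD_NS * (1.0 - CLK_DUTY)
    return "write" if t < low_time else "evaluate"


def write_pulse(bit: int) -> tuple:
    """Return the two voltage levels (VCMA step, STT step) used to store bit."""
    return (V_VCMA, V_STT if bit == 1 else -V_STT)


if __name__ == "__main__":
    for t in (1.0, 5.0, 9.0):
        print(f"t = {t} ns -> {clock_phase(t)}")
    print("store 1:", write_pulse(1))
    print("store 0:", write_pulse(0))
```

The sign of the second step alone selects the stored value; the first step is the same for both logic values.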
The performance metrics of all the logical operations are tabulated in Table III, which also compares the approximate adder with its counterpart exact adder [24]. The approximate adder (APXA) uses one sense amplifier for both Asum and Cout, whereas the exact adder (EA) uses two sense amplifiers, one for the sum and the other for Cout, so the APXA consumes about 52% less energy than the EA. In the APXA, the MTJs are in parallel, so the net resistance is lower than the resistance of an individual MTJ. In the EA, the MTJs are in series, so the net resistance is roughly three times that of an individual MTJ. As a result, the APXA is faster than the EA by 44.3%. The switching energy depends on the number of reconfigurable MTJs and on the operation of the circuit, i.e. the arrangement of the MTJs in the logic tree. The energy-delay product (EDP) of the APXA is 81% lower than that of the EA [24].
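The resistance argument behind the delay advantage can be illustrated with normalized values. This is a minimal sketch assuming three identical MTJs of unit resistance in each arrangement; real arms mix low- and high-resistance states and include transistor on-resistance, which the sketch ignores. The function names are illustrative.

```python
def series(resistances):
    """Equivalent resistance of resistors connected in series."""
    return sum(resistances)


def parallel(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


if __name__ == "__main__":
    r_mtj = 1.0          # normalized resistance of a single MTJ (assumption)
    stack = [r_mtj] * 3  # three identical MTJs (assumption)

    r_series = series(stack)      # exact-adder style: about 3x one MTJ
    r_parallel = parallel(stack)  # approximate-adder style: below one MTJ

    print(f"series   : {r_series:.2f} x R_MTJ")
    print(f"parallel : {r_parallel:.2f} x R_MTJ")
    print(f"ratio    : {r_series / r_parallel:.1f}")
```

To first order, a lower discharge-path resistance shortens the time needed to pull an output node low for a fixed node capacitance, which is the direction of the reported speed-up; the 44.3% figure itself comes from the HSPICE simulation, not from this idealized ratio.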

V. CONCLUSION
This paper presents a logic-in-memory multi-functional circuit that, through its control signals, performs the main logic functions NOT, AND-NAND, and OR-NOR, as well as the arithmetic function SUM (approximate full adder) with 75% accuracy and Cout with 100% accuracy. It requires less hardware than previous works (five MTJs in the logic tree, five transistors in the sense amplifier, and one transistor in the dynamic current source). The approximate adder (APXA) is 44.3% faster and consumes 52% less energy than the exact adder (EA). The fully non-volatile functionality of the proposed circuit lowers the energy consumed on the buses that connect the memory to the processor, eliminates the need for refresh energy, and reduces leakage power to nearly zero.