A Timing-Based Split-Path Sensing Circuit for STT-MRAM

Spin-transfer torque magnetoresistive random access memory (STT-MRAM) has received considerable attention as a candidate for universal memory because it offers a cost comparable to that of a dynamic RAM and speed comparable to that of a static RAM, while solving the scaling issues faced by conventional MRAMs. However, owing to the decrease in supply voltage (VDD) and increase in process variations, STT-MRAMs require an advanced sensing circuit (SC) to ensure a sufficient read yield in deep submicron technology. In this study, we propose a timing-based split-path SC (TSSC) that achieves a greater read yield than a conventional split-path SC (SPSC) by employing a timing-based dynamic reference voltage technique to minimize the effects of threshold voltage mismatch. Monte Carlo simulation results based on industry-compatible 28-nm model parameters reveal that the proposed TSSC obtains a 42% higher read access pass yield than the SPSC at a nominal VDD of 1.0 V under iso-area and iso-power conditions, at the cost of 1.75× the sensing time.


Introduction
To prolong the battery life in lithium-ion battery-powered applications, such as smartphones, wearable devices, and wireless sensor nodes, it is crucial to achieve good performance with low power consumption [1]. Even though a conventional magnetoresistive random access memory (MRAM) has the speed of a static RAM (SRAM) and density of a dynamic RAM (DRAM), it has unique problems, including poor scalability and excessive power consumption owing to large write currents. Therefore, spin-transfer torque MRAM (STT-MRAM) has emerged as the top choice for universal memory applications owing to its short access time, low power consumption, and high density [2]. In addition, STT-MRAM has outstanding scalability, as the critical switching current of the magnetic tunnel junction (MTJ) decreases with device size to overcome the scaling problems faced by conventional memories, such as DRAM, SRAM, and flash memories [3][4][5]. In other words, STT-MRAM can achieve a higher performance than DRAM and smaller cell size than SRAM, with the nonvolatility of flash memory [6][7][8][9].
However, STT-MRAM faces a read yield degradation problem when used in deep submicron technologies because of the large process variations in low supply voltage (V DD ) and small resistance difference between the low resistance (state 0) and high resistance (state 1) states of the MTJ [10,11]. The current (∆I 0 or ∆I 1 ) or voltage differences (∆V 0 or ∆V 1 ) generated in a sensing circuit (SC), which are then conveyed to a sense amplifier (SA), can be expressed as: where µ ∆V0,1 and µ SA_OS are the means of ∆V 0,1 and V SA_OS , respectively, and σ ∆V0,1 and σ SA_OS are the standard deviations of ∆V 0,1 and V SA_OS , respectively. RAPY CELL is defined as the minimum value between RAPY CELL0 and RAPY CELL1 .
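A form of the yield metric that is consistent with the symbols defined above can be written as follows. This is a sketch inferred from those definitions, assuming that the sensing margin and the SA offset are independent and normally distributed; it is not quoted from the original expression.

```latex
\begin{align}
\mathrm{RAPY}_{\mathrm{CELL}0,1} &=
  \frac{\mu_{\Delta V_{0,1}} - \mu_{\mathrm{SA\_OS}}}
       {\sqrt{\sigma_{\Delta V_{0,1}}^{2} + \sigma_{\mathrm{SA\_OS}}^{2}}} \\
\mathrm{RAPY}_{\mathrm{CELL}} &=
  \min\left(\mathrm{RAPY}_{\mathrm{CELL}0},\ \mathrm{RAPY}_{\mathrm{CELL}1}\right)
\end{align}
```

Under this form, the yield improves either by enlarging the mean margin or by shrinking its spread, which are the two quantities targeted by the proposed circuit.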
In this study, a novel timing-based split-path sensing circuit (TSSC) that is tolerant to process variations and increases ∆V 0,1 is proposed and compared with various SCs with respect to RAPY CELL , delay, and power consumption. It improves µ ∆V0,1 using the dynamic reference voltage (DRV) technique, which modifies V ref according to the MTJ state. It also reduces σ ∆V0,1 by compensating for the transistor mismatch and effectively increasing the sensing current. Even though the split-path sensing circuit (SPSC) [10] is considered to have the best read yield among existing SCs, Monte Carlo HSPICE simulation results based on industry-compatible 28-nm model parameters reveal that the proposed TSSC achieves a 42% higher RAPY CELL than the SPSC at a V DD of 1.0 V under iso-area and iso-power conditions. The remainder of this paper is organized as follows: Section 2 describes the operational principles and characteristics of the conventional SCs and the proposed TSSC. Section 3 compares the performance of the proposed SC with that of the conventional SCs. Finally, Section 4 presents the conclusions drawn from our study.

Previous SCs and Proposed TSSC
In this section, the characteristics and operation principles of the existing and proposed SCs are described.

Existing SCs
Three conventional SCs [10,11,13] for STT-MRAM are shown in Figure 1 along with their output voltage distributions. To analyze the voltage distributions, industry-compatible 28-nm model parameter libraries were used. The temperature was set to room temperature (25 °C), and, for comparison under iso-area and iso-power conditions, the layout areas of the SCs were made identical by varying the transistor sizes. In addition, the gate voltage of the clamp NMOS (V CLAMP ) was set individually for each SC such that the sensing current was 20 µA at state 1.
Figure 1a,b illustrate the schematics and output voltage distributions of the source degeneration sensing circuit (SDSC) [11] at states 0 and 1, respectively. In the SDSC, the clamp NMOS (NCD, NCR) is used to generate ∆I, which is then converted to ∆V via a load PMOS (PLD, PLR) using a current mirror; then, ∆V is fed to the SA. In addition, the SDSC places a degeneration PMOS (PDD, PDR) between V DD and the load PMOS for source degeneration. The source degeneration effect reduces the variation in the load PMOS and increases r O_PLD , leading to an improvement in RAPY CELL . The I-V curve shown in Figure 2a represents the relationship between the drain voltage of each MOSFET and the current through each MOSFET in the SDSC. The crossing points of the I-V curves for the load PMOS and clamp NMOS denote the operating points. For example, the crossing points for PLR and NCR, PLD0 and NCD0, and PLD1 and NCD1 are the operating points for V ref , V data0 , and V data1 , respectively. The I-V curve of PLD0 (PLD1) passes through the operating point between PLR and NCR, as V ref denotes the gate voltage of PLD0 (PLD1). The drain voltage distribution of the PLR shown in Figure 1a has a small standard deviation because of the large slope (i.e., small output resistance) of the diode-connected PLR I-V curve. When the output resistance is small, the voltage variation is small as well, as expressed in (2). The PLD has a relatively large standard deviation owing to the small slope of the PLD I-V curve and the current mirror variation. Thus, to obtain a proper sensing margin in the SDSC, the PLD variation must be reduced. The SDSC can obtain an adequate read yield in 65-nm process technology because the degeneration effect reduces the process variation in the SC.
However, in process technologies below 65 nm, the SDSC suffers from significant read yield degradation because the fixed V ref limits µ ∆V0,1 and the increased process variation raises σ ∆V0,1 . Figure 1c,d illustrate the SC with a highly symmetric cross-coupled current mirror (HSCC) [13] at states 0 and 1, respectively. To address the fixed V ref issue observed in the SDSC, the HSCC uses the DRV technique, which adjusts V ref according to the data state to enhance the sensing margin [14]. Figure 2b shows the I-V curves of the HSCC, where the crossing point of the I-V curves of the POR and NOR transistors represents V ref . As demonstrated, V ref decreases at state 1 (V ref1 ) and increases at state 0 (V ref0 ). The DRV technique can almost double µ ∆V0,1 compared to the fixed V ref approach used in the SDSC. However, in the HSCC, three current mirrors from the PLD to the NCMD are employed to generate V data0,1 . Consequently, these current mirrors induce a substantial current mismatch, which increases σ ∆V0,1 . Neglecting the channel-length modulation, the current through the NCD (I NCD ) for the current mirror is expressed as where A is the transconductance parameter, V GS is the gate-to-source voltage of the NCD, and V TH is the threshold voltage. The current through the PCMD (I data_cm0 ), including the mismatch effects, is given by where ∆A is the transconductance mismatch that results in a gain error and ∆V TH is the threshold voltage mismatch that results in voltage offsets [15]. The current through the NOD (I data_cm1 ), including the mismatch effects, is given by The current through the POD is similar to I data_cm0 .
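The mirror-mismatch argument can be made concrete with the standard square-law MOSFET model. The block below is a sketch under that assumption; the sign convention chosen for ∆V TH and the first-order expansion are illustrative choices and are not a transcription of the original equations.

```latex
\begin{align}
I_{\mathrm{NCD}} &= A\,(V_{GS} - V_{TH})^{2} \\
I_{\mathrm{data\_cm0}} &\approx (A + \Delta A)\,(V_{GS} - V_{TH} - \Delta V_{TH})^{2} \\
\frac{I_{\mathrm{data\_cm0}} - I_{\mathrm{NCD}}}{I_{\mathrm{NCD}}}
  &\approx \frac{\Delta A}{A} - \frac{2\,\Delta V_{TH}}{V_{GS} - V_{TH}}
\end{align}
```

In this picture, each additional mirror stage contributes its own pair of ∆A and ∆V TH terms, so the accumulated current error grows with the number of mirrors in the path.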
Furthermore, PMOS-based circuits are vulnerable to mismatches because of poor gate oxide capacitance matching and high mobility variations [16]. Thus, when numerous current mirrors are used, the current mismatch increases, as expressed in (7). The standard deviations of V data0,1 (V ref0,1 ) shown in Figure 1c,d are relatively large because of the many current mirror stages and the small slope of the POD0,1 (POR0,1) I-V curve. Consequently, even though the HSCC doubles µ ∆V0,1 using the DRV technique, it cannot guarantee a sufficient read yield in deep submicron technology. Figure 1e,f show the SPSC [10] at states 0 and 1, respectively. The SPSC uses a split path to implement the DRV technique and applies PMOS degeneration to reduce the variation. Figure 2c shows that the SPSC successfully modifies V ref according to the MTJ state; the resulting V ref0 and V ref1 values that confirm this are presented in Figure 1e,f. Furthermore, to overcome the process variation issue faced by the HSCC, the SPSC utilizes a split-path technique instead of current mirrors. Rather than employing current mirrors to transmit the NCD saturation current to the NOD as the HSCC does, the SPSC places the split path at the NCD source and applies the same gate-to-source voltage to both the NCD and NOD. As a result, even without a current mirror, the saturation currents of the NCD and NOD are equalized. The standard deviations of V data0,1 (V ref0,1 ) presented in Figure 1e,f are smaller than those of the HSCC because the number of current mirrors is reduced. Therefore, among the existing SCs, the SPSC is superior in terms of RAPY CELL . However, as shown in Figure 1e,f, the standard deviations of V ref0 and V data1 remain large (106 mV and 101 mV, respectively). This is because the data current (I data ) is split in half by the split-path scheme (I data = I data_cm + I data_sp ), and the lowered current is sensitive to the increased process variations in deep submicron technology.
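The sensitivity of a halved current to threshold mismatch can be illustrated numerically. The sketch below assumes a square-law device of fixed size, so that halving the current shrinks the overdrive voltage by a factor of √2; the transconductance parameter and the mismatch value are hypothetical and are not taken from the simulations reported here.

```python
import math

def overdrive(i_sat, a):
    """Overdrive voltage V_GS - V_TH of a square-law device, I = a * Vov**2."""
    return math.sqrt(i_sat / a)

def relative_current_error(i_sat, a, delta_vth):
    """First-order |dI/I| caused by a threshold-voltage shift delta_vth."""
    return 2.0 * delta_vth / overdrive(i_sat, a)

A = 2e-4          # hypothetical transconductance parameter, A/V^2
DELTA_VTH = 0.01  # hypothetical 10 mV threshold mismatch

full = relative_current_error(20e-6, A, DELTA_VTH)  # undivided 20 uA path
half = relative_current_error(10e-6, A, DELTA_VTH)  # path carrying half the current
print(f"full: {full:.4f}  half: {half:.4f}  ratio: {half / full:.3f}")
```

Under these assumptions the relative current error of the half-current path is √2 times that of the undivided path, which matches the qualitative claim that the lowered current is more sensitive to process variation.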

Proposed TSSC
In the SPSC, the lowered saturation current caused by the split path is vulnerable to process variations. The TSSC maintains the advantage of the SPSC (i.e., the DRV technique, which increases the sensing margin ∆V 0,1 ) but overcomes the lowered saturation current problem. Figure 3a shows the schematic and timing diagram of the proposed TSSC. The key circuit design differences between the TSSC and SPSC are the inclusion of two transistors as switches (ST1 and ST2) and the exclusion of NOD and NOR, enabled by a timing-based split-path scheme. To reduce σ ∆V0,1 by enhancing the current in each path, the MOSFET operating times are controlled by the PRE signal. Figure 3b shows the TSSC operation in phase 1 (P1) at state 0. In P1, the PRE signal is high; thus, PD0, PD3, ST1, and ST2 are turned on at the beginning. The gate voltage of POR0 (POD0) starts to pre-charge as ST1 and ST2 are turned on. PLD0 and PLR0 reach their saturation points after a sufficient pre-charge time if I data0 and I ref0 are constant in P1, regardless of the process variations in PLD0 (PLR0). At the end of P1, the gate voltage of POR0 (POD0) is held at the same value as V data0 (V ref0 ). However, POR0 and POD0 cannot operate because PD1 and PD2, respectively, are turned off. Figure 3c shows the TSSC operation in phase 2 (P2) at state 0. At the beginning of P2, the PRE signal goes low. Thus, PD0, PD3, ST1, and ST2 are turned off, and PD1 and PD2 are turned on. When PLD0 and PLR0 are cut off from V DD , POR0 and POD0 can operate because their gate voltages were pre-charged to V data0 and V ref0 , respectively, in P1. Thus, the saturation current of POR0 is transmitted from NCD0, as in the SPSC. However, in the TSSC, the saturation current of POR0 is approximately twice that of the SPSC, regardless of process variation. This is because PLD0 is turned off in P2 and there is no current division.
In addition, because there is no current division in the TSSC, the number of clamp NMOS transistors is smaller than that in the SPSC, which is achieved by shifting the split-path position to the drain of the clamp NMOS. Therefore, the TSSC implements the DRV technique while minimizing σ ∆V0,1 , which is confirmed by the TSSC voltage distributions shown in Figure 3c,e. The process variation in the TSSC can be decreased further compared to the SPSC because, under the iso-area condition, the MOSFETs in the TSSC can be made larger as the number of clamp NMOS transistors is reduced. Figure 3d,e show the TSSC operation at state 1 in P1 and P2, respectively. The TSSC operation at state 1 is nearly identical to that at state 0, with only a change in the MTJ state. Figure 4 shows the TSSC I-V curves, from which V data0,1 and V ref0,1 can be estimated. The crossing point of the I-V curves for NCD0 (NCD1) and POD0 (POD1) is the operating point for V data0 (V data1 ). Furthermore, the crossing point of the I-V curves for NCR0 (NCR1) and POR0 (POR1) is the operating point for V ref0 (V ref1 ). V ref is successfully modified according to the MTJ state, and the operating current is nearly doubled compared to that of the SPSC.
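The two-phase switching described above can be summarized as a small lookup. The following is only a bookkeeping sketch of which switches conduct for each level of the PRE signal, as stated in the text; the function names and the phase labels "pre-charge" and "sense" are illustrative assumptions, and no electrical behavior is modeled.

```python
# Switch groups as described in the text: PRE high (P1) turns on PD0, PD3,
# ST1, and ST2; PRE low (P2) turns those off and turns on PD1 and PD2.
P1_ON = frozenset({"PD0", "PD3", "ST1", "ST2"})
P2_ON = frozenset({"PD1", "PD2"})

def conducting_switches(pre_high: bool) -> frozenset:
    """Return the set of switches that conduct for a given PRE level."""
    return P1_ON if pre_high else P2_ON

def phase_name(pre_high: bool) -> str:
    """Label the phase; the descriptive labels are illustrative only."""
    return "P1 (pre-charge)" if pre_high else "P2 (sense)"

for pre in (True, False):
    on = ", ".join(sorted(conducting_switches(pre)))
    print(f"{phase_name(pre)}: {on} on")
```

Because the two sets are disjoint, the load path used in P1 and the output path used in P2 never conduct at the same time, which is the property that removes the current division.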

Simulation Conditions
All simulation results included in this section were obtained using Monte Carlo HSPICE simulations with industry-compatible 28-nm model parameter libraries. The µ SA_OS and σ SA_OS values for calculating RAPY CELL were set to 0 and 20 mV, respectively [17]. Furthermore, a standard deviation of 4% was considered for the MTJ variation. The MTJ model used in this study had an R 1 (R H , anti-parallel) of 6 kΩ and R 0 (R L , parallel) of 3 kΩ, corresponding to a tunnel magnetoresistance (TMR) ratio of 100%. For a fair comparison between SCs under iso-area and iso-power conditions, all transistor sizes were chosen such that the layout area (the sum of each transistor's width × length) of each SC was 1.76 µm². Moreover, the V CLAMP for each SC was set such that the sensing current was 20 µA at state 1. In addition, the optimal R ref for each circuit was used for architectural analysis.
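The numerical conditions above can be checked with a few lines of code. The sketch below computes the TMR from the stated resistances and evaluates a yield metric of the form (mean margin minus mean offset) divided by the root-sum-square of the two standard deviations; that form is an assumption consistent with the symbols used in this paper, and the margin values passed to it are placeholders rather than simulated results.

```python
import math

def tmr(r_high, r_low):
    """Tunnel magnetoresistance ratio, (R1 - R0) / R0."""
    return (r_high - r_low) / r_low

def rapy(mu_dv, sigma_dv, mu_os=0.0, sigma_os=0.020):
    """Assumed yield metric: mean margin over combined standard deviation.

    Defaults follow the stated SA offset: mean 0 V, sigma 20 mV.
    """
    return (mu_dv - mu_os) / math.hypot(sigma_dv, sigma_os)

# Resistances from the MTJ model: R1 = 6 kOhm, R0 = 3 kOhm.
print(f"TMR = {tmr(6e3, 3e3):.0%}")

# mu_dv and sigma_dv below are placeholders, not simulated values.
print(f"RAPY = {rapy(mu_dv=0.150, sigma_dv=0.030):.2f}")
```

With the SA offset fixed at 20 mV, the metric is bounded by the offset alone when the margin spread is small, so reducing σ ∆V0,1 below that level yields diminishing returns under this assumed form.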

Results and Comparison
RAPY CELL0 and RAPY CELL1 increased when R ref increased and decreased, respectively. Accordingly, the crossing point of RAPY CELL0 and RAPY CELL1 gave the maximum value of RAPY CELL . Figure 5 shows RAPY CELL0,1 for the SDSC, HSCC, SPSC, and TSSC with respect to R ref at room temperature (25 °C). The HSCC achieved the lowest RAPY CELL owing to the large output variation caused by the current mismatch of the multiple current mirrors. Despite the larger µ ∆V0,1 of the SPSC, the SDSC and SPSC exhibited the same RAPY CELL under the iso-area condition because the SDSC transistors were twice as large as those of the SPSC. The RAPY CELL of the TSSC was the largest among the SCs because the DRV technique was successfully implemented and the timing-based split-path scheme overcame the lowered data current problem faced by the SPSC. Figure 6 shows the RAPY CELL for the SDSC, HSCC, SPSC, and TSSC with respect to V DD and V CLAMP . To analyze the SCs under the iso-power condition, V CLAMP for each SC was set such that it generated 20 µA at state 1. The temperature was varied from −45 to 90 °C, and the worst case was chosen for the RAPY CELL calculation. As shown in Figure 6a, the HSCC exhibited minimal variation in RAPY CELL as V DD increased because σ ∆V0,1 was excessively large owing to the number of current mirrors. When V DD was less than 0.9 V, RAPY CELL for the SPSC was greater than that for the TSSC.
In the TSSC, RAPY CELL was significantly reduced at low V DD because µ ∆V0,1 decreased rapidly. When V DD was greater than 0.9 V, the TSSC achieved a large RAPY CELL because µ ∆V0,1 increased faster than σ ∆V0,1 . Figure 6b shows RAPY CELL with respect to V CLAMP when V DD = 1.0 V. The dependence of RAPY CELL on V CLAMP is nonlinear because, when V CLAMP is low, the operating current is insufficient, and when V CLAMP is high, µ ∆V0,1 decreases because of the decreased r O_PLD . Thus, V CLAMP has an optimal value that maximizes the read yield. The HSCC has a low RAPY CELL and shows no optimum within the V CLAMP range of 0.55 to 0.8 V. The optimal V CLAMP values for the SDSC and SPSC are 0.65 V and 0.7 V, respectively. However, STT-MRAM requires a low sensing current to prevent read disturbances. An unintentional write operation occurs during a read operation when the critical MTJ switching current is lower than the read current [11]. In the TSSC, the optimal V CLAMP value, 0.6 V, is lower than that of the conventional SCs, and the RAPY CELL value is the highest. Therefore, the TSSC is suitable for STT-MRAM, which requires a low sensing current.
Figure 7a illustrates the RAPY CELL values of the SDSC, HSCC, SPSC, and proposed TSSC with respect to the TMR of the MTJ when V DD is set to 1 V and V CLAMP is set to 0.6 V. The temperature was varied from −45 to 90 °C, and the worst case was chosen for the RAPY CELL calculation. TMR is calculated as (R 1 − R 0 )/R 0 and should ideally be high because it influences the sensing speed, read margin, and noise margin of the memory cell. However, the SCs for STT-MRAM must be built to tolerate process-related fluctuations in the TMR value. As shown in Figure 7a, for low sensing current applications, even when the TMR value is decreased to 60%, the proposed TSSC maintains a greater RAPY CELL value than the existing SCs. Figure 7b plots the RAPY CELL as a function of sensing time when V DD was set to 1 V and V CLAMP was set to 0.6 V. Because the TSSC uses a two-phase sensing operation for the timing-split-based DRV technique, it requires sufficient time for charging and discharging. Thus, a sharp increase in RAPY CELL can be observed at approximately 8 ns, and the RAPY CELL value saturates at approximately 14 ns. The existing designs are less sensitive to the sensing time than the proposed TSSC, which provides more accurate sensing at the expense of an increased sensing time.
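The R ref selection rule stated at the start of this section, where the best RAPY CELL lies at the crossing of the rising RAPY CELL0 curve and the falling RAPY CELL1 curve, can be sketched as a simple sweep. The curves and R ref values below are synthetic and purely illustrative; they are not the data of Figure 5, and the helper name best_r_ref is hypothetical.

```python
def best_r_ref(r_refs, rapy0, rapy1):
    """Pick the R_ref that maximizes min(RAPY_CELL0, RAPY_CELL1)."""
    cell = [min(y0, y1) for y0, y1 in zip(rapy0, rapy1)]
    idx = max(range(len(cell)), key=cell.__getitem__)
    return r_refs[idx], cell[idx]

# Synthetic sweep: RAPY_CELL0 rises and RAPY_CELL1 falls with R_ref.
r_refs = [3500, 4000, 4500, 5000, 5500]  # ohms, illustrative only
rapy0 = [1.0, 2.0, 3.0, 4.0, 5.0]
rapy1 = [5.0, 4.0, 3.0, 2.0, 1.0]

r_opt, y_opt = best_r_ref(r_refs, rapy0, rapy1)
print(r_opt, y_opt)  # prints: 4500 3.0
```

Because RAPY CELL is defined as the smaller of the two per-state values, maximizing that minimum over the sweep lands on the crossing point whenever one curve rises and the other falls.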
As mentioned earlier, the TSSC exploits the gate capacitance of the PLD, POR, POD, and PLR transistors to accumulate charge during P1, which raises the question of whether the capacitor-less design is better than one using a capacitor. Figure 8 shows the TSSC RAPY CELL with respect to the capacitance of the additional capacitors when two capacitors are added at the gates of the PLD and PLR. The RAPY CELL increases until the additional capacitance reaches 2 fF, but the increment is insignificant, and RAPY CELL starts to decrease as the capacitance increases further because of the limited sensing time. Moreover, if a capacitor is added to the design, the sizes of all transistors must be reduced to maintain iso-area, which results in a decrease in the gate capacitance and performance degradation. Accordingly, Figure 8 indicates that the gate capacitances of the PLD, POR and POD, PLR pairs are sufficient for the TSSC to accumulate charge during P1 and mirror the data current during P2.
Figure 8. RAPY CELL of TSSC with respect to capacitance value when an additional capacitor is assumed.
Table 1 summarizes the simulation results and compares the proposed TSSC with the conventional SCs. The average power consumption (P AVG ) was simulated using the V CLAMP and R ref values presented in Figure 5. In both states 0 and 1, the TSSC achieved the lowest P AVG among the SCs. However, the proposed TSSC requires a longer sensing time because of its charging and discharging times. As shown in Table 1, the TSSC achieves a 42% higher RAPY CELL value than the SPSC. Because of the read disturbance, sensing circuits are required to work with low sensing currents. Moreover, the RAPY CELL value of the proposed TSSC is the highest under low-current sensing. Therefore, the TSSC is suitable for STT-MRAM applications, which require low sensing currents.
(2) Temperature was set to room temperature (25 °C) and V CLAMP was set to the values depicted in Figure 5.
(3) The worst RAPY CELL value was calculated by comparing the results obtained at −45 °C and 90 °C when V CLAMP was set to 0.6 V.

Conclusions
In this study, we proposed a TSSC that uses the DRV technique with a novel time-based split path, which maintains a large µ ∆V0,1 and reduces σ ∆V0,1 . The proposed TSSC can reduce the V TH variation effects by increasing the current and reducing the transistor mismatch. The simulation results indicate that conventional SCs exhibit a low read yield because of small µ ∆V0,1 or large σ ∆V0,1 values. In contrast, the TSSC obtains the greatest read yield among the compared SCs in the 28-nm process technology.