Robust Offset-Cancellation Sense Amplifier for an Offset-Canceling Dual-Stage Sensing Circuit in Resistive Nonvolatile Memories
Electronics 2020, 9, 1403

Abstract: With technology scaling, achieving a target read yield of resistive nonvolatile memories becomes more difficult due to increased process variation and decreased supply voltage. Recently, an offset-canceling dual-stage sensing circuit (OCDS-SC) has been proposed to improve the read yield by canceling the offset voltage and utilizing a double-sensing-margin structure. In this paper, an offset-canceling zero-sensing-dead-zone sense amplifier (OCZS-SA) combined with the OCDS-SC is proposed to significantly improve the read yield. The OCZS-SA has two major advantages, namely, offset voltage cancellation and a zero sensing dead zone. The Monte Carlo HSPICE simulation results using a 65-nm predictive technology model show that the OCZS-SA achieves 2.1 times smaller offset voltage with a zero sensing dead zone than the conventional latch-type SAs at the cost of an increased area overhead of 1.0% for a subarray size of 128 × 16.


Introduction
Although resistive nonvolatile memories (NVMs) such as spin-transfer-torque random access memory (RAM) and resistive RAM promise higher density and lower power than conventional memories such as static RAM, dynamic RAM, and Flash memory [1][2][3], they suffer from degraded read yield following technology scaling due to the increased process variation, reduced supply voltage, and decreased read cell current (Iread) [4][5][6].
In general, two output voltages of a sensing circuit (SC), namely, VSA_data and VSA_ref, are introduced into a sense amplifier (SA) to generate a digital signal (zero or one) [7]. Considering the offset voltage in the SA (VSA_OS) and assuming that the statistical distributions of the input voltage difference (ΔV) between VSA_data and VSA_ref and of VSA_OS are Gaussian, the read yield can be statistically expressed as the read-access pass yield for a single cell (RAPYCELL) [8], i.e.,

$$\mathrm{RAPY}_{\mathrm{CELL}} = \frac{\mu_{\Delta V} - \mu_{SA\_OS}}{\sqrt{\sigma_{\Delta V}^{2} + \sigma_{SA\_OS}^{2}}}$$

where µΔV (µSA_OS) and σΔV (σSA_OS) are the mean and standard deviation of ΔV (VSA_OS), respectively. The recently proposed offset-canceling dual-stage SC (OCDS-SC) has improved RAPYCELL by reducing σΔV through offset voltage cancellation in the SC and by increasing µΔV through the double-sensing-margin structure [6]. Figure 1a shows an example of the ΔV distribution of the OCDS-SC with µΔV = 200 mV and σΔV = 23 mV. Figure 1b shows that RAPYCELL can be significantly improved by developing a novel SA with a much smaller σSA_OS than the typical σSA_OS value of 20 mV [7]. If the improved RAPYCELL value is greater than a target RAPYCELL value, the surplus can be traded for a significant saving in read energy [9,10].
In this paper, we propose an offset-canceling zero-sensing-dead-zone SA (OCZS-SA) that is capable of significantly reducing σSA_OS by offset-voltage cancellation with a zero-sensing-dead-zone characteristic. The proposed OCZS-SA achieves a 2.1 times smaller σSA_OS of 9.62 mV without any sensing dead zone at the cost of an increased area overhead of 1.0% for a subarray size of 128 × 16. The remainder of this paper is organized as follows: Section 2 describes the problems in conventional latch-type SAs; Section 3 introduces the proposed OCZS-SA; Section 4 presents the simulation results and comparison; and Section 5 presents the conclusions drawn from our study.
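As a rough numerical illustration of how σSA_OS enters the read yield, the sketch below evaluates the usual Gaussian-margin form RAPYCELL = (µΔV − µSA_OS) / sqrt(σΔV² + σSA_OS²), which is assumed here for independent Gaussian ΔV and VSA_OS. It uses the example values quoted for Figure 1a (µΔV = 200 mV, σΔV = 23 mV) and additionally assumes µSA_OS = 0. The function name is hypothetical, and the printed values are not expected to reproduce the 6.0σ and 8.7σ reported in Section 4, which come from the full circuit simulation.

```python
import math


def rapy_cell(mu_dv_mv, sigma_dv_mv, sigma_sa_os_mv, mu_sa_os_mv=0.0):
    """Read-access pass yield of one cell in units of sigma (assumed Gaussian model)."""
    # For independent Gaussian variables the variances of dV and V_SA_OS add.
    total_sigma = math.sqrt(sigma_dv_mv ** 2 + sigma_sa_os_mv ** 2)
    return (mu_dv_mv - mu_sa_os_mv) / total_sigma


if __name__ == "__main__":
    # 20 mV is the typical SA offset quoted in the text; 9.62 mV is the OCZS-SA average.
    for sigma_os in (20.0, 9.62):
        value = rapy_cell(200.0, 23.0, sigma_os)
        print(f"sigma_SA_OS = {sigma_os:5.2f} mV -> RAPY_CELL = {value:.2f} sigma")
```

Under these assumptions, shrinking σSA_OS matters most once it is comparable to σΔV, because the two terms add in quadrature.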

Problems in Conventional Latch-Type SAs
In the OCDS-SC, two SA input voltages, VSA_data and VSA_ref, are generated with a time difference due to the dual-stage sensing operation, as shown in Figure 2. In the first stage, SS1 is activated, and VSA_data generated in the OCDS-SC is introduced into the SA. In the second stage, SS2 is activated, and VSA_ref generated in the OCDS-SC is introduced into the SA. However, because of the time-difference input, a capacitive-coupling problem occurs when a conventional voltage-latched SA (VLSA) is used. Figure 3 shows the capacitive-coupling problem in the VLSA, in which VSA_ref changes VSA_data to some extent (Δ) through parasitic capacitors. This problem increases σSA_OS from 20 mV to 30-50 mV, depending on the ratio of the output loading capacitance to the parasitic capacitance. Thus, to avoid the capacitive-coupling problem, the SA for the OCDS-SC should not be a VLSA type.
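The dependence on the capacitance ratio can be pictured with a first-order capacitive-divider estimate. This is only a sketch: the divider model, the function name, and all capacitance and voltage values are illustrative assumptions rather than values from the paper.

```python
def coupling_disturbance(delta_v_ref, c_parasitic, c_load):
    """Estimate the shift (delta) induced on the node that holds VSA_data.

    Assumes a simple capacitive divider: a voltage step of delta_v_ref on the
    VSA_ref side couples through c_parasitic onto a node loaded by c_load.
    """
    return delta_v_ref * c_parasitic / (c_parasitic + c_load)


if __name__ == "__main__":
    # Hypothetical numbers: 0.5 V step, 1 fF parasitic, several load values.
    for c_load_ff in (5.0, 10.0, 20.0):
        delta = coupling_disturbance(0.5, 1.0, c_load_ff)
        print(f"C_load = {c_load_ff:4.1f} fF -> delta = {delta * 1e3:.1f} mV")
```

In this simplified picture, a larger loading capacitance relative to the parasitic capacitance gives a smaller disturbance, which is consistent with the stated dependence on the capacitance ratio.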

Because VSA_data is generated at the operating point between a load PMOS and a clamp NMOS [11,12], the voltage range of VSA_data is from almost GND (in state 0) to almost VDD (in state 1), depending on the sensing time and process variation. Thus, a conventional current-latched SA (CLSA), which has a sensing dead zone, cannot provide the supply-rail sensing capability that the OCDS-SC requires. Figure 4 shows the sensing-dead-zone problem in a CLSA with an NMOS footswitch (FS-CLSA) and a CLSA with a PMOS headswitch (HS-CLSA) [7]. In the FS-CLSA, the input transistors (MN3 and MN4) should be turned on for correct operation, which means that VSA_data should be greater than VTHN, where VTHN is the NMOS threshold voltage. Thus, the sensing dead zone of the FS-CLSA becomes VSA_data < VTHN. In the same manner, the sensing dead zone of the HS-CLSA becomes VSA_data > VDD − |VTHP|, where VTHP is the PMOS threshold voltage.
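The two dead-zone conditions above can be written as simple predicates. The sketch below is a minimal illustration: the function names are hypothetical, the threshold voltages of 0.4 V are assumed values, and only VDD = 1.1 V is taken from the simulation setup in Section 4.

```python
def fs_clsa_in_dead_zone(v_sa_data, vthn):
    """FS-CLSA: the input NMOSs stay off when the input is below VTHN."""
    return v_sa_data < vthn


def hs_clsa_in_dead_zone(v_sa_data, vdd, vthp_abs):
    """HS-CLSA: the input transistors stay off when the input exceeds VDD - |VTHP|."""
    return v_sa_data > vdd - vthp_abs


if __name__ == "__main__":
    VDD = 1.1       # supply voltage used in the paper's simulations (V)
    VTHN = 0.4      # assumed NMOS threshold (V)
    VTHP_ABS = 0.4  # assumed |VTHP| (V)
    for v in (0.05, 0.55, 1.05):
        print(
            f"VSA_data = {v:.2f} V: "
            f"FS-CLSA dead zone = {fs_clsa_in_dead_zone(v, VTHN)}, "
            f"HS-CLSA dead zone = {hs_clsa_in_dead_zone(v, VDD, VTHP_ABS)}"
        )
```

Because VSA_data can sit near either rail, each conventional CLSA fails on one side of the input range, which is the motivation for a structure without any dead zone.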

Proposed OCZS-SA
In this section, we propose the OCZS-SA, which offers two major advantages: offset voltage cancellation and a zero sensing dead zone. Figure 5 shows the schematic and timing diagrams of the proposed OCZS-SA. Before we explain the OCZS-SA operation in detail, we should note that the OCZS-SA operation is fully pipelined with the OCDS-SC operation, as shown in the timing diagram in Figure 5. Phases 1 and 2 of the OCZS-SA are pipelined during SS1, and phases 3 and 4 are pipelined during SS2. Thus, the OCZS-SA does not incur any sensing delay penalty.
Before phase 3, the OUT and OUTB nodes are precharged to VDD for reliable sensing operation. In phase 3 (Figure 6c), the P3 signal is activated, and the OCZS-SA waits for VSA_ref to be generated in the OCDS-SC. In phase 4 (Figure 6d), the P4 signal is activated, and VSA_ref is captured at the INB node. Thus, the voltages at the IN and INB nodes (VIN and VINB) become VTH1 + VSA_data and VTH2 + VSA_ref, respectively, where VTH1 and VTH2 are the threshold voltages of the two input NMOSs. After phase 4, the SAE signal is activated, and a digital signal (zero or one) is generated by the voltage difference between VIN (= VTH1 + VSA_data) and VINB (= VTH2 + VSA_ref). We note that the operation after phase 4 is the same as that in the FS-CLSA.

First Advantage: Offset Voltage Cancellation
The first advantage of the OCZS-SA is the offset voltage cancellation of the two input NMOSs. As mentioned earlier, in phase 4, VIN and VINB become VTH1 + VSA_data and VTH2 + VSA_ref, respectively. Because each gate voltage already contains the threshold voltage of its own transistor, the overdrive voltage (= VGS − VTH) of the input NMOS, where VGS is the input NMOS gate-to-source voltage, does not depend on the VTH variation. Consequently, a VTH mismatch between the two input NMOSs does not influence σSA_OS.
In addition, in the FS-CLSA (Figure 4a), because σSA_OS is dominantly determined by the input NMOSs, σSA_OS can be effectively reduced by canceling only the offset in the input NMOSs. Figure 7 shows σSA_OS of the FS-CLSA according to the SA input voltage (VSA_data) when process variation is applied only to the input NMOSs (MN3 and MN4), only to the latch NMOSs (MN1 and MN2), only to the latch PMOSs (MP1 and MP2), and to all the transistors. Figure 7 clearly shows that σSA_OS is more sensitive to the input NMOSs than to the other transistors because the input NMOSs operate in the saturation region, whereas the latch NMOSs operate in the linear region. Thus, σSA_OS is less sensitive to the VTH mismatch between MN1 and MN2. Meanwhile, the latch PMOSs do not operate during the initial sensing period. Thus, the effect of the VTH mismatch between MP1 and MP2 is negligible. The variation in the latch NMOSs of the FS-CLSA has more effect on σSA_OS as VSA_data increases because the initial voltage in the small parasitic capacitance between MN1 (MN2) and MN3 (MN4) is discharged much faster with increasing VSA_data, resulting in the latch NMOSs operating in the saturation region.
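The cancellation argument can be checked with a small Monte Carlo sketch. It models only the VTH mismatch of the two input NMOSs and compares the overdrive-voltage difference with and without the stored threshold voltage on the gate. The function name, the nominal voltages, the 20 mV VTH sigma, and the assumption that both sources sit at GND are all illustrative; the latch transistors and every second-order effect are ignored.

```python
import random
import statistics


def overdrive_difference_std(cancel_offset, n_trials=10000, sigma_vth=0.02, seed=1):
    """Std. dev. of the overdrive-voltage difference between the two input NMOSs."""
    rng = random.Random(seed)
    v_data, v_ref, vth_nominal = 0.6, 0.5, 0.4  # hypothetical voltages (V)
    diffs = []
    for _ in range(n_trials):
        vth1 = rng.gauss(vth_nominal, sigma_vth)
        vth2 = rng.gauss(vth_nominal, sigma_vth)
        if cancel_offset:
            # OCZS-SA style: each gate holds its own VTH plus the input voltage.
            vg1, vg2 = vth1 + v_data, vth2 + v_ref
        else:
            # Conventional style: the inputs drive the gates directly.
            vg1, vg2 = v_data, v_ref
        # Sources are assumed to be at GND, so VGS equals the gate voltage.
        diffs.append((vg1 - vth1) - (vg2 - vth2))
    return statistics.pstdev(diffs)


if __name__ == "__main__":
    print("without cancellation:", overdrive_difference_std(False))
    print("with cancellation:   ", overdrive_difference_std(True))
```

With cancellation the spread collapses to floating-point noise, whereas without it the spread is about sqrt(2) times the assumed VTH sigma. The residual σSA_OS reported for the real circuit comes from the transistors this sketch leaves out.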

Second Advantage: Zero Sensing Dead Zone
Unlike the FS-CLSA, which has a sensing dead zone in the region of VSA_data < VTHN, as shown in Figure 7, the OCZS-SA does not have any sensing dead zone because, in phase 4, VIN and VINB never fall below VTH1 and VTH2, respectively, even if VSA_data and VSA_ref are 0 V. Thus, supply-rail sensing capability is achieved.

Simulation Results and Comparison
HSPICE Monte Carlo simulations were performed using a 65-nm predictive technology model at VDD = 1.1 V. To fully pipeline the operation with the OCDS-SC, each phase operation time (TP1, TP2, TP3, and TP4 for phases 1-4, respectively) was set to 0.5 ns. Figure 8 shows σSA_OS according to the SA input voltage (VSA_data) of the FS-CLSA, HS-CLSA, VLSA with double switches and transmission gate access transistors (DSTA-VLSA) without time-difference inputs, DSTA-VLSA with time-difference inputs, and OCZS-SA. Among the various VLSAs, such as the VLSA with an NMOS footswitch and PMOS access transistors, the VLSA with a PMOS headswitch and NMOS access transistors, and the DSTA-VLSA, only the DSTA-VLSA is compared in this paper because it alone can achieve a zero sensing dead zone [7]. As mentioned in Section 2, the capacitive-coupling problem increases σSA_OS of the DSTA-VLSA when time-difference inputs are applied. The FS-CLSA and HS-CLSA suffer from the sensing-dead-zone problem. In contrast, Figure 8 shows that the OCZS-SA achieved a 2.1 times smaller σSA_OS of 9.62 mV on average (minimum σSA_OS = 5.07 mV at VSA_data = 0.3 V; maximum σSA_OS = 25.41 mV at VSA_data = 1.1 V) with a zero sensing dead zone. As in the FS-CLSA, the variation in the latch NMOSs of the OCZS-SA affected σSA_OS more strongly as VSA_data increased. Thus, σSA_OS tended to increase with VSA_data.

Figure 9 shows the average σSA_OS of the OCZS-SA according to the width of the PMOSCAP for CSA (WCSA) when the PMOSCAP length (LCSA) was 0.2 µm. By considering the area overhead, a WCSA value of 2.0 µm was selected. We note that the effect of the CSA size on the loading of the OCDS-SC is negligible because CSA is connected in series with the input capacitance (CIN) at nodes IN and INB (total loading capacitance = CSA//CIN ≈ CIN, where // denotes the series combination). Figure 10 shows the normalized σSA_OS of the OCZS-SA according to TP1.
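The series-loading argument for CSA can be checked numerically. The sketch below uses the standard formula for two capacitors in series; the function name and the capacitance values are hypothetical, since the paper gives the PMOSCAP dimensions but not CSA or CIN in farads.

```python
def series_capacitance(c_a, c_b):
    """Equivalent capacitance of two capacitors connected in series."""
    return c_a * c_b / (c_a + c_b)


if __name__ == "__main__":
    c_in_ff = 1.0  # hypothetical input capacitance (fF)
    for c_sa_ff in (2.0, 5.0, 10.0, 20.0):
        c_total = series_capacitance(c_sa_ff, c_in_ff)
        print(f"C_SA = {c_sa_ff:5.1f} fF -> loading seen by the SC = {c_total:.3f} fF")
```

The series combination can never exceed the smaller capacitor, so once CSA is several times larger than CIN, enlarging CSA further barely changes the load presented to the OCDS-SC.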
Because the σSA_OS in Figure 10 saturates at a TP1 of approximately 0.2 ns, which is shorter than the 0.5 ns phase time, the OCZS-SA can be fully pipelined without any problem.
Table 1 lists the performance summary and comparison between the proposed OCZS-SA and the conventional latch-type SAs. The OCZS-SA achieved a zero sensing dead zone and a 2.1 times smaller σSA_OS of 9.62 mV, on average, than the FS-CLSA. From the SA viewpoint, owing to the additional transistors and phases, the OCZS-SA requires 37% more layout area (Figure 11a) and 125% more read energy compared with the FS-CLSA. From the array architecture viewpoint, however, the area overhead is only 1.0% when the subarray size is 128 × 16 (Figure 11b), and it decreases as the subarray size increases. In addition, the read-energy consumption of the SC part is much greater (>70 fJ [6]) than that of the SA part. As a result, if RAPYCELL, which is improved by employing the OCZS-SA, is greater than a target RAPYCELL, the total read energy in the SC and SA can be saved by reducing the SC operation time (TSC) and/or Iread. This saving is possible in spite of the higher read energy of the OCZS-SA itself; it sacrifices some of the improvement in RAPYCELL while still satisfying the target RAPYCELL. By employing the OCZS-SA together with the OCDS-SC, RAPYCELL increases from 6.0σ to 8.7σ.
The RAPYCELL values of 6.0σ and 8.7σ correspond to sensing error rates of 9.87 × 10^-10 and 1.66 × 10^-18, respectively. Therefore, the OCZS-SA improves the sensing error rate by roughly eight orders of magnitude relative to the conventional SAs. Figure 12 shows that the minimum Iread that satisfies a target RAPYCELL value of 6σ is reduced by 21-32% by sacrificing the improvement in RAPYCELL. Figure 13 shows that the read energy in the SC and SA is accordingly reduced by approximately 13-16%.
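The quoted error rates are consistent with reading RAPYCELL as a one-sided Gaussian margin. The sketch below assumes that interpretation and evaluates the tail probability Q(x) = 0.5 · erfc(x / sqrt(2)); the function name is hypothetical.

```python
import math


def sensing_error_rate(rapy_sigma):
    """One-sided Gaussian tail probability Q(x) for a margin of x sigma."""
    return 0.5 * math.erfc(rapy_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    for x in (6.0, 8.7):
        print(f"{x:.1f} sigma -> sensing error rate = {sensing_error_rate(x):.3e}")
    ratio = sensing_error_rate(6.0) / sensing_error_rate(8.7)
    print(f"improvement factor = {ratio:.2e}")
```

Under this assumption the two margins evaluate to about 9.87 × 10^-10 and 1.66 × 10^-18, a ratio of several times 10^8.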

Conclusions
The conventional latch-type SAs cannot be applied to the OCDS-SC due to the capacitive-coupling and sensing-dead-zone problems. In this paper, we have proposed the OCZS-SA, which offers two major advantages: offset voltage cancellation and a zero sensing dead zone. The simulation results show that the OCZS-SA can achieve a 2.1 times smaller σSA_OS value of 9.62 mV without any sensing dead zone at the cost of an increased area overhead of 1.0% for a subarray size of 128 × 16. Furthermore, a 13-16% read-energy saving is achieved due to the 21-32% reduction in Iread. Thus, the OCZS-SA can be a compelling candidate for the OCDS-SC in deep submicrometer resistive NVMs.