A Minimum Leakage Quasi-Static RAM Bitcell

As SRAMs continue to grow and comprise larger percentages of the area and power consumption in advanced systems, the need to minimize static currents becomes essential. This brief presents a novel 9T Quasi-Static RAM Bitcell that provides aggressive leakage reduction and high write margins. The quasi-static operation method of this cell, based on internal feedback and leakage ratios, minimizes static power while maintaining sufficient, albeit depleted, noise margins. This paper presents the concept of the novel cell, and discusses the stability of the cell under hold, read and write operations. The cell was implemented in a low-power 40 nm TSMC process, showing as much as a 12× reduction in leakage current at typical conditions, as compared to a standard 6T or 8T bitcell at the same supply voltage. The implemented cell showed full functionality under global and local process variations at nominal and low voltages, as low as 300 mV.


Introduction
Throughout the past decade, power dissipation has replaced high performance as the central focus of VLSI design, primarily due to the ever-increasing popularity of portable devices. As process technologies continue to advance, device scaling generally leads to a decrease in switched capacitance and a degradation of the transistor Ion/Ioff ratio, indirectly causing static power to dominate the total power consumed by digital circuits [1]. Subthreshold leakage is a problem for all system components, but it is particularly important in on-chip caches, as they are a growing fraction of the total number of microprocessor devices. Today, SRAMs comprise a significant percentage of the total area and total power for many digital chips, and this is only expected to rise [2]. Furthermore, leakage power is becoming the primary factor of cache power consumption due to the large number of storage cells (cross-coupled inverters) in on-chip caches, where there is no stacking effect to reduce the leakage current. This leakage current originates from the sub-threshold and gate leakage that are ever-present in a standard 6T SRAM cell at its steady state. At least two transistors exhibit high sub-threshold leakage due to Drain Induced Barrier Lowering (DIBL) with the drain-to-source voltage at VDD, and an additional pair of transistors exhibits high gate leakage, with their gate-to-bulk voltage at VDD. Most data in caches is accessed relatively infrequently due to either temporal or spatial locality. Thus, as the cost of storing data increases in the form of leakage power, the contribution of dynamic power dissipation diminishes [1].
Over the years, many techniques have been proposed for the reduction of SRAM power consumption [2-7], but the most efficient way to reduce the power is generally considered to be lowering the operating voltage. This can either be done in a dynamic scheme according to operating conditions [8] or statically during hold cycles. However, a standard 6T bitcell is limited to a minimum operating voltage of approximately 0.7 V, mainly due to process variations that degrade the read and write margins [2]. Read margin constraints are solved by using a two-port 8T bitcell, due to its decoupled readout path (for example, the circuits used in [9,10]); however, write margins still limit this cell to 700 mV under global and local variations. Recently, many groups have developed robust bitcells, designed for low voltage and sub-threshold operation. In 2007, a 256 kb SRAM array in 65 nm with a 10T bitcell, operating under 400 mV at 475 kHz, was presented by Calhoun and Chandrakasan [2], showing a 3.28 μW power consumption. In 2009, a 32 kb SRAM in 90 nm with a 10T bitcell, operating successfully at 160 mV at 500 Hz with a read power dissipation of 0.123 μW, was presented by Roy's group at Purdue [11]. A thorough overview of sub-threshold SRAM operation is given by Wang et al. [12]. Very few designs have actually changed the basic internal structure of the bitcell to achieve additional leakage reduction or improved low voltage functionality. Levacq et al. [7] proposed one such design, based on their Ultra-Low Power (ULP) Diode. This implementation provided a very interesting and novel approach to bitcell leakage reduction and robustness, showing a leakage reduction of 40×. However, operation under local variations at sub-100 nm process nodes was not presented.
In this paper, we present a novel 9T Quasi-SRAM bitcell for low-voltage, ultra-low leakage operation. The proposed cell internally cuts off the supply, and the stable states are set by leakage current ratios, resulting in Quasi-Static operation. This is achieved with improved write access time, with a design-controlled read access time penalty, and without the need for any additional peripheral circuitry, as compared to a standard two-port 8T bitcell. The stability of the non-static state is defined with the concept of Dynamic Noise Margin (DNM), as discussed in several recent publications [13,14]. Simulation results show that the proposed bitcell achieves a 12× static power reduction as compared to a standard 6T or 8T cell operated at a nominal supply voltage (1.1 V), and a 7× reduction compared to an 8T or 6T cell at 0.7 V (the lowest possible operating voltage of these cells). At high leakage process corners, this reduction increases substantially. The proposed cell was simulated under global and local process variations and was shown to maintain functionality at supply voltages as low as 300 mV. An 8 kb array of Q-SRAM cells was fabricated in a standard 40 nm process, and preliminary measurements show full functionality.
This brief is organized as follows: the cell design and operation methods are described in Section II; cell stability, including the Quasi-Static nature of the cell, is discussed in Section III; Section IV presents the cell implementation and performance figures; and Section V concludes the paper.

Description
Figure 1 shows the schematic of the proposed 9T Quasi-Static RAM (Q-SRAM) bitcell. The core of the cell, comprising transistors M1-M8, is similar to a standard two-port 8T cell. The control signals are also identical to those of an 8T cell: separate word lines are used for reads (RWL) and writes (WWL), a pair of differential bitlines (WBL and WBLB) is used for writes, and a single-ended bitline (RBL) is used for reads. Operation of these control signals is likewise identical to that of a standard 8T cell. The innovation of the proposed cell comes from the additional supply gating transistor (M9) that is connected in a feedback loop to the QB node. This technique is similar to the Auto-Gating technique proposed by Frustaci et al. [15]. At first glance, this would seem to cause the cell to lose its functionality when QB is low, as M9 is clearly cut off and the cell is no longer static, but an in-depth look into the stable states shows a much different picture. Assuming Q = '1', the inverter created by M4 and M6 drives QB low, cutting off M9 and gating the supply. In this state there is no low-resistance path to VDD, causing leakage currents to eventually partially discharge the high state of Q. As such, it would seem that in this case, the bitcell has "lost" its stored data. However, due to the lack of a feedback inverter, the circuit reaches a quasi-stable state with Q storing an intermediate voltage dependent on the leakage ratios of M9 and M1, and QB storing a low (zero) voltage. As with an 8T bitcell, readout is performed through a single-ended readout buffer (comprising M7 and M8) that is connected to QB and is unaffected by the voltage at Q.
Therefore, in the hold '1' state, M7 is cut off (QB = 0 V) and no discharge path is available from RBL through M8, resulting in a '1', the correct state of the bitcell. This non-static state is illustrated in Figure 2b. This figure shows that a '1' will be read out, even though Q is partially discharged. In the illustrated case, standard VT pull-down devices were used, causing the steady state voltage of Q to be very low, resulting in very low DIBL (on M1) and gate leakage (through M4). However, this leakage minimization comes at the expense of reduced stability, as will be discussed in Section 3. As an alternative, the ratio between M1 and M9 can be modified (for example, by using HVT pull-down devices), resulting in a slight increase in leakage power and a higher (and more robust) steady state. Note that transistors M2 and M3 have been omitted from Figure 2a. It appears from a first look at the opposite state, i.e., Q = '0', that static operation is achieved in a similar fashion as in a standard 6T or 8T cell. A high state written into QB turns M9 on, providing a supply voltage of VDD - VT9 to the positive feedback cross-coupled inverter structure, such that QB would be held at this voltage, turning on M7. However, a closer look shows a more complex picture, as shown in Figure 3a. Note that M7 and M8 have been omitted from this figure for convenience. Assuming Q has been completely discharged, M6 is on with a low resistance, such that the voltages at the source of M9 (VVDD) and at QB are equivalent. This results in a very low gate-to-source voltage for M9 (VGS → 0), gating the supply and disabling the charging current to QB. Therefore, following the write operation, which would charge QB to approximately VDD - VT5, there would be no supplementary current to QB if the level was degraded. Here again, a non-static state is reached, and it would seem that the cell is dysfunctional. However, the steady state voltage at QB is ultimately set by the ratioed contention between the leakage currents of M9 and M4. Therefore, implementing M9 with a low threshold voltage (LVT) device and the nMOS pull-down transistor (M4) with a high threshold (HVT) device ensures a high level at QB. Figure 3b plots the current ratio of M9 vs. M4 for a given voltage at node QB. This figure highlights the fact that the leakage through M9 is much stronger than that of M4, replenishing any lost charge and essentially providing quasi-static operation.
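The leakage contention that sets the hold '0' level at QB can be illustrated with a minimal numerical sketch. It assumes a textbook first-order subthreshold current model with a DIBL term, VGS = 0 on both M9 and M4 (that is, VVDD ≈ QB and Q = 0 V), and ignores gate leakage and all other devices. The function names and every parameter value (i0, n, dibl, the threshold voltages) are hypothetical placeholders, not values from the 40 nm process, so the resulting voltages are not expected to match Figure 3b.

```python
import math

THERMAL_V = 0.026  # thermal voltage kT/q near room temperature, in volts


def subthreshold_current(vgs, vds, vt, i0=1e-7, n=1.5, dibl=0.1):
    """First-order subthreshold drain current (A) including a DIBL term.

    i0, n and dibl are illustrative placeholders, not process data.
    """
    exponent = (vgs - vt + dibl * vds) / (n * THERMAL_V)
    return i0 * math.exp(exponent) * (1.0 - math.exp(-vds / THERMAL_V))


def net_charging_current(vqb, vdd, vt9, vt4):
    """Leakage into QB through M9 minus leakage out of QB through M4."""
    i_m9 = subthreshold_current(0.0, vdd - vqb, vt9)  # VGS9 ~ 0, VDS9 = VDD - QB
    i_m4 = subthreshold_current(0.0, vqb, vt4)        # VGS4 = 0, VDS4 = QB
    return i_m9 - i_m4


def leakage_ratio(vqb, vdd, vt9, vt4):
    """Ratio I(M9) / I(M4) at a given QB level (valid for 0 < vqb < vdd)."""
    i_m9 = subthreshold_current(0.0, vdd - vqb, vt9)
    i_m4 = subthreshold_current(0.0, vqb, vt4)
    return i_m9 / i_m4


def hold0_equilibrium(vdd, vt9, vt4, tol=1e-6):
    """Bisect for the QB voltage at which the two leakage currents balance.

    The net charging current is positive at QB = 0 and negative at QB = VDD,
    and decreases monotonically in between, so the root is unique.
    """
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charging_current(mid, vdd, vt9, vt4) > 0.0:
            lo = mid  # M9 still wins: equilibrium lies higher
        else:
            hi = mid  # M4 wins: equilibrium lies lower
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    vdd = 1.1
    for label, vt9 in (("LVT M9", 0.30), ("higher-VT M9", 0.45)):
        vqb = hold0_equilibrium(vdd, vt9, vt4=0.50)
        print(f"{label}: QB settles near {vqb:.3f} V ({vqb / vdd:.0%} of VDD)")
```

Under these assumptions, lowering the threshold of M9 relative to M4 moves the balance point closer to VDD, which is the qualitative behavior the LVT/HVT choice relies on.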
This unique operation scheme presents two asymmetric stable states, set by "quasi-static" leakage current ratios. Both states present substantially reduced leakage currents, with the currents in the hold '1' state (QB = '0') approaching a minimum achievable figure, due to the low VDS on both nMOS transistors (M1 and M4) and the series resistance of the supply gating transistor, M9. The gate leakage is also greatly reduced, as compared to a 6T or 8T cell, as the nMOS transistors all have small voltages across their gates. As presented in Section 3, simulation results show a nominal leakage reduction of 12× at this state as compared to a standard 8T cell at 1.1 V and as high as 31× under process variations.

Write Operation
One of the primary challenges when designing a low voltage bitcell is maintaining sufficient write margins. To ensure the success of a write operation, the pull-down current discharging the high internal node (Q or QB) has to overcome the pull-up current to that node. This is troublesome under process variations, when the pMOS devices in the pull-up network can be much stronger than the nMOS devices in the pull-down network, resulting in write failure. The proposed 9T Q-SRAM cell inherently solves this problem by cutting off the supply with the internal feedback node, and thus significantly weakening the pull-up network. Figure 4 illustrates the Q-SRAM write operation. Note that M7 and M8 have been omitted from this figure for convenience. Writing a '0' into a cell in the hold '1' state is shown in Figure 4a. Depending on the time that has passed since the previous write, Q is partially discharged and M9 is cut off. In a standard 6T or 8T write operation, the write is considered to be performed on the '0' side, as the nMOS access transistors (M2 and M5) are better at passing a '0' than a '1'. Once one side is written, the positive feedback pulls the cell to one of the circuit's bi-stable points. In the case of the Q-SRAM cell, the '0' is already partially written and the supply is gated, so Q is discharged very quickly. QB is charged through M5 without contention (as M4 is cut off almost immediately), reaching a voltage of approximately VDD - VT5. As M9 is cut off and VVDD is low, QB initially charges VVDD through M6. Eventually, QB will reach a level of approximately 80% of VDD, due to the current ratios shown in Figure 3b. This can be enhanced by sizing M5 to manipulate the Reverse Short Channel Effect (RSCE) and/or using an LVT implant on this device. The Write '1' operation is depicted in Figure 4b. This operation is again much easier to achieve than a standard 6T or 8T write, as the supply is gated, enabling an uncontested discharge of the high node (QB). There is no need to fully charge Q to complete the write, as the steady state has Q tending to GND; in fact, shutting off the write word line (WWL) before Q is charged will save power (in addition to decreasing the write access time).

Read Operation
The read operation of the Q-SRAM cell is identical to that of a standard 8T cell. The non-penetrating read, implemented by using a read buffer, is one of the common techniques to ensure a high read margin, which can be a limiting factor for low voltage operation. Following the write '0' operation (described above), there is a degraded voltage level on QB, reducing the overdrive of the readout transistor (M7). As mentioned above, LVT transistors can be used to implement M9 and M5 in order to increase the QB voltage. To reduce the read access time, M7 can be implemented using an LVT transistor; however, this increases the off-leakage of the buffer, reducing the number of bitcells that can share a column. This leakage can be reduced through several techniques, such as implementing M8 with an HVT implant, or adding an additional stacking transistor, as shown by Calhoun et al. [2] in their 10T sub-threshold cell.
When reading a '1', Q is originally written to VDD, but gradually discharges towards GND until reaching leakage equilibrium at a low voltage. The initial state causes QB to completely discharge and stay discharged throughout subsequent hold and read cycles. In this case, the readout transistor (M7) is completely cut off, resulting in a read access time identical to that of a standard 8T cell. Therefore, skewing the read sensing towards the read '0' detection can improve the overall read access time of this scheme.

Cell Stability
The previous section presented the novel Q-SRAM cell and its operating modes. The cell presents very aggressive leakage reduction, resulting in minimal static power. However, the question of cell stability is unconventional, as this is not a "static" cell, and therefore static noise margins are inapplicable.

Hold Stability
The standard definition of hold stability for an SRAM cell is the Static Noise Margin (SNM), first described by Seevinck et al. [16]. In this method, a constant voltage is applied as noise on the internal data nodes, and the minimal voltage required to "flip" the bitcell is defined as the noise margin. For a standard static bitcell with positive feedback this is a sufficient metric, as once the trip point is crossed, the cell will ultimately stabilize at the opposite state. This is, however, a worst case scenario, as a constant voltage drop over a given net is a non-physical noise source. Therefore, in recent years, the concept of Dynamic Noise Margin (DNM), taking into account the duration and amplitude of the noise source, has become an increasingly popular alternative metric [14,17-21].
In the case of the proposed Q-SRAM, the traditional SNM metric is inapplicable, as the cell is not a static cell. Applying a constant voltage inside the cell ultimately disrupts the leakage ratios of the cut-off devices that maintain the cell's stability. However, as the cell doesn't provide a well-defined positive feedback loop, the final settling state of the cell is not as clear as in the case of a static bitcell. The application of a noise current to the internal nodes (Q and/or QB) asymmetrically changes the voltages at these nodes, and once this noise is finished, the cell will settle at one of the stable states according to the new initial condition. Therefore, testing the DNM of the bitcell is achieved by applying a current pulse to one of the internal data nodes and plotting the duration and amplitude of a pulse that causes a faulty readout of the cell. For the hold '1' state, the stability of the cell is ensured as long as the pull-down current of M4 successfully discharges QB after the noise ceases. If the voltage rises at QB, as long as M1 doesn't turn on, the voltage at Q does not change, and the overdrive of M4 is strong enough to discharge QB. This is further enhanced by the rise in DIBL current over M4 as QB rises. If the voltage falls at Q, a negative feedback loop is initiated, as the voltage at VVDD decreases (via charge sharing over M3), reducing the negative overdrive of M9 and increasing the leakage current to VVDD. M3 is conducting, so this current is directed towards Q, raising the voltage back and thus negating the discharge noise. In this case, there is virtually no way that QB will charge (and therefore result in an incorrect readout), as M9 is further cut off as QB rises, such that no charging current is provided and the DIBL over M4 will discharge the voltage rise.
For the hold '0' state, the stability of the cell is ensured as long as M4 does not discharge QB. Initially, Q is low (VQ = 0 V) and VQB = VVDD, at a voltage slightly lower than VDD. If the noise raises the voltage at Q, M1 is conducting with VGS = VVDD and this will easily discharge Q back to 0, unless the noise level approaches VT4, which will ultimately cause QB to discharge and the cell to flip. Again, an HVT implementation of the pull-down devices helps raise this noise margin. If the noise causes the voltage at QB to decrease (up to the cell's noise margin), a negative feedback loop will again help save the state. Q will stay discharged and VVDD will follow QB through M6, increasing the DIBL over M9 and replenishing the voltage at QB. The behavior described above is based on the assumption that the noise is dynamic rather than static. In other words, the noise pulse is finite and results in an initial state of charge in Q and QB, as opposed to a constant voltage, as in an SNM measurement. Figure 5 plots the current noise amplitude required to flip the Q-SRAM bitcell at various noise pulse durations and at process corners.
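The trade-off between pulse amplitude and duration can be read through a first-order charge balance. This is an illustrative approximation, not the model used for Figure 5: it assumes the disturbed node behaves as a fixed capacitance $C_{node}$, that the restoring (leakage or pull-down) current can be treated as a constant $I_r$, and that the cell flips once the node has moved by a critical voltage $V_{crit}$. All three symbols are introduced here for illustration only.

```latex
% Node voltage shift caused by a rectangular noise pulse of amplitude I_n and width t_n
\Delta V \approx \frac{(I_n - I_r)\, t_n}{C_{node}}

% Setting \Delta V = V_{crit} gives the smallest amplitude that flips the cell
I_{crit}(t_n) \approx I_r + \frac{C_{node}\, V_{crit}}{t_n}
```

Under these assumptions, short pulses require a large amplitude to deliver the critical charge $C_{node} V_{crit}$, while for long pulses the critical amplitude approaches the restoring current $I_r$.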
The stability of the Q-SRAM bitcell under global process variations and local mismatch is best shown by Monte Carlo statistical simulations. An example of such a distribution is shown in Figure 6. The figure shows the steady state voltages at the bitcell's internal data nodes (Q and QB) for 2500 simulations at a low supply voltage of 400 mV. For the hold '0' state, as expected, node Q is clearly discharged, whereas QB is degraded from the full rail of 400 mV. However, the majority of the samples are around 370 mV, and in all cases the voltage at QB is much higher than the opposite state at Q. For the hold '1' state, QB is discharged, while Q resides at a very low steady state voltage. These distributions can be tuned through the choice of VT implants and the sizing of the bitcell's devices.
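The structure of such a Monte Carlo run can be sketched as follows. This is a toy stand-in for the statistical circuit simulations, assuming a first-order subthreshold leakage model at VGS = 0, Gaussian threshold-voltage mismatch on M9 and M4 only, and a hold '0' level at QB set purely by the balance of those two leakage currents. The nominal thresholds, the 30 mV sigma, and the function names are hypothetical, so the resulting distribution is not expected to reproduce Figure 6.

```python
import math
import random

THERMAL_V = 0.026  # thermal voltage kT/q near room temperature, in volts


def leak(vds, vt, n=1.5, dibl=0.1):
    """Relative subthreshold leakage at VGS = 0 (arbitrary units)."""
    gain = math.exp((dibl * vds - vt) / (n * THERMAL_V))
    return gain * (1.0 - math.exp(-vds / THERMAL_V))


def steady_state_qb(vdd, vt9, vt4, tol=1e-6):
    """Bisect for the QB level where leakage in (M9) equals leakage out (M4)."""
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if leak(vdd - mid, vt9) > leak(mid, vt4):
            lo = mid  # net charging: balance point is higher
        else:
            hi = mid  # net discharging: balance point is lower
    return 0.5 * (lo + hi)


def monte_carlo_qb(vdd=0.4, runs=2500, sigma_vt=0.03, seed=1):
    """Sample threshold mismatch and collect the hold '0' level of QB."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    samples = []
    for _ in range(runs):
        vt9 = rng.gauss(0.30, sigma_vt)  # hypothetical LVT nominal for M9
        vt4 = rng.gauss(0.50, sigma_vt)  # hypothetical HVT nominal for M4
        samples.append(steady_state_qb(vdd, vt9, vt4))
    return samples


if __name__ == "__main__":
    qb = monte_carlo_qb()
    print(f"runs: {len(qb)}, min QB: {min(qb):.3f} V, mean QB: {sum(qb) / len(qb):.3f} V")
```

The quantity of interest is the low tail of the distribution: the worst-case QB level still has to turn on the readout transistor and stay well above the opposite node.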

Read and Write Stability
As previously mentioned, the read stability of the proposed cell is identical to the hold stability, due to the non-penetrating read through the read buffer. Further discussion of this stability is given in [12].
For write stability, we must again separately analyze the different situations shown in Figure 4. To write a '0', only charging QB must be considered, as Q is at least partially discharged (depending on the duration since the previous write), and the supply is cut off, ensuring a quick and complete discharge of Q through M2. The final state requires QB to be high, which is also achieved easily, as M4 is almost immediately cut off and the charge through M5 meets hardly any contention. Even if the write pulse is shortened, such that QB hasn't reached its final level, it will continue to charge as long as the pull-up to pull-down current ratio is above unity. To measure the stability of this operation, traditional static metrics are sufficient, even though they are pessimistic. A DC noise source was added in series with the WBLB writing voltage, degrading the write level until the write failed. For a worst case scenario, this was measured with a full level stored on Q, assuming successive writes without time for Q to discharge. Figure 7 shows the ratio of the 9T Q-SRAM cell's write margin (for writing a '0') as compared to a standard 8T cell. It is clear that the proposed cell has a significant advantage, especially at lower supply voltages, and this only increases as node Q discharges over time. To write a '1' successfully, QB needs to be discharged. The level at Q will eventually degrade in any case, so it is sufficient to look predominantly at the QB side. Again, there is no contention while performing this task, as M9 is cut off, gating the supply and enabling an easy pull-down of QB. This again significantly improves the write margin as compared to a standard 8T cell, and enables operation well below the 0.7 V supply voltage limitation.
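The static write-margin measurement described above (degrading the WBLB level until the write fails) amounts to a one-dimensional search. The sketch below assumes the write outcome is monotonic in the WBLB level and uses a hypothetical predicate `write_succeeds(v_wblb)` in place of a circuit simulation; the toy trip voltages in the usage example are invented for illustration and are not measured values.

```python
from typing import Callable


def write_margin(vdd: float,
                 write_succeeds: Callable[[float], bool],
                 tol: float = 1e-3) -> float:
    """Largest DC degradation (V) of the WBLB level that still writes a '0'.

    write_succeeds(v_wblb) is a hypothetical stand-in for a simulation run
    that reports whether QB ends up high when WBLB is driven to v_wblb.
    The search assumes that lowering WBLB can only make the write harder.
    """
    if not write_succeeds(vdd):
        return 0.0  # the write fails even with an undegraded bitline level
    lo, hi = 0.0, vdd  # lo: degradation known to pass, hi: upper search bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if write_succeeds(vdd - mid):
            lo = mid  # still writes: the margin is at least mid
        else:
            hi = mid  # write failed: the margin is below mid
    return lo


if __name__ == "__main__":
    vdd = 0.7
    # Toy cells: the write succeeds while WBLB stays above an invented trip voltage.
    margin_qsram = write_margin(vdd, lambda v: v >= 0.25)
    margin_8t = write_margin(vdd, lambda v: v >= 0.50)
    print(f"write margin ratio (toy numbers): {margin_qsram / margin_8t:.2f}")
```

Repeating the search at each supply voltage for both cells and dividing the two margins gives a ratio of the kind plotted in Figure 7.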

Implementation and Performance
The proposed cell was implemented and simulated in a low-power 40 nm TSMC technology, using only standard process steps and multiple VT implants. Simulations of stability, power dissipation and access times were performed at various supply voltages and under process variations. Post-layout simulations were performed for proof of concept. The final layout of the cell was integrated into a fully operational array and taped out as part of a 40 nm test chip, shown in Figure 8, along with the micrograph of the fabricated test chip. Preliminary functionality tests were performed on the test chip at supply voltages from 400 mV to 1.1 V, showing correct operation. Figure 9 shows an example of a write and read operation, as shown at the output of the test chip.

J. Low Powe
The stati standard 6T Figure 11a) These increa also achieve  8T bitcells are -functional in region.
This brief paper deals mainly with the concept and the stability of the proposed cell; however, the dynamic features were measured as well. As expected, the read access time for a '0' is longer than that of a standard 8T cell, due to the degraded voltage level on node QB. This, however, can be adjusted by using a low threshold transistor for M7 and by decreasing the bitline capacitance. At 700 mV with a minimum sized LVT M7, the read access time was 1.74× longer than that of a standard 8T cell. On the other hand, the write access time of the proposed cell is much shorter than that of a standard 6T or 8T cell. When writing a '0', the write access is 12× faster than that of a 6T or 8T cell at 1.1 V, and 5.9× faster at 700 mV. An overall comparison of figures of merit with standard 6T and 8T cells is given in Table 1.

Conclusions
A novel 9T Quasi-Static RAM bitcell was presented, and its operational concepts and stability issues were briefly discussed. Implementation of the concept in a low-power 40 nm CMOS process showed substantial improvements in leakage power, as well as functionality at low operating voltages. Quasi-static operation of the bitcell was discussed and dynamic noise margins were shown. Access times were mentioned, although full descriptions of the simulation setup and results will be provided in a future work. The proposed bitcell was found to be advantageous in power consumption, low-voltage operation and write access time, at the expense of cell area, read access time and robustness.
In addition to detailed descriptions of the dynamic performance of the proposed cell, further research will include post-silicon measurements of the 40 nm test chip.

Figure 2. (a) Low leakage state of the Q-SRAM bitcell. Both internal nodes are low, minimizing leakage currents, but ensuring the correct readout value ('1'); (b) Final voltages of internal nodes Q and QB in the low-leakage "hold 1" state at common process corners (VDD = 600 mV).

Figure 3. (a) Hold '0' state of the Q-SRAM cell. Node Q is discharged, whereas node QB is set according to the ratioed leakage between M4, M3 and M9; (b) Leakage ratio of M9 (with LVT implant) and M4 (with HVT implant) for various levels of QB. This ratio is substantially higher than 1 when QB is lower than 0.8 VDD.

Figure 4. (a) The Write '0' operation. The partially discharged Q is immediately discharged to ground, cutting off M4 and enabling a strong charge of QB through M5. VVDD is initially charged from QB through M6 and, subsequently, leakage currents charge QB to its steady state; (b) The Write '1' operation. As M9 is cut off, the discharge of QB through M5 is uncontested. Q is charged to a level slightly lower than VDD.

Figure 5. Dynamic Noise Margin of the Q-SRAM cell. The horizontal axis shows the width of the current noise pulse, while the vertical axis shows the amplitude of the current noise that causes the cell to settle at the opposite state at this pulse width. (a) The hold '1' state; (b) The hold '0' state.

Figure 6. Monte Carlo statistical distributions of the steady state voltages at nodes Q (left panel) and QB (right panel) for a 400 mV supply.

Figure 7. Ratio of the write margins of the 9T Q-SRAM cell as compared to a standard 6T or 8T cell at various supply voltages (to write a '0').