JLPEA 2012, 2(2), 143-154; doi:10.3390/jlpea2020143
Abstract: The need for ultra low power circuits has forced circuit designers to scale voltage supplies into the sub-threshold region, where energy per operation is minimized. The problem is that the traditional 6T SRAM bitcell, used for data storage, becomes unreliable at voltages below about 700 mV due to process variations and decreased device drive strength. In order to achieve reliable operation, new bitcell topologies and assist methods have been proposed. This paper provides a comparison of four different bitcell topologies using read and write VMIN as the metrics for evaluation. In addition, read and write assist methods were tested using the periphery voltage scaling techniques discussed in [4,5,6,7,8,9,10,11,12,13]. Measurements taken from a 180 nm test chip show read functionality (without assist methods) down to 500 mV and write functionality down to 600 mV. Using assist methods can reduce both read and write VMIN by 100 mV over the unassisted test case.
1. Introduction
As mobile devices become heavily energy constrained, the need for ultra low power circuits has emerged. In order to reduce energy consumption, voltage supplies are scaled down to take advantage of quadratic energy savings. The sub-threshold region (VDD < VT) has been shown to minimize energy per operation. Sub-threshold systems require Static Random Access Memory (SRAM) for storing data at these low voltages. The problem is that while logic has been shown to scale easily into the sub-threshold region, the traditional 6T SRAM bitcell becomes unreliable at voltages below 700 mV due to process variations and decreased device drive strength. SRAM devices are typically minimum sized, which further compounds this problem. As the capacity of SRAM arrays continues to increase, the stability (typically measured in terms of Static Noise Margin (SNM)) of the worst-case bitcell degrades. Therefore, in order for the minimum operating voltage (VMIN) of SRAMs to enter the sub-threshold regime, more robust bitcell designs or assist methods must be used.
One possible solution to this problem is to design a more robust bitcell topology capable of larger read and write margins. The downside to this strategy is that adding more transistors to the bitcell increases the total area of the array. The second strategy is to use various assist methods [4,5,6,7,8,9,10,11,12,13] to make the cell easier to read and write. This strategy results in a smaller area overhead but may require multiple voltage sources. In this work we analyze different bitcell topologies and assist methods to determine which is the most effective at reducing SRAM VMIN. In Section 2, we introduce a variety of sub-threshold bitcell topologies and explain the pros and cons of each. In Section 3 and Section 4, we present an overview of write and read assist methods, respectively, and explain how each method can be used to improve margins. Section 5 presents the results from a test chip, and Section 6 concludes.
2. Introduction of Sub-Threshold Bitcell Topologies
In a sub-threshold circuit, the supply voltage (VDD) is set below the threshold voltage (VT) of the transistors. This reduction in VDD results in a quadratic reduction in switching power. In addition, it reduces leakage power, which is especially important for SRAMs that contain thousands or millions of bitcells. The main limitations of sub-threshold circuits are their sensitivity to variation and slow speed. In the sub-threshold region, transistor currents vary exponentially with VT. This makes designing ratioed circuits such as SRAMs nearly impossible. Another problem is that the ION/IOFF current ratio is reduced, which can lead to read access failures on bitlines with excessive leakage. In order to combat these problems, new bitcell topologies have been introduced and are described below.
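The two scaling trends described above can be illustrated with a minimal sketch. It assumes the textbook relations for switching energy (proportional to VDD squared) and sub-threshold current (exponential in VT); the slope factor, the supply values, and the 30 mV VT shift are illustrative assumptions, not values taken from this work.

```python
import math

THERMAL_VOLTAGE = 0.026  # volts, kT/q near room temperature
SLOPE_FACTOR = 1.5       # assumed sub-threshold slope factor n (illustrative)


def switching_energy_ratio(vdd_scaled, vdd_nominal):
    """Ratio of switching energy (C * VDD^2) after the supply is scaled."""
    return (vdd_scaled / vdd_nominal) ** 2


def subthreshold_current_ratio(delta_vt):
    """Factor by which sub-threshold current changes when VT shifts by delta_vt volts."""
    return math.exp(-delta_vt / (SLOPE_FACTOR * THERMAL_VOLTAGE))


# Illustrative numbers only: a 1.0 V supply scaled to 0.5 V, and a +/-30 mV VT shift.
energy = switching_energy_ratio(0.5, 1.0)
spread = subthreshold_current_ratio(-0.03) / subthreshold_current_ratio(0.03)
print(f"switching energy at 0.5 V relative to 1.0 V: {energy:.2f}")
print(f"current spread between -30 mV and +30 mV VT shifts: {spread:.2f}x")
```

Under these assumptions a modest VT mismatch between two nominally identical devices changes their current ratio by a large factor, which is why ratioed circuits are hard to design in this regime.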
The 8T bitcell shown in Figure 1a adds a two-transistor read buffer to the conventional 6T bitcell in order to prevent the data from being disturbed during a read. In a normal read operation, the bitlines are precharged and the WL is pulsed high, causing the bitcell to discharge one of the bitlines. The problem with this is that if the node storing a “0” rises above the switching threshold of the right inverter (Figure 1a), then the cell could unintentionally flip. The 8T cell solves this problem by decoupling the data from the read operation; therefore the read SNM becomes the hold SNM. One weakness of this bitcell is that it still suffers from half-select instability, which occurs during a write when an unselected cell is read like a traditional 6T bitcell. Currently the best method to solve this problem in a bit-interleaved architecture is to use a read-before-write scheme. In this method the entire row is read, and then the data is written back into the unselected cells at the same time that new data is written to the selected cells.
The 10T bitcell (Figure 1b) uses Schmitt Trigger (ST) inverters to help improve the read static noise margin (RSNM). The NR2/NFR feedback transistors weaken the pull-down network when VR is high, increasing the switching threshold of the right inverter. This means that the VL node would have to pull up much higher during a read in order to flip the cell, resulting in higher read stability. This bitcell has been shown to have 1.56× higher read SNM compared to the conventional 6T bitcell. The downside to this topology is that the four extra transistors result in a 33% area penalty compared to the 6T bitcell.
We propose an 8T asymmetric Schmitt Trigger bitcell (Figure 2). This bitcell uses single-ended reading and asymmetric inverters, similar to the asymmetric 5T bitcell, to improve read margin. By using an asymmetrical design, the trip point of the ST inverter is increased, resulting in higher read stability. Because the 5T bitcell has only one access transistor, write assist methods must be used when trying to write a “1” into the bitcell. The advantage that this design has over the 5T bitcell is that it is written like a traditional 6T bitcell, which eliminates the need for write assist methods. The WL is pulsed high during both a read and a write, and the WWL is only pulsed high during a write. In simulation (Figure 3) this bitcell achieves 86% higher RSNM than the 6T cell and 19% higher RSNM than the 10T ST bitcell with no VT variation added.
In Figure 4, we compare statistical distributions of the read and hold static noise margins for each of the bitcells. The average hold static noise margin (HSNM) of the 6T and 8T bitcells is 222 mV, with the 10T ST slightly higher at 226 mV and the asymmetric ST slightly lower at 218 mV. However, it is interesting to note that the standard deviation of the HSNM is 2.5 mV for the 6T and 8T bitcells, 5.0 mV for the asymmetric ST, and 7.8 mV for the 10T ST bitcell. Therefore, as the number of bitcells increases, the HSNM of the worst-case bitcell in the 10T ST array will be lower compared to the other arrays. The average read static noise margin (RSNM) of the asymmetric ST is 88% higher than the 6T and 8% higher than the 10T ST. The 8T read distribution is the same as the hold distribution since the data is decoupled from the read operation. This assumes that the architecture of the 8T array does not interleave bits, or that a read-before-write scheme is implemented.
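The argument that a larger standard deviation hurts the worst-case cell as the array grows can be sketched numerically. The means and standard deviations below are the HSNM values quoted above; the Gaussian shape of the distribution and the use of the 1/N quantile as a stand-in for the weakest of N cells are assumptions made only for illustration.

```python
from statistics import NormalDist

# Mean and standard deviation of HSNM in mV, as quoted in the text above.
HSNM_STATS = {
    "6T/8T": (222.0, 2.5),
    "asymmetric ST": (218.0, 5.0),
    "10T ST": (226.0, 7.8),
}


def expected_worst_case(mean_mv, sigma_mv, num_cells):
    """Approximate the weakest cell of an array by the 1/num_cells quantile.

    Assumes a Gaussian HSNM distribution, which the text does not state.
    """
    return NormalDist(mean_mv, sigma_mv).inv_cdf(1.0 / num_cells)


for cells in (1024, 1024 * 1024):
    for name, (mean_mv, sigma_mv) in HSNM_STATS.items():
        worst = expected_worst_case(mean_mv, sigma_mv, cells)
        print(f"{cells:>8} cells, {name:<14}: {worst:6.1f} mV")
```

Under these assumptions the 10T ST array, despite having the highest mean, ends up with the lowest estimated worst-case HSNM once the array reaches a megabit.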
3. Write Assist Methods
A write failure occurs when the value stored in the bitcell cannot be flipped. For example, to write the bitcell in Figure 1, the bitline (BL) is held high and BLB is held low. In order for the internal state to flip, pass-gate transistor XR must be able to pull node QB below the switching threshold of the left inverter. A ratioed fight occurs between XR and PR; therefore, transistor PR is usually made weak to make writing easier. The downside to making the pull-up transistor minimum sized is that it increases the VT variation of this transistor.
The goal of write assist methods is to further weaken the pull-up transistor or strengthen the pass-gate transistor. There are several ways to accomplish this. The first is to increase the pass-gate to pull-up ratio; however, because we are operating in sub-threshold, sizing is not an efficient knob. The second method is to collapse VDD, which weakens the pull-up transistors [4,9,10]. The third and fourth methods involve strengthening the pass-gate transistors by either boosting the WL VDD or reducing the BL VSS [4,5,6,7,8,11,13]. These methods strengthen the pass-gate by increasing its VGS. The downside to boosting the WL VDD is that it reduces half-selected cell stability. The weakness of reducing the BL VSS is that it increases the BL swing, which increases the total write energy.
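As a rough illustration of why voltage-based assists are a stronger knob than sizing in sub-threshold, the sketch below models the ratioed fight between the pass-gate and the pull-up with a first-order exponential current model. The function name, the slope factor, and the assumption that both devices share the same I0 and VT are hypothetical simplifications; this model treats all three assists as equivalent and therefore cannot reproduce differences between them that are measured in silicon.

```python
import math

THERMAL_VOLTAGE = 0.026  # volts, kT/q near room temperature
SLOPE_FACTOR = 1.5       # assumed sub-threshold slope factor n (illustrative)


def write_strength_ratio(vdd, wl_boost=0.0, bl_undershoot=0.0, vdd_collapse=0.0):
    """Pass-gate current divided by pull-up current, normalized to 1 without assist.

    Assumes both devices are in sub-threshold with identical I0 and VT, so the
    ratio depends only on the difference between their gate drives.
    bl_undershoot is the number of volts the low bitline is driven below ground.
    """
    pass_gate_vgs = (vdd + wl_boost) + bl_undershoot  # WL high level minus BL low level
    pull_up_vsg = vdd - vdd_collapse                  # cell supply minus a gate held at 0 V
    return math.exp((pass_gate_vgs - pull_up_vsg) / (SLOPE_FACTOR * THERMAL_VOLTAGE))


for label, kwargs in [
    ("no assist", {}),
    ("WL VDD boost, 100 mV", {"wl_boost": 0.1}),
    ("BL VSS reduction, 100 mV", {"bl_undershoot": 0.1}),
    ("cell VDD collapse, 100 mV", {"vdd_collapse": 0.1}),
]:
    print(f"{label:<26}: {write_strength_ratio(0.5, **kwargs):.1f}x")
```

Because the dependence on gate drive is exponential, a 100 mV assist changes the strength ratio by roughly an order of magnitude in this model, whereas transistor width only scales current linearly.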
4. Read Assist Methods
Read failures can occur in two ways. The first is that the bitcell is flipped during a read operation (referred to as read failure). This occurs when the XL and NL1 transistors (Figure 1) are sinking a large amount of charge from the highly capacitive BL, and the Q node rises above the trip point of the right inverter. In order to increase read stability, the pull-down transistor is made stronger than the pass-gate. The second type of read failure occurs when the voltage difference between the BL and BLB is not large enough for the sense amp to determine the correct value (referred to as a read access failure). This happens in sub-threshold especially because the BL leakage current in unaccessed cells causes the BL voltage to droop. Because the ION/IOFF ratio is reduced in sub-threshold, it is feasible for the leakage current through the unaccessed rows to pull the BL low at the same rate that the on current is pulling BLB low. This leakage current can be reduced by having fewer bitcells share the same bitline or by using one of the assist methods discussed below.
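The read access limit described above can be sketched as a simple budget: the read current of one accessed cell has to exceed the summed leakage of every unaccessed cell on the opposite bitline. The function names, the required margin, and the example ION/IOFF ratios below are hypothetical and chosen only to show the trend.

```python
import math


def max_rows_per_bitline(ion_ioff_ratio, required_margin):
    """Largest row count for which the accessed cell's on-current is at least
    required_margin times the summed leakage of the unaccessed cells."""
    # Worst case: each of the (rows - 1) unaccessed cells leaks in the
    # direction that opposes the read, so we need
    #   ion >= required_margin * (rows - 1) * ioff
    return math.floor(ion_ioff_ratio / required_margin) + 1


def current_margin(ion_ioff_ratio, rows):
    """Ratio of read current to worst-case aggregate bitline leakage."""
    return ion_ioff_ratio / (rows - 1)


# Illustrative ION/IOFF ratios, from a strong-inversion-like value down to a
# heavily degraded sub-threshold value.
for ratio in (1e5, 1e3, 1e2):
    rows = max_rows_per_bitline(ratio, 10.0)
    margin = current_margin(ratio, 128)
    print(f"ION/IOFF = {ratio:.0e}: up to {rows} rows at 10x margin; "
          f"margin with 128 rows = {margin:.1f}x")
```

A margin below 1 for a given column height corresponds to the case in the text where leakage can discharge the BL as fast as the accessed cell discharges BLB.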
There are two goals involved in read assist methods. The first is to improve the stability of the cross-coupled inverters during the read by either raising the bitcell VDD or reducing its VSS [4,5,7,8,9,10]. While raising bitcell VDD has been shown to result in larger gains in RSNM, the advantage of reducing the bitcell VSS is that it significantly reduces read delay due to the body effect strengthening both the pull-down and pass-gate transistors. The second goal is to improve read access by increasing the read current (ION) and reducing the BL leakage in unaccessed cells (IOFF). The read current can be increased by boosting the WL VDD. The downside here is that strengthening the pass-gate reduces the stability of the cross-coupled inverters. In order to reduce bitline leakage current, the WL VSS is reduced to a negative voltage.
5. Test Chip Results
To compare bitcell topologies for sub-threshold operation and to test assist features, we implemented a test chip that was fabricated in MITLL 180 nm FDSOI. This technology is specifically optimized for sub-threshold operation by using an undoped channel to reduce capacitance and improve VT control. In addition, the gate spacer is widened and the source/drain extensions are removed, which has only a small impact on ION due to low VDS barrier. These optimizations result in a 50× reduction in energy-delay product compared to bulk silicon. As shown in Figure 6, the chip contains four SRAM arrays, with each array containing two 4 Kb banks. Each bank is 128 rows by two 16-bit words. The 6T and 8T cells are sized iso-area; the ST and asymmetric ST bitcells are also iso-area and suffer a 33% area penalty over the 6T and 8T bitcells. In order to easily test the read and write assist methods, peripheral and bitcell array voltages are controlled by separate supplies. The output pads used level converters to convert from sub-threshold to super-threshold in order to ensure that the data could be read by the Logic Analyzer. Because the main objective was reducing VMIN, the chip was tested at 20 kHz to ensure that timing errors would not occur.
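As a quick consistency check on the array organization described above, the bank capacity follows directly from the stated dimensions. The per-array and per-chip totals are derived here from those figures and are not quoted in the text.

```latex
\begin{align*}
\text{bits per bank}  &= 128 \times 2 \times 16 = 4096 = 4\ \text{Kb} \\
\text{bits per array} &= 2 \times 4\ \text{Kb} = 8\ \text{Kb} \\
\text{bits on chip}   &= 4 \times 8\ \text{Kb} = 32\ \text{Kb}
\end{align*}
```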
The test setup used LabVIEW to control Keithley 2400 Source Meters, together with a Tektronix TLA7012 Logic Analyzer to handle the input and output signals. To determine the minimum data retention voltage (DRV), the memory is written with a known value, the voltage is dropped below nominal and then raised back to nominal, and the data is read back out. The DRV is defined as the minimum voltage at which the memory retains its data. The second metric, write VMIN, is determined in a similar way. First a known value is written at nominal VDD, then the voltage is dropped and the opposite value is written. Next the voltage is raised back to nominal and the data is read back out. To determine read VMIN, a known value is written at nominal VDD, then the voltage is dropped and the data is read back out. Each of the tests described above is an iterative process, with the voltage dropping lower at each step until it is close to ground.
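The three measurement procedures above can be sketched as a voltage sweep around a pass/fail test. Everything in this example is a hypothetical stand-in: the `SimulatedSram` class replaces the real chip and instruments, its failure thresholds are made-up values, the nominal supply is assumed, and the sweep stops at the first failure rather than continuing to near ground as the measurements do.

```python
NOMINAL_V = 1.0  # assumed nominal supply; the text does not state one


class SimulatedSram:
    """Hypothetical stand-in for the chip under test; thresholds are made up."""

    def __init__(self, retention_v=0.30, write_v=0.60, read_v=0.45):
        self.retention_v = retention_v
        self.write_v = write_v
        self.read_v = read_v
        self.vdd = NOMINAL_V
        self.stored = None

    def set_vdd(self, volts):
        self.vdd = volts
        if volts < self.retention_v:
            self.stored = None  # contents are lost below the retention voltage

    def write(self, value):
        if self.vdd >= self.write_v:
            self.stored = value  # the write only takes effect above write VMIN

    def read(self):
        return self.stored if self.vdd >= self.read_v else None


def retention_passes(sram, volts):
    sram.set_vdd(NOMINAL_V)
    sram.write(1)
    sram.set_vdd(volts)  # drop the supply, then restore it
    sram.set_vdd(NOMINAL_V)
    return sram.read() == 1


def write_passes(sram, volts):
    sram.set_vdd(NOMINAL_V)
    sram.write(0)
    sram.set_vdd(volts)
    sram.write(1)  # write the opposite value at the reduced supply
    sram.set_vdd(NOMINAL_V)
    return sram.read() == 1


def read_passes(sram, volts):
    sram.set_vdd(NOMINAL_V)
    sram.write(1)
    sram.set_vdd(volts)
    return sram.read() == 1  # read back at the reduced supply


def lowest_passing_voltage(test, sram, step_mv=10):
    """Step the supply down from nominal and return the last voltage that passed."""
    lowest = None
    for millivolts in range(int(NOMINAL_V * 1000), 0, -step_mv):
        if not test(sram, millivolts / 1000):
            break
        lowest = millivolts / 1000
    return lowest


sram = SimulatedSram()
print("DRV:", lowest_passing_voltage(retention_passes, sram))
print("write VMIN:", lowest_passing_voltage(write_passes, sram))
print("read VMIN:", lowest_passing_voltage(read_passes, sram))
```

The only difference between the three tests is the voltage at which the write and the read-back take place, which is what separates retention, write, and read failures.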
Because the test chip was fabricated during the first run of a new technology (MITLL 180 nm FDSOI), the yield was not ideal. We found full columns to be non-functional as well as a relatively high number of random bit failures. However, even with the non-ideal yield we were able to obtain some interesting results. The first result was that the SRAM proved to be write limited, meaning that the write VMIN exceeded the read VMIN. At 80% yield, the best-case write VMIN was 620 mV and the best-case read VMIN was 440 mV. The 80% yield point was chosen because the yield of some of the arrays was below 90% even at nominal voltage; using 80% lets us capture the trends of the various assist methods while negating the effect of these outliers. The 8T bitcell offered the lowest read VMIN, which is surprisingly only 10% lower than that of the other three bitcells. This is interesting because in simulation, the RSNM of the asymmetric ST and 10T ST bitcells was much higher than that of the 6T bitcell. What we observed was a discrepancy between the SPICE models and the silicon data. This is most likely due to the technology being relatively immature during its first fabrication run. As a result, it was difficult to compare bitcell topologies, which ended up producing very similar results in silicon. The cause of these discrepancies is not yet fully understood, and more research will be necessary to identify the source of error.
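One way to read a "VMIN at 80% yield" figure is as the lowest supply at which at least 80% of the bits still pass. The sketch below extracts that value from a table of pass fractions. The function name and the measurement table are made up for illustration, and the exact definition used for the measurements in this work may differ.

```python
def vmin_at_yield(pass_fraction_by_mv, target_yield=0.80):
    """Lowest supply (in mV) such that it and every higher tested supply
    meet the target yield; None if even the highest supply falls short."""
    vmin = None
    for millivolts in sorted(pass_fraction_by_mv, reverse=True):
        if pass_fraction_by_mv[millivolts] < target_yield:
            break
        vmin = millivolts
    return vmin


# Made-up data: fraction of bits that pass at each supply voltage (mV).
example = {700: 0.88, 650: 0.86, 600: 0.81, 550: 0.62, 500: 0.30}
print(vmin_at_yield(example, 0.80))
print(vmin_at_yield(example, 0.90))
```

With this made-up data a 90% target is never met, which mirrors the situation described above where some arrays yield below 90% even at nominal voltage.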
Although bitcell measurements yielded inconclusive results, we can still evaluate assist features. The results from the different write assist methods are shown in Figure 7 and Table 1. Based on these results, we conclude that BL VSS reduction is the most effective method for reducing write VMIN. This method outperforms the WL VDD boost method across each of the bitcells. It is interesting to note that the 6T bitcell and asymmetric ST bitcell achieve the lowest write VMIN at 430 mV, a reduction of 190 mV compared to the best case without assist methods.
As seen in Figure 8a, the WL VSS reduction resulted in a 100 mV reduction in read VMIN for each of the bitcells. The interesting trend with this plot is that each of the bitcells had almost identical read VMIN values. This would suggest using a combination of the 6T bitcell and WL VSS reduction is the most area efficient strategy for reducing read VMIN. Based on the results from Figure 8b, reducing WL VSS and bitcell VSS consistently improved the read VMIN for each of the bitcells. This suggests that bitline leakage was a major contributor to reduced read margin. It is also interesting to note that increasing the bitcell VDD had the greatest impact on the 10T ST bitcell and WL VDD boosting had the most positive effect on the 8T bitcell. Again, process features in the new technology most likely masked the effects of topological differences in the cells.
Figure 9a shows that as the cell VDD is boosted above 100 mV, the effect it has on reducing read VMIN degrades. Increasing the cell VDD from 100 mV to 200 mV results in only a 9% average reduction in read VMIN. Figure 9b shows that reducing the cell VSS below −100 mV actually results in an increase in read VMIN. This is likely due to the forward biasing of the source to bulk junction. Not shown in Figure 9 is the effect of increasing the WL VDD and VSS from 100 to 200 mV, because this increase had no effect on the read VMIN. This is most likely due to the fact that the measured data retention voltage (DRV) ranged from 300 to 350 mV.
The results in Figure 10 show the effect of raising the assist voltage above 100 mV and are measured at a yield of 70%. As seen in Figure 10a, as the WL VDD is boosted up to 200 mV greater than nominal VDD, the write VMIN of the 10T ST and the 8T bitcells improves consistently. However, the 6T bitcell sees no improvement in VMIN as the WL VDD is boosted above 100 mV. Reducing the BL VSS below −100 mV has a significant effect on reducing the write VMIN. For the 8T bitcell, a reduction from −100 mV to −150 mV results in a 26% reduction in VMIN. However, further reducing the BL VSS to −200 mV does not have a significant effect on reducing VMIN. Based on this data we conclude that using a combination of the 6T bitcell and negative BL VSS is the most area-efficient strategy for reducing write VMIN.
6. Conclusions
In this paper we present a novel asymmetric ST bitcell which uses single-ended reading to achieve 86% higher RSNM than the 6T cell and 19% higher RSNM than the 10T ST bitcell in simulation. Although the asymmetric ST and 10T ST bitcells offer improved read stability, silicon results in the first run of a 180 nm FDSOI process showed read VMIN comparable to the 6T bitcell. Therefore it would be interesting to repeat this analysis in a more mature technology, to determine whether the discrepancy was caused by the SPICE models or by faults in the immature process. The second contribution of this paper is a comparison of different read and write assist methods and various sub-threshold bitcell topologies. One important observation is that when an effective assist method is chosen, the bitcell topology has much less of an impact on VMIN. Therefore the bitcell topology with less leakage and/or less area might be the optimum one across all the trade-offs. Another important observation is that sub-threshold bitcells proved to be write limited, with unassisted write VMIN 41% higher than read VMIN. This trend has been shown to be especially true in newer technologies. In terms of write assist methods, the BL VSS reduction is the most effective, providing a 46% improvement in write VMIN at −200 mV. Reducing WL VSS or bitcell VSS provided the largest reduction in read VMIN, at 26%. Based on our results, we conclude that using assist methods, as opposed to designing new bitcell topologies, is more effective at reducing SRAM VMIN.
We would like to thank MITLL for their help and support in the completion of this work.
- Wang, A.; Chandrakasan, A.; Kosonocky, S. Optimal Supply and Threshold Scaling for Sub-threshold CMOS Circuits. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI 2002, Pittsburgh, PA, USA, 25-26 April 2002; pp. 7–11.
- Mann, R.W.; Nalam, S.; Wang, J.; Calhoun, B.H. Limits of bias based assist methods in nano-scale 6T SRAM. In Proceedings of the 11th International Symposium on Quality Electronic Design, San Jose, CA, USA, 22-24 March 2010; pp. 1–8.
- Seevinck, E.; List, F.J.; Lohstroh, J. Static-noise margin analysis of MOS SRAM cells. IEEE J. Solid-State Circuits 1987, 22, 748–754.
- Hirabayashi, O.; Kawasumi, A.; Suzuki, A.; Takeyama, Y.; Kushida, K.; Sasaki, T.; Katayama, A.; Fukamo, G.; Fujimura, Y.; Nakazato, T.; Shizuki, Y.; Kushiyama, N.; Yabe, T. A process-variation-tolerant dual-power-supply SRAM with 0.179 μm2 cell in 40 nm CMOS using level-programmable wordline driver. In Proceedings of the International Solid-State Circuits Conference, San Francisco, CA, USA, 8-12 February 2009; pp. 458–459.
- Wang, D.P.; Liao, H.J.; Yamauchi, H.; Chen, Y.H.; Lin, Y.L.; Lin, S.H.; Liu, D.C.; Chang, H.C.; Hwang, W. A 45 nm dual-port SRAM with write and read capability enhancement at low voltage. In Proceedings of IEEE International SOC Conference, Hsin Chu, Taiwan, 26-29 September 2007; pp. 211–214.
- Nii, K.; Yabuuchi, M.; Tsukamoto, Y.; Ohbayashi, S.; Oda, Y.; Usui, K.; Kawamura, T.; Tsuboi, N.; Iwasaki, T.; Hashimoto, K.; Makino, H.; Shinohara, H. A 45-nm single-port and dual-port SRAM family with robust read/write stabilizing circuitry under DVFS environment. In Proceedings of IEEE Symposium on VLSI Circuits, Honolulu, HI, USA, 18-20 June 2008; pp. 212–213.
- Chen, Y.H.; Chan, W.M.; Chou, S.Y.; Liao, H.J.; Pan, H.Y.; Wu, J.J.; Lee, C.H.; Yang, S.M.; Liu, Y.C.; Yamauchi, H. A 0.6 V 45 nm adaptive dual-rail SRAM compiler circuit design for lower VDD min VLSIs. In Proceedings of IEEE Symposium on VLSI Circuits, Honolulu, HI, USA, 18-20 June 2008; pp. 210–211.
- Chung, Y.; Song, S.H. Implementation of low-voltage static RAM with enhance data stability and circuit speed. Microelectron. J. 2009, 40, 944–951.
- Zhang, K.; Bhattacharya, U.; Chen, Z.; Hamzaoglu, F.; Murray, D.; Vallepalli, N.; Wang, Y.; Zheng, B.; Bohr, M. A 3-GHz 70-Mb SRAM in 65-nm CMOS technology with integrated column-based dynamic power supply. IEEE J. Solid-State Circuits 2006, 41, 146–151.
- Yamaoka, M.; Osada, K.; Ishibashi, K. 0.4V logic-library-friendly SRAM array using rectangular-diffusion cell and delta-boosted-array voltage scheme. IEEE J. Solid-State Circuits 2004, 39, 934–940.
- Iijima, M.; Seto, K.; Numa, M.; Tada, A.; Ipposhi, T. Low power SRAM with boost driver generating pulsed word line voltage for sub-1 V operation. J. Comput. 2008, 3, 34–40.
- Yang, H.S.; Wong, R.; Hasumi, R.; Gao, Y.; Kim, N.S.; Lee, D.H.; Badrudduza, S.; Nair, D.; Ostermayr, M.; Kang, H.; Zhuang, H.; Li, J.; Kang, L.; Chen, X.; Thean, A.; Arnaud, F.; Zhuang, L.; Schiller, C.; Sun, D.P.; Teh, Y.W.; Wallner, J.; Takasu, Y.; Stein, K.; Samavedam, S.; Jaeger, D.; Baiocco, C.V.; Sherony, M.; Khare, M.; Lage, C.; Pape, J.; Sudijono, J.; Steegen, A.L.; Stiffle, S. Scaling of 32 nm low power SRAM with high-k metal gate. In Proceedings of International Electron Devices Meeting, San Francisco, CA, USA, 15-17 December 2008; pp. 1–4.
- Shibata, N.; Kiya, H.; Kurita, S.; Okamoto, H.; Tan’no, M.; Douseki, T. A 0.5 V 25 MHz 1-mW 256-kb MTCMOS/SOI SRAM for solar-power-operated portable personal digital equipment—sure write operation by using step-down negatively overdrive bitline scheme. IEEE J. Solid-State Circuits 2006, 41, 728–742.
- Calhoun, B.H.; Chandrakasan, A.P. A 256 kb sub-threshold SRAM in 65 nm CMOS. In Proceedings of the International Solid-State Circuits Conference, San Francisco, CA, USA, 6-9 February 2006; pp. 2592–2601.
- Verma, N.; Chandrakasan, A.P. A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy. IEEE J. Solid-State Circuits 2008, 43, 141–149.
- Kulkarni, J.P.; Kim, K.; Roy, K. A 160 mV Robust Schmitt Trigger Based Subthreshold SRAM. IEEE J. Solid-State Circuits 2007, 42, 2303–2313.
- Nalam, S.; Calhoun, B.H. Asymmetric sizing in a 45nm 5T SRAM to improve read stability over 6T. In Proceedings of IEEE Custom Integrated Circuits Conference, San Jose, CA, USA, 13-16 September 2009; pp. 709–712.
- Vitale, S.A.; Wyatt, P.W.; Checka, N.; Kedzierski, J.; Keast, C.L. FDSOI process technology for subthreshold-operation ultralow-power electronics. Proc. IEEE 2010, 98, 333–342.
- Bhavnagarwala, A.; Kosonocky, S.; Radens, C.; Stawiasz, K.; Mann, R.; Ye, Q.; Chin, K. Fluctuation limits and scaling opportunities for CMOS SRAM cells. In Proceedings of International Electron Devices Meeting, Washington, DC, USA, 5 December 2005; pp. 659–662.
© 2012 by MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).