Stable Local Bit-Line 6 T SRAM Architecture Design for Low-Voltage Operation and Access Enhancement

To improve the memory interface and speed up access to static RAM in near-threshold operation, a stable local bit-line static random-access memory (SRAM) architecture is proposed together with a low-voltage pre-charge and negative local bit-line (NLBL) scheme. The low-voltage pre-charge and NLBL scheme is operated by the write bit-line column to handle the write half-select condition. The proposed local bit-line SRAM design reduces variations, enhances the read stability and the write ability, and prevents bit-line leakage current, and the designed pre-charge circuit achieves an optimal pre-charge voltage during near-threshold operation. Compared to the conventional 6 T SRAM design, the optimal pre-charge voltage improves the read static noise margin (RSNM) by up to 15%, and the write delay of the proposed, energy-efficient NLBL SRAM design is improved by up to 22%. At a 400 mV supply voltage and a 25 MHz operating frequency, the read and write energy consumption is 0.22 pJ and 0.23 pJ, respectively. The access average energy (AAE) is lower than that of the related works, and the proposed local bit-line SRAM achieves the highest figure of merit (FoM) among them. The designed architecture has been implemented as 1-Kb SRAM macros in TSMC 40 nm GP process technology.


Introduction
Modern electronics are merging into smart technologies such as the Internet of Things (IoT), automotive electronics, biomedical electronics, and sensor devices. These applications call for integrated circuits with low power consumption, low leakage current, and compact area [1]. However, with the development of nanometer (nm) process technology, leakage current has become a major problem in the system on chip (SoC) [2]. Modern microprocessors require more embedded memory to meet system specifications for compact area, low energy, and low power consumption [3,4]. Because large memories are used for data storage in the SoC, the embedded memory consumes most of the power, and circuit designers must pay attention to reducing its power consumption. However, the demanding design and size constraints of memory make it much more difficult than in general logic circuits to minimize the operating voltage [5][6][7]. As a result, several designers have added assist circuits to decrease power consumption and increase operating stability at low supply voltages [8].
The conventional (conv.) 6 T SRAM performs poorly at low supply voltage because of pseudo-read errors in the half-select condition. A read-decoupled 8 T SRAM [9,10] architecture was presented to solve the read error; it reduced the read error and slightly increased the RSNM. However, the memory cell is still affected by the read error during the write operation. By adding stacked access transistors, several SRAM architectures, including the 9 T SRAM [11][12][13], 10 T SRAM [14,15], and 12 T SRAM [16], have been proposed for near-threshold/sub-threshold operation to address read errors in half-select conditions. As a result of the stacked access transistors, however, the write operation becomes weak. The bit-line of an SRAM is strongly affected by large parasitic capacitance during read and write operations. The Average-8 T SRAM [17], consisting of local and global bit-lines, was proposed to solve the parasitic problem. However, its local and global bit-lines cannot achieve full swing, and the design suffers from weak write ability, slow operation, and high power consumption. The full-swing local bit-line SRAM architecture [18] was proposed but still has poor write ability because two cascaded transistors control the bit cell. A 10 T SRAM [19] that stacks pull-down transistors in the cross-coupled inverter and uses VGND technology was proposed to change the write path, but its write ability is still inefficient.
After analyzing the relationship between read noise and pre-charge voltage of the bit-line pair, a new local bit-line 6 T SRAM architecture has been proposed to improve the noise margin of SRAM, capacitive density, read stability, as well as write capacity.

Proposed Local Bit-Line SRAM Architecture
A modern 6 T SRAM architecture is proposed to make the performance more robust; its block diagram is shown in Figure 1. The proposed local bit-line SRAM architecture consists of four bits of 6 T cells, two assist circuits for optimal pre-charge (OPC), and the NLBL framework.

Optimal Pre-Charge Circuit and Read Operation
To facilitate the read operation, an OPC circuit is attached to the proposed SRAM design, as shown in Figure 2. The OPC circuit is controlled by the transistors LPL, LPR, NCL, NCR, and EQ. Initially, the block selection lines are set to BLKB [0] = 0 and BLK [0] = 1 to turn off the pre-charge circuit. When BLKB [0] = 1, the optimal voltages (Vopt) are obtained by the transistors LPL and LPR using the following equation:

Four memory cells are connected to the local bit-line pair LBL [00] and LBLB [00], which minimizes the parasitic capacitance on the bit-line and reduces the read error. When the memory cell reads data "1", the local bit-line LBLB [00] is discharged into the cell through the transistor T4. Because the transistor T3 and the transistor NCL carry only a small current and only four memory cells share the local bit-line, the leakage currents decrease significantly. To increase the read stability and improve the RSNM, the global read bit-line (GRBL/GRBLB) avoids the memory cell leakage currents. The data are read out to the sense amplifier (SA) by charging the global read bit-line GRBLB [0] through transistor T2; the small parasitic capacitance there makes the read operation faster. The voltage of the block selection lines BLKB [0] and BLK [0] is not less than Vopt, so the low-voltage pre-charge circuit maintains the optimal voltage (Vopt) and ensures the stability of the read operation.

… '1', to switch on transistors T5 and T6. The other side of the bit-line pair LBL/LBLB is connected through NCL/NCR to VDD, and the word-line WL [0] is set to '1' to write to the memory cell. By sharing the charging capacitor, the memory cells are discharged via VGND, which improves the write ability in near-threshold operation. For instance, the first cell discharges through T3 to LBL [00] to pass data '0', while LBLB [00] is held at VDD to provide the additional data. Because LBL [00] is connected to the NLBL in this process, the memory cell data flips faster during the write operation, while RWL [0] still holds '0'. With its differential read-write function, the proposed local bit-line SRAM achieves high speed and reduced parasitic capacitance and resistance, which makes it energy-efficient.

Half-Select Condition Operation
Figure 3 illustrates the half-select condition for the row blocks of the proposed local bit-line SRAM. During the write operation of BLOCK 0, the transistors T3 and T4 pass the data of Q and QB, as indicated in sky blue. Initially, the word-line WL0 = 1 becomes active (shown in red), with Q storing '1' and QB storing '0'. The global write bit-lines GWBL [0] = 1 and GWBLB [0] = 0 select the data and activate the transistors T5 and T6, and the local bit-line LBL [00] is then discharged through the transistor T5 to the negative voltage of VGND [0]. After LBL [00] is discharged, the data in the memory cell flip to Q = 0 and QB = 1. In BLOCK 1, the half-selected cell in the same row is affected by the word-line WL0, which causes a pseudo-read on LBL [10] and LBLB [10], as indicated in orange. The read word-line RWL [0] remains at "0" and the row half-selected blocks are not charged by the global read bit-lines (GRBL/GRBLB), so the low-voltage pre-charge scheme reduces read disturb and data flips in the memory cell. The proposed structure injects less pre-charge into the cell and reduces the capacitance of the local bit-line, which lowers the leakage power consumption, as shown by the purple dotted line.

Figure 4 shows the RSNM simulation results for the 6 T SRAM cell as a function of the bit-line pre-charge voltage at different supply voltages. Each curve reaches its maximum RSNM at a specific bit-line voltage that depends on the supply voltage. This simulation result is the basis for the proposed local bit-line design.

Figure 5 compares the simulated RSNM of the proposed local bit-line SRAM and the conv. 6 T SRAM at various operating voltages. The RSNM of the memory cell decreases at low voltages. The proposed local bit-line SRAM obtains a strong RSNM by using the low-voltage pre-charge scheme during the read operation.
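The observation that each RSNM curve peaks at one specific bit-line pre-charge voltage suggests a simple way to pick Vopt from a simulation sweep: take the sweep point with the largest RSNM. The following is a minimal sketch under that assumption. The function name `optimal_precharge` and the sweep values are hypothetical placeholders for illustration only; they are not taken from Figure 4.

```python
from typing import Sequence, Tuple


def optimal_precharge(sweep: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (pre-charge voltage, RSNM) pair with the largest RSNM.

    `sweep` holds (v_precharge_volts, rsnm_volts) points from one supply voltage.
    """
    if not sweep:
        raise ValueError("sweep must contain at least one point")
    return max(sweep, key=lambda point: point[1])


# Placeholder sweep for a single supply voltage (illustrative numbers only,
# NOT simulation results from the paper).
example_sweep = [
    (0.20, 0.050),
    (0.25, 0.062),
    (0.30, 0.071),
    (0.35, 0.066),
    (0.40, 0.055),
]

v_opt, rsnm_max = optimal_precharge(example_sweep)
print(f"Vopt = {v_opt:.2f} V, RSNM = {rsnm_max * 1000:.0f} mV")
```

In practice the sweep would be repeated for each supply voltage, since the peak position shifts with the supply.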

The Comparison of Monte Carlo Simulation
For global variations, the process-voltage-temperature (PVT) corners, which combine the extreme cases of these variables, are commonly used to verify performance. To show the improvement in read stability, 10,000 Monte Carlo post-simulation runs were performed. At a 400 mV supply voltage, the result for the conventional 6 T SRAM cell is shown in Figure 6a. During the read operation, the charge on the bit-line pair of the conv. 6 T SRAM disturbed the data stored in the memory cell, which produced several read errors and data flips. In contrast, the proposed local bit-line SRAM uses the low-voltage pre-charge scheme to optimize the pre-charge voltage and keeps the cell stable without any read errors at the same supply voltage, as shown in Figure 6b.
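The way a Monte Carlo run is summarized can be sketched in a few lines: each run yields one read noise margin, and a run counts as a read failure when that margin is not positive. The sketch below is a toy stand-in, not the circuit-level simulation used in the paper. The Gaussian margin model, the function names, and the mean and sigma values are all assumptions chosen for illustration.

```python
import random
from typing import List, Sequence


def simulate_rsnm_mv(n_runs: int, mean_mv: float, sigma_mv: float, seed: int = 0) -> List[float]:
    """Draw n_runs toy RSNM samples (in mV) from a Gaussian (assumed model)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(mean_mv, sigma_mv) for _ in range(n_runs)]


def read_failure_rate(rsnm_mv: Sequence[float]) -> float:
    """Fraction of runs whose read margin is zero or negative."""
    if not rsnm_mv:
        raise ValueError("need at least one sample")
    failures = sum(1 for margin in rsnm_mv if margin <= 0.0)
    return failures / len(rsnm_mv)


# 10,000 runs, matching the run count in the text; mean and sigma are
# hypothetical placeholders, not values from the paper.
samples = simulate_rsnm_mv(10_000, mean_mv=60.0, sigma_mv=20.0)
rate = read_failure_rate(samples)
print(f"toy read-failure rate: {rate:.4%}")
```

A real flow would replace `simulate_rsnm_mv` with margins extracted from the circuit simulator's Monte Carlo output.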

The Comparison of Write Ability Simulation
The write-ability simulation results at different operating voltages are compared in Figure 7. The differential write operation is evaluated by the write speed of data "0" and "1"; the write operation is affected by the transition point of the cross-coupled inverter. Because the proposed design includes the NLBL technique, the write ability of the memory cells is improved. Compared to the conv. 6 T SRAM, the write speed of the proposed design is improved by about 22% at the 400 mV operating voltage.

Figure 8 compares the bit-line swing of the conv. 6 T SRAM and the proposed design at a 400 mV supply voltage. During the read operation, the voltage difference of the bit-line pair is affected by the leakage current of all column half-selected cells. For the conv. 6 T SRAM, the voltage difference is less than 200 mV at 128 bits, so the bit-line swing is too small for the SA to operate reliably. The proposed local bit-line SRAM maintains a voltage difference of more than 300 mV with the bit-line depth changed to 256 bits.

The Comparison of Leakage Power
The static leakage power consumption of the memory cell is shown in Figure 9. Although the proposed architecture increases the number of metal-oxide-semiconductor field-effect transistors (MOSFETs), it effectively reduces the bit-line leakage current and the static leakage power consumption compared to the conv. 6 T SRAM.

Chip Implementation and Result Comparison
The proposed local bit-line 6 T SRAM is designed around its read and write operations, neither of which may destroy the logic value stored in the cell. The layout area and the cell size of the SRAM have a great impact on the proposed design. The memory cell size and area are compared in Table 2. The area of the 1 kb conv. 6 T SRAM is still smaller than that of the 8 T and the proposed 6 T designs, whereas the proposed 6 T SRAM cell performs better in read and write operations. The area of the proposed local bit-line SRAM is 7.65 µm², as shown in Figure 10a, and the layout of the proposed design in TSMC 40 nm GP process technology is shown in Figure 10b.

Table 2. Comparison of the SRAM area.

The implemented 1 kb SRAM macro (128 rows × 8 columns) has 32 blocks in each column, and each block consists of 4 memory cells.

Using the Hspice EDA tool, the post-layout simulation waveform at 400 mV/25 MHz is shown in Figure 11. WRITE_EN is the write control signal; when WRITE_EN is "1", it enables the GWBL/GWBLB decoder, the write driver circuit, and the NLBL scheme circuits. First, the input data is set to "1" to test the write operation. The selected memory cell stores "1" during the write cycle and is then read; the sense amplifier starts operating and pulls the SA output and the latch up to "1", which shows that both the read and the write operation succeeded. When WRITE_EN is "0", the read operation starts. Similarly, when the write data is "0", the selected memory cell stores "0", and when it is read, the SA output and the latch are pulled down to "0". Together, the reduced read errors and the strengthened write ability enable the proposed local bit-line SRAM to operate faster in the near-threshold region.
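The macro organization described above (128 rows × 8 columns, 32 blocks per column, 4 cells per block) can be checked with a short sketch that maps a row address to a local bit-line block and a cell position inside it. The contiguous grouping of rows into blocks and the name `locate_cell` are assumptions made for illustration; the paper does not state how rows are assigned to blocks.

```python
from typing import Tuple

ROWS = 128
COLUMNS = 8
CELLS_PER_BLOCK = 4
BLOCKS_PER_COLUMN = ROWS // CELLS_PER_BLOCK  # 128 / 4 = 32 blocks per column
TOTAL_BITS = ROWS * COLUMNS                  # 1024 bits = 1 kb


def locate_cell(row: int, column: int) -> Tuple[int, int]:
    """Map a (row, column) address to (block index, cell index within block).

    Assumes rows are grouped into blocks contiguously (hypothetical mapping).
    """
    if not (0 <= row < ROWS and 0 <= column < COLUMNS):
        raise ValueError("address out of range")
    block, cell = divmod(row, CELLS_PER_BLOCK)
    return block, cell


block, cell = locate_cell(row=5, column=0)
print(f"row 5 -> block {block}, cell {cell}; macro holds {TOTAL_BITS} bits")
```

Under this mapping, only the four cells of one block load a local bit-line, which is the property the local bit-line scheme relies on to keep the parasitic capacitance small.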
Figure 11. Post-layout simulation waveform for SRAM Chip @400 mV/25 MHz/TT corner.

Table 3 compares the performance of the proposed local bit-line SRAM architecture with previous works. In summary, the proposed local bit-line SRAM has a smaller average energy consumption than the MINI-Array [21] and the highest FoM.

[22] is normalized to the proposed local bit-line SRAM.
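As a rough sanity check on the energy figures, the per-access read and write energies quoted in the abstract (0.22 pJ and 0.23 pJ at 25 MHz) can be converted into an average dynamic power. The sketch below assumes one access in every clock cycle, which is an assumption of this example and not a condition stated in the paper; the helper name `access_power_uw` is likewise hypothetical.

```python
def access_power_uw(energy_pj: float, frequency_mhz: float) -> float:
    """Average power in microwatts for one access per clock cycle.

    P = E * f, and 1 pJ * 1 MHz = 1e-12 J * 1e6 1/s = 1e-6 W = 1 uW.
    """
    return energy_pj * frequency_mhz


READ_ENERGY_PJ = 0.22   # from the abstract, at 400 mV
WRITE_ENERGY_PJ = 0.23  # from the abstract, at 400 mV
FREQUENCY_MHZ = 25.0    # from the abstract

read_power = access_power_uw(READ_ENERGY_PJ, FREQUENCY_MHZ)
write_power = access_power_uw(WRITE_ENERGY_PJ, FREQUENCY_MHZ)
print(f"read: {read_power:.2f} uW, write: {write_power:.2f} uW")
```

Under the stated assumption this gives about 5.5 µW for continuous reads and about 5.75 µW for continuous writes; a lower access rate scales these values down proportionally.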

Conclusions
The proposed local bit-line 6 T SRAM architecture incorporates the low-voltage pre-charge and NLBL schemes and can operate in the near-threshold region. The low-voltage pre-charge circuit reduces the read error and improves the read stability and RSNM of the memory cells. Moreover, the NLBL scheme reduces the write error and improves the write ability in near-threshold operation. Furthermore, the pseudo-read error of the half-selected cells is reduced in the half-select condition. Likewise, the proposed local bit-line SRAM architecture eliminates read failures induced by bit-line leakage. The proposed local bit-line 6 T SRAM has been implemented and fabricated as 1 kb SRAM macros in TSMC 40 nm GP process technology. At a 400 mV supply voltage and a 25 MHz operating frequency, the write energy consumption is reduced by about 45.2% and the average energy consumption by about 52.9% compared to the MINI-Array. The proposed local bit-line 6 T SRAM is therefore well suited to low-power SoC chips.