Impact of Low-Variability SOTB Process on Ultra-Low-Voltage Operation of 1 Million Logic Gates

We demonstrate a minimum operating voltage near 0.1 V for one million logic gates on silicon using a low-variability Silicon on Thin Buried Oxide (SOTB) process. Low process variability is required to obtain higher energy efficiency during ultra-low-voltage operation with steeper subthreshold slope transistors. We verify that a variability-suppressed SOTB process lowers the operating voltage of logic circuits. In measurements of test chips fabricated in 65-nm SOTB and bulk processes, the operating voltage at which the first failure was observed decreased from 0.2 V in the bulk process to 0.125 V in the low-variability SOTB process. Even at 0.115 V, our measurements on SOTB test chips indicate that a yield of over 40% can be expected for a 10 k-gate circuit.


Introduction
Subthreshold designs achieve high energy efficiency by lowering the operating voltage of a circuit. Such designs can be applied to low-power sensor nodes and other low-power applications [1]. Meanwhile, several steep subthreshold slope (SS) FETs with new structures, such as FinFETs, Gate All Around (GAA) FETs, and tunnel FETs [2,3], have been discussed as further advancements of CMOS processes. These transistors have steeper SS values than conventional planar MOSFETs and reduced leakage current at low voltages. Figure 1 [4] shows the energy/cycle of a 54-stage inverter chain under several SS conditions, simulated following [5]. In the figure, the planar CMOS process has an SS of approximately 100 mV/dec, and its minimum energy/cycle is obtained at about 0.22 V. In contrast, energy efficiency improvements of 3× and 8.5× are obtained near 0.1 V under the 65 mV/dec and 45 mV/dec conditions, respectively, because of reduced leakage current. In general, when the SS of the transistors improves, the minimum energy point moves to a lower voltage.
Lowering the operating voltage will therefore be more effective for obtaining high energy efficiency in future processes; however, low-voltage operation is restricted by variability in transistor performance, and the impact of local random variation is serious in large-scale circuits [5]. In [6], Niiyama et al. reported the minimum operating voltage of ring oscillators (ROs) in a 90-nm bulk process. Although the minimum operating voltage of 11-stage ROs was 90 mV, that of 1 million-stage ROs reached 343 mV. Several studies have reported implementations of ultra-low-voltage circuits [7][8][9][10]; their reported operating voltages range from 0.175 V to 0.28 V depending on the scale of the circuit and the process technology. SRAM has a relatively high minimum operating voltage, and circuit-structure methods for lowering it are frequently discussed in the literature. A conventional 6T-SRAM has a minimum operating voltage of approximately 0.8 V [11], whereas 7T [12], 8T [13], 10T [14], and improved-6T [15] SRAMs operate at 0.44 V, 0.25 V, 0.16 V and 0.208 V, respectively.
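As a rough illustration of why a steeper SS reduces leakage at a given supply voltage, the following sketch uses the textbook idealization that the subthreshold current changes by one decade per SS of gate voltage. It is not the simulation model of [5]; the function name `on_off_ratio` is introduced here for illustration only, and second-order effects such as drain-induced barrier lowering are ignored.

```python
def on_off_ratio(vdd_v: float, ss_mv_per_dec: float) -> float:
    """Ideal subthreshold on/off current ratio for a gate swing of vdd_v volts.

    Assumes the current changes by exactly one decade per ss_mv_per_dec
    millivolts of gate voltage (an idealization, not the model of [5]).
    """
    return 10.0 ** (vdd_v * 1000.0 / ss_mv_per_dec)


if __name__ == "__main__":
    # SS conditions quoted in the text for Figure 1.
    for ss in (100.0, 65.0, 45.0):
        ratio = on_off_ratio(0.1, ss)
        print(f"SS = {ss:5.1f} mV/dec -> on/off ratio at 0.1 V: {ratio:.1f}")
```

Under this idealization, the on/off ratio at 0.1 V grows quickly as the SS becomes steeper, which is consistent with the lower leakage energy described for the 65 mV/dec and 45 mV/dec conditions.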
In this study, we report notably decreased minimum operating voltages of a million-gate logic circuit achieved with variability-suppressed Silicon on Thin Buried Oxide (SOTB) [16] technology, building on our previous research [17]. Figure 2 shows the structure of the SOTB transistor, which is formed on a thin silicon on insulator (SOI) layer and has an ultra-low-dose channel. The ultra-low-dose channel reduces random dopant variability and, as a result, a lower operating voltage is expected. Whereas a 0.37 V minimum operating voltage of 6T-SRAM in the SOTB process has been reported [11], we focus on logic circuits. We measure a 1 million-gate logic circuit consisting of 1011 ROs with 1001 stages each on test chips fabricated in 65-nm SOTB and bulk processes, and demonstrate the notable impact of low-variability SOTB technology on ultra-low-voltage design. In addition, the operating voltage limit of logic circuits is the fundamental limit of the minimum operating voltage of SRAMs. Our measurement results on the operating voltage limit of logic circuits in bulk and SOTB processes therefore suggest a target for research on reducing the minimum operating voltage of SRAMs in common bulk, SOTB, or other variability-suppressed processes.

Implementation of Test Element Group (TEG)
We designed 337 RO patterns, each implemented as 3 ROs. RO pulses are counted by an on-chip counter. To measure ultra-low-voltage operation of the ROs, we implemented three separate power supply and ground-line pairs for the ROs, the counter, and the buffers. The buffers were located between the ROs and the counter. Table 1 describes the details of the implemented 337 RO patterns. We selected INV, NAND, and NOR gates as measurement target gates. Our nominal designs of the INV, NAND and NOR cells had P:N (pMOS width:nMOS width) ratios of 1:1, 1:1 and 2:1, respectively. In addition to basic logic cells, complex gates are commonly included in standard cell libraries, and these complex gates frequently include serially stacked transistors, transmission gates, or an unbalanced P:N ratio. The impact of these designs in a low-voltage design is different from that of a MOS width (W) change. In addition to observing ultra-low-voltage operation of 1 million logic gates, this RO Test Element Group (TEG) was designed to observe layout dependence [18]. Gates in which the shallow trench isolation (STI) stress effect and the inverse narrow channel effect were strengthened were included in the RO patterns. The stress caused by STI shifts the mobility of pMOS and nMOS transistors depending on the distance between the transistor gate and the STI edge. The inverse narrow channel effect is the reduction of $V_{th}$ observed in narrow channel devices. These effects can cause $V_{th}$ or mobility variation, and are beneficial for estimating the low-voltage behavior of logic circuits.
Changes in the STI effect were implemented through the active area design. Each standard cell consisted of an even number of pMOS or nMOS transistors, and their sources or drains, which are connected to ground or power, were connected to neighboring gates. When a source or drain area of a pMOS or nMOS transistor is connected to that of neighboring gates, STI is not inserted between the gates, and the impact of STI in the gate width direction can be weakened. The strength of the inverse narrow channel effect can be increased by narrowing the transistor width. Table 2 shows the STI conditions implemented in the gates. We designed four STI conditions, i.e., (pMOS STI weak, nMOS STI weak), (pMOS STI weak, nMOS STI strong), (pMOS STI strong, nMOS STI weak) and (pMOS STI strong, nMOS STI strong), for INV, NAND and NOR cells. In addition to changing STI conditions, simple transistor width changes were also implemented such that the impact of STI could be compensated. The STI stress effect is known to degrade nMOS mobility and enhance pMOS mobility, so (pMOS nominal W, nMOS W +10%) and (pMOS nominal W, nMOS W +20%) patterns were additionally designed for the (pMOS STI weak, nMOS STI strong) and (pMOS STI strong, nMOS STI weak) designs.
We also designed gates with narrow channel transistors, as shown in Table 3. To observe narrow channel effects, we set 1/n× width per finger and n× number of fingers for the cells, which enhanced the narrow channel effect without changing the total transistor width in each cell. We used this approach because simply narrowing the width reduces the driving current of the transistors, and the narrow channel effect and the reduction of width would otherwise be confounded in our measurement results. The implemented (width per finger, number of fingers) patterns were (1/1.5× width per finger, 1.5× number of fingers), (1/2× width per finger, 2× number of fingers), and (1/3× width per finger, 3× number of fingers). The pMOS and nMOS conditions of narrow channel cells include (pMOS nominal width, nMOS narrow width) and (pMOS narrow width, nMOS narrow width).
An RO consisting of only one type of cell may report too optimistic a minimum operating voltage. Therefore, in addition to ROs with one type of cell, ROs that include two or three types of cells were also designed. The details of the number of RO patterns are shown in Table 1. 53 RO patterns included a single type of cell, whereas 284 patterns were composed of two or three types of cells. Figure 3 shows the connection of cells when an RO included two or three types of cells. When two types of cells were included in an RO, the cells were organized in an alternating pattern. When an RO had three types of cells, these cells appeared repeatedly in the order shown in Figure 3.
Each RO consisted of 1000 stages of measurement target gates and 1 NAND gate for controlling oscillation. There were trade-offs between the implementation costs of glue logic and the granularity of yield evaluation. By adopting 1001-stage ROs, the 1-million-gate measurement structure can be implemented with a 1000-to-1 selector and 1000 output wires from the ROs. The implemented 1011-to-1 selector consists of 1010 multiplexer cells, which is not a notably large number of cells in comparison with a single 1001-stage RO.
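The RO composition described above can be sketched as follows. This is a minimal illustration, not the authors' netlist generator: the function name `build_ro`, the label `NAND_CTRL`, the placement of the control gate at the start of the chain, and the simple cyclic order for three cell types are all assumptions (the actual order is the one given in Figure 3).

```python
from itertools import cycle, islice


def build_ro(cell_types, n_target=1000, control_cell="NAND_CTRL"):
    """Return the stage list of one RO.

    The RO has one control NAND gate followed by n_target measurement
    target gates, taken cyclically from cell_types (hypothetical order).
    """
    targets = list(islice(cycle(cell_types), n_target))
    return [control_cell] + targets


if __name__ == "__main__":
    ro = build_ro(["INV", "NOR"])   # two cell types alternate
    print(len(ro))                  # total number of stages
    print(ro[:5])                   # first few stages of the chain
```

With 1000 inverting target gates plus the control NAND, the loop has an odd number of inverting stages (1001), which is what allows it to oscillate when the control input enables it.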
Although the routing of 1011 wires was the bottleneck of the selector circuit layout and required an area of 0.964 mm² on the test chip with the automated place and route software [19], the area cost of routing was acceptable because we had an area of 12.5 mm² on the chip. Had we adopted 101-stage ROs, we would have required a 10000-to-1 selector, which consists of 9999 multiplexer cells, and 10,000 wires. Compared to the chosen implementation of the 1011-to-1 selector and 1011 wires, the area cost of implementing 9999 cells and 10,000 wires would have been unacceptable for our test chip. Consequently, by adopting 1001-stage ROs, we avoided the huge area cost of a selector circuit and the wiring cost of RO control and output signals. Although the yield of small circuits becomes difficult to resolve when the ROs are too large, the yields of 1 million gates, 100,000 gates and 10,000 gates can easily be calculated from the measurement results of the 1001-stage ROs. These circuit scales are suitable for practical Application Specific Integrated Circuit (ASIC) or System on Chip (SoC) applications. The test chips were fabricated in 65-nm SOTB and bulk processes. Figure 4 shows a micrograph of the fabricated test chip. The size of the test chips is 5.8 mm × 5.8 mm, with an area of approximately 5.0 mm × 2.5 mm allocated for the RO TEG.
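The selector trade-off discussed above can be checked with a few lines of arithmetic. The sketch below assumes the selector is a tree of 2:1 multiplexer cells, so an N-to-1 selector needs N − 1 cells; this matches the cell counts quoted in the text (1010 cells for 1011 inputs, 9999 cells for 10,000 inputs), but the function name `selector_cost` and the tree structure itself are illustrative assumptions.

```python
import math


def selector_cost(total_gates: int, target_gates_per_ro: int):
    """Return (number of ROs, number of 2:1 multiplexer cells).

    Assumes an N-to-1 selector built as a tree of 2:1 multiplexers,
    which needs N - 1 cells and N input wires.
    """
    n_ros = math.ceil(total_gates / target_gates_per_ro)
    return n_ros, n_ros - 1


if __name__ == "__main__":
    # 1000 target gates per RO (1001-stage ROs) vs 100 (101-stage ROs).
    for gates_per_ro in (1000, 100):
        n_ros, n_mux = selector_cost(1_000_000, gates_per_ro)
        print(f"{gates_per_ro} target gates/RO: {n_ros} ROs, {n_mux} mux cells")
    # Gate count of the implemented structure: 1011 ROs x 1001 stages.
    print(1011 * 1001)
```

Shortening the ROs by a factor of ten raises both the multiplexer count and the number of output wires by roughly the same factor, which is the area cost the 1001-stage choice avoids.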

Measurement Setup
We measured 4 bulk chips and 4 SOTB chips. Supply voltages and back-gate bias voltages were supplied by source measure units. Measurement was controlled by an external signal input generated by a pattern generator, and the values of the counter, which counted RO pulses, were read by an external logic analyzer. All 1011 ROs on each chip were measured. RO pulses were counted over a 125 ms time window, and each RO was measured 5 times. The average of the results, excluding the fastest and slowest, was recorded as the measured value. Supply voltages were varied from 0.1 V to 0.55 V, and the RO period and the tendency of operation failure were observed. Back-gate bias voltages were set to $V_{dd}$ and $V_{ss}$ for pMOS and nMOS transistors, respectively, in the RO period measurement, which is described in Section 4. Adequate reverse bias voltages were set for the observation of operation failure, which is detailed in Section 5.
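The period extraction described above can be sketched as follows. Several details are assumptions made for illustration: that the counter increments once per RO period (no frequency divider), that the averaging is done on periods rather than on raw counts, and that a zero count is treated as an operation failure. The function name `ro_period` is hypothetical.

```python
WINDOW_S = 0.125  # 125 ms counting window, as stated in the text


def ro_period(counts, window_s=WINDOW_S):
    """Average RO period in seconds from repeated pulse counts.

    Drops the fastest and slowest runs before averaging. Returns None
    if there are fewer than 3 runs or if any run counted no pulses
    (treated here as an operation failure, which is an assumption).
    """
    if len(counts) < 3 or min(counts) <= 0:
        return None
    periods = sorted(window_s / c for c in counts)
    kept = periods[1:-1]  # exclude fastest (first) and slowest (last)
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical counts for five repeated measurements of one RO.
    counts = [250_000, 240_000, 260_000, 250_000, 250_000]
    print(ro_period(counts))
```

In this hypothetical example, 250,000 pulses in 125 ms correspond to a period of 500 ns, the same order as the RO periods reported in the next section.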

Variability in SOTB and Bulk Processes
In this section, we evaluate the process variability of the bulk and SOTB processes based on measured RO periods. RO periods on SOTB and bulk chips were measured at $V_{dd}$ = 0.4 V and 0.55 V, respectively. As the $V_{th}$ values of SOTB and bulk transistors were different, we compared the RO periods of SOTB and bulk chips at supply voltages where the same RO period was obtained. The periods of the basic ROs of SOTB at 0.4 V and bulk at 0.55 V were approximately 500 ns. Figure 5 shows a histogram of the measured RO periods. RO periods were normalized by the average RO period of each pattern. Each pattern had 3 ROs on each chip, so the 12 ROs of each pattern on 4 chips were used to calculate the average, and the periods of these 12 ROs were normalized by it. The probabilities for SOTB were notably higher than those for bulk for normalized RO periods ranging from 0.998 to 1.002, and clearly lower in the ranges from 0.992 to 0.996 and from 1.004 to 1.018. The distribution of RO periods was more concentrated around the average period in the SOTB process than in the bulk process; thus, we confirmed smaller local random variability in the SOTB process. No notable differences between the SOTB and bulk results were observed in the ranges below 0.990 and above 1.020; the distributions in these ranges can be attributed to global variability, which is not improved by SOTB technology.
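The normalization used for Figure 5 can be sketched as below. The data layout (a dictionary keyed by pattern, chip, and RO index) and the function name `normalize_by_pattern` are assumptions made for illustration; only the rule of dividing each period by the average over all ROs of the same pattern comes from the text.

```python
from collections import defaultdict


def normalize_by_pattern(periods):
    """Normalize RO periods by the per-pattern average.

    periods maps (pattern, chip, ro_index) -> period. Each period is
    divided by the mean over all ROs of the same pattern on all chips
    (12 ROs per pattern for 3 ROs on each of 4 chips).
    """
    groups = defaultdict(list)
    for (pattern, _chip, _ro), period in periods.items():
        groups[pattern].append(period)
    means = {pat: sum(vals) / len(vals) for pat, vals in groups.items()}
    return {key: period / means[key[0]] for key, period in periods.items()}


if __name__ == "__main__":
    # Hypothetical periods (in ns) for one pattern on two chips.
    demo = {("INV", 0, 0): 498.0, ("INV", 0, 1): 502.0,
            ("INV", 1, 0): 499.0, ("INV", 1, 1): 501.0}
    print(normalize_by_pattern(demo))
```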
We also calculated the standard deviations (σ) of RO periods for each of the 337 RO patterns on each chip, and Figure 6 shows the histogram of the calculated σ values. For each RO pattern, 3 ROs were implemented on each chip, and standard deviations of each pattern on each chip were calculated to eliminate the impact of global variability from standard deviations. Each standard deviation value was normalized by the average period of 3 ROs of each pattern on each chip. The peaks of σ distribution of SOTB and bulk were 0.2% and 0.5% respectively. The σ distribution of SOTB was larger than that of bulk in the range from 0.1% to 0.3%, and smaller in the range from 0.4% to 1%. The σ values of SOTB concentrate in a smaller range, and small random variability of SOTB was also confirmed, as in the results of average RO period. Both Figures 5 and 6 indicate low variability of the SOTB process in comparison with the bulk process.
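The per-pattern, per-chip σ described above can be sketched as follows. Whether the original analysis used the sample or the population standard deviation is not stated; the sample form is assumed here, and the function name `normalized_sigma` is hypothetical.

```python
import statistics


def normalized_sigma(periods_on_chip):
    """Standard deviation of one pattern's RO periods on one chip,
    normalized by their mean.

    Uses the sample standard deviation (n - 1 in the denominator),
    which is an assumption; the text does not specify the estimator.
    """
    mean = statistics.fmean(periods_on_chip)
    return statistics.stdev(periods_on_chip) / mean


if __name__ == "__main__":
    # Hypothetical periods (in seconds) of the 3 ROs of one pattern.
    print(normalized_sigma([499e-9, 500e-9, 501e-9]))
```

Because only the 3 ROs of the same pattern on the same chip enter each value, chip-to-chip (global) shifts cancel out, which is the purpose of the per-chip calculation.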

Low-Voltage Operation
1011 ROs with 1001 stages each were measured on test chips fabricated in both bulk and SOTB processes. The $V_{th}$ mismatch between pMOS and nMOS transistors is one of the factors that degrades the operating voltage of circuits, and considerable effort is therefore devoted to minimizing it in process development; however, near $V_{dd}$ = 0.1 V, the margin for $V_{th}$ mismatch is very small, and even a slight mismatch has a serious impact. We tested several back-gate bias conditions and identified those that minimized operation failure for bulk and SOTB chips. In the applied back-gate bias conditions, VBP and VBN denote the pMOS and nMOS back-gate bias voltages, respectively. In the b conditions, the back-gate bias voltages were applied to compensate for $V_{th}$ mismatch, whereas in the a conditions, they were applied equally to pMOS and nMOS transistors and did not compensate for $V_{th}$ mismatch. When reverse body bias voltages were not applied, the RO periods on bulk and SOTB chips were 307.5 µs and 4.36 µs, respectively, at $V_{dd}$ = 0.2 V. The RO periods in the Bulk-b, SOTB-a and SOTB-b conditions increased to 902.5 µs, 11.1 µs and 14.0 µs, respectively, at $V_{dd}$ = 0.2 V. Figures 7 and 8 show the measured results of RO operation. In these figures, the x-axis is the supply voltage and the y-axis is the ratio of ROs that failed to operate. Because each 1001-stage RO corresponds to approximately 1 k gates, the yields of 10 k-gate and 100 k-gate circuits can be calculated as $(1-Y)^{10}$ and $(1-Y)^{100}$, respectively, where $Y$ is the failure ratio. When the back-gate bias voltages were applied equally to pMOS and nMOS transistors, the first failures were observed at 0.25 V and 0.15 V for bulk and SOTB, respectively. The failure ratios at these voltages still allow a practical yield for a 100 k-gate circuit.
A 0.072 failure ratio of SOTB at 0.125 V corresponds to approximately 47.3% yield of a 10 k-gate circuit, whereas 0.225 V is required for the bulk process to obtain this yield. Adjusting the back-gate bias lowered the first failure voltages to 0.2 V and 0.125 V for bulk and SOTB, respectively. Over 40% yield for a 10 k-gate circuit was achieved at 0.115 V in the SOTB process, whereas 0.175 V was required for the bulk process. These measurement results confirm that the low variability of SOTB notably lowers the minimum operating voltage of a large-scale logic circuit and enables near-0.1 V operation of logic circuits. Near 0.1 V, the margin against $V_{th}$ mismatch is far smaller than in 0.2 V to 0.4 V class low-voltage operation. Even $V_{th}$ control by the foundries for low-voltage operation will not sufficiently eliminate this $V_{th}$ mismatch. The measurement results also indicate that back-gate biasing is effective for obtaining operation margin in extremely low-voltage, small-$V_{th}$-margin conditions.
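The yield relation used above, in which a circuit of n RO-sized blocks works only if every block works, can be sketched as follows. The sketch assumes that each 1001-stage RO stands for roughly 1 k gates and that RO failures are independent; the function names are hypothetical.

```python
def yield_from_failure_ratio(failure_ratio: float, n_ros: int) -> float:
    """Yield of a circuit equivalent to n_ros ROs: (1 - Y) ** n_ros.

    Assumes independent failures and about 1 k gates per RO, so a
    10 k-gate circuit corresponds to n_ros = 10.
    """
    return (1.0 - failure_ratio) ** n_ros


def max_failure_ratio(target_yield: float, n_ros: int) -> float:
    """Largest RO failure ratio that still meets target_yield."""
    return 1.0 - target_yield ** (1.0 / n_ros)


if __name__ == "__main__":
    # Failure ratio quoted in the text for SOTB at 0.125 V.
    print(yield_from_failure_ratio(0.072, 10))    # 10 k-gate circuit
    print(yield_from_failure_ratio(0.072, 100))   # 100 k-gate circuit
    # RO failure ratio allowed for 40% yield of a 10 k-gate circuit.
    print(max_failure_ratio(0.40, 10))
```

The first call gives about 0.47, in line with the approximately 47.3% yield quoted for a 10 k-gate circuit, while the second shows how quickly the same failure ratio becomes impractical at the 100 k-gate scale.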
According to our measurement results, the limit of the operating voltage in the bulk process was 0.175 V to 0.25 V. The operating voltage limits of several improved SRAMs reported in [13][14][15], 0.16 V to 0.25 V, have already reached our measured operating voltage limit of logic circuits. The operating voltage of SRAM cannot be fundamentally reduced beyond that of logic circuits, and further lowering of the operating voltage of SRAM is expected to be difficult.
We also examined which of the 1011 ROs tended to operate successfully. We numbered the 1011 ROs serially, and the 3 ROs of each pattern on each chip were identified by their location. We measured 4 bulk chips and 4 SOTB chips, so the number of chips on which a specific RO operates ranges from 0 to 4, depending on the supply voltage. For example, in the case of bulk 0.15 V operation, the total failure ratio was 0.455 with 6.8% of ROs operating on all 4 chips, 13.6% of ROs operating on 3 chips, 44.8% of ROs operating on 2 chips, 24.1% of ROs operating on 1 chip, and 10.6% of ROs failing on all 4 chips. The number of chips on which a specific RO operates successfully strongly depends on the total failure ratio; therefore, the total failure ratio was also considered in our evaluation. Figures 9 and 10 show the tendencies for the bulk-b and SOTB-b conditions, with the x-axis representing the total failure ratio and the y-axis representing the ratio of ROs operating on 0, 1, 2, 3, and 4 chips.
In Figures 9 and 10, the number of ROs operating on 1, 2, or 3 chips in the bulk test chips is larger than that in the SOTB test chips, and the number of ROs operating on 0 or 4 chips in the bulk test chips is smaller than that in the SOTB test chips. This tendency indicates that RO operation failure tends to be random in bulk chips and deterministic in SOTB chips. In particular, the cause of RO operation failure in bulk chips can be attributed to random variability. Conversely, in SOTB chips, the bottleneck of the operating voltage appears to be a gate design problem rather than random variability. These results indicate that the random variability of the SOTB process is smaller than that estimated from the observed operating voltage, and that the operating voltage of the SOTB process can be further decreased by optimizing the gate design.

Figure 10. Distribution of the number of ROs operated on 1, 2, or 3 chips; the x-axis is the total operation ratio shown in Figures 7 and 8.
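The distinction between random and deterministic failure drawn above can be illustrated with two idealized reference distributions. This is a sketch under stated assumptions, not part of the reported analysis: for purely random failure, each RO is assumed to fail independently on each chip with the same probability, giving a binomial distribution over the 4 chips; for purely deterministic failure, a given RO either fails on every chip or on none. The function names are hypothetical.

```python
from math import comb


def random_baseline(failure_ratio: float, n_chips: int = 4):
    """Fraction of ROs operating on k = 0..n_chips chips if every RO
    fails independently on each chip with probability failure_ratio
    (binomial distribution)."""
    p_ok = 1.0 - failure_ratio
    return [comb(n_chips, k) * p_ok ** k * failure_ratio ** (n_chips - k)
            for k in range(n_chips + 1)]


def deterministic_baseline(failure_ratio: float, n_chips: int = 4):
    """Fraction of ROs operating on k = 0..n_chips chips if failure is
    fixed by the gate design: an RO fails on all chips or on none."""
    dist = [0.0] * (n_chips + 1)
    dist[0] = failure_ratio
    dist[n_chips] = 1.0 - failure_ratio
    return dist


if __name__ == "__main__":
    for name, fn in (("random", random_baseline),
                     ("deterministic", deterministic_baseline)):
        dist = fn(0.5)
        print(name, [round(x, 3) for x in dist])
```

At the same total failure ratio, the random baseline places most ROs at 1, 2, or 3 operating chips, whereas the deterministic baseline places all of them at 0 or 4, which mirrors the contrast between the bulk and SOTB tendencies described in the text.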

Conclusions
In this study, we implemented RO TEG chips in 65-nm SOTB and bulk processes, and measured the minimum operating voltages of logic gates. The SOTB process achieved good yield with supply voltages as low as 0.11 V to 0.15 V depending on the circuit size and back-gate bias conditions, whereas the bulk process required supply voltages of 0.175 V to 0.25 V. The low variability of the SOTB process significantly contributed to decreasing the minimum operating voltage. The minimum operating voltage achieved by the SOTB process was close to the voltage at which the minimum energy/cycle is expected with 65 mV/dec and 45 mV/dec SS transistors in simulation. The measurement results also indicated that applying a back-gate bias voltage was effective at reducing $V_{th}$ mismatch beyond what general process control achieves, which is vital at ultra-low voltages.