A Compact Digital Pixel Sensor (DPS) Using 2T-DRAM

In digital pixel sensors (DPS), memory elements typically occupy a large share of the pixel's silicon area, which significantly reduces the pixel's fill factor while increasing its size, power and cost. In this work, we propose to reduce the area and power overhead of the DPS memory by lowering the memory requirements with a multi-reset integration scheme, while employing a dynamic memory instead of the traditionally used large 6T-SRAM cell. The operation of the DPS takes advantage of the chronological change of the code, which reduces memory needs without affecting the light resolution. In the proposed implementation, a 4-bit in-pixel memory is used to reduce the pixel size, and an 8-bit resolution is achieved with the multi-reset scheme. In addition, a full complementary metal-oxide-semiconductor (CMOS) 2T DRAM and a selective refresh scheme are adopted to implement the memory elements and further increase the area savings. This paper presents the proposed multi-reset integration methodology and its implementation with dedicated memory circuits. The proposed architecture is validated by a prototype chip fabricated in AMS 0.35 μm CMOS technology. The reported experimental results are compared with related work.


Introduction
Complementary metal-oxide-semiconductor (CMOS) image sensors are now part of everyday life, covering a wide spectrum of applications from cell-phone cameras, webcams, digital cameras and video games to security and automotive applications [1]. The successful deployment of CMOS image sensors over charge-coupled device (CCD) technology is mainly due to reduced cost and power consumption, as well as higher integration and on-chip processing capabilities, which are critical for mobile applications.
Among CMOS image sensors, two mainstream architectures can be distinguished depending on where the analog-to-digital conversion (ADC) is performed. The first is the active pixel sensor (APS), which performs the ADC outside the pixel array using a per-array or per-column readout scheme. The second is the digital pixel sensor (DPS), which integrates the ADC at the pixel level. DPS, while offering flexibility of use, perform massively parallel analog-to-digital conversion at the pixel level and enable pixel-level image processing, which is very attractive for real-time applications such as automotive and surveillance systems. In addition, owing to the very early analog-to-digital conversion, DPS can offer improved noise figure and dynamic range [2]. Previous DPS implementations are based on either a pulse frequency modulation (PFM) [3] or a pulse width modulation (PWM) [4] scheme to digitize the voltage of the photodiode's sensing node with a voltage comparator. The PWM scheme uses the comparator output signal to write a global counter value into the pixel-level memory, while the PFM scheme uses the comparator output to increment a pixel-level counter. Both PFM and PWM schemes are typically power hungry due to the high switching activity at the pixel level. In addition, a major drawback of both schemes is the need for area-hungry pixel-level memory, particularly for high-resolution imaging. As a consequence, the area and power overhead due to the pixel-level memory is substantial and can account for up to 50% of the total overhead. This strongly impacts the pixel's fill factor, and thus its light sensitivity, as well as the power and pixel area.
Several attempts have been made to reduce the memory needs at the pixel level. Among them, the use of an address event representation (AER) scheme [5] completely removes the memory from the pixel, at the cost of increased complexity and of timing errors due to collisions, which occur when several pixels attempt to access the data bus simultaneously. Another reported solution consists of compressing the data before storage, as proposed in [6]. This solution drastically reduces the pixel area, but at the cost of a reduced signal-to-noise ratio (SNR) due to the use of a lossy compression scheme. In [7], the output of the comparator is sampled and stored using a single-bit register cell per pixel. All register cells are connected in series to form a scan chain, and the whole array is scanned after each sampling operation. While only one bit of pixel-level memory is required, this architecture increases power consumption, since the number of samples, and hence of array scans, is a strong function of the resolution. In this work, a DPS architecture using 2T DRAM as the storage element and a multi-reset integration methodology are proposed in order to reduce both the memory needs and the memory size at the pixel level, and to reduce the power overhead.
The paper is organized as follows: Section 2 introduces the PWM time-domain DPS and its conversion time analysis. Section 3 is devoted to the multi-reset integration methodology and a discussion of the trade-offs involved in terms of speed, area and memory size. Section 4 describes the multi-reset integration DPS architecture, the pixel circuitry, and the 2T DRAM implementation and sensing scheme. Section 5 provides an analysis of power consumption as well as power reduction techniques. Section 6 presents the prototype implementation and experimental measurements. Section 7 concludes this work.

Conventional Architecture
The conventional pixel architecture of a DPS using the PWM technique is shown in Figure 1. The pixel comprises a photodiode $P_d$, a reset transistor AR, a voltage comparator, a memory unit and a feedback circuit to perform the auto-reset of the photodiode. Outside the pixel array, a timing control unit and a global counter are required to perform the conversion of the light intensity into a digital code. A read-out circuit is also needed to read the contents of the memory and to output the data to the processing unit.

The integration phase starts when the precharge of the photodiode to the supply voltage is disabled, i.e., when the reset signal transitions from 0 to $V_{dd}$. A global counter located outside the pixel array is enabled at the same time. The voltage $V_d$ of the photodiode node, loaded with a capacitance $C_{pd}$, starts decreasing at a rate proportional to the light intensity due to the photo-generated current $I_d$ through the photodiode $P_d$. When the voltage $V_d$ reaches a reference voltage $V_{ref}$, the output of the comparator switches high and the counter value is written to the pixel-level memory. The reset of the photodiode is performed automatically after the photodiode voltage has reached the reference level $V_{ref}$. The code written in the memory is a digitized value of the time required for $V_d$ to reach $V_{ref}$, which is a function of the light intensity. Using a first-order approximation, the integration time is expressed as:
$$T_{int} = \frac{C_{pd}\,(V_{dd} - V_{ref})}{I_d} \qquad (1)$$
From Equation (1), one can see that the integration time is inversely proportional to the photo-generated current $I_d$. To quantize the integration time, the global counter provides the quantization boundaries for the time-to-digital conversion. The non-linearity between the photocurrent and the integration time can be compensated by adapting the frequency of the global counter, as shown in Figure 2.
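As a rough illustration of the PWM conversion described above, the following Python sketch evaluates the first-order discharge time of the photodiode node and the counter value that would be latched when the comparator fires. The function names, the default component values (10 fF, 3.3 V, 1.0 V) and the choice of a saturating up-counter with a fixed period are assumptions made for this example only; they are not taken from the prototype.

```python
def integration_time(i_d, c_pd=10e-15, v_dd=3.3, v_ref=1.0):
    """First-order time (s) for the photodiode node to fall from v_dd to v_ref.

    Assumes a constant photocurrent i_d (A) discharging a fixed capacitance c_pd (F).
    """
    if i_d <= 0:
        raise ValueError("photocurrent must be positive")
    return c_pd * (v_dd - v_ref) / i_d


def pwm_code(i_d, counter_period, n_bits=8, **pixel):
    """Counter value latched when the comparator fires.

    Hypothetical model: an up-counter with a fixed period that saturates
    at the largest n_bits code.
    """
    ticks = int(integration_time(i_d, **pixel) // counter_period)
    return min(ticks, 2**n_bits - 1)
```

With these assumed values, doubling the photocurrent halves the integration time, which is the inverse proportionality that the counter-frequency adaptation is meant to compensate.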

Conversion Time Analysis
The PWM coding scheme encodes the illumination level of each pixel using a single pulse. The width of this pulse represents the time $T_{int}$ needed to discharge the photodiode from $V_{dd}$ to $V_{ref}$. In order to convert the pulse into a digital code, a time code generated by the global counter is written into the embedded memory once the pulse is detected. There are two quantization approaches to convert the PWM pulse signal into a digital code. The first approach, referred to as uniform time-domain quantization (UQ), places the sampling times (or quantization boundaries) uniformly between $T_{min}$ and $T_{max}$. The second approach, referred to as non-uniform time-domain quantization (NUQ), chooses the sampling times so that the corresponding photocurrents are uniformly quantized. A non-linear time-domain quantizer is therefore required in order to compensate for the time-to-photocurrent non-linearity.
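The difference between the two approaches can be checked numerically. The sketch below is a minimal Python illustration that assumes the first-order relation $T = \alpha / I_d$ with a normalized constant $\alpha$; the function names and the normalized units are hypothetical.

```python
def uq_time_boundaries(t_min, t_max, n_bits):
    """Sampling instants spaced uniformly in time between t_min and t_max."""
    levels = 2**n_bits
    dt = (t_max - t_min) / levels
    return [t_min + k * dt for k in range(levels + 1)]


def nuq_time_boundaries(alpha, i_min, i_max, n_bits):
    """Sampling instants whose corresponding photocurrents are uniformly spaced."""
    levels = 2**n_bits
    di = (i_max - i_min) / levels
    # The brightest boundary (i_max) is reached first, the dimmest (i_min) last.
    return [alpha / (i_max - k * di) for k in range(levels + 1)]


def to_currents(alpha, times):
    """Photocurrent that reaches the reference voltage exactly at each instant."""
    return [alpha / t for t in times]
```

For example, with `alpha = 1`, `i_min = 1`, `i_max = 9` and 3 bits, the NUQ boundaries map back to currents 9, 8, ..., 1, whereas the UQ boundaries map to current steps that shrink from 4.5 near `i_max` to 0.125 near `i_min`.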

Uniform Time-Domain Quantization
The uniform time-domain quantization scheme divides the time between the boundaries $T_{min}$ and $T_{max}$ equally into $2^n$ quantization levels, where $n$ represents the number of bits, or resolution. Therefore, the time width $\Delta T$ of each quantization level can be expressed as:
$$\Delta T = \frac{T_{max} - T_{min}}{2^{n}} \qquad (2)$$
The relationship between the quantization level $\xi_{UQ}(n, t)$ and the time width $t$ of the PWM pulse is given by Equation (3). Using Equation (3), the conversion between the quantization level $\xi_{UQ}(n, I_d)$ and the discharge photocurrent $I_d$ of the photodiode is expressed by Equation (4). Assuming that $I_{dmax} \gg I_{dmin}$, Equation (4) can be approximated by Equation (5). Equation (5) suggests that the quantization level is inversely proportional to the discharge current under the UQ scheme, as $I_{dmin}/I_{dmax}$ is a constant. It also suggests that the UQ scheme is most sensitive to discharge currents close to $I_{dmin}$, as shown in Figure 3(b). In other words, the UQ scheme concentrates its quantization levels on low illumination levels, while high illumination levels are not properly covered.
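The displayed forms of Equations (3) to (5) are not reproduced in the text above. One reconstruction that is consistent with the surrounding statements is sketched below. It assumes the first-order relation $t = \alpha / I_d$, so that $T_{min} = \alpha / I_{dmax}$ and $T_{max} = \alpha / I_{dmin}$, and it ignores the rounding to integer levels; the exact expressions used by the authors may differ.

```latex
\begin{align*}
\xi_{UQ}(n,t) &= \frac{t - T_{min}}{\Delta T}
               = 2^{n}\,\frac{t - T_{min}}{T_{max} - T_{min}} \\
\xi_{UQ}(n,I_d) &= 2^{n}\,\frac{1/I_d - 1/I_{dmax}}{1/I_{dmin} - 1/I_{dmax}}
                 = 2^{n}\,\frac{I_{dmin}\,(I_{dmax} - I_d)}{I_d\,(I_{dmax} - I_{dmin})} \\
\xi_{UQ}(n,I_d) &\approx 2^{n}\left(\frac{I_{dmin}}{I_d} - \frac{I_{dmin}}{I_{dmax}}\right),
                 \qquad I_{dmax} \gg I_{dmin}
\end{align*}
```

In the last line the only term that depends on the signal is $I_{dmin}/I_d$, which matches the stated inverse proportionality and the concentration of levels near $I_{dmin}$.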

Non-Uniform Time-Domain Quantization
The non-uniform time-domain quantization scheme sets the quantization levels so that they correspond to linearly distributed photocurrent steps $\Delta I$ within the boundaries $I_{dmin}$ and $I_{dmax}$.
The photocurrent $I_d$ can be converted to its quantization level $\xi_{NUQ}(n, I_d)$ through Equation (7). From Equation (7), the relationship between the time and the quantization level $\xi_{NUQ}(n, t)$ follows as Equation (8). In contrast to the UQ scheme, Equation (8) suggests that the NUQ quantization levels are inversely proportional to the sampling time. Figure 3(b) shows that the NUQ scheme provides evenly distributed photocurrent sampling boundaries.
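The displayed equations of this subsection are likewise missing from the text. A reconstruction consistent with the description is given below as a sketch. It assumes that levels are counted upward from $I_{dmin}$, that $t = \alpha / I_d$ as in the first-order model, and that rounding to integer levels is ignored; the authors' own definitions may use a different counting direction.

```latex
\begin{align*}
\Delta I &= \frac{I_{dmax} - I_{dmin}}{2^{n}} \\
\xi_{NUQ}(n,I_d) &= \frac{I_d - I_{dmin}}{\Delta I}
                  = 2^{n}\,\frac{I_d - I_{dmin}}{I_{dmax} - I_{dmin}} \\
\xi_{NUQ}(n,t) &= 2^{n}\,\frac{\alpha/t - I_{dmin}}{I_{dmax} - I_{dmin}}
\end{align*}
```

Under these assumptions the level is linear in $I_d$ and depends on time only through $1/t$, in line with the statement that the NUQ levels are inversely proportional to the sampling time.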

The Proposed Multi-Reset Integration (MRI) Scheme
In order to reduce the silicon area of the pixel and to improve the fill factor, a multi-reset integration (MRI) scheme is proposed in this work to reduce the memory needs at the pixel level. This section presents the concept of the MRI scheme and discusses the trade-offs in terms of delay overhead depending on the size of the pixel memory.

MRI Concept
The proposed integration scheme takes advantage of the sequential way in which the illumination level is digitized. The MRI scheme can be interpreted as performing the integration process several times in order to resolve the bits of the illumination code sequentially, from the most significant bit (MSB) to the least significant bit (LSB), as shown in Figure 4. During each integration phase, only a subset of the $n$ bits is stored in the pixel, which reduces the memory requirements. Then, between two integration periods, the content of the pixel memory is scanned out of the array, allowing the remaining bits of the code to be stored during the successive integration phases. The required number of memory bits is therefore reduced by a factor proportional to the number of iterations. For example, for a resolution of 8 bits and a single-bit memory, 8 successive resets are required.
In order to reduce the delay overhead caused by the successive integration periods, it is possible to define timing boundaries at which the value of the bit concerned is resolved, so that the integration duration can be optimized for each bit. Assume that the photocurrent is quantized into $2^n$ values, with $n$ the resolution. The quantization levels $N_{QT}$ covered by each bit are given by Equation (9), where bit(0) represents the LSB and bit($n-1$) represents the MSB. The MSB covers only half of the total quantization levels. Therefore, the optimized integration time for the bits closer to the MSB is much shorter than for those towards the LSB. Indeed, derived from Equation (1), the optimized duration of the partial integration phase $T_{int}$ required to obtain bit number $i$ of the code is given by Equation (10), where $T_{max} = \alpha / I_{min}$ is the maximum integration time for $n$-bit resolution. Under the condition $I_{max} \gg I_{min}$, Equation (10) shows that the integration time decreases exponentially as $i$ increases. Assuming a 1-bit pixel memory, obtaining the $n$ bits successively gives a total integration time $T_{total}$ equal to the sum of the durations of the partial integration phases, $T_{total} = \sum_{i=0}^{n-1} T_{int}(i)$. From this expression, the total time required to perform the integration with the multi-reset integration scheme is much shorter than $n \times T_{max}$. Note that the time required to scan the values out is not accounted for in this expression. However, this time can be reduced significantly using parallel and high-speed readout, and it remains negligible compared to the total integration time. In addition, the read-out phase can be interleaved with the integration phase.
Figure 5 takes a 2-bit resolution as an example and assumes $I_{max}/I_{min} = 100$. Using Equation (10), it is possible to resolve the MSB at a 1/26 fraction of the integration period, followed by the second bit, i.e., the LSB, after one full integration period. Without optimization, the MRI scheme requires $2T_{max}$ to obtain the MSB and LSB values. Using the optimum timing boundaries, the total integration time is reduced by 48%. These optimized timing boundaries are used to reset each partial integration and therefore greatly reduce the timing overhead for high-resolution imagers. In addition, over-integration periods are avoided, leading to reduced power consumption.
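The numbers quoted for the 2-bit example can be reproduced with a short calculation. The Python sketch below assumes the first-order relation $T = \alpha / I$ and linearly spaced current boundaries $I_{min} + k\,\Delta I$ with $\Delta I = (I_{max} - I_{min})/2^n$. The mapping of the MSB phase to the boundary $k = 1$ is inferred from the quoted 1/26 fraction rather than taken from Equation (10), and the function name is hypothetical.

```python
def boundary_time_fraction(ratio, n_bits, k):
    """Time, as a fraction of T_max, at which the current boundary
    I_min + k * delta_I is reached, with delta_I = (I_max - I_min) / 2**n_bits.

    ratio is I_max / I_min; T = alpha / I is assumed throughout.
    """
    return 1.0 / (1.0 + k * (ratio - 1.0) / 2**n_bits)


ratio = 100.0  # I_max / I_min, as in the 2-bit example
msb_fraction = boundary_time_fraction(ratio, n_bits=2, k=1)  # close to 1/26
lsb_fraction = boundary_time_fraction(ratio, n_bits=2, k=0)  # one full T_max
total = msb_fraction + lsb_fraction
saving = 1.0 - total / 2.0  # relative to two full integration periods
print(f"MSB phase: T_max/{1 / msb_fraction:.2f}, saving: {saving:.1%}")
```

Under these assumptions the MSB phase lasts about $T_{max}/25.75$ and the total integration time is roughly 48% shorter than $2T_{max}$, which agrees with the figures given in the text.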

Trade-off Analysis
Depending on the memory size integrated at the pixel level, different area-delay trade-offs can be achieved. Indeed, a trade-off can be made between the number of bits of the pixel-level memory and the number of resets of the integration phase. MATLAB simulations were performed to extract the trend in pixel area versus the size of the memory embedded at the pixel level, and the results are shown in Figure 6. In the simulation, a square pixel with a 30% fill factor is assumed. The area is estimated for a 0.35 µm CMOS technology. Both the area limited by the metal wires and the total size of the transistors are considered, for pixel-level embedded memories based on 6T SRAM and on 2T DRAM. The figure illustrates that the area of the pixel with embedded 6T SRAM grows linearly with the number of embedded bits, since it is mainly constrained by the transistor area, whereas the area of the pixel with embedded 2T DRAM is mainly constrained by the metal wires. The size of a pixel using a 4-bit 2T DRAM is only 72% of that of a pixel using a 4-bit 6T SRAM. MATLAB simulations were also performed to extract the trend in the delay overhead versus the size of the memory embedded at the pixel level. Figure 7 shows the interpolated curve of the delay overhead, in terms of integration time, versus the memory size for an 8-bit resolution, assuming $I_{max}/I_{min} = 250$. From these results, it appears that using a 4-bit pixel-level memory requires only one reset and leads to less than 10% overhead on the total integration time. Considering both the area and the integration time overheads, a 4-bit 2T DRAM offers the best compromise, keeping the pixel area small while maintaining a relatively high operation speed.
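The relation between memory size and number of integration phases can be made concrete with a small Python sketch. It assumes that the final code is obtained off-pixel by simply concatenating the words scanned out after each partial integration, most significant word first; this reconstruction step and the function names are assumptions made for illustration and are not described in the text.

```python
import math


def partial_integrations(resolution_bits, memory_bits):
    """Number of integration phases needed to collect all bits of the code."""
    return math.ceil(resolution_bits / memory_bits)


def assemble_code(chunks, memory_bits):
    """Concatenate per-phase memory contents, most significant chunk first."""
    code = 0
    for chunk in chunks:
        if not 0 <= chunk < 2**memory_bits:
            raise ValueError("chunk does not fit in the pixel memory")
        code = (code << memory_bits) | chunk
    return code
```

For an 8-bit resolution, a 4-bit memory needs two phases, that is, one additional reset, while a 1-bit memory needs eight.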

Overall Architecture
In order to implement the MRI scheme with a dynamic memory, several changes are required compared to the conventional architecture. As a result of the trade-off analysis presented in the previous section, a 4-bit memory element was chosen. In addition to halving the memory requirements with the MRI scheme, the use of the full CMOS 2T DRAM instead of the 6T SRAM cell is proposed to further increase the area savings. The block diagram of the resulting global architecture is presented in Figure 8. Two scan shift registers are used to select each line of the array during the read operation and to shift the read values out of the sensor. The memory sensing and refresh circuit is composed of sense amplifiers that simultaneously detect the state of the pixel ("fired" or "not fired") and read the content of the pixel-level memories. Based on the state of the pixel, a conditional refresh circuit selectively rewrites the useful information. In other words, based on an external leakage monitor or temperature sensor that indicates the retention time of the memory, the content is refreshed with minimal power overhead. A control unit pipelines the read and scan-out operations to minimize the time between two partial integrations. Large write buffers are used to store the information from the global counter in the pixel-level memory during the integration. A counter timing unit and a linearization circuit are also used to provide the data to the pixel-level memories.

Pixel Circuit
The proposed pixel circuitry is depicted in Figure 9. It contains a reset transistor controlled by a global reset signal, a voltage comparator, a flag circuit to indicate the firing state of the pixel, a pass logic circuit to bypass the output of the comparator during the refresh operation, and the 2T DRAM cells. The operation of the proposed pixel starts with a low reset signal, which sets the photodiode voltage to the supply level. At the end of the reset phase, the photodiode is left floating and its equivalent capacitance is discharged by the illumination-dependent photocurrent. At the same time, a global down counter outside the pixel array is enabled and its data is fed to the internal storage elements of the pixel. The memory is set in write mode during the integration phase, as shown in Figure 10, and the content of each one-bit DRAM cell tracks the corresponding data line. When the photodiode voltage crosses the reference voltage, the output of the comparator switches and disables the write signal. When a row is accessed, the memory is set in read mode and the flag signal allows the write driver of the column to refresh the data if the pixel has fired. At the same time, all the pixels of the same column that belong to other rows are set in hold mode in order to prevent wrong data from being written to the memory.
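The write-then-freeze behaviour of the pixel during one integration phase can be summarized by a simple behavioural model. The Python sketch below assumes a linear discharge of the photodiode node, ideal comparator switching and one counter word per time step; the function name and the default component values are hypothetical and are not taken from the circuit in Figure 9.

```python
def simulate_pixel(i_d, counter_values, dt, c_pd=10e-15, v_dd=3.3, v_ref=1.0):
    """Return (stored_word, fired) after stepping through the counter sequence.

    counter_values holds the successive words driven on the data lines,
    one per time step of length dt (s).
    """
    v_pd = v_dd                    # photodiode precharged during reset
    stored = None
    for word in counter_values:
        stored = word              # write mode: the DRAM tracks the data lines
        v_pd -= i_d * dt / c_pd    # linear discharge by the photocurrent
        if v_pd <= v_ref:          # comparator switches, the write is disabled
            return stored, True
    return stored, False           # the pixel never fired within this phase
```

With a 4-bit down counter, `simulate_pixel(1e-12, list(range(15, -1, -1)), 3e-3)` freezes the memory at the eighth counter word, while a much smaller photocurrent leaves the pixel unfired with the memory still following the counter.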

2T DRAM Implementation
In the proposed architecture, a 4-bit 2T DRAM is used as the pixel-level memory element. The use of 2T DRAM reduces the area compared to the conventional 6T SRAM. A dynamic memory fits the requirements of a DPS very well, since the memory is accessed frequently with the multi-reset integration scheme. However, dynamic memories add some complexity compared to static ones, mainly because of the loss of charge on the storage node caused by leakage currents. A refresh circuit is therefore required, whose timing depends on external conditions such as temperature, supply voltage and process variations. In this part, the implementation of the pixel-dedicated memory is described. Some techniques employed in memory design are reviewed and adapted to the needs of the DPS architecture.
The 2T DRAM cell shown in Figure 11(a) is derived from the well-known 3T DRAM and was proposed recently as a potential memory cell for microprocessor caches [9]. The main advantage of this memory cell is that it uses a full CMOS technology and improves density compared to the 3T DRAM by removing the access transistor. The operating principle illustrated in Figure 11(b) can be summarized as follows. During a write operation, the write word-line is set high or low in order to charge or discharge the storage node, which is loaded by the gate capacitance of transistor M2 and the diffusion capacitance of transistor M1. Note that for a conventional NMOS bit-cell, the gate voltage for storing the state 1 is limited to the supply voltage minus one threshold voltage. In hold mode, the write word-line is kept low and the leakage currents progressively discharge the internal node, which determines the data retention time and, correspondingly, the required refresh period. During the read operation, the read word-line is set low, enabling or disabling the discharge of the pre-charged read bit-line depending on the state stored on the gate of transistor M2.

Differential Sensing Scheme and Voltage Generation
Differential sensing was chosen because it reduces the voltage swing on the bit-lines and thus the power consumption incurred by the read operation. In addition, differential sensing improves the robustness to common-mode noise on the bit-line, such as supply voltage variations. This scheme has a cost, however, as it requires both a bit-line and a reference bit-line for a single cell, as well as a reference voltage generation circuit. In order to reduce the coupling noise between bit-lines, a transposed bit-line architecture is adopted, using a regular interleaving of the bit-lines within the pixel array. As shown in Figure 12, each column is divided into sub-blocks indexed either "odd" or "even". During a read operation, the control unit outputs the state of the block being accessed, either "odd" or "even". This signal is used to determine which line is the bit-line and which is the reference bit-line for correct reference voltage generation, and also to configure the latch input using a multiplexer. The reference voltage is generated using a row of dummy cells stuck at either 0 or 1, giving an intermediate voltage for the sensing operation.

Read and Refresh Circuit
In order to activate the sense amplifier at the right time for correct sensing of the memory cell, a replica circuit (Figure 13) based on a dummy column and a dummy memory cell is used [10]. Note that for correct sensing of the memory, the differential voltage, i.e., the voltage between the reference bit-line and the bit-line to be read, must be higher than the offset of the sense amplifier. Figure 14 depicts the reading stage designed to access the pixel-level memory. Each pixel contains 4 bits, which are all read simultaneously. Therefore, four sense amplifiers have to fit within one column pitch, and a basic latch-based sense amplifier structure was chosen for this reason. The read operation is a large contributor to the total power consumption of the chip: the charge required to pre-charge the large bit-line capacitances and the large static current flowing through the sense amplifier must be reduced as much as possible. As a consequence, close control of the sense amplifier timing is critical and can save a large amount of power. As the sensing operation is differential, a reference voltage is required, and it is generated as in [9]. The transposed bit-line architecture reduces the coupling effects between two adjacent lines but adds some complexity. In order to keep power under control, the refresh operation is performed only on fired pixels (Figure 15). This prevents unnecessary switching transitions on the data buses and thus saves power. To achieve this, a flag signal is used to detect the state of the pixel, and the content of the memory is read and refreshed only if the pixel has fired. The refresh operation is started by a control signal fed to the DPS.
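The selective refresh policy can be sketched at the behavioural level as follows. This Python example models each pixel as a hypothetical `(fired, word)` pair and only counts write-back operations; it does not model the sense amplifiers, the replica timing or charge leakage.

```python
def selective_refresh(rows):
    """Refresh only the pixels whose flag indicates that they have fired.

    rows is a list of rows, each a list of (fired, stored_word) pixels.
    Returns (refreshed_rows, number_of_write_backs).
    """
    write_backs = 0
    refreshed = []
    for row in rows:
        new_row = []
        for fired, word in row:
            if fired:
                # Read the word and write the same value back to the cell.
                write_backs += 1
            # Pixels that have not fired are skipped and left unchanged.
            new_row.append((fired, word))
        refreshed.append(new_row)
    return refreshed, write_backs
```

In a frame where only a quarter of the pixels have fired, this policy performs a quarter of the write-backs that an unconditional refresh would need, which is the source of the switching-activity savings mentioned above.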

Power Analysis and Power Reduction Techniques
In this part, an analysis of the power consumption of the proposed architecture is given. Power consumption results were obtained from electrical simulations using the Spectre simulator from Cadence. The aim of this analysis is to identify the major contributors to the total power consumption by giving the distribution of the total power among the different blocks of the architecture, such as the array, the sense amplifiers, the precharge circuit and the write buffers. This analysis is then used to apply power reduction techniques, which are generally costly in terms of speed and area, only to the critical blocks of the architecture.

Power Consumption Analysis
To keep the netlist size and the simulation time manageable, only a critical path of the 64 × 64 DPS was designed and simulated with the Spectre electrical simulator for the power consumption analysis. From this analysis, it is clear that the main contributors to power and energy consumption are the pixels, as the voltage comparator of each pixel draws a large static current during the whole integration time. For mobile applications, energy is the key metric. Apart from the energy required by the pixels to capture one frame, the remaining energy consumption is spread over all blocks. The energy consumption of the scan shift registers can be neglected. Table 1 shows the power consumption analysis from electrical simulation at nominal process and supply voltage.

Power Reduction Techniques
The power consumed by the voltage comparators contained in each pixel is the major contributor to the total power consumption. In order to reduce the power consumption compared to the current implementation, either an auto-reset function employing a feedback unit or a different comparator structure could be used. The main advantage of a latch-based sense amplifier is the reduction in power consumption compared to the conventional architecture, where a bias current is drawn permanently from the supply. However, this kind of comparator suffers from a large offset, resulting in lower image quality. To solve this problem, a pre-amplifier may be used as a first stage, at the cost of increased area overhead. Another solution is the switched op-amp technique, which enables or disables the comparator using a clock signal. This circuit matches the time-domain DPS architecture well. Indeed, during the analog-to-digital conversion, the voltage of the photodiode only needs to be compared to the reference voltage $V_{ref}$ periodically, when the global counter output switches from one state to another. If the global counter frequency is low enough for the comparator to be switched on only during a short period before the counter value changes, then power savings may be achieved. This scheme is even more efficient with the non-uniform quantization scheme, as the counter frequency decreases over the integration period. Therefore, the switching activity and the time during which the comparator is on are strongly reduced compared with the standard implementation. Figure 16 depicts the schematic of the proposed implementation of the DPS voltage comparator and simulated curves of key signals.
Compared with the conventional pixel-level comparator, 2 PMOS transistors and 1 NMOS transistor are added to control the switching. These three transistors increase the size of the proposed design in Figure 9 by only 2.8%. Transistors MN1-MN2-MP1-MP2 constitute the differential pair, and MNB is a bias transistor whose gate voltage is controlled externally for improved flexibility. MNF is a footer transistor that cuts the supply path once the pixel has fired. MP3 and MN3 constitute the output stage of the comparator, providing a full-swing output voltage os2. MPS is added to set the correct state on the output during inactive mode by controlling the voltage os1. Note that this comparator can be used with a photodiode set in photovoltaic mode, allowing for energy harvesting. The feedback circuit used to disable the comparator does not act on the reset transistor but controls the supply path, which maximizes the time during which the photodiode is in energy harvesting mode. From the simulation results (Figure 17), one can observe that while the comparator is enabled periodically, only one pulse is observed on signal os2, corresponding to the crossing of the photodiode voltage with the voltage level $V_{ref}$. The simulated power consumption of the voltage comparator is in the nW range, depending on the frequency and linearization scheme used, which is 2 to 3 orders of magnitude lower than that of the original implementation.
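The order of magnitude of the savings from enabling the comparator periodically can be estimated with a simple duty-cycle model. The Python sketch below assumes that the comparator draws a fixed power while enabled and nothing otherwise, ignoring leakage and the switching energy of the enable transistors; the function name and the numerical values in the usage note are hypothetical and are not simulation results from this work.

```python
def average_comparator_power(p_on, t_on, counter_period):
    """Average power (W) of a comparator enabled for t_on once per counter period.

    p_on is the power drawn while the comparator is enabled; the power in
    the disabled state is assumed to be zero.
    """
    if not 0 < t_on <= counter_period:
        raise ValueError("t_on must be positive and no longer than the period")
    return p_on * t_on / counter_period
```

For instance, a comparator that would draw 1 µW continuously and is enabled for 100 ns in every 100 µs counter period averages about 1 nW under this model, a reduction of three orders of magnitude. A longer counter period, as occurs late in the integration under non-uniform quantization, lowers the average further.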

Hardware and Measurement Results
In order to validate the proposed architecture, a prototype of a 64 × 64 DPS array was designed in the AMS 0.35 µm CMOS technology. Figure 18 shows the microphotograph of the fabricated chip and its different parts, namely the pixel array, the shift registers for line selection and scan-out of data, and the control block. Table 2 summarizes the main characteristics of the proposed implementation compared to previous 6T SRAM based DPS implementations [7] and [2] using the same technology node. The area of the proposed pixel is 22 µm × 22 µm and the fill factor is 20%, which could be further improved by using a single-ended sensing scheme instead of the differential one used in this work. While the current consumption per pixel is improved compared to previous implementations, further benefits are expected from the power reduction techniques discussed in the previous sections.

Conclusions
Digital pixel sensors are a promising architecture for high-speed image acquisition, high dynamic range and high-illumination imaging. However, the area occupied by the memory is a major drawback that reduces the pixel's sensitivity to light. In this paper, two different approaches were explored to reduce the area of the pixel memory. The first approach is to reduce the memory requirements by using the proposed multi-reset integration scheme. The second approach is to directly reduce the area of the storage element by using a 2T DRAM cell instead of the area-consuming 6T SRAM cell. To implement this concept, a new DPS architecture was proposed. This DPS relies on a multi-reset integration scheme that benefits from the chronological way in which the bits of the code change. Using this scheme, a four-bit per-pixel memory was employed, making possible the design of a 22 µm × 22 µm digital pixel sensor with a 20% fill factor. For mobile applications of the DPS architecture, the power consumption of the sensor is also a major concern. The detailed analysis of the power contributors presented in this paper helped to identify the main building blocks contributing to power. The voltage comparator within each pixel is identified as the bottleneck in terms of power, due to its large static current during the whole integration phase. Some power reduction techniques for future DPS implementations were also proposed and discussed in this paper.

Figure 1. Schematic view of the conventional pixel and corresponding timing diagrams.
Figure 2(a) represents a uniform time-domain quantization, which leads to a non-linear response from the sensor because the photocurrent quantization boundaries are non-linearly spaced. In contrast, the non-uniform time-domain quantization depicted in Figure 2(b) linearizes the conversion of photocurrent into digital code, as the photocurrent quantization boundaries are now linearly spaced.
Figure 2. (a) Uniform time domain quantization. (b) Non-uniform time domain quantization.

Figure 3. (a) 3 bits UQ and NUQ in terms of time; (b) 3 bits UQ and NUQ in terms of discharging current.

Figure 4. Timing diagram of the multi-reset integration scheme.

Figure 5. Illustration of timing boundaries for a 2-bit resolution.

Figure 6. Pixel area as a function of the number of embedded bits.

Figure 7. Interpolated curve of the delay as a function of memory size.

Figure 8. Block diagram of the overall architecture.

Figure 12. Diagram of the transposed bit-line architecture and the latching stage.
Schematic diagram of the refresh scheme. (b) Simulated curves.

Figure 16. Schematic diagram of the proposed switched-opamp comparator for PWM DPS applications.

Figure 19. Measurement results of key electrical signals. Reset, Vpd, Wbl and Rbl are the reset signal, the photodiode voltage, the write bit-line signal and the read bit-line signal, respectively.

Table 1. Power consumption analysis from electrical simulation at nominal process and supply voltage.

Table 2. Comparison of key metrics with related work.
** Estimated from the corresponding literature.