Readout Circuit Design for RRAM Array-Based Computing in Memory Architecture

In recent years, the computational demands of AI have challenged the traditional von Neumann architecture because of its high power and data-transfer costs. As a solution, the computing-in-memory (CIM) architecture, which combines storage and computation, has gained attention for its superior computational power and energy efficiency. In CIM based on resistive random access memory (RRAM) arrays, the readout circuit, which converts the analog outputs of multiply–accumulate operations into digital signals, is limited by its area and power consumption. Analog CIM readout circuits fall mainly into two types: the traditional ADC type and the non-traditional type. This paper presents one readout circuit design of each type. The first is a low-power, compact successive approximation register (SAR) analog-to-digital converter (ADC) readout circuit. The core circuit is an 8-bit SAR ADC operating at 70 MS/s. It incorporates a linearity-improved bootstrapped switch that minimizes leakage and enhances linearity: the spurious-free dynamic range (SFDR) improves by 10.1 dB, from 76.78 dB to 86.88 dB, and the signal-to-noise and distortion ratio (SNDR) improves by 4.56 dB, from 75.13 dB to 79.69 dB. The delay of the transconductance-enhanced dynamic comparator is reduced from 184 ps to 149 ps, a performance improvement of approximately 20%. Concurrently, the energy consumption decreases from 178 µm to 132 µm, an improvement of roughly 26%. A "sandwich" capacitor structure reduces the overall layout area. After layout and post-layout simulation, this circuit occupies only 49.6 µm × 51.5 µm, consumes 553 µW, and achieves an SNDR of 46.22 dB and an SFDR of 57.21 dB. The second is a current-controlled oscillator (CCO)-type readout circuit, built around a CCO with low process sensitivity. This readout circuit also uses an op-amp and current mirrors in a negative feedback loop to keep the voltage across the RRAM arrays constant. The frequency generated by the CCO is controlled by the current and quantized by a counter, which supports a different weight for each RRAM column without additional digital weighting. This circuit achieves a 95-level resolution, a 5.2 µs delay, and an average power consumption of 183.1 µW. A comparative analysis shows that traditional ADC readout circuits offer high resolution and speed but are limited by their high power and area costs, which often overshadow the benefits of CIM arrays. Thus, for applications with more lenient resolution and speed requirements, non-traditional readout circuits offer considerable advantages.


Introduction
In the field of information technology, an AI revolution is unfolding. The computational power required for AI is predominantly provided by proprietary hardware chips. However, due to the deceleration of Moore's law and bottlenecks in semiconductor advancement, hardware performance has not kept pace with the demands of algorithms, leading to the so-called "von Neumann bottleneck". To overcome these constraints, various solutions have been proposed, most notably CIM technology. CIM integrates storage and computation, thereby reducing data transfer.
At the algorithmic level, the convolutional neural network (CNN) is well suited to feature extraction and data classification tasks. On the hardware front, RRAM is a competitive non-volatile memory because of its speed, low power consumption, and simplicity. Integrating RRAM into neural networks allows multiply-and-accumulate (MAC) operations to be computed in the analog domain, with weight parameters mapped to resistance values, which accelerates computing and improves energy efficiency [1].
The parallel processing offered by RRAM arrays enables rapid and efficient data handling. Nonetheless, in certain CIM chips, the readout circuitry can account for nearly 70% of the power consumption and up to 90% of the area, severely constraining the energy efficiency of CIM [2]. Consequently, designing energy-efficient and area-efficient readout circuits is critical in CIM research.
Analog CIM readout circuits fall mainly into two types: the traditional ADC type and the non-traditional type. For the first type, [3] features a variable-precision circuit based on split capacitance for 5/6-bit accuracy. This design reduces the ADC area by employing split capacitance in the DAC and adjusting the total capacitance at the DAC output node by controlling the MSB capacitor. The authors of [4] use a Flash ADC readout circuit designed for 4-bit array operations. The Flash ADC is chosen for its speed, and the design combines input sparsity sensing (ISS) with the Flash ADC to enhance chip energy efficiency and integration. In [5], a shared scheme is suggested in which a single readout circuit serves multiple columns. Further, [6] adopts a hybrid approach with analog-domain floating-point and single-slope A/D conversion to derive 2-bit exponent codes and 5-bit mantissa codes. For the second type, [7] introduces a voltage-controlled oscillator (VCO)-based readout circuit, which has a smaller area and lower power consumption than shared traditional ADC structures, making it applicable on a per-column basis. In [8], a scalable integrate-and-fire (IF) readout circuit is described, which is better suited to spiking neural network algorithms.
This research presents the design of two types of readout circuits for CIM based on RRAM arrays: a traditional ADC-based circuit and a non-traditional circuit. Section 2 outlines the design of an SAR ADC for traditional ADC readouts. Section 3 details the design of a readout circuit based on a CCO. Section 4 discusses simulation results for the readout circuits described in Sections 2 and 3. Section 5 summarizes these two types of readout circuits.

SAR ADC Design for Traditional ADC Readout Circuits
This section focuses on designing a low-power, compact SAR ADC architecture for traditional ADC-type readout circuits. The module converts analog voltage signals, obtained through current sampling circuits, into digital signals. We first present the overall framework of an 8-bit SAR ADC, elaborating on the performance metrics and key technologies related to readout circuits in CIM chips. We then detail the design of the components of the SAR ADC circuit, including the sampling switch circuit, the dynamic latch comparator, and the DAC capacitor array.
The overall structure and timing diagram of the designed SAR ADC are depicted in Figure 1. The design includes an improved bootstrapped switch, a capacitive digital-to-analog converter (CDAC), an enhanced dynamic comparator, and asynchronous SAR logic. During the sampling phase, the differential inputs are sampled onto the CDAC. Triggered by the falling edge of the sampling clock, the comparator activates to resolve the most significant bit (MSB). The MSB is then stored in the successive approximation register and used to switch the highest-order capacitor in the CDAC, and the process continues until the least significant bit (LSB) is resolved. In Figure 1, CLKs represents the sampling period and Clkc represents the comparison period of the comparator (drawn as a uniform period in Figure 1, although in practice the comparison time differs when different voltages are compared); the transition of Clkn from low to high marks the completion of the nth comparison. The design highlights are as follows:
1. Improved Bootstrapped Switch: From the hold to the sampling phase, the linearity is enhanced by adding a MOS transistor that reduces the charge leakage related to $V_{in}$.
2. Upgraded Dynamic Comparator: During the latching phase, the comparison speed is increased by raising the equivalent transconductance ($g_m$).
3. CDAC Capacitor Array: A "sandwich" three-layer capacitor structure augmented with a dummy structure provides a minimal area and high matching precision.
4. Switching Strategy: For the top four bits, a split-capacitor scheme is used to hold the common-mode voltage at $\frac{1}{2}V_{ref}$. The lower four bits employ a monotonic switching strategy to avoid the excessive capacitor mismatch caused by excessively small unit capacitors.
5. SAR Logic: To enhance the overall comparison efficiency, an asynchronous timing scheme is adopted in the SAR logic [9].
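The bit-by-bit conversion described above can be illustrated with a minimal sketch of an ideal SAR search. It is deliberately simplified: it is single-ended rather than differential, it ignores the split-capacitor and monotonic switching details, and the reference voltage of 1.2 V and the function name `sar_convert` are assumptions made only for illustration.

```python
def sar_convert(v_in: float, v_ref: float = 1.2, n_bits: int = 8) -> int:
    """Ideal SAR search: resolve one bit per comparison, MSB first.

    v_ref = 1.2 V is an assumed full-scale value, not a figure from the paper.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)              # tentatively set the current bit
        v_dac = v_ref * trial / (1 << n_bits)  # ideal DAC level for the trial code
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit; otherwise it stays cleared
    return code


code_mid = sar_convert(0.61)   # an input slightly above mid-scale
code_min = sar_convert(0.0)    # lowest input
code_max = sar_convert(1.2)    # full-scale input
```

Each loop iteration corresponds to one comparator decision, so an 8-bit conversion needs eight comparisons after sampling; in the asynchronous scheme each comparison starts as soon as the previous one completes rather than on a fixed clock edge.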

Linearity-Improved Bootstrapped Switch
To mitigate the effect of $V_{IN}$ variation on sampling accuracy, researchers have proposed a gate voltage bootstrapped switch that reduces the nonlinearity of the on-resistance $R_{on}$. The schematic of the classical gate voltage bootstrapped switch is shown in Figure 2. When CK = 0, the switch is in the hold phase, and the bootstrapped capacitor $C_B$ is charged to $V_{DD}$; when CK = 1, the switch enters the sampling phase, and the gate of the sampling switch $M_1$ is connected to the bootstrapped capacitor $C_B$, resulting in a gate voltage of $V_{IN} + V_{DD}$. The gate-to-source voltage of $M_1$ therefore becomes a fixed $V_{DD}$, decoupling its on-resistance from input variations, which reduces switch nonlinearity and enhances overall sampling accuracy.
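The benefit of bootstrapping can be seen in a small numerical sketch. It uses the first-order triode approximation $R_{on} = 1 / (k (V_{GS} - V_{TH}))$ with $k = \mu C_{ox} W/L$; the values of `k` and `v_th` are hypothetical, and body effect and parasitic capacitance are ignored.

```python
V_DD = 1.2  # supply voltage used in the paper's simulations


def r_on(v_gs: float, v_th: float = 0.4, k: float = 2e-3) -> float:
    """First-order triode on-resistance; k and v_th are illustrative assumptions."""
    overdrive = v_gs - v_th
    if overdrive <= 0:
        return float("inf")  # the switch is off
    return 1.0 / (k * overdrive)


inputs = [0.1, 0.3, 0.5, 0.7]

# Plain NMOS switch: gate tied to V_DD, so V_GS = V_DD - V_IN shrinks as V_IN rises.
plain = [r_on(V_DD - v_in) for v_in in inputs]

# Bootstrapped switch: gate at V_IN + V_DD, so V_GS = V_DD regardless of V_IN.
boot = [r_on((v_in + V_DD) - v_in) for v_in in inputs]
```

In this model the plain switch's resistance grows several-fold across the input range, while the bootstrapped switch's resistance stays constant, which is the signal-independent behaviour the bootstrapped structure is intended to approach.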
$M_4$ is included in the classical gate voltage bootstrapped switch to limit the drain-source voltage $V_{DS5}$ of $M_5$ to $V_X - V_{TH4}$, preventing an excessive source-drain voltage from causing breakdown in $M_5$ [10]. However, $M_4$ introduces new nonidealities: as the mode changes from hold to sampling, the voltage at point X rises from 0 V to $V_{X1}$ because of the parasitic capacitance $C_P$, as given by Equation (1), where $C_B$ is the bootstrapped capacitor. When $M_5$ is off and point Y is floating during the sampling phase, if $V_{DD} - V_{X1}$ is less than $V_{TH4}$, $M_4$ does not fully turn off, and charge leaks from X to Y. Since $V_{X1}$ depends on $V_{IN}$, the charge leakage also depends on $V_{IN}$, which introduces nonlinearity [11]. To address this nonlinearity, an improved gate voltage bootstrapped switch is proposed, as depicted in Figure 2. This structure incorporates an additional PMOS transistor $M_{11}$. During the transition from hold to sampling, $M_{11}$ is turned on, setting the voltage at point Y to $V_{DD}$ and ensuring that $V_{DS4}$ of $M_4$ is much less than $V_{TH4}$. With $M_4$ turned off in this way, the influence of $V_X$ is minimized, which decreases the leakage from X to Y through $M_4$ and thereby enhances the dynamic performance of the bootstrapped switch.

Several considerations apply to the design of the gate voltage bootstrapped switch. To optimize area while ensuring a stable boost of the switch's gate-to-source voltage and meeting the SNDR design requirement, the bootstrapped capacitor $C_B$ is set to 135 fF; the other MOS transistors are sized relatively small to diminish the effects of parasitic capacitance.
The dynamic performance of the proposed linearity-improved bootstrapped sampling circuit is compared with that of the traditional one in Figure 3.

Transconductance-Enhanced Dynamic Comparator
In SAR ADC design, comparators are used primarily to compare the voltages across the CDAC capacitor array. Comparator designs are classified into static and dynamic types. Static comparators provide lower input offset voltages with rapid response speeds, but high-resolution instances usually depend on multi-stage amplifiers, leading to high static power consumption and a larger area. In contrast, dynamic comparators merge the preamplification and latching stages, are controlled by clock signals, and typically have no static power consumption. Given the low-energy and compact-size requirements of the designed readout circuit, this research adopts the dynamic comparator approach.
A traditional two-stage dynamic comparator structure is depicted in Figure 4a. It operates in two phases: reset and compare. During reset, with CLK low, the outputs of the preamplification stage are charged to $V_{DD}$ through MP3 and MP4, turning on MN1 and MN4 and grounding the outputs of the latching stage. In the compare phase, with CLK high, MN7 conducts, and the differential inputs Vinp and Vinn discharge the preamplification stage outputs at different rates, generating a differential voltage ∆V. Concurrently, in the latching stage, MP5 conducts, allowing ∆V to propagate through MN1 and MN4 to the cross-coupled inverter structure and initiating positive feedback. Ultimately, one output node settles high and the other low.
The two-stage dynamic comparator, consisting of a preamplifier stage and a latching stage, allows a flexible balance between speed and offset in low-voltage operation. During latching, the effective transconductance $g_{m,eff}$ significantly influences speed. In the structure above, when MN2 and MN3 are off, $g_{m,eff}$ is contributed solely by MP1 and MP2, which limits the comparison speed. To achieve low power consumption and a compact design in the readout circuit while ensuring that the comparator's response speed meets specifications, a transconductance-enhanced two-stage dynamic comparator is proposed [12], illustrated in Figure 4b. In this comparator, the latching stage is improved by adding M11, M12, M13, and M14. During reset, with CLK low, the preamplification stage functions as in the conventional structure. The high preamplification stage outputs turn on M5 and M6, driving M9 and M10; together with the conduction of M13 and M14, this sets the comparator outputs $V_{outp}$ and $V_{outn}$ high. During comparison, with CLK high, the preamplification stage outputs discharge at different rates. When $V_{inp}$ exceeds $V_{inn}$, point A discharges faster than point B, causing the $V_{DS}$ of M5 and M6 to diverge. A high CLK1 then leads to different discharge rates at points C and D. The positive feedback formed by M7, M8, M9, and M10 ultimately sets $V_{outp}$ high and $V_{outn}$ low.
M11 and M12 ensure that the NMOS and PMOS devices (M7, M9, or M8, M10) stay in the deep inversion region during latching, thus improving $g_{m,eff}$, as shown in Equation (2). CLK1 is CLK delayed by ∆t and boosted by ∆V, which reduces power consumption and ensures effective conduction of M11 and M12: ∆t delays the conduction of M11 and M12 at the rising edge of CLK, reducing the short-circuit current through M5 and M6, and ∆V ensures that M11 and M12 better satisfy $V_{gs} > V_{thn}$. However, this structure results in longer reset times, which affects the comparator's overall comparison speed. M13 and M14 are therefore introduced to reduce the reset time.
Comparator offset is a critical factor, particularly because process mismatch has an evident impact on dynamic performance. The transconductance-enhanced comparator was therefore evaluated by Monte Carlo simulation. Waveform comparisons between the transconductance-enhanced two-stage dynamic comparator and the traditional two-stage dynamic comparator are shown in Figure 6. The data indicate that, for a 1 LSB input, the comparison delay decreased from 184 ps to 149 ps, a performance improvement of around 20%. Simultaneously, energy consumption dropped from 178 µm to 132 µm, an improvement of approximately 26%.
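As a quick check on the rounded percentages quoted above, the relative improvements follow directly from the reported before and after values (the ratios do not depend on the unit in which the energy is expressed):

```latex
\[
\frac{184 - 149}{184} \approx 19.0\%,
\qquad
\frac{178 - 132}{178} \approx 25.8\%
\]
```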

CDAC Switching Scheme and Unit Capacitor
To reduce the switching energy of conventional strategies, a split-capacitor switching strategy was introduced in [13]. This approach splits the capacitor at the most significant bit into two equal sub-capacitors, resulting in a 38% reduction in power consumption compared to traditional strategies. The monotonic switching strategy, which involves discharge only, requires $2^{n-1}$ unit capacitors (single-ended) for an n-bit ADC, halving the total capacitance and decreasing energy consumption by 81% relative to traditional strategies. Although the monotonic strategy saves energy efficiently, it changes the common-mode voltage of the top plate in a differential structure. In contrast, the split-capacitor strategy maintains the common-mode voltage during switching, but the number of unit capacitors $C_{unit}$ it uses for the least significant bit is twice that of the monotonic strategy.
To achieve low power consumption, reduce common-mode interference, and improve ADC accuracy, a combined split-capacitor and monotonic switching strategy is adopted: the first four bits (MSBs) use the split-capacitor method to prevent substantial changes in the common-mode voltage during bit transitions, and the last four bits (LSBs) use the monotonic method to avoid the excessive use of $C_{unit}$ and to reduce power consumption.
For a low-power, compact readout circuit applied in CIM, a "sandwich" capacitor structure, shown in Figure 7, is used to meet the small-area requirement. The structure, designed in the SMIC 110 nm process, uses metal layers 4, 5, and 6, with M5 as the top plate and M4 and M6 as the bottom plate of the capacitor. To reduce the influence of parasitic capacitance on the relative accuracy of the unit capacitor, a shielding layer is added between the unit capacitors; together with the dummy layers, this provides better protection and reduces capacitor mismatch at the boundaries. The DNL and INL are estimated by the relative standard deviation method. In the chosen process, the relative standard deviation of a 10 fF MOM capacitor is 1.3%. The relative standard deviation of the designed 0.8 fF MOM capacitor can be inferred to be approximately 4.2%, based on the ratio of standard deviation to capacitance [14]. The results are shown in Figure 8.

CCO-Type Readout Circuit
In general, the operation speed and accuracy of CIM arrays are at least an order of magnitude lower than those of the SAR ADC designed in Section 2. In such applications, SAR ADC readout circuits are therefore typically shared, with different columns of a single CIM array, or corresponding columns across multiple CIM arrays, using a single SAR ADC readout circuit. While this type of readout circuit offers high precision, its power consumption and area are less than ideal. Hence, this section introduces a non-traditional readout circuit design. Thanks to its low power consumption and compact size, this design can be applied to every column of the CIM array.
For the study of CIM readout circuits in this section, RRAM is chosen as the storage-and-compute array element. RRAM modulates its resistance through the formation and rupture of conductive filaments to enable data storage and read-write operations. The dynamics of the conductive filaments involve the migration of oxygen ions and the generation and annihilation of oxygen vacancies, which are influenced by stochastic variations and thermal effects and lead to the switching of RRAM resistance values [15]. To simplify the model, this study adopts a linear RRAM model for designing the readout circuit and disregards other nonlinear factors. For ease of quantization, the model uses a two-level weight system, where the high-resistance state (HRS) and the low-resistance state (LRS) store binary data "0" and "1", respectively, with a high/low resistance ratio of 20 kΩ/2 kΩ. The 1T1R configuration is selected for simulating the array, as shown in Figure 9. In this model, the source line (SL) connects to the transistor's source; the bit line (BL) connects the storage unit to the column signal sensor; and the word line (WL) is used for selecting the row address and triggering row signals. To use a specific 1T1R unit, the WL is enabled, and current flows out from the BL when the SL voltage is 1.2 V. A 4 × 4 CIM array constructed with 1T1R cells to minimize sneak current is illustrated in Figure 9. Sneak current leakage occurs when some computing units in a CIM column are disabled (SL = 0): the current from enabled computing units (SL = 1) may flow through the BL of the disabled units and exit through their SL.
In the 4 × 4 CIM array, the SL1-SL4 voltage is 1.2 V; WL1-WL4 are enabled at 0 V and disabled at 1.2 V; BL1-BL4 serve as the current outputs. The readout circuit designed for the 4 × 4 CIM array, shown in Figure 10, includes a front-end circuit, a CCO circuit, and a digital logic circuit. BL1-BL4 carry the current output from the 4 × 4 CIM array. The voltage at Point A (A1-A4) is clamped to $V_{ref}$ by the operational amplifier, after which the current is mirrored into the CCO circuit (comprising three current-controlled inverters) to produce variable-frequency waveforms. Finally, a digital logic circuit quantizes the waveforms of different frequencies and outputs them in digital form. Notable design highlights of this approach include:
1. Clamp voltage: The current within an RRAM array is computed as $I = \sum_i V_i G_i$. Consequently, variations in the column output voltage within the CIM array change the column's current and thus degrade accuracy. The CCO-type readout circuit designed in this paper uses the clamping action of a negative feedback circuit, constructed from an operational amplifier and a current mirror, to clamp the voltage at Point A (A1-A4) to $V_{ref}$. This maintains a constant voltage difference across the array, thereby enhancing the linearity of the CIM array's output current.
2. Selection of size in current mirror: Given the order-of-magnitude variation in column current within the CIM array, MN1,1-MN4,1 must be sized carefully, because these NMOS transistors must remain in the saturation region as the column current changes. Moreover, since the column current in the CIM array is at the mA level, replicating it directly with a current mirror would result in prohibitive power consumption. The current is therefore scaled down using (MN1,2, MN1,3, MN1,4); in addition, different ratios across (MN1,2, MN1,3, MN1,4) to (MN4,2, MN4,3, MN4,4) assign different weights to different columns of the CIM array.
3. Design of CCO: This study uses a CCO design characterized by low process sensitivity, which eliminates the need for an additional reference voltage. The design exploits the complementary nature of process-induced variations in capacitance and voltage to mitigate its sensitivity to manufacturing process variations [16].
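The column current relation $I = \sum_i V_i G_i$ from the first highlight can be made concrete with a minimal sketch of the linear RRAM model. It assumes that every cell in a column sees the same voltage $V_{SL} - V_{ref}$, that $V_{ref}$ = 0.8 V (taken from the amplifier input level mentioned later in the text), and that the voltage drop across the access transistor is negligible; the function name is illustrative.

```python
R_LRS = 2e3    # low-resistance state, ohms (from the text)
R_HRS = 20e3   # high-resistance state, ohms (from the text)
V_SL = 1.2     # source-line voltage, volts (from the text)
V_REF = 0.8    # assumed clamp voltage at point A, volts


def column_current(n_lrs: int, rows: int = 4, v_a: float = V_REF) -> float:
    """Ideal column current I = sum(V_i * G_i) for n_lrs cells in LRS, the rest in HRS."""
    if not 0 <= n_lrs <= rows:
        raise ValueError("n_lrs must lie between 0 and the number of rows")
    v_cell = V_SL - v_a                                     # same voltage across every cell
    conductance = n_lrs / R_LRS + (rows - n_lrs) / R_HRS    # parallel cell conductances
    return v_cell * conductance


i_all_hrs = column_current(0)   # smallest column current
i_all_lrs = column_current(4)   # largest column current
```

Under these assumptions the column current spans one order of magnitude, from about 0.08 mA to about 0.8 mA, which is consistent with the near-mA currents and the need for down-scaling in the current mirror described in the second highlight.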

Front-End Circuit Design
In the voltage-controlled oscillator (VCO)-type readout circuit proposed in [7], the authors used a current-to-voltage converter (a MOS transistor) to convert the current into a voltage. This voltage then controls the output frequency of the VCO, and finally, a digital counter quantizes this frequency to produce the digital output.
A structural drawback of this setup is that, while it is designed to quantize different column currents, the current-to-voltage conversion makes the voltage at point Y vary with the column current. Changes in the voltage at point Y (point A in this paper) alter the voltage difference across the columns of the RRAM crossbar, causing the column currents to deviate from their ideal values. To address this issue, this paper implements a negative feedback circuit formed by an operational amplifier and a current mirror, as shown in Figure 11a.
In the front-end circuit, the voltage at point A1 is clamped to $V_{ref}$. With this setup, the voltage at point A1 remains almost unchanged as the number of LRS cells in the CIM array changes, as illustrated in Figure 11b. The voltage variation at point A1 in the circuit of [7] is around 200 mV, whereas in the circuit designed in this paper it is about 5 mV, which can be considered negligible. Hence, this structure enhances the linearity of the output column currents of the CIM array. Within the circuit, the operational amplifier adopts a five-transistor architecture, as depicted in Figure 12a. Because the input must remain stable at 0.8 V, an NMOS input pair is used. The amplitude-frequency and phase-frequency characteristic curves are shown in Figure 12b. The operational amplifier exhibits a DC gain of approximately 37.5 dB and a phase margin of about 92°. To achieve better current mirror matching, the current mirror for the first column is divided into MN1,2, MN1,3, and MN1,4, with IB1, IB2, and IB3 each connected to one of the three delay units (current-controlled inverters) in the CCO circuit. Furthermore, considering the near-mA current flowing through the columns of the CIM array, the width-to-length ratio of (MN1,2, MN1,3, MN1,4) is proportionally reduced to decrease power consumption in the subsequent circuits.
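To see why a 5 mV variation at point A is treated as negligible while 200 mV is not, the following sketch estimates the first-order relative error in the column current caused by a shift in the clamp-node voltage. It assumes the linear RRAM model, a nominal cell voltage of $V_{SL} - V_{ref}$ with $V_{ref}$ = 0.8 V, and no drop across the access transistor; the nominal voltages in the circuit of [7] may differ, so the larger figure is only indicative.

```python
def relative_current_error(delta_v_a: float, v_sl: float = 1.2, v_ref: float = 0.8) -> float:
    """Relative column-current error when point A deviates by delta_v_a volts.

    For a linear cell, I is proportional to (V_SL - V_A), so the relative
    error equals delta_v_a / (V_SL - V_ref). v_ref = 0.8 V is an assumption.
    """
    return delta_v_a / (v_sl - v_ref)


err_clamped = relative_current_error(0.005)    # about 5 mV variation (this design)
err_unclamped = relative_current_error(0.200)  # about 200 mV variation (as quoted for [7])
```

With these assumed values the clamped node perturbs the column current by roughly 1%, whereas a 200 mV swing would change it by tens of percent, which illustrates the linearity argument made above.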

Design of CCO
Traditional CCOs offer limited suppression of fabrication process sensitivity, which can lead to variations in the output frequency of the CCO, particularly in high-gain CCO designs. These frequency shifts can incur significant correction costs in the subsequent circuits. To address this challenge, this paper proposes a new CCO structure with low process sensitivity, as shown in Figure 13.
IBn is the mirrored current from the current mirror in the front-end circuit flowing out of each delay unit. $V_{DD}$ denotes the power supply voltage, and $V_{thn1}$ represents the threshold voltage of MN1. When the input $V_{in}$ shifts from low to high, MP1 and MP3 turn off and MN3 conducts. The capacitor $C_{int}$ then discharges through the current $I_{CCO}$. Once the voltage on the bottom plate of $C_{int}$ drops to $V_{thn1}$, MN1 turns off, leading to a high inverter output and thus a low $V_{out}$. Conversely, when $V_{in}$ shifts from high to low, MP1 conducts, and the current through MP1 recharges $C_{int}$, resetting its voltage. With MP3 conducting and MN3 off, $V_{out}$ goes high.
The output period of the CCO is given by Equation (3), where N is the number of delay units (N = 3 in this design), $t_{rise}$ is the signal's rising delay time, and $t_{fall}$ is the falling delay time. In practical circuits, $t_{rise}$ and $t_{fall}$ can be neglected compared to the first term within the parentheses in Equation (3). Thus, for simplification, the frequency of the output waveform is assumed to be a first-order function of $I_{CCO}$. The low process sensitivity of this CCO structure is due to the complementary changes of $C_{int}$ and $(V_{DD} - V_{thn1})$ under process variations. For instance, in an FF process corner, $C_{int}$ and the threshold voltage $V_{thn1}$ decrease, while $(V_{DD} - V_{thn1})$ increases, which reduces the change in the product $C_{int}(V_{DD} - V_{thn1})$. Table 2 provides the MOS transistor dimensions for the current mirrors shown in Figure 13. Moreover, $C_{int}$ is equal to 1 pF. Considering the approximate current range of 7.2 µA to 42 µA, the Monte Carlo simulation of the CCO was run with $I_{CCO}$ = 25 µA, as depicted in Figure 14.
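The dependence of the oscillation frequency on $I_{CCO}$ and the process-compensation argument can be sketched numerically. Equation (3) itself is not reproduced here, so the sketch assumes a period of the form $T = N \left( C_{int}(V_{DD} - V_{thn1}) / I_{CCO} + t_{rise} + t_{fall} \right)$, which matches the description above but is a reconstruction, not the paper's exact expression. The threshold voltage of 0.3 V and the corner shifts are hypothetical values.

```python
def cco_frequency(i_cco: float, c_int: float = 1e-12, v_dd: float = 1.2,
                  v_thn1: float = 0.3, n_stages: int = 3,
                  t_rise: float = 0.0, t_fall: float = 0.0) -> float:
    """Oscillation frequency under the assumed period model.

    T = N * (C_int * (V_DD - V_thn1) / I_CCO + t_rise + t_fall)  (assumed form)
    v_thn1 = 0.3 V is a hypothetical threshold voltage.
    """
    t_discharge = c_int * (v_dd - v_thn1) / i_cco   # time for C_int to fall to V_thn1
    period = n_stages * (t_discharge + t_rise + t_fall)
    return 1.0 / period


f_nominal = cco_frequency(25e-6)     # typical corner at the Monte Carlo operating point
f_half = cco_frequency(12.5e-6)      # half the current gives half the frequency

# Hypothetical fast corner: C_int drops by 5% and V_thn1 drops by 40 mV.
f_fast = cco_frequency(25e-6, c_int=0.95e-12, v_thn1=0.26)
# Same capacitance shift without the compensating threshold shift.
f_cap_only = cco_frequency(25e-6, c_int=0.95e-12)

shift_compensated = abs(f_fast / f_nominal - 1.0)
shift_uncompensated = abs(f_cap_only / f_nominal - 1.0)
```

With the rise and fall delays neglected, the frequency is proportional to $I_{CCO}$, and in the hypothetical fast corner the increase in $(V_{DD} - V_{thn1})$ offsets most of the decrease in $C_{int}$, so the frequency shift is much smaller than the capacitance shift alone would produce.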

Digital Logic Circuit
The final output of the readout circuit is generated as a digital signal by a time-to-digital converter (TDC). Traditional TDC implementations typically use either counters or encoders. In time-delay-based ADCs, encoder-based T/D conversion is common because of the requirements of the circuit structure. In this research, however, digital counters are the primary means of T/D conversion [17].
The readout circuit does not impose high requirements on conversion speed, so there are no stringent demands on the delay of the digital modules. To achieve low power consumption and a reduced area, true single-phase clock (TSPC) flip-flops are employed in this structure. In TDCs that use digital counters, the output is given by Equation (4), where $N_{clk}$ denotes the bit depth of the reference-side counter, $f_{clk}$ is the frequency of the reference signal, $N_{CCO}$ represents the bit depth of the CCO output counter, $f_{CCO}$ is the output frequency of the CCO, and ⌊·⌋ denotes the floor function.
When the reference counter reaches its full count, it stops the CCO-side counter, and the count value of the CCO-side counter at that moment is the digital output. The CCO counter is therefore given more bits than the reference counter, typically with some margin for redundancy, to prevent counting errors caused by an insufficient bit count.
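The counter-based conversion described above can be sketched as follows. Because Equation (4) is not reproduced here, the sketch assumes the common form in which the CCO-side count is the number of CCO cycles that fit into the reference counting window, $\lfloor 2^{N_{clk}} f_{CCO} / f_{clk} \rfloor$; $N_{CCO}$ is used only to set the printed word width. The 25 MHz reference and the 128-count window come from the simulation setup reported later, while the CCO frequency of 2.6 MHz is a hypothetical value chosen for illustration.

```python
import math


def counting_window(f_clk: float = 25e6, n_clk: int = 7) -> float:
    """Time for the reference counter to reach its full count of 2**n_clk."""
    return (2 ** n_clk) / f_clk


def tdc_output(f_cco: float, f_clk: float = 25e6, n_clk: int = 7, n_cco: int = 7) -> str:
    """CCO-side count at the end of the window, as a binary string (assumed form)."""
    count = math.floor((2 ** n_clk) * f_cco / f_clk)
    return format(count, f"0{n_cco}b")


window = counting_window()   # duration of the counting window in seconds
code = tdc_output(2.6e6)     # hypothetical CCO frequency of 2.6 MHz
```

With a 25 MHz reference and 128 counts, the counting window lasts 5.12 µs, and a CCO running at the hypothetical 2.6 MHz completes 13 full cycles in that time, giving the code 0001101.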

SAR ADC Readout Circuit Layout and Post-Simulation Results
The SAR ADC proposed in this study was designed in SMIC 110 nm CMOS technology. With custom-designed capacitors, the active devices were arranged beneath the CDAC array, using metal layers M1 to M3 for routing. This subsection presents the simulation results obtained with a 1.2 V supply voltage.
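The headline figures of this subsection (70 MS/s, 553 µW, and an SNDR of 46.22 dB) can be cross-checked against the reported effective number of bits and figure of merit. The sketch assumes the standard relation ENOB = (SNDR − 1.76)/6.02 and the Walden definition FoM = P / (2^ENOB · f_s); the paper does not state which FoM definition it uses, but this one reproduces the reported value to within rounding.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, fs_hz: float, enob: float) -> float:
    """Walden figure of merit in joules per conversion step (assumed definition)."""
    return power_w / ((2 ** enob) * fs_hz)


enob = enob_from_sndr(46.22)                 # SNDR reported for the Nyquist-rate input
fom_j = walden_fom(553e-6, 70e6, enob)       # reported power and sampling rate
fom_fj = fom_j * 1e15                        # convert joules to femtojoules
```

The computed ENOB is about 7.39 bits and the computed FoM is about 47.3 fJ per conversion step, in line with the reported 7.38 bits and 47.26 fJ/conv.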
Figure 15a presents the performance metrics of the SAR ADC with a Nyquist-rate input. The results show an SNDR of 46.22 dB, an SFDR of 57.21 dB, and an ENOB of 7.38 bits at the 70 MS/s sampling rate. Figure 15b displays the variation in SNDR and SFDR across input frequencies. The SFDR remains above 57 dB and the SNDR stays above 46 dB, with both metrics dropping by less than 2 dB as the input frequency rises to the Nyquist rate. The power dissipation with a 1.2 V supply is 553 µW, distributed as follows: the DAC accounts for 7% of the total, the comparator for 31%, the S/H circuit for 2%, and the digital circuits for 60%, as illustrated in Figure 16. Consequently, the achieved FoM is 47.26 fJ/conv. In this design, active devices are placed beneath the custom-designed capacitors, with layers M1 to M3 used for routing. This arrangement results in a core area of 0.0025544 mm² (49.6 µm × 51.5 µm) in 110 nm CMOS technology, as depicted in Figure 17. A comparison with previous SAR ADCs is presented in Table 3. It shows that the present design achieves the smallest area while delivering a competitive FoM at a moderate operating speed. This design is therefore particularly suitable for CIM applications that demand an ADC that is both area-efficient and energy-efficient.

With the reference clock (CLK) frequency set to 25 MHz, the digital outputs after quantization are derived from Equation (4). Changing the number of high- and low-resistance states in each RRAM column of the CIM array yields digital outputs at the OUTPUT terminal of the digital logic module. When the RRAM cells in the 4 × 4 CIM array are all in the HRS, the OUTPUT is 0001101; when they are all in the LRS, the OUTPUT is 1111111. Statistical analysis of the output results, performed in Python, reveals 95 distinct outcomes within the range of 0001101 to 1111111; hence $N_D = 95$. These 95 outcomes are arranged in ascending order, as shown in Figure 18. The frequency discontinuities observed at both ends occur because a transistor feeding the CCO (specifically, MN1,1 in the current mirror of the front-end circuit) reaches its operating limits. At very low currents, the transistor enters the cut-off region, while at very high currents, it enters the linear region. This leads to a non-ideal current-frequency relationship.

The delay L is the interval between the OUTPUT terminal delivering an 8-bit result and the output of the next 8-bit result. With a reference CLK of 25 MHz, after 128 counts, the reference counter issues a STOP signal that halts the CCO counter; concurrently, the parallel-to-serial state machine starts the conversion. After 12 CLK1 cycles (with the CLK1 frequency equal to that of CLK), the OUTPUT is complete. For efficiency, as the OUTPUT begins, the CCO is reset, and the next counting cycle commences. The time from the OUTPUT delivering a binary result until the binary output of the next cycle, as shown in Figure 19, sets the circuit delay L to 5.2 µs.
When different numbers of HRS and LRS cells are configured in the CIM RRAM array, the mirrored current from the front-end circuit varies significantly, which considerably changes the power consumption of the front-end and CCO circuits. The average power consumption, $P_{average}$, is therefore obtained by summing the power consumption over all 625 scenarios and dividing by their number:

$$P_{average} = \frac{1}{625}\sum_{i=1}^{625} P_i \qquad (6)$$

where $P_i$ is the total power consumption in the $i$-th scenario. Under a supply voltage of 1.2 V, transient simulations yield an estimate of the power dissipation of the current-controlled oscillator-type readout circuit, as shown in Figure 20. P1 is the power consumption of the front-end circuit, P2 is that of the CCO circuit, and P3 is that of the digital logic circuit. The figure shows that as the HRS/LRS configuration of the CIM array changes, the current delivered to the CCO varies, and the power consumption fluctuates accordingly because the output frequency differs. Calculating the average power consumption as per Equation (6), the average total power dissipation of the circuit is 183.1 µW, of which the front-end circuit accounts for 94.7 µW, the CCO circuit for 86.8 µW, and the digital logic circuit for 1.6 µW. Their proportional contributions are illustrated in Figure 21.
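The averaging step can be sketched in Python as follows. The enumeration assumes that a scenario is defined by the number of LRS cells (0 to 4) in each of the four columns, which gives 5⁴ = 625 combinations and matches the scenario count in the text; that reading is an assumption. The per-scenario power values are placeholders set to the reported averages, not simulation data, and `average_power` is a hypothetical helper name.

```python
from itertools import product
from statistics import fmean

CELLS_PER_COLUMN = 4  # 4 x 4 RRAM array
NUM_COLUMNS = 4

# Assumption: one scenario = the LRS count (0..4) chosen for each column.
scenarios = list(product(range(CELLS_PER_COLUMN + 1), repeat=NUM_COLUMNS))


def average_power(p1, p2, p3):
    """Mean total power over all scenarios.

    p1, p2, p3 map a scenario tuple to the front-end, CCO, and digital
    logic power (in uW) for that scenario.
    """
    return fmean(p1[s] + p2[s] + p3[s] for s in scenarios)


# Placeholder data only: every scenario is given the reported average value,
# so the result simply reproduces the reported total.
p1 = {s: 94.7 for s in scenarios}  # front-end circuit
p2 = {s: 86.8 for s in scenarios}  # CCO circuit
p3 = {s: 1.6 for s in scenarios}   # digital logic circuit

p_avg = average_power(p1, p2, p3)
print(len(scenarios), round(p_avg, 1))
```

With real transient-simulation results, the three dictionaries would hold one value per scenario and the same function would return the average in Equation (6).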
The CCO-type readout circuit designed in this paper achieves a 95-level resolution, a delay of 5.2 µs, and a power dissipation of 183.1 µW at a supply voltage of 1.2 V in a TSMC 65 nm process.
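The 95-level figure is a count of distinct output codes between the all-HRS code 0001101 and the all-LRS code 1111111. A minimal Python sketch of that kind of count is given here; the function name and the sample list are hypothetical and do not reproduce the authors' script or data.

```python
def distinct_levels(codes, lowest="0001101", highest="1111111"):
    """Return the sorted distinct code values that fall inside [lowest, highest]."""
    lo, hi = int(lowest, 2), int(highest, 2)
    return sorted({int(c, 2) for c in codes if lo <= int(c, 2) <= hi})


# Number of codes the counter could produce between the two end points.
possible_codes = int("1111111", 2) - int("0001101", 2) + 1

# Hypothetical sample, not simulation data from the paper.
sample = ["0001101", "0101010", "0101010", "1111111"]
levels = distinct_levels(sample)

print(possible_codes, levels, len(levels))
```

Since 0001101 is decimal 13 and 1111111 is decimal 127, the range holds 115 possible codes, so the 95 observed levels occupy a subset of it.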
Table 4 compares the non-traditional readout circuit presented in this study with those reported in other research. This study weights different CIM columns differently, which gives a marked advantage in resolution over [7,22]. The circuit delay in this study, however, is larger than in [7,22], for two main reasons. First, conversion time and resolution are positively correlated. Taking [7] as an example, that paper designs a VCO-based readout circuit whose results apply only to a single column of an RRAM array, with a resolution of 32 levels. Our paper quantizes four columns of an RRAM array simultaneously (with column weights of 1, 2, 4, and 8, respectively) at a resolution of 95 levels. Second, we scale down the current using current mirrors, which lowers the CCO frequency and in turn lengthens the conversion time. As power consumption metrics are not provided in [7,22], and the number of cycles is not specified, a power comparison is omitted.

Conclusions
With the rapid evolution of AI technology and its expanding applications, increasingly sophisticated algorithms require superior computing power. The traditional von Neumann architecture is facing power limitations, causing substantial computational bottlenecks. To address this challenge, CIM architectures offer distinct advantages in processing CNN algorithms. However, the readout circuitry incurs significant power consumption and area overhead, often exceeding that of the compute array itself. This study investigates readout circuits within CIM using RRAM as the computational storage unit.
There are two prevalent designs: traditional ADC-type readout circuits and non-traditional types. Based on the former, a low-power, compact SAR ADC is designed as the core of the readout circuit for digital signal extraction. The prototype ADC occupies an active area of only 0.00255 mm² (49.6 µm × 51.5 µm) and achieves an SNDR of 46.22 dB and an SFDR of 57.21 dB with a Nyquist-rate input at a sampling rate of 70 MS/s. The power consumption is 553 µW, resulting in an FoM of 47.26 fJ/Conv at a 1.2 V supply.
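The reported ADC figures can be cross-checked against each other. The sketch below assumes the standard ENOB relation, ENOB = (SNDR − 1.76)/6.02, and the commonly used Walden figure of merit, P / (2^ENOB · f_s); the text does not name the FoM definition, but this one reproduces the reported value to within rounding.

```python
SNDR_DB = 46.22   # reported SNDR at Nyquist input
FS_HZ = 70e6      # sampling rate
POWER_W = 553e-6  # power at 1.2 V supply

# Effective number of bits from SNDR (standard sine-wave relation).
enob = (SNDR_DB - 1.76) / 6.02

# Assumption: Walden figure of merit, expressed in fJ per conversion step.
fom_fj = POWER_W / (2 ** enob * FS_HZ) * 1e15

# Core area from the layout dimensions, converted from um to mm.
area_mm2 = (49.6e-3) * (51.5e-3)

print(f"ENOB = {enob:.2f} bits")
print(f"FoM  = {fom_fj:.2f} fJ/conv-step")
print(f"Area = {area_mm2:.5f} mm^2")
```

The computed values agree with the reported ENOB, FoM, and area to within rounding of the inputs.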
Nevertheless, the complex conversion process in traditional ADC types incurs significant power and area costs, which can surpass those of the compute array. To overcome this drawback, the paper designs a CCO-type readout circuit. This circuit achieves a 95-level resolution, a delay of 5.2 µs, and a power dissipation of 183.1 µW at a supply voltage of 1.2 V in a TSMC 65 nm process.
Author Contributions: X.X., under the supervisor's guidance, designed two types of CIM readout circuits in both the schematic and layout phases and wrote the paper. A.W. proposed the topic, provided guidance and ideas for the design, and revised the paper. Y.S. assisted with the CLK generation and pad modules for the first readout circuit, as well as the CCO module for the second readout circuit. All authors have read and agreed to the published version of the manuscript.

Figure 1. Block and timing diagrams of the proposed SAR ADC.
The SFDR improved by 10.1 dB from 76.78 dB to 86.88 dB, and the SNDR improved by 4.56 dB from 75.13 dB to 79.69 dB at 70 MS/s.

Figure 6. Comparison of $V_{out}$ waveforms from two comparators.

Figure 11. (a) The front-end circuit and (b) change of point A1 with the number of LRS ($V_{ref}$ = 0.8 V).

Figure 12. (a) The operational amplifier structure and (b) its amplitude-frequency response and phase response.

Figure 14. Monte Carlo simulation of the CCO.

Table 1. MOS transistor size configuration of the current mirror in the front-end circuit.

Table 2. MOS transistor size configuration of the current mirror in the front-end circuit.

Table 3. Comparison with previous works I.

Table 4. Comparison with previous works II.