A Compact 4T+2T SRAM-Based Digital Compute-in-Memory Bitcell with Reduced Transistor Count for Energy-Efficient Bitwise MAC Operations in 45 nm CMOS

Hariprasad, Shamanth; Balasubramanian, Srinivas; Patel, Adnan A.; Choi, Kyuwon Ken

doi:10.3390/electronics15122630



DA-Lab, Department of Electrical and Computer Engineering, Illinois Institute of Technology, 3301 South Dearborn Street, Chicago, IL 60616, USA

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Electronics 2026, 15(12), 2630; https://doi.org/10.3390/electronics15122630 (registering DOI)

Submission received: 6 May 2026 / Revised: 2 June 2026 / Accepted: 11 June 2026 / Published: 14 June 2026

(This article belongs to the Special Issue New Challenges in High-Performance Computing and Computer Architecture)


Abstract

The increasing computational demands of deep neural network inference drive the need for energy-efficient hardware accelerators that minimize data movement between memory and processing units. Compute-in-memory (CIM) architectures address this bottleneck by embedding computation directly within memory arrays, reducing the overhead of repeated weight transfers in conventional von Neumann systems. Conventional 6T SRAM-based digital CIM bitcells incur significant transistor overhead as arrays scale, motivating exploration of reduced-transistor bitcell alternatives. We propose a compact 4T+2T SRAM-based digital CIM bitcell implemented in 45 nm CMOS, combining a 4T SRAM storage cell with a 2T multiplier for bitwise multiply-and-accumulate (MAC) operations. The proposed design reduces transistor count from 8 to 6 compared to the 6T+2T reference, lowering parasitic capacitance and hardware overhead without compromising memory or computation functionality. Transient simulations confirm correct write, read, and CIM operations. The bitcell achieves a read delay of 26.91 ps, read power of 1.351 nW, and read energy of 0.005403 fJ—reductions of 98.7%, 86.5%, and 73.1% over the 6T+2T reference, respectively. For CIM operation, bitwise multiplication power decreases from 1.772 µW to 0.8014 µW and energy from 10.63 fJ to 4.808 fJ, representing a 54.8% reduction in both metrics, with only a marginal CIM delay increase of 3.13 ps. Monte Carlo analysis across 100 samples confirms robust write behavior under process variation, with write delay ranging from 55.02 to 69.59 ps and write energy from 0.05870 to 0.06557 fJ. Static noise margin analysis yields an SNM of 83.7 mV under nominal conditions, confirming stable data retention. These results demonstrate that the proposed 4T+2T bitcell offers strong transistor efficiency, energy savings, and computational correctness, making it a promising candidate for area-efficient digital CIM architectures targeting edge AI inference.

Keywords:

compute-in-memory (CIM); SRAM; 4T SRAM; digital CIM; bitwise multiplication; multiply-and-accumulate (MAC); 45 nm CMOS

1. Introduction

The rapid growth of Artificial Intelligence (AI) and Machine Learning (ML) applications has significantly increased the demand for energy-efficient hardware accelerators capable of performing large-scale multiply-and-accumulate (MAC) operations [1,2,3]. Deep neural network workloads require repeated movement of input activations, weights, and partial sums between memory and processing units, resulting in high latency and energy consumption in conventional von Neumann architectures. As a result, data movement has become a major bottleneck in modern AI hardware systems.

Compute-in-memory (CIM) has emerged as a promising approach to reduce this memory-access overhead by allowing computation inside or near the memory array [4,5,6]. Among different memory technologies, Static Random Access Memory (SRAM)-based CIM architectures are widely explored due to their high-speed operation, CMOS compatibility, and reliable memory behavior. SRAM-CIM designs can generally be classified into analog and digital approaches. Analog CIM architectures perform computation via bitline charge-sharing or voltage-domain accumulation, but they suffer from limited signal margin, nonlinearity, sensitivity to process variation, and reduced precision [7]. These limitations make high-precision MAC operation challenging, especially for low-voltage and edge-AI applications.

Digital CIM architectures address many of the accuracy limitations of analog CIM by using logic-based multiplication and digital accumulation [8]. Since the accumulated output bit-width can be extended to the required precision, digital CIM enables more reliable, loss-free MAC computation. However, this comes at the cost of additional area and power overhead from modified SRAM bitcells, adder trees, shifters, and peripheral logic. Therefore, reducing the bitcell-level overhead while preserving correct memory and computation functionality remains an important design challenge.

Recent work by Tyagi and Mittal proposed an all-digital 6T SRAM-based CIM macro that features a compact 2T multiplier for bitwise multiplication [9]. Their design improves over earlier NOR-gate-based multiplier approaches by reducing multiplier transistor count and eliminating the need for input inversion. The 6T+2T architecture supports reconfigurable input and weight precision, digital MAC computation, concurrent MAC and weight update operations, and wide supply-voltage operation. However, the storage portion of the bitcell still relies on the conventional 6T SRAM structure, which contributes to the total transistor count and area overhead when scaled to large memory arrays.

In parallel, reduced-transistor SRAM cells such as 4T SRAM have been explored for near-threshold and low-power memory operation in scaled CMOS technologies [10]. A lower transistor count can improve area density and reduce parasitic capacitance, thereby reducing switching energy. Motivated by this, the present work investigates a compact 4T+2T SRAM-based digital CIM bitcell implemented in 45 nm CMOS technology. The proposed design replaces the conventional 6T storage cell with a 4T SRAM cell while retaining the compact 2T multiplier path for bitwise computation. The objective is to evaluate whether the transistor count and hardware overhead can be reduced while maintaining correct write, read, and CIM functionality.

This work focuses on bitcell-level and small array-level validation of the proposed 4T+2T architecture. The design is evaluated through transient simulation, delay measurement, power and energy analysis, Monte Carlo variation study, CIM waveform validation, and noise/SNR estimation. The results are compared with the reference 6T+2T structure to analyze the potential benefits and trade-offs of the proposed reduced-transistor CIM bitcell. The remainder of this paper is organized as follows. Section 2 presents the background, research gap, and proposed implementation. Section 3 describes the proposed architecture and methodology. Section 4 discusses simulation results and performance analysis. Finally, Section 5 concludes the paper and outlines future work.

Prior work has explored a wide range of CIM designs, including analog bitline-based approaches [11,12,13], all-digital CIM macros [14,15], compute-in-memory circuits for deep learning [16,17,18], classifier implementations in standard SRAM arrays [19,20], ReRAM-based CIM designs [21], and high-precision MAC-oriented SRAM architectures [22,23,24,25]. Beyond SRAM, alternative non-volatile memory technologies have also been actively explored as computational substrates for in-memory computing. Resistive RAM (ReRAM) has attracted considerable attention due to its crossbar array structure, which enables highly parallel analog MAC operations through Ohm’s law and Kirchhoff’s current law, offering potentially superior area density and energy efficiency compared to SRAM-based approaches. Chen et al. provide a comprehensive review of ReRAM-based processing-in-memory architectures for neural network acceleration, discussing various design schemes and their associated challenges, including device non-linearity, soft and hard faults, and limited precision [26]. Phase-change memory (PCM) has similarly been explored as a CIM substrate, leveraging multi-level cell programming to store multi-bit weights in a compact footprint. More recently, embedded resistive RAM (eRRAM) implementations in advanced CMOS nodes have demonstrated logic-compatible integration without extra masks, as reported by Huang et al. in a 2T bipolar eRRAM macro fabricated in a 28 nm HKMG process [27]. While these non-volatile CIM approaches offer compelling advantages in standby power and density, they face challenges in write endurance, device variability, and CMOS process compatibility. These challenges make SRAM-based digital CIM a preferred choice for applications requiring high reliability, full CMOS compatibility, and deterministic digital computation, which motivates the present work.

2. Background

SRAM-based CIM architectures aim to overcome the data-transfer bottleneck by allowing memory arrays to participate directly in computation. In machine learning workloads, MAC operations dominate the overall computation, and repeated access to stored weights creates significant energy overhead in conventional memory-processing systems. CIM reduces this overhead by reusing memory cells not only for storage but also for computation.

Analog CIM approaches commonly perform multiplication or accumulation through bitline voltage development, charge sharing, or current-domain summation. While these techniques can offer high energy efficiency, they are limited by signal margin, nonlinearity, device variation, and the need for ADC-based readout. These limitations become more severe as the required computation precision increases. Digital CIM architectures improve computational accuracy by performing bitwise multiplication and accumulation using digital logic. Since the accumulation path is digital, the output precision can be extended to support the required MAC bit-width.

The reference 6T+2T CIM design integrates a conventional 6T SRAM bitcell with a 2T multiplier to perform bitwise multiplication between the stored weight and input activation [9], as illustrated in Figure 1. In this structure, the complement of the stored weight is already available inside the SRAM cell, and the 2T multiplier generates the $IN \times W$ output without requiring additional input inversion. This reduces the complexity compared with earlier NOR-gate-based multiplier structures and improves delay and energy efficiency.

Although the 6T+2T CIM architecture improves over prior digital CIM bitcells, the memory storage portion still uses a conventional 6T SRAM cell. Since CIM macros are composed of a large number of repeated bitcells, even a small reduction in transistor count at the bitcell level can yield measurable reductions in parasitic capacitance and aggregate switching energy. This creates a research gap: most prior digital CIM work focuses on improving the multiplier or peripheral MAC circuitry, while the storage cell itself remains relatively transistor-intensive. A reduced-transistor SRAM cell integrated with a compact multiplier can reduce switching overhead, provided that stable storage and correct computation are maintained.

The proposed work addresses this gap by replacing the 6T SRAM storage cell with a compact 4T SRAM cell and integrating it with the 2T multiplier concept, forming a 4T+2T digital CIM bitcell. The 4T SRAM cell stores the weight value, while the 2T multiplier receives the input activation and produces the bitwise product $IN \times W$. Compared with the 6T+2T reference design, the proposed 4T+2T structure reduces the total number of transistors from eight to six. This reduction lowers device-level switching capacitance, which is reflected in the measured power and energy results reported in this work.

The proposed implementation is evaluated in 45 nm CMOS technology. The bitcell is first validated for write and read functionality to confirm stable memory operation. The CIM mode is then verified by applying input activation pulses while maintaining the stored weight inside the SRAM cell. The output waveform is analyzed to confirm correct bitwise multiplication behavior. In addition, delay, average power, energy, Monte Carlo variation, and noise/SNR behavior are evaluated and compared with the 6T+2T reference structure. At this stage, the work focuses on bitcell-level feasibility and comparative analysis rather than complete macro-level integration.

3. Proposed Architecture & Methodology

This section presents a proposed SRAM-based digital compute-in-memory (CIM) architecture that uses a compact 4T SRAM storage cell integrated with a 2T bitwise multiplier. The main objective of the proposed design is to reduce transistor count while preserving memory functionality and enabling in-memory bitwise computation. In the proposed approach, the SRAM cell stores the weight bit, while the input activation is applied to the multiplier path to generate the bitwise multiplication output. The design was implemented and evaluated in 45 nm CMOS technology using HSPICE simulations. All simulations were performed at a supply voltage of $V_{DD} = 1$ V, temperature of $T = 25$ °C, and output load capacitance of $C_{L} = 5 \times 10^{-15}$ F (5 fF).

3.1. Proposed 4T+2T Bitcell

The proposed 4T+2T bitcell consists of two functional parts: a 4T SRAM storage section and a 2T multiplier section. The 4T SRAM section is responsible for storing the binary weight at the internal storage nodes Q and QB. The additional 2T multiplier path uses the stored data and the input activation $IN$ to produce the bitwise CIM output $IN \times W$.

Compared to a conventional 6T SRAM cell, the 4T SRAM structure uses fewer transistors, thereby improving memory density and reducing device-level switching overhead, as shown in Figure 2. This makes the proposed structure suitable for compact CIM arrays, where both storage density and computation efficiency are important. Since the proposed design retains the basic storage functionality while adding a separate multiplier output path, the read/write operation and the CIM operation can be evaluated independently.

In CIM mode, the stored SRAM data acts as the weight, and the external input activation $IN$ is applied to the multiplier path. The multiplier output node $IN \times W$ represents the bitwise product of the input activation and the stored weight. When the stored weight is logic high and the input activation switches high, the multiplier output also switches high, corresponding to $IN \times W = 1$. For other input-weight combinations, the output remains at logic low. Therefore, the proposed 4T+2T bitcell supports bitwise multiplication inside the memory structure without requiring a separate external multiplier for the evaluated bit-level operation.

For performance evaluation, conventional SRAM read and write operations were analyzed along the word-line and bit-line paths. In contrast, CIM operations were analyzed along the input-to-multiplier-output path. Specifically, CIM delay is measured from the input activation node $IN$ to the multiplier output node $IN \times W$, since this path directly represents the bitwise computation behavior of the proposed cell. The same nominal simulation conditions, $V_{DD} = 1$ V, $T = 25$ °C, and $C_{L} = 5 \times 10^{-15}$ F, are used for read, write, and CIM performance evaluation to ensure consistent comparison across the bitcell designs.

The truth table of the proposed multiplier operation is shown in Table 1.
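The multiplier behavior described above (output high only when both the stored weight and the input activation are high) can be summarized with a small behavioral model. This is a minimal logic-level sketch, not a circuit model; the function names `bitwise_product` and `truth_table` are illustrative and do not come from the original work.

```python
def bitwise_product(in_bit: int, w_bit: int) -> int:
    """Logic-level model of the multiplier output IN x W.

    The output is 1 only when both the input activation and the
    stored weight are 1; all other combinations give 0.
    """
    if in_bit not in (0, 1) or w_bit not in (0, 1):
        raise ValueError("in_bit and w_bit must each be 0 or 1")
    return in_bit & w_bit


def truth_table():
    """Return (W, IN, IN x W) rows for all four input combinations."""
    return [(w, i, bitwise_product(i, w)) for w in (0, 1) for i in (0, 1)]


if __name__ == "__main__":
    print("W  IN | INxW")
    for w, i, p in truth_table():
        print(f"{w}  {i}  |  {p}")
```

The model captures only the logical function; delay, power, and node disturbance are circuit-level effects that it does not represent.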

By integrating the bitwise multiplication path directly with the memory cell, the proposed 4T+2T bitcell reduces the need to transfer stored weight data to external computation units. The stored SRAM data is used locally during CIM operation, while the input activation propagates through the 2T multiplier path to generate the $IN \times W$ output. Furthermore, the reduced transistor count can lower parasitic and switching capacitance, thereby improving power and energy efficiency.

Figure 3 illustrates the CIM operation flow of the proposed 4T+2T bitcell. In CIM mode, the stored weight remains inside the SRAM cell, while the input activation is applied to the 2T multiplier path. The resulting $IN \times W$ signal represents the bitwise multiplication output used for compute-in-memory operation.

3.2. Overall Architecture and Methodology

The proposed CIM architecture is constructed using an array of 4T+2T bitcells arranged in rows and columns. Each bitcell stores a binary weight in the SRAM storage nodes and uses the additional 2T multiplier path to generate a bitwise product with the applied input activation. In the memory mode, the cell operates as a conventional SRAM bitcell for read and write access. In the CIM mode, the stored weight remains inside the memory cell, while the input activation is applied to the multiplier path to produce the bitwise output $IN \times W$.

During CIM operation, input activation bits are applied to the multiplier path of the selected bitcells. Each cell locally computes a bitwise product using the stored weight and the input activation. The generated bitwise products can then be processed by peripheral digital circuits, such as adder trees, barrel shifters, and accumulators, to complete a multibit MAC operation. In this evaluation, the cell-level CIM functionality is assessed by measuring the propagation delay, power, and energy from the input activation node $IN$ to the multiplier output node $IN \times W$.

For a multibit MAC operation, the multiplication between an input activation and stored weight can be performed using bit-level partial products. For a 4-bit input activation $IN[3:0]$ and a 4-bit weight $W[3:0]$, the MAC operation can be expressed as:

$$
\begin{aligned}
MAC_{j} = {} & \sum_{i=0}^{R-1} (IN_{3,i} \times W_{j,i}) \times 2^{3} + \sum_{i=0}^{R-1} (IN_{2,i} \times W_{j,i}) \times 2^{2} \\
& + \sum_{i=0}^{R-1} (IN_{1,i} \times W_{j,i}) \times 2^{1} + \sum_{i=0}^{R-1} (IN_{0,i} \times W_{j,i}) \times 2^{0}
\end{aligned}
\tag{1}
$$

where $IN_{3,i}$ to $IN_{0,i}$ represent the 4-bit input activation bits of the $i^{th}$ row, $W_{j,i}$ represents the stored weight bit, and $R$ is the number of rows participating in the CIM operation. Each term represents a bitwise multiplication between the input activation bit and the stored weight, followed by a shift according to the input bit significance. The shifted partial sums are then accumulated to generate the final MAC output.
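The shift-and-accumulate structure of Equation (1) can be illustrated with a short behavioral sketch. It assumes one stored weight bit per row for a given column $j$, as in the equation, and treats the adder tree and shifter as plain integer arithmetic; the function name `mac_bit_serial` and its parameters are hypothetical and are not taken from the original design.

```python
from typing import Sequence


def mac_bit_serial(inputs: Sequence[int], weight_bits: Sequence[int],
                   in_bits: int = 4) -> int:
    """Evaluate Equation (1) for one column.

    inputs      -- one unsigned `in_bits`-wide activation per row
    weight_bits -- one stored weight bit (0 or 1) per row
    """
    if len(inputs) != len(weight_bits):
        raise ValueError("inputs and weight_bits must have the same length")
    if any(not 0 <= x < (1 << in_bits) for x in inputs):
        raise ValueError("each input must fit in in_bits bits")
    if any(w not in (0, 1) for w in weight_bits):
        raise ValueError("each weight bit must be 0 or 1")

    total = 0
    for k in range(in_bits):
        # Bitwise products IN_{k,i} x W_{j,i}, summed over the R rows
        # (the role played by the adder tree in hardware).
        partial = sum(((x >> k) & 1) & w for x, w in zip(inputs, weight_bits))
        # Shift by the input bit significance and accumulate.
        total += partial << k
    return total


if __name__ == "__main__":
    activations = [5, 3, 15, 0]   # example 4-bit inputs, one per row
    weights = [1, 0, 1, 1]        # example stored weight bits
    print(mac_bit_serial(activations, weights))
```

For single-bit weights, the result equals the sum of the activations in rows whose stored weight is 1, which gives a simple way to cross-check the bit-serial evaluation.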

The architecture supports two modes of operation:

Memory Mode: Standard SRAM read and write operations are performed through the word-line and bit-line paths to store or update weight values.
CIM Mode: The stored weight is retained inside the SRAM cell, and the input activation is applied to the multiplier path to generate the bitwise multiplication output $IN \times W$.

Read and write simulations verify the storage functionality of the proposed cell, while CIM simulations verify the bitwise multiplication path. Since the computation is performed digitally, the proposed methodology avoids the precision loss and non-linearity issues commonly associated with analog CIM approaches. The use of a compact 4T storage cell with a lightweight 2T multiplier provides a hardware-efficient approach for low-power SRAM-based CIM design in 45 nm CMOS technology.

4. Results

4.1. Functional Validation of the 4T+2T Bitcell

The proposed 4T+2T SRAM-based compute-in-memory (CIM) bitcell was validated using transient simulations in 45 nm CMOS technology. The simulations were performed to verify both the conventional memory operation and the CIM bitwise multiplication functionality of the proposed structure. We analyzed the bitcell under read, write, and CIM operating conditions to confirm that the reduced-transistor storage structure and the integrated 2T multiplier path function correctly.

The overall performance was evaluated in terms of delay, power, and energy for read, write, and CIM operations. The conventional 6T SRAM cell served as the baseline memory structure, while the 6T+2T design served as the reference CIM bitcell. The proposed 4T+2T design was compared against both to analyze the trade-off between memory functionality and compute-in-memory capability.

During memory operation, the internal storage nodes Q and QB were monitored to verify stable data storage. The write operation confirms that the proposed bitcell can switch the internal nodes according to the applied input condition. In contrast, the read operation verifies that the stored data can be accessed without causing a destructive change in the stored state. Since reduced-transistor SRAM cells are more sensitive to node disturbances, the stability of Q and QB was carefully monitored during the transient analysis.

For CIM operation, the stored data acts as the weight, while the input activation $IN$ is applied to the multiplier path. The multiplier output $IN \times W$ was observed to follow the expected bitwise multiplication behavior. When the stored weight is logic high and the input activation switches high, the multiplier output also switches high, confirming the $IN \times W = 1$ condition. For other input-weight combinations, the output remains at logic low. This verifies that the proposed 4T+2T bitcell supports in-memory bitwise multiplication.

4.2. Read Operation Analysis

The read performance of both the 6T+2T and proposed 4T+2T bitcells was evaluated using transient simulations. The read delay was measured from the applied wordline signal to the response at the output node.

The measured read delay was 2023 ps for the 6T+2T design and 26.91 ps for the proposed 4T+2T design. The reduced delay in the 4T+2T bitcell indicates faster read response due to lower parasitic capacitance and simplified transistor structure.

In addition, the proposed design demonstrates significantly lower power consumption during read operations. The average read power is reduced from 10.03 nW in the 6T+2T design to 1.351 nW in the 4T+2T design.

Similarly, the read energy is reduced from 0.02006 fJ to 0.005403 fJ, highlighting the energy-efficient nature of the proposed architecture.
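The percentage reductions quoted for the read operation follow directly from the values reported above. The short sketch below reproduces that arithmetic; the helper name `percent_reduction` and the dictionary layout are illustrative only.

```python
def percent_reduction(reference: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# (6T+2T reference, proposed 4T+2T) values as reported in this section.
read_metrics = {
    "delay_ps": (2023.0, 26.91),
    "power_nW": (10.03, 1.351),
    "energy_fJ": (0.02006, 0.005403),
}

read_reductions = {
    name: round(percent_reduction(ref, new), 1)
    for name, (ref, new) in read_metrics.items()
}

if __name__ == "__main__":
    for name, value in read_reductions.items():
        print(f"{name}: {value}% reduction")
```

The computed values round to 98.7%, 86.5%, and 73.1%, matching the reductions stated in the abstract.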

4.3. Write Operation Performance

The write operation of the proposed 4T+2T bitcell was evaluated by measuring the rise and fall transition delays of the storage nodes Q and QB, along with average write power and write energy.

The proposed 4T+2T bitcell achieved a write delay of 61.17 ps, calculated as the average of the rise and fall transitions, with a write power of 30.60 nW and a write energy of 0.06120 fJ. Compared with the 6T+2T reference, the proposed design shows a higher write delay (61.17 ps compared with 19.01 ps) but substantially lower write power (30.60 nW compared with 879.9 nW) and lower write energy (0.06120 fJ compared with 1.760 fJ). The faster write speed of the 6T+2T design is attributed to its stronger transistor-assisted write path. However, the proposed 4T+2T design provides a more energy-efficient write operation, which is advantageous for workloads with infrequent weight updates.

4.4. CIM Operation and Bitwise Multiplication Performance

The CIM functionality of the proposed 4T+2T bitcell was verified by measuring the propagation from the input activation node $IN$ to the multiplier output node $IN \times W$. This measurement directly represents the bitwise multiplication path of the CIM cell. In this mode, the stored SRAM data remains at the internal nodes Q/QB, while the input activation is applied to the 2T multiplier path.

The waveform results confirmed that the multiplier output switches according to the expected bitwise multiplication operation. For the selected case where the stored weight is logic high, $IN \times W$ follows the applied input activation. The storage node Q remains stable during this operation, indicating that the CIM evaluation does not disturb the stored weight. A small variation was observed at the complementary node QB; however, it remained below the logic threshold and did not affect computational correctness.

The CIM performance of the proposed 4T+2T design was compared with the reference 6T+2T design. It was observed that the proposed design improves CIM energy efficiency while incurring only a minor delay penalty.

The compute-in-memory (CIM) capability of the proposed 4T+2T architecture was validated through transient simulation by evaluating the bitwise multiplication path. In CIM mode, the stored SRAM data acts as the weight, while the input activation $IN$ is applied to the 2T multiplier path. The multiplier output $IN \times W$ represents the bitwise product between the input activation and the stored weight.

Unlike conventional SRAM read operations, the CIM delay was not measured along the word-line-to-bit-line path. Instead, the delay was measured from the input activation node $IN$ to the multiplier output node $IN \times W$, since this path directly represents the computation performed inside the bitcell. For the evaluated case, the stored weight was initialized as logic high, and the input activation was switched from logic low to logic high. The resulting transition at $IN \times W$ confirms the bitwise multiplication condition $IN \times W = 1$.

The CIM performance of the proposed 4T+2T design was compared with the reference 6T+2T CIM bitcell using the same input transition, supply voltage, and measurement window. The results show that the proposed 4T+2T design has a slightly higher CIM delay of 50.89 ps compared to 47.76 ps for the 6T+2T design. However, the proposed design significantly reduces CIM power and energy. The CIM power decreases from 1.772 µW to 0.8014 µW, while the CIM energy decreases from 10.63 fJ to 4.808 fJ.
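As a quick consistency check on the CIM figures above, the sketch below derives the relative delay penalty and the measurement window implied by each design's power and energy. It assumes that the reported energy is the average power multiplied by the measurement window, which is an assumption made here for illustration and is not stated in the text; the function names are hypothetical.

```python
def implied_window_ns(energy_fj: float, power_uw: float) -> float:
    """Window T = E / P_avg. With E in fJ and P in uW, the result is in ns
    (1e-15 J / 1e-6 W = 1e-9 s)."""
    return energy_fj / power_uw


def percent_change(reference: float, proposed: float) -> float:
    """Signed relative change of `proposed` versus `reference`, in percent."""
    return 100.0 * (proposed - reference) / reference


# Values reported in this section: (6T+2T reference, proposed 4T+2T).
cim_delay_ps = (47.76, 50.89)
cim_power_uw = (1.772, 0.8014)
cim_energy_fj = (10.63, 4.808)

window_ref_ns = implied_window_ns(cim_energy_fj[0], cim_power_uw[0])
window_new_ns = implied_window_ns(cim_energy_fj[1], cim_power_uw[1])
delay_penalty_pct = round(percent_change(*cim_delay_ps), 2)
power_change_pct = round(percent_change(*cim_power_uw), 2)

if __name__ == "__main__":
    print(f"implied window: {window_ref_ns:.2f} ns vs {window_new_ns:.2f} ns")
    print(f"delay change: {delay_penalty_pct}%  power change: {power_change_pct}%")
```

Under this assumption both designs give a window of about 6 ns, which agrees with the statement that the same measurement window was used, and it explains why the power and energy reductions are identical.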

4.5. Monte Carlo Analysis

We performed Monte Carlo analysis on the proposed 4T+2T SRAM bitcell to evaluate its robustness under process variations. A total of 100 simulation samples were considered, with Gaussian variations applied to the transistor dimensions while maintaining a load capacitance of 5 fF. The analysis was carried out under the write-‘0’ condition to observe the stability of the internal storage nodes, propagation delay, power, and energy across different process samples.
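The general shape of such a Monte Carlo sweep can be sketched in a few lines. The example below is not the HSPICE flow used in this work: it replaces the circuit simulation with a toy first-order model in which write delay is inversely proportional to a normalized transistor width, and the 3% standard deviation, the seed, and the function name `monte_carlo_write_delay` are all assumptions chosen only for illustration. Its output should not be compared with the reported results.

```python
import random


def monte_carlo_write_delay(n_samples: int = 100,
                            sigma: float = 0.03,
                            nominal_delay_ps: float = 61.17,
                            seed: int = 0):
    """Draw Gaussian width variations and return (min, max, mean) delay in ps.

    Toy model (assumption): delay scales as 1 / width_scale, where
    width_scale ~ N(1, sigma) is a normalized transistor width.
    """
    rng = random.Random(seed)  # fixed seed for repeatable sampling
    delays = []
    for _ in range(n_samples):
        width_scale = rng.gauss(1.0, sigma)
        delays.append(nominal_delay_ps / width_scale)
    return min(delays), max(delays), sum(delays) / len(delays)


if __name__ == "__main__":
    lo, hi, mean = monte_carlo_write_delay()
    print(f"delay range: {lo:.2f} to {hi:.2f} ps, mean {mean:.2f} ps")
```

In an actual flow, the body of the loop would be a transient simulation of the bitcell with the sampled device dimensions, and delay, power, and energy would be extracted from each run.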

The simulation results show that the write delay remains stable across all Monte Carlo samples. The rise delay varies approximately between 56 ps and 68 ps, while the fall delay varies between 58 ps and 64 ps. This narrow delay variation indicates that the write operation of the proposed 4T+2T bitcell has limited sensitivity to process-induced changes in transistor dimensions.

The average power consumption also shows only a small variation, remaining approximately within the range of 29 nW to 33 nW. Similarly, the write energy remains nearly constant across the samples, varying approximately from 0.059 fJ to 0.066 fJ. This confirms that the proposed design maintains consistent energy behavior under process variations.

The internal storage nodes Q and QB remain stable for all Monte Carlo samples. During the write-‘0’ condition, node Q settles close to ground, while node QB remains near the supply voltage. This confirms that the proposed bitcell successfully stores the intended logic state without bit-flip failures across the simulated variation range.

The multiplier output node $IN \times W$ also maintains valid logic behavior during the simulations. The output reaches a valid logic-high level when expected, while a small transient undershoot may appear due to capacitive coupling and switching effects. Since this undershoot remains small and does not cross the logic threshold, it does not affect the circuit’s functional correctness.

The Monte Carlo waveforms in Figure 4 show the process-variation behavior of the reference 6T+2T and proposed 4T+2T bitcells across 100 samples. The $IN \times W$ maximum voltage remains close to 1 V for both designs, confirming that the multiplier output maintains a valid logic-high level under transistor-dimension variations. The proposed 4T+2T bitcell also shows lower average power under variation, remaining within approximately 29.35–32.78 nW, compared with 37.93–41.30 nW for the 6T+2T reference. Similarly, the write energy of the proposed design remains lower and tightly distributed, varying from 0.05870 to 0.06557 fJ, while that of the 6T+2T reference varies from 0.07586 to 0.08260 fJ. These results confirm that the proposed 4T+2T bitcell maintains stable output behavior while reducing power and energy under process variations.

Table 2 summarizes the nominal CIM performance and Monte Carlo write variation results for the 6T+2T and proposed 4T+2T bitcells. The proposed 4T+2T design achieves a 54.77% reduction in both CIM power and energy with only a 6.55% increase in CIM delay. Under process variation, the 4T+2T bitcell consistently shows lower write delay, average power, and write energy across all 100 Monte Carlo samples, while the $IN \times W$ output remains at a valid logic-high level for both designs.

Overall, the Monte Carlo analysis confirms that the proposed 4T+2T bitcell maintains stable write operation under process variations. The limited variation in delay, power, energy, and storage-node voltages demonstrates the robustness of the proposed design for low-power SRAM-based compute-in-memory applications.

5. Discussion

The proposed 4T+2T SRAM-based CIM bitcell has been validated at the circuit level for read, write, and bitwise CIM multiplication operations. The current work demonstrates that the proposed design can reduce read power, read energy, and CIM energy while maintaining correct storage and computation behavior. The reduction in transistor count from eight to six lowers device-level switching capacitance, which is consistent with the measured improvements in power and energy consumption reported in Section 4.

The transient write waveform shown in Figure 5 confirmed the correct switching behavior of the internal storage nodes Q and QB. Across multiple write cycles over a 40 ns simulation window, Q and QB switch between fully complementary logic states, with Q transitioning from logic high to logic low while QB transitions from logic low to logic high, and vice versa. The clean rail-to-rail transitions observed at both storage nodes, without intermediate metastability, indicate that the proposed 4T structure can reliably accept new data during word-line activation. The concurrently applied $IN$ and $W$ signals drive the write and CIM operation as expected. The multiplier output v(inxw) rises to logic high only when the input and stored weight satisfy the multiplication condition $IN \times W = 1$, and returns to logic low otherwise. The exponential discharge observed at v(inxw) during falling transitions is consistent with the RC discharge behavior of the 2T multiplier path and does not indicate a functional error.

Table 3 presents a consolidated comparison across bitcell designs. The proposed 4T+2T bitcell achieves the lowest read delay of 26.91 ps, read power of 1.351 nW, and read energy of 0.005403 fJ among all compared designs. This read performance advantage stems directly from the reduced transistor count. By eliminating two transistors from the storage cell, the proposed design lowers the aggregate bitline capacitance seen during read access, which shortens the discharge time and reduces dynamic switching energy. This behavior is consistent with the well-established relationship between bitline parasitic capacitance and read performance in SRAM design. In the broader CIM design space, Tyagi and Mittal’s 6T+2T design [9], evaluated in 65 nm, demonstrates that a compact 2T multiplier improves over NOR-gate-based multiplier approaches. The present work extends this direction by also compressing the storage cell, trading a higher write delay of 61.17 ps, compared with 19.01 ps for the 6T+2T reference, for substantially lower write power of 30.60 nW compared with 879.9 nW, and lower write energy of 0.06120 fJ compared with 1.760 fJ. The 65 nm 6T+4T and 8T designs in Table 3 show markedly higher read energy values of 69.2 fJ and 58.8 fJ, and write energy values of 155.8 fJ and 133.1 fJ, respectively, reflecting both process-node differences and the overhead of more complex multiplier structures. Unlike the conventional 6T SRAM cell, both the 6T+2T and proposed 4T+2T designs support bitwise multiplication, confirming that the transistor reduction in the proposed design does not sacrifice CIM capability.
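The relative changes behind this comparison can be reproduced directly from the 45 nm columns of Table 3. The Python sketch below is only an arithmetic check, with the (6T+2T, 4T+2T) value pairs copied from the table; the dictionary and function names are hypothetical and are not part of the original simulation flow.

```python
# Arithmetic check of the 6T+2T vs. 4T+2T comparison, using Table 3 values.
# Names are illustrative; each tuple is (6T+2T reference, proposed 4T+2T).

TABLE3_45NM = {
    "read_delay_ps": (2023.0, 26.91),
    "read_power_nW": (10.03, 1.351),
    "read_energy_fJ": (0.02006, 0.005403),
    "write_delay_ps": (19.01, 61.17),
    "write_power_nW": (879.9, 30.60),
    "write_energy_fJ": (1.760, 0.06120),
}


def percent_change(reference: float, proposed: float) -> float:
    """Signed change relative to the reference; negative means a reduction."""
    return (proposed - reference) / reference * 100.0


changes = {
    metric: percent_change(ref, new)
    for metric, (ref, new) in TABLE3_45NM.items()
}

for metric, change in changes.items():
    print(f"{metric:>16}: {change:+.2f} %")
```

The read metrics come out as reductions that agree with the percentages quoted in the abstract, while the write delay is the one metric that increases relative to the 6T+2T reference.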

Figure 6 illustrates the CIM waveform of the proposed 4T+2T bitcell under sustained switching operation. The input signal $v(in)$ is applied as a periodic pulse train, and the multiplier output $v(inxw)$ correctly follows the input when the stored weight supports a logic-high product. This confirms the intended bitwise multiplication behavior of the proposed CIM path. However, two important waveform characteristics are observed. First, the storage node $v(q44)$, plotted with an expanded scale and a +1 V offset, shows small positive and negative transient spikes synchronized with the switching events of IN. These spikes remain at approximately 0.1 mV and decay rapidly, indicating capacitive coupling from the multiplier path to the storage node through the shared transistor. Although these transients do not disturb the stored logic state, they indicate a coupling mechanism that should be further characterized in a full array environment, where bitline and wordline parasitics are present.

Second, the complementary node $v(qb44)$ shows a slow staircase-like drift that accumulates across switching cycles, reaching approximately 80 mV by the end of the 20 ns simulation window. Based on the periodic pulse train visible in Figure 6, with a period of approximately 2 ns, this 80 mV accumulation occurs over roughly 10 switching cycles, which corresponds to a drift of approximately 8 mV per cycle at the QB node. This drift is attributed to charge accumulation at the floating QB node in the asymmetric 4T topology, which lacks the strong restoring path present in a conventional 6T SRAM cell.

Since the observed 80 mV drift remains below the logic threshold, no bit flip occurs during the simulated interval. However, this effect may become more significant over longer operating periods or at elevated temperatures. Two practical mitigation techniques should be investigated in future work. First, a periodic write-back scheme can be used to refresh the stored weight at fixed intervals, analogous to DRAM refresh, thereby resetting the QB drift without requiring structural changes to the bitcell. Second, a weak keeper transistor can be inserted at the QB node to provide a continuous restoring current, at the cost of a small increase in static power and transistor count.
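To illustrate how a write-back interval might be bounded from the drift figures above, the following Python sketch repeats the per-cycle arithmetic and extrapolates it. It rests on two loudly stated assumptions that are not established by the simulations: the drift is taken to remain linear beyond the 20 ns window, and the 40 mV drift budget is a hypothetical design choice. All names are illustrative.

```python
import math

# Values reported for Figure 6.
DRIFT_TOTAL_MV = 80.0   # QB drift at the end of the simulation window
WINDOW_NS = 20.0        # simulation window
PERIOD_NS = 2.0         # approximate period of the IN pulse train

# Per-cycle drift, as derived in the text: 80 mV over about 10 cycles.
cycles_in_window = WINDOW_NS / PERIOD_NS
drift_per_cycle_mv = DRIFT_TOTAL_MV / cycles_in_window


def cycles_until_budget(budget_mv: float) -> int:
    """Whole switching cycles before the drift exceeds the budget.

    ASSUMPTION: linear drift, extrapolated from the 20 ns window.
    """
    return math.floor(budget_mv / drift_per_cycle_mv)


def refresh_interval_ns(budget_mv: float) -> float:
    """Longest write-back interval that keeps drift within the budget."""
    return cycles_until_budget(budget_mv) * PERIOD_NS


# HYPOTHETICAL budget: allow at most 40 mV of drift between refreshes.
budget_mv = 40.0
print(f"drift per cycle: {drift_per_cycle_mv:.1f} mV")
print(f"refresh every {cycles_until_budget(budget_mv)} cycles "
      f"({refresh_interval_ns(budget_mv):.1f} ns) for a {budget_mv:.0f} mV budget")
```

A real refresh interval would need to be derived from longer transient simulations across process and temperature corners, since the drift may saturate or accelerate outside the simulated window.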

Figure 7 presents the static noise margin (SNM) analysis of the proposed 4T SRAM storage cell using the voltage transfer characteristic (VTC) and butterfly curve methods under FreePDK45 technology at $V_{DD} = 1$ V and $T = 25$ °C. The VTC shows a clear inverting transition around the switching point $V_{m} = 0.510$ V. The extracted unity-gain points are $UGP_{low} = 0.429$ V and $UGP_{high} = 0.596$ V, resulting in an $SNM_{UGP}$ value of 83.7 mV. Unlike a conventional 6T SRAM cell, the 4T topology does not produce two symmetric butterfly lobes; instead, the butterfly curve shows a single crossing point, indicating an asymmetric hold condition. Therefore, the butterfly-based SNM is approximately zero, and the stability of the proposed 4T cell is evaluated using the $SNM_{UGP}$ metric. The obtained 83.7 mV margin confirms that the proposed cell can maintain its stored state under nominal simulation conditions. However, since this margin is lower than that of a conventional 6T SRAM cell, process variation analysis is required to further evaluate the robustness of the proposed design.

Given the 83.7 mV static noise margin and the QB drift behavior characterized above, the proposed 4T+2T bitcell is best suited for moderate-scale CIM arrays, on the order of tens to low hundreds of rows, where weight refresh intervals can be bounded and the supply voltage remains near the nominal value of 1 V. Workloads that involve sparse or infrequent weight updates, such as inference-only edge-AI acceleration for keyword spotting or low-resolution image classification, are particularly well matched to the proposed design. These workloads minimize write-cycle stress on the asymmetric storage node while benefiting from the read and CIM energy advantages of the 4T+2T bitcell.

6. Conclusions

This work presented a 4T+2T SRAM bitcell architecture and evaluated its performance against the conventional 6T+2T design in both memory and compute-in-memory (CIM) operations. The proposed design demonstrated significant improvements in read delay, power consumption, and energy efficiency, primarily due to reduced transistor count and lower switching capacitance, at the cost of a longer nominal write delay (61.17 ps versus 19.01 ps). Read and write functionality at the storage nodes (Q/QB) confirmed correct operation, while Monte Carlo analysis verified robust write behavior under process variations. Although non-ideal effects were observed, including voltage undershoot, small coupling transients at the storage node, and a gradual QB drift during CIM operation, they did not affect functional correctness within the simulated intervals. Overall, the results indicate that the proposed 4T+2T bitcell offers a compelling trade-off between performance, energy efficiency, and CIM capability, making it a promising candidate for next-generation low-power memory and in-memory computing applications.

Author Contributions

Conceptualization, data curation, formal analysis, investigation, methodology, validation, S.H. and S.B.; writing—review and editing, A.A.P.; visualization, supervision, project administration, K.K.C.; funding acquisition, K.K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a Korea Planning & Evaluation Institute of Industrial Technology (KEIT) grant funded by the Korea government (MOTIE) (No. RS-2024-00432265, Development of an embedded AI controller to determine road surface conditions).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We thank our colleagues from KETI and KEIT, who provided insight and expertise, which greatly assisted the research and improved the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CIM	Compute-in-Memory
SRAM	Static Random Access Memory
MAC	Multiply-and-Accumulate
CMOS	Complementary Metal-Oxide-Semiconductor
AI	Artificial Intelligence
ML	Machine Learning
BL	Bit Line
BLB	Bit Line Bar (Complementary Bitline)
WL	Word Line
Q/QB	Storage Nodes (True/Complement)
IN	Input Activation
W	Weight
SNM	Static Noise Margin
4T	Four-Transistor Count-Based Cell Design
6T	Six-Transistor Count-Based Cell Design
2T	Two-Transistor Count-Based Cell Design
ps/ns	Time Units (picosecond/nanosecond)
nW/µW	Power Units (nanowatt/microwatt)
fJ/pJ	Energy Units (femtojoule/picojoule)

References

1. Park, S.; Hong, I.; Park, J.; Yoo, H.-J. An energy-efficient embedded deep neural network processor for high speed visual attention in mobile vision recognition SoC. IEEE J. Solid-State Circuits 2016, 51, 2380–2388.
2. Chen, Y.-H.; Krishna, T.; Emer, J.S.; Sze, V. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid-State Circuits 2016, 52, 127–138.
3. Jouppi, N.P.; Young, C.; Patil, N.; Patterson, D.; Agrawal, G.; Bajwa, R.; Bates, S.; Bhatia, S.; Boden, N.; Borchers, A.; et al. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), Toronto, ON, Canada, 24–28 June 2017; pp. 1–12.
4. Lin, Z.; Tong, Z.; Zhang, J.; Wang, F.; Xu, T.; Zhao, Y.; Wu, X.; Peng, C.; Lu, W.; Zhao, Q.; et al. A review on SRAM-based computing in-memory: Circuits, functions, and applications. J. Semicond. 2022, 43, 031401.
5. Kim, D.; Yu, C.; Xie, S.; Chen, Y.; Kim, J.-Y.; Kim, B.; Kulkarni, J.P.; Kim, T.T.-H. An overview of processing-in-memory circuits for artificial intelligence and machine learning. IEEE J. Emerg. Sel. Top. Circuits Syst. 2022, 12, 338–353.
6. Mittal, S.; Verma, G.; Kaushik, B.K.; Khanday, F.A. A survey of SRAM-based in-memory computing techniques and applications. J. Syst. Archit. 2021, 119, 102276.
7. Su, J.-W.; Chou, Y.-C.; Liu, R.; Liu, T.-W.; Chen, P.-J.; Chang, S.-J.; Hsieh, P.-H.; Lee, Y.-C.; Chang, M.-F. A 28nm 384kb 6T-SRAM computation-in-memory macro with 8b precision for AI edge chips. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 250–252.
8. Kim, H.; Chen, Q.; Yoo, T.; Kim, T.; Kim, B. A 1–16b precision reconfigurable digital in-memory computing macro featuring column-MAC architecture and bit-serial computation. In Proceedings of the 45th IEEE European Solid-State Circuits Conference (ESSCIRC), Cracow, Poland, 23–26 September 2019; pp. 345–348.
9. Tyagi, P.; Mittal, S. A 101 TOPS/W and 1.73 TOPS/mm² 6T SRAM-based digital compute-in-memory macro featuring a novel 2T multiplier. In Proceedings of the 2025 Design, Automation & Test in Europe Conference (DATE), Lyon, France, 24–28 March 2025; pp. 1–7.
10. Chen, Y.; Yu, Z.; Nan, H.; Choi, K. Ultralow power SRAM design in near threshold region using 45 nm CMOS technology. In Proceedings of the 2011 IEEE International Conference on Electro/Information Technology (EIT), Mankato, MN, USA, 15–17 May 2011; pp. 1–4.
11. Lee, K.; Kim, J.; Park, J. Low-cost 7T-SRAM compute-in-memory design based on bit-line charge-sharing based analog-to-digital conversion. In Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, CA, USA, 30 October–3 November 2022; pp. 1–8.
12. Yu, C.; Yoo, T.; Chai, K.T.C.; Kim, T.T.-H.; Kim, B. A 65-nm 8T SRAM compute-in-memory macro with column ADCs for processing neural networks. IEEE J. Solid-State Circuits 2022, 57, 3466–3476.
13. Lee, K.; Cheon, S.; Jo, J.; Choi, W.; Park, J. A charge-sharing based 8T SRAM in-memory computing for edge DNN acceleration. In Proceedings of the 58th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 5–9 December 2021; pp. 739–744.
14. Chih, Y.-D.; Lee, Y.-C.; Fujiwara, H.; Shih, Y.-C.; Lo, C.-J.; Lai, C.-H.; Chang, C.-W.; Liao, W.-S.; Wang, C.-C.; Chang, M.-F. An all-digital SRAM-based full-precision compute-in-memory macro in 22 nm for machine-learning edge applications. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 252–254.
15. Fujiwara, H.; Mori, H.; Zhao, W.-C.; Chih, Y.-D.; Lee, Y.-C.; Chang, M.-F. A 5-nm fully-digital computing-in-memory macro supporting wide-range DVFS and simultaneous MAC and write operations. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–24 February 2022.
16. Yu, S.; Sun, X.; Liu, X.; Chen, Y.; Chen, X.; Huang, H.; Yu, H. Compute-in-memory chips for deep learning: Recent trends and prospects. IEEE Circuits Syst. Mag. 2021, 21, 31–56.
17. Jhang, C.-J.; Xue, C.-X.; Hung, J.-M.; Chang, F.-C.; Chang, M.-F. Challenges and trends of SRAM-based computing-in-memory for AI edge devices. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 1773–1786.
18. Burd, T.; Li, W.; Pistole, J.; Venkataraman, S.; Johnson, T.; Lee, J.; Velaga, S.; Schoenborn, Z. “Zen 4c”: The AMD 5nm area-optimized x86-64 microprocessor core. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 18–22 February 2024; pp. 38–40.
19. Zhang, J.; Wang, Z.; Verma, N. A machine-learning classifier implemented in a standard 6T SRAM array. In Proceedings of the 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), Honolulu, HI, USA, 14–17 June 2016.
20. Ali, M.; Chakraborty, I.; Saxena, U.; Agrawal, A.; Ankit, A.; Roy, K. A 35.5–127.2 TOPS/W dynamic sparsity-aware reconfigurable-precision compute-in-memory SRAM macro for machine learning. IEEE Solid-State Circuits Lett. 2021, 4, 129–132.
21. Sharma, V.; Kim, H.; Kim, T.T.-H. A 64 Kb reconfigurable full-precision digital ReRAM-based compute-in-memory for artificial intelligence applications. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 3284–3296.
22. Xiong, T.; Zhou, Y.; Kong, Y.; Wang, B.; Guo, A.; Wang, Y.; Xue, C.; Hsu, H.; Si, X.; Yang, J. Design methodology towards high-precision SRAM based computation-in-memory for AI edge devices. In Proceedings of the 18th International SoC Design Conference (ISOCC), Jeju, Republic of Korea, 6–9 October 2021; pp. 195–196.
23. Khwa, W.-S.; Chen, J.-J.; Li, J.-F.; Si, X.; Yang, E.-Y.; Sun, X.; Liu, R.; Chen, P.-J.; Li, Q.; Wang, S.; et al. A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3 ns and 55.8TOPS/W fully parallel product-sum operation for binary DNN edge processors. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 11–15 February 2018; pp. 496–498.
24. Biswas, A.; Chandrakasan, A.P. Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications. In Proceedings of the 2018 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 11–15 February 2018; pp. 488–490.
25. Si, X.; Chen, J.-J.; Tu, Y.-N.; Huang, W.-H.; Wang, J.-H.; Chiu, Y.-C.; Wei, W.-C.; Wu, S.-Y.; Sun, X.; Liu, R.; et al. A twin-8T SRAM computation-in-memory unit-macro for multibit CNN-based AI edge processors. IEEE J. Solid-State Circuits 2019, 55, 189–202.
26. Chen, W.; Qi, Z.; Akhtar, Z.; Siddique, K. Resistive-RAM-based in-memory computing for neural network: A review. Electronics 2022, 11, 3667.
27. Antolini, A.; Lico, A.; Zavalloni, F.; Greco, L.; Zurla, R.; Bertolini, J.; Vignali, R.; Iannelli, L.; Calvetti, E.; Pasotti, M.; et al. High-precision close-to-analog programming of PCM cells as devices for AiMC edge-AI. IEEE J. Solid-State Circuits 2026, 1–14.

Figure 1. Structural comparison of conventional 6T SRAM, reference 6T+2T SRAM-CIM, and proposed 4T+2T SRAM-CIM bitcells.

Figure 2. Proposed 4T+2T SRAM bitcell architecture.

Figure 3. CIM operation flow of the proposed 4T+2T bitcell. The 4T SRAM bitcell stores the weight WB and provides its complementary value W, while the input activation IN is applied to the 2T multiplier path. The multiplier generates the bitwise output $IN \times W$ without requiring additional input inversion. During CIM operation, the conventional SRAM access signals WL, BL, and BLB remain inactive.

Figure 4. Monte Carlo waveform comparison of the 6T+2T reference and proposed 4T+2T bitcells for output voltage, average power, and write energy.

Figure 5. Write operation waveform of the proposed 4T+2T bitcell. Complementary switching of Q and QB confirms correct write behavior; $v(inxw)$ rises to logic high only when $IN \times W = 1$.

Figure 6. CIM waveform of the proposed 4T+2T bitcell. $IN \times W$ correctly follows IN when the stored weight is logic high; Q remains stable, confirming that CIM operation does not disturb stored data.

Figure 7. Static noise margin (SNM) analysis of the proposed 4T SRAM bitcell using FreePDK45 at $V_{DD} = 1$ V and $T = 25$ °C. The voltage transfer characteristic (VTC) and butterfly curve are used to evaluate the stability of the storage nodes. The unity-gain points are observed at 0.429 V and 0.596 V, resulting in an SNM of approximately 83.7 mV.

Table 1. Truth table of the proposed 2T multiplier. Logic levels correspond to VDD = 1 V operation in 45 nm CMOS.

Input ( $IN$ )	Weight W ( $WB$ )	Output ( $IN \times W$ )
0	0 (1)	0
0	1 (0)	0
1	0 (1)	0
1	1 (0)	1

Table 2. CIM Performance and Monte Carlo Write Variation: 6T+2T vs. Proposed 4T+2T.

Parameter	6T+2T	4T+2T	Observation
CIM Bitwise Multiplication (Nominal)
CIM Delay	47.76 ps	50.89 ps	↑ 6.55% increase
CIM Power	1.772 µW	0.8014 µW	↓ 54.77% reduction
CIM Energy	10.63 fJ	4.808 fJ	↓ 54.77% reduction
Bitwise Mult.	Yes	Yes	No change
Monte Carlo Write Variation (100 Samples)
Write Delay Rise	8062.98–8087.08 ps	55.02–69.59 ps	Lower in 4T+2T
Write Delay Fall	8074.01–8098.36 ps	56.97–65.07 ps	Lower in 4T+2T
Average Power	37.93–41.30 nW	29.35–32.78 nW	Lower in 4T+2T
Write Energy	0.07586–0.08260 fJ	0.05870–0.06557 fJ	Lower in 4T+2T
$IN \times W$ Max	0.999979–0.999984 V	0.999982–0.999987 V	Valid logic high

Table 3. Comprehensive Comparison of SRAM-CIM Bitcell Designs Across Process Nodes.

Design	6T	6T+4T	8T	6T+2T	4T+2T
Process Node	45 nm	65 nm * [9]	65 nm * [9]	45 nm	45 nm (Ours)
Bitwise Multiplication	No	Yes	No	Yes	Yes
Read Delay (ps)	2008	127.5	102.7	2023	26.91
Read Power (nW)	15.82	–	–	10.03	1.351
Read Energy (fJ)	0.03164	69.2	58.8	0.02006	0.005403
Write Delay (ps)	8077	216.7	164.3	19.01	61.17
Write Power (nW)	16.98	–	–	879.9	30.60
Write Energy (fJ)	0.03396	155.8	133.1	1.760	0.06120

* 6T+4T and 8T values are from a 65 nm CMOS process at 1 V supply; direct comparison is indicative only due to process node differences.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hariprasad, S.; Balasubramanian, S.; Patel, A.A.; Choi, K.K. A Compact 4T+2T SRAM-Based Digital Compute-in-Memory Bitcell with Reduced Transistor Count for Energy-Efficient Bitwise MAC Operations in 45 nm CMOS. Electronics 2026, 15, 2630. https://doi.org/10.3390/electronics15122630

