Article

Novel Low-Power Computing-In-Memory (CIM) Design for Binary and Ternary Deep Neural Networks by Using 8T XNOR SRAM

DA-Lab, Department of Electrical and Computer Engineering, Illinois Institute of Technology, 3301 South Dearborn Street, Chicago, IL 60616, USA
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(23), 4828; https://doi.org/10.3390/electronics13234828
Submission received: 8 November 2024 / Revised: 2 December 2024 / Accepted: 5 December 2024 / Published: 6 December 2024
(This article belongs to the Special Issue Recent Advances in AI Hardware Design)

Abstract

The increasing demand for high-performance and low-power hardware in artificial intelligence (AI) applications, such as speech recognition, facial recognition, and object detection, has driven the exploration of advanced memory designs. Convolutional neural networks (CNNs) and deep neural networks (DNNs) require intensive computational resources, leading to challenges in memory access time and power consumption. To address these challenges, we propose the application of computing-in-memory (CIM) within FinFET-based 8T SRAM structures, specifically utilizing P-latch N-access (PLNA) and single-ended (SE) configurations. Our design significantly reduces power consumption, by up to 56% in the PLNA configuration and 60% in the SE configuration, compared to traditional FinFET SRAM designs. These reductions are achieved while maintaining competitive delay performance, making our approach a promising solution for implementing efficient and low-power AI hardware. Detailed simulations in 7 nm FinFET technology underscore the potential of these CIM-based SRAM structures in overcoming the computational bottlenecks associated with DNNs and CNNs.

1. Introduction

The demand for efficient and low-power computational architectures has surged due to the widespread adoption of artificial intelligence (AI) and machine learning applications [1]. One promising solution to address the limitations of the traditional von Neumann architecture, which suffers from significant data transfer inefficiencies, is computing-in-memory (CIM). CIM integrates computation directly into memory, reducing data transfer latency and power consumption, making it highly suitable for applications requiring high performance and low energy consumption.
Figure 1 below illustrates the von Neumann bottleneck, highlighting the inefficiency caused by continuous data movement between the processing unit and memory in traditional architectures. This bottleneck has driven interest in alternative architectures, like CIM, which seeks to minimize these inefficiencies [2].
A key area where CIM is particularly beneficial is executing binary and ternary computations. Binary Neural Networks (BNNs), which use binarized activations and weights (+1, −1), have demonstrated significant improvements in computational efficiency by transforming the Multiply-And-Accumulate (MAC) operations into XNOR operations. This reduces the complexity of matrix multiplication, which is a core operation in AI workloads [1,3]. Ternary Neural Networks (TNNs), which extend this concept by allowing inputs to take values of +1, 0, and −1, offer even greater potential for nuanced computations. TNNs balance computational efficiency and representational flexibility, maintaining low power requirements while enhancing accuracy compared to BNNs [4].
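The XNOR-for-MAC equivalence described above can be illustrated with a short sketch (an illustrative model, not the paper's circuit): encoding +1 as bit 1 and −1 as bit 0, the XNOR of two encodings is 1 exactly when the signs agree, so a dot product reduces to counting matches.

```python
# Illustrative sketch: a binary MAC operation computed via XNOR + popcount.
# Encode +1 -> 1 and -1 -> 0; XNOR of the encodings is 1 when the product
# of the original values is +1, so the dot product is matches - mismatches.

def xnor_mac(activations, weights):
    """Dot product of {+1, -1} vectors using XNOR and a popcount."""
    n = len(activations)
    enc = lambda v: 1 if v == +1 else 0
    matches = sum(1 for a, w in zip(activations, weights)
                  if (enc(a) ^ enc(w)) == 0)   # XNOR: 1 when signs agree
    return 2 * matches - n                      # matches - mismatches

acts    = [+1, -1, +1, +1, -1]
weights = [+1, +1, -1, +1, -1]
direct  = sum(a * w for a, w in zip(acts, weights))
assert xnor_mac(acts, weights) == direct
```

In hardware, the popcount is realized by the accumulate stage, which is why BNN MAC arrays map so naturally onto XNOR bit-cells.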
This paper presents an 8T SRAM bit-cell design optimized for ternary XNOR operations in CIM applications. Our design leverages the simplicity of XNOR-based multiplication, extending its capability to handle ternary inputs and providing a highly efficient solution for in-memory computing. Simulations using 7 nm FinFET technology demonstrate significant power and delay improvements over conventional designs [5]. The P-latch N-access (PLNA) and single-ended (SE) structures proposed in the paper further enhance efficiency, lowering overall power consumption and delay.
The results of this study indicate that the proposed 8T SRAM design has the potential to contribute significantly to the development of ultra-low-power CIM architectures, which are crucial for next-generation AI accelerators and embedded systems [6]. By focusing on ternary computations, we can optimize both speed and energy efficiency, paving the way for more effective and scalable CIM solutions in the rapidly evolving landscape of AI-driven technologies [7]. Furthermore, the integration of CIM into Processing-In-Memory (PIM) systems offers the potential to further enhance computational efficiency by reducing data movement overhead, enabling faster processing directly within memory [8]. This synergy between CIM and PIM could revolutionize the architecture of memory-centric AI accelerators, providing a more efficient path to handling increasingly complex AI workloads.

2. Background

Large volumes of data must be processed in parallel in AI-related applications, frequently resulting in high energy costs. SRAM-based computing-in-memory (CIM) can address these issues by enabling operations to be carried out directly within the memory array, minimizing data movement and, consequently, power consumption. By lowering the number of logic operations needed, ternary logic further improves the energy efficiency of these systems, resulting in more power-efficient designs than conventional binary systems [9].
The XNOR gate is essential in these ternary logic systems, particularly for tasks involving comparison functions and arithmetic computations. The 8T SRAM cells and ternary XNOR gates allow for effective in-memory processing, which boosts speed while lowering power and delay. Therefore, they are perfect for AI applications that demand quick data processing and minimal power consumption [10].

2.1. 8T SRAM Cell

Compared to the conventional 6T SRAM, the 8-transistor (8T) SRAM cell offers several benefits [7]. By separating the read and write processes, the 8T SRAM increases data stability and enables faster operation at low voltages. Because of this, 8T SRAM is a recommended option for low-power applications where speed and energy efficiency are crucial [11]. In particular, 8T SRAM is used in energy-constrained applications such as edge computing and the Internet of Things due to its enhanced read stability under ultra-low-voltage conditions.

2.2. Pass Transistor Logic

Pass transistor logic (PTL) is a design technique for implementing digital circuits in which transistors pass signals directly rather than switching between voltage levels. Traditional CMOS logic uses a combination of PMOS and NMOS transistors to pull outputs high or low, depending on input conditions [11]. In contrast, PTL, shown in Figure 2, uses transistors as switches that selectively allow or block the flow of voltage signals. PTL can result in lower power consumption, fewer transistors, and faster switching times [12].
In PTL, NMOS transistors can pass a strong ‘0’ but have difficulty passing a strong ‘1’, while PMOS transistors can pass a strong ‘1’ but are less efficient at passing a strong ‘0’. Thus, both NMOS and PMOS are often combined for full-swing operations (passing both ‘0’ and ‘1’) [11,12].
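The strong/weak passing behavior can be captured with a first-order threshold-drop model (a simplified sketch with assumed supply and threshold values, not data from the paper): an NMOS pass gate cannot pull its output above VDD − Vtn, and a PMOS pass gate cannot pull it below |Vtp|, which is why the two are paired for full-swing operation.

```python
# First-order sketch of pass-transistor voltage degradation.
# VDD, VTN, and VTP are assumed illustrative values, not the paper's.

VDD, VTN, VTP = 0.7, 0.2, 0.2  # supply and threshold voltages (V)

def nmos_pass(vin, vgate=VDD):
    """NMOS pass gate: strong '0', but the '1' degrades to vgate - VTN."""
    return min(vin, vgate - VTN)

def pmos_pass(vin, vgate=0.0):
    """PMOS pass gate: strong '1', but the '0' degrades to |VTP|."""
    return max(vin, VTP)

def transmission_gate(vin):
    """NMOS and PMOS in parallel: each restores the level the other degrades."""
    return pmos_pass(vin) if vin > VDD / 2 else nmos_pass(vin)
```

Under this model, `nmos_pass(VDD)` returns only VDD − VTN, while `transmission_gate(VDD)` restores the full VDD level, mirroring the full-swing argument above.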

3. XNOR Operation in Ternary Logic Using Pass Transistor Logic (PTL)

In conventional binary logic, the XNOR gate outputs a 1 when both inputs are equal and a 0 when they differ. However, in a ternary logic system, the XNOR gate outputs a neutral state (0) when there is a balance in input conditions, while +1 and −1 represent the high and low logic states, respectively [13].
The PTL-based ternary XNOR gate design in Figure 2 operates within a memory cell that stores the value Q and its complement Q_b. The operation is controlled by several vital signals: RWL_P, RWL_N, RWL_PB, and RWL_NB. These control signals determine whether the transistors pull the output up or down, and thus what the XNOR gate outputs. Table 1 shows the logic map for the product between the input and the weight [12,13].
The variables involved are the following:
  • Q and Q_b: Stored values representing the XNOR gate’s primary inputs.
  • RWL_P (Read/Write Line for Pull-up): Controls whether the circuit attempts to output a high value.
  • RWL_N (Read/Write Line for Pull-down): Controls whether the circuit attempts to output a low value.
  • RWL_PB and RWL_NB: Complementary signals that balance the pull-up and pull-down behavior.

XNOR Operation Under Different Input Combinations When Q = 1, Q_b = 0

The four cases are as follows:
  • XNOR output −1: With RWL_PB = 0 and RWL_NB = 1, as shown in Figure 2a, transistors M7 and M8 are turned on, yielding the multiplication result −1.
  • XNOR output +0: With RWL_P = 1 and RWL_NB = 0, as shown in Figure 2b, the output follows a strong pull-down path. The logic 1 at Q and the logic 0 at Q_b serve as the sources of transistors M6 and M7, turning them on; because of the strong pull-down path, the result is +0.
  • XNOR output −0: As shown in Figure 2c, both Q and Q_b are used to generate the output −0; this configuration involves a balance of weak currents flowing through transistors M5 and M8. Here, M8 receives a logic 0 from Q_b, while M5 receives a logic 1 from Q. The weak currents from these transistors meet, resulting in a neutral output of −0 due to the limited strength of the pull-up and pull-down actions.
  • XNOR output +1: With RWL_P = 0 and RWL_N = 1 from Figure 2d, the output is a solid high (+1) due to transistors M5 and M6, and the multiplication result is +1.
To complement the explanation and diagram of the XNOR operation in CIM architecture, Table 1 outlines the input combinations of RWL_P, RWL_N, RWL_PB, and RWL_NB, as well as the corresponding XNOR outputs based on the weights and inputs. This table will clearly explain how different control signals interact with the circuit to produce +1, 0, or −1 outputs [9,12,13].

4. Proposed XNOR SRAM Design and Optimization

This study introduces two advanced SRAM designs, single-ended (SE) SRAM and P-latch N-access (PLNA) SRAM, both integrated with XNOR pass transistor logic [14]. These designs are evaluated against a conventional 6T SRAM cell combined with standard XNOR pass transistor logic. The original design is a 10T SRAM, while our proposed designs use an 8T structure; all are developed in 7 nm FinFET technology [15].

4.1. P-Latch N-Access (PLNA) SRAM Structure

The proposed P-latch N-access (PLNA) 8T SRAM cell differs significantly from traditional SRAM designs by completely separating the read and write circuits, as illustrated in Figure 3. This separation has been implemented to achieve two primary goals: reducing switching power consumption and improving write speed.
Traditional SRAM cells typically integrate read and write operations within the same circuit, which can lead to increased power dissipation during switching. In contrast, the proposed PLNA 8T cell isolates the read and write paths, allowing each to function independently. We can eliminate unnecessary switching events in the read path during write operations by isolating these operations, leading to significant power savings. This approach also reduces the possibility of rapid current flow between the latch and access transistors, which typically occurs in conventional designs [16].
The PLNA cell uses only two load transistors (M1 and M2) for the latch, minimizing the number of active elements involved in holding the data. This simplified latch design avoids rapid or strong current flow between the latch and the access transistor during write operations. As a result, the write operation can be completed more quickly and efficiently, saving power without compromising performance. The decoupled nature of the design not only enhances the cell’s write speed but also allows for low-power operation, which is especially advantageous for applications where minimizing energy consumption is critical.
The proposed PLNA 8T SRAM cell is designed to support CIM operations, enabling essential logic functions directly within the memory cell. By utilizing the XNOR operation in the read circuit, as shown in the schematic, we can perform bitwise operations during data retrieval, allowing simple computational tasks without transferring data to separate processing units. This CIM [12] capability further enhances energy efficiency for memory-intensive tasks, as it minimizes data movement and leverages in-memory computation to accelerate specific logic functions, which is beneficial in applications such as artificial intelligence and machine learning [16].
The sizes of the FinFET transistors in the SRAM cell are listed in Table 2. Proper transistor sizing is critical to balancing the design’s speed, power, and area, particularly in sub-threshold regions where achieving high yields can be challenging.

4.2. Single-Ended (SE) SRAM Structure

The single-ended (SE) SRAM structure presented in Figure 4 differs from traditional differential SRAM cells by using a single-ended read mechanism, leveraging only one bit line (Q) for read operations. This design provides substantial benefits in area efficiency and power savings, making it an attractive option for high-density memory applications.
Unlike conventional SRAM cells that require both Q and Q_b for readout, the SE SRAM cell reads only the Q node, simplifying the read circuitry and eliminating the need for a complementary output. This single-ended approach conserves area by removing the need for differential read bit lines and associated sense amplifiers, making the SE SRAM structure suitable for applications where minimizing cell size is critical. However, this design has a trade-off in noise margin and speed, as single-ended sensing can be more susceptible to process variations. Careful design of the read circuitry, including appropriate transistor sizing, mitigates these effects [11].
In the single-ended design, the SRAM cell reduces the number of active transistors during read and write operations, leading to lower dynamic power consumption. During write operations, the cell operates similarly to traditional SRAM structures, with transistors (e.g., M3) controlled by the write word line to update the Q value. The SE SRAM cell uses an efficient single-ended bit line structure for read operations. The circuit minimizes switching activity by using selective transistors in the read path, which lowers power consumption during idle and active states.
The SE SRAM structure can support CIM operations, allowing essential logic functions such as XNOR to be performed directly in the memory cell [13]. This capability reduces data movement between memory and processing units, optimizing performance and energy usage in applications like machine learning, where memory-centric operations dominate.
The sizes of the FinFET transistors in the SE SRAM cell are listed in Table 3. Optimized transistor sizing is crucial to balance power, speed, and area efficiency, especially in single-ended designs where noise sensitivity can be a concern. The number of fins determines the effective channel width of the transistor and directly impacts its current-driving capability.
This SE SRAM design addresses the challenge of stable operation in sub-threshold voltage regions, ensuring high yield under stringent power constraints. It is suitable for ultra-low-power applications, where maintaining stable operation at minimal supply voltages is essential.

5. Results

The SRAM designs were simulated using Synopsys HSPICE 2012 to evaluate performance across all four input combinations, applying weights of +1 and −1 at room temperature (27 °C) and minimum transistor sizing. Typical–typical (TT) FinFET structures were used to represent nominal conditions for the simulations, ensuring realistic performance evaluation. Power consumption and delay metrics for each configuration were measured and analyzed to assess efficiency under varying operational conditions. The FinFET libraries for this simulation were sourced from Arizona State University’s standard 7 nm FinFET library, ensuring compatibility with advanced process technologies [17,18].

5.1. Power Results

Output waveforms were generated with the Synopsys CosmosScope Waveform Analyzer 2012, providing visual confirmation of signal behavior for the proposed and original SRAM designs. Additionally, .sp files based on each design’s specifications (original and proposed) were generated to enable precise simulation of all structures. Each simulation produced detailed output files, including graph data files and measured-result files generated by HSPICE, which recorded each operation’s power and delay outputs.
The power comparison is shown in Table 4, detailing each design’s power consumption (in microwatts) for all input conditions and weight applications.
An overall power comparison is shown in Table 5, which consolidates the total power consumption (in microwatts) of the SE and PLNA designs across all eight operations, providing a cumulative view of power efficiency.

5.2. Delay Results

A delay comparison is shown in Table 6, recording the delay observed across all eight operations (measured in nanoseconds), encompassing four input states with weights of +1 and −1.
The delay comparison of different states is shown in Table 7, which summarizes the total delay across all input states with weights of +1 and −1, enabling a clear comparison of delay characteristics between the original and proposed designs (measured in picoseconds).
From the detailed power consumption analysis shown in Table 4, we observe that multiplication with zero inputs significantly increases power consumption compared to binary multiplication with +1 or −1. Power consumption increases because the zero states lead to stable RBL conditions, where minimal switching happens, causing unnecessary power retention. In contrast, the binary operations with weights +1 and −1 exhibit discharge and charge states in the RBL, which lead to more efficient power consumption through controlled switching activity. This finding highlights an essential characteristic of binary-weighted operations, where dynamic switching results in lower cumulative power than idle or stable states associated with zero input multiplications.
Delay analysis, illustrated in Table 6 and Table 7, shows that the SE and PLNA architectures also outperform the original design in terms of speed, with the SE design achieving the lowest overall delay (0.501 ns) compared to the PLNA (0.713 ns) and the original (1.009 ns) designs. This improved delay performance stems from reduced signal fluctuations and efficient switching during binary-weighted operations. Moreover, the analysis reveals that zero-weight multiplications incur higher power consumption than binary multiplications with weights +1 or −1, reinforcing the benefits of binary operations in reducing power.
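The relative improvements follow directly from the delay totals quoted above. This short sketch computes the percentage reduction of each proposed design against the original (the formula is ordinary percentage change; only the three delay totals come from the paper).

```python
# Percentage delay reduction relative to the original design,
# using the totals reported in the text (in nanoseconds).

delays = {"original": 1.009, "PLNA": 0.713, "SE": 0.501}

def reduction(base, new):
    """Percentage reduction of `new` relative to `base`."""
    return 100.0 * (base - new) / base

for name in ("PLNA", "SE"):
    print(f"{name}: {reduction(delays['original'], delays[name]):.1f}% lower delay")
# SE:   (1.009 - 0.501) / 1.009  ->  about 50.3% lower delay
# PLNA: (1.009 - 0.713) / 1.009  ->  about 29.3% lower delay
```

These ratios complement the reported power reductions (60.42% for SE, 56.90% for PLNA), showing that the proposed designs improve on both axes simultaneously.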

6. Discussion

The following graphs show the output waveforms for the original (Figure 5), PLNA (Figure 6), and SE (Figure 7) SRAM designs when simulated with four input combinations and weights of +1 and −1. These waveforms were generated in the Synopsys CosmosScope Waveform Analyzer 2012 and are used to assess each design’s power consumption and delay characteristics.
The four input signals are labeled with ternary values (−1, −0, +0, and +1) and paired with weights of +1 or −1. The output is shown in the MAC output region.
The MAC output fluctuates between positive and negative values, as shown in Figure 5, depending on the input–weight multiplication, indicating dynamic switching that impacts power and delay. Q and Qb (the output and its complement) show transitions based on the weighted input states: for a weight of −1, Q = 0 and Qb = 1; for a weight of +1, Q = 1 and Qb = 0, confirming correct multiplication.
The PLNA SRAM waveform in Figure 6 reveals that the MAC output reflects the input–weight products but with smoother transitions, suggesting reduced delay and switching activity. This design appears optimized for power efficiency due to its limited MAC output fluctuation range. The PLNA design exhibits the most stable MAC output waveform among the three designs, indicating a potential reduction in delay and power consumption due to minimized signal fluctuations.
The SE SRAM design in Figure 7 behaves similarly to the original design: four input signals are paired with weights of +1 and −1, affecting the output waveform based on the input–weight combinations. The MAC output shows noticeable changes corresponding to input conditions, with smoother transitions than the original design, indicating potentially lower switching activity and power. The SE design shows reduced amplitude in the MAC output transitions compared to the original design, which suggests potential power savings.
While the original design provides baseline performance, the SE and PLNA designs demonstrate significant improvements in power efficiency, with the PLNA structure providing the most stable and optimized output. The SE design contributes to area efficiency by employing a streamlined structure with fewer transistors compared to conventional designs, further reducing the overall area overhead. Additionally, the proposed architecture innovatively shares the reading part across multiple writing parts (4T PLNAs) in the same SRAM column. This shared configuration enhances area optimization by minimizing the need for redundant read components, which are typically required for each cell in conventional SRAM designs. Consequently, both the SE and PLNA structures offer a balanced combination of power efficiency, stability, and area optimization, making them highly suitable for advanced computing-in-memory applications.

7. Conclusions

In this study, we proposed two novel 7 nm FinFET SRAM designs—single-ended (SE) and P-latch N-access (PLNA) architectures—targeted at optimizing power and delay performance for computing-in-memory (CIM) applications. Through extensive simulations, we compared these designs with an original 10T SRAM architecture across multiple input–weight combinations. The results demonstrate substantial power savings for both designs, with the SE architecture achieving a 60.42% reduction and the PLNA architecture a 56.90% reduction in total power consumption compared to the original design. Additionally, both SE and PLNA architectures outperformed the original 10T SRAM design in terms of speed, with the SE design achieving the lowest overall delay (0.501 ns), followed by the PLNA design (0.713 ns) and the original design (1.009 ns).
In summary, the proposed SE and PLNA SRAM architectures significantly improve both power efficiency and processing speed, making them highly suitable for CIM applications where energy efficiency and high-speed operation are critical. This research highlights the advantages of binary-weighted operations for power optimization, positioning the SE and PLNA designs as practical solutions for low-power, high-performance memory in advanced computing systems. We are currently working [19] on a follow-up study to present a complete chip design and evaluate performance using standard datasets, such as CIFAR-10. This future work will also include a detailed comparative analysis with key references, further validating the practical benefits of the proposed designs in real-world deep learning applications.

Author Contributions

Conceptualization, data curation, formal analysis, investigation, methodology, validation, A.G.; writing—original draft, writing—review and editing, N.A.; supervision, funding acquisition, project administration, K.K.C. All authors read and agreed to the published version of the manuscript.

Funding

This work was supported by the Technology Innovation Program (20018906, Development of autonomous driving collaboration control platform for commercial and task assistance vehicles) funded by the Ministry of Trade, Industry and Energy (MOTIE, Republic of Korea).

Data Availability Statement

Data are contained within the article.

Acknowledgments

We thank our colleagues from KETI and KEIT, who provided insight and expertise, which greatly assisted the research and improved the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI    Artificial Intelligence
CNN   Convolutional Neural Network
DNN   Deep Neural Network
PLNA  P-Latch N-Access
SE    Single-Ended
MAC   Multiply and Accumulate
SRAM  Static Random Access Memory
CIM   Computing-In-Memory
ADC   Analog-to-Digital Converter
TBN   Ternary Binary Neural Network

References

  1. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.-R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 2012, 29, 82–97. [Google Scholar] [CrossRef]
  2. Taigman, Y.; Yang, M.; Ranzato, M.A.; Wolf, L. DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708. [Google Scholar] [CrossRef]
  3. Fang, W.; Wang, L.; Ren, P. Tinier-YOLO: A real-time object detection method for constrained environments. IEEE Access 2020, 8, 1935–1944. [Google Scholar] [CrossRef]
  4. Alemdar, H.; Leroy, V.; Prost-Boucle, A.; Pétrot, F. Ternary neural networks for resource-efficient AI applications. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2547–2554. [Google Scholar] [CrossRef]
  5. Song, T.; Jung, J.; Rim, W.; Kim, H. A 7 nm FinFET SRAM using EUV lithography with dual write-driver-assist circuitry for low-voltage applications. In Proceedings of the 2018 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 11–15 February 2018. [Google Scholar] [CrossRef]
  6. Lu, A.; Peng, X.; Luo, Y.; Yu, S. Benchmark of the compute-in-memory based DNN accelerator with area constraint. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2020, 28, 1945–1952. [Google Scholar] [CrossRef]
  7. Si, X.; Khwa, W.-S.; Chen, J.-J.; Li, J.-F.; Sun, X.; Liu, R.; Yu, S.; Yamauchi, H.; Li, Q.; Chang, M.-F. A dual-split 6T SRAM-based computing-in-memory unit-macro with fully parallel product-sum operation for binarized DNN edge processors. IEEE Trans. Circuits Syst. Regul. Pap. 2019, 66, 4172–4185. [Google Scholar] [CrossRef]
  8. De Castro, V.; Marcos, E.; Vara, J.M. Applying CIM-to-PIM model transformations for the service-oriented development of information systems. Inf. Softw. Technol. 2010, 52, 1295–1311. [Google Scholar] [CrossRef]
  9. Alnatsheh, N.; Kim, Y.; Cho, J.; Choi, K.K. A Novel 8T XNOR-SRAM: Computing-in-Memory Design for Binary/Ternary Deep Neural Networks. Electronics 2023, 12, 877. [Google Scholar] [CrossRef]
  10. Kim, Y.; Patel, S.; Kim, H.; Yadav, N.; Choi, K.K. Ultra-Low Power and High-Throughput SRAM Design to Enhance AI Computing Ability in Autonomous Vehicles. Electronics 2021, 10, 256. [Google Scholar] [CrossRef]
  11. Kim, Y.; Li, S.; Yadav, N.; Choi, K.K. A Novel Ultra-Low Power 8T SRAM-Based Compute-in-Memory Design for Binary Neural Networks. Electronics 2021, 10, 2181. [Google Scholar] [CrossRef]
  12. Lee, S.; Kim, Y. Low power ternary XNOR using 10T SRAM for in-memory computing. In Proceedings of the 2022 19th International SoC Design Conference (ISOCC), Gangneung-si, Republic of Korea, 19–22 October 2022; pp. 352–353. [Google Scholar] [CrossRef]
  13. Yin, S.; Jiang, Z.; Seo, J.-S.; Seok, M. XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks. IEEE J. Solid-State Circuits 2020, 55, 1733–1743. [Google Scholar] [CrossRef]
  14. Biswas, A.; Chandrakasan, A.P. CONV-SRAM: An energy-efficient SRAM with in-memory dot-product computation for low-power convolutional neural networks. IEEE J. Solid-State Circuits 2019, 54, 217–230. [Google Scholar] [CrossRef]
  15. Almeida, R.B.; Marques, C.; Butzen, P.F.; Silva, F.; Reis, R.A.; Meinhardt, C. Analysis of 6T SRAM cell in sub-45 nm CMOS and FinFET technologies. Microelectron. Reliab. 2018, 88, 196–202. [Google Scholar] [CrossRef]
  16. Yadav, N.; Shah, A.P.; Vishvakarma, S.K. Stable, reliable, and bit-interleaving 12T SRAM for space applications: A device circuit co-design. IEEE Trans. Semicond. Manuf. 2017, 30, 276–284. [Google Scholar] [CrossRef]
  17. Vangala, M. FinFET Cell Library Design and Characterization. Master’s Thesis, Arizona State University, Tempe, AZ, USA, 2017. Available online: https://hdl.handle.net/2286/R.I.45536 (accessed on 4 December 2024).
  18. Narasimham, B.; Luk, H.; Paone, C.; Montoya, A.-R.; Riehle, T.; Smith, M.; Tsau, L. Scaling trends and the effect of process variations on the soft error rate of advanced FinFET SRAMs. In Proceedings of the 2023 IEEE International Reliability Physics Symposium (IRPS), Monterey, CA, USA, 26–30 March 2023; pp. 1–4. [Google Scholar] [CrossRef]
  19. Jeong, H.; Kim, S.; Park, K.; Jung, J.; Lee, K.J. A Ternary Neural Network Computing-in-Memory Processor With 16T1C Bitcell Architecture. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 1739–1743. [Google Scholar] [CrossRef]
Figure 1. Comparison between the von Neumann and computing-in-memory (CIM) architectures.
Electronics 13 04828 g001
Figure 2. Ternary XNOR operation with fixed weights and varying inputs. Subfigures (a–d) show the different XNOR inputs and desired outputs.
Electronics 13 04828 g002
Figure 3. P-Latch N-Access (PLNA) SRAM structure with XNOR logic.
Electronics 13 04828 g003
Figure 4. Single-Ended (SE) SRAM structure with XNOR logic.
Electronics 13 04828 g004
Figure 5. Original SRAM waveform analysis of the proposed XNOR-SRAM CIM operation for ternary inputs. The input signals (+1, 0, −1, +0) are represented by yellow, black, cyan, and magenta lines, respectively. The ternary combination output (−1, 0, +1) is shown by the orange line. The complementary output Qb is indicated by the green line, while the primary output Q is represented by the blue line. The results illustrate correct ternary computation, highlighting the functionality of the XNOR pass transistor logic.
Electronics 13 04828 g005
Figure 6. PLNA SRAM waveform analysis of the proposed XNOR-SRAM CIM operation for ternary inputs. The input signals (+1, 0, −1, +0) are represented by yellow, black, cyan, and magenta lines, respectively. The ternary combination output (−1, 0, +1) is shown by the orange line. The complementary output Qb is indicated by the green line, while the primary output Q is represented by the blue line. The results illustrate correct ternary computation, highlighting the functionality of the XNOR pass transistor logic.
Figure 7. Waveform analysis of the proposed XNOR-SRAM CIM operation for ternary inputs in the SE SRAM. The input signals (+1, +0, −1, −0) are represented by the yellow, black, cyan, and magenta lines, respectively. The ternary output (−1, 0, +1) is shown by the orange line; the complementary output Qb is indicated by the green line, and the primary output Q by the blue line. The results confirm correct ternary computation through the XNOR pass-transistor logic.
Table 1. Logic Map for the Product Between the Input and the Weight.
| Input Value | RWL_P | RWL_N | RWL_PB | RWL_NB | Q | Q_b | Weight Value | Output RBL |
|---|---|---|---|---|---|---|---|---|
| −1 | 1 | 0 | 0 | 1 | 1 | 0 | +1 | −1 |
| +0 | 0 | 0 | 1 | 1 | 1 | 0 | +1 | 0 |
| −0 | 1 | 1 | 0 | 0 | 1 | 0 | +1 | 0 |
| +1 | 0 | 1 | 1 | 0 | 1 | 0 | +1 | +1 |
| −1 | 1 | 0 | 0 | 1 | 0 | 1 | −1 | +1 |
| +0 | 0 | 0 | 1 | 1 | 0 | 1 | −1 | 0 |
| −0 | 1 | 1 | 0 | 0 | 0 | 1 | −1 | 0 |
| +1 | 0 | 1 | 1 | 0 | 0 | 1 | −1 | −1 |

RBL = Read Bit Line, RWL = Read Word Line.
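The logic map in Table 1 can be sanity-checked with a short behavioral model. The sketch below is illustrative only, not the authors' code: it records each ternary input's read-word-line encoding and treats the stored weight as the latch pair (Q, Q_b), confirming that the RBL output equals the signed product of input and weight.

```python
# Behavioral sketch of Table 1's logic map (illustrative assumption,
# not the authors' circuit model). Each ternary input is driven onto
# four read word lines; the weight is the stored latch pair (Q, Q_b).

# Input code -> (RWL_P, RWL_N, RWL_PB, RWL_NB), per Table 1.
INPUT_ENCODING = {
    "-1": (1, 0, 0, 1),
    "+0": (0, 0, 1, 1),
    "-0": (1, 1, 0, 0),
    "+1": (0, 1, 1, 0),
}

# Arithmetic value each input code represents (+0 and -0 both mean 0).
INPUT_VALUE = {"-1": -1, "+0": 0, "-0": 0, "+1": 1}

def xnor_cell(input_code, q, q_b):
    """Return the RBL result (-1, 0, or +1) for one cell, per Table 1."""
    assert input_code in INPUT_ENCODING and (q, q_b) in [(1, 0), (0, 1)]
    weight = +1 if q == 1 else -1            # Q=H stores +1, Q=L stores -1
    return INPUT_VALUE[input_code] * weight  # RBL = input x weight

# Reproduce all eight rows of the logic map.
for code, rwls in INPUT_ENCODING.items():
    for q, q_b in [(1, 0), (0, 1)]:
        print(f"input {code} {rwls}, weight (Q={q}, Qb={q_b}) "
              f"-> RBL {xnor_cell(code, q, q_b):+d}")
```

Running the loop prints one line per table row; the outputs match the rightmost column of Table 1.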
Table 2. Transistor Sizing for FinFET in PLNA 8T SRAM Cell.
| Number | Transistor | Number of Fins |
|---|---|---|
| 1 | M1 | 1 |
| 2 | M2 | 1 |
| 3 | M3 | 2 |
| 4 | M4 | 2 |
| 5 | M5 | 1 |
| 6 | M6 | 4 |
| 7 | M7 | 1 |
| 8 | M8 | 4 |
Table 3. Transistor Sizing for FinFET in SE 8T SRAM Cell.
| Number | Transistor | Number of Fins |
|---|---|---|
| 1 | M1 | 1 |
| 2 | M2 | 1 |
| 3 | M3 | 2 |
| 4 | M4 | 4 |
| 5 | M5 | 1 |
| 6 | M6 | 4 |
| 7 | M7 | 1 |
| 8 | M8 | 4 |
Table 4. Comparison of Power Consumption for Different Weights, Inputs, and RBL States across Original, 7 nm FinFET PLNA, and 7 nm FinFET SE Configurations.
| Weight | Input | RBL State | Original [12] (μW) | 7 nm FinFET PLNA (μW) | 7 nm FinFET SE (μW) |
|---|---|---|---|---|---|
| +1 (Q=H, Qb=L) | −1 | Discharge | 2.579 | 1.179 | 0.874 |
| +1 (Q=H, Qb=L) | +0 | Stable | 189.86 | 70.867 | 83.384 |
| +1 (Q=H, Qb=L) | −0 | Stable | 118.21 | 63.74 | 41.462 |
| +1 (Q=H, Qb=L) | +1 | Charge | 23.3 | 4.858 | 5.798 |
| −1 (Q=L, Qb=H) | −1 | Charge | 1.8853 | 1.081 | 0.76809 |
| −1 (Q=L, Qb=H) | +0 | Stable | 119.53 | 64.34 | 42.07 |
| −1 (Q=L, Qb=H) | −0 | Stable | 185.6 | 69.2 | 78.85 |
| −1 (Q=L, Qb=H) | +1 | Discharge | 6.611 | 4.11 | 2.2427 |
Table 5. Overall Power Comparison of FinFET Designs.
| Design | Total Power (μW) | Difference (μW) | Percentage Reduction (%) |
|---|---|---|---|
| FinFET Original [12] | 81.401 | – | – |
| FinFET PLNA | 35.093 | −46.308 | 56.90 |
| FinFET SE | 32.2123 | −49.1887 | 60.42 |
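The difference and reduction columns of Table 5 follow directly from the total-power figures. The quick check below is our own arithmetic, not part of the paper:

```python
# Recompute Table 5's difference and percentage-reduction columns
# from the reported total-power values (all in microwatts).
baseline = 81.401                       # FinFET original total power
designs = {"PLNA": 35.093, "SE": 32.2123}

for name, power in designs.items():
    diff = power - baseline                          # negative: power saved
    reduction = (baseline - power) / baseline * 100  # percent reduction
    print(f"{name}: {diff:+.4f} uW, {reduction:.2f}% reduction")
```

The computed reductions come out to roughly 56.9% for PLNA and 60.4% for SE, matching Table 5 to rounding.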
Table 6. Total Delay (8 Rows and Single Column).
| Design | Total Delay (ns) |
|---|---|
| Original [12] | 1.009 |
| PLNA | 0.713 |
| SE | 0.501 |
Table 7. Delay Comparison for Different Operations.
| Operation | Original (ps) [12] | SE (ps) | PLNA (ps) |
|---|---|---|---|
| +1 × −1 | 23.77 | 0.507 | 31.124 |
| −1 × −1 | 20.652 | 0.2914 | 34.9496 |
| +1 × +1 | 20.57 | 0.361 | 22.052 |
| −1 × +1 | 21.22 | 0.546 | 31.1154 |
Share and Cite

MDPI and ACS Style

Gundrapally, A.; Alnatsheh, N.; Choi, K.K. Novel Low-Power Computing-In-Memory (CIM) Design for Binary and Ternary Deep Neural Networks by Using 8T XNOR SRAM. Electronics 2024, 13, 4828. https://doi.org/10.3390/electronics13234828
