FinFET 6T-SRAM All-Digital Compute-in-Memory for Artificial Intelligence Applications: An Overview and Analysis

Artificial intelligence (AI) has revolutionized present-day life through automation and independent decision-making capabilities. For AI hardware implementations, the 6T-SRAM cell is a suitable candidate due to its performance edge over its counterparts. However, modern AI hardware such as neural networks (NNs) accesses off-chip data quite often, degrading overall system performance. Compute-in-memory (CIM) reduces off-chip data access transactions. One CIM approach is based on the mixed-signal domain, but it suffers from limited bit precision and signal margin issues. An alternate emerging approach uses the all-digital signal domain, which provides better signal margins and bit precision, albeit at the expense of hardware overhead. We have analyzed silicon-verified all-digital 6T-SRAM CIM solutions, after classifying them as SRAM-based accelerators, i.e., near-memory computing (NMC), and custom SRAM-based CIM, i.e., in-memory computing (IMC). We have focused on multiply and accumulate (MAC) as the most frequent operation in convolution neural networks (CNNs) and compared state-of-the-art implementations. Neural networks with low weight precision, i.e., <12b, show lower accuracy but higher power efficiency. An input precision of 8b meets implementation requirements. The maximum performance reported is 7.49 TOPS at 330 MHz, while custom SRAM-based implementations have shown a maximum of 5.6 GOPS at 100 MHz. The second part of this article analyzes the FinFET 6T-SRAM as one of the critical components determining the overall performance of an AI computing system. We have investigated the FinFET 6T-SRAM cell's performance and limitations as dictated by FinFET technology-specific parameters, such as sizing, threshold voltage (Vth), supply voltage (VDD), and process and environmental variations. The HD FinFET 6T-SRAM cell shows 32% lower read access time and 1.09× lower leakage power compared with the HC cell configuration.
The minimum achievable supply voltage is 600 mV without utilization of any read- or write-assist scheme for all cell configurations, while temperature variations show noise margin deviation of up to 22% of the nominal values.


Introduction
Modern artificial intelligence (AI) deploys deep neural networks (DNNs) for quick and self-sufficient operations. Data-centric DNNs require huge amounts of data for their training and inference [1,2]. Consequently, data traffic between memory and processing units has increased immensely and choked overall system performance. Researchers have proposed the compute-in-memory (CIM) concept as a solution to overcome data movement bottlenecks to ensure AI platform performance.
Modern edge computing devices now generate data on the order of terabits per second. Inference on such huge data volumes in general-purpose processing units demands extended time for on- and off-chip data movement and extra computational resources. Thus, CIM addresses these challenges by reducing off-chip traffic and data movement for cloud computing.
The CIM approach is implemented in either the mixed-signal or the all-digital signal domain. The mixed-signal approach evolved earlier, as it achieves better energy efficiency [3]. However, the analog signal nature restricts such CIM to low bit precision and needs additional peripheral circuitry [4]. Hence, this approach is not feasible for accuracy-critical AI systems. The digital signal domain approach does not suffer from accuracy or bit-precision issues, but it uses dedicated on-chip circuitry for arithmetic operations, known as near-memory computing (NMC), which pushes hardware and power overhead high. Recently, modified SRAM cells have achieved better efficiency by performing some computations inside the cell, known as in-memory computing (IMC), and the rest of the computations with arithmetic units placed near the memory. For broad coverage of AI applications, near-memory and in-memory computations optimize frequent NN operations such as MAC. The all-digital signal domain SRAM CIM [5] is more accurate and performance-centric because of its digital signal domain operations and adaptability to advanced technological nodes.
6T-SRAM cell performance has improved with the evolution of the transistor. Over the last two decades, the 6T-SRAM cell has evolved from the planar transistor to the 3D FinFET structure to improve design density, power, and performance. Figure 1 shows the SRAM density trends and projections. In modern technological nodes, the 6T-SRAM cell suffers from many challenges, such as low-voltage operation reliability, leakage current, soft errors, security-aware design requirements, and half-select problems [6][7][8][9]. The severity of these issues has increased in FinFET 6T-SRAM cells: in modern technological nodes, low supply voltage saves power but poses a threat to cell stability; leakage current has become comparable to the on-current, so leakage power is now a significant portion of total power consumption; and decreased geometrical dimensions make it easier to alter an internal node value, which increases soft error probability. To discourage side-channel attacks that steal data from the cell, the FinFET 6T-SRAM needs design alterations at the cell level. Furthermore, in higher-density SRAM designs, cells selected by only a row or only a column, known as half-selected cells, can flip their values. Therefore, the FinFET 6T-SRAM cell design needs a comprehensive analysis to ensure reliable operation in CIM solutions for AI applications. The rest of the article is organized as follows. Section 2 gives background knowledge about DNN structures and the 6T-SRAM cell's basic operation. Section 3 presents silicon-verified CIM digital accelerators (NMC) and custom-designed CIM SRAMs (IMC) at the system-on-chip (SoC) level.
Next, Section 4 provides a comprehensive analysis of challenges to the modern FinFET 6T-SRAM cell as a direct impact of transistor scaling. Finally, Section 5 concludes this paper.

Background
This section provides background knowledge about NN structures and the 6T-SRAM cell as the foundation for the FinFET 6T-SRAM CIM comprehension.

Deep Neural Networks (DNNs)
In Figure 2, we see a three-layer neural network. The first layer, known as the input layer, uses input values. The second layer or hidden layer is composed of hidden neurons (z) whose activation is dependent on the synapses' values and a specific function, known as the activation function. The third layer is referred to as the output layer. Each layer connects itself to the next layer's neurons via connections or synapses. These connections carry weights; an NN trainer tunes weight values during the training phase.
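The layer computation described above, in which each neuron accumulates weighted inputs from the previous layer and then applies an activation function, can be sketched as a minimal forward pass. The weights, layer sizes, and sigmoid choice below are illustrative assumptions, not values from any network discussed in this article:

```python
import math

def sigmoid(x):
    # One common activation choice; rectified-linear or tanh would also fit.
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, activation):
    # Each neuron multiplies every input by its synapse weight (MAC)
    # and applies the activation function to the accumulated sum.
    return [activation(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

# Toy three-layer network: 3 inputs -> 2 hidden neurons -> 1 output.
x = [0.5, -1.0, 0.25]
w_hidden = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]   # illustrative weights
w_out = [[0.7, -0.6]]

z = layer(x, w_hidden, sigmoid)    # hidden-layer activations
y = layer(z, w_out, sigmoid)       # network output
```

The same structure repeats for any number of hidden layers; only the weight matrices change.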
Moreover, the number of hidden layers and the number of neurons in each layer are not fixed. Variation in hidden layers and neuron number is a tradeoff between NN complexity and accuracy. Therefore, these two numbers must be chosen to achieve acceptable accuracy while keeping the complexity level low. Each weight multiplies the output value from the previous layer's neuron. Afterward, each neuron accumulates the multiplication results from all synapses and applies an activation function to the multiply-accumulated value. A neuron usually uses sigmoid, rectified-linear, tangent, or binary functions for activation purposes.

6T-SRAM Cell
Figure 3 shows a conventional 6T-SRAM cell; it consists of six transistors: pull-up (PU), pull-down (PD), and access (AC). The 6T-SRAM cell operates in one of three modes: hold, read, and write. In the hold mode, both access transistors are off; hence, the cell retains the internal nodes' values. In the read operation, activation of the access transistors through the wordline (WL) connects the internal nodes (Q and Qb) to the precharged bitlines (BLs). Then, the SRAM cell puts the stored values onto the BLs. The write operation is the inverse of the read operation, i.e., a write driver through the BLs puts values onto the internal nodes. The 6T-SRAM cell transistors' strengths differ to ensure a successful operation. The PD transistors should be the strongest, while the AC and PU transistors come in second and third place, respectively, in the strength hierarchy. Adjustment of the pull ratio (the PU-to-AC transistor strength) and cell ratio (the PD-to-AC transistor strength) tunes the read and write performance of an SRAM cell.
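In a FinFET process, transistor strength is set mainly by the number of fins, so the strength hierarchy and the two ratios above reduce to simple checks. The fin counts and the per-fin PMOS weighting below are illustrative assumptions, not values from any specific foundry cell:

```python
# Illustrative fin counts for the PD, AC, and PU transistors of a 6T cell.
fins = {"PD": 2, "AC": 1, "PU": 1}

# With equal fin counts a PU (PMOS) fin drives less current than an AC
# (NMOS) fin; the 0.6 weighting is a rough assumption for illustration.
drive = {"PD": fins["PD"] * 1.0,
         "AC": fins["AC"] * 1.0,
         "PU": fins["PU"] * 0.6}

cell_ratio = drive["PD"] / drive["AC"]   # governs read stability
pull_ratio = drive["PU"] / drive["AC"]   # governs write ability

# Required hierarchy: PD strongest, then AC, then PU.
assert drive["PD"] > drive["AC"] > drive["PU"]
```

Because fin counts are quantized, these ratios can only be tuned in coarse steps, which is one reason FinFET cell sizing is more constrained than planar sizing.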
Noise margins are fundamental indicators for measuring noise tolerance levels. Each operational mode, as described above, has a different noise tolerance. To measure the noise tolerance level during each operational mode, an external source injects noise into the internal nodes (Q and Qb). Beyond a certain noise level, the SRAM cell flips. That noise level is known as the noise margin for that particular mode of operation.
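The flip-based definition of a noise margin can be illustrated with a toy sweep: inject an increasing noise voltage at the low internal node until the cross-coupled inverters flip. The 0.8 V supply and the idealized inverter switching threshold at VDD/2 are assumptions for illustration only, not extracted margins:

```python
# Toy flip test for the noise-margin definition above.
VDD = 0.8            # illustrative supply voltage in volts
V_SWITCH = VDD / 2   # idealized inverter switching threshold

def cell_flips(v_low_node, v_noise):
    # The cell flips once injected noise lifts the low internal node
    # past the opposing inverter's switching threshold.
    return (v_low_node + v_noise) > V_SWITCH

step = 0.001
noise = 0.0
while not cell_flips(0.0, noise):
    noise += step
# 'noise' now approximates the hold noise margin of this toy model (~VDD/2).
```

A real characterization uses the butterfly-curve (largest embedded square) method on simulated voltage transfer curves rather than this idealized threshold.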
An SRAM array periphery includes a row decoder and a column decoder to access a particular SRAM location, sense amplifiers to read the memory, write drivers to write data into the memory, and a controller to synchronize control signals to perform overall operations.

Compute-in-Memory (CIM)
An increase in data traffic between the microprocessor and memory due to DNN inference degrades performance and increases power consumption. To reduce the data traffic, a newer version of memory with some computational capability, known as CIM, has been introduced. Hence, CIM reduces traffic and consequently optimizes performance and power. Different memories are used for CIM implementation. Table 1 lists the fundamental performance parameters of multiple memory cells, so a designer can make an informed decision about using a specific memory cell type for a CIM implementation. The FinFET 6T-SRAM uses restoring logic but needs six transistors per cell (Figure 3). Due to the restoring logic, the retention time of the value stored inside the FinFET 6T-SRAM cell is limited to the duration of supply voltage availability. Figure 4 shows the transistor-level implementation of each memory cell type reported in Table 1, except the FinFET 6T-SRAM cell. Next, dynamic random-access memory (DRAM) suffers from the challenge of leakage current and thus needs a refresh current to keep the charge level maintained. The retention time of a DRAM is in the range of a few milliseconds, as it uses a tiny capacitor to store the value. The small storage capacitance keeps the retention duration minimal. Formation of the resistive path in resistive RAM (ReRAM) during the write operation pushes the write voltage requirement high, and reliability becomes a challenge in the long run. NAND/NOR flash memories utilize a floating-gate transistor for value storage. The two flash memories, NAND and NOR, differ in erase, program, read, reliability, and power consumption. NOR flash accessibility enables its utilization in code execution, while NAND flash has higher density and lower cost compared with its NOR flash counterpart. Phase-change memory (PCM) exploits amorphous and crystalline transitions of a phase-change material to store data. However, the high write current and temperature limit PCM endurance.
Replacement of the dielectric material in DRAM with a ferroelectric material leads to a ferroelectric RAM (FRAM) cell. However, like the DRAM cell, the FRAM cell also needs rewriting after each read operation. Magneto-resistive RAM (MRAM) employs magnetic layers as a potential candidate for value storage. The magnetic storage concept makes MRAM performance comparable to DRAM. FRAM reliability, especially under a radiation emission environment, is still a challenge. FeFET uses a ferroelectric layer at the gate to control charge flow between the source and drain. Similar to other nonvolatile memories, it also suffers from the issue of high write voltage.
Besides the memories mentioned in Table 1, some emerging memories are [10]: novel magnetic memories, i.e., spin-transfer torque (STT) and spin-orbit torque (SOT), oxide-based resistive RAM (OxRAM), conductive-bridging RAM (CBRAM), macromolecular memory, massive storage devices, and Mott memory. These emerging devices are still under research and in their infancy stages; therefore, very limited performance data and cell structure details are available.
The 6T-SRAM cell relies only on transistors (Figure 3). No exploitation of a specific material or property, i.e., resistance or magnetization, is needed. Therefore, SRAM cell design keeps pace with technological node scaling. As of today, 3 nm FinFET is the latest technological node, and the FinFET 6T-SRAM is also fabricated at that node [11]. Whereas material-specific transistor structures hinder the rest of the memory cells' maturity towards modern technological nodes, the restoring logic of the FinFET 6T-SRAM shows better performance (Table 1), and scalability to advanced technological nodes enables it to have a lower nominal supply voltage. Dynamic power consumption depends on the square of the supply voltage. Hence, dynamic power consumption is lower for the FinFET 6T-SRAM cell. This makes the FinFET 6T-SRAM cell an appealing choice for CIM AI applications.
Figure 5a shows the introduction of additional computational hardware without any other alteration to the memory architecture. In this architecture, the SRAM read operation provides data to the computational unit for NN inference. CIM processors/accelerators adopt this type of hardware using the ASIC implementation approach. The computational unit is composed of standard cell-based arithmetic units and control circuitry for data flow. This requires less time and focuses on data flow/compression efficiency. Therefore, the Figure 5a architecture is used in the digital signal domain NMC.
Figure 5b represents modifications in the SRAM cells and extra computation circuits, with highlighted blocks indicating the modified cell structures. In this approach, the application of an input signal on the BLs interacts with the SRAM cells column-wise; this architecture is used for digital signal domain IMC. Figure 5c is widely used for customized mixed-signal domain IMC. A mixed-signal domain activates multiple rows simultaneously, consequently increasing energy efficiency. Row and column circuitry include signal domain conversions, i.e., digital to analog and vice versa. However, signal domain conversions pose a bottleneck to the performance of mixed-signal domain SRAMs. Recently, the all-digital signal domain CIM-SRAM has emerged as a solution to the challenges faced by the mixed-signal domain, such as the elimination of signal conversions, signal precision, and level quantization. Here, the computational circuitry consists of arithmetic units for accumulation and activation for DNN execution.
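Functionally, the Figure 5a (NMC) and Figure 5b (IMC) styles differ in where the multiply happens: NMC reads whole words out of the array and multiplies them in nearby logic, while IMC combines an input bit with each stored weight bit at the column before accumulation. The toy model below, with illustrative weights and a 1-bit IMC column, sketches that distinction rather than any specific design:

```python
# Illustrative 4-bit weights stored in an SRAM "array" (one word per row).
weights = [3, -1, 2, 4]
inputs  = [1,  2, 0, 1]

# NMC (Figure 5a style): read each word, then MAC in logic near the array.
nmc_mac = sum(w * x for w, x in zip(weights, inputs))

# IMC (Figure 5b style): a 1-bit input applied on the BLs gates each
# stored bit column-wise; peripheral adders accumulate the column results.
def imc_binary_mac(weight_bits_per_row, input_bits):
    # weight_bits_per_row[i] is the 1-bit weight held in row i.
    return sum(w & x for w, x in zip(weight_bits_per_row, input_bits))
```

Multi-bit IMC precision is then built up bit-serially, weighting successive column results by powers of two.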


SRAM CIM Signal Domains
Considering the SRAM-based CIM, memory computations (IMC or NMC) can either be in the mixed analog/digital or all-digital signal domain.

Mixed Analog/Digital SRAM Based CIM
The mixed-signal domain is energy efficient and provides simultaneous activation of multiple SRAM rows. We can further divide it into three distinct classes, as shown in Figure 6. In the first case, the WL carries the input signal in the form of amplitude modulation, pulse-width modulation, or pulse frequency [55]. Variable capacitance, current starving, and taping ball tune the delay to perform compute-in-memory using time multiplexing [58].
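The WL-modulated class can be sketched functionally: each row's input value sets how long its WL stays asserted, the stored weight sets the cell current, and the shared bitline integrates charge proportional to input x weight across simultaneously active rows. The cell current, time unit, and binary weights below are illustrative assumptions, not parameters of a reported design:

```python
# Toy model of pulse-width-modulated WL inputs on a shared bitline.
def bl_charge(inputs, weights, i_cell=1.0, t_unit=1.0):
    # Charge contributed by row i = I_cell * weight_i * (input_i * t_unit),
    # summed over all rows whose WLs are asserted at the same time.
    return sum(i_cell * w * (x * t_unit) for x, w in zip(inputs, weights))

# Two rows active simultaneously -- the row parallelism that makes the
# mixed-signal approach energy efficient.
q = bl_charge(inputs=[3, 2], weights=[1, 1])
```

An ADC at the column then digitizes the accumulated charge, which is exactly the conversion step the all-digital approach eliminates.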

Digital Signal Domain SRAM-Based CIM
As technology scales down, the supply voltage is reduced, which has a major impact on dynamic power consumption. As per the international roadmap for devices and systems (IRDS), the supply voltage would scale down to 600 mV for the 0.7 nm node by 2034. At such a low operating voltage, the SRAM-based mixed-signal CIM memory would be challenging for a high-precision NN. In addition, the reduction in voltage headroom (VDD-Vth) will exacerbate the problem of FinFET 6T-SRAM's reliable operation. Thus, a digital signal domain SRAM is the most suitable candidate for CIM applications.
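Since dynamic power scales with the square of the supply voltage (P = aCV²f), the effect of the projected voltage scaling is easy to quantify. The 0.8 V nominal starting point below is an illustrative assumption, not an IRDS figure:

```python
# Dynamic power ratio when only the supply voltage changes
# (activity factor, capacitance, and frequency held fixed).
def dynamic_power_ratio(v_new, v_old):
    return (v_new / v_old) ** 2

# Scaling from an assumed 0.8 V nominal down to the 600 mV IRDS projection:
ratio = dynamic_power_ratio(0.6, 0.8)   # 0.5625, i.e., ~44% power saving
```

The same quadratic dependence is why the lower nominal supply of advanced-node FinFET 6T-SRAM translates directly into lower dynamic power.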
SRAM being the ultimate CIM choice, we overviewed the recent silicon-verified implementations of convolution neural networks (CNNs). Utilization of multiple databases, i.e., MNIST, AlexNet, VGG, CIFAR, GoogLeNet, Stanford bg, and GTSRB, depicts NN inference with the help of numerous structures. Input and weight precisions of 8b and 1b-16b are taken into consideration, as they achieve the required NN inference and functions. We mention performance evaluation in Tables 2-4.

Table 2 reports five CIM accelerators: BRein, Eyeriss, the deep neural processing unit (DNPU), Envision, and Quest. Performance and energy efficiency are critical parameters for evaluating a particular design. These are measured in tera/giga operations per second (T/GOPS) and tera/giga operations per second per watt (T/GOPS/W), respectively. These accelerators have limited weight precision compared with the ones reported in Table 3.

BRein [59] uses ternary (0, 1, −1) and binary (0, 1) precision for the weights and input signals, respectively. The layer-wise input/output parallel computation concept incorporates three NN layers into a single hardware unit. On-chip XNOR, accumulation, and sign functions realize the MAC and activation operations. Overall, the hardware has 6 processing units in cascade to host 13-layer NNs, without any external data access. Eyeriss [60] has an on-chip memory composed of SRAM cells and an off-chip DRAM memory. These memories hold weights, image features, and their partial sums. The processing element array controls the data flow by adapting to multiple CNN sizes. In addition, a data-gating technique detects zero values in the image data to skip some computations and enhance power efficiency. The deep neural processing unit (DNPU) [61] contains four convolution clusters and one aggregation core inside a convolution processor. Inside each convolution cluster, weight and image memories provide data to the processing elements to perform computations.
The aggregation core accumulates partial sums obtained from the processing elements and also executes activation functions and other operations. Envision [62] exploits multi-voltage and body-biasing techniques together with dynamic voltage and frequency scaling. Multiple SRAM data banks provide simultaneous read and write operations at the same instant. These weight and image storage banks provide data to the MAC array for precision-controlled CNN computations. Quest [63] uses 3D-stacked SRAMs, and each stack is accessible in parallel. The high bandwidth of the data transfer bus and independent memory stack access increase the data retrieval capacity. The overall chip contains 24 cores, where 1 core has a sequencer unit to control operation and address generation, a memory controller, a serializer/de-serializer, and a PE array. PE arrays store images and filter weights in SRAM buffers and carry out MAC operations. The activation operation follows the MAC operation. Log-quantization reduces the number of MAC operations, consequently saving energy.

Table 3 summarizes features of five silicon-verified SRAM-based CIM implementations with a weight scalability of 1-16 bits, i.e., high weight precision. A unified neural processing unit (UNPU) [64] has the capability to accelerate CNNs and recurrent NNs with a weight scalability of 1-16 bits. Precision tradeoffs occur between accuracy and energy optimization. The hardware consists of four DNN cores, an aggregation core, and a controller. Each DNN core holds image features and weights in memory banks and passes the weights to a look-up-table processing element for the MAC operation. All DNN cores transfer intermediate results to the aggregation core for postprocessing and finalization. The core component of the recognition processor [65] is the neuron processing engine (NPE), which computes dual-range MAC operations, the activation function, and other essential calculations.
Furthermore, data compression and external memory access reduction via on-chip memory enhance energy optimization. The Origami accelerator [66] exploits scalability for power efficiency. Image and filter banks store features and CNN weights, respectively. Parallel sum-of-products (SoP) units get data from the SRAM-based storage banks. An SoP unit is composed of multiplier and adder units to carry out MAC operations; it then pushes the processed data onto channel sum units to complete the accumulation. The reconfigurable and hybrid NN processor [67] supports runtime reconfiguration and partitioning to adapt itself to each convolution layer's demands. Two 16 × 16 processing element (PE) arrays, memory blocks for images and weights, and a controller for data flow are the core components of the processor. Each PE performs all key CNN computations locally and stores the result back in the memory. The precision-scalable processor [68] has 256 parallel processing MAC units capable of precision and voltage scaling for low-power operation. A MAC unit passes partial sum results to the on-chip computation unit, which applies further mathematical functions to complete the NN inference. Beforehand, a data compression unit exploits convolution sparsity to reduce the number of data bits.
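The throughput figures reported for these designs can be normalized to operations per clock cycle, which exposes the underlying parallelism. The numbers below are the maxima quoted in this article's abstract; the 1 MAC = 2 ops convention is an assumption about how operations were counted:

```python
# Relate reported throughput to parallelism: ops per cycle = OPS / f.
def ops_per_cycle(ops_per_second, clock_hz):
    return ops_per_second / clock_hz

# Maxima quoted in this article's overview:
acc = ops_per_cycle(7.49e12, 330e6)   # accelerator: ~22,697 ops/cycle
imc = ops_per_cycle(5.6e9, 100e6)     # custom SRAM CIM: 56 ops/cycle

# Under the common 1 MAC = 2 ops convention (an assumption here), the
# accelerator sustains roughly 11,348 MACs per cycle.
macs_per_cycle = acc / 2
```

The three-orders-of-magnitude gap reflects the far larger compute arrays in the ASIC accelerators compared with the custom in-memory designs.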

Custom SRAM-Based CIM (IMC)
The aforementioned (Tables 2 and 3) CIMs are ASIC-based accelerators, with the 6T-SRAM memory only being used as storage for weights or input features. However, Table 4 provides custom CIM solutions, where a modified SRAM cell performs some computations right inside the SRAM cell or on the BLs. Computation circuitry at the periphery carries out the remaining arithmetic operations. The Colonnade [69] architecture proposes a CIM SRAM cell equipped with a conventional 6T-SRAM cell, a customized XNOR gate, two 2:1 MUXs, and a full adder. This cell, in its full-adder-activated mode, scales input precision up to 16 bits. Input application on the BLs calculates the MAC value column-wise, and each result is then passed on to the other columns. At the periphery, a post-accumulator unit takes the partial sum results and produces the complete MAC calculation. Compute-SRAM [70] employs 8T transposable SRAM cells with separate compute WLs and BLs. The conventional read/write operation and associated peripheral circuits remain intact. Moreover, this CIM-SRAM can perform basic Boolean functions and floating-point arithmetic. The all-digital CIM-SRAM [71] associates a two-input NOR gate with each SRAM cell. The NOR gate takes one input from the internal node of a 6T-SRAM cell (weight bit), whereas the input driver provides the second input. This way, the NOR gate multiplies two bits. An adder tree, based on alternating 24T and 28T full adders, takes the bit multiplication results and completes the MAC operation. The multifunctional CIM SRAM [72] employs 7T-SRAM cells, as this cell isolates the internal storage nodes. Alongside this, each SRAM cell carries six additional transistors. Every SRAM column has a dedicated ripple-carry adder and multiplier unit for the multiplication and addition operations. A write-back mechanism stores results back into the SRAM. In [73], a synaptic SRAM array stores images and weights.
Adders and registers aid in the row-by-row summation operation. Reduction in weight precision up to ternary level along supply voltage reduction provides low-power operation.
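As a concrete illustration of NOR-gate multiplication with adder-tree accumulation in the style of [71], the sketch below realizes a 1-bit multiply as the NOR of complemented operands (by De Morgan, NOR(~w, ~x) = w AND x) and reduces the products with a pairwise adder tree. The exact operand wiring (which node supplies the complement) is an assumption, not a detail from [71].

```python
# 1-bit multiply via NOR of complemented operands, plus an adder tree.
# Wiring of the complements is assumed for illustration.

def nor(a, b):
    return 1 - (a | b)

def bit_multiply(w, x):
    # NOR(~w, ~x) == w AND x, i.e., a 1-bit multiplication.
    return nor(1 - w, 1 - x)

def adder_tree(bits):
    """Sum bit products by pairwise reduction (models the full-adder tree)."""
    while len(bits) > 1:
        bits = [bits[i] + bits[i + 1] for i in range(0, len(bits) - 1, 2)] + \
               (bits[-1:] if len(bits) % 2 else [])
    return bits[0]

weights = [1, 0, 1, 1]
inputs  = [1, 1, 0, 1]
products = [bit_multiply(w, x) for w, x in zip(weights, inputs)]
mac = adder_tree(products)  # 1 + 0 + 0 + 1 = 2
```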

FinFET 6T-SRAM Cell
This section describes the 6T-SRAM cell's evolution, which parallels the transistor's structural evolution, and some key performance parameters. Afterward, we provide a comprehensive analysis of FinFET 6T-SRAM cell reliability issues.

FinFET 6T-SRAM Cell Design
Conventional 6T-SRAM cells (Figure 4) are still an appealing choice for cache memory mass production due to the minimum number of transistors, dual ports for read and write operations, and lower leakage current as compared with 7T and 8T SRAM cells. Alternate cells have an edge in noise margins, internal node isolation, and half-cell selection issues, but at the expense of increased peripheral circuits, operational complexity, extra control signals, and cell area [74]. However, the FinFET 6T-SRAM differs in transistor structure from the planar CMOS and therefore poses specific challenges to SRAM cell performance.
Figure 7 overviews the evolution of the SRAM cell from a transistor structural perspective. In 1971, Intel introduced the 4004 microprocessor based on 12 µm channel length transistors [75]. Later, in the early 1980s, polycide gates and silicides resolved the issue of increasing gate and source/drain resistance [76]. In the 1990s, shallow trench isolation (STI) improved the electrical isolation of a device [77]. Then, the high-K metal gate (HKMG) and dielectric reduced gate oxide thickness and leakage current [78]. Silicon-on-insulator (SOI) was a major structural modification to mitigate the body effect and junction capacitances [79]. A better subthreshold slope, owing to better control over the conduction channel, led to the modern FinFET design [80]. Gate-all-around (GAA) or horizontal nanosheet (HNS) transistors seem poised to replace the FinFET due to their all-around gate control, but they are still being researched. Transistors have scaled down from 12 µm to 3 nm over the last five decades [81]. The 6T-SRAM cell has evolved in parallel by adopting these structural modifications in the transistor.
Table 5 reports performance parameters [82][83][84]. We investigated the FinFET 6T-SRAM cell performance and associated challenges, taking these parameters as the benchmark. Moreover, we used a 256 × 128 FinFET 6T-SRAM array configuration in a 12 nm FinFET process node in Cadence Virtuoso for simulation and performance evaluation purposes. In Section 4.2, figures and tables highlight the performance evaluation as per the criteria reported in Table 1.

W_eff = n_fin (2h_fin + t_fin) (1)
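The fin-width quantization implied by the relation above (effective width W_eff = n_fin(2h_fin + t_fin), with n_fin fins of height h_fin and thickness t_fin) can be sketched numerically. The fin dimensions below are illustrative, not taken from a specific PDK.

```python
# Numeric sketch of FinFET width quantization: device width can only
# grow in discrete steps of one fin. Dimensions are illustrative.

def effective_width(n_fin, h_fin, t_fin):
    """Effective channel width (nm) of a FinFET with n_fin fins."""
    return n_fin * (2 * h_fin + t_fin)

# With h_fin = 42 nm and t_fin = 8 nm, width grows in 92 nm steps:
widths = [effective_width(n, 42, 8) for n in (1, 2, 3)]  # [92, 184, 276]
```

This step-wise sizing is why the cell ratios discussed below are restricted to small integer values such as 1:1:1 or 1:2:2.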
Hence, FinFET 6T-SRAM cells are no longer general purpose; they are either area-, performance-, or power-centric. In the literature [85], three different FinFET 6T-SRAM cells exist: high-density (HD), high-performance (HP), and high-current (HC). The transistor sizing ratio for each of them is PU:AC:PD of 1:1:1, 1:1:2, and 1:2:2 for HD, HP, and HC, respectively. Here, PU, AC, and PD represent pull-up, access, and pull-down transistors, respectively. The HD cell uses minimum-size transistors, the HP cell offers better readability with an improved cell ratio, and the HC cell has increased writability due to increased pull ratio.
Furthermore, transistor sizing quantization affects other performance parameters, as listed in Table 6. Here, the HD cell is efficient for static and dynamic power consumption owing to the lower transistor sizes, which limit current flow. The PU and AC sizes in the HP cell make it suitable for the read operation, yet its leakage power becomes double that of the HD cell. The PU and AC strengths in the HC cell keep the write time the lowest but degrade the read performance of the HC cell.
Figure 9 shows noise margins for the FinFET 6T-SRAM cells. The hold static noise margin (HSNM) is the same in the HP and HC configurations because of the same back-to-back inverter strengths, i.e., the PU and PD transistors. The inverter pair in the HD cell has equal PU and PD transistor sizes that keep each inverter's switching voltage near the midpoint of the supply voltage (VDD/2). Thus, its HSNM is higher compared with the HP and HC configurations. In the read static noise margin (RSNM), the HP cell outperforms due to the stronger PD device compared with the AC device, while the HC cell RSNM suffers from strong AC transistors, as the BLs try to impose an external value but the PD transistors oppose it. Whereas the HD cell shows a lower RSNM than the HC cell due to the equal strength of the AC and PD transistors, the HC cell is superior in the write static noise margin (WSNM) due to its better pull ratio. However, the cell ratio in the HP and HD cells decreases their WSNMs.

Supply Voltage
Noise margins and read/write performance are directly related to the supply voltage. Hence, a reduced supply voltage poses a serious threat to SRAM cell stability. The FinFET 6T-SRAM cell operates at a reduced supply voltage and is therefore more prone to failures. Figure 10 shows the decreased headroom available for the noise margins. The threshold voltage for the same technology nodes lies between 211 and 237 mV [81]; thus, the noise margin will decrease further for future technologies. However, the supply voltage reduction improves the power consumption drastically, since the dynamic power depends on the square of the supply voltage.
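As a quick illustration of this quadratic dependence, the sketch below compares dynamic power at two supply voltages using P_dyn = α·C·V_DD²·f. The activity factor, capacitance, and frequency values are placeholders, not measured parameters of the simulated array.

```python
# Dynamic switching power scales with the square of the supply voltage.
# alpha, cap, and freq are illustrative placeholders.

def dynamic_power(vdd, alpha=0.1, cap=1e-15, freq=1e9):
    """Dynamic switching power in watts: P = alpha * C * VDD^2 * f."""
    return alpha * cap * vdd**2 * freq

p_800 = dynamic_power(0.8)   # at 800 mV
p_600 = dynamic_power(0.6)   # at 600 mV
saving = 1 - p_600 / p_800   # 1 - (0.6/0.8)^2 = 43.75% reduction
```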
Table 7 shows the performance variations with respect to the supply voltage, for which we considered the HD FinFET 6T-SRAM cell. The RSNM is 26 mV (close to the thermal voltage) at a 600 mV supply voltage. Supply voltage reduction beyond this point makes cell read operations unstable; hence, 600 mV is the minimum operational voltage. As the supply voltage varies between 800 and 600 mV, the leakage power improves by almost 40% at the cost of increased read (95%) and write (38%) access times. The noise margin determines the minimum operational voltage, whereas the power and performance tradeoffs are based on the supply voltage variations.

Threshold Voltage
Threshold voltage (Vth) variation has increased in the FinFET 6T-SRAM cell. Variation in Vth is due to the pronounced short-channel effects (SCE) and drain-induced barrier lowering (DIBL).
Equation (2) shows the SCE and DIBL relationship with Vth, where Vth∞ is the nominal value. Applied voltages and electric fields contribute to the SCE and DIBL [86], thus affecting the nominal Vth value.
Equation (3) [87] explains Vth variations through the FinFET geometrical parameters, where A is a material-dependent parameter, while tox and εox represent the oxide thickness and permittivity, respectively. W and L denote the FinFET width and gate length, respectively. As a consequence of the FinFET's shrinking dimensions, variation in Vth is no longer negligible.
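The trend in Equation (3) can be illustrated numerically. This is a hedged sketch assuming a Pelgrom-style form σ_Vth ∝ A·tox/(εox·√(W·L)); the constants are unity placeholders, not fitted to any process.

```python
# Sketch of Vth variability scaling: sigma grows as gate area W*L shrinks.
# All parameter values are placeholders for the trend only.
import math

def sigma_vth(a, t_ox, eps_ox, w, l):
    """Vth standard deviation, assumed form A*t_ox/(eps_ox*sqrt(W*L))."""
    return a * t_ox / (eps_ox * math.sqrt(w * l))

# Halving both W and L doubles the Vth standard deviation:
ratio = sigma_vth(1.0, 1.0, 1.0, 8, 8) / sigma_vth(1.0, 1.0, 1.0, 16, 16)
# ratio == 2.0
```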
Moreover, any single fin in the FinFET structure (Figure 8) shows different Vth values at the center and corner. The difference in the gate work function at these locations accounts for this phenomenon [87].
In order to analyze Vth effects, we implemented a multithreshold-voltage-based (multi-Vth) HD FinFET 6T-SRAM cell. As the FinFET 6T-SRAM cell has six transistors, 27 multithreshold SRAM cell combinations are possible by using low-, standard-, and high-threshold-voltage FinFET models. As each FinFET model name suggests, the Vth is either low, standard, or high in comparison with the others. Figure 11 shows noise margin measurements under threshold voltage variations. The three letters (on the x-axis) for each combination are in order of the pull-up, access, and pull-down transistor threshold voltage strengths, respectively. The H, R, and L in Figure 11 denote high-Vth, regular-Vth, and low-Vth, respectively. For example, the RLH combination represents pull-up transistors with regular Vth (R), access transistors with low Vth (L), and pull-down transistors with high Vth (H). We simulated a 32 KB FinFET 6T-SRAM cell array designed with 12 nm FinFET process technology. The noise margin measurements are taken using the butterfly curve method [6].
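The 27 combinations can be enumerated directly. The sketch below reproduces the labeling convention just described (letters in PU, AC, PD order, each drawn from L/R/H).

```python
# Enumerate the 27 multi-Vth cell variants: each of the PU, AC, and PD
# transistor pairs takes one of three flavors (L=low, R=regular, H=high Vth).
from itertools import product

combos = ["".join(c) for c in product("LRH", repeat=3)]  # PU, AC, PD order
assert len(combos) == 27
# e.g., "RLH": regular-Vth PU, low-Vth AC, high-Vth PD
```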
Out of the 27 multi-Vth designs, the RSNM has the lowest noise margin value, while the HSNM has the highest one. Therefore, the RSNM is more sensitive to Vth variations. For the HLR and HLH combinations, the RSNM value is just about 100 mV, thus limiting voltage scaling for low-power applications. However, the HSNM value remains above 300 mV for all designs, and the WSNM varies from 160 mV to 320 mV. Although the FinFET 6T-SRAM cell has sufficient noise margin for hold, read, and write operations, this can be challenging in conjunction with other issues.
As the PU transistor should be the weakest of the three transistors in a FinFET 6T-SRAM cell, we used a PU transistor with the highest (H) Vth and varied the remaining two transistors' strengths, i.e., the AC and PD, to analyze cell performance. Figure 12 shows the leakage power for the above-mentioned transistor combinations in the FinFET 6T-SRAM cell. Overall, the leakage power consumption has an inverse relationship with the threshold voltage.
Figure 13 shows the read and write access times for the FinFET 6T-SRAM cells. The read time is lower for the cases where the transistor strength difference between the access and pull-down transistors is greater. On the other hand, the write performance is better for a stronger access transistor relative to the pull-up transistor strength. Similarly, the dynamic power consumption is greater for strong transistors, because they provide more current. For example, the dynamic power for HLL would be higher than that for HRR due to higher current driving capabilities.

Temperature and Process Variations
Due to the extreme geometrical dimensions and material properties of the FinFET, SRAM cell reliability has become a major concern. Previous subsections show the effects of the supply voltage, threshold voltage (Vth), and sizing on the FinFET 6T-SRAM cell performance. This section explores reliability under temperature and process variations. Figure 14 illustrates the effect of temperature on noise margins. The temperature range is from −40 °C to 125 °C. The HSNM (Figure 14a), RSNM (Figure 14b), and WSNM (Figure 14c) show variations of about 13%, 22%, and 12%, respectively, with reference to the noise measurements at room temperature, i.e., 27 °C. Here, the RSNM is the most important, as it is the lowest of the three noise margins; nonetheless, it remains above 115 mV. The variation in noise margins is due to temperature-dependent variations in the transistor current.
Figure 15 shows the power and performance evaluation. The leakage power remains at about the same level up to room temperature, i.e., 27 °C, as shown in Figure 15a. However, it increases exponentially beyond 60 °C due to a sudden increase in leakage current. The effect of temperature on the read and write delays is shown in Figure 15b.
The read delay decreases as the temperature increases. This is because an increase in temperature decreases the transistor current due to carrier mobility degradation. The reduced transistor current at high temperature charges the BL capacitances to a lower level, thus decreasing the discharge time. Consequently, this reduces the read delay and read power consumption at high temperatures. The write performance behaves oppositely: an increase in temperature decreases the transistor current, so the internal cell nodes take more time to change their stored logic level.
A 6T-SRAM cell's performance also varies with FinFET process deviations. Table 8 reports the performance parameters under NFET and PFET variations. The read performance is worse for a slower NFET because of performance degradation in the pull-down transistor. Similarly, the write performance improves for a fast PFET due to the increased access transistor strength. Improvement in the cell ratio and pull-up ratio improves the read and write performances, respectively, and vice versa. On account of the maximum transistor strength difference, the RSNM and WSNM show their worst tolerance at the FS and SF process corners, respectively.
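The sharp leakage increase with temperature noted above can be illustrated with a simple subthreshold-current sketch: the current scales roughly as exp(−Vth/(n·kT/q)), and the thermal voltage kT/q itself grows with temperature. The model and constants below are textbook approximations, not calibrated to the 12 nm process used here.

```python
# Hedged sketch of subthreshold leakage growth with temperature.
# vth and n are illustrative placeholders, not extracted device values.
import math

K_B_OVER_Q = 8.617e-5  # Boltzmann constant over charge, V/K

def rel_leakage(temp_c, vth=0.22, n=1.5):
    """Relative subthreshold current ~ exp(-Vth / (n * kT/q))."""
    vt = K_B_OVER_Q * (temp_c + 273.15)  # thermal voltage at this temperature
    return math.exp(-vth / (n * vt))

# Leakage at 125 degC relative to 27 degC grows several-fold:
ratio = rel_leakage(125) / rel_leakage(27)
```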

Conclusions
Modern AI requires deep neural networks (DNNs), whose big-data requirements are bottlenecked by SRAM performance. CIM addresses this challenge by reducing off-chip accesses. We reviewed silicon-verified all-digital-domain SRAM-based accelerators and custom SRAM-based CIM solutions, focusing on multiply and accumulate (MAC) operations across multiple performance benchmarks. ASIC-based CIM solutions utilize 8b input precision, whereas weight precision ranges from 4b to 16b. The maximum operational frequency reported is 500 MHz. The highest energy efficiency achieved is 156 TOPS/W, with an area efficiency of 6750 GOPS/mm².
We investigated the performance of the basic building block, i.e., the FinFET 6T-SRAM cell, taking noise margins and the read and write operations as performance benchmarks. Variations in FinFET sizing, threshold voltage, supply voltage, process, and environmental conditions prevent the FinFET 6T-SRAM cell from simultaneously optimizing area, power, and performance. Therefore, a cell is designed as either high-density (HD), high-performance (HP), or high-current (HC) to achieve design-specific targets. Our comprehensive investigation puts forth FinFET 6T-SRAM cell evaluations and limitations under various process, operational, and environmental conditions; improvement in one parameter degrades other performance parameters. The HC cell configuration shows a write access time of 9.17 ps (1.31 times more efficient than the HD configuration), whereas the read and write power consumptions are the lowest for the HD configuration. The HSNM remains above 300 mV for all cell configurations. Under supply voltage variations, the RSNM drops to 26 mV at 600 mV, which is a major concern. Under temperature variations, the write delay increases, but the read delay decreases by up to 25% as a result of decreased current driving capability.