1. Introduction
Different from data-intensive computing (typically all kinds of neural networks), high-precision computing is aimed at accurate numerical computations such as large-scale matrix multiplication and partial differential equations (PDEs). At present, neural networks (NNs) have many applications and are widely used in daily life. However, high-precision computing tasks cannot be solved by NNs, whether in scientific research or in practical scenarios. The requirements of throughput and energy efficiency for computing are constantly rising; therefore, computing-in-memory (CIM) has been proposed as a solution to the Von Neumann bottleneck. The progress of CIM for numerical computations has great value in finance, engineering, computer science and other disciplines, and such computations are ubiquitous in scientific research and engineering. For example, improving the physical authenticity of virtual reality (VR), analyzing SIS infectious-disease models with age structure, studying the Black-Scholes-Merton (BSM) equations of derivative-pricing theory, preprocessing and extracting image information and many other practical problems involve partial differential equations. In recent years, all kinds of PDEs solvers based on different CIM technologies, including ReRAM, SRAM, flash memory and PCM, have emerged in numerical-computing research. ReRAM-based CIM, as a relatively mature CIM technology, is still used for most high-precision CIM research. Therefore, this article first reviews ReRAM technology, the principle of the ReRAM crossbar and the working process of ReRAM in CIM. Then, it summarizes the numerical methods for PDEs, matrix iterative methods, rearrangement methods and split methods. After that, the working procedures and current developments of the various CIM-based partial differential equation solvers are discussed, and their performance and characteristics are compared.
To address the defects of current PDEs solvers, solutions for achieving high precision in large-scale matrix multiplication under environmental effects are proposed. In the future, CIM-based numerical computations will be improved through the manufacturing process, the write-verify method, algorithms for sparse matrices and software/hardware collaboration.
2. ReRAM
2.1. The Appearance of ReRAM
In the early 1960s, various research on ReRAM devices with all kinds of oxide materials, including Al₂O₃, NiO, SiO₂, Ta₂O₅, ZrO₂, TiO₂ and Nb₂O₅, emerged in an endless stream [1,2,3,4,5]. Compared with the metal-oxide-semiconductor field-effect transistor (MOSFET), which first appeared in 1960 [6,7], ReRAM devices are products of the same period. In the 40 years that followed, however, resistive-switching technology made no significant progress in storage applications.
2.2. The Development of ReRAM as NVM
With the explosive growth of portable electronic devices, the required storage capacity of memory devices has increased rapidly. Higher density, faster speed and lower cost have become the goals of new memory devices. ReRAM, as a kind of nonvolatile memory (NVM) [8], was regarded as one of the continuations of NAND flash memory [9], though there were many emerging nonvolatile memory (eNVM) devices over the same period, such as phase-change memory (PCM) [10], magnetic random-access memory (MRAM) [11] and ferroelectric random-access memory (FeRAM) [12].
Table 1 lists the types of NVMs and the category of ReRAM.
ReRAM is a two-terminal device with a variable resistance based on a physical mechanism of conducting-filament formation and rupture [13]. According to the type of filament, ReRAM can be divided into oxide ReRAM (OxRAM) and conductive-bridge ReRAM (CBRAM) [14]. ReRAM switches between a high-resistance state (HRS) and a low-resistance state (LRS) under different operating conditions, representing logic 0 and 1, respectively. The formation of the conducting filament corresponds to the LRS, and its rupture to the HRS.
Figure 1a shows the structure of the OxRAM, in which a metal-oxide layer lies between the two electrodes. When a positive voltage is applied between the top electrode (TE) and the bottom electrode (BE), a conductive filament forms between them. In Figure 1b, the electrode of the CBRAM is made of an active metal such as copper or silver (Cu or Ag), and the CBRAM forms its conductive bridge by the diffusion of Cu or Ag into the oxide or chalcogenide (such as GeS₂). When the applied voltage is sufficiently positive, Cu or Ag is oxidized at the TE, and the resulting ions migrate and are reduced and deposited at the BE. The deposits grow until a conductive filament connects the two electrodes, and the state of the CBRAM changes from HRS to LRS; a voltage of the opposite polarity dissolves the filament and restores the HRS [15].
Since 2000, research on ReRAM has increased explosively. The first NiOₓ-based ReRAM with promising device characteristics and reliability was proposed by I. Baek in 2005 [16]; the HfO₂/Ti device was made with fully conventional fab materials [17]; 3D vertical ReRAM emerged in 2009 [18]; a 10 × 10 nm² Hf/HfOₓ crossbar resistive RAM was produced in 2011 [19]; and the first 16-Gb ReRAM chip, integrated with a copper-oxide material, was demonstrated [20]. However, because of the 15 nm critical dimension (CD) and the development of 3D NAND flash memory, using ReRAM in high-density applications became more and more difficult.
2.3. ReRAM in CIM
In recent years, computing-in-memory has emerged widely in machine learning (ML) and data-intensive computing. CIM is an effective method to break the Von Neumann bottleneck when computing large-scale data [21]; it achieves high-speed, low-power computing by reducing data movement. Edge AI applications based on deep neural networks (DNNs) are designed to achieve portable, fast, accurate and convenient computing. Computing efficiency (defined as tera-operations per second per square millimeter, TOPS/mm²) and energy efficiency (defined as tera-operations per second per watt, TOPS/W) are the two most significant parameters for measuring computing performance. In a digital neural-network accelerator, multiplication and addition are calculated in the processing element (PE), but a global buffer or cache is needed to store the weights and the inputs/outputs, which increases data storage and movement. There is considerable research on optimizing data flow at the chip, micro and SOC (system-on-chip) levels, but computation and memory remain separated, which degrades efficiency. In CIM, the memory not only stores the weights and the inputs but also performs analog computation; that is what computing-in-memory means. Compared with traditional digital signal accelerators, CIM, as a mixed-signal processing scheme, tremendously increases throughput, area efficiency and energy efficiency, but at the cost of accuracy. Though analog-to-digital converters (ADCs) are unavoidable, CIM still has enormous appeal in power consumption, now and in the future.
Because of its resistive properties, ReRAM can act as a natural electrical multiplier, combining storage with computation that follows Ohm's law and Kirchhoff's current law (KCL). Utilizing $I = VG$ and the sum of the currents along each column, multiplication and accumulation are calculated by the ReRAM array in the analog domain, respectively. Clearly, eNVM, including resistive random-access memory, is an excellent memory device for CIM, and other eNVM devices such as PCM, MRAM and FeRAM are also of interest for CIM [22]. ReRAM is at present a better choice for computing-in-memory because of its high reliability at the 22 nm node and its compatibility with the complementary metal-oxide-semiconductor (CMOS) process. In addition, ReRAM could potentially offer multi-bit-per-cell capability [22].
2.4. ReRAM Crossbar
Figure 2a,b shows the principle of calculation in the ReRAM crossbar array. One terminal of each ReRAM cell is connected to the bit line (BL), which collects the current, and the other is connected to the word line (WL), which carries the input voltage. Additionally, the currents through the BLs are summed as the outputs of the ReRAM array to achieve the accumulation. One of the outputs could be given by

$I_j = \sum_i V_i \, G_{i,j}$

where $I_j$ is the current of the $j$-th column, $V_i$ is the input voltage of the $i$-th row, and $G_{i,j}$ is the conductance of the cell at the $i$-th row and $j$-th column of the ReRAM array. Due to the fact that the outputs are analog currents, the ADC is an indispensable part of the ReRAM crossbar peripheral circuit.
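As a minimal sketch of this multiply-accumulate principle (the voltages and conductances below are hypothetical values, not measured device data), the ideal crossbar behavior can be modeled in a few lines:

```python
# Sketch: ideal ReRAM crossbar computing I_j = sum_i V_i * G_ij.
# Ohm's law gives each cell current V_i * G_ij; KCL sums them per bit line.
def crossbar_mvm(voltages, conductances):
    rows = len(conductances)
    cols = len(conductances[0])
    return [sum(voltages[i] * conductances[i][j] for i in range(rows))
            for j in range(cols)]

V = [0.2, 0.4, 0.1]          # word-line input voltages (V), hypothetical
G = [[1e-6, 2e-6],           # cell conductances (S), one column per bit line
     [3e-6, 1e-6],
     [2e-6, 4e-6]]
I = crossbar_mvm(V, G)       # bit-line currents (A)
```

In a real array these currents would then be digitized by the peripheral ADCs.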
3. Partial Differential Equation
A partial differential equation is any equation involving a function of multiple variables and its partial derivatives [23]. For a function $u:\Omega \to \mathbb{R}$ on a domain $\Omega \subset \mathbb{R}^n$, a partial differential equation of order $k$ can be defined as:

$F\left(D^{k}u, D^{k-1}u, \dots, Du, u, x\right) = 0, \quad x \in \Omega$

Typical results of partial differential equations have two forms: the analytical solution and the numerical solution. The analytical solution, which can be expressed by an analytical expression, is an exact combination of finitely many common operations. Given any independent variable, its dependent variable can be solved, so the analytical solution is also known as the closed-form solution. The numerical solution needs to be calculated iteratively from the boundary condition step-by-step [24], and it is the emphasis of PDEs-solver research. As the step size decreases, the numerical solution becomes more accurate. There are multiple numerical methods, including the Euler method, Runge-Kutta, the finite-difference method [25], the finite-element method [26] and the finite-volume method [23].
3.1. Numerical Methods
3.1.1. Finite-Difference Method
The principle of the finite-difference method (FDM) is understandable: converting the continuous problem to its corresponding discrete form and getting results within a finite number of calculations. The core mechanism of FDM is to approximate the partial derivatives at each point using its nearby values based on Taylor’s theorem. There are three basic steps in the finite-difference method:
- (1)
Regional discretization. According to the appropriate step size, the domain that needs to be calculated is divided into finite grids, and the function values on discrete grid points are used to approximate the continuous function values.
- (2)
Transformation of partial differential equations. Using the difference coefficient to approximate the exact derivatives.
- (3)
Solution of partial differential equations. Bringing the boundary conditions into the equation and repeating calculations to solve a large number of equations.
The first-order finite difference of $u$ with respect to variable $x$ can be defined as:

$\Delta u = u(x+h) - u(x)$

where $h$ is the step size, or the so-called spacing between two grid points, and the first-order difference coefficient of $u$ with respect to $x$ can be defined as:

$\dfrac{\Delta u}{h} = \dfrac{u(x+h) - u(x)}{h}$

So that the forward difference coefficient, backward difference coefficient and central difference coefficient can be expressed as:

$\dfrac{u(x+h) - u(x)}{h}, \qquad \dfrac{u(x) - u(x-h)}{h}, \qquad \dfrac{u(x+h) - u(x-h)}{2h}$

The finite-difference method uses the difference coefficient to approximate the exact derivative. Similarly, the second-order difference coefficient can be expressed as:

$\dfrac{u(x+h) - 2u(x) + u(x-h)}{h^{2}}$
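These difference coefficients can be checked numerically; a short sketch on $f(x) = \sin x$, whose exact derivative is $\cos x$ (the test point and step size are arbitrary choices):

```python
import math

# Forward, backward and central difference coefficients, plus the
# second-order coefficient, applied to f(x) = sin(x).
def forward(f, x, h):  return (f(x + h) - f(x)) / h
def backward(f, x, h): return (f(x) - f(x - h)) / h
def central(f, x, h):  return (f(x + h) - f(x - h)) / (2 * h)
def second(f, x, h):   return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

x, h = 1.0, 1e-4
exact = math.cos(x)                               # d/dx sin(x)
err_fwd = abs(forward(math.sin, x, h) - exact)    # O(h) error
err_cen = abs(central(math.sin, x, h) - exact)    # O(h^2) error, much smaller
```

The central difference converges one order faster than the one-sided forms, which is why it is preferred where the stencil allows it.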
Taking a simple one-dimensional heat-diffusion equation without a heat source as an example, the equation and boundary conditions are as follows:

$\dfrac{\partial u}{\partial t} = a^{2}\dfrac{\partial^{2} u}{\partial x^{2}} + f, \qquad f = 0, \qquad x \in (a, b)$

where $u(x,t)$ is the temperature at grid point $x$ at time $t$, $a^{2}$ is the thermal diffusivity, $f$ is the heat source and $u(a,t)$ and $u(b,t)$ are the constant boundary conditions at grid points $a$ and $b$, respectively.
Then, the solution domain is divided into finite grids in Figure 3. The step size in the t-axis direction is taken as $\tau$, and the step size in the x-axis direction is taken as $h$. Therefore, the grid points are $x_i = a + ih$ and $t_j = j\tau$, and $u_i^{j}$ denotes the value at $(x_i, t_j)$. According to (4) and (6), the continuous function (7) will change to a function of the values on discrete grid points:

$\dfrac{u_i^{j+1} - u_i^{j}}{\tau} = a^{2}\,\dfrac{u_{i+1}^{j} - 2u_i^{j} + u_{i-1}^{j}}{h^{2}}$
Excluding the boundary values, when $i = 1, 2, \dots, n$, the actual temperature of each point is

$u_i^{j+1} = u_i^{j} + r\left(u_{i+1}^{j} - 2u_i^{j} + u_{i-1}^{j}\right), \qquad r = \dfrac{a^{2}\tau}{h^{2}}$

The above system of linear equations contains $n$ equations, and we can rewrite them into a matrix equation. (13) can also be expressed as:

$U^{j+1} = A\,U^{j}$

where $A$ is the coefficient matrix, $U^{j}$ is the matrix at time $t_j$ and $U^{j+1}$ is the matrix at time $t_{j+1}$ (the constant boundary values are absorbed into the first and last rows of the equation).
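The explicit update above can be sketched directly in code; the grid size, step sizes and boundary temperatures below are hypothetical:

```python
# Explicit finite-difference step for the 1D heat equation:
# u_i^{j+1} = u_i^j + r * (u_{i+1}^j - 2 u_i^j + u_{i-1}^j), r = a^2 * tau / h^2.
def heat_step(u, r):
    """One time step; u[0] and u[-1] hold the fixed boundary temperatures."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
    return new

n, h, tau, a2 = 11, 0.1, 0.001, 1.0
r = a2 * tau / h ** 2                  # explicit scheme needs r <= 0.5 for stability
u = [100.0] + [0.0] * (n - 1)          # hot left boundary, cold interior and right
for _ in range(200):
    u = heat_step(u, r)
```

As the iterations proceed, the profile relaxes toward the steady-state linear distribution between the two fixed boundary values.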
The one-dimensional equation is the most ideal and simplest physical scenario. However, researchers and engineers need to solve problems in two-dimensional or even multidimensional space for actual requirements, and the two-point difference method is no longer suitable. As an example, consider a two-dimensional equation, the Laplace equation:

$\dfrac{\partial^{2} u}{\partial x^{2}} + \dfrac{\partial^{2} u}{\partial y^{2}} = 0$

According to (6), the Laplace equation can be written as:

$\dfrac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h_x^{2}} + \dfrac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{h_y^{2}} = 0$

where the step size in the x-axis direction is taken as $h_x$, and the step size in the y-axis direction is taken as $h_y$. In Figure 4, the function value of each point can be calculated from the function values of its four neighboring points, and the whole domain is divided into finite grid points.

Taking $h_x = h_y$, the equation at grid point $(i, j)$ can be expressed as:

$u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j} = 0$

Thus, (15) can be transformed into a matrix form, in which the unknown grid values are stacked into one vector and the coefficient matrix is block-tridiagonal with the five-point-stencil weights. Consequently, (18) can also be expressed as:

$A\,U = b$

where $A$ is the coefficient matrix, $U$ is the matrix of the values at the grid points $(i, j)$ and $b$ is the matrix composed of the boundary conditions.
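With $h_x = h_y$, the five-point relation says each interior value is the average of its four neighbors, which a simple iterative sweep can exploit; the grid size and boundary values below are hypothetical:

```python
# Five-point stencil sweep for the Laplace equation on a rectangular grid:
# each interior value becomes the average of its four neighbours.
def laplace_sweep(u):
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (u[i + 1][j] + u[i - 1][j]
                                + u[i][j + 1] + u[i][j - 1])
    return new

# 5 x 5 grid: top edge held at 1.0, the other edges at 0.0.
u = [[1.0] * 5] + [[0.0] * 5 for _ in range(4)]
for _ in range(500):
    u = laplace_sweep(u)
```

Repeated sweeps converge to the discrete harmonic solution, with interior values interpolating smoothly between the boundary temperatures.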
3.1.2. Runge-Kutta Method
The Runge-Kutta method includes two kinds of methods: second-order Runge-Kutta and fourth-order Runge-Kutta.
Second-Order Runge-Kutta:
and
can be calculated by the following equations,
The solution of the PDE can be expressed as:
Fourth-Order Runge-Kutta:
,
,
and
can be calculated by the following equations,
The solution of the PDE can be expressed as:
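A minimal sketch of both update rules, assuming the second-order form is Heun's variant and testing on $y' = y$, whose exact solution is $e^x$:

```python
import math

# Second-order (Heun) and classical fourth-order Runge-Kutta steps for y' = f(x, y).
def rk2_step(f, x, y, h):
    k1 = f(x, y)
    k2 = f(x + h, y + h * k1)
    return y + h / 2 * (k1 + k2)

def rk4_step(f, x, y, h):
    k1 = f(x, y)
    k2 = f(x + h / 2, y + h / 2 * k1)
    k3 = f(x + h / 2, y + h / 2 * k2)
    k4 = f(x + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

f = lambda x, y: y                 # y' = y, y(0) = 1, so y(1) = e
h, steps = 0.1, 10
y2 = y4 = 1.0
for n in range(steps):
    y2 = rk2_step(f, n * h, y2, h)
    y4 = rk4_step(f, n * h, y4, h)
exact = math.e
```

At the same step size, the fourth-order variant is far more accurate, which is why its hardware use is limited mainly by programming precision rather than by the mathematics.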
3.2. Matrix Iterative Methods
After obtaining the matrix equations, the next main task is the iterative computation of a large-scale matrix; common iterative methods include the Jacobi method, the Gauss-Seidel method and the SOR method.
3.2.1. Jacobi Method
The principle of the Jacobi method is to decompose the coefficient matrix $A$ into a diagonal matrix $D$, a negative upper triangular matrix $U$ and a negative lower triangular matrix $L$. Consequently, $A$ can be written as:

$A = D - L - U$

For an equation of a matrix like $Ax = b$, replacing the coefficient matrix $A$, it will change to:

$Dx = (L + U)x + b$

After the iterative calculation, the calculation result of the $(k+1)$-th iteration is

$x^{(k+1)} = D^{-1}(L + U)\,x^{(k)} + D^{-1}b$
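A minimal Jacobi sketch on a small, diagonally dominant system (the matrix and right-hand side are hypothetical illustration values):

```python
# Jacobi iteration: x_i <- (b_i - sum_{j != i} a_ij * x_j) / a_ii,
# computed entirely from the previous iterate.
def jacobi(A, b, iters=50):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x = jacobi(A, b)        # converges toward the solution [1, 1, 1]
```

Note that every component of the new iterate depends only on the old iterate, which is exactly what makes the Jacobi update a single matrix-vector multiplication suitable for a crossbar.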
3.2.2. Gauss-Seidel Method
The principle of the Gauss-Seidel method is similar to that of the Jacobi method; the difference is the derivation process. The Gauss-Seidel method rewrites $Ax = b$ into:

$(D - L)x = Ux + b$

After the iterative calculation, the calculation result of the $(k+1)$-th iteration is

$x^{(k+1)} = (D - L)^{-1}U\,x^{(k)} + (D - L)^{-1}b$
In most cases, the Gauss-Seidel method converges faster than the Jacobi method, and only one set of storage units is needed to store $x^{(k+1)}$, since each new component overwrites the old one as soon as it is computed, whereas both $x^{(k)}$ and $x^{(k+1)}$ must be stored in the Jacobi method.
3.2.3. SOR Method
Based on the Gauss-Seidel method, a convergence (relaxation) factor $\omega$ is added to the SOR method in order to improve the convergence speed. The calculation result of the $(k+1)$-th iteration is

$x^{(k+1)} = (D - \omega L)^{-1}\left[(1 - \omega)D + \omega U\right]x^{(k)} + \omega(D - \omega L)^{-1}b$
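Gauss-Seidel and SOR share a single in-place sweep; a sketch with a hypothetical 3 × 3 system, where $\omega = 1$ reduces exactly to Gauss-Seidel:

```python
# SOR sweep: each component is relaxed toward its Gauss-Seidel value,
# blending old and new by the factor omega (omega = 1 gives Gauss-Seidel).
def sor(A, b, omega=1.1, iters=50):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x_gs = sor(A, b, omega=1.0)    # plain Gauss-Seidel
x_sor = sor(A, b, omega=1.1)   # slight over-relaxation
```

Because the sweep uses freshly updated components immediately, only one copy of $x$ is kept in memory, matching the storage advantage noted above.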
3.2.4. Krylov Subspace Method
For an equation of a matrix like $Ax = b$, the result can be expressed as $x = A^{-1}b$ directly. However, if the matrix $A$ has a large size or is a sparse matrix, $A^{-1}$ will be very hard to solve. The principle of the Krylov subspace method is to approximate $x$ within the subspace spanned by $\{b, Ab, A^{2}b, \dots, A^{m-1}b\}$:

$x \approx c_0 b + c_1 Ab + c_2 A^{2}b + \dots + c_{m-1}A^{m-1}b$

where $c_0, c_1, \dots, c_{m-1}$ are unknown coefficients, and the subspace dimension $m$ is related to the accuracy of the approximation and is less than the dimension of matrix $A$.
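The discussion above names no particular Krylov algorithm; as one widely used member of the family, the conjugate gradient method (for symmetric positive-definite systems) builds its $m$-th iterate inside exactly this subspace. A self-contained sketch with a hypothetical system:

```python
# Conjugate gradient: a Krylov-subspace method whose m-th iterate lies in
# span{b, Ab, ..., A^(m-1) b}, requiring only matrix-vector products.
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, iters=10):
    x = [0.0] * len(b)
    r = b[:]                      # residual b - Ax with x = 0
    p = r[:]
    rs = dot(r, r)
    for _ in range(iters):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < 1e-20:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x = conjugate_gradient(A, b)   # exact in at most 3 iterations for a 3x3 SPD system
```

The fact that the method needs only repeated matrix-vector products, never $A^{-1}$, is what makes it attractive for crossbar implementations.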
From the discussion above, the number of iterations required by the Jacobi method, the Gauss-Seidel method and the SOR method decreases in that order, which also means faster convergence. Additionally, in terms of hardware consumption and hardware implementation, the Gauss-Seidel method is better than the others. Moreover, the Krylov subspace method sacrifices accuracy to improve speed and is generally used for large-scale matrices. Relatively speaking, the SOR method is the most accurate and efficient method. However, the matrices cannot simply be written into the ReRAM arrays iteratively as weights. Calculating the matrices directly wastes computing power on the multiplication of zero elements, because the matrices used in numerical computations are generally sparse matrices containing more than 60% zero elements.
3.3. Rearrangement and Split
Rearrangement and split are proposed to handle the multiplication of sparse matrices, in which the majority of elements are zero. A number of studies focus on cutting or splitting matrices to improve computational efficiency, but many matrices cannot be cut or split directly. Therefore, rearrangement of sparse matrices is needed to change the layout of the matrix elements. In sparse matrix-vector multiplication (SpMV), the rearranged matrix and the split matrix can replace sparse-matrix operations with dense-matrix operations in many cases, which greatly saves memory and reduces computational overhead [27]. Moreover, methods that reduce the bandwidth of sparse matrices in SpMV are quite useful for the matrices obtained from PDEs. Taking the sparse matrix in Figure 5 as an example, the gray part of the matrix is composed of zero elements, and the blue part of nonzero elements. Firstly, the sparse matrix is rearranged into a diagonal-aggregation matrix (not a diagonal matrix). In addition, it is rewritten as a combination of submatrices. At last, the matrix is divided into four types of slices (a, b, c, d).
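One standard way to skip the zero elements in SpMV is compressed sparse row (CSR) storage; the sketch below is illustrative and is not necessarily the slicing scheme of Figure 5 (the matrix values are hypothetical):

```python
# CSR-style sparse matrix-vector multiplication: only nonzero elements are
# stored and multiplied, so the zero elements cost nothing.
def to_csr(dense):
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def spmv(values, col_idx, row_ptr, x):
    y = []
    for i in range(len(row_ptr) - 1):
        y.append(sum(values[k] * x[col_idx[k]]
                     for k in range(row_ptr[i], row_ptr[i + 1])))
    return y

dense = [[4.0, 1.0, 0.0, 0.0],       # banded matrix typical of 1D FDM stencils
         [1.0, 4.0, 1.0, 0.0],
         [0.0, 1.0, 4.0, 1.0],
         [0.0, 0.0, 1.0, 4.0]]
vals, cols, ptr = to_csr(dense)
y = spmv(vals, cols, ptr, [1.0, 1.0, 1.0, 1.0])
```

For a crossbar, the analogous idea is to map only the dense slices of nonzeros onto the array, which is what the rearrange-and-split flow aims at.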
According to the discussion of the matrix iterative methods in Section 3.2, the number of iterations required by the Jacobi method, the Gauss-Seidel method and the SOR method decreases in that order. Though the Gauss-Seidel method has lower hardware consumption and the SOR method has the highest computational efficiency, utilizing the Jacobi method with the split matrix can not only improve the accuracy but also cut down the number of iterations efficiently. Because the principal concern is which method produces fewer zero elements, the Jacobi method is still the best choice for the CIM-based PDEs solver at present.
4. CIM-Based Partial Differential Equation Solver
With the discussion of matrix iterative methods, the problem of the PDEs is changed into the multiplication of a large-scale matrix. Therefore, the essence of the PDEs solver is to achieve high-performance multiplication. A CIM-based PDEs solver is composed of input drivers, shifters, adders, a computing array and DACs/ADCs. The coefficient matrix is stored in the array as weights, and the input vector is entered into the array as the input. Because of the limited size and precision of the array, a single column of the matrix is mapped onto several columns of the array. After the shifters and adders, the results of the several columns are collected and then quantified in the ADCs. The ADCs in CIM PDEs solvers are usually SAR ADCs, to balance area, power and speed. Recent CIM-based partial differential equation solvers are based on different CIM technologies, including ReRAM, SRAM, flash memory and PCM. Each has its own advantages and its own suitable calculating methods and circuits.
4.1. ReRAM-Based Partial Differential Equation Solver
The coefficient matrix has been divided into several slices, which are then written as resistance values into the ReRAM array. The input matrix is entered into the ReRAM array as the input, and the slices can be reused several times without replacement, which means the ReRAM array rarely needs to be rewritten. The larger the size of the slices, the smaller their number and the fewer the write operations; conversely, smaller slices increase the number of iterative calculations and lower the computational efficiency. After the multiplication in the analog domain, ADCs are required to convert the analog signals into digital signals. Then, after digital signal processing, the result is fed back iteratively as the input of the ReRAM array, and the final result is obtained from the ReRAM-based partial differential equation solver.
In the work of Mohammed A. Zidan, a general memristor-based partial differential equation solver is proposed with the finite-difference method. To solve the general matrix Equation (20), they use the Jacobi method to decrease the calculation of zero elements. Their memristor is composed of a Ta top electrode, a Pd bottom electrode and a thin Ta₂O₅₋ₓ metal oxide. The memristor crossbar has extremely high energy efficiency and area utilization, but lower accuracy because of device variation. With the write-verify approach, they decrease the conductance variation from 5.3% to 0.85%, which largely overcomes the accuracy defects of the ReRAM. They divide the matrix into equally sized slices, so that practical crossbar sizes can be mapped onto the active slices exactly. Cutting the matrix not only minimizes the effects of the series resistance, sneak currents and virtual grounds, but also reduces the operations on zero elements [23]. With the memristor-based hardware and a matching software system, they obtain high-precision computing results.
Shichao Li and Wenchao Chen simulated fully coupled multiphysics based on bipolar resistive random-access memory in 2017 [28]. They utilized the finite-difference method and the Scharfetter-Gummel method, solving three fully coupled partial differential equations with the crossbar of an HfOₓ-based ReRAM [29]. As in the work of Mohammed A. Zidan, the accuracy was not effectively improved, but more PDEs are discussed in this work.
In recent work by S. S. Ensan and S. Ghosh, a ReRAM-based linear first-order PDE solver (ReLOPE) is proposed to solve linear first-order PDEs, referred to here as Equation (37). Unlike the general memristor-based partial differential equation solver [23], ReLOPE is the first PDEs solver purely based on hardware and is used only for linear first-order PDEs [20]. Moreover, the principle of ReLOPE is based on the second-order Runge-Kutta method. Though the fourth-order Runge-Kutta method theoretically offers higher accuracy, the resistance values tremendously interfere with the iteration under the limited programming accuracy of the ReRAM. Substituting (3) and (4) into (37) yields the second-order iterative form, and similarly, substituting Equations (3)–(6) into (37) yields the fourth-order form.
Figure 6 is the overview of ReLOPE. ReLOPE includes a fully ReRAM crossbar-based CIM, shifters, adders and DACs/ADCs. It expands the operating range of the solution by exploiting shifters to shift the input and output data, and it improves the power consumption of solving a PDE by 31.4×. The methods used in ReLOPE have two limitations: (1) programming RRAMs with the required accuracy (six decimal places for fourth-order Runge-Kutta in the ReLOPE paper) is unachievable at the current technical level; (2) accuracy is lost due to the nonlinear variation of resistance with voltage and the iterative use of the ADC. Therefore, ReLOPE cannot further improve its accuracy with the Runge-Kutta method.
4.2. SRAM-Based Partial Differential Equation Solver
Yannis Tsividis et al. proposed a programmable, clockless, continuous-time 8-bit hybrid (mixed analog/digital) architecture (ADC + SRAM + DAC) for solving ordinary and partial differential equations. The architecture, shown in Figure 7, is used to realize nonlinear functions. The system consists of an analog multiplier and an analog adder/subtractor. The hybrid nonlinear function generator achieves 16× lower power dissipation and a computational accuracy of about 0.5% to 5%.
In 2019, Thomas Chen and Jacob Botimer proposed an SRAM-based accelerator for solving PDEs [30]. They reformulated the multigrid Jacobi method in a residual form. By interleaving coarse-grid iterations with fine-grid iterations, their system reduced low-frequency errors to accelerate convergence. Their system contains 4 MAC-SRAMs, each of which is a 320 × 64 8T SRAM array. The architecture is shown in Figure 8a,b. A year later, they updated the mapping of the MAC-SRAM [31] on the basis of their previous research. Finally, the SRAM-based accelerator achieved 56.9 GOPS, consuming 16.6 mW at 200 MHz. However, the SRAM-based CIM has the same accuracy problems due to limited multiplicand precision and limited ADC resolution.
4.3. Flash Memory-Based Partial Differential Equation Solver
Jiezhi Chen et al. proposed a flash-memory-based CIM hardware system to improve the computational efficiency of time-dependent partial differential equations [32]. Based on the FDM and the Jacobi algorithm, they obtained the matrix equation, and the coefficient matrix was mapped into the flash-memory array as threshold voltages. The input matrix is transformed into pulse times as the input. Compared with ReRAM, flash memory enables vector-matrix multiplication with high accuracy and a good tolerance for device error. Moreover, it also has the advantages of ultra-high density and low cost.
4.4. PCM-Based Partial Differential Equation Solver
In 2018, Manuel Le Gallo and Abu Sebastian et al. used mixed-precision in-memory computing, which combines a von Neumann machine with a computational memory unit, to solve PDEs [33]. The mixed-precision CIM PDEs solver uses a low-precision computational memory unit to obtain an approximate solution in the first part and high-precision iterative processing to refine the accuracy in the second part. It achieves the low-precision matrix-vector multiplication with a PCM crossbar array based on the iterative Krylov subspace method. The PCM-based PDEs solver can offer up to 80 times lower energy consumption than an FPGA solution because of the PCM-based CIM architecture and the mixed-precision system.
4.5. Discussion of Partial Differential Equation Solver
The memristive crossbar CIM PDEs solver based on the PCM chip can already offer up to 80 times lower energy consumption than the FPGA solution. The energy efficiency and speed of the memristive crossbar CIM PDEs solver are several hundred times those of PDEs solvers based on the IBM POWER8 central processing unit (CPU) and the NVIDIA Titan RTX graphics processing unit (GPU) [26,34]. However, CIM-based PDEs solvers usually handle 4-bit to 8-bit PDE computations to satisfy accuracy requirements.
Table 2 summarizes the representative recent CIM-based PDEs solvers and compares their performance.
The deficiencies of ReRAM, including low inherent accuracy, nonlinearity and susceptibility to environmental changes, directly limit the use of ReRAM for high-precision computing. With the constant development of process technology, ReRAM devices will be manufactured more accurately and will approach high-precision computing to a certain extent. At the same time, the write-verify method or multi-read/write method can also mitigate accuracy problems, but at the cost of increased latency. Both are well worth researching for ReRAM-based PDEs solvers in the future. To reduce memristor device variability and nonlinearity, the general memristor-based partial differential equation solver in the work of Mohammed A. Zidan used a write-verify method to write and update the coefficient values in the crossbar. The write-verify operation is based on a sequence of write-read pulse pairs, each pair including a programming (set or reset) pulse and a subsequent read pulse. When the conductance reaches a predetermined range around the target value, the write operation is considered complete. The write-verify feedback method can decrease the cell-to-cell variation from 5.3% to below 1%.
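The write-verify feedback loop described above can be sketched as follows. The device interface (`read_conductance`, `apply_pulse`) and the toy device model are assumptions for illustration, not a vendor API, and the pulse response is deliberately simplified:

```python
# Write-verify loop: alternate programming pulses with read-back until the
# conductance lands inside a tolerance band around the target.
def write_verify(cell, g_target, tol=0.05, max_pulses=100):
    for _ in range(max_pulses):
        g = cell.read_conductance()
        if abs(g - g_target) <= tol * g_target:
            return True                              # within the target range
        cell.apply_pulse(set_pulse=(g < g_target))   # set raises G, reset lowers it
    return False                                     # did not converge in budget

class MockCell:
    """Toy device model (assumption): each pulse moves G a fixed fraction."""
    def __init__(self, g=1e-6):
        self.g = g
    def read_conductance(self):
        return self.g
    def apply_pulse(self, set_pulse):
        self.g *= 1.2 if set_pulse else 0.85

cell = MockCell(1e-6)
ok = write_verify(cell, 5e-6, tol=0.05)
```

Each extra verify cycle tightens the conductance distribution at the price of programming latency, which is exactly the accuracy/latency trade-off noted above.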
Though flash memory and SRAM have higher accuracy and density than ReRAM for CIM in numerical computations, ReRAM, with its nonvolatile and multivalued characteristics, is still the most extensively studied technology for CIM-based PDEs solvers.
The sparse coefficient matrix, with only a small number of nonzero elements, usually has a large size, so it must be divided into a certain number of fixed-size slices. When a ReRAM array has a huge size and multiple rows or columns are unselected, the result of the array may have a large error because of the leakage currents through the unselected cells. Hence, the size of the ReRAM array, and of the slices discussed above, not only depends on the system operation but is also determined by the hardware behavior of the ReRAM array. Moreover, the collected currents need ADCs for the next computation, and accuracy is lost in the process. Other CIM-based PDEs solvers have similar problems.
How to cut the matrix into slices, how small the crossbar arrays should be and how a single array should operate are urgent problems to be solved at the system level. At the same time, the rearrangement of sparse matrices and SpMV will play an important role in the future of CIM for high-precision computing tasks.
5. Summary and Outlook
Different from neural-network computations, numerical computations have been used since the early days of the basic disciplines and have always received general attention from researchers. In the past few years, solving numerical computations using CIM has become more and more popular, since CIM improves both energy efficiency and computational efficiency. All kinds of CIM PDEs solvers based on ReRAM, SRAM, flash memory and PCM have been proposed, each with its own characteristics, and their recent developments were compared. This article described the ReRAM-based CIM technology in detail, then reviewed the numerical methods of PDEs and the matrix iterative methods. Finally, the future of CIM for numerical computations can be summarized as follows:
Regardless of the type of CIM array, accuracy is still the biggest challenge;
More research on CIM-based numerical computations should focus on computational methods for sparse matrices;
As for matrix iterative methods, the principal concern is which method produces fewer zero elements, so the Jacobi method is still the best choice for CIM-based PDEs solvers at present. In addition, the Krylov subspace method is better for very large-scale matrices;
The future of CIM for high-precision computing tasks truly needs software/hardware codesign to coordinate the algorithm and the CIM array.
Author Contributions
Conceptualization, C.P. and X.X.; investigation, D.Z., Y.W., J.S., Y.C., Z.G., G.D., M.Z., F.W. and W.W.; writing—original draft preparation, Z.G.; writing—review and editing, K.Z.; supervision, K.Z.; project administration, C.P. and X.X. All authors have read and agreed to the published version of the manuscript.
Funding
This work is supported in part by National Key R&D Program of China under grant 2018YFB0407500. At the same time, this work is also supported by The Laboratory Open Fund of Beijing Smart-chip Microelectronics Technology Co., Ltd., (Beijing, China) under grant SGITZX00XSJS2108594.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Hickmott, T.W. Low-frequency negative resistance in thin anodic oxide films. J. Appl. Phys. 1962, 33, 2669–2682. [Google Scholar] [CrossRef]
- Gibbons, J.; Beadle, W. Switching properties of thin NiO films. Solid-State Electr. 1964, 7, 785–790. [Google Scholar] [CrossRef]
- Nielsen, P.; Bashara, N. The reversible voltage-induced initial resistance in the negative resistance sandwich structure. IEEE Trans. Electron. Devices 1964, 11, 243–244. [Google Scholar] [CrossRef]
- Hiatt, W.R.; Hickmott, T.W. Bistable switching in niobium oxide diodes. Appl. Phys. Lett. 1965, 6, 106–108. [Google Scholar] [CrossRef]
- Chen, Y. ReRAM: History, Status, and Future. IEEE Trans. Electron. Devices 2020, 67, 1420–1433. [Google Scholar] [CrossRef]
- Atalla, M.M.; Kahng, D. 1960—Metal Oxide Semiconductor (MOS) Transistor Demonstrated Silicon Engine; Tech. Rep.; Computer History Museum: Mountain View, CA, USA, 1960. [Google Scholar]
- Kahng, D. Electric Field Controlled Semiconductor Device. U.S. Patent 3 102 230 A, 27 August 1963. [Google Scholar]
- Xue, X.; Jian, W.; Yang, J.; Xiao, F.; Chen, G.; Xu, S.; Xie, Y.; Lin, Y.; Huang, R.; Zou, Q.; et al. A 0.13 µm 8 Mb Logic-Based CuxOy ReRAM With Self-Adaptive Operation for Yield Enhancement and Power Reduction. IEEE J. Solid-State Circuits 2013, 48, 1315–1322. [Google Scholar] [CrossRef]
- Ishii, T.; Johguchi, K.; Takeuchi, K. Vertical and horizontal location design of program voltage generator for 3D-integrated ReRAM/NAND flash hybrid SSD. In Proceedings of the 2014 International Conference on Electronics Packaging (ICEP), Toyama, Japan, 23–25 April 2014. [Google Scholar]
- Joshi, V.; le Gallo, M.; Haefeli, S.; Boybat, I.; Nandakumar, S.R.; Piveteau, C.; Dazzi, M.; Rajendran, B.; Sebastian, A.; Eleftheriou, E. Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 2020, 11, 2473. [Google Scholar] [CrossRef] [PubMed]
- Jain, S.; Ranjan, A.; Roy, K.; Raghunathan, A. Computing in memory with spin-transfer torque magnetic RAM. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2018, 26, 470–483. [Google Scholar] [CrossRef]
- Takashima, D. Overview of FeRAMs: Trends and perspectives. In Proceedings of the 2011 11th Annual Non-Volatile Memory Technology Symposium Proceeding, Shanghai, China, 7–9 November 2011; pp. 1–6. [Google Scholar]
- Wong, H.-S.P.; Lee, H.; Yu, S.; Chen, Y.-S.; Wu, Y.; Chen, P.-S.; Lee, B.; Chen, F.T.; Tsai, M.J. Metal-oxide RRAM. Proc. IEEE 2012, 100, 1951–1970. [Google Scholar] [CrossRef]
- Jameson, J.R.; Blanchard, P.; Cheng, C.; Dinh, J.; Gallo, A.; Gopalakrishnan, V.; Gopalan, C.; Guichet, B.; Hsu, S.; Kamalanathan, D.; et al. Conductive-bridge memory (CBRAM) with excellent high-temperature retention. In Proceedings of the 2013 IEEE International Electron Devices Meeting, Washington, DC, USA, 9–11 December 2013; pp. 738–741. [Google Scholar]
- Yu, S.; Wong, H.-P. Compact Modeling of Conducting-Bridge Random-Access Memory (CBRAM). IEEE Trans. Electron. Devices 2011, 58, 1352–1360. [Google Scholar]
- Baek, I.G.; Lee, M.S.; Seo, S.; Lee, M.J.; Seo, D.H.; Suh, D.-S.; Park, J.C.; Park, S.O.; Kim, H.S.; Yoo, I.K.; et al. Highly scalable non-volatile resistive memory using simple binary oxide driven by asymmetric unipolar voltage pulses. In Proceedings of the IEEE International Electron Devices Meeting, San Francisco, CA, USA, 13–15 December 2004; pp. 587–590. [Google Scholar]
- Lee, H.Y.; Chen, P.S.; Wu, T.Y.; Chen, Y.S.; Wang, C.C.; Tzeng, P.J.; Lin, C.H.; Chen, F.; Lien, C.H.; Tsai, M.-J. Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM. In Proceedings of the 2008 IEEE International Electron Devices Meeting, San Francisco, CA, USA, 15–17 December 2008; pp. 297–300. [Google Scholar]
- Yoon, H.S.; Baek, I.-G.; Zhao, J.; Sim, H.; Park, M.Y.; Lee, H.; Oh, G.-H.; Shin, J.C.; Yeo, I.-S.; Chung, U.-I. Vertical cross-point resistance change memory for ultra-high density non-volatile memory applications. In Proceedings of the 2009 Symposium on VLSI Technology, Kyoto, Japan, 15–17 June 2009; pp. 26–27. [Google Scholar]
- Govoreanu, B.; Kar, G.; Chen, Y.-Y.; Paraschiv, V.; Kubicek, S.; Fantini, A.; Radu, I.; Goux, L.; Clima, S.; Degraeve, R.; et al. 10×10 nm2 Hf/HfOx crossbar resistive RAM with excellent performance, reliability and low-energy operation. In Proceedings of the 2011 International Electron Devices Meeting, Washington, DC, USA, 5–7 December 2011; pp. 729–732. [Google Scholar]
- Sills, S.; Yasuda, S.; Strand, J.; Calderoni, A.; Aratani, K.; Johnson, A.; Ramaswamy, N. A copper ReRAM cell for storage class memory applications. In Proceedings of the 2014 Symposium on VLSI Technology (VLSI-Technology), Honolulu, HI, USA, 9–12 June 2014; pp. 80–81. [Google Scholar]
- Hayakawa, Y.; Himeno, A.; Yasuhara, R.; Boullart, W.; Vecchio, E.; Vandeweyer, T.; Witters, T.; Crotti, D.; Jurczak, M.; Fujii, S.; et al. Highly reliable TaOx ReRAM with centralized filament for 28-nm embedded application. In Proceedings of the 2015 Symposium on VLSI Technology (VLSI Technology), Kyoto, Japan, 17–19 June 2015; pp. 14–15. [Google Scholar]
- Yu, S. Compute-in-Memory for AI: From Inference to Training. In Proceedings of the 2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan, 10–13 August 2020. [Google Scholar]
- Ensan, S.S.; Ghosh, S. ReLOPE: Resistive RAM-Based Linear First-Order Partial Differential Equation Solver. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2021, 29, 237–241. [Google Scholar] [CrossRef]
- Ames, W.F. Numerical Methods for Partial Differential Equations; Academic: New York, NY, USA, 2014. [Google Scholar]
- Eymard, R.; Gallouët, T.; Herbin, R. Handbook of Numerical Analysis; Ciarlet, G.P., Lions, L.J., Eds.; Elsevier: Amsterdam, The Netherlands, 2000; pp. 713–1018. [Google Scholar]
- Zidan, M.A.; Jeong, Y.; Lee, J.; Chen, B.; Huang, S.; Kushner, M.J.; Lu, W.D. A general memristor-based partial differential equation solver. Nat. Electron. 2018, 1, 411–420. [Google Scholar] [CrossRef]
- Kabir, H.; Booth, J.D.; Raghavan, P. A multilevel compressed sparse row format for efficient sparse computations on multicore processors. In Proceedings of the 2014 21st International Conference on High Performance Computing (HiPC), Goa, India, 17–20 December 2014; pp. 1–10. [Google Scholar]
- Li, S.; Chen, W.; Luo, Y.; Hu, J.; Gao, P.; Ye, J.; Kang, K.; Chen, H.; Li, E.; Yin, W.-Y. Fully Coupled Multiphysics Simulation of Crosstalk Effect in Bipolar Resistive Random Access Memory. IEEE Trans. Electron. Devices 2017, 64, 3647–3653. [Google Scholar] [CrossRef]
- Yu, S. Resistive Random Access Memory (RRAM): From Devices to Array Architectures; Iniewski, K., Ed.; Morgan & Claypool: Saanichton, BC, Canada, 2016. [Google Scholar]
- Chen, T.; Botimer, J.; Chou, T.; Zhang, Z. An Sram-Based Accelerator for Solving Partial Differential Equations. In Proceedings of the 2019 IEEE Custom Integrated Circuits Conference (CICC), Austin, TX, USA, 14–17 April 2019; pp. 1–4. [Google Scholar]
- Chen, T.; Botimer, J.; Chou, T.; Zhang, Z. A 1.87-mm2 56.9-GOPS Accelerator for Solving Partial Differential Equations. IEEE J. Solid-State Circuits 2020, 55, 1709–1718. [Google Scholar] [CrossRef]
- Feng, Y.; Zhan, X.; Chen, J. Flash Memory based Computing-In-Memory to Solve Time-dependent Partial Differential Equations. In Proceedings of the 2020 IEEE Silicon Nanoelectronics Workshop (SNW), Honolulu, HI, USA, 13–14 June 2020; pp. 27–28. [Google Scholar]
- Le Gallo, M.; Sebastian, A.; Mathis, R.; Manica, M.; Giefers, H.; Tuma, T.; Bekas, C.; Curioni, A.; Eleftheriou, E. Mixed-precision in-memory computing. Nat. Electron. 2018, 1, 246–253. [Google Scholar] [CrossRef]
- Jiang, H.; Huang, S.; Peng, X.; Yu, S. MINT: Mixed-Precision RRAM-Based IN-Memory Training Architecture. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020; pp. 1–5. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).