3-D Synapse Array Architecture Based on Charge-Trap Flash Memory for Neuromorphic Application

Abstract: In order to address a fundamental bottleneck of conventional digital computers, there has recently been a tremendous upsurge of research into hardware-based neuromorphic systems. To emulate the functionalities of artificial neural networks, various synaptic devices and their 2-D cross-point array structures have been proposed. In our previous work, we proposed a 3-D synapse array architecture based on a charge-trap flash (CTF) memory. It combines the high-density integration of 3-D stacking technology with the excellent reliability characteristics of mature CTF device technology. This paper examines several issues of the 3-D synapse array architecture. We also propose an improved structure and programming method compared to the previous work. The synaptic characteristics of the proposed method are closely examined and validated through a technology computer-aided design (TCAD) device simulation and a system-level simulation of a pattern recognition task. The proposed technology will be a promising solution for high-performance, high-reliability neuromorphic hardware.


Introduction
Neuromorphic systems have been attracting much attention as next-generation computing systems to overcome the bottleneck of the von Neumann architecture [1][2][3][4][5]. The term "neuromorphic" refers to an artificial neural system that mimics the neurons and synapses of the biological nervous system [3]. A neuron generates a spike when its membrane potential, which is the result of the spatial and temporal summation of the signals received from pre-neurons, exceeds a threshold, and the generated spike is transmitted to the post-neurons. A synapse is the junction between neurons, and each synapse has its own synaptic weight, which is the connection strength between the neurons [6]. In a neuromorphic system, the synaptic weight can be represented by the conductance of a synapse device.
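The neuron behavior described above can be sketched in a few lines of Python. This is an illustrative model of spatial and temporal summation with a threshold, not circuitry from this work; the names (`step_neuron`, `v_th`) and all numeric values are our own assumptions.

```python
# Illustrative integrate-and-fire style neuron: the membrane potential
# accumulates weighted inputs from pre-neurons; when it exceeds the
# threshold, a spike is emitted and the potential is reset.

def step_neuron(v_mem, inputs, weights, v_th=1.0):
    """One time step: spatial summation, then threshold check."""
    v_mem += sum(w * x for w, x in zip(weights, inputs))  # spatial summation
    if v_mem >= v_th:
        return 0.0, True            # fire a spike and reset the potential
    return v_mem, False

# Temporal summation over three identical time steps
v, spikes = 0.0, []
for t in range(3):
    v, fired = step_neuron(v, [1, 1], [0.3, 0.2])
    spikes.append(fired)
# the neuron fires once enough input has accumulated across time steps
```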
In our previous work, we proposed a 3-D stacked synapse array based on a charge-trap flash (CTF) device [11]. Three-dimensional stacking technology is currently used in commercialized Not-AND (NAND) flash memory products for ultra-high density [14]. Similarly, a 3-D stacked synapse array has the advantage of chip-size reduction when implementing very large artificial neural networks. Consequently, it has the potential to be a promising technology for implementing neuromorphic hardware systems. The design of a 3-D stacked synapse array architecture raises several issues. At the full-array level, how to operate each layer selectively and how to efficiently form the metal interconnects with the peripheral circuits are critical issues. At the device level, how to implement accurate synaptic weight levels with low energy consumption is an important issue. In particular, linear and symmetric synaptic weight (conductance) modulation is essential to improve the accuracy of neuromorphic hardware systems [1][2][3][4].
In this paper, we examine these issues and suggest two improvements, one in architecture design and one in device operation. The rest of the paper is structured as follows: Section 2 presents design methods from the viewpoint of the full-chip architecture. In this section, we review the 3-D stacked synapse array structure developed in the previous work [11] and propose an improved version of the architecture that solves a problem of the previous version. In Section 3, we propose an improved programming method to obtain linear and symmetric conductance changes. Using a pattern recognition application with the Modified National Institute of Standards and Technology (MNIST) database, we demonstrate the improvement achieved by the proposed method.

Design Methods of 3-D Synapse Array Architecture
In general, a large artificial neural network with a large number of synaptic weights and neuron layers is required to achieve high performance in artificial intelligence tasks. In the case of the ImageNet classification challenge, state-of-the-art deep neural network (DNN) architectures have 5~155M synaptic weight parameters [16]. In order to implement a large artificial neural network efficiently on a limited-size hardware chip, we proposed the 3-D stacked synapse array structure (Figure 1) in the previous work [11]. The unit synapse cell is composed of two CTF devices with two drain nodes (D(+), D(−)) and a common source node (S). The D(+) part is connected to the output neuron circuit to increase the membrane potential, acting as an excitatory synapse. The D(−) part is connected to the output neuron circuit to decrease the membrane potential, acting as an inhibitory synapse. With this configuration, negative and positive weights can be represented at the same time. As summarized in Table 1, the CTF device has several advantages over other non-volatile memory devices. First, it does not need an additional selector device because the three-terminal MOSFET-based unit cell has a built-in selection operation. Second, it has perfect CMOS compatibility. Third, linear and incremental modulation of the weight (conductance) is more easily achieved because the conductance is determined by the number of trapped charges. Fourth, it has good retention reliability characteristics. On the other hand, the drawback of the CTF device is its large power consumption during program operations. Therefore, CTF devices are best suited to off-chip learning-based neuromorphic systems where frequent weight updates do not occur. The proposed 3-D stacked synapse array structure is based on the word-line stacking method, which is similar to that of commercialized V-NAND flash memory. Therefore, it has the advantage of utilizing the existing, stable process methods used in V-NAND flash memory.
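The two-device cell described above can be summarized with a short sketch. The class below is a behavioral illustration under our own assumptions (names and conductance values are arbitrary), showing how the difference between the D(+) and D(−) conductances yields a signed weight.

```python
# Behavioral sketch of the unit synapse cell: two CTF conductances,
# one excitatory (D(+)) and one inhibitory (D(-)); their difference is
# the effective signed synaptic weight.

class SynapseCell:
    def __init__(self, g_plus, g_minus):
        self.g_plus = g_plus      # D(+) conductance: raises membrane potential
        self.g_minus = g_minus    # D(-) conductance: lowers membrane potential

    @property
    def weight(self):
        # Signed weight from two always-positive conductances
        return self.g_plus - self.g_minus

cell_pos = SynapseCell(g_plus=0.75, g_minus=0.25)   # net excitatory: +0.5
cell_neg = SynapseCell(g_plus=0.25, g_minus=0.75)   # net inhibitory: -0.5
```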
A key issue in the design of a 3-D stacked synapse array architecture is the metal interconnection. For example, a 4-layer stacked synapse array has four times as many word lines as a 2-D synapse array. If the word-line (WL) decoder is connected by a conventional metal interconnection method, the vertical length of the WL decoder (HWL_Decoder) increases as illustrated in Figure 2, resulting in an enormous loss of area efficiency at the full-chip level. To solve this issue, we proposed a smart design of a layer select decoder with 3-D metal line connections in the previous work [11]. As shown in Figure 3a, the area of the WL decoder is not increased, and a layer select decoder is added to operate each stacked layer selectively. The layer select decoder delivers the gate voltages generated by the WL decoder to the WLs of the selected layer. It is important to note that the vertical length of the layer select decoder is the same as that of the WL decoder, and its horizontal length is only 4F × N, where F is the minimum feature size and N is the number of stacked layers. The specific structure of the transistors and metal interconnects is depicted in our previous paper [11].
The top-view layout of the 3-D synapse array architecture is illustrated in Figure 4. The layer select decoder is composed of pass transistors. The pass transistors are arranged next to each word line and are connected one-to-one with each WL contact. The gate nodes of the pass transistors are vertically connected to form a layer select line (LSL) that is controlled by the LSL control circuit. Through this configuration, each stacked layer can be operated selectively while maintaining a compact full-chip configuration. For example, if the turn-on voltage is applied to L4 and the turn-off voltages are applied to L1~L3, the pass transistors corresponding to L = 4 are activated. Consequently, the WL voltages generated in the WL decoder are transferred to the fourth-layer WLs (L = 4). In this paper, we propose an improved architecture compared to the previous work by adding a ground select decoder, as shown in Figure 4. If there is only a layer select decoder, the WLs of the unselected stacked layers are in a floating state because they are not connected to the WL decoder. In this case, the potential of the WLs of the unselected layers varies due to capacitive coupling between the stacked WLs. In the worst case, the WLs of the unselected layers located directly above or below (L = n − 1 or L = n + 1) the selected layer (L = n) may be boosted together when a high voltage is applied to the selected WLs. To eliminate this inherent risk of the previous version of the architecture, a ground select decoder that applies a turn-off voltage (0 V) to the WLs of the unselected layers is added to the right side of the main 3-D stacked synapse array, as shown in Figure 4.
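The selection behavior above can be captured in a small behavioral model. This is our own simplified sketch (function and parameter names are assumptions, not from this work), illustrating that the selected layer receives the WL-decoder voltages while the ground select decoder holds every unselected layer at 0 V instead of leaving it floating.

```python
# Behavioral model of layer selection: the layer select decoder passes the
# WL-decoder voltages to the selected layer; the ground select decoder ties
# the WLs of all unselected layers to 0 V (no floating WLs, no coupling boost).

def route_wl_voltages(wl_voltages, selected_layer, n_layers=4):
    """Return the voltage pattern seen by the WLs of each stacked layer."""
    routed = {}
    for layer in range(1, n_layers + 1):
        if layer == selected_layer:
            routed[layer] = list(wl_voltages)         # pass transistors on
        else:
            routed[layer] = [0.0] * len(wl_voltages)  # grounded, not floating
    return routed

out = route_wl_voltages([6.0, 0.0, 6.0], selected_layer=4)
# out[4] carries the decoder voltages; out[1]..out[3] are held at 0 V
```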
The detailed manufacturing process of the 3-D synapse array was described in our previous paper [11]. The revised synapse array architecture can be made with the same process method. Since the newly added ground select decoder has the same structure as the layer select decoder, it can be fabricated by simply duplicating the layer select decoder layout.
To validate the synaptic operations of the designed CTF-based synapse device, a technology computer-aided design (TCAD) device simulation (Synopsys Sentaurus [17]) was used. The specific device parameters are summarized in Table 2. The electrical characteristics of the designed synapse device are discussed in the next section.

Synapse Device Operation
In the proposed synapse array (Figure 3b), the synaptic weight (w_ij) of the artificial neural network is represented as follows:

w_ij = G+_ij − G−_ij

As depicted in Figure 3b, G+_ij and G−_ij are the conductances of the D(+) CTF device and the D(−) CTF device, respectively. Each conductance is determined by the amount of charge trapped in each charge-trap layer (silicon nitride). For conductance modulation, hot-electron injection (HEI) and hot-hole injection (HHI) can be used as the charge injection mechanisms. The potentiation process, which increases the synaptic weight, is performed by increasing G+_ij and decreasing G−_ij. Conversely, the depression process, which decreases the synaptic weight, is carried out by decreasing G+_ij and increasing G−_ij. Using a technology computer-aided design (TCAD) device simulation (Synopsys Sentaurus), we verify two pulse schemes for the modulation of the synaptic weight. A successive-pulse programming scheme and an incremental-step-pulse programming (ISPP) scheme are illustrated in Figure 5a,b, respectively. Successive-pulse programming is a method of continuously applying drain pulses of the same voltage, as shown in Figure 5a. In this scheme, the amount of conductance change is controlled by the number of applied drain pulses. When a drain pulse is applied, the sign of the gate voltage determines whether HEI or HHI occurs. If the drain pulse is applied while the gate bias is positive (6 V), HEI occurs; the threshold voltage increases due to the trapped electrons and the conductance decreases. On the other hand, if the drain pulse is applied while the gate bias is negative (−7 V), HHI occurs; the threshold voltage decreases due to the trapped holes and the conductance increases. The proposed unit synapse cell is composed of two CTF devices. Consequently, the potentiation operation is conducted simultaneously by HHI in the D(+) CTF device and HEI in the D(−) CTF device.
The depression operation is conducted by HEI in the D(+) device and HHI in the D(−) device.
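The complementary update described above can be sketched as follows. The per-pulse conductance step `dg` is an illustrative constant of our own choosing, standing in for the device's actual HEI/HHI response.

```python
# Sketch of the complementary potentiation/depression update: potentiation
# applies HHI to D(+) (conductance up) and HEI to D(-) (conductance down);
# depression does the opposite.

def apply_pulses(cell, n_pulses, potentiate, dg=0.01):
    g_plus, g_minus = cell
    for _ in range(n_pulses):
        if potentiate:
            g_plus += dg     # HHI on D(+): trapped holes raise conductance
            g_minus -= dg    # HEI on D(-): trapped electrons lower conductance
        else:
            g_plus -= dg     # HEI on D(+)
            g_minus += dg    # HHI on D(-)
    return (g_plus, g_minus)

cell = apply_pulses((0.5, 0.5), 10, potentiate=True)
# both branches move together, so the weight G(+) - G(-) changes by 2*dg
# per pulse: here from 0.0 to about +0.2 after ten potentiation pulses
```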
The ISPP is used as the program scheme of NAND flash memory [18]. The program pulse amplitude is increased by a constant value V_step after each program step, as shown in Figure 5b. In our previous paper, only successive-pulse programming was used. In this work, we applied the ISPP method to the conductance modulation of our designed synapse device. Using a TCAD device simulation, we compared the conductance modulation characteristics of successive-pulse programming and the ISPP. As shown in Figure 6, the ISPP scheme shows better synaptic behavior than the successive-pulse scheme: the conductance changes linearly with the number of applied pulses, and the range of available synaptic weights (memory window) is further increased. Consequently, the ISPP scheme can adjust the synaptic weight more accurately than the successive-pulse programming scheme during the learning process. However, the ISPP scheme also has a drawback. In order to determine the start pulse voltage, a verify operation is required prior to programming to check the current synaptic weight value. Therefore, the ISPP scheme increases the accuracy of the learning process, but also increases time and energy consumption.
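The ISPP pulse train described above is easy to state concretely. The helper below is a minimal sketch; the start voltage and V_step values are illustrative, not the device parameters of Table 2.

```python
# Minimal ISPP pulse-train generator: each program pulse amplitude is
# raised by a fixed V_step over the previous one.

def ispp_pulses(v_start, v_step, n_pulses):
    """Return the amplitude of each successive program pulse (in volts)."""
    return [v_start + i * v_step for i in range(n_pulses)]

train = ispp_pulses(v_start=4.0, v_step=0.2, n_pulses=5)
# amplitudes of 4.0, 4.2, 4.4, 4.6, 4.8 V (to within float rounding); a
# smaller v_step gives finer conductance steps at the cost of more pulses
```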

System-Level Simulation for Pattern Recognition
To validate the functionality of the proposed programming schemes, a single-layer artificial neural network system for Modified National Institute of Standards and Technology (MNIST) pattern recognition was simulated. The MNIST database is a large database of handwritten digits, which contains 60,000 training images and 10,000 test images [19]. A total of 784 input neurons represent the 28 × 28 pixels of each image, and 10 output neurons represent the 10 digits (0~9). We used the rectified linear unit (ReLU), one of the most popular activation functions, as the activation function of the neurons [20]. For the learning process, a supervised learning method was used. First, the error was calculated at the output neurons. Next, the target change in synaptic weight (the number of programming pulses) was determined by the gradient descent method. After that, the synaptic weight value was updated based on fitted equations for the conductance modulation characteristics of the successive-pulse programming scheme and the ISPP scheme. Figure 7a shows the system-level simulation result of the pattern recognition accuracy on the 10,000 test image samples. Compared to our previous work [11], the ISPP scheme increases the recognition accuracy by about 6% (successive-pulse programming in our previous work: 79.83% [11]; the ISPP scheme in this work: 85.9%). This result is in good agreement with other papers reporting that a linear conductance modulation characteristic is essential for better performance of neuromorphic systems [5,21]. The synaptic weight maps after training on 10,000 samples with the ISPP scheme are illustrated in Figure 7b. In addition, we examined the synaptic weight modulation characteristics for various values of V_step in the ISPP scheme. As illustrated in Figure 8a, a smaller V_step allows finer conductance modulation, which means that the number of synaptic weight levels can be increased. As a result, the finer conductance modulation obtained with a smaller V_step yields a more accurate pattern recognition rate, as shown in Figure 8b. It should be noted, however, that the retention characteristics (the ability to distinguish each level over a long time) can deteriorate when the interval between adjacent synaptic weight levels becomes narrow. Therefore, the magnitude of V_step should be determined considering the trade-off between the retention characteristics and the accuracy.
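The learning-process update described above can be sketched under simplified assumptions: the ideal gradient-descent weight change is quantized to an integer number of program pulses, each moving the weight by a fixed amount (a stand-in for the fitted conductance-modulation equations). All names and constants here are our own.

```python
# Sketch of a hardware-constrained weight update: the target change from
# gradient descent is rounded to a whole number of program pulses.

def update_weight(w, grad, lr=0.1, dw_per_pulse=0.01):
    """Return (updated weight, signed pulse count actually applied)."""
    target_dw = -lr * grad                          # ideal gradient step
    n_pulses = round(target_dw / dw_per_pulse)      # quantize to pulses
    return w + n_pulses * dw_per_pulse, n_pulses

w_new, n = update_weight(w=0.0, grad=0.23)
# the ideal step is -0.023, but only -0.02 (two depression pulses) is
# realizable; a finer dw_per_pulse (smaller V_step) shrinks this error
```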

Discussion
Currently, numerous studies based on different types of nonvolatile memory devices are being conducted to implement neuromorphic hardware systems. Table 3 summarizes some of these research results. Almost all previous studies are based on 2-D synapse array structures, whereas we were the first to propose a 3-D stacked synapse array structure. This paper has addressed several issues associated with the design of the 3-D synapse array architecture at the full-chip level. It will serve as an important guideline for designing a 3-D stacked synapse array. The approach of stacking CTF devices is a mature technology that is already used in commercialized 3-D NAND flash memories. Consequently, the proposed 3-D synapse architecture is expected to be well suited to actual mass production. It can also achieve excellent reliability by utilizing the various technologies developed for NAND flash memory. For example, we have demonstrated that the ISPP method can improve the pattern recognition accuracy of a neuromorphic system.
For future work, we will demonstrate the superiority of the proposed 3-D synapse architecture using an actually fabricated array. In addition, applying the architecture to various artificial neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), will be a crucial research topic.

Conclusions
We proposed a 3-D synapse array architecture based on a CTF memory device. To resolve the drawback of the previous version of the architecture, a ground select decoder was newly added. We also introduced the ISPP scheme to improve the linearity of the conductance modulation. The synaptic weight modulation characteristics were examined using a TCAD device simulation. In addition, we demonstrated the feasibility of the proposed architecture for neuromorphic system applications through a MATLAB simulation of MNIST pattern recognition. The proposed 3-D synapse array architecture, which exhibits a compact chip configuration and high integration ability, will be a promising technology for realizing hardware-based neuromorphic systems.

Conflicts of Interest:
The authors declare no conflict of interest.