Optimal Energetic-Trap Distribution of Nano-Scaled Charge Trap Nitride for Wider Vth Window in 3D NAND Flash Using a Machine-Learning Method

A machine-learning (ML) technique was used to optimize the energetic-trap distributions of nano-scaled charge trap nitride (CTN) in 3D NAND Flash to widen the threshold voltage (Vth) window, which is crucial for NAND operation. The energetic-trap distribution is a critical material property of the CTN that affects the Vth window between the erase and program Vth. An artificial neural network (ANN) was used to model the relationship between the energetic-trap distributions as an input parameter and the Vth window as an output parameter. A well-trained ANN was used with the gradient-descent method to determine the specific inputs that maximize the outputs. The trap densities (NTD and NTA) and their standard deviations (σTD and σTA) were found to most strongly impact the Vth window. As they increased, the Vth window increased because of the availability of a larger number of trap sites. Finally, when the ML-optimized energetic-trap distributions were simulated, the Vth window increased by 49% compared with the experimental value under the same bias condition. Therefore, the developed ML technique can be applied to optimize cell transistor processes by determining the material properties of the CTN in 3D NAND Flash.


Introduction
In recent years, solid-state drives (SSDs) have been developed to meet the demand for rapidly storing and accessing large amounts of data. NAND Flash is a key component of SSDs and is widely used because of its suitability for mass production and its reasonable bit cost [1]. NAND Flash has advanced through innovations such as the three-dimensional (3D) architecture and the use of charge trap nitride (CTN) as the storage node [2][3][4][5]. Furthermore, the bit cost of NAND Flash has been reduced by multi-level cell (MLC) technology, which stores multiple bits in a single cell [6].
An MLC creates more threshold-voltage (Vth) states in one cell by subdividing the amount of charge injected into the storage node. As more Vth states are created, the number of bits stored per cell increases, and the bit cost therefore decreases considerably [7,8]. Unfortunately, the overlap between adjacent subdivided Vth states also increases, which makes it difficult to distinguish the actual Vth states and thereby degrades device reliability [9][10][11]. Multiple Vth states must fit within a limited Vth window between the erase (ERS) and program (PGM) Vth [12,13]. Therefore, widening the Vth window increases the margin available to each Vth state and reduces errors in distinguishing them. In previous works, wide Vth windows were achieved by adopting new materials and bias conditions [14,15]. However, qualitative analyses of Vth window improvement have not focused sufficiently on the material properties of the storage node.
The energetic-trap distribution is an important material property of the CTN, a nanoscale thin film in 3D NAND Flash. It was previously discussed on the basis of an analytical charge model [16]. The energetic-trap distribution can be extracted using retention models [17] and trap spectroscopy by charge injection and sensing (TSCIS) [18]. Furthermore, the energetic-trap distributions are determined by the gas flow ratio used during CTN deposition [19,20]. These studies indicate that energetic-trap distributions are controllable and that their profiles determine the ERS Vth (Vth,ers) and the PGM Vth (Vth,pgm) [17,21]. We therefore aimed to maximize the Vth window, that is, the sum of the absolute value of Vth,ers (|Vth,ers|) and Vth,pgm, by optimizing the energetic-trap distributions.
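To make the definition concrete, the minimal sketch below computes the Vth window as |Vth,ers| + Vth,pgm and, under the hypothetical simplification that the 2ⁿ states of an n-bit cell are evenly spaced from the ERS level to the highest PGM level, the spacing available between adjacent states. The helper names and the numbers are illustrative only and are not data from this study.

```python
def vth_window(vth_ers: float, vth_pgm: float) -> float:
    """Vth window as defined in the text: |Vth,ers| + Vth,pgm (volts).

    Assumes Vth,ers is negative (erased) and Vth,pgm is positive (programmed),
    so the sum equals the distance between the two Vth levels.
    """
    return abs(vth_ers) + vth_pgm


def mlc_state_spacing(window: float, bits_per_cell: int) -> float:
    """Idealized spacing between adjacent Vth states of an MLC.

    Hypothetical simplification: 2**bits_per_cell states are spread evenly
    from the ERS level to the highest PGM level, giving
    (2**bits_per_cell - 1) gaps across the window.
    """
    n_states = 2 ** bits_per_cell
    return window / (n_states - 1)


# Illustrative (made-up) values, not measurements.
w = vth_window(-3.0, 4.0)           # 7.0 V
print(w, mlc_state_spacing(w, 3))   # prints 7.0 1.0 (8 states -> 7 gaps)
```

Under this idealization, every extra volt of window is shared among all gaps, which is why a wider window directly eases state discrimination as more bits are stored per cell.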
We also used a novel machine-learning (ML) method to improve the Vth window. Recently, ML has been used to predict and optimize nanoscale transistors [22][23][24]. Moreover, it can help determine the complex material properties of the CTN quantitatively, quickly, and accurately. Therefore, an artificial neural network (ANN) [25] was trained to model the relationship between eight inputs that determine the energetic-trap distributions and two outputs, |Vth,ers| and Vth,pgm. Next, the gradient-descent method, a widely used optimization algorithm, was applied to the trained ANN to find the energetic-trap distributions that yield a large Vth window.
In this study, the ML method is used to determine the optimized energetic-trap distributions and thereby guide control of the material properties of the CTN in 3D NAND Flash. We quantitatively identified the energetic-trap distributions that resulted in the largest Vth window. The remainder of this paper is organized as follows. Section 2 introduces the simulation data and the ML method used to train the ANN. Section 3 discusses the training and optimization results of the ML-based analysis. Finally, Section 4 presents the conclusions.

Simulation and Machine-Learning Method
We obtained the data for training the ANN from TCAD simulations [26]. Shockley–Read–Hall recombination, drift-diffusion transport, mobility (high-field-, doping-, and interface-dependent), and Hurkx band-to-band tunneling models were adopted for the poly-Si channel. Furthermore, a nonlocal tunneling model was used to describe charge transport during ERS/PGM operations. Figure 1 shows schematic diagrams of the 3D NAND Flash used in the simulation. Figure 1a shows part of a 3D NAND Flash string. It has a cylindrical structure whose two ends are connected to a bit line (BL) and a source line (SL), respectively. Vth is extracted as the voltage of the selected word line (WLSel), located in the middle of the string, at which the BL current (IBL) reaches 1 µA. Figure 1b shows the half cross-sectional view of the cylindrical structure and the gate stack. The ERS (PGM) state is the condition in which holes (electrons) are stored in the CTN. The simulation structure was derived from a manufactured device [27]. Figure 2 shows the two energetic-trap distributions, each following a Gaussian profile within the bandgap of the CTN. Donor- and acceptor-like traps capture holes and electrons, respectively. Each distribution is defined by its trap density (NTD and NTA), peak energy level (ETD and ETA), capture cross section (CCSD and CCSA), and standard deviation (σD and σA). These eight inputs determine the profiles of the distributions, and Vth,ers and Vth,pgm change accordingly. Figure 3 shows Vth as a function of the ERS/PGM operating time. Each operation was performed by applying a single pulse; the PGM operation started from the ERS state, whereas the ERS operation started from the PGM state. Under the same bias condition (VPGM = 15 V, VERS = 20 V), the simulation data (solid lines) were calibrated against the experimental data (symbols) by adjusting the energetic-trap distributions of the CTN. The calibrated simulation data were then used as training data for the ML-based analysis.
Nanomaterials 2022, 12, 1808 3 of 10
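The Gaussian trap profile described above can be written as g(E) = N_T exp(−(E − E_T)²/(2σ²)). The sketch below evaluates it for one donor- or acceptor-like distribution. It assumes this standard Gaussian form, with N_T as the peak density (cm⁻³·eV⁻¹) and E_T and σ in eV; the exact parameterization and energy reference used by the TCAD tool are not stated here, and the function name and parameter values are hypothetical. The capture cross section does not appear in the density profile; it governs capture rates instead.

```python
import math


def gaussian_trap_density(energy_ev: float, n_t: float, e_t: float, sigma: float) -> float:
    """Trap density per unit energy (cm^-3 eV^-1) at energy_ev.

    Assumed Gaussian form: n_t * exp(-(E - e_t)^2 / (2 * sigma^2)),
    where n_t is the peak density, e_t the peak energy, and sigma the
    standard deviation (all energies in eV).
    """
    return n_t * math.exp(-((energy_ev - e_t) ** 2) / (2.0 * sigma ** 2))


# Illustrative donor-like profile with hypothetical parameter values.
N_TD, E_TD, SIGMA_D = 5.0e19, 2.0, 0.5
for e in (1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"E = {e:.1f} eV -> {gaussian_trap_density(e, N_TD, E_TD, SIGMA_D):.3e} cm^-3 eV^-1")
```

Raising N_T scales the whole curve, while raising σ broadens it; both add trap states, whereas shifting E_T only moves the peak.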
In this study, the Vth window is defined as the sum of |Vth,ers| and Vth,pgm at time = 10⁻² s. This Vth window spans the range from the ERS state to the highest PGM Vth state of an MLC.
Figure 4 shows a schematic of the ANN. First, datasets are generated from TCAD simulations of the 3D NAND Flash cell, and these datasets are used to design the ANN. A multilayer perceptron (MLP), in which several layers of perceptrons are connected in sequence, was used for training [28]. An MLP can represent nonlinear functions that a single-layer perceptron cannot, which makes it suitable for training complex models. In this study, the feedforward network contains one hidden layer with 15 nodes and uses tanh as the activation function. The network was trained in MATLAB to model the relationship between the inputs and outputs [29]. The Levenberg–Marquardt method was used for backpropagation training, and the cost function was the mean squared error (MSE), which indicates the training accuracy: the smaller the MSE, the better the ANN is trained. After training, the gradient-descent method was applied in the backward direction to determine the inputs that yield a large Vth window. Although this method is widely used to find the minimum of a cost function in ML, here the cost function was set so that the search proceeds in the direction of larger outputs. In summary, we trained the ANN using well-calibrated simulation data and then used the trained ANN to determine the inputs that resulted in the largest outputs.
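The sketch below shows the forward pass of an MLP with the 8–15–2 topology described above (eight standardized trap parameters, one tanh hidden layer of 15 nodes, two outputs). It is not the MATLAB model used in this work: the weights are random placeholders (Levenberg–Marquardt training is omitted), the linear output layer is an assumption, and all names are hypothetical.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 8, 15, 2  # 8 trap parameters -> 15 tanh nodes -> 2 outputs


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Random weights and zero biases for a dense layer (placeholder initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: y_j = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def mlp_forward(x, hidden, output):
    """One tanh hidden layer followed by an (assumed) linear output layer."""
    h = [math.tanh(v) for v in dense(x, *hidden)]
    return dense(h, *output)  # [|Vth,ers|, Vth,pgm] in standardized units


rng = random.Random(0)
hidden_layer = init_layer(N_INPUTS, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, N_OUTPUTS, rng)
x_example = [0.2, -0.5, 1.0, 0.0, -0.3, 0.8, -1.2, 0.4]  # one standardized input vector
print(mlp_forward(x_example, hidden_layer, output_layer))
```

Because tanh is smooth, the trained network is differentiable with respect to its inputs, which is what makes the later input-space gradient search possible.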
Table 1 summarizes the calibrated values and ranges of the eight inputs used for training and optimizing the ANN. The inputs were sampled uniformly at random within each range, and the corresponding |Vth,ers| and Vth,pgm values were obtained. NTD, NTA, CCSD, and CCSA were log-transformed, and then all parameters were standardized to improve the prediction accuracy of the ANN. The inputs are uncorrelated with one another. The calibrated values generally lie at the median of each range, and the ranges themselves are physically reasonable [27,30]. All other material properties of the CTN were fixed so that only the effect of the energetic-trap distributions was examined.
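The preprocessing described above can be sketched as follows: independent uniform sampling within each range, log10 of the four wide-range parameters, then z-score standardization. The ranges below are hypothetical placeholders rather than the Table 1 values, and whether the log-scaled parameters were sampled uniformly in linear or logarithmic space is not specified; this sketch samples in linear space.

```python
import math
import random
import statistics

# Hypothetical ranges; the actual Table 1 ranges are not reproduced here.
RANGES = {
    "NTD": (1e19, 1e20), "ETD": (1.5, 2.5), "CCSD": (1e-16, 1e-14), "sigmaD": (0.1, 0.5),
    "NTA": (1e19, 1e20), "ETA": (1.0, 2.0), "CCSA": (1e-16, 1e-14), "sigmaA": (0.1, 0.5),
}
LOG_PARAMS = {"NTD", "NTA", "CCSD", "CCSA"}  # parameters spanning orders of magnitude


def sample_inputs(n: int, rng: random.Random):
    """Draw n independent samples, each parameter uniform within its range."""
    return [{k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()} for _ in range(n)]


def preprocess(samples):
    """log10 the wide-range parameters, then z-score every column."""
    cols = {k: [math.log10(s[k]) if k in LOG_PARAMS else s[k] for s in samples] for k in RANGES}
    stats = {k: (statistics.mean(v), statistics.pstdev(v)) for k, v in cols.items()}
    standardized = {k: [(x - stats[k][0]) / stats[k][1] for x in v] for k, v in cols.items()}
    return standardized, stats  # keep stats to transform new inputs the same way


data = sample_inputs(1980, random.Random(1))
z, stats = preprocess(data)
print({k: round(statistics.mean(v), 6) for k, v in z.items()})
```

Keeping the per-column mean and standard deviation matters: any input proposed later by the optimizer must be mapped back through the same transform to recover physical trap parameters.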
[Figure caption (partial): … (Table 1) for two outputs (|Vth,ers| and Vth,pgm) in the ANN. An MLP was used as the learning algorithm to train the ANN. In the backward direction, the eight inputs are found in the direction that makes the two outputs large.]
Figure 5 shows an example of the MSEs of the training and validation sets versus epochs, where an epoch is one complete pass of the entire dataset through the neural network. A total of 1980 samples were used for ML training. First, they were split into training and test sets in an 80/20 ratio. Then, the training set was divided into five folds. Each fold served once as the validation set, and the mean of the five evaluations was used to assess the performance of the corresponding model. To increase the reliability of the model, we repeated this entire process five times with different numbers of hidden-layer nodes. The hidden layer with 15 nodes gave the lowest MSE, and the results of applying this model to the test set are shown in Figure 6. The five-fold cross-validation indicated that the model generalized well, with neither overfitting nor underfitting.
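The data split described above can be sketched with index bookkeeping alone: 1980 samples give 1584 training and 396 test samples, and five folds of the 1584 training samples hold 316 or 317 samples each. The function names and the interleaved fold assignment are illustrative choices, not necessarily those of the MATLAB workflow.

```python
import random


def train_test_split_indices(n: int, test_frac: float, rng: random.Random):
    """Shuffle indices 0..n-1 and split them into (train, test)."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k: int):
    """Yield (fit, validation) index lists; each fold is the validation set once."""
    folds = [indices[i::k] for i in range(k)]  # near-equal, disjoint folds
    for i in range(k):
        fit = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield fit, folds[i]


rng = random.Random(42)
train, test = train_test_split_indices(1980, 0.2, rng)
print(len(train), len(test))  # 1584 396
for fit, val in k_fold(train, 5):
    print(len(fit), len(val))
```

Model selection over hidden-layer sizes would wrap this loop: for each candidate size, train on `fit`, score on `val`, average the five validation MSEs, and keep the size with the lowest mean before a final evaluation on `test`.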
Figure 6a compares the simulated and estimated values of |Vth,ers| and Vth,pgm for the test set. Each input set yields two outputs simultaneously, and the estimation is good when the simulated and estimated symbols overlap. Figure 6b shows the error of each individual value. All errors are small and lie within the acceptable range of ±5%. Furthermore, the MSEs of |Vth,ers| and Vth,pgm are 0.7368 × 10⁻³ and 0.8472 × 10⁻³, respectively, indicating high prediction accuracy. These results confirm that the ANN was well trained. In addition, for our datasets the ANN was superior in learning speed and accuracy to other regression models such as random forest.
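The two accuracy checks mentioned above, MSE and per-sample relative error against a ±5% band, can be sketched as below. The sample values are made up, and the scale on which the reported MSEs were computed (volts or standardized units) is not stated, so the magnitudes here are not comparable to those in the text.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def percent_errors(y_true, y_pred):
    """Relative error of each prediction, in percent of the (nonzero) simulated value."""
    return [100.0 * (p - t) / t for t, p in zip(y_true, y_pred)]


def within_tolerance(y_true, y_pred, tol_percent: float = 5.0) -> bool:
    """True if every prediction lies within +/- tol_percent of its target."""
    return all(abs(e) <= tol_percent for e in percent_errors(y_true, y_pred))


# Hypothetical simulated vs. ANN-estimated |Vth,ers| values (volts).
simulated = [3.10, 2.85, 3.40]
estimated = [3.05, 2.90, 3.38]
print(mse(simulated, estimated), within_tolerance(simulated, estimated))
```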
Figure 7 shows the raw simulated values of the Vth window and the optimized values; all of the latter are larger than the former. The cost function was set so that the search proceeds toward a larger Vth window, and the gradient-descent method was used to evaluate the slope of the cost function. After random inputs were set within the ranges listed in Table 1, the corresponding Vth window was estimated using the trained ANN. Then, the inputs were set again and re-evaluated until the slope reached an extremum. In this way, the inputs that resulted in a large Vth window were found. Table 2 lists the optimized inputs that resulted in the largest Vth window after 2000 iterations.
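The optimization loop described above can be sketched as projected gradient ascent on the predicted Vth window, which is equivalent to gradient descent on its negative. Everything here is a hypothetical stand-in: `surrogate` replaces the trained ANN with a toy quadratic, the gradient is estimated by central finite differences rather than by backpropagating through the network, and clipping to the bounds is an assumed way of keeping inputs inside the Table 1 ranges.

```python
import random


def surrogate(x):
    """Hypothetical stand-in for the trained ANN: inputs -> [|Vth,ers|, Vth,pgm]."""
    return [2.0 - (x[0] - 0.3) ** 2, 3.0 - (x[1] + 0.2) ** 2 + 0.5 * x[0]]


def vth_window(x):
    """Objective to maximize: |Vth,ers| + Vth,pgm predicted by the surrogate."""
    return sum(surrogate(x))


def numerical_grad(f, x, h=1e-5):
    """Central finite-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def optimize_inputs(f, bounds, lr=0.05, iters=2000, seed=0):
    """Projected gradient ascent: step along +grad, then clip to the bounds."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for lo, hi in bounds]  # random start within the ranges
    for _ in range(iters):
        g = numerical_grad(f, x)
        x = [min(max(xi + lr * gi, lo), hi) for xi, gi, (lo, hi) in zip(x, g, bounds)]
    return x, f(x)


best_x, best_window = optimize_inputs(vth_window, bounds=[(-1.0, 1.0), (-1.0, 1.0)])
print(best_x, best_window)
```

Running `optimize_inputs` with several seeds and keeping the best result would mimic repeated random restarts; when an objective keeps rising toward a range limit, the clipped solution lands on that limit.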

Results and Discussion

Trap Parameter          Value
NTD (cm⁻³·eV⁻¹)         5.00 × 10¹⁹
ETD (eV)                2.00
CCSD (cm²)              1.00 × 10⁻¹⁵
σD (eV)                 0.50
NTA (cm⁻³·eV⁻¹)         8.00 × 10¹⁹
ETA (eV)                1.45
CCSA (cm²)              1.00 × 10⁻¹⁵
σA (eV)                 0.50
Figure 8 compares the calibrated and optimized energetic-trap distributions. The ML results indicate that NTD, NTA, σTD, and σTA strongly influence the Vth window. When these parameters increased, both |Vth,ers| and Vth,pgm increased significantly, because more available trap sites can capture more holes or electrons. When ETD decreased and ETA increased, both |Vth,ers| and Vth,pgm increased slightly, because the deeper energy of each distribution reduces the attempt-to-escape factor [31]. However, ETD and ETA mainly determine the retention characteristics; therefore, their correlation with the Vth window was weak. Similarly, CCSD and CCSA had small effects on the Vth window.
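A short worked estimate supports the trap-site argument. Assuming the standard Gaussian form g(E) = N_T exp[−(E − E_T)²/(2σ²)] and a profile lying mostly inside the CTN bandgap (so that its tails can be integrated approximately to ±∞), the total trap density is proportional to the product of the peak density and the standard deviation. Evaluated with the optimized values listed above, this gives the following rough totals; E_T and the capture cross section do not enter, consistent with their weak influence on the Vth window.

```latex
\begin{aligned}
N_{T,\mathrm{tot}} &= \int_{-\infty}^{\infty} N_T \exp\!\left[-\frac{(E-E_T)^2}{2\sigma^2}\right] dE = \sqrt{2\pi}\,\sigma N_T \\
N_{TD,\mathrm{tot}} &\approx 2.507 \times 0.50\ \mathrm{eV} \times 5.00\times10^{19}\ \mathrm{cm^{-3}\,eV^{-1}} \approx 6.27\times10^{19}\ \mathrm{cm^{-3}} \\
N_{TA,\mathrm{tot}} &\approx 2.507 \times 0.50\ \mathrm{eV} \times 8.00\times10^{19}\ \mathrm{cm^{-3}\,eV^{-1}} \approx 1.00\times10^{20}\ \mathrm{cm^{-3}}
\end{aligned}
```

Because the integral is linear in both N_T and σ, doubling either one doubles the number of available trap sites under this approximation.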
In summary, large NTD, NTA, σTD, and σTA resulted in a large Vth window owing to the larger number of available trap sites. In contrast, ETD, ETA, CCSD, and CCSA showed only a small correlation with the Vth window because retention characteristics were not considered in this study.
Figure 9 shows the experimental |Vth| values and the simulated |Vth| values optimized by ML. Here, the simulated Vth window was calculated from the best inputs in Table 2. In this case, the |Vth| error between the simulation and the ANN was within 4.34%, so the simulated Vth windows are reliable. The Vth window increased by 49% compared with the experimental value. In other words, the Vth window can be increased substantially by optimizing the energetic-trap distributions. These results therefore provide a guideline for maximizing the Vth window, although precisely controlling the fabrication process remains difficult.
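Both percentages in this paragraph are relative quantities. The sketch below shows the arithmetic with made-up numbers, since the absolute windows are not restated here; the function names are illustrative, and taking the simulated value as the reference for the error is an assumption.

```python
def percent_increase(baseline: float, new: float) -> float:
    """Relative increase of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


def relative_error_percent(reference: float, estimate: float) -> float:
    """Absolute relative error of `estimate` with respect to `reference`, in percent."""
    return 100.0 * abs(estimate - reference) / abs(reference)


# Hypothetical numbers: a 49% increase means new = 1.49 * baseline.
baseline_window = 10.0   # V, experimental (made up)
optimized_window = 14.9  # V, simulated with optimized inputs (made up)
print(round(percent_increase(baseline_window, optimized_window), 2))  # 49.0
```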

Conclusions
ML-based analysis was used to obtain optimized energetic-trap distributions of the CTN in 3D NAND Flash to improve the Vth window. The ANN modeled the relationship between eight inputs that determine the energetic-trap distributions and two outputs, |Vth,ers| and Vth,pgm. The ANN was trained using simulation data calibrated against experiments, and the resulting MSEs were small. We then used the gradient-descent method to determine the inputs that resulted in the largest Vth window. NTD, NTA, σTD, and σTA significantly influenced the Vth window; as they increased, the Vth window grew because of the larger number of trap sites. In particular, when the best inputs obtained by ML were employed, the Vth window increased by 49% compared with the experimental value. This study should enable the Vth window to be determined from the material properties of the CTN in 3D NAND Flash. More generally, this work suggests that ML can help solve complex nanomaterial problems accurately and optimize them rapidly.