Recent Progress on 3D NAND Flash Technologies

Since 3D NAND was introduced to the industry with 24 layers, the areal density has been increased more than tenfold and has exceeded 10 Gb/mm² with 176 layers. Physical scaling in the XYZ dimensions, including layer stacking and footprint scaling, enabled this density scaling. Logical scaling has been successfully realized as well: TLC (triple-level cell, 3 bits per cell) is now the mainstream in 3D NAND, while QLC (quad-level cell, 4 bits per cell) is increasing its presence, and several attempts and partial demonstrations have been made for PLC (penta-level cell, 5 bits per cell). CMOS under array (CuA) enabled die size reduction and performance improvements. Program and erase schemes are being investigated to address technology challenges such as the short-term data retention of the charge-trap cell and the large block size.


Introduction
After 2D NAND reached its scaling limit at around the 15 nm process node, 3D NAND was proposed as a solution for continued NAND scaling [1]. 3D NAND was introduced into production with 24 layers and MLC technology [2]. The scaling trend of the areal density of 2D NAND and 3D NAND, summarized from the NAND publications at IEEE ISSCC conferences, is shown in Figure 1. As seen in Figure 1, 3D NAND successfully replaced 2D NAND and has achieved more than 10 Gb/mm² areal density.
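As an illustrative sanity check (not a calculation from the cited publications), the areal density can be estimated from the layer count, bits per cell, pillar pitch, and array efficiency. The pitch and efficiency values below are purely assumed round numbers, not figures from any specific product.

```python
# Back-of-envelope areal density estimate for 3D NAND.
# All parameter values are illustrative assumptions.

def areal_density_gb_per_mm2(layers, bits_per_cell, pillar_pitch_nm,
                             array_efficiency):
    """Bits per mm^2 from layer count and pillar geometry.

    Assumes one cell per layer per pillar and a square pillar grid;
    array_efficiency accounts for non-array area (contacts, slits, ...).
    """
    pillars_per_mm2 = (1e6 / pillar_pitch_nm) ** 2
    bits_per_mm2 = pillars_per_mm2 * layers * bits_per_cell * array_efficiency
    return bits_per_mm2 / 1e9  # Gb/mm^2

# 176 layers, TLC, assumed ~150 nm pillar pitch, ~60% array efficiency
print(round(areal_density_gb_per_mm2(176, 3, 150, 0.6), 1))
```

With these assumed inputs the estimate lands in the low tens of Gb/mm², consistent with the ">10 Gb/mm²" figure quoted above.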
At the early stage of the 3D NAND development, various types of 3D NAND technologies were proposed. A comprehensive survey can be found in [32]. After extensive research on various technology options, vertical NAND string architecture with gate-all-around (GAA) cells was introduced to the industry for floating gate (FG) cells and charge-trap cells [2,33].
In this paper, the status of 3D NAND scaling is reviewed and discussed using ISSCC conference publications as a reference. This review mainly focuses on the vertical 3D NAND technology adopted in the industry. The physical and logical scaling of 3D NAND will be discussed, along with the recent progress since the prior review of 3D NAND [34], including progress on QLC technology and beyond.

Figure 2 shows the 3D NAND cell array architectures (adapted with permission from ref. [34], Copyright 2020 IEEE). The strings are placed in the vertical direction. Word lines (WLs) have a plate-like shape and are stacked vertically for 3D cell stacking. There are multiple select gates at the drain side (SGDs) in a block. The channel of the NAND string has a cylinder shape.

A block is a unit of the erase operation. As shown in Figure 3, there are two types of erase methods in 3D NAND: the body erase (Figure 3a) and the GIDL erase (Figure 3b) [1,35]. In the body erase, the NAND strings are connected to the Si-substrate, and holes are supplied to the NAND string from the Si-substrate, enabling the positive body potential required for erase. In the GIDL erase, the NAND strings are decoupled from the Si-substrate and are formed on an N+ source layer instead. During erase, electron-hole pairs are generated at the source and drain N+ junctions by the GIDL mechanism to supply holes to the NAND strings. The GIDL erase is used with the CMOS under Array (CuA) technology, which will be discussed later [33].

3D NAND Architecture and Operations
Figure 3. Erase schemes of 3D NAND. (a) The body erase scheme directly biases the channels at the erase voltage; holes are supplied from the Si-substrate. (b) The GIDL erase scheme generates electron-hole pairs at the source and drain junctions; the generated holes are supplied to the channels to boost the potential to around the erase voltage.
During program and read, one of the SGDs is selected in a block so that one NAND string can be selected per bit line (Figure 4).

Architectures of FG NAND and RG NAND
The floating gate (FG) cell technology was used in 2D NAND. In 3D NAND, in addition to the FG technology (FG NAND), replacement gate cell technology (RG NAND) is also utilized [33,35]. Figure 5 compares the cross-sections of the NAND strings for FG NAND and RG NAND.

In FG NAND, the FG storage nodes are separated between cells because the FG is made of conductive polysilicon. An advanced cell integration scheme is adopted to realize the FG separation [33]. In this scheme, the cells are formed outside the pillar holes. Therefore, the diameter of the pillar etching needs to be aligned with the final channel diameter of the NAND string.

In RG NAND, charge-trap cells with a silicon-nitride (SiN) storage layer are employed [2]. Because the SiN storage layer is a dielectric that can trap charges, it can be shared and continuous among the cells. The SiN storage film and the other films composing the charge-trap cells are formed after the dry etching of the pillar holes. Therefore, the diameter of the pillar holes for etching can be larger than the final channel diameter by the thicknesses of the cell dielectrics. This is advantageous for the pillar etching, especially for the very tall pillars needed when many cells are stacked vertically. The word line is formed of tungsten metal by replacing the originally stacked SiN films [36]; hence the name replacement gate NAND. RG NAND is thus the combination of the replacement gate technology and the charge-trap cell technology.

Band-Engineered Tunneling Dielectrics of the Charge-Trap Cell
In the FG cell, electrons are injected into or emitted from the FG by Fowler-Nordheim (FN) tunneling (Figure 6a). In the charge-trap cell, programming is similar: electrons are injected into the SiN storage by modified FN tunneling. For erase, holes are injected into the SiN storage by direct tunneling (DT erase) (Figure 6b). In the charge-trap cell, in order to enhance erase efficiency, the cell stack is engineered as shown in Figure 7a. First, band-engineered (BE) tunnel dielectrics have been introduced [37]. As BE-tunnel dielectrics, an ONO stacked film or a film with an engineered nitrogen profile can be used instead of the SiO2 tunnel layer. With BE-tunnel dielectrics, holes tunnel through only the thin oxide layer during erase, while the full ONO thickness is effective during retention. Second, a High-k/Metal gate is used to reduce unwanted electron injection from the control gate [38]. By combining the BE-tunnel layer and the High-k/Metal gate, good erase characteristics can be achieved with the charge-trap cell (Figure 7b).
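Why a thin tunneling path matters so much can be illustrated with a textbook one-dimensional WKB estimate of direct-tunneling transmission through a rectangular barrier. This is a generic approximation, not the model used in the cited works; the effective mass and barrier height below are assumed illustrative values.

```python
import math

HBAR = 1.054571e-34  # reduced Planck constant, J*s
M0 = 9.109383e-31    # electron rest mass, kg
Q = 1.602177e-19     # elementary charge, C

def wkb_transmission(thickness_nm, barrier_ev, m_eff=0.4):
    """WKB transmission T = exp(-2*kappa*t) for a rectangular barrier.

    m_eff and barrier_ev are illustrative assumptions, not measured
    parameters for any particular tunnel stack.
    """
    kappa = math.sqrt(2 * m_eff * M0 * barrier_ev * Q) / HBAR
    return math.exp(-2 * kappa * thickness_nm * 1e-9)

# Assumed picture: holes see only a ~2 nm oxide during erase
# (BE-tunnel stack), but an effective ~6 nm during retention.
t_erase = wkb_transmission(2.0, 4.5)
t_retention = wkb_transmission(6.0, 4.5)
print(t_erase / t_retention)  # many orders of magnitude larger
```

The exponential dependence on thickness is the point: shortening the tunneling path by a few nanometers changes the transmission by tens of orders of magnitude, which is what lets the BE stack erase efficiently yet retain charge.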

Data Retention Mechanisms of the Charge-Trap Cell
Short-term data retention has been a challenge for the charge-trap cell [38]. Figure 8 shows various mechanisms potentially causing the short-term data retention loss [39]. The first mechanism is charge migration and relaxation: after programming, the trapped charge in the SiN storage can move both laterally (lateral migration, LM) and vertically (vertical relaxation, VR). The second mechanism is detrapping from the SiN storage layer by trap-assisted tunneling (TAT). On top of these, there is trapping in the BE-tunnel oxide, as it includes SiN or a nitrogen-rich layer. Due to the very short distance between the trapping sites in the BE layer and the poly-Si channel, this detrapping can occur in a very short time. The impacts on the Vth distribution and the programming-algorithm solutions for the short-term data retention will be discussed in a later section.
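Detrapping of shallow charge is often described by a roughly logarithmic-in-time Vth shift: fast at first, then slowing. The toy model below is only a qualitative illustration of why the damage concentrates in the "short-term" window right after programming; the coefficients are arbitrary assumptions, not fitted device data.

```python
import math

def vth_shift_mv(t_s, a_mv=30.0, t0_s=1e-6):
    """Toy detrapping model: Vth shift grows as log10(1 + t/t0).

    a_mv (shift per decade) and t0_s (onset time) are arbitrary
    illustrative constants.
    """
    return a_mv * math.log10(1.0 + t_s / t0_s)

# Shift accumulated within the first second vs. after a full day
first_second = vth_shift_mv(1.0)
first_day = vth_shift_mv(86_400.0)
print(round(first_second, 1), round(first_day, 1))
```

Under this model, more than half of the day-one shift is already present after one second, which is why programming algorithms must anticipate the shift rather than wait it out.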


Conventional Scaling
The physical scaling of the 3D NAND array can be described as XYZ scaling, as shown in Figure 9. XY scaling means reducing the cell footprint. Z scaling means stacking more layers, which is often done together with layer pitch shrink (Z shrink) in order to minimize the increase in the physical height. Z scaling (stacking) has been the main scaling enabler so far. The ISSCC publications show steady progress in layer stacking, except in 2020, when the focus was on circuit design technologies for single-level cells (SLC) and quad-level cells (QLC) rather than on physical array scaling. In the latest achievement, 176-layer stacked 3D NAND has been demonstrated in both publication and mass production (Figure 10).

The effort for XYZ scaling has focused on improving the efficiency of the pillar (memory hole) layout [40]. Due to various layout space requirements, such as source line contacts, SGD-to-SGD separations, and block-to-block separations, the pillar layout is far from the ideal hexagonal close-packed (HCP) layout. There have been continuous improvements in the array layout, so that the pillar arrangement has become closer to HCP, enabling XY scaling (Figure 11).
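The gap between a simple square pillar grid and the ideal HCP arrangement can be quantified directly: at the same pillar pitch, HCP packs 2/√3 ≈ 1.155 times as many pillars per unit area. A minimal sketch (the pitch value is an arbitrary example):

```python
import math

def pillars_per_um2(pitch_nm, layout):
    """Ideal pillar (memory-hole) density for a given layout.

    pitch_nm is the center-to-center pillar distance.
    """
    p_um = pitch_nm / 1000.0
    if layout == "square":
        return 1.0 / p_um**2
    if layout == "hcp":
        # Each pillar occupies a rhombus of area p^2 * sqrt(3) / 2
        return 2.0 / (math.sqrt(3) * p_um**2)
    raise ValueError(layout)

sq = pillars_per_um2(150, "square")
hcp = pillars_per_um2(150, "hcp")
print(f"HCP gain: {hcp / sq:.3f}x")
```

This ~15% density headroom is the ceiling that the layout improvements described above are working toward.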

Disruptive Scaling
As a disruptive XY scaling approach, split cells have been investigated (Figure 12). In conventional 3D NAND, the cell has a cylinder shape. In the split cell, the cell is split into two parts so that the cell density increases. There are two different types of split-cell proposals: one is a planar-like split cell [41] and the other is a half-cylindrical cell [42,43].

The planar-like cell is similar to the 2D NAND cell, while the half-cylindrical cell can be seen as an evolution of the cylindrical 3D NAND cell. For both split cells, the challenges are the process integration of the cell split, the increased cell-to-cell interference, and the reduced gate-coupling ratio.

As discussed earlier, the pitch shrink of the WL layer is critical to managing the pillar height as more layers are stacked. In the charge-trap cell, the SiN storage layer is continuous between neighboring cells. With WL pitch scaling, trapped-charge migration between neighboring cells raises a reliability concern. To overcome this issue, SiN storage separation has been proposed (Figure 13) [44]. The process flow is similar to that of 3D FG NAND, in which the diameter of the pillar etching is smaller than in conventional 3D RG NAND. Therefore, the process integration needs to be well-managed for the successful implementation of the SiN storage separation.

3D NAND Array Logical Scaling by Cell Device Engineering
In addition to the physical cell density scaling, logical density scaling (i.e., more bits per cell) has been actively pursued in 3D NAND. Figure 14 shows the number of 3D NAND publications at ISSCC conferences. 3D NAND started as MLC technology and rapidly transitioned to TLC, owing to excellent cell characteristics and reliability. Recently, QLC technology has been introduced to mass production, and its presence in the publications has also been increasing. Currently, TLC is the mainstream for high-performance and high-endurance usages, while QLC is becoming mainstream for high-density and low-cost usages.
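Each extra bit per cell doubles the number of Vth levels that must fit in roughly the same Vth window, which is why each step beyond TLC is harder than the last. The sketch below makes this explicit, assuming a fixed 6 V window purely for illustration:

```python
def vth_levels(bits_per_cell):
    """Number of Vth states needed for a given bits-per-cell."""
    return 2 ** bits_per_cell

def window_per_level_mv(vth_window_v, bits_per_cell):
    """Average Vth budget (distribution width + margin) per level,
    assuming all levels share one fixed Vth window (an assumption)."""
    return 1000.0 * vth_window_v / vth_levels(bits_per_cell)

for name, bpc in [("TLC", 3), ("QLC", 4), ("PLC", 5), ("HLC", 6)]:
    print(name, vth_levels(bpc), round(window_per_level_mv(6.0, bpc)))
```

Under this assumption, the per-level budget shrinks from hundreds of millivolts at TLC to well under 100 mV at HLC, which is why noise sources such as RTN become limiting.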
Recently, "beyond QLC" efforts have been reported for both FG and charge-trap cells. PLC (5 bits per cell) Vth distributions were shown with FG cells (Figure 15a) [45]. In this work, the excellent data-retention properties of the FG cell were identified as critical to enabling PLC. HLC (6 bits per cell) Vth distributions were experimentally shown at a cryogenic temperature of 77 K for charge-trap cells (Figure 15b) [46]. In that work, it was shown that random telegraph noise (RTN) improves at 77 K, while it degrades at 300 K relative to 358 K. Performance and reliability characteristics, as well as the operational conditions, warrant further study to enable bit-per-cell scaling beyond QLC. Figure 15 is adapted with permission from refs. [45,46], Copyright 2020 and 2021 IEEE, respectively.

Write bandwidth is given by

Write bandwidth = Page size × Number of planes/tProg (1)

Therefore, large write parallelism (= page size × number of planes) and short tProg are critical to achieve high write bandwidth. Figure 16 shows the trend of TLC write bandwidth from the ISSCC publications. There has been incremental improvement over the years and a steep increase in recent years.

The former is due to the continuous improvement of TLC tProg. Figure 17 shows the TLC tProg trend published or estimated from ISSCC publications. In TLC RG NAND, all seven programmed states are programmed in a single programming pass. The improvements in tProg are realized by combinations of WL and BL bias time reduction, fine tuning of the programming voltage to compensate for the variability of cell characteristics across the pillar, and the reduction in program verify operations. Figure 18 shows the center XDEC (WL-driver) architecture, which shortens the time for WL loading [31].

The latter (the steep increase in the write bandwidth) is realized by the increased number of planes owing to the CMOS under Array (CuA) architecture. Figure 19 shows three different CMOS architectures utilized in 3D NAND. In CMOS outside array (CoA), the CMOS circuits are placed next to the array; therefore, the die size increases if the CMOS area increases. To increase the parallelism, more CMOS circuits such as page buffers need to be placed, which increases the die size. In CMOS under Array (CuA), the CMOS circuits are placed under the array. More parallelism (more planes) can be realized with CuA because a larger area is available for CMOS circuits. Another variation is to place the CMOS over the array by using wafer bonding technologies (wafer on wafer, WoW) [47]. With this architecture, the CMOS is processed separately from the array by using a dedicated wafer for CMOS; therefore, the process flow can be optimized for the CMOS devices and interconnect.
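Equation (1) can be evaluated with illustrative numbers to see how parallelism and tProg trade off. The page size, plane count, and tProg below are assumed round numbers, not values from any specific device:

```python
def write_bandwidth_mb_s(page_size_kb, num_planes, tprog_us):
    """Per-die write bandwidth from Equation (1):
    page size x number of planes / tProg."""
    bytes_per_tprog = page_size_kb * 1024 * num_planes
    return bytes_per_tprog / (tprog_us * 1e-6) / 1e6  # MB/s

# Assumed TLC-class example: 16 KB page, 4 planes, 400 us tProg
print(round(write_bandwidth_mb_s(16, 4, 400), 1))  # 163.8 MB/s
```

Doubling the plane count doubles the bandwidth at the same tProg, which is exactly the lever that CuA's extra under-array CMOS area provides.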

QLC Program Schemes
QLC tProg is much longer than TLC tProg (as shown in Figure 21) in order to realize the tight Vth distributions. As discussed earlier, RG NAND with charge-trap cells exhibits the short-term data retention phenomenon, which causes the Vth distributions to shift and widen right after programming. To realize the tight Vth distributions required for QLC, it is important to manage the short-term data retention effects.

The coarse-fine programming scheme has been introduced for QLC RG NAND [26]. As shown in Figure 22a, all 16 levels are programmed in the first pass with relatively wide distribution widths. Due to the short-term data retention, the Vth distributions shift and widen right after the coarse programming. The fine programming is then performed as the second programming pass, which tightens the Vth distributions by touching up the distribution tails caused by the short-term data retention. This scheme can be called the 16-16 scheme, as 16-level programming is performed twice. To reduce QLC tProg, an 8-16 scheme was proposed (Figure 22b). With the 8-16 scheme, only eight levels are programmed in the coarse pass, contributing to the tProg reduction.
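The coarse-fine idea can be sketched as a toy Monte Carlo experiment for a single programmed state. This is only an illustration of the mechanism, not a device model: the verify level, step sizes, and retention-shift statistics are all arbitrary assumptions.

```python
import random

random.seed(0)  # reproducible toy example

def ispp_program(vth, verify_v, step_v):
    """Incremental-step-pulse programming: pulse until verify passes."""
    while vth < verify_v:
        vth += step_v * random.uniform(0.8, 1.2)  # pulse-to-pulse variation
    return vth

TARGET = 3.0  # illustrative verify level for one of the 16 states

# Coarse pass: large step -> fast, but a wide distribution
coarse = [ispp_program(0.0, TARGET, 0.30) for _ in range(1000)]
# Short-term data retention: Vth shifts down and widens after coarse
relaxed = [v - abs(random.gauss(0.15, 0.05)) for v in coarse]
# Fine pass: small step touches up only cells that fell below verify
fine = [ispp_program(v, TARGET, 0.05) for v in relaxed]

spread = lambda xs: max(xs) - min(xs)
print(spread(relaxed) > spread(fine))  # fine pass tightens the lower tail
```

The fine pass raises only the cells that drifted below the verify level, clamping the lower tail of the distribution; this is the "touch-up" role described above.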
In FG NAND, the first programming pass has only four levels; the 16 levels are then completed in the second pass [30]. This is possible because the short-term retention loss is much smaller in the FG cell and does not require the touch-up operation.
So far, 1.6-2 msec tProg has been reported for QLC and ~0.4 msec tProg has been reported for TLC. For the further enhancement of QLC tProg, solving the short-term data retention by programming scheme and cell improvement is critical.

Block Size Scaling
With the number of stacked WLs increasing, the block size increases. Figure 23a shows the TLC block size as a function of the number of stacked WLs, with the number of SGDs per block as a parameter. Typically, FG NAND uses 12-16 SGDs/block, while RG NAND uses 4-8 SGDs/block. Given that the block is the minimum granularity of erase, the increase in the block size could increase the system burden of data management and degrade system performance. To mitigate this problem, the block-by-deck scheme was proposed (Figure 23b) [30]. In this scheme, the NAND string is divided into multiple segments (three segments in this example) and each segment is treated as a different block. During the erase operation, the entire pillar is biased to the erase voltage. The WLs of the selected deck block are grounded, while the WLs of the unselected deck blocks are ramped to the channel potential. In this way, the cells in the unselected deck blocks are inhibited from erase.

Figure 23b. Block-by-deck erase scheme: the conventional physical block is divided into three logical blocks [30]. Reprinted with permission from ref. [23], Copyright 2021 IEEE.
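The deck-selective bias condition above can be sketched as a small helper. The 20 V figure is an assumed illustrative erase voltage, not a value from the cited work.

```python
ERASE_V = 20.0  # illustrative channel erase voltage (assumption)

def deck_wl_bias(num_decks, selected_deck):
    """WL bias per deck during a block-by-deck erase.

    The selected deck's WLs are grounded, so its cells see the full
    channel-to-gate field and erase; the unselected decks' WLs follow
    the channel potential, so their cells are inhibited.
    """
    return [0.0 if deck == selected_deck else ERASE_V
            for deck in range(num_decks)]

# Three decks, erase only the middle logical block:
print(deck_wl_bias(3, 1))  # [20.0, 0.0, 20.0]
```

Because the whole pillar is biased to the erase voltage regardless of deck selection, only the WL biases distinguish erased decks from inhibited ones.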

Conclusions
3D NAND scaling has been successfully achieved: layer stacking has reached 176 layers, and QLC has been introduced in both FG NAND and RG NAND. Research on PLC is active, with partial demonstrations of cell capability. CMOS under array (CuA) has been widely adopted and enables performance enhancement by increasing the number of planes. For future scaling, on top of continuous XYZ physical scaling, disruptive technologies such as split cells have been proposed. Program and erase schemes are being developed further to solve the cell reliability and block size scaling challenges.
