Architecture and Process Integration Overview of 3D NAND Flash Technologies

In the past few decades, NAND flash memory has been one of the most successful nonvolatile storage technologies, and it is commonly used in electronic devices because of its high scalability and reliable switching properties. To overcome the scaling limit of planar NAND flash arrays, various three-dimensional (3D) architectures of NAND flash memory and their process integration methods have been investigated in both industry and academia and adopted in commercial mass production. In this paper, 3D NAND flash technologies are reviewed in terms of their architecture and fabrication methods, and the advantages and disadvantages of the architectures are compared.


Introduction
Recently, the demand for mobile electronic devices has been increasing owing to a no-contact lifestyle. This is leading to enormous market growth in storage memory, particularly NAND flash, because the transition from hard-disk drives to solid-state drives is already established, driven by the requirements for faster operation and lower power consumption. Since NAND flash was commercialized for mass production in the early 1990s, reducing the cost per bit has been one of the most important goals for NAND flash memory, and there have been continuous efforts to scale down the effective cell size. Interestingly, unlike logic devices, NAND flash cells have been integrated into three-dimensional (3D) architectures for further scaling.
The 3D NAND flash architectures have been realized in various structures and can be categorized into two types based on the stacking direction: channel stacks and gate stacks. Figure 1 shows the structure-year classification of 3D NAND flash. Toshiba announced the first 3D stacked flash based on a gate-stacked structure, such as bit cost scalable (BiCS) [1] and pipe-shaped BiCS (PBiCS) [2] with charge trapping layers, and a horizontal-channel-type floating gate (HC-FG) [3] with a floating gate. In addition, Samsung has developed various 3D structures based on both stack methods. Examples are the terabit cell array transistor (TCAT) flash [4], the vertical recess array transistor (VRAT) flash, the vertical stacked array transistor (VSAT) flash [5] structure for efficient connection of peripheral circuits, the vertical gate (VG) NAND [6] with a vertical gate and channel-stacked structure, and V-NAND, which was the first 3D NAND product commercialized by Samsung [7]. SK Hynix has also developed several 3D architectures based on both methods: a dual control gate with a surrounding floating-gate (DC-SF) [8] structure, a hybrid 3D NAND structure [9] with mixed gate and channel stacks, a stacked memory array transistor (SMArT) [10] using an oxide-nitride-oxide (ONO) film to minimize the stack height, and a metal control gate last process (MCGL process) [11] using a metal gate based on the DC-SF structure. Channel-stacked 3D architectures have also been investigated, including the dual-channel 3D NAND structure [12-16] from Macronix and a single crystalline Si-stacked array (STAR) structure [17] from Seoul National University. In this review, the 3D NAND architectures are analyzed and compared from a structural perspective. First, in Section 2, the 3D NAND flash architectures are discussed based on the stack method and their operational characteristics. Section 3 then compares and analyzes the fabrication methods of the structures based on the gate fabrication method.

Three-Dimensional NAND Flash Architectures
The architectures of 3D NAND flash can typically be classified into gate-stacked structures [1,2,4,5,7,8,10,11,18-24] and channel-stacked structures [3,6,9,12-17], which are illustrated in Figure 2. In the gate-stacked structure, a channel is formed after stacking gate layers, and the current flows in the vertical direction. The cell structure is mostly based on gate-all-around (GAA), because a channel hole is filled with a gate dielectric stack and polycrystalline silicon (poly-Si); inherently, this structure has several issues arising from the hole diameter. In contrast, in a channel-stacked architecture, the current flows in the lateral direction, as in a conventional planar NAND array. The scaling of a channel-stacked structure is limited by the ONO thickness in the bit line (BL) pitch, and the word line (WL) channel length must also be maintained to keep an effective memory window. Moreover, it is difficult to connect the BLs with each layer, owing to their horizontal and parallel orientations. Currently, most commercialized 3D NAND architectures use the gate-stacked structure, owing to the abovementioned problems in the channel-stacked structure [25,26].
Figure 3a,b compare the cross-sectional views of the BiCS and PBiCS flash structures, respectively, both based on the gate-stacked structure and the gate-first fabrication method. The BiCS flash structure was the first proposed 3D NAND architecture with a high density and a low cost per bit. In the BiCS structure, the vertically stacked gates are composed of a lower select gate (LSG), an upper select gate (USG), and control gates (CGs), as shown in Figure 3a. Because the channel pillars are not directly connected to the p-well in this structure, gate-induced drain leakage (GIDL) is used as the erase mechanism [1,27-30].
The PBiCS architecture addresses the limitations of the BiCS flash, including the program/erase window, retention properties, high resistance of the source line (SL), and multibit operation. PBiCS flash has a U-shaped string structure instead of a straight one, and a pipe connection (PC) is formed at the bottom between two adjacent gates. This structural difference lowers the SL resistance because the SL can be accessed by the first and second metal layers, similar to a conventional planar NAND flash array. In terms of reliability, the PBiCS flash has better retention characteristics thanks to less damage to the trapping layer during the fabrication process, and the low resistance of the metal wiring and the steeply controlled diffusion profile at the SL give the PBiCS better cut-off characteristics [2,31,32].
Figure 4a shows a cross-sectional view of the TCAT architecture. The TCAT uses metal CGs, owing to the use of the gate-last fabrication method. Poly-Si channel holes are formed using the punch-through method, as in the BiCS structure; however, a notable difference is that the TCAT channel is connected to a p-type substrate, which allows the bulk erase operation. The two poly-Si channels in the structure share a common source line (CSL) formed by the WL cut. Figure 4b shows that the circuit diagram of the TCAT cell array is equivalent to that of a 90°-rotated planar flash array per string layer, and its bottom end is connected to the CSL. The string selection line (SSL) and ground selection line (GSL) transistors are at the top and bottom of a string, respectively, and flash cells are placed between them in series [4,33].
Figure 5a shows the VRAT architecture containing a planarized-integration-on-the-same-plane (PIPE) structure for effective vertical interconnection [5]. All the WLs are exposed on the same plane simultaneously during the chemical-mechanical planarization (CMP) process.
The WLs do not need to be etched consecutively for each layer, unlike in a stair-like method, which implies that the PIPE structure increases the efficiency of the WL interconnection without an additional lithography step after stacking. The VSAT, shown in Figure 5b, is aimed at solving the difficulties in the fabrication of the VRAT structure. Instead of the gate-last process, in which an undercut region is formed and filled, the VSAT structure is fabricated using the gate-first method with doped poly-Si. Because the undercut process is not required for the VSAT structure, it is easy to realize vertical strings along with the stacked WLs, which are similar to the currently commercialized 3D NAND flash string. In addition, the dual-gate structure of the VSAT architecture effectively elongates the channel length without cell density loss, leading to a reduction in the off-current [5,33-35].
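The series arrangement described above for the TCAT string (select transistors at both ends with the flash cells between them) can be illustrated with a minimal read sketch. It relies on the standard textbook picture of a NAND read, which this review does not spell out: unselected WLs receive a pass voltage high enough to turn on any cell, and the selected WL receives a read voltage between the erased and programmed threshold voltages. The names `ReadBias` and `string_conducts` and all voltage values are hypothetical, and each transistor is idealized as a switch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ReadBias:
    """Gate biases applied to one string during a read (illustrative values)."""
    v_read: float        # voltage on the selected WL
    v_pass: float        # voltage on every unselected WL
    ssl_on: bool = True  # string selection transistor conducting
    gsl_on: bool = True  # ground selection transistor conducting


def string_conducts(cell_vts: Sequence[float], selected: int, bias: ReadBias) -> bool:
    """Return True if the whole series string conducts for this bias."""
    if not 0 <= selected < len(cell_vts):
        raise IndexError("selected cell is outside the string")
    # Either select transistor being off isolates the string.
    if not (bias.ssl_on and bias.gsl_on):
        return False
    for index, vt in enumerate(cell_vts):
        gate_voltage = bias.v_read if index == selected else bias.v_pass
        # Idealized switch: a cell conducts only when its gate exceeds its VT.
        if gate_voltage <= vt:
            return False
    return True


if __name__ == "__main__":
    # Assumed thresholds: erased cells negative, programmed cells positive.
    vts = [-2.0, 3.0, -2.0, 3.0]
    bias = ReadBias(v_read=0.0, v_pass=6.0)
    for i in range(len(vts)):
        print(f"cell {i}: string conducts = {string_conducts(vts, i, bias)}")
```

Because the cells are in series, a single non-conducting transistor blocks the whole string, which is why both select transistors and every unselected cell must be turned on before the selected cell's state can be sensed.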
A schematic and unit cell diagram of the VG NAND structure are illustrated in Figure 6 [6]. Here, the channels are horizontally stacked, unlike in the BiCS and TCAT structures, and the structure is almost identical to that of a planar flash array, except for the SSL, which means that the effective cell size per layer is maintained at 4F². The source and the active body (V_bb) are connected to the CSL, enabling the body erase operation, and the required number of SSLs depends on the number of active layers. Common BLs and WLs are used between the stacked active channel layers, and a string needs to be selected by SSL biasing to program a specific target cell.
Unlike the above-mentioned approaches, a floating gate (FG) is used as the charge storage layer in the DC-SF structure, as shown in Figure 7a [8]. In this structure, an FG is surrounded by two CGs arranged in the vertical direction and is covered by an inter-poly dielectric (IPD) and a tunneling oxide. Owing to the thick IPD layer, no tunneling occurs in the CG direction. In addition, charge spreading rarely occurs thanks to the isolated charge in the FG, unlike in the BiCS, as shown in Figure 7b. In the case of the BiCS, charge spreading occurs because the charge trapping layer (CTL) is continuous across adjacent cells, while it does not occur in the DC-SF, because the FG of each cell is separated and isolated. Moreover, cell-to-cell interference is negligible, owing to the shielding effect of the two CGs. The DC-SF also has a wide program/erase window, low operating voltage, and high coupling ratio. The DC-SF can have good retention characteristics by using the FG; however, it is not advantageous for 3D stacking because its memory film is thicker than a CTL, which limits the scaling of the channel hole dimension.
Another 3D NAND structure with an FG is the HC-FG structure, as shown in Figure 8a [3].
Unlike in the DC-SF structure, horizontal channels are stacked in the HC-FG architecture; however, the unit cell is not surrounded by CGs. Consequently, the FG cells can be stacked by the channel-first process, similar to that in a conventional 2D planar flash array, and a 3D structure can be implemented at a low cost. In addition, the HC-FG structure is combined with a layer select transistor (LST), as shown in Figure 8b, to enable additional bit cost scaling. The HC-FG and LST structures are connected via an SSL and share the gate electrodes. The LST structure requires a simple and low-cost process, because impurity regions can be incorporated simultaneously using a self-aligned method, owing to a stair-like structure.
The VG NAND architecture has both the WL and the SSL in the same active layer; therefore, the cell density becomes worse as the number of stacked layers increases. This is improved in a hybrid 3D NAND flash, in which the density is increased by placing the WL and the SSL in the same string, as shown in Figure 9a [9]. In this hybrid structure, the GAA string selector and the metal-alumina-nitride-oxide-silicon (MANOS) cells form string lines, and each channel layer is connected to the BL through the SSL. This structure combines a vertical channel for the GAA string selectors with horizontal channels for the MANOS cells. The cell string exhibits high channel controllability, owing to the GAA structure, and the double-gate memory cell provides a sufficient threshold voltage (V_T) window for multibit operation.
Figure 9b shows the SMArT structure [10], which minimizes the stack height by using the ONO layer and obtains a low-resistance metal gate by employing the gate-last process. The SMArT structure shows superior V_T distribution and endurance compared to planar FG cells; however, the retention characteristics need to be improved.
In general, the channel-stacked method can have the same pitch size as a 2D NAND architecture; however, it has the issue that the number of SSLs needed for target cell access (decoding) increases as the number of stacked layers increases. To address this issue, Macronix proposed several structures for decoding the SSL [12-16,36]. Figure 10 shows the architecture of an island-gate decoded VG structure with channel-stacked BE-SONOS (bandgap engineered SONOS) [37], containing an island gate for the SSL selection [12]. Unlike the VG NAND structure, the presence of an n-type doped poly-Si buried channel allows a junction-free structure, and no additional junction implantation is required. In the island-gate decoded VG structure, the intersections of the WL, BL, and SSL planes are used for cell decoding, and the WLs and BLs are grouped into planes.
Another gate-stacked 3D NAND structure from Macronix is the single-gate vertical channel (SGVC) [38]. In the SGVC structure, the cell transistor is not based on a nanowire channel of a GAA structure but instead on a flat channel in the WL trench. Figure 11a presents a comparison of the GAA and SGVC cell structures. The SGVC structure is a flat channel-based charge trapping device and has an initial V_T distribution and short channel effect similar to those of the GAA structure, owing to the ultra-thin body. Compared to the curvature-shaped GAA structure, the SGVC cell structure has advantages in critical dimension formation and channel hole etching. Figure 11b shows the layouts of the GAA VC and SGVC structures. Compared to the GAA VC structure, the SGVC has approximately 2.4 times the memory density for the same number of stacked layers [38-41].
Two variants of the SGVC have been reported, a U-turn structure and a bottom-source structure [39,40]. In the former, an ONO CTL and poly channels are deposited on the WL trench. The thin poly channel layers are separated by the BL cut process, and the CTL and channel are controlled by independent WLs.
Owing to the use of a flat channel cell transistor, the scalability of the U-turn SGVC structure is comparable to that of 2D NAND flash, and its etching controllability is better than that of a GAA structure. The bottom-source structure has almost the same characteristics as the U-turn structure, except that the thin poly channels are connected to the bottom n+ substrate. The gate-first process for the bottom-source SGVC fabrication, similar to the BiCS, causes ONO layer damage, and a two-step poly channel process is required to protect the ONO layer. In contrast, the U-turn structure produced by a one-step poly channel process has an extremely thin body structure and a better subthreshold swing distribution [38-42].
Figure 13a shows the STAR NAND flash architecture with a GAA unit cell structure [17]. Because the STAR NAND flash is based on the channel-stacked method, it can have the same minimum cell size as the conventional 2D planar NAND flash structure. Single-crystalline Si nanowire channels are stacked by employing a Si/SiGe epitaxial growth process. However, SiGe-selective etching is required for nanowire channel formation and isolation after the multistacking of the Si/SiGe layers. In the gate-stacked method, poly-Si is generally used as the channel material, because epitaxial growth is difficult in narrow and deep channel holes. Since single-crystalline Si is used as the channel layer in this structure, the STAR NAND flash can have relatively uniform V_T and stable BL current distributions. Compared to the VG NAND architecture, the single-crystalline Si channel solves the uniformity issues caused by the defects and the grain boundaries of the poly-Si channel, and it exhibits better performance with the GAA cell structure. Figure 13b shows the unit structure of the STAR NAND flash [43].
The channel-stacked 3D NAND flash structure requires SSLs for additional address access, unlike a 2D planar NAND flash, and the number of SSLs increases as the stack layer increases.
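The density penalty of the extra SSLs can be made concrete with a rough estimate. The sketch below starts from the 4F² cell size per layer quoted above for the VG NAND and adds two simplifying assumptions that are not taken from the source: each active layer needs one SSL, and every select gate (SSL or GSL) occupies the same footprint along the string as one WL. The function names and the choice of 64 WLs per string are hypothetical.

```python
def effective_cell_area_f2(n_layers: int, n_wl: int,
                           ssl_per_layer: int = 1, n_gsl: int = 1) -> float:
    """Footprint per bit, in units of F^2, for a channel-stacked string.

    Assumes a 4F^2 footprint per gate position per string and that select
    gates consume the same footprint as a WL (a simplification).
    """
    if n_layers < 1 or n_wl < 1 or ssl_per_layer < 0 or n_gsl < 0:
        raise ValueError("layer and gate counts must be positive")
    select_gates = ssl_per_layer * n_layers + n_gsl
    footprint_f2 = 4.0 * (n_wl + select_gates)  # shared by all stacked layers
    bits = n_wl * n_layers                      # one bit per WL per layer
    return footprint_f2 / bits


def ssl_overhead_factor(n_layers: int, n_wl: int) -> float:
    """Ratio of the estimated area per bit to the ideal 4F^2 / n_layers."""
    ideal = 4.0 / n_layers
    return effective_cell_area_f2(n_layers, n_wl) / ideal


if __name__ == "__main__":
    for layers in (1, 8, 32):
        area = effective_cell_area_f2(layers, n_wl=64)
        factor = ssl_overhead_factor(layers, n_wl=64)
        print(f"{layers:2d} layers: {area:.4f} F^2 per bit, overhead x{factor:.3f}")
```

Under these assumptions the area per bit still falls as layers are added, but the gap to the ideal 4F²/N widens with the layer count, which is the motivation for the SSL decoding schemes discussed in this section.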

Fabrication Methods of 3D NAND Flash
Gate-stacked 3D NAND architectures can be fabricated using two methods, gate-first and gate-last, and the most representative structures fabricated by these methods are BiCS [1] and TCAT [4], respectively. Figure 14a shows the gate-first process, in which WLs are stacked first, and subsequently, channel holes are etched. The channel holes are filled with a charge-trapping dielectric and poly-Si channel layers [31]. Figure 14b shows the gate-last (gate replacement) process, in which oxide/nitride multilayers are deposited, followed by hole etching and channel poly-Si deposition. An additional process step, called the WL cut, is performed between the channel poly plugs. The WL cut is performed by dry etching; the nitride layer is then removed, and the gate dielectric layers and metal gate are deposited. In general, the use of a metal gate provides a faster erase speed, a lower program/erase voltage, and a wider V_T margin by suppressing unwanted backward Fowler-Nordheim tunneling current [44]. The gate-first method of the BiCS [1] structure has a problem in that the hole-etch size is affected by the gate dielectric layers. In contrast, the biconcave structure produced by the WL cut process of the gate-last method prevents lateral charge loss [45] but increases process difficulty [4].
Figure 15a shows the fabrication process of the BiCS structure. LSG, memory string, and USG transistors are fabricated separately, and poly-Si is used as the gate material. A transistor channel and a memory plug are formed by hole etching, using a punch-through method. Silicon nitride and tetraethoxysilane (TEOS) layers are formed by low-pressure chemical vapor deposition (LPCVD) for an ONO stack in the etched hole. Arsenic ions are implanted and activated in the LSG source and drain. The CG formation proceeds in the reverse order of the conventional SONOS deposition. The edge of the CG is etched in the form of stair-like steps by reactive ion etching.
The whole stack is separated into two blocks by a slit to minimize disturbance. The USG operates as a row address selector using a line pattern and is simultaneously connected to the via hole, BL, and peripheral circuit [29].
Figure 15b shows the process flow of the PBiCS structure string and the PC formation. A memory hole is formed by hole etching, and a sacrificial film is deposited. A PC is formed on the sacrificial film, which is followed by memory layer deposition. After the SG formation, the U-shaped sacrificial layer is removed, and the memory films and silicon-body layer are deposited for CG formation on the memory hole, which yields the pipe-shaped NAND string structure with better reliability characteristics [2].
Figure 16a shows the process flow of the VRAT structure. First, oxide-nitride stacks are deposited on the Si mesa in order, and an active region is defined by patterning and etching. An undercut, where each flash cell will be placed, is created in each oxide layer by wet etching using a buffered oxide etchant (BOE). All the gate stacks, including the oxide-nitride-oxide and the poly-Si gate, are sequentially deposited by LPCVD, with an improved step coverage. Subsequently, an etch-back process leaves the WL electrodes only in the undercut regions, and they are separated from each other. The exposed part of the multi-stacked layers on the formed Si mesa is flattened by the CMP process, and the WL electrode is exposed. Then, each string is isolated by a poly-Si etch, followed by the contact process for WLs and BLs [35]. The VSAT structure simplifies the overall process compared with the VRAT by adopting a gate-first method, as shown in Figure 16b. Gate electrodes and isolating films (nitride) are sequentially deposited on a Si mesa. Subsequently, an active region is formed, and the CMP process is conducted immediately without the VRAT undercut process, so that each WL is exposed on the same plane.
Subsequently, gate dielectric layers and a poly-Si layer as the channel material are deposited, and each vertical string is separated by lithography and etching [5]. The direction of the WL and BL for the VG NAND structure is changed in a channel-stacked structure for simple WL interconnection, as shown in Figure 17. An n+ poly-Si BL is formed first, and n+ poly-Si WLs are then formed on top of it in a crossing direction. Subsequently, p-type poly-Si multi-active layers are deposited, and n-type ion implantation is performed to form an SSL layer. After the deposition of an interlayer dielectric between the multi-active layers, an ONO stack is deposited over the active regions, and the gate is formed vertically. Thanks to the buried BLs and WLs formed at the early stage of the process flow, the interconnection to them can be easily accomplished [6].
The DC-SF NAND flash structure utilizes a surrounding floating gate as a charge storage layer instead of a CTL, as shown in Figure 18a. Oxide and poly-Si layers are deposited first. Subsequently, a hole is etched for the channel region, and the oxide layer is recessed to form the FG regions. The IPD and FG layers are deposited in order as a blocking oxide and charge storage layer, respectively, and each FG is isolated by wet etching. Lastly, a tunnel oxide and poly-Si layers are deposited in the channel hole [8]. Thanks to the FG structure, the DC-SF has better retention characteristics than other 3D SONOS flash architectures; however, the use of the poly-Si gate in the DC-SF causes several issues, including a high gate resistance, IPD damage during the FG separation process, field confinement due to the horn-shaped FG, and GIDL during erasing due to the floating channel.
To mitigate these issues, the MCGL process has been demonstrated, as shown in Figure 18b. First, oxide and nitride layers are deposited on an n+/p-Si substrate. Subsequently, a channel hole is etched, and FGs for individual flash cells are formed in the recessed regions after isotropic oxide etching. Then, a tunnel oxide is deposited, and a hole is etched into the Si substrate for channel contact. The channel hole is filled with poly-Si, which is directly connected to the substrate. Following this, the nitride is removed, and the gate stack, including a high-k IPD film and a tungsten metal gate, is deposited on the FG. In this structure, the tungsten metal gate gives a low WL resistance compared to a poly-Si gate, and the bulk erase operation becomes possible thanks to the direct connection between the substrate and the poly-Si channel [11].
The STAR NAND flash has the unique feature of single crystalline Si nanowire channels, and its fabrication method is described in Figure 19. Initially, Si/SiGe layers are epitaxially grown, and an active region is formed by using oxide/poly-Si/oxide layers as an etching hard mask. Subsequently, n-type and p-type ions are implanted in the left BL region and the right body region, respectively. Then, an additional oxide layer is deposited and etched as a buttress to prevent the collapse of long Si channels during the selective SiGe layer etch. After the selective SiGe etching is carried out, an oxide layer is re-deposited to fill the gap between the Si channels. Oxide patterning for the WL region is followed by isotropic etching of the oxide to expose the Si channels and make a gate stack. The width of the buttress oxide (B) should be greater than the width of the oxide (A) of the channel region to be removed so that the buttress oxide can remain with the reduced width (B→C). Subsequently, the gate stacks, including ONO dielectrics and tungsten, are deposited, and planarization is performed to form WL, SSL, and GSL gates.
The flash cell, SSL, and GSL transistors are self-aligned using the damascene gate process. Since the STAR architecture is based on the channel-stacked structure, a stair-shaped BL contact is needed [43]. Thanks to the single crystalline Si channel, the STAR can feature better electrical characteristics compared with poly-Si channel flash structures, but it has a difficult fabrication method. In addition, it is hard to increase the number of stacked layers, considering Si/SiGe sequential epitaxial growth.
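As a recap of the two gate-stacked flows introduced at the start of this section (Figure 14), the step order can be written down as plain data and compared. The step descriptions paraphrase the text; the `kind` labels and the helper names are hypothetical and exist only for this illustration, which is a simplified ordering and not a complete process recipe.

```python
from typing import List, Tuple

# Each step is (description, kind); the kind labels are illustrative only.
Step = Tuple[str, str]

GATE_FIRST: List[Step] = [
    ("stack word-line gate layers", "gate"),
    ("etch channel holes", "etch"),
    ("deposit charge-trapping dielectric in the holes", "dielectric"),
    ("deposit poly-Si channel", "channel"),
]

GATE_LAST: List[Step] = [
    ("deposit oxide/nitride multilayers", "sacrificial"),
    ("etch channel holes", "etch"),
    ("deposit poly-Si channel", "channel"),
    ("WL cut by dry etching between channel plugs", "etch"),
    ("remove nitride layers", "etch"),
    ("deposit gate dielectric layers", "dielectric"),
    ("deposit metal gate", "gate"),
]


def first_index(flow: List[Step], kind: str) -> int:
    """Return the position of the first step of the given kind."""
    for index, (_, step_kind) in enumerate(flow):
        if step_kind == kind:
            return index
    raise ValueError(f"no step of kind {kind!r} in this flow")


def gate_before_channel(flow: List[Step]) -> bool:
    """True when the gate material is in place before the channel is formed."""
    return first_index(flow, "gate") < first_index(flow, "channel")


if __name__ == "__main__":
    for name, flow in (("gate-first", GATE_FIRST), ("gate-last", GATE_LAST)):
        print(f"{name}: gate before channel = {gate_before_channel(flow)}")
        for number, (description, _) in enumerate(flow, start=1):
            print(f"  {number}. {description}")
```

The comparison makes the defining difference explicit: in the gate-first flow the gate exists before the channel hole is etched, whereas in the gate-last flow the nitride acts as a placeholder that is replaced by the metal gate only after the channel is complete.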

Conclusions
Currently, the demand for NAND flash memory continues to grow. It has various applications, such as in solid-state drives, and it is widely used in mobile devices requiring data storage. Over time, the planar flash array has evolved into a 3D integrated architecture to increase memory capacity and overcome scaling issues. With over 10 years of development, there has been a significant reduction in the cost per bit of 3D NAND flash. As the number of stack layers increases, high aspect-ratio etching technology is required for hole formation, because it is hard to reduce the hole critical dimension due to the gate dielectric stacks, and there are also problems with the peripheral circuits and the WL thickness. Several efforts have been made to solve these issues, including etching technology with an extremely high aspect ratio and a so-called four-dimensional NAND flash technology with the peripheral circuit under the flash cell array. In addition, programming methods have been proposed to reduce the cost per bit without adding stack layers, by securing the V_T distribution margin needed for quadruple-level cell (QLC) implementation [20,22,46-49]. Starting with the BiCS structure, we have reviewed various kinds of 3D NAND flash structures by classifying and comparing their structures and fabrication methods. Depending on the channel direction and gate formation, their fabrication methods and consequential electrical characteristics significantly differ, and vertical NAND structures with a charge trapping layer have been commercialized, thanks to their easy integration method and high stackability. In addition, it is believed that etching technology with a high aspect ratio and programming schemes for accurate multi-level operations should be further improved for the bit cost scaling of 3D NAND flash technologies.
However, there is expected to be a physical limit to how far the number of stack layers can be increased, even assuming excellent hole etching technology, and it is necessary to investigate how to overcome this limit through structural changes or emerging memories. Various emerging memories, including resistive random access memory (RRAM) [50-59] and phase-change random access memory (PCRAM) [60-67], have been widely investigated for faster speed and lower operating voltage, but reliability is still considered one of the most important issues to be solved before they can compete with mainstream memories. The 3D integration of emerging memory devices should also be investigated to replace 3D NAND flash, and it can be inspired by the architectures and fabrication methods of 3D NAND flash technologies. In addition, we believe that it is important to investigate in-memory computing applications, including neuromorphic systems and processing-in-memory using flash technology, in order to expand the function and capability of 3D NAND flash beyond data storage [68-77].

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.