1. Introduction
Non-volatile flip-flops (NVFFs) are promising enablers of fine-grained power gating because they do not require a complex interface to transfer data to and from external storage (e.g., SRAM) before powering down [1,2,3,4,5,6]. NVFFs can retain data without a power supply. They are generally hybrids of the conventional D flip-flop (D-FF) and a non-volatile storage element such as the magnetic tunnel junction (MTJ): the MOSFETs of the D-FF provide static storage in normal operation, and the MTJs provide temporary storage during power gating. Data in the conventional flip-flop is stored onto the MTJs before power-off (the storing operation) and is restored after the power returns (the restoring operation).
Spin-Transfer-Torque MTJ (STT-MTJ) holds significant promise as the non-volatile storage of NVFFs because of desirable characteristics such as zero standby current, fast read capability, and a small footprint [7,8,9]. Yet significant challenges remain that stem from the fundamental physics of the STT-MTJ. The spin-transfer torque action makes the MTJ switching process highly stochastic because the magnetic incubation time varies drastically with thermal fluctuations [10]. Thus, guaranteeing a successful write requires high energy, since a worst-case design must use the longest switching pulse duration. In addition to the high energy consumption, endurance and read-disturbance issues are critical because a high current/voltage is applied directly to the thin tunnel barrier of the MTJ during write operations [11,12,13,14,15].
Recently, Spin-Orbit-Torque MTJ (SOT-MTJ) has been widely researched because it offers less stochasticity, by eliminating the incubation time with strong initial magnetization, and higher endurance, by separating the write and read current paths [16,17,18]. However, the SOT-MTJ has an additional channel layer, which requires at least one additional access transistor. This creates another challenge in designing a non-volatile flip-flop with the SOT-MTJ device because even one transistor is not negligible in a small flip-flop footprint.
In this paper, two novel area-efficient flip-flops using SOT-MTJs are proposed for fine-grained power gating. To reduce the area overhead, the existing transistors in the standard D-FF circuit are reused to assist the storing and restoring operations. Furthermore, an additional circuit writes two MTJs simultaneously with a single current, resulting in fewer extra transistors and less write energy. For the second NVFF, the SOT-MTJ’s device configuration is changed to reduce the number of terminals, which requires fewer access transistors. The proposed NVFFs are implemented using a compact SOT-MTJ model and a 14-nm Predictive Technology Model (PTM) [19,20]. Our detailed evaluation indicates area overheads of 10.3% and 6.9% normalized to the conventional D-FF, an improvement by a factor of 2–8 over state-of-the-art NVFF circuits. The detailed design and analysis are described in the following sections.
Section 2 describes the SOT-MTJ’s operation and the state-of-the-art NVFF circuits. Section 3 presents the first NVFF, the current reuse flip-flop (CR-FF), and Section 4 presents the second NVFF, the STT-Like SOT flip-flop (SLS-FF). Section 5 concludes the paper.
2. Non-Volatile Flip-Flops (NVFFs) Using Spin-Orbit-Torque (SOT) MTJ Devices
In this section, SOT-MTJ’s write and read processes are described first, explaining the opportunity for reducing area overheads, and the best previously known NVFF circuits are presented.
2.1. SOT-MTJ’s Write and Read Processes
An MTJ consists of two layers of magnetic material separated by a dielectric layer. The two magnetic layers have their own spin directions: the spin direction of the pinned layer (PL) is fixed, while that of the free layer (FL) can be changed by an injected current. The MTJ is in either the parallel (P) or the anti-parallel (AP) state depending on these spin directions. In the P state, the FL and the PL have the same spin direction; in the AP state, the spin direction of the FL is opposite to that of the PL. Each state has a distinct resistance that represents a logical ‘1’ or ‘0’ (e.g., AP and P, respectively).
The STT-MTJ has two terminals, and the write/read current passes directly through the MTJ device. In contrast, the SOT-MTJ has three terminals, and the write current path is isolated from the read current path. This is achieved by adding a channel layer under the free layer. A charge current flowing through the channel layer induces a spin current on the FL, as illustrated in Figure 1a. As the charge current is divided by the spin state (up or down) of the electrons, a spin current perpendicular to the charge current is generated. The spin current exerts a damping-like torque on the magnetization of the free layer. Because of the Rashba–Edelstein effect at the interface, magnetization switching occurs [17,18]. Once the access transistors are turned on by the wordlines (WL1 and WL2), a current flows through the access transistors and the channel layer. If the current supplied by the access transistors is sufficiently large, the MTJ switches to the AP or the P state, depending on the voltages on BL and SL. To read the MTJ’s data, a relatively low current is passed through the top terminal by raising a read bitline (RBL) and a read wordline (RWL) while SL is held low, as shown in Figure 1b. Because the current path for the read operation is isolated from the write current path, high endurance and less read disturbance are achieved. However, compared to the STT-MTJ, one more terminal is required for the channel layer, which increases the total area.
2.2. State-of-the-Art NVFFs
MTJs are widely used in designing NVFFs because of their zero standby current and high integration capability. Moreover, the MTJs themselves do not occupy silicon area; however, the access transistors needed to write/read the MTJs do introduce silicon area overhead. Typically, at least one access transistor is needed for each terminal of an MTJ. Therefore, three-terminal MTJs such as the SOT-MTJ need more access transistors than two-terminal STT-MTJ devices. Note that only three-terminal MTJs are addressed for comparison in this paper.
The best previously known NVFFs are reviewed first, and the proposed NVFF circuits are presented in the following sections. Multiple realizations of NVFFs have been proposed that minimize the size of the write and read circuitry [21,22,23,24,25]. In Reference [21], five additional transistors are used to write two MTJs, and two independent write current pulses set the two MTJs to opposite states. In Reference [22], only four access transistors are added on top of the conventional D-FF to write/read SOT-MTJs, but two external AND and OR gates are required to generate the control signals, introducing non-negligible area overheads. Another SOT-based NVFF, presented in [23], uses two inverters and one NAND gate to generate control signals and two additional transistors to read the MTJs. In addition to the internal circuit, an external write driver is used to write the MTJ devices. In Reference [24], only four access transistors are added, but two external AND and OR gates are again required to generate the control signals. All of these additional transistors introduce non-negligible area overheads, which need to be reduced as much as possible.
3. CR-FF: Current Reuse Flip Flop
Previously known NVFFs typically use two MTJs to store the Q and Qb values of a slave latch before powering down. In a restoring operation, the resistance difference between the two MTJs triggers the regenerative feedback of an inverter pair, generating Vdd or 0. To control these two MTJs, six transistors are typically used because each terminal needs at least one switch transistor. Furthermore, the write current is doubled for two MTJs. In general, handling two devices doubles both the area and the energy overheads. How, then, can these overheads be reduced?
Our idea is to reuse the write current that stores one MTJ to write the other MTJ, so that a single current writes both MTJs. Instead of reducing the size of the write circuit, the number of write attempts is reduced. Two independent current pulses are typically required to write two MTJs. If a single current pulse can write two MTJs at a time, one write circuit can be removed, reducing the area overhead. The proposed NVFF is referred to as the current-reuse FF (CR-FF) in this paper. The current reuse technique is achieved by placing the channel layers of the two MTJs on the same current path, as shown in Figure 2a. Because the two MTJs are connected to each other in opposite directions, they are always set to opposite states by the same write current. One minimum-sized transistor (M1) works as a switch that enables the write operation. Importantly, the voltage drop across the channel layers is small because their resistance is low. The proposed single write current approach also reduces the write energy by 50%: the magnitude of the write current remains unchanged, while the number of write attempts is halved. Therefore, the single write current approach reduces the area and energy overheads simultaneously.
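The 50% figure follows from simple pulse counting, which can be sketched as below. This is only an illustrative estimate that assumes rectangular write pulses with energy V·I·t per pulse; the function name write_energy and the supply voltage, current, and pulse width are hypothetical values chosen for the example, not parameters of the simulated circuit.

```python
# Illustrative write-energy comparison; all numeric values are hypothetical.

def write_energy(n_pulses: int, vdd: float, i_write: float, t_pulse: float) -> float:
    """Energy drawn from the supply by n_pulses rectangular pulses of V*I*t each."""
    return n_pulses * vdd * i_write * t_pulse

VDD = 0.8          # V, hypothetical supply voltage
I_WRITE = 100e-6   # A, hypothetical write current magnitude
T_PULSE = 5e-9     # s, hypothetical write pulse width

# Conventional approach: one independent pulse per MTJ.
e_two_pulses = write_energy(2, VDD, I_WRITE, T_PULSE)
# Current reuse: one pulse of the same magnitude writes both MTJs.
e_reuse = write_energy(1, VDD, I_WRITE, T_PULSE)

saving = 1 - e_reuse / e_two_pulses
print(f"two pulses:    {e_two_pulses * 1e12:.2f} pJ")
print(f"current reuse: {e_reuse * 1e12:.2f} pJ")
print(f"saving:        {saving:.0%}")
```

The saving is independent of the chosen values as long as the pulse magnitude and width are the same in both cases, which is the condition stated in the text.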
To restore data, the proposed NVFF utilizes the existing inverter pair in the slave latch. Two additional minimum-sized transistors work as switches to change the current path. If the Restoring ENable (REN) signal is high, as shown in Figure 2b, the current path is changed from the top node of the two MTJs to the FL. Once the power is on again after power gating, regenerative feedback in the inverter pair begins to generate the outputs depending on the MTJs’ resistances. Ultimately, the QS and QSb voltages become Vdd or 0.
The proposed CR-FF is implemented using a 14-nm predictive technology model (PTM) MOSFET model and a compact MTJ model [19,20]. Key parameters of the SOT-MTJ are listed in Table 1.
Figure 3 shows SPICE simulation results of the proposed NVFF using these models. The proposed NVFF behaves like a conventional D-FF in normal operation: Q follows D before power-down and after power-up. The storing operation is performed before powering off, and the data is restored when the power is on again. The Q value is set to ‘1’ before the power-down mode. Once the Storing ENable (SEN) signal goes high to activate a storing operation, a current passes through MTJ A, M1, and MTJ B. MTJ A is written to the AP state because the current flows from A to B in its channel layer, and MTJ B is written to the P state because the current direction is reversed (B→A).
Figure 3 shows the waveforms at each node of the proposed circuit when D changes from 1 to 0. The waveforms show the node voltages of VDD, CLK, SEN, REN, D, and Q. mz_A and mz_B are the spin directions of the MTJs along the Z axis, where 1 and −1 denote the anti-parallel and parallel states, respectively. In power-down mode, Q drops low, but Q is restored at 62 ns when the power is back. Once REN becomes high, the regenerative feedback of the inverter pair begins and the values stored in the MTJs are restored. Leakage currents flow into MTJ A and MTJ B, and different voltages develop on QS and QSb, as shown in Figure 3b. If the resistance of one MTJ is greater than that of the other, the regenerative feedback ultimately generates Vdd or 0. In the top panel of Figure 3b, QS goes to Vdd because R_MTJA is greater than R_MTJB, which means the stored data is ‘1’. If the stored value is ‘0’, QS becomes 0, as in the bottom panel of Figure 3b.
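The storing and restoring behavior described in this section can be summarized by a small behavioral sketch. It is not a circuit simulation: the functions store, resistance, and restore and the two resistance values are hypothetical, and the model assumes only that the AP state has the higher resistance, that a single write current sets the two MTJs to opposite states, and that the mapping for Q = 0 is the mirror image of the Q = 1 case described above.

```python
from typing import Tuple

# Behavioral sketch of storing/restoring; resistance values are hypothetical.
R_P = 5e3    # ohms, hypothetical parallel-state resistance
R_AP = 10e3  # ohms, hypothetical anti-parallel-state resistance (R_AP > R_P)

def store(q: int) -> Tuple[str, str]:
    """Return (state of MTJ A, state of MTJ B) after a storing operation.

    One shared write current sets the two MTJs to opposite states:
    Q = 1 writes A to AP and B to P; Q = 0 is assumed to give the reverse.
    """
    return ("AP", "P") if q == 1 else ("P", "AP")

def resistance(state: str) -> float:
    """Map an MTJ state to its (hypothetical) resistance."""
    return R_AP if state == "AP" else R_P

def restore(state_a: str, state_b: str) -> int:
    """Model the regenerative feedback: QS resolves to Vdd (1) when R_MTJA > R_MTJB."""
    return 1 if resistance(state_a) > resistance(state_b) else 0

for q in (0, 1):
    a, b = store(q)
    print(f"Q={q}: MTJ A={a}, MTJ B={b}, restored Q={restore(a, b)}")
```

Because the two MTJs always hold complementary states, the restore step needs only the sign of the resistance difference, not an external reference.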
4. SLS-FF: STT-Like SOT Flip Flop
Typically, an MTJ requires three access transistors because each terminal needs a switch transistor. The proposed CR-FF in Section 3 needs only three extra transistors by reusing one circuit to write two MTJs. This is a 10.3% area overhead compared to the conventional D-FF, which has 29 transistors. Note that the size of the PMOS transistor is assumed to be 2× that of the NMOS transistor, and the MTJs do not consume silicon area because they are placed between metal layers. However, the area overhead is still non-negligible because millions of flip-flops can be used in advanced Very-Large-Scale Integration (VLSI) chips.
To reduce the area further, the number of terminals of an MTJ is intentionally reduced. This is achieved by shorting physical terminals to form a smaller set of external terminals. In the proposed technique, the top terminal of an MTJ is connected to one terminal of the channel layer. Thus, the number of external terminals becomes two, as shown in Figure 4b. The proposed technique is referred to as the STT-Like SOT (SLS) configuration because the structure, including the number of terminals, looks similar to an STT-MTJ device. A key observation is that most of the write current still flows through the channel layer if the channel layer’s resistance is relatively low. The channel layer is composed of heavy metals, and its resistance is easily adjusted through the layer dimensions: it is proportional to the length of the layer and inversely proportional to its thickness. The ratio between R
MTJ and R
ch is set to
in a reference MTJ model that is used in our experiment [
19]; therefore, the major write current is passed through the channel layer which does not hinder the write operation.
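The dependence of the current split on the resistance ratio can be illustrated with a simple current-divider estimate. This is a minimal sketch that treats the MTJ stack and the channel layer as two parallel resistive branches between the two external terminals, which is a simplifying assumption; the function name channel_current_fraction is hypothetical, and the swept ratios are arbitrary example values, not those of the reference MTJ model.

```python
# Current-divider sketch for the SLS configuration (simplified two-branch model).

def channel_current_fraction(r_mtj: float, r_ch: float) -> float:
    """Fraction of the total write current in the channel branch.

    For two parallel branches, the current divides inversely to resistance:
    I_ch / I_total = R_MTJ / (R_MTJ + R_ch).
    """
    return r_mtj / (r_mtj + r_ch)

# Hypothetical R_MTJ / R_ch ratios, with R_ch normalized to 1.
for ratio in (1, 10, 100, 1000):
    frac = channel_current_fraction(r_mtj=float(ratio), r_ch=1.0)
    print(f"R_MTJ/R_ch = {ratio:>4}: {frac:.1%} of the write current in the channel")
```

Under this model, the larger the ratio, the smaller the share of the write current that leaks through the tunnel barrier.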
The proposed SLS configuration is used to design an area-efficient NVFF, as shown in Figure 5. Two SOT-MTJs in the proposed SLS configuration are inserted in the slave latch for temporary storage. The current reuse technique is also used: the two MTJs are placed on the same current path via a transistor. In normal operation, the proposed NVFF operates as a conventional D-FF. Before powering down, data in the slave latch is stored onto the MTJs, and it is restored from the MTJs after power-up. Once the SEN signal is raised before power-down, the write current is driven by the QS and QSb values. Consider the case where QS is low (Q = 0). In this case, the QS node voltage is lower than the QSb node voltage. Therefore, the write current passes through MTJ B, an access transistor, and MTJ A, as shown in Figure 5a. MTJ B is written to the AP state and MTJ A to the P state. The restoring operation is achieved with another transistor connected to the REN node. Once the power is back with REN and SEN high, leakage currents flow into MTJ A and MTJ B, as shown in Figure 5b. Different voltages develop on QS and QSb based on the MTJ resistances. If the stored data is ‘0’, as illustrated in the upper panel of Figure 6b, the QS node goes to 0 V, whereas the QS voltage reaches Vdd if the stored data is ‘1’, as in the lower panel of
Figure 6b. An MTJ’s resistance can be either
or
in the proposed NVFF. If the resistance of one MTJ is greater than that of the other MTJ, the regenerative feedback of the inverter pair generates Vdd or 0 on the Q node.
The proposed NVFF is also designed with a 14-nm PTM MOSFET model and a compact SOT-MTJ model. SPICE simulation results of the proposed NVFF with these models are shown in Figure 6. The proposed NVFF operates as a conventional D-FF in normal operation. After 8 ns, a storing operation is performed: the output Q is stored onto the MTJs when SEN is raised. For Q = 0, MTJ B is written to the AP state and MTJ A to the P state at 13 ns, where 1 and −1 denote the anti-parallel and parallel states, respectively. Q is restored when the power is up again at 60 ns by raising the SEN and REN signals. The restored value on the Q node is held until the first clock cycle after the power-up sequence, after which Q follows D. The simulated delay of the restoring operation is 10 ps, and the storing time is 5.8 ns. The energy consumption of the proposed NVFF is 0.1 pJ for the restoring operation and 0.9 pJ for the storing operation.
The two proposed NVFFs are compared with state-of-the-art NVFF circuits in Table 2. A relative overhead is used for the area comparison because the actual areas strongly depend on the process node. The area overhead is normalized to the conventional D flip-flop of each technology. The area overheads of CR-FF and SLS-FF are estimated to be 10.3% (=3/29) and 6.9% (=2/29), respectively, because only three or two minimum-sized NMOS transistors are added to the conventional D-FF, which has 29 transistors. Note that the size of the PMOS transistor is assumed to be 2× that of the NMOS transistor. The relative area overheads of the latest NVFF architectures range from 20.7% to 55.1%. Therefore, the proposed NVFFs show an improvement of nearly a factor of 2–8 in area compared to state-of-the-art NVFF circuits. Importantly, the proposed NVFFs do not require external controllers or the control signals they generate. Only two simple signals, REN and SEN, are required for each operation. As seen in Table 2, the reference NVFFs require external circuits to generate control signals, which induce non-negligible energy and area overheads.
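The overhead percentages and the quoted improvement range can be reproduced with a few lines of arithmetic. The sketch below assumes, as the "=2/29" annotation suggests, that the overhead is counted as added transistors relative to the 29 transistors of the conventional D-FF; the function name area_overhead is hypothetical.

```python
DFF_TRANSISTORS = 29  # transistor count of the conventional D-FF given in the text

def area_overhead(extra_transistors: int, base: int = DFF_TRANSISTORS) -> float:
    """Relative area overhead, counted as added transistors over the D-FF count."""
    return extra_transistors / base

cr_ff = area_overhead(3)   # CR-FF adds three transistors
sls_ff = area_overhead(2)  # SLS-FF adds two transistors
print(f"CR-FF: {cr_ff:.1%}, SLS-FF: {sls_ff:.1%}")

# Overhead range reported in the text for prior NVFF architectures.
prior_low, prior_high = 0.207, 0.551

# Smallest gain: best prior design vs. CR-FF; largest gain: worst prior design vs. SLS-FF.
print(f"improvement: {prior_low / cr_ff:.1f}x to {prior_high / sls_ff:.1f}x")
```

The lower end of the range compares the smallest prior overhead with CR-FF, and the upper end compares the largest prior overhead with SLS-FF.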
Furthermore, the delay and energy of the storing and restoring operations depend on the type of MTJ because switching time and current vary significantly across MTJs. It is therefore not fair to compare delay and energy directly with other NVFFs that embed different MTJs. However, the single write current approach in the proposed NVFFs can reduce the write energy by 50% because it reuses the write current that stores one MTJ to write the other MTJ.