GateRL: Automated Circuit Design Framework of CMOS Logic Gates Using Reinforcement Learning

Abstract: This paper proposes GateRL, an automated circuit design framework for CMOS logic gates based on reinforcement learning. Because the connection of circuit elements is subject to constraints, an action masking scheme is employed. The masking also reduces the size of the action space, which improves the learning speed. GateRL consists of an agent that produces the action and an environment that provides the state, mask, and reward. The state and reward are generated from a connection matrix that describes the current circuit configuration, and the mask is obtained from a masking matrix based on the constraints and the current connection matrix. The action is produced by a deep Q-network of 4 fully connected layers in the agent. In particular, separate replay buffers are devised for success transitions and failure transitions to expedite the training process. The proposed network is trained with 2 inputs, 1 output, 2 NMOS transistors, and 2 PMOS transistors to design all the target logic gates: buffer, inverter, AND, OR, NAND, and NOR. Consequently, GateRL outputs a one-transistor buffer, a two-transistor inverter, a two-transistor AND, a two-transistor OR, a three-transistor NAND, and a three-transistor NOR. The operation of the resulting logic gates is verified by SPICE simulation.


Introduction
Over the past decade, machine learning (ML) algorithms, especially deep learning (DL) approaches, have attracted considerable interest in a variety of applications, such as image classification [1], object detection [2], image/video search [3], super resolution [4], language translation [5], speech recognition [6], and stock market prediction [7], owing to their dramatic advances along with greatly increased processing power and huge training datasets [8].
Besides, there have been many ML-related research studies in the field of circuit design [9]. One direction is to implement ML networks on existing or specialized hardware platforms. While various simplification methods have been proposed to alleviate the requirements on processing power and data bandwidth [10][11][12], highly complicated and large DL networks have been realized on specialized integrated circuit solutions, such as tensor processing units (TPUs) [13] and general-purpose graphics processing units (GPGPUs) [14], that accelerate DL computations. The other direction is to employ ML algorithms in the course of integrated circuit design. ML methods optimize the transistor sizes of a given circuit schematic with respect to target performance metrics, including power consumption, bandwidth, gain, and area [15,16]. At the layout stage, ML methods automate the placement procedure to avoid routing errors in advance while keeping the chip area small [17,18]. These approaches substantially reduce the overall design time by requiring far fewer iterations in the simulation and layout stages. In particular, the Berkeley analog generator (BAG) has been studied as a designer-oriented framework since 2013 [19,20]. BAG covers the whole circuit design procedure, including schematic, layout, and verification. Its schematic design platform selects one architecture for a given specification and optimizes its parameters. DL-based layout optimization for the BAG framework was also proposed in 2019 [21].
On the other hand, to the best of our knowledge, no research has applied DL to design the circuit structure from scratch at the transistor level. Such circuit structure generators cannot avoid a very large hypothesis space because a transistor has three terminals (source, gate, and drain) that can be connected to each other and used as an input, output, or internal node. As a result, it is much more difficult to generate a custom schematic directly than to use given circuit blocks that have dedicated ports for inputs and outputs. This paper proposes an automated schematic design framework (GateRL) for digital logic gates, such as inverter, buffer, NAND, AND, NOR, and OR, built on complementary metal-oxide-semiconductor (CMOS) transistors. Such a full custom design scheme is required for special circuits under restrictive conditions, such as thin-film transistor (TFT) shift registers integrated in display panels, where only one type of TFT is allowed [22][23][24][25]. While most off-the-shelf applications are based on supervised learning networks trained with a huge amount of data and labels, GateRL employs reinforcement learning (RL) [26] because it is difficult to provide labels for schematic design; only whether the resulting circuit works or not can be decided. The remainder of this paper is organized as follows. Section 2 explains the proposed GateRL in detail, covering the overall architecture and the variables exchanged between the agent and the environment. Section 3 presents the experimental results, including the schematics proposed by GateRL and their simulation results obtained with the simulation program with integrated circuit emphasis (SPICE). Section 4 concludes this paper.

Proposed GateRL Architecture
The proposed GateRL system is based on the RL methodology, which consists of an agent and an environment, as shown in Figure 1. The agent sends an action ($a_t$) to the environment to add a new connection to the schematic; the environment then updates the state ($S_{t+1}$), reward ($r_{t+1}$), and mask ($M_{t+1}$) and returns them to the agent. Unlike general RL networks, GateRL makes use of an action masking scheme to take into account several constraints in circuit design, as well as to reduce the size of the action space [27]. The target logic gate is specified by the truth table (T), which lists the corresponding outputs for all combinations of inputs.

State ($S_t$)
$S_t$ is extracted from the connection matrix (CM), which is similar to an adjacency matrix in graph theory [28]. As depicted in Figure 2, all possible nodes of the schematic are assigned to both the columns and the rows of CM, resulting in a square matrix. $N_i$, $N_o$, $N_n$, and $N_p$ are the numbers of inputs, outputs, n-type MOS (NMOS) transistors, and p-type MOS (PMOS) transistors, respectively. Therefore, the total number of columns or rows ($N_{CM}$) is obtained by Equation (1). The first two columns and two rows represent connections to the supply voltage sources VDD and GND. Because each transistor has three terminals, source (S), gate (G), and drain (D), the number of transistor nodes is three times the number of transistors. The connections of the body terminals are omitted by assuming that the bodies of all NMOS and PMOS transistors are fixed at GND and VDD, respectively. From here on, the node assigned to the i-th row or column is referred to as node i (i = 1, 2, . . . , $N_{CM}$).

$$N_{CM} = 2 + N_i + N_o + 3\,(N_n + N_p) \tag{1}$$

An element of CM at the i-th row and j-th column ($cm_{ij}$) is set to 0 or 1, where 0 means no connection and 1 indicates a connection between the two corresponding nodes, as expressed in Equations (2) and (3). Therefore, the circuit schematic can be extracted from a given CM. For example, the CM of Figure 3a is converted into the schematic of Figure 3b, where $N_i$, $N_o$, $N_n$, and $N_p$ are 1, 1, 1, and 0. In addition, since all the connections in the circuit schematic are undirected, CM is constructed as a symmetric matrix.

$$cm_{ij} = \begin{cases} 0, & \text{without connection between node } i \text{ and node } j \\ 1, & \text{with connection between node } i \text{ and node } j \end{cases} \tag{2}$$
The connection of circuit elements imposes constraints on CM. Firstly, direct connections between VDD, GND, inputs, and outputs must not be allowed, in order to avoid shoot-through currents between voltage sources and to guarantee that the outputs are determined by the combinations of inputs. Thus, the corresponding area of CM is always filled with 0. Secondly, the self-connections described by the diagonal elements have no meaning in the circuit, so the diagonal elements ($cm_{ii}$, i = 1, 2, . . . , $N_{CM}$) are fixed at 0 at all times. Thirdly, as CM is a symmetric matrix, the full matrix can be reconstructed from the half above the diagonal. As a result, the whole CM can be represented by the area marked in Figure 4, which is reshaped into the vector $S_t$ by flattening. Its length ($N_s$) is calculated by Equation (4), which is derived by dividing the region into a rectangular part (the rows of VDD, GND, inputs, and outputs against the columns of transistor terminals) and a triangular part (the connections among transistor terminals above the diagonal).

$$N_s = (2 + N_i + N_o) \cdot 3\,(N_n + N_p) + \frac{3\,(N_n + N_p)\,\left[3\,(N_n + N_p) - 1\right]}{2} \tag{4}$$
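The sizes of CM and $S_t$ can be checked numerically. The following is a minimal sketch, assuming that VDD, GND, inputs, and outputs occupy the first rows and columns of CM and that the marked region is flattened row by row; the function names and the row-by-row order are illustrative assumptions and are not taken from the original implementation.

```python
def cm_size(n_i, n_o, n_n, n_p):
    """Rows/columns of CM: VDD, GND, inputs, outputs, 3 terminals per transistor."""
    return 2 + n_i + n_o + 3 * (n_n + n_p)


def state_length(n_i, n_o, n_n, n_p):
    """Length of S_t: rectangular part plus strictly upper-triangular part."""
    n_fixed = 2 + n_i + n_o       # VDD, GND, inputs, outputs
    n_term = 3 * (n_n + n_p)      # transistor terminals
    return n_fixed * n_term + n_term * (n_term - 1) // 2


def flatten_cm(cm, n_fixed):
    """Flatten the usable upper-right region of a square 0/1 matrix (assumed row-major)."""
    n = len(cm)
    state = []
    for i in range(n):
        # Rows of fixed nodes start at the first transistor column;
        # rows of transistor terminals start just right of the diagonal.
        first_col = n_fixed if i < n_fixed else i + 1
        state.extend(cm[i][first_col:])
    return state


if __name__ == "__main__":
    n = cm_size(2, 1, 2, 2)
    empty_cm = [[0] * n for _ in range(n)]
    print(n, state_length(2, 1, 2, 2), len(flatten_cm(empty_cm, 5)))
```

For the configuration used in the experiments (2 inputs, 1 output, 2 NMOS, 2 PMOS), this gives the 17 rows and 126 state elements reported in the experimental section.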
In addition to the three constraints above, the proposed algorithm defines two more constraints that are applied during the schematic design steps. The first is that VDD, GND, and inputs should not be connected to each other through a single transistor, because such a connection causes a short circuit between voltage sources. Thus, any two of these nodes must not be connected to the source and drain terminals of the same transistor, respectively. For example, whenever VDD is linked to the source of a transistor, GND and inputs cannot be placed at its drain. The second constraint is that a single signal must not be connected to both the source and the drain of a transistor. Because this connection can be replaced with a simple short circuit, that transistor is unnecessary.
First, a masking matrix (MM) is generated with respect to the current CM. MM is also a square matrix of $N_{CM} \times N_{CM}$, where each element ($mm_{ij}$) of 0 or 1 represents whether that connection is available or prohibited, respectively, as described in Equation (5).

$$mm_{ij} = \begin{cases} 0, & \text{if the connection between node } i \text{ and node } j \text{ is available} \\ 1, & \text{if the connection between node } i \text{ and node } j \text{ is prohibited} \end{cases} \tag{5}$$

Four masking criteria are reflected in MM as follows. First, since it is not necessary to repeat existing connections, the corresponding elements of MM are set to 1. Second, all direct connections between VDD, GND, inputs, and outputs are blocked. When a node is connected to one of VDD, GND, inputs, and outputs, that node must not be connected to any other of them. Therefore, the elements of those short connections are also marked as 1 in MM. Third, when one of the source and drain terminals is connected to VDD, GND, or an input, the other terminal of the same transistor must avoid a connection to VDD, GND, and inputs, which would bring about drastic shoot-through currents between voltage sources via the turned-on transistor. Fourth, it is forbidden to connect a single signal to both the source and the drain of a transistor. When one of the source and drain is connected to a signal, the other terminal must not be assigned to the same signal.
From the second to the fourth criteria, all possible connections via other nodes are blocked at the same time. In a circuit, the connection of two nodes through other nodes is equivalent to their direct connection. Therefore, whenever $mm_{ij}$ is set to 1, it is checked whether the corresponding i-th row and j-th column of CM include multiple 1s. If so, all nodes linked to node i and node j are regarded as connected to each other, and the corresponding elements of MM are set to 1. Finally, after MM is obtained by combining the four resulting matrices through a logical OR operation, $M_t$ is generated in vector form, like $S_t$, by flattening the upper right region of MM. The overall procedure of the mask generation is illustrated in detail in Figure 5, where $N_i$, $N_o$, $N_n$, and $N_p$ are 1, 1, 1, and 1.
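Part of the mask generation can be sketched in code. The example below is a simplified illustration that covers only the first two criteria (existing connections, and nodes already tied to VDD, GND, an input, or an output); the third and fourth criteria are omitted. It assumes that the fixed nodes occupy the first `n_fixed` rows of CM and that CM is already symmetric with all indirectly connected nodes marked as directly connected. The function names are hypothetical.

```python
def mask_matrix(cm, n_fixed):
    """Masking matrix for criteria 1 and 2 only (a partial sketch)."""
    n = len(cm)
    # Criterion 1: existing connections need not be repeated.
    mm = [[1 if cm[i][j] == 1 else 0 for j in range(n)] for i in range(n)]
    # tied[k] is True when node k is connected to VDD, GND, an input, or an output.
    tied = [any(cm[f][k] == 1 for f in range(n_fixed)) for k in range(n)]
    for k in range(n_fixed, n):
        if not tied[k]:
            continue
        # Criterion 2: a tied terminal must not reach any other fixed node ...
        for f in range(n_fixed):
            mm[f][k] = mm[k][f] = 1
        # ... neither directly nor through another tied terminal.
        for m in range(n_fixed, n):
            if m != k and tied[m]:
                mm[k][m] = mm[m][k] = 1
    return mm


def flatten_upper(matrix, n_fixed):
    """Flatten the same upper-right region that is used for S_t (assumed row-major)."""
    n = len(matrix)
    out = []
    for i in range(n):
        first_col = n_fixed if i < n_fixed else i + 1
        out.extend(matrix[i][first_col:])
    return out


if __name__ == "__main__":
    # One input, one output, one NMOS: nodes VDD, GND, IN, OUT, S, G, D.
    cm = [[0] * 7 for _ in range(7)]
    cm[0][5] = cm[5][0] = 1          # gate tied to VDD
    print(flatten_upper(mask_matrix(cm, 4), 4))
```

In this sketch, once the gate is tied to VDD, every connection from the gate to GND, the input, or the output is masked, while the source and drain remain free.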

Reward ($r_t$)
Since the proposed GateRL simply defines $r_t$ as +1 for a correctly working schematic and −1 otherwise, every step that does not complete a working circuit is penalized. GateRL is therefore expected to propose working schematics with the minimum number of connection steps, that is, the minimum number of transistors. Consequently, it is of utmost importance to verify whether the schematic represented by CM satisfies a given truth table. For example, the truth table of an inverter for a CM with $N_i$ = 2 and $N_o$ = 1 consists of four rows that include the combinations of two inputs ($IN_1$, $IN_2$) and their target outputs (OUT), as illustrated in Figure 6. Even though the inverter has only two cases of −1 and +1 for $IN_1$, the four-row truth table is composed so that the agent can support other two-input logic gates, such as AND, NAND, OR, and NOR, within a common GateRL framework. The high level of VDD and the low level of GND are described as +1 and −1, respectively. A value of 0 indicates high impedance, which is equivalent to a floating node without any connections. The second column, filled with 0, indicates that the second input ($IN_2$) is not in use for this one-input logic gate.
The verification is conducted under the assumption that CMOS transistors are ideal switches, where NMOS and PMOS transistors are completely turned on at gate values of +1 and −1, respectively. The following steps are repeated from the first row to the last row of a given truth table. First, a vector (SV) of size $3N_n + 3N_p$ is initialized with all elements at 0. Because its elements describe the voltage levels at the terminals of the transistors, all nodes are initially high-impedance nodes. Second, for terminals connected to VDD and GND in the current CM, the corresponding elements of SV are set to +1 and −1, respectively. Third, for terminals linked to inputs, the corresponding elements of SV are set to the input levels in the selected row of the truth table. High-impedance inputs are not taken into account. Fourth, when the gate of an NMOS transistor is +1 and one of its source and drain is 0, the element of SV for the terminal at 0 is set to the voltage value of the other terminal. This operation is applied in the same way to the source and drain terminals of a PMOS transistor when its gate is at −1. Fifth, the updated values at the transistor terminals are propagated to their connected elements in SV according to CM. Sixth, the fourth and fifth steps are repeated until there is no change in SV. However, when the source and drain are at opposite polarities or both are 0, the corresponding elements of SV are left unchanged. Finally, the output is obtained from the SV element of its connected terminal. Only when the verification outputs for all input combinations match the target outputs of the truth table is $r_t$ given as +1. Otherwise, $r_t$ is −1. The verification steps for the case of an inverter are illustrated in Figure 7.
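The verification steps above can be sketched as a small switch-level simulation. This is an illustrative version rather than the authors' implementation: it keeps one level per CM node (not only per transistor terminal), assumes the node order VDD, GND, inputs, outputs, NMOS terminals, PMOS terminals with each transistor stored as source, gate, drain, and resolves contention simply by keeping whichever level reaches a floating node first. The helper names are hypothetical.

```python
def connect(cm, i, j):
    """Add an undirected connection between node i and node j."""
    cm[i][j] = 1
    cm[j][i] = 1


def simulate(cm, n_i, n_o, n_n, n_p, inputs):
    """Return the output levels (+1, -1, or 0 for high impedance) for one input row."""
    n_fixed = 2 + n_i + n_o
    n = len(cm)
    v = [0] * n                       # every node starts at high impedance
    v[0], v[1] = 1, -1                # VDD and GND
    for k in range(n_i):
        v[2 + k] = inputs[k]          # 0 marks an unused input
    changed = True
    while changed:
        changed = False
        # Propagate levels along the wires recorded in CM.
        for i in range(n):
            for j in range(n):
                if cm[i][j] == 1 and v[i] == 0 and v[j] != 0:
                    v[i] = v[j]
                    changed = True
        # Turned-on transistors copy a level from one side to a floating side.
        for k in range(n_n + n_p):
            s = n_fixed + 3 * k
            g, d = s + 1, s + 2
            on_level = 1 if k < n_n else -1   # NMOS turns on at +1, PMOS at -1
            if v[g] != on_level:
                continue
            if v[s] == 0 and v[d] != 0:
                v[s] = v[d]
                changed = True
            elif v[d] == 0 and v[s] != 0:
                v[d] = v[s]
                changed = True
    return v[2 + n_i:2 + n_i + n_o]


def reward(cm, n_i, n_o, n_n, n_p, truth_table):
    """+1 if every row of the truth table is reproduced, otherwise -1."""
    for inputs, targets in truth_table:
        if simulate(cm, n_i, n_o, n_n, n_p, inputs) != list(targets):
            return -1
    return 1


if __name__ == "__main__":
    size = 2 + 2 + 1 + 3 * (2 + 2)
    inverter = [[0] * size for _ in range(size)]
    # NMOS0: S=5, G=6, D=7; PMOS0: S=11, G=12, D=13; OUT=4; IN1=2.
    for i, j in [(1, 5), (2, 6), (7, 4), (0, 11), (2, 12), (13, 4)]:
        connect(inverter, i, j)
    print(reward(inverter, 2, 1, 2, 2, [((-1, 0), [1]), ((1, 0), [-1])]))
```

Because each update only turns a floating node into a driven one, the loop terminates after at most as many changes as there are nodes.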

Action ($a_t$)
The agent decides $a_t$ based on $S_t$ and $M_t$ received from the environment. $a_t$ is the position of a new connection in the dimension of $S_t$, which the environment converts into a position in the upper right part of CM. Then, CM is updated by setting the corresponding element to 1. However, as mentioned for the mask generation, when the row and the column that include this connection contain multiple 1s, all combinations of those nodes should be marked as directly connected to each other. Finally, a logical OR operation is conducted between CM and its transpose, resulting in a symmetric matrix.
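The CM update for a chosen action can be sketched as follows. The example assumes the same row-by-row flattening order as in the state vector and implements the "connect all combinations" rule by fully connecting every node reachable from the new connection; both the function names and the traversal-based formulation are illustrative choices, not the original code.

```python
def index_to_pair(index, n, n_fixed):
    """Map a position in the flattened state vector back to (row, column) of CM."""
    start = 0
    for i in range(n):
        first_col = n_fixed if i < n_fixed else i + 1
        width = n - first_col
        if index < start + width:
            return i, first_col + (index - start)
        start += width
    raise IndexError("action index outside the state vector")


def apply_action(cm, i, j):
    """Connect node i and node j, then mark all indirectly connected nodes as connected."""
    n = len(cm)
    cm[i][j] = cm[j][i] = 1
    # Collect every node reachable from i through existing connections.
    group, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for w in range(n):
            if cm[u][w] == 1 and w not in group:
                group.add(w)
                stack.append(w)
    # Nodes on the same net are equivalent to being directly connected.
    for a in group:
        for b in group:
            if a != b:
                cm[a][b] = 1
    return cm


if __name__ == "__main__":
    cm = [[0] * 7 for _ in range(7)]
    apply_action(cm, *index_to_pair(0, 7, 4))    # VDD to the first terminal
    apply_action(cm, *index_to_pair(12, 7, 4))   # first terminal to the second one
    print(cm[0])
```

Writing both `cm[a][b]` and `cm[b][a]` inside the loop keeps the matrix symmetric, which has the same effect as the final OR with the transpose.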
The agent is implemented by a deep Q-network (DQN) [29] that is composed of 4 fully connected layers ($fc_1$, $fc_2$, $fc_3$, $fc_4$). It takes the concatenation of $S_t$ and T as the input layer (X), as presented in Figure 8. T is the flattened vector of a given truth table. The $S_t$ values are rescaled from the range of 0 to 1 to the range of −1 to +1 so that they match the range of T. The activation functions are rectified linear units (ReLUs) for $fc_1$, $fc_2$, and $fc_3$, and a linear unit for $fc_4$. In particular, the product of the mask and a large scalar ($\beta$) is subtracted from the output layer of $fc_4$, guaranteeing that positions corresponding to mask elements of 1 are not available for the action selection. $a_t$ is decided by selecting the position of the maximum output through an argmax function with respect to actions. The overall network operations are described in Equations (6)-(11). $W_1$, $W_2$, $W_3$, and $W_4$ are weight matrices, and $b_1$, $b_2$, $b_3$, and $b_4$ are biases.

Figure 8. DQN structure. It takes the concatenation of $S_t$ and T as an input layer. The range of $S_t$ elements from 0 to 1 is extended to the range from −1 to +1 in order to match T.
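The masked action selection can be illustrated with a small forward pass. This is a minimal sketch in plain Python, assuming the rescaling $2S_t - 1$ for the state, randomly initialized weights, and hypothetical helper names; it is meant to show the effect of subtracting $\beta M_t$ before the argmax, not to reproduce the trained network.

```python
import random


def relu(values):
    return [max(0.0, x) for x in values]


def linear(weights, biases, x):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def init_params(sizes, seed=0):
    """Random small weights and zero biases for layers sizes[0] -> sizes[1] -> ..."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def select_action(params, state, truth, mask, beta=100.0):
    """Greedy action: argmax over Q(S_t, T, a) - beta * M_t."""
    x = [2.0 * s - 1.0 for s in state] + list(truth)   # rescale S_t to [-1, +1]
    h = x
    for weights, biases in params[:-1]:                # fc1..fc3 with ReLU
        h = relu(linear(weights, biases, h))
    q = linear(params[-1][0], params[-1][1], h)        # fc4, linear output
    masked = [qi - beta * mi for qi, mi in zip(q, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


if __name__ == "__main__":
    params = init_params([5, 10, 10, 10, 3])
    print(select_action(params, [0, 1, 0], [1, -1], [1, 1, 0]))
```

As long as the Q-values are much smaller in magnitude than $\beta$, a masked position can never win the argmax.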
An episode for a given T starts with an empty CM and continues until a working schematic is extracted or there are no more available connections. At every step ($t$) in the episode, the episode buffer stores a transition that contains $S_t$, T, $a_t$, and $r_{t+1}$. When the episode terminates, the return ($G_t$) at $S_t$ and T is computed, and the revised transition, in which $r_{t+1}$ is replaced with $G_t$, is appended to a replay buffer. These episodes are repeated while changing T to cover all target logic gates. Then, the agent's DQN is trained on mini-batches sampled from the replay buffer to minimize the loss ($L$) described in Equation (12). This is summarized in Algorithm 1, where $Q$ is the DQN function, $W$ denotes the weights and biases, $\gamma$ is the discounting factor, and $\alpha$ is the learning rate.

$$L = \left(G_t - Q(S_t, T, a_t)\right)^2 \tag{12}$$

Algorithm 1. Training procedure of GateRL.
  for each training iteration do
    for each target truth table T do
      Initialize an empty CM and the episode buffer
      repeat
        With probability $\epsilon$, select a random available $a_t$; otherwise $a_t \leftarrow \arg\max_a \{Q(S_t, T, a) - \beta M_t\}$
        Receive $r_t$ and $M_t$ from the environment
        Store the transition in the episode buffer
      until $r_t = +1$ or $M_t = 1$ (no available connection remains)
      for each step $t$ of the episode do
        Compute the return $G_t$ with the discounting factor $\gamma$
        Append ($S_t$, T, $a_t$, $G_t$) to a replay buffer
      end for
    end for
    if every N episodes then
      for each mini-batch sample do
        $W \leftarrow W - \alpha \nabla_W \left(G_t - Q(S_t, T, a_t)\right)^2$
      end for
    end if
  end for
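The replacement of $r_{t+1}$ by the return $G_t$ can be sketched with the standard discounted-return recursion $G_t = r_{t+1} + \gamma G_{t+1}$. The text does not spell out the recursion, so this form and the function names are assumptions.

```python
def discounted_returns(rewards, gamma=0.95):
    """rewards[t] holds r_{t+1}; returns[t] is G_t = r_{t+1} + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def squared_loss(g_t, q_value):
    """Per-sample loss between the stored return and the predicted Q-value."""
    return (g_t - q_value) ** 2


if __name__ == "__main__":
    # A three-step episode that ends with a working schematic.
    print(discounted_returns([-1, -1, 1]))
```

With $\gamma$ = 0.95, a shorter successful episode yields a larger return at its first step, which is consistent with the preference for schematics that need fewer connection steps.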

Experimental Results
GateRL is trained for the target logic gates: buffer, inverter, AND, NAND, OR, and NOR. While the buffer and inverter are one-input logic gates, AND, NAND, OR, and NOR are two-input ones. Therefore, CM is set with $N_i$, $N_o$, $N_n$, and $N_p$ of 2, 1, 2, and 2, respectively. The resulting $N_{CM}$ and $N_s$ are 17 and 126. The multiplication factor for $M_t$, $\beta$, is 100. The truth tables of the buffer and inverter are provided in two versions, in each of which one input is used and the other is a floating node. The truth tables for all target gates are illustrated in Figure 9, leading to flattened vectors (T) of size 12. BUF1 and INV1 are the buffer and inverter that use $IN_1$ as the input, and BUF2 and INV2 are those that use $IN_2$. For the agent's DQN, the numbers of units in $fc_1$, $fc_2$, and $fc_3$ are each set to twice the dimensionality of X ($N_x$ = 138), that is, 276. The size of $fc_4$ is equal to $N_s$ of 126. An $\epsilon$-greedy method is adopted with a random action selection probability ($\epsilon$) of 0.1. The discounting factor, $\gamma$, is 0.95, and the network is optimized by Adam with a learning rate of 0.001. In addition, there are 8 special replay buffers dedicated to the target logic gates (S0RB for BUF1, S1RB for BUF2, S2RB for INV1, S3RB for INV2, S4RB for AND, S5RB for OR, S6RB for NAND, S7RB for NOR) that contain the transition (S, T, M, a, R) histories of episodes terminated with a reward of +1, and one replay buffer (FRB) for the transitions of failed episodes, which finish with a reward of −1. Each replay buffer stores up to 1024 transitions. On top of the multiple replay buffers, the probability of the target logic selection ($P_T$) is adjusted to force the network to focus on the logic gates with more failures, that is, more rewards of −1 at termination. The network is evaluated in greedy mode every 64 episodes, and the number of extracted working schematics is counted separately for each target logic. Then, those counted values are processed by the Softmax function with a temperature of 0.01, and $P_T$ is obtained by Equation (13). SCNT is a vector of 8 elements that contains the counted values for the target logic gates.
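The separate replay buffers and the adaptive target selection can be sketched together. Equation (13) is not reproduced in this text, so the form used below (a Softmax over the negated, sum-normalized success counts divided by the temperature) is only one plausible assumption that gives gates with fewer successes a higher probability. The class and function names, and the uniform sampling over the pooled buffers, are likewise assumptions.

```python
import math
import random
from collections import deque

GATES = ["BUF1", "BUF2", "INV1", "INV2", "AND", "OR", "NAND", "NOR"]


class ReplayBuffers:
    """Eight success buffers (one per target gate) and one failure buffer."""

    def __init__(self, capacity=1024):
        self.success = {gate: deque(maxlen=capacity) for gate in GATES}
        self.failure = deque(maxlen=capacity)

    def add_episode(self, gate, transitions, final_reward):
        # Episodes that end with +1 go to the gate's own buffer, others to FRB.
        target = self.success[gate] if final_reward == 1 else self.failure
        target.extend(transitions)

    def sample(self, batch_size, rng=random):
        # Assumed sampling rule: uniform over all stored transitions.
        pool = [t for buf in self.success.values() for t in buf] + list(self.failure)
        return rng.sample(pool, min(batch_size, len(pool)))


def selection_probabilities(success_counts, temperature=0.01):
    """Assumed form of P_T: Softmax of negated, normalized success counts."""
    total = sum(success_counts)
    if total == 0:
        return [1.0 / len(success_counts)] * len(success_counts)
    logits = [-(c / total) / temperature for c in success_counts]
    peak = max(logits)                       # subtract the maximum for stability
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


if __name__ == "__main__":
    scnt = [64, 64, 10, 10, 5, 5, 0, 0]
    for gate, p in zip(GATES, selection_probabilities(scnt)):
        print(gate, round(p, 4))
```

Keeping success transitions in their own bounded buffers means that a long run of failed episodes can only overwrite FRB, so rare successful transitions for gates such as NAND and NOR remain available for training.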
The numbers of extracted working schematics are plotted in Figure 10a,b for training and evaluation. The x-axis indicates the number of SIMs, where one SIM is equal to 64 episodes. As expected, BUF1 and BUF2 begin to be extracted first, and then more complicated logic gates are obtained in the order of AND, OR, INV1, INV2, NAND, and NOR. Because the counted values for BUF1 and BUF2 are much larger than the others, their target logic selection probabilities are reduced during the training period, which appears as their smaller slopes in Figure 10a. The resulting schematics extracted by the proposed GateRL are presented in Figure 11. BUF1 and BUF2 are constructed with one transistor that is always turned on. Their episodes terminate in 3 steps. AND and OR are proposed with two transistors, where one NMOS transistor is always on and the other transistor is controlled by $IN_1$. They finish in 6 steps. When $IN_1$ is low and $IN_2$ is high for AND, a racing problem between $IN_1$ and $IN_2$ takes place; however, OUT can be settled at a low level close to GND by increasing the channel width-to-length ratio of a PMOS transistor (P1). The similar issue in the OR schematic can be handled by increasing the channel width-to-length ratio of an NMOS transistor (N2). INV1 and INV2 are composed of one PMOS transistor and one NMOS transistor, and their episodes finish in 6 steps. Lastly, NAND and NOR are built with three transistors (2 PMOS and 1 NMOS for NAND, 1 PMOS and 2 NMOS for NOR), and their episodes end in 9 steps. Like AND and OR, the extracted circuits of NAND and NOR can cause racing problems, which can be addressed by adjusting the channel width-to-length ratio. The components and episode lengths of the working schematics are summarized in Table 1. These extracted circuits are evaluated with the off-the-shelf circuit simulator SPICE, as summarized in Figure 12.
The proposed GateRL does not take into account the threshold voltages of transistors or racing issues. Therefore, some circuits cannot achieve rail-to-rail outputs, even when the channel width-to-length ratio is adjusted. While BUF1 pulls OUT up to a voltage lower than VDD because of the threshold voltage of N1, BUF2 pulls OUT down to a voltage higher than GND because of that of P1. In AND, OR, NAND, and NOR, a logic-1 level lower than VDD and a logic-0 level higher than GND are caused by the threshold voltage and the racing problem. However, all logic-1 and logic-0 voltages are above and below the center level, respectively. These issues will be addressed in our future work by revising the verification algorithm for the reward and the network structures. To verify the necessity of separate replay buffers and the adaptive $P_T$, GateRL is also trained and evaluated with a uniform $P_T$ or with only one replay buffer. As depicted in Figure 13a, the combination of a uniform $P_T$ and separate replay buffers cannot provide NOR schematics within 2000 SIMs, and the other logic gates are extracted only after more SIMs. In the case of one replay buffer, only BUF1 and BUF2 are designed within 2000 SIMs, as illustrated in Figure 13b,c. Consequently, it is confirmed that separate replay buffers help to find solutions for complicated logic schematics and that the adaptive $P_T$ contributes to improving the training speed.

Conclusions
Although various ML algorithms have been employed in many applications, the automated design of circuit schematics at the transistor level has not been addressed yet. To the best of our knowledge, this paper demonstrates the first ML approach that automatically designs circuit schematics.
The proposed GateRL is an automated digital circuit design framework based on RL and built on CMOS transistors. The connection of circuit elements is described by CM, which is transferred to the agent in the format of a vector along with the mask. The mask is used to reduce the dimensionality of the action space. The agent decides the optimum action of a new connection by a DQN that consists of 4 fully connected layers. The proposed GateRL is successfully trained with separate replay buffers and an adaptive selection probability for 6 target logic gates: buffer, inverter, AND, OR, NAND, and NOR. The extracted schematics are verified by SPICE simulation.
However, the proposed scheme has some limitations. First, all transistors are treated as ideal switches by neglecting the threshold voltages. Therefore, the turned-on transistors are dealt with as perfect short circuits regardless of the voltage levels at the source and drain terminals. Second, some extracted circuits contain racing problems. These are compensated for to some extent in the SPICE simulation by adjusting the channel width-to-length ratio. Third, only voltage sources and CMOS transistors are included as circuit elements. Despite these limitations, the proposed GateRL will pave the way to complete ML frameworks for the schematic design of more complicated digital circuits, as well as analog circuits.

Conflicts of Interest:
The authors declare no conflict of interest.