MOSFET Physics-Based Compact Model Mass-Produced: An Artificial Neural Network Approach

The continued scaling-down of nanoscale semiconductor devices has made it very challenging to obtain analytic surface potential solutions from the complex equations of device physics, which is the fundamental task of a MOSFET compact model. In this work, we proposed a general framework that automatically derives analytical solutions for the MOSFET surface potential by leveraging the universal approximation power of deep neural networks. Our framework incorporated a physical-relation neural network (PRNN) that learns side-by-side from a general-purpose numerical simulator of complex mathematical-physics equations and instills the "knowledge" contained in the simulation data into the neural network, producing an accurate closed-form mapping between device parameters and surface potential. Because it is learned from these simulations, the surface potential reflects the numerical solution of a two-dimensional (2D) Poisson equation, going beyond the limits of traditional 1D Poisson equation solutions and thus better capturing the physical characteristics of scaled devices. We obtained promising results in inferring the analytic surface potential of a MOSFET and in applying the derived potential function to building 130 nm MOSFET compact models and to circuit simulation. Such an efficient framework, with accurate prediction of device performance, shows potential for device optimization and circuit design.


Introduction
Compact models work as a bridge between the fabrication process and circuit design. They are designed to accurately reproduce minute details of device electrical characteristics, which are essential in the design of digital, analog, mixed-signal, and RF integrated circuits. This requires the model to be consistent with the physics of device operation and to have a model structure that remains invariant to the particulars of the fabrication process. Several types of compact model have been developed for the Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET), including the threshold-voltage (V_th)-based compact model, the inversion-charge (q_i)-based compact model, and the surface-potential (φ_s)-based compact model. Among these, the φ_s-based compact model has achieved widespread success and is the most frequently used in modern circuit simulators because it accurately describes the major physical effects responsible for the characteristics of scaled MOSFETs. The formulation of φ_s is the key component of φ_s-based compact models and must be carefully designed, because φ_s is governed by implicit transcendental equations. Aggressive down-scaling of the device and the resulting physical effects have made it very challenging to obtain analytic solutions from the complex equations of mathematical physics [1,2]. On the other hand, although numerical solvers [3] can be high-quality alternatives, their non-differentiable solutions are unfit for circuit simulation or design optimization.

Figure 1. Schematic of the PRNN framework. Simulation data from TCAD are first transformed at the physical-relationship layer to capture more fundamental physical relations. The pretreated data are then used to train fully connected artificial neural networks. After network optimization, the resultant analytic surface potential is further applied to building 130 nm MOSFET semi-classical compact models (e.g., I-V/C-V characteristics) and to circuit simulation.

TCAD Simulation
The 130 nm node MOSFET is simulated with the Sentaurus Device TCAD tool, as shown in Figure 2. The simulated training data have two parts: the preselected device data/parameters x_i (i is the sample index) and the corresponding surface potential [10] values y_i computed by TCAD. Each x_i is a d-dimensional vector specifying the thickness of the gate insulation layer (T_ox), the gate-to-source voltage (V_gs), the drain-to-source voltage (V_ds), the length of the channel region (L_g), the temperature (T), the doping concentration in the channel region (N_d), and the channel location (x) (see Table 1 for details). The x_i should cover a diverse range of device and operating conditions to generate a rich training dataset. The simulated potential value y_i is the difference between the substrate electrostatic potential and the electrostatic potential one nanometer below the channel surface [11].

Figure 2. Schematic of the 130 nm MOSFET device with a polysilicon gate simulated in TCAD. The surface potential is defined as the difference between the substrate electrostatic potential and the electrostatic potential (one nanometer below the channel surface).
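The sampled inputs x_i can be pictured as rows of a feature matrix, one column per variable listed above. The following is a minimal sketch of drawing such rows at random from feasible domains; the numeric ranges, the dictionary name `FEATURE_RANGES`, and the function `sample_inputs` are hypothetical placeholders, not the Table 1 values or the authors' tooling.

```python
import numpy as np

# Hypothetical feasible domains (placeholders, NOT the Table 1 values).
FEATURE_RANGES = {
    "T_ox": (2.0e-9, 4.0e-9),     # gate insulator thickness (m)
    "V_gs": (0.0, 1.4),           # gate-to-source voltage (V)
    "V_ds": (0.0, 1.4),           # drain-to-source voltage (V)
    "L_g": (100e-9, 200e-9),      # channel length (m)
    "T": (250.0, 400.0),          # temperature (K)
    "N_d": (1.0e17, 1.0e18),      # channel doping (cm^-3)
    "x": (0.0, 1.0),              # channel location as a fraction of L_g
}

def sample_inputs(n, rng, ranges=FEATURE_RANGES):
    """Draw n random parameter vectors x_i, one row per sample point."""
    cols = []
    for name, (lo, hi) in ranges.items():
        if name == "N_d":
            # Doping spans a decade, so sample its exponent uniformly (a design choice)
            cols.append(10.0 ** rng.uniform(np.log10(lo), np.log10(hi), n))
        else:
            cols.append(rng.uniform(lo, hi, n))
    return np.column_stack(cols)  # shape (n, d) with d = len(ranges)
```

Each row would then be handed to the TCAD deck to obtain the matching y_i; sampling the doping exponent rather than the doping itself keeps low-doping cases from being underrepresented.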

Physical-Relationship (PR) Layer
The PR layer groups the variables in each x_i and applies a group-wise transform that accounts for the desired interactions between parameters, as detailed in Equation (1); these transforms reflect useful prior knowledge of some simple but fundamental physical relations [12]. From a learning perspective, incorporating justified variable interactions can effectively reduce sample complexity, i.e., the amount of data needed to train an accurate model [13,14]. Equation (1) involves the intrinsic Debye length and the built-in potential at the source/drain terminals. Min-max normalization is then applied to rescale the transformed data v_i [15].
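As an illustration only, the sketch below computes one physics-derived quantity of the kind the PR layer uses, the built-in potential of the channel/source-drain junction from the standard relation V_bi = (kT/q) ln(N_ch N_sd / n_i²), and then applies min-max normalization with stored bounds. This is not the paper's Equation (1): the source/drain doping argument, the fixed n_i = 1.0 × 10^10 cm^-3, and the function names are assumptions.

```python
import numpy as np

Q = 1.602176634e-19   # elementary charge (C)
K_B = 1.380649e-23    # Boltzmann constant (J/K)
N_I = 1.0e10          # assumed intrinsic carrier density of Si (cm^-3), held fixed

def built_in_potential(n_channel, n_sd, temperature):
    """Junction built-in potential (V): (kT/q) * ln(N_channel * N_sd / n_i^2).

    Doping in cm^-3. Keeping n_i fixed ignores its temperature dependence,
    a simplification made only for this sketch.
    """
    v_thermal = K_B * temperature / Q
    return v_thermal * np.log(n_channel * n_sd / N_I**2)

def min_max(v, v_min, v_max):
    """Rescale each feature column to [0, 1] with stored per-feature bounds.

    The bounds come from the training set and are reused unchanged at inference.
    """
    return (v - v_min) / (v_max - v_min)
```

Storing the training-set minima and maxima matters because the same constants must be reproduced later when the network is evaluated outside the training code.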

Fully-Connected (FC) Layer
In an FC layer, every neuron is connected to every neuron in the next layer. These general-purpose network components complement the PR layer by capturing complex nonlinear relations [16] between the device data and the surface potential. As shown in Figure 1, we cascade two FC layers right after the PR layer, with 64 and 32 neurons, respectively, each activated by the sigmoid function. The two FC layers are as follows [17].
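As a sketch, assuming the standard affine-plus-sigmoid form, the two layers can be written as below. The weight matrices $W_1, W_2$, bias vectors $\mathbf{b}_1, \mathbf{b}_2$, and the PR-layer output dimension $p$ are notation introduced here, and $\mathbf{v}_i$ is the normalized PR-layer output.

```latex
\begin{aligned}
\mathbf{h}^{1}_i &= \sigma\!\left(W_1 \mathbf{v}_i + \mathbf{b}_1\right), & W_1 &\in \mathbb{R}^{64\times p},\ \mathbf{b}_1 \in \mathbb{R}^{64},\\
\mathbf{h}^{2}_i &= \sigma\!\left(W_2 \mathbf{h}^{1}_i + \mathbf{b}_2\right), & W_2 &\in \mathbb{R}^{32\times 64},\ \mathbf{b}_2 \in \mathbb{R}^{32},\\
\sigma(z) &= \frac{1}{1+e^{-z}}, & \sigma'(z) &= \sigma(z)\bigl(1-\sigma(z)\bigr).
\end{aligned}
```

The derivative $\sigma'(z) = \sigma(z)(1-\sigma(z))$ reuses the forward-pass value, which is why the sigmoid's gradient is cheap to evaluate.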
The sigmoid function is bounded and allows efficient computation of the gradient. Finally, the predicted surface potential associated with each x_i is computed by

$$\hat{y}_i = \mathbf{w}^{\top}\mathbf{h}^{2}_i + b$$

Here, $\mathbf{w} \in \mathbb{R}^{32\times 1}$ and $b \in \mathbb{R}$ are the model coefficients and bias, $\mathbf{h}^{2}_i$ is the output of the second FC layer, and $\hat{y}_i \in \mathbb{R}$ is the predicted value of the ANN. The mean squared error (MSE), $\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$ [19,20], is used as the loss function and is iteratively minimized with stochastic gradient descent [21].
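A minimal NumPy sketch of the 64/32 sigmoid network with a linear output, the MSE loss, and one gradient-descent step with hand-written backpropagation follows. The initialization scheme, the function names, and the learning rate are assumptions not given in the text; in practice a framework with automatic differentiation would compute the gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d_in, rng, widths=(64, 32)):
    """Scaled-normal weights and zero biases for two FC layers plus a linear output."""
    sizes = (d_in, *widths, 1)
    return [[rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, v):
    """v: (n, d_in) normalized PR-layer outputs -> ((h1, h2), y_hat with shape (n,))."""
    (W1, b1), (W2, b2), (w, b) = params
    h1 = sigmoid(v @ W1 + b1)      # first FC layer, (n, 64)
    h2 = sigmoid(h1 @ W2 + b2)     # second FC layer, (n, 32)
    y_hat = (h2 @ w + b)[:, 0]     # linear output w^T h2 + b, (n,)
    return (h1, h2), y_hat

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def sgd_step(params, v, y, lr):
    """One gradient step on the MSE of a mini-batch; returns the pre-update loss."""
    (W1, b1), (W2, b2), (w, b) = params
    (h1, h2), y_hat = forward(params, v)
    g_out = (2.0 / v.shape[0]) * (y_hat - y)[:, None]   # dL/dy_hat, (n, 1)
    g_h2 = (g_out @ w.T) * h2 * (1.0 - h2)               # through sigmoid: s(1 - s)
    g_h1 = (g_h2 @ W2.T) * h1 * (1.0 - h1)
    grads = (v.T @ g_h1, g_h1.sum(axis=0),
             h1.T @ g_h2, g_h2.sum(axis=0),
             h2.T @ g_out, g_out.sum(axis=0))
    for p, g in zip((W1, b1, W2, b2, w, b), grads):
        p -= lr * g                                      # in-place parameter update
    return mse(y, y_hat)
```

Stochastic gradient descent corresponds to calling `sgd_step` on randomly drawn mini-batches of the training set.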
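Upon completion of training, the surface potential is given by a closed-form expression, the composition of the PR-layer transform, min-max normalization, the two sigmoid layers, the linear output, and denormalization, which is a concise model that can be evaluated efficiently.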
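Written out, and under the assumption that the FC layers use the affine-plus-sigmoid form, the trained model is one explicit expression with no iteration. Here $\mathrm{PR}(\cdot)$ stands for the Equation (1) transform, $\mathbf{v}_{\min}, \mathbf{v}_{\max}$ and $y_{\min}, y_{\max}$ for the stored normalization bounds, and $W_1, W_2, \mathbf{b}_1, \mathbf{b}_2$ for the weights and biases of the two FC layers; all of this notation is introduced for the sketch.

```latex
\begin{aligned}
\mathbf{v}_i &= \frac{\mathrm{PR}(\mathbf{x}_i) - \mathbf{v}_{\min}}{\mathbf{v}_{\max} - \mathbf{v}_{\min}} \quad\text{(element-wise)},\\
\hat{\varphi}_s(\mathbf{x}_i) &= y_{\min} + \left(y_{\max} - y_{\min}\right)\left[\mathbf{w}^{\top}\,\sigma\!\left(W_2\,\sigma\!\left(W_1 \mathbf{v}_i + \mathbf{b}_1\right) + \mathbf{b}_2\right) + b\right].
\end{aligned}
```

Since the affine maps and $\sigma$ are smooth, the expression is differentiable in $V_{gs}$ and $V_{ds}$ wherever the PR transform is, which is the property a circuit simulator relies on.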

Surface Potential Written in Verilog-A
To further verify the applicability of the framework in circuit simulation, the trained artificial neural network is transformed into Verilog-A [22], the hardware description language commonly used to model MOSFETs and other electronic components. Verilog-A is the analog subset of Verilog-AMS [23], originally intended for modeling the behavior of analog and mixed-signal systems. Despite significant initial resistance, Verilog-A has emerged as the de facto standard language for defining and distributing compact models; in 2004, constructs explicitly intended for compact modeling were added. Because Verilog-A does not support matrix calculations, an automation script (written in Python [24], for example) is used to speed up the conversion, which proceeds in two steps. First, the parameter values of each layer are entered into Verilog-A. In this framework, these parameters include the physical constants used in the PRNN (for example, the permittivity of silicon ε_si [25], the Boltzmann constant k, and the elementary charge q), the Min and Max values for normalization, the bias terms b_i, and the weight terms w_i. Table 2 (I) shows the pseudo code [26] for inputting w_i: the value of w_i is read from the saved txt file and written to Verilog-A through the intermediate variable a. Second, the forward propagation of the artificial neural network is implemented in Verilog-A. This step covers the data normalization, the neurons (h_i) in the hidden layers, and the denormalization of the output ŷ_i. Table 2 (II) shows the pseudo code for computing h^2_i. The weight and bias terms between neurons in adjacent layers are applied following Equations (3) and (4).
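The conversion described above can be sketched as a small code generator. Unlike the Table 2 approach, which reads each weight from a text file through an intermediate variable, this hypothetical helper inlines the trained constants directly into scalar assignments, one per neuron; the names `emit_layer`, `emit_network`, `h1_0`, and `vn_k` are illustrative assumptions. It relies only on `exp()` and `real` variables, both part of Verilog-A.

```python
import numpy as np

def _lit(x):
    """Format a float as a parenthesized real literal (valid in Verilog-A and Python)."""
    return f"({float(x):.12e})"

def emit_layer(W, b, inputs, prefix, use_sigmoid=True):
    """Emit one assignment per output neuron of a dense layer.

    W has shape (len(inputs), n_out); b has shape (n_out,).
    Returns the statements and the names of the output variables.
    """
    stmts, outs = [], []
    for j in range(W.shape[1]):
        acc = " + ".join(f"{_lit(W[k, j])}*{name}" for k, name in enumerate(inputs))
        z = f"{acc} + {_lit(b[j])}"
        out = f"{prefix}_{j}"
        # exp() is a built-in Verilog-A math function, so the sigmoid is inlined
        rhs = f"1.0/(1.0 + exp(-({z})))" if use_sigmoid else z
        stmts.append(f"{out} = {rhs};")
        outs.append(out)
    return stmts, outs

def emit_network(layers, inputs):
    """layers: list of (W, b, use_sigmoid) tuples, applied in order.

    Returns a `real` declaration line (module scope), the analog-block
    statements, and the name(s) of the final output variable(s).
    """
    stmts, names, declared = [], list(inputs), []
    for idx, (W, b, use_sigmoid) in enumerate(layers, start=1):
        layer_stmts, names = emit_layer(W, b, names, f"h{idx}", use_sigmoid)
        stmts.extend(layer_stmts)
        declared.extend(names)
    decl = "real " + ", ".join(declared) + ";"
    return decl, stmts, names
```

For the network described here, `layers` would hold the two sigmoid layers followed by the linear output layer, with `inputs` naming the normalized PR variables; the final output still needs denormalization with the stored Min and Max values.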

Establishing the Compact Model
After obtaining the analytical surface potential expression, the related compact model for the MOSFET can be built. In this work, a semi-classical compact model is developed by combining the trained surface potential expression with a classical compact model. The main equations are as follows [27-31]. Here, μ_eff [29] is the carrier mobility, μ_0 is the carrier mobility at a low electrical field, q_d is the normalized charge of the depletion region, q_i is the normalized charge of the inversion region, and (MUE, THEMU) are fitting parameters accounting for the mobility degradation [32] caused by surface roughness and phonon scattering. Coulomb scattering is introduced through the parameter CS. E_eff denotes the effective vertical field at the potential midpoint. V_gf [33] accounts for the subthreshold region, with the parameter Sl acting as a correction to the subthreshold swing. φ_s is the surface potential obtained from the ANN at x = 0.01 μm. V_dsat [31] is the saturation voltage, with m acting as a fitting parameter.
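To illustrate the roles of MUE, THEMU, CS, V_dsat, and m, the helpers below assume a PSP-style mobility-degradation expression and a common smoothing function that clamps V_ds to V_dsat. Neither functional form is confirmed by the text, and the function names are hypothetical.

```python
def effective_mobility(mu0, e_eff, q_d, q_i, MUE, THEMU, CS):
    """Hypothetical PSP-style mobility degradation (assumed form).

    (MUE * E_eff)**THEMU models surface-roughness/phonon scattering;
    CS * (q_d / (q_d + q_i))**2 models Coulomb scattering, which matters
    most when the inversion charge q_i is small compared with q_d.
    """
    degradation = 1.0 + (MUE * e_eff) ** THEMU + CS * (q_d / (q_d + q_i)) ** 2
    return mu0 / degradation

def effective_vds(vds, vdsat, m):
    """Smoothly clamp V_ds to V_dsat; larger m gives a sharper transition."""
    return vds / (1.0 + (vds / vdsat) ** m) ** (1.0 / m)
```

A smooth clamp of this kind keeps the drain current and its derivatives continuous across the linear-to-saturation transition, which avoids convergence problems in circuit simulation.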
The gate capacitance model is also necessary in circuit simulation and can be calculated as follows [29,34-36]:
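The small-signal gate capacitance C_gg is the derivative of the total gate charge with respect to the gate voltage. A minimal sketch, assuming a callable gate-charge model `q_gate(vgs)` (a hypothetical name), estimates it by central differences. This is mainly useful for checking a charge model against TCAD C-V data; inside a Verilog-A model, the simulator obtains capacitances itself from charges contributed through `ddt()`.

```python
def gate_capacitance(q_gate, vgs, dv=1e-3):
    """Central-difference estimate of C_gg = dQ_G/dV_GS.

    q_gate is any callable returning the total gate charge (C) at a given V_GS (V).
    """
    return (q_gate(vgs + dv) - q_gate(vgs - dv)) / (2.0 * dv)
```

Sweeping `vgs` over a grid and calling `gate_capacitance` at each point yields a C-V curve comparable to panel (d) of Figure 6.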

Method Validation and Discussion
We evaluated the proposed framework by computing the surface potential of a 130 nm MOSFET. We generated 540,000 training samples and 100,000 testing samples by TCAD simulation. The d-dimensional device data x_i were generated by randomly sampling each variable from its feasible domain. The evaluation results are reported in Figure 3. The left axis of Figure 3 shows the MSE testing error over the training iterations. During training, the MSE loss decreases and stabilizes at 9.58 × 10^−7, which corresponds to a root-mean-square error of about 1 mV (√(9.58 × 10^−7) ≈ 9.8 × 10^−4 V), indicating a highly accurate result. The learning rate is an important hyper-parameter in the training process: a large learning rate reduces the error quickly, while a small learning rate helps the model converge. In this work, the learning rate gradually decreases from 2 × 10^−5 to 1 × 10^−8 following a cosine annealing schedule (half a cycle) during training, as shown on the right axis of Figure 3.
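The schedule described above, a half cycle of cosine decay from 2 × 10^−5 to 1 × 10^−8, follows the standard formula lr_t = lr_min + ½(lr_max − lr_min)(1 + cos(π t/T)). A minimal sketch is shown below; the step-based parameterization and the function name are assumptions. PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max = T` and `eta_min = 1e-8` produces the same curve.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=2e-5, lr_min=1e-8):
    """Half-cycle cosine decay from lr_max (step 0) to lr_min (step total_steps)."""
    t = min(max(step, 0), total_steps) / total_steps   # progress clipped to [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```

The curve is flat near both ends, so training spends many iterations at a high rate early on and settles gently at the minimum rate at the end.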

Figure 4a plots the 2D surface potential along the device channel. At a low gate voltage, the surface potential in the middle of the channel is determined by the gate-channel work function difference. The surface potentials at the drain and source terminals are raised by the PN junctions [37] formed by the different doping types of the channel and the source/drain regions, and are hardly affected by V_gs. When V_gs increases from 0 V to 1.4 V, the surface potential in the channel increases due to the electric field induced by the gate voltage, while the potential at the drain/source terminals stays almost fixed. As a result, the location of the minimum surface potential moves from the middle of the channel toward the source terminal, a challenging feature that traditional 1D Poisson equation [38] solutions fail to capture. Our model predictions agree closely with the TCAD simulation.
Figure 4b plots the surface potentials at the source and drain terminals versus the gate voltage. Excellent agreement is achieved between the TCAD simulation and PRNN results. As the gate voltage increases, the surface potential at the drain/source terminal first increases and then gradually saturates, which is consistent with previous reports [8].
$$\Delta V_t = -\eta\,\Delta\varphi_{s,\min} = -\sigma V_{ds}$$
Here, η is the ideality factor, assumed to be independent of the bias condition, and σ is a physical parameter that reflects the influence of V_ds on the threshold voltage. The DIBL effect causes excess injection of charge carriers into the channel and increases the subthreshold current [40]. In addition, the gate depletion region overlaps the source/drain depletion regions and shares part of their depletion charge [41]; the shared charge is balanced by a counter charge distributed between the gate electrode and the source and drain contacts, which shifts the threshold voltage. With the PRNN, the DIBL behavior of the device can be analyzed easily, which facilitates the optimization of device performance. Figure 6 compares the developed surface-potential-based semi-classical compact model with TCAD simulation results. Good agreement is achieved for (a) the n-type transfer characteristics at different drain voltages, (b) the output characteristics, (c) the transfer characteristics at different operating temperatures [42], and (d) the small-signal gate capacitance C_gg, showing that the model describes the device performance well.
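The DIBL relation ΔV_t = −ηΔφ_s,min = −σV_ds can be evaluated directly, and σ can be read off two threshold-voltage values taken at different drain biases. A minimal sketch, assuming V_t has already been extracted at a low and a high V_ds; the function names and the two-point extraction are illustrative, not the authors' procedure.

```python
def dibl_shifts(vds, sigma, eta):
    """Return (delta_Vt, delta_phi_s_min) from dVt = -eta * dphi_s_min = -sigma * Vds."""
    delta_vt = -sigma * vds
    delta_phi_min = -delta_vt / eta
    return delta_vt, delta_phi_min

def extract_sigma(vt_low, vt_high, vds_low, vds_high):
    """Two-point DIBL coefficient: minus the slope of V_t versus V_ds."""
    return -(vt_high - vt_low) / (vds_high - vds_low)
```

For example, with σ = 0.05 and V_ds = 1.2 V the threshold voltage drops by 60 mV, and dividing by η gives the corresponding rise of the minimum surface potential.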
To further verify the applicability of the framework to modern microelectronic circuit design, the proposed compact model was converted into Verilog-A, and circuit simulation was conducted with the MOSFET devices included as new active components of the circuit simulator, as shown in Figure 7. A seven-stage ring oscillator, built from inverters connected in series, was used to generate oscillation [43]. Figure 7a shows the Vout-vs-Vin curves of the inverter. As the size of the p-type MOSFET increases, the drive capability of the pull-up increases, so the Vout-vs-Vin curves shift to the right. Figure 7b shows the transient simulation results. As the size of the p-type MOSFET decreases, the oscillation frequency decreases as well, and saturation appears at the lowest point of the oscillation. These simulation results demonstrate the applicability of our framework in circuit simulation.
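The oscillation frequency in a transient result like Figure 7b can be measured from the rising crossings of a threshold (typically V_DD/2), and for an N-stage ring oscillator the period is T = 2N·t_pd, which gives the average per-stage delay. A minimal sketch under these standard assumptions; the function names are hypothetical, and the waveform is assumed to be sampled finely enough for linear interpolation.

```python
import numpy as np

def oscillation_frequency(t, v, threshold):
    """Estimate frequency from rising crossings of `threshold` in a sampled waveform."""
    above = v >= threshold
    idx = np.nonzero(~above[:-1] & above[1:])[0]   # sample just before each rising crossing
    if len(idx) < 2:
        raise ValueError("need at least two rising crossings")
    # Linearly interpolate the exact crossing time between neighboring samples
    t0, t1, v0, v1 = t[idx], t[idx + 1], v[idx], v[idx + 1]
    tc = t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return (len(tc) - 1) / (tc[-1] - tc[0])        # full cycles / elapsed time

def stage_delay(freq, n_stages=7):
    """Average propagation delay per inverter stage from T = 2 * N * t_pd."""
    return 1.0 / (2.0 * n_stages * freq)
```

Tracking `stage_delay` while the p-type transistor is resized gives a single number that summarizes how the pull-up strength affects the oscillator.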
Compared to the classical compact model method [29,44], the proposed framework avoids the numerical iteration otherwise required to solve for the surface potential [14], and thus saves a great deal of effort in model design. Since most physical effects caused by device scaling act directly on the surface potential, the proposed framework captures the underlying physics of scaling better than works that train directly on a device's electrical characteristics, which are incompatible with model variation. Furthermore, in [6], the training data were obtained by subtracting the classical compact model output from the device electrical characteristics, so the ANN acts as a correction term to an existing model and carries little underlying physical information. In contrast, the surface potential training data in this framework are obtained from TCAD simulators, which are based on the equations of mathematical physics and reflect the physical relationship between device parameters and surface potential; the data therefore contain more systematic physical information about the device.

Conclusions
We exploited the universal approximation power of artificial neural networks, learning from large amounts of simulation data, to generate accurate, generalizable MOSFET compact models in a highly automated manner. Promising results were obtained in building the analytic surface potential of a 130 nm MOSFET, which proved beneficial for device optimization. Furthermore, our work reveals the great potential of modern artificial intelligence techniques for advancing microelectronics research. Accurate, generalizable, and automated compact model development not only narrows the gap between theory and computation, but is also expected to bring new vigor to the design, simulation, and optimization of very-large-scale circuit systems.
