Research on Device Modeling Technique Based on MLP Neural Network for Model Parameter Extraction

Abstract: The parameter extraction of device models is critically important for circuit simulation. The device models in existing parameter extraction software are physics-based analytical models or embedded Simulation Program with Integrated Circuit Emphasis (SPICE) functions. The programming implementation of physics-based analytical models is tedious and error prone, while running the device model evaluation by calling SPICE is time consuming for the parameter extraction software. In this paper, we propose a novel modeling technique based on a neural network (NN) for the optimal extraction of device model parameters, and further integrate the NN model into device model parameter extraction software. The technique does not require developers to understand the device model, which enables faster and less error-prone development of parameter extraction software. Furthermore, the NN model improves the extraction speed compared with the embedded SPICE, which expedites the process of parameter extraction. The technique has been verified on the BSIM-SOI model with a multilayer perceptron (MLP) neural network. The training error of the NN model is 4.14%, and the testing error is 5.38%. Experimental results show that the trained NN model obtains an extraction error of less than 6%, and its extraction speed is thousands of times faster than that of SPICE in device model parameter extraction.


Introduction
In the process of device model parameter extraction, the model parameter values are substituted into the device model for calculation. The error between the measured data of the device and the calculated data of the model is obtained and compared with the target value. The optimization engine adjusts the model parameter values and substitutes them into the model calculation again. This process is carried out iteratively until the fitting error is lower than the target value, as shown in Figure 1. The device models in the existing parameter extraction software are manually implemented, physics-based analytical models, or embedded SPICE. The implementation of physics-based analytical models requires the developer to be skilled in device physics and to have a deep understanding of the models. Moreover, the programming implementation of physics-based analytical models is tedious and error prone, so the software development is time intensive. The development of physics-based analytical models is also not generic: each device model must be programmed individually. When SPICE is called for simulation, it is necessary to parse the netlist, establish the matrix equations, iteratively solve the matrix equations, and so on. It is therefore time consuming for the parameter extraction software to run the device model evaluation by calling SPICE, making the model parameter extraction less efficient. In recent years, NNs have been widely used due to their superior data processing capabilities. NNs provide smooth functions and infinite-order derivatives that can approximate complex nonlinear functions [1,2]. Since the IV curves and other characteristics of devices are nonlinear, NNs have the potential to model them [3,4]. Currently, NNs have been applied in several modeling situations, such as small-signal modeling and microwave device modeling [5][6][7][8].
As the complexity of the device model increases, the number of model parameters goes up. Manually specifying new model parameters for complex device physics equations is much harder than adding new variables to the NN model [9].
The artificial neural network (ANN) is applied to model the device, with a preprocessing of the current and voltage to train the ANN model accurately [10]. A device model based on the NN acts as an intermediate between early device measurements and a later occurring compact model [11]. A fitness function based on key electrical characteristics (the off current and the subthreshold slope) is proposed to compensate for the R² function acting solely as the fitness function. A novel physics-inspired neural network (Pi-NN) approach for compact modeling is proposed in [12], where fundamental device physics are incorporated in the NN model to improve the modeling accuracy.
In this paper, we propose using an MLP neural network instead of a manually implemented physics-based analytical model or embedded SPICE to build a device model and integrate it into the parameter extraction software. The NN uses the IV and CV curves and other characteristics for modeling, and learns the complex relationship between the input and output by training. The NN modeling technique does not require understanding the device model, which enables faster and less error-prone development than the implementation of the physics-based analytical model. Moreover, the NN model improves the extraction speed over SPICE, which can expedite the process of parameter extraction.
The NNs in these works [10][11][12] are only used for circuit simulation, and the inputs of the NNs only contain the voltages and the device geometries. We use the MLP neural network to build a device model for parameter extraction. The MLP neural network inputs include the voltages, device geometries, temperature, and model parameters. We perform a sensitivity analysis of the model parameters, where sensitivity means the degree of influence of a model parameter on the characteristic curves. The model parameters with non-zero sensitivity are used as the inputs for the NN. The device modeling flow based on the NN and the flow of device model parameter extraction using the NN model are shown in Figure 2.
The rest of this paper is organized as follows. Section 2 introduces the data preparation and preprocessing methods. Section 3 presents the training and testing work of the NN. In Section 4, we analyze the device modeling and model parameter extraction results. Section 5 summarizes the work.

Data Preparation
This work uses current data for device modeling, defining the influence of the model parameters on the current as the sensitivity. Model parameters with non-zero sensitivity are taken as the inputs of the NN. The sensitivity calculation formulas are defined as follows:

A0_i = |ΔI_i / I_i|, (1)

S(A) = max(A0), (2)

where I_i is the current when the model parameter A is set to P, and ΔI_i denotes the current difference when the model parameter A is set to P and P + ΔP under the same working conditions. Array A0 represents the sensitivity array containing all current data. S(A) represents the sensitivity of the model parameter A, which is the maximum value of array A0.
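This sensitivity computation can be sketched in Python (a minimal illustration; `model_current` is a hypothetical stand-in for the SPICE evaluation of the device model):

```python
import numpy as np

def sensitivity(model_current, bias_points, P, dP):
    """Sensitivity S(A) of one model parameter A.

    model_current(value, bias) returns the drain current I_i with the
    parameter set to `value` under one working condition `bias`.
    """
    I = np.array([model_current(P, b) for b in bias_points])            # I_i at A = P
    I_pert = np.array([model_current(P + dP, b) for b in bias_points])  # I_i at A = P + dP
    A0 = np.abs((I_pert - I) / I)   # relative current change per working condition
    return A0.max()                 # S(A): maximum over all current data

# toy example: a fake "model" whose current scales linearly with the parameter
toy = lambda p, b: p * b
s = sensitivity(toy, [0.1, 0.5, 1.0], P=2.0, dP=0.2)  # relative change is 0.1 everywhere
```

A parameter whose perturbation never changes the current yields S(A) = 0 and is dropped from the NN inputs.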
In this work, we take the BSIM-SOI model as the source for the training set and testing set [13]. The sensitivity analysis results of the BSIM-SOI model show 128 model parameters with non-zero sensitivity, which are used as the NN inputs. Some of the model parameters with high sensitivity are listed in Table 1. It took 23.9 h to generate the data needed for the sensitivity analysis and 36 s for the sensitivity analysis itself.

Table 1. Some of the model parameters with high sensitivity.

Model Parameter   Description
Bigc              Parameter for I_gcs and I_gcd
Nrecr0            Recombination non-ideality factor at reversed bias for source
Beta2             Third V_ds-dependent parameter of the impact ionization current
Vtun0             Voltage-dependent parameter for the tunneling current for source
Ntun              Reverse tunneling non-ideality factor for source

The extraction of device model parameters needs to take into account various working conditions (voltages, temperature) [14,15] and MOSFETs of different sizes [16,17]. Therefore, in addition to the model parameters, the inputs of the NN need to contain six variables: drain-source voltage (V_ds), gate-source voltage (V_gs), bulk-source voltage (V_bs), temperature (T), gate width (W), and gate length (L). The drain-source current (I_ds) is the output. The MLP neural network consists of an input layer, a few hidden layers, and an output layer, which are fully connected [18,19].
The 45 nm SOI process [20] was used to build the circuit. The minimum gate length of the nmosbc-type MOSFET used in this work is 56 nm, while the minimum gate width is 654 nm. For the training set, the gate length ranges from 56 nm to 1000 nm, the gate width from 654 nm to 1200 nm, and the temperature from −40 °C to 125 °C. The output characteristic (I_ds-V_ds) curves sweep V_ds from 0 to 1.0 V with a step of 0.05 V. The transfer characteristic (I_ds-V_gs) curves sweep V_gs from 0 to 1.0 V with a step of 0.05 V. The temperature, gate length, and gate width are scattered within these ranges. At the same time, the Latin hypercube sampling method was applied to the model parameters to ensure that their ranges were fully covered [21]. Tens of thousands of sets of IV curves were prepared as the training set; generating them took 11.2 h. A good NN model must have good generalization ability, i.e., the NN model is required to perform well on test data that differ from the training data [22,23].
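The Latin hypercube step can be sketched in pure NumPy (an illustrative implementation with made-up two-parameter bounds; the real sampler covers the 128 parameter ranges found by the sensitivity analysis):

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube sample: each parameter range is split into
    n_samples equal strata, and each stratum is hit exactly once."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)   # shape (n_params, 2)
    n_params = bounds.shape[0]
    # one random point inside each stratum, then shuffle the strata per column
    u = (rng.random((n_samples, n_params)) + np.arange(n_samples)[:, None]) / n_samples
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + u * (hi - lo)                  # scale to the parameter ranges

# e.g. two hypothetical parameter ranges
samples = latin_hypercube(1000, [[0.0, 1.0], [10.0, 20.0]], seed=0)
```

Compared with plain uniform sampling, this guarantees that every sub-interval of every parameter range is represented in the training set.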
The inputs were extended to varying degrees when preparing the test set. W was taken from 654 nm to 1500 nm, L from 56 nm to 1200 nm, and T from −40 °C to 125 °C. The ranges of the model parameters were extended to 1 to 1.1 times their original ranges. The test set uses a finer voltage step: the output characteristic curves sweep V_ds from 0 V to 1.0 V with a step of 0.025 V, and the transfer characteristic curves sweep V_gs from 0 to 1.0 V with a step of 0.025 V. More than 4000 sets of IV curves were prepared as the test set.

Data Pre-Processing
Input feature vectors have different magnitudes, among which large feature vectors can dominate the training, causing the network to fail to learn from other feature vectors. Thus, the inputs need to be normalized. The normalization of the inputs can facilitate gradient descent and accelerate convergence [24,25]. In this work, min-max normalization is applied to the input feature vectors to map them between 0 and 1.
The values of the currents mainly range from 1 × 10⁻¹² A to 1 × 10⁻³ A, a large span of magnitudes. If the NN is trained using the unprocessed current, the error is dominated by the large-current range, and the errors in the small-current range are not weighted enough, which leads to a poor fit for small values. We applied a logarithmic operation to the training data to solve this problem, so that the errors in all regions are weighted equally. However, when V_ds is 0, I_ds is zero, for which the logarithm is undefined, so the points with V_ds of 0 were removed from the training set and the test set. At the same time, the voltage settings in the data preparation were adjusted, and the output characteristic curves were swept from 0.05 V to 1.0 V for V_ds. In the following discussion, I_ds at V_ds of 0 is taken to be 0 for convenience.
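The two preprocessing steps — min-max normalization of the inputs and the logarithm of the currents — can be sketched as follows (array names and shapes are illustrative):

```python
import numpy as np

def preprocess(X, I_ds):
    """Min-max normalize the input features and take log10 of the currents.

    X    : (n_samples, n_features) raw inputs (voltages, W, L, T, parameters)
    I_ds : (n_samples,) drain currents, all > 0 (V_ds = 0 points removed)
    """
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    X_norm = (X - x_min) / (x_max - x_min)   # each feature mapped into [0, 1]
    y = np.log10(I_ds)                       # evens out the 1e-12..1e-3 A span
    return X_norm, y, (x_min, x_max)         # keep min/max to transform test data

X = np.array([[0.05, 56e-9], [1.0, 1000e-9]])
I = np.array([1e-12, 1e-3])
Xn, y, scale = preprocess(X, I)
```

Note that the test data must be normalized with the training-set minima and maxima (returned as `scale`), not with their own.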

Training and Testing the MLP Neural Network
This MLP neural network leverages the backpropagation (BP) algorithm for training. The training process is divided into forward propagation and backpropagation. Forward propagation performs a series of linear and activation operations on the inputs, with the outputs calculated layer by layer from the input layer to the output layer. Backpropagation uses the output-layer error to update the weights and biases [26,27].
The NN employs the adaptive momentum estimation (Adam) algorithm to accelerate the convergence [28], and the initial learning rate of the Adam algorithm is set to 0.01. The mean square root of the relative error (RR) is defined as the performance metric:

RR = √( (1/n) Σ_{i=1}^{n} ( (I_i^nn − I_i^sim) / I_i^sim )² ), (3)

where I_i^sim denotes the current value of the SPICE simulation and I_i^nn represents the predicted value of the NN.
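Assuming RR is the root mean square of the pointwise relative errors, as the surrounding definitions suggest, it can be computed as:

```python
import numpy as np

def rr(i_sim, i_nn):
    """Mean square root of the relative error (RR) between SPICE
    currents i_sim and NN predictions i_nn (Equation (3))."""
    i_sim, i_nn = np.asarray(i_sim), np.asarray(i_nn)
    rel = (i_nn - i_sim) / i_sim       # pointwise relative error
    return np.sqrt(np.mean(rel ** 2))

err = rr([1e-6, 2e-6], [1.1e-6, 1.9e-6])  # about 0.079, i.e. ~7.9%
```

Because the error is relative, a 10% miss on a 1 pA subthreshold point counts the same as a 10% miss on a 1 mA on-state point.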
The IV curves can be divided into subthreshold and non-subthreshold regions according to the relationship between V_gs and the threshold voltage (V_th). The relationship between the current and voltage in the subthreshold region is exponential:

I_ds = µ_0 (W_eff / L_eff) √(q ε_si N_ch / (2 φ_s)) v_t² (1 − e^(−V_ds / v_t)) e^((V_gs − V_th − V_off) / (n v_t)), (4)

where µ_0 represents the mobility, W_eff is the effective channel width, L_eff is the effective channel length, q is the electronic charge, ε_si is the dielectric constant of silicon, N_ch is the substrate doping concentration, φ_s is the surface potential, v_t is the thermal voltage, V_off is the offset voltage, and n is the subthreshold swing parameter. The relationship between the current and voltage in the non-subthreshold region is polynomial:

I_ds = W_eff v_sat C_ox (V_gs − V_th − A_bulk V_dsat) (1 + (V_ds − V_dsat) / V_A) (1 + (V_ds − V_dsat) / V_ASCBE), (5)

where v_sat is the saturation velocity, C_ox is the gate oxide capacitance, A_bulk the body charge effect parameter, V_dsat the drain-source saturation voltage, V_A the Early voltage, and V_ASCBE is the Early voltage corresponding to the substrate current-induced body effect (SCBE) [29].

Among the common NN activation functions, sigmoid types are preferred, as they have smooth derivatives that meet the requirements for high-order derivatives in circuit simulation. Among the sigmoid-type functions, the commonly used ones are the log-sigmoid and tanh-sigmoid functions, which map (−∞, +∞) inputs to (0, 1) and (−1, +1), respectively:

logsig(x) = 1 / (1 + e^(−x)), (6)

tanhsig(x) = (e^x − e^(−x)) / (e^x + e^(−x)). (7)
Two activation functions were used to model the device separately. From Figure 3, we can observe that the log-sigmoid fits better than the tanh-sigmoid. Therefore, the activation function of this NN eventually uses the log-sigmoid. During the forward propagation of the NN, the log-sigmoid keeps the data between 0 and 1, and the data change smoothly without diverging, so the log-sigmoid obtains a higher model accuracy.
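The two candidate activations and the resulting forward pass can be sketched in NumPy (a minimal illustration with randomly initialized weights, not the trained network; the hidden-layer size is an arbitrary assumption, while the 134-dimensional input corresponds to the 6 operating-condition variables plus 128 model parameters):

```python
import numpy as np

def log_sigmoid(x):
    """Log-sigmoid: maps (-inf, +inf) into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh_sigmoid(x):
    """Tanh-sigmoid: maps (-inf, +inf) into (-1, +1)."""
    return np.tanh(x)

def mlp_forward(x, weights, biases, activation=log_sigmoid):
    """Forward pass of a fully connected MLP: the hidden layers apply
    the chosen sigmoid; the output layer stays linear."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = activation(x @ W + b)
    return x @ weights[-1] + biases[-1]

# one hidden layer of 64 units; 134 inputs = 6 operating variables + 128 parameters
rng = np.random.default_rng(0)
W = [rng.normal(size=(134, 64)), rng.normal(size=(64, 1))]
b = [np.zeros(64), np.zeros(1)]
y = mlp_forward(np.zeros(134), W, b)   # predicted log-current (untrained)
```

Swapping `activation=tanh_sigmoid` into the same forward pass is how the two candidates can be compared on identical data.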
Some criteria were used in this study to better estimate the trained NN models and avoid overfitting. Those criteria include the Akaike Information Criterion (AIC), the corrected Akaike Information Criterion (AICc), and the Bayesian Information Criterion (BIC).
The AIC method attempts to find the model that explains the data best with the smallest number of parameters. The preferred model is the model with the lowest AIC value [30]. Under the assumption that the model errors are normally distributed, the value of AIC can be obtained by the following equation:

AIC = 2k + n ln(RSS/n), (8)

where k is the number of parameters of the model to be estimated, n is the sample size, and RSS is the residual sum of squares of the estimated model:

RSS = Σ_{i=1}^{n} (I_i^nn − I_i^sim)². (9)

AICc is a correction of the AIC, which performs better when the sample size is smaller [31]:

AICc = AIC + 2k(k + 1) / (n − k − 1). (10)

The BIC penalizes parameters more heavily than the AIC. For any two estimated models, the model with the lower value of BIC is preferred [32]:

BIC = k ln(n) + n ln(RSS/n). (11)

The value of RR is obtained based on the relative error. Since there is a large magnitude difference between the currents of different regions, using the relative error shows the fit of the NN model more intuitively. The value of RR reflects the fitting quality of the IV curves, so when the RR values of the different NN models differ widely, we prefer the NN model with the smallest RR value. When the RR values of different NN models are close to each other, we select the optimal model by referring to the AIC, AICc, and BIC values.
The AIC, AICc, and BIC values were calculated for each NN model, and the results are shown in Table 2. RE-10% in Table 2 represents the proportion of data with a relative error (RE) greater than 10% in the fitting data of the NN model. NN1, NN2, and NN3 are the optimal models corresponding to the AIC, AICc, and BIC, respectively. NN1, NN4, and NN5 are the NN models with the smallest RR values. Although NN2 and NN3 achieved optimality under the AICc and BIC criteria, respectively, their RR values were significantly higher than that of the NN1 model. The equations for AIC and BIC can be divided into two terms: the first is the complexity penalty of the model (2k for AIC, k·ln(n) for BIC), and the second is the accuracy term (n·ln(RSS/n)). The first term is mainly used to avoid overfitting, while the second term considers the fit of the model itself. The first term is a penalty on the second, and the model with the fewest parameters is selected under the condition of satisfying model validity and reliability, so complexity is considered only after high accuracy is satisfied. The difference in the accuracy term between NN models is the difference in RSS. The RSS values are 3.7910 × 10⁻⁵, 5.2934 × 10⁻⁵, and 7.5426 × 10⁻⁵ for the NN1, NN2, and NN3 models, respectively. There is a significant difference in the accuracy of the three NN models, and the accuracy of the NN1 model is higher than that of the other two. Therefore, the NN2 and NN3 models were not selected as the optimal model.
The RR values for the NN1, NN4, and NN5 models were close, so we further evaluated them using the AIC, AICc, and BIC criteria. The AIC, AICc, and BIC values for NN5 are larger than those of NN1 and NN4, while those for NN1 and NN4 are close to each other. The RSS values are 3.7910 × 10⁻⁵ and 4.4666 × 10⁻⁵ for the NN1 and NN4 models, respectively. Therefore, we eventually chose the NN1 model for further work. According to the RE-10% column in Table 2, when the NN1 model fits the data, the RE of nearly 98% of the data is within 10%. It took 1.7 h to train the NN1 model, with a training error of 4.14% and a testing error of 5.38%.

Error and Performance Analysis
The mean value µ of the RE for the training data is 2.81%, and the standard deviation σ of the RE is 3.04%. The training data with an RE greater than µ + 3σ accounted for 0.89% of all training data. We performed an analysis on the training data with an RE greater than µ + 3σ and counted the distributions of temperature, gate width, and gate length. The statistical results are shown in Figures 4-6, respectively.
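The µ + 3σ screening of the relative errors can be sketched as follows (the RE array here is synthetic; the real values come from the trained model):

```python
import numpy as np

def outlier_mask(re, k=3.0):
    """Flag samples whose relative error exceeds mu + k*sigma."""
    mu, sigma = re.mean(), re.std()
    return re > mu + k * sigma

# synthetic RE values standing in for the training-data relative errors
rng = np.random.default_rng(0)
re = np.abs(rng.normal(0.028, 0.03, size=100_000))
mask = outlier_mask(re)
frac = mask.mean()   # fraction of the data beyond mu + 3*sigma
```

The flagged subset (`mask`) can then be binned by temperature, gate width, and gate length with `np.histogram` to produce distributions like those in Figures 4-6.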
According to Figure 4, it can be seen that for the training data with an RE greater than µ + 3σ, the proportions of the different bars are not very different. The proportion of the first bar in Figure 5 is lower than that of the other bars because the gate width range of this bar is smaller; the overall distribution of the gate width is uniform when the gate width range is considered. From Figure 6, it can be observed that the proportion of data with a small gate length is very large, and the proportion generally decreases as the gate length increases. This indicates that the NN model fits the data with a small gate length less well. Due to the short-channel effect, more complex device physics is involved around the minimum gate length [10], so it is reasonable that the error is slightly higher at smaller gate lengths.

The mean value µ of the RE for the test data is 3.89%, and the standard deviation σ of the RE is 3.71%. The test data with an RE greater than µ + 3σ accounted for 1.50% of all test data. The distributions of temperature, gate width, and gate length for the test data with an RE greater than µ + 3σ are shown in Figures 7-9, respectively.
According to Figures 7 and 8, it can be seen that for temperature and gate width, the proportions of the different bars are not very different. According to Figure 9, it can be seen that the proportion of data with a small gate length is higher. The NN model fits the training data for small gate lengths less well and, therefore, also fits the test data for small gate lengths less well.
A statistical analysis of the IV curves for the training data was performed. It was found that for the I ds -V gs curves, the error of V gs from 0 to 0.2 V was slightly higher than the other regions of the curves. It is basically in the subthreshold region when V gs is between 0 and 0.2 V. According to Equation (4), it is known that the relationship between current and voltage in the subthreshold region is exponential. The exponential relation has infinite order derivatives, and its Taylor series expansion is composed of infinite polynomials. According to Equation (5), the relationship between current and voltage in the non-subthreshold region is polynomial. When using the same NN model for subthreshold and non-subthreshold regions, the finite polynomial for the non-subthreshold region is easier to fit and, therefore, its modeling accuracy is higher. The errors were counted separately for the subthreshold and non-subthreshold data in the training data using Equation (3), with 4.77% for the subthreshold region and 3.83% for the non-subthreshold region. At the same time, the errors were counted separately for the subthreshold and non-subthreshold data in the test data using Equation (3), with 6.89% for the subthreshold region and 4.79% for the non-subthreshold region.
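Splitting the data at V_th and computing the Equation (3) error per region can be sketched as follows (all arrays are illustrative toy values):

```python
import numpy as np

def region_errors(v_gs, v_th, i_sim, i_nn):
    """Equation (3) error computed separately for the subthreshold
    (V_gs < V_th) and non-subthreshold regions."""
    rel = (i_nn - i_sim) / i_sim
    sub = v_gs < v_th                                # region mask
    region_rr = lambda m: np.sqrt(np.mean(rel[m] ** 2))
    return region_rr(sub), region_rr(~sub)

v_gs  = np.array([0.05, 0.1, 0.5, 1.0])
v_th  = 0.2
i_sim = np.array([1e-10, 5e-10, 1e-5, 1e-4])         # toy SPICE currents
i_nn  = i_sim * np.array([1.05, 0.95, 1.02, 0.99])   # toy NN predictions
rr_sub, rr_nonsub = region_errors(v_gs, v_th, i_sim, i_nn)
```

In the toy data above, the subthreshold points carry 5% errors and the non-subthreshold points smaller ones, mirroring the pattern reported for the trained model.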
Figure 7. Temperature distribution of the test data with an RE greater than µ + 3σ.
Figure 8. Gate width distribution of the test data with an RE greater than µ + 3σ.
Figure 9. Gate length distribution of the test data with an RE greater than µ + 3σ.
Multiple sets of IV curve data from a device need to be collected to extract different model parameters [27]. Thus, the NN model needs to fit multiple sets of IV curves from the device well. Ten randomly selected sets of inputs are shown in Table 3, and the IV curve fits for the first three sets of inputs are shown in Figures 10-12, where the curves indicate the IV curves from the SPICE simulation, and the dots indicate the predicted values of the NN. The fitting errors of the ten sets of inputs are shown in Table 4, where RR-all represents the error of the whole region, and RR-sub and RR-nonsub represent the errors in the subthreshold and non-subthreshold regions, respectively. The results show that the overall fit is good, demonstrating that the NN model achieves accurate modeling of the IV curves for the BSIM-SOI model.
The time comparison between the NN model prediction and the SPICE simulation is shown in Table 5, comparing the time to generate 5000, 10,000, 15,000, and 20,000 sets of IV curves. The results show that the trained NN model is thousands of times faster than SPICE.

Device Model Parameter Extraction
The trained NN model is used for parameter extraction. The measured results include data for various device sizes: gate lengths of 56 nm, 100 nm, 200 nm, 300 nm, 400 nm, and 500 nm, and gate widths of 654 nm, 700 nm, 800 nm, 900 nm, and 1000 nm. The devices were measured at different operating temperatures. We used Equation (3) to calculate the error between the measured results and the simulation results obtained with the extracted model parameters. The errors of the five sets of measured data are 4.88%, 5.71%, 5.62%, 5.78%, and 5.18%. Figures 13 and 14 show part of the fitting curves for the first two sets of measured data, where the curves indicate the measured results, and the dots indicate the simulation results with the extracted model parameters. The results show a good fit.

We used SPICE and the NN model to extract parameters from the measured data of a single device. The extraction error of SPICE was 3.27%, and it took 4.4 h; the extraction error of the NN model was 4.28%, and it took 8.6 s. Both the SPICE and the NN extractions ran on the same CPU.
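The extraction loop of Figure 1, with the trained NN standing in for SPICE as the forward model, can be sketched as follows. Everything here is a hypothetical stand-in: `nn_model` is a simple closed-form surrogate with two made-up parameters (`vth`, `gain`) so the example runs, and the optimizer is a basic accept-if-better random search rather than the paper's actual optimization engine.

```python
import numpy as np

rng = np.random.default_rng(1)
vgs = np.linspace(0.0, 1.0, 51)

def nn_model(params, vgs):
    """Stand-in for the trained NN: maps (model parameters, bias)
    to drain current. Two illustrative parameters: vth and gain."""
    vth, gain = params
    return gain * np.maximum(vgs - vth, 0.0) ** 2

# "Measured" target curve, generated from known true parameters.
i_meas = nn_model(np.array([0.3, 1e-4]), vgs)

def fit_error(params):
    """Relative fitting error (assumed form of Equation (3))."""
    i_sim = nn_model(params, vgs)
    return np.abs(i_sim - i_meas).sum() / (np.abs(i_meas).sum() + 1e-30)

# Iterative extraction: perturb the parameters, keep improvements,
# and stop when the budget is spent (or an error target is reached).
best = np.array([0.5, 5e-5])
best_err = fit_error(best)
for _ in range(5000):
    trial = best * (1 + 0.05 * rng.standard_normal(2))
    err = fit_error(trial)
    if err < best_err:
        best, best_err = trial, err
```

Because each `fit_error` call is only a cheap NN evaluation rather than a full SPICE run, the optimizer can afford thousands of iterations in seconds, which is where the reported 4.4 h versus 8.6 s gap comes from.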

Conclusions
In this paper, a novel device modeling technique based on NNs for the optimal extraction of device model parameters is proposed and verified with the BSIM-SOI model. This technique does not require developers to manually implement physics-based analytical models, which greatly accelerates the development of parameter extraction software: programming the physics-based analytical models in the model parameter extraction software takes about six months, while developing the NN model takes less than one week.
The MLP neural network used in this paper achieved an overall training error of 4.14%, with 3.83% in the non-subthreshold region and 4.77% in the subthreshold region. The NN model also shows good generalization ability: the overall test error over more than four thousand IV curves is 5.38%, with 6.89% in the subthreshold region and 4.79% in the non-subthreshold region. The trained NN model is used for parameter extraction, with an extraction error of less than 6%. Compared with the embedded SPICE, the extraction speed of the NN model is thousands of times faster.