A Time Delay Neural Network Based Technique for Nonlinear Microwave Device Modeling

This paper presents a nonlinear microwave device modeling technique based on the time delay neural network (TDNN). The proposed technique models nonlinear microwave devices more accurately than the static neural network modeling method. A new formulation is developed that allows the proposed TDNN model to be trained with DC, small-signal, and large-signal data, which enhances the generalization of the device model. An algorithm is formulated to train the proposed TDNN model efficiently. The proposed technique is verified with GaAs metal-semiconductor field-effect transistor (MESFET) and GaAs high-electron-mobility transistor (HEMT) examples. These two examples demonstrate that the proposed TDNN is an efficient and valid approach for modeling various types of nonlinear microwave devices.


Introduction
Artificial neural networks (ANNs) are a recognized tool for modeling and design optimization in RF and microwave computer-aided design (CAD) [1][2][3][4][5][6][7][8][9]. The technique has been successfully used in parametric modeling of microwave components [10][11][12], electromagnetic (EM) optimization [13,14], parasitic modeling [15], nonlinear device modeling [16][17][18], nonlinear microwave circuit optimization [19][20][21][22], power amplifier modeling [23][24][25], and more. This paper addresses the nonlinear device modeling area. Nonlinear device modeling is an important area of CAD, and a variety of device models have been built. With the rapid development of the semiconductor industry, new devices constantly evolve, and existing models may not be accurate for them. Therefore, there is an ongoing need for new models. The challenge for CAD researchers is not only to develop new models, but also to introduce new CAD methods.
Traditionally, the equivalent-circuit approach has been a vital technique for nonlinear device modeling. Existing equivalent-circuit models need to be modified to fit different devices: the circuit parameters require repetitive changes, and the parameters are sometimes mutually contradictory. In particular, when a new device appears, building a nonlinear model based on the equivalent-circuit technique is time consuming. As an alternative, we use a time delay neural network, whose output is computed from the present input and its delayed copies, o(t) = f_ANN(u(t), u(t − τ), ..., u(t − N_d·τ)), where τ is a time delay parameter and N_d represents the total number of delay steps.
Suppose that the TDNN model contains one input and one output, and that f_ANN is a three-layer multilayer perceptron (MLP). Figure 1 shows the TDNN structure; compared with the plain MLP model, it contains external delay buffers. Once the neural network is well trained with device data, the TDNN becomes a good device model. DC, S-parameter, and harmonic data for nonlinear device modeling can usually be obtained from measurement or simulation. Therefore, we propose an analytical formulation of the TDNN for nonlinear device modeling using DC, bias-dependent S-parameter, and large-signal harmonic balance (HB) data. In this paper, the f_ANN of the TDNN is a three-layer MLP: the first layer is the input relay layer, the second layer is the hidden layer, and the third layer is the output layer. The sigmoid function is used as the activation function in the hidden layer.
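The structure just described can be sketched in a few lines of NumPy. This is an illustrative single-input single-output forward pass; all weight names, sizes, and values are assumptions for demonstration, not the trained model from this paper:

```python
import numpy as np

def tdnn_output(u_history, W_h, b_h, w_o, b_o):
    """Forward pass of a TDNN: a three-layer MLP applied to the present
    input and its delayed copies, u_history = [u(t), u(t-tau), ..., u(t-Nd*tau)].
    Layer 1 is the input relay layer, layer 2 the sigmoid hidden layer,
    layer 3 the linear output layer."""
    x = np.asarray(u_history, dtype=float)            # input relay layer
    h = 1.0 / (1.0 + np.exp(-(W_h @ x + b_h)))        # sigmoid hidden layer
    return float(w_o @ h + b_o)                       # output layer

# Example: N_d = 2 delay buffers -> 3 inputs, 4 hidden neurons (random weights)
rng = np.random.default_rng(0)
W_h = rng.standard_normal((4, 3)); b_h = rng.standard_normal(4)
w_o = rng.standard_normal(4); b_o = 0.0
o = tdnn_output([0.5, 0.4, 0.3], W_h, b_h, w_o, b_o)
```

The only difference from a static MLP is that the input vector carries N_d + 1 time samples instead of one, which is what gives the model its history information.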
Let U represent the DC input signal and O the DC output signal. Under DC conditions, all delayed input signals are equal to U, so the DC output of the TDNN is O = f_ANN(U, U, ..., U). Let Y represent the small-signal transfer function of the system; in the transistor example, the matrix Y represents the Y-parameters. Let U_bias denote the DC bias of u. The small-signal S-parameters are derived through the Y-parameters of the TDNN model, as shown in Equation (3). In Equation (3), the derivative of f_ANN can be obtained using the adjoint neural network method [27], and k represents the index of the delay buffers. The Y matrix, defined as the sum of products of e^(−jωkτ) and ∂f_ANN/∂u in (3), is frequency dependent because of the delayed signals inside f_ANN. Hence, the proposed TDNN model is a non-quasi-static (NQS) model [28][29][30][31] when N_d > 0. In Equation (3), jω = j2πf, where f represents frequency.
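The frequency dependence of the small-signal Y-parameters can be illustrated numerically. The sketch below estimates ∂f_ANN/∂u by central finite differences rather than the adjoint neural network method of [27], and the toy f_ANN is an assumption, not a trained model:

```python
import numpy as np

def y_parameter(f_ann, u_bias, n_d, tau, freq, eps=1e-6):
    """Small-signal transfer function of a TDNN at a DC bias:
    Y(omega) = sum over k of exp(-j*omega*k*tau) * d f_ANN / d u_k,
    evaluated with all delayed inputs held at the DC bias.
    Derivatives use central finite differences (a simplification;
    the paper uses the adjoint neural network method)."""
    omega = 2.0 * np.pi * freq
    u0 = np.full(n_d + 1, u_bias)       # DC: every delayed input equals U_bias
    y = 0.0 + 0.0j
    for k in range(n_d + 1):
        up, um = u0.copy(), u0.copy()
        up[k] += eps; um[k] -= eps
        dfdu_k = (f_ann(up) - f_ann(um)) / (2.0 * eps)
        y += np.exp(-1j * omega * k * tau) * dfdu_k
    return y

f = lambda u: np.tanh(u).sum()          # toy stand-in for a trained f_ANN
y = y_parameter(f, u_bias=0.2, n_d=2, tau=0.0045e-9, freq=2e9)
```

At freq = 0 the delay terms all reduce to 1 and Y becomes the DC conductance, which is what makes the model quasi-static only in the N_d = 0 limit.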
In the large-signal case, let the generic harmonic frequency be ω_k, where the subscript k = 0, 1, 2, ..., N_H indexes the harmonic frequencies and N_H is the number of harmonics considered in the HB simulation. N_T represents the number of time points. Let W_N(n, k) denote the Fourier coefficient for the nth time sample and the kth harmonic frequency, where n = 1, 2, ..., N_T and k = 1, 2, ..., N_H. Let the superscript * represent the complex conjugate. Let U(ω_k) and O(ω_k) be the input and output signals in the frequency domain, respectively. Given the inputs U(ω_k) for all k, u(t_n − Kτ) can be computed from Equation (4), where K = 0, 1, 2, ..., N_d. The outputs O(ω_k) are then computed as in Equation (5). The frequency-domain delay functions e^(−jω_k τ), e^(−jω_k 2τ), ..., e^(−jω_k N_d τ) are thereby introduced into the training equations. By training the TDNN model with DC, S-parameter, and HB data, the proposed technique can accurately model the nonlinear behavior of the device.
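The large-signal flow of Equations (4) and (5) can be sketched as follows. The one-sided spectrum convention and the sampling over a single fundamental period are simplifying assumptions of this illustration, not the paper's exact formulation:

```python
import numpy as np

def hb_response(f_ann, U, omega, n_t, tau, n_d):
    """Large-signal sketch: reconstruct the delayed time waveforms
    u(t_n - K*tau) from the one-sided harmonic spectrum U[k] at
    frequencies omega[k] (omega[0] = 0 is DC, omega[1] the fundamental),
    evaluate f_ANN at every time sample, then transform the output
    back to Fourier coefficients O[k] in the same convention."""
    T = 2 * np.pi / omega[1]                    # one fundamental period
    t = np.arange(n_t) * T / n_t
    o = np.empty(n_t)
    for n, tn in enumerate(t):
        # u(tn - K*tau) for K = 0..N_d, real signal from one-sided spectrum
        u_delayed = np.array([
            U[0].real + 2 * np.sum((U[1:] * np.exp(1j * omega[1:] * (tn - K * tau))).real)
            for K in range(n_d + 1)])
        o[n] = f_ann(u_delayed)
    # output harmonic coefficients, Equation-(5)-style DFT over the period
    return np.array([np.mean(o * np.exp(-1j * wk * t)) for wk in omega])

# Illustrative check: an f_ANN that passes the present sample through
w1 = 2 * np.pi * 1e9
O = hb_response(lambda u: u[0], np.array([0.1, 0.2, 0.0], dtype=complex),
                np.array([0.0, w1, 2 * w1]), n_t=16, tau=0.0, n_d=1)
```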
We have systematically described above the TDNN model equations used in DC, small-signal, and large-signal simulation. Because of the universal approximation capability of neural networks [1], such a TDNN model can achieve satisfactory accuracy.

An Algorithm for Training the Proposed TDNN Model
The proposed TDNN model becomes valid once the neural network is well trained with the DC, S-parameter, and HB data of the nonlinear device. The training error is formulated as E_Tr = α·E_DC + β·E_S + γ·E_HB, where E_Tr represents the total training error, E_DC represents the error between the DC responses of the proposed TDNN model and the DC device data, E_S represents the error between the small-signal responses of the proposed TDNN model and the small-signal device data, and E_HB represents the error between the large-signal responses of the proposed TDNN model and the large-signal device data. α, β, and γ represent the weighting factors for the DC error E_DC, the small-signal error E_S, and the large-signal error E_HB, respectively. The weighting factors α, β, and γ can be roughly determined from the value ranges of the training data and the numbers of DC, small-signal, and large-signal harmonic data. The subscript k represents the kth training sample of the DC, bias-dependent S-parameter, and HB data, respectively, and T represents the corresponding training sets. We use the real and imaginary parts of the HB data for training in the proposed TDNN technique.
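As a minimal sketch, assuming a sum-of-squares norm for each error term (the exact norm is not fixed in this passage), the weighted total training error can be computed as:

```python
import numpy as np

def total_training_error(model_dc, data_dc, model_s, data_s, model_hb, data_hb,
                         alpha, beta, gamma):
    """E_Tr = alpha*E_DC + beta*E_S + gamma*E_HB, with each term a sum of
    squared differences between model responses and device data over the
    corresponding training set. S-parameter and HB data may be complex,
    so those terms use the squared magnitude of the difference."""
    e_dc = np.sum((np.asarray(model_dc) - np.asarray(data_dc)) ** 2)
    e_s = np.sum(np.abs(np.asarray(model_s) - np.asarray(data_s)) ** 2)
    e_hb = np.sum(np.abs(np.asarray(model_hb) - np.asarray(data_hb)) ** 2)
    return alpha * e_dc + beta * e_s + gamma * e_hb
```

Setting gamma to zero recovers the first training stage (DC plus small-signal only) described below.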
The first step in developing the proposed TDNN model is to generate the DC, small-signal, and large-signal device data used for training and testing. The range of the training data should cover the range of the test data. After data preparation, we determine the structure of the proposed TDNN model, including the number of delay buffers and the number of hidden neurons. Training then begins. Initially, the number of delay buffers can be started from N_d = 1, and the number of hidden neurons from a small value. We first set α and β to constants roughly decided by the value ranges and amounts of the DC and small-signal data, and set γ = 0. The proposed TDNN model is trained with the DC and small-signal data by adjusting the neural network weights according to the error back-propagation algorithm. After this first training stage (which may require hundreds or thousands of iterations, depending on the practical problem), α, β, and γ are all set to constants, and the proposed TDNN model is trained with the combined DC, small-signal S-parameter, and large-signal harmonic data. After this stage, the training error is calculated; when it is less than the user-defined error criterion E_t, training stops. After the overall training, a separate set of DC, small-signal, and large-signal data, called test data, which is never used in training, is used to assess the quality of the proposed TDNN model. The test error E_Te is defined as the error between the model responses and the test data. If the test error is also lower than the threshold error E_t, the model development process terminates and the proposed TDNN model is ready to be used for high-level design.
Otherwise, the overall model development process returns to the previous step and is repeated with different numbers of hidden neurons or delay buffers. Figure 2 shows the flowchart illustrating the overall development process of the proposed TDNN model.
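The overall development process can be outlined as a structure-search loop. Here `train_and_eval` is a hypothetical callback standing in for the two-stage training (γ = 0 first, then all three weights) and returning the training and test errors:

```python
import itertools

def develop_tdnn_model(train_and_eval, e_t, hidden_options=(10, 20, 30, 40),
                       delay_options=(1, 2, 3, 4, 5, 6, 7)):
    """Structure search over the TDNN: try increasing numbers of delay
    buffers N_d and hidden neurons until both the training error E_Tr and
    the test error E_Te fall below the user-defined criterion E_t.
    `train_and_eval(n_d, n_hidden)` is a placeholder for the two-stage
    training and must return (e_tr, e_te)."""
    for n_d, n_hidden in itertools.product(delay_options, hidden_options):
        e_tr, e_te = train_and_eval(n_d, n_hidden)
        if e_tr < e_t and e_te < e_t:
            return n_d, n_hidden      # model is ready for high-level design
    return None                       # no tried structure met the criterion
```

The search order here (small structures first) follows the recommendation above to begin with N_d = 1 and few hidden neurons, growing the model only when the error criterion is not met.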


GaAs Metal-Semiconductor-Field-Effect Transistor (MESFET)
In this example, the TDNN method is used to model a Keysight Advanced Design System (ADS) [32] internal GaAs MESFET device [33] with the Statz model. Table 1 shows the parameters of the Statz model in ADS. We perform DC, small-signal, and large-signal training together in NeuroModelerPlus [34]. The training data include DC data at different DC points, S-parameter data at different biases, and large-signal harmonic data generated at different fundamental frequencies (1 to 6 GHz), input power levels (−5 to 7 dBm), and loads (40 to 60 Ω), as seen in Table 2. The training and test data sets are not randomly divided, as shown in Table 2. The training data comprise DC data at 162 different DC points, bias-dependent S-parameter data at 120 different biases, and harmonic data at a total of 936 combinations of input power, fundamental frequency, and load. The test data comprise DC data at 130 different DC points, bias-dependent S-parameter data at 95 different biases, and harmonic data at a total of 120 combinations of input power, fundamental frequency, and load. All of the training data were generated in ADS by performing DC, S-parameter, and harmonic balance simulations to obtain the DC, S-parameter, and harmonic data, respectively. The ranges of V_g and V_d in the DC case cover their ranges in the small-signal S-parameter and harmonic cases. The frequency range of the S-parameter data covers the frequency range of the harmonic data, which is determined by the fundamental frequency and the number of harmonics considered in the harmonic modeling process. The range of the test data is within the range of the training data. In this example, we choose the time delay parameter of the TDNN as 0.0045 ns. We perform the training for the proposed TDNN technique according to the algorithm described above. The proposed TDNN model is built after nearly 3000 iterations of DC and small-signal training followed by 300 iterations of combined DC, small-signal, and large-signal training.
It takes roughly 1.5 h on an Intel Core i9-9900 CPU at 3.60 GHz. After training, we compare the accuracy of the proposed TDNN model under the different training conditions shown in Table 3.

For comparison purposes, we also developed a static model using the MLP technique for this GaAs MESFET example. The MLP is a feedforward neural network. The inputs of both the MLP and the TDNN are V_g and V_d of the transistor, and the outputs of both are I_g and I_d of the transistor. For a fair comparison, we use a three-layer MLP both for the MLP technique and for the f_ANN of the TDNN technique; the activation functions are both the sigmoid function, the numbers of hidden neurons are the same, and the learning algorithm used in this paper is the quasi-Newton method. Table 3 compares the results of the MLP model and the proposed TDNN model. In the case of combined DC, multi-bias S-parameter, and HB training, the TDNN approach has accuracy advantages over the static modeling technique, as seen in Table 3. This is because nonlinear devices usually contain dynamic effects, which the static modeling technique (MLP) cannot adequately capture. Whereas the MLP contains only the present information, the proposed TDNN includes both the present and the history information, which is necessary for nonlinear device modeling, especially when the device contains dynamic effects. As the number of delay buffers increases, the error of the proposed TDNN model with respect to the device data decreases rapidly. We choose the condition (N_d = 4, training error = 2.38%, test error = 1.88%) in Table 3 to present the results of the proposed TDNN model.


GaAs High-Electron Mobility Transistor (HEMT)
In this example, the proposed TDNN method is used to model a GaAs HEMT device. Training and test data were generated from a five-layer GaAs-AlGaAs-InGaAs HEMT example in the physics-based device simulator Medici [35]. The structure of the HEMT [36] used to set up the physics-based simulator is shown in Figure 5. Table 4 shows the parameters of the HEMT device. We performed DC, small-signal, and large-signal training of the proposed TDNN according to the algorithm described above, using NeuroModelerPlus [34]. The training data include DC data at different DC points, S-parameter data at different biases, and large-signal harmonic data generated at different fundamental frequencies (2 to 5 GHz) and input power levels (−20 to 10 dBm), as seen in Table 5. The static bias is chosen as V_g = 0.2 V and V_d = 5 V. The training and test data sets are not randomly divided, as shown in Table 5. The training data comprise DC data at 378 different DC points, bias-dependent S-parameter data at 138 different biases, and harmonic data at a total of 44 combinations of input power, fundamental frequency, and load. The test data comprise DC data at 310 different DC points, bias-dependent S-parameter data at 110 different biases, and harmonic data at a total of 33 combinations of input power, fundamental frequency, and load. All of the training data were generated in Medici by performing DC, S-parameter, and harmonic balance simulations to obtain the DC, S-parameter, and harmonic data, respectively. The ranges of V_g and V_d in the DC case cover their ranges in the small-signal S-parameter and harmonic cases. The frequency range of the S-parameter data covers the frequency range of the harmonic data, which is determined by the fundamental frequency and the number of harmonics considered in the harmonic modeling process. The range of the test data is within the range of the training data. In this example, we choose the time delay parameter of the TDNN as 0.005 ns.
After nearly 2000-3000 iterations of DC and small-signal training and 300 iterations of combined DC, small-signal, and large-signal training, the proposed TDNN model is built. It takes roughly 1.5-2 h on an Intel Core i9-9900 CPU at 3.60 GHz. After training, we compare the accuracy of the proposed TDNN model under the different training conditions shown in Table 6. Table 5. Training data and test data for GaAs HEMT.

For comparison purposes, we have also developed an MLP model for this GaAs HEMT example. The inputs of both the MLP and the TDNN are V_g and V_d of the transistor, and the outputs of both are I_g and I_d of the transistor. For a fair comparison, we use a three-layer MLP both for the MLP technique and for the f_ANN of the TDNN technique; the activation functions are both the sigmoid function, the numbers of hidden neurons are the same, and the learning algorithm used in this paper is the quasi-Newton method. In the complicated case of combined DC, multi-bias S-parameter, and HB training, the TDNN model has a significant accuracy advantage over the MLP model, as seen in Table 6. In this table, the error of the TDNN model with respect to the test data decreases as the number of delay buffers increases. Comparing 30 and 40 hidden neurons, the accuracy improves only slowly as the number of hidden neurons increases. We choose the condition (N_d = 7, 40 hidden neurons, training error = 1.15%, test error = 1.9%) in Table 6 to present the results of our proposed TDNN model. The DC, S-parameter, and HB responses of the proposed TDNN model are shown in Figures 6 and 7. In the final proposed TDNN model for this GaAs HEMT example, the number of hidden neurons is 40, the time delay parameter of the TDNN is 0.005 ns, and the number of delay buffers is 7. From these figures, we can see that the proposed TDNN technique can accurately model the GaAs HEMT example.

Conclusions
In this paper, we have proposed a TDNN-based technique for nonlinear microwave device modeling. We have presented a set of new formulations for training with DC, small-signal, and large-signal data, together with an algorithm for developing the proposed TDNN model. The GaAs MESFET and GaAs HEMT examples have successfully demonstrated that the TDNN-based technique can accurately build nonlinear microwave device models. Several future directions are promising: validating the models against measured data, incorporating thermal and trapping effects into the proposed TDNN, comparing the proposed TDNN with conventional device modeling methods, extending the technique to other semiconductor technologies such as Si- and GaN-based FETs, and applying the proposed technique to the modeling and design of microwave absorbers.
