Machine Learning Algorithm for Efficient Design of Separated Buffer Super-Junction IGBT

An improved structure for an Insulated Gate Bipolar Transistor (IGBT) with a separated buffer layer is presented in order to improve the trade-off between the turn-off loss (Eoff) and on-state voltage (Von). However, the additional buffer doping concentration variable makes it difficult to set efficient parameters. Therefore, a machine learning (ML) algorithm is proposed as a solution. Compared to the conventional Technology Computer-Aided Design (TCAD) simulation tool, incorporating the ML algorithm into the device analysis is demonstrated to achieve high accuracy and significantly shorten the simulation time. Specifically, the ML algorithm achieved coefficients of determination (R²) of 0.995 and 0.968 for Von and Eoff, respectively. In addition, it enables the optimized design to fit the target characteristics. In this study, the structure proposed for the trade-off improvement was targeted to obtain the minimum Eoff at the same Von, especially by adjusting the concentration of the separated buffer. Optimizing the structure with the ML approach improved Eoff by 36.2%, compared with the 24.7% improvement expected for the structure before optimization. Alternatively, it is possible to inversely design four types of structures with characteristics close to the target characteristics (Eoff = 1.64 μJ, Von = 1.38 V). The proposed method of incorporating machine learning into device analysis is expected to be very strategic, especially for power electronics analysis (where the transistor size is comparatively large and requires significant computation). In summary, we improved the trade-off using a separated buffer, and ML enabled optimization and a more precise design, as well as reverse engineering.


Introduction
Recently, along with the emerging self-driving car market, there has been a growing need for electric vehicles with high power conversion efficiency. In response to this industrial demand, steadily developing power semiconductors have been utilized for electric vehicles, especially in converters of uninterruptible power supply (UPS) [1][2][3]. Among power semiconductors, the Insulated Gate Bipolar Transistor (IGBT) [4][5][6] is a power switching device that basically operates like a BJT, shows a low forward voltage drop, and can withstand a high breakdown voltage (BV) owing to the low concentration of the n-drift region [7][8][9][10]. For this reason, IGBTs are widely used as power semiconductors, and super-junction IGBTs (SJBTs) have an especially high BV caused by fully depleted pillars [11]. Despite these advantages, the holes injected into the p-pillar become a problem in the turn-off state in that they increase the turn-off loss (Eoff) [12,13]. As soon as the device is turned off, the holes in the p-pillar cannot be extracted immediately, causing loss. To solve this problem, a dual-gate SJ-IGBT (DG-IGBT) for unipolar turn-off [14] and an IGBT with a depletion trench (DT-IGBT) for active electron extraction [15] have been proposed. However, these devices have a trade-off between Von and Eoff. In a previous report, we proposed an SJBT with separated n-buffer layers to solve the relatively long time required for carrier annihilation during turn-off. However, the numerous combinations of possible buffer concentrations make it inefficient to simulate by setting arbitrary concentrations. Moreover, optimization takes longer as the structure becomes more complex and more models are used [16].
In this study, we investigated a separated buffer SJBT (SB-SJBT) for controlling holes in pillars. The SB-SJBT contains buffers with different concentrations, which improves the trade-off by reducing Eoff at the same Von. In general, increasing the concentration of the p-side n-buffer (p-buffer) decreases Eoff but adversely affects BV and Von; therefore, it is necessary to compensate for Von with an appropriate concentration combination by lowering the concentration of the n-side n-buffer (n-buffer). Thus, we propose a new approach using a machine learning (ML) algorithm. This approach rapidly and reliably predicts and optimizes the device characteristics [17]. Additionally, it is useful for controlling variables. For instance, when a designer tries to increase the BV of a device, it makes clear which of the parameters, such as device length and drift region concentration, should be adjusted. The remainder of this paper is organized as follows: The SJBT structure and characteristics and the ML algorithm are explained in Section II. Section III shows the verification of the algorithm for its reliability and its application for improving the performance and reverse engineering the parameters of the device with the targeted characteristics.

Proposed Structure and Characteristics
The structures of the conventional SJBT (C-SJBT) and SB-SJBT are illustrated in Figure 1. To identify the difference in electrical characteristics between the C-SJBT and SB-SJBT, we simulated the devices using the Synopsys Sentaurus TCAD tool. In the simulation, we designed structures using the parameters listed in Table 1. The physical dimensions of the SB-SJBT were the same as those of the C-SJBT; the only difference between the two is the adoption of a separated n-buffer layer. The separated buffer layer was composed of p-side and n-side n-buffer layers in contact with the p-pillar and n-pillar, respectively. Because of its high n-doping concentration, the p-buffer layer in the SB-SJBT acts as a barrier that suppresses hole injection from the p-collector layer. Figure 2 also shows the difference in hole injection between the SB-SJBT and C-SJBT. Under the same conditions, except for the concentration of the n-buffer, fewer holes enter through the highly doped buffer. The maximum minority carrier lifetimes for the hole and electron (tp, tn) were set to 1 × 10⁻⁵ s and 1.5 × 10⁻⁵ s, the default values of TCAD [18].
The p-buffer layer, which has a higher doping concentration than the n-buffer layer, increases the recombination rate of holes and reduces the number of holes reaching the p-pillar. As a result, the absolute number of carriers present in the p-pillar is reduced, which results in a lower Eoff than that of the conventional SJBT. Figure 3 shows the electrical characteristic curves. A gate voltage that changes from 0 V → 15 V → −15 V over a short time is applied to the C-SJBT and SB-SJBT (Figure 1c). The waveforms of the collector current (Figure 3a) and collector voltage (Figure 3b) show that the turn-off speed of the proposed SB-SJBT is faster than that of the C-SJBT. Figure 3d shows the distribution of holes in the device over time as the holes are extracted; the indicated times t0 to t5 are 0.7 × 10⁻⁶ s, 0.9 × 10⁻⁶ s, 1.2 × 10⁻⁶ s, 1.4 × 10⁻⁶ s, 1.8 × 10⁻⁶ s, and 2.0 × 10⁻⁶ s, respectively. At turn-off (t0), both devices have almost the same hole distribution, but from t1 to t5, the lines with symbols (SB-SJBT) typically show a lower hole distribution than the solid lines (C-SJBT). This means that the SB-SJBT extracts the holes in the p-pillar faster than the C-SJBT over the same period of time. Therefore, the proposed structure reduces Eoff and improves the switching characteristics.
To quantitatively measure the degree of improvement in Eoff, the Von of the C-SJBT and that of the SB-SJBT were set to similar values. The appropriate buffer concentrations for similar Von are 9 × 10¹⁶ cm⁻³ for the C-SJBT, 3 × 10¹⁷ cm⁻³ for the p-buffer in the SB-SJBT, and 3 × 10¹⁶ cm⁻³ for the n-buffer in the SB-SJBT. Eoff was defined as the integral of the product of the voltage and current (Pc) from the time corresponding to 10% of the current to the time corresponding to 10% of the voltage. Therefore, the area under the power curve in Figure 3c becomes the Eoff of that structure.
Calculated in this manner, the Eoff values of the SB-SJBT and C-SJBT were 1.68 μJ and 2.19 μJ, respectively; that is, the turn-off loss of the SB-SJBT was about 23.3% lower than that of the C-SJBT. The Von was 1.38 V in both structures, and BV was 621.5 V and 626.2 V, respectively, meaning that the other variables were well controlled. Figure 3e shows the Eoff-Von trade-off for each device according to the doping concentration of the p-collector. Compared with the C-SJBT, the SB-SJBT shows an overall improved trade-off.
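The Eoff definition above amounts to a numerical integral of the instantaneous power Pc = Vc × Ic over a turn-off window. The following is a minimal sketch of that calculation, assuming sampled waveforms and an integration window whose start and end times have already been determined from the 10% thresholds. The function names and the toy waveform are illustrative assumptions and are not taken from the TCAD output.

```python
def turn_off_energy(t, v, i, t_start, t_end):
    """Trapezoidal integral of v*i over sample intervals inside [t_start, t_end]."""
    energy = 0.0
    for k in range(len(t) - 1):
        if t[k] < t_start or t[k + 1] > t_end:
            continue  # skip intervals outside the integration window
        p0 = v[k] * i[k]              # instantaneous power at the left sample
        p1 = v[k + 1] * i[k + 1]      # instantaneous power at the right sample
        energy += 0.5 * (p0 + p1) * (t[k + 1] - t[k])
    return energy


def relative_improvement(reference, improved):
    """Fractional reduction of `improved` relative to `reference`."""
    return (reference - improved) / reference


# Hypothetical toy waveform (seconds, volts, amperes), for illustration only.
t = [0.0, 1e-6, 2e-6, 3e-6]
v = [0.0, 100.0, 200.0, 200.0]
i = [2.0, 2.0, 1.0, 0.0]
print(turn_off_energy(t, v, i, 0.0, 3e-6))

# Eoff values reported in the text: 2.19 uJ (C-SJBT) and 1.68 uJ (SB-SJBT).
print(round(100 * relative_improvement(2.19, 1.68), 1))  # 23.3
```

The last line reproduces the 23.3% figure from the two reported losses; the waveform itself is only a placeholder for real simulation data.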

Designing ML Algorithm
The improvement of the SB-SJBT could be verified by TCAD simulations. However, optimization with restricted computing resources cannot be performed easily or clearly. Therefore, we suggest an easy and efficient method: the ML approach. The neural network (NN) generated by the ML model provides reliable predictions through functional relationships between inputs and outputs. The ML algorithm training procedure is shown in Figure 4. To design a useful ML algorithm, the following four steps can be considered.
Step 1: Extract the examples to be used for training.
Step 2: Design the ML algorithm and train it on the data.
Step 3: Extract the training data amount using the ML approach and determine whether it ensures a level of accuracy equal to that of the TCAD simulation.
Step 4: Redesign or complete the neural network and use it appropriately.
In the preprocessing before training in Step 2, all of the data are divided into training data, test data, and validation data at a ratio of 8:1:1.
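The 8:1:1 division described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name, the fixed seed, and the use of a shuffled list are assumptions, since the text does not state how the split was implemented.

```python
import random


def split_8_1_1(samples, seed=0):
    """Shuffle and split samples into training, test, and validation lists (8:1:1)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    n_train = len(shuffled) * 8 // 10
    n_test = len(shuffled) // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, validation


train, test, validation = split_8_1_1(range(1000))
print(len(train), len(test), len(validation))  # 800 100 100
```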
In the ML algorithm, the rectified linear unit (ReLU) activation function [19], which is defined as the positive part of its argument, was used for effective learning. Thousands of data points were used, and the dense layers were set to 200→100→100→3 units with ReLU. The adaptive moment estimation (ADAM) optimizer [20,21] was used for accurate error correction. Like AdaGrad, Adadelta, and RMSprop, ADAM adapts the update of each parameter individually, and it has the advantage that its step size is invariant to rescaling of the gradient. In order to obtain more precise results, the learning rate (LR) was set to 0.001 when compiling the model. Normalization [22] by the min-max scaler in sklearn was also needed on account of the wide range of the parameters. Finally, we controlled overfitting with the dropout and early stopping functions of Keras. Dropout is a regularization method that approximates training a large number of neural networks with different architectures in parallel; in this setting, 10% of the layer outputs are randomly ignored. Early stopping is a form of regularization used when training a learner with an iterative method. The patience, which is the number of epochs without improvement that are tolerated before training stops, was set to 10. Because the final epoch count therefore includes the 10 epochs in which there was no significant change, the 'best model' was saved from 10 epochs before the final epoch.
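The early stopping rule described above can be illustrated without any ML framework. The sketch below is a plain-Python illustration of the patience logic, not the Keras implementation; the function name, the `min_delta` parameter, and the sample loss values are assumptions made for the example.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs fail to improve the
    best validation loss by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch            # remember where the best model was seen
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch      # patience exhausted: stop here
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improvement for 5 epochs, then a plateau.
losses = [1.0, 0.5, 0.3, 0.2, 0.1] + [0.1] * 20
best, stop = early_stopping_epoch(losses, patience=10)
print(best, stop, stop - best)  # 4 14 10
```

With a patience of 10, the stopping epoch is always 10 epochs after the best one, which is why the best model corresponds to the final epoch minus 10.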

Verification of Model Reliability
Micromachines 2023, 14, 334
The accuracy, MSE, RMSE, RMSLE, and loss are shown in Figure 5. The Loss and Accuracy curves correspond to the training data, and Val_Loss and Val_Accuracy correspond to the validation data. The mean squared error (MSE) was used to estimate accuracy. In addition, the root mean squared error (RMSE) and root mean squared logarithmic error (RMSLE) were used. MSE is sensitive to outliers because the error is squared. Because the RMSE is the square root of the MSE, it has the same unit as the actual result, making it easy to interpret. The RMSLE, a log-scaled RMSE, is robust against outliers and measures the relative error because it uses a log scale; it also imposes a larger penalty for underestimation. The lower the MSE, RMSE, and RMSLE, the better the trained model. However, when the error converges to zero, that is, when the accuracy approaches 100%, the model merely reproduces the input data as they are and cannot ignore out-of-trend data, so it has no value as a predictor. In that sense, these results, with an accuracy of around 95% and an MSE, RMSE, and RMSLE of around 0.0006, 0.024, and 0.020, respectively, are reasonable. Since the RMSLE is about 0.007 lower than the RMSE, it is expected that there are some large errors, but this is not a cause for concern because the RMSE is small enough to handle. In order to check the reliability visually, the distribution plots are shown in 3D in Figure 6g-i. The doping concentration of the n-buffer below the p-pillar (p_side doping) is placed on the x-axis, the doping concentration of the n-buffer below the n-pillar (n_side doping) is placed on the y-axis, and Eoff, Von, and BV are placed on the z-axis. The Eoff, Von, and BV extracted by TCAD are indicated by red dots, and the predictions by the NN are indicated by blue gradation triangles according to the z-axis value. The error of the NN predictions is also shown in Figure 6d-f: the closer the values are to the diagonal on the graph, the smaller the error.
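The three error metrics discussed above can be written out in a few lines. This is a minimal sketch using their standard definitions; the function names and the sample values are illustrative, and the RMSLE form with log(1 + y) assumes labels greater than −1, such as min-max-scaled labels in [0, 1].

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; same unit as the labels."""
    return math.sqrt(mse(y_true, y_pred))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + y)."""
    total = sum((math.log1p(p) - math.log1p(t)) ** 2
                for t, p in zip(y_true, y_pred))
    return math.sqrt(total / len(y_true))


# Hypothetical scaled labels and predictions, for illustration only.
y_true = [0.0, 0.5, 1.0]
y_pred = [0.1, 0.5, 0.8]
print(mse(y_true, y_pred), rmse(y_true, y_pred), rmsle(y_true, y_pred))

# Underestimating by 0.5 is penalized more than overestimating by 0.5.
print(rmsle([1.0], [0.5]) > rmsle([1.0], [1.5]))  # True
```

As a consistency check on the reported values, the square root of an MSE of about 0.0006 is about 0.024, which matches the reported RMSE.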
Numerical analysis is performed by monitoring the overall output characteristics of the structure using the coefficient of determination (R²) in the regression analysis. R² (0 < R² < 1) denotes the strength of the linear correlation between the TCAD data and the predicted results. The R² value is calculated as follows:
$$R^2 = 1 - \frac{SS_{\mathrm{regression}}}{SS_{\mathrm{total}}} = 1 - \frac{\sum_i \left(y_i - y_{\mathrm{regression}}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$$
where $SS_{\mathrm{total}}$ and $SS_{\mathrm{regression}}$ are the squared total error and regression error, respectively, and $y_i$, $\bar{y}$, and $y_{\mathrm{regression}}$ are each data point, the mean value, and the regression value, respectively. As a result, the R² values of Eoff, Von, and BV are 0.96816, 0.99550, and 0.99075, respectively. Since the value of R² is close to 1, it can be said that the model explains the data well. The numerical analysis is also shown in Table 2. The accuracy of the prediction data was evaluated once more using the confidence interval calculated by the standard error of the mean and the standard deviation [23].
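As a worked illustration of the coefficient of determination, the sketch below assumes the common definition R² = 1 − SS_regression / SS_total, where SS_regression is the sum of squared differences between the data and the predictions, and SS_total is the sum of squared differences between the data and their mean. The function name and the sample values are illustrative and are not taken from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_regression / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_total = sum((y - mean) ** 2 for y in y_true)
    ss_regression = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_regression / ss_total


# Hypothetical reference values and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y_true, y_pred), 4))  # 0.98
```

A perfect prediction gives R² = 1, while predicting the mean of the data for every point gives R² = 0.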
The trained model does not simply mimic TCAD but corrects for some large errors that do not fit the trend. This makes continuous trends more reasonably predictable and helps to increase the reliability of the results. Indeed, the σ of ML is observed to be smaller than that of TCAD. The blue spheres in Figure 6j-l, which include the blue triangles in Figure 6g-i, are the results of running ML with 120,000 input clusters not simulated by TCAD, and they complement the overall trend of the red dots (simulation) in Figure 6g-i. A more detailed example is shown in Figure 7. The superior trade-off characteristics of the SB-SJBT compared to those of the C-SJBT were learned well. Moreover, the analysis via ML demonstrates the advantages of the SB-SJBT more clearly than the analysis via TCAD. This is because, unlike TCAD, ML infers more reasonable results while adjusting for large outliers in the calculations.

Figure 6. Test data set from TCAD versus data set predicted by the ML model: (a) Eoff-Von, (b) BV-Von, and (c) Eoff-BV. Black circles, red triangles, and blue inverted triangles indicate the TCAD sample, ML sample 1, and ML sample 2, respectively. Eoff (g), Von (h), and BV (i) according to doping concentration are plotted in 3D, and errors are shown in (d-f), respectively. Each label is min-max-scaled, and the ideal criterion is set as the red diagonal line. Eoff (j), Von (k), and BV (l), the results of ML models tested with multiple samples, inherit the trends of Eoff (g), Von (h), and BV (i) that were approximately expected as points.

Figure 7. Comparing the trade-off graph extracted by the TCAD tool with the graph predicted by the NN shows very high similarity. In addition, the optimal structure extracted by ML shows an improved trade-off.

Optimization
With the learned NN model, we could expect the proposed SB-SJBT to have an Eoff improvement of 24.7% at a Von of 1.38 V. However, this structure is not guaranteed to have an optimal trade-off. A combination of optimized buffer concentrations and characteristics can be extracted by ML [24][25][26]. The red triangles in Figure 7 show the characteristics of a structure with an optimized trade-off, which has a combination of buffer concentrations of 3 × 10¹⁶ cm⁻³ and 1 × 10¹⁸ cm⁻³. At the same Von of 1.38 V, the Eoff of the optimized SJBT improved by 36.2% relative to the C-SJBT (from 2.18 μJ to 1.39 μJ), which is superior to the SB-SJBT, which improved by 24.7% to 1.64 μJ. In addition to this advantage, the ML approach has shown overwhelming advantages in terms of time during optimization. In this experiment, the duration was improved by more than 118% compared to the TCAD simulation, which took more than 10 min to compute a node, even considering the extraction time for approximately 1000 data points, the model training time, and the operation time required for ML. Moreover, when the parameters extracted by ML are simulated in TCAD, the trade-off is also improved.
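The optimization described above, finding the minimum Eoff at a fixed Von, can be sketched as a constrained search over buffer concentrations using a fast surrogate in place of TCAD. In the sketch below, `toy_surrogate` is a hypothetical linear stand-in with made-up coefficients that only mimics the qualitative trends stated in the text; it is not the trained NN, and the grid, tolerance, and function names are likewise assumptions.

```python
import itertools


def toy_surrogate(log_p, log_n):
    """Hypothetical stand-in for the trained NN (NOT the model from the study).

    Inputs are log10 of the p-buffer and n-buffer doping concentrations.
    Returns (eoff_uJ, von_V) with made-up linear trends.
    """
    eoff = 3.0 - 0.8 * (log_p - 16.0) + 0.3 * (log_n - 16.0)
    von = 1.2 + 0.1 * (log_p - 16.0) + 0.1 * (log_n - 16.0)
    return eoff, von


def minimize_eoff_at_von(predict, grid_p, grid_n, von_target, von_tol):
    """Grid search for the lowest predicted Eoff with Von near the target."""
    best = None
    for log_p, log_n in itertools.product(grid_p, grid_n):
        eoff, von = predict(log_p, log_n)
        if abs(von - von_target) > von_tol:
            continue  # violates the same-Von constraint
        if best is None or eoff < best[0]:
            best = (eoff, von, log_p, log_n)
    return best  # None if no grid point satisfies the constraint


# Illustrative grid: log10 doping from 16.0 to 18.0 in steps of 0.1.
grid = [16.0 + 0.1 * k for k in range(21)]
best = minimize_eoff_at_von(toy_surrogate, grid, grid, von_target=1.38, von_tol=0.005)
print(best)
```

Because each surrogate evaluation is cheap, every grid point can be checked exhaustively, which is impractical when a single TCAD run takes minutes. For reference, the reported improvement follows from the quoted losses: (2.18 − 1.39) / 2.18 ≈ 36.2%.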

Reverse Engineering by NN
Typical simulations such as TCAD can only check discrete characteristics that are determined by specific input parameters. However, the analysis using the NN has a tremendous advantage in that it generates a function that predicts continuous points from the relationship between the input data and the output data. This makes reverse engineering possible. In addition, because there are countless combinations with similar characteristics, it is difficult to find each one; compared to directly adjusting the input parameters by intuition, the method using the NN is more effective in obtaining a precisely optimized doping level and checking the output characteristics. For example, by adopting the characteristics previously identified for the SB-SJBT (Eoff = 1.64 μJ, Von = 1.38 V) as the target, structures with similar characteristics were extracted and are listed in Table 3. Structure B (SB-SJBT) has the same characteristics as expected. Structures A, C, and D were also found with almost the same characteristics, and they show that Eoff and BV decrease as the p-buffer concentration increases, while Von is compensated as the n-buffer concentration decreases. As such, the ML approach has the advantage of being able to target specific characteristics and flexibly control other parameters.
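The reverse engineering step above can be sketched as a nearest-match search: evaluate a surrogate over many input combinations and keep those whose predicted outputs lie closest to the target. As before, `toy_surrogate` is a hypothetical linear stand-in with made-up coefficients, not the trained NN; the grid, the scaling factors, and the choice of four returned candidates are assumptions made for the example.

```python
import itertools
import math


def toy_surrogate(log_p, log_n):
    """Hypothetical stand-in for the trained NN (NOT the model from the study)."""
    eoff = 3.0 - 0.8 * (log_p - 16.0) + 0.3 * (log_n - 16.0)
    von = 1.2 + 0.1 * (log_p - 16.0) + 0.1 * (log_n - 16.0)
    return eoff, von


def inverse_design(predict, grid_p, grid_n, target, scale, k=4):
    """Return the k grid points whose predictions are closest to the target.

    `target` is (eoff, von); `scale` normalizes each output so that both
    contribute comparably to the distance.
    """
    scored = []
    for log_p, log_n in itertools.product(grid_p, grid_n):
        eoff, von = predict(log_p, log_n)
        dist = math.hypot((eoff - target[0]) / scale[0],
                          (von - target[1]) / scale[1])
        scored.append((dist, log_p, log_n, eoff, von))
    scored.sort()          # smallest distance first
    return scored[:k]


# Illustrative grid and a target that is reachable by the toy surrogate.
grid = [16.0 + 0.1 * i for i in range(21)]
target = toy_surrogate(grid[10], grid[5])
candidates = inverse_design(toy_surrogate, grid, grid, target, scale=(1.0, 0.1))
for dist, log_p, log_n, eoff, von in candidates:
    print(f"log_p={log_p:.1f} log_n={log_n:.1f} eoff={eoff:.3f} von={von:.3f} dist={dist:.3f}")
```

Returning several near-matches rather than a single point mirrors the idea that many input combinations can yield similar output characteristics, leaving the designer free to choose among them on other criteria such as BV.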

Conclusions
The SB-SJBT is proposed for the purpose of reducing Eoff, which is considered a major problem of the SJBT, and it needs to be optimized. In general, TCAD simulation is regarded as the most powerful means of semiconductor analysis for device design. However, TCAD simulation can only identify the characteristics for one specific set of parameters at a time. This makes it inefficient in terms of the time required for designing and optimizing the structure or considering additional variables (e.g., process conditions). Therefore, an ML model is proposed. The ML approach ensured a remarkably short time with reliable accuracy in this study. Specifically, the R² of Von and Eoff reached 0.995 and 0.968, respectively, and the structure could be optimized to have an Eoff improvement of 36.2% over the C-SJBT at a Von of 1.38 V. Moreover, the required input parameters could be easily obtained by using target output characteristics as necessary. Although the model still captures only a simple functional relationship between the input and output, as the technology develops, there are endless prospects, such as predicting the carrier's movement path or the vector diagram of the electric field. In addition, the proposed ML utilization method is expected to be very strategic for various database applications (e.g., finding the optimal equipment and conditions for a unit process), especially in power semiconductor analysis, where the transistor size is comparatively large and requires significant computation.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.