1. Introduction
In recent years, the growing number of nonlinear loads connected to the utility grid has drawn widespread attention to poor power quality and harmonic distortion in power systems [1]. Large harmonic currents distort the grid voltage and affect the normal operation of electrical equipment. They may also cause parallel or series resonance in the power grid [2]. The three-phase PWM converter, as the interface for energy conversion, has become an important means of mitigating this problem owing to advantages such as sinusoidal input current and an adjustable, high input power factor. It is widely used in photovoltaic power generation, wind power generation, traction drive systems and other applications with strict reliability requirements.
However, converters remain a weak point in such high-reliability applications, and semiconductor power devices are their most fragile components. According to a survey of semiconductor devices covering more than 200 products from 80 companies, converter failures caused by semiconductor devices account for about 60% of all recorded failures [3]. In order to achieve high power quality, the system needs to work at a very high switching frequency, generally above 10 kHz. Working at high frequency and high temperature for a long time further increases the probability of IGBT damage. Since IGBT faults have always been the main cause of three-phase PWM converter faults, the diagnosis and location of IGBT faults are particularly important [4].
IGBT faults are divided into short-circuit faults and open-circuit faults. The extremely high inrush current of a short-circuit fault can permanently damage the system; this type of fault is catastrophic but easy to detect, so a protection circuit is usually integrated in the inverter system [5]. An open-circuit fault, on the other hand, is latent: when it occurs, it is not severe enough to trigger the hardware protection circuit, yet the resulting excessive electrical and thermal stress can lead to secondary failures of other equipment. If no fault detection is in place, serious safety and property accidents may occur. Therefore, open-circuit fault diagnosis of IGBTs has attracted considerable research attention [6].
Generally, fault diagnosis methods can be divided into model-based, signal-based and data-driven methods [7,8,9]. Model-based and signal-based inverter fault diagnosis for drive systems has been extensively studied in recent years. Reference [10] proposed a fast sensorless diagnosis method for inverter open-circuit faults by analyzing the switching function model of the inverter in the healthy and faulty states. The same authors subsequently proposed a fault diagnosis method based on the existing residual vector to remove the influence of the load [11]. Reference [12] proposed an open-circuit fault diagnosis method for voltage source inverters in PMSM drive systems using model reference adaptive system technology. References [13,14] proposed a simple single-switch and double-switch open-circuit (OC) fault diagnosis method based on the three-phase current distortion of a vector-controlled induction motor driven by a voltage source inverter. Reference [15] proposed a technique for detecting IGBT open-circuit faults in an induction motor driven by a voltage source inverter. This technique detects the fault by analyzing the PWM switching signal and the line-to-line voltage level during the switching time.
In recent years, benefiting from the rapid development of machine learning theory, data-driven fault diagnosis based on machine learning, which realizes fault diagnosis directly without a logical or mathematical description of the inspected object, has received special attention [16]. The method is independent of the system model and signal image, and offers a promising way to improve the accuracy of fault diagnosis. Reference [17] proposed a fault diagnosis and reconstruction method for multi-level inverters using neural networks. The output phase voltage of the inverter is used as a diagnostic signal to detect the fault and its location. Reference [18] proposed a fuzzy logic-based fault detection and diagnosis method for PWM voltage source inverter induction motor drives. This technique requires measuring the output current of the inverter to detect the intermittent loss of firing pulses in the inverter power switches. Reference [19] proposed a machine learning technique for fault diagnosis of induction motors using structured neural networks. This method can detect and isolate common fault types such as single-switch open-circuit faults, back short-circuit faults, short-circuit faults and unknown faults. Reference [20] proposed a fault diagnosis strategy for cascaded H-bridge multilevel inverter systems based on principal component analysis and multi-class relevance vector machines. Reference [21] proposed an OC fault diagnosis and online monitoring scheme for grid-connected single-phase inverters using an adaptive neuro-fuzzy inference system algorithm. Reference [22] proposed OC fault diagnosis and sliding-window classification based on hybrid ensemble learning, which achieves higher fault diagnosis accuracy and faster diagnosis. Reference [23] proposed a data-driven method that simultaneously diagnoses IGBT and current sensor faults of three-phase PWM inverters in induction motor drives. Reference [24] proposed a transferable data-driven IGBT open-circuit fault diagnosis method to improve the generalization performance of the fault diagnosis model.
Random forests (RFs) algorithms are suitable for both regression and classification and have shown excellent performance in pattern recognition, state monitoring, system control and other fields [25,26,27,28,29]. Reference [27] proposed an RFs-based fault identification model for power transformers, based on radio frequency identification and feature extraction from differential current measurement data, which can effectively distinguish internal faults from disturbances. Reference [28] proposed a data-driven wind turbine fault detection method that integrates XGBoost with RFs and prevents over-fitting when dealing with multidimensional data. Reference [30] proposed an RFs-regression-based implementation of SVPWM for a two-level inverter, which can improve the performance of three-phase induction motor (TIM) drives. Reference [31] proposed a DAWRF algorithm for IoT fault detection based on edge computing and blockchain, which makes the traditional RFs algorithm more effective in IoT fault detection.
In practical applications, data-driven methods must pay great attention to the robustness and generalization ability of fault diagnosis. Reliable diagnosis needs to adapt to as many operating conditions as possible, which is not fully addressed in most of the literature. In addition, data-driven algorithms often suffer from accuracy bottlenecks and heavy training burdens. To address these problems, this paper proposes a data-driven IGBT open-circuit fault diagnosis method. Its salient features and advantages are as follows:
- (1) In order to extract the mapping relationship between fault features and failure modes, a robust accuracy weighted random forests algorithm is proposed, which uses out-of-bag datasets and object datasets with random disturbances to evaluate the performance of the fault diagnosis model. According to the evaluation results, the weights of the decision trees are adjusted to improve the anti-noise ability of RFs.
- (2) Several hyper-parameters of the trained diagnostic model are optimized by an optimization programming framework (OPF), whose aim is to improve the accuracy of the proposed fault diagnosis model. This optimization provides a number of hyper-parameter sets that accommodate the limitations of the test object and the difficulty of dataset collection.
- (3) The method is non-invasive: it needs only the three-phase current signals as input, without additional sensors. Compared with most fault diagnosis methods, the proposed method has higher fault diagnosis accuracy and a lower computational burden.
The method has been validated in simulation and in-circuit testing. Compared with the existing fault identification methods, this method has the advantages of high fault diagnosis accuracy and better robustness.
The rest of this paper is organized as follows: Section 2 introduces the inverter system and fault types in this study. Section 3 briefly introduces the general framework of the method. The theoretical basis and detailed description of the improved robust accuracy weighted random forests algorithm are presented in Section 4. Section 5 details the simulation and experimental verification and discussion. Finally, Section 6 presents the overall conclusion of this paper.
  2. System Description and Fault Identification
The object of this paper is a three-phase PWM converter widely used in electric drive systems. Its circuit topology is a typical three-phase full-bridge structure, as shown in Figure 1, consisting of six IGBTs (T1–T6) and the corresponding anti-parallel diodes (D1–D6). Ia, Ib and Ic are the output currents of the converter, which are regulated by the control system with a 100 kHz sampling clock and a 20 kHz control sampling rate. Each IGBT is controlled by its corresponding drive signal.
The proposed open-circuit fault diagnosis system, shown in Figure 1, is composed of a data-caching field programmable gate array (FPGA) and an industrial personal computer (IPC). The FPGA resamples the high-speed three-phase current signals at a sampling rate of 10 kHz and caches one cycle of current signals. The IPC executes the fault diagnosis algorithm on this one-cycle current signal to obtain the fault diagnosis result. Once a fault signal is detected, the IPC issues an instruction via direct memory access (DMA) to shut down the control system. In this way, fast IGBT open-circuit fault protection is realized. The main advantage of this system is that down-sampling reduces the computational burden, so that one fault diagnosis system can protect multiple power sources simultaneously.
However, in the actual operation of the fault diagnosis system, the three-phase current signal is affected by many disturbance factors, which may lead to misdiagnosis and an unnecessary power shutdown. Therefore, the reliability and anti-interference ability of fault diagnosis are very important. An IGBT open-circuit fault usually takes one of the following two forms:
- (1) When the chip Sn fails, the body diode Dn also fails at the same time.
- (2) When the chip Sn fails, the body diode Dn works normally.
Case 1 usually causes an over-voltage that triggers the hardware protection, so it is outside the scope of open-circuit fault diagnosis in this study.
Different IGBT open-circuit fault scenarios generate different output current signals. In total, there are six single-IGBT fault types and 15 dual-IGBT fault types. Including the normal operating condition, there are 22 fault labels in total. All labels are listed in Table 1. Based on these fault labels, this paper proposes a data-driven fault diagnosis strategy. The input of the method is one period of the three-phase output current signal, and the output is a fault label representing the type and location of the fault.
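The label count can be checked by enumerating the fault combinations. The sketch below is only illustrative: it uses the switch names T1–T6 from Figure 1, and the ordering of the generated labels is an assumption that need not match Table 1.

```python
from itertools import combinations

# Switch names follow Figure 1; the label ordering here is hypothetical
# and is not claimed to match Table 1.
switches = [f"T{i}" for i in range(1, 7)]

single_faults = [(s,) for s in switches]          # one open IGBT
double_faults = list(combinations(switches, 2))   # two open IGBTs

# One additional label (empty tuple) for the healthy converter.
labels = [()] + single_faults + double_faults

print(len(single_faults), len(double_faults), len(labels))
```

The three printed counts are 6, 15 and 22, matching the six single-switch cases, the 15 unordered switch pairs and the one normal condition.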
  3. Generic Method Framework
This paper proposes a data-driven IGBT open-circuit fault diagnosis method. The sampled three-phase currents of the inverter are used as the diagnostic input, and the fault label is the diagnostic output, which realizes both fault diagnosis and fault location. The framework of the whole method is shown in Figure 2 and is divided into two stages: offline model training and online fault diagnosis.
In the offline development process, the random forests fault diagnosis model is trained using a historical fault database composed of ideal model samples. In order to improve the performance of fault diagnosis, standardization is applied to the sampled three-phase current data as feature processing. Random forests is an ensemble learning algorithm composed of multiple decision trees. It has strong anti-disturbance ability and robustness, and is one of the most widely used fault classifiers. In this paper, an improved random forests algorithm is used to train the fault diagnosis model and obtain the mapping relationship between input features and fault labels. Unlike in the traditional random forests method, each decision tree in the improved random forests casts a vote weighted according to its fault diagnosis accuracy. Together, the weighted trees provide more accurate and robust fault diagnosis results.
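As an illustration of the standardization step, the following minimal sketch applies a z-score to one cached current window. It assumes that each phase window is standardized on its own, and the 50 Hz, 44 A test signal is a made-up example; the paper does not specify these details.

```python
from math import sin, pi
from statistics import fmean, pstdev

def standardize(window):
    """Z-score a window of samples: zero mean and unit standard deviation."""
    mu = fmean(window)
    sigma = pstdev(window)
    if sigma == 0.0:              # flat signal: avoid division by zero
        return [0.0 for _ in window]
    return [(x - mu) / sigma for x in window]

# Hypothetical input: one 20 ms cycle of a sine with 44 A amplitude,
# represented by 200 points (10 kHz resampling).
n = 200
ia = [44.0 * sin(2 * pi * k / n) for k in range(n)]
ia_std = standardize(ia)
```

After this step the window has zero mean and unit standard deviation regardless of the current amplitude, which is one way to make windows recorded at different load levels comparable.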
In online application, the measured object is a non-ideal system affected by various factors. With the proposed method, the trained fault diagnosis model can be applied directly to the non-ideal power system with disturbance factors: the measured three-phase current is fed into the model to obtain the fault label. No additional model training is required, and excessive dependence on fault data from the tested object is avoided.
  4. Random Forests Theory
In essence, fault diagnosis and localization of a power source is a classification problem: the operating state of the power source is classified from the input characteristic signals. The random forests algorithm is a supervised ensemble learning algorithm that integrates many decision trees for prediction and classification. The core idea is that the weak classifiers formed by multiple decision trees are integrated into a strong classifier, and the weak classifiers jointly give the classification result according to certain rules. The decision tree algorithm uses labeled samples to construct a fault diagnosis classifier and obtains the mapping relationship between input features and fault labels, which enables the classifier to classify unlabeled samples. The random forests classifier trains each decision tree with random samples and uses different training sets to increase the difference between the decision tree models, thereby improving the generalization ability and robustness of the classifier. This characteristic helps to reduce the misdiagnoses caused by various disturbances in IGBT open-circuit fault diagnosis.
  4.1. Decision Tree
A decision tree is a binary recursive partitioning technique that splits the current sample set into two subsets at each node (except leaf nodes). The attribute selection measure adopted by the algorithm is the Gini index. Assuming that the dataset $D$ contains $m$ categories, the Gini index $G_D$ is:

$$G_D = 1 - \sum_{j=1}^{m} P_j^2 \tag{1}$$

In the formula, $P_j$ is the frequency of the $j$-th type of fault in the training database. The Gini index must be evaluated for every binary division of each feature. Assuming that a binary division on feature $A$ divides dataset $D$ into $D_1$ and $D_2$, the Gini index of sample set $D$ divided by feature $A$ at this child node is:

$$G_{D,A} = \frac{|D_1|}{|D|}\, G_{D_1} + \frac{|D_2|}{|D|}\, G_{D_2} \tag{2}$$

For each attribute, every possible binary partition is considered, and the partition that yields the smallest Gini index for that attribute is selected as its split. Therefore, the smaller the Gini index $G_{D,A}$ on attribute $A$, the better the division on attribute $A$. Under this rule, the division continues from top to bottom until the entire decision tree is grown.
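The split rule described above can be sketched in a few lines. This is a generic illustration of Gini-based binary splitting on one numeric feature, not the authors' implementation; the function names and the toy data are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini index of a label set: 1 - sum_j P_j^2 over class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Size-weighted Gini index of a binary partition (D1, D2)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Try each midpoint between sorted feature values; keep the lowest Gini."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no boundary between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        best = min(best, (gini_split(left, right), thr))
    return best

# Toy feature with two perfectly separable fault labels.
score, threshold = best_threshold([0.1, 0.2, 0.8, 0.9], [1, 1, 2, 2])
```

For this toy data the unsplit set has a Gini index of 0.5, and the threshold 0.5 separates the two labels completely, giving a split Gini index of 0.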
  4.2. RFs Algorithm
Definition 1. The random forests model f is a set of decision trees {h(X, θk), k = 1, 2, …, Ntree}, where each classifier h(X, θk) is an unpruned decision tree constructed with a decision tree algorithm.
 The θk are independent and identically distributed random vectors, and θk governs the growth process of the kth decision tree; the final classification value of the random forests is obtained by the majority voting method.
Definition 2. Let the input vector X belong to one of at most J different categories, and let Y be its correct category. For the input vector X and output Y, the margin function is defined as:

$$mg(X, Y) = av_k\, I\big(h(X, \theta_k) = Y\big) - \max_{j \neq Y} av_k\, I\big(h(X, \theta_k) = j\big) \tag{3}$$

 In the formula, $j$ is one of the $J$ categories; $I(\cdot)$ is the indicator function; $av_k(\cdot)$ is the average over the trees $k = 1, 2, \ldots, n$. The larger the margin function, the higher the confidence of the correct classification. The generalization error of random forests is thus defined as:

$$E^* = P_{X,Y}\big(mg(X, Y) < 0\big) \tag{4}$$

In the formula, $P_{X,Y}$ is the classification error probability function for a given input vector X. When the number of decision trees in the forest is large, the following theorem is obtained by using the law of large numbers:
Theorem 1. When the number of decision trees is large enough, for all sequences θk, E* converges almost surely to:

$$P_{X,Y}\Big(P_\theta\big(h(X, \theta) = Y\big) - \max_{j \neq Y} P_\theta\big(h(X, \theta) = j\big) < 0\Big) \tag{5}$$

 In the formula, $P_\theta(c)$ is the probability of satisfying the condition c for a given sequence θ. This theorem shows that random forests do not overfit as the number of trees increases; instead, the generalization error converges to a limiting value.
Theorem 2. The upper bound of the random forests generalization error is:

$$E^* \le \frac{\rho\,(1 - s^2)}{s^2} \tag{6}$$

 In the formula, ρ and s are the average correlation coefficient and the average strength of the trees, respectively. It can be seen from Theorem 2 that as the correlation between decision trees decreases and the strength of a single decision tree increases, the upper bound of the generalization error of random forests decreases, so the generalization error is effectively controlled.
Guaranteed by the law of large numbers, random forests achieve high classification accuracy without overfitting, which makes them well suited to power source fault diagnosis. From the above analysis, there are two main ways to improve the diagnostic accuracy of a random forests fault diagnosis model, namely, reducing the correlation ρ between decision trees and improving the fault diagnosis performance s of a single decision tree. In addition, an important feature of random forests is out-of-bag estimation. When a training subset is generated by bagging, nearly 37% of the samples in the original sample set S do not appear in the training subset of a given decision tree; these samples are called out-of-bag (OOB) samples. OOB samples can be used to estimate the generalization error of RFs and also to calculate the importance of each feature. The above theorems are the key to reducing fault misdiagnosis in this paper.
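The figure of roughly 37% follows from sampling with replacement: a given sample is missed by all n draws with probability (1 − 1/n)^n, which tends to 1/e ≈ 0.368 as n grows. The short simulation below checks this numerically; the sample size and number of repetitions are arbitrary choices for illustration.

```python
import random

def oob_fraction(n_samples, rng):
    """Draw one bootstrap sample of size n and return the share left out."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples

rng = random.Random(0)           # fixed seed for reproducibility
n = 10_000
runs = 20
empirical = sum(oob_fraction(n, rng) for _ in range(runs)) / runs
theoretical = (1 - 1 / n) ** n   # approaches 1/e as n grows

print(f"empirical OOB share:   {empirical:.4f}")
print(f"theoretical OOB share: {theoretical:.4f}")
```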
  4.3. Improved Random Forest Algorithm
The RFs algorithm uses bootstrap sampling, which is simple random sampling with replacement. When each training subset is extracted, about a third of the samples are not selected, and these data are called out-of-bag data. These data are of high research value and can be used as an alternative to cross-validation. Based on the traditional RFs algorithm, this study designs a weighted random forests algorithm in which the accuracy of each tree is predicted using perturbed OOB datasets and a small number of object datasets. In the training phase of the algorithm, the test dataset and the out-of-bag dataset with additional disturbance factors are used to predict the fault diagnosis accuracy of each decision tree on the tested object. The out-of-bag weight and test weight are calculated from these accuracies. In the decision-making stage, the voting weights of the decision trees are adjusted according to the out-of-bag weight and test weight. The detailed implementation process is described below.
First, the test data, P, and the training dataset, T, are collected according to the test sample rate XTP (the ratio of test data to training data). The training dataset, T, comes from an ideal system simulation, and the relatively small test dataset, P, comes from the test object with uncertain perturbation factors. In the training phase, bootstrap sampling is used to separate the training data subset S and the out-of-bag data O from the training dataset, T, at the out-of-bag data rate XOS (the ratio of the OOB data to the training subset data), and random Gaussian white noise is added to the OOB data O. The fault diagnosis accuracy on the perturbed out-of-bag dataset and on the test dataset reflects the fault classification ability of the decision tree. Decision trees with higher classification accuracy receive heavier weights.
After training the kth decision tree with the training subset Sk, the out-of-bag dataset Ok, perturbed by Gaussian white noise with variance M, is used to predict the fault diagnosis accuracy of the kth decision tree. For the kth decision tree, the OOB weight wOk is the ratio of the number of OOB samples correctly classified by the kth decision tree to Xo, the total number of OOB samples (Equation (7)). Likewise, the test dataset P is used to predict the fault diagnosis accuracy of the kth decision tree: the test weight wPk is the ratio of the number of correctly classified test samples to the number of predicted test samples (Equation (8)).
        
In the decision-making stage, the OOB data weights are combined with the predicted test weights to determine the final weights. When voting for fault diagnosis, the vote result of each decision tree is multiplied by its corresponding weight Wk.
The development of the proposed Algorithm 1 is as follows:
        
| Algorithm 1: Improved Random Forest Algorithm | 
| Begin: (1) Determine the out-of-bag data rate, XOS, the test data rate, XTP, the number of decision trees, K, and the Gaussian white noise variance, M;
 (2) According to XTP, determine the training data, T, and the test data, P;
 for k = 1 to K do
   (3) Using bootstrap sampling and according to XOS, divide the training set, T, into the training subset, Sk, and the out-of-bag data, Ok;
   (4) According to the C4.5 algorithm, randomly select N features as node classification features, and use Sk to generate a decision tree;
   (5) Add random Gaussian white noise with variance M to Ok;
   (6) Take Ok and P as the test sets;
   (7) According to Equations (7) and (8), calculate the weights of the kth decision tree, wOk and wPk;
   (8) Calculate the final weight, wk, of the kth decision tree by Equation (9);
 end for
 (9) Classify the test data with the set of decision trees, and determine the final classification result by Equation (11);
 End
 | 
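The weighting and voting steps of Algorithm 1 can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the trees are stand-in callables instead of trained decision trees, the noise level is passed as a standard deviation `sigma`, and combining the two accuracies by multiplication is an assumption, since the exact form of Equation (9) is not reproduced here.

```python
import random
from collections import defaultdict

def add_noise(sample, sigma, rng):
    """Perturb each feature with zero-mean Gaussian noise (std sigma)."""
    return [x + rng.gauss(0.0, sigma) for x in sample]

def accuracy(tree, samples, labels):
    """Fraction of samples the tree classifies correctly."""
    hits = sum(tree(x) == y for x, y in zip(samples, labels))
    return hits / len(samples)

def tree_weight(tree, oob_x, oob_y, test_x, test_y, sigma, rng):
    """Weight of one tree from noisy-OOB accuracy and test accuracy.

    ASSUMPTION: the two accuracies are multiplied; the paper's
    Equation (9) may combine them differently.
    """
    noisy = [add_noise(x, sigma, rng) for x in oob_x]
    w_o = accuracy(tree, noisy, oob_y)     # OOB weight
    w_p = accuracy(tree, test_x, test_y)   # test weight
    return w_o * w_p

def weighted_vote(trees, weights, x):
    """Return the label with the largest sum of tree weights."""
    score = defaultdict(float)
    for tree, w in zip(trees, weights):
        score[tree(x)] += w
    return max(score, key=score.get)

# Hypothetical stand-ins for trained trees: stumps on a single feature.
trees = [lambda x: int(x[0] > 0.5), lambda x: int(x[0] > 0.4), lambda x: 1]

plain = weighted_vote(trees, [1.0, 1.0, 1.0], [0.45])      # majority vote
weighted = weighted_vote(trees, [0.9, 0.3, 0.3], [0.45])   # weighted vote
```

With equal weights the sample is assigned label 1 by two trees out of three, whereas with the weights 0.9, 0.3 and 0.3 the single more accurate tree outvotes the other two and the result is label 0. This is the mechanism by which trees that cope poorly with perturbed data lose influence.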
Finally, the type with the most votes is the fault diagnosis result of random forests. The final weights of the RFs model, the fault diagnosis result of each decision tree, and the fault diagnosis result of random forests are as follows:
In order to obtain a better fault diagnosis effect, an optimization model is constructed to optimize the hyperparameters XTP, XOS, K and M. The optimization objective function is defined as:
XORF is the average number of samples correctly classified by the random forests model in the out-of-bag test set, and XPRF is the number of samples in the test dataset P correctly classified by the random forests model.
In the formula, max (THDI) is the maximum total harmonic distortion of the current of the measured object.
In recent years, the particle swarm optimization (PSO) algorithm has been widely used in various optimization problems because of its simplicity, ease of implementation and fast convergence. The optimization objective is solved by the PSO algorithm. For details, please refer to [29].
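For readers unfamiliar with PSO, a minimal generic version is sketched below. It is not the optimizer configuration used in this paper: the inertia and acceleration coefficients, the swarm size and the toy quadratic objective are all assumptions for illustration, whereas the actual objective depends on the diagnosis accuracy of the trained forest and on the hyperparameters XTP, XOS, K and M.

```python
import random

def pso(objective, bounds, n_particles=20, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective with its minimum at (1, -2); a stand-in for the real one.
best_x, best_val = pso(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                       [(-5.0, 5.0), (-5.0, 5.0)])
```

To maximize a diagnosis accuracy instead, the objective would return the negative accuracy, and integer hyperparameters such as K would need to be rounded inside the objective.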
  5. Simulation and Experiment
In order to verify the effectiveness of the proposed data-driven method in practical applications, simulation tests and experiments are carried out. The performance of the ANN, SVM, RFs, weighted RFs and PSO weighted RFs fault diagnosis models was evaluated in the simulation phase. The evaluation indicators commonly used for data-driven fault diagnosis, namely accuracy, precision, recall and model training time, are used to measure the performance of the proposed method. Accuracy, precision and recall are defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{16}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{17}$$

TP, TN, FP, and FN denote true positive, true negative, false positive and false negative, respectively.
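The three indices can be computed directly from the confusion counts. The sketch below uses the standard definitions and, because the diagnosis problem has many labels, derives the counts for one label in a one-vs-rest fashion; how the paper aggregates the per-label values is not stated, so this choice and the toy label vectors are assumptions.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all decisions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Share of positive predictions that are correct (few false alarms)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that are found (few missed faults)."""
    return tp / (tp + fn)

def one_vs_rest_counts(y_true, y_pred, label):
    """Confusion counts for one fault label against all other labels."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, tn, fp, fn

# Toy example with three fault labels.
y_true = [1, 1, 2, 2, 3]
y_pred = [1, 2, 2, 2, 3]
tp, tn, fp, fn = one_vs_rest_counts(y_true, y_pred, label=2)
acc, prec, rec = accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn)
```

For label 2 in this toy example the counts are TP = 2, TN = 2, FP = 1 and FN = 0, so the accuracy is 0.8, the precision is 2/3 and the recall is 1.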
  5.1. Database Generation
In order to generate fault diagnosis models, a comprehensive and informative database is required. Such databases can be obtained from historical measurements or simulations. Because the measured object is subject to a variety of random disturbance factors, caused by sensor disturbance, converter parameters and so on, fault data are not easy to obtain precisely from the object system. Without loss of generality, the data in this paper are simulated, and different DC voltage, output voltage and output current settings are considered. The fault characteristics differ under different operating conditions. Therefore, in order to include more fault information in the fault database, various converter operating conditions are simulated. The data acquisition process is shown in Table 2, and the three-phase PWM converter simulation parameters are shown in Table 3. The simulation yields a total of 88,000 sets of data as the model training database, DT. Owing to the risk of damage from open-circuit faults in a real system, only a small test database, DP (2200 sets), is obtained from two real three-phase converter systems with different model parameters, as shown in Table 4. The parameters of real system 1 are consistent with those of the simulation shown in Table 3.
The training dataset, T, is extracted from DT according to coefficient XOS. The test dataset, P, is extracted from DP according to coefficient XOS. The remaining (out-of-bag) data in DP are used to calculate the evaluation indices in Equations (15)–(17).
  5.2. Simulation and Comparison Results
In order to verify and analyze the performance of the fault diagnosis model, the dataset T is used to train the proposed fault diagnosis model. In addition, the Bayesian network (BN), support vector machine (SVM), RFs and ensemble extreme learning machine (ELM) fault diagnosis models and the proposed fault diagnosis model are compared using the same training dataset, T, and validation dataset, Vd, which is the part of the database DP that excludes dataset P. All algorithms adopt the same general fault diagnosis framework, as shown in Figure 1 and Figure 2. In order to search for the optimal fault diagnosis performance, the hyperparameters of all algorithms are optimized, including the architecture, learning rate, kernel function, number of decision trees, etc. The optimization results and some decision tree weights of the proposed method are shown in Table 5.
In order to verify the training performance of the fault diagnosis algorithms, 10-fold cross-validation was conducted for the above models on the 88,000 sets of training data. The results are shown in Table 6. After hyperparameter optimization, all the fault diagnosis algorithms achieve a good fault diagnosis performance index (above 99.5%) on the training data. Moreover, the ensemble fault diagnosis algorithms, including RFs, ensemble ELM and the proposed method, achieve an excellent fault diagnosis performance index of nearly 100% on the OOB data of the training datasets.
In order to verify the generalization performance of the fault diagnosis models, performance verification was carried out on real system 1 and real system 2. In total, 880 sets of validation data, Vd, were selected from DP to calculate the performance indices. The comparison results are shown in Table 7. The results show that the proposed algorithm has the highest accuracy (96.25%) and recall (97.73%), which indicates that the proposed method identifies all fault types well and avoids missed detections. However, the ensemble ELM has the highest precision (96.15%), which reflects that this algorithm performs best in avoiding false positives. The offline test time of the proposed method is about 0.2837 s, which indicates that the proposed method requires less computation time while achieving comparable diagnosis performance.
Because the training data come mainly from an ideal simulation system, the BN, SVM and RFs fault diagnosis models perform poorly when applied to the real systems with disturbance factors. The proposed algorithm, however, improves the adaptability of the diagnostic model by modifying the weights of the model using disturbed data. Therefore, the above results show that the proposed method has good generalization performance.
In addition, in order to simulate random disturbance, Gaussian white noise with different variances, M, is added to the validation dataset, Vd, to verify the robustness of the proposed method, as shown in Figure 3. As M increases, the fault diagnosis accuracy of the other methods decreases. In particular, when M = 6%, the fault diagnosis accuracy of these methods falls below 75.12% due to signal disturbance. By comparison, the fault diagnosis accuracy of the proposed method remains above 92.16% when M < 6%. This shows that the proposed method has excellent robustness.
  5.3. Experimental Verification
To verify the feasibility of the method, an online fault diagnosis experiment is carried out on the three-phase PWM inverter. The online fault diagnosis system is shown in Figure 4. It consists of an IPC and a data caching system realized by an FPGA, in which the underlying controller is used for open-circuit fault monitoring; the IPC can be replaced by an ARM chip, on which the proposed RFs can run. The sampling clock of the closed-loop control system is 100 kHz, and the control sampling frequency is 20 kHz. At the same time, the current signal is resampled at 10 kHz, so only 200 points are sent to the IPC per cycle (20 ms), which reduces the computational burden. Once an open-circuit fault is detected, the online fault diagnosis system sends a protection signal to the controller to turn off all IGBT control signals.
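The 200-point window follows from the resampling rate and the cycle length (10 kHz × 20 ms). The sketch below reproduces this arithmetic and shows a naive decimation step; it assumes that the 10 kHz stream is obtained by keeping every second sample of the 20 kHz control stream and omits any anti-aliasing filter, neither of which is specified in the text.

```python
def samples_per_cycle(sample_rate_hz, cycle_s):
    """Number of samples captured in one fundamental cycle."""
    return round(sample_rate_hz * cycle_s)

def decimate(samples, factor):
    """Keep every `factor`-th sample (no anti-aliasing filter in this sketch)."""
    return samples[::factor]

control_rate_hz = 20_000     # control sampling frequency from the text
resample_rate_hz = 10_000    # resampling frequency from the text
cycle_s = 0.020              # one 20 ms current cycle

# Placeholder data standing in for one cycle of measured current samples.
raw = list(range(samples_per_cycle(control_rate_hz, cycle_s)))
factor = control_rate_hz // resample_rate_hz
window = decimate(raw, factor)

print(len(raw), factor, len(window))
```

Under these assumptions one cycle holds 400 samples at the control rate, the decimation factor is 2, and 200 points per phase are passed on to the diagnosis model.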
The output current fault characteristics differ when the converter runs under different working conditions, mainly the DC voltage, output voltage, output voltage frequency and output current. For example, the current fault characteristics differ between Figure 5 (S1 open-circuit fault) and Figure 6 (S1 and S2 open-circuit fault), and between output currents of 44 A (Figure 6a) and 8.8 A (Figure 6b). Hence, it is difficult to achieve high-precision fault diagnosis by setting thresholds or by traditional data-driven fault diagnosis methods. It can be seen from the experimental waveforms that faults are often accompanied by overstress. If no high-performance fault diagnosis algorithm quickly detects the fault and shuts down the system when an IGBT open-circuit fault occurs, the overstress will lead to a more destructive secondary failure.
This paper illustrates the effectiveness of the method by taking an S1 open-circuit fault and a simultaneous S1 and S2 open-circuit fault as examples. Once an open-circuit fault is detected, the controller immediately turns off all IGBTs to protect the system. The method can locate the fault while ensuring the safety of the system, and the diagnosis result is shown in Figure 7.
Figure 7a illustrates the experimental results of the S1 open-circuit fault. The S1 open-circuit fault occurs at 59.21 ms; the fault diagnosis system detects and locates the fault at 79.92 ms, with the fault label as listed in Table 1. After the IGBT open-circuit fault is detected, the converter shuts down at 82.01 ms to protect the device from secondary failures.
 Figure 7b illustrates the experimental results of the simultaneous S1 and S2 open-circuit fault. The fault occurs at 50.82 ms. The fault diagnosis system detects and locates the fault at 71.15 ms, with the fault label as listed in Table 1. After the IGBT open-circuit fault is detected, the converter shuts down at 73.09 ms to protect the device from secondary failures.
 The above experimental data show that, using the proposed method, an IGBT open-circuit fault can be identified in around one current cycle (20 ms). The online calculation time is small, around 0.46 ms. Once an IGBT open-circuit fault occurs, the proposed method can turn off the power supply within about 23 ms, so as to avoid more serious secondary accidents.
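The latency figures can be checked against the reported timestamps. The snippet below only repeats the arithmetic on the values given for Figure 7; the dictionary layout and names are illustrative.

```python
# Timestamps in milliseconds, taken from the two experiments in Figure 7.
events_ms = {
    "S1":      {"fault": 59.21, "detected": 79.92, "shutdown": 82.01},
    "S1 + S2": {"fault": 50.82, "detected": 71.15, "shutdown": 73.09},
}

latency = {}
for name, t in events_ms.items():
    detect = round(t["detected"] - t["fault"], 2)   # fault onset to detection
    total = round(t["shutdown"] - t["fault"], 2)    # fault onset to shutdown
    latency[name] = (detect, total)
    print(f"{name}: detected after {detect} ms, shut down after {total} ms")
```

Detection takes 20.71 ms and 20.33 ms in the two experiments, about one 20 ms current cycle, and shutdown follows at 22.80 ms and 22.27 ms after fault onset, both within the stated 23 ms.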
In addition, reliable open-circuit fault diagnosis should ensure that no false trigger occurs while the converter is being regulated during a transient. Therefore, the transient sensitivity of the proposed algorithm is analyzed in this paper. The analysis result is shown in Figure 8. The output current of the converter is regulated from 44 A to 36 A at 50.92 ms, while the fault label remains 1 throughout (fault label = 1 indicates that the converter is operating normally, as shown in Table 1). The above results show that the fault diagnosis of the proposed algorithm is not affected by converter regulation transients.
The above results show that the model trained by the proposed method under ideal data can be applied to real systems with disturbance factors very well. Therefore, this method has good robustness and generalization ability.
  6. Conclusions
In this paper, a robust accuracy weighted random forests fault diagnosis method for three-phase PWM converters is proposed. The proposed method takes the three-phase output current as the input signal, requires no additional sensors, and uses normalization to preprocess the data. Based on the accuracy of each tree on the perturbed out-of-bag data and on the multi-source test data, an accuracy weighted random forests algorithm is proposed for extracting the mapping relationship between fault modes and current signals. In order to further improve the fault diagnosis performance, an optimization model is constructed to tune the hyper-parameters.
Compared with BN, SVM, RFs and ensemble ELM, the proposed algorithm performs better in terms of training time, computational burden, diagnostic accuracy and robustness. Comparison simulations and an online fault diagnosis experiment were carried out. The simulation and experimental results show that the method can accurately and rapidly locate IGBT open-circuit faults while ensuring the safety of the system. In addition, the method is not limited to typical three-phase PWM converters; it can be applied to other converters by building a database and retraining the model in the same way.
The method has two limitations. First, to adapt fully to different application scenarios, the disturbance distribution of the tested object needs to be studied. Second, the training efficiency of the model needs to be improved.