A Robust Accuracy Weighted Random Forests Algorithm for IGBTs Fault Diagnosis in PWM Converters without Additional Sensors

When an insulated-gate bipolar transistor (IGBT) open-circuit fault occurs, a three-phase pulse-width modulated (PWM) converter can usually keep working, which will lead to system instability and more serious secondary faults. The fault detection and diagnosis of the converter is extremely necessary to improve the reliability of the power supply system. In order to solve the problem of fault misdiagnosis caused by parameters disturbance, this paper proposes a robust accuracy weighted random forests online fault diagnosis model to accurately locate various IGBTs open-circuit faults. Firstly, the fault signal features are preprocessed by using the three-phase current signal and normalization method. Based on the test accuracy of the perturbed out-of-bag data and the multiple converters test data, a robust accuracy weighted random forests algorithm is proposed for extracting a mapping relationship between fault modes and current signal. In order to further improve the fault diagnosis performance, a parameter optimization model is built to optimize hyper-parameters of the proposed method. Finally, comparative simulation and online fault diagnosis experiments are carried out, and the results demonstrate the effectiveness and superiority of the method.


Introduction
In recent years, due to the increase in nonlinear loads connected to the utility grid, the problems of low power quality and harmonic distortion in power systems have attracted widespread attention [1]. A large amount of harmonic current causes power grid voltage distortion and affects the normal operation of electrical equipment. It may also cause parallel or series resonance of the power grid [2]. The three-phase PWM converter, as the interface for energy conversion, has become an important device for mitigating this problem due to its excellent performance and potential advantages, such as sinusoidal input current and an adjustable, high input power factor. It is widely used in photovoltaic power generation, wind power generation, traction drive systems, and other applications that place strict requirements on equipment reliability.
However, the reliability of converters used in high-reliability applications is not high, and semiconductor power devices are the weakest components in converters. According to a semiconductor device survey report covering more than 200 products from 80 companies, converter failures caused by semiconductor devices account for about 60% of all failure data [3]. In order to obtain high power quality, the system needs to work at a very high switching frequency, generally above 10 kHz. Working at high frequency and high temperature for a long time further increases the damage probability of the IGBT. Since IGBT faults have always been the main cause of three-phase PWM converter faults, the diagnosis and location of IGBT faults is particularly important [4].
... and disturbances. Integrating XGBoost in RFs, a data-driven wind turbine fault detection method is proposed in [28], which prevents over-fitting when dealing with multidimensional data. Reference [30] proposed an RFs-regression-based implementation of SVPWM for a two-level inverter, which can improve the performance of the three-phase induction motor (TIM) drive. Reference [31] proposed a DAWRF algorithm for IoT fault detection based on edge computing and blockchain, which improved the effective application of the traditional RFs algorithm in IoT fault detection.
In practical applications, most data-driven methods pay great attention to the robustness and generalization ability of fault diagnosis. Reliable diagnosis needs to adapt to as many operating conditions as possible, which is not fully addressed in most of the literature. In addition, data-driven algorithms often suffer from accuracy bottlenecks and heavy training burdens. To address these problems, this paper proposes a data-driven IGBT open-circuit fault diagnosis method. Its salient features and advantages are as follows:
(1) In order to extract the mapping relationship between fault features and failure modes, a robust accuracy weighted random forests algorithm is proposed, which uses out-of-bag datasets and object datasets with random disturbances to evaluate the performance of the fault diagnosis model. In accordance with the evaluation results, the weights of the decision trees are adjusted to improve the anti-noise ability of RFs.
(2) Several hyper-parameters of the trained diagnostic model are optimized by an optimization programming framework (OPF), whose aim is to improve the accuracy of the proposed fault diagnosis model. Through this optimization, a number of hyper-parameter sets can be provided to satisfy the limitations of the test object and the difficulty of dataset collection.
(3) The method is non-invasive: it only needs the three-phase current signal as input, without additional sensors. Compared with most fault diagnosis methods, the proposed method has higher fault diagnosis accuracy and a lower computational burden.
The method has been validated in simulation and in-circuit testing. Compared with the existing fault identification methods, this method has the advantages of high fault diagnosis accuracy and better robustness.
The rest of this paper is organized as follows: Section 2 introduces the inverter system and fault types in this study. Section 3 briefly introduces the general framework of the method. The theoretical basis and detailed description of the improved robust accuracy weighted random forests algorithm are presented in Section 4. Section 5 details the simulation and experimental verification and discussion. Finally, Section 6 presents the overall conclusion of this paper.

System Description and Fault Identification
The object of this paper is a three-phase PWM converter widely used in electric drive systems. Its circuit topology is a typical three-phase full-bridge structure, as shown in Figure 1, consisting of six IGBTs (T1-T6) and the corresponding anti-parallel diodes (D1-D6). Ia, Ib, and Ic are the output currents of the converter, which are regulated by the control system with a 100 kHz sampling rate and a 20 kHz control sampling rate. Each IGBT is controlled by its corresponding drive signal.
In this paper, the proposed open-circuit fault diagnosis system is as shown in Figure 1, and is composed of a data-caching field programmable gate array (FPGA) and an industrial personal computer (IPC). The FPGA resamples the high-speed three-phase current signals at a sampling rate of 10 kHz and caches one cycle of the current signals. The IPC executes the fault diagnosis algorithm on the current signal of one cycle to obtain the fault diagnosis result. Once a fault is detected, the IPC issues an instruction via direct memory access (DMA) to shut down the control system. In this way, fast IGBT open-circuit fault protection is realized. The main advantage of this system is that downsampling reduces the computational burden, so that one fault diagnosis system can protect multiple power sources simultaneously.
However, in the actual operation of the fault diagnosis system, the three-phase current signal is affected by many disturbance factors, which may lead to misdiagnosis and power shutdown. Therefore, the reliability and anti-interference ability of fault diagnosis are very important. Usually, the IGBT open-circuit fault mainly includes the following two situations:
(1) When the chip Sn fails, the body diode Dn also fails at the same time.
(2) When the chip Sn fails, the body diode Dn works normally.
Different IGBT open-circuit fault scenarios generate different output current signals. In total, there are six single-IGBT fault types and 15 dual-IGBT fault types. Including the normal operating condition, there are a total of 22 fault labels. All labels are listed in Table 1. Based on these fault labels, this paper proposes a data-driven fault diagnosis strategy. The input of the method is one period of the three-phase output current signal, and the output is a fault label representing the type and location of the fault.
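The count of 22 labels follows from simple combinatorics: one normal case, six single-switch cases, and C(6, 2) = 15 two-switch cases. The sketch below enumerates them. The switch names and the integer label order are illustrative assumptions; the actual label assignment is the one given in Table 1, which is not reproduced here.

```python
from itertools import combinations

# Switch names are placeholders for the six IGBTs of the full bridge.
SWITCHES = ["T1", "T2", "T3", "T4", "T5", "T6"]


def enumerate_fault_cases(switches=SWITCHES):
    """Return the normal case plus every single and dual open-circuit case."""
    cases = [()]                               # empty tuple = normal operation
    cases += [(s,) for s in switches]          # 6 single-IGBT faults
    cases += list(combinations(switches, 2))   # C(6, 2) = 15 dual-IGBT faults
    return cases


fault_cases = enumerate_fault_cases()

# Hypothetical integer labels (0..21); Table 1 may order them differently.
label_of = {case: idx for idx, case in enumerate(fault_cases)}

print(len(fault_cases))  # 22
```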

Generic Method Framework
This paper proposed a data-driven IGBT open-circuit fault diagnosis method. The three-phase current sampling of the inverter is used as the diagnostic input, and finally the fault label is the diagnostic output to realize the fault diagnosis and location. The framework of the whole method is shown in Figure 2, which is divided into two stages: offline model training and online fault diagnosis.

In the offline development process, the random forests fault diagnosis model is trained using a historical fault database composed of ideal model samples. In order to improve the performance of fault diagnosis, standardization is used to perform feature processing on the sampled three-phase current data. Random forest is an ensemble learning algorithm composed of multiple decision trees. It has strong anti-disturbance ability and robustness, and is one of the most widely used fault classifiers. In this paper, an improved random forests algorithm is used to train the fault diagnosis model and obtain the mapping relationship between input features and fault labels. Unlike the traditional random forests method, each decision tree in the improved random forests casts a vote weighted by its fault diagnosis accuracy. Taken together, the weighted trees provide more accurate and robust fault diagnosis results.

In online application, the measured object is a non-ideal system affected by various factors. With the method proposed in this paper, the trained fault diagnosis model can be applied directly to the non-ideal power system with disturbance factors: the measured three-phase current is fed into the fault diagnosis model to obtain the fault label. No additional model training is required, and excessive dependence on fault data from the tested object is avoided.
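The standardization step used for feature processing can be sketched as follows. This is a minimal illustration that assumes per-feature z-scoring with statistics fitted on the offline training data and reused online; the paper does not specify the exact scheme, and the function names and toy data are hypothetical.

```python
import math


def fit_standardizer(rows):
    """Compute per-feature mean and standard deviation from training rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid dividing by zero
    return means, stds


def standardize(row, means, stds):
    """Apply the training statistics to one feature vector (offline or online)."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


# Toy training matrix: 3 samples, 2 features (the second one is constant).
train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
means, stds = fit_standardizer(train)
z = standardize([5.0, 11.0], means, stds)
```

Fitting the statistics offline and reusing them online matches the two-stage framework: the online stage only applies a fixed transform before calling the trained model.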

Random Forests Theory
In essence, fault diagnosis and localization for a power source is a classification problem: the operating state of the power source is classified from the input characteristic signals. The random forests algorithm is a supervised ensemble learning algorithm that integrates many decision trees for prediction and classification. The core idea is that weak classifiers formed by multiple decision trees are integrated into a strong classifier, and the weak classifiers give the classification result through certain rules. The decision tree algorithm uses labeled samples to construct a fault diagnosis classifier and obtains the mapping relationship between input features and fault labels, which enables the classifier to classify unlabeled samples. The random forests classifier trains each decision tree with random samples and uses different training sets to increase the difference between the decision tree models, thereby improving the generalization ability and robustness of the classifier. This characteristic helps to reduce the misdiagnosis caused by various disturbances in IGBT open-circuit fault diagnosis.

Decision Tree
A decision tree is built by binary recursive partitioning, which splits the current sample set into two subsets at each node (except leaf nodes). The attribute selection measure adopted by the algorithm is the Gini index. Assuming that the dataset D contains m categories, the Gini index G_D is calculated as:
$$ G_D = 1 - \sum_{j=1}^{m} P_j^2 \tag{1} $$
where P_j is the frequency of the j-th type of fault in the training database. The Gini index needs to consider the binary division of each feature. Assuming that a binary division on feature A divides dataset D into D_1 and D_2, the Gini index of sample set D divided by feature A at this node is:
$$ G_{D,A} = \frac{|D_1|}{|D|} G_{D_1} + \frac{|D_2|}{|D|} G_{D_2} \tag{2} $$
For each attribute, every possible binary partition is considered, and the partition that yields the smallest Gini index for that attribute is selected as its split. Therefore, the smaller the Gini index G_{D,A} on attribute A, the better the division on attribute A. Under this rule, the division continues from top to bottom until the entire decision tree is grown.
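The split rule can be illustrated with a short sketch. It uses the standard Gini impurity (one minus the sum of squared class frequencies) and the size-weighted impurity of a binary split, and scans candidate thresholds on a single numeric feature. The function names and toy data are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini impurity of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Scan midpoints between distinct sorted values; return (threshold, gini)."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        score = gini_split(left, right)
        if score < best[1]:
            best = (thr, score)
    return best


# Toy feature with two fault classes that separate cleanly at 2.5.
print(best_threshold([1, 2, 3, 4], [0, 0, 1, 1]))  # (2.5, 0.0)
```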

RFs Algorithm
Definition 1. The random forests model f is a set of decision trees {h(X, θ_k), k = 1, 2, ..., N_tree}, where each classifier h(X, θ_k) is an unpruned decision tree constructed with a decision tree algorithm. The θ_k are independent and identically distributed random vectors, with θ_k representing the growth process of the kth decision tree; the final classification value of the random forests is obtained by majority voting.

Definition 2. Let the input vector X belong to one of at most J different categories, and let Y be the correct classification category. For the input vector X and output Y, the edge (margin) function is defined as:
$$ mg(X, Y) = a_k\, I\big(h_k(X) = Y\big) - \max_{j \neq Y} a_k\, I\big(h_k(X) = j\big) \tag{3} $$
where j is one of the J categories, I(·) is the indicator function, a_k denotes the average over the trees k = 1, 2, ..., n. The larger the edge function, the higher the confidence of the correct classification. The generalization error of random forests is thus defined as:
$$ E^* = P_{X,Y}\big(mg(X, Y) < 0\big) \tag{4} $$
where P_{X,Y} is the probability taken over the input vector X and its category Y. When the number of decision trees in the forest is large, the following theorem is obtained by using the law of large numbers.
Theorem 1. When the number of decision trees is large enough, for all sequences θ_k, E* converges almost everywhere:
$$ E^* \to P_{X,Y}\Big(P_\theta\big(h(X, \theta) = Y\big) - \max_{j \neq Y} P_\theta\big(h(X, \theta) = j\big) < 0\Big) \tag{5} $$
where P_θ(c) is the probability of satisfying the condition c for a given sequence θ. This theorem shows that random forests do not overfit as the number of trees increases; instead, the generalization error converges to a limiting value.

Theorem 2. The upper bound of the random forests generalization error is:
$$ E^* \le \frac{\rho\,(1 - s^2)}{s^2} \tag{6} $$
where ρ and s are the average correlation coefficient and the average strength of the trees, respectively. It can be seen from Theorem 2 that as the correlation between decision trees decreases and the strength of the individual decision trees increases, the upper bound of the generalization error of random forests decreases, and the generalization error is effectively controlled.
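To make the trade-off concrete, the sketch below evaluates the standard random forests bound ρ(1 − s²)/s² for a few values of the average correlation ρ and average strength s. The numbers are purely illustrative and are not taken from the paper's experiments.

```python
def generalization_error_bound(rho, s):
    """Upper bound rho * (1 - s^2) / s^2 on the random forests generalization error."""
    if not 0.0 < s <= 1.0:
        raise ValueError("strength s must lie in (0, 1]")
    return rho * (1.0 - s ** 2) / s ** 2


baseline = generalization_error_bound(rho=0.3, s=0.8)         # about 0.169
less_correlated = generalization_error_bound(rho=0.1, s=0.8)  # lower rho, same s
stronger_trees = generalization_error_bound(rho=0.3, s=0.9)   # same rho, higher s

print(f"{baseline:.4f} {less_correlated:.4f} {stronger_trees:.4f}")
```

Both modifications tighten the bound, which is why reducing tree correlation and strengthening individual trees are the two levers for improving the ensemble.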
Guaranteed by the law of large numbers, random forests achieve high classification accuracy without overfitting, which makes them well suited to power source fault diagnosis. From the above analysis, there are two main ways to improve the diagnostic accuracy of the random forests fault diagnosis model, namely, reducing the correlation ρ between decision trees and improving the fault diagnosis performance s of each single decision tree. In addition, an important feature of random forests is out-of-bag estimation. When a training subset is generated by bagging, for each decision tree, nearly 37% of the samples in the original sample set S do not appear in that tree's training subset. These samples are called out-of-bag (OOB) samples. OOB samples can be used to estimate the generalization error of RFs and also to calculate the importance of each feature. The above theorems are the key to reducing fault misdiagnosis in this paper.
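The figure of roughly 37% comes from bootstrap sampling: when n samples are drawn with replacement from a set of size n, a given sample is missed with probability (1 − 1/n)^n, which tends to e^(−1) ≈ 0.368 as n grows. A small check, with an assumed sample size and a fixed random seed, is sketched below.

```python
import math
import random


def oob_fraction_exact(n):
    """Probability that a given sample is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n


def oob_fraction_simulated(n, seed=0):
    """Draw one bootstrap sample of size n and measure the share left out."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n


print(round(math.exp(-1), 4))        # 0.3679, the large-n limit
print(oob_fraction_exact(1000))      # close to the limit already
print(oob_fraction_simulated(10_000))
```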

Improved Random Forest Algorithm
The RFs algorithm uses bootstrap sampling, which is a simple random sampling method with replacement. When each training subset is extracted, about a third of the samples are not selected, and these data are called out-of-bag data. These data are valuable and can be used as an alternative to cross-validation. Based on the traditional RFs algorithm, this study designs a weighted random forests algorithm in which the accuracy of each tree is predicted with perturbed OOB datasets and a small number of object datasets. In the training phase of the algorithm, the test dataset and the out-of-bag dataset with additional disturbance factors are used to predict the fault diagnosis accuracy of each decision tree on the tested object. The out-of-bag weight and the test weight are calculated from these accuracies. In the decision-making stage, the voting weights of the decision trees are adjusted according to the out-of-bag weight and the test weight. The detailed implementation process is described below.
First, the test data, P, and the training dataset, T, are collected at the test sample rate X_TP (the ratio of test data to training data). The training dataset, T, comes from an ideal system simulation, and the relatively small test dataset, P, comes from the test object with uncertain perturbation factors. In the training phase, bootstrap sampling is used to separate the training data subset S and the out-of-bag data O from the training dataset, T, at the out-of-bag data rate X_OS (the ratio of the OOB data to the training subset data), and random Gaussian white noise is added to the OOB data O. The fault diagnosis accuracy on the disturbed out-of-bag dataset and on the test dataset reflects the fault classification ability of the decision tree. Decision trees with higher classification accuracy are given heavier weights.
After the kth decision tree has been trained with the training subset S_k, the out-of-bag dataset O_k, disturbed by Gaussian white noise of variance M, is used to predict the fault diagnosis accuracy of the kth decision tree. For the kth decision tree, the weight for the OOB data is:
$$ w_{O_k} = \frac{X^{corr}_{O_k}}{X_O}, \quad k = 1, 2, \ldots, K \tag{7} $$
where X_O is the total number of samples and X^{corr}_{O_k} is the number of samples correctly classified by the kth decision tree. The test dataset P is then used to predict the fault diagnosis accuracy of the kth decision tree. The ratio of the number of correctly classified test samples to the number of predicted test samples is the weight of the predicted test:
$$ w_{P_k} = \frac{X^{corr}_{P_k}}{X_P}, \quad k = 1, 2, \ldots, K \tag{8} $$
In the decision-making stage, the OOB data weights are combined with the predicted test weights to determine the final weights. When voting for fault diagnosis, the vote of each decision tree is multiplied by its corresponding weight w_k.
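The weighting and voting steps can be sketched as follows. Each tree's OOB and test weights are plain accuracies (correct predictions divided by the number of samples), and the forest returns the class with the largest sum of tree weights. How the two weights are merged into the final weight is given by the paper's Equation (9), which is not reproduced in this text, so `combine_weights` below uses a simple average purely as a placeholder assumption. All names and numbers are hypothetical.

```python
from collections import defaultdict


def tree_accuracy(predictions, truths):
    """Fraction of samples a tree classifies correctly (the form of w_O and w_P)."""
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)


def combine_weights(w_oob, w_test):
    # ASSUMPTION: a simple average stands in for the paper's Equation (9).
    return 0.5 * (w_oob + w_test)


def weighted_vote(tree_predictions, weights):
    """Return the label whose supporting trees have the largest total weight."""
    scores = defaultdict(float)
    for label, w in zip(tree_predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)


# Three hypothetical trees: one accurate tree predicts label 3,
# two weak trees predict label 7.
weights = [
    combine_weights(0.95, 0.90),
    combine_weights(0.50, 0.40),
    combine_weights(0.45, 0.35),
]
print(weighted_vote([3, 7, 7], weights))  # 3
```

In this toy case an unweighted majority vote would return label 7, while the accuracy-weighted vote follows the single reliable tree, which is the behavior the weighting is meant to produce under disturbance.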
The development of the proposed Algorithm 1 is as follows:

Algorithm 1: Improved Random Forest Algorithm
Begin:
(1) Determine the out-of-bag data rate, X_OS, the test data rate, X_TP, the number of decision trees, K, and the Gaussian white noise variance, M;
(2) According to X_TP, determine the training data, T, and the test data, P;
for k = 1 to K do
  (3) Using bootstrap sampling and according to X_OS, divide the training set, T, into the training subset, S_k, and the out-of-bag data, O_k;
  (4) According to the C4.5 algorithm, randomly select N features as node classification features, and use S_k to generate a decision tree;
  (5) Add random Gaussian white noise of variance M to O_k;
  (6) Take O_k and P as the test sets;
  (7) According to Equations (7) and (8), calculate the weights of the kth decision tree, w_{O_k} and w_{P_k};
  (8) Calculate the final weight, w_k, of the kth decision tree by Equation (9);
end for
(9) Classify the test data with the set of decision trees, and determine the final classification result by Equation (11);
END
Finally, the type with the most votes is the fault diagnosis result of random forests. The final weights of the RFs model, the fault diagnosis result of each decision tree, and the fault diagnosis result of random forests are given by Equations (9)-(11), where the fault diagnosis result of the kth decision tree is:
$$ h_k^{DT}(x) = y_k \tag{10} $$
In order to obtain a better fault diagnosis effect, an optimization model is constructed to optimize the hyperparameters X_TP, X_OS, K, and M. The optimization objective function is defined in terms of two quantities: X_ORF, the average number of samples correctly classified by the random forests model in the out-of-bag test set, and X_PRF, the number of samples correctly predicted by the random forests in the test dataset P.
Boundary condition: the constraints involve max(THD_I), the maximum total harmonic distortion ratio of the current of the measured object. In recent years, the particle swarm optimization algorithm has been widely used in various optimization problems because of its simplicity, easy implementation, and fast convergence. The optimization objective is solved by the particle swarm optimization algorithm. For details, please refer to [29].

Simulation and Experimental
In order to verify the effectiveness of the proposed data-driven method in practical applications, simulation tests and experiments are carried out. The performance of the ANN, SVM, RFs, weighted RFs, and PSO weighted RFs fault diagnosis models was evaluated in the simulation phase. The evaluation indicators commonly used for data-driven fault diagnosis, namely accuracy, precision, recall, and model training time, are used to measure the performance of the proposed method. The fault diagnosis accuracy, precision, and recall are defined as follows:
$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15} $$
$$ \text{Precision} = \frac{TP}{TP + FP} \tag{16} $$
$$ \text{Recall} = \frac{TP}{TP + FN} \tag{17} $$
where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively.
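These indicators follow the usual confusion-matrix definitions and can be computed as in the sketch below. The counts are invented for illustration. For the 22-class problem they would typically be computed per class in a one-vs-rest fashion and then averaged, although the paper does not state the averaging scheme.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all decisions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of reported faults that are real (penalizes false alarms)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of real faults that are reported (penalizes missed faults)."""
    return tp / (tp + fn)


# Hypothetical counts for one fault class.
tp, tn, fp, fn = 90, 5, 3, 2
print(f"accuracy={accuracy(tp, tn, fp, fn):.4f}")
print(f"precision={precision(tp, fp):.4f}")
print(f"recall={recall(tp, fn):.4f}")
```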

Database Generation
In order to generate fault diagnosis models, a comprehensive and informative database is required. Such databases can be obtained from historical measurements or simulations. Because the measured object is subject to a variety of random disturbance factors, caused by sensor disturbance, converter parameters, and so on, the fault data are not easy to obtain precisely from the object system. Without loss of generality, the data are simulated in this paper, and the DC voltage, output voltage, and output current settings are considered. The fault characteristics differ under different operating conditions. Therefore, in order to include more fault information in the fault database, various converter operating conditions are simulated. The data acquisition process is shown in Table 2, and the three-phase PWM converter simulation parameters are shown in Table 3. The simulation yields a total of 88,000 sets of data as the model training database, DT. Due to the risk of damage from open-circuit failures in a real system, only a small test database, DP (2200 sets), is obtained from two real three-phase converter systems with different model parameters, as shown in Table 4. Of these, the parameters of real system 1 are consistent with those of the simulation, as shown in Table 3.
The training dataset, T, is extracted from DT according to coefficient X_OS, and the test dataset, P, is extracted from DP according to coefficient X_OS. The out-of-bag data in DP are used to calculate the evaluation indices in Equations (15)-(17).

Simulation and Comparison Results
In order to verify and analyze the performance of the fault diagnosis model, the dataset T is used to train the proposed fault diagnosis model. In addition, the Bayesian network (BN), support vector machine (SVM), RFs, and ensemble extreme learning machine (ELM) fault diagnosis models are compared with the proposed model using the same training dataset, T, and validation dataset, Vd, which is the part of the database DP that excludes dataset P. All algorithms adopt the same general fault diagnosis framework, as shown in Figures 1 and 2. In order to search for the optimal fault diagnosis performance, the hyperparameters of all algorithms are optimized, including the architecture, learning rate, kernel function, and number of decision trees. The optimization results and some decision tree weights of the proposed method are shown in Table 5.
In order to verify the training performance of the fault diagnosis algorithms, 10-fold cross-validation was conducted on the above models using the 88,000 sets of training data. The results are shown in Table 6. After hyperparameter optimization, all the fault diagnosis algorithms achieve a good fault diagnosis performance index (above 99.5%) on the training datasets. Moreover, the ensemble fault diagnosis algorithms, including RFs, ensemble ELM, and the proposed method, achieve an excellent performance index of nearly 100% on the OOB data of the training datasets.
In order to verify the generalization performance of the fault diagnosis model, performance verification was carried out on real system 1 and real system 2. Finally, 880 sets of validation data, Vd, were selected from DP to calculate the performance index. The comparison results are shown in Table 7. The results show that the proposed algorithm has the highest accuracy (96.25%) and recall (97.73%).
This shows that the proposed method can detect all fault types well and avoid missed fault detections. However, the ensemble ELM has the highest precision (96.15%), which reflects that this algorithm performs best in avoiding false positives. The offline test time of the proposed method is about 0.2837 s, which indicates that the proposed method requires less computation time while achieving the same diagnosis performance. Because the training data come mainly from an ideal simulated system, the performance of the BN, SVM, and RFs fault diagnosis models is poor when they are applied to the real system with disturbance factors. In contrast, the proposed algorithm improves the adaptability of the diagnostic model by modifying the weights of the model using disturbed data. Therefore, the above results show that the proposed method has good generalization performance.
Besides, in order to simulate random disturbance, Gaussian white noise with different variances, M, is added to the validation dataset, Vd, to verify the robustness of the proposed method, as shown in Figure 3. As M increases, the fault diagnosis accuracy of the other methods decreases. In particular, when M = 6%, the fault diagnosis accuracy of these methods falls below 75.12% due to signal disturbance. By comparison, the fault diagnosis accuracy of the proposed method remains above 92.16% when M < 6%. This shows that the proposed method has excellent robustness.
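The robustness test amounts to corrupting each validation waveform with Gaussian white noise before classification. The sketch below shows one way to do this. It assumes that a noise level such as 6% means a noise standard deviation equal to 6% of the signal's RMS value; the text reports M as a percentage without stating the normalization, so this scaling, the 44 A amplitude, and the function name are all assumptions.

```python
import math
import random


def add_gaussian_noise(signal, level, seed=None):
    """Add zero-mean Gaussian noise whose std is `level` times the signal RMS."""
    rng = random.Random(seed)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    sigma = level * rms  # ASSUMPTION: noise level is defined relative to RMS
    return [x + rng.gauss(0.0, sigma) for x in signal]


# One 20 ms cycle sampled at 10 kHz (200 points); amplitude is illustrative.
cycle = [44.0 * math.sin(2 * math.pi * 50 * n / 10_000) for n in range(200)]
noisy = add_gaussian_noise(cycle, level=0.06, seed=1)

# Largest deviation introduced by the noise, in amperes.
print(max(abs(a - b) for a, b in zip(noisy, cycle)))
```

Sweeping `level` over a range of values and re-evaluating each classifier on the corrupted data would produce an accuracy-versus-noise curve of the kind reported in Figure 3.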

Experimental Verification
To verify the feasibility of the method, an online fault diagnosis experiment is carried out on the three-phase PWM inverter. The online fault diagnosis system is shown in Figure 4. It consists of an IPC and a data caching system realized by an FPGA, in which the underlying controller is used for open-circuit fault monitoring; the IPC can be replaced by an ARM chip, and the proposed RFs can run on the ARM chip. The sampling clock of the closed-loop control system is 100 kHz, and the sampling frequency is 20 kHz. At the same time, the current signal is resampled with a clock of 10 kHz and a resampling frequency of 10 kHz. Therefore, only 200 points are sent to the IPC per cycle (20 ms), which reduces the computational burden. Once an open-circuit fault is detected, the online fault diagnosis system sends a protection signal to the controller to turn off all IGBT control signals.
The fault characteristics of the output current differ when the converter runs under different working conditions, which mainly include DC voltage, output voltage, output voltage frequency, and output current. For example, the current fault characteristics shown in Figure 5 (S1 open-circuit fault) and Figure 6 (S1 and S2 open-circuit fault) differ when the output current is 44 A (Figure 6a) and 8.8 A (Figure 6b). Hence, it is difficult to achieve high-precision fault diagnosis by setting thresholds or by traditional data-driven fault diagnosis methods. The experimental waveforms also show that faults are often accompanied by overstress. Without a high-performance fault diagnosis algorithm that quickly detects the fault and shuts down the system when an IGBT open-circuit fault occurs, the overstress will lead to a more destructive secondary failure.
This paper illustrates the effectiveness of the method by taking the S1 open-circuit fault and the simultaneous S1 and S3 open-circuit fault as examples. Once an open-circuit fault is detected, the controller immediately turns off all IGBTs to protect the system. The method can thus locate the fault while ensuring the safety of the system, and the diagnosis result is shown in Figure 7. In the first case, the fault label is as shown in Table 1; after the IGBT open-circuit fault was detected, the converter shut down at 82.01 ms to protect the device from secondary failures. Figure 7b illustrates the experimental results for the simultaneous S1 and S2 open-circuit fault, which occurs at 50.82 ms. The fault diagnosis system detected and located the fault at 71.15 ms, with the fault label as shown in Table 1. After detecting the IGBT open-circuit fault, the converter shut down at 73.09 ms to protect the device from secondary failures.
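The online monitoring flow described in this experiment (buffer one cycle of resampled three-phase current, classify it, and issue a protection signal on any fault label) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiment: the class `OnlineDiagnoser`, the injected `classify` callable, and the peak-scaling normalization are all hypothetical, since the exact normalization and the interface to the trained RFs model are not specified here. Only the 10 kHz resampling frequency, the 20 ms cycle, and the convention that label 1 means normal operation are taken from the text.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

RESAMPLE_HZ = 10_000                   # resampling frequency stated in the text
CYCLE_S = 0.020                        # one fundamental cycle (20 ms)
WINDOW = round(RESAMPLE_HZ * CYCLE_S)  # 200 points per cycle
NORMAL_LABEL = 1                       # label 1 = normal operation (Table 1)

Sample = Tuple[float, float, float]    # (ia, ib, ic) at one sampling instant


def normalize(window: List[Sample]) -> List[Sample]:
    """Scale a window by its peak absolute current (assumed normalization)."""
    peak = max(abs(x) for s in window for x in s)
    if peak == 0.0:
        return list(window)
    return [(a / peak, b / peak, c / peak) for a, b, c in window]


class OnlineDiagnoser:
    """Hypothetical wrapper around a trained classifier (not the paper's code)."""

    def __init__(self, classify: Callable[[List[Sample]], int]) -> None:
        self.classify = classify
        self.buffer: Deque[Sample] = deque(maxlen=WINDOW)
        self.tripped = False

    def push(self, sample: Sample) -> Optional[int]:
        """Add one resampled point; return a label once a full cycle is buffered."""
        self.buffer.append(sample)
        if len(self.buffer) < WINDOW:
            return None
        label = self.classify(normalize(list(self.buffer)))
        if label != NORMAL_LABEL:
            self.tripped = True  # stands in for the protection signal
        return label
```

Because the buffer is a sliding window, a label is produced for every new sample once the first full cycle has arrived, which is consistent with a fault being located roughly one cycle after it occurs.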

The above experimental data show that, using the proposed method, an IGBT open-circuit fault can be identified in about one current cycle (20 ms). The online calculation time is small, around 0.46 ms. Once an IGBT open-circuit failure occurs, the proposed method can turn off the power supply within about 23 ms, so as to avoid more serious secondary accidents.
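The latency figures quoted above can be checked directly against the timestamps reported for the S1 and S2 open-circuit case (fault at 50.82 ms, located at 71.15 ms, shutdown at 73.09 ms). The short sketch below only performs that arithmetic; the function name `latencies` and the dictionary keys are illustrative choices, not terms from the paper.

```python
CYCLE_MS = 20.0  # one current cycle, as stated in the text


def latencies(fault_ms: float, detect_ms: float, shutdown_ms: float) -> dict:
    """Return the delays (in ms) between fault, diagnosis, and shutdown."""
    return {
        "fault_to_detect": detect_ms - fault_ms,
        "detect_to_shutdown": shutdown_ms - detect_ms,
        "fault_to_shutdown": shutdown_ms - fault_ms,
    }


# Timestamps reported for the S1 and S2 open-circuit fault (Figure 7b).
case_b = latencies(fault_ms=50.82, detect_ms=71.15, shutdown_ms=73.09)

for name, value in case_b.items():
    print(f"{name}: {value:.2f} ms ({value / CYCLE_MS:.2f} cycles)")
```

For this case the fault is located about 20.33 ms after it occurs, and the converter is shut down about 22.27 ms after the fault, in line with the "around one current cycle" and "within about 23 ms" statements.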
In addition, reliable open-circuit fault diagnosis should not trigger falsely while the converter is being regulated through a transient. Therefore, the transient sensitivity of the proposed algorithm is analyzed in this paper. The analysis result is shown in Figure 8. The output current of the converter is regulated from 44 A to 36 A at 50.92 ms, while the fault label remains 1 throughout (fault label = 1 indicates that the converter is operating normally, as shown in Table 1). These results show that the fault diagnosis of the proposed algorithm is independent of the converter regulation transient.
The above results show that the model trained by the proposed method on ideal data can be applied well to real systems with disturbance factors. Therefore, this method has good robustness and generalization ability.

Conclusions
In this paper, a robust accuracy weighted random forests fault diagnosis method for three-phase PWM converters is proposed. The proposed method takes the three-phase output current as the input signal and preprocesses the data by normalization, without additional sensors. Based on the test accuracy of the model on the perturbed out-of-bag data and on the multi-source test data, an accuracy weighted random forests algorithm is proposed for extracting the mapping relationship between fault modes and current signals. In order to further improve the fault diagnosis performance, a parameter optimization model is constructed to optimize the hyper-parameters of the proposed method.
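The idea of accuracy weighting summarized above can be illustrated with a small sketch of weighted voting over the trees of a forest. It is an assumption-laden illustration rather than the algorithm of this paper: the helper names `tree_weight` and `weighted_vote` are hypothetical, and combining the two accuracies by a simple average is an assumed choice, since the exact weighting formula is not restated in this section.

```python
from collections import defaultdict
from typing import Dict, Sequence


def tree_weight(acc_perturbed_oob: float, acc_multi_source: float) -> float:
    """Assumed weight: mean of a tree's accuracy on perturbed out-of-bag data
    and on multi-source test data (the paper's exact formula may differ)."""
    return 0.5 * (acc_perturbed_oob + acc_multi_source)


def weighted_vote(tree_labels: Sequence[int], tree_weights: Sequence[float]) -> int:
    """Return the fault label with the largest total weight across trees."""
    if len(tree_labels) != len(tree_weights):
        raise ValueError("one weight is required per tree")
    scores: Dict[int, float] = defaultdict(float)
    for label, weight in zip(tree_labels, tree_weights):
        scores[label] += weight
    # Ties are broken in favour of the smallest label (an arbitrary choice).
    return max(scores, key=lambda lab: (scores[lab], -lab))


# Two weak trees vote for label 2, one robust tree votes for label 1.
weights = [tree_weight(0.3, 0.3), tree_weight(0.3, 0.3), tree_weight(0.9, 0.9)]
decision = weighted_vote([2, 2, 1], weights)
```

In this toy case a plain majority vote would return label 2, whereas the weighted vote returns label 1 because the single tree that stays accurate under perturbation outweighs the two that do not.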
Compared with BN, SVM, RFs, and ensemble ELM, the proposed RFs algorithm has better performance in terms of training time, computational burden, diagnostic accuracy, and robustness. Finally, comparative simulation and online fault diagnosis experiments are carried out. The simulation and experimental results show that the method can accurately and rapidly locate IGBT open-circuit faults while ensuring the safety of the system. In addition, this method is not limited to typical three-phase PWM converters; it can be applied to other converters by establishing a database and retraining the model in the same way.
One limitation of this method is that, in order to adapt well to different application scenarios, the disturbance distribution type of the tested object must be studied. In addition, the training efficiency of the model needs to be improved.