Remaining Useful Life Estimation of Insulated Gate Bipolar Transistors (IGBTs) Based on a Novel Volterra k-Nearest Neighbor Optimally Pruned Extreme Learning Machine (VKOPP) Model Using Degradation Data

The insulated gate bipolar transistor (IGBT) is a high-performance switching device used widely in power electronic systems. Estimating the remaining useful life (RUL) of an IGBT to ensure the safety and reliability of the power electronic system is currently a challenging issue in the field of IGBT reliability. The aim of this paper is to develop a prognostic technique for estimating the RUL of IGBTs that is efficient enough to support in-situ decision-making. A novel prediction model with a complete structure, based on the optimally pruned extreme learning machine (OPELM) and the Volterra series, is proposed to track the IGBT's degradation trace and estimate its RUL; we refer to it as the Volterra k-nearest neighbor OPELM prediction (VKOPP) model. The model uses the minimum entropy rate method and the Volterra series to reconstruct the phase space of the IGBT ageing samples, and a new weight update algorithm, which can effectively reduce the influence of outliers and noise, is used to establish the VKOPP network; a combination of the k-nearest neighbor (KNN) method and the least squares estimation (LSE) method is then used to calculate the output weights of the OPELM and predict the RUL of the IGBT. The prognostic results show that the proposed approach can predict the RUL of IGBT modules with small error and achieves higher prediction precision and lower time cost than some classic prediction approaches.


Introduction
As power electronic equipment has come into widespread use, the insulated gate bipolar transistor (IGBT), a fully controlled power electronic device that combines the simple gate drive of a MOSFET with the low conduction loss of a bipolar junction transistor (BJT) and offers good switching performance, has found wide application in industrial, automotive, traction, and solar inverter areas [1][2][3]. Therefore, knowing whether an IGBT is in a normal state is critical to the safe operation of the system [4]. Generally, the performance of a system may gradually decline because IGBTs contain numerous materials with different coefficients of thermal expansion (CTE) and many interfaces, which can wear out and cause overstress failures [5]. There are two typical ways to address system safety issues. One is condition monitoring and fault diagnosis for an IGBT. In [6], Choi et al. proposed a condition monitoring method for an IGBT module based on online $V_{CE\text{-}ON}$ measurement. The other is failure prediction based on the estimation of the remaining useful life (RUL) of an IGBT, which plays an essential role in power semiconductor reliability. To date, there are two approaches to predicting RUL [7]: the physics-of-failure (PoF)-based approach and the data-driven approach.
PoF-based prognostic methods rely on extensive knowledge of IGBT chemistry and failure mechanisms. Since the methods involve excessive physical parameters, the models are usually difficult to build precisely [8]. In contrast, the data-driven approaches require prognostic data that reflect the IGBT degradation behavior derived from ordinarily observed operating parameters such as voltage, current, power, and temperature [9,10] without the need of extensive knowledge on the devices.
In [11], Li et al. aged an IGBT module by a temperature cycling test to obtain on-state voltage and current as deterioration parameters; they proposed a method based on particle filter (PF) theory to predict the RUL of the IGBTs under test. However, the training process of the algorithm is time consuming, and the prediction accuracy is not high for small samples. In [12], Thakur et al. proposed a temperature prediction method for IGBT modules based on the auto-regressive moving average (ARMA) model. However, the applicable range of the ARMA model for temperature prediction is limited to linear time series. Wu et al. [13] established a back propagation (BP) artificial neural network model to predict the junction temperature of an IGBT power module by measuring the device's saturation voltage and collector current under a specified temperature. However, that work remains a preliminary study of semiconductor temperature prediction using neural networks, and the prediction accuracy may decline as the quantity of training data decreases. In [14], Mominul et al. developed a data-driven prognostic approach based on neural network (NN) and adaptive neuro-fuzzy inference system (ANFIS) models to predict the degradation of IGBT devices. The predicted RUL matches the actual RUL, and the accuracy improves as the devices complete more degradation phases of the test. However, the results are not accurate during the early phase of the test. Therefore, although these data-driven methods have useful characteristics such as simple model building and rapid calculation, their prediction accuracy is difficult to guarantee for nonlinear time series or inadequate samples, especially when the IGBT data contain too little information, or irrelevant information, for forecasting the degradation track; the result is inadequate training or over-fitting, which degrades the RUL prediction performance.
To remedy these weaknesses of IGBT prediction models, in this paper we aged the IGBT module by a temperature cycling test to acquire the collector-emitter ON voltage ($V_{CE}$), collector-emitter ON current ($I_{CE}$), and case temperature ($T$) as the indicators of the RUL, and combined the Volterra series model [15,16] with the optimally pruned extreme learning machine (OPELM) model [17][18][19][20] to improve the accuracy of IGBT RUL prediction.
In [21], the author noted that the Volterra series and the single-hidden layer feed forward neural network (SLFN) model are inherently related and similar in function, structure and solution method when the reconstructed phase space vector is used as the input vector of the SLFN model. In addition, the original extreme learning machine (ELM) algorithm [17], which trains an SLFN in very few steps and at very low computational cost, has all the properties of the SLFN. The OPELM, in turn, inherits the characteristics of the ELM and extends it with higher generalization and robustness. Thus, the Volterra series and the OPELM model are inherently equivalent. Owing to this equivalence, in this paper we propose a novel prediction model named the Volterra k-nearest neighbor adaptive OPELM prediction (VKOPP) model to trace the IGBT's degradation and estimate its RUL with the advantages of both methods.
The aim of this paper is to develop a prognostic technique using VKOPP model for estimating the RUL of power electronic components. Its structure is as follows: the next section presents the parametric investigation of the IGBT module degradation process. The mathematical description and the basic idea of the VKOPP algorithm are depicted in Section 3. Section 4 demonstrates the specific steps for IGBT RUL prediction based on the VKOPP method. Section 5 gives some illustrative examples to show the working mechanisms of the VKOPP model for some other datasets and results concerning computational speed and accuracy for various prediction methods. Section 6 presents the experimental results and analysis by using our approach to IGBT RUL prediction. Finally, the discussion and conclusion are given in Section 7.

Parametric Investigation of IGBT Module Degradation
The power cycling test is one of the available standard reliability test methods used widely to test the long-term behavior of power devices. However, its results rely on the test parameters. Hence, selection of the most appropriate parameters to monitor is vital to facilitate more accurate aging experiment results [22]. Therefore, it is necessary to study some characteristics of IGBT modules, especially their failure mechanisms. This section focuses on study of some characteristic indexes of IGBT modules in the process of performance degradation and presents a comparison of advantages and disadvantages of several parameters to choose the best parameters to monitor.
As a combination of a MOSFET and a BJT, the IGBT has switching characteristics similar to a MOSFET and high current and voltage capabilities similar to a BJT; thermal stress failure and electrical stress failure are its two typical failure mechanisms [23][24][25][26][27]. Since the IGBT modules are always under power cycling, the bond foot, solder layer and wire bonds, which lie close to each other in the module, are too weak to withstand the thermal stress and can easily fail (see Figure 1). In particular, cracks can appear because of solder fatigue failure, and with the growth of the cracks, cavities will form in the interior of the IGBT, and even detachment and breakage of the wires can occur, finally causing failure of the IGBT.
In Figure 1, the power module of the IGBT consists of a multi-layer structure. Each layer of material has a different coefficient of thermal expansion, which leads to different levels of thermal expansion of the units connected to each other in the interior of the module under thermal stress, causing deformations and typical thermal stress failures such as the peeling and the root breakage of the wire bonds, solder fatigue and wire-metallization. Meanwhile, the IGBT module will sustain extreme electrical overstress when it is running under over-voltage or over-current, causing local heating effects and the corresponding shape changes inside it, which consequently lead to IGBT failure.
Both thermal stress failure and electrical stress failure can lead to abnormalities in the interior structure of the IGBT module. The abnormalities of the internal structure will lead to changes in the external characteristic parameters [28]. Key parameters such as junction-case thermal resistance [23], gate voltage [24], turn-off time [25], and collector-emitter saturation voltage [26] directly reflect the state of aging of the device. Therefore, these parameters can be used as the monitoring parameters of RUL prediction for the IGBT module. Table 1 shows the relation between the typical failure mechanisms and external characteristic parameters, and Table 2 summarizes the advantages and disadvantages of different IGBT module status monitoring parameters.
As shown in Table 2, junction-case thermal resistance, gate voltage and turn-off time not only require a sensor of high measurement accuracy, but are also vulnerable to the influence of other outside factors. In contrast, the collector-emitter saturation voltage, which is affected by the collector current and case temperature, is relatively simple to acquire, and it clearly reflects solder layer fatigue and wire-bond failure. Hence, the precursor parameters for IGBT failure prognostics are identified as collector-emitter ON voltage ($V_{CE}$), collector-emitter ON current ($I_{CE}$), and case temperature ($T$). These precursor parameters are used as the monitoring parameters of IGBT RUL prediction in the accelerated life test.

IGBT Experimental Data Acquisition
In the IGBT accelerated life test (shown in Figure 2), the IGBT devices (600 V/6 A) were packaged in a TO-220AB package along with a soft recovery diode from International Rectifier (El Segundo, CA, USA). A pulse-width-modulated (PWM) signal with an amplitude of 15 V and a frequency of 1 or 5 kHz was chosen to be the gate signal. Our experimental system mainly consists of the drive circuit module, the voltage and current sensor module, the thermocouple module, the data acquisition module and the computer-control system. The drive circuit module adopts the application-specific integrated circuit IRS21271 driver to form the drive waveform of the IGBT and control the IGBT's ON and OFF states. The current sensor (BJHCS-104 series Hall current sensor) and voltage sensor (HV25-P series Hall voltage sensor) are mainly used for sensing changes in the state of the IGBT, and their readings are transferred through the data acquisition card to the computer. The thermocouple module includes the T-type thermocouple and the matched signal conditioning circuit to measure the surface temperature of the power device; it is the key module in the accelerated life experiment. During the temperature cycling experiment (the control process is shown in Figure 3a,b), the IGBT under test was switched on and off repeatedly until the case temperature reached the preset maximum value $T_{max}$. When $T_{max}$ was attained, the device was powered off until the temperature decreased to $T_{min}$. The device was then cycled between the temperatures $T_{max}$ and $T_{min}$ while the average was set to 175 °C with an optional swing. Meanwhile, the precursor parameters $V_{CE}$, $I_{CE}$, and $T$ were measured in situ and stored by a data acquisition system until failure of the IGBT under test, which was observed as a large increase in collector-emitter ON current caused by latch-up [29]. The experimental condition settings are shown in Table 3.
Figure 4 shows $V_{CE}$ waveforms of IGBT modules during our temperature-cycling test, and these data exhibit a typical and significant degradation trace, which will be used for the development and verification of the VKOPP prediction model. From the original experimental data of Figure 4, as the IGBT module gradually fails, the collector-emitter saturation voltage goes up. This phenomenon is mainly caused by the different shear effects of thermal stress between different structural materials, which lead to obvious cracks and cavities in the solder layer and linking lead.
Both cracks and cavities of the solder layer and linking lead can cause heating effects that accelerate device degradation through positive feedback and seriously increase the likelihood of hotspots, leading to the failure of the measured devices. During the data collection of the whole failure process, the experimental circuit is subject to some disturbances, such as instability of the driving waveform, the transport delay of the twisted pairs, the stray inductance caused by the load network and the error of the PCB circuit board. As a result, the collector-emitter saturation voltage in the raw experimental data is not equal to the typical value provided by the device manual and is mixed with noisy data and bad points. Therefore, before performing the failure prediction of the IGBT, the raw data need to be preprocessed, including removal of bad points, signal denoising, normalization and dimensionality reduction. The data pre-processing steps will be introduced in Section 6.2.

Data Transformation Based on the Phase Space Reconstruction
Here, we use the delay-coordinate method [30,31] to reconstruct the phase space of the time series $\{x_t, t = 1, 2, \cdots, n\}$ of the IGBT degradation process. Points in the phase space are expressed as

$$X(t) = [x_t, x_{t+\tau}, \cdots, x_{t+(m-1)\tau}]^T, \quad t = 1, 2, \cdots, N, \qquad (1)$$

where $m$ is an embedding dimension and $\tau$ is a time delay. We use the minimal differential entropy ratio (ER) [32] to optimize $m$ and $\tau$ simultaneously, which avoids the inconsistency between the mutual information method and the false-nearest-neighbors method. The optimization process is as follows.
The surrogate data of the given signal are $x_{s,i}(t)$, $i = 1, 2, \cdots, N_S$, where $N_S$ is the number of $x_{s,i}(t)$. The entropy ratio $R_{ent}$ is defined as follows:

$$R_{ent}(m, \tau) = I(m, \tau) + \frac{m \ln N}{N}, \quad I(m, \tau) = \frac{H(x, m, \tau)}{\langle H(x_{s,i}, m, \tau) \rangle_i}, \qquad (2)$$

$$H(x) = \sum_{j=1}^{N} \ln(N \rho_j) + \ln 2 + C_E, \qquad (3)$$

where $N$ ($N = n - (m-1)\tau$) is the number of delay vectors, and $\langle \cdot \rangle_i$ is the averaging operator that computes the mean of the ER values $H(x_{s,i})$ ($i = 1, 2, \cdots, N_S$) over the $N_S$ surrogates. $\rho_j$ is the Euclidean distance between the $j$-th delay vector and its nearest neighbor, and Euler's constant is $C_E \approx 0.5772$. $m$ and $\tau$ are then varied within a certain range and the ER diagram is drawn; the minimum of the ER graph along the $m$ axis and $\tau$ axis gives the optimal embedding dimension $m_{opt}$ and delay time $\tau_{opt}$. For example, the ER diagram for the IGBT4 experimental data is shown in Figure 5.
Thus, the series can be embedded into the reconstructed delay phase space with $m_{opt}$ and $\tau_{opt}$, and a point in the reconstructed phase space can be expressed as:

$$X(t) = [x_t, x_{t+\tau_{opt}}, \cdots, x_{t+(m_{opt}-1)\tau_{opt}}]^T. \qquad (4)$$

Assuming the input of the nonlinear discrete dynamical system is given by Equation (4) and the output is $y(t)$, the discrete Volterra series model can be expressed by Equation (5):

$$y(t) = h_0 + \sum_{l_1=0}^{m_{opt}-1} h_1(l_1)\, x_{t+l_1\tau_{opt}} + \sum_{l_1=0}^{m_{opt}-1} \sum_{l_2=0}^{m_{opt}-1} h_2(l_1, l_2)\, x_{t+l_1\tau_{opt}} x_{t+l_2\tau_{opt}} + \cdots + \sum_{l_1=0}^{m_{opt}-1} \cdots \sum_{l_p=0}^{m_{opt}-1} h_p(l_1, \cdots, l_p) \prod_{q=1}^{p} x_{t+l_q\tau_{opt}} + \cdots, \qquad (5)$$

where $h_0$ is the constant term and $h_p(l_1, \cdots, l_p)$ ($p = 1, 2, \cdots$) denotes the system's $p$-th order Volterra kernel.
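Because the Volterra filter is a nonlinear adaptive FIR filter, for the $p$-order Volterra series the filter coefficient vector and the input signal vector are given as follows:

$$H'(t) = [h_0, h_1(0), \cdots, h_1(m_{opt}-1), h_2(0,0), h_2(0,1), \cdots, h_p(m_{opt}-1, \cdots, m_{opt}-1)]^T, \qquad (6)$$

$$Z'(t) = [1, x_t, \cdots, x_{t+(m_{opt}-1)\tau_{opt}}, x_t^2, x_t x_{t+\tau_{opt}}, \cdots, x_{t+(m_{opt}-1)\tau_{opt}}^p]^T, \qquad (7)$$

where the vector dimensions of $H'(t)$ and $Z'(t)$ are both $1 + m_{opt} + m_{opt}^2 + \cdots + m_{opt}^p$. Combining similar terms, with $i, j, k \in \{0, 1, \cdots, m_{opt}-1\}$ and $i \le j \le k$, Equations (6) and (7) can be written as:

$$H(t) = [h_0, \cdots, h_1(i), \cdots, h_2(i,j), \cdots, h_3(i,j,k), \cdots]^T, \qquad (8)$$

$$Z(t) = [1, \cdots, x_{t+i\tau_{opt}}, \cdots, x_{t+i\tau_{opt}} x_{t+j\tau_{opt}}, \cdots, x_{t+i\tau_{opt}} x_{t+j\tau_{opt}} x_{t+k\tau_{opt}}, \cdots]^T, \qquad (9)$$

where the vector dimensions of $H(t)$ and $Z(t)$ are both $\sum_{q=0}^{p} \binom{m_{opt}+q-1}{q}$; for second-order truncation this is $1 + m_{opt} + m_{opt}(m_{opt}+1)/2$. Thus, Equation (5) can be simplified as $y(t) = (Z'(t))^T H'(t) = Z^T(t) H(t)$ ($t = 1, 2, \cdots, n$). In practice, the series is generally truncated at first, second or third order. Hereinafter, to estimate the higher-order kernels of the Volterra series, we pre-treat the IGBT experiment data to obtain $Z(t)$ as the input of the VKOPP model.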
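The reduced input vector $Z(t)$ can be illustrated with a short sketch. This is a minimal example rather than the authors' implementation: the function name `volterra_features` and the ordering of the monomials are assumptions made for illustration, and the input is taken to be one delay vector from the reconstructed phase space.

```python
from itertools import combinations_with_replacement

import numpy as np


def volterra_features(delay_vector, order=2):
    """Expand one delay vector into a reduced Volterra input vector.

    Products are formed over non-decreasing index tuples (i <= j <= k ...),
    so each distinct monomial appears exactly once. The ordering of the
    terms is an illustrative choice.
    """
    x = np.asarray(delay_vector, dtype=float)
    terms = [1.0]  # zeroth-order (constant) term
    for q in range(1, order + 1):
        for idx in combinations_with_replacement(range(x.size), q):
            terms.append(float(np.prod(x[list(idx)])))
    return np.array(terms)


z = volterra_features([2.0, 3.0], order=2)
print(z)  # [1. 2. 3. 4. 6. 9.]
```

For an embedding dimension of 2 and second-order truncation the vector has 1 + 2 + 3 = 6 entries, which matches the dimension count obtained after combining similar terms.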

Developing the VKOPP Model
The relations between $V_{CE}$, $I_{CE}$ and $T$ are nonlinear and complex, while the Volterra series is appealing because its output is a linear function of the filter kernels. Therefore, the existing linear tools can be used to analyze the filtering performance. Based on the equivalence of the Volterra series and the OPELM model [18], the VKOPP model was established to trace the IGBT degradation by using both methods. The prediction principle is shown in Figure 6.
Sensors 2017, 17, 2524 9 of 23


VKOPP Complete Structure
Figure 7 shows the complete structure of the VKOPP model. In Figure 7, $Z(t)$, which is described in Section 3, is used as the input vector $X_t$ at moment $t$ of the VKOPP model, that is, $X_t = Z(t)$, so the vector dimension of $X_t$ equals that of $Z(t)$.
When an input selection strategy, i.e., the forward-backward algorithm (FB) [33] or the least angle regression algorithm (LARS) [34], is used, the input vector of the hidden units in the VKOPP model can be expressed as:

$$X_t = [\cdots, x_{t+c_b\tau_{opt}}, \cdots, x_{t+c_e\tau_{opt}} x_{t+c_f\tau_{opt}}, \cdots]^T,$$

where $c_b, c_e, c_f \in \{0, 1, \cdots, m_{opt}-1\}$, with $c_e \le c_f$. The vector dimension of $X_t$ is denoted as $m$, and then $X_t$ can be simplified as $X_t = [x_{t,1}, x_{t,2}, \ldots, x_{t,m}]^T$ ($t = 1, 2, \cdots, n$).
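In Figure 7, $g_s$ ($s = 1, 2, \cdots, N$) is the activation function, which combines three different types of kernels (linear, sigmoid and Gaussian) for robustness and better generality. $N$ is the number of hidden neurons; $w = (w_{s,i})_{N \times m}$ and $r = (r_s)$ are the input weights and output weights of the VKOPP model, respectively; and $\theta_s$ are the biases. At moment $t$, the input of the $s$-th ($s = 1, 2, \cdots, N$) hidden neuron is

$$\sum_{i=1}^{m} w_{s,i}\, x_{t,i} - \theta_s,$$

and its output is $g_s\left(\sum_{i=1}^{m} w_{s,i}\, x_{t,i} - \theta_s\right)$. Further, it is assumed that via effective pruning of irrelevant variables and training via the OPELM algorithm [18], the actual best number of neurons for the model is $l$, and the OPELM hidden-layer output matrix is:

$$\hat{H} = \begin{bmatrix} g(\hat{w}_1^T X_1 - \hat{\theta}_1) & \cdots & g(\hat{w}_l^T X_1 - \hat{\theta}_l) \\ \vdots & & \vdots \\ g(\hat{w}_1^T X_n - \hat{\theta}_1) & \cdots & g(\hat{w}_l^T X_n - \hat{\theta}_l) \end{bmatrix}_{n \times l}.$$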
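A hidden layer that mixes the three kernel types can be sketched as follows. This is only an illustration under stated assumptions: the function name `hidden_layer`, the `kinds` argument and the width `sigma` are hypothetical, and treating the weight row of a Gaussian neuron as its centre is one common convention that the text does not confirm.

```python
import numpy as np


def hidden_layer(X, W, theta, kinds, sigma=1.0):
    """Hidden-layer output matrix for mixed linear/sigmoid/Gaussian neurons.

    X: (n, m) inputs, W: (N, m) input weights, theta: (N,) biases,
    kinds: sequence of N strings naming the kernel of each neuron.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    H = np.empty((X.shape[0], len(kinds)))
    for s, kind in enumerate(kinds):
        a = X @ W[s] - theta[s]  # net input: sum_i w_si * x_ti - theta_s
        if kind == "linear":
            H[:, s] = a
        elif kind == "sigmoid":
            H[:, s] = 1.0 / (1.0 + np.exp(-a))
        elif kind == "gaussian":
            # Assumption: W[s] is used as the centre, sigma as the width.
            d2 = np.sum((X - W[s]) ** 2, axis=1)
            H[:, s] = np.exp(-d2 / (2.0 * sigma**2))
        else:
            raise ValueError(f"unknown kernel type: {kind}")
    return H
```

Each column of the returned matrix corresponds to one hidden neuron, so pruning neurons amounts to selecting columns.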

Original OPELM Algorithm
The OPELM algorithm [18] inherits the characteristics of the ELM [17] and extends it with higher generalization and robustness. The basic principle of the OPELM algorithm can be described as follows. First, the ELM model is constructed, with the initial number of hidden nodes denoted as $N$. The neurons are then ranked using multiresponse sparse regression (MRSR) [18]: the target is the actual network output $Y$, while the regressors considered by the MRSR are the outputs of the hidden-layer kernel functions $H_i$ ($i = 1, 2, \cdots, N$). MRSR is used to rank the neurons of the ELM model because the ranking it provides is exact. In addition, MRSR is mainly an extension of the least angle regression (LARS) algorithm [34], and when the dimension of the target function is one, the MRSR algorithm is equivalent to the LARS algorithm. The $N$ hidden-layer nodes after sorting are denoted as $H^1_{j_1}, \cdots, H^N_{j_N}$, where the subscript $1 \le j_i \le N$ and the superscript $1 \le i \le N$ represent the serial number of the hidden-layer nodes before and after sorting, respectively.
Next, the selection of the final model structure is achieved through leave-one-out (LOO) validation, whose error is computed in closed form by the PRESS statistic:

$$\varepsilon_i^{PRESS} = \frac{y_i - H_i \beta_i}{1 - H_i\, p\, H_i^T},$$

where $i$ represents the $i$-th hidden layer node, $H_i$ is the columns of the hidden-layer output matrix after sorting $H$, $\beta_i$ is the output weights, and $p$ is defined as $p = (H^T H)^{-1}$. It can be noted that with the MRSR ranking step, the convergence is faster, while the number of neurons is far fewer, leading to a sparser network with the same performance.
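The closed-form LOO computation can be sketched as below. It is a minimal illustration, not the authors' code: the names `press_residuals` and `select_neuron_count` are hypothetical, the residuals are indexed by training sample (one row of the hidden-layer matrix per sample), and the columns are assumed to be already ranked, for example by MRSR.

```python
import numpy as np


def press_residuals(H, y):
    """Leave-one-out residuals of the least-squares fit y ~ H @ beta."""
    P = np.linalg.inv(H.T @ H)
    beta = P @ H.T @ y
    # Diagonal of the hat matrix H P H^T, one leverage value per sample.
    leverage = np.einsum("ij,jk,ik->i", H, P, H)
    return (y - H @ beta) / (1.0 - leverage)


def select_neuron_count(H_ranked, y):
    """Pick how many leading (ranked) columns give the smallest LOO error."""
    errors = [
        float(np.mean(press_residuals(H_ranked[:, :k], y) ** 2))
        for k in range(1, H_ranked.shape[1] + 1)
    ]
    return int(np.argmin(errors)) + 1, errors
```

Because the residuals come from a single fit per candidate size, the selection avoids refitting the model once for every left-out sample.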

VKOPP Training Algorithm
Despite the fact that the OPELM algorithm requires very few steps and very low computational cost, there are still some critical issues to be solved. The major problem is that the output layer weight estimates will be very poor if there is collinearity or there are gross errors in the training data. In this section, we therefore propose a method based on least squares weighted by M estimation to obtain the output weights and output matrix. We refer to this new training algorithm as the VKOPP algorithm: it uses M estimation to improve robustness, calculates the regression coefficients with the weighted least squares method, and obtains each weight from the regression residual.
For the training set $\{(X_t, Y_t)\}$ formed by $D$ groups of data, $X_t = [x_{t,1}, x_{t,2}, \cdots, x_{t,m}]^T$ is the input vector and $Y_t = [y_{t,1}, y_{t,2}, \cdots, y_{t,n}]^T$ is the corresponding expected output, where $m$ and $n$ are the dimensions of the input samples and output samples. The mathematical expression of the OPELM model can be represented as:

$$Y = \hat{H}\hat{\beta} + e,$$

where $\hat{H}$ is the output matrix of the network hidden layer after pruning, $\hat{\beta}$ is the unknown output weight parameter, and $e$ represents the regression residuals. M estimation replaces the sum of squared residuals of the least squares method with an influence function chosen to limit the effect of large residuals. Here is the chosen influence function proposed in [35]:

$$\rho(e) = \begin{cases} e^2/2, & |e| \le k \\ k|e| - k^2/2, & |e| > k \end{cases}$$

where $k$ is the tuning constant, with the typical value $k = 1.3450$. Let $\varphi(e)$ be the derivative of $\rho(e)$, which can be represented as:

$$\varphi(e) = \begin{cases} e, & |e| \le k \\ k\,\mathrm{sign}(e), & |e| > k \end{cases}$$

For the OPELM model, the optimization objective function of the regression residuals is:

$$\min_{\hat{\beta}} \sum_{t=1}^{D} \rho(e_t) = \min_{\hat{\beta}} \sum_{t=1}^{D} \rho(Y_t - \hat{H}_t\hat{\beta}).$$

We take the partial derivative with respect to the output weight parameter $\hat{\beta}$ and set it equal to zero, that is:

$$\sum_{t=1}^{D} \varphi(Y_t - \hat{H}_t\hat{\beta})\, \hat{H}_t^T = 0,$$

where $\hat{H}_t = [g(\hat{w}_1^T X_t - \hat{\theta}_1) \cdots g(\hat{w}_l^T X_t - \hat{\theta}_l)]$, $\hat{\beta} = [\hat{\beta}_1, \cdots, \hat{\beta}_l]^T$, and $e_t$ is the residual of the $t$-th sample.
In M estimation, the weights used to estimate the output weight parameter $\hat{\beta}$ are determined by the regression residuals: samples with low regression residuals receive a large weight. To standardize the regression residuals, the scale estimation factor s is introduced into the weight function; it is generally taken as the median absolute deviation (MAD) divided by the constant 0.6745. The new OPELM model is therefore expressed in terms of the standardized residual $v = e/s = 0.6745e/\mathrm{med}(|e|)$, where med denotes the median. From Equation (18), and then from Equations (19) and (21), the output weights $\hat{\beta}$ of the pruned OPELM are obtained.
The training algorithm of VKOPP is as follows:
Step 1. Let D be the number of training samples. Construct the ELM model with N hidden neurons. Randomly assign the input weights $w_i$ and the hidden-layer biases $\theta_i$. Denote the hidden-layer output matrix by H and the output weight matrix by $\beta$.
Step 2. Rank the hidden-layer nodes by the MRSR algorithm [21] as $H^{1}_{j_1}, \cdots, H^{N}_{j_N}$, where the subscript $1 \le j_i \le N$ and the superscript $1 \le i \le N$ are the serial numbers of the hidden-layer nodes before and after sorting, respectively.
Step 3. Select the optimal number of neurons by the LOO method based on the ranked order.
Step 4. Update the input weights $w_i$ and threshold parameters $\theta_i$ after pruning, and then calculate the hidden-layer output matrix $\hat{H}$.
Step 5. Use the output weights $\hat{\beta}^{(0)} = \hat{H}^{+}Y = (\hat{H}^T\hat{H})^{-1}\hat{H}^T Y$ from the least squares estimation of the traditional OPELM to obtain the initial regression residual $e_0$, and standardize $e_0$ as v.
Step 6. Obtain the initial weight of each training sample t (t = 1, 2, ···, D) from the standardized residual v.
Step 7. Use $\hat{\beta}^{(1)}$ from Equation (22) instead of $\hat{\beta}^{(0)}$ to obtain the new regression residual $e_1$, and compute the new weight of each training sample from the new regression residual.
Step 8. Return to Step 6 and repeat to calculate the output weight parameter $\hat{\beta}$. Continue the iteration until the maximum absolute difference between the estimates of two adjacent steps falls below the given tolerance, that is, $\max(|\hat{\beta}^{(i)} - \hat{\beta}^{(i-1)}|) < \xi$.
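Steps 5 to 8 amount to an iteratively reweighted least squares loop. The following is a minimal sketch under stated assumptions: the sample weights are Huber weights (suggested by k = 1.345 but not confirmed by the text), the output is scalar, and the helper names (`solve`, `weighted_lse`, `huber_weights`, `robust_output_weights`) and the toy data are hypothetical, not the authors' implementation.

```python
import statistics

HUBER_K = 1.345  # tuning constant k quoted in the text


def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_lse(h, y, w):
    """Minimise sum_t w[t] * (y[t] - h[t] . beta)^2 through the normal equations."""
    rows, cols = range(len(h)), range(len(h[0]))
    a = [[sum(w[t] * h[t][i] * h[t][j] for t in rows) for j in cols] for i in cols]
    b = [sum(w[t] * h[t][i] * y[t] for t in rows) for i in cols]
    return solve(a, b)


def huber_weights(residuals, k=HUBER_K):
    """Weights from residuals standardised by s = med(|e|) / 0.6745 (assumed Huber form)."""
    s = statistics.median(abs(e) for e in residuals) / 0.6745
    if s == 0.0:  # perfect fit on at least half the samples: keep plain least squares
        return [1.0] * len(residuals)
    return [1.0 if abs(e) <= k * s else k * s / abs(e) for e in residuals]


def robust_output_weights(h, y, xi=1e-8, max_iter=50):
    """Steps 5-8: start from ordinary least squares, then reweight until beta settles."""
    beta = weighted_lse(h, y, [1.0] * len(y))
    for _ in range(max_iter):
        residuals = [y[t] - sum(a * b for a, b in zip(h[t], beta)) for t in range(len(y))]
        new_beta = weighted_lse(h, y, huber_weights(residuals))
        change = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta = new_beta
        if change < xi:
            break
    return beta


# Toy data: y = 1 + 2x with small deterministic noise and one gross outlier.
xs = [float(t) for t in range(20)]
h_demo = [[1.0, x] for x in xs]
y_demo = [1.0 + 2.0 * x + 0.025 * ((7 * t) % 5 - 2) for t, x in enumerate(xs)]
y_demo[-1] += 50.0
beta_ols = weighted_lse(h_demo, y_demo, [1.0] * len(y_demo))
beta_robust = robust_output_weights(h_demo, y_demo)
```

On this toy data the ordinary least squares slope is pulled well away from 2 by the single outlier, whereas the reweighted estimate stays close to it.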

Network Output Prediction
In the output phase of the VKOPP model, a combination of the k-nearest neighbor (KNN) method [36] and the least squares estimation (LSE) method [37] is used to calculate the output weights of OPELM and predict the RUL of the IGBT. This new weight update method can effectively reduce the influence of outliers and noise, thereby improving the accuracy of the OPELM algorithm.
Assume that the number of sample data is D. Several nearest vectors are selected from the matrix H by the KNN method to form a new matrix, and the corresponding output weights are then calculated using the LSE method. The calculation process is as follows:
Step 1. As shown in Figure 7, at the moment $t = D + 1 - m_{opt}\tau_{opt}$, the initial input vector of the VKOPP model is $X_t$. After the input selection strategy is applied, the input is denoted as $X_t$ ($t = D + 1 - m_{opt}\tau_{opt}$). The hidden-layer output matrix $h_i = g_s\left(\sum_{i=1}^{m} w_{s,i}x_{t,i} - \theta_s\right)$ is then calculated.
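The KNN and LSE combination described above can be sketched as a local least squares fit. This is an illustration under assumptions rather than the authors' code: the output is scalar, the neighborhood size follows the l + 10 rule stated later in the paper, and the function names and the toy data (a constant column plus one feature standing in for hidden-layer outputs) are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lse(h, y):
    """Least squares output weights gamma through the normal equations."""
    rows, cols = range(len(h)), range(len(h[0]))
    a = [[sum(h[t][i] * h[t][j] for t in rows) for j in cols] for i in cols]
    b = [sum(h[t][i] * y[t] for t in rows) for i in cols]
    return solve(a, b)


def knn_lse_predict(h_train, y_train, h_query, extra=10):
    """Fit gamma on the (l + extra) rows of h_train nearest to h_query, then predict."""
    k = min(len(h_train), len(h_query) + extra)
    # Rank training rows by Euclidean distance to the query row.
    order = sorted(range(len(h_train)), key=lambda t: math.dist(h_train[t], h_query))
    nearest = order[:k]
    gamma = lse([h_train[t] for t in nearest], [y_train[t] for t in nearest])
    prediction = sum(a * b for a, b in zip(h_query, gamma))
    return prediction, gamma


# Toy data: a nonlinear target that a single global linear fit would miss.
grid = [t / 10 for t in range(30)]
h_train = [[1.0, x] for x in grid]
y_train = [x * x for x in grid]
prediction, gamma = knn_lse_predict(h_train, y_train, [1.0, 1.55])
```

Because the weights are refitted on the neighborhood of each query, the linear output layer only has to be accurate locally, which is the motivation given in the text for updating γ at every prediction step.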

VKOPP Model-Based IGBT's RUL Prediction
In this section, the proposed VKOPP model is applied to predict the IGBT RUL. The specific steps are as follows:
Step 1. Pre-treat the IGBT degradation data: the original dataset is normalized as $DS = \{x(1), x(2), \cdots, x(D)\}$, where D is the number of sample data. Take the difference between adjacent data as the input to obtain the new dataset $DN = \{\nabla x(1), \nabla x(2), \cdots, \nabla x(D)\}$.
Step 2. Adopt the minimal differential entropy ratio method to optimize the embedding dimension d and the delay time τ simultaneously on dataset DN. Map the data to the d-dimensional feature space by using the windowize function in Matlab to obtain the input vectors $\{X_1, X_2, \cdots, X_t, \cdots, X_n\}$ ($n = D - d\tau$), where $X_t = \{\nabla x(t), \nabla x(t + \tau), \cdots, \nabla x(t + (d - 1)\tau)\}$ ($t = 1, 2, \cdots, n$). To facilitate the calculations, a second-order truncated discrete Volterra model is taken as an example in the following. The input vector $X_t$ is then expanded with the Volterra terms up to second order, so that the dimension of $X_t$ is $(d+1)(d+2)/2$. The expected training output is $Y = \{\nabla y_1, \nabla y_2, \cdots, \nabla y_t, \cdots, \nabla y_n\}$, with $\nabla y_t = \nabla x(t + d\tau)$.
Step 3. When the input selection strategy (i.e., FB or LARS) is used, the input vector of the hidden units is formed from the selected components only, with $c_e \le c_f$. Suppose the vector dimension of $X_t$ is denoted as m; then, $X_t$ can be simplified as $X_t = [x_{t,1}, x_{t,2}, \cdots, x_{t,m}]^T$ ($t = 1, 2, \cdots, n$).
Step 4. Construct an ELM model with N hidden neurons, where N < D. Take $X_t$ obtained in Step 3 as the input vector, with the input weights $w = (w_{s,i})_{N \times m}$ and biases $\theta = \theta_s$ of the ELM model ($s = 1, 2, \cdots, N$; $i = 1, 2, \cdots, m$). At moment t, the input of the s-th hidden neuron is $\sum_{i=1}^{m} w_{s,i}x_{t,i} - \theta_s$.
Step 5. Rank the neurons by using the MRSR algorithm; the N hidden-layer nodes after ranking can be expressed as $g^{1}_{j_1}, \cdots, g^{N}_{j_N}$, where the subscript $1 \le j_i \le N$ and the superscript $1 \le i \le N$ are the serial numbers of the hidden-layer nodes before and after sorting, respectively. Then, select the optimal number of neurons for the model by LOO, denoted as l.
Step 7. Utilize the KNN and LSE methods to calculate the output weights of OPELM and make the prediction. The process is as follows:
(1) As shown in Figure 7, to predict x(D + 1), the initial input vector of the VKOPP model is $X_t$ according to Step 2. After the input selection strategy is applied, the input is denoted as $X_t$ ($t = D + 1 - d\tau$). Then, calculate the hidden-layer output matrix $h_i$.
(2) Calculate the Euclidean distance between $h_i$ and each vector of the matrix in Step 6. Sort all distances in S, and find the l + 10 nearest neighbors from H in Step 6 to form a new hidden-layer output matrix $H_i$ and the corresponding expected output $Y_i$, from which the output weights $\gamma = \{\gamma_1, \gamma_2, \cdots\}$ are obtained. The predicted value of the VKOPP model is then computed from $h_i$ and $\gamma$. Further, obtain the q-step-ahead (q ≥ 1) predicted value:
$$\hat{y}_{t+q} = \begin{cases} \hat{y}_{t+q-1} + f(x_{t,q+1}, \cdots, x_{t,m}, \nabla\hat{y}_{t}, \cdots, \nabla\hat{y}_{t+q-1}), & q \in \{1, 2, \cdots, m-1\} \\ \hat{y}_{t+q-1} + f(\nabla\hat{y}_{t+q-m}, \cdots, \nabla\hat{y}_{t+q-1}), & q \in \{m, m+1, \cdots\} \end{cases}$$
At each subsequent one-step-ahead (or q-step-ahead) prediction, update $h_i$ and $\gamma$, and then calculate the predicted value.
Step 8. The metabolism processing technology [38] is employed to update the training data until the predicted value exceeds the IGBT acceptable performance threshold. Once the prediction is completed, obtain the IGBT RUL prediction results and exit the program.
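The data preparation in Steps 1 and 2 (differencing, delay embedding, and second-order Volterra expansion) can be sketched as follows. This is an illustrative sketch: the function names are hypothetical, the differenced series here has one element fewer than the original (the paper's exact definition of ∇x(1) is not shown), and the ordering of the expanded terms (constant, linear, then products) is an assumption.

```python
def difference(series):
    """First differences of adjacent samples."""
    return [b - a for a, b in zip(series, series[1:])]


def delay_embed(series, d, tau):
    """Build n = len(series) - d*tau input windows and their one-step targets.

    Window t holds series[t], series[t + tau], ..., series[t + (d - 1)*tau];
    its target is series[t + d*tau] (0-based indices).
    """
    n = len(series) - d * tau
    inputs = [[series[t + j * tau] for j in range(d)] for t in range(n)]
    targets = [series[t + d * tau] for t in range(n)]
    return inputs, targets


def volterra2(window):
    """Second-order truncated Volterra terms: 1, each sample, each pairwise product."""
    terms = [1.0] + list(window)
    for i in range(len(window)):
        for j in range(i, len(window)):
            terms.append(window[i] * window[j])
    return terms


# Toy example with d = 3 and tau = 2.
series = [float(v) for v in range(10)]
inputs, targets = delay_embed(series, d=3, tau=2)
expanded = volterra2(inputs[0])
```

For d = 3 the expanded vector has (3 + 1)(3 + 2)/2 = 10 components, which agrees with the dimension stated in Step 2.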

Algorithm Performance Validation and Assessment
Before the proposed VKOPP algorithm is used to predict the IGBT RUL, it is compared in this section with the original OPELM, Volterra, and other typical machine learning algorithms to verify its validity, feasibility, and generalization ability. Eight different datasets have been chosen for the experiments.

Datasets
Different types of datasets are used, including the simulation data and the actual chaotic time series to test the effectiveness of the VKOPP model. The simulation sequence is Mackey-Glass data (MG) [39] and the actual sequences are laser, daily minimum temperatures (DMT), electricity demand (ED), CATS benchmark (CATS_B) and the sunspot number (SN) [40,41]. In addition, the degradation model in reference [42] is used to generate a set of data that is used to validate the method.
Singular points are sometimes generated by failures. They can be detected by using wavelets in the training data but are unlikely to be predictable in the forecast data, so their impact on the prediction should be reduced as far as possible. Hence, the MG_S dataset, i.e., the simulation sequence MG with some unclear singular points, is also used in the experiments.
These different types of datasets have all been processed in the same way: for each dataset, two-thirds are used for the training set, and the remaining one-third is used for the test set. The training sets are then normalized (zero-mean and unit variance), and the test sets are normalized using the same normalization factors as the corresponding training set.
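The split and normalization described above can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics


def fit_normalizer(train):
    """Return the mean and standard deviation estimated on the training set only."""
    return statistics.fmean(train), statistics.pstdev(train)


def normalize(values, mean, std):
    """Apply zero-mean, unit-variance scaling with the given factors."""
    return [(v - mean) / std for v in values]


def split_and_normalize(series):
    """Use the first two-thirds for training and the rest for testing."""
    cut = (2 * len(series)) // 3
    train, test = series[:cut], series[cut:]
    mean, std = fit_normalizer(train)
    # The test set reuses the training factors, so no test information leaks in.
    return normalize(train, mean, std), normalize(test, mean, std)


train_n, test_n = split_and_normalize([float(i) for i in range(9)])
```

Reusing the training factors means the normalized test set is generally not zero-mean or unit-variance itself, which is expected.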

Experiments
To measure the prediction performance on the different types of datasets, the mean square error (MSE) and normalized root mean square error (NRMSE) are used as the performance evaluation criteria. Eight different prediction models are applied to the eight datasets: the AR model, the weighted hidden Markov autoregressive model (WHMAR) [41], the RBF neural network model (RBFNN), OPELM, Volterra, the pruned lazy learning model (LLpruned) [43], least squares support vector machines (LSSVM) [44], and the VKOPP model proposed in this paper. All algorithms in the experiments are given optimal parameters. All experiments are run on the same Microsoft Windows XP system with at least 2 GB of memory (no swapping for any of the experiments) and a Pentium Dual-Core E5800 CPU @ 3.20 GHz.
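The two criteria can be computed as in the sketch below. The NRMSE normalization is an assumption: here the RMSE is divided by the standard deviation of the actual values, which is one common convention, and the paper does not state which one it uses.

```python
import math
import statistics


def mse(actual, predicted):
    """Mean square error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """RMSE divided by the standard deviation of the actual values (assumed convention)."""
    return math.sqrt(mse(actual, predicted)) / statistics.pstdev(actual)


actual_demo = [1.0, 2.0, 3.0, 4.0]
predicted_demo = [1.0, 2.0, 3.0, 6.0]
mse_demo = mse(actual_demo, predicted_demo)
nrmse_demo = nrmse(actual_demo, predicted_demo)
```

With this convention, an NRMSE of 1 corresponds to a predictor that does no better than always returning the mean of the actual series.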
From Table 4, it can be seen that the prediction accuracy of the proposed VKOPP method is at least one to two orders of magnitude better than that of the original OPELM and Volterra. Specifically, for the strongly stochastic Laser, DMT, ED, CATS_B, SN, and RDD sequences, the VKOPP model has better adaptability and thus achieves higher prediction performance. Furthermore, the VKOPP model is always better than, or at least as good as, the other prediction models in terms of MSE and NRMSE. The one-step prediction performance of the VKOPP model is therefore satisfactory for application.
For the long-term prediction, as shown in Table 5, the prediction accuracy of the VKOPP model is also better than that of the original OPELM and Volterra; for some datasets, the prediction errors of the original OPELM and Volterra are infinite because they do not converge. For the strongly stochastic sequences, the VKOPP model achieves higher multistep prediction accuracy than several conventional nonlinear models (such as RBFNN, OPELM, and LSSVM), but its advantage over the AR model is not very obvious. For the MG_S dataset, which contains singular points, the results in Tables 4 and 5 show that the VKOPP model has higher prediction precision than the other algorithms, which indicates that the VKOPP model is highly robust.
In summary, the experimental results have shown that in both single-step and multistep prediction, the VKOPP model proposed in this paper can achieve higher prediction accuracy for the different types of random sequences.

IGBT's RUL Prediction Results and Analysis
As discussed previously, the experimental circuit is subject to disturbances during data collection over the whole failure process, such as instability of the driving waveforms, transport delays of the twisted pairs, stray inductance caused by the load network, and PCB circuit board errors. As a result, the collector-emitter saturation voltage in the raw experimental data is not equal to the typical value provided by the device manual and is mixed with noisy data and bad points. Therefore, before the failure prediction of the IGBT is performed, the original data need to be pre-processed, including removal of bad points, signal denoising, normalization, and dimensional reduction. To facilitate a more accurate prediction of the IGBT RUL, we use four methods to compress and convert the raw data and obtain the best features in a low-dimensional space. First, the 3σ criterion is used to exclude the bad points. Second, the collected data are averaged over each cycle, and the average is taken as the characteristic value of that cycle. Then, the averages are normalized to align their probability distributions. Finally, the data are filtered by wavelet processing. Figure 8 shows the processed data and the prediction results of the VKOPP model for the four IGBTs. The blue lines in Figure 8 show the fade trends of the saturation voltage $V_{CE}$ for the four groups of experiments, labeled IGBT1, IGBT2, IGBT3, and IGBT4, from the above temperature cycling test, where APT stands for the acceptable performance threshold. For many applications, when the measured $V_{CE}$ deviates by ±15% from its "normal" reference value, the IGBT is considered seriously "degraded" and should be replaced [28]. Therefore, the value of APT in our paper is 0.15, which is 15% of the rated value.
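The first two pre-processing steps (3σ rejection and per-cycle averaging) can be sketched as below. The function names are hypothetical, a single pass of the 3σ rule with the population standard deviation is assumed, and the normalization and wavelet filtering steps are omitted.

```python
import statistics


def three_sigma_filter(samples):
    """Keep the samples that lie within three standard deviations of the mean."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    return [s for s in samples if abs(s - mean) <= 3 * std]


def cycle_feature(samples):
    """Characteristic value of one cycle: the mean of the samples that pass the 3σ rule."""
    return statistics.fmean(three_sigma_filter(samples))


# One hypothetical cycle: twenty consistent readings and a single bad point.
raw_cycle = [2.0] * 20 + [10.0]
kept = three_sigma_filter(raw_cycle)
feature = cycle_feature(raw_cycle)
```

Note that a single-pass 3σ rule needs enough samples per cycle to be effective: with very few readings, one bad point inflates the standard deviation so much that it cannot fall outside the 3σ band.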

For each IGBT experimental dataset, the minimal differential entropy ratio (ER) is utilized to optimize the embedding dimension m and the delay time τ simultaneously. For example, the ER diagrams of the IGBT4 and IGBT3 experimental data are shown in Figures 5 and 9, respectively. In addition, with 100 nearest neighbors and Volterra truncation order p = 1, the VKOPP method is used with all possible kernels (linear, sigmoid, and Gaussian), using a maximum of 100 neurons.
When the parameter design of the VKOPP algorithm is completed, the forecasting process is carried out (the specific steps are given in Section 4 of this paper). To better validate the prediction performance of the proposed prognostic approach, three experiments with different conditions are conducted: (1) different numbers of measurement data points are used as the training data to predict the RUL under the same forecast steps; (2) two-thirds of each IGBT experimental dataset is taken as the training set to predict the RUL under different forecast steps; (3) the VKOPP algorithm is compared with other typical machine learning algorithms in predicting the IGBT RUL in terms of prediction accuracy and time consumption.
Table 6. Life prediction error results for each IGBT experimental dataset at different training cycles (50-step-ahead).

Life Prediction (Cycles)
In the second experiment, the life prediction error results with prediction steps 1, 10, 50, and 100 for each IGBT experimental dataset are shown in Table 7, and Figure 10 shows the predicted RULs of each IGBT. As shown in Table 7, the error increases gradually with the number of prediction steps, but only slightly. This illustrates that in both single-step and multistep prediction, the VKOPP model proposed in this paper can achieve high prediction accuracy for each IGBT experimental dataset. In addition, the experimental results in Figure 8 show that even with 50-step-ahead prediction, not only is the data trend in the prediction similar to that of the actual dataset, but the predicted life is also close to the actual acceptable performance threshold.
In the last experiment, to further assess the prediction performance, a comparative study with other typical machine learning algorithms is presented (shown in Tables 8 and 9 and Figure 11).
The algorithms involved in Tables 8 and 9 and Figure 11 are briefly described in Section 5. For the IGBT3 experimental dataset, two-thirds (2000 samples) are taken as the training set. In addition, all experiments have been run on the same Microsoft Windows XP system with at least 2 GB of memory (no swapping for any of the experiments) and a Pentium(R) Dual-Core E5800 CPU @ 3.20 GHz.
AR         1    13   13   40
WHMAR      1    11   17   42
OPELM      12   21   61   111
Volterra   1    9    9    154
LLpruned   0    4    112  86
LSSVM      8    68   102  128
VKOPP      0    2    3    6
(1) Prediction accuracy: According to the life prediction results at different prediction steps for the proposed and conventional prediction methods in Table 8 and Figure 11, the proposed prognostic approach predicts the life of IGBT modules with less error than the other algorithms, and the advantage becomes more obvious as the number of prediction steps increases.
(2) Time consumption: Table 9 reports the time consumption of 50-step-ahead prediction for the experimental dataset IGBT3. The results show that the proposed VKOPP algorithm is computationally efficient: it takes approximately 1.747 s to predict the RUL when 2000 samples are used as the training data. Furthermore, compared with some typical machine learning algorithms (i.e., WHMAR, OPELM, LLpruned, and LSSVM), the VKOPP algorithm has an obvious advantage in computational time.
All above experimental results have shown that the proposed prognostic approach can predict the life of IGBT modules with small error. Compared with some typical machine learning algorithms, the model can achieve higher prediction precision. Moreover, the proposed prognostic approach is computationally efficient. Hence, this study illustrates that the VKOPP model strikes a very good compromise between computational speed and prediction accuracy for the RUL prediction of IGBT modules.
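A life prediction error in cycles, as reported in the experiments above, can be obtained by comparing the cycle at which the actual and the predicted degradation curves first reach the APT. The sketch below assumes that each curve is given as the relative deviation of $V_{CE}$ from its reference value per cycle, with cycles counted from 1; the function names and the toy curves are hypothetical.

```python
def cycles_to_threshold(deviation, apt=0.15):
    """First cycle (1-based) at which |deviation| reaches the APT, or None if never."""
    for cycle, value in enumerate(deviation, start=1):
        if abs(value) >= apt:
            return cycle
    return None


def life_prediction_error(actual, predicted, apt=0.15):
    """Absolute difference, in cycles, between actual and predicted threshold crossings."""
    actual_life = cycles_to_threshold(actual, apt)
    predicted_life = cycles_to_threshold(predicted, apt)
    if actual_life is None or predicted_life is None:
        return None  # one of the curves never reaches the threshold
    return abs(actual_life - predicted_life)


# Toy curves: the prediction degrades slightly more slowly than the actual data.
actual_dev = [0.02 * c for c in range(1, 21)]
predicted_dev = [0.018 * c for c in range(1, 21)]
error_cycles = life_prediction_error(actual_dev, predicted_dev)
```

The RUL at a given cycle then follows as the threshold-crossing cycle minus the current cycle.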

Conclusions
By analyzing the above experimental results, we can draw the following conclusions: (1) The VKOPP method achieves a good approximation of the IGBT RUL, and the prediction errors remain low under the same forecast steps whether few or many training cycles are used. The prediction result can also be good with a small number of training cycles. (2) For both single-step and multistep prediction, the VKOPP model proposed in this paper can achieve high prediction accuracy for each IGBT experimental dataset. (3) When the seven prediction models (i.e., AR, WHMAR, OPELM, Volterra, LLpruned, LSSVM, and VKOPP) are applied to the IGBT experimental data, the proposed prognostic approach predicts the RUL of IGBT modules with less error than the other six models, and the advantage becomes more obvious as the number of prediction steps increases. (4) The proposed VKOPP algorithm is computationally efficient. However, compared with some other algorithms, its advantage in computational time is not very obvious, which will be the focus of future research work on the VKOPP model.