Sensor Fault Diagnosis for Aero Engine Based on Online Sequential Extreme Learning Machine with Memory Principle

Abstract: The on-board sensor fault detection and isolation (FDI) system is essential to guarantee the reliability and safety of an aero engine. In this paper, a novel online sequential extreme learning machine with memory principle (MOS-ELM) is proposed for detecting, isolating, and reconstructing the faulty sensor signals of aero engines. In many practical online applications, the sequentially arriving data chunks usually possess a characteristic of timeliness, and overdue training data may mislead the subsequent learning process. The proposed MOS-ELM improves the training process by introducing the concept of the memory principle into the online sequential extreme learning machine (OS-ELM) to tackle the timeliness of the data chunks. Simulations on some time series problems and some benchmark databases show that MOS-ELM performs better in generalization performance, stability, and prediction accuracy than OS-ELM. The experimental results of the MOS-ELM-based sensor fault diagnosis system also verify the excellent generalization performance of MOS-ELM and indicate the effectiveness and feasibility of the developed diagnosis system.


Introduction
Aero engine health management (EHM) is a widely researched field because of its bearing on operational reliability and maintenance costs [1,2]. The accuracy of most health management systems relies on the accuracy of the measurements acquired from sensors [3]. Owing to harsh operating conditions, such as strong vibration, high pressure and high temperature, most sensors are highly vulnerable to failure, which may cause false alarms and increased engine downtime, thus resulting in lower reliability and higher operational costs [4,5]. Therefore, an on-board sensor fault detection and isolation (FDI) system is of great importance in enhancing the reliability and safety of an aero engine.
The field of sensor FDI systems for aero engines has been studied over the past few decades [6], and plenty of related works have been reported [7-14]. Wallhagen and Arpasi [7] utilized analytical redundancy to address serious sensor faults in one of the engine spool speeds and in the compressor outlet static pressure signal, which set the theoretical basis and gave an excellent instance of using analytical redundancy to diagnose the engine control system. The analytical redundancy technique estimates the sensor measurement by numerical algorithms and can reduce the weight and cost of the diagnosis system. Because of these advantages, analytical redundancy has attracted a great deal of interest; e.g., Bras designed an FDI system for the inertial and vector sensors of navigation systems by taking advantage of existing hardware redundancy and exploiting analytical redundancy [8].
Energies 2017, 10, 39
Lu et al. [9] developed an integrated architecture based on an adaptive performance model and a baseline model, both of which are real-time on-board models. Detection is determined by comparing the baseline model outputs with the performance model outputs, but the baseline model is a piecewise linear model and its storage cost is huge. Bahareh et al. [10] studied a hybrid Kalman filter bank for the detection and isolation of sensor faults throughout the whole flight envelope. Escobar et al. [11] presented a sensor fault compensation technique based on a pair of high-gain observers and a model predictive control strategy. Unfortunately, the reliability of Bahareh's and Escobar's diagnosis systems also suffers from modeling error. Mattern et al. [12] compared the sensor fault diagnosis performance of a functional approximation neural network with that of an auto-associative neural network (AANN). Sadough-Vanini et al. [13] provided an integrated solution to the sensor FDI problem based on the multi-model approach and a bank of AANNs. The diagnosis system based on AANN does not need engine modeling knowledge and also performs well in the diagnosis tasks. Torella [14] discussed the diagnosis of apparatus faults for turbine engines using probabilistic expert systems. An expert system with knowledge bases is able to avoid the problem that the same symptoms may be due to different causes.
As shown above, three main techniques are used to address sensor fault problems: data-based techniques [15,16], model-based techniques [17,18] and hybrid techniques [19]. Model-based techniques are able to diagnose new sensor faults even without prior knowledge and experience, but they depend on the accuracy of the on-board adaptive engine model, whose reliability is bound to decline as the nonlinear complexities and modeling uncertainties increase [20]. On the other hand, data-based methods need neither knowledge of the internal engine working principle nor complex engine modeling skills, and have thus attracted much interest. With the rapid development of intelligent computing methods, a good many data-based methods have arisen and been applied in sensor fault diagnosis for aero engines since the early 1990s. Shah et al. [21] applied an AANN to pre-process measurement data and fed the output of the AANN to the EHM system. Ogaji et al. [22] designed a modular system for diagnosing and quantifying double sensor faults in an aero engine with a bank of neural networks, in which the diagnosis results are determined by comparing the predicted value with the real measurement acquired from the sensors. Since the neural networks used by Shah and Ogaji are trained iteratively by gradient-based algorithms, the diagnosis systems suffer from long training times. Xu et al. [23] presented a least squares support vector machine (LS-SVM)-based sensor fault diagnosis system. Zhang and Li [24] proposed introducing the idea of fuzzy membership into an LS-SVM-based fault diagnosis system for a yaw angular rate sensor. Although the LS-SVM-based diagnosis systems have high accuracy in offline cases, they are not suitable for online applications. For most data-based methods, the training process of the diagnosis system is implemented offline, so the dynamic characteristics of the system cannot be handled well and the samples used for training are only applicable to certain working conditions. In practical applications, the working condition often changes across a wide range, and a diagnosis system trained offline may not adapt to these dynamic changes. Therefore, research on an online learning algorithm is essential to enhance the adaptability of the sensor fault diagnosis system for aero engines.
Extreme learning machine (ELM) is a high-efficiency learning algorithm for the single-hidden layer feedforward neural network (SLFN) [25]. As proven in [25], the hidden-layer parameters of the network can be assigned random values, and the output weight is then determined analytically from the pseudo-inverse of the hidden-layer output matrix. It has been shown that ELM has not only classification capacity but also universal approximation capacity [26]. Furthermore, as verified in [27], compared with traditional SVM and neural networks, ELM is able to learn much faster while obtaining similar or better generalization performance. Liang et al. [28] incorporated ELM with an online learning algorithm and proposed the online sequential extreme learning machine (OS-ELM). When the training data are produced sequentially, with a constant or varying chunk size, OS-ELM performs better in generalization performance and learning speed than other conventional sequential algorithms on many benchmark problems.
However, in many real online applications, such as sensor fault diagnosis for aero engines, the data used for the training process are not only produced sequentially but also usually have time-varying validity; in other words, the validity of a data chunk may decay as time passes. The overdue training data, whose validity decays as time goes on, should have a lower weight than the newly incoming training data, which is the idea behind the memory principle. Consequently, a novel online learning algorithm is presented in this paper by combining OS-ELM with the memory principle, referred to as MOS-ELM. On the one hand, the proposed MOS-ELM retains the sequential advantages of OS-ELM through the sequential learning process. On the other hand, it deals with the property of timeliness well by decaying the validity of each data chunk as time goes on. For tested problems possessing timeliness in various databases and in sensor fault diagnosis for aero engines, the MOS-ELM algorithm performs better in generalization performance, stability and predictability than the OS-ELM algorithm.
This manuscript is organized as follows. In Section 2, the basic concepts and related works of the OS-ELM algorithm are reviewed briefly. The formula of the MOS-ELM algorithm is derived, and the performance evaluation of MOS-ELM on some time series prediction problems and some real benchmark regression problems is given, in Section 3. In Section 4, the sensor fault diagnosis method for aero engines and the experimental results are presented in detail. The conclusion is drawn in Section 5.

Review of Online Sequential Extreme Learning Machine (OS-ELM)
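To introduce the proposed MOS-ELM, a brief review of the primary concepts of OS-ELM is given in this section. Considering distinct input-output samples $(x_i, t_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$, the SLFN model is briefly described in a unified way as:

$$f_L(x) = \sum_{i=1}^{L} \beta_i G(\omega_i, b_i, x), \tag{1}$$

where $\omega_i \in \mathbb{R}^n$, $b_i \in \mathbb{R}$ and $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T \in \mathbb{R}^m$ respectively denote the learning parameters and the output weight of the $i$-th hidden node, $L$ represents the number of hidden nodes, and $G(\omega_i, b_i, x)$ denotes the $i$-th hidden-layer output with regard to $x$. In the case of an additive hidden node, $G(\omega_i, b_i, x)$ can be represented by:

$$G(\omega_i, b_i, x) = g(\omega_i \cdot x + b_i). \tag{2}$$

In the case of a radial basis function hidden node, $G(\omega_i, b_i, x)$ can be represented by:

$$G(\omega_i, b_i, x) = g(b_i \| x - \omega_i \|). \tag{3}$$

We suppose that there are $N$ batch-training samples used for the supervised learning process. For the finite distinct set of training samples $\{(x_i, t_i)\}_{i=1}^{N} \subset \mathbb{R}^n \times \mathbb{R}^m$, if an SLFN with $L$ hidden nodes approximates the $N$ training data with zero error, then $\omega_i$, $b_i$, and $\beta_i$ satisfy the following equation:

$$\sum_{i=1}^{L} \beta_i G(\omega_i, b_i, x_j) = t_j, \quad j = 1, \ldots, N. \tag{4}$$

We can rewrite Equation (4) in a compact way as:

$$H\beta = T, \tag{5}$$

where

$$H = \begin{bmatrix} G(\omega_1, b_1, x_1) & \cdots & G(\omega_L, b_L, x_1) \\ \vdots & & \vdots \\ G(\omega_1, b_1, x_N) & \cdots & G(\omega_L, b_L, x_N) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}. \tag{6}$$

Here $H$ is the hidden-layer output matrix.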
Traditionally, to train an SLFN, one needs to find specific $\omega_i$, $b_i$, $\beta_i$, $i = 1, \ldots, L$, such that $\| H\beta - T \|$ takes its minimum value. If $H$ is unknown, gradient-based learning algorithms are usually used to adjust $\omega_i$, $b_i$, $\beta_i$ iteratively. However, for most applications, the gradient-based method is extremely time consuming and often stops at a local minimum. According to the theory of Huang, the hidden-layer learning parameters $\omega_i$ and $b_i$ of an SLFN can simply be assigned randomly, and such an SLFN with any nonzero activation function is able to universally approximate any continuous function on any compact input set [26]. If $L \le N$, $H$ is of full column rank with probability one, and in real-world applications the condition $L \le N$ is easily satisfied. Hence, the output weights $\beta$ can be analytically obtained as the least-squares solution of Equation (5), yielding:

$$\beta = H^{\dagger} T, \tag{7}$$

where $H^{\dagger}$ represents the pseudo-inverse of $H$. The pseudo-inverse can be computed in several ways, such as the iterative approach and the orthogonalization method [29]; if $H^T H$ is nonsingular, it can be calculated as $H^{\dagger} = (H^T H)^{-1} H^T$. Compared with traditional iterative implementations of SLFNs, ELM has similar generalization performance and a dramatically increased running speed.

The batch ELM algorithm supposes that all the training samples are available for the learning process. Nevertheless, in many problems, the training samples may arrive chunk by chunk. The OS-ELM algorithm is proposed to handle such online sequential learning problems. We determine the hidden output function $G(\omega, b, x)$ by choosing $g$ and $L$, and assume that the training data are presented to the learning process in chunks of the same or different sizes. The $k$-th data chunk can be denoted

$$\aleph_k = \{(x_i, t_i)\}_{i = (\sum_{j=0}^{k-1} N_j) + 1}^{\sum_{j=0}^{k} N_j}, \tag{8}$$

where $N_j$ is the number of samples in the $j$-th data chunk $\aleph_j$, $j = 0, 1, 2, \ldots, k$.
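We use a small data chunk $\aleph_0 = \{(x_i, t_i)\}_{i=1}^{N_0}$ to initialize the learning process, where $N_0$ is the number of samples in the data chunk $\aleph_0$, which is obtained from the sequential training data $\aleph = \{(x_i, t_i) \mid x_i \in \mathbb{R}^n, t_i \in \mathbb{R}^m, i = 1, 2, \ldots\}$, and $N_0$ is equal to or greater than $L$. The values of the learning parameters $(\omega_i, b_i)$, $i = 1, 2, \ldots, L$ are assigned randomly and the initial $H_0$ is computed as follows:

$$H_0 = \begin{bmatrix} G(\omega_1, b_1, x_1) & \cdots & G(\omega_L, b_L, x_1) \\ \vdots & & \vdots \\ G(\omega_1, b_1, x_{N_0}) & \cdots & G(\omega_L, b_L, x_{N_0}) \end{bmatrix}_{N_0 \times L}. \tag{9}$$

Then the initial output weight $\beta(0)$ can be computed in accordance with ELM as follows:

$$\beta(0) = P_0 H_0^T T_0, \tag{10}$$

where $P_0 = (H_0^T H_0)^{-1}$ and $T_0 = [t_1, \ldots, t_{N_0}]^T$. For the $k$-th chunk and its previous chunks, the output matrices of the hidden layer and output layer are respectively defined as:

$$H_k = \begin{bmatrix} H_0 \\ h(1) \\ \vdots \\ h(k) \end{bmatrix}, \quad T_k = \begin{bmatrix} T_0 \\ t(1) \\ \vdots \\ t(k) \end{bmatrix}, \tag{11}$$

and then,

$$H_{k+1} = \begin{bmatrix} H_k \\ h(k+1) \end{bmatrix}, \quad T_{k+1} = \begin{bmatrix} T_k \\ t(k+1) \end{bmatrix}. \tag{12}$$

Here $h(k+1)$ and $t(k+1)$ are respectively defined as:

$$h(k+1) = \begin{bmatrix} G(\omega_1, b_1, x_{(\sum_{j=0}^{k} N_j)+1}) & \cdots & G(\omega_L, b_L, x_{(\sum_{j=0}^{k} N_j)+1}) \\ \vdots & & \vdots \\ G(\omega_1, b_1, x_{\sum_{j=0}^{k+1} N_j}) & \cdots & G(\omega_L, b_L, x_{\sum_{j=0}^{k+1} N_j}) \end{bmatrix}_{N_{k+1} \times L}, \tag{13}$$

$$t(k+1) = \left[ t_{(\sum_{j=0}^{k} N_j)+1}, \ldots, t_{\sum_{j=0}^{k+1} N_j} \right]^T_{N_{k+1} \times m}. \tag{14}$$

Then the output weight corresponding to the $(k+1)$-th data chunk is the least squares solution of $H_{k+1}\beta = T_{k+1}$, and it can be computed iteratively as follows:

$$P_{k+1} = P_k - P_k h(k+1)^T \left( I + h(k+1) P_k h(k+1)^T \right)^{-1} h(k+1) P_k, \tag{15}$$

$$\beta(k+1) = \beta(k) + P_{k+1} h(k+1)^T \left( t(k+1) - h(k+1) \beta(k) \right). \tag{16}$$

OS-ELM is composed of an initialization phase and a sequential learning phase, and does not need to retain all the historical data. In the initialization phase, $H_0$, $\beta(0)$, $P_0$ and $T_0$ are initialized for use in the sequential learning phase. The number of samples in the initialization chunk ought to be equal to or greater than the number of hidden nodes. In the sequential learning phase, the sequential training data are learned iteratively. Once the learning procedure on the latest data chunk is completed, the historical data can be discarded and are no longer used. From the derivation of OS-ELM, it is easy to conclude that OS-ELM and ELM have similar generalization performance. In fact, the ELM algorithm is a special case of the OS-ELM algorithm in which all the training samples are used in the initialization phase.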
Algorithm OS-ELM: Given the number of hidden nodes $L$ and the activation function $g: \mathbb{R} \to \mathbb{R}$ (sigmoid or another function), we can summarize the OS-ELM algorithm in the following steps.
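The initialization phase and the sequential learning phase described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `OSELM`, its method names, the sigmoid additive nodes, the weight range [-3, 3], and the synthetic data in `demo` are all hypothetical choices made for the example. The demo compares the sequentially learned network with a batch ELM fitted on all samples at once.

```python
import numpy as np


class OSELM:
    """Minimal OS-ELM sketch with sigmoid additive hidden nodes (illustrative)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer parameters are assigned randomly and never trained.
        # The ranges below are assumptions chosen for this example only.
        self.W = rng.uniform(-3.0, 3.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.P = None
        self.beta = None

    def hidden(self, X):
        """Hidden-layer output matrix for the rows of X."""
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def init_phase(self, X0, T0):
        """Initialization: P_0 = (H_0^T H_0)^{-1}, beta(0) = P_0 H_0^T T_0."""
        H0 = self.hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0)
        self.beta = self.P @ H0.T @ T0

    def update(self, X, T):
        """Sequential phase: recursive least-squares update for one chunk."""
        h = self.hidden(X)
        S = np.eye(h.shape[0]) + h @ self.P @ h.T
        self.P = self.P - self.P @ h.T @ np.linalg.solve(S, h @ self.P)
        self.beta = self.beta + self.P @ h.T @ (T - h @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta


def demo():
    """Train chunk by chunk, then compare with batch ELM on all samples."""
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(200, 2))
    T = np.sin(X[:, :1]) + X[:, 1:] ** 2
    model = OSELM(n_inputs=2, n_hidden=5)
    model.init_phase(X[:40], T[:40])          # N_0 = 40 >= L = 5
    for start in range(40, 200, 20):          # chunks of 20 samples
        model.update(X[start:start + 20], T[start:start + 20])
    H = model.hidden(X)
    pred_batch = H @ (np.linalg.pinv(H) @ T)  # batch ELM, beta = pinv(H) T
    return model.predict(X), pred_batch
```

In this sketch only `P` and `beta` are carried from one chunk to the next, so earlier chunks can be discarded once they have been processed.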

Proposed Online Sequential Extreme Learning Machine with Memory Principle (MOS-ELM)
In many real cases, the sequentially produced training data vary with time; that is, the validity of an outdated chunk may decay as time goes on. For instance, in sensor fault diagnosis for an aero engine, since many factors that affect the measurements of the aero engine are usually time-varying, the validity of the previous training process should decay gradually. Hence, the overdue data chunk, whose validity is decaying with time, should have a lower weight than the incoming data chunk in the subsequent learning process, which is the idea behind the memory principle. The timeliness of the training data cannot be handled well by OS-ELM alone, and the overdue training data may mislead the subsequent learning process. In this section, we introduce the concept of the memory principle into OS-ELM to gradually sink the overdue training data into oblivion, and name this novel algorithm MOS-ELM.

Formula Derivation
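Assume that the decay rate of each data chunk is $\rho$, where $0 < \rho < 1$. Then $\tilde{\beta}(k+1)$, the output weight matrix with regard to the $(k+1)$-th data chunk, can be solved from the following equation in the sense of least squares:

$$\begin{bmatrix} \rho^{k+1} H_0 \\ \rho^{k} h(1) \\ \vdots \\ \rho\, h(k) \\ h(k+1) \end{bmatrix} \tilde{\beta}(k+1) = \begin{bmatrix} \rho^{k+1} T_0 \\ \rho^{k} t(1) \\ \vdots \\ \rho\, t(k) \\ t(k+1) \end{bmatrix}. \tag{17}$$

Let $\tilde{H}_{k+1} := [\rho \tilde{H}_k^T \;\; h(k+1)^T]^T$, $\tilde{H}_0 := H_0$, $\tilde{T}_{k+1} := [\rho \tilde{T}_k^T \;\; t(k+1)^T]^T$, $\tilde{T}_0 := T_0$; we can then compactly describe Equation (17) as:

$$\tilde{H}_{k+1} \tilde{\beta}(k+1) = \tilde{T}_{k+1}. \tag{18}$$

Theorem 1. The least squares solution of Equation (18) can be computed iteratively as follows:

$$\tilde{\beta}(k+1) = \tilde{\beta}(k) + K_{k+1} \left( t(k+1) - h(k+1) \tilde{\beta}(k) \right), \tag{19}$$

$$K_{k+1} = P_k h(k+1)^T \left( \rho^2 I + h(k+1) P_k h(k+1)^T \right)^{-1}, \tag{20}$$

$$P_{k+1} = \frac{1}{\rho^2} \left( P_k - K_{k+1} h(k+1) P_k \right), \tag{21}$$

where $P_k = (\tilde{H}_k^T \tilde{H}_k)^{-1}$.

Proof. From the definition of $P_k$, we can easily find that:

$$P_{k+1}^{-1} = \tilde{H}_{k+1}^T \tilde{H}_{k+1} = \rho^2 \tilde{H}_k^T \tilde{H}_k + h(k+1)^T h(k+1) = \rho^2 P_k^{-1} + h(k+1)^T h(k+1). \tag{22}$$

According to the Sherman-Morrison-Woodbury formula [30], Equation (22) can be written as:

$$P_{k+1} = \frac{1}{\rho^2} \left[ P_k - P_k h(k+1)^T \left( \rho^2 I + h(k+1) P_k h(k+1)^T \right)^{-1} h(k+1) P_k \right] = \frac{1}{\rho^2} \left( P_k - K_{k+1} h(k+1) P_k \right). \tag{23}$$

Substituting Equation (23) and the definitions of $\tilde{H}_{k+1}$ and $\tilde{T}_{k+1}$ into $\tilde{\beta}(k+1) = P_{k+1} \tilde{H}_{k+1}^T \tilde{T}_{k+1}$, and using $\tilde{\beta}(k) = P_k \tilde{H}_k^T \tilde{T}_k$, the output weight matrix at the $(k+1)$-th unit time can be determined by:

$$\tilde{\beta}(k+1) = \frac{1}{\rho^2} \left( P_k - K_{k+1} h(k+1) P_k \right) \left( \rho^2 \tilde{H}_k^T \tilde{T}_k + h(k+1)^T t(k+1) \right) = \tilde{\beta}(k) - K_{k+1} h(k+1) \tilde{\beta}(k) + \frac{1}{\rho^2} \left( P_k h(k+1)^T - K_{k+1} h(k+1) P_k h(k+1)^T \right) t(k+1). \tag{24}$$

The term $P_k h(k+1)^T - K_{k+1} h(k+1) P_k h(k+1)^T$ in Equation (24) can be simplified as:

$$P_k h(k+1)^T - K_{k+1} h(k+1) P_k h(k+1)^T = K_{k+1} \left( \rho^2 I + h(k+1) P_k h(k+1)^T \right) - K_{k+1} h(k+1) P_k h(k+1)^T = \rho^2 K_{k+1}. \tag{25}$$

Substituting Equation (25) into Equation (24), the output weight matrix at the $(k+1)$-th unit time can be described compactly as:

$$\tilde{\beta}(k+1) = \tilde{\beta}(k) + K_{k+1} \left( t(k+1) - h(k+1) \tilde{\beta}(k) \right). \tag{26}$$

Proposed Algorithm MOS-ELM: Given the number of hidden nodes $L$ and the activation function $g: \mathbb{R} \to \mathbb{R}$ (sigmoid or another function), we can outline the MOS-ELM algorithm with the following steps.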
(1) Assign the learning parameters $(\omega_i, b_i)$, $i = 1, 2, \ldots, L$ randomly, and set $k = 0$;
(2) Compute $H_0$ and $\beta(0)$ as in Equations (9) and (10);
(3) Compute $h(k+1)$ corresponding to the $(k+1)$-th chunk as in Equation (13);
(4) Calculate $\tilde{\beta}(k+1)$ iteratively as in Equations (19)-(21);
(5) If a new data chunk arrives, let $k = k + 1$ and return to step (3). Otherwise, let $\beta$ be the last iteration value $\tilde{\beta}(k+1)$.

Remark 1. MOS-ELM is actually an OS-ELM with the memory principle. When a newly incoming data chunk is presented to predict the data of the next chunk, there is no need to repeat the whole ELM process. Otherwise, the complex matrix computations in Equations (9) and (10) would have to be carried out again and the information learned before would be wasted. MOS-ELM carries out its matrix computations using only the newly arriving chunk of training data and the information learned before, whereas ELM uses all the training data. Hence, for sequential prediction problems, the training process of MOS-ELM is much more efficient than that of ELM.
Remark 2. In the learning process of MOS-ELM, the validity of each data chunk decays with time: the SLFN is retrained as soon as new training data arrive at the next unit time, and the validity of the overdue data chunks is reduced at the same time. Therefore, the learning process can deal with the timeliness well.

Remark 3. If the chunks of training data do not have the property of timeliness, that is, $\rho = 1$, MOS-ELM is exactly the same as OS-ELM. In other words, the OS-ELM algorithm is a special case of the MOS-ELM algorithm.
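To make the role of the decay rate concrete, the following Python sketch applies a decayed recursive update of the kind described above and checks it against a direct least-squares solution of the stacked system in which every older chunk is scaled by ρ once per step. It is an illustration under assumptions: the update is written in one form that is consistent with the definitions of the decayed matrices (gain with ρ² inside the inverse, and P divided by ρ²), and the function names, sigmoid nodes, weight ranges and drifting synthetic targets are hypothetical. With ρ = 1 the update coincides with the OS-ELM recursion, as stated in Remark 3.

```python
import numpy as np


def hidden(X, W, b):
    """Sigmoid additive hidden-layer output matrix."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def mos_elm_update(P, beta, h, t, rho):
    """One decayed update (assumed form): older chunks are weighted by rho."""
    S = rho ** 2 * np.eye(h.shape[0]) + h @ P @ h.T
    K = P @ h.T @ np.linalg.inv(S)            # gain matrix
    beta_new = beta + K @ (t - h @ beta)      # correct by the prediction error
    P_new = (P - K @ h @ P) / rho ** 2
    return P_new, beta_new


def demo(rho, seed=0):
    """Return the largest prediction gap between recursive and direct solutions."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-3.0, 3.0, size=(1, 4))   # hypothetical random hidden weights
    b = rng.uniform(-1.0, 1.0, size=4)
    X = rng.uniform(-1.0, 1.0, size=(120, 1))
    for c, start in enumerate(range(0, 120, 20)):
        Xc = X[start:start + 20]
        Tc = np.sin(3.0 * Xc) + 0.2 * c       # target drifts from chunk to chunk
        h = hidden(Xc, W, b)
        if c == 0:
            # Initialization phase on the first chunk.
            P = np.linalg.inv(h.T @ h)
            beta = P @ h.T @ Tc
            H_stack, T_stack = h, Tc
        else:
            P, beta = mos_elm_update(P, beta, h, Tc, rho)
            # Direct formulation: scale all older rows by rho, append the new chunk.
            H_stack = np.vstack([rho * H_stack, h])
            T_stack = np.vstack([rho * T_stack, Tc])
    beta_direct = np.linalg.lstsq(H_stack, T_stack, rcond=None)[0]
    H_all = hidden(X, W, b)
    return float(np.max(np.abs(H_all @ beta - H_all @ beta_direct)))
```

Calling `demo(0.9)` exercises the decayed case and `demo(1.0)` the case without decay; in both, the recursive result is expected to agree with the direct solution up to rounding error.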

Evaluation Test
In this section, we compare MOS-ELM with OS-ELM on some time series prediction problems and some real benchmark regression problems. The time series prediction problems considered in this subsection include the Mackey-Glass series, the Logistic chaotic series and the Sunspot series. The Mackey-Glass series is produced by the differential equation described as follows [31]:

$$\frac{dx(t)}{dt} = \frac{a\, x(t - \tau)}{1 + x^{10}(t - \tau)} - b\, x(t), \tag{27}$$

where $\tau = 17$, $a \in [0.2, 0.22]$, $b \in [0.1, 0.12]$ and $x(0) = 1.2$, and the time series $\{x_k^{mg} \mid k = 1, 2, 3, \ldots\}$ is generated by the Runge-Kutta method. The Logistic chaotic time series $\{x_k^{lo} \mid k = 1, 2, 3, \ldots\}$ is described by the following recursive equation [32]:

$$x_{k+1}^{lo} = \lambda\, x_k^{lo} \left( 1 - x_k^{lo} \right), \tag{28}$$

where $\lambda \in [3.5, 4]$. The Sunspot time series is the monthly mean total sunspot number from January 1749 to December 2015 and is obtained from [33].

Notes: RMSE: root-mean-square error; SD: standard deviation.
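As an illustration of how such a series can be turned into sequential training data, the sketch below generates a Logistic chaotic series and builds input-target pairs with a sliding window. The linear drift of λ from 3.5 to 4, the initial value 0.3, the window length and the function names are assumptions made for this example; the text only states that λ lies in [3.5, 4].

```python
def logistic_series(lams, x0=0.3):
    """Iterate x_{k+1} = lam_k * x_k * (1 - x_k) for each lam_k in `lams`."""
    xs = [x0]
    for lam in lams:
        xs.append(lam * xs[-1] * (1.0 - xs[-1]))
    return xs


def embed(series, p):
    """Build (inputs, target) pairs: p past values predict the next value."""
    return [(series[k - p:k], series[k]) for k in range(p, len(series))]


n = 500
# Hypothetical schedule: lambda drifts slowly, so that older samples
# describe a slightly different system than recent ones.
lams = [3.5 + 0.5 * k / (n - 1) for k in range(n)]
series = logistic_series(lams)
samples = embed(series, p=2)
```

Feeding `samples` to a sequential learner in consecutive chunks reproduces the setting in which earlier chunks gradually lose validity.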
As observed from Table 1, the training time of MOS-ELM is close to that of OS-ELM on the various databases, as expected. MOS-ELM has a lower standard deviation, implying superior stability. Owing to the memory principle, MOS-ELM achieves better generalization performance than OS-ELM on the databases with timeliness, such as Sunspot, Mackey-Glass and Logistic. On the databases without timeliness, such as Auto-MPG and Housing, the MOS-ELM and OS-ELM algorithms have similar generalization performance. The excellent generalization performance and stability on databases with timeliness create good conditions for the use of MOS-ELM in sensor fault diagnosis, which also has the characteristic of timeliness.

Sensor Fault Diagnosis for Aero Engines
Because sensors are prone to faults, the FDI of the sensor system plays a very important part in ensuring the reliability of an aero engine control system. If a sensor fails, the safety of the aero engine is seriously affected. Accurate sensor fault diagnosis with fast response is therefore essential to enhance the reliability and safety of aero engines.
... $A_8$ [34], and $t_k$ denotes the prediction of the measurement vector $y_k$.

Diagnosis Method
Figure 2 illustrates the prediction module of the fault diagnosis system based on the proposed MOS-ELM algorithm. Each measurement is predicted by an independent MOS-ELM, where $t_k^i$ denotes the $i$-th element of $t_k$, $f_i(\cdot)$ represents the $i$-th MOS-ELM, $\bar{y}_{k-p}^i$ is created by removing the $i$-th measurement $y_{k-p}^i$ from $y_{k-p}$, and $p$ denotes the embedding dimension for the prediction process.
The prediction of the measurement $t_k$ is used as an analytical channel for the diagnosis logic in Figure 1. If the discrepancy between the analytical channel $t_k$ and the measured channel $y_k$ exceeds a tolerance level, the fault diagnosis logic is able to determine the cause of the difference. For each measured parameter, the sensor fault indicator is introduced as the comparison of the analytical channel against the measured channel, and it is defined in terms of the analytical residual $r_k^i = |t_k^i - y_k^i|$, the absolute difference between $t_k^i$ and $y_k^i$. Two typical kinds of sensor faults, drift fault and bias fault, are considered in this paper, and the thresholds for drift fault and bias fault are defined as DC and FC, respectively. The analytical residual computed for each sensor is compared against the thresholds DC and FC, and the detection logic determines the fault level. If an analytical residual exceeds the bias fault threshold FC, it implies the existence of a bias fault. If an analytical residual exceeds the drift fault threshold DC but does not exceed the bias fault threshold FC, it implies the existence of a drift fault.
If a drift fault or bias fault occurs, a correction strategy is applied to reconstruct the measurement and isolate the faulty sensor. The correction strategy yields $\hat{y}_k^i$, the $i$-th reconstruction value of $y_k$, using a correction factor $m > 1$. If a bias fault is detected, the measured value $y_k^i$ does not contain any effective information. The reconstruction value $\hat{y}_k^i$ is then completely determined by the prediction value $t_k^i$, and the faulty measured value $y_k^i$ is isolated. If a drift fault is detected, the correction strategy makes proper use of the information in both the measurement and the prediction.
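The detection and correction logic described above can be sketched as follows. The fault levels 0, 1 and 2 and the comparison of the residual against DC and FC follow the text. The drift-fault blend in `reconstruct`, however, is a hypothetical placeholder (a weighted mix controlled by m), since the exact correction formula is not restated here; the function names are likewise illustrative.

```python
NO_FAULT, DRIFT_FAULT, BIAS_FAULT = 0, 1, 2


def fault_level(prediction, measurement, dc, fc):
    """Classify one sensor sample from its analytical residual |t - y|."""
    residual = abs(prediction - measurement)
    if residual > fc:       # exceeds the bias fault threshold
        return BIAS_FAULT
    if residual > dc:       # exceeds only the drift fault threshold
        return DRIFT_FAULT
    return NO_FAULT


def reconstruct(prediction, measurement, level, m=2.0):
    """Return the reconstructed sensor value for the detected fault level."""
    if level == BIAS_FAULT:
        # The measurement carries no useful information: use the prediction.
        return prediction
    if level == DRIFT_FAULT:
        # ASSUMPTION: hypothetical blend of prediction and measurement;
        # with m > 1 the prediction receives the larger weight.
        return prediction + (measurement - prediction) / m
    return measurement
```

For example, with DC = 0.0140 and FC = 0.0245, a normalized prediction of 0.50 against a measurement of 0.52 gives a residual of about 0.02 and is classified as a drift fault, while a measurement of 0.53 is classified as a bias fault.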

Diagnosis System for N L Sensor
In this subsection, a double-shaft turbofan component level model is used as the research object, and the measurement noise considered here is Gaussian with a standard deviation of 0.30% [35]. At H = 0 km and Ma = 0, the acceleration and deceleration of an aero engine are simulated with the throttle lever angle in the interval 30°-70°. Because the main dynamic characteristics of the aero engine can be considered to be a second-order element, the embedding dimension is set as p = 2 [36]. To avoid affecting the weights in the diagnosis system, the measurements acquired from the sensors are normalized into [−1, 1]. The thresholds DC and FC are determined as a compromise between the false alarm rate and the correct detection rate: they are selected from a candidate domain, and the best combination of DC and FC is chosen manually considering the false alarm rate and the correct detection rate. For the OS-ELM case, the thresholds are selected as DC = 0.0145 and FC = 0.0255; for the MOS-ELM case, the thresholds are selected as DC = 0.0140 and FC = 0.0245; and the correction factor is set as m = 2. The magnitudes of the bias fault and the drift fault simulated in this section are 3% and 0%-4%, respectively.
The diagnosis results based on the OS-ELM and MOS-ELM algorithms for the $N_L$ sensor are illustrated in Figures 3 and 4, respectively. The fault level in Figures 3b and 4b is defined as follows: 0, no fault; 1, drift fault; 2, bias fault. The drift fault takes place during the interval 5-9 s, and the bias fault occurs during the interval 34-38 s. The reconstruction value given by MOS-ELM is more accurate than that given by OS-ELM, and the lower accuracy of OS-ELM tends to lead to false alarms. With the prediction value serving as the analytical redundancy, the reconstruction value $\hat{y}_k^i$ can effectively approximate the real value, ensuring the approximate validity of the sensor signals even if a bias or drift fault happens. Hence, the right commands can be produced in accordance with the control law, and the reliability and safety of aero engines are enhanced. Figure 5 illustrates the prediction bias of the $N_L$ sensor by OS-ELM and MOS-ELM. MOS-ELM tends to produce a smaller prediction bias than OS-ELM, owing to its proper handling of the timeliness.

Statistical Performance for Different Fault Mode
Five kinds of single fault modes, $\{N_L\}$, $\{N_H\}$, $\{T_{22}\}$, $\{P_{22}\}$ and $\{T_3\}$, are considered in this subsection. The single fault mode $\{N_L\}$ denotes that $N_L$ breaks down alone, and the other four single fault modes follow suit. In addition, the case in which more than one sensor breaks down at the same time is not overlooked. Two kinds of dual fault modes, $\{N_L, P_{22}\}$ and $\{N_H, T_3\}$, are considered here. The dual fault mode $\{N_L, P_{22}\}$ denotes that $N_L$ and $P_{22}$ break down at the same time, and $\{N_H, T_3\}$ represents that $N_H$ and $T_3$ break down simultaneously. To obtain robust statistical results, 20 different trials are carried out for each instance. Furthermore, to measure the learning performance, the root-mean-square error (RMSE) is defined as follows [37]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{\#\mathrm{Testing}} \sum_{i=1}^{\#\mathrm{Testing}} \left( t_i - y_i \right)^2},$$

where #Testing denotes the number of testing samples, and $t_i$ and $y_i$ denote the predicted and actual values of the $i$-th testing sample. In general, a smaller RMSE implies a better prediction accuracy for a learning algorithm. The average prediction RMSE for drift sensor faults and bias sensor faults is given in Table 2. The proposed MOS-ELM has a lower prediction RMSE and better generalization performance than OS-ELM in sensor fault diagnosis for aero engines, in both the bias fault and drift fault cases, which can be attributed to properly tackling the timeliness of the sensor fault diagnosis system. As expected from the theory, the proposed MOS-ELM has no advantage in training time. For each instance, the training time for the five learning machines is less than 4 s with either the proposed MOS-ELM algorithm or the OS-ELM algorithm, for a simulation duration of 40 s. Thus, the real-time performance demand of an aero engine control system is completely met. In addition, to measure the performance of the sensor fault diagnosis system, two performance indexes, the correct detection rate and the false alarm rate, are considered here. The average correct detection rate for drift sensor faults and bias sensor faults is given in Table 3. Figure 6a,b illustrates the false alarm rate for drift sensor faults and bias sensor faults, respectively. As observed from Table 3 and Figure 6, the MOS-ELM algorithm has a higher correct detection rate and a lower false alarm rate than the OS-ELM algorithm. As a result of the coupling among the different sensors, the dual fault mode $\{N_L, P_{22}\}$ cannot be detected by the OS-ELM algorithm, while it can be detected well by the MOS-ELM algorithm.
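The RMSE used above, and its average over repeated trials, can be computed as in the short sketch below. The function names and the sample values in the usage lines are illustrative only and are not results from the paper.

```python
import math


def rmse(predictions, targets):
    """Root-mean-square error between predicted and actual values."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(predictions))


def average_rmse(trials):
    """Mean RMSE over several (predictions, targets) trials."""
    return sum(rmse(p, t) for p, t in trials) / len(trials)


# Made-up values, for illustration only.
single = rmse([0.0, 0.0], [3.0, 4.0])
mean_of_two = average_rmse([([1.0, 2.0], [1.0, 2.0]), ([0.0, 0.0], [3.0, 4.0])])
```

Averaging over trials in this way corresponds to the 20 repeated runs carried out for each instance.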

Conclusions
In many real online learning applications, the sequentially arriving data usually have the characteristic of timeliness. OS-ELM trains the neural network chunk by chunk, but it cannot deal with the timeliness of the data chunks. Based on the OS-ELM algorithm, we propose a novel algorithm, MOS-ELM, which introduces the concept of the memory principle into OS-ELM to improve the learning process by decaying the validity of outdated data chunks, which may otherwise mislead the subsequent learning process. Thus MOS-ELM is able to learn sequentially, as the OS-ELM algorithm does, while also dealing properly with the timeliness of the data chunks.
Simulations on benchmark databases show that, compared with OS-ELM, MOS-ELM performs better in generalization performance, stability, and prediction accuracy when the tested problems possess the characteristic of timeliness. On this basis, MOS-ELM is employed in detecting, isolating, and reconstructing the faulty sensor signals of aero engines. The experimental results show that MOS-ELM has better predictability and generalization performance than OS-ELM in diagnosing a sensor fault. Furthermore, the feasibility and effectiveness of the MOS-ELM-based sensor fault diagnosis system imply that it is a promising approach for enhancing the reliability and safety of the aero engine control system.

Figure 1. Structure of sensor fault diagnosis and reconstruction system for aero engines. The system is composed of the prediction module and the fault diagnosis logic; the vector of measurements $y_k = [N_L, N_H, T_{22}, P_{22}, T_3]^T_k$ includes the low pressure rotor speed $N_L$ and the high pressure rotor speed $N_H$.

Here $\hat{y}_k^i$ denotes the $i$-th reconstruction value of $y_k$ and $m > 1$ is a correction factor. If the bias fault is detected, the measured value $y_k^i$ does not contain any effective information. Then the reconstruction value $\hat{y}_k^i$ is completely determined by the prediction value $t_k^i$, and the faulty measured value $y_k^i$ is isolated. In addition, if the drift fault is detected, we use the above correction strategy to properly utilize the information of the measurement and the prediction.

Figure 2. Diagram of the prediction module of the fault diagnosis system based on the proposed online sequential extreme learning machine with memory principle (MOS-ELM) algorithm.
Acceleration and deceleration of the aero engine are simulated with the throttle lever angle in the interval 30° to 70°. Because the main dynamic characteristics of the aero engine can be considered to be a second-order element, the embedding dimension is set as $p = 2$ [36]. To avoid affecting the weights in the diagnosis system, the measurements acquired from sensors are normalized into $[-1, 1]$.
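The preprocessing described here can be sketched as follows, under two assumptions that the text does not spell out: that normalization is a min-max scaling with known signal bounds, and that an embedding dimension $p$ means the $p$ most recent samples form the predictor input for the next sample. The function names `normalize` and `embed` are hypothetical.

```python
def normalize(signal, lo, hi):
    """Min-max scale values from the range [lo, hi] into [-1, 1].

    Assumption: the bounds lo and hi of the sensor signal are known in advance.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in signal]


def embed(signal, p=2):
    """Build training pairs with embedding dimension p.

    Each input holds the p most recent samples and the target is the
    sample that follows them (assumed form of the prediction task).
    """
    inputs = [list(signal[k - p:k]) for k in range(p, len(signal))]
    targets = [signal[k] for k in range(p, len(signal))]
    return inputs, targets
```

For example, with $p = 2$ a sequence of five samples yields three input-target pairs, each input holding the two samples that precede its target.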


Figure 3. (a) Reconstruction of $N_L$ sensor by OS-ELM; (b) Detected fault level of $N_L$ sensor by OS-ELM.

Figure 4. (a) Reconstruction of $N_L$ sensor by MOS-ELM; (b) Detected fault level of $N_L$ sensor by MOS-ELM.


Figure 5. Predicted bias of $N_L$ sensor by OS-ELM and MOS-ELM.

Figure 6. (a) Comparison of false alarm rate for drift fault; (b) Comparison of false alarm rate for bias fault. Note: the fault mode codes a, b, c, d, e, f, g correspond to the fault modes considered.

Table 1. Performance comparison between MOS-ELM and OS-ELM on benchmark databases.
The benchmark regression databases considered here are Auto-MPG, which has 338 training and 168 testing data, and Housing, with 338 training and 168 testing data. The software environment for all simulations is MATLAB 7.11 (MathWorks, Natick, MA, USA) and the hardware environment is a general PC with a clock frequency of 2.5 GHz. The usual sigmoid function g(x) = 1/(1 + exp(−x)) is used as the activation function in all simulations, and the chunk size is set to 10. For each database, 50 trials were carried out, and the average results are given in Table 1.
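The chunk-by-chunk training used in these simulations can be illustrated with the following sketch of the baseline OS-ELM update, with the sigmoid activation and a chunk size of 10. It covers only plain OS-ELM; the memory principle of MOS-ELM is not included. The class name `OSELM`, the helper `fit_in_chunks`, the number of hidden nodes, the weight range and the small ridge term `reg` (added here for numerical safety) are assumptions of this sketch, not settings reported in the paper.

```python
import numpy as np


def sigmoid(x):
    """Activation g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


class OSELM:
    """Baseline OS-ELM with random hidden nodes (illustrative sketch)."""

    def __init__(self, n_inputs, n_hidden, reg=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer parameters are drawn once and never trained.
        self.W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.reg = reg  # assumed ridge term, not part of the paper's setup
        self.P = None
        self.beta = None

    def hidden(self, X):
        """Hidden-layer output matrix H for a batch of inputs X."""
        return sigmoid(X @ self.W + self.b)

    def init_fit(self, X0, T0):
        """Initial batch phase: regularized least-squares solution."""
        H = self.hidden(X0)
        self.P = np.linalg.inv(H.T @ H + self.reg * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def update(self, X, T):
        """Sequential phase: recursive least-squares update for one chunk."""
        H = self.hidden(X)
        K = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta


def fit_in_chunks(model, X, T, n_init, chunk_size=10):
    """Train on the first n_init samples, then feed the rest chunk by chunk."""
    model.init_fit(X[:n_init], T[:n_init])
    for start in range(n_init, len(X), chunk_size):
        stop = start + chunk_size
        model.update(X[start:stop], T[start:stop])
    return model
```

Because every chunk is weighted equally in this recursion, the output weights after the last chunk coincide with the regularized batch least-squares solution on all data seen so far. This equal weighting of old and new chunks is the behaviour that the memory principle is intended to change.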

Table 2. Comparison of prediction RMSE via OS-ELM and MOS-ELM.
Note: RMSE: root-mean-square error; SD: standard deviation.

Table 3. Comparison of correct detection rate via OS-ELM and MOS-ELM.
Note: CDR: correct detection rate.