Parsimonious Network based on Fuzzy Inference System (PANFIS) for Time Series Feature Prediction of Low Speed Slew Bearing Prognosis

In recent years, rotating parts such as bearings and gears have continuously supported manufacturing lines in producing consistent output quality. Because of their critical role, the breakdown of these components can significantly reduce the production rate. Proper condition-based monitoring (CBM) is one of the few ways to maintain and monitor rotating systems. Prognosis, one of the major tasks in CBM, predicts and estimates the remaining useful life of the machine and has attracted significant interest in recent decades. This paper presents a literature review of prognosis approaches from papers published in the last decade. The prognostic approaches are described comprehensively to give a better idea of how to select an appropriate prognosis method for specific needs. An advanced predictive analytics method, namely the Parsimonious Network Based on Fuzzy Inference System (PANFIS), is proposed and tested on low speed slew bearing data. PANFIS differs from conventional prognostic approaches in that it supports online lifelong prognostics without requiring a retraining or reconfiguration phase. The method is applied to normal-to-failure bearing vibration data collected over 139 days to predict the time-domain features of slew bearing vibration signals. The performance of the proposed method is compared with established methods such as ANFIS, eTS, and Simp_eTS. The results suggest that PANFIS performs better than the other methods.


Introduction
Prognosis methods are commonly applied to predict the lifetime of rotating components, which can generally be divided into two stages. The first stage is the normal zone, where no significant deviation from the normal operating state is observed. The second stage is the abnormal zone; this stage is initiated by a potential failure that progressively develops into an actual failure [1]. It is in the second stage that prognosis methods are usually applied to predict unexpected failures on a time basis from the incipient or impending damage, using either event data or condition monitoring (CM) data. The following sections present (1) a classification of prognosis approaches and (2) a review of prognosis methods for rolling element bearings.

Classification of prognosis approaches
Several reviews of prognosis approaches for rotating machinery exist in the literature [15][16][17][18][19][20][21][22]. For example, Lee et al. [15] reviewed prognosis methods for critical components such as bearings, gears, shafts, pumps and alternators. The authors classified the prognosis approaches into three groups, namely model-based, data-driven and hybrid prognosis approaches. However, the classification does not include the complete set of methods for each prognosis approach. In their work, two methods are classified as model-based, i.e. the alpha-beta-gamma tracking filter and the Kalman filter, while neural networks (NN), fuzzy logic and decision trees are classified as data-driven.
Another review paper by Jardine et al. [16] presented a clearer classification of prognosis approaches, but the review placed more emphasis on rotating machinery diagnostics than on machinery prognostics. The authors classified the prognosis approaches into three groups: statistical, artificial intelligence (AI) and model-based approaches. The statistical approaches include statistical process control (SPC), logistic regression, autoregressive and moving average (ARMA), proportional hazard model (PHM), proportional intensity model (PIM) and hidden Markov model (HMM). Among AI techniques, the artificial neural network (ANN) and its sub-classes, such as self-organising neural networks, dynamic wavelet neural networks, recurrent neural networks and back propagation neural networks, as well as neural-fuzzy inference systems, are still commonly used in AI prognostics. Among model-based approaches, defect propagation models via mechanistic modelling and crack growth rate models are the commonly used methods.
A popular review paper on prognosis methods was presented by Heng et al. [20]. The authors classified the methodologies for predicting rotating machinery failure into two groups, namely physics-based and data-driven prognosis models. Papers on physics-based prognostics that use Paris' formula are still found to be dominant [20]. Other methods, such as finite element analysis (FEA) to calculate the stress and strain field and the Forman law of linear elastic fracture mechanics, are also classified as physics-based approaches. In line with the two review papers mentioned previously, ANN and its variants are currently the most commonly used methods in the data-driven prognosis class. Other methods such as fuzzy logic, regression analysis, particle filtering, recursive Bayesian techniques and HMM are also included in data-driven prognosis methods. However, a number of the data-driven methods presented in [20] need further sub-classification into artificial intelligence or statistical approaches.
A recent review by Lei et al. [23] states that a machinery prognostic method generally consists of four technical processes, i.e. data acquisition, health indicator (HI) construction, health state (HS) division, and RUL prediction. They also explain that existing research work and literature reviews have converged on these four processes, especially the last one. The paper presents a systematic review that covers the four technical processes comprehensively.
Table 1 lists the literature that reviews prognosis approaches, together with the year of publication and the classification groups used. Although the merits and demerits of prognosis approaches have been presented in Lee et al. [15] and Heng et al. [20], gaps remain in prognostics reviews, namely (1) the classification of prognosis approaches remains unclear: review papers present different classifications, as seen in Table 1, and the methods assigned to each approach sometimes overlap depending on the authors; and (2) it is unclear in which applications a given prognosis approach can be used: the prognosis literature provides little information to help typical industry users select an appropriate approach or method for their specific needs. This paper aims to bridge these gaps by providing another classification of prognosis approaches. It includes a basic prognosis method selection for specific needs (shown in Figure 1) and the classified methods of each prognosis approach as presented in Section 2.

Table 1. Reviews of prognosis approaches (rows 8 to 12).

| No. | Authors | Year | Classification of prognosis approaches |
|---|---|---|---|
| 8 | Heng et al. [20] | 2009 | Physics-based prognostics models; data-driven prognostics models |
| 9 | Peng et al. [21] | 2010 | Physical model-based methodology; knowledge-based methodology; data-driven methodology; combination model |
| 10 | Dragomir et al. [22] | 2009 | Model-based approaches; data-driven approaches |
| 11 | Hines et al. [25] | 2008 | Time-to-failure data-based prognostics; stress-based prognostics; effect-based prognostics |
| 12 | Kim [26] | 2010 | Data-driven approaches; model-based approaches; reliability-based approaches |

Figure 1. A general prognosis approach or method selection for specific needs.

Prognosis methods for rolling element bearings
Based on the literature review presented in Table 1, four prognosis approaches have been adopted in this paper: (1) model-based approaches; (2) reliability-based methods and probability models; (3) data-driven approaches; and (4) combined data-driven approaches and reliability-based methods. The prognosis methods of each approach are reviewed and presented in Table 2. Table 2 also presents the degradation parameters or extracted features that were applied in a particular method or algorithm. It can be seen that RMS and kurtosis are the most commonly used features in prognosis methods. It is worth noting that the methods reviewed in this paper focus only on rolling element bearing prognostics.

Table 2. Prognosis methods for rolling element bearings.

| No. | Approach | Sub-class | Method or algorithm | Features used |
|---|---|---|---|---|
| 1 | Model-based approaches | Physics-based prognostics models | Paris' formula [27,28] | N/A |
| | | | Stiffness-based prognostics model [29] | N/A |
| | | State space-based methods | Kalman filter [30] | N/A |
| | | | Particle filter [31] | RMS and envelope acceleration |
| 2 | Reliability-based methods and probability models | | Gaussian process models [32] | Rényi entropy |
| | | | PIM [33] | Kurtosis |
| | | | PCM [34] | Principal features |
| | | | Stochastic model [35] | N/A |
| | | | PHM [36,37] | N/A |
| | | | Weibull distribution [38] | N/A |
| | | | HMM [39] | RMS |
| | | | WPD and HMM [40] | Peak-to-peak, energy and kurtosis |
| 3 | Data-driven approaches | Artificial intelligence (AI) methods | ANN [41,42] | N/A |
| | | | Fuzzy logic [43] | N/A |
| | | | Genetic algorithm (GA) [44] | Monitoring index |
| | | Regression methods | ALE and ARIMA [45] | RMS, skewness, kurtosis |
| | | | Recursive least square (RLS) [28] | RMS |
| | | | Dempster-Shafer regression [46] | RMS and envelope acceleration |
| | | | ARMA/GARCH model [47] | RMS and envelope acceleration |
| | | Combined AI and regression methods | RVM and logistic regression [ [37] | Peak acceleration and RMS |
| | | | SVM and survival probability [50] | Kurtosis |
| | | | RVM and survival probability [51] | Kurtosis |
| | | | ANN and Weibull distribution [52] | RMS, kurtosis and entropy estimation |

Review on model-based approaches
Methods included in model-based approaches require an accurate mathematical model to be developed. They also use residuals as features, where residuals are the outcomes of consistency checks between the measurements of a real system and the outputs of a mathematical model [26]. Model-based prognosis approaches in this paper are divided into two classes, namely physics-based prognosis models and state space-based methods.

Physics-based prognostics models
The physics-based approaches assume that accurate mathematical or physical models of the monitored system or machine are available. They provide a technically comprehensive approach that has traditionally been used to understand failure mode progression. A common model used in physics-based prognosis is the Paris' law equation. Physics-based methods can predict the failure progression accurately if an appropriate model is used. However, a limitation of physics-based approaches is their inflexibility: a particular model can only be applied to specific types of components [53]. An example of a model-based prognosis method in the literature is the work of Li et al. [27], who present a defect propagation model based on mechanistic modelling to estimate the RUL of rolling element bearings.
Several works have been conducted on slew bearing prognosis. Potočnik et al. [54] calculate the maximal contact force by means of an analytical expression of the Hertzian contact theory, and then use a strain-life model to calculate the fatigue life on the basis of the subsurface stresses. Glodez et al. [55] compare two methods for calculating the fatigue life of a slewing bearing: the strain-life approach and the stress-life approach based on ISO 281 [56]. Their results show that the stress-life approach is the more precise method for calculating bearing fatigue lifetime.

State space-based methods
Besides physics-based methods, state space-based methods such as the Kalman filter and the particle filter are also considered part of model-based prognosis methods because they build a dynamic model of the system being analysed to predict a future point in time.
Kalman filtering (KF) handles a signal embedded in noise and can be regarded as a sequential minimum mean square error (MMSE) estimator of the signal [26].
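As a minimal sketch of the sequential MMSE idea, the following scalar Kalman filter tracks a slowly drifting health indicator observed in noise. The random-walk state model, the noise variances `q` and `r`, the synthetic data and the function name `kalman_1d` are illustrative assumptions and are not taken from [26] or [30].

```python
import random

def kalman_1d(measurements, q=1e-3, r=0.04, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed in noise.

    q: assumed process noise variance, r: assumed measurement noise variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the random-walk model keeps the state, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Synthetic, hypothetical indicator: slow upward drift plus measurement noise.
rng = random.Random(0)
truth = [1.0 + 0.005 * t for t in range(200)]
noisy = [v + rng.gauss(0.0, 0.2) for v in truth]
filtered = kalman_1d(noisy, x0=noisy[0])

print("raw MSE:", mse(noisy, truth))
print("filtered MSE:", mse(filtered, truth))
```

The gain `k` balances trust between the model prediction and the new measurement; a larger `q` makes the filter follow the data more closely at the cost of less smoothing.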
Particle filters, or Monte Carlo methods for nonlinear filtering, are based on sequential versions of the importance sampling paradigm. This technique simulates samples under an instrumental distribution and then approximates the target distributions by weighting these samples with appropriately defined importance weights. The particle filter offers the great advantage of not being subject to the assumptions of linearity, Gaussianity and stationarity [31]. A particle filter for rolling element bearing prognostics is presented in [31].
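The sample-weight-resample cycle described above can be sketched with a bootstrap particle filter. This is only an illustration under stated assumptions: the random-walk state model, the Gaussian likelihood, the noise levels and the function name `bootstrap_particle_filter` are hypothetical and do not reproduce the method of [31].

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=0.05, obs_std=0.2, seed=1):
    """Bootstrap particle filter for a scalar random-walk state."""
    rng = random.Random(seed)
    particles = [rng.gauss(observations[0], obs_std) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Simulate samples under the instrumental distribution (here the prior).
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Importance weights from the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Weighted mean approximates the posterior mean of the state.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample in proportion to the weights to limit weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Synthetic, hypothetical degradation indicator with measurement noise.
data_rng = random.Random(0)
truth = [1.0 + 0.002 * t for t in range(150)]
observed = [v + data_rng.gauss(0.0, 0.2) for v in truth]
pf_estimates = bootstrap_particle_filter(observed)

print("raw MSE:", mse(observed, truth))
print("particle filter MSE:", mse(pf_estimates, truth))
```

Because the weights only require evaluating a likelihood, the Gaussian terms here could be replaced by any nonlinear or non-Gaussian model without changing the structure of the loop.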

Review on reliability-based methods and probability models for prognosis
As mentioned above, prognostics is used to predict how much time is left before a failure occurs given the current machine condition and past operation profile, which is commonly called the remaining useful life (RUL). In some situations, especially when a fault is catastrophic (e.g., in a nuclear power plant), it is more desirable to predict the chance that a machine operates without a fault or a failure up to some future time given the current machine condition and past operation profile [16]. This issue can be addressed using failure-based reliability or probability prognosis models. Failure-based reliability is used to estimate the lifetime distribution and its parameters when sufficient complete and/or censored failure time data exist. If prior knowledge of the lifetime distribution exists for similar components, the lifetime is often assumed to follow the same distribution as that of the similar component [26]. For example, Goode et al. [57] separate the whole machine life into two intervals: the I-P (Installation-Potential failure) interval, in which the machine is running normally, and the P-F (Potential failure-Functional failure) interval, in which the machine condition has a problem. Based on two Weibull distributions assumed for the I-P and P-F intervals, failure prediction is derived for the two intervals and the RUL is estimated.
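The "chance of operating without failure up to some future time" can be illustrated with the conditional reliability of a two-parameter Weibull distribution. The sketch below is not the two-interval model of Goode et al. [57]; the shape and scale values and the function names are hypothetical.

```python
import math

def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape) for a two-parameter Weibull."""
    return math.exp(-((t / scale) ** shape))

def conditional_reliability(t_now, horizon, shape, scale):
    """Probability of surviving `horizon` more time units given survival to t_now."""
    return (weibull_reliability(t_now + horizon, shape, scale)
            / weibull_reliability(t_now, shape, scale))

# Hypothetical wear-out behaviour (shape > 1): older units are less likely
# to survive the same additional horizon.
young = conditional_reliability(t_now=10.0, horizon=10.0, shape=2.0, scale=100.0)
old = conditional_reliability(t_now=100.0, horizon=10.0, shape=2.0, scale=100.0)
print("P(survive 10 more | age 10):", young)
print("P(survive 10 more | age 100):", old)
```

With `shape=1.0` the Weibull reduces to the exponential distribution, and the conditional reliability no longer depends on the current age.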

Proportional hazard model
Proportional hazards models (PHMs) are commonly used in failure prediction and reliability analysis. The method was proposed by Cox in 1972 [58] and was first introduced in clinical studies to characterise disease progression in existing cases by revealing the importance of covariates [36].
It is the most popular model for survival analysis because of its simplicity: it is not based on any assumptions concerning the nature or shape of the survival distribution [36]. PHMs assume that the hazard changes proportionately with the covariates and that the proportionality constant remains the same at all times [20]. The method has been used in bearing prognostics [36,37]. A review of the existing literature on the PHM is presented in [59]. A PHM usually cannot be used as a stand-alone prognostic method and is typically combined with an AI method: the PHM is used to build the degradation model, and an AI method such as SVM is then used to predict the degradation [36,37].
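The proportionality assumption can be made concrete with the usual Cox form h(t | z) = h0(t) exp(β·z). In the sketch below the Weibull baseline hazard, the coefficients and the covariate values are hypothetical choices for illustration only; the Cox model itself leaves the baseline unspecified.

```python
import math

def baseline_hazard(t, shape=2.0, scale=100.0):
    """Assumed Weibull baseline hazard h0(t); only an illustrative choice."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def ph_hazard(t, covariates, coefficients):
    """Proportional hazard h(t | z) = h0(t) * exp(beta . z)."""
    linear = sum(b * z for b, z in zip(coefficients, covariates))
    return baseline_hazard(t) * math.exp(linear)

beta = [0.8, 0.3]        # hypothetical regression coefficients
healthy = [0.2, 3.0]     # hypothetical covariates, e.g. RMS and kurtosis
degraded = [1.0, 5.0]

# The hazard ratio between two units does not depend on time.
ratio_early = ph_hazard(10.0, degraded, beta) / ph_hazard(10.0, healthy, beta)
ratio_late = ph_hazard(80.0, degraded, beta) / ph_hazard(80.0, healthy, beta)
print(ratio_early, ratio_late)
```

The constant ratio is exactly the "proportionality constant remains the same at all times" property stated above.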

Proportional covariates model
A proportional covariates model (PCM) is proposed by Sun et al. [34]. PCM can be used to estimate the hazard functions of mechanical components in cases of sparse or no historical failure data provided that the covariates are proportional to the hazard.

Reliability model
Heng et al. [60] introduce an intelligent reliability model called the intelligent product limit estimator (PLE), which was able to include suspended CM data in machinery fault prognosis. The accurate data modelling of suspended data has been found to be of great importance, since in practice machines are rarely allowed to run to failure and data are commonly suspended. The model consists of a feed-forward neural network (FFNN) whose training targets are asset survival probabilities estimated using a variation of the Kaplan-Meier estimator and the true survival status of historical units.
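To show how suspended (censored) units enter a survival estimate, the following sketch implements the standard Kaplan-Meier product limit estimator. It is not the variation used in [60], and the run times and failure flags are hypothetical.

```python
def kaplan_meier(durations, failed):
    """Return [(time, survival_probability)] at each observed failure time.

    durations: time at which each unit failed or was suspended.
    failed: True if the unit failed, False if its data were suspended.
    """
    events = sorted(zip(durations, failed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = 0
        removed = 0
        # Group all units that leave the risk set at the same time.
        while i < len(events) and events[i][0] == t:
            deaths += 1 if events[i][1] else 0
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Suspended units leave the risk set without lowering the survival.
        at_risk -= removed
    return curve

# Hypothetical history: six units, two of them suspended before failure.
durations = [5, 8, 8, 12, 15, 20]
failed = [True, True, False, True, False, True]
curve = kaplan_meier(durations, failed)
for t, s in curve:
    print(t, round(s, 4))
```

Suspended units still contribute to the number at risk up to their suspension time, which is why discarding them would bias the survival estimate downwards.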

Proportional intensity model
Vlok et al. [33] utilise statistical residual life estimate (RLE) on roller bearings to study changes in diagnostics measurements of vibration and lubrication levels which can influence bearing life. RLEs are based on proportional intensity models (PIMs) and mainly used for non-repairable systems utilising historic failure data and the corresponding diagnostic measurements.

Stochastic model
A method for predicting the residual life of rolling element bearings based on a stochastic process, namely the gamma process, is presented in [35].

Weibull distribution
In slew bearing cases, several reliability methods for prediction have also been studied. Yang et al. [38] present a reliability prediction approach for slew bearings based on the Weibull distribution. Hai et al. [61] develop a method for evaluating the rolling contact fatigue (RCF) reliability of slew bearings, which replaces the reliability factor a1 from ISO 281 with the Lundberg-Palmgren theory.

Hidden Markov model
The use of hidden Markov models (HMMs) in bearing fault prognosis is investigated by Zhang et al. [39]. In an HMM, a system is modelled as a stochastic process in which the next state depends only on the current state and has no causal connection with earlier states [20]. It is assumed that the state transition time of the estimated vectors follows some multivariate distribution. Once the distribution is determined, the conditional probability distribution of a distinct state transition can be estimated [21]. Tallian [62] presents a rolling bearing life prediction model using statistical lifetime determination.

Review on data-driven prognosis approaches
Data-driven approaches are derived directly from routine condition monitoring (CM) data of the monitored system (e.g. temperature, vibration, oil debris, current, etc.). Data-driven approaches can be regarded as degradation-based methods because they focus on measures of component degradation, not on failure data, to assess the remaining life of a component. In other words, data-driven approaches rely on the availability of run-to-failure data and require a suitable extrapolation of the damage progression to estimate the RUL. One major advantage of these techniques is the simplicity of their calculations [20], because they do not require mechanistic or physical knowledge of the system or component being analysed. Data-driven approaches often provide a more readily available solution in practical cases, probably because a model derived from data is easier to obtain than an accurate model of a system or component.
A main drawback of data-driven approaches is their dependency on the quality of the monitored data.
In data-driven approaches, the proper selection of a trending parameter or feature is the key issue in implementing the prognosis. The selection criteria for such a parameter should include the diagnosis ability, sensitivity, consistency and the amount of calculation required [41]. In this paper, data-driven approaches are divided into three sub-categories: (i) artificial intelligence (AI) methods; (ii) regression methods; and (iii) combined AI and regression methods.

AI methods
The first data-driven sub-class (AI methods) is based on machine-learning techniques for prognostics. AI methods predict the selected features that correlate with the failure progression based on a learning or training process. The methods rely on past patterns of degradation to project future degradation. The features used in AI methods are extracted from CM data, e.g. vibration signals.
The more CM data are used in the training process, the more accurate the resulting model, but the computational time also increases. Because AI methods use experimental data for training in order to build a prediction model, they are highly dependent on the quantity and quality of the measured data. In general, AI methods adopt a one-step or multi-step ahead prediction technique in order to predict the future state. A review of AI methods for prognostics can be found in [63]. Several AI methods have been developed over recent decades. The artificial neural network (ANN) and its variants, such as the self-organizing map (SOM) and the back propagation neural network (BPNN), are the most commonly used [41,42,64,65,66]. Although ANN is the most commonly used method and has worked successfully in bearing prognosis applications, it has fundamental drawbacks in model development: users must decide how many hidden layers to include and how many processing nodes to use in each layer.
Another popular AI technique that is used for prognostics is the fuzzy logic technique. Fuzzy logic provides a language (with syntax and local semantics) into which one can translate qualitative knowledge about the problem to be solved. In particular, fuzzy logic allows the use of linguistic variables to model dynamic systems. These variables take fuzzy values that are characterized by a sentence and a membership function. The meaning of a linguistic variable may be interpreted as an elastic constraint on its value. These constraints are propagated by fuzzy inference operations. The resulting reasoning mechanism has powerful interpolation properties that in turn give fuzzy logic a remarkable robustness with respect to variations in the system's parameters and disturbances.
Other AI prognostics methods have also been applied to bearings; for example, learning vector quantization (LVQ) is used by Zhang et al. [67] to generate a sequence of codes for representing fault signatures in the model.

Regression methods
The second data-driven sub-class (regression methods) is based on time series analysis techniques for prognostics. Regression methods are useful if a reliable or accurate system model is not available. Data-driven prognosis approaches based on regression methods are used to determine the RUL. This is achieved by trending the trajectory of a developing fault and predicting the amount of time before it reaches a predetermined threshold level [15].
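The trend-and-threshold idea can be sketched with an ordinary least squares line fitted to a feature history and extrapolated to a failure threshold. The linear trend, the threshold value and the helper names below are assumptions made for illustration; the works reviewed in this section use richer regression models.

```python
def fit_line(times, values):
    """Ordinary least squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t

def remaining_time_to_threshold(times, values, threshold):
    """Time from the last sample until the fitted trend reaches the threshold."""
    slope, intercept = fit_line(times, values)
    if slope <= 0.0:
        return None  # no upward trend, so the threshold is never reached
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - times[-1])

# Hypothetical RMS history that grows by 0.1 per day from 0.5.
days = list(range(10))
rms = [0.5 + 0.1 * d for d in days]
rul = remaining_time_to_threshold(days, rms, threshold=2.0)
print("estimated days to threshold:", rul)
```

In practice the fit would be refreshed as new samples arrive, so the estimated remaining time is itself a time series.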
Niu and Yang [68] introduce the Dempster-Shafer regression for multi-step-ahead prediction of a methane compressor in a petrochemical plant. Using similar vibration data, Pham et al. [47] develop a forecasting method based on the ARMA/GARCH model.
Kosasih et al. [45] present the adaptive line enhancer (ALE) and auto-regressive integrated moving average (ARIMA). Jantunen [43] uses high-order regression functions to mimic bearing fault development and also to save trending data in a compact form.

Combined AI and regression methods
In combination of AI and regression methods, Caesarendra et al. Tran et al. [70] employ a multi-step-ahead regression technique and ANFIS to predict the trending data.

Review on combined data-driven method and reliability-based methods
In this method, the statistical methods are used when the AI prognostics methods require quantitative data measurements.
Widodo and Yang [50] develop an intelligent machine prognostics method using survival probability

Slew bearing test-rig and data acquisition
The run-to-failure data used in this paper were collected from a slew bearing test rig. The test rig was designed to replicate actual conditions in steel mill manufacturing, where the bearing operates at low rotational speed, under high load and in a dusty environment. Figure 2 shows the schematic of the slew bearing test rig, including the main drive gear reducer, the hydraulic load and how the bearing is attached. The detailed sensor placement is presented in Figure 3. Four accelerometers, two acoustic emission (AE) sensors and four temperature sensors were used during the experiment. Two IMI 608A11 ICP accelerometers with a sensitivity of 100 mV/g and a frequency range of 0.5 Hz to 10 kHz, and two IMI 626B02 ICP accelerometers with a sensitivity of 500 mV/g and a frequency range of 0.2 Hz to 6 kHz were used. The IMI 608A11 sensors were installed on the inner radial surface at 180 degrees to each other, and the IMI 626B02 sensors were attached to the axial surface at 180 degrees to each other.
Similar to the measurement in continuous rotation, these accelerometers were connected to a high speed PicoScope DAQ (PS3424). The IMI 626B02 ICP accelerometers were selected because of their minimum frequency of 0.2 Hz and because their sensitivity is higher than that of the IMI 608A11 ICP accelerometers. The vibration signal was acquired at a sampling rate of 4880 Hz.

A brand new three-axis slew bearing (two axial rows and one radial row) is used in this experiment. Each axial and radial row has dozens of rollers inside. The slew bearing is typically large in dimension and is usually used to support high axial and radial loads [71]. The bearing attachment to the test rig is shown in Figure 2.

Figure 3. A detailed sketch of accelerometers and AE sensors location. Labels: accelerometer sensor 1 (radial), accelerometer sensor 2 (axial), accelerometer sensor 3 (radial), accelerometer sensor 4 (axial).

Feature extraction
Nine time-domain features (RMS, variance, skewness, kurtosis, shape factor, crest factor, entropy, histogram upper and histogram lower) are extracted from the four vibration signals collected over 139 days. An example plot of the nine features from accelerometer 1 is presented in Figure 4. It can be seen from Figure 4 that not all features represent the degradation condition of the slew bearing. Kurtosis, variance and histogram lower are more sensitive to the bearing condition than the other features. The RMS, variance, kurtosis, histogram upper and histogram lower features show a sudden peak on day 90. This is because coal dust was inserted into the bearing, causing an incipient defect on the rollers and raceway.
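For reference, six of the nine features can be computed as in the sketch below. It uses common textbook definitions (population moments, non-excess kurtosis, shape factor as RMS over mean absolute value, crest factor as peak over RMS), which may differ in detail from those used in this study; entropy and the two histogram features are omitted because their definitions are not given here. The function name and test signals are hypothetical.

```python
import math

def time_domain_features(signal):
    """Compute six common time-domain features of one vibration record."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered) / n   # population variance
    std = math.sqrt(variance)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    mean_abs = sum(abs(x) for x in signal) / n
    peak = max(abs(x) for x in signal)
    return {
        "rms": rms,
        "variance": variance,
        "skewness": sum(c ** 3 for c in centered) / n / std ** 3,
        "kurtosis": sum(c ** 4 for c in centered) / n / std ** 4,
        "shape_factor": rms / mean_abs,
        "crest_factor": peak / rms,
    }

# A pure sine wave gives known reference values (kurtosis 1.5, crest sqrt(2)).
sine = [math.sin(2.0 * math.pi * k / 1000.0) for k in range(1000)]
sine_features = time_domain_features(sine)

# A single impulse, as produced by an incipient defect, raises the kurtosis.
spiky = list(sine)
spiky[100] += 10.0
spiky_features = time_domain_features(spiky)

print(sine_features)
print("kurtosis with impulse:", spiky_features["kurtosis"])
```

The comparison between the two signals illustrates why kurtosis is sensitive to impulsive events while RMS changes only modestly.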

Parsimonious network based on fuzzy inference system (PANFIS)
This section presents the working principle of PANFIS [11]. The multivariate Gaussian function producing the rule firing strength is expressed as follows:

$$R_i = \exp\left(-(X_n - c_i)\,\Sigma_i^{-1}\,(X_n - c_i)^T\right)$$

where $X_n$ is the input vector of the $n$-th observation, and $c_i$ and $\Sigma_i^{-1}$ are the center and the inverse covariance matrix of the $i$-th rule. Following [11], the normalization of the rule firing strength is performed as follows:

$$\lambda_i = \frac{R_i}{\sum_{j=1}^{C} R_j}$$

where $C$ is the number of fuzzy rules. The output of PANFIS results from the weighted average of the rule firing strength and the rule consequent as follows:

$$y_n = \sum_{i=1}^{C} \lambda_i\, y_i$$

where $y_i$ is the output of the consequent of the $i$-th rule.

The rule growing mechanism is inspired by the neuron growing mechanism of GGAP-RBF [9] and SAFIS [7]. The key difference, however, is that PANFIS extends the statistical contribution concept to the framework of the multivariate Gaussian function. Supposing that a hypothetical rule $(C+1)$ is created using the newest data point, the DS method evaluates its statistical contribution using the system error $e_n$, i.e. the difference between the target $T_n$ and the output $y_n$. If a new fuzzy rule is added, the center of the multivariate Gaussian function is set to the data sample of interest, while the diagonal elements of the inverse covariance matrix are set according to the ε-completeness principle, where ε is usually set at 0.6. The ε-completeness principle borrows from the seminal work on DFNN and GDFNN in [72] and [73]. With this setting, it has been mathematically confirmed that no data point has a membership degree less than ε. It is worth noting that the covariance matrix of the multivariate Gaussian function is vital to the success of PANFIS: values that are too large lead to averaging, while values that are too small lead to overfitting.
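The inference path (firing strength, normalization, weighted average) can be sketched as follows. The sketch assumes a firing strength of the form exp(-(x - c) Σ⁻¹ (x - c)ᵀ) and first-order linear rule consequents on an extended input [1, x]; the rule parameters, the dictionary layout and the function names are hypothetical and are not taken from [11].

```python
import math

def firing_strength(x, center, inv_cov):
    """Multivariate Gaussian firing strength of one rule."""
    d = [xj - cj for xj, cj in zip(x, center)]
    quad = sum(d[j] * inv_cov[j][k] * d[k]
               for j in range(len(d)) for k in range(len(d)))
    return math.exp(-quad)

def predict(x, rules):
    """Weighted average of the rule consequents; returns (output, weights)."""
    strengths = [firing_strength(x, r["center"], r["inv_cov"]) for r in rules]
    total = sum(strengths)
    normalized = [s / total for s in strengths]
    x_ext = [1.0] + list(x)  # extended input with a bias term (assumption)
    outputs = [sum(w * v for w, v in zip(r["weights"], x_ext)) for r in rules]
    return sum(l * y for l, y in zip(normalized, outputs)), normalized

# Two hypothetical rules with non-diagonal inverse covariance matrices,
# i.e. ellipsoidal clusters that are rotated with respect to the axes.
rules = [
    {"center": [0.0, 0.0], "inv_cov": [[4.0, 1.0], [1.0, 4.0]],
     "weights": [1.0, 0.5, -0.5]},
    {"center": [3.0, 3.0], "inv_cov": [[4.0, -1.0], [-1.0, 4.0]],
     "weights": [2.0, 0.0, 0.0]},
]

y_at_first_center, lam = predict([0.0, 0.0], rules)
print(y_at_first_center, lam)
```

At the center of the first rule its normalized firing strength is close to 1, so the output is dominated by that rule's consequent; between the clusters the output blends both consequents.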
Another situation may occur during the training process where a data sample induces only a minor conflict, that is, the rule growing condition is not satisfied. In this case the existing rule is adapted using (7) and (8), which allows stable adaptation because a cluster converges once it is occupied by a large number of supports. This, however, calls for a forgetting mechanism in the presence of concept drift, to improve the sensitivity of a highly populated cluster in accepting new training stimuli. The direct update formula of (8) is obtained from the rank-1 modification principle [5].
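The two ingredients mentioned above, a step size that shrinks with the number of supports and a direct rank-1 update of an inverse matrix, can be illustrated generically. The sketch below uses a running-mean center update and the Sherman-Morrison identity; it is not a reproduction of update formulas (7) and (8), and all names and values are hypothetical.

```python
def update_center(center, x, n_support):
    """Running-mean update: the step shrinks as the cluster gains supports."""
    return [c + (xj - c) / (n_support + 1) for c, xj in zip(center, x)]

def sherman_morrison_update(A_inv, u, v):
    """Return the inverse of (A + u v^T) from A_inv, without re-inverting."""
    n = len(u)
    Au = [sum(A_inv[i][j] * u[j] for j in range(n)) for i in range(n)]  # A^-1 u
    vA = [sum(v[i] * A_inv[i][j] for i in range(n)) for j in range(n)]  # v^T A^-1
    denom = 1.0 + sum(v[i] * Au[i] for i in range(n))
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]

# Center update: a sparsely populated cluster moves more than a crowded one.
moved_sparse = update_center([0.0, 0.0], [1.0, 1.0], n_support=1)
moved_crowded = update_center([0.0, 0.0], [1.0, 1.0], n_support=99)
print(moved_sparse, moved_crowded)

# Rank-1 inverse update: start from the identity and add u u^T.
identity = [[1.0, 0.0], [0.0, 1.0]]
u = [1.0, 2.0]
updated_inv = sherman_morrison_update(identity, u, u)
print(updated_inv)  # inverse of [[2, 2], [2, 5]]
```

Updating the inverse directly avoids a matrix inversion at every sample, which matters in a sample-wise online setting.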

Rule pruning scenario of PANFIS
PANFIS is equipped with a rule base simplification strategy, namely the Extended Rule Significance (ERS) method, which discards inconsequential rules. That is, rules which play little role during their lifespan can be detected and in turn pruned. In the realm of evolving intelligent systems (EIS), the rule pruning strategy is vital to alleviate the risk of overfitting and to improve the interpretability of rule semantics. The rule pruning scenario adopts the same principle as the DS method, which approximates the statistical contribution of fuzzy rules. In the ERS method, a rule that has a low output weight is considered inconsequential because it results in a small output which is negligible to the overall predictive output of PANFIS. A fuzzy rule is pruned when the pruning condition, controlled by the rule pruning threshold $g_2$, is met. The higher the rule pruning threshold, the more fuzzy rules are pruned during the training process, and vice versa. Because the ERS method and the DS method share a similar working principle, $g_1$ and $g_2$ are often selected close to each other.
PANFIS is equipped with a fuzzy set merging strategy which aims to coalesce highly overlapping fuzzy sets. Although two fuzzy rules may be well separated in the high-dimensional space, their fuzzy sets can still overlap after projection onto a one-dimensional axis. This situation often results in inconsistent rule semantics because fuzzy rules with similar fuzzy sets generate different rule conclusions. PANFIS utilizes the kernel-based metric principle for fuzzy set merging, which compares the center and width of Gaussian fuzzy sets in one joint formula [4]. It is expressed as:

$$S_{ker}(A, B) = e^{-|c_A - c_B| - |\sigma_A - \sigma_B|}$$

This formula has the interesting property that $S_{ker}(A, B) = 1$ when the two fuzzy sets have identical centers and widths. Two fuzzy sets are merged if the kernel-based metric exceeds a predefined merging threshold.
In the fuzzy set transformation, $r_i$ and $\Sigma_{ii}$ are respectively the Mahalanobis distance and the diagonal elements of the covariance matrix. This mechanism is executed once the training process is complete, in order to show the fuzzy rules to operators.

Adaptation of rule consequent
PANFIS utilizes the extended recursive least square (ERLS) method to update the rule consequent of the fuzzy rule. This approach differs from the original RLS method by the insertion of a constant that improves the asymptotic convergence of the weight vector. It is inspired by the work in [8], where the system error convergence and the weight vector convergence have been mathematically proven with the help of the Lyapunov stability criterion. The constant behaves like a binary function: it is activated when the approximation error $\hat{e}$ is greater than the system error $e$. It is worth mentioning that the approximation error refers to the error before the adaptation process, where PANFIS makes a one-step-ahead prediction, while the system error $e_n$, i.e. the difference between the target $T_n$ and the output $y_n$, corresponds to the error after the tuning process. In other words, the weight vector remains unchanged if the approximation error happens to be lower than the system error.
In the ERLS update, $Q_i$ and $L$ are the covariance matrix of the $i$-th rule and the Kalman gain, respectively. For the global learning scenario, the covariance matrix $Q$ is global, i.e. it embraces the covariance matrices of all fuzzy rules. Since PANFIS is evolving in nature (fuzzy rules can be dynamically added), the rule consequent of a newly added rule must also be initialized. The ERLS method can also be seen as a variation of the FWRLS method in [3], where the binary function is used to enhance the convergence of the adaptation process.
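For orientation, one step of the original RLS method, on which ERLS builds, can be sketched as below. The sketch deliberately omits the additional binary constant of ERLS and the fuzzy weighting of FWRLS; the initial covariance value, the synthetic data and the function name `rls_update` are hypothetical.

```python
import random

def rls_update(weights, Q, x, target):
    """One recursive least squares step.

    Returns the new weights, the new covariance matrix Q and the error
    measured before the update (the one-step-ahead prediction error).
    """
    n = len(x)
    Qx = [sum(Q[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Qx[i] for i in range(n))
    gain = [v / denom for v in Qx]  # Kalman gain L
    error = target - sum(w * xi for w, xi in zip(weights, x))
    new_weights = [w + g * error for w, g in zip(weights, gain)]
    # Q <- Q - L x^T Q; Q is symmetric, so x^T Q equals Qx transposed.
    new_Q = [[Q[i][j] - gain[i] * Qx[j] for j in range(n)] for i in range(n)]
    return new_weights, new_Q, error

# Hypothetical linear consequent recovered from noise-free streaming samples.
rng = random.Random(0)
true_w = [0.5, 2.0, -1.0]
w = [0.0, 0.0, 0.0]
Q = [[1e4 if i == j else 0.0 for j in range(3)] for i in range(3)]
for _ in range(50):
    x = [1.0, rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    t = sum(a * b for a, b in zip(true_w, x))
    w, Q, err = rls_update(w, Q, x, t)

print("estimated weights:", [round(v, 4) for v in w])
```

Each sample is used once and then discarded, which is what makes this family of updates suitable for single-pass learning.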

Time-series feature prediction
This section presents our numerical study on the slew bearing prognosis method using 139 data samples, which correspond to 139 daily records of vibration signals. Nine input features, namely RMS, variance, skewness, kurtosis, shape factor, crest factor, entropy, histogram upper and histogram lower, are extracted. Our simulation was carried out under two modes: direct and time-series.

Direct mode prediction
The direct mode prediction aims to study the correlation among input features, where eight features are used as inputs to predict the remaining one. The evolving and adaptive characteristic of PANFIS is illustrated in Figure 6, which shows the fuzzy rule evolution of PANFIS in the kurtosis feature problem. PANFIS starts its learning process from no rule at all, and fuzzy rules are then automatically created and pruned on the fly in accordance with the novelty of the data streams. PANFIS responds in a timely manner to changing characteristics of the system: it introduces a new rule at t=88, when a "spike" occurs in the kurtosis feature. PANFIS is compared against three prominent algorithms: eTS [3], simp_eTS [9] and ANFIS [2]. eTS and simp_eTS are counterparts of PANFIS which feature both structural and parameter learning in an online manner. This comparison is needed to confirm the learning performance of PANFIS with respect to similar algorithms. ANFIS is a pioneering fuzzy neural network (FNN) and is more traditional than PANFIS, eTS and simp_eTS. It adopts an offline learning scenario where the training process is repeated over multiple epochs; ten epochs are applied in our study. This comparison aims to demonstrate that although PANFIS works fully in the one-pass learning scenario, it produces comparable predictive quality. We also found that ANFIS suffers from the curse of dimensionality, notably when the grid partitioning method is used. Hence, we fix the number of rules to 2 in our simulations. The advantage of PANFIS is more obvious in the direct mode than in the time-series mode, as displayed in Table 2. It outperforms the other three algorithms in terms of RMSE, number of rules and number of fuzzy sets in almost all study cases.
This fuzzy rule is rather vague and cannot be associated directly with linguistic labels because of the absence of atomic clauses. This issue can be addressed using the fuzzy set transformation strategy (13), which leads to the traditional expression of the fuzzy rule as follows:

$X_n^1$ is close to ($c_{11} = 0.29$, $\sigma_{11} = 0.11$), $X_n^2$ (Skewness) is close to ($c_{12}$, $\sigma_{12}$), ...

This rule is more readable than (18) because each fuzzy set corresponds to a specific linguistic label.
This fuzzy rule is generated under the time-series mode.

Conclusions
A number of literature reviews on prognosis methods have been published in recent decades, and each review paper uses its own terminology for prognosis approaches. This paper aims to consolidate the prognosis approaches to provide a clearer understanding and guidelines for selecting a particular prognosis approach for specific needs. Following the review, a study on the data-driven prognosis approach has also been presented. A PANFIS-based method, which belongs to the data-driven prognosis approaches, is developed and presented in this paper. PANFIS provides a solution for online prognostic requirements because it features a fully open structure and operates in the single-pass learning mode. This trait makes it possible to handle non-stationary characteristics of the system in a sample-wise manner. PANFIS is tested on run-to-failure low speed slew bearing vibration data to predict the time-domain vibration features. A comparison of PANFIS with three other prominent algorithms, namely eTS, simp_eTS and ANFIS, is also presented. It is shown that PANFIS offers better prediction performance than the three methods.