ProgMachina: Feature Extraction and Processing Package for Prognostic Studies

Abstract: Prognostic studies of industrial systems essentially focus on health deterioration analysis, which has recently been oriented toward data analytics and learning systems. In general, real degradation phenomena suffer from complex drifted data in which degradation patterns are hidden and change over time. Accordingly, such a process requires a well-structured processing and extraction mechanism to reveal these patterns, which facilitates the transition to other model reconstruction and investigation tasks. In this context, to simplify data processing in the field, a complete software package is designed and grouped into a single function that is fully automated and does not require human intervention. The package, named ProgMachina (i.e., prognostic machine), provides a list of processed features from a life cycle that has passed through de-noising, filtering, outlier removal, and scaling processes to ensure data significance in terms of degradation. The package allows using a time window with a specific overlap to ensure that the scanning of all possible degradation patterns is properly done. Additionally, an exponential function is used to identify a corresponding health index of degraded signals. Besides, a set of well-known metrics is used to assess the degradation of extracted features. Data visualization and many previous experiments on machines show the effectiveness of such a methodology in terms of obtained prediction accuracy and degradation assessment. The package is designed with Matlab software and made available online to be exploited in similar fields.


Introduction
Nowadays, prognostic studies rely heavily on data analysis and learning systems for condition monitoring rather than on highly complex traditional physics-based modeling. Physics-based modeling is primarily needed when the systems under study are both safety-critical and financially expensive, rarely fail under working conditions, and cannot be subjected to real deterioration or accelerated aging laboratory experiments. However, physics-based modeling is also used for generative modeling and is hybridized with learning systems to ensure efficient predictions. In this case, the acquisition, extraction, and processing of run-to-failure data are crucial steps for data analysis and reconstruction of the learning model [1]. When it comes to building a learning model for system prognostics, run-to-failure data usually pose challenges of complexity and data drift: degradation patterns are hidden and buried under ever-changing noise and different distortion patterns resulting from harsh system operating conditions. Training a learning model with such data is likely to mislead the predictions and over-fit the model [2]. In this context, a well-structured feature extraction and processing methodology is urgently needed so that the data provide a reliable source of information for improving the performance of the learning model. In the literature, many paths have been proposed, most importantly denoising [3], extraction [4], and outlier removal [5]. Since these methodologies have proven necessary for progressive degradation analysis in prognostics studies, the main goal of this paper is to combine them into a single, complete package that facilitates such a complex process. To this end, this paper introduces ProgMachina, a full package designed specifically to process run-to-failure data features through several important steps. Each of these steps is used to uncover and extract degradation patterns from the raw data of entire life cycles. The package also produces an exponentially deteriorating health index for the intended life cycle.
This paper is organized as follows: Section 2 presents the package description, its main features, and its relationship with run-to-failure data, along with some illustrative examples. Section 3 is dedicated to the impact of this package on prognostics studies, while the conclusion discusses the limits and future improvements of the package.

ProgMachina Package Description
ProgMachina is a function designed in Matlab software to deliver well-processed run-to-failure data with a corresponding health index (HI), ready to feed a learning system for training and evaluation. Table 1 gives further details about the metadata of the package. ProgMachina accepts a run-to-failure dataset per life cycle (i.e., a single degradation unit from normal operating conditions to a complete failure of the system), organized as a matrix of size (observations, channels), where the channels are different sensor measurements, and uses it to generate an extracted and well-prepared list of features and the corresponding HI. Following previous literature [3][4][5], ProgMachina applies extraction, denoising, and outlier removal as the main steps for uncovering hidden degradation patterns in the provided life cycles, while smoothing, filtering, and scaling further enhance the representations and build strong connections and correlations between data samples. Accordingly, this section is dedicated to exploring these steps in detail.

Features Extraction
A set of features widely used in the literature is included in ProgMachina. These features include mean, standard deviation (Std), skewness, kurtosis, peak to peak, root mean square (RMS), crest factor, shape factor, impulse factor, margin factor, energy, mean of spectral kurtosis (SKMean), standard deviation of spectral kurtosis (SKStd), skewness of spectral kurtosis (SKSkewness), and kurtosis of spectral kurtosis (SKKurtosis). More details on these features and their mathematical background can be found in references [4][5][6][7]. The features are extracted for each time window, with windows overlapping along the whole signal. They have been selected because they are well-known signal descriptors used for the analysis of such slowly evolving degradation processes, reducing problem complexity while preventing information loss [7]. It should be mentioned that these features need further analysis to determine whether or not they describe a degradation mechanism. For this purpose, metrics such as Monotonicity, Trendability, Prognosability, and Robustness (MTPR) are well investigated [7]. Accordingly, ProgMachina also includes these metrics to provide insights into how well the extracted features capture degradation and to provide further information for feature selection. The goal of measuring MTPR is to guarantee that the signal follows a specific degradation trend monotonically and evolves in a meaningful way along the degradation path, reflecting actual system health. Prognosability, in turn, mainly indicates the possibility of separating faulty and healthy degradation patterns.
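As a rough illustration of windowed extraction and of one MTPR metric, the following Python sketch computes a few of the listed time-domain features over overlapping windows and a commonly used form of monotonicity. It is not the package's Matlab code: the function names, the window and overlap arguments, and the monotonicity formula (absolute difference between the numbers of positive and negative increments, divided by the number of increments) are assumptions made here for illustration and may differ from what ProgMachina implements.

```python
import math


def sliding_windows(signal, window, overlap):
    """Yield successive windows of `window` samples that share `overlap` samples."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    for start in range(0, len(signal) - window + 1, step):
        yield signal[start:start + window]


def time_features(x):
    """Compute a subset of the time-domain features for one window (illustrative)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population std
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    nan = float("nan")
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak_to_peak": max(x) - min(x),
        "crest_factor": peak / rms if rms else nan,
        "shape_factor": rms / mean_abs if mean_abs else nan,
        "impulse_factor": peak / mean_abs if mean_abs else nan,
        "energy": sum(v * v for v in x),
    }


def monotonicity(feature):
    """Assumed form: |#positive increments - #negative increments| / #increments."""
    if len(feature) < 2:
        raise ValueError("need at least two samples")
    diffs = [b - a for a, b in zip(feature, feature[1:])]
    positive = sum(d > 0 for d in diffs)
    negative = sum(d < 0 for d in diffs)
    return abs(positive - negative) / len(diffs)
```

A feature trajectory is obtained by applying `time_features` to each window returned by `sliding_windows` and collecting one feature (e.g., `"rms"`) across windows; `monotonicity` can then be evaluated on that trajectory, with values near 1 suggesting a consistent trend.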

Denoising
Slowly evolving degradation processes are well known for their complex dynamics, which result in a very complex feature space with high levels of noise from unknown sources [8]. In this context, the collected features need to be subjected to a noise reduction procedure. ProgMachina applies an empirical Bayesian wavelet transformation to create more reliable representations by reducing the amount of noise in these features. This method minimizes the effect of noise in the feature space by combining a Cauchy prior with a posterior median threshold rule [9,10]. This process is accomplished using the default "wdenoise(---)" Matlab function.

Outlier Removal
Besides the noise present in the recorded signals and in the extracted features, random pulses of high-magnitude disturbances can be found in such a degradation process. Therefore, an outlier remover is necessary to eliminate or reduce their effects on the recorded data. Accordingly, the denoised features of the entire life cycle are further processed using an outlier removal tool. By default, outliers are removed using a moving median method via the "rmoutliers(---)" function [11].
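The idea behind moving-median outlier removal can be sketched as follows in Python. This is only an approximation of the concept and not a reimplementation of "rmoutliers(---)": the window length of 5, the threshold of 3 scaled median absolute deviations, and the truncated handling of the signal edges are assumptions chosen here for illustration.

```python
import statistics


def remove_outliers_moving_median(x, window=5, threshold=3.0):
    """Drop samples lying more than `threshold` scaled MADs from the local median.

    `window` and `threshold` are illustrative assumptions, not package defaults.
    """
    half = window // 2
    kept = []
    for i, value in enumerate(x):
        # Local neighbourhood, truncated at the signal edges.
        local = x[max(0, i - half): i + half + 1]
        median = statistics.median(local)
        # 1.4826 makes the MAD consistent with the standard deviation
        # for normally distributed data.
        mad = 1.4826 * statistics.median([abs(u - median) for u in local])
        if abs(value - median) <= threshold * mad:
            kept.append(value)
    return kept
```

Note that removing samples shortens the series, so any time axis or health index aligned with the features has to be trimmed accordingly. In locally constant regions the scaled MAD is zero, so any sample that deviates from the local median there is dropped.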

Smoothing and Filtering
Additional filtering and smoothing processes are required to further enhance signal quality and make degradation patterns more apparent. For smoothing, the data are modeled with a Markov-switching dynamic regression model, and the smoothed state probabilities of the active latent states in the regime transition are employed via "smooth(---)". This procedure conducts a backward recursion after a forward recursion [12]. For filtering, an additional median filtering step, "medfilt1(---)", is applied to further enhance signal quality.
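The median filtering step can be illustrated with a minimal Python sketch. The filter order of 3 and the truncation of the window at the signal edges are assumptions made here; "medfilt1(---)" offers its own edge-handling options, which this sketch does not reproduce.

```python
import statistics


def median_filter(x, order=3):
    """Replace each sample by the median of its neighbourhood of `order` samples.

    The window is truncated at the edges (an assumption of this sketch).
    """
    half = order // 2
    return [
        statistics.median(x[max(0, i - half): i + half + 1])
        for i in range(len(x))
    ]
```

Unlike outlier removal, this step keeps the length of the series unchanged, which is convenient when the features must stay aligned with a health index.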

Scaling and Health Index Identification
A min-max normalization in the range [0,1] is used, as usual, to unify the scales of feature measurements after each of the denoising, smoothing, and outlier removal steps. From these data, ProgMachina defines a deteriorating HI according to an exponential function (see Equation (2) from [13]).
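The scaling step and the general shape of an exponentially deteriorating HI can be sketched in Python as below. The min-max formula is standard. The exponential form, however, is purely hypothetical: it is not Equation (2) of [13], and the `rate` parameter and the choice of an HI that starts at 1 and ends at 0 are assumptions made only to illustrate the idea.

```python
import math


def min_max_scale(x):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]  # constant feature: no spread to scale
    return [(v - lo) / (hi - lo) for v in x]


def exponential_health_index(n, rate=5.0):
    """Hypothetical HI over n >= 2 samples, decaying from 1 (healthy) to 0 (failure).

    Assumed form (NOT the equation from [13]):
        HI(t) = 1 - (exp(rate * t) - 1) / (exp(rate) - 1),  t in [0, 1]
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    denom = math.exp(rate) - 1.0
    return [
        1.0 - (math.exp(rate * i / (n - 1)) - 1.0) / denom
        for i in range(n)
    ]
```

With this assumed form, a larger `rate` keeps the HI close to 1 for most of the life cycle and concentrates the decline near the end of life.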

Illustrative Example
As an illustrative example, a life cycle of a bearing dataset generated from a mathematical model is used [14]. The dataset represents a vibration run-to-failure measurement. Figure 1a is an example of a vibration measurement in which degradation grows exponentially. ProgMachina is used to obtain both the features in Figure 1b and the HI in Figure 1c. The extracted features and the HI are smoother, cleaner, and more representative of degradation than the original raw signal. This is demonstrated by the fact that Figure 1d,e,g show values close to 1 for all features, which means that the features reflect the degradation mechanism. Meanwhile, Figure 1f provides further guidance for feature selection if dimensionality reduction is required.

ProgMachina Impact
ProgMachina, as an easy-to-use single package, is expected to bring several advantages to prognostics studies, the most important of which are listed as follows:

• Bringing more simplicity in learning model reconstruction;

Conclusions
This paper has introduced ProgMachina, a full package for feature extraction and processing of degradation signals recorded from slowly evolving degradation processes. It follows several important steps of extraction, denoising, outlier removal, smoothing, filtering, and scaling to reach quality signals that are ready for investigation and can be used to feed learning systems. It should be mentioned that ProgMachina is built on a limited set of time-domain and frequency-domain feature extraction and processing tools. Therefore, future improvements of the package should consider other features and further signal processing tools to produce cleaner, better-quality data with a better illustration of degradation. It is also important to consider the time-frequency domain. Additionally, attribute reduction methods can be used to define health indicators in a new space whose evolution is expected to be more linear.

Figure 1. ProgMachina package inputs and outputs: (a) raw vibration data; (b) extracted and processed features; (c) identified health index.