1. Introduction
Bearings are fundamental mechanical components that facilitate rotational motion across a broad range of engineering applications. These components are integral to electric motors found in power plants, manufacturing facilities, and various modes of transportation, such as land vehicles, airplanes, ships, and space equipment. Operating under harsh conditions and susceptible to factors like improper installation, inadequate or incorrect lubrication, and mechanical damage, bearings can develop faults over time, eventually leading to system breakdowns. According to [1], bearing faults are responsible for up to 45% of all electric motor failures. Given their critical role in machine operations, significant bearing faults can result in severe consequences, including irreversible machine damage, loss of production, and even human casualties. Consequently, condition monitoring (CM) of rolling element bearings, as well as bearing fault diagnosis (FD), has attracted the interest of researchers [2].
With the widespread availability of high-quality vibration sensors and the advancements in machine learning (ML) and deep learning (DL) algorithms, data-driven approaches have gained prominence in various diagnosis applications [3,4,5] and in bearing fault diagnosis, especially approaches based on vibration monitoring [6,7,8,9,10]. A typical data-driven method for bearing fault diagnosis using ML involves signal processing, feature extraction, feature selection, and ML classification. In contrast, FD methods based on DL can use DL algorithms exclusively for classification or dimensionality reduction. DL can also be employed to develop end-to-end methods that bypass manual feature processing [11,12,13,14], or even be trained to perform frequency analysis of time-series data [15,16]. While DL models generally outperform other learning algorithms as data volumes increase, real-world scenarios often provide insufficient data to achieve the desired model performance. Moreover, the explainability of DL models in fault diagnosis remains a challenge, although there is growing momentum in research on this topic [17,18]. Consequently, traditional ML techniques for fault diagnosis remain a viable alternative deserving research focus.
As previously discussed, signal processing serves as the initial step in machine-learning-based FD algorithms. Traditionally, the fast Fourier transform (FFT) algorithm has been widely employed in this field [19]. However, the FFT algorithm has several shortcomings, including limited resolution, the inability to capture transient signals, the absence of time–frequency relations, and the introduction of spectral leakage in the output representation [20]. To address these challenges, a time–frequency analysis method called the short-time Fourier transform (STFT) has been introduced. STFT connects frequency components to the time axis by sliding a window along the time-domain signal and applying an FFT to each windowed segment. The resulting FFTs are then stacked sequentially, yielding a time–frequency representation of the signal. Typically, these windows overlap to mitigate the adverse effects of boundaries. Recent methods for bearing FD that use STFT in the signal analysis include the time–frequency spectral amplitude modulation method (TFSAM), proposed by Jiang et al., and a method by Zhang et al. that uses STFT to obtain input images for a convolutional neural network (CNN) [21,22]. Nevertheless, STFT is limited by the selection of the window length. Larger windows are required to analyze lower frequencies, but they compromise time resolution, while smaller windows yield higher time resolution but lack frequency resolution. This forces a trade-off that remains unresolved.
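The window-length trade-off described above can be made concrete with a minimal NumPy sketch. It is an illustration only, not the implementation used in the cited works: the function name `stft_magnitude`, the Hann window, the 50% overlap, and the 12 kHz sampling rate are all assumptions chosen for the example.

```python
import numpy as np

def stft_magnitude(x, fs, win_len, hop):
    """Magnitude STFT: slide a Hann window along x and FFT each segment.

    Hypothetical helper for illustration; window type and hop are assumptions.
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Stack the windowed segments row by row (one row per time frame).
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, win_len // 2 + 1)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)      # frequency axis in Hz
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # frame centers in s
    return spec, freqs, times

fs = 12_000                         # assumed sampling rate (Hz)
t = np.arange(fs) / fs              # one second of signal
x = np.sin(2 * np.pi * 1000 * t)    # toy 1 kHz tone

for win_len in (128, 2048):
    spec, freqs, times = stft_magnitude(x, fs, win_len, hop=win_len // 2)
    # Frequency step shrinks as the window grows, while the time step grows.
    print(win_len, "freq step (Hz):", freqs[1] - freqs[0],
          "time step (s):", times[1] - times[0])
```

The frequency step is `fs / win_len` and the time step is `hop / fs`, so improving one necessarily degrades the other for a fixed window.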
Empirical mode decomposition (EMD) is a time–frequency method that decomposes time-domain signals into intrinsic mode functions (IMFs) [23,24,25,26]. Unlike STFT, EMD is adaptive, does not rely on base functions, and accurately captures local features without assuming periodicity. It enables high-resolution processing of non-stationary signals without segmenting them into smaller parts. However, EMD famously suffers from the “mode mixing” (MM) and “mode splitting” (MS) phenomena. These occur as side effects of signal contamination with noise and of the imprecise definition of the local extrema on which the IMFs are based. While MM refers to the blending of different modes or components of a signal into a single IMF, MS refers to cases in which a single oscillatory mode in the original signal is decomposed into two or more IMFs [27]. Consequently, the decomposition may not accurately represent the underlying components of the signal, leading to difficulties in signal analysis and interpretation. To mitigate these effects, techniques like ensemble EMD (EEMD) and complete EMD (CEMD) have been developed [28,29]. The noise-eliminated EEMD (NEEEMD) method yielded improved noise reduction by decomposing the ensemble of white noise signals using EMD and subtracting it from the outputs of EEMD [30]. Another method that restrains mode mixing and solves the over- and undershooting problem caused by the cubic spline curve is the improved EMD (I-EMD) method, which replaces cubic spline interpolation with weighted rational quartic spline interpolation (WRQSI) and introduces a novel parameter selection criterion called the envelope characteristic frequency ratio (ECFR) [31]. These improvements generally involve applying EMD to multiple realizations of the signal, obtained by adding different types of white Gaussian noise in each trial. This helps refine the decomposition and reduce mode mixing. However, these techniques may be difficult to deploy in industrial settings because they are computationally intensive. The repetitive algorithms and the trade-off between the number of decomposition attempts and quality contribute to the substantial computing time required. Moreover, in a recent discussion, Randall and Antoni argue that EMD is generally of little benefit for the diagnosis of rolling element bearings: EMD and similar decompositions require a continuous-phase signal for meaningful analysis, whereas rolling element bearing signals have a discontinuous phase. This means that the decomposition spends excessive computation time ensuring a continuous phase in mono-components of the bearing signal, when in reality bearing signals are stochastic in nature and cannot be decomposed into unique mono-components; thus, methods such as wavelet analysis and the fast kurtogram are considered more appropriate [32].
The wavelet packet transform (WPT) is another time–frequency analysis method that surpasses STFT in terms of both time–frequency resolution and sensitivity to transient components. This decomposition is closely related to the discrete wavelet transform (DWT) in that the decomposition is based on discrete levels of mother wavelets scaled in powers of two. However, at each level, DWT splits the decomposition only towards the lower frequencies, creating a branch of consecutive low-pass filters that cut off the upper half of the signal spectrum at every level. Unlike DWT, WPT splits in all directions and decomposes a signal into sub-bands with different frequencies, which comprise a full decomposition tree, allowing for a more detailed analysis of the signal. WPT, like all methods in this family, uses wavelet functions that are non-zero only over a limited duration of time and that act as decomposition bases [33]. Unlike the sine waves used by FFT and STFT, which can only capture global frequency information, wavelets are localized in time and are well-suited for representing local features and transient components in signals. Compared to EMD, WPT is less adaptive and less flexible because it uses one pre-determined scalable wavelet function and a finite number of decomposition levels. However, the same reasons make WPT less computationally expensive [32,34]. Thus, there is no consensus regarding which method is generally better; rather, the method should be selected based on the particular type of signal and application. Even though the development of mother wavelet base functions is still an ongoing process, a conservative estimate of their existing number ranges from several dozen of the most popular to several hundred when the less-known wavelet families are included. To date, an abundance of mother wavelet selection methods with comparable performance can be found in the literature. This shows that mother wavelet selection, as one of the most vulnerable parts of WPT, still lacks a general state-of-the-art solution; thus, further research and new solutions are needed.
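The structural difference between DWT and WPT can be sketched with the Haar wavelet, whose filter bank reduces to pairwise sums and differences. This is a toy illustration under stated assumptions: the Haar wavelet, the helper names (`haar_split`, `dwt_tree`, `wpt_leaves`), and the random test signal are chosen for the example and are not taken from the paper.

```python
import numpy as np

def haar_split(x):
    """One Haar filter-bank step: low-pass and high-pass halves, each downsampled by 2."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def dwt_tree(x, level):
    """DWT: only the approximation (low-pass) branch is split again at each level."""
    details = []
    approx = x
    for _ in range(level):
        approx, detail = haar_split(approx)
        details.append(detail)
    return [approx] + details[::-1]          # level + 1 sub-bands

def wpt_leaves(x, level):
    """WPT: every node is split, giving 2**level leaves at the chosen level.

    Leaves are returned in natural (filter-bank) order, not sorted by frequency.
    """
    nodes = [x]
    for _ in range(level):
        nodes = [child for node in nodes for child in haar_split(node)]
    return nodes

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)                # toy signal, length divisible by 2**3

print("DWT sub-bands at level 3:", len(dwt_tree(x, 3)))
print("WPT leaves at level 3:   ", len(wpt_leaves(x, 3)))
```

Because the Haar step is orthonormal, the summed energy of the sub-bands equals the energy of the input in both trees, which is the conservation property of standard wavelet decompositions.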
Feature preprocessing is a crucial aspect of the fault diagnosis framework. In feature preprocessing, the fault indicators extracted from the signal are evaluated, and discriminant features are then selected from them [35]. The discriminancy of the features directly affects the generalization and classification capabilities of the classifier. Techniques such as probabilistic principal component analysis (PCA) [36], trace ratio LDA [37], and sensitive discriminant analysis [35] have been proposed in the past. These methods produce discriminant feature spaces; however, they have several shortcomings. Feature preprocessing methods based on PCA suffer from class separation problems and information loss. The between-class separation problem addressed by LDA can be affected by the penalty graph representation of the features from different classes.
To address the above-mentioned issues, this paper proposes a solution to the problem of mother wavelet selection for WPT analysis by constructing a signal representation that combines nodes from several WPT trees obtained using different mother wavelets. Corresponding nodes of every tree are analyzed with respect to their power spectrum content, and the best nodes are selected by comparison using the proposed criterion. Additionally, the paper introduces the IF-LDA feature engineering method as a solution for dimensionality reduction. This method evaluates the feature pool using an informative factor (IF) and eliminates low-quality features, ensuring optimal performance of linear discriminant analysis (LDA). The novelty of this work is as follows:
- (1) A new WPT-based signal representation is introduced for the extraction of bearing fault-related components.
- (2) A variant of LDA, IF-LDA, is introduced to increase the discriminancy of the feature space based on the informative factor.
The contributions of this paper can be summarized as follows:
- (1) WPT is used with a novel R-value criterion for mother wavelet selection in analyzing bearing signals. The R-value criterion considers the energy-to-entropy ratio of the signal power spectrum to select the mother wavelet that provides the most uneven energy distribution in a specific WPT node while preserving high signal energy.
- (2) The proposed method constructs the final signal representation node by node, based on the R-value of each node’s reconstruction. As nodes are selected from WPT trees decomposed using different mother wavelets, the method is referred to as a novel WPT-based signal representation.
- (3) A novel feature engineering approach is introduced that greatly benefits linear discriminant analysis. This approach ensures minimal scatteredness among features within the same class and maximizes between-class separation, leading to improved accuracy in model predictions and easier generalization.
The subsequent sections of this manuscript are organized as follows: In Section 2, we outline the datasets utilized to assess the effectiveness of the proposed method. Section 3 offers technical background information on WPT, methods for selecting the mother wavelet, and LDA. In Section 4, we present the detailed methodology proposed in this study. The obtained results and performance comparisons are discussed in Section 5. Finally, in Section 6, we draw conclusions based on our findings.
4. Proposed Method
4.1. Vibration Signal Processing
Faulty bearings produce high-frequency components in the vibration signal due to various mechanisms, such as impact, rubbing, or resonance. These high-frequency components are often masked by low-frequency components in the signal, such as those caused by machine operation, background noise, or measurement noise. For this reason, a signal processing technique known as envelope analysis is applied for signal preprocessing in order to extract the high-frequency components of a signal by demodulation. Thus, as can be seen from the workflow of the proposed method in Figure 7, the raw signal is initially preprocessed using Hilbert transform envelope extraction.
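This is done by taking the modulus of the analytic signal obtained from the Hilbert transform. The whole process can be described mathematically starting with Equation (14), which gives the expression of the vibration signal $x(t)$, where the amplitude modulation envelope is given by $a(t)$ and the phase modulation function is represented by $\varphi(t)$:

$$x(t) = a(t)\cos\big(\varphi(t)\big) \tag{14}$$

The transformation of $x(t)$ via the Hilbert transform is given in Equation (15); it corresponds to a 90-degree phase shift of the signal:

$$\hat{x}(t) = \mathcal{H}[x(t)] = \frac{1}{\pi}\,\mathrm{p.v.}\!\int_{-\infty}^{\infty}\frac{x(\tau)}{t-\tau}\,d\tau \tag{15}$$

The ensuing analytic signal is obtained as a complex-valued function:

$$z(t) = x(t) + j\,\hat{x}(t) \tag{16}$$

By computing the modulus of $z(t)$, the envelope of the signal can be determined as follows:

$$a(t) = |z(t)| = \sqrt{x^2(t) + \hat{x}^2(t)} \tag{17}$$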
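The envelope extraction described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the analytic signal is built with the standard FFT construction (zeroing negative frequencies and doubling positive ones), and the function name `hilbert_envelope` and the amplitude-modulated test signal are assumptions made for the example.

```python
import numpy as np

def hilbert_envelope(x):
    """Envelope |z(t)| of a real signal via the FFT-based analytic signal.

    Hypothetical helper: keeps DC (and Nyquist for even n), doubles positive
    frequencies, and zeroes negative frequencies before the inverse FFT.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    z = np.fft.ifft(spectrum * h)    # analytic signal: x + j * Hilbert(x)
    return np.abs(z)                 # modulus gives the envelope

fs = 12_000                                       # assumed sampling rate (Hz)
t = np.arange(fs) / fs
a = 1.0 + 0.5 * np.cos(2 * np.pi * 50 * t)        # slow amplitude modulation
x = a * np.cos(2 * np.pi * 3000 * t)              # high-frequency carrier
env = hilbert_envelope(x)
print("max envelope error:", np.max(np.abs(env - a)))
```

In this toy case the modulation (50 Hz) and the carrier (3 kHz) are well separated in frequency, so the recovered envelope matches the modulating function closely; real bearing signals are noisier and the match is only approximate.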
4.2. Novel WPT-Based Signal Representation
The choice of a mother wavelet when decomposing a signal using the wavelet packet transform (WPT) can have a significant impact on the spectral characteristics of the resulting coefficients. Different mother wavelets may be better suited for capturing specific types of spectral content or signal features, while others may not be as effective. In the regular WPT procedure explained in Section 3.2, a list of various mother wavelets is available for selection. To evaluate the effectiveness of the mother wavelets, a representative subspace of the signal data is chosen. This subspace is decomposed using each mother wavelet from the list, creating a unique WPT tree for each one. Then, the reconstructed coefficients at the desired decomposition level are assessed for each tree. The mother wavelet that exhibits the best evaluation score in comparison to the other wavelets is selected for the decomposition of the entire dataset.
The proposed WPT-based signal representation, on the other hand, takes a different approach. This method represents a signal using WPT decomposition as a foundation, but it is not limited to a single mother wavelet. First, the given signal data are decomposed to level $j$ ($j = 3$ in this work) using a set of $W$ mother wavelets, resulting in $W$ WPT trees. Following that, at the desired decomposition level, the nodes with the same indexes from $0$ to $2^j - 1$ are taken for comparison across the $W$ WPT trees, forming a list of candidates with a dimension of
, which in this work is
since
wavelet functions were tried. For each of the candidate lists, the assessment is performed based on the spectral content evaluation of each reconstructed WPT coefficient, which is calculated using the ratio of the total power of the spectrum and the Shannon entropy of the signal power spectrum. The workflow of the novel WPT-based signal representation is shown in
Figure 8.
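Consider the definition of the Shannon entropy of the signal power spectrum:

$$H = -\sum_{i=1}^{N} p_i \log p_i \tag{18}$$

where $N$ is the total number of frequency bins in the power spectrum, and $p_i$ is the probability of the signal power being in the $i$-th frequency bin, which is defined as follows:

$$p_i = \frac{P_i}{\sum_{k=1}^{N} P_k} \tag{19}$$

where $P_i$ is the power in the $i$-th frequency bin.
The total power of the signal spectrum is calculated as follows:

$$P_{total} = \sum_{i=1}^{N} P_i \tag{20}$$

The ratio is defined as follows:

$$R = \frac{P_{total}}{H} \tag{21}$$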
Evaluation of the reconstructed coefficients of the WPT node using the R-value makes it possible to compare the spectral content captured by each mother wavelet and choose the one that provides the best representation of the signal within the frequency range of a particular WPT node. Specifically, the criterion measures the amount of information in the signal power spectrum that is being concentrated in specific frequency bands, as opposed to being distributed uniformly over the entire spectrum. A mother wavelet that produces a reconstructed coefficient with a higher ratio of total power to Shannon entropy is preferred, as it indicates a signal with a more predictable and more structured spectral composition.
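The criterion described above, total spectral power divided by the Shannon entropy of the power spectrum, can be sketched as follows. This is an illustrative sketch under assumptions: the function name `r_value`, the one-sided FFT power spectrum, the natural logarithm, and the small floor `eps` that avoids division by zero are choices made for the example, not details confirmed by the paper. The logarithm base only rescales every R-value by the same constant, so it does not change which candidate ranks highest.

```python
import numpy as np

def r_value(signal, eps=1e-12):
    """Ratio of total spectral power to the Shannon entropy of the power spectrum.

    Assumptions: one-sided FFT power spectrum, natural logarithm, and a small
    floor `eps` on the entropy to avoid division by zero.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2      # P_i, power per frequency bin
    total_power = power.sum()
    if total_power == 0.0:
        return 0.0
    p = power / total_power                       # p_i, share of power per bin
    p = p[p > 0]                                  # 0 * log(0) is taken as 0
    entropy = -np.sum(p * np.log(p))
    return total_power / max(entropy, eps)

fs = 12_000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)

# A structured signal: one dominant tone with a little noise.
tone = np.sin(2 * np.pi * 1000 * t) + 0.05 * rng.standard_normal(fs)
# An unstructured signal: white noise rescaled to the same RMS as the tone.
noise = rng.standard_normal(fs)
noise *= np.sqrt(np.mean(tone ** 2) / np.mean(noise ** 2))

r_tone, r_noise = r_value(tone), r_value(noise)
print("R (tone):", r_tone, " R (noise):", r_noise)
```

With comparable total power, the signal whose power is concentrated in few bins has the lower entropy and therefore the higher R-value, which is the behavior the criterion is meant to reward.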
The proposed method assembles the final decomposition of the signal node by node, depending on the R-value of its reconstruction. Different parts of the signal may have distinct spectral characteristics, and by selecting a specific mother wavelet for the decomposition of each node, the novel WPT-based signal representation can capture the relevant spectral features of the signal more accurately. It can better acquire the spectral content of the signal and identify important discriminant features, resulting in improved performance as compared to WPT which relies on the traditional mother wavelet selection methods and uses a single mother wavelet for the entire signal.
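The node-by-node assembly can be sketched as a per-node argmax over candidate wavelets. The sketch assumes the reconstructed node signals are already available (for example from a wavelet packet library) as a dictionary that maps a mother-wavelet name to a list of node signals; the names `select_nodes`, `wavelet_A`, and `wavelet_B` are hypothetical, and `r_value` repeats the assumed energy-to-entropy ratio with a natural logarithm.

```python
import numpy as np

def r_value(signal, eps=1e-12):
    """Assumed R criterion: total spectral power / Shannon entropy of the spectrum."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    total_power = power.sum()
    if total_power == 0.0:
        return 0.0
    p = power / total_power
    p = p[p > 0]
    entropy = -np.sum(p * np.log(p))
    return total_power / max(entropy, eps)

def select_nodes(trees):
    """Pick, for every node index, the candidate with the highest R-value.

    trees: dict mapping a mother-wavelet name to a list of reconstructed node
    signals at the chosen level (same number of nodes for every wavelet).
    Returns a list of (wavelet_name, node_signal), one entry per node index.
    """
    names = list(trees)
    n_nodes = len(trees[names[0]])
    chosen = []
    for node in range(n_nodes):
        scores = {name: r_value(trees[name][node]) for name in names}
        best = max(scores, key=scores.get)
        chosen.append((best, trees[best][node]))
    return chosen

# Toy stand-ins for reconstructed nodes (two wavelets, two nodes each).
fs = 4096
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
tone = np.sin(2 * np.pi * 200 * t)
noise = rng.standard_normal(fs) * np.sqrt(0.5)    # roughly the same power as the tone

trees = {
    "wavelet_A": [tone, noise],     # structured in node 0, noisy in node 1
    "wavelet_B": [noise, tone],     # the opposite
}
representation = select_nodes(trees)
print([name for name, _ in representation])
```

The result mixes nodes from different trees, which is why such a representation no longer inherits guarantees like energy conservation from any single wavelet packet decomposition.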
It is worth mentioning that, although the proposed signal representation method is based on the same principles as WPT, it does not satisfy some of the basic properties of wavelet decomposition, such as the superposition property or conservation of energy. Therefore, it cannot be considered an advanced version of WPT. However, it can still be used effectively as a feature extraction tool, because the manipulations performed on the signal are based on sound mathematical principles. As a result, the extracted features can provide useful information about the signal, which can be used for applications such as signal processing, classification, and pattern recognition.
4.3. Feature Extraction and Feature Pool Configuration
Collecting real-world data necessary for the diagnosis of bearing faults, including vibration data, acoustic emission data, or electric current data, involves extended periods of high-rate sampling. This process generates complex datasets with numerous variables, placing a significant demand on memory and computational resources. As a result, applying ML techniques to unprocessed data is restricted in practicality. Feature extraction is a technique that addresses this challenge by reducing the dimensionality of data. It involves the conversion of a raw dataset to a smaller one by means of extraction of high-quality features representative of the whole dataset, which contributes to superior generalization and prevents overfitting. A set of features obtained after extraction is conventionally referred to as the feature vector.
Existing literature on bearing fault diagnosis encompasses a substantial number of features that are utilized in varying permutations to establish a condensed depiction of the vibration data. In the current work, a total set of 19 features was extracted from the WPT-reconstructed node signals. Of these, 16 are time-domain and 3 are frequency-domain features. These statistical features are ubiquitous in the field of bearing fault diagnosis, and anticipating the significance of specific features for fault diagnosis before feature selection is tedious. Consequently, the collection of features aggregated for this study aims to incorporate as many statistical features as feasible from the literature. The feature names along with the equations are displayed in Table 5. These 19 features extracted from 8 reconstructed WPT coefficients form a row of 152 features for each sample and together constitute a primary feature pool.
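How per-node features are concatenated into one row per sample can be sketched as follows. The sketch computes only six common statistics, not the 19 features of Table 5, so the names `node_features` and `feature_row`, the chosen statistics, and the sampling rate are assumptions for illustration. With 19 features per node the same concatenation would give 19 × 8 = 152 values.

```python
import numpy as np

def node_features(x, fs):
    """A handful of common time- and frequency-domain statistics for one node signal."""
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mu, sigma = x.mean(), x.std()
    skewness = np.mean((x - mu) ** 3) / sigma ** 3
    kurtosis = np.mean((x - mu) ** 4) / sigma ** 4
    crest_factor = peak / rms
    # Frequency-domain example: power-weighted mean frequency (spectral centroid).
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = np.sum(freqs * power) / np.sum(power)
    return np.array([rms, peak, crest_factor, skewness, kurtosis, centroid])

def feature_row(node_signals, fs):
    """Concatenate the features of all node signals into one row for a sample."""
    return np.concatenate([node_features(x, fs) for x in node_signals])

rng = np.random.default_rng(0)
fs = 12_000                                                # assumed sampling rate (Hz)
nodes = [rng.standard_normal(1024) for _ in range(8)]      # stand-ins for 8 node signals
row = feature_row(nodes, fs)
print(row.shape)
```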
4.4. Feature Preprocessing and Dimensionality Reduction Using IF-LDA
LDA is a dimensionality reduction algorithm without inherent feature selection capabilities. This means that low-quality input data can degrade its performance for several reasons. First, low-quality features introduce noise and distort the within-class and between-class mean values, which in turn distorts the eigenvalues and eigenvectors of the transformation matrix; the result is a suboptimal LDA space and poor classification results. Second, with a higher number of features, LDA has to perform more computations, since a larger data matrix can have more linear discriminants, which makes the model more time-consuming. To avoid these issues and prevent the presence of low-quality features, LDA requires selective feature preprocessing.
In this work, selective preprocessing is performed based on the feature informative factor (IF). Initially, the cosine similarity for each pair of features in the primary feature pool is calculated. If the features in a pair are denoted as $f_a$ and $f_b$, then their cosine similarity can be defined as follows:

$$\cos(f_a, f_b) = \frac{f_a \cdot f_b}{\lVert f_a \rVert\,\lVert f_b \rVert} \tag{22}$$

An informative factor metric for each feature $f_a$ is calculated as the sum of the cosine similarities of this feature with every other feature in the set as follows:

$$IF_a = \sum_{b \neq a} \cos(f_a, f_b) \tag{23}$$

Based on the IF, the feature is included in the informative feature pool if its IF value is above zero or is left in the primary feature pool if its IF value is below zero, according to the following definition:

$$f_a \in \begin{cases} \text{informative feature pool}, & IF_a > 0 \\ \text{primary feature pool}, & IF_a < 0 \end{cases} \tag{24}$$

The resulting informative feature pool then undergoes the LDA transformation described in Section 3.3. The application of informative factors benefits linear discriminant analysis in that it ensures a minimal level of scatteredness among the features within the same class. Overall, the application of IF-LDA for dimensionality reduction offers significant benefits for effective bearing fault diagnosis. It enables the creation of a feature space that maximizes the separation between different classes while simultaneously ensuring a dense configuration among the features within the same class. This improvement in the feature space facilitates enhanced accuracy in model predictions and ensures easy generalization, leading to a more robust and reliable diagnosis. A visual representation of the high-quality feature spaces with well-separated classes obtained by using the proposed method on three different datasets is shown in Figure 9.
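The informative-factor screening described in this subsection can be sketched with a cosine-similarity matrix over the feature columns. This is a minimal sketch under assumptions: features are the columns of a samples-by-features matrix, similarities are computed on the raw (uncentered) columns, and the names `informative_factor` and `split_pool` are hypothetical. The LDA step that follows the screening is not shown.

```python
import numpy as np

def informative_factor(F):
    """IF per feature: sum of its cosine similarities with every other feature.

    F has shape (n_samples, n_features); each column is one feature.
    """
    norms = np.linalg.norm(F, axis=0)
    unit = F / np.where(norms == 0, 1.0, norms)   # guard against all-zero columns
    cos = unit.T @ unit                            # pairwise cosine similarities
    return cos.sum(axis=1) - np.diag(cos)          # exclude self-similarity

def split_pool(F):
    """Keep features whose IF is above zero; the rest stay in the primary pool."""
    IF = informative_factor(F)
    keep = IF > 0
    return F[:, keep], keep, IF

# Toy pool: three mutually aligned features and one pointing the opposite way.
F = np.array([
    [1.0, 2.0, 1.0, -1.0],
    [2.0, 4.0, 2.0, -2.0],
    [3.0, 6.0, 3.5, -3.0],
])
informative, keep, IF = split_pool(F)
print(keep, IF)
```

In this toy pool the last feature is anti-aligned with all others, so its IF is negative and it is excluded before dimensionality reduction.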
4.5. Bearing Fault Classification
After the feature vector dimensionality is reduced using IF-LDA, the bearing state is classified by the k-nearest neighbor (KNN) classifier. The KNN classifier is a non-parametric machine learning classification method. It determines the class membership of an input data point by finding the k closest labeled data points using a distance metric. KNN does not construct a model based on training data and, thus, is considered instance-based. Once KNN receives a new data sample for classification, it calculates the distances d from this sample to the known labeled samples. Then, based on majority voting, the new sample is assigned to the class with the highest number of instances among the k nearest samples in the training dataset. In this work, the number of nearest neighbors was set to .
The effectiveness of the KNN classifier relies heavily on the quality of the features used. KNN measures the similarity between instances by the distance between their feature vectors. If the features effectively capture the relevant patterns and characteristics of the data, KNN can successfully identify similar instances and make accurate predictions. On the other hand, if the features are not informative or do not capture the underlying structure of the data, KNN’s performance may be limited. Because the classifier itself is simple, its accuracy directly reflects the quality of the feature space, which allows for a comprehensive assessment of the impact of different feature engineering techniques on classification accuracy. As a result, the KNN classifier serves as a robust benchmark for evaluating the effectiveness and generalizability of different feature engineering methods.
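The distance computation and majority vote can be sketched directly in NumPy. The Euclidean distance, the function name `knn_predict`, the toy data, and the value k = 3 are assumptions chosen for illustration; they are not the settings reported for the proposed method.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k):
    """Classify each row of X_new by majority vote among its k nearest training samples.

    Assumes Euclidean distance and non-negative integer class labels.
    """
    # Pairwise distances, shape (n_new, n_train).
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]         # indices of the k closest samples
    votes = y_train[nearest]                       # their labels
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy, well-separated two-class feature space.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[0.2, 0.2], [5.5, 5.5]])

pred = knn_predict(X_train, y_train, X_new, k=3)   # k = 3 is an illustrative choice
print(pred)
```

When classes are compact and far apart, as in this toy case, the vote is unanimous, which illustrates why a well-separated feature space lets a simple classifier perform well.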
5. Experimental Results and Discussion
In this section, the bearing fault diagnosis performance is evaluated on the three datasets previously described in Section 2: the PUA dataset comprises three classes and 5760 samples in total, the PUR dataset comprises four classes and 6400 samples in total, and the CWRU dataset comprises four classes and 1920 samples in total. To ensure fairness in the evaluation, the datasets are split so that 80% of the data are reserved for training and 20% are reserved for testing. The validation is carried out using the 10-fold cross-validation strategy. This strategy involves randomly reordering and partitioning the data into 10 groups. During each iteration, one group is assigned as the validation data, while the remaining nine groups are utilized for training. This process is repeated 10 times, ensuring that each data sample is included in exactly one holdout set.
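The 10-fold partitioning described above can be sketched as an index generator. This is a generic sketch, not the authors' evaluation script: the function name `k_fold_indices` and the fixed random seed are assumptions, and the sample count of 1920 is borrowed from the CWRU dataset only as an example size.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Samples are shuffled once and split into n_folds groups; each group serves
    as the validation set exactly once.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx

n_samples = 1920
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(n_samples)):
    print(fold, len(train_idx), len(val_idx))
```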
Macro-averaged (MA) recall, macro-averaged precision, F1-score, fault identification accuracy, and one-class true positive rate were used as metrics for performance comparison and their definitions are provided in Equations (25)–(29), where TP stands for true positive,
FN for false negative,
FP for false positive, the lowercase
stands for the class number, the capital
stands for the total number of classes, and
is the total number of samples.
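For reference, macro-averaged recall and precision, accuracy, and error rate can be computed from a confusion matrix as sketched below. The sketch uses the standard textbook definitions, which are assumed (not confirmed) to match Equations (25)–(29); the function name `macro_metrics` and the toy labels are hypothetical.

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Macro-averaged recall/precision, accuracy, and error rate from integer labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)             # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp                        # missed samples of each class
    fp = cm.sum(axis=0) - tp                        # samples wrongly assigned to each class
    recall = tp / np.maximum(tp + fn, 1)            # per-class recall (true positive rate)
    precision = tp / np.maximum(tp + fp, 1)
    accuracy = tp.sum() / cm.sum()
    return {
        "ma_recall": recall.mean(),
        "ma_precision": precision.mean(),
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "per_class_recall": recall,
    }

# Toy three-class example with two misclassified samples.
metrics = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print(metrics)
```

Macro averaging gives every class the same weight regardless of its sample count, so a weak result on a small class is not hidden by strong results on larger ones.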
The methods selected for comparison are similar in nature to the proposed method in terms of bearing vibration signal processing. This relatedness helps to correctly evaluate the increase in fault diagnosis performance introduced by the proposed method. All the calculated metrics are represented as column charts in Figure 10 and Figure 11 for convenient comparison.
Applying the proposed method to the PUR, PUA, and CWRU datasets resulted in a fault identification accuracy (FIA) of 100% for each dataset. Accordingly, the error rate equals 0% for each dataset, and the MA recall and MA precision are 100%. The results obtained from the proposed method can be explained by the quality of the WPT-based signal representation, where each node is chosen according to the R-value criterion, ensuring that the reconstructed signals possess a well-defined spectrum with prominent high-energy frequency components and minimal interference from noise. This avoids noisy or corrupted reconstructed signals in nodes where a particular mother wavelet has low sensitivity to the shape of the components in the frequency range of that WPT node. Another reason for the high performance of the proposed method lies in the IF-LDA dimensionality reduction technique. The informative factor (IF) selective feature preprocessing helps to eliminate low-quality features that correlate poorly with the dependent variables. This step is crucial to prevent degradation of the performance of the LDA. Moreover, the features accepted into the final pool using IF preprocessing are compactly clustered within the same class and exhibit a high inter-class difference in mean value. Therefore, after reducing the dimensionality with LDA, the feature space depicted in Figure 9 is obtained. Given the resulting feature space, which exhibits distinct boundaries between classes and maximizes the distinction between healthy bearings and various fault states, a simple KNN model is fully capable of achieving 100% fault diagnosis performance, as assessed using the metrics presented in Figure 10 and Figure 11.
The first comparison method uses the signal energy features extracted from wavelet packet bases to train a random forest classifier [40]. Utilizing WPT for signal energy feature extraction has become a widely used and reliable strategy in the fault diagnosis field. Together with a powerful random forest algorithm, this method showed high performance on the three datasets. Nevertheless, by relying solely on energy features rather than on a diverse vector of features derived from both the time domain and the frequency domain, the method fails to capture the necessary level of distinctiveness and falls behind the proposed method. The FIA demonstrated by this method is 93.90% on the PUA set, 99.34% on the PUR set, and 96.22% on the CWRU set. Accordingly, the error rates are 6.10%, 0.66%, and 3.78%. The MA recall values are 93.54%, 99.28%, and 95.99%, respectively, and the MA precision values are 93.62%, 99.12%, and 99.46%, respectively. In addition, while node energy features may indicate changes in overall energy levels, they do not offer insights into the specific fault-related patterns that can be crucial for accurate diagnosis.
The second comparison method [41] employed a similar approach by utilizing the WPT for bearing vibration signal decomposition. The extracted features described in Section 4.2 were used to train a KNN classifier. This method demonstrated inadequate performance when diagnosing compound bearing faults and exhibited mediocre abilities in fault diagnosis. In particular, the FIA values for the three datasets achieved by this method are 88.22%, 92.09%, and 83.79%, respectively, which correspond to error rates of 11.78%, 7.91%, and 16.21%. The MA recall values are 87.31%, 90.24%, and 84.45%, respectively, and the MA precision values are 93.62%, 99.12%, and 96.46%, respectively. One of the contributing factors to this subpar performance is the absence of feature selection. Without it, random fluctuations in the data and noisy features that contain minimal discriminant information or exhibit weak correlation with the response may cause the model to overfit to these instances, eventually leading to misclassification and poor fault diagnosis performance.
The third comparison method, which utilizes a robust Gaussian kernel SVM classifier, exhibited poor performance due to its heavy reliance on WPT energy features, analogously to the first comparison method [42]. The FIA values for the three datasets are 92.43%, 98.67%, and 76.43%, respectively, which correspond to error rates of 7.57%, 1.33%, and 23.57%. The MA recall values are 92.25%, 98.60%, and 73.27%, respectively, and the MA precision values are 92.11%, 98.49%, and 80.79%, respectively.
The performance levels of the last comparison method are very high and are the closest to the proposed method [43]. The FIA values achieved by this method on the three datasets are 99.48%, 98.70%, and 97.06%, respectively, while the error rates are 0.52%, 1.30%, and 2.94%, respectively. The MA recall values are 99.41%, 98.22%, and 96.75%, respectively, while the MA precision values are 99.45%, 97.55%, and 96.04%, respectively. This can be explained by the utilization of the Boruta feature selection algorithm, which evaluates each feature in the set depending on its usability for the random forest (RF) classifier. Boruta uses random permutations of the features, called shadow attributes (SA), and attaches them to the feature vector to create the extended information system (EIS). RF is trained on the EIS multiple times, each time with new SA permutations. After each training iteration, the model is tested, and the number of correct class votes is calculated. Eventually, only the features that are significantly more useful than any of their own permutations are selected for the final feature set. The rest of the features are discarded. This allows for constructing a feature set with highly discriminant features that enables reliable FD performance of the method using the KNN model.
Given the data parameters described in Section 2, the proposed method is capable of diagnosing the type of fault in the bearing under analysis in 0.3 to 0.33 s when running on a PC equipped with an Intel® Core™ i7-9700K CPU and 16 GB of RAM.
6. Conclusions
This paper proposed a method for bearing fault diagnosis using a novel WPT-based signal representation and informative factor LDA. A mother wavelet whose shape poorly matches the shape of the signal components along the spectrum may cause inconsistent decomposition results with a low level of detail. After the signal is decomposed using various mother wavelets, the proposed R-value criterion, based on the energy-to-entropy ratio of the node reconstruction, allows the model to tailor a representation from the nodes whose mother wavelets extract the highest amount of detail and bearing fault-related information in the local frequency range of each WPT node.
The dimensions of the vector of features extracted from this representation are reduced using the proposed informative factor LDA. The informative factor preprocesses the features, retaining only those that exhibit dense clustering within each class and offer optimal inter-class separation. The IF provides LDA with the advantage of early elimination of noisy and low-quality features, protecting LDA from outliers and enhancing interclass separability. Moreover, it reduces the computational time by providing a smaller feature vector matrix, which results in fewer possible LDA spaces. Overall, the introduction of an informative factor results in excellent LDA performance and complete class separation in the LDA space. For classification, the KNN algorithm was used, and the results surpassed those obtained by all other comparison methods.
It is worth noting that the advantages of the WPT-based signal representation, while promising, do come with a slight increase in computational time during training. This arises from the need to assess the quality of signal decomposition using various mother wavelets. To mitigate this, employing a smaller portion of the overall dataset as well as refining and reducing the mother wavelet candidate list through benchmark comparisons across diverse datasets can be beneficial. In light of these considerations, a future work direction could involve a benchmark comparison method that permanently eliminates the poorest-performing wavelets from the candidate list. Another potential avenue for future work is adapting the proposed algorithm to handle bearing fault data with transient rotational speeds.