Optimal Source Selection for Distributed Bearing Fault Classification Using Wavelet Transform and Machine Learning Algorithms

Rajabioun, Ramin; Atan, Özkan

doi:10.3390/app151910631

by Ramin Rajabioun and Özkan Atan *

Department of Electric and Electronic Engineering, Yüzüncü Yil University, 65090 Van, Türkiye

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(19), 10631; https://doi.org/10.3390/app151910631

Submission received: 11 August 2025 / Revised: 8 September 2025 / Accepted: 10 September 2025 / Published: 1 October 2025

(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

Early and accurate detection of distributed bearing faults is essential to prevent equipment failures and reduce downtime in industrial environments. This study explores the optimal selection of input signal sources for high-accuracy distributed fault classification, employing wavelet transform and machine learning algorithms. The primary contribution of this work is to demonstrate that robust distributed bearing fault diagnosis can be achieved through optimal sensor fusion and wavelet-based feature engineering, without the need for deep learning or high-dimensional inputs. This approach provides interpretable, computationally efficient, and generalizable fault classification, setting it apart from most existing studies that rely on larger models or more extensive data. All experiments were conducted in a controlled laboratory environment across multiple loads and speeds. A comprehensive dataset, including three-axis vibration, stray magnetic flux, and two-phase current signals, was used to diagnose six distinct bearing fault conditions. The wavelet transform is applied to extract frequency-domain features, capturing intricate fault signatures. To identify the most effective input signal combinations, we systematically evaluated Random Forest, XGBoost, and Support Vector Machine (SVM) models. The analysis reveals that specific signal pairs significantly enhance classification accuracy. Notably, combining vibration signals with stray magnetic flux consistently achieved the highest performance across models, with Random Forest reaching perfect test accuracy (100%) and SVM showing robust results. These findings underscore the importance of optimal source selection and wavelet-transformed features for improving machine learning model performance in bearing fault classification tasks. While the results are promising, validation in real-world industrial settings is needed to fully assess the method’s practical reliability and impact on predictive maintenance systems.

Keywords: distributed bearing fault; machine learning; predictive maintenance; wavelet transform

1. Introduction

Electrical machines are widely used as prime movers in industrial and manufacturing sectors and consist of four key components: the frame, windings, rotor, and bearings. Rolling element bearings (REBs) are especially critical because they support the rotor and provide precise positioning, maintaining a consistent and minimal air gap. Each REB consists of four parts: outer and inner raceways, balls, and a cage. In harsh industrial environments, these bearing components are prone to failure due to factors such as humidity, dust, and dirt. Studies have shown that bearing faults account for approximately 41% of all machine failures [1]. The reliability of these machines depends not only on their mechanical integrity but also on the performance of the power electronic converters that drive them. For example, innovations in multilevel inverter topologies aim to improve the efficiency and cost-effectiveness of motor drive systems [2]. Recent research has focused on developing topologies that generate more voltage levels with fewer components, which is crucial for creating low-cost, highly efficient drive systems for applications such as electric vehicles and renewable energy systems [3]. This significant failure rate emphasizes the necessity of monitoring the health of bearings in numerous practical applications, including:

Mills
Washing machines
Compressors
Electric vehicles

Monitoring bearing condition is critical for maintaining machine performance and reliability. Given the high incidence of bearing faults and their significant impact on machine reliability, there is a need for advanced fault detection and diagnostic techniques [4]. This is particularly important in industrial settings, where unexpected machine failures can cause costly downtime and loss of productivity. To address these issues, researchers are developing innovative methods for diagnosing bearing faults, including:

Advanced signal processing techniques
Machine learning algorithms for fault classification
Multi-sensor data fusion methods
Feature selection and extraction techniques

These advanced methods are intended to improve the accuracy and speed of bearing fault detection, thereby enabling predictive maintenance and reducing the risk of unexpected machine breakdowns.

Most studies on bearing fault detection focus on single or localized defects [5]. However, these defects do not necessarily indicate that the bearing is close to failure [6]. In contrast, there is limited research on distributed faults, which develop when localized defects spread over time due to aging factors such as inadequate lubrication, contamination, erosion, high-frequency leakage currents, or improper installation [7]. Distributed faults affect larger portions of the raceways, causing irregular vibration patterns and increased overall vibration as multiple rolling elements pass through the damaged area simultaneously. These faults often develop from the growth of small, localized pits that, if left unattended, can lead to significant damage. Distributed faults are typically seen as geometric irregularities in the bearing caused by poor manufacturing, improper installation, or misuse. Detecting these faults is vital, as they can result in serious operational problems and extended machine downtime.

Bearing condition monitoring generally involves three main steps: sensor measurement, data analysis, and decision-making. Sensors can measure a variety of parameters, including temperature, current, voltage, acoustic emissions, stray magnetic flux, and vibrations [8]. Among these, vibration and stray magnetic flux signals provide some of the most valuable information for condition monitoring. However, due to their complexity and nonlinearity, these signals require advanced signal processing tools [9]. These tools are essential for extracting features used by various detection algorithms, especially those based on machine learning. Commonly used signal processing methods include statistical measures (such as mean and standard deviation), kurtosis, crest factor, power spectral density, fast Fourier transform, wavelet transform, Hilbert transform, and Mel frequency cepstrum coefficients. These diverse analytical techniques allow for comprehensive feature extraction, enhancing the accuracy and reliability of bearing fault detection and diagnosis [10]. Developing and selecting appropriate features require experience and domain knowledge, as these factors significantly impact the performance of machine learning models for fault diagnosis and classification [11]. Several well-known machine learning algorithms, including Random Forest, Support Vector Machines, and XGBoost, have been widely used in bearing fault diagnosis [12].

For example, ref. [13] explores the use of fundamental bearing frequencies, derived from vibration responses, as novel features for analysis. The authors use vibration data collected under various operating conditions, processed with a supervised machine learning algorithm—specifically, the K-nearest neighbor (KNN) method—for fault classification. By focusing on single-point defects in motor bearings, the study shows that the KNN algorithm, using these novel features, achieves a fault classification accuracy of 98.5% for single-point defects. The study in [14] investigates fault detection using time-frequency analysis and unsupervised learning techniques. The authors compare wavelet packet analysis and wavelet soft threshold analysis on Type B fault signals, and find that wavelet packet analysis with a 4-level decomposition using the db9 wavelet provides superior noise reduction. Unsupervised learning methods are also applied, using Shannon entropy and seven indicators for clustering, with an emphasis on logarithmic data transformation. In [15], bearing fault diagnosis is approached using envelope analysis and machine learning techniques. The study examines methods such as spectral kurtosis and the global spectrum of vibration signals, and also compares deep transfer learning methods for bearing fault detection, ultimately proposing an efficient deep learning model for this task.

The research in [16] reviews various fault detection methods for rotating machinery using sound signals, especially those based on machine learning techniques. It provides an overview of signal processing and feature extraction methods as well as classification algorithms, emphasizing the importance of optimal feature selection to improve detection accuracy. In [17], the authors examine machine learning-based fault diagnosis of rolling bearings using the widely used Case Western Reserve University (CWRU) dataset. The paper outlines basic principles for machine learning-based fault diagnosis and conducts experiments using various machine learning algorithms, such as SVM, Random Forest, K-Nearest Neighbor, Adaboost, and Bagging. It also briefly reviews deep learning models, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

Despite the growing attention to machine learning for fault diagnosis, few studies have focused specifically on distributed bearing faults [18]. Most current research relies on manual feature extraction and uses the XGBoost algorithm to diagnose distributed faults, including lubrication issues, contamination, raceway scratches, missing balls, and broken cages. While XGBoost offers strong predictive performance and valuable feature importance analysis, it comes with certain disadvantages, such as increased complexity in parameter tuning, significant computational resource requirements, and a risk of overfitting. As a result, distributed bearing fault diagnosis remains an underexplored field and calls for additional investigation using machine learning algorithms.

While considerable research has explored single-point defects in bearings, such as isolated spalls or pits, these conditions are easier to simulate in the laboratory and do not fully capture the distributed, progressive faults more frequently encountered in real industrial contexts. Most previous works and benchmark datasets have therefore focused on localized damage, leaving the diagnosis of distributed faults comparatively underexplored. Recent studies are only beginning to address distributed fault scenarios, with several works proposing features, models, and evaluation methods specifically tailored to these fault types [19,20,21]. Nonetheless, there remains a clear need for systematic investigations into distributed bearing faults using diverse sensor modalities and robust analysis pipelines. This study addresses this gap by experimentally generating and rigorously diagnosing distributed bearing faults, thereby providing new insights into sensor fusion, wavelet-based feature extraction, and machine learning model performance for realistic and complex fault patterns.

Recent studies have also proposed advanced deep learning models for distributed bearing fault classification. For example, Rajabioun et al. [9] introduced a multisensory 2-D convolutional neural network architecture that fuses six different input signals, including three-axis vibration, stray magnetic flux, and two-phase current, to generate 2-D input matrices and achieved notable accuracy (99.92%) in identifying distributed bearing faults under varied operating conditions. Their results further highlight the potential of multi-signal fusion and deep architectures for robust fault diagnosis. Afshar et al. [19] introduced a deep learning framework embedding residual and channel attention blocks for prognostics of distributed bearing faults, utilizing only three-phase current signals instead of traditional vibration measurements. Their approach demonstrated highly effective remaining useful life (RUL) prediction for cooling fan motors across several configurations and power ratings, achieving over 95% test accuracy. This highlights the promise of current-driven deep learning models for practical, sensor-efficient bearing health monitoring in small electrical machines. Further addressing real-world diagnostic needs, Rajabioun et al. [20] investigated the use of stray magnetic flux signals as inputs for deep learning-based bearing fault classification. Their work focused on identifying distributed faults in actual industrial motor bearings, rather than simulated ones, and demonstrated that a properly designed deep neural network could extract informative features from flux signals alone, achieving over 94% test accuracy across multiple fault and healthy states. This highlights the growing capability of deep learning to achieve reliable performance even with unconventional, easy-to-measure sensor modalities for complex, distributed fault detection in practical environments.

This study delves into the challenge of diagnosing distributed bearing faults, which are typically marked by irregular vibration patterns and are often missed by conventional detection techniques. By integrating wavelet transform-based feature extraction with machine learning algorithms, this research explores the optimal selection of input signal sources to enhance fault classification accuracy. Specifically, the focus of this work is on identifying the optimal combination of input signals to maximize the diagnostic performance of different machine learning models, such as Random Forest, XGBoost, and SVM. Findings highlight the effectiveness of using specific signal pairs for accurate fault classification, paving the way for more reliable and cost-efficient maintenance strategies in industrial settings. This paper outlines our methodology, presents the results, and discusses their significance for advancing distributed bearing fault diagnosis.

To clarify the methodological framework of this study, the overall diagnostic process is summarized in Figure 1. As illustrated, the proposed approach begins with the acquisition and preprocessing of multi-sensor data, followed by the extraction of wavelet-based statistical features from each channel. All single-channel inputs and all pairwise combinations of input signals are then evaluated systematically, with a feature matrix constructed for each configuration. Multiple machine learning models, including Random Forest, XGBoost, and Support Vector Machine, are trained and validated using cross-validation. Model performance is assessed with statistical metrics, enabling the identification of the optimal input sources and classifier for robust distributed bearing fault classification.

2. Bearing Faults

To simulate commonly reported distributed bearing faults in industrial settings, faults were artificially created in the bearings of an induction motor with a power rating of 300 W. The faults investigated in this research include lubrication faults, contamination, electrical erosion, and flaking, mirroring the patterns observed in root cause analysis reports published by bearing manufacturers [21] and also investigated in [22] using deep learning algorithms. Additionally, to study the interaction of distributed faults with single-point defects, a bearing with a single-point defect on the outer race was also created. Table 1 gives a detailed description of each fault type used in this research.

To ensure reproducibility and quantification of each distributed fault condition, all artificial fault creations followed rigorously documented procedures. Lubrication faults were introduced by completely removing the lubricant with a commercial solvent, followed by operating the motor under variable loads and a range of speeds for two hours. Intermediate severity levels were produced via partial re-lubrication and subsequent testing. Severity for all lubrication cases was confirmed both by visual inspection and by observing at least a 30% increase in the vibration RMS relative to the healthy baseline. For contamination faults, approximately 5 g of 200-grit rock tumbler media were introduced into the bearing cavity after shield removal. Robustness and effective fault creation were validated by monitoring vibration RMS and kurtosis, which showed a minimum increase of 25% compared to the healthy condition. Electrical erosion and flaking were simulated using a rotary carving tool applied with controlled bit size and measured application times; all tool parameters and resulting defects were carefully documented.

During data acquisition, the induction motor was tested under five representative load conditions (no load, 25%, 50%, 75%, and 100% of rated torque) using the magnetic brake unit, and at ten discrete operating speeds, covering a broad range of industrially relevant scenarios. For each combination of fault class, load, and speed, recordings of at least 20 s were collected to ensure variability and measurement robustness. Documenting the fault creation procedures, severity metrics, and operating conditions in this way is intended to support replication and validation in future studies.

3. Experimental Setup and Analysis Pipeline

In this study, the experimental setup was carefully designed to replicate a range of bearing fault conditions and collect high-quality data for analysis and model training. The test bench, as illustrated in Figure 2, consisted of a 300 W induction motor connected to a magnetic brake unit (EM-3320-1A), which functioned as the load management system.

The motor load was precisely regulated using an EM-3320-1N brake controller from K & H MFG. CO., LTD., New Taipei City, Taiwan, allowing simulation of various operational scenarios that are common in industrial environments. Both the magnetic brake unit and its controller are shown in Figure 3.

To acquire the required data, a multi-sensory board was mounted at the shaft end of the induction motor. This board was equipped with sensors to measure vibrations along three axes and to detect stray magnetic flux around the motor. Including flux measurements as an additional signal was essential because it provided supplementary information that could improve fault detection, particularly under high-load conditions where vibration signals alone may be less effective. This comprehensive approach enhanced the detection of bearing faults even in challenging operational environments. The placement of the multi-sensory board on the shaft end of the induction motor is shown in Figure 4.

Data acquisition was carried out using a Genesys 2 Field-Programmable Gate Array (FPGA) board from Digilent Inc., Pullman, WA, USA, which sampled and recorded the input signals at 16-bit resolution and a 10 kHz sampling rate. Each test involved 20 s of data collection, which was then divided into 1 s segments (10,000 samples per channel). This segmentation enabled detailed signal analysis and ensured that the machine learning models were trained on manageable data portions.
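The segmentation step can be illustrated with a short sketch. It assumes each recording is held as a NumPy array of shape (samples, channels) and that segments do not overlap; neither detail is stated in the text, and the function name is hypothetical. The data are random placeholders, not measurements.

```python
import numpy as np

FS_HZ = 10_000   # sampling rate stated in the text
RECORD_S = 20    # duration of one recording
SEGMENT_S = 1    # segment length used for analysis


def segment_recording(recording, fs_hz=FS_HZ, segment_s=SEGMENT_S):
    """Split a (n_samples, n_channels) array into non-overlapping segments.

    Returns an array of shape (n_segments, samples_per_segment, n_channels).
    Trailing samples that do not fill a whole segment are dropped.
    """
    samples_per_segment = int(fs_hz * segment_s)
    n_segments = recording.shape[0] // samples_per_segment
    trimmed = recording[: n_segments * samples_per_segment]
    return trimmed.reshape(n_segments, samples_per_segment, recording.shape[1])


# Placeholder data standing in for one six-channel, 20 s recording.
rng = np.random.default_rng(0)
recording = rng.standard_normal((FS_HZ * RECORD_S, 6))
segments = segment_recording(recording)
print(segments.shape)  # (20, 10000, 6)

# Lower bound on segments per fault class, assuming exactly one 20 s
# recording for each of the 5 loads x 10 speeds described in Section 2.
segments_per_class = 5 * 10 * segments.shape[0]
print(segments_per_class)  # 1000
```

Under these assumptions, each fault class contributes at least 1000 one-second segments.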

All sensors (vibration, flux, and current) were carefully calibrated prior to installation using manufacturer-supplied calibration certificates and standard reference signals. During signal acquisition, calibration coefficients and offsets from each sensor’s datasheet were applied to all recorded signals to ensure measurement accuracy and comparability. To minimize electromagnetic interference and environmental noise, all sensor wiring was internally shielded and routed away from high-current cables whenever possible. Before feature extraction, each raw sensor signal underwent digital bandpass filtering (10 Hz–4 kHz, FIR), coded in Python 3.12, to eliminate low-frequency drift and high-frequency noise not relevant to bearing fault signatures. Each signal segment was then standardized using z-score normalization in the processing code, ensuring consistency across operating conditions and sessions. Outlier detection was also performed using a ±3σ criterion, and any segments with anomalous readings were excluded from further analysis. These combined procedures ensured reliable, reproducible, and interference-resistant data collection and preprocessing throughout the experimental campaign.
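The preprocessing chain described above (band-pass filtering, z-score normalization, ±3σ screening) could look roughly like the following sketch. Several details are assumptions, because the text does not give them: the filter length (1001 taps), the use of zero-phase filtering, and the statistic used for outlier screening. Here a segment is flagged when its RMS, computed before normalization, deviates from the mean RMS of all segments by more than three standard deviations.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS_HZ = 10_000  # sampling rate stated in the text


def bandpass_fir(x, low_hz=10.0, high_hz=4000.0, fs_hz=FS_HZ, numtaps=1001):
    """Zero-phase FIR band-pass filter (tap count and zero-phase are assumptions)."""
    taps = firwin(numtaps, [low_hz, high_hz], pass_zero=False, fs=fs_hz)
    return filtfilt(taps, [1.0], x)


def zscore(segment):
    """Standardize one segment to zero mean and unit variance."""
    return (segment - segment.mean()) / segment.std()


def keep_mask_3sigma(segments):
    """True for segments whose RMS lies within 3 sigma of the mean RMS (assumed rule)."""
    rms = np.sqrt(np.mean(segments ** 2, axis=1))
    return np.abs(rms - rms.mean()) <= 3.0 * rms.std()


t = np.arange(FS_HZ) / FS_HZ                  # one 1 s segment
in_band = np.sin(2 * np.pi * 1000.0 * t)      # 1 kHz tone, inside the pass band
out_band = np.sin(2 * np.pi * 4800.0 * t)     # 4.8 kHz tone, above the pass band

pass_ratio = np.std(bandpass_fir(in_band)) / np.std(in_band)
stop_ratio = np.std(bandpass_fir(out_band)) / np.std(out_band)

z = zscore(bandpass_fir(in_band + 0.5))       # normalized, filtered segment

# 50 ordinary placeholder segments plus one with 20x larger amplitude.
rng = np.random.default_rng(0)
batch = rng.standard_normal((51, FS_HZ))
batch[-1] *= 20.0
mask = keep_mask_3sigma(batch)
print(int(mask.sum()), "segments kept of", len(batch))
```

With a 10 Hz lower cut-off at a 10 kHz sampling rate, a long FIR filter is needed for a reasonably narrow transition band, which is why the sketch uses on the order of a thousand taps.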

The bearing faults studied included single-point defects, lubrication failures, contamination, electrical erosion, and flaking. These fault conditions were carefully recreated on the test bench to simulate common problems observed in industrial settings. By generating data under both healthy and faulty conditions in a controlled environment, the study developed a comprehensive dataset for training and evaluating the performance of the fault detection models.

3.1. Data Processing and Feature Extraction

The collected data was processed following a fully reproducible pipeline enabling accurate replication of all steps:

Signal Extraction: For each sample segment, six channels were extracted: vibrations along the x, y, and z axes (Vib_x, Vib_y, Vib_z), stray magnetic flux (Flux), and two-phase currents (I_A, I_B).
Feature Generation: For each channel, a discrete wavelet transform (Daubechies 4, level 3) was applied. Eight statistical features were extracted per channel: mean and standard deviation of the cA3, cD3, cD2, and cD1 coefficients, resulting in 48 features for the full six-signal set.
Label Assignment: Class labels for each segment were parsed directly from the corresponding file names, ensuring correct mapping between measurement and ground-truth condition.
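The feature-generation step can be sketched with the PyWavelets package (`pywt`). Using `pywt` is an assumption, since the text does not name the wavelet library, as are the ordering of the features within the vector and the helper names. The input is random placeholder data.

```python
import numpy as np
import pywt

CHANNELS = ["Vib_x", "Vib_y", "Vib_z", "Flux", "I_A", "I_B"]


def dwt_features(signal, wavelet="db4", level=3):
    """Return mean and standard deviation of cA3, cD3, cD2, cD1 (8 values)."""
    # wavedec returns [cA3, cD3, cD2, cD1] for level=3.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for c in coeffs:
        feats.extend([np.mean(c), np.std(c)])
    return np.asarray(feats)


def segment_features(segment):
    """Concatenate per-channel features of a (n_samples, n_channels) segment."""
    return np.concatenate(
        [dwt_features(segment[:, i]) for i in range(segment.shape[1])]
    )


rng = np.random.default_rng(0)
segment = rng.standard_normal((10_000, len(CHANNELS)))  # placeholder 1 s segment
features = segment_features(segment)
print(features.shape)  # (48,)
```

Each channel contributes 8 values, so the full six-channel vector has 48 entries, matching the count given above.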

3.2. Dataset Construction and Preparation

The feature dataset was split into training and testing subsets using a stratified split procedure (typically 70% train, 30% test), which preserves the class proportions in both subsets and yields reproducible evaluation results.

3.3. Machine Learning Pipeline

Model Training: Three machine learning classifiers were trained and tested:
- Random Forest: (scikit-learn, n_estimators = 100, random_state = 42)
- XGBoost: (XGBClassifier, eval_metric = 'mlogloss', default parameters)
- Support Vector Machine (SVM): (SVC, kernel = 'rbf', C = 1, gamma = 'scale', probability = True)

Classification Tasks: To identify the contribution of each signal and their combinations:
- Each signal was analyzed both as a single input and in every possible two-signal pair combination.
- Feature vectors were constructed accordingly and models were trained/tested for each configuration.
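The configuration sweep described above can be sketched as follows. This is a minimal illustration on synthetic placeholder data, not the study's code. It assumes a 48-column feature matrix with 8 consecutive columns per channel in the order listed in Section 3.1, and it omits XGBoost to keep the example dependent on scikit-learn only (an `XGBClassifier` could be added to the model dictionary in the same way). The helper names are hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

CHANNELS = ["Vib_x", "Vib_y", "Vib_z", "Flux", "I_A", "I_B"]
FEATS_PER_CHANNEL = 8


def channel_columns(selected):
    """Column indices of the 48-column matrix belonging to the selected channels."""
    cols = []
    for name in selected:
        start = CHANNELS.index(name) * FEATS_PER_CHANNEL
        cols.extend(range(start, start + FEATS_PER_CHANNEL))
    return cols


def make_models():
    """Fresh classifiers with the settings listed in the text (XGBoost omitted)."""
    return {
        "RF": RandomForestClassifier(n_estimators=100, random_state=42),
        "SVM": SVC(kernel="rbf", C=1, gamma="scale", probability=True),
    }


# 6 single-channel inputs plus 15 unordered pairs = 21 configurations.
configs = [(c,) for c in CHANNELS] + list(combinations(CHANNELS, 2))

# Synthetic placeholder data: 6 classes, 40 samples each, 48 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(6), 40)
X = rng.standard_normal((y.size, 48)) + 0.5 * y[:, None]

results = {}
for cfg in configs:
    X_cfg = X[:, channel_columns(cfg)]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_cfg, y, test_size=0.3, stratify=y, random_state=42
    )
    for name, model in make_models().items():
        model.fit(X_tr, y_tr)
        # Store (training accuracy, test accuracy) for this configuration.
        results[(cfg, name)] = (model.score(X_tr, y_tr), model.score(X_te, y_te))

best = max(results, key=lambda k: results[k][1])
print(len(configs), "configurations; best on placeholder data:", best)
```

Selecting columns from one precomputed feature matrix avoids recomputing the wavelet features for every configuration.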

3.4. Evaluation and Output

For each model and input configuration, training and test set accuracy were recorded, along with confusion matrices for both sets. The accuracy values were computed over multiple random splits and multi-fold cross-validation. All feature matrices, labels, and evaluation outputs were stored as .pkl files to ensure full reproducibility.
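As an illustration of the recorded outputs, the sketch below computes an accuracy value and a confusion matrix with scikit-learn and stores them in a .pkl file. The dictionary layout and file name are hypothetical, and the labels are toy values rather than results from the study.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels for three classes; one sample of class 1 is misclassified as 2.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

record = {
    "config": ("Vib_x", "Flux"),   # input configuration (illustrative)
    "model": "RF",
    "test_accuracy": accuracy_score(y_true, y_pred),
    "test_confusion": confusion_matrix(y_true, y_pred),  # rows: true, cols: predicted
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rf_vibx_flux.pkl"   # hypothetical file name
    path.write_bytes(pickle.dumps(record))
    restored = pickle.loads(path.read_bytes())

print(restored["test_confusion"])
```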

4. Feature Extraction and Input Selection

The effectiveness of bearing fault diagnosis largely depends on the quality of features extracted from raw sensor data. In this study, wavelet transforms are used for feature extraction due to their ability to capture transient signals and localized frequency components often indicative of bearing faults. The wavelet transform is particularly well-suited to non-stationary signals, such as those produced by distributed bearing faults, as it enables simultaneous time and frequency domain analysis.

4.1. Wavelet Transform for Feature Extraction

To effectively capture both transient and stationary characteristics of bearing fault signatures, the discrete wavelet transform (DWT) with the Daubechies 4 (db4) wavelet was applied to each segment from every sensor channel (Vib_x, Vib_y, Vib_z, Flux, I_A, I_B):

Each signal was decomposed to level 3, producing one set of approximation (cA3) and three sets of detail coefficients (cD3, cD2, cD1).
For each of these four coefficient sets, both the mean and standard deviation were calculated.

This process yielded eight features per sensor channel. For any given input configuration, the feature vector size was therefore 8 × the number of selected channels (e.g., 16 features for a two-channel pair and 48 for all six channels). Class labels were assigned automatically based on the filename structure, ensuring strict traceability between each sample and its ground truth. All feature extraction was performed in a fully scripted and documented environment, enabling exact reproduction of both the extraction process and all subsequent analyses.

4.2. Input Source (Sensor) Selection

The role of input signal selection in model performance was systematically investigated. Six sensor signals were available for analysis: vibration (x/y/z axes: Vib_x, Vib_y, Vib_z), stray magnetic flux (Flux), and two-phase currents (I_A, I_B). The procedure involved:

Single Input Analysis: Each channel was considered individually. For every model, training and test accuracy, as well as confusion matrices, were computed and compared across the six signals.
Pairwise Input Combinations: All possible pairs of the six signals (15 unique two-signal combinations) were analyzed in a comprehensive grid search. Feature vectors for each pair were constructed as described above.
Model Training and Validation: For each single and pairwise input configuration, three classifiers, Random Forest, XGBoost, and SVM, were trained and evaluated. Data was split using a stratified scheme (train/test ratio typically 70/30, with random_state = 42), to ensure properly balanced, reproducible comparisons.
Repeated Validation: The train/test split and evaluation were repeated for multiple random seeds to mitigate the impact of chance partitioning. Where appropriate, 5-fold cross-validation was performed, and summary statistics (mean accuracy, standard deviation, confidence interval) are reported.
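The summary statistics mentioned in the last step can be computed as in the sketch below. It uses stratified 5-fold cross-validation on synthetic placeholder data with 16 features, the size of a two-channel input. The confidence interval uses a Student t quantile over the fold scores, which is one common choice and an assumption here, since the text does not state how the interval was obtained.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_summary(model, X, y, n_splits=5, seed=42, level=0.95):
    """Mean, sample std, and t-based confidence interval of k-fold accuracy."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean = scores.mean()
    std = scores.std(ddof=1)
    t_crit = stats.t.ppf(0.5 + level / 2.0, df=n_splits - 1)
    half_width = t_crit * std / np.sqrt(n_splits)
    return mean, std, (mean - half_width, mean + half_width)


# Synthetic placeholder data: 6 classes, 40 samples each, 16 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(6), 40)
X = rng.standard_normal((y.size, 16)) + 0.5 * y[:, None]

rf = RandomForestClassifier(n_estimators=100, random_state=42)
mean, std, ci = cv_summary(rf, X, y)
print(f"accuracy {mean:.3f} +/- {std:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

Repeating the call with different `seed` values gives the repeated-split variant described above.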

4.3. Results of Optimal Input and Feature Selection

Our exhaustive analysis provides clear guidance for optimal sensor selection:

For Random Forest and SVM, the combination of the x-axis vibration signal (Vib_x) and Flux consistently produced the highest test accuracy (e.g., Random Forest: up to 100% in our laboratory setup; see Table 2), outperforming most single-signal and other pairwise options by a wide margin.
XGBoost achieved best performance with a different pair (e.g., Vib_y and I_A), underscoring subtle differences in how algorithms leverage mixed-signal data.
Single-signal models consistently performed 10–25% worse than optimal pairs, confirming the importance of complementary information available from fusing multiple signal modalities.
All results are supported by detailed confusion matrices and accuracy statistics, as directly output from the analysis code, to document classification robustness and error modes.

These findings demonstrate that careful input source selection and rigorous feature extraction are both essential and quantifiably beneficial, enabling simpler machine learning models to match or even exceed more complex approaches when applied to high-quality, precisely structured datasets.

5. Comparative Study of Machine Learning Models for Bearing Faults

This section discusses the machine learning models used in this study for bearing fault detection and classification. Three popular machine learning algorithms were employed: Random Forest (RF), XGBoost (XGB), and Support Vector Machine (SVM). These models were chosen for their demonstrated effectiveness in fault detection and classification tasks, especially when dealing with the complex and nonlinear data patterns commonly encountered in bearing fault signals.

5.1. Random Forest

The Random Forest algorithm is an ensemble learning method that constructs multiple decision trees during training and outputs either the most common class (for classification) or the mean prediction (for regression) of the individual trees. Key advantages of Random Forest include robustness to overfitting, the ability to handle high-dimensional data, and the capability to rank the importance of features.

In our experiments, the Random Forest model achieved the highest accuracy when using the combination of Vib_x (vibration along the x-axis) and Flux (stray magnetic flux). This combination resulted in 100% test accuracy, indicating perfect classification of the bearing faults under the tested conditions. The model also performed well with other combinations, but none matched this result. This suggests that the two signals provide highly complementary information, capturing the nuances of both localized and distributed faults effectively.
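The feature-ranking capability of Random Forest mentioned above can also be used to compare input channels. The sketch below sums the impurity-based importances of the 8 features belonging to each channel. This aggregation is an illustrative assumption, not a procedure reported in the study. The data are synthetic and deliberately built so that only the Flux columns carry class information.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CHANNELS = ["Vib_x", "Vib_y", "Vib_z", "Flux", "I_A", "I_B"]
FEATS_PER_CHANNEL = 8

# Synthetic placeholder data: only the Flux columns depend on the class label.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(6), 50)
X = rng.standard_normal((y.size, len(CHANNELS) * FEATS_PER_CHANNEL))
flux_start = CHANNELS.index("Flux") * FEATS_PER_CHANNEL
X[:, flux_start:flux_start + FEATS_PER_CHANNEL] += 3.0 * y[:, None]

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Sum the impurity-based importances of the 8 features of each channel.
importances = rf.feature_importances_
per_channel = {
    name: importances[i * FEATS_PER_CHANNEL:(i + 1) * FEATS_PER_CHANNEL].sum()
    for i, name in enumerate(CHANNELS)
}
ranking = sorted(per_channel, key=per_channel.get, reverse=True)
print(ranking[0])  # Flux (by construction of the placeholder data)
```

Impurity-based importances can be biased toward features with many distinct values, so such a ranking is best treated as a screening aid alongside the accuracy comparison.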

5.2. XGBoost

XGBoost (Extreme Gradient Boosting) is a powerful and scalable machine learning algorithm based on the gradient boosting framework. It has gained popularity due to its high performance, speed, and ability to handle sparse data. XGBoost builds an ensemble of trees sequentially, with each tree attempting to correct the errors made by the previous ones.

The XGBoost model achieved its best performance with the combination of Vib_y (vibration along the y-axis) and I_A (phase current). The model attained an accuracy of 99.67%, slightly lower than the Random Forest model’s best performance. The inclusion of current (I_A) as an input source proved to be effective when combined with the vibration data along the y-axis, indicating that electrical data can significantly enhance fault classification when appropriately integrated with mechanical vibration signals.

5.3. Support Vector Machine (SVM)

The Support Vector Machine (SVM) is a supervised learning model widely used for both classification and regression tasks. SVM operates by finding the hyperplane that best separates data points of different classes in a high-dimensional space. It is particularly effective when the number of dimensions exceeds the number of samples.

The SVM model showed its best performance using the combination of Vib_x (vibration along the x-axis) and Flux (stray magnetic flux), achieving an accuracy of 76.5%. While this performance is lower than that of Random Forest and XGBoost, it shows that the model can distinguish between fault conditions to some extent using the same input pair as the Random Forest model. However, the lower accuracy suggests that SVM, with the settings used here, may be less robust in handling the complexities of the data than the other two algorithms.

5.4. Comparison of Model Performance

Table 2 summarizes the performance of each machine learning model with its best feature combination. From the results, it is evident that the Random Forest model achieved the highest accuracy, making it the most effective model for this study. The combination of Vib_x and Flux was particularly significant, providing the highest accuracy for both the Random Forest and SVM models. This indicates that vibration signals along the x-axis, coupled with stray magnetic flux measurements, offer the most comprehensive information for distinguishing between different bearing fault conditions.

5.5. Selection of Optimal Input Sources

Based on the results, the combination of ‘Vib_X’ and ‘Flux’ consistently excelled in achieving high accuracy, particularly with the Random Forest and SVM models. This suggests that these two input sources are critical for effective fault detection in this context. The inclusion of ‘Flux’ as a complementary signal to vibration data underscores its importance in capturing additional fault characteristics that might not be evident from vibration data alone, especially under varying load conditions.

In contrast, the XGBoost model’s best performance was with ‘Vib_Y’ and ‘I_A’, highlighting the relevance of incorporating electrical signals (current) alongside mechanical vibrations. This combination achieved nearly the same accuracy as the best combination for the Random Forest model, indicating that while ‘Vib_X’ and ‘Flux’ are optimal for certain algorithms, other combinations can also yield high accuracy, depending on the algorithm’s strengths.
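
The pairwise comparison discussed above can be organised as an exhaustive search over source pairs. The sketch below shows one way to do this and is not the authors' code. The label `I_B` for the second current phase, the function names, and the placeholder scoring function are assumptions. In practice the score would be a measured quantity such as cross-validated test accuracy for a given classifier.

```python
from itertools import combinations

# Source names as used in the text; "I_B" is an assumed label for the second
# current phase, which this section does not name.
SOURCES = ["Vib_X", "Vib_Y", "Vib_Z", "Flux", "I_A", "I_B"]


def rank_source_pairs(sources, score_fn):
    """Score every unordered pair of sources and return them best-first."""
    scored = [(score_fn(pair), pair) for pair in combinations(sources, 2)]
    # Sort by score only, so tied pairs keep their enumeration order.
    return sorted(scored, key=lambda item: item[0], reverse=True)


def placeholder_score(pair):
    """Stand-in for a real evaluation such as cross-validated test accuracy.

    The weights are arbitrary integers chosen for the demo, not measurements.
    """
    arbitrary_weight = {"Vib_X": 6, "Flux": 5, "Vib_Y": 4,
                        "I_A": 3, "Vib_Z": 2, "I_B": 1}
    return arbitrary_weight[pair[0]] + arbitrary_weight[pair[1]]


ranking = rank_source_pairs(SOURCES, placeholder_score)
best_score, best_pair = ranking[0]
print(len(ranking), best_pair, best_score)
```

With six sources the search covers 15 pairs per classifier, which keeps an exhaustive comparison inexpensive. Because the ranking is computed separately for each model, different classifiers can end up with different best pairs, as observed for Random Forest and XGBoost.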

Table 3 presents the statistical performance of each machine learning model using its optimal feature combination, based on 5-fold cross-validation. This summary shows that both Random Forest and XGBoost achieved excellent classification stability and accuracy with their selected feature pairs, while SVM’s performance was noticeably lower. The results highlight the advantage of pairing a vibration axis with a complementary flux or current signal for ensemble methods and demonstrate the importance of informed feature selection for robust bearing fault diagnosis.
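
As an illustration of how 5-fold statistics of this kind are typically produced, the sketch below partitions the sample indices into five folds, holds each fold out once, and summarises the test scores. It is a generic outline under stated assumptions, not the pipeline used in this study. The toy labels, the majority-class baseline, and all function names are hypothetical, and `statistics.stdev` returns the sample standard deviation, which may or may not match the convention behind Table 3.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices once and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, fit_and_score, k=5, seed=0):
    """Hold out each fold once and summarise the k test scores."""
    scores = []
    for test_idx in k_fold_indices(n_samples, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        scores.append(fit_and_score(train_idx, test_idx))
    return scores, statistics.mean(scores), statistics.stdev(scores)


# Hypothetical labels and a majority-class baseline, used only to run the loop.
toy_labels = [0] * 12 + [1] * 8


def majority_baseline(train_idx, test_idx):
    """Predict the most common training label and return test accuracy."""
    train = [toy_labels[i] for i in train_idx]
    majority = max(set(train), key=train.count)
    return sum(toy_labels[i] == majority for i in test_idx) / len(test_idx)


fold_scores, mean_acc, std_acc = cross_validate(len(toy_labels), majority_baseline)
```

Replacing `majority_baseline` with a function that trains a classifier on `train_idx` and scores it on `test_idx` yields the per-model mean and standard deviation in the same form as Table 3.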

In conclusion, the selection of input sources significantly impacts the model’s ability to detect and classify bearing faults. The findings suggest that a combination of mechanical and electrical signals, specifically ‘Vib_X’ and ‘Flux’, provides a robust framework for fault diagnosis across different machine learning models.

6. Discussion and Conclusions

This study explored the effectiveness of different machine learning models and various input signal combinations for the detection and classification of bearing faults. The results demonstrate that both the choice of input signals and the selected machine learning model have a significant impact on the accuracy of fault diagnosis.

6.1. Discussion

The results from this study clearly show that both input feature selection and classifier architecture are critical for achieving effective and reliable distributed bearing fault classification. The 5-fold cross-validation statistics in Table 3, which report the mean accuracy, standard deviation, and confidence interval for each model’s optimal feature pair, highlight not only the high accuracy but also the statistical stability of the ensemble learning methods, Random Forest and XGBoost.
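
The confidence intervals in Table 3 appear consistent with a two-sided Student-t interval over the five fold scores, that is, mean ± t·s/√k with k = 5. The short check below is a sketch, not the authors' code. It recomputes the intervals from the reported means and standard deviations. The critical value 2.776 (4 degrees of freedom) and the function name are assumptions introduced here, and because the reported standard deviations are rounded, the last digit may differ from the table.

```python
import math

# Two-sided 95% Student-t critical value for 4 degrees of freedom (5 folds).
T_CRIT_DF4 = 2.776


def t_interval(mean, std, n_folds=5, t_crit=T_CRIT_DF4):
    """Return (low, high) for mean +/- t_crit * std / sqrt(n_folds)."""
    half_width = t_crit * std / math.sqrt(n_folds)
    return mean - half_width, mean + half_width


# Mean accuracy and standard deviation as reported in Table 3 (fractions).
table3 = {
    "Random Forest": (0.9975, 0.0022),
    "XGBoost": (0.9948, 0.0013),
    "SVM": (0.6820, 0.0094),
}

intervals = {model: t_interval(mean, std) for model, (mean, std) in table3.items()}
for model, (low, high) in intervals.items():
    print(f"{model}: ({low:.4f}, {high:.4f})")
```

Under this assumption the SVM interval reproduces to four decimals. The Random Forest upper bound falls slightly above 1, which is a property of the symmetric interval rather than an attainable accuracy, so that bound is best read as 100%.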

A key outcome of this work is the consistency and robustness achieved by Random Forest and XGBoost when using carefully selected input pairs: ‘Vib_X’ with ‘Flux’ for Random Forest and ‘Vib_Y’ with ‘I_A’ for XGBoost. Both models delivered exceptionally high and stable classification accuracy across different data splits, as indicated by narrow confidence intervals and low standard deviations. This level of performance suggests that, beyond achieving high single-run accuracy, these models maintain their effectiveness and reliability across multiple trials, supporting their potential value for reproducible deployment in real-world industrial settings.

In contrast, the Support Vector Machine, though improved by optimal input selection, exhibited substantially lower mean accuracy and greater variance even with its best-performing combination (Figure 5). This result highlights the importance of accounting for both peak accuracy and the statistical reliability and robustness of model predictions, especially in scenarios where misclassification may have high costs. These findings further emphasize that ensemble methods are particularly well-suited to handle the complex, nonlinear signal characteristics present in bearing fault data, especially when features are drawn from both mechanical (vibration) and electromagnetic (flux) domains.

Overall, as shown in Figure 6, Random Forest and XGBoost achieved the highest test accuracies when paired with their optimal sensor combinations, significantly outperforming SVM. These results reinforce that careful data source fusion and thoughtful selection of machine learning models lead not only to better fault detection, but also to more stable and generalizable diagnostic systems. The ability to diagnose faults reliably across varying operational modes and data partitions is critical for predictive maintenance in industrial machinery. The workflow and results presented here provide both a technical and methodological benchmark for future research in intelligent condition monitoring.

6.2. Conclusions

This study demonstrates that multi-sensor input sources combined with machine learning models, particularly ensemble methods like Random Forest and XGBoost, significantly enhance the accuracy of distributed bearing fault diagnosis. The optimal combination of input sources identified in this research, including vibration signals along different axes and stray magnetic flux, enables more precise and reliable fault detection across various operational scenarios. The research also underscores the need for careful selection and fusion of input signals to maximize diagnostic accuracy.

Future work should focus on refining feature extraction methods and exploring additional input sources to further improve model performance. Additionally, investigating the application of deep learning models and advanced feature engineering techniques could provide new insights and enhance the robustness of fault detection systems.

It is instructive to compare these results with our earlier work [20], in which a CNN-based approach achieved 100% fault classification accuracy across multiple signal types. Although those results set a high standard for performance, they came at the cost of significant model complexity and computational demand. The current work illustrates that, by judiciously selecting the most informative sensor signals and leveraging wavelet-derived features, similar levels of accuracy and reliability are possible with minimal input requirements and far smaller classical machine learning models. Our ablation and input selection analyses further reinforce that this streamlined approach does not compromise diagnostic performance, even under differing hardware conditions.

In conclusion, the integration of advanced signal processing techniques, optimal input selection, and machine learning models offers a promising pathway for developing more effective and efficient bearing fault detection systems. This approach not only improves the reliability of industrial machinery but also supports the advancement of predictive maintenance strategies, reducing downtime and maintenance costs.

While these results demonstrate promising accuracy and robustness under controlled laboratory conditions, several limitations of the present study should be acknowledged. The experiments were conducted on a test bench rather than in an industrial environment, and data were collected under relatively balanced class distributions with predefined fault conditions. Real-world industrial settings are often characterized by greater noise, operational variability, and class imbalance, which may reduce classification performance. Future work will focus on validating the proposed methodology with field data from operational machinery and exploring the impact of more complex, unbalanced, and evolving fault scenarios.

Author Contributions

Conceptualization, R.R. and Ö.A.; methodology, R.R.; software, R.R.; validation, Ö.A.; formal analysis, R.R.; investigation, R.R.; resources, Ö.A.; data curation, R.R.; writing—original draft preparation, R.R.; writing—review and editing, R.R. and Ö.A.; visualization, R.R.; supervision, Ö.A.; project administration, Ö.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bellini, A.; Filippetti, F.; Tassoni, C.; Capolino, G.-A. Advances in diagnostic techniques for induction machines. IEEE Trans. Ind. Electron. 2008, 55, 4109–4126.
2. Hataş, D.H. Design and implementation of a novel switched rectifier based voltage multiplexer for multilevel inverters. Electr. Power Syst. Res. 2025, 247, 111760.
3. Karakiliç, M.; Zeynalov, J.; Hataş, H. Low-cost single-source 17 level multilevel inverter with reduced switch count. Eng. Res. Express 2025, 7, 015329.
4. Zhu, J.; Chen, N.; Peng, W. Estimation of bearing remaining useful life based on multiscale convolutional neural network. IEEE Trans. Ind. Electron. 2018, 66, 3208–3216.
5. Neupane, D.; Seok, J. Bearing fault detection and diagnosis using case western reserve university dataset with deep learning approaches: A review. IEEE Access 2020, 8, 93155–93178.
6. Gangsar, P.; Tiwari, R. Signal based condition monitoring techniques for fault detection and diagnosis of induction motors: A state-of-the-art review. Mech. Syst. Signal Process. 2020, 144, 106908.
7. Dalvand, F.; Kang, M.; Dalvand, S.; Pecht, M. Detection of generalized-roughness and single-point bearing faults using linear prediction-based current noise cancellation. IEEE Trans. Ind. Electron. 2018, 65, 9728–9738.
8. Zhang, S.; Zhang, S.; Wang, B.; Habetler, T.G. Deep learning algorithms for bearing fault diagnostics—A comprehensive review. IEEE Access 2020, 8, 29857–29881.
9. Rajabioun, R.; Afshar, M.; Mete, M.; Atan, Ö.; Akin, B. Distributed Bearing Fault Classification of Induction Motors Using 2-D Deep Learning Model. IEEE J. Emerg. Sel. Top. Ind. Electron. 2024, 5, 115–125.
10. Rai, A.; Upadhyay, S.H. A review on signal processing techniques utilized in the fault diagnosis of rolling element bearings. Tribol. Int. 2016, 96, 289–306.
11. Cui, B.; Weng, Y.; Zhang, N. A feature extraction and machine learning framework for bearing fault diagnosis. Renew. Energy 2022, 191, 987–997.
12. Pacheco-Chérrez, J.; Fortoul-Díaz, J.A.; Cortés-Santacruz, F.; Aloso-Valerdi, L.M.; Ibarra-Zarate, D.I. Bearing Fault Detection with Vibration and Acoustic Signals: Comparison among different Machine Leaning Classification Methods. Eng. Fail. Anal. 2022, 139, 106515.
13. Shinde, P.V.; Desavale, R.G.; Jadhav, P.M.; Sawant, S.H. A multi fault classification in a rotor-bearing system using machine learning approach. J. Braz. Soc. Mech. Sci. Eng. 2023, 45, 121.
14. Fu, S.; Wu, Y.; Wang, R.; Mao, M. A Bearing Fault Diagnosis Method Based on Wavelet Denoising and Machine Learning. Appl. Sci. 2023, 13, 5936.
15. Alonso-González, M.; Díaz, V.G.; Pérez, B.L.; G-Bustelo, B.C.P.; Anzola, J.P. Bearing Fault Diagnosis with Envelope Analysis and Machine Learning Approaches Using CWRU Dataset. IEEE Access 2023, 11, 57796–57805.
16. Shubita, R.R.; Alsadeh, A.S.; Khater, I.M. Fault Detection in Rotating Machinery Based on Sound Signal Using Edge Machine Learning. IEEE Access 2023, 11, 6665–6672.
17. Zhang, X.; Zhao, B.; Lin, Y. Machine Learning Based Bearing Fault Diagnosis Using the Case Western Reserve University Data: A Review. IEEE Access 2021, 9, 155598–155608.
18. Irfan, M.; Alwadie, A.S.; AlThobiani, F.; Quraishi, K.S.; Jalalah, M.; Abbass, A.; Rahman, S.; Khan, M.K.A.; Alqhtani, S. A Comparison of Machine Learning Methods for the Diagnosis of Motor Faults Using Automated Spectral Feature Extraction Technique. J. Nondestruct. Eval. 2022, 41, 31.
19. Afshar, M.; Rajabioun, R.; Akin, B. Current-Driven Deep Learning for Enhanced Motor Bearing Prognostics. IEEE Trans. Ind. Appl. 2025, 61, 2864–2873.
20. Rajabioun, R.; Afshar, M.; Akin, B. Deep Learning-Based Bearing Fault Classification Using Stray Magnetic Flux Signal. In Proceedings of the 2023 IEEE Energy Conversion Congress and Exposition, ECCE, Nashville, TN, USA, 29 October–2 November 2023; pp. 4043–4048.
21. SKF. Bearing Damage and Failure Analysis; SKF: Gothenburg, Sweden, 2017.
22. Rajabioun, R.; Afshar, M.; Atan, Ö.; Mete, M.; Akin, B. Classification of Distributed Bearing Faults Using a Novel Sensory Board and Deep Learning Networks with Hybrid Inputs. IEEE Trans. Energy Convers. 2024, 39, 963–973.

Figure 1. Step-by-step flowchart of the proposed distributed bearing fault classification methodology.

Figure 2. Test setup for 300 W induction motor with brake control and data acquisition units.

Figure 3. Brake control units used for load management.

Figure 4. Vibration and Flux sensor kit fixed on the shaft-end of the 300 W induction motor.

Figure 5. Test accuracy comparison of Random Forest, XGBoost, and SVM models for selected sensor input combinations.

Figure 6. Maximum test accuracy achieved by each machine learning model using its optimal sensor input combination.

Table 1. Description of different classes of bearings used in this study.

Bearing Mode	Description
Healthy Mode (HL)	In this mode, bearings with no faults were installed in the motors, which were then operated under various load and speed conditions to collect baseline data across all operating points. This data serves as a control reference for fault classification.
Single Point Defect (SP)	Localized faults are defined by specific damage, such as pits or spalls, on bearing components like raceways, cages, or balls. In this study, a single-point defect was artificially created on the outer race using a rotary carving tool, generating well-known vibration patterns crucial for comparison with distributed faults.
Lubrication Issue (LB)	Proper lubrication is essential for reducing friction and prolonging bearing life. To simulate lubrication failure, the lubricant was completely removed from the bearing using a lubricant remover, and the motor was operated without lubrication under full load for several hours to accelerate wear. The bearing was then partially re-lubricated to capture signals under different operating conditions.
Contamination (CN)	Contaminants such as dirt, dust, and metal particles can infiltrate the bearing lubricant, leading to physical damage. To recreate this condition, 200-grit rock tumbler particles were introduced into the bearings, allowing the study of contamination effects on bearing performance.
Electrical Erosion (ER)	High-frequency electrical currents passing through the bearing raceways can cause fluting, characterized by a series of lines across the raceways due to erosion. In this study, an electrical erosion pattern was simulated inside the inner race of a bearing using specialized rotary carving bits.
Flaking Fault (FL)	Flaking occurs when material fatigue leads to small pieces breaking off from the bearing raceway or rolling elements, resulting in a rough and coarse surface. This fault was simulated by artificially creating scratches on the inner raceway of the bearing.

Table 2. Performance of each machine learning model.

Model	Best Feature Combination	Best Test Accuracy
Random Forest	Vib_X, Flux	100%
XGBoost	Vib_Y, I_A	99.67%
Support Vector Machine	Vib_Z, Flux	69.58%

Table 3. Five-fold cross-validation mean test accuracy and 95% confidence interval for each machine learning model with its optimal feature combination.

Model	Best Feature Combination	Mean Test Accuracy	Std Dev (fraction)	95% Confidence Interval (fraction)
Random Forest	Vib_X, Flux	99.75%	0.0022	(0.9947, 1.0003)
XGBoost	Vib_Y, I_A	99.48%	0.0013	(0.9932, 0.9965)
SVM	Vib_Z, Flux	68.20%	0.0094	(0.6703, 0.6937)


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
