1. Introduction
Predictive maintenance (PdM) has attracted increasing academic and industrial attention, yet significant challenges remain. Traditional approaches based on condition monitoring, statistical modeling, or rule-based systems have demonstrated value in detecting early signs of degradation, but they often struggle with scalability, adaptability, and integration across heterogeneous industrial environments [1,2]. Recent advances in artificial intelligence, deep learning, and edge computing promise higher predictive accuracy and real-time insights, but they also introduce new complexities related to data quality, missing values, and computational requirements [3,4]. Moreover, while asset management frameworks such as ISO 55001 [5] and life cycle optimization emphasize aligning maintenance with sustainability and long-term value creation [6,7], few studies propose integrative solutions that combine explainable AI with practical industrial deployment. This gap highlights the need for research that not only advances predictive accuracy but also ensures interpretability, robustness to incomplete data, and alignment with broader asset management strategies. Addressing this gap constitutes the core motivation of this study.
To this end, the main objective of this work is to design and evaluate an explainable predictive maintenance framework tailored for industrial environments with heterogeneous data sources. The contributions of this study are threefold:
Integration of AI with asset management principles: This approach bridges advanced machine learning with ISO 55001-aligned practices to ensure both predictive accuracy and sustainability considerations [6,7].
Robust handling of data limitations: This approach addresses incomplete, noisy, or missing sensor data by incorporating recent advances in imputation and hybrid modeling [4].
Explainability for industrial adoption: Unlike most PdM studies focused exclusively on accuracy, this research emphasizes transparent and interpretable models to foster trust and support decision-making by maintenance managers.
This study is framed within the broader transformation of the Fourth Industrial Revolution (Industry 4.0), where cyber-physical systems, the Industrial Internet of Things (IIoT), and AI are reshaping traditional manufacturing. The core objective of Industry 4.0 is to shift from reactive or preventive maintenance to intelligent, data-driven strategies that enhance asset availability, reduce downtime, and optimize operational efficiency [8,9]. Among these strategies, PdM has emerged as a key enabler by anticipating failures from historical and real-time machine data. However, despite advancements in machine learning, widespread adoption in industry remains limited due to concerns over transparency and trust [10].
In high-stakes manufacturing contexts such as textile production, continuous operation and process reliability are critical. Black-box models like deep neural networks may be mistrusted by operators, underscoring the need for interpretable AI. Probabilistic Graphical Models (PGMs), particularly Bayesian Networks (BNs), offer a promising solution by representing causal dependencies in a transparent and modular way [11,12]. In this study, a BN learned automatically via a Hill Climbing algorithm with the Bayesian Information Criterion (BIC) [13,14] is combined with an XGBoost classifier [15]. This hybrid framework balances interpretability and predictive accuracy, ensuring early fault detection while maintaining operator trust.
The proposed approach is validated in a real-world case study at Grupo Wendy, a Mexican textile manufacturer. Sensor data from a multi-needle quilting machine were integrated with historical maintenance logs, demonstrating that combining interpretable probabilistic models with high-performance classifiers yields a reliable and practical solution for predictive maintenance in Industry 4.0 environments.
2. Hybrid Fault Prediction Architecture with XGBoost and Bayesian Networks
This section describes the complete system architecture designed for fault prediction in an industrial quilting machine. The proposed framework integrates real-time sensor-based monitoring, fault-state labeling using historical maintenance records, and a hybrid learning strategy that combines a structure-learned Bayesian Network with an XGBoost classifier. This integration aims to enhance both predictive performance and interpretability, addressing practical challenges in deploying predictive maintenance systems in Industry 4.0 settings.
2.1. Use Case: Textile Quilting Machine
The case study was conducted at Grupo Wendy, a mattress manufacturer that operates a high-speed quilting machine integrating automated fabric feeding, cutting, stitching, and motorized control. These machines operate continuously across shifts and are subject to wear and transient faults, particularly in components such as the thread-cutting unit, needle shafts, and drive motors. Due to the high mechanical throughput, minor anomalies can quickly escalate into system failures, resulting in costly downtime and production losses.
To enable predictive maintenance, the company utilizes the MP9 computerized maintenance management system (CMMS) [16], which logs both preventive and corrective maintenance activities. These records include failure codes, timestamps, component replacements, and descriptions of interventions. Combined with real-time sensor data, these maintenance logs serve as the foundation for the proposed fault prediction system.
2.2. Edge-Based Sensor Acquisition Platform
A low-cost, modular data acquisition system was developed to monitor critical operational parameters of the quilting machine. As shown in Figure 1, the system comprises the following sensing components:
Vibration sensing: A 3-axis accelerometer ADXL345 for detecting mechanical imbalances and signs of component wear.
Temperature monitoring: A DHT21 digital sensor for measuring both ambient and internal heat accumulation.
Electrical current measurement: A clamp-type current sensor SCT013 for estimating motor load and mechanical stress.
Particulate matter sensing: A PMS5003 laser-based sensor for monitoring airborne particle concentration, providing an indirect indicator of environmental conditions such as lint accumulation or fiber release around the quilting machine.
These sensors are interfaced with a Particle IoT Solution [17] based on the ESP-32 microcontroller [18] architecture, which samples data at 1 Hz and transmits them to the SmartFlow PdM Platform [19]. Over a three-month deployment period, the system collected approximately 220,000 timestamped records, synchronized with operational shifts and logged maintenance events. This architecture supports continuous monitoring without disrupting machine operation and is designed for scalability: it can be adapted to other machines or production lines with minimal hardware and software reconfiguration.
Figure 2 presents the conceptual research framework developed in this study. The framework is structured around three main stages: (i) data acquisition and preprocessing, which involves sensor data collection, cleaning, and imputation to address missing or noisy records [4]; (ii) predictive modeling and explanation, where hybrid AI models are trained to forecast equipment failures while generating interpretable insights for practitioners; and (iii) decision support and alignment with asset management strategies, ensuring that maintenance recommendations are consistent with ISO 55001 principles and life cycle optimization goals [6,7].
Figure 2 provides a visual overview of how data flows through the system, the role of AI models, and how the outputs contribute to sustainable and explainable predictive maintenance.
The experimental data used in this study were collected from a multi-sensor instrumentation setup installed on a textile manufacturing machine. The system integrates vibration, current, temperature, and particulate matter sensors, with data transmitted via a Cisco gateway and the Particle.io cloud [17] to an InfluxDB repository. This multi-modal dataset reflects both mechanical and environmental conditions, allowing for a holistic representation of machine health.
Prior to model training, several preprocessing steps were applied. First, raw signals were synchronized and resampled to a common time base to ensure comparability across sensors. Second, missing or corrupted values (arising from communication losses or sensor interruptions) were imputed using hybrid statistical and machine learning methods that preserve temporal correlations [4]. Third, noise reduction was performed through low-pass filtering and window-based smoothing. Finally, feature extraction was conducted to derive relevant indicators from vibration and current signals, such as root mean square (RMS), kurtosis, and spectral entropy, which are widely recognized in fault detection literature [3].
This preprocessing pipeline ensured that the dataset provided a robust foundation for predictive modeling while addressing common challenges of real-world industrial data, such as incompleteness, heterogeneity, and noise.
The predictive modeling stage combines complementary machine learning methods to balance interpretability with predictive performance, while the decision-support stage ensures alignment with asset management principles.
2.3. Data Sources and Processing
The experimental data used in this study were collected from a multi-sensor instrumentation setup installed on a textile quilting machine (Figure 3). The system integrates vibration, electrical current, temperature, and particulate matter sensors, with signals transmitted through a Cisco gateway to the Particle.io cloud and stored in an InfluxDB repository. This architecture enables the acquisition of a multi-modal dataset that reflects both mechanical and environmental operating conditions, providing a holistic representation of quilting machine health.
Prior to model training, several preprocessing steps were performed to ensure data quality and consistency. First, raw signals from different sensors were synchronized and resampled to a common time base, facilitating comparability across modalities. Second, missing or corrupted values (resulting from sensor interruptions or communication losses) were imputed using a combination of statistical and machine learning techniques designed to preserve temporal correlations [4]. Third, noise reduction was applied using low-pass filtering and window-based smoothing, reducing the impact of high-frequency disturbances. Finally, feature extraction was conducted to derive informative indicators from vibration and current signals, including root mean square (RMS), kurtosis, and spectral entropy, which are widely recognized in fault detection and predictive maintenance literature [3].
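The three indicators named above can be illustrated with a minimal, dependency-free sketch. The function names, the use of non-excess population kurtosis, the entropy unit (bits), the exclusion of the DC bin, and the naive DFT (in place of an FFT library) are all assumptions made for illustration; the study does not specify its exact definitions.

```python
import cmath
import math


def rms(x):
    """Root mean square of one signal window."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def kurtosis(x):
    """Population (non-excess) kurtosis: fourth central moment / variance**2.

    Undefined (division by zero) for a constant window.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


def spectral_entropy(x):
    """Shannon entropy (bits) of the normalized power spectrum, DC bin excluded.

    Uses a naive O(n^2) DFT, which is adequate for short windows only.
    """
    n = len(x)
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A single-frequency vibration concentrates its power in one bin and yields an entropy near zero, whereas an impulsive or broadband signal spreads power across bins and yields a higher value.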
In addition to these statistical indicators, rolling-window features were computed over multiple time frames (e.g., averages and variances of vibration and current signals across 5–30 s intervals). These engineered features allowed the otherwise static Bayesian Network to capture short-term temporal patterns and fluctuations, effectively embedding a degree of time-awareness into the dataset. This approach provided a computationally efficient way to represent temporal dynamics without explicitly modeling state transitions, ensuring compatibility with the BN framework while preserving predictive performance.
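As a rough sketch of the rolling-window idea, the following computes a trailing mean and variance for one signal. The function name and the choice of population variance are assumptions; at the 1 Hz sampling rate reported earlier, the window length in samples equals the interval length in seconds.

```python
from collections import deque
from statistics import fmean, pvariance


def rolling_features(values, window):
    """Return a list of (mean, variance) over the trailing `window` samples.

    Positions with fewer than `window` samples available are reported as None.
    """
    buf = deque(maxlen=window)  # oldest sample is dropped automatically
    out = []
    for v in values:
        buf.append(v)
        if len(buf) < window:
            out.append(None)
        else:
            out.append((fmean(buf), pvariance(buf)))
    return out
```

Calling the function once per window length, for example `{w: rolling_features(signal, w) for w in (5, 30)}`, gives the multi-scale columns that a static model can consume as ordinary features.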
This preprocessing pipeline ensured that the final dataset was robust against noise, incompleteness, and heterogeneity, which are common challenges of real-world industrial data, and provided a reliable foundation for the hybrid predictive modeling framework.
2.4. Labeling Degradation States from Maintenance Logs
To prepare the raw sensor data for supervised learning, a degradation labeling strategy was implemented based on historical maintenance logs from the MP9 CMMS. Each sensor record was assigned to a degradation class according to its timestamp relative to documented failure events. Three classes were defined:
Healthy: Normal operation with no faults reported in the near future.
Medium degradation: Occurs within 24 h prior to a minor fault or maintenance alert.
High degradation: Occurs immediately before a major failure event.
This time-aware labeling approach enabled the generation of a supervised dataset that captures realistic degradation transitions, which are essential for training models to detect early signs of machine failure.
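The labeling rule can be sketched as follows. The 24 h window for medium degradation comes from the class definitions above; the 2 h window for high degradation is a hypothetical stand-in, since "immediately before" is not quantified, and the event tuple layout and severity names are likewise assumptions.

```python
from datetime import datetime, timedelta

MEDIUM_WINDOW = timedelta(hours=24)  # from the text: 24 h before a minor fault
HIGH_WINDOW = timedelta(hours=2)     # ASSUMPTION: the text gives no numeric value


def label_record(ts, events):
    """Return the degradation class for one sensor timestamp.

    `events` is a list of (event_time, severity) pairs, with severity
    "minor" or "major" (hypothetical encoding of the CMMS failure codes).
    """
    label = "healthy"
    for event_time, severity in events:
        lead = event_time - ts
        if lead < timedelta(0):
            continue  # event is already in the past for this record
        if severity == "major" and lead <= HIGH_WINDOW:
            return "high"  # the most severe class takes precedence
        if severity == "minor" and lead <= MEDIUM_WINDOW:
            label = "medium"
    return label
```

Records that fall in neither window stay in the healthy class, so the class balance depends directly on the window lengths chosen.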
To ensure semantic accuracy and reliability, the labeling scheme was validated not only against maintenance records but also through direct input from machine operators, who report anomalies and failures at the moment of occurrence. This operational feedback reduces the risk of time misalignment or overlooked events in the CMMS logs. Recognizing that maintenance logs may still contain delays, omissions, or inconsistencies, additional safeguards were applied:
Operator confirmation: Events logged in the CMMS were cross-checked with operator-reported failures to strengthen label validity.
Consistency checks: Temporal alignment between sensor anomalies and logged/operator-confirmed failures was verified to minimize mislabeling.
Robust handling of missing values: Hybrid statistical and machine-learning imputation methods preserved signal continuity when sensor records were incomplete.
Prospective validation: Real-time annotation strategies, potentially supported by edge computing, were identified as a future enhancement to further reduce reliance on retrospective logs.
Through these measures, the final labeled dataset maintained robustness against noise, incompleteness, and human reporting errors, providing a reliable foundation for training both the Bayesian Network and the XGBoost classifier.
2.5. Mathematical Foundations of Bayesian Inference
The Bayesian Network component of the proposed framework relies on probabilistic inference to quantify uncertainty in fault prediction. Given observed sensor evidence $D$, Bayesian inference updates prior beliefs about system states into posterior distributions using Bayes’ theorem:
$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$
where $P(\theta)$ represents the prior distribution over parameters or states, $P(D \mid \theta)$ is the likelihood of observing the data under those parameters, and $P(D)$ is the marginal evidence ensuring normalization.
For predictive maintenance tasks, interest often lies in the predictive distribution of an impending failure event $y$ given new sensor inputs $x$ and the training dataset $D$:
$$P(y \mid x, D) = \int P(y \mid x, \theta)\,P(\theta \mid D)\,d\theta$$
This formulation captures both model uncertainty (via the posterior $P(\theta \mid D)$) and data uncertainty (via $P(y \mid x, \theta)$). In practice, the Bayesian Network uses its learned conditional probability tables (CPTs) to approximate this integral, producing interpretable probability estimates of machine health states under varying operational conditions. This probabilistic foundation distinguishes BN from purely discriminative approaches, as it provides transparent reasoning pathways and allows operators to understand why certain fault predictions are made [13].
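A small worked example may help make the update concrete for discrete health states. All numbers below are invented for illustration and are not taken from the study; the likelihood dictionary plays the role of one CPT column, P(vibration = "elevated" | state).

```python
def posterior(prior, likelihood):
    """Discrete Bayes update: P(state | D) is proportional to P(D | state) * P(state)."""
    joint = {s: prior[s] * likelihood[s] for s in prior}
    evidence = sum(joint.values())  # P(D), the normalizing constant
    return {s: p / evidence for s, p in joint.items()}


# Illustrative values only (hypothetical prior and CPT column).
prior = {"healthy": 0.90, "medium": 0.08, "high": 0.02}
likelihood = {"healthy": 0.05, "medium": 0.40, "high": 0.80}
post = posterior(prior, likelihood)
```

With these assumed numbers, observing elevated vibration lowers the probability of the healthy state from 0.90 to roughly 0.48, while the two degradation states together rise to roughly 0.52.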
Structure Learning of the Bayesian Network
The Bayesian Network (BN) was designed to model probabilistic dependencies among process variables, degradation states, and failure conditions. Unlike traditional approaches that require manual specification of network structure, this study employed a data-driven structure learning method using a Hill Climbing search algorithm guided by the Bayesian Information Criterion (BIC) score [13,14].
Let $G$ denote a candidate network structure and $D$ the dataset. The BIC score is given by:
$$\mathrm{BIC}(G \mid D) = \log L - \frac{k}{2}\log n$$
where $L$ is the likelihood of the data given the structure, $k$ is the number of model parameters, and $n$ is the number of data points. This criterion favors parsimonious models that fit the data well without overfitting.
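The trade-off between fit and complexity can be seen in a short sketch. It assumes the score form of the criterion, log L minus (k/2) log n, in which higher values are better; other texts use the equivalent form -2 log L + k log n, in which lower is better. The function name and the example numbers are hypothetical.

```python
import math


def bic_score(log_likelihood, num_params, num_samples):
    """BIC in score form (higher is better): log L - (k / 2) * log n."""
    return log_likelihood - 0.5 * num_params * math.log(num_samples)


# Two hypothetical candidate structures scored on the same 1000 records:
sparse = bic_score(log_likelihood=-1200.0, num_params=10, num_samples=1000)
dense = bic_score(log_likelihood=-1195.0, num_params=40, num_samples=1000)
```

Here the denser structure fits slightly better but pays a much larger complexity penalty, so a score-maximizing search would keep the sparser one.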
The learned Directed Acyclic Graph (DAG) captures both causal and statistical relationships among key input features (e.g., temperature, vibration), intermediate states (e.g., current draw), and degradation labels. The resulting model supports:
Probabilistic inference: Estimating the likelihood of future faults based on current sensor values.
Diagnostic explanation: Identifying likely root causes contributing to elevated fault probabilities.
Graphical visualization: Offering intuitive representations of cause-and-effect relationships among operational variables.
This structure serves as the foundation for interpretable, model-based diagnostics that complement the performance of black-box classifiers in the hybrid architecture.
To contextualize this choice, Table 1 compares Hill Climbing with alternative structure learning approaches commonly used for Bayesian Networks, highlighting their respective advantages and limitations.
Hill Climbing with BIC was chosen for this study due to its balance of computational efficiency and interpretability in medium-sized datasets. While simulated annealing and genetic algorithms may find more globally optimal structures, they incur significantly higher computational costs, which may not be justified in real-time industrial applications.
2.6. Algorithm Selection
The choice of BN and XGBoost was driven by their complementary strengths for predictive maintenance. BNs provide interpretable probabilistic reasoning and causal representation of variable dependencies, enabling transparency and human-in-the-loop diagnostics [3]. By contrast, XGBoost, a gradient-boosted decision tree ensemble, offers state-of-the-art classification accuracy, robustness to noisy or incomplete data, and reliable performance on structured industrial datasets [4].
By combining these algorithms, the framework addresses two critical requirements: interpretability for operator trust and high predictive performance for early fault detection. This dual alignment makes them particularly suited to Industry 4.0 maintenance contexts, where explainability and accuracy must coexist.
2.7. Hybrid Learning Framework
The core of the proposed fault prediction architecture is a hybrid learning framework that integrates two complementary machine learning models: a BN and an XGBoost classifier. Rather than serving in a primary–secondary configuration, both models operate in parallel, addressing distinct but equally important objectives: interpretability and predictive accuracy.
The BN offers a probabilistic graphical representation of relationships among sensor variables and degradation states. Its structure, learned from historical data as detailed in Section 2.5, enables causal reasoning and transparent decision support. However, BNs may underperform in scenarios characterized by complex, nonlinear interactions or noisy, high-variance sensor data, conditions commonly found in industrial environments.
To address these limitations, the architecture incorporates XGBoost, a gradient-boosted decision tree algorithm well suited for structured tabular data [15]. XGBoost is known for its ability to model subtle feature interactions and to generalize effectively in high-dimensional spaces, making it well suited to detecting early-stage faults that may not be apparent through probabilistic reasoning alone.
In this dual-model configuration:
The BN model provides interpretable inferences and graphical causal explanations that can be directly used by human operators.
The XGBoost model delivers high classification performance, identifying nuanced patterns in the sensor data to support accurate and timely fault detection.
Both models are trained on the same input dataset, consisting of statistical features (e.g., mean, maximum, standard deviation) extracted from sliding windows over the vibration, temperature, and current signals. These features are aligned with degradation labels derived from maintenance logs, ensuring consistency in the supervised learning process.
The hybrid integration offers multiple advantages:
Redundancy: Either model can independently detect faults, increasing system robustness.
Cross-validation: Discrepancies between models may indicate uncertainty, prompting further inspection.
Complementarity: XGBoost improves accuracy in complex cases, while the BN provides traceable reasoning for operator trust.
To ensure interoperability between the interpretable reasoning of the BN and the high-accuracy predictions of XGBoost, a consensus-based strategy is applied. The BN contributes causal explanations and constraint-based alerts, while XGBoost captures nonlinear patterns within the data. A decision fusion layer dynamically balances their contributions: in contexts with well-defined causal relationships, BN outputs guide the decision process, whereas in scenarios characterized by complex nonlinearities, XGBoost predictions are prioritized.
The consensus-based integration of the BN and XGBoost is illustrated in Figure 4. As shown in the diagram, both models are trained on the same feature-engineered dataset and operate in parallel during inference. Their outputs are fused in a decision layer that balances causal interpretability with predictive accuracy. In scenarios where clear causal dependencies are present, the BN output is prioritized to support explainability, while in cases with complex nonlinear patterns, the XGBoost prediction takes precedence. This strategy ensures both robustness and transparency in fault detection, reducing the risks of missed or unexplained failures.
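One possible shape for such a decision layer is sketched below. The text does not specify the fusion rule, so the rule here is an assumption: a confident BN posterior is treated as a proxy for "clear causal dependencies", and the 0.8 threshold, the function name, and the return layout are all hypothetical.

```python
def fuse(bn_probs, xgb_probs, bn_confidence=0.8):
    """Return (predicted_class, source, models_agree).

    ASSUMPTION: the BN decides when its top posterior reaches `bn_confidence`;
    otherwise the XGBoost prediction is used. Neither the rule nor the
    threshold comes from the study.
    """
    bn_class = max(bn_probs, key=bn_probs.get)
    xgb_class = max(xgb_probs, key=xgb_probs.get)
    agree = bn_class == xgb_class
    if bn_probs[bn_class] >= bn_confidence:
        return bn_class, "bn", agree
    return xgb_class, "xgboost", agree
```

The third element of the result flags disagreement between the two models, which could be surfaced to operators as a prompt for further inspection.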
This hybrid learning framework embodies the goals of Industry 4.0 predictive maintenance by combining high-performance detection with human-centered interpretability, ensuring both technical effectiveness and user acceptance.
2.8. Hyperparameter Optimization
Hyperparameter tuning for XGBoost was performed via grid search over tree depth, learning rate, and regularization parameters (see Table 2 for details). A 5-fold cross-validation scheme was used during training, with the F1-score, particularly for the “high-degradation” class, as the primary optimization metric to minimize the risk of false negatives in critical fault conditions.
While grid search provided satisfactory results, it is computationally expensive; Bayesian Optimization offers a promising alternative by modeling the objective function and efficiently exploring parameter space.
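The exhaustive nature of grid search, and hence its cost, is easy to see in a generic sketch. The grid values and the scoring function below are hypothetical stand-ins: in the study the search space is the one listed in Table 2 and the score would be the cross-validated F1 of a trained XGBoost model.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid, for illustration only.
grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "reg_lambda": [0.0, 1.0],
}


def toy_evaluate(params):
    """Stand-in for the mean 5-fold F1 on the high-degradation class."""
    return (-abs(params["max_depth"] - 7)
            - abs(params["learning_rate"] - 0.05)
            - 0.01 * params["reg_lambda"])


best_params, best_score = grid_search(grid, toy_evaluate)
```

This small grid already requires 3 × 3 × 2 = 18 evaluations, each of which would correspond to five model fits under 5-fold cross-validation.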
2.8.1. Sensitivity Analysis of Hyperparameters
To assess the robustness of the XGBoost classifier, a sensitivity analysis was performed on its key hyperparameters: maximum tree depth, learning rate, subsample ratio, and the number of estimators. Each parameter was varied individually within its predefined search space as presented in Table 3, while the others were fixed at their optimal values, and model performance was evaluated on the validation set.
The sensitivity analysis demonstrates that the chosen hyperparameter configuration (depth 7, learning rate 0.05, subsample 0.8, estimators 200) represents a balanced solution, maximizing predictive performance while avoiding overfitting or excessive computational cost. These results strengthen the reproducibility of the study by showing that model performance is not overly dependent on extreme parameter tuning but instead remains stable within a reasonable range of values.
2.8.2. Cross-Validation Strategy
During hyperparameter optimization of the XGBoost model, a 5-fold cross-validation scheme was employed on the training set. The dataset was partitioned into five equally sized folds; in each iteration, four folds were used for training while the remaining fold was used for validation. This process was repeated five times so that every fold served as validation once, and the average performance across folds was recorded.
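The fold rotation described above can be sketched as an index generator. Contiguous, unshuffled folds are an assumption here, since the text does not say whether records were shuffled before partitioning; the function name is hypothetical.

```python
def kfold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) pairs; each fold serves as validation once."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample when the
        # dataset size is not divisible by the number of folds.
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```

Averaging a metric over the five `(train, val)` pairs gives the cross-validated estimate used to compare hyperparameter settings.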
The rationale for selecting five folds was to balance computational efficiency with robustness of evaluation. While higher fold counts (e.g., 10-fold CV) may provide slightly more stable estimates, they incur substantially greater training costs. Conversely, fewer folds (e.g., 3-fold CV) reduce computation time but risk higher variance in performance estimates. In preliminary tests, 5-fold cross-validation provided stable results while remaining computationally feasible given the scale of the dataset (~27,000 sensor records).
Importantly, cross-validation was used only during model training and hyperparameter selection. The final test set, corresponding to the last month of machine operation, remained completely unseen during both training and validation, ensuring that reported results reflect true generalization performance under realistic predictive maintenance scenarios.
2.9. Dataset Partitioning and Evaluation Protocol
To ensure that model evaluation reflects realistic predictive maintenance scenarios, the dataset was partitioned chronologically rather than randomly. This time-aware strategy preserves the natural evolution of machine states and prevents information leakage from future to past data, a critical consideration in real-world industrial settings.
The dataset comprised approximately three months of continuous operation logs, totaling ~26,600 sensor records (corresponding to ~750–900 machine-hours). Following this temporal split, three time frames (TF) were defined, as summarized in Table 4: TF1 (~9200 records, Weeks 1–4) was used for training, TF2 (~8400 records, Weeks 5–8) for validation (hyperparameter tuning, feature selection, and threshold calibration), and TF3 (~9000 records, Weeks 9–12) for final testing.
The dataset was divided into three segments:
Training Set: Data collected during the first month of machine operation, used to train both the BN and the XGBoost models.
Validation Set: Data from the second month, used for hyperparameter tuning, feature selection, and classification threshold calibration.
Test Set: Data from the third month, representing previously unseen operational behavior for final performance evaluation.
This temporal partitioning mimics deployment conditions in which models must generalize to future data based solely on past observations. Such an approach is especially suitable in industrial applications where degradation patterns may shift over time due to operational cycles, ambient conditions, or maintenance interventions.
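A chronological split of this kind might be implemented as below. The record layout (timestamp, payload) and the function name are assumptions; the four-week boundaries follow the Weeks 1–4, 5–8, and 9–12 partition described above.

```python
from datetime import datetime, timedelta


def chronological_split(records, start, train_weeks=4, val_weeks=4):
    """Split (timestamp, payload) records into train/val/test by elapsed time.

    No shuffling is performed, so every training record precedes every
    validation record, which in turn precedes every test record.
    """
    train_end = start + timedelta(weeks=train_weeks)
    val_end = train_end + timedelta(weeks=val_weeks)
    train, val, test = [], [], []
    for ts, payload in sorted(records, key=lambda r: r[0]):
        if ts < train_end:
            train.append((ts, payload))
        elif ts < val_end:
            val.append((ts, payload))
        else:
            test.append((ts, payload))
    return train, val, test
```

Because the boundaries are defined by time rather than by record count, the three segments can differ in size when the machine's operating hours vary from month to month.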
3. Results and Analysis
This section presents the evaluation results of the hybrid learning framework introduced in Section 2. We report and compare the performance metrics of the BN and XGBoost models, examine the interpretability of the learned Bayesian structure, and analyze how both models complement each other in supporting reliable and explainable fault prediction.
3.1. Classification Performance
The predictive performance of both the BN and the XGBoost classifier was evaluated using the test set comprising data from the third month of machine operation. As described in Section 2.9, the models were trained on data from the first month and validated with data from the second month to ensure temporal consistency.
Evaluation Metrics
To assess model validity and performance, the following standard classification metrics were used:
Precision (Positive Predictive Value): The proportion of predicted fault cases that were correct.
Recall (Sensitivity): The proportion of actual fault cases correctly identified.
F1-Score: The harmonic mean of precision and recall, providing a balanced measure of accuracy, especially under class imbalance.
Particular emphasis was placed on the “high-degradation” class, as it represents imminent failures and holds the greatest operational risk. In such class-imbalanced scenarios, the F1-score offers a robust and informative metric for evaluating model effectiveness.
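For a multiclass problem, these metrics can be computed per class in a one-vs-rest fashion, as in the following sketch. The function name and the convention of returning 0.0 when a denominator is zero are assumptions; the example labels are invented.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented labels, for illustration only.
y_true = ["high", "high", "healthy", "medium", "high"]
y_pred = ["high", "healthy", "high", "medium", "high"]
precision, recall, f1 = per_class_metrics(y_true, y_pred, "high")
```

In this toy case the "high" class has two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.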
In addition, the Root Mean Square Error (RMSE) [20] was used to measure the extent to which the predicted failure value deviates from the actual value; the smaller the RMSE, the better the prediction performance. The Score index, proposed for the PHM08 data competition, was likewise considered as a measure of prediction performance. These indexes are used together to evaluate the model’s prediction performance. The RMSE is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$
where $N$ represents the total number of samples, and $\hat{y}_i$ and $y_i$ represent the predicted failure value and the actual failure value, respectively.
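A direct implementation of the RMSE, the square root of the mean squared difference between predicted and actual values, is sketched below. The function name and the input check are illustrative additions.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones.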
Table 5 summarizes the precision, recall, and F1-score for each model across the three degradation classes, with particular emphasis on the “high-degradation” category, which corresponds to imminent machine failure.
As expected, the XGBoost model outperformed the BN across all metrics, particularly in terms of precision and recall. This result is consistent with XGBoost’s strength in modeling complex, nonlinear interactions and subtle patterns within high-dimensional sensor data [15].
Despite the difference in raw performance, the BN offered additional benefits in terms of model transparency and explainability, which are analyzed in the following subsections.
3.2. Structure and Interpretability of the Bayesian Networks
A key advantage of Bayesian Networks lies in their ability to represent probabilistic dependencies through a Directed Acyclic Graph (DAG), offering a transparent and interpretable framework for decision-making. In this study, the network structure was learned directly from the labeled dataset using a Hill Climbing algorithm with the Bayesian Information Criterion (BIC) score as the optimization objective [13,14], as illustrated in Figure 5.
The learned structure reveals several operationally meaningful relationships, including:
A strong directed link from motor temperature to current draw, suggesting thermal effects on electrical load.
Conditional dependencies between current draw and vibration amplitude, potentially indicating mechanical friction or misalignment.
Paths connecting operational variables—such as shift duration and machine workload—with increased degradation probabilities.
These structural insights align with domain expertise provided by maintenance personnel and validate the model’s ability to capture causally relevant relationships. Beyond visualization, the BN supports dynamic, interpretable diagnostics through probabilistic inference.
Operators can explore posterior probabilities of degradation given specific sensor observations, enabling them to understand the likelihood of future faults and the influence of each variable. This capability enhances user trust in the system and supports explainable decision-making in industrial maintenance workflows.
3.3. Inference and Decision Support Capabilities
Beyond classification, the Bayesian Network supports real-time probabilistic inference, enabling operators to estimate the likelihood of future faults based on observed sensor conditions. For example, by inputting specific values for temperature, current, and vibration, the system can compute posterior probabilities for each degradation class, facilitating early and informed maintenance decisions. For instance, in the test set, inference queries such as:
This capability enables:
Proactive maintenance: Triggering alerts before failures occur, based on increasing risk levels inferred from current sensor patterns.
Root cause analysis: Identifying which variables most strongly contribute to fault probability in a given context.
Operator guidance: Offering intuitive explanations for elevated risk, enhancing user understanding and acceptance.
Because the BN encodes causal relationships among variables, it allows users to explore how changes in one parameter affect others, fostering an interactive and educational environment. Maintenance teams can visualize degradation pathways and use these insights to prioritize inspections or adjust operational parameters accordingly.
Moreover, the graphical structure of the BN serves as a human-readable interface for understanding complex interactions that are often hidden in black-box models. This interpretability is particularly valuable in industrial environments where traceability and operator accountability are essential.
3.4. Model Complementarity and Hybrid Operation
While the XGBoost classifier demonstrates superior performance in identifying individual failure modes with high precision, it operates as a black-box model and does not offer interpretable reasoning behind its outputs. By contrast, the Probabilistic Graphical Model (PGM), implemented as a Bayesian Network, offers transparent diagnostic inference by revealing how each sensing element contributes to fault likelihood through its conditional probability tables (CPTs).
This distinction highlights a natural synergy between the two models. XGBoost acts as a highly accurate multiclass classifier for early detection of machine component failures, whereas the BN provides human-understandable explanations for why a fault may occur, based on sensor evidence.
For instance, as shown in the diagnostic inference output (Figure 6), the PGM returns probabilistic reasoning for each failure class based on observed variables such as age and mean vibration. The dominant posterior probability is assigned to the “no failure” state (0.9854), while residual probabilities across “Band”, “Encoder”, “MSC”, and “Bearing” failures are extremely low, consistent with a healthy component state. This dual-model configuration enables several operational advantages:
High precision and interpretability: XGBoost serves as a robust detector, while the BN contextualizes alerts by identifying which sensors influence which faults and their likelihood.
Explanatory depth: The BN model allows querying arbitrary variables (e.g., what is the failure probability given this temperature, vibration signal and current load level), enhancing transparency.
Redundant fault detection: Parallel operation improves robustness by ensuring that critical faults are not missed, even if one model underperforms under certain conditions.
Figure 6 shows how the BN performs fault inference from sensor evidence in a transparent manner. This hybrid approach supports both high-performance automation and human-in-the-loop trust—core requirements for intelligent fault diagnosis in Industry 4.0.
In production settings, this dual-model strategy minimizes the risks of both false positives and false negatives. When both models agree, the prediction gains additional credibility. When they differ, the disagreement itself becomes a diagnostic signal, offering valuable information that can trigger further investigation or override automated actions.
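The agreement and disagreement logic described above can be sketched as a small triage function. The function name `cross_verify`, the decision labels, and the 0.8 confidence threshold are assumptions made for illustration; the study does not specify how the two outputs are combined in software.

```python
def cross_verify(xgb_label, xgb_confidence, bn_posterior, min_confidence=0.8):
    """Combine the classifier output and the BN posterior into a triage decision.

    xgb_label: class predicted by the classifier.
    xgb_confidence: probability the classifier assigns to that class.
    bn_posterior: dict mapping each class to its BN posterior probability.
    Returns a (decision, label) pair. The label is None when the models
    disagree, so that no automated action is taken on a contested prediction.
    """
    bn_label = max(bn_posterior, key=bn_posterior.get)
    if xgb_label != bn_label:
        # Disagreement is treated as a diagnostic signal for human review.
        return "review", None
    if min(xgb_confidence, bn_posterior[bn_label]) >= min_confidence:
        return "accept", xgb_label
    return "accept_low_confidence", xgb_label


# The 0.9854 posterior mirrors the healthy-state example in Figure 6;
# the remaining numbers are hypothetical.
healthy = cross_verify("none", 0.97, {"none": 0.9854, "bearing": 0.0146})
contested = cross_verify("bearing", 0.90, {"none": 0.70, "bearing": 0.30})
```

Returning no label on disagreement is one possible design choice; a deployment could instead fall back on the more conservative of the two predictions.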
This layered architecture aligns with best practices in resilient system design, ensuring both predictive performance and human-centered explainability—two essential pillars for the successful adoption of AI in Industry 4.0 environments.
3.5. Benchmarking with State-of-the-Art Models
To validate the effectiveness of the proposed hybrid BN–XGBoost framework, we benchmarked it against both traditional machine learning models (Logistic Regression, Support Vector Machines, Random Forest) and advanced deep learning architectures (CNN, LSTM variants, GRU-based hybrids, and attention-driven networks).
All baseline models were trained and tested on the same labeled dataset described in Section 2, reflecting a more realistic deployment scenario where models must generalize to unseen operational periods. Hyperparameters for the baseline models were tuned using grid search to avoid under- or over-estimation of their performance.
The results in Table 6 show that the hybrid BN + XGBoost approach consistently outperformed the compared models in terms of F1-score and AUC-ROC, particularly for the high-degradation class, which is operationally the most critical. While the ABGRU and Multi-attention + TCN provided competitive accuracy, they lacked interpretability, limiting their adoption in high-stakes industrial environments.
The results summarized in Table 7 confirm that the proposed hybrid framework achieves superior performance. Specifically, it obtained the lowest RMSE in three of the four evaluation time frames and the lowest overall average RMSE (14.47), confirming its robustness under varying operating conditions. Similarly, the Score metric highlights its superior ability to anticipate failures, with the lowest average value among all models.
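For readers unfamiliar with these two measures, the sketch below shows how they are typically computed. This section does not define the Score metric, so the code assumes it refers to the asymmetric exponential scoring function commonly used in prognostics benchmarks, which penalizes late predictions more heavily than early ones. The function names and the scale constants 13 and 10 come from that convention and are assumptions with respect to this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def asymmetric_score(y_true, y_pred, early_scale=13.0, late_scale=10.0):
    """Asymmetric exponential score (assumed definition; lower is better).

    d = predicted - observed. A positive d means the failure was predicted
    later than it occurred, which is penalized more strongly than an
    early prediction of the same magnitude.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t
        if d < 0:
            total += math.exp(-d / early_scale) - 1.0
        else:
            total += math.exp(d / late_scale) - 1.0
    return total


# Hypothetical values: one prediction 10 units late, one 10 units early.
late_penalty = asymmetric_score([100.0], [110.0])
early_penalty = asymmetric_score([100.0], [90.0])
```

Under this assumed definition, a model with a low Score tends to err on the early side, which matches the interpretation of the metric as a measure of the ability to anticipate failures.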
The BN alone provided explainable results but slightly lower accuracy. The hybrid framework thus offered the best trade-off between predictive power and interpretability, demonstrating its suitability for predictive maintenance in textile manufacturing.
3.6. Comparison with the State-of-the-Art Methods
To complement the quantitative benchmarks presented in Table 6 and Table 7, a set of visual analyses was performed to provide deeper insight into classification behavior and model performance across component fault classes (Belt, Encoder, MSC, Bearing, None).
Figure 7 illustrates the confusion matrices for the proposed Hybrid BN + XGBoost framework and a CNN baseline. The hybrid model demonstrates superior classification across all fault categories, achieving high recall for Belt and Bearing failures while maintaining a very low false positive rate for the None class. Some Encoder instances are misclassified as None, reflecting the smaller dataset size for Encoder faults, yet overall classification performance remains strong. In comparison, the CNN baseline shows more frequent misclassifications, particularly confusing Belt and Bearing faults with None, thus reducing its reliability for early fault detection.
Figure 8 compares the ROC curves of the Hybrid BN + XGBoost framework with those of state-of-the-art models, including CNN [21], VAE + LSTM [22], Multi-attention + TCN [24], and ABGRU [25]. The hybrid framework achieves an AUC of 0.90, outperforming CNN and VAE + LSTM and performing on par with or slightly better than Multi-attention + TCN and ABGRU. This indicates that the proposed framework discriminates well between failure and non-failure states.
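As a reminder of what the AUC measures, the following sketch computes it as the probability that a randomly chosen failure sample receives a higher score than a randomly chosen non-failure sample. The function name `roc_auc` and the four-sample data are illustrative assumptions; the study's own evaluation code is not shown in this section.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for failure, 0 for non-failure.
    scores: predicted failure scores, higher meaning more likely to fail.
    Ties between a positive and a negative sample count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: three of the four positive/negative pairs are ordered correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations rather than large test sets.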
Figure 9 presents Accuracy, Precision, Recall, and F1-Score comparisons across the same set of models. The Hybrid BN + XGBoost consistently provides the best overall balance, achieving Accuracy = 0.89, Precision = 0.87, Recall = 0.86, and F1-Score = 0.85. While advanced deep learning models such as Multi-attention + TCN and ABGRU achieve competitive performance, they lack interpretability. The hybrid framework uniquely combines competitive performance with high interpretability enabled by its BN, offering clear causal reasoning and transparent diagnostics for operators.
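The per-class quantities behind such comparisons can be derived directly from a confusion matrix, as in the sketch below. The two-class counts are hypothetical and chosen only to keep the arithmetic easy to follow. The averaging scheme used for the figures reported above is not stated in this section, so macro averaging is shown here as an assumption.

```python
def per_class_metrics(confusion, classes):
    """Compute precision, recall and F1 for each class.

    confusion[i][j] is the number of samples whose true class is classes[i]
    and whose predicted class is classes[j].
    """
    metrics = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_average(metrics, key):
    """Unweighted mean of one metric over all classes."""
    return sum(m[key] for m in metrics.values()) / len(metrics)


# Hypothetical counts, not taken from Figure 7.
toy_classes = ["fault", "none"]
toy_confusion = [
    [5, 5],   # true "fault": 5 detected, 5 missed
    [0, 10],  # true "none": all 10 classified correctly
]
toy_metrics = per_class_metrics(toy_confusion, toy_classes)
macro_f1 = macro_average(toy_metrics, "f1")
```

In this toy case the macro-averaged F1 is lower than both the macro-averaged precision and the macro-averaged recall. A single class's F1 always lies between its precision and recall, but an average of per-class F1 values need not lie between the averaged precision and recall.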
In summary, the visual analyses presented in Figure 7, Figure 8 and Figure 9 confirm that the proposed Hybrid BN + XGBoost framework not only surpasses CNN and VAE + LSTM in discriminative capability and classification accuracy, but also matches or outperforms the most competitive architectures (Multi-attention + TCN and ABGRU). Together with its interpretability advantage, this dual strength underscores its value as a practical and trustworthy solution for predictive maintenance in Industry 4.0 environments.
3.7. Prognostic Results Analysis
Figure 10 compares the predicted and actual failure values for each component. As shown in the figure, the BN predictions generally follow the true degradation patterns across all monitored subsystems. For the Bearing (QM.TL.01) in Figure 10a, the predicted failure trends remain highly consistent with the observed values throughout the year, indicating that the model can effectively capture degradation in rotational elements, which are often among the earliest to fail in textile machinery.
In the case of the Belt (QM.TL.02) in Figure 10b, the predicted results exhibit greater variability, with occasional underestimation of actual failures, particularly during mid-year operation. This behavior suggests that while the model successfully anticipates belt degradation, transient load fluctuations and environmental factors introduce noise that reduces prediction stability.
For the Control Module (QM.SL.10) in Figure 10c, many predicted results align closely with the observed failures. In some periods, the model produces slightly conservative estimates, with predictions dipping below the true values. This conservatism is advantageous in practice, as it provides earlier warnings of potential failures in electronic subsystems, thereby mitigating operational risks.
Overall, the results demonstrate that the BN model is capable of accurately predicting failures across both mechanical (bearing, belt) and electronic (control module) components. While some underestimation is observed, particularly for the belt subsystem, the model consistently reduces the risks associated with late or missed failure detections, thereby supporting proactive maintenance scheduling and minimizing unplanned downtime.
3.8. Deployment Readiness and Operational Benefits
From a deployment perspective, the proposed hybrid architecture offers tangible advantages in terms of scalability, usability, and integration with existing industrial workflows. The BN graphical interface provides maintenance teams with interpretable diagnostics that facilitate rapid decision-making, while the XGBoost model delivers high-speed fault classification with low latency.
Table 8 summarizes the average inference latencies per sample obtained on an NVIDIA Jetson Nano edge device. All evaluated models achieved latencies below 100 ms, ensuring feasibility for real-time predictive maintenance in industrial environments.
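A per-sample latency of this kind can be measured with a simple timing loop, sketched below. The helper name `mean_latency_ms`, the warm-up count, and the stand-in predictor are assumptions for illustration; the measurement procedure actually used on the Jetson Nano is not described in this section.

```python
import statistics
import time


def mean_latency_ms(predict, samples, warmup=5):
    """Average wall-clock latency of `predict` per sample, in milliseconds.

    A few warm-up calls are made first so that one-off costs such as
    lazy initialization do not distort the average.
    """
    for sample in samples[:warmup]:
        predict(sample)
    timings = []
    for sample in samples:
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


def dummy_predict(features):
    """Hypothetical stand-in for a trained model's single-sample prediction."""
    return sum(features) > 1.0


latency = mean_latency_ms(dummy_predict, [[0.2, 0.5, 0.9]] * 50)
```

In practice, `dummy_predict` would be replaced by the deployed model's prediction call, and the same loop would be run on the target edge device rather than on a development machine.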
The entire system was designed with edge deployment in mind. The sensor acquisition platform is built using cost-effective, industry-compatible components, and both models are computationally efficient. The BN, represented as a compact Directed Acyclic Graph, requires minimal processing power, making it suitable for integration into edge-computing environments. The XGBoost model, once trained, can be deployed on embedded devices to generate real-time predictions without requiring cloud-based inference.
As illustrated in the deployment interface in Figure 11, operators have access to real-time visualizations of key performance indicators such as motor temperature, vibration intensity, current load, and machine availability. Events such as mechanical stoppages, thread misfeeds, and abnormal thermal behavior are automatically detected, labeled, and associated with specific operators and timestamps. This feature enhances traceability and supports continuous process improvement.
Furthermore, the system’s modular architecture allows for straightforward adaptation to different machines or production lines. This makes it particularly suitable for small- and medium-sized enterprises (SMEs) seeking accessible predictive maintenance solutions aligned with Industry 4.0 standards.
4. Discussion
The results of this study demonstrate that a hybrid predictive maintenance architecture—combining a structure-learned BN with an XGBoost classifier—can effectively balance predictive accuracy and interpretability in real-world industrial environments. This section discusses the implications of the findings, the complementary strengths of the two models, and the system’s readiness for practical deployment under Industry 4.0 paradigms.
4.1. Performance vs. Interpretability Trade-Off
A central insight from this research is the well-known trade-off between classification performance and model transparency [10, 11]. The XGBoost classifier achieved superior performance across all evaluated metrics, particularly in detecting high-degradation cases, due to its ability to capture nonlinear relationships and complex feature interactions [15].
However, this performance comes at the cost of explainability. As a black-box model, XGBoost does not provide insight into the reasoning behind its predictions, which may limit its adoption in operational settings where accountability and interpretability are essential. By contrast, the BN offers a transparent causal framework that supports visual reasoning and probabilistic inference, making it more suitable for human-in-the-loop validation, root cause analysis, and knowledge transfer.
4.2. Advantages of Data-Driven Structure Learning
A key contribution of this work lies in the automatic learning of the Bayesian Network structure from operational data using a Hill Climbing algorithm guided by the BIC score. This approach eliminates the need for manual specification of dependencies and enables the discovery of latent causal relationships that may not be obvious to human experts.
This method provides several benefits:
Adaptability to changing machine behavior or production contexts.
Reduced reliance on expert knowledge for model design.
Improved alignment between data-driven insights and operator intuition.
Validation of the learned structure by maintenance engineers confirmed its practical relevance and increased operator trust in the system.
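To make the scoring step of this structure search concrete, the sketch below computes a BIC score for a discrete network and compares two candidate structures, which is the comparison a hill-climbing search repeats for each proposed edge change. It is a simplified illustration with hypothetical variables and data, not the implementation used in the study, and it omits the full search loop and the acyclicity check.

```python
import math
from collections import Counter


def bic_score(data, structure):
    """BIC of a discrete Bayesian network (higher is better).

    data: list of dicts mapping each variable to its observed state.
    structure: dict mapping each variable to a tuple of its parents.
    The score is the maximized log-likelihood minus 0.5 * log(N) per
    free parameter.
    """
    n = len(data)
    cardinality = {var: len({row[var] for row in data}) for var in structure}
    score = 0.0
    for node, parents in structure.items():
        joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
        log_likelihood = sum(
            count * math.log(count / parent_counts[parent_state])
            for (parent_state, _), count in joint.items()
        )
        parent_configs = math.prod(cardinality[p] for p in parents)
        free_params = (cardinality[node] - 1) * parent_configs
        score += log_likelihood - 0.5 * math.log(n) * free_params
    return score


def make_rows(counts):
    """Expand {(vibration, fault): count} into a list of observation dicts."""
    return [
        {"vibration": vibration, "fault": fault}
        for (vibration, fault), count in counts.items()
        for _ in range(count)
    ]


# Hypothetical data in which faults occur mostly under high vibration.
dependent = make_rows({("high", "yes"): 18, ("high", "no"): 2,
                       ("low", "yes"): 2, ("low", "no"): 18})

no_edge = {"vibration": (), "fault": ()}
with_edge = {"vibration": (), "fault": ("vibration",)}

# One greedy step: keep whichever candidate structure scores higher.
best = max([no_edge, with_edge], key=lambda s: bic_score(dependent, s))
```

The penalty term is what keeps the search from adding edges that the data do not support: when the two variables are independent in the data, the extra parameters of the edge lower the score and the empty structure is retained.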
4.3. Synergistic Use of XGBoost and BN
Rather than positioning XGBoost and BN as alternatives, this study presents them as complementary components within a unified framework. The models address different limitations: XGBoost excels in predictive accuracy, while BN provides interpretability and reasoning support. Their integration enables multiple strategies:
Dual-alerting: Simultaneous predictions increase reliability and user confidence.
Cross-verification: Discrepancies highlight ambiguous cases for further review.
Tiered decision-making: High-risk events identified by XGBoost can be evaluated in depth using Bayesian inference.
This synergy promotes operational efficiency and organizational trust by combining robust automation with transparent diagnostics.
4.4. Real-World Integration Potential
The proposed architecture was designed for practical deployment. The sensor infrastructure and machine-learning models are lightweight and affordable, suitable for SMEs and scalable to multiple machines or production lines. The integration with human–machine interfaces (HMIs) enhances real-time operator awareness and supports traceable, data-informed maintenance planning.
Such a system not only improves fault detection but also fosters a culture of data literacy and proactive decision-making among plant personnel, both key drivers for successful Industry 4.0 transformation. Comparable industrial IoT platforms, such as Siemens MindSphere, have demonstrated significant reductions in downtime and maintenance costs by integrating sensor data with cloud-based analytics [26]. Positioning our hybrid framework alongside such benchmarks underscores its practical scalability and industrial relevance, particularly for SMEs seeking accessible predictive maintenance solutions.
4.5. Limitations and Future Work
Despite its promising results, the present study has several limitations:
Although rolling-window features provided an indirect way to incorporate short-term temporal dynamics, the current BN remains fundamentally static and lacks explicit memory of past states. Dynamic Bayesian Networks (DBNs) offer an alternative: they extend conventional BNs with temporal nodes that capture state transitions between consecutive time slices. Unlike rolling-window features, DBNs can model longer-term degradation trajectories and state persistence, such as the increasing likelihood of failure after multiple consecutive abnormal states, because they propagate probabilistic dependencies across time.
Fault labeling depends on historical logs, which may be imprecise in time alignment. Integrating real-time failure signatures or unsupervised anomaly detection may improve label quality.
Model generalization has been validated on a single machine type. Broader testing across diverse equipment and operational conditions is required to ensure scalability.
These limitations highlight the need for approaches that can capture temporal dependencies, reduce reliance on retrospective logs, and extend validation across domains. Similar research directions can be observed in pipeline deformation prediction, where multi-sensor monitoring and hybrid deep learning models have been used to capture spatiotemporal correlations and improve forecasting accuracy [27]. Motivated by these advances, future research will extend our hybrid framework toward dynamic probabilistic models and multi-source temporal learning, validate it across multiple machine types, incorporate real-time feedback loops, and evaluate the economic impact of model-driven approaches.
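The state-persistence behavior attributed to DBNs in the first limitation can be illustrated with a minimal two-slice model. The sketch assumes a single hidden degradation state with two values and one binary observation; all transition and emission probabilities, as well as the function name `filter_degradation`, are hypothetical and serve only to show how belief accumulates over consecutive abnormal readings.

```python
# Hypothetical two-slice model: hidden state in {healthy, degraded},
# observation in {normal, abnormal}. All probabilities are placeholders.
TRANSITION = {  # P(state at t | state at t-1)
    "healthy": {"healthy": 0.95, "degraded": 0.05},
    "degraded": {"healthy": 0.02, "degraded": 0.98},
}
EMISSION = {  # P(observation | state)
    "healthy": {"normal": 0.90, "abnormal": 0.10},
    "degraded": {"normal": 0.30, "abnormal": 0.70},
}


def filter_degradation(observations, initial):
    """Forward filtering: P(degraded at t | observations up to t) for each t."""
    belief = dict(initial)
    history = []
    for observation in observations:
        # Prediction step: propagate the belief through the transition model.
        predicted = {
            state: sum(belief[prev] * TRANSITION[prev][state] for prev in belief)
            for state in TRANSITION
        }
        # Update step: weight by the likelihood of the new observation.
        unnormalized = {
            state: predicted[state] * EMISSION[state][observation]
            for state in predicted
        }
        total = sum(unnormalized.values())
        belief = {state: value / total for state, value in unnormalized.items()}
        history.append(belief["degraded"])
    return history


# Four consecutive abnormal readings starting from a mostly healthy belief.
trajectory = filter_degradation(
    ["abnormal"] * 4, {"healthy": 0.99, "degraded": 0.01}
)
```

With these placeholder values, the belief in the degraded state rises with each additional abnormal reading, whereas a static BN queried with the same single observation would return the same posterior every time.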
5. Conclusions
This study introduces a hybrid predictive maintenance framework tailored for industrial textile machinery, integrating a structure-learned BN with an XGBoost classifier. The architecture combines real-time sensor monitoring, maintenance-log-driven labeling, and a dual-model learning approach that prioritizes both predictive performance and model interpretability—two critical, and often conflicting, requirements in industrial AI systems.
The BN, automatically learned from historical data, captures interpretable causal relationships among operational variables and degradation states, offering explainable diagnostic support. In parallel, the XGBoost model delivers high discriminative power for early fault detection in complex and noisy sensor environments. When integrated, the models function as a complementary system:
Providing high-confidence predictions for degradation conditions.
Enabling probabilistic inference and root cause analysis through causal modeling.
Supporting operator-friendly diagnostics compatible with existing maintenance workflows.
The case study conducted in a mattress manufacturing facility confirmed the framework’s viability. XGBoost achieved an F1-score of 0.943 on the high-degradation class, while the BN offered actionable, interpretable outputs aligned with domain expertise. These results underscore the practical value of hybrid AI architectures for predictive maintenance in Industry 4.0.
Future Work
To further advance this line of research, future work will explore the following directions:
Temporal modeling: Implementing Dynamic Bayesian Networks to capture time-evolving degradation processes and support temporal inference over sequential sensor observations.
Edge deployment: Optimizing the full hybrid framework for execution in resource-constrained embedded environments, reducing reliance on centralized infrastructure and enabling real-time decision-making at the edge.
Multi-machine generalization: Validating the architecture across different machine types, production contexts, and fault profiles to ensure scalability and adaptability in heterogeneous industrial settings.
Cost–benefit evaluation: Quantifying the operational and financial impact of model-driven maintenance actions, including reductions in unplanned downtime, repair costs, and overall equipment lifecycle expenses.
RUL estimation and digital twins: Extending the hybrid model to map degradation classes into estimated Remaining Useful Life (RUL) intervals using statistical regression and survival modeling. Additionally, integrating the Bayesian inference engine with digital twin environments will enable lifecycle simulation, scenario-based risk analysis, and continuous calibration of fault models based on virtual–physical feedback loops.
Overall, the proposed system offers a scalable, interpretable, and high-performance solution that aligns with the goals of Industry 4.0—intelligent automation, data-driven decision-making, and trustworthy AI integration in complex operational environments.