1. Introduction
Predictive maintenance has become a critical strategy in the management of industrial machinery, driven by the need to minimize downtime, reduce operational costs, and enhance equipment reliability. The motivation for this study arises from the increasing importance of maintaining uninterrupted industrial operations and the substantial economic impact of unexpected machinery failures.
Traditionally, maintenance strategies have relied on scheduled inspections and reactive repairs [1,2]. These approaches often result in inefficient resource utilization and unplanned outages. In contrast, data-driven approaches use sensor data and advanced analytics to anticipate faults before they occur, enabling timely interventions and optimized maintenance schedules [3,4].
Recent advances in sensing and analytics have positioned predictive maintenance, which combines operational data with machine learning (ML) and deep learning (DL) [5,6], as a leading paradigm for early detection of deviations and failure prevention. Additionally, digital twins (DTs), which serve as high-fidelity virtual replicas of physical assets, provide safe and realistic testbeds for developing and evaluating predictive models.
Despite these advancements, several key challenges persist [7,8,9]. These include trade-offs between supervised and unsupervised schemes, static versus dynamic alert thresholds, balancing black-box accuracy with model explainability [10], and ensuring the external validity of models trained in simulated environments.
To address these gaps, this study proposes a practical two-level system that couples anomaly detection using an LSTM Autoencoder (AE) [11,12,13] based on reconstruction error with fault prediction via an MLP classifier on operational features. The system is evaluated on multivariate sensor data collected over five years in a digital twin scenario, demonstrating robust anomaly separation and strong fault-prediction performance. Deployment implications, including threshold setting and model calibration for production integration, are also discussed.
Although hybrid predictive maintenance frameworks combining unsupervised anomaly detection with supervised fault classification have been discussed in prior research, this study contributes by focusing on system-level design, deployment-oriented validation, and practical integration within a digital twin industrial setting. Unlike approaches that rely on fully multivariate deep models or require labeled anomaly data, our method uses a univariate LSTM autoencoder applied to a high-sensitivity vibration signal for early anomaly detection, followed by a compact MLP classifier operating on a carefully selected set of operational and environmental features for fault prediction. This design aims to balance detection sensitivity, interpretability, and computational efficiency, addressing real-world constraints such as limited labeled data, sensor diversity, and the need for real-time deployment.
The novelty of this work lies in delivering a lightweight, interpretable, and deployable predictive maintenance framework validated on a realistic digital-twin dataset simulating five years of industrial operation. Our approach explicitly addresses industrial constraints such as computational efficiency, SCADA/cloud integration, and threshold calibration, while maintaining state-of-the-art accuracy.
The objective of this study is to design, implement, and evaluate a two-level predictive maintenance framework for industrial machinery using machine learning models. Specifically, the work focuses on developing an LSTM autoencoder for unsupervised anomaly detection and a multilayer perceptron (MLP) for supervised fault prediction. The models are trained and validated on multivariate time-series data generated in a digital twin environment that realistically simulates industrial operations over several years. By combining advanced simulation with practical deployment considerations, this research seeks to bridge the gap between theoretical innovation and real-world applicability, offering a scalable and effective approach to predictive maintenance that can be readily adopted in industrial settings.
This work addresses the identified gaps by introducing a practical two-level predictive maintenance framework that integrates unsupervised anomaly detection with supervised fault prediction in a way that is both lightweight and interpretable. Unlike prior studies that focus primarily on algorithmic novelty, our approach emphasizes deployability and operational transparency, making it suitable for near real-time industrial environments. The framework is rigorously validated on a five-year digital-twin dataset that captures realistic operational variability, ensuring external validity and robustness. In addition, the study provides detailed guidance on threshold calibration, fault labeling strategies, and SCADA/cloud integration, bridging the gap between theoretical research and practical implementation. Collectively, these contributions position the proposed method as a scalable and industry-ready solution for predictive maintenance.
This paper addresses the challenge of predictive maintenance in industrial machinery by leveraging advanced machine learning techniques to improve early fault detection and operational reliability. The proposed approach integrates both unsupervised and supervised learning models, aiming to deliver a practical and scalable solution for real-world industrial environments. Building on the identified research gaps, the contributions of this work are explicitly aligned with those gaps and can be summarized as follows:
Introduces an unsupervised LSTM autoencoder trained exclusively on nominal operating conditions for early anomaly detection, addressing the challenge of limited labeled anomaly data.
Combines a univariate sequence-based anomaly detector with a compact MLP classifier operating on a small, physically meaningful feature set, resolving the trade-off between model complexity and practical deployability.
Validates the framework on long-horizon, timestamped data from a realistic digital twin environment, ensuring external validity and industrial relevance while considering constraints such as interpretability and near real-time operation.
In summary, this work proposes a novel hybrid predictive maintenance framework that combines LSTM-based anomaly detection with MLP-based fault prediction, validates the approach on realistic multivariate time-series data from a digital twin, and designs the system to be lightweight and interpretable for practical deployment in industrial settings.
This paper is organized as follows:
Section 2 presents the related work, providing an overview of traditional and modern approaches to anomaly detection and fault prediction in industrial systems.
Section 3 describes the proposed methodology, including the design of the LSTM autoencoder and MLP classifier, as well as data preprocessing steps.
Section 4 outlines the experimental setup and the digital twin environment used for data generation.
Section 5 presents the main results and discusses the performance of the proposed framework.
Section 6 offers a broader discussion of the findings, practical implications, and limitations, and suggests directions for future research.
Section 7 concludes the paper with a summary of key contributions.
2. Related Work
In recent years, fault detection and predictive maintenance for synchronous motors, particularly permanent magnet synchronous motors (PMSMs), have advanced significantly. Gherghina et al. [14] provide a comprehensive review of the field, noting the increasing adoption of data-driven and hybrid approaches, the integration of sensor fusion, and the emergence of explainable AI. However, challenges such as data imbalance, non-stationary conditions, and limited real-world generalization persist.
Anomaly detection and fault prediction in industrial systems have evolved from traditional rule-based and statistical approaches to advanced data-driven methods. Classical techniques, such as observer-based models and residual analysis, are effective in well-characterized environments [15,16,17], but often struggle with complex, high-dimensional data. In contrast, machine learning and deep learning approaches, including autoencoders, convolutional neural networks (CNNs), and recurrent neural networks (RNNs) such as LSTMs, have shown strong capabilities in learning intricate patterns from sensor data. Autoencoders are widely used for unsupervised anomaly detection based on reconstruction error, while LSTM networks excel at modeling temporal dependencies in time-series data. Hybrid and generative models (e.g., generative adversarial networks (GANs) and variational autoencoders (VAEs)) further enhance detection performance [18,19,20]. Despite these advances, challenges remain regarding model interpretability, data imbalance, and the need for large labeled datasets. Recent research emphasizes integrating signal processing, feature engineering, and adaptive frameworks to improve robustness and enable real-time predictive maintenance.
In practical terms, Vlachou et al. [21] demonstrate the effectiveness of advanced ML techniques, including GAN-augmented PU learning and reinforcement learning, for robust, non-intrusive fault prediction in elevator PMSM drives. Their work underscores the importance of multimodal sensor integration and real-time data preprocessing for scalable predictive maintenance solutions.
Ahmed et al. [22] introduced a smart anomaly detection system for industrial machines utilizing a deep autoencoder framework. Their approach uses vibration analysis as a primary tool for machine health monitoring and fault diagnosis, focusing on a public gearbox dataset collected from wind-turbine components. The proposed six-layer autoencoder model autonomously extracts salient features from high-dimensional vibration data, enabling effective detection of both known and previously unseen anomalies without the need for manual feature engineering or extensive preprocessing. Experimental results demonstrated that the framework achieved an overall accuracy of 91%, outperforming several traditional machine learning methods such as SVM, random forest, and k-nearest neighbors. This work highlights the potential of deep learning-based autoencoders for robust, scalable, and generalizable anomaly detection in industrial condition monitoring applications.
Do et al. [23] developed an LSTM-Autoencoder for vibration anomaly detection in a Vertical Carousel Storage and Retrieval System (VCSRS). Their approach combines optimal sensor placement using correlation coefficient and Fisher information matrix methods with deep learning for robust monitoring. The LSTM-Autoencoder is trained to reconstruct normal vibration patterns; anomalies are detected when the reconstruction error exceeds a set threshold. This method achieved an accuracy of 97.7% in distinguishing normal from faulty states, demonstrating the effectiveness of combining advanced sensor strategies with deep learning for predictive maintenance in complex industrial systems. The study also highlights the importance of data preprocessing, feature extraction, and addressing class imbalance to ensure reliable anomaly detection in real-world applications.
Apeiranthitis et al. [24] proposed a predictive maintenance approach for rotating machinery using a one-dimensional convolutional neural network (1D CNN) trained on raw vibration data from ball bearings. Their study focused on the maritime sector, where machinery reliability is critical for operational efficiency and safety. The authors demonstrated that their 1D CNN could accurately detect and classify various ball bearing faults using vibration signals acquired in a laboratory environment. The model achieved high classification accuracy, even when tested on previously unseen data, and was shown to be computationally efficient, making it suitable for real-time monitoring applications. This work highlights the potential of deep learning, and specifically CNN architectures, for effective fault diagnosis and predictive maintenance in complex industrial systems.
Sajjadi et al. [25] present a comprehensive review of machine learning (ML) methodologies for prognostics and health management (PHM) in cyber-physical systems (CPSs). The authors propose a novel taxonomy that classifies existing research by fault types (physical and cyber), PHM stages, data characteristics, ML techniques, and performance metrics, providing a structured guide for selecting appropriate ML methods across the PHM lifecycle. The review highlights the increasing adoption of deep learning, including transformers and large language models, for tasks such as anomaly detection, fault diagnosis, and remaining useful life (RUL) prediction. Key challenges identified include the scarcity of real-world, labeled datasets, class imbalance, model generalization, and the need for explainable AI in safety-critical applications.
3. Model Architectures
3.1. LSTM Autoencoder for Anomaly Detection
Long Short-Term Memory (LSTM) networks are recurrent architectures designed to capture temporal dependencies through gated mechanisms (input, forget, and output gates), thereby effectively addressing vanishing and exploding gradient issues. This capability makes them well-suited for industrial time-series data, where current states are strongly influenced by recent operational history. Autoencoders (AEs), in turn, aim to reconstruct input sequences; when trained on nominal conditions, they reproduce normal patterns with minimal error, whereas deviations result in elevated reconstruction errors, a widely adopted principle in anomaly detection research [26].
In industrial contexts, individual sensor signals such as vibration, temperature, and pressure exhibit complex temporal dependencies that evolve over time and cannot be captured by static models operating on single snapshots of the data. In this work, the LSTM-based autoencoder is applied to one sensor channel at a time as a univariate time series (e.g., the vibration level), learning dynamic dependencies across multiple time steps and thereby modelling the normal sequential behaviour of that signal. The full set of six operational features (vibration, temperature, pressure, motor speed, torque and humidity) is instead used as a static feature vector by the MLP fault classifier described in Section 4.4.
Although multiple correlated sensor signals are available, the LSTM Autoencoder was intentionally applied to a single high-sensitivity signal (vibration) in a univariate manner. Vibration signals are widely regarded as primary indicators of mechanical degradation in rotating machinery and are commonly used for early anomaly detection in industrial practice [27,28,29,30].
This design choice reduces model complexity, improves interpretability, and facilitates deployment in resource-constrained environments. Multivariate information is instead exploited in the fault prediction stage, where the MLP classifier processes a compact set of operational features to provide actionable fault diagnoses. While a multivariate LSTM Autoencoder could capture cross-sensor dependencies, such an extension would significantly increase architectural complexity and is therefore left for future investigation.
The proposed LSTM-based autoencoder compresses each input sequence into a latent representation and subsequently reconstructs it in the original signal space. This latent representation corresponds to the final hidden state vector produced by the LSTM encoder, which has lower dimensionality than the original sequence and summarizes its most salient temporal patterns. By constraining the information to pass through this compact internal code, the model is forced to learn meaningful features rather than simply copying the input. The reconstruction error serves as an unsupervised anomaly score: sequences exceeding a statistically derived threshold are classified as anomalous [26]. This method provides a single interpretable metric per sequence, enabling efficient monitoring and alarm generation.
In many industrial predictive maintenance settings, explicit ground-truth labels for anomalous behavior are unavailable, incomplete, or unreliable, as abnormal events are rare and often not consistently documented. For this reason, the anomaly detection component in this study is formulated as an unsupervised early-warning mechanism rather than a supervised classifier.
Consequently, conventional classification metrics such as precision, recall, or false-alarm rate cannot be directly computed for anomaly detection. Model evaluation therefore relies on statistical properties of the reconstruction error distribution, including separation between nominal and abnormal sequences, sparsity and localization of threshold exceedances, and qualitative consistency with observable signal deviations. This evaluation strategy is widely adopted in unsupervised anomaly detection studies and reflects realistic industrial constraints where labeled anomalies are not available at deployment time.
The anomaly threshold is computed using the distribution of reconstruction errors on validation data, typically set as τ = μ + 3σ, where μ and σ denote the mean and standard deviation of reconstruction errors. This statistical approach ensures that the threshold is data-driven and minimizes false positives under normal conditions. In our implementation, the threshold was fixed using validation reconstruction errors prior to test-time evaluation, in order to avoid data leakage from the test set and maintain model integrity. The validation windows used to estimate the reconstruction error distribution and determine the anomaly threshold were selected as statistically stable operating periods and were not defined based on fault labels or rule-based criteria. These windows correspond to time intervals exhibiting consistent signal behavior without pronounced fluctuations, reflecting nominal operating conditions over the observed timescale. As a result, the derived threshold is purely statistical and data-driven, rather than fault-aware or label-dependent.
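As a minimal illustration of this thresholding rule, the sketch below (pure Python; the error values and function names are ours for illustration, not the study's implementation) derives τ = μ + 3σ from validation reconstruction errors and flags test-time exceedances:

```python
from statistics import mean, pstdev

def anomaly_threshold(val_errors, k=3.0):
    """Derive tau = mu + k*sigma from validation reconstruction errors."""
    mu = mean(val_errors)
    sigma = pstdev(val_errors)  # spread of errors under nominal conditions
    return mu + k * sigma

def flag_anomalies(test_errors, tau):
    """Mark sequences whose reconstruction error exceeds the threshold."""
    return [e > tau for e in test_errors]

# Illustrative per-sequence reconstruction errors (MAE), not real data.
val_errors = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.11, 0.12]
tau = anomaly_threshold(val_errors)
flags = flag_anomalies([0.11, 0.35, 0.10], tau)  # only 0.35 exceeds tau
```

Because the threshold is fixed from validation statistics before any test sequence is scored, no test-set information leaks into the decision rule.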
The autoencoder architecture shown in Figure 1 consists of:
Input layer: Receives univariate time-series sequences of fixed length (e.g., 30 time steps, 1 feature).
LSTM layer: A single LSTM layer with 64 units and sequence output that learns a temporal representation of the input sequence.
Fully connected layer: A dense layer with 1 unit per time step that maps the LSTM outputs back to the reconstructed sequence (decoder stage).
Regression layer: A regression layer (MSE loss) that compares the input and reconstructed sequences and computes the reconstruction error for each sequence.
Training is performed using the Adam optimizer [31,32] with a learning rate schedule and gradient clipping to ensure stability. Hyperparameters such as sequence length, number of LSTM units, and batch size were selected based on empirical testing and literature benchmarks for industrial anomaly detection.
The selection of the sequence length and the number of hidden units for the LSTM Autoencoder was based on a combination of empirical experimentation and commonly adopted practices in time-series anomaly detection. A sequence length of 30 time steps was chosen in order to capture short- to medium-term temporal dependencies present in the sensor signals, while avoiding excessive model complexity and computational overhead. Preliminary experiments with shorter windows were found insufficient to adequately represent the system dynamics, whereas longer sequences did not provide a noticeable improvement in reconstruction performance.
Similarly, the number of hidden units was set to 64 as a compromise between representational capacity and generalization ability. Smaller configurations limited the model’s ability to learn normal operational patterns, leading to higher reconstruction errors on nominal data, while larger architectures increased the risk of overfitting without clear performance gains. The selected configuration was therefore validated empirically and is consistent with architectures commonly reported in the literature for LSTM-based anomaly detection in industrial time-series data.
This design offers several advantages. First, it operates in an unsupervised manner, eliminating the need for labeled anomaly data and making it particularly suitable for early warning applications in real-world environments where fault instances are scarce. Furthermore, the reconstruction error provides a single, interpretable scalar score, which simplifies integration into monitoring dashboards and enhances transparency. Finally, the lightweight architecture ensures scalability, allowing seamless deployment within SCADA systems or cloud-based platforms for real-time anomaly detection.
The rationale for this design lies in the ability of LSTM layers to capture short- and medium-term temporal dependencies typical of rotating machinery signals, while the autoencoder objective ensures accurate reproduction of nominal dynamics. The following section details the encoder–decoder configuration and the thresholding strategy employed within the proposed framework.
3.2. Multilayer Perceptron (MLP) for Fault Prediction
Multilayer Perceptrons (MLPs) [33] are feedforward neural networks that map input features to target outputs through successive layers of linear transformations combined with nonlinear activation functions. This architecture is well-suited for modeling complex, nonlinear relationships between operational variables such as vibration, temperature, and pressure, and the occurrence of faults in industrial systems.
In the proposed framework, the MLP processes a feature vector extracted from each time window and outputs a scalar score, which is subsequently converted into a binary decision (fault or no fault) using a predefined threshold. Concretely, for each window the six core operational variables (vibration, temperature, pressure, motor speed, torque, and humidity), recorded at the final time step, are concatenated into a six-dimensional feature vector that represents the instantaneous operating point of the machine. This decision threshold is typically set at 0.5 for binary classification tasks, ensuring a balanced trade-off between false positives and false negatives. Feature selection plays a critical role in this process and can be guided by operational relevance and statistical significance. In addition to manual selection, dimensionality reduction techniques such as Principal Component Analysis (PCA) have been widely employed in prior studies to enhance input representation and reduce redundancy [34].
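The per-window feature construction can be sketched as follows (pure Python; the dictionary layout, variable names, and normalization statistics are illustrative assumptions, not the study's data format):

```python
# The six core operational variables used as MLP inputs.
FEATURES = ["vibration", "temperature", "pressure",
            "motor_speed", "torque", "humidity"]

def feature_vector(window, mu, sigma):
    """Take the six variables at the window's final time step and z-score them."""
    last = window[-1]  # readings at the final time step of the window
    return [(last[f] - mu[f]) / sigma[f] for f in FEATURES]

# Illustrative two-step window of hypothetical sensor readings.
window = [
    {"vibration": 0.4, "temperature": 60.0, "pressure": 5.0,
     "motor_speed": 1500.0, "torque": 40.0, "humidity": 45.0},
    {"vibration": 0.5, "temperature": 62.0, "pressure": 5.2,
     "motor_speed": 1520.0, "torque": 42.0, "humidity": 46.0},
]
# Illustrative z-score statistics (here chosen so the last step maps to zero).
mu = {"vibration": 0.5, "temperature": 62.0, "pressure": 5.2,
      "motor_speed": 1520.0, "torque": 42.0, "humidity": 46.0}
sigma = {"vibration": 0.1, "temperature": 2.0, "pressure": 0.2,
         "motor_speed": 20.0, "torque": 2.0, "humidity": 1.0}

x = feature_vector(window, mu, sigma)  # six-dimensional operating point
```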
The MLP architecture (Figure 2) implemented in this study consists of:
Input Layer: Six operational features, normalized using z-score transformation to ensure comparability across scales.
Hidden Layer: A single hidden layer with ten neurons employing the tansig activation function, chosen for its ability to capture nonlinear relationships while maintaining computational efficiency.
Output Layer: A single neuron with a logsig activation function, producing a continuous score that is thresholded to yield binary fault predictions.
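The tansig and logsig activations correspond to the hyperbolic tangent and the logistic sigmoid, respectively. The forward pass of this 6–10–1 network can be sketched in pure Python (the weights below are zero-valued placeholders, not trained parameters):

```python
import math

def tansig(x):
    """MATLAB's tansig is equivalent to the hyperbolic tangent."""
    return math.tanh(x)

def logsig(x):
    """MATLAB's logsig is the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(x, W1, b1, w2, b2, threshold=0.5):
    """6 inputs -> 10 tansig hidden units -> 1 logsig output -> binary label."""
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    score = logsig(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return score, int(score >= threshold)

# Placeholder zero weights: the output score is logsig(0) = 0.5.
x = [0.0] * 6
W1 = [[0.0] * 6 for _ in range(10)]
b1 = [0.0] * 10
w2 = [0.0] * 10
b2 = 0.0
score, label = mlp_forward(x, W1, b1, w2, b2)
```

The logsig output keeps the score in (0, 1), so the 0.5 decision threshold mentioned above applies directly.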
Training was conducted using the Levenberg–Marquardt algorithm (trainlm) [35,36], which is well-suited for small to medium-sized networks due to its fast convergence properties. The dataset was partitioned into training (80%) and validation (20%) subsets using stratified sampling to preserve class distribution. Hyperparameters such as learning rate, maximum epochs, and early stopping criteria were optimized to prevent overfitting and ensure generalization.
The inclusion of an MLP offers several practical benefits. First, its lightweight architecture ensures rapid inference, making it suitable for real-time deployment in SCADA or cloud-based environments. Second, the supervised nature of the model enables actionable fault predictions by using labeled operational conditions, complementing the unsupervised anomaly detection stage. Finally, the simplicity of the network facilitates interpretability and integration into existing industrial monitoring systems without significant computational overhead.
4. The Proposed Methodology
To address the challenges identified earlier, this work introduces a practical two-level predictive maintenance framework designed for industrial machinery. The methodology integrates unsupervised anomaly detection and supervised fault prediction, applying advanced deep learning architectures to analyze complex operational data. By combining the strengths of LSTM autoencoders and multilayer perceptrons, the proposed approach aims to deliver robust early warning capabilities and actionable insights for maintenance planning. The following subsections detail the data sources, preprocessing steps, model architectures, and evaluation protocols that underpin the system’s development and validation. Within this framework, the LSTM autoencoder and the MLP classifier operate as two parallel, loosely coupled modules that process the same underlying data streams; they do not feed into each other directly, and their outputs are combined only at the decision level. To provide a unified view of the complete processing workflow, Figure 3 presents a flowchart that summarizes all stages of the proposed two-level predictive maintenance framework. The diagram illustrates the entire data flow, from acquisition and preprocessing to anomaly detection and fault prediction, highlighting the parallel operation of the LSTM Autoencoder and the MLP classifier, and their fusion at the decision level. This visual representation supports the detailed explanations in the following subsections and clarifies the overall system logic and deployment workflow.
4.1. Data Sources
As shown in Figure 3, the anomaly detection and fault prediction modules operate in parallel, processing the same data stream independently. Their outputs are only integrated at the decision layer, resulting in a modular system. This architecture allows each module to be optimized and deployed separately, while still ensuring that the overall predictive maintenance workflow remains coherent and easy to interpret. This study utilizes multivariate time series data generated within a digital twin (DT) environment that simulates industrial machinery operation over a five-year period (January 2019 to January 2024), with hourly sampling intervals. The dataset comprises 38 distinct features, including sensor measurements (vibration, temperature, pressure, humidity), operational parameters (such as motor speed and torque), machine health indicators, and environmental variables. Notably, the occurrence of fault events within the dataset is imbalanced, reflecting realistic industrial conditions [37].
Two distinct datasets were prepared for the analysis to ensure proper model development and evaluation. The first dataset was designated for training and included labeled fault events, enabling supervised learning for fault prediction. The second dataset was reserved for testing and consisted of 8735 time-windowed sequences generated after applying preprocessing steps such as data cleaning, normalization, and sliding-window segmentation. This separation ensured that the models were trained on representative operational conditions while being validated on unseen sequences, supporting an unbiased assessment of anomaly detection and fault prediction performance.
The datasets were designed to emulate real factory conditions, capturing both normal and fault states, with varying severity levels. Each sample includes operational, environmental, and health indicators, enabling comprehensive modeling of machine behavior. The imbalanced distribution of faults presents a realistic challenge for classification and anomaly detection.
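The sliding-window segmentation used to produce these sequences can be sketched in pure Python (window length and stride below are illustrative; the study uses 30-step windows over the hourly signal):

```python
def sliding_windows(series, length, stride=1):
    """Segment a 1-D series into overlapping fixed-length windows.

    With N samples and stride 1, this yields N - length + 1 windows.
    """
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, stride)]

# Illustrative signal of 10 samples, windows of length 4, stride 1.
signal = list(range(10))
windows = sliding_windows(signal, length=4)
```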
4.2. Data Preprocessing
To ensure data quality and consistency, several preprocessing steps were applied:
Prior to modeling, exploratory data analysis was performed, including visualization of key signals (e.g., vibration, temperature, pressure) to identify outliers, abrupt changes, and data gaps. This step informed the selection of features and the design of preprocessing procedures. The robust normalization approach was chosen for its resilience to outliers and heterogeneous data scales, ensuring stable model training.
Robustness to sensor noise and moderate operating variability was addressed through both preprocessing and model design. Robust normalization based on median and interquartile range statistics was applied to reduce sensitivity to outliers and transient fluctuations, while value clipping further limited the impact of extreme measurements.
Additionally, the use of sliding windows and sequence-based modeling enables the LSTM Autoencoder to learn temporal patterns rather than instantaneous values, inherently smoothing short-term noise. This windowed learning mechanism also contributes to partially reducing moderate sensor drift by continuously re-evaluating local temporal dynamics within each sequence. However, long-term sensor drift and severe non-stationary operating regimes were not explicitly modeled in this study. Addressing such effects would require adaptive thresholding or online model updates, which are identified as important directions for future work.
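The robust normalization described above can be sketched as follows (pure Python; the quantile interpolation method and the clipping bound are our assumptions, not values stated in the study):

```python
from statistics import median

def quantile(sorted_vals, q):
    """Linear-interpolation quantile on pre-sorted data (0 <= q <= 1)."""
    idx = q * (len(sorted_vals) - 1)
    lo, hi = int(idx), min(int(idx) + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def robust_scale(values, clip=5.0):
    """Center on the median, scale by the IQR, then clip extreme values."""
    s = sorted(values)
    med = median(values)
    iqr = quantile(s, 0.75) - quantile(s, 0.25)
    iqr = iqr if iqr > 0 else 1.0  # guard against constant signals
    scaled = [(v - med) / iqr for v in values]
    return [max(-clip, min(clip, v)) for v in scaled]

# Illustrative data with one outlier; the outlier is clipped, not removed.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(data)
```

Unlike z-score normalization, the median and IQR are barely affected by the single outlier, so the nominal samples keep a well-behaved scale while the extreme value is bounded by the clip.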
4.3. Anomaly Detection Using LSTM Autoencoder
An unsupervised anomaly detection module was implemented using a Long Short-Term Memory (LSTM) autoencoder [26]. The model architecture consisted of a sequence input layer, an LSTM layer with 64 units (output mode set to ‘sequence’), a fully connected layer, and a regression output layer [13,23]. Training was conducted using the Adam optimizer with the following parameters: maximum epochs set to 100, initial learning rate of 0.005, gradient threshold of 1, and scheduled learning rate drops.
The decision rule for anomaly detection was based on the reconstruction error, calculated as the mean absolute error (MAE) per sequence on validation data representing normal operation. An anomaly threshold was established as μ + 3σ from the distribution of validation errors and fixed prior to evaluation on the test set. Sequences exceeding this threshold were identified as anomalies [11,12]. Outputs from this stage included a comprehensive anomaly report and visualizations of reconstruction errors and anomaly overlays.
The LSTM Autoencoder was specifically trained to reconstruct sequences of vibration-level data, using the temporal dependencies inherent in industrial signals. The choice of sequence length (30) and hidden units (64) was guided by empirical testing and literature benchmarks for time series anomaly detection. The model’s unsupervised nature allows it to detect deviations without requiring labeled anomaly events, making it suitable for early warning in real-world deployments. Detected anomalies are further analyzed by plotting their temporal distribution and overlaying them on the original vibration signal, facilitating interpretation and validation by domain experts.
4.4. Fault Prediction Using Multilayer Perceptron (MLP)
For supervised fault prediction, a multilayer perceptron (MLP) classifier was implemented to distinguish between normal and abnormal operating conditions. The target variable, fault diagnosis, was defined as a binary indicator of the presence or absence of a fault based on predefined operational thresholds for key parameters such as vibration, temperature, and pressure. The fault labels are thus rule-based: they do not correspond to independently annotated failure events, but rather encode operational limits commonly used in industrial monitoring practice. Consequently, the role of the MLP classifier is not to discover novel fault patterns or infer failures beyond these rules. Instead, it aims to learn a smooth, data-driven approximation of the underlying deterministic decision logic, enabling assessment of the rules’ consistency and reproducibility through a compact neural model. These thresholds were established based on industry best practices to ensure that the classification reflects realistic safety conditions and supports actionable maintenance decisions.
The input space consisted of six operational features representing key machine and environmental parameters. All features were normalized using a z-score transformation [
10], with statistics computed across the full dataset to maintain consistency. Although the original digital twin dataset contained 38 heterogeneous features, six core operational variables, namely vibration, temperature, pressure, motor speed, torque, and humidity, were used as inputs to the MLP fault classifier. This choice was guided by domain knowledge in condition monitoring and preliminary exploratory analysis. Vibration, temperature, and pressure serve as primary indicators of mechanical and thermal stress, whereas motor speed and torque reflect the machine’s operating point and load; humidity accounts for environmental factors that may accelerate degradation. Other variables, such as maintenance logs, alarm triggers, health indices, anomaly scores, or fault probabilities, were not included because they often depend on derived or historical data and may not be consistently available in real time for new installations. Using a compact, physically meaningful subset reduces input dimensionality, limits overfitting under class imbalance, and maintains interpretability, an approach aligned with recent vibration-based predictive maintenance studies employing deep learning architectures [
34].
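The z-score normalization step can be sketched as follows (an illustrative NumPy equivalent of the MATLAB preprocessing, not the authors' code):

```python
import numpy as np

def zscore_fit(X):
    """Column-wise z-score statistics computed over the full dataset,
    as described for the six MLP input features."""
    return X.mean(axis=0), X.std(axis=0)

def zscore_apply(X, mu, sigma):
    """Standardize each feature to zero mean and unit variance."""
    return (X - mu) / sigma

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
mu, sigma = zscore_fit(X)
Xn = zscore_apply(X, mu, sigma)
```

Storing `mu` and `sigma` lets the same transformation be applied consistently to new samples at inference time.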
The MLP architecture consisted of a single hidden layer of ten neurons with the tansig activation function and a single output neuron with the logsig activation function. Binary labels were obtained by applying a decision threshold of 0.5 to the output score. Model training was conducted using the Levenberg–Marquardt algorithm (trainlm), with the dataset partitioned into 80% for training and 20% for validation (via the cvpartition function) to ensure balanced representation and reliable performance assessment.
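A single forward pass of this network can be written out explicitly (an illustrative NumPy sketch with random weights, standing in for the trained MATLAB model; tansig corresponds to tanh and logsig to the logistic sigmoid):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One forward pass of the described network: a ten-neuron hidden layer
    with tansig (tanh) activation and a logsig (logistic) output neuron."""
    h = np.tanh(W1 @ x + b1)                     # tansig hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # logsig output in (0, 1)

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((10, 6)), np.zeros(10)  # 6 inputs -> 10 hidden
W2, b2 = rng.standard_normal(10), 0.0
score = mlp_forward(rng.standard_normal(6), W1, b1, W2, b2)
label = int(score >= 0.5)  # decision threshold of 0.5
```

The logsig output keeps the score in (0, 1), so it can be read as a fault probability before thresholding.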
Among the most widely used evaluation measures are accuracy, precision, recall, and the F1-score. Accuracy quantifies the proportion of correctly predicted instances relative to the total number of observations, offering an overall measure of performance. Precision reflects the proportion of correctly predicted positive instances among all instances classified as positive, making it particularly relevant when false positives carry a high cost. Recall (or sensitivity) measures the proportion of actual positive instances correctly identified by the model, which is critical when minimizing false negatives is a priority. Finally, the F1-score, defined as the harmonic mean of precision and recall, provides a balanced metric that accounts for both types of misclassification. These metrics provide a comprehensive assessment of classifier performance, offering deeper insight into the trade-off between false alarms and missed detections compared to accuracy alone.
Performance evaluation relied on the confusion matrix and derived metrics. True positives (TP) represent correctly identified fault samples, while true negatives (TN) correspond to correctly classified normal samples. False positives (FP) indicate normal samples incorrectly identified as faults and false negatives (FN) denote missed fault cases.
The mathematical formulations for these metrics are presented in Equations (1)–(4).
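Using the confusion-matrix counts reported later for the validation set (Section 5.2), the four metrics of Equations (1)–(4) can be computed directly; this sketch simply restates the standard definitions:

```python
# Validation confusion-matrix counts reported in Section 5.2.
TP, TN, FP, FN = 852, 6155, 4, 1

accuracy  = (TP + TN) / (TP + TN + FP + FN)                # Eq. (1)
precision = TP / (TP + FP)                                 # Eq. (2)
recall    = TP / (TP + FN)                                 # Eq. (3)
f1        = 2 * precision * recall / (precision + recall)  # Eq. (4)
```

These evaluate to approximately 99.93%, 99.53%, 99.88%, and 99.7%, matching the values reported in Section 5.2.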
By constructing fault labels based on practical safety thresholds, the proposed approach delivers actionable predictions that support proactive maintenance planning. Furthermore, the trained MLP was prepared for integration into digital monitoring systems, enabling both offline analysis and potential real-time deployment within SCADA or cloud-based environments.
The proposed framework was explicitly designed to support lightweight execution and continuous operation on time-indexed industrial data streams. Both the LSTM Autoencoder and the MLP classifier adopt compact architectures, consisting of a single recurrent layer and a shallow feedforward network, respectively. Model training is intended to be performed offline or periodically, while inference operates on fixed-size sliding windows and can be executed independently for each incoming time window.
To provide an indication of computational efficiency, training and inference times were monitored during experimentation in a MATLAB environment using a single CPU. Training of the LSTM Autoencoder completed within approximately 18 min for 100 epochs, while per-window inference for both the anomaly detection and fault prediction stages required only a few milliseconds on standard industrial hardware. These results indicate that the proposed framework is suitable for near real-time monitoring and decision support.
It should be noted that these runtime values are indicative and depend on the specific hardware and software configuration. Nevertheless, the observed execution times, together with the compact model design and fixed temporal windowing, support the practical deployability of the framework in SCADA, edge, or cloud-based industrial environments.
5. Simulation Results
The source dataset used in this study is publicly available at
https://www.kaggle.com/datasets/datasetengineer/indfd-pm-dt (accessed on 5 September 2025). All data employed in our experiments were drawn from this source. We first partitioned the original dataset into separate training and test subsets, resulting in a training set with 35,060 records and a test set with 8765 records. For the LSTM autoencoder, these time series were further transformed into overlapping windows of 30 time steps, yielding 35,030 training sequences and 8735 test sequences. For the MLP classifier, 80% of the training data (28,048 samples) was used for model training and 20% (7012 samples) for validation.
Experimental procedures and analyses were carried out in MATLAB R2023a. The LSTM autoencoder and MLP models were implemented using the Deep Learning Toolbox, while preprocessing, normalization, and data partitioning were supported by the Statistics and Machine Learning Toolbox. The computational environment also included advanced visualization tools, enabling efficient experimentation and clear interpretation of the results.
5.1. LSTM Autoencoder—Anomaly Detection
The anomaly detection phase focused on the vibration time series, utilizing an LSTM Autoencoder trained exclusively on sequences representing normal machine operation. Before model training, the dataset underwent comprehensive preprocessing, including the removal of missing or invalid entries and the application of robust normalization based on median and interquartile range statistics to reduce the influence of outliers. Time series were then transformed into overlapping sequences of uniform length, ensuring the preservation of temporal dependencies critical for effective anomaly detection.
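The robust normalization step (median and interquartile range) can be sketched as follows; this is an illustrative NumPy equivalent of the MATLAB preprocessing, not the authors' code:

```python
import numpy as np

def robust_scale(x):
    """Center by the median and scale by the interquartile range,
    limiting the influence of outliers on the normalization."""
    med = np.median(x)
    q1, q3 = np.percentile(x, [25, 75])
    return (x - med) / (q3 - q1)

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one gross outlier
xs = robust_scale(x)  # the outlier no longer dominates the scale
```

Unlike z-scoring, the median and IQR are barely affected by the single extreme value, so the bulk of the signal keeps a stable scale.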
Model architecture comprised a sequence input layer, an LSTM layer with 64 units, a fully connected output layer, and a regression layer to compute reconstruction error. Training was performed using the Adam optimizer, with hyperparameters selected to balance learning stability and computational efficiency.
Anomaly identification was based on the mean absolute error (MAE) between original and reconstructed sequences. The anomaly threshold was statistically defined as τ = μ + 3σ, where μ and σ denote the mean and standard deviation of reconstruction errors on the validation set. In this study, the threshold was set to τ = 1.26 × 10⁻³. Applying this threshold to the test data, the model classified 129 sequences as threshold exceedances (anomalous excesses), corresponding to sparse and non-contiguous deviations from normal behavior (
Figure 4).
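The thresholding logic can be sketched with synthetic reconstruction errors (illustrative NumPy code with made-up error magnitudes, not the study's data):

```python
import numpy as np

def anomaly_threshold(val_errors):
    """tau = mu + 3*sigma over validation reconstruction errors (MAE)."""
    return val_errors.mean() + 3 * val_errors.std()

rng = np.random.default_rng(1)
val_errors = rng.normal(1e-3, 5e-5, 1000)         # synthetic validation MAEs
test_errors = np.concatenate([rng.normal(1e-3, 5e-5, 95),
                              np.full(5, 2e-3)])  # 5 injected exceedances
tau = anomaly_threshold(val_errors)
flags = test_errors > tau                         # flagged sequences
```

Because τ is fixed on validation data, the same comparison can be applied window by window at inference time without relabeling.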
A zoomed view (
Figure 5) revealed that these exceedances were clustered around specific regions, notably between indices approximately 2050 and 2400. Overlaying the detected anomalies on the raw signal (
Figure 6) confirmed that the identified events were localized and not attributable to general signal variability. These threshold exceedances do not necessarily correspond to actual fault events, but rather indicate statistically significant deviations from the learned normal behavior of the system. A sample excerpt of the anomaly log is presented in
Table 1, which lists representative anomalous sequences together with their sample index, reconstruction error (MAE × 10⁻³), and corresponding vibration value.
5.2. MLP—Fault Prediction
For fault prediction, a Multi-Layer Perceptron (MLP) classifier was developed to distinguish between normal and faulty machine states. The labeled dataset exhibited a pronounced class imbalance, with 87.74% of samples representing normal operation and 12.26% indicating faults. Fault labels were assigned based on practical operational thresholds for vibration, temperature, and pressure, reflecting realistic industrial criteria.
The MLP model was constructed using six input features representing key operational parameters: vibration (Vibration_Level), temperature (Temperature_Readings in degrees Celsius), pressure (Pressure_Data in Pascals), humidity (Humidity_Levels in percentage), motor speed (Motor_Speed in RPM), and torque (Torque_Data in Nm). Each feature was normalized via z-score transformation to ensure comparability across scales. The network architecture consisted of a single hidden layer with ten neurons employing the tansig activation function and an output layer with a logsig activation function. Training was performed using the Levenberg–Marquardt algorithm with the dataset partitioned into training and validation subsets to ensure unbiased performance evaluation.
On the validation set, the MLP achieved an accuracy of 99.93%, with the confusion matrix indicating minimal misclassifications (TN = 6155, FP = 4, FN = 1, TP = 852), as shown in
Figure 7. It should be emphasized that this high classification accuracy reflects the model’s ability to consistently reproduce the predefined rule-based fault labeling scheme. Therefore, the reported performance should be interpreted in the context of rule consistency learning rather than independent fault classification against externally validated ground truth data. In practical terms, only four normal samples were incorrectly classified as faults (false positives) and one true fault was missed (false negative). These results translated into class-specific metrics of 99.53% precision (proportion of predicted faults that are actual faults), 99.88% recall (proportion of true faults correctly detected), and F1-score of 99.70%, representing the harmonic mean of precision and recall.
To further account for class imbalance, additional evaluation metrics were derived from the confusion matrix. The specificity (true negative rate) reached 99.94%, indicating an extremely low false-alarm rate. The balanced accuracy, defined as the average of recall and specificity, was 99.91%, confirming consistent performance across both majority and minority classes. Finally, the Matthews correlation coefficient (MCC) achieved a value of 0.999, reflecting near-perfect agreement between predicted and true class labels even under pronounced class imbalance.
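These imbalance-aware metrics follow directly from the confusion-matrix counts; a short sketch of the computation (standard definitions, using the counts reported in Section 5.2):

```python
import math

TP, TN, FP, FN = 852, 6155, 4, 1  # validation confusion matrix (Section 5.2)

specificity = TN / (TN + FP)                 # true negative rate
recall = TP / (TP + FN)                      # sensitivity
balanced_accuracy = (recall + specificity) / 2
mcc = (TP * TN - FP * FN) / math.sqrt(       # Matthews correlation coefficient
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
```

Specificity and balanced accuracy evaluate to roughly 99.94% and 99.91%, consistent with the values above.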
5.3. Stability Analysis Across Data Splits and Sequence Length
To evaluate the robustness and stability of the fault prediction model, additional experiments were conducted across multiple data splits and sequence-length configurations. The full dataset was divided into five different train–test splits, each maintaining the original class imbalance. For each split, the MLP classifier was trained and tested using the same hyperparameters. In addition, the LSTM autoencoder was tested with sequence lengths of 20, 30, and 50 time steps to assess how sensitive the model is to the choice of temporal window.
Across a total of ten experimental runs, the MLP classifier achieved high performance, showing very little variation across different splits and sequence lengths.
Table 2 presents the mean and standard deviation of the evaluation metrics from all runs. These results indicate that the framework’s performance does not depend on a particular data partition or sequence length, demonstrating strong generalization and robustness in realistic scenarios. In particular, both the Matthews correlation coefficient (MCC) and F1-score remained close to one, highlighting the classifier’s reliability even when the data is highly imbalanced.
These performance metrics, summarized in
Table 2, along with the threshold-based anomaly-detection results of the LSTM Autoencoder, demonstrate that the proposed classifier maintains exceptionally low false-alarm rates and an almost negligible probability of missing true faults.
Table 3 reports the key performance indicators for both components of the predictive maintenance framework: the anomaly-detection outcome of the LSTM Autoencoder (number of anomalies and threshold τ) and the classification metrics of the MLP fault predictor (accuracy, precision, recall, and F1-score).
5.4. Training Behavior and Generalization
To evaluate the stability and generalization of the proposed models, training dynamics were examined in detail. As shown in
Figure 8, the loss curves for training, validation, and test sets decreased smoothly without abrupt fluctuations, indicating stable convergence and minimal risk of divergence. The lowest validation mean squared error (MSE) occurred at epoch 40, after which early stopping was applied at epoch 46 to prevent overfitting.
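This early-stopping behavior (best validation MSE at epoch 40, stop at epoch 46) is consistent with a patience-style criterion of six consecutive validation failures, which is MATLAB's default `max_fail`; a minimal illustrative Python sketch:

```python
def early_stopping(val_losses, max_fail=6):
    """Return (best_epoch, stop_epoch): training stops once validation loss
    fails to improve max_fail times in a row (a patience criterion)."""
    best, best_epoch, fails = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, fails = loss, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

# Synthetic curve: improves through epoch 40, then plateaus.
losses = [1.0 / e for e in range(1, 41)] + [0.05] * 20
best_epoch, stop_epoch = early_stopping(losses)  # -> (40, 46)
```

The synthetic loss curve is made up to reproduce the reported stopping pattern; the actual curves are those of Figure 8.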
Error histograms for all data splits were tightly centered near zero with a narrow spread (
Figure 9), and regression plots revealed near-identity fits (correlation coefficient R ≈ 0.99), as shown in
Figure 10.
Figure 11 illustrates the training state, highlighting the progression of gradient decay, the schedule of the Levenberg–Marquardt damping parameter (μ), and the validation checks that triggered early stopping at epoch 46 to prevent overfitting. These results underscore the models’ capacity to accurately capture underlying data patterns [
6,
40].
6. Discussion
This study assessed a two-level predictive maintenance framework integrating unsupervised anomaly detection with an LSTM Autoencoder and supervised fault prediction using an MLP. Consistent with our working hypothesis that reconstruction error can effectively signal deviations from learned normal behavior while a lightweight classifier can map operating features to fault risk, the LSTM Autoencoder identified 129 anomalous sequences on the DT-derived vibration signal using a validation-fixed threshold (τ = μ + 3σ, i.e., τ ≈ 1.26 × 10⁻³ MAE). These anomalies represented localized deviations rather than systemic drift, accounting for roughly 1.48% of the test set (129 out of 8735 sequences).
Complementing this, the MLP achieved 99.93% accuracy (TN = 6155, FP = 4, FN = 1, TP = 852) with precision, recall, and F1-score values of 99.53%, 99.88%, and 99.70%, respectively. These results indicate that a compact feedforward model with six operational inputs can reliably distinguish fault from non-fault conditions.
The findings align with prior research demonstrating that sequence-based autoencoders trained on nominal operating conditions can detect anomalous segments without labeled data, while shallow discriminative models perform strongly when features encode operational conditions [
5,
11,
12]. Compared with other unsupervised baselines such as one-class SVMs or isolation forests, the LSTM Autoencoder exploits temporal dependencies and provides a clear anomaly score (MAE), enhancing interpretability [
11,
12,
26]. On the supervised side, performance is comparable to gradient-boosting and random forest baselines reported on similar digital twin benchmarks [
41], while offering a more deployment-friendly architecture with reduced computational complexity [
10].
It should be noted that the exceptionally high classification performance of the MLP stems from the deterministic nature of the fault-labeling process. Because the labels were generated via fixed operational thresholds applied to vibration, temperature, pressure, motor speed, torque, and humidity, the classifier’s task reduces to learning a smooth approximation of these deterministic boundaries. Consequently, the reported metrics should be understood as an indication of rule-consistency rather than evidence that the model can generalize to more complex, noisy, or implicitly defined fault mechanisms.
From an operational perspective, the proposed framework offers practical benefits. The Autoencoder can serve as a front-end monitor to triage time windows for inspection and log correlation, and the MLP provides actionable fault probabilities for maintenance scheduling. Both models are lightweight and suitable for integration into SCADA systems or cloud-based platforms, supporting near real-time monitoring and decision support. Furthermore, the use of a validation-fixed threshold and calibrated probabilities can reduce false alarms, minimize diagnostic workload, and enable timely interventions.
The proposed framework was designed with near real-time industrial deployment in mind. Both the LSTM Autoencoder and the MLP classifier adopt lightweight architectures, consisting of a single recurrent layer and a shallow feedforward network, respectively. This design ensures low inference complexity and minimal memory footprint, making the models suitable for execution in SCADA, edge, or cloud-based environments.
Model training is intended to be performed offline or periodically, while inference can be executed online on streaming data. Although exhaustive runtime benchmarking and detailed memory profiling were beyond the scope of this study, the compact model configurations indicate low-latency inference suitable for near real-time monitoring on standard industrial hardware. Detailed runtime profiling in live streaming environments is identified as future work. In addition, indicative CPU-based execution times measured during MATLAB experimentation (
Section 4.4) further support near real-time suitability, with inference executed in the order of milliseconds per time window.
Several limitations should be acknowledged. First, the reported anomaly counts correspond to statistically defined threshold exceedances in the reconstruction error rather than independently verified failure events; consequently, classical false-alarm and miss rates cannot be precisely quantified in the absence of labeled anomaly ground truth. Second, the use of a fixed anomaly threshold defined as μ + 3σ, while conceptually sound and widely adopted, may be suboptimal under non-stationary operating conditions, indicating the potential need for adaptive or context-aware thresholding strategies. Third, the supervised fault labels employed for MLP training are derived from rule-based operational criteria rather than confirmed fault annotations, which may lead to optimistic performance estimates compared to noisy real-world deployments. Fourth, the evaluation was conducted in an offline digital twin environment, and therefore factors such as communication latency, sensor drift, and data quality issues inherent to live streaming industrial systems were not explicitly assessed. Finally, although sequence length and feature selection were guided by empirical testing and domain knowledge, a systematic sensitivity analysis of these design choices was beyond the scope of this study and constitutes an important direction for future work [
34].
Future research should focus on several promising directions to further enhance and extend this work. Adaptive or dynamic thresholding techniques will be explored to improve stability under changing operating conditions. Development of multivariate and attention-based autoencoders is planned to better capture cross-sensor dependencies and strengthen anomaly detection capabilities. Incorporating real failure events and modeling multiple fault severity levels will increase the practical relevance of the system, while cost-sensitive operating points will be investigated to align predictions with maintenance priorities. Finally, an online architecture integrated with SCADA/PLC systems and real-time dashboards will be prototyped to enable smooth deployment in industrial environments.
7. Conclusions
In this study, we developed and evaluated a two-level predictive maintenance framework for industrial machinery, combining an LSTM autoencoder for unsupervised anomaly detection with an MLP for supervised fault prediction. The system was tested on multivariate time-series data generated from a digital twin environment, simulating five years of industrial operation. The results showed that the LSTM autoencoder was able to identify localized anomalies effectively, while the MLP classifier achieved very high accuracy, precision, and recall in distinguishing between normal and faulty machine states.
The innovation of this work lies in its hybrid approach, which applies both unsupervised and supervised learning to address the challenges of predictive maintenance. By using an LSTM autoencoder, the system can detect subtle deviations from normal behavior without requiring labeled anomaly data, making it suitable for early warning applications. The MLP classifier, on the other hand, provides actionable fault predictions based on operational features, supporting timely and informed maintenance decisions. Together, these models offer a lightweight and interpretable solution that can be integrated into near real-time industrial monitoring systems.
The results provide strong evidence for the effectiveness of the approach. The metrics indicate that the proposed framework can reliably distinguish between normal and faulty operating conditions, maintaining both a low false-alarm rate and a negligible probability of missing true faults. These findings confirm that deep learning models, particularly those using temporal dependencies and feature-based classification, are highly effective for predictive maintenance. The hybrid architecture introduced here advances the field by combining the strengths of unsupervised anomaly detection and supervised fault prediction in a lightweight, interpretable, and deployable solution.
In summary, this work demonstrates that a hybrid deep learning approach can provide a robust and practical solution for predictive maintenance in industrial machinery. The results offer a strong foundation for further research and real-world implementation, with the potential to improve maintenance strategies and operational efficiency across a wide range of industrial applications.