In diagnostics, data quality largely determines the accuracy and reliability of the methods applied. Signals collected by sensors often contain noise from many sources, so appropriate filters are needed to ensure that the input to diagnostic algorithms is clean, smooth, and informative. Reducing noise while preserving the essential characteristics of the signals makes it possible to extract the key features needed to detect potential anomalies. This chapter presents the proposed predictive maintenance methodology, which was deployed on a fully operational elevator system installed in a residential apartment building. Unlike a laboratory-scale testbench, this installation reflects real operating conditions, with unpredictable loading profiles, variable trip frequencies, and natural signal disturbances. The methodology covers the elevator installation used for data collection through smart sensors, the preprocessing of the signals with appropriate filters, and the extraction of significant indicators in the time and frequency domains. Data collection is performed by a network of non-intrusive smart sensors compliant with international standards. The entire signal acquisition and preprocessing pipeline operates in real time, ensuring minimal latency and accurate temporal alignment across sensor modalities. Because no access to the inverter firmware or motor internals is required, the diagnostic process is non-intrusive and fully compatible with the manufacturer’s warranty and safety restrictions.
3.2. Data Processing and Transmission
The next step of the proposed methodology is the preprocessing of the collected data. In this phase, two filters were combined, a Gaussian Filter and an Extended Kalman Filter (EKF), because of their complementary advantages. The Gaussian Filter removes high-frequency noise, smoothing the signal while preserving its basic characteristics, which improves the extraction of features such as the RMS value and FFT-based spectra. The impulse response of the Gaussian Filter is given by:
$$ g(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{t^{2}}{2\sigma^{2}}} $$
where σ is the standard deviation.
The filtered signal ŝ(t) is calculated as the convolution of the measured signal s(t) with the Gaussian Filter impulse response g(t):
$$ \hat{s}(t) = (s * g)(t) = \int_{-\infty}^{\infty} s(\tau)\, g(t-\tau)\, d\tau $$
For discrete signals, the convolution becomes a finite sum:
$$ \hat{s}[n] = \sum_{k=0}^{N-1} g[k]\, s[n-k] $$
where N is the length of the filter’s impulse response.
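A minimal NumPy sketch of the discrete Gaussian smoothing step is given below, assuming uniformly sampled data with σ measured in samples. The truncation of the kernel at about ±3σ, the unit-sum normalization (which takes the place of the continuous prefactor), and the function names are illustrative choices, not details of the deployed system. The kernel is centered on each output sample, which is the sum above shifted by (N − 1)/2 samples so that the smoothed signal is not delayed.

```python
import numpy as np
from typing import Optional


def gaussian_kernel(sigma: float, radius: Optional[int] = None) -> np.ndarray:
    """Sampled Gaussian impulse response g[k], k = -radius..radius, normalized to unit sum."""
    if radius is None:
        radius = max(1, int(np.ceil(3 * sigma)))  # assumption: truncate at about ±3σ
    k = np.arange(-radius, radius + 1)
    g = np.exp(-(k ** 2) / (2 * sigma ** 2))
    return g / g.sum()  # unit sum keeps the signal's mean level unchanged


def gaussian_filter(s: np.ndarray, sigma: float = 0.75) -> np.ndarray:
    """Smooth s by discrete convolution with a centered Gaussian kernel."""
    g = gaussian_kernel(sigma)
    # mode="same" returns len(s) samples aligned with the input (zero padding at the edges)
    return np.convolve(s, g, mode="same")
```

With the σ = 0.75 reported for this installation, the kernel has N = 7 taps, so smoothing acts over only a few neighboring samples and slow waveform features are left largely intact.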
The Extended Kalman Filter (EKF) is suited to nonlinear systems in which accurate state estimation is the objective. The EKF linearizes the system’s nonlinearities around the current estimate through the Jacobian matrix of the state, which here comprises current, voltage, and acceleration, so that the dynamic behavior of the system can be tracked. The system’s state evolves according to:
$$ x_k = f(x_{k-1}, u_{k-1}) + w_{k-1} $$
where x_k is the system state at time step k, u_k is the system input, and w_k is the process noise.
The measurement equation is:
$$ z_k = h(x_k) + v_k $$
where z_k is the measurement vector at time step k, h(·) is the nonlinear function that describes how the state is mapped to the measurements, and v_k is the measurement noise.
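The state prediction is defined as follows:
$$ \hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_{k-1}), \qquad P_{k|k-1} = F_k P_{k-1|k-1} F_k^{\top} + Q $$
where F_k = ∂f/∂x, evaluated at the previous estimate, is the Jacobian matrix of the dynamics, P is the estimate error covariance, and Q is the process noise covariance.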
Gaussian Filtering was applied with a standard deviation of σ = 0.75, selected empirically as the best trade-off between noise suppression and the preservation of essential signal features. The Gaussian Filter smoothed the raw vibration and current signals, reducing high-frequency measurement noise without distorting the underlying waveform shape. The EKF was then applied to estimate the evolution of the system’s dynamic state and further refine the filtered signal trajectories. The EKF was tuned offline on representative datasets from both healthy and faulty operating conditions, with process noise covariance Q = diag(0.01, 0.01, 0.005) and measurement noise covariance R = diag(0.05).
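The predict–update cycle can be sketched as follows with NumPy. Only the prediction step and the values of Q and R come from the text; the update step is the standard EKF correction, and the process model f, measurement model h, and their Jacobians are hypothetical placeholders that would be replaced by the installation’s actual models.

```python
import numpy as np


def ekf_step(x, P, z, u, f, F_jac, h, H_jac, Q, R):
    """One Extended Kalman Filter cycle: predict with f, then correct with measurement z."""
    # Prediction: propagate the estimate and its covariance through the linearized dynamics
    x_pred = f(x, u)
    F = F_jac(x, u)
    P_pred = F @ P @ F.T + Q
    # Update: linearize h around the prediction and apply the Kalman correction
    H = H_jac(x_pred)
    innovation = z - h(x_pred)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new


# Noise covariances reported for the installation (state: current, voltage, acceleration)
Q = np.diag([0.01, 0.01, 0.005])
R = np.diag([0.05])


# Hypothetical models for illustration only: random-walk dynamics and a single
# measurement equal to the product of current and voltage.
def f(x, u):
    return x


def F_jac(x, u):
    return np.eye(3)


def h(x):
    return np.array([x[0] * x[1]])


def H_jac(x):
    return np.array([[x[1], x[0], 0.0]])


x0, P0 = np.array([1.0, 1.0, 0.0]), np.eye(3)
x1, P1 = ekf_step(x0, P0, np.array([2.0]), None, f, F_jac, h, H_jac, Q, R)
```

In this toy case the single measurement pulls current and voltage toward values whose product matches it, while the unobserved acceleration keeps its predicted variance.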
The combined use of Gaussian Filtering and the EKF outperformed classic filtering techniques such as median filtering and low-pass Butterworth filters. Specifically, the proposed approach improved the F1-score by 3–7%, because it suppressed high-frequency noise while limiting smoothing-induced distortion, and therefore preserved the transient characteristics and anomalies that are critical for accurate fault detection. By retaining these fault-related signal features, the two-stage filtering approach makes the downstream fault classification models more robust and reliable.
The next step is the application of the Short-Time Fourier Transform (STFT), one of the most important tools for analyzing the time-frequency characteristics of signals. The STFT shows how a signal’s spectrum varies over time, which makes it suitable for detecting dynamic phenomena and transient faults, and tracking the spectrum in real time allows the frequency sidebands to be monitored accurately and efficiently. The STFT of a signal x(t) is defined as:
$$ X(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(t-\tau)\, e^{-j 2\pi f t}\, dt $$
where X(τ, f) is the short-time spectrum, x(t) is the analyzed signal, w(t − τ) is the window function centered at time τ that defines the local time interval, f is the frequency, and t is the time.
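A discrete counterpart of this definition can be sketched with NumPy: each frame is multiplied by a Hann window (the w(t − τ) term) and transformed with a one-sided FFT. Time is measured from each frame’s start, which changes only the phase of X, not its magnitude. The Hann window, the 256-sample window length, the 50% hop, and the test signal are assumptions for illustration; the text does not state the windowing parameters used in the deployment. SciPy’s `scipy.signal.stft` provides a more complete equivalent.

```python
import numpy as np


def stft(x, fs, win_len=256, hop=128):
    """Return (freqs, frame_times, X) where X[m, k] is the spectrum of frame m at freqs[k]."""
    w = np.hanning(win_len)                                 # window w(t - tau)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[m * hop: m * hop + win_len] * w for m in range(n_frames)])
    X = np.fft.rfft(frames, axis=1)                         # one-sided DFT of each windowed frame
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # frame centers tau_m
    return freqs, times, X


# Example: a tone that switches from 50 Hz to 120 Hz halfway through a 2 s record
fs = 1000
t = np.arange(2 * fs) / fs
x = np.where(t < 1.0, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))
freqs, times, X = stft(x, fs)
peak_first = freqs[np.argmax(np.abs(X[0]))]   # dominant frequency in the first frame
peak_last = freqs[np.argmax(np.abs(X[-1]))]   # dominant frequency in the last frame
```

The frequency resolution is fs / win_len (about 3.9 Hz here); a longer window sharpens frequency resolution at the cost of time resolution, which is the basic STFT trade-off.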
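The analysis of the indicators provides quantitative information on the overall operation of the motor and is vital for fault detection. The indicators include frequency-band measures as well as statistical measures. The total energy and the mean frequency of a signal in a specific frequency band from f_1 to f_2 are calculated as:
$$ E_{[f_1,f_2]} = \sum_{i=1}^{N} P(f_i)\,\mathbb{1}[f_1 \le f_i \le f_2], \qquad \bar{f}_{[f_1,f_2]} = \frac{\sum_{i=1}^{N} f_i\, P(f_i)\,\mathbb{1}[f_1 \le f_i \le f_2]}{\sum_{i=1}^{N} P(f_i)\,\mathbb{1}[f_1 \le f_i \le f_2]} $$
where f_i is the frequency of the i-th sample in the frequency spectrum, P(f_i) is the power spectral density (or the magnitude of the spectral component) at frequency f_i, 𝟙[·] equals 1 when f_i lies in the band and 0 otherwise, and N is the total number of samples (frequencies) in the spectrum.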
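The band indicators translate directly into array operations. In this sketch, `psd` is assumed to be a one-sided power spectrum, for example the squared STFT magnitude of one frame; the band limits and the toy spectrum are illustrative.

```python
import numpy as np


def band_energy_and_mean_freq(freqs, psd, f1, f2):
    """Total energy and mean (centroid) frequency of psd within [f1, f2]."""
    freqs, psd = np.asarray(freqs, dtype=float), np.asarray(psd, dtype=float)
    in_band = (freqs >= f1) & (freqs <= f2)        # indicator 1[f1 <= f_i <= f2]
    energy = psd[in_band].sum()
    if energy == 0:
        return 0.0, float("nan")                   # mean frequency undefined for an empty band
    mean_freq = (freqs[in_band] * psd[in_band]).sum() / energy
    return float(energy), float(mean_freq)


freqs = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
psd = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
energy, mean_freq = band_energy_and_mean_freq(freqs, psd, 10.0, 30.0)  # -> 5.0, 20.0
```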
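The crest factor measures the peak amplitude of a signal relative to its RMS value, computed over the N samples x_i:
$$ CF = \frac{\max_i |x_i|}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2}}} $$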
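Another critical indicator for detecting transient damage and vibration characteristics is kurtosis:
$$ K = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^{4}}{\left(\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^{2}\right)^{2}} $$
where x_i are the signal values, x̄ is the mean signal value, and N is the number of samples.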
Skewness measures the asymmetry of the data distribution around the mean. Asymmetry can reveal mechanical faults such as rotor imbalance or magnetic asymmetries:
$$ S = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^{3}}{\left(\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^{2}\right)^{3/2}} $$
Entropy measures the complexity or randomness of the signal:
$$ H = -\sum_{i} p(x_i)\,\log p(x_i) $$
where p(x_i) is the probability of x_i.
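The four statistical indicators can be computed as below. Crest factor is taken as peak over RMS, and kurtosis and skewness use the population (1/N) moments. Entropy needs p(x_i), and the text does not say how it is estimated; this sketch assumes a 32-bin histogram of the sample values and reports entropy in bits.

```python
import numpy as np


def crest_factor(x):
    x = np.asarray(x, dtype=float)
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))   # peak / RMS


def kurtosis(x):
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2          # 3 for a Gaussian signal


def skewness(x):
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5        # 0 for a symmetric distribution


def shannon_entropy(x, bins=32):
    counts, _ = np.histogram(x, bins=bins)                 # assumption: histogram estimate of p(x_i)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))                  # in bits


t = np.arange(1000) / 1000
sine = np.sin(2 * np.pi * t)   # one full period: crest factor √2, kurtosis 1.5, skewness ≈ 0
```

A healthy, near-sinusoidal current gives values close to these reference numbers; impulsive faults typically raise kurtosis and crest factor above them.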
Principal Component Analysis (PCA) reduces dimensionality by projecting the data onto the components that capture the maximum variance. Mutual Information (MI) quantifies the dependency between two variables and identifies the features most relevant to the target output. Min–Max normalization rescales each feature to a fixed range, typically [0, 1] or [−1, 1], while Z-Score normalization standardizes each feature to zero mean and unit variance. Both place the features on a common scale, which prevents features with larger ranges from dominating model training.
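Min–Max and Z-Score normalization, together with a PCA projection, can be sketched in NumPy as follows. PCA is computed here from the SVD of the centered data, one common way to obtain the maximum-variance directions; mutual information is omitted. The guards against constant features are an implementation choice, not something stated in the text.

```python
import numpy as np


def min_max(X, lo=0.0, hi=1.0):
    """Rescale each column of X to [lo, hi]."""
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)    # guard against constant features
    return lo + (X - xmin) / span * (hi - lo)


def z_score(X):
    """Standardize each column of X to zero mean and unit variance."""
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)


def pca(X, k):
    """Project X onto its first k principal components; also return explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: directions sorted by variance
    explained = s ** 2 / np.sum(s ** 2)
    return Xc @ Vt[:k].T, explained[:k]
```

Standardizing before PCA, as in `pca(z_score(X), k)`, keeps a feature with a large physical range from dominating the principal directions.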
Before the feature matrix is extracted, the dataset must be balanced. Balancing is important in machine learning problems where the classes are unevenly represented, that is, when one category (e.g., positive or negative) is much less frequent than the other, which can degrade model performance. An appropriate combination of oversampling techniques for the minority class, such as the Synthetic Minority Oversampling Technique (SMOTE) or Adaptive Synthetic Sampling (ADASYN), with undersampling techniques for the majority class, such as Tomek Links or Cluster-Based Undersampling, achieves balance while limiting overfitting and loss of information.
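The core of SMOTE, generating a synthetic minority sample on the line segment between a minority sample and one of its k nearest minority neighbors, fits in a few lines of NumPy. This is a minimal sketch for illustration (k = 5 by default, brute-force distances); ADASYN, Tomek Links, and cluster-based undersampling are not shown, and a production pipeline would more likely rely on a maintained library such as imbalanced-learn.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic samples by interpolating between minority neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)                                    # needs at least 2 minority samples
    # Pairwise Euclidean distances among minority samples
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                          # a sample is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]                    # indices of the k nearest neighbors
    base = rng.integers(0, n, size=n_new)                # random seed samples
    neigh = nn[base, rng.integers(0, k, size=n_new)]     # one random neighbor per seed
    gap = rng.random((n_new, 1))                         # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class rather than duplicating existing points.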
The final processed dataset is a table of feature vectors in which each sample should carry a label. However, only fault-positive samples are labeled (y = 1); the large majority of samples are unlabeled and contain both normal (healthy) and faulty states, which makes it difficult to train a supervised classifier directly. Positive–Unlabeled (PU) learning is therefore used to train a model that classifies the unlabeled samples as positive or negative.
3.3. Model Training and Fault Classification
The problem involves two sets: a set P of positively labeled samples and a set U of unlabeled samples, which may contain both positive and negative instances. The goal is to train an initial classifier f(Z) that predicts whether a new sample z is positive or negative.
Encoder–decoder architectures are a powerful tool in PU learning because they extract meaningful representations from complex, high-dimensional data, such as the spectrogram patterns derived from the STFT.
The encoder maps the high-dimensional input data X into a low-dimensional latent space Z:
$$ Z = f_{\text{enc}}(X; \theta_e) $$
where X ∈ ℝ^{n×d} is the input data with n samples and d features; Z ∈ ℝ^{n×k} is the latent representation, with k < d; and θ_e represents the trainable parameters of the encoder.
The encoder compresses the information in X by learning important patterns or features.
The decoder reconstructs the input X from the latent representation Z:
$$ \hat{X} = f_{\text{dec}}(Z; \theta_d) $$
where X̂ is the reconstructed version of the original input and θ_d represents the trainable parameters of the decoder.
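To ensure that Z retains the essential features of X, the reconstruction loss, typically the Mean Squared Error (MSE), is minimized:
$$ \mathcal{L}_{\text{reconstruction}} = \frac{1}{n}\sum_{i=1}^{n} \lVert x_i - \hat{x}_i \rVert^{2} $$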
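A forward pass of the encoder–decoder and its reconstruction loss can be sketched in NumPy. The single ReLU layer in the encoder, the linear decoder output, and the random initialization are assumptions for illustration; training θ_e and θ_d additionally requires gradients, normally obtained with a deep learning framework, and is not shown.

```python
import numpy as np


def relu(a):
    return np.maximum(0.0, a)


def init_autoencoder(d, k, seed=0):
    """Randomly initialized encoder (θe) and decoder (θd) parameters; k < d."""
    rng = np.random.default_rng(seed)
    return {
        "We": rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, k)), "be": np.zeros(k),
        "Wd": rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d)), "bd": np.zeros(d),
    }


def encode(X, p):
    return relu(X @ p["We"] + p["be"])      # Z = f_enc(X; θe), shape (n, k)


def decode(Z, p):
    return Z @ p["Wd"] + p["bd"]            # X̂ = f_dec(Z; θd), shape (n, d)


def reconstruction_mse(X, p):
    """(1/n) Σ_i ||x_i − x̂_i||²."""
    X_hat = decode(encode(X, p), p)
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```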
The latent representation Z produced by the encoder is passed to the PU classifier, which estimates the probability p(y = 1 | Z) that a sample is positive. Because only positive samples can be labeled, this probability can be expressed as:
$$ p(y=1 \mid Z) = \frac{p(s=1 \mid Z)}{e(Z)}, \qquad e(Z) = p(s=1 \mid y=1, Z) $$
where s indicates whether a sample is labeled (s = 1) or unlabeled (s = 0), and e(Z) is the propensity score, the likelihood that a positive sample with features Z is labeled.
By modeling e(Z) and using the labeled data, the classifier f(Z) can infer the true class probabilities:
$$ f(Z) = \psi(W Z + b) $$
where W and b are the weights and biases of the PU classifier and ψ is the activation function.
The Rectified Linear Unit (ReLU) activation function is used in encoder–decoder architectures because it is simple, mitigates the vanishing gradient problem, and introduces nonlinearity and sparsity. It is defined as:
$$ \mathrm{ReLU}(x) = \max(0, x) $$
ReLU is applied to the hidden layers of both the encoder and the decoder, so that only positive pre-activations (the active features) are propagated:
$$ h^{(l)} = \mathrm{ReLU}\left(W^{(l)} h^{(l-1)} + b^{(l)}\right) $$
The expected risk R(f) of the classifier f(Z) is defined as:
$$ R(f) = \mathbb{E}_{(Z,y)}\left[\ell(f(Z), y)\right] $$
where ℓ is the binary cross-entropy loss function.
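Since y is not directly available for unlabeled samples, the risk is decomposed as:
$$ R(f) = \pi_p\, \mathbb{E}_{Z\sim P}\left[\ell(f(Z), 1)\right] + \mathbb{E}_{Z\sim U}\left[\ell(f(Z), 0)\right] - \pi_p\, \mathbb{E}_{Z\sim P}\left[\ell(f(Z), 0)\right] $$
where π_p is the estimated proportion of positive samples in the unlabeled dataset U; the first expectation over P gives the risk for positive samples; and the risk for negative samples, which cannot be computed directly, is approximated using the unlabeled data, with the last term removing the contribution of the positives hidden in U. Replacing the expectations with sample means over the n_P labeled and n_U unlabeled samples gives the decomposed risk estimate:
$$ \hat{R}(f) = \frac{\pi_p}{n_P}\sum_{z\in P}\ell(f(z), 1) + \frac{1}{n_U}\sum_{z\in U}\ell(f(z), 0) - \frac{\pi_p}{n_P}\sum_{z\in P}\ell(f(z), 0) $$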
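A direct NumPy transcription of the empirical PU risk is sketched below, with binary cross-entropy as ℓ and classifier outputs f(z) given as probabilities. The class prior π_p is assumed to be known or estimated beforehand. With flexible models this unbiased estimate can become negative; the non-negative PU (nnPU) variant clamps the negative-class term at zero, a modification the text does not mention.

```python
import numpy as np


def bce(p, y, eps=1e-12):
    """Element-wise binary cross-entropy ℓ(p, y) for predicted probabilities p and label y in {0, 1}."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def pu_risk(p_pos, p_unl, prior):
    """Empirical PU risk from classifier outputs f(z).

    p_pos : f(z) for the labeled positive set P
    p_unl : f(z) for the unlabeled set U
    prior : π_p, the (estimated) fraction of positives in U
    """
    risk_pos = prior * bce(p_pos, 1).mean()
    # Negative-class risk: treat U as negative, then remove the positives hidden in U
    risk_neg = bce(p_unl, 0).mean() - prior * bce(p_pos, 0).mean()
    return float(risk_pos + risk_neg)


# Sanity check: if U contains exactly the 2 positives of P plus 3 negatives (π_p = 0.4),
# the PU estimate equals the fully supervised risk 0.4·E_P[ℓ(f,1)] + 0.6·E_N[ℓ(f,0)].
p_pos = np.array([0.9, 0.8])
p_neg = np.array([0.2, 0.1, 0.3])
p_unl = np.concatenate([p_pos, p_neg])
supervised = 0.4 * bce(p_pos, 1).mean() + 0.6 * bce(p_neg, 0).mean()
```

The sanity check shows why the correction term is needed: without it, the positives contained in U would be penalized as if they were negatives.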
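The total loss function combines the PU risk and the reconstruction loss:
$$ \mathcal{L}_{\text{total}} = \mathcal{L}_{PU} + \lambda\, \mathcal{L}_{\text{reconstruction}} $$
where L_PU is derived from the risk estimate and λ is a regularization parameter that balances reconstruction and classification.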
The encoder–decoder is trained to minimize L_reconstruction, ensuring that Z captures meaningful features, and the PU classifier takes Z as input and is trained to minimize L_PU using the decomposed risk function. Both components are then fine-tuned jointly to minimize the total loss. The output is an initial fault diagnosis and anomaly estimate, obtained even though the data are nonlinear and high-dimensional and labeled samples are limited.
Reinforcement Learning (RL) is used to refine the model’s decision-making process and to optimize its performance before applying data augmentation. By integrating RL after PU learning, the system learns to make optimal decisions based on feedback from its environment, improving its ability to classify or predict outcomes effectively.
In the RL framework, the model operates in an environment characterized by the state s_t, the action a_t, the reward r_t, and the policy π(a|s). The goal is to learn a policy π(a|s) that maximizes the cumulative reward over time.
The state contains all relevant information: the latent representation Z_t derived from the PU learning output, the load L_t of the elevator, and the direction indicator D_t of ascent or descent:
$$ s_t = \left[\, Z_t,\ L_t,\ D_t \,\right] $$
where L_t is the normalized load as a percentage of the elevator’s maximum capacity and D_t is the movement direction indicator, modeled as a binary variable.
Actions a_t are the decisions made by the RL agent, including adjusting decision thresholds according to the normalized load and movement conditions, prioritizing fault diagnosis, and responding to increased loads or changes in operating conditions.
The reward r_t is designed to encourage behavior that improves classification accuracy, reduces misclassification, and enhances robustness, while taking load and direction conditions into account. It is formed as a weighted combination of these terms, where a, b, and c are the weighting factors.
The policy π(a|s) is modeled by a neural network with parameters θ_π:
$$ \pi(a \mid s; \theta_\pi) = \mathrm{softmax}(W_\pi s + b_\pi) $$
where W_π and b_π are trainable weights and biases, and the softmax ensures that the output is a valid probability distribution over actions.
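The objective in RL is to maximize the expected cumulative reward J(θ_π):
$$ J(\theta_\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right] $$
where γ ∈ [0, 1) is the discount factor, which weights immediate rewards more heavily than distant ones.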
The policy is optimized to maximize the expected reward J(θ_π) using gradient ascent.
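The value function V^π(s) estimates the expected return starting from state s under the policy π:
$$ V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\middle|\, s_t = s\right] $$
The advantage function A^π(s, a) quantifies the benefit of taking action a in state s compared with the average performance under π:
$$ A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s) $$
where Q^π(s, a) is the action-value function.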
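For a finite episode, the discounted return and a Monte Carlo advantage estimate can be computed as below. Using the observed return G_t as a sample of Q^π(s_t, a_t), so that A_t ≈ G_t − V(s_t), is one common estimator chosen here for illustration; the text does not specify which advantage estimator is used.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """G_t = Σ_k γ^k r_{t+k}, accumulated backwards over one finite episode."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G


def advantages(rewards, values, gamma=0.99):
    """Monte Carlo advantage estimate A_t = G_t − V(s_t)."""
    return discounted_returns(rewards, gamma) - np.asarray(values, dtype=float)


G = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)   # -> [1.75, 1.5, 1.0]
```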
The training process begins by initializing the policy network π(a|s; θ_π) with random weights. The state s_t is defined using the latent space Z of the encoder–decoder derived from PU learning. During exploration, actions a_t are sampled from the policy π(a|s; θ_π) to interact with the environment and collect rewards r_t. The policy is updated by computing the gradient of the policy objective using the policy gradient theorem:
$$ \nabla_{\theta_\pi} J(\theta_\pi) = \mathbb{E}_{\pi}\left[\nabla_{\theta_\pi} \log \pi(a_t \mid s_t; \theta_\pi)\, A^{\pi}(s_t, a_t)\right] $$
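Additionally, a value network V(s; θ_V) is trained to approximate the value function V^π(s) by minimizing the loss function
$$ \mathcal{L}_{\text{value}} = \mathbb{E}\left[\left(G_t - V(s_t; \theta_V)\right)^{2}\right], \qquad G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} $$
where G_t is the observed discounted return. Optimization is carried out by gradient ascent on J(θ_π) to update the policy network and by gradient descent on L_value to update the parameters of the value network.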
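The training loop described above (a softmax policy, sampled actions, policy-gradient ascent weighted by the advantage, and a value function fitted by gradient descent) is sketched below as REINFORCE with a learned baseline, a simple actor-critic variant. The linear softmax policy π(a|s) = softmax(W s + b), the linear value function, the learning rates, and the per-transition updates are simplifying assumptions; the text uses neural networks for both, and in the elevator setting the state vector would hold the latent representation, load, and direction.

```python
import numpy as np


def softmax(u):
    e = np.exp(u - u.max())          # shift for numerical stability
    return e / e.sum()


class LinearActorCritic:
    """Policy π(a|s) = softmax(W s + b) with a linear value baseline V(s) = v·s + c."""

    def __init__(self, state_dim, n_actions, lr_pi=0.05, lr_v=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_actions, state_dim))  # random init
        self.b = np.zeros(n_actions)
        self.v = np.zeros(state_dim)
        self.c = 0.0
        self.lr_pi, self.lr_v = lr_pi, lr_v

    def policy(self, s):
        return softmax(self.W @ s + self.b)

    def act(self, s):
        """Sample an action from π(·|s) (exploration)."""
        return int(self.rng.choice(len(self.b), p=self.policy(s)))

    def update(self, states, actions, returns):
        for s, a, G in zip(states, actions, returns):
            A = G - (self.v @ s + self.c)       # advantage estimate G_t − V(s_t)
            grad_logp = -self.policy(s)
            grad_logp[a] += 1.0                 # ∇_u log softmax(u)_a = e_a − π
            # Gradient ascent on J: step along ∇ log π(a|s) weighted by the advantage
            self.W += self.lr_pi * A * np.outer(grad_logp, s)
            self.b += self.lr_pi * A * grad_logp
            # Gradient descent on (G − V)² (the factor 2 is absorbed into lr_v)
            self.v += self.lr_v * A * s
            self.c += self.lr_v * A
```

On a toy one-state problem where one action always yields reward 1 and the other 0, repeated `act` and `update` calls shift almost all probability onto the rewarded action; the baseline reduces the variance of the updates without changing their expected direction.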
Data augmentation using Generative Adversarial Networks (GANs) involves two main components: a generator and a discriminator. The generator creates synthetic data, while the discriminator evaluates the authenticity of the data by distinguishing between real and synthetic samples.
The generator G takes random noise z ~ 𝒩(0, 1) sampled from the latent space as input and maps it to the data space X_aug, producing synthetic samples x′ = G(z). The generator’s objective is to create realistic data that can fool the discriminator. The discriminator D receives both real data x and synthetic data x′ as input and outputs a probability, D(x) or D(x′), indicating whether the data is real (close to 1) or synthetic (close to 0).
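The generator and discriminator therefore have competing objectives. The discriminator aims to maximize the GAN loss function
$$ \mathcal{L}_D = \mathbb{E}_{x}\left[\log D(x)\right] + \mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right] $$
by correctly classifying real and synthetic data, while the generator aims to minimize
$$ \mathcal{L}_G = \mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right] $$
by producing data that the discriminator classifies as real.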
The training loop for GANs involves iterative updates of the generator and discriminator to achieve adversarial optimization. In each iteration, the discriminator is updated to maximize its ability to distinguish between real and synthetic data, adjusting its parameters θ_D by gradient ascent with learning rate η_D:
$$ \theta_D \leftarrow \theta_D + \eta_D \nabla_{\theta_D} \mathcal{L}_D $$
Conversely, the generator is updated to minimize the likelihood that the discriminator correctly identifies synthetic samples, adjusting its parameters θ_G by gradient descent with learning rate η_G:
$$ \theta_G \leftarrow \theta_G - \eta_G \nabla_{\theta_G} \mathcal{L}_G $$
Alternating the updates of θ_D and θ_G maintains balance, so that neither the generator nor the discriminator becomes too dominant.
This dynamic interaction drives the generator to produce increasingly realistic synthetic data while forcing the discriminator to refine its classification performance. GAN training reaches convergence when the discriminator’s output approaches 0.5 for all samples, i.e., when it can no longer tell real data from synthetic data.
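The two objectives can be evaluated from the discriminator outputs on a batch, which makes the convergence criterion concrete: when D outputs 0.5 everywhere, L_D = log 0.5 + log 0.5 = −log 4. This NumPy sketch computes only the loss values; the generator and discriminator networks and their gradient updates are outside its scope. Many practical implementations replace the generator objective with the non-saturating form, maximizing log D(G(z)), which the text does not mention.

```python
import numpy as np


def gan_losses(d_real, d_fake, eps=1e-12):
    """Minimax GAN objectives from discriminator outputs.

    d_real : D(x) on a batch of real samples
    d_fake : D(G(z)) on a batch of synthetic samples
    Returns (L_D, L_G): the discriminator maximizes L_D, the generator minimizes L_G.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1 - eps)
    L_D = np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))
    L_G = np.mean(np.log(1 - d_fake))
    return float(L_D), float(L_G)
```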
The augmented dataset is represented as (Xaug, Yaug), where Xaug = {x1, x2, …, xn}; each xi is a feature vector, and Yaug = {y1, y2, …, yn} contains the corresponding labels. The dataset is split into training and validation subsets, (Xtrain, Ytrain) and (Xval, Yval), and is used to train multiple predictive models, including Random Forest (RF), Support Vector Machine (SVM), and Deep Neural Networks (DNNs).
A Random Forest consists of T decision trees trained on different bootstrap samples of the training data. For an input x_i, each tree h_t outputs a class label ŷ_t = h_t(x_i). The final prediction is obtained by majority voting:
$$ \hat{y} = \operatorname{mode}\{h_1(x_i), h_2(x_i), \ldots, h_T(x_i)\} $$
During training, each tree selects the splits that minimize the entropy of the resulting nodes:
$$ H = -\sum_{c} p_c \log_2 p_c $$
where p_c is the proportion of samples of class c in the node, and c ranges over the distinct classes (labels) of the classification problem. Entropy quantifies node impurity, that is, how mixed the classes are: a lower entropy indicates a purer split, in which the samples predominantly belong to a single class.
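The two Random Forest ingredients, node entropy used to score splits and majority voting over trees, are sketched below with NumPy and the standard library. Weighting the child entropies by their sizes to score a split is the usual convention and is an assumption here; growing the trees and bootstrap sampling are omitted.

```python
from collections import Counter

import numpy as np


def node_entropy(labels):
    """H = -Σ_c p_c log2 p_c for the class labels in one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def split_entropy(left, right):
    """Size-weighted entropy of the two children of a candidate split (lower = purer)."""
    n = len(left) + len(right)
    return len(left) / n * node_entropy(left) + len(right) / n * node_entropy(right)


def majority_vote(tree_predictions):
    """Forest output: the most frequent label among the T tree predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]
```

For example, a split that separates [0, 0] from [1, 1] scores 0 bits, while one that leaves [0, 1] on both sides scores 1 bit, so the first split is preferred.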
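The SVM seeks the hyperplane w^T x + b = 0 that separates the classes with the maximum margin. The optimization problem is formulated as:
$$ \min_{w,\, b,\, \xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.}\quad y_i\left(w^{\top} x_i + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0 $$
where C is a regularization parameter and ξ_i are slack variables that allow some margin violations.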
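For data that are not linearly separable, a kernel function K implicitly maps the data into a higher-dimensional space, where the classes can be separated:
$$ K(x_i, x_j) = \phi(x_i)^{\top} \phi(x_j) $$
For a new input x, the decision function is:
$$ f(x) = \operatorname{sign}\left(\sum_{i} \alpha_i y_i K(x_i, x) + b\right) $$
where α_i are the support vector coefficients.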
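Evaluating the kernel decision function is sketched below. The text does not name the kernel; the Gaussian (RBF) kernel K(a, b) = exp(−γ‖a − b‖²) is used here as an example, and the support vectors, coefficients, and bias in the test are hypothetical values rather than the output of training. The kernel width γ is unrelated to the RL discount factor.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Example kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def svm_decision(x, support_vectors, sv_labels, alphas, bias, gamma=1.0):
    """f(x) = sign(Σ_i α_i y_i K(x_i, x) + b), with sign(0) taken as +1."""
    score = bias + sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, y, a in zip(support_vectors, sv_labels, alphas)
    )
    return 1 if score >= 0 else -1
```

Only the support vectors (samples with α_i > 0) enter the sum, so prediction cost grows with the number of support vectors rather than with the size of the training set.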
A DNN consists of L layers, including input, hidden, and output layers. The output of each layer l is computed as:
$$ h^{(l)} = \mathrm{ReLU}\left(W^{(l)} h^{(l-1)} + b^{(l)}\right) $$
where W^{(l)} and b^{(l)} are the weight matrix and bias vector, and ReLU(x) = max(0, x) is the activation function.
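For classification, the final layer applies a softmax activation:
$$ \hat{y}_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}} $$
where z_i is the raw output (logit) for class i. The network is trained by minimizing the cross-entropy loss:
$$ \mathcal{L}_{CE} = -\sum_{i} y_i \log \hat{y}_i $$
where y_i is the true label and ŷ_i is the predicted probability.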
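The layer recursion, softmax output, and cross-entropy loss can be written in NumPy as below. The sketch uses the row-vector convention h @ W (equivalent to W h in the equations, with transposed weights), subtracts the largest logit before exponentiation for numerical stability, and averages the loss over samples; training with Adam is not shown.

```python
import numpy as np


def relu(a):
    return np.maximum(0.0, a)


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)    # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(x, layers):
    """layers = [(W1, b1), ..., (WL, bL)]; ReLU on hidden layers, softmax on the output."""
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)
    W, b = layers[-1]
    return softmax(h @ W + b)


def cross_entropy(y_onehot, y_prob, eps=1e-12):
    """Mean over samples of -Σ_i y_i log(ŷ_i)."""
    return float(-np.mean(np.sum(y_onehot * np.log(y_prob + eps), axis=-1)))
```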
The DNN parameters are optimized with Adaptive Moment Estimation (Adam), which is suitable for a wide range of machine learning problems, including large-scale and sparse datasets.
The integration of PU learning, GANs, and RL forms a collaborative mechanism to handle practical elevator conditions. PU learning enables classification from positive and unlabeled data, GANs generate synthetic rare-fault examples, and RL dynamically adjusts thresholds based on load and direction. Feature extraction techniques include RMS, THD, kurtosis, FFT, harmonic amplitudes, crest factor, and skewness. The fault types identified include winding asymmetries, load imbalance, and mild mechanical misalignment.
3.4. Model Evaluation and Database Update
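Randomly sampled data is preprocessed to ensure consistency and compatibility with each trained model. Each trained model is then evaluated on the preprocessed test data, and its predictions are compared with the true labels using multiple metrics:
$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad F_1 = 2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}} $$
where TP and TN are the true positives and true negatives, and FP and FN are the false positives and false negatives.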
The F1-score is a performance metric for classification models that represents the harmonic mean of Precision and Recall. It is particularly useful in cases of imbalanced datasets, where accuracy alone may not provide an accurate assessment of the model’s effectiveness. The F1-score balances false positives and false negatives, ensuring a trade-off between Precision (the proportion of true positive predictions among all predicted positives) and Recall (the proportion of true positive predictions among all actual positives). A high F1-score indicates a well-performing model in terms of both Precision and Recall.
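The Receiver Operating Characteristic Area Under the Curve (ROC-AUC) score evaluates the model’s ability to distinguish between classes:
$$ \text{AUC} = \int_{0}^{1} TPR\; d(FPR), \qquad TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{FP + TN} $$
where TPR is the true positive rate and FPR is the false positive rate.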
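The metrics can be computed from binary labels and predictions as below. ROC-AUC is obtained through its rank interpretation, the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties counted as one half), which equals the area under the ROC curve; library routines such as scikit-learn’s `roc_auc_score` compute the same quantity.

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall, and F1 for binary labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))
```

Because AUC depends only on the ranking of the scores, it is insensitive to the decision threshold, whereas Precision, Recall, and F1 change when the threshold moves.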
For graphical analysis, visual tools such as the confusion matrix, ROC curve, Precision–Recall curve, and heatmaps are used to enhance interpretability and provide insights into the model’s performance.
The Pareto front is used for multi-objective optimization, balancing multiple performance metrics (e.g., Precision vs. Recall). A set of solutions O = {o_1, …, o_m} represents the candidate solutions of the multi-objective optimization process. A solution o_i is Pareto-optimal if no other solution improves one objective without degrading another. Mathematically, for objectives F_1, …, F_M that are all maximized:
$$ o_i \text{ is Pareto-optimal} \iff \nexists\, o_j \in O:\ F_m(o_j) \ge F_m(o_i)\ \forall m \ \text{ and } \ F_m(o_j) > F_m(o_i) \text{ for some } m $$
Scatter plots of the performance metrics are used to identify the Pareto front.
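A direct implementation of the dominance test, assuming every objective is to be maximized (e.g., Precision and Recall), is sketched below; its O(n²) pairwise check is adequate for a modest number of candidate solutions. The scores in the example are hypothetical.

```python
import numpy as np


def pareto_front(scores):
    """Indices of non-dominated rows of scores (n_solutions x n_objectives), all maximized."""
    scores = np.asarray(scores, dtype=float)
    front = []
    for i, s in enumerate(scores):
        # Row j dominates s if it is >= s in every objective and > s in at least one
        dominated = np.any(np.all(scores >= s, axis=1) & np.any(scores > s, axis=1))
        if not dominated:
            front.append(i)
    return front


scores = [[0.9, 0.6], [0.8, 0.8], [0.7, 0.7], [0.6, 0.9]]   # hypothetical (Precision, Recall)
front = pareto_front(scores)   # -> [0, 1, 3]; (0.7, 0.7) is dominated by (0.8, 0.8)
```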
Using the computed metrics and visual analysis, predictions of malfunctions are made. The insights from graphical analysis, metrics, and Pareto optimization are integrated into a decision-support system. Predicted errors or anomalies are stored in a knowledge base for further evaluation or feedback.
Figure 3 presents a detailed flowchart of the proposed methodology, which consists of the following stages: data acquisition (from the elevator system), signal preprocessing (Gaussian Filtering, EKF), data transmission, data processing, feature extraction, data balancing and pattern analysis, PU learning, Reinforcement Learning (RL), data augmentation (GAN-based), model training (RF, SVM, DNN), model evaluation, alarm generation, and database update.
The flowchart illustrates a comprehensive methodology for monitoring, analyzing, and maintaining an elevator system using advanced machine learning techniques and data-driven processes. The data originates from sensors attached to the elevator system, which capture raw signals such as vibration, current, and temperature. These signals are preprocessed with the Gaussian Filter and the EKF to remove noise and enhance quality before being saved locally. The preprocessed data is transmitted over MQTT and HTTPS secured with TLS, ensuring encrypted and reliable communication. The data packets are formatted with timestamps and sensor IDs, enabling traceability and efficient cloud monitoring, storage, and access.
Once transmitted, the data undergoes further processing, integrating historical and real-time information from the database. Feature extraction techniques such as RMS, THD, STFT, and statistical measures like kurtosis and skewness are applied to derive meaningful patterns. Balancing the data ensures that all conditions, including elevator movement during ascent and descent, are adequately represented, and patterns are analyzed to form representative sample vectors for further learning stages.
The methodology includes generating alarms to monitor system conditions in real-time. Thresholds are predefined based on system specifications and operational requirements. Alerts are categorized into normal, marginal, and critical levels, and dispatched to relevant personnel through an automated system. Alerts are archived for future reference, creating a condition monitoring log that supports effective decision-making and rapid responses to potential malfunctions.
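Mapping a monitored indicator to the three alert levels can be sketched as below. The threshold values, the assumption that larger values indicate a worse condition, and the example indicator are hypothetical; the text states only that thresholds are predefined from system specifications and operational requirements.

```python
from enum import Enum


class AlertLevel(Enum):
    NORMAL = "normal"
    MARGINAL = "marginal"
    CRITICAL = "critical"


def classify_alert(value: float, marginal: float, critical: float) -> AlertLevel:
    """Assign an alert level, assuming larger indicator values indicate a worse condition."""
    if marginal > critical:
        raise ValueError("marginal threshold must not exceed critical threshold")
    if value >= critical:
        return AlertLevel.CRITICAL
    if value >= marginal:
        return AlertLevel.MARGINAL
    return AlertLevel.NORMAL


# Hypothetical thresholds for a vibration RMS indicator (values illustrative only)
level = classify_alert(1.4, marginal=1.0, critical=2.0)   # AlertLevel.MARGINAL
```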
Finally, the database is continually updated through secure connections, allowing new data to be inserted while maintaining existing records. Labels are improved through iterative learning, and historical data is archived for long-term analysis. The updated database facilitates the generation of detailed reports, supports decision-making processes, and integrates preventive maintenance scheduling based on statistical insights. This ensures a robust and adaptive framework for the monitoring, analysis, and maintenance of the elevator system, enhancing its operational efficiency and safety.
This system operates independently of the elevator’s control panel, functioning as a non-intrusive monitoring and diagnostic tool. It does not interfere with the elevator’s operational processes or its built-in safety mechanisms, ensuring compliance with all regulatory standards and preserving the integrity of the elevator’s original design. By collecting and analyzing data from external sensors, the system provides valuable insights without altering the core functionality or safety features of the elevator.
The proposed system presents numerous advantages. First, it reduces maintenance costs by enabling predictive and condition-based maintenance strategies. By identifying potential issues before they escalate, unnecessary repairs are minimized, and service schedules can be optimized. The second advantage is the minimization of downtime by proactively addressing faults, ensuring the elevator remains operational and reducing inconvenience for users. The third advantage is that the system enhances the overall safety and reliability of the elevator by continuously monitoring critical parameters and generating real-time alerts when thresholds are exceeded. Additionally, it extends the lifespan of the equipment by ensuring timely interventions and preventing excessive wear and tear.
The proposed system introduces several innovations. It leverages advanced machine learning techniques, including PU learning, Reinforcement Learning, and GAN-based data augmentation, to improve fault detection and classification accuracy. The use of encrypted communication protocols ensures secure data transmission, while the integration of cloud monitoring provides scalability and remote access to data. Furthermore, the system’s ability to adapt to various operating conditions, such as different load profiles during elevator ascent and descent, showcases its robustness and versatility. By incorporating a knowledge base for error analysis and decision support, the system also facilitates informed decision-making, enhancing maintenance efficiency and reducing human error. These innovative features make the system a cutting-edge solution for modern elevator monitoring and diagnostics.