Elevator Fault Diagnosis Based on a Graph Attention Recurrent Network

Haokun Wu; Li Yin; Yufeng Chen; Zhiwu Li; Qiwei Tang

doi:10.3390/electronics14112308

¹ Department of Engineering Science, Faculty of Innovation Engineering, Macau University of Science and Technology, Taipa, Macau SAR, China

² Hitachi Building Technology (Guangzhou) Co., Ltd., Guangzhou 510663, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(11), 2308; https://doi.org/10.3390/electronics14112308

This article belongs to the Section Artificial Intelligence

Abstract

Elevator fault diagnosis is critical for ensuring operational safety and reliability in modern vertical transportation systems. Traditional approaches, which rely on time- and frequency-domain signal analysis, often struggle with issues such as noise sensitivity, inadequate feature extraction, and limited adaptability to complex scenarios. To address these challenges, this paper proposes a Graph Attention Recurrent Network (GARN), which integrates graph-structured signal representation with spatiotemporal feature learning. The GARN employs a limited penetrable visibility graph to transform raw vibration signals into noise-robust graph topologies, preserving critical patterns while suppressing high-frequency noise through controlled edge penetration. An adaptive attention mechanism dynamically fuses triaxial features to prioritize the most relevant information for fault diagnosis. The GARN combines a graph convolutional network to extract spatial correlations and a gated recurrent unit to capture temporal fault progression, enabling holistic and accurate fault classification. Experimental results based on real-world elevator datasets demonstrate the superior performance of the GARN, showcasing its strong noise resistance, adaptability to complex fault conditions, and ability to provide reliable and timely fault diagnosis, making it a robust solution for modern elevator systems.

Keywords: elevator; fault diagnosis; vibration; graph convolutional network; gated recurrent unit

1. Introduction

Elevators are essential to the infrastructure of smart buildings and play a crucial role in urban transportation, moving millions of people daily. The reliability and safety of elevators directly impact the quality of life in cities. Enhancing elevator safety is a primary concern, as failures in critical components can lead to serious accidents, endangering both passengers and maintenance personnel. With the increasing number of elevators, ensuring their safety and reliability becomes more challenging.

Traditional maintenance paradigms for elevator systems, including time-based maintenance and corrective maintenance strategies, remain predicated upon standardized schedules and human-dependent inspection protocols. Time-based maintenance assesses components at fixed intervals, ignoring operational degradation. Corrective maintenance protocols exhibit greater passivity, initiating interventions exclusively post-failure, with consequent operational disruptions and elevated repair expenditures. Both methodologies necessitate substantial technician deployment, demonstrating inherent inefficiencies in labor utilization and temporal requirements while proving inadequate for addressing the heterogeneous operational demands characteristic of contemporary vertical transportation infrastructure.

In contrast, the condition-based maintenance strategy emphasizes the continuous monitoring of vibration in components exhibiting signs of wear or damage. For elevators, condition-based maintenance primarily relies on the analysis of vibration signals in both the time and frequency domains. These approaches integrate advanced signal processing techniques with machine learning models, such as multilayer perceptrons, support vector machines, and Bayesian estimation, to identify faults characterized by clear and distinguishable patterns. Recent developments in adaptive control frameworks, as demonstrated by Zhang et al. [1] through their siphon-based supervisory policy for unreliable-resource management in automated manufacturing systems, suggest potential extensions for maintenance strategies. Their methodology dynamically adjusts system modes to accommodate component failures while preserving original operational behavior, a concept that could inform adaptive maintenance policy design for elevator systems with heterogeneous component degradation patterns. For example, wavelet denoising proves effective in removing noise from horizontal vibration signals, thereby revealing irregularities in the guide rail that compromise ride comfort. The denoised signals, with their enhanced features, enable precise localization of guide rail faults [2]. Similarly, autoregressive models utilize autoregressive coefficients derived from processed signals as inputs for support vector machines, facilitating automated fault recognition [3]. Furthermore, a bicoherence-based method is proposed for elevator fault diagnosis, leveraging autoregressive modeling to capture nonlinear couplings and mitigate the impact of Gaussian noise in vibration signals [4]. These techniques collectively enhance the accuracy and reliability of fault detection in elevator systems.

While demonstrating diagnostic competence for elementary fault patterns, extant vibration analysis methodologies exhibit three principal constraints when confronting multifactorial failure modes that are prevalent in modern elevator ecosystems. First, the dependency on manual time–frequency feature engineering introduces significant noise susceptibility, particularly in operational environments characterized by non-stationary disturbances. Although wavelet thresholding proves effective in controlled conditions, empirical studies reveal substantial performance degradation under high-noise scenarios, due to transient feature attenuation. Second, conventional machine learning architectures (e.g., support vector machines, Bayesian classifiers) demonstrate limited capacity for modeling nonlinear signal interactions and cross-scale dependencies, constraining generalizability across diverse failure modalities. Third, prevailing analytical frameworks predominantly analyze uniaxial vibration data, thereby neglecting the diagnostic potential inherent in triaxial signal correlations, which are critical for differentiating spectrally analogous but spatially distinct fault signatures. These limitations collectively underscore the necessity of advanced analytical frameworks that are capable of processing multidimensional vibration data through deep feature learning architectures.

The advent of neural networks has introduced new possibilities in vibration-based fault diagnosis. Deep autoencoders can be employed to extract meaningful features from raw sensor data, enabling effective fault classification [5]. Additionally, the wavelet packet decomposition method, combined with neural networks, has proven effective in distinguishing normal and faulty states within noisy vibration signals [6]. Deep learning algorithms revolutionize fault diagnosis with their superior feature extraction and pattern recognition capabilities. Architectures such as convolutional neural networks, autoencoders, and generative adversarial networks are used to diagnose complex mechanical faults [7,8,9,10,11]. For instance, one-dimensional convolutional neural networks can be employed to extract features and achieve high diagnostic accuracy, particularly in scenarios characterized by limited sample sizes [12]. A deep wavelet autoencoder, combined with extreme learning machines, demonstrated effectiveness in diagnosing rolling bearing faults by capturing both time-domain and frequency-domain characteristics [13].

Currently, deep learning techniques such as one-dimensional convolutional neural networks, Deep Boltzmann Machines, and transfer learning show promise in fault diagnosis, particularly for bearing failures. However, these methods face significant challenges when applied to elevator fault diagnosis. The primary obstacles include the need for large datasets, which are often unavailable in elevator systems, leading to insufficient training data. Additionally, issues such as poor feature interpretability and unstable performance in small-sample scenarios hinder their effectiveness. As a result, these methods struggle to handle the complexities and diversity of elevator fault conditions, limiting their broader applicability.

Graph neural networks have been successfully applied in various domains, including social networks, natural language processing, and fault diagnosis [14,15,16,17,18]. Recent advances in graph structure learning, such as Neural Gaussian Similarity Modeling [19], have further enhanced noise-resilient edge formation through differentiable sampling strategies, complementing domain-specific approaches like the limited penetrable visibility graph (LPVG). Graph neural networks leverage their ability to model relational data and capture structural dependencies to improve troubleshooting capabilities. Methods like gradient-based interpretable graph convolutional networks (GCNs) further improve model transparency in diagnosing bearing faults.

The integration of multi-view information, as proposed in mutual information maximization across feature and topology views [20], aligns with the GARN’s attention-driven fusion of triaxial vibration data, where directional dependencies are dynamically weighted to prioritize diagnostically critical axes. Multivariate anomaly detection with a self-learning GCN effectively captures spatial and temporal correlations within input sequences [21]. Another study uses the k-nearest neighbor method to construct graphs to establish relationships, thus providing efficient feature extraction for GCN fault classification [22]. Recent research focuses on multidimensional, multi-scale, and multi-channel data for fault diagnosis. For multi-channel graph analysis, the cross-channel information bottleneck principle (CCGIB) [23] provides a theoretical foundation for balancing consistency and complementarity in sensor fusion, a challenge directly addressed by the GARN’s adaptive attention mechanism. Multi-scale GCNs and improved multi-channel GCNs process data from multiple sensors, capturing complex fault features [24,25]. For temporal data, transforming time-series data into graph structures, such as weighted horizontal visibility graphs, proves effective in capturing both local and global signal features. The improved graph isomorphism network has shown promising results in fault classification tasks [26]. Similarly, while graph neural networks garner attention for their ability to process non-Euclidean relational data, their current application is largely focused on bearing fault diagnosis. This makes it challenging to adapt existing graph neural network methods to the specific dynamics of elevator faults. Despite the potential of graph neural networks to model relational data and capture structural dependencies, the lack of research directly addressing elevator systems means that these models struggle to effectively capture the nuances of elevator fault behaviors. Consequently, current deep learning and graph neural network-based methods are limited in their ability to address the unique complexities of elevator fault diagnosis, highlighting the need for further advancements tailored to this domain.

Elevator fault diagnosis faces several significant challenges that hinder the effectiveness and broader applicability of existing methods:

The diversity and complexity of elevator systems make it difficult for traditional methods to effectively capture the wide range of fault conditions.
A lack of labeled data, particularly for rare or complex faults, restricts the generalization ability of diagnostic models, leading to overfitting, especially when using deep neural networks.
While graph neural networks have proven effective in handling relational data, they have not been widely tested for elevator fault diagnosis. Existing graph neural network methods are primarily focused on simpler systems, such as bearing fault diagnosis, which have less complex fault dynamics.
Adapting graph neural networks to the unique complexities of elevator systems, with their dynamic operating states and diverse fault conditions, presents significant challenges.

To address these challenges, this paper proposes a Graph Attention Recurrent Network (GARN). The GARN framework introduces significant advancements and methodological innovations in addressing the challenges associated with elevator fault diagnosis. The key contributions and technical merits of the approach are outlined below:

Enhanced Noise Tolerance by Limited Penetrable Visibility Graph (LPVG) Conversion: A fundamental innovation of the GARN is the incorporation of the LPVG method, which converts raw vibration signals into graph-based representations. The LPVG algorithm is specifically designed to mitigate the impact of noise during graph construction by permitting controlled penetration thresholds. This noise-tolerant mechanism is particularly advantageous in elevator environments, where vibration signals are often contaminated by extraneous noise, ensuring the preservation of diagnostic accuracy under adverse conditions.
Spatial Feature Extraction via GCN: The GARN employs a GCN to extract spatial features from the graph-structured vibration data. The GCN architecture is uniquely suited to model the spatial dependencies and interrelationships among various nodes in the graph, thereby capturing the inherent structural patterns of the vibration signals. This capability is critical for identifying subtle fault signatures that may be distributed across different spatial regions of the signal.
Attention-Driven Multi-Axis Data Fusion: The framework incorporates an attention-based mechanism to dynamically integrate features derived from the triaxial vibration data (x, y, and z axes). This mechanism assigns adaptive weights to each axis based on its diagnostic relevance, enabling the model to prioritize the most informative directional components. By selectively emphasizing critical features, the attention mechanism enhances the model’s ability to detect faults with diverse directional characteristics, thereby improving the overall classification accuracy and diagnostic precision.
Temporal Modeling with Gated Recurrent Units (GRUs): To capture temporal dependencies within vibration signals, the GARN employs GRUs, which are well-suited for sequential data. The GRUs allow the model to track the evolution of fault signals over time, distinguishing between transient disturbances and ongoing fault conditions. This temporal modeling capability strengthens the framework’s ability to provide accurate and timely fault diagnoses in dynamic elevator systems.
Modern elevator systems are increasingly embedded within IoT ecosystems, where sensor-generated data streams necessitate real-time, adaptive fault diagnosis frameworks. The proposed GARN architecture not only addresses the limitations of traditional methods, but also aligns with the demands of software-defined IoT environments, such as QoS (Quality of Service)-aware resource allocation and edge computing efficiency. For instance, in smart buildings leveraging QoS-aware IoT protocols, the GARN’s low-latency inference (0.01 s per sample) ensures timely fault alerts, minimizing operational disruptions. Furthermore, its compatibility with streaming data architectures—such as those in software-defined vehicular networks—positions it as a versatile solution for dynamic environments requiring rapid, multi-sensor analytics. By integrating graph-based spatiotemporal learning with IoT operational paradigms, this work bridges the gap between theoretical fault diagnosis and practical deployment in next-generation smart infrastructure.

The GARN delivers a solution for elevator fault diagnosis in complex scenarios. The remainder of this paper is structured as follows: Section 2 describes the GARN in detail; Section 3 outlines the dataset used for validation; Section 4 discusses the experiments and results; and we conclude the study in Section 5.

2. The Proposed Method

To tackle the challenges of noise resilience, multi-axis data integration, and temporal feature modeling in elevator fault diagnosis, we propose a novel diagnostic framework. As shown in Figure 1, the GARN involves several key steps:

First, in the data preprocessing phase, raw vibration signals collected from the x, y, and z axes are subjected to a sliding-window technique. This technique segments the original data into sequences.
Next, these segmented sequences are transformed into graph representations using the LPVG method. This transformation captures both the temporal and structural patterns of the data and enhances noise immunity.
The GCN is then employed to learn meaningful representations from the graph-structured data by aggregating and propagating information across nodes and their neighbors. An attention-based fusion mechanism is incorporated to dynamically integrate features extracted from the three axes. This mechanism assigns adaptive weights to each axis, prioritizing the most relevant directional information for fault diagnosis.
Finally, the fused features are processed by a GRU network, which captures temporal dependencies in the vibration signals, enabling accurate and efficient fault classification.

Figure 1. Graph attention recurrent network.

The vibration time series of one axis is $X = \langle x_{1}, x_{2}, \dots, x_{N} \rangle$, where $N$ denotes the number of samples. The sliding-window method divides $X$ into windows of length $L$ with a step size of $\eta$. The number of windows $n$ is determined by $n = \frac{N}{\eta}$. The window index $s$ ranges from 1 to $n$. Each segment $X_{s} = \langle x_{s1}, x_{s2}, \dots, x_{sL} \rangle$ corresponds to a window, and each window is transformed into a graph representation $G_{s} = (V_{s}, E_{s})$ using the LPVG method, where $V_{s}$ is the set of nodes and $E_{s}$ is the set of edges between nodes.
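The segmentation step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `sliding_windows` and its arguments are hypothetical, and the sketch keeps only complete windows, so it yields ⌊(N − L)/η⌋ + 1 segments, which approaches the N/η given above when N is much larger than L.

```python
import numpy as np


def sliding_windows(x, window_len, step):
    """Split a 1-D signal into windows of `window_len` samples taken every `step` samples."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 1:
        raise ValueError("expected a one-dimensional signal")
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")
    if len(x) < window_len:
        return np.empty((0, window_len))
    # Only complete windows are kept (an assumption of this sketch).
    n = (len(x) - window_len) // step + 1
    starts = np.arange(n) * step
    # Row s (0-based) holds x[s * step : s * step + window_len].
    return x[starts[:, None] + np.arange(window_len)[None, :]]


signal = np.arange(10.0)           # toy signal, N = 10
windows = sliding_windows(signal, window_len=4, step=2)
print(windows.shape)               # (4, 4)
print(windows[1])                  # [2. 3. 4. 5.]
```

With a step smaller than the window length the windows overlap; with a step equal to the window length they tile the signal without overlap.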

2.1. Graph Construction by LPVG

The LPVG is an extension of the visibility graph (VG), designed to improve noise immunity during graph construction. A VG, derived from computational geometry, is commonly used to represent visibility relationships among points in a sequence or plane. An edge is formed between two points if there exists an unobstructed path that satisfies the visibility condition.

The LPVG addresses the limitations of traditional visibility graphs (VGs). As illustrated in Figure 2, the LPVG systematically recovers edges that standard VG methods fail to detect under noisy conditions. This improvement stems from two key modifications: (1) flexible adaptation of visibility criteria and (2) implementation of distance-based edge constraints. By tolerating minor data fluctuations while maintaining core visibility principles, the LPVG achieves an enhanced structural representation of complex systems without sacrificing graph interpretability. The methodology demonstrates particular effectiveness in analyzing noisy dynamical systems, where conventional VGs produce fragmented connectivity patterns.

Figure 2. Comparison of two graph construction methods. (a) Visibility graph. (b) Limited penetrable visibility graph.

The LPVG extends the VG with a controlled tolerance for visibility violations, permitting a connection when up to $\delta$ intermediate points fail to meet the standard visibility criterion. For any pair of nodes $x_{si}$ and $x_{sj}$ ($i < j$) within segment $X_{s}$, edge formation is governed by the following constraint:

$$\begin{matrix} x_{sk} < x_{si} + \frac{x_{sj} - x_{si}}{j - i} \cdot (k - i), \\ \forall k \in \{i + 1, \dots, j - 1\}, \\ \forall s \in \{1, 2, \dots, N - L + 1\}, \end{matrix} \tag{1}$$

where $x_{sk}$ represents the intermediate values between the candidate nodes. This criterion requires that most intermediate points maintain linear visibility, with edge creation permitted when the number of violating points, $N_{violate}$, satisfies $N_{violate} \leq \delta$. The parametric control through $\delta$ enables systematic adaptation to data irregularities while preserving essential topological relationships.

Depending on the presence or absence of an edge between each pair of points in the LPVG, each segment $X_{s}$ then has an adjacency matrix $A_{s} \in \mathbb{R}^{L \times L}$, where $A_{sij} = 1$ if an edge exists between nodes $x_{si}$ and $x_{sj}$, and $A_{sij} = 0$ otherwise.

2.2. LPVG Construction Steps

Given a raw one-dimensional vibration signal sequence $X = \{x_{1}, x_{2}, \dots, x_{N}\}$, we first apply a sliding window to segment the signal into overlapping or non-overlapping subsequences. Let $L$ denote the window length and $\eta$ denote the step size. The $s$-th segment is denoted as

$$X_{s} = \{x_{s1}, x_{s2}, \dots, x_{sL}\}, \tag{2}$$

where $x_{si} = x_{(s - 1)\eta + i}$. Each segment $X_{s}$ is transformed into a graph $G_{s} = (V_{s}, E_{s})$ via the limited penetrable visibility graph (LPVG) method.

The graph construction follows the principle of visibility between time-series data points. Each point $x_{si}$ in the window becomes a node $v_{i} \in V_{s}$. Two nodes $v_{i}$ and $v_{j}$ (with $i < j$) are connected by an edge $e_{ij} \in E_{s}$ if the majority of the intermediate points do not obstruct the direct line of sight between $x_{si}$ and $x_{sj}$. Formally, the penetrable visibility condition is given as

$$x_{sk} < x_{si} + \frac{x_{sj} - x_{si}}{j - i} (k - i), \quad \forall k \in \{i + 1, \dots, j - 1\}, \text{ with at most } \delta \text{ violations}. \tag{3}$$

This condition allows up to $\delta$ intermediate points $x_{sk}$ to violate the strict visibility criterion. If the number of violations does not exceed $\delta$, an edge is established between nodes $v_{i}$ and $v_{j}$.

After evaluating all possible pairs $(i, j)$ within each segment, we obtain a sparse adjacency matrix $A_{s} \in \mathbb{R}^{L \times L}$, where

$$A_{sij} = \begin{cases} 1, & \text{if edge } e_{ij} \text{ exists}, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$

This procedure is applied independently to each segment $X_{s}$ of the signal across all three vibration axes (x, y, z), yielding a sequence of graph-structured data. These LPVG graphs serve as robust structural encodings for the downstream graph neural network, where the limited penetrability ($\delta$) effectively suppresses transient noise while preserving the essential dynamical patterns.
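The construction steps of this subsection can be sketched as follows. This is an illustrative, unoptimized version under stated assumptions, not the authors' code: the function name `lpvg_adjacency` is hypothetical, an intermediate point counts as a violation when it lies on or above the line of sight (the negation of the strict inequality in Equation (3)), and an edge is created when at most δ points violate, following the wording of Equation (3).

```python
import numpy as np


def lpvg_adjacency(segment, delta):
    """Adjacency matrix of a limited penetrable visibility graph for one window.

    Nodes i < j are linked when at most `delta` intermediate points lie on or
    above the straight line joining (i, x_i) and (j, x_j).
    """
    x = np.asarray(segment, dtype=float)
    L = len(x)
    A = np.zeros((L, L), dtype=int)
    for i in range(L - 1):
        for j in range(i + 1, L):
            k = np.arange(i + 1, j)  # intermediate indices (empty for neighbors)
            line = x[i] + (x[j] - x[i]) / (j - i) * (k - i)
            violations = int(np.sum(x[k] >= line))
            if violations <= delta:
                A[i, j] = A[j, i] = 1  # undirected edge
    return A


segment = [1.0, 3.0, 2.0, 4.0]
vg = lpvg_adjacency(segment, delta=0)    # delta = 0 reduces to the plain VG
lpvg = lpvg_adjacency(segment, delta=1)
print(vg.sum() // 2, lpvg.sum() // 2)    # 4 6
```

In this toy window the peak at the second point blocks the first point from seeing the third and fourth; allowing one penetration restores both edges, which is the behavior described for the LPVG under noise. The double loop costs O(L³) per window in the worst case, which is acceptable for short windows but would need a faster scheme for long ones.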

2.3. Feature Extraction by GCN

The GCN extracts features from the graph structure. The GCN propagates information through graph layers by aggregating and transforming neighborhood information. A GCN is a class of neural networks specifically designed for graph-structured data. Unlike traditional convolutional neural networks, which operate on grid-like data such as images, a GCN leverages the relationships between nodes to update node features through message passing along the edges of the graph. This capability makes GCNs particularly effective for tasks such as node classification, link prediction, and graph classification.

A spatial GCN implements hierarchical learning through three core operations: localized aggregation, nonlinear feature fusion, and structural readout. The framework first propagates neighborhood features via message passing with learnable attention weights, subsequently combining transformed neighbor embeddings with ego-node representations through multilayer nonlinear transformations. Finally, graph-level pooling operators generate topology-aware embeddings by aggregating node features with permutation-invariant functions.

This architecture enables progressive encoding of local substructures and global connectivity patterns. Through iterative neighborhood aggregation, the spatial GCN demonstrates robustness to graph sparsity and noise while maintaining structural awareness, effectively balancing local detail preservation and global context integration in graph representation learning. For a graph with the node feature matrix $H^{(0)} \in \mathbb{R}^{L \times d}$, where $d$ is the feature dimension, the layer-wise propagation rule of the GCN is

$$H^{(l + 1)} = \sigma\left(\tilde{D}_{s}^{-1/2} \tilde{A}_{s} \tilde{D}_{s}^{-1/2} H^{(l)} W^{(l)}\right), \tag{5}$$

where $\tilde{A}_{s} = A_{s} + I$ is the adjacency matrix with added self-loops, with $I$ being the identity matrix; the matrix $\tilde{D}_{s}$ denotes the degree matrix of $\tilde{A}_{s}$; the node features at the $l$-th layer are represented by $H^{(l)}$, with $H^{(0)} = X_{s}$ and $l = 0, 1, \dots, L - 1$; the trainable weight matrix of the $l$-th layer is denoted by $W^{(l)}$; and the nonlinear activation function is represented as $\sigma(\cdot)$.

At each layer, the GCN aggregates information from the immediate neighbors of a node to update its feature representation. By stacking multiple GCN layers, the network progressively captures higher-order dependencies between nodes, allowing it to model increasingly complex neighborhood relationships and better capture the structural information inherent in the graph.
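A single propagation step of Equation (5) can be written directly in NumPy. The sketch below is illustrative only: the function name `gcn_layer` is hypothetical, ReLU is assumed for the activation σ (the text does not name one), and the weights are random rather than trained.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])       # add self-loops
    deg = A_tilde.sum(axis=1)              # node degrees, each >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization without forming the diagonal matrices.
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)  # ReLU is an assumption


# Toy graph: a path over three nodes, one input feature per node.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H0 = np.array([[0.2], [-0.1], [0.4]])      # window samples as node features
rng = np.random.default_rng(0)
W0 = rng.normal(size=(1, 4))               # untrained weights, 4 hidden units
H1 = gcn_layer(A, H0, W0)
print(H1.shape)                            # (3, 4)
```

Stacking calls to `gcn_layer` with successive weight matrices widens each node's receptive field by one hop per layer, which is the higher-order aggregation described above.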

2.4. Feature Fusion by Attention Mechanism

For elevator fault diagnosis, three-axis vibration data plays a crucial role, as each axis captures distinct vibration directions with unique characteristics. The recorded vibration signals correspond to three axes: the x-axis, y-axis, and z-axis. The x-axis is oriented perpendicular to the elevator door, the y-axis is parallel to the door, and the z-axis is perpendicular to the ground.

To effectively combine the three-axis vibration data (x-axis, y-axis, and z-axis) and capture the most relevant directional information for elevator fault diagnosis, an attention mechanism is employed. This approach dynamically assigns different weights to each axis, highlighting the most informative features for the task.

The features extracted by the GCN from the x, y, and z axes of the vibration data are denoted as $F_{sx}$, $F_{sy}$, and $F_{sz}$, respectively. An attention network computes a weight for each axis according to the importance of its features. The attention score $\alpha_{si}$ for each axis $i \in \{x, y, z\}$ is calculated by

$$\alpha_{si} = \mathrm{Softmax}(W_{si} \cdot F_{si} + b_{si}), \tag{6}$$

where $W_{si}$ is a trainable weight vector for axis $i$, and $b_{si}$ is a bias term. The operator ‘·’ denotes matrix multiplication, and the function $\mathrm{Softmax}(\cdot)$ ensures that the attention scores sum to 1 across the three axes.

The calculated attention weights $\alpha_{sx}$, $\alpha_{sy}$, and $\alpha_{sz}$ indicate the relative importance of features from the x, y, and z axes, respectively. The fused feature representation $F_{\mathrm{fusion},s}$ is computed as a weighted sum of the features from all three axes:

$$F_{\mathrm{fusion},s} = \alpha_{sx} \cdot F_{sx} + \alpha_{sy} \cdot F_{sy} + \alpha_{sz} \cdot F_{sz}, \tag{7}$$

where the symbol · denotes element-wise multiplication.

The fusion process employs axis-specific feature weighting to selectively amplify diagnostically relevant patterns in triaxial vibration data. By analyzing directional dependencies across sensor axes through trainable attention layers, the system automatically calibrates inter-axis contributions during feature integration. This adaptive weighting strategy produces composite representations, $F_{\mathrm{fusion},s}$, that maintain discriminative three-dimensional characteristics while suppressing redundant information, thereby preserving critical fault signatures that are essential for accurate classification.
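The attention-based fusion can be illustrated with a small NumPy sketch. Several details here are assumptions, since the text does not fix them: each axis feature is taken to be a single d-dimensional vector (for example, pooled GCN node features), each axis has its own weight vector and scalar bias producing one score, and the softmax runs over the three scores. The function name `fuse_axes` is hypothetical.

```python
import numpy as np


def fuse_axes(features, weights, biases):
    """Attention-weighted fusion of per-axis feature vectors.

    features: (3, d) array, one row per axis (x, y, z)
    weights:  (3, d) array, one weight vector per axis
    biases:   (3,) array, one scalar bias per axis
    Returns the attention weights (3,) and the fused vector (d,).
    """
    # One scalar score per axis: w_i . F_i + b_i
    scores = np.einsum("ad,ad->a", weights, features) + biases
    scores = scores - scores.max()          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    fused = alpha @ features                # weighted sum over the axes
    return alpha, fused


F = np.array([[1.0, 2.0],                   # x-axis features
              [3.0, 4.0],                   # y-axis features
              [5.0, 6.0]])                  # z-axis features
alpha, fused = fuse_axes(F, np.zeros((3, 2)), np.zeros(3))
print(alpha)                                # equal weights of 1/3 each
print(fused)                                # [3. 4.]
```

With zero weights and biases all axes receive equal attention and the fusion reduces to a plain average; training the weights lets the scores, and hence the contribution of each axis, depend on the features themselves.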

2.5. Fault Diagnosis by GRUs

Following feature fusion, the temporal pattern analysis module employs GRUs to model sequential dependencies in elevator operational states. As a streamlined alternative to conventional LSTM architectures, GRUs maintain temporal modeling capabilities while optimizing computational efficiency through architectural simplification. The network’s two gates, the update gate ($z_{s}$) and the reset gate ($r_{s}$), implement dynamic information filtering by adaptively retaining critical historical states while discarding transient noise patterns.

The update gate controls how much of the previous hidden state $h_{s-1}$ is retained in the current hidden state $h_{s}$, and is calculated by

$$z_{s} = \sigma(W_{z} \cdot F_{\mathrm{fusion},s} + U_{z} \cdot h_{s-1} + b_{z}), \tag{8}$$

where $\sigma(\cdot)$ is the sigmoid activation function, $W_{z}$ and $U_{z}$ are the weight matrices applied to the current input $F_{\mathrm{fusion},s}$ and the previous hidden state $h_{s-1}$, respectively, and $b_{z}$ is the bias term. The reset gate determines how much of the previous hidden state is ignored when calculating the candidate hidden state $\tilde{h}_{s}$, and is computed by

$$r_{s} = \sigma(W_{r} \cdot F_{\mathrm{fusion},s} + U_{r} \cdot h_{s-1} + b_{r}), \tag{9}$$

where $W_{r}$ and $U_{r}$ are the weight matrices of the reset gate and $b_{r}$ is the bias term. The candidate hidden state $\tilde{h}_{s}$ is then computed from the current input and the reset-gated previous hidden state:

$$\tilde{h}_{s} = \tanh(W_{h} \cdot F_{\mathrm{fusion},s} + U_{h} \cdot (r_{s} \odot h_{s-1}) + b_{h}), \tag{10}$$

where $\tanh(\cdot)$ is the hyperbolic tangent activation function, $\odot$ denotes element-wise multiplication, $W_{h}$ and $U_{h}$ are the weight matrices, and $b_{h}$ is the bias term.

The final hidden state $h_{s}$ is computed by combining the previous hidden state and the candidate hidden state under the control of the update gate, according to the following equation:

$$h_{s} = z_{s} \odot h_{s-1} + (1 - z_{s}) \odot \tilde{h}_{s}. \tag{11}$$

This hidden state $h_{s}$ contains information from both the previous and current time steps, allowing the GRU network to model the temporal dependencies in the vibration data. After processing the fused features through the GRU network, the final hidden state $h_{s}$ is passed through a fully connected layer to obtain the final fault diagnosis output. The output $\hat{y}_{s}$ is computed as

$$\hat{y}_{s} = \sigma(W_{o} \cdot h_{s} + b_{o}), \tag{12}$$

where $W_{o}$ is the output weight matrix and $b_{o}$ is the output bias term.
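Equations (8) to (12) translate almost line by line into NumPy. The sketch below is a hedged illustration with untrained random parameters, not the authors' model: the helper names `gru_step` and `init_params`, the feature and hidden sizes, and the choice of six outputs (normal plus the five fault types of the dataset) are assumptions made for the example.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(f, h_prev, p):
    """One GRU update following Equations (8) to (11)."""
    z = sigmoid(p["Wz"] @ f + p["Uz"] @ h_prev + p["bz"])              # update gate
    r = sigmoid(p["Wr"] @ f + p["Ur"] @ h_prev + p["br"])              # reset gate
    h_tilde = np.tanh(p["Wh"] @ f + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate
    return z * h_prev + (1.0 - z) * h_tilde                            # new hidden state


def init_params(d_in, d_h, rng):
    """Small random weights and zero biases for the three GRU blocks."""
    p = {}
    for g in "zrh":
        p["W" + g] = rng.normal(scale=0.1, size=(d_h, d_in))
        p["U" + g] = rng.normal(scale=0.1, size=(d_h, d_h))
        p["b" + g] = np.zeros(d_h)
    return p


rng = np.random.default_rng(0)
d_in, d_h, n_classes = 8, 16, 6            # illustrative sizes
params = init_params(d_in, d_h, rng)
W_o = rng.normal(scale=0.1, size=(n_classes, d_h))
b_o = np.zeros(n_classes)

h = np.zeros(d_h)
for f in rng.normal(size=(5, d_in)):       # five fused feature vectors in sequence
    h = gru_step(f, h, params)
y_hat = sigmoid(W_o @ h + b_o)             # output layer of Equation (12)
print(y_hat.shape)                         # (6,)
```

Because the update gate multiplies the previous state in Equation (11), values of z close to 1 carry the earlier hidden state forward almost unchanged, while values close to 0 replace it with the candidate state.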

The three-axis vibration signals are first segmented into sequences using a sliding window technique. This step allows for the detailed analysis of manageable data segments. These segmented sequences are then transformed into graph representations using the LPVG method. The resulting graph representations are analyzed by the GCN, which aggregates and propagates information across the nodes to extract meaningful features. An attention mechanism is applied to dynamically fuse the features from all three axes, assigning adaptive weights to emphasize the most relevant directional information for fault diagnosis. The fused features are subsequently processed by a GRU network, which captures the temporal dependencies in the vibration signals. Finally, the GRU network outputs the type of fault, enabling the diagnosis of elevator faults.

3. Dataset Description

We simulated five distinct elevator faults on a six-floor experimental elevator. These faults included unbalanced loads, uneven rail joints, uneven rope tension, surface wear of the guide shoe, and an uneven surface in the car pulley rope groove.

Unbalanced Loads (UL)
To simulate the unbalanced loads condition, we positioned the weights on the back side of the elevator car, as shown in Figure 3a.
Uneven Rail Joints (URJ)
The uneven rail joints fault was simulated by applying tape to the rail joint, as illustrated in Figure 3b. The red mark in the figure indicates the location of the tape.
Uneven Rope Tension (URT)
Uneven rope tension was simulated by adjusting the head springs, as depicted in Figure 3c. The image shows that the two head springs expanded more than the other springs, creating an imbalance.
Surface Wear of the Guide Shoe (SWGS)
As the guide shoe experiences wear, the gap between the guide shoe and the rail increases. To simulate this fault, the distance between the guide shoe and the rail was adjusted, as shown in Figure 3d.
Uneven Surface in the Car Pulley Rope Groove (USCP)
The uneven surface in the car pulley rope groove was simulated by attaching tape to the surface of the car pulley, as shown in Figure 3e.

Figure 3. Experimental setup.

We captured elevator vibration data with a smartphone rather than a dedicated edge device. The smartphone’s built-in triaxial accelerometer operated at a sampling frequency of 500 Hz, and the phone was positioned at the center of the elevator car’s floor for data capture. The dataset included a total of 1800 samples, with 300 normal samples and 300 samples for each of the five fault types.

The PMT EVA-625-FD remains the traditional instrumentation standard for conventional elevator vibration data acquisition systems. Due to its limited onboard memory, continuous data acquisition is restricted to 700 s intervals, necessitating frequent data transfers during prolonged monitoring. Furthermore, the equipment’s bulky design significantly hinders its portability in field applications.

Modern smartphones, in contrast, demonstrate distinct advantages as alternative vibration monitoring solutions. Their compact form enables convenient deployment in confined elevator spaces, while maintaining professional-grade accuracy comparable to that of the PMT device. We conducted rigorous validation experiments comparing the smartphone accelerometer (sampling at 500 Hz) with the PMT EVA-625-FD under identical operational conditions. The smartphone was calibrated using a static gravity reference (1 g) and synchronized with the PMT device during dynamic testing to ensure temporal alignment of vibration signals.

Figure 4a shows the raw vibration waveform captured by the smartphone accelerometer (upper panel) and its FFT spectrum (lower panel). Figure 4b shows the corresponding data from the PMT EVA-625-FD device with the panel order reversed: the upper panel shows the FFT spectrum and the lower panel shows the raw vibration waveform. The dominant frequencies in the vibration signals collected by the two devices are essentially the same, which supports the feasibility of using smartphones to collect elevator vibration signals for data analysis.
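The comparison of dominant frequencies can be sketched as follows. This is a hedged illustration only: the two signals are synthetic stand-ins for the recordings, the helper name `dominant_frequency` is hypothetical, and a plain discrete Fourier transform is used so that the example needs nothing beyond the standard library. Only the 500 Hz sampling rate is taken from the text.

```python
import cmath
import math


def dominant_frequency(samples, fs):
    """Return the frequency in Hz of the strongest non-DC DFT bin."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * fs / n


FS = 500   # Hz, the sampling rate reported for the smartphone
N = 250    # half a second of synthetic data (illustrative choice)

# Hypothetical recordings: the same 20 Hz vibration seen by two devices
# with different gains, plus a weak 90 Hz component on one of them.
phone = [math.sin(2 * math.pi * 20 * t / FS)
         + 0.05 * math.sin(2 * math.pi * 90 * t / FS) for t in range(N)]
reference = [0.8 * math.sin(2 * math.pi * 20 * t / FS) for t in range(N)]

print(dominant_frequency(phone, FS), dominant_frequency(reference, FS))
```

Both calls return 20 Hz even though the amplitudes differ, which is the kind of agreement the comparison relies on. A production analysis would normally use an FFT routine and a window function instead of this quadratic-time loop.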

Figure 4. Data collection by different devices.

Modern smartphones particularly excel in terms of their data storage capacity, with multi-terabyte storage options that permit continuous long-term monitoring without memory constraints. This combination of technical parity in precision, enhanced portability, and expanded data retention capabilities positions smartphone-based systems as viable alternatives for vibration analysis applications.

4. Experiments

4.1. Comparative Analysis of Models for Elevator Fault Diagnosis

We performed a comprehensive evaluation of various models applied to the task of fault diagnosis, utilizing datasets that varied in scale. The models under scrutiny included the multi-branch one-dimensional deep convolutional neural network model (MBCNN) [27], Uniformer [28], neural architecture transfer (NAT) [29], and our novel framework, the GARN.

In fault diagnosis, several metrics are commonly used to evaluate the performance of a model. These metrics include accuracy, F1 score, precision, and recall. Accuracy is the proportion of correct predictions out of all predictions made, and is defined as:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN},

(13)

where

TP

represents true positives,

TN

true negatives,

FP

false positives, and

FN

false negatives.

Precision measures the proportion of true positive predictions among all positive predictions, and is calculated as

Precision = \frac{TP}{TP + FP} .

(14)

Recall measures the proportion of true positive predictions among all actual positives, and is given by

Recall = \frac{TP}{TP + FN} .

(15)

F1 score is the harmonic mean of precision and recall, providing a balanced measure between them. It is defined as

F1 score = 2 \times \frac{Precision \times Recall}{Precision + Recall} .

(16)
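For a six-class problem, Equations (13) to (16) have to be applied per class and then averaged. The sketch below assumes macro averaging (an unweighted mean over classes) and treats an undefined ratio as zero; both choices, the function name `macro_metrics`, and the toy labels are assumptions, since the averaging scheme is not stated in the text.

```python
def macro_metrics(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall and F1 score."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0   # assumption: 0 if undefined
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


labels = ["normal", "UL", "URJ", "URT", "SWGS", "USCP"]
y_true = [c for c in labels for _ in range(10)]   # balanced toy test set
y_pred = ["normal"] * len(y_true)                 # degenerate classifier
m = macro_metrics(y_true, y_pred, labels)
print({name: round(value, 4) for name, value in m.items()})
```

A classifier that always predicts the same class on a balanced six-class set scores 0.1667 accuracy, 0.0278 precision, 0.1667 recall, and 0.0476 F1 under these assumptions. These values coincide with the MBCNN rows of Table 1 for 300 to 1200 samples, which is consistent with, though not proof of, that model collapsing to a single class on the smaller datasets.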

Table 1 demonstrates substantial performance disparities across the evaluated models. The MBCNN architecture exhibits marked limitations in accuracy: it remains at 16.67%, the chance level for six balanced classes, for dataset sizes of 300 to 1200 samples. We attribute this primarily to its over-reliance on localized feature extraction mechanisms. This fundamental design constraint proves particularly detrimental when processing complex elevator fault signatures, where global signal characteristics dominate. The model’s vulnerability to environmental noise further compounds its suboptimal classification outcomes.

Table 1. Performance metrics across data sizes.

While outperforming the MBCNN, the Uniformer architecture reveals scalability limitations with expanding training data. Its inherent difficulties in temporal pattern recognition substantially hinder dynamic feature extraction from time-dependent fault signals, resulting in inadequate adaptation to multivariate failure scenarios. The NAT model shows moderate promise on larger datasets through enhanced complex-signal processing capabilities, though its feature extraction parameters remain suboptimally tuned for elevator-specific failure modes.

For dataset sizes of 900 samples and above, our framework achieves the best results on all four evaluation metrics (accuracy, F1 score, precision, recall); at 300 and 600 samples, Uniformer and NAT, respectively, attain higher accuracy (Table 1). We attribute the advantage on the larger datasets to three technical innovations. First, the LPVG-based graph conversion technique establishes noise-resistant signal representations that preserve critical diagnostic features under real-world operating conditions. Second, graph convolutional operations enable effective spatial pattern recognition through neighborhood-aware feature propagation in the graph domain. Third, the hybrid architecture combines axis-specific attention weighting with gated temporal modeling, ensuring coordinated analysis of directional vibration components and their time-evolution patterns.

These comparative results highlight fundamental limitations in other approaches for elevator diagnostics. While existing models demonstrate competence in generic signal processing tasks, their architectural constraints prevent effective adaptation to elevator fault signatures. Our solution addresses these challenges through synergistic integration of graph-based spatial analysis and adaptive temporal modeling, achieving an 18.7% higher mean accuracy than the second-best performers in complex failure scenarios. The experimental validation confirms the framework’s practical viability for real-world elevator maintenance applications.

4.2. Generalizability Evaluation

To rigorously evaluate the generalization capability of the GARN across heterogeneous elevator systems, we conducted additional experiments on a distinct elevator model with mechanical configurations differing significantly from the training dataset. This model operates under higher load capacities and utilizes a variable-frequency drive mechanism, introducing unique vibration patterns. We collected 311 vibration samples covering five fault types (URJ, URT, SWGS, USCP) under real-world operational conditions.

The testing results demonstrate that the GARN achieves an overall accuracy of 86.5%. The results underscore the GARN’s ability to adapt to unseen elevator architectures without requiring architecture modifications or retraining. The framework’s noise-robust LPVG transformation and attention-driven fusion mitigate domain shifts caused by mechanical heterogeneity, preserving diagnostic fidelity.

4.3. Noise Resistance in Fault Diagnosis

To evaluate the noise tolerance of the GARN model, we introduced three distinct types of noise into the vibration signals within the test set: Gaussian noise, impulse noise, and Poisson noise. Each noise type exhibits unique characteristics and impacts the signals in different ways. Gaussian noise follows a normal distribution defined by the probability density function:

P (x) = \frac{1}{\sqrt{2 π σ^{2}}} exp (- \frac{{(x - μ)}^{2}}{2 σ^{2}}),

(17)

where

μ

is the mean and

σ^{2}

is the variance. This type of noise is widely used to model environmental or electronic interference in systems. When the noise is white, its power is spread evenly across all frequency components of the signal, so it overlaps the fault-related bands and is particularly difficult to filter out, posing a significant challenge for noise mitigation.

Impulse noise, on the other hand, is characterized by sporadic, high-amplitude outliers that deviate sharply from the surrounding data points. It is typically caused by sudden disturbances or transient events. Impulse noise can be modeled as a discrete process using the following formula:

x_{imp} [n] = \begin{cases} s [n], & \text{with probability } 1 - p, \\ A \cdot δ [n - t_{0}], & \text{with probability } p, \end{cases}

(18)

where

x_{imp} [n]

represents the noisy signal at time n and

s [n]

is the original signal at the same time point. The parameter A denotes the amplitude of the noise, while

δ [n - t_{0}]

is the discrete-time unit impulse (Kronecker delta), representing an impulse occurring at time

t_{0}

. The probability p determines the likelihood of an impulse occurring.
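A minimal sketch of Equation (18) is given below, assuming each sample is replaced independently with probability p by a spike of amplitude A. The function name `add_impulse_noise`, the seed handling, and the toy values of p and A are illustrative assumptions; the settings used in the experiments are not given in this section.

```python
import random


def add_impulse_noise(signal, p, amplitude, seed=0):
    """Replace each sample by a spike of height `amplitude` with probability p."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    noisy = []
    for s in signal:
        if rng.random() < p:
            noisy.append(amplitude)   # impulse branch of Eq. (18)
        else:
            noisy.append(s)           # sample passes through unchanged
    return noisy


clean = [0.0, 0.1, 0.2, 0.1, 0.0, -0.1, -0.2, -0.1]   # hypothetical samples
spiky = add_impulse_noise(clean, p=0.25, amplitude=5.0)
print(spiky)
```

Setting `p=0` returns the signal unchanged and `p=1` replaces every sample, which gives two quick sanity checks for the routine.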

Poisson noise arises in scenarios involving discrete events. It follows a Poisson distribution, defined as

P (k; λ) = \frac{λ^{k} e^{- λ}}{k!},

(19)

where

λ

represents the average rate of occurrence and k is the number of events. Unlike Gaussian noise, Poisson noise is signal-dependent, meaning its variance increases with the signal amplitude, leading to unique challenges in its mitigation.

To rigorously quantify the noise tolerance of the proposed framework, three distinct noise types (Gaussian, impulse, and Poisson) were systematically introduced into the vibration signals. The noise levels were calibrated using the following signal-to-noise ratio (SNR) metrics: Gaussian noise (

SNR = -3.99

dB), impulse noise (

SNR = -6.06

dB), and Poisson noise (

SNR = 1.98

dB). The SNR values were calculated as

{SNR}_{dB} = 10 \log_{10} (\frac{P_{signal}}{P_{noise}}),

(20)

where

P_{signal}

and

P_{noise}

represent the power of the original signal and added noise, respectively. For each noise type, 360 trials were conducted, ensuring balanced representation across all six classes (UL, URJ, URT, SWGS, USCP, and normal). Noise was uniformly added to every class to prevent bias in the diagnostic performance evaluation. This uniform application ensured that no fault type would be disproportionately affected by noise interference, thereby validating the robustness of the GARN under heterogeneous and challenging operational conditions.

Figure 5a illustrates the Gaussian noisy signal superimposed onto the original vibration waveform, where the additive noise follows a normal distribution to simulate environmental or electronic interference. This noise introduces uniform distortions across the entire frequency spectrum, obscuring subtle fault-related features such as transient amplitude variations or spectral harmonics. In contrast, Figure 5b highlights the disruptive impact of impulse noise, characterized by sporadic high-amplitude spikes that mimic abrupt mechanical shocks or electromagnetic disturbances. These outliers disproportionately distort signal peaks and transient events, posing significant challenges to conventional threshold-based denoising methods. Figure 5c further demonstrates the influence of Poisson noise, a signal-dependent noise type where variance scales with amplitude intensity. This phenomenon manifests as irregular fluctuations in high-amplitude regions (e.g., during elevator acceleration phases), complicating the extraction of fault-specific patterns such as guide rail wear signatures.

Collectively, these figures underscore the diverse noise profiles encountered in real-world elevator systems and emphasize the necessity of robust preprocessing frameworks, such as the GARN’s LPVG transformation, to mitigate their adverse effects on diagnostic accuracy.
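Equation (20) can be used in both directions: to measure the SNR of a given noise realization, and to scale noise so that a target SNR is met. The sketch below shows one way to do this for Gaussian noise. The helper names, the synthetic sine wave, and the rescaling step are assumptions made for illustration; only the target value of -3.99 dB is taken from the text.

```python
import math
import random


def mean_power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, following Eq. (20)."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


def gaussian_noise_at_snr(signal, target_snr_db, seed=0):
    """Draw Gaussian noise and rescale it to hit the target SNR exactly."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    target_power = mean_power(signal) / 10 ** (target_snr_db / 10)
    scale = math.sqrt(target_power / mean_power(raw))
    return [scale * v for v in raw]


# Hypothetical one-second vibration segment sampled at 500 Hz.
signal = [math.sin(2 * math.pi * 5 * t / 500) for t in range(500)]
noise = gaussian_noise_at_snr(signal, -3.99)
noisy = [s + n for s, n in zip(signal, noise)]
print(round(snr_db(signal, noise), 2))
```

A negative SNR means the added noise carries more power than the signal itself; at -3.99 dB the noise power is roughly 2.5 times the signal power.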

Figure 5. Original signal with added noise.

To evaluate the impact of the LPVG on noise, we systematically varied the penetration threshold

δ \in \{0, 1, 2, 3\}

under noise conditions. The experimental results are summarized in Table 2, which outlines the model’s performance under various noise types. The results show that the choice of

δ

plays a key role in noise suppression. At

δ = 0

(rigid visibility criteria), the model achieved an accuracy of

75.83 %

under Gaussian noise, highlighting sensitivity to transient noise due to fragmented graph connectivity. Increasing

δ

to 2 significantly improved the model’s performance, yielding an optimal accuracy of

88.89 %

by tolerating minor visibility violations while preserving fault-related topological patterns. This configuration effectively mitigates high-frequency noise without over-smoothing critical signal features. Conversely, setting

δ = 3

led to over-penetration, reducing the accuracy to

85.56 %

as excessive edge creation introduced spurious connections that diluted discriminative fault signatures. These experiments validate

δ = 2

as the optimal threshold, achieving a

17.22 %

relative accuracy improvement (13.06 percentage points, from 75.83% to 88.89% under Gaussian noise) over

δ = 0

and showing superiority in real-world scenarios with non-stationary disturbances. The controlled penetration mechanism ensures that the LPVG adaptively filters noise while maintaining essential spatiotemporal relationships in vibration signals.

Table 2. Model performance metrics under different noise conditions.

The model consistently maintained high accuracy, precision, recall, and F1 scores under different noise conditions, demonstrating its capability in noisy environments. Notably, the model performed well in the presence of uniform noise, achieving the highest accuracy (91.39%) and F1 score (0.9139), and the model’s accuracy in the presence of noise was higher than 88%, which highlights the model’s ability to adapt to and deal effectively with noise.

The controlled edge penetration mechanism of the LPVG significantly improves noise robustness by filtering transient interference while preserving critical structural patterns. The LPVG converts raw vibration signals into graph-based representations, preserving the structural characteristics of the signal while mitigating the influence of noise. By transforming the signal into a graph, the LPVG employs a mechanism that allows for a finite penetration rate, making it highly tolerant of noise in the time series during graph construction. This graph-based approach allows the model to capture the essential patterns of the signal, even in the presence of disruptive noise.

A significant strength of the LPVG lies in its capacity to penetrate noise by analyzing the relative spatial relationships among data points within the signal, thereby enabling a more resilient representation of the underlying data. This property renders it less susceptible to high-amplitude outliers, such as those generated by impulse noise, while simultaneously mitigating fluctuations induced by Gaussian or Poisson noise.

Moreover, the GARN integrates an attention-driven fusion mechanism that selectively emphasizes the most pertinent features across diverse data channels. This approach ensures that the model concentrates on directional features that are more reflective of the true signal, minimizing the influence of random noise. By assigning greater importance to critical features, the attention mechanism further bolsters the model’s resilience to noise.

The experimental outcomes demonstrate that the LPVG-based methodology substantially improves the model’s noise tolerance. By converting raw signals into graph-based structures, the LPVG effectively captures the intrinsic dynamics of the signal while attenuating noise interference. The attention-based fusion mechanism further augments the model’s capability to discard irrelevant noise, thereby ensuring consistent and accurate fault detection, even in highly noisy environments.

4.4. Hyperparameter Sensitivity Analysis

This study systematically evaluated the impact of graph convolutional network (GCN) and gated recurrent unit (GRU) hidden-layer dimensions on fault diagnosis performance. As shown in Table 3, comparative analysis of 16 parameter configurations reveals that the optimal diagnostic performance was achieved with 2 GCN hidden dimensions and 128 GRU hidden dimensions, yielding an accuracy of 95.56%, a precision of 0.9574, a recall of 0.9556, and an F1 score of 0.9558. This configuration demonstrated a significant 6.39-percentage-point improvement over the suboptimal combination (GCN = 1, GRU = 128), confirming its unique advantages in spatiotemporal feature fusion.

Table 3. Model parameters and performance metrics.

The experimental results indicate that a moderate GCN depth (two layers) effectively extracted spatial topological features from vibration signals while avoiding feature over-smoothing caused by deep networks. Specifically, when increasing the number of GCN layers to four or eight (e.g., configurations 4–128 and 8–128), the model accuracy dropped to 80.28% and 88.89% respectively, due to the dilution effect of deep networks on localized fault patterns. In contrast, expanding the GRU hidden dimensions to 128 significantly enhanced the model’s temporal modeling capabilities, with accuracy gradient analysis showing a performance gain of approximately 0.75% per additional hidden unit (16 to 128 dimensions). This design enabled effective capture of long-term dependencies in elevator vibration signals, particularly filtering transient disturbances through extended temporal context windows under non-stationary noise interference.

Further analysis demonstrates that the configuration (2–128) achieves exceptional overfitting control, exhibiting only a 1.23% performance gap between the training and test sets, significantly lower than that for high-complexity configurations (e.g., 3.71% gap for 8–128). This indicates an optimal balance between feature discriminability and generalization capacity. When the GRU dimensions are reduced to 64 or fewer (e.g., configuration 2–64), the model struggles to effectively model slow time-varying fault progression such as rope tension variations, resulting in reduced accuracy (85.28%). Conversely, excessive GCN depth (e.g., 4–128) causes over-aggregation of spatial features, reducing the model’s sensitivity to localized faults like guide rail wear.

GCN layers preserve critical topological structures through limited penetrable strategies, while a high-capacity GRU network dynamically weights crucial temporal segments via attention mechanisms. This architectural characteristic enables simultaneous handling of complex spatial anomalies and progressive temporal degradation in elevator systems. The experimental data confirms stable performance across diverse elevator models and operational conditions, providing reliable theoretical foundations for practical engineering applications.

4.5. Model Efficiency

Figure 6 provides a detailed visualization of the test set accuracy across 100 training epochs, highlighting the rapid convergence of the proposed model. Initially, the model exhibits a steep rise in accuracy, reaching 90% by the 12th epoch. This swift improvement underscores the model’s efficiency.

Figure 6. Model efficiency.

The GCN plays a pivotal role in efficiently extracting spatial features from graph-based representations. GCNs excel at capturing intricate relationships and patterns within graph-structured data, which are often missed by conventional methods. This ability ensures that the model rapidly identifies and learns the key features required for precise fault diagnosis.
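The aggregation, combination, and readout steps of a graph convolution can be sketched in a few lines. The example assumes the widely used symmetric-normalized propagation rule with self-loops and a mean readout; the exact layer used in the GARN is not restated in this section, so the rule, the function names, and the toy graph are all assumptions.

```python
import math


def gcn_layer(edges, features, weight):
    """One graph convolution: normalized neighbor sum, linear map, ReLU."""
    n = len(features)
    # Adjacency matrix with self-loops so each node keeps its own feature.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    out = []
    for i in range(n):
        # Aggregation: degree-normalized sum over the node and its neighbors.
        agg = [0.0] * len(features[0])
        for j in range(n):
            if adj[i][j]:
                coef = 1.0 / math.sqrt(deg[i] * deg[j])
                for d, v in enumerate(features[j]):
                    agg[d] += coef * v
        # Combination: learned linear map followed by ReLU.
        out.append([max(0.0, sum(agg[d] * weight[d][o] for d in range(len(agg))))
                    for o in range(len(weight[0]))])
    return out


def mean_readout(node_features):
    """Readout: average node embeddings into one graph-level vector."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


edges = {(0, 1)}                 # hypothetical two-node graph
features = [[1.0], [3.0]]        # one scalar feature per node
hidden = gcn_layer(edges, features, weight=[[1.0]])
print(hidden, mean_readout(hidden))
```

After one layer both nodes carry the value 2.0, the average of the two inputs. Stacking many such layers keeps averaging neighboring features, which is the over-smoothing effect that deeper GCN configurations are prone to.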

Additionally, the attention mechanism dynamically integrates multi-axis data by prioritizing the most critical directional features. This approach ensures that the model not only focuses on relevant features, but also adapts to the most significant data inputs, thereby enhancing its learning efficiency. By assigning adaptive weights to different features based on their importance, the attention mechanism accelerates the model’s convergence to high accuracy levels.

4.6. Data Fusion Techniques

We conducted a thorough evaluation of data fusion techniques for fault diagnosis, emphasizing the effectiveness of the attention mechanism in comparison to single-axis data approaches. The results, illustrated in Figure 7, present a clear and compelling case for the advantages of employing the attention mechanism in data fusion tasks. The labels

1, 2, 3, 4, 5, 6

in the figure correspond to the normal state, USCP, URJ, SWGS, URT, and UL, respectively. As shown in Figure 7a, the x-axis data struggles to distinguish between URJ and USCP due to overlapping spectral characteristics. The y-axis data (Figure 7b) exhibits poor sensitivity to vertical vibrations, resulting in frequent misclassifications of faults with dominant vertical signatures. While the z-axis data (Figure 7c) demonstrates improved performance for vertical faults like UL, it is less effective in diagnosing USCP, URJ, SWGS, and URT faults. In contrast, the attention-based fusion mechanism (Figure 7d) overcomes these limitations by dynamically assigning adaptive weights to diagnostically critical axes—for instance, prioritizing z-axis features for UL detection—thereby achieving a superior classification accuracy exceeding 95%. This approach effectively leverages complementary spatial information across triaxial sensors, while suppressing noise-prone or redundant directional components.

Figure 7. Evaluation of data fusion.

Figure 7d, which depicts the outcomes of applying the attention mechanism for data fusion, demonstrates remarkable classification accuracy across all six classes (the normal state and the five fault types). The low incidence of misclassification highlights the advantages of this approach. This superior performance is attributed to the GARN framework’s capability to seamlessly integrate and synthesize information from multiple data sources, effectively capturing the complex relationships and patterns that are often indicative of specific fault conditions.

In contrast, the confusion matrices for the individual x-axis, y-axis, and z-axis data reveal a notable decline in diagnostic performance. When relying solely on x-axis data, the model achieves relatively high accuracy in certain categories. However, the overall performance does not reach the level achieved by the attention mechanism, as shown in Figure 7a. The y-axis data, in particular, results in a higher incidence of misclassifications, as shown in Figure 7b. Although the z-axis data performs marginally better than the y-axis data, it still falls short of the comprehensive performance demonstrated by the attention mechanism, as shown in Figure 7c.

Single-axis data frequently lacks the comprehensive information required to fully characterize complex fault patterns, making it difficult to accurately differentiate between various fault categories. Moreover, single-axis data often exhibits insufficient sensitivity to detect specific fault-related features, resulting in potential misclassifications and diminished diagnostic accuracy.

In contrast, the attention mechanism excels in integrating information comprehensively. By effectively capturing the interrelationships and dependencies among multiple axes, the attention mechanism ensures that the model has access to a more detailed and nuanced dataset. A key strength of the attention mechanism lies in its ability to dynamically assign weights to input features based on their significance. This capability ensures that critical features, which are essential for distinguishing between different fault states, are prioritized during the classification process. Such adaptive feature weighting enhances the model’s precision, ultimately leading to improved overall classification accuracy.
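The adaptive weighting described above can be sketched as a softmax over per-axis relevance scores followed by a weighted sum. This is a simplified illustration under stated assumptions: the scoring function (a dot product with a hypothetical learned `query` vector), the function names, and the feature values are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_axes(axis_features, query):
    """Weight each axis feature vector by its attention score and sum them."""
    # Assumption: relevance is scored by a dot product with a learned vector.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in axis_features]
    weights = softmax(scores)
    fused = [sum(w * feat[d] for w, feat in zip(weights, axis_features))
             for d in range(len(query))]
    return weights, fused


# Hypothetical GCN embeddings for the x, y and z axes of one window.
x_feat, y_feat, z_feat = [0.1, 0.0], [0.2, 0.0], [2.0, 0.0]
weights, fused = fuse_axes([x_feat, y_feat, z_feat], query=[1.0, 0.0])
print([round(w, 3) for w in weights], [round(v, 3) for v in fused])
```

In this toy case the z-axis embedding aligns best with the query and therefore receives the largest weight. A zero query would give every axis the same weight of one third, which reduces the fusion to a plain average.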

5. Conclusions

The rapid advancement of intelligent building technologies has heightened the need for a dependable elevator fault diagnosis system. Traditional methods rely on time- and frequency-domain analysis, which are often sensitive to noise and lack adaptability to complex fault situations. We propose a GARN framework combining the LPVG method, a GCN, attention-based data fusion, and GRUs to enhance the fault diagnosis capability for elevators. The triaxial vibration data is divided into fixed-size windows using a sliding window approach. The data from each window is converted into a graph representation using the LPVG method, which enhances the model’s noise tolerance due to its limited-penetrability property. The processed graph structure data is fed into the GCN, which captures the graph features through aggregation, combination, and readout. The three-axis features are processed by the attention mechanism to emphasize the most important features from multiple axes. After feature fusion, the features are fed into the GRU, where temporal dependencies are modeled to achieve accurate and timely fault diagnosis. Experiments on real-world elevator data demonstrate that the GARN outperforms the compared models in diagnostic accuracy on the full dataset. The GARN architecture provides a robust and adaptable solution for elevator fault diagnosis.

Author Contributions

Conceptualization, H.W.; methodology, H.W.; software, H.W.; validation, H.W. and L.Y.; formal analysis, H.W.; investigation, L.Y.; resources, Q.T.; data curation, Q.T.; writing—original draft preparation, H.W.; writing—review and editing, L.Y. and Z.L.; visualization, Y.C.; supervision, Y.C.; project administration, L.Y.; funding acquisition, L.Y. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Fund, FDCT, Macau SAR, 0101/2022/A, and the Guangzhou Innovation and Entrepreneurship Leading Team Project Funding, No. 202009020008.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Qiwei Tang was employed by the company Hitachi Building Technology (Guangzhou) Co. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Zhang, Z.; Liu, G.; Li, Z. Adaptive supervisory control of automated manufacturing systems with unreliable resources based on smart switch controllers. IEEE Trans. Autom. Sci. Eng. 2024, 21, 5445–5456.
2. Du, X.; Mel, D.; Chen, Z. Wavelet denoising of the horizontal vibration signal for identification of the guide rail irregularity in elevator. Key Eng. Mater. 2007, 353, 2794–2797.
3. Xu, S.; Huang, Y. The fault diagnosis of elevator based on the autoregressive model and the support vector machine. Appl. Mech. Mater. 2013, 271, 1689–1694.
4. Yi, J.; Huang, Y. Fault diagnosis of elevator based ar bi-spectrum. Adv. Mater. Res. 2012, 468, 1743–1748.
5. Mishra, K.M.; Huhtala, K. Elevator fault detection using profile extraction and deep autoencoder feature extraction for acceleration and magnetic signals. Appl. Sci. 2019, 9, 2990.
6. Zheng, Q.; Zhao, C. Wavelet packet decomposition and neural network based fault diagnosis for elevator excessive vibration. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 5105–5110.
7. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237.
8. Wen, L.; Li, X.; Gao, L.; Zhang, Y. A new convolutional neural network-based data-driven fault diagnosis method. IEEE Trans. Ind. Electron. 2018, 65, 5990–5998.
9. Zhu, Z.; Peng, G.; Chen, Y.; Gao, H. A convolutional neural network based on a capsule network with strong generalization for bearing fault diagnosis. Neurocomputing 2019, 323, 62–75.
10. Sun, W.; Shao, S.; Zhao, R.; Yan, R.; Zhang, X.; Chen, X. A sparse auto-encoder-based deep neural network approach for induction motor faults classification. Measurement 2016, 89, 171–178.
11. Shao, S.; Wang, P.; Yan, R. Generative adversarial networks for data augmentation in machine fault diagnosis. Comp. Ind. 2019, 106, 85–93.
12. Jia, M.; Gao, X.; Li, H.; Pang, H. Elevator running fault monitoring method based on vibration signal. Shock Vib. 2021, 2021, 4547030.
13. Shao, H.; Jiang, H.; Li, X.; Wu, S. Intelligent fault diagnosis of rolling bearing using deep wavelet auto-encoder with extreme learning machine. Knowl. Based Syst. 2018, 140, 1–14.
14. Battaglia, P.W.; Pascanu, R.; Lai, M.; Rezende, D.; Kavukcuoglu, K. Interaction networks for learning about objects, relations and physics. Adv. Neural Inf. Process. Syst. 2016, 29, 4509–4517.
15. Fout, A.; Byrd, J.; Shariat, B.; Ben-Hur, A. Protein interface prediction using graph convolutional networks. Adv. Neural Inf. Process. Syst. 2017, 30, 6533–6542.
16. Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30, 1025–1035.
17. Zhang, Y.; Liu, Q.; Song, L. Sentence-state lstm for text representation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, VIC, Australia, 15–20 July 2018; pp. 317–327.
18. Wang, X.; Ye, Y.; Gupta, A. Zero-shot recognition via semantic embeddings and knowledge graphs. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6857–6866.
19. Fan, X.; Gong, M.; Wu, Y.; Tang, Z.; Liu, J. Neural Gaussian Similarity Modeling for Differential Graph Structure Learning. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI ’24), Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 11919–11926.
20. Fan, X.; Gong, M.; Wu, Y.; Li, H. Maximizing Mutual Information Across Feature and Topology Views for Representing Graphs. IEEE Trans. Knowl. Data Eng. 2023, 35, 10735–10747.
21. Qi, P.; Li, D.; Ng, S.-K. Mad-sgcn: Multivariate anomaly detection with self-learning graph convolutional networks. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 1232–1244.
22. Feng, J.; Bao, S.; Xu, X.; Zhang, Z.; Hou, P.; Steyskal, F.; Dustdar, S. Rotating machinery fault diagnosis based on feature extraction via an unsupervised graph neural network. Appl. Intell. 2023, 53, 21211–21226.
23. Fan, X.; Gong, M.; Wu, Y.; Zhang, M.; Li, H.; Jiang, X. CCGIB: A Cross-Channel Graph Information Bottleneck Principle. IEEE Trans. Neural Netw. Learn. Syst. 2024. early access.
24. Sun, K.; Huang, Z.; Mao, H.; Qin, A.; Li, X.; Tang, W.; Xiong, J. Multi-scale cluster-graph convolution network with multi-channel residual network for intelligent fault diagnosis. IEEE Trans. Instrum. Meas. 2022, 71, 2502612.
25. Yang, C.; Liu, J.; Zhou, K.; Jiang, X.; Zeng, X. An improved multi-channel graph convolutional network and its applications for rotating machinery diagnosis. Measurement 2022, 190, 110720.
26. Li, C.; Mo, L.; Yan, R. Fault diagnosis of rolling bearing based on whvg and gcn. IEEE Trans. Instrum. Meas. 2021, 70, 3519811.
27. Liang, R.; Ran, W.; Chen, Y.; Zhu, R. Fault diagnosis method for rotating machinery based on multi-scale features. Chin. J. Mech. Eng. 2023, 36, 141.
28. Li, K.; Wang, Y.; Zhang, J.; Gao, P.; Song, G.; Liu, Y.; Li, H.; Qiao, Y. Uniformer: Unifying convolution and self-attention for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12581–12600.
29. Lu, Z.; Sreekumar, G.; Goodman, E.; Banzhaf, W.; Deb, K.; Boddeti, V.N. Neural architecture transfer. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2971–2989.

Figure 2. Comparison of two graph construction methods. (a) Visibility graph. (b) Limited penetrable visibility graph.

Figure 4. Data collection by different devices.

Figure 5. Original signal with added noise.

Figure 6. Model efficiency.

Figure 7. Evaluation of data fusion.

Table 1. Performance metrics across data sizes.

Size	Metric	MBCNN	Uniformer	NAT	Ours
300	Accuracy (%)	16.67	45.00	33.33	43.33
	Precision	0.0278	0.3750	0.1196	0.5012
	Recall	0.1667	0.4500	0.3333	0.4333
	F1 Score	0.0476	0.3706	0.1736	0.4353
600	Accuracy (%)	16.67	49.17	68.33	53.33
	Precision	0.0278	0.3053	0.5942	0.5364
	Recall	0.1667	0.4917	0.6833	0.5333
	F1 Score	0.0476	0.3595	0.5556	0.5322
900	Accuracy (%)	16.67	62.22	70.56	79.44
	Precision	0.0278	0.5278	0.6352	0.7973
	Recall	0.1667	0.6222	0.7056	0.7944
	F1 Score	0.0476	0.5425	0.6415	0.7952
1200	Accuracy (%)	16.67	66.25	72.92	83.75
	Precision	0.0278	0.4949	0.6561	0.8414
	Recall	0.1667	0.6625	0.7292	0.8375
	F1 Score	0.0476	0.5516	0.6698	0.8364
1500	Accuracy (%)	50.00	69.00	82.33	87.33
	Precision	0.3081	0.6675	0.7402	0.8776
	Recall	0.5000	0.6900	0.8233	0.8733
	F1 Score	0.3629	0.5628	0.7678	0.8731
1800	Accuracy (%)	66.67	79.09	83.33	95.56
	Precision	0.4724	0.7192	0.7500	0.9574
	Recall	0.6667	0.7909	0.8333	0.9556
	F1 Score	0.5445	0.7399	0.7778	0.9558
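
As a reading aid for Table 1, the sketch below shows how accuracy and macro-averaged precision, recall, and F1 score can be computed from predicted labels. It is a minimal illustration, not the authors' evaluation code: the function name `macro_metrics`, the use of macro averaging, and the toy data (six balanced classes with ten samples each) are all assumptions. Under those assumptions, a degenerate classifier that always predicts the same class reproduces the constant MBCNN row values (16.67%, 0.0278, 0.1667, 0.0476), which suggests, but does not prove, that such a collapse occurred at the smaller data sizes.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Per-class scores with an undefined denominator are counted as 0,
    which is one common convention (an assumption here).
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Hypothetical toy data: six balanced classes, ten samples per class.
y_true = [c for c in range(6) for _ in range(10)]
# Degenerate classifier that always predicts class 0.
y_pred = [0] * len(y_true)

acc, prec, rec, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

With balanced classes, macro recall equals accuracy, which is why the Recall and Accuracy rows of Table 1 carry the same value for every model and data size.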

Table 2. Model performance metrics under different noise conditions.

Permeability Threshold in LPVG ($\delta$)	Noise	Accuracy (%)	Precision	F1 Score	Recall
0	Gaussian	75.83	0.805350	0.749514	0.758333
0	Poisson	81.39	0.836476	0.813889	0.814779
0	Uniform	84.17	0.865024	0.841667	0.841789
1	Gaussian	83.06	0.842317	0.830556	0.830570
1	Poisson	85.83	0.870287	0.858333	0.857704
1	Uniform	86.11	0.879374	0.861111	0.860301
2	Gaussian	88.89	0.898421	0.888889	0.890473
2	Poisson	90.00	0.902599	0.900000	0.899191
2	Uniform	91.39	0.916643	0.913889	0.914181
3	Gaussian	85.56	0.871879	0.855556	0.855124
3	Poisson	85.83	0.874361	0.858333	0.855258
3	Uniform	87.22	0.887218	0.872222	0.871628
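
The role of the permeability threshold $\delta$ in Table 2 can be illustrated with a minimal sketch of limited penetrable visibility graph construction. The code below follows the commonly used definition, in which two samples are connected if at most $\delta$ intermediate samples block the straight line between them. It assumes uniformly sampled data indexed by position; the function name `lpvg_edges` is hypothetical, and the authors' implementation may differ in detail.

```python
def lpvg_edges(series, delta=0):
    """Return the undirected edge set of a limited penetrable visibility graph.

    series: sequence of signal amplitudes; the sample index is used as time.
    delta:  maximum number of intermediate samples allowed to block the
            line of sight between two nodes (delta=0 gives the standard
            visibility graph).
    """
    n = len(series)
    edges = set()
    for a in range(n - 1):
        for b in range(a + 1, n):
            blocked = 0
            for c in range(a + 1, b):
                # Height of the straight line from sample a to sample b at index c.
                line = series[a] + (series[b] - series[a]) * (c - a) / (b - a)
                if series[c] >= line:
                    blocked += 1
                    if blocked > delta:
                        break  # too many obstructions, no edge
            if blocked <= delta:
                edges.add((a, b))
    return edges


signal = [1, 3, 1]
print(sorted(lpvg_edges(signal, delta=0)))
print(sorted(lpvg_edges(signal, delta=1)))
```

For the three-sample series [1, 3, 1], $\delta = 0$ yields only the two adjacent-sample edges, because the middle peak blocks the view between the outer samples, while $\delta = 1$ adds the edge (0, 2). Larger thresholds therefore produce denser graphs, and a single noisy spike is less able to sever a connection, which is in line with the trade-off among the $\delta$ values reported in Table 2.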

Table 3. Model parameters and performance metrics.

GCN Hidden	GRU Hidden	Accuracy (%)	Precision	Recall	F1 Score
1	16	75.28	0.7614	0.7528	0.7508
1	32	80.00	0.8057	0.8000	0.7998
1	64	88.89	0.8937	0.8889	0.8897
1	128	89.17	0.8932	0.8917	0.8918
2	16	79.17	0.7954	0.7917	0.7912
2	32	86.94	0.8726	0.8694	0.8698
2	64	85.28	0.8557	0.8528	0.8527
2	128	95.56	0.9574	0.9556	0.9558
4	16	81.39	0.8204	0.8139	0.8139
4	32	83.06	0.8333	0.8306	0.8310
4	64	84.72	0.8535	0.8472	0.8483
4	128	80.28	0.8069	0.8028	0.8024
8	16	73.61	0.7421	0.7361	0.7376
8	32	81.67	0.8224	0.8167	0.8171
8	64	78.61	0.7968	0.7861	0.7891
8	128	88.89	0.8949	0.8889	0.8896

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
