A Hybrid Deep Learning Framework for Enhanced Fault Diagnosis in Industrial Robots

Wu, Jun; Zhang, Yuepeng; Gao, Bo; Xia, Linzhong; Zhu, Xueli; Wang, Hui; Wan, Xiongbo

doi:10.3390/a18120779


by Jun Wu ^1,2, Yuepeng Zhang ^1,2,*, Bo Gao ^1,2, Linzhong Xia ^1,2, Xueli Zhu ^1,2, Hui Wang ^2 and Xiongbo Wan ^3

^1 School of Sino-German Robotics, Shenzhen University of Information Technology, Shenzhen 518172, China

^2 Inovance Industrial Robot Reliability Technology Research Institute, Shenzhen University of Information Technology, Shenzhen 518172, China

^3 School of Automation, China University of Geosciences, Wuhan 430074, China

^* Author to whom correspondence should be addressed.

Algorithms 2025, 18(12), 779; https://doi.org/10.3390/a18120779

Submission received: 11 November 2025 / Revised: 4 December 2025 / Accepted: 6 December 2025 / Published: 10 December 2025

(This article belongs to the Special Issue Data-Driven Intelligent Modeling and Optimization Algorithms for Industrial Processes: 3rd Edition)


Abstract

Predominant fault diagnosis in industrial robots depends on dedicated vibration or acoustics sensors. However, their practical deployment is often limited by installation constraints, susceptibility to environmental noise, and cost considerations. Applying Energy-Based Maintenance (EBM) principles to achieve enhanced fault diagnosis under practical industrial conditions, we propose a hybrid deep learning framework, the Multi-head Graph Attention Network (MGAT) with Multi-scale CNNBiLSTM Fusion (MGAT-MCNNBiLSTM) for industrial robots. This approach obviates the need for additional dedicated sensors, effectively mitigating associated deployment complexities. The framework embodies four core innovations: (1) Based on the EBM paradigm, motor current is established as the most effective and practical choice for enabling cost-efficient and scalable industrial robot fault diagnosis. A corresponding dataset of motor current has been acquired from industrial robots operating under diverse fault scenarios. (2) An integrated MGAT-MCNNBiLSTM architecture that synergistically models multiscale local features and complex dynamics through its MCNNBiLSTM module while capturing nonlinear interdependencies via MGAT. This comprehensive feature representation enables robust and highly accurate fault detection. (3) The study found that the application of spectral preprocessing techniques yields a marked and statistically significant enhancement in diagnostic performance. A comprehensive and systematic analysis was undertaken to uncover the underlying reasons for this observed performance improvement. (4) To emulate challenging industrial settings and cost-sensitive implementations, noise signal injection was employed to evaluate model robustness in high-electromagnetic-interference environments and low-cost, low-resolution ADC implementations. 
Experimental validation on real-world industrial robot datasets demonstrates that MGAT-MCNNBiLSTM achieves a superior diagnostic accuracy of 90.7560%. This performance marks a significant absolute improvement of 1.51–8.55% over competing models, including LCNNBiLSTM, SCNNBiLSTM, MCNNBiLSTM, GAT, and MGAT. Under challenging noise and low-resolution conditions, the proposed model consistently outperforms CNNBiLSTM variants, GAT, and MGAT with an improvement of 1.37–10.26%, indicating enhanced industrial utility and deployment potential.

Keywords: industrial robot; fault diagnosis; graph attention network; CNNBiLSTM

1. Introduction

Industrial robots (IRs) serve as essential equipment for high-precision manufacturing, supporting applications ranging from automotive welding to semiconductor assembly. They significantly improve productivity, positioning consistency, and personnel safety. According to the International Federation of Robotics (IFR), global installations of IRs reached 542,000 units in 2024, with China accounting for 54% of the total. The IFR forecasts a positive growth trajectory, with the annual installation rate at an average of approximately 7% in the coming years [1]. However, early-stage mechanical failures in key drivetrain components continue to compromise positioning accuracy, operational reliability, and service life. Consequently, there is an urgent industrial demand for diagnostic strategies that ensure high accuracy and robustness to minimize unplanned downtime and operational costs.

However, addressing this demand presents fundamental challenges in terms of practical deployability and superior diagnostic capabilities. Traditional methods for diagnosing mechanical faults rely primarily on vibration or acoustic sensing [2,3,4]. For instance, He et al. [5] directly mounted vibration sensors on robot links to identify harmonic reducer faults, while Wu et al. [6] utilized accelerometer signals enhanced by an adaptive particle swarm optimization algorithm to learn discriminative features for IR fault diagnosis. Liu et al. [7] further developed a monitoring framework based on acoustic emission for harmonic reducers. Despite their promise in laboratory settings, these sensing techniques face considerable barriers to real-world implementation, including restrictive mounting requirements, susceptibility to environmental interference, and high hardware costs. Specifically, triaxial accelerometers often require invasive installation, which is particularly challenging for sealed joints and high-speed axes. The fidelity of the acquired data is highly dependent on sensor placement precision in industrial robots. Hu et al. [8] emphasized that an optimized configuration is essential for reliable health assessment, as mispositioned sensors degrade signal integrity and diagnostic accuracy. Similarly, acoustic sensors are prone to contamination from ambient factory noise. Moreover, retrofitting production-line industrial robots with multi-axis sensing systems involves significant hardware investment and additional signal processing expenses. These collective challenges ultimately diminish the economic feasibility of widespread industrial deployment.

Energy-Based Maintenance (EBM) has emerged as a pivotal theoretical and practical framework for modern predictive maintenance, positing that energy-related metrics serve as primary and direct indicators of system health and mechanical integrity [9,10,11]. In electromechanical systems like IRs, the motor current is a fundamental energy-based signal, directly reflecting electromagnetic torque and the efficiency of the drivetrain. Therefore, from an EBM perspective, motor current signature analysis (MCSA) is not merely a convenient alternative but the theoretically preferred modality for fault diagnosis. MCSA also addresses the deployment barriers described above, providing a non-invasive and economically feasible path for reliable maintenance. This approach has several advantages: (1) Non-invasive and cost-effective implementation: By leveraging the current monitoring already present in servo drives, it eliminates the need for additional sensors or retrofitting. (2) Strong resilience to environmental conditions: Current measurements are largely unaffected by factors such as temperature fluctuations, humidity, dust, and mechanical vibrations, enabling stable signal acquisition even in harsh industrial settings. (3) Theoretical alignment with EBM: Most significantly, it provides direct electromechanical coupling, capturing fault-induced torque variations. This makes the current signal a primary energy-based indicator, fundamentally aligning with EBM’s core principle.

Signal processing techniques have become important tools for diagnostic applications [12]. For instance, Raouf et al. [13] used statistical metrics and kinetic parameters derived from motor currents for dimensionality reduction. Lee et al. [14] applied wavelet packet decomposition to extract degradation-sensitive indicators, while Wang et al. [15] integrated variational mode decomposition with support vector machines for fault detection. However, manual feature engineering remains subjective and often limits the generalization of the model across different IRs, necessitating more robust data-driven approaches.

Deep Learning (DL) has consequently emerged as the predominant methodology, owing to its capacity for autonomous feature representation learning directly from raw sensor signals. Convolutional Neural Networks (CNNs) are widely employed to extract discriminative spatial features, eliminating the labor-intensive process of manual feature engineering [16,17,18,19,20]. However, conventional CNNs are constrained by fixed kernel sizes and local receptive fields, which limit their ability to model long-range temporal dependencies. To mitigate this deficiency, Recurrent Neural Networks, particularly Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM), are utilized to model complex temporal dynamics and non-stationary fault signatures [21,22,23,24]. Recognizing the need to capture both spatial and temporal dependencies simultaneously, researchers have developed hybrid models such as CNN-LSTM/CNN-BiLSTM, which have demonstrated superior diagnostic performance compared to individual models [25,26,27].

Despite these advances, a fundamental limitation persists: conventional CNNs and LSTMs operate on Euclidean data, thereby failing to capture the complex, non-Euclidean interdependencies and system-level interactions. Although Graph Neural Networks (GNNs) attempt to bridge this gap by modeling topological relationships [28,29,30], conventional GNNs use fixed aggregation weights determined solely by graph connectivity, which ignores the physical significance of interactions and lacks adaptive learning. While Graph Attention Networks (GATs) and Multi-head GATs (MGATs) introduce adaptive weighting mechanisms [31,32], their application in IR fault diagnosis often lacks synergistic integration with multi-scale feature extraction and temporal sequence modeling.

Although EBM provides a theoretical rationale for utilizing motor current signals, existing deep learning methods have not fully exploited the rich informational potential of current signals for IR fault diagnosis. Diagnostic performance remains constrained by three key issues: (1) The challenge of extracting discriminative features comprehensively across multiple scales. (2) The difficulty in dynamically modeling the complex, non-Euclidean interdependencies among features. (3) The ineffective integration of temporal dynamics with structural relationships. These problems constrain the model’s discriminative power and prevent the full realization of reliable, cost-effective maintenance solutions.

To bridge the identified theory-practice gap in EBM and enable highly accurate and robust fault diagnosis for reliable maintenance of IRs, this study proposes a hybrid deep learning framework, termed the Multi-head Graph Attention Network with Multi-scale CNNBiLSTM Fusion (MGAT-MCNNBiLSTM). This novel architecture enables an end-to-end synergistic fusion by integrating a multi-head graph attention mechanism to model complex feature interdependencies and a multi-scale CNNBiLSTM to hierarchically extract multi-resolution discriminative features. Figure 1 illustrates the proposed energy-metric-driven framework for IR fault diagnosis.

The main contributions of this work are summarized as follows:

Based on the EBM principle, motor current is established as the most effective and practical choice for enabling cost-efficient and scalable IR fault diagnosis, owing to its direct reflection of torque coupling dynamics under mechanical faults. A corresponding dataset of motor current has been acquired from IRs operating under diverse fault scenarios.
This study extends the EBM theoretical framework for IR fault diagnosis by proposing an MGAT-MCNNBiLSTM architecture that integrates MGAT with MCNNBiLSTM, effectively capturing both temporal dynamics at varying resolutions and structural dependencies within feature graphs. This dual-path design facilitates the fusion of temporal-spectral attributes with spatio-graph relationships. Comparative trials demonstrate its consistent superiority over competing architectures, including LCNNBiLSTM, SCNNBiLSTM, MCNNBiLSTM, GAT, and MGAT.
Our study of these current-signal-based fault diagnosis models reveals that applying spectral preprocessing yields a statistically significant enhancement in diagnostic performance. Based on our experimental results, we then carried out a systematic analysis to elucidate the mechanisms responsible for this improvement.
To evaluate diagnostic robustness under realistic industrial operating regimes and cost constraints, noise signal injection was employed to simulate high-electromagnetic-interference environments and low-cost, low-resolution ADC implementations. The proposed MGAT-MCNNBiLSTM model demonstrated consistent performance superiority over CNNBiLSTM variants, GAT, and MGAT benchmarks across these challenging scenarios. These results verify the model’s practical applicability in electrically noisy industrial settings and its compatibility with low-cost, low-resolution hardware. This capability directly supports the industry’s goal of achieving reliable predictive maintenance while overcoming deployment challenges associated with hardware costs and environmental interference.

The remaining sections of this paper are organized as follows: Section 2 details the theoretical background and framework of the proposed fault diagnosis method for IRs. Section 3 presents the experimental results, while Section 4 provides a detailed discussion. Finally, Section 5 concludes the paper.

2. Theoretical Background and Framework of Proposed Method

2.1. Signal Selection and Fault Definition

2.1.1. Signal Selection

For IRs, current-based fault diagnosis is often more practical to deploy due to its inherent compatibility with existing drive systems and standardized data interfaces. Unlike vibration monitoring, which relies on expensive accelerometers susceptible to installation misalignment, or acoustic methods that are vulnerable to ambient noise, MCSA utilizes current sensors already embedded in servo systems. A detailed comparison of MCSA with vibration and acoustic methods for fault diagnosis is presented in Figure 2. By avoiding additional hardware, this approach reduces installation complexity and directly captures electromagnetic torque variations induced by mechanical faults. Consequently, MCSA enables low-cost deployment and millisecond-level fault response, meeting the high-reliability and real-time demands of industrial applications.

2.1.2. Fault Definition

In the evaluation of robotic performance, beyond standard product specifications, end-effector positioning repeatability $δ_{p}$ represents a pivotal criterion for fault stratification. This priority originates from its direct correlation with core mechanical integrity and operational performance, in contrast to indirect indicators such as vibration or temperature. $δ_{p}$ rigorously quantifies positional consistency during repeated task executions, and is sensitive to underlying mechanical degradation mechanisms.

As Figure 3 shows, these mechanisms are primarily categorized into transmission system anomalies, structural system instabilities, motion interface imperfections, and environmental interaction perturbations. Collectively, these factors determine the positional consistency of IR manipulators during cyclic operations. This makes positioning repeatability a powerful and robust diagnostic indicator of mechanical faults in IRs.

Figure 4 illustrates the methodology for measuring the positioning repeatability of our IR end-effector using a laser displacement sensor (LK-G85, Keyence, Osaka, Japan). The measured positioning repeatability $δ_{m}$ is calculated using the following equation:

δ_{m} = \frac{1}{N_{m}} \sum_{i_{m} = 1}^{N_{m}} \sqrt{δ_{x}^{2} + δ_{y}^{2} + δ_{z}^{2}}

(1)

where $δ_{x}$, $δ_{y}$, and $δ_{z}$ are the measured positioning repeatability values along the X-axis, Y-axis, and Z-axis, respectively, and $N_{m}$ is the total number of measurements.

Given the OEM-specified baseline $δ_{s}$ = 0.05 mm, Table 1 classifies fault severity levels for IRs based on $δ_{m} / δ_{s}$ deviation thresholds.
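As a rough illustration of Equation (1) and of the ratio used for severity stratification, the following Python sketch averages the Euclidean norm of the per-axis deviations and divides the result by the OEM baseline. It assumes that the three axis deviations are recorded separately for each of the $N_{m}$ measurements; the function names and sample values are hypothetical, and the Table 1 thresholds are deliberately not reproduced here.

```python
import math

DELTA_S_MM = 0.05  # OEM-specified baseline from the text, in mm


def measured_repeatability(deviations):
    """Mean Euclidean norm of per-measurement (dx, dy, dz) deviations, in mm."""
    if not deviations:
        raise ValueError("at least one measurement is required")
    norms = [math.sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in deviations]
    return sum(norms) / len(norms)


def severity_ratio(deviations, delta_s=DELTA_S_MM):
    """Ratio delta_m / delta_s, the quantity that Table 1 maps to a severity level."""
    return measured_repeatability(deviations) / delta_s


# Hypothetical measurements (mm); these are not data from the paper.
samples = [(0.03, 0.04, 0.0), (0.0, 0.0, 0.05)]
print(severity_ratio(samples))
```

A ratio close to 1 means the measured repeatability matches the specification; larger ratios correspond to the more severe categories.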

Figure 5 illustrates a systematic data acquisition framework designed to construct a comprehensive dataset reflecting real-world industrial scenarios. Under continuous cyclic operation, industrial robots undergo gradual mechanical degradation, which manifests as a measurable escalation in end-effector positioning repeatability error. Guided by the stratification criteria defined in Table 1, this degradation spectrum is categorized into five distinct severity levels ranging from normal operation to critical failure risk. Subsequently, motor current signals are acquired from IR systems corresponding to these specific fault categories. Data acquisition across these levels provides the necessary dataset for developing intelligent fault diagnosis models.

2.2. Convolutional Neural Network and Long Short-Term Memory

2.2.1. Convolutional Neural Network

CNNs provide data-driven fault diagnosis through hierarchical deep learning architectures. As feed-forward networks, they employ cascaded convolution and pooling operations to learn discriminative, multi-level feature representations automatically. This architecture is particularly effective for processing complex signal modalities, including spectrograms and multivariate time-series data. Fundamentally, these networks employ weight-sharing convolutional filters to capture spatially invariant features.

The convolution operation at the layer $l_{c}$ is defined as

y_{i_{c}}^{l_{c} + 1} (j_{c}) = \sum_{k_{c} = 1}^{K_{c}} W_{i_{c}}^{l_{c}} (k_{c}) \cdot x^{l_{c}} (j_{c} + k_{c} - 1) + b_{i_{c}}^{l_{c}}

(2)

where $j_{c}$ is the spatial position index, $K_{c}$ is the convolutional kernel size, $W_{i_{c}}^{l_{c}}$ and $b_{i_{c}}^{l_{c}}$ are the weight kernel and bias term for the $i_{c}$-th filter, and $x^{l_{c}} (j_{c})$ is the input feature map at position $j_{c}$.
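To make Equation (2) concrete, the sketch below applies a single filter to a one-dimensional input in plain Python. It assumes stride 1 and no padding, and it uses 0-based indices instead of the 1-based indices of the equation; the function name and the sample kernel are illustrative only.

```python
def conv1d_valid(x, w, b=0.0):
    """Apply one filter: y[j] = sum_k w[k] * x[j + k] + b (0-based form of Eq. (2))."""
    k = len(w)
    if k == 0 or k > len(x):
        raise ValueError("kernel must be non-empty and no longer than the input")
    return [
        sum(w[i] * x[j + i] for i in range(k)) + b
        for j in range(len(x) - k + 1)
    ]


# Hypothetical input and difference-like kernel, chosen only for illustration.
print(conv1d_valid([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0], b=0.5))
```

Because the same kernel slides over every position, the number of trainable weights depends on the kernel size rather than on the input length, which is the weight sharing mentioned above.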

In CNNs, max-pooling is widely used as the predominant downsampling method. This process extracts the most prominent response within each sub-region, effectively capturing distinctive local patterns while curbing computational demands through dimensionality reduction. The output feature map at the $(l_{c} + 1)$-th layer for the $i_{c}$-th channel after pooling can be described as

p_{i_{c}}^{l_{c} + 1} (j_{p}) = \max_{(j_{p} - 1) W_{p} + 1 \leq i_{p} \leq j_{p} W_{p}} \{y_{i_{c}}^{l_{c}} (i_{p})\}

(3)

where $j_{p}$ indexes output spatial positions after pooling, $y_{i_{c}}^{l_{c}}$ denotes the output of the convolution layer at position $i_{p}$ in the $i_{c}$-th channel, and $W_{p}$ represents the width of the pooling kernel. The index $i_{p}$ spans all spatial positions within the pooling window associated with the output position $j_{p}$.
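The pooling rule in Equation (3) can be sketched in a few lines of Python. The sketch assumes non-overlapping windows (stride equal to the pooling width, as the index range in the equation implies) and drops trailing samples that do not fill a complete window, which is an assumption rather than something the text specifies.

```python
def max_pool1d(y, width):
    """Non-overlapping max-pooling: output j covers inputs j*width .. (j+1)*width - 1."""
    if width <= 0:
        raise ValueError("width must be positive")
    n_out = len(y) // width  # trailing samples that do not fill a window are dropped
    return [max(y[j * width:(j + 1) * width]) for j in range(n_out)]


# Hypothetical feature map with seven samples and a pooling width of two.
print(max_pool1d([1, 5, 2, 4, 3, 0, 9], 2))
```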

2.2.2. Long Short-Term Memory

LSTM networks overcome critical shortcomings of traditional recurrent neural networks, including the temporal gradient vanishing and explosion phenomena, while preserving robust modeling of sequential dependency. This architecture regulates information flow across time steps through dedicated gating mechanisms—input, forget, and output gates, as depicted in Figure 6.

The core innovation lies in the memory cell and how its gates meticulously regulate state updates:

Input Gate: Regulates integration of the current input $x_{t}$ and the previous hidden state $h_{t - 1}$ into the cell state $c_{t}$:

$i_{t} = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i})$

(4)
Forget Gate: Controls the retention or discard of the historical cell state $c_{t - 1}$ :

$f_{t} = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f})$

(5)
Output Gate: Modulates the exposure of the cell state $c_{t}$ to update the hidden state $h_{t}$ :

$o_{t} = σ (W_{o} x_{t} + U_{o} h_{t - 1} + b_{o})$

(6)

The memory cell state evolves through

{\tilde{c}}_{t} = \tanh (W_{c} x_{t} + U_{c} h_{t - 1} + b_{c})

(7)

c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ {\tilde{c}}_{t}

(8)

h_{t} = o_{t} ⊙ \tanh (c_{t})

(9)

where $σ$ is the sigmoid activation function; $⊙$ denotes element-wise multiplication; $b_{i}$, $b_{f}$, $b_{o}$, and $b_{c}$ are the bias vectors of the respective components; and $W_{ς}$ and $U_{ς}$ represent input-hidden and hidden-hidden weight matrices, respectively, with unique parameters for the gates ($ς = i, f, o$) and the cell ($ς = c$). At each timestep $t$, gate activations and state transitions derive from the current input $x_{t}$ and the previous hidden state $h_{t - 1}$.
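Equations (4)–(9) describe a single time step, which the following plain-Python sketch mirrors line by line. The parameter layout (a dictionary mapping each of `i`, `f`, `o`, `c` to a `(W, U, b)` tuple of nested lists) and the helper names are assumptions made for illustration; a practical model would rely on a deep learning library instead.

```python
import math


def _sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def _affine(p, x, h):
    """Compute W x + U h + b for p = (W, U, b) given as nested lists."""
    W, U, b = p
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'i', 'f', 'o', 'c' to (W, U, b)."""
    i_t = _sigmoid(_affine(params["i"], x_t, h_prev))  # Eq. (4)
    f_t = _sigmoid(_affine(params["f"], x_t, h_prev))  # Eq. (5)
    o_t = _sigmoid(_affine(params["o"], x_t, h_prev))  # Eq. (6)
    c_tilde = [math.tanh(a) for a in _affine(params["c"], x_t, h_prev)]  # Eq. (7)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]  # Eq. (8)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (9)
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved at each step; this makes a convenient sanity check for the gating arithmetic.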

2.3. Framework of the CNNBiLSTM-Based Fault Diagnosis Model

Industrial robot systems operating under complex dynamic conditions generate highly coupled and non-stationary signals, which pose significant challenges to conventional fault diagnosis approaches. Current deep learning approaches also face specific shortcomings: CNNs extract spatially local features through fixed-kernel convolutions but lack dynamic temporal modeling capability, while BiLSTM networks capture temporal dependencies yet inherently ignore spatial correlations.

To address these issues, we employ a CNNBiLSTM model that combines spatial feature extraction with bidirectional temporal modeling. In this framework, convolutional layers identify discriminative spatial patterns, while the BiLSTM modules capture contextual temporal dependencies. This combination yields a unified spatiotemporal representation, which enhances the accuracy of fault diagnosis in IRs.

As illustrated in Figure 7, the CNNBiLSTM fault diagnosis framework processes signals through a hierarchical cascade:

Spectral Transformation Module: The initial processing stage employs the Fourier transform to convert the raw time-domain signals into the frequency-domain representations. This transformation reveals latent spectral features that are critical for distinguishing different fault types.
Convolutional Layers: These layers apply convolutional kernels to the spectral inputs to extract salient spatial features indicative of fault signatures.
Pooling Layers: Subsequent max-pooling operations downsample the feature maps by retaining the most activated values, which serves to reduce data dimensionality and enhance invariance to small signal shifts.
BiLSTM Layers: The spatial features are then sequenced and processed by BiLSTM layers. By analyzing the sequence in both forward and backward directions, this module captures the temporal evolution of fault patterns across operational cycles.
Fully Connected Layers: The high-level spatiotemporal features from the BiLSTM are integrated by fully connected layers, combining them into a unified representation for final classification.
Output Layer: A softmax function produces the final fault probability distribution, and the entire network is trained by minimizing the cross-entropy loss.
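The first stage of the cascade above, the spectral transformation, can be illustrated with a direct discrete Fourier transform in plain Python. This is a minimal sketch under stated assumptions: the one-sided magnitude spectrum and the division by the signal length are illustrative choices not taken from the paper, the function name is hypothetical, and a real pipeline would use an FFT routine rather than this quadratic-time loop.

```python
import cmath
import math


def magnitude_spectrum(signal, fs):
    """Return (frequencies in Hz, magnitudes) for the one-sided DFT of a real signal."""
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    freqs, mags = [], []
    for k in range(n // 2 + 1):  # bins from DC up to the Nyquist frequency
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal)
        )
        freqs.append(k * fs / n)
        mags.append(abs(coeff) / n)
    return freqs, mags


# Hypothetical test tone: a 50 Hz sine sampled at 1 kHz for 0.1 s.
fs = 1000.0
tone = [math.sin(2 * math.pi * 50.0 * t / fs) for t in range(100)]
freqs, mags = magnitude_spectrum(tone, fs)
peak = max(range(len(mags)), key=mags.__getitem__)
print(freqs[peak])
```

The periodic component concentrates in a single frequency bin, which suggests why a frequency-domain input can expose fault-related periodicities that are spread over many samples in the time domain.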

2.4. Framework of the MCNNBiLSTM-Based Fault Diagnosis Model

The complex nonlinear dynamics inherent in IR systems distribute fault-related features across multiple temporal scales, requiring multi-resolution analysis for accurate diagnosis. Although standard CNNBiLSTM models track temporal evolution, their fixed-scale convolutional kernels are inadequate for separating overlapping fault signatures that reside in distinct spectrotemporal regions. This inflexibility of the receptive fields limits multi-scale feature adaptation and impedes the modeling of cross-scale correlations.

We present a multi-scale CNNBiLSTM (MCNNBiLSTM) framework for IR fault diagnosis. The model features a dual-branch architecture designed to learn hierarchical fault representations. As shown in Figure 8, it comprises two complementary feature extraction pathways: a large-scale branch (LCNNBiLSTM) captures gradual degradation trends, while a small-scale branch (SCNNBiLSTM) identifies transient fault signatures. Each branch employs BiLSTM modules to model temporal dependencies. Subsequently, the outputs are concatenated to integrate multi-scale temporal features, thereby enhancing the robustness of fault diagnosis.

2.5. Framework of the GAT/MGAT-Based Fault Diagnosis Model

2.5.1. Graph Attention Network

Graph Attention Networks (GATs) enhance standard graph convolutional structures by integrating a masked self-attention mechanism, which dynamically computes attention weights for neighboring nodes. This design enables GATs to focus on the most relevant features when combining information from neighboring nodes. This process strengthens important structural patterns while reducing noise interference. Consequently, GATs generate topology-aware feature representations that significantly improve the robustness and classification accuracy of graph-based diagnostic systems.

GATs operate on graph-structured data characterized by a node feature matrix $H = \{h_{1}, h_{2}, \dots, h_{N_{g}}\}$, where each node $h_{i} \in R^{D_{g}}$ carries a $D_{g}$-dimensional feature vector within an $N_{g}$-node topology. Through attention-driven nonlinear transformations, the architecture derives discriminative representations in latent topological spaces:

H^{'} = \mathrm{GAT} (H, G; Θ)

(10)

where $G$ defines the graph connectivity, and $Θ$ denotes trainable parameters. The output embeddings $H^{'} = \{h_{1}^{'}, h_{2}^{'}, \dots, h_{N_{g}}^{'}\}$ with $h_{i}^{'} \in R^{D_{g}^{'}}$ demonstrate enhanced representational capacity for fault patterns.

The transformation involves two sequential stages, as shown in Figure 9:

1. Feature Projection

A trainable shared parameter matrix $W_{GAT} \in R^{D_{g}^{'} \times D_{g}}$ transforms input features into a latent representation:

z_{i} = W_{GAT} h_{i}, \quad \forall i \in \{1, \dots, N_{g}\}

(11)

This transformation maintains inherent structural relationships while simultaneously capturing complex feature representations.

2. Attention-Driven Aggregation

With graph attention mechanisms, normalized attention weights $α_{i j}$ quantify the relational significance of neighbors $j \in N (i)$ for each central node $i$.

Attention Coefficient:

e_{i j} = \mathrm{LeakyReLU} (α^{⊤} [z_{i} ∥ z_{j}]) = \mathrm{LeakyReLU} (α^{⊤} [W_{GAT} h_{i} ∥ W_{GAT} h_{j}])

(12)

where $α \in R^{2 D_{g}^{'}}$ denotes the attention parameter vector, and $∥$ represents feature concatenation.

Normalized Attention Weights:

α_{i j} = \frac{\exp (e_{i j})}{\sum_{k \in N (i)} \exp (e_{i k})} = \frac{\exp (\mathrm{LeakyReLU} (α^{⊤} [W_{GAT} h_{i} ∥ W_{GAT} h_{j}]))}{\sum_{k \in N (i)} \exp (\mathrm{LeakyReLU} (α^{⊤} [W_{GAT} h_{i} ∥ W_{GAT} h_{k}]))}

(13)

where $N (i) = \{j \in V ∣ (i, j) \in E\}$ defines the neighborhood of node $i$, with $V$ and $E$ being the node and edge sets of the graph, respectively.

The aggregated feature representation is given by

h_{i}^{'} = σ (\sum_{j \in N (i)} α_{i j} z_{j})

(14)
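The two stages in Equations (11)–(14) can be sketched as a single-head layer in plain Python. Several details are assumptions made for illustration and are not specified in the text: the LeakyReLU slope of 0.2, the use of ReLU as the output activation, the requirement that every node has at least one neighbor (for example a self-loop), and all function and argument names.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(h, neighbors, W, a, activation=lambda v: max(v, 0.0)):
    """Single-head graph attention following Eqs. (11)-(14).

    h: list of node feature vectors; neighbors: dict node index -> list of
    neighbor indices (non-empty for every node); W: projection matrix as a list
    of rows; a: attention vector of length 2 * len(W).
    """
    # Eq. (11): shared linear projection z_i = W h_i
    z = [[sum(w * x for w, x in zip(row, h_i)) for row in W] for h_i in h]
    out, alphas = [], []
    for i in range(len(h)):
        nbrs = neighbors[i]
        # Eq. (12): unnormalized scores on the concatenation [z_i || z_j]
        e = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in nbrs]
        # Eq. (13): softmax over the neighborhood (shifted by the max for stability)
        m = max(e)
        exp_e = [math.exp(v - m) for v in e]
        total = sum(exp_e)
        alpha = [v / total for v in exp_e]
        alphas.append(dict(zip(nbrs, alpha)))
        # Eq. (14): attention-weighted sum of projected neighbors, then activation
        agg = [sum(al * z[j][d] for al, j in zip(alpha, nbrs)) for d in range(len(W))]
        out.append([activation(v) for v in agg])
    return out, alphas
```

When the attention vector is all zeros, every neighbor receives the same weight and the layer reduces to a plain neighborhood average; training the attention vector is what lets the layer depart from that fixed weighting.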

2.5.2. Multi-Head GAT-Based Fault Diagnosis Model

To improve learning stability, the multi-head attention mechanism extracts features in parallel. As shown in Figure 10, it employs multiple independent attention units, each capturing a distinct hidden representation. Then, these outputs are combined into a unified feature vector.

This process generates two different output representations:

1. Concatenated Features: Multiple independent attention heads concatenate their outputs into high-dimensional representations.

h_{GAT1, i}^{'} = \Big\Vert_{m = 1}^{N_{hn}} σ_{GAT1} (\sum_{j \in N (i)} α_{i j}^{m} W_{GAT, m} h_{j})

(15)

where $N_{hn}$ is the number of heads.

2. Averaged Features: An alternative approach combines outputs via feature averaging, which produces stable representations with reduced dimensionality.

h_{GAT2, i}^{'} = σ_{GAT2} (\frac{1}{N_{hn}} \sum_{k = 1}^{N_{hn}} \sum_{j \in N (i)} α_{i j}^{k} W_{GAT, k} h_{j})

(16)

In this study, the second strategy was adopted to optimize computational efficiency and operational resilience. Figure 11 illustrates the GAT/MGAT-based fault diagnosis model for IRs.
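The difference between the two combination strategies in Equations (15) and (16) is easiest to see on the per-head aggregates of a single node. The sketch below takes those aggregates as given (one vector per head) and only shows how they are merged; the function name, the `mode` argument, and the ReLU activation are illustrative assumptions.

```python
def combine_heads(head_sums, mode="average", activation=lambda v: max(v, 0.0)):
    """Merge per-head aggregates (sum over j of alpha_ij^k W_k h_j) for one node.

    head_sums: list of equal-length vectors, one per attention head.
    """
    if not head_sums:
        raise ValueError("at least one head is required")
    if mode == "concat":
        # Eq. (15): activate each head, then concatenate (dimension grows with heads)
        return [activation(v) for head in head_sums for v in head]
    if mode == "average":
        # Eq. (16): average across heads, then activate (dimension unchanged)
        n = len(head_sums)
        return [activation(sum(col) / n) for col in zip(*head_sums)]
    raise ValueError("mode must be 'concat' or 'average'")


# Two hypothetical heads with two output features each.
heads = [[1.0, -2.0], [3.0, 4.0]]
print(combine_heads(heads, mode="concat"))
print(combine_heads(heads, mode="average"))
```

Averaging keeps the output dimension independent of the number of heads, which is consistent with the efficiency argument given for adopting the second strategy.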

2.6. Framework of the MGAT-MCNNBiLSTM-Based Fault Diagnosis Model

Accurate fault diagnosis in IRs requires the concurrent analysis of complex temporal dynamics and spatial dependencies within sensor data. To address this challenge and further improve diagnostic accuracy, we propose MGAT-MCNNBiLSTM, a novel architecture integrating the Multi-head Graph Attention Network (MGAT) with the Multi-scale Convolutional Bidirectional Long Short-Term Memory network (MCNNBiLSTM) for IR fault diagnosis, as illustrated in Figure 12.

This synergistic design capitalizes on complementary strengths: MCNNBiLSTM captures multi-resolution features and bidirectional long-range temporal dynamics, while MGAT directly models heterogeneous structural dependencies across multi-relational graphs. The integrated framework comprehensively characterizes inherent spatiotemporal interactions in the complex industrial system. It preserves essential temporal dynamics while substantially enhancing spatial discriminative capacity, ultimately achieving improved fault classification accuracy.

The complete MGAT-MCNNBiLSTM fault diagnosis model follows a structured workflow, as detailed in Algorithm 1.

Algorithm 1: MGAT-MCNNBiLSTM Fault Diagnosis Model

Input: Raw joint current signals $S \in R^{N_{s} \times N_{b}}$
Output: Predicted class probabilities $P (\hat{y} = c | O_{out})$

1: Procedure Spectral Transformation Module ($S$):
2: $X_{freq} \leftarrow F (S)$, $X_{freq} \in R^{N_{f} \times N_{b}}$
3: return $X_{freq}$
4: end Procedure
5: Procedure MCNNBiLSTM Module ($X_{freq}$):
6: // Branch 1: LCNNBiLSTM
7: $X_{1, LC} \leftarrow X_{freq}$
8: for $j = 1$ to $N_{LC}$ do
9: if $j > 1$ then
10: $X_{j, LC} \leftarrow O_{j - 1, LC}$
11: end if
12: $O_{j, LC} \leftarrow {OP}_{j, LC} (X_{j, LC}) \in R^{N_{j, LC} \times N_{b}}$, ${OP}_{j, LC} \in$ {Conv, ReLU, MaxPool}
13: end for
14: $O_{LC} \leftarrow O_{N_{LC}, LC}$
15: $X_{1, LB} \leftarrow O_{LC}$
16: for $j = 1$ to $N_{LB}$ do
17: if $j > 1$ then
18: $X_{j, LB} \leftarrow O_{j - 1, LB}$
19: end if
20: $O_{j, LB} \leftarrow {OP}_{j, LB} (X_{j, LB}) \in R^{N_{j, LB} \times N_{b}}$, ${OP}_{j, LB} \in$ {BiLSTM}
21: end for
22: $O_{LB} \leftarrow O_{N_{LB}, LB}$
23: // Branch 2: SCNNBiLSTM
24: $X_{1, SC} \leftarrow X_{freq}$
25: for $j = 1$ to $N_{SC}$ do
26: if $j > 1$ then
27: $X_{j, SC} \leftarrow O_{j - 1, SC}$
28: end if
29: $O_{j, SC} \leftarrow {OP}_{j, SC} (X_{j, SC}) \in R^{N_{j, SC} \times N_{b}}$, ${OP}_{j, SC} \in$ {Conv, ReLU, MaxPool}
30: end for
31: $O_{SC} \leftarrow O_{N_{SC}, SC}$
32: $X_{1, SB} \leftarrow O_{SC}$
33: for $j = 1$ to $N_{SB}$ do
34: if $j > 1$ then
35: $X_{j, SB} \leftarrow O_{j - 1, SB}$
36: end if
37: $O_{j, SB} \leftarrow {OP}_{j, SB} (X_{j, SB}) \in R^{N_{j, SB} \times N_{b}}$, ${OP}_{j, SB} \in$ {BiLSTM}
38: end for
39: $O_{SB} \leftarrow O_{N_{SB}, SB}$
40: // Feature fusion and projection
41: $O_{fusion, MCB} \leftarrow [O_{LB}; O_{SB}]$
42: $O_{MCB} \leftarrow W_{FC, MCB} O_{fusion, MCB} + B_{FC, MCB}$
43: return $O_{MCB}$
44: end Procedure
45: Procedure MGAT Module ($X_{freq}$):
46: $X_{1, GAT} \leftarrow X_{freq}$
47: for $j = 1$ to $N_{GAT}$ do
48: if $j > 1$ then
49: $X_{j, GAT} \leftarrow O_{j - 1, GAT}$
50: end if
51: $O_{j, GAT} \leftarrow {OP}_{j, GAT} (X_{j, GAT}) \in R^{N_{j, GAT} \times N_{b}}$, ${OP}_{j, GAT} \in$ {Multi-head GAT, BatchNorm, ReLU}
52: end for
53: $O_{GAT} \leftarrow W_{FC, GAT} O_{N_{GAT}, GAT} + B_{FC, GAT}$
54: return $O_{GAT}$
55: end Procedure
56: Procedure Feature Fusion & Classification ($O_{MCB}$, $O_{GAT}$):
57: $O_{fusion} \leftarrow [O_{MCB}; O_{GAT}]$
58: $O_{FC} \leftarrow W_{FC} O_{fusion} + B_{FC}$, $O_{FC} \in R^{N_{fc}}$
59: $O_{out} \leftarrow W_{out} O_{FC} + B_{out}$, $O_{out} \in R^{N_{out}}$
60: $P (\hat{y} = c | O_{out}) \leftarrow \mathrm{Softmax} (O_{out})$, $c \in \{1, \dots, N_{c}\}$
61: return $P (\hat{y} = c | O_{out})$
62: end Procedure

This algorithm executes the spectral transformation, the MCNNBiLSTM module, the MGAT module, and the feature fusion & classification module in sequence to obtain the IR fault diagnosis result.

1.: Spectral Transformation Module

The raw joint current signals S \in R^{N_{s} \times N_{b}} are sampled at f_{s} = 1 kHz from the industrial robot servo drives, where N_{s} and N_{b} are the signal length and the batch size, respectively.

The Fourier transform converts the time-domain signals into the frequency-domain representation X_{freq}, which corresponds to lines 1–4 in Algorithm 1. N_{f} denotes the number of frequency bins (N_{f}-point FFT).
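
As a minimal sketch of the spectral transformation F, the snippet below computes a one-sided amplitude spectrum with a naive DFT in pure Python. The function name `magnitude_spectrum`, the use of the magnitude rather than the complex spectrum, and the one-sided scaling are assumptions made for illustration; the text does not specify these details, and a practical pipeline would normally call an FFT routine instead.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """One-sided amplitude spectrum of a real signal via a naive O(N^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        # Scale so that a unit-amplitude sinusoid yields a peak of about 1.0.
        scale = 1.0 / n if k in (0, n / 2) else 2.0 / n
        spectrum.append(abs(acc) * scale)
    return spectrum


# Illustrative input (assumed): a 50 Hz sinusoid sampled at f_s = 1 kHz for 0.2 s.
fs = 1000
samples = [math.sin(2 * math.pi * 50 * i / fs) for i in range(200)]
spec = magnitude_spectrum(samples)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
peak_hz = peak_bin * fs / len(samples)
print(peak_hz)
```

With 200 samples at 1 kHz the bin spacing is 5 Hz, so the test tone falls exactly on one bin and the energy that is spread over the whole time-domain record collapses into a single spectral line.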

2.: MCNNBiLSTM Module

The MCNNBiLSTM module implements a dual-branch architecture for extracting multi-scale spatiotemporal features. The complete algorithmic procedure is detailed in Algorithm 1 (lines 5–44).

The LCNNBiLSTM branch processes spectral features through N_{LC} CNN layers followed by N_{LB} BiLSTM layers, producing the output O_{LB}.

The SCNNBiLSTM branch follows a similar structure with independent parameters (N_{SC} CNN layers and N_{SB} BiLSTM layers), producing the output O_{SB}.

The outputs from both branches are concatenated and linearly projected to obtain the final MCNNBiLSTM representation O_{MCB}.
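
The dual-branch idea can be sketched in a few lines of pure Python. The example below only illustrates the multi-scale convolutional front end (Conv, ReLU, MaxPool with a large and a small kernel) and the concatenation step; the BiLSTM layers and the linear projection are deliberately omitted, and the kernel values, pooling size, and input vector are hypothetical choices, not the configuration used in the paper.

```python
def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def relu(x):
    return [max(0.0, v) for v in x]


def max_pool(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def branch(x, kernel, pool):
    """One Conv -> ReLU -> MaxPool stage, standing in for a CNN branch."""
    return max_pool(relu(conv1d(x, kernel)), pool)


# Hypothetical spectrum with a narrow peak and a broad hump.
x_freq = [0, 0, 1, 0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0, 0]

large_kernel = [1 / 5] * 5   # wide receptive field (LCNN-like branch)
small_kernel = [-1, 2, -1]   # narrow receptive field (SCNN-like branch)

o_large = branch(x_freq, large_kernel, pool=2)
o_small = branch(x_freq, small_kernel, pool=2)

# Feature fusion by concatenation, mirroring [O_LB; O_SB] in Algorithm 1.
o_fusion = o_large + o_small
print(len(o_large), len(o_small), len(o_fusion))
```

The wide kernel responds to the broad hump, while the narrow kernel reacts mainly to sharp local changes, which is the intuition behind combining both scales before the recurrent layers.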

3.: MGAT Module

The MGAT module processes spectral features to capture interdependencies through graph attention mechanisms. As outlined in Algorithm 1 (lines 45–55), the module applies N_{GAT} graph attention layers, with the final graph representation O_{GAT} obtained via linear projection.
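
To make the attention step concrete, the following sketch implements one multi-head graph attention layer in pure Python using the standard GAT formulation (scores from a LeakyReLU over concatenated projected features, softmax over each node's neighbours, head outputs concatenated). How the paper builds the graph from the spectrum is not shown in this section, so the node features, adjacency lists, and random weights below are hypothetical; BatchNorm and the final projection are omitted. The three heads follow the configuration stated in Section 2.7.

```python
import math
import random


def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v


def gat_head(h, adj, w, a):
    """One attention head: h'_i = sum_j alpha_ij * (W h_j) over neighbours j."""
    wh = [[sum(w[r][c] * hi[c] for c in range(len(hi))) for r in range(len(w))]
          for hi in h]
    out = []
    for i, neighbours in enumerate(adj):
        # Unnormalised scores e_ij = LeakyReLU(a^T [W h_i || W h_j]).
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, wh[i] + wh[j])))
                  for j in neighbours]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alphas = [e / z for e in exps]  # softmax over the neighbourhood
        out.append([sum(al * wh[j][d] for al, j in zip(alphas, neighbours))
                    for d in range(len(w))])
    return out


def multi_head_gat(h, adj, heads):
    """Concatenate the outputs of several independent heads per node."""
    per_head = [gat_head(h, adj, w, a) for w, a in heads]
    return [sum((head[i] for head in per_head), []) for i in range(len(h))]


rng = random.Random(0)
in_dim, out_dim, n_heads = 2, 2, 3
heads = [([[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)],
          [rng.uniform(-1, 1) for _ in range(2 * out_dim)])
         for _ in range(n_heads)]

h = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]   # hypothetical node features
adj = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]           # neighbours incl. self-loops
out = multi_head_gat(h, adj, heads)
print(len(out), len(out[0]))
```

Each node ends up with `n_heads * out_dim` features, so stacking heads widens the representation without changing the graph structure.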

4.: Feature Fusion & Classification Module

Following the procedure in Algorithm 1 (lines 56–62), the outputs from the MCNNBiLSTM and MGAT modules are concatenated, processed through a fully connected layer, and classified via softmax activation to obtain the final fault diagnosis probabilities P(\hat{y} = c | O_{out}).

The cross-entropy loss function quantifies the discrepancy between the predicted class probabilities and the ground-truth labels by computing the negative log-likelihood of the true class. Formally, for a true label vector y \in R^{N_{c}} and a predicted probability vector \hat{y} \in R^{N_{c}}, the loss function L(y, \hat{y}) is defined as

L(y, \hat{y}) = -\sum_{k=1}^{N_{c}} y_{k} \log(\hat{y}_{k}) = -\sum_{k=1}^{N_{c}} y_{k} \log P(\hat{c} = k ∣ θ)

(17)

where y_{k} is the k-th component of y (1 for the true class, 0 otherwise); \hat{y}_{k} = P(\hat{c} = k ∣ θ) is the predicted probability of class k; and N_{c} is the number of classes.

The predicted class label is obtained through

\hat{c} = \arg\max_{k} \hat{y}_{k} \in {1, 2, \dots, N_{c}}

(18)

The complete optimization objective minimizes the average cross-entropy over the training dataset D:

\min_{θ} \frac{1}{|D|} \sum_{(x, y) \in D} L(y, \hat{y}(x; θ))

(19)

where θ encompasses all trainable parameters in the fault diagnosis model.
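
A short numerical sketch of the softmax output, the cross-entropy loss, and the arg-max decision is given below. The logits and the one-hot label are made-up values for a five-class problem, and the small `eps` clamp is an implementation convenience rather than part of the definition.

```python
import math


def softmax(logits):
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Negative log-likelihood of the true class for a one-hot y_true."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_prob))


def predict(y_prob):
    """Arg-max class label, 1-based to match the set {1, ..., N_c}."""
    return max(range(len(y_prob)), key=y_prob.__getitem__) + 1


logits = [0.2, 1.5, -0.3, 0.1, 0.0]   # hypothetical O_out for N_c = 5
y_true = [0, 1, 0, 0, 0]              # true class is class 2
y_prob = softmax(logits)
loss = cross_entropy(y_true, y_prob)
label = predict(y_prob)
print(label, round(loss, 4))
```

Because the label vector is one-hot, only the term of the true class survives in the sum, so the loss equals the negative logarithm of the probability assigned to that class.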

2.7. Experiment Methodology

Industrial robot systems exhibit complex fault signatures during sustained operation. The combined effect of mechanical wear, transient collisions, and thermal stress causes failure mechanisms to evolve dynamically over the operational lifespan. During extended uninterrupted service cycles, cumulative wear in transmission components progressively degrades end-effector positioning repeatability, a key performance metric in production specifications for industrial robots. Consequently, the loss of positioning repeatability provides a measurable proxy for mechanical degradation and serves as a critical prognostic indicator of system health.

To quantify these failure modes, we recorded current signatures at different fault severity levels. The severity was determined based on both positioning repeatability tolerances and assessments from the maintenance team. Five operational states are defined: Normal, Minor, Moderate, Severe, and Critical. Details of the dataset are provided in Table 2.

In our IRs, the analog current signal is discretized through a servo-controlled system equipped with a sigma-delta (ΣΔ) ADC, achieving 15-bit effective resolution to ensure high-fidelity signal acquisition. A sampling frequency of f_{s} = 1 kHz is employed. The discrete-time sequence, denoted as x_{i} (i = 1, 2, ..., M), undergoes subsequent digital processing for temporal or spectral analysis.

The study presents MGAT-MCNNBiLSTM, an end-to-end fault diagnosis model for IRs that directly identifies failure modes from motor current signatures. To benchmark its performance, we compare it against several established architectures: CNNBiLSTM with varying convolutional kernel scales, MCNNBiLSTM, GAT, and MGAT. Their network structures are illustrated in Table 3.

To ensure a fair comparison by minimizing parametric influences, MCNNBiLSTM maintains architectural parity with both LCNNBiLSTM and SCNNBiLSTM across all shared components. The only differences lie in the feature fusion layer and the subsequent linear transformation layer. Similarly, MGAT-MCNNBiLSTM inherits core parameters identically from its constituent MGAT and MCNNBiLSTM modules, with changes strictly localized to the same feature fusion and downstream linear layers. The MGAT employs a multi-head attention mechanism with three attention heads. Model training used the following configuration: the batch size N_{b} was set to 128, the number of epochs to 300, the optimizer to Adam, the learning rate to 1 × 10⁻³, and the loss function to cross-entropy loss.

To comprehensively evaluate the performance of fault diagnosis models, experiments were designed with distinct cases based on different evaluation objectives, as shown in Table 4.

In practical engineering applications, IRs operate in diverse and complex environments where electromagnetic interference inevitably degrades acquired current signals. Furthermore, cost-driven engineering constraints often necessitate lower-resolution ADC sampling. To evaluate fault diagnosis robustness under such compromised signal acquisition scenarios, Gaussian white noise was injected into current signals at controlled signal-to-noise ratios (SNR), where the SNR is expressed as

γ_{SNR} = 10 \log_{10} (\frac{P_{signal}}{P_{noise}})

(20)

where P_{signal} and P_{noise} represent the signal power and noise power, respectively.
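
The noise-injection step implied by Equation (20) can be sketched as follows: the noise power is derived from the measured signal power and the target SNR, and zero-mean Gaussian samples with that variance are added. The helper names, the synthetic 50 Hz test signal, and the random seed are assumptions for illustration only.

```python
import math
import random


def signal_power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_awgn(x, snr_db, rng):
    """Add zero-mean Gaussian noise whose expected power yields snr_db."""
    noise_power = signal_power(x) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in x]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, following Equation (20)."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(noise))


rng = random.Random(42)
clean = [math.sin(2 * math.pi * 50 * i / 1000) for i in range(20000)]
noisy = add_awgn(clean, snr_db=40, rng=rng)
print(round(measured_snr_db(clean, noisy), 1))
```

The measured value fluctuates slightly around the target because the noise is random; longer records give a tighter match.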

To mitigate model overfitting risks, a five-fold cross-validation protocol was rigorously implemented. The dataset was partitioned into five non-overlapping folds. During each training-validation iteration, four folds constituted the training subset while the remaining fold served as the validation subset. This validation scheme quantifies robust diagnostic accuracy for all compared methodologies.
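
The five-fold protocol can be illustrated with a small index-splitting helper. The function `k_fold_indices`, the shuffling seed, and the sample count are hypothetical; whether the original experiments shuffled or stratified the folds by fault class is not stated, so plain shuffling is assumed here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k non-overlapping folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # fold sizes differ by at most one
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


splits = list(k_fold_indices(23, k=5))
for train, val in splits:
    print(len(train), len(val))
```

Every sample appears in exactly one validation fold, so each model is evaluated once on every sample while never being validated on data it was trained on in that iteration.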

The accuracy of the fault diagnosis model was calculated using the following equation:

Accuracy = \frac{1}{N_{cv}} \sum_{i_{cv}=1}^{N_{cv}} \frac{TP_{i_{cv}}}{TP_{i_{cv}} + FN_{i_{cv}}}

(21)

where TP_{i_{cv}} and FN_{i_{cv}} are the numbers of true positives and false negatives in the i_{cv}-th cross-validation fold, respectively, and N_{cv} is the number of folds.
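
A worked example of Equation (21) is sketched below. It assumes that TP and FN are pooled over all classes within a fold, in which case the per-fold ratio equals the fraction of correctly classified samples; the text does not state the pooling explicitly, and the labels are invented for illustration.

```python
def fold_accuracy(y_true, y_pred):
    """TP / (TP + FN) for one fold, with counts pooled over all classes."""
    tp = fn = 0
    for c in set(y_true):
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    return tp / (tp + fn)


def cross_validated_accuracy(folds):
    """Average the per-fold ratio over all folds, as in Equation (21)."""
    return sum(fold_accuracy(t, p) for t, p in folds) / len(folds)


# Two hypothetical folds with five severity classes encoded as 0..4.
folds = [
    ([0, 1, 2, 3, 4], [0, 1, 2, 3, 3]),   # 4 of 5 correct
    ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]),   # 5 of 5 correct
]
acc = cross_validated_accuracy(folds)
print(acc)
```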

3. Results

To assess the inherent diagnostic challenge posed by raw sensor data, we first examined the motor current signals in both the time and frequency domains across varying fault severity levels.

Figure 13 characterizes the motor current signatures across varying fault severity levels in robot joints.

As shown, the raw time domain current signals exhibit high similarity across different fault severity levels, resulting in a lack of clear separation that makes direct distinction difficult.

Complementing this, Figure 14 depicts the frequency spectra of motor current across varying fault severity levels in robot joints.

As observed in the figure, the distinctions in the frequency spectra of the motor current across different fault severity levels are subtle. It remains challenging to directly extract effective fault-discriminative features from these frequency domain data. This limitation underscores the necessity of employing DL methods to uncover underlying discriminative patterns.

For a comprehensive comparison of different DL-based fault diagnosis methods for IRs, we used t-distributed stochastic neighbor embedding (t-SNE) to visualize their output features in a two-dimensional feature space, as shown in Figure 15.

In the dimensionality-reduced feature space, the proposed MGAT-MCNNBiLSTM exhibits considerably cleaner boundaries. The features learned by the hybrid deep learning framework are much more separable, which means the classifier will more easily distinguish between different fault types.

Case 1: Performance Validation of Fault Diagnosis Models with Raw Current-Signal Data (Architectural configuration details are provided in Table 3. No spectral preprocessing was applied).

To mitigate the randomness caused by model initialization in diagnostic performance evaluation, fifty independent trials were performed for each method. To visualize the distribution of overall accuracy and highlight performance disparities among the various models, the accuracy results were sorted in ascending order and plotted as a waterfall chart. The corresponding results across all diagnosis models are depicted in Figure 16.

As illustrated in the waterfall chart, the SCNNBiLSTM, LCNNBiLSTM, and MCNNBiLSTM models demonstrate comparatively superior performance under this condition. Specifically, their accuracy exceeds that of the MGAT-MCNNBiLSTM model and is significantly higher than that of the GAT and MGAT models.

To statistically validate and deepen this performance comparison, Figure 17 displays the statistical visualizations comparing the performance of different fault diagnosis models for Case 1.

As depicted in Figure 17, the MCNNBiLSTM model outperforms comparative methods by not only exhibiting the highest median accuracy but also displaying remarkably narrow interquartile ranges. This combination of metrics indicates a consistently superior and highly reliable performance, characterized by minimal variance across the experimental trials.

Table 5 summarizes the comparative performance of six fault diagnosis models in Case 1. Accuracy, Positive Predictive Value (PPV), and F_{1}-Score are integral to assessing the performance of each model.

Among the methods evaluated using raw current data without spectral preprocessing, MCNNBiLSTM achieved the highest overall performance, attaining peak metrics of 70.2240 ± 2.3004% accuracy, 71.8084 ± 2.1795% PPV, and 69.7472 ± 2.4187% F₁-Score. The LCNNBiLSTM and SCNNBiLSTM models demonstrated moderate efficacy, yielding accuracies of 68.5180 ± 2.6241% and 67.6320 ± 2.5937%, respectively. In contrast, the graph-based approaches, GAT and MGAT, exhibit critically limited diagnostic capability, attaining merely 45.5520 ± 2.7559% and 47.0560 ± 2.6621% accuracy. Significantly, the hybrid MGAT-MCNNBiLSTM architecture, while not surpassing the standalone MCNNBiLSTM, delivered a marked performance improvement over pure graph-based methods (GAT/MGAT), achieving an intermediate Accuracy of 62.6960 ± 2.5217%.

Case 2: Performance Validation of Fault Diagnosis Models with Frequency-transformed Current-Signal Data (Architectural configuration details are provided in Table 3. Spectral preprocessing was applied).

In Case 2, which features raw signal diagnosis supported by spectral preprocessing, the comparative performance across all fault diagnosis models is illustrated in Figure 18.

As illustrated in the waterfall chart, the GAT, MGAT, and MGAT-MCNNBiLSTM models demonstrate distinct advantages under this condition, outperforming the LCNNBiLSTM, MCNNBiLSTM, and SCNNBiLSTM models in terms of accuracy.

To provide statistical validation for this comparison, Figure 19 presents performance visualizations across the different fault diagnosis models for the second case study (Case 2).

The box plots in Figure 19 statistically corroborate the model performance results for Case 2. The MGAT-MCNNBiLSTM model distinguishes itself by achieving a median accuracy above 90% while maintaining a compact interquartile range, indicating both high accuracy and excellent robustness. This performance is notably superior to the other models under evaluation.

Table 6 summarizes the comparative performance of six fault diagnosis models in Case 2. Accuracy, Positive Predictive Value (PPV), and F_{1}-Score are integral to assessing the performance of each model.

When evaluated on raw current data with spectral preprocessing, the proposed MGAT-MCNNBiLSTM framework demonstrated superior diagnostic capability, achieving peak metrics of 90.7560 ± 1.3311% accuracy, 91.6626 ± 1.1924% PPV, and 90.6736 ± 1.3685% F₁-Score. Among graph-based architectures, both GAT and MGAT delivered competitive results, significantly exceeding all CNNBiLSTM variants. Notably, MCNNBiLSTM attained 87.8800 ± 1.6407% accuracy, surpassing both LCNNBiLSTM (82.2060 ± 1.8032%) and SCNNBiLSTM (85.9640 ± 1.7009%). Collectively, these findings demonstrate that MGAT-MCNNBiLSTM represents the optimal fault diagnosis framework, achieving a significant 1.51–8.55% absolute improvement in accuracy relative to all benchmark methods.

Figure 20 demonstrates that raw time-series current signals offer limited diagnostic value for deep-learning fault detection, whereas spectral preprocessing substantially enhances model performance.

All models incorporating spectral preprocessing demonstrated statistically superior diagnostic performance compared to non-preprocessed models. Accuracy improvements reached 13.6880%, 18.3320%, and 17.6560% in LCNNBiLSTM, SCNNBiLSTM, and MCNNBiLSTM, respectively. More significant gains are observed in GAT, MGAT, and our proposed MGAT-MCNNBiLSTM, with accuracy increases of 43.2920%, 42.1880%, and 28.0600%, respectively. Similarly, PPV and F1-Score exhibited parallel enhancements.
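
The reported gains can be checked directly from the mean accuracies quoted in the text for Case 1 and Case 2. The snippet below covers only the four models whose accuracies are stated in both cases in the running text; the GAT and MGAT values for Case 2 appear in Table 6 and are therefore not reproduced here.

```python
# Mean accuracies (%) as quoted in the text for Case 1 and Case 2.
case1 = {
    "LCNNBiLSTM": 68.5180,
    "SCNNBiLSTM": 67.6320,
    "MCNNBiLSTM": 70.2240,
    "MGAT-MCNNBiLSTM": 62.6960,
}
case2 = {
    "LCNNBiLSTM": 82.2060,
    "SCNNBiLSTM": 85.9640,
    "MCNNBiLSTM": 87.8800,
    "MGAT-MCNNBiLSTM": 90.7560,
}

# Absolute gain in percentage points from spectral preprocessing.
gains = {model: round(case2[model] - case1[model], 4) for model in case1}
for model, gain in gains.items():
    print(f"{model}: +{gain:.4f}")
```

The differences reproduce the stated improvements of 13.6880, 18.3320, 17.6560, and 28.0600 percentage points, respectively.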

Case 3: Robustness Assessment Against Signal Degradation via Noise Injection for Emulating Low-Cost, Low-Resolution ADC Scenarios.

To evaluate the noise immunity of different fault diagnosis models, Gaussian white noise was injected into the dataset, establishing an SNR of 40 dB. A comparison of the original and noise-contaminated current signals is presented in Figure 21.

Following noise injection, a marked degradation in signal quality is observable compared to the original state. This effectively emulates the signal corruption encountered in industrial environments characterized by high electromagnetic interference and low-resolution ADC systems.
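
For orientation, the 40 dB SNR can be related to an equivalent converter resolution through the textbook quantization-noise model. This assumes an ideal uniform quantizer driven by a full-scale sinusoid, which is an idealization introduced here for illustration and not a relationship stated by the authors:

```latex
\mathrm{SQNR} \approx 6.02\,N_{\mathrm{bits}} + 1.76\ \mathrm{dB}
\quad\Rightarrow\quad
N_{\mathrm{bits}} \approx \frac{40 - 1.76}{6.02} \approx 6.4
```

Under that model, an SNR of 40 dB corresponds to roughly 6 to 7 effective bits, well below the 15-bit effective resolution of the acquisition system described in Section 2.7.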

For Case 3, Figure 22 illustrates the comparative noise robustness of all fault diagnosis models.

The waterfall chart indicates that for Case 3, the GAT, MGAT, and MGAT-MCNNBiLSTM models demonstrate superior accuracy compared to the LCNNBiLSTM, MCNNBiLSTM, and SCNNBiLSTM models. This trend is consistent with the results observed in Case 2.

To statistically validate this comparison, Figure 23 displays the statistical visualizations comparing the performance of different fault diagnosis models for the third case study (Case 3).

As shown in Figure 23, the MGAT-MCNNBiLSTM model stands out in Case 3 by achieving the highest median accuracy and a compact interquartile range, which together signify superior accuracy and robustness relative to the other models.

Table 7 quantitatively summarizes the comparative performance of six fault diagnosis models in Case 3.

Among the methods evaluated using noise-injected current data with spectral preprocessing, the proposed MGAT-MCNNBiLSTM model achieved superior diagnostic capability, delivering peak metrics of 89.4060 ± 1.4222% accuracy, 90.4124 ± 1.1895% PPV, and 89.2810 ± 1.4853% F_{1}-Score. Among graph-based approaches, both GAT and MGAT demonstrated competitive results, significantly outperforming all CNNBiLSTM variants. MCNNBiLSTM attained 85.1820 ± 1.7555% accuracy, surpassing both LCNNBiLSTM (79.3480 ± 1.9455%) and SCNNBiLSTM (82.9900 ± 1.8334%). Significantly, under noise corruption, both GAT variants and the proposed MGAT-MCNNBiLSTM framework exhibited marginal degradation in diagnostic accuracy (1.21–1.37%). In contrast, CNNBiLSTM architectures demonstrated substantially greater performance deterioration (2.70–2.97%), roughly twice the loss observed in graph-enhanced methods. Collectively, these results confirm MGAT-MCNNBiLSTM as the optimal fault diagnosis framework, achieving 1.37–10.26% absolute improvement in accuracy relative to all benchmark methods.

4. Discussion

In Case 1, the experimental results demonstrate that MCNNBiLSTM delivers optimal performance in time-domain applications, while GAT, MGAT, and MGAT-MCNNBiLSTM exhibit suboptimal efficacy under these conditions. This performance gap is primarily attributable to the fundamental limitations of graph-based architectures in time-series fault diagnosis, rooted in an intrinsic incompatibility between graph-structured processing and sequential data physics. These limitations manifest primarily as: (1) Structural incompatibility: Graph-based discretization of continuous time-series compromises critical temporal continuity; and (2) Physics omission: Scale-specific temporal signatures induced by faults remain unmodeled. In contrast, the CNNBiLSTM framework achieves superior performance in time-domain fault diagnosis through its synergistic integration of convolutional and recurrent processing. The CNN component functions as an adaptive local pattern extractor, while the BiLSTM inherently preserves sequential ordering and captures temporal dependencies. This inherently physics-compatible architecture consequently enables robust fault diagnosis.

In Case 2, the results reveal that spectral preprocessing markedly enhances diagnostic performance. Specifically, it fundamentally improves efficacy by decoupling latent signatures, amplifying critical features, and standardizing diagnostic markers, thereby transcending the inherent limitations of time-domain analysis. The core advantages are the following:

(1): Intrinsic fault-frequency alignment

Mechanical faults such as bearing degradation, gear tooth breakage, and rotor imbalance induce vibration-modulated sidebands in motor current signals. Conventional time-domain analysis struggles to detect these subtle modulations, as they are often obscured by dominant fundamental components. In contrast, spectral transformation effectively separates constituent frequencies, converting masked sidebands into distinct spectral features and condensing distributed temporal anomalies into localized frequency-domain indicators.

(2): Attenuation of non-diagnostic interference

Current signal phase measurements demonstrate marked vulnerability to noise perturbations, particularly those originating from sampling position dependencies. Spectral representations reliably preserve amplitude characteristics, which constitute primary indicators for fault confirmation, while simultaneously attenuating phase sensitivity. This targeted retention of diagnostically robust features enhances accuracy through the elimination of interference-prone signal parameters.

(3): Enhanced compatibility with deep learning architectures

Spectrograms spatially localize fault signatures at specific frequency coordinates, enabling clear pattern isolation. Time-domain waveforms, by comparison, often exhibit fault-related anomalies as faint, globally distributed distortions with poor structural definition, complicating feature extraction. Moreover, current amplitudes in the time domain are strongly influenced by load fluctuations, which can overshadow subtle fault-induced changes. Spectral analysis redirects model focus toward structurally stable frequency-domain patterns, aligning with the inherent strength of deep learning in detecting localized, frequency-specific features. This synergy between spectral decomposition and hierarchical learning establishes a highly efficient framework for intelligent fault diagnosis.

In Case 3, noise injection tests were conducted to emulate industrial environments characterized by high electromagnetic interference, as well as to simulate the quantization errors inherent in low-cost, low-resolution ADC systems. Despite significant signal degradation, the proposed framework consistently outperformed all benchmark models under these adversarial conditions. These results empirically confirm the framework’s exceptional robustness against signal contamination and its practicality for deployment in both noise-heavy industrial settings and resource-limited embedded devices. Consequently, this resilience significantly broadens its industrial applicability by lowering hardware precision requirements.

5. Conclusions

This study is motivated by the EBM principle, which prioritizes motor current as the critical bridge between the physics of mechanical faults and measurable electrical signals for IR diagnosis. It advances the theoretical framework of fault diagnosis in IRs by validating the critical necessity of fusing spatiotemporal and structural representations. The research demonstrates that accurate modeling of complex electromechanical systems requires the simultaneous extraction of hierarchical temporal-spectral features and dynamic structural dependencies. By successfully implementing this integration, the proposed MGAT-MCNNBiLSTM architecture provides highly accurate and robust fault diagnosis for IRs. Experimental validation confirms that our model is consistently superior to existing benchmarks, including LCNNBiLSTM, SCNNBiLSTM, MCNNBiLSTM, GAT, and MGAT, and demonstrates significantly enhanced reliability in fault detection across diverse operating conditions. Crucially, this study offers empirical support for evolving EBM theory, transitioning the focus from elementary signal analysis to the modeling of complex system fault diagnosis.

From a practical perspective, this research solidifies motor current analysis as the most viable and economically advantageous sensing modality for industrial deployment. Our approach capitalizes on the existing servo drive infrastructure, thereby eliminating the need for additional expensive sensors and associated retrofitting. Furthermore, the model’s compatibility with legacy, low-resolution data acquisition systems common in installed IRs, as well as its robustness in similar scenarios of data degradation caused by electromagnetic interference, substantially lowers the technical and financial barriers to implementation. Consequently, this synergy of high diagnostic accuracy and reliability, minimal upfront investment, and operational robustness enables a direct transition from costly reactive or scheduled maintenance to more efficient predictive strategies, significantly reducing unplanned downtime and extending IR service life.

The primary limitation of this work concerns the scope of the investigated fault scenarios, as the evaluation primarily focused on mechanical degradation reflected by positioning repeatability errors. In practical industrial applications, IRs are susceptible to a much wider spectrum of anomalies, including electrical component failures, sensor malfunctions, and complex compound faults. Capturing the distinct energy signatures of these varied anomalies is essential to validating the applicability of EBM theory across complex systems. Consequently, addressing this diversity necessitates a significant expansion of the fault knowledge base to ensure comprehensive diagnostic coverage. Therefore, future research will focus on enriching the dataset with diverse fault categories and refining the model architecture to enable the precise identification and classification of a broader range of IR faults.

Author Contributions

Conceptualization, J.W. and Y.Z.; methodology, J.W. and Y.Z.; software, J.W.; validation, J.W. and Y.Z.; formal analysis, J.W., Y.Z., B.G., and L.X.; investigation, J.W.; resources, J.W., B.G., and L.X.; data curation, J.W.; writing—original draft preparation, J.W. and Y.Z.; writing—review and editing, J.W., X.Z., and X.W.; visualization, J.W.; supervision, B.G. and L.X.; project administration, B.G., L.X., and H.W.; funding acquisition, J.W., Y.Z., B.G., L.X., X.Z., and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Projects of Shenzhen University of Information Technology under Grant SZIIT2025KJ022, SZIIT2025KJ021 and SZIIT2025KJ057, and in part by the Major Research Plan of the National Natural Science Foundation of China under Grant 92467204.

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

ADC	analog-to-digital converter
BiLSTM	bidirectional long short-term memory
CNN	convolutional neural network
CNNLSTM	convolutional neural network with long short-term memory
CNNBiLSTM	convolutional neural network with bidirectional long short-term memory
DFT	discrete Fourier transform
DL	deep learning
EBM	energy-based maintenance
EMI	electromagnetic interference
GAT	graph attention network
GNN	graph neural network
IR	industrial robot
LSTM	long short-term memory
LCNN	large-scale convolutional neural network
LCNNBiLSTM	large-scale convolutional neural network with bidirectional long short-term memory
MCNNBiLSTM	multi-scale convolutional neural network with bidirectional long short-term memory
MCSA	motor current signature analysis
MGAT	multi-head graph attention network
OEM	original equipment manufacturer
RNN	recurrent neural network
SCNN	small-scale convolutional neural network
SCNNBiLSTM	small-scale convolutional neural network with bidirectional long short-term memory
SNR	signal-to-noise ratio

Figure 1. The proposed energy-metric-driven framework for IR fault diagnosis.

Figure 2. Comparative advantages of MCSA versus vibration/acoustic methods in fault diagnosis.

Figure 3. Positioning repeatability as a diagnostic tool for mechanical faults in IRs.

Figure 4. Schematic diagram of IR end-effector positioning repeatability measurement.

Figure 5. Data acquisition framework for IRs under different mechanical fault levels based on positioning repeatability thresholds.

Figure 6. Information flow control in the LSTM architecture.

Figure 7. CNNBiLSTM-based fault diagnosis model for IRs.

Figure 8. MCNNBiLSTM-based fault diagnosis model for IRs.

Figure 9. Graph attention mechanism.

Figure 10. Multi-head graph attention mechanism.

Figure 11. GAT/MGAT-based fault diagnosis model for IRs.

Figure 12. MGAT-MCNNBiLSTM-based fault diagnosis model for IRs.

Figure 13. Motor current for varying fault severity levels in our IRs.

Figure 14. Frequency spectra of motor current for varying fault severity levels.

Figure 15. Output features of different methods in a two-dimensional feature space.

Figure 16. Waterfall chart of different fault diagnosis models for IRs (Case 1).

Figure 17. Statistical visualizations of different fault diagnosis models for IRs (Case 1).

Figure 18. The waterfall chart of different fault diagnosis models for IRs (Case 2).

Figure 19. The statistical visualizations of different fault diagnosis models for IRs (Case 2).

Figure 20. Performance comparison of diagnosis models with and without spectral preprocessing (Case 1 vs. Case 2).

Figure 21. Comparison of the original and noise-contaminated current signals. (a) original signal waveform; (b) signal waveform after noise injection (SNR = 40 dB).

Figure 22. The flowchart of the case study for different fault diagnosis models for IRs (Case 3).

Figure 23. The statistical visualizations of different fault diagnosis models for IRs (Case 3).

Table 1. Classification of fault severity levels for IRs by $δ_{m}/δ_{s}$ deviation thresholds.

Health State	Fault Level	Performance Characterization	Description
Normal Operation	0	$δ_{m} \leq δ_{s}$	End-effector positioning repeatability complies with OEM specifications.
Incipient Anomaly	1	$δ_{m} \in (δ_{s}, 1.2 δ_{s}]$	Minor deviations in end-effector repeatability are observed, exhibiting negligible impact on operational integrity.
Moderate Degradation	2	$δ_{m} \in (1.2 δ_{s}, 1.5 δ_{s}]$	Progressive degradation measurably reduces operational positioning accuracy.
Severe Impairment	3	$δ_{m} \in (1.5 δ_{s}, 2.0 δ_{s}]$	Severe repeatability anomalies may disrupt normal operations.
Critical Failure Risk	4	$δ_{m} > 2.0 δ_{s}$	Critical deviations cause substantial end-effector displacement, inducing functional impairment in robot tasks.
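
The thresholds in Table 1 amount to a simple lookup on the ratio $δ_{m}/δ_{s}$. The following is a minimal sketch of that mapping, not the authors' implementation; the function name `fault_level` and the example values are hypothetical, and the upper bound of each interval is treated as inclusive, as the half-open intervals in the table indicate.

```python
def fault_level(delta_m: float, delta_s: float) -> int:
    """Map a measured repeatability delta_m to a fault level 0-4.

    delta_s is the specified (OEM) positioning repeatability; both values
    must use the same unit. The function name is illustrative only.
    """
    if delta_s <= 0:
        raise ValueError("delta_s must be positive")
    ratio = delta_m / delta_s
    # Upper bounds are inclusive, matching the intervals (a, b] in Table 1.
    if ratio <= 1.0:
        return 0  # Normal Operation
    if ratio <= 1.2:
        return 1  # Incipient Anomaly
    if ratio <= 1.5:
        return 2  # Moderate Degradation
    if ratio <= 2.0:
        return 3  # Severe Impairment
    return 4      # Critical Failure Risk
```

For instance, with a hypothetical specification of 0.02 mm, a measured repeatability of 0.027 mm gives a ratio of 1.35 and therefore falls in level 2.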

Table 2. Sample information for fault diagnosis.

Health State	Fault Level	Sample Size
Normal Operation	0	200
Incipient Anomaly	1	200
Moderate Degradation	2	200
Severe Impairment	3	200
Critical Failure Risk	4	200

Table 3. The network structure of the case study for different fault models.

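Model	LCNNBiLSTM	SCNNBiLSTM	MCNNBiLSTM (kernel = 15 branch)	MCNNBiLSTM (kernel = 3 branch)	Model	GAT/MGAT	MGAT-MCNNBiLSTM (kernel = 15 branch)	MGAT-MCNNBiLSTM (kernel = 3 branch)	MGAT-MCNNBiLSTM (MGAT branch)
Layer Type	Configuration/Output Shape	Configuration/Output Shape	Configuration/Output Shape	Configuration/Output Shape	Layer Type	Configuration/Output Shape	Configuration/Output Shape	Configuration/Output Shape	Configuration/Output Shape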
Input	Raw signal	Raw signal	Raw signal	Raw signal	Input	Raw signal	Raw signal	Raw signal	Raw signal
Input	N_b × 1 × 512	N_b × 1 × 512	N_b × 1 × 512	N_b × 1 × 512	Input	N_b × 1 × 512	N_b × 1 × 512	N_b × 1 × 512	N_b × 1 × 512
Frequency Processing	FFT	FFT	FFT	FFT	Frequency Processing	FFT	FFT	FFT	FFT
Frequency Processing	N_b × 1 × 512	N_b × 1 × 512	N_b × 1 × 512	N_b × 1 × 512	Frequency Processing	N_b × 1 × 512	N_b × 1 × 512	N_b × 1 × 512	N_b × 1 × 512
Data Restructuring	N_b × 8 × 64	N_b × 8 × 64	N_b × 8 × 64	N_b × 8 × 64	Data Restructuring	N_b × 512	N_b × 8 × 64	N_b × 8 × 64	N_b × 512
Conv(1)	in = 8, out = 64 kernel = 15, stride = 1	in = 8, out = 64 kernel = 3, stride = 1	in = 8, out = 64 kernel = 15, stride = 1	in = 8, out = 64 kernel = 3, stride = 1	MCNNBiLSTM Conv(1)	/	in = 8, out = 64 kernel = 15, stride = 1	in = 8, out = 64 kernel = 3, stride = 1	/
Conv(1)	N_b × 64 × 64	N_b × 64 × 64	N_b × 64 × 64	N_b × 64 × 64	MCNNBiLSTM Conv(1)	/	N_b × 64 × 64	N_b × 64 × 64	/
BatchNorm(1)	N_b × 64 × 64	N_b × 64 × 64	N_b × 64 × 64	N_b × 64 × 64	~	/	~		/
ReLU(1)	N_b × 64 × 64	N_b × 64 × 64	N_b × 64 × 64	N_b × 64 × 64	~	/	~		/
Conv(2)	in = 64, out = 30 kernel = 7, stride = 1	in = 64, out = 40 kernel = 3, stride = 1	in = 64, out = 30 kernel = 7, stride = 1	in = 64, out = 40 kernel = 3, stride = 1	MCNNBiLSTM Linear(1)	/	in = 120, out = 256		/
Conv(2)	N_b × 30 × 64	N_b × 40 × 64	N_b × 30 × 64	N_b × 40 × 64	MCNNBiLSTM Linear(1)	/	N_b × 256		/
MaxPool(2)	kernel = 2, stride = 2	kernel = 2, stride = 2	kernel = 2, stride = 2	kernel = 2, stride = 2	Edge Index	Graph connectivity nodes = 8	/		Graph connectivity nodes = 8
MaxPool(2)	N_b × 30 × 32	N_b × 40 × 32	N_b × 30 × 32	N_b × 40 × 32	GAT/MGAT(1) Head Fusion (AVG)	in = nodes × 64, heads = H (GAT: H = 1; MGAT: H = 3), out = nodes × 64	/		in = nodes × 64, heads = 3, out = nodes × 64
Conv(3)	/	in = 40, out = 30, kernel = 3, stride = 1	/	in = 40, out = 30, kernel = 3, stride = 1	GAT/MGAT(1) Head Fusion (AVG)	(N_b × nodes) × 64	/		(N_b × nodes) × 64
Conv(3)	/	N_b × 30 × 32	/	N_b × 30 × 32	BatchNorm(1)	(N_b × nodes) × 64	/		(N_b × nodes) × 64
BatchNorm(3)	/	N_b × 30 × 32	/	N_b × 30 × 32	ReLU(1)	(N_b × nodes) × 64	/		(N_b × nodes) × 64
ReLU(3)	/	N_b × 30 × 32	/	N_b × 30 × 32	GAT/MGAT(2) Head Fusion (AVG)	in = nodes × 64, heads = H (GAT: H = 1; MGAT: H = 3), out = nodes × 64	/		in = nodes × 64, heads = 3, out = nodes × 64
Conv(4)	/	in = 30, out = 30, kernel = 3, stride = 1	/	in = 30, out = 30, kernel = 3, stride = 1	GAT/MGAT(2) Head Fusion (AVG)	(N_b × nodes) × 64	/		(N_b × nodes) × 64
Conv(4)	/	N_b × 30 × 32	/	N_b × 30 × 32	BatchNorm(2)	(N_b × nodes) × 64	/		(N_b × nodes) × 64
MaxPool(4)	/	kernel = 2, stride = 2	/	kernel = 2, stride = 2	ReLU(2)	(N_b × nodes) × 64	/		(N_b × nodes) × 64
MaxPool(4)	/	N_b × 30 × 16	/	N_b × 30 × 16	Linear(1)	in = nodes × 64, out = 256	/		in = nodes × 64, out = 256
Dimension Permutation	N_b × 32 × 30	N_b × 16 × 30	N_b × 32 × 30	N_b × 16 × 30	Linear(1)	N_b × 256	/		N_b × 256
Bidirectional LSTM	layers = 2, hidden = 30, dropout = 0.5, bidirectional = True	layers = 2, hidden = 30, dropout = 0.5, bidirectional = True	layers = 2, hidden = 30, dropout = 0.5, bidirectional = True	layers = 2, hidden = 30, dropout = 0.5, bidirectional = True	Feature Concatenation	/	N_b × 512
Bidirectional LSTM	N_b × 32 × 60	N_b × 16 × 60	N_b × 32 × 60	N_b × 16 × 60	Linear(2)	in = 256, out = N_C	in = 512, out = 256
Last Time Step Output	N_b × 60	N_b × 60	N_b × 60	N_b × 60	Linear(2)	N_b × N_C	N_b × 256
Feature Concatenation	/	/	N_b × 120		Linear(3)	/	in = 256, out = N_C
Linear(1)	in = 60, out = 256	in = 60, out = 256	in = 120, out = 256		Linear(3)	/	N_b × N_C
Linear(1)	N_b × 256	N_b × 256	N_b × 256		Output	Softmax	Softmax
Linear(2)	in = 256, out = N_C	in = 256, out = N_C	in = 256, out = N_C		Output	N_b × N_C	N_b × N_C
Linear(2)	N_b × N_C	N_b × N_C	N_b × N_C
Output	Softmax	Softmax	Softmax
Output	N_b × N_C	N_b × N_C	N_b × N_C

Different colors represent parameters from different models. For MGAT-MCNNBiLSTM, the colors indicate that it adopts the same structural parameters as the corresponding models represented by those colors.
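
The sequence lengths listed in Table 3 can be cross-checked with the standard output-length formula for 1-D convolution and pooling, $L_{out} = \lfloor (L + 2p - k)/s \rfloor + 1$. The sketch below is only a consistency check under one assumption: the table does not list padding, so symmetric zero padding of $(k-1)/2$ is assumed for every convolution, which is what keeps the length at 64 for stride 1. The layer lists and names are illustrative.

```python
def out_len(length: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1-D convolution or pooling layer (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def trace_lengths(layers, length=64):
    """Return the sequence length after each (kind, kernel, stride) layer."""
    lengths = []
    for kind, kernel, stride in layers:
        # Assumption: convolutions use symmetric zero padding (kernel - 1) // 2;
        # pooling layers use no padding.
        padding = (kernel - 1) // 2 if kind == "conv" else 0
        length = out_len(length, kernel, stride, padding)
        lengths.append(length)
    return lengths


# Layer stacks read from Table 3 (kernel sizes and strides only).
LCNN_LAYERS = [("conv", 15, 1), ("conv", 7, 1), ("pool", 2, 2)]
SCNN_LAYERS = [("conv", 3, 1), ("conv", 3, 1), ("pool", 2, 2),
               ("conv", 3, 1), ("conv", 3, 1), ("pool", 2, 2)]

# Feature widths after the recurrent stage.
BILSTM_FEATURES = 2 * 30            # bidirectional, hidden = 30
MCNN_CONCAT = 2 * BILSTM_FEATURES   # two CNNBiLSTM branches concatenated
FUSION_CONCAT = 256 + 256           # MCNNBiLSTM and MGAT embeddings concatenated

print(trace_lengths(LCNN_LAYERS), trace_lengths(SCNN_LAYERS))
print(BILSTM_FEATURES, MCNN_CONCAT, FUSION_CONCAT)
```

Under this padding assumption the large-kernel branch reaches the BiLSTM with 32 time steps and the small-kernel branch with 16, in agreement with the Dimension Permutation row of the table.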

Table 4. Experimental Design for Performance Evaluation of Fault Diagnosis Models.

Case	Evaluation Objective	Spectral Preprocessing	Noise Injection	Data Configuration
1	To assess the performance of different fault diagnosis models using raw time-domain signals	Not applied	Not applied	Raw time-domain industrial robot current signals as input
2	To evaluate diagnostic capability by employing spectral processing	Applied	Not applied	Current signals after spectral processing as input (based on Case 1)
3	To examine robustness against signal degradation, as caused by electromagnetic interference or low-cost, low-resolution ADC sampling	Applied	Applied	Current signals with noise addition followed by spectral processing as input (based on Case 2)
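
For Case 3, noise is added to the current signal at a prescribed signal-to-noise ratio (Figure 21 uses SNR = 40 dB). A minimal sketch of such an injection is given below. It assumes additive white Gaussian noise scaled to the mean-square power of the clean signal; the noise model, the function names, and the test signal are assumptions for illustration and are not taken from the article.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise at the given SNR (dB).

    Assumption: SNR is defined on mean-square power, SNR = 10*log10(Ps / Pn).
    """
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hypothetical 512-sample sinusoid standing in for one current segment.
    clean = [math.sin(2 * math.pi * 5 * n / 512) for n in range(512)]
    noisy = add_awgn(clean, snr_db=40.0, rng=random.Random(0))
    print(f"measured SNR: {measured_snr_db(clean, noisy):.2f} dB")
```

Because the noise is random, the measured SNR of a finite segment only approximates the target value; the noisy signal would then pass through the same spectral preprocessing as in Case 2.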

Table 5. Performance metrics of different fault diagnosis models (Case 1).

Method	Accuracy (%)	PPV (%)	$F_{1}$ -Score (%)
LCNNBiLSTM	68.5180 ± 2.6241	70.6578 ± 2.6718	67.8569 ± 2.8022
SCNNBiLSTM	67.6320 ± 2.5937	69.7539 ± 2.6196	66.9971 ± 2.7752
MCNNBiLSTM	70.2240 ± 2.3004	71.8084 ± 2.1795	69.7472 ± 2.4187
GAT	45.5520 ± 2.7559	47.6965 ± 2.6356	45.3136 ± 2.7305
MGAT	47.0560 ± 2.6621	49.2839 ± 2.5812	46.7199 ± 2.7305
MGAT-MCNNBiLSTM	62.6960 ± 2.5217	63.8109 ± 2.7385	62.0851 ± 2.6315
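
The three columns reported in Tables 5–7 can be computed from predicted and true fault levels as sketched below. The tables do not state how per-class values are averaged, so unweighted (macro) averaging over the five fault levels is assumed here; the function name and the toy labels in the usage note are hypothetical.

```python
def macro_metrics(y_true, y_pred, n_classes=5):
    """Return (accuracy, macro PPV, macro F1) as fractions in [0, 1].

    Assumption: per-class PPV and F1 are averaged without class weighting.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    ppvs, f1s = [], []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        ppv = tp / (tp + fp) if tp + fp else 0.0      # positive predictive value
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
        ppvs.append(ppv)
        f1s.append(f1)
    return accuracy, sum(ppvs) / n_classes, sum(f1s) / n_classes
```

Calling `macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)` on a toy three-class example returns the three values as fractions; multiplying by 100 gives percentages in the form used in the tables.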

Table 6. Performance metrics of different fault diagnosis models (Case 2).

Method	Accuracy (%)	PPV (%)	$F_{1}$ -Score (%)
LCNNBiLSTM	82.2060 ± 1.8032	83.1439 ± 1.7160	82.0412 ± 1.8425
SCNNBiLSTM	85.9640 ± 1.7009	86.7810 ± 1.5945	85.8558 ± 1.7200
MCNNBiLSTM	87.8800 ± 1.6407	88.6811 ± 1.3889	87.7626 ± 1.6858
GAT	88.8440 ± 1.6686	90.0977 ± 1.5434	88.7108 ± 1.7327
MGAT	89.2440 ± 1.5511	90.3630 ± 1.4105	89.1234 ± 1.6106
MGAT-MCNNBiLSTM	90.7560 ± 1.3311	91.6626 ± 1.1924	90.6736 ± 1.3685

Table 7. Performance metrics of various fault diagnosis models (Case 3).

Method	Accuracy (%)	PPV (%)	$F_{1}$ -Score (%)
LCNNBiLSTM	79.3480 ± 1.9455	80.4062 ± 1.8484	79.1439 ± 1.9672
SCNNBiLSTM	82.9900 ± 1.8334	84.4141 ± 1.5325	82.6989 ± 1.9563
MCNNBiLSTM	85.1820 ± 1.7555	86.0929 ± 1.4537	84.9786 ± 1.8964
GAT	87.4720 ± 1.7765	88.6734 ± 1.5621	87.3011 ± 1.7925
MGAT	88.0340 ± 1.6518	89.2909 ± 1.4580	87.8469 ± 1.6998
MGAT-MCNNBiLSTM	89.4060 ± 1.4222	90.4124 ± 1.1895	89.2810 ± 1.4853
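
To read Tables 5–7 side by side, the mean accuracies can be differenced per model: Case 2 minus Case 1 gives the gain from spectral preprocessing, and Case 3 minus Case 2 gives the change after noise injection. The sketch below only re-arranges the mean values printed in the tables (standard deviations are ignored); the dictionary and function names are illustrative.

```python
# Mean accuracy (%) per model, copied from Tables 5, 6 and 7 (Cases 1, 2, 3).
ACCURACY = {
    "LCNNBiLSTM":      (68.5180, 82.2060, 79.3480),
    "SCNNBiLSTM":      (67.6320, 85.9640, 82.9900),
    "MCNNBiLSTM":      (70.2240, 87.8800, 85.1820),
    "GAT":             (45.5520, 88.8440, 87.4720),
    "MGAT":            (47.0560, 89.2440, 88.0340),
    "MGAT-MCNNBiLSTM": (62.6960, 90.7560, 89.4060),
}


def accuracy_deltas(table):
    """Return {model: (spectral_gain, noise_change)} in percentage points."""
    return {
        model: (case2 - case1, case3 - case2)
        for model, (case1, case2, case3) in table.items()
    }


if __name__ == "__main__":
    for model, (gain, change) in accuracy_deltas(ACCURACY).items():
        print(f"{model:16s} spectral gain {gain:+7.3f}  noise change {change:+7.3f}")
```

Differences of means do not account for the reported run-to-run spread, so they are a reading aid rather than a significance test.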

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
