Article

A Hybrid Deep Learning Framework for Enhanced Fault Diagnosis in Industrial Robots

1
School of Sino-German Robotics, Shenzhen University of Information Technology, Shenzhen 518172, China
2
Inovance Industrial Robot Reliability Technology Research Institute, Shenzhen University of Information Technology, Shenzhen 518172, China
3
School of Automation, China University of Geosciences, Wuhan 430074, China
*
Author to whom correspondence should be addressed.
Algorithms 2025, 18(12), 779; https://doi.org/10.3390/a18120779
Submission received: 11 November 2025 / Revised: 4 December 2025 / Accepted: 6 December 2025 / Published: 10 December 2025

Abstract

Fault diagnosis in industrial robots predominantly depends on dedicated vibration or acoustic sensors. However, their practical deployment is often limited by installation constraints, susceptibility to environmental noise, and cost considerations. Applying Energy-Based Maintenance (EBM) principles to achieve enhanced fault diagnosis under practical industrial conditions, we propose a hybrid deep learning framework for industrial robots, the Multi-head Graph Attention Network (MGAT) with Multi-scale CNNBiLSTM Fusion (MGAT-MCNNBiLSTM). This approach obviates the need for additional dedicated sensors, mitigating the associated deployment complexities. The framework embodies four core innovations: (1) Based on the EBM paradigm, motor current is established as the most effective and practical choice for enabling cost-efficient and scalable industrial robot fault diagnosis. A corresponding dataset of motor current has been acquired from industrial robots operating under diverse fault scenarios. (2) An integrated MGAT-MCNNBiLSTM architecture synergistically models multiscale local features and complex dynamics through its MCNNBiLSTM module while capturing nonlinear interdependencies via MGAT. This comprehensive feature representation enables robust and highly accurate fault detection. (3) Spectral preprocessing is found to yield a marked and statistically significant enhancement in diagnostic performance, and a systematic analysis is undertaken to uncover the underlying reasons for this improvement. (4) To emulate challenging industrial settings and cost-sensitive implementations, noise signal injection is employed to evaluate model robustness in high-electromagnetic-interference environments and low-cost, low-resolution ADC implementations.
Experimental validation on real-world industrial robot datasets demonstrates that MGAT-MCNNBiLSTM achieves a diagnostic accuracy of 90.7560%, an absolute improvement of 1.51–8.55% over competing models, including LCNNBiLSTM, SCNNBiLSTM, MCNNBiLSTM, GAT, and MGAT. Under challenging noise and low-resolution conditions, the proposed model consistently outperforms the CNNBiLSTM variants, GAT, and MGAT by 1.37–10.26%, which enhances its industrial utility and deployment potential.

1. Introduction

Industrial robots (IRs) serve as essential equipment for high-precision manufacturing, supporting applications ranging from automotive welding to semiconductor assembly. They significantly improve productivity, positioning consistency, and personnel safety. According to the International Federation of Robotics (IFR), global installations of IRs reached 542,000 units in 2024, with China accounting for 54% of the total. The IFR forecasts continued growth, at an average annual rate of approximately 7% in the coming years [1]. However, early-stage mechanical failures in key drivetrain components continue to compromise positioning accuracy, operational reliability, and service life. Consequently, there is an urgent industrial demand for diagnostic strategies that ensure high accuracy and robustness to minimize unplanned downtime and operational costs.
However, addressing this demand presents fundamental challenges in terms of practical deployability and superior diagnostic capabilities. Traditional methods for diagnosing mechanical faults rely primarily on vibration or acoustic sensing [2,3,4]. For instance, He et al. [5] directly mounted vibration sensors on robot links to identify harmonic reducer faults, while Wu et al. [6] utilized accelerometer signals enhanced by an adaptive particle swarm optimization algorithm to learn discriminative features for IR fault diagnosis. Liu et al. [7] further developed a monitoring framework based on acoustic emission for harmonic reducers. Despite their promise in laboratory settings, these sensing techniques face considerable barriers to real-world implementation, including restrictive mounting requirements, susceptibility to environmental interference, and high hardware costs. Specifically, triaxial accelerometers often require invasive installation, which is particularly challenging for sealed joints and high-speed axes. The fidelity of the acquired data is highly dependent on sensor placement precision in industrial robots. Hu et al. [8] emphasized that an optimized configuration is essential for reliable health assessment, as mispositioned sensors degrade signal integrity and diagnostic accuracy. Similarly, acoustic sensors are prone to contamination from ambient factory noise. Moreover, retrofitting production-line industrial robots with multi-axis sensing systems involves significant hardware investment and additional signal processing expenses. These collective challenges ultimately diminish the economic feasibility of widespread industrial deployment.
Energy-Based Maintenance (EBM) has emerged as a pivotal theoretical and practical framework for modern predictive maintenance, positing that energy-related metrics serve as primary and direct indicators of system health and mechanical integrity [9,10,11]. In electromechanical systems like IRs, the motor current is a fundamental energy-based signal, directly reflecting electromagnetic torque and the efficiency of the drivetrain. From an EBM perspective, motor current signature analysis (MCSA) is therefore not merely a convenient alternative but the theoretically preferred modality for fault diagnosis. MCSA also offers a compelling solution to the deployment barriers described above, providing a non-invasive and economically feasible path for reliable maintenance. This approach has several advantages: (1) Non-invasive and Cost-effective Implementation: Leveraging existing current monitoring in servo drives, it eliminates the need for additional sensors or retrofitting. (2) Strong Resilience to Environmental Conditions: Current measurements are largely unaffected by factors such as temperature fluctuations, humidity, dust, and mechanical vibrations, enabling stable signal acquisition even in harsh industrial settings. (3) Theoretical Alignment with EBM: Most significantly, it provides direct electromechanical coupling, capturing fault-induced torque variations. This makes the current signal a primary energy-based indicator, fundamentally aligning with EBM’s core principle.
Signal processing techniques have become important tools for diagnostic applications [12]. For instance, Raouf et al. [13] used statistical metrics and kinetic parameters derived from motor currents for dimensionality reduction. Lee et al. [14] applied wavelet packet decomposition to extract degradation-sensitive indicators, while Wang et al. [15] integrated variational mode decomposition with support vector machines for fault detection. However, manual feature engineering remains subjective and often limits the generalization of the model across different IRs, necessitating more robust data-driven approaches.
Deep Learning (DL) has consequently emerged as the predominant methodology, owing to its capacity for autonomous feature representation learning directly from raw sensor signals. Convolutional Neural Networks (CNNs) are widely employed to extract discriminative spatial features, eliminating the labor-intensive process of manual feature engineering [16,17,18,19,20]. However, conventional CNNs are constrained by fixed kernel sizes and local receptive fields, which limit their ability to model long-range temporal dependencies. To mitigate this deficiency, Recurrent Neural Networks, particularly Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM), are utilized to model complex temporal dynamics and non-stationary fault signatures [21,22,23,24]. Recognizing the need to capture both spatial and temporal dependencies simultaneously, researchers have developed hybrid models such as CNN-LSTM/CNN-BiLSTM, which have demonstrated superior diagnostic performance compared to individual models [25,26,27].
Despite these advances, a fundamental limitation persists: conventional CNNs and LSTMs operate on Euclidean data, thereby failing to capture the complex, non-Euclidean interdependencies and system-level interactions. Although Graph Neural Networks (GNNs) attempt to bridge this gap by modeling topological relationships [28,29,30], conventional GNNs use fixed aggregation weights determined solely by graph connectivity, which ignores the physical significance of interactions and lacks adaptive learning. While Graph Attention Networks (GATs) and Multi-head GATs (MGATs) introduce adaptive weighting mechanisms [31,32], their application in IR fault diagnosis often lacks synergistic integration with multi-scale feature extraction and temporal sequence modeling.
Although EBM provides a theoretical rationale for utilizing motor current signals, existing deep learning methods have not fully exploited the rich informational potential of these signals for IR fault diagnosis. Diagnostic performance remains constrained by three key issues: (1) The challenge of extracting discriminative features comprehensively across multiple scales. (2) The difficulty in dynamically modeling the complex, non-Euclidean interdependencies among features. (3) The ineffective integration of temporal dynamics with structural relationships. These problems constrain the model’s discriminative power and prevent the full realization of reliable, cost-effective maintenance solutions.
To bridge the identified theory-practice gap in EBM and enable highly accurate and robust fault diagnosis for reliable maintenance of IRs, this study proposes a hybrid deep learning framework, termed the Multi-head Graph Attention Network with Multi-scale CNNBiLSTM Fusion (MGAT-MCNNBiLSTM). This novel architecture enables an end-to-end synergistic fusion by integrating a multi-head graph attention mechanism to model complex feature interdependencies and a multi-scale CNNBiLSTM to hierarchically extract multi-resolution discriminative features. Figure 1 illustrates the proposed energy-metric-driven framework for IR fault diagnosis.
The main contributions of this work are summarized as follows:
  • Based on the EBM principle, motor current is established as the most effective and practical choice for enabling cost-efficient and scalable IR fault diagnosis, owing to its direct reflection of torque coupling dynamics under mechanical faults. A corresponding dataset of motor current has been acquired from IRs operating under diverse fault scenarios.
  • This study extends the EBM theoretical framework for IR fault diagnosis by proposing an MGAT-MCNNBiLSTM architecture that integrates MGAT with MCNNBiLSTM, effectively capturing both temporal dynamics at varying resolutions and structural dependencies within feature graphs. This dual-path design facilitates the fusion of temporal-spectral attributes with spatio-graph relationships. Comparative trials demonstrate its consistent superiority over competing architectures, including LCNNBiLSTM, SCNNBiLSTM, MCNNBiLSTM, GAT, and MGAT.
  • Our study of current signal-based fault diagnosis models reveals that spectral preprocessing yields a statistically significant enhancement in diagnostic performance. Based on the experimental results, we undertake a systematic analysis to elucidate the mechanisms responsible for this improvement.
  • To evaluate diagnostic robustness under realistic industrial operating regimes and cost constraints, noise signal injection was employed to simulate high-electromagnetic-interference environments and low-cost, low-resolution ADC implementations. The proposed MGAT-MCNNBiLSTM model demonstrated consistent performance superiority over CNNBiLSTM variants, GAT, and MGAT benchmarks across these challenging scenarios. These results verify the model’s practical applicability in electrically noisy industrial settings and its compatibility with low-cost, low-resolution hardware. This capability directly supports the industry’s goal of achieving reliable predictive maintenance while overcoming deployment challenges associated with hardware costs and environmental interference.
The remaining sections of this paper are organized as follows: Section 2 details the theoretical background and framework of the proposed fault diagnosis method for IRs. Section 3 presents the experimental results, while Section 4 provides a detailed discussion. Finally, Section 5 concludes the paper.

2. Theoretical Background and Framework of Proposed Method

2.1. Signal Selection and Fault Definition

2.1.1. Signal Selection

For IRs, current-based fault diagnosis is often more practical to deploy due to its inherent compatibility with existing drive systems and standardized data interfaces. Unlike vibration monitoring, which relies on expensive accelerometers susceptible to installation misalignment, or acoustic methods that are vulnerable to ambient noise, MCSA utilizes current sensors already embedded in servo systems. A detailed comparison of MCSA with vibration and acoustic methods for fault diagnosis is presented in Figure 2. By avoiding additional hardware, this approach reduces installation complexity and directly captures electromagnetic torque variations induced by mechanical faults. Consequently, MCSA enables low-cost deployment and millisecond-level fault response, meeting the high-reliability and real-time demands of industrial applications.

2.1.2. Fault Definition

In the evaluation of robotic performance, beyond standard product specifications, end-effector positioning repeatability $\delta_p$ represents a pivotal criterion for fault stratification. This priority originates from its direct correlation with core mechanical integrity and operational performance, in contrast to indirect indicators such as vibration or temperature. $\delta_p$ rigorously quantifies positional consistency during repeated task executions and is sensitive to underlying mechanical degradation mechanisms.
As Figure 3 shows, these mechanisms are primarily categorized into transmission system anomalies, structural system instabilities, motion interface imperfections, and environmental interaction perturbations. Collectively, these factors determine the positional consistency of IR manipulators during cyclic operations. This makes positioning repeatability serve as a powerful and robust diagnostic tool for mechanical faults in IRs.
Figure 4 illustrates the methodology for measuring the positioning repeatability of our IR end-effector using a laser displacement sensor (LK-G85, Keyence, Osaka, Japan). The measured positioning repeatability $\delta_m$ is calculated using the following equation:
$$\delta_m = \frac{1}{N_m} \sum_{i_m=1}^{N_m} \sqrt{\delta_x^2 + \delta_y^2 + \delta_z^2}$$
where $\delta_x$, $\delta_y$, and $\delta_z$ are the measured positioning repeatability values along the X-axis, Y-axis, and Z-axis, respectively, and $N_m$ is the total number of measurements.
Given the OEM-specified baseline $\delta_s = 0.05$ mm, Table 1 classifies fault severity levels for IRs based on $\delta_m / \delta_s$ deviation thresholds.
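As a rough illustration of how the ratio used for fault stratification could be computed, the following sketch averages the Euclidean norm of the per-axis values over all measurements. It assumes that each measurement contributes its own (X, Y, Z) triple; the sample values and the function name `measured_repeatability` are invented for illustration, and the Table 1 thresholds are deliberately not reproduced here.

```python
import numpy as np

def measured_repeatability(dx, dy, dz):
    """Mean Euclidean norm of the per-axis repeatability values (mm)."""
    dx, dy, dz = (np.asarray(a, dtype=float) for a in (dx, dy, dz))
    return float(np.mean(np.sqrt(dx**2 + dy**2 + dz**2)))

DELTA_S = 0.05  # OEM-specified baseline quoted in the text, in mm

# Hypothetical measurements (mm), not taken from the paper.
dx = [0.03, 0.04, 0.03]
dy = [0.04, 0.03, 0.04]
dz = [0.00, 0.00, 0.00]

delta_m = measured_repeatability(dx, dy, dz)
ratio = delta_m / DELTA_S  # this ratio is what the severity table is based on
print(f"delta_m = {delta_m:.3f} mm, delta_m / delta_s = {ratio:.2f}")
```

With these made-up values every measurement has a norm of 0.05 mm, so the ratio equals 1, i.e., the robot sits exactly at the baseline.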
Figure 5 illustrates a systematic data acquisition framework designed to construct a comprehensive dataset reflecting real-world industrial scenarios. Under continuous cyclic operation, industrial robots undergo gradual mechanical degradation, which manifests as a measurable escalation in end-effector positioning repeatability error. Guided by the stratification criteria defined in Table 1, this degradation spectrum is categorized into five distinct severity levels ranging from normal operation to critical failure risk. Subsequently, motor current signals are acquired from IR systems corresponding to these specific fault categories. Data acquisition across these levels provides the necessary dataset for developing intelligent fault diagnosis models.

2.2. Convolutional Neural Network and Long Short-Term Memory

2.2.1. Convolutional Neural Network

CNNs provide data-driven fault diagnosis through hierarchical deep learning architectures. As feed-forward networks, they employ cascaded convolution and pooling operations to learn discriminative, multi-level feature representations automatically. This architecture is particularly effective for processing complex signal modalities, including spectrograms and multivariate time-series data. Fundamentally, these networks employ weight-sharing convolutional filters to capture spatially invariant features.
The convolution operation at layer $l_c$ is defined as
$$y_{i_c}^{l_c+1}(j_c) = \sum_{k_c=1}^{K_c} W_{i_c}^{l_c}(k_c)\, x^{l_c}(j_c + k_c - 1) + b_{i_c}^{l_c}$$
where $j_c$ is the spatial position index, $K_c$ is the convolutional kernel size, $W_{i_c}^{l_c}$ and $b_{i_c}^{l_c}$ are the weight kernel and bias term for the $i_c$-th filter, and $x^{l_c}(j_c)$ is the input feature map at position $j_c$.
In CNNs, max-pooling is the predominant downsampling method. It extracts the most prominent response within each sub-region, capturing distinctive local patterns while curbing computational demands through dimensionality reduction. The output feature map at the $(l_c+1)$-th layer for the $i_c$-th channel after pooling can be described as
$$p_{i_c}^{l_c+1}(j_p) = \max_{(j_p - 1) W_p + 1 \,\le\, i_p \,\le\, j_p W_p} y_{i_c}^{l_c}(i_p)$$
where $j_p$ indexes output spatial positions after pooling, $y_{i_c}^{l_c}(i_p)$ denotes the output of the convolution layer at position $i_p$ in the $i_c$-th channel, and $W_p$ represents the width of the pooling kernel. The index $i_p$ spans all spatial positions within the pooling window associated with the output position $j_p$.
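The two operations above can be sketched in a few lines of NumPy for a single filter on a one-dimensional input. This is a minimal illustration only: the kernel, bias, and input values are hypothetical, indices are 0-based rather than 1-based, and the pooling windows are non-overlapping with stride equal to the window width, as in the pooling formula.

```python
import numpy as np

def conv1d_valid(x, w, b):
    """y(j) = sum_k w(k) * x(j + k) + b for every fully covered position j."""
    K = len(w)
    return np.array([np.dot(w, x[j:j + K]) + b for j in range(len(x) - K + 1)])

def max_pool1d(y, width):
    """Non-overlapping max-pooling: window j covers y[j*width : (j+1)*width]."""
    n_out = len(y) // width
    return y[:n_out * width].reshape(n_out, width).max(axis=1)

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])  # toy input feature map
w = np.array([1.0, 0.0, -1.0])                 # hypothetical kernel, size 3
y = conv1d_valid(x, w, b=0.5)                  # y is [1.5, 3.5, -2.5, -1.5]
p = max_pool1d(y, width=2)                     # p is [3.5, -1.5]
print(y, p)
```

Pooling halves the length of the feature map here while keeping the strongest response in each window.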

2.2.2. Long Short-Term Memory

LSTM networks overcome critical shortcomings of traditional recurrent neural networks, including the temporal gradient vanishing and explosion phenomena, while preserving robust modeling of sequential dependency. This architecture regulates information flow across time steps through dedicated gating mechanisms—input, forget, and output gates, as depicted in Figure 6.
The core innovation lies in the memory cell and how its gates meticulously regulate state updates:
  • Input Gate: Regulates integration of the current input $x_t$ and the previous hidden state $h_{t-1}$ into the cell state $c_t$:
    $$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
  • Forget Gate: Controls the retention or discard of the historical cell state $c_{t-1}$:
    $$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
  • Output Gate: Modulates the exposure of the cell state $c_t$ to update the hidden state $h_t$:
    $$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
The memory cell state evolves through
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$
where $\sigma$ is the sigmoid activation function and $\odot$ denotes element-wise multiplication; $b_i$, $b_f$, $b_o$, and $b_c$ are the bias vectors of the respective components; $W_\varsigma$ and $U_\varsigma$ represent the input-hidden and hidden-hidden weight matrices, respectively, with unique parameters for the gates $\varsigma \in \{i, f, o\}$ and the cell $\varsigma = c$. At each timestep $t$, gate activations and state transitions derive from the current input $x_t$ and the previous hidden state $h_{t-1}$.
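The gate and state equations above translate almost line by line into code. The sketch below runs one LSTM cell over a short random sequence; the layer sizes, random weights, and the dictionary layout keyed by 'i', 'f', 'o', 'c' are illustrative choices, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts keyed by 'i', 'f', 'o', 'c'."""
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                          # cell update
    h_t = o_t * np.tanh(c_t)                                    # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3  # hypothetical sizes
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in "ifoc"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "ifoc"}
b = {k: np.zeros(n_hid) for k in "ifoc"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):  # a sequence of five time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

A bidirectional variant would run a second cell over the reversed sequence and concatenate the two hidden states at each step.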

2.3. Framework of the CNNBiLSTM-Based Fault Diagnosis Model

Industrial robot systems operating under complex dynamic conditions generate highly coupled and non-stationary signals, which pose significant challenges to conventional fault diagnosis approaches. Current deep learning approaches also face specific shortcomings: CNNs extract spatially local features through fixed-kernel convolutions but lack dynamic temporal modeling capability, while BiLSTM networks capture temporal dependencies yet inherently ignore spatial correlations.
To address these issues, we employ a CNNBiLSTM model that combines spatial feature extraction with bidirectional temporal modeling. In this framework, convolutional layers identify discriminative spatial patterns, while the BiLSTM modules capture contextual temporal dependencies. This combination yields a unified spatiotemporal representation, which enhances the accuracy of fault diagnosis in IRs.
As illustrated in Figure 7, the CNNBiLSTM fault diagnosis framework processes signals through a hierarchical cascade:
  • Spectral Transformation Module: The initial processing stage employs the Fourier transform to convert the raw time-domain signals into the frequency-domain representations. This transformation reveals latent spectral features that are critical for distinguishing different fault types.
  • Convolutional Layers: These layers apply convolutional kernels to the spectral inputs to extract salient spatial features indicative of fault signatures.
  • Pooling Layers: Subsequent max-pooling operations downsample the feature maps by retaining the most activated values, which serves to reduce data dimensionality and enhance invariance to small signal shifts.
  • BiLSTM Layers: The spatial features are then sequenced and processed by BiLSTM layers. By analyzing the sequence in both forward and backward directions, this module captures the temporal evolution of fault patterns across operational cycles.
  • Fully Connected Layers: The high-level spatiotemporal features from the BiLSTM are integrated by fully connected layers, combining them into a unified representation for final classification.
  • Output Layer: A softmax function produces the final fault probability distribution, and the entire network is trained by minimizing the cross-entropy loss.
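The first stage of this cascade, the spectral transformation, can be sketched with NumPy's real-input FFT. The paper does not state here whether the magnitude, power, or complex spectrum is fed to the network, nor how it is normalized, so the single-sided magnitude spectrum below is an assumption. The synthetic two-tone signal stands in for a motor current, and the 1 kHz sampling rate matches the value the paper reports for the servo-drive signals.

```python
import numpy as np

def magnitude_spectrum(signal, fs):
    """Return (frequencies in Hz, single-sided magnitude spectrum)."""
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal)) / n  # assumed 1/n normalization
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spectrum

fs = 1000.0                  # sampling rate in Hz
t = np.arange(1000) / fs     # one second of samples
# Synthetic stand-in for a current signal: 50 Hz tone plus a weaker 120 Hz tone.
current = np.sin(2 * np.pi * 50 * t) + 0.2 * np.sin(2 * np.pi * 120 * t)

freqs, spec = magnitude_spectrum(current, fs)
peak_hz = freqs[np.argmax(spec)]
print(f"dominant component at {peak_hz:.0f} Hz, {len(spec)} frequency bins")
```

In the frequency domain the two components appear as isolated peaks, which is the kind of structure the subsequent convolutional layers operate on.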

2.4. Framework of the MCNNBiLSTM-Based Fault Diagnosis Model

The complex nonlinear dynamics inherent in IR systems distribute fault-related features across multiple temporal scales, requiring multi-resolution analysis for accurate diagnosis. Although standard CNNBiLSTM models track temporal evolution, their fixed-scale convolutional kernels are inadequate for separating overlapping fault signatures that reside in distinct spectrotemporal regions. This inflexibility of the receptive fields limits multi-scale feature adaptation and impedes the modeling of cross-scale correlations.
We present a multi-scale CNNBiLSTM (MCNNBiLSTM) framework for IR fault diagnosis. The model features a dual-branch architecture designed to learn hierarchical fault representations. As shown in Figure 8, it comprises two complementary feature extraction pathways: a large-scale branch (LCNNBiLSTM) captures gradual degradation trends, while a small-scale branch (SCNNBiLSTM) identifies transient fault signatures. Each branch employs BiLSTM modules to model temporal dependencies. Subsequently, the outputs are concatenated to integrate multi-scale temporal features, thereby enhancing the robustness of fault diagnosis.
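The dual-branch idea can be illustrated with a simplified NumPy sketch in which two branches differ only in kernel size and their outputs are concatenated. Several simplifications are assumed: fixed moving-average kernels replace learned filters, the BiLSTM stage of each branch is omitted, and the kernel sizes (15 and 3) and pooling width are arbitrary choices rather than the paper's settings.

```python
import numpy as np

def branch_features(x, kernel_size, pool_width):
    """Conv (moving-average kernel) -> ReLU -> max-pool for one branch."""
    w = np.ones(kernel_size) / kernel_size   # hypothetical, untrained kernel
    y = np.correlate(x, w, mode="valid")     # sliding dot product
    y = np.maximum(y, 0.0)                   # ReLU
    n_out = len(y) // pool_width
    return y[:n_out * pool_width].reshape(n_out, pool_width).max(axis=1)

rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=128))  # stand-in for a spectral input vector

large = branch_features(x, kernel_size=15, pool_width=4)  # coarse trends
small = branch_features(x, kernel_size=3, pool_width=4)   # fine, transient detail
fused = np.concatenate([large, small])                    # multi-scale fusion
print(large.shape, small.shape, fused.shape)
```

The wide kernel smooths over many neighbouring bins while the narrow kernel preserves sharp local structure, so the concatenated vector carries both views of the same input.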

2.5. Framework of the GAT/MGAT-Based Fault Diagnosis Model

2.5.1. Graph Attention Network

Graph Attention Networks (GATs) enhance standard graph convolutional structures by integrating a masked self-attention mechanism, which dynamically computes attention weights for neighboring nodes. This design enables GATs to focus on the most relevant features when combining information from neighboring nodes. This process strengthens important structural patterns while reducing noise interference. Consequently, GATs generate topology-aware feature representations that significantly improve the robustness and classification accuracy of graph-based diagnostic systems.
GATs operate on graph-structured data characterized by a node feature matrix $H = \{h_1, h_2, \dots, h_{N_g}\}$, where each node $h_i \in \mathbb{R}^{D_g}$ carries a $D_g$-dimensional feature vector within an $N_g$-node topology. Through attention-driven nonlinear transformations, the architecture derives discriminative representations in latent topological spaces:
$$H' = \mathrm{GAT}(H, G; \Theta)$$
where $G$ defines the graph connectivity, and $\Theta$ denotes trainable parameters. The output embeddings $H' = \{h'_1, h'_2, \dots, h'_{N_g}\}$ with $h'_i \in \mathbb{R}^{D'_g}$ demonstrate enhanced representational capacity for fault patterns.
The transformation involves two sequential stages, as shown in Figure 9:
1. Feature Projection
A trainable shared parameter matrix $W_{GAT} \in \mathbb{R}^{D'_g \times D_g}$ transforms input features into a latent representation:
$$z_i = W_{GAT} h_i, \quad \forall i \in \{1, \dots, N_g\}$$
This transformation maintains inherent structural relationships while simultaneously capturing complex feature representations.
2. Attention-Driven Aggregation
With graph attention mechanisms, normalized attention weights $\alpha_{ij}$ quantify the relational significance of neighbors $j \in \mathcal{N}_i$ for each central node $i$.
Attention Coefficient:
$$e_{ij} = \mathrm{LeakyReLU}\left(\alpha^{\top} [z_i \,\Vert\, z_j]\right) = \mathrm{LeakyReLU}\left(\alpha^{\top} [W_{GAT} h_i \,\Vert\, W_{GAT} h_j]\right)$$
where $\alpha \in \mathbb{R}^{2 D'_g}$ denotes the attention parameter vector, and $\Vert$ represents feature concatenation.
Normalized Attention Weights:
$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\alpha^{\top} [W_{GAT} h_i \,\Vert\, W_{GAT} h_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\alpha^{\top} [W_{GAT} h_i \,\Vert\, W_{GAT} h_k]\right)\right)}$$
where $\mathcal{N}_i = \{ j \in V \mid (i, j) \in E \}$ defines the neighborhood of node $i$, with $V$ and $E$ being the node and edge sets of the graph, respectively.
The aggregated feature representation is given by
$$h'_i = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} z_j\right)$$
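The projection, attention, and aggregation stages above can be sketched as a single-head layer in NumPy. Several choices below are assumptions rather than details from the paper: the LeakyReLU slope of 0.2, the use of tanh as the output nonlinearity, self-loops in the adjacency matrix, and the small ring graph with random features. The function name `gat_layer` is hypothetical.

```python
import numpy as np

def leaky_relu(z, slope=0.2):  # slope is an assumed value
    return np.where(z > 0, z, slope * z)

def gat_layer(H, adj, W, a):
    """Single-head graph attention.
    H: (N, D) node features; adj: (N, N) boolean adjacency with self-loops;
    W: (D_out, D) shared projection; a: (2 * D_out,) attention vector."""
    Z = H @ W.T                                 # feature projection z_i = W h_i
    d_out = Z.shape[1]
    # a^T [z_i || z_j] splits into a[:d_out] . z_i + a[d_out:] . z_j
    e = leaky_relu((Z @ a[:d_out])[:, None] + (Z @ a[d_out:])[None, :])
    e = np.where(adj, e, -np.inf)               # mask out non-neighbours
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)   # softmax over each neighbourhood
    return np.tanh(alpha @ Z), alpha            # output nonlinearity assumed tanh

rng = np.random.default_rng(0)
N, D, D_out = 4, 5, 3
H = rng.normal(size=(N, D))
adj = np.eye(N, dtype=bool)                     # self-loops
adj[[0, 1, 2, 3], [1, 2, 3, 0]] = True          # ring edges 0-1, 1-2, 2-3, 3-0
adj |= adj.T                                    # make the graph undirected
W = rng.normal(size=(D_out, D))
a = rng.normal(size=2 * D_out)

H_out, alpha = gat_layer(H, adj, W, a)
print(H_out.shape, alpha.sum(axis=1))
```

Each row of `alpha` sums to one and is zero outside the node's neighbourhood, which is the masked self-attention behaviour described in this subsection.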

2.5.2. Multi-Head GAT-Based Fault Diagnosis Model

To improve learning stability, the multi-head attention mechanism extracts features in parallel. As shown in Figure 10, it employs multiple independent attention units, each capturing a distinct hidden representation. Then, these outputs are combined into a unified feature vector.
This process generates two different output representations:
1. Concatenated Features: Multiple independent attention heads concatenate their outputs into high-dimensional representations.
$$h_{GAT1,i} = \Big\Vert_{m=1}^{N_{hn}} \sigma_{GAT1}\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{m} W_{GAT,m} h_j\right)$$
where $N_{hn}$ is the number of heads.
2. Averaged Features: An alternative approach combines outputs via feature averaging, which produces stable representations with reduced dimensionality.
$$h_{GAT2,i} = \sigma_{GAT2}\left(\frac{1}{N_{hn}} \sum_{k=1}^{N_{hn}} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W_{GAT,k} h_j\right)$$
In this study, the second strategy was adopted to optimize computational efficiency and operational resilience. Figure 11 illustrates the GAT/MGAT-based fault diagnosis model for IRs.
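The difference between the two combination strategies is mainly one of output dimensionality, which the following NumPy sketch makes concrete. It is illustrative only: the number of heads, feature sizes, fully connected toy graph, LeakyReLU slope of 0.2, and tanh nonlinearity are all assumptions, and `multi_head_gat` is a hypothetical helper name.

```python
import numpy as np

def attention_weights(Z, adj, a):
    """Row-normalized attention weights for one head."""
    d = Z.shape[1]
    e = (Z @ a[:d])[:, None] + (Z @ a[d:])[None, :]
    e = np.where(e > 0, e, 0.2 * e)          # LeakyReLU, assumed slope 0.2
    e = np.where(adj, e, -np.inf)            # keep neighbours only
    e = np.exp(e - e.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def multi_head_gat(H, adj, Ws, As, mode="average"):
    """Combine independent heads by averaging or by concatenation."""
    heads = []
    for W, a in zip(Ws, As):
        Z = H @ W.T                          # per-head projection
        heads.append(attention_weights(Z, adj, a) @ Z)
    if mode == "concat":
        return np.concatenate([np.tanh(h) for h in heads], axis=1)
    return np.tanh(np.mean(heads, axis=0))   # average first, then nonlinearity

rng = np.random.default_rng(0)
N, D, D_out, n_heads = 6, 8, 4, 3
H = rng.normal(size=(N, D))
adj = np.ones((N, N), dtype=bool)            # fully connected toy graph
Ws = [rng.normal(size=(D_out, D)) for _ in range(n_heads)]
As = [rng.normal(size=2 * D_out) for _ in range(n_heads)]

h_avg = multi_head_gat(H, adj, Ws, As, mode="average")
h_cat = multi_head_gat(H, adj, Ws, As, mode="concat")
print(h_avg.shape, h_cat.shape)
```

Averaging keeps the per-node feature size fixed regardless of the number of heads, whereas concatenation grows it linearly, which is consistent with the efficiency argument given for the averaging strategy.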

2.6. Framework of the MGAT-MCNNBiLSTM-Based Fault Diagnosis Model

Accurate fault diagnosis in IRs requires the concurrent analysis of complex temporal dynamics and spatial dependencies within sensor data. To address this challenge and further improve diagnostic accuracy, we propose MGAT-MCNNBiLSTM, a novel architecture that integrates a Multi-head Graph Attention Network (MGAT) with a Multi-scale Convolutional Bidirectional Long Short-Term Memory network (MCNNBiLSTM) for IR fault diagnosis, as illustrated in Figure 12.
This synergistic design capitalizes on complementary strengths: MCNNBiLSTM captures multi-resolution features and bidirectional long-range temporal dynamics, while MGAT directly models heterogeneous structural dependencies across multi-relational graphs. The integrated framework comprehensively characterizes inherent spatiotemporal interactions in the complex industrial system. It preserves essential temporal dynamics while substantially enhancing spatial discriminative capacity, ultimately achieving improved fault classification accuracy.
The complete MGAT-MCNNBiLSTM fault diagnosis model follows a structured workflow, as detailed in Algorithm 1.
Algorithm 1: MGAT-MCNNBiLSTM Fault Diagnosis Model
Input: Raw joint current signals $S \in \mathbb{R}^{N_s \times N_b}$
Output: Predicted class probabilities $P(\hat{y} = c \mid O_{out})$
  1: Procedure Spectral Transformation Module ($S$):
  2:   $X_{freq} \leftarrow \mathcal{F}(S)$, $X_{freq} \in \mathbb{R}^{N_f \times N_b}$
  3:   return $X_{freq}$
  4: end Procedure
  5: Procedure MCNNBiLSTM Module ($X_{freq}$):
  6:   // Branch 1: LCNNBiLSTM
  7:   $X_{1,LC} \leftarrow X_{freq}$
  8:   for $j = 1$ to $N_{LC}$ do
  9:     if $j > 1$ then
  10:      $X_{j,LC} \leftarrow O_{j-1,LC}$
  11:    end if
  12:    $O_{j,LC} \leftarrow OP_{j,LC}(X_{j,LC}) \in \mathbb{R}^{N_{j,LC} \times N_b}$, $OP_{j,LC} \in$ {Conv, ReLU, MaxPool}
  13:  end for
  14:  $O_{LC} \leftarrow O_{N_{LC},LC}$
  15:  $X_{1,LB} \leftarrow O_{LC}$
  16:  for $j = 1$ to $N_{LB}$ do
  17:    if $j > 1$ then
  18:      $X_{j,LB} \leftarrow O_{j-1,LB}$
  19:    end if
  20:    $O_{j,LB} \leftarrow OP_{j,LB}(X_{j,LB}) \in \mathbb{R}^{N_{j,LB} \times N_b}$, $OP_{j,LB} \in$ {BiLSTM}
  21:  end for
  22:  $O_{LB} \leftarrow O_{N_{LB},LB}$
  23:  // Branch 2: SCNNBiLSTM
  24:  $X_{1,SC} \leftarrow X_{freq}$
  25:  for $j = 1$ to $N_{SC}$ do
  26:    if $j > 1$ then
  27:      $X_{j,SC} \leftarrow O_{j-1,SC}$
  28:    end if
  29:    $O_{j,SC} \leftarrow OP_{j,SC}(X_{j,SC}) \in \mathbb{R}^{N_{j,SC} \times N_b}$, $OP_{j,SC} \in$ {Conv, ReLU, MaxPool}
  30:  end for
  31:  $O_{SC} \leftarrow O_{N_{SC},SC}$
  32:  $X_{1,SB} \leftarrow O_{SC}$
  33:  for $j = 1$ to $N_{SB}$ do
  34:    if $j > 1$ then
  35:      $X_{j,SB} \leftarrow O_{j-1,SB}$
  36:    end if
  37:    $O_{j,SB} \leftarrow OP_{j,SB}(X_{j,SB}) \in \mathbb{R}^{N_{j,SB} \times N_b}$, $OP_{j,SB} \in$ {BiLSTM}
  38:  end for
  39:  $O_{SB} \leftarrow O_{N_{SB},SB}$
  40:  // Feature fusion and projection
  41:  $O_{fusion,MCB} \leftarrow [O_{LB}; O_{SB}]$
  42:  $O_{MCB} \leftarrow W_{FC,MCB}\, O_{fusion,MCB} + B_{FC,MCB}$
  43:  return $O_{MCB}$
  44: end Procedure
  45: Procedure MGAT Module ($X_{freq}$):
  46:  $X_{1,GAT} \leftarrow X_{freq}$
  47:  for $j = 1$ to $N_{GAT}$ do
  48:    if $j > 1$ then
  49:      $X_{j,GAT} \leftarrow O_{j-1,GAT}$
  50:    end if
  51:    $O_{j,GAT} \leftarrow OP_{j,GAT}(X_{j,GAT}) \in \mathbb{R}^{N_{j,GAT} \times N_b}$, $OP_{j,GAT} \in$ {Multi-head GAT, BatchNorm, ReLU}
  52:  end for
  53:  $O_{GAT} \leftarrow W_{FC,GAT}\, O_{N_{GAT},GAT} + B_{FC,GAT}$
  54:  return $O_{GAT}$
  55: end Procedure
  56: Procedure Feature Fusion & Classification ($O_{MCB}$, $O_{GAT}$):
  57:  $O_{fusion} \leftarrow [O_{MCB}; O_{GAT}]$
  58:  $O_{FC} \leftarrow W_{FC}\, O_{fusion} + B_{FC}$, $O_{FC} \in \mathbb{R}^{N_{fc}}$
  59:  $O_{out} \leftarrow W_{out}\, O_{FC} + B_{out}$, $O_{out} \in \mathbb{R}^{N_{out}}$
  60:  $P(\hat{y} = c \mid O_{out}) \leftarrow \mathrm{Softmax}(O_{out})$, $c \in \{1, \dots, N_c\}$
  61:  return $P(\hat{y} = c \mid O_{out})$
  62: end Procedure
This algorithm executes the spectral transformation, the MCNNBiLSTM module, the MGAT module, and the feature fusion & classification module in sequence to obtain the IR fault diagnosis result.
  • Spectral Transformation Module
Raw joint current signals S R N s × N b sampled at f s = 1   kHz from industrial robot servo drives. N s and N b are signal length and batch size, respectively.
The Fourier transform converts time-domain signals to frequency-domain representations X f r e q , which corresponds to lines 1–4 in Algorithm 1. N f denotes the number of frequency bins ( N f -point FFT).
2.
MCNNBiLSTM Module
The MCNNBiLSTM module implements a dual-branch architecture for extracting multi-scale spatiotemporal features. The complete algorithmic procedure is detailed in Algorithm 1 (lines 5–44).
The LCNNBiLSTM branch processes spectral features through N L C CNN layers followed by N L B BiLSTM layers, producing the output O L B .
The SCNNBiLSTM branch follows a similar structure with independent parameters ( N S C CNN layers and N S B BiLSTM layers), producing the output O S B .
The outputs from both branches are concatenated and linearly projected to obtain the final MCNNBiLSTM representation O M C B .
3. MGAT Module
The MGAT module processes spectral features to capture interdependencies through graph attention mechanisms. As outlined in Algorithm 1 (lines 45–55), the module applies $N_{GAT}$ graph attention layers, with the final graph representation $O_{GAT}$ obtained via linear projection.
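A single multi-head graph attention layer can be sketched as follows, using the standard graph attention formulation in which each node attends over its neighbours with softmax-normalized scores. This is an illustrative sketch under stated assumptions: the graph (three nodes with self-loops), the weights, the LeakyReLU slope of 0.2, and the helper names are all hypothetical, batch normalization is omitted, and the way the graph is built from spectral features in this study is not reproduced here.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gat_head(features, neighbors, weight, attn):
    """One attention head: returns aggregated node features and attention weights."""
    projected = [matvec(weight, h) for h in features]  # W h_i
    outputs, alphas = [], []
    for i, nbrs in enumerate(neighbors):
        # Unnormalized scores e_ij = LeakyReLU(a^T [W h_i || W h_j]).
        scores = [
            leaky_relu(sum(a * z for a, z in zip(attn, projected[i] + projected[j])))
            for j in nbrs
        ]
        # Softmax over the neighbourhood of node i.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Attention-weighted sum of neighbour features, followed by ReLU.
        dim = len(projected[i])
        outputs.append([
            max(0.0, sum(a * projected[j][d] for a, j in zip(alpha, nbrs)))
            for d in range(dim)
        ])
        alphas.append(alpha)
    return outputs, alphas


def multi_head_gat(features, neighbors, heads):
    """Concatenate the outputs of several independent heads for each node."""
    per_head = [gat_head(features, neighbors, w, a)[0] for w, a in heads]
    return [sum((head[i] for head in per_head), []) for i in range(len(features))]


# Hypothetical graph: three nodes with self-loops, two features per node.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
identity = [[1.0, 0.0], [0.0, 1.0]]
heads = [
    (identity, [0.1, 0.2, 0.3, 0.4]),
    (identity, [0.4, 0.3, 0.2, 0.1]),
    (identity, [0.0, 0.0, 0.0, 0.0]),  # zero scores give uniform attention
]
node_embeddings = multi_head_gat(features, neighbors, heads)
_, first_head_alphas = gat_head(features, neighbors, *heads[0])
```

Three heads are used to mirror the three attention heads of the MGAT configuration; each head learns its own projection and scoring vector, and concatenation lets the next layer see all of them.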
4. Feature Fusion & Classification Module
Following the procedure in Algorithm 1 (lines 56–62), the outputs from the MCNNBiLSTM and MGAT modules are concatenated, processed through a fully connected layer, and classified via softmax activation to obtain the final fault diagnosis probabilities $P(\hat{y} = c \mid O_{out})$.
The cross-entropy loss function quantifies the discrepancy between predicted class probabilities and ground-truth labels by computing the negative log-likelihood of the true class. Formally, for a true label vector $y \in \mathbb{R}^{N_c}$ and predicted probability vector $\hat{y} \in \mathbb{R}^{N_c}$, the loss function $L(y, \hat{y})$ is defined as
$$L(y, \hat{y}) = -\sum_{k=1}^{N_c} y_k \log \hat{y}_k = -\sum_{k=1}^{N_c} y_k \log p(\hat{c} = k \mid \theta)$$
where $y_k$ is the $k$-th component of $y$ (1 for the true class, 0 otherwise);
$\hat{y}_k$ is the predicted probability of class $k$;
$N_c$ is the number of classes.
The predicted class label is obtained through
$$\hat{c} = \arg\max_{k} \hat{y}_k, \quad \hat{c} \in \{1, 2, \dots, N_c\}$$
The complete optimization objective minimizes the average cross-entropy over the training dataset $D$:
$$\min_{\theta} \frac{1}{|D|} \sum_{(x, y) \in D} L(y, \hat{y}(x; \theta))$$
where $\theta$ encompasses all trainable parameters in the fault diagnosis model.
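The classification head and loss can be made concrete with a short numerical sketch. The logits and labels below are hypothetical, the function names are illustrative, and the small `eps` guard against taking the logarithm of zero is an added numerical safeguard, not part of the definition above.

```python
import math


def softmax(logits):
    """Convert output logits O_out into class probabilities."""
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Negative log-likelihood of the true class for a one-hot label."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))


def predict(y_pred):
    """Return the 1-based index of the most probable class."""
    return max(range(len(y_pred)), key=y_pred.__getitem__) + 1


# Hypothetical logits for the five states (Normal ... Critical); true class is 1.
logits = [2.0, 1.0, 0.1, 0.0, -1.0]
y_true = [1, 0, 0, 0, 0]
probs = softmax(logits)
loss = cross_entropy(y_true, probs)
c_hat = predict(probs)

# Average loss over a tiny hypothetical dataset of (label, logits) pairs.
dataset = [(y_true, logits), ([0, 1, 0, 0, 0], [0.5, 2.5, 0.0, 0.0, 0.0])]
mean_loss = sum(cross_entropy(y, softmax(z)) for y, z in dataset) / len(dataset)
```

Because the label is one-hot, the sum in the loss reduces to the negative logarithm of the probability assigned to the true class.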

2.7. Experiment Methodology

Industrial robot systems exhibit complex fault signatures during sustained operation. The combined effect of mechanical wear, transient collisions, and thermal stress causes failure mechanisms to evolve dynamically throughout the operational lifespan. During extended uninterrupted service cycles, cumulative wear in transmission components progressively degrades end-effector positioning repeatability, a key performance metric in the production specifications of industrial robots. Consequently, this loss of positioning repeatability provides a measurable proxy for mechanical degradation, serving as a critical prognostic indicator of system health.
To quantify these failure modes, we recorded current signatures at different fault severity levels. The severity was determined based on both positioning repeatability tolerances and assessments from the maintenance team. Five operational states are defined: Normal, Minor, Moderate, Severe, and Critical. Details of the dataset are provided in Table 2.
In our IRs, the analog current signal is discretized through a servo-controlled system equipped with a sigma-delta (ΣΔ) ADC, achieving 15-bit effective resolution to ensure high-fidelity signal acquisition. A sampling frequency of $f_s = 1$ kHz is employed. The discrete-time sequence, denoted as $x_i$ ($i = 1, 2, \dots, M$), undergoes subsequent digital processing for temporal or spectral analysis.
The study presents MGAT-MCNNBiLSTM, an end-to-end fault diagnosis model for IRs that directly identifies failure modes from motor current signatures. To benchmark its performance, we compare it against several established architectures: CNNBiLSTM with varying convolutional kernel scales, MCNNBiLSTM, GAT, and MGAT. Their network structures are illustrated in Table 3.
To ensure a fair comparison by minimizing parametric influences, MCNNBiLSTM maintains architectural parity with both LCNNBiLSTM and SCNNBiLSTM across all shared components. The only differences lie in the feature fusion layer and the subsequent linear transformation layer. Similarly, MGAT-MCNNBiLSTM inherits core parameters identically from its constituent MGAT and MCNNBiLSTM modules, with changes strictly localized to the same feature fusion and downstream linear layers. The MGAT employs a multi-head attention mechanism with three attention heads. Model training employed the following experimental configuration: batch size $N_b$ was set to 128, epochs to 300, the optimizer to Adam, the learning rate to $1 \times 10^{-3}$, and the loss function to cross-entropy loss.
To comprehensively evaluate the performance of fault diagnosis models, experiments were designed with distinct cases based on different evaluation objectives, as shown in Table 4.
In practical engineering applications, IRs operate in diverse and complex environments where electromagnetic interference inevitably degrades acquired current signals. Furthermore, cost-driven engineering constraints often necessitate lower-resolution ADC sampling. To evaluate fault diagnosis robustness under such compromised signal acquisition scenarios, Gaussian white noise was injected into current signals at controlled signal-to-noise ratios (SNR), where SNR is expressed as
$$\gamma_{SNR} = 10 \lg \frac{P_{signal}}{P_{noise}}$$
where $P_{signal}$ and $P_{noise}$ represent the signal power and noise power, respectively.
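Noise injection at a prescribed SNR follows directly from this definition: the noise power is the signal power divided by $10^{\gamma_{SNR}/10}$. The sketch below is one way to do this under stated assumptions; the function names, the sinusoidal test signal, and the fixed random seed are hypothetical, and the signal power is estimated as the mean square of the samples.

```python
import math
import random


def signal_power(x):
    """Mean-square power of a discrete signal."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(x, snr_db, rng):
    """Add zero-mean Gaussian noise scaled to the requested SNR in dB."""
    noise_power = signal_power(x) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in x]


def measured_snr_db(clean, noisy):
    """Recompute the SNR from the clean signal and its noisy version."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(noise))


# Hypothetical current waveform: 10 s of a 50 Hz sinusoid sampled at 1 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(10_000)]
noisy = add_white_noise(clean, snr_db=40.0, rng=rng)
achieved = measured_snr_db(clean, noisy)  # close to 40 dB for long signals
```

The achieved SNR differs slightly from the target because the noise is a finite random sample; the deviation shrinks as the signal length grows.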
To mitigate model overfitting risks, a five-fold cross-validation protocol was rigorously implemented. The dataset was partitioned into five non-overlapping folds. During each training-validation iteration, four folds constituted the training subset while the remaining fold served as the validation subset. This validation scheme quantifies robust diagnostic accuracy for all compared methodologies.
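The partitioning scheme can be sketched as a small index generator. This is an illustrative sketch only: the shuffling, the fixed seed, and the function name `k_fold_indices` are assumptions, and whether the folds in this study were stratified by fault class is not stated.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # every k-th index per fold
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hypothetical dataset of 23 samples split into five folds.
splits = list(k_fold_indices(n_samples=23))
```

Each sample appears in exactly one validation fold and in the training subset of the other four iterations.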
The accuracy of the fault diagnosis model was calculated using the following equation:
$$Accuracy = \frac{1}{N_{cv}} \sum_{i_{cv}=1}^{N_{cv}} \frac{TP_{i_{cv}}}{TP_{i_{cv}} + FN_{i_{cv}}}$$
where $TP_{i_{cv}}$ and $FN_{i_{cv}}$ are the numbers of true positives and false negatives in the $i_{cv}$-th cross-validation fold, respectively, and $N_{cv}$ is the number of folds.
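As a worked example of the fold-averaged accuracy, the sketch below evaluates the equation for five folds. The per-fold counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study.

```python
def fold_accuracy(true_positives, false_negatives):
    """Ratio TP / (TP + FN) for a single validation fold."""
    return true_positives / (true_positives + false_negatives)


def cross_validated_accuracy(fold_counts):
    """Average the per-fold ratios over all folds."""
    ratios = [fold_accuracy(tp, fn) for tp, fn in fold_counts]
    return sum(ratios) / len(ratios)


# Hypothetical (TP, FN) counts for five folds of 100 validation samples each.
fold_counts = [(90, 10), (85, 15), (95, 5), (80, 20), (100, 0)]
accuracy = cross_validated_accuracy(fold_counts)
```

The per-fold ratios are 0.90, 0.85, 0.95, 0.80, and 1.00, so the averaged accuracy is 0.90.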

3. Results

To assess the inherent diagnostic challenge posed by raw sensor data, we first examined the motor current signals in both the time and frequency domains across varying fault severity levels.
Figure 13 characterizes the motor current signatures across varying fault severity levels in robot joints.
As shown, the raw time domain current signals exhibit high similarity across different fault severity levels, resulting in a lack of clear separation that makes direct distinction difficult.
Complementing this, Figure 14 depicts the frequency spectra of motor current across varying fault severity levels in robot joints.
As observed in the figure, the distinctions in the frequency spectra of the motor current across different fault severity levels are subtle. It remains challenging to directly extract effective fault-discriminative features from these frequency domain data. This limitation underscores the necessity of employing DL methods to uncover underlying discriminative patterns.
For a comprehensive comparison of different DL-based fault diagnosis methods for IRs, we used t-distributed stochastic neighbor embedding (t-SNE) to visualize their output features in a two-dimensional feature space, as shown in Figure 15.
In the dimensionality-reduced feature space, the proposed MGAT-MCNNBiLSTM exhibits considerably cleaner boundaries. The features learned by the hybrid deep learning framework are much more separable, which means the classifier will more easily distinguish between different fault types.
Case 1: Performance Validation of Fault Diagnosis Models with Raw Current-Signal Data (Architectural configuration details are provided in Table 3. No spectral preprocessing was applied).
To mitigate the randomness caused by model initialization in diagnostic performance evaluation, fifty independent trials were performed for each method. To visualize the distribution of overall accuracy and highlight performance disparities among the various models, the accuracy results were sorted in ascending order and plotted as a waterfall chart. The corresponding results across all diagnosis models are depicted in Figure 16.
As illustrated in the waterfall chart, the SCNNBiLSTM, LCNNBiLSTM, and MCNNBiLSTM models demonstrate comparatively superior performance under this condition. Specifically, their accuracy exceeds that of the MGAT-MCNNBiLSTM model and is significantly higher than the performance achieved by the GAT and MGAT models.
To statistically validate and deepen this performance comparison, Figure 17 displays the statistical visualizations comparing the performance of different fault diagnosis models for Case 1.
As depicted in Figure 17, the MCNNBiLSTM model outperforms comparative methods by not only exhibiting the highest median accuracy but also displaying remarkably narrow interquartile ranges. This combination of metrics indicates a consistently superior and highly reliable performance, characterized by minimal variance across the experimental trials.
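The quantities read off the waterfall chart and box plots (sorted accuracies, median, and interquartile range), together with the mean and standard deviation reported in the tables, can be computed as in the sketch below. The five accuracy values are hypothetical placeholders rather than trial results, and the choice of the sample standard deviation and the inclusive quartile method is an assumption.

```python
import statistics


def summarize(accuracies):
    """Summary statistics for a set of independent trial accuracies."""
    ordered = sorted(accuracies)  # ascending order, as in a waterfall chart
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "sorted": ordered,
        "mean": statistics.fmean(ordered),
        "std": statistics.stdev(ordered),
        "median": median,
        "iqr": q3 - q1,
    }


# Hypothetical accuracies (%) from five trials of one model.
summary = summarize([70.1, 68.4, 72.0, 69.5, 71.2])
```

A narrow interquartile range indicates that the middle half of the trials landed close together, which is how run-to-run consistency is read from a box plot.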
Table 5 summarizes the comparative performance of six fault diagnosis models in Case 1. Accuracy, Positive Predictive Value (PPV), and F1-Score are used to assess the performance of each model.
Among the methods evaluated using raw current data without spectral preprocessing, MCNNBiLSTM achieved the highest overall performance, attaining peak metrics of 70.2240 ± 2.3004% accuracy, 71.8084 ± 2.1795% PPV, and 69.7472 ± 2.4187% F1-Score. The LCNNBiLSTM and SCNNBiLSTM models demonstrated moderate efficacy, yielding accuracies of 68.5180 ± 2.6241% and 67.6320 ± 2.5937%, respectively. In contrast, the graph-based approaches, GAT and MGAT, exhibited critically limited diagnostic capability, attaining merely 45.5520 ± 2.7559% and 47.0560 ± 2.6621% accuracy. Notably, the hybrid MGAT-MCNNBiLSTM architecture, while not surpassing the standalone MCNNBiLSTM, delivered a marked performance improvement over the pure graph-based methods (GAT/MGAT), achieving an intermediate accuracy of 62.6960 ± 2.5217%.
Case 2: Performance Validation of Fault Diagnosis Models with Frequency-transformed Current-Signal Data (Architectural configuration details are provided in Table 3. Spectral preprocessing was applied).
In Case 2, which features raw signal diagnosis supported by spectral preprocessing, the comparative performance across all fault diagnosis models is illustrated in Figure 18.
As illustrated in the waterfall chart, the GAT, MGAT, and MGAT-MCNNBiLSTM models demonstrate distinct advantages under this condition, outperforming the LCNNBiLSTM, MCNNBiLSTM, and SCNNBiLSTM models in terms of accuracy.
To provide statistical validation for this comparison, Figure 19 presents performance visualizations across the different fault diagnosis models for the second case study (Case 2).
The box plots in Figure 19 statistically corroborate the model performance results for Case 2. The MGAT-MCNNBiLSTM model distinguishes itself by achieving a median accuracy above 90% while maintaining a compact interquartile range, indicating both high accuracy and excellent robustness. This performance is notably superior to the other models under evaluation.
Table 6 summarizes the comparative performance of six fault diagnosis models in Case 2. Accuracy, Positive Predictive Value (PPV), and F1-Score are used to assess the performance of each model.
When evaluated on raw current data with spectral preprocessing, the proposed MGAT-MCNNBiLSTM framework demonstrated superior diagnostic capability, achieving peak metrics of 90.7560 ± 1.3311% accuracy, 91.6626 ± 1.1924% PPV, and 90.6736 ± 1.3685% F1-Score. The graph-based architectures, GAT and MGAT, both delivered competitive results, exceeding all CNNBiLSTM variants. Among the CNNBiLSTM variants, MCNNBiLSTM attained the highest accuracy (87.8800 ± 1.6407%), surpassing both LCNNBiLSTM (82.2060 ± 1.8032%) and SCNNBiLSTM (85.9640 ± 1.7009%). Collectively, these findings indicate that MGAT-MCNNBiLSTM is the best-performing fault diagnosis framework among those evaluated, achieving an absolute accuracy improvement of 1.51–8.55 percentage points relative to the benchmark methods.
Figure 20 demonstrates that raw time-series current signals offer limited diagnostic value for deep-learning fault detection, whereas spectral preprocessing substantially enhances model performance.
All models incorporating spectral preprocessing demonstrated superior diagnostic performance compared to their non-preprocessed counterparts. Accuracy improved by 13.6880, 18.3320, and 17.6560 percentage points for LCNNBiLSTM, SCNNBiLSTM, and MCNNBiLSTM, respectively. Larger gains are observed for GAT, MGAT, and our proposed MGAT-MCNNBiLSTM, with accuracy increases of 43.2920, 42.1880, and 28.0600 percentage points, respectively. PPV and F1-Score exhibited parallel enhancements.
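The reported accuracy gains are absolute differences between the Case 2 and Case 1 mean accuracies. The short check below reproduces them for the four models whose accuracies are quoted in the text for both cases; the dictionary layout is merely a convenient way to hold those quoted values.

```python
# Mean accuracies (%) quoted in the text for Case 1 (raw) and Case 2 (spectral).
case1 = {
    "LCNNBiLSTM": 68.5180,
    "SCNNBiLSTM": 67.6320,
    "MCNNBiLSTM": 70.2240,
    "MGAT-MCNNBiLSTM": 62.6960,
}
case2 = {
    "LCNNBiLSTM": 82.2060,
    "SCNNBiLSTM": 85.9640,
    "MCNNBiLSTM": 87.8800,
    "MGAT-MCNNBiLSTM": 90.7560,
}

# Absolute gain in percentage points from spectral preprocessing.
gains = {name: round(case2[name] - case1[name], 4) for name in case1}

# Margin of the proposed model over the weakest quoted benchmark in Case 2.
max_margin = round(case2["MGAT-MCNNBiLSTM"] - case2["LCNNBiLSTM"], 4)
```

The differences come out to 13.6880, 18.3320, 17.6560, and 28.0600 percentage points, and the margin over LCNNBiLSTM in Case 2 is 8.55 percentage points, matching the upper end of the reported improvement range.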
Case 3: Robustness Assessment Against Signal Degradation via Noise Injection for Emulating Low-Cost, Low-Resolution ADC Scenarios.
To evaluate the noise immunity of different fault diagnosis models, Gaussian white noise was injected into the dataset, establishing an SNR of 40 dB. A comparison of the original and noise-contaminated current signals is presented in Figure 21.
Following noise injection, a marked degradation in signal quality is observable compared to the original state. This effectively emulates the signal corruption encountered in industrial environments characterized by high electromagnetic interference and low-resolution ADC systems.
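To give a rough sense of what a 40 dB SNR corresponds to in terms of ADC resolution, the widely used rule of thumb for an ideal converter driven by a full-scale sinusoid, SNR ≈ 6.02 N + 1.76 dB for N bits, can be inverted. This is only an approximate mapping under that idealized assumption: the injected noise here is Gaussian rather than uniform quantization noise, and the function names are hypothetical.

```python
def ideal_quantization_snr_db(bits):
    """SNR of an ideal N-bit quantizer for a full-scale sinusoidal input."""
    return 6.02 * bits + 1.76


def effective_bits(snr_db):
    """Effective number of bits implied by a given SNR under the same rule."""
    return (snr_db - 1.76) / 6.02


enob_at_40_db = effective_bits(40.0)          # about 6.35 bits
snr_at_15_bits = ideal_quantization_snr_db(15)  # about 92.06 dB
```

Under this approximation, a 40 dB SNR is comparable to a converter with roughly 6 effective bits, far below the 15-bit effective resolution of the acquisition system described in Section 2.7.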
For Case 3, Figure 22 illustrates the comparative noise robustness of all fault diagnosis models.
The waterfall chart indicates that for Case 3, the GAT, MGAT, and MGAT-MCNNBiLSTM models demonstrate superior accuracy compared to the LCNNBiLSTM, MCNNBiLSTM, and SCNNBiLSTM models. This trend is consistent with the results observed in Case 2.
To statistically validate this comparison, Figure 23 displays the statistical visualizations comparing the performance of different fault diagnosis models for the third case study (Case 3).
As shown in Figure 23, the MGAT-MCNNBiLSTM model stands out in Case 3 by achieving the highest median accuracy and a compact interquartile range, which together signify superior accuracy and robustness relative to the other models.
Table 7 quantitatively summarizes the comparative performance of six fault diagnosis models in Case 3.
Among the methods evaluated using noise-injected current data with spectral preprocessing, the proposed MGAT-MCNNBiLSTM model achieved superior diagnostic capability, delivering peak metrics of 89.4060 ± 1.4222% accuracy, 90.4124 ± 1.1895% PPV, and 89.2810 ± 1.4853% F1-Score. The graph-based approaches, GAT and MGAT, both demonstrated competitive results, outperforming all CNNBiLSTM variants. Among the CNNBiLSTM variants, MCNNBiLSTM attained the highest accuracy (85.1820 ± 1.7555%), surpassing both LCNNBiLSTM (79.3480 ± 1.9455%) and SCNNBiLSTM (82.9900 ± 1.8334%). Notably, under noise corruption, both GAT variants and the proposed MGAT-MCNNBiLSTM framework exhibited only marginal degradation in diagnostic accuracy (1.21–1.37 percentage points). In contrast, the CNNBiLSTM architectures showed substantially greater deterioration (2.70–2.97 percentage points), roughly twice the loss observed in the graph-enhanced methods. Collectively, these results confirm MGAT-MCNNBiLSTM as the best-performing fault diagnosis framework among those evaluated, achieving an absolute accuracy improvement of 1.37–10.06 percentage points relative to the benchmark methods.

4. Discussion

In Case 1, the experimental results demonstrate that MCNNBiLSTM delivers optimal performance in time-domain applications, while GAT, MGAT, and MGAT-MCNNBiLSTM exhibit suboptimal efficacy under these conditions. This performance gap is primarily attributable to the fundamental limitations of graph-based architectures in time-series fault diagnosis, rooted in an intrinsic incompatibility between graph-structured processing and sequential data physics. These limitations manifest primarily as: (1) Structural incompatibility: Graph-based discretization of continuous time-series compromises critical temporal continuity; and (2) Physics omission: Scale-specific temporal signatures induced by faults remain unmodeled. In contrast, the CNNBiLSTM framework achieves superior performance in time-domain fault diagnosis through its synergistic integration of convolutional and recurrent processing. The CNN component functions as an adaptive local pattern extractor, while the BiLSTM inherently preserves sequential ordering and captures temporal dependencies. This inherently physics-compatible architecture consequently enables robust fault diagnosis.
In Case 2, the results reveal that spectral preprocessing markedly enhances diagnostic performance. Specifically, it fundamentally improves efficacy by decoupling latent signatures, amplifying critical features, and standardizing diagnostic markers, thereby transcending the inherent limitations of time-domain analysis. The core advantages are the following:
(1) Intrinsic fault-frequency alignment
Mechanical faults such as bearing degradation, gear tooth breakage, and rotor imbalance induce vibration-modulated sidebands in motor current signals. Conventional time-domain analysis struggles to detect these subtle modulations, as they are often obscured by dominant fundamental components. In contrast, spectral transformation effectively separates constituent frequencies, converting masked sidebands into distinct spectral features and condensing distributed temporal anomalies into localized frequency-domain indicators.
(2) Attenuation of non-diagnostic interference
Current signal phase measurements demonstrate marked vulnerability to noise perturbations, particularly those originating from sampling position dependencies. Spectral representations reliably preserve amplitude characteristics, which constitute primary indicators for fault confirmation, while simultaneously attenuating phase sensitivity. This targeted retention of diagnostically robust features enhances accuracy through the elimination of interference-prone signal parameters.
(3) Enhanced compatibility with deep learning architectures
Spectrograms spatially localize fault signatures at specific frequency coordinates, enabling clear pattern isolation. Time-domain waveforms, by comparison, often exhibit fault-related anomalies as faint, globally distributed distortions with poor structural definition, complicating feature extraction. Moreover, current amplitudes in the time domain are strongly influenced by load fluctuations, which can overshadow subtle fault-induced changes. Spectral analysis redirects model focus toward structurally stable frequency-domain patterns, aligning with the inherent strength of deep learning in detecting localized, frequency-specific features. This synergy between spectral decomposition and hierarchical learning establishes a highly efficient framework for intelligent fault diagnosis.
In Case 3, noise injection tests were conducted to emulate industrial environments characterized by high electromagnetic interference, as well as to approximate the quantization errors inherent in low-cost, low-resolution ADC systems. Despite significant signal degradation, the proposed framework consistently outperformed all benchmark models under these adverse conditions. These results empirically support the framework's robustness against signal contamination and its practicality for deployment in both noise-heavy industrial settings and resource-limited embedded devices. Consequently, this resilience broadens its industrial applicability by lowering hardware precision requirements.

5. Conclusions

This study is motivated by the EBM principle, which prioritizes motor current as the critical bridge between the physics of mechanical faults and measurable electrical signals for IR diagnosis. It advances the theoretical framework of fault diagnosis in IRs by validating the critical necessity of fusing spatiotemporal and structural representations. The research demonstrates that accurate modeling of complex electromechanical systems requires the simultaneous extraction of hierarchical temporal-spectral features and dynamic structural dependencies. By successfully implementing this integration, the proposed MGAT-MCNNBiLSTM architecture provides highly accurate and robust fault diagnosis for IRs. Experimental validation confirms that our model is consistently superior to existing benchmarks, including LCNNBiLSTM, SCNNBiLSTM, MCNNBiLSTM, GAT, and MGAT, and demonstrates significantly enhanced reliability in fault detection across diverse operating conditions. Crucially, this study offers empirical support for evolving EBM theory, transitioning the focus from elementary signal analysis to the modeling of complex system fault diagnosis.
From a practical perspective, this research solidifies motor current analysis as the most viable and economically advantageous sensing modality for industrial deployment. Our approach capitalizes on the existing servo drive infrastructure, thereby eliminating the need for additional expensive sensors and associated retrofitting. Furthermore, the model’s compatibility with legacy, low-resolution data acquisition systems common in installed IRs, as well as its robustness in similar scenarios of data degradation caused by electromagnetic interference, substantially lowers the technical and financial barriers to implementation. Consequently, this synergy of high diagnostic accuracy and reliability, minimal upfront investment, and operational robustness enables a direct transition from costly reactive or scheduled maintenance to more efficient predictive strategies, significantly reducing unplanned downtime and extending IR service life.
The primary limitation of this work concerns the scope of the investigated fault scenarios, as the evaluation primarily focused on mechanical degradation reflected by positioning repeatability errors. In practical industrial applications, IRs are susceptible to a much wider spectrum of anomalies, including electrical component failures, sensor malfunctions, and complex compound faults. Capturing the distinct energy signatures of these varied anomalies is essential to validating the applicability of EBM theory across complex systems. Consequently, addressing this diversity necessitates a significant expansion of the fault knowledge base to ensure comprehensive diagnostic coverage. Therefore, future research will focus on enriching the dataset with diverse fault categories and refining the model architecture to enable the precise identification and classification of a broader range of IR faults.

Author Contributions

Conceptualization, J.W. and Y.Z.; methodology, J.W. and Y.Z.; software, J.W.; validation, J.W. and Y.Z.; formal analysis, J.W., Y.Z., B.G., and L.X.; investigation, J.W.; resources, J.W., B.G., and L.X.; data curation, J.W.; writing—original draft preparation, J.W. and Y.Z.; writing—review and editing, J.W., X.Z., and X.W.; visualization, J.W.; supervision, B.G. and L.X.; project administration, B.G., L.X., and H.W. funding acquisition, J.W., Y.Z., B.G., L.X., X.Z., and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Projects of Shenzhen University of Information Technology under Grant SZIIT2025KJ022, SZIIT2025KJ021 and SZIIT2025KJ057, and in part by the Major Research Plan of the National Natural Science Foundation of China under Grant 92467204.

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

ADC: analog-to-digital converter
BiLSTM: bidirectional long short-term memory
CNN: convolutional neural network
CNNLSTM: convolutional neural network with long short-term memory
CNNBiLSTM: convolutional neural network with bidirectional long short-term memory
DFT: discrete Fourier transform
DL: deep learning
EBM: energy-based maintenance
EMI: electromagnetic interference
GAT: graph attention network
GNN: graph neural network
IR: industrial robot
LSTM: long short-term memory
LCNN: large-scale convolutional neural network
LCNNBiLSTM: large-scale convolutional neural network with bidirectional long short-term memory
MCNNBiLSTM: multi-scale convolutional neural network with bidirectional long short-term memory
MCSA: motor current signature analysis
MGAT: multi-head graph attention network
OEM: original equipment manufacturer
RNN: recurrent neural network
SCNN: small-scale convolutional neural network
SCNNBiLSTM: small-scale convolutional neural network with bidirectional long short-term memory
SNR: signal-to-noise ratio

References

  1. IFR. World Robotics 2025. Frankfurt Am Main, Germany, September 2025. Available online: https://ifr.org/downloads/press_docs/PressConference2025_presentation.pdf (accessed on 28 October 2025).
  2. Bhuiyan, M.R.; Uddin, J. Deep transfer learning models for industrial fault diagnosis using vibration and acoustic sensors data: A review. Vibration 2023, 6, 218–238. [Google Scholar] [CrossRef]
  3. Kundu, P. Review of rotating machinery elements condition monitoring using acoustic emission signal. Expert Syst. Appl. 2024, 252, 124169. [Google Scholar] [CrossRef]
  4. Chauhan, S.; Vashishtha, G.; Kaur, P. An Effective Approach to Rotatory Fault Diagnosis Combining CEEMDAN and Feature-Level Integration. Algorithms 2025, 18, 644. [Google Scholar] [CrossRef]
  5. He, Y.; Chen, J.; Zhou, X.; Huang, S. In-situ fault diagnosis for the harmonic reducer of industrial robots via multi-scale mixed convolutional neural networks. J. Manuf. Syst. 2023, 66, 233–247. [Google Scholar] [CrossRef]
  6. Wu, Y.; Bai, Y.; Yang, S.; Li, C. Extracting random forest features with improved adaptive particle swarm optimization for industrial robot fault diagnosis. Measurement 2024, 229, 114451. [Google Scholar] [CrossRef]
  7. Liu, L.; Zhi, Z.; Yang, Y.; Shirmohammadi, S.; Liu, D. Harmonic reducer fault detection with acoustic emission. IEEE Trans. Instrum. Meas. 2023, 72, 1–12. [Google Scholar] [CrossRef]
  8. Hu, Q.; Zhang, Y.; Xie, X.; Su, W.; Li, Y.; Shan, L.; Yu, X. Optimal placement of vibration sensors for industrial robots based on Bayesian theory. Appl. Sci. 2022, 12, 6086. [Google Scholar] [CrossRef]
  9. Elahi, M.; Afolaranmi, S.O.; Mohammed, W.M.; Martinez Lastra, J.L. Energy-Based Prognostics for Gradual Loss of Conveyor Belt Tension in Discrete Manufacturing Systems. Energies 2022, 15, 4705. [Google Scholar] [CrossRef]
  10. Orošnjak, M.; Brkljač, N.; Šević, D.; Čavić, M.; Oros, D.; Penčić, M. From predictive to energy-based maintenance paradigm: Achieving cleaner production through functional-productiveness. J. Clean. Prod. 2023, 408, 137177. [Google Scholar] [CrossRef]
  11. Howell, M.T.; Alshakhshir, F. Energy Centered Maintenance: A Green Maintenance System; River Publishers: Aalborg, Denmark, 2020. [Google Scholar] [CrossRef]
  12. Zhou, Y.; Ma, Z.; Fu, L. A Review of Key Signal Processing Techniques for Structural Health Monitoring: Highlighting Non-Parametric Time-Frequency Analysis, Adaptive Decomposition, and Deconvolution. Algorithms 2025, 18, 318. [Google Scholar] [CrossRef]
  13. Raouf, I.; Lee, H.; Kim, H.S. Mechanical fault detection based on machine learning for robotic RV reducer using electrical current signature analysis: A data-driven approach. J. Comput. Des. Eng. 2022, 9, 417–433. [Google Scholar] [CrossRef]
  14. Lee, I.; Park, H.J.; Jang, J.-W.; Kim, C.-W.; Choi, J.-H. System-level fault diagnosis for an industrial wafer transfer robot with multi-component failure modes. Appl. Sci. 2023, 13, 10243. [Google Scholar] [CrossRef]
  15. Wang, Y.; Zhang, M.; Tang, X.; Peng, F.; Yan, R. A kmap optimized VMD-SVM model for milling chatter detection with an industrial robot. J. Intell. Manuf. 2022, 33, 1483–1502. [Google Scholar] [CrossRef]
  16. Peng, B.; Bi, Y.; Xue, B.; Zhang, M.; Wan, S. A Survey on Fault Diagnosis of Rolling Bearings. Algorithms 2022, 15, 347. [Google Scholar] [CrossRef]
  17. Hou, J.; Lu, X.; Zhong, Y.; He, W.; Zhao, D.; Zhou, F. A comprehensive review of mechanical fault diagnosis methods based on convolutional neural network. J. Vibroengineering 2024, 26, 44–65. [Google Scholar] [CrossRef]
  18. Khanam, R.; Hussain, M.; Hill, R.; Allen, P. A comprehensive review of convolutional neural networks for defect detection in industrial applications. IEEE Access 2024, 12, 94250–94295. [Google Scholar] [CrossRef]
  19. Lu, K.; Chen, C.; Wang, T.; Cheng, L.; Qin, J. Fault diagnosis of industrial robot based on dual-module attention convolutional neural network. Auton. Intell. Syst. 2022, 2, 12. [Google Scholar] [CrossRef]
  20. Pan, J.; Qu, L.; Peng, K. Sensor and actuator fault diagnosis for robot joint based on deep CNN. Entropy 2021, 23, 751. [Google Scholar] [CrossRef]
  21. Al-Selwi, S.M.; Hassan, M.F.; Abdulkadir, S.J.; Muneer, A.; Sumiea, E.H.; Alqushaibi, A.; Ragab, M.G. RNN-LSTM: From applications to modeling techniques and beyond—Systematic review. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102068. [Google Scholar] [CrossRef]
  22. Van Houdt, G.; Mosquera, C.; Nápoles, G. A review on the long short-term memory model. Artif. Intell. Rev. 2020, 53, 5929–5955. [Google Scholar] [CrossRef]
  23. Wang, X.; Liu, M.; Liu, C.; Ling, L.; Zhang, X. Data-driven and knowledge-based predictive maintenance method for industrial robots for the production stability of intelligent manufacturing. Expert Syst. Appl. 2023, 234, 121136. [Google Scholar] [CrossRef]
  24. Nacer, S.M.; Nadia, B.; Abdelghani, R.; Mohamed, B. A novel method for bearing fault diagnosis based on BiLSTM neural networks. Int. J. Adv. Manuf. Technol. 2023, 125, 1477–1492. [Google Scholar] [CrossRef]
  25. Wang, J.; Wang, X.; Wang, Y.; Sun, Y.; Sun, G. Intelligent joint actuator fault diagnosis for heavy-duty industrial robots. IEEE Sens. J. 2024, 24, 15292–15301. [Google Scholar] [CrossRef]
  26. Zhi, Z.; Liu, L.; Liu, D.; Hu, C. Fault detection of the harmonic reducer based on CNN-LSTM with a novel denoising algorithm. IEEE Sens. J. 2021, 22, 2572–2581. [Google Scholar] [CrossRef]
  27. Thanh, P.N.; Cho, M.-Y. Advanced AIoT for failure classification of industrial diesel generators based hybrid deep learning CNN-BiLSTM algorithm. Adv. Eng. Inform. 2024, 62, 102644. [Google Scholar] [CrossRef]
  28. Li, T.; Zhou, Z.; Li, S.; Sun, C.; Yan, R.; Chen, X. The emerging graph neural networks for intelligent fault diagnostics and prognostics: A guideline and a benchmark study. Mech. Syst. Signal Process. 2022, 168, 108653. [Google Scholar] [CrossRef]
  29. Liu, Z.; Zhou, J. Introduction to Graph Neural Networks; Springer Nature: Cham, Switzerland, 2022. [Google Scholar] [CrossRef]
  30. Xiao, L.; Yang, X.; Yang, X. A graph neural network-based bearing fault detection method. Sci. Rep. 2023, 13, 5286. [Google Scholar] [CrossRef] [PubMed]
  31. Vrahatis, A.G.; Lazaros, K.; Kotsiantis, S. Graph attention networks: A comprehensive review of methods and applications. Future Internet 2024, 16, 318. [Google Scholar] [CrossRef]
  32. Jiang, L.; Li, X.; Wu, L.; Li, Y. Bearing fault diagnosis method based on a multi-head graph attention network. Meas. Sci. Technol. 2022, 33, 075012. [Google Scholar] [CrossRef]
Figure 1. The proposed energy-metric-driven framework for IR fault diagnosis.
Figure 1. The proposed energy-metric-driven framework for IR fault diagnosis.
Algorithms 18 00779 g001
Figure 2. Comparative advantages of MCSA versus vibration/acoustic methods in fault diagnosis.
Figure 2. Comparative advantages of MCSA versus vibration/acoustic methods in fault diagnosis.
Algorithms 18 00779 g002
Figure 3. Positioning repeatability as a diagnostic tool for mechanical faults in IRs.
Figure 3. Positioning repeatability as a diagnostic tool for mechanical faults in IRs.
Algorithms 18 00779 g003
Figure 4. Schematic Diagram of IR End-Effector Positioning Repeatability Measurement.
Figure 4. Schematic Diagram of IR End-Effector Positioning Repeatability Measurement.
Algorithms 18 00779 g004
Figure 5. Data acquisition framework for IRs under different mechanical fault levels based on positioning repeatability thresholds.
Figure 5. Data acquisition framework for IRs under different mechanical fault levels based on positioning repeatability thresholds.
Algorithms 18 00779 g005
Figure 6. Information flow control in the LSTM architecture.
Figure 7. CNNBiLSTM-based fault diagnosis model for IRs.
Figure 8. MCNNBiLSTM-based fault diagnosis model for IRs.
Figure 9. Graph attention mechanism.
Figure 10. Multi-head graph attention mechanism.
Figure 11. GAT/MGAT-based fault diagnosis model for IRs.
Figure 12. MGAT-MCNNBiLSTM-based fault diagnosis model for IRs.
Figure 13. Motor current for varying fault severity levels in our IRs.
Figure 14. Frequency spectra of motor current for varying fault severity levels.
Figure 15. Output features of different methods in a two-dimensional feature space.
Figure 16. Waterfall chart of different fault diagnosis models for IRs (Case 1).
Figure 17. Statistical visualizations of different fault diagnosis models for IRs (Case 1).
Figure 18. The waterfall chart of different fault diagnosis models for IRs (Case 2).
Figure 19. The statistical visualizations of different fault diagnosis models for IRs (Case 2).
Figure 20. Performance comparison of diagnosis models with and without spectral preprocessing (Case 1 vs. Case 2).
Figure 21. Comparison of the original and noise-contaminated current signals. (a) original signal waveform; (b) signal waveform after noise injection (SNR = 40 dB).
Figure 22. The flowchart of the case study for different fault diagnosis models for IRs (Case 3).
Figure 23. The statistical visualizations of different fault diagnosis models for IRs (Case 3).
Table 1. Classification of fault severity levels for IRs by $\delta_m / \delta_s$ deviation thresholds.

| Health State | Fault Level | Performance Characterization | Description |
| --- | --- | --- | --- |
| Normal Operation | 0 | $\delta_m \le \delta_s$ | End-effector positioning repeatability complies with OEM specifications. |
| Incipient Anomaly | 1 | $\delta_m \in (\delta_s, 1.2\,\delta_s]$ | Minor deviations in end-effector repeatability are observed, exhibiting negligible impact on operational integrity. |
| Moderate Degradation | 2 | $\delta_m \in (1.2\,\delta_s, 1.5\,\delta_s]$ | Progressive degradation measurably reduces operational positioning accuracy. |
| Severe | 3 | $\delta_m \in (1.5\,\delta_s, 2.0\,\delta_s]$ | Severe repeatability anomalies may disrupt normal operations. |
| Critical Failure Risk | 4 | $\delta_m > 2.0\,\delta_s$ | Critical deviations cause substantial end-effector displacement, inducing functional impairment in robot tasks. |
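The thresholds in Table 1 amount to a lookup on the ratio of the two repeatability values. The following is a minimal Python sketch of that lookup, assuming that $\delta_m$ is the measured positioning repeatability and $\delta_s$ is the OEM-specified value; the function name `fault_level` is hypothetical and does not come from the paper.

```python
def fault_level(delta_m: float, delta_s: float) -> int:
    """Map a measured repeatability to a fault level (0-4) following Table 1.

    delta_m: measured positioning repeatability (assumed meaning).
    delta_s: OEM-specified repeatability (assumed meaning), must be > 0.
    """
    if delta_s <= 0:
        raise ValueError("delta_s must be positive")
    ratio = delta_m / delta_s
    # Upper bounds are inclusive, matching the half-open intervals (a, b].
    upper_bounds = (1.0, 1.2, 1.5, 2.0)
    for level, upper in enumerate(upper_bounds):
        if ratio <= upper:
            return level
    return 4  # ratio > 2.0: critical failure risk


if __name__ == "__main__":
    for measured in (0.8, 1.1, 1.4, 1.9, 2.5):
        print(measured, "->", fault_level(measured, 1.0))
```

Because the upper bounds are inclusive, a measurement exactly at 1.2 times the specification is still assigned level 1.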
Table 2. Sample information for fault diagnosis.

| Health State | Fault Level | Sample Size |
| --- | --- | --- |
| Normal Operation | 0 | 200 |
| Incipient Anomaly | 1 | 200 |
| Moderate Degradation | 2 | 200 |
| Severe Impairment | 3 | 200 |
| Critical Failure Risk | 4 | 200 |
Table 3. The network structure of the case study for different fault models.

Each cell gives the layer configuration followed by the output shape. Cells reading "(merged)" belong to the cell on their left, which spans both columns.

(a) LCNNBiLSTM, SCNNBiLSTM, and MCNNBiLSTM.

| Layer Type | LCNNBiLSTM | SCNNBiLSTM | MCNNBiLSTM (branch 1) | MCNNBiLSTM (branch 2) |
| --- | --- | --- | --- | --- |
| Input | Raw signal; Nb × 1 × 512 | Raw signal; Nb × 1 × 512 | Raw signal; Nb × 1 × 512 | Raw signal; Nb × 1 × 512 |
| Frequency Processing | FFT; Nb × 1 × 512 | FFT; Nb × 1 × 512 | FFT; Nb × 1 × 512 | FFT; Nb × 1 × 512 |
| Data Restructuring | Nb × 8 × 64 | Nb × 8 × 64 | Nb × 8 × 64 | Nb × 8 × 64 |
| Conv(1) | in = 8, out = 64, kernel = 15, stride = 1; Nb × 64 × 64 | in = 8, out = 64, kernel = 3, stride = 1; Nb × 64 × 64 | in = 8, out = 64, kernel = 15, stride = 1; Nb × 64 × 64 | in = 8, out = 64, kernel = 3, stride = 1; Nb × 64 × 64 |
| BatchNorm(1) | Nb × 64 × 64 | Nb × 64 × 64 | Nb × 64 × 64 | Nb × 64 × 64 |
| ReLU(1) | Nb × 64 × 64 | Nb × 64 × 64 | Nb × 64 × 64 | Nb × 64 × 64 |
| Conv(2) | in = 64, out = 30, kernel = 7, stride = 1; Nb × 30 × 64 | in = 64, out = 40, kernel = 3, stride = 1; Nb × 40 × 64 | in = 64, out = 30, kernel = 7, stride = 1; Nb × 30 × 64 | in = 64, out = 40, kernel = 3, stride = 1; Nb × 40 × 64 |
| MaxPool(2) | kernel = 2, stride = 2; Nb × 30 × 32 | kernel = 2, stride = 2; Nb × 40 × 32 | kernel = 2, stride = 2; Nb × 30 × 32 | kernel = 2, stride = 2; Nb × 40 × 32 |
| Conv(3) | / | in = 40, out = 30, kernel = 3, stride = 1; Nb × 30 × 32 | / | in = 40, out = 30, kernel = 3, stride = 1; Nb × 30 × 32 |
| BatchNorm(3) | / | Nb × 30 × 32 | / | Nb × 30 × 32 |
| ReLU(3) | / | Nb × 30 × 32 | / | Nb × 30 × 32 |
| Conv(4) | / | in = 30, out = 30, kernel = 3, stride = 1; Nb × 30 × 32 | / | in = 30, out = 30, kernel = 3, stride = 1; Nb × 30 × 32 |
| MaxPool(4) | / | kernel = 2, stride = 2; Nb × 30 × 16 | / | kernel = 2, stride = 2; Nb × 30 × 16 |
| Dimension Permutation | Nb × 32 × 30 | Nb × 16 × 30 | Nb × 32 × 30 | Nb × 16 × 30 |
| Bidirectional LSTM | layers = 2, hidden = 30, dropout = 0.5, bidirectional = True; Nb × 32 × 60 | layers = 2, hidden = 30, dropout = 0.5, bidirectional = True; Nb × 16 × 60 | layers = 2, hidden = 30, dropout = 0.5, bidirectional = True; Nb × 32 × 60 | layers = 2, hidden = 30, dropout = 0.5, bidirectional = True; Nb × 16 × 60 |
| Last Time Step Output | Nb × 60 | Nb × 60 | Nb × 60 | Nb × 60 |
| Feature Concatenation | / | / | Nb × 120 | (merged) |
| Linear(1) | in = 60, out = 256; Nb × 256 | in = 60, out = 256; Nb × 256 | in = 120, out = 256; Nb × 256 | (merged) |
| Linear(2) | in = 256, out = NC; Nb × NC | in = 256, out = NC; Nb × NC | in = 256, out = NC; Nb × NC | (merged) |
| Output | Softmax; Nb × NC | Softmax; Nb × NC | Softmax; Nb × NC | (merged) |

(b) GAT/MGAT and MGAT-MCNNBiLSTM.

| Layer Type | GAT/MGAT | MGAT-MCNNBiLSTM (MCNNBiLSTM branches) | MGAT-MCNNBiLSTM (MGAT branch) |
| --- | --- | --- | --- |
| Input | Raw signal; Nb × 1 × 512 | Raw signal; Nb × 1 × 512 (both branches) | Raw signal; Nb × 1 × 512 |
| Frequency Processing | FFT; Nb × 1 × 512 | FFT; Nb × 1 × 512 (both branches) | FFT; Nb × 1 × 512 |
| Data Restructuring | Nb × 512 | Nb × 8 × 64 (both branches) | Nb × 512 |
| MCNNBiLSTM Conv(1) | / | Branch 1: in = 8, out = 64, kernel = 15, stride = 1; Nb × 64 × 64. Branch 2: in = 8, out = 64, kernel = 3, stride = 1; Nb × 64 × 64 | / |
| ~ | / | ~ | / |
| MCNNBiLSTM Linear(1) | / | in = 120, out = 256; Nb × 256 | / |
| Edge Index | Graph connectivity, nodes = 8 | / | Graph connectivity, nodes = 8 |
| GAT/MGAT(1), Head Fusion (AVG) | in = nodes × 64, heads = H (GAT/MGAT: H = 1/3), out = nodes × 64; (Nb × nodes) × 64 | / | in = nodes × 64, heads = 3, out = nodes × 64; (Nb × nodes) × 64 |
| BatchNorm(1) | (Nb × nodes) × 64 | / | (Nb × nodes) × 64 |
| ReLU(1) | (Nb × nodes) × 64 | / | (Nb × nodes) × 64 |
| GAT/MGAT(2), Head Fusion (AVG) | in = nodes × 64, heads = H (GAT/MGAT: H = 1/3), out = nodes × 64; (Nb × nodes) × 64 | / | in = nodes × 64, heads = 3, out = nodes × 64; (Nb × nodes) × 64 |
| BatchNorm(2) | (Nb × nodes) × 64 | / | (Nb × nodes) × 64 |
| ReLU(2) | (Nb × nodes) × 64 | / | (Nb × nodes) × 64 |
| Linear(1) | in = nodes × 64, out = 256; Nb × 256 | / | in = nodes × 64, out = 256; Nb × 256 |
| Feature Concatenation | / | Nb × 512 | (merged) |
| Linear(2) | in = 256, out = NC; Nb × NC | in = 512, out = 256; Nb × 256 | (merged) |
| Linear(3) | / | in = 256, out = NC; Nb × NC | (merged) |
| Output | Softmax; Nb × NC | Softmax; Nb × NC | (merged) |

Different colors represent parameters from different models. For MGAT-MCNNBiLSTM, the colors indicate that it adopts the same structural parameters as the corresponding models represented by those colors.
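The output shapes listed in Table 3 for the two convolutional branches can be checked with the standard 1D convolution and pooling length formulas. The sketch below is only a consistency check under stated assumptions: Table 3 does not give the padding, so "same" padding of (kernel − 1) / 2 is assumed for the convolutions (this is what keeps the length at 64 for stride 1), pooling is assumed unpadded, and all function and variable names are hypothetical.

```python
def conv1d_len(length: int, kernel: int, stride: int = 1) -> int:
    """Output length of a 1D convolution with assumed 'same' padding."""
    padding = (kernel - 1) // 2  # assumption: not stated in Table 3
    return (length + 2 * padding - kernel) // stride + 1


def pool1d_len(length: int, kernel: int = 2, stride: int = 2) -> int:
    """Output length of an unpadded 1D max-pooling layer."""
    return (length - kernel) // stride + 1


def trace(plan, channels: int = 8, length: int = 64):
    """Follow (channels, length) through a list of conv/pool steps."""
    for step in plan:
        if step[0] == "conv":
            _, channels, kernel = step
            length = conv1d_len(length, kernel)
        else:  # "pool"
            length = pool1d_len(length)
    return channels, length


# Layer plans copied from Table 3: ("conv", out_channels, kernel) or ("pool",).
LARGE_KERNEL = [("conv", 64, 15), ("conv", 30, 7), ("pool",)]
SMALL_KERNEL = [("conv", 64, 3), ("conv", 40, 3), ("pool",),
                ("conv", 30, 3), ("conv", 30, 3), ("pool",)]

HIDDEN = 30                            # BiLSTM hidden size from Table 3
large_shape = trace(LARGE_KERNEL)      # (30, 32), i.e. Nb x 30 x 32
small_shape = trace(SMALL_KERNEL)      # (30, 16), i.e. Nb x 30 x 16
bilstm_features = 2 * HIDDEN           # both directions: 60
fused_features = 2 * bilstm_features   # two branches concatenated: 120

if __name__ == "__main__":
    print(large_shape, small_shape, bilstm_features, fused_features)
```

Under these assumptions the traced shapes agree with the table: after the dimension permutation, the sequence lengths 32 and 16 become the BiLSTM time axis, and concatenating the two 60-dimensional last-step outputs gives the 120 inputs of Linear(1).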
Table 4. Experimental Design for Performance Evaluation of Fault Diagnosis Models.

| Case | Evaluation Objective | Spectral Preprocessing | Noise Injection | Data Configuration |
| --- | --- | --- | --- | --- |
| 1 | To assess the performance of different fault diagnosis models using raw time-domain signals | Not applied | Not applied | Raw time-domain industrial robot current signals as input |
| 2 | To evaluate diagnostic capability by employing spectral processing | Applied | Not applied | Current signals after spectral processing as input (based on Case 1) |
| 3 | To examine the robustness against signal degradation or validate the low-cost, low-resolution ADC sampling effects | Applied | Applied | Current signals with noise addition followed by spectral processing as input (based on Case 2) |
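Case 3 adds noise to the current signal before spectral processing, and Figure 21 reports an SNR of 40 dB. The sketch below shows one common way to inject noise at a target SNR. It assumes zero-mean white Gaussian noise, which this section does not specify, and it uses a synthetic sine wave rather than robot data; the function names `add_noise` and `measured_snr_db` are hypothetical.

```python
import math
import random


def add_noise(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise scaled to a target SNR (dB)."""
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) of a noisy signal relative to its clean version."""
    p_signal = sum(c * c for c in clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(p_signal / p_noise)


# Synthetic 512-sample test signal (illustrative only, not measured current).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * n / 512) for n in range(512)]
noisy = add_noise(clean, 40.0, rng)
achieved = measured_snr_db(clean, noisy)

if __name__ == "__main__":
    print(f"achieved SNR: {achieved:.2f} dB")
```

The achieved SNR differs slightly from the target because the noise is a finite random sample. In the Case 3 configuration, the noisy signal would then go through the same spectral processing as in Case 2.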
Table 5. Performance metrics of different fault diagnosis models (Case 1).

| Method | Accuracy (%) | PPV (%) | F1-Score (%) |
| --- | --- | --- | --- |
| LCNNBiLSTM | 68.5180 ± 2.6241 | 70.6578 ± 2.6718 | 67.8569 ± 2.8022 |
| SCNNBiLSTM | 67.6320 ± 2.5937 | 69.7539 ± 2.6196 | 66.9971 ± 2.7752 |
| MCNNBiLSTM | 70.2240 ± 2.3004 | 71.8084 ± 2.1795 | 69.7472 ± 2.4187 |
| GAT | 45.5520 ± 2.7559 | 47.6965 ± 2.6356 | 45.3136 ± 2.7305 |
| MGAT | 47.0560 ± 2.6621 | 49.2839 ± 2.5812 | 46.7199 ± 2.7305 |
| MGAT-MCNNBiLSTM | 62.6960 ± 2.5217 | 63.8109 ± 2.7385 | 62.0851 ± 2.6315 |
Table 6. Performance metrics of different fault diagnosis models (Case 2).

| Method | Accuracy (%) | PPV (%) | F1-Score (%) |
| --- | --- | --- | --- |
| LCNNBiLSTM | 82.2060 ± 1.8032 | 83.1439 ± 1.7160 | 82.0412 ± 1.8425 |
| SCNNBiLSTM | 85.9640 ± 1.7009 | 86.7810 ± 1.5945 | 85.8558 ± 1.7200 |
| MCNNBiLSTM | 87.8800 ± 1.6407 | 88.6811 ± 1.3889 | 87.7626 ± 1.6858 |
| GAT | 88.8440 ± 1.6686 | 90.0977 ± 1.5434 | 88.7108 ± 1.7327 |
| MGAT | 89.2440 ± 1.5511 | 90.3630 ± 1.4105 | 89.1234 ± 1.6106 |
| MGAT-MCNNBiLSTM | 90.7560 ± 1.3311 | 91.6626 ± 1.1924 | 90.6736 ± 1.3685 |
Table 7. Performance metrics of various fault diagnosis models (Case 3).

| Method | Accuracy (%) | PPV (%) | F1-Score (%) |
| --- | --- | --- | --- |
| LCNNBiLSTM | 79.3480 ± 1.9455 | 80.4062 ± 1.8484 | 79.1439 ± 1.9672 |
| SCNNBiLSTM | 82.9900 ± 1.8334 | 84.4141 ± 1.5325 | 82.6989 ± 1.9563 |
| MCNNBiLSTM | 85.1820 ± 1.7555 | 86.0929 ± 1.4537 | 84.9786 ± 1.8964 |
| GAT | 87.4720 ± 1.7765 | 88.6734 ± 1.5621 | 87.3011 ± 1.7925 |
| MGAT | 88.0340 ± 1.6518 | 89.2909 ± 1.4580 | 87.8469 ± 1.6998 |
| MGAT-MCNNBiLSTM | 89.4060 ± 1.4222 | 90.4124 ± 1.1895 | 89.2810 ± 1.4853 |
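The mean accuracies in Tables 5 to 7 can be compared directly to see how much each model gains from spectral preprocessing (Case 1 to Case 2) and how much it loses under noise injection (Case 2 to Case 3). The short script below performs only this arithmetic on the reported means; it ignores the reported standard deviations, and the variable names are hypothetical.

```python
# Mean accuracy (%) per model, copied from Tables 5, 6, and 7: (Case 1, Case 2, Case 3).
ACCURACY = {
    "LCNNBiLSTM": (68.5180, 82.2060, 79.3480),
    "SCNNBiLSTM": (67.6320, 85.9640, 82.9900),
    "MCNNBiLSTM": (70.2240, 87.8800, 85.1820),
    "GAT": (45.5520, 88.8440, 87.4720),
    "MGAT": (47.0560, 89.2440, 88.0340),
    "MGAT-MCNNBiLSTM": (62.6960, 90.7560, 89.4060),
}

# Percentage-point differences, rounded to three decimals.
summary = {
    model: (round(case2 - case1, 3), round(case2 - case3, 3))
    for model, (case1, case2, case3) in ACCURACY.items()
}

if __name__ == "__main__":
    print(f"{'Model':<18}{'Gain C1->C2':>14}{'Drop C2->C3':>14}")
    for model, (gain, drop) in summary.items():
        print(f"{model:<18}{gain:>14.3f}{drop:>14.3f}")
```

For example, MGAT-MCNNBiLSTM gains 28.060 percentage points from spectral preprocessing and loses 1.350 points under noise injection.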
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, J.; Zhang, Y.; Gao, B.; Xia, L.; Zhu, X.; Wang, H.; Wan, X. A Hybrid Deep Learning Framework for Enhanced Fault Diagnosis in Industrial Robots. Algorithms 2025, 18, 779. https://doi.org/10.3390/a18120779
