1. Introduction
As precise components that connect a rotating shaft to its housing, rolling bearings effectively reduce friction losses during equipment operation and are therefore widely used in various rotating mechanical systems [1,2,3]. In modern industry, mechanical equipment operates under high loads, long running times, and frequent working-condition transitions. As operating time accumulates under such asymmetrical conditions, bearing performance gradually deteriorates and faults inevitably occur. Once a malfunction occurs, the operational status of the entire mechanical system is affected, degrading performance and potentially causing serious production safety accidents, with significant economic losses and casualties [4,5,6]. Consequently, exploring diagnostic techniques for rolling bearings to enhance the safety, stability, and dependable functioning of machinery holds substantial importance in engineering applications [7,8].
The evolution of computational technologies has markedly influenced numerous sectors, leading to an increased emphasis on sophisticated diagnostic methods for predicting and monitoring system integrity. In particular, data-driven techniques such as deep learning have attracted considerable attention due to their simplified optimization process and lesser dependence on specialized expertise. Common examples include Restricted Boltzmann Machines (RBM) [9,10], Convolutional Neural Networks (CNN) [11,12], Autoencoders [13,14], Deep Belief Networks (DBN) [15,16], Recurrent Neural Networks (RNN) [17,18], and Residual Networks (ResNet) [19,20].
Although deep learning has driven significant advancements in fault diagnosis through its robust feature-extraction abilities, two primary obstacles remain in practical engineering environments. First, deep learning approaches usually presume that training and testing data originate from the same statistical distribution; this presumption can significantly undermine diagnostic performance when data distributions differ, requiring models to be re-trained for each diagnostic scenario. Second, achieving robust generalization requires supervised learning on a significant amount of labeled data, yet obtaining a sufficiently large set of labeled fault examples in industrial contexts is difficult, which hinders the effective application of deep learning methodologies in real-world engineering scenarios. Among deep learning models, convolutional neural networks are particularly esteemed for their outstanding performance in image processing and temporal signal analysis. Generally, there are two main strategies to improve the adaptability of CNN-oriented models across domains [21]. One utilizes knowledge-transfer techniques [22,23,24]. The other enhances the CNN architecture itself to detect fault-related features at multiple scales [25,26,27], for example by incorporating Inception blocks, capsule-like neurons, residual connections, and attention mechanisms.
Nonetheless, two notable drawbacks are linked to the aforementioned methodologies. First, in transfer learning, the discrepancy between source and target domains is frequently used as a loss function to identify fault-related features that remain invariant across domains; this requires prior knowledge of the target domain during training, together with an extensive collection of data from diverse operational environments, and obtaining ample data covering all possible operating circumstances is impractical in real settings. This underscores the importance of deep learning systems that can adapt to unfamiliar domains without depending on prior insights about them. Second, the primary limitations of CNN models arise from their localized convolution operations and the restricted scope of long-range dependencies imposed by architectural depth [21].
Unsupervised domain adaptation (UDA) effectively addresses these issues by enabling the knowledge gained from a labeled source domain to be applied to an unlabeled target domain [28], through the extraction of features that are both domain-invariant and discriminative [29]. There is an increasing emphasis on employing UDA approaches to address covariate shift, which has led to notable improvements in fault diagnosis across various machinery domains. UDA-based fault diagnosis approaches can be divided into four main categories: network-centric methods, mapping-focused strategies, instance-based techniques, and adversarial models. In these UDA-driven approaches, three fundamental kinds of information link the labeled source domain to the unlabeled target domain: the class labels, the domain labels, and the structure of the data. In the UDA setting, samples from both domains share the same class labels, which facilitates mapping source and target examples into a common feature representation. In adversarial domain adaptation, the domain labels are used to train a classifier that differentiates the two domains, driving the feature extractor to better reflect the joint distribution across both domains. The structure of the data is equally essential: exploiting geometric arrangements and data-distribution information allows the differences between domains to be substantially reduced while the essential traits of the original data spaces remain intact. Together, these three types of information ease distribution discrepancies and enhance adaptability between domains.
To achieve unsupervised domain adaptation and enhance cross-domain diagnostic capabilities, a novel approach based on GraphKAN is developed. Graph neural networks can readily handle complex unstructured data, while KAN, a new neural network architecture based on the Kolmogorov–Arnold representation theorem, realizes more flexible activation patterns by replacing the linear weights of traditional multilayer perceptrons (MLPs) with learnable univariate spline functions [30]. This approach aligns domain distributions by effectively combining class-label modeling, domain identification, and the underlying data structure within a unified deep learning framework. A classifier models the class labels, while a domain discriminator determines the domain label of each instance. First, a layer combining a KAN with a CNN extracts features from the raw input data. Next, a graph generation layer (GGL) derives a structural representation from the CNN-KAN features, creating instance graphs from the interconnections between samples. Once the instance graphs are constructed, a GraphKAN framework captures the structural information conveyed by the weighted links within the graph. To measure structural differences between the source and target domains, the correlation alignment (CORAL) loss is applied. By jointly exploiting these three kinds of information, features that are domain-invariant and beneficial for classification can be derived, enabling fault diagnosis across different domains.
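The CORAL loss mentioned above penalizes the distance between the second-order statistics (covariances) of source and target features. A minimal numpy sketch, with hypothetical feature shapes, illustrates the idea:

```python
import numpy as np

def coral_loss(source, target):
    """Correlation alignment (CORAL) loss: squared Frobenius distance
    between the covariance matrices of source and target features.
    `source`, `target`: (n_samples, d) feature matrices (illustrative shapes)."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False)   # (d, d) source covariance
    ct = np.cov(target, rowvar=False)   # (d, d) target covariance
    return np.sum((cs - ct) ** 2) / (4 * d * d)

rng = np.random.default_rng(0)
src = rng.normal(size=(64, 8))
tgt = rng.normal(size=(64, 8)) * 2.0    # deliberately different second-order statistics
print(coral_loss(src, src))             # 0.0 for identical feature sets
print(coral_loss(src, tgt))             # positive when covariances differ
```

In a full training pipeline this term would be added to the classification and adversarial losses, so that minimizing it pulls the two domains' feature distributions together.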
The key contributions of this article are as follows:
- (a)
A framework known as CNN-KAN has been introduced to facilitate the extraction of features from unprocessed data.
- (b)
A framework based on GraphKAN is developed to facilitate fault diagnosis in fluctuating conditions, which includes relevant loss functions and formulas for parameter updates.
- (c)
Comprehensive adaptation across domains is accomplished by integrating the three distinct forms of information within a cohesive deep learning framework.
The remainder of this paper is structured as follows. Section 2 offers a brief summary of the relevant theoretical background;
Section 3 presents a comprehensive examination of the proposed framework;
Section 4 addresses the evaluation, real-world application, and analysis of the proposed approach; Section 5 concludes the paper.
2. Theoretical Background
2.1. Kolmogorov–Arnold Networks
Inspired by the Kolmogorov–Arnold representation theorem, Liu et al. proposed Kolmogorov–Arnold Networks (KANs) as a significant alternative to traditional Multi-Layer Perceptrons (MLPs) [31]. In contrast to MLPs, which apply fixed activation functions at their processing units ('neurons'), KANs place learnable activation functions on the connections ('weights'), as shown in Figure 1. While MLPs rely on the universal approximation theorem, KANs take a distinctly different route.
The inspiration behind KANs stems from the Kolmogorov–Arnold representation theorem:

f(x) = f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q \left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),

where \phi_{q,p} \colon [0,1] \to \mathbb{R} are univariate functions, n represents the number of input variables, and \Phi_q \colon \mathbb{R} \to \mathbb{R} is a real continuous function.
KANs eliminate linear weight parameters entirely, substituting each weight parameter with a univariate function defined through a spline. To realize arbitrary depth within a KAN, a simple method is to stack KAN layers in the same way MLP layers are stacked:

\mathrm{KAN}(x) = (\Phi_{k-1} \circ \Phi_{k-2} \circ \cdots \circ \Phi_0)(x),

where k represents the number of KAN layers and each \Phi_l is a matrix of learnable univariate spline functions.
To summarize, KANs build on this theoretical premise by substituting static activation functions with adaptable ones placed on the weights; every weight is characterized as a spline function that can be adjusted through learning. This design enables KANs to identify intricate nonlinear correlations by directly refining univariate functions, and their adaptive modeling of relationships in data yields better prediction and generalization than MLPs. In addition, a KAN decomposes complex functions into simpler univariate components, enabling efficient processing of large datasets and making it well suited to information-intensive tasks.
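The edge-wise learnable functions described above can be sketched in a few lines. The following is a simplified illustration, not the actual KAN implementation: it uses piecewise-linear interpolation on a fixed grid in place of B-splines, and the grid range, knot count, and class names are all hypothetical:

```python
import numpy as np

class SplineEdge:
    """One learnable univariate function phi(x), represented as a
    piecewise-linear spline on a fixed knot grid (a crude stand-in for
    the B-splines used in KANs; grid range and size are illustrative)."""
    def __init__(self, grid_min=-2.0, grid_max=2.0, n_knots=11, rng=None):
        rng = rng or np.random.default_rng(0)
        self.knots = np.linspace(grid_min, grid_max, n_knots)
        self.values = rng.normal(scale=0.1, size=n_knots)  # the learnable parameters

    def __call__(self, x):
        return np.interp(x, self.knots, self.values)

class KANLayer:
    """Maps n_in inputs to n_out outputs. Each (input, output) pair has
    its own spline, and each output sums its incoming edge functions:
    y_j = sum_i phi_{j,i}(x_i) -- activations live on the weights."""
    def __init__(self, n_in, n_out, rng=None):
        rng = rng or np.random.default_rng(1)
        self.edges = [[SplineEdge(rng=rng) for _ in range(n_in)]
                      for _ in range(n_out)]

    def __call__(self, x):
        return np.array([sum(phi(x[i]) for i, phi in enumerate(row))
                         for row in self.edges])

layer = KANLayer(n_in=3, n_out=2)
y = layer(np.array([0.5, -1.0, 0.2]))
print(y.shape)  # (2,)
```

Training would adjust each spline's knot values by gradient descent, which is what gives the layer its flexible, per-edge activation pattern.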
2.2. Graph Convolutional Networks
Graph-based convolutional models are neural network frameworks designed explicitly for the analysis of data represented as graphs. This type of architecture excels at revealing complex connections between nodes and understanding the inherent properties of the graph via convolutional operations.
For GCN models, the goal is to learn a function of signals/features on a graph G = (V, E), which takes the following as input:
- (a)
A compact representation of features xi for every node i, compiled within an N × D feature matrix X (N: number of nodes, D: number of input features).
- (b)
An illustration of the graph’s layout in a matrix representation, commonly expressed as an adjacency matrix A.
This process generates an output at the node level, represented as an N × F feature matrix, with F indicating the number of features attributed to each node. Furthermore, to derive outputs at the graph level, a specific pooling operation must be implemented.
Each layer within such a network can be expressed as a nonlinear function:

H^{(l+1)} = f(H^{(l)}, A),

with H^{(0)} = X and H^{(L)} = Z (or z for graph-level outputs), where L is the number of layers. Specific models then differ only in how f(\cdot, \cdot) is chosen and parameterized.
Inspired by a first-order approximation of localized spectral filters on graphs, the work of Kipf et al. led to the development of a multi-layer Graph Convolutional Network defined by the following layer-wise propagation rule [32]:

H^{(l+1)} = \sigma \left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right),

with \tilde{A} = A + I, where I is the identity matrix, \tilde{D} is the diagonal node degree matrix of \tilde{A}, W^{(l)} is the trainable weight matrix of layer l, and \sigma is a nonlinear activation such as ReLU.
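The propagation rule above can be checked with a small numpy sketch on a toy graph (the graph, features, and weights below are arbitrary illustrations):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step following Kipf & Welling's rule:
    H' = ReLU(D~^{-1/2} (A + I) D~^{-1/2} H W).
    A: (N, N) adjacency matrix, H: (N, D) node features, W: (D, F) weights."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    deg = A_tilde.sum(axis=1)                   # degrees of the self-looped graph
    D_inv_sqrt = np.diag(deg ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(0.0, A_hat @ H @ W)       # ReLU nonlinearity

# Tiny 3-node path graph, 2 input features, 2 output features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3, 2)
W = np.eye(2)
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 2): one F-dimensional vector per node
```

Note how each output row mixes a node's own features with those of its neighbors, weighted by the normalized adjacency, which is the "localized filter" behavior the rule was derived from.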
2.3. GraphKAN Layer
The architecture of graph neural networks (GNNs) allows them to accommodate the features of graphs through a process known as message passing. GNNs are adept at managing graph-structured data, which comprises node features x_v and edge relationships e_{vw}. Typically, the characteristics of nodes are given by node attributes, while edge attributes describe the relationships between nodes via an adjacency matrix representation.
The message-passing framework for node representation can essentially be divided into two distinct phases [33]:
(a)
Aggregating information from the neighbors. The message function aggregates the features around the target node, including the target node's own features h_v^t, the features of its neighboring nodes h_w^t, and the edge features e_{vw} connecting it to those neighbors. This aggregation forms a message vector m_v^{t+1} that is then passed to the target node:

m_v^{t+1} = \sum_{w \in N(v)} M_t(h_v^t, h_w^t, e_{vw}),

where m_v^{t+1} is the information received by node v in the next layer t + 1, M_t is the message function, h_v^t represents the node features in the current layer, N(v) represents the set of neighboring nodes for a node v, h_w^t represents the features of the neighboring nodes in the current layer, and e_{vw} represents the edge features from node v to node w.
(b)
Extracting the node representation. An update function merges the node's features in the current layer with the message received via the message function to produce the node features of the next layer:

h_v^{t+1} = U_t(h_v^t, m_v^{t+1}),

where U_t is the node update function, which accepts the current state of the node along with the incoming message and produces the updated state of the node.
Most approaches employ a Multi-Layer Perceptron (MLP) for feature extraction in Phase (b), typically combined with an activation function to capture nonlinearity. However, fixed activation functions such as ReLU can restrict representational capacity, which may hinder the learning of intricate node characteristics. The node-representation phase is therefore enhanced by substituting the MLP with a KAN for U_t, so that the new representation is extracted as:

h_v^{t+1} = \mathrm{KAN}(h_v^t, m_v^{t+1}).

This study presents a GraphKAN layer that utilizes a GCN as its foundation. The GraphKAN layer built on a GCN is defined as follows:
(a) Message function M_t:

m_v^{t+1} = \sum_{w \in N(v)} \frac{h_w^t}{\sqrt{\deg(v) \, \deg(w)}},

where deg computes the degree of a node.
(b) Node representation extraction function U_t:

h_v^{t+1} = \Phi_t(m_v^{t+1}),

where the B-spline function is utilized as \Phi_t.
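Putting the two phases together, a GCN-based GraphKAN step can be sketched as follows. This is a simplified illustration under stated assumptions: the degree-normalized aggregation stands in for M_t, and a piecewise-linear function applied elementwise stands in for the learnable B-spline Φ_t; the graph and features are arbitrary:

```python
import numpy as np

def graphkan_layer(A, H, phi):
    """One GraphKAN step on a GCN backbone:
    (a) aggregate neighbors with m_v = sum_{w in N(v)} h_w / sqrt(deg(v) deg(w)),
    (b) extract the new representation with a learnable univariate
        function phi (a stand-in for the B-spline Phi_t) instead of MLP + ReLU."""
    deg = A.sum(axis=1)
    norm = 1.0 / np.sqrt(np.outer(deg, deg))  # 1/sqrt(deg(v) deg(w)) per edge
    M = (A * norm) @ H                        # messages m_v^{t+1}
    return phi(M)                             # KAN-style update

# A crude learnable-spline stand-in: piecewise-linear function on fixed knots.
knots = np.linspace(-2, 2, 9)
vals = np.tanh(knots)                         # illustrative "learned" values
phi = lambda x: np.interp(x, knots, vals)

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
H = np.ones((3, 2))
out = graphkan_layer(A, H, phi)
print(out.shape)  # (3, 2)
```

In the actual model, phi would be a trainable B-spline whose knot coefficients are optimized jointly with the rest of the network.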
5. Conclusions
This study introduces an innovative method leveraging GraphKAN for diagnosing faults in rolling bearings amid varying operational scenarios. A framework that integrates CNN and KAN is developed to extract features from the original signals, accompanied by the introduction of a Graph Generation Layer (GGL) that formulates instance graphs using the CNN-KAN features. This design effectively merges the robust feature extraction efficiency of a CNN with the comprehensive generalization abilities of a GCN, addressing the challenges posed by domain shifts in varying operational scenarios. In the proposed framework, KAN layers substitute the conventional MLP layers within the GCN architecture, facilitating a more effective synthesis of both local and global information found in the graphs, which enhances the overall functionality of the approach. Results from experiments conducted on two bearing datasets indicate that the methodology proposed has achieved superior accuracy in fault diagnosis and enhanced stability in varying operational conditions when compared to alternative techniques.
Nonetheless, the efficacy of the proposed method could be further enhanced through the following considerations: (a) the existing classifier and discriminator use MLPs, which could be replaced with KANs; (b) a broader comparison with other domain adaptation techniques could be conducted.