1. Introduction
Bearing fault diagnosis is critical for the smooth operation of rotating machinery, which plays an essential role in industries ranging from power generation to manufacturing. Bearings are fundamental components that facilitate rotation and support mechanical systems. In recent years, many scholars have investigated how to extract diagnostic information from vibration signals [1,2] and have proposed numerous advanced methods [3], among which intelligent fault diagnosis based on deep learning has garnered particular attention [4,5].
Zhang et al. [6] first designed a deep Convolutional Neural Network (CNN) with wide first-layer kernels (WDCNN) for fault diagnosis under the source domain setting, where both the training and testing data were collected from the same set of bearing individuals. The architecture was tailored to enhance the network's ability to capture long-range dependencies and extract detailed features from vibration signals. They subsequently proposed a convolutional neural network with training interference (TICNN) [7] to enhance performance under noisy conditions and different working loads, which better meets the needs of practical industrial scenarios. Lin et al. [8] proposed a novel meta-learning framework, generalized MAML, to analyze acceleration and acoustic signals obtained from bearings under different working conditions. Xu et al. [9] introduced a zero-shot fault semantics learning model for diagnosing compound faults. Additionally, Chen et al. [10] developed a dual adversarial learning-based multi-domain adaptation network for collaborative fault diagnosis across both bearings and gearboxes, with comparative results confirming its superior performance over other methods. Most current intelligent fault diagnosis models are trained and tested on the same individual. However, in practical industrial scenarios, the test objects are usually new components with potential individual differences [11], including manufacturing differences, production inconsistencies, assembly issues, and variations in fault manifestation. Consequently, differences in data distribution between the test and training individuals may hinder the model's ability to classify faults accurately. Recently, scholars have begun to focus on more practical cross-domain issues beyond cross-condition scenarios, such as cross-machine problems [12,13], which are more closely aligned with real-world industrial applications. Cross-individual scenarios are equally practical, yet they remain insufficiently explored and present a challenging task that requires further attention [14,15].
To demonstrate the challenge, the widely cited Case Western Reserve University (CWRU) bearing dataset serves as an illustrative example. Vibration signals corresponding to three distinct bearing fault types, Ball, Inner Race (IR), and Outer Race (OR), from the drive end (DE) were segmented using a sliding window approach to construct the source domain dataset. Figure 1a–c and Figure 2a–c present the t-SNE (t-Distributed Stochastic Neighbor Embedding) [16] visualizations of the training and testing data distributions for the in-domain task and the cross-individual task, respectively, where the blue markers represent the dimensionally reduced training set samples and the red markers denote the test set samples in the embedded space. The visualization reveals the following: (1) compact clustering of same-individual samples (intra-domain consistency), and (2) pronounced distribution shifts between the source (training) and target (testing) individuals, empirically demonstrating the cross-individual generalization challenge in bearing fault diagnosis.
Inspired by the Kolmogorov–Arnold representation theorem (KART) [17], Liu et al. proposed Kolmogorov–Arnold networks (KANs) as promising alternatives to multi-layer perceptrons (MLPs). By replacing fixed activation functions and linear weights with learnable univariate functions parametrized as splines, KANs exhibit a stronger capability for expressing nonlinearity. This architecture offers promising potential for enhancing model generalization and accuracy. Ni et al. [18] applied KANs to sEMG signal classification and achieved exceptionally high accuracy on three datasets, while Cheon et al. [19] employed KANs in remote sensing to improve both efficiency and performance. Building on these advances, KANs hold promise for bearing fault diagnosis based on vibration signals. Considering that not all segments of the spectrum contribute equally to the fault characteristics [20] and that vibration signals are inherently periodic, convolution operations and attention mechanisms each offer distinct advantages. The translational invariance of convolution enables the detection of similar local features at different positions, which is crucial for capturing periodic patterns in vibration signals; the attention mechanism, by weighting features according to contextual information, captures long-range dependencies and important global information. To date, architectures combining CNNs and Transformers have achieved remarkable progress in fault diagnosis, enhancing diagnostic accuracy and robustness [21,22]. Therefore, we integrate KART into classic CNN and Self-attention modules, proposing two novel modules, KAConv and KAA (a minimal illustrative sketch is given after the contribution list). This aims to improve the model's capacity to learn fault types by boosting its ability to capture nonlinear features. The main contributions of this paper are as follows:
- (1)
Cross-individual scenarios are explored to provide a reasonable evaluation framework. Under this benchmark, a model's generalization capability across bearing individuals in real-world scenarios can be effectively assessed.
- (2)
A novel Kolmogorov–Arnold enhanced convolutional transformer (KACFormer) model is proposed, which is inspired by KART. KAConv and KAA are designed to improve both general feature representation and cross-individual capabilities.
- (3)
Comprehensive comparative experiments are conducted on two public datasets, and the experimental results demonstrate the effectiveness and superiority of the proposed method.
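To make the idea behind KAConv concrete, the following is a minimal PyTorch sketch of how a learnable, spline-like activation can replace the fixed nonlinearity of a 1-D convolution block. The class names (`KAConvBlock`, `LearnableSplineActivation`), the piecewise-linear parameterization, and all hyperparameters are illustrative assumptions and do not reproduce the actual KAConv module defined in Section 2.

```python
# Illustrative sketch only: a 1-D convolution whose fixed nonlinearity is replaced
# by a per-channel learnable, spline-like (piecewise-linear) activation, in the
# spirit of KANs. This is NOT the paper's actual KAConv definition.
import torch
import torch.nn as nn


class LearnableSplineActivation(nn.Module):
    """Per-channel piecewise-linear activation with learnable knot values."""

    def __init__(self, channels, num_knots=8, x_min=-3.0, x_max=3.0):
        super().__init__()
        self.register_buffer("knots", torch.linspace(x_min, x_max, num_knots))
        init = self.knots * torch.sigmoid(self.knots)          # start close to SiLU
        self.values = nn.Parameter(init.repeat(channels, 1))   # (C, K) learnable knot values

    def forward(self, x):                                      # x: (N, C, L)
        knots, values = self.knots, self.values
        step = knots[1] - knots[0]
        xc = x.clamp(knots[0].item(), knots[-1].item() - 1e-6)
        idx = ((xc - knots[0]) / step).floor().long()          # left knot index per element
        frac = (xc - knots[idx]) / step                        # fractional position in the cell
        ch = torch.arange(x.shape[1], device=x.device).view(1, -1, 1)
        left, right = values[ch, idx], values[ch, idx + 1]
        return left + frac * (right - left)                    # linear interpolation


class KAConvBlock(nn.Module):
    """Conv1d + BatchNorm followed by the learnable activation above."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, stride, padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_ch)
        self.act = LearnableSplineActivation(out_ch)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


if __name__ == "__main__":
    block = KAConvBlock(in_ch=1, out_ch=16, kernel_size=64, stride=8)
    out = block(torch.randn(4, 1, 1024))   # a batch of 1024-point vibration windows
    print(out.shape)                       # torch.Size([4, 16, 129])
```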
The structure of this paper is as follows: Section 2 introduces the proposed method; Section 3 presents the experiments and analysis from different aspects; Section 4 gives the main conclusions.
3. Experiments
To validate the effectiveness and necessity of the proposed modules, comprehensive experiments are conducted on two open datasets, including an ablation study and contrast experiments. The code is implemented in Python 3.10.14 with PyTorch 2.4.1+cu124 and runs on a laptop manufactured by Lenovo (Beijing, China) equipped with an NVIDIA GeForce RTX 4060 Laptop GPU.
3.1. Task Setups
3.1.1. Task 1 on PU Dataset
The dataset for the first cross-individual task is sourced from the PU dataset of the Paderborn University Konstruktions- und Antriebstechnik (KAt) data center [26]. Vibration signals from the bearing housing are collected using piezoelectric accelerometers at a sampling frequency of 64 kHz. The experimental test platform, including a drive motor, load motor, flywheel, torque sensor, and a module for testing bearing vibration data, is shown in Figure 8.
A total of 32 different bearings are included in the PU dataset. These bearings vary in terms of manufacturer, type, and severity of damage, aligning with the objectives of our cross-individual study. The bearing faults in the dataset are generated using two primary methods: electrical discharge machining (EDM) and life acceleration treatment. To ensure the experiments closely resemble real-world industrial scenarios, only the data obtained through life acceleration treatment are utilized. The bearing states considered in this study are Healthy (N), Outer Race (OR) fault, and Inner Race (IR) fault. Additionally, to avoid an imbalanced dataset, only a subset of the provided bearings is used. The bearings chosen for the training and validation sets are K001 (N), K002 (N), K003 (N), KA04 (OR), KA15 (OR), KA16 (OR), KI04 (IR), KI14 (IR), and KI16 (IR). For the testing set, we selected K006 (N), KA22 (OR), and KI21 (IR).
All data are collected under the same working condition, with a rotational speed of 900 rpm, a load torque of 0.7 Nm, and a radial force of 1000 N. The official working condition code is "N09_M07_F10". Detailed information about the chosen bearings is provided in Table 1.
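For reference, the Task 1 split described above can be summarized as a simple configuration. The dictionary layout below is an illustrative assumption; the bearing codes, class labels, and working condition code are taken from the text.

```python
# Illustrative split definition for Task 1 (PU dataset). Bearing codes and labels
# follow the selection described above; the data structure itself is an assumption.
PU_TASK1_SPLIT = {
    "train_val": {
        "N":  ["K001", "K002", "K003"],
        "OR": ["KA04", "KA15", "KA16"],
        "IR": ["KI04", "KI14", "KI16"],
    },
    "test": {
        "N":  ["K006"],
        "OR": ["KA22"],
        "IR": ["KI21"],
    },
}
WORKING_CONDITION = "N09_M07_F10"   # 900 rpm, 0.7 Nm load torque, 1000 N radial force
```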
3.1.2. Task 2 on CWRU Dataset
The dataset for Task 2 is derived from the highly cited CWRU dataset [27]. The test bench consists of a 2 HP motor, a torque transducer, and a power dynamometer. Accelerometers are mounted on the drive end and the fan end of the housing to collect vibration signals, as shown in Figure 9.
The CWRU dataset provides two distinct types of bearings for cross-individual task research, namely the drive end (DE) bearing (6205-2RS JEM SKF; deep groove ball bearing) and the fan end (FE) bearing (6203-2RS JEM SKF; deep groove ball bearing). According to official records, the parameters for the two different bearings are listed in Table 2.
The CWRU dataset provides a relatively small volume of data for cross-individual fault diagnosis research. According to the relevant literature [28], models tend to face greater challenges in distinguishing fault classes under higher loads. Given the difficulty already posed by the limited data volume, Task 2 is set under the working condition with 0 HP load. In Task 2, data collected from the DE bearings are used to train the model, while data from the FE bearings are used to test it. Task 2 is also a three-class classification task, covering Healthy, OR fault, and IR fault.
3.1.3. Other Settings
The sliding window has a length of 1024 and an overlap rate of 0.5. All models are optimized using the Adam optimizer [29] with the learning rate set to 0.001. Cross-entropy is used as the loss function for backpropagation and parameter updates. To prevent overfitting and achieve optimal performance on the cross-individual tasks, multiple training runs are performed with different numbers of epochs, and the epoch number that yields the best accuracy on the target bearings is selected; across models, the optimal number of epochs ranges from 5 to 20. Additionally, to reduce the randomness of a single experiment, each model is run 15 times on the same task, and the average performance is reported. The distribution of the training, validation, and testing data for both tasks is presented in Table 3. As shown in Figure 10, statistical analysis using the Shapiro–Wilk normality test and confidence interval estimation demonstrates that 15 experimental repetitions yield sufficiently reliable results, with a narrow 95% CI of [94.69%, 96.77%] and low variability (CV = 1.90%).
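As a minimal sketch of the segmentation described above (window length 1024, 50% overlap), the following function is illustrative; its name and the dummy signal are assumptions, not the paper's code.

```python
# Illustrative sketch of the sliding-window segmentation (length 1024, 50% overlap).
# Names and the dummy signal below are assumptions for demonstration only.
import numpy as np


def sliding_window_segments(signal: np.ndarray, window: int = 1024, overlap: float = 0.5) -> np.ndarray:
    """Split a 1-D vibration signal into overlapping windows of fixed length."""
    step = int(window * (1.0 - overlap))          # 512 samples for 50% overlap
    n_segments = (len(signal) - window) // step + 1
    return np.stack([signal[i * step: i * step + window] for i in range(n_segments)])


if __name__ == "__main__":
    raw = np.random.randn(64_000)                 # stand-in for one second of 64 kHz data
    segments = sliding_window_segments(raw)
    print(segments.shape)                         # (124, 1024) -> samples fed to the models
```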
3.2. Model Selection
The hyperparameters of the model are listed in Table 4.
3.3. Ablation Experiment
To evaluate the efficacy of the proposed modules, an ablation experiment is carried out. The models participating in the ablation experiment are as follows:
- (A)
CNN + KAConv. This model has the same structure as the classic WDCNN model, but all conventional convolution modules are replaced by KAConv modules.
- (B)
KACFormer–Transformer. The output of the KAConv Encoder is fed directly into the MLP classifier.
- (C)
KACFormer–KAA. This model is equipped with the WD-KAConv Encoder, followed by a traditional transformer layer.
- (D)
KACFormer–KAConv. The WD-KAConv Encoder is replaced with a conventional convolution encoder of the same structure.
- (E)
Non-channel-wise KACFormer (N-KACFormer). More channels are assigned to each KAA head.
- (F)
KACFormer. The proposed model.
3.4. Contrast Experiment
To validate the advancement of the proposed KACFormer model, a comprehensive comparative experiment is conducted involving both classic models and modern advanced models, ranging from CNN-based methods to contemporary Transformers. The compared models include classic CNN architectures as well as frameworks that share a CNN-Transformer structure similar to ours. Detailed descriptions of the compared models are given below:
- (A)
WDCNN. Pioneering work in the field of fault diagnosis, serving as the baseline model [6].
- (B)
CNN-LSTM. A fault diagnosis model combining a CNN module with an LSTM module of 100 neurons [30].
- (C)
Convformer. A network designed to extract robust features by integrating both global and local information, with the goal of enhancing end-to-end fault diagnosis of gearboxes under high levels of noise [21]. To better suit the task, the small version of Convformer is utilized.
- (D)
Liconvformer. A lightweight fault diagnosis model with separable multiscale convolution and broadcast Self-attention [22].
- (E)
Clformer. A transformer based on convolutional embedding and Self-attention mechanisms [31].
- (F)
MobileNetV2. A neural network that improves the state-of-the-art performance of mobile models across various tasks, benchmarks, and a wide range of model sizes [32].
- (G)
ResNet18. A convolutional neural network architecture known for its residual connections, which enable efficient training of deeper models while maintaining strong performance in image classification and other computer vision tasks [33].
- (H)
KACFormer. The proposed model.
3.5. Results and Analysis
3.5.3. Analysis
Other Evaluations
While fault diagnosis accuracy remains the primary industrial concern, a comprehensive model evaluation requires multi-dimensional metrics. In addition to accuracy and the confusion matrix, we incorporate further domain-generalization-relevant evaluations: Precision, Recall, and F1-score (all macro-averaged). The relevant data are shown in Table 7.
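As a hedged illustration of how these macro-averaged metrics are obtained, a standard scikit-learn computation is sketched below; the label arrays are placeholders rather than the actual test predictions.

```python
# Illustrative computation of macro-averaged Precision, Recall, and F1-score.
# y_true / y_pred below are placeholders; the actual values come from the test set.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 2, 2, 2]   # 0: Healthy, 1: OR fault, 2: IR fault (placeholder labels)
y_pred = [0, 0, 1, 2, 2, 2, 1]

precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
print(f"Precision={precision:.4f}, Recall={recall:.4f}, F1={f1:.4f}")
```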
Learning Dynamics
To obtain a deeper understanding of the learning dynamics of KACFormer in cross-individual learning tasks, the following experiment is conducted: after each training epoch, the model is evaluated on the testing set to observe convergence and potential overfitting. The number of epochs is set to 100.
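A minimal sketch of this per-epoch evaluation protocol is given below; `model`, `train_loader`, and `test_loader` are placeholders, and the loop is illustrative rather than the paper's actual training code.

```python
# Illustrative per-epoch evaluation on the target (cross-individual) test set.
# `model`, `train_loader`, and `test_loader` are placeholders for the actual objects.
import torch
import torch.nn as nn


def train_and_track(model, train_loader, test_loader, epochs=100, lr=1e-3, device="cuda"):
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    history = []
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        # Evaluate on the target-domain test set after every epoch.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in test_loader:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        history.append(correct / total)
    return history  # per-epoch target-domain accuracy, used to spot overfitting
```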
As illustrated in Figure 17, on Task 1 the KACFormer network achieves its best accuracy on the target domain when it begins to converge on the source domain, typically around 5 to 20 epochs. After the 20th epoch, as training continues, the model's performance on the target domain begins to degrade. This phenomenon is less pronounced on Task 2: as illustrated in Figure 18, the model converges quickly and demonstrates relatively strong stability on the target dataset, although the rising validation loss may indicate a potential overfitting risk. This difference may stem from the data volume and the inherent differences between the individuals in the target and training datasets. Moreover, as shown in the confusion matrices in Figure 12, Figure 13, Figure 14, Figure 15 and Figure 16, the features of healthy bearings in the PU and CWRU datasets are more distinct than those of the other two categories, and the models participating in the experiment demonstrate strong cross-domain performance on this class. However, there is a significant degree of confusion between Inner Race (IR) and Outer Race (OR) faults.
Why SiLU
In the model, the SiLU activation function is employed in the encoder based on experimental validation. This choice is consistent with the original KAN authors' use of SiLU in their work [17], further supporting our design. As demonstrated in Table 8, using the ReLU activation results in accuracy degradation on both tasks compared with SiLU. This performance gap may be attributed to SiLU's favorable properties: it is smooth, non-monotonic, and bounded below while unbounded above, which makes it particularly suitable for approximating smooth functions; this may be a latent characteristic that benefits vibration signal processing.
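For reference, the two activation functions compared in Table 8 are defined as follows; SiLU is smooth, non-monotonic, and bounded below (unbounded above), whereas ReLU is piecewise linear:

```latex
\mathrm{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}, \qquad \mathrm{ReLU}(x) = \max(0, x)
```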
Working Condition
Working condition variability is a widely discussed challenge in fault diagnosis, and numerous scholars have dedicated efforts to cross-condition diagnosis [34,35,36]. Since model performance may vary under different operating conditions, and considering data availability, additional experiments are conducted on Task 1 based on the PU dataset, evaluating the same bearing samples under different working conditions. Table 9 presents KACFormer's accuracy for Task 1 across these three distinct working conditions. The experimental data indicate that, provided the source-domain and target-domain individuals operate under the same working condition, the robustness of the algorithm is not affected by the specific condition.
4. Conclusions
In this work, a cross-individual bearing fault diagnosis scenario is explored. By integrating KART into traditional convolution and attention mechanisms, a novel model, KACFormer, is proposed to explore the potential of leveraging the nonlinear modeling capability and cross-individual generalization of KANs. KACFormer enhances feature representation while preserving the translation invariance of convolution and the context-aware semantic modeling ability of attention mechanisms. Extensive experiments on two public datasets demonstrate the superior generalization performance of KACFormer, which achieves state-of-the-art accuracy rates of 95.73% and 91.58% in cross-individual scenarios. These results highlight the potential of KACFormer for improving fault diagnosis under varying operational conditions and individual differences. Beyond bearing fault diagnosis, the proposed KACFormer framework has the potential to generalize to other industrial scenarios involving vibration signal processing and domain adaptation, such as health monitoring for gearboxes and wind turbine condition monitoring [37].
Despite the advancements achieved, limitations remain. (1) Compared with MLP-based models, KAN-embedded models take relatively longer to train; further study will be carried out to reduce the complexity of the model while maintaining cross-individual accuracy. (2) The model should also prioritize detecting smaller, incipient faults to better align with industrial requirements, as early-stage fault diagnosis is crucial for predictive maintenance and for minimizing machine downtime. However, due to limitations of the current datasets, experiments on the model's minimum detectable fault size were not conducted; this constitutes an important direction for future research.