Node-Incremental-Based Multisource Domain Adaptation for Fault Diagnosis of Rolling Bearings with Limited Data

Deng, Di; Li, Wei; Liu, Jiang; Qin, Yan

doi:10.3390/machines14010071


¹ Faculty of Science and Engineering, University of Bristol, Bristol BS8 1QU, UK

² School of Automation, Chongqing University, Chongqing 401331, China

* Author to whom correspondence should be addressed.

Machines 2026, 14(1), 71; https://doi.org/10.3390/machines14010071

Submission received: 4 December 2025 / Revised: 22 December 2025 / Accepted: 26 December 2025 / Published: 6 January 2026

(This article belongs to the Special Issue Advanced Condition Monitoring and Predictive Maintenance for Mechatronic-Hydraulic Systems)


Abstract

Bearing fault diagnosis is essential for ensuring the safe and reliable operation of rotating machinery. However, accurate and timely fault identification with limited data remains a significant challenge. This study proposes a novel node-incremental-based multisource domain adaptation (NiMDA) approach for bearing fault diagnosis. The method employs a cloud model to adaptively extract fault-sensitive information while accounting for uncertainties across multiple wavelet packet decomposition levels. Subsequently, node incremental domain adaptation (NiDA) is used to construct a base classifier utilizing limited labeled data from both target and source domains. This approach reduces discrepancies between marginal and conditional distributions across different domain feature spaces during the node-increment process, resulting in a compact domain-adaptation structure. Robust diagnostic performance is achieved through parallel ensemble learning of NiDAs across multiple source domains. The experimental results demonstrate that NiMDA significantly outperforms state-of-the-art bearing fault diagnosis methods in few-shot scenarios, achieving improvements of 30.52%, 42.31%, 10.31%, 26.08%, 25.59%, and 7.98% over WDCNN, MCNN-LSTM, Bayesian-RF, DM-RVFLN, Five-shot, and ESCN, respectively, while maintaining satisfactory diagnostic speed.

Keywords:

rolling bearing; fault diagnosis; node incremental learning; limited data; domain adaptation

1. Introduction

Rolling bearings are essential components of rotating machinery in modern industry, fulfilling the dual roles of supporting equipment and facilitating wheel rotation [1,2]. Over time, these bearings are susceptible to failures caused by harsh external conditions, including elevated temperatures, excessive dust, heavy loads, and irregular vibrations. Early detection of bearing failures is crucial for maintaining operational safety, preventing unexpected breakdowns, and reducing economic losses. Consequently, the fault diagnosis of rolling bearings has emerged as a significant research focus within engineering technology [3]. In particular, machine-learning-based approaches to bearing fault diagnosis have attracted substantial scholarly interest in recent years.

Machine learning approaches based on vibration analysis have become widely adopted due to the abundance of fault-related information in vibration signals, particularly for feature extraction and fault classification [4]. In practical applications, bearing vibration signals often exhibit nonlinear and nonstationary characteristics due to instantaneous changes in friction, damping, or loading. Entropy-based methods are capable of capturing complexity and detecting dynamic variations by considering the nonlinear properties of time-series data. Consequently, several entropy-derived measures have been developed to address these challenges, including approximate entropy [5], sample entropy [6], multiscale sample entropy [7], and permutation entropy [8]. While entropy is an effective indicator of uncertainty or irregularity in time series, current entropy-based feature extraction techniques often lack intuitive interpretability of uncertainty and do not adaptively capture fault-related uncertainty. For instance, the first three methods struggle to select appropriate embedding dimensions and tolerance thresholds, whereas permutation entropy relies on user-defined parameters for embedding dimension and time delay. Such reliance on specific hyperparameters can result in abrupt changes in entropy values, which may compromise the reliability and consistency of the extracted features.

Using the extracted informative features, machine learning models have been extensively employed in fault recognition in rolling bearings. These models are generally classified as either shallow learners or deep learners. Representative shallow learners for bearing fault diagnosis include support vector machine (SVM) [9], random vector functional link network (RVFLN) [10], and stochastic configuration network [11]. In contrast, deep learner models primarily comprise convolutional neural networks (CNNs) [12,13,14], long short-term memory (LSTM) networks [15], and deep belief networks [16]. Notably, these models typically assume access to large volumes of training data. However, rotating machinery equipment usually operates under normal conditions, and failures are rare, which limits the availability of fault samples. Consequently, bearing fault diagnosis exemplifies a few-shot classification problem characterized by imbalanced data. As a result, the performance of these methods, especially deep learning models, is constrained by the scarcity of fault samples, leading to reduced diagnostic accuracy. Furthermore, these approaches rely on the assumption that training and test data are independent and identically distributed [17], which requires consistent data distributions. In practice, varying operating conditions of rolling bearings produce inconsistent data distributions, preventing these models from utilizing labeled data from different working conditions and restricting their effectiveness in cross-domain applications.

Domain adaptation serves as an alternative approach that transfers valuable information from the source domain to the target domain, thereby addressing the issue of insufficient data in the target domain [18]. This technique has demonstrated considerable success in various applications related to few-shot bearing-fault diagnosis. Numerous advanced deep domain adaptation methods, utilizing meta-learning, adversarial training, and prototypical networks, have been developed to facilitate effective fault recognition across different domains, significantly improving diagnostic accuracy under limited-data conditions [19,20,21]. For instance, Yu et al. [22] proposed a mixup data augmentation approach for cross-domain few-shot fault diagnosis of rotating equipment. However, these approaches generally depend on deep neural architectures with multiple layers and a substantial number of trainable parameters, which are optimized iteratively using gradient descent. For example, a model designed for bearing fault diagnosis may include 10 hidden layers, comprising 5 convolutional and 5 fully connected layers, resulting in approximately 673,900 learnable parameters [23]. As a result, deep-domain-adaptation-based bearing fault diagnosis methods often involve a costly and time-consuming training process, limiting their suitability for real-time diagnostic applications.

Several studies have recently focused on shallow-domain-adaptation-based bearing fault diagnosis to enable rapid domain adaptation. Dong et al. [24] developed a support vector machine (SVM) using a feature space projected by joint geometric and statistical cross-domain alignment for bearing fault diagnosis. Lei et al. [25] achieved distribution alignment through manifold-embedded, distribution-based feature mapping and subsequently applied XGBoost for cross-domain fault diagnosis. Yu et al. [26] introduced a shallow-domain-adaptation-based fault diagnosis method that extracts domain-invariant features via distribution alignment in the manifold subspace for ridge regression modeling. These shallow domain adaptation approaches primarily focus on matching domain features and require manual classifier selection for fault classification. The lack of a systematic approach to classifier selection remains an unresolved issue, which impedes stable and accurate bearing fault diagnosis. To address this, some shallow-network-based domain adaptation methods have integrated broad networks and random vector functional link networks (RVFLNs) with maximum mean discrepancy (MMD) for fault diagnosis [27,28]. However, these methods often determine the number of network nodes arbitrarily, lacking a systematic strategy, which reduces efficiency in cross-domain adaptation. Based on the above analysis, existing domain adaptation approaches for bearing fault diagnosis exhibit the following limitations:

Current entropy-based feature extraction methods exhibit limited adaptability in capturing multi-level dynamical characteristics, particularly the intrinsic uncertainty present in rolling bearing vibration signals.
Although deep domain adaptation offers significant performance benefits, it requires extended training times and large model parameters, limiting its practicality in modern industrial applications. Additionally, these methods often lack transparency and require enhanced interpretability for effective cross-domain network modeling.
Existing shallow domain adaptation approaches, which integrate domain matching with a fault classifier, require manual configuration of network structures for cross-domain adaptation. Insufficient nodes reduce modeling accuracy, while excessive nodes increase the risk of overfitting and prolong training time.

To address the aforementioned limitations, we propose a node-incremental-based multisource domain adaptation (NiMDA) approach for lightweight and effective bearing fault diagnosis with limited samples. The approach begins with feature extraction from the cloud space of bearing vibration signals across different conditional domains. The cloud feature space, centered on cloud entropy, is constructed to reliably represent the fault concept using a backward cloud generator (BCG) of the cloud model [29] following multi-level wavelet packet decomposition. Next, we develop the base model (NiDA) of NiMDA by integrating node-based constructive incremental learning into a domain adaptation framework. NiDA enables the efficient construction of each hidden node with cross-domain capabilities in a single-node incremental fashion. Incremental maximum mean discrepancy (MMD) is introduced to minimize discrepancies in both marginal and conditional probability distributions between the target and source domains as the number of nodes increases. Additionally, we propose a domain transfer supervisory mechanism that imposes inequality constraints on the assignment of hidden parameters, generating domain-invariant projections during the constructive incremental process. To utilize information from diverse working conditions and train a diagnosis model for the current condition, we employ parallel ensemble learning with multiple NiDAs in few-shot fault diagnosis scenarios. Overall, NiMDA leverages labeled data from multiple source domains to enhance diagnostic accuracy and stability in the current working condition. The main contributions of this paper are as follows:

1. Hierarchical cloud characteristics are extracted from multi-level wavelet packet coefficients of vibration signals using BCG, minimizing the need for human intervention. This approach facilitates the acquisition of high-resolution, fault-sensitive, and nonstationary features that account for uncertainties.
2. A novel shallow-network-based MDA framework is proposed for timely fault diagnosis, utilizing diagnostic residual feedback to constrain the number of adaptive nodes and incorporating node incremental learning into domain adaptation. A rigorous convergence proof is provided to enhance theoretical interpretability in domain adaptation.
3. A parallel ensemble learning approach is introduced to improve diagnostic accuracy and stability in the target domain with limited samples. This method leverages labeled data from multiple source domains and maintains a high diagnostic speed.

The remainder of this paper is organized as follows. The relevant preliminary knowledge is briefly reviewed in Section 2. In Section 3, the proposed NiMDA for bearing fault diagnosis is detailed. Experiments are carried out to evaluate the proposed methods in Section 4. Finally, Section 5 draws our concluding remarks.

2. Preliminaries

This section begins by formulating the MDA problem in the context of bearing fault diagnosis. After that, it introduces the core theoretical concepts of node incremental learning to enhance clarity and understanding.

2.1. Problem Formulation

This study investigates MDA-based fault diagnosis for a target rolling bearing under different yet related working conditions, which are considered source domains. In this way, knowledge from these source domains is transferred to the target domain, which represents the current working condition.

The MDA problem is formulated as follows. Let K source domains be given, where each domain is defined as $D_{S}^{k} = \{(x_{S, k}^{i}, y_{S, k}^{i})\}_{i = 1}^{N_{S}^{k}}$ $(k = 1, \dots, K)$, consisting of $N_{S}^{k}$ labeled samples. In this context, $x_{S, k}^{i} \in R^{d}$ denotes the i-th input vibration sample, and $y_{S, k}^{i} \in R^{m}$ represents the corresponding one-hot encoded label vector, with m representing the number of fault categories. The target domain is denoted as $D_{T} = \{(x_{T}^{i}, y_{T l}^{i})\}_{i = 1}^{N_{T}}$, which includes limited labeled samples $x_{T}$ and corresponding health state labels $y_{T}$. Using these $K + 1$ training datasets, $D_{T}$ and $D_{S}^{k}$ serve as the pairwise input for the k-th base model in MDA. Given the limitations of current deep domain adaptation methods, the objective is to develop a lightweight yet effective base model $f : R^{d} \to R^{m}$, formulated as follows:

f_{i}^{*} = \underset{f_{i}}{argmin} E_{(x^{j}, y^{j}) \in D_{T}} {∥\sum_{i = 1}^{L} f_{i} (x^{j}; D_{S}, D_{T}) - y^{j}∥}_{2}^{2}

(1)

where $f_{i}$ is the output of the i-th node within $f_{L} = \sum_{i = 1}^{L} f_{i}$, and L denotes the number of nodes.

Afterward, the optimal k-th base model is denoted as $f^{k, *}$. Additionally, a parallel ensemble learning strategy is employed to maintain high computational efficiency and improve diagnostic robustness with limited samples, formulated as follows:

F^{*} = \sum_{k = 1}^{K} α_{k} f^{k, *}

(2)

where $α_{k} \in [0, 1]$ represents the weight assigned to the base model $f^{k}$, satisfying $\sum_{k = 1}^{K} α_{k} = 1$, and $F^{*}$ denotes the final desired ensemble model.
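As a minimal illustration of the weighted combination in Equation (2), the following Python sketch combines the outputs of K base models with convex weights. The function name `ensemble_predict` and the array layout (K models, N samples, m classes) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def ensemble_predict(base_outputs, alphas):
    """Convex combination F = sum_k alpha_k * f_k, as in Equation (2).

    base_outputs: array-like of shape (K, N, m), one (N, m) output per base model
                  (this layout is an assumption of the sketch).
    alphas:       array-like of shape (K,), non-negative weights summing to 1.
    """
    base_outputs = np.asarray(base_outputs, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    if np.any(alphas < 0) or not np.isclose(alphas.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # Contract the model axis: result has shape (N, m).
    return np.tensordot(alphas, base_outputs, axes=1)
```

For example, `ensemble_predict(outputs, [0.25, 0.75])` gives the second model three times the influence of the first.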

2.2. Theoretical Background

Lightweight multisource domain adaptation (MDA) can be achieved through node-incremental learning. This method starts with a small network and incrementally adds hidden nodes and their corresponding output weights until a predefined stopping condition is met. These networks primarily include incremental random weight models [30,31,32,33]. Due to their low computational requirements, strong generalization ability, and high learning efficiency with minimal manual parameter tuning, these models have attracted considerable attention in fault diagnosis [34,35]. When the incremental network with L − 1 hidden nodes fails to satisfy the termination criterion, the L-th hidden node is introduced according to the following procedures.

(1) Input Parameter Constrained Generation

The input parameters $w_{L}$ and $b_{L}$ are initially sampled randomly from the intervals ${[- θ, θ]}^{d}$ and $[- θ, θ]$, respectively, where $θ$ is a predefined constant. By repeating this random sampling process $T_{max}$ times, a set of stochastic mappings $\{{\bar{h}}_{L, 1}, \dots, {\bar{h}}_{L, T_{max}}\}$ is generated. Subsequently, for each $i = 1, \dots, T_{max}$ and $q = 1, \dots, m$ (with m being the dimension of the output), a supervision criterion is applied as follows:

ξ_{L, i} = \frac{{(e_{L - 1, q}^{T} {\bar{h}}_{L, i})}^{2}}{{∥{\bar{h}}_{L, i}∥}_{2}^{2}} - (1 - μ_{L}) {∥e_{L - 1}∥}_{2}^{2} ⩾ 0

(3)

where $〈\cdot, \cdot〉$ denotes the inner product, ${∥\cdot∥}_{2}$ stands for the $l_{2}$ norm, $e_{L - 1}$ represents the residual error from the network with $L - 1$ nodes, and $μ_{L}$ is a tuning parameter that controls the constraint strength.

According to Equation (3), the optimal mapping $h_{L}^{*}$ can be obtained, which subsequently enables identification of the best-fitting input parameters $w_{L}^{*}$ and $b_{L}^{*}$ for the L-th hidden node.

(2) Output Weight Evaluation

The output weight $β_{L, q}$ corresponding to the L-th hidden node is determined using the formula below:

β_{L, q} = \frac{〈e_{L - 1, q}, h_{L}^{*}〉}{{∥h_{L}^{*}∥}_{2}^{2}} .

(4)

The node incremental modeling process terminates when either of two conditions holds: the number of hidden nodes L reaches the predefined upper limit $L_{max}$, or the residual $∥e_{L}∥$ falls below the tolerance $ε$. If neither condition is met, an additional hidden node is added.
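The two procedures above can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: the names `add_node` and `fit_incremental` are hypothetical, a sigmoid activation is assumed, $μ_{L}$ is held constant instead of varying with L, and the residual term of Equation (3) is evaluated per output column q.

```python
import numpy as np

def add_node(X, e_prev, theta=1.0, t_max=50, mu=0.9, rng=None):
    """Sample t_max random candidates and keep the best one that passes Eq. (3).

    Returns (w, b, h, beta) for the accepted node, or None if no candidate passes.
    """
    rng = np.random.default_rng() if rng is None else rng
    best = None
    for _ in range(t_max):
        w = rng.uniform(-theta, theta, size=X.shape[1])
        b = rng.uniform(-theta, theta)
        h = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid activation (assumed)
        hh = h @ h
        proj = e_prev.T @ h                       # <e_{L-1,q}, h> for each output q
        # Supervision criterion of Eq. (3), one value per output q.
        xi = proj ** 2 / hh - (1.0 - mu) * np.sum(e_prev ** 2, axis=0)
        if xi.min() >= 0 and (best is None or xi.sum() > best[0]):
            beta = proj / hh                      # output weight, Eq. (4)
            best = (xi.sum(), w, b, h, beta)
    return None if best is None else best[1:]

def fit_incremental(X, Y, l_max=20, eps=1e-3, mu=0.9, seed=0):
    """Add nodes one at a time until l_max nodes exist or the residual is below eps."""
    rng = np.random.default_rng(seed)
    e = np.asarray(Y, dtype=float).copy()         # residual of the empty network
    nodes = []
    while len(nodes) < l_max and np.linalg.norm(e) > eps:
        cand = add_node(X, e, mu=mu, rng=rng)
        if cand is None:                          # no admissible candidate found
            break
        w, b, h, beta = cand
        e = e - np.outer(h, beta)                 # residual after adding the node
        nodes.append((w, b, beta))
    return nodes, e
```

Because the output weight in Equation (4) is the least-squares projection of the residual onto the new node, each accepted node cannot increase the residual norm in this sketch.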

3. Node-Incremental-Based Multisource Domain Adaptation for Bearing Fault Diagnosis

The overall flowchart of the proposed method is illustrated in Figure 1. Initially, raw vibration signals are decomposed into multi-level wavelet packet coefficients to extract fault-sensitive features of rolling bearings. Cloud features are then extracted from each wavelet packet coefficient at various decomposition levels, resulting in fault features that provide comprehensive diagnostic information. Subsequently, the training dataset from the target domain is combined with either itself or source working conditions to serve as input for the base classifier (NiDA) within NiMDA, which is designed to develop a robust cross-domain classifier. Ultimately, bearing fault diagnosis is conducted using the trained NiDAs through parallel ensemble learning. The aforementioned multi-level cloud feature extraction and ensemble-learning-based multisource domain adaptation are elaborated on in the following two subsections.

3.1. Multi-Level Cloud Feature Extraction

Direct fault diagnosis from raw signals is challenging because of the complexity and noise present in vibration data collected from industrial environments. Consequently, robust feature extraction is essential for achieving accurate fault identification. This study proposes a hybrid method that combines wavelet packet decomposition (WPD) with a cloud model to extract discriminative fault features.

WPD enables the decomposition of a bearing signal into multiple sub-signals with equal bandwidths but distinct central frequencies. This method is particularly effective for analyzing non-stationary vibration signals that exhibit high frequencies and significant background noise in complex industrial settings [36]. Figure 2a demonstrates the three-level decomposition process. The wavelet packet coefficients at each node capture unique fault information across various frequency bands, and different decomposition levels yield diverse fault-related insights. However, uncertainty during signal acquisition, which introduces random and ambiguous fault information, remains a common yet frequently overlooked challenge. This uncertainty can significantly reduce the accuracy of feature extraction. The cloud model provides a cognitive framework that addresses this issue by supporting bidirectional transformation between the qualitative meaning of a concept and its quantitative representation, as shown in Figure 2b. In particular, the conceptual parameters ($Ex$, $En$, and $He$) representing a rolling bearing’s health status can be derived from quantitative data using the backward cloud generator (BCG), which exemplifies the process of extracting knowledge from numerical observations. The following outlines the multi-level cloud feature extraction procedure:

(1) Signal preprocessing: The vibration signal corresponding to each health state of the rolling bearing is denoised using a wavelet filter and subsequently subjected to max–min normalization. Next, sample segmentation is performed using a sliding window.

(2) Wavelet packet decomposition: Wavelet packet coefficients at different decomposition levels are calculated by applying WPD, as described in [37].

(3) Cloud feature extraction: The BCG of the one-dimensional cloud model is applied to extract the bearing dynamic features ($Ex$, $En$, and $He$) from each wavelet packet coefficient vector $x_{i, j} = [x_{i, j}^{0}, x_{i, j}^{1}, \dots, x_{i, j}^{n - 1}]$, where $i = 0, \dots, l$ indexes the decomposition level and $j = 0, \dots, 2^{i} - 1$ indexes the node within level i, with l being the number of WPD layers. First, the sample mean ${\bar{X}}_{i, j} = \sum_{k = 0}^{n - 1} x_{i, j}^{k} / n$ is calculated. The expectation is formulated as follows:

E x_{i, j} = {\bar{X}}_{i, j} .

(5)

Then, the feature entropy is calculated below:

E n_{i, j} = \sqrt{\frac{π}{2}} \times \frac{1}{n} \sum_{k = 0}^{n - 1} |x_{i, j}^{k} - E x_{i, j}| .

(6)

Lastly, the hyper entropy is obtained as follows:

H e_{i, j} = \sqrt{|S_{i, j}^{2} - E n_{i, j}^{2}|}

(7)

where $S_{i, j}^{2} = \sum_{k = 0}^{n - 1} {(x_{i, j}^{k} - {\bar{X}}_{i, j})}^{2} / n$. Thus, a feature set for one sample contains $\sum_{i = 0}^{l} 3 \cdot 2^{i}$ parameters under l-level WPD.
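Equations (5)–(7) translate directly into a few lines of NumPy. The sketch below is illustrative only: the function names `backward_cloud` and `cloud_feature_vector` are hypothetical, and the wavelet packet coefficients are assumed to be already available as one 1-D array per node (the WPD step itself is not shown).

```python
import numpy as np

def backward_cloud(x):
    """Backward cloud generator for one coefficient vector: returns (Ex, En, He)."""
    x = np.asarray(x, dtype=float)
    ex = x.mean()                                          # Eq. (5): expectation
    en = np.sqrt(np.pi / 2.0) * np.mean(np.abs(x - ex))    # Eq. (6): entropy
    s2 = np.mean((x - ex) ** 2)                            # variance with 1/n, as in the text
    he = np.sqrt(abs(s2 - en ** 2))                        # Eq. (7): hyper entropy
    return ex, en, he

def cloud_feature_vector(coeff_sets):
    """Concatenate (Ex, En, He) over all WPD nodes of one sample.

    coeff_sets: iterable of 1-D arrays, one per node across all levels
                (assumed input layout for this sketch).
    """
    return np.array([v for c in coeff_sets for v in backward_cloud(c)])
```

With l = 3, the tree has 1 + 2 + 4 + 8 = 15 nodes, so the resulting feature vector has 45 entries, matching the count given above.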

3.2. Multisource Domain Adaptation with Ensemble Learning

Node-incremental-based multisource domain adaptation (NiMDA) involves two stages: constructing a node-incremental-based domain adaptation network (NiDA) as the base fault classifier, and conducting ensemble learning for bearing fault diagnosis with multiple NiDAs, each paired with a different source domain.

3.2.1. Node-Incremental-Based Domain Adaptation

(a) Overall Objective Function

NiDA aims to learn a lightweight yet effective base classifier of bearing faults in the manner of node incremental learning by utilizing limited labeled samples from the target domain. Its structural diagram is depicted in Figure 3. Its overall framework with L − 1 hidden nodes is formulated as follows:

\begin{matrix} \min_{β, e_{S}^{j}, e_{T}^{i}} \frac{{∥β∥}_{2}^{2}}{2} + \frac{φ_{T}}{2} \sum_{i = 1}^{N_{T}} {∥e_{T}^{i}∥}_{2}^{2} + \frac{φ_{S}}{2} \sum_{j = 1}^{N_{S}} {∥e_{S}^{j}∥}_{2}^{2} + \frac{λ}{2} Ψ_{MMD} {(X_{S}, X_{T})}^{2} \\ \text{s.t.} \begin{cases} H_{S}^{j} β = T_{S}^{j} - e_{S}^{j}, & j = 1, \dots, N_{S} \\ H_{T}^{i} β = T_{T}^{i} - e_{T}^{i}, & i = 1, \dots, N_{T} \end{cases} \end{matrix}

(8)

where $H_{S}^{j}$, $T_{S}^{j}$, and $e_{S}^{j}$ denote the hidden-layer output, true label, and prediction residual of the j-th sample from the source domain, respectively; $H_{T}^{i}$, $T_{T}^{i}$, and $e_{T}^{i}$ represent the hidden-layer output, true label, and prediction residual of the i-th sample from the target domain, respectively; $N_{S}$ and $N_{T}$ represent the number of labeled training samples from the source and target domains, respectively; $φ_{S}$ and $φ_{T}$ are the trade-off coefficients on the prediction errors of the labeled data from the source and target domains, respectively; and $λ$ is the shrinkage regularization parameter on MMD.

The first three terms of Equation (8) jointly minimize the structural risk in the target and source domains from a cross-domain perspective. The final term mitigates the distributional divergence between the feature spaces of the source and target domains, as mapped by NiDA’s hidden layer. $Ψ_{MMD} (X_{S}, X_{T})$ quantifies the dissimilarity between the empirical expectations of the mapped features from the two domains and is defined as follows:

Ψ_{MMD} (X_{S}, X_{T}) = \sum_{c = 0}^{m} {∥(\frac{1}{n_{S}^{(c)}} \sum_{x_{i} \in X_{S}^{(c)}} h (X_{S}^{i}) - \frac{1}{n_{T}^{(c)}} \sum_{x_{j} \in X_{T}^{(c)}} h (X_{T}^{j})) β∥}_{F}

(9)

where $n_{S}^{(0)} = N_{S}$ and $n_{T}^{(0)} = N_{T}$; for $c \neq 0$, $n_{S}^{(c)}$ and $n_{T}^{(c)}$ indicate the number of labeled samples belonging to the c-th fault class in the source domain and target domain, respectively. $X_{S}^{(c)}$ and $X_{T}^{(c)}$ represent all samples from the c-th class in the source domain and target domain, respectively, and $h (X) = h = σ (Xw + b)$, with $σ$ being the activation function.

Notably, Equation (9) indicates that the marginal distribution distance between the source and target domains is minimized when $c = 0$. At the same time, when $c \neq 0$, it helps reduce the conditional probability distribution divergence by aligning the intra-class centroids more closely. Subsequently, Equation (9) can be reformulated as follows:

Ψ_{MMD} = \sum_{c = 0}^{m} {∥(d_{S}^{(c)} - d_{T}^{(c)}) β∥}_{F} = \sum_{c = 0}^{m} {∥Δ d^{(c)} β∥}_{F}

(10)

where $d_{S}^{(c)} = \sum_{x_{i} \in X_{S}^{(c)}} h (X_{S}^{i}) / n_{S}^{(c)}$ and $d_{T}^{(c)} = \sum_{x_{j} \in X_{T}^{(c)}} h (X_{T}^{j}) / n_{T}^{(c)}$.

In summary, NiDA minimizes structural risk and adapts both marginal and conditional distributions to handle data distribution shifts in a manner akin to node incremental learning, ultimately learning a compact, adaptive classifier for multi-class diagnosis problems.
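To make Equations (9) and (10) concrete, the following sketch computes the stacked centroid differences $Δ d^{(c)}$ and the resulting MMD term. Several details are assumptions of this example and not statements from the paper: the function names `mmd_rows` and `psi_mmd` are hypothetical, labels are given as integer class indices 0, …, m − 1 rather than one-hot vectors, and a class missing from either domain contributes a zero row.

```python
import numpy as np

def mmd_rows(H_s, y_s, H_t, y_t, m):
    """Stack the differences d_S^(c) - d_T^(c): row 0 is marginal (c = 0),
    rows 1..m are the class-conditional centroid differences."""
    y_s, y_t = np.asarray(y_s), np.asarray(y_t)
    rows = [H_s.mean(axis=0) - H_t.mean(axis=0)]          # marginal term
    for c in range(m):
        s, t = H_s[y_s == c], H_t[y_t == c]
        if len(s) == 0 or len(t) == 0:
            # Assumption: skip classes absent from either domain.
            rows.append(np.zeros(H_s.shape[1]))
        else:
            rows.append(s.mean(axis=0) - t.mean(axis=0))  # conditional term
    return np.vstack(rows)                                # shape (m + 1, L)

def psi_mmd(delta_d, beta):
    """Eq. (10): sum over c of the norm of (delta_d^(c) @ beta)."""
    return float(np.sum(np.linalg.norm(delta_d @ beta, axis=1)))
```

When the hidden features of the two domains coincide, every row of `mmd_rows` is zero and `psi_mmd` returns 0, which is the alignment the final term of Equation (8) encourages.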

(b) Node Incremental Construction

The incremental construction procedures of NiDA are described as follows:

Step 1: According to Equation (8), the NiDA with L − 1 nodes is built, and its current output weights are $β = {[β_{1}, β_{2}, \dots, β_{L - 1}]}^{T}$.

Step 2: The diagnostic residual $∥e_{T, L - 1}∥$ of the current NiDA may still exceed the tolerance $ε$ while $L - 1 < L_{max}$. In that case, a learning scheme is needed to assign appropriate $w_{L}$ and $b_{L}$ to the L-th node and calculate the output weight $β_{L}$, improving the prediction accuracy of the constructed NiDA. Inspired by the original SCN [32], a rapid updating scheme for the output weights is considered. Equation (8) is rewritten in unconstrained form as follows:

J (β_{L}) = \frac{1}{2} {∥β∥}_{2}^{2} + \frac{φ_{S}}{2} {∥H_{S} β - Y_{S}∥}_{F}^{2} + \frac{φ_{T}}{2} {∥H_{T} β - Y_{T}∥}_{F}^{2} + \frac{λ}{2} {∥d β∥}^{2} .

(11)

where $β$ and $d$ are updated as ${[β, β_{L}]}^{T}$ and $[d, d_{L}]$ for the current NiDA with L hidden nodes, respectively. Here, $H_{S} = [h_{S, 1}, \dots, h_{S, L}] \in R^{N_{S} \times L}$ and $H_{T} = [h_{T, 1}, \dots, h_{T, L}] \in R^{N_{T} \times L}$.

Then, the output weights are globally evaluated below:

\begin{aligned} β_{L}^{*} & = {[β_{1}^{*}, β_{2}^{*}, \dots, β_{L}^{*}]}^{T} = \arg\min_{β_{L}} J (β_{L}) \\ & = {(I + φ_{S} H_{S}^{T} H_{S} + φ_{T} H_{T}^{T} H_{T} + λ d_{L}^{T} d_{L})}^{- 1} (φ_{S} H_{S}^{T} Y_{S} + φ_{T} H_{T}^{T} Y_{T}) \end{aligned}

(12)

Let $e_{T, L}^{*} = f - \sum_{i = 1}^{L} h_{T, i}^{T} β_{i}^{*}$ and $e_{S, L}^{*} = f - \sum_{j = 1}^{L} h_{S, j}^{T} β_{j}^{*}$. The intermediate value of $β_{L, q}^{*}$ is defined as follows:

{\tilde{β}}_{L, q} = \frac{ϕ_{L, q}^{*} - \frac{λ}{φ_{T}} 〈d_{L}^{T} d, β_{q}〉}{\frac{1}{φ_{T}} + \frac{φ_{S}}{φ_{T}} {∥h_{S, L}∥}^{2} + {∥h_{T, L}∥}^{2} + \frac{λ}{φ_{T}} {∥d_{L}∥}^{2}}

(13)

where $ϕ_{L, q}^{*} = \frac{φ_{S}}{φ_{T}} {e_{S, L - 1, q}^{*}}^{T} h_{S, L} + {e_{T, L - 1, q}^{*}}^{T} h_{T, L}$. In addition, ${\tilde{β}}_{L} = [{\tilde{β}}_{L, 1}, \dots, {\tilde{β}}_{L, m}]$ and ${\tilde{e}}_{T, L} = e_{T, L - 1}^{*} - h_{T, L}^{T} {\tilde{β}}_{L}$.

Theorem 1.

Assume that $\mathrm{span} (Γ)$ is dense in $L_{2}$ space and that, $\forall h \in Γ$, $0 < ∥h∥ < τ$ for some $τ \in R^{+}$. Given three positive real numbers $(φ_{S}, φ_{T}, λ)$, a constant $0 < r < 1$, and a non-negative real number sequence $\{μ_{L}\}$ with ${lim}_{L \to + \infty} μ_{L} = 0$ and $μ_{L} < 1 - r$, define

δ_{L}^{*} = \sum_{q = 1}^{m} δ_{L, q}^{*}, δ_{L, q}^{*} = (1 - r - μ_{L}) {∥e_{T, L - 1, q}^{*}∥}^{2} .

(14)

If the hidden outputs $h_{S, L}$ and $h_{T, L}$ are generated to satisfy the following inequalities:

2 G_{L} {e_{T, L - 1, q}^{*}}^{T} h_{T, L} E_{L - 1, q}^{*} - h_{T, L}^{T} h_{T, L} {E_{L - 1, q}^{*}}^{2} ⩾ b_{g}^{2} δ_{L, q}^{*}

(15)

where $E_{L - 1, q}^{*} = \frac{φ_{S}}{φ_{T}} {e_{S, L - 1, q}^{*}}^{T} h_{S, L} + {e_{T, L - 1, q}^{*}}^{T} h_{T, L} - \frac{λ}{φ_{T}} 〈d_{L} d^{T}, β_{q}〉$, $G_{L} = \frac{1}{φ_{T}} + \frac{φ_{S}}{φ_{T}} h_{S, L}^{T} h_{S, L} + h_{T, L}^{T} h_{T, L} + \frac{λ}{φ_{T}} d_{L}^{T} d_{L}$, and $b_{g} = \frac{1}{φ_{T}} + (\frac{φ_{S}}{φ_{T}} + 1 + \frac{λ}{φ_{T}}) τ^{2}$, and the optimal output weights are evaluated by (12), then ${lim}_{L \to + \infty} ∥f - f_{L}^{*}∥ = 0$.

Step 3: The construction of the base model terminates when either of the following conditions holds: the node number L reaches $L_{max}$, or $∥e_{T, L}∥$ is smaller than $ε$. In either case, the procedure stops and outputs the final NiDA. Otherwise, the network is further updated by repeating the previous steps.
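The global evaluation of the output weights in Equation (12) is a regularized least-squares solve. A minimal sketch follows, assuming that `d` holds the stacked centroid differences for all L current nodes (one row per value of c); the function name `output_weights` and the default parameter values are hypothetical.

```python
import numpy as np

def output_weights(H_s, Y_s, H_t, Y_t, d, phi_s=1.0, phi_t=1.0, lam=1.0):
    """Closed-form minimizer of J in Eq. (11), following Eq. (12).

    H_s: (N_S, L) source hidden outputs,  Y_s: (N_S, m) source labels
    H_t: (N_T, L) target hidden outputs,  Y_t: (N_T, m) target labels
    d:   (m + 1, L) stacked centroid differences (assumed layout)
    """
    L = H_s.shape[1]
    A = (np.eye(L)
         + phi_s * H_s.T @ H_s
         + phi_t * H_t.T @ H_t
         + lam * d.T @ d)
    B = phi_s * H_s.T @ Y_s + phi_t * H_t.T @ Y_t
    # A is symmetric positive definite, so a linear solve is preferable
    # to forming the explicit inverse.
    return np.linalg.solve(A, B)
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a numerical design choice of this sketch; both yield the same weights in exact arithmetic.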

Remark 1.

The proposed NiDA, derived from the overall optimization framework in a node incremental manner through (12) and (15), provides each node with efficient domain adaptability. Theorem 1 expedites the decrease in network residual and accelerates the convergence speed of NiDA, characterized by requiring a small number of nodes. Moreover, NiDA improves diagnostic accuracy from the perspective of model-based incremental learning rather than data-based incremental learning [38]. Thus, NiDA is more suitable for the classification issue with limited data in efficient and effective fault diagnosis.

(c) Algorithm Implementation

Based on (15), the candidate sets $\{{\tilde{h}}_{S, L}^{1}, \dots, {\tilde{h}}_{S, L}^{T_{max}}\}$ and $\{{\tilde{h}}_{T, L}^{1}, \dots, {\tilde{h}}_{T, L}^{T_{max}}\}$ are obtained by randomly generating $ω_{L}$ from ${[- r, r]}^{d}$ and $b_{L}$ from $[- r, r]$ for $T_{max}$ times. The variable $ς$ is introduced to assess the quality of the generated L-th node for NiDA. Specifically, for $t = 1, \dots, T_{max}$ and $q = 1, \dots, m$, $ς_{L, t, q}$ is defined as follows:

ς_{L, t, q} = \frac{2 G_{L} 〈e_{T, L - 1, q}^{*}, {\tilde{h}}_{T, L}^{t}〉 E_{L - 1, q}^{*} - {∥{\tilde{h}}_{T, L}^{t}∥}_{2}^{2} {E_{L - 1, q}^{*}}^{2}}{\frac{1}{φ_{T}} + \frac{φ_{S}}{φ_{T}} {∥{\tilde{h}}_{S, L}^{t}∥}_{2}^{2} + {∥{\tilde{h}}_{T, L}^{t}∥}_{2}^{2} + \frac{λ}{φ_{T}} {∥d_{L}∥}_{2}^{2}} - (1 - γ - μ_{L}) {∥e_{T, L - 1, q}^{*}∥}^{2} .

(16)

The $ω_{L}^{*}$ and $b_{L}^{*}$ are derived based on the conditions $℘_{t} ⩾ 0$ and $℘_{t} ⩾ ℘_{i \neq t}$ for $i = 1, \dots, T_{max}$. Subsequently, the output weights of the current NiDA are calculated based on Equation (12). The pseudocode of NiDA is presented in Algorithm 1, while the proof of Theorem 1 is provided in Appendix A.

Algorithm 1 NiDA

Require: Training datasets $D_{S} = \{(X_{S}^{j}, Y_{S}^{j})\}_{j = 1}^{N_{S}}, X_{S}^{j} \in R^{d}, Y_{S}^{j} \in R^{m}$ and $D_{T} = \{(X_{T}^{i}, Y_{T}^{i})\}_{i = 1}^{N_{T}}, X_{T}^{i} \in R^{d}, Y_{T}^{i} \in R^{m}$ in the source domain and target domain; maximum number of hidden nodes $L_{max}$; expected error tolerance $ε$; maximum number of random configurations $T_{max}$; regularization parameters $\{φ_{T}, φ_{S}, λ\}$; source model parameters $\{β_{S}, W_{S}, b_{S}\}$; a set of positive scalars $r = [γ_{min} : Δ γ : γ_{max}]$.
Ensure: $W^{*}$, $b^{*}$, and $β^{*}$.
1: Initialize $e_{0} = {[Y_{T}^{1}, Y_{T}^{2}, \dots, Y_{T}^{N_{T}}]}^{T}$, $0 < γ < 1$, $W = []$, and $Ω = []$.
2: while $L ⩽ L_{max}$ and ${∥e_{0}∥}_{F} > ε$ do
3:   for $θ \in r$ do
4:     for $t = 1, 2, \dots, T_{max}$ do
5:       Calculate $ς_{L, q}$ according to Equation (16);
6:       Set $μ_{L} = (1 - γ) / (L + 1)$;
7:       if $\min \{ς_{L, 1}, \dots, ς_{L, m}\} ⩾ 0$ then
8:         Save $w_{L}$ and $b_{L}$ in $W$, and $ξ_{L} = \sum_{q = 1}^{m} ς_{L, q}$ in $Ω$;
9:       else
10:        go back to Step 4;
11:      end if
12:    end for
13:    if $W$ is not empty then
14:      Find $w_{L}^{*}$ and $b_{L}^{*}$ that maximize $ξ_{L}$ in $Ω$;
15:      Set $H_{S, L} = [h_{S, 1}^{*}, \dots, h_{S, L}^{*}]$ and $H_{T, L} = [h_{T, 1}^{*}, \dots, h_{T, L}^{*}]$;
16:      Break (go to Step 21);
17:    else
18:      randomly set $τ \in (0, 1 - γ)$, update $γ = γ + τ$, return to Step 4;
19:    end if
20:  end for
21:  Calculate $β^{*} = {[β_{1}^{*}, \dots, β_{L}^{*}]}^{T}$ using Equation (12);
22:  Calculate $e_{L} = H_{L}^{T} β_{T}^{*} - Y$;
23:  Renew $e_{0} = e_{L}$ and $L = L + 1$;
24: end while
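
The candidate-selection logic in Steps 7 to 14 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_best_candidate` and the array layout (one row per random configuration, one column per output) are assumptions, and the values of Equation (16) are taken as already computed.

```python
import numpy as np

def select_best_candidate(scores):
    """Choose one candidate node from a batch of random configurations.

    scores: array of shape (T_max, m); scores[t, q] stands in for the
    value of Equation (16) for candidate t and output q (assumed to be
    precomputed). Returns the index of the chosen candidate, or None
    when no candidate satisfies the constraint.
    """
    scores = np.asarray(scores, dtype=float)
    feasible = scores.min(axis=1) >= 0      # Step 7: all m values non-negative
    if not feasible.any():
        return None                         # caller would relax gamma (Step 18)
    xi = scores.sum(axis=1)                 # Step 8: xi = sum over the m outputs
    xi[~feasible] = -np.inf                 # infeasible candidates are never chosen
    return int(np.argmax(xi))               # Step 14: candidate maximising xi

# Toy values, not taken from the paper.
scores = np.array([[0.2, -0.1, 0.3],
                   [0.1,  0.1, 0.1],
                   [0.4,  0.0, 0.2]])
best = select_best_candidate(scores)
```

Returning `None` rather than raising keeps the relaxation of $γ$ in the hands of the caller, which mirrors the else-branch of the algorithm.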

3.2.2. Ensemble Learning

To maximize the utility of labeled data from diverse working conditions of the target rolling bearing, this study proposes an ensemble cross-domain learning framework based on NiDA that incorporates a parallel ensemble learning strategy. Vibration signals collected under the target working condition and other working conditions are designated as the target domain $D_{T}$ and multiple source domains $D_{S}^{k}$ $(k = 1, 2, \dots, K)$, respectively. Multiple joint datasets are constructed by pairing $D_{T}$ with either itself or each $D_{S}^{k}$, which serve as inputs to each NiDA and foster diversity among them. The final fault diagnosis is determined through majority voting.

The decision function of the $k$-th NiDA within NiMDA is denoted as $f_{k}(x)$, where $k = 1, 2, \dots, K + 1$. The label of the $j$-th class is represented by $C_{j}$, with $j = 1, 2, \dots, m$. The quantity $\mathrm{Num}_{j} = \mathrm{number}\{k \mid f_{k}(x) = C_{j}\}$ denotes the total number of times the $j$-th label appears in all predictions from NiDAs, where $\mathrm{number}(\cdot)$ refers to the counting operation. The final classification results are determined as follows:

f_{final}(x) = \underset{j}{\arg\max} \, (\mathrm{Num}_{j})

(17)
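
The voting rule of Equation (17) can be illustrated with a short Python sketch. The function name `majority_vote`, the integer class encoding, and the tie-breaking rule (lowest class index wins) are assumptions made for illustration; the text does not specify how ties are resolved.

```python
import numpy as np

def majority_vote(predictions, num_classes):
    """Combine base-learner outputs by majority voting.

    predictions: integer array of shape (K + 1, n_samples), where
    predictions[k, i] is the class index predicted by the k-th NiDA
    for sample i (classes encoded as 0 .. num_classes - 1).
    """
    predictions = np.asarray(predictions)
    # counts[j, i] plays the role of Num_j for sample i
    counts = np.stack([(predictions == j).sum(axis=0)
                       for j in range(num_classes)])
    # argmax over classes; ties resolve to the lowest class index (assumption)
    return counts.argmax(axis=0)

# Three hypothetical base learners voting on three samples.
preds = np.array([[0, 1, 2],
                  [0, 2, 2],
                  [1, 1, 0]])
final = majority_vote(preds, num_classes=3)
```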

4. Experiments

This section demonstrates the effectiveness and performance of the proposed approaches using acceleration vibration signals from rolling bearings, as provided by the Bearing Data Center of Case Western Reserve University (CWRU) [39].

4.1. Data Description and Experiment Setup

4.1.1. Data Description

Vibration signals were collected from a 6205-2RSJEM SKF deep groove ball bearing using an experimental setup comprising a 2 HP motor with a drive-end bearing, a torque sensor/encoder, and a dynamometer, as shown in Figure 4. Pitting corrosion was applied to induce various health states, including inner ring fault (IRF), outer ring fault (ORF) with damage at the 3 o’clock, 6 o’clock, and 12 o’clock positions, rolling element fault (REF), and normal state (NS). Fault diameters were set at 0.1778 mm (7 mils), 0.3556 mm (14 mils), and 0.5334 mm (21 mils). Vibration signals were recorded under four working conditions: 1797 r/min with 0 HP load, 1772 r/min with 1 HP load, 1750 r/min with 2 HP load, and 1720 r/min with 3 HP load, all sampled at 12 kHz. The raw signals were segmented using a sliding window with a length of 1024 and a step size of 1024, so consecutive segments do not overlap. The segmented signals were then processed using the multi-level cloud feature extraction method to generate the feature datasets. Table 1 presents the distribution of the four feature datasets across the four working conditions. For each working condition, each fault category includes three fault-diameter variations: 7 mils, 14 mils, and 21 mils. The training datasets exhibit imbalanced class distributions, which reflects real-world scenarios.
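
The segmentation step can be sketched as follows. This is a minimal illustration, assuming the raw signal is a one-dimensional NumPy array and that a trailing remainder shorter than one window is discarded; the function name `segment_signal` is hypothetical.

```python
import numpy as np

def segment_signal(signal, window=1024, step=1024):
    """Cut a 1-D signal into fixed-length segments with a sliding window.

    With step == window, as in the setup described above, the segments
    do not overlap. Samples left over at the end are dropped (assumption).
    """
    signal = np.asarray(signal)
    n_segments = (len(signal) - window) // step + 1
    if n_segments <= 0:
        return np.empty((0, window), dtype=signal.dtype)
    starts = np.arange(n_segments) * step
    return np.stack([signal[s:s + window] for s in starts])

# Synthetic stand-in for a raw vibration record.
raw = np.arange(5000)
segments = segment_signal(raw)
```

Each row of `segments` would then be passed to the feature extraction stage.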

4.1.2. Experiment Setup

A cross-domain diagnosis task is represented as $A → B$, where $A$ denotes the source domain dataset and $B$ denotes the target domain dataset. The NiDA hyperparameters are set as follows: $T_{max} = 100$, $ε = 0.1$, $γ \in \{0.5, 1, 3, 5, 7, 10, 25, 50, 100, 150, 200\}$, and $L_{max} = 200$. The sigmoid activation function is used. The remaining hyperparameters are selected via a grid-search strategy. The regularization parameters $φ_{T}$ and $φ_{S}$ are chosen from $\{10^{-3}, 10^{-2}, \dots, 10^{2}, 10^{3}\}$, and $λ$ is selected from $\{0.1, 0.5, 1, 2, 10, 20, 50\}$. All experiments are performed on a Windows 10 platform with an Intel i5-12600KF processor (3.69 GHz CPU, 32 GB RAM). Each experiment is repeated 20 times to ensure statistical reliability and fairness. Model algorithms are implemented in Python 3.9.
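
To make the size of the search space concrete, the grid over $φ_{T}$, $φ_{S}$, and $λ$ can be enumerated as below. This is a sketch only: the dictionary keys, the `grid_search` helper, and the toy scoring function are hypothetical, and in practice `evaluate` would train a model and return its validation accuracy.

```python
import math
from itertools import product

phi_values = [10.0 ** k for k in range(-3, 4)]      # 10^-3, ..., 10^3
lambda_values = [0.1, 0.5, 1, 2, 10, 20, 50]

# Every combination of (phi_T, phi_S, lambda): 7 * 7 * 7 candidates.
grid = [{"phi_T": pt, "phi_S": ps, "lam": lam}
        for pt, ps, lam in product(phi_values, phi_values, lambda_values)]

def grid_search(evaluate, candidates):
    """Return the candidate with the highest score under `evaluate`."""
    return max(candidates, key=evaluate)

def toy_score(p):
    # Placeholder objective for demonstration only, not a real accuracy.
    return -(abs(math.log10(p["phi_T"]) - 2)
             + abs(math.log10(p["phi_S"]))
             + abs(p["lam"] - 10))

best = grid_search(toy_score, grid)
```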

4.2. Effectiveness Analysis of NiDA

In this section, the cross-domain performance of NiDA is evaluated through bearing fault diagnosis experiments conducted under varying operating conditions. The effectiveness and feasibility of the proposed method are demonstrated by comparison with SCN [32], RVFL [10], SVM [9], and several representative domain adaptation methods, including UJD-RVFL [28], Adapt-SVM [40], CCSA [41], and CDAN [42]. Note that SCN, RVFL, and SVM are trained with limited labeled target samples to validate the adaptability of NiDA. For fair comparison, the SCN parameters corresponding to those in NiDA are configured identically. The trade-off parameter $C$ for both SVM and Adapt-SVM is searched over the same range as $λ$. For RVFL and UJD-RVFL, the number of enhancement layer nodes is explored within $\{50, 100, \dots, 450, 500\}$. In UJD-RVFL, the trade-off parameters $C$ and $λ$ are searched within $\{10^{-3}, 10^{-2}, \dots, 10^{2}, 10^{3}\}$ and $\{2^{-3}, 2^{-2}, \dots, 2^{2}, 2^{3}\}$, respectively. For CCSA, the margin parameter $M$ and trade-off parameter $γ$, as well as the trade-off parameter $λ$ of CDAN, are all tuned over the same range as $C$ of SVM, with the number of training epochs set to 500.

Table 2 presents the diagnostic accuracy, precision, recall, F1-score, and modeling time for all evaluated methods, where the first four metrics are calculated using macro-averaging and reported as average ± standard deviation. The results demonstrate that NiDA achieves superior diagnostic accuracy with sound stability, outperforming other approaches in 10 of the 12 individual domain shifts. NiDA’s precision, recall, and F1-score are also higher than those of the comparison methods in most cross-domain tasks. This finding provides strong evidence for the effectiveness of the proposed method in cross-domain bearing fault diagnosis. As a baseline classifier, SCN attains an accuracy of approximately 84%, whereas NiDA achieves higher accuracy in every task, surpassing SCN by roughly 10 percentage points in most tasks. This result underscores NiDA’s capability in addressing cross-domain scenarios. Methods without cross-domain capabilities, such as SCN, SVM, and RVFL, require less training time than NiDA, primarily because NiDA uses a larger training dataset. It is worth noting that CDAN achieves the best performance for the A→C and D→C tasks. Although the working conditions corresponding to C are quite different from those of A and D, CDAN can more accurately mine their common features using its adversarial generation mechanism. However, CDAN and CCSA require more computational resources across all tasks because of their deep domain adaptation processes. In contrast, NiDA maintains a lightweight and efficient architecture by combining MMD with a node-based incremental learning strategy. Although Adapt-SVM demonstrates rapid modeling speed, it yields lower diagnostic accuracy than NiDA in most tasks. UJD-RVFL also underperforms compared to NiDA, largely due to the increased complexity of iterative parameter estimation and manual node count selection.
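
For readers who wish to reproduce the macro-averaged metrics, the following sketch computes them from a confusion matrix. It assumes integer class labels and takes the macro F1-score as the mean of per-class F1 values, which is one common convention; the exact convention behind Table 2 is not stated here, and the function name `macro_metrics` is hypothetical.

```python
import numpy as np

def macro_metrics(y_true, y_pred, num_classes):
    """Accuracy and macro-averaged precision, recall, and F1-score."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    pred_tot = cm.sum(axis=0)              # samples predicted as each class
    true_tot = cm.sum(axis=1)              # samples belonging to each class
    # Classes with an empty denominator contribute 0 (assumption).
    precision = np.divide(tp, pred_tot, out=np.zeros_like(tp), where=pred_tot > 0)
    recall = np.divide(tp, true_tot, out=np.zeros_like(tp), where=true_tot > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return {"accuracy": tp.sum() / cm.sum(),
            "precision": precision.mean(),
            "recall": recall.mean(),
            "f1": f1.mean()}

# Toy labels with two test samples per class, not data from the paper.
metrics = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
```

When every class has the same number of test samples, macro recall equals accuracy, which would explain why those two rows of Table 2 are nearly identical.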

4.3. Comparison with Existing Fault Diagnosis Methods

This experiment compares NiMDA with several state-of-the-art methods listed in Table 3, all of which have demonstrated strong performance in bearing fault diagnosis. Notably, NiMDA incorporates a comprehensive framework that includes specialized feature extraction and fault classification, whereas the feature extraction procedures for the other methods follow those described in their respective original publications. An ensemble of SCN (ESCN) based on Bagging is utilized to demonstrate the superior performance of NiMDA, with particular emphasis on its cross-domain capability. The number of hidden nodes for DM-RVFLN is optimized by searching within $\{50, 100, \dots, 450, 500\}$, and the regularization parameter $η$ is selected from the range $\{10^{-3}, 10^{-2}, \dots, 10^{2}, 10^{3}\}$. The training epochs and batch size for WDCNN, MCNN-LSTM, and Five-shot are set to 5000 and 32, respectively, with a learning rate of 0.0006. For Bayesian-RF, the optimal values of $m$ and $k$ are set to 1 and 3, as recommended in its original publication, where $k$ denotes the number of decomposition levels in WPD.

The average results are illustrated in Figure 5. WDCNN, MCNN-LSTM, and Five-shot, as deep-learning-based fault diagnosis methods, exhibit notable limitations when only a few training samples are available, including low classification accuracy and time-consuming model training. MCNN-LSTM also fails to reach the 90% accuracy reported in its original paper due to insufficient training samples and its undersampling of raw signals. Moreover, these deep models show unstable diagnostic performance across different operating conditions. In contrast, NiMDA achieves superior stability and the highest accuracy under all operating conditions. This advantage mainly stems from the ensemble learning strategy, which enhances learning capability by integrating multiple source-domain perspectives while mitigating adverse effects from improper training, such as overfitting or undertraining. The results of ESCN demonstrate the effectiveness of our multi-level cloud feature extraction approach, as evidenced by the superior performance of ESCN over DM-RVFLN, which relies on conventional time- and frequency-domain features. ESCN also achieves performance comparable to Bayesian-RF, which uses WPD-based spectral features and ensemble learning. NiMDA outperforms DM-RVFLN in accuracy by more than 20% across all datasets. Although its computational cost is slightly higher than that of other shallow learners due to the larger number of input samples, the overall performance gain is significant. In general, NiMDA demonstrates outstanding diagnostic capability for bearing fault diagnosis with limited target-domain samples.

4.4. Effect of the Number of Training Samples

This experiment evaluates the effectiveness of NiMDA in addressing two challenges that make fault samples scarce: mechanical systems cannot operate under faulty conditions due to their criticality, and most failures develop gradually over long periods along a degradation path. To this end, we conduct comparative experiments by varying the number of fault samples per class within $\{5, 10, 20, 30, 40\}$ while keeping 40 NS training samples and 40 testing samples per class fixed. Note that the fault samples used at each size include those used at the preceding, smaller size. The target working condition is set to 2 HP as an example.
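
One simple way to obtain nested training subsets of this kind is to fix a single random ordering and take prefixes of it. The sketch below illustrates that idea; the function name `nested_subsets`, the seed, and the use of a random permutation are assumptions, since the sampling procedure is not described in detail.

```python
import numpy as np

def nested_subsets(n_available, sizes, seed=0):
    """Return index sets in which each larger set contains the smaller ones.

    A single permutation of the available sample indices is drawn, and
    the first n indices are used for each requested size n.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_available)
    return {n: order[:n] for n in sorted(sizes)}

# 40 available fault samples per class, as in the setting above.
subsets = nested_subsets(40, [5, 10, 20, 30, 40])
```

Because every subset is a prefix of the same ordering, accuracy differences between sizes reflect the added samples rather than a fresh random draw.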

The average diagnostic accuracy and standard deviation are shown in Figure 6. The testing accuracy of deep learning–based methods markedly improves as the number of training samples increases, indicating that data quantity strongly influences their performance and may lead to poor results when only limited data is available. The standard deviations of MCNN-LSTM and Five-shot also decrease as more training samples are provided. For shallow learners, the diagnostic accuracy tends to saturate, while their standard deviations decline considerably as the number of training samples increases from 20 to 40. Overall, our method consistently achieves the highest diagnostic accuracy with minimal fluctuation across different sample sizes. The proposed NiMDA achieves an accuracy improvement of approximately 6% over ESCN and exhibits a markedly smaller standard deviation, demonstrating that the multisource domain strategy effectively enhances diagnostic generalization. Moreover, when the sample number per fault state increases from 5 to 20, only NiMDA, Bayesian-RF, and ESCN achieve diagnostic accuracies above 80%, indicating that shallow learners are better suited for few-shot classification tasks. These comparative results confirm that NiMDA provides significantly improved performance for bearing fault diagnosis with limited data.

Additionally, Figure 7 presents the confusion matrices of ESCN and NiMDA when only five training samples are available for each fault state. The recall and precision values are given in the rightmost column and the bottom row of each matrix. NiMDA achieves higher diagnostic accuracy than ESCN across most fault states, except for REF under the 7 mil condition. This improvement arises from NiMDA’s effective use of data from multiple working conditions and its domain-matching mechanism, which enables efficient knowledge transfer across domains. In contrast, ESCN relies solely on Bagging-based ensemble learning and lacks cross-domain capability. These results again demonstrate the substantial advantage of NiMDA in bearing fault diagnosis.

4.5. Parameter Sensitivity

This section presents an empirical parameter sensitivity analysis of the three regularization parameters $φ_{S}$, $φ_{T}$, and $λ$ in NiDA. The analysis focuses on a single domain shift, B→A. In each experiment, two parameters are varied while the third is held constant at its fixed value ($φ_{S} = 1$, $φ_{T} = 100$, or $λ = 10$). The results are shown in Figure 8. The parameters $φ_{S}$ and $φ_{T}$ determine the relative contributions of the source and target domains. When $φ_{S}$ is less than $φ_{T}$, the model acquires more information from the target domain. The parameter $λ$ controls the degree of cross-domain matching and modulates the model’s domain adaptation capability. Given that the source and target domains contain an equal number of samples in this experiment, selecting $φ_{T} > 1$ and $φ_{T} > φ_{S}$ yields higher accuracy in the target domain, as demonstrated in Figure 8c. The value of $λ$ should remain moderate, with an effective range of approximately $[1, 100]$, which aligns with the experimental results in Figure 8a,b.

5. Conclusions

In this paper, NiDA is proposed to introduce a perspective on node-incremental domain adaptation, followed by the development of NiMDA to address the limited-data fault-diagnosis issue for rolling bearings under multiple working conditions. NiDA preserves the inter-class relationships of the source domain by unilaterally aligning it to the target domain, rather than mapping both source and target features into an unknown intermediate space. Moreover, NiDA simultaneously aligns discrepancies between marginal and conditional distributions during incremental learning of hidden nodes. Building on NiDA, NiMDA employs ensemble learning to leverage limited data from multiple source domains, improving the generalization of the base classifier in the target domain. The extensive experiments on the CWRU bearing datasets validate the effectiveness and superiority of the proposed methods, including transfer analysis for NiDA, fault diagnosis tests for NiMDA, evaluations with varying sample sizes, and a parameter sensitivity study for NiDA.

Although the current work can effectively tackle the problem of scarce target-domain samples in bearing fault diagnosis, it cannot adapt continuously to target domains whose working conditions keep changing. Future work will therefore focus on improving alignment under large domain drift and on continual learning, to better match actual industrial processes.

Author Contributions

Conceptualization, D.D.; methodology, D.D. and J.L.; software, D.D.; validation, D.D., W.L. and J.L.; formal analysis, Y.Q.; investigation, J.L.; resources, W.L.; data curation, D.D. and J.L.; writing—original draft preparation, D.D.; writing—review and editing, W.L. and J.L.; visualization, D.D.; supervision, W.L.; project administration, Y.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no potential conflicts of interest.

Appendix A

This appendix provides a rigorous convergence analysis for Theorem 1 to theoretically guarantee the validity of domain adaptation in NiDA.

Proof of Theorem 1.

It can readily be shown that ${∥e_{T, L}^{*}∥}^{2} ⩽ {∥{\tilde{e}}_{T, L}∥}^{2} = {∥e_{T, L - 1}^{*} - h_{T, L} {\tilde{β}}_{L}∥}^{2} ⩽ {∥{\tilde{e}}_{T, L - 1}∥}^{2}$. We have

\begin{aligned}
{∥e_{T, L}^{*}∥}^{2} - {∥e_{T, L - 1}^{*}∥}^{2} &⩽ {∥{\tilde{e}}_{T, L}∥}^{2} - {∥e_{T, L - 1}^{*}∥}^{2} \\
&= \sum_{q = 1}^{m} 〈e_{T, L - 1, q}^{*} - h_{T, L} {\tilde{β}}_{L, q}, e_{T, L - 1, q}^{*} - h_{T, L} {\tilde{β}}_{L, q}〉 - \sum_{q = 1}^{m} 〈e_{T, L - 1, q}^{*}, e_{T, L - 1, q}^{*}〉 \\
&= \sum_{q = 1}^{m} [{\tilde{β}}_{L, q}^{2} 〈h_{T, L}, h_{T, L}〉 - 2 〈e_{T, L - 1, q}^{*}, h_{T, L} {\tilde{β}}_{L, q}〉] .
\end{aligned}

(A1)

According to Equation (15), Equation (A1) is rewritten as follows:

{∥e_{T, L - 1}^{*}∥}^{2} - {∥e_{T, L}^{*}∥}^{2} ⩾ \sum_{q = 1}^{m} \frac{2 G_{L} {e_{T, L - 1, q}^{*}}^{T} h_{T, L} E_{L - 1, q}^{*} - {∥h_{T, L}∥}^{2} {E_{L - 1, q}^{*}}^{2}}{b_{g}^{2}} ⩾ 0

(A2)

which indicates that ${∥e_{T, L}^{*}∥}^{2}$ is monotonically decreasing and convergent as $L \to + \infty$.

Based on Equations (13)–(15) and (A2), we can obtain

\begin{aligned}
{∥e_{T, L}^{*}∥}^{2} - (γ + μ_{L}) {∥e_{T, L - 1}^{*}∥}^{2} &⩽ {∥{\tilde{e}}_{T, L}∥}^{2} - (γ + μ_{L}) {∥e_{T, L - 1}^{*}∥}^{2} \\
&= \sum_{q = 1}^{m} 〈e_{T, L - 1, q}^{*} - h_{T, L} {\tilde{β}}_{L, q}, e_{T, L - 1, q}^{*} - h_{T, L} {\tilde{β}}_{L, q}〉 - (γ + μ_{L}) \sum_{q = 1}^{m} 〈e_{T, L - 1, q}^{*}, e_{T, L - 1, q}^{*}〉 \\
&= (1 - γ - μ_{L}) {∥e_{T, L - 1}^{*}∥}^{2} + \sum_{q = 1}^{m} [{\tilde{β}}_{L, q}^{2} 〈h_{T, L}, h_{T, L}〉 - 2 〈e_{T, L - 1, q}^{*}, h_{T, L} {\tilde{β}}_{L, q}〉] \\
&= δ_{L}^{*} - \sum_{q = 1}^{m} \frac{2 G_{L} 〈e_{T, L - 1, q}^{*}, h_{T, L}〉 E_{L - 1, q}^{*}}{b_{g}^{2}} + \sum_{q = 1}^{m} \frac{{∥h_{T, L}∥}^{2} {E_{L - 1, q}^{*}}^{2}}{b_{g}^{2}} .
\end{aligned}

(A3)

Afterwards, according to (A2), (A3) is further expressed as follows:

{∥e_{T, L}^{*}∥}^{2} - (γ + μ_{L}) {∥e_{T, L - 1}^{*}∥}^{2} ⩽ δ_{L}^{*} - \sum_{q = 1}^{m} \frac{2 G_{L} 〈e_{T, L - 1, q}^{*}, h_{T, L}〉 E_{L - 1, q}^{*}}{b_{g}^{2}} + \sum_{q = 1}^{m} \frac{{∥h_{T, L}∥}^{2} {E_{L - 1, q}^{*}}^{2}}{b_{g}^{2}} ⩽ 0 .

(A4)

Therefore, the following inequality holds:

0 ⩽ {∥e_{T, L}^{*}∥}^{2} ⩽ (γ + μ_{L}) {∥e_{T, L - 1}^{*}∥}^{2} ⩽ \prod_{k = 1}^{L} (γ + μ_{k}) {∥e_{T, 0}^{*}∥}^{2} .

(A5)

Finally, $\lim_{L \to + \infty} {∥e_{T, L}^{*}∥}^{2} = 0$ follows from $\lim_{L \to + \infty} \prod_{k = 1}^{L} (γ + μ_{k}) {∥e_{T, 0}^{*}∥}^{2} = 0$, which implies $\lim_{L \to + \infty} ∥e_{T, L}^{*}∥ = 0$. This completes the proof of Theorem 1. □
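
The vanishing product in the final step can be checked numerically. The sketch below assumes $μ_{k} = (1 - γ) / (k + 1)$, which is one reading of Step 6 of Algorithm 1, and uses $γ = 0.9$ purely as an illustrative value; the function name `contraction_bound` is hypothetical.

```python
def contraction_bound(gamma, num_nodes):
    """Product of (gamma + mu_k) for k = 1 .. num_nodes.

    mu_k = (1 - gamma) / (k + 1) is an assumed schedule; each factor is
    then strictly below 1 and tends to gamma, so the product shrinks
    roughly geometrically.
    """
    bound = 1.0
    for k in range(1, num_nodes + 1):
        mu_k = (1.0 - gamma) / (k + 1)
        bound *= gamma + mu_k
    return bound

# Upper bound on the squared residual relative to its initial value.
bounds = [contraction_bound(0.9, n) for n in (1, 10, 100, 200)]
```

Under these assumptions the bound on the relative residual falls monotonically with the number of nodes, in line with the limit stated in the proof.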

Remark A1.

The above proof gives node-incremental domain adaptation a strong theoretical guarantee for the target domain. It confirms that the prediction residual on the target data gradually decreases as the number of nodes increases, so that domain transfer is effective in a more explainable fashion than in deep domain adaptation. Hidden nodes capable of facilitating effective cross-domain knowledge acquisition are thus generated within a highly compact structure.

References

1. Dai, W.; Liu, J.; Wang, L. Cloud ensemble learning for fault diagnosis of rolling bearings with stochastic configuration networks. Inf. Sci. 2024, 658, 119991.
2. Chaouech, L.; Ben Ali, J.; Berghout, T.; Bechhoefer, E.; Chaari, A. BSEMD-transformer: A new framework for rolling element bearing diagnosis in electrical machines based on classification of time–frequency features. Machines 2025, 13, 961.
3. Wang, L.; Zhao, W. An ensemble deep learning network based on 2D convolutional neural network and 1D LSTM with self-attention for bearing fault diagnosis. Appl. Soft. Comput. 2025, 172, 112889.
4. Wang, R.; Jiang, H.; Zhu, K.; Wang, Y.; Liu, C. A deep feature enhanced reinforcement learning method for rolling bearing fault diagnosis. Adv. Eng. Inform. 2022, 54, 101750.
5. Imaouchen, Y.; Kedadouche, M.; Alkama, R.; Thomas, M. A frequency-weighted energy operator and complementary ensemble empirical mode decomposition for bearing fault detection. Mech. Syst. Signal Process. 2017, 82, 103–116.
6. Seera, M.; Wong, M.D.; Nandi, A.K. Classification of ball bearing faults using a hybrid intelligent model. Appl. Soft. Comput. 2017, 57, 427–435.
7. Li, Y.; Feng, K.; Liang, X.; Zuo, M.J. A fault diagnosis method for planetary gearboxes under non-stationary working conditions using improved Vold-Kalman filter and multi-scale sample entropy. J. Sound Vibr. 2019, 439, 271–286.
8. Ying, W.; Zheng, J.; Pan, H.; Liu, Q. Permutation entropy-based improved uniform phase empirical mode decomposition for mechanical fault diagnosis. Digit. Signal Process. 2021, 117, 103167.
9. Sun, B.; Liu, X. Significance support vector machine for high-speed train bearing fault diagnosis. IEEE Sens. J. 2023, 23, 4638–4646.
10. Li, X.; Yang, Y.; Hu, N.; Cheng, Z.; Cheng, J. Discriminative manifold random vector functional link neural network for rolling bearing fault diagnosis. Knowl.-Based Syst. 2021, 211, 106507.
11. Liu, J.; Hao, R.; Zhang, T.; Wang, X. Vibration fault diagnosis based on stochastic configuration neural networks. Neurocomputing 2021, 434, 98–125.
12. Zhang, W.; Peng, G.; Li, C.; Chen, Y.; Zhang, Z. A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals. Sensors 2017, 17, 425.
13. Wang, S.; Zhang, J. An intelligent process fault diagnosis system based on Andrews plot and convolutional neural network. J. Dyn. Monit. Diagn. 2022, 1, 127–138.
14. Spirto, M.; Nicolella, A.; Melluso, F.; Malfi, P.; Cosenza, C.; Savino, S.; Niola, V. Enhancing SDP-CNN for gear fault detection under variable working conditions via multi-order tracking filtering. J. Dyn. Monit. Diagn. 2025; Early access.
15. Chen, X.; Zhang, B.; Gao, D. Bearing fault diagnosis base on multi-scale CNN and LSTM model. J. Intell. Manuf. 2021, 32, 971–987.
16. Tang, H.; Tang, Y.; Su, Y.; Feng, W.; Wang, B.; Chen, P.; Zuo, D. Feature extraction of multi-sensors for early bearing fault diagnosis using deep learning based on minimum unscented kalman filter. Eng. Appl. Artif. Intell. 2024, 127, 107138.
17. Ding, Y.; Jia, M.; Zhuang, J.; Cao, Y.; Zhao, X.; Lee, C.G. Deep imbalanced domain adaptation for transfer learning fault diagnosis of bearings under multiple working conditions. Reliab. Eng. Syst. Saf. 2023, 230, 108890.
18. Li, X.; Zhang, W.; Li, X.; Hao, H. Partial domain adaptation in remaining useful life prediction with incomplete target data. IEEE/ASME Trans. Mechatron. 2023, 29, 1903–1913.
19. Ding, P.; Jia, M.; Zhao, X. Meta deep learning based rotating machinery health prognostics toward few-shot prognostics. Appl. Soft. Comput. 2021, 104, 107211.
20. Chai, Z.; Zhao, C. Fault-prototypical adapted network for cross-domain industrial intelligent diagnosis. IEEE Trans. Autom. Sci. Eng. 2022, 19, 3649–3658.
21. Lei, Z.; Zhang, P.; Chen, Y.; Feng, K.; Wen, G.; Liu, Z.; Yan, R.; Chen, X.; Yang, C. Prior knowledge-embedded meta-transfer learning for few-shot fault diagnosis under variable operating conditions. Mech. Syst. Signal Process. 2023, 200, 110491.
22. Yu, K.; Li, Y.; Zhan, Q.; Zhang, Y.; Xing, B. Intelligent fault diagnosis for cross-domain few-shot learning of rotating equipment based on mixup data augmentation. Machines 2025, 13, 807.
23. Li, Z.; Shen, S.; Liu, Z.; Chen, Y. A novel multisource domain adaptation framework for bearing fault diagnosis based on adversarial network and feature enhancement. IEEE Trans. Instrum. Meas. 2025, 74, 1–12.
24. Dong, S.; Wen, G.; Zhang, Z. Bearing fault diagnosis under different operating conditions based on cross domain feature projection and domain adaptation. In Proceedings of the IEEE International Instrumentation and Measurement Technology Conference, Auckland, New Zealand, 20–23 May 2019; pp. 1–6.
25. Lei, Z.; Wen, G.; Dong, S.; Huang, X.; Zhou, H.; Zhang, Z.; Chen, X. An intelligent fault diagnosis method based on domain adaptation and its application for bearings under polytropic working conditions. IEEE Trans. Instrum. Meas. 2021, 70, 3505914.
26. Yu, X.; Dong, F.; Xia, B.; Yang, S.; Ding, E.; Yu, W. An intelligent fault diagnosis scheme for rotating machinery based on supervised domain adaptation with manifold embedding. IEEE Internet Things J. 2023, 10, 953–972.
27. Shi, M.; Ding, C.; Chang, S.; Wang, R.; Huang, W.; Zhu, Z. Cross-domain privacy-preserving broad network for fault diagnosis of rotating machinery. Adv. Eng. Inform. 2023, 58, 102157.
28. Li, B.; Zhao, Y.P.; Chen, Y.B. Unilateral alignment transfer neural network for fault diagnosis of aircraft engine. Aerosp. Sci. Technol. 2021, 118, 107031.
29. Jiang, Y.; Tang, C.; Zhang, X.; Jiao, W.; Li, G.; Huang, T. A novel rolling bearing defect detection method based on bispectrum analysis and cloud model-improved EEMD. IEEE Access 2020, 8, 24323–24333.
30. Wang, Q.; Dai, W.; Lin, P.; Zhou, P. Compact incremental random weight network for estimating the underground airflow quantity. IEEE Trans. Ind. Inform. 2022, 18, 426–436.
31. Li, M.; Wang, D. Insights into randomized algorithms for neural networks: Practical issues and common pitfalls. Inf. Sci. 2017, 382, 170–178.
32. Wang, D.; Li, M. Stochastic configuration networks: Fundamentals and algorithms. IEEE Trans. Cybern. 2017, 47, 3466–3479.
33. Liu, J.; Qin, Y.; Dai, W.; Yuen, C. A Lightweight Transfer Learning-Based State-of-Health Monitoring With Application to Lithium-Ion Batteries in Autonomous Air Vehicles. IEEE Trans. Ind. Inform. 2025; early access.
34. Huang, D.; Li, Y.; Guan, S.; Zhang, X.; Tang, M. A novel collaborative diagnosis approach of incipient faults based on VMD and SCN for rolling bearing. Opt. Control Appl. Methods 2023, 44, 1617–1631.
35. Dai, W.; Fan, R.; Nan, J. Simultaneous fault diagnosis for control valve using feature fusion and multi-label classification framework considering fault similarity. IEEE Trans. Instrum. Meas. 2025, 74, 3519211.
36. Wang, X.; Liu, Z.; Zhang, L.; Heath, W.P. Wavelet package energy transmissibility function and its application to wind turbine blade fault detection. IEEE Trans. Ind. Electron. 2022, 69, 13597–13606.
37. Zhao, M.; Kang, M.; Tang, B.; Pecht, M. Multiple wavelet coefficients fusion in deep residual networks for fault diagnosis. IEEE Trans. Ind. Electron. 2019, 66, 4696–4706.
38. Wei, X.; Liu, S.; Xiang, Y.; Duan, Z.; Zhao, C.; Lu, Y. Incremental learning based multi-domain adaptation for object detection. Knowl.-Based Syst. 2020, 210, 106420.
39. Smith, W.A.; Randall, R.B. Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mech. Syst. Signal Process. 2015, 64–65, 100–131.
40. Yang, J.; Yan, R.; Hauptmann, A.G. Adapting SVM classifiers to data with shifted distributions. In Proceedings of the IEEE International Conference on Data Mining Workshops, Omaha, NE, USA, 28–31 October 2007; pp. 69–76.
41. Motiian, S.; Piccirilli, M.; Adjeroh, D.A.; Doretto, G. Unified deep supervised domain adaptation and generalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5715–5725.
42. Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 1647–1657.
43. Aburakhia, S.A.; Myers, R.; Shami, A. A hybrid method for condition monitoring and fault diagnosis of rolling bearings with low system delay. IEEE Trans. Instrum. Meas. 2022, 71, 519913.
44. Zhang, A.; Li, S.; Cui, Y.; Yang, W.; Dong, R.; Hu, J. Limited data rolling bearing fault diagnosis with few-shot learning. IEEE Access 2019, 7, 110895–110904.

Figure 1. Flowchart of the proposed NiMDA for bearing fault diagnosis with limited data.

Figure 2. Two parts of multi-level cloud feature extraction, including (a) three levels of WPD generating wavelet packet coefficients $x_{i, j}$ at the $j$-th subband of the $i$-th decomposition level and (b) bidirectional cognitive transformation diagram of the cloud model characterized by expectation $Ex$, entropy $En$, and hyper entropy $He$.

Figure 3. Structural diagram of the proposed NiDA.

Figure 4. (a) Experimental platform [39] with (b) the deep groove ball bearing.

Figure 5. Overall diagnostic performance comparisons of existing fault diagnosis methods for (a) diagnostic accuracy and (b) modeling time.

Figure 6. Accuracy comparisons with existing fault diagnostic methods across varying training sample sizes.

Figure 7. Confusion matrices under 5 training samples for each fault state for Dataset C. (a) Confusion matrix of ESCN. (b) Confusion matrix of NiMDA.

Figure 8. Distribution of diagnostic accuracy under variation of the regularization parameters. (a) $φ_{S}$ is fixed at 1. (b) $φ_{T}$ is fixed at 100. (c) $λ$ is fixed at 10.

Table 1. Description of samples under varying working conditions.

Dataset	Motor Load	Health State	Fault Diameter (mils)	Training Samples per Class	Testing Samples per Class
A	0 HP	REF	7/14/21	5	30
B	1 HP	IRF	7/14/21	5	30
C	2 HP	ORF	7/14/21	5	30
D	3 HP	NS	-	25	30

Table 2. Diagnostic performance of all comparison methods.

Tasks	Metrics	SCN	SVM	RVFL	NiDA	Adapt-SVM	UJD-RVFL	CCSA	CDAN
A→B	Accuracy (%)	83.05 ± 4.28	87.37 ± 3.93	86.77 ± 0.86	95.47 ± 1.10	89.26 ± 2.31	94.83 ± 1.25	93.93 ± 1.23	92.33 ± 1.06
	Precision (%)	86.91 ± 3.18	88.19 ± 3.71	89.76 ± 0.82	95.60 ± 0.93	90.12 ± 2.26	94.96 ± 1.30	94.61 ± 1.11	93.15 ± 1.01
	Recall (%)	83.07 ± 4.33	87.35 ± 3.98	86.78 ± 0.81	95.47 ± 1.05	89.24 ± 2.36	94.81 ± 1.20	93.93 ± 1.18	92.31 ± 1.01
	F1-Score (%)	83.30 ± 4.23	86.79 ± 3.83	88.24 ± 0.90	95.47 ± 1.00	89.68 ± 2.41	94.88 ± 1.35	94.27 ± 1.33	92.73 ± 1.16
	Time (s)	1.46	0.013	0.0068	11.49	0.15	24.33	35.16	42.07
A→C	Accuracy (%)	82.03 ± 2.87	80.53 ± 3.71	81.32 ± 1.12	86.13 ± 0.87	85.51 ± 1.06	88.73 ± 2.42	90.63 ± 1.09	92.07 ± 1.16
	Precision (%)	83.38 ± 2.77	80.36 ± 3.61	83.83 ± 1.02	86.50 ± 0.92	85.63 ± 1.11	92.61 ± 2.12	92.01 ± 1.14	92.11 ± 1.16
	Recall (%)	82.03 ± 2.82	80.53 ± 3.76	81.32 ± 1.07	86.13 ± 0.82	85.50 ± 1.01	91.48 ± 2.27	90.61 ± 1.04	92.07 ± 1.11
	F1-Score (%)	82.11 ± 2.92	80.33 ± 3.16	82.56 ± 0.97	86.10 ± 0.77	85.56 ± 1.16	92.04 ± 1.69	91.31 ± 1.19	92.09 ± 1.26
	Time (s)	1.14	0.0124	0.0077	11.32	0.15	26.33	35.63	44.22
A→D	Accuracy (%)	86.33 ± 3.28	90.43 ± 4.62	91.75 ± 1.34	96.48 ± 1.07	88.74 ± 2.69	93.48 ± 1.42	92.00 ± 1.31	92.07 ± 0.97
	Precision (%)	90.03 ± 2.31	90.92 ± 4.27	93.10 ± 1.44	96.60 ± 1.06	89.02 ± 2.54	94.37 ± 1.37	94.64 ± 1.12	91.64 ± 1.51
	Recall (%)	86.33 ± 4.03	90.44 ± 4.47	91.75 ± 1.29	96.48 ± 1.02	88.74 ± 2.66	93.47 ± 1.56	92.31 ± 1.26	92.68 ± 0.92
	F1-Score (%)	86.49 ± 3.33	90.33 ± 4.67	92.42 ± 1.39	96.50 ± 1.17	88.88 ± 2.59	93.92 ± 1.52	93.46 ± 1.41	92.16 ± 1.07
	Time (s)	1.66	0.0146	0.0082	12.17	0.13	25.66	35.09	44
B→A	Accuracy (%)	82.78 ± 3.13	88.68 ± 4.03	82.50 ± 1.83	93.83 ± 1.38	73.66 ± 5.61	91.58 ± 2.10	91.73 ± 1.51	92.30 ± 1.64
	Precision (%)	86.38 ± 2.08	89.36 ± 3.67	85.73 ± 1.78	94.34 ± 1.23	76.32 ± 5.31	92.16 ± 2.15	91.94 ± 1.46	92.33 ± 1.69
	Recall (%)	82.78 ± 3.18	88.68 ± 3.98	82.50 ± 1.88	93.83 ± 1.37	72.69 ± 5.84	91.59 ± 2.05	91.72 ± 1.56	92.30 ± 1.62
	F1-Score (%)	82.71 ± 3.23	89.02 ± 4.08	84.08 ± 1.93	93.90 ± 1.18	74.46 ± 5.66	91.87 ± 2.20	91.83 ± 1.61	92.31 ± 1.64
	Time (s)	1.75	0.015	0.0084	16.47	0.13	26.03	35.53	45.06
B→C	Accuracy (%)	83.23 ± 2.20	81.38 ± 2.76	81.43 ± 1.01	91.48 ± 1.05	88.45 ± 1.33	89.28 ± 1.56	91.27 ± 1.11	91.03 ± 1.34
	Precision (%)	84.15 ± 2.10	81.50 ± 2.86	83.99 ± 0.81	91.41 ± 1.10	88.39 ± 1.28	89.93 ± 1.51	91.66 ± 1.16	91.12 ± 1.30
	Recall (%)	83.23 ± 2.15	81.35 ± 2.71	81.42 ± 0.96	91.42 ± 1.06	88.42 ± 1.38	89.06 ± 1.35	91.23 ± 1.06	91.02 ± 1.29
	F1-Score (%)	83.20 ± 2.25	81.44 ± 2.71	82.68 ± 1.06	91.51 ± 1.05	88.40 ± 1.43	89.49 ± 1.26	91.44 ± 1.11	91.07 ± 1.44
	Time (s)	1.28	0.0141	0.0086	11.52	0.12	26.58	36.71	45.4
B→D	Accuracy (%)	86.17 ± 4.33	88.20 ± 4.71	92.32 ± 1.17	96.82 ± 0.51	89.41 ± 1.21	93.45 ± 1.03	91.17 ± 0.86	90.30 ± 1.01
	Precision (%)	90.13 ± 4.38	88.28 ± 4.76	93.45 ± 1.11	96.94 ± 0.56	89.56 ± 1.21	93.65 ± 1.01	93.55 ± 0.63	92.01 ± 1.06
	Recall (%)	86.17 ± 4.28	88.20 ± 4.66	92.32 ± 1.07	96.83 ± 0.46	89.40 ± 1.16	93.45 ± 0.98	92.50 ± 0.81	91.06 ± 0.96
	F1-Score (%)	86.54 ± 3.43	88.08 ± 4.21	92.43 ± 1.27	96.84 ± 0.61	89.48 ± 1.35	93.55 ± 1.13	93.02 ± 0.86	91.53 ± 0.86
	Time (s)	1.44	0.0133	0.0091	12.42	0.13	27.73	36.09	44.46
C→A	Accuracy (%)	83.28 ± 3.12	87.90 ± 4.26	83.00 ± 1.82	92.83 ± 0.96	90.37 ± 1.19	91.60 ± 1.20	92.50 ± 1.03	90.50 ± 0.99
	Precision (%)	86.79 ± 2.22	88.22 ± 4.12	86.29 ± 1.71	93.25 ± 0.91	91.06 ± 1.14	92.31 ± 1.21	92.86 ± 1.01	91.35 ± 1.04
	Recall (%)	83.28 ± 3.07	87.90 ± 4.21	83.01 ± 1.97	92.73 ± 1.01	90.37 ± 1.19	91.60 ± 1.25	92.50 ± 0.98	90.51 ± 0.94
	F1-Score (%)	83.26 ± 3.17	87.91 ± 4.31	84.62 ± 1.87	92.99 ± 1.06	90.71 ± 1.29	91.95 ± 1.20	92.68 ± 1.13	90.93 ± 1.09
	Time (s)	1.72	0.0152	0.0083	12.06	0.11	23.16	37.07	46.11
C→B	Accuracy (%)	83.43 ± 4.19	86.37 ± 4.30	87.22 ± 0.81	95.10 ± 1.73	90.30 ± 2.45	90.57 ± 1.51	93.03 ± 1.96	92.07 ± 1.48
	Precision (%)	87.46 ± 2.24	86.79 ± 4.15	90.01 ± 0.76	95.21 ± 1.72	90.38 ± 2.40	90.94 ± 1.46	94.61 ± 1.08	93.21 ± 1.23
	Recall (%)	83.43 ± 4.14	86.37 ± 4.25	87.32 ± 0.86	95.10 ± 1.68	90.31 ± 2.31	90.55 ± 1.56	93.00 ± 1.91	92.04 ± 1.43
	F1-Score (%)	83.91 ± 3.29	86.58 ± 4.40	88.64 ± 0.81	95.09 ± 1.83	90.34 ± 2.35	90.74 ± 1.31	93.81 ± 2.06	92.62 ± 1.28
	Time (s)	1.61	0.0146	0.0085	12.53	0.13	23.58	37.21	46.66
C→D	Accuracy (%)	87.38 ± 3.56	90.37 ± 3.17	92.35 ± 1.07	96.70 ± 0.91	91.87 ± 2.83	93.67 ± 1.18	94.27 ± 1.16	91.43 ± 1.12
	Precision (%)	90.72 ± 3.21	90.62 ± 3.12	93.51 ± 0.92	96.81 ± 0.90	92.64 ± 2.81	93.66 ± 1.23	94.16 ± 1.11	93.21 ± 1.07
	Recall (%)	87.38 ± 3.51	90.37 ± 3.15	92.35 ± 1.02	96.70 ± 0.86	91.87 ± 2.98	93.65 ± 1.13	94.27 ± 1.18	91.40 ± 1.17
	F1-Score (%)	87.72 ± 3.36	90.20 ± 3.27	92.47 ± 1.17	96.72 ± 1.01	92.25 ± 2.71	93.65 ± 1.28	94.21 ± 1.26	92.30 ± 1.00
	Time (s)	1.48	0.0134	0.0107	13.08	0.14	24.32	38.68	45.18
D→A	Accuracy (%)	81.98 ± 4.28	89.05 ± 4.32	82.75 ± 0.97	93.07 ± 1.11	89.23 ± 3.11	91.60 ± 0.87	90.43 ± 2.06	92.67 ± 1.31
	Precision (%)	86.11 ± 3.33	89.08 ± 4.37	85.93 ± 1.02	93.32 ± 1.06	91.11 ± 2.66	92.65 ± 0.82	91.34 ± 1.71	92.94 ± 1.26
	Recall (%)	81.98 ± 4.23	89.05 ± 4.27	82.71 ± 0.92	93.07 ± 1.08	89.20 ± 3.06	91.39 ± 0.92	90.41 ± 2.01	92.64 ± 1.36
	F1-Score (%)	82.03 ± 3.38	89.06 ± 4.42	84.29 ± 0.77	93.18 ± 0.81	90.14 ± 2.21	92.02 ± 0.87	90.87 ± 2.12	92.80 ± 1.10
	Time (s)	1.8	0.0163	0.0098	12.68	0.12	25.71	39.35	48.98
D→B	Accuracy (%)	83.33 ± 4.09	87.03 ± 4.58	86.92 ± 0.61	95.15 ± 1.53	87.48 ± 2.61	90.48 ± 1.58	91.47 ± 2.32	90.03 ± 1.94
	Precision (%)	86.94 ± 4.014	87.39 ± 4.43	89.96 ± 0.56	95.26 ± 1.28	88.16 ± 2.41	90.63 ± 1.53	93.11 ± 2.13	91.39 ± 1.64
	Recall (%)	83.33 ± 4.09	87.02 ± 4.53	86.92 ± 1.26	95.15 ± 1.42	87.48 ± 2.56	90.48 ± 1.53	91.40 ± 2.27	90.03 ± 1.89
	F1-Score (%)	83.75 ± 4.23	87.20 ± 4.61	88.41 ± 0.71	95.15 ± 1.46	87.82 ± 2.31	90.55 ± 1.29	92.25 ± 2.42	90.70 ± 1.66
	Time (s)	1.59	0.0138	0.0105	12.3	0.15	24.16	39.14	48.93
D→C	Accuracy (%)	81.32 ± 2.71	79.88 ± 3.84	81.60 ± 1.20	87.52 ± 1.19	89.71 ± 2.06	88.37 ± 1.51	92.67 ± 2.15	92.10 ± 1.81
	Precision (%)	82.87 ± 1.76	79.33 ± 3.89	84.06 ± 1.12	87.85 ± 1.21	89.92 ± 2.011	89.60 ± 1.56	92.46 ± 2.20	92.61 ± 1.76
	Recall (%)	81.30 ± 2.66	79.88 ± 3.79	81.60 ± 1.25	86.47 ± 1.24	88.63 ± 2.36	88.41 ± 1.46	91.51 ± 2.10	92.02 ± 1.69
	F1-Score (%)	81.45 ± 2.81	79.60 ± 3.94	82.81 ± 1.10	87.15 ± 1.29	89.27 ± 2.31	89.00 ± 1.61	91.98 ± 2.25	92.31 ± 1.91
	Time (s)	1.28	0.013	0.0102	13.44	0.14	26.11	40.55	48.74

Table 3. Bearing fault diagnosis methods for comparison.

Literature	Brief	Abbreviation
[12]	Deep convolutional neural networks with wide first-layer kernels.	WDCNN
[15]	Multi-scale convolutional neural network and long short-term memory.	MCNN-LSTM
[43]	Bayesian-optimization-based random forest algorithm.	Bayesian-RF
[10]	Discriminative manifold random vector functional link neural network.	DM-RVFLN
[44]	A deep Siamese neural network by repeating one-shot five times.	Five-shot

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Deng, D.; Li, W.; Liu, J.; Qin, Y. Node-Incremental-Based Multisource Domain Adaptation for Fault Diagnosis of Rolling Bearings with Limited Data. Machines 2026, 14, 71. https://doi.org/10.3390/machines14010071
