1. Introduction
Complex systems are characterized by high dimensionality, strong coupling, and significant uncertainty, which makes their operational states difficult to predict [1]. Once a local fault occurs, system-wide risks are likely to be triggered [2]. Therefore, accurate and effective fault diagnosis is essential for ensuring the stability and safety of such systems.
Recent advances in signal processing have introduced techniques such as the generalized synchroextracting transform, which improves feature resolution and robustness in non-stationary vibration analysis [3]. For rolling element bearings, blind deconvolution methods such as the CFFsBD algorithm have demonstrated effectiveness in enhancing weak fault features [4]. Beyond mechanical systems, fault diagnosis approaches have also been extended to electrical domains, as shown by the identification of anomalies in co-phase power supply systems [5]. These studies reflect both the depth and breadth of current research and highlight the need for diagnostic methods that are not only accurate and robust but also generalizable.
Building on this context, fault diagnosis techniques are commonly categorized into four types: physics-based, data-driven, knowledge-based, and semi-quantitative information-based methods [5,6].
(1) The physical modeling approach reveals the operating mechanism and fault characteristics of a system by constructing a mathematical model of the system [7]. The method exhibits high diagnostic accuracy [8]. For example, Huang et al. [9] analyzed various fault mechanisms in motor drive systems and proposed a fault classification method based on an improved hidden Markov model (HMM), which achieved satisfactory results. Jafari et al. [10] proposed a simple yet effective method for detecting inter-turn faults, which is based on one modal current and four different simple indicators. Although physics-based methods provide accurate fault identification, they depend on detailed modeling and expert knowledge, which limits their adaptability to complex or variable operating conditions [11].
(2) Data-driven methods rely on historical operational data and apply machine learning or deep learning techniques to construct diagnostic models without requiring explicit system modeling [12]. These methods perform well in handling complex systems and extracting fault characteristics. For example, Zhang et al. [13] proposed a bearing fault identification method based on convolutional neural networks (CNNs), which significantly improves classification accuracy. Pule et al. [14] constructed a multi-fault identification model under complex operating conditions by combining support vector machines (SVMs) with principal component analysis (PCA). However, data-driven methods rely heavily on the quality and quantity of data. When samples are insufficient or noisy, the diagnostic performance tends to degrade [15].
To overcome the limitations of purely model-based or purely data-driven approaches, researchers have increasingly explored hybrid methods that integrate the strengths of both paradigms. For example, Xia et al. [16] introduced a digital twin-driven gearbox diagnosis framework, and Meléndez-Useros et al. [17] presented an active steering fault diagnosis method integrating LSTM-based sensor detection with robust actuator fault estimation. Nevertheless, hybrid methods often involve more complex architectures and higher computational costs, and their performance may be sensitive to model integration strategies, which poses challenges for large-scale or real-time applications.
(3) Knowledge-driven approaches rely on expert experience, domain knowledge, or a priori rules to build fault diagnosis systems [18]. This category of methods effectively incorporates human knowledge into the model’s reasoning process, thereby enhancing interpretability and reliability. For example, Chen et al. [19] proposed a modular fault tree approach that effectively reduces analysis complexity and enhances efficiency. Chi et al. [20] used a knowledge-based fault diagnosis approach in the industrial internet of things (IIoT) to enhance interoperability through ontologies and effectively describe system faults. However, knowledge-based methods rely on manually designed rules and often show limited adaptability to dynamic operating conditions [21].
(4) Semi-quantitative information methods combine qualitative knowledge with quantitative analysis and are suitable for complex systems that cannot be fully quantified [22]. Cheng et al. [23] investigated the relationship between valid data and expert knowledge and conducted a detailed analysis of the transmission state of a high-speed train. Yuan et al. [24] introduced a hybrid knowledge-based method, in which multiple expert knowledge systems are constructed and applied according to the type of available information. These methods integrate multi-source information, capture the uncertainty of fault features, and improve diagnostic accuracy [25].
Existing physics-based, data-driven, and knowledge-based methods therefore exhibit limitations in diagnostic accuracy, data dependency, and adaptability. In contrast, semi-quantitative information methods are more effective in handling uncertainty and integrating multi-source information, making them better suited to fault diagnosis in complex systems.
The belief rule base (BRB) is a representative semi-quantitative method that integrates expert knowledge with numerical information, making it well suited for fault diagnosis in complex and uncertain environments. However, most existing BRB methods rely heavily on labeled data, while the effective use of large volumes of unlabeled data remains limited. In real-world applications, the high cost and inefficiency of label acquisition further restrict model performance and reduce the reliability of diagnostic results.
To overcome the limitation of scarce labeled data, many approaches have introduced physical biases into the fault classification process. For example, simulation-driven machine learning methods combine simulated data with learning algorithms to improve classification accuracy [26], while zero-fault learning integrates physical modeling with data-driven techniques to enhance diagnostic performance [27]. Nevertheless, these methods typically require large amounts of high-quality simulation data, which may not always be available and can limit their applicability.
To address these challenges, this paper proposes a perturbation-based self-training method to enhance the BRB, termed PS-BRB. The proposed PS-BRB retains the interpretability and uncertainty-handling capability of the original BRB while introducing a self-training mechanism to exploit unlabeled data. Through perturbation and filtering strategies, the rules and parameters are iteratively optimized, thereby improving the model structure, representational capacity, diagnostic accuracy, and generalization under complex conditions.
Unlike existing physical bias-based methods, PS-BRB does not depend on simulation data. Instead, it leverages perturbation consistency and Jensen–Shannon (JS) divergence filtering to effectively utilize unlabeled data, maintaining high diagnostic accuracy even with limited labeled samples. This design not only avoids the dependency on simulation data but also demonstrates stronger robustness and adaptability in complex and uncertain environments, providing a new solution to the limitations of current BRB-based and physical bias-driven methods.
In summary, although existing semi-supervised pseudo-labeling methods have achieved promising results in general learning tasks, they do not fully account for the unique characteristics of BRB, particularly its rule-based structure and ability to represent uncertainty. Directly applying such methods to BRB often results in high sensitivity to noisy pseudo-labels and unstable rule updates. Therefore, the proposed PS-BRB is not a direct transplantation but a tailored redesign for the BRB framework. By incorporating input perturbation for consistency checking, JS divergence for distributional filtering, and a self-training mechanism with rule and parameter optimization, PS-BRB effectively mitigates pseudo-label error propagation and data dependency while preserving BRB’s interpretability and uncertainty-handling capability.
The main contributions of this paper are as follows:
(1) A high-quality pseudo-label filtering mechanism is proposed, in which Gaussian noise is applied to inputs corresponding to pseudo-labels generated by the initial BRB model. Label consistency is verified and JS divergence is measured with a threshold determined at the 90th percentile. This dual-constraint strategy improves the reliability of pseudo-labels.
(2) A perturbation-based self-training method is developed, where the filtering mechanism is integrated into the self-training framework. Perturbation and filtering strategies guide the BRB to update its rules and parameters with high-quality pseudo-labels, thereby enhancing its representational ability and improving fault diagnosis performance under complex conditions.
Overall, PS-BRB goes beyond simply combining existing semi-supervised learning techniques. Through tailored adaptation to BRB’s reasoning mechanism, it achieves more stable pseudo-label utilization and demonstrates greater robustness and adaptability in noisy and uncertain environments.
The remainder of this paper is organized as follows. In Section 2, relevant preliminaries are introduced and the problem description of the PS-BRB method is presented. In Section 3, a PS-BRB-based fault diagnosis approach for complex systems is proposed. Section 4 presents two case studies through which the effectiveness of the proposed method is validated. The limitations and future work are discussed in Section 5. Finally, conclusions are drawn in Section 6.
3. A PS-BRB-Based Fault Diagnosis Method for Complex Systems
This section proposes a perturbation-based self-training PS-BRB fault diagnosis method, which consists of three key modules. Section 3.1 introduces the overall structure and workflow of the PS-BRB model. Section 3.2 presents the theoretical foundations of the proposed method, providing the mathematical derivations and justifications that support its design. Section 3.3 describes the filtering mechanism for high-quality pseudo-labels. Section 3.4 focuses on model optimization, where high-quality pseudo-labels and optimization algorithms are used to update and enhance the BRB parameters in a data-driven manner. Finally, Section 3.5 analyzes the computational complexity and scalability of the PS-BRB framework.
3.1. Description of the PS-BRB
The PS-BRB is a hybrid fault diagnosis framework that integrates expert knowledge, labeled data, and unlabeled data under a self-training mechanism. As illustrated in Figure 1, the PS-BRB consists of three core components: pseudo-label generation, high-quality pseudo-label filtering, and BRB model enhancement.
Initially, a BRB model is constructed based on expert knowledge and limited labeled samples. This initial model is used to infer pseudo-labels for unlabeled data, generating belief distributions and inference results. To evaluate the robustness of these pseudo-labels, a perturbation strategy is applied by adding Gaussian noise to the inputs, and the perturbed samples are re-inferred via the same BRB model. The belief distributions before and after perturbation are then compared.
A class consistency assessment is performed to ensure that inferred classes remain stable under perturbation. If consistency is met, the JS divergence between the belief distributions is calculated. Samples with low divergence, below the 90th percentile threshold, are considered reliable and are selected as high-quality pseudo-labels.
These filtered high-quality pseudo-labels are used, along with an optimization algorithm, to update the rule weights, attribute weights, and belief degrees of the BRB. The enhanced BRB model is then used for evidential reasoning (ER) and final fault diagnosis. This iterative process allows the PS-BRB to leverage unlabeled data effectively, improving both accuracy and generalization capability in complex system diagnostics.
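The overall workflow can be summarized by the schematic loop below. It is a structural sketch only, written under the assumption of two hypothetical helpers: train_brb (which fits or optimizes a BRB on a labeled set and returns a model exposing an infer method) and filter_pseudo (the perturbation-plus-JS screening detailed in Section 3.3).

```python
import numpy as np

def ps_brb_self_training(X_lab, y_lab, X_unlab, train_brb, filter_pseudo, n_rounds=3):
    """Schematic PS-BRB loop: pseudo-label generation, dual filtering, BRB enhancement.

    train_brb     -- placeholder: fits/optimizes a BRB on (X, y), returns a model with .infer(X)
    filter_pseudo -- placeholder: perturbation + JS screening of Section 3.3
    """
    model = train_brb(X_lab, y_lab)                        # initial BRB from expert rules + labels
    for _ in range(n_rounds):
        X_sel, y_sel = filter_pseudo(X_unlab, model.infer)  # high-quality pseudo-labels
        X_aug = np.vstack([X_lab, X_sel])                   # labeled + pseudo-labeled training set
        y_aug = np.concatenate([y_lab, y_sel])
        model = train_brb(X_aug, y_aug)                     # rule/parameter update (P-CMA-ES)
    return model
```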
3.2. Theoretical Foundations of PS-BRB
The effectiveness of the PS-BRB method is established through three key aspects: consistency-based smoothness, distributional robustness, and a reduced self-training generalization error bound. Together, these properties indicate that perturbation-filtered pseudo-labels improve the BRB model’s stability and generalization performance.
Consistency regularization requires that model outputs remain smooth and consistent under small input perturbations. Within the BRB framework, Gaussian noise is applied to unlabeled samples, and inference is repeated so that the local smoothness of pseudo-labels can be assessed: only those pseudo-labels for which the hard label remains unchanged before and after perturbation are retained. This strategy is equivalent to imposing a perturbation consistency constraint during model training, which serves to eliminate predictions that are highly sensitive to slight input variations and thereby improves the reliability and accuracy of the pseudo-labels [32].
After consistency filtering, the JS divergence between the belief distributions obtained before and after perturbation is computed to quantify their difference. As a symmetric measure of similarity between two probability distributions, JS divergence characterizes the model’s robustness to uncertain inputs. The divergence is calculated for all samples with consistent class labels, and the 90th percentile of these values is used as a threshold. Only those pseudo-labels with divergence below this threshold are retained, ensuring that the selected samples not only have stable class assignments but also exhibit high consistency in their confidence distributions after perturbation, thereby further filtering out pseudo-labels with high uncertainty or unstable predictions [33].
The high-quality pseudo-labels are incorporated together with the labeled samples into the BRB parameter optimization, which is equivalent to adding a pseudo-label loss term to the original supervised loss. According to Vapnik–Chervonenkis (VC) dimension theory, introducing pseudo-samples with high accuracy reduces the model’s generalization error bound from $O\!\left(\sqrt{h/m}\right)$ to $O\!\left(\sqrt{h/(m + m_p)}\right)$, where $h$ is the model’s VC dimension, $m$ is the number of labeled samples, and $m_p$ is the number of high-quality pseudo-labels. This demonstrates that, provided the pseudo-labels are sufficiently accurate, the self-training mechanism can theoretically enhance the generalization capability and robustness of the BRB model [34].
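For reference, one classical form of the VC generalization bound states that, with probability at least $1 - \delta$,
$$R(f) \;\le\; \hat{R}_{m}(f) + \sqrt{\frac{h\left(\ln\frac{2m}{h} + 1\right) + \ln\frac{4}{\delta}}{m}},$$
where $R(f)$ is the expected risk and $\hat{R}_{m}(f)$ the empirical risk over $m$ samples. Under the working assumption that the filtered pseudo-labels behave approximately like correctly labeled samples, they simply enlarge the effective sample size from $m$ to $m + m_p$, which tightens the complexity term to the order $\sqrt{h/(m + m_p)}$ quoted above; this should be read as a sketch of the argument rather than a formal guarantee for BRB models.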
3.3. High-Quality Pseudo-Label Filtering Mechanism
To ensure the quality of the pseudo-labels introduced during the self-training process and reduce the negative impact of noisy labels on model performance, this paper proposes a high-quality pseudo-label filtering mechanism based on perturbation self-training, which can be divided into the following four steps:
Step 1: Pseudo-label generation.
A single inference is performed on the unlabeled sample set using the current BRB model (parameters $\Theta$). For each unlabeled sample $x_i$, its utility value $u_i$ and its belief distribution $\boldsymbol{\beta}_i$ are obtained. The utility value is then discretized over the predefined reference values to yield the class label $\hat{y}_i$.
Step 2: Consistency screening.
Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^{2})$ is added to each sample $x_i$, producing the perturbed sample $\tilde{x}_i = x_i + \epsilon$. A second inference is conducted to obtain the perturbed utility value $\tilde{u}_i$ and belief distribution $\tilde{\boldsymbol{\beta}}_i$. The perturbed utility is discretized to yield the label $\tilde{y}_i$. Samples for which $\hat{y}_i = \tilde{y}_i$ are retained, and labels that fail this condition are discarded.
Step 3: Distributional robustness assessment.
For all samples that passed consistency screening, the JS divergence between the original and perturbed belief distributions is computed:
$$D_{\mathrm{JS}}\left(\boldsymbol{\beta}_i \parallel \tilde{\boldsymbol{\beta}}_i\right) = \frac{1}{2} D_{\mathrm{KL}}\left(\boldsymbol{\beta}_i \parallel \mathbf{M}_i\right) + \frac{1}{2} D_{\mathrm{KL}}\left(\tilde{\boldsymbol{\beta}}_i \parallel \mathbf{M}_i\right), \qquad \mathbf{M}_i = \frac{1}{2}\left(\boldsymbol{\beta}_i + \tilde{\boldsymbol{\beta}}_i\right).$$
The 90th percentile of the divergence values is selected as the threshold $\tau$. Only those samples for which $D_{\mathrm{JS}}\left(\boldsymbol{\beta}_i \parallel \tilde{\boldsymbol{\beta}}_i\right) \le \tau$ are kept, so that belief distributions before and after perturbation remain highly consistent.
Step 4: High-quality pseudo-label set construction.
The samples that satisfy both the consistency and divergence criteria, along with their labels $\hat{y}_i$, form the high-quality pseudo-label set:
$$\mathcal{D}_{p} = \left\{ \left(x_i, \hat{y}_i\right) \;\middle|\; \hat{y}_i = \tilde{y}_i,\; D_{\mathrm{JS}}\left(\boldsymbol{\beta}_i \parallel \tilde{\boldsymbol{\beta}}_i\right) \le \tau \right\}.$$
This set is combined with labeled data for joint optimization of the BRB parameters, which improves the model’s robustness and generalizability.
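A minimal Python sketch of Steps 1–4 is given below. It assumes a hypothetical brb_infer function that maps a batch of inputs to belief distributions over the fault classes; for brevity the class label is taken as the arg-max of the belief distribution, whereas the full model discretizes the expected utility.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def filter_pseudo_labels(X_unlab, brb_infer, sigma=0.005, percentile=90, seed=0):
    """Dual-constraint pseudo-label filtering (class consistency + JS divergence).

    X_unlab   -- (n, d) unlabeled feature matrix
    brb_infer -- placeholder: maps an (n, d) array to (n, C) belief distributions
    sigma     -- std. dev. of the Gaussian input perturbation (scalar or per-feature array)
    """
    rng = np.random.default_rng(seed)

    # Step 1: pseudo-label generation with the current BRB model
    beliefs = brb_infer(X_unlab)                       # (n, C)
    labels = beliefs.argmax(axis=1)

    # Step 2: consistency screening under Gaussian perturbation
    X_pert = X_unlab + rng.normal(0.0, sigma, size=X_unlab.shape)
    beliefs_pert = brb_infer(X_pert)
    consistent = labels == beliefs_pert.argmax(axis=1)

    # Step 3: distributional robustness via JS divergence
    # scipy's jensenshannon returns the JS distance, i.e. sqrt(divergence), so square it
    js = np.array([jensenshannon(p, q) ** 2 for p, q in zip(beliefs, beliefs_pert)])
    tau = np.percentile(js[consistent], percentile)    # threshold from consistent samples only
    keep = consistent & (js <= tau)

    # Step 4: high-quality pseudo-label set
    return X_unlab[keep], labels[keep]
```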
3.4. Model Optimization
To enhance the diagnostic performance of the BRB model in complex systems, high-quality pseudo-labels are introduced into the training set under the framework of the projection covariance matrix adaptation evolution strategy (P-CMA-ES) [35]. These pseudo-labels are inferred by the initial BRB model and are further processed through Gaussian perturbation. Samples that satisfy category consistency and exhibit low JS divergence are selected to ensure label stability and reliability. Ultimately, both pseudo-labeled and labeled samples are utilized for model training.
As illustrated in Figure 2, the execution flow of the P-CMA-ES is presented, and the detailed optimization process is described as follows.
First, the objective function for constructing the PS-BRB is as follows:
$$\min_{\Theta} \; \xi(\Theta) \quad \text{s.t.} \quad 0 \le \theta_k \le 1,\; 0 \le \delta_i \le 1,\; 0 \le \beta_{n,k} \le 1,\; \sum_{n=1}^{N} \beta_{n,k} = 1,$$
where $\xi(\Theta)$ denotes the loss function for model fault diagnosis with the following expression:
$$\xi(\Theta) = \underbrace{\frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^{2}}_{\xi_{l}(\Theta)} + \underbrace{\frac{1}{T_{p}}\sum_{j=1}^{T_{p}}\left(\tilde{y}_j - \hat{y}_j\right)^{2}}_{\xi_{p}(\Theta)},$$
where $\xi_{l}(\Theta)$ denotes the prediction error for labeled samples, and $\xi_{p}(\Theta)$ denotes the prediction error for pseudo-labeled samples. $y_t$ represents the ground truth of the $t$-th labeled sample, $\tilde{y}_j$ denotes the soft pseudo-label of the $j$-th pseudo-labeled sample (inferred by the initial BRB model), and $\hat{y}_t$ (resp. $\hat{y}_j$) is the current output of the BRB model for the corresponding sample. $T$ and $T_{p}$ indicate the numbers of labeled and pseudo-labeled samples, respectively. Here $\Theta$ collects the rule weights $\theta_k$, attribute weights $\delta_i$, and belief degrees $\beta_{n,k}$ of the BRB.
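A minimal sketch of this combined loss is shown below, assuming mean-squared error over expected-utility outputs; brb_predict_utility is a placeholder for the ER inference of the BRB under the candidate parameter vector.

```python
import numpy as np

def ps_brb_loss(theta, X_lab, y_lab, X_pse, y_pse, brb_predict_utility):
    """Combined labeled + pseudo-labeled loss xi(Theta) minimized by P-CMA-ES.

    brb_predict_utility -- placeholder: runs ER inference with parameters theta and
                           returns the expected-utility output for each input sample
    """
    xi_l = np.mean((y_lab - brb_predict_utility(theta, X_lab)) ** 2)  # labeled prediction error
    xi_p = np.mean((y_pse - brb_predict_utility(theta, X_pse)) ** 2)  # pseudo-label prediction error
    return xi_l + xi_p
```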
Step 1: Initialization.
The parameter vector is initialized as $x^{0}$, the maximum number of iterations is set to $G$, the initial step size is $\sigma^{0}$, the covariance matrix is initialized as $C^{0} = I$, and the population size is $\lambda$. This initialization not only specifies the search starting point but also determines the scale and direction of the initial exploration through $\sigma^{0}$ and $C^{0}$.
Step 2: Sampling operation.
$$x_i^{t+1} \sim m^{t} + \sigma^{t}\, \mathcal{N}\!\left(0, C^{t}\right), \quad i = 1, 2, \ldots, \lambda,$$
where $x_i^{t+1}$ is the $i$-th solution in generation $t+1$, $m^{t}$ is the mean value of the population, $\sigma^{t}$ represents the step size, $\mathcal{N}(\cdot)$ denotes the normal distribution, and $C^{t}$ denotes the covariance matrix of the population in generation $t$. In this way, new solutions are generated around the current mean while incorporating random perturbations, balancing exploitation of the current search region with global exploration.
Step 3: Projection.
The projection operation is described as follows:
$$x_i^{t+1}\!\left(1 + n_e \times (j-1) : n_e \times j\right) = x_i^{t+1}\!\left(1 + n_e \times (j-1) : n_e \times j\right) - A_e^{T}\left(A_e A_e^{T}\right)^{-1} A_e\, x_i^{t+1}\!\left(1 + n_e \times (j-1) : n_e \times j\right),$$
where $n_e$ denotes the number of variables in each equality constraint of $x_i^{t+1}$; $j$ indexes the equality constraints in the solution $x_i^{t+1}$; and $A_e = [1, 1, \ldots, 1]_{1 \times n_e}$ represents the parameter vector used in the sampling operation. This operation ensures that candidate solutions remain valid and satisfy model constraints, thereby avoiding infeasible parameter combinations.
Step 4: Mean update.
The population mean is updated according to the weighted average of the top $\mu$ solutions:
$$m^{t+1} = \sum_{i=1}^{\mu} w_i\, x_{i:\lambda}^{t+1},$$
where $x_{i:\lambda}^{t+1}$ is the $i$-th best solution among the $\lambda$ solutions in generation $t+1$, and $w_i$ is the corresponding recombination weight. This step shifts the search center toward higher-quality solutions, which reflects the principle of survival of the fittest and gradually improves the search direction.
Step 5: Covariance matrix adaptation.
The covariance matrix is updated as shown below:
$$C^{t+1} = \left(1 - c_1 - c_\mu\right) C^{t} + c_1\, p_c^{t+1} \left(p_c^{t+1}\right)^{T} + c_\mu \sum_{i=1}^{\mu} w_i \left(\frac{x_{i:\lambda}^{t+1} - m^{t}}{\sigma^{t}}\right)\left(\frac{x_{i:\lambda}^{t+1} - m^{t}}{\sigma^{t}}\right)^{T},$$
where $c_1$ and $c_\mu$ are the learning rates of the adaptation mechanism, and $p_c^{t+1}$ is the evolutionary path of the covariance. The evolution path is updated as follows:
$$p_c^{t+1} = \left(1 - c_c\right) p_c^{t} + \sqrt{c_c\left(2 - c_c\right)\mu_{\mathrm{eff}}}\; \frac{m^{t+1} - m^{t}}{\sigma^{t}},$$
where $c_c$ is the accumulation rate and $\mu_{\mathrm{eff}}$ is the variance-effective selection mass. By accumulating successful steps, the covariance matrix learns the correlations between parameters and adapts its shape to the problem landscape, which is similar to approximating second-order information of the objective function.
Step 6: Step-size adaptation.
Update the step size as follows:
$$\sigma^{t+1} = \sigma^{t} \exp\!\left(\frac{c_\sigma}{d_\sigma}\left(\frac{\left\| p_\sigma^{t+1} \right\|}{E\left\| \mathcal{N}(0, I) \right\|} - 1\right)\right), \qquad p_\sigma^{t+1} = \left(1 - c_\sigma\right) p_\sigma^{t} + \sqrt{c_\sigma\left(2 - c_\sigma\right)\mu_{\mathrm{eff}}}\; \left(C^{t}\right)^{-\frac{1}{2}} \frac{m^{t+1} - m^{t}}{\sigma^{t}},$$
where $p_\sigma^{t+1}$ is the evolutionary path along the backward time axis, $c_\sigma$ denotes the accumulation rate of the evolutionary path, $d_\sigma$ denotes the decay coefficient, $E\left\| \mathcal{N}(0, I) \right\|$ represents the expected length of $\mathcal{N}(0, I)$, and $E(\cdot)$ denotes the mathematical expectation.
This mechanism adjusts the step size automatically: it enlarges when the search progresses consistently in one direction, enabling broader exploration, and shrinks when oscillations occur, enhancing local exploitation.
Step 7: Termination.
After G iterations, the optimization terminates and outputs the best parameter vector. This criterion ensures computational feasibility while providing sufficient iterations to converge to a robust solution.
The design of P-CMA-ES is supported by evolutionary computation and stochastic optimization theory. The mean update is regarded as an approximation of a gradient-descent direction, the covariance matrix adaptation is interpreted as learning second-order information of the objective function, and the step-size control based on path length is applied to maintain a balance between exploration and exploitation. In addition, the perturbation mechanism is introduced to preserve population diversity and improve robustness. Building on the established convergence properties of CMA-ES, the applicability of P-CMA-ES to constrained and complex optimization problems is theoretically ensured.
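A simplified, self-contained sketch of the optimization loop is given below. It keeps the sampling, projection, and weighted mean-update steps but replaces full covariance-matrix and path-based step-size adaptation with an isotropic covariance and a fixed decay, and it assumes, for brevity, that the decision vector contains only the belief degrees. It should therefore be read as an illustration of the projection idea rather than the exact P-CMA-ES implementation.

```python
import numpy as np

def project_beliefs(x, n_rules, n_classes):
    """Clip to [0, 1] and renormalize each rule's belief block so it sums to 1."""
    x = np.clip(x, 0.0, 1.0)
    for j in range(n_rules):
        block = x[j * n_classes:(j + 1) * n_classes]
        s = block.sum()
        if s > 0:
            x[j * n_classes:(j + 1) * n_classes] = block / s
    return x

def projected_es(loss_fn, x0, n_rules, n_classes,
                 sigma=0.3, pop_size=20, n_gen=100, seed=0):
    """Simplified projection-based evolution strategy for BRB belief degrees."""
    rng = np.random.default_rng(seed)
    mean = project_beliefs(np.asarray(x0, dtype=float).copy(), n_rules, n_classes)
    mu = pop_size // 2
    weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))   # log-decreasing recombination weights
    weights /= weights.sum()
    for _ in range(n_gen):
        # Step 2: sample candidates around the current mean (isotropic covariance)
        cand = mean + sigma * rng.standard_normal((pop_size, mean.size))
        # Step 3: project candidates back onto the feasible region
        cand = np.array([project_beliefs(c, n_rules, n_classes) for c in cand])
        # Step 4: weighted mean update from the best mu candidates
        fitness = np.array([loss_fn(c) for c in cand])
        best = cand[np.argsort(fitness)[:mu]]
        mean = project_beliefs(weights @ best, n_rules, n_classes)
        sigma *= 0.98   # crude stand-in for path-based step-size adaptation (Step 6)
    return mean
```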
3.5. Computational Complexity and Scalability Analysis
The proposed PS-BRB framework inevitably introduces additional computational overhead due to perturbation, dual screening, and iterative optimization. To systematically evaluate the practicality of the method, this subsection analyzes its computational cost, space complexity, and scalability for large-scale applications.
(1) Perturbation.
Perturbation generation requires creating augmented representations of the input space. For a dataset with $n$ samples and $d$ features, the cost is approximately $O(n \cdot d)$. The memory consumption grows linearly with both sample size and feature dimensionality.
(2) Dual screening.
Dual screening is used to filter out unreliable pseudo-label candidates. The dominant cost arises from ranking and comparison operations (e.g., sorting the divergence values to obtain the percentile threshold), which have a complexity of $O(n \log n)$. As the operation is executed on intermediate candidate sets, the associated memory requirement remains moderate.
(3) Iterative optimization.
Parameter learning in PS-BRB is realized via iterative optimization of the belief rules. Each iteration involves evaluating rule activation and updating parameters with a complexity of $O(n \cdot L)$, where $L$ denotes the number of belief rules. Over $k$ iterations, the total cost is $O(k \cdot n \cdot L)$. The space requirement is mainly determined by storing the rule base and intermediate parameters, which is also $O(n \cdot L)$.
(4) Overall computational burden.
The overall complexity of the framework is polynomial and dominated by the iterative optimization stage. While PS-BRB requires more resources than conventional BRB, the operations are highly parallelizable. Perturbation and dual screening can be executed in batch mode, and iterative optimization can be accelerated via GPU or distributed computing.
(5) Scalability and feasibility.
Despite the additional overhead, the framework remains scalable to medium- and large-scale datasets. The linear space complexity ensures that memory usage is manageable in practice. Moreover, the modular design of perturbation, screening, and optimization enables efficient implementation with parallel computing, making the approach applicable to real-world engineering systems. Future work will further explore lightweight strategies and distributed deployment to enhance large-scale applicability.
4. Case Study
In Section 4.1, the effectiveness of the proposed PS-BRB is fully validated through a case study on bearing fault diagnosis. In Section 4.2, the applicability of the model is further demonstrated through a second bearing fault diagnosis case. A generalization analysis is conducted in Section 4.3. A comprehensive analysis of the experimental results is provided in Section 4.4.
The bearing is regarded as a critical transmission component in complex mechanical systems, and its operating condition directly affects the overall performance and safety of the system [36]. Once a bearing fault occurs, energy transmission can be disrupted, equipment may be shut down, and broader systemic damage is likely to follow [37]. Therefore, timely and accurate fault diagnosis is essential for ensuring stable operation, extending the service life, and reducing the maintenance costs of mechanical equipment.
To verify the adaptability and effectiveness of the proposed method under different operating conditions and data sources, this section conducts an experimental analysis using the 30 Hz-2 V bearing dataset from Southeast University (SEU 30 Hz-2 V) and the 65 Hz bearing dataset from Huazhong University of Science and Technology (HUST 65 Hz). To avoid train–test leakage, all datasets were partitioned into training and testing subsets before feature extraction, normalization, and model optimization, ensuring that the testing data remained completely independent throughout the experiments.
4.1. Case 1: Bearing Fault Diagnosis Based on the PS-BRB (SEU 30 Hz-2 V)
In Section 4.1.1, the experimental background is introduced and the parameters are configured. In Section 4.1.2, the experimental results are analyzed.
4.1.1. Background Description and Experimental Parameter Settings
The bearing dataset provided by Southeast University is employed in this study for experimental validation [38]. This dataset is collected from a dynamic drive system (DDS) test bench under operating conditions of a 30 Hz rotational speed and 2 V load [39]. As shown in Table 1, the SEU bearing dataset comprises eight channels, covering motor vibration, gearbox vibration in three directions, and motor torque signals.
Channel 1 motor vibration signals are selected as the subject of analysis to validate the effectiveness of the proposed method. Five typical operating states of the bearing are covered: ball fault (Ball), inner ring fault (Inner), outer ring fault (Outer), combination fault on both the inner ring and outer ring (Combination), and healthy working state (Healthy). Time-domain features are extracted from the vibration signals corresponding to these states. A total of 1000 samples are collected for each state, each comprising 1024 data points.
To enhance the representativeness of the input features, the out-of-bag predictor importance (OOBPredictorImportance) method is employed to rank the extracted time-domain features [40]. As shown in Figure 3, the standard deviation (Std) and root mean square (RMS) achieve the highest importance scores, indicating their significant contributions to fault classification. Therefore, they are selected as key features and retained for subsequent model development.
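The feature construction described above can be reproduced with a short script such as the one below. It computes common time-domain statistics over 1024-point segments and ranks them with a random forest's impurity-based importance, used here only as a convenient stand-in for MATLAB's OOBPredictorImportance, so the exact importance values will differ.

```python
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.ensemble import RandomForestClassifier

def time_domain_features(segments):
    """segments: (n_samples, 1024) vibration windows -> (feature matrix, feature names)."""
    rms = np.sqrt(np.mean(segments ** 2, axis=1))
    feats = {
        "Std": segments.std(axis=1),
        "RMS": rms,
        "Mean": segments.mean(axis=1),
        "Peak": np.abs(segments).max(axis=1),
        "Kurtosis": kurtosis(segments, axis=1),
        "Skewness": skew(segments, axis=1),
        "CrestFactor": np.abs(segments).max(axis=1) / rms,
    }
    return np.column_stack(list(feats.values())), list(feats.keys())

# X_seg: (n, 1024) segments and y: fault labels (0-4) are assumed to be prepared beforehand.
# X, names = time_domain_features(X_seg)
# rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
# ranking = sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1])
```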
A total of 5000 samples covering five fault types are used with 1000 samples for each type. Among them, 1000 samples (200 per type) are randomly selected as the test set. From the remaining 4000 samples, 100 samples from each type (500 in total) are selected as labeled data, while the remaining 3500 samples are used as unlabeled data.
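The stratified split used here (200 test and 100 labeled samples per class, the remainder unlabeled) can be reproduced along the following lines; the random seed is arbitrary.

```python
import numpy as np

def stratified_split(y, n_test=200, n_labeled=100, seed=0):
    """Per class: n_test samples for testing, n_labeled labeled, the rest unlabeled."""
    rng = np.random.default_rng(seed)
    test_idx, labeled_idx, unlabeled_idx = [], [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        test_idx.extend(idx[:n_test])
        labeled_idx.extend(idx[n_test:n_test + n_labeled])
        unlabeled_idx.extend(idx[n_test + n_labeled:])
    return np.array(test_idx), np.array(labeled_idx), np.array(unlabeled_idx)
```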
Std is divided into four semantic values: small (VS), medium (VM), large (VL), and extreme (VZ). RMS is assigned four semantic values: low (TL), medium (TM), high (TH), and extreme (TE). The reference values of these attributes are provided in Table 2 and Table 3. The corresponding result reference values are listed in Table 4. The initial belief distributions are presented in Table A1.
4.1.2. Experimental Results Analysis
The reference values of Std updated through PS-BRB are VS, VM, A1, VL, and VZ, as shown in Table 5. The updated reference values of RMS are TL, TM, TH, and TE, as shown in Table 6. The updated belief distribution table is provided in Table A2.
a. Comparative analysis of rule weights before and after the update on the SEU 30 Hz-2 V dataset.
Figure 4 presents a comparison of the rule weights before enhancement (original BRB) and after enhancement via the PS-BRB. The original model contains 16 rules, whereas the enhanced model retains these original rules and adds four new rules (R17–R20). It can be observed from the figure that PS-BRB induces significant adjustments in the rule weight distribution: the weights of some original rules (e.g., R2, R8) decrease markedly, whereas the weights of others (e.g., R10, R11, R16) increase substantially. This change indicates that, driven by the expansion of the training set with high-quality pseudo-labels, the evaluation of rule importance is restructured. The contribution of certain originally high-weight rules is partially shared by the newly added rules, whereas some rules with originally lower weights receive increased importance owing to their outstanding performance on the new data.
b. Comparison of PS-BRB with other BRBs on the SEU 30 Hz-2 V dataset.
Figure 5 shows the fault type predictions on the test set for three methods: PS-BRB (after perturbation self-training enhancement), BRB1 (baseline using only limited labeled data), and BRB2 (the “upper-bound” model trained with the full experimental labels). The blue line represents the true fault types, and the vertical axis denotes the fault category indices.
As shown in Figure 5, PS-BRB achieves the closest fit to the true trajectory. Within each stable interval, its predictions nearly coincide with the step plateaus; at transition points, overshoot and oscillation are significantly smaller than those of BRB1 and comparable to BRB2. In the black dashed regions, BRB1 exhibits numerous high-magnitude spikes and drops, reflecting unstable rule matching with limited labeled samples, whereas PS-BRB shows the lowest fluctuation amplitude and frequency, indicating that the expanded rule set with high-quality pseudo-labels better covers the feature distribution. In the pink dashed regions, BRB1 generates frequent random spikes and BRB2 occasionally produces outliers, while PS-BRB demonstrates the smallest overshoot and fastest convergence, highlighting stronger robustness to distributional shifts.
From the quantitative results in Table 7, PS-BRB achieves an accuracy of 98.4%, representing an improvement of 11.6 percentage points over BRB1 (86.8%) and only 0.3 percentage points lower than BRB2 (98.7%). For MSE, PS-BRB (0.1235) reduces the error by 25.7% compared with BRB1 (0.1661), though it remains higher than BRB2 (0.0706) due to a few extreme errors. In terms of F1 and Recall, PS-BRB is also markedly superior to BRB1 and very close to BRB2.
Overall, PS-BRB substantially improves diagnostic performance and stability under limited labels, significantly outperforming BRB1 and approaching the performance of BRB2. Although the additional pseudo-label filtering and iterative optimization lead to increased computational cost, the diagnostic gains justify the trade-off, as PS-BRB achieves accuracy comparable to the fully supervised model.
Figure 6 presents the five-class confusion matrix of PS-BRB on the test set. The results show a unidirectional sparse distribution without obvious symmetry, indicating that the rules learned by PS-BRB provide good discriminability and stability across class decision boundaries. Combined with the overall metrics in Table 7, PS-BRB achieves class-level consistency and robustness comparable to those of the full-label training model (BRB2) without relying on the complete set of ground-truth labels.
c. Ablation study: role of class consistency and JS threshold in pseudo-label filtering.
Table 8 reports the ablation results of four pseudo-label filtering strategies. The combined class consistency + JS threshold strategy achieves the best overall performance with the highest accuracy (0.9934), Macro-F1 (0.9912), Weighted-F1 (0.9934), and Macro-Recall (0.9887). In comparison, using class consistency only yields moderate improvement (accuracy 0.9452), while relying on JS divergence only leads to poorer results (accuracy 0.8796) and lower recall. The no-filtering strategy performs better than JS divergence only but remains inferior to either class consistency alone or the combined method. These results demonstrate that dual constraints from both class consistency and JS divergence provide complementary benefits, effectively suppressing noisy pseudo-labels and producing the most reliable performance.
Differences are further revealed by the confusion matrices in Figure 7 compared with the ground-truth labels. Obvious cross-class misclassifications across multiple categories are exhibited by the no-filtering strategy (Figure 7a). The accuracy of classes 1 and 3 is improved by the class-consistency strategy (Figure 7b), though a high misclassification rate persists for class 4. The recall for classes 0 and 4 is reduced by JS divergence alone (Figure 7c), indicating that numerous valid samples matching the ground-truth labels are removed while noise is suppressed. In contrast, a near-diagonal confusion matrix is produced by the combined class-consistency and JS-divergence threshold strategy (Figure 7d), with precision and recall maintained above 97% for nearly all classes.
Overall, the dual-filtering strategy ensures class correctness while improving prediction stability, thereby providing PS-BRB with higher-quality pseudo-labels and laying a solid foundation for subsequent training.
d. Sensitivity analysis of pseudo-label filtering hyperparameters.
To validate the rationality of the pseudo-label filtering hyperparameter settings, the influence of the JS divergence threshold and Gaussian perturbation magnitude on model performance is further investigated.
With the perturbation magnitude fixed, the JS divergence threshold is varied. Under identical perturbation conditions, the 70th, 80th, and 90th percentiles are selected as thresholds, and the quality of the filtered pseudo-labels is evaluated. As shown in Table 9, increasing the threshold leads to a larger number of pseudo-labels, and the performance in terms of Accuracy, Macro-F1, and Macro-Recall gradually improves, with the best results achieved at the 90th percentile threshold.
With the JS divergence threshold fixed at the 90th percentile, different combinations of Gaussian perturbation magnitudes are further evaluated. As shown in Table 10, excessively small or large perturbations result in performance degradation, whereas the best performance is achieved with the perturbation magnitude pair (0.005, 0.01), indicating that moderate perturbation magnitudes contribute to improving the stability of pseudo-label filtering. In this work, the pair (0.005, 0.01) is therefore adopted as the perturbation strength configuration.
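This sensitivity study can be organized as a simple grid over the percentile threshold and the per-feature noise levels, reusing the filtering routine sketched in Section 3.3. In the snippet below, X_unlab, brb_infer, and evaluate_pseudo_labels (which scores retained pseudo-labels against held-out ground truth) are placeholders, and all noise pairs other than (0.005, 0.01) are purely illustrative.

```python
import itertools
import numpy as np

percentiles = [70, 80, 90]
noise_pairs = [(0.001, 0.005), (0.005, 0.01), (0.01, 0.05)]   # illustrative values only

results = {}
for p, (s1, s2) in itertools.product(percentiles, noise_pairs):
    # per-feature Gaussian noise levels are passed as an array to the filter
    X_sel, y_sel = filter_pseudo_labels(X_unlab, brb_infer,
                                        sigma=np.array([s1, s2]), percentile=p)
    results[(p, s1, s2)] = evaluate_pseudo_labels(X_sel, y_sel)
```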
In summary, the sensitivity experiments verify that the combination of the 90% threshold with moderate perturbations yields the best performance, indicating that this strategy balances pseudo-label reliability and quantity, thereby enhancing the overall diagnostic capability of the model.
4.2. Case 2: Fault Diagnosis of a Bearing Based on the PS-BRB (HUST 65 Hz)
In Section 4.2.1, the experimental background is introduced and the parameters are configured. In Section 4.2.2, the experimental results are analyzed.
4.2.1. Background Description and Experimental Parameter Settings
In this experiment, the bearing dataset collected by the health perception laboratory of Huazhong University of Science and Technology is employed to validate the effectiveness of the proposed method [41]. The data are acquired using a fault diagnosis test bench for power transmission systems [42]. During the data acquisition process, bearing vibration signals along the X, Y, and Z axes are recorded by a triaxial accelerometer. The sampling frequency is set at 25.6 kHz, and each acquisition is conducted over a duration of 10.2 s.
In this work, data under the operating condition of a 65 Hz rotational speed are selected for analysis. The raw data file contains five channels: time step (Time), rotational speed (Speed), and vibration acceleration in the X, Y, and Z directions. To ensure the consistency of the input data and the controllability of the experiment, the vibration signal in the X-axis direction is selected for analysis. A sliding window approach is applied for time-domain feature extraction with a window length of 2048 and a step size of 256. Subsequently, based on the results of feature importance ranking, the two most representative time-domain features, namely Kurtosis and RMS, are selected as inputs to the model, as shown in Figure 8.
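The sliding-window segmentation and the two retained features can be computed as follows (window length 2048, step 256); accel_x is assumed to be the raw 1-D X-axis acceleration array.

```python
import numpy as np
from scipy.stats import kurtosis

def sliding_window_features(signal, win=2048, step=256):
    """Segment a 1-D vibration signal and return per-window Kurtosis and RMS."""
    n_windows = (len(signal) - win) // step + 1
    idx = np.arange(win)[None, :] + step * np.arange(n_windows)[:, None]
    windows = signal[idx]                                  # (n_windows, 2048)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    kurt = kurtosis(windows, axis=1)
    return np.column_stack([kurt, rms])

# features = sliding_window_features(accel_x)   # accel_x: raw X-axis acceleration signal
```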
The experiment covers five typical operating conditions: ball fault (Ball), inner race fault (Inner), outer race fault (Outer), compound fault (Comb, referring to the simultaneous presence of multiple fault types), and healthy condition (Health). Vibration data under each condition are used to construct training and testing samples. Fault diagnosis modeling is then performed to systematically evaluate the performance and adaptability of the proposed PS-BRB method in bearing fault diagnosis tasks.
A total of 5086 sample data points are obtained in this experiment, covering five diagnostic conditions with an equal number of samples for each class. The dataset is divided according to a 3:7 ratio with 30% (a total of 1525 samples) used as the testing set and the remaining 70% (a total of 3561 samples) used for model training. Within the training set, 712 samples are further selected as labeled data, while the remaining 2849 samples are treated as unlabeled data to simulate a diagnostic scenario with limited labeled information, which commonly occurs in real-world conditions.
In this work, five reference values are selected for Kurtosis and five reference values are selected for RMS, and the corresponding diagnostic result reference values are defined accordingly. The specific attribute reference values are provided in Table 11 and Table 12, with the corresponding result reference values listed in Table 13. The initial belief distributions are given in Table A3.
4.2.2. Experimental Results Analysis
The attribute reference values updated through PS-BRB are shown in Table 14 and Table 15, and the updated belief distribution is provided in Table A4.
a. Comparative analysis of rule weights before and after the update on the HUST-65 Hz dataset.
Figure 9a,b present the distributions of rule weights at two stages: before and after the update. The left plot illustrates the original rule weight distribution of the 25 rules prior to enhancement, whereas the right plot shows the updated rule weight distribution across all 36 rules with newly added rules (R2, R7–R12, R14, R20, R26, and R32) highlighted in red.
Overall, the rule system is significantly restructured during the self-enhancement process of the PS-BRB. Several rules with initially high weights (e.g., R6, R15, and R17) are substantially down-weighted, whereas some rules with relatively low initial weights (e.g., R9, R11, and R14) are assigned considerably higher weights after the update. This indicates that the model re-evaluates the contribution of each rule under the influence of high-quality pseudo-labels. Among the newly introduced rules, several (e.g., R9, R11, R12, R14, R20, and R26) receive high weights, suggesting strong discriminative capabilities and confirming the effectiveness of the perturbation-based self-training mechanism in rule expansion and optimization.
In addition, the updated rule weight distribution becomes more balanced, reducing over-reliance on a few dominant rules, which helps improve the model’s robustness and generalizability. The radar plots further provide a clear visual representation of the differences in rule weights and structural changes, offering valuable support for evaluating and analyzing the rule system.
b. Comparison of PS-BRB with other BRBs on the HUST-65 Hz dataset.
Figure 10 shows the comparison of fault diagnosis type prediction results on the test set for three methods: the baseline model (BRB1) trained with limited labeled data, the “upper-bound” model (BRB2) trained with fully labeled data, and the proposed PS-BRB method, which integrates perturbation-based self-training and high-quality pseudo-label filtering. The blue line represents the true fault types.
Figure 10 illustrates that the prediction trajectory of PS-BRB on the test set closely aligns with the true fault types, with the predictions in stable intervals nearly coinciding with the true values. The overshoot and oscillation amplitudes near the transition points are significantly lower than those of BRB1 and approach those of BRB2, which is trained with fully labeled data. Notably, in the segment marked by the purple dashed circle, BRB1 exhibits numerous high-amplitude spikes and drops, indicating unstable rule matching under limited labeled conditions. Although BRB2 generally maintains stability, occasional large deviations are observed. In contrast, PS-BRB displays the lowest fluctuation amplitude and frequency with the fastest convergence speed after transitions, reflecting superior robustness to distribution drift.
The quantitative results in Table 16 further corroborate these findings. The accuracy of PS-BRB reaches 93.44%, representing an improvement of 14.358 percentage points (approximately 18.15% relative improvement) over BRB1 (79.082%), which is trained with limited labeled data, and it is only 0.52 percentage points lower than BRB2 (93.96%), which is trained with fully labeled data, nearly matching its performance. In terms of MSE, PS-BRB achieves a value of 0.2507, which is a reduction of approximately 37.04% compared with BRB1 (0.3982). Although slightly greater than BRB2 (0.2333), the MSE difference is inferred to be driven primarily by a small number of outlier samples in the high-category range in the later segments, as observed in Figure 10.
In summary, PS-BRB achieves an accuracy nearly comparable to that of fully labeled training without requiring complete true labels while significantly reducing prediction fluctuations and the mean squared error. This demonstrates that the proposed high-quality pseudo-label filtering and perturbation-based self-training mechanisms effectively enhance rule coverage and model robustness, providing an efficient self-augmentation pathway for BRB-based fault diagnosis.
c. Ablation study: role of class consistency and JS threshold in pseudo-label filtering.
Table 17 presents the accuracy results of the four pseudo-label filtering strategies compared with the ground-truth labeled data. The unfiltered strategy yields the lowest accuracy (0.8265), as noisy samples are retained without constraints, leading to poor label quality.
The class-consistency-only strategy achieves a higher accuracy (0.9289) by effectively eliminating some misclassified samples. In contrast, the JS-divergence-only strategy, with an accuracy of 0.8876, retains many erroneous samples due to the absence of class-consistency constraints.
The combined class-consistency and JS-divergence threshold strategy attains the highest accuracy (0.9388). Dual filtering effectively removes samples with significant confidence distribution discrepancies while ensuring class consistency, thus minimizing noise and maximizing valid pseudo-label retention.
These results confirm that the dual filtering strategy significantly enhances pseudo-label quality, providing a robust foundation for optimizing PS-BRB performance.
4.3. Generalization Analysis
To validate the generalization capability of the proposed model, experiments were conducted on three datasets from different sources, namely a diesel engine dataset, the Southeast University gearbox dataset, and a spacecraft flywheel system dataset.
Table 18 presents the performance of the enhanced BRB method on fault diagnosis tasks, while Table 19 demonstrates the improvement in pseudo-label quality achieved by the proposed “dual selection filtering” under ablation study conditions.
From the results presented in Table 18 and Table 19, PS-BRB was demonstrated to exhibit strong generalization capability across multiple datasets. Compared with the traditional BRB method, PS-BRB was able to effectively exploit unlabeled samples under limited annotation conditions, and its performance was further enhanced through high-quality pseudo-label filtering combined with a self-training mechanism. In terms of key metrics such as accuracy, recall, and F1-score, the overall performance of PS-BRB was shown to be close to, and in some cases nearly matching, that of the fully supervised model. These findings indicate that the proposed method can overcome the limitation of label scarcity while maintaining considerable adaptability across different datasets.
Moreover, the ablation study highlighted the critical role of the dual-constraint filtering mechanism in improving the quality of pseudo-labels, thereby ensuring both the stability and robustness of the model. Taken together, the results suggest that the proposed PS-BRB method possesses substantial potential for application in complex system fault diagnosis tasks and holds considerable value for broader practical deployment in real-world engineering scenarios.
4.4. Summary of Experiments
The perturbation self-training-based BRB enhancement method proposed in this paper is fully validated via fault diagnosis experiments on two bearing datasets. The experimental results show that the PS-BRB method has significant advantages in terms of generalizability, accuracy, robustness, interpretability, and anti-interference capability.
(1) Generalization.
PS-BRB is able to effectively utilize unlabeled data through a self-training mechanism with limited labels, thus significantly improving the generalization ability of the model. The experimental results show that the accuracy of PS-BRB is close to that of a fully labeled training model when only a small number of labels are used, demonstrating a strong adaptive ability.
(2) Accuracy.
In experiments conducted on the SEU 30 Hz-2 V and HUST 65 Hz datasets, PS-BRB demonstrated a significant improvement in accuracy. The accuracy rates reached 98.4% and 93.44%, respectively, representing improvements of 11.6 and 14.36 percentage points over the corresponding baseline models. This validates the effectiveness of the PS-BRB in enhancing fault diagnosis precision.
(3) Robustness and anti-interference capability.
Through perturbation-based self-training, the PS-BRB enhances the model’s robustness to data perturbations. After perturbation, the PS-BRB effectively reduces the impact of interference on the prediction outcomes, ensuring the model’s stability and accuracy in unstable environments.
(4) Interpretability.
The interpretability inherent in the BRB model is preserved by PS-BRB, while the optimization of rules and high-quality pseudo-label filtering further enhance the transparency and traceability of the reasoning process. Consequently, the decision-making process of the model is rendered clearer and more comprehensible.
In summary, PS-BRB has significant advantages in complex fault diagnosis tasks with limited labels. By introducing a high-quality pseudo-label filtering mechanism and a perturbation-based self-training strategy, PS-BRB effectively utilizes unlabeled data to improve model performance under limited label conditions while enhancing the model’s robustness to data perturbations and noise. This makes PS-BRB highly adaptable and reliable in practical applications—particularly in fault diagnosis scenarios with insufficient labels or high levels of noise.