Fault Diagnosis for Complex Equipment Based on Belief Rule Base with Adaptive Nonlinear Membership Function

Fault diagnosis of complex equipment has become a hot field in recent years. Owing to its excellent capability for processing uncertainty and for modeling small-sample problems, the belief rule base (BRB) has been widely used in fault diagnosis. However, previous BRB models have rarely considered the diverse distributions of observation data, which may reduce diagnostic accuracy. In this paper, a new fault diagnosis model based on BRB is proposed. Considering that the previous triangular membership function cannot address the diverse distribution of observation data, a new nonlinear membership function is proposed to transform the input information. Then, since the model parameters initially determined by experts are inaccurate, a new parameter optimization model that includes the parameters of the nonlinear membership function is proposed and driven by the gradient descent method to prevent the expert knowledge from being destroyed. A fault diagnosis case of a laser gyro is used to verify the validity of the proposed model. In the case study, the diagnosis accuracy of the new BRB-based fault diagnosis model reached 95.56%, which shows better fault diagnosis performance than other methods.


Introduction
With the rapid development of industrial technology, the fault diagnosis of complex equipment has received extensive attention and become a hot topic in Prognostics and Health Management (PHM) [1][2][3]. At present, because the equipment used in industrial production and national defense is increasingly complex and intelligent, it is difficult to diagnose its faults from its appearance. Therefore, establishing a fault diagnosis model is an effective method.
Current popular fault diagnosis models can be divided into three categories: mechanism-based models, knowledge-based models and data-driven models [4].
(1) Mechanism-based model: such models require that the mechanism of complex equipment be clearly understood so that corresponding mathematical or physical models can be established, such as the Kalman filter [5] and Kirchhoff law [6]. However, with the increasing complexity of equipment, its mechanism is difficult to grasp and such models are rarely used.
(2) Knowledge-based model: this kind of model, such as fault tree analysis (FTA) [7] and the analytic hierarchy process (AHP) [8], is established from the qualitative knowledge of domain experts, but it is generally difficult to achieve high modeling accuracy. Since such models cannot learn by themselves, they cannot update their knowledge while the equipment is running and can only be modified by experts.
(3) Data-driven model: with the advent of the big data era, data-driven fault diagnosis models have attracted a lot of attention. This kind of model does not require mastery of the equipment mechanism and can model the input-output relationship from observation data. Such models, for example support vector machines [9], decision trees [10] and deep learning algorithms [11], also have good learning abilities and can be constantly updated with data. However, in many fault diagnosis fields, the high-value fault [...] random initialization of the population. This will cause BRB to lose its advantage in fault diagnosis. In contrast, the gradient descent method searches from the parameters initially determined by experts, which allows retention of expert knowledge to the greatest extent. Therefore, a new optimization model based on the gradient descent method, which can further improve the modeling accuracy while maintaining expert knowledge, is developed. The contributions of this paper are the following:
(1) A new BRB considering a nonlinear membership function, which can adaptively deal with the non-uniform distribution of observation data in fault diagnosis, is proposed.
(2) For the new BRB model, a new optimization model based on the gradient descent method is proposed to improve the accuracy of fault diagnosis and keep expert knowledge from being destroyed.
The structure of this paper is as follows: In Section 2, the problem and basic knowledge are introduced. In Section 3, a new BRB-based fault diagnosis model with an adaptive membership function is proposed. In Section 4, a fault diagnosis case of the laser gyro is used to verify the validity of the proposed model. This paper is summarized in Section 5.

Problem Description
Faced with the characteristics of few high-value fault samples and existing expert knowledge in the fault diagnosis of complex equipment, this paper mainly solves the following three problems:
Problem 1: How to use a small amount of test data and existing expert knowledge to establish the fault diagnosis model. Due to the working characteristics of most complex equipment, the amount of available observation data is very limited. Therefore, a data-driven fault diagnosis model is prone to overfitting, since the limited data cannot reflect all fault modes. On the other hand, in the long-term operation of the equipment, experts in the field have accumulated abundant knowledge that can be used to help judge the fault mode of the complex equipment. Therefore, the following mapping relationship needs to be established:
$$Y = R(x_1, x_2, \ldots, x_M, EK) \qquad (1)$$
where $[x_1, x_2, \ldots, x_M]$ represent $M$ fault indicators, $Y$ is the fault mode to be diagnosed, $R(\cdot)$ is the mapping function of the fault diagnosis model and $EK$ is the expert knowledge.
Problem 2: For Problem 1, a belief rule-based fault diagnosis model is introduced in Section 2.2. In engineering, observation data usually do not obey the uniform distribution. However, the triangular membership function in the previous BRB-based fault diagnosis model cannot accurately convert non-uniformly distributed data, which leads to poor fault diagnosis accuracy. Therefore, a new fuzzy membership function is needed to reflect the influence of uneven data on information transformation.
Problem 3: As BRB is an expert system, the parameters in its membership function are generally determined intuitively by experts according to domain knowledge and the distribution characteristics of observation data. However, due to the subjectivity and fuzziness of expert knowledge, these parameters cannot be given accurately, which decreases the modeling accuracy. Therefore, it is necessary to further improve the accuracy of fault diagnosis through parameter optimization.

Belief Rule-Based Fault Diagnosis Model
As a generalized ML-type fuzzy system, BRB is composed of a series of "if-then" belief rules, as shown below:
$$R_k: \text{IF } x_1 \text{ is } H_1^k \wedge x_2 \text{ is } H_2^k \wedge \cdots \wedge x_{T_k} \text{ is } H_{T_k}^k, \text{ THEN } \{(D_1, \beta_{1,k}), (D_2, \beta_{2,k}), \ldots, (D_N, \beta_{N,k})\},$$
$$\text{with a rule weight } \theta_k\ (k = 1, 2, \ldots, L) \text{ and attribute weight } \delta_i\ (i = 1, 2, \ldots, T_k) \qquad (2)$$
where $[x_1, x_2, \ldots, x_M]$ is the input vector, which consists of $M$ components. In fault diagnosis, $x_i$ is the $i$th fault indicator. $H_i^k$ $(i = 1, 2, \ldots, T_k)$ is the referential value of the $i$th attribute in the $k$th rule. It is generally determined by experts according to industry standards or observation data. $L$ is the number of rules. $D_n$ $(n = 1, 2, \ldots, N)$ represents the $N$ possible faults. $\beta_{n,k}$ $(n = 1, 2, \ldots, N)$ represents the belief degree of the $n$th fault $D_n$, reflecting the support of the $k$th rule for this consequence. $\theta_k$ is the rule weight of the $k$th rule, which represents the relative importance of each rule. $\delta_i$ is the weight of the $i$th attribute, which reflects the relative importance of the fault indicators.
Benefiting from the knowledge representation based on the belief rule, BRB has the following advantages in fault diagnosis:
(1) Expert knowledge. Due to the natural advantages of language models, BRB uses rules to represent the nonlinear mapping relationship between fault indicators and fault modes, enabling experts and users to easily understand the behavior of the model. Therefore, compared with neural networks, support vector machines and decision tree models, the initial parameters of BRB can be determined by experts based on domain knowledge and existing observation data, and the initial model is able to roughly reflect the mapping relationship of the system.
(2) Small sample modeling ability. For much equipment, fault data are of high value but few in number. Fortunately, due to the embedding of expert knowledge, BRB can model the system comprehensively with very limited observation data, even if the initial mapping relationship is rough. This means that BRB will not fall into the overfitting problem as data-driven models do. On the other hand, Chen et al. proved that BRB is a general approximation model [29], so this model has ideal modeling accuracy.
(3) Ability to process uncertain information. Compared with general fuzzy systems, BRB extends the "then" part of the rule to the belief distribution of all possible results, enabling BRB to deal with the probability uncertainty while solving the fuzzy uncertainty. Therefore, BRB can also show good performance when partial observation data are missing in the engineering application.

The Proposed Method
In this section, a BRB model with an adaptive nonlinear membership function is proposed to model the fault diagnosis considering non-uniform distribution observation data. In Section 3.1, the shortcomings of the existing triangular membership function and Gaussian membership function are analyzed in detail. Then, a nonlinear membership function is proposed to solve Problem 2. In Section 3.2, based on the gradient descent method, an optimization model considering the parameters of the nonlinear membership function is proposed to solve Problem 3.

Inference Process Based on the Nonlinear Membership Function
In fault diagnosis, all belief rules constitute a knowledge base. After the input information of the fault indicators is obtained, the fault mode can be diagnosed based on this knowledge base. It is worth noting that, as a fuzzy system, BRB expresses its belief rules as mappings to linguistic values. However, the observation data of fault indicators are mostly quantitative information. Therefore, it is necessary to convert quantitative observation data into membership degrees of all linguistic referential grades through a membership function, the so-called "fuzzification", as follows:
$$S(x_i) = \{(H_{ij}, \alpha_{ij}),\ j = 1, 2, \ldots, J_i\} \qquad (3)$$
where $H_{ij}$ is the $j$th referential grade of the $i$th fault indicator, $\alpha_{ij}$ is the corresponding membership degree and $x_i$ are the quantitative observation data. In previous BRB models, the most commonly applied membership function is the triangular membership function, which is used in rule (or utility) based transformation methods, as shown below:
$$\alpha_{ij} = \frac{A_{i(j+1)} - x_i}{A_{i(j+1)} - A_{ij}},\quad \alpha_{i(j+1)} = 1 - \alpha_{ij},\quad A_{ij} \le x_i \le A_{i(j+1)};\qquad \alpha_{ik} = 0,\quad k \ne j, j+1 \qquad (4)$$
where $A_{ij}$ is the quantitative value corresponding to $H_{ij}$, which is usually determined by experts. The curve of the above membership function is shown in Figure 1. It can be seen from Figure 1 and Equation (4) that, since the derivative of this function is constant, the membership of each referential grade changes along a straight line. However, when the observation data are uneven, such as when the data are concentrated in a certain area, this membership function cannot accurately reflect the corresponding membership, as shown in Figure 2. It can be seen from Figure 2a that the observation data are distributed evenly between the two referential grades. Therefore, it is easy to understand that the change in membership degree is linear in this case. In Figure 2b, however, the observation data are concentrated near the referential grade $H_{n+1}$. Therefore, for the two points marked by the red dotted line, the membership degree distributions should be different. For the yellow point, the membership degrees of $H_n$ and $H_{n+1}$ are both assigned as 0.5.
For the green point in Figure 2b, since this point is closer to $H_n$ in the whole dataset, the membership degree of $H_n$ shall be greater than 0.5 and the membership degree of $H_{n+1}$ shall be less than 0.5, correspondingly. For example, the two referential grades in Figure 2 correspond to the semantic values "low" and "high", respectively. For the point marked in red in Figure 2b, it should be a lower value in the entire dataset. Therefore, the membership degree of the referential grade "low" should be higher. Inaccurate fuzzification of quantitative data will reduce the modeling accuracy of the fault diagnosis model. Thus, a nonlinear membership function is proposed for the fuzzification of uneven quantitative data in this paper, which can be described as follows:
$$\alpha_{ij} = \left(\frac{A_{i(j+1)} - x_i}{A_{i(j+1)} - A_{ij}}\right)^{s},\quad \alpha_{i(j+1)} = 1 - \alpha_{ij},\quad A_{ij} \le x_i \le A_{i(j+1)};\qquad \alpha_{ik} = 0,\quad k \ne j, j+1 \qquad (5)$$
where $A_{ij}$ is the quantitative value of the referential grade $H_{ij}$ and $s \in (0, +\infty)$ is the parameter of the function, which can reflect the distribution of observation data.
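The behaviour of the shape parameter can be checked numerically. The Python sketch below is illustrative only: it assumes that the membership of the lower grade is the normalized distance to the upper grade raised to the power s, with the complement assigned to the upper grade, which is consistent with the function reducing to the triangular form at s = 1. The function name `nonlinear_membership`, the clamping of out-of-range inputs and the example referential values are assumptions made for the illustration, not details taken from the paper.

```python
def nonlinear_membership(x, ref_values, s):
    """Membership degrees of x over ordered referential values.

    ref_values: increasing quantitative values of the referential grades.
    s: one positive shape parameter per interval between adjacent grades
       (assumed layout; s[j] shapes the interval j .. j+1).
    """
    n = len(ref_values)
    alpha = [0.0] * n
    # Assumption: inputs outside the referential range fully belong
    # to the nearest end grade.
    if x <= ref_values[0]:
        alpha[0] = 1.0
        return alpha
    if x >= ref_values[-1]:
        alpha[-1] = 1.0
        return alpha
    for j in range(n - 1):
        lo, hi = ref_values[j], ref_values[j + 1]
        if lo <= x <= hi:
            ratio = (hi - x) / (hi - lo)   # triangular membership of grade j
            alpha[j] = ratio ** s[j]       # nonlinear reshaping by s
            alpha[j + 1] = 1.0 - alpha[j]  # remaining belief goes to grade j+1
            break
    return alpha


# Membership of the midpoint between two grades for several shape values.
midpoint_memberships = {
    s: nonlinear_membership(0.5, [0.0, 1.0], [s])
    for s in (0.25, 0.5, 1.0, 2.0, 5.0)
}
```

With s below 1 the midpoint is assigned mostly to the lower grade, and with s above 1 mostly to the upper grade, so a single parameter per interval can follow data that cluster near either end.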
With the change of $s$, the new membership function can adaptively reflect the impact of different distributions of data. For example, when $s$ is 0.25, 0.5, 1, 2 and 5, respectively, the curves of the membership function are shown in Figure 3. It can be seen that with the increase of $s$, the function changes from convex to concave. In particular, when $s$ equals 1, the nonlinear membership function degenerates into a triangular membership function. Correspondingly, for the distribution of data in Figure 2b, the nonlinear membership function at $s = 0.25$ or $0.5$ can more accurately conduct the fuzzification of quantitative data.
It is worth noting that, as another commonly used membership function in fuzzy systems, the Gaussian membership function can also realize adaptive fuzzification of input data through changes in expectation and standard deviation, as shown below:
$$\alpha_{ij} = \exp\left(-\frac{(x_i - c_{ij})^2}{2\sigma_{ij}^2}\right) \qquad (6)$$
where $c_{ij}$ is the expectation and $\sigma_{ij}$ is the standard deviation. However, it has the following two shortcomings. Firstly, the Gaussian membership function cannot achieve accurate transformation of uniformly distributed input information, which is limited by the characteristics of its nonlinear curve. When $s = 1$, the nonlinear membership function proposed in this paper avoids this problem. Secondly, the adaptive ability of the Gaussian membership function is insufficient. For the data distribution within an interval, the Gaussian membership function can only adapt to the data distribution under partial circumstances, as shown in Figure 4, where the standard deviation of the Gaussian membership function is 0.25, 0.5, 1, 2 and 5, respectively. With the increase of the standard deviation, the shape of the exponential function curve cannot properly reflect the distribution characteristics of a dataset close to $H_{n+1}$. Furthermore, when the input is at $H_{n+1}$, the membership degree of $H_n$ is still quite high, which is difficult for users to understand. Therefore, the Gaussian membership function is only applicable when the observation data are concentrated near the referential grade $H_n$. Based on the above analysis, it can be seen that the nonlinear membership function proposed in this paper can more accurately reflect the distributions of data.
Therefore, based on the nonlinear membership function, when observation data are obtained, the steps for fault diagnosis can be described as follows:
Step 1: Fuzzification of quantitative data. The referential grade of each fault indicator is a fuzzy partition, which is assigned to the nonlinear membership function $R(\cdot)_{ij}$. For the observation data of the $i$th fault indicator, the membership degree of each referential grade is calculated as follows:
$$\alpha_{ij} = \left(\frac{A_{i(j+1)} - x_i}{A_{i(j+1)} - A_{ij}}\right)^{s_{ij}},\quad \alpha_{i(j+1)} = 1 - \alpha_{ij},\quad A_{ij} \le x_i \le A_{i(j+1)};\qquad \alpha_{ik} = 0,\quad k \ne j, j+1 \qquad (7)$$
where $x_i$ are the input data, $A_{ij}$ is the quantitative value of the referential grade $H_{ij}$ and $s_{ij}$ is the parameter of the nonlinear membership function, which is usually determined by experts after observing the distribution of data or calculated based on statistical methods.
Step 2: Activation of belief rules. The activation weight of the $k$th rule is calculated as follows:
$$w_k = \frac{\theta_k \prod_{i=1}^{M} \left(\alpha_i^k\right)^{\bar{\delta}_i}}{\sum_{l=1}^{L} \theta_l \prod_{i=1}^{M} \left(\alpha_i^l\right)^{\bar{\delta}_i}} \qquad (8)$$
where $\alpha_i^k$ is the individual membership degree of the $i$th fault indicator in the $k$th rule and $\bar{\delta}_i$ is the normalized attribute weight:
$$\bar{\delta}_i = \frac{\delta_i}{\max_{i = 1, \ldots, M} \{\delta_i\}} \qquad (9)$$
Step 3: Reasoning of activated rules. In this paper, the analytic ER algorithm [30] is used to fuse the activated rules to obtain the belief degree of each failure mode as follows:
$$\hat{\beta}_n = \frac{\mu \left[ \prod_{k=1}^{L} \left( w_k \beta_{n,k} + 1 - w_k \sum_{j=1}^{N} \beta_{j,k} \right) - \prod_{k=1}^{L} \left( 1 - w_k \sum_{j=1}^{N} \beta_{j,k} \right) \right]}{1 - \mu \left[ \prod_{k=1}^{L} (1 - w_k) \right]},$$
$$\mu = \left[ \sum_{n=1}^{N} \prod_{k=1}^{L} \left( w_k \beta_{n,k} + 1 - w_k \sum_{j=1}^{N} \beta_{j,k} \right) - (N - 1) \prod_{k=1}^{L} \left( 1 - w_k \sum_{j=1}^{N} \beta_{j,k} \right) \right]^{-1} \qquad (10)$$
where $\hat{\beta}_n$ represents the belief degree of the $n$th failure mode $D_n$. In general, the failure mode with the highest belief degree is regarded as the possible failure and taken as the output of the model as follows:
$$\hat{n} = \arg\max_{n = 1, \ldots, N} \hat{\beta}_n \qquad (11)$$
where $\hat{n}$ indicates the diagnosed fault mode.
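Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the standard BRB activation weight with attribute weights normalized by their maximum, and the analytic ER aggregation for rules over N fault modes. The function names and all numbers in the toy example are hypothetical.

```python
import math


def activation_weights(alpha, theta, delta):
    """alpha[k][i]: membership of indicator i in the grade used by rule k."""
    d_max = max(delta)
    d_bar = [d / d_max for d in delta]  # normalized attribute weights
    # Matching degree of each rule: product of memberships raised to d_bar.
    match = [math.prod(a ** d for a, d in zip(row, d_bar)) for row in alpha]
    total = sum(t * m for t, m in zip(theta, match))  # assumes some rule fires
    return [t * m / total for t, m in zip(theta, match)]


def er_fuse(w, beta):
    """Analytic ER aggregation. beta[k][n]: belief of rule k in fault n."""
    n_faults = len(beta[0])
    row_sum = [sum(b) for b in beta]
    a = [
        math.prod(wk * bk[n] + 1 - wk * rs for wk, bk, rs in zip(w, beta, row_sum))
        for n in range(n_faults)
    ]
    b = math.prod(1 - wk * rs for wk, rs in zip(w, row_sum))
    c = math.prod(1 - wk for wk in w)
    mu = 1.0 / (sum(a) - (n_faults - 1) * b)
    return [mu * (an - b) / (1 - mu * c) for an in a]


def diagnose(beta_hat):
    """Index (0-based) of the fault mode with the highest belief degree."""
    return max(range(len(beta_hat)), key=beta_hat.__getitem__)


# Toy example: two rules, two indicators, three fault modes.
alpha = [[0.7, 0.6], [0.3, 0.4]]
theta = [1.0, 1.0]
delta = [1.0, 0.5]
beta = [[0.8, 0.2, 0.0], [0.1, 0.6, 0.3]]

w = activation_weights(alpha, theta, delta)
beta_hat = er_fuse(w, beta)
fault = diagnose(beta_hat)
```

When every rule has a complete belief distribution, the fused belief degrees again sum to one, so the output can be read directly as a distribution over fault modes.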

Model Optimization Based on the Gradient Descent Method
Due to the subjectivity and fuzziness of expert knowledge, the modeling accuracy of the initially constructed fault diagnosis model generally cannot meet the requirements of practical engineering. Therefore, the model parameters initially determined by experts in the BRB need to be optimized to improve the diagnostic accuracy of the model. In general, for classification problems such as fault diagnosis, the cross-entropy loss function is used as the objective function as follows:
$$\min Q(\Omega) = -\sum_{t=1}^{T} \sum_{n=1}^{N} \mathbb{1}(\hat{y}_t = n) \ln \hat{\beta}_n(t) \qquad (12)$$
where $\hat{y}_t \in \{1, 2, \ldots, N\}$ indicates the category of the real fault of the $t$th sample, $\hat{\beta}_n(t)$ is the belief degree of the $n$th fault mode inferred for that sample and $T$ is the number of observation data. $\Omega = \{\theta, \delta, \beta, s\}$ is a parameter vector, which is composed of the rule weights, attribute weights, basic belief degrees and parameters of the membership function.
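As a small illustration of this objective, the sketch below evaluates a cross-entropy loss on inferred belief distributions. It assumes the loss is the sum over samples of the negative log-belief assigned to the true fault mode; the function name `cross_entropy`, the 0-based labels and the `eps` guard against `log(0)` are choices made for the example.

```python
import math


def cross_entropy(beta_hat, y_true, eps=1e-12):
    """Sum over samples of -log(belief assigned to the true fault mode).

    beta_hat: list of belief distributions, one per sample.
    y_true: list of 0-based indices of the real fault modes.
    """
    total = 0.0
    for beliefs, label in zip(beta_hat, y_true):
        # eps avoids log(0) when the true mode receives zero belief.
        total -= math.log(max(beliefs[label], eps))
    return total


# A confident correct diagnosis costs nothing; an uninformative one costs log(N).
loss_confident = cross_entropy([[1.0, 0.0, 0.0]], [0])
loss_uniform = cross_entropy([[1 / 3, 1 / 3, 1 / 3]], [2])
```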
Considering the constraint conditions of the parameters in the BRB model, the following parameter optimization model can be constructed:
$$\min Q(\Omega)\quad \text{s.t.}\quad 0 \le \theta_k \le 1,\quad 0 \le \delta_i \le 1,\quad 0 \le \beta_{n,k} \le 1,\quad \sum_{n=1}^{N} \beta_{n,k} = 1,\quad s_{ij} > 0$$
$$(k = 1, 2, \ldots, L,\ i = 1, 2, \ldots, M,\ n = 1, 2, \ldots, N,\ j = 1, 2, \ldots, J_i) \qquad (13)$$
In recent years, many optimization algorithms have been developed for BRB model parameter optimization, such as DE, PSO, P-CMAES and other swarm intelligence algorithms. Yang et al. [31] pointed out that when BRB is used as an expert system, the optimization of model parameters should only be "fine-tuning", which is also a major difference between BRB and artificial neural networks. Feng et al. [32] pointed out that, because swarm intelligence algorithms initialize their populations randomly, the expert knowledge in BRB is likely to be destroyed and the reasoning results may conflict with intuition. This may make the fault diagnosis results less convincing and weaken the interpretability of the model. Compared with swarm intelligence algorithms, the gradient descent method directly uses derivative information and takes the parameters initially determined by experts as the starting point of the search, retaining the initial expert knowledge to the greatest extent. Therefore, in this paper, since the BRB reasoning process is differentiable, an optimization algorithm based on gradient descent is used to train the model.
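There are four types of parameters as optimization variables. Therefore, it is necessary to calculate the first-order partial derivative of the objective function with respect to each of them.
Entropy 2023, 25, 442
First, the first-order partial derivative of the objective function $Q$ with respect to the reasoning result $\hat{\beta}_n$, $\partial Q / \partial \hat{\beta}_n$, is calculated. Next, the first-order partial derivative of the reasoning result $\hat{\beta}_n$ with respect to the basic belief degree $\beta_{r,f}$, $\partial \hat{\beta}_n / \partial \beta_{r,f}$, is calculated. Combining the two gives the first-order partial derivative for the first type of parameter:
$$\frac{\partial Q}{\partial \beta_{r,f}} = \sum_{n=1}^{N} \frac{\partial Q}{\partial \hat{\beta}_n} \frac{\partial \hat{\beta}_n}{\partial \beta_{r,f}}$$
Then, the first-order partial derivatives with respect to the rule weight, the attribute weight and the parameters of the membership function need to be calculated. According to the chain rule, the first-order partial derivative of the reasoning result $\hat{\beta}_n$ with respect to the activation weight $w_g$, $\partial \hat{\beta}_n / \partial w_g$, needs to be obtained first. The first derivative of the activation weight $w_g$ with respect to the rule weight $\theta_f$, $\partial w_g / \partial \theta_f$, is then calculated.
For the attribute weight $\delta_i$, the normalization of this parameter in Equation (9) is non-differentiable. Therefore, only the first-order partial derivative with respect to the normalized attribute weight $\bar{\delta}_i$ can be calculated here. First, the first derivative of the activation weight $w_g$ with respect to the rule matching degree $\alpha_f$, $\partial w_g / \partial \alpha_f$, needs to be calculated. Then, the first derivative of the rule matching degree $\alpha_f$ with respect to the normalized attribute weight $\bar{\delta}_i$, $\partial \alpha_f / \partial \bar{\delta}_i$, is calculated. Therefore, according to the chain rule, the partial derivatives of the objective function with respect to the rule weight and the normalized attribute weight can be calculated as follows:
$$\frac{\partial Q}{\partial \theta_f} = \sum_{n=1}^{N} \sum_{g=1}^{L} \frac{\partial Q}{\partial \hat{\beta}_n} \frac{\partial \hat{\beta}_n}{\partial w_g} \frac{\partial w_g}{\partial \theta_f},\qquad \frac{\partial Q}{\partial \bar{\delta}_i} = \sum_{n=1}^{N} \sum_{g=1}^{L} \sum_{f=1}^{L} \frac{\partial Q}{\partial \hat{\beta}_n} \frac{\partial \hat{\beta}_n}{\partial w_g} \frac{\partial w_g}{\partial \alpha_f} \frac{\partial \alpha_f}{\partial \bar{\delta}_i}$$
Finally, the first-order partial derivative of the individual membership degree $\alpha_i^k$ with respect to the parameter of the membership function $s_{ij}$, $\partial \alpha_i^k / \partial s_{ij}$, is calculated. According to the chain rule, the partial derivative of the objective function with respect to the parameters of the membership function is calculated as follows:
$$\frac{\partial Q}{\partial s_{ij}} = \sum_{n=1}^{N} \sum_{g=1}^{L} \sum_{k=1}^{L} \frac{\partial Q}{\partial \hat{\beta}_n} \frac{\partial \hat{\beta}_n}{\partial w_g} \frac{\partial w_g}{\partial \alpha_k} \frac{\partial \alpha_k}{\partial \alpha_i^k} \frac{\partial \alpha_i^k}{\partial s_{ij}}$$
Therefore, the gradient vector of the optimization variables can be obtained as follows:
$$d = \nabla Q(\Omega) = \left[ \frac{\partial Q}{\partial \theta},\ \frac{\partial Q}{\partial \bar{\delta}},\ \frac{\partial Q}{\partial \beta},\ \frac{\partial Q}{\partial s} \right]$$
Since each parameter in the optimization model has corresponding constraints, the parameters should be approximated to meet the optimization constraints after they are updated based on the gradient. Thus, the steps of parameter optimization can be summarized as follows:
Step 1: The model parameters initially given by experts (basic belief degree, rule weight, attribute weight and parameters of the membership function) are taken as the initial value $z_k = z_0$.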
Step 2: Calculate the gradient of the optimization variable d k .
Step 3: The optimization variables are updated as follows:
$$z_{k+1} = z_k - \lambda d_k$$
where $\lambda$ is the step size of the iteration and is determined by the one-dimensional search method.
Step 4: Approximate projection operation. For inequality constraints of each parameter, when the value of the parameter does not meet the constraint conditions, take the adjacent bound as the approximate value. For example, if θ k+1 < 0, then θ k+1 = 0. Moreover, the basic belief degree of a belief rule is normalized so that the sum is 1. Therefore, z k+1 is obtained.
Step 5: Calculate the gradient vector d k+1 at this time. Judge whether the termination condition of the algorithm is reached. If yes, end. Otherwise, let d k = d k+1 , z k = z k+1 and go to Step 3.
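The five steps amount to a projected gradient descent started from the expert-given parameters. The sketch below illustrates the loop on a toy quadratic objective instead of the BRB loss. It is a sketch under stated assumptions: the one-dimensional search is replaced by simple step halving, termination is declared when the loss no longer decreases, and the helper names `clip`, `normalize_beliefs` and `projected_gradient_descent` are hypothetical.

```python
def clip(values, lower, upper):
    """Approximate projection: move out-of-range values to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


def normalize_beliefs(beliefs):
    """Clip belief degrees to [0, 1] and rescale them so that they sum to 1."""
    clipped = clip(beliefs, 0.0, 1.0)
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(clipped)] * len(clipped)
    return [b / total for b in clipped]


def projected_gradient_descent(loss, grad, project, z0, max_iter=200, tol=1e-10):
    z = project(z0)  # Step 1: start from the expert-given parameters
    for _ in range(max_iter):
        d = grad(z)  # Steps 2 and 5: gradient at the current point
        lam = 1.0
        # Steps 3 and 4: gradient update followed by the projection.
        z_new = project([v - lam * g for v, g in zip(z, d)])
        # Step halving stands in for the one-dimensional search (assumption).
        while loss(z_new) >= loss(z) and lam > 1e-12:
            lam *= 0.5
            z_new = project([v - lam * g for v, g in zip(z, d)])
        if loss(z) - loss(z_new) < tol:  # termination: no further decrease
            break
        z = z_new
    return z


# Toy problem: stay close to `target` while keeping every variable in [0, 1].
target = [1.5, -0.2, 0.4]
loss = lambda z: 0.5 * sum((v - t) ** 2 for v, t in zip(z, target))
grad = lambda z: [v - t for v, t in zip(z, target)]
z_opt = projected_gradient_descent(
    loss, grad, lambda z: clip(z, 0.0, 1.0), [0.5, 0.5, 0.5]
)
```

Because the search starts at the initial point and only moves while the loss decreases, the result stays in the neighbourhood of the starting parameters, which is the property the text relies on to preserve expert knowledge.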
Finally, the fault diagnosis model proposed in this paper can be summarized as shown in Figure 5.
Figure 5. The whole process of the proposed model.

Case Study
A fault diagnosis case of a laser gyro will be used in this section to verify the effectiveness of the proposed model. In Section 4.1, the background of laser gyro fault diagnosis is briefly introduced. In Section 4.2, the BRB-based fault diagnosis model is built and optimized. In Section 4.3, a comparative study between the proposed model and other models is conducted. Analysis and discussions are carried out in Section 4.4.

Background Description
As an important navigation device, the laser gyro plays an extremely important role in many fields, such as automobiles, ships and rockets. However, during storage, the laser gyro is very likely to enter a fault state due to inevitable external interference and its own performance degradation. Once a laser gyro is used in a failure state, it may cause severe personnel and property losses. For the laser gyro, it is difficult to judge from its appearance whether it is in a fault state. However, the observation data of its drift coefficients can reflect the degree of the fault. In general, the greater the drift coefficient, the higher the degree of the fault. Therefore, in this paper, the zero-order term drift coefficient $D_0$, the first-order term drift coefficient $D_1$ and the second-order term drift coefficient $D_2$ are used as indicators of laser gyro fault diagnosis. For a certain laser gyro, 180 groups of observation data within a storage period are shown in Figure 6. According to the fault degree of the laser gyro and the industry standard, the three fault modes are slight fault (S), moderate fault (M) and bad fault (B), respectively. These are also shown in Figure 6.

Construction and Optimization of the Fault Diagnosis Model
In this case study, the three drift coefficients of the laser gyro are the fault indicators, which are used to diagnose three types of fault modes, namely slight fault (S), moderate fault (M) and bad fault (B). Therefore, the following BRB can be established:
$$R_k: \text{IF } D_0 \text{ is } H_1^k \wedge D_1 \text{ is } H_2^k \wedge D_2 \text{ is } H_3^k, \text{ THEN } \{(S, \beta_{1,k}), (M, \beta_{2,k}), (B, \beta_{3,k})\}, \text{ with rule weight } \theta_k \text{ and attribute weights } \delta_1, \delta_2, \delta_3$$
According to expert knowledge and industry standards, each fault indicator has three referential grades, namely low (L), medium (M) and high (H), whose corresponding referential values are shown in Table 1. Thus, the "then" part of the rules in the initial BRB is shown in Table 2. All rule weights are initially set to one. Since $D_0$ can most obviously reflect the degree of fault and $D_1$ takes second place, the attribute weights are set to $\delta_1 = 1$, $\delta_2 = 0.7$ and $\delta_3 = 0.5$. According to the distribution of the observation data, the initial parameters of the nonlinear membership function are shown in Table 3.
Table 3. Initial parameters of the nonlinear membership function.
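The structure of this initial model can be written down compactly. The sketch below assumes that the rule base covers every combination of the three referential grades over the three indicators (27 rules), which the text does not state explicitly. The belief degrees of Table 2 are not reproduced here, so they are left as `None` placeholders, and the dictionary layout is purely illustrative.

```python
from itertools import product

GRADES = ("L", "M", "H")            # low, medium, high
INDICATORS = ("D0", "D1", "D2")     # drift coefficients used as fault indicators
FAULT_MODES = ("S", "M", "B")       # slight, moderate, bad

# Assumption: one rule per combination of referential grades.
rules = [
    {
        "antecedent": dict(zip(INDICATORS, combo)),
        "theta": 1.0,   # all rule weights are initially set to one
        "beta": None,   # placeholder for the expert beliefs over FAULT_MODES
    }
    for combo in product(GRADES, repeat=len(INDICATORS))
]

# Attribute weights from the text, normalized by their maximum.
delta = [1.0, 0.7, 0.5]
delta_bar = [d / max(delta) for d in delta]
```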
Since it is difficult for the initially constructed BRB to achieve ideal modeling accuracy, observation data are required to optimize the model. Of the 180 groups of observation data, 30% are randomly selected as the training set and the rest are used as the test set, to reflect the modeling ability of BRB in small sample problems. The gradient descent method in Section 3.2 is used as the optimization engine. The optimized model parameters are shown in Table 4. In addition, the optimized parameters of the nonlinear membership function are shown in Table 5. The fault diagnosis results of the optimized BRB and the initial BRB are shown in Figure 7.
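The split described above can be reproduced with the Python standard library. The sketch below is illustrative: the function name `split_indices` and the fixed seed are assumptions, and the actual random selection used in the paper is not known.

```python
import random


def split_indices(n_samples, train_fraction=0.3, seed=0):
    """Randomly split sample indices into a training part and a test part."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 180 groups of observation data: 30% for training, the rest for testing.
train_idx, test_idx = split_indices(180)
```

With 180 groups this gives 54 training samples and 126 test samples.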

Comparative Study
In order to fully verify the effectiveness of the model in this paper, comparative experiments are carried out in this section from the following two aspects, namely, the previous BRB model and data-driven models.

a. The previous BRB model
(1) The BRB model with triangular membership function, named BRB-t: the input information transformation function of this BRB adopts the triangular membership function in Equation (4). BRB-t also uses the gradient descent method to optimize model parameters.
(2) The BRB model with Gaussian membership function, named BRB-g: the optimization method of this model is the same as that of BRB-t.
Correspondingly, the BRB proposed in this paper is named BRB-n. The fault diagnosis results of three BRBs on 180 sets of observation data are shown in Figure 8. The accuracy of test data is shown in Table 6.

b. Data-driven models
Data-driven fault diagnosis methods have been widely used. In this paper, random forest (RF), naive Bayes (NB) and K-nearest neighbor (KNN) models are used for comparative study.
(1) RF: RF is a type of powerful tree ensemble model [33]. Its basic model is the decision tree (DT). Compared with general DT, RF has a stronger generalization ability, so it is widely used in classification and regression problems.
(2) NB: NB is a nonparametric model based on the Bayesian theorem [34]. This model has no explicit learning process. Generally, it calculates the prior probability and the conditional probability directly from the training set and infers a posteriori probability.
(3) KNN: KNN is a lazy machine learning model [35], which means that it has no model training process. For the data to be predicted, the training data closest to this data are first obtained according to the defined distance formula. Then, their weighted averages or votes are calculated.
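The voting step described for KNN can be sketched in a few lines of standard-library Python. This is only a minimal illustration, not the sklearn implementation used in the experiments; the function name `knn_predict`, the Euclidean distance and the toy samples are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Pair every training sample's distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-feature samples with two fault states.
    train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
    train_y = ["normal", "normal", "fault", "fault"]
    print(knn_predict(train_x, train_y, (0.2, 0.1), k=3))
```

Because nothing is fitted in advance, every prediction scans the whole training set, which is why the model is called lazy.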
The hyper-parameters of these models are initialized to the default settings of the Python "sklearn" library and then tuned with the "GridSearchCV" function. Their diagnostic results and accuracy are shown in Figure 9 and Table 7, respectively.
c. Swarm intelligence algorithms
To illustrate the advantages of the gradient-based optimization algorithm proposed in this paper, three swarm intelligence algorithms, namely DE, PSO and P-CMAES, are used to optimize the initial BRB model. Their parameter settings are the same as those in [23][24][25]. The optimized BRBs are named BRB-DE, BRB-PSO and BRB-PCMAES, respectively. For comparison, the BRB optimized by the proposed method is named BRB-GB. The fault diagnosis accuracy of these models is shown in Table 8.

Analysis and Discussions
Firstly, the advantages of the BRB-based fault diagnosis model are described by comparing it with data-driven models. In this paper, 30% of the samples in the dataset are selected as the training set for all of the fault diagnosis models. Under this circumstance, the drawbacks of the data-driven models are revealed. Since such a small training set may not comprehensively reflect the overall mapping relationship, these models tend to overfit. Among them, the random forest model, with its double random sampling of features and samples, alleviates the problem to some extent and has the highest diagnostic accuracy among the data-driven models. By comparing Tables 5 and 6, it can be seen that after model optimization, the fault diagnosis accuracy of each BRB is higher than that of the data-driven models. Compared with the three data-driven models, the diagnostic accuracy of BRB-n is improved by 12.8%, 25.24% and 31.07%, respectively. This reveals the advantage of expert knowledge in small sample modeling: experts first construct a generally correct but rough model by virtue of domain knowledge and experience, and the diagnosis accuracy of the model is then further improved with the available small sample dataset. On the other hand, as a "white-box" model, the BRB model provides explicit knowledge representation and reasoning compared with data-driven methods, which makes the results of fault diagnosis traceable and transparent. To further illustrate the interpretability of BRB, the fault diagnosis process of the first observation data [−7.94 × 10^−4, 6.82 × 10^−2, 2.38 × 10^−2] is shown in Figure 10.

Secondly, the advantages of the gradient-based optimization method are analyzed. On the one hand, the gradient-based optimization algorithm shows powerful model training ability. Compared with the three swarm intelligence algorithms, the modeling accuracy of BRB-GB is improved by 7.87%, 6.44% and 2.95%, respectively. On the other hand, the proposed optimization algorithm keeps expert knowledge from being destroyed: the optimized model parameters are fine-tuned rather than significantly changed. For simplicity, the belief degrees of the first rule in the four BRBs are shown in Table 9. Table 9 shows that, due to random initialization, the belief distributions of the BRBs optimized by the swarm intelligence algorithms deviate seriously from the initial judgment of experts, even though these models can achieve good fault diagnosis accuracy. This makes the rules difficult to understand: according to common sense and experience, when the referential value of the evaluation indicator is low, the fault degree should generally be "slight". When these BRBs are used for fault diagnosis, the interpretability of the diagnosis results is weakened.
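One simple way to make the notion of deviation from expert knowledge concrete is to measure the distance between the initial and the optimized belief distribution of a rule. The sketch below is an illustration only: the Euclidean distance is an assumed choice of metric, and the belief degrees are hypothetical values, not the entries of Table 9.

```python
import math


def belief_drift(initial, optimized):
    """Euclidean distance between two belief distributions of the same rule."""
    if len(initial) != len(optimized):
        raise ValueError("belief distributions must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(initial, optimized)))


if __name__ == "__main__":
    # Hypothetical belief degrees over three fault grades for one rule.
    expert = [0.90, 0.10, 0.00]        # initial expert judgment
    fine_tuned = [0.85, 0.13, 0.02]    # small adjustment around the expert values
    reshuffled = [0.10, 0.20, 0.70]    # distribution far from the expert values

    print(belief_drift(expert, fine_tuned))
    print(belief_drift(expert, reshuffled))
```

A small drift indicates that the optimized rule still expresses roughly what the expert stated, while a large drift signals that the rule may no longer be readable in terms of the original knowledge.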
Finally, the importance of the nonlinear membership function is analyzed. First, the uneven distribution of observation data affects the accuracy of the fault diagnosis model. It can be seen from Table 6 that the diagnosis accuracy of BRB is improved after the adaptive membership function is considered. This is easy to understand, since the original triangular membership function cannot reflect the diversity of data distributions, which leads to errors in the fuzzification of input information. Therefore, it is necessary to consider the adaptive membership function when building a fault diagnosis model based on BRB. Then, compared with the other two BRBs, the diagnostic accuracy of BRB-n is increased by 5.31% and 9.57%, respectively. It is worth noting that the Gaussian membership function has more adjustable parameters than the nonlinear membership function. For BRB-g and BRB-n in this paper, the numbers of optimization parameters are 123 and 117, respectively. As the number of referential grades increases, the difference in the number of parameters will continue to grow. Nevertheless, the nonlinear membership function shows better performance. This is because, compared with the Gaussian membership function, the nonlinear membership function can adapt to a wider range of data distributions and can conduct the fuzzification of quantitative data more precisely.
To further verify the role of the nonlinear membership function in information transformation, part of the observation data of D_2 is shown in Figure 11. The observation data in the interval [a_32, a_33] are concentrated on H_32. Therefore, relative to the entire dataset, the red-marked data belong to H_33 to a greater extent. In other words, this point should belong more to the referential grade "high" in the entire dataset. According to the optimized BRB reasoning process, the membership degrees of H_32 and H_33 are 0.18 and 0.82, respectively, which is consistent with the real distribution of the dataset. Therefore, when the nonlinear membership function is used to transform the input information, BRB can achieve better modeling accuracy.
Figure 11. Membership degree of data with non-uniform distribution.

Conclusions
In this paper, a new fault diagnosis model based on BRB has been proposed. To address the diverse distributions of observation data, an adaptive nonlinear membership function has been proposed to conduct the fuzzification of quantitative data. Since the parameters of the membership function initially determined by experts may not be accurate in the new BRB model, a new parameter optimization model that includes the parameters of the membership function has been proposed and solved with the gradient descent algorithm. Finally, the proposed model has been verified by a laser gyro fault diagnosis case.
In summary, the proposed method has two advantages. Firstly, in the transformation of input information, the limitations of the triangular membership function in the fuzzification of non-uniformly distributed observation data are considered for the first time, and an adaptive nonlinear membership function is designed. This function can adapt to various data distributions and improve the accuracy of information transformation. Secondly, considering the subjectivity and ignorance of experts in determining the parameters of the model, the parameters of the membership function are added to the optimization model, and the gradient descent method is used to optimize the fault diagnosis model, which preserves expert knowledge while improving modeling accuracy. However, the model optimization of BRB is a non-convex optimization problem, which means that the traditional gradient descent method may fall into a local optimum. Achieving higher modeling accuracy by escaping from local optima is therefore an interesting direction, and this issue will be considered in future work.
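The local-optimum issue mentioned above can be illustrated on a one-dimensional toy function. The sketch below is not the BRB optimization model: the objective f(x) = x^4 − 3x^2 + x, the learning rate and the starting points are hypothetical choices that only show how plain gradient descent settles in different minima depending on where it starts.

```python
def objective(x):
    """Toy non-convex objective with one global and one local minimum."""
    return x ** 4 - 3.0 * x ** 2 + x


def gradient(x):
    """Analytical derivative of the toy objective."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def gradient_descent(x0, learning_rate=0.01, steps=2000):
    """Plain fixed-step gradient descent starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


if __name__ == "__main__":
    for start in (2.0, -2.0):
        x_final = gradient_descent(start)
        print(start, round(x_final, 3), round(objective(x_final), 3))
```

Starting on the right-hand side, the iteration stops in the shallower minimum, while starting on the left reaches the deeper one. The same dependence on the starting point is what makes an expert-provided initial model both useful and potentially limiting.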