An Improved Approach of Incomplete Information Fusion and Its Application in Sensor Data-Based Fault Diagnosis

Abstract: The Dempster–Shafer evidence theory has been widely used in the field of data fusion. However, incomplete information under the open world assumption has since been identified as a new type of uncertain information. The classical Dempster's combination rule has difficulty handling incomplete information under the open world assumption, and some information entropies, such as the Deng entropy, are likewise not applicable in the open world. Therefore, this paper proposes a new framework to process uncertain information and fuse incomplete data. The method is based on an extension to the Deng entropy in the open world assumption, the negation of basic probability assignment (BPA), and the generalized combination rule. The proposed method can handle incomplete information under the open world assumption and obtains more uncertain information through the negation of BPA, which improves the accuracy of the results. The results of applying this method to electronic rotor fault diagnosis examples show that, compared with other uncertain information processing and fusion methods, the proposed method has wider adaptability and higher accuracy, and is more conducive to practical engineering applications.


Introduction
Fault diagnosis is an important research topic in engineering, and its applications in vehicle detection [1], high-speed train system diagnosis [2,3], and intelligent machinery diagnosis [4] have received considerable attention. Current research on fault diagnosis mainly covers machine learning and the application of diagnostic methods. Some traditional machine learning methods, such as the Gaussian mixture model and the support vector machine (SVM), have been widely used in the field of fault diagnosis. Large amounts of machinery health information have become easier to collect, but the volume of data also reduces the efficiency of detection and diagnosis. Data-driven models improve the efficiency of fault diagnosis and are widely used in this field. For example, in [5], a differentiable neural network structure search based on pruning and multi-objective optimization is used for mechanical fault diagnosis, and in [6], a multi-scale fuzzy entropy technique based on Euclidean distance is applied to fault diagnosis of industrial systems.
The above research can effectively improve the efficiency of fault diagnosis, but processing the large amount of collected information is also an important research topic. In complex systems, multi-source information fusion has the advantage of improving accuracy and credibility [7], so it has been widely used. However, affected by the uncertainty of the real world, multi-sensor information sources are considered to be uncertain [8]. To deal with uncertain information, information entropy [9], probability theory [10], rough set theory [11,12], Dempster-Shafer evidence theory [13,14], belief functions [15,16], and other methods [17,18] have been proposed.
Definition 2. In the frame of discernment Ω, the mass function m is a mapping from the power set $2^{\Omega}$ to the interval [0, 1] that satisfies
$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Omega} m(A) = 1. \tag{2}$$
If m(A) > 0, the subset A is called a focal element. m(A) represents the degree of support for proposition A and is also known as the basic probability assignment (BPA) or basic belief assignment (BBA).
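The two conditions of Definition 2 can be checked mechanically. The following is a minimal sketch, assuming a BPA is stored as a dictionary that maps `frozenset` subsets to masses; the helper names and the numerical values are hypothetical and are not taken from the paper.

```python
from itertools import chain, combinations


def power_set(frame):
    """Return every subset of the frame of discernment as a frozenset."""
    items = sorted(frame)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def is_valid_bpa(m, frame, tol=1e-9):
    """Check the closed-world conditions m(empty) = 0 and sum of masses = 1."""
    allowed = set(power_set(frame))
    if any(a not in allowed for a in m):
        return False  # a key is not a subset of the frame
    if any(v < 0 or v > 1 for v in m.values()):
        return False  # masses must lie in [0, 1]
    if m.get(frozenset(), 0.0) > tol:
        return False  # closed world: no mass on the empty set
    return abs(sum(m.values()) - 1.0) <= tol


def focal_elements(m):
    """Subsets with strictly positive mass."""
    return [a for a, v in m.items() if v > 0]


# Hypothetical BPA over three fault types (illustrative values only).
frame = {"F1", "F2", "F3"}
m = {
    frozenset({"F1"}): 0.6,
    frozenset({"F2"}): 0.1,
    frozenset({"F1", "F2", "F3"}): 0.3,
}
print(is_valid_bpa(m, frame), len(focal_elements(m)))
```

Storing subsets as `frozenset` keys keeps set intersection and inclusion tests direct, which the combination rules later rely on.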

Definition 3.
Body of evidence (BOE) is a binary group composed of a propositional subset and its mass function, and is a unit of uncertain information evidence. BOE is expressed as follows:
$$(\Re, m) = \{\langle A, m(A) \rangle : A \in 2^{\Omega},\ m(A) > 0\},$$
where $\Re$ is a subset of the power set $2^{\Omega}$.
Definition 4. A BPA m can also be expressed through the belief function Bel or the plausibility function Pl:
$$Bel(A) = \sum_{\emptyset \neq B \subseteq A} m(B), \qquad Pl(A) = \sum_{B \cap A \neq \emptyset} m(B),$$
where Bel(A) represents the level of support for the proposition A, and Pl(A) represents the level of no objection to the proposition A.
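As an illustration of Definition 4, the sketch below computes the belief and plausibility of a subset using the standard sums over focal elements (Bel sums the masses of non-empty subsets of A, Pl sums the masses of sets that intersect A). The dictionary representation and the numerical values are assumptions made for this example.

```python
def belief(m, a):
    """Bel(A): total mass of non-empty focal elements contained in A."""
    return sum(v for b, v in m.items() if b and b <= a)


def plausibility(m, a):
    """Pl(A): total mass of focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


# Hypothetical BPA (illustrative values only).
m = {
    frozenset({"F1"}): 0.6,
    frozenset({"F2"}): 0.1,
    frozenset({"F1", "F2", "F3"}): 0.3,
}
target = frozenset({"F1"})
print(belief(m, target), plausibility(m, target))
```

For this BPA the interval [Bel, Pl] of {F1} is approximately [0.6, 0.9]: the mass on the whole frame does not support F1 directly, but it does not contradict it either.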

Definition 5.
Under the framework of the Dempster-Shafer evidence theory, two independent mass functions, $m_1$ and $m_2$, can be fused by Dempster's combination rule:
$$m(A) = (m_1 \oplus m_2)(A) = \frac{1}{1-k} \sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset,$$
where k is a normalization factor defined as
$$k = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).$$
It should be noted that the classical Dempster-Shafer evidence theory is defined for the closed world. For the open world, Deng extended Dempster's combination rule and named the extension the generalized combination rule (GCR) [37].
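Dempster's combination rule in Definition 5 can be sketched as a double loop over the focal elements of the two mass functions. This is a minimal sketch of the standard closed-world rule; the function name and the example masses are hypothetical.

```python
def dempster_combine(m1, m2):
    """Fuse two closed-world BPAs; returns (fused BPA, conflict k)."""
    raw = {}
    k = 0.0
    for b, v1 in m1.items():
        for c, v2 in m2.items():
            inter = b & c
            if inter:
                raw[inter] = raw.get(inter, 0.0) + v1 * v2
            else:
                k += v1 * v2  # mass falling on the empty intersection
    if abs(1.0 - k) < 1e-12:
        raise ValueError("total conflict: the rule is undefined for k = 1")
    return {a: v / (1.0 - k) for a, v in raw.items()}, k


# Hypothetical evidence from two sensors (illustrative values only).
m1 = {frozenset({"F1"}): 0.8, frozenset({"F1", "F2"}): 0.2}
m2 = {frozenset({"F1"}): 0.5, frozenset({"F2"}): 0.5}
fused, k = dempster_combine(m1, m2)
print(k, fused)
```

Here the only conflicting pair is {F1} against {F2}, so k is about 0.4 and the remaining mass is rescaled by 1/(1 - k).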

Definition 6.
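In the open world assumption, Deng defined the intersection of two empty sets to be an empty set, i.e., $\emptyset_1 \cap \emptyset_2 = \emptyset$. For two given BPAs $m_1$ and $m_2$, the generalized combination rule (GCR) is defined as follows:
$$m(A) = \frac{(1 - m(\emptyset)) \sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - K}, \quad A \neq \emptyset, \tag{7}$$
$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C), \qquad m(\emptyset) = m_1(\emptyset)\, m_2(\emptyset).$$
When m(∅) = 0, the GCR degenerates to the classical DS combination rule. The masses of two empty sets are fused by multiplication to obtain the GBPA value of the empty set. In addition, m(∅) = 1 if and only if K = 1.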
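The GCR of Definition 6 differs from Dempster's rule only in how the empty set is treated. The sketch below follows the usual statement of Deng's rule, in which K collects every product whose intersection is empty (including the product of the two empty-set masses) and m(∅) is the product of the two empty-set masses. Function names and example values are hypothetical.

```python
EMPTY = frozenset()


def gcr_combine(m1, m2):
    """Generalized combination rule for two open-world BPAs (a sketch)."""
    raw = {}
    k = 0.0
    for b, v1 in m1.items():
        for c, v2 in m2.items():
            inter = b & c
            if inter:
                raw[inter] = raw.get(inter, 0.0) + v1 * v2
            else:
                k += v1 * v2  # generalized conflict coefficient K
    m_empty = m1.get(EMPTY, 0.0) * m2.get(EMPTY, 0.0)
    if abs(1.0 - k) < 1e-12:
        return {EMPTY: 1.0}  # K = 1 if and only if m(empty) = 1
    fused = {a: (1.0 - m_empty) * v / (1.0 - k) for a, v in raw.items()}
    if m_empty > 0:
        fused[EMPTY] = m_empty
    return fused


# Hypothetical open-world evidence (illustrative values only).
m1 = {EMPTY: 0.2, frozenset({"F1"}): 0.8}
m2 = {EMPTY: 0.1, frozenset({"F1"}): 0.6, frozenset({"F2"}): 0.3}
print(gcr_combine(m1, m2))
```

When neither input assigns mass to the empty set, `m_empty` is zero and the function reduces to Dempster's rule, which matches the degeneration property stated in the definition.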
With the development of data fusion research under the open world assumption, recent work has pointed out shortcomings of Equation (7). Reference [40] found that obtaining the mass of the empty set by multiplying the empty-set masses of the two BPAs, as the GCR does, is unreasonable, and that the result of using it together with the GCR is inconsistent with practice. In addition, because $m_1(\emptyset)$ and $m_2(\emptyset)$ support each other and the proposition comes from the open world, their product should not be assigned to the generalized conflict coefficient K, which is inconsistent with the GBPA. Therefore, further studies are needed to improve the GCR. However, since the method proposed in this paper uses the negation of evidence for calculation, it does not involve the multiplication of the mass function values of empty sets, and can effectively avoid the problems raised in [40].

Shannon Entropy
The quantitative measurement of information originated from the concept of "information entropy" first proposed by Claude Shannon in 1948. As a measure of uncertainty, the Shannon entropy can effectively quantify the uncertainty of a probability distribution.

Belief Entropy
The Deng entropy can be seen as a generalization of the Shannon entropy. It was proposed by Deng in [44] and deals with the belief assigned to each focal element. When the BPA degenerates to a probability distribution, the Deng entropy degenerates to the Shannon entropy.

Definition 8.
In FOD X, the Deng entropy is defined as follows:
$$E_d(m) = -\sum_{A \subseteq X} m(A) \log_2 \frac{m(A)}{2^{|A|} - 1}, \tag{9}$$
where m is a mass function and |A| is the cardinality of A.
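A short sketch can make the relation between the two entropies concrete. It uses the standard forms of the Shannon entropy, H = -Σ p log2 p, and of the Deng entropy, in which each mass is divided by 2^|A| - 1 inside the logarithm. The function names and the example masses are hypothetical.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def deng_entropy(m):
    """Deng entropy of a closed-world BPA keyed by frozenset subsets."""
    total = 0.0
    for a, v in m.items():
        if v > 0:
            if not a:
                raise ValueError("undefined when the empty set carries mass")
            total -= v * math.log2(v / (2 ** len(a) - 1))
    return total


# A Bayesian BPA (singletons only): both measures coincide.
bayesian = {frozenset({"F1"}): 0.5, frozenset({"F2"}): 0.5}
# All mass on the whole frame: the Deng entropy grows with |A|.
vacuous = {frozenset({"F1", "F2", "F3"}): 1.0}
print(deng_entropy(bayesian), shannon_entropy([0.5, 0.5]))
print(deng_entropy(vacuous))
```

For singleton focal elements the divisor 2^1 - 1 equals 1, which is why the Deng entropy reduces to the Shannon entropy. The explicit error for the empty set mirrors the case |A| = 0, where the divisor is zero.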
However, when |A| = 0, Equation (9) is undefined. Therefore, Tang et al. [41] extended the uncertainty measure under the Dempster-Shafer evidence theory framework so that, in the open world assumption, the non-zero mass function of the empty set and the uncertain information represented by the incomplete FOD are handled more carefully:
$$E_{edeow}(m) = -\sum_{A \subseteq X} m(A) \log_2 \frac{m(A)}{2^{\left(|A| + \lceil m(\emptyset)\,|X| \rceil\right)} - 1}. \tag{10}$$
In the formula, |A| is the cardinality of proposition A, |X| is the number of elements in the FOD, and '⌈ ⌉' is the symbol of the ceiling function, which returns the smallest integer not less than its argument, e.g., ⌈0.4⌉ = 1.
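The role of the ceiling term can be seen in a few lines of code. The sketch below assumes the commonly cited form of the extension by Tang et al., in which the exponent |A| is increased by ⌈m(∅)|X|⌉; the function name, the rounding guard, and the example masses are assumptions made for illustration.

```python
import math

EMPTY = frozenset()


def edeow(m, frame_size):
    """Extension of the Deng entropy to the open world (a sketch).

    frame_size is |X|, the number of elements in the (incomplete) FOD.
    """
    m_empty = m.get(EMPTY, 0.0)
    # Round before taking the ceiling so that floating-point noise such as
    # 7.000000000000001 does not push the result up by one.
    shift = math.ceil(round(m_empty * frame_size, 12))
    total = 0.0
    for a, v in m.items():
        if v > 0:
            total -= v * math.log2(v / (2 ** (len(a) + shift) - 1))
    return total


# Hypothetical open-world BPA: 40% of the belief is on the empty set.
m_open = {EMPTY: 0.4, frozenset({"F1"}): 0.6}
# Hypothetical closed-world BPA: the measure falls back to the Deng entropy.
m_closed = {frozenset({"F1"}): 0.5, frozenset({"F2"}): 0.5}
print(edeow(m_open, 3), edeow(m_closed, 3))
```

With |X| = 3 and m(∅) = 0.4 the shift is ⌈1.2⌉ = 2, so the empty set is evaluated with the divisor 2^2 - 1 = 3 instead of the zero divisor that makes the original Deng entropy undefined.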

Negation of BPA
The negation method is considered a feasible way to express knowledge. Smets defined the negation of the mass function in [45] and used $\bar{m}$ to represent the negation of the mass function m in the proposed model. However, this model has some limitations. For example, when applied to the negation of m(θ), $\bar{m}(\theta)$ is always equal to 0. Therefore, Yin et al. proposed a method to calculate the negation of the basic probability assignment in [42].

Definition 10.
For each focal element $e_i$ in the FOD, the initial belief assignment $p_i$ is replaced by the complementary probability $1 - p_i$. The sum σ of these complements over all focal elements, $\sigma = \sum_{i}(1 - p_i) = n - 1$, is then used to normalize the beliefs of the negated focal elements. Finally, the general formula of the negation of BPA is expressed as:
$$\bar{m}(e_i) = \frac{1 - m(e_i)}{n - 1},$$
where n is the number of focal elements, and $m(e_i)$ is the belief of the i-th focal element in the initial mass function.
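The negation in Definition 10 amounts to taking complements and renormalizing. The following sketch assumes that every subset with non-zero mass, including the empty set in the open world, counts as a focal element; that treatment, the function name, and the example masses are assumptions made for illustration.

```python
def negate_bpa(m):
    """Negation of a BPA: replace each mass p by (1 - p) / (n - 1)."""
    focal = {a: v for a, v in m.items() if v > 0}
    n = len(focal)
    if n < 2:
        raise ValueError("negation needs at least two focal elements")
    # The complements 1 - p sum to n - 1, so dividing by n - 1 normalizes.
    return {a: (1.0 - v) / (n - 1) for a, v in focal.items()}


# Hypothetical BPA (illustrative values only).
m = {
    frozenset({"F1"}): 0.7,
    frozenset({"F2"}): 0.2,
    frozenset({"F3"}): 0.1,
}
print(negate_bpa(m))
```

The strongly supported F1 receives the smallest negated mass (about 0.15), while the weakly supported F3 receives the largest (about 0.45), so the negation carries information about what the evidence does not support.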

An Improved Method for Incomplete Information Fusion in Fault Diagnosis
For problems in the open world, traditional information entropy and the classical DS combination rule have difficulty providing solutions. This section proposes a conflicting data fusion method that uses the extension to the Deng entropy in the open world assumption (EDEOW) to measure uncertain information, obtains more uncertain information through the negation of BPA to improve the accuracy of information processing, and adopts the GCR for fusion. Figure 1 shows the resulting fault diagnosis framework based on EDEOW, negation of BPA, and GCR in the open world. The detailed steps of uncertain information processing and fusion are as follows:
Step 1: fault feature extraction and sensor evidence modeling: in multiple fault modes, for each fault feature extracted, evidence modeling is carried out based on the information collected by the sensor. For complete information, the evidence modeling under the open world assumption is the same as that in the closed world. Incomplete information is represented by the empty set under the open world assumption, and its mass function is non-zero.
Step 2: calculation of the negation of BPA: preprocessing the evidence before data fusion makes the fusion result more accurate. Considering the limitations of the data, this paper adopts the method in [42] to calculate the negation of BPA and thereby obtain more uncertain information. Each BOE obtained by modeling is negated with Equation (12), the negation formula of Definition 10.
Step 3: uncertainty calculation based on EDEOW: this paper uses the Deng entropy to measure the uncertainty of the data. Since the original Deng entropy is only applicable to uncertainty measurement in the closed world, this paper adopts the EDEOW proposed by Tang et al. [41], and uses Equation (10) to calculate the uncertainty of both the BPA and the negation of BPA.
Step 4: data modification: the two uncertainties calculated for each group of evidence in Step 3 are summed to give the final uncertainty $E_{edeowu}(m_i)$, from which the weight of the evidence is calculated with Equation (13). Based on the weight of each set of data, the modified BPA is calculated with Equation (14) in the same way as in the closed world, where $m_i(A)$ is the mass function value obtained for proposition A through sensor data modeling.
Step 5: data fusion based on GCR: since the classical Dempster-Shafer theory is not applicable to problems under the open world assumption, this paper uses the extension of Dempster's combination rule proposed by Deng, namely the GCR, to fuse the modified data with Equation (15). For proposition A in the BOE, the result is obtained through (m − 1) fusion operations.
Step 6: application in fault diagnosis: the above information processing and fusion methods are used for fault diagnosis, and the diagnosis results are analyzed and judged based on the fused data.
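Steps 2 to 5 can be sketched end to end as follows. This is an illustrative sketch only. In particular, the weight formula is an assumption: each weight is taken proportional to the summed uncertainty of a BPA and its negation, which may differ from the paper's Equation (13). The modified BPA is taken as the weighted average of the sensor BPAs and is fused with itself once fewer than the number of sensors. All function names and numerical values are hypothetical.

```python
import math

EMPTY = frozenset()


def negate(m):
    """Step 2: negation of a BPA, (1 - p) / (n - 1) per focal element."""
    n = sum(1 for v in m.values() if v > 0)
    return {a: (1.0 - v) / (n - 1) for a, v in m.items() if v > 0}


def edeow(m, frame_size):
    """Step 3: open-world extension of the Deng entropy."""
    shift = math.ceil(round(m.get(EMPTY, 0.0) * frame_size, 12))
    return -sum(
        v * math.log2(v / (2 ** (len(a) + shift) - 1))
        for a, v in m.items() if v > 0
    )


def gcr(m1, m2):
    """Step 5: generalized combination rule for two BPAs."""
    raw, k = {}, 0.0
    for b, v1 in m1.items():
        for c, v2 in m2.items():
            inter = b & c
            if inter:
                raw[inter] = raw.get(inter, 0.0) + v1 * v2
            else:
                k += v1 * v2
    m_empty = m1.get(EMPTY, 0.0) * m2.get(EMPTY, 0.0)
    if abs(1.0 - k) < 1e-12:
        return {EMPTY: 1.0}
    fused = {a: (1.0 - m_empty) * v / (1.0 - k) for a, v in raw.items()}
    if m_empty > 0:
        fused[EMPTY] = m_empty
    return fused


def fuse(bpas, frame_size):
    """Steps 2 to 5 for a list of sensor BPAs; returns (weights, result)."""
    uncertainty = [
        edeow(m, frame_size) + edeow(negate(m), frame_size) for m in bpas
    ]
    total = sum(uncertainty)
    # ASSUMPTION: weights proportional to the summed uncertainty.
    weights = [u / total for u in uncertainty]
    keys = set().union(*bpas)
    modified = {
        a: sum(w * m.get(a, 0.0) for w, m in zip(weights, bpas)) for a in keys
    }
    result = modified
    for _ in range(len(bpas) - 1):
        result = gcr(result, modified)
    return weights, result


# Hypothetical evidence from three sensors; the third one disagrees.
F1, F2 = frozenset({"F1"}), frozenset({"F2"})
sensors = [
    {F1: 0.6, F2: 0.1, EMPTY: 0.3},
    {F1: 0.7, F2: 0.2, EMPTY: 0.1},
    {F1: 0.1, F2: 0.8, EMPTY: 0.1},
]
weights, result = fuse(sensors, frame_size=3)
print(weights)
print(result)
```

Averaging before fusion is the design choice that dampens a single conflicting sensor: the outlier only shifts the modified BPA, instead of vetoing a hypothesis outright as it would in a direct pairwise combination.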

Example
In this section, an example from [46] is used to verify the effectiveness of the proposed method and to compare it with other existing methods.

Example Presentation
Reference [46] extracts the records from a database. Suppose there is a frame of discernment Θ = {$F_1$, $F_2$, $F_3$}, which represents three types of faults that may cause an electric shortage in a car: {low oil pressure, air leakage in the intake system, stuck solenoid valve}. Five sensors $S_1$, $S_2$, $S_3$, $S_4$, $S_5$ are placed in different positions to measure the system's exhaust volume, maximum power, maximum speed, and other parameters. Because the engine speed unexpectedly exceeded the upper limit, the sensor $S_5$ was destroyed and could not work normally, resulting in deviations in the measurement results. The purpose of this example is to determine which type of fault occurs in the automotive system using the method proposed in this paper.

Fusion through the Proposed Method
Step 1: after the sensors complete the parameter measurement, the central control system models the data obtained by the sensors as BPAs, which are shown in Table 1, where $m_i$ is the evidence modeling result from the sensor $S_i$ (i = 1, 2, 3, 4, 5). It should be noted that the data in [46] are in the closed world, while this paper studies problems under the open world assumption. The mass assigned to the universal set in the closed world represents unknown uncertainty, and m(∅) in the open world has the same meaning, covering both the universal set of the closed world and unknown elements. Therefore, this paper treats the BPA assigned to the universal set in the closed world as m(∅), so as to satisfy the open world condition.
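The conversion described in Step 1, in which the mass on the universal set is treated as m(∅), can be sketched as a small helper. The function name and the numerical values are hypothetical; the values are not those of Table 1.

```python
EMPTY = frozenset()


def to_open_world(m, frame):
    """Move the mass of the universal set onto the empty set."""
    universal = frozenset(frame)
    converted = {a: v for a, v in m.items() if a != universal}
    moved = m.get(universal, 0.0)
    if moved > 0:
        converted[EMPTY] = converted.get(EMPTY, 0.0) + moved
    return converted


# Hypothetical closed-world BPA (illustrative values only).
frame = {"F1", "F2", "F3"}
closed = {
    frozenset({"F1"}): 0.7,
    frozenset({"F2"}): 0.1,
    frozenset(frame): 0.2,
}
print(to_open_world(closed, frame))
```

The total mass is unchanged by the conversion; only the label attached to the unassigned belief changes from the whole frame to the empty set.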
Step 2: the negation of the BPA obtained by modeling the data of sensor $S_1$ is calculated according to Equation (12). The results for the remaining data, calculated by the same method, are shown in Table 2.
Step 3: the uncertainty of the original data is measured with Equation (10). The uncertainty of the negation of BPA obtained in Step 2 is calculated by the same method.
Step 4: based on the calculated weights, the BPA is modified according to Equation (14).
Step 5: the modified BPA is fused using Equation (15).

Analysis
Since the universal set in the closed world has the same meaning as m(∅) under the open world assumption, the BPA value assigned to the universal set in closed-world studies is treated as m(∅) to satisfy the open world condition. Table 3 and Figure 2 show the fusion results of the proposed method and other existing methods for this problem. Following the threshold ∆ = 0.70 set in [47], a method is considered to have successfully identified the fault type $F_i$ if m($F_i$) ≥ 0.70 in its fusion result. In this example, because the sensor $S_5$ was destroyed, the evidence modeled from $S_5$ conflicts with the other evidence and mistakenly assigns most of its belief to $F_3$. With this in mind, and combined with the results in Table 3, it can be concluded that $F_1$ is the fault that caused the shortage of power in the automotive system.
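The threshold test used in this analysis can be written as a short decision function. This is a minimal sketch: the function name is hypothetical, and the fused masses below are made-up values rather than the results in Table 3.

```python
def diagnose(fused, threshold=0.70):
    """Return the fault whose fused mass reaches the threshold, else None."""
    candidates = [a for a in fused if a]  # ignore the empty set
    if not candidates:
        return None
    best = max(candidates, key=lambda a: fused[a])
    return best if fused[best] >= threshold else None


# Hypothetical fusion results (illustrative values only).
decisive = {
    frozenset({"F1"}): 0.85,
    frozenset({"F2"}): 0.10,
    frozenset(): 0.05,
}
undecided = {frozenset({"F1"}): 0.5, frozenset({"F2"}): 0.5}
print(diagnose(decisive), diagnose(undecided))
```

Returning `None` when no hypothesis reaches the threshold separates a failed identification from a wrong one, which is the distinction the comparison between methods relies on.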
As shown in Table 3, the classic DS combination rules are affected when dealing with conflicting data, and most of the beliefs are incorrectly assigned to F 3 , which fails to solve the problem in this example. Sun

Application
In order to further test the effectiveness and applicability of the method proposed in this paper in solving the problem of fault diagnosis based on sensor data, the case in literature [50] is used as an example for diagnosis and verification in this section.

Problem Description
Suppose the motor rotor has three different types of faults, F = {rotor unbalance, rotor misalignment, pedestal loosening}. Three acceleration sensors installed at different positions are used to measure the vibration acceleration of the motor rotor. The amplitude of the vibration acceleration at different frequencies is defined as the fault feature variable.
In reference [50], the support and credibility of each sensor were calculated through the similarity matrix of the characteristic variables. After the Z-number was determined, the membership function corresponding to component A was matched with typical faults to obtain the BPA. The BPAs obtained by modeling the data collected by the sensors are shown in Table 4.
Table 4. Fault diagnosis data modeled as BPAs.

Implementation Steps
Step 1: in this example, three different frequencies, Freq1, Freq2, and Freq3, are used as fault feature variables for fault diagnosis. Since this case focuses on applying the improved incomplete information fusion method to fault diagnosis based on sensor data, the modeling of data as BPAs is outside the scope of this paper. Therefore, this section directly adopts the BPA data reported in [50], which are shown in Table 4.
Step 2: as described in Section 3, Equation (12) is used to calculate the negation of BPA. The results for Freq2 and Freq3, calculated according to Equation (12), are shown in Table 5.
Step 3: on the basis of the previous step, the EDEOW proposed by Tang et al. [41] is used to measure the uncertainty of the BPAs. The uncertainty measurement results at Freq2 and Freq3 are shown in Table 6. The uncertainty of the negation of BPA must also be measured. Using the same formula, the uncertainty of the negation of BPA at Freq1, Freq2, and Freq3 is calculated, and the results are shown in Table 7.
Step 4: the two uncertainties calculated in Step 3 are added to give the final uncertainty of each group of evidence, which is shown in Table 8. The final uncertainty is used to modify the data. The weight of each piece of data at the Freq1 vibration acceleration frequency is calculated according to Equation (13), and the weights at Freq2 and Freq3 are shown in Table 9. The modification results of BPA at the other frequencies are shown in Table 10.
Step 5: after the BPAs are modified, they are fused with the GCR shown in Equation (15); two fusion operations are required for the data at Freq1. The fusion results at all frequencies are calculated in the same way and shown in Table 11.
Table 12 shows the data fusion results of different methods in the fault diagnosis application, and Figure 3 compares them visually. As shown in Table 11, F2 has the highest belief at all test frequencies, which indicates that the fault diagnosis result is F2. According to Table 12 and Figure 3, the diagnosis results of the proposed method are consistent with those obtained by the other methods in the literature, which verifies the effectiveness of this method. In addition, compared with the methods in [41,50], the proposed method assigns a higher belief to the identified fault type F2, which is more conducive to the decision-making of relevant personnel in practical engineering applications.

Discussion
The incomplete information fusion method proposed in this paper performs well on the problem in this application. In addition, for the example in Section 4, the method diagnoses the fault effectively and with high accuracy. Therefore, the proposed method is applicable to fault diagnosis based on sensor data. The proposed method assumes that an unknown fault type may exist. It leaves room for unknown fault types by assigning belief to the empty set, which represents other potential fault types. More importantly, this does not adversely affect the identification of the known fault types.

Conclusions
To address incomplete information under the open world assumption, this paper proposes an incomplete information processing method based on the negation of BPA, EDEOW, and GCR. The negation of BPA proposed by Yin et al. is adopted to obtain more uncertain information, thus reducing information loss and improving information processing accuracy. The extension to the Deng entropy in the open world assumption (EDEOW) is an extended uncertainty measure based on the belief structure. The proposed method uses EDEOW to measure the uncertainty of both the original BPA and the negation of BPA, and the two measured uncertainties are added together as the final uncertainty. Based on this uncertainty, the weight of evidence is calculated and the data are modified, which makes the modification results more accurate. The generalized combination rule is an extension of the Dempster-Shafer theory to the open world. It overcomes the problem that the classical combination rule of the Dempster-Shafer evidence theory cannot be applied under the open world assumption, and it is used for the final fusion of the modified data in the proposed method. Compared with other methods, the information processing strategy constructed in this paper considers more uncertainties and makes the measurement results more accurate. At the same time, the EDEOW and GCR used in this method are both applicable to problems under the open world assumption, which makes up for the difficulty traditional methods have with incomplete information. Finally, the method is applied to fault diagnosis, where it is more conducive to the decision-making of engineers in practical applications.

Conflicts of Interest:
The authors declare no conflict of interest.