Sensing Attribute Weights: A Novel Basic Belief Assignment Method

Dempster–Shafer evidence theory is widely used in many soft sensors data fusion systems on account of its good performance for handling the uncertainty information of soft sensors. However, how to determine basic belief assignment (BBA) is still an open issue. The existing methods to determine BBA do not consider the reliability of each attribute; at the same time, they cannot effectively determine BBA in the open world. In this paper, based on attribute weights, a novel method to determine BBA is proposed not only in the closed world, but also in the open world. The Gaussian model of each attribute is built using the training samples firstly. Second, the similarity between the test sample and the attribute model is measured based on the Gaussian membership functions. Then, the attribute weights are generated using the overlap degree among the classes. Finally, BBA is determined according to the sensed attribute weights. Several examples with small datasets show the validity of the proposed method.


Introduction
Data fusion technology of soft sensors [1][2][3] which integrates multi-source information to obtain a more objective result, is widely used in many application systems [4][5][6]. Some algorithms, such as fuzzy sets theory [7][8][9][10], Dempster-Shafer evidence theory (D-S evidence theory) [11,12], Z numbers [13,14], D numbers [15][16][17] and neural networks [18][19][20][21][22][23][24], are important tools in the data fusion of soft sensors. As one of them, D-S evidence theory gives a convenient mathematical framework to handle uncertainty information, so it is widely used in many fields [25][26][27]. However, determination of basic belief assignment (BBA) is the first and key step for applying D-S evidence theory. In general, there is no fixed model to obtain BBA. The method is usually designed according to the practical application.
Many scholars have investigated different methods to address the problem of obtaining BBA. Yager [28] proposed the D-S belief structure to determine BBA using the whole class of fuzzy measures. Zhu et al. [29] used fuzzy c-means to obtain BBA. Dubois et al. [30] proposed a probability-possibility transformation to gain BBA. Baudrit and Dubois [31] proposed practical representation methods for incomplete probabilistic information, based on formal links existing between possibility theory, imprecise probability and belief functions. Wang et al. [32] derived BBA function from the common multivariate data spaces. Masson and Denoeux [33] constructed a possibility distribution from a discrete empirical frequency distribution. Antoine et al. [34] constrained evidential clustering with instance-level constraints to determine BBA. Campos and Huete [35] determined BBA by considering the problem of assessing numerical values of possibility distributions.
Although various methods are proposed, these methods mainly focus on the incompleteness or the uncertainty of information itself, which does not simultaneously consider the importance and [11,12] D-S evidence theory was introduced by Dempster. Then Shafer developed it by defining the belief function and plausibility function. It has the ability to deal with uncertain information without a prior probability. Thus, it is flexible and more effective than probability theory. Due to these advantages, it is widely applied in many fields [44][45][46].

Frame of Discernment (FoD) and Mass Function
In D-S evidence theory, the frame of discernment (FoD) is defined as Ω = {H 1 , H 2 , · · · , H N }, which is composed of N exhaustive and exclusive hypotheses. Denote P(Ω), the power set which is composed with 2 Ω propositions as: where the propositions {H 1 }, {H 2 }, · · · , {H N } have only one element, so they are defined as the singleton subsect proposition; the propositions {H 1 ∪ H 2 }, {H 1 ∪ H 3 }, · · · , Ω have more than one element, so they are defined as the compound subsect proposition. A mass function m is a mapping from 2 Ω to [0, 1], formally defined as which satisfies the following conditions: The mass function m is also called BBA function. Any subset A of Ω such that m(A) > 0 is called a focal element.

Dempster's Combination Rule
Suppose m 1 and m 2 are two mass functions in the same FoD Ω. Dempster's combination rule, noted by m = m 1 ⊕ m 2 , is defined as follows: where Here, k is regarded as a measure of conflict between m 1 and m 2 . The larger the value of k, the more conflict between the evidence.

Generalized Evidence Theory
Generalized evidence theory (GET) is presented by Deng, which can handle the uncertain information in the open world [42].

Generalized Basic Belief Assignment
Suppose that U is a FoD in the open world. Its power set, 2 U G , is composed of 2 U propositions. A mass function is a mapping m G : where m G is GBBA of the U. The difference between GBBA and classical BBA is the restriction of ∅. Note that m G (∅) = 0 is not necessary for GBBA, namely, the empty set can also be a focal element. GBBA will degenerate to classical BBA when m(∅) = 0.

Generalized Combination Rule
In GET, ∅ 1 ∩ ∅ 2 = ∅ is assigned to conflict coefficient K. Given two GBBAs (m 1 and m 2 ), the GCR is defined as follows:

Modified Generalized Combination Rule
The mGCR is presented by Jiang et al., where the intersection ∅ = ∅ 1 ∩ ∅ 2 is considered as support to ∅, namely, the orthogonal sum of m 1 (∅) and m 2 (∅) is normalized like other focal elements. Given two GBBAs, the mGCR is defined as follows: The mGCR inherits all the advantages and properties of the original GCR.

The Proposed Method
To obtain an objective classification, the key step is how to gain an effective BBA. As can be seen in Figure 1, a new method to determine BBA is presented. First, the dataset is divided into two parts, where a part is taken as the training set and another part is taken as the test set. Then the Gaussian models of k attributes are built using the training set, and the attribute models are tested by the test set to produce the similarity. Finally, the weight of each attribute is constructed based on the overlap degree of each class, which is used to modify the similarity and then to determine BBA.

The Modeling of Each Attribute
The Gaussian distribution is the common probability distribution in statistics, which is easy to analyze. Therefore, the membership function of Gaussian type is adopted to build the attribute models in this paper.
Suppose in the frame of discernment (FoD) Θ = {θ 1 , θ 2 , · · · , θ n }, each class θ i (i = 1, 2, · · · , n) has j (j = 1, 2, · · · , k) attributes. The mean value X ij and the standard deviation σ ij of all the training samples in class θ i are calculated as follows: where x ijl is the sample value of the jth attribute from the lth training sample in class θ i .
Hence, the corresponding attribute model of Gaussian type is obtained as follows: The real-world sensor data usually has stochastic nature. To make the results more objective and credible, the training samples from each class are randomly selected to build the attribute model. For each attribute, n membership functions of Gaussian type are obtained as models of different class in the specific attribute.

The Construction of the Singleton Subsect and Compound Subsect
In D-S evidence theory, BBA has two forms. One is the singleton subsect, such as {A}, which represents a certain proposition. The other is the compound subsect, such as {AB}, which represents a kind of uncertain proposition. Namely, it indicates that {A} proposition or {B} proposition occurs, and it is unknown how to assign belief in {A} and {B}. For instance, in the case of motor rotor fault diagnosis, there are three faults: A, B and C, which represent the unbalance, misalignment and pedestal looseness respectively. BBA from a sensor is m({A}) = 0.55, m({AC}) = 0.45. Where {A} represents that the fault of the motor rotor is the unbalance, which is a singleton subsect proposition; {AC} represents that the fault of the motor rotor is the unbalance or pedestal looseness, which is a compound subsect proposition.
In this paper, the proposed method can automatically generate BBA of the singleton subsect and the compound subsect. This section mainly introduces how to construct the singleton subsect proposition and the compound subsect proposition. As shown in Figure 2, an attribute model of Gaussian type is denoted as a singleton subsect proposition, where µ A (x) and µ B (x) are all singleton subsect propositions. The compound subsect proposition is constructed by the overlap area of some membership functions of Gaussian type. The compound subsect proposition {AB} is noted as follows: where w = sup min(µ A (x), µ B (x)).

The Measurement of Similarity
In this paper, a nested structure similarity function sim is defined to represent the matching degree between the test sample and the attribute model. This structure can avoid the high conflict, to some extent. The similarity is measured using the following regulations: (1) If there is only one intersection between the test sample and the attribute models, then this intersection is assigned to the corresponding singleton subset as the similarity between this test sample and the corresponding singleton subset model; (2) If there is more than one intersection between the test sample and the attribute models, then the top value of the intersections is assigned to the corresponding singleton subset as the similarity; the low value of the intersections is assigned to the corresponding compound subset as the similarity.
In the actual applications, the test model is covered by a different number of test data. For example, in the dataset, the test model is usually covered by one test data which, on account of its attribute value, is a fixed value, such as Iris. So this test model is the discrete value, as is the case for the test samples x 0 and x 1 which are shown in Figure 3. However, in a complex system, the attribute value is not a fixed value, for example, sensor data is not constant in sensor networks and is thus easily affected by the external environment. In such a case, the attribute is measured repeatedly by the sensor and the sensor data will be a set of values. Thus, this test model is covered by multiple test data, and is fuzzed into the membership function of Gaussian type, as is the case for the test models µ x 0 and µ x 1 which are shown in Figure 4. As shown in the analysis above, under the different application backgrounds, the measurement of the similarity is slightly different.
If the test model is covered by one test data, as shown in Figure 3a, for the test sample x 0 , the similarity is measured by the above regulations (1): where µ B is the membership functions of the singleton subsect propositions {B}. x 0 has no intersection with µ A and µ AB , so sim(A) = sim(AB) = 0.
As shown in Figure 3b, for the test sample x 1 , the similarity is measured by the above regulations (2): where µ B is the membership functions of the singleton subsect propositions {B}, and µ AB is the membership functions of the compound subsect propositions {AB}. Although x 1 has an intersection with µ A , this intersection is assigned to the compound subsect {AB} since it simultaneously locates in the overlapping portions between µ A and µ B . Based on the above regulations (2), sim(A) = 0 and sim(AB) = a 2 . If the test model is covered by multiple test data, then the model is first constructed using the membership function of Gaussian type, which is similar to the model of training sets. Then, as shown in Figure 4a, for the test model µ x 0 , the similarity is measured by the above regulations (1): where µ x 0 has no intersection with µ A and µ AB , so sim(B) = sim(AB) = 0.
As shown in Figure 4b, for the test model µ x 1 , the similarity is measured by the above regulations (2): where although µ x 1 has an intersection with µ B , this intersection is assigned to the compound subsect {AB} since it simultaneously locates in the overlapping portions between µ A and µ B . Based on the regulations (2), sim(B) = 0 and sim(AB) = a 2 .

The Construction of Attribute Weights
In the multiclass classification, a comprehensive evaluation of each attribute needs to be made to obtain an objective result [47][48][49]. Given an attribute, if the similarity of some classes is high, that is, the overlap degree of the attribute models of these classes is large, then the ability to discriminate the difference of these classes of this attribute is weak. In this case, a false classification may easily occur based on this attribute, so the reliability of this attribute is low. Further, BBA generated from this attribute has a smaller contribution in multiclass classification. On the contrary, if the similarity of some classes is low, then the ability to discriminate the difference of these classes of this attribute is fine, so the reliability of this attribute is high. Further, BBA generated from this attribute has a larger contribution in multiclass classification. Thus, the effect of the attributes which provide a larger contribution should be enhanced and the effect of the attributes which provide a smaller contribution should be weakened to obtain more objective results. As analyzed above, the attribute weights are proposed and taken into account in this paper.
Suppose µ ij (i = 1, 2, · · · , n; j = 1, 2, · · · , k) is the membership function of the jth attribute of the ith class; µ * rj (r = 1, 2, · · · , n 2 −n 2 ; j = 1, 2, · · · , k) is the generalized triangular fuzzy number model of the rth compound subsect proposition {AB} in the jth attribute; S(x) is the area of the x. Then the attribute weight w j (j = 1, 2, · · · , k) is proposed as follows: where w j reveals that the larger the overlap degree, the larger the similarity, and the smaller the attribute weight.

The Construction of BBA
In D-S evidence theory, the closed world means that the FoD is complete; the open world means that the FoD is incomplete, namely, unknown propositions potentially exist outside of the given FoD. In the closed world, the FoD is complete, so the redundant BBA should be assigned to the universal set Θ, which is gained after using the attribute weights to correct the similarity sim. However, in the open world, an unknown proposition exists. In GET, BBA of the unknown proposition is determined based on the known propositions. Then, the attribute weights are also constructed by the model of the known propositions and used to correct the similarity of the known propositions. Finally, the redundant BBA is assigned to the universal set Θ. According to the analysis above, the determination of BBA in the closed world and the open world is proposed respectively.

The Determination of BBA in the Closed World
In the closed world, BBA is determined by three steps. The similarity sim is generated in the first step, which is described in Section 3.2. In the second step, if ∑ sim > 1, the similarity is normalized; if ∑ sim < 1, the similarity is not normalized. The FoD is complete in the closed world, so where sim(∅) = 0 is necessary. In the third step, BBA of each proposition is obtained as follows: where w j is denoted as the jth attribute's weight, and sim j is denoted as the similarity function of the jth attribute.
For example, in the frame of discernment Θ = {a, b, c}, the similarity function sim 2 of attribute 2 is sim 2 (a) = 0.6, sim 2 (ab) = 0.2 and the attribute weigh w 2 = 0.7, then BBA is determined as follows:

The Determination of BBA in the Open World
In the open world, BBA is also determined by three steps. The similarity sim is generated in the first step, which is described in Section 3.2. The FoD is incomplete in the open world, so where sim(∅) = 0 is not necessary. Hence, in the second step, if ∑ sim > 1, then the similarity is normalized and the similarity of the empty set sim(∅) = 0; if ∑ sim < 1, then the similarity is not normalized and the similarity of the empty set sim(∅) = 1 − ∑ sim. In the third step, BBA of each proposition is obtained as follows: where w j is denoted as the jth attribute's weight, and sim j is denoted as the similarity function of the jth attribute.
For example, in the frame of discernment Θ = {a, b}, the similarity function sim 1 of attribute 1 is sim 1 (a) = 0.6, sim 1 (ab) = 0.2 and the attribute weigh w 1 = 0.8, then BBA is determined as follows. Firstly, the similarity of the empty set is obtained as: Then BBA of each proposition is obtained as: After determining BBA, BBAs of k attributes are combined k − 1 times with the mGCR and the final recognized result can be gained.

Application Example
In this section, several experiments are performed both in the closed world and the open world respectively, to evaluate the validity and reasonability of the presented method. To better evaluate this method, an engineering example of fault diagnosis of a motor rotor is carried out. Note that the numbers from the following calculations are all unified as the four significant digits.
In addition, since the real-world data usually has stochastic nature, it is less advisable that the data with the highest recognition rate is selected for the experiment. Hence, in the following experiment, the training set and the test set are randomly selected, which makes the results more objective and credible.

An Example of Iris
The Iris dataset [50,51] contains three classes: Iris Setosa(S), Iris Versicolour(E) and Iris Virginica(V). Each class has 50 samples, and each sample contains four attributes: sepal length(SL), sepal width(SW), petal length(PL) and petal width(PW), respectively. Each of the four attributes is treated as an information source, and correspondingly there are three training sets and three test sets which are the same for the sample number. The data is obtained from the UCI repository of machine learning databases. (UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Iris.)

Experiment in the Closed World
Thirty samples are randomly selected as training samples from three classes respectively, and the remaining 20 samples of each of the three classes are taken as the test samples. Then, the experiment is carried out as follows.
Step 1: According to Section 3.1, the attribute models of each attribute of the training samples are obtained and shown in Figure 5.
Step 2: A test sample (4.6, 3.1, 1.5, 0.2) is selected from the test set of S. Then, based on Section 3.2, the similarity between this test sample and each attribute model is obtained, which is shown in Figure 5 and Table 1.
From Table 1, we can see that the similarity of the SW attribute does not coincide with the reality. This is because the SW attributes of the three classes are mutually overlapping, which is not easy to distinguish in this case.  Step 3: According to Equation (13), the attribute weights of the four attributes are obtained as follows:

Similarity sim(S) sim(E) sim(V) sim(SE) sim(SV) sim(EV) sim(SEV
Step 4: Based on Section 3.4, BBAs are determined as shown in Table 2.  From the final BBA, it is clear that the test sample is classified into S class which coincides with the reality.

Experiment in the Open World
To verify the presented method's performance in the open world, the training samples are randomly selected only from two Iris classes which are randomly selected from the three classes, and the remaining class is taken as the unknown class. The test samples are randomly selected from each of the three classes. In the Iris dataset, each class has 50 samples. In this paper, 30 samples of two classes are randomly selected as training samples from S and E respectively; the remaining 20 samples of each of these two classes and the 20 samples which are randomly selected from V, are taken as the test samples. Then, the experiment is carried out as follows.
Step 1: According to Section 3.1, the attribute model of each attribute of the training samples is obtained and shown in Figure 6. Step 2: A test sample (5.5, 4.2, 1.4, 0.2) is selected from the test set of S. Then, based on Section 3.2, the similarity between this test sample and each attribute model is obtained, which is shown in Figure 6 and Table 3. From Table 3, we can see that the similarity of the PL attribute and the PW attribute coincides with the reality, but the similarity of the SL attribute and the SW attribute does not coincide with the reality. This is because the classes are not easy to distinguish if the overlap degree among the classes in the SL attribute and the SW attribute is large. Hence, the attribute weights are necessary when determining BBA.
Step 3: According to Equation (13), the attribute weights of four attributes are obtained as follows: Step 4: Based on Section 3.4, BBAs are determined as shown in Table 4. Similarly, a test sample (6.8,3.0,5.5,2.1) is selected from the test set of unknown class (V), then the final BBA is obtained as follows: where m(∅) = 1 indicates that the possibility of this test sample outside the FoD is very large, which coincides with the reality.

Experiments on Three Datasets: Five-Fold Cross-Validation
To further evaluate the proposed method, this method is compared with three well-known classifiers: support vector machine with radial basis function (SVM-RBF), decision tree learner (REPTree) and Naive Bayesian (NB) both in the closed world and the open world.
Three datasets are experimented in this section, which are obtained from the UCI repository of machine learning databases [52]. (UCI Machine Learning Repository: http://archive.ics.uci.edu/ ml/datasets.) Within this, the Iris dataset is perhaps the best known database found in the pattern recognition literature. The dataset contains three classes of 50 instances each, where each class refers to a type of iris plant. Each class contains four attributes. The Seeds dataset contains three different varieties of wheat: Kama (K), Rosa (R) and Canadian (C). Each variety has 70 samples, and each sample contains seven attributes. The Wine dataset was the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine. A summary of these datasets is shown in Table 5. The comparison results of five-fold cross-validation are shown in Tables 6-8 respectively.  From the above experimental results, it is found that: (1) The results in the closed world Both the proposed method and the classical machine learning algorithms are efficient in the closed world. From the first four lines of Tables 6-8, we can find that the proposed method obtains competitive performances with respect to machine learning algorithms. For example, in the first four lines of Table 7, the average recognition rate of Seeds of our method is 90.57%, NB is 78.09%, REPTree is 89.49% and SVM-RBF is 90.21%.
(2) The results in the open world From the latter 12 lines of Tables 6-8, we can find that the average recognition rate of the empty set ∅ is 0 by the classical machine learning classifiers. It means that the "unknown" class is definitely misclassified by the classical machine learning classifiers. These standard machine learning classification algorithms cannot work in an open world environment.
In contrast, the "unknown" class in an open world environment can be classified by the proposed method. For example, in Table 8 In contrast, our method is still effective in an open world, which is the advantage of our method compared with the classical machine learning algorithms.

An Example of Fault Diagnosis
Suppose there are three types of fault in a motor rotor, which are noted as F = {F 1 , F 2 , F 3 } = {rotorunbalance, rotormisalignment, Pedestallooseness}. Three vibration acceleration sensors and a vibration displacement sensor are placed in different installation positions to collect the vibration signal. Vibration displacement and acceleration vibration frequency amplitudes at the frequencies of 1X, 2X and 3X are taken as the fault feature variables. At the same time interval, each fault feature of each fault is continuously observed 40 times which is taken as a group of observations. In this paper, a total of five groups are measured in each fault feature of each fault. For example, five groups of observations of the fault feature variables of 1X of the rotor unbalance are shown in Table 9.
The example is carried out by the proposed method and three well-known classifiers: support vector machine with radial basis function (SVM-RBF), decision tree learner (REPTree) and Naive Bayesian (NB), respectively. The comparison results of the five-fold cross-validation are shown in Table 10. Table 9. Five groups of observations of 1X of the rotor unbalance [53].  The above experimental results demonstrate the validity of the proposed method. The advantages of the proposed method are discussed and concluded as follows: • The weights of the attributes have been considered in the proposed BBA generation method.
Because of that, the generated BBA could reflect the difference of the importance between different attributes which have different classification ability. If the classification ability of each attribute is similar, then the result obtained by using the proposed method is basically identical with such a method that does not consider attributes' weights. However, if attributes have different classification ability, then the proposed method has better performance since it has considered the weight of each attribute. • The weight of each attribute is totally determined based on the discrimination of the attribute, but without relying on other information. In many existing approaches, the weights of attributes are either from an externally given credibility of sensors or mutual support degree of different attributes. In this paper, the weight of every attribute is determined just by the attribute's discrimination, before the amendment of the generated BBA is implemented. This approach is completely a data-driven solution, which makes the results more objective and credible. • The proposed BBA generation method is efficient simultaneously in the closed world and open world. At present, many existing BBA generation approaches only consider the situation of the closed world. In contrast, by introducing generalized evidence theory, the incomplete information about the frame of discernment is taken into consideration in the initial phase of the modelling of uncertain information. Therefore, it is more reasonable and realistic. • The proposed method is simple and can be easily used in many practical applications. In addition, it is based on the normal distribution assumption and can be easily changed to other forms to reflect the feature of training data much more realistically. Therefore, the proposed method is flexible and easily extensible.
However, compared to the traditional BBA method, namely the method that does not consider the attribute weights, the approach of weighted BBA has complicated calculations. However, considering that the performance is better after using the attribute weights, the increased computational complexity is acceptable to some extent.

Conclusions
In the application of soft sensors data fusion, based on evidence theory, how to determine BBA is an open issue. In this paper, a new method is presented to determine BBA in a closed or open world environment. Within the proposed BBA determination method, all data are separated into a training set and test set, and the attributes of each sample are seen as soft sensors. At first, according to the training samples, each attribute's Gaussian model is built. Then, for every test sample, by considering the similarity and the attribute weights, a BBA is derived to express the test sample. Finally, according to the derived BBA, the class of the corresponding test sample is determined.
The advantages of the proposed BBA determination method include the following aspects. At first, the weights of attributes, which reflect the reliability of information sources, have been considered in the process of BBA generation. Moreover, it is compatible with the method which does not consider the attribute weights. Second, since the generalized evidence theory has been used in this work, the proposed method is efficient not only in the closed world but also in the open world. Many previous BBA determination methods have only been able to deal with the situation of the closed world. Third, the whole process is completely data-driven, which makes the results more objective. The proposed method can be easily used in many practical applications, for example, classification and fault diagnosis. A number of simulation experiments show that the proposed method is effective. However, some shortcomings still exist in the proposed method. Mainly, the amount of computation of the proposed method is increased since the weights of attributes have been considered. Besides, the proposed method relies on a proper training set, which requires a certain size of data samples.
In the future, we will research further the following problems. Firstly, we will try to apply the proposed method to more practical applications, and import a pre-judging procedure to determine whether the attribute weights need to be considered to reduce the computational complexity. Furthermore, the possible dependencies among the attributes of the actual targets will be considered to improved the proposed method.