Novel Paradigm for Constructing Masses in Dempster-Shafer Evidence Theory for Wireless Sensor Network's Multisource Data Fusion

Dempster-Shafer evidence theory (DSET) is a flexible and popular paradigm for multisource data fusion in wireless sensor networks (WSNs). This paper presents a novel, easily implemented method for computing masses from the hundreds of pieces of data collected by a WSN. The transfer model is based on the Mahalanobis distance (MD), an effective measure of the similarity between an object and a sample. Compared with existing methods, the proposed method takes the statistical features of the observed data into account and transfers multi-dimensional data to belief assignments correctly and effectively. The main processes of the proposed method, namely the calculation of the intersection classes of the power set and the algorithm mapping MDs to masses, are described in detail. Experimental results in transformer fault diagnosis show that the proposed method is highly accurate in constructing masses from multidimensional data for DSET. The results also show that higher-dimensional data bring higher accuracy in transferring data to masses.


Introduction
Multi-sensor data fusion is a technology that combines information from multiple sources to obtain a unified picture [1]. In wireless sensor networks (WSNs), data fusion is a useful way to reduce or eliminate the uncertainty of decisions when dealing with information from different sources. It is widely used in state estimation problems [2], pattern recognition [3], robotics [4], and medical imaging [5]. Different theories have been proposed for multisource data fusion, such as the Bayesian approach, Dempster-Shafer evidence theory (DSET) [6], fuzzy set theory [7], and rough set theory [8].
In a WSN, hundreds of pieces of data with different properties are collected by the nodes. To achieve a reasonable result, the theory used in this application must be good at turning these large amounts of heterogeneous data into a unified result. DSET is an efficient way to deal with the uncertainty and imprecision of information [9], and its fusion framework can combine different pieces of information into one, which makes it a flexible method for WSN multisource information fusion. The mass function, also called the basic belief assignment (BBA) function, is a prerequisite for applying DS theory in practice. However, there is no fixed model for obtaining masses in DSET. Hence, how to use hundreds of pieces of multisource data to construct the masses for DSET is the first problem to be solved, and a good, efficient paradigm for constructing an evidence structure is vital for reaching accurate conclusions from the collected information. Consider a classification problem with three possible results; the commonly used belief assignment method is shown in Figure 1. In Figure 1, the X axis stands for the observed data, the Y axis is the belief assignment, and the three possible patterns are A, B and C. The observed data are transferred to a belief assignment (mass) according to the interval to which they belong. This method is easy to implement, but its accuracy is low, for two reasons. (1) The method ignores the statistical features of the observed data. Sample sets differ not only in their distribution intervals but also in their mean values and standard deviations: even if the sample data sets of A and B occupy the same interval, their statistical features are still different. From this perspective, the common method cannot obtain the correct mass of the observed data. (2) The observed data are usually organized in a multi-dimensional pattern.
For example, a sensor can monitor temperature and humidity at the same time, so its data are presented as (T, H), where T and H are temperature and humidity, respectively. How to calculate the belief assignment from multi-dimensional data is therefore another problem. These two problems motivate the proposed method for transferring multi-dimensional data to masses for DSET.
In this paper, the Mahalanobis distance (MD) is used to measure the similarity between an object and a class: a long MD corresponds to a low belief assignment, and a short MD to a high one. Unlike the Euclidean distance (ED), MD measures distance relative to the covariance of the data. It is unaffected by the measurement scale (units) of each variable, and it is a more appropriate measure of the similarity between an observed object and a class than ED, because it takes the samples' statistical features, including the mean value and covariance, into account. Transferring MDs to masses solves the two main problems of the common method. In addition, masses can also be calculated with MD for compound classes, i.e., the mixed classes of the power set in DSET. The proposed method consists of three main steps: first, the intersection classes of the power set are calculated; second, the MDs between the object and the subset samples are computed; and finally, the MDs are mapped to masses. Experimental results are presented to verify the performance of the proposed method, which is applied to transformer fault diagnosis to construct the masses of diagnostic evidence from data collected inside the transformer. The results show that the proposed method is highly accurate in constructing masses for DSET, especially with high-dimensional data.
The remainder of this paper is organized as follows: Section 2 reviews related work. In Section 3, the MD method and DSET are briefly introduced, and the mechanism for transforming data into masses in DSET with the MD-mass method is developed in Section 4. Section 5 presents the implementation scheme of the proposed algorithm. Section 6 describes the transformer fault diagnosis experiment and its results. Finally, discussions and conclusions are presented in Section 7.

Related Work
The Mahalanobis distance (MD) is a useful measure of the similarity between samples [10] and is used in many fields, including statistics [11], pattern recognition [12], and manufacturing control [13]. In this paper, we focus on belief transfer models. Apart from the common method described in Section 1, many belief assignment functions have been developed to obtain masses from observed data, and they have been shown to be reasonable in certain applications. Chakeri developed a method based on fuzzy C-means to obtain masses [14]; it is good at obtaining belief assignments from imprecise information. Szlzenstein put forth an iterative estimation method based on a Gaussian model [15]. In [16], a scheme for constructing an evidence structure with an artificial neural network (ANN) is proposed; it is good at dealing with large-scale data in applications such as image processing. In [17], Xin developed three methods to construct the BBA function, based on gray correlation analysis, fuzzy sets, and attribute measures, respectively; they have been shown to be reasonable in converting data from different sources into masses. Other efforts have addressed this issue with different methods and theories, such as fuzzy entropy [18] and automatic thresholding [19].
The research on BBA functions can be summarized as follows: (1) different transfer functions have been developed for different specific applications, such as pattern recognition and image processing, and there is no unified framework suitable for all conditions; (2) the existing methods are not suitable for WSN multisource data fusion because they ignore statistical features, which help to obtain a more correct belief assignment; (3) many of them, such as the ANN method, cannot be implemented in sensor nodes because of their computational complexity.

Preliminaries
In this section, the basic theories related to the proposed method are introduced: DSET, the Closed World Assumption and the Open World Assumption, and the Mahalanobis distance.

DSET
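DSET [20] is an extension of classical probability theory and a good strategy for dealing with conflict and imprecision in multisource data fusion. Given an object X, let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ be the frame of discernment (the set of all possible classes) and $2^{\Theta}$ its power set. A mass function is a mapping $m: 2^{\Theta} \to [0, 1]$ satisfying

$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1$$

where $\emptyset$ denotes the null set and m(A) is called the basic belief assignment of A, a subset of $\Theta$. The numerical value of m(A) represents the degree of trust in exactly the set A. A subset A with non-zero mass is called a focal element. The structure composed of the focal elements and their masses is called an evidence structure, expressed as:

$$\{(A, m(A)) \mid A \subseteq \Theta,\ m(A) > 0\}$$

We call (A, m(A)) a piece of evidence. There are two types of evidence: singletons and compound sets. The above process represents evidence by focal elements. In DSET, the impact of evidence on proposition A is described by two measures, belief and plausibility, defined as follows:

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \qquad \mathrm{Pls}(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - \mathrm{Bel}(\bar{A})$$

where Bel(A), Pls(A) and Bel($\bar{A}$) denote the belief function, plausibility function, and dubiety function, respectively. It is apparent that $\mathrm{Bel}(A) \le \mathrm{Pls}(A)$. After the evidence structure is obtained, a combination rule can be used to fuse all the independent evidence into one. Dempster's combination rule is expressed as:

$$m(A) = (m_1 \oplus m_2 \oplus \cdots \oplus m_n)(A) = \frac{1}{1 - K} \sum_{\bigcap_i A_i = A} \prod_{i=1}^{n} m_i(A_i), \quad A \neq \emptyset$$

$$K = \sum_{\bigcap_i A_i = \emptyset} \prod_{i=1}^{n} m_i(A_i)$$

where $\oplus$ is the combination operator and $A_i$ designates a focal element of data source i.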
K indicates the degree of conflict among the sources to be combined. After combination, a pignistic probability can be computed with the following expression [21]:

$$\mathrm{BetP}(\theta_i) = \sum_{A \subseteq \Theta,\ \theta_i \in A} \frac{m(A)}{|A|\,\bigl(1 - m(\emptyset)\bigr)}$$

$\mathrm{BetP}(\theta_i)$ is called the pignistic probability obtained from the final evidence structure. A decision can then be made by choosing the class with the maximum pignistic probability as the result of the fusion process.
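The definitions above can be made concrete with a short sketch. The Python below is a minimal illustration, not the paper's implementation: it represents a mass function as a dict from `frozenset` subsets to masses, and the helper names (`bel`, `pls`, `dempster`, `combine_all`, `betp`) are hypothetical. It computes Bel and Pls, applies Dempster's rule to two sources while reporting the conflict K, folds the rule over n sources (the normalized rule is associative), and computes the pignistic probability of a singleton class.

```python
from functools import reduce
from itertools import product

# A mass function is a dict: frozenset (subset of the frame) -> mass.

def bel(m, A):
    """Bel(A): total mass of the non-empty focal elements contained in A."""
    return sum(v for B, v in m.items() if B and B <= A)

def pls(m, A):
    """Pls(A): total mass of the focal elements that intersect A."""
    return sum(v for B, v in m.items() if B & A)

def dempster(m1, m2):
    """Dempster's rule for two sources; returns (combined masses, conflict K)."""
    combined, conflict = {}, 0.0
    for (B, v1), (C, v2) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # product mass that falls on the null set
    if conflict >= 1.0:
        raise ValueError("K = 1: the sources are in total conflict")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict

def combine_all(masses):
    """Fold Dempster's rule over n sources, two at a time."""
    return reduce(lambda acc, m: dempster(acc, m)[0], masses)

def betp(m, theta):
    """Pignistic probability of the singleton class theta."""
    empty = m.get(frozenset(), 0.0)
    return sum(v / (len(A) * (1.0 - empty)) for A, v in m.items() if theta in A)

if __name__ == "__main__":
    a, ab, b, frame = frozenset("a"), frozenset("ab"), frozenset("b"), frozenset("abc")
    m12, K = dempster({a: 0.6, ab: 0.4}, {b: 0.5, frame: 0.5})
    print(K, m12)
    print({t: betp(m12, t) for t in "abc"})
```

In the example, the focal elements {a} and {b} are disjoint, so their product 0.6 × 0.5 = 0.3 becomes the conflict K and the remaining products are rescaled by 1/(1 − K).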

Closed World Assumption and Open World Assumption
When the truth of a proposition is uncertain, the Closed World Assumption regards the proposition as false; in contrast, the Open World Assumption regards it as unknown. For example, given the known condition "Juan is a Boston citizen," consider proposition A: "Juan is a citizen of New York." Under the Closed World Assumption, A is false, while the Open World Assumption regards A as unknown, because Juan may be a New York citizen, though the possibility is low.
In short, the Closed World Assumption is applicable in an environment where all the conditions are known. When there are unknown conditions, the Open World Assumption can be adopted. In DSET, the mass of the null set must equal 0, which corresponds to the Closed World Assumption. In the Transferable Belief Model (TBM) [21], $m(\emptyset) > 0$ is allowed, which agrees with the Open World Assumption. The TBM extends the scope of DSET, and our mass allocation strategy likewise has a Closed World variant and an Open World variant.

Mahalanobis Distance
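Let X be an n × p data matrix containing n objects measured on p variables, and let $\bar{x} = (\bar{x}_1, \ldots, \bar{x}_p)$ be the 1 × p vector of the variables' mean values. Let $\sigma_k$ denote the standard deviation of the k-th variable and $\rho_{kl}$ the Pearson correlation coefficient between the k-th and l-th variables. The variance-covariance matrix of X can then be expressed as:

$$\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}$$

with:

$$\sigma_{kl} = \rho_{kl}\,\sigma_k\,\sigma_l$$

where $x_k$ and $x_l$ denote the k-th and l-th variables (columns) of X, so that $\sigma_{kk} = \sigma_k^2$ is the variance of $x_k$. The MD between an object x (a 1 × p vector) and the sample set X is then:

$$MD(x) = \sqrt{(x - \bar{x})\,\Sigma^{-1}\,(x - \bar{x})^{T}}$$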
MD thus measures the similarity between an object and a sample set through the covariance of the samples. Figure 2 illustrates the difference between ED and MD. The scattered points are sample points, and their center points are O1 and O2. In Figure 2a, the circles represent equal EDs to center point O1, so points A and B are equally close to O1 because they have the same ED to it. The situation changes in Figure 2b, where the contours represent equal MDs to center point O2. Unlike ED, MD is not a plain spatial distance but a distance scaled by the covariance of the samples, so points A and C are equally close to O2. In reality, the distribution of objects is rarely a circle and is more like an ellipse, which makes MD a more accurate and effective measure of similarity than ED.
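The MD definition and the ED-versus-MD comparison can be checked numerically. The following is a minimal sketch, assuming NumPy and the unbiased (n − 1) covariance estimate returned by `np.cov`; the paper does not state which estimator it uses, and the function name `mahalanobis` and the sample points are illustrative.

```python
import numpy as np

def mahalanobis(x, samples):
    """MD between object x (length-p vector) and a sample set (n x p)."""
    X = np.asarray(samples, dtype=float)
    if X.ndim == 1:                                # one-dimensional data: n x 1 matrix
        X = X[:, None]
    mean = X.mean(axis=0)                          # vector of variable means
    cov = np.atleast_2d(np.cov(X, rowvar=False))   # p x p variance-covariance matrix
    diff = np.atleast_1d(np.asarray(x, dtype=float)) - mean
    # Solve cov @ y = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

if __name__ == "__main__":
    # Samples spread widely along the first axis and narrowly along the second.
    samples = [(-2.0, 0.0), (2.0, 0.0), (0.0, -0.5), (0.0, 0.5)]
    A, B = (2.0, 0.0), (0.0, 2.0)
    # A and B have the same Euclidean distance (2) to the sample mean (0, 0) ...
    print(np.linalg.norm(A), np.linalg.norm(B))
    # ... but B is much less similar to the samples in the Mahalanobis sense.
    print(mahalanobis(A, samples), mahalanobis(B, samples))
```

Here A lies along the direction in which the samples vary widely, so its MD is small, while B lies along the narrow direction and gets a large MD, mirroring the ellipse argument above. Rescaling the first variable (for example, multiplying it by 100 in both the samples and A) leaves MD(A) unchanged, which is the scale invariance mentioned in the introduction.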

MD-mass Method Process
The proposed method consists of three main steps. The first is to determine the samples of the compound classes of the power set. The second is to calculate the MDs from newly observed objects to all subsets. In the third step, the obtained MDs are transferred to masses.

Calculation of Intersection Classes' Scope
It is easy to calculate the MD between an object and a singleton (crisp) class, but the MD between an object and a compound class cannot be calculated directly. Originally, each data sample belongs to a singleton class, not to a compound (mixed) class. One of the great advantages of DSET is that a certain degree of imprecision and conflict between pieces of evidence is allowed, and DSET is good at dealing with it. The method used to obtain the samples of the compound classes is therefore very important.

The Calculation with One Dimensional Data
For one-dimensional data, the intersection classes are easy to find. Given two sample sets $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_m\}$ with distribution intervals $I_X = [\min X, \max X]$ and $I_Y = [\min Y, \max Y]$, their intersection class is shown in Figure 3. In Figure 3a, $I_X$ and $I_Y$ do not overlap, so their intersection is the null set. In Figure 3b, the intersection is $I_{XY}$, whose interval is:

$$I_{XY} = I_X \cap I_Y = \bigl[\max(\min X, \min Y),\ \min(\max X, \max Y)\bigr]$$

If an object lies within $I_{XY}$, it belongs to the corresponding intersection sample set.
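The one-dimensional interval intersection can be written in a few lines. This Python sketch uses hypothetical helper names (`interval`, `intersect`, `in_interval`) and represents the null set by `None`; it only illustrates the formula for $I_{XY}$ and the membership test.

```python
def interval(samples):
    """Distribution interval (min, max) of a one-dimensional sample set."""
    return (min(samples), max(samples))

def intersect(i_x, i_y):
    """Intersection of I_X and I_Y, or None when they do not overlap (null set)."""
    lo, hi = max(i_x[0], i_y[0]), min(i_x[1], i_y[1])
    return (lo, hi) if lo <= hi else None

def in_interval(value, i):
    """True if value lies inside interval i (False for the null set)."""
    return i is not None and i[0] <= value <= i[1]

if __name__ == "__main__":
    i_xy = intersect(interval([1.0, 2.0, 4.0]), interval([3.0, 5.0, 6.0]))
    print(i_xy, in_interval(3.5, i_xy))
    print(intersect(interval([1.0, 2.0]), interval([3.0, 4.0])))
```

The first pair of sets overlaps on [3, 4], as in Figure 3b; the second pair does not overlap, as in Figure 3a, so the result is the null set.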
In two-dimensional space, the scope of an intersection class is found as shown in Figure 4, which illustrates how to find the intersection class of sample set A and sample set B; the black dots are elements of set A and the blue triangles are elements of set B. In three-dimensional or higher-dimensional space, the intersection space is calculated in the same way as in two-dimensional space. Generally speaking, a higher dimension brings higher distinguishability.
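A multi-dimensional version can be sketched under one assumption: that the intersection scope is approximated by an axis-aligned rectangle (box) obtained by intersecting the per-dimension intervals, which is consistent with the rectangular intersection areas and straight-line boundaries reported for the experiment. Pooling the samples of both sets that fall inside the box as the compound class's samples is also an assumption. All function names are hypothetical.

```python
def bounding_box(samples):
    """Per-dimension (min, max) intervals of a list of p-dimensional samples."""
    return [(min(col), max(col)) for col in zip(*samples)]

def box_intersection(box_a, box_b):
    """Intersect two boxes dimension by dimension; None means the null set."""
    box = []
    for (a_lo, a_hi), (b_lo, b_hi) in zip(box_a, box_b):
        lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
        if lo > hi:          # empty in one dimension -> empty overall
            return None
        box.append((lo, hi))
    return box

def inside(x, box):
    """True if point x lies inside the box in every dimension."""
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))

def compound_samples(samples_a, samples_b):
    """Box of the compound class A&B and the samples of A and B lying in it."""
    box = box_intersection(bounding_box(samples_a), bounding_box(samples_b))
    if box is None:
        return None, []
    return box, [x for x in samples_a + samples_b if inside(x, box)]

if __name__ == "__main__":
    A = [(0.0, 0.0), (2.0, 2.0), (1.0, 3.0)]
    B = [(1.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
    print(compound_samples(A, B))
```

The same code works for any dimension p, because each dimension is handled exactly like the one-dimensional case; the resulting sample set can then feed the MD computation for the compound class.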

Mapping MDs to Masses
In this paper, MDs serve as the basis of the basic belief assignments of the evidence, so a mechanism must be set up to map MDs to masses. This mechanism must satisfy the following principle: every subset should get a reasonable mass so that the fusion can be carried out by DSET. Neural networks use several common transfer functions, such as logsig and tansig [22]; here, we use logsig as the mapping function to convert MDs to masses. For subset A, the assigned mass is calculated by:

$$m(A \mid MD_A) = f(MD_A) = \mathrm{logsig}\bigl(-k\,(MD_A - \mu)\bigr) = \frac{1}{1 + e^{\,k\,(MD_A - \mu)}}$$

where $m(A \mid MD_A)$ is the mass assigned to subset A given $MD_A$, $f(MD_A)$ is a monotonically decreasing transfer function converting the MD between the object and subset A into the evidence's mass, μ is the mean of the MDs, and k is an adjustment coefficient that changes the shape of the function. Figure 5 shows the function for k = 1 and k = 2; the horizontal axis is the calculated MD and the vertical axis is the transferred mass. In this example, when MD < 2 the mass of the corresponding evidence is close to 1, whereas in the interval [2, 6] the mass decreases as the MD increases. The curve for k = 2 is steeper than the curve for k = 1, so k should be adjusted to the actual situation to keep the transfer accuracy as high as possible.
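The transfer function can be sketched as follows, assuming the decreasing logsig form $f(MD) = 1/(1 + e^{k(MD - \mu)})$. The function name `md_to_mass` and the example values μ = 4, k ∈ {1, 2} are illustrative choices that roughly match the described [2, 6] transition, not values stated in the paper.

```python
import math

def md_to_mass(md, mu, k):
    """Decreasing logsig transfer: f(MD) = 1 / (1 + exp(k * (MD - mu))).

    Close to 1 for MD well below mu, exactly 0.5 at MD == mu, close to 0
    well above mu. A larger k gives a steeper transition around mu.
    """
    z = k * (md - mu)
    if z >= 0:                 # rewritten form avoids overflow in math.exp
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

if __name__ == "__main__":
    for k in (1.0, 2.0):
        print(k, [round(md_to_mass(md, 4.0, k), 3) for md in range(0, 9)])
```

The branch on the sign of z computes the same value in both cases; it only keeps `math.exp` from overflowing for very large MDs.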
The transfer function satisfies the mass assignment principle just proposed. When the MD between an object and a class is less than a certain threshold, the object belongs to the class with high probability. If the MD exceeds the threshold, the probability decreases as the MD increases, and when the MD is larger than a second threshold, the probability is very low, virtually zero. For example, when judging whether a man is middle-aged, if he is 40-50 years old, we can be sure that he belongs to the middle-aged class. If he is 30-40 or 50-60, the boundary is not clear, because he might also be regarded as a youth or an old man; in this situation, the probability of being middle-aged decreases as the MD of his age increases. If he is younger than 30 or older than 60, we can be sure that he is not middle-aged; in this case, the probability is very low.

Automatic Adjustment of k
In the previous section, we showed that μ is the mean of the MDs between an object and a class of $2^{\Theta}$, so its value can be obtained quickly and easily by averaging the MDs. The main problem is how to adjust the value of k automatically and properly.
Let σ be the standard deviation of the calculated MDs. Invoking the central limit theorem, we assume that the original data follow a Gaussian distribution and therefore comply with the "3σ principle".
In a Gaussian distribution with mean μ and standard deviation σ, $P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 99.7\%$, so the probability that a value falls outside the interval $(\mu - 3\sigma, \mu + 3\sigma)$ is not larger than 0.3%. We set $m_{\max}$ as the upper threshold of the output masses and $m_{\min}$ as the lower threshold. $m_{\max}$ is the belief assignment when MD equals 0, and it should be close to 1:

$$f(0) = m_{\max} \approx 1$$
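The closed-form rule for k is not preserved in this text, so the following is only one plausible reading of the 3σ argument, labeled as an assumption: choose k so that the mass falls to $m_{\min}$ at MD = μ + 3σ. Because the logistic curve is symmetric about μ, the mass at MD = μ − 3σ is then $1 - m_{\min}$, which matches the symmetric thresholds 0.001 and 0.999 used in the experiment. The names `auto_k` and `mass_at_zero` are hypothetical, and the sample standard deviation (`statistics.stdev`) is an assumed choice.

```python
import math
import statistics

def auto_k(mds, m_min=0.001):
    """Hypothetical rule for k: make f(mu + 3*sigma) == m_min.

    With f(MD) = 1 / (1 + exp(k * (MD - mu))), this gives
    k = ln((1 - m_min) / m_min) / (3 * sigma).
    """
    mu = statistics.mean(mds)
    sigma = statistics.stdev(mds)
    if sigma == 0:
        raise ValueError("zero spread in the MDs; k cannot be derived")
    k = math.log((1.0 - m_min) / m_min) / (3.0 * sigma)
    return mu, sigma, k

def mass_at_zero(mu, k):
    """f(0), to be compared with the upper threshold m_max."""
    return 1.0 / (1.0 + math.exp(-k * mu))

if __name__ == "__main__":
    mu, sigma, k = auto_k([1.0, 2.0, 3.0, 4.0, 5.0])
    print(mu, sigma, k, mass_at_zero(mu, k))
```

With this rule, f(0) is not forced to equal $m_{\max}$ exactly; `mass_at_zero` is there to check how close it comes for a given set of MDs.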

Scheme for Constructing Masses in DSET
In Section 4, the process of using MD to obtain masses in DSET was developed. The main idea of the algorithm is to build the basis of the basic belief assignments from prepared samples, compute the MDs between the object and the samples, and then map the MDs to masses. Depending on the real situation, either a closed or an open world can be chosen. Finally, the output masses are normalized. Figure 6 illustrates the mass assignment process in DSET, which proceeds as follows: (1) First, the statistical features of each subset in $2^{\Theta}$, especially the compound classes of the power set, are calculated. To guarantee accuracy, adequate sample data must be provided so that the statistical features are correct. (2) Next, the numerical features of the sample sets are computed. After the algorithm starts, the frame of discernment is set up according to the specific situation, all possible subsets (proposition sets) are constructed, and the mean value and (co)variance of every subset are computed.
(3) Then, the MDs between the object and the classes of $2^{\Theta}$ are computed: for every class whose mean and (co)variance exist, the MD is calculated from the data collected from the observed object. (4) The MDs are transferred to masses with the transfer function. (5) The mapped values are not yet the final result; they must be normalized under either the Closed World Assumption (CWA) or the Open World Assumption (OWA). Under CWA, the object to be recognized must belong to one of the subsets of Θ, so the mass of the null set is 0, that is, $m(\emptyset) = 0$. Under OWA, unknown classes are allowed, and the mass of the null set may be larger than 0. In this case, if the sum of all the masses is larger than 1, meaning that the known classes already carry enough mass, the mass of the null set is set to 0; otherwise, $m(\emptyset) = 1 - \mathit{sum}$.
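Step (5) can be sketched as a small normalization routine. This is a minimal Python illustration with a hypothetical name, `normalize`; the text does not spell out what happens to the known masses when their sum exceeds 1, so rescaling them to sum to 1 in that case (and always under CWA) is an assumption.

```python
def normalize(raw, world="closed"):
    """Normalize transferred masses under the closed or open world assumption.

    raw maps non-empty subsets (frozensets) to the values produced by the
    transfer function; any entry for the null set is ignored and recomputed.
    """
    if world not in ("closed", "open"):
        raise ValueError("world must be 'closed' or 'open'")
    null = frozenset()
    known = {A: v for A, v in raw.items() if A != null}
    total = sum(known.values())
    if total <= 0.0:
        raise ValueError("all transferred masses are zero")
    if world == "closed" or total > 1.0:
        # CWA, or OWA with enough mass on the known classes: m(null) = 0,
        # and the known masses are rescaled to sum to 1 (assumed).
        out = {A: v / total for A, v in known.items()}
        out[null] = 0.0
    else:
        # OWA with total <= 1: the missing mass is assigned to the null set.
        out = dict(known)
        out[null] = 1.0 - total
    return out

if __name__ == "__main__":
    No, Te, Di = frozenset({"No"}), frozenset({"Te"}), frozenset({"Di"})
    raw = {No: 0.2, Te: 0.1, Di: 0.1}
    print(normalize(raw, "closed"))
    print(normalize(raw, "open"))
```

For the same raw values, CWA spreads all the mass over the known classes, whereas OWA leaves the unexplained 0.6 on the null set, which is how an unknown state would show up.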

Setup of Transformer Fault Diagnosis
A transformer is an important distribution component in a power system, and the security and reliability of the power system depend heavily on it. To detect the type of a transformer fault accurately and effectively, different sensors, such as gas, voltage, temperature, and humidity sensors, are installed in the inner space of the transformer [23,24]. We use DSET to solve this problem because its strength is fusing multisource data into one unified result. Here, we use the proposed method to construct masses from the data collected by the gas sensors as a validation of the method.
There are various gases in the transformer's internal space. In this paper, H2, CH4, C2H6, C2H4 and C2H2 are used as the basis of the diagnosis; when different faults occur, the percentage of each gas changes. To simplify the experiment, we consider three states: the normal state (No), temperature fault (Te), and discharge fault (Di). The collected data are the percentages of each gas, 600 values in total (120 samples of five gases each). In this case, the frame of discernment is $\Theta = \{\mathrm{No}, \mathrm{Te}, \mathrm{Di}\}$.
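For this three-state frame, the subsets that need sample sets can be enumerated directly. The short Python sketch below (with a hypothetical `power_set` helper) lists all 2^3 = 8 subsets of Θ, from the null set to Θ itself, and picks out the four compound classes.

```python
from itertools import chain, combinations

def power_set(frame):
    """All subsets of the frame of discernment, from the null set to the frame."""
    items = sorted(frame)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

theta = {"No", "Te", "Di"}
subsets = power_set(theta)
compound = [A for A in subsets if len(A) > 1]   # the mixed classes

if __name__ == "__main__":
    for A in subsets:
        print(sorted(A))
```

The compound classes are No&Te, No&Di, Te&Di and No&Te&Di; in the experiment some of them turn out to be empty once their intersection scopes are computed.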

Experiment Results
To illustrate the proposed algorithm, we take the combination of two gases, (C2H6, C2H4), as an example. There are 40 samples for each state: 30 samples per state are used as the basis of the classification, and the remaining 10 are used as validation data. Before determining the scope of the compound classes, outliers should be eliminated because they decrease the accuracy of the algorithm. Here, we delete the three samples with the largest MDs to form a crisp sample set. Figure 7 shows the distribution map of (C2H6, C2H4) under the different conditions; the circles, triangles, and hexagons represent the normal state (No), temperature fault (Te), and discharge fault (Di), respectively. The first step is to determine the sample sets for all subsets in $2^{\Theta}$. To reduce the computational complexity, we calculate the intersection scope as a rectangular area. After the intersection scopes are calculated, the intervals of the compound subsets are No&Di = ([2.15 7.70 … In Figure 7b, No&Te and No&Te&Di are not shown in the map because they equal the null set. The validation data and the corresponding results are described in Figure 8. Here, $m_{\min}$ and $m_{\max}$ are set to 0.001 and 0.999, respectively. As shown in Figure 8, newly observed fault data are collected to validate the correctness of the proposed algorithm; the distribution map shows the 30 validation samples collected under the three conditions. The corresponding masses of the validation samples are shown in Figure 9, where the horizontal axis is the validation sample number and the vertical axis is the mass assignment of each sample; the masses of each sample sum to 1. Samples 1-10 were collected when the transformer was normal, samples 11-20 when it was in a temperature fault state, and samples 21-30 correspond to a discharge fault.
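The outlier-removal step (deleting the three samples with the largest MDs) can be sketched as follows. This assumes NumPy, that each sample's MD is measured against its own class's mean and covariance, and an n × p input array; the function name `trim_outliers` and the synthetic data are illustrative only.

```python
import numpy as np

def trim_outliers(samples, n_drop=3):
    """Drop the n_drop samples with the largest MD to their own class (n x p input)."""
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    diff = X - mean
    # Row-wise quadratic form diff_i @ cov^{-1} @ diff_i, i.e. the squared MD.
    md2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    keep = np.sort(np.argsort(md2)[: len(X) - n_drop])   # keep original order
    return X[keep]

if __name__ == "__main__":
    base = [(float(i % 5), float(i // 5)) for i in range(25)]   # a tight 5 x 5 grid
    outliers = [(20.0, 20.0), (-15.0, 18.0), (18.0, -17.0)]
    print(trim_outliers(base + outliers).shape)
```

Even though the three far points inflate the covariance, their MDs remain well above those of the grid points, so they are the ones removed. With heavier contamination this masking effect can become a problem, and a more robust covariance estimate might then be needed.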
After the proposed method is applied, the masses under CWA and OWA are obtained, as shown in Figure 9a,b, respectively. Most of the masses under CWA are correct, except for Nos. 13, 15, and 27. These are incorrect because the boundaries between the sample sets are not clear and these three objects lie too far from the sets they should belong to, so they fall within the scope of other sample sets. A good way to improve the mass result is to calculate the MD from a higher-dimensional data pattern, such as (C2H6, C2H4, H2). Under OWA, unknown states of the transformer are allowed, so the null set's mass may be larger than 0. The belief assignments of the null set for Nos. 1, 6, 14, 20, and 25 are clearly larger than the other masses. These results suggest that there may be fault types that are not included in the frame of discernment. However, since the transformer's fault conditions are already well understood, a closed world is the better choice here for a more accurate classification result. To verify the accuracy further, more tests are conducted with multi-dimensional gas data patterns of 1 to 5 dimensions. In the 1-dimensional case, only one gas is used to construct the masses, such as C2H4; in the 2-dimensional case, two gases are combined, such as (C2H6, C2H4); and likewise for the 3- to 5-dimensional patterns. We define the accuracy as:

$$r = \frac{\sum_i N_i}{N}$$

where r is the accuracy rate, N is the total number of objects, and $N_i$ is the number of objects of class $\theta_i$ that have been correctly classified from their masses. If an object belongs to class $\theta_i$ … The resulting accuracies are shown in Figure 10. The accuracy of the constructed masses increases with the dimension of the test data, especially from 1 to 3 dimensions. A higher data dimension is therefore beneficial for obtaining more accurate masses, and in practice it is better not to construct masses from 1-dimensional data.
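The accuracy measure can be sketched in Python. The exact criterion for "correctly classified from its mass" is not preserved in the text, so the rule below is an assumption: an object counts as correct when its largest singleton mass falls on its true class. The function name `accuracy` is hypothetical.

```python
def accuracy(mass_list, labels):
    """r = sum_i N_i / N under an assumed criterion: an object is correct
    when its largest singleton mass falls on its true class."""
    if len(mass_list) != len(labels) or not labels:
        raise ValueError("need one mass assignment per label, and at least one")
    correct = 0
    for m, label in zip(mass_list, labels):
        singletons = {A: v for A, v in m.items() if len(A) == 1}
        predicted = max(singletons, key=singletons.get)   # needs a singleton focal element
        correct += predicted == frozenset({label})
    return correct / len(labels)

if __name__ == "__main__":
    No, Te, Di = frozenset({"No"}), frozenset({"Te"}), frozenset({"Di"})
    masses = [
        {No: 0.7, Te: 0.2, Di: 0.1},
        {No: 0.1, Te: 0.8, Di: 0.1},
        {No: 0.5, Te: 0.1, frozenset({"Te", "Di"}): 0.4},
    ]
    print(accuracy(masses, ["No", "Te", "Di"]))
```

Using the maximum pignistic probability instead of the maximum singleton mass would be an equally reasonable decision rule; the two can disagree when much of the mass sits on compound classes, as in the third example object.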
In conclusion, the results illustrate that calculating belief assignments in DSET by MD is feasible and that the method is easy to implement in sensor nodes.

Discussions and Conclusions
The proposed algorithm comes with a couple of caveats. First, the sample data used in the algorithm must be adequate for obtaining correct statistical features. This is a disadvantage compared with other methods such as ANN, which require less data to train the network. Second, Section 4 developed the calculation of the intersection classes' scope. The boundaries of the intersection scopes calculated by the proposed method are straight lines, whereas in reality they may be irregular curves, so the calculated intersections are approximations and may not be exact. However, in practice, finding the exact intersection space is a difficult problem, and there is little benefit in spending a large amount of computation on exact intersection scopes. Hence, we use the proposed method to calculate intersections: it is fast and efficient, and its experimental accuracy is acceptable.
The algorithm presented in this paper is helpful for dealing with the multisource data of a WSN. In a WSN, sensor nodes have limited computing ability and energy, so it is very meaningful to fuse the multisource data before uploading them to the servers, which relieves the transmission load on the sink node. The proposed paradigm has a high calculation speed, and its output masses are reasonable and stable, which lays a good foundation for the subsequent fusion steps. We believe the paradigm proposed in this paper is promising for practical applications. Future work may include: (1) applying the proposed algorithm in fuzzy set theory as a method for calculating membership; (2) finding another way to compute the intersection between crisp focal elements in DSET and verifying its reasonableness; (3) developing a flexible and effective neural network based on DSET and MD and examining its performance; and (4) finding a reasonable way to calculate MDs from multimedia data rather than just scalar data.