An Uncertainty-Based Distributed Fault Detection Mechanism for Wireless Sensor Networks

Exchanging too many messages for fault detection not only degrades the network quality of service but also places a heavy burden on the limited energy of sensors. Therefore, we propose an uncertainty-based distributed fault detection mechanism for wireless sensor networks that relies on the aided judgment of neighbors. The algorithm takes into account the serious influence of sensing measurement loss and therefore uses Markov decision processes to fill in missing data. Most importantly, fault misjudgments caused by uncertainty conditions are the main drawback of traditional distributed fault detection mechanisms. We draw on evidence fusion rules based on information entropy theory and the degree of disagreement function to increase the accuracy of fault detection. Simulation results demonstrate that our algorithm can effectively reduce the communication energy overhead due to message exchanges and provide a higher detection accuracy ratio.

(2) Data loss influences the fault judgment because each sensor determines its own state step by step according to its neighbors' measurements. The paper presents a data forecast model based on a Markov decision process for filling in lost data, which provides reference data for other nodes' state determinations; (3) We classify sensors' tendency states into two types: Possible Good (LG) and Undetermined (Un).
LG nodes help to judge other nodes' ultimate states. Un nodes are in an uncertain status, so their ultimate status must be determined. Here we design belief probability assignment (BPA) functions for the different evidences that reflect the states of Un nodes. Moreover, an evidence fusion rule based on information entropy theory is used to avoid evidence conflicts. The rest of the paper is organized as follows: Section 2 describes related works in the area of fault detection in WSNs. Section 3 introduces our Uncertainty-based Distributed Fault Detection algorithm (uDFD) and the concrete mechanisms involved. Section 4 presents the simulation results in comparison with typical fault detection algorithms such as DFD and IDFD, and demonstrates our algorithm's efficiency and superiority. In Section 5, we conclude the paper.

Related Works
In this section, we briefly review related works in the area of distributed and centralized fault detection in WSNs. The authors in [4] proposed and evaluated a localized fault detection scheme (DFD) to identify faulty sensors. An improved DFD scheme was proposed by Jiang in [5]. Neighbors exchange sensing measurements periodically, so a sensor judges its own state (as good or faulty) according to its neighbors' values. A faulty sensor identification algorithm reported in [7] is completely localized, requires lower computational overhead, and can easily be scaled to large sensor networks. In the algorithm, the reading of a sensor is compared with the median reading of its neighbors. If the difference is large or large but negative, the sensor is deemed faulty. If half of the neighbors are faulty and the number of neighbors is even, the algorithm cannot detect faults.
Krishnamachari and co-workers proposed in [8] a distributed solution for the canonical task of binary detection of interesting environmental events. They explicitly take into account the possibility of sensor measurement faults. In [9], each sensor node identifies its own status based on local comparisons of sensed data with some thresholds and dissemination of the test results. Time redundancy is used to tolerate transient sensing and communication faults. To eliminate the delay involved in the time redundancy scheme, a sliding window is employed with some data storage for comparison with previous results. The MANNA scheme [10] creates a manager located externally to the WSN. It has a global vision of the network and can perform complex tasks that would not be possible inside the network. Management activities take place when sensor nodes are collecting and sending temperature data. Every node checks its energy level and sends a message to the manager/agent whenever there is a state change. The manager can then obtain the coverage map and energy level of all sensors based upon the collected information. To detect node failures, the manager sends GET operations to retrieve the node state. If a node does not respond, the manager consults the energy map to check that node's residual energy. In this way, the MANNA architecture is able to locate faulty sensor nodes. However, this approach requires an external manager to perform the centralized diagnosis, and the communication between nodes and the manager is too expensive for WSNs.
Tsang-Yi et al. [11] proposed a distributed fault-tolerant decision fusion in the presence of sensor faults. The collaborative sensor fault detection (CSFD) scheme is proposed to eliminate unreliable local decisions. In this approach, the local sensors send their decisions sequentially to a fusion center. This scheme establishes an upper bound on the fusion error probability based on a pre-designed fusion rule. This upper bound assumes identical local decision rules and fault-free environments. They proposed a criterion to search the faulty sensor nodes which is based on this error boundary. Once the fusion center identifies the faulty sensor nodes, all corresponding local decisions are removed from the computation of the likelihood ratios that are adopted to make the final decision. This approach considers crash and incorrect computation faults.
In [12], a taxonomy for classification of faults in sensor networks and the first on-line model-based testing technique are introduced. The technique considers the impact of readings of a particular sensor on the consistency of multi-sensor fusion. A sensor is most likely to be faulty if its elimination significantly improves the consistency of the results. A way to distinguish random noise is to run a maximum likelihood or Bayesian approach on the multi-sensor fusion measurements. If the accuracy of final results of multisensory fusion improves after running these procedures, random noise should exist. To get a consistent mapping of the sensed phenomena, different sensors' measurements need to be combined in a model. This cross-validation-based technique can be applied to a broad set of fault models. It is generic and can be applied to an arbitrary system of sensors that use an arbitrary type of data fusion. However, this technique is centralized. Sensor node information must be collected and sent to the base station to conduct the on-line fault detection.
Miao et al. [13] presented an online lightweight failure detection scheme named Agnostic Diagnosis (AD). This approach is motivated by the fact that the system metrics of sensors (e.g., radio-on time, number of packets transmitted) usually exhibit certain correlation patterns. This approach collects 22 types of metrics that are classified into four categories: (1) timing metrics (e.g., RadioOnTimeCounter). They denote the accumulative radio-on time; (2) traffic metrics (e.g., TransmitCounter). They record the accumulative number of packets transmitted by a sensor node; (3) task metrics (e.g., TaskExecCounter). This is the accumulative number of tasks executed; (4) other metrics such as Parent Change Counter, which counts the number of parent changes. AD exploits the correlations between the metrics of each sensor using a correlation graph that describes the status of the sensor node. By mining through the periodically updated correlation graphs, abnormal correlations are detected in time. Specifically, in addition to predefined faults (i.e., with known types and symptoms), silent failures caused by Byzantine faults are considered.
Exchanging too many messages for fault detection not only degrades the network quality of service but also places a heavy burden on the limited energy of sensors. Hence, we design an uncertainty-based distributed fault detection mechanism based on neighbor cooperation in WSNs. It adopts auto-correlated test results to describe how sensing states differ from day to day, and it introduces information entropy-based D-S evidence theory to deduce the actual states of undetermined nodes.

The DFD and IDFD Schemes and Their Drawbacks
This section presents the DFD algorithm proposed by Chen [4] and the IDFD algorithm described by Jiang [5] to give an overview of distributed fault detection, and then analyzes the drawbacks of these algorithms. Chen [4] introduced a localized fault detection method in which neighbors exchange measurements in WSNs. It is assumed that $x_i^{t}$ is the measurement of node i at time t. We define $d_{ij}^{t}$ to represent the measured difference between nodes i and j at time t, and $\Delta d_{ij}^{\Delta t_l}$ to represent the change of this difference between times $t_l$ and $t_{l+1}$:

$$d_{ij}^{t} = x_i^{t} - x_j^{t}, \qquad \Delta d_{ij}^{\Delta t_l} = d_{ij}^{t_{l+1}} - d_{ij}^{t_l}$$

When $|d_{ij}^{t}|$ is less than or equal to a predefined threshold $\theta_1$, the test result $c_{ij}$ is set to 0; otherwise the node goes on to calculate $\Delta d_{ij}^{\Delta t_l}$. If $|\Delta d_{ij}^{\Delta t_l}| > \theta_2$ ($\theta_2$ is also a predefined threshold), then $c_{ij} = 1$, otherwise $c_{ij} = 0$. Here $c_{ij} = 1$ means that node i and node j are possibly in different states. Next, the tendency status $T_i$ (possibly faulty, LF, or possibly good, LG) is determined according to the following formula [14], where $N_i$ is the set of neighbors of node i:

$$T_i = \begin{cases} \text{LF}, & \sum_{j \in N_i} c_{ij} \ge \lceil |N_i| / 2 \rceil \\ \text{LG}, & \text{otherwise} \end{cases} \qquad (3)$$

The condition that DFD then uses to declare a node good is too harsh and leads some normal nodes to be misdiagnosed as faulty, so IDFD amends the determinant condition for a normal node (Equation (4)). If no neighbor has the tendency status LG, the final status is set to normal if $T_i$ = LG and to faulty if $T_i$ = LF. Although simulations show that this mechanism improves the fault detection accuracy to a certain extent, it has no clear way to resolve conflicts or erroneous judgments, as illustrated in Figure 1.
In Figure 1a, node 1 calculates $c_{12} = 0$, $c_{13} = 0$, and $c_{14} = 0$. Then $T_1$ is set as LG according to Equation (3). In the same way, we get $T_2$ = LF, $T_3$ = LF, $T_4$ = LF. Node 1 has no neighbor whose tendency status is LG, so its final status is set as normal based on the rule for $T_i$ = LG. This is an obvious erroneous judgment.
Once the tendency states in Figure 1b have been calculated, node 1 is decided as faulty according to Equation (4). Actually, node 1 is a normal sensor. Node 1 is misjudged when the number of its normal neighbors equals the number of its faulty neighbors, provided that their initial tendency states are all LG. Analyzing the misjudgment conditions of traditional algorithms reveals several defects. One is that an indeterminacy occurs for the "=" condition in Equation (4), so that the node cannot be resolved as either good or faulty. Another is that these algorithms ignore the fact that a sensor's own measurements taken at the same time on adjacent days (e.g., 8 June and 9 June) are approximately equal. Such analogous historical readings of the same node help to determine the faulty state under vague conditions. Moreover, most distributed fault detection mechanisms assume that sensors are able to acquire every measurement and to judge each other's state cooperatively. When a sensor's communication module has failed but its acquisition module is still active, its readings still cannot be obtained. In a distributed collaborative process, nodes diagnose data faults based primarily on their neighbors' data. Once a neighbor's data is missing, the accuracy of fault diagnosis is affected; e.g., in Figure 1b, node 4 cannot determine its own status when node 1 has no data.

Uncertainty-Based Distributed Fault Detection Algorithm
In the paper, we mainly resolve the following problems: (1) data missing before readings are exchanged; (2) misjudgments caused by indeterminacy conditions. Missing data due to communication faults affects the determination accuracy when neighbors' measurements are compared. To compensate for the data loss, a faulty sensing node should fill in the missing measurements to provide a reference. Secondly, the presented algorithm adopts auto-correlated test results to describe the differences between different days. Finally, the undetermined cases described in the previous section may occur. The information entropy and the degree of disagreement function are combined in an improved evidence fusion theory to help deduce the actual states of such nodes. In addition, using information entropy in the evidence fusion can reduce evidence conflicts and increase detection accuracy.

Definitions
We list the notations in the uDFD algorithm as follows:

Fault Detection
The main processes of the uDFD algorithm based on neighbor cooperation are summarized as follows. The key techniques for solving the two problems are described in Sections 3.2.3 and 3.2.4.
Stage 1: Each sensor acquires the readings from its own sensing module. If no data is acquired, it fills in the missing data. After that, it exchanges the measurement at time t on day D with its neighbors and calculates the test result $C_{ij}$ (it is assumed that $C_{ij}$ = 0 at the initial time).
An LG node can determine its own status (good or faulty), and only good sensors broadcast their states in order to save transmission overhead.
Stage 4: A node whose tendency status is Un determines its actual state by using the entropy-based evidence combination mechanism.
Broadcasting not only uses up nodes' energy but also occupies channel bandwidth, so the main way of saving energy in our algorithm is that only particular states in different stages (LG and GOOD) are broadcast. In Step 12, only a node whose tendency status is LG broadcasts the value, because only LG neighbors participate in the final state determination in Step 14. Similarly, only good sensors broadcast their states in order to save transmission energy overhead.

Missing Data Preprocessing Mechanism
In the paper, we mainly focus on sensing faults rather than communication faults. When missing data occurs because of a sensing fault, it affects the accuracy of fault diagnosis. This means that $X_i^{D,t}$ has been lost because the communication module has failed, which subsequently influences the reference data for other sensors' faulty state determination. It is necessary for node i to fill in the missing data and send it to its neighbors. In this section, we use a Markov decision process based on neighbors' historical data to predict the current missing measurement values of node i. Markov theory reflects the influence of random factors and extends to stochastic processes that are dynamic and fluctuating. Relying on these features, we combine the historical data of node i with its neighbors' historical data to form a fused historical data vector, which can be adaptively adjusted according to the significance of the neighbors' measurements. The state transition matrix of the Markov model is then adopted to predict the value and sign of the reading difference between two days. The steps of the missing data preprocessing are as follows:
(1) For each node $j \in N_i$, where $N_i$ is the set of all the neighbors of node i, fetch the previous m historical measurements of node j. These historical measurements form an m-dimensional vector $V_j$.
(2) Calculate the reputation value $C_{ij}$ for each neighbor j of node i. Note that node j has different reputation values with respect to different nodes i, and a smaller value of the i-indexed term in the reputation formula increases the reputation value of node j.
(3) Here we introduce the Mahalanobis distance to evaluate the similarity between node i and its neighbors. The prediction results should keep the change in Mahalanobis distance within a predefined threshold. For each node $j \in N_i$, calculate the Mahalanobis distance between $V_i$ and $V_j$ in order to evaluate the similarity of node i and all its neighbors.
(4) Assume that $V_i^{*}$ is a fusion of the historical measurements of node i and all its neighbors, which is used in the Markov decision process to predict the current measurement of node i. It is also an m-dimensional vector. In the data-fusion formula, the historical measurement vector $V_j$ is weighted by the reputation value of node j, and the factors α and β (α + β = 1) indicate to what extent a node trusts itself and its neighbors.
The sample average and the standard deviation are then calculated. According to the central-limit theorem [17], we divide the sliding interval of historical fault data into five states, from which the probability distribution vector is obtained together with an indicator of the sign of the predicted reading difference.
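The preprocessing steps can be sketched in Python as follows. The exact fusion formula, reputation values and state boundaries are not reproduced here, so the sketch makes explicit assumptions: reputations are normalized to sum to 1, β = 1 − α, and the five states are cut at the sample mean ± 0.5 and ± 1.5 standard deviations. All function names and sample readings are hypothetical.

```python
import numpy as np

def fuse_history(v_i, neighbor_hist, reputation, alpha=0.5):
    """Fused history V* = alpha * V_i + beta * (reputation-weighted mean of V_j).

    beta = 1 - alpha; normalizing the reputations is an assumption.
    """
    v_i = np.asarray(v_i, dtype=float)
    hist = np.asarray(neighbor_hist, dtype=float)    # shape (neighbors, m)
    w = np.asarray(reputation, dtype=float)
    w = w / w.sum()
    return alpha * v_i + (1.0 - alpha) * (w @ hist)

def five_states(values):
    """Label each value with one of five states around the sample mean.

    Cut points at mean +/- 0.5 and +/- 1.5 standard deviations are assumed.
    """
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std(ddof=1)
    edges = mu + sigma * np.array([-1.5, -0.5, 0.5, 1.5])
    return np.digitize(values, edges)                # integers in 0..4

def transition_matrix(states, n_states=5):
    """Row-stochastic matrix of observed state-to-state transitions."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    uniform = np.full_like(counts, 1.0 / n_states)   # fallback for unseen states
    return np.where(rows > 0, counts / np.maximum(rows, 1), uniform)

# Hypothetical histories (m = 6 readings) of node i and two neighbors.
v_i = [20.0, 21.0, 23.0, 22.0, 21.0, 23.0]
neighbors = [[21.0, 22.0, 24.0, 23.0, 22.0, 24.0],
             [19.0, 20.0, 22.0, 21.0, 20.0, 22.0]]
fused = fuse_history(v_i, neighbors, reputation=[1.0, 1.0], alpha=0.5)
diffs = np.diff(fused)                               # successive differences
states = five_states(diffs)
P = transition_matrix(states)
next_state_distribution = P[states[-1]]              # probability vector
```

The last line gives a probability distribution over the five states for the next reading difference; turning that distribution into a predicted value and sign would additionally require a representative value per state, which this sketch leaves open.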

Information Entropy Based Evidence Fusion
Since Un nodes are in an uncertain status, we need a mechanism to determine the status of these nodes. Dempster-Shafer evidence theory is an effective method for dealing with uncertainty problems, but the results obtained are counterintuitive when the evidences conflict highly with each other [18,19].
In the improved evidence fusion algorithm we propose, the possible events are depicted as evidences. Through combination rules, evidences are aggregated into a comprehensive belief probability assignment under uncertainty conditions. The set of hypotheses about node status is denoted as the frame of discernment $\Theta = \{G, F\}$, where G represents a good sensor and F a faulty one. Belief probability assignment (BPA) functions of node i are defined for the good status, for the faulty status and for the uncertainty status. We also design an expectation deviation function. It is assumed that the measurement value of nodes at time t on day D is a random variable with expectation $EX$ and variance $\sigma^2$.
In Section 3.1, we discussed that one of the defects of traditional algorithms is that an indeterminacy occurs for the "=" condition in Equation (4), so that the node cannot be resolved as good or faulty. Therefore, we define a range for this case (see Equation (14)).
In D-S evidence theory, if more than two BPAs $m_1, \dots, m_n$ need to be combined, the combination rule is defined as follows:

$$m(A) = \frac{1}{1-K} \sum_{A_1 \cap \cdots \cap A_n = A} \prod_{k=1}^{n} m_k(A_k), \quad A \neq \Phi$$

where K is the mass that is assigned to the empty set Φ:

$$K = \sum_{A_1 \cap \cdots \cap A_n = \Phi} \prod_{k=1}^{n} m_k(A_k)$$

Roughly obliterating the conflict and running the normalization process leads to extreme differences between G and F. This causes errors in the judgment of sensors' states when using uDFD, because node i looks for the node j that matches $\min\,(m^{*}(\{G\}) - m_j(\{G\}))$. Overly extreme evidence distorts the comprehensive evidence. Based on this, we propose a new evidence fusion rule combined with information entropy theory. According to the conflict of each evidence with the entirety, as expressed by information divergences, we classify the evidences into several sets. Fusing the results from the different sets prevents an extreme widening of the differences between G and F.
By this evidence fusion algorithm, we can finally determine the nodes' status.
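To make the role of the conflict mass K concrete, the following Python sketch applies the classical Dempster combination rule to two BPAs over the frame {G, F}. The BPA values are invented for illustration, and the code shows only the original D-S rule, not the entropy-based rule proposed in this paper.

```python
from itertools import product

G, F = frozenset("G"), frozenset("F")
THETA = G | F  # the whole frame, i.e. the "uncertain" hypothesis

def dempster(m1, m2):
    """Combine two BPAs (dict: frozenset -> mass) with Dempster's rule.

    Returns the fused BPA and the conflict mass K.
    """
    combined, conflict = {}, 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + pa * pb
        else:
            conflict += pa * pb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    fused = {h: v / (1.0 - conflict) for h, v in combined.items()}
    return fused, conflict

# Hypothetical evidences about one Un node.
m1 = {G: 0.6, F: 0.1, THETA: 0.3}
m2 = {G: 0.5, F: 0.2, THETA: 0.3}
fused, K = dempster(m1, m2)
```

For these two evidences the conflict mass is K = 0.17, and normalization by 1 − K raises the mass on G from 0.63 to about 0.76. When K approaches 1, the same normalization amplifies whatever small agreement remains, which is the counterintuitive behavior that motivates the modified rule.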
In classical information theory, Shannon entropy measures the amount of information, which in turn reflects the uncertainty of random events. Since different evidences should be assigned different fusion weights according to the amount of information they carry, this section introduces entropy and the degree of disagreement function, which measures information discrepancy, into the combination rules in order to resolve evidence conflicts and increase the accuracy of fault determination for Un nodes. Firstly, we introduce some definitions. The information divergence $D(p\|q)$ between discrete random variables p and q is defined as below [20]:

$$D(p\|q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$

It is obvious that $D(p\|q) \ge 0$. Assume that $M_l$ indicates the l-th evidence, and define the degree of difference between $M_l$ and the whole set of evidences. It is determined by the average of the information divergence between $M_l$ and each evidence. After this, define $\delta_l$ as the percentage of the whole difference degree that $M_l$ occupies. According to $\delta_l$, evidences are classified into several subsets, and evidences with similar $\delta_l$ are aggregated in the same subset. Before classification, the demarcation point ∆ is determined as the average difference between the $\delta_l$. Assume that $C_r$ is the resultant subset and P is the collection of $\delta_l$. The pseudo code of the classification algorithm is as follows:
1: r = 0;
2: While P is not empty, do
3: Randomly choose any one element from P and put it into $C_r$. Mark this element as $C_{r1}$;
4: Remove $C_{r1}$ from P;
5: Loop1, for l = 1 to s
6: Loop2, for j = 1 to $|C_r|$ ($|C_r|$ is the cardinality of $C_r$)
Then we can get the aggregative center of $C_r$ by using an improved D-S formula. For any focal element A, the result of fusion combines p(A) and q(A). Here, p(A) represents the traditional way to fuse evidences and q(A) represents the average support degree from each evidence to A.
When K is large enough, the influence from q(A) is increased.
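The divergence-based grouping of evidences can be sketched in Python as follows. The sketch assumes that the difference degree of an evidence is its mean divergence to the other evidences, that the demarcation point is the mean pairwise gap between the δ values, and that an evidence joins a subset when its δ lies within that gap of some member. It also takes the first remaining element instead of a random one so that runs are repeatable. These choices and all names are illustrative assumptions.

```python
import math
from itertools import combinations

def kl_divergence(p, q, eps=1e-12):
    """Information divergence D(p||q) for discrete distributions (eps avoids log 0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def difference_shares(evidences):
    """delta_l: share of the total difference degree carried by evidence l."""
    s = len(evidences)
    degree = [
        sum(kl_divergence(evidences[l], evidences[k]) for k in range(s) if k != l) / (s - 1)
        for l in range(s)
    ]
    total = sum(degree)
    return [d / total for d in degree]

def demarcation(deltas):
    """Assumed demarcation point: mean pairwise absolute difference of the deltas."""
    pairs = list(combinations(deltas, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def classify(deltas, gap):
    """Group evidence indices whose deltas lie within `gap` of a subset member."""
    remaining = list(range(len(deltas)))
    subsets = []
    while remaining:
        group = [remaining.pop(0)]       # seed of the next subset C_r
        changed = True
        while changed:                   # repeat until no evidence can be added
            changed = False
            for l in remaining[:]:
                if any(abs(deltas[l] - deltas[j]) <= gap for j in group):
                    group.append(l)
                    remaining.remove(l)
                    changed = True
        subsets.append(group)
    return subsets

# Hypothetical BPAs written as (m(G), m(F), m(Theta)).
evidences = [(0.6, 0.1, 0.3), (0.5, 0.2, 0.3), (0.1, 0.7, 0.2)]
deltas = difference_shares(evidences)
subsets = classify(deltas, demarcation(deltas))
```

Each resulting subset would then be fused into its aggregative center, and the centers combined into the final result.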
Assume that there are m subsets and n r is the aggregative center of C r , then the final result of fusion is:

Simulation Setting
We use the MATLAB simulation tool to demonstrate our model. As shown in Figure 3, a square with a side length of 100 m is constructed in our model, in which sensors are deployed and form the network. First, ten temperature sources are deployed in the square as the sensing objects of the sensors. The distance between two sources is no less than L. Every temperature source randomly generates temperature data x which ranges from −5 to 40 °C. These readings simulate the temperature variation of the four seasons, which means they have regularity and smoothness. Second, n sensors are deployed in this square and each of them selects the nearest temperature source, which must be within its sensing range. If no temperature source exists within the sensing range, the sensor is set to not work, which means no sensing from a temperature source and no communication with neighbor nodes. Each working node establishes the variation of its sensed data according to the distance to its temperature source. In the corresponding formulas, $X_{max}$ and $X_{min}$ are the upper and lower bounds of the data range, respectively, x is the temperature generated by the temperature source, which is uniformly distributed in ($X_{min}$, $X_{max}$), and d is the distance between a sensor and its temperature source. At every sensing moment, a sensor chooses a random value between $X_{min}$ and $X_{max}$ as its sensing data.
Each sensor node chooses other nodes which are within its communication range (communication radius is represented by R) and have the same temperature source as its neighbor nodes. After this, each node creates a set of neighbor nodes and the wireless sensor network is formed.
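The neighbor selection rule described above (within communication radius R and attached to the same temperature source) can be sketched in Python as follows. The data layout, with dictionaries keyed by node id, and the function name are assumptions made for illustration.

```python
import math

def build_neighbors(positions, source_of, radius):
    """Return {node: set of neighbor nodes}.

    positions: {node: (x, y)}; source_of: {node: source id, or None if the
    node has no temperature source in sensing range and therefore does not work}.
    """
    neighbors = {n: set() for n in positions}
    nodes = list(positions)
    for idx, a in enumerate(nodes):
        for b in nodes[idx + 1:]:
            if source_of[a] is None or source_of[a] != source_of[b]:
                continue  # non-working node or different temperature source
            if math.dist(positions[a], positions[b]) <= radius:
                neighbors[a].add(b)
                neighbors[b].add(a)
    return neighbors

# Hypothetical deployment: nodes 1 and 2 share source "A" and are 5 m apart.
positions = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (50.0, 50.0), 4: (1.0, 1.0)}
source_of = {1: "A", 2: "A", 3: "A", 4: "B"}
neighbors = build_neighbors(positions, source_of, radius=10.0)
```

In this example only nodes 1 and 2 become neighbors: node 3 is out of range and node 4 senses a different source.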
Two cases are simulated: uniformly distributed fault nodes and intensively distributed fault nodes. The first case is used for comparison when the number of nodes varies. In the second case, we set a square located at a random coordinate as a fault region, and we compare the detection effects for different scales of intensive faults by changing the area of the square. According to the sensing data designation, data ranging from $X_{min}$ to $X_{max}$ are treated as good; otherwise, data are treated as faulty.

Effect of Data Loss
At each moment of the data collection, some nodes are chosen to be unable to sense data in order to simulate a data loss scenario. The Data Missing Preprocess Mechanism proposed in this paper is compared with the Data Filling method based on the k-Nearest Neighbor algorithm (df-KNN). The main idea of df-KNN is to select the k nodes in the neighborhood with the shortest distances, weight the data of these k nodes according to the distances and finally sum the weighted data as the interpolation result. Here, the data loss rate is set to 5%, 10%, 15%, 20%, 25%, 30% and 35%, respectively. In Figure 4, the horizontal axis is the data loss rate, defined as the ratio of the number of data loss nodes to the total number of working nodes. The vertical axis is the mean residual, defined as the average difference between the interpolated data and the pre-established data, which reflects the final accuracy of the algorithms. The mean residual grows as the loss rate grows. When the loss rate is low, the mean residual of uDFD is 0.1 lower than that of df-KNN, which is only a modest improvement, but the improvement becomes larger as the loss rate grows. When the loss rate is high enough, the improvement is approximately 0.5, which means that uDFD is more suitable for large-scale data loss situations. As the data loss rate grows, the number of neighbors with available data decreases, so less information can be collected, and this eventually makes the interpolation results unreliable. In comparison with df-KNN, uDFD makes adequate use of the historical data of neighbors for prediction and thus mitigates the loss of credibility caused by less available data, which leads to better results.
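For reference, the df-KNN baseline and the mean residual can be sketched in Python as follows. Inverse-distance weights normalized to sum to 1 are an assumption, since the text only states that the k nearest values are weighted by distance; the function names and sample values are hypothetical.

```python
import math

def knn_fill(target_pos, neighbor_data, k=3):
    """Interpolate a missing reading from the k nearest neighbors.

    neighbor_data: list of ((x, y), value). Inverse-distance weighting is assumed.
    """
    ranked = sorted(neighbor_data, key=lambda item: math.dist(target_pos, item[0]))[:k]
    weights = [1.0 / (math.dist(target_pos, pos) + 1e-9) for pos, _ in ranked]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, ranked)) / total

def mean_residual(filled, truth):
    """Average absolute difference between interpolated and pre-established data."""
    return sum(abs(f - t) for f, t in zip(filled, truth)) / len(truth)

# Two equidistant neighbors dominate; the far node is excluded by k = 2.
data = [((1.0, 0.0), 10.0), ((0.0, 1.0), 20.0), ((5.0, 5.0), 100.0)]
estimate = knn_fill((0.0, 0.0), data, k=2)
```

Because this baseline uses only the current readings of nearby nodes, its estimate degrades when few neighbors have data, which is the situation in which the history-based preprocessing is reported to help most.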

Evidence Fusion
In this paper, information theory-based evidence reasoning is used to fuse collected evidences before the status judgment of nodes. Original D-S evidence reasoning and an improved one proposed by Qiang Ma et al. [22] are used for comparison. The improved D-S is depicted as below.
The method defines, in turn, the distance between evidences $m_1$ and $m_2$, the similarity of $m_1$ and $m_2$, the basic credit of $m_i$ (where N is the number of evidences) and the weight of $m_j$. All evidences are then amended, where A is the established focal element and ψ is the uncertain one, and the amended evidences are fused through the original D-S rule. This algorithm measures the degree of conflict among evidences by means of a distance and amends the evidences before fusion. The simulation results are shown in Table 1.
Belief and plausibility functions are used to evaluate the fusion results. The BPA-based belief function in the frame of discernment Θ is defined as:

$$Bel(A) = \sum_{B \subseteq A} m(B)$$

The BPA-based plausibility function in the frame of discernment Θ is defined as:

$$Pl(A) = \sum_{B \cap A \neq \Phi} m(B)$$

The belief interval is defined as [Bel(A), Pl(A)], which is shown in Figure 5 together with the region in which the hypothesis that A is true is accepted. The belief functions and plausibility functions obtained from the fusion results are shown in Table 2. The results of the algorithms are similar to each other when the two evidences with a low degree of conflict in evidence set 1 are fused, and the results of the original D-S and of the improved D-S proposed by Qiang Ma [22] are the same. A similar evidence is added to set 1 to form set 2. According to the belief functions and plausibility functions of the three algorithms, the possibility that the original D-S accepts that G is true is 0.2703 and the possibility that it refuses is 0.3514. The possibility that the improved D-S accepts that G is true is 0.2561 and the possibility that it refuses is 0.3388. The possibility that uDFD accepts that G is true is 0.1956 and the possibility that it refuses is 0.2272. For evidence set 2, the possibilities of accepting and refusing that G is true given by the first two algorithms are too high to reflect the actual situation of each evidence (the possibilities to accept and to refuse are both lower than 0.2), whereas the algorithm proposed in this paper is closer to it. In uDFD, the possibilities to accept and to refuse are not raised by reducing the uncertainty, which makes the fusion result more credible. A completely different evidence is added to set 1 to form set 3, which has a high degree of conflict. The possibility (0.5978) that the original D-S accepts that G is true is so high that it approaches that of the last evidence added to set 3, and the possibility (0.2881) given by the improved D-S is too low to reflect the effect of the high degree of conflict.
The result of uDFD lies between those of the other two algorithms; it balances the influences of all evidences and is more credible.
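The belief and plausibility functions used in this comparison can be computed with a short Python sketch. It uses the standard definitions over the frame {G, F}; the BPA values are invented and are not taken from Table 1 or Table 2.

```python
G, F = frozenset("G"), frozenset("F")
THETA = G | F  # the whole frame of discernment

def belief(m, a):
    """Bel(A): total mass of the non-empty subsets of A."""
    return sum(v for b, v in m.items() if b and b <= a)

def plausibility(m, a):
    """Pl(A): total mass of the sets that intersect A."""
    return sum(v for b, v in m.items() if b & a)

# Hypothetical fused BPA of one node.
m = {G: 0.5, F: 0.2, THETA: 0.3}
interval_G = (belief(m, G), plausibility(m, G))
```

Here Bel(G) is 0.5 and Pl(G) is about 0.8. The width of the interval equals the mass left on the uncertain hypothesis, and 1 − Pl(G) equals Bel(F), the support for rejecting G.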

Detection Accuracy
DFD, IDFD and uDFD are compared based on the constructed wireless sensor network model. The measures involved are the detection accuracy (the ratio of the number of correctly detected nodes to the total number of working nodes), the false alarm rate (the ratio of the number of good nodes misjudged as faulty to the total number of working nodes) and the missing alarm rate (the ratio of the number of faulty nodes misjudged as good to the total number of working nodes). In this simulation, each final data point is the average of the results of 30 runs.
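The three measures can be written down directly, as in the following Python sketch. The dictionary layout and the labels "good" and "faulty" are illustrative assumptions.

```python
def detection_metrics(truth, judged):
    """Return (detection accuracy, false alarm rate, missing alarm rate).

    truth and judged map each working node to "good" or "faulty";
    every rate is a fraction of all working nodes.
    """
    n = len(truth)
    correct = sum(truth[k] == judged[k] for k in truth)
    false_alarm = sum(truth[k] == "good" and judged[k] == "faulty" for k in truth)
    missed = sum(truth[k] == "faulty" and judged[k] == "good" for k in truth)
    return correct / n, false_alarm / n, missed / n

# Hypothetical outcome for four working nodes.
truth = {1: "good", 2: "good", 3: "faulty", 4: "faulty"}
judged = {1: "good", 2: "faulty", 3: "faulty", 4: "good"}
accuracy, false_alarm_rate, missing_alarm_rate = detection_metrics(truth, judged)
```

Since all three ratios share the same denominator, the detection accuracy, the false alarm rate and the missing alarm rate always sum to 1.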
First, we analyze the effects of these three algorithms with the changing fault rate when nodes are randomly distributed uniformly. Considering that different influences are caused by different distribution densities, cases with 40, 80 and 120 working nodes are simulated. Figure 6 shows the detection effects of the three algorithms with 40 working nodes. In this case, nodes are distributed sparsely in the simulation region. It can be seen from the figure that with the increasing fault rate, detection accuracy shows an approximate linear downward trend; however, false alarm rate and missing alarm rate show the opposite trend. Through further calculation, when the fault rate ranges from 5% to 50%, average detection accuracy of uDFD is 9.33% points higher than that of DFD and 6.25% points higher than that of IDFD; average fault alarm rate of uDFD is 6.93% points lower than that of DFD and 5.33% points lower than that of IDFD; average missing alarm rate of uDFD is 2.49% points lower than that of DFD and 1.02% points lower than that of IDFD. The uDFD brings better detection accuracy under sparse distribution conditions.  Figure 7 shows the detection effects of the three algorithms with 80 working nodes. In this case, nodes are distributed moderately densely in the simulation region. As is shown by the figure, with the increasing fault rate, detection accuracy shows an approximately linear downward trend; however, false alarm rate and missing alarm rate show the opposite trend. Through further calculation, when the fault rate ranges from 5% to 50%, average detection accuracy of uDFD is 7.31% points higher than that of DFD and 5.44% points higher than that of IDFD; average fault alarm rate of uDFD is 4.21% points lower than that of DFD and 3.4% points lower than that of IDFD; average missing alarm rate of uDFD is 3.22% points lower than that of DFD and 2.04% points lower than that of IDFD. The uDFD provides better detection accuracy in this condition. 
Meanwhile, as the fault rate grows, the superiority of the detection effect of uDFD continues to increase. When the fault rate is 50%, the detection accuracy of uDFD is 10.08% points higher than that of DFD and 7.68% points higher than that of IDFD, which indicates that uDFD adapts better to high fault rate conditions.  Figure 8 shows the detection effects of the three algorithms with 120 working nodes. In this case, nodes are distributed densely in the simulation region. It can be seen from the figure that with the increasing fault rate, detection accuracy shows an approximately linear downward trend; however, false alarm rate and missing alarm rate show the opposite trend. Through further calculation, when the fault rate ranges from 5% to 50%, the average detection accuracy of uDFD is 3.75% points higher than that of DFD and 1.74% points higher than that of IDFD; average fault alarm rate of uDFD is 1.69% points lower than that of DFD and 1.14% points lower than that of IDFD; average missing alarm rate of uDFD is 2.09% points lower than that of DFD and 0.6% points lower than that of IDFD. The uDFD provides better detection accuracy under dense distribution conditions. Furthermore, as the fault rate grows, the superiority of the detection effect of uDFD increases continually, which indicates that uDFD performs better under high fault rate conditions. Through comprehensive analysis of Figures 6-8, we can see that detection accuracy of these algorithms increases with the increasing distribution density of nodes. This is due to the increase of available information when judging resulting from the growing number of neighbor nodes. By comparison, the advantage of detection accuracy of uDFD increased as the distribution density of nodes decreases, which indicates that the detection accuracy of uDFD improves more than that of DFD and IDFD. 
Thanks to evidence fusion based on the status of neighbor nodes before judgment, uDFD works better when the number of neighbor nodes is small.
Second, we analyze the detection accuracy of the three algorithms when faulty nodes are intensively distributed. In the intensive distribution scheme, squares with side lengths of 20, 25, 30, 35 and 40 m are placed at random coordinates as fault regions, and all nodes inside a fault region are set as faulty. By changing the size of the fault region, we can observe the detection accuracy for different scales of faulty nodes. Here, the number of working nodes is 80. Figure 9 shows the detection results of the three algorithms for different fault region sizes. When the fault rate ranges from 5% to 50%, the average detection accuracy of uDFD is 2.08 percentage points higher than that of DFD and 1.55 percentage points higher than that of IDFD; the average false alarm rate of uDFD is 0.84 percentage points lower than that of DFD and 0.45 percentage points lower than that of IDFD; the average missing alarm rate of uDFD is 1.24 percentage points lower than that of DFD and 1.09 percentage points lower than that of IDFD. These results indicate that uDFD achieves better detection results when intensive faults occur. When dealing with intensive faults, uDFD draws on more information from the neighborhood for its judgment, which reduces the influence of faulty nodes misleading one another.
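The intensive fault scheme and the three reported metrics can be sketched as follows. This is a minimal illustration, not the simulation code used for Figure 9. The function names, the 100 m deployment area and the choice to normalize all three rates by the total number of nodes are assumptions, since the exact definitions are not restated in this section.

```python
import random

def place_fault_region(nodes, side, area=100.0, rng=random):
    """Mark every node inside one randomly placed square as faulty.

    nodes: list of (x, y) positions in metres.
    side:  side length of the fault square (20 to 40 m in the text).
    area:  side length of the deployment square (assumed value).
    Returns a list of booleans, True meaning the node is faulty.
    """
    # Pick the lower-left corner so the square stays inside the area.
    x0 = rng.uniform(0.0, area - side)
    y0 = rng.uniform(0.0, area - side)
    return [x0 <= x <= x0 + side and y0 <= y <= y0 + side for x, y in nodes]

def detection_metrics(is_faulty, judged_faulty):
    """Return (accuracy, false_alarm_rate, missing_alarm_rate).

    All three values are divided by the total node count, which is an
    assumption made for this sketch.
    """
    n = len(is_faulty)
    pairs = list(zip(is_faulty, judged_faulty))
    false_alarms = sum(1 for truth, judged in pairs if not truth and judged)
    missed = sum(1 for truth, judged in pairs if truth and not judged)
    correct = n - false_alarms - missed
    return correct / n, false_alarms / n, missed / n
```

For example, 80 node positions drawn with `random.uniform(0, 100)` and a call to `place_fault_region(nodes, 30)` give the ground truth for one run; the output of a detector can then be scored with `detection_metrics`.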
In uDFD, θ1 is the threshold on data from different nodes at the same moment, θ2 is the threshold on data from the same node at different moments, and θ3 is the threshold on data from the same node collected at the same moment on different days. In our model, a value of θ2 within the interval (0, 2) has little influence on the detection results, so it is fixed at 1. With θ2 fixed, we explore the best combination of θ1 and θ3.
The selection of θ1 and θ3 directly affects the judgment results, so finding their best combination is the key to obtaining the best detection performance from uDFD. Figure 10 shows a 3D map of detection accuracy for different combinations of θ1 and θ3. Here, the number of working nodes is 80. The surface peaks near θ1 = 5 and θ3 = 5.4, where the detection accuracy reaches its maximum of about 0.98.
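A threshold exploration of this kind can be sketched as an exhaustive grid search. The sketch below assumes a caller-supplied function `evaluate(theta1, theta2, theta3)` that runs one simulation and returns the detection accuracy. Both `grid_search` and `toy_accuracy` are hypothetical names, and `toy_accuracy` is only a stand-in surface constructed to peak where Figure 10 is reported to peak; it is not derived from the simulation data.

```python
import itertools

def grid_search(evaluate, theta1_values, theta3_values, theta2=1.0):
    """Return (best_accuracy, theta1, theta3) over all grid combinations."""
    best = None
    for t1, t3 in itertools.product(theta1_values, theta3_values):
        acc = evaluate(t1, theta2, t3)
        # Keep the first combination that reaches the highest accuracy.
        if best is None or acc > best[0]:
            best = (acc, t1, t3)
    return best

def toy_accuracy(theta1, theta2, theta3):
    """Hypothetical stand-in for a full uDFD simulation run.

    A smooth surface with its maximum of 0.98 at theta1 = 5, theta3 = 5.4.
    theta2 is accepted but ignored, mirroring its small reported influence.
    """
    return 0.98 - 0.01 * (theta1 - 5.0) ** 2 - 0.01 * (theta3 - 5.4) ** 2

best_acc, best_t1, best_t3 = grid_search(
    toy_accuracy, [4.0, 4.5, 5.0, 5.5, 6.0], [5.0, 5.2, 5.4, 5.6, 5.8]
)
```

In a real study, `toy_accuracy` would be replaced by a function that averages the detection accuracy over several simulation runs for each threshold pair, since a single run is noisy.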

Communication Energy Consumption
In our model, communication between nodes is simulated using the ZigBee protocol. ZigBee, a personal area network protocol based on IEEE 802.15.4, supports short-distance, low-complexity, self-organizing, low-power, low-data-rate and low-cost wireless communication and is well suited to WSNs. The brief frame structure of ZigBee is shown in Figure 11. The frame header consists of bits from the application layer, network layer, MAC layer and physical layer. In our model, the messages transmitted between nodes include prejudged statuses, evidences and final judged statuses. According to the ZigBee frame structure, this information can be encapsulated in the payload field of the application layer frame. When a status is transmitted, the payload is only 4 bits (2 bits represent the message type and 2 bits represent the status, namely LG/G), and the total frame length is 47 bits (the header length is 43 bits).
First, we analyze the number of messages transmitted in the simulated network with 40 working nodes, as depicted in Figure 12. Nodes exchange statuses by radio broadcasting, and the accumulated number of messages is recorded over 30 tests. As the number of test rounds grows, the number of messages shows an approximately linear upward trend. The approximate slopes for DFD, IDFD and uDFD are 266.7, 180.0 and 103.3, respectively. The rate of increase of uDFD is the lowest, which means that it transmits the fewest messages during fault detection. This is because the nodes in DFD and IDFD have to exchange all prejudged and final judged statuses, whether good or faulty, which leads to more interactions, while uDFD only exchanges messages if the tendency status is LG or the final status is good.
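The frame-length arithmetic and the slope comparison above can be reproduced with a few lines of code. This is only a sketch: the function names are hypothetical, and the least-squares fit is one plausible way to obtain an "approximate slope" from accumulated message counts, not necessarily the method used for Figure 12.

```python
HEADER_BITS = 43          # header length stated in the text
STATUS_PAYLOAD_BITS = 4   # 2 bits message type + 2 bits status

def frame_bits(payload_bits, header_bits=HEADER_BITS):
    """Total frame length in bits for a given payload size."""
    return header_bits + payload_bits

def fit_slope(counts):
    """Least-squares slope of accumulated counts against round index 1..n.

    Requires at least two rounds.
    """
    n = len(counts)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def relative_reduction(baseline, value):
    """Fraction by which value is lower than baseline."""
    return 1.0 - value / baseline

status_frame = frame_bits(STATUS_PAYLOAD_BITS)      # 43 + 4 bits
vs_dfd = relative_reduction(266.7, 103.3)           # slopes from the text
vs_idfd = relative_reduction(180.0, 103.3)
```

With the reported slopes, uDFD accumulates messages about 61% more slowly than DFD and about 43% more slowly than IDFD.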
The average communication energy consumption of each node after 30 tests is compared in Figure 13. The average energy consumption over all nodes is 0.235 mJ for DFD, 0.176 mJ for IDFD and 0.082 mJ for uDFD. Comparing the energy consumption node by node, we find that uDFD has the best energy saving performance, which results from the reduction in interactions.
With 120 working nodes, the average energy consumption of all detections in uDFD is 9.26 mJ lower than that of IDFD and 19.59 mJ lower than that of DFD, so uDFD again has the best energy saving performance during detection. Compared with DFD and IDFD, uDFD needs fewer iterations and does not transmit the Un status, which reduces interactions and decreases energy consumption. Messages carrying evidences use more bits than those carrying statuses, but this disadvantage has little adverse influence on the overall performance of uDFD. The simulations above show that uDFD performs well. Whereas traditional DFD and IDFD require every node to broadcast its status, uDFD reduces the communication overhead by broadcasting only the statuses of nodes determined as good. Moreover, uDFD displays higher detection accuracy in environments with a high data loss rate.
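The relative savings implied by the per-node averages can be worked out directly, and a simple per-bit model shows why fewer broadcasts translate into lower energy. The sketch below is illustrative only: the per-bit transmission cost `nj_per_bit` is a hypothetical parameter, since the radio energy model is not given in this section, and the function names are assumptions.

```python
# Per-node average energy after 30 tests, in mJ, as reported in the text.
REPORTED_MJ = {"DFD": 0.235, "IDFD": 0.176, "uDFD": 0.082}

def saving(baseline_mj, value_mj):
    """Fraction of energy saved relative to a baseline."""
    return (baseline_mj - value_mj) / baseline_mj

def broadcast_energy_mj(n_frames, bits_per_frame, nj_per_bit):
    """Transmit energy in mJ for n_frames broadcasts.

    nj_per_bit is an assumed radio cost in nanojoules per bit;
    1 nJ equals 1e-6 mJ.
    """
    return n_frames * bits_per_frame * nj_per_bit * 1e-6

saving_vs_dfd = saving(REPORTED_MJ["DFD"], REPORTED_MJ["uDFD"])
saving_vs_idfd = saving(REPORTED_MJ["IDFD"], REPORTED_MJ["uDFD"])
```

On the reported averages, uDFD uses about 65% less communication energy per node than DFD and about 53% less than IDFD. Under the per-bit model, energy scales linearly with the number of frames, so halving the number of status broadcasts halves the transmit energy for a fixed frame size.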

Conclusions
In this paper, we propose a fault detection mechanism for wireless sensor networks based on data filling and evidence fusion. To counter the decrease in detection accuracy caused by data loss, uDFD fills in missing data, and it is shown to be better suited to situations with large-scale data loss. In addition, evidence reasoning based on information entropy theory is used to fuse the collected evidences before the status of a node is judged, which balances the influence of all evidences and makes them more credible. Our algorithm retains higher detection accuracy in low-connectivity environments and under changing fault ratios. Because only sensors determined as good need to exchange states for evidence fusion, fewer messages are broadcast during fault detection. Avoiding excessive message exchanges relieves a heavy burden on the sensors' limited energy. In future work, we will address the drop in detection accuracy below 80% when the fault ratio approaches 0.5. For example, we will consider historical judgment behaviors in the reasoning to increase detection accuracy, as well as the cross-impact of more types of faults.