Uncertain Data Clustering-Based Distance Estimation in Wireless Sensor Networks

For communication distance estimation in Wireless Sensor Networks (WSNs), the RSSI (Received Signal Strength Indicator) value is usually assumed to have a linear relationship with the logarithm of the communication distance. However, this is not always true in reality because there are always uncertainties in RSSI readings due to obstacles, wireless interference, etc. In this paper, we propose a novel RSSI-based communication distance estimation method based on the idea of interval data clustering. We first use interval data, combined with statistical information of RSSI values, to interpret the distribution characteristics of RSSI. We then use interval data hard clustering and soft clustering to overcome different levels of RSSI uncertainties, respectively. We have used real RSSI measurements to evaluate our communication distance estimation method in three representative wireless environments. Extensive experimental results show that our communication distance estimation method can achieve promising estimation accuracy with high efficiency when compared to other state-of-the-art approaches.

$$D = D_0 \cdot 10^{(P(D_0) - P(D) - X_r)/(10n)} \qquad (2)$$
However, in real systems, there are uncertainties in the arriving signal strength due to the influence of environmental factors such as reflection, refraction, multi-path transmission, antenna gain, and many other obstacles [12]. Moreover, under different environments or at different communication distances, the level of uncertainty in RSSI values will also be different (if uncertainty is represented by the statistical variance, the higher the variance, the greater the uncertainty is). Generally, in an open air environment the level of uncertainty in RSSI values is lower than that of an environment which has obstacles, such as walls. Therefore, the relationship between RSSI and D can hardly fulfill Equation (2). There is no longer a linear relationship between the RSSI value and lg(D) in these scenarios.
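To make the sensitivity of Equation (2) concrete, the following is a minimal sketch of the model inversion. The reference power of -40 dBm at 1 m and the path-loss exponent n = 2 are illustrative assumptions, not values taken from the experiments in this paper.

```python
def distance_from_rssi(p_d, p_d0, d0=1.0, n=2.0, x_r=0.0):
    """Invert the log-distance model of Equation (2).

    p_d  : measured RSSI at the unknown distance (dBm)
    p_d0 : RSSI at the reference distance d0 (dBm), assumed value
    n    : path-loss exponent, assumed value
    x_r  : random shadowing term (dB), zero in the ideal case
    """
    return d0 * 10 ** ((p_d0 - p_d - x_r) / (10.0 * n))


# Ideal reading versus a reading disturbed by 3 dB of unmodeled fluctuation.
ideal = distance_from_rssi(-60.0, -40.0)
noisy = distance_from_rssi(-63.0, -40.0)
relative_error = abs(noisy - ideal) / ideal
```

Under these assumed parameters, a 3 dB disturbance already scales the estimate by a factor of 10^(3/20), which illustrates why the raw model is fragile when RSSI readings fluctuate.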
If we directly apply the above-mentioned empirical model-based linear or curve fitting method to RSSI-D estimation, the communication distance estimation relative error could be 50% or worse [13]. To solve this problem, scholars have performed many studies on the subject and have proposed various methods. Some researchers have proposed particle swarm optimization (PSO) [10], extended Kalman filter (EKF) [14,15] and particle filter (PF) [16] methodologies to filter out the errors in the RSSI. However, with these filters, the system model must be accurately described; moreover, the computational complexity is high, and timing requirements in real-time processing are difficult to meet for many WSN applications. Although real RSSI values exhibit a significant level of uncertainty, their distributions still share some statistical properties in terms of uncertainties. Specifically, RSSI values with the same communication distance tend to constitute a cluster. The objective of this paper is to find a more effective way to overcome the uncertainty of RSSI values and achieve better RSSI-D estimation results.
To improve distance estimation accuracy, we propose an RSSI-D estimation method using interval data clustering, called Distance Estimation using Uncertain Data Clustering (DEUDC). As shown in Figure 1, the framework of DEUDC comprises an off-line environment measurement module and an on-line distance estimation module.

Off-line environment measurement: We first perform RSSI sample measurements at different communication points in the wireless communication environment. We then submit the RSSI data for statistical computation and model the RSSI distribution characteristic in terms of RSSI uncertainties. We can obtain an RSSI-D mapping based on this method.
On-line distance estimation: During the RSSI-D estimation procedure, the RSSI value is measured by a wireless sensor node (e.g., CC2530 WSN node), and we can estimate the communication distance using uncertain data clustering.
In the on-line distance estimation module, considering different levels of uncertainty in RSSI values, we adopt RSSI-D estimation methods using both hard and soft uncertain data clustering methods to improve the estimation accuracy.
The contributions of this paper are as follows: (1) We propose DEUDC, an RSSI-based communication distance estimation method, which uses a mapping strategy and an uncertain data clustering method. Unlike sample-based mapping in the RADAR [17] and ARIADNE [18] systems, we resort to distribution-based mapping to overcome the uncertainty in RSSI readings.
(2) To address the uncertainty in RSSI values, we adopt interval data and statistical information to represent the RSSI distribution characteristic of each distance. In comparison to sample-based mapping, by exploiting distribution-based statistics, our approach can potentially obtain greater improvement in estimation accuracy and efficiency.
(3) We propose an RSSI-D estimation method in which uncertain data soft and hard clustering algorithms are implemented in order to obtain better estimation accuracy with respect to different levels of uncertainty in RSSI.
(4) We have evaluated DEUDC using real data sets from representative wireless environments. Experimental results show that DEUDC outperforms state-of-the-art estimation methods.
The remainder of this paper is organized as follows: we present related work in Section 2; Section 3 introduces the uncertain data expression, including related definitions and the distance computation method used to handle interval data; Section 4 describes the RSSI-D estimation method using uncertain data clustering and its implementation; we evaluate the performance of this RSSI-D estimation method in Section 5; Section 6 concludes the paper.

Related Works
RSSI provides an inexpensive and practical way [19] of estimating communication distances during the operation of range-based localization systems or other range-based service systems used for wireless communications. Many uncertain factors exist during the measurement of RSSI [17], and the uncertainty in RSSI values leads to very low accuracy when estimating communication distances. For the RSSI-based communication distance estimation problem, many studies have been performed to improve the estimation accuracy. These studies can be divided into two categories: those dedicated to model-based methods, and those dedicated to mapping-based methods.

Model-Based Estimation Methods
Shang et al. adopted empirical models of radio propagation to estimate communication distance [20]. However, the estimation accuracy of this method is sensitive to many uncertain factors. Li et al. proposed a least-squares (LS) curve fitting method to reduce the influence of RSSI outliers [21]. In [22], LS-based curve fitting using a statistical mean method is presented to improve the accuracy with which communication distances are estimated using RSSI, but the results are not very promising. Statistical filter methodologies, such as the extended Kalman filter [14] and particle filter [16], have been presented to improve estimation accuracy. However, with these filters, the system model needs to be accurately described; moreover, the computational complexity is high and timing requirements in real-time RSSI-D estimation are difficult to fulfill. In [23], the uncertainty in RSSI values is considered, but no further studies were performed. Kung et al. adopted weighted range measurements with different sensor nodes and a statistical technique to tolerate outliers [24]. CDL exploits both range-free and range-based methods to obtain better estimation quality [25].

Mapping-Based Estimation Methods
The RADAR system in [17] uses both empirical and mathematical models to determine RSSI-D. Results show that mapping-based empirical methods can yield better quality. The ARIADNE system presented in [18] uses cluster-based RSSI-D estimation, but does not consider the uncertainty in the RSSI values.

Similar Systems
The RADAR system [17], the system most similar to DEUDC, proposes a signal strength map (SS-MAP) and searches it for the location of the node. The estimation efficiency and accuracy of this system are sensitive to the number of samples. It also does not consider the uncertainty in the RSSI value.
Similarly, the ARIADNE [18] system contains two modules: a map generation module and a search module. For imprecise radio propagation map tables, the system adopts a clustering-based search algorithm to obtain a good quality of estimation. Relative to that of the RADAR system, the estimation efficiency is superior to an extent. However, when the number of samples is large, the estimation efficiency is still very low.
Unlike the aforementioned systems, in this paper we do not adopt sample-based mapping, but rather we resort to a distribution mapping strategy and use a clustering search method to estimate the communication distance. We not only consider the uncertainty in the RSSI values, but we also propose DEUDC, a communication distance estimation method that uses a clustering algorithm which can overcome the uncertainty in RSSI values in different types of environments and improve the distance estimation accuracy.

Related Definitions
We express uncertain RSSI values in terms of interval data. First, we provide some relevant definitions regarding the interval data.
(1) Interval data [26,27]: For given $A_L, A_R \in \mathbb{R}$ with $A_R \ge A_L$, we call the set
$$A = [A_L, A_R] = \{x \in \mathbb{R} \mid A_L \le x \le A_R\}$$
interval data, where $A_L$ is the lower bound of the interval data and $A_R$ is the upper bound. If $A_R = A_L$, which means the upper and lower bounds are equal, the interval data becomes exact data.
(2) Midpoint and radius of interval data [26,27]: For given interval data $A = [A_L, A_R]$, let
$$m_A = (A_L + A_R)/2, \qquad r_A = (A_R - A_L)/2.$$
We define $m_A$ and $r_A$ ($r_A \ge 0$) as the midpoint and radius, respectively, of interval data A. Therefore, we can also express the interval data as follows:
$$A = [m_A - r_A, m_A + r_A]$$
Because we estimate RSSI-D according to the exact RSSI values measured in the RSSI-D procedure, we propose our third definition as the distance between the interval data and the exact data.
(3) Distance between the interval data and the exact data: Consider given interval data $X = [m_X - r_X, m_X + r_X]$ and exact data $Y = y$, where $m_X, r_X, y \in \mathbb{R}$. The distance relationship between the two is illustrated in Figure 2. When they are separate from each other, as shown in Figure 2a, the minimum distance is $|m_X - y| - r_X$ and the maximum distance is $|m_X - y| + r_X$; when they are joined, as shown in Figure 2b, the minimum distance is 0, and the maximum distance is $|m_X - y| + r_X = 2 r_X$; when the interval data contains the exact data, as shown in Figure 2c, the minimum distance is 0, and the maximum distance is $|m_X - y| + r_X$. Therefore, we can calculate the maximum distance $d_{\max}$ between X and Y, the minimum distance $d_{\min}$ and the distance $d$ between the interval data and exact data as follows:
$$d_{\max} = |m_X - y| + r_X, \qquad d_{\min} = \max(|m_X - y| - r_X,\ 0), \qquad d = [d_{\min}, d_{\max}] \qquad (6)$$
As indicated by Equation (6), the distance between the interval data and the exact data remains interval data, which can comprehensively represent different distance values.
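The three cases of definition (3) can be checked with a short sketch. The function name and the numeric RSSI values below are illustrative assumptions; the logic follows the separate, joined and contained cases described for Figure 2.

```python
def interval_point_distance(m_x, r_x, y):
    """Return (d_min, d_max) between the interval [m_x - r_x, m_x + r_x]
    and the exact value y, covering the separate, joined and contained cases."""
    gap = abs(m_x - y)                # distance from y to the interval midpoint
    d_min = max(gap - r_x, 0.0)       # zero when y touches or lies inside the interval
    d_max = gap + r_x                 # distance to the far end of the interval
    return d_min, d_max


separate = interval_point_distance(-60.0, 2.0, -70.0)   # y outside the interval
joined = interval_point_distance(-60.0, 2.0, -62.0)     # y on the lower bound
contained = interval_point_distance(-60.0, 2.0, -61.0)  # y inside the interval
```

In the joined case the maximum distance equals twice the radius, which matches the statement in definition (3).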

Overview of DEUDC
In this section, we first adopt the statistical information of RSSI values and interval data to represent the distribution characteristics of RSSI-D. As mentioned above, the RSSI values of the same communication distance share the same distribution characteristics and form a cluster; therefore, we can represent the distribution characteristics in the form of a cluster center. We then calculate the distance (similarity) between an RSSI value and the cluster centers, which determines the cluster to which the RSSI value belongs. Based on the results, and taking into account the different levels of uncertainty in RSSI values in different environments, we propose RSSI-D estimation methods using hard and FCM [28] soft interval data clustering algorithms.
The framework of the RSSI-D estimation system is illustrated in Figure 3. The communication distance estimation system is composed of the following modules: an RSSI Sample Measure Module, a Statistic Computing & Cluster Center Representation Module and a Clustering Analysis & Communication Estimation Module. We first conduct environmental measurements: we sample the RSSI values of different communication distances at certain distance intervals (e.g., 0.5 m) in the communication environment to form the RSSI-D sample dataset. We then submit the dataset of each communication distance to statistical computation to obtain the pertinent statistical information (i.e., mean and standard deviation) and express it in the form of cluster centers, which represent the statistical RSSI-D mapping relation. During the communication distance estimation stage, we apply clustering analysis to the RSSI values according to the cluster centers and obtain the corresponding communication distance of the RSSI values.

RSSI Sample Measurement
In RSSI-D estimation environments (e.g., indoor corridor, hall or open air), within the communication range of the nodes, we fix the anchor node (whose position information is known) and move the unknown node to a series of specific communication distances relative to the anchor node. We measure the RSSI value Y at each of these communication distances. To obtain the statistical characteristics of RSSI-D at each communication point, we measure the RSSI 150 times. Thus, we obtain the RSSI-D sample dataset. In the same manner, we perform the RSSI-D sample measurement in different types of typical communication environments, including an indoor corridor, a hall and an open air environment.

Statistic Computing, Cluster Center Representation
After obtaining the sample datasets, we submit the RSSI values of each communication point to statistical analysis and obtain the pertinent statistical information, namely the mean value (μ) and the standard deviation (σ). We express this statistical range as $[\mu - k\sigma, \mu + k\sigma]$, where k is a coverage factor and $\{k \in \mathbb{R} \mid 0 \le k \le 3\}$. Assume the RSSI values of every communication distance form a cluster; thus, the cluster center, or statistical region, is $c = [\mu - k\sigma, \mu + k\sigma]$. Assume the number of cluster centers for the RSSI values within the communication range is N (a positive integer), $\mu_j$ is the mean value of one cluster and $\sigma_j$ is its standard deviation. We represent the set of cluster centers as $\{c_j\} = \{[\mu_j - k\sigma_j, \mu_j + k\sigma_j]\}$, $j = 1, \ldots, N$, and the corresponding distances as $\{d_{sj}\}$.
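As a rough illustration of the cluster center representation, the sketch below builds one interval per communication distance from raw RSSI readings. The dictionary layout, the function name and the use of the sample standard deviation (rather than the population one) are assumptions made for this example; the readings are invented.

```python
from statistics import mean, stdev


def build_cluster_centers(samples_by_distance, k=1.0):
    """Map each communication distance to an interval cluster center.

    samples_by_distance : dict {distance_m: [rssi readings in dBm]}
    k                   : coverage factor, 0 <= k <= 3
    """
    centers = []
    for distance, readings in sorted(samples_by_distance.items()):
        mu = mean(readings)
        sigma = stdev(readings)  # assumption: sample standard deviation
        centers.append({
            "distance": distance,
            "mu": mu,
            "sigma": sigma,
            "interval": (mu - k * sigma, mu + k * sigma),
        })
    return centers


# Invented readings for two communication points.
samples = {
    0.5: [-40.0, -42.0, -41.0, -39.0, -38.0],
    1.0: [-50.0, -52.0, -48.0, -50.0, -50.0],
}
centers = build_cluster_centers(samples, k=1.0)
```

Setting k to 0 collapses every interval to its mean, which corresponds to treating the cluster center as exact data.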

Distance Calculation between RSSI Value and Cluster Center
To determine to which cluster the RSSI value Y belongs, we first define the calculation for the distance between the RSSI value Y and the cluster center. As stated in definition (3) in Section 3, the distance between interval data and exact data is still interval data, bounded by the minimum and maximum distances. To perform clustering analysis, we introduce a correlation factor λ [29], where $0 \le \lambda \le 1$, and use it to combine these two distance extremes into a single distance $D_j(c_j, Y)$, given by Equation (7). In that equation, when λ is equal to 0, $D_j(c_j, Y)$ is maximized, i.e., the distance between the two sets of data is the greatest. When λ is equal to 1, $D_j(c_j, Y)$ is minimized. All other values of λ are combinations of the two distance extremes.
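One form that satisfies the stated behavior of the correlation factor is a convex combination of the two extremes, D = λ·d_min + (1 − λ)·d_max. This linear form is an assumption made for illustration; it reproduces the limits described above (maximum at λ = 0, minimum at λ = 1) but may differ from the exact expression of Equation (7).

```python
def combined_distance(center, y, lam=0.5):
    """Collapse the interval distance [d_min, d_max] into one number.

    center : (lower, upper) bounds of the interval cluster center
    y      : exact RSSI value
    lam    : correlation factor, 0 <= lam <= 1
    Assumed form: lam * d_min + (1 - lam) * d_max.
    """
    lower, upper = center
    midpoint = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    gap = abs(midpoint - y)
    d_min = max(gap - radius, 0.0)
    d_max = gap + radius
    return lam * d_min + (1.0 - lam) * d_max


center = (-62.0, -58.0)                               # invented cluster center
at_zero = combined_distance(center, -70.0, lam=0.0)   # equals d_max
at_one = combined_distance(center, -70.0, lam=1.0)    # equals d_min
midway = combined_distance(center, -70.0, lam=0.5)
```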

RSSI-D Estimation Method Based on Interval Data Clustering
Based on the distance calculation, for an arbitrary RSSI value Y, we can determine to which cluster Y belongs using an interval data clustering algorithm. We then treat the distance that corresponds to the determined cluster center as the RSSI-D estimation result. To handle the different levels of uncertainty in RSSI values in different environments, we propose both hard and soft interval data clustering algorithms.
(1) Distance Estimation using Uncertain Data Hard Clustering (DEUDHC): Unlike traditional clustering analysis, the mean value μ and standard deviation σ of each cluster center are obtained through statistical calculation, and the cluster center is expressed as interval data. For an RSSI value Y in the RSSI-D estimation process, Equation (7) is used to calculate the distance between Y and each RSSI cluster center ($c_i$). We then determine the cluster center $c_j$ located at the shortest distance ($D_{jY}$) and use the communication distance ($d_{sj}$) related to that cluster center as the estimated communication distance ($d_c$) for the RSSI value Y. We call this method Distance Estimation using Uncertain Data Hard Clustering (DEUDHC), which is based on interval data hard clustering. The main pseudo code describing how the method operates is presented in Algorithm 1.

Algorithm 1: DEUDHC ( )
Here, Y is the RSSI value used in RSSI-D estimation during the RSSI-D estimation stage, k is a coverage factor, λ is a correlation factor, $\{c_j\}$ ($j = 1, \ldots, N$) is the center of each RSSI cluster, $\{d_{sj}\}$ ($j = 1, \ldots, N$) is the communication distance related to each cluster center, and $d_c$ is the estimated value for the given RSSI value Y.
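The hard clustering step described above amounts to a nearest-center search. The following is a minimal sketch under two assumptions: each cluster is given as a hypothetical tuple (μ, σ, d_s), and the distance of Equation (7) is taken to be the convex combination λ·d_min + (1 − λ)·d_max. It is not a transcription of Algorithm 1.

```python
def deudhc(y, clusters, k=1.0, lam=0.5):
    """Return the distance d_s of the interval cluster center closest to y.

    clusters : list of (mu, sigma, d_s) tuples, one per communication distance
    k        : coverage factor defining the interval [mu - k*sigma, mu + k*sigma]
    lam      : correlation factor (assumed linear combination of d_min and d_max)
    """
    best_distance, best_score = None, float("inf")
    for mu, sigma, d_s in clusters:
        radius = k * sigma
        gap = abs(mu - y)
        d_min = max(gap - radius, 0.0)
        d_max = gap + radius
        score = lam * d_min + (1.0 - lam) * d_max
        if score < best_score:
            best_distance, best_score = d_s, score
    return best_distance


# Invented cluster statistics for three communication distances (in metres).
clusters = [(-40.0, 1.5, 0.5), (-50.0, 2.0, 1.0), (-56.0, 2.5, 1.5)]
estimate = deudhc(-49.0, clusters)
```

Because the result is always one of the stored distances, the estimate is discrete, which is the limitation that motivates the soft clustering variant.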
When the level of uncertainty in RSSI values is very high, many of the cluster centers represented by the interval data will overlap. If the DEUDHC method is adopted in this case, the error in the distance estimation may be large. We apply the DEUDHC method for RSSI-D estimation in three typical environments, and the relative distance estimation error is shown in Figure 4, which demonstrates that the error is very large. In addition, the estimated communication distance is discrete when using the interval data hard clustering RSSI-D estimation method because the method does not consider RSSI values that fall between two communication distances.
(2) Distance Estimation using Uncertain Data Soft Clustering (DEUDSC): To solve these problems, we introduce a distance estimation method based on fuzzy clustering. We use the FCM [28] soft clustering algorithm to determine the three cluster centers that have the highest degrees of membership, which we denote as $U_i$, $U_m$ and $U_n$. We then multiply the distances related to the three cluster centers ($d_{si}$, $d_{sm}$ and $d_{sn}$) by the corresponding degrees of membership and accumulate them, i.e., $d_c = U_i \cdot d_{si} + U_m \cdot d_{sm} + U_n \cdot d_{sn}$, to obtain the estimated communication distance $d_c$ for the RSSI value Y. We refer to this method as Distance Estimation using Uncertain Data Soft Clustering (DEUDSC), for which the main pseudo code is presented in Algorithm 2.
Figure 5 shows that the DEUDSC method can greatly improve the RSSI-D estimation accuracy relative to that of the DEUDHC method in the three typical environments under consideration. In the environments with higher levels of uncertainty in the RSSI values (i.e., the corridor and the hall), the improvement in the estimation accuracy is particularly great. On the other hand, in the open air environment, which features a low level of uncertainty in the RSSI values, the improvement is very limited.
Figure 5. RSSI-D estimation error using interval data hard and soft clustering methods.
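The soft clustering estimate can be sketched as follows. The membership formula is the standard FCM one, u_j = D_j^(-p) / Σ_k D_k^(-p) with p = 2/(m − 1); the fuzzifier m = 2, the linear form of the distance, the (μ, σ, d_s) tuple layout and the optional renormalization of the top three memberships are all assumptions of this example rather than details confirmed by Algorithm 2.

```python
def deudsc(y, clusters, k=1.0, lam=0.5, m=2.0, top=3, renormalize=False):
    """Membership-weighted distance estimate over the `top` closest clusters.

    clusters    : list of (mu, sigma, d_s) tuples
    m           : FCM fuzzifier (assumed value 2.0)
    renormalize : if True, rescale the selected memberships to sum to one
                  (an optional variant, not stated in the text)
    """
    scores = []
    for mu, sigma, d_s in clusters:
        radius = k * sigma
        gap = abs(mu - y)
        score = lam * max(gap - radius, 0.0) + (1.0 - lam) * (gap + radius)
        if score == 0.0:          # y coincides with a zero-width center
            return d_s
        scores.append((score, d_s))

    power = 2.0 / (m - 1.0)
    inverse = [(score ** -power, d_s) for score, d_s in scores]
    total = sum(weight for weight, _ in inverse)
    memberships = sorted(((w / total, d_s) for w, d_s in inverse), reverse=True)[:top]

    norm = sum(u for u, _ in memberships) if renormalize else 1.0
    return sum(u * d_s for u, d_s in memberships) / norm


clusters = [(-40.0, 1.5, 0.5), (-50.0, 2.0, 1.0), (-56.0, 2.5, 1.5)]
near_center = deudsc(-49.0, clusters)
between = deudsc(-45.0, clusters)
```

Unlike the hard variant, the result varies continuously with Y. When more than three clusters exist, the three selected memberships sum to less than one, so whether to renormalize them is a design choice that the text leaves open.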

Efficiency Improvement: Micro-Cluster Based Clustering
To improve the efficiency of DEUDC, we apply the UK-means [30] method to perform clustering analysis on the RSSI cluster centers and obtain macro-clusters. As shown in Figure 6, we set the number of macro-cluster centers to three and obtain three macro-clusters: macro-cluster 1, macro-cluster 2 and macro-cluster 3. When we perform the RSSI-D estimation, once we obtain an RSSI value Y, we first determine to which macro-cluster (in this case, macro-cluster 3) the RSSI value Y belongs, i.e., the macro-cluster whose distance to Y is minimal according to Equation (6) [18,30]. Second, within macro-cluster 3, we further determine to which cluster center $c_i = [\mu_i - k\sigma_i, \mu_i + k\sigma_i]$ the RSSI value Y belongs according to Equation (7). Finally, we obtain the communication distance estimation result $d_{si}$, which corresponds to cluster center $c_i$. In this manner, we can improve the efficiency of RSSI-D estimation.
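The two-stage search can be sketched as below. The macro-clusters are assumed to be precomputed (the UK-means step itself is not shown), each represented by a hypothetical dictionary holding an interval and its member clusters; the same assumed linear distance is used at both stages for simplicity.

```python
def interval_score(lower, upper, y, lam):
    """Assumed combined distance between the interval [lower, upper] and y."""
    midpoint, radius = (lower + upper) / 2.0, (upper - lower) / 2.0
    gap = abs(midpoint - y)
    return lam * max(gap - radius, 0.0) + (1.0 - lam) * (gap + radius)


def two_stage_estimate(y, macro_clusters, k=1.0, lam=0.5):
    """Pick the nearest macro-cluster first, then the nearest member inside it."""
    macro = min(
        macro_clusters,
        key=lambda mc: interval_score(mc["interval"][0], mc["interval"][1], y, lam),
    )
    mu, sigma, d_s = min(
        macro["members"],
        key=lambda c: interval_score(c[0] - k * c[1], c[0] + k * c[1], y, lam),
    )
    return d_s


# Invented macro-clusters, each grouping two (mu, sigma, d_s) cluster centers.
macro_clusters = [
    {"interval": (-45.0, -35.0), "members": [(-38.0, 1.0, 0.5), (-43.0, 1.0, 1.0)]},
    {"interval": (-60.0, -48.0), "members": [(-50.0, 1.5, 1.5), (-57.0, 2.0, 2.0)]},
]
far_node = two_stage_estimate(-56.0, macro_clusters)
near_node = two_stage_estimate(-39.0, macro_clusters)
```

With N cluster centers split evenly over M macro-clusters, each query needs roughly M + N/M distance evaluations instead of N, which is where the efficiency gain comes from.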

Experiments
In this section, we evaluate the performance of the DEUDC (including DEUDHC and DEUDSC) RSSI-D estimation method proposed in this paper. We first conduct the feasibility evaluation. In other words, we evaluate the impact of the related parameters (i.e., the correlation factor λ and the coverage factor k) on the performance of the RSSI-D estimation method in different environments to obtain appropriate settings for these parameters. Second, we evaluate the performance of the DEUDC RSSI-D method in three typical environments and compare it with other RSSI-D estimation methods. Finally, we discuss the experimental results and draw general conclusions.

Experiment Setting and Experimental Data
The experimental conditions and parameter settings are shown in Table 1. We design CC2530 WSN nodes based on the TI (Texas Instruments Corporation, Dallas, TX, USA) System on Chip (SoC) framework, shown in Figure 7, and use them for our experiments. We deploy a real distance estimation system in a 3.2 m × 3.2 m field with sensor nodes, as shown in Figure 8. We fix the four anchor nodes and move the unknown node at intervals of 0.8 m in two directions (when its location overlaps with an anchor node, we move the unknown node 0.1 m away from the anchor node). We deploy the system in different environments, i.e., in a corridor, a hall and an open air environment.
The configuration of the evaluation platform (PC) is as follows. CPU: Intel i7 720QM @ 1.6 GHz; main memory: 4 GB; operating system: Windows XP Professional SP3; evaluation environment: Matlab 2009b. After deploying the distance estimation system, we perform the following sampling procedure to obtain the experimental data:
Step 1: At each location point, after receiving an RSSI-D estimation request from the sink node (which is connected to a PC and manages the WSN), the unknown node sends an RSSI request signal to the anchor nodes.
Step 2: The anchor nodes measure the RSSI value of the request signal and send it to the unknown node.
Step 3: After receiving these RSSI values from the four anchor nodes, the unknown node sends them to the sink node.
At each of the 25 location points, we repeat the sampling procedure 150 times to obtain the RSSI values of the links between the unknown node and the four anchor nodes. We then perform statistical computation and thus obtain 25 RSSI-D mapping models. After modeling, we sample the RSSI values 50 times at each of the 25 RSSI-D estimation points and perform RSSI-D estimation using the RSSI-D mapping models.

Evaluation Metrics
We evaluate the estimation accuracy and estimation efficiency of the RSSI-D method in terms of the following metrics.
(1) Estimation accuracy metric: For estimation accuracy, in this experiment, we adopt the RSSI-D estimation absolute error (AE), as indicated in Equation (8):
$$AE = |d_t - d| \qquad (8)$$
where $d_t$ is the RSSI-D estimation value (i.e., the estimated distance) between an unknown node and an anchor node, $d$ is the real distance between them, and AE is the RSSI-D absolute error.
The lower the AE, the more accurate the result. We perform the following evaluation based on this metric.
(2) Estimation efficiency metric: For estimation efficiency, we adopt the following metrics: modeling time $T_m$ and estimation time $T_e$. Low values of these parameters mean that the estimation efficiency is high.
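Both metrics are straightforward to compute. The sketch below is an illustration only: the helper names are hypothetical, and wall-clock timing with `time.perf_counter` is an assumed way of measuring the modeling and estimation times, not the procedure used in the Matlab evaluation.

```python
import time


def absolute_error(d_t, d):
    """Absolute error between the estimated distance d_t and the real distance d."""
    return abs(d_t - d)


def mean_absolute_error(estimates, truths):
    """Mean AE over paired lists of estimated and real distances."""
    errors = [absolute_error(d_t, d) for d_t, d in zip(estimates, truths)]
    return sum(errors) / len(errors)


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds); usable for T_m or T_e."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


mae = mean_absolute_error([1.0, 2.5], [1.5, 2.0])
total, elapsed = timed(sum, [1, 2, 3])
```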

Feasibility Evaluation
In this section, we evaluate the impact of two important parameters (i.e., the correlation factor λ and the coverage factor k) and determine appropriate settings for them.

Impact of Correlation Factor on the RSSI-D Estimation Method
(1) Impact of correlation factor: The correlation factor λ determines the combination of the maximum and minimum distance between the RSSI value Y and the cluster center during the distance calculation in the clustering process (shown in Equation (7)), with $0 \le \lambda \le 1$. We use different values of the correlation factor λ in the experiment to evaluate the factor's impact on the performance of the RSSI-D estimation method. The conditions are listed in Table 2. We fix the anchor nodes and move the unknown node as shown in Figure 8. We set the value of the coverage factor k to 1 and apply the DEUDHC method to perform RSSI-D estimation. Figure 9 shows the mean AE (absolute error over all communication distances) of the RSSI-D estimation. From Figure 9, we can see that the correlation factor does not have a clear impact on the RSSI-D estimation accuracy. For different values of the correlation factor λ, the estimation error does not vary appreciably. Therefore, λ can in principle be set to an arbitrary value within its range. To obtain better RSSI-D estimation results in this experiment, we set the correlation factor λ to 0 to 0.1, 0 to 0.1 and 0.5 to 0.6 in the corridor, the hall and the open air environment, respectively.
(2) Discussion on setting of correlation factor: Based on the experimental results and analysis described above, the impact of the correlation factor on RSSI-D estimation varies with the environment. Thus, when we apply the DEUDC RSSI-D estimation method, we should determine the correlation factor setting through experiments.

Impact of Coverage Factor on the RSSI-D Estimation Method
Coverage factor k determines the range of interval data. According to the error theory [31], when the value of k is greater than three, we treat the data as outliers. By considering the cluster centers' representative form as interval data, we can see that when k is too large, the range of cluster centers will be too wide, which leads to serious overlap between cluster centers and, therefore, a larger RSSI-D estimation error. Therefore, k should take on a smaller value.
(1) Impact of k: To evaluate how the coverage factor k affects the performance of the DEUDC method, we adopt different coverage factor values and apply the DEUDHC and DEUDSC methods for RSSI-D estimation in the three environments mentioned above. According to the impact analysis of the correlation factor λ, λ in the DEUDHC and DEUDSC estimation methods takes on values of 0.1 and 0.1, 0.1 and 0.1, and 0.5 and 0.5 in the corridor, the hall and the open air environment, respectively. The RSSI-D estimation error of each node is shown in Figure 10.
(2) Analysis of relation between the RSSI standard deviation and the value of k: We calculate the standard deviations of the RSSI values of each communication distance point in the three environments. We also evaluate the effect of k in the three typical environments and determine the appropriate value of k, as shown in Table 3. The standard deviation represents the level of fluctuation in measurement data [32]. In this paper, we use the standard deviation as an index of the uncertainty level in RSSI values. Table 3 shows that the level of uncertainty in the RSSI values is high in the corridor, while it is low in open air. This is because radio reflection, refraction, diffraction and multi-path propagation occur in the corridor and the hall, whereas few of these effects occur in open air.
(3) Discussion on setting of coverage factor k: The results of the value-setting experiments performed for the coverage factor k in different environments demonstrate that the appropriate values of k are closely related to the level of uncertainty in RSSI values. Generally, when the standard deviation of the RSSI values is about 2, k takes on a value of 1; when the standard deviation is above 1 or below 1, k takes on a value of 0.5 or 0, respectively. We should determine the most suitable value of k through experimental analysis when we apply the RSSI-D estimation method.

Performance Evaluation
In this section we evaluate the performance of the RSSI-D DEUDC (including DEUDHC and DEUDSC) method. We apply the following RSSI-D estimation methods to estimate the distances in the three typical environments: Least Square Linear Fitting (LSLF) [31,33,34], Step Regression Linear Fitting (SRLF) [34], Back Propagation (BP) [35], Least Square-Support Vector Machine (LS-SVM) [34] and DEUDC (proposed in this paper, including DEUDHC and DEUDSC).
In the LSLF method, the mean of the RSSI sample values at every distance is used to fit a linear curve (as shown in Equation (1)) with the least-squares rule, and the curve is regarded as the radio propagation model. Based on the model, the distance estimation result can be obtained for a given RSSI value in the estimation procedure. In the SRLF method, the mean of the RSSI sample values is used to model the radio propagation using the step linear regression method. BP is a kind of Artificial Neural Network (ANN), which is widely used in data pattern recognition. In the BP method, the RSSI-D mapping model is obtained by training the neural network using RSSI-D sample data sets, with the parameter settings shown in Table 4; we then obtain the distance estimation result by simulating the model using RSSI data. In the LS-SVM method, the RSSI-D sample dataset is mapped into a feature space by a kernel function, and the model is trained in the feature space. Based on the model, we obtain the distance estimation result; the parameter settings are shown in Table 5. Based on the results obtained from the analysis of the correlation factor and coverage factor in Section 5.2, we set the DEUDC parameters as shown in Table 6.
(1) Accuracy Analysis of RSSI-D estimation: After RSSI-D estimation, we calculate the mean of the RSSI-D estimation absolute error (AE) for each method in the three typical environments. Table 7 and Figure 11 show the RSSI-D estimation error of the different methods. We perform the following analysis: Table 7 and Figure 11 indicate that the DEUDC (including DEUDHC and DEUDSC) method proposed in this paper achieves higher estimation accuracy than the other methods in the three typical environments on average. Specifically, compared to LSLF, SRLF, BP and LS-SVM, DEUDC improves the RSSI-D estimation accuracy by 11.31% to 72.15%.
Therefore, DEUDC can overcome the uncertainty problem associated with RSSI values and reduce the RSSI-D estimation error to achieve higher estimation accuracy. Table 7 and Figure 11 demonstrate that the environment has a great impact on the RSSI-D estimation accuracy. For example, the corridor and the hall may feature reflection, refraction, multipath propagation and other uncertain factors, which result in more complex communication environments and lead to lower RSSI-D estimation accuracy. On the other hand, in the open air environment, there exist fewer uncertain communication factors; thus, the RSSI-D estimation accuracy is higher.
Figure 11. RSSI-D estimation absolute error of different methods in three typical environments.
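For reference, the LSLF baseline used in this comparison can be sketched as an ordinary least-squares fit of mean RSSI against lg(D). The function names and the synthetic data (generated from an assumed model RSSI = -40 - 20·lg(D)) are illustrative; they are not the measurements or the Matlab code used in the evaluation.

```python
import math


def fit_lslf(distances, mean_rssi):
    """Least-squares fit of mean_rssi = slope * lg(distance) + intercept."""
    xs = [math.log10(d) for d in distances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(mean_rssi) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, mean_rssi))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def estimate_distance(rssi, slope, intercept):
    """Invert the fitted line to obtain a distance for one RSSI value."""
    return 10 ** ((rssi - intercept) / slope)


# Synthetic, noise-free data from an assumed propagation model.
distances = [1.0, 2.0, 4.0, 8.0]
mean_rssi = [-40.0 - 20.0 * math.log10(d) for d in distances]
slope, intercept = fit_lslf(distances, mean_rssi)
recovered = estimate_distance(-40.0 - 20.0 * math.log10(4.0), slope, intercept)
```

On noise-free data the fit recovers the model exactly; with real readings, any deviation from the linear model carries over directly into the distance estimate, which is consistent with the lower accuracy reported for the curve fitting methods.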
(2) Efficiency Analysis of RSSI-D estimation: Based on the RSSI-D estimation, we also evaluate the estimation efficiency in terms of modeling time, estimation time and total time. Table 8 shows the estimation efficiency of the different methods and demonstrates that the DEUDC method proposed in this paper achieves higher estimation efficiency than most of the other methods. More specifically, compared to the SRLF, BP and LS-SVM methods, DEUDHC improves the estimation efficiency by 98.59%, 99.99% and 99.97%, respectively, and DEUDSC improves it by 85.80%, 99.87% and 99.69%, respectively. The LSLF method fits a deterministic linear model to the RSSI-D sample data, so its efficiency is very high. In the SRLF method, a step regression strategy is used to fit the linear model, so its estimation efficiency is low. The estimation efficiency of BP and LS-SVM is low because their modeling and computation are complex. In the DEUDC method, the modeling and estimation are simple, so its efficiency is higher. DEUDC is therefore more suitable for application in WSNs.
(3) Discussion on innovation: From Figure 11, we can see that, compared with the BP and LS-SVM methods, the estimation accuracy improvement of DEUDC is limited. However, from the point of view of estimation efficiency, the improvement is very obvious. Considering both estimation accuracy and efficiency, the performance evaluation results indicate that, compared to the LSLF [31], SRLF [34], BP [35] and LS-SVM [36] methods, the DEUDC (including DEUDHC and DEUDSC) method exhibits higher estimation performance in the three typical environments. This result is observed because the curve fitting based methods (e.g., LSLF and SRLF) assume that the RSSI values follow a fixed relation with the communication distance, although this relation does not hold in real environments, which leads to lower RSSI-D estimation accuracy.
On the other hand, the RSSI-D estimation method DEUDC based on interval data clustering considers the distribution characteristics of RSSI values in real communication environments and builds a mapping relation between RSSI and distance (D), which leads to a higher performance.
This method is not only suitable for RSSI-D estimation in wireless sensor networks, but can also be applied in other radio transmission systems.
(4) Discussion on DEUDC method and application environment The experimental results demonstrate that in the corridor and hall, where the level of uncertainty of the RSSI values is higher, the RSSI-D estimation error of DEUDSC is lower than that of DEUDHC. On the other hand, in the open air environment, where the level of uncertainty in RSSI values is lower, the RSSI-D estimation error of DEUDSC is lower than that of DEUDHC. Therefore, we should select the RSSI-D estimation method that best suits a given communication environment.

Generality of DEUDC Distance Estimation
It should be noted that the off-line environment measurement module in the DEUDC method is not strictly necessary, i.e., the DEUDC method can be applied beyond already known and measured environments. If application requirements are not sensitive to estimation accuracy, we can estimate the RSSI-D distance with the help of an empirical radio propagation model.

Implementation of Distributed DEUDC RSSI-D Estimation
In this paper, we focus on the evaluation and analysis of a distance estimation method (i.e., DEUDC) based on RSSI with different levels of uncertainty. We therefore resort to a centralized processing strategy to perform distance estimation. Moreover, the distance estimation method can also be implemented in a distributed manner. In the distributed system, the RSSI-D estimation can be performed on each unknown node in the WSN.

Conclusions
Targeting communication distance estimation in real WSN applications, we have proposed an RSSI-D estimation method, DEUDC, which utilizes uncertain data clustering algorithms. The key idea is to leverage interval data combined with the statistical distribution of RSSI values, followed by distance estimation using interval data clustering algorithms. Extensive experimental results show that the DEUDC RSSI-D estimation method can largely overcome the uncertainty of RSSI values in a real system while achieving promising RSSI-D estimation accuracy, and the improvement is more evident in environments where RSSI readings have larger uncertainties. The DEUDC method can provide distance estimation not only for localization but also for object identification, deployment, item tracking and many other applications.
For the sake of good estimation accuracy, the RSSI-based distance estimation method requires that wireless measurements be performed in advance, which will become a bottleneck if wireless measurements are not feasible. For future work, we may explore adaptive WSN RSSI-D estimation methods, in which a maximum likelihood or least-squares method can be used to update model parameters iteratively in real time. We note that there are fundamental challenges for adaptive estimation methods too, e.g., computation costs and energy consumption, which are left for future work.