Behavior-Based Cleaning for Unreliable RFID Data Sets

Radio Frequency IDentification (RFID) technology promises to revolutionize the way we track items and assets. In RFID systems, however, missed readings are common and pose an enormous challenge to RFID data management, so accurate data cleaning is essential for the successful deployment of such systems. In this paper, we present the design and development of an RFID data cleaning system, the first declarative, behavior-based smoothing system for unreliable RFID data. We take advantage of the kinematic characteristics of tags to assist in RFID data cleaning. To establish the conversion relationship between RFID data and the kinematic parameters of tags, we propose a movement behavior detection model. Moreover, a Reverse Order Filling Mechanism is proposed to capture the movement behavior characteristics of tags more completely. Finally, we validate our solution with a common RFID application and demonstrate the advantages of our approach through extensive simulations.


Introduction
Radio Frequency Identification (RFID) is an electronic tagging technology that allows objects, places, or persons to be automatically identified at a distance without a direct line-of-sight, using an electromagnetic challenge/response exchange [1,2]. RFID offers a possible alternative to barcodes, and has emerged as a key technology for a wide range of applications, including supply chain, retail stores, and asset management [3]. However, the widespread adoption of RFID technology is limited by the unreliability of the data streams produced by RFID readers [4,5]. RFID data cleaning is therefore widely considered a principal challenge and has been an important research topic in the last few years [6][7][8].
Despite improvements in the accuracy of RFID readers, erroneous readings such as missed readings and ghost readings still occur, due to interference, inappropriate placement of tags, and temporary or permanent malfunction of some components.
The goal of RFID data cleaning is to eliminate erroneous readings, and especially to reduce or eliminate dropped readings. In this paper, we propose an innovative approach to cleaning raw RFID data: Behavior-Based Smoothing for unreliable RFID data (BBS). Unlike conventional techniques, BBS relies primarily on the movement behavior of tags to fill in the RFID data. The biggest obstacle is how to obtain the movement behavior characteristics of tags. To address this problem, we propose a movement behavior detection model that derives them by analyzing the existing uncertain data of the corresponding tags. The contributions of this study are as follows:
• A movement behavior detection model. By counting how often a tag is read in each cycle, we obtain the read rate of the tag and analyze its kinematic characteristics from the changes in the read rate sequence, which ultimately assists RFID data cleaning.
• Reverse Order Filling Mechanism (ROFM). Based on the detection model, we design and implement a reversible RFID data filter. When we detect that the data have not been filled completely, ROFM is started to fill the data again in reverse order. The mechanism captures the movement behavior characteristics of tags more completely, and thus significantly improves the accuracy of data cleaning without scanning all the data twice.
• Improved positioning accuracy of the RFID reader. A traditional RFID positioning system can only provide a Boolean result, namely whether the tag is in the read range of the reader at a given time. BBS can also obtain the distance between the tag and the reader, and even the velocity of the tag.
• Evaluation of BBS. We design several groups of comparative experiments on data sets that include measured data and simulation data. The results show that, across all tested missing rates, the precision of BBS is better than that of sliding-window cleaning.
The rest of this paper is organized as follows: we discuss the related work in Section 2. Section 3 defines the movement behavior detection model and introduces our RFID data cleaning mechanism and algorithm. An empirical evaluation of our solution is reported in Section 4. Finally, Section 5 concludes the paper.

Related Work
Many systems have been developed to manage uncertain data. RFID data management, which has been studied extensively, is one of the most important applications driving the recent surge of interest in managing incomplete and uncertain data. Valentine et al. [8] presented an adaptive sliding-window based approach, WSTD, for reducing false negative reads in RFID data streams. Rao et al. [13] presented a deferred approach for detecting and correcting RFID data anomalies by utilizing declarative sequence-based rules. Chen et al. [14] proposed a Bayesian inference based approach, which takes full advantage of data redundancy, for cleaning raw RFID data. Gonzalez et al. [15] proposed a cleaning framework that takes an RFID data set and a collection of cleaning methods with associated costs, and induces a cleaning plan that optimizes the overall accuracy-adjusted cleaning cost by determining the conditions under which inexpensive methods are appropriate and those under which more expensive methods are absolutely necessary.
The work in [5,12] is the most relevant research to this paper. Jeffery et al. [5,12] proposed an adaptive smoothing filter, SMURF, for RFID data cleaning. SMURF focuses on a sliding-window aggregate that interpolates for lost readings. It models the unreliability of RFID readings by treating RFID streams as a statistical sample of physical tags, and exploits techniques from sampling theory to drive its cleaning processes. However, SMURF is mainly suited to settings in which tags move infrequently, and it is not effective when tags move frequently.

A Movement Behavior Detection Model
The key to a movement behavior-based smoothing filter lies in how to establish the conversion relationship between read rate sequences and the kinematic parameters of tags to assist in RFID data cleaning. To do so, we propose a movement behavior detection model.
A tag passing through the reader's read range follows the laws of kinematics. Kinematic parameters such as displacement and velocity change continuously rather than abruptly. Hence, if the location of a tag (mainly its distance from the reader) and its relative velocity at a given time can be obtained from the original data, we can infer the parameters of the tag at a missed-reading time from these values and their trends, and thereby assist data cleaning and improve its accuracy. BBS takes this approach. For example, it uses existing tag data to obtain the location $p_1$ and the velocity $v_1$ of the tag at time $t_1$, from which the relative location of the tag at time $t_1 + T$ (where $T$ is a short period of time) can be approximately inferred. Finally, by mapping the location information back to the RFID data, we can fill in the missed RFID data. Through these kinematic parameters, BBS can therefore determine whether the tag is in the detection range at a given time and further give its specific location.
Adopting statistical methods similar to those of SMURF, each epoch is viewed as an independent Bernoulli trial with success probability $p_i$ [12]. An epoch may be specified as a number of interrogation cycles or a unit of time. A typical epoch range is 0.2-0.25 seconds [5]. For each epoch, the reader keeps track of all the tags that have been identified, and additional information such as the number of interrogation responses for each tag and the last time the tag was read. Assume there are $n$ interrogation cycles in an epoch and tag $i$ is read in $m_i$ of them. The read rate of tag $i$ in that epoch is then $p_i = m_i/n$. While passing through the reader's read range, tags are scanned continuously. Throughout this process, the read rate of a tag is not constant but changes with the distance between the tag and the reader. Moreover, experiments have shown that within the reader's detection region there is a linear relationship between read rate $p$ and distance $s$ [12]. For a specific reader, the detection range $S$ is a constant. To confirm this conclusion, we carried out similar experiments, and the results are shown in Figure 1. The quiet condition means an ideal working environment for RFID devices with only a little interference, while the noisy condition means a working environment with more interference. Further abstracting these results gives the relationship between read rate $p$ and distance $s$ shown in Figure 2. The distance $s$ between tag and reader and the read rate $p$ follow the relation:

$$p = ks + b \qquad (1)$$

where $b = -kS$ and $k$ is the slope of the line, so the above equation can be further written as:

$$p = k(s - S) \qquad (2)$$
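The read-rate calculation and the linear model can be illustrated with a minimal Python sketch. The function names and the numeric values of the slope `k` and detection range `S` are hypothetical, chosen only for illustration; the sketch also assumes `k` is negative, that is, the read rate falls as the tag moves away from the reader.

```python
def read_rate(responses: int, cycles: int) -> float:
    """Read rate p_i = m_i / n for one epoch."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    if not 0 <= responses <= cycles:
        raise ValueError("responses must lie between 0 and cycles")
    return responses / cycles


def expected_read_rate(s: float, k: float, S: float) -> float:
    """Linear model p = k (s - S), clipped to the valid range [0, 1]."""
    return min(1.0, max(0.0, k * (s - S)))


def distance_from_read_rate(p: float, k: float, S: float) -> float:
    """Invert p = k (s - S) to get s = S + p / k (assumes k < 0)."""
    if k >= 0:
        raise ValueError("this sketch assumes a negative slope k")
    return S + p / k


# Hypothetical reader: detection range 5 m, slope -0.2 per metre.
k, S = -0.2, 5.0
p = read_rate(responses=10, cycles=20)    # tag answered 10 of 20 cycles
s = distance_from_read_rate(p, k, S)      # estimated distance in metres
```

With these illustrative values, a read rate of 0.5 maps to a distance of about 2.5 m, and a tag at the edge of the detection range (s = S) has a read rate of 0.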

Behavior-Based Smoothing for Unreliable RFID Data
In this section, we discuss how to use the model to fill in the missed RFID data. In our model, the epoch is the basic unit of RFID data streams. Our task is to fill in the missed epoch information.
The RFID data stream provides the tag ID, the number of interrogation responses for each tag in an epoch, and the time of the epoch, in the form (tag ID, Response number, time). Consider Equation (2). The read rate $p$ can be calculated from the Response number, and the detection range $S$ is a constant, but the distance $s$ cannot be calculated directly. In practice, the detection region of each reader is generally not very large, ranging from a few meters to tens of meters. Therefore, the movement of persons, vehicles, goods on a conveyor belt, and other tagged items through the detection region can be approximated as uniform linear motion or a combination of several successive uniform linear motions. Even if the velocity or direction of an object changes noticeably during this process, we can break its movement down and treat each short segment as approximately uniform linear motion. The speed $v$ of uniform linear motion satisfies $\Delta s = v \Delta t$. If $s_0$ denotes the initial distance of the tag, so that $s = s_0 \pm vt$, Equation (2) can be further written as:

$$p = Kt + B \qquad (3)$$

where $K = \pm kv$ (the negative sign applies when the value of $p$ increases, and the positive sign otherwise) and $B = k(s_0 - S)$.
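The step from Equation (2) to Equation (3) can be spelled out as a short derivation. It assumes the slope $k$ is negative (the read rate falls with distance, as the relation $b = -kS$ with non-negative $p$ inside the detection range implies); the numeric values in the closing example are hypothetical.

```latex
\begin{align*}
s(t) &= s_0 \pm v t
  && \text{(upper sign: tag receding, lower sign: tag approaching)} \\
p(t) &= k\,\bigl(s(t) - S\bigr) = k\,(s_0 \pm v t - S) \\
     &= \underbrace{\pm k v}_{K}\, t + \underbrace{k\,(s_0 - S)}_{B} = K t + B .
\end{align*}
% Hypothetical example: k = -0.2 per metre, S = 5 m, s_0 = 4 m, v = 1 m/s, tag approaching.
% Then K = -kv = 0.2 per second and B = k(s_0 - S) = 0.2, so p(2) = 0.2 * 2 + 0.2 = 0.6,
% which matches Equation (2) at s(2) = 2 m: p = -0.2 * (2 - 5) = 0.6.
```

For an approaching tag the lower sign gives $K = -kv > 0$, so the read rate rises over time, which agrees with the sign rule stated for Equation (3).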
In practice, readers are usually affected by their surroundings, including signal reflection and obstruction or sudden current gain, so the read rate calculated from the Response number is unstable. Results obtained by treating the raw data directly may differ from the actual movement characteristics, so we use a weighted moving average of order $n$ to smooth the initial read rate sequences. Replacing the read rate sequence by its moving average eliminates unwanted fluctuations. Furthermore, the influence of extreme values can be reduced by choosing appropriate weights, which yields more realistic movement features for the monitored items. The calculation is as follows:

$$\bar{p}_i = \frac{w_1 p_i + w_0 \sum_{j \in N_i} p_j}{w_1 + (n-1)\, w_0}$$

where $N_i$ denotes the other $n - 1$ epochs in the averaging window of epoch $i$, and $w_1$ and $w_0$ are the weights of the read rate of the current epoch and of the other epochs, respectively.
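A minimal Python sketch of such a weighted moving average follows. The source does not state how the window is placed, so the sketch assumes a window centred on the current epoch (hence an odd order `n`) that is truncated at the ends of the sequence and normalised by the sum of the weights actually used. The function name is hypothetical; the default weights follow the experimental setting w 0 = 1 and w 1 = 2.

```python
from typing import List, Sequence


def weighted_moving_average(rates: Sequence[float], n: int = 3,
                            w0: float = 1.0, w1: float = 2.0) -> List[float]:
    """Smooth a read-rate sequence with a centred weighted window of order n.

    The current epoch gets weight w1 and every other epoch in the window
    gets weight w0 (assumed window layout, not taken from the source).
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    half = n // 2
    smoothed = []
    for i in range(len(rates)):
        lo = max(0, i - half)                  # truncate window at the start
        hi = min(len(rates), i + half + 1)     # truncate window at the end
        total = 0.0
        weight_sum = 0.0
        for j in range(lo, hi):
            w = w1 if j == i else w0
            total += w * rates[j]
            weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed


# A single spike at the middle epoch is damped from 0.8 to about 0.5.
smoothed = weighted_moving_average([0.2, 0.8, 0.2], n=3)
```

The sketch treats every value in the sequence alike; a caller would pass only epochs with a nonzero read rate if zeros are to be handled separately.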
The treatment above covers only epochs whose read rate is nonzero. When the read rate of an epoch is 0, there are two possibilities: the tag is indeed outside the detection range, or a missed reading has occurred, i.e., the tag is in the detection range but was not captured because of interference. Accurate data cleaning must distinguish these two cases, so we analyze the movement of the tag in the adjacent time. The movement of tags is approximately uniform linear motion and satisfies Equation (3), so we can calculate the read rate $p_i$ of the tag from the value of $K$ and the read rate $p_{ia}$ in the adjacent time, and then determine whether the zero is a true value or a missed reading. To solve for the coefficient $K$, we denote $epoch_j = \{t_j, p_j\}$, where $t_j$ and $p_j$ are the time and read rate of epoch $j$ respectively, and define a training set $TS = \{epoch_{i+l} \mid p_{i+l} \neq 0, -m \le l \le m\}$, where the upper limit of $|TS|$ is $2m + 1$. The coefficient $K$ can then be solved by the method of least squares on the training set $TS$, which estimates the best-fitting straight line as the one that minimizes the error between the actual data and the estimate of the line:

$$K = \frac{\sum_{j \in TS} (t_j - \bar{t})(p_j - \bar{p})}{\sum_{j \in TS} (t_j - \bar{t})^2}$$

where $\bar{t} = \frac{1}{|TS|} \sum_{j \in TS} t_j$ and $\bar{p} = \frac{1}{|TS|} \sum_{j \in TS} p_j$.
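The following Python sketch illustrates one way this step could be carried out: it fits the slope K by ordinary least squares on the nonzero neighbours within m epochs, then extrapolates from the nearest nonzero neighbour. The function names, the choice of the nearest neighbour as the adjacent epoch, and the clipping to [0, 1] are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def least_squares_slope(points: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Slope K of the least-squares line through (t, p) points, or None if undefined."""
    if len(points) < 2:
        return None
    t_mean = sum(t for t, _ in points) / len(points)
    p_mean = sum(p for _, p in points) / len(points)
    denom = sum((t - t_mean) ** 2 for t, _ in points)
    if denom == 0:
        return None
    return sum((t - t_mean) * (p - p_mean) for t, p in points) / denom


def estimate_read_rate(times: Sequence[float], rates: Sequence[float],
                       i: int, m: int = 7) -> float:
    """Estimate the read rate of epoch i, whose observed read rate is 0.

    A positive result suggests a missed reading to be filled;
    0.0 suggests the tag was outside the detection range.
    """
    lo, hi = max(0, i - m), min(len(rates), i + m + 1)
    neighbours = [j for j in range(lo, hi) if rates[j] != 0]   # training set TS
    slope = least_squares_slope([(times[j], rates[j]) for j in neighbours])
    if slope is None:
        return 0.0                      # too little evidence to fill anything
    a = min(neighbours, key=lambda j: abs(j - i))              # adjacent epoch
    estimate = rates[a] + slope * (times[i] - times[a])
    return min(1.0, max(0.0, estimate))


times = [0.0, 1.0, 2.0, 3.0, 4.0]
filled = estimate_read_rate(times, [0.2, 0.4, 0.0, 0.8, 1.0], i=2, m=2)
```

In this example the zero at the third epoch lies on a rising line with slope of about 0.2, so the estimate is about 0.6 and the epoch would be treated as a missed reading.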

Reverse Order Filling Mechanism (ROFM)
In data stream processing, data are normally processed in order. However, if the RFID data stream of a tag is filled in chronological order by the method above, some missed readings are easily left unfilled, as shown in Figure 3(a). Figure 3 analyzes the read rate of a tag over one time period in detail: Figure 3(c) shows the read rate of the tag without missed readings, and Figure 3(b) shows the raw read rate that the reader actually recorded. For an epoch $p$ in Figure 3(a), if the corresponding coefficient $K_p > 0$ and the readings before time $t_p$ have been missed for a long period, the data in that period before $t_p$ will not be filled, because the RFID data stream is processed in order. A simple solution is to process the RFID data stream twice, forward and backward, but this adds considerable computational overhead. To solve this problem, we introduce a Reverse Order Filling Mechanism. As soon as the situation described above is detected, the read rates of the corresponding data stream are refilled in the reverse direction from $epoch_{p+T}$. Reverse filling continues until an epoch whose original read rate $p_i \neq 0$ is reached or the filled read rate $p_f = 0$. The rest of the data are processed after that. Thus only the affected data, rather than all data, are processed twice, which ensures the completeness of RFID data cleaning without adding much computational overhead. Algorithm 1 shows a pseudo-code description of the BBS cleaning algorithm.
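The reverse filling step can be sketched in Python as follows. This is an illustrative simplification rather than the paper's Algorithm 1: the function names are hypothetical, the trigger is reduced to "positive slope and an unfilled zero immediately before the epoch", and the backward walk starts at the triggering epoch itself.

```python
from typing import List, Sequence


def needs_reverse_fill(rates: Sequence[float], p: int, slope: float) -> bool:
    """Simplified trigger: read rate is rising at epoch p but the previous epoch is empty."""
    return slope > 0 and p > 0 and rates[p] != 0 and rates[p - 1] == 0


def reverse_fill(times: Sequence[float], rates: Sequence[float],
                 p: int, slope: float) -> List[float]:
    """Fill zero epochs before epoch p by walking backwards along the fitted line.

    Stops at the first epoch whose original read rate is nonzero, or as soon
    as the extrapolated read rate drops to zero or below.
    """
    filled = list(rates)
    j = p - 1
    while j >= 0 and rates[j] == 0:
        estimate = rates[p] + slope * (times[j] - times[p])
        if estimate <= 0:
            break                       # tag was not yet inside the detection range
        filled[j] = min(1.0, estimate)
        j -= 1
    return filled


times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
raw = [0.0, 0.0, 0.0, 0.0, 0.5, 0.7]    # first readings of an approaching tag were missed
if needs_reverse_fill(raw, p=4, slope=0.2):
    cleaned = reverse_fill(times, raw, p=4, slope=0.2)
```

Here only the two epochs just before the first successful reading are filled (to about 0.3 and 0.1); the walk stops once the extrapolated read rate would become negative, so only a short stretch of the stream is visited a second time.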

Experimental Evaluation
In this section, we present an analysis of the performance of BBS on several data sets and compare its accuracy with other cleaning methods. All the experiments were conducted on an Intel(R) Core(TM) 2 Duo CPU T9550 @ 2.66 GHz system with 2 GB of RAM. Our data include both real collected data and simulation data. The laboratory equipment used for collecting data includes an Invengo XCRF-860 RFID UHF reader with a 902-928 MHz frequency range, an Invengo XCAF-12L antenna, and an XCTF-8101A tag. The simulation data for our experiments were generated by a synthetic RFID data generator that simulates the operation of RFID readers under a wide variety of conditions. We simulate various movements of tags with different missing rates. The missing rate is the probability that a missed reading happens.
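The source does not describe the synthetic generator in detail. As a rough illustration of what a missing rate means, the hypothetical Python helper below drops each epoch's reading independently with the given probability; the function name, the per-epoch independence, and the fixed seed are all assumptions.

```python
import random
from typing import List, Optional, Sequence


def drop_readings(rates: Sequence[float], missing_rate: float,
                  seed: Optional[int] = None) -> List[float]:
    """Return a copy of the read-rate sequence with epochs zeroed at random.

    Each epoch is dropped independently with probability missing_rate
    (an assumed noise model, not the generator used in the paper).
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must lie in [0, 1]")
    rng = random.Random(seed)
    return [0.0 if rng.random() < missing_rate else p for p in rates]


clean = [0.2, 0.4, 0.6, 0.8, 1.0, 0.8, 0.6, 0.4, 0.2]
noisy = drop_readings(clean, missing_rate=0.3, seed=42)
```

A cleaning method can then be run on `noisy` and its output compared against `clean`.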

Accuracy Comparison
In this experiment, we compare the accuracy of data filled by BBS (with n = 3, n = 7 and n = 11, respectively), SMURF, and sliding-window methods (with window sizes of 5, 20 and 35 epochs) under different missing rates (from 10% to 80%). The other experimental parameters of BBS are set as follows: $m = 7$, $w_0 = 1$ and $w_1 = 2$. We clean the same raw data with the different methods.
Comparing each cleaning result with the real data gives the error rate of each method. As shown in Figure 4, the error rate of BBS is lower than that of the sliding-window methods in all cases. We found that the choice of the parameter $n$ has some impact on the experimental results when the missing rate is greater than 70%. Therefore, in practical applications, the parameters $n$, $m$, $w_0$ and $w_1$ should be set to values appropriate to the actual needs in order to obtain optimal cleaning results. Usually, the more unstable the read rate sequence, the larger the value of $n$ should be; the higher the missing rate, the larger the value of $m$ should be.
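The source does not give a formula for the error rate. One plausible reading, sketched below in Python, counts the fraction of epochs in which the cleaned stream and the ground truth disagree on whether the tag was present; the function name and this presence-based definition are assumptions.

```python
from typing import Sequence


def error_rate(cleaned: Sequence[float], truth: Sequence[float]) -> float:
    """Fraction of epochs whose presence flag (read rate > 0) differs from the truth."""
    if len(cleaned) != len(truth):
        raise ValueError("sequences must have the same length")
    if not truth:
        raise ValueError("sequences must not be empty")
    mismatches = sum((c > 0) != (t > 0) for c, t in zip(cleaned, truth))
    return mismatches / len(truth)


# One of four epochs is still missing after cleaning, so the error rate is 0.25.
rate = error_rate([0.2, 0.0, 0.4, 0.0], [0.2, 0.3, 0.4, 0.0])
```

Under this definition both unfilled missed readings and wrongly filled epochs count as errors.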
We compare the accuracy of data filled by different methods under different tag speeds. The error rates obtained are used to compare the accuracy of methods where lower error rate means higher accuracy. As shown in Figure 5, the results of BBS are obviously superior to all other methods, especially when the speeds of tags are higher than 1.0 m/s.

Conclusions
Accurate data cleaning is an essential task for the successful deployment of RFID systems. In this paper, we have proposed BBS, a behavior-based smoothing system for unreliable RFID data, which takes advantage of the kinematic characteristics of tags to assist in RFID data cleaning. A movement behavior detection model is proposed to establish the conversion relationship between RFID data and the kinematic parameters of tags. We then reduce the influence of extreme values and other unwanted fluctuations by employing a weighted moving average of order $n$. Moreover, the Reverse Order Filling Mechanism (ROFM) is proposed so that BBS captures the movement behavior characteristics of tags more completely. Finally, we validate our solution with a common RFID application and demonstrate the advantages of our approach through extensive simulations.