A Framework for Mining Actionable Navigation Patterns from In-Store RFID Datasets via Indoor Mapping

With the rapid development of RFID technology and the decreasing prices of RFID devices, RFID is becoming widely used in various intelligent services. Especially in the retail application domain, RFID is increasingly adopted to capture the shopping tracks and behavior of in-store customers. To further enhance the potential of this promising application, in this paper, we propose a unified framework for RFID-based path analytics, which uses both in-store shopping paths and RFID-based purchasing data to mine actionable navigation patterns. Four modules of this framework are discussed: (1) mapping from the physical space to the cyber space, (2) data preprocessing, (3) pattern mining and (4) knowledge understanding and utilization. In the data preprocessing module, the critical problem of how to capture the mainstream shopping path sequences while removing redundant and repeated details is addressed in detail. To solve this problem, two types of redundant patterns, i.e., the loop repeat pattern and the palindrome-contained pattern, are recognized and the corresponding processing algorithms are proposed. The experimental results show that the redundant pattern filtering functions are effective and scalable. Overall, this work builds a bridge between indoor positioning and advanced data mining technologies, and provides a feasible way to study customers’ shopping behaviors via multi-source RFID data.

This paper is organized as follows: first, we introduce indoor mapping technologies and related terms in Sections 2 and 3, respectively. Then, in Section 4, the framework for mining multi-source indoor RFID data is presented, and four modules are discussed in detail: (1) mapping from the physical space to the cyber space; (2) data preprocessing; (3) pattern mining and (4) knowledge understanding and utilization. In Section 5, we address a key problem in the data preprocessing module, which is how to identify the mainstream shopping transaction paths while removing redundant and repeated details. An algorithm which can filter two types of redundant patterns is also proposed. Then, a simulated shopping path generator is discussed in Section 6, and the experimental evaluation of the algorithm is given in Section 7. Finally, we discuss the contributions towards a real supermarket scenario and conclude our work in Sections 8 and 9, respectively.

Indoor Mapping
To easily comprehend our proposed framework, we provide below a broad overview of indoor mapping technologies.

Overview
With the progress in sensor technology, many promising indoor-mapping solutions [1,2,3,6,7,16,17], which can provide precise (or proximity), reliable and robust positioning services, have been proposed. Commonly, an indoor-mapping solution contains two components: (1) a physical layer for sensing and (2) a software layer for data processing and location positioning, where the sensing capability is based on various available technologies, such as ultra-wideband (UWB), RFID, wireless local area network (WLAN), Bluetooth, ultrasound and video cameras, or a combination of these technologies [6,16,17]. On the basis of sensing, location positioning can be achieved using positioning algorithms, which can be mainly divided into three categories: triangulation, scene analysis and proximity [17]. Triangulation schemes employ geometrical property-based techniques, which are typically time of arrival (TOA), time difference of arrival (TDOA), round-trip time-of-flight (RTOF), angle of arrival (AOA) and received signal strength (RSS) [6]. Scene analysis approaches commonly involve two phases: an offline phase of training and an online phase of positioning. In the offline phase, fingerprints of scenes are collected and stored; during the online recognition phase, machine learning methods (e.g., extreme learning machine [18]) are adopted to compare the observed fingerprints with pre-measured fingerprints for position determination [17].

RFID-Based Indoor Positioning
Among the above technologies, RFID is an attractive option for coarse-grained localization which provides proximity position information, because it is relatively cost-effective and is well suited for tracking a large number of items. Therefore, RFID technology is selected in our application of tracking shopping carts and purchased items in a supermarket.
Non-contact RFID positioning systems include three components: RFID readers, tags and servers, where tags can be active or passive. Active RFID tags are equipped with internal batteries, can broadcast their signals actively, and provide a much longer signal transmission range than passive tags, while passive tags are powered by signals transmitted from RFID readers [19,22]. Several basic frequency bands are employed by RFID systems, which include low frequency (LF), high frequency (HF), very high frequency (VHF), ultra-high frequency (UHF) and microwave frequency. Different frequency bands offer different read ranges, which normally vary from 10 cm to 12 m, and are suited for different applications [20]. Representative RFID-based precise location sensing systems are SpotON [21] and LANDMARC [22], where reference tags are employed as landmarks. Typical work towards RFID-based proximity positioning includes tracking materials on construction job sites by combining proximity reads from a discrete range [23].
Fault tolerance is another important issue for RFID-based positioning systems. The faults (false positive/negative readings) may be caused by many factors, such as hardware failures (e.g., malfunction, running out of battery energy), multipath interference, or complex radio propagation [33]. Countermeasures can be divided into two categories: physical solutions that are based on hardware performance improvement [34], and intelligent software solutions that are based on spatial-temporal correlations/redundancy [33,35].

Materials for the Study
In this section, related concepts are defined, and the notations used in this study are summarized in Table 1.

Definition 1.
A path segment s is a directed edge associated with a direction symbol (s.dir), two terminal points (the start terminal point s.b and the end terminal point s.e), and its length (s.l). The path segment can only be travelled from s.b to s.e. The reverse-order path segment of s, denoted sreverse, is the path segment sharing the same edge with s but with the reverse direction, i.e., sreverse.dir is the reverse of s.dir, sreverse.b equals s.e, sreverse.e equals s.b, and sreverse.l equals s.l.

Definition 2.
A path graph G is a directed graph, i.e., G = (V, E), where V is the set of terminal points of path segments, and E is the set of path segments. Path graph G is an abstraction of the connections of path segments in a real field.

(1) Given two shopping paths, i.e., SP = <s1, s2, …, sn> and SP′ = <s′1, s′2, …, s′l>, … then SP is a super-sequence of SP′, and SP′ is a subsequence of SP (denoted as SP′ ⊆ SP). We also say that SP′ is contained in SP.
… is satisfied, these shopping paths can be connected one after another, and the connection can be marked as SP1→SP2→…→SPn.

Definition 5.
Given a shopping path SP, the connection between SP and its reverse-order path SPreverse (i.e., SP→SPreverse) forms a symmetric pattern. If SP.b = SP.e, SP is called a loop pattern. Given a loop pattern SP, if SP repeats n (n ≥ 2) times successively, i.e., SP→SP→…→SP, we call it a loop repeat pattern. Given a shopping path SP, we call the pattern SP→SPreverse→SP a palindrome-contained pattern.

Definition 6.
A shopping transaction path is a sequence of triples, STP = <(s1, t1, T1), (s2, t2, T2), …, (sn, tn, Tn)>, where (si, ti, Ti) means that a shopper purchases the itemset Ti and spends ti unit time per unit length in the path segment si.

Definition 7.
Given a shopping transaction path, i.e., STP = <(s1, t1, T1), (s2, t2, T2), …, (sn, tn, Tn)>, several concepts relevant to shopping transaction paths are given below: … STPsuffix = <(sk+1, tk+1, Tk+1), (sk+2, tk+2, Tk+2), …, (sn, tn, Tn)> is called a suffix of STP.
For example, STP = <(s1, 1, Ø), (s2, 0.8, Ø), (s3, … can be transformed to a shopping path, that is to say, Trans(STP) = < …

A Segment-Item Table (SIT), maintaining the information of items sold in each path segment, is denoted as below: … , and W is the total number of path segments.

Definition 10.
An Item-Segment Table (IST), maintaining the information about the segments where each item is sold, is denoted as below: …
where iitem,j is an item, Eitem,j is the set of path segments that sell iitem,j (1 ≤ j ≤ U), and U is the total number of items.

Definition 11.
A Length Table (LT), maintaining the length information of path segments, is denoted as below: …
where si is a path segment, and si.l is the length of si, which can be obtained according to the length of the normal trajectory of si (1 ≤ i ≤ W).

System Framework of RFID Path Explorer
In this section, we describe the framework of the RFID supported paths and behaviors mining, called RFID Path Explorer.
The proposed framework consists of four modules: (1) mapping from the physical world to the cyber space; (2) data preprocessing; (3) a data mining mechanism; and (4) knowledge understanding and utilization (see Figure 1). Table 2 shows an example of a shopping transaction path database which contains five shopping transaction paths. Below, we explain the four modules in detail.

Indoor Mapping from the Physical World to the Cyberspace
The module of indoor mapping from the physical world (PW) to the cyber space consists of two steps, as shown in Figure 1. According to the task of the application domain (i.e., finding actionable navigation patterns for purchasing an item), suitable RFID devices should be chosen and deployed in a real field (i.e., a real supermarket). For instance, in our application, an RFID tag, which has a unique Electronic Product Code (EPC), is attached to each shopping trolley. RFID readers are located at various places in a supermarket, such as the entrance, the checkout, the gathering place for shopping carts, aisles and thoroughfares, and are used to identify shopping trolleys passing by. At the same time, when valuable items with RFID tags attached are put into a shopping trolley, they can also be recognized by this RFID-reader-equipped shopping trolley. Thus, both shoppers' paths and behaviors can be captured and recorded. For the sake of robustness, redundant multiple readers/tags [23] and received signal strength functions of RFID devices [21] can be added to improve the reliability of proximity location determination. Recorded raw RFID data cannot be understood without the support of semantic information about these data. Therefore, attributes and features related to the analysis task should be abstracted from the physical world and mapped to the cyber space. In the context of our application, a path graph G is used to abstract the connections of path segments after RFID devices are deployed in a supermarket. An illustration of a path graph is shown in Figure 2c, which is mapped from Figure 2b. A segment-item table (SIT) and an item-segment table (IST) are also extracted to reflect the items sold in each segment and the segments where each item is sold, respectively. An example of SIT and IST is shown in Table 3.

Preprocessing
After the raw RFID path and behavior data are captured, they are preprocessed as shown in Figure 3.

Step 1. Data Ordering and Data Compression
Raw RFID path data has the form (EPC, Loc, Time_stamp), where EPC is the Electronic Product Code of the tag that uniquely represents a shopping cart, Loc is the identification location whose reader finds the tag, and Time_stamp is the time when the RFID reading takes place [36]. These raw data first need to be sorted on EPC and time, and then transformed into stay records of the form (EPC, Loc, T_in, T_out), where T_in is the time when the RFID tag enters the identification area, and T_out is the time when it leaves [37]. When an item (i.e., iitem) is put into a shopping cart (i.e., Cart), a raw purchasing record (i.e., (Cart, iitem, Time_stamp)) is also generated, where Time_stamp is the time of detecting the item. Raw purchasing data for this item is then produced continuously until the item is taken out of the shopping cart. Therefore, for an item, only the record of the first reading needs to be saved, which marks the moment the purchase of the item happens.
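The ordering and compression described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `to_stay_records` and `first_purchase_readings` are hypothetical, readings are assumed to be plain tuples, and T_in/T_out are approximated by the first and last timestamps of a run of consecutive readings at the same location.

```python
from itertools import groupby
from operator import itemgetter


def to_stay_records(raw_readings):
    """Collapse (EPC, Loc, Time_stamp) readings into (EPC, Loc, T_in, T_out)."""
    # Sort on EPC first, then on time, as the text requires.
    ordered = sorted(raw_readings, key=itemgetter(0, 2))
    stays = []
    for epc, readings in groupby(ordered, key=itemgetter(0)):
        current = None  # [loc, t_in, t_out] of the stay being built
        for _, loc, ts in readings:
            if current is not None and current[0] == loc:
                current[2] = ts  # same location: extend the stay
            else:
                if current is not None:
                    stays.append((epc, *current))
                current = [loc, ts, ts]  # new location: open a new stay
        if current is not None:
            stays.append((epc, *current))
    return stays


def first_purchase_readings(raw_purchases):
    """Keep only the earliest (Cart, item, Time_stamp) reading per cart and item."""
    first = {}
    for cart, item, ts in sorted(raw_purchases, key=itemgetter(2)):
        first.setdefault((cart, item), ts)
    return [(cart, item, ts) for (cart, item), ts in first.items()]
```

Returning to a location later produces a separate stay record, so the order of visits is preserved for the path extraction steps that follow.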
Here, any two consecutive locations are required to meet the spatial constraint [35] that they should be directly connected in the path graph. Two successive locations which cannot satisfy this constraint are labelled as an anomaly, and the anomaly should be checked further to determine whether missed readings or false positive readings have occurred. If the permanent/intermittent/transient faults leading to this anomaly can be identified, missed readings are filled in and false positive readings are discarded so that the whole trace record is smoothly connected; otherwise, suspicious readings are removed and the remaining parts of the trace record are kept separately. This prior-knowledge-based validation mechanism can further improve the reliability of sensing.
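A minimal sketch of the spatial-constraint check is given below. It assumes the path graph is available as an adjacency mapping from each location to the set of directly connected locations; the function names are hypothetical. The fallback shown here only splits the trace at each anomaly, which is a simplification of the fault diagnosis described above (it does not fill in missed readings or drop suspicious ones).

```python
def find_spatial_anomalies(locations, adjacency):
    """Return indices i where locations[i] -> locations[i + 1] is not a graph edge."""
    return [
        i
        for i, (a, b) in enumerate(zip(locations, locations[1:]))
        if b not in adjacency.get(a, set())
    ]


def split_at_anomalies(locations, adjacency):
    """Keep the connected parts of a trace separately when a fault is not identified."""
    pieces, start = [], 0
    for i in find_spatial_anomalies(locations, adjacency):
        pieces.append(locations[start:i + 1])
        start = i + 1
    pieces.append(locations[start:])
    return pieces
```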

Step 3. Segment Extraction
In the supermarket scenario, shopping carts are recycled and used by different shoppers at different shopping times. Thus, the whole trace record of a certain EPC, which is the proxy of a cart, commonly contains multiple shopping trips of various shoppers. Besides, path sequences of supermarket staff collecting shopping carts may also be contained. Therefore, in order to study customers' shopping behavior, it is necessary to extract individual shopping trips from the whole trace records.
For the above purpose, we develop a finite state machine model [39] for shopping carts (see Figure 4) by referring to the real situation in supermarkets. In Figure 4, there are four states for a shopping cart: "idle", "shopping", "discarded" and "end". The "idle" state implies that the cart stays in the gathering place for shopping carts and is currently available. The "shopping" state indicates that the cart is in use by a shopper. The "discarded" state means that the cart has been discarded midway by a shopper, and the "end" state implies that the cart has arrived at the checkout and the shopping trip has ended. In the model, a state is transformed to another one if a certain event [40] is triggered. The initial state of a cart may be "idle". If a "start" event happens, the "idle" state becomes the "shopping" state, where the "start" event can be defined as the observation that a shopping cart leaves the gathering place and enters the main entrance. Then, the "shopping" state changes to the "discarded" state if the "discarding" event happens, or to the "end" state if the "end" event takes place. Both the "discarded" and the "end" states are followed by the "collection" event, which transforms them to the "idle" state again. Here, the "discarding", "end" and "collection" events should also be defined according to the real situation. For example, we may define the "discarding" and the following "collection" events as a long stay in a certain location followed by moving directly to the gathering place, and the "end" event as a cart arriving at the checkout with items taken out of the shopping cart. Thus, trace record segments can be extracted from a whole trace record, where each segment represents a single shopping trip.
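The state machine above can be sketched as a transition table. This is an illustrative sketch only: event detection is assumed to have happened upstream, each input element is a hypothetical (event, location) pair, and the event name "move" for ordinary readings during a trip is an assumption, not part of the model in Figure 4.

```python
# (state, event) -> next state, following the four states described in the text.
TRANSITIONS = {
    ("idle", "start"): "shopping",
    ("shopping", "discarding"): "discarded",
    ("shopping", "end"): "end",
    ("discarded", "collection"): "idle",
    ("end", "collection"): "idle",
}


def extract_trips(events, initial_state="idle"):
    """Split one cart's event stream into trips: (final_state, [locations])."""
    state, trips, current = initial_state, [], []
    for event, location in events:
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            # No state change; ordinary readings extend the trip in progress.
            if state == "shopping":
                current.append(location)
            continue
        if nxt == "shopping":
            current = [location]  # a new shopping trip starts
        elif nxt in ("discarded", "end"):
            current.append(location)
            trips.append((nxt, current))  # the trip is closed
            current = []
        state = nxt
    return trips
```

Readings seen while the cart is "idle", "discarded" or "end" (for example staff moving carts back to the gathering place) are ignored, which matches the goal of keeping only customers' shopping trips.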

Step 4. Assembling Segments into Shopping Transaction Paths
In order to analyze shopping paths, terminal-points focused trace record segments need to be further transformed to path-segments focused shopping transaction paths, which also combine the information of purchased items. The transform process is shown below.
Thus, after all items (i.e., iitem,j (1 ≤ j ≤ n)) are added to the corresponding itemset of the path segment where each item is purchased, we can obtain a shopping transaction path, i.e., EPC_ID:

Step 5. Extracting Mainstreams of Shopping Transaction Paths
In the context of web browsing, Chen et al. [27] first introduced the concept of maximal forward reference, and proposed a method of breaking a user session down into several maximal forward references if backward references appear in this session. However, several limitations still exist in the scheme of extracting maximal forward reference.
First, their extraction method assumes that backward references are all for ease of travelling, and not for browsing. But this assumption fails in the context of a real supermarket. A shopper may choose to go backward with two intentions. One is to explore and purchase along the backward reference, and the other is to go through the backward reference to reach other sections of interest, so the method of Chen et al. [27] cannot be applied here.
Second, their method throws away all backward references, which might provide important clues on shoppers' purchasing and navigation behaviors.
Third, after applying their method, the frequency of the prefix sequence in front of a symmetric pattern is increased unexpectedly. For instance, suppose there is a shopping path < AB, BC, CD, DC, CE, EF, FG, GH, HG, GI > shown in Figure 5, where two symmetric patterns exist, i.e., < CD, DC > and < GH, HG >. After applying the method of Chen et al. [27], the set of maximal forward shopping paths is < AB, BC, CD >, < AB, BC, CE, EF, FG, GH >, < AB, BC, CE, EF, FG, GI >. We can find that the frequency of the prefix sequence < AB, BC > unexpectedly becomes 3, while it is 1 in the original shopping path. The frequency of the prefix sequence < AB, BC, CE, EF, FG > becomes 2, while it is 1 in the original shopping path.
Fourth, a maximal forward reference terminates if a backward reference appears. Thus, a maximal forward reference will not contain any symmetric pattern. This may lead to an unexpected loss of important knowledge on symmetric patterns. For example, from the shopping path < AB, BC, CD, DC, CE, EF, FG, GH, HG, GI > shown in Figure 5, we know that this customer would like to go to the aisle of CD first, turn back to the main thoroughfare of EF, go forward to the aisle of GH, and then return to the main thoroughfare again. The knowledge on the customer's turning back disappears in the set of maximal forward shopping paths.
Therefore, instead of finding maximal forward references, we present a new scheme called extracting the mainstream of a shopping transaction path, which preserves necessary symmetric patterns but discards redundant and repeated details. This scheme is discussed in detail in the next subsection.
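The prefix-inflation effect in the third limitation can be reproduced with a short sketch. The code below is an adaptation of the maximal-forward-reference idea to segment paths, written for illustration and not taken from Chen et al. [27]; it assumes each segment is encoded as a two-letter string such as "CD", so that its reverse-order segment is the reversed string "DC".

```python
def maximal_forward_paths(path):
    """Break a shopping path into maximal forward paths at each backward move."""
    stack, result, moving_forward = [], [], False
    for seg in path:
        if stack and seg == stack[-1][::-1]:
            # Backward move: emit the forward path once, then step back.
            if moving_forward:
                result.append(list(stack))
            stack.pop()
            moving_forward = False
        else:
            stack.append(seg)
            moving_forward = True
    if moving_forward and stack:
        result.append(list(stack))
    return result


def prefix_frequency(paths, prefix):
    """Count how many paths start with the given prefix sequence."""
    return sum(1 for p in paths if p[:len(prefix)] == list(prefix))


path = ["AB", "BC", "CD", "DC", "CE", "EF", "FG", "GH", "HG", "GI"]
forward_paths = maximal_forward_paths(path)
```

On the example path of Figure 5, `forward_paths` contains the three maximal forward paths listed above, and `prefix_frequency(forward_paths, ["AB", "BC"])` counts the prefix three times although it occurs once in the original path.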

Mainstreams of Shopping Transaction Paths
In the environment of a real supermarket, in order to choose items of interest, shoppers are inclined to push/pull shopping carts forward and backward. Symmetric patterns, loop patterns and redundant details may appear in shopping paths. In order to capture the mainstream of path sequences while discarding unnecessary redundant and repeated details, we put forward a scheme for identifying mainstream shopping transaction paths. In this scheme, we recognize that two types of redundant patterns need to be simplified, i.e., loop repeat patterns and palindrome-contained patterns.

Processing of Loop Repeat Patterns
Successive repeated path sequence loops actually reflect the same shopping interest and share the same behavior pattern, so these loops can be combined into one loop. For a shopping path, we can compress several repeated loops into one directly. For a shopping transaction path with a loop repeat pattern, i.e., STPprefix→STP1→STP2→…→STPn→STPsuffix, where all STPi (i = 1,…,n) share the same navigation pattern, these STPi (i = 1,…,n) can also be combined into a single STPcombine = <(s1, t1, T1), (s2, t2, T2),…, (sm, tm, Tm)>, so we also need to specify the values of tj and Tj in STPcombine (j = 1,…,m). The time spent in a path segment normally comprises two parts: walking time and time for exploring and purchasing. We consider that if a shopper tries to complete in one loop (say STPcombine) the task of exploring and purchasing which was previously done in multiple loops (say STPi (i = 1,…,n)), the time spent in the same path segment of the loop should be cumulated, and the itemsets purchased in the same path segment but in different loops should also be combined. Therefore, we have the following definition for the simplification of a loop repeat pattern:

Definition 12.
A shopping path containing a loop repeat pattern, i.e., SPprefix→SP→…→SP→SPsuffix (where SP repeats n times), can be simplified as SPprefix→SP→SPsuffix.
A shopping transaction path containing a loop repeat pattern, i.e., STPprefix→STP1→STP2→…→STPn→STPsuffix, where STPi = <(s1, t1,i, T1,i), (s2, t2,i, T2,i),…, (sm, tm,i, Tm,i)>, all STPi share the same navigation pattern (say Trans(STPi) = <s1, s2, …, sm>), and m is the number of path segments in STPi (i = 1,…,n), can be simplified as STPprefix→STPcombine→STPsuffix, where STPcombine = <(s1, t1, T1), (s2, t2, T2),…, (sm, tm, Tm)>, twalking is the smallest value of time spent per unit length in this shopping transaction path, and tj and Tj (j = 1,…,m) are defined as below: …
For instance, for the shopping path <AB, BC, CD, DE, EB, BC, CD, DE, EB, BC, CF> shown in Figure 6, the loop <BC, CD, DE, EB> appears two times successively and forms a loop repeat pattern, so this shopping path can be simplified as the mainstream shopping path <AB, BC, CD, DE, EB, BC, CF>.
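The simplification of a plain shopping path in Definition 12 can be illustrated with a brute-force sketch. This is not the hash-table-based filtering function of the paper; it simply searches for a loop that is immediately followed by an identical copy and deletes the copy until no such pair remains. Segments are assumed to be two-letter strings whose first and last letters are the start and end terminal points; times and itemsets are not handled here.

```python
def is_loop(segments):
    """A sub-path is a loop if it starts and ends at the same terminal point."""
    return bool(segments) and segments[0][0] == segments[-1][-1]


def collapse_loop_repeats(path):
    """Reduce every run of successive identical loops to a single loop."""
    path = list(path)
    changed = True
    while changed:
        changed = False
        n = len(path)
        for k in range(1, n // 2 + 1):          # candidate loop length
            for i in range(0, n - 2 * k + 1):   # candidate loop start
                block = path[i:i + k]
                if is_loop(block) and path[i + k:i + 2 * k] == block:
                    del path[i + k:i + 2 * k]   # drop the repeated copy
                    changed = True
                    break
            if changed:
                break
    return path


example = ["AB", "BC", "CD", "DE", "EB", "BC", "CD", "DE", "EB", "BC", "CF"]
mainstream = collapse_loop_repeats(example)
```

For the path of Figure 6, `mainstream` equals the simplified path given in the text. The search is cubic in the path length in the worst case, so it is only meant to make the definition concrete.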

Algorithm for Identifying Mainstreams of Shopping Transaction Paths
The algorithm is an iterative process of filtering loop repeat patterns (i.e., Function LRP_Filtering(STP)) and palindrome-contained patterns (i.e., Function PCP_Filtering(STP)) to identify the mainstreams of shopping transaction paths, as shown in Algorithm 1.
… Call PCP_Filtering(STP) to filter palindrome-contained patterns }
5. Add STP to DMSTP }
6. Return DMSTP.
If repeat loop pattern is found, do { 4.

Return n_loops.
We use a running example to explain the running process of sub-function Find_RepeatLoops(SP). Given a shopping path SP = <EA, AB, BC, CD, DE, EA, AB, BE, EA, AB, BE, EG, GD, DK>, the function reads path segments in SP one by one, and the process of finding loop repeat pattern is shown below.
For the first path segment EA, EA cannot be found as a key in the empty HT, so the key-value pair (EA, "null") is inserted into HT, and the pair (EA, "null") is pushed onto the empty PV. The position of this pair in PV is 0. Therefore, the value associated with EA is changed to 0 in HT. cur_pos_seg (which is "null") does not equal new_cur_pos_seg-1 (which is -1). List still remains empty.
For the second path segment AB, similar operations are done. After operations, the key-value pair (AB, 1) is added in HT, and the pair (AB, "null") is pushed onto PV.
For the sixth path segment EA, it is found as a key in HT and cur_pos_seg is 0, which is the position of previous EA in PV. Push the pair (EA, 0) onto PV and set the value associated with EA (i.e., HT[f(EA)]) to be 5 in HT. Because cur_pos_seg is not "null", a candidate (0, 4, 0) is generated and added to List. And then, cur_pos_seg = PV[0].pos = "null", so no candidate is generated here.
For the seventh path segment AB, the pair (AB, 1) is pushed onto PV and HT[f(AB)] is set as 6. Since there is a loop candidate (i.e., triple (0, 4, 0)) in List, we need to compare AB with the next path segment of this candidate (i.e., PV[0+1].s). Both of them are AB and they are matching, so we set this triple to be (0, 4, 1). Because cur_pos_seg is 1, a new candidate (1, 5, 1) is produced and there are two candidates in List now.
When reading the eighth one BE, the pair (BE, "null") is pushed onto PV and HT[f(BE)] is 7. For the candidate (0, 4, 1), the next path segment is PV[1+1].s = BC, which does not match BE. So this candidate should be deleted from List. For the candidate (1, 5, 1), the next path segment, i.e., PV[1+1].s = BC, does not match BE, so this candidate also needs to be pruned. No new candidate is generated, since cur_pos_seg equals "null".
For the ninth one EA, the pair (EA, 5) is added at the end of PV and the value associated with EA is set as 8 in HT. Similarly, two new candidates (5,7,5) and (0, 7, 0) are obtained and added to List.
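The running example can be traced with the sketch below. It is reconstructed from the walkthrough only, so the exact bookkeeping is an assumption: `last_pos` plays the role of HT, `pv` the role of PV (segment plus the position of its previous occurrence), and each candidate is a triple (start, end, cur) meaning that the loop PV[start..end] has been matched again up to position cur. The function name and the returned triples (start, end, position where the repeat completes) are hypothetical.

```python
def find_repeat_loops(path):
    """Report (start, end, repeat_end) for each loop immediately repeated once."""
    last_pos = {}    # HT: segment -> position of its latest occurrence
    pv = []          # PV: (segment, position of previous occurrence or None)
    candidates = []  # List: (start, end, cur)
    found = []
    for p, seg in enumerate(path):
        prev = last_pos.get(seg)
        pv.append((seg, prev))
        last_pos[seg] = p
        # Advance or prune the existing candidates with the new segment.
        survivors = []
        for start, end, cur in candidates:
            if pv[cur + 1][0] == seg:
                if cur + 1 == end:
                    found.append((start, end, p))  # whole loop matched again
                else:
                    survivors.append((start, end, cur + 1))
        candidates = survivors
        # Every earlier occurrence of seg opens a new candidate loop.
        q = prev
        while q is not None:
            if q == p - 1:
                found.append((q, q, p))  # single-segment loop repeated
            else:
                candidates.append((q, p - 1, q))
            q = pv[q][1]
    return found


sp = ["EA", "AB", "BC", "CD", "DE", "EA", "AB", "BE",
      "EA", "AB", "BE", "EG", "GD", "DK"]
repeats = find_repeat_loops(sp)
```

When the ninth segment EA is read, the sketch creates the candidates (5, 7, 5) and (0, 7, 0), as in the walkthrough; the first of them survives and is confirmed at position 10, so the loop <EA, AB, BE> at positions 5 to 7 is reported as repeated.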
If palindrome-contained pattern is found, do {
Similarly, we push CD, DE, EF onto V. And no candidate or candidate suffix is generated.
For the sixth path segment FE, since the last element of V (EF) and FE are reverse-order, reverse-order is true. We push FE onto V. A candidate (4, 4, "null") and a candidate suffix (4,4) are generated.
When reading the seventh path segment ED, reverse-order is false, and ED is pushed onto V. For candidate (4, 4, "null"), we compare V [4] (EF) with ED and they do not match, so we delete this candidate from LC. For candidate suffix (4,4), since V [3] (DE) is reverse-order path segment of ED, this candidate suffix becomes (3, 4) and a new candidate (3, 4, "null") is generated.
When reading the eighth path segment DC, reverse-order is also false, and DC is pushed onto V. For candidate (3,4, "null"), since V [3] (DE) and DC do not match we also prune this candidate from LC. For candidate suffix (3,4), V [2] (CD) and DC are reverse-order, so this candidate suffix turns to (2, 4) and a new candidate (2, 4, "null") is produced.
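For comparison with the incremental procedure traced above, the pattern SP→SPreverse→SP from Definition 5 can also be detected by brute force. The sketch below only detects the pattern and does not simplify it; the example path is hypothetical (it extends the segments named in the walkthrough), and segments are assumed to be two-letter strings whose reverse-order segment is the reversed string.

```python
def reverse_path(path):
    """Reverse-order path: reversed sequence of reverse-order segments."""
    return [seg[::-1] for seg in reversed(path)]


def find_palindrome_contained(path):
    """Return (start, k) for every SP of length k with SP -> SPreverse -> SP."""
    hits = []
    n = len(path)
    for k in range(1, n // 3 + 1):
        for i in range(0, n - 3 * k + 1):
            sp = path[i:i + k]
            if (path[i + k:i + 2 * k] == reverse_path(sp)
                    and path[i + 2 * k:i + 3 * k] == sp):
                hits.append((i, k))
    return hits


# Hypothetical path containing <CD, DE, EF> -> <FE, ED, DC> -> <CD, DE, EF>.
walk = ["AB", "BC", "CD", "DE", "EF", "FE", "ED", "DC", "CD", "DE", "EF", "FG"]
patterns = find_palindrome_contained(walk)
```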

Generation of Synthetic Shopping Transaction Paths
In order to generate a synthetic workload, we build an agent-based [41] simulator to simulate the scenario of an individual shopping trip. The complete flow diagram for this simulator is shown in Figure 8, which mainly includes four steps: construction of a path graph, initialization of customer agents, generation of a shopping transaction path, and attachment of extra loop repeat patterns and palindrome-contained patterns. Among them, Step 4 is optional for testing. Steps 2, 3 and 4 can be performed repeatedly |D| times, and then a database of shopping transaction paths D will be produced. In the following, we discuss these four steps in detail. For ease of reference, the meanings of the variables used in our simulator are summarized in Table 4.

Notation: Description
nterminal_points, npath_segments: The number of terminal points and path segments in path graph G, respectively
nitems: The number of different items
ShoppingTime(iitem): The shopping time for iitem
j: A shopper
speednormal, j, speedj: The normal and actual moving speed for j, respectively
nplan, j: The number of different planned-purchasing items for j
ninterest, j: The number of different items that j feels interested in
Lplan, j: A set of planned-purchasing items for j
Linterest, j: A set of items that j feels interested in
Splan, j: A set of path segments that j plans to visit
Sinterest, j: A set of path segments that j feels interested in
…: The mean and the standard deviation of the Gaussian distribution of speednormal, j
lower_boundn plan, j, upper_boundn plan, j: The lower bound and the upper bound of the uniform distribution of nplan, j on integers, respectively
nextra_interest, j: The number of additional items (besides items in Lplan, j) that j feels interested in

Step 1. Construction of Path Graph G
A path graph, the container in which customer agents move, is constructed in this step. We need to specify the components of a path graph: the set of terminal points of path segments, and the set of path segments. A length table LT and a segment-item table SIT also need to be produced. Then, an item-segment table IST can be derived.
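Deriving IST from SIT is a simple inversion. The sketch below assumes SIT is held as a dictionary from segment identifiers to the set of items sold there; this representation and the name `derive_ist` are illustrative choices, not the paper's data structures.

```python
from collections import defaultdict


def derive_ist(sit):
    """Invert a segment-item table into an item-segment table."""
    ist = defaultdict(set)
    for segment, items in sit.items():
        for item in items:
            ist[item].add(segment)
    return dict(ist)


sit = {"AB": {"milk", "bread"}, "BC": {"milk"}, "CD": set()}
ist = derive_ist(sit)
```

Segments that sell nothing (such as "CD" here) simply do not appear in any entry of the derived table.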

Step 2. Initialization of a Shopper Agent
This step includes the following two sub-steps.

Shopper Agent Initialization
In this sub-step, a shopper agent representing an in-store shopper (say j) is initialized. Each shopper agent has the following parameters, which need to be specified:
(1) Normal moving speed speednormal, j. speednormal, j is the normal moving speed of j, which is derived from a Gaussian distribution with mean …
(2) Number of different planned-purchasing items nplan, j. nplan, j represents the number of different items that are planned to be purchased by j, and is derived from a uniform distribution on the integers lower_boundn plan, j, lower_boundn plan, j + 1, …, upper_boundn plan, j.
(3) A set of planned-purchasing items Lplan, j. Lplan, j is the set of different items that are planned to be purchased, and can be written as {iitem,1, iitem,2, …, iitem,n plan, j}.
(4) Number of different items that j feels interested in (say ninterest, j). ninterest, j is the sum of nplan, j and the number of additional items (besides items in Lplan, j) that j feels interested in (say nextra_interest, j). The latter is derived from a uniform distribution on the integers lower_boundn extra_interest, j, lower_boundn extra_interest, j + 1, …, upper_boundn extra_interest, j.
(5) A set of items that j feels interested in (say Linterest, j). Linterest, j has the form {iitem,1, iitem,2, …, iitem,n interest, j}, and each item iitem,k is associated with its shopping time ShoppingTime(iitem,k) (k = 1, 2,…, ninterest, j). ShoppingTime(iitem,k) is derived from a Gaussian distribution with mean …
For j, his/her actual moving speed speedj can be simply computed as below:
speedj = PerceivedTimePressurej × speednormal, j

Generating Splan, j and Sinterest, j, and Choosing the Current Visit Target
In order to decide which direction a shopper agent would like to go, we need to know which path segments j plans to visit. These path segments are visit targets for j, and j will visit these path segments one by one. Here we use Splan, j to represent the set of path segments that j plans to visit, and use Sinterest, j to represent the set of path segments that j feels interested in. According to the item-segment table IST, we can derive Splan, j by mapping each item in Lplan, j to the path segments where this item is sold. Similarly, Sinterest, j can also be derived according to IST and Linterest, j.

Definition 16.
Given a path graph G, suppose a shopper j is at terminal point v, and the start terminal point and the end terminal point of path segment s are s.b and s.e respectively. Then the distance between the shopper j and the segment s is defined as below: …
where function shortest_path_length(·, ·) means the length of the shortest path between two terminal points in G, and function min(·, ·) represents the smaller of two values. In the above definition, the length of the shortest path between two terminal points in G can be obtained using the well-known Dijkstra's algorithm [43,44]. Thus, based on the definition of distance between a shopper and a path segment, we simply use the following method to decide the current visit target.
Method 1 (deciding the visiting target). For a shopper j, among the elements of Splan, j, the nearest path segment is regarded as the current visit target. If Splan, j is empty, "checkout" becomes the moving target. The current visit target remains unchanged until the current visit target has been visited and Splan, j is updated.
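Method 1 can be sketched with a standard Dijkstra search. Two assumptions are made and should be read as such: the distance between the shopper at v and a segment s is taken to be the smaller of the shortest-path lengths from v to s.b and from v to s.e, which is inferred from the functions named in Definition 16, and ties are broken by segment order. The path graph is represented as a nested dictionary of segment lengths, and segments as (b, e) tuples; these representations and function names are hypothetical.

```python
import heapq


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: shortest lengths from source to reachable points."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in graph.get(u, {}).items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def choose_visit_target(graph, position, s_plan):
    """Pick the nearest planned segment, or "checkout" if none is left."""
    if not s_plan:
        return "checkout"
    dist = shortest_path_lengths(graph, position)
    inf = float("inf")

    def distance(segment):
        b, e = segment
        # Assumed form of Definition 16: nearer of the two terminal points.
        return min(dist.get(b, inf), dist.get(e, inf))

    return min(sorted(s_plan), key=distance)


graph = {
    "A": {"B": 2.0},
    "B": {"A": 2.0, "C": 3.0},
    "C": {"B": 3.0, "D": 1.0},
    "D": {"C": 1.0},
}
target = choose_visit_target(graph, "A", {("C", "D"), ("B", "C")})
```

In this small graph the shopper at A is 2.0 away from segment (B, C) and 5.0 away from (C, D), so (B, C) becomes the current visit target.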

Step 3. Generation of a Shopping Transaction Path
The production of a shopping transaction path can be regarded as a repetitive process of deciding which path segment si (i = 1,2,…,n) should be chosen as the next step, and generating unit time per unit length spent in si (say ti) and the itemset purchased in si (say Ti).

Decision on the Next Path Segment
For simplicity, we suppose the walking process of a shopper j is as follows: first, j selects a visit target, and then he/she walks along the shortest path to the visit target. When he/she reaches the current visit target, he/she needs to decide the next visit target. The process is repeated, until he/she finishes his/her shopping and arrives at "checkout". Thus, we have the following method for deciding the next path segment.
Method 2 (deciding the next path segment). Given a path graph G, if a shopper j has not reached the current visit target, the next path segment along the shortest path to the current visit target is selected as the next path segment for j. If j arrives at the current visit target, he/she decides the next visit target, and the next path segment along the shortest path to that target is chosen as the next path segment.
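As an illustration of Method 2, the following sketch picks the next path segment by running Dijkstra's algorithm with predecessor tracking. The graph representation, the function names, and the rule that a shopper standing at an endpoint of the target walks the target segment itself are assumptions made for this example; parallel segments between the same two terminal points are not handled.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, prev): shortest path lengths from source and the
    predecessor of each reached terminal point on a shortest path."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def next_path_segment(graph, position, target):
    """Next segment, as (from_point, to_point), on the way to target."""
    b, e = target
    if position in (b, e):
        # Assumption: at an endpoint of the target, traverse the target.
        return (position, e if position == b else b)
    dist, prev = dijkstra(graph, position)
    # Head for the nearer endpoint of the target segment.
    goal = min((b, e), key=lambda p: dist.get(p, float("inf")))
    if goal not in dist:
        raise ValueError("target segment is unreachable")
    node = goal
    while prev[node] != position:  # walk back to the first hop
        node = prev[node]
    return (position, node)


# Hypothetical toy path graph (undirected, lengths in arbitrary units).
graph = {
    "A": [("B", 2.0), ("D", 10.0)],
    "B": [("A", 2.0), ("C", 3.0)],
    "C": [("B", 3.0), ("D", 1.0)],
    "D": [("C", 1.0), ("A", 10.0)],
}
step = next_path_segment(graph, "A", ("C", "D"))
print(step)
```

Here the shortest route from A to the target (C, D) runs through B, so the first segment returned is (A, B) rather than the long direct segment (A, D).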
In Method 2, the shortest path to a visit target can be obtained by Dijkstra's algorithm [43,44].

For a shopper j, ti is the quotient of the time spent in si (say $\mathit{time}_{s_i,j}$) divided by the length of si (say si.l). $\mathit{time}_{s_i,j}$ consists of two parts: the time spent walking in si (say $\mathit{time}_{walking,s_i,j}$) and the time spent shopping in si (say $\mathit{time}_{shopping,s_i,j}$). $\mathit{time}_{walking,s_i,j}$ can be computed as below:

$$\mathit{time}_{walking,s_i,j} = \frac{s_i.l}{\mathit{speed}_j} = \frac{s_i.l}{\mathit{PerceivedTimePressure}_j \times \mathit{speed}_{normal,j}} \quad (12)$$

where speedj is j's actual moving speed. For simplicity, the value of $\mathit{time}_{shopping,s_i,j}$ depends on whether j feels interested in si (that is, si ∈ Sinterest, j) or not, and is computed as below: […] where $\Gamma_{s_i}$ is the itemset sold in si and can be obtained from SIT. For si, Ti simply equals the set of items that belong to both Lplan, j and $\Gamma_{s_i}$, i.e., $T_i = L_{plan,j} \cap \Gamma_{s_i}$.
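A small worked example of Equation (12) and of the quantities ti and Ti is sketched below. The shopping time is passed in as a given value, because only its dependence on Sinterest, j is described here; the function names, units (metres and seconds) and sample numbers are illustrative assumptions.

```python
def walking_time(length, perceived_time_pressure, normal_speed):
    """Equation (12): segment length divided by the actual speed, where the
    actual speed is PerceivedTimePressure_j times the normal speed."""
    return length / (perceived_time_pressure * normal_speed)


def time_per_unit_length(length, perceived_time_pressure, normal_speed,
                         shopping_time):
    """t_i: (walking time + shopping time) in s_i divided by s_i.l.
    shopping_time is supplied by the caller (assumed to be known)."""
    total = walking_time(length, perceived_time_pressure, normal_speed)
    total += shopping_time
    return total / length


def purchased_itemset(l_plan, items_sold_in_segment):
    """T_i: items that are both planned by j and sold in s_i."""
    return set(l_plan) & set(items_sold_in_segment)


# Hypothetical numbers: a 10 m segment, normal speed 0.8 m/s,
# perceived time pressure 1.25, and 30 s spent shopping in the segment.
t_walk = walking_time(10.0, 1.25, 0.8)
t_i = time_per_unit_length(10.0, 1.25, 0.8, shopping_time=30.0)
T_i = purchased_itemset({"milk", "bread", "tea"}, {"tea", "coffee", "milk"})
print(t_walk, t_i, sorted(T_i))
```

With these numbers the actual speed is 1.0 m/s, so walking takes about 10 s, the total time in the segment is about 40 s, and ti is about 4 s per metre.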
(2) Updating Lplan, j, Linterest, j, Splan, j, Sinterest, j and the visiting target

If si ∉ Sinterest, j, Sinterest, j does not need to be updated. Otherwise, since si has been visited, it should be deleted from Sinterest, j. Because Ti has been purchased at si, the items in Ti need to be removed from Lplan, j and Linterest, j. If si ∈ Splan, j, si should also be pruned from Splan, j, and the visiting target should be updated using Method 1, which is given in Section 6.2.2.
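The update step can be sketched as a single function over Python sets. This is one reading of the rule above, not the original implementation: the function name and the boolean return value (signalling that Method 1 should be re-run) are assumptions, and segments are again represented as pairs of terminal points.

```python
def update_after_visit(segment, purchased, l_plan, l_interest,
                       s_plan, s_interest):
    """Update the shopper's sets in place after visiting `segment` and
    purchasing the itemset `purchased` (T_i) there.

    Returns True when the segment was a planned segment, i.e. when the
    visiting target has to be chosen again with Method 1."""
    # A visited segment is no longer a segment of interest.
    s_interest.discard(segment)
    # Purchased items leave both the plan list and the interest list.
    l_plan.difference_update(purchased)
    l_interest.difference_update(purchased)
    # A visited planned segment is pruned from S_plan.
    target_needs_update = segment in s_plan
    s_plan.discard(segment)
    return target_needs_update


# Hypothetical shopper state.
l_plan = {"milk", "bread", "tea"}
l_interest = {"tea", "jam"}
s_plan = {("B", "C"), ("C", "D")}
s_interest = {("B", "C")}

changed = update_after_visit(("B", "C"), {"milk", "tea"},
                             l_plan, l_interest, s_plan, s_interest)
print(changed, sorted(l_plan), sorted(l_interest), s_plan, s_interest)
```

Using set.discard rather than set.remove keeps the call safe when the visited segment was neither planned nor of interest.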

Step 4. Attaching Extra Loop Repeat Patterns and Palindrome-Contained Patterns
Producing extra loop repeat patterns and palindrome-contained patterns is exactly the reverse of simplifying these two patterns, as presented in Definitions 12 and 13. The methods for producing a loop repeat pattern and a palindrome-contained pattern are described below:

tpurchasing(j,i) = tj,i − twalking (i = 1, …, n, j = 1, …, λ), where twalking is the smallest value of time spent per unit length in this shopping transaction path.

tpurchasing(j,i) = tj,i − twalking (i = 1, 2, 3, j = 1, …, λ), where twalking is the smallest value of time spent per unit length in this shopping transaction path.

Therefore, in order to produce a loop repeat pattern or a palindrome-contained pattern, we first randomly choose a fragment of a shopping transaction path as STPcombine, and then transform STPcombine according to Method 3 or Method 4. Multiple loop repeat patterns and palindrome-contained patterns can be produced by executing this process multiple times. For a database of shopping transaction paths D, five parameters are introduced here: the number of loop repeat patterns (say nLRP), the number of palindrome-contained patterns (say nPCP), the average number of path segments in STPcombine for loop repeat patterns (say […]LRP), the average number of path segments in STPcombine for palindrome-contained patterns (say […]PCP), and the average repeat times in loop repeat patterns (say nrepeat).
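The purchasing-time relation tpurchasing(j,i) = tj,i − twalking can be checked with a short sketch. It assumes the per-unit-length times of the fragment are stored as a list of λ rows (one row per repetition j, one entry per path segment i) and that twalking is taken over the whole shopping transaction path; the function name and the sample values are hypothetical.

```python
def purchasing_times(fragment_times, path_times):
    """Subtract t_walking from every t_{j,i} in the fragment.

    fragment_times[j][i] is the time per unit length of segment i in
    repetition j; path_times holds the time per unit length of every
    segment in the whole shopping transaction path, whose minimum is
    used as t_walking."""
    t_walking = min(path_times)
    return [[t - t_walking for t in row] for row in fragment_times]


# Hypothetical path with five segments; the fragment covers three segments
# and is repeated twice (lambda = 2).
path_times = [1.0, 1.5, 4.0, 1.0, 2.5]
fragment_times = [[1.5, 4.0, 1.0],
                  [1.0, 2.5, 1.0]]
result = purchasing_times(fragment_times, path_times)
print(result)
```

Segments whose time per unit length equals twalking get a purchasing time of zero, which matches the intuition that the shopper only walked through them.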

Experimental Results
To assess the performance of the algorithm for identifying the mainstream shopping transaction paths and of the PFNP-forest algorithm, we conducted several experiments on a PC with a 3.00 GHz Intel Core™ 2 Duo E8400 CPU (Santa Clara, CA, USA) and 4 GB main memory, running Windows 7 Enterprise Edition. All algorithms are implemented using VC++ 2010. In these experiments, we establish a path graph with 159 terminal points and 554 path segments as an example to generate shopping transaction paths. Unless otherwise specified, the default values of the parameters used in our simulations are summarized in Table 5. Since the kernel parts of identifying mainstreams are Function LRP_Filtering() (which is used for filtering loop repeat patterns) and Function PCP_Filtering() (which is used for filtering palindrome-contained patterns), we test the performance of these two functions.

We can find that the execution time of Function LRP_Filtering() is about three times that of Function PCP_Filtering() for the same nLRP (or nPCP). Both execution times increase linearly with nLRP (or nPCP), and both functions show good scalability.

As shown in Figure 11, the execution time of these two functions increases with |D|, and both functions show good scalability.

Variations of L
The fourth test examines the execution performance of these two functions with varying L. To obtain databases of shopping transaction paths with different L, we set the value of the pair ($\mathit{lower\_bound}_{n_{plan,j}}$, $\mathit{upper\_bound}_{n_{plan,j}}$) to (1, 1), (1, 8), (1, 18), (5, 25), (12, 30) and (20, 36), and run the simulation (without Step 4) 1000 times for each setting. Thus we generate six databases of shopping transaction paths whose L is 17.4, 35.7, 53.1, 71.1, 89.5 and 105.0, respectively. The difference between successive values of L is approximately 18. Then, we test Function LRP_Filtering() and Function PCP_Filtering() on these databases, and the experimental results are given in Figure 12. We can find that the execution time of these two functions increases linearly with L, and both functions show good scalability.

As shown in Figure 13, the execution time is almost stable for different nrepeat. This result can also be obtained by analysing Function LRP_Filtering(): since varying nrepeat does not change the number of loop repeat patterns that are found, the execution time does not change much for different nrepeat.

Figure 13. Execution time in response to changes in nrepeat.

Contributions toward a Real Supermarket Scenario
The contributions of the framework towards a real supermarket scenario include the following aspects: (1) It provides a feasible way for retail practitioners to record customers' shopping trajectories associated with their purchasing behaviors using RFID technology; (2) It designs a path graph schema with the support of a segment-item table and an item-segment table, which can be used for the mapping between the physical world and the semantic cyber space. After the mapping, the data semantics can be understood by retail practitioners; (3) It offers a practical approach for preprocessing raw in-store RFID data, which contains five steps: data ordering and compression, data merging and anomaly detection, segment extraction, segment reassembling, and extracting mainstream shopping transaction paths. Based on this approach, the raw data become reliable and clean for retail practitioners; (4) It aims at mining actionable navigation patterns from a combination of customers' shopping paths and their purchasing behavior data. Actionable knowledge is quite useful for decision making [24,25]. For example, trajectories can first be clustered according to the duration of customers' stays in the store. Then, practitioners can intuitively explore long "stock-up" trajectories, in which customers spend a long time and purchase many different types of products. Some interesting patterns may be discovered by data mining algorithms, e.g., shoppers on these "stock-up" trajectories tend to walk frequently through a certain popular spot. Thus, decision-makers may consider offering active services, such as product recommendations and advertising, in that spot.

Conclusions
In this paper, we use the retail industry as an example to explore the potential of RFID technology for indoor mapping and navigation. In a supermarket scenario, RFID provides the ability to interact with items (i.e., transport carts, trolleys, kegs and valuable products) without physical contact. Thus, item-level RFID infrastructures not only provide item handling efficiency, but also offer a promising way to capture customers' in-store behavior data and then gain insight into these data using data mining technology.
In this context, we provide a framework for mining actionable navigation patterns by combining RFID in-door mapping and data mining techniques. In the framework, multi-source in-door RFID data (i.e., shopping path data and RFID-supported customers' purchasing behavior data) is integrated together for in-depth customers' behavior analytics. The framework consists of four modules: (1) mapping from the physical space to the cyber space; (2) data preprocessing; (3) data mining mechanism; and (4) knowledge understanding and utilization. Among them, the kernel part, i.e., the scheme of extracting mainstream shopping transaction paths, is discussed in detail. The scheme of identifying mainstreams aims at catching the mainstream path sequences while discarding unnecessary redundant and repeated details, and is quite different from the scheme of extracting maximal forward reference. Two types of redundant patterns, i.e., loop repeat pattern and palindrome-contained pattern, are recognized, and the corresponding algorithms are proposed and evaluated. Experimental results show that the algorithm is efficient and scalable for filtering these redundant patterns. On the whole, this work builds a bridge between indoor positioning and advanced data mining technologies, and provides a feasible way to study customers' shopping behaviors via multi-source RFID data.