An Efficient Early-breaking-estimation and Tree-splitting Missing RFID Tag Identification Protocol

Retailers grapple with inventory losses primarily due to missing items, prompting the need for efficient missing tag identification methods in large-scale RFID systems. However, few existing works consider the effect of unexpected unknown tags on the missing tag identification process. In the presence of unknown tags, some missing tags may be falsely identified as present, so the system's reliability can hardly be guaranteed. To resolve these challenges, we propose an efficient early-breaking-estimation and tree-splitting-based missing tag identification (ETMTI) protocol for large-scale RFID systems. ETMTI employs early-breaking-estimation and deactivation methods to swiftly handle unknown tags. Subsequently, a tree-splitting-based missing tag identification method, employing a B-ary splitting tree, is proposed to rapidly identify missing tags. Additionally, a bit-tracking response strategy is implemented to reduce processing time. Theoretical analysis is conducted to determine optimal parameters for ETMTI. Simulation results illustrate that the proposed ETMTI protocol significantly outperforms benchmark methods, offering a shorter processing time and a lower false negative rate.


I. INTRODUCTION
RECENTLY, radio frequency identification (RFID) has been widely applied in many domains, such as logistics, manufacturing, and the pharmaceutical industry [1], [2]. As one of the key perception technologies that enable Internet of Things (IoT) networks, RFID exhibits many advantages, including non-contact and non-visual reading, strong anti-interference ability, high reliability, and the capability of working in harsh environments. According to a study conducted by the National Retail Federation [3], retailers lost $94.5 billion in 2021 due to shoplifting, inventory loss, internal theft, management errors, supplier fraud, and other causes. Missing items have become the main cause of loss for retailers in inventory management. In these applications, readers are used to monitor tags in stock frequently for goods management and inventory.
To effectively identify the missing items, many missing tag identification protocols, both probabilistic and deterministic, have been proposed. On the one hand, probabilistic protocols implement lightweight operations to detect the missing tag event with predefined reliability [4]-[6]. These works usually take a short time to discover a missing tag event, but they cannot provide ID information of the missing tags. On the other hand, deterministic protocols give ID information of missing tags [7]-[11]. Making use of the hash mapping method, these protocols assign known tags to different slots and identify missing tags by checking whether there is a tag response in the expected singleton slot. If no response is detected, the corresponding tag is missing. Otherwise, it is present. To further improve the identification efficiency, some recent works use bit-tracking technology, such as the pair-wise collision-resolving missing tag identification (PCMTI) protocol [10] and the collision resolving-based missing tag identification (CRMTI) protocol [11]. However, these works assume that all tags within range are known to the reader and do not consider unexpected unknown tags.
The authors are with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China (e-mail: lijuanzhang6@gmail.com; fanmingqiu@foxmail.com; yuchunni0228@163.com; leilei@nuaa.edu.cn).
In practical scenarios, some unknown tags may be present and affect the identification of missing tags. In the presence of unknown tags, a missing tag may be misidentified as a present one if an unknown tag is assigned to the expected singleton slot and replies with a 1-bit short message to the reader. In the literature, a few works considered the effect of unknown tags and tried to deactivate them, such as the two-phased Bloom filter-based missing tag detection (BMTD) protocol [5] and the efficient and reliable missing tag identification (ERMI) protocol [12]. In general, existing missing tag identification protocols have the following limitations:
• Since the reader has no prior information about unknown tags, an efficient unknown tag number estimation method is of great importance to guarantee the required reliability. To save time, existing works either lack the estimation process or only provide a rough estimation, so that the required reliability is not always guaranteed;
• Existing works implement Aloha-based strategies to identify missing tags. In each frame, unidentified tags are randomly assigned to slots with hash mapping. None of them makes use of information in the preceding frames. The slot information is not fully used and the time efficiency needs further improvement;
• In previous works, a tag replies to the reader with a one-bit short response. To reduce the time cost, several works considered using customized responses with the help of bit-tracking technology. However, there still exist many short response slots that lower the time efficiency.
In this work, an efficient early-breaking estimation and tree-splitting-based missing tag identification (ETMTI) protocol is proposed for large-scale RFID systems. In ETMTI, two new methods are developed to enhance the unknown tag deactivation and missing tag identification processes, respectively. The major contributions of this work are fourfold:
1) A new early-breaking estimation-based unknown tag deactivation (EBUD) method is developed to estimate the number of unknown tags and deactivate them within a short time. The early-breaking factor is chosen to balance time cost and estimation accuracy, and the number of frames is determined to guarantee the required reliability;
2) A new tree-splitting-based missing tag identification (TSMTI) method is designed to effectively identify missing tags. In TSMTI, the B-ary splitting tree method is developed to accelerate the identification process. The optimal frame factor and branch number in TSMTI are derived theoretically to minimize the execution time;
3) A bit-tracking response strategy that allows simultaneous replies of multiple tags is developed to accelerate the identification process. With customized tag responses, the reader can identify multiple tags in one slot, which greatly reduces identification time;
4) Theoretical analysis is conducted to optimize the parameter settings and derive the expressions of time cost in each phase. Numerous simulation results are presented to demonstrate the effectiveness of ETMTI. Compared with existing benchmark works, ETMTI achieves a shorter identification time and a lower false negative rate in identifying missing tags.
The remainder of this work is organized as follows: Section II reviews the most related works on missing tag identification. Section III gives the system model of this work. In Section IV, the proposed ETMTI protocol is described in detail. Then, theoretical analysis is conducted in Section V. Simulation results are presented in Section VI. Finally, some concluding remarks are made in Section VII.
arXiv:2308.09484v1 [cs.OH] 16 Aug 2023

II. RELATED WORKS
In this section, we first introduce the traditional missing tag identification protocols with only known tags. Next, the related works that deal with unknown tags are reviewed.

A. missing tag identification with only known tags
In the last decade, many missing tag identification protocols have been proposed to specify the ID information of missing tags among the known ones. Li et al. first proposed the two-phased protocol (TPP) and two-hash protocol (THP) [7]. Next, Liu et al. proposed a multi-hashing-based missing tag identification (MMTI) protocol to improve the utilization of each frame [8]. With multiple hash assignments in MMTI, many expected empty or collision slots are changed into expected singleton slots so that more tags can be identified in a frame. Making use of multiple hash seeds, the slot-filter-based missing tag identification (SFMTI) protocol [9] reconciles expected collision slots with 2 or 3 tags into singleton slots to further improve the utilization of a frame. Later on, similar protocols that make use of reconcilable collision slots were proposed, such as the coarse-grained inventory list-based stocktaking protocol [13] and the collision reconciliation and data compression algorithm [14].
Considering the requirements of practical applications, Chen et al. proposed an improved vector-based missing key tag identification (iVEKI) protocol [15] to deactivate ordinary tags and identify missing key tags separately. Thus, the more valuable missing key tags can be identified more efficiently. Considering privacy-leakage prevention, Wang et al. made use of group-based and collision-reconciled protocols to identify missing tags in blocker-enabled systems [16]. In [17], Yu et al. proposed the point-to-multipoint (P2M) and collision-free point-to-point (P2P) protocols to reduce communication cost. However, most of these works concentrate on improving frame utilization with the help of either multiple hash assignments or collision reconciliation strategies, and much useful information is wasted.
In recent research, a few missing tag identification protocols have adopted bit-tracking technology. With Manchester encoding, the reader is capable of detecting the positions of colliding bits in the received collision message and retrieving useful information in the collision slot. In fact, bit-tracking has been widely applied in many tag anti-collision protocols, such as the M-ary collision tree protocol [18], the efficient bit-detecting protocol [19], the modified dual prefixes matching mechanism [20], and so on. For missing tags, PCMTI verifies the presence of two tags in each slot with the help of bit-tracking [10]. To further improve the identification efficiency, CRMTI takes advantage of both bit-tracking and collision resolving technologies to allow customized tag responses in the reconcilable collision slots [11]. These strategies can reduce time costs to some extent, but they do not make full use of the bit-tracking technology.
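As an illustration of the bit-tracking idea, the following minimal Python sketch models how a reader might perceive simultaneous Manchester-coded replies: positions where all tags send the same bit stay readable, while positions where they disagree are flagged as colliding ("x"). The function name `superpose` is hypothetical and the model ignores all physical-layer effects; it is a sketch under these assumptions, not the decoding procedure of any cited protocol.

```python
def superpose(responses):
    """Merge equal-length bit strings the way a bit-tracking reader perceives them."""
    if not responses:
        return ""
    length = len(responses[0])
    if any(len(r) != length for r in responses):
        raise ValueError("all responses must have the same length")
    # A position is readable only when every tag sent the same bit there;
    # otherwise the reader can only tell that a collision occurred.
    return "".join(
        bits[0] if len(set(bits)) == 1 else "x"
        for bits in zip(*responses)
    )


print(superpose(["1011", "1001"]))
```

In this toy model, the two replies differ only in the third bit, so the reader recovers the other three bits and learns where the collision happened.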

B. missing tag identification with unknown tags
Many works assumed that the reader knows the ID information of all present tags within the reading range, which is unrealistic in most applications. In the literature, Shahzad et al. took the first step to consider the effect of unknown tags and proposed two RFID monitoring protocols with unexpected tags (RUN) [4], i.e., RUN$_D$ and RUN$_I$ for probabilistic and deterministic missing tag identification, respectively. In their work, multiple frames with different seeds are executed to reduce the effect of unknown tags, and the number of unknown tags is estimated from the executed frames and used to optimize the frame parameters to reduce time cost. Although RUN does not need any additional frames for estimation, the execution of all slots in each frame takes a long time. In [21], Xie et al. proposed a fast continuous scanning (FCS) protocol that uses a multiple-category filter to detect unknown tags and skips the non-singleton slots to improve the identification efficiency.
To further reduce the effect of unknown tags, Chen et al. proposed two ERMI protocols [12] and separated the process into unknown tag deactivation and missing tag identification phases. In the first phase, the reader estimates the number of unknown tags and deactivates them. With the estimated tag number and predefined reliability, the frame parameters are optimized to minimize the execution time. In the second phase, the traditional hash assignment method is used for missing tag identification. However, the required reliability of ERMI is not always guaranteed, especially when the number of unknown tags is large. Similarly, Yu et al. introduced the BMTD protocol to deactivate unexpected unknown tags and then detect tag missing events [5]. Subsequently, a compressed filter-based BMTD (CBMTD) protocol was proposed to further reduce the time cost [6]. Wang et al. also proposed a near-optimal protocol (OPT-G) [22] to notify the group ID of known tags in the presence of unexpected unknown tags.
Recently, some unknown tag number estimation protocols have been proposed. In [23], Xiao et al. studied the churn estimation problem in dynamic RFID systems and proposed three churn estimators to estimate the numbers of missing, present, and unknown tags separately. They used the state changes caused by missing and unknown tags to estimate the number of dynamic tags, but the slots with both missing and unknown tags are wasted. In [24], Liu et al. proposed a simultaneous estimation of the blocked tag size and the unknown tag size (SEBU) protocol to facilitate the identification of blocked RFID tags. Xi et al. implemented single-slot count (SCT) and time slot reuse (TSR) strategies in the SSR (SCT+TSR) protocol to estimate the numbers of missing and unknown tags simultaneously [25]. Considering unreliable channels, Wang et al. proposed a cardinality estimation scheme (CEUT) to estimate the number of unknown tags in the presence of known tags [26]. However, these works focus on increasing the estimation accuracy of unknown tag numbers, and their time cost is high.
Moreover, some special strategies have been introduced to mitigate the effect of unknown tags. In [27], Wang et al. proposed an order-based missing tag identification (OMTI) protocol to dynamically assign each tag an exclusive slot. With offline serialization and online identification, the effect of unknown tags is reduced. In [28], Chen et al. presented an efficient and accurate protocol to identify missing tags in highly dynamic RFID systems. They combined the reply slot location and reply bits of tags for simultaneous missing tag identification and unknown tag filtering. Besides, some unknown tag identification protocols have been proposed to separate known and unknown tags [29]-[31]. In general, existing works adopt some strategy to reduce the effect of unknown tags. However, they usually rely on the basic hash assignment method to identify missing tags, which takes a long time to meet a high reliability requirement.

III. SYSTEM MODEL
This work considers a typical large-scale RFID system with a reader, a backend server, and numerous tags, as in Fig. 1. Tags are attached to objects for ease of identification, classification, sorting, and other inventory management tasks. For simplicity, each object is assumed to have one tag and is represented by the corresponding tag ID. The reader is in charge of monitoring all tags within its reading range and uploads the collected ID information to the database in the backend server. The reader can also retrieve information of tags stored in the database via a high-speed channel. The backend server has powerful communication, computation, and storage capabilities that can effectively assist the reader in monitoring tags. Each tag has a unique ID and is capable of simple computation operations as in [11], [12], such as random number generation, lightweight hash functions, modulus operations, and so on.
Fig. 1: System model of a large-scale RFID system with both known and unknown tags. Note that the ID information of known tags is stored in the backend database, and the reader has no prior information about unknown tags.
In stock management, the ID and other information of new tags are collected and recorded in the backend database with traditional tag anti-collision protocols at warehouse entry. In the system, the set of tags may dynamically change because of management faults or theft. For example, some tags may be taken to the wrong zone and newly appear in the reader's reading range; some may be stolen or mistakenly moved out of the reading range. Therefore, the reader has to frequently monitor all tags within range to identify missing ones as soon as possible.
To efficiently identify missing tags, the reader verifies the state of each tag by comparing the collected tag responses with the backend database. Since the reader can retrieve all tags' ID information from the backend database, we refer to the tags stored in the database as known tags. A reading round is the process in which the reader verifies the states of all known tags. As shown in Fig. 1, if a known tag is still within the reading range in the current round, the tag is referred to as a present tag; otherwise, it is a missing tag. Besides, if a tag newly appears in the reading range, i.e., there is no information about it in the database, it is an unknown tag.
Denote the numbers of known and unknown tags by $K$ and $U$, respectively. The number of missing tags is represented by $M$. Affected by the presence of unexpected unknown tags, a missing tag may be falsely identified as present. Let $M_{fls}$ denote the number of falsely identified missing tags. We define the false negative rate $\nabla_{fn}$ as the ratio of the number of falsely identified missing tags to the total number of missing tags. Given a required reliability $\alpha$, the reader has to identify the missing tags such that the following inequality is guaranteed, i.e.,
$$\nabla_{fn} = \frac{M_{fls}}{M} \le 1 - \alpha. \tag{1}$$
The main objective of this work is to reduce the time cost and false negative rate of missing tag identification in the presence of unknown tags in large-scale RFID systems.

IV. PROPOSED ETMTI PROTOCOL
In this section, we describe the proposed ETMTI protocol in detail. The identification process of ETMTI consists of two phases, i.e., the unknown tag deactivation phase and the missing tag identification phase. As illustrated in Fig. 2, a new early-breaking estimation-based unknown tag deactivation (EBUD) method is developed in Phase I to effectively estimate the number of unknown tags and deactivate them. With EBUD, most unknown tags can be deactivated in a very short time. In Phase II, a new tree-splitting-based missing tag identification (TSMTI) method is developed to effectively identify missing tags and deactivate the remaining unknown ones. With tree-splitting, the identification time is greatly reduced and the reliability is further improved.

A. Phase I: early-breaking estimation-based unknown tag deactivation
In this phase, the reader executes a new EBUD algorithm to estimate the number of unknown tags in the first frame and deactivate them in subsequent frames. In the $i$-th frame of this phase, the reader first assigns known tags with hash mapping to construct an indicative vector $PV$. In detail, it generates the random seed $R$, sets the frame size $f_i = K$, and calculates the slot index for tag $T_j$ by
$$s = H(ID_j, R) \bmod f_i + 1, \tag{2}$$
where $H(\cdot)$ is a hash function and $ID_j$ is the ID of tag $T_j$. Then, it initializes $PV$ with $f_i$ zero bits and, for every known tag, sets the $s$-th bit to "1", indicating that the $s$-th slot is an expected non-empty slot. A bit that remains "0" means that no known tag is assigned to that slot, i.e., it is an expected empty slot. As shown at the top of Fig. 3, the constructed $PV$ = "1 0 1 0 1 0 1 1 1 0", i.e., only the 2nd, 4th, 6th, and 10th slots are expected empty slots.
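The construction of the indicative vector can be sketched in a few lines of Python. The sketch assumes a slot index of the form $s = H(ID, R) \bmod f + 1$, uses SHA-256 purely as a stand-in for the lightweight tag hash $H$, and introduces the hypothetical helper names `slot_index` and `build_pv`; none of these choices is prescribed by the protocol description.

```python
import hashlib


def slot_index(tag_id: str, seed: int, frame_size: int) -> int:
    """1-based slot index s = H(ID, R) mod f + 1 (SHA-256 is an assumed stand-in for H)."""
    digest = hashlib.sha256(f"{seed}:{tag_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % frame_size + 1


def build_pv(known_slots, frame_size: int) -> str:
    """Indicative vector: '1' for expected non-empty slots, '0' for expected empty ones."""
    pv = ["0"] * frame_size
    for s in known_slots:
        pv[s - 1] = "1"  # several known tags may share a slot; the bit stays '1'
    return "".join(pv)


# Slot indices read off the example vector "1 0 1 0 1 0 1 1 1 0".
print(build_pv([1, 3, 5, 7, 8, 9], 10))
```

In a full run, `known_slots` would be `[slot_index(t, R, K) for t in known_ids]`; the explicit list is used here only so that the output matches the example vector.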

Fig. 3: Early-breaking estimation-based unknown tag deactivation. (Figure labels: hash mapping, breaking point, actual response, unknown tag number estimate; legend: present tag, missing tag, unknown tag, deactivated unknown tag.)
To effectively estimate the number of unknown tags, a new early-breaking estimation method is introduced. As illustrated in Fig. 3, the reader sets the breaking point to break $PV$ into two parts, and the first $f_{sub}$ bits form the expected vector $EV$. Note that $f_{sub} = \lceil \gamma f_1 \rceil$, where $\gamma$ is the early-breaking factor in the range $[0, 1]$ and $\lceil \cdot \rceil$ is the ceiling function. Then, it broadcasts the $Query_e(R, f_1, EV)$ command to inform tags of the random seed, the frame size, and the expected vector. It should be noted that transmitting only this subvector of $PV$ reduces the time cost.
Since all known tags are assigned to "1" bit positions, they keep silent and wait for the next query command. Only unknown tags can map to "0" bit positions; hence, the reader can estimate the number of unknown tags by checking the responses in the expected empty slots. After receiving the $Query_e$ command, a tag calculates the slot index $s$ with (2) and checks the corresponding bit in $EV$. If $EV(s)$ is "1" or $s$ is greater than $f_{sub}$, it keeps silent in the current frame. If $EV(s)$ is "0", the tag confirms that it is an unknown tag and constructs its response string with the bit-tracking response method. Denote the number of "0"s in $EV$ by $n_0$ and the number of "0"s prior to the $s$-th position by $n'_0$. The tag first generates an $n_0$-bit response string $R_{str}$ by setting the $(n'_0 + 1)$-th bit to "1" and all other bits to "0". Then, it replies $R_{str}$ to the reader and deactivates itself immediately. More specifically, as shown in the middle of Fig. 3, tags $U_1$ and $U_3$ are assigned to "1" bit positions of $EV$, so they keep silent in the current frame. Since tags $U_2$, $U_4$, and $U_5$ are assigned to "0" bit positions, they reply and deactivate themselves in this frame. Taking $U_5$ as an example, it constructs the response string "001", since there are three "0"s in $EV$ and tag $U_5$ is assigned to the third "0" bit position. Similarly, tags $U_2$ and $U_4$ construct the same response string, i.e., "100". With bit-tracking technology, the message received at the reader side in this frame is "x0x", where "x" refers to a colliding bit. Note that if only one tag responds in a position, the received message has a "1" bit there, which is also regarded as an "x".
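The tag-side behaviour in the estimation frame can be sketched as follows. The sketch assumes, for illustration only, that the expected vector of the example is $EV$ = "101010" (the first six bits of the example $PV$, which contains exactly three "0"s) and that $U_2$ and $U_4$ map to slot 2 while $U_5$ maps to slot 6; the function names `tag_response` and `received` are hypothetical.

```python
def tag_response(ev: str, s: int):
    """Response string of a tag mapped to 1-based slot s, or None if it keeps silent."""
    if s > len(ev) or ev[s - 1] == "1":
        return None  # beyond the breaking point, or an expected non-empty slot
    n0 = ev.count("0")                # number of '0's in EV
    n0_prior = ev[: s - 1].count("0")  # number of '0's before position s
    bits = ["0"] * n0
    bits[n0_prior] = "1"              # the (n0_prior + 1)-th bit is set
    return "".join(bits)


def received(responses, n0: int) -> str:
    """Reader view: any position where at least one tag sent '1' is counted as 'x'."""
    return "".join(
        "x" if any(r[i] == "1" for r in responses) else "0" for i in range(n0)
    )


ev = "101010"                                       # assumed EV for the example
replies = [tag_response(ev, s) for s in (2, 2, 6)]  # assumed slots of U2, U4, U5
print(replies, received(replies, ev.count("0")))
```

Counting the "x" positions in the merged string gives the quantity $n_x$ that the reader uses in the estimation step.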
By counting the number of "x"s, $n_x$, the reader estimates the number of unknown tags. Since tags are randomly assigned to slots, the probability that a tag is assigned to a specific slot is $1/f_1$. If the reader detects an "x" in the received message, it knows that at least one unknown tag replied in the corresponding expected empty slot. Recalling the construction of $PV$, the probability that no known tag is assigned to a specific bit position in $PV$ is expressed as
$$p_0 = \left(1 - \frac{1}{f_1}\right)^{K}. \tag{3}$$
Similarly, the probability that at least one unknown tag is assigned to a specific position is calculated as
$$p_u = 1 - \left(1 - \frac{1}{f_1}\right)^{U}. \tag{4}$$
Then, the probability that the reader detects an "x" is given by
$$p_x = p_0\, p_u = \left(1 - \frac{1}{f_1}\right)^{K}\left[1 - \left(1 - \frac{1}{f_1}\right)^{U}\right]. \tag{5}$$
The expectation of the number of "x"s in the received message is calculated by
$$E[n_x] = f_{sub}\, p_x. \tag{6}$$
Substituting the observed $n_x$ for $E[n_x]$ and using (5), the estimated number of unknown tags is calculated as
$$\hat{U} = \frac{\ln\left(1 - \dfrac{n_x}{f_{sub}\left(1 - 1/f_1\right)^{K}}\right)}{\ln\left(1 - 1/f_1\right)}.$$
In subsequent frames, the reader performs similar operations to deactivate unknown tags. It assigns known tags to slots to construct the indicative vector $PV$ and broadcasts the $Query_d(R, f_i, PV)$ command to tags. On receiving this command, tags perform the hash mapping operation, and those assigned to "0" bit positions in $PV$ deactivate themselves immediately. The deactivated unknown tags do not participate in Phase II. In general, the differences between the estimation and deactivation processes are twofold. Firstly, in the $Query_e$ command, an expected vector $EV$ copied from the first $\lceil \gamma f_1 \rceil$ bits of $PV$ is transmitted, whereas in $Query_d$ the full $PV$ string is transmitted. Secondly, after receiving the $Query_e$ command, tags that are assigned to "0" bit positions in $EV$ reply to the reader. After receiving $Query_d$ commands, however, tags do not reply to the reader, i.e., only the reader's commands are transmitted in each frame of the deactivation process. Tags that are assigned to the expected empty slots deactivate themselves and keep silent.
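The estimation step can be illustrated numerically. The sketch below assumes that the expected number of "x" bits is $f_{sub}\,(1 - 1/f_1)^{K}\,[1 - (1 - 1/f_1)^{U}]$, i.e., an expected empty position that is hit by at least one unknown tag, and inverts this expression for $U$; the function name `estimate_unknown` and the parameter values are illustrative assumptions.

```python
import math


def estimate_unknown(n_x: float, K: int, f1: int, gamma: float) -> float:
    """Invert E[n_x] = f_sub * (1 - 1/f1)^K * (1 - (1 - 1/f1)^U) for U."""
    f_sub = math.ceil(gamma * f1)      # length of the expected vector EV
    p_empty = (1 - 1 / f1) ** K        # no known tag maps to a given position
    ratio = n_x / (f_sub * p_empty)    # fraction of empty positions that were hit
    if ratio >= 1:
        raise ValueError("every expected empty slot was hit; estimate is unbounded")
    return math.log(1 - ratio) / math.log(1 - 1 / f1)


# Round trip with assumed parameters: K = f1 = 1000, gamma = 1/4, U = 500.
K, f1, gamma, U = 1000, 1000, 0.25, 500
expected_nx = math.ceil(gamma * f1) * (1 - 1 / f1) ** K * (1 - (1 - 1 / f1) ** U)
print(round(expected_nx, 2), round(estimate_unknown(expected_nx, K, f1, gamma), 2))
```

In practice $n_x$ is an integer observed in a single frame, so the estimate fluctuates around the true value; a smaller $\gamma$ shortens the frame but leaves fewer observed positions.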

B. Phase II: tree-splitting-based missing tag identification
In this phase, the reader executes the $B$-ary tree-splitting method to quickly identify missing tags and deactivate the remaining unknown tags. As in Phase I, the first frame differs from the subsequent frames. In the first frame, the reader generates a random hash seed, sets the frame length, and assigns known tags with hash mapping to construct the indicative vector $BV$, as in Fig. 5. Different from Phase I, three states are indicated in $BV$: (i) if no tag is assigned to a specific slot, it is an expected empty slot and is denoted by a single "0" bit; (ii) if only one tag is assigned, it is an expected singleton slot and is represented by "10"; (iii) otherwise, it is an expected collision slot and is denoted by "11". For example, in $F_1$ of Fig. 5, the 3rd and 9th slots are two expected singleton slots, the 2nd, 5th, and 7th slots are three expected collision slots, and the others are expected empty slots. The constructed indicative vector is then $BV$ = "0 11 10 0 11 0 11 0 10 0".
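The three-state encoding can be sketched as below. The slot assignment is read off the $F_1$ example (three tags in slot 2, one in slot 3, two in slot 5, three in slot 7, one in slot 9); the helper name `build_bv` is hypothetical.

```python
from collections import Counter


def build_bv(known_slots, frame_size: int):
    """Segments of BV: '0' = expected empty, '10' = singleton, '11' = collision."""
    counts = Counter(known_slots)
    segments = []
    for s in range(1, frame_size + 1):
        c = counts.get(s, 0)
        segments.append("0" if c == 0 else "10" if c == 1 else "11")
    return segments


# Slots of the ten known tags in frame F1 of the example.
print(" ".join(build_bv([2, 2, 2, 3, 5, 5, 7, 7, 7, 9], 10)))
```

Because "0", "10", and "11" form a prefix-free code, the segments can be concatenated into a single bit string without separators.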
To facilitate the tree-splitting process, the reader keeps a counter for each known tag, i.e., $Ac(T_j)$ for tag $T_j$. If tag $T_j$ is assigned to an expected singleton slot, the reader sets $Ac(T_j) = 0$; if it is assigned to an expected collision slot, the reader counts the number of "11" segments prior to the assigned position (denoted by $X_{11}$) and sets $Ac(T_j) = X_{11} + 1$. Then, the reader broadcasts $Query_m(R, f_1, BV)$ and waits for tag responses. After receiving this command, tag $T_j$ performs the same hash mapping operation as the reader and checks the corresponding segment in $BV$ as follows:
• If the assigned segment is "10", the tag sets $Ac(T_j) = 0$ and prepares an $X_{10}$-bit response string $R_{str}$, where $X_{10}$ is the number of "10"s in $BV$. For instance, in frame $F_1$ of Fig. 5, $Ac(T_4) = Ac(T_6) = 0$, and $X_{10} = 2$;
• If the assigned segment is "11", the tag performs the same operation as the reader to obtain $X_{11}$, and sets $Ac(T_j) = X_{11} + 1$. As shown in frame $F_1$ of Fig. 5, $Ac(T_1) = Ac(T_3) = Ac(T_7) = 1$, $Ac(T_5) = Ac(T_8) = 2$, and $Ac(T_2) = Ac(T_9) = Ac(T_{10}) = 3$;
• If the assigned segment is "0", the tag determines that it is an unknown tag and deactivates itself. In frame $F_1$ of Fig. 5, we can observe that $U_3$ is deactivated.
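The tag-side interpretation of $BV$ can be sketched as follows, assuming that the vector is transmitted as one concatenated bit string in which "0", "10", and "11" are parsed as a prefix-free code. The names `parse_bv` and `tag_action` and the returned labels are hypothetical.

```python
def parse_bv(bits: str):
    """Split a concatenated BV bit string into its '0' / '10' / '11' segments."""
    segments, i = [], 0
    while i < len(bits):
        if bits[i] == "0":
            segments.append("0")
            i += 1
        else:
            segments.append(bits[i:i + 2])
            i += 2
    return segments


def tag_action(bits: str, s: int):
    """Action and counter value Ac of a tag mapped to 1-based slot s."""
    segments = parse_bv(bits)
    seg = segments[s - 1]
    if seg == "0":
        return ("deactivate", None)    # no known tag expected here: unknown tag
    if seg == "10":
        return ("reply", 0)            # expected singleton slot, Ac = 0
    x11 = segments[: s - 1].count("11")
    return ("wait", x11 + 1)           # expected collision slot, Ac = X11 + 1


bv = "".join("0 11 10 0 11 0 11 0 10 0".split())  # BV of frame F1 in the example
for slot in (1, 2, 3, 5, 7):
    print(slot, tag_action(bv, slot))
```

For the example vector, slots 2, 5, and 7 yield counter values 1, 2, and 3, which matches the grouping used in the next frame.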
A tag with $Ac(T_j) = 0$ first sets all bits of $R_{str}$ to zero. It then counts the number of "10"s prior to its assigned segment in $BV$ and sets the corresponding bit in $R_{str}$ to "1". For example, in $F_1$ of Fig. 5, tag $T_4$ is assigned to the first "10" segment in $BV$, so it sets $R_{str}$ = "10". Similarly, tag $T_6$ is assigned to the second "10" segment in $BV$, so it sets $R_{str}$ = "01". Then, the two tags reply with $R_{str}$ and keep silent afterwards. After receiving the tag responses, the reader decodes the received message as "xx" and confirms that tags $T_4$ and $T_6$ are present.
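A minimal sketch of how a tag in an expected singleton slot might build its response string is given below; the function name `singleton_response` is hypothetical, and the segment list is the $F_1$ example.

```python
def singleton_response(segments, s: int) -> str:
    """X10-bit response of the tag in singleton slot s (1-based)."""
    if segments[s - 1] != "10":
        raise ValueError("slot s is not an expected singleton slot")
    x10 = segments.count("10")               # total number of singleton slots
    prior = segments[: s - 1].count("10")    # singleton slots before slot s
    bits = ["0"] * x10
    bits[prior] = "1"
    return "".join(bits)


f1_segments = "0 11 10 0 11 0 11 0 10 0".split()
print(singleton_response(f1_segments, 3), singleton_response(f1_segments, 9))
```

Each singleton tag owns exactly one bit of the response, so all singleton tags of a frame can reply simultaneously and the reader reads their presence from the individual bit positions.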
In subsequent frames, the reader identifies missing tags with a $B$-ary tree. In detail, the reader divides the $i$-th frame ($i \ge 2$) into multiple groups based on the number of expected collision slots in the $(i-1)$-th frame. Each group consists of $B$ slots. The group index of each tag is determined by its counter value $Ac$. In each group, the reader assigns tags with $s = H(ID, R) \bmod B + 1$ and constructs the indicative vector $BV$ by concatenating the slot states of all groups. It then updates the counter values of all tags based on the constructed $BV$. For example, in $F_2$ of Fig. 5 with $B = 3$, the reader assigns $T_1$, $T_3$, and $T_7$ to the first three slots because their $Ac = 1$; $T_5$ and $T_8$, with $Ac = 2$, are assigned to the second group; $T_2$, $T_9$, and $T_{10}$ are assigned to the third group. The constructed indicative vector is $BV$ = "10 11 0 10 10 0 0 11 10". Since $T_3$ and $T_7$ are assigned to the first expected collision slot, $T_9$ and $T_{10}$ are assigned to the second expected collision slot, and the other tags are assigned to expected singleton slots, the counter values are updated as $Ac(T_1) = Ac(T_5) = Ac(T_8) = Ac(T_2) = 0$, $Ac(T_3) = Ac(T_7) = 1$, and $Ac(T_9) = Ac(T_{10}) = 2$. Next, the reader broadcasts $Query_m(R, B, BV)$ to tags.
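One splitting frame can be sketched on the reader side as follows. The in-group slot of each tag is passed in explicitly (read off the $F_2$ example with $B = 3$) instead of being hashed, so that the output is reproducible; the function name `split_frame` and the dictionaries are illustrative assumptions.

```python
from collections import Counter


def split_frame(ac, sub_slot, B: int):
    """Build BV for a splitting frame and update the counters.

    ac:       tag -> group index (counter value >= 1 from the previous frame)
    sub_slot: tag -> slot inside its group, in 1..B
    """
    n_groups = max(ac.values())  # one group per previous expected collision slot
    counts = Counter((ac[t], sub_slot[t]) for t in ac)
    segments, rank, x11 = [], {}, 0
    for g in range(1, n_groups + 1):
        for s in range(1, B + 1):
            c = counts.get((g, s), 0)
            if c >= 2:
                x11 += 1
                rank[(g, s)] = x11   # new counter value: X11 + 1
            segments.append("0" if c == 0 else "10" if c == 1 else "11")
    new_ac = {t: rank.get((ac[t], sub_slot[t]), 0) for t in ac}
    return segments, new_ac


ac = {"T1": 1, "T3": 1, "T7": 1, "T5": 2, "T8": 2, "T2": 3, "T9": 3, "T10": 3}
sub = {"T1": 1, "T3": 2, "T7": 2, "T5": 1, "T8": 2, "T2": 3, "T9": 2, "T10": 2}
segments, new_ac = split_frame(ac, sub, B=3)
print(" ".join(segments), new_ac)
```

Tags whose new counter is 0 are verified in this frame, while the others are split again in the next frame.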
On receiving the reader's $Query_m$ command, tag $T_j$ performs the same hash operation as the reader and checks the corresponding segment of the $Ac(T_j)$-th group in $BV$. It then operates similarly to the tags in the first frame. If the tag is assigned to an expected singleton slot, it first counts the number of "10"s in $BV$, denoted by $X_{10}$, and generates a response string $R_{str}$ with $X_{10}$ zero bits. It then counts the number of "10"s prior to its assigned position, sets the corresponding bit in $R_{str}$ to "1", and replies to the reader. If the tag is assigned to an expected collision slot, it counts the number of "11"s prior to its assigned position and updates $Ac$ accordingly. If the tag is assigned to an expected empty slot, it deactivates itself.
For example, in $F_2$ of Fig. 5, tags $T_1$, $T_3$, and $T_7$ perform the hash mapping operation and check the first three segments, i.e., the first group, in $BV$. Tag $T_1$ is assigned to an expected singleton slot, and tags $T_3$ and $T_7$ are assigned to an expected collision slot. Tag $T_1$ counts the number of "10"s, generates $R_{str}$ = "1000", and replies to the reader. Tags $T_3$ and $T_7$ count the number of "11"s prior to their assigned position and update their counter values as $Ac(T_3) = Ac(T_7) = 1$. In the 4th to 6th segments, tag $U_1$ is assigned to an expected empty slot and is therefore deactivated. In the 7th to 9th segments, tags $T_9$ and $T_{10}$ are assigned to the same expected collision slot. They update $Ac(T_9) = Ac(T_{10}) = 2$. Since tags $T_2$, $T_5$, and $T_8$ are missing, only tag $T_1$ replies in this frame.
After receiving the tags' responses, the reader determines that tag $T_1$ is present, and tags $T_2$, $T_5$, and $T_8$ are missing. Similarly, the reader confirms that tags $T_3$, $T_7$, $T_9$, and $T_{10}$ are present in $F_3$. If there are no collision slots in $F_3$, all tags have been identified, and the reader terminates the current reading round. Otherwise, it splits the collision slots and repeats the identification process in subsequent frames. With tree-splitting, colliding tags are more easily separated and the identification process is effectively accelerated.
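The reader-side decision for a frame can be sketched as below, assuming that the reader keeps the tags expected in singleton slots in slot order and that a received "1" or "x" at a position means the corresponding tag replied. The function name `decode_presence` is hypothetical; the input mirrors frame $F_2$ of the example, in which only $T_1$ replies.

```python
def decode_presence(expected_singletons, received: str):
    """Map each tag expected in a singleton slot to 'present' or 'missing'."""
    if len(expected_singletons) != len(received):
        raise ValueError("one response bit is expected per singleton slot")
    return {
        tag: "present" if bit in ("1", "x") else "missing"
        for tag, bit in zip(expected_singletons, received)
    }


# Singleton slots of F2 in slot order hold T1, T5, T8, T2; only T1 replies ("1000").
print(decode_presence(["T1", "T5", "T8", "T2"], "x000"))
```

An unknown tag that survives Phase I and happens to reply in a missing tag's position would flip a "0" to "x" in this model, which is exactly the false negative event that the deactivation phase is meant to suppress.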

V. PERFORMANCE ANALYSIS
In this section, we first analyze the deactivation phase and optimize the early-breaking factor $\gamma$ to balance the time cost and estimation error of EBUD. Next, we analyze the identification phase and optimize the frame parameter $\beta$ and the branch number $B$. Then, the false negative rate of the identification phase is analyzed. Since the false negative rate is affected by the number of unknown tags participating in Phase II, the number of frames needed in Phase I is determined from the estimated unknown tag number and the required reliability so that enough unknown tags are deactivated. Fig. 4 illustrates the main logic of our analysis.

A. Time cost of Phase I
In Phase I, a new EBUD method is developed to effectively estimate the number of unknown tags and deactivate them. The time cost of EBUD is given by
$$T_{I} = T_{est} + T_{dea}, \tag{7}$$
where $T_{est}$ and $T_{dea}$ are the time costs of the estimation and deactivation processes, respectively. In the estimation process, the frame consists of the transmission of the reader's $Query_e$ command and the unknown tags' responses. As given in Section IV-A, the $Query_e(R, f_1, EV)$ command consists of a 4-bit command type string, a 16-bit hash seed, a 16-bit frame size, and a $\gamma f_1$-bit expected vector. On the tag side, unknown tags that are assigned to expected empty slots reply immediately. With the bit-tracking response, the length of a response message is the number of expected empty slots indicated in $EV$. Since the probability that a specific slot is empty is $(1 - 1/f_1)^K$, the number of expected empty slots is $\gamma f_1 (1 - 1/f_1)^K$. Thus, the time cost is given by
$$T_{est} = \left\lceil \frac{4 + 16 + 16 + \gamma f_1}{96} \right\rceil t_{id} + \left\lceil \frac{\gamma f_1 \left(1 - 1/f_1\right)^{K}}{96} \right\rceil t_{id}, \tag{8}$$
where $f_1 = K$ and $t_{id}$ is the time cost of transmitting a 96-bit string. It should be noted that both the reader's request command and the tags' responses are divided into 96-bit segments to facilitate transmission. In the estimation process, two indexes, $T_{est}$ and the estimation error $\epsilon$, are adopted to determine the early-breaking factor $\gamma$. Define the estimation error as
$$\epsilon = \frac{\mathrm{abs}(\hat{U} - U)}{U}, \tag{9}$$
where $\hat{U}$ is the estimated number of unknown tags and $\mathrm{abs}(\cdot)$ returns the absolute value of a number. Table I gives the statistical results averaged over 100 tests to demonstrate how $\gamma$ affects these two indexes. As shown, with a smaller $\gamma$, the estimation error increases and the time cost decreases. To balance the two indexes and provide reasonable estimation accuracy, we set $\gamma = 1/4$ in EBUD.
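The trade-off controlled by $\gamma$ can be illustrated with a small calculation of the estimation-frame cost. The sketch assumes that the command (4 + 16 + 16 + $\gamma f_1$ bits) and the response ($\gamma f_1 (1 - 1/f_1)^K$ bits) are each rounded up to whole 96-bit segments of duration $t_{id}$; the rounding rule, the function name `t_est`, and the parameter values are assumptions made for illustration.

```python
import math


def t_est(K: int, gamma: float, t_id: float) -> float:
    """Estimated duration of the estimation frame, in the same unit as t_id."""
    f1 = K                                            # frame size of the first frame
    cmd_bits = 4 + 16 + 16 + math.ceil(gamma * f1)    # type + seed + size + EV
    resp_bits = gamma * f1 * (1 - 1 / f1) ** K        # expected empty slots in EV
    segments = math.ceil(cmd_bits / 96) + math.ceil(resp_bits / 96)
    return segments * t_id


for g in (1.0, 0.5, 0.25):
    print(g, t_est(1000, g, 1.0))
```

With $K = 1000$ and $t_{id}$ normalized to 1, shrinking $\gamma$ from 1 to 1/4 reduces the frame from 15 to 4 segments in this model, at the price of observing fewer expected empty slots.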
In the deactivation process, each frame consists only of the transmission of the reader's request command $Query_d(R_i, f_i, PV)$. The time cost $T_{dea}$ of this process is given by (10). Substituting (8) and (10) into (7), the time cost of Phase I is obtained.

B. Time cost of Phase II
In Phase II, a new B-ary tree-splitting-based missing tag identification (TSMTI) method is developed to quickly identify missing tags. The time cost of Phase II, $T_2$, is given by (11) as the sum of $T_r$ and $T_t$, the time costs of transmitting reader requests and tag responses in Phase II, respectively. In frame $F_1$, a tag is randomly assigned to an expected slot indicated in $BV$ with probability $1/f_1 = 1/(\beta K)$. In subsequent frames, tags assigned to the same expected collision slot are split into $B$ subgroups. As shown in Fig. 5, the splitting process can be viewed as a single search applied to a tree whose root node has $f_1$ children and whose subsequent nodes each have $B$ children. Inspired by [32], we consider the root nodes of the individual tree searches to be at level 0, so that the $i$-th level of the tree corresponds to the $(i+1)$-th frame in Phase II. In the $i$-th level, the search probes over subintervals of size $B^i$. Thus, the probability that a tag is assigned to a specific slot of the $i$-th level is given by (12). Then, the probability that $j$ out of $K$ tags fall into a particular slot of level $i$ is given by (13). The probabilities that a slot is an expected empty, singleton, or collision slot are given by (14), (15), and (16), respectively. Let $q_i$ be the probability that a particular slot at level $i$ is visited in the splitting process. At level $i$, a slot is visited only when its parent slot experiences a collision; an empty or singleton parent slot generates no subgroups. This yields $q_i$ in (17). Note that all slots at level 0 are probed, hence $q_0 = 1$. At level $i$, the average number of expected slots to be visited is obtained by summing $q_i$ over all $\beta K B^i$ subintervals, as given in (18). The reader broadcasts Querym($R$, $f_1$, $BV$) in the first frame and Querym($R$, $B$, $BV$) in subsequent frames. A tag that is assigned to an expected singleton slot replies to the reader. For each frame, the number of segments in $BV$ is obtained from (18). Since each level corresponds to one frame and the state of each slot is indicated by at most 2 bits in $BV$, the time cost for transmitting the reader's request commands can be approximated by (19), where $T_r^{f_i}$ is the time cost for transmitting the reader command in the $i$-th frame and $F_m$ is the number of frames needed in Phase II. If a tag is resolved at a level higher than $i$ in the tree, it is also resolved at level $i$ [32]. Hence, by counting all singleton slots at level $i$, we account for all singleton slots visited up to and including level $i$. The number of tags identified at level $i$ therefore equals the number of singleton slots at level $i$ minus the number of singleton slots at level $i-1$, as given in (20). As shown in Fig. 5, with bit-tracking technology, the length of the tags' response message in each frame equals the number of expected singleton slots indicated in $BV$. The time cost for transmitting tag responses in the $i$-th frame of Phase II is given by (21). Combining (20) and (21) yields $T_t$ in (22). Because Phase II terminates when all known tags are identified, $F_m$ must satisfy the requirement in (23). Substituting (19), (22), and (23) into (11), the time cost of TSMTI is obtained. Two parameters affect the performance of TSMTI in Phase II: the frame factor β and the branch number B. Fig. 6 gives the numerical results of $T_2$ as β and B change. As can be observed, $T_2$ decreases as β increases from 0.1 to 0.95 and increases when β > 0.95. Meanwhile, $T_2$ is smaller for B = 3 than for the other settings of B. Therefore, the near-optimal parameters are β = 0.95 and B = 3.
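The slot-level quantities used in this analysis can be sketched numerically. The following Python code is a minimal illustration, not the paper's exact formulation: it assumes a binomial occupancy model in the style of [32], with a per-slot assignment probability of $1/(\beta K B^i)$ at level $i$. All function names are hypothetical.

```python
def slot_probability(K: int, beta: float, B: int, level: int) -> float:
    """Probability that a tag maps to one specific slot at `level`
    (assumed to be 1 / (beta * K * B**level))."""
    return 1.0 / (beta * K * B ** level)

def slot_state_probabilities(K: int, p: float):
    """(empty, singleton, collision) probabilities of a slot when each
    of K tags selects it independently with probability p."""
    empty = (1.0 - p) ** K
    singleton = K * p * (1.0 - p) ** (K - 1)
    collision = max(0.0, 1.0 - empty - singleton)  # clamp rounding noise
    return empty, singleton, collision

def visit_probability(K: int, beta: float, B: int, level: int) -> float:
    """q_i: a slot is visited only if its parent slot was a collision;
    every level-0 slot is probed, so q_0 = 1."""
    if level == 0:
        return 1.0
    parent_p = slot_probability(K, beta, B, level - 1)
    return slot_state_probabilities(K, parent_p)[2]

def expected_visited_slots(K: int, beta: float, B: int, level: int) -> float:
    """q_i summed over all beta*K*B**level slots of the level."""
    return beta * K * B ** level * visit_probability(K, beta, B, level)

def cumulative_singletons(K: int, beta: float, B: int, level: int) -> float:
    """Expected singleton slots at `level` over the full tree; these
    cover every tag resolved at this level or an earlier one."""
    p = slot_probability(K, beta, B, level)
    return K * (1.0 - p) ** (K - 1)

def identified_at_level(K: int, beta: float, B: int, level: int) -> float:
    """Tags newly identified at `level`: singletons at this level
    minus singletons at the previous level."""
    current = cumulative_singletons(K, beta, B, level)
    previous = cumulative_singletons(K, beta, B, level - 1) if level > 0 else 0.0
    return current - previous

if __name__ == "__main__":
    K, beta, B = 3000, 0.95, 3
    for level in range(6):
        print(level,
              round(expected_visited_slots(K, beta, B, level), 1),
              round(identified_at_level(K, beta, B, level), 1))
```

Under these assumptions, the per-level identified counts telescope, so their sum approaches $K$ as the number of levels grows. This matches the termination condition that all known tags are eventually identified.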

C. False negative rate
In Phase II, if a missing tag is assigned to an expected singleton slot and at least one unknown tag happens to be assigned to the same slot, the missing tag is falsely identified as present. With (15), the number of misidentified missing tags at the $i$-th level of Phase II is given by (24). Here, $M_i^*$ and $U_i$ are the numbers of missing tags to be identified and unknown tags participating in the $i$-th level of Phase II, respectively. In (24), the first part is the number of expected singleton slots with missing tags, which equals $M_i^*$, and the second part is the probability that at least one unknown tag selects such a slot. Supposing that the missing tags are evenly distributed, the probability that a known tag is missing is $M/K$. Based on (20), the number of missing tags to be identified at level $i$ is given by (25). An unknown tag participates in level $i$ only when it is assigned to an expected collision slot at level $i-1$. Based on (16), the number of remaining unknown tags at level $i$ is given by (26). Here, $U_0$ is the number of unknown tags participating in level 0, i.e., the first frame of Phase II. It equals the number of remaining unknown tags $U_d$ after Phase I, i.e., $U_0 = U_d$. Substituting (25) and (26) into (24), the number of misidentified missing tags at level $i$ is obtained. Finally, the false negative rate of TSMTI, $\nabla_{fn}$, is given by (27). The false negative rate of TSMTI is affected by $U_d$. To analyze this effect, we define the remaining unknown tag ratio $r_{ud} = U_d/K$, i.e., the ratio of the number of remaining unknown tags to that of known ones. Based on our analysis, Table II lists the numerical values of $\nabla_{fn}$ as $r_{ud}$ varies. With (1), we have $\nabla_{fn} < 1 - \alpha$. When the required reliability is α = 0.9, $\nabla_{fn} < 0.1$. According to Table II, the allowed remaining unknown tag ratio is $r_{ud} \le 0.15$, and we set $r_{ud} = 0.10$ to meet the requirement. Similarly, when α = 0.95 (resp. 0.99), we require $r_{ud}$ to be smaller than 0.05 (resp. 0.01). Therefore, the number of remaining unknown tags should satisfy (28).
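The false negative mechanism can be illustrated with a short Python sketch. It assumes that each of the $U_i$ unknown tags independently selects a given slot with the same per-slot probability $p$ as a known tag, so that the probability of at least one unknown tag landing in the slot is $1 - (1-p)^{U_i}$. The function names and the lookup table, which simply transcribes the $r_{ud}$ settings quoted above, are hypothetical.

```python
def prob_unknown_hits_slot(p: float, num_unknown: float) -> float:
    """Probability that at least one unknown tag selects a given slot,
    assuming independent selection with per-slot probability p."""
    return 1.0 - (1.0 - p) ** num_unknown

def misidentified_missing_tags(missing_at_level: float, p: float,
                               num_unknown: float) -> float:
    """Expected number of missing tags masked by unknown tags at one level:
    (missing tags in expected singleton slots) x (hit probability)."""
    return missing_at_level * prob_unknown_hits_slot(p, num_unknown)

# Remaining unknown tag ratio r_ud per required reliability alpha,
# transcribed from the settings stated in the text.
R_UD_BY_ALPHA = {0.90: 0.10, 0.95: 0.05, 0.99: 0.01}

def remaining_unknown_bound(K: int, alpha: float) -> float:
    """Bound on the number of unknown tags left after Phase I, r_ud * K."""
    return R_UD_BY_ALPHA[alpha] * K

if __name__ == "__main__":
    K = 3000
    p0 = 1.0 / (0.95 * K)  # level-0 slot probability with beta = 0.95
    print(prob_unknown_hits_slot(p0, remaining_unknown_bound(K, 0.95)))
```

The hit probability grows with the number of remaining unknown tags, which is why the reliability requirement translates into a bound on $U_d$.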

D. Determination of F d in Phase I
With the required number of remaining unknown tags after Phase I, the number of frames needed to deactivate enough unknown tags can be calculated. Recall that, in the deactivation process of Phase I, an unknown tag deactivates itself when it is assigned to an expected empty slot indicated in $PV$. Thus, in the $i$-th frame of the deactivation process, the number of newly deactivated unknown tags $U_i^*$ is given by (29), where $U_i$ is the number of unknown tags participating in the $i$-th frame and the frame size is $f_i = K$. The initial value is $U_1 = U$. Solving the recursion, the number of remaining unknown tags $U_d$ after $F_d$ frames is given by (30). With the estimated unknown tag number, $F_d$ is obtained by (31). Substituting (28) into (31), the number of frames needed in the deactivation process of Phase I is obtained as (32). In conclusion, as shown in Fig. 4, to determine the number of deactivation frames $F_d$ in Phase I, the reader first executes the estimation process to estimate the number of unknown tags $U_{est}$ with (6). It then calculates $F_d$ with (32) based on the estimated unknown tag number and the reliability requirement.
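The frame-count calculation can be sketched as follows. This Python example is a hedged illustration rather than the paper's exact expression: it assumes that, in every deactivation frame of size $K$, each remaining unknown tag falls into an expected empty slot (and is deactivated) with probability $(1 - 1/K)^K$, so the surviving fraction per frame is constant. The function names are hypothetical.

```python
import math

def deactivation_probability(K: int) -> float:
    """Probability that a slot of a K-slot frame is expected empty,
    i.e. that an unknown tag mapped to it deactivates itself."""
    return (1.0 - 1.0 / K) ** K

def remaining_unknown(U: float, K: int, frames: int) -> float:
    """Expected unknown tags still active after `frames` frames."""
    return U * (1.0 - deactivation_probability(K)) ** frames

def frames_needed(U_est: float, K: int, r_ud: float) -> int:
    """Smallest number of frames that brings the expected number of
    remaining unknown tags down to at most r_ud * K."""
    target = r_ud * K
    if U_est <= target:
        return 0
    survive = 1.0 - deactivation_probability(K)
    return math.ceil(math.log(target / U_est) / math.log(survive))

if __name__ == "__main__":
    K, U_est = 3000, 300  # unknown tag ratio 0.1
    for r_ud in (0.05, 0.01):
        n = frames_needed(U_est, K, r_ud)
        print(r_ud, n, round(remaining_unknown(U_est, K, n), 1))
```

Because roughly $1/e$ of the remaining unknown tags are deactivated in each frame under this model, the number of frames grows only logarithmically with the ratio between the estimated and the allowed number of unknown tags.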

VI. EVALUATION
In this section, we first evaluate the performance of our proposed ETMTI protocol in the deactivation and identification phases separately. Next, the time cost and false negative rate of the overall identification process are given. The results of several best-performing benchmarks are also presented for a comprehensive comparison.

A. Simulation configurations
In the simulation, a typical RFID system that consists of a reader, K known tags, and U unknown tags is considered. In the known tag set, M tags are missing. The reader can retrieve the known tags' information from the backend database but has no prior knowledge about the unknown ones. Similar to previous works [5], [6], [10], each tag has a unique 96-bit ID, and the data rate between the reader and the tags is 62.5 Kbps. Each message transmitted between the reader and the tags is divided into 96-bit segments, and each segment takes $t_{id}$ = 2.4 ms. As in the literature [5], [11], [12], communications between the reader and the tags are assumed to be error-free since an unreliable channel has a similar effect on all the comparative benchmarks.
In the simulation, the performance of our proposed ETMTI protocol is compared with the most related ERMI [12], BMTD [5], CBMTD [6], and CRMTI [11] protocols. ERMI is the most representative missing tag identification protocol that considers the presence of both known and unknown tags. BMTD and CBMTD present the most related unknown tag deactivation methods. CRMTI is the most efficient missing tag identification protocol for situations with only known tags. The simulation is conducted with MATLAB R2019b, and each result is averaged over 100 tests.
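The segment-based timing model above can be written as a small helper. This Python sketch assumes that a partially filled final segment costs a full $t_{id}$, which is an assumption made here for illustration; the names are hypothetical.

```python
import math

SEGMENT_BITS = 96   # every message is split into 96-bit segments
T_ID_MS = 2.4       # time cost of one segment in milliseconds

def message_time_ms(bits: int) -> float:
    """Time to transmit a message of `bits` bits, assuming the last,
    partially filled segment is charged as a whole segment."""
    if bits <= 0:
        return 0.0
    return math.ceil(bits / SEGMENT_BITS) * T_ID_MS

if __name__ == "__main__":
    for bits in (1, 96, 97, 786):
        print(bits, message_time_ms(bits))
```

Under this model, shortening a vector only saves time when it removes at least one whole 96-bit segment.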

B. Time cost of Phase I
In this phase, the reader estimates the number of unknown tags and deactivates them over multiple frames. To deactivate enough unknown tags, the number of frames in this phase is determined with (32). In the simulation, we evaluate the time cost of Phase I in four scenarios:
• $S_{11}$: α = 0.95, $r_u$ = 0.1 and K ∈ [1000, 5000];
• $S_{12}$: α = 0.99, $r_u$ = 0.1 and K ∈ [1000, 5000];
• $S_{13}$: α = 0.95, K = 3000 and $r_u$ ∈ [0.1, 1];
• $S_{14}$: α = 0.99, K = 3000 and $r_u$ ∈ [0.1, 1].
Since missing tags do not affect the deactivation process, the missing tag ratio $r_m$, i.e., the ratio of the number of missing tags to that of known tags, is set to 0. The time cost of ETMTI in Phase I is compared with the most related ERMI, BMTD, and CBMTD protocols, and the comparative results are presented in Figs. 7(a), 7(b), 7(d) and 7(e). It should be noted that the unknown tag number estimation process in BMTD and CBMTD is neglected in the simulation since the estimation method specified in those works is complicated and time-consuming. As shown in Figs. 7(a) and 7(d), the time cost of Phase I increases with the number of known tags. With a fixed unknown tag ratio, the number of unknown tags increases with that of known ones. To meet the required reliability, a longer frame length and more frames are needed to deactivate enough unknown tags in the deactivation process. Comparing the simulation results in Fig. 7(a) with those in Fig. 7(d), we observe that the time cost of Phase I also increases with the reliability requirement. Among the comparative protocols, ETMTI always takes the shortest time to deactivate enough unknown tags, and ERMI takes the longest. Thanks to the early-breaking and bit-tracking response strategies in ETMTI, the time used for unknown tag number estimation is greatly reduced, so ETMTI takes much less time than the other protocols. In contrast, ERMI executes the whole estimation frame and therefore takes more time than ETMTI. Taking advantage of multiple hash functions, BMTD uses Bloom filters to deactivate unknown tags. In BMTD, the number of frames is determined by minimizing the overall identification time, so the performance of the deactivation phase itself is not optimized. As demonstrated in Figs. 7(a) and 7(d), BMTD takes longer than ETMTI but less time than ERMI. To reduce the number of hash functions used in BMTD, CBMTD adopts a compressed method that reduces the time cost of the deactivation process. However, this method does not always work well. In Fig. 7(d), the time cost of BMTD is larger than that of CBMTD when α = 0.99, whereas, as shown in Fig. 7(a), BMTD and CBMTD take almost the same time when α = 0.95.
Next, as demonstrated in Figs. 7(b) and 7(e), the time cost of Phase I increases with the unknown tag ratio. ETMTI takes the shortest time and ERMI takes the longest. ETMTI needs fewer messages to estimate the number of unknown tags, which results in a lower time cost. Moreover, the estimated tag number and the number of frames of the deactivation process in ETMTI are appropriately set. In ERMI, the frame size of the estimation process is set to the number of known tags. With a slot-by-slot reply method, more time is needed to estimate the number of unknown tags. Therefore, ERMI takes longer than ETMTI. In BMTD, only a few frames are used in the deactivation process, but the frame length is set to be very long in order to deactivate more unknown tags in each frame. Thus, it takes more time than ETMTI, especially when the unknown tag ratio is small. With compressed filters, CBMTD takes less time than BMTD in most cases. In general, the proposed ETMTI protocol deactivates unknown tags faster than the other comparative protocols.

C. Time cost of Phase II
In this phase, a missing tag identification protocol is executed to verify the presence of known tags and identify missing ones. We evaluate the time cost of Phase II in two scenarios, $S_{21}$ and $S_{22}$, in which the number of known tags and the missing tag ratio vary, respectively. Since the unknown tags do not affect the time cost of Phase II, $r_u$ is set to 0. The simulation results of ETMTI are compared with the most related ERMI and CRMTI protocols.
As illustrated in Fig. 7(c), the time cost of the missing tag identification protocols increases with the number of known tags. Among the comparative protocols, ETMTI takes the least time to identify all tags, and ERMI takes the most. Besides, as shown in Fig. 7(f), the time costs of the comparative protocols remain unchanged when the missing tag ratio changes. In this phase, the reader has to verify the presence of all known tags, so the identification time is affected only by the number of known tags. With a fixed K, the time cost of Phase II remains unchanged. In both scenarios, ETMTI always takes the least time for missing tag identification in Phase II. The main reasons are as follows.
In ETMTI, a new B-ary tree-splitting method is proposed to split colliding tags into smaller groups in a layered structure. The collision probability decreases as the number of layers increases, which increases the utilization of the indicative vector. In contrast, ERMI and CRMTI adopt the Aloha-based method and repeatedly assign tags at random, so the collision probability is high in each frame. Although CRMTI uses a collision resolving method to increase the utilization of indicative vectors, it still takes longer than ETMTI. Moreover, the tag response strategies used in the comparative protocols also differ. In ERMI, a tag replies with a 1-bit short response in the expected singleton slot. With collision resolving and bit-tracking strategies, CRMTI allows multiple tags to reply simultaneously with customized responses in the expected resolvable collision slot. Thus, the time cost for tag responses in CRMTI is smaller than that in ERMI. By extending the bit-tracking strategy to all slots, ETMTI further reduces the overhead of each slot, so its time cost is smaller than that of the other comparative protocols.

D. Performance of the overall process
In this part, we evaluate the time cost and false negative rate of the overall process in three scenarios, $S_{31}$, $S_{32}$, and $S_{33}$, in which the number of known tags, the missing tag ratio, and the unknown tag ratio vary, respectively. The performance of ETMTI is compared with the most related ERMI and CRMTI protocols, and the comparative results are illustrated in Fig. 8. For ETMTI and ERMI, simulation experiments with α = 0.95 and α = 0.99 are conducted separately in each scenario.
Firstly, as shown in Fig. 8(a), the overall time costs of all protocols increase with the number of known tags. Benefiting from the bit-tracking strategies, ETMTI and CRMTI take much less time than ERMI. When α = 0.95, ETMTI takes less time than the other comparative protocols. When α = 0.99, ETMTI takes slightly more time than CRMTI. As can be observed in Fig. 8(a), the time costs of ETMTI and ERMI increase with the required reliability. With a larger α, more time is needed in Phase I to deactivate enough unknown tags.
Fig. 8(d) shows that the false negative rates of all comparative protocols remain unchanged as the number of known tags varies. As shown, the false negative rates of ETMTI and ERMI decrease as the required reliability increases. When α = 0.95, the false negative rate of ERMI is about 0.03, and that of ETMTI is below 0.02. When α = 0.99, the false negative rate of ERMI is about 0.007, and that of ETMTI is about 0.004. Both ETMTI and ERMI achieve the required reliability, and ETMTI always has a smaller false negative rate than ERMI under the same condition. We can also observe that the false negative rate of CRMTI is almost the same as that of ETMTI (when α = 0.95). In this scenario, the unknown tag ratio is so small that the unknown tags have little effect on the identification process of CRMTI. Thus, its false negative rate is low. Since CRMTI does not deal with unknown tags, the lowest false negative rate it can achieve is around 0.19.
Secondly, Figs. 8(b) and 8(e) present the overall time cost and false negative rate, respectively, when the missing tag ratio varies. As demonstrated in Fig. 8(b), the time costs of the comparative protocols remain unchanged when the missing tag ratio increases. In ERMI and ETMTI, the overall process is affected by the numbers of known and unknown tags, as well as the required reliability. With a larger α, the time costs of ETMTI and ERMI increase since the reader needs more time to deactivate enough unknown tags. CRMTI is affected only by the number of known tags. Therefore, the overall time costs of the comparative protocols do not change with the missing tag ratio. In general, ETMTI (when α = 0.95) takes the shortest time; CRMTI takes longer than ETMTI (when α = 0.95) but slightly less time than ETMTI (when α = 0.99); and ERMI always takes the longest time.
In Fig. 8(e), ETMTI (when α = 0.99) has the lowest false negative rate, and ERMI (when α = 0.95) has the worst performance. We can also observe that CRMTI shows performance similar to that of ETMTI (when α = 0.95), but it takes more time, as shown in Fig. 8(b). When α = 0.99, ERMI shows a slightly higher false negative rate than ETMTI, at a much higher time cost. It should be noted that the false negative rate of ERMI decreases as the missing tag ratio increases. In an expected singleton slot, if the assigned known tag is missing and one or more unknown tags are assigned to this slot, the missing tag is falsely identified as present, resulting in a false negative event. In this scenario, the number of unknown tags is a fixed small value. As the number of missing tags increases, the percentage of falsely identified missing tags decreases, so the false negative rate decreases accordingly.
Thirdly, Figs. 8(c) and 8(f) show the overall time cost and false negative rate, respectively, when the unknown tag ratio changes. As shown in Fig. 8(c), the time cost of CRMTI remains unchanged since it is affected only by the number of known tags. However, in ETMTI and ERMI, as the unknown tag ratio increases, more time is needed to deactivate enough unknown tags in Phase I. Thus, the overall time costs of ETMTI and ERMI increase with the unknown tag ratio. Similarly, their time costs also increase with the required reliability. Moreover, as demonstrated in Fig. 8(c), when the unknown tag ratio is small, ETMTI (when α = 0.95) takes the least time. As the unknown tag ratio increases, ETMTI (when α = 0.95) takes more time than CRMTI.
In Fig. 8(f), ETMTI (when α = 0.99) has a lower false negative rate than the other comparative protocols. ERMI (when α = 0.99) has a higher false negative rate than ETMTI (when α = 0.99). CRMTI and ERMI (when α = 0.95) show the worst performance. We can observe that the false negative rate of ETMTI decreases as the unknown tag ratio increases. With more unknown tags, ETMTI needs more frames to deactivate them in Phase I. This is in accordance with the increasing trend of the overall time cost in Fig. 8(c). The increased number of frames further increases the percentage of deactivated unknown tags, which reduces the number of unknown tags that participate in Phase II. Therefore, the false negative rate of ETMTI decreases as the unknown tag ratio increases.
In ERMI, the false negative rate also decreases as the unknown tag ratio increases, but the decrease is very small. Since the number of unknown tags deactivated by ERMI is not as large as that in ETMTI, the reduction in the false negative rate is not obvious. Note that the fluctuations in ERMI are mainly caused by the inaccurate estimate of the unknown tag number. Without any deactivation strategy, the false negative rate of CRMTI increases with the unknown tag ratio. To sum up, ETMTI exhibits better performance in terms of time cost and false negative rate than the comparative benchmark works.

VII. CONCLUSION
In this work, we proposed an efficient ETMTI protocol to identify missing tags in the presence of unexpected unknown tags in large-scale RFID systems. In ETMTI, two new strategies, EBUD and TSMTI, are developed to effectively deactivate unknown tags and identify missing tags, respectively. With EBUD, the reader can estimate the number of unknown tags within a short time and quickly deactivate enough unknown tags to meet the required reliability. With TSMTI, the colliding tags are split more efficiently into smaller groups, which increases the identification efficiency. Moreover, a bit-tracking response strategy is designed to allow the identification of multiple tags in one slot, which further reduces the time cost. Theoretical analysis is conducted, and the optimal parameters in both EBUD and TSMTI are obtained. Numerous simulation results are presented to demonstrate the effectiveness of ETMTI. In the future, we will consider implementing collision reconciliation and compression techniques to further reduce the time cost of the identification process.

Fig. 4: The logic diagram of performance analysis: firstly, the time cost of Phase I is analyzed and the early-breaking factor γ is determined by balancing the estimation error ϵ and the time cost $T_{est}$; secondly, the time cost $T_2$ and false negative rate $\nabla_{fn}$ of Phase II are analyzed and the optimal frame parameter β and branch number B are obtained; finally, the number of frames in Phase I, $F_d$, is determined to deactivate enough unknown tags based on the estimated unknown tag number $U_{est}$ and the required reliability α.

Fig. 6: Time cost of Phase II when β and B change.

Fig. 7: Time cost: (a) time cost of Phase I in scenario $S_{11}$; (d) time cost of Phase I in scenario $S_{12}$; (b) time cost of Phase I in scenario $S_{13}$; (e) time cost of Phase I in scenario $S_{14}$; (c) time cost of Phase II in scenario $S_{21}$; (f) time cost of Phase II in scenario $S_{22}$.

Fig. 2: Schematic of ETMTI: (1) in Phase I, the reader estimates the number of unknown tags and deactivates them. The number of frames needed in Phase I is determined to meet the required number of remaining unknown tags that participate in Phase II; (2) in Phase II, the reader identifies missing tags with the tree-splitting method. The number of remaining unknown tags allowed to participate in this phase is directly determined by the number of deactivation frames and indirectly determined by the estimation and the required reliability.

TABLE I: Effect of γ on the estimation process

TABLE II: False negative rate of TSMTI when unknown tag ratio varies.
Fig. 8: Performance of the overall process: (a) time cost vs. number of known tags in scenario $S_{31}$; (d) false negative rate vs. number of known tags in scenario $S_{31}$; (b) time cost vs. missing tag ratio in scenario $S_{32}$; (e) false negative rate vs. missing tag ratio in scenario $S_{32}$; (c) time cost vs. unknown tag ratio in scenario $S_{33}$; (f) false negative rate vs. unknown tag ratio in scenario $S_{33}$.