A Novel Prefix Cache with Two-Level Bloom Filters in IP Address Lookup

Abstract: Prefix caching is one of the notable techniques for enhancing the IP address lookup performance, which is crucial in packet forwarding. A cached prefix can match a range of IP addresses, so prefix caching leads to a higher cache hit ratio than IP address caching. However, prefix caching has an issue to be resolved: when a prefix is matched in a cache, the prefix cannot be taken as the result without assuring that there is no longer, not-yet-cached descendant prefix of the matching prefix. This is because the IP address lookup seeks the longest matching prefix. Some prefix expansion techniques avoid the problem, but the expanded prefixes occupy more entries as well as cover a smaller range of IP addresses. This paper proposes a novel prefix caching scheme in which the original prefix can be cached without expansion. In this scheme, a Bloom filter is constructed for each prefix and is used to test whether there is any matchable descendant. The false positive ratio of a Bloom filter generally grows as the number of elements contained in the filter increases. We devise an elaborate two-level Bloom filter scheme which adjusts the filter size at each level, according to the number of contained elements, to reduce the false positive ratio. The experimental results show that the proposed scheme achieves a very low cache miss ratio without increasing the number of prefixes. In addition, most of the filter assertions are negative, which means the proposed prefix cache effectively hits the matching prefix using the filter.

This paper proposes the Bloom filter-based prefix caching scheme (BFPC). While most of the existing prefix caching techniques are based on prefix expansion, BFPC improves the caching technique so that non-leaf prefixes can be cached without prefix expansion. A Bloom filter for each prefix serves as a membership test of whether there exists another, longer matching prefix. A prefix is stored in a cache entry together with its Bloom filter. If some prefix is matched in the cache during cache lookup, its Bloom filter is consulted. A positive assertion of the Bloom filter implies that there is possibly another longer matching prefix, i.e., a cache miss occurs.


Introduction
The Internet has been providing an increasing variety of services at high speed for several decades. Accordingly, the Internet router, which is an important component of the Internet, has also been required to provide higher performance. The router determines the next hop for each incoming packet based on its destination IP address. To make this decision, it performs the IP address lookup, which is to search for the best matching prefix in the forwarding information base (FIB). For a given IP address, there can be multiple matching prefixes in the FIB, and the longest matching prefix (LMP) among them should be selected as the best result. However, the process of finding an LMP is very complex and consumes many cycles. Therefore, many researchers have been working on high-performance IP address lookup [1-11].
Caching is one of the prominent techniques which can be applied to improve the performance of the IP address lookup. The caching techniques in the IP address lookup can be classified into IP address caching [12] and prefix caching [13]. The former uses only temporal locality: it caches the 32-bit IP address and reuses the lookup result when the same address is given. On the other hand, the latter caches a prefix, which represents a network address covering a range of IP addresses, and reuses the result when any address in that range is given. In other words, it utilizes not only temporal locality but also spatial locality.

Prefix Caching and Prefix Expansion
A prefix cache stores the prefixes which are the destination network addresses of the incoming packets. A prefix represents a network address, i.e., a range of IP addresses. Therefore, a prefix caching scheme can effectively exploit one cached prefix to match several IP addresses. In other words, prefix caching exploits spatial locality as well as temporal locality, and thus achieves a higher cache hit ratio than IP address caching. Although prefix caching is more effective, it causes the incomplete caching problem if non-leaf prefixes can be cached. Figure 1 shows an example of the incomplete caching problem that may induce a wrong lookup result when a non-leaf prefix is cached. In this figure, there are a non-leaf prefix p = 1* and two leaf prefixes a = 10010* and b = 100110*. Let us assume, for simplicity, that the length of an IP address is 7 bits instead of 32 and that two packets arrive in turn. If the destination IP address of the first packet is IP1 = 1000000, the LMP becomes p = 1*, and the prefix p is cached for the first time. Supposing that the address of the second packet is IP2 = 1001000, the prefix p is hit in the cache. However, the LMP for IP2 is not p = 1* but a = 10010*, so this cache hit yields an incorrect result. If the second IP address were IP3 = 1001100, it could also hit the prefix p and cause the incomplete caching problem. Now, suppose that the second IP address was IP4 = 1001111. Then, the hit result p is the correct result because the prefix p is the LMP for IP4.

Appl. Sci. 2020, 10, x

Figure 1. Example of the problem when caching the non-leaf prefix p, supposing two consecutive packets arrive. While the destination of the first packet is IP1, that of the second packet is assumed to be one of IP2, IP3, and IP4.
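The scenario above can be reproduced with a few lines. This is a sketch with 7-bit addresses, as in the figure; the function and variable names are illustrative, not the paper's:

```python
def lmp(prefixes, addr, width=7):
    """Longest matching prefix over bit-string prefixes for an integer address."""
    bits = format(addr, f'0{width}b')
    best = None
    for p in prefixes:
        if bits.startswith(p) and (best is None or len(p) > len(best)):
            best = p
    return best

fib = ['1', '10010', '100110']            # p, a, b from Figure 1
cache = [lmp(fib, 0b1000000)]             # IP1's LMP p = '1' is cached

ip2 = format(0b1001000, '07b')
hit = next((p for p in cache if ip2.startswith(p)), None)
print(hit)                  # '1'     -- the cache hits p ...
print(lmp(fib, 0b1001000))  # '10010' -- ... but the true LMP is a: wrong result
```

For IP4 = 1001111, by contrast, `lmp` returns '1', so the cache hit on p would have been correct.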
There are three ways to deal with the incomplete caching problem of non-leaf prefixes. The first way is not to cache non-leaf prefixes but to cache only leaf prefixes. In the example in Figure 1, the LMP p for the first packet should not be cached. This method may limit the performance gain from caching. The second way is to cache a non-leaf prefix together with all of its descendant prefixes at the same time. In this example, when prefix p is cached, its descendant prefixes a and b should also be cached. This method significantly reduces cache efficiency by caching many unnecessary prefixes in advance. The third way is to expand the non-leaf prefix into leaves so that the expanded prefixes do not overlap with other descendant prefixes. This method is called prefix expansion. Most of the previous works make use of this prefix expansion technique [13,14,16-18].

Figure 2 illustrates an example of how to resolve the incomplete caching problem using prefix expansion. The prefix may be statically expanded in the FIB for later caching, or dynamically expanded at the time of caching; this figure explains the dynamic prefix expansion. In Figure 2, the non-leaf prefix p is expanded when it is matched as the longest one in the FIB. For the address IP1, p is the LMP in the FIB and is expanded to p1. As the expanded prefix p1 is cached instead of p, the incomplete caching problem is avoided because p1 has no descendant prefixes. Now, the second packet arrives with IP2 = 1001000, and a cache miss occurs. In this case, the LMP a for the packet is cached without expansion because a is a leaf prefix. If the second IP address were IP4 = 1001111, a cache miss would still occur because the LMP p has not been cached and IP4 does not match the currently cached prefix p1. Note that, since the original LMP p cannot be cached as-is in the prefix expansion-based schemes, the address-spatial locality is not fully exploited. For the IP address IP4, the expanded prefix p2 should be cached even though the LMP of IP4 is the same as that of IP1.

Figure 2. Example of prefix expansion to resolve the problem in Figure 1. When IP1 arrives, the non-leaf prefix p is expanded to p1, which is independent of other prefixes.

Although prefix expansion resolves the prefix caching problem, it restricts the cache performance compared to a scheme which caches the original prefix without expansion. Even though multiple IP addresses match the same non-leaf prefix, each of their expanded prefixes must be cached separately, which results in more capacity misses for the same cache size. In addition, the expanded prefix is more specific, so it exploits less address-spatial locality than the original prefix. For example, consider the IP addresses IP1 and IP4. If the original LMP p were cached on the IP1 lookup, it could be reused on the IP4 lookup. However, the more specific prefix p1 is cached instead, and that prefix cannot be used for IP4. Furthermore, static prefix expansion has the additional disadvantage of increasing the size of the FIB besides the above shortcomings.

Related Works
Several researchers have explored prefix caching schemes to enable fast IP address lookup [13,14,16-22]. Liu proposed three prefix expansion schemes: no prefix expansion (NPE), partial prefix tree expansion (PPTE), and complete prefix tree expansion (CPTE) [13]. In NPE, only leaf prefixes can be cached, so no prefix expansion is required for the non-leaf prefixes. If a non-leaf prefix is matched in the FIB, the 32-bit IP address is cached instead. In PPTE, a non-leaf prefix is expanded one bit longer only if the expanded prefix does not overlap with other prefixes. Thus, there may still exist some non-leaf prefixes after the expansion by PPTE; if a non-leaf prefix is matched in the FIB, the IP address is cached as in NPE. In CPTE, every non-leaf prefix is completely expanded so that no expanded prefix overlaps with its descendants. As a result, all prefixes become leaves after the expansion by CPTE, and the degree of each node is either 0 or 2. Both PPTE and CPTE statically expand non-leaf prefixes in the FIB, so they increase the size of the FIB. Moreover, updating expanded prefixes is quite complicated.
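As an illustration of CPTE (a sketch via leaf pushing on a binary trie, not Liu's implementation), complete expansion turns every prefix into a set of disjoint leaves while remembering which original prefix each leaf inherits its next hop from:

```python
def cpte(prefixes):
    """Complete prefix tree expansion: map leaf-only prefixes to the
    original prefix whose next hop they inherit."""
    trie = {}
    for p in prefixes:                      # build a binary trie of the prefixes
        node = trie
        for bit in p:
            node = node.setdefault(bit, {})
        node['$'] = p                       # mark a stored prefix at this node
    out = {}
    def walk(node, path, inherited):
        owner = node.get('$', inherited)    # nearest covering prefix so far
        kids = [b for b in ('0', '1') if b in node]
        if not kids:
            if owner is not None:
                out[path] = owner           # a leaf: emit it
            return
        for b in ('0', '1'):
            if b in node:
                walk(node[b], path + b, owner)
            elif owner is not None:
                out[path + b] = owner       # push the covering prefix down
    walk(trie, '', None)
    return out

print(sorted(cpte(['1', '10010', '100110'])))
# ['1000', '10010', '100110', '100111', '101', '11']
```

For the Figure 1 prefixes, the non-leaf p = 1* is replaced by the disjoint leaves 11, 101, 1000, and 100111, so every address has exactly one matching (leaf) prefix.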
Akhbarizadeh and Nourani proposed a prefix caching technique called reverse routing cache (RRC) to expand non-leaf prefixes on caching without modifying the original FIB [16]. They presented two ways to handle non-leaf prefixes: RRC with parental restrictions (RRC-PR) and RRC with minimal expansion (RRC-ME). RRC-PR caches only leaf prefixes similar to NPE. RRC-ME expands a non-leaf prefix when it needs to cache the prefix. Thus, the size of the FIB is not increased. In RRC-ME, a non-leaf LMP is expanded on the fly to be cached. However, it requires additional processing time for expansion in proportion to the expansion length. A bitmap-based prefix cache (BMcache) [17] was presented to dynamically expand non-leaf prefixes using a bitmap on a caching process with low overhead in determining the expansion length.
Multi-zone caching splits a cache into multiple sections, often depending on the length of the matching prefix, for performance improvement. The short prefix expansion (SPE) was introduced to expand only prefixes shorter than 17 bits, which decreases the overhead of prefix expansion [18]. The multi-zone pipelined cache (MPC) was also developed to extend the SPE scheme into a multi-zone caching technique. In MPC, IP address caching is used for long prefixes, while prefix caching is used for short prefixes.
Zhang et al. developed a technique to find and cache the most popular prefixes based on traffic prediction [19]. When caching the prefix, there is an additional burden of caching all its descendant prefixes to ensure the correct result. Liu et al. introduced an FIB caching method to cache the popular prefixes into a limited size of ternary content addressable memory (TCAM) [20]. To resolve the incomplete caching problem, it caches a leaf prefix or a non-leaf prefix in the form of prefix expansion. The combined FIB caching and aggregation (CFCA) [21] scheme was presented to use TCAM-based cache to store frequently used prefixes. After expanding the non-leaf prefixes, it aggregates FIB prefixes according to the next-hop values. Rottenstreich and Tapolcai presented a cached classification method that applies lossy compression to the packet classifier [22]. Many packets are processed according to a small number of popular prefix rules in a fast lossy classifier, and unclassified packets are processed in a conventional exact classifier.
Recently, some route prefix caching schemes [23,24] have been studied for fast name lookup in named data networking (NDN), one of the representative next-generation Internet architectures. These techniques also use prefix expansion to cache non-leaf prefixes without the incomplete caching problem.

Bloom Filter-Based Prefix Caching
Prefix expansion limits the lookup performance due to the increase in the number of entries caused by expansion, as well as the limited exploitation of locality by the expanded, more specific prefixes. If it were possible to know whether the current matching prefix in a cache has a matchable child prefix in the FIB, the prefix cache could be utilized more effectively by eliminating prefix expansion. Our scheme employs a Bloom filter to check the existence of child prefixes for each prefix.
A Bloom filter is widely used to check membership and is represented by an m-bit array for a set S of n elements. All bits in the filter are initialized to 0. When some element x ∈ S is inserted into the filter, the bits hi(x), 1 ≤ i ≤ k, are set to 1, where h1, ..., hk are k independent hash functions. The membership of y is then tested by checking whether hi(y), 1 ≤ i ≤ k, are all 1s. If so, y is probably an element of S (positive); otherwise, y is clearly not an element of S (negative). Unfortunately, since hash functions can cause collisions, the filter can induce false positives with some probability. A false positive means that y is asserted to be an element of the target set although it is not. The false positive probability must be kept low to increase the performance of a system employing a Bloom filter.
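As a concrete illustration (not the paper's hardware filter), a minimal m-bit Bloom filter can be sketched with k salted hash functions; the SHA-256-based hashing is an assumption of the sketch:

```python
import hashlib

class BloomFilter:
    """Minimal m-bit Bloom filter with k salted SHA-256 hash functions."""
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, 0

    def _hashes(self, item):
        for i in range(self.k):            # salt each hash with its index i
            h = hashlib.sha256(f'{i}:{item}'.encode()).digest()
            yield int.from_bytes(h, 'big') % self.m

    def add(self, item):
        for pos in self._hashes(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # All k bits set -> probable member; any bit clear -> certain non-member.
        return all(self.bits >> pos & 1 for pos in self._hashes(item))

bf = BloomFilter(m=16, k=2)
for child in ('00100', '00101', '00110'):
    bf.add(child)
print('00100' in bf)   # True (a genuine member always tests positive)
```

A negative answer is always reliable; a positive one may be false, which is exactly the case the two-level design later minimizes.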
In our prefix caching, when an LMP is found in the FIB, the LMP is cached together with the corresponding Bloom filter which represents the existence of its child prefixes. Each element in a Bloom filter is identified by a bit string which represents the path from a prefix to its child.
The prefix caching scheme using a single Bloom filter is illustrated in Figure 3. Because the LMP for the first IP address IP1 = 1000000 is p = 1*, the prefix p is cached with the corresponding Bloom filter BF, as shown in the figure. In this example, the upper 5-bit string of the path from the prefix p to its child is used to represent a member of the BF. The child prefix length may differ between children; we choose the upper bits to represent each child member since those can be extracted without knowing the length. If the distance from a prefix to its child prefix is less than 5, then the child prefix is expanded to distance 5. For example, since the distance from the prefix p = 1* to its child prefix a = 10010* is 4, the bit strings 00100 and 00101, which are the paths from prefix p to a0 and a1, respectively, are used as members of the BF. Note that, in our scheme, only the child prefix is expanded in length to become a member; the original prefix p itself is not expanded, unlike in other prefix expansion schemes. Because the bit string from the prefix p = 1* to the prefix b = 100110* is the 5-bit 00110, the whole set becomes {00100, 00101, 00110}, which corresponds to the prefixes a0, a1, and b, respectively.

In addition, the Bloom filter in this example uses two hash functions: h1(c0c1c2c3c4) = c0c1c2 and h2(c0c1c2c3c4) = c2c3c4, where ci denotes the ith bit of an element. To construct the BF, h1(a0), h2(a0), h1(a1), h2(a1), h1(b), and h2(b) are calculated. For example, h1(a0) = h1(00100) = 001 and h2(a0) = h2(00100) = 100. As a result, the 1st, 4th, 5th, and 6th bits of the BF are set to 1, while the other bits remain 0s.

Figure 3. Prefix caching using a single Bloom filter with two hash functions.
In Figure 3, the Bloom filter BF for the prefix p is consulted when p is matched in the cache. Suppose that the destination IP address of the second packet is IP2 = 1001000. It matches p = 1* in the cache, but our scheme ignores this match and regards it as a cache miss based on the BF test result. For the Bloom filter test of IP2, the suffix after the prefix p is used as the input of the hash functions. Two hash values for IP2, h1(001000) = 001 and h2(001000) = 100, are obtained, as shown in the figure. Since both BF[1] and BF[4] are 1s, it is determined that there is possibly a child prefix in the FIB which IP2 can match. Now, assume that the destination IP address of the second packet is IP4 = 1001111. In this case, the same matching result p = 1* is determined to be the correct hit result. We obtain two hash values, h1(001111) = 001 and h2(001111) = 111. Since BF[1] = 1 but BF[7] = 0, there is certainly no child prefix of p which can match the IP address IP4.
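The worked example can be replayed directly. This sketch mirrors the figure's toy 8-bit filter and its two hash functions (h1 = c0c1c2, h2 = c2c3c4):

```python
def h1(s): return int(s[0:3], 2)    # first three bits of the element
def h2(s): return int(s[2:5], 2)    # bits c2..c4 of the element

bf = [0] * 8
for member in ('00100', '00101', '00110'):   # a0, a1, b
    bf[h1(member)] = 1
    bf[h2(member)] = 1
print([i for i, v in enumerate(bf) if v])    # [1, 4, 5, 6]

def has_child(suffix):
    """Positive (True) means a possible longer match, i.e., a cache miss."""
    return bool(bf[h1(suffix)] and bf[h2(suffix)])

print(has_child('001000'))   # True  -> IP2 is treated as a cache miss
print(has_child('001111'))   # False -> IP4 safely hits the cached prefix p
```

The suffixes 001000 and 001111 are the bits of IP2 and IP4 after the cached prefix p = 1*.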

A Two-Level Bloom Filter-Based Prefix Cache
For the matching prefix in a cache, the existence of a child prefix which matches a given incoming IP address can be tested with a Bloom filter. However, the Bloom filter can misleadingly assert a positive, i.e., it reports the existence of a matching child even though there is none. In turn, this raises a cache miss although the matching prefix is the correct result. The false positive probability is derived in [25]. For the filter size m, the number of hash functions k, and the number of elements n in the filter, it can be approximated by f ≈ (1 − e^(−kn/m))^k. The probability increases as n grows. If the number of children for a matching prefix is large, the probability that the Bloom filter falsely asserts a positive is also high. For example, when we have a 16-bit filter and two hash functions, the false positive probability would be about 0.05 for n = 2, but 0.53 for n = 10.
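The quoted numbers can be checked directly. The exponential form is the standard approximation; the exact expression (assuming independent uniform hashes) reproduces the 0.53 figure for n = 10:

```python
import math

def false_positive_rate(m, k, n):
    """Standard Bloom filter approximation: (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k

def exact_fp(m, k, n):
    """Exact expression (1 - (1 - 1/m)^(kn))^k for independent hashes."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k

print(round(false_positive_rate(16, 2, 2), 2))   # 0.05
print(round(exact_fp(16, 2, 10), 2))             # 0.53
```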
For the best membership testing, the number of elements to be contained in a filter should be controlled. However, for a given matching prefix, the number of children cannot be changed. We propose a two-level Bloom filter that controls the number of elements to be contained. In the scheme, the first level is a false-positive free filter, and the second level is a filter in which the number of elements is restricted according to the filter size.
There is a set of hash functions that makes a Bloom filter false-positive free if the number of elements n is bounded by some value [26,27]. In addition, if the filter size m is equal to the size of the universe, i.e., the set of all possible elements, the Bloom filter can be false-positive free by one-to-one correspondence. For example, there are at most eight children at distance 3, so an 8-bit filter can be false-positive free up to distance 3.
In our two-level Bloom filter, the first-level filter, BF1, provides a membership test up to distance d1, and the second-level filter, BF2, up to distance d2. BF1 has a size of 2^d1 to be false-positive free. However, the value of d1 is restrictive because the filter size grows by powers of two. In our scheme, the upper bound of d1 is 4, i.e., the size of BF1 can be up to 16 bits. The value of d2 is fixed to 5, but the size of BF2 is not 32 bits, so BF2 is not false-positive free. The size of BF2 is determined according to the number of elements to be contained so that the false positive ratio stays low. The total filter size is fixed, so the size of BF1 is obtained by subtracting the size of BF2 from the total size. Note that the scheme supports a membership test up to distance 5 while using a total filter size of only 16 bits.
The BF1 provides a membership test without false positives up to distance d1, whereas the BF2 covers the greater distance for a limited number of elements. Figure 4 illustrates an example of a two-level Bloom filter for a prefix p. In this example, the sizes of BF1 and BF2 are 8 bits each, i.e., d1 = 3. In Figure 4a, the prefix p has four children, {a, b, c, d}. The children {c, d} are contained in BF2 because their distance is longer than d1. The size of BF2 is determined as 8 bits because the number of elements to be contained is less than a constraint (three in this example). The children {a, b} are contained in BF1 because their distances from p are not longer than d1. In the first level, there are two children {a, b}, but a is expanded to the distance d1. Thus, five elements {b, a0, a1, a2, a3} are contained in BF1. For the BF1, a simple hash function h0 is used, which outputs three consecutive bits after p. As a result, BF1[3:7] are set to 1. In the second level, two hash functions are used, which output the first and the last three bits of the 5-bit path after p, respectively. BF2[1], BF2[5], and BF2[6] are set to 1 as the result of the hash functions.

Supposing that the size of an IP address is 7 bits, for an incoming IP address k = 1001110, k matches p if p = 1* is in the cache. Since h0(k) = 1 and BF1[1] is 0, BF1 asserts a negative for k. If BF1 asserted a positive, it would raise a cache miss. In case the BF1 asserts a negative, BF2 should be consulted. Since h1(k) = 1 and h2(k) = 7, both BF2[1] and BF2[7] should be checked. BF2[7] is not 1, so the result is a negative and it raises a cache hit with the matching prefix p.

Figure 5 shows the structure of the two-level Bloom filter-based prefix cache (BFPC). Each prefix is stored in an associative memory, e.g., TCAM, while other information such as d1, the two-level Bloom filter, and the port number is stored in a conventional SRAM. When an IP address matches some prefix, the corresponding Bloom filter is extracted and consulted. If the Bloom filter asserts a negative, it raises a cache hit. Although prefix matching in the TCAM takes more time than an SRAM access, searching is performed in parallel. The search time depends on the number of entries in the TCAM; in a prefix cache, the number of entries is small compared to the FIB, so the search time is not significant.
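The two-level test for the example above can be sketched as follows. The filter contents are taken from the worked example; the hash definitions are assumptions consistent with it (h0 reads the first d1 = 3 bits of the suffix, and the second-level hashes read the first and last three bits of the 5-bit path after p, which matches h2(k) = 7):

```python
D1, D2 = 3, 5

# Filter contents from the Figure 4 example: BF1[3:7] = 1; BF2 bits 1, 5, 6 = 1.
bf1 = [1 if 3 <= i <= 6 else 0 for i in range(8)]
bf2 = [1 if i in (1, 5, 6) else 0 for i in range(8)]

def two_level_test(suffix):
    """True (positive: possible longer match) or False (negative: safe hit)."""
    h0 = int(suffix[:D1], 2)            # first level: first d1 bits of the suffix
    if bf1[h0]:
        return True                     # the false-positive-free level fires
    path = suffix[:D2]                  # second level: the 5-bit path after p
    h1, h2 = int(path[:3], 2), int(path[2:], 2)
    return bool(bf2[h1] and bf2[h2])

# k = 1001110 matches p = 1*; the suffix after p is 001110.
print(two_level_test('001110'))   # False -> cache hit on p
```

A suffix starting with 011, by contrast, would hit BF1[3] and be declared a (certain) positive at the first level.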
The BFPC can also be applied to IPv6 with 128-bit addresses. For IPv6, the TCAM should be extended in width to accommodate 64-bit prefixes, but the three fields in SRAM basically remain the same length. The two-level Bloom filter size may change if needed, depending on the prefix distribution in the IPv6 forwarding table.

Figure 6 shows two algorithms for our two-level Bloom filter-based prefix cache: cache lookup and how to handle a cache miss. During the lookup process, the BFPC tries to find the longest matching prefix for a given IP address ip. There are two types of cache misses: TYPE-1 and TYPE-2. If there is no matching prefix, a TYPE-1 cache miss is raised. Otherwise, the two-level Bloom filter of the matching prefix p is consulted with ip. If it asserts a positive, i.e., there is probably some more specific prefix in the FIB, a TYPE-2 cache miss is raised. Note that, despite the TYPE-2 cache miss, the matching prefix p could still be the LMP if the positive was false. If the Bloom filter asserts a negative, it is certainly a cache hit and the matching prefix p is the LMP.
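The lookup flow can be sketched as follows. `cache_lookup` and `StubFilter` are hypothetical names, and the Bloom filter is stubbed with a fixed answer so that both miss types can be demonstrated:

```python
class StubFilter:
    """Stand-in for a two-level Bloom filter with a fixed assertion."""
    def __init__(self, positive):
        self.positive = positive
    def test(self, suffix):
        return self.positive    # True = positive (possible longer match)

def cache_lookup(cache, ip):
    """BFPC lookup sketch: find the LMP in the cache, then consult its filter."""
    match = None
    for prefix, bloom in cache.items():
        if ip.startswith(prefix) and (match is None or len(prefix) > len(match[0])):
            match = (prefix, bloom)
    if match is None:
        return 'TYPE-1 MISS'                 # no matching prefix in the cache
    prefix, bloom = match
    if bloom.test(ip[len(prefix):]):
        return 'TYPE-2 MISS'                 # possibly a longer prefix in the FIB
    return ('HIT', prefix)                   # negative: prefix is certainly the LMP

cache = {'1': StubFilter(False), '10010': StubFilter(True)}
print(cache_lookup(cache, '0000000'))   # TYPE-1 MISS
print(cache_lookup(cache, '1001000'))   # TYPE-2 MISS (LMP '10010' asserts positive)
print(cache_lookup(cache, '1111111'))   # ('HIT', '1')
```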
If a cache miss occurs for a given IP address ip, the LMP should be searched for in the FIB. The LMP and its associated Bloom filter are then inserted into the prefix cache. Before the LMP, say p, is inserted into the prefix cache, the Bloom filter for p should be consulted. If the filter asserts a positive for ip, it implies that a cache hit would never occur for ip even if p were inserted into the cache. In that case, p should be expanded to a 32-bit address, i.e., ip itself should be cached.
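The miss-handling rule can be sketched in the same style; `fib_lookup` and `StubFilter` are hypothetical stand-ins for the FIB search and the prefix's Bloom filter:

```python
class StubFilter:
    """Stand-in Bloom filter with a fixed assertion."""
    def __init__(self, positive):
        self.positive = positive
    def test(self, suffix):
        return self.positive

def handle_miss(cache, fib_lookup, ip):
    """Insert the FIB's LMP into the cache, or ip itself on a positive filter."""
    prefix, bloom = fib_lookup(ip)        # LMP search in the FIB
    if bloom.test(ip[len(prefix):]):
        # A positive means caching `prefix` could never yield a hit for ip,
        # so the full-length address is cached instead.
        cache[ip] = StubFilter(False)
    else:
        cache[prefix] = bloom

cache = {}
handle_miss(cache, lambda ip: ('1', StubFilter(True)), '1001000')
print(list(cache))   # ['1001000'] -- the address itself was cached
```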

Figure 6. The algorithms for BFPC: (a) cache lookup; and (b) processing a cache miss.

Construction of a Two-Level Bloom Filter
Although the total size of a two-level Bloom filter is fixed at 16 bits, the size of each level can be adjusted. Each filter size is determined by the following rule. First, BF2 can exist only if there is some child at a distance greater than or equal to 5. Second, the number of elements contained in BF2 is restricted by a constraint, which depends on the size of BF2. Third, the size of BF2 can be increased as long as the corresponding constraint is satisfied. Fourth, if BF2 cannot satisfy any constraint, then only BF1 is used. Figure 7 depicts the algorithm that determines the size of each filter, using the following notation: x represents the gap distance between BF1 and BF2, and cx represents the constraint for a given x. The BF1 size m1 and the BF2 size m2 are determined according to whether the constraint cx is satisfied. The maximal set size for BF2, i.e., the number of elements in BF2, is bounded by cx. BF2 is used only if there is at least one element at a distance d2 and the number of contained elements is less than or equal to cx. The BF2 size depends on the satisfied cx, as shown in the table of Figure 7. For example, if x = 1 and the number of elements contained in BF2 is less than or equal to c1 (= 3), then d1 is 2 and the BF2 size m2 is 12 (= 16 − 2^d1). If x = 1 and the constraint were not satisfied (> c1), then only BF1 would be used, without BF2. The constraint table is designed so that the BF2 size m2 increases as cx grows.
Figure 8 shows examples of the various filter sizes m1 and m2, which are determined according to the positions of children and the satisfied constraint cx. Figure 8a-d shows the cases of x = 0, 1, 2, or 3, and Figure 8e,f shows the cases in which BF2 is not used. In Figure 8a, since x is 0 and BF2 contains two elements (prefixes d and e), a number less than or equal to c0, m1 and m2 are each determined as 8 bits. On the other hand, BF2 can accommodate more elements (prefixes c, d, and e) in Figure 8b, because x is 1 and the number of elements in BF2 is less than or equal to c1. In this case, m2 is determined as 12 bits. Note that the number of elements contained in BF2 is the sum of twice the number of prefixes at distance 4 and the number of prefixes at distance 5; in this case, all prefixes are at distance 5. In another example, Figure 8d, four prefixes are contained in BF2, but the total number of contained elements is 5, because the prefix d is expanded to two elements.
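The size-determination rule above can be sketched in a few lines of Python. This is a minimal reconstruction, not the paper's exact algorithm: only c1 = 3 and the overall cap of 5 elements are stated explicitly in the text, so the remaining constraint values and the (m1, m2) pairs in the table below are assumptions chosen to match the examples of Figures 7 and 8.

```python
# Hypothetical reconstruction of the Figure 7 sizing rule.
# x -> (constraint c_x on the BF2 element count, m1, m2).
# Only c_1 = 3 and the cap of 5 elements are given in the text;
# the other table entries are illustrative assumptions.
CONSTRAINTS = {
    0: (2, 8, 8),
    1: (3, 4, 12),
    2: (4, 2, 14),
    3: (5, 0, 16),
}

TOTAL_BITS = 16  # fixed two-level budget: m1 + m2 == 16 when BF2 is used


def filter_sizes(x, n2):
    """Pick (m1, m2) for gap distance x and n2 elements destined for BF2.

    Falls back to a single 16-bit BF1 (m2 = 0) when there is no element
    for BF2 or the constraint c_x is not satisfied."""
    if n2 > 0 and x in CONSTRAINTS:
        c_x, m1, m2 = CONSTRAINTS[x]
        if n2 <= c_x:
            return m1, m2
    return TOTAL_BITS, 0  # BF1 only
```

For instance, with x = 1 and three BF2 elements the constraint c1 holds and the function returns (4, 12), matching the worked example; with four elements the constraint fails and only a 16-bit BF1 is used.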


Evaluation
The performance of BFPC was evaluated using a real-world prefix table and two generated traces. The BGP table captured on 18 August 2020 at 04:00 was obtained from [28] and converted into an FIB with the same prefixes. The BGP table has recently grown by 7-9% per year [29], reaching 817,583 entries. In publicly available packet traces, IP addresses are sanitized for privacy reasons, so they will not match the prefixes in a real-world FIB. Thus, we used randomly generated traces that match the real-world prefixes and reflect the locality of references.
The characteristics of the FIB are shown in Table 1, together with the FIBs of the three compared schemes. In CPTE, all non-leaf prefixes are pushed down and expanded until they become leaf prefixes; thus, there are only leaf prefixes, and the FIB is much bigger than the original one (988,828 entries in total). In PPTE, the FIB size is slightly increased. In NPE and BFPC, the table size does not change. The average prefix length of CPTE is greater than that of any other scheme, while the prefix length of NPE is the greatest on average when prefixes are cached: NPE cannot store non-leaf prefixes in a cache, so they must be expanded to 32 bits when they are cached. BFPC can utilize a prefix in a cache without expansion, regardless of whether it is a leaf or not. Thus, the average prefix length of BFPC is the same as the original one, and BFPC also has the advantage that its FIB size is the same as the original one. On the other hand, CPTE cannot utilize non-leaf prefixes in the cache, even though its average prefix length in a cache is slightly greater than that of BFPC.
In Figure 9, the cache miss ratios of the five schemes are compared using the two traces. CPTE shows the lowest miss ratio, but its number of table entries is larger than in any other scheme, as shown in Table 1. In addition, the FIB update overhead can be quite high in CPTE; for instance, a non-leaf prefix can be expanded into many prefixes, and those expanded prefixes must all be removed when the original prefix is removed. BFPC shows a miss ratio close to that of CPTE, with the same number of table entries as the original FIB. The miss ratio of BFPC is merely 0.003 in trace-1 and 0.027 in trace-2, in a cache with 8192 entries. The miss ratio of PPTE is not very good, even though its table size is bigger than that of BFPC. The IPCache scheme does not cache prefixes; instead, it caches the incoming 32-bit IP addresses.
The poor miss ratio of IPCache results from a lack of exploiting address-spatial locality and the occupancy of many entries by caching individual IP addresses.
Unlike prefix expansion-based caching schemes, BFPC caches an original prefix without expansion, unless its Bloom filter asserts a positive for a given IP address. The cache miss reasons of BFPC are classified into two categories: TYPE-1 and TYPE-2. TYPE-1 cache misses are conventional and result from the absence of a matching prefix in the cache. Even when there is a matching prefix in the cache, a cache miss can still arise if a matching child prefix may exist but is not cached yet. In such cases, BF1 and BF2 play the role of testing whether some matching child prefix exists in the FIB. If BF1 or BF2 asserts a positive, then a TYPE-2 cache miss arises. Figure 10 shows the distribution of cache miss reasons. Cache misses mainly arise due to TYPE-1, which accounts for 75.0-96.4% of cache misses in trace-1 and 80.4-97.3% in trace-2.
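The TYPE-1/TYPE-2 classification can be sketched as follows. This is an illustrative simplification, not the paper's exact implementation: the bit-integer filter representation, the hash choice (SHA-256 with k = 2), and the fixed 4-bit probe key are all assumptions, and the two-level probing of BF1/BF2 is collapsed into a single filter query.

```python
import hashlib

M = 16  # total per-entry filter budget in bits, as in the paper


def _hash(key, i, m):
    # derive the i-th bit position for `key` in an m-bit filter
    digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
    return int(digest, 16) % m


def bf_add(bits, key, m=M, k=2):
    """Insert `key` by setting k hashed bit positions; returns the new bits."""
    for i in range(k):
        bits |= 1 << _hash(key, i, m)
    return bits


def bf_positive(bits, key, m=M, k=2):
    """Membership test: positive iff every hashed position is set.
    False positives are possible; false negatives are not."""
    return all((bits >> _hash(key, i, m)) & 1 for i in range(k))


def bfpc_lookup(cache, ip_bits):
    """Classify one lookup against a cache of bit-string prefixes.

    Returns ('hit', next_hop), ('miss', 'TYPE-1') when no cached prefix
    matches, or ('miss', 'TYPE-2') when the matched entry's filter asserts
    that a longer descendant may exist in the FIB."""
    for plen in range(len(ip_bits), 0, -1):  # longest match first
        entry = cache.get(ip_bits[:plen])
        if entry is None:
            continue
        next_hop, bits = entry
        # probe the filter with the address bits beyond the matched prefix
        probe = ip_bits[plen:plen + 4]
        if probe and bf_positive(bits, probe):
            return ('miss', 'TYPE-2')  # fall back to a full FIB lookup
        return ('hit', next_hop)
    return ('miss', 'TYPE-1')
```

A cached leaf prefix carries an empty filter (all bits zero), so every probe against it is negative and the entry hits directly; a cached non-leaf prefix whose descendant extension was inserted into the filter yields a TYPE-2 miss for addresses covered by that descendant.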
The cache misses due to a BF1 positive are the next most common, but relatively few. The relative portion of TYPE-2 misses increases with the cache size. The reason is that the total number of cache misses decreases as the cache size increases, and "unmatched" (TYPE-1) misses in particular are reduced relatively more.
Figure 11 shows the ratio at which a Bloom filter asserts a negative. The negative ratio of a Bloom filter can be measured as the cache hit count over the prefix matching count in a cache. Remember that a cache hit arises only when there is a matching prefix in the cache and its Bloom filter asserts a negative. For every cache size, the negative ratio is higher than 0.986, which means that in more than 98.6% of cases the Bloom filter assures that the matching prefix is the hit result. In Figure 11, the negative ratio slightly increases as the cache size increases.
In BFPC, each Bloom filter size is variable, but the total Bloom filter size is fixed at 16 bits. The size of the first-level filter, m1, can be 0, 2, 4, 8, or 16. Figure 12 shows the first-level filter size over the prefix length. The most common cases are m1 = 16 bits, in which only BF1 is used; the next most common are m1 = 0, in which only BF2 is used. Generally, the number of non-leaf prefixes having a specific BF1 size grows as the prefix length increases.
Figure 13 shows the number of non-leaf prefixes over the number of elements contained in their level filters. Since the BF1 size is at most 16 bits, the number of contained elements is also at most 16 in the figure. For BF1, there is no general trend, other than that the number of non-leaf prefixes peaks at powers of two. On the other hand, the number of non-leaf prefixes sharply decreases as the number of elements contained in BF2 increases. The number of elements contained in BF2 is at most 5, since it is restricted to that value. Note that the effectiveness of BF2 depends on the number of contained elements, whereas that of BF1 is independent of it.
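The element count that the constraint cx bounds can be computed from the distances of a prefix's children. The helper below is a hedged generalization of the rule stated for the x = 1 example (a distance-4 child is expanded into two distance-5 elements); the function name and the general 2^(d2 − d) expansion factor are assumptions.

```python
def bf2_element_count(child_distances, d2=5):
    """Number of elements BF2 must hold when it stores extensions of
    length d2: a child at distance d < d2 is expanded into 2**(d2 - d)
    elements (e.g., a distance-4 child contributes two distance-5
    elements), while a child at distance d2 contributes one."""
    return sum(2 ** (d2 - d) for d in child_distances if d <= d2)
```

This reproduces the Figure 8d example: four child prefixes at distances [4, 5, 5, 5] yield 5 contained elements, which is exactly the cap on the BF2 element count.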


Conclusions
IP address lookup is a crucial function in packet forwarding, and prefix caching is one of the techniques that effectively enhance lookup performance. In general, a prefix cache achieves better performance than an IP cache: the prefix cache exploits address-spatial locality as well as temporal locality, whereas the IP cache can exploit only temporal locality. However, caching a non-leaf prefix naively can produce an incorrect result, because another longer matching prefix may exist in the FIB when the non-leaf prefix is matched in the cache. To resolve the problem, either all the descendants should be cached together with the non-leaf prefix, or it should be assured that all cacheable prefixes are disjoint from each other. Prefix expansion is one technique that obtains a set of cacheable disjoint prefixes. However, prefix expansion increases the FIB size and makes the expanded prefixes more specific, so each covers a smaller range of addresses. Thus, the exploitable spatial locality is reduced in those schemes.
This paper proposes the Bloom filter-based prefix caching scheme (BFPC). While most existing prefix caching techniques are based on prefix expansion, BFPC improves the caching technique so that non-leaf prefixes can be cached without prefix expansion. A Bloom filter for each prefix serves as a membership test for whether another longer matching prefix exists. A prefix is stored in a cache entry together with its Bloom filter. If some prefix is matched in the cache during cache lookup, its Bloom filter is consulted; a positive assertion implies that another longer matching prefix possibly exists, i.e., a cache miss occurs.
Generally, the Bloom filter size is small relative to the number of all possible containable elements, and the probability of a false positive increases as the number of contained elements grows. Our design goal is to reduce false positives by using a two-level Bloom filter. The two-level scheme adjusts the size of each level filter according to the number of elements to be contained in the second-level filter: the second-level filter accommodates a limited number of child prefixes at a long distance to keep false positives low, while the first-level filter accommodates the maximum number of child prefixes at a short distance.
The experimental results show that BFPC achieves a very low cache miss ratio without increasing the number of FIB entries. The cache miss reasons were also examined: most cache misses are due to the absence of a matching prefix, and few are due to positive assertions of the Bloom filters. In addition, the membership tests of the Bloom filters yield negative assertions in most cases, which implies that, when original prefixes are cached without expansion in BFPC, they are effectively hit using the descendant test.