Applied Sciences
  • Article
  • Open Access

15 September 2023

Cuckoo Bloom Hybrid Filter: Algorithm and Hardware Architecture for High Performance Satellite Internet Protocol Route Lookup

1
College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, China
2
Key Laboratory of Electronic Information Control, Chengdu 610036, China
*
Author to whom correspondence should be addressed.
This article belongs to the Section Electrical, Electronics and Communications Engineering

Abstract

The next-generation satellite Internet Protocol (IP) router is required to achieve tens of millions of route lookups per second, since satellite Internet services based on low Earth orbit (LEO) constellations have become a reality. Due to the limitation of hardware resources on satellites and the high reliability requirements for equipment, a new satellite IP route lookup architecture is proposed in this paper. The proposed architecture uses a Bloom and cuckoo filter-based structure called cuckoo Bloom hybrid filter (CBHF), which guarantees only one off-chip memory access per lookup, to accelerate the Prefix-Route Trie (PR-Trie) algorithm. The proposed architecture has been evaluated through both a behavioral simulation in C++ language and a hardware implementation in Verilog hardware description language (HDL). Our simulation and implementation results show that the proposed satellite IP route lookup architecture can achieve a single-port throughput beyond 13 Gbps on a field programmable gate array (FPGA) board with a single DDR3 memory chip when operating at 200 MHz. In addition, the resource utilization in the FPGA shows that the proposed architecture also supports triple modular redundancy (TMR) to enhance reliability.

1. Introduction

Nowadays, networks are inextricably interwoven with human social activities, which simultaneously presents unprecedented challenges to the future development of networks. Satellite–terrestrial networks (STNs), namely the global heterogeneous networks based on terrestrial networks and extended on space networks, use internet technologies to realize the connectivity among internet networks, mobile networks, and space networks, which have become a significant developing direction of future networks.
Recent technological advances (including improvements in single-satellite performance and reductions in launching and manufacturing costs) have made possible the deployment of low Earth orbit (LEO) satellite constellation systems, which can offer high-speed and huge-capacity communications between satellites and the ground, and thus STNs have attracted much attention in the sixth generation (6G) mobile communication technology. In terms of the structure of STNs, as illustrated in Figure 1, LEO satellite constellations play a crucial role. Unlike traditional geostationary Earth orbit (GEO) satellites, LEO satellites have the advantage of low latency, and mega-constellation systems of LEO satellites provide an effective way to realize global continuous coverage. Therefore, LEO satellite constellations are considered a promising solution for space backbone networks, capable of providing high-bandwidth, low-latency, and wide-ranging internet service access to users across the globe, and satellite internet is increasingly becoming an important information infrastructure.
Figure 1. The structure of satellite–terrestrial networks (STNs).
There are three active global LEO satellite initiatives that have stood out in recent years: Starlink by SpaceX, Project Kuiper by Amazon, and OneWeb backed by the government of the United Kingdom and Bharti Enterprises, which are licensed by the Federal Communications Commission (FCC) to launch 4408 [1], 3236 [2], and 716 [3] satellites in the initial phases, respectively. Generally, each satellite node in space networks has the capability of signal and data processing, and data interchange is implemented via inter-satellite links (ISLs). In the initial deployments of these constellations, it is not specified whether they plan to use ISLs, and their total throughputs without ISLs are all over 1 Tbps [4]. Since optical ISLs allow satellite constellations to serve users globally, even when a terrestrial gateway is not within the line-of-sight (LoS) of the satellite, optical ISLs are planned to be used to improve constellation system throughput in the future [3].
Like terrestrial networks, satellite networks rely on certain routing protocols to achieve data forwarding. Routing protocols are responsible for creating routing tables, describing the network topology structure, and performing routing and packet forwarding. Thus, satellite routing protocols play a crucial role in satellite network communications. Currently, Internet Protocol (IP) is the mainstream scheme for satellite networks to connect the global Internet. Specifically, satellites can perform dynamic IP routing via configuring interior/exterior gateway protocols at the terrestrial terminal, or serve as IP routing nodes in the constellation system to access the Internet [5]. However, due to the low computing power and limited memory resources on satellites, inter-satellite networks differ greatly from terrestrial networks, and how to realize IP packet routing and forwarding with high speed has become a key issue in designing LEO satellite routers.
IP route lookup, also known as IP address lookup or IP lookup, is a core technology in IP routers, and its algorithm performance will directly affect the system performance of routers. Since Classless Inter-Domain Routing (CIDR) was proposed by the Internet Engineering Task Force (IETF) in 1993 [6], which allows arbitrary prefix lengths and also introduces the longest prefix matching (LPM) problem, IP route lookup has attracted a great deal of attention from both academia and industry.
Various IP lookup mechanisms have been proposed, from ternary content addressable memory (TCAM)-based schemes [7,8] to algorithms based on hash [9,10], trie [11,12,13], and Bloom filter [14,15]. However, the above algorithms are mainly intended for terrestrial networks, and some cannot be applied directly in demanding inter-satellite networks; for instance, the traditional TCAM has a high cost and large power consumption, and hashing algorithms inevitably suffer from hash collisions. In modern embedded systems, since the time of arithmetic and logical operations is negligible compared with memory access latency [16] and on-chip memory access is much faster than off-chip memory access [17], the number of off-chip memory accesses is a major determinant of IP lookup algorithm performance. We are interested in both trie-based and Bloom filter-based schemes: the former provide a compact data structure and the latter can effectively reduce off-chip memory accesses, which makes them attractive for satellite routers.
Due to its high performance and flexibility, the field programmable gate array (FPGA) is generally recognized as suitable for space applications [18,19,20]. Current high-end FPGAs are equipped with a large number of logical units and static random-access memory (SRAM), which allow them to provide hardware acceleration for complex tasks, thereby achieving a higher processing throughput. At the same time, with the capability of being reconfigured, to some extent, FPGAs can mitigate the impact of single-event upsets (SEUs) that are caused by space radiation. In addition, the application-specific integrated circuit (ASIC) is superior to FPGA in terms of power consumption and processing speed; however, the ASIC chip is fixed, i.e., not reconfigurable, limiting its application in the space environment. Therefore, it is apparent that FPGAs offer a significant advantage in supporting remote upgrades and repairs in space missions since they are more flexible.
In this paper, we propose a new filter structure called the cuckoo Bloom hybrid filter (CBHF), which is a hybrid of the Bloom filter and cuckoo filter for accelerating trie-based IP route lookup. A CBHF-based satellite IP route lookup architecture is also developed. The CBHF can achieve only one off-chip memory access for an IP route lookup, which is a valuable feature saving compute and storage resources on satellites. We prototype our design using Verilog hardware description language (HDL) with an FPGA and one single dynamic random-access memory (DRAM), and the proposed architecture is evaluated in two parts. The performance of the CBHF structure has been evaluated in terms of false positive probability, on-chip memory requirement, and average time per lookup; the performance of the lookup architecture has been evaluated in terms of on-chip block random-access memory (BRAM) overhead, resource utilization, and system throughput.
The remainder of this paper is organized as follows. Section 2 introduces the Bloom filter and cuckoo filter, and reviews previously proposed IP lookup algorithms based on the above filters. Section 3 describes the proposed satellite IP lookup algorithm using the CBHF and presents the theoretical performance analysis of our proposed filter. Section 4 illustrates the prototype hardware architecture implemented on FPGA. The optimization and evaluation of the CBHF structure are shown in Section 5. The hardware implementation result of our proposed satellite IP route lookup architecture is detailed in Section 6. The performance analysis is discussed in Section 7. Finally, Section 8 gives some conclusions.

3. Satellite IP Route Lookup Using Cuckoo Bloom Hybrid Filter

In our previous work [30], we proposed the cuckoo Bloom hybrid filter (CBHF) structure, which can be used to accelerate a trie-based IP lookup algorithm called Tree Bitmap (TBM) [11]. The CBHF comprises three Bloom filters and a cuckoo filter based on the prefix partitioning scheme in the TBM. Specifically, a cuckoo filter for the subtries that belong to the popular level is maintained to reduce the false positive probability, while the subtries of each unpopular level are inserted in a Bloom filter. All of the filters are configured on-chip, and the proposed lookup architecture completes IP lookups by performing the TBM algorithm to access the off-chip next-hop information (NHI) table according to the query results of the CBHF. The above lookup architecture is shown in Figure 4.
Figure 4. IP route lookup using a CBHF.
Like most filter-based IP lookup schemes, the motivation of the proposed CBHF structure is to achieve only one off-chip memory access for a single lookup, without unnecessary access due to longest prefix matching (LPM). In addition, the proposed CBHF-based lookup architecture can provide higher memory efficiency since we utilize a trie data structure or, precisely, multibit-trie in IP lookups.
However, the TBM algorithm also has drawbacks. One is that the pair of bitmaps (i.e., the internal and external bitmaps) and the pointer used for each subtrie may incur a considerable memory overhead, which is undesirable in satellite IP route lookup. The presence of pointers in the TBM also complicates the lookup operation. Hence, a new algorithm suitable for satellite IP route lookup is needed.
In Reference [31], we introduced a new IP lookup algorithm called Prefix-Route Trie (PR-Trie). This scheme considers a special coding concept for a hybrid of prefixes and routes, which we call Overlapping Hybrid Trie (OHT). Importantly, with the OHT, the LPM process is converted into a specific logic calculation instead of pointer operations, which significantly reduces the lookup complexity. In addition, memory optimization for prefix partitioning is also considered in PR-Trie. However, the PR-Trie architecture proposed there is implemented in parallel, with a discrete memory module employed for each level, and therefore consumes a large amount of compute resources. Therefore, we consider introducing the CBHF into the PR-Trie algorithm to realize a faster serial lookup based on the level priority, which is more applicable to satellite IP route lookup.
In this paper, we detail the serial version of the PR-Trie architecture with a CBHF, and thus Section 4 is a completely new section. In our previous work [30], we briefly described the parameter configuration of the CBHF for typical routing tables in terrestrial networks, but the dynamic changes in these parameters were not evaluated; this evaluation is supplemented in Section 5. In addition, we implement the proposed lookup architecture on FPGA, and the prototype system has been evaluated through comprehensive simulation and hardware implementation. Therefore, Section 6 is also a completely new section.

3.1. CBHF Algorithms

As described in Section 2.1, a property of the Bloom filter is that it does not support the deletion of elements stored in the filter. Since online routing update occurs frequently in networks, and to provide the consistency of allowing deleting in the CBHF, we adopt the counting Bloom filter proposed in ref. [32]. The basic idea of the counting Bloom filter is to associate a counter with each bit of the standard Bloom filter; whenever an element is inserted into or removed from the filter, the counters corresponding to the bit positions are incremented or decremented accordingly. We will detail this further in Section 5, and here we only focus on the CBHF algorithms.
In the following, we denote the Bloom filter as “BF”, and the cuckoo filter as “CF”, and all the algorithms are detailed in Appendix A. Algorithm A1 shows the pseudocode of the inserting procedure in the CBHF. For an element x, it will be inserted into the specific filter according to the level ascription. If the element x belongs to an unpopular level, it will be inserted into a Bloom filter. Otherwise, x will be inserted into a cuckoo filter.
Similarly, Algorithms A2 and A3 describe the pseudocode of the querying and deleting processes, respectively. Details of the Bloom filter and cuckoo filter algorithms are given in Section 2.
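The level-based dispatch behind the insert, query, and delete procedures can be illustrated with a minimal Python sketch. It is not the pseudocode of Algorithms A1 to A3: the class names `CBHF` and `ExactSetFilter` are hypothetical, and `ExactSetFilter` is an exact-set stand-in that only mimics the interface of a real BF or CF (it never yields false positives). The sketch shows only how an element is routed to the filter that owns its level.

```python
from typing import Dict, Protocol


class MembershipFilter(Protocol):
    """Interface assumed for both BF and CF in this sketch."""
    def insert(self, key: int) -> bool: ...
    def query(self, key: int) -> bool: ...
    def delete(self, key: int) -> bool: ...


class ExactSetFilter:
    """Hypothetical stand-in: stores keys exactly, so no false positives."""

    def __init__(self) -> None:
        self._keys = set()

    def insert(self, key: int) -> bool:
        self._keys.add(key)
        return True

    def query(self, key: int) -> bool:
        return key in self._keys

    def delete(self, key: int) -> bool:
        if key in self._keys:
            self._keys.remove(key)
            return True
        return False


class CBHF:
    """Routes each subtrie key to the BF or CF that owns its level."""

    def __init__(self, bloom: Dict[int, MembershipFilter],
                 cuckoo: Dict[int, MembershipFilter]) -> None:
        self._filters = {**bloom, **cuckoo}   # level -> filter
        self.popular_levels = set(cuckoo)     # levels served by a CF

    def insert(self, level: int, key: int) -> bool:
        return self._filters[level].insert(key)

    def query(self, level: int, key: int) -> bool:
        return self._filters[level].query(key)

    def delete(self, level: int, key: int) -> bool:
        return self._filters[level].delete(key)


# Two unpopular levels (BF side) and two popular levels (CF side).
cbhf = CBHF(bloom={1: ExactSetFilter(), 2: ExactSetFilter()},
            cuckoo={3: ExactSetFilter(), 4: ExactSetFilter()})
cbhf.insert(3, 0x18150)
```

Keeping the dispatch separate from the filter implementations means the filter of a level can be swapped or resized without touching the callers.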

3.2. Satellite IP Route Lookup Using CBHF

As mentioned earlier, the proposed satellite IP lookup scheme is based on the PR-Trie architecture. Let us briefly review the PR-Trie algorithm; Algorithm A4 (see Appendix A) describes its core idea.
Like the TBM algorithm, PR-Trie also uses a multibit-trie-based data structure called Bitmap. However, the most significant feature of PR-Trie is that IP lookups can be realized by calculating both Prefix Trie and Route Trie, where Prefix Trie (P-Trie) and Route Trie (R-Trie) denote the data structures of routing tables and input IP addresses, respectively. Therefore, the perfect lookup complexity of $O(1)$ can be theoretically achieved.
In the original PR-Trie paper [31], we recommend the parallel version of PR-Trie that can achieve the fastest lookup and update speeds. Unfortunately, it may be impractical to employ a lot of computing and memory resources in satellite IP route lookup. Now, let us consider the serial version of PR-Trie architecture. Generally, this shared memory-based lookup mostly adopts priority search, in which a secondary priority phase will be accessed if there is no match in the high-priority phase. Hence, unnecessary search phases lead to an increase in lookup time, lowering lookup algorithm performance.
To ensure only one off-chip memory access, or reduce unnecessary search phases, for a single IP lookup, the CBHF structure is introduced. Each potential subtrie corresponding to a given input address is probed using the CBHF at the pre-lookup phase, and then a level-priority-based PR-Trie lookup is performed according to probe results. Note that since not all subtries have a real root (i.e., a root node with prefix information), the mere existence of a subtrie does not ensure that there is at least a match. Therefore, we maintain an on-chip imaginary root table that stores the LPM information for every imaginary root (i.e., a root node without prefix information).
Algorithm A5 (see Appendix A) shows the pseudocode of the satellite IP route lookup procedure using the CBHF. For an input IP address, if there is a matching prefix in the subtrie (i.e., the OHT bitmap $\neq 0$), the off-chip NHI table is looked up; if there is no match in the subtrie (i.e., the OHT bitmap $= 0$), the imaginary root table is accessed.
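The control flow of this procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not Algorithm A5 itself: the tables are plain dictionaries, the OHT bitmap is stored per subtrie instead of being computed from the P-Trie and R-Trie bitmaps, a filter false positive is assumed to be detected when no subtrie is found at the probed address, and all names (`subtrie_key`, `lookup`, `INDEX_BITS`) are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Subtrie index width per level (Levels 1, 3, 4 from the text; Level 2 by pattern).
INDEX_BITS = {1: 1, 2: 9, 3: 17, 4: 25}
Key = Tuple[int, int]  # (level, subtrie index)


def subtrie_key(level: int, ip: int) -> Key:
    """Identify the potential subtrie of a 32-bit address at a level."""
    return level, ip >> (32 - INDEX_BITS[level])


def lookup(ip: int,
           filter_hits: Set[Key],
           oht_bitmap: Dict[Key, int],
           nhi_table: Dict[Key, int],
           imaginary_root: Dict[Key, int]) -> Tuple[Optional[int], int]:
    """Return (next hop, number of subtrie accesses) for one address."""
    # Pre-lookup: probe every level and keep positives, longest prefixes first.
    candidates = [k for k in (subtrie_key(l, ip) for l in (4, 3, 2, 1))
                  if k in filter_hits]
    accesses = 0
    for key in candidates:
        accesses += 1
        if key not in oht_bitmap:        # false positive: no such subtrie
            continue
        if oht_bitmap[key] != 0:         # a prefix inside the subtrie matches
            return nhi_table[key], accesses
        return imaginary_root[key], accesses  # subtrie exists, no match inside
    return None, accesses


ip = 0xC0A80101  # 192.168.1.1
k4, k3, k2 = (subtrie_key(l, ip) for l in (4, 3, 2))
# Level 4 reports a false positive; Level 3 holds the matching subtrie.
hit_after_false_positive = lookup(ip, {k4, k3}, {k3: 0b0100}, {k3: 7}, {})
# Level 2 subtrie exists but nothing matches inside it: use its imaginary root.
via_imaginary_root = lookup(ip, {k2}, {k2: 0}, {}, {k2: 3})
```

In the first call the false positive at Level 4 costs one additional subtrie access before the Level 3 match is found.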

3.3. Theoretical Performance Analysis on CBHF

In this section, we discuss the theoretical performance of the proposed CBHF, including the false positive probability, the average number of subtrie accesses, and the lookup time. According to the IPv4 prefix partitioning scheme recommended in Reference [31] (i.e., Level 1 contains prefixes in /1 to /8; Level 2 contains prefixes in /9 to /16; Level 3 contains prefixes in /17 to /24; Level 4 contains prefixes in /25 to /32), Level 3 and Level 4, containing the most potential subtries ($2^{17}$ and $2^{25}$, respectively), are set to the popular levels in the CBHF, and the remaining levels are set to unpopular. Therefore, the CBHF in PR-Trie comprises two Bloom filters and two cuckoo filters.
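As a small illustration of this partitioning, the following Python sketch maps a prefix length to its level and lists the filter type per level. The function names are hypothetical, and the subtrie count for Level 2 ($2^{9}$) is inferred from the pattern of the other levels and is not stated in the text.

```python
def level_of_prefix(length: int) -> int:
    """Map an IPv4 prefix length to its level (0 means the default /0 prefix)."""
    if not 0 <= length <= 32:
        raise ValueError("IPv4 prefix length must be in 0..32")
    return (length + 7) // 8


def potential_subtries(level: int) -> int:
    """Number of possible subtrie roots at a level: 2^(8*(level-1)+1)."""
    return 2 ** (8 * (level - 1) + 1)


# Filter type per level in the PR-Trie CBHF: BF = Bloom, CF = cuckoo.
FILTER_TYPE = {1: "BF", 2: "BF", 3: "CF", 4: "CF"}
```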
In Section 2, certain formulas of the false positive probability of the Bloom filter and cuckoo filter are obtained. Note that Equation (8) applies only to an almost full hash table. Considering a more general case where hash addresses are uniformly distributed, the false positive probability of the cuckoo filter can be expressed as
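$$f_{\mathrm{CF}}^{\max} = 1 - \left(1 - \frac{1}{2^{f}}\right)^{2b \cdot \alpha} \approx \frac{2b \cdot \alpha}{2^{f}} = \frac{2n}{m \cdot 2^{f}}, \tag{10}$$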
where all variables are as defined in Section 2 (the same applies below). Thus, the false positive probability of the proposed CBHF (seen as a whole, with the subscript i representing the level) is calculated as
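$$f_{\mathrm{CBHF}} = 1 - \prod_{i=1}^{4} \left(1 - f_i\right) \approx 1 - \prod_{i=1}^{2} \left[1 - (0.6185)^{m_i / n_i}\right] \cdot \prod_{i=3}^{4} \left(1 - \frac{2 n_i}{m_i \cdot 2^{f}}\right), \tag{11}$$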
where we assume that Bloom filters are always in the optimal case, and this is clearly desirable due to the dynamic reconfiguration of FPGA.
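The two expressions above can be evaluated numerically with a short Python sketch. The function names and the sample parameters are hypothetical and are not taken from the configurations in this paper; the symbols follow the text, with n elements, m bits (Bloom) or m buckets (cuckoo), b cells per bucket, and f-bit fingerprints.

```python
import math
from typing import Iterable


def bloom_fpp_optimal(m_bits: int, n_items: int) -> float:
    """Bloom filter with an optimal number of hash functions: 0.6185^(m/n)."""
    return 0.6185 ** (m_bits / n_items)


def cuckoo_fpp(n_items: int, m_buckets: int, fp_bits: int) -> float:
    """Approximate form of the cuckoo filter bound: 2n / (m * 2^f)."""
    return 2 * n_items / (m_buckets * 2 ** fp_bits)


def cuckoo_fpp_exact(n_items: int, m_buckets: int, cells: int, fp_bits: int) -> float:
    """Exact form: 1 - (1 - 2^-f)^(2*b*alpha), with alpha = n / (m*b)."""
    alpha = n_items / (m_buckets * cells)
    return 1 - (1 - 2 ** -fp_bits) ** (2 * cells * alpha)


def cbhf_fpp(level_fpps: Iterable[float]) -> float:
    """Whole-structure probability: 1 - prod(1 - f_i) over the levels."""
    return 1 - math.prod(1 - f for f in level_fpps)


# Hypothetical sizing: 1000 subtries in 512 buckets of 4 cells, 8-bit fingerprints.
approx = cuckoo_fpp(1000, 512, 8)
exact = cuckoo_fpp_exact(1000, 512, 4, 8)
```

For this sizing the approximation slightly overestimates the exact value, which is the safe direction when dimensioning a filter.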
The number of subtrie accesses required to compute the correct LPM for an input IP address is determined by the number of matching filters. For an IP address x matching a subtrie in Level l, we will first inspect the levels of high priority if there are false positives in the levels of priority greater than Level l. Therefore, the average number of additional subtrie accesses required for the input IP address x is
$$N_{\mathrm{add}}(l) = \sum_{i > l}^{4} f_i, \tag{12}$$
where $f_i$ is the false positive probability of the filter at Level i. Considering the worst case, in which l = 0 (i.e., x matches a default prefix) and every filter may produce a false positive, the average number of total subtrie accesses per lookup can be upper bounded as
$$N_{\mathrm{avg}}^{\max} = N_{\mathrm{add}}(0) + 1 = \sum_{i=1}^{4} f_i + 1 \approx \sum_{i=1}^{2} 0.6185^{m_i / n_i} + \sum_{i=3}^{4} \frac{2 n_i}{m_i \cdot 2^{f}} + 1. \tag{13}$$
Note that Equation (13) gives the average number of subtrie accesses due to the false positives, and for the worst case mentioned above, the number of required subtrie accesses obviously is $N_{\mathrm{worst}} = 4 + 1 = 5$ (the default prefix constitutes one access here).
However, the average number of subtrie accesses is generally lower than the value estimated using Equation (13) in practical implementation, because of the relative priority of filters. In more detail, for a lookup that has been hit by a level of high priority, whether there are any false positives in the filters of lower priority makes no difference to the query results. For this reason, we define a weighting factor $w_i$, which is stated as
$$w_i = \begin{cases} \prod_{j=i+1}^{4} \left(1 - f_j\right), & i = 1, 2, 3 \\ 1, & i = 4. \end{cases} \tag{14}$$
Hence, the average number of subtrie accesses per lookup can be further refined as
$$N_{\mathrm{avg}} = \sum_{i=1}^{4} w_i \cdot f_i + 1 = \sum_{i=1}^{3} \left[\prod_{j=i+1}^{4} \left(1 - f_j\right)\right] f_i + f_4 + 1. \tag{15}$$
In real systems, the speed for a single lookup is determined by the total time of a direct hit and additional subtrie accesses. Thus, the lookup time of our proposed architecture can be calculated as
$$T_{\mathrm{PR\text{-}Trie}} = T_1 + \left(N_{\mathrm{avg}} - 1\right) \cdot T_{\mathrm{add}}, \tag{16}$$
where $T_1$ and $T_{\mathrm{add}}$ denote the time of a direct hit and of an additional subtrie access, respectively. The lookup time is a key indicator in satellite IP route lookup, which will be further evaluated in Section 6.
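The access-count and lookup-time expressions of this subsection can be checked numerically with the following Python sketch. The per-level false positive probabilities and the timing values are hypothetical placeholders, not measurements from this paper, and the function names are introduced only for illustration.

```python
import math
from typing import Dict

LEVELS = (1, 2, 3, 4)


def n_add(l: int, f: Dict[int, float]) -> float:
    """Expected additional accesses for an address matching Level l."""
    return sum(f[i] for i in LEVELS if i > l)


def n_avg_max(f: Dict[int, float]) -> float:
    """Upper bound: default-prefix case (l = 0) plus the one required access."""
    return n_add(0, f) + 1


def weight(i: int, f: Dict[int, float]) -> float:
    """Weighting factor w_i; the empty product for i = 4 evaluates to 1."""
    return math.prod(1 - f[j] for j in LEVELS if j > i)


def n_avg(f: Dict[int, float]) -> float:
    """Refined average number of subtrie accesses per lookup."""
    return sum(weight(i, f) * f[i] for i in LEVELS) + 1


def lookup_time(t1: float, t_add: float, f: Dict[int, float]) -> float:
    """Lookup time: one direct hit plus the expected additional accesses."""
    return t1 + (n_avg(f) - 1) * t_add


# Hypothetical false positive probabilities per level and timings in ns.
f = {1: 0.0, 2: 0.01, 3: 0.02, 4: 0.03}
refined = n_avg(f)
bound = n_avg_max(f)
t_lookup = lookup_time(100.0, 20.0, f)
```

With these placeholder values the refined average stays slightly below the upper bound, as the weighting factors suggest.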

4. Hardware Architecture

In this section, we describe the detailed architecture of our prototype hardware design. Figure 5 shows the block diagram of our proposed satellite IP route lookup engine. In the lookup engine, there are four main modules: CBHF, Bitmap Lookup, PR-Trie Lookup, and NHI Lookup. As shown, the input IP address is first fed into the CBHF module, which checks for the existence of subtries for the input IP address in the four levels. The matching subtrie of the highest priority is designated as a PR-Trie; the CBHF module outputs the PR-Trie address and forwards it to the Bitmap Lookup module to query on-chip memory, where the P-Trie and R-Trie bitmaps are stored. The PR-Trie Lookup module reads the P-Trie and R-Trie data from the preceding module, computes the OHT bitmap according to the PR-Trie algorithm, and generates the addresses of the target next-hop information (NHI) and imaginary root information, which are stored in two separate first-in-first-out (FIFO) buffers for pipelining to accelerate the processing speed. Finally, the NHI Lookup module obtains output port data using the above addresses, then picks out the correct NHI based on Algorithm A5.
Figure 5. Overview of hardware block diagram.

4.1. Hardware Architecture of CBHF Module

At the pre-lookup phase, the IP address is input into the CBHF module, and the subtrie matching queries of variable length are performed in parallel. The CBHF module checks whether subtries exist at the different levels for the input IP address. As mentioned previously, the binary trie in the PR-Trie architecture is divided into four levels for IPv4 lookups, and this partitioning is also adopted in the proposed satellite IP route lookup. Figure 6 shows the parallel architecture of the CBHF module. Each level has an individual Bitmap Address Generator and an individual Subtrie Membership Query. Bits [31:24], bits [31:16], bits [31:8], and bits [31:0] of the input IP address are used as the input for the Level 1 submodule, Level 2 submodule, Level 3 submodule, and Level 4 submodule, respectively. Note that these submodules are basically the same in structure and operation, except that cuckoo filters are used in the Level 3 and Level 4 submodules, whereas the remaining submodules use Bloom filters. Therefore, we take the Level 3 submodule as an example, which is shown in Figure 7.
Figure 6. Parallel architecture of CBHF module.
Figure 7. Detail of Level 3 submodule.
In the Level 3 submodule, the most significant 17 bits of the input (i.e., IPv4_L3 [23:7]) are used to calculate the P-Trie bitmap address in the Bitmap Address Generator and perform the membership query in the Subtrie Membership Query. The least significant 7 bits of the input (i.e., IPv4_L3 [6:0]), used as the R-Trie address indexing the R-Trie bitmap, are stored in a FIFO buffer until the P-Trie address hashing procedure is completed; the two addresses are then concatenated to generate a candidate PR-Trie address.
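The bit slicing described above can be mirrored in a few lines of Python. This is a behavioral sketch, not the Verilog design: the 17 + 7 split is stated for Level 3 only, so applying the same 7-bit R-Trie address to the other levels is an assumption, and the helper names are hypothetical. The hashing of the P-Trie index is left out; `concat_pr_trie_address` simply joins an already hashed P-Trie address with the R-Trie address.

```python
from typing import Tuple

LEVEL_INPUT_BITS = {1: 8, 2: 16, 3: 24, 4: 32}  # bits [31:24] ... [31:0]
R_TRIE_BITS = 7                                  # assumed identical for all levels


def split_level_input(ip: int, level: int) -> Tuple[int, int]:
    """Return (P-Trie index, R-Trie address) of a 32-bit address at a level."""
    width = LEVEL_INPUT_BITS[level]
    level_input = ip >> (32 - width)              # keep the top `width` bits
    p_index = level_input >> R_TRIE_BITS          # e.g. IPv4_L3[23:7]
    r_addr = level_input & ((1 << R_TRIE_BITS) - 1)  # e.g. IPv4_L3[6:0]
    return p_index, r_addr


def concat_pr_trie_address(p_addr: int, r_addr: int) -> int:
    """Concatenate a (hashed) P-Trie address with the R-Trie address."""
    return (p_addr << R_TRIE_BITS) | r_addr


p3, r3 = split_level_input(0xC0A80101, 3)   # 192.168.1.1 at Level 3
```

For 192.168.1.1 the Level 3 input is 0xC0A801, which splits into the 17-bit index 0x18150 and the 7-bit R-Trie address 0x01.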
For subtrie membership queries in each level, if a subtrie corresponding to the input IP address is found, or to be more precise, the filter of this level returns a positive, the candidate PR-Trie address will be input into the priority checker module. The priority checker module inspects the candidate PR-Trie address set based on the longest prefix matching (LPM) principle, and these candidate addresses will be stored in a FIFO buffer in priority order. Once the correct PR-Trie address is accessed next (in Bitmap Lookup), the FIFO buffer is cleared at the same time.

4.2. Hardware Architecture of Bitmap Lookup Module

The Bitmap Lookup module is used for the P-Trie and R-Trie bitmap lookup. Both bitmaps are stored in on-chip memory. The composite PR-Trie address is divided here, and the P-Trie address is used to query the hash table, while the R-Trie address is for the linear table. Then, the P-Trie data (including bitmap and current hash address) and the R-Trie bitmap are output to the PR-Trie Lookup module.

4.3. Hardware Architecture of PR-Trie and NHI Lookup Modules

The PR-Trie Lookup module resolves the P-Trie data, as shown in Figure 8, and the P-Trie and R-Trie bitmaps are used to compute the OHT bitmap, to obtain the offset. The P-Trie address is involved in calculating the NHI base address and potential on-chip imaginary root address, then the off-chip NHI address generated by the base address plus the offset is obtained.
Figure 8. Block diagram of PR-Trie Lookup module.
In the NHI Lookup module, as shown in Figure 9, the DDR3 interface employs a DDR3 memory controller generated by the Xilinx Memory Interface Generator (MIG). The NHI address is applied to query the off-chip memory, while the imaginary root address is used to query the on-chip imaginary root table and its result will be stored in a FIFO buffer until the off-chip NHI searching is completed. Then, the off-chip output port bitmap is selected as the correct NHI if it is not zero (i.e., 16’b0), otherwise, the on-chip output port bitmap from the imaginary root table will be picked out.
Figure 9. Block diagram of NHI Lookup module.

5. Optimization and Evaluation on CBHF

As mentioned in Section 3.1, the counting Bloom filter is adopted in the CBHF structure, to support subtrie deletions. The basic update operation of the original PR-Trie has been provided in Reference [31], and in this paper, updates in the imaginary root table of the proposed architecture are quite easy, since there are not many imaginary roots in real-life routing tables (in quantity) and the update procedure is not necessarily performed in real-time (in terms of speed).
In this section, we detail the counting Bloom filter structure, including the size configuration of its counters. The behavioral simulation of the CBHF structure is implemented in C++ language. Predicting the size of a satellite routing table is challenging, as vendors are resistant to allowing access and the number of global subscribers is expected to keep growing rapidly. Therefore, we refer to the current routing tables of backbone routers in terrestrial networks, which are downloaded from the Route Views project of the University of Oregon [33]. We create three routing prefix sets: 1 k (contains 1078 prefixes), 5 k (contains 5096 prefixes), and 25 k (contains 25,707 prefixes), which are used to evaluate the performance of filters in the CBHF. Since the number of current satellite internet users is still tiny compared to the number of wired cable or fiber subscribers, we use the above sets with a small number of prefixes instead of the present large-scale IPv4 backbone routing tables (containing nearly 1 M prefixes) in this paper.

5.1. Counting Bloom Filter Structure

Figure 10 shows the counting Bloom filter structure with several elements inserted in the filter. Each local counter in the filter is initially set to ‘0’. When an element x is inserted or deleted, the counters $c[h_1(x)], c[h_2(x)], \ldots, c[h_k(x)]$ are incremented or decremented, respectively.
Figure 10. Illustration of a counting Bloom filter querying the membership of elements x, y, and z.
The bit at the position $h_i(x)$ is set to ‘1’ whenever the counter $c[h_i(x)]$ changes from 0 to 1. Conversely, the bit at position $h_i(x)$ is set to ‘0’ whenever the counter $c[h_i(x)]$ changes from 1 to 0. Hence, the counting Bloom filter always correctly reflects the current set. Note that counters in the counting Bloom filter structure are only used for insertions and removals, but not for lookups. Therefore, the counting Bloom filter has the same false positive probability as the standard Bloom filter.
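The interplay of the bit array and the counters can be illustrated with a minimal Python sketch. It is not the C++ simulation used in this paper: the salted SHA-256 hash family is a hypothetical substitute for the CRC functions of the hardware design, positions that collide for a single element are counted once, and counter overflow is not modeled.

```python
import hashlib
from typing import Set


class CountingBloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = [0] * m       # consulted by lookups
        self.counters = [0] * m   # consulted only by insertions and removals

    def _positions(self, item: bytes) -> Set[int]:
        # Hypothetical hash family: salted SHA-256 reduced modulo m.
        return {
            int.from_bytes(hashlib.sha256(bytes([i]) + item).digest()[:8], "big") % self.m
            for i in range(self.k)
        }

    def insert(self, item: bytes) -> None:
        for p in self._positions(item):
            self.counters[p] += 1
            if self.counters[p] == 1:   # counter changed from 0 to 1
                self.bits[p] = 1

    def query(self, item: bytes) -> bool:
        return all(self.bits[p] for p in self._positions(item))

    def delete(self, item: bytes) -> bool:
        if not self.query(item):        # definitely not present
            return False
        for p in self._positions(item):
            self.counters[p] -= 1
            if self.counters[p] == 0:   # counter changed from 1 to 0
                self.bits[p] = 0
        return True


cbf = CountingBloomFilter(m=1024, k=3)
cbf.insert(b"subtrie-x")
cbf.insert(b"subtrie-y")
```

As with any counting Bloom filter, `delete` should only be called for elements that were actually inserted; removing an element that merely tests positive would corrupt the counters of other elements.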
In addition, the memory allocation of the counters is also important, namely, how large each counter can become. In our previous paper [30], we obtained the following bounds on the probability that the counting value overflows:
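$$P\{\max c \geq 2^{2}\} \leq 4.29 \times 10^{-2} \times m, \tag{17}$$
$$P\{\max c \geq 2^{3}\} \leq 9.47 \times 10^{-6} \times m, \tag{18}$$
$$P\{\max c \geq 2^{4}\} \leq 1.37 \times 10^{-15} \times m, \tag{19}$$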
where c and m denote the counting value and the bit array size, respectively. These theoretical probabilities will help us construct the filters in the CBHF.
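For orientation, a bound of this kind can be evaluated with the classical estimate for counting Bloom filters that use an optimal number of hash functions, $P\{\max c \geq j\} \leq m \cdot (e \ln 2 / j)^{j}$. Whether the derivation in [30] uses exactly this estimate is an assumption; the Python sketch below only evaluates the per-bit factor. It reproduces the coefficients quoted above for $2^{3}$ and $2^{4}$, while for $2^{2}$ it evaluates to roughly $4.9 \times 10^{-2}$, close to but not identical with the quoted value.

```python
import math


def overflow_bound_per_bit(j: int) -> float:
    """Coefficient of m in the estimate P{max c >= j} <= m * (e*ln2 / j)^j."""
    return (math.e * math.log(2) / j) ** j


def required_counter_bits(m: int, target: float) -> int:
    """Smallest counter width whose overflow bound m*(e*ln2/2^w)^(2^w) is <= target."""
    width = 1
    while m * overflow_bound_per_bit(2 ** width) > target:
        width += 1
    return width


coefficients = {j: overflow_bound_per_bit(j) for j in (4, 8, 16)}
```

Under this estimate, 4-bit counters keep the overflow probability negligible even for bit arrays with millions of entries.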

5.2. Performance Evaluation

Since the satellite IP route lookup architecture is implemented on FPGA, the CBHF structure can be easily modified. In other words, it allows upgrading the configuration of filters in space missions. Hence, the performance of filters is affected by the routing table the satellite IP router maintains. Figure 11 shows the data of the prefix length distribution in each routing prefix set for testing.
Figure 11. Prefix length distribution of testing routing prefix sets.
First, we extract subtries to generate the PR-Trie data structure for each routing prefix set. The number of subtries in each level and the total number of subtries in each set are shown in Table 2. Many more subtries need to be maintained in Level 3 than in the other levels, which is consistent with the filter strategies that we have assumed previously. Note that there are only a few subtries in Level 4, but it is still desirable to use the cuckoo filter there, since it has a higher space efficiency. In more detail, when there are many potential subtries (in lookup) but few are stored in the filter, a Bloom filter would require maintaining a large array into which only a few elements have been inserted.
Table 2. Number of subtries in each level for testing routing prefix sets.
Table 3 reports the detailed configuration of the CBHF structure. As shown, for the Bloom filters of Level 1, there are at most two elements due to the one-bit subtrie index (i.e., IP [31]), which makes these filters safely work without the counters, and thus the width is 1 + 0 = 1 bit (array + counter). For the cuckoo filters of Level 3 and Level 4, the number of hash functions (N) is always 1 + 1 = 2 ($h_1(\cdot)$ and $f(\cdot)$), and the widths are 4 × 8 = 32 bits and 4 × 16 = 64 bits (cell × fingerprint) accordingly. In addition, all the hash indexes in the system are generated by the family of cyclic redundancy check (CRC) functions, including the CRC-12, CRC-16, CRC-32, CRC-48, CRC-64, etc., which can be easily implemented with a shift register and some exclusive-OR gates on FPGA. The total memory requirement for a CBHF is shown as well.
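The shift-register view of a CRC can be illustrated with a bit-serial Python model. The polynomial and initial value used here (CRC-16 with polynomial 0x1021 and initial value 0xFFFF, often called CRC-16/CCITT-FALSE) are an assumption chosen for illustration, since the exact CRC parameters of the design are not stated; `hash_index` is a hypothetical helper that reduces the CRC to a table index.

```python
import binascii


def crc16_ccitt(data: bytes, init: int = 0xFFFF, poly: int = 0x1021) -> int:
    """Bit-serial CRC-16: one register shift and conditional XOR per input bit."""
    reg = init
    for byte in data:
        for bit in range(7, -1, -1):            # feed bits MSB first
            in_bit = (byte >> bit) & 1
            feedback = ((reg >> 15) & 1) ^ in_bit
            reg = (reg << 1) & 0xFFFF           # 16-bit shift register
            if feedback:
                reg ^= poly                     # XOR gates at the polynomial taps
    return reg


def hash_index(key: int, key_bytes: int, table_size: int) -> int:
    """Hypothetical helper: hash an integer key into a table index."""
    return crc16_ccitt(key.to_bytes(key_bytes, "big")) % table_size


check = crc16_ccitt(b"123456789")
```

The result can be cross-checked against `binascii.crc_hqx` from the Python standard library, which implements the same polynomial.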
Table 3. Configuration of CBHF structure for testing routing prefix sets.
According to Equations (2) and (10), we can obtain the theoretical false positive probability of each filter in the CBHF. Furthermore, we compute the false positive probability of the CBHF using Equation (11). Table 4 shows the above false positive probability according to the configuration of the CBHF. In addition, a simulation is run for each system configuration, where we perform lookups of all the remaining elements (which have not been inserted) for Level 1 and Level 2, and 100 k lookups for different random elements that have not been inserted in Level 3 and Level 4. The results are also summarized in Table 4.
Table 4. Theoretical and observed false positive probability.
Note that, in fact, the results for Level 1 are free from false positives; thus, we recalculate the theoretical false positive probability of the CBHF. The results show that each filter in the CBHF maintains a stable false positive probability through dynamic reconfiguration, which follows the convention that a Bloom filter is reconstructed when its occupancy exceeds 60% and a cuckoo filter is reconstructed when its occupancy exceeds 90%. In theory, the CBHF structure can achieve an accuracy rate of almost 95% with less storage space. Moreover, our simulation results show that higher accuracy rates can be reached in practical configurations.

6. Hardware Implementation

In this section, we describe the hardware implementation results of our proposed lookup architecture in detail. The proposed satellite IP route lookup engine is implemented in Verilog HDL on a Xilinx Virtex-7 XC7VX690T FPGA chip, and the development environment is Vivado 2019.2. Our prototype design is downloaded on the FPGA development board, which is equipped with two 4 GB DDR3 synchronous DRAMs (SDRAMs). We use a single DDR3 memory for next-hop information (NHI) storage.
Table 5 shows the on-chip memory requirement in the hardware implementation. There are two types of on-chip BRAMs, 18 Kbit blocks and 36 Kbit blocks, which are automatically allocated to each module. Because the memory of each module is allocated in standard blocks, the practical memory requirement of the filter structures in the CBHF is greater than that in Table 3. The on-chip memory requirement for the other modules is shown as well. The total BRAM value is the required memory capacity in KByte, and the total BRAM utilization rate is relative to the amount of available block memory, which is 52,920 Kbits on an XC7VX690T chip.
Table 5. On-chip memory requirement in hardware implementation.
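The effect of block-granular allocation can be sketched as follows. This is a deliberately simplified model that rounds a memory up to whole 18 Kbit blocks; actual Vivado BRAM inference also depends on port widths and aspect ratios, so the numbers are indicative only. The 4096-bucket filter depth is a hypothetical example, not a value from Table 3.

```python
import math

BRAM18_KBITS = 18             # capacity of one 18 Kbit block
TOTAL_BRAM_KBITS = 52_920     # XC7VX690T block memory quoted in the text


def blocks_needed(depth: int, width_bits: int) -> int:
    """Round a depth x width memory up to whole 18 Kbit blocks (simplified model)."""
    bits = depth * width_bits
    return math.ceil(bits / (BRAM18_KBITS * 1024))


def utilization(blocks_18k: int) -> float:
    """Fraction of the chip's block memory consumed by the given number of 18 Kbit blocks."""
    return blocks_18k * BRAM18_KBITS / TOTAL_BRAM_KBITS


if __name__ == "__main__":
    # Hypothetical cuckoo filter: 4096 buckets of 32 bits = 128 Kbit of payload.
    blocks = blocks_needed(4096, 32)
    print(blocks, "blocks,", blocks * BRAM18_KBITS, "Kbit allocated")
    print(f"utilization: {utilization(blocks):.4%}")
```

In this example the 128 Kbit payload occupies 144 Kbit of block memory, which shows why the allocated figure exceeds the raw filter size.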
The resource utilization is listed in Table 6. BRAM utilization increases considerably as the routing prefix sets grow, while the utilization of LUT and FF is almost invariant. In addition, the utilization of IO and BUFG is constant, since their numbers do not depend on the sizes of the routing prefix sets.
Table 6. Hardware resource utilization.
In order to enhance system reliability, triple modular redundancy (TMR) is widely adopted in satellite applications, which requires two extra duplicate systems to guarantee correct operations. Since each individual resource utilization of our prototype design is less than 1/3, the basic requirement of a TMR system can be met.
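The TMR argument can be made concrete with a small sketch: a bitwise majority voter over three replica outputs, plus a check that every resource class stays below one third of the device. The function names and the utilization figures in the example are hypothetical and are not taken from Table 6.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two of the inputs."""
    return (a & b) | (a & c) | (b & c)


def fits_tmr(utilization: dict) -> bool:
    """True if every resource uses less than 1/3 of the device, leaving room for three copies."""
    return all(u < 1.0 / 3.0 for u in utilization.values())


if __name__ == "__main__":
    # One replica (the third) suffers a bit upset; the voter masks it.
    good = 0b1010
    upset = 0b0110
    print(bin(majority_vote(good, good, upset)))

    # Hypothetical per-resource utilization of a single lookup engine.
    print(fits_tmr({"LUT": 0.12, "FF": 0.08, "BRAM": 0.30}))
```

The check ignores the extra logic needed for the voters themselves, so a real design needs some headroom beyond the one-third bound.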
The whole design operates at 200 MHz. The worst negative slack (WNS) and the total on-chip power reported by Vivado 2019.2 at this frequency are listed in Table 7.
Table 7. Worst negative slack and power consumption.

7. Results and Analysis

We implemented a testbench in Verilog HDL to check system timing. Table 8 shows the time consumption of the system in all cases. In real-life routing tables, both subtries of Level 1 always exist (the total number of potential subtries is $2^1$) and Level 1 has no false positives, so the special case of default prefix matching is not shown. In the following, we denote the subsystem before the two FIFO buffers (including the CBHF, Bitmap Lookup, and PR-Trie Lookup modules) as “Phase 1” and the subsystem after them as “Phase 2”, as shown in Figure 5.
Table 8. Theoretical throughput under different subtrie accesses.
As shown in Table 8, for the worst case, in which there is a false positive in each filter except Level 1, the number of subtrie accesses is 4, and the time consumption in Phase 1 is 14 clock cycles (clks). Similarly, in other cases, the results are also obtained. Since there is a pipelining design between two phases, the lookup time of the system ($T_{sys}$) is the maximum in two phases. Hence, the theoretical throughput ($X$) can be calculated as follows:
$$X = \frac{F}{T_{sys}} \cdot \left( E_p + E_s + E_{f\,\mathrm{min}} + E_i \right).$$
The meanings and values of notations used in Equation (20) are shown in Table 9. In addition, using Equation (16), we also obtain the average system throughput for each testing routing prefix set, which is shown in Table 10.
Table 9. Meaning and value of notations.
Table 10. Theoretical throughput for testing routing prefix sets.
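The throughput formula can be evaluated with a short sketch. The byte counts below are assumed to be the standard Ethernet framing overheads (7-byte preamble, 1-byte start frame delimiter, 64-byte minimum frame, 12-byte inter-frame gap), converted to bits; the values in Table 9 may differ. The 10-cycle lookup time is a hypothetical input chosen because it reproduces the 13.44 Gbps figure quoted in the text.

```python
# Assumed Ethernet framing overheads in bytes (standard values, not taken from Table 9).
E_P_BYTES = 7        # preamble
E_S_BYTES = 1        # start frame delimiter
E_FMIN_BYTES = 64    # minimum frame size
E_I_BYTES = 12       # inter-frame gap


def system_lookup_clks(phase1_clks: int, phase2_clks: int) -> int:
    """With the two phases pipelined, the slower phase sets the lookup time."""
    return max(phase1_clks, phase2_clks)


def throughput_bps(freq_hz: float, t_sys_clks: int) -> float:
    """X = F / T_sys * (E_p + E_s + E_fmin + E_i), with the E terms expressed in bits."""
    bits_per_packet = 8 * (E_P_BYTES + E_S_BYTES + E_FMIN_BYTES + E_I_BYTES)
    return freq_hz / t_sys_clks * bits_per_packet


if __name__ == "__main__":
    f_clk = 200e6                                  # 200 MHz operating frequency
    print(throughput_bps(f_clk, 10) / 1e9)         # hypothetical 10-clk lookup time
    # Worst case from the text: 14 clks in Phase 1 (Phase 2 assumed not slower).
    print(throughput_bps(f_clk, system_lookup_clks(14, 14)) / 1e9)
```

Because the lookup rate is counted in minimum-size packets, the formula gives the line rate that the engine can sustain when every packet is as short as Ethernet allows.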
In order to evaluate the system throughput more objectively, we constructed the relevant hash tables and linear tables for each routing prefix set (in conditions of different system configurations) offline. Then, we randomly tested 100 k IP addresses for each routing prefix set. Table 11 reports their average time per lookup and their system throughput.
Table 11. System throughput in simulation testing.
The results show that the average lookup time of our proposed satellite IP route lookup architecture remains steady across configurations, and its single-port throughput reaches at least 13.44 Gbps, which matches the current mainstream inter-satellite links (ISLs) of 10 Gbps. Throughput can be raised further by instantiating multiple lookup engines. Note that the resource utilization still meets the standard requirement of TMR even with three lookup engines on an XC7VX690T chip. Thus, the expected performance can be tripled, which is equivalent to a throughput beyond 40 Gbps. Since the routing prefix sets used in our simulation are much larger than current satellite routing tables, our proposed architecture can provide considerable throughput in practical satellite applications.

8. Conclusions

In this paper, we have proposed a low-cost, low-power, high-speed satellite IP route lookup architecture. The proposed architecture is based on the cuckoo Bloom hybrid filter (CBHF) structure and the PR-Trie algorithm. We have evaluated the performance of the proposed architecture using C++ language at the behavioral level and Verilog HDL at the hardware level. The behavioral simulation and hardware implementation results show that the CBHF structure can achieve better lookup performance with less storage space, and the single-port throughput of our proposed lookup architecture is over 13 Gbps on an FPGA board with one single DDR3 SDRAM when operating at 200 MHz. In other words, we provided evidence for the current viability of our approach for satellite IP route lookup.
In addition, if implemented with multiple lookup engines, our proposed satellite IP route lookup architecture could achieve 40 Gbps throughput with three DDR3 SDRAM chips. In comparison, TCAM-based or SRAM-based solutions could not satisfy the requirements of space applications due to their massive power consumption and costs. Therefore, algorithms such as ours that employ SDRAM devices and pipelining technologies to achieve comparable or better performance will continue to be future research directions in satellite-borne applications.

Author Contributions

Conceptualization, Y.Z. and L.Q.; methodology, L.Q.; software, L.H. and Q.C.; hardware, Y.Z. and L.Q.; validation, Y.Z.; investigation, L.Q.; resources, L.H. and X.X.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, L.Q. and X.X.; visualization, Y.Z.; supervision, Q.C.; project administration, X.X.; funding acquisition, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NSFC) under Grant No. 62171466.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

All research data are publicly available.

Acknowledgments

The authors wish to thank Bingyang Fu (who is now with Key Laboratory of Information Systems Engineering, China) for insightful discussions on the cuckoo filter.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Algorithms for Section 3.1 and Section 3.2

Algorithm A1: CBHF Inserting
  Input: element x to be inserted, which belongs to Level n
  Output: element x has been inserted into the CBHF
Algorithm A2: CBHF Querying
  Input: element x belonging to Level n
  Output: membership query result in the CBHF
Algorithm A3: CBHF Deleting
  Input: element x to be deleted, which belongs to Level n
  Output: element x has been deleted from the CBHF
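For readers who want a concrete picture of the cuckoo filter operations that the insert, query, and delete procedures build on, the following is a minimal sketch of a generic cuckoo filter with partial-key cuckoo hashing. It is not the CBHF procedure of Algorithms A1 to A3: the class name, the use of zlib.crc32 for both the index and the fingerprint, the bucket count, and the eviction limit are all assumptions made for illustration.

```python
import random
import zlib


class CuckooFilter:
    """Generic cuckoo filter sketch (illustrative parameters, not the CBHF configuration)."""

    def __init__(self, num_buckets=1024, cells=4, fp_bits=8, max_kicks=500):
        # A power-of-two bucket count keeps the XOR-based alternate index in range.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.cells = cells
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: bytes) -> int:
        return zlib.crc32(b"fp" + item) & self.fp_mask

    def _index(self, item: bytes) -> int:
        return zlib.crc32(item) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # XOR with a hash of the fingerprint; applying it twice returns the original index.
        return (index ^ zlib.crc32(fp.to_bytes(4, "big"))) % self.num_buckets

    def insert(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.cells:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            victim = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][victim] = self.buckets[i][victim], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.cells:
                self.buckets[i].append(fp)
                return True
        return False  # treated as full; the last evicted fingerprint is dropped here

    def query(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

A typical use is to insert a prefix key such as b"10.0.0.0/8", query it before an off-chip access, and delete it when the route is withdrawn. A failed insert signals that the filter should be rebuilt with more buckets.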
Algorithm A4: PR-Trie Lookup
  Input: IP address
  Output: next-hop information (NHI) in the routing table
Algorithm A5: Satellite IP Route Lookup Using CBHF
  Input: IP address
  Output: next-hop information (NHI) in the routing table
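The general idea of using on-chip membership filters to avoid unnecessary off-chip accesses during route lookup can be sketched as follows. This is a generic longest-prefix-match illustration with one filter per prefix length, not the PR-Trie and CBHF procedure of Algorithm A5. Python sets stand in for the on-chip filters (so this sketch has no false positives), a dictionary stands in for the off-chip next-hop table, and all names are hypothetical.

```python
import ipaddress


class FilteredLpmTable:
    """Longest prefix match guarded by per-length membership filters (IPv4 only)."""

    def __init__(self):
        self.filters = {}       # prefix length -> set of prefix values (on-chip filter stand-in)
        self.next_hops = {}     # (prefix length, prefix value) -> NHI (off-chip table stand-in)
        self.offchip_reads = 0  # counts simulated off-chip memory accesses

    def add_route(self, cidr: str, nhi: str) -> None:
        net = ipaddress.ip_network(cidr)
        key = int(net.network_address) >> (32 - net.prefixlen)
        self.filters.setdefault(net.prefixlen, set()).add(key)
        self.next_hops[(net.prefixlen, key)] = nhi

    def lookup(self, ip: str):
        addr = int(ipaddress.ip_address(ip))
        # Probe the filters from the longest prefix length to the shortest.
        for length in sorted(self.filters, reverse=True):
            key = addr >> (32 - length)
            if key in self.filters[length]:
                # Only a filter hit triggers an off-chip read.
                self.offchip_reads += 1
                nhi = self.next_hops.get((length, key))
                if nhi is not None:
                    return nhi
                # A miss here would correspond to a filter false positive: keep searching.
        return None


if __name__ == "__main__":
    table = FilteredLpmTable()
    table.add_route("10.0.0.0/8", "port-A")
    table.add_route("10.1.0.0/16", "port-B")
    print(table.lookup("10.1.2.3"), table.lookup("10.2.0.1"), table.offchip_reads)
```

With exact sets every filter hit is a true match, so each successful lookup costs one off-chip read; with real Bloom or cuckoo filters, a false positive would add a wasted read before the search continues.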

References

  1. Space Exploration Holdings, LLC. SAT-MOD-20200417-00037. 2023. Available online: https://docs.fcc.gov/public/attachments/FCC-21-48A1_Rcd.pdf (accessed on 20 May 2023).
  2. Kuiper Systems LLC. SAT-MOD-20211207-00186. 2023. Available online: https://docs.fcc.gov/public/attachments/DA-23-114A1.pdf (accessed on 20 May 2023).
  3. WorldVu Satellites Limited. SAT-MPL-20200526-00062 and SAT-APL-20210112-00007. 2023. Available online: https://docs.fcc.gov/public/attachments/DA-22-970A1.pdf (accessed on 20 May 2023).
  4. Pachler, N.; del Portillo, I.; Crawley, E.F.; Cameron, B.G. An Updated Comparison of Four Low Earth Orbit Satellite Constellation Systems to Provide Global Broadband. In Proceedings of the 2021 IEEE International Conference on Communications Workshops (ICC Workshops), Montreal, QC, Canada, 14–23 June 2021; pp. 1–7.
  5. Deng, D.; Zheng, Z.; Huo, M. A survey: The progress of routing technology in satellite communication networks. In Proceedings of the 2011 International Conference on Mechatronic Science, Electric Engineering and Computer (MEC), Jilin, China, 19–22 August 2011; pp. 286–291.
  6. Fuller, V.; Li, T.; Yu, J.; Varadhan, K. Classless Inter-Domain Routing (CIDR): An Address Assignment and Aggregation Strategy. Technical Report, RFC 1519. September 1993. Available online: https://www.rfc-editor.org/rfc/rfc1519.html (accessed on 20 May 2023).
  7. Zheng, K.; Hu, C.; Lu, H.; Liu, B. A TCAM-Based Distributed Parallel IP Lookup Scheme and Performance Analysis. IEEE/ACM Trans. Netw. 2006, 14, 863–875.
  8. Huang, J.Y.; Wang, P.C. TCAM-Based IP Address Lookup Using Longest Suffix Split. IEEE/ACM Trans. Netw. 2018, 26, 976–989.
  9. Zhou, S.; Prasanna, V.K. Scalable GPU-accelerated IPv6 Lookup Using Hierarchical Perfect Hashing. In Proceedings of the 2015 IEEE Global Communications Conference (GLOBECOM), San Diego, CA, USA, 6–10 December 2015; pp. 1–6.
  10. Zhang, Y.; Xu, M.; Chen, P.; Wang, N. IP Lookup Using Minimal Perfect Hashing. In Proceedings of the IEEE/ACM 24th International Symposium on Quality of Service (IWQoS), Beijing, China, 20–21 June 2016; pp. 1–2.
  11. Eatherton, W.; Varghese, G.; Dittia, Z. Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates. ACM SIGCOMM Comput. Commun. Rev. 2004, 34, 97–122.
  12. Lim, H.; Yim, C.; Swartzlander, E.E. Priority Tries for IP Address Lookup. IEEE Trans. Comput. 2010, 59, 784–794.
  13. Bando, M.; Lin, Y.L.; Chao, H.J. FlashTrie: Beyond 100-Gb/s IP Route Lookup Using Hash-Based Prefix-Compressed Trie. IEEE/ACM Trans. Netw. 2012, 20, 1262–1275.
  14. Dharmapurikar, S.; Krishnamurthy, P.; Taylor, D.E. Longest Prefix Matching Using Bloom Filters. IEEE/ACM Trans. Netw. 2006, 14, 397–409.
  15. Byun, H.; Li, Q.; Lim, H. Vectored-Bloom Filter for IP Address Lookup: Algorithm and Hardware Architectures. Appl. Sci. 2019, 9, 4621.
  16. Hennessy, J.L.; Patterson, D.A. Computer Architecture: A Quantitative Approach; Elsevier: Amsterdam, The Netherlands, 2011.
  17. Panda, P.R.; Dutt, N.D.; Nicolau, A. On-Chip vs. Off-Chip Memory: The Data Partitioning Problem in Embedded Processor-Based Systems. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 2000, 5, 682–704.
  18. Leon, A. Field Programmable Gate Arrays in Space. IEEE Instrum. Meas. Mag. 2003, 6, 42–48.
  19. Sterpone, L.; Porrmann, M.; Hagemeyer, J. A Novel Fault Tolerant and Runtime Reconfigurable Platform for Satellite Payload Processing. IEEE Trans. Comput. 2013, 62, 1508–1525.
  20. Siegle, F.; Vladimirova, T.; Ilstad, J.; Emam, O. Availability Analysis for Satellite Data Processing Systems Based on SRAM FPGAs. IEEE Trans. Aerosp. Electron. Syst. 2016, 52, 977–989.
  21. Bloom, B.H. Space/Time Trade-offs in Hash Coding with Allowable Errors. Commun. ACM 1970, 13, 422–426.
  22. Gupta, P.; McKeown, N. Algorithms for Packet Classification. IEEE Netw. 2001, 15, 24–32.
  23. Broder, A.; Mitzenmacher, M. Network Applications of Bloom Filters: A Survey. Internet Math. 2004, 1, 485–509.
  24. Geravand, S.; Ahmadi, M. Bloom Filter Applications in Network Security: A State-of-the-Art Survey. Comput. Netw. 2013, 57, 4047–4064.
  25. Fan, B.; Andersen, D.G.; Kaminsky, M.; Mitzenmacher, M.D. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies, Sydney, Australia, 2–5 December 2014; pp. 75–88.
  26. Kwon, M.; Reviriego, P.; Pontarelli, S. A Length-Aware Cuckoo Filter for Faster IP Lookup. In Proceedings of the IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), San Francisco, CA, USA, 10–14 April 2016; pp. 1071–1072.
  27. Cui, J.; Zhang, J.; Zhong, H.; Xu, Y. SPACF: A Secure Privacy-preserving Authentication Scheme for VANET with Cuckoo Filter. IEEE Trans. Veh. Technol. 2017, 66, 10283–10295.
  28. Grashöfer, J.; Jacob, F.; Hartenstein, H. Towards Application of Cuckoo Filters in Network Security Monitoring. In Proceedings of the 14th International Conference on Network and Service Management (CNSM), Rome, Italy, 5–9 November 2018; pp. 373–377.
  29. Pagh, R.; Rodler, F.F. Cuckoo Hashing. J. Algorithms 2004, 51, 122–144.
  30. Zhang, Y.; Qiao, L.; Hu, L.; Chen, Q.; Zou, S.; Liu, X. A Hybrid Scheme of Filter Implemented on FPGA for Faster IP Route Lookup. In Proceedings of the 2022 IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Beijing, China, 3–5 October 2022; pp. 1512–1518.
  31. Zhang, Y.; Qiao, L.; Wang, H. PR-Trie: A Hybrid Trie with Ant Colony Optimization Based Prefix Partitioning for Memory-Efficient IPv4/IPv6 Route Lookup. IEICE Trans. Inf. Syst. 2023, 106, 509–522.
  32. Fan, L.; Cao, P.; Almeida, J.; Broder, A.Z. Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol. IEEE/ACM Trans. Netw. 2000, 8, 281–293.
  33. University of Oregon Route Views Archive Project. 2023. Available online: http://routeviews.org/ (accessed on 20 May 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
