Leveraging the flexibility and adaptability of software development, two functional modules are deployed on the CPU side, the rule compilation module and the Snort module, to support the complete operation of the heterogeneous network intrusion detection system. Their functionalities are described in detail below:
3.3.1. Rule Compilation Module
This module compiles Snort rule data into segmented memory files for FPGA-based matching. To enhance overall matching efficiency and optimize FPGA memory utilization, the rule set undergoes preprocessing and compilation workflow optimization. The implementation integrates five-tuple segmented matching and fast-pattern bit-parallel matching designs.
As illustrated in Figure 3, when a Snort rule set is loaded into the compilation module, rule parsing is first executed to extract and convert fields such as the five-tuple (source/destination IP/port, protocol) and content. Subsequent pattern processing and rule optimization steps involve field splitting, wildcard expansion, and rule merging.
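As a rough illustration, the parsed fields might be collected into an intermediate record such as the following C sketch; the struct layout and field names are our own assumptions, not the authors' code:

#include <stdint.h>

/* Hypothetical intermediate form of a parsed rule; illustrative only. */
typedef struct {
    uint8_t  proto;                     /* 0x01 UDP, 0x02 TCP, 0x03 ICMP, 0x04 IP */
    uint32_t src_ip, dst_ip;            /* resolved from $HOME_NET etc. */
    uint8_t  src_len, dst_len;          /* CIDR prefix lengths */
    uint16_t src_port_lo, src_port_hi;  /* source port range */
    uint16_t dst_port_lo, dst_port_hi;  /* destination port range */
    uint16_t sid;                       /* Snort rule ID */
    char     fast_pattern[8];           /* extracted fast-pattern string */
} parsed_rule_t;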
A concrete example is provided to elucidate the compilation design. After parsing, the five-tuple information and SID are fed into the five-tuple compilation submodule (e.g., variables like $HOME_NET, $EXTERNAL_NET, and $SIP_PORTS are predefined in Lua configuration files and resolved during parsing). The compilation process handles ports, protocols, and IPs as follows:
Port Compilation. Snort rules define ports as three types: exact port numbers, port ranges, or any. To address mixed port specifications, an interval prefix mask is employed to partition ranges into subintervals and detect source/destination port matches. Since multiple rules may share overlapping port ranges, the compiler dynamically updates source/destination port rule tables, encoding matched rules as “address + offset” entries for efficient lookup.
As shown in Figure 4, for a rule with src_port: 80–1680, dst_port: 36–80, and sid = 4973, the compiler splits the source/destination port ranges into subintervals, computes maximal prefix masks, and inserts the rule ID into the corresponding port tables. Addresses and offsets in the port tables are updated to reflect all matching rules.
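The range-splitting step can be sketched as follows; this is a minimal illustration of the standard range-to-prefix decomposition the description implies, not the authors' implementation:

#include <stdint.h>
#include <stdio.h>

/* Split an inclusive port range [lo, hi] into maximal prefix
 * subintervals (base/prefix-length), as the interval prefix mask
 * described above requires. */
static void split_range(uint16_t lo, uint16_t hi)
{
    uint32_t cur = lo;
    while (cur <= hi) {
        /* Largest aligned block starting at cur: bounded by alignment... */
        uint32_t size = (cur == 0) ? 0x10000u : (cur & (~cur + 1u));
        /* ...and by the number of ports left in the range. */
        while (size > (uint32_t)hi - cur + 1u)
            size >>= 1;
        int prefix_len = 16;
        for (uint32_t s = size; s > 1; s >>= 1)
            prefix_len--;
        printf("%5u/%-2d covers [%u, %u]\n", cur, prefix_len, cur, cur + size - 1);
        cur += size;
    }
}

int main(void)
{
    split_range(80, 1680);  /* the src_port range from the Figure 4 example */
    return 0;
}

Run on [80, 1680], this prints nine subintervals: 80/12, 96/11, 128/9, 256/8, 512/7, 1024/7, 1536/9, 1664/12, and 1680/16; the rule ID 4973 would be inserted into the source-port table for each of them.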
Protocol and IP Compilation. Snort rules include four protocol types: UDP, TCP, ICMP, and IP. To improve matching efficiency, we designed a “protocol + IP” compilation mode: IP lookup tables are distinguished by protocol to narrow the scope of IP matching. The four protocols are represented as “0x01, 0x02, 0x03, and 0x04”, respectively. The IP address types in rule files are divided into precise IP addresses, CIDR [38] addresses, and IP lists. Due to their complexity, we uniformly convert IP addresses to CIDR format during compilation, store them in protocol-specific CIDR tables, and establish a rule table that stores the two groups of corresponding CIDR addresses and rule IDs to match the relevant rules. As shown in Figure 5, we detail this compilation design through the example rule “protocol:tcp, src_IP: 192.168.1.0/24, dst_IP: 10.0.0.5, sid = 4973”:
First, the rule is classified according to its protocol type; for example, the “TCP” protocol is encoded as “0x02”. Then, the source IP and destination IP are extracted from the rule and converted to CIDR form. Through the combination of masks and prefix codes, both precise IP addresses and CIDR-format IP address sets can be converted one-to-one into a 32-bit base address and a 16-bit mask length. The source IP and destination IP are stored in two separate CIDR tables; entries that duplicate previous content are merged. Finally, based on the CIDR addresses in the two tables, an entry in the format “SID_sourceCIDR_destCIDR” (48 bits total) is stored in the rule table. This completes the protocol and IP address compilation, finalizing the five-tuple rule compilation.
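A minimal sketch of this conversion follows; the split of the 48-bit entry into a 16-bit SID plus two 16-bit CIDR-table indices is our assumption, since the paper does not spell the layout out:

#include <stdint.h>

/* A CIDR entry: 32-bit base address plus prefix (mask) length. */
typedef struct {
    uint32_t base;
    uint8_t  len;
} cidr_t;

/* An exact IP such as 10.0.0.5 is simply a /32 prefix, so exact
 * addresses and CIDR sets map one-to-one onto this form. */
static cidr_t exact_to_cidr(uint32_t ip)
{
    cidr_t c = { ip, 32 };
    return c;
}

/* Hypothetical 48-bit rule-table entry, assumed here to be the SID
 * followed by the indices of the two CIDR-table entries. */
static uint64_t pack_rule(uint16_t sid, uint16_t src_idx, uint16_t dst_idx)
{
    return ((uint64_t)sid << 32) | ((uint64_t)src_idx << 16) | dst_idx;
}

For the Figure 5 rule, 192.168.1.0/24 is stored as base 0xC0A80100 with length 24 in the TCP source-CIDR table, 10.0.0.5 becomes 0x0A000005/32 in the TCP destination-CIDR table, and pack_rule(4973, src_idx, dst_idx) yields the 48-bit entry.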
After completing the five-tuple rule compilation, the system enters the fast-pattern compilation phase. Since nearly 70% of data packets require fast-pattern detection, the compilation quality of this module directly affects system throughput and detection accuracy, posing significant challenges to compiler design. To achieve 100 Gbps line-speed processing, we combine hash functions with bitmaps, exploiting the discretized distribution that hashing imposes on fast-pattern strings to realize memory-efficient access. The shift-or exact matching algorithm is incorporated to eliminate hash bitmap false positives and improve detection accuracy. Finally, we establish a strict binding mechanism between rule IDs and hash addresses to avoid mismatching risks. The three-part compilation design is as follows:
Hash Bitmap [39,40] Construction. Taking Figure 6 as an example, “Gizmo” is the fast-pattern string extracted by the compiler. First, the string is assigned to a designated bucket according to its length and then converted to ASCII values. Since the string is shorter than 8 bytes, zeros are padded into its higher-order bytes to keep the hash-function input a consistent width; padding with “0” also makes differences between results more distinct. Subsequently, hash computation is performed to generate bitmap indices. We designed two hash functions, one performing multiplication and the other hierarchical addition and bit shifting, with the aim of obtaining completely distinct hash addresses.
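The padding step described above might look like the following sketch (our illustration; the byte order within the 64-bit word is an assumption):

#include <stdint.h>
#include <string.h>

/* Pack a fast-pattern string of at most 8 bytes into a 64-bit word,
 * zero-padding the unused higher-order bytes. */
static uint64_t pad_pattern(const char *s, size_t len)
{
    uint64_t v = 0;
    /* copy the ASCII bytes into the low end; high bytes stay zero */
    memcpy(&v, s, len < 8 ? len : 8);
    return v;
}
/* pad_pattern("Gizmo", 5) yields 0x0000006F6D7A6947 on a little-endian
 * machine: 'G' = 0x47 sits in the lowest byte. */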
First, the 8-byte input is split into individual characters. To distribute the calculated addresses as evenly as possible, each character is multiplied by four predefined 16-bit constants split from the constant “0x0b4e0ef37bc32127”. The specific calculation is shown in Listing 1.
Listing 1. mul_hash formula.
uint64_t b = 0x0b4e0ef37bc32127;   /* predefined 64-bit constant */
uint16_t bn[4];
uint64_t tmpb = b;
for (int i = 0; i < 4; i++) {      /* split b into four 16-bit constants */
    bn[i] = tmpb;
    tmpb = (tmpb >> 16);
}
Mulhash_res res;                   /* holds the four products ab_n[0..3] */
for (int i = 0; i < 4; i++) {
    res.ab_n[i] = a * bn[i];       /* a: ASCII value of the character */
}
In the formula, a is the ASCII value of the character, and bn[i] represents the four 16-bit constants split from the predefined constant. Through calculation, each character obtains four corresponding multiplicative hash values, denoted as ab_n[i].
Then, the multiplicative values above are fed into the hierarchical addition and shifting stage. Combinations of adjacent characters and cross-character additions are computed; after hierarchical merging and extraction of the high-order bits, the final hash address is obtained through addition and shifting. The relevant calculations are shown in Listing 2.
Listing 2. acc_hash formula.
1. uint64_t a01_b0 = aibj[0].ab_n[0] + (aibj[1].ab_n[0] << 8);
2. …
3. uint64_t add_a01_b1_a23_b0 = a01_b1 + a23_b0;
4. …
5. uint64_t sum0 = (add_a01_b1_a23_b0 << 16) + a01_b0;
6. uint64_t sum1 = add_a01_b2_a45_b0 + a23_b1;
7. uint64_t sum2 = (add_a01_b3_a23_b2) % 0x10000;
8. uint64_t half_sum0 = sum0;
9. uint64_t half_sum1 = (sum2 << 16) + sum1;
10. uint64_t sum = half_sum0 + (half_sum1 << 32);
11. uint32_t addr = (sum >> (64 - NBITS));
In the formula, line 1 combines adjacent characters, synthesizing a 16-bit value through an 8-bit left shift and addition. Line 3 performs cross-character addition and cross-layer mixing, which increases bit diffusion and makes the distribution more uniform. Lines 5–7 perform hierarchical merging of the previously calculated values; the modulo operation on sum2 extracts its lower 16 bits. Lines 8–10 perform the final addition of the hash results, where the shifts prevent overlap in the lower bits from affecting the result. Line 11 obtains the hash address. Here, NBITS is 8, so the final hash address is 8 bits long. The higher 5 bits of the hash address are taken as the bitmap address, the lower 3 bits as the bitmap offset, and the corresponding bit is set to 1. At this point, hash bitmap construction is complete.
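The bitmap update described above reduces to a few lines; this sketch assumes a 32-row, 8-bits-per-row bitmap (32 × 8 = 256 = 2^NBITS entries):

#include <stdint.h>

#define NBITS 8                            /* hash address width, as above */

static uint8_t bitmap[1 << (NBITS - 3)];   /* 32 rows of 8 bits each */

/* Set the bitmap bit for an 8-bit hash address: the high 5 bits
 * select the row, the low 3 bits select the bit within it. */
static void bitmap_set(uint32_t addr)
{
    bitmap[addr >> 3] |= (uint8_t)(1u << (addr & 7u));
}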
Regarding hash collisions, our solution is to detect and report them during compilation and then resolve them by modifying the fast-pattern string length or content so that the colliding strings are reclassified into other buckets for computation.
Shift-or Mask [41] Calculation. Shift-or mask compilation is performed on the basis of bucket partitioning: fast-pattern strings of different lengths are assigned to different buckets, and independent mask bits are calculated and generated for each bucket, simplifying the mask generation logic. For long string inputs, the module retains only fixed offset bits while clearing the other offset bits, thereby preventing long patterns from interfering with short pattern matching. The specific compilation process again uses “Gizmo” as an example, with Table 3 showing the compilation results. According to its length, the string is assigned to Bucket 3 (corresponding to 5 bytes), with offsets numbered 0 to 4 from right to left. During computation, the compiler takes the lower 5 bits of each character’s ASCII value as the calculation parameter for the preceding character. For example, when “i” is followed by “z”, the index is calculated as index = (0x1A << 8) + 0x69. The specific mask bits are calculated from the character offsets: mask_bits = Offset * 8 + (7 - 3). In the formula, “8” is the total number of buckets, “7” is the maximum bucket index, and “3” is the index of the bucket to which the string is assigned. If the offset is 0, the index values 0x006F–0x1F6F are set to 0 at mask bit 4 to avoid affecting other string matches.
By calculating index values and mask values, the specific row and offset in the mif file can be determined, and the corresponding bit is set to “0”, indicating that this position allows a match hit.
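A rough sketch of this per-pattern mask compilation under the formulas above follows; the table sizing and the handling of the offset-0 case reflect our reading of the text (long-pattern truncation is omitted), not released code:

#include <stdint.h>

#define MAX_BUCKET 7
#define TABLE_ROWS (1 << 13)     /* 5-bit successor + 8-bit character index */

static uint64_t mask_table[TABLE_ROWS];

static void init_table(void)
{
    for (int i = 0; i < TABLE_ROWS; i++)
        mask_table[i] = ~0ull;   /* all ones: no position allows a match yet */
}

/* Compile one fast pattern into its bucket's shift-or masks; clearing
 * a bit to 0 marks "match allowed" at that index/offset (cf. the mif
 * file described above). */
static void compile_pattern(const char *pat, int len, int bucket)
{
    for (int i = 0; i < len; i++) {
        int offset   = len - 1 - i;              /* offsets count right to left */
        int mask_bit = offset * 8 + (MAX_BUCKET - bucket);
        if (offset == 0) {
            /* last character: clear the bit for every 5-bit successor value */
            for (int nxt = 0; nxt < 32; nxt++)
                mask_table[(nxt << 8) | (uint8_t)pat[i]] &= ~(1ull << mask_bit);
        } else {
            int index = (((uint8_t)pat[i + 1] & 0x1F) << 8) | (uint8_t)pat[i];
            mask_table[index] &= ~(1ull << mask_bit);
        }
    }
}
/* e.g. init_table(); compile_pattern("Gizmo", 5, 3); clears bit 28 at
 * index 0x1A69 for the "i"-followed-by-"z" pair (offset 3: 3*8 + (7-3))
 * and bit 4 at indices 0x006F through 0x1F6F for the trailing "o". */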
Rule Mapping. The purpose of rule mapping is to bind fast-pattern strings to rule IDs so that, when a match occurs, the rule ID can be retrieved quickly by a hash address lookup. The mapping operation uses the calculated hash address as the mif address, with the stored value being the corresponding rule ID.
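In sketch form (hypothetical names; the mif file is modeled as a flat array, and a 16-bit SID slot is our assumption):

#include <stdint.h>

#define NBITS 8

static uint16_t ruleid_mif[1 << NBITS];   /* one rule ID per hash address */

/* Bind a fast pattern's hash address to its rule ID at compile time. */
static void map_rule(uint32_t hash_addr, uint16_t sid)
{
    ruleid_mif[hash_addr] = sid;
}
/* At match time, reading ruleid_mif[hash_addr] recovers the SID. */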
Based on the above compiler design, we summarize in Table 4 the specifications of the mif files generated by each module when compiling 10,000 rules: