PASS: A Flexible Programmable Framework for Building Integrated Security Stack in Public Cloud

Fu, Wenwen; Yan, Jinli; Zhang, Jian; Sun, Yinhan; Wang, Yong; Zhang, Ziwen; Yang, Qianming; Wang, Yongwen

doi:10.3390/electronics14132650

Open AccessArticle

PASS: A Flexible Programmable Framework for Building Integrated Security Stack in Public Cloud

by

Wenwen Fu

^1,2,

Jinli Yan

^3,*,

Jian Zhang

^1,2,

Yinhan Sun

¹

,

Yong Wang

^1,2,

Ziwen Zhang

¹,

Qianming Yang

^1,2 and

Yongwen Wang

^1,2

¹

College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China

²

Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology, Changsha 410073, China

³

National Innovation Institute of Defense Technology, Beijing 100071, China

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(13), 2650; https://doi.org/10.3390/electronics14132650

Submission received: 12 May 2025 / Revised: 7 June 2025 / Accepted: 9 June 2025 / Published: 30 June 2025

Download

Browse Figures

Versions Notes

Abstract

Integrated security stacks, which offer diverse security function chains in a single device, hold substantial potential to satisfy the security requirements of multiple tenants on a public cloud. However, it is difficult for the software-only or hardware-customized security stack to establish a good tradeoff between performance and flexibility. SmartNIC overcomes these limitations by providing a programmable platform for implementing these functions with hardware acceleration. Significantly, without a professional CPU/SmartNIC co-design, developing security function chains from scratch with low-level APIs is challenging and tedious for network operators. This paper presents PASS, a flexible programmable framework for the fast development of high-performance security stacks with SmartNIC acceleration. In the data plane, PASS provides modular abstractions to extract the shared security logic and eliminate redundant operations by reusing the intermediate results with the customized metadata. In the control plane, PASS offloads the tedious security policy conversion to the proposed security auxiliary plane. With well-defined APIs, developers only need to focus on the core logic instead of labor-intensive shared logic. We built a PASS prototype based on a CPU-FPGA platform and developed three typical security components. Compared to implementation from scratch, PASS reduces the code by 65% on average. Additionally, PASS improves security processing performance by 76% compared to software-only implementations and optimizes the latency of policy translation and distribution by 90% versus the architecture without offloading.

Keywords:

public cloud; security stack; FPGA; SmartNIC acceleration; programmable framework

1. Introduction

Currently, the public cloud is becoming very popular with multi-tenant systems for dynamically delivering on-demand computing and storage resources, a process which effectively reduces both capital expenditure (CAPEX) and operational expenditure (OPEX) [1]. Since more and more enterprises and individuals are migrating their businesses and networks to the public cloud, building a robust network security environment for multi-tenant systems is becoming a critical challenge for public cloud providers [2].

There are three important security protection features in the public cloud. (1) Boundary protection. Considering that diverse attacks are launched from the external network, it is important to deploy network protection devices in the boundary of the public cloud. (2) Multi-function deployment. In the public cloud, the business of each tenant is conducted in a dedicated virtual network. There are different security function requirements for multi-tenant systems [3]. (3) Flow-based security service orchestration. Similar to the flow-based quality of service (QoS) in a network, the granularity of security protection is flow based in order to establish a good tradeoff between performance and security [4].

To satisfy the above requirements, the integrated security stack concept is proposed for highly efficient collaboration of multiple security functions on a single system. However, deciding how to implement an integrated security stack is an open quandary for users considering the function, performance, cost, etc. After researching the related work, we summarize that there are three typical roadmaps for building an integrated security stack:

(1) Customized hardware design. Traditional security middleboxes (e.g., firewall and intrusion detection systems) are built and optimized based on customized ASIC chips for specific purposes. Each individual middlebox provides high packet processing performance while exhibiting limited programmability [5]. The system-level overhead for building a security function chain is substantial due to connecting these security middleboxes with network cables.

(2) Software-only design. Network function virtualization (NFV [6]) has enabled public cloud providers like Amazon and Microsoft to deliver high flexibility through software-based network functions deployed on general-purpose servers [7]. Recently, some software solutions (e.g., DPDK [8], VPP [9], etc.) have been developed to accelerate packet processing with zero-copy and vectorized optimizations. While 40G/100G line-rate forwarding performance in Layer 2–3 could be reached with the above methods, software-only implementations face inherent limitations in throughput and latency when handling complex security processing tasks in Layer 4+ (e.g., stateful functions or deep packet inspection) [10]. Since many security functions are in-line deployed to process flows without a traffic mirror, existing software solutions struggle to satisfy service-level agreements (SLA).

(3) Software/Hardware co-design. In order to balance processing performance and flexibility, software/hardware co-design based on CPU and programmable hardware has emerged as a promising solution [11]. Current programmable hardware solutions include GPUs, FPGAs, network processors (NPs), and programmable ASIC chips. GPUs are typically deployed in look-aside mode, where the communication between the CPU and GPUs introduces relatively greater processing latency [12]. Programmable ASICs implement functions based on match-action rules, lacking support for complex operations (e.g., stateful processing) [13]. Compared to NPs, FPGAs use non-Von Neumann architecture, which has great potential in accelerating complex applications with lower process latency [14].

Compared with the three roadmaps above, the CPU/FPGA-based SmartNIC proposal could establish a good tradeoff between performance and flexibility. In this paper, we focus on accelerating security functions in the public cloud with CPU/FPGA-based SmartNICs. Microsoft has deployed FPGA-based SmartNICs in Azure cloud for years to accelerate its Virtual Filter Platform (VFP [15]), where cloud administrators map VFP-configured functions (e.g., ACLs, VNETs, and load balancers) into related actions implemented on SmartNICs. By aggregating rules from all functions through the group-based flow table (GFT) in SmartNICs, the processing performance is optimized with a fast hardware path [16].

Although the practices of Microsoft demonstrate advantages in their customized scenarios, there are two critical restrictions for network operators aiming to develop and deploy security functions on CPU/FPGA-based SmartNIC platforms. (1) Substantial development cost. Development based on CPU/FPGA-based SmartNIC platforms requires professional hardware/software co-design capabilities that most tenants and network operators do not have [17]. It is time-consuming and tedious to develop functions from scratch with low-level APIs to meet diverse security requirements. (2) Inefficient inter-function cooperation. Many designs emphasize improving the intra-function performance while disregarding the system-level co-design among functions. In such fragmented designs, there is a great deal of redundant logic, and the frequent cross-function communications are the key bottlenecks of system performance.

To optimize the system performance, we start from analyzing the features of security functions and there are three critical features summarized. (1) As for these security detection functions based on packet payloads, there is great overhead on mapping the security policies from users into executable security rules in the data plane. (2) Although the processing logic of each security function is complex, there are many shareable operations among these functions. (3) The security functions do not modify packet data. Also, the security function chain that each packet passes is determined by the flow identifications and the processing result in each function only determines whether to bypass the subsequent functions or not. These observations above provide the theoretical foundation for the architecture and optimization design in this paper.

We propose PASS, a flexible programmable framework for building integrated security stack in public cloud. PASS provides well-defined APIs for users to develop security functions efficiently while delivering high-performance processing ability with FPGA-based SmartNIC acceleration. By referring to the software-defined networking (SDN) architecture [18], PASS enables centralized security policy orchestration based on a security controller. It eliminates per-function configuration operations while improving the efficiency of security management.

The main characteristics of PASS include three aspects. (1) PASS divides the system architecture into three planes: control plane, security auxiliary plane and data plane. In order to support fast translation and distribution of security policies, PASS offloads these complex security policy actions into security auxiliary plane. It reduces the processing latency and generated policy data volume greatly compared with the typical two-layer architecture. (2) PASS optimizes the inter-function processing flow by extracting shared packet operations (e.g., packet parse, packet classification) as pre-processing modules. It provides users with flow-granularity orchestration of security function chains. All the shared control information is carried with a user-defined metadata to support the elimination of redundant processing logic. (3) PASS provides users with high-level APIs to hide the generic low-level logic into platform-specific libraries. It divides a typical security function into three stages: rule parsing, packet processing, and statistics reporting. By providing the generic logic (e.g., resource management, match algorithms, etc) as library, developers only need to focus on designing core processing logic and data structure. Moreover, the modules developed with the same APIs could be reused easily.

We implement PASS prototype on CPU/FPGA-based SmartNIC platform (based on FAST framework [19]). Based on this platform, three typical security functions (packet-filtering firewall, stateful firewall, and intrusion detection system) are developed with C/Verilog code. The advantages of PASS in the evaluation are summarized in three aspects. (1) Compared to the security functions developed from scratch without module reuse, PASS reduces the code by 65% on average. (2) Compared to software-only implementations, PASS improves security processing performance up to 76%. (3) Compared to the traditional two-plane design, where all the policy-related operations are executed in the centralized control plane, the latency of policy translation and distribution is reduced by up to 90% by offloading these operations into the security auxiliary plane.

Overall, the contributions are described as followed.

(1): We present the motivations of developing security functions based on CPU/FPGA-based SmartNIC platforms, and propose the design goals of programmable security development framework PASS (Section 2).
(2): We design the software-defined three-layer architecture of PASS, and provides the key optimizations and programming models (Section 3).
(3): We implement PASS framework and three typical security functions to verify the technical feasibility (Section 4).
(4): We build the experimental testbed and evaluations show that PASS optimizes the packet processing performance and policy distribution latency by 76% and 90%, respectively (Section 5).

2. Motivation

2.1. Programmable Platforms

Currently, network researchers build homogeneous or heterogeneous programmable platforms [20] to meet the growing demand for network programmability using CPUs, FPGAs, GPUs [21], programmable chips (PCs) [22], and network processors (NPs) [23]. In public cloud environments, there are four key requirements for programmable platforms to support the deployment of security-critical functions.

(1): High Performance. With the rapid expansion of business and resource scale in the public cloud, the throughput performance of network and security devices is required to reach 100G/400G. In addition, to satisfy the QoS (quality of service) requirements of latency-sensitive applications, the underlying network infrastructure should deliver data to the destination within the specified deadline. Currently, many security functions (e.g., firewalls, intrusion prevention systems, etc.) are deployed in the packet processing path using in-line mode [24], which increases end-to-end latency due to security processing overhead. Thus, enhancing security while ensuring low-latency packet transmission remains a challenge in the design and implementation of security functions.
(2): High Flexibility. In the public cloud, the security function chains traversed by flows from different or the same tenants are diverse. Moreover, the types of security functions to be deployed may change dynamically as the network state evolves [25]. Therefore, programmable platforms should support flexible security policy configuration and dynamic security function reconfiguration.
(3): Multi-function Support. Since public clouds serve the security needs of multiple tenants using shared resources, programmable platforms should support the simultaneous deployment of multiple security functions. This approach improves resource utilization efficiency while reducing latency caused by inter-function cooperation. Furthermore, strict resource isolation should be enforced to prevent interference among tenants.
(4): Low Cost. It is essential for public cloud providers to minimize cost while meeting user requirements. The primary costs associated with deploying programmable platforms include equipment modification and power consumption [26]. Thus, leveraging existing commercial server resources and deploying programmable devices in an incremental plug-in mode are considered optimal solutions.

The characteristics of different programmable platforms in existing research are compared in terms of performance, flexibility, cost, deployment point, and networking mode, as shown in Table 1.

(1): Performance. Although software processing performance can achieve 40 G/100 G line rate in L2–L3 forwarding with optimization techniques (e.g., DPDK, VPP), there are significant gaps between software and hardware solutions in L4+ layer applications. Programmable switches face similar limitations due to the lack of relevant actions. GPUs are deployed in look-aside mode and require frequent memory copy operations, resulting in high interaction latency. Thus, GPUs are more suitable for accelerating AI and big data applications. In contrast, FPGAs are deployed in in-line mode and provide a fast path for packet processing. With advantages in cost-effectiveness and power efficiency, FPGAs are popular for accelerating complex network applications with lower processing latency.
(2): Flexibility. Programmable switches implement functions with multi-stage match-action tables, which provide limited support for stateful functions. Similarly, ASIC-based switches or FPGA solutions are suitable for the implementation of simple functions. Complex functions are better handled by CPUs. FPGAs are popular as platforms supporting logic offload based on the characteristics of security functions.
(3): Cost. Radically replacing commercial switches with programmable switches in the public cloud incurs high costs. By contrast, deploying SmartNICs incrementally on host servers involves lower expenses. Generally, SmartNICs are implemented based on FPGAs or NPs. FPGAs are designed using a non-von-Neumann architecture, which makes them more suitable for accelerating packet stream processing. Furthermore, incremental FPGA deployment effectively minimizes CPU resource consumption.
(4): Deployment and Networking. The underlay networking mode is required for deploying security functions at the switch layer. It allows users to access physical switch resources directly. By contrast, overlay networking is necessary to deploy security functions at the host layer. Although the design complexity of endpoint systems increases, the overlay networking mode simplifies the deployment of security functions.

To sum up, the best platform selection for satisfying the four requirements mentioned above is the host CPU/FPGA-based SmartNIC. Currently, Microsoft has been deploying FPGA-based SmartNICs in its public cloud at scale for years. Microsoft’s solution provides a fast hardware path for specific scenarios. However, there is an urgent need to provide a programmable framework that allows users to develop high-performance network security functions through hardware/software co-design.

2.2. Network Security Function

In order to support the design of a programmable development framework, we conduct an in-depth study on the characteristics of network security functions. The processing workflow of security functions is abstracted into three sequential phases (Decision -> Execution -> Feedback). (1) Decision phase. User applications dispatch abstracted security policies after analyzing the current network states. These policies are then translated into executable security rules. (2) Execution phase. The data plane performs security analysis on the traffic according to the defined rules. This phase consists of protocol parsing, packet header-based detection, packet payload-based detection, and execution of security actions. (3) Feedback phase. The data plane reports statistical information to the control plane for further security analysis and policy decision-making.

Security detection and protection are carried out by performing these three phases cyclically, which helps prevent user networks from security attacks. Next, we analyze security functions from two aspects as follows.

(1) Security function deployment. (a) Diversity. The network topology and internal traffic patterns differ across multi-tenant environments, resulting in diversity in the types and deployment locations of security functions. In addition, the security detection requirements for different flows belonging to the same tenant may also vary. (b) Dynamics. On one hand, the types of flows on a link change as user applications start or terminate. On the other hand, the bandwidth of flows varies over time. For example, the number of data accesses during the day is significantly higher than at midnight. These two factors require security devices to support dynamic, flow-based orchestration of security functions.

(2) Security function design. (a) Security rules are complex, especially in packet payload detection applications. We take Snort [33] as an example. Each security rule consists of a rule header and multiple rule bodies. The rule header and bodies describe flow and attack features, respectively. Since the attack features are diverse, the translations from abstracted security policies to executable security rules introduce considerable overhead. (b) There are many shared operations among these complex security functions. Snort usually performs exact matching and regex matching on a packet multiple times. The stateful firewall (SFW) needs to support the management of TCP flow states. Much of the processing logic (e.g., packet parsing, matching, etc.) and related intermediate results can be shared among these functions. The exact and regex matching actions are the same across the Intrusion Detection System (IDS), the Web Application Firewall (WAF), and Data Loss Prevention (DLP), although their rule sets differ. (c) Security functions do not modify packet data. Unlike network functions (e.g., NAT, load balancer), which require modification of packet data (e.g., addresses, ports, etc.), security functions only need to match packet data and perform PASS/DROP actions. For network function chains, the action affects the function path that packets traverse. By contrast, security function chains are determined by flow features, which are unrelated to the actions.

2.3. Comparison of Typical Programmable Frameworks

In this paper, we aim to provide a flexible programmable security development framework based on hardware/software co-design within the SDN architecture. SDN decouples the network architecture into control and data planes, enabling centralized control policy generation and distribution via a logically centralized controller. It significantly improves network management efficiency by using global network resources as input. In this paper, three typical software-hardware co-design frameworks (VFP, OpenBox [27], and ClickNP [32]) are compared with PASS in Table 2.

(1) Microsoft VFP. It orchestrates network and security function chains based on a host-based SDN architecture. Network administrators deploy specific service function chains (SFCs) on VFP according to application requirements, where each function is managed by a dedicated controller to reduce inter-component dependencies and improve scalability. However, the architectural design of VFP is not intended to provide a unified programmable framework. First, it is tedious and time-consuming for network operators to develop and manage multiple controllers. In addition, distributing SFC policies through multiple controllers introduces greater complexity. Second, the inter-function connections are fixed, where the input/output flows traverse the same SFC in reverse order. This makes it difficult to support flow-based SFC deployment and orchestration.

(2) OpenBox. It presents an abstraction of packet processing applications for the development and deployment of network functions. In addition, it decouples the control plane of network functions from the data plane and allows the reuse of data plane elements by multiple logical NFs. However, as a general network function development framework, it lacks customized optimizations for security functions. For example, it is particularly important to provide design guidance for mapping security functions to hardware/software and control plane/data plane components.

(3) ClickNP. It focuses on accelerating network functions with programmable hardware. It provides a modular architecture, resembling the well-known Click model. In addition, it provides a high-level C-like language to program FPGAs efficiently and proposes a set of optimization techniques to utilize the parallelism in FPGAs and reduce I/O overhead. Since it focuses on providing a framework for developing network functions, optimizations for the design of security function chains and the control plane are not carried out in ClickNP.

2.4. Design Goals

In conclusion, developing a high-performance programmable framework is urgently needed for implementing security functions with the introduction of SDN and SmartNICs. By referring to the security function characteristics in Section 2.2 and the comparison in Table 2 in Section 2.3, we propose three key design goals for a software-defined, programmable security framework with SmartNIC acceleration.

G1: Dividing security functions between the control plane and data plane rationally.

SDN proposes to decouple the network architecture into control and data planes. However, the detailed division between control logic and execution logic should be determined based on specific scenario characteristics. Fresco [34] proposes a modular security development model for extending security functions within the controller. This design requires forwarding a large number of packets to the controller, which can easily make the controller a performance bottleneck. Avant-Guard [35] and OFX [36] propose migrating security processing logic from the control plane to the data plane, which greatly reduces cross-plane data volume. These studies mainly focus on optimizing packet data processing, while neglecting the complexity of security rule generation and distribution. Specifically, when the translation from high-level abstract policies to low-level executable rules is performed in the control plane, it introduces high cross-plane communication latency and generates a large volume of packet data. Therefore, it is particularly important to reconsider the rational division of functions between the control and data planes.

G2: Accelerating security function chains in the data plane.

The packet processing overhead in the data plane consists of I/O overhead and security processing overhead. I/O overhead arises from packet receiving and transmitting operations. Currently, existing I/O acceleration frameworks (e.g., DPDK [8], netmap [37], etc.) optimize I/O performance using zero-interrupt and kernel-bypass techniques. Compared to network functions, the proportion of packet data processing to I/O processing is higher in security functions due to the deep analysis of packet contents. In this paper, we emphasize reducing security processing overhead from both intra-function and inter-function perspectives.

Intra-function acceleration is achieved by offloading performance- and resource-critical logic to the SmartNIC. HEX [38] divides security processing into six phases and implements them on NetFPGA, with software applications only analyzing alert information. In practice, some complex security processing logic, such as packet payload intrusion detection, consumes significant resources and incurs high development costs. Thus, it is more suitable to implement such logic in software.

The inter-function acceleration focuses on optimizing the cooperation of SFCs. Although Openbox [27] proposes implementing shared header parsing and classification at the initial processing stage, there are two critical limitations. First, the connections between modules are fixed, which is not adaptable to the requirements of dynamically deploying SFCs. Second, intermediate results should be reused through rational hardware–software co-design.

Thus, there are still significant optimization opportunities for accelerating both intra-function and inter-function processing, which are deeply explored in this paper.

G3: Improving development efficiency with well-defined APIs.

Besides performance optimizations, designing a SmartNIC-accelerated security development framework requires enhancing user development efficiency. Specifically, for some complex security functions (e.g., Snort), where the operations optimizable by hardware acceleration are limited, providing well-defined APIs to accelerate the development cycle is very important. Here, the southbound protocols and related APIs between the control and data planes have a significant impact on development effort and complexity.

As we know, OpenFlow [39] is the most widely used southbound protocol in network management. However, it focuses on protocol universality while lacking support for security functions. OFX supports the development and deployment of security functions by extending the flow tables and actions of OpenFlow. Although this evolutionary design is compatible with OpenFlow, it lacks a detailed protocol specification to support functionality and reliability. As for programming interfaces, some studies provide users with high-level APIs by customizing dedicated operating systems for specific scenarios. For example, mOS [40] offers users flow state management services, where security functions are efficiently developed based on state events. Therefore, in order to accelerate security function development, it is important to enable users to focus on designing core data structures and processing logic by abstracting away underlying communication, resource management, matching algorithms, etc.

3. PASS Design

3.1. Overview

By referring to the SDN architecture and the features of security functions, the PASS framework is divided into three collaborative planes, as shown in Figure 1. The security controller runs in the control plane for global security management. The newly added security auxiliary plane offloads high-overhead operations from the PASS agent. Hardware/software security processing on packet data is performed in the data plane. Cross-plane communication is implemented based on the customized PASS protocol.

Security Controller. It processes the input security policies from the network manager and the statistical data reported from the data plane. In order to minimize the latency of policy distribution and reserve more computing resources for alert analysis, the controller only needs to dispatch abstract policies to the security auxiliary plane.

PASS Agent. It supports high-overhead control logic offloading with three functions. First, it accelerates security rule translation and distribution using cached security rule sets. Second, it supports dynamically reconfiguring the security functions. Finally, the data volume is reduced by compressing the security statistics through the PASS agent.

Security Functions. PASS divides a security function into a software security function (sSF) and a hardware security function (hSF) rationally, referred to in Section 3.3. The sSFs and hSFs are mapped to the CPU and FPGA-based SmartNIC, respectively. The sSFs are developed with PASS APIs, and their run-time environment is OS process. The OS process is selected because it provides a good trade-off between inter-sSF isolation and processing efficiency. Packet forwarding among sSFs is implemented via virtual switches (vSwitches). The hSFs are developed and deployed as FPGA modules, and the rules in hSFs are configured via the corresponding sSFs.

3.2. Unified Rule Management in PASS Agent

In a typical SDN architecture, controllers translate user-defined high-level security policies into matchable rules in the data plane. Notably, the number of received rules and the rule storage algorithms vary across different security functions. Thus, for an SFC policy, the positions for storing the mapped rules in different functions may differ. For example, a network operator inputs a policy to control flow A traversing a packet-filtering firewall (PFW) and an IDS in sequence. The security controller maps this policy into two security rules and dispatches them to the PFW and IDS, respectively. Since the 1st entry of the PFW and the 2nd entry of the IDS are already occupied, the newly received rules are stored in the 2nd entry of the PFW and the 3rd entry of the IDS, respectively, as shown in Figure 2a. There are three limitations in the design described above: (1) Rule translation for payload-based detection applications (e.g., IDS) requires looking up content-based rules from a database. In most cases, the number of mapped rules and the distribution latency are high. (2) The entry positions used to store rules for functions within the same SFC differ, which makes reuse of matching results impractical. (3) The entire rule headers must be carried when delete instructions are issued.

To overcome these challenges, policy translation and unified indexing for security rules are implemented in the PASS agent, as shown in Figure 2b. In our design, the controller only distributes user-defined policies to the agent. The agent decomposes the policy when a flow traverses multiple security functions. It prefetches all security rules from the controller and provides local access services for both the agent and security functions. The agent allocates a globally unique identifier to each security rule. Thus, it only needs to dispatch the rule identifier to the related security functions. In detail, a global rule table (GRT) is built in the agent for allocating unique identifiers (fid) to each rule. Considering the control plane may perform frequent addition and deletion operations on security rules, we design an algorithm to allocate the entry with the smallest fid to improve the resource utilization of rule tables. The PFW and IDS store the security rules in the table according to the received fid without extra computing operations.

There are three advantages in our design. First, the rule translation latency and rule data are greatly reduced in the control plane. Second, the globally unique fid can serve as the index for rule lookup in inter-function cooperation. Third, rules can be deleted using their fids without requiring the complete rule information.

3.3. Mapping Security Function into SW/HW

In our design, the processing model of security functions is abstracted into two paths, as depicted in Figure 3. The first path consists of “protocol parse -> header-based detection -> action”. Similar to the OpenFlow protocol, the protocol parser identifies key metadata (e.g., packet type, address, etc.) for the subsequent modules. Header-based detection is categorized into stateless and stateful detection. Stateless detection performs matching operations on packet headers, while stateful detection requires executing state analysis on the managed flow states. The post-detection action consists of two parts. First, it generates statistics for further analysis in the controller. Second, it performs packet forwarding or drop operations. Path 1 is suitable for implementation on FPGA-based SmartNICs due to the limited number of detection fields and rules. L3–L4-related security functions (e.g., PFW, DDoS detection systems, and SFW) can be mapped onto Path 1.

Compared to Path 1, the more complex payload-based detection is performed in Path 2. Path 2 is abstracted as “protocol parse -> header-based detection -> payload-based detection -> action”, where the protocol parsing and header-based detection are performed in FPGA, as in Path 1. Since the processing logic and rules in payload-based detection are far more complex, implementing them fully in FPGA is impractical due to the high development and resource costs. Thus, it is more suitable to implement them in software or accelerate partial logic with hardware.

In order to shorten the processing path by eliminating redundant logic, we design a metadata structure and attach it in front of each packet to transmit intermediate results. The metadata is composed of generic fields and user-defined fields, as shown in Figure 4. The generic fields include fid, path, and action. fid is used to distinguish different flows, while path contains information about the security function chain. The action field includes To_CPU, To_PORT, and DROP. The user-defined fields are used to store intermediate results. The information extracted or generated by former security functions can be reused in subsequent security functions. In particular, the security devices are usually deployed at the network edge. Thus, metadata is used for cooperation among security functions within a device. When the packet is transmitted to the network or end system, the metadata in each packet is removed.

3.4. Cooperation Between Security Functions

PASS enables cooperation among security functions within a single device. It effectively eliminates cross-device latency in a SFC and supports shortening the processing path. Since PASS divides a security function into sSF and hSF, there are two HW/SW interaction modes when a flow traverses a SFC.

In mode 1, the complete processing logic in each function is executed sequentially, as illustrated in Figure 5a. In detail, the “hSF -> sSF -> hSF -> sSF” path involves four I/O communications between HW and SW. The number of I/O accesses continuously doubles as the SFC length grows. In mode 2, packets are forwarded to sSFs only after all the hSFs in the SFC have processed them, as illustrated in Figure 5b. The “hSF -> hSF -> sSF -> sSF” path involves only two I/O communications between HW and SW. The number of I/O accesses remains unchanged as the SFC length grows. Compared to mode 1, we adopt mode 2 to reduce I/O access. In our design, all the intermediate results of hSFs are stored in the metadata (as described in Section 3.3) to provide reusable information directly to sSFs.

Both the stateless and stateful detection functions include L3–L4 packet header matching functions. Since packet header parsing and classification are shared operations, we extract these operations as a shared hSF in the pre-processing stage. The pre-process sSF is used to configure the rule tables in the pre-processing hSF. In addition, SFC orchestration can be completed in the pre-processing stage because the SFC that a flow traverses depends only on the flow features. The header-based classification table consists of packet header features as keys (such as five tuples), fid, and path. When a packet is processed by the pre-processing hSF, the metadata attached to the packet carries the fid and path. Each subsequent hSF determines whether to process the received packet by matching the carried path information with its built-in path selector module. If a packet does not need to be processed by a subsequent sSF/hSF according to previous actions, the corresponding sSF/hSF identifiers are removed from the path information. Furthermore, all subsequent sSFs/hSFs can index the related entries based on the fid without relying on unnecessary flow features.

We take an example to illustrate the PFW+IDS chain in Figure 6. (1) The operators input a security policy to the security controller. The policy is that the HTTP packets from A to B should go through PFW (No.1) and IDS (No.3) successively. The controller distributes the policy to all the security devices in the data plane. (2) Upon receiving this policy, the PASS agent computes the fid, path, and rules according to this policy. These rules are then dispatched to the shared Pre-process sSF, PFW sSF, and IDS sSF, respectively. (3) The classification table in the shared Pre-process sSF stores the key (flow feature), fid, and path. When packets from A to B arrive, the second flow entry is hit. It indicates that the fid and path are 1 and (1,3), respectively. (4) The path selector module in each hSF uses the path to decide whether this packet should be processed by that hSF. The PFW hSF performs fid-based filtering, and the resulting action indicates to pass this packet. (5) Since the SFC does not involve SFW, the path selector in the SFW hSF transmits this packet directly to the IDS hSF. (6) The IDS hSF analyzes whether it is an HTTP packet and stores the extracted HTTP type (GET or POST) in the user-defined metadata for further payload-based detection in the IDS sSF. (7) When the packets are mirrored to the IDS sSF, payload-based security analysis is performed based on the cached rule bodies set, using the carried fid as an index. If threats are detected, alerts and logs are reported to the controller for further actions.

3.5. PASS Programming Model

In order to improve development efficiency, PASS provides a unified programming model to support module reuse among different users. PASS is a modular development framework, where security functions are divided into user-specific modules and platform-specific modules, as shown in Figure 7. The PASS APIs and the PASS southbound protocol are the core components of the PASS programming model.

(1) PASS APIs. Platform-specific modules hide the underlying complex implementation by providing standardized interfaces. It consists of the FAST library, FAST OS, and PASS library. The FAST library and OS are provided by the open-source FAST project [19]. The complex implementations of PCIE/DMA/I/O drivers are transparent to developers via the FAST library and OS. We provide PASS APIs to hide communication between security functions and the agent, CPU resource management, complex matching algorithms, etc. The main PASS APIs are listed in Table 3. This allows developers to focus on the core security logic development. Developers only need to design and register three user-defined callbacks to the platform. The platform allocates CPU resources and creates threads to run them. Specifically, rule_mgt_callback() is used to parse and configure rules into the rule tables of sSF and hSF. pkt_handler_callback() is used to perform complex logic (such as payload matching) with security rules. log_mgt_callback() is used to collect and analyze log information. The packet parsing and classification in hSFs are platform-specific modules. In addition, we provide Stride BV matching algorithms as IP blocks. The input to each hSF is metadata and packet. Therefore, users only need to design the hardware rule table and core state machine.

We program a simplified Snort with PASS as an example in Figure 8. The Snort security functions are divided into IDS sSF (left part) and IDS hSF (right part). (1) IDS sSF. The developers only need to implement three callback functions, including ids_rule_callback(), ids_pkt_callback(), and ids_log_callback(). Specifically, PASS provides users with shared functions for exact match and regex match on the packet payload. (2) IDS hSF. Since the packet header-based parsing and classification are performed in the pre-process hSF, the IDS hSF determines whether to process the received packets based on the path information in the input metadata. Then, IDS hSF extracts the application-level protocol fields (e.g., HTTP) and determines whether to direct the packets to the IDS sSF based on ids_hw_flow_table. In particular, PASS provides users with reusable lookup algorithms (e.g., Stride BV).

(2) PASS Southbound Protocol. Existing security function development frameworks support cross-plane communication by extending the OpenFlow experimental messages. Currently, different developers propose various extension proposals due to the lack of unified specifications. In this paper, we propose the PASS southbound protocol based on an analysis of the characteristics of security functions. The PASS southbound protocol format consists of a general packet head and a sub-protocol packet, as shown in Figure 9. (a) General packet head. It includes protocol version, packet type, and packet length. (b) Sub-protocol Packet. The sub-protocol packets store function-specific messages. There are six types of sub-protocol packets, as shown in Table 4. Each sub-protocol packet consists of a customized packet head and body.

We take the rule management message as an example in Figure 9. The packet head contains Device_ID (the identifier of the device that receives this message), Rule_Type (the type of rule, e.g., PFW, SFW, or IDS), Rule_Op (the rule operation, e.g., add, delete, or update), Rule_Group_ID (the group identifier of the rule), Rule_Total_Num (the number of rules in the same group), and Current_Rule_ID (the index of the current rule in this message). To improve lookup efficiency, the rule bodies associated with the same rule header belong to the same rule group. The rule information is described using a Key -> Value format based on JSON for high flexibility.

4. PASS Implementation

The PASS implementation consists of the security controller, the security agent, and the security functions.

In the control plane, we develop a lightweight security controller in Java (2000 lines of code). We select Java for three main advantages. (1) Rich libraries for network programming, e.g., RESTful APIs and the OSGi framework. These libraries accelerate the development of security controller functions, such as southbound protocol parsing and northbound API design. (2) Cross-platform portability. Java-based controllers can run on different operating systems without rewriting code for specific hardware. This is crucial for building cross-vendor, multi-data-center SDN architectures. (3) Distributed clustering. Java’s distributed computing frameworks enable multi-node controller clusters with high availability and load balancing. This supports the development of distributed security controllers to enhance robustness in the future.

Although there are already many open-source SDN controllers (e.g., Floodlight [41], RYU [42], ODL [43], etc.), we choose to develop a controller from scratch for two reasons. (1) PASS uses a customized southbound communication protocol, whereas these open-source controllers are based on the OpenFlow protocol. In the future, the PASS protocol will be embedded into OpenFlow to improve compatibility. (2) We offload many security-related functions into the security auxiliary plane to optimize latency. However, the workload of secondary development based on open-source controllers is substantial because they contain many complex components and function calls.

In the security auxiliary plane, we implement the PASS agent in C (800 lines of code). Currently, the PASS agent supports rule management, software function start/stop, and other features. Since the PASS agent is designed based on an event-driven architecture, more functions can be flexibly extended by defining new events and registering related callbacks.

In the data plane, three security functions are developed, including PFW, SFW, and IDS. (1) PFW supports filtering packets by matching five-tuples with masks. Parallel lookup is implemented on FPGA (using the Stride BV algorithm), while CPUs execute rule and statistics-related tasks. (2) SFW supports complete state-based filtering based on the establishment and termination of the TCP protocol. The related state management and filtering features are implemented on FPGA. (3) IDS supports the core processing logic of Snort, such as exact and regex matching on the payload. Packet parsing and header-based detection are performed on FPGA, while the remaining complex logic (payload-based detection, rule and statistics management) runs on CPUs.

Development efforts. We implement these three functions on a CPU/FPGA-based SmartNIC using two solutions. (1) Strawman. The functions are implemented based on the FAST API without module reuse. FAST provides a general-purpose framework for hardware/software co-design. It hides the complex operations of DMA, PCI-E, and Linux kernel implementation from users. (2) PASS. The functions are implemented based on the PASS API with module reuse. PASS APIs are implemented by extending the security-related libraries based on FAST APIs. We compare the lines of code (LoC) for both software and hardware in Table 5. The experimental results show that PASS can reduce the code volume by an average of 65%.

Resource Utilization. The FPGA logic utilization is shown in Table 6. We categorize FPGA resources into three types: Device-Specific Module, Platform-Basic Module, and Function-Specific Module. (1) Device-specific modules. These are specific to the FPGA hardware, including the Ethernet ports, debug units, etc. They consume 27% Slice LUTs (6931) and 12% Block RAM Tiles (10). (2) Platform-basic modules. These refer to the FAST OS (as depicted in Figure 7), including PCIe, DMA, etc. They consume 52% Slice LUTs (13,376) and 45% Block RAM Tiles (37). (3) Function-specific modules. These include pre-process hSF, PFW hSF, SFW hSF, and IDS hSF. They occupy 21% Slice LUTs (5237) and 43% Block RAM Tiles (34.5). As shown in Table 6, the Function-specific modules occupy the fewest Slice LUTs, while Platform-basic modules consume the most. Thus, extracting shared logic into Platform-basic modules is important for reducing development efforts.

5. Evaluation

5.1. Experimental Setup

We set up two experimental testbeds to evaluate the performance improvement on the FAST-based network experimental platform, as depicted in Figure 10.

Testbed 1. It is used for performance evaluation of PFW-only, IDS-only, and PFW-IDS. Since the PFW and IDS security functions are stateless, we use a network emulator to generate user-defined traffic and test the round-trip time (RTT). It consists of a security controller, a PASS prototype, and a network emulator. The PASS prototype and network emulator are built on a Xilinx Artix-7 FPGA connected to an ARM Cortex-A9 CPU (866 MHz, single core with two hardware threads) via PCIe. In the PASS prototype, one thread is allocated to the OS while the other is used for running sSFs. The network emulator is implemented based on the open-source Project FAST-ANT [44]. It supports precise packet TX/RX service with ∼10

μ

s jitter. All the links between the PASS prototype and the emulator are 1 GE fibers.

Testbed 2. It is used for performance evaluation of SFW-only and SFW-IDS. Since the SFW security function depends on the analysis of TCP connection establishment, we use a commercial off-the-shelf (COTS) server to act as both the TCP server and client. We use iperf to generate parallel TCP streams for testing throughput performance. In order to test the handshake time and the maximum number of connections per second, we deploy an Apache HTTP server and use a tool to initiate a large number of TCP connections within one second.

In order to validate the performance optimization of PASS comprehensively, three types of experimental scenarios are designed as follows.

Since PASS decomposes security functions into sSF and hSF, we compare the performance improvement of a single security function with and without hardware acceleration.
Since PASS proposes a high-efficiency SFC cooperation model, we compare the performance improvement between hardware-accelerated SFC and software-based SFC.
Since PASS offloads the security policy translation to the security agent, we compare the latency before and after offloading these operations.

5.2. Single Security Function Acceleration

In order to compare the performance improvement with hardware acceleration, we implement the PFW, SFW, and IDS based on software. The packet forwarding performance consists of I/O performance and processing performance. In this paper, we focus on improving packet processing performance. Before analyzing the experimental results, the performance measurement methods are described as follows. (1) In order to remove the overhead introduced by packet I/O, we test the forwarding bandwidth and latency of hardware direct forwarding and software direct forwarding as the baseline. The packet performance improvement is obtained by computing the difference between hardware security processing and software security processing after subtracting the direct forwarding overhead. (2) Since the internal processing latency in different FPGAs varies, we test and subtract the loopback latency of the network tester from the total latency. Besides, we try to measure the non-blocking latency by setting the packet sending interval to 1 ms. When packets are blocked in the socket queue, the queuing delay becomes significant and is considered part of the I/O overhead.

PFW performance. We test the throughput and latency of HW PFW and SW PFW with 256/512/1024/1500B packets, respectively, in Figure 11. The throughput of Emulator Loopback, HW Direct FWD, and HW PFW FWD reaches line speed. In contrast, the throughput of SW Direct FWD and SW PFW FWD increases as the packet size increases. The performance improvement ranges from 13% to 38% when comparing the results before and after hardware acceleration (excluding I/O overhead). The latency of HW Direct FWD and HW PFW FWD is the same and exhibits no jitter. The latency of SW Direct FWD and SW PFW FWD also increases with packet size. This is because of frequent buffer allocations/de-allocations and memory copies between kernel and user space. The memory copy overhead is directly proportional to the packet size.

SFW performance. We use iperf to establish 8 TCP connections and measure the throughput under different MSS (Max Segment Size), as shown in Figure 12a. The HW SFW FWD achieves line-rate forwarding at any packet size. The performance improvement is 20.5–28% before and after hardware acceleration (excluding I/O overhead). The CDF (Cumulative Distribution Function) of TCP connection establishment time is computed under 1000 connections, as depicted in Figure 12b. The establishment times of HW SFW FWD and HW Direct FWD are 15.5 µs and 14.7 µs, respectively. As for SW SFW FWD and SW Direct FWD, the establishment times are approximately 100 µs and 76 µs when the probability is 80%. Furthermore, we analyze the number of TCP connections established when initiating 20,000 TCP requests per second. This experiment is performed 100 times for each use case, and the results are shown in Figure 12c. The maximum number of TCP connections for HW SFW FWD and HW Direct FWD is around 8000 with high probability, while those of SW Direct FWD and SW SFW FWD are about 5500 and 2400, respectively.

IDS performance. The workload of IDS depends on the input packet size and rule set. In this experiment, IDS performs an exact match and a regex match, respectively, on the payload of each packet under different packet sizes using the same security rules. The throughput and latency of IDS are shown in Figure 13. The bandwidth and latency gap between HW-SW IDS FWD and SW IDS FWD becomes smaller as the packet size increases. The reason is that the proportion of optimized overhead from header-based detection via hardware acceleration is reduced, while the proportion of payload-based detection overhead increases. Thus, the throughput improvement from hardware acceleration ranges from 7% to 20%, and the latency is reduced by 3% to 18%. Since not all traffic is required to pass through IDS in actual scenarios, flows traversing the IDS function are filtered by packet parsing and matching in the FPGA. Processing performance is optimized by directing the specified traffic to software. Moreover, the development effort required to design a complex security function is greatly reduced with PASS APIs.

Summary. For single security function acceleration, the performance improvement benefits from offloading packet header-related functions to hardware. Since the packet header fields can be easily extracted based on offset and length, FPGA has great potential to perform operations related to packet header parsing and matching.

5.3. Security Function Chain Acceleration

We design two security function chain use cases (PFW-IDS and SFW-IDS) to evaluate performance optimization. The corresponding software-only functions are implemented as a reference.

PFW -> IDS. The throughput and latency under different packet sizes are measured, as shown in Figure 14. Compared to the single IDS function, the performance improvement of PFW -> IDS is more significant. The throughput increases by 14% to 50%, while the latency is reduced by 8% to 26%.

SFW -> IDS. We establish 8 TCP connections with Iperf and measure the throughput under different MSS values, as shown in Figure 15a. Since the processing logic of SFW is more complex than that of PFW, the throughput improvement from PASS acceleration is greater (26% to 77%). In addition, we depict the CDF of the establishment time for 1000 TCP connections in Figure 15b. The handshake latency of HW-SW SFW-IDS and SW SFW-IDS is 90 µs and 140 µs at a probability of 80%.

Summary. Unlike single-function acceleration, the performance improvements of security function chain acceleration benefit from compressing the packet processing path and reusing intermediate results. Thus, as the length of the security function chain increases, the performance gains continue to grow. In addition, it is of great significance to analyze and extract the reusable processing logic among different security functions.

5.4. Control Plane Acceleration

We design three solutions for distributing security policies to compare the performance improvements.

(1): Solution 1 (S1): The content-based rule database is built in the controller. The translations from input policies to rules are performed in the controller, followed by dispatching these rules to the agent.
(2): Solution 2 (S2): The content-based rule database is built in the agent. The controller looks up the rule identifiers according to the user policies and dispatches these grouped identifiers to the agent.
(3): Solution 3 (S3): The content-based rule database is built in the agent. The agent performs the translation and distribution according to the policies received from the controller.

In this experiment, 1,800 rules are selected from the Snort rule library and classified into five categories: information leakage, code execution, Trojan attacks, botnets, and buffer overflow attacks. The security policies for these attacks are distributed to the agent sequentially. We provide statistics on the data volume and latency for the three solutions, as shown in Table 7. Latency is measured from the time the controller receives the security policies to the time it receives the corresponding ACK messages.

The experimental results show that as the number of rules increases, the data volume and latency in S1 and S2 increase, while those in S3 remain unchanged. S1 and S2 dispatch raw rule data and rule identifiers, respectively. When the packet length exceeds the MTU, the packet must be fragmented and reassembled by the protocol stack. By contrast, the packet size in S3 is only 0.4 KB, avoiding complex fragmentation and reassembly because S3 transmits abstracted policies. Compared to S1 and S2, S3 reduces latency by 82% and 65%, respectively. Notably, since the cached security rules can be shared between the agent and sSFs, the agent only needs to configure rule identifiers on sSFs without sending complete rule data.

Summary. Compared to network functions, security functions have more complex rule sets for detecting diverse security attacks, especially in the rule bodies. By caching the full or frequently used rule sets in the agent of each PASS-based security device, the communication overhead between the control plane and the data plane can be greatly reduced. Furthermore, all packets can be processed in the data plane without submitting raw packet data to the control plane.

6. Discussion

Trade-offs of PASS. (1) Customization vs. Generalization. As described in Section 3, we chose to customize the southbound protocol and packet metadata to improve processing performance and resource utilization. As a result, the PASS framework is not easily compatible with other SDN programming frameworks based on the OpenFlow protocol. This means that all software in both the control plane and data plane must be replaced to deploy the PASS framework. (2) Scalability of the control plane. In this paper, we focus on optimizing the latency of rule translation and distribution and propose offloading many controller operations to the security agent. Furthermore, improving the robustness of PASS by supporting scalability with multiple controllers is important. In the future, we will place more emphasis on designing communication protocols for controller-to-controller and agent-to-multi-controller communication.

Security for PASS. Since PASS provides physically shared security resources for multi-tenants, it is important to allocate logically exclusive resources to different tenants. The flow identification (fid), as a key field in the shared metadata, determines the mapping between security rules and packets. When the security controller dispatches a security policy, the tenant identification is also carried to the security agents. The agents allocate conflict-free fids for flows from multiple tenants. Therefore, flows from different tenants traverse different paths and hit different entries in the data plane. In addition, the agent supports developers in designing other fid allocation algorithms via open APIs. In the future, we will study trust models to enhance the security of PASS under adversarial scenarios.

Adaptability to Dynamic Scenarios. In multi-tenant scenarios, security policies are often updated dynamically in response to changes in traffic and threat conditions. In PASS, the controller only dispatches abstracted policies, while all translation and distribution operations are performed by the agents on security devices. According to Table 7, the policy distribution latency remains at 97 ms, unaffected by the number of rules. In addition, to minimize the impact on data-plane forwarding during policy updates, each rule entry in hardware and software includes a status flag. If the status is 1, the current entry is considered valid. Therefore, before new rules take effect, the old rules remain operational. In the future, we will conduct further analysis and optimization of bottlenecks under dynamic scenarios.

7. Related Work

(1) Optimizations on the Data Plane

Research on data plane optimization can be categorized into network I/O acceleration and packet processing acceleration. Network I/O acceleration aims to improve packet transmission and reception throughput through optimized I/O software frameworks (e.g., DPDK, PF_RING [45], Netmap). Packet processing acceleration focuses on reducing the computational and storage overhead caused by complex packet processing logic. In this paper, we focus on accelerating packet processing. Zero-interrupt and zero-copy optimization techniques are orthogonal to our work; these I/O optimizations will be integrated in future work to further enhance performance.

The related work on network processing acceleration can be categorized into three dimensions: (1) Architecture-level optimization. NetBricks [28] reduces resource isolation overhead by replacing VMs with lightweight containers. VPP improves cache hit rates through batch processing. NFP [46] enhances resource utilization by parallelizing multiple network functions across multiple cores. Since security functions typically do not modify packets, their parallelism can be further improved. (2) Application-level optimization. OpenBox abstracts processing workflows into fine-grained graphs and extracts shared operations to eliminate redundancy. Inspired by this approach, PASS abstracts HW/SW shared modules (e.g., packet parsing, header classification) into a pre-processing module and defines a control block to carry key intermediate metadata. (3) Heterogeneous processing optimization. Numerous studies explore accelerating NFV using GPUs or FPGAs. PacketShader [31] and NBA [47] propose GPU-based network function acceleration. These GPUs are deployed in a look-aside mode, where the NIC cannot perform DMA directly to GPU memory. Although ClickNP [32] maps network functions to both CPU and FPGA, it lacks guidance on rational function partitioning between software and hardware. VFP provides a hardware fast path for network processing using FPGAs; however, it does not elaborate on hardware/software co-design. In this paper, PASS addresses these limitations by abstracting security function processing into two paths and proposing design specifications that enable collaboration between software and hardware as well as among different functions, guided by the characteristics of security functions.

(2) Optimization of the control plane

Although SDN decouples the control and data planes, there are no standardized specifications for how to map security functions across these two planes. If a large number of packets are forwarded from the data plane to the control plane, the centralized controller can become a performance bottleneck. DIFANE [48] and DevoFlow [49] propose caching control rules in the data plane, allowing all packets to be processed locally. In addition to avoiding forwarding packets to the control plane, PASS offloads policy translation to the security auxiliary plane to reduce latency and minimize interaction data volume.

(3) Optimization of development efficiency

In addition to performance optimization, existing research also focuses on improving development efficiency and reducing complexity. Since middleboxes are required to support complex state-related management, mOS provides developers with abstracted fine-grained flow events through a customized network stack. Developers only need to select the appropriate state event and register user-defined callbacks. Similarly, the PASS agent is designed with an event-driven architecture and supports offloading more workload to the security auxiliary plane by defining events and registering related functions. OFX supports security function development by extending the OpenFlow protocol; however, security functions must be developed from scratch, incurring significant development costs. PASS addresses these challenges by providing high-level, well-defined APIs. These APIs are designed by abstracting the security function processing model and hiding complex and shared packet operations.

8. Conclusions

Developing security stacks that integrate multiple security function chains within a single device is increasingly important for meeting the security requirements of multi-tenants in public cloud environments. However, achieving an effective trade-off between performance and programming flexibility remains a significant challenge. In this paper, we present the design of a novel security programming framework with SmartNIC acceleration, called PASS. PASS enhances system-level performance through three key optimizations: HW/SW co-design of security functions, data plane path compression for security function chains, and offloading of policy-related functions. Additionally, PASS simplifies the development of security functions with user-friendly APIs and customized southbound protocols. Experimental results demonstrate that PASS significantly improves both performance and development efficiency. We hope that PASS will serve as a practical platform for network operators to customize security services for external tenants in public cloud environments.

Author Contributions

Conceptualization, W.F. and J.Y.; methodology, W.F. and J.Z.; software, J.Y. and Y.S.; validation, Y.W. (Yong Wang), W.F. and J.Y.; formal analysis, W.F.; investigation, Z.Z.; resources, Q.Y.; data curation, Y.W. (Yong Wang); writing—original draft preparation, W.F.; writing—review and editing, J.Y.; visualization, Y.W. (Yong Wang); supervision, Y.W. (Yongwen Wang); project administration, W.F.; funding acquisition, W.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by National Key Laboratory under Grant: No. 2023-KIWPDL-14.

Data Availability Statement

The data supporting the findings of this study are not publicly available due to privacy restrictions. The research involves sensitive information, and sharing such data would compromise the privacy of participants or entities involved. For researchers seeking access to the data, please contact the corresponding author with a detailed justification of the research purpose.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Biswas, T.; Kumar, P. Optimizing Resource Management in Serverless Computing: A Dynamic Adaptive Scaling Approach. In Proceedings of the 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kamand, India, 24–28 June 2024. [Google Scholar]
Chippagiri, S. A Study of Cloud Security Frameworks for Safeguarding Multi-Tenant Cloud Architectures. Int. J. Comput. Appl. 2025, 186, 50–57. [Google Scholar] [CrossRef]
Yadav, A.K.; Garg, M.L.; Ritika. Security-Aware Efficient Multi-tenant Cloud Environment. In Workshop on Mining Data for Financial Applications; Springer Nature: Singapore, 2022. [Google Scholar]
Gu, J.; Wu, X.; Zhu, B.; Xia, Y.; Zang, B.; Guan, H.; Chen, H. Enclavisor: A Hardware-Software Co-Design for Enclaves on Untrusted Cloud. IEEE Trans. Comput. 2021, 70, 1598–1611. [Google Scholar] [CrossRef]
Li, H.; Wu, C.; Sun, G.; Zhang, P.; Shan, D.; Pan, T.; Hu, C. Programming Network Stack for Middleboxes with Rubik. In Proceedings of the 18th Usenix Symposium on Networked System Design and Implementation, Boston, MA, USA, 12–14 April 2021. [Google Scholar]
Mijumbi, R.; Serrat, J.; Gorricho, J.-L.; Bouten, N.; Turck, F.D.; Boutaba, R. Network Function Virtualization: State-of-the-Art and Research Challenges. IEEE Commun. Surv. Tutor. 2017, 18, 236–262. [Google Scholar] [CrossRef]
Schardong, F.; Nunes, I.; Schaeffer-Filho, A. NFV Resource Allocation: A Systematic Review and Taxonomy of VNF Forwarding Graph Embedding. Comput. Netw. 2021, 185, 107726. [Google Scholar] [CrossRef]
Intel. Intel Data Plane Development Kit (DPDK). 2025. Available online: https://www.intel.cn/content/www/cn/zh/developer/topic-technology/networking/dpdk.html (accessed on 1 June 2025).
Cisco. What Is the Vector Packet Processor (VPP). 2025. Available online: https://s3-docs.fd.io/vpp/25.06/ (accessed on 1 June 2025).
Zhao, Z.; Sadok, H.; Atre, N. Achieving 100Gbps intrusion prevention on a single server. In Proceedings of the OSDI’20: Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, Berkeley, CA, USA, 4–6 November 2020; pp. 451–467. [Google Scholar]
Firestone, D.; Putnam, A.; Mundkur, S.; Chiou, D.; Dabagh, A.; Andrewartha, M.; Angepat, H.; Bhanu, V.; Caulfield, A.; Chung, E. Azure Accelerated Networking: SmartNICs in the Public Cloud. In Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation, Renton, WA, USA, 9–11 April 2018; pp. 51–64. Available online: https://www.usenix.org/conference/nsdi18/presentation/firestone (accessed on 1 June 2025).
Purkayastha, A.A.; Tharwani, J.; Aggarwal, S. FPGA or GPU? Analyzing Comparative Research for Application-Specific Guidance. In Proceedings of the SoutheastCon 2025, Concord, NC, USA, 27–30 March 2025; pp. 1258–1263. [Google Scholar] [CrossRef]
Chen, X.; Liu, H.; Huang, Q.; Zhang, D.; Zhou, H.; Wu, C.; Liu, X.; Yang, Q. Toward Low-Latency and Accurate State Synchronization for Programmable Networks. IEEE/ACM Trans. Netw. 2023, 31, 1400–1415. [Google Scholar] [CrossRef]
Wang, Z.; Huang, H.; Zhang, J.; Wu, F.; Alonso, G. FpgaNIC: An FPGA-based Versatile 100Gb SmartNIC for GPUs. In Proceedings of the USENIX Annual Technical Conference, Carlsbad, CA, USA, 11–13 July 2022. [Google Scholar]
Firestone, D. VFP: A virtual switch platform for host sdn in the public cloud. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2017), Boston, MA, USA, 27–29 March 2017. [Google Scholar]
Song, X.; Lu, R.; Guo, Z. High-Performance Reconfigurable Pipeline Implementation for FPGA-Based SmartNIC. Micromachines 2024, 15, 449. [Google Scholar] [CrossRef] [PubMed]
Shan, Y.; Lin, W.; Kosta, R.; Krishnamurthy, A.; Zhang, Y. SuperNIC: A Hardware-Based, Programmable, and Multi-Tenant SmartNIC. In Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, USA, 3–5 March 2024; pp. 130–141. [Google Scholar]
Priyadarsini, M.; Bera, P. Software defined networking architecture, traffic management, security, and placement: A survey. Comput. Netw. 2021, 192, 108047. [Google Scholar] [CrossRef]
Yang, X.; Sun, Z.; Li, J.; Yan, J.; Li, T.; Quan, W.; Xu, D.; Antichi, G. FAST: Enabling fast software/hardware prototype for network experimentation. In Proceedings of the International Symposium on Quality of Service, Phoenix, AR, USA, 24–25 June 2019. [Google Scholar]
Zhang, C.; Yu, H.; Zhou, Y.; Jiang, H. High-Performance and Energy-Efficient FPGA-GPU-CPU Heterogeneous System Implementation. In Advances in Parallel & Distributed Processing, and Applications: Proceedings from PDPTA’20, CSC’20, MSV’20, and GCC’20; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
Vörös, P.; Kis, D.; Hudoba, P.; Pongrácz, G.; Laki, S. Towards an in-network GPU-accelerated packet processing framework. In Proceedings of the 2nd Conference on Information Technology and Data Science (CITDS), Debrecen, Hungary, 16–18 May 2022; pp. 308–313. [Google Scholar] [CrossRef]
NoviFlow Inc. NoviFlow and Barefoot Networks Demonstrate the Future of Cybersecurity and Network Monitoring Live on 6.5 Tb/s Tofino Ethernet Switch at MWC2018. Bus. Wire 2018. [Google Scholar]
Yue, S.; Liang, H. Anlu NP: The Abstract Level Modeling of Network Processor for Agile Hardware Design. In Proceedings of the 5th International Conference on Computer Engineering and Application (ICCEA), Hangzhou, China, 12–14 April 2024. [Google Scholar]
Zhai, D.; Meng, X.; Yu, Z.; Hu, H.; Huang, T. A security-aware service function chain deployment method for load balance and delay optimization. Sci. Rep. 2022, 12, 10442. [Google Scholar] [CrossRef] [PubMed]
Dubba, S.; Killi, B.R. Security-Aware Cost Optimized Dynamic Service Function Chain Scheduling. J. Netw. Syst. Manag. 2025, 33, 4. [Google Scholar] [CrossRef]
Zulkar Nine, M.S.Q.; Bulut, M.F.; Kosar, T.; Hwang, J. GreenNFV: Energy-Efficient Network Function Virtualization with Service Level Agreement Constraints. In Proceedings of the SC23: International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, 12–17 November 2023; pp. 1–12. [Google Scholar] [CrossRef]
Bremler-Barr, A.; Harchol, Y.; Hay, D. OpenBox: A Software-Defined Framework for Developing, Deploying, and Managing Network Functions. In Proceedings of the 2016 ACM SIGCOMM Conference, Florianópolis, Brazil, 22–26 August 2016. [Google Scholar]
Panda, A.; Han, S.; Jang, K.; Walls, M.; Ratnasamy, S.; Shenker, S. NetBricks: Taking the V out of NFV. In Proceedings of the USENIX OSDI, Carlsbad, CA, USA, 8–10 October 2018. [Google Scholar]
Yang, Y.; He, L.; Zhou, J.; Shi, X.; Cao, J.; Liu, Y. P4runpro: Enabling Runtime Programmability for RMT Programmable Switches. In Proceedings of the ACM SIGCOMM ’24: ACM SIGCOMM 2024 Conference, Sydney, Australia, 4–8 August 2024. [Google Scholar]
Ihle, F.; Menth, M. MPLS Network Actions: Technological Overview and P4-Based Implementation on a High-Speed Switching ASIC. IEEE Open J. Commun. Soc. 2025, 6, 3480–3501. [Google Scholar] [CrossRef]
Han, S.; Jang, K.; Park, K.S.; Moon, S. PacketShader: A GPU-Accelerated Software Router. In Proceedings of the ACM SIGCOMM, Melbourne, Australia, 1–3 November 2010. [Google Scholar] [CrossRef]
Bojie, L.; Tan, K.; Luo, L.; Peng, Y.; Luo, R.; Xu, N.; Xiong, Y.; Cheng, P.; Chen, E. ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware. In Proceedings of the ACM SIGCOMM, Florianopolis, Brazil, 22–26 August 2016. [Google Scholar] [CrossRef]
Boukebous, A.A.E.; Fettache, M.I.; Bendiab, G.; Shiaeles, S. A Comparative Analysis of Snort 3 and Suricata. In Proceedings of the 2023 IEEE IAS Global Conference on Emerging Technologies (GlobConET), London, UK, 19–21 May 2023. [Google Scholar]
Shin, S.; Porras, P.; Yegneswaran, V. FRESCO: Modular Composable Security Services for Software-Defined Networks. In Proceedings of the 20th Annual Network & Distributed System Security Symposium (NDSS Symposium 2013), San Diego, CA, USA, 24–27 February 2013. [Google Scholar]
Shin, S.; Yegneswaran, V.; Porras, P.; Gu, G. AVANT-GUARD: Scalable and Vigilant Switch Flow Management in Software-Defined Networks. In Proceedings of the 2013 ACM SIGSAC conference on Computer & Communications Security, Berlin, Germany, 4–8 November 2013. [Google Scholar]
Sonchack, J.; Aviv, A.J.; Keller, E.; Smith, J.M. OFX: Enabling OpenFlow extensions for switch-level security applications. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, Denver, CO, USA, 12–16 October 2015. [Google Scholar]
Rizzo, L. netmap: A novel framework for fast packet I/O. In Proceedings of the 2012 USENIX Annual Technical Conference, Boston, MA, USA, 13–15 June 2012. [Google Scholar]
Park, T.; Xu, Z.; Shin, S. HEX Switch: Hardware-assisted security extensions of OpenFlow. In Proceedings of the 2018 Workshop on Security in Softwarized Networks: Prospects and Challenges, Budapest, Hungary, 24 August 2018. [Google Scholar]
McKeown, N.; Anderson, T.; Balakrishnan, H. OpenFlow: Enabling innovation in campus networks. ACM SIGCOMM Comput. Commun. Rev. 2008, 8, 69–74. [Google Scholar] [CrossRef]
Jamshed, M.; Moon, Y.G.; Kim, D.; Han, D.; Park, K.S. mOS: A reusable networking stack for flow monitoring middleboxes. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation, Boston, MA, USA, 27–29 March 2017. [Google Scholar]
Big Switch Networks. Floodlight: OpenFlow Controller. Available online: https://github.com/floodlight/floodlight (accessed on 1 June 2025).
Chauhan, P.; Atulkar, M. Ryu Controller-based Attack Detection and Mitigation Method in Software Defined Internet of Things. Int. J. Eng. Trends Technol. 2023, 71, 138–156. [Google Scholar] [CrossRef]
OpenDaylight Project. OpenDaylight Controller: Technical White Paper. Available online: https://www.opendaylight.org/documentation (accessed on 1 June 2025).
Yang, X. FAST-ANT: Agile Network Tester. Available online: https://github.com/fast-codesign/FAST-ANT (accessed on 1 June 2025).
NTOP. PF_RING. 2018. Available online: https://www.ntop.org/products/packetcapture/pf_ring/ (accessed on 1 June 2025).
Sun, C.; Bi, J.; Zheng, Z.; Yu, H.; Hu, H. NFP: Enabling Network Function Parallelism in NFV. In Proceedings of the ACM Special Interest Group on Data Communication, Los Angeles, CA, USA, 21–25 August 2017. [Google Scholar]
Kim, J.; Jang, K.; Lee, K.; Ma, S.; Shim, J.; Moon, S. NBA (network balancing act): A high-performance packet processing framework for heterogeneous processors. In Proceedings of the Tenth European Conference on Computer Systems, Bordeaux, France, 21–24 April 2015. [Google Scholar]
Yu, M.; Rexford, J.; Freedman, M.J.; Wang, J. Scalable Flow-Based Networking with DIFANE. ACM SIGCOMM Comput. Commun. Rev. 2010, 40, 351–362. [Google Scholar] [CrossRef]
Mogul, J.C.; Tourrilhes, J.; Yalagandula, P.; Sharma, P.; Curtis, A.R.; Banerjee, S. DevoFlow: Cost-effective flow management for high performance enterprise networks. In Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks, Monterey, CA, USA, 20–21 October 2010. [Google Scholar]

Figure 1. Overview of PASS.

Figure 2. Rule Management in PASS.

Figure 3. Security Processing Paths in PASS.

Figure 4. Metadata Format.

Figure 5. SW/HW Interaction Modes.

Figure 6. PFW+IDS Security Function Chain.

Figure 7. Programming Framework of PASS.

Figure 8. Programming a Simplified Snort.

Figure 9. PASS Southbound Protocol Format.

Figure 10. Experimental Testbed.

Figure 11. PFW Performance.

Figure 12. SFW Performance.

Figure 13. IDS Performance.

Figure 14. PFW-IDS Performance.

Figure 15. SFW-IDS Performance.

Table 1. Comparison of Programmable Platforms.

Platform	Performance	Flexibility	Cost	Deployment	Network	References
Host CPU only	Low	High	Low	Host	Overlay	[27,28]
Programmable Switch	Medium	Low	High	Switch	Underlay	[29]
ASIC Switch/FPGA	High	Low	Medium	Switch	Underlay	[30]
Host CPU/GPU	Medium	High	Medium	Host	Overlay	[31]
Host CPU/NP-based SmartNIC	High	High	High	Host	Overlay	[32]
Host CPU/FPGA-based SmartNIC	High	High	Medium	Host	Overlay	[11]

Table 2. Comparison of Programmable Frameworks.

Framework	VFP	OpenBox	ClickNP	PASS
Platform	CPU+FPGA	CPU+Other Hardware	CPU+FPGA	CPU+FPGA
SDN Architecture	SDN 2-Layer	SDN 2-Layer	/	SDN 3-Layer
Controller Deployment	Multi-Controller (One per security function)	Logically Centralized Controller	/	Logically Centralized Controller + Security Agent
Function Chain Orchestration	Fixed	YES	YES	YES
Function Chain Compression	/	YES	/	YES
Security Function Division on HW/SW	/	/	/	YES
Function Development API	VFP API	OpenBox API	ClickNP API	PASS API
Southbound API	VFP Protocol	OpenBox Protocol	/	PASS Protocol

Table 3. PASS API.

API	Description
pass_start_agent_comm()	Establish a communication connection with the agent and receive control packets.
pass_build_pkt(pkt_info)	Build PASS protocol packets.
pass_send_pkt(pkt)	Transmit PASS protocol packets to the agent.
pass_report_log(msg)	Report the statistical logs periodically.
pass_regex_match(payload)	Perform regex matching based on the payload.
pass_exact_match(payload)	Perform exact matching based on the payload.
pass_reg_rule_mgt_handler(rule_mgt_callback)	Register user-defined rule management handler (e.g., rule parsing and storage).
pass_reg_pkt_handler(pkt_handler_callback)	Register user-defined packet processing handler.
pass_reg_log_mgt_handler(log_mgt_callback)	Register user-defined handler for collecting and managing statistical information (e.g., counters and alarm information).

Table 4. PASS Southbound Protocol Type.

Type	Description
Rule Management Message	Issues security rule configuration commands, such as addition, deletion, and replacement.
Function Control Information	Dynamically starts or stops security functions.
Function Reconfiguration Message	Loads software and hardware executables onto the CPU/FPGA for remote reconfiguration.
ACK Message	Acknowledges related management actions such as rule distribution, function control, and reconfiguration commands.
Statistical Message	Collects and reports security statistics.
Device Status Message	Reports current device status, such as memory usage and bandwidth.

Table 5. Comparison of Development Efforts.

Security Function	Hardware LoC		Software LoC		Reduction
Security Function	Strawman	PASS	Strawman	PASS	Reduction
PFW	2358	294	865	424	77.7%
SFW	5095	2129	921	465	56.9%
IDS	2873	809	2247	1201	60.7%

Table 6. FPGA Resource Usage Statistics.

Module Type	Description	Resource Usage
Module Type	Description	Slice LUTs	Block RAM Tile
Device-Specific Module	Ports, dbg, etc.	6931 (27%)	10 (12%)
Platform-Basic Module	FAST OS	13,376 (52%)	37 (45%)
Function-Specific Module	pre-process hSF	5237 (21%)	34.5 (43%)
	PFW hSF
	SFW hSF
	IDS hSF
Total	/	25,544	81.5

Table 7. Comparison of Policy Distribution.

Attack Type	Rule Number	Data Volume (KB)			Latency (ms)
Attack Type	Rule Number	S1	S2	S3	S1	S2	S3
Trojan Attack	811	260	4	0.4	1422	478	97
Information Leakage	595	160	3	0.4	1100	410	97
Botnet	298	84	2	0.4	607	271	97
Buffer Overflow Attack	60	16	0.6	0.4	345	202	97
Code Execution	25	6	0.5	0.4	329	200	97

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fu, W.; Yan, J.; Zhang, J.; Sun, Y.; Wang, Y.; Zhang, Z.; Yang, Q.; Wang, Y. PASS: A Flexible Programmable Framework for Building Integrated Security Stack in Public Cloud. Electronics 2025, 14, 2650. https://doi.org/10.3390/electronics14132650

AMA Style

Fu W, Yan J, Zhang J, Sun Y, Wang Y, Zhang Z, Yang Q, Wang Y. PASS: A Flexible Programmable Framework for Building Integrated Security Stack in Public Cloud. Electronics. 2025; 14(13):2650. https://doi.org/10.3390/electronics14132650

Chicago/Turabian Style

Fu, Wenwen, Jinli Yan, Jian Zhang, Yinhan Sun, Yong Wang, Ziwen Zhang, Qianming Yang, and Yongwen Wang. 2025. "PASS: A Flexible Programmable Framework for Building Integrated Security Stack in Public Cloud" Electronics 14, no. 13: 2650. https://doi.org/10.3390/electronics14132650

APA Style

Fu, W., Yan, J., Zhang, J., Sun, Y., Wang, Y., Zhang, Z., Yang, Q., & Wang, Y. (2025). PASS: A Flexible Programmable Framework for Building Integrated Security Stack in Public Cloud. Electronics, 14(13), 2650. https://doi.org/10.3390/electronics14132650

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

PASS: A Flexible Programmable Framework for Building Integrated Security Stack in Public Cloud

Abstract

1. Introduction

2. Motivation

2.1. Programmable Platforms

2.2. Network Security Function

2.3. Comparison of Typical Programmable Frameworks

2.4. Design Goals

3. PASS Design

3.1. Overview

3.2. Unified Rule Management in PASS Agent

3.3. Mapping Security Function into SW/HW

3.4. Cooperation Between Security Functions

3.5. PASS Programming Model

4. PASS Implementation

5. Evaluation

5.1. Experimental Setup

5.2. Single Security Function Acceleration

5.3. Security Function Chain Acceleration

5.4. Control Plane Acceleration

6. Discussion

7. Related Work

8. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI