MSFuzz: Augmenting Protocol Fuzzing with Message Syntax Comprehension via Large Language Models

Cheng, Mingjie; Zhu, Kailong; Chen, Yuanchao; Yang, Guozheng; Lu, Yuliang; Lu, Canju

doi:10.3390/electronics13132632

Open AccessArticle

MSFuzz: Augmenting Protocol Fuzzing with Message Syntax Comprehension via Large Language Models

by

Mingjie Cheng

^1,2,

Kailong Zhu

^1,2,*,

Yuanchao Chen

^1,2

,

Guozheng Yang

^1,2,

Yuliang Lu

^1,2 and

Canju Lu

^1,2

¹

College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China

²

Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China

^*

Author to whom correspondence should be addressed.

Electronics 2024, 13(13), 2632; https://doi.org/10.3390/electronics13132632

Submission received: 6 June 2024 / Revised: 30 June 2024 / Accepted: 1 July 2024 / Published: 4 July 2024

(This article belongs to the Special Issue Machine Learning for Cybersecurity: Threat Detection and Mitigation)

Download

Browse Figures

Versions Notes

Abstract

:

Network protocol implementations, as integral components of information communication, are critically important for security. Due to its efficiency and automation, fuzzing has become a popular method for protocol security detection. However, the existing protocol-fuzzing techniques face the critical problem of generating high-quality inputs. To address the problem, in this paper, we propose MSFuzz, which is a protocol-fuzzing method with message syntax comprehension. The core observation of MSFuzz is that the source code of protocol implementations contains detailed and comprehensive knowledge of the message syntax. Specifically, we leveraged the code-understanding capabilities of large language models to extract the message syntax from the source code and construct message syntax trees. Then, using these syntax trees, we expanded the initial seed corpus and designed a novel syntax-aware mutation strategy to guide the fuzzing. To evaluate the performance of MSFuzz, we compared it with the state-of-the-art (SOTA) protocol fuzzers, namely, AFLNET and CHATAFL. Experimental results showed that compared with AFLNET and CHATAFL, MSFuzz achieved average improvements of 22.53% and 10.04% in the number of states, 60.62% and 19.52% improvements in the number of state transitions, and 29.30% and 23.13% improvements in branch coverage. Additionally, MSFuzz discovered more vulnerabilities than the SOTA fuzzers.

Keywords:

fuzzing; syntax aware; protocol implementations; large language models

1. Introduction

In the digital age, network protocols serve as the foundation for information exchange, not only establishing rules and formats for data transfer but also ensuring the accuracy and efficiency of network information [1]. Although these protocols are crucial for the advancement of modern informatization and intelligentization, their complexity and diversity elevate security risks. Moreover, potential oversights or errors during the protocol development often lead to vulnerabilities. This leaves plenty of room for hackers to target network applications [2]. Therefore, effectively identifying and rectifying these implementation vulnerabilities is vital for maintaining the security and stability of cyberspace.

Fuzzing is an efficient software testing method that is widely used across various software applications. It was proven to be highly effective and powerful in discovering critical vulnerabilities [3,4]. Due to its automation and efficiency, fuzzing has become one of the popular methods for detecting security vulnerabilities in network protocols. This method involves sending large volumes of random or semi-random data to protocol implementations to trigger abnormal behavior, thereby uncovering potential vulnerabilities.

Compared with other fuzzing, network protocol fuzzing faces greater challenges in generating high-quality test cases, primarily due to the highly structured nature of its inputs. Specifically, the input consists of a series of request messages, with each divided into fields defined by strict syntactic rules with precise value constraints. If the input messages do not adhere to basic syntax requirements, the server discards them during the initial stage of processing. This strict requirement for input syntax significantly affects the quality of test cases.

In recent years, previous studies focused on acquiring the syntactic knowledge of protocol messages to generate high-quality test cases. These methods fall into four main categories: protocol specification extraction [5,6,7,8,9,10], network traffic analysis [11,12,13,14,15], program behavior analysis [16,17,18,19], and machine learning analysis [20,21]. Protocol specification extraction methods primarily derive message syntax formats from protocol RFC specification documents. Network traffic analysis methods learn about keywords and field boundaries within messages by collecting and analyzing network traffic. Program behavior analysis methods obtain protocol format information by analyzing how programs behave when processing message data. Machine learning analysis methods leverage natural-language-processing techniques to learn protocol syntax. Although these methods can acquire the partial knowledge of protocol syntax, they still face three key challenges in generating high-quality test cases:

Challenge 1: inadequate input knowledge. Current protocol fuzzers [5,11,16,20] lack an effective and detailed understanding of message syntax. These tools typically reveal only the basic syntactic structure or syntactic constraints of specific fields, failing to fully grasp the comprehensive syntactic knowledge of the message. This partial comprehension of message syntax limits the ability to generate high-quality inputs, thereby impacting the overall effectiveness of fuzzing.

Challenge 2: insufficient seed diversity. The effectiveness of test cases in mutation-based fuzzing largely depends on the quality of the initial seed corpus. However, the most widely used network protocol fuzzing benchmark, ProFuzzBench [22], often displays a lack of seed diversity. If the initial seeds are not comprehensive and solely rely on simplistic mutations, fuzzing may fail to adequately explore protocol implementations. This constraint limits the detection of a wider range of vulnerabilities.

Challenge 3: inefficient mutation strategies. Mutation strategies usually determine the quality of test cases when fuzzing. Most current protocol fuzzers [23,24,25] rely on random mutation methods that do not employ targeted or strategic approaches. These approaches fail to produce diverse and high-quality test cases that can effectively assess the robustness of protocol implementations. Such deficiencies in mutation strategies restrict the depth and breadth of fuzzing coverage, thereby significantly reducing the overall effectiveness of fuzzing.

To overcome the above challenges, in this paper, we propose MSFuzz, which is a protocol-fuzzing method with message syntax comprehension. By comparing the message syntax extracted from existing research with that from the source code, we found that the source code contained more detailed and effective message syntax knowledge. To overcome challenge 1, we present a method for leveraging large language models (LLMs) to extract the message syntax from source code to construct message syntax trees for protocol implementations. For challenge 2, we used LLMs in conjunction with the constructed message syntax trees to expand the initial seed corpus of the protocol implementation. To address challenge 3, we designed a novel syntax-aware mutation strategy that guides fuzzing mutations through the constructed message syntax trees, thereby generating high-quality test cases that satisfy syntactic constraints.

Specifically, we first filtered out code files related to message parsing from a large volume of protocol implementation source code. Then, utilizing the code comprehension abilities of LLMs, we incrementally extracted message types, and requested line parameters and their value constraints, as well as header fields and their value constraints, from the filtered source code. This process enabled us to construct the message syntax tree of the protocol implementation. Based on the constructed message syntax tree and configuration files of the protocol implementation, we expanded the initial seed corpus, generating a diverse and comprehensive set of seeds. Finally, we used the constructed message syntax tree to guide the mutation strategy. By parsing the messages to be mutated and matching them with the syntax tree, we determined the syntactic constraints of message fields and generated high-quality test cases based on these constraints. So far, we have implemented a prototype of MSFuzz.

We evaluated the performance of MSFuzz on three widely used network protocols: RTSP, FTP, and DAAP. We compared MSFuzz with two SOTA protocol fuzzers: AFLNET and CHATAFL. The experimental results showed that within 24 h, MSFuzz improved the number of states by averages of 22.53% and 10.04%, and the number of state transitions by averages of 60.62% and 19.52%, respectively, when compared with AFLNET and CHATAFL. Additionally, MSFuzz effectively explored the code space, achieving average branch coverage improvements of 29.30% and 23.13% over SOTA protocol fuzzers. In the ablation study, we found that the two key components, seed expansion, and syntax-aware mutation, significantly enhanced the fuzzing performance. Additionally, MSFuzz discovered more vulnerabilities than the SOTA fuzzers.

The main contributions of this paper are summarized as follows:

To address the problem of generating high-quality test cases, we propose MSFuzz, which is a novel protocol-fuzzing technique. MSFuzz is built upon three core components: message syntax tree construction, seed expansion, and syntax-aware mutation.
By employing a novel abstraction of message syntax structures, MSFuzz leverages the code-understanding capabilities of LLMs to effectively extract message syntax from the source code of protocol implementations, thereby constructing uniformly structured message syntax trees.
MSFuzz utilizes the constructed protocol syntax trees to expand the initial seed corpus and applies syntax-aware mutation strategies to generate high-quality test cases that adhere to specified constraints.
We evaluated MSFuzz on widely used protocol implementations. The results demonstrate that MSFuzz outperformed the SOTA protocol fuzzers in state coverage, code coverage, and vulnerability discovery.

2. Background and Motivation

In this section, we introduce protocol fuzzing, large language models, and a motivating example. First, we provide a brief overview of protocol fuzzing. Next, we offer background information on large language models and their recent advancements in vulnerability discovery. Finally, we illustrate the limitations of existing methods through a motivating example.

2.1. Protocol Fuzzing

As one of the most effective and efficient methods for discovering vulnerabilities, fuzzing has been applied in the field of network protocols. The inception of PROTOS [26], the pioneering protocol-focused fuzzing tool, marked the beginning of protocol fuzzing. Subsequently, this field has garnered extensive attention, resulting in numerous research achievements and establishing itself as a focal point in network security.

Protocol fuzzers primarily target server-side implementations by simulating client behavior. These tools continuously create and dispatch client messages to the servers. Depending on the method of message generation, protocol fuzzers can be broadly categorized into two types: generation based and mutation based.

Generation-based protocol fuzzers rely on prior knowledge of protocol formats to generate test cases [26,27,28,29,30]. PROTOS [26] generates erroneous inputs based on protocol specifications to trigger specific vulnerabilities. SPIKE [27] employs a block-based modeling approach, breaking down the protocol into different blocks and automatically generating valid data blocks for protocol messages based on predefined generation rules. Peach [31] defines data models of protocols by manually constructing Pit files, which are then used to generate test cases. SNOOZE [6] requires testers to manually extract protocol specifications from request for comments (RFC) documents, including protocol field characteristics, information exchange syntax, and state machines. Testers then send specific sequences of messages to reach the desired state and generate numerous random test cases based on the protocol specifications.

Mutation-based protocol fuzzers generate new test cases by mutating seeds, which consist of a set of request messages [19,24,32,33,34]. These mutation operators include altering field values within messages, inserting, deleting, or replacing specific sections of messages, as well as recombining or reordering messages. AFLNET [24] employs byte-level and region-level mutation strategies, capturing traffic during client-server communication as initial seeds and generating test cases during the fuzzing process. Meanwhile, SGPFuzzer [32] introduces various mutation operators to mutate selected seed files in a simple and structured manner, including sequence mutation, message mutation, binary field mutation, and variable string mutation.

Compared with other fuzzing, network protocol fuzzing faces more challenges, one of which is the highly structured nature of its input. Network protocol messages typically adhere to strict syntactic constraints, encompassing various types of messages, as well as syntactic constraints within internal fields. Deviations from these constraints may result in servers discarding received messages, thereby limiting the effectiveness and efficiency of fuzzing.

2.2. Large Language Models

Large language models (LLMs), as a form of deep learning-driven artificial intelligence technology, demonstrate powerful natural-language-processing capabilities. These models undergo extensive pre-training on large datasets, equipping them with the ability to deeply understand and generate natural language text. They possess a profound and rich understanding of linguistic knowledge, and their comprehension of context is remarkably thorough.

In recent studies, LLMs have shown great potential in the field of vulnerability detection. GPTScan [35] combines GPT with static analysis techniques for intelligent contract logic bug detection. CHATAFL [10] employs LLMs to guide protocol fuzzing, constructing protocol grammars, expanding initial seeds, and generating test cases capable of triggering state transitions through interactions with LLMs. Fuzz4All [36] utilizes LLMs as an input generation and mutation engine for fuzzing across multiple input languages and features. ChatFuzz [37] employs LLMs for seed mutation. TitanFuzz [38] utilizes LLMs for fuzzing deep learning libraries. These studies provide rich practical experience in the application of large language models for vulnerability detection, showcasing their potential and value in enhancing vulnerability detection capabilities.

Although applying LLMs to network protocol fuzzing has great potential, it faces several challenges. When using LLMs to analyze source code, directly inputting the entire source code often fails due to input size limitations. Therefore, it is essential to filter the source code and extract key snippets when using LLMs.

2.3. Motivating Example

As one of the SOTA network protocol fuzzers, CHATAFL [10] extracts message syntax structures from protocol specifications. Although protocol implementations generally adhere to RFC documents, variations exist between different implementations, and actual message syntactic constraints are often more detailed than the specifications. For example, Figure 1 illustrates the PLAY message syntax structure that CHATAFL extracts from the RTSP specification. Although the basic syntax of the message is determined, the specific values of key fields, like Range, remain unknown.

Listing 1 shows a code snippet from the Live555 server based on the RTSP protocol, demonstrating the parsing process of the

R a n g e

field. The code identifies six types of value constraints for the Range field: npt = %lf - %lf, npt = %n%lf -, npt = now - %lf, npt = now - %n, clock = %n, and smpte = %n. Live555 attempts six corresponding matches to determine the values of rangeStart and rangeEnd. If the Range value of the message does not adhere to any of these constraints, the function returns False, indicating a parsing failure.

Listing 1. Simplified code snippet from Live555.

  Boolean parseRangeParam() {
  ...
  if (sscanf(paramStr, "npt = %lf - %lf", &start, &end) == 2) {
    rangeStart = start;
    rangeEnd = end;
  } else if (sscanf(paramStr, "npt = %n%lf -", &numCharsMatched1, &start) == 1) {
    if (paramStr[numCharsMatched1] == '-') {
      rangeStart = 0.0; startTimeIsNow = True;
      rangeEnd = -start;
    } else {
      rangeStart = start;
      rangeEnd = 0.0;
    }
  } else if (sscanf(paramStr, "npt = now - %lf", &end) == 1) {
      rangeStart = 0.0; startTimeIsNow = True;
      rangeEnd = end;
  } else if (sscanf(paramStr, "npt = now -%n", &numCharsMatched2) == 0 &&
   ↪  numCharsMatched2 > 0) {
    rangeStart = 0.0; startTimeIsNow = True;
    rangeEnd = 0.0;
  } else if (sscanf(paramStr, "clock = %n", &numCharsMatched3) == 0 &&
   ↪  numCharsMatched3 > 0) {
    ...
  } else if (sscanf(paramStr, "smtpe = %n", &numCharsMatched4) == 0 &&
   ↪  numCharsMatched4 > 0) {
  } else {
    return False;
  }
  return True;
}

When CHATAFL mutates the Range field in Live555 based on the message syntax shown in Figure 1, it only identifies the position of the field value without understanding the value constraints listed in Listing 1, continuing to use a random mutation strategy. The test cases generated from this coarse-grained message syntax, although adhering to the basic syntax structure, are likely to fail because the field values do not adhere to the syntactic constraints.

Therefore, obtaining the message syntax structure solely from protocol specifications is insufficient. The source code of the protocol implementation contains finer-grained client message syntax. It is necessary to analyze this source code to achieve a more comprehensive and detailed understanding of the message syntax to improve the generation of high-quality test cases. Generating high-quality test cases that adhere to protocol syntax constraints can significantly enhance the coverage of fuzzing, thereby exploring deeper code space and increasing the likelihood of discovering complex vulnerabilities.

3. Methodology

3.1. Overview

Figure 2 shows the overview of MSFuzz, which consists of four components: preprocessing, message syntax tree construction, seed expansion, and fuzzing with syntax-aware mutation. Its primary objective is to address the challenges associated with generating high-quality test cases in protocol fuzzing, thereby improving the efficiency and effectiveness of the fuzzing process.

Preprocessing. Before extracting the message syntax from the source code, it is essential to pre-filter the source code files, as not all code is related to message parsing. By performing a preliminary analysis and filtering out irrelevant code, the search and analysis scope of LLMs during syntax tree construction is narrowed, thereby enhancing the efficiency of the LLMs.

Message Syntax Tree Construction. We investigated the structure of text-based protocol client messages and abstracted a general message syntax template. Based on the abstracted message syntax structure and heuristic rules derived from observing the source code, we designed a method to extract the message syntax from the protocol implementation source code using LLMs. This approach enables the construction of message syntax trees for the protocol implementation.

Seed Expansion. Given the critical role of initial seed diversity and quality in fuzzing, we focused on enhancing these aspects. We used the constructed message syntax tree and protocol implementation configuration files to guide the LLMs. This approach helped in expanding the initial seed corpus, thereby enhancing the seed diversity and quality.

Fuzzing with Syntax-Aware Mutation. In the fuzzing loop, we used the expanded seeds as input. For the seed mutation, MSFuzz leverages the message syntax tree to guide the mutation of the message. This ensures that the generated test cases adhered to the syntax structure and value constraints of the protocol, thereby improving the efficiency of the fuzzing. In order to ensure that MSFuzz possessed the capability to explore extreme scenarios, we also employed a random mutation strategy with a certain probability.

In the following, we present a detailed description of the core designs of MSFuzz, including message syntax tree construction, seed expansion, and fuzzing with syntax-aware mutation.

3.2. Message Syntax Tree Construction

MSFuzz leverages the code-understanding capabilities of LLMs to extract message syntax from preprocessed source code files. To facilitate the construction of syntax trees with a consistent structure, we first analyzed multiple text-based protocols and abstracted a general message syntax template. Then, based on this template and several heuristic rules derived from observing the source code, we designed a method for using LLMs to extract the syntax from the source code, thereby constructing the message syntax tree for the protocol implementation.

3.2.1. Message Syntax Structure

Defining a general message syntax structure is essential prior to employing LLMs for constructing message syntax trees. By analyzing multiple text-based protocols, we abstracted a general message syntax structure. This provided a standardized structure for interpreting and processing different protocol implementations. This ensured that the syntax trees generated by the LLMs were consistent and uniform.

We analyzed various text-based protocols and discovered that their client message structures could be abstracted into the general form shown in Figure 3. This general message syntax structure consisted of three parts: the request line, the request header field, and the carriage return and line feed (CRLF) characters. Not all protocols include request header fields and the ending CRLF in their client messages.

Specifically, the request line contains the method name and multiple parameters, which are separated by space characters (SPs) and end with a CRLF. The request header field consists of key–value pairs in the format HeaderName: Value and also ends with a CRLF. The entire message typically concludes with a CRLF.

3.2.2. Extracting Syntax via LLMs

Based on the general syntax structure from Section 3.2.1, we utilized LLMs to extract the message syntax from preprocessed source code files, focusing on the request line and request header fields of the general syntax structure. Specifically, we extracted the method name (message type), parameter types, and parameter value constraints from the request line. If the request header fields exist, we extracted the header names and their value constraints. Finally, we constructed a message syntax tree for the protocol implementation based on the extracted message syntax.

Although we filtered out irrelevant files from the protocol implementation, providing all the filtered code files to LLMs often exceeds its input limit. Additionally, some irrelevant content may still remain, potentially affecting the output of LLMs. Therefore, we needed to further refine the code selection, extracting only the snippets closely related to the task at hand. Based on our analysis of network protocol implementations, we employed the following heuristic rules to extract key code snippets. This ensures that LLMs can effectively learn and extract the message syntax.

In the source code of the protocol implementation, different types of messages or header fields have independent parsing functions. Therefore, when parsing a specific type of message or header field, LLMs can analyze only its corresponding parsing function to narrow the scope.
Function names often follow clear naming conventions and clearly express their basic functions in the source code. Thus, providing only the function names to LLMs allows it to infer the purposes of the functions, enabling more accurate identification of the target parsing functions.

Based on the heuristic rules mentioned above, we propose an automated framework that uses LLMs to construct message syntax trees for target protocol implementations. This method employs a hierarchical, step-by-step strategy. It inputs key parsing code extracted from source code files into LLMs. From this, the framework extracts message types, request line parameters, header field types, and their value constraints, gradually constructing the message syntax tree. This method employs a hierarchical extraction strategy aligned with the syntactic structure illustrated in Figure 3. Initially, the framework identifies the message type. Subsequently, it extracts the types of request line parameters and header fields (if any). Finally, it determines the value constraints for the request line parameters and header fields. This process incrementally constructs the message syntax tree, ensuring a comprehensive representation of the protocol’s message structure.

Message type extraction. MSFuzz extracts all function names from the filtered source code using the Code Extraction module in Figure 2. This module parses the source code to identify relevant function names, which are then utilized in prompt engineering to enable the LLM to identify message types in the target protocol implementation. The prompt includes all filtered function names and aims to identify all message types that the target protocol implementation can handle and map them to their corresponding handler functions, as shown in Figure 4a.

Header/parameter type extraction. For each message type, MSFuzz uses the Code Extraction module to locate and extract the corresponding function code from the source code based on the handler function names identified in the previous stage. Figure 4b illustrates the prompt used by MSFuzz to leverage the LLM for extracting message request line parameters and header fields (if any). This prompt includes the parsing function code for the message and aims to establish a mapping between the message and its request line parameters and header fields.

Value constraints extraction. To parse the values of header fields or request line parameters, MSFuzz uses the Code Extraction module to locate and extract the relevant function code from the source files based on the function names identified in the previous stage. Figure 4c illustrates the prompt used by MSFuzz to extract value constraints using the LLM. In this prompt, MSFuzz provides the parsing function code for the parameters or header fields, with the aim to establish a mapping between them and their corresponding value constraints.

To illustrate the construction of a message syntax tree, we utilized the MSFuzz approach in the motivating example (Figure 1) and present the results in Figure 5. Figure 5 displays only the syntactic constraints of the Range header field in the PLAY message of Live555.

3.3. Seed Expansion

To enhance the diversity and quality of the initial seed corpus, we employed LLMs to expand it. Although LLMs can generate new seeds for the target protocol implementation, three key problems need to be addressed: (1) How to comprehensively cover message syntactic constraints? (2) How to generate seeds specific to the target protocol implementation? (3) How to produce machine-readable seeds?

Regarding problem (1), MSFuzz constructs a message syntax tree for the target protocol implementation, as detailed in Section 3.2. This tree extensively covers messages, request line parameters, and their value constraints, as well as header field types and their value constraints. By integrating this syntax tree into our prompts, MSFuzz enables LLMs to generate more comprehensive seeds.

Regarding problem (2), configuration files for protocol implementations typically contain crucial server information, such as usernames, passwords, and network addresses. Integrating these configuration files into the LLM’s prompts allows for the generation of more precise seeds that reflect the operational environment.

Regarding problem (3), an LLM demonstrates proficiency in learning from provided data and producing standardized outputs. By inputting initial seeds from the protocol implementation into the prompts and defining their format, this facilitates the production of machine-readable seeds that are immediately applicable for fuzzing.

Figure 6 illustrates the prompt used by MSFuzz to expand the seed corpus using LLMs. The prompts displayed in Figure 6 include the message syntax tree of the target protocol implementation, the configuration file, and the initial seed corpus, explicitly directing the LLM to generate outputs that adhere to the format of the initial seed corpus.

3.4. Fuzzing with Syntax-Aware Mutation

Although MSFuzz constructs a message syntax tree of the protocol implementation, it encounters two major problems when utilizing this syntax tree for message mutation: (1) How to select mutation locations to preserve the basic syntax structure of the message? (2) How to use the message syntax tree to precisely guide mutations? To address these problems, we designed a syntax-aware fuzzing approach, which is detailed in Algorithm 1. Specifically, MSFuzz first attempts to parse the message to obtain its basic syntax structure (line 7). After verifying the message against the syntax tree (line 9), MSFuzz randomly selects a field for mutation (line 11). Then, according to the value constraints of the field in the syntax tree, mutations are performed (lines 12–13).

Message parsing. Before mutating the messages, it is crucial to accurately identify suitable mutation locations. Randomly selecting mutation locations can disrupt the fundamental structure of the message, rendering it ineffective. Therefore, based on the general message syntax structure shown in Figure 3, MSFuzz parses the message designated for mutation. Initially, MSFuzz analyzes the request line to determine the message type and request line parameter fields and records the offsets of these fields. Subsequently, MSFuzz parses the names and values of the header fields, also noting their offsets. During mutation, MSFuzz selects positions within the request line parameter fields or header fields to preserve the integrity of the basic syntactic structure of the message.

Syntax-guided mutation. To guide message mutation using the syntax tree, it is first necessary to determine whether the message type exists within the syntax tree. If the message type does not exist, a random mutation strategy is applied. If the message type is found in the syntax tree, a request line parameter field or a header field is randomly selected as the mutation target. Next, the constraints for the selected field are identified from the syntax tree. One of these constraints is randomly chosen, and a value that adheres to this constraint is generated to replace the original field value. If the newly generated field value differs in length from the original field value, the offsets of subsequent fields are adjusted to ensure that each field in the mutated message retains its correct offset. In order to ensure that MSFuzz possesses the capability to explore extreme scenarios, we also employed a random mutation strategy with a certain probability.

For instance, consider a PLAY message that adheres to the syntax structure in Figure 1 and is guided to mutate by the message syntax tree in Figure 5. First, it is verified whether the PLAY message type exists in the Live555 syntax tree. As shown in Figure 5, this type of message syntax structure does exist. Next, a field in the PLAY message is randomly selected, such as the Range field. The value constraints of Range header field in the message syntax tree are looked up and one is randomly selected, such as npt = %lf - %lf. This value constraint specifies the playback range in seconds, where %lf represents a floating point number indicating the start and end times. MSFuzz then identifies the data type of the placeholders in the field, which, in this case, are floating point numbers. To generate a random value that matches this type, MSFuzz uses a floating point number generation function. For example, it might generate 10.5 and 20.0 for the start and end times, respectively, formatting them as npt = 10.5 - 20.0. Finally, MSFuzz replaces the original value in the Range field with the newly generated value, ensuring that the modified message adheres to the syntactic constraints specified in the message syntax tree. After replacing the value, MSFuzz adjusts the offset of each field within the message. This adjustment is crucial to maintain the structural integrity of the message, as altering the length of one field can affect the positions of subsequent fields. By following this process, MSFuzz can produce test cases that adhere to the syntactic constraints, thereby ensuring the validity of the generated messages.

Algorithm 1: Fuzzing with syntax-aware mutation

Input:: P: protocol implementation
Input:: E: expanded seeds
Input:: T: message syntax tree
Output:: $C_{x}$ : crash reports
1:: struct Field { type;values }
2:: struct Message { type;params;headers }
3:: StateMachine $S \leftarrow \emptyset$
4:: repeat
5:: Seed $M \leftarrow StateGuidedSeedChoice (S, E)$
6:: $〈 M_{1}, M_{2}, M_{3} 〉 \leftarrow S p l i t (M)$
7:: Message $m \leftarrow ParseMessage (T, M_{2})$
8:: for $i \leftarrow 1$ to $AssignEnergy (M)$ do
9:: if $m . t y p e \in T$ then
10:: Field $f i e l d$
11:: $f i e l d \leftarrow Choice (m . p a r a m s, m . h e a d e r s)$
12:: if $f i e l d . t y p e \in T$ and $R a n d () < ε$ then
13:: $M_{2}^{'} \leftarrow FieldMutate (f i e l d, T)$
14:: $M^{'} \leftarrow 〈 M_{1}, M_{2}^{'}, M_{3} 〉$
15:: else
16:: $M_{2}^{'} \leftarrow RandomMutate (M_{2}, T)$
17:: $M^{'} \leftarrow 〈 M_{1}, M_{2}^{'}, M_{3} 〉$
18:: end if
19:: else
20:: $M_{2}^{'} \leftarrow RandomMutate (M_{2}, T)$
21:: $M^{'} \leftarrow 〈 M_{1}, M_{2}^{'}, M_{3} 〉$
22:: end if
23:: Response $R^{'} \leftarrow SendToServer (P, M^{'})$
24:: if $IsCrash (M^{'}, P)$ then
25:: $C_{x} \leftarrow C_{x} \cup {M^{'}}$
26:: end if
27:: if $IsInteresting (M^{'}, P, S)$ then
28:: $E \leftarrow E \cup {(M^{'}, R^{'})}$
29:: $S \leftarrow UpdateStateMachine (S, R^{'})$
30:: end if
31:: end for
32:: until timeout reached or abort-signal

4. Evaluation

In this section, we evaluate the performance of MSFuzz and compare it with the SOTA protocol fuzzers. We aim to answer the following research questions by evaluating MSFuzz.

RQ1. State coverage: Could MSFuzz achieve a higher state space coverage than the SOTA fuzzers?
RQ2. Code coverage: Could MSFuzz achieve a higher code space coverage than the SOTA fuzzers?
RQ3. Ablation study: What was the impact of the two key components on the performance of MSFuzz?
RQ4. Vulnerability discovery: Could MSFuzz discover more vulnerabilities than the SOTA fuzzers?

4.1. Experimental Setup

Implementation. Building upon the widely used protocol fuzzing framework AFLNET, we developed MSFuzz. The implementation of MSFuzz consisted of approximately 1.2k lines of C/C++ code and 800 lines of Python code. Specifically, we developed a Python script to interface with the LLM to acquire the message syntax of network protocol implementations and construct message syntax trees. To mitigate potential issues of incompleteness and inconsistency in the syntax extracted by the LLM, we performed three iterations and used the union of the results to construct the message syntax tree. Subsequently, leveraging the initial seed corpus and the message syntax tree, we expanded the seed corpus using the LLM. The syntax-aware mutation was predominantly implemented in C during the mutation phase of the fuzzing loop based on the constructed message syntax tree. The method of enhancing protocol fuzzing in MSFuzz does not rely on a specific LLM, and several popular LLMs on the market can be used. We selected Qwen-plus as the LLM for syntax extraction and seed expansion because it is one of the most advanced pretrained LLMs currently available. This model boasts parameters in the trillion range and was trained on a vast and diverse dataset, including software code and technical documentation. This extensive training endows the model with deep language understanding and generation capabilities, enabling it to comprehend the logical structure and semantics of source code. Additionally, it provides a substantial number of free tokens, facilitating more extensive experimentation and application. For the configuration of input parameters in the LLM, we used the default settings, such as max_token = 2000 and top_p = 0.8.

Benchmark. Table 1 provides detailed information on the benchmark network protocol implementations used in our evaluation. Our benchmark includes five network protocol implementations, encompassing three widely used protocols: RTSP, FTP, and DAAP. These protocols employ textual formats for communication and are part of the widely recognized protocol fuzzing benchmark ProFuzzBench [22]. Given their widespread usage, we consider these five target programs representative of real-world applications.

Baselines. We conducted an in-depth comparison between MSFuzz and the SOTA network protocol fuzzers (AFLNET and CHATAFL). AFLNET, the first grey-box fuzzer designed for network protocol implementations, primarily relies on mutation-based generation methods. CHATAFL, as one of the SOTA grey-box fuzzers, leverages LLMs to expand seeds, extracts message structures, and utilizes them to guide mutation.

Environment. All experiments were run on a server equipped with 64-bit Ubuntu 20.04, featuring dual Intel(R) Xeon(R) E5-2690 @ 2.90 GHz CPUs and 128 GB of RAM. Each selected protocol implementation and each fuzzer were individually set up in separate Docker containers, utilizing identical computational resources for experimental evaluation. To ensure the fairness of the experimental results, each fuzzer was subjected to 24 h of fuzzing on each protocol implementation, with the experiments repeated five times.

4.2. State Coverage

State coverage is a crucial evaluation metric in network protocol fuzzing, as it reflects the depth of coverage within the protocol state machine and the extent to which the internal logic of the protocol implementation has been explored. By measuring the number of states reached and the number of state transitions during fuzzing, it is possible to effectively assess whether the fuzzer has thoroughly explored the various states of the protocol implementation and their transitions.

Table 2 presents the average number of states and state transitions covered by different fuzzers during five times of 24 h fuzzing. To evaluate the performance of MSFuzz, we report the percentage improvement in state and state transition coverage within 24 h (Improv) The results indicate that compared with AFLNET and CHATAFL, MSFuzz exhibited significant advantages in discovering new states and state transitions. Specifically, MSFuzz achieved average improvements of 22.53% and 10.04% in the number of states compared with AFLNET and CHATAFL, respectively. Additionally, there were improvements of 60.62% and 19.52% in the state transitions. Compared with other protocol implementations, the state coverage improvement for LightFTP was the least significant. This was because LightFTP is a lightweight FTP protocol implementation with a simple functionality and minimal codebase. It lacks complex and deep state transitions, resulting in relatively minor improvements in the state coverage.

In summary, MSFuzz could achieve higher state coverage than the SOTA fuzzers. MSFuzz not only discovered more new states but also generated more state transitions, thereby enhancing the effectiveness and comprehensiveness of the fuzzing. By exploring the state space more deeply, MSFuzz demonstrated significant advantages in the field of protocol fuzzing.

4.3. Code Coverage

Code coverage has consistently served as the standard metric for evaluating fuzzers, reflecting the amount of code executed within the protocol implementation throughout the entire fuzzing. Code coverage, as an evaluation metric, provides an effective means of assessing the performance of fuzzers.

To assess the performance of MSFuzz, we report the percentage improvement in the code branch coverage within 24 h (Improv) and analyzed the probability that MSFuzz outperformed baseline activities (

{\hat{A}}_{12}

) through random activities using the Vargha–Delaney statistic. Table 3 shows the average code branch coverage achieved by each fuzzer during five times of 24 h fuzzing.

The results demonstrate that MSFuzz achieved a higher code branch coverage than both AFLNET and CHATAFL across all five protocol implementations, validating the effectiveness of our proposed method in enhancing the code coverage. Specifically, compared with AFLNET, MSFuzz showed an average improvement of 29.30% in code branch coverage, and compared with CHATAFL, an average improvement of 23.13%. For all protocol implementations, the Vargha–Delaney effect size (

{\hat{A}}_{12} \geq 0.76

) indicates a significant advantage of MSFuzz in exploring the code branch coverage over baseline fuzzers.

To further demonstrate the effectiveness of MSFuzz, we analyzed the average number of code branches explored by different fuzzers during five runs of 24 h and present the results in Figure 7. As illustrated in the figure, MSFuzz not only achieved the highest code coverage compared with the other fuzzers but also exhibited the fastest exploration speed. Notably, the improvements were most significant for Pure-FTPD and Forked-daapd.

4.4. Ablation Study

MSFuzz employs an LLM to extract message syntax from the source code of protocol implementations, thereby constructing message syntax trees. Using these syntax trees, two strategies were employed to generate high-quality test cases that adhered to syntactic constraints, enhancing the performance of fuzzing. The first strategy involved using LLMs in conjunction with the extracted message syntax tree to expand the initial seed corpus of the protocol implementation, thereby improving the diversity and comprehensiveness of the seeds. The second strategy introduced a novel syntax-aware mutation strategy, which leveraged the constructed message syntax trees to guide the mutation process during fuzzing.

To quantitatively assess the contribution of each strategy to the overall performance of MSFuzz, we conducted an ablation study. In this study, we evaluated three tools: AFLNET (with all strategies disabled), STFuzz-E (with only the seed expansion enabled), and MSFuzz (with both the seed expansion and syntax-aware mutation strategies enabled). The experimental results are shown in Table 4. We evaluated the performance of three tools during five times of 24-h fuzzing, specifically measuring the average improvement of states, state transitions, and the percentage improvement in the code branch coverage achieved by each fuzzer. The results indicate that both strategies implemented by MSFuzz enhanced the number of states, state transitions, and code branch coverage to varying extents without negatively impacting any of these metrics.

The experimental results of AFLNET and MSFuzz-E indicate that employing the seed expansion strategy increased the number of states by 19.31%, state transitions by 45.09%, and code branch coverage by 25.19%. This demonstrated the effectiveness of the seed expansion strategy, which enhanced the quality and diversity of the seeds. By ensuring that the expanded seeds comprehensively covered the message syntax of the protocol implementation, this strategy significantly improved the fuzzing exploration of the state space and code space.

Incorporating the syntax-aware mutation strategy alongside the seed expansion strategy, as demonstrated by the results of MSFuzz-E and MSFuzz, further enhanced the three evaluation metrics. The improvement in the number of states rose from 19.31% to 22.53%, the improvement in the state transitions rose from 45.09% to 60.62%, and the improvement in the code branch coverage rose from 25.19% to 29.30%. This demonstrated the effectiveness of the syntax-aware mutation strategy, which ensured that the test cases generated after mutation adhered to the protocol syntactic constraints. This strategy prevented the server from discarding them during the initial syntax-checking phase, thereby increasing the opportunity to explore the protocol implementation.

The analysis of the time overhead and resource consumption associated with the seed expansion and syntax-aware mutation strategies demonstrated that these strategies neither slowed down the execution nor introduced significant resource consumption. The construction of message syntax trees and the expansion of the seed corpus were conducted during the preparation phase and executed only once, thus not contributing to the time overhead and resource consumption of the protocol fuzzing. Although syntax-aware mutation was performed during the fuzzing process, it could generate test cases that adhered to the protocol constraints, thereby avoiding the substantial time and inefficiency associated with traditional random mutations. This resulted in a significant improvement in the overall testing efficiency. As shown in Table 4, the experimental data for MSFuzz-E and MSFuzz substantiate this conclusion.

4.5. Vulnerability Discovery

To evaluate the vulnerability discovery performance of MSFuzz, we compared the number of unique crashes triggered by MSFuzz and the SOTA fuzzers. From the five times of 24 h fuzzing, neither AFLNET nor CHATAFL triggered any crashes across the five target protocol implementations. However, MSFuzz discovered crashes in two protocol implementations. Specifically, in LightFTP, MSFuzz detected 91 unique crashes, while in Forked-daapd, MSFuzz found 27 unique crashes. Notably, we have conducted a detailed analysis of 16 of these crashes so far, identifying two new vulnerabilities that have been reported to the Common Vulnerabilities and Exposures (CVE) database.

These experimental results validated the superior performance of MSFuzz in detecting and discovering software vulnerabilities. The differences in crash discovery capabilities among various fuzzers within the same time frame further highlight the significant advantages of MSFuzz in enhancing the fuzzing efficiency and vulnerability discovery.

5. Discussion

Although MSFuzz achieved a good performance in state coverage, code coverage, and vulnerability discovery compared with the SOTA fuzzers, it still has certain limitations.

Difficulty in applying to binary-based protocol implementations. Binary protocols have very compact data representations, with field boundaries often not clearly defined. These protocols lack explicit tags or markers to indicate the start and end of each field, which makes it challenging to discern the specific meaning and position of each field when parsing the code. This ambiguity in field delineation complicates the process of extracting and interpreting protocol messages, making it difficult to accurately construct message syntax trees. Furthermore, the variability in binary protocol structures requires a more sophisticated approach to handle the nuances of different implementations, thus posing a significant challenge for protocol-fuzzing tools like MSFuzz that rely on clear syntax demarcations.

Input capacity limitations of LLMs. Although MSFuzz provides LLMs with filtered key code functions to enhance its understanding of message syntax, the size of the function code may still exceed LLMs’ input limitations in some cases. This can lead to challenges in processing large volumes of code, as LLMs may struggle to maintain context and accuracy when handling excessive input data. The limitations in input capacity can result in incomplete analysis and the potential loss of critical information needed for effective fuzzing. Consequently, the efficiency and effectiveness of MSFuzz can be compromised, as LLMs might not fully capture the nuances of the protocol implementation. Therefore, we plan to prioritize the refinement of code by removing syntax-irrelevant content within functions as a focus of our future work to optimize the processing efficiency of LLMs.

6. Related Work

6.1. Syntax-Aware Fuzzing

For protocol fuzzing, understanding the message syntax is essential [15]. Syntax-aware fuzzers strive to comprehend the detailed message syntax of protocol implementations, which allows them to generate more effective test cases. Tools such as Peach [31] and KIF [7] manually extract syntax from publicly available protocol RFC documents. This approach requires significant time from researchers and is not applicable to proprietary protocols without public documentation. AspFuzz [8] uses a specialized language to describe RFC documents for obtaining protocol syntax. PULSAR [39] and Bbuzz [40] extract the message syntax from network traffic through protocol reverse engineering methods. Polyglot [41] employs dynamic analysis techniques to extract message syntax from program behavior logs. Polar [19] combines static analysis and dynamic taint analysis to extract syntax related to protocol functionality.

In general, current syntax-aware protocol fuzzers lack a comprehensive understanding of message syntax and usually can only extract partial fields from message syntax structures. In contrast, MSFuzz is capable of extracting more detailed and comprehensive message syntax from protocol implementations. Additionally, it employs seed expansion and syntax-aware mutation methods to enhance fuzzing.

6.2. Fuzzing Based on Large Language Models

In recent years, popular large language models (LLMs) have demonstrated remarkable effectiveness in natural language processing. Researchers have attempted to integrate LLMs into fuzzing to enhance the performance of traditional fuzzing methods. CHATAFL [10] is an LLM-based protocol fuzzer that constructs the syntax structure of request messages and predicts the next message in the sequence using LLMs. Codamosa [42] uses LLMs to address the problem of stagnating code coverage in traditional fuzzing. FuzzGPT [43] leverages the CodeX and CodeGen models to automatically generate anomalous programs based on core concepts, thereby fuzzing deep learning libraries. The key insight is that historical bug-triggering programs may include rare or valuable code ingredients that are important for bug finding. KernelGPT [44] employs LLMs to automatically infer Syzkaller specifications to enhance kernel fuzzing.

All of these works rely on the knowledge acquired by LLMs during pre-training on large-scale data. In contrast, when MSFuzz extracts the message syntax of protocol implementations, it provides relevant source code snippets to LLMs. The LLM then performs syntax extraction based on these code snippets, resulting in more reliable outputs.

7. Conclusions

In this paper, we propose MSFuzz, which is a novel protocol-fuzzing method with message syntax comprehension. MSFuzz extracts key code snippets from protocol implementation source code and leverages code comprehension capabilities of LLMs to extract message syntax and construct message syntax trees. These syntax trees are then utilized to expand the seed corpus and guide seed mutation, thereby improving the effectiveness of test cases and thereby enhancing the efficiency of fuzzing. Experimental evaluation results demonstrated that MSFuzz outperformed the SOTA protocol fuzzers. Specifically, compared with AFLNET and CHATAFL, MSFuzz achieved average improvements of 22.53% and 19.52% in the number of states, 60.62% and 10.04% improvements in the number of state transitions, and 29.30% and 23.13% improvements in the branch coverage. Additionally, MSFuzz discovers more vulnerabilities than the SOTA fuzzers.

Author Contributions

M.C., K.Z. and Y.C. designed the research. M.C. performed the experiments and drafted this paper. K.Z. and Y.C. helped to organize this paper. G.Y., Y.L. and C.L. revised and finalized the paper. All authors read and agreed to the published version of this manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Hermann, H.; Johnson, R.; Engel, R. A framework for network protocol software. In Proceedings of the OOPSLA ‘95, ACM SIGPLAN Notices, Austin, TX, USA, 15–19 October 1995. [Google Scholar]
Yazdinejad, A.; Dehghantanha, A.; Parizi, R.M.; Srivastava, G.; Karimipour, H. Secure Intelligent Fuzzy Blockchain Framework: Effective Threat Detection in IoT Networks. Comput. Ind. 2023, 144, 103801. [Google Scholar] [CrossRef]
Serebryany, K. OSS-Fuzz-Google’s Continuous Fuzzing Service for Open Source Software; USENIX: Vancouver, BC, Canada, 2017. [Google Scholar]
Xu, M.; Kashyap, S.; Zhao, H.; Kim, T. Krace: Data race fuzzing for kernel file systems. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1643–1660. [Google Scholar]
Jero, S.; Pacheco, M.L.; Goldwasser, D.; Nita-Rotaru, C. Leveraging textual specifications for grammar-based fuzzing of network protocols. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 9478–9483. [Google Scholar]
Banks, G.; Cova, M.; Felmetsger, V.; Almeroth, K.; Kemmerer, R.; Vigna, G. SNOOZE: Toward a Stateful NetwOrk prOtocol fuzZEr. In Proceedings of the Information Security: 9th International Conference, ISC 2006, Samos Island, Greece, 30 August–2 September 2006; Proceedings 9. Springer: Berlin/Heidelberg, Germany, 2006; pp. 343–358. [Google Scholar]
Miki, H.; Setou, M.; Kaneshiro, K.; Hirokawa, N. All kinesin superfamily protein, KIF, genes in mouse and human. Proc. Natl. Acad. Sci. USA 2001, 98, 7004–7011. [Google Scholar] [CrossRef] [PubMed]
Kitagawa, T.; Hanaoka, M.; Kono, K. A state-aware protocol fuzzer based on application-layer protocols. IEICE Trans. Inf. Syst. 2011, 94, 1008–1017. [Google Scholar] [CrossRef]
Rontti, T.; Juuso, A.M.; Takanen, A. Preventing DoS attacks in NGN networks with proactive specification-based fuzzing. IEEE Commun. Mag. 2012, 50, 164–170. [Google Scholar] [CrossRef]
Meng, R.; Mirchev, M.; Böhme, M.; Roychoudhury, A. Large language model guided protocol fuzzing. In Proceedings of the 31st Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 26 February–1 March 2024. [Google Scholar]
Cui, W.; Kannan, J.; Wang, H.J. Discoverer: Automatic Protocol Reverse Engineering from Network Traces. In Proceedings of the USENIX Security Symposium, Boston, MA, USA, 6–10 August 2007; pp. 1–14. [Google Scholar]
Beddoe, M.A. Network protocol analysis using bioinformatics algorithms. Toorcon 2004, 26, 1095–1098. [Google Scholar]
Cui, W.; Paxson, V.; Weaver, N.; Katz, R.H. Protocol-independent adaptive replay of application dialog. In Proceedings of the NDSS, San Diego, CA, USA, 2–3 February 2006. [Google Scholar]
Sun, Y.; Lv, S.; You, J.; Sun, Y.; Chen, X.; Zheng, Y.; Sun, L. IPSpex: Enabling efficient fuzzing via specification extraction on ICS protocol. In Proceedings of the International Conference on Applied Cryptography and Network Security, Rome, Italy, 20–23 June 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 356–375. [Google Scholar]
Comparetti, P.M.; Wondracek, G.; Kruegel, C.; Kirda, E. Prospex: Protocol specification extraction. In Proceedings of the 2009 30th IEEE Symposium on Security and Privacy, Oakland, CA, USA, 17–20 May 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 110–125. [Google Scholar]
Lin, Z.; Jiang, X.; Xu, D.; Zhang, X. Automatic protocol format reverse engineering through context-aware monitored execution. In Proceedings of the NDSS, San Diego, CA, USA, 10–13 February 2008; Volume 8, pp. 1–15. [Google Scholar]
Wondracek, G.; Comparetti, P.M.; Kruegel, C.; Kirda, E.; Anna, S.S.S. Automatic Network Protocol Analysis. In Proceedings of the NDSS, San Diego, CA, USA, 10–13 February 2008; Volume 8, pp. 1–14. [Google Scholar]
Cui, W.; Peinado, M.; Chen, K.; Wang, H.J.; Irun-Briz, L. Tupni: Automatic reverse engineering of input formats. In Proceedings of the 15th ACM conference on Computer and Communications Security, Alexandria, VA, USA, 27–31 October 2008; pp. 391–402. [Google Scholar]
Luo, Z.; Zuo, F.; Jiang, Y.; Gao, J.; Jiao, X.; Sun, J. Polar: Function code aware fuzz testing of ics protocol. ACM Trans. Embed. Comput. Syst. (TECS) 2019, 18, 1–22. [Google Scholar] [CrossRef]
Hu, Z.; Shi, J.; Huang, Y.; Xiong, J.; Bu, X. Ganfuzz: A gan-based industrial network protocol fuzzing framework. In Proceedings of the 15th ACM International Conference on Computing Frontiers, Ischia, Italy, 8–10 May 2018; pp. 138–145. [Google Scholar]
Zhao, H.; Li, Z.; Wei, H.; Shi, J.; Huang, Y. SeqFuzzer: An industrial protocol fuzzing framework from a deep learning perspective. In Proceedings of the 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST), Xi’an, China, 22–27 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 59–67. [Google Scholar]
Natella, R.; Pham, V.T. Profuzzbench: A benchmark for stateful protocol fuzzing. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual, 11–17 July 2021; pp. 662–665. [Google Scholar]
Hu, F.; Qin, S.; Ma, Z.; Zhao, B.; Yin, T.; Zhang, C. NSFuzz: Towards Efficient and State-Aware Network Service Fuzzing-RCR Report. ACM Trans. Softw. Eng. Methodol. 2023, 32, 1–8. [Google Scholar] [CrossRef]
Pham, V.T.; Böhme, M.; Roychoudhury, A. Aflnet: A greybox fuzzer for network protocols. In Proceedings of the 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST), Porto, Portugal, 24–28 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 460–465. [Google Scholar]
Natella, R. Stateafl: Greybox fuzzing for stateful network servers. Empir. Softw. Eng. 2022, 27, 191. [Google Scholar] [CrossRef]
Kaksonen, R.; Laakso, M.; Takanen, A. Software security assessment through specification mutations and fault injection. In Proceedings of the Communications and Multimedia Security Issues of the New Century: IFIP TC6/TC11 Fifth Joint Working Conference on Communications and Multimedia Security (CMS’01), Darmstadt, Germany, 21–22 May 2001; Springer: Berlin/Heidelberg, Germany, 2001; pp. 173–183. [Google Scholar]
Aitel, D. An Introduction to SPIKE, the Fuzzer Creation Kit. 2002. Available online: https://www.blackhat.com/presen-tations/bh-usa-02/bh-us-02-aitel-spike.ppt (accessed on 6 May 2024).
Security, B. beSTORM Black Box Testing. 2024. Available online: https://beyondsecurity.com/solutions/bestorm.html (accessed on 6 May 2024).
Inc, S. Defensics Fuzz Testing. 2020. Available online: https://www.synopsys.com/software-integrity/security-testing/fuzz-testing.html (accessed on 6 May 2024).
Rapid7. Metasploit Vulnerability & Exploit Database. 2020. Available online: https://www.rapid7.com/db/?q=fuzzer&type=metasploit (accessed on 6 May 2024).
Eddington, M. Peach Fuzzing Platform. 2004. Available online: https://gitlab.com/peachtech/peach-fuzzer-community (accessed on 6 May 2024).
Yu, Y.; Chen, Z.; Gan, S.; Wang, X. SGPFuzzer: A state-driven smart graybox protocol fuzzer for network protocol implementations. IEEE Access 2020, 8, 198668–198678. [Google Scholar] [CrossRef]
Schumilo, S.; Aschermann, C.; Jemmett, A.; Abbasi, A.; Holz, T. Nyx-net: Network fuzzing with incremental snapshots. In Proceedings of the Seventeenth European Conference on Computer Systems, Rennes, France, 5–8 April 2022; pp. 166–180. [Google Scholar]
Feng, X.; Sun, R.; Zhu, X.; Xue, M.; Wen, S.; Liu, D.; Nepal, S.; Xiang, Y. Snipuzz: Black-box fuzzing of iot firmware via message snippet inference. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual, 15–19 November 2021; pp. 337–350. [Google Scholar]
Sun, Y.; Wu, D.; Xue, Y.; Liu, H.; Wang, H.; Xu, Z.; Xie, X.; Liu, Y. Gptscan: Detecting logic vulnerabilities in smart contracts by combining gpt with program analysis. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, 14–20 April 2024; pp. 1–13. [Google Scholar]
Xia, C.S.; Paltenghi, M.; Le Tian, J.; Pradel, M.; Zhang, L. Fuzz4all: Universal fuzzing with large language models. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, 14–20 April 2024; pp. 1–13. [Google Scholar]
Hu, J.; Zhang, Q.; Yin, H. Augmenting greybox fuzzing with generative ai. arXiv 2023, arXiv:2306.06782. [Google Scholar]
Deng, Y.; Xia, C.S.; Peng, H.; Yang, C.; Zhang, L. Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, WA, USA, 17–21 July 2023; pp. 423–435. [Google Scholar]
Gascon, H.; Wressnegger, C.; Yamaguchi, F.; Arp, D.; Rieck, K. Pulsar: Stateful black-box fuzzing of proprietary network protocols. In Proceedings of the Security and Privacy in Communication Networks: 11th EAI International Conference, SecureComm 2015, Dallas, TX, USA, 26–29 October 2015; Proceedings 11. Springer: Berlin/Heidelberg, Germany, 2015; pp. 330–347. [Google Scholar]
Blumbergs, B.; Vaarandi, R. Bbuzz: A bit-aware fuzzing framework for network protocol systematic reverse engineering and analysis. In Proceedings of the MILCOM 2017—2017 IEEE Military Communications Conference (MILCOM), Baltimore, MD, USA, 23–25 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 707–712. [Google Scholar]
Caballero, J.; Yin, H.; Liang, Z.; Song, D. Polyglot: Automatic extraction of protocol message format using dynamic binary analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security, Alexandria, VA, USA, 31 October–2 November 2007; pp. 317–329. [Google Scholar]
Lemieux, C.; Inala, J.P.; Lahiri, S.K.; Sen, S. Codamosa: Escaping coverage plateaus in test generation with pre-trained large language models. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 919–931. [Google Scholar]
Deng, Y.; Xia, C.S.; Yang, C.; Zhang, S.D.; Yang, S.; Zhang, L. Large language models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, Lisbon, Portugal, 14–20 April 2024; pp. 1–13. [Google Scholar]
Yang, C.; Zhao, Z.; Zhang, L. KernelGPT: Enhanced Kernel Fuzzing via Large Language Models. arXiv 2023, arXiv:2401.00563. [Google Scholar]

Figure 1. Syntax for the RTSP PLAY message.

Figure 2. Overview of MSFuzz.

Figure 3. General message syntax structure.

Figure 4. Prompt templates used for extracting message syntax.

Figure 5. The message syntax tree for the motivating example.

Figure 6. Prompt template used for expanding the initial seed corpus.

Figure 7. The average number of code branches explored by different fuzzers during 5 times of 24 h.

Table 1. The basic information of target protocol implementations.

Subject	Protocol	Version	Language	Size
Live555	RTSP	31284aa	C $+ +$	57K
LightFTP	FTP	139af7c	C	4.7K
Pure-FTPD	FTP	10122d9	C	29K
ProFTPD	FTP	61e621e	C	242K
Forked-daapd	DAAP	2ca10d9	C	79K

Table 2. The average number of states and state transitions covered by different fuzzers.

Subject	MSFuzz		Comparison with AFLNET				Comparison with CHATAFL
Subject	States	Transitions	States	Improv	Trans	Improv	States	Improv	Trans	Improv
Live555	15.00	159.40	11.00	36.36%	77.60	105.41%	12.00	25.00%	107.60	48.14%
LightFTP	24.00	277.60	23.00	4.35%	220.20	26.07%	23.80	0.84%	262.60	5.71%
Pure-FTPD	30.00	326.20	26.20	14.50%	202.40	61.17%	29.20	2.74%	287.20	13.58%
ProFTPD	29.40	268.40	22.20	32.43%	152.60	75.88%	26.60	10.53%	248.80	7.88%
Forked-daapd	10.00	29.60	8.00	25.00%	22.00	34.55%	9.00	11.11%	24.20	22.31%
AVG	-	-	-	22.53%	-	60.62%	-	10.04%	-	19.52%

Note: The AVG row is highlighted in bold to indicate the average improvement percentages across all subjects for states and state transitions.

Table 3. The average code branch coverage by different fuzzers.

Subject	MSFuzz	Comparison with AFLNET			Comparison with CHATAFL
Subject	MSFuzz	Branch	Improv	${\hat{A}}_{12}$	Branch	Improv	${\hat{A}}_{12}$
Live555	3190.60	2796.40	14.10%	1.00	2816.40	13.29%	1.00
LightFTP	402.20	321.80	24.98%	1.00	343.00	17.26%	1.00
Pure-FTPD	1321.00	1027.60	28.55%	1.00	1147.40	15.13%	0.76
ProFTPD	5427.60	4893.20	10.92%	1.00	5123.00	5.95%	1.00
Forked-daapd	3853.20	2294.20	67.95%	1.00	2349.40	64.01%	1.00
AVG	-	-	29.30%	-	-	23.13%	-

Note: The AVG row is highlighted in bold to indicate the average improvement percentages across all subjects for branch coverage.

Table 4. Improvement in the average number of states, state transitions, and branch coverage achieved by MSFuzz-E and MSFuzz.

Subject	AFLNET			MSFuzz-E			MSFuzz
Subject	State	Trans	Branch	State	Trans	Branch	State	Trans	Branch
Live555	11.00	77.60	2796.40	+36.36%	+88.66%	+13.00%	+36.36%	+105.41%	+14.10%
LightFTP	23.00	220.20	321.80	+0.87%	+5.36%	+17.53%	+4.35%	+26.07%	+24.98%
Pure-FTPD	26.20	202.40	1027.60	+12.98%	+55.53%	+22.99%	+14.50%	+61.17%	+28.55%
ProFTPD	22.20	152.60	4893.20	+28.83%	+50.46%	+6.87%	+32.43%	+75.88%	+10.92%
Forked-daapd	8.00	22.00	2294.20	+17.50%	+25.45%	+65.57%	+25.00%	+34.55%	+67.95%
AVG	-	-	-	+19.31%	+45.09%	+25.19%	+22.53%	+60.62%	+29.30%

Note: The AVG row is highlighted in bold to indicate the average improvement percentages across all subjects for states, transitions, and branch coverage.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cheng, M.; Zhu, K.; Chen, Y.; Yang, G.; Lu, Y.; Lu, C. MSFuzz: Augmenting Protocol Fuzzing with Message Syntax Comprehension via Large Language Models. Electronics 2024, 13, 2632. https://doi.org/10.3390/electronics13132632

AMA Style

Cheng M, Zhu K, Chen Y, Yang G, Lu Y, Lu C. MSFuzz: Augmenting Protocol Fuzzing with Message Syntax Comprehension via Large Language Models. Electronics. 2024; 13(13):2632. https://doi.org/10.3390/electronics13132632

Chicago/Turabian Style

Cheng, Mingjie, Kailong Zhu, Yuanchao Chen, Guozheng Yang, Yuliang Lu, and Canju Lu. 2024. "MSFuzz: Augmenting Protocol Fuzzing with Message Syntax Comprehension via Large Language Models" Electronics 13, no. 13: 2632. https://doi.org/10.3390/electronics13132632

APA Style

Cheng, M., Zhu, K., Chen, Y., Yang, G., Lu, Y., & Lu, C. (2024). MSFuzz: Augmenting Protocol Fuzzing with Message Syntax Comprehension via Large Language Models. Electronics, 13(13), 2632. https://doi.org/10.3390/electronics13132632

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

MSFuzz: Augmenting Protocol Fuzzing with Message Syntax Comprehension via Large Language Models

Abstract

1. Introduction

2. Background and Motivation

2.1. Protocol Fuzzing

2.2. Large Language Models

2.3. Motivating Example

3. Methodology

3.1. Overview

3.2. Message Syntax Tree Construction

3.2.1. Message Syntax Structure

3.2.2. Extracting Syntax via LLMs

3.3. Seed Expansion

3.4. Fuzzing with Syntax-Aware Mutation

4. Evaluation

4.1. Experimental Setup

4.2. State Coverage

4.3. Code Coverage

4.4. Ablation Study

4.5. Vulnerability Discovery

5. Discussion

6. Related Work

6.1. Syntax-Aware Fuzzing

6.2. Fuzzing Based on Large Language Models

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI